Developer Guide

This guide explains how to add new task types to Trinity-RFT and provides related development guidelines.

Note

Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.

Creating New Task Types

Trinity-RFT lets developers register new task types (e.g., multi-round interaction scenarios). The steps for creating a new task type are described below.

Step 0: Basic Concepts

Before starting development, it’s important to understand several core concepts:

Task: Represents a data structure that can be converted into a Workflow. The Task data format may vary significantly depending on the type of task:
- Math problems: Task contains the problem description and the standard answer.
- Programming scenarios: Task includes the problem description, test cases, runtime environment, and other complex information.
Workflow: Can be understood as the running state of a Task, defining the interaction flow between Agents and Environments, including logic similar to Rollout and Reward calculations in other frameworks. After execution, it generates Experience. Trinity-RFT has several built-in Workflows:
- MathWorkflow: For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
- CodeWorkflow (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
- …
Experience: The output of running a Workflow, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, Experience includes lists of token_ids, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc.
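
To make the Experience fields more concrete, the following is a minimal sketch of how an action mask could be derived from a token sequence and its prompt length. `SketchExperience` and `build_action_mask` are hypothetical names used only for illustration and are not Trinity-RFT's actual classes. The sketch also assumes a single-turn rollout, where every token after the prompt was generated by the LLM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchExperience:
    """Hypothetical stand-in for an Experience; not the Trinity-RFT class."""

    tokens: List[int]  # prompt tokens followed by generated tokens
    prompt_length: int  # number of leading tokens that belong to the prompt
    logprobs: List[float]  # one log-probability per generated token
    reward: float
    action_mask: List[int] = field(default_factory=list)


def build_action_mask(tokens: List[int], prompt_length: int) -> List[int]:
    """Mark generated tokens with 1 and prompt tokens with 0 (single-turn assumption)."""
    return [0] * prompt_length + [1] * (len(tokens) - prompt_length)


exp = SketchExperience(
    tokens=[101, 102, 103, 7, 8],
    prompt_length=3,
    logprobs=[-0.1, -0.3],
    reward=1.0,
)
exp.action_mask = build_action_mask(exp.tokens, exp.prompt_length)
print(exp.action_mask)  # [0, 0, 0, 1, 1]
```

In a multi-round scenario the mask would no longer be a simple prefix/suffix split, because prompt and generated tokens alternate.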

Step 1: Prepare Task Dataset

Each Task contains the parameters needed to initialize its Workflow. Because initialization parameters differ significantly between Workflows, the following example uses a math problem scenario.

In the math problem scenario, the Task dataset can be a jsonl file, where each line’s JSON contains question and answer fields representing the problem description and standard answer, respectively.

{"question": "1+1=", "answer": "2"}
{"question": "2+2=", "answer": "4"}
...
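
Before using such a file, it can help to check that every line parses and carries both fields. The following is a small sketch using only the standard library. `iter_tasks` is a hypothetical helper and is not part of Trinity-RFT; it takes any iterable of lines, so an open file object would work as well as the in-memory list used here.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_tasks(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one raw task dict per non-empty JSONL line, checking required fields."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        missing = {"question", "answer"} - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing fields {sorted(missing)}")
        yield record


sample = [
    '{"question": "1+1=", "answer": "2"}',
    '{"question": "2+2=", "answer": "4"}',
]
tasks = list(iter_tasks(sample))
print(tasks[0]["question"], tasks[0]["answer"])  # 1+1= 2
```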

Step 2: Write Workflow

The core of creating a new task type is writing a new Workflow, whose base class interface is as follows:

# import some packages

class Workflow(ABC):

    def __init__(
        self,
        model: ModelWrapper,
        task: Task,
        auxiliary_models: Optional[List[openai.OpenAI]] = None,
    ):
        self.model = model
        self.auxiliary_models = auxiliary_models

    @abstractmethod
    def run(self) -> List[Experience]:
        """Run the workflow and return a list of Experiences."""

Developers can register their own Workflow with the WORKFLOWS.register_module method, used as a class decorator. The registered name must not conflict with the name of any existing Workflow class.

# import some packages
from trinity.common.workflows.workflow import WORKFLOWS

@WORKFLOWS.register_module("my_workflow")
class MyWorkflow(Workflow):
    pass
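
For readers unfamiliar with this pattern, the sketch below shows how a name-based registry with a `register_module` decorator can work in general. `SketchRegistry` and `SKETCH_WORKFLOWS` are hypothetical and are not Trinity-RFT's implementation. In particular, raising an error on a duplicate name is an assumption of this sketch; this guide does not state how the real registry reacts to a conflict.

```python
from typing import Callable, Dict, Type


class SketchRegistry:
    """Hypothetical name-to-class registry; not Trinity-RFT's implementation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, Type] = {}

    def register_module(self, module_name: str) -> Callable[[Type], Type]:
        """Return a class decorator that stores the class under module_name."""

        def decorator(cls: Type) -> Type:
            # Assumption of this sketch: duplicate names are rejected.
            if module_name in self._modules:
                raise KeyError(f"{module_name!r} is already registered in {self.name}")
            self._modules[module_name] = cls
            return cls

        return decorator

    def get(self, module_name: str) -> Type:
        """Look up a registered class by name."""
        return self._modules[module_name]


SKETCH_WORKFLOWS = SketchRegistry("workflows")


@SKETCH_WORKFLOWS.register_module("my_workflow")
class MyWorkflow:
    pass


# The name given in the decorator now resolves to the class.
assert SKETCH_WORKFLOWS.get("my_workflow") is MyWorkflow
```

A lookup by name is what allows a configuration file to select a Workflow with a plain string.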

Initialization Parameters

When initializing, Workflow receives the following parameters:

model: The model being trained. It provides an OpenAI-like interface that accepts a list of conversation messages and returns the content generated by the LLM, including the reply text (response_text), the token ids of the full sequence (tokens), the token length of the prompt (prompt_length), and the log probabilities of the output tokens (logprobs).
task: An instance of Task, built from one line of the Task dataset. The raw_task field holds the source data as a Dict, which can be used to construct the Workflow instance. The rollout_args field holds the parameters of the rollout process, such as n, temperature, top_k, and top_p.
auxiliary_models: A list of auxiliary models that are not trained. Each of them provides an OpenAI-compatible API.

Tip

The model also provides an OpenAI-compatible API. To use it, set explorer.rollout_model.enable_openai_api to true in your config file, then call model.get_openai_client() in your workflow to get an openai.OpenAI instance.

Example Code

Below is a simple example demonstrating how to implement a math problem Workflow:

@WORKFLOWS.register_module("example_workflow")
class ExampleWorkflow(Workflow):

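    def __init__(self, model: ModelWrapper, task: Task, **kwargs):
        # The base class requires both the model and the task.
        super().__init__(model, task, **kwargs)
        # Keep the task so that run() can read its rollout_args.
        self.task = task
        self.question = task.raw_task.get("question")
        self.answer = task.raw_task.get("answer")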

    def calculate_reward(self, response: str, truth: str) -> float:
        if response == truth:
            return 1.0
        else:
            return 0.0

    def run(self) -> List[Experience]:
        # Assumption: chat returns one response per requested sample (n in total).
        responses = self.model.chat(
            [
                {
                    "role": "user",
                    "content": f"Question:\n{self.question}",
                }
            ],
            n=self.task.rollout_args.n,
            temperature=self.task.rollout_args.temperature,
        )
        # Score each sampled response and wrap it in an Experience.
        return [
            Experience(
                tokens=response.tokens,
                prompt_length=response.prompt_length,
                reward=self.calculate_reward(response.response_text, self.answer),
                logprobs=response.logprobs,
            )
            for response in responses
        ]
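
The exact string comparison in calculate_reward gives a reward of 0.0 for answers such as " 2" or "2.0" when the standard answer is "2". As a sketch of one possible alternative, the function below trims whitespace and compares numerically when both strings parse as numbers. `lenient_reward` is a hypothetical helper and is not part of Trinity-RFT; whether such leniency is appropriate depends on the task.

```python
def lenient_reward(response: str, truth: str) -> float:
    """Hypothetical variant: compare numerically when possible, else by trimmed text."""
    response, truth = response.strip(), truth.strip()
    try:
        # Numeric comparison treats "2" and "2.0" as the same answer.
        return 1.0 if float(response) == float(truth) else 0.0
    except ValueError:
        # At least one side is not a number, so fall back to exact text comparison.
        return 1.0 if response == truth else 0.0


print(lenient_reward(" 2\n", "2"))  # 1.0
print(lenient_reward("2.0", "2"))   # 1.0
print(lenient_reward("two", "2"))   # 0.0
```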

For some heavy workflows, the initialization process may be time-consuming. In this case, you can implement the resettable and reset methods to avoid re-initialization.

@WORKFLOWS.register_module("example_workflow")
class ExampleWorkflow(Workflow):
    # some code
    # ...

    def resettable(self):
        return True

    def reset(self, task: Task):
        self.question = task.raw_task.get("question")
        self.answer = task.raw_task.get("answer")
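
The benefit of resettable and reset is that a caller can keep one Workflow instance and feed it new tasks. The sketch below illustrates that idea with a stand-in class that counts how often its constructor runs. `SketchWorkflow` and `run_all` are hypothetical, and the loop is only an assumption about how a caller might use these two methods; how Trinity-RFT actually schedules workflows is not shown in this guide.

```python
from typing import Dict, List, Optional


class SketchWorkflow:
    """Hypothetical workflow whose constructor stands in for costly setup."""

    init_count = 0  # counts how many times the constructor ran

    def __init__(self, raw_task: Dict[str, str]) -> None:
        SketchWorkflow.init_count += 1
        self.reset(raw_task)

    def resettable(self) -> bool:
        return True

    def reset(self, raw_task: Dict[str, str]) -> None:
        # Only the task-specific state is replaced; the costly setup is kept.
        self.question = raw_task.get("question")
        self.answer = raw_task.get("answer")

    def run(self) -> str:
        return f"{self.question} -> {self.answer}"


def run_all(raw_tasks: List[Dict[str, str]]) -> List[str]:
    """Reuse one workflow instance when it reports itself as resettable."""
    workflow: Optional[SketchWorkflow] = None
    results = []
    for raw_task in raw_tasks:
        if workflow is not None and workflow.resettable():
            workflow.reset(raw_task)
        else:
            workflow = SketchWorkflow(raw_task)
        results.append(workflow.run())
    return results


results = run_all(
    [{"question": "1+1=", "answer": "2"}, {"question": "2+2=", "answer": "4"}]
)
print(results)  # ['1+1= -> 2', '2+2= -> 4']
print(SketchWorkflow.init_count)  # 1
```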

Step 3: Modify Configuration File

After developing the Workflow, modify the configuration file: set default_workflow_type in the buffer.explorer_input section to the name of the newly registered Workflow.

buffer:
  # Other fields
  explorer_input:
    taskset:
      name: example_task
      storage_type: file
      path: /path/to/taskset
      # Other fields
    default_workflow_type: example_workflow
# Other fields

Check Code Style

Before submitting the code, make sure it passes the code style check. Follow these steps:

# Install code style checking tools
cd <path_to_trinity_rft>
# bash
pip install -e .[dev]
# zsh
# pip install -e .\[dev\]

# Run code style checks
pre-commit run --all-files

# Commit the code after all checks pass
git commit -am "create example workflow"