Developer Guide
This guide explains how to add new task types to Trinity-RFT and provides relevant development guidelines.
Note
Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.
Creating New Task Types
Trinity-RFT supports developers in registering new task types (e.g., multi-round interaction scenarios). Below are the steps for creating a new task type.
Step 0: Basic Concepts
Before starting development, it’s important to understand several core concepts:
Task: Represents a data structure that can be converted into a Workflow. The Task data format may vary significantly depending on the type of task:
- Math problems: Task contains the problem description and the standard answer.
- Programming scenarios: Task includes the problem description, test cases, runtime environment, and other complex information.
Workflow: Can be understood as the running state of a Task, defining the interaction flow between Agents and Environments, including logic similar to Rollout and Reward calculations in other frameworks. After execution, it generates Experience. Trinity-RFT has several built-in Workflows:
- MathWorkflow: For math scenarios, submits problems to the LLM, parses results, and calculates scores (rewards).
- CodeWorkflow (coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
- …
Experience: The output of running a Workflow, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, Experience includes lists of token_ids, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc.
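To make the Experience fields more concrete, here is a minimal sketch of how an action_mask could be derived from the token sequence and the prompt length. ToyExperience is a hypothetical stand-in written for this illustration; it is not Trinity-RFT's Experience class, and the real field layout may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyExperience:
    """Hypothetical stand-in; not Trinity-RFT's Experience class."""

    tokens: List[int]      # prompt tokens followed by generated tokens
    prompt_length: int     # number of leading tokens that belong to the prompt
    logprobs: List[float]  # one log-probability per generated token
    reward: float

    @property
    def action_mask(self) -> List[int]:
        # 0 marks prompt tokens, 1 marks tokens generated by the LLM.
        n_generated = len(self.tokens) - self.prompt_length
        return [0] * self.prompt_length + [1] * n_generated


exp = ToyExperience(
    tokens=[11, 12, 13, 21, 22],
    prompt_length=3,
    logprobs=[-0.2, -0.5],
    reward=1.0,
)
print(exp.action_mask)  # [0, 0, 0, 1, 1]
```

In this sketch the number of ones in the mask equals the number of logprobs, since both describe only the generated tokens.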
Step 1: Prepare Task Dataset
Each Task contains various parameters needed to initialize the Workflow. Due to significant differences in initialization parameters across different Workflows, the following example uses a math problem scenario.
In the math problem scenario, the Task dataset can be a jsonl file, where each line’s JSON contains question and answer fields representing the problem description and standard answer, respectively.
{"question": "1+1=", "answer": "2"}
{"question": "2+2=", "answer": "4"}
...
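As an illustration of the file format described above, the following sketch parses JSONL text into raw task dictionaries and checks that both fields are present. The helper load_jsonl_tasks is a hypothetical name introduced here; it is not part of Trinity-RFT, which reads the taskset itself based on the configuration.

```python
import json
from typing import Dict, List


def load_jsonl_tasks(text: str) -> List[Dict[str, str]]:
    """Parse JSONL text into a list of raw task dictionaries."""
    tasks = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = {"question", "answer"} - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        tasks.append(record)
    return tasks


sample = '{"question": "1+1=", "answer": "2"}\n{"question": "2+2=", "answer": "4"}\n'
tasks = load_jsonl_tasks(sample)
print(tasks[0]["question"], tasks[1]["answer"])  # 1+1= 4
```

A check like this can be run on a dataset before training to catch malformed lines early.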
Step 2: Write Workflow
The core of creating a new task type is writing a new Workflow, whose base class interface is as follows:
from abc import ABC, abstractmethod
from typing import List, Optional

import openai

# import Trinity-RFT packages (ModelWrapper, Task, Experience)


class Workflow(ABC):
    def __init__(
        self,
        model: ModelWrapper,
        task: Task,
        auxiliary_models: Optional[List[openai.OpenAI]] = None,
    ):
        self.model = model
        self.auxiliary_models = auxiliary_models

    @abstractmethod
    def run(self) -> List[Experience]:
        """Run the workflow and return a list of Experiences."""
Developers can register their own Workflow through the WORKFLOWS.register_module method, but must ensure that the name does not conflict with existing Workflow classes.
# import some packages
from trinity.common.workflows.workflow import WORKFLOWS


@WORKFLOWS.register_module("my_workflow")
class MyWorkflow(Workflow):
    pass
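To show why name conflicts matter, here is a minimal sketch of a decorator-based registry. ToyRegistry and TOY_WORKFLOWS are hypothetical names written for this illustration; this is not Trinity-RFT's registry implementation, and the sketch simply assumes that a duplicate name should raise an error.

```python
from typing import Callable, Dict, Type


class ToyRegistry:
    """Hypothetical name-to-class registry; not Trinity-RFT's implementation."""

    def __init__(self) -> None:
        self._modules: Dict[str, Type] = {}

    def register_module(self, name: str) -> Callable[[Type], Type]:
        def decorator(cls: Type) -> Type:
            # Assumption of this sketch: reject a name that is already taken.
            if name in self._modules:
                raise KeyError(f"'{name}' is already registered")
            self._modules[name] = cls
            return cls

        return decorator

    def get(self, name: str) -> Type:
        return self._modules[name]


TOY_WORKFLOWS = ToyRegistry()


@TOY_WORKFLOWS.register_module("my_workflow")
class MyWorkflow:
    pass


print(TOY_WORKFLOWS.get("my_workflow") is MyWorkflow)  # True
```

Because the configuration file refers to a workflow only by its registered name, two classes sharing one name would make that lookup ambiguous.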
Initialization Parameters
When initializing, Workflow receives the following parameters:
- model: The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text response_text, full sequence token ids tokens, prompt part token length prompt_length, and a list of output token logprobs logprobs).
- task: An instance of Task, which is generated by one line of data from the Task dataset. The raw_task field contains the Dict format source data, which can be used to construct the Workflow instance. The rollout_args field contains the parameters for the rollout process, such as n, temperature, top_k and top_p.
- auxiliary_models: A list of auxiliary models, which will not be trained. All of them provide an OpenAI-compatible API.
Tip
The model also provides an OpenAI-compatible API. To use it, set explorer.rollout_model.enable_openai_api to true in your config file and call model.get_openai_client() in your workflow to get an openai.OpenAI instance.
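As a rough sketch of what using such a client could look like, the helper below sends one user message through the standard chat completions call of the openai package. The function name ask_via_openai_client and the model_name argument are assumptions made for this example; which model name the served rollout model expects is not stated in this guide.

```python
from typing import Any


def ask_via_openai_client(client: Any, model_name: str, question: str) -> str:
    """Send one user message through an OpenAI-compatible client and return the reply text."""
    completion = client.chat.completions.create(
        model=model_name,  # assumption: the caller knows the served model's name
        messages=[{"role": "user", "content": question}],
    )
    return completion.choices[0].message.content
```

Inside a workflow, the client argument would be the object returned by model.get_openai_client(); an entry from auxiliary_models could be passed the same way, since those are also openai.OpenAI instances.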
Example Code
Below is a simple example demonstrating how to implement a math problem Workflow:
@WORKFLOWS.register_module("example_workflow")
class ExampleWorkflow(Workflow):
    def __init__(self, model: ModelWrapper, task: Task, **kwargs):
        # The base class takes the task as its second argument.
        super().__init__(model, task, **kwargs)
        # Keep the task so that run() can read its rollout arguments.
        self.task = task
        self.question = task.raw_task.get("question")
        self.answer = task.raw_task.get("answer")

    def calculate_reward(self, response: str, truth: str) -> float:
        # Exact string match: 1.0 for a correct answer, 0.0 otherwise.
        if response == truth:
            return 1.0
        else:
            return 0.0

    def run(self) -> List[Experience]:
        response = self.model.chat(
            [
                {
                    "role": "user",
                    "content": f"Question:\n{self.question}",
                }
            ],
            n=self.task.rollout_args.n,
            temperature=self.task.rollout_args.temperature,
        )
        reward: float = self.calculate_reward(response.response_text, self.answer)
        return [
            Experience(
                tokens=response.tokens,
                prompt_length=response.prompt_length,
                reward=reward,
                logprobs=response.logprobs,
            )
        ]
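The prompt-and-reward logic of a workflow like this can be checked without a running LLM. The sketch below mirrors the steps of run() against a canned reply; FakeModel, FakeResponse, and dry_run are hypothetical stand-ins written for this illustration and are not part of Trinity-RFT.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the object returned by model.chat."""

    response_text: str
    tokens: List[int]
    prompt_length: int
    logprobs: List[float]


class FakeModel:
    """Returns a canned reply so the reward logic can run without an LLM."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def chat(self, messages: List[Dict[str, str]], **kwargs) -> FakeResponse:
        return FakeResponse(self.reply, tokens=[1, 2, 3, 4], prompt_length=3, logprobs=[-0.1])


def exact_match_reward(response: str, truth: str) -> float:
    return 1.0 if response == truth else 0.0


def dry_run(model: FakeModel, raw_task: Dict[str, str]) -> float:
    # Mirrors the run() steps: build the prompt, query the model, score the reply.
    messages = [{"role": "user", "content": f"Question:\n{raw_task['question']}"}]
    response = model.chat(messages, n=1, temperature=1.0)
    return exact_match_reward(response.response_text, raw_task["answer"])


task = {"question": "1+1=", "answer": "2"}
print(dry_run(FakeModel("2"), task))         # 1.0
print(dry_run(FakeModel("It is 2."), task))  # 0.0
```

The second call shows a limitation of exact string matching: a correct but verbose reply scores 0.0, so a real workflow may need to parse the answer out of the reply before comparing.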
For some heavy workflows, the initialization process may be time-consuming.
In this case, you can implement the resettable and reset methods to avoid re-initialization.
@WORKFLOWS.register_module("example_workflow")
class ExampleWorkflow(Workflow):
    # some code
    # ...

    def resettable(self):
        return True

    def reset(self, task: Task):
        # Point the existing instance at the new task instead of rebuilding it.
        self.task = task
        self.question = task.raw_task.get("question")
        self.answer = task.raw_task.get("answer")
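To illustrate what these two methods make possible, here is a minimal sketch of a runner loop that builds a workflow once and then reuses it across tasks. ToyWorkflow and run_all are hypothetical names for this example; how Trinity-RFT's own runner calls resettable and reset may differ.

```python
from typing import Dict, List, Optional


class ToyWorkflow:
    """Hypothetical workflow with a costly constructor."""

    init_count = 0  # counts how often the expensive setup runs

    def __init__(self, raw_task: Dict[str, str]) -> None:
        ToyWorkflow.init_count += 1  # imagine loading a large resource here
        self.reset(raw_task)

    def resettable(self) -> bool:
        return True

    def reset(self, raw_task: Dict[str, str]) -> None:
        self.question = raw_task.get("question")
        self.answer = raw_task.get("answer")

    def run(self) -> str:
        return f"{self.question} -> {self.answer}"


def run_all(raw_tasks: List[Dict[str, str]]) -> List[str]:
    """Hypothetical runner: reuse one instance when the workflow allows it."""
    workflow: Optional[ToyWorkflow] = None
    outputs = []
    for raw_task in raw_tasks:
        if workflow is not None and workflow.resettable():
            workflow.reset(raw_task)  # cheap: only per-task state changes
        else:
            workflow = ToyWorkflow(raw_task)  # expensive: full construction
        outputs.append(workflow.run())
    return outputs


results = run_all([{"question": "1+1=", "answer": "2"}, {"question": "2+2=", "answer": "4"}])
print(results)                 # ['1+1= -> 2', '2+2= -> 4']
print(ToyWorkflow.init_count)  # 1
```

For this reuse to be safe, reset has to overwrite every piece of per-task state; anything left over from the previous task would leak into the next run.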
Step 3: Modify Configuration File
After completing the development of the Workflow, you need to modify the configuration file to set default_workflow_type in the buffer.explorer_input section to the newly registered Workflow name.
buffer:
  # Other fields
  explorer_input:
    taskset:
      name: example_task
      storage_type: file
      path: /path/to/taskset
      # Other fields
    default_workflow_type: example_workflow
# Other fields
Check Code Style
Before submitting the code, make sure it passes the code style check. Follow these steps:
# Install code style checking tools
cd <path_to_trinity_rft>
# bash
pip install -e .[dev]
# zsh
# pip install -e .\[dev\]
# Run code style checks
pre-commit run --all-files
# Commit the code after all checks pass
git commit -am "create example workflow"