Welcome to Trinity-RFT's documentation!#
What is Trinity-RFT?#
Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT). It decouples RFT into three components that work in coordination:
Explorer generates experience data via agent-environment interaction;
Trainer updates model weights by minimizing losses on the data;
Buffer pipelines data processing throughout the RFT lifecycle.
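The division of labor among the three components can be sketched as a simple loop: the explorer fills the buffer with experience, and the trainer drains it to update weights. The classes and method names below are illustrative stand-ins, not Trinity-RFT's actual API.

```python
from collections import deque

class Buffer:
    """Queues experience samples produced by the explorer for the trainer.
    (Hypothetical stand-in for Trinity-RFT's buffer component.)"""
    def __init__(self):
        self._samples = deque()

    def put(self, samples):
        self._samples.extend(samples)

    def get_batch(self, size):
        n = min(size, len(self._samples))
        return [self._samples.popleft() for _ in range(n)]

class Explorer:
    """Generates experience via (mocked) agent-environment interaction."""
    def rollout(self, task):
        # A real explorer would run the LLM agent against an environment.
        return [{"task": task, "response": f"answer-{task}", "reward": 1.0}]

class Trainer:
    """Consumes experience batches and (mock-)updates model weights."""
    def __init__(self):
        self.steps = 0

    def train_step(self, batch):
        if batch:
            self.steps += 1  # a real trainer would minimize an RL loss here
        return self.steps

explorer, buffer, trainer = Explorer(), Buffer(), Trainer()
for task in ["q1", "q2", "q3"]:
    buffer.put(explorer.rollout(task))       # explore: fill the buffer
    trainer.train_step(buffer.get_batch(2))  # train: drain the buffer
print(trainer.steps)  # → 3
```

Because the explorer and trainer communicate only through the buffer, they can also run asynchronously on separate devices, which is the mode of operation described below.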
Trinity-RFT provides functionalities for users with different backgrounds and objectives:
Agent application developers: Train LLM-powered agents and improve their capabilities in specific domains [tutorial]
Reinforcement learning researchers: Design, implement and validate new RL algorithms using compact, plug-and-play modules that allow non-invasive customization [tutorial]
Data engineers: Create RFT datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios [tutorial]
Tutorials and Guidelines#
| Category | Tutorial / Guideline |
|---|---|
| Run diverse RFT modes | Quick start: GRPO on GSM8k |
| Multi-step agentic RL | Concatenated multi-turn workflow |
| Full-lifecycle data pipelines | Rollout task mixing and selection |
| Algorithm development | RL algorithm development with Trinity-RFT (paper) |
| Going deeper into Trinity-RFT | Full configurations |
Key Features#
Flexible RFT Modes:
Supports synchronous/asynchronous, on-policy/off-policy, and online/offline RL.
Rollout and training can run separately and scale independently across devices.
Boosts sample and time efficiency via experience replay.
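The experience-replay idea behind the efficiency claim can be sketched as a small bounded buffer that keeps older rollouts around for re-sampling, so each sample can contribute to more than one gradient step. This is an illustrative sketch, not Trinity-RFT's replay implementation.

```python
import random

class ReplayBuffer:
    """Fixed-capacity replay buffer: evicts the oldest sample when full
    and re-samples stored experience for training. (Illustrative only.)"""
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def add(self, sample):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # evict the oldest sample
        self.storage.append(sample)

    def sample(self, batch_size, rng=random):
        # Draw a training batch uniformly from stored experience.
        return rng.sample(self.storage, min(batch_size, len(self.storage)))

buf = ReplayBuffer(capacity=4)
for i in range(6):
    buf.add({"id": i, "reward": float(i)})

print(len(buf.storage))                       # capacity caps storage at 4
print(sorted(s["id"] for s in buf.storage))   # the two oldest samples were evicted
```

Re-sampling from such a buffer is what makes training off-policy: the trainer may consume experience generated by an earlier version of the policy.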
Agentic RL Support:
Supports both concatenated and general multi-step agentic workflows.
Can directly train agent applications developed with agent frameworks such as AgentScope.
Full-Lifecycle Data Pipelines:
Enables pipeline processing of rollout tasks and experience samples.
Supports active data management (prioritization, cleaning, augmentation, etc.) throughout the RFT lifecycle.
Native support for multi-task joint learning and online task curriculum construction.
User-Friendly Design:
Plug-and-play modules and decoupled architecture, facilitating easy adoption and development.
Rich graphical user interfaces enable low-code usage.
Acknowledgements#
This project is built upon many excellent open-source projects, including:
verl and PyTorch's FSDP for LLM training;
vLLM for LLM inference;
Data-Juicer for data processing pipelines;
AgentScope for agentic workflow;
Ray for distributed systems;
We have also drawn inspiration from RL frameworks such as OpenRLHF, TRL, and ChatLearn;
……
Citation#
@misc{trinity-rft,
title={Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models},
author={Xuchen Pan and Yanxi Chen and Yushuo Chen and Yuchang Sun and Daoyuan Chen and Wenhao Zhang and Yuexiang Xie and Yilun Huang and Yilei Zhang and Dawei Gao and Yaliang Li and Bolin Ding and Jingren Zhou},
year={2025},
eprint={2505.17826},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.17826},
}