trinity
Subpackages
- trinity.algorithm
- Subpackages
- trinity.algorithm.add_strategy
- trinity.algorithm.advantage_fn
- Submodules
- trinity.algorithm.advantage_fn.advantage_fn module
- trinity.algorithm.advantage_fn.grpo_advantage module
- trinity.algorithm.advantage_fn.opmd_advantage module
- trinity.algorithm.advantage_fn.ppo_advantage module
- trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage module
- trinity.algorithm.advantage_fn.remax_advantage module
- trinity.algorithm.advantage_fn.rloo_advantage module
- Module contents
- trinity.algorithm.entropy_loss_fn
- trinity.algorithm.kl_fn
- trinity.algorithm.policy_loss_fn
- Submodules
- trinity.algorithm.policy_loss_fn.dpo_loss module
- trinity.algorithm.policy_loss_fn.gspo_policy_loss module
- trinity.algorithm.policy_loss_fn.mix_policy_loss module
- trinity.algorithm.policy_loss_fn.opmd_policy_loss module
- trinity.algorithm.policy_loss_fn.policy_loss_fn module
- trinity.algorithm.policy_loss_fn.ppo_policy_loss module
- trinity.algorithm.policy_loss_fn.sft_loss module
- Module contents
- trinity.algorithm.sample_strategy
- Submodules
- trinity.algorithm.algorithm module
- trinity.algorithm.algorithm_manager module
- trinity.algorithm.key_mapper module
- trinity.algorithm.utils module
- Module contents
- trinity.buffer
- Subpackages
- Submodules
- trinity.buffer.buffer module
- trinity.buffer.buffer_reader module
- trinity.buffer.buffer_writer module
- trinity.buffer.queue module
- trinity.buffer.ray_wrapper module
- trinity.buffer.utils module
- Module contents
- trinity.common
- Subpackages
- trinity.common.models
- trinity.common.rewards
- Submodules
- trinity.common.rewards.accuracy_reward module
- trinity.common.rewards.agents_reward module
- trinity.common.rewards.countdown_reward module
- trinity.common.rewards.dapo_reward module
- trinity.common.rewards.format_reward module
- trinity.common.rewards.human_reward module
- trinity.common.rewards.math_reward module
- trinity.common.rewards.reward_fn module
- trinity.common.rewards.tool_reward module
- trinity.common.rewards.utils module
- Module contents
- trinity.common.workflows
- Submodules
- trinity.common.workflows.customized_math_workflows module
- trinity.common.workflows.customized_toolcall_workflows module
- trinity.common.workflows.eval_workflow module
- trinity.common.workflows.math_rm_workflow module
- trinity.common.workflows.step_wise_workflow module
- trinity.common.workflows.workflow module
- Module contents
- Submodules
- trinity.common.config module
FormatConfig
FormatConfig.prompt_type
FormatConfig.prompt_key
FormatConfig.response_key
FormatConfig.messages_key
FormatConfig.chat_template
FormatConfig.system_prompt
FormatConfig.reply_prefix
FormatConfig.reward_fn_key
FormatConfig.workflow_key
FormatConfig.solution_key
FormatConfig.reward_key
FormatConfig.chosen_key
FormatConfig.rejected_key
FormatConfig.label_key
FormatConfig.__init__()
GenerationConfig
StorageConfig
StorageConfig.name
StorageConfig.storage_type
StorageConfig.path
StorageConfig.repeat_times
StorageConfig.raw
StorageConfig.split
StorageConfig.subset_name
StorageConfig.format
StorageConfig.index
StorageConfig.wrap_in_ray
StorageConfig.capacity
StorageConfig.max_read_timeout
StorageConfig.use_priority_queue
StorageConfig.reuse_cooldown_time
StorageConfig.replay_buffer_kwargs
StorageConfig.default_workflow_type
StorageConfig.default_eval_workflow_type
StorageConfig.default_reward_fn_type
StorageConfig.rollout_args
StorageConfig.workflow_args
StorageConfig.reward_fn_args
StorageConfig.enable_progress_bar
StorageConfig.ray_namespace
StorageConfig.algorithm_type
StorageConfig.total_epochs
StorageConfig.total_steps
StorageConfig.task_type
StorageConfig.__init__()
RewardShapingConfig
DataPipelineConfig
DataPipelineConfig.input_buffers
DataPipelineConfig.output_buffer
DataPipelineConfig.format
DataPipelineConfig.dj_config_path
DataPipelineConfig.dj_process_desc
DataPipelineConfig.agent_model_name
DataPipelineConfig.clean_strategy
DataPipelineConfig.min_size_ratio
DataPipelineConfig.min_priority_score
DataPipelineConfig.priority_weights
DataPipelineConfig.data_dist
DataPipelineConfig.reward_shaping
DataPipelineConfig.__init__()
DataProcessorConfig
ModelConfig
InferenceModelConfig
InferenceModelConfig.model_path
InferenceModelConfig.engine_type
InferenceModelConfig.engine_num
InferenceModelConfig.tensor_parallel_size
InferenceModelConfig.use_v1
InferenceModelConfig.enforce_eager
InferenceModelConfig.enable_prefix_caching
InferenceModelConfig.enable_chunked_prefill
InferenceModelConfig.gpu_memory_utilization
InferenceModelConfig.dtype
InferenceModelConfig.seed
InferenceModelConfig.max_model_len
InferenceModelConfig.max_prompt_tokens
InferenceModelConfig.max_response_tokens
InferenceModelConfig.chat_template
InferenceModelConfig.enable_thinking
InferenceModelConfig.enable_history
InferenceModelConfig.enable_openai_api
InferenceModelConfig.enable_auto_tool_choice
InferenceModelConfig.tool_call_parser
InferenceModelConfig.reasoning_parser
InferenceModelConfig.bundle_indices
InferenceModelConfig.__init__()
AlgorithmConfig
AlgorithmConfig.algorithm_type
AlgorithmConfig.repeat_times
AlgorithmConfig.add_strategy
AlgorithmConfig.add_strategy_args
AlgorithmConfig.sample_strategy
AlgorithmConfig.sample_strategy_args
AlgorithmConfig.advantage_fn
AlgorithmConfig.advantage_fn_args
AlgorithmConfig.kl_penalty_fn
AlgorithmConfig.kl_penalty_fn_args
AlgorithmConfig.policy_loss_fn
AlgorithmConfig.policy_loss_fn_args
AlgorithmConfig.kl_loss_fn
AlgorithmConfig.kl_loss_fn_args
AlgorithmConfig.entropy_loss_fn
AlgorithmConfig.entropy_loss_fn_args
AlgorithmConfig.use_token_level_loss
AlgorithmConfig.__init__()
ClusterConfig
ExplorerInput
TrainerInput
BufferConfig
BufferConfig.batch_size
BufferConfig.train_batch_size
BufferConfig.total_epochs
BufferConfig.total_steps
BufferConfig.explorer_input
BufferConfig.explorer_output
BufferConfig.trainer_input
BufferConfig.max_retry_times
BufferConfig.max_retry_interval
BufferConfig.tokenizer_path
BufferConfig.pad_token_id
BufferConfig.cache_dir
BufferConfig.__init__()
ExplorerConfig
ExplorerConfig.name
ExplorerConfig.runner_per_model
ExplorerConfig.max_timeout
ExplorerConfig.max_retry_times
ExplorerConfig.env_vars
ExplorerConfig.max_repeat_times_per_runner
ExplorerConfig.runner_num
ExplorerConfig.rollout_model
ExplorerConfig.auxiliary_models
ExplorerConfig.eval_interval
ExplorerConfig.eval_on_startup
ExplorerConfig.bench_on_latest_checkpoint
ExplorerConfig.collect_experiences
ExplorerConfig.__init__()
TrainerConfig
MonitorConfig
SynchronizerConfig
Config
Config.mode
Config.project
Config.group
Config.name
Config.checkpoint_root_dir
Config.checkpoint_job_dir
Config.ray_namespace
Config.continue_from_checkpoint
Config.algorithm
Config.data_processor
Config.model
Config.cluster
Config.buffer
Config.explorer
Config.trainer
Config.monitor
Config.synchronizer
Config.save()
Config.check_and_update()
Config.flatten()
Config.__init__()
load_config()
- trinity.common.constants module
- trinity.common.experience module
EID
ExperienceType
CustomField
Experience
Experience.__init__()
Experience.eid
Experience.reward
Experience.advantages
Experience.returns
Experience.experience_type
Experience.info
Experience.metrics
Experience.prompt_length
Experience.response_text
Experience.prompt_text
Experience.messages
Experience.chosen_text
Experience.rejected_text
Experience.tokens
Experience.logprobs
Experience.action_mask
Experience.chosen
Experience.rejected
Experience.serialize()
Experience.deserialize()
Experience.to_dict()
Experience.gather()
split_dpo_experience_to_single_turn()
Experiences
Experiences.__init__()
Experiences.eids
Experiences.tokens
Experiences.rewards
Experiences.advantages
Experiences.returns
Experiences.attention_masks
Experiences.action_masks
Experiences.prompt_length
Experiences.logprobs
Experiences.custom_fields
Experiences.batch_size
Experiences.gather_experiences()
empty_experiences()
gather_token_ids()
gather_action_masks()
gather_attention_masks()
gather_logprobs()
gather_advantages()
gather_returns()
- trinity.common.verl_config module
Data
FusedKernelOptions
ActorModel
Optim
WrapPolicy
FSDPConfig
Checkpoint
Actor
Actor.strategy
Actor.ppo_mini_batch_size
Actor.ppo_micro_batch_size
Actor.ppo_micro_batch_size_per_gpu
Actor.use_dynamic_bsz
Actor.ppo_max_token_len_per_gpu
Actor.grad_clip
Actor.ppo_epochs
Actor.shuffle
Actor.ulysses_sequence_parallel_size
Actor.entropy_from_logits_with_chunking
Actor.entropy_checkpointing
Actor.checkpoint
Actor.optim
Actor.fsdp_config
Actor.loss_agg_mode
Actor.clip_ratio
Actor.entropy_coeff
Actor.use_kl_loss
Actor.kl_loss_coef
Actor.kl_loss_type
Actor.__init__()
Ref
Rollout
ActorRolloutRef
CriticModel
Critic
Critic.strategy
Critic.optim
Critic.model
Critic.ppo_mini_batch_size
Critic.ppo_micro_batch_size
Critic.ppo_micro_batch_size_per_gpu
Critic.forward_micro_batch_size
Critic.forward_micro_batch_size_per_gpu
Critic.use_dynamic_bsz
Critic.ppo_max_token_len_per_gpu
Critic.forward_max_token_len_per_gpu
Critic.ulysses_sequence_parallel_size
Critic.ppo_epochs
Critic.shuffle
Critic.grad_clip
Critic.cliprange_value
Critic.checkpoint
Critic.rollout_n
Critic.loss_agg_mode
Critic.__init__()
RewardModel
CustomRewardFunction
KL_Ctrl
Algorithm
Trainer
Trainer.balance_batch
Trainer.total_epochs
Trainer.total_training_steps
Trainer.project_name
Trainer.group_name
Trainer.experiment_name
Trainer.logger
Trainer.val_generations_to_log_to_wandb
Trainer.nnodes
Trainer.n_gpus_per_node
Trainer.save_freq
Trainer.resume_mode
Trainer.resume_from_path
Trainer.test_freq
Trainer.critic_warmup
Trainer.default_hdfs_dir
Trainer.remove_previous_ckpt_in_save
Trainer.del_local_ckpt_after_load
Trainer.default_local_dir
Trainer.val_before_train
Trainer.training_rollout_mode
Trainer.enable_exp_buffer
Trainer.sync_freq
Trainer.sft_warmup_steps
Trainer.max_actor_ckpt_to_keep
Trainer.max_critic_ckpt_to_keep
Trainer.__init__()
veRLConfig
load_config()
- Module contents
- trinity.explorer
- Submodules
- trinity.explorer.explorer module
Explorer
Explorer.__init__()
Explorer.setup_weight_sync_group()
Explorer.prepare()
Explorer.get_weight()
Explorer.explore()
Explorer.explore_step()
Explorer.need_sync()
Explorer.need_eval()
Explorer.eval()
Explorer.benchmark()
Explorer.save_checkpoint()
Explorer.sync_weight()
Explorer.shutdown()
Explorer.is_alive()
- trinity.explorer.scheduler module
- trinity.explorer.workflow_runner module
- Module contents
- trinity.manager
- Subpackages
- trinity.manager.config_registry
- Submodules
- trinity.manager.config_registry.algorithm_config_manager module
- trinity.manager.config_registry.buffer_config_manager module
- trinity.manager.config_registry.config_registry module
- trinity.manager.config_registry.explorer_config_manager module
- trinity.manager.config_registry.model_config_manager module
- trinity.manager.config_registry.trainer_config_manager module
- Module contents
- Submodules
- trinity.manager.config_manager module
- trinity.manager.manager module
- trinity.manager.synchronizer module
Synchronizer
Synchronizer.trainer_status
Synchronizer.explorer_status_counts
Synchronizer._ready_condition
Synchronizer.model_state_dict
Synchronizer.model_version
Synchronizer.checkpoint_shard_counter
Synchronizer.__init__()
Synchronizer.add_module()
Synchronizer.set_trainer_status()
Synchronizer.get_trainer_status()
Synchronizer.set_explorer_status()
Synchronizer.get_explorer_status_counts()
Synchronizer.set_model_state_dict_with_step_num()
Synchronizer.set_model_state_dict()
Synchronizer.get_model_state_dict()
Synchronizer.get_state_dict_meta()
Synchronizer.setup_weight_sync_group()
Synchronizer.wait_new_model_state_dict()
Synchronizer.ready_to_nccl_sync()
Synchronizer.get_actor()
- Module contents
CacheManager
- trinity.plugins
- trinity.trainer
- Subpackages
- Submodules
- trinity.trainer.trainer module
- trinity.trainer.verl_trainer module
VerlPPOTrainerWrapper
VerlPPOTrainerWrapper.__init__()
VerlPPOTrainerWrapper.init_workers()
VerlPPOTrainerWrapper.train_step_num
VerlPPOTrainerWrapper.prepare()
VerlPPOTrainerWrapper.save_state_dict()
VerlPPOTrainerWrapper.upload_state_dict()
VerlPPOTrainerWrapper.train_step()
VerlPPOTrainerWrapper.save_checkpoint()
VerlPPOTrainerWrapper.sync_weight()
VerlPPOTrainerWrapper.sft_to_rft()
- Module contents
- trinity.utils
- Submodules
- trinity.utils.distributed module
- trinity.utils.dlc_utils module
- trinity.utils.eval_utils module
- trinity.utils.log module
- trinity.utils.math_eval_utils module
- trinity.utils.monitor module
- trinity.utils.plugin_loader module
- trinity.utils.registry module
- trinity.utils.timer module
- Module contents
Module contents
Trinity-RFT (Reinforcement Fine-Tuning)