trinity.trainer.verl
Submodules
trinity.trainer.verl.dp_actor module
Single Process Actor. Modified from https://github.com/volcengine/verl/blob/0758489422e8d41a89e6c36d4c477714520f0dcc/verl/workers/actor/dp_actor.py
- class trinity.trainer.verl.dp_actor.DataParallelPPOActor(config, actor_module: Module, actor_optimizer: Optimizer | None = None)
Bases: DataParallelPPOActor
- __init__(config, actor_module: Module, actor_optimizer: Optimizer | None = None)
When actor_optimizer is None, the actor acts as the reference policy.
- set_algorithm(algorithm_config: AlgorithmConfig)
- compute_log_prob(**kwargs)
Compute log probabilities for a batch of data.
- Parameters:
data (DataProto) – a batch of data represented by DataProto. It must contain the keys `input_ids`, `attention_mask`, and `position_ids`.
- Returns:
a DataProto containing the key `log_probs`
- Return type:
DataProto
- update_policy(**kwargs)
Update the policy with an iterator of DataProto
- Parameters:
data (DataProto) – an iterator over the DataProto, as returned by `make_minibatch_iterator`
- Returns:
a dictionary with arbitrary contents. Typically, it contains statistics collected while updating the model, such as `loss`, `grad_norm`, etc.
- Return type:
Dict
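The two methods above can be illustrated with a small, framework-free sketch. It is not the module's implementation: `token_log_probs` and `average_metrics` are hypothetical helper names, plain Python lists stand in for tensors and `DataProto`, and averaging is only one plausible way to combine per-mini-batch statistics. The first helper shows how a per-token log probability can be read off a row of logits with a log-softmax, and the second shows how statistics such as `loss` and `grad_norm` might be collected into a single dictionary.

```python
import math
from typing import Dict, Iterable, List, Sequence


def token_log_probs(
    logits: Sequence[Sequence[float]], token_ids: Sequence[int]
) -> List[float]:
    """Return log p(token) at each position using a stable log-softmax.

    Hypothetical helper: plain lists stand in for the model's logits tensor.
    """
    result = []
    for row, token in zip(logits, token_ids):
        row_max = max(row)  # subtract the max to avoid overflow in exp()
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        result.append(row[token] - log_z)
    return result


def average_metrics(step_metrics: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Average per-mini-batch statistics into one dictionary.

    Assumption: statistics are combined by a simple mean over mini-batches.
    """
    collected: Dict[str, List[float]] = {}
    for metrics in step_metrics:
        for key, value in metrics.items():
            collected.setdefault(key, []).append(value)
    return {key: sum(values) / len(values) for key, values in collected.items()}


# Two positions over a vocabulary of size 2.
logits = [[0.0, 0.0], [2.0, 0.0]]
log_probs = token_log_probs(logits, [0, 1])

# Statistics reported by two hypothetical mini-batch updates.
summary = average_metrics(
    [{"loss": 1.0, "grad_norm": 0.5}, {"loss": 3.0, "grad_norm": 1.5}]
)
```

In the real actor these steps operate on batched tensors and a `DataProto`; the sketch only conveys the shape of the computation.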
trinity.trainer.verl.fsdp_workers module
The main entry point to run the PPO algorithm. Modified from https://github.com/volcengine/verl/blob/0758489422e8d41a89e6c36d4c477714520f0dcc/verl/workers/fsdp_workers.py
- class trinity.trainer.verl.fsdp_workers.ActorRolloutRefWorker(*args, **kwargs)
Bases: Worker
This worker can be instantiated as a standalone actor, a standalone rollout, a standalone reference policy, or a hybrid engine, based on the config.rollout
- __init__(config: DictConfig, role: str)
Initialize the worker with environment settings and device configuration.
- Parameters:
cuda_visible_devices (str, optional) – CUDA visible devices configuration. Defaults to None.
- set_algorithm(algo_config: AlgorithmConfig)
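As a rough illustration of how a single `role` string could select between the standalone and hybrid modes described for this worker, consider the sketch below. The role names in `VALID_ROLES`, the `RoleFlags` dataclass, and the `resolve_role` function are all assumptions made for this example; the roles the worker actually accepts are not documented in this section.

```python
from dataclasses import dataclass

# Assumed role names; the real worker may accept a different set.
VALID_ROLES = ("actor", "rollout", "ref", "actor_rollout", "actor_rollout_ref")


@dataclass(frozen=True)
class RoleFlags:
    """Which components a worker with a given role would host (hypothetical)."""

    is_actor: bool
    is_rollout: bool
    is_ref: bool


def resolve_role(role: str) -> RoleFlags:
    """Map a role string to component flags, rejecting unknown roles."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role {role!r}; expected one of {VALID_ROLES}")
    return RoleFlags(
        is_actor=role in ("actor", "actor_rollout", "actor_rollout_ref"),
        is_rollout=role in ("rollout", "actor_rollout", "actor_rollout_ref"),
        is_ref=role in ("ref", "actor_rollout_ref"),
    )


standalone_ref = resolve_role("ref")
hybrid = resolve_role("actor_rollout_ref")
```

Under these assumptions, a combined role such as `actor_rollout_ref` enables every component in one process, while `ref` yields a worker that only hosts the reference policy.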