trinity.trainer.verl

Submodules

trinity.trainer.verl.dp_actor module

Single Process Actor. Modified from https://github.com/volcengine/verl/blob/0758489422e8d41a89e6c36d4c477714520f0dcc/verl/workers/actor/dp_actor.py

class trinity.trainer.verl.dp_actor.DataParallelPPOActor(config, actor_module: Module, actor_optimizer: Optimizer | None = None)

Bases: DataParallelPPOActor

__init__(config, actor_module: Module, actor_optimizer: Optimizer | None = None)

When `actor_optimizer` is None, the actor acts as the reference policy.
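
The convention of treating a missing optimizer as a reference policy can be pictured with a small sketch. The class below is a hypothetical stand-in: `TinyActor`, `is_reference`, and `update` are illustrative names and are not part of trinity or verl. It only shows the pattern, not how the real actor is implemented.

```python
from typing import Callable, Optional


class TinyActor:
    """Hypothetical stand-in illustrating the optimizer-is-None convention."""

    def __init__(self, module: Callable[[float], float],
                 optimizer: Optional[object] = None):
        self.module = module
        self.optimizer = optimizer

    @property
    def is_reference(self) -> bool:
        # No optimizer means the weights are never updated: a reference policy.
        return self.optimizer is None

    def update(self) -> None:
        if self.is_reference:
            raise RuntimeError("reference policy has no optimizer to step")
        # A trainable actor would run its optimizer step here.


ref = TinyActor(module=lambda x: x)
trainable = TinyActor(module=lambda x: x, optimizer=object())
```

In this sketch `ref.is_reference` is true and `trainable.is_reference` is false, so one class can serve both roles.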

set_algorithm(algorithm_config: AlgorithmConfig)

compute_log_prob(**kwargs)

Compute log probabilities given a batch of data.

Parameters:

data (DataProto) – a batch of data represented by DataProto. It must contain the keys `input_ids`, `attention_mask`, and `position_ids`.

Returns:

a DataProto containing the key `log_probs`

Return type:

DataProto
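
As a rough illustration of what a per-token log probability is, the sketch below derives it from raw logits with a numerically stable log-softmax. It uses plain Python lists rather than tensors or DataProto, and `token_log_probs` is a hypothetical helper written for this example, not a function from trinity or verl.

```python
import math
from typing import List


def token_log_probs(logits: List[List[float]], token_ids: List[int]) -> List[float]:
    """Return log p(token) per position from raw logits via a stable log-softmax."""
    out = []
    for row, tok in zip(logits, token_ids):
        m = max(row)  # subtract the max before exponentiating for stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[tok] - log_z)
    return out


# Two positions over a vocabulary of three tokens.
logits = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
log_probs = token_log_probs(logits, token_ids=[1, 0])
```

The first position has uniform logits, so its log probability equals log(1/3). The second position favors token 0, so its log probability is closer to zero.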

update_policy(**kwargs)

Update the policy with an iterator of DataProto

Parameters:

data (DataProto) – an iterator over the DataProto, as returned by `make_minibatch_iterator`

Returns:

a dictionary with arbitrary contents. Typically, it holds statistics collected while updating the model, such as `loss` and `grad_norm`.

Return type:

Dict
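
One common way to build such a statistics dictionary is to collect a value per mini-batch and reduce at the end. The sketch below assumes a toy objective and mean reduction. `run_update` and the `batch_size` key are hypothetical, and the real method may report different keys or reduce them differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def run_update(mini_batches: Iterable[List[float]]) -> Dict[str, float]:
    """Toy update loop: one placeholder 'loss' per mini-batch, averaged at the end."""
    collected = defaultdict(list)
    for batch in mini_batches:
        loss = sum(x * x for x in batch) / len(batch)  # placeholder objective
        collected["loss"].append(loss)
        collected["batch_size"].append(float(len(batch)))
    # Reduce each list of per-mini-batch values to a single scalar.
    return {name: sum(vals) / len(vals) for name, vals in collected.items()}


metrics = run_update([[1.0, 1.0], [2.0, 2.0]])
```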

trinity.trainer.verl.fsdp_workers module

The main entry point to run the PPO algorithm. Modified from https://github.com/volcengine/verl/blob/0758489422e8d41a89e6c36d4c477714520f0dcc/verl/workers/fsdp_workers.py

class trinity.trainer.verl.fsdp_workers.ActorRolloutRefWorker(*args, **kwargs)

Bases: Worker

This worker can be instantiated as a standalone actor, a standalone rollout, a standalone reference policy, or a hybrid engine, depending on config.rollout.
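
A role string can be mapped to the components a worker should build roughly as follows. This is a sketch under assumptions: the role names in `VALID_ROLES` are assumed rather than taken from this page, and `components_for` is a hypothetical helper, not part of the worker's API.

```python
from typing import Dict

# Assumed role names; the values actually accepted by the worker may differ.
VALID_ROLES = {"actor", "rollout", "ref", "actor_rollout", "actor_rollout_ref"}


def components_for(role: str) -> Dict[str, bool]:
    """Map a role string to flags for the components to instantiate."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    parts = role.split("_")
    return {
        "actor": "actor" in parts,
        "rollout": "rollout" in parts,
        "ref": "ref" in parts,
    }
```

For example, `components_for("actor_rollout")` enables the actor and rollout flags and leaves the reference policy off, which corresponds to a hybrid engine without a reference model.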

__init__(config: DictConfig, role: str)

Initialize the worker with environment settings and device configuration.

Parameters:

cuda_visible_devices (str, optional) – CUDA visible devices configuration. Defaults to None.

init_model()
setup_weight_sync_group()
sync_weight()
set_algorithm(algo_config: AlgorithmConfig)
update_actor(data: DataProto)
compute_log_prob(data: DataProto)
compute_ref_log_prob(data: DataProto)
save_checkpoint(local_path, hdfs_path=None, global_step=0, max_ckpt_to_keep=None)
load_checkpoint(local_path, hdfs_path=None, del_local_after_load=False)
clear_optimizer_state()

class trinity.trainer.verl.fsdp_workers.CriticWorker(*args, **kwargs)

Bases: Worker

__init__(config)

Initialize the worker with environment settings and device configuration.

Parameters:

cuda_visible_devices (str, optional) – CUDA visible devices configuration. Defaults to None.

init_model()
compute_values(data: DataProto)
update_critic(data: DataProto)
save_checkpoint(local_path, hdfs_path=None, global_step=0, max_ckpt_to_keep=None)
load_checkpoint(local_path, hdfs_path=None, del_local_after_load=True)
clear_optimizer_state()

Module contents