trinity.trainer.verl.dp_actor module

Single Process Actor. Modified from https://github.com/volcengine/verl/blob/v0.4.1/verl/workers/actor/dp_actor.py

class trinity.trainer.verl.dp_actor.DataParallelPPOActor(config, actor_module: Module, actor_optimizer: Optimizer | None = None)[source]

Bases: DataParallelPPOActor

__init__(config, actor_module: Module, actor_optimizer: Optimizer | None = None)[source]

When `actor_optimizer` is None, the actor serves as the reference policy.
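The optional-optimizer convention above can be illustrated with a minimal sketch. `SketchActor`, its `is_reference_policy` property, and the guard in `update_policy` are hypothetical names and behavior chosen for illustration; they are not part of Trinity or verl, and the real class may handle the missing optimizer differently.

```python
from typing import Callable, Optional


class SketchActor:
    """Hypothetical stand-in for an actor; not the Trinity or verl class."""

    def __init__(self, module: Callable[[float], float],
                 optimizer: Optional[object] = None):
        self.module = module
        self.optimizer = optimizer

    @property
    def is_reference_policy(self) -> bool:
        # Without an optimizer the weights are never updated,
        # so the actor can only act as a frozen reference policy.
        return self.optimizer is None

    def update_policy(self) -> None:
        # Assumed guard: this sketch refuses to train a reference policy.
        if self.is_reference_policy:
            raise RuntimeError("reference policy has no optimizer")
        # A real implementation would run optimizer steps here.


reference = SketchActor(module=lambda x: x)
trainable = SketchActor(module=lambda x: x, optimizer=object())
```

Here `reference` is built without an optimizer and reports itself as a reference policy, while `trainable` does not.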

set_algorithm(algorithm_config: AlgorithmConfig)[source]
update_policy(**kwargs)

Update the policy with an iterator of DataProto

Parameters:

data (DataProto) – an iterator over the DataProto returned by `make_minibatch_iterator`

Returns:

A dictionary with arbitrary contents. Typically it holds statistics collected while updating the model, such as `loss` and `grad_norm`.

Return type:

Dict
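As a rough illustration of the return value described above, the following sketch iterates over mini-batches and averages per-step statistics into a dictionary. It is pure Python with toy arithmetic standing in for the loss and gradient norm; `update_policy_sketch` and its inputs are hypothetical and do not reflect how the real method computes or aggregates its metrics.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def update_policy_sketch(minibatches: Iterable[List[float]]) -> Dict[str, float]:
    """Toy loop: one 'step' per mini-batch, returning averaged statistics."""
    history = defaultdict(list)
    for batch in minibatches:
        # Stand-ins for the policy loss and the gradient norm of one step.
        loss = mean(x * x for x in batch)
        grad_norm = abs(mean(2 * x for x in batch))
        history["loss"].append(loss)
        history["grad_norm"].append(grad_norm)
    # Collapse the per-step values into one number per statistic.
    return {name: mean(values) for name, values in history.items()}


metrics = update_policy_sketch([[1.0, 2.0], [3.0, 4.0]])
print(metrics)
```

The keys mirror the `loss` and `grad_norm` examples named in the docstring; a caller would typically forward such a dictionary to a logger.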