trinity.algorithm.policy_loss_fn.ppo_policy_loss module

PPO policy loss function.

Modified from https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/core_algos.py

class trinity.algorithm.policy_loss_fn.ppo_policy_loss.PPOPolicyLossFn(backend: str = 'verl', clip_range: float | None = None, clip_range_low: float | None = None, clip_range_high: float | None = None)

Bases: PolicyLossFn

__init__(backend: str = 'verl', clip_range: float | None = None, clip_range_low: float | None = None, clip_range_high: float | None = None) → None

Initialize the policy loss function.

Parameters: backend – The training framework/backend to use (e.g., “verl”)
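The clip_range, clip_range_low, and clip_range_high arguments are not described on this page. The sketch below shows how such bounds are commonly used in a PPO clipped surrogate loss, written in plain Python so it runs without a training backend. It is a minimal illustration, not the implementation in this module: the helper names resolve_clip_bounds and ppo_clipped_loss are hypothetical, and the rule that an unset one-sided bound falls back to clip_range is an assumption.

```python
import math
from typing import Optional, Sequence, Tuple


def resolve_clip_bounds(
    clip_range: Optional[float],
    clip_range_low: Optional[float] = None,
    clip_range_high: Optional[float] = None,
) -> Tuple[float, float]:
    """Hypothetical helper. Assumption: an unset one-sided bound
    falls back to the symmetric clip_range."""
    low = clip_range if clip_range_low is None else clip_range_low
    high = clip_range if clip_range_high is None else clip_range_high
    if low is None or high is None:
        raise ValueError("clip_range or both one-sided bounds must be set")
    return low, high


def ppo_clipped_loss(
    logprob: Sequence[float],
    old_logprob: Sequence[float],
    advantages: Sequence[float],
    mask: Sequence[float],
    clip_low: float,
    clip_high: float,
) -> float:
    """Masked mean of the standard PPO clipped surrogate loss per token."""
    total, count = 0.0, 0.0
    for lp, old_lp, adv, m in zip(logprob, old_logprob, advantages, mask):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp - old_lp)
        # Restrict the ratio to [1 - clip_low, 1 + clip_high].
        clipped = min(max(ratio, 1.0 - clip_low), 1.0 + clip_high)
        # Pessimistic choice: the larger of the two negated surrogates.
        token_loss = max(-adv * ratio, -adv * clipped)
        total += token_loss * m
        count += m
    # Tokens with mask 0 do not contribute to the mean.
    return total / count if count > 0 else 0.0
```

With a positive advantage, a ratio above 1 + clip_high stops increasing the objective, so the loss saturates at the clipped value. With a negative advantage, the unclipped term dominates and the penalty is kept in full.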

property select_keys

Returns parameter keys mapped to the specific training framework’s naming convention.

classmethod default_args() → Dict

Get default initialization arguments for this loss function.

Returns: The default init arguments for the policy loss function.
Return type: Dict
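One common way to use a default_args() classmethod is to start from the defaults and let user-supplied settings override them before constructing the object. The sketch below illustrates that pattern with a stand-in class. SketchPolicyLoss and build_loss are hypothetical names, and the default value of 0.2 is an assumption chosen for illustration; neither is taken from this module.

```python
from typing import Dict, Optional


class SketchPolicyLoss:
    """Hypothetical stand-in used only to show the pattern."""

    def __init__(self, backend: str = "verl", clip_range: Optional[float] = None) -> None:
        self.backend = backend
        self.clip_range = clip_range

    @classmethod
    def default_args(cls) -> Dict:
        # Assumed default value, for illustration only.
        return {"clip_range": 0.2}


def build_loss(user_args: Optional[Dict] = None) -> SketchPolicyLoss:
    """Merge defaults with user overrides, then construct the loss object."""
    args = SketchPolicyLoss.default_args()
    args.update(user_args or {})  # user-supplied values take precedence
    return SketchPolicyLoss(**args)


default_loss = build_loss()
custom_loss = build_loss({"clip_range": 0.1})
```

Keeping the defaults in a classmethod lets a caller inspect or merge them without constructing the loss function first.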