trinity.algorithm.policy_loss_fn.ppo_policy_loss module
PPO policy loss function.
Modified from volcengine/verl
- class trinity.algorithm.policy_loss_fn.ppo_policy_loss.PPOPolicyLossFn(backend: str = 'verl', clip_range: float | None = None, clip_range_low: float | None = None, clip_range_high: float | None = None)
Bases: PolicyLossFn
- __init__(backend: str = 'verl', clip_range: float | None = None, clip_range_low: float | None = None, clip_range_high: float | None = None) -> None
Initialize the policy loss function.
- Parameters:
backend – The training framework/backend to use (e.g., “verl”)
- property select_keys
Returns parameter keys mapped to the specific training framework’s naming convention.
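The constructor's clip_range, clip_range_low, and clip_range_high arguments point to the standard PPO clipped surrogate objective with optionally asymmetric bounds. The following is a minimal, framework-free sketch of that objective, not the code in this module. The function name ppo_clip_loss, its argument names, the 0.2 default, and the rule that the low/high bounds fall back to clip_range when unset are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def ppo_clip_loss(
    logprob: Sequence[float],
    old_logprob: Sequence[float],
    advantages: Sequence[float],
    mask: Sequence[float],
    clip_range: float = 0.2,
    clip_range_low: Optional[float] = None,
    clip_range_high: Optional[float] = None,
) -> float:
    """Masked mean of the PPO clipped surrogate loss (hypothetical helper)."""
    # Assumption: unset asymmetric bounds fall back to the symmetric clip_range.
    low = clip_range if clip_range_low is None else clip_range_low
    high = clip_range if clip_range_high is None else clip_range_high

    total = 0.0
    count = 0.0
    for lp, old_lp, adv, m in zip(logprob, old_logprob, advantages, mask):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp - old_lp)
        # Restrict the ratio to the interval [1 - low, 1 + high].
        clipped = min(max(ratio, 1.0 - low), 1.0 + high)
        # Pessimistic choice: the larger of the unclipped and clipped losses.
        token_loss = max(-adv * ratio, -adv * clipped)
        total += token_loss * m
        count += m
    # Average over unmasked tokens only; guard against an all-zero mask.
    return total / max(count, 1.0)


# A ratio of 2.0 with a positive advantage is clipped to 1.2,
# so the loss is about -1.2 instead of -2.0.
example = ppo_clip_loss([math.log(2.0)], [0.0], [1.0], [1.0])
```

Separate low and high bounds allow the ratio to be clipped more loosely in one direction than the other. Setting only clip_range gives the usual symmetric interval.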