trinity.algorithm.policy_loss_fn.sppo_loss_fn module
sPPO-token policy loss function. Relevant paper: https://arxiv.org/abs/2108.05828.
- class trinity.algorithm.policy_loss_fn.sppo_loss_fn.sPPOPolicyLossFn(backend: str = 'verl', epsilon: float = 0.3)
Bases: PolicyLossFn
- __init__(backend: str = 'verl', epsilon: float = 0.3) → None
Initialize the policy loss function.
- Parameters:
backend – The training framework/backend to use (e.g., “verl”).
epsilon – Float hyperparameter of the loss function. Defaults to 0.3.
- classmethod default_args() → Dict
Get default initialization arguments for this loss function.
- Returns:
The default init arguments for the policy loss function.
- Return type:
Dict
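A common way to use a `default_args()` classmethod is to take the defaults, override selected values, and pass the result to the constructor. The sketch below illustrates that pattern with a stand-in class, `StandInLossFn`, which is hypothetical and only mirrors the constructor signature documented above. It assumes the returned dict is keyed by `__init__` parameter names and contains only `epsilon`; the actual contents of the dict are not stated here.

```python
from typing import Dict


class StandInLossFn:
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(self, backend: str = "verl", epsilon: float = 0.3) -> None:
        self.backend = backend
        self.epsilon = epsilon

    @classmethod
    def default_args(cls) -> Dict:
        # Assumption: the defaults are keyed by __init__ parameter names.
        return {"epsilon": 0.3}


# Start from the defaults, override one value, then construct the object.
args = StandInLossFn.default_args()
args["epsilon"] = 0.2
loss_fn = StandInLossFn(backend="verl", **args)
```

Keeping the defaults in a classmethod lets configuration code read them without instantiating the loss function first.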
- property select_keys
Returns parameter keys mapped to the specific training framework’s naming convention.
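One way to picture this mapping is a lookup table from the loss function's own argument names to the names a backend uses for the same values. The sketch below is purely illustrative: `KeyMappingSketch`, `KEY_MAP`, the backend name `example_backend`, and every key name in it are hypothetical and are not taken from Trinity or verl.

```python
from typing import Dict, List

# Hypothetical per-backend lookup: loss-function argument name -> backend key.
KEY_MAP: Dict[str, Dict[str, str]] = {
    "example_backend": {
        "logprob": "log_probs",
        "advantages": "adv",
    },
}


class KeyMappingSketch:
    # Hypothetical argument names consumed by the loss computation.
    arg_names = ["logprob", "advantages"]

    def __init__(self, backend: str = "example_backend") -> None:
        self.backend = backend

    @property
    def select_keys(self) -> List[str]:
        # Translate each argument name into the backend's naming convention.
        mapping = KEY_MAP[self.backend]
        return [mapping[name] for name in self.arg_names]


keys = KeyMappingSketch().select_keys
```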