trinity.algorithm.policy_loss_fn.gspo_policy_loss module

GSPO-token policy loss function.

Implemented from https://arxiv.org/pdf/2507.18071
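As a rough illustration of what a GSPO-style loss computes, the sketch below follows the formulation in the referenced paper: a length-normalized, sequence-level importance ratio is clipped and multiplied by the advantages. It is not the Trinity implementation. The function names, the plain-list inputs, and the clip values are assumptions made for this example only. Because no autograd is involved, the GSPO-token ratio is numerically equal to the sequence-level ratio here, and the averaging order mirrors what the default `loss_agg_mode` name ('seq-mean-token-mean') suggests.

```python
import math
from typing import List


def gspo_sequence_ratio(
    logprob: List[float], old_logprob: List[float], mask: List[int]
) -> float:
    """Length-normalized sequence-level importance ratio.

    Computes exp(mean over response tokens of (logprob - old_logprob)).
    Hypothetical helper for illustration; not part of Trinity.
    """
    n_tokens = sum(mask)
    if n_tokens == 0:
        return 1.0  # empty response: treat the ratio as neutral
    total = sum((lp - olp) * m for lp, olp, m in zip(logprob, old_logprob, mask))
    return math.exp(total / n_tokens)


def gspo_loss(
    logprobs: List[List[float]],
    old_logprobs: List[List[float]],
    advantages: List[List[float]],
    masks: List[List[int]],
    clip_range_low: float,
    clip_range_high: float,
) -> float:
    """Clipped surrogate loss: token mean per sequence, then mean over sequences."""
    seq_losses = []
    for lp, olp, adv, mask in zip(logprobs, old_logprobs, advantages, masks):
        ratio = gspo_sequence_ratio(lp, olp, mask)
        clipped = min(max(ratio, 1.0 - clip_range_low), 1.0 + clip_range_high)
        # Pessimistic (min) objective per token, negated to form a loss.
        token_losses = [-min(ratio * a, clipped * a) * m for a, m in zip(adv, mask)]
        seq_losses.append(sum(token_losses) / max(sum(mask), 1))
    return sum(seq_losses) / len(seq_losses)


# Example call with made-up numbers (two response tokens, one sequence).
example_loss = gspo_loss(
    logprobs=[[-1.0, -2.0]],
    old_logprobs=[[-1.2, -2.0]],
    advantages=[[0.5, 0.5]],
    masks=[[1, 1]],
    clip_range_low=0.2,   # illustrative value, not a recommended setting
    clip_range_high=0.2,  # illustrative value, not a recommended setting
)
```

Note that the whole sequence shares one ratio, so clipping either keeps or suppresses the contribution of every token in that response together, rather than token by token.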

class trinity.algorithm.policy_loss_fn.gspo_policy_loss.GSPOLossFn(backend: str = 'verl', clip_range: float | None = None, clip_range_low: float | None = None, clip_range_high: float | None = None, loss_agg_mode: str | None = 'seq-mean-token-mean')

Bases: PolicyLossFn

__init__(backend: str = 'verl', clip_range: float | None = None, clip_range_low: float | None = None, clip_range_high: float | None = None, loss_agg_mode: str | None = 'seq-mean-token-mean') → None

Initialize the policy loss function.

Parameters: backend -- The training framework/backend to use (e.g., "verl")

classmethod default_args() → Dict

Get default initialization arguments for this loss function.

Returns: The default init arguments for the policy loss function.
Return type: Dict

property select_keys: Returns the parameter keys mapped to the specific training framework's naming convention.
