trinity.algorithm.policy_loss_fn.dpo_loss module

DPO (Direct Preference Optimization) loss function.
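For orientation, the standard DPO objective for one preference pair (prompt x, chosen response y_w, rejected response y_l) is written out below. Here π_θ is the policy being trained, π_ref is a frozen reference policy and σ is the logistic sigmoid. Mapping β to `beta` and the smoothing weight ε to `label_smoothing` is an assumption based on the parameter names. With ε = 0 the expression reduces to plain DPO, -log σ(β h), and the batch loss is the mean over pairs.

```latex
h_\theta(x, y_w, y_l) = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
                      - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}

\mathcal{L}(x, y_w, y_l) = -(1-\varepsilon)\,\log\sigma\!\left(\beta\, h_\theta\right)
                           - \varepsilon\,\log\sigma\!\left(-\beta\, h_\theta\right)
```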

class trinity.algorithm.policy_loss_fn.dpo_loss.DPOLossFn(backend: str = 'verl', beta: float = 0.1, label_smoothing: float = 0.0)

Bases: PolicyLossFn

__init__(backend: str = 'verl', beta: float = 0.1, label_smoothing: float = 0.0) → None

Initialize the policy loss function.

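Parameters:

backend -- The training framework/backend to use (e.g., "verl").

beta -- Scaling coefficient β of the DPO loss (default 0.1).

label_smoothing -- Label-smoothing coefficient ε applied to the preference labels (default 0.0).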
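A minimal, framework-free sketch of how `beta` and `label_smoothing` could enter the loss, assuming the conservative (label-smoothed) DPO form common in open-source DPO trainers. The function `dpo_loss` and its arguments (summed per-response log-probabilities under the policy and the reference model) are hypothetical illustrations, not this module's API.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    chosen_logps: Sequence[float],       # log pi_theta(y_w | x), one per pair
    rejected_logps: Sequence[float],     # log pi_theta(y_l | x)
    ref_chosen_logps: Sequence[float],   # log pi_ref(y_w | x)
    ref_rejected_logps: Sequence[float], # log pi_ref(y_l | x)
    beta: float = 0.1,
    label_smoothing: float = 0.0,
) -> float:
    """Hypothetical sketch: mean (optionally label-smoothed) DPO loss over pairs."""
    losses = []
    for c, r, rc, rr in zip(chosen_logps, rejected_logps,
                            ref_chosen_logps, ref_rejected_logps):
        # Chosen log-ratio minus rejected log-ratio (policy vs. reference)
        h = (c - rc) - (r - rr)
        # Weight (1 - eps) on "chosen is preferred", eps on the flipped label
        loss = (-(1.0 - label_smoothing) * log_sigmoid(beta * h)
                - label_smoothing * log_sigmoid(-beta * h))
        losses.append(loss)
    return sum(losses) / len(losses)
```

When the policy and reference log-ratios are identical (h = 0), every pair contributes log 2 ≈ 0.693 regardless of `label_smoothing`; widening the margin in favour of the chosen response lowers the loss when `label_smoothing` is 0.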

classmethod default_args() → Dict

Get default initialization arguments for this loss function.

Returns:

The default init arguments for the policy loss function.

Return type:

Dict
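The sketch below shows the usual pattern of expanding the defaults into the constructor and overriding selected values. `DPOLossFnSketch` is a hypothetical stand-in that only mirrors the documented constructor signature; the exact contents of the dictionary returned by the real `default_args()` (for example, whether `backend` is included) are an assumption.

```python
from typing import Dict


class DPOLossFnSketch:
    """Hypothetical stand-in for DPOLossFn; mirrors only the constructor signature."""

    def __init__(self, backend: str = "verl", beta: float = 0.1,
                 label_smoothing: float = 0.0) -> None:
        self.backend = backend
        self.beta = beta
        self.label_smoothing = label_smoothing

    @classmethod
    def default_args(cls) -> Dict:
        # Assumed to mirror the __init__ defaults of the loss hyperparameters
        return {"beta": 0.1, "label_smoothing": 0.0}


# Build an instance from the defaults, then one with a user override on top
loss_fn = DPOLossFnSketch(**DPOLossFnSketch.default_args())
tuned_fn = DPOLossFnSketch(**{**DPOLossFnSketch.default_args(), "beta": 0.5})
```

Merging user overrides on top of `default_args()` keeps unspecified hyperparameters at their documented defaults.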

property select_keys

Returns the parameter keys used by this loss function, mapped to the naming convention of the selected training framework.
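A hypothetical sketch of the idea behind a backend-aware key property: generic input names are translated through a per-backend lookup table, falling back to the generic name when no mapping exists. The class, the generic key names and the `KEY_MAPPING` table are illustrative assumptions, not the names Trinity or verl actually use.

```python
from typing import Dict, List

# Hypothetical per-backend renaming table (illustrative names only)
KEY_MAPPING: Dict[str, Dict[str, str]] = {
    "verl": {"logprob": "log_prob", "ref_logprob": "ref_log_prob"},
}


class SelectKeysSketch:
    def __init__(self, backend: str = "verl") -> None:
        self.backend = backend
        # Generic names of the inputs this (hypothetical) loss consumes
        self._keys: List[str] = ["logprob", "ref_logprob"]

    @property
    def select_keys(self) -> List[str]:
        mapping = KEY_MAPPING.get(self.backend, {})
        # Use the backend's spelling when known, otherwise keep the generic name
        return [mapping.get(k, k) for k in self._keys]
```

Keeping the mapping in one table means adding a new backend only requires a new dictionary entry rather than changes to the loss logic.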