trinity.algorithm.kl_fn package

Module contents

class trinity.algorithm.kl_fn.KLFn(adaptive: bool = False, kl_coef: float = 0.001, target_kl: float | None = None, horizon: float | None = None)

Bases: ABC

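Abstract base class for KL divergence functions. A KLFn is used both to apply a KL penalty to rewards and to compute a KL loss term.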

__init__(adaptive: bool = False, kl_coef: float = 0.001, target_kl: float | None = None, horizon: float | None = None) → None

update_kl_coef(current_kl: float, batch_size: int) → None: Update the KL coefficient.
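The constructor arguments (adaptive, kl_coef, target_kl, horizon) match the usual adaptive KL controller from the RLHF literature. A minimal sketch of update_kl_coef, assuming that controller: the coefficient moves toward whatever value keeps the observed KL near target_kl, and batch_size / horizon sets the step size. The class name AdaptiveKLCoefSketch and the clip range of ±0.2 are assumptions for illustration, not taken from the library.

```python
from typing import Optional


class AdaptiveKLCoefSketch:
    """Hypothetical stand-in for the KL-coefficient state held by a KLFn."""

    def __init__(
        self,
        adaptive: bool = False,
        kl_coef: float = 0.001,
        target_kl: Optional[float] = None,
        horizon: Optional[float] = None,
    ) -> None:
        # Assumption: adaptive mode cannot work without a target and a horizon.
        if adaptive and (target_kl is None or horizon is None):
            raise ValueError("adaptive mode needs both target_kl and horizon")
        self.adaptive = adaptive
        self.kl_coef = kl_coef
        self.target_kl = target_kl
        self.horizon = horizon

    def update_kl_coef(self, current_kl: float, batch_size: int) -> None:
        # A fixed (non-adaptive) coefficient is left untouched.
        if not self.adaptive:
            return
        # Relative deviation of the observed KL from the target, clipped to [-0.2, 0.2].
        error = current_kl / self.target_kl - 1.0
        error = max(-0.2, min(0.2, error))
        # Grow the coefficient when KL is too high, shrink it when KL is too low.
        self.kl_coef *= 1.0 + error * batch_size / self.horizon


ctl = AdaptiveKLCoefSketch(adaptive=True, kl_coef=0.001, target_kl=0.1, horizon=10000)
ctl.update_kl_coef(current_kl=0.2, batch_size=64)  # KL above target, so kl_coef grows
```

Because the multiplicative step is scaled by batch_size / horizon, horizon acts roughly as the number of samples over which the coefficient is allowed to change substantially.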

apply_kl_penalty_to_reward(experiences: Any) → Tuple[Any, Dict]: Apply the KL penalty to the reward. Only DataProto input is supported for now.
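The DataProto fields this method reads are not listed here, so the following is only a sketch of the idea on one sequence, using plain Python lists instead of a DataProto. It assumes the per-token KL is estimated as logprob - ref_logprob, that the penalty is kl_coef * KL subtracted from each response token's score, and that the returned dict carries the mean KL and the coefficient. The function name and metric keys are hypothetical.

```python
from typing import Dict, List, Tuple


def apply_kl_penalty_sketch(
    scores: List[float],
    logprob: List[float],
    ref_logprob: List[float],
    response_mask: List[int],
    kl_coef: float,
) -> Tuple[List[float], Dict[str, float]]:
    """Hypothetical per-token KL reward shaping for one sequence."""
    rewards: List[float] = []
    kl_sum = 0.0
    n_tokens = 0
    for score, lp, ref_lp, mask in zip(scores, logprob, ref_logprob, response_mask):
        kl = lp - ref_lp  # assumed k1 estimator of the per-token KL
        # Only response tokens (mask == 1) are penalized.
        rewards.append(score - kl_coef * kl * mask)
        kl_sum += kl * mask
        n_tokens += mask
    metrics = {"kl": kl_sum / max(n_tokens, 1), "kl_coef": kl_coef}
    return rewards, metrics


rewards, metrics = apply_kl_penalty_sketch(
    scores=[0.0, 0.0, 1.0],
    logprob=[-1.0, -2.0, -0.5],
    ref_logprob=[-1.5, -2.0, -1.0],
    response_mask=[1, 1, 0],
    kl_coef=0.1,
)
```

Returning the metrics alongside the shaped rewards mirrors the documented Tuple[Any, Dict] return type.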

calculate_kl_loss(logprob: Tensor, ref_logprob: Tensor, response_mask: Tensor) → Tuple[Tensor, Dict]: Compute the KL loss over the response tokens selected by response_mask.
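A minimal sketch of a masked KL loss, written with plain Python lists in place of tensors. It assumes the per-token KL is logprob - ref_logprob, that it is averaged only over tokens where response_mask is 1, and that the loss is scaled by kl_coef. The function name and the dict keys are hypothetical.

```python
from typing import Dict, List, Tuple


def calculate_kl_loss_sketch(
    logprob: List[float],
    ref_logprob: List[float],
    response_mask: List[int],
    kl_coef: float,
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical masked-mean KL loss for one sequence."""
    # Per-token KL estimate (assumed k1 estimator).
    kl = [lp - ref_lp for lp, ref_lp in zip(logprob, ref_logprob)]
    # Average only over response tokens so padding and prompt tokens are ignored.
    masked_sum = sum(k * m for k, m in zip(kl, response_mask))
    n_tokens = sum(response_mask)
    kl_mean = masked_sum / n_tokens if n_tokens > 0 else 0.0
    loss = kl_coef * kl_mean
    return loss, {"kl_loss": kl_mean, "kl_coef": kl_coef}


loss, info = calculate_kl_loss_sketch(
    logprob=[-1.0, -2.0, -3.0],
    ref_logprob=[-1.5, -2.5, -3.0],
    response_mask=[1, 1, 0],
    kl_coef=0.5,
)
```

In a tensor implementation the same masked mean would be (kl * mask).sum() / mask.sum(), computed over the whole batch.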

abstract calculate_kl(logprob: Tensor, ref_logprob: Tensor) → Tensor: Compute the KL divergence between logprob and ref_logprob. Subclasses must implement this method.
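Since calculate_kl is abstract, each concrete KL function supplies its own estimator. As a sketch, the following shows two common per-token estimators as subclasses of a stand-in ABC, using lists instead of tensors: k1 (log-ratio) and k3, the non-negative, low-variance estimator (r - 1) - log r with r = exp(ref_logprob - logprob). The class names are hypothetical; this page does not say which estimators the package ships.

```python
import math
from abc import ABC, abstractmethod
from typing import List


class KLFnSketch(ABC):
    """Hypothetical stand-in for KLFn, reduced to the abstract method."""

    @abstractmethod
    def calculate_kl(self, logprob: List[float], ref_logprob: List[float]) -> List[float]:
        """Return a per-token KL estimate."""


class K1Sketch(KLFnSketch):
    def calculate_kl(self, logprob: List[float], ref_logprob: List[float]) -> List[float]:
        # k1: plain log-ratio; unbiased but can be negative per token.
        return [lp - ref_lp for lp, ref_lp in zip(logprob, ref_logprob)]


class K3Sketch(KLFnSketch):
    def calculate_kl(self, logprob: List[float], ref_logprob: List[float]) -> List[float]:
        out = []
        for lp, ref_lp in zip(logprob, ref_logprob):
            log_r = ref_lp - lp  # log of r = p_ref / p
            # k3: (r - 1) - log r, which is always >= 0.
            out.append(math.exp(log_r) - 1.0 - log_r)
        return out


k1 = K1Sketch().calculate_kl([-1.0, -2.0], [-1.5, -2.0])
k3 = K3Sketch().calculate_kl([-1.0, -2.0], [-1.5, -2.0])
```

Instantiating KLFnSketch directly raises TypeError, which is the same guarantee the abstract calculate_kl gives KLFn.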

classmethod default_args()[source]: Get the default initialization arguments.
