trinity.algorithm package#
Subpackages#
- trinity.algorithm.advantage_fn package
- Submodules
- trinity.algorithm.advantage_fn.advantage_fn module
- trinity.algorithm.advantage_fn.asymre_advantage module
- trinity.algorithm.advantage_fn.grpo_advantage module
- trinity.algorithm.advantage_fn.multi_step_grpo_advantage module
- trinity.algorithm.advantage_fn.opmd_advantage module
- trinity.algorithm.advantage_fn.ppo_advantage module
- trinity.algorithm.advantage_fn.reinforce_advantage module
- trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage module
- trinity.algorithm.advantage_fn.remax_advantage module
- trinity.algorithm.advantage_fn.rloo_advantage module
- Module contents
- trinity.algorithm.entropy_loss_fn package
- trinity.algorithm.kl_fn package
- trinity.algorithm.policy_loss_fn package
- Submodules
- trinity.algorithm.policy_loss_fn.chord_policy_loss module
- trinity.algorithm.policy_loss_fn.cispo_policy_loss module
- trinity.algorithm.policy_loss_fn.dpo_loss module
- trinity.algorithm.policy_loss_fn.gspo_policy_loss module
- trinity.algorithm.policy_loss_fn.mix_policy_loss module
- trinity.algorithm.policy_loss_fn.opmd_policy_loss module
- trinity.algorithm.policy_loss_fn.policy_loss_fn module
- trinity.algorithm.policy_loss_fn.ppo_policy_loss module
- trinity.algorithm.policy_loss_fn.sft_loss module
- trinity.algorithm.policy_loss_fn.sppo_loss_fn module
- trinity.algorithm.policy_loss_fn.topr_policy_loss module
- Module contents
- trinity.algorithm.sample_strategy package
Submodules#
- trinity.algorithm.algorithm module
- trinity.algorithm.algorithm_manager module
- trinity.algorithm.key_mapper module
- trinity.algorithm.utils module
Module contents#
- class trinity.algorithm.AlgorithmType[source]#
Bases: ABC
- use_critic: bool#
- use_reference: bool#
- compute_advantage_in_trainer: bool#
- can_balance_batch: bool#
- schema: str#
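The attributes above describe how the trainer should treat a given algorithm. Below is a minimal sketch of a concrete descriptor; the class name and attribute values are illustrative assumptions (not package defaults), and the sketch presumes AlgorithmType requires nothing beyond the attributes documented on this page.

```python
from trinity.algorithm import AlgorithmType


class MyGRPOLikeAlgorithm(AlgorithmType):
    """Illustrative descriptor; values are assumptions, not package defaults."""

    use_critic: bool = False                    # no value/critic model is trained
    use_reference: bool = True                  # keep a frozen reference policy for KL
    compute_advantage_in_trainer: bool = False  # advantages computed outside the trainer
    can_balance_batch: bool = True              # batches may be rebalanced across workers
    schema: str = "experience"                  # buffer schema this algorithm consumes (assumed name)
```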
- class trinity.algorithm.AdvantageFn[source]#
Bases: ABC
- class trinity.algorithm.PolicyLossFn(backend: str = 'verl')[source]#
Bases: ABC
Abstract base class for policy loss functions.
This class provides the interface for implementing different policy gradient loss functions while handling parameter name mapping between different training frameworks.
- __init__(backend: str = 'verl')[source]#
Initialize the policy loss function.
- Parameters:
backend – The training framework/backend to use (e.g., “verl”)
- abstract classmethod default_args() → Dict [source]#
Get default initialization arguments for this loss function.
- Returns:
The default init arguments for the policy loss function.
- Return type:
Dict
- property select_keys#
Returns parameter keys mapped to the specific training framework’s naming convention.
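To illustrate the documented interface, here is a sketch of a custom loss targeting the verl backend. The class name and the clip_range argument are hypothetical, and the sketch assumes default_args is the only abstract member that must be provided; the real base class may require more, for example the loss computation itself.

```python
from typing import Dict

from trinity.algorithm import PolicyLossFn


class ClippedPolicyLoss(PolicyLossFn):
    """Hypothetical loss; only the interface documented above is exercised."""

    def __init__(self, backend: str = "verl", clip_range: float = 0.2) -> None:
        super().__init__(backend=backend)  # lets the base class map keys to verl's names
        self.clip_range = clip_range

    @classmethod
    def default_args(cls) -> Dict:
        # Default construction arguments used when none are supplied in the config.
        return {"clip_range": 0.2}


loss_fn = ClippedPolicyLoss()
print(loss_fn.select_keys)  # parameter keys translated to the chosen backend's naming
```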
- class trinity.algorithm.KLFn(adaptive: bool = False, kl_coef: float = 0.001, target_kl: float | None = None, horizon: float | None = None)[source]#
Bases: ABC
KL penalty and loss.
- __init__(adaptive: bool = False, kl_coef: float = 0.001, target_kl: float | None = None, horizon: float | None = None) → None [source]#
- apply_kl_penalty_to_reward(experiences: Any) → Tuple[Any, Dict] [source]#
Apply KL penalty to reward. Only supports DataProto input for now.
- calculate_kl_loss(logprob: Tensor, ref_logprob: Tensor, response_mask: Tensor) → Tuple[Tensor, Dict] [source]#
Compute KL loss.
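As a sketch of how the methods above fit together, the following defines a custom subclass with a simple k3-style estimator and computes a token-level KL loss. The subclass name, the estimator, and the assumption that KLFn declares no further abstract methods are all illustrative; the shipped implementations live under trinity.algorithm.kl_fn.

```python
from typing import Dict, Tuple

import torch
from torch import Tensor

from trinity.algorithm import KLFn


class K3KLFn(KLFn):
    """Hypothetical KL penalty using the k3 estimator (illustrative only)."""

    def __init__(self, kl_coef: float = 0.001) -> None:
        super().__init__(adaptive=False, kl_coef=kl_coef)
        self._coef = kl_coef  # stored locally; the base class may keep its own copy

    def calculate_kl_loss(
        self, logprob: Tensor, ref_logprob: Tensor, response_mask: Tensor
    ) -> Tuple[Tensor, Dict]:
        # k3 estimator: exp(ref - pi) - (ref - pi) - 1, averaged over response tokens.
        log_ratio = ref_logprob - logprob
        kl = torch.exp(log_ratio) - log_ratio - 1.0
        kl_mean = (kl * response_mask).sum() / response_mask.sum().clamp(min=1.0)
        return self._coef * kl_mean, {"kl_loss": kl_mean.item()}


kl_fn = K3KLFn(kl_coef=0.001)
logprob = torch.randn(4, 128)      # policy log-probs, shape (batch, response_len)
ref_logprob = torch.randn(4, 128)  # reference-policy log-probs
mask = torch.ones(4, 128)          # 1 for response tokens, 0 for padding
kl_loss, metrics = kl_fn.calculate_kl_loss(logprob, ref_logprob, mask)
```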
- class trinity.algorithm.SampleStrategy(buffer_config: BufferConfig, **kwargs)[source]#
Bases: ABC
- __init__(buffer_config: BufferConfig, **kwargs) → None [source]#
- abstract async sample(step: int) → Tuple[Experiences, Dict, List] [source]#
Sample data from buffer.
- Parameters:
step (int) – The current step number.
- Returns:
A tuple of the sampled Experiences data, a Dict of metrics for logging, and a List of representative data for logging.
- Return type:
Tuple[Experiences, Dict, List]
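Finally, a sketch of a custom sampling strategy showing the shape of the async interface. How the buffer is actually read is not documented on this page, so the reader used below is a hypothetical placeholder, as are the class name and the warmup_steps argument; Experiences is replaced by Any in the annotation because its import path is not shown here.

```python
from typing import Any, Dict, List, Tuple

from trinity.algorithm import SampleStrategy


class WarmupSampleStrategy(SampleStrategy):
    """Hypothetical strategy that tags early steps as warmup (illustrative only)."""

    def __init__(self, buffer_config, warmup_steps: int = 10, **kwargs) -> None:
        # buffer_config is a trinity BufferConfig; its import path is not shown
        # on this page, so the parameter is left unannotated here.
        super().__init__(buffer_config, **kwargs)
        self.warmup_steps = warmup_steps

    async def sample(self, step: int) -> Tuple[Any, Dict, List]:
        # Hypothetical reader: the real buffer access API is not documented here.
        experiences = await self._read_batch_from_buffer(step)
        metrics = {"sample/step": step, "sample/is_warmup": step < self.warmup_steps}
        representative = []  # items to surface in logs, if any
        return experiences, metrics, representative
```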