trinity.algorithm.advantage_fn
Submodules
trinity.algorithm.advantage_fn.advantage_fn module
trinity.algorithm.advantage_fn.grpo_advantage module
GRPO advantage computation
Ref: https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/core_algos.py
- class trinity.algorithm.advantage_fn.grpo_advantage.GRPOAdvantageFn(epsilon: float = 1e-06)
Bases: AdvantageFn
GRPO advantage computation
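As an illustration of what the epsilon parameter is for, the sketch below normalises each reward against the mean and standard deviation of its group, which is the usual GRPO formulation. It is a minimal pure-Python sketch, not the class's actual code: the function name grpo_advantages, the use of the sample standard deviation, and the zero advantage for a single-response group are all assumptions.

```python
from statistics import mean, stdev


def grpo_advantages(rewards, epsilon=1e-6):
    """Group-normalised advantages for the responses to one prompt.

    Hypothetical helper for illustration; not part of trinity.
    """
    if len(rewards) < 2:
        # Assumption: a lone response carries no relative signal.
        return [0.0 for _ in rewards]
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample standard deviation (assumption)
    # epsilon keeps the division finite when all rewards are equal.
    return [(r - mu) / (sigma + epsilon) for r in rewards]


advantages = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

When every reward in the group is identical, the numerator is zero and epsilon prevents a division by zero, so all advantages come out as 0.0.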
trinity.algorithm.advantage_fn.opmd_advantage module
OPMD advantage computation
- class trinity.algorithm.advantage_fn.opmd_advantage.OPMDAdvantageFn(opmd_baseline: str = 'mean', tau: float = 1.0)
Bases: AdvantageFn
OPMD advantage computation
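The signature suggests that a per-group baseline is subtracted from each reward, with opmd_baseline selecting the baseline and tau acting as a temperature. The sketch below shows one plausible reading under that assumption. Only the 'mean' option appears in the signature above; the "logavgexp" option name, the helper names, and the way tau enters are hypothetical and may not match the class.

```python
import math


def log_mean_exp_baseline(rewards, tau=1.0):
    """tau * log(mean(exp(r / tau))), computed stably.

    Hypothetical helper for illustration; not part of trinity.
    """
    m = max(rewards)
    avg = sum(math.exp((r - m) / tau) for r in rewards) / len(rewards)
    return m + tau * math.log(avg)


def opmd_advantages(rewards, opmd_baseline="mean", tau=1.0):
    """Reward minus a group baseline (illustrative sketch only)."""
    if opmd_baseline == "mean":
        baseline = sum(rewards) / len(rewards)
    elif opmd_baseline == "logavgexp":  # assumed option name
        baseline = log_mean_exp_baseline(rewards, tau)
    else:
        raise ValueError(f"unknown baseline: {opmd_baseline!r}")
    return [r - baseline for r in rewards]


advantages = opmd_advantages([1.0, 0.0])
```

The log-mean-exp baseline always lies between the arithmetic mean and the maximum reward, and it approaches the mean as tau grows.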
trinity.algorithm.advantage_fn.ppo_advantage module
PPO’s GAE advantage computation
Ref: https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/core_algos.py
- class trinity.algorithm.advantage_fn.ppo_advantage.PPOAdvantageFn(gamma: float = 1.0, lam: float = 1.0)
Bases: AdvantageFn
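To make the roles of gamma and lam concrete, the sketch below implements the standard Generalized Advantage Estimation recursion for a single trajectory. It is a minimal illustration rather than the class's code: the function name gae_advantages is hypothetical, and it assumes per-step rewards and value estimates with a value of zero after the final step.

```python
def gae_advantages(rewards, values, gamma=1.0, lam=1.0):
    """Standard GAE over one trajectory (illustrative sketch only).

    rewards[t] and values[t] are the reward and value estimate at step t.
    """
    advantages = [0.0] * len(rewards)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        # Assumption: the value after the last step is zero.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        last_gae = delta + gamma * lam * last_gae
        advantages[t] = last_gae
    # Returns are the regression targets for the value function.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = gae_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5])
```

With the defaults gamma = 1.0 and lam = 1.0, each advantage reduces to the sum of remaining rewards minus the value estimate at that step.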
trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage module
REINFORCE++ advantage computation
Ref: https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/core_algos.py
- class trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage.REINFORCEPLUSPLUSAdvantageFn(gamma: float = 1.0)
Bases: AdvantageFn
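The gamma parameter points to discounted returns-to-go, which REINFORCE-style estimators use in place of a learned value function. The sketch below computes those returns and then whitens them. Both function names are hypothetical, and the whitening step (including its use of the population standard deviation over a single list) is an assumption about how the returns are normalised, not a statement about this class.

```python
from statistics import mean, pstdev


def discounted_returns(rewards, gamma=1.0):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def whiten(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance (assumed step)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


returns = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
advantages = whiten(returns)
```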
trinity.algorithm.advantage_fn.remax_advantage module
REMAX advantage computation
Ref: https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/core_algos.py
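ReMax is commonly described as using the reward of a greedily decoded response as the baseline for a sampled response to the same prompt. The sketch below follows that description, under stated assumptions: the function name remax_advantages is hypothetical, rewards are given per token, and the baseline is a single scalar obtained elsewhere.

```python
def remax_advantages(token_rewards, baseline_reward):
    """Undiscounted return-to-go minus a greedy-response baseline.

    Illustrative sketch only; baseline_reward is assumed to be the
    reward of a greedy rollout for the same prompt.
    """
    returns = []
    running = 0.0
    for r in reversed(token_rewards):
        running += r
        returns.append(running)
    returns.reverse()
    return [g - baseline_reward for g in returns]


advantages = remax_advantages([0.0, 0.0, 1.0], baseline_reward=0.4)
```

A sampled response is reinforced only when it scores higher than the greedy response; otherwise its advantage is negative.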
trinity.algorithm.advantage_fn.rloo_advantage module
RLOO advantage computation
Ref: https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/core_algos.py
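RLOO (REINFORCE Leave-One-Out) baselines each response against the mean reward of the other responses sampled for the same prompt. The sketch below illustrates that idea in pure Python. The function name rloo_advantages and the zero advantage for a single-response group are assumptions made for the example.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for one prompt's group of responses.

    Hypothetical helper for illustration; not part of trinity.
    """
    k = len(rewards)
    if k < 2:
        # Assumption: no other responses means no baseline, so use zero.
        return [0.0 for _ in rewards]
    total = sum(rewards)
    # The baseline for response i is the mean of the other k - 1 rewards.
    return [r - (total - r) / (k - 1) for r in rewards]


advantages = rloo_advantages([1.0, 0.0, 0.0])
```

Because a response's own reward never enters its baseline, the baseline does not bias that response's gradient estimate.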
Module contents
- class trinity.algorithm.advantage_fn.PPOAdvantageFn(gamma: float = 1.0, lam: float = 1.0)
Bases: AdvantageFn
- class trinity.algorithm.advantage_fn.GRPOAdvantageFn(epsilon: float = 1e-06)
Bases: AdvantageFn
GRPO advantage computation
- class trinity.algorithm.advantage_fn.REINFORCEPLUSPLUSAdvantageFn(gamma: float = 1.0)
Bases: AdvantageFn
- class trinity.algorithm.advantage_fn.REMAXAdvantageFn
Bases: AdvantageFn
- class trinity.algorithm.advantage_fn.RLOOAdvantageFn
Bases: AdvantageFn
- class trinity.algorithm.advantage_fn.OPMDAdvantageFn(opmd_baseline: str = 'mean', tau: float = 1.0)
Bases: AdvantageFn
OPMD advantage computation