trinity.algorithm.advantage_fn package
Submodules
- trinity.algorithm.advantage_fn.advantage_fn module
- trinity.algorithm.advantage_fn.asymre_advantage module
- trinity.algorithm.advantage_fn.grpo_advantage module
- trinity.algorithm.advantage_fn.multi_step_grpo_advantage module
- trinity.algorithm.advantage_fn.on_policy_distill_advantage module
- trinity.algorithm.advantage_fn.opmd_advantage module
- trinity.algorithm.advantage_fn.ppo_advantage module
- trinity.algorithm.advantage_fn.rec_advantage module
- trinity.algorithm.advantage_fn.reinforce_advantage module
- trinity.algorithm.advantage_fn.reinforce_plus_plus_advantage module
- trinity.algorithm.advantage_fn.remax_advantage module
- trinity.algorithm.advantage_fn.rloo_advantage module
Module contents
- class trinity.algorithm.advantage_fn.AdvantageFn
Bases:
ABC
- class trinity.algorithm.advantage_fn.GroupAdvantage
Bases:
AdvantageFn, ExperienceOperator
Base class for group-based advantage calculation.
- abstract group_experiences(exps: List[Experience]) -> Dict[str, List[Experience]]
Group experiences by a certain criterion.
- Parameters:
exps (List[Experience]) - List of experiences to be grouped.
- Returns:
A dictionary where keys are group identifiers and values are lists of experiences.
- Return type:
Dict[str, List[Experience]]
- abstract calculate_group_advantage(group_id: str, exps: List[Experience]) -> Tuple[List[Experience], Dict]
Calculate advantages for a group of experiences.
- Parameters:
group_id (str) - The identifier for the group of experiences.
exps (List[Experience]) - List of experiences in the group.
- Returns:
A tuple containing the modified list of experiences and a dictionary of metrics.
- Return type:
Tuple[List[Experience], Dict]
- process(exps: List[Experience]) -> Tuple[List[Experience], Dict]
Process a list of experiences and return the transformed list together with metrics.
- Parameters:
exps (List[Experience]) - List of experiences to process, containing all experiences generated by the Explorer in a single explore step.
- Returns:
A tuple containing the processed list of experiences and a dictionary of metrics.
- Return type:
Tuple[List[Experience], Dict]
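The three methods above suggest a flow for `process`: group the experiences, compute advantages for each group, then merge the results. The following is a minimal, self-contained sketch of that flow under stated assumptions, not the library's implementation. `Experience` here is a hypothetical stand-in with only `group_key`, `reward`, and `advantage` fields; `GroupAdvantageSketch` mirrors the documented signatures but is not the real `GroupAdvantage`; averaging per-group metrics inside `process` is an assumed merge strategy; and `GRPOStyleAdvantage` applies a GRPO-style `(reward - mean) / (std + eps)` normalization, which may differ in detail from what `grpo_advantage` actually does.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


@dataclass
class Experience:
    """Hypothetical stand-in for trinity's Experience (only the fields used here)."""
    group_key: str                    # hypothetical: shared by rollouts of the same task
    reward: float
    advantage: Optional[float] = None


class GroupAdvantageSketch(ABC):
    """Mirrors the documented GroupAdvantage contract; not the real class."""

    @abstractmethod
    def group_experiences(self, exps: List[Experience]) -> Dict[str, List[Experience]]:
        ...

    @abstractmethod
    def calculate_group_advantage(
        self, group_id: str, exps: List[Experience]
    ) -> Tuple[List[Experience], Dict]:
        ...

    def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict]:
        # Assumed flow: group, score each group, concatenate the results.
        processed: List[Experience] = []
        per_group: Dict[str, List[float]] = {}
        for group_id, group in self.group_experiences(exps).items():
            group_exps, group_metrics = self.calculate_group_advantage(group_id, group)
            processed.extend(group_exps)
            for name, value in group_metrics.items():
                per_group.setdefault(name, []).append(value)
        # Assumed merge strategy: average each metric over groups.
        return processed, {name: mean(vals) for name, vals in per_group.items()}


class GRPOStyleAdvantage(GroupAdvantageSketch):
    """Group-normalized advantages: (r - mean) / (std + eps) within each group."""

    def __init__(self, epsilon: float = 1e-6):
        self.epsilon = epsilon

    def group_experiences(self, exps: List[Experience]) -> Dict[str, List[Experience]]:
        groups: Dict[str, List[Experience]] = defaultdict(list)
        for exp in exps:
            groups[exp.group_key].append(exp)
        return dict(groups)

    def calculate_group_advantage(
        self, group_id: str, exps: List[Experience]
    ) -> Tuple[List[Experience], Dict]:
        rewards = [e.reward for e in exps]
        mu = mean(rewards)
        sigma = pstdev(rewards)  # population std; a single-item group gives 0.0
        for exp in exps:
            exp.advantage = (exp.reward - mu) / (sigma + self.epsilon)
        return exps, {"reward_mean": mu, "reward_std": sigma}


if __name__ == "__main__":
    batch = [
        Experience("task-a", 1.0),
        Experience("task-a", 0.0),
        Experience("task-b", 0.5),
    ]
    out, metrics = GRPOStyleAdvantage().process(batch)
    for exp in out:
        print(exp.group_key, round(exp.advantage, 3))
    print(metrics)
```

Keeping `process` concrete in the base class while leaving grouping and per-group scoring abstract means a subclass only has to decide how to group and how to score; the `epsilon` term guards against division by zero when every reward in a group is identical.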