trinity.algorithm.advantage_fn.opmd_advantage module
OPMD advantage computation
- class trinity.algorithm.advantage_fn.opmd_advantage.OPMDAdvantageFn(opmd_baseline: str = 'mean', tau: float = 1.0)[source]
Bases: AdvantageFn
OPMD advantage computation
- class trinity.algorithm.advantage_fn.opmd_advantage.OPMDGroupAdvantage(opmd_baseline: str = 'mean', tau: float = 1.0, **kwargs)[source]
Bases: GroupAdvantage
OPMD Group Advantage computation
- group_experiences(exps)[source]
Group experiences by a certain criterion.
- Parameters:
exps (List[Experience]) – List of experiences to be grouped.
- Returns:
A dictionary where keys are group identifiers and values are lists of experiences.
- Return type:
Dict[str, List[Experience]]
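The grouping contract above (a list in, a dictionary of lists out) can be illustrated with a minimal sketch. The `Experience` dataclass and its `task_id` field below are hypothetical stand-ins, and grouping by task is an assumption: the reference does not state which criterion `OPMDGroupAdvantage` actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Experience:
    """Hypothetical stand-in, not the real trinity Experience class."""

    task_id: str  # assumed grouping key; the real criterion is not documented here
    reward: float


def group_experiences(exps: List[Experience]) -> Dict[str, List[Experience]]:
    """Bucket experiences by their (assumed) task identifier."""
    groups: Dict[str, List[Experience]] = defaultdict(list)
    for exp in exps:
        groups[exp.task_id].append(exp)
    # Return a plain dict so the result matches the documented return type.
    return dict(groups)


exps = [Experience("a", 1.0), Experience("b", 0.0), Experience("a", 0.5)]
groups = group_experiences(exps)
```

Each key identifies one group, and the order of experiences within a group follows their order in the input list.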
- calculate_group_advantage(group_id: str, exps: List[Experience]) → Tuple[List[Experience], Dict] [source]
Calculate advantages for a group of experiences.
- Parameters:
group_id (str) – The identifier for the group of experiences.
exps (List[Experience]) – List of experiences in the group.
- Returns:
A tuple containing the modified list of experiences and a dictionary of metrics.
- Return type:
Tuple[List[Experience], Dict]
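As a rough illustration of the documented interface, the sketch below computes a per-group advantage with a mean baseline and returns the experiences together with a metrics dictionary. It rests on several assumptions: that `opmd_baseline='mean'` means subtracting the group's mean reward, that the hypothetical `Experience` stand-in stores a scalar `advantage`, and that the metric key is `group_baseline`. The role of `tau` is not described in this reference, so the sketch leaves it out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Experience:
    """Hypothetical stand-in, not the real trinity Experience class."""

    reward: float
    advantage: Optional[float] = None  # assumed field, filled in below


def calculate_group_advantage(
    group_id: str, exps: List[Experience]
) -> Tuple[List[Experience], Dict]:
    """Mean-baseline sketch; assumes the group is non-empty."""
    # Assumed meaning of the 'mean' baseline: the average reward in the group.
    baseline = sum(exp.reward for exp in exps) / len(exps)
    for exp in exps:
        exp.advantage = exp.reward - baseline
    # The metric name is illustrative, not taken from the library.
    metrics = {"group_baseline": baseline}
    return exps, metrics


group = [Experience(1.0), Experience(0.0), Experience(0.5)]
updated, metrics = calculate_group_advantage("task-0", group)
```

With a mean baseline the advantages in a group sum to zero, so an experience is only rewarded relative to the other experiences in its group.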