trinity.algorithm.advantage_fn.opmd_advantage module

OPMD advantage computation

class trinity.algorithm.advantage_fn.opmd_advantage.OPMDAdvantageFn(opmd_baseline: str = 'mean', tau: float = 1.0)

OPMD advantage computation
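This page does not spell out what opmd_baseline and tau control. The following is a minimal sketch, not the library's implementation, of one plausible reading: each reward in a group has a group baseline subtracted from it. It assumes that opmd_baseline='mean' uses the group's mean reward and that a second option, here called 'logavgexp' (an assumed value, not confirmed by this page), uses the temperature-scaled log-average-exp tau * log(mean(exp(r / tau))). The function name opmd_style_advantages is hypothetical.

```python
import math
from typing import List


def opmd_style_advantages(
    rewards: List[float], opmd_baseline: str = "mean", tau: float = 1.0
) -> List[float]:
    """Hypothetical helper: subtract a group baseline from each scalar reward.

    The 'logavgexp' option is an assumption made for illustration only.
    """
    if not rewards:
        return []
    if opmd_baseline == "mean":
        baseline = sum(rewards) / len(rewards)
    elif opmd_baseline == "logavgexp":
        # tau * log(mean(exp(r / tau))), computed stably via log-sum-exp.
        scaled = [r / tau for r in rewards]
        m = max(scaled)
        lse = m + math.log(sum(math.exp(s - m) for s in scaled))
        baseline = tau * (lse - math.log(len(rewards)))
    else:
        raise ValueError(f"unknown opmd_baseline: {opmd_baseline!r}")
    # Advantage = reward minus the shared group baseline.
    return [r - baseline for r in rewards]
```

For example, opmd_style_advantages([1.0, 0.0, 0.0, 1.0]) returns [0.5, -0.5, -0.5, 0.5]. Under this reading, a log-average-exp baseline approaches the mean as tau grows and approaches the maximum reward as tau shrinks toward zero, so tau would set how optimistic the baseline is.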

classmethod default_args() → Dict
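The return value of default_args() is not documented here. One common pattern, shown as a hedged sketch, is a classmethod that returns the constructor's default keyword arguments so a configuration layer can merge user overrides before instantiation. The class SketchAdvantageFn and its from_config helper are hypothetical; only the argument names and default values come from the signature above.

```python
from typing import Any, Dict


class SketchAdvantageFn:
    """Hypothetical stand-in mirroring OPMDAdvantageFn's constructor signature."""

    def __init__(self, opmd_baseline: str = "mean", tau: float = 1.0) -> None:
        self.opmd_baseline = opmd_baseline
        self.tau = tau

    @classmethod
    def default_args(cls) -> Dict[str, Any]:
        # Assumption: the defaults mirror the constructor's keyword defaults.
        return {"opmd_baseline": "mean", "tau": 1.0}

    @classmethod
    def from_config(cls, overrides: Dict[str, Any]) -> "SketchAdvantageFn":
        # Hypothetical helper: user-supplied values take precedence over defaults.
        args = {**cls.default_args(), **overrides}
        return cls(**args)
```

With this pattern, SketchAdvantageFn.from_config({"tau": 0.5}) keeps opmd_baseline='mean' and sets tau to 0.5.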

class trinity.algorithm.advantage_fn.opmd_advantage.OPMDGroupAdvantage(opmd_baseline: str = 'mean', tau: float = 1.0, **kwargs)

OPMD group advantage computation

__init__(opmd_baseline: str = 'mean', tau: float = 1.0, **kwargs) → None

group_experiences(exps)

Group experiences by a certain criterion.

Parameters:

exps (List[Experience]) – List of experiences to be grouped.

Returns:

A dictionary where keys are group identifiers and values are lists of experiences.

Return type:

Dict[str, List[Experience]]
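The grouping criterion is left unspecified on this page. The sketch below assumes each experience carries a string identifier shared by all responses to the same prompt or task, and groups on that value to produce the Dict[str, List[Experience]] shape described above. The Experience dataclass and its group_key field are hypothetical stand-ins, not the library's Experience class.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Experience:
    """Hypothetical minimal experience record used only for this sketch."""

    group_key: str  # assumed identifier shared by responses to the same prompt
    reward: float
    advantage: float = 0.0


def group_experiences(exps: List[Experience]) -> Dict[str, List[Experience]]:
    # Bucket experiences by their group key, preserving input order in each group.
    groups: Dict[str, List[Experience]] = defaultdict(list)
    for exp in exps:
        groups[exp.group_key].append(exp)
    return dict(groups)
```

Returning a plain dict rather than the defaultdict keeps later lookups of a missing key from silently creating empty groups.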

calculate_group_advantage(group_id: str, exps: List[Experience]) → Tuple[List[Experience], Dict]

Calculate advantages for a group of experiences.

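Parameters:

group_id (str) – Identifier of the group being processed.
exps (List[Experience]) – Experiences belonging to that group.

Returns:

A tuple containing the modified list of experiences and a dictionary of metrics.

Return type:

Tuple[List[Experience], Dict]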
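As a hedged illustration of the documented return shape, the sketch below computes advantages for one group and returns the modified experiences together with a metrics dictionary. It assumes a 'mean' baseline (advantage = reward minus the group's mean reward); the Experience dataclass and the metric names group_size and reward_mean are hypothetical, since this page does not list the actual metrics.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Hypothetical minimal experience record used only for this sketch."""

    group_key: str
    reward: float
    advantage: float = 0.0


def calculate_group_advantage(
    group_id: str, exps: List[Experience]
) -> Tuple[List[Experience], Dict[str, float]]:
    # group_id is unused here; a real implementation might use it for logging.
    if not exps:
        return exps, {}
    # Assumption: a 'mean' baseline, i.e. advantage = reward - group mean.
    baseline = sum(e.reward for e in exps) / len(exps)
    for e in exps:
        e.advantage = e.reward - baseline  # modifies the experiences in place
    # Hypothetical metric names; the real method's metrics are not documented here.
    metrics = {"group_size": float(len(exps)), "reward_mean": baseline}
    return exps, metrics
```

Because the experiences are updated in place and also returned, a caller can either reuse its own list or take the returned one.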

classmethod default_args() → dict