trinity.algorithm.advantage_fn.asymre_advantage module

AsymRE advantage computation

class trinity.algorithm.advantage_fn.asymre_advantage.ASYMREAdvantageFn(baseline_shift: float = -0.1)

Bases: AdvantageFn

AsymRE advantage computation

__init__(baseline_shift: float = -0.1) -> None
classmethod default_args() -> Dict
Returns:

The default init arguments for the advantage function.

Return type:

Dict
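
A minimal sketch of how default_args can be paired with the constructor. The stand-in class below is hypothetical and does not import trinity; it assumes that default_args returns a dict keyed by the constructor's keyword arguments, here baseline_shift with the documented default of -0.1.

```python
from typing import Dict


class SketchAdvantageFn:
    """Hypothetical stand-in, not the real ASYMREAdvantageFn."""

    def __init__(self, baseline_shift: float = -0.1) -> None:
        self.baseline_shift = baseline_shift

    @classmethod
    def default_args(cls) -> Dict:
        # Assumption: keys mirror the __init__ keyword arguments.
        return {"baseline_shift": -0.1}


# Build one instance from the defaults alone.
default_fn = SketchAdvantageFn(**SketchAdvantageFn.default_args())

# Start from the defaults, then override selected values.
args = {**SketchAdvantageFn.default_args(), "baseline_shift": -0.2}
fn = SketchAdvantageFn(**args)
```

Merging overrides on top of the defaults keeps configuration code valid if further constructor arguments are added later.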

class trinity.algorithm.advantage_fn.asymre_advantage.ASYMREGroupAdvantage(baseline_shift: float = -0.1, **kwargs)

Bases: GroupAdvantage

AsymRE group advantage computation

__init__(baseline_shift: float = -0.1, **kwargs) -> None
group_experiences(exps)

Group experiences by a certain criterion.

Parameters:

exps (List[Experience]) – List of experiences to be grouped.

Returns:

A dictionary where keys are group identifiers and values are lists of experiences.

Return type:

Dict[str, List[Experience]]
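
The return shape described above can be illustrated with a small sketch. The grouping criterion is not stated here, so the example assumes experiences are grouped by the task they came from; SketchExperience, its task_id field, and group_by_task are hypothetical names, not part of trinity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchExperience:
    """Hypothetical stand-in for Experience; only the fields used here."""

    task_id: str
    reward: float


def group_by_task(exps: List[SketchExperience]) -> Dict[str, List[SketchExperience]]:
    # Assumption: the group identifier is the task id of each experience.
    groups: Dict[str, List[SketchExperience]] = defaultdict(list)
    for exp in exps:
        groups[exp.task_id].append(exp)
    # Return a plain dict so missing keys raise KeyError downstream.
    return dict(groups)


groups = group_by_task(
    [
        SketchExperience("task-0", 1.0),
        SketchExperience("task-1", 0.0),
        SketchExperience("task-0", 0.0),
    ]
)
```

Each key of groups can then be passed, together with its list, as the group_id and exps arguments of a per-group advantage calculation.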

calculate_group_advantage(group_id: str, exps: List[Experience]) -> Tuple[List[Experience], Dict]

Calculate advantages for a group of experiences.

Parameters:
  • group_id (str) – The identifier for the group of experiences.

  • exps (List[Experience]) – List of experiences in the group.

Returns:

A tuple containing the modified list of experiences and a dictionary of metrics.

Return type:

Tuple[List[Experience], Dict]
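
A rough sketch of a shifted-baseline group advantage, to make the role of baseline_shift concrete. It assumes that the baseline is the group's mean reward plus baseline_shift and that each advantage is the reward minus that baseline; this page does not confirm the exact formula. SketchExperience, sketch_group_advantage, and the "group_baseline" metric key are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, List, Tuple


@dataclass
class SketchExperience:
    """Hypothetical stand-in for Experience; only the fields used here."""

    reward: float
    advantage: float = 0.0


def sketch_group_advantage(
    group_id: str,
    exps: List[SketchExperience],
    baseline_shift: float = -0.1,
) -> Tuple[List[SketchExperience], Dict]:
    # Assumption: baseline = mean reward of the group + baseline_shift.
    # An empty group raises statistics.StatisticsError.
    baseline = fmean([exp.reward for exp in exps]) + baseline_shift
    for exp in exps:
        # Assumption: advantage = reward - shifted baseline.
        exp.advantage = exp.reward - baseline
    # Hypothetical metric name, reported once per group.
    return exps, {"group_baseline": baseline}


exps, metrics = sketch_group_advantage(
    "task-0", [SketchExperience(1.0), SketchExperience(0.0)]
)
```

Under these assumptions a negative baseline_shift lowers the baseline, so every advantage in the group moves up by the same amount: with rewards 1.0 and 0.0 and the default shift, the baseline is about 0.4 and the advantages are about 0.6 and -0.4.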

classmethod default_args() -> dict
Returns:

The default init arguments for the advantage function.

Return type:

Dict