trinity.algorithm.advantage_fn.advantage_fn module

class trinity.algorithm.advantage_fn.advantage_fn.AdvantageFn

Bases: ABC

abstract classmethod default_args() → Dict

classmethod compute_in_trainer() → bool: Whether the advantage should be computed in the trainer loop.
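The interface above can be illustrated with a minimal sketch. `AdvantageFnSketch` is a local stand-in that only mirrors the two documented classmethods; it is not the library class. `ScaledAdvantage`, its `scale` parameter, the idea that `default_args()` holds default constructor keyword arguments, and the `True` default for `compute_in_trainer()` are all assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict


class AdvantageFnSketch(ABC):
    """Local stand-in mirroring the documented interface (not the library class)."""

    @classmethod
    @abstractmethod
    def default_args(cls) -> Dict:
        """Assumed here to return default keyword arguments for the constructor."""

    @classmethod
    def compute_in_trainer(cls) -> bool:
        # Assumed default for this sketch only; the real default is not documented above.
        return True


class ScaledAdvantage(AdvantageFnSketch):
    """Hypothetical subclass with a single hypothetical parameter, `scale`."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale

    @classmethod
    def default_args(cls) -> Dict:
        return {"scale": 1.0}


# Build an instance from the declared defaults.
fn = ScaledAdvantage(**ScaledAdvantage.default_args())
```

Because `default_args()` is abstract, a subclass that does not override it cannot be instantiated.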

class trinity.algorithm.advantage_fn.advantage_fn.GroupAdvantage

For group-based advantage calculation.

abstract group_experiences(exps: List[Experience]) → Dict[str, List[Experience]]

Group experiences by a certain criterion.

Parameters: exps (List[Experience]) – List of experiences to be grouped.
Returns: A dictionary where keys are group identifiers and values are lists of experiences.
Return type: Dict[str, List[Experience]]
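As a rough illustration of one possible grouping criterion, the sketch below groups experiences by a task identifier. The `Experience` dataclass is a local stand-in, and its `task_id` and `reward` fields are assumptions made for the example rather than the library's actual attributes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Experience:
    """Local stand-in for the library's Experience (fields are hypothetical)."""
    task_id: str
    reward: float


def group_experiences(exps: List[Experience]) -> Dict[str, List[Experience]]:
    """Group experiences by task_id, preserving input order within each group."""
    groups: Dict[str, List[Experience]] = defaultdict(list)
    for exp in exps:
        groups[exp.task_id].append(exp)
    # Return a plain dict so missing keys raise KeyError instead of creating groups.
    return dict(groups)


exps = [Experience("a", 1.0), Experience("b", 0.0), Experience("a", 0.5)]
grouped = group_experiences(exps)
```

Any hashable criterion that can be rendered as a string key would fit the documented return type equally well.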

abstract calculate_group_advantage(group_id: str, exps: List[Experience]) → Tuple[List[Experience], Dict]

Calculate advantages for a group of experiences.

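Parameters:

group_id (str) – Identifier of the group.
exps (List[Experience]) – List of experiences belonging to the group.

Returns:

A tuple containing the modified list of experiences and a dictionary of metrics.

Return type:

Tuple[List[Experience], Dict]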
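One common way to fill in this method is to normalize each reward by the group's mean and standard deviation. The sketch below shows that choice only as an example; it is not necessarily what any concrete subclass in the library does. The `Experience` stand-in, its `reward` and `advantage` fields, the `epsilon` parameter, and the metric names are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


@dataclass
class Experience:
    """Local stand-in for the library's Experience (fields are hypothetical)."""
    reward: float
    advantage: Optional[float] = None


def calculate_group_advantage(
    group_id: str, exps: List[Experience], epsilon: float = 1e-6
) -> Tuple[List[Experience], Dict]:
    """Set advantage = (reward - group mean) / (group std + epsilon) in place."""
    # Assumes a non-empty group; mean() raises StatisticsError on an empty list.
    rewards = [exp.reward for exp in exps]
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # population std; 0.0 for a single-element group
    for exp in exps:
        exp.advantage = (exp.reward - group_mean) / (group_std + epsilon)
    metrics = {"reward_mean": group_mean, "reward_std": group_std}
    return exps, metrics


out, metrics = calculate_group_advantage(
    "task-0", [Experience(1.0), Experience(0.0)]
)
```

The `epsilon` term keeps the division finite when every reward in a group is identical.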

process(exps: List[Experience]) → Tuple[List[Experience], Dict]

Process a list of experiences and return a transformed list.

Parameters: exps (List[Experience]) – List of experiences to process, containing all experiences generated by the Explorer in one explore step.
Returns: A tuple containing the processed list of experiences and a dictionary of metrics.
Return type: Tuple[List[Experience], Dict]
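A plausible way for `process` to combine the two abstract methods is to group the experiences, compute advantages per group, and merge the results. The sketch below assumes that flow; the actual implementation may differ. `GroupAdvantageSketch`, `MeanBaseline`, the `Experience` fields, the metric name, and the choice to average metrics across groups are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Experience:
    """Local stand-in for the library's Experience (fields are hypothetical)."""
    task_id: str
    reward: float
    advantage: Optional[float] = None


class GroupAdvantageSketch(ABC):
    """Local stand-in mirroring the documented methods (not the library class)."""

    @abstractmethod
    def group_experiences(self, exps: List[Experience]) -> Dict[str, List[Experience]]: ...

    @abstractmethod
    def calculate_group_advantage(
        self, group_id: str, exps: List[Experience]
    ) -> Tuple[List[Experience], Dict]: ...

    def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict]:
        processed: List[Experience] = []
        collected: Dict[str, List[float]] = {}
        for group_id, group in self.group_experiences(exps).items():
            group, group_metrics = self.calculate_group_advantage(group_id, group)
            processed.extend(group)
            for name, value in group_metrics.items():
                collected.setdefault(name, []).append(value)
        # Assumed aggregation: average each metric over all groups.
        metrics = {name: sum(vals) / len(vals) for name, vals in collected.items()}
        return processed, metrics


class MeanBaseline(GroupAdvantageSketch):
    """Hypothetical subclass: advantage = reward minus the group's mean reward."""

    def group_experiences(self, exps):
        groups: Dict[str, List[Experience]] = {}
        for exp in exps:
            groups.setdefault(exp.task_id, []).append(exp)
        return groups

    def calculate_group_advantage(self, group_id, exps):
        baseline = sum(exp.reward for exp in exps) / len(exps)
        for exp in exps:
            exp.advantage = exp.reward - baseline
        return exps, {"group_reward_mean": baseline}


processed, metrics = MeanBaseline().process(
    [Experience("a", 1.0), Experience("a", 0.0), Experience("b", 2.0)]
)
```

With this layout, a subclass only has to decide how to group and how to score a single group; the loop in `process` stays the same.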

classmethod compute_in_trainer() → bool: Whether the advantage should be computed in the trainer loop.