trinity.algorithm.advantage_fn package

Submodules

class trinity.algorithm.advantage_fn.AdvantageFn

Bases: ABC

classmethod compute_in_trainer() → bool: Whether the advantage should be computed in the trainer loop.

abstractmethod classmethod default_args() → Dict
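As an illustration of the interface shape listed above, the following minimal sketch shows how an abstract classmethod can be declared and overridden in Python. `SketchAdvantageFn` and `ConstantBaseline` are hypothetical stand-ins, not classes from this package, and the value returned by `compute_in_trainer()` here is an assumption made only for the sketch.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SketchAdvantageFn(ABC):
    """Hypothetical stand-in that mirrors the documented method signatures."""

    @classmethod
    def compute_in_trainer(cls) -> bool:
        # Assumed value for this sketch; the real default is not stated here.
        return True

    @classmethod
    @abstractmethod
    def default_args(cls) -> Dict:
        """Return the default keyword arguments for the constructor."""


class ConstantBaseline(SketchAdvantageFn):
    """Hypothetical subclass that supplies the required classmethod."""

    def __init__(self, baseline: float = 0.0) -> None:
        self.baseline = baseline

    @classmethod
    def default_args(cls) -> Dict:
        return {"baseline": 0.0}


# Build an instance from its own defaults.
fn = ConstantBaseline(**ConstantBaseline.default_args())
```

Because `default_args()` is abstract, instantiating `SketchAdvantageFn` directly raises `TypeError`; only subclasses that override it can be constructed.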

class trinity.algorithm.advantage_fn.GroupAdvantage

Base class for group-based advantage calculation.

abstractmethod calculate_group_advantage(group_id: str, exps: List[Experience]) → Tuple[List[Experience], Dict]

Calculate advantages for a group of experiences.

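Parameters:

group_id (str) -- Identifier of the group.

exps (List[Experience]) -- List of experiences in the group.

Returns:

A tuple containing the modified list of experiences and a dictionary of metrics.

Return type:

Tuple[List[Experience], Dict]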
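A concrete subclass decides how advantages are derived within a group. The sketch below shows one common choice, normalizing each reward by the group's mean and standard deviation. It is an assumption-laden illustration: the `Experience` dataclass, its `reward` and `advantage` fields, the `epsilon` parameter, and the metric names are all hypothetical and do not describe the library's actual data model.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Hypothetical stand-in; the real Experience class is not shown here."""

    reward: float
    advantage: float = 0.0


def calculate_group_advantage(
    group_id: str, exps: List[Experience], epsilon: float = 1e-6
) -> Tuple[List[Experience], Dict]:
    """Normalize rewards within one group (a sketch, not the library's code)."""
    if not exps:
        # Nothing to normalize for an empty group.
        return [], {}
    rewards = [exp.reward for exp in exps]
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # population standard deviation
    for exp in exps:
        # epsilon keeps the division finite when all rewards are equal.
        exp.advantage = (exp.reward - group_mean) / (group_std + epsilon)
    metrics = {"reward_mean": group_mean, "reward_std": group_std}
    return exps, metrics


group = [Experience(reward=1.0), Experience(reward=0.0), Experience(reward=1.0)]
scored, metrics = calculate_group_advantage("task-0", group)
```

Experiences with above-average reward end up with a positive advantage and the rest with a negative one, so the advantages in a group sum to approximately zero.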

classmethod compute_in_trainer() → bool: Whether the advantage should be computed in the trainer loop.

abstractmethod group_experiences(exps: List[Experience]) → Dict[str, List[Experience]]

Group experiences by a certain criterion.

Parameters:: exps (List[Experience]) -- List of experiences to be grouped.
Returns:: A dictionary where keys are group identifiers and values are lists of experiences.
Return type:: Dict[str, List[Experience]]
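The grouping criterion is left to the subclass. As a minimal sketch, the function below groups by a `task_id` field. Both that field and the `Experience` dataclass are hypothetical stand-ins chosen for illustration; the real class may expose a different key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Experience:
    """Hypothetical stand-in; `task_id` is an assumed grouping key."""

    task_id: str
    reward: float


def group_experiences(exps: List[Experience]) -> Dict[str, List[Experience]]:
    """Group experiences by task, preserving their original order."""
    groups: Dict[str, List[Experience]] = defaultdict(list)
    for exp in exps:
        groups[exp.task_id].append(exp)
    # Return a plain dict so missing keys raise KeyError instead of
    # silently creating empty groups.
    return dict(groups)


exps = [Experience("a", 1.0), Experience("b", 0.0), Experience("a", 0.5)]
groups = group_experiences(exps)
```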

process(exps: List[Experience]) → Tuple[List[Experience], Dict]

Process a list of experiences and return a transformed list.

Parameters:: exps (List[Experience]) -- List of experiences to process, which contains all experiences generated by the Explorer in one explore step.
Returns:: A tuple containing the processed list of experiences and a dictionary of metrics.
Return type:: Tuple[List[Experience], Dict]
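One plausible way for `process` to combine the two abstract methods is to group the batch, score each group, and merge the per-group metrics. The sketch below illustrates that flow under stated assumptions: `SketchGroupAdvantage`, `MeanBaseline`, the `Experience` fields, and the choice to average metrics across groups are all hypothetical, and the library's actual implementation may differ.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Hypothetical stand-in with assumed fields."""

    task_id: str
    reward: float
    advantage: float = 0.0


class SketchGroupAdvantage(ABC):
    """Hypothetical mirror of the documented GroupAdvantage interface."""

    @abstractmethod
    def group_experiences(self, exps: List[Experience]) -> Dict[str, List[Experience]]:
        ...

    @abstractmethod
    def calculate_group_advantage(
        self, group_id: str, exps: List[Experience]
    ) -> Tuple[List[Experience], Dict]:
        ...

    def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict]:
        processed: List[Experience] = []
        collected: Dict[str, List[float]] = defaultdict(list)
        for group_id, group in self.group_experiences(exps).items():
            group, metrics = self.calculate_group_advantage(group_id, group)
            processed.extend(group)
            for name, value in metrics.items():
                collected[name].append(value)
        # Assumed aggregation: average each metric over all groups.
        averaged = {name: sum(vals) / len(vals) for name, vals in collected.items()}
        return processed, averaged


class MeanBaseline(SketchGroupAdvantage):
    """Subtracts the group's mean reward from each reward."""

    def group_experiences(self, exps):
        groups = defaultdict(list)
        for exp in exps:
            groups[exp.task_id].append(exp)
        return dict(groups)

    def calculate_group_advantage(self, group_id, exps):
        baseline = sum(exp.reward for exp in exps) / len(exps)
        for exp in exps:
            exp.advantage = exp.reward - baseline
        return exps, {"group_reward_mean": baseline}


batch = [Experience("a", 1.0), Experience("a", 0.0), Experience("b", 2.0)]
processed, metrics = MeanBaseline().process(batch)
```

Keeping the grouping and scoring steps separate means a subclass only needs to define what a group is and how it is scored; the batch-level loop stays in one place.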