trinity.buffer.operators.mappers.reward_shaping_mapper module

class trinity.buffer.operators.mappers.reward_shaping_mapper.RewardShapingMapper(reward_shaping_configs: List[Dict] | None = None)

Bases: ExperienceOperator

Reshapes the existing rewards of experiences using rules or other advanced methods.

Note

This mapper assumes that the reward has already been computed and stored in the Experience object, and that the required stats have already been computed and stored in the Experience info field.

__init__(reward_shaping_configs: List[Dict] | None = None)

Initializes the RewardShapingMapper.

Parameters:

reward_shaping_configs (list[dict], optional) –

A list of dictionaries containing reward shaping configurations. Each dictionary should include the following keys:

stats_key (str): The field key name of target stats used to shape the reward.
op_type (str): The type of operator to apply between the reward and the target stats. Should be one of {“ADD”, “SUB”, “MUL”, “DIV”}.
weight (float): The weight for the target stats.

Example

[
    {
        "stats_key": "llm_quality_score",
        "op_type": "ADD",
        "weight": 1.0,
    }
]
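
To make the three configuration keys concrete, here is a minimal sketch of how one rule might be applied to a single reward value. It is not the library's implementation: `shape_reward` is a hypothetical helper, `length_penalty` is a made-up stats key, and the handling of a missing stat or a zero divisor (leave the reward unchanged) is an assumption. The sketch also assumes that the weight scales the stat before the operator is applied.

```python
from typing import Dict


def shape_reward(reward: float, stats: Dict[str, float], config: Dict) -> float:
    """Hypothetical helper: apply one shaping rule to a reward value."""
    key = config["stats_key"]
    if key not in stats:
        # Assumption: a missing stat leaves the reward unchanged.
        return reward
    # Assumption: the weight scales the stat before the operator is applied.
    term = config.get("weight", 1.0) * stats[key]
    op = config["op_type"]
    if op == "ADD":
        return reward + term
    if op == "SUB":
        return reward - term
    if op == "MUL":
        return reward * term
    if op == "DIV":
        # Assumption: a zero divisor leaves the reward unchanged.
        return reward / term if term != 0 else reward
    raise ValueError(f"unknown op_type: {op}")


configs = [
    {"stats_key": "llm_quality_score", "op_type": "ADD", "weight": 1.0},
    {"stats_key": "length_penalty", "op_type": "SUB", "weight": 0.5},  # made-up key
]
stats = {"llm_quality_score": 0.5, "length_penalty": 0.5}

reward = 1.0
for cfg in configs:  # rules are applied in list order
    reward = shape_reward(reward, stats, cfg)
print(reward)  # 1.25
```

Under these assumptions the order of the list matters whenever ADD/SUB rules are mixed with MUL/DIV rules, since each rule acts on the reward left by the previous one.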

process(exps: List[Experience]) → Tuple[List[Experience], Dict]

Process a list of experiences and return a transformed list.

Parameters: exps (List[Experience]) – List of experiences to process, which contains all experiences generated by the Explorer in one explore step.
Returns: A tuple containing the processed list of experiences and a dictionary of metrics.
Return type: Tuple[List[Experience], Dict]
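
The call contract (a list of experiences in, a tuple of the list and a metrics dictionary out) can be illustrated with a toy stand-in. Everything here is hypothetical: `ToyExperience` models only the `reward` and `info` fields, `toy_process` handles a single ADD rule, and the `shaped_count` metric is invented for illustration, since this page does not specify what the metrics dictionary contains.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyExperience:
    """Hypothetical stand-in for Experience with only the fields used here."""
    reward: float
    info: Dict[str, float] = field(default_factory=dict)


def toy_process(
    exps: List[ToyExperience], key: str, weight: float
) -> Tuple[List[ToyExperience], Dict]:
    """Add weight * info[key] to each reward and return the batch with metrics."""
    shaped = 0
    for exp in exps:
        if key in exp.info:  # experiences without the stat are left untouched
            exp.reward += weight * exp.info[key]
            shaped += 1
    metrics = {"shaped_count": shaped}  # invented metric name
    return exps, metrics


batch = [ToyExperience(1.0, {"llm_quality_score": 0.5}), ToyExperience(0.0)]
out, metrics = toy_process(batch, "llm_quality_score", 1.0)
print([e.reward for e in out], metrics)  # [1.5, 0.0] {'shaped_count': 1}
```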