trinity.buffer.operators package
Module contents
- class trinity.buffer.operators.ExperienceOperator
Bases: ABC
Base class for all experience operators in the Trinity framework. Operators process experiences and apply transformations to them, such as filtering or reward shaping.
- abstract process(exps: List[Experience]) → Tuple[List[Experience], Dict]
Process a list of experiences and return a transformed list.
- Parameters:
exps (List[Experience]) – List of experiences to process, which contains all experiences generated by the Explorer in one explore step.
- Returns:
A tuple containing the processed list of experiences and a dictionary of metrics.
- Return type:
Tuple[List[Experience], Dict]
- classmethod create_operators(operator_configs: List[OperatorConfig]) → List[ExperienceOperator]
Create a list of ExperienceOperator instances based on the provided operator configurations.
- Parameters:
operator_configs (List[OperatorConfig]) – List of operator configurations.
- Returns:
List of instantiated ExperienceOperator objects.
- Return type:
List[ExperienceOperator]
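As a rough illustration of the interface above, the following sketch defines a custom operator and applies a list of operators in sequence. It does not import the framework: `Experience` and `ExperienceOperator` here are simplified stand-ins, and `RewardClipper` and `run_pipeline` are hypothetical names introduced only for this example. How the framework itself chains operators and merges metrics is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Simplified stand-in for the framework's Experience class."""
    reward: float
    info: Dict = field(default_factory=dict)


class ExperienceOperator(ABC):
    """Stand-in that mirrors the documented abstract interface."""

    @abstractmethod
    def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict]:
        ...


class RewardClipper(ExperienceOperator):
    """Hypothetical operator that clips each reward into [low, high]."""

    def __init__(self, low: float = -1.0, high: float = 1.0):
        self.low = low
        self.high = high

    def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict]:
        clipped = 0
        for exp in exps:
            new_reward = min(max(exp.reward, self.low), self.high)
            if new_reward != exp.reward:
                clipped += 1
            exp.reward = new_reward
        # Return the transformed experiences plus a metrics dictionary.
        return exps, {"clipped_count": clipped}


def run_pipeline(
    operators: List[ExperienceOperator], exps: List[Experience]
) -> Tuple[List[Experience], Dict]:
    """Apply operators in order and merge their metrics (assumed behavior)."""
    metrics: Dict = {}
    for op in operators:
        exps, op_metrics = op.process(exps)
        metrics.update(op_metrics)
    return exps, metrics
```

Each operator receives the output of the previous one, so the order of the list determines the order of the transformations.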
- class trinity.buffer.operators.RewardFilter(threshold: float = 0.0)
Bases: ExperienceOperator
Filter experiences based on the reward value.
Note: This filter assumes that the reward is already calculated and stored in the Experience object.
- process(exps: List[Experience]) → Tuple[List[Experience], dict]
Filter experiences based on reward value.
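The documentation does not state the comparison direction, so the following is only a minimal sketch of a threshold filter. It assumes that experiences with a reward greater than or equal to the threshold are kept; the `Experience` stand-in, the function name `filter_by_reward`, and the `filtered_count` metric key are illustrative choices, not the library's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Simplified stand-in carrying only a precomputed reward."""
    reward: float


def filter_by_reward(
    exps: List[Experience], threshold: float = 0.0
) -> Tuple[List[Experience], Dict]:
    # Assumption: keep experiences whose reward is at least the threshold.
    kept = [exp for exp in exps if exp.reward >= threshold]
    return kept, {"filtered_count": len(exps) - len(kept)}
```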
- class trinity.buffer.operators.RewardSTDFilter(threshold: float = 0.0)
Bases: ExperienceOperator
Filter experiences based on the standard deviation of rewards within each group.
Note: This filter assumes that the reward is already calculated and stored in the Experience object.
- process(exps: List[Experience]) → Tuple[List[Experience], dict]
Filter experiences based on reward std.
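To make the group-wise idea concrete, here is a hedged sketch of filtering by reward standard deviation. It assumes that experiences are grouped by a hypothetical `group_id` field, that the population standard deviation is used, and that a group is dropped when its standard deviation does not exceed the threshold. None of these details are confirmed by the documentation above.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Simplified stand-in; group_id is a hypothetical grouping field."""
    group_id: str
    reward: float


def filter_by_reward_std(
    exps: List[Experience], threshold: float = 0.0
) -> Tuple[List[Experience], Dict]:
    # Collect experiences by group, preserving first-seen group order.
    groups: Dict[str, List[Experience]] = defaultdict(list)
    for exp in exps:
        groups[exp.group_id].append(exp)

    kept: List[Experience] = []
    for members in groups.values():
        rewards = [m.reward for m in members]
        # Assumption: keep a group only if its rewards vary enough.
        if pstdev(rewards) > threshold:
            kept.extend(members)
    return kept, {"filtered_count": len(exps) - len(kept)}
```

With the default threshold of 0.0, this sketch removes groups in which every experience has the same reward, since such groups carry no relative signal within the group.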
- class trinity.buffer.operators.RewardShapingMapper(reward_shaping_configs: List[Dict] | None = None)
Bases: ExperienceOperator
Reshapes the existing rewards of experiences based on rules or other advanced methods.
Note
This mapper assumes that the reward is already calculated and stored in the Experience object, and the necessary stats are already calculated and stored in the Experience info field.
- __init__(reward_shaping_configs: List[Dict] | None = None)
Initializes the RewardShapingMapper.
- Parameters:
reward_shaping_configs (list[dict], optional) – A list of dictionaries containing reward shaping configurations. Each dictionary should include the following keys:
- stats_key (str): The key of the target stats field used to shape the reward.
- op_type (str): The type of operator to apply between the reward and the target stats. Must be one of {"ADD", "SUB", "MUL", "DIV"}.
- weight (float): The weight applied to the target stats.
Example:
    [
        {"stats_key": "llm_quality_score", "op_type": "ADD", "weight": 1.0},
    ]
- process(exps: List[Experience]) → Tuple[List[Experience], Dict]
Process a list of experiences and return a transformed list.
- Parameters:
exps (List[Experience]) – List of experiences to process, which contains all experiences generated by the Explorer in one explore step.
- Returns:
A tuple containing the processed list of experiences and a dictionary of metrics.
- Return type:
Tuple[List[Experience], Dict]
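The configuration keys described for this class suggest a simple arithmetic rule. The sketch below shows one plausible reading, in which each config updates the reward as `reward OP (weight * stats)`. That formula, the choice to skip missing stats, the `Experience` stand-in, and the function name `shape_rewards` are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Experience:
    """Simplified stand-in with a reward and an info dict holding stats."""
    reward: float
    info: Dict = field(default_factory=dict)


# Map the documented op_type values to Python arithmetic functions.
_OPS: Dict[str, Callable[[float, float], float]] = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def shape_rewards(
    exps: List[Experience], configs: List[Dict]
) -> Tuple[List[Experience], Dict]:
    for exp in exps:
        for cfg in configs:
            stat = exp.info.get(cfg["stats_key"])
            if stat is None:
                # Assumption: skip configs whose stats are not available.
                continue
            op = _OPS[cfg["op_type"]]
            # Assumed rule: reward <- reward OP (weight * stat).
            exp.reward = op(exp.reward, cfg["weight"] * stat)
    return exps, {}
```

Configs are applied in list order, so with mixed operator types the ordering affects the final reward.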
- class trinity.buffer.operators.DataJuicerOperator(service_config: DataJuicerServiceConfig, operators: List[Dict] | None = None, config_path: str | None = None, np: int = 4)
Bases: ExperienceOperator
- __init__(service_config: DataJuicerServiceConfig, operators: List[Dict] | None = None, config_path: str | None = None, np: int = 4)
Initialize the DataJuicerOperator.
- Parameters:
service_config (DataJuicerServiceConfig) – The configuration for the DataJuicer service.
operators (List[Dict], optional) – A list of operators with their configurations.
config_path (str, optional) – Path to the Data-Juicer configuration file.
np (int) – Number of processes to use for Data-Juicer. Default is 4.
Note
At least one of the following must be provided. They are listed from highest to lowest priority, so config_path takes precedence when both are given:
- config_path (str)
- operators (List[Dict])
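The priority rule in the note can be sketched as a small helper. This is only an illustration of the stated precedence: the function name `resolve_config_source`, its return shape, and raising `ValueError` when neither argument is given are assumptions, not the operator's actual implementation.

```python
from typing import Dict, List, Optional, Tuple, Union


def resolve_config_source(
    operators: Optional[List[Dict]] = None,
    config_path: Optional[str] = None,
) -> Tuple[str, Union[str, List[Dict]]]:
    """Pick the configuration source, preferring config_path over operators."""
    if config_path is not None:
        # Highest priority: an explicit configuration file.
        return "config_path", config_path
    if operators is not None:
        # Fallback: an inline list of operator configurations.
        return "operators", operators
    # Assumed behavior when neither source is supplied.
    raise ValueError("Provide either config_path or operators.")
```

For instance, calling the helper with both arguments returns the `config_path` entry and ignores the inline operator list.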
- process(exps: List[Experience]) → Tuple[List[Experience], Dict]
Process a list of experiences and return a transformed list.
- Parameters:
exps (List[Experience]) – List of experiences to process, which contains all experiences generated by the Explorer in one explore step.
- Returns:
A tuple containing the processed list of experiences and a dictionary of metrics.
- Return type:
Tuple[List[Experience], Dict]