trinity.buffer.operators package

Module contents

class trinity.buffer.operators.ExperienceOperator

Bases: ABC

Base class for all experience operators in the Trinity framework. An operator takes a batch of experiences and transforms it, for example by filtering experiences out or modifying their fields.

abstract process(exps: List[Experience]) → Tuple[List[Experience], Dict]

Process a list of experiences and return a transformed list.

Parameters:

exps (List[Experience]) – List of experiences to process, which contains all experiences generated by the Explorer in one explore step.

Returns:

A tuple containing the processed list of experiences and a dictionary of metrics.

Return type:

Tuple[List[Experience], Dict]

classmethod create_operators(operator_configs: List[OperatorConfig]) → List[ExperienceOperator]

Create a list of ExperienceOperator instances based on the provided operator configurations.

Parameters:

operator_configs (List[OperatorConfig]) – List of operator configurations.

Returns:

List of instantiated ExperienceOperator objects.

Return type:

List[ExperienceOperator]
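
As an illustration of the config-to-instance step described above, here is a minimal sketch of a registry-based factory. It does not reproduce Trinity's implementation: the `OperatorConfig` stand-in with `name` and `args` fields, the `REGISTRY` mapping, and the `ThresholdOp` class are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class OperatorConfig:
    """Hypothetical stand-in; the real OperatorConfig fields may differ."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


class ThresholdOp:
    """Toy operator used only to exercise the factory."""

    def __init__(self, threshold: float = 0.0):
        self.threshold = threshold


# Hypothetical mapping from a config name to an operator class.
REGISTRY: Dict[str, Callable[..., Any]] = {"threshold": ThresholdOp}


def create_operators(operator_configs: List[OperatorConfig]) -> List[Any]:
    """Instantiate one operator per config, preserving the config order."""
    operators = []
    for cfg in operator_configs:
        if cfg.name not in REGISTRY:
            raise ValueError(f"Unknown operator: {cfg.name}")
        # The config's args are passed as keyword arguments to the constructor.
        operators.append(REGISTRY[cfg.name](**cfg.args))
    return operators


ops = create_operators([OperatorConfig("threshold", {"threshold": 0.5})])
```

Keeping the returned list in config order matters if the operators are later applied one after another, since filters and mappers generally do not commute.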

close()

Close the operator if it has any resources to release.
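
To make the interface concrete, the following sketch defines a custom operator against stand-in types. The `Experience` dataclass and the `ExperienceOperator` base below are simplified stand-ins written for this example (the real classes live in Trinity and carry more fields), and `RewardClipper` is a hypothetical operator, not one shipped with the package.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Stand-in for Trinity's Experience; only the fields used here."""
    reward: float
    info: Dict = field(default_factory=dict)


class ExperienceOperator(ABC):
    """Stand-in mirroring the documented process/close interface."""

    @abstractmethod
    def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict]:
        ...

    def close(self) -> None:
        # Nothing to release by default.
        pass


class RewardClipper(ExperienceOperator):
    """Hypothetical operator that clips rewards into [low, high]."""

    def __init__(self, low: float = -1.0, high: float = 1.0):
        self.low = low
        self.high = high

    def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict]:
        clipped = 0
        for exp in exps:
            new_reward = min(max(exp.reward, self.low), self.high)
            if new_reward != exp.reward:
                clipped += 1
            exp.reward = new_reward
        # Return the transformed experiences together with a metrics dict.
        return exps, {"clipped_count": clipped}


op = RewardClipper(low=0.0, high=1.0)
out, metrics = op.process([Experience(reward=2.0), Experience(reward=0.5)])
op.close()
```

The metric name `clipped_count` is arbitrary; the documented contract only requires that the second element of the returned tuple be a dictionary.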

class trinity.buffer.operators.RewardFilter(threshold: float = 0.0)

Bases: ExperienceOperator

Filter experiences based on the reward value.

Note: This filter assumes that the reward is already calculated and stored in the Experience object.

__init__(threshold: float = 0.0)

process(exps: List[Experience]) → Tuple[List[Experience], dict]

Filter experiences based on reward value.
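
The thresholding idea can be sketched as follows. The documentation does not state whether the comparison is strict, so keeping experiences with `reward >= threshold` is an assumption here, as are the stand-in `Experience` type, the helper name `filter_by_reward`, and the `filtered_count` metric.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Stand-in for Trinity's Experience; only the reward is modeled."""
    reward: float


def filter_by_reward(
    exps: List[Experience], threshold: float = 0.0
) -> Tuple[List[Experience], Dict]:
    """Keep experiences whose reward is at least `threshold` (assumed rule)."""
    kept = [e for e in exps if e.reward >= threshold]
    return kept, {"filtered_count": len(exps) - len(kept)}


kept, metrics = filter_by_reward(
    [Experience(-0.5), Experience(0.0), Experience(1.0)], threshold=0.0
)
```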

class trinity.buffer.operators.RewardSTDFilter(threshold: float = 0.0)

Bases: ExperienceOperator

Filter experiences based on the standard deviation of rewards within each group.

Note: This filter assumes that the reward is already calculated and stored in the Experience object.

__init__(threshold: float = 0.0)

process(exps: List[Experience]) → Tuple[List[Experience], dict]

Filter experiences based on the standard deviation of rewards within each group.
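
A group-wise standard-deviation filter might look like the sketch below. Several details are assumptions made for illustration: the `group_id` field used to form groups, the use of the population standard deviation, and the rule that a group is kept only when its standard deviation is strictly greater than the threshold.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Stand-in type; `group_id` is a hypothetical grouping key."""
    group_id: str
    reward: float


def filter_by_reward_std(
    exps: List[Experience], threshold: float = 0.0
) -> Tuple[List[Experience], Dict]:
    """Drop every group whose reward std is not above `threshold`."""
    groups: Dict[str, List[Experience]] = defaultdict(list)
    for exp in exps:
        groups[exp.group_id].append(exp)

    kept: List[Experience] = []
    for members in groups.values():
        # Population standard deviation of the rewards in this group.
        std = pstdev([m.reward for m in members])
        if std > threshold:
            kept.extend(members)
    return kept, {"filtered_count": len(exps) - len(kept)}


batch = [
    Experience("a", 1.0), Experience("a", 1.0),  # identical rewards, std 0
    Experience("b", 0.0), Experience("b", 1.0),  # std 0.5
]
kept, metrics = filter_by_reward_std(batch, threshold=0.0)
```

Groups whose rewards are all identical carry no relative signal between their members, which is one common motivation for dropping them in group-based training setups.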

class trinity.buffer.operators.RewardShapingMapper(reward_shaping_configs: List[Dict] | None = None)

Bases: ExperienceOperator

Reshapes the existing rewards of experiences based on rules or other advanced methods.

Note

This mapper assumes that the reward is already calculated and stored in the Experience object, and the necessary stats are already calculated and stored in the Experience info field.

__init__(reward_shaping_configs: List[Dict] | None = None)

Initializes the RewardShapingMapper.

Parameters:

reward_shaping_configs (list[dict], optional) –

A list of dictionaries containing reward shaping configurations. Each dictionary should include the following keys:

  • stats_key (str): The key of the target statistic, stored in the Experience info field, that is used to shape the reward.

  • op_type (str): The type of operator to apply between the reward and the target stats. Should be one of {"ADD", "SUB", "MUL", "DIV"}.

  • weight (float): The weight for the target stats.

Example

    [
        {
            "stats_key": "llm_quality_score",
            "op_type": "ADD",
            "weight": 1.0,
        }
    ]
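
One plausible reading of these configurations is sketched below: for each entry, the statistic is looked up in the experience's info field, scaled by `weight`, and combined with the current reward using the chosen operator. The exact formula, the handling of missing statistics, the stand-in `Experience` type, and the helper name `shape_rewards` are assumptions for illustration.

```python
import operator
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Map each documented op_type to a binary arithmetic function.
OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


@dataclass
class Experience:
    """Stand-in type with only the fields this sketch needs."""
    reward: float
    info: Dict = field(default_factory=dict)


def shape_rewards(
    exps: List[Experience], reward_shaping_configs: List[Dict]
) -> Tuple[List[Experience], Dict]:
    """Apply reward = reward <op> (weight * stat) for each config (assumed)."""
    for exp in exps:
        for cfg in reward_shaping_configs:
            stat = exp.info.get(cfg["stats_key"])
            if stat is None:
                # Assumed behavior: skip configs whose stat is missing.
                continue
            exp.reward = OPS[cfg["op_type"]](exp.reward, cfg["weight"] * stat)
    return exps, {}


configs = [{"stats_key": "llm_quality_score", "op_type": "ADD", "weight": 1.0}]
shaped, _ = shape_rewards(
    [Experience(reward=0.5, info={"llm_quality_score": 0.25})], configs
)
```

With the single `ADD` config above, a reward of 0.5 and a quality score of 0.25 yield a shaped reward of 0.75.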

process(exps: List[Experience]) → Tuple[List[Experience], Dict]

Process a list of experiences and return a transformed list.

Parameters:

exps (List[Experience]) – List of experiences to process, which contains all experiences generated by the Explorer in one explore step.

Returns:

A tuple containing the processed list of experiences and a dictionary of metrics.

Return type:

Tuple[List[Experience], Dict]

class trinity.buffer.operators.DataJuicerOperator(service_config: DataJuicerServiceConfig, operators: List[Dict] | None = None, config_path: str | None = None, np: int = 4)

Bases: ExperienceOperator

__init__(service_config: DataJuicerServiceConfig, operators: List[Dict] | None = None, config_path: str | None = None, np: int = 4)

Initialize the DataJuicerOperator.

Parameters:
  • service_config (DataJuicerServiceConfig) – The configuration for the DataJuicer service.

  • operators (List[Dict]) – A list of operators with their configurations.

  • config_path (str) – Path to the Data-Juicer configuration file.

  • np (int) – Number of processes to use for Data-Juicer. Default is 4.

Note

  • At least one of the following must be provided. If both are given, the one listed first takes priority:
    • config_path (str)

    • operators (List[Dict])
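
The priority rule in the note can be sketched as a small resolution helper. The function name `resolve_dj_source`, its return shape, and the choice to raise `ValueError` when neither argument is given are assumptions for this example; the actual constructor may behave differently.

```python
from typing import Dict, List, Optional, Tuple, Union


def resolve_dj_source(
    operators: Optional[List[Dict]] = None,
    config_path: Optional[str] = None,
) -> Tuple[str, Union[str, List[Dict]]]:
    """Pick the Data-Juicer config source, preferring config_path."""
    if config_path is not None:
        # Highest priority: an explicit configuration file.
        return "config_path", config_path
    if operators is not None:
        # Fallback: an inline list of operator configurations.
        return "operators", operators
    raise ValueError("Provide either config_path or operators.")


# Hypothetical inputs: the path and the operator entry are illustrative only.
both = resolve_dj_source(
    operators=[{"example_op": {}}], config_path="dj_config.yaml"
)
only_ops = resolve_dj_source(operators=[{"example_op": {}}])
```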

process(exps: List[Experience]) → Tuple[List[Experience], Dict]

Process a list of experiences and return a transformed list.

Parameters:

exps (List[Experience]) – List of experiences to process, which contains all experiences generated by the Explorer in one explore step.

Returns:

A tuple containing the processed list of experiences and a dictionary of metrics.

Return type:

Tuple[List[Experience], Dict]

close()

Close the DataJuicer client connection.