trinity.algorithm.sample_strategy

Submodules

trinity.algorithm.sample_strategy.mix_sample_strategy module

class trinity.algorithm.sample_strategy.mix_sample_strategy.MixSampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: SampleStrategy

The default sample strategy.

__init__(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

sample(step: int) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

classmethod default_args() → Dict[source]

Get the default arguments of the sample strategy.

trinity.algorithm.sample_strategy.mix_sample_strategy.to_data_proto_mix(experiences: Experiences, is_expert_mask: tensor) → DataProto[source]
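
The signature above takes an is_expert_mask alongside the experiences. The sketch below illustrates one plausible reading of that parameter: a boolean mask with one entry per experience, marking which entries came from an expert source. It uses plain Python lists rather than tensors, and the function mix_batches and its arguments are hypothetical names for illustration, not part of Trinity.

```python
from typing import List, Tuple


def mix_batches(usual: List[str], expert: List[str]) -> Tuple[List[str], List[bool]]:
    """Concatenate two batches and build a per-item expert mask (hypothetical helper)."""
    batch = usual + expert
    # False for items from the usual source, True for items from the expert source.
    is_expert_mask = [False] * len(usual) + [True] * len(expert)
    return batch, is_expert_mask


batch, is_expert_mask = mix_batches(["rollout_a", "rollout_b"], ["expert_a"])

# Select only the expert items using the mask.
expert_only = [item for item, flag in zip(batch, is_expert_mask) if flag]
```

A mask of this shape lets downstream code treat the two kinds of data differently (for example, applying a different loss term) while keeping them in a single batch.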

trinity.algorithm.sample_strategy.sample_strategy module

class trinity.algorithm.sample_strategy.sample_strategy.SampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: ABC

__init__(buffer_config: BufferConfig, trainer_type: str, **kwargs) → None[source]

abstract sample(step: int) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

abstract warmup_state(step: int) → Tuple[bool, bool][source]

Check the warmup state of the current step.

Parameters: step (int) – The current step number.
Returns: bool: Whether the current step is in warmup. bool: Whether warmup finishes on this step.
Return type: Tuple[bool, bool]

abstract classmethod default_args() → dict[source]

Get the default arguments of the sample strategy.
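
As a rough illustration of the interface documented above, the following sketch defines a stand-in abstract class with the same three methods and a toy subclass. It does not import Trinity: SampleStrategySketch and ListSampleStrategy are hypothetical names, a plain list stands in for the buffer configuration, and the batch_size argument is an assumption made for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class SampleStrategySketch(ABC):
    """Stand-in mirroring the documented interface; not the Trinity class."""

    def __init__(self, buffer_config: Any, trainer_type: str, **kwargs) -> None:
        self.buffer_config = buffer_config
        self.trainer_type = trainer_type

    @abstractmethod
    def sample(self, step: int) -> Tuple[Any, Dict, List]:
        """Return (sampled data, metrics, representative data)."""

    @abstractmethod
    def warmup_state(self, step: int) -> Tuple[bool, bool]:
        """Return (in warmup, warmup finishes on this step)."""

    @classmethod
    @abstractmethod
    def default_args(cls) -> dict:
        """Return the default arguments of the strategy."""


class ListSampleStrategy(SampleStrategySketch):
    """Hypothetical strategy that reads fixed-size batches from a list."""

    def __init__(self, buffer_config: Any, trainer_type: str, **kwargs) -> None:
        super().__init__(buffer_config, trainer_type)
        # Explicit keyword arguments override the defaults.
        args = {**self.default_args(), **kwargs}
        self.batch_size = args["batch_size"]

    def sample(self, step: int) -> Tuple[Any, Dict, List]:
        start = step * self.batch_size
        batch = self.buffer_config[start:start + self.batch_size]
        metrics = {"sample/batch_size": len(batch)}
        representative = batch[:1]  # first item only, for logging
        return batch, metrics, representative

    def warmup_state(self, step: int) -> Tuple[bool, bool]:
        # This toy strategy has no warmup phase.
        return False, False

    @classmethod
    def default_args(cls) -> dict:
        return {"batch_size": 2}


strategy = ListSampleStrategy(list(range(6)), trainer_type="sketch")
data, metrics, examples = strategy.sample(step=1)
```

Callers unpack the three-element tuple from sample: the data goes to the trainer, while the metrics and representative items are intended for logging.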

class trinity.algorithm.sample_strategy.sample_strategy.WarmupSampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: SampleStrategy

The default sample strategy.

__init__(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

sample(step: int, **kwargs) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

warmup_state(step: int) → Tuple[bool, bool][source]

Check the warmup state of the current step.

Parameters: step (int) – The current step number.
Returns: bool: Whether the current step is in warmup. bool: Whether warmup finishes on this step.
Return type: Tuple[bool, bool]

classmethod default_args() → dict[source]

Get the default arguments of the sample strategy.
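
The two booleans returned by warmup_state can be illustrated with a minimal sketch. It assumes warmup lasts for a fixed number of steps and that steps are counted from 1; the warmup_steps parameter and the comparison rule are assumptions for illustration, not taken from the Trinity source.

```python
from typing import Tuple


def warmup_state(step: int, warmup_steps: int) -> Tuple[bool, bool]:
    """Return (in warmup, warmup finishes on this step) under the assumed rule."""
    in_warmup = step <= warmup_steps
    finishes_now = step == warmup_steps
    return in_warmup, finishes_now


# Collect the state for the first four steps with a two-step warmup.
states = [warmup_state(step, warmup_steps=2) for step in range(1, 5)]
```

Under this rule the second flag is true on exactly one step, which gives a caller a single point at which to switch from warmup behavior to regular training.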

class trinity.algorithm.sample_strategy.sample_strategy.DefaultSampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: SampleStrategy

__init__(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

sample(step: int, **kwargs) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

warmup_state(step: int) → Tuple[bool, bool][source]

Check the warmup state of the current step.

Parameters: step (int) – The current step number.
Returns: bool: Whether the current step is in warmup. bool: Whether warmup finishes on this step.
Return type: Tuple[bool, bool]

classmethod default_args() → dict[source]

Get the default arguments of the sample strategy.

class trinity.algorithm.sample_strategy.sample_strategy.DPOSampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: WarmupSampleStrategy

sample(step: int, **kwargs) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

trinity.algorithm.sample_strategy.utils module

trinity.algorithm.sample_strategy.utils.to_data_proto(experiences: Experiences) → DataProto[source]

trinity.algorithm.sample_strategy.utils.representative_sample(experiences: List[Experience]) → List[dict][source]

Module contents

class trinity.algorithm.sample_strategy.SampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: ABC

__init__(buffer_config: BufferConfig, trainer_type: str, **kwargs) → None[source]

abstract sample(step: int) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

abstract warmup_state(step: int) → Tuple[bool, bool][source]

Check the warmup state of the current step.

Parameters: step (int) – The current step number.
Returns: bool: Whether the current step is in warmup. bool: Whether warmup finishes on this step.
Return type: Tuple[bool, bool]

abstract classmethod default_args() → dict[source]

Get the default arguments of the sample strategy.

class trinity.algorithm.sample_strategy.DefaultSampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: SampleStrategy

__init__(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

sample(step: int, **kwargs) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

warmup_state(step: int) → Tuple[bool, bool][source]

Check the warmup state of the current step.

Parameters: step (int) – The current step number.
Returns: bool: Whether the current step is in warmup. bool: Whether warmup finishes on this step.
Return type: Tuple[bool, bool]

classmethod default_args() → dict[source]

Get the default arguments of the sample strategy.

class trinity.algorithm.sample_strategy.WarmupSampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: SampleStrategy

The default sample strategy.

__init__(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

sample(step: int, **kwargs) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

warmup_state(step: int) → Tuple[bool, bool][source]

Check the warmup state of the current step.

Parameters: step (int) – The current step number.
Returns: bool: Whether the current step is in warmup. bool: Whether warmup finishes on this step.
Return type: Tuple[bool, bool]

classmethod default_args() → dict[source]

Get the default arguments of the sample strategy.

class trinity.algorithm.sample_strategy.MixSampleStrategy(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

Bases: SampleStrategy

The default sample strategy.

__init__(buffer_config: BufferConfig, trainer_type: str, **kwargs)[source]

sample(step: int) → Tuple[Any, Dict, List][source]

Sample data from buffer.

Parameters: step (int) – The current step number.
Returns: Any: The sampled data. Dict: Metrics for logging. List: Representative data for logging.
Return type: Tuple[Any, Dict, List]

classmethod default_args() → Dict[source]

Get the default arguments of the sample strategy.