trinity.buffer.operators.filters.reward_filter module

class trinity.buffer.operators.filters.reward_filter.RewardFilter(threshold: float = 0.0)

Filter experiences by comparing each experience's reward value against threshold (default 0.0).

Note: This filter assumes that the reward is already calculated and stored in the Experience object.

process(exps: List[Experience]) → Tuple[List[Experience], dict]

Filter experiences based on reward value. Returns the filtered list of experiences together with a dict.
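As a rough illustration of threshold-based reward filtering, the sketch below uses a hypothetical stand-in dataclass (`SimpleExperience`) and a free function (`filter_by_reward`) rather than the real Trinity classes. Whether the actual filter keeps rewards at or above the threshold (as assumed here) or strictly above it, and what the returned dict contains, are assumptions that should be checked against the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimpleExperience:
    """Hypothetical stand-in for Experience; only the reward field is modeled."""
    reward: float


def filter_by_reward(
    exps: List[SimpleExperience], threshold: float = 0.0
) -> Tuple[List[SimpleExperience], dict]:
    # Assumption: keep experiences whose reward is at or above the threshold.
    kept = [exp for exp in exps if exp.reward >= threshold]
    # Assumption: the dict holds simple bookkeeping; the key name is illustrative.
    metrics = {"filtered_count": len(exps) - len(kept)}
    return kept, metrics


exps = [SimpleExperience(-1.0), SimpleExperience(0.0), SimpleExperience(0.5)]
kept, metrics = filter_by_reward(exps, threshold=0.0)
print([exp.reward for exp in kept], metrics)
```

Under these assumptions, an experience with a negative reward is dropped at the default threshold, while one with a reward of exactly 0.0 is kept.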

class trinity.buffer.operators.filters.reward_filter.RewardSTDFilter(threshold: float = 0.0)

Filter experiences based on the standard deviation of rewards within each group, compared against threshold (default 0.0).

Note: This filter assumes that the reward is already calculated and stored in the Experience object.

process(exps: List[Experience]) → Tuple[List[Experience], dict]

Filter experiences based on reward std. Returns the filtered list of experiences together with a dict.
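A minimal sketch of group-wise filtering by reward standard deviation follows. The stand-in dataclass `GroupedExperience`, its `group_id` field, and the function `filter_by_reward_std` are hypothetical names introduced for illustration. How the real filter forms groups, whether it uses population or sample standard deviation, whether the comparison is strict, and how single-member groups are handled are all assumptions here.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GroupedExperience:
    """Hypothetical stand-in for Experience with an explicit group key."""
    group_id: str
    reward: float


def filter_by_reward_std(
    exps: List[GroupedExperience], threshold: float = 0.0
) -> Tuple[List[GroupedExperience], dict]:
    # Bucket experiences by their (assumed) group key, preserving input order.
    groups: Dict[str, List[GroupedExperience]] = defaultdict(list)
    for exp in exps:
        groups[exp.group_id].append(exp)

    kept: List[GroupedExperience] = []
    for members in groups.values():
        # Assumption: a single-member group has no spread and is dropped.
        if len(members) < 2:
            continue
        # Assumption: population standard deviation of the group's rewards.
        std = statistics.pstdev([exp.reward for exp in members])
        # Assumption: keep the whole group only if std is strictly above threshold.
        if std > threshold:
            kept.extend(members)

    # Illustrative bookkeeping; the key name is not taken from the library.
    metrics = {"filtered_count": len(exps) - len(kept)}
    return kept, metrics


std_exps = [
    GroupedExperience("a", 1.0),
    GroupedExperience("a", 1.0),
    GroupedExperience("b", 0.0),
    GroupedExperience("b", 1.0),
]
std_kept, std_metrics = filter_by_reward_std(std_exps, threshold=0.0)
print([exp.group_id for exp in std_kept], std_metrics)
```

In this sketch, a group whose rewards are all identical has a standard deviation of zero, so it is dropped at the default threshold of 0.0, while a group with differing rewards is kept in full.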