trinity.common.rewards

Submodules

trinity.common.rewards.accuracy_reward module

Accuracy Reward Function Class.

class trinity.common.rewards.accuracy_reward.AccuracyReward(answer_parser: Callable[[str], str] | None = None)

Bases: RewardFn

A reward function that rewards correct answers. Ref: https://github.com/huggingface/open-r1/blob/main/src/open_r1/rewards.py

__init__(answer_parser: Callable[[str], str] | None = None)
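The `answer_parser` argument is typed as `Callable[[str], str]`, so any function that maps a raw response string to an extracted answer string fits the signature. The following is a minimal sketch of such a parser. The name `tag_answer_parser` and the `<answer>` tag convention are assumptions made for illustration; this is not the library's own default parser.

```python
import re

# Hypothetical helper, not part of trinity: pulls the answer out of <answer> tags.
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def tag_answer_parser(response: str) -> str:
    """Return the text inside the last <answer>...</answer> pair, or "" if none."""
    matches = _ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else ""
```

Under the documented signature, such a function could be passed as `AccuracyReward(answer_parser=tag_answer_parser)`.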

trinity.common.rewards.agents_reward module

trinity.common.rewards.countdown_reward module

Countdown Reward Function Class.

class trinity.common.rewards.countdown_reward.CountDownRewardFn

Bases: RewardFn

A reward function for the countdown task. Ref: Jiayi-Pan/TinyZero verl/utils/reward_score/countdown.py

__init__()
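For orientation, the countdown task is commonly posed as: combine a given list of numbers with `+`, `-`, `*`, `/` so that the result equals a target, using each number exactly once. The sketch below checks a candidate equation under those assumed rules. The function name `check_countdown` and the exact rules are illustrative assumptions, not the scoring logic of `CountDownRewardFn` or of the referenced TinyZero file.

```python
import ast
import operator
import re
from collections import Counter

# Only the four basic binary operators are allowed in this sketch.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def check_countdown(equation: str, numbers: list[int], target: int) -> bool:
    """True if the equation uses each number exactly once and hits the target."""
    used = [int(tok) for tok in re.findall(r"\d+", equation)]
    if Counter(used) != Counter(numbers):
        return False
    try:
        value = _eval(ast.parse(equation.strip(), mode="eval"))
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False
    return abs(value - target) < 1e-6
```

Walking the syntax tree instead of calling `eval` keeps the check limited to plain arithmetic on integer literals.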

trinity.common.rewards.dapo_reward module

Reward Function with Overlong Reward Shaping described in DAPO (https://arxiv.org/pdf/2503.14476)

class trinity.common.rewards.dapo_reward.MathDAPORewardFn(enable_overlong_penalty: bool | None = None, penalty_factor: float | None = None, max_response_length: int | None = None, cache_length: int | None = None)

Bases: RewardFn

A reward function for math tasks that follows the definition in DAPO.

__init__(enable_overlong_penalty: bool | None = None, penalty_factor: float | None = None, max_response_length: int | None = None, cache_length: int | None = None) -> None
compute_overlong_penalty(response_token)
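The DAPO paper defines a soft overlong punishment: no penalty up to `L_max - L_cache` tokens, a penalty that falls linearly from 0 to -1 across the final `L_cache` tokens, and -1 beyond `L_max`. The sketch below implements that piecewise rule. Mapping `L_max` and `L_cache` to `max_response_length` and `cache_length`, and treating `penalty_factor` as a plain multiplier, are assumptions; the actual behavior of `compute_overlong_penalty` is not documented here.

```python
def soft_overlong_penalty(
    response_len: int,
    max_response_length: int,
    cache_length: int,
    penalty_factor: float = 1.0,  # assumed to scale the penalty linearly
) -> float:
    """Piecewise length penalty following the DAPO soft overlong punishment."""
    expected_len = max_response_length - cache_length
    if response_len <= expected_len:
        # Within the expected length: no penalty.
        return 0.0
    if response_len <= max_response_length:
        # Inside the cache window: linear ramp from 0 down to -penalty_factor.
        return (expected_len - response_len) / cache_length * penalty_factor
    # Beyond the hard limit: full penalty.
    return -penalty_factor
```

For example, with `max_response_length=200` and `cache_length=50`, a 175-token response sits halfway through the cache window and receives a penalty of -0.5.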

trinity.common.rewards.format_reward module

Format Reward Function Class.

class trinity.common.rewards.format_reward.FormatReward(pattern: str | None = None)

Bases: RewardFn

A reward function that checks if the reasoning process is enclosed within <think> and </think> tags, while the final answer is enclosed within <answer> and </answer> tags. Ref: https://github.com/huggingface/open-r1/blob/main/src/open_r1/rewards.py

__init__(pattern: str | None = None)
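The tag check described above can be expressed as a single regular-expression match. The sketch below uses the pattern listed as `DEFAULT_FORMAT_PATTERN` for `MathRewardFn`; whether `FormatReward` falls back to the same pattern when `pattern` is `None`, and which regex flags it applies, are assumptions. The helper name `matches_format` is hypothetical.

```python
import re

# Assumed pattern: copied from the DEFAULT_FORMAT_PATTERN shown for MathRewardFn.
PATTERN = r".*?<think>.*?</think>\s*<answer>.*?</answer>\s*$"


def matches_format(response: str, pattern: str = PATTERN) -> bool:
    """True if the response has a <think> block followed by an <answer> block."""
    # re.DOTALL lets "." span newlines; the library's actual flags are an assumption.
    return re.match(pattern, response, re.DOTALL) is not None
```

Note that the doubled backslashes in the documented default (`\\s`) are how the string's repr is displayed; the regex itself contains `\s`.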

trinity.common.rewards.human_reward module

trinity.common.rewards.math_reward module

Math Reward Function Class.

class trinity.common.rewards.math_reward.MathRewardFn(answer_parser=<function simple_answer_parser>, pattern='.*?<think>.*?</think>\\s*<answer>.*?</answer>\\s*$')

Bases: RewardFn

A reward function for math tasks.

DEFAULT_FORMAT_PATTERN = '.*?<think>.*?</think>\\s*<answer>.*?</answer>\\s*$'
DEFAULT_ANSWER_PARSER() -> str
__init__(answer_parser=<function simple_answer_parser>, pattern='.*?<think>.*?</think>\\s*<answer>.*?</answer>\\s*$') -> None
class trinity.common.rewards.math_reward.MathBoxedRewardFn(**kwargs)

Bases: RewardFn

A reward function for math tasks.

__init__(**kwargs) -> None

trinity.common.rewards.reward_fn module

Base Reward Function Class.

class trinity.common.rewards.reward_fn.RewardFn(**kwargs)

Bases: ABC

Base Reward Function Class.

abstract __init__(**kwargs) -> None
class trinity.common.rewards.reward_fn.RMGalleryFn(reward_name, **kwargs)

Bases: RewardFn

Reward Function from RMGallery. https://github.com/modelscope/RM-Gallery

__init__(reward_name, **kwargs)
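`RewardFn` is documented as an `ABC` with an abstract `__init__`, so concrete reward functions must subclass it and supply their own constructor. The sketch below illustrates that pattern with a local stand-in class rather than the real base. `RewardFnSketch`, `ExactMatchReward`, and the `__call__` signature are all hypothetical; the methods a real subclass must implement are not listed in this reference.

```python
from abc import ABC, abstractmethod


class RewardFnSketch(ABC):
    """Local stand-in mirroring the documented shape: an ABC with an abstract __init__."""

    @abstractmethod
    def __init__(self, **kwargs) -> None:
        ...


class ExactMatchReward(RewardFnSketch):
    """Hypothetical subclass; the __call__ signature is an assumption."""

    def __init__(self, correct: float = 1.0, wrong: float = 0.0) -> None:
        self.correct = correct
        self.wrong = wrong

    def __call__(self, response: str, truth: str) -> float:
        # Compare after trimming surrounding whitespace.
        return self.correct if response.strip() == truth.strip() else self.wrong
```

Because `__init__` is abstract, instantiating the base class directly raises `TypeError`, while the subclass that overrides it can be constructed normally.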

trinity.common.rewards.tool_reward module

trinity.common.rewards.utils module

Converts string list to structured ChatMessage list for debugging.

Parameters:

messages – List of alternating user/assistant messages

Returns:

List of structured ChatMessage objects

Module contents

Reward functions for RFT

class trinity.common.rewards.RewardFn(**kwargs)

Bases: ABC

Base Reward Function Class.

abstract __init__(**kwargs) -> None
class trinity.common.rewards.RMGalleryFn(reward_name, **kwargs)

Bases: RewardFn

Reward Function from RMGallery. https://github.com/modelscope/RM-Gallery

__init__(reward_name, **kwargs)
class trinity.common.rewards.AccuracyReward(answer_parser: Callable[[str], str] | None = None)

Bases: RewardFn

A reward function that rewards correct answers. Ref: https://github.com/huggingface/open-r1/blob/main/src/open_r1/rewards.py

__init__(answer_parser: Callable[[str], str] | None = None)
class trinity.common.rewards.CountDownRewardFn

Bases: RewardFn

A reward function for the countdown task. Ref: Jiayi-Pan/TinyZero verl/utils/reward_score/countdown.py

__init__()
class trinity.common.rewards.FormatReward(pattern: str | None = None)

Bases: RewardFn

A reward function that checks if the reasoning process is enclosed within <think> and </think> tags, while the final answer is enclosed within <answer> and </answer> tags. Ref: https://github.com/huggingface/open-r1/blob/main/src/open_r1/rewards.py

__init__(pattern: str | None = None)
class trinity.common.rewards.MathRewardFn(answer_parser=<function simple_answer_parser>, pattern='.*?<think>.*?</think>\\s*<answer>.*?</answer>\\s*$')

Bases: RewardFn

A reward function for math tasks.

DEFAULT_FORMAT_PATTERN = '.*?<think>.*?</think>\\s*<answer>.*?</answer>\\s*$'
DEFAULT_ANSWER_PARSER() -> str
__init__(answer_parser=<function simple_answer_parser>, pattern='.*?<think>.*?</think>\\s*<answer>.*?</answer>\\s*$') -> None
class trinity.common.rewards.MathBoxedRewardFn(**kwargs)

Bases: RewardFn

A reward function for math tasks.

__init__(**kwargs) -> None
class trinity.common.rewards.MathDAPORewardFn(enable_overlong_penalty: bool | None = None, penalty_factor: float | None = None, max_response_length: int | None = None, cache_length: int | None = None)

Bases: RewardFn

A reward function for math tasks that follows the definition in DAPO.

__init__(enable_overlong_penalty: bool | None = None, penalty_factor: float | None = None, max_response_length: int | None = None, cache_length: int | None = None) -> None
compute_overlong_penalty(response_token)