
Ready-to-use Rewards

1. Overview

RM Gallery provides a comprehensive collection of ready-to-use reward models, organized by application scenarios to facilitate easy selection and implementation. Our reward model collection is continuously expanding.

2. Alignment

The Alignment module provides reward models for evaluating and optimizing model outputs according to human values, including safety, helpfulness, and factual accuracy.

About Reward Model Definitions

The HHH (Helpfulness, Harmlessness, and Honesty) reward models are defined following the principles and methodology described in *A General Language Assistant as a Laboratory for Alignment*. The specific HHH scenarios are mostly derived from two major reward model benchmarks: RewardBench2 and RMB Bench. Our reward model design adopts the Principle-Critic-Score paradigm, in which principles are generated by sampling 10% of the data from the relevant benchmark scenarios. For detailed settings and comparative results, please refer to the autoprinciple tutorial. Additionally, some reward models are sourced from external pre-defined implementations, such as detoxify.
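
Once RM Gallery is installed, a ready-to-use reward can be fetched by the register names listed in the tables below. The snippet is a minimal sketch: the import path and the `RewardRegistry.get` call are assumptions made for illustration, so consult the RM Gallery API reference for the exact interface.

```python
# Minimal sketch: fetch a ready-to-use reward by its register name.
# NOTE: the import path and RewardRegistry.get(...) are assumptions for
# illustration; check the RM Gallery API docs for the actual interface.
from rm_gallery.core.reward.registry import RewardRegistry  # assumed path

# "base_helpfulness_pointwise" is one of the register names listed below.
# Pointwise rewards score a single response; listwise variants rank
# several candidate responses for the same query.
reward = RewardRegistry.get("base_helpfulness_pointwise")
```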

2.1. Base Reward Models Overview

| Scenario | Description | Register Name | Principles Included |
| --- | --- | --- | --- |
| Helpfulness | The assistant aims to provide helpful and informative responses, answering user queries with relevant and accurate information. | `base_helpfulness_pointwise` / `base_helpfulness_listwise` | True |
| Harmlessness | The assistant aims to answer questions while avoiding harmful behaviors such as spreading misinformation, promoting harmful ideas, or engaging in other harmful activities. | `base_harmlessness_pointwise` / `base_harmlessness_listwise` | True |
| Honesty | The assistant aims to answer the user's questions truthfully, without bias or prejudice. | `base_honesty_pointwise` / `base_honesty_listwise` | True |

2.2. Harmlessness

| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Safety | RewardBench2 | Safety: Comply with or refuse prompts related to harmful use cases, as well as general compliance behaviors. | `safety_pointwise_reward` | True |
| Detoxify | detoxify | Detoxify: Detects different types of toxicity, such as threats, obscenity, and insults. | `DetoxifyReward` | False |

2.3. Helpfulness

| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Brainstorming | RMBBench | Brainstorming: Generates text to come up with new ideas or solutions, with an emphasis on creativity and driving thinking. | `brainstorming_listwise_reward` | False |
| Chat | RMBBench | Chat: Simulates human conversation, communicating on a variety of topics through text understanding and generation, with an emphasis on coherence and the natural flow of interaction. | `chat_listwise_reward` | True |
| Classification | RMBBench | Classification: Assigns predefined categories or labels to text based on its content. | `classification_listwise_reward` | False |
| Closed QA | RMBBench | Closed QA: Searches for direct answers to specific questions in given text sources (e.g., a given context or given options). | `closed_qa_listwise_reward` | False |
| Code | RMBBench | Code: Generates, understands, or modifies programming-language code within text. | `code_listwise_reward` | False |
| Generation | RMBBench | Generation: Creates new textual content, from articles to stories, with an emphasis on originality and creativity. | `generation_listwise_reward` | True |
| Open QA | RMBBench | Open QA: Searches for answers across a wide range of text sources; the challenge is to process large amounts of information and understand complex questions. | `open_qa_listwise_reward` | False |
| Reasoning | RMBBench | Reasoning: Processes and analyzes text to draw inferences, make predictions, or solve problems, requiring an understanding of the underlying concepts and relationships within the text. | `reasoning_listwise_reward` | False |
| Rewrite | RMBBench | Rewrite: Modifies existing text to alter its style while preserving the original information and intent. | `rewrite_listwise_reward` | False |
| Role Playing | RMBBench | Role Playing: Adopts specific characters or personas within text-based scenarios, engaging in dialogues or actions that reflect the assigned roles. | `role_palying_listwise_reward` | True |
| Summarization | RMBBench | Summarization: Compresses text into a short form that retains the main information, divided into extractive (selected directly from the original text) and abstractive (rewritten) summarization. | `summarization_listwise_reward` | True |
| Translation | RMBBench | Translation: Converts text from one language to another. | `translation_listwise_reward` | True |
| Focus | RMBBench | Focus: Detects high-quality, on-topic answers to general user queries. | `focus_pointwise_reward` | True |
| Math | RewardBench2 | Math: Solves math problems posed in open-ended human prompts, ranging from middle-school physics and geometry to college-level chemistry, calculus, combinatorics, and more. | `math_pointwise_reward` | True |
| Precise IF | RewardBench2 | Precise Instruction Following: Follows precise instructions, such as 'Answer without the letter u'. | `precise_if_pointwise_reward` | True |


2.4. Honesty

| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Factuality | RewardBench2 | Factuality: Detects hallucinations and other basic errors in completions. | `factuality_pointwise_reward` | True |

3. Math Evaluation Rewards

| Scenario | Description | Register Name |
| --- | --- | --- |
| Math Verify | Verifies mathematical expressions using the `math_verify` library, supporting both LaTeX and plain expressions (see the example below). | `math_verify_reward` |
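
As a quick illustration of what `math_verify_reward` builds on, the snippet below checks mathematical equivalence with the math_verify library's `parse` and `verify` entry points. The specific fraction-vs-decimal example is ours; see the library's documentation for its full option set.

```python
from math_verify import parse, verify

# Parse the reference answer and the model output; math_verify accepts
# LaTeX as well as plain expressions.
gold = parse("$\\frac{1}{2}$")
pred = parse("0.5")

# verify() checks mathematical equivalence rather than string equality,
# so a fraction and its decimal form are expected to compare as equal.
print(verify(gold, pred))  # True
```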

4. Code Quality Rewards

| Scenario | Description | Register Name |
| --- | --- | --- |
| Code Syntax | Checks code syntax using the abstract syntax tree (AST) to validate Python code blocks (see the sketch below). | `code_syntax_check` |
| Code Style | Basic code style checking, including indentation consistency and naming conventions. | `code_style` |
| Patch Similarity | Calculates the similarity between a generated patch and the oracle patch using `difflib.SequenceMatcher` (see the sketch below). | `code_patch_similarity` |
| Code Execution | Executes code against test cases and evaluates correctness based on the results. | `code_execution` |
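
The syntax-check and patch-similarity rewards map directly onto Python's standard library. The sketch below shows the underlying idea; the function names are ours, and the registered implementations may normalize inputs or weight scores differently.

```python
import ast
import difflib

def syntax_ok(code: str) -> float:
    """Return 1.0 if the snippet parses as valid Python, else 0.0."""
    try:
        ast.parse(code)
        return 1.0
    except SyntaxError:
        return 0.0

def patch_similarity(generated: str, oracle: str) -> float:
    """Similarity in [0, 1] via difflib.SequenceMatcher, as the table describes."""
    return difflib.SequenceMatcher(None, generated, oracle).ratio()

print(syntax_ok("def f(x):\n    return x + 1\n"))  # 1.0
print(syntax_ok("def f(x) return x"))              # 0.0 (missing colon)
print(patch_similarity("-a\n+b\n", "-a\n+c\n"))    # high but below 1.0
```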

5. General Evaluation Rewards

| Scenario | Description | Register Name |
| --- | --- | --- |
| Accuracy | Calculates accuracy (exact match rate) between generated content and the reference answer. | `accuracy` |
| F1 Score | Calculates word-level F1 between generated content and the reference answer, with a configurable tokenizer (see the sketch below). | `f1_score` |
| ROUGE | ROUGE-L similarity evaluation based on the longest common subsequence. | `rouge` |
| Number Accuracy | Checks numerical accuracy by comparing the numbers in generated versus reference content. | `number_accuracy` |
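
For instance, the word-level F1 computation can be sketched as follows, assuming a plain whitespace tokenizer (the registered `f1_score` makes the tokenizer configurable):

```python
from collections import Counter

def word_f1(generated: str, reference: str) -> float:
    """Word-level F1: token overlap counted with multiplicity."""
    gen, ref = generated.split(), reference.split()
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(word_f1("the cat sat", "the cat sat down"))  # ~0.857
```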

6. Format and Style Rewards

| Scenario | Description | Register Name |
| --- | --- | --- |
| Reasoning Format | Checks that thinking and answer sections are wrapped in the proper tags. | `reasoning_format` |
| Tool Call Format | Checks tool-call format, including the think, answer, and tool_call tags, with JSON validation. | `reasoning_tool_call_format` |
| Length Penalty | Length-based penalty for content that is too short or too long. | `length_penalty` |
| N-gram Repetition | Calculates an n-gram repetition penalty, with support for Chinese text and multiple penalty strategies (see the sketch below). | `ngram_repetition_penalty` |
| Privacy Leakage | Detects leaked private information: emails, phone numbers, ID cards, credit cards, and IP addresses. | `privacy_leakage` |
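
As one example of a repetition penalty strategy, the sketch below scores the fraction of duplicated n-grams. The registered `ngram_repetition_penalty` additionally handles Chinese text and offers several penalty strategies, so treat this as illustrative only.

```python
def ngram_repetition_penalty(text: str, n: int = 3) -> float:
    """Penalty in [0, 1]: the fraction of n-grams that are repeats.
    Whitespace tokenization is assumed here; Chinese text would need
    character-level tokenization instead."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

# 6 unique trigrams out of 10 total -> penalty of 0.4
print(ngram_repetition_penalty(
    "the cat sat on the mat the cat sat on the mat"))
```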