Ready-to-use Rewards
1. Overview
RM Gallery provides a comprehensive collection of ready-to-use reward models, organized by application scenarios to facilitate easy selection and implementation.
Our reward model collection is continuously expanding.
2. Alignment
The Alignment module provides reward models for evaluating and optimizing model outputs according to human values, including safety, helpfulness, and factual accuracy.
About Reward Model Definitions
The HHH (Helpfulness, Harmlessness, and Honesty) reward models are defined following the principles and methodology described in A General Language Assistant as a Laboratory for Alignment. The specific HHH scenarios are mostly derived from two major reward model benchmarks: RewardBench2 and RMB Bench. Our reward model design adopts the Principle-Critic-Score paradigm, where principles are generated by sampling 10% of data from the relevant benchmark scenarios. For detailed settings and comparative results, please refer to the autoprinciple tutorial.
Additionally, some reward models are sourced from external pre-defined implementations, such as detoxify.
2.1. Base Reward Models Overview
| Scenario | Description | Register Name | Principles Included |
| --- | --- | --- | --- |
| Helpfulness | The assistant aims to provide helpful and informative responses, answering user queries with relevant and accurate information. | base_helpfulness_pointwise / base_helpfulness_listwise | True |
| Harmlessness | The assistant aims to answer questions while avoiding harmful behaviors such as spreading misinformation, promoting harmful ideas, or engaging in other harmful activities. | base_harmlessness_pointwise / base_harmlessness_listwise | True |
| Honesty | The assistant aims to answer the user's questions truthfully, without bias or prejudice. | base_honesty_pointwise / base_honesty_listwise | True |
2.2. Harmlessness
| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Safety | RewardBench2 | Complies with or refuses prompts related to harmful use cases, as well as general compliance behaviors. | safety_pointwise_reward | True |
| Detoxify | detoxify | Detects different types of toxicity, such as threats, obscenity, and insults. | DetoxifyReward | False |
2.3. Helpfulness
| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Brainstorming | RMBBench | Generates text to come up with new ideas or solutions, with an emphasis on creativity and provoking thought. | brainstorming_listwise_reward | False |
| Chat | RMBBench | Simulates human conversation, communicating on a variety of topics through text understanding and generation, with an emphasis on coherence and a natural flow of interaction. | chat_listwise_reward | True |
| Classification | RMBBench | Assigns predefined categories or labels to text based on its content. | classification_listwise_reward | False |
| Closed QA | RMBBench | Searches for direct answers to specific questions in given text sources (e.g., a given context or given options). | closed_qa_listwise_reward | False |
| Code | RMBBench | Involves generating, understanding, or modifying programming language code within text. | code_listwise_reward | False |
| Generation | RMBBench | Creates new textual content, from articles to stories, with an emphasis on originality and creativity. | generation_listwise_reward | True |
| Open QA | RMBBench | Searches for answers across a wide range of text sources; the challenge is to process large amounts of information and understand complex questions. | open_qa_listwise_reward | False |
| Reasoning | RMBBench | Processes and analyzes text to draw inferences, make predictions, or solve problems, requiring an understanding of the underlying concepts and relationships within the text. | reasoning_listwise_reward | False |
| Rewrite | RMBBench | Modifies existing text to alter its style while preserving the original information and intent. | rewrite_listwise_reward | False |
| Role Playing | RMBBench | Adopts specific characters or personas within text-based scenarios, engaging in dialogues or actions that reflect the assigned roles. | role_palying_listwise_reward | True |
| Summarization | RMBBench | Compresses text into a short form that retains the main information; divided into extractive (directly selected from the original text) and abstractive (rewriting the information) summarization. | summarization_listwise_reward | True |
| Translation | RMBBench | Converts text from one language to another. | translation_listwise_reward | True |
| Focus | RMBBench | Detects high-quality, on-topic answers to general user queries. | focus_pointwise_reward | True |
| Math | RewardBench2 | Solves math problems from open-ended human prompts, ranging from middle-school physics and geometry to college-level chemistry, calculus, combinatorics, and more. | math_pointwise_reward | True |
| Precise IF | RewardBench2 | Precise instruction following: follows exact instructions, such as "Answer without the letter u". | precise_if_pointwise_reward | True |
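To make the Precise IF scenario concrete, here is a minimal rule-based sketch of how a verifiable constraint like "Answer without the letter u" could be scored. The function name and the toy constraint syntax are illustrative assumptions, not part of the registered `precise_if_pointwise_reward` implementation:

```python
def check_precise_instruction(response: str, constraint: str) -> float:
    """Score 1.0 if the response satisfies a simple constraint, else 0.0.

    Supports two toy constraint forms:
      "no_letter:<c>" -> response must not contain the letter <c>
      "max_words:<n>" -> response must have at most <n> words
    """
    kind, _, arg = constraint.partition(":")
    if kind == "no_letter":
        return 0.0 if arg.lower() in response.lower() else 1.0
    if kind == "max_words":
        return 1.0 if len(response.split()) <= int(arg) else 0.0
    raise ValueError(f"unknown constraint: {constraint}")
```

In practice, instruction-following rewards combine many such programmatic checks, one per instruction type.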
2.4. Honesty
| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Factuality | RewardBench2 | Detects hallucinations and other basic errors in completions. | factuality_pointwise_reward | True |
3. Math Evaluation Rewards
| Scenario | Description | Register Name |
| --- | --- | --- |
| Math Verify | Verifies mathematical expressions using the math_verify library, supporting both LaTeX and plain expressions. | math_verify_reward |
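The core idea behind this kind of verifier is that answers should be compared for mathematical equivalence rather than string equality: "1/2" and "0.5" are the same answer. The following is a deliberately simplified stdlib sketch of that idea (the real math_verify library handles LaTeX and symbolic expressions far more robustly); the function name is illustrative:

```python
from fractions import Fraction


def math_equivalent(pred: str, gold: str) -> bool:
    """Compare two plain numeric answers for exact equivalence.

    Fraction parses integers, decimals, and ratios like "1/2" without
    floating-point error, so "1/2" and "0.5" compare equal.
    """
    try:
        return Fraction(pred.strip()) == Fraction(gold.strip())
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to an exact string match.
        return pred.strip() == gold.strip()
```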
4. Code Quality Rewards
| Scenario | Description | Register Name |
| --- | --- | --- |
| Code Syntax | Checks code syntax using the Abstract Syntax Tree (AST) to validate Python code blocks. | code_syntax_check |
| Code Style | Basic code style checking, including indentation consistency and naming conventions. | code_style |
| Patch Similarity | Calculates similarity between a generated patch and the oracle patch using difflib.SequenceMatcher. | code_patch_similarity |
| Code Execution | Executes code against test cases and evaluates correctness based on the test case results. | code_execution |
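The syntax-check and patch-similarity rewards both reduce to short stdlib routines. Below is a minimal sketch of each technique (`ast.parse` for syntax validation, `difflib.SequenceMatcher` for patch similarity); the function names are illustrative, not the registered implementations:

```python
import ast
import difflib


def syntax_reward(code: str) -> float:
    """Return 1.0 if the snippet parses into a valid Python AST, else 0.0."""
    try:
        ast.parse(code)
        return 1.0
    except SyntaxError:
        return 0.0


def patch_similarity(generated: str, oracle: str) -> float:
    """Similarity ratio in [0, 1] between two patch texts."""
    return difflib.SequenceMatcher(None, generated, oracle).ratio()
```

Note that `SequenceMatcher.ratio()` rewards surface-level overlap, so a patch can score well without being functionally correct; that is why a separate execution-based reward exists.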
5. General Evaluation Rewards
| Scenario | Description | Register Name |
| --- | --- | --- |
| Accuracy | Calculates accuracy (exact match rate) between generated content and the reference answer. | accuracy |
| F1 Score | Calculates word-level F1 score between generated content and the reference answer, with a configurable tokenizer. | f1_score |
| ROUGE | ROUGE-L similarity evaluation using the longest common subsequence. | rouge |
| Number Accuracy | Checks numerical calculation accuracy by comparing the numbers in generated content against the reference. | number_accuracy |
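For reference, word-level F1 and ROUGE-L can each be written in a few lines. This sketch tokenizes on whitespace (the registered rewards allow a configurable tokenizer) and uses the classic dynamic-programming LCS for ROUGE-L; function names are illustrative:

```python
from collections import Counter


def word_f1(pred: str, ref: str) -> float:
    """Word-level F1 between prediction and reference (whitespace tokens)."""
    p, r = pred.split(), ref.split()
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def rouge_l(pred: str, ref: str) -> float:
    """ROUGE-L F-measure via longest common subsequence of word tokens."""
    p, r = pred.split(), ref.split()
    # O(len(p) * len(r)) LCS dynamic program.
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, pw in enumerate(p):
        for j, rw in enumerate(r):
            dp[i + 1][j + 1] = (
                dp[i][j] + 1 if pw == rw else max(dp[i][j + 1], dp[i + 1][j])
            )
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

The difference between the two: F1 ignores word order (it compares bags of tokens), while ROUGE-L rewards tokens appearing in the same order.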
| Scenario | Description | Register Name |
| --- | --- | --- |
| Reasoning Format | Checks the reward format for thinking and answer sections, requiring the proper tags. | reasoning_format |
| Tool Call Format | Checks tool call format, including think, answer, and tool_call tags, with JSON validation. | reasoning_tool_call_format |
| Length Penalty | Length-based penalty for content that is too short or too long. | length_penalty |
| N-gram Repetition | Calculates an n-gram repetition penalty, supporting Chinese text and multiple penalty strategies. | ngram_repetition_penalty |
| Privacy Leakage | Detects leakage of private information such as emails, phone numbers, ID cards, credit cards, and IP addresses. | privacy_leakage |
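As an illustration of the n-gram repetition idea, the sketch below scores the fraction of repeated n-grams in a text. It tokenizes on whitespace only; the registered reward additionally supports Chinese text (where character-level n-grams are more appropriate) and multiple penalty strategies. The function name is illustrative:

```python
def ngram_repetition(text: str, n: int = 3) -> float:
    """Fraction of n-grams that are repeats, as a penalty in [0, 1].

    0.0 means every n-gram is unique; values near 1.0 indicate the
    text loops over the same phrases.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)
```

A looping completion like "a b c a b c a b c" yields a high penalty, while non-repetitive text scores 0.0.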