Ready-to-use Rewards
1. Overview
RM Gallery provides a comprehensive collection of ready-to-use reward models, organized by application scenarios to facilitate easy selection and implementation.
Our reward model collection is continuously expanding.
2. Alignment
The Alignment module provides reward models for evaluating and optimizing model outputs according to human values, including safety, helpfulness, and factual accuracy.
About Reward Model Definitions
The HHH (Helpfulness, Harmlessness, and Honesty) reward models are defined following the principles and methodology described in A General Language Assistant as a Laboratory for Alignment. The specific HHH scenarios are mostly derived from two major reward model benchmarks: RewardBench2 and RMB Bench. Our reward model design adopts the Principle-Critic-Score paradigm, where principles are generated by sampling 10% of data from the relevant benchmark scenarios. For detailed settings and comparative results, please refer to the autoprinciple tutorial.
Additionally, some reward models are sourced from external pre-defined implementations, such as detoxify.
2.1. Base Reward Models Overview
| Scenario | Description | Register Name | Principles Included |
| --- | --- | --- | --- |
| Helpfulness | The assistant aims to provide helpful and informative responses, answering user queries with relevant and accurate information. | base_helpfulness_pointwise / base_helpfulness_listwise | True |
| Harmlessness | The assistant aims to answer questions while avoiding harmful behaviors such as spreading misinformation, promoting harmful ideas, or engaging in other harmful activities. | base_harmlessness_pointwise / base_harmlessness_listwise | True |
| Honesty | The assistant aims to answer the user's questions truthfully, without bias or prejudice. | base_honesty_pointwise / base_honesty_listwise | True |
2.2. Harmlessness
| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Safety | RewardBench2 | Complies with or refuses prompts related to harmful use cases, as well as general compliance behaviors. | safety_pointwise_reward | True |
| Detoxify | detoxify | Detects different types of toxicity, such as threats, obscenity, and insults. | DetoxifyReward | False |
2.3. Helpfulness
| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Brainstorming | RMBBench | Generates text to come up with new ideas or solutions, with an emphasis on creativity and provoking thought. | brainstorming_listwise_reward | False |
| Chat | RMBBench | Simulates human conversation, communicating on a variety of topics through text understanding and generation, with an emphasis on coherence and a natural flow of interaction. | chat_listwise_reward | True |
| Classification | RMBBench | Assigns predefined categories or labels to text based on its content. | classification_listwise_reward | False |
| Closed QA | RMBBench | Searches for direct answers to specific questions in given text sources (e.g., a given context or given options). | closed_qa_listwise_reward | False |
| Code | RMBBench | Involves generating, understanding, or modifying programming language code within text. | code_listwise_reward | False |
| Generation | RMBBench | Creates new textual content, from articles to stories, with an emphasis on originality and creativity. | generation_listwise_reward | True |
| Open QA | RMBBench | Searches for answers across a wide range of text sources; the challenge is to process large amounts of information and understand complex questions. | open_qa_listwise_reward | False |
| Reasoning | RMBBench | Processes and analyzes text to draw inferences, make predictions, or solve problems, requiring an understanding of the underlying concepts and relationships within the text. | reasoning_listwise_reward | False |
| Rewrite | RMBBench | Modifies existing text to alter its style while preserving the original information and intent. | rewrite_listwise_reward | False |
| Role Playing | RMBBench | Adopts specific characters or personas within text-based scenarios, engaging in dialogues or actions that reflect the assigned roles. | role_palying_listwise_reward | True |
| Summarization | RMBBench | Compresses text into a short form that retains the main information; divided into extractive (directly selected from the original text) and abstractive (rewriting the information) summarization. | summarization_listwise_reward | True |
| Translation | RMBBench | Converts text from one language to another. | translation_listwise_reward | True |
| Focus | RMBBench | Detects high-quality, on-topic answers to general user queries. | focus_pointwise_reward | True |
| Math | RewardBench2 | Solves math problems from open-ended human prompts, ranging from middle-school physics and geometry to college-level chemistry, calculus, combinatorics, and more. | math_pointwise_reward | True |
| Precise IF | RewardBench2 | Precise instruction following: follows exact instructions, such as "Answer without the letter u". | precise_if_pointwise_reward | True |
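To make the Precise IF scenario concrete, here is a minimal rule-based sketch of how a verifiable constraint like "Answer without the letter u" could be scored. The function name and the toy constraint syntax are illustrative assumptions, not part of the registered `precise_if_pointwise_reward` implementation:

```python
def check_precise_instruction(response: str, constraint: str) -> float:
    """Score 1.0 if the response satisfies a simple constraint, else 0.0.

    Supports two toy constraint forms:
      "no_letter:<c>" -> response must not contain the letter <c>
      "max_words:<n>" -> response must have at most <n> words
    """
    kind, _, arg = constraint.partition(":")
    if kind == "no_letter":
        return 0.0 if arg.lower() in response.lower() else 1.0
    if kind == "max_words":
        return 1.0 if len(response.split()) <= int(arg) else 0.0
    raise ValueError(f"unknown constraint: {constraint}")
```

In practice, instruction-following rewards combine many such programmatic checks, one per instruction type.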
2.4. Honesty
| Scenario | Source | Description | Register Name | Principles Included |
| --- | --- | --- | --- | --- |
| Factuality | RewardBench2 | Detects hallucinations and other basic errors in completions. | factuality_pointwise_reward | True |
3. Math Evaluation Rewards
| Scenario | Description | Register Name |
| --- | --- | --- |
| Math Verify | Verifies mathematical expressions using the math_verify library, supporting both LaTeX and plain expressions. | math_verify_reward |
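The core idea behind this kind of verifier is that answers should be compared for mathematical equivalence rather than string equality: "1/2" and "0.5" are the same answer. The following is a deliberately simplified stdlib sketch of that idea (the real math_verify library handles LaTeX and symbolic expressions far more robustly); the function name is illustrative:

```python
from fractions import Fraction


def math_equivalent(pred: str, gold: str) -> bool:
    """Compare two plain numeric answers for exact equivalence.

    Fraction parses integers, decimals, and ratios like "1/2" without
    floating-point error, so "1/2" and "0.5" compare equal.
    """
    try:
        return Fraction(pred.strip()) == Fraction(gold.strip())
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to an exact string match.
        return pred.strip() == gold.strip()
```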
4. Code Quality Rewards
| Scenario | Description | Register Name |
| --- | --- | --- |
| Code Syntax | Checks code syntax using the Abstract Syntax Tree (AST) to validate Python code blocks. | code_syntax_check |
| Code Style | Basic code style checking, including indentation consistency and naming conventions. | code_style |
| Patch Similarity | Calculates similarity between a generated patch and the oracle patch using difflib.SequenceMatcher. | code_patch_similarity |
| Code Execution | Executes code against test cases and evaluates correctness based on the test case results. | code_execution |
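The syntax-check and patch-similarity rewards both reduce to short stdlib routines. Below is a minimal sketch of each technique (`ast.parse` for syntax validation, `difflib.SequenceMatcher` for patch similarity); the function names are illustrative, not the registered implementations:

```python
import ast
import difflib


def syntax_reward(code: str) -> float:
    """Return 1.0 if the snippet parses into a valid Python AST, else 0.0."""
    try:
        ast.parse(code)
        return 1.0
    except SyntaxError:
        return 0.0


def patch_similarity(generated: str, oracle: str) -> float:
    """Similarity ratio in [0, 1] between two patch texts."""
    return difflib.SequenceMatcher(None, generated, oracle).ratio()
```

Note that `SequenceMatcher.ratio()` rewards surface-level overlap, so a patch can score well without being functionally correct; that is why a separate execution-based reward exists.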
5. General Evaluation Rewards
| Scenario | Description | Register Name |
| --- | --- | --- |
| Accuracy | Calculates accuracy (exact match rate) between generated content and the reference answer. | accuracy |
| F1 Score | Calculates word-level F1 score between generated content and the reference answer, with a configurable tokenizer. | f1_score |
| ROUGE | ROUGE-L similarity evaluation using the longest common subsequence. | rouge |
| Number Accuracy | Checks numerical calculation accuracy by comparing the numbers in generated content against the reference. | number_accuracy |
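For reference, word-level F1 and ROUGE-L can each be written in a few lines. This sketch tokenizes on whitespace (the registered rewards allow a configurable tokenizer) and uses the classic dynamic-programming LCS for ROUGE-L; function names are illustrative:

```python
from collections import Counter


def word_f1(pred: str, ref: str) -> float:
    """Word-level F1 between prediction and reference (whitespace tokens)."""
    p, r = pred.split(), ref.split()
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def rouge_l(pred: str, ref: str) -> float:
    """ROUGE-L F-measure via longest common subsequence of word tokens."""
    p, r = pred.split(), ref.split()
    # O(len(p) * len(r)) LCS dynamic program.
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, pw in enumerate(p):
        for j, rw in enumerate(r):
            dp[i + 1][j + 1] = (
                dp[i][j] + 1 if pw == rw else max(dp[i][j + 1], dp[i + 1][j])
            )
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

The difference between the two: F1 ignores word order (it compares bags of tokens), while ROUGE-L rewards tokens appearing in the same order.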
| Scenario | Description | Register Name |
| --- | --- | --- |
| Reasoning Format | Checks the reward format for thinking and answer sections, requiring the proper tags. | reasoning_format |
| Tool Call Format | Checks tool call format, including think, answer, and tool_call tags, with JSON validation. | reasoning_tool_call_format |
| Length Penalty | Length-based penalty for content that is too short or too long. | length_penalty |
| N-gram Repetition | Calculates an n-gram repetition penalty, supporting Chinese text and multiple penalty strategies. | ngram_repetition_penalty |
| Privacy Leakage | Detects leakage of private information such as emails, phone numbers, ID cards, credit cards, and IP addresses. | privacy_leakage |
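As an illustration of the n-gram repetition idea, the sketch below scores the fraction of repeated n-grams in a text. It tokenizes on whitespace only; the registered reward additionally supports Chinese text (where character-level n-grams are more appropriate) and multiple penalty strategies. The function name is illustrative:

```python
def ngram_repetition(text: str, n: int = 3) -> float:
    """Fraction of n-grams that are repeats, as a penalty in [0, 1].

    0.0 means every n-gram is unique; values near 1.0 indicate the
    text loops over the same phrases.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)
```

A looping completion like "a b c a b c a b c" yields a high penalty, while non-repetitive text scores 0.0.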