
RM-Gallery Logo

RM-Gallery: A One-Stop Reward Model Platform
Train, Build, and Apply Reward Models with Ease.


  • 🚀 Quick Start: Get started in 5 minutes. Start Now →
  • 📚 Tutorials: Step-by-step guides. Learn More →
  • 📚 RM Library: 35+ pre-built models. Explore Models →
  • FAQ: Common questions. Get Answers →

📢 News

  • [2025-07-09] We have released RM-Gallery v0.1.0, which is now also available on PyPI!

RM-Gallery is a one-stop platform for training, building, and applying reward models. It provides a comprehensive solution for implementing reward models at both the task level and the atomic level, with high throughput and fault tolerance.

Framework
RM-Gallery Framework

🏋️‍♂️ Training RM

  • Integrated RM Training Pipeline: Provides an RL-based framework for training reasoning reward models, compatible with popular training frameworks (e.g., verl), and includes examples of integrating RM-Gallery into them.

Training RM Accuracy Curve
RM Training Pipeline improves accuracy on RM-Bench

This figure demonstrates the effectiveness of the RM training pipeline: on RM-Bench, accuracy improves from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5% after more than 80 training steps.

🏗️ Building RM

  • Unified Reward Model Architecture: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based/model-free), reward formats (scalar/critique), and scoring patterns (pointwise/listwise/pairwise).

  • Comprehensive RM Gallery: Provides a rich collection of ready-to-use reward model instances for diverse tasks (e.g., math, coding, preference alignment) at both the task level (RMComposition) and the component level (RewardModel). Users can directly apply an RMComposition/RewardModel to a specific task, or assemble a custom RMComposition from component-level RewardModels.

  • Rubric-Critic-Score Paradigm: Adopts the Rubric+Critic+Score reasoning reward model paradigm and offers best practices for generating rubrics from limited preference data (a minimal sketch of this flow follows below).

The two images above show that after applying the Rubric+Critic+Score paradigm and adding 1–3 rubrics to the base model (Qwen3-32B), there were significant improvements on both RewardBench2 and RMB-pairwise.
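To make the paradigm concrete, here is a minimal sketch of a Rubric+Critic+Score flow. It is an illustration only: the `llm_complete` helper, the prompt wording, and the 0–1 scoring scale are assumptions standing in for whatever LLM backend and templates you actually use, not RM-Gallery's internal implementation.

from typing import List

def llm_complete(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., to Qwen3-32B); wire up your own backend here."""
    raise NotImplementedError

def rubric_critic_score(question: str, answer: str, rubrics: List[str]) -> float:
    """Illustrative Rubric+Critic+Score flow: critique an answer against rubrics, then score it."""
    rubric_block = "\n".join(f"- {r}" for r in rubrics)
    # Critic step: reason about the answer rubric by rubric.
    critique = llm_complete(
        f"Question:\n{question}\n\nAnswer:\n{answer}\n\n"
        f"Evaluate the answer against these rubrics:\n{rubric_block}\n"
        "Explain, rubric by rubric, where the answer succeeds or fails."
    )
    # Score step: compress the critique into a single scalar in [0, 1].
    raw = llm_complete(
        f"Critique:\n{critique}\n\nBased on this critique, output only a score between 0 and 1."
    )
    return float(raw.strip())

# Example rubrics of the kind distilled from a small amount of preference data.
rubrics = [
    "The answer directly addresses the question.",
    "Claims are factually correct and justified where needed.",
    "The answer avoids harmful or biased content.",
]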

🛠️ Applying RM

  • Multiple Usage Scenarios: Covers multiple reward model (RM) usage scenarios with detailed best practices, including Training with Rewards (e.g., post-training) and Inference with Rewards (e.g., Best-of-N, data correction).

  • High-Performance RM Serving: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency.
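Since the serving bullet above is about throughput and fault tolerance rather than a specific API, the sketch below only illustrates the general client-side pattern: fan scoring requests out concurrently and retry transient failures with backoff. The `score_fn` callable and the retry budget are assumptions, not the New API platform's interface.

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

def score_with_retry(score_fn: Callable[[object], float], sample, retries: int = 3) -> float:
    """Call a reward-scoring function, retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return score_fn(sample)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)

def score_batch(score_fn: Callable[[object], float], samples: Iterable, max_workers: int = 16) -> List[float]:
    """Score many samples in parallel so slow requests do not serialize the whole batch."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: score_with_retry(score_fn, s), samples))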

📥 Installation

RM-Gallery requires Python >= 3.10 and < 3.13.

# Option 1: install from PyPI
pip install rm-gallery

# Option 2: install from source
git clone https://github.com/modelscope/RM-Gallery.git
cd RM-Gallery
pip install .

🚀 Quick Start

Get started with RM-Gallery in just a few lines of code:

from rm_gallery.core.reward.registry import RewardRegistry

# Use a built-in reward model
rm = RewardRegistry.get("harmlessness")

# Evaluate your data (see the Quickstart Guide for how to construct `sample`)
result = rm.evaluate(sample)

Ready to dive deeper? Check out our Quickstart Guide for a complete walkthrough, or explore our Tutorials for specific use cases.


🎯 Core Capabilities

🏋️‍♂️ Training RM

Train custom reward models with our integrated pipeline, supporting both pointwise and pairwise paradigms. Compatible with popular reinforcement learning frameworks such as verl.

Key Features:

  • Pointwise & pairwise training modes (sketched below)
  • Distributed training support (Ray)
  • Integration with RLHF pipelines
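The pointwise and pairwise modes above typically reduce to two standard losses: regressing a scalar reward toward a per-sample label, and a Bradley-Terry preference loss that pushes the chosen response above the rejected one. The PyTorch snippet below is a generic sketch of those losses under that assumption, not RM-Gallery's actual training code.

import torch
import torch.nn.functional as F

def pointwise_loss(pred_scores: torch.Tensor, target_scores: torch.Tensor) -> torch.Tensor:
    """Pointwise mode: regress the predicted scalar reward toward per-sample labels."""
    return F.mse_loss(pred_scores, target_scores)

def pairwise_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise mode: Bradley-Terry loss on (chosen, rejected) preference pairs."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy usage with random scores standing in for reward-model outputs.
chosen, rejected = torch.randn(8), torch.randn(8)
print(pointwise_loss(chosen, torch.ones(8)), pairwise_loss(chosen, rejected))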

→ Training RM Tutorial


🏗️ Building RM

Build reward models tailored to your needs with our flexible architecture.

35+ Pre-built Models covering:

  • 🧮 Math: Mathematical correctness verification
  • 💻 Code: Code quality and execution assessment
  • 🎯 Alignment: Helpfulness, harmlessness, honesty
  • 📊 General: Accuracy, F1, ROUGE metrics
  • ✍️ Style: Format, length, privacy compliance

Or Build Your Own:

  • Use our Rubric-Critic-Score paradigm
  • Customize LLM templates
  • Implement rule-based logic (a toy rule is sketched below)
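As a taste of the rule-based route, the toy reward below checks only length and terminal punctuation. It is a hypothetical example, not one of the library's built-in style or format rewards.

import re

def format_reward(response: str, max_words: int = 200) -> float:
    """Toy rule-based reward: penalize overly long answers and answers cut off mid-sentence."""
    score = 1.0
    if len(response.split()) > max_words:
        score -= 0.5  # too long
    if not re.search(r"[.!?]\s*$", response.strip()):
        score -= 0.5  # does not end on a sentence boundary
    return max(score, 0.0)

print(format_reward("The answer is 42."))  # 1.0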

→ View RM Library | → Building Guide


🛠️ Applying RM

Put your reward models to work in production scenarios:

  • Best-of-N: Select top responses from multiple candidates (a minimal sketch follows this list)
  • Post-Training: Integrate with RLHF for model optimization
  • Data Refinement: Iteratively improve outputs with RM feedback
  • RM Server: Deploy high-throughput, fault-tolerant RM services
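To illustrate the Best-of-N scenario, the sketch below scores N candidate responses and keeps the highest-ranked one. The `score_fn` callable stands in for a loaded reward model (for example, one obtained via `RewardRegistry.get` as in the Quick Start); how its output is reduced to a float is an assumption.

from typing import Callable, List

def best_of_n(candidates: List[str], score_fn: Callable[[str], float]) -> str:
    """Score every candidate with the reward model and return the top-ranked response."""
    return max(candidates, key=score_fn)

# Toy usage: a length-based stand-in for a real reward-model score.
candidates = ["short answer", "a somewhat longer, more detailed answer"]
print(best_of_n(candidates, score_fn=lambda text: float(len(text))))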

→ Application Tutorials

🤝 Contribute

Contributions are always encouraged!

We highly recommend installing pre-commit hooks in this repo before committing pull requests. These hooks are small housekeeping scripts executed every time you make a git commit; they take care of formatting and linting automatically.

# Install the package in editable mode
pip install -e .
# Set up the git hooks
pre-commit install

Please refer to our Contribution Guide for more details.

📝 Citation

If you use RM-Gallery in a paper, please cite:

@software{rm_gallery,
  title = {RM-Gallery: A One-Stop Reward Model Platform},
  author = {The RM-Gallery Team},
  url = {https://github.com/modelscope/RM-Gallery},
  month = {07},
  year = {2025}
}