Welcome to the RM-Gallery tutorial series! This directory contains guides for building, training, evaluating, and deploying reward models (RMs).

🗺️ Learning Paths

🌱 Beginner Path

Goal: Get started with reward models in 30 minutes

  1. Quickstart Guide - Install, use, and evaluate your first RM (5 min)
  2. Building RM Overview - Understand RM types and architecture (10 min)
  3. Using Built-in RMs - Explore 35+ pre-built models (15 min)

🚀 Intermediate Path

Goal: Build and customize reward models

  1. Building Custom RMs - Create rule-based and LLM-based rewards (30 min)
  2. Data Pipeline - Load, process, and transform data (20 min)
  3. End-to-End Tutorial - Complete workflow from data to deployment (30 min)
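
To make "rule-based reward" concrete before opening the Building Custom RMs tutorial, here is a minimal sketch in plain Python. It does not use the RM-Gallery API: the function name `length_and_format_reward` and both rules are hypothetical and chosen only for illustration.

```python
import re


def length_and_format_reward(response: str, max_words: int = 50) -> float:
    """Hypothetical rule-based reward: the fraction of simple rules a response passes."""
    words = response.split()
    # Rule 1: the response is non-empty and no longer than max_words.
    within_length = 0 < len(words) <= max_words
    # Rule 2: the response ends with sentence-final punctuation.
    ends_cleanly = bool(re.search(r"[.!?]$", response.strip()))
    # Each rule contributes equally, so the score is 0.0, 0.5, or 1.0.
    return (within_length + ends_cleanly) / 2


if __name__ == "__main__":
    print(length_and_format_reward("Paris is the capital of France."))
    print(length_and_format_reward("no closing punctuation"))
```

An LLM-based reward would replace these hand-written rules with a model call that returns a score; the tutorial covers how RM-Gallery structures both kinds.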

🎓 Advanced Path

Goal: Train, evaluate, and deploy at scale

  1. Training RM Overview - Understand training paradigms and setup (15 min)
  2. Training with VERL - Complete RL-based training workflow (60 min)
  3. High-Performance Serving - Deploy RM as production service (45 min)

📚 Tutorial Catalog

Building Reward Models

| Tutorial | Level | Time | Description |
|---|---|---|---|
| Overview | Beginner | 10 min | Introduction to building RMs |
| Ready-to-Use RMs | Beginner | 15 min | Using pre-built models |
| Custom Rewards | Intermediate | 30 min | Building custom RMs |
| Auto Rubric | Advanced | 45 min | Automatic rubric generation |

Training Reward Models

| Tutorial | Level | Time | Description |
|---|---|---|---|
| Training Overview | Intermediate | 15 min | Introduction to training |
| Bradley-Terry RM | Advanced | 60 min | Training Bradley-Terry models |
| SFT RM | Advanced | 45 min | Training with SFT |
| RL Training | Advanced | 90 min | Full RL-based training |
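
For readers new to the Bradley-Terry objective named above, the standard pairwise loss is the negative log-sigmoid of the score margin between a preferred and a rejected response. The sketch below computes it in plain Python for a single pair. The function name and arguments are illustrative assumptions, not the tutorial's training code.

```python
import math


def bradley_terry_loss(chosen_score: float, rejected_score: float) -> float:
    """Pairwise loss -log(sigmoid(chosen_score - rejected_score)) for one preference pair."""
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is never called on a large positive number.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss shrinks as the chosen response outscores the rejected one.
    print(bradley_terry_loss(2.0, 0.0))
    print(bradley_terry_loss(0.0, 2.0))
```

When both scores are equal the loss is log 2, which corresponds to the model assigning a 50% preference probability to either response.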

Evaluating Reward Models

| Tutorial | Level | Time | Description |
|---|---|---|---|
| Evaluation Overview | Beginner | 10 min | Introduction to evaluation |
| RMB | Intermediate | 30 min | Reward Model Benchmark |
| RM-Bench | Intermediate | 30 min | Subtlety and style evaluation |
| JudgeBench | Intermediate | 30 min | Judge capability testing |
| RewardBench2 | Intermediate | 30 min | Latest benchmark |
| Conflict Detector | Advanced | 45 min | Detect evaluation conflicts |

Data Processing

| Tutorial | Level | Time | Description |
|---|---|---|---|
| Data Pipeline | Beginner | 20 min | Complete data workflow |
| Data Annotation | Intermediate | 30 min | Annotating training data |
| Data Loading | Beginner | 15 min | Loading from various sources |
| Data Processing | Intermediate | 25 min | Transforming data |

Applications

| Tutorial | Level | Time | Description |
|---|---|---|---|
| RM Server | Advanced | 45 min | Deploy RM as service |
| Best-of-N | Intermediate | 20 min | Select best response |
| Data Refinement | Intermediate | 30 min | Improve data quality |
| Post Training | Advanced | 60 min | RLHF integration |
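
As a rough picture of what Best-of-N means, the sketch below scores N candidate responses and returns the highest-scoring one. The `best_of_n` helper and the toy word-count scorer are hypothetical stand-ins. In practice the scorer would be a reward model, and the tutorial shows how RM-Gallery wires that up.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (the first one wins ties)."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return max(candidates, key=score)


def toy_score(response: str) -> float:
    """Placeholder scorer (assumption): favors responses with more words."""
    return float(len(response.split()))


if __name__ == "__main__":
    responses = [
        "Yes.",
        "Yes, because the test passed.",
        "Probably.",
    ]
    print(best_of_n(responses, toy_score))
```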

🎯 By Use Case

I want to...

Evaluate AI responses → Start with Quickstart → Then Using Built-in RMs

Build a custom reward model → Read Building Custom RMs → Try End-to-End Tutorial

Train my own reward model → Start with Training Overview → Then RL Training

Test on benchmarks → Read Evaluation Overview → Try specific benchmarks: RMB, RM-Bench, RewardBench2

Deploy to production → Follow RM Server Guide → Implement Best-of-N

Process custom data → Read Data Pipeline → Use Data Loading

💡 Tutorial Tips

Before You Start

  • ✅ Install RM-Gallery: pip install rm-gallery
  • ✅ Set up Python environment (>= 3.10, < 3.13)
  • ✅ (Optional) Get API credentials for LLM-based models
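
If you are unsure whether your interpreter falls in the supported range, a quick check like the following can help. It is a small sketch that only encodes the version bounds listed above. The helper name `python_version_supported` is made up for this example.

```python
import sys
from typing import Optional, Tuple


def python_version_supported(version: Optional[Tuple[int, ...]] = None) -> bool:
    """Return True if (major, minor) satisfies >= 3.10 and < 3.13."""
    if version is None:
        # Default to the interpreter that is running this script.
        version = tuple(sys.version_info[:2])
    return (3, 10) <= tuple(version[:2]) < (3, 13)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_supported() else "outside the supported range"
    print(f"Python {current} is {status}")
```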

While Learning

  • 📖 Read in order: Tutorials build on each other
  • 💻 Run the code: Try examples in your environment
  • 🔄 Experiment: Modify code and see what happens
  • 💬 Ask questions: Use GitHub Discussions

After Completing

  • 🎯 Apply to your project: Use what you learned
  • 🤝 Share feedback: Help us improve tutorials
  • 📝 Contribute: Add your own examples

🆘 Getting Help

Stuck on a tutorial?

  1. Check the FAQ first
  2. Search GitHub Issues
  3. Ask in GitHub Discussions
  4. Join our community channels

Found an error?

Please open a GitHub Issue with the tutorial name and problem description.

🚀 Next Steps

After completing the tutorials:

  1. Build your first project using RM-Gallery
  2. Share your experience with the community
  3. Contribute back with examples or improvements
  4. Stay updated on new features and models

Ready to start? Go to the Quickstart Guide 🎉

Have questions? Check the FAQ or ask in Discussions 💬

Want to contribute? Read our Contribution Guide 🤝