Find answers to common questions about RM-Gallery.
## General Questions

### What is RM-Gallery?
RM-Gallery is a comprehensive platform for training, building, and applying reward models. It provides:
- 35+ pre-built reward models for various scenarios
- Unified architecture for custom reward model development
- Integration with training frameworks (VERL)
- Benchmark evaluation tools
- Production-ready serving capabilities
### Who should use RM-Gallery?
RM-Gallery is designed for:
- Researchers evaluating reward models on benchmarks
- ML Engineers building and deploying reward models in production
- AI Practitioners training reward models for RLHF/post-training
- Developers integrating reward models into applications
### What scenarios does RM-Gallery support?
RM-Gallery covers diverse evaluation scenarios:
- Math: Correctness verification, step-by-step reasoning
- Code: Quality assessment, syntax checking, execution correctness
- Alignment: Helpfulness, harmlessness, honesty (3H)
- General: Accuracy, F1, ROUGE, factuality
- Format & Style: Length, repetition, privacy compliance
## Getting Started

### How do I install RM-Gallery?
```bash
# From PyPI (recommended)
pip install rm-gallery

# From source
git clone https://github.com/modelscope/RM-Gallery.git
cd RM-Gallery
pip install .
```
Requirements: Python >= 3.10 and < 3.13
### Do I need API keys to use RM-Gallery?
It depends on which reward models you use:
- Rule-based models (e.g., length checks, format validation): No API key needed
- LLM-based models (e.g., helpfulness, safety evaluation): API key required
For LLM-based models, set up your credentials:
```python
import os

os.environ["OPENAI_API_KEY"] = "your_api_key"
os.environ["BASE_URL"] = "your_base_url"  # Optional: custom endpoint
```
### How do I choose the right reward model?
- Check available models: `RewardRegistry.list()` (see the sketch below)
- Match your scenario: See the RM Library
- Consider complexity:
  - Simple tasks → Rule-based models
  - Complex evaluation → LLM-based models
- Test on your data: Try multiple models and compare results
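For example, you can list the registered models and instantiate one by name. A minimal sketch (the model name here is a placeholder; pick one from your own `RewardRegistry.list()` output):

```python
from rm_gallery.core.reward.registry import RewardRegistry

# Inspect every registered reward model
print(RewardRegistry.list())

# Instantiate one by its registry name (placeholder name)
rm = RewardRegistry.get("your_reward_model")
```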
## Building Reward Models

### Can I create my own reward model?
Yes! RM-Gallery provides multiple levels of customization:
**Level 1: Custom Rubrics (Easiest)**

```python
from rm_gallery.gallery.rm.alignment.base import BaseHarmlessnessListWiseReward

custom_rm = BaseHarmlessnessListWiseReward(
    name="custom_safety",
    rubrics=["Your criterion 1", "Your criterion 2"],
    llm=llm
)
```
**Level 2: Custom LLM Template**

- Inherit from `BaseLLMReward`
- Override `_before_evaluate` and `_after_evaluate` (a rough sketch follows)
- See the Custom RM Tutorial
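A rough shape of a Level 2 subclass, purely for orientation (the import path and method signatures below are assumptions, not the confirmed API; the Custom RM Tutorial has the real interface):

```python
# Illustrative sketch only: the module path and signatures are assumed.
from rm_gallery.core.reward.base import BaseLLMReward  # assumed import path


class MyTemplateReward(BaseLLMReward):
    def _before_evaluate(self, sample, **kwargs):
        # Build the prompt sent to the LLM from the sample (hypothetical hook)
        ...

    def _after_evaluate(self, response, **kwargs):
        # Parse the LLM's response into a score and reason (hypothetical hook)
        ...
```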
**Level 3: Full Custom Logic**

- Inherit from `BasePointWiseReward` or `BaseListWiseReward`
- Implement the `_evaluate` method
- Complete control over evaluation logic (see the sketch below)
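As a loose illustration of Level 3 (again, the import path, `_evaluate` signature, and return type are assumptions; the tutorial documents the real interface), a rule-based length check might look like:

```python
# Illustrative sketch only: module path and signatures are assumed.
from rm_gallery.core.reward.base import BasePointWiseReward  # assumed import path


class MaxLengthReward(BasePointWiseReward):
    """Hypothetical rule-based reward: 1.0 if the response fits a length budget."""

    max_chars: int = 2000  # hypothetical parameter

    def _evaluate(self, sample, **kwargs):
        # Pointwise: score each response independently of any alternatives
        text = sample.output[0].answer.content  # field layout per the DataSample schema in this FAQ
        return 1.0 if len(text) <= self.max_chars else 0.0
```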
### What's the difference between pointwise, pairwise, and listwise?
- Pointwise: Evaluates each response independently (e.g., "Is this response factually correct?")
  - Use case: Grammar checking, length validation, format compliance
- Pairwise: Compares two responses directly (e.g., "Which response is better?")
  - Use case: Preference learning, A/B testing
- Listwise: Ranks multiple responses (e.g., "Rank these 5 responses from best to worst")
  - Use case: Best-of-N selection, multi-candidate evaluation
### How do I use the Rubric-Critic-Score paradigm?
The Rubric-Critic-Score paradigm follows three steps:
1. Rubric: Define evaluation criteria
2. Critic: The LLM analyzes responses based on the rubrics
3. Score: Get numerical scores
```python
# 1. Define rubrics
rubrics = [
    "Response must be factually accurate",
    "Response should be concise and clear",
    "Response must be helpful to the user"
]

# 2. Create rubric-based reward
from rm_gallery.gallery.rm.alignment.base import BaseHelpfulnessListWiseReward

rm = BaseHelpfulnessListWiseReward(
    name="helpful_rm",
    rubrics=rubrics,
    llm=llm
)

# 3. Evaluate and get scores
result = rm.evaluate(sample)
```
## Training

### How do I train my own reward model?
RM-Gallery integrates with the VERL framework:
```bash
# 1. Prepare training data
python examples/data/data_from_yaml.py --config config.yaml

# 2. Launch distributed training
cd examples/train/pointwise
./run_pointwise.sh
```
See the Training Guide for details.
### What training paradigms are supported?
- Pointwise: Train models to assign absolute scores
- Pairwise: Train models to predict preferences between pairs
- Bradley-Terry: Train models with the Bradley-Terry preference objective (see the formula below)
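For reference, the Bradley-Terry objective models the probability that the chosen response beats the rejected one and minimizes the negative log-likelihood (standard formulation; the notation here is ours, not RM-Gallery's):

$$
P(y_c \succ y_r \mid x) = \sigma\big(r_\theta(x, y_c) - r_\theta(x, y_r)\big),
\qquad
\mathcal{L}(\theta) = -\log \sigma\big(r_\theta(x, y_c) - r_\theta(x, y_r)\big)
$$

where $r_\theta$ is the reward model, $x$ the prompt, $y_c$/$y_r$ the chosen/rejected responses, and $\sigma$ the sigmoid.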
### Do I need multiple GPUs for training?
It depends on model size and dataset:
- Small models (<3B): Single GPU possible
- Medium models (3B-14B): 2-4 GPUs recommended
- Large models (>14B): 8+ GPUs recommended
RM-Gallery supports distributed training via Ray.
## Evaluation

### What benchmarks are available?
RM-Gallery supports multiple standard benchmarks:
- RewardBench2: Latest comprehensive benchmark
- RM-Bench: Multi-dimensional evaluation
- Conflict Detector: Identify evaluation conflicts
- JudgeBench: Judge capability testing
### How do I run evaluations efficiently?
Use batch processing with parallel workers:
```python
# Batch evaluation (recommended)
results = rm.evaluate_batch(
    samples,
    max_workers=8  # Adjust based on your resources
)

# vs. sequential evaluation (slower)
results = [rm.evaluate(sample) for sample in samples]
```
Batch processing can be 10-100x faster for large datasets!
### How do I interpret reward scores?
Score interpretation depends on the reward model:
- Binary scores: 0.0 (bad) or 1.0 (good)
- Continuous scores: Usually normalized to [0, 1]
- Relative scores: Compare across responses (e.g., ranking)
Always check the `reason` field for detailed explanations:

```python
for detail in result.output[0].answer.reward.details:
    print(f"Score: {detail.score}")
    print(f"Reason: {detail.reason}")
```
## Production & Deployment

### How do I deploy reward models in production?
RM-Gallery supports high-performance serving via New API:

1. Set up a New API server (see the RM Server Guide)
2. Deploy your reward model
3. Update your client code:

```python
import os
from rm_gallery.core.reward.registry import RewardRegistry

os.environ["BASE_URL"] = "https://your-api-endpoint.com"
rm = RewardRegistry.get("your_reward_model")
```

Benefits:

- High throughput (100+ requests/sec)
- Fault tolerance
- Unified API management
- Easy scaling
### Can I use RM-Gallery with RLHF?
Yes! RM-Gallery integrates seamlessly with RLHF pipelines:
```python
# Use a reward model as the RLHF reward function
from rm_gallery.core.reward.registry import RewardRegistry

reward_fn = RewardRegistry.get("alignment_reward")

# Integrate with your RLHF framework
# See examples/train/rl_training/
```
### How do I handle rate limits?
For API-based reward models:

- Batch processing: Reduce the number of API calls
- Parallel workers: Control the `max_workers` parameter
- Caching: Cache results for duplicate queries (see the sketch below)
- Local deployment: Use New API for unlimited throughput
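A minimal caching sketch (the wrapper below is a hypothetical helper, not an RM-Gallery API; it keys on `unique_id` from the DataSample schema shown later in this FAQ):

```python
# Hypothetical helper: cache evaluation results per sample to avoid duplicate API calls.
cache = {}

def evaluate_cached(rm, sample):
    key = sample.unique_id  # per the DataSample schema
    if key not in cache:
        cache[key] = rm.evaluate(sample)  # pay the API cost only once per unique sample
    return cache[key]
```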
## Troubleshooting

### Import errors: "No module named 'rm_gallery'"
Solution: Make sure RM-Gallery is installed:
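```bash
pip install rm-gallery

# Verify the installation
python -c "import rm_gallery"
```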
### API errors: "Authentication failed"
Solution: Check your API credentials:
```python
import os

print(os.environ.get("OPENAI_API_KEY"))  # Should not be None
print(os.environ.get("BASE_URL"))
```
### Evaluation is too slow
Solutions:
1. Use `evaluate_batch()` instead of individual `evaluate()` calls
2. Increase `max_workers` (but respect API rate limits)
3. Use rule-based models instead of LLM-based when possible
4. Deploy local serving with New API
"KeyError: reward model not found"
Solution: Check available models:
```python
from rm_gallery.core.reward.registry import RewardRegistry

print(RewardRegistry.list())  # See all available models
```
### Memory errors during training

Solutions:

1. Reduce batch size in training config
2. Use gradient accumulation (see the sketch below)
3. Enable mixed precision training (fp16/bf16)
4. Use more GPUs with data parallelism
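For intuition, gradient accumulation in plain PyTorch looks roughly like this (a generic, self-contained sketch with a stand-in model and data; it is not RM-Gallery/VERL-specific, where you would set the equivalent option in the training config):

```python
import torch
from torch import nn

model = nn.Linear(10, 1)                          # stand-in model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
batches = [torch.randn(8, 10) for _ in range(8)]  # stand-in data
accum_steps = 4                                   # effective batch = 8 * 4

optimizer.zero_grad()
for step, batch in enumerate(batches):
    loss = model(batch).pow(2).mean() / accum_steps  # scale so accumulated grads average
    loss.backward()                                  # gradients accumulate until we step
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```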
## Contributing

### How can I contribute to RM-Gallery?
We welcome contributions! You can:
- Add new reward models to the gallery
- Improve documentation and examples
- Report bugs via GitHub Issues
- Submit benchmarks for evaluation
- Share use cases and best practices
See our Contribution Guide for details.
### Can I contribute a benchmark?
Yes! We encourage benchmark contributions:

1. Prepare your dataset in JSONL format (a rough example follows)
2. Create a data loader (see `rm_gallery/gallery/data/`)
3. Submit a PR with documentation
4. We'll review and integrate it
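As a rough illustration of one JSONL record (this layout simply mirrors the `DataSample` schema shown later in this FAQ and is illustrative, not a confirmed wire format):

```json
{"unique_id": "example_1", "input": [{"role": "user", "content": "..."}], "output": [{"answer": {"role": "assistant", "content": "..."}}]}
```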
### How do I report a bug?

1. Check if it's already reported: GitHub Issues
2. Create a new issue with:
   - A clear description
   - Steps to reproduce
   - Environment info (Python version, RM-Gallery version)
   - Error messages/logs
## Best Practices

### What are the best practices for building reward models?
- Start with pre-built models: Test existing models before building custom ones
- Use rubrics: Define clear evaluation criteria
- Validate on diverse data: Test on multiple datasets
- Iterate: Refine based on evaluation results
- Document: Keep track of model versions and performance
### How do I ensure reward model quality?
- Human evaluation: Compare against human judgments
- Cross-validation: Test on held-out data
- Benchmark testing: Evaluate on standard benchmarks
- A/B testing: Compare multiple reward models
- Monitor in production: Track performance metrics
### What data format should I use?
RM-Gallery uses a standardized data schema:
```python
from rm_gallery.core.data.schema import DataSample, DataOutput, Step
from rm_gallery.core.model.message import ChatMessage, MessageRole

sample = DataSample(
    unique_id="example_1",
    input=[ChatMessage(role=MessageRole.USER, content="...")],
    output=[DataOutput(answer=Step(role=MessageRole.ASSISTANT, content="..."))]
)
```
See the Data Pipeline Tutorial for details.
## Still Have Questions?
- 📚 Full Documentation - Comprehensive guides
- 💬 GitHub Discussions - Community Q&A
- 🐛 Report Issues - Bug reports
- 🤝 Contribution Guide - Get involved
Last Updated: October 2025