trinity.common.workflows.math_trainable_ruler_workflow module
Math workflow with trainable RULER.
- class trinity.common.workflows.math_trainable_ruler_workflow.MathTrainableRULERWorkflow(*, task: Task, model: ModelWrapper, auxiliary_models: List[OpenAI] | None = None)
Bases: SimpleWorkflow
A workflow for math tasks in which the policy model itself serves as the RULER reward model. Modified from MathRULERWorkflow. RULER is adapted from OpenPipe/ART.
- __init__(*, task: Task, model: ModelWrapper, auxiliary_models: List[OpenAI] | None = None)
- run() → List[Experience]
Modified from MathRULERWorkflow.run.
- get_ruler_responses(responses: List[Experience], judger: Any, ruler_rollout_args: Any, gold_scores: List[float] | None = None) → Tuple[float, List[Experience], List[float]]
Get RULER scores.
- Returns:
judge_success_rate: float
ruler_responses: List[Experience]
ruler_scores: List[float]
- Return type:
Tuple[float, List[Experience], List[float]]
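The three return values of get_ruler_responses can be illustrated with a small standalone sketch. It does not call Trinity and is not the library's implementation. It assumes that judge_success_rate is the fraction of judge replies that parse into a valid score list, that each reply scores every response with a value in [0, 1], and that the final scores are averaged over the valid replies. The helpers parse_scores and summarize_judgements are hypothetical names introduced only for this illustration.

```python
import ast
import re
from typing import List, Optional, Tuple


def parse_scores(text: str, expected: int) -> Optional[List[float]]:
    """Hypothetical parser: read the last bracketed list in a judge reply.

    Returns None when no list is found, when it has the wrong length,
    or when any entry is not a number in [0, 1].
    """
    matches = re.findall(r"\[[^\[\]]*\]", text)
    if not matches:
        return None
    try:
        values = ast.literal_eval(matches[-1])
    except (ValueError, SyntaxError):
        return None
    if len(values) != expected:
        return None
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return None
    if not all(0.0 <= v <= 1.0 for v in values):
        return None
    return [float(v) for v in values]


def summarize_judgements(
    judge_replies: List[str], num_responses: int
) -> Tuple[float, List[float]]:
    """Assumed aggregation: success rate plus per-response mean score."""
    parsed = [
        scores
        for scores in (parse_scores(reply, num_responses) for reply in judge_replies)
        if scores is not None
    ]
    success_rate = len(parsed) / len(judge_replies) if judge_replies else 0.0
    if not parsed:
        # No usable judgement: fall back to zero scores (an assumption).
        return success_rate, [0.0] * num_responses
    mean_scores = [sum(column) / len(parsed) for column in zip(*parsed)]
    return success_rate, mean_scores


if __name__ == "__main__":
    replies = ["Scores: [0.5, 1.0]", "no list here", "[1.0, 0.0]", "[0.2]"]
    rate, scores = summarize_judgements(replies, num_responses=2)
    print(rate, scores)
```

In this sketch, two of the four replies are usable, so the success rate is 0.5 and the per-response scores are the means over those two replies. How the actual workflow uses gold_scores and builds ruler_responses is not described here, so the sketch leaves both out.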