trinity.trainer.verl.utils module

Utilities for compatibility with verl.

trinity.trainer.verl.utils.to_data_proto(experiences: Experiences) → DataProto

Convert Experiences to verl DataProto.

trinity.trainer.verl.utils.compute_data_metrics(batch: DataProto, use_critic: bool = False) → dict

Computes metrics from a batch of data for PPO training. Adapted from verl.trainer.ppo.metric_utils.compute_data_metrics.

This function calculates metrics for scores, rewards, advantages, returns, values, and sequence lengths from a batch of data. For each metric category it reports the mean, max, and min; the length categories additionally report a clip ratio.
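The plain-Python sketch below illustrates how statistics of this kind are typically derived in upstream verl: scores and rewards are token-level values summed over each response (one number per sequence), while advantages and returns are taken only at valid, unmasked response tokens. The helper names (masked_values, summarize, score_and_advantage_metrics) are hypothetical, and nested lists stand in for the tensors held in a DataProto; the real function works on torch tensors and may differ in detail.

```python
from statistics import mean


def masked_values(rows, mask):
    """Hypothetical stand-in for a masked select: flatten per-token values,
    keeping only positions where the response mask is nonzero."""
    return [v for row, mrow in zip(rows, mask) for v, m in zip(row, mrow) if m]


def summarize(prefix, xs):
    """Return mean/max/min under '<prefix>/mean', '<prefix>/max', '<prefix>/min'."""
    return {f"{prefix}/mean": mean(xs), f"{prefix}/max": max(xs), f"{prefix}/min": min(xs)}


def score_and_advantage_metrics(token_level_scores, token_level_rewards,
                                advantages, returns, response_mask):
    metrics = {}
    # Sequence level: one value per response, the sum of its token-level values.
    metrics.update(summarize("critic/score", [sum(r) for r in token_level_scores]))
    metrics.update(summarize("critic/rewards", [sum(r) for r in token_level_rewards]))
    # Token level: padding positions are excluded via the response mask.
    metrics.update(summarize("critic/advantages", masked_values(advantages, response_mask)))
    metrics.update(summarize("critic/returns", masked_values(returns, response_mask)))
    return metrics


# Two responses padded to length 3; the second has one padding token.
scores = [[0.0, 0.0, 1.0], [0.0, 0.5, 0.0]]
mask = [[1, 1, 1], [1, 1, 0]]
adv = [[0.2, 0.4, 0.6], [-0.1, -0.3, 9.9]]  # 9.9 sits on padding and is ignored
metrics = score_and_advantage_metrics(scores, scores, adv, adv, mask)
```

Masking matters most for token-level statistics: without it, padding values such as the 9.9 above would distort the max and the mean.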

Parameters:
  • batch – A DataProto object containing batch data with token-level scores, rewards, advantages, etc.

  • use_critic – Whether to include critic-specific metrics (critic/values/* and critic/vf_explained_var). Defaults to False.
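When use_critic is True, the critic metrics include the value function's explained variance. Upstream verl computes it over valid response tokens as 1 - Var(returns - values) / (Var(returns) + eps) with a small eps; the sketch below assumes that formula and uses statistics.variance, which, like torch.var by default, applies Bessel's correction. The function names are hypothetical, and the inputs are assumed to be already masked to valid response tokens.

```python
from statistics import mean, variance


def vf_explained_var(valid_returns, valid_values, eps=1e-5):
    """Explained variance of the value function over valid response tokens.

    Assumed formula: 1 - Var(returns - values) / (Var(returns) + eps).
    statistics.variance is the sample (n - 1) variance, like torch.var's default,
    so at least two tokens are required.
    """
    diffs = [r - v for r, v in zip(valid_returns, valid_values)]
    return 1.0 - variance(diffs) / (variance(valid_returns) + eps)


def critic_value_metrics(valid_returns, valid_values, use_critic=False):
    """Hypothetical helper: value metrics are produced only when use_critic is True."""
    if not use_critic:
        return {}
    return {
        "critic/values/mean": mean(valid_values),
        "critic/values/max": max(valid_values),
        "critic/values/min": min(valid_values),
        "critic/vf_explained_var": vf_explained_var(valid_returns, valid_values),
    }


returns = [1.0, 2.0, 3.0]
perfect = critic_value_metrics(returns, [1.0, 2.0, 3.0], use_critic=True)   # values match returns: 1.0
constant = critic_value_metrics(returns, [2.0, 2.0, 2.0], use_critic=True)  # predicts the mean: close to 0
```

A value near 1.0 means the critic tracks the returns closely; a value near 0 (or negative) means it does no better than predicting the mean return.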

Returns:

A dictionary of metrics including:

  • critic/score/mean, max, min: Statistics about sequence scores

  • critic/rewards/mean, max, min: Statistics about sequence rewards

  • critic/advantages/mean, max, min: Statistics about advantages

  • critic/returns/mean, max, min: Statistics about returns

  • critic/values/mean, max, min: Statistics about critic values (if use_critic=True)

  • critic/vf_explained_var: Explained variance of the value function (if use_critic=True)

  • response_length/mean, max, min, clip_ratio: Statistics about response lengths

  • prompt_length/mean, max, min, clip_ratio: Statistics about prompt lengths

Return type:

dict
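For the length metrics, upstream verl derives prompt and response lengths from the attention mask, assuming a [prompt | response] layout in which the last max_response_length positions belong to the response, and defines clip_ratio as the fraction of sequences whose length equals the maximum, i.e. sequences that likely hit the length limit. The plain-Python sketch below follows those assumptions; the helper names are hypothetical, and the Trinity version may differ.

```python
def split_lengths(attention_mask, max_response_length):
    """Count prompt and response tokens per row, assuming a [prompt | response] layout."""
    prompt_lengths = [sum(row[:-max_response_length]) for row in attention_mask]
    response_lengths = [sum(row[-max_response_length:]) for row in attention_mask]
    return prompt_lengths, response_lengths


def length_metrics(prefix, lengths, max_length):
    """mean/max/min of lengths plus clip_ratio, the share of rows at max_length."""
    n = len(lengths)
    return {
        f"{prefix}/mean": sum(lengths) / n,
        f"{prefix}/max": max(lengths),
        f"{prefix}/min": min(lengths),
        f"{prefix}/clip_ratio": sum(1 for x in lengths if x == max_length) / n,
    }


# Prompt area is 3 positions (left-padded), response area is 4 (right-padded).
attention_mask = [
    [0, 1, 1, 1, 1, 1, 1],  # prompt 2, response 4 (response hit the limit)
    [1, 1, 1, 1, 1, 0, 0],  # prompt 3 (prompt hit the limit), response 2
]
max_response_length = 4
max_prompt_length = len(attention_mask[0]) - max_response_length
prompt_lengths, response_lengths = split_lengths(attention_mask, max_response_length)
metrics = {
    **length_metrics("response_length", response_lengths, max_response_length),
    **length_metrics("prompt_length", prompt_lengths, max_prompt_length),
}
```

A rising response_length/clip_ratio during training is a common sign that the maximum response length is too small for the task.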