trinity.common.models.mm_utils module

trinity.common.models.mm_utils module#

“Multi-modal utilities for processing and handling multi-modal data such as images and videos. Only support Qwen2.5 VL series.

Modified from: verl/utils/dataset/rl_dataset.py

trinity.common.models.mm_utils.build_multi_modal_inputs(prompt: str, images: List[Image], videos: List[ndarray], processor: Any) Dict[str, Any][source]#

Preprocess multi-modal data and build multi-modal inputs

trinity.common.models.mm_utils.convert_messages_to_mm_format(messages: List[Dict]) List[Dict][source]#