data_juicer.ops.mapper.video_captioning_from_audio_mapper module

class data_juicer.ops.mapper.video_captioning_from_audio_mapper.VideoCaptioningFromAudioMapper(keep_original_sample: bool = True, *args, **kwargs)

Bases: Mapper

Mapper that captions a video from its audio streams using the Qwen-Audio model.

__init__(keep_original_sample: bool = True, *args, **kwargs)

Initialization method.

Parameters:

keep_original_sample – whether to keep the original sample. If set to False, the final dataset contains only the captioned samples and the original samples are removed. Defaults to True.
args – extra positional arguments
kwargs – extra keyword arguments
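
The effect of keep_original_sample can be illustrated with a minimal, self-contained sketch. It does not call Qwen-Audio or Data-Juicer. The function expand_batch, its captions argument, and the "text" and "videos" keys are hypothetical stand-ins chosen for illustration, and the output ordering (original first, then its captioned copy) is an assumption rather than documented behavior.

```python
# Hypothetical illustration of the keep_original_sample flag.
# This is NOT the Data-Juicer implementation; captions are passed in
# directly instead of being generated by the Qwen-Audio model.

def expand_batch(samples, captions, keep_original_sample=True, text_key="text"):
    """Return a column-oriented batch that contains a captioned copy of
    each sample and, optionally, the original sample before it."""
    keys = list(samples.keys())
    out = {key: [] for key in keys}
    for i in range(len(samples[text_key])):
        if keep_original_sample:
            # Keep the untouched original sample.
            for key in keys:
                out[key].append(samples[key][i])
        # Append the captioned copy: same fields, text replaced by the caption.
        for key in keys:
            out[key].append(captions[i] if key == text_key else samples[key][i])
    return out


batch = {"text": ["a", "b"], "videos": [["v1.mp4"], ["v2.mp4"]]}
captions = ["cap a", "cap b"]

kept = expand_batch(batch, captions, keep_original_sample=True)
dropped = expand_batch(batch, captions, keep_original_sample=False)

print(len(kept["text"]), len(dropped["text"]))
```

Under these assumptions, keeping the originals doubles the number of samples in the batch, while setting the flag to False leaves one captioned sample per input video.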

process_batched(samples, rank=None)