data_juicer.ops.mapper.video_captioning_from_summarizer_mapper module¶
- class data_juicer.ops.mapper.video_captioning_from_summarizer_mapper.VideoCaptioningFromSummarizerMapper(hf_summarizer: str = None, trust_remote_code: bool = False, consider_video_caption_from_video: bool = True, consider_video_caption_from_audio: bool = True, consider_video_caption_from_frames: bool = True, consider_video_tags_from_audio: bool = True, consider_video_tags_from_frames: bool = True, vid_cap_from_vid_args: Dict | None = None, vid_cap_from_frm_args: Dict | None = None, vid_tag_from_aud_args: Dict | None = None, vid_tag_from_frm_args: Dict | None = None, keep_tag_num: Annotated[int, Gt(gt=0)] = 5, keep_original_sample: bool = True, *args, **kwargs)[source]¶
Bases:
Mapper
Mapper to generate video captions by summarizing several kinds of generated texts (captions from video/audio/frames, tags from audio/frames, etc.).
- __init__(hf_summarizer: str = None, trust_remote_code: bool = False, consider_video_caption_from_video: bool = True, consider_video_caption_from_audio: bool = True, consider_video_caption_from_frames: bool = True, consider_video_tags_from_audio: bool = True, consider_video_tags_from_frames: bool = True, vid_cap_from_vid_args: Dict | None = None, vid_cap_from_frm_args: Dict | None = None, vid_tag_from_aud_args: Dict | None = None, vid_tag_from_frm_args: Dict | None = None, keep_tag_num: Annotated[int, Gt(gt=0)] = 5, keep_original_sample: bool = True, *args, **kwargs)[source]¶
Initialization method.
- Parameters:
hf_summarizer – the summarizer model used to summarize texts generated by other methods.
trust_remote_code – whether to trust remote code of the model from Hugging Face. Default: False.
consider_video_caption_from_video – whether to consider the video caption generated directly from the video in the summarization process. Default: True.
consider_video_caption_from_audio – whether to consider the video caption generated from the audio streams in the video in the summarization process. Default: True.
consider_video_caption_from_frames – whether to consider the video caption generated from frames sampled from the video in the summarization process. Default: True.
consider_video_tags_from_audio – whether to consider the video tags generated from the audio streams in the video in the summarization process. Default: True.
consider_video_tags_from_frames – whether to consider the video tags generated from frames sampled from the video in the summarization process. Default: True.
vid_cap_from_vid_args – the arg dict for video captioning directly from the video, whose keys are the argument names and whose values are the argument values. Default: None.
vid_cap_from_frm_args – the arg dict for video captioning from frames sampled from the video, whose keys are the argument names and whose values are the argument values. Default: None.
vid_tag_from_aud_args – the arg dict for video tagging from the audio streams in the video, whose keys are the argument names and whose values are the argument values. Default: None.
vid_tag_from_frm_args – the arg dict for video tagging from frames sampled from the video, whose keys are the argument names and whose values are the argument values. Default: None.
keep_tag_num – the maximum number N of tags from sampled frames to keep. Too many tags might negatively influence the summarized text, so only the N most frequent tags are kept. Default: 5.
keep_original_sample – whether to keep the original sample. If set to False, only the summarized captions remain in the final dataset and the original captions are removed. Default: True.
args – extra positional args
kwargs – extra keyword args
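To illustrate how the inputs to the summarizer relate to keep_tag_num, the following minimal sketch (not the actual implementation; the helper name and prompt layout are assumptions for illustration) shows how candidate captions and frame tags could be assembled into one text, keeping only the N most frequent tags:

```python
from collections import Counter


def build_summarizer_input(captions, frame_tags, keep_tag_num=5):
    """Illustrative sketch: assemble the candidate texts that the
    summarizer condenses into a single video caption.

    captions: caption strings generated from video/audio/frames.
    frame_tags: tag strings from sampled frames (may repeat).
    keep_tag_num: keep only the N most frequent tags, since too
        many tags can degrade the summarized text.
    """
    # Keep only the N most frequent tags, mirroring keep_tag_num.
    top_tags = [tag for tag, _ in Counter(frame_tags).most_common(keep_tag_num)]
    # Concatenate captions and the kept tags into one input text.
    parts = list(captions)
    if top_tags:
        parts.append("Tags: " + ", ".join(top_tags))
    return " ".join(parts)


text = build_summarizer_input(
    ["A dog runs on a beach.", "Sound of waves and barking."],
    ["dog", "beach", "dog", "waves", "sand", "dog", "sun"],
    keep_tag_num=3,
)
```

The resulting text would then be passed to the hf_summarizer model; with keep_original_sample=True, both the original and the summarized captions are kept in the dataset.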