video_captioning_from_summarizer_mapper

Mapper that generates video captions by summarizing several kinds of generated text: captions derived from the video, its audio, and its sampled frames, tags derived from the audio and the frames, and so on.

Type: mapper

Tags: cpu, hf, multimodal

🔧 Parameter Configuration

| name | type | default | description |
|---|---|---|---|
| `hf_summarizer` | <class 'str'> | `None` | The summarizer model used to summarize the texts generated by the other methods. |
| `trust_remote_code` | <class 'bool'> | `False` | Whether to trust the remote code of HF models. |
| `consider_video_caption_from_video` | <class 'bool'> | `True` | Whether the summarization should consider the caption generated directly from the video. |
| `consider_video_caption_from_audio` | <class 'bool'> | `True` | Whether the summarization should consider the caption generated from the video's audio streams. |
| `consider_video_caption_from_frames` | <class 'bool'> | `True` | Whether the summarization should consider the caption generated from frames sampled from the video. |
| `consider_video_tags_from_audio` | <class 'bool'> | `True` | Whether the summarization should consider the tags generated from the video's audio streams. |
| `consider_video_tags_from_frames` | <class 'bool'> | `True` | Whether the summarization should consider the tags generated from frames sampled from the video. |
| `vid_cap_from_vid_args` | typing.Optional[typing.Dict] | `None` | Argument dict for captioning directly from the video. Keys are argument names and values are argument values. |
| `vid_cap_from_frm_args` | typing.Optional[typing.Dict] | `None` | Argument dict for captioning from frames sampled from the video. Keys are argument names and values are argument values. |
| `vid_tag_from_aud_args` | typing.Optional[typing.Dict] | `None` | Argument dict for tagging from the video's audio streams. Keys are argument names and values are argument values. |
| `vid_tag_from_frm_args` | typing.Optional[typing.Dict] | `None` | Argument dict for tagging from frames sampled from the video. Keys are argument names and values are argument values. |
| `keep_tag_num` | typing.Annotated[int, Gt(gt=0)] | `5` | Maximum number N of tags from sampled frames to keep. Too many tags can degrade the summarized text, so only the N most frequent tags are kept. |
| `keep_original_sample` | <class 'bool'> | `True` | Whether to keep the original sample. If set to False, the final dataset contains only the summarized captions and the original captions are removed. |
| `args` |  | `''` | Extra positional args. |
| `kwargs` |  | `''` | Extra keyword args. |
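
The sketch below illustrates how the `consider_*` switches and `keep_tag_num` described above could shape the text handed to the summarizer. It is not the operator's actual implementation: the function `build_summarizer_input`, the `captions` and `tags` dictionaries, and the way parts are joined are all assumptions made for illustration. Only the parameter names come from the table.

```python
from collections import Counter


def build_summarizer_input(captions, tags, consider=None, keep_tag_num=5):
    """Assemble one text block from intermediate captions and tags.

    Hypothetical helper, not part of the operator's API.

    captions: dict mapping "video" / "audio" / "frames" to a caption string.
    tags:     dict mapping "audio" / "frames" to a list of tag strings.
    consider: dict of the documented consider_* switches; a missing key
              is treated as True, matching the documented defaults.
    """
    if keep_tag_num <= 0:
        # The documented type is an int strictly greater than 0.
        raise ValueError("keep_tag_num must be greater than 0")

    consider = consider or {}
    parts = []

    # Captions: include each source only if its switch is on.
    for source in ("video", "audio", "frames"):
        flag = f"consider_video_caption_from_{source}"
        if consider.get(flag, True) and captions.get(source):
            parts.append(captions[source])

    # Audio tags are passed through unchanged in this sketch.
    if consider.get("consider_video_tags_from_audio", True) and tags.get("audio"):
        parts.append(", ".join(tags["audio"]))

    # Frame tags: keep only the keep_tag_num most frequent ones.
    if consider.get("consider_video_tags_from_frames", True) and tags.get("frames"):
        top = [tag for tag, _ in Counter(tags["frames"]).most_common(keep_tag_num)]
        parts.append(", ".join(top))

    return ". ".join(parts)


example_captions = {
    "video": "A dog runs on a beach",
    "audio": "Waves crash",
    "frames": "A dog near the sea",
}
example_tags = {
    "audio": ["Speech"],
    "frames": ["dog", "beach", "dog", "sea", "dog", "beach", "sky"],
}

text = build_summarizer_input(
    example_captions,
    example_tags,
    consider={"consider_video_caption_from_audio": False},
    keep_tag_num=2,
)
print(text)
```

With the audio caption switched off and `keep_tag_num=2`, the audio caption is dropped and only the two most frequent frame tags remain in the text. A real run would then pass this text to the model named by `hf_summarizer`.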

📊 Effect demonstration

Not available.
