video_captioning_from_summarizer_mapper

Mapper that generates video captions by summarizing several kinds of generated text, such as captions from the video, audio, and frames, and tags from the audio and frames.

Type: mapper

Tags: cpu, hf, multimodal

🔧 Parameter Configuration

| name | type | default | desc |
|------|------|---------|------|
| `hf_summarizer` | `<class 'str'>` | `None` | Summarizer model used to summarize the texts generated by the other methods. |
| `trust_remote_code` | `<class 'bool'>` | `False` | Whether to trust remote code from HF models. |
| `consider_video_caption_from_video` | `<class 'bool'>` | `True` | Whether the summarization considers the caption generated directly from the video. |
| `consider_video_caption_from_audio` | `<class 'bool'>` | `True` | Whether the summarization considers the caption generated from the video's audio streams. |
| `consider_video_caption_from_frames` | `<class 'bool'>` | `True` | Whether the summarization considers the caption generated from frames sampled from the video. |
| `consider_video_tags_from_audio` | `<class 'bool'>` | `True` | Whether the summarization considers the tags generated from the video's audio streams. |
| `consider_video_tags_from_frames` | `<class 'bool'>` | `True` | Whether the summarization considers the tags generated from frames sampled from the video. |
| `vid_cap_from_vid_args` | `typing.Optional[typing.Dict]` | `None` | Argument dict for captioning directly from the video. Keys are argument names and values are argument values. |
| `vid_cap_from_frm_args` | `typing.Optional[typing.Dict]` | `None` | Argument dict for captioning from frames sampled from the video. Keys are argument names and values are argument values. |
| `vid_tag_from_aud_args` | `typing.Optional[typing.Dict]` | `None` | Argument dict for tagging from the video's audio streams. Keys are argument names and values are argument values. |
| `vid_tag_from_frm_args` | `typing.Optional[typing.Dict]` | `None` | Argument dict for tagging from frames sampled from the video. Keys are argument names and values are argument values. |
| `keep_tag_num` | `typing.Annotated[int, Gt(gt=0)]` | `5` | Maximum number N of tags from sampled frames to keep. Too many tags can degrade the summarized text, so only the N most frequent tags are kept. |
| `keep_original_sample` | `<class 'bool'>` | `True` | Whether to keep the original sample. If set to False, the final dataset contains only the summarized captions and the original captions are removed. |
| `args` | | `''` | extra positional args |
| `kwargs` | | `''` | extra keyword args |
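
As a rough illustration of how the parameters above fit together, the sketch below writes them out as a plain Python config entry using the defaults from the table. The surrounding `process` list layout is an assumption about how operators are listed in a recipe, not something this page specifies. The helper `keep_most_frequent_tags` is hypothetical: it only illustrates what "keep the N most frequent tags" means for `keep_tag_num` and is not the operator's own code.

```python
from collections import Counter

OP_NAME = "video_captioning_from_summarizer_mapper"

# Defaults copied from the parameter table above.
op_args = {
    "hf_summarizer": None,
    "trust_remote_code": False,
    "consider_video_caption_from_video": True,
    "consider_video_caption_from_audio": True,
    "consider_video_caption_from_frames": True,
    "consider_video_tags_from_audio": True,
    "consider_video_tags_from_frames": True,
    "vid_cap_from_vid_args": None,
    "vid_cap_from_frm_args": None,
    "vid_tag_from_aud_args": None,
    "vid_tag_from_frm_args": None,
    "keep_tag_num": 5,
    "keep_original_sample": True,
}

# Assumption: operators are listed under a "process" key,
# one single-key mapping per operator.
config = {"process": [{OP_NAME: op_args}]}


def keep_most_frequent_tags(tags, keep_tag_num=5):
    """Hypothetical helper: return at most keep_tag_num tags, most frequent first."""
    if keep_tag_num <= 0:
        # Mirrors the Gt(gt=0) constraint listed in the table.
        raise ValueError("keep_tag_num must be greater than 0")
    return [tag for tag, _ in Counter(tags).most_common(keep_tag_num)]
```

For example, `keep_most_frequent_tags(["cat", "dog", "cat", "tree", "dog", "cat"], keep_tag_num=2)` returns `["cat", "dog"]`, dropping the least frequent tag.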

📊 Effect demonstration

Not available.