data_juicer.ops.mapper.video_split_by_key_frame_mapper module

data_juicer.ops.mapper.video_split_by_key_frame_mapper.create_replacer(replacements)[source]
class data_juicer.ops.mapper.video_split_by_key_frame_mapper.VideoSplitByKeyFrameMapper(keep_original_sample: bool = True, save_dir: str = None, *args, **kwargs)[source]

Bases: Mapper

Splits a video into segments based on key frames.

This operator splits each video in a sample into multiple segments at key frame boundaries. The original sample is kept or discarded according to the keep_original_sample parameter. If save_dir is specified, the split video files are saved in that directory; otherwise, they are saved in the same directory as the input files. For each video, the operator updates the sample with the new video keys and text placeholders, and updates the Fields.source_file field to reflect the new video segments. The operator works in batch mode, processing multiple samples at once.
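As a rough illustration of splitting at key frame boundaries, the sketch below turns a list of key frame timestamps into (start, end) segment pairs. It is not the operator's actual implementation: the function name segments_from_key_frames, the use of timestamps in seconds, and the rule that the last segment runs to the end of the video are all assumptions made for this example.

```python
from typing import List, Tuple


def segments_from_key_frames(
    key_frame_times: List[float], duration: float
) -> List[Tuple[float, float]]:
    """Hypothetical helper: cut [0, duration) at the given key frame times.

    Assumption: each segment starts at a key frame and ends at the next
    key frame; the last segment ends at the end of the video.
    """
    # Sort, de-duplicate, and ignore timestamps outside the video.
    starts = sorted({t for t in key_frame_times if 0 <= t < duration})
    if not starts:
        # No usable key frames: treat the whole video as one segment.
        return [(0.0, duration)] if duration > 0 else []
    # Each segment ends where the next one starts.
    ends = starts[1:] + [duration]
    return list(zip(starts, ends))


segments = segments_from_key_frames([0.0, 2.5, 6.0], duration=10.0)
print(segments)
```

Each resulting pair would correspond to one output video file, and so to one new video key in the sample.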

__init__(keep_original_sample: bool = True, save_dir: str = None, *args, **kwargs)[source]

Initialization method.

Parameters:
  • keep_original_sample -- Whether to keep the original sample. If set to False, the final dataset contains only the split samples and the original sample is removed. Defaults to True.

  • save_dir -- The directory where generated video files will be stored. If not specified, outputs will be saved in the same directory as their corresponding input files. This path can alternatively be defined by setting the DJ_PRODUCED_DATA_DIR environment variable.

  • args -- extra positional arguments

  • kwargs -- extra keyword arguments
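The two documented parameters can be pictured with a small standalone sketch. This is not Data-Juicer code: resolve_output_dir and collect_samples are hypothetical helpers, and both the precedence (an explicit save_dir wins over DJ_PRODUCED_DATA_DIR, which wins over the input file's directory) and the ordering of the returned samples are assumptions that the description above does not spell out.

```python
import os
from typing import List, Optional


def resolve_output_dir(input_path: str, save_dir: Optional[str] = None) -> str:
    """Hypothetical helper: pick the directory for generated video files."""
    if save_dir:
        # Assumption: an explicit save_dir takes priority.
        return save_dir
    env_dir = os.environ.get("DJ_PRODUCED_DATA_DIR")
    if env_dir:
        # Assumption: the environment variable is the next choice.
        return env_dir
    # Documented fallback: same directory as the input file.
    return os.path.dirname(input_path)


def collect_samples(
    original: dict, split_samples: List[dict], keep_original_sample: bool = True
) -> List[dict]:
    """Hypothetical helper: mimic the effect of keep_original_sample."""
    if keep_original_sample:
        # Assumption: the original sample is listed before its splits.
        return [original] + split_samples
    return list(split_samples)


print(resolve_output_dir("/data/in/clip.mp4", save_dir="/data/out"))
```

With keep_original_sample=False, collect_samples returns only the split samples, which matches the documented behavior of removing the original sample from the final dataset.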

get_split_key_frame(video_key, container)[source]
process_batched(samples)[source]