data_juicer.ops.mapper.video_split_by_key_frame_mapper module
- class data_juicer.ops.mapper.video_split_by_key_frame_mapper.VideoSplitByKeyFrameMapper(keep_original_sample: bool = True, save_dir: str = None, *args, **kwargs)
Bases: Mapper
Splits a video into segments based on key frames.
This operator splits each video in a sample into multiple segments, cutting at key frame boundaries. The keep_original_sample parameter controls whether the original sample is kept or discarded. If save_dir is specified, the split video files are saved in that directory; otherwise, they are saved in the same directory as the input files. The operator processes each video in the sample and updates the sample with the new video keys and text placeholders. The Fields.source_file field is updated to reflect the new video segments. This operator works in batch mode, processing multiple samples at once.
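As a rough illustration of what splitting at key frame boundaries means, the sketch below turns a list of key frame timestamps into (start, end) segments. The helper name segments_from_key_frames and its inputs are hypothetical and are not part of Data-Juicer; the operator's real implementation works on video streams and may differ in detail.

```python
from typing import List, Tuple


def segments_from_key_frames(
    key_frame_times: List[float], duration: float
) -> List[Tuple[float, float]]:
    """Hypothetical helper: derive (start, end) segments from key frame times.

    Each segment starts at a key frame and ends at the next key frame,
    or at the end of the video for the last segment.
    """
    # Keep only timestamps inside the video, without duplicates, in order.
    starts = sorted({t for t in key_frame_times if 0 <= t < duration})
    # Every segment ends where the next one starts; the last ends at `duration`.
    ends = starts[1:] + [duration]
    return list(zip(starts, ends))


# Key frames at 0 s, 2.5 s and 7 s in a 10 s video give three segments.
segments = segments_from_key_frames([0.0, 2.5, 7.0], 10.0)
```

In this sketch a video with three key frames yields three segments, which corresponds to one input video being replaced by three new video entries in the sample.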
- __init__(keep_original_sample: bool = True, save_dir: str = None, *args, **kwargs)
Initialization method.
- Parameters:
keep_original_sample – whether to keep the original sample. If set to False, the final dataset will contain only the split samples and the original sample will be removed. Defaults to True.
save_dir – The directory where generated video files will be stored. If not specified, outputs will be saved in the same directory as their corresponding input files. This path can alternatively be defined by setting the DJ_PRODUCED_DATA_DIR environment variable.
args – extra positional arguments
kwargs – extra keyword arguments
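The save_dir description above names three possible output locations. A minimal sketch of how they might be resolved follows; the helper resolve_output_dir is hypothetical, and the precedence shown (explicit save_dir, then DJ_PRODUCED_DATA_DIR, then the input file's directory) is an assumption rather than something the documentation states.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_output_dir(
    input_path: str,
    save_dir: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Hypothetical helper: pick the directory for generated video files."""
    env = os.environ if env is None else env
    # Assumed precedence 1: an explicitly passed save_dir.
    if save_dir:
        return Path(save_dir)
    # Assumed precedence 2: the DJ_PRODUCED_DATA_DIR environment variable.
    env_dir = env.get("DJ_PRODUCED_DATA_DIR")
    if env_dir:
        return Path(env_dir)
    # Fallback described in the docs: the directory of the input file.
    return Path(input_path).parent
```

Passing env explicitly, as in resolve_output_dir("/data/in/a.mp4", env={}), keeps the sketch easy to try without touching the real process environment.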