data_juicer.ops.mapper.video_split_by_duration_mapper module

data_juicer.ops.mapper.video_split_by_duration_mapper.create_replacer(replacements)[source]
class data_juicer.ops.mapper.video_split_by_duration_mapper.VideoSplitByDurationMapper(split_duration: float = 10, min_last_split_duration: float = 0, keep_original_sample: bool = True, save_dir: str = None, *args, **kwargs)[source]

Bases: Mapper

Mapper to split video by duration.

__init__(split_duration: float = 10, min_last_split_duration: float = 0, keep_original_sample: bool = True, save_dir: str = None, *args, **kwargs)[source]

Initialization method.

Parameters:
  • split_duration – The duration of each video split, in seconds.

  • min_last_split_duration – The minimum allowable duration in seconds for the last video split. If the duration of the last split is less than this value, it will be discarded.

  • keep_original_sample – Whether to keep the original sample. If set to False, only the split samples remain in the final dataset and the original sample is removed. Defaults to True.

  • save_dir – The directory where generated video files will be stored. If not specified, outputs will be saved in the same directory as their corresponding input files. This path can alternatively be defined by setting the DJ_PRODUCED_DATA_DIR environment variable.

  • args – extra positional arguments

  • kwargs – extra keyword arguments
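
The interaction between split_duration and min_last_split_duration described above can be illustrated with a small standalone sketch. It does not call Data-Juicer and is not the library's implementation; the helper name split_boundaries and its total_duration argument are hypothetical, introduced only to show which time ranges would be kept under the documented rule (a trailing split shorter than min_last_split_duration is discarded).

```python
def split_boundaries(total_duration, split_duration=10, min_last_split_duration=0):
    """Return (start, end) pairs, in seconds, for a video of total_duration seconds.

    Hypothetical helper for illustration only; it mirrors the documented
    parameter semantics, not the actual Data-Juicer code.
    """
    if split_duration <= 0:
        raise ValueError("split_duration must be positive")

    boundaries = []
    index = 0
    # Walk through the video in steps of split_duration; the final
    # segment is clipped to the end of the video.
    while index * split_duration < total_duration:
        start = index * split_duration
        end = min(start + split_duration, total_duration)
        boundaries.append((start, end))
        index += 1

    # Discard the last split if it is shorter than the allowed minimum.
    if boundaries:
        last_start, last_end = boundaries[-1]
        if last_end - last_start < min_last_split_duration:
            boundaries.pop()

    return boundaries


# A 25-second video with the default 10-second splits.
print(split_boundaries(25))
# The same video, but a trailing split under 6 seconds is dropped.
print(split_boundaries(25, min_last_split_duration=6))
```

Under these assumptions, a 25-second input yields three ranges by default, and only the two full 10-second ranges once min_last_split_duration exceeds the 5-second remainder.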

split_videos_by_duration(video_key, container)[source]
process_batched(samples)[source]