data_juicer.ops.mapper.video_split_by_scene_mapper module

data_juicer.ops.mapper.video_split_by_scene_mapper.replace_func(match, scene_counts_iter)[source]
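
The signature suggests a callback for `re.sub` that draws from an iterator of per-video scene counts. The following is a minimal sketch under that assumption. The `VIDEO_TOKEN` value, the helper `expand_video_tokens`, and the function body are illustrative and are not taken from the library source.

```python
import re
from functools import partial

# Hypothetical placeholder token; the token used by data_juicer may differ.
VIDEO_TOKEN = "<video>"


def replace_func_sketch(match, scene_counts_iter):
    """Replace one matched video token with one token per detected scene."""
    try:
        count = next(scene_counts_iter)
    except StopIteration:
        # No count left for this match: keep the original text unchanged.
        return match.group(0)
    return VIDEO_TOKEN * count


def expand_video_tokens(text, scene_counts):
    """Expand each video token in `text` by the corresponding scene count."""
    counts_iter = iter(scene_counts)
    callback = partial(replace_func_sketch, scene_counts_iter=counts_iter)
    return re.sub(re.escape(VIDEO_TOKEN), callback, text)


if __name__ == "__main__":
    # Two source videos that were split into 2 and 3 clips respectively.
    print(expand_video_tokens("<video> a <video> b", [2, 3]))
```

Binding the iterator with `functools.partial` lets each successive match consume the next count, so the order of tokens in the text is assumed to match the order of the videos.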
class data_juicer.ops.mapper.video_split_by_scene_mapper.VideoSplitBySceneMapper(detector: str = 'ContentDetector', threshold: Annotated[float, Ge(ge=0)] = 27.0, min_scene_len: Annotated[int, Ge(ge=0)] = 15, show_progress: bool = False, save_dir: str = None, *args, **kwargs)[source]

Bases: Mapper

Mapper to cut videos into scene clips.

avaliable_detectors = {'AdaptiveDetector': ['window_width', 'min_content_val', 'weights', 'luma_only', 'kernel_size', 'video_manager', 'min_delta_hsv'], 'ContentDetector': ['weights', 'luma_only', 'kernel_size'], 'ThresholdDetector': ['fade_bias', 'add_final_scene', 'method', 'block_size']}
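
One plausible use of this table is to pick out, from the extra keyword arguments, only those that the chosen detector accepts. The sketch below illustrates that idea. The helper name `select_detector_kwargs` is hypothetical, and whether the mapper filters `kwargs` in exactly this way is an assumption, not something this page states.

```python
# Copied from the class attribute above (identifier spelling as in the source).
avaliable_detectors = {
    "AdaptiveDetector": [
        "window_width", "min_content_val", "weights", "luma_only",
        "kernel_size", "video_manager", "min_delta_hsv",
    ],
    "ContentDetector": ["weights", "luma_only", "kernel_size"],
    "ThresholdDetector": ["fade_bias", "add_final_scene", "method", "block_size"],
}


def select_detector_kwargs(detector, **kwargs):
    """Keep only the keyword arguments listed for the given detector."""
    if detector not in avaliable_detectors:
        raise ValueError(
            f"Unknown detector {detector!r}; "
            f"expected one of {sorted(avaliable_detectors)}"
        )
    allowed = avaliable_detectors[detector]
    return {key: kwargs[key] for key in allowed if key in kwargs}


if __name__ == "__main__":
    # fade_bias belongs to ThresholdDetector, so it is dropped here.
    print(select_detector_kwargs("ContentDetector", luma_only=True, fade_bias=0.1))
```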
__init__(detector: str = 'ContentDetector', threshold: Annotated[float, Ge(ge=0)] = 27.0, min_scene_len: Annotated[int, Ge(ge=0)] = 15, show_progress: bool = False, save_dir: str = None, *args, **kwargs)[source]

Initialization method.

Parameters:
  • detector – Algorithm from scenedetect.detectors. Should be one of [‘ContentDetector’, ‘ThresholdDetector’, ‘AdaptiveDetector’].

  • threshold – Threshold passed to the detector.

  • min_scene_len – Minimum length of any scene, in frames.

  • show_progress – Whether to show progress from scenedetect.

  • save_dir – The directory where generated video files will be stored. If not specified, outputs will be saved in the same directory as their corresponding input files. This path can alternatively be defined by setting the DJ_PRODUCED_DATA_DIR environment variable.

  • args – extra positional arguments

  • kwargs – extra keyword arguments
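
To make the interaction between `threshold` and `min_scene_len` concrete, here is a toy model of cut detection. It assumes a per-frame change score is compared against `threshold` and that `min_scene_len` counts frames since the last cut. It is a simplification for illustration, not the scenedetect implementation, and the function name `find_cuts` and the input scores are made up.

```python
def find_cuts(frame_scores, threshold=27.0, min_scene_len=15):
    """Return frame indices where a new scene starts (toy model).

    A cut is placed at a frame when its score reaches `threshold` and at
    least `min_scene_len` frames have passed since the previous cut
    (or since the start of the video).
    """
    cuts = []
    last_cut = 0
    for idx, score in enumerate(frame_scores):
        if score >= threshold and idx - last_cut >= min_scene_len:
            cuts.append(idx)
            last_cut = idx
    return cuts


if __name__ == "__main__":
    # Made-up scores: strong changes at frames 10, 20 and 25.
    scores = [0.0] * 40
    for frame in (10, 20, 25):
        scores[frame] = 50.0
    print(find_cuts(scores))                   # defaults suppress short scenes
    print(find_cuts(scores, min_scene_len=5))  # shorter scenes are allowed
```

In this model, raising `min_scene_len` suppresses cuts that would produce very short clips, while raising `threshold` makes the detector less sensitive to small frame-to-frame changes.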

process_single(sample, context=False)[source]

Processes a single sample: sample -> sample.

Parameters:

sample – sample to process

Returns:

processed sample