data_juicer.ops.filter.video_resolution_filter module

class data_juicer.ops.filter.video_resolution_filter.VideoResolutionFilter(min_width: int = 1, max_width: int = 9223372036854775807, min_height: int = 1, max_height: int = 9223372036854775807, any_or_all: str = 'any', *args, **kwargs)[source]

Bases: Filter

Keep data samples whose videos’ resolutions are within a specified range.

This operator filters data samples based on the resolution of the videos they contain. It keeps samples if the video resolutions fall within the defined width and height ranges. The filtering strategy can be set to ‘any’ or ‘all’:

  • ‘any’: Keeps the sample if any video meets the resolution criteria.

  • ‘all’: Keeps the sample only if all videos meet the resolution criteria.

The operator computes and caches the ‘video_width’ and ‘video_height’ for each video in the sample. If no videos are present, it sets these fields to empty arrays. These cached values are used to determine whether to keep or filter out the sample.

__init__(min_width: int = 1, max_width: int = 9223372036854775807, min_height: int = 1, max_height: int = 9223372036854775807, any_or_all: str = 'any', *args, **kwargs)[source]

Initialization method.

Parameters:
  • min_width – The minimum horizontal resolution (width), in pixels.

  • max_width – The maximum horizontal resolution (width), in pixels.

  • min_height – The minimum vertical resolution (height), in pixels.

  • max_height – The maximum vertical resolution (height), in pixels.

  • any_or_all – The strategy for combining the per-video results within a sample. ‘any’: keep the sample if any video meets the condition. ‘all’: keep the sample only if all videos meet the condition.

  • args – extra positional arguments

  • kwargs – extra keyword arguments

compute_stats_single(sample, context=False)[source]

Compute the stats for the sample that are used as metrics to decide whether to filter it.

Parameters:
  • sample – input sample.

  • context – whether to temporarily store context information about intermediate variables in the sample.

Returns:

sample with computed stats
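
The compute-and-cache behaviour described for this operator can be sketched roughly as follows. This is a standalone illustration, not the library's implementation: the function name compute_resolution_stats, the "stats" and "videos" keys, and the injected probe_resolution callable are all hypothetical, and plain lists stand in for the empty arrays mentioned above.

```python
from typing import Callable, Dict, Tuple


def compute_resolution_stats(
    sample: Dict,
    probe_resolution: Callable[[str], Tuple[int, int]],
) -> Dict:
    """Attach 'video_width' and 'video_height' stats to a sample (sketch).

    `probe_resolution` is a hypothetical callable that returns
    (width, height) for one video; the "stats" and "videos" keys
    are assumptions made for this illustration.
    """
    stats = sample.setdefault("stats", {})

    # Reuse cached values if both stats were already computed.
    if "video_width" in stats and "video_height" in stats:
        return sample

    videos = sample.get("videos", [])
    if not videos:
        # No videos: store empty sequences for both fields.
        stats["video_width"] = []
        stats["video_height"] = []
        return sample

    # Probe each video once and store widths and heights separately.
    sizes = [probe_resolution(video) for video in videos]
    stats["video_width"] = [width for width, _ in sizes]
    stats["video_height"] = [height for _, height in sizes]
    return sample
```

Passing the probe in as an argument keeps the sketch free of any particular video-decoding library; calling the function a second time on the same sample returns immediately because the cached fields are already present.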

process_single(sample)[source]

Sample-level decision: maps a sample to a Boolean.

Parameters:

sample – sample to decide whether to filter

Returns:

True to keep the sample, False to filter it out.
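
As a rough illustration of the ‘any’ / ‘all’ strategy, the keep decision can be sketched as a standalone function over already-computed widths and heights. The function keep_sample is hypothetical and does not call the library; treating the bounds as inclusive and keeping samples that contain no videos are assumptions of this sketch, not behaviour confirmed by this page.

```python
import sys
from typing import Sequence


def keep_sample(
    widths: Sequence[int],
    heights: Sequence[int],
    min_width: int = 1,
    max_width: int = sys.maxsize,  # 9223372036854775807 on 64-bit builds
    min_height: int = 1,
    max_height: int = sys.maxsize,
    any_or_all: str = "any",
) -> bool:
    """Return True to keep a sample, False to filter it out (sketch)."""
    if any_or_all not in ("any", "all"):
        raise ValueError("any_or_all must be 'any' or 'all'")

    # One Boolean per video: both width and height must be in range.
    # Inclusive bounds are an assumption of this sketch.
    in_range = [
        min_width <= w <= max_width and min_height <= h <= max_height
        for w, h in zip(widths, heights)
    ]

    # Assumption: a sample without videos is kept.
    if not in_range:
        return True

    return any(in_range) if any_or_all == "any" else all(in_range)
```

For a sample holding one 1920×1080 video and one 640×360 video, with min_width=1280 and min_height=720, this sketch keeps the sample under ‘any’ and filters it out under ‘all’.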