data_juicer.ops.filter.video_resolution_filter module
- class data_juicer.ops.filter.video_resolution_filter.VideoResolutionFilter(min_width: int = 1, max_width: int = 9223372036854775807, min_height: int = 1, max_height: int = 9223372036854775807, any_or_all: str = 'any', *args, **kwargs)
Bases: Filter
Keep data samples whose videos’ resolutions are within a specified range.
This operator filters data samples based on the resolution of the videos they contain. It keeps samples whose video resolutions fall within the defined width and height ranges. The filtering strategy can be set to ‘any’ or ‘all’:
- ‘any’: Keeps the sample if any video meets the resolution criteria.
- ‘all’: Keeps the sample only if all videos meet the resolution criteria.
The operator computes and caches the ‘video_width’ and ‘video_height’ for each video in the sample. If no videos are present, it sets these fields to empty arrays. These cached values are used to determine whether to keep or filter out the sample.
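The keep-or-drop decision described above can be illustrated with a minimal, self-contained sketch. It does not use Data-Juicer itself: `keep_sample` is a hypothetical helper, and two behaviors are assumptions rather than facts taken from this page, namely that the range bounds are inclusive and that a sample without videos is kept.

```python
import sys


def keep_sample(widths, heights,
                min_width=1, max_width=sys.maxsize,
                min_height=1, max_height=sys.maxsize,
                any_or_all="any"):
    """Hypothetical helper: decide whether to keep a sample from cached sizes."""
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")

    # Assumption: a sample that contains no videos is kept.
    if len(widths) == 0:
        return True

    # Assumption: both bounds are inclusive.
    in_range = [
        min_width <= w <= max_width and min_height <= h <= max_height
        for w, h in zip(widths, heights)
    ]
    return any(in_range) if any_or_all == "any" else all(in_range)


# One 1080p video and one 480p video, with a 720p minimum:
print(keep_sample([1920, 640], [1080, 480], min_width=1280, min_height=720))
# -> True ('any': the 1080p video is in range)
print(keep_sample([1920, 640], [1080, 480], min_width=1280, min_height=720,
                  any_or_all="all"))
# -> False ('all': the 480p video is out of range)
```

The explicit empty-sample branch matters in this sketch because Python's `any([])` is `False` while `all([])` is `True`, so without it the two strategies would disagree on samples that have no videos.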
- __init__(min_width: int = 1, max_width: int = 9223372036854775807, min_height: int = 1, max_height: int = 9223372036854775807, any_or_all: str = 'any', *args, **kwargs)
Initialization method.
- Parameters:
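min_width – The minimum horizontal resolution (video width), in pixels. Defaults to 1.
max_width – The maximum horizontal resolution (video width), in pixels. The default, 9223372036854775807, effectively imposes no upper limit.
min_height – The minimum vertical resolution (video height), in pixels. Defaults to 1.
max_height – The maximum vertical resolution (video height), in pixels. The default, 9223372036854775807, effectively imposes no upper limit.
any_or_all – The strategy used to combine the results for all videos in a sample. ‘any’: keep the sample if any video meets the condition. ‘all’: keep the sample only if all videos meet the condition. Defaults to ‘any’.
args – Extra positional arguments.
kwargs – Extra keyword arguments.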
- compute_stats_single(sample, context=False)
Compute the stats for the sample, which are used as metrics to decide whether to filter it out.
- Parameters:
sample – input sample.
context – whether to temporarily store context information about intermediate variables in the sample.
- Returns:
sample with computed stats
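As a rough illustration of the compute-and-cache behavior, the sketch below fills ‘video_width’ and ‘video_height’ once and reuses them on later calls. It is not the library's implementation: the function name, the `"videos"` and `"stats"` keys, and the `probe_size` callable are all hypothetical stand-ins, and plain Python lists take the place of arrays.

```python
def compute_stats_sketch(sample, probe_size, video_key="videos", stats_key="stats"):
    """Hypothetical sketch: cache per-video widths and heights in the sample.

    probe_size is an assumed callable that maps a video path to (width, height).
    """
    stats = sample.setdefault(stats_key, {})

    # Reuse cached values instead of probing the videos again.
    if "video_width" in stats and "video_height" in stats:
        return sample

    # A sample without videos gets empty lists for both fields.
    paths = sample.get(video_key) or []
    sizes = [probe_size(path) for path in paths]
    stats["video_width"] = [width for width, _ in sizes]
    stats["video_height"] = [height for _, height in sizes]
    return sample


# Usage with a fake prober, so no real video files are needed.
fake_sizes = {"a.mp4": (1920, 1080), "b.mp4": (640, 480)}
sample = compute_stats_sketch({"videos": ["a.mp4", "b.mp4"]}, fake_sizes.__getitem__)
print(sample["stats"])
# -> {'video_width': [1920, 640], 'video_height': [1080, 480]}
```

Passing the prober in as a parameter keeps the sketch independent of any particular video-decoding library.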