data_juicer.ops.filter.image_shape_filter module
- class data_juicer.ops.filter.image_shape_filter.ImageShapeFilter(min_width: int = 1, max_width: int = 9223372036854775807, min_height: int = 1, max_height: int = 9223372036854775807, any_or_all: str = 'any', *args, **kwargs)
Bases: Filter
Filter to keep samples with image shape (width, height) within specific ranges.
This operator reads the width and height of every image in a sample and keeps the sample when those dimensions fall within the specified ranges. The any_or_all parameter selects one of two strategies: in ‘any’ mode, a sample is kept if at least one image meets the criteria; in ‘all’ mode, every image in the sample must meet them. The computed widths and heights are stored in the ‘image_width’ and ‘image_height’ fields of the sample’s stats. If the sample contains no images, both fields are empty arrays.
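The keep rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: keep_sample is a hypothetical helper, the bounds are assumed to be inclusive, and a sample with no images is assumed to be kept. The default upper bound 9223372036854775807 equals sys.maxsize on 64-bit Python builds.

```python
import sys


def keep_sample(widths, heights,
                min_width=1, max_width=sys.maxsize,
                min_height=1, max_height=sys.maxsize,
                any_or_all="any"):
    """Hypothetical helper mirroring the documented 'any'/'all' rule."""
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")

    # Assumption: a sample without images passes the filter unchanged.
    if len(widths) == 0:
        return True

    # One flag per image: True when both dimensions are inside the ranges.
    # Assumption: both bounds are inclusive.
    flags = [
        min_width <= w <= max_width and min_height <= h <= max_height
        for w, h in zip(widths, heights)
    ]
    return any(flags) if any_or_all == "any" else all(flags)


# A sample with one small and one large image, filtered to at most 500 x 500.
widths, heights = [100, 2000], [100, 2000]
print(keep_sample(widths, heights, max_width=500, max_height=500))
print(keep_sample(widths, heights, max_width=500, max_height=500, any_or_all="all"))
```

With these inputs, the ‘any’ call keeps the sample because the first image fits, while the ‘all’ call drops it because the second image is too large.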
- __init__(min_width: int = 1, max_width: int = 9223372036854775807, min_height: int = 1, max_height: int = 9223372036854775807, any_or_all: str = 'any', *args, **kwargs)
Initialization method.
- Parameters:
min_width – Minimum image width for a sample to be kept.
max_width – Maximum image width for a sample to be kept.
min_height – Minimum image height for a sample to be kept.
max_height – Maximum image height for a sample to be kept.
any_or_all – Strategy applied across all images in a sample. ‘any’: keep the sample if at least one image meets the condition. ‘all’: keep the sample only if every image meets the condition.
args – extra positional args
kwargs – extra keyword args
- compute_stats_single(sample, context=False)
Compute the stats for the sample, which serve as the metric for deciding whether to filter it out.
- Parameters:
sample – input sample.
context – whether to store context information of intermediate vars in the sample temporarily.
- Returns:
sample with computed stats
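As a rough picture of what the returned sample looks like, the sketch below fills the ‘image_width’ and ‘image_height’ stats from a list of (width, height) pairs. It is only an assumption-laden illustration: compute_shape_stats is a hypothetical function, the "stats" key is a stand-in for wherever the library keeps a sample's stats, and the sizes are passed in directly rather than read from image files.

```python
def compute_shape_stats(sample, image_sizes):
    """Hypothetical stand-in for the stats step.

    sample:      dict representing one sample.
    image_sizes: list of (width, height) tuples, one per image.
    """
    # Assumption: stats live under a "stats" key of the sample dict.
    stats = sample.setdefault("stats", {})

    # One entry per image; both lists are empty when there are no images.
    stats["image_width"] = [w for w, _ in image_sizes]
    stats["image_height"] = [h for _, h in image_sizes]
    return sample


sample = compute_shape_stats({"text": "two images"}, [(640, 480), (32, 64)])
print(sample["stats"])
```

A sample with no images ends up with two empty lists, matching the empty-array behavior described in the class documentation.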