data_juicer.ops.filter.image_aesthetics_filter module

class data_juicer.ops.filter.image_aesthetics_filter.ImageAestheticsFilter(hf_scorer_model: str = '', trust_remote_code: bool = False, min_score: float = 0.5, max_score: float = 1.0, any_or_all: str = 'any', *args, **kwargs)[source]

Bases: Filter

Filter that keeps samples whose image aesthetics scores fall within a specified range.

__init__(hf_scorer_model: str = '', trust_remote_code: bool = False, min_score: float = 0.5, max_score: float = 1.0, any_or_all: str = 'any', *args, **kwargs)[source]

Initialization method.

Parameters:
  • hf_scorer_model – Hugging Face model name for the aesthetics predictor. When left empty (the default), ‘shunk031/aesthetics-predictor-v2-sac-logos-ava1-l14-linearMSE’ is used; see pypi.org/project/simple-aesthetics-predictor for details.

  • trust_remote_code – Whether to trust remote code when loading the Hugging Face model.

  • min_score – Minimum predicted aesthetics score for an image to meet the condition.

  • max_score – Maximum predicted aesthetics score for an image to meet the condition.

  • any_or_all – Strategy for combining the per-image results of a sample, either ‘any’ or ‘all’. ‘any’: keep the sample if at least one image meets the condition. ‘all’: keep the sample only if every image meets the condition.

  • args – Extra positional arguments.

  • kwargs – Extra keyword arguments.
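
The interaction between min_score, max_score and any_or_all described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: keep_sample is a hypothetical helper, the bounds are assumed to be inclusive, and the empty-list behaviour simply follows Python's any()/all() and may differ from what the filter does for samples without images.

```python
from typing import Sequence


def keep_sample(scores: Sequence[float],
                min_score: float = 0.5,
                max_score: float = 1.0,
                any_or_all: str = "any") -> bool:
    """Hypothetical helper (not part of data_juicer).

    Decides whether to keep a sample given one aesthetics score per image.
    """
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")

    # Assumption: both bounds are inclusive.
    in_range = [min_score <= s <= max_score for s in scores]

    if any_or_all == "any":
        # One qualifying image is enough to keep the sample.
        return any(in_range)
    # Every image must qualify.
    return all(in_range)


# A sample with one low-scoring and one high-scoring image.
mixed = [0.2, 0.7]
print(keep_sample(mixed, any_or_all="any"))
print(keep_sample(mixed, any_or_all="all"))
```

With the default range, the mixed sample is kept under ‘any’ and dropped under ‘all’, which is why ‘all’ is the stricter choice for multi-image samples.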

compute_stats_single(sample, rank=None, context=False)[source]

Compute the stats for the sample, which are used as the metric to decide whether to filter it out.

Parameters:
  • sample – input sample.

  • context – whether to temporarily store context information about intermediate variables in the sample.

Returns:

sample with computed stats
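
As a rough illustration of the compute-stats step, the sketch below attaches one score per image to a sample and skips the work if the scores are already present. Everything here is an assumption made for illustration: the field names "stats", "aesthetics_scores" and "images", the compute_stats helper, and the fake_scorer stand-in for the aesthetics model are hypothetical and do not reflect the library's actual keys or model call.

```python
from typing import Callable, Dict

STATS_FIELD = "stats"              # hypothetical field name
SCORES_KEY = "aesthetics_scores"   # hypothetical stats key
IMAGES_FIELD = "images"            # hypothetical field name


def compute_stats(sample: Dict, score_image: Callable[[str], float]) -> Dict:
    """Hypothetical helper: store one aesthetics score per image in the sample."""
    stats = sample.setdefault(STATS_FIELD, {})

    # Skip recomputation if the scores were already stored.
    if SCORES_KEY in stats:
        return sample

    stats[SCORES_KEY] = [score_image(path) for path in sample.get(IMAGES_FIELD, [])]
    return sample


def fake_scorer(path: str) -> float:
    """Stand-in for a real aesthetics predictor; returns fixed scores."""
    return 0.8 if "good" in path else 0.3


sample = compute_stats({IMAGES_FIELD: ["good.jpg", "blurry.jpg"]}, fake_scorer)
print(sample[STATS_FIELD][SCORES_KEY])
```

Separating score computation from the keep/drop decision means the scores only need to be computed once and can be reused when the thresholds change.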

process_single(sample)[source]

Sample-level decision: maps a sample to a Boolean.

Parameters:

sample – the sample to decide whether to filter

Returns:

True to keep the sample, False to filter it out