data_juicer.ops.filter.video_aesthetics_filter module

class data_juicer.ops.filter.video_aesthetics_filter.VideoAestheticsFilter(hf_scorer_model: str = '', trust_remote_code: bool = False, min_score: float = 0.4, max_score: float = 1.0, frame_sampling_method: str = 'uniform', frame_num: Annotated[int, Gt(gt=0)] = 3, any_or_all: str = 'any', reduce_mode: str = 'avg', *args, **kwargs)

Bases: Filter

Filter that keeps data samples whose aesthetics scores, computed on frames sampled from their videos, fall within a specified range.

This operator evaluates the aesthetic quality of video frames using a Hugging Face model. It keeps samples where the aesthetics scores of the specified frames fall within a given range. The key metric, ‘video_frames_aesthetics_score’, is computed by taking the average, maximum, or minimum of the frame scores, depending on the reduce mode. Frames can be sampled uniformly or by extracting all keyframes. The filter applies an ‘any’ or ‘all’ strategy to decide whether a sample should be kept based on the scores of its videos.
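The reduce step described above can be illustrated with a small standalone sketch. It does not call Data-Juicer; the helper name reduce_frame_scores is hypothetical, and the frame scores are made-up values standing in for the model's predictions.

```python
from statistics import mean


def reduce_frame_scores(frame_scores, reduce_mode="avg"):
    """Collapse the per-frame scores of one video into a single score.

    Hypothetical helper for illustration only; not part of Data-Juicer.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    reducers = {"avg": mean, "max": max, "min": min}
    if reduce_mode not in reducers:
        raise ValueError(f"reduce_mode must be one of {sorted(reducers)}")
    return float(reducers[reduce_mode](frame_scores))


# Made-up scores for three sampled frames of one video.
scores = [0.25, 0.5, 0.75]
for mode in ("avg", "max", "min"):
    print(mode, reduce_frame_scores(scores, mode))
```

With frame_num=3 and reduce_mode='min', a single low-scoring frame is enough to pull the video's score down, so 'min' is the strictest of the three modes and 'max' the most lenient.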

__init__(hf_scorer_model: str = '', trust_remote_code: bool = False, min_score: float = 0.4, max_score: float = 1.0, frame_sampling_method: str = 'uniform', frame_num: Annotated[int, Gt(gt=0)] = 3, any_or_all: str = 'any', reduce_mode: str = 'avg', *args, **kwargs)

Initialization method.

Parameters:

hf_scorer_model – Hugging Face model name for the aesthetics predictor. Defaults to ‘shunk031/aesthetics-predictor-v2-sac-logos-ava1-l14-linearMSE’; see pypi.org/project/simple-aesthetics-predictor
trust_remote_code – whether to trust the remote code of HF models.
min_score – Min score for the predicted aesthetics in a video.
max_score – Max score for the predicted aesthetics in a video.
frame_sampling_method – method used to extract frame images from the videos. Should be one of [“all_keyframes”, “uniform”]. The former extracts all keyframes and the latter extracts a specified number of frames uniformly from the video. Default: “uniform” with frame_num=3, because the number of keyframes can be large while their aesthetics usually differ little.
frame_num – the number of frames to extract uniformly from the video. Only takes effect when frame_sampling_method is “uniform”. If it is 1, only the middle frame is extracted. If it is 2, only the first and last frames are extracted. If it is larger than 2, the first and last frames are extracted along with additional frames spaced uniformly over the video duration.
any_or_all – strategy for keeping a sample based on all of its videos. ‘any’: keep the sample if any video meets the condition. ‘all’: keep the sample only if all videos meet the condition.
reduce_mode – how to reduce the scores when one sample corresponds to multiple frames; must be one of [‘avg’, ‘max’, ‘min’]. ‘avg’: take the average of the values; ‘max’: take the maximum of the values; ‘min’: take the minimum of the values.
args – Extra positional arguments.
kwargs – Extra keyword arguments.
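The frame_num rules for “uniform” sampling can be sketched as an index-selection helper. This is only an illustration under stated assumptions: uniform_frame_indices and total_frames are hypothetical names, and the operator itself may select frames by timestamp rather than by frame index.

```python
def uniform_frame_indices(total_frames, frame_num):
    """Pick frame indices following the documented frame_num rules.

    Hypothetical helper for illustration only; not part of Data-Juicer.
    """
    if total_frames <= 0 or frame_num <= 0:
        raise ValueError("total_frames and frame_num must be positive")
    last = total_frames - 1
    if frame_num == 1:
        # Only the middle frame.
        return [total_frames // 2]
    if frame_num == 2:
        # Only the first and the last frames.
        return [0, last]
    # First and last frames plus evenly spaced frames in between.
    # Integer division keeps the result deterministic.
    return [(i * last) // (frame_num - 1) for i in range(frame_num)]


print(uniform_frame_indices(101, 1))
print(uniform_frame_indices(101, 2))
print(uniform_frame_indices(101, 5))
```

In this sketch, asking for more frames than the video contains yields repeated indices; how the operator handles that case is not stated in this reference.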

compute_stats_single(sample, rank=None, context=False)

Compute stats for the sample which is used as a metric to decide whether to filter this sample.

Parameters:

sample – input sample.
context – whether to store context information of intermediate vars in the sample temporarily.

Returns:

sample with computed stats

process_single(sample)

Sample-level decision: maps a sample to a Boolean.

Parameters:

sample – sample to decide whether to filter

Returns:

True to keep the sample, False to filter it out
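The keep-or-filter decision can be sketched as follows, assuming each video has already been reduced to one score. The helper keep_sample is hypothetical, the bounds are assumed to be inclusive, and the behavior for a sample with no videos is a choice of this sketch rather than something this reference specifies.

```python
def keep_sample(video_scores, min_score=0.4, max_score=1.0, any_or_all="any"):
    """Return True to keep a sample, False to filter it out.

    Hypothetical helper for illustration only; not part of Data-Juicer.
    Assumes inclusive bounds, which this reference does not confirm.
    """
    if any_or_all not in ("any", "all"):
        raise ValueError("any_or_all must be 'any' or 'all'")
    # One Boolean per video: does its score fall within the range?
    in_range = [min_score <= score <= max_score for score in video_scores]
    # With an empty list, any() is False and all() is True; that is simply
    # how this sketch behaves, not documented operator behavior.
    return any(in_range) if any_or_all == "any" else all(in_range)


# Made-up per-video scores for a sample that contains two videos.
print(keep_sample([0.3, 0.6], any_or_all="any"))
print(keep_sample([0.3, 0.6], any_or_all="all"))
```

With the defaults, the first call keeps the sample because one video scores at least 0.4, while the second filters it out because the other video falls below the minimum.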