data_juicer.ops.filter.audio_duration_filter module
- class data_juicer.ops.filter.audio_duration_filter.AudioDurationFilter(min_duration: int = 0, max_duration: int = 9223372036854775807, any_or_all: str = 'any', *args, **kwargs)
Bases: Filter
Keep data samples whose audio durations fall within a specified range.
- __init__(min_duration: int = 0, max_duration: int = 9223372036854775807, any_or_all: str = 'any', *args, **kwargs)
Initialization method.
- Parameters:
min_duration – Minimum audio duration, in seconds, for a sample to be kept. Defaults to 0.
max_duration – Maximum audio duration, in seconds, for a sample to be kept. Defaults to sys.maxsize (9223372036854775807 on 64-bit platforms).
any_or_all – Strategy applied when a sample contains several audios. ‘any’: keep the sample if at least one audio meets the condition. ‘all’: keep the sample only if every audio meets the condition.
args – extra positional arguments
kwargs – extra keyword arguments
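The interaction between the duration range and the ‘any’/‘all’ strategy can be sketched in plain Python. This is a minimal illustration, not the operator's actual code: the function name keep_sample is hypothetical, the bounds are assumed to be inclusive, and keeping samples that contain no audio at all is an assumption as well.

```python
import sys
from typing import Sequence


def keep_sample(
    durations: Sequence[float],
    min_duration: float = 0,
    max_duration: float = sys.maxsize,
    any_or_all: str = "any",
) -> bool:
    """Hypothetical helper: decide whether a sample passes the duration filter."""
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")

    # Assumption: a sample without any audio is kept unchanged.
    if len(durations) == 0:
        return True

    # Assumption: both bounds are inclusive.
    in_range = [min_duration <= d <= max_duration for d in durations]

    # 'any': one audio in range is enough; 'all': every audio must be in range.
    return any(in_range) if any_or_all == "any" else all(in_range)


# A sample with a 3.0 s clip and a 12.5 s clip, filtered to the range 5..20 s.
durations = [3.0, 12.5]
kept_with_any = keep_sample(durations, 5, 20, "any")  # True: 12.5 is in range
kept_with_all = keep_sample(durations, 5, 20, "all")  # False: 3.0 is too short
```

With ‘any’, a single in-range clip is enough to keep the whole sample, so ‘all’ is the stricter choice when every clip in a sample must satisfy the limits.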
- compute_stats_single(sample, context=False)
Compute the stats for a sample. These stats serve as the metric for deciding whether to filter the sample out.
- Parameters:
sample – input sample.
context – whether to temporarily store intermediate variables in the sample as context information.
- Returns:
sample with computed stats
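As a rough illustration of this stats step, the sketch below attaches one duration per audio to a sample, using only the standard-library wave module on in-memory WAV data. The helper names and the dictionary keys ("audios", "stats", "audio_duration") are hypothetical, and the real operator may load audio and name its fields differently.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of an in-memory WAV file: frame count divided by frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def compute_duration_stats(sample: dict) -> dict:
    """Hypothetical stand-in for the stats step; key names are illustrative."""
    stats = sample.setdefault("stats", {})
    # Skip the work if the durations were already computed for this sample.
    if "audio_duration" not in stats:
        stats["audio_duration"] = [
            wav_duration_seconds(audio) for audio in sample.get("audios", [])
        ]
    return sample


def make_silence(seconds: int, rate: int = 8000) -> bytes:
    """Build a mono 16-bit WAV of silence, used here only as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()


sample = compute_duration_stats({"audios": [make_silence(2), make_silence(3)]})
```

Storing the durations on the sample keeps the measurement separate from the keep-or-drop decision, so the decision can be re-run with different limits without reloading the audio.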