data_juicer.ops.filter.perplexity_filter module
- class data_juicer.ops.filter.perplexity_filter.PerplexityFilter(lang: str = 'en', min_ppl: float = 0, max_ppl: float = 1500, *args, **kwargs)
Bases: Filter
Filter that keeps samples whose perplexity score falls within a specified range.
- __init__(lang: str = 'en', min_ppl: float = 0, max_ppl: float = 1500, *args, **kwargs)
Initialization method.
- Parameters:
lang – Language of the samples for which perplexity is computed.
min_ppl – Minimum perplexity a sample may have and still be kept by this op.
max_ppl – Maximum perplexity a sample may have and still be kept by this op.
args – Extra positional arguments.
kwargs – Extra keyword arguments.
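The range check described by min_ppl and max_ppl can be illustrated with a small standalone sketch. This is not the Data-Juicer implementation: the helper names `unigram_perplexity` and `keep_sample` are hypothetical, the add-one-smoothed unigram model is a toy stand-in for whatever language model the op uses, and treating both bounds as inclusive is an assumption. Only the default values 0 and 1500 come from the signature above.

```python
import math
from collections import Counter


def unigram_perplexity(text, counts, total, vocab_size):
    """Toy perplexity under an add-one-smoothed unigram model.

    Illustrative stand-in only; it is not the model used by the op.
    """
    tokens = text.lower().split()
    if not tokens:
        # No tokens to score: treat as infinitely surprising.
        return float("inf")
    log_prob = 0.0
    for tok in tokens:
        # The extra +1 in the denominator reserves mass for unseen tokens.
        p = (counts.get(tok, 0) + 1) / (total + vocab_size + 1)
        log_prob += math.log(p)
    # Perplexity is exp of the average negative log-likelihood per token.
    return math.exp(-log_prob / len(tokens))


def keep_sample(ppl, min_ppl=0, max_ppl=1500):
    """Keep a sample when its perplexity lies in [min_ppl, max_ppl].

    Inclusive bounds are an assumption made for this sketch.
    """
    return min_ppl <= ppl <= max_ppl


# Build the toy model from a tiny reference corpus.
corpus = "the cat sat on the mat the dog sat on the rug".split()
counts = Counter(corpus)
total = len(corpus)
vocab_size = len(counts)

samples = ["the cat sat", "zzz qqq xxx"]
scores = [unigram_perplexity(s, counts, total, vocab_size) for s in samples]

# With a tight upper bound, only the sample the toy model finds likely survives.
kept = [s for s, ppl in zip(samples, scores) if keep_sample(ppl, max_ppl=10)]
```

Lower perplexity means the model finds the text more predictable, so lowering max_ppl makes the filter stricter, while raising min_ppl drops text that is suspiciously predictable, such as highly repetitive samples.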