data_juicer.ops.filter.perplexity_filter module
- class data_juicer.ops.filter.perplexity_filter.PerplexityFilter(lang: str = 'en', min_ppl: float = 0, max_ppl: float = 1500, *args, **kwargs)
Bases:
Filter
Filter that keeps samples whose perplexity score falls within a specified range.
- __init__(lang: str = 'en', min_ppl: float = 0, max_ppl: float = 1500, *args, **kwargs)
Initialization method.
- Parameters:
lang -- Language of the samples for which perplexity is computed.
min_ppl -- Minimum perplexity a sample may have to be kept by this op.
max_ppl -- Maximum perplexity a sample may have to be kept by this op.
args -- Extra positional arguments.
kwargs -- Extra keyword arguments.
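
The range check described by min_ppl and max_ppl can be illustrated with a minimal, standalone sketch. This is not the PerplexityFilter implementation: the helpers `perplexity` and `keep_sample` are hypothetical names introduced only for illustration, perplexity is computed here from natural-log token probabilities rather than from the language model the op actually uses, and treating both bounds as inclusive is an assumption.

```python
import math


def perplexity(token_logprobs):
    """Hypothetical helper: perplexity as exp(-mean(log p)).

    `token_logprobs` holds natural-log probabilities, one per token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def keep_sample(ppl, min_ppl=0.0, max_ppl=1500.0):
    """Hypothetical helper: True if `ppl` lies within [min_ppl, max_ppl].

    Defaults mirror the documented signature; inclusive bounds are an
    assumption, not confirmed by the documentation above.
    """
    return min_ppl <= ppl <= max_ppl


# Made-up token log-probabilities for two samples.
fluent = [math.log(0.5)] * 4      # every token has probability 0.5
noisy = [math.log(0.0001)] * 4    # every token has probability 0.0001

for name, logprobs in [("fluent", fluent), ("noisy", noisy)]:
    ppl = perplexity(logprobs)
    print(name, round(ppl, 2), "keep" if keep_sample(ppl) else "drop")
```

Lower perplexity means the language model finds the text more predictable, so a max_ppl threshold tends to drop garbled or off-language text, while a nonzero min_ppl can be used to drop highly repetitive text.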