data_juicer.ops.filter.perplexity_filter module

class data_juicer.ops.filter.perplexity_filter.PerplexityFilter(lang: str = 'en', min_ppl: float = 0, max_ppl: float = 1500, *args, **kwargs)

Bases: Filter

Filter to keep samples with perplexity score in a specified range.

__init__(lang: str = 'en', min_ppl: float = 0, max_ppl: float = 1500, *args, **kwargs)

Initialization method.

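Parameters:
  • lang -- Language of the samples for which perplexity is computed. Defaults to 'en'.

  • min_ppl -- Minimum perplexity for a sample to be kept. Defaults to 0.

  • max_ppl -- Maximum perplexity for a sample to be kept. Defaults to 1500.

  • args -- Extra positional arguments.

  • kwargs -- Extra keyword arguments.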
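
The role of min_ppl and max_ppl can be illustrated with a minimal sketch. The helper name in_perplexity_range is hypothetical and is not part of data_juicer, and treating both bounds as inclusive is an assumption that this page does not confirm.

```python
def in_perplexity_range(ppl, min_ppl=0, max_ppl=1500):
    """Return True if a perplexity score lies within [min_ppl, max_ppl].

    Hypothetical helper for illustration only. The default bounds mirror
    the defaults in the signature above; inclusive bounds are an assumption.
    """
    return min_ppl <= ppl <= max_ppl


# Made-up perplexity scores for four samples.
scores = [12.5, 480.0, 1500.0, 2300.7]

# Keep only the scores that fall inside the default range.
kept = [s for s in scores if in_perplexity_range(s)]
```

Under these assumptions, a lower max_ppl makes the filter stricter about text the language model finds unlikely, while a min_ppl above 0 also drops text with very low perplexity.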

compute_stats_batched(samples, context=False)
process_batched(samples)
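
This page does not describe the two batched methods. One plausible reading, offered only as a sketch, is a two-step flow: the first step attaches a perplexity value to each sample's stats, and the second turns those values into keep/drop decisions. Everything below is an assumption made for illustration: the "text" and "stats" keys, the column-oriented batch layout, the extra unigram_probs argument, and the toy unigram scorer, which stands in for whatever language model the operator actually uses.

```python
import math


def toy_perplexity(text, unigram_probs, floor=1e-6):
    """Per-token perplexity under a toy unigram model (stand-in scorer)."""
    tokens = text.split()
    if not tokens:
        # No tokens to score; treat the sample as maximally surprising.
        return float("inf")
    # Unknown tokens get a small floor probability instead of zero.
    log_prob = sum(math.log(unigram_probs.get(tok, floor)) for tok in tokens)
    return math.exp(-log_prob / len(tokens))


def compute_stats_batched(samples, unigram_probs):
    """Step 1 (sketch): store a perplexity value in each sample's stats."""
    for text, stats in zip(samples["text"], samples["stats"]):
        if "perplexity" not in stats:  # skip samples that already have a score
            stats["perplexity"] = toy_perplexity(text, unigram_probs)
    return samples


def process_batched(samples, min_ppl=0, max_ppl=1500):
    """Step 2 (sketch): one keep/drop flag per sample, bounds assumed inclusive."""
    return [min_ppl <= s["perplexity"] <= max_ppl for s in samples["stats"]]


# Hypothetical batch: one list per field, one entry per sample.
unigram_probs = {"the": 0.5, "cat": 0.25, "sat": 0.25}
batch = {"text": ["the cat sat", "zzz qqq"], "stats": [{}, {}]}

batch = compute_stats_batched(batch, unigram_probs)
keep_flags = process_batched(batch)
```

Separating scoring from the decision means the stored scores can be reused with different thresholds without rescoring the text.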