data_juicer.ops.filter.suffix_filter module
- class data_juicer.ops.filter.suffix_filter.SuffixFilter(suffixes: str | List[str] = [], *args, **kwargs)[source]
Bases: Filter
Filter to keep samples with a specified suffix.
This operator retains samples whose suffix matches any of the provided suffixes; if no suffixes are specified, all samples are kept. The key metric, 'keep', is determined by checking each sample's 'suffix' field against the list of allowed suffixes: a sample whose suffix is in the list is kept, and any other sample is filtered out.
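The keep decision described above can be sketched as a standalone function. This is a minimal illustration, not Data-Juicer's implementation: `keep_sample` is a hypothetical helper, and it assumes each sample is a plain dict whose 'suffix' key holds the file suffix. It compares strings exactly, so '.txt' and 'txt' count as different suffixes here; whether Data-Juicer normalizes a leading dot is not stated in this reference.

```python
from typing import Any, Dict, List


def keep_sample(sample: Dict[str, Any], suffixes: List[str]) -> bool:
    """Hypothetical helper: decide whether a sample passes the suffix filter.

    Assumes each sample is a dict whose 'suffix' key holds its file suffix.
    """
    # No suffixes configured: every sample is kept.
    if not suffixes:
        return True
    # Otherwise keep the sample only if its suffix is in the allowed list.
    # Exact string comparison; a missing 'suffix' key yields None and is dropped.
    return sample.get("suffix") in suffixes


samples = [
    {"text": "hello", "suffix": ".txt"},
    {"text": "report", "suffix": ".pdf"},
    {"text": "slides", "suffix": ".pptx"},
]
allowed = [".txt", ".pdf"]
kept = [s for s in samples if keep_sample(s, allowed)]
print([s["suffix"] for s in kept])  # ['.txt', '.pdf']
```

With an empty `allowed` list, all three samples would pass, matching the "no suffixes means keep everything" rule.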
- __init__(suffixes: str | List[str] = [], *args, **kwargs)[source]
Initialization method.
- Parameters:
suffixes -- the suffix or suffixes of samples to keep. For example: '.txt', 'txt', or ['txt', '.pdf', 'docx'].
args -- extra positional arguments
kwargs -- extra keyword arguments
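Because `suffixes` accepts either a single string or a list, a filter needs to bring both forms into one shape before matching. The sketch below is an assumption about how that could be done, not the library's code: `normalize_suffixes` is a hypothetical helper that wraps a lone string into a one-element list and copies a list so the mutable `[]` default in the signature is never shared or mutated.

```python
from typing import List, Union


def normalize_suffixes(suffixes: Union[str, List[str], None]) -> List[str]:
    """Hypothetical helper: turn the `suffixes` argument into a list."""
    if suffixes is None:
        return []
    if isinstance(suffixes, str):
        # A single suffix such as '.txt' becomes a one-element list.
        return [suffixes]
    # Copy so later changes to the caller's (or default) list do not leak in.
    return list(suffixes)


print(normalize_suffixes(".txt"))                   # ['.txt']
print(normalize_suffixes(["txt", ".pdf", "docx"]))  # ['txt', '.pdf', 'docx']
print(normalize_suffixes([]))                       # []
```

With the documented signature, the operator itself would be constructed as, for example, `SuffixFilter(suffixes=['.txt', '.pdf'])` or `SuffixFilter(suffixes='.txt')`.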