data_juicer.ops.filter.suffix_filter module
- class data_juicer.ops.filter.suffix_filter.SuffixFilter(suffixes: str | List[str] = [], *args, **kwargs)
Bases: Filter
Filter that keeps only samples whose suffix matches one of the specified suffixes.
- __init__(suffixes: str | List[str] = [], *args, **kwargs)
Initialization method.
- Parameters:
suffixes – the suffix or suffixes of the text to keep, given as a single string or a list of strings. For example: '.txt', 'txt' or ['txt', '.pdf', 'docx']
args – extra positional args
kwargs – extra keyword args
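The parameter description above suggests that suffixes may be passed as a single string or a list, with or without a leading dot. The following is a minimal standalone sketch of that matching idea and does not call Data-Juicer itself. The helper names `normalize_suffixes` and `keep_sample` are hypothetical, and both the dot normalization and the "empty list keeps everything" rule are assumptions made for illustration; the real operator may behave differently.

```python
from typing import List, Union


def normalize_suffixes(suffixes: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept one string or a list and return
    suffixes in a uniform dotted form, e.g. 'txt' -> '.txt'."""
    if isinstance(suffixes, str):
        suffixes = [suffixes]
    # Strip any leading dots, then add exactly one back; skip empty strings.
    return ["." + s.lstrip(".") for s in suffixes if s.lstrip(".")]


def keep_sample(sample_suffix: str, suffixes: Union[str, List[str]]) -> bool:
    """Hypothetical predicate: True if the sample's suffix is allowed.

    Assumption: an empty suffix list means no filtering, so every
    sample is kept.
    """
    allowed = normalize_suffixes(suffixes)
    if not allowed:
        return True
    return ("." + sample_suffix.lstrip(".")) in allowed


if __name__ == "__main__":
    samples = [".txt", ".pdf", ".md"]
    kept = [s for s in samples if keep_sample(s, ["txt", ".pdf", "docx"])]
    print(kept)
```

Normalizing both sides to the dotted form means 'txt' and '.txt' are treated the same in this sketch; check the operator's source before relying on that equivalence in a real pipeline.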