data_juicer.ops.filter.special_characters_filter module
- class data_juicer.ops.filter.special_characters_filter.SpecialCharactersFilter(min_ratio: float = 0.0, max_ratio: float = 0.25, *args, **kwargs)
Bases: Filter
Filter that keeps samples whose special-char ratio falls within a specified range.
- __init__(min_ratio: float = 0.0, max_ratio: float = 0.25, *args, **kwargs)
Initialization method.
- Parameters:
min_ratio – The minimum special-char ratio for this op (default 0.0). Samples whose special-char ratio is below this value are filtered out.
max_ratio – The maximum special-char ratio for this op (default 0.25). Samples whose special-char ratio exceeds this value are filtered out.
args – extra positional args
kwargs – extra keyword args
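The range check described by min_ratio and max_ratio can be illustrated with a minimal standalone sketch. It does not use the library's implementation: the helper names special_char_ratio and keep_sample are hypothetical, and treating every character that is neither alphanumeric nor whitespace as "special" is an assumption made only for illustration; the op itself may count a different set of characters.

```python
def special_char_ratio(text: str) -> float:
    """Return the fraction of characters counted as "special".

    Assumption: a character is "special" if it is neither alphanumeric
    nor whitespace. This is illustrative and may differ from the op.
    """
    if not text:
        # Avoid division by zero; treat empty text as having ratio 0.0.
        return 0.0
    special = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return special / len(text)


def keep_sample(text: str, min_ratio: float = 0.0, max_ratio: float = 0.25) -> bool:
    """Keep a sample only if its ratio lies within [min_ratio, max_ratio].

    A sample is dropped when its ratio is below min_ratio or exceeds
    max_ratio, mirroring the parameter descriptions above.
    """
    return min_ratio <= special_char_ratio(text) <= max_ratio


if __name__ == "__main__":
    for sample in ["Hello world", "$$$ !!! ###", "ab!?"]:
        print(repr(sample), special_char_ratio(sample), keep_sample(sample))
```

Under these assumptions, plain prose passes with the default thresholds, while a string made up mostly of punctuation is dropped because its ratio exceeds max_ratio. Raising min_ratio above 0.0 would also drop samples that contain too few special characters.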