data_juicer.ops.mapper.punctuation_normalization_mapper module¶
- class data_juicer.ops.mapper.punctuation_normalization_mapper.PunctuationNormalizationMapper(*args, **kwargs)[source]¶
Bases:
Mapper
Normalizes unicode punctuations to their English equivalents in text samples.
This operator processes a batch of text samples and replaces any unicode punctuation with its corresponding English punctuation. The mapping includes common substitutions like “,” to “,”, “。” to “.”, and ““” to “. It iterates over each character in the text, replacing it if it is found in the predefined punctuation map. The result is a set of text samples with consistent punctuation formatting.