data_juicer.ops.mapper.whitespace_normalization_mapper module

class data_juicer.ops.mapper.whitespace_normalization_mapper.WhitespaceNormalizationMapper(*args, **kwargs)[source]

Bases: Mapper

Mapper that normalizes the various kinds of whitespace characters in text samples to the ordinary space character ' ' (0x20).

The different kinds of whitespace characters are listed at: https://en.wikipedia.org/wiki/Whitespace_character
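The normalization described above can be illustrated with a minimal, standalone sketch. It does not use data_juicer and is not the library's implementation. The name `normalize_whitespace` and the set `WHITESPACE_CHARS` are hypothetical, and the set is only a partial selection of the characters on the Wikipedia page. Line breaks are deliberately left untouched here, which is an assumption of this sketch rather than documented behavior of the mapper.

```python
# Hypothetical, partial set of whitespace characters to normalize.
# Chosen for illustration; the mapper's real character set may differ.
WHITESPACE_CHARS = frozenset(
    [
        "\t",      # U+0009 character tabulation
        "\u00a0",  # no-break space
        "\u1680",  # ogham space mark
        "\u202f",  # narrow no-break space
        "\u205f",  # medium mathematical space
        "\u3000",  # ideographic space
    ]
    # U+2000 (en quad) through U+200A (hair space)
    + [chr(cp) for cp in range(0x2000, 0x200B)]
)


def normalize_whitespace(text: str) -> str:
    """Replace every character in WHITESPACE_CHARS with ' ' (0x20)."""
    return "".join(" " if ch in WHITESPACE_CHARS else ch for ch in text)


if __name__ == "__main__":
    sample = "data\u00a0juicer\u3000mapper"
    print(normalize_whitespace(sample))  # data juicer mapper
```

A character-by-character replacement keeps the text length unchanged, so offsets into the original string remain valid after normalization.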

__init__(*args, **kwargs)[source]

Initialization method.

Parameters:
  • args -- extra positional arguments

  • kwargs -- extra keyword arguments

process_batched(samples)[source]