data_juicer.ops.mapper.fix_unicode_mapper module

class data_juicer.ops.mapper.fix_unicode_mapper.FixUnicodeMapper(normalization: str = None, *args, **kwargs)[source]

Bases: Mapper

Mapper to fix Unicode errors in text samples.
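This docstring does not say which kinds of Unicode error are fixed. A common one in scraped text is mojibake: UTF-8 bytes that were mistakenly decoded as Latin-1. The sketch below, which uses only the standard library, reproduces that failure and undoes it with a naive round trip. `naive_unmojibake` is a hypothetical helper for illustration. It is not part of Data-Juicer and is not how this mapper works internally.

```python
def naive_unmojibake(text: str) -> str:
    """Hypothetical helper: undo one round of UTF-8-read-as-Latin-1 mojibake.

    Returns the input unchanged if it cannot be re-encoded as Latin-1
    or if the re-encoded bytes are not valid UTF-8.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Produce mojibake: encode as UTF-8, then wrongly decode as Latin-1.
garbled = "café".encode("utf-8").decode("latin-1")
print(garbled)                    # cafÃ©
print(naive_unmojibake(garbled))  # café
print(naive_unmojibake("café"))   # café (already correct, left unchanged)
```

Correct text such as "café" passes through unchanged because its Latin-1 bytes are not valid UTF-8. This single round trip still misfires on some inputs, so dedicated repair tools rely on heuristics instead.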

__init__(normalization: str = None, *args, **kwargs)[source]

Initialization method.

Parameters:
  • normalization -- the Unicode normalization form to apply: one of 'NFC', 'NFKC', 'NFD', or 'NFKD'. Defaults to 'NFC' when left as None.

  • args -- extra positional arguments

  • kwargs -- extra keyword arguments
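The four accepted values are the standard Unicode normalization forms. The sketch below shows how they differ on a decomposed "é" and the "fi" ligature (U+FB01), using `unicodedata.normalize` from the standard library. `resolve_normalization` is a hypothetical helper that mirrors the documented parameter: None falls back to 'NFC' and any other value must be one of the four forms. That fallback-and-validate behavior is an assumption drawn from the parameter description, not the library's code.

```python
import unicodedata
from typing import Optional

VALID_FORMS = ("NFC", "NFKC", "NFD", "NFKD")


def resolve_normalization(normalization: Optional[str] = None) -> str:
    """Hypothetical helper: None/empty -> 'NFC'; otherwise must be a valid form."""
    form = normalization or "NFC"
    if form not in VALID_FORMS:
        raise ValueError(f"normalization must be one of {VALID_FORMS}, got {normalization!r}")
    return form


# "e" + COMBINING ACUTE ACCENT, a space, then the "fi" ligature U+FB01.
sample = "e\u0301 \ufb01"

for form in VALID_FORMS:
    out = unicodedata.normalize(resolve_normalization(form), sample)
    print(form, [hex(ord(c)) for c in out])
# NFC  ['0xe9', '0x20', '0xfb01']               composed é, ligature kept
# NFKC ['0xe9', '0x20', '0x66', '0x69']         composed é, ligature -> "fi"
# NFD  ['0x65', '0x301', '0x20', '0xfb01']      decomposed é, ligature kept
# NFKD ['0x65', '0x301', '0x20', '0x66', '0x69']
```

The "K" (compatibility) forms also fold presentation variants such as ligatures into plain letters. This helps matching and deduplication but is lossy, which is one reason a canonical form like NFC makes a conservative default.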

process_batched(samples)[source]
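`process_batched` has no description here. The sketch below assumes that `samples` is a column-oriented batch (a dict mapping each column name to a list of values) and that the text lives under a `"text"` key. `process_batched_sketch`, its `text_key` parameter, and the choice to apply only `unicodedata.normalize` (with no error repair) are all hypothetical simplifications for illustration, not the library's implementation.

```python
import unicodedata
from typing import Dict, List


def process_batched_sketch(
    samples: Dict[str, List], normalization: str = "NFC", text_key: str = "text"
) -> Dict[str, List]:
    """Hypothetical stand-in for a batched mapper: normalize every text in the batch.

    Assumes `samples` is {column_name: [value, ...]}; other columns are left as-is.
    The dict is updated in place and also returned.
    """
    samples[text_key] = [
        unicodedata.normalize(normalization, text) for text in samples[text_key]
    ]
    return samples


batch = {"text": ["e\u0301", "\ufb01le"], "id": [0, 1]}
result = process_batched_sketch(batch, normalization="NFKC")
print(result)  # {'text': ['é', 'file'], 'id': [0, 1]}
```

Working on whole columns lets the mapper transform an entire batch with one list comprehension instead of being called once per sample.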