data_juicer.ops.mapper.remove_non_chinese_character_mapper module

class data_juicer.ops.mapper.remove_non_chinese_character_mapper.RemoveNonChineseCharacterlMapper(keep_alphabet: bool = True, keep_number: bool = True, keep_punc: bool = True, *args, **kwargs)[source]

Bases: Mapper

Mapper to remove non-Chinese characters from text samples.

__init__(keep_alphabet: bool = True, keep_number: bool = True, keep_punc: bool = True, *args, **kwargs)[source]

Initialization method.

Parameters:
  • keep_alphabet – whether to keep alphabetic characters (default: True)

  • keep_number – whether to keep digits (default: True)

  • keep_punc – whether to keep punctuation (default: True)

  • args – extra positional arguments

  • kwargs – extra keyword arguments
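
As a rough illustration of how the three flags could interact, the following standalone sketch builds a regular-expression character class from them and strips everything outside it. It does not reproduce the mapper's implementation: the helper name remove_non_chinese, the Unicode range used for Chinese characters, and the punctuation set are all assumptions chosen for illustration.

```python
import re

# Assumption: "Chinese" means the basic CJK Unified Ideographs block.
CHINESE_RANGE = "\u4e00-\u9fff"
# Hypothetical punctuation set, for illustration only.
PUNCTUATION = ".,;:!?\"'()，。；：！？、（）"


def remove_non_chinese(text, keep_alphabet=True, keep_number=True, keep_punc=True):
    """Drop every character that is not Chinese or explicitly kept by a flag."""
    allowed = CHINESE_RANGE
    if keep_alphabet:
        allowed += "A-Za-z"  # ASCII letters
    if keep_number:
        allowed += "0-9"  # ASCII digits
    if keep_punc:
        # Escape so that characters such as '.' or '(' are taken literally.
        allowed += re.escape(PUNCTUATION)
    # Negated character class: remove anything not in the allowed set.
    return re.sub(f"[^{allowed}]", "", text)


sample = "Hello 世界 123!"
print(remove_non_chinese(sample))
print(remove_non_chinese(sample, keep_alphabet=False, keep_number=False, keep_punc=False))
```

In this sketch whitespace is not part of any kept set, so spaces are removed even when all three flags are True; the actual mapper may treat whitespace differently.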

process_batched(samples)[source]