data_juicer.ops.mapper.remove_non_chinese_character_mapper module
- class data_juicer.ops.mapper.remove_non_chinese_character_mapper.RemoveNonChineseCharacterlMapper(keep_alphabet: bool = True, keep_number: bool = True, keep_punc: bool = True, *args, **kwargs)
Bases: Mapper
Mapper to remove non-Chinese characters from text samples.
- __init__(keep_alphabet: bool = True, keep_number: bool = True, keep_punc: bool = True, *args, **kwargs)
Initialization method.
- Parameters:
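keep_alphabet – whether to keep alphabetic characters (default: True)
keep_number – whether to keep numeric characters (default: True)
keep_punc – whether to keep punctuation (default: True)
args – extra positional arguments
kwargs – extra keyword arguments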
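The effect of the three flags can be illustrated with a small standalone sketch. This is not the library's implementation: the function name `remove_non_chinese` is hypothetical, and the character ranges (the CJK range `\u4e00-\u9fa5`, ASCII letters, ASCII digits, and a minimal punctuation set of period, comma, and space) are assumptions chosen for illustration only.

```python
import re


def remove_non_chinese(text, keep_alphabet=True, keep_number=True, keep_punc=True):
    """Hypothetical sketch: drop every character outside an allowed set."""
    # Assumed range for Chinese characters (common CJK Unified Ideographs).
    allowed = "\u4e00-\u9fa5"
    if keep_alphabet:
        allowed += "A-Za-z"  # assumed: ASCII letters only
    if keep_number:
        allowed += "0-9"     # assumed: ASCII digits only
    if keep_punc:
        allowed += "., "     # assumed minimal punctuation set, including space
    # Negated character class: match anything that is NOT allowed.
    pattern = "[^" + allowed + "]"
    return re.sub(pattern, "", text)


sample = "你好, world 123!"
print(remove_non_chinese(sample))  # 你好, world 123
print(remove_non_chinese(sample, keep_alphabet=False,
                         keep_number=False, keep_punc=False))  # 你好
```

With all three flags left at their defaults, only characters outside the allowed set (here the `!`) are removed. Setting all flags to False leaves only the Chinese characters.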