data_juicer.ops.mapper.chinese_convert_mapper module

data_juicer.ops.mapper.chinese_convert_mapper.prepare_converter(mode)[source]

class data_juicer.ops.mapper.chinese_convert_mapper.ChineseConvertMapper(mode: str = 's2t', *args, **kwargs)[source]

Bases: Mapper

Mapper that converts Chinese text between Traditional Chinese, Simplified Chinese, and Japanese Kanji.

__init__(mode: str = 's2t', *args, **kwargs)[source]

Initialization method.

Parameters:
  • mode

    Conversion mode (default: 's2t'). Supported values:

    s2t: Simplified Chinese to Traditional Chinese,

    t2s: Traditional Chinese to Simplified Chinese,

    s2tw: Simplified Chinese to Traditional Chinese (Taiwan Standard),

    tw2s: Traditional Chinese (Taiwan Standard) to Simplified Chinese,

    s2hk: Simplified Chinese to Traditional Chinese (Hong Kong variant),

    hk2s: Traditional Chinese (Hong Kong variant) to Simplified Chinese,

    s2twp: Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom,

    tw2sp: Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom,

    t2tw: Traditional Chinese to Traditional Chinese (Taiwan Standard),

    tw2t: Traditional Chinese (Taiwan Standard) to Traditional Chinese,

    hk2t: Traditional Chinese (Hong Kong variant) to Traditional Chinese,

    t2hk: Traditional Chinese to Traditional Chinese (Hong Kong variant),

    t2jp: Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji (Shinjitai),

    jp2t: New Japanese Kanji (Shinjitai) to Traditional Chinese Characters (Kyūjitai).

  • args – extra positional arguments

  • kwargs – extra keyword arguments
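
The mode names listed above can be checked before an operator is constructed. The following is a minimal sketch, not part of data_juicer: SUPPORTED_MODES and describe_mode are hypothetical helpers whose contents are copied from the parameter list, and the real class may validate its argument differently or not at all.

```python
# Hypothetical helper, not part of data_juicer: the table below simply
# mirrors the mode names and descriptions from the parameter list.
SUPPORTED_MODES = {
    "s2t": "Simplified Chinese to Traditional Chinese",
    "t2s": "Traditional Chinese to Simplified Chinese",
    "s2tw": "Simplified Chinese to Traditional Chinese (Taiwan Standard)",
    "tw2s": "Traditional Chinese (Taiwan Standard) to Simplified Chinese",
    "s2hk": "Simplified Chinese to Traditional Chinese (Hong Kong variant)",
    "hk2s": "Traditional Chinese (Hong Kong variant) to Simplified Chinese",
    "s2twp": "Simplified Chinese to Traditional Chinese (Taiwan Standard) "
             "with Taiwanese idiom",
    "tw2sp": "Traditional Chinese (Taiwan Standard) to Simplified Chinese "
             "with Mainland Chinese idiom",
    "t2tw": "Traditional Chinese to Traditional Chinese (Taiwan Standard)",
    "tw2t": "Traditional Chinese (Taiwan Standard) to Traditional Chinese",
    "hk2t": "Traditional Chinese (Hong Kong variant) to Traditional Chinese",
    "t2hk": "Traditional Chinese to Traditional Chinese (Hong Kong variant)",
    "t2jp": "Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji",
    "jp2t": "New Japanese Kanji (Shinjitai) to Traditional Chinese Characters",
}


def describe_mode(mode: str = "s2t") -> str:
    """Return the description of a mode, or raise ValueError if it is unknown."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(
            f"unsupported mode {mode!r}; expected one of {sorted(SUPPORTED_MODES)}"
        )
    return SUPPORTED_MODES[mode]


print(describe_mode("tw2sp"))
```

Checking the mode up front surfaces a typo such as 's2tp' as a clear error before any data is processed.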

process_batched(samples)[source]
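
The page does not describe the shape of samples. The sketch below assumes a batch is a dict mapping each field name to a list of values, with the text stored under a "text" key; both are assumptions. ToyConvertMapper and its four-character table are hypothetical stand-ins for the real converter and only illustrate how a batched mapper might rewrite the text column while leaving other fields untouched.

```python
from typing import Dict, List

# Toy Simplified-to-Traditional table for illustration only; a real
# converter covers far more characters and handles phrase-level rules.
TOY_S2T = str.maketrans({"汉": "漢", "语": "語", "简": "簡", "体": "體"})


class ToyConvertMapper:
    """Hypothetical stand-in that mimics a batched text mapper."""

    def __init__(self, text_key: str = "text"):
        # Name of the field holding the text; "text" is an assumption.
        self.text_key = text_key

    def process_batched(self, samples: Dict[str, List]) -> Dict[str, List]:
        # Convert every string in the text column; other columns are unchanged.
        samples[self.text_key] = [
            text.translate(TOY_S2T) for text in samples[self.text_key]
        ]
        return samples


batch = {"text": ["汉语", "简体"], "id": [1, 2]}
result = ToyConvertMapper().process_batched(batch)
print(result["text"])
```

Processing a whole column per call means the converter is set up once per batch rather than once per sample.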