data_juicer.ops.mapper.chinese_convert_mapper module

data_juicer.ops.mapper.chinese_convert_mapper.prepare_converter(mode)[source]
class data_juicer.ops.mapper.chinese_convert_mapper.ChineseConvertMapper(mode: str = 's2t', *args, **kwargs)[source]

Bases: Mapper

Mapper that converts Chinese text between Simplified Chinese, Traditional Chinese, and Japanese Kanji.

__init__(mode: str = 's2t', *args, **kwargs)[source]

Initialization method.

Parameters:
  • mode --

    The conversion mode to apply. One of:

    s2t: Simplified Chinese to Traditional Chinese,

    t2s: Traditional Chinese to Simplified Chinese,

    s2tw: Simplified Chinese to Traditional Chinese (Taiwan Standard),

    tw2s: Traditional Chinese (Taiwan Standard) to Simplified Chinese,

    s2hk: Simplified Chinese to Traditional Chinese (Hong Kong variant),

    hk2s: Traditional Chinese (Hong Kong variant) to Simplified Chinese,

    s2twp: Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom,

    tw2sp: Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom,

    t2tw: Traditional Chinese to Traditional Chinese (Taiwan Standard),

    tw2t: Traditional Chinese (Taiwan Standard) to Traditional Chinese,

    hk2t: Traditional Chinese (Hong Kong variant) to Traditional Chinese,

    t2hk: Traditional Chinese to Traditional Chinese (Hong Kong variant),

    t2jp: Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji,

    jp2t: New Japanese Kanji (Shinjitai) to Traditional Chinese Characters.

  • args -- extra positional arguments

  • kwargs -- extra keyword arguments
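
The mode strings listed above can be checked before constructing the mapper. The following is a minimal sketch, not part of data_juicer: `SUPPORTED_MODES` and `check_mode` are hypothetical names introduced here for illustration, and the set simply mirrors the modes documented for the `mode` parameter. Whether the library itself validates `mode` in this way is not stated on this page.

```python
# Hypothetical constant: the conversion modes listed in the parameter description.
SUPPORTED_MODES = frozenset({
    "s2t", "t2s", "s2tw", "tw2s", "s2hk", "hk2s", "s2twp", "tw2sp",
    "t2tw", "tw2t", "hk2t", "t2hk", "t2jp", "jp2t",
})


def check_mode(mode: str = "s2t") -> str:
    """Hypothetical helper: return `mode` if it is documented, else raise ValueError."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(
            f"Unsupported mode {mode!r}; expected one of {sorted(SUPPORTED_MODES)}"
        )
    return mode


# Intended use, assuming data_juicer is installed (left commented out so the
# sketch runs without the dependency):
# from data_juicer.ops.mapper.chinese_convert_mapper import ChineseConvertMapper
# op = ChineseConvertMapper(mode=check_mode("s2tw"))
```

Validating up front gives a readable error message for a mistyped mode such as `'s2x'` instead of relying on whatever the underlying converter reports.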

process_batched(samples)[source]