data_juicer.ops.mapper.remove_long_words_mapper module

class data_juicer.ops.mapper.remove_long_words_mapper.RemoveLongWordsMapper(min_len: int = 1, max_len: int = 9223372036854775807, *args, **kwargs)[source]

Bases: Mapper

Mapper to remove words whose length falls outside the range set by min_len and max_len.
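As a rough illustration of the length-range rule, the minimal sketch below keeps a word only when its length lies in the inclusive range [min_len, max_len]. It assumes plain whitespace tokenization and rejoining with single spaces. The helper names `keep_word` and `remove_out_of_range_words` are hypothetical and are not part of the data_juicer API.

```python
def keep_word(word: str, min_len: int, max_len: int) -> bool:
    """Return True when len(word) lies in the inclusive range [min_len, max_len]."""
    return min_len <= len(word) <= max_len


def remove_out_of_range_words(text: str, min_len: int, max_len: int) -> str:
    # Assumption of this sketch: words are whitespace-separated and rejoined with one space.
    return " ".join(w for w in text.split() if keep_word(w, min_len, max_len))


print(remove_out_of_range_words("a supercalifragilistic word list", 2, 10))
# prints: word list
```

Here "a" is dropped because it is shorter than min_len=2, and the 20-character word is dropped because it exceeds max_len=10.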

__init__(min_len: int = 1, max_len: int = 9223372036854775807, *args, **kwargs)[source]

Initialization method.

Parameters:
  • min_len -- The minimum word length for this op; words shorter than this value are removed.

  • max_len -- The maximum word length for this op; words longer than this value are removed.

  • args -- extra positional arguments

  • kwargs -- extra keyword arguments
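The default max_len of 9223372036854775807 equals 2**63 - 1, which is `sys.maxsize` on 64-bit CPython builds, so by default there is effectively no upper bound; the default min_len of 1 only excludes empty strings. The hypothetical dataclass `WordLengthRange` below merely mirrors those documented defaults to show their effect under an inclusive-bounds assumption; it is not how RemoveLongWordsMapper is implemented.

```python
import sys
from dataclasses import dataclass


@dataclass
class WordLengthRange:
    """Hypothetical helper mirroring the documented defaults of RemoveLongWordsMapper."""

    min_len: int = 1
    # 9223372036854775807 == 2**63 - 1, i.e. sys.maxsize on 64-bit CPython builds.
    max_len: int = 2**63 - 1

    def keeps(self, word: str) -> bool:
        # Inclusive bounds (assumed): shorter than min_len or longer than max_len is dropped.
        return self.min_len <= len(word) <= self.max_len


defaults = WordLengthRange()
print(defaults.max_len == sys.maxsize)                   # True on 64-bit builds
print(defaults.keeps("x" * 10_000))                      # True: no practical upper bound
print(WordLengthRange(max_len=8).keeps("tokenization"))  # False: 12 characters
```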

should_keep_long_word(word)[source]
process_batched(samples)[source]
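Neither method above carries a docstring. The sketch below shows one plausible shape for a batched word filter, assuming a columnar batch (a dict mapping column names to per-sample lists, as in Hugging Face `datasets` batched maps) with the text stored under a `"text"` key. The function `process_batched_sketch` and its `text_key` parameter are hypothetical; the library's own method may tokenize differently, for example by preserving newlines and tabs, which this sketch flattens into single spaces.

```python
from typing import Dict, List


def keep_word(word: str, min_len: int, max_len: int) -> bool:
    # Inclusive length check, analogous in role to should_keep_long_word (assumed).
    return min_len <= len(word) <= max_len


def process_batched_sketch(samples: Dict[str, List[str]], min_len: int = 1,
                           max_len: int = 2**63 - 1,
                           text_key: str = "text") -> Dict[str, List[str]]:
    """Filter out-of-range words in every text of a columnar batch (hypothetical sketch)."""
    # Assumption: samples[text_key] holds one string per sample in the batch.
    samples[text_key] = [
        " ".join(w for w in text.split() if keep_word(w, min_len, max_len))
        for text in samples[text_key]
    ]
    return samples


batch = {"text": ["short words only", "contains internationalization"]}
print(process_batched_sketch(batch, max_len=8)["text"])
# prints: ['short words only', 'contains']
```

Returning the same dict with the text column replaced keeps the batch shape unchanged, which suits batched map pipelines that expect one output entry per input sample.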