data_juicer.ops.mapper.sentence_split_mapper module

class data_juicer.ops.mapper.sentence_split_mapper.SentenceSplitMapper(lang: str = 'en', *args, **kwargs)

Bases: Mapper

Mapper to split text samples into sentences.

__init__(lang: str = 'en', *args, **kwargs)

Initialization method.

Parameters:
  • lang – language of the text to split into sentences (default: 'en').

  • args – extra positional arguments

  • kwargs – extra keyword arguments

process_batched(samples)
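
The signature above gives no detail on what a batched call does, so the following is a minimal, self-contained sketch of the general idea rather than the library's implementation. It makes several assumptions: that a batch is a column-oriented dict with a "text" key, that the result keeps the same key with one sentence per line, and that a naive regular expression is an acceptable stand-in for a real language-aware sentence tokenizer. The names split_sentences and process_batched_sketch are hypothetical.

```python
import re
from typing import Dict, List

# Naive stand-in for a real sentence tokenizer (assumption): break on
# whitespace that follows '.', '!' or '?'. It will mis-split abbreviations.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Return the non-empty sentences found in `text`."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def process_batched_sketch(
    samples: Dict[str, List[str]], text_key: str = "text"
) -> Dict[str, List[str]]:
    """Rewrite every text in a column-oriented batch, one sentence per line.

    The batch layout and the newline-joined output are assumptions made
    for this illustration.
    """
    samples[text_key] = [
        "\n".join(split_sentences(text)) for text in samples[text_key]
    ]
    return samples


batch = {"text": ["Hello there. How are you? Fine!", "One sentence only."]}
result = process_batched_sketch(batch)
```

Processing a whole batch in one call lets the splitter be applied across all texts in a single pass, which is typically why a mapper exposes a batched method instead of a per-sample one.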