data_juicer.ops.mapper.remove_specific_chars_mapper module

class data_juicer.ops.mapper.remove_specific_chars_mapper.RemoveSpecificCharsMapper(chars_to_remove: str | List[str] = '◆●■►▼▲▴∆▻▷❖♡□', *args, **kwargs)

Bases: Mapper

Removes specific characters from text samples.

This operator removes specified characters from the text. The characters to remove can be given as a string or a list of strings. If none are specified, the default set '◆●■►▼▲▴∆▻▷❖♡□' (geometric bullets and similar decorative symbols) is used. The operator builds a regular expression pattern that matches any of the specified characters and replaces each match with an empty string. Samples are processed in batches for efficiency.
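The following is a minimal, standalone sketch of the approach described above, not the operator's actual source. It assumes a batch is a dict mapping a text key to a list of strings; the helper names `build_pattern` and `remove_chars_batched`, and the `"text"` key, are hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List, Optional, Union

# Default character set, copied from the class signature.
DEFAULT_CHARS = "◆●■►▼▲▴∆▻▷❖♡□"


def build_pattern(
    chars_to_remove: Union[str, List[str]] = DEFAULT_CHARS,
) -> Optional[re.Pattern]:
    """Compile a character class matching any of the given characters.

    Hypothetical helper: returns None when there is nothing to remove.
    """
    if not chars_to_remove:
        return None
    # Escape each entry so that characters such as ']' or '-' are treated
    # literally inside the character class.
    escaped = "".join(re.escape(c) for c in chars_to_remove)
    return re.compile("[" + escaped + "]")


def remove_chars_batched(
    samples: Dict[str, List[str]],
    pattern: Optional[re.Pattern],
    text_key: str = "text",  # assumed key name for this sketch
) -> Dict[str, List[str]]:
    """Strip every match of `pattern` from each text in the batch."""
    if pattern is None:
        return samples
    cleaned = dict(samples)
    cleaned[text_key] = [pattern.sub("", text) for text in samples[text_key]]
    return cleaned


batch = {"text": ["◆ Item one ●", "plain", "a■b"]}
result = remove_chars_batched(batch, build_pattern())
print(result["text"])
```

Compiling the pattern once and reusing it across the whole batch avoids rebuilding the regular expression for every sample, which is one plausible reason for the batched design.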

__init__(chars_to_remove: str | List[str] = '◆●■►▼▲▴∆▻▷❖♡□', *args, **kwargs)

Initialization method.

Parameters:
  • chars_to_remove – the characters to remove from the text, given either as a single string or as a list of strings. Defaults to '◆●■►▼▲▴∆▻▷❖♡□'.

  • args – extra positional arguments

  • kwargs – extra keyword arguments

process_batched(samples)