data_juicer.ops.mapper.replace_content_mapper module

class data_juicer.ops.mapper.replace_content_mapper.ReplaceContentMapper(pattern: str | List[str] | None = None, repl: str | List[str] = '', *args, **kwargs)

Bases: Mapper

Replaces content in the text that matches a specific regular expression pattern with a designated replacement string.

This operator searches each text for the pattern(s) given in pattern and substitutes the corresponding repl string. Both arguments accept either a single string or a list of strings; when lists are given, each pattern is replaced by the replacement at the same position. Patterns are compiled with the re.DOTALL flag, so the dot metacharacter (.) also matches newline characters and a single match can span multiple lines. If the numbers of patterns and replacements do not match, a ValueError is raised. The operator is batched: it processes multiple samples per call.
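The pairing of patterns with replacements can be sketched with the standard re module alone. This is a minimal illustration of the behaviour described above, not the operator's actual implementation: the function name replace_content is hypothetical, and the handling of a single repl string alongside several patterns (the string is reused for every pattern) is an assumption.

```python
import re
from typing import List, Optional, Union


def replace_content(text: str,
                    pattern: Optional[Union[str, List[str]]] = None,
                    repl: Union[str, List[str]] = "") -> str:
    """Hypothetical stand-in for the documented replacement behaviour."""
    if pattern is None:
        # Assumption: without a pattern there is nothing to replace.
        return text
    patterns = [pattern] if isinstance(pattern, str) else list(pattern)
    # Assumption: a single repl string is shared by every pattern.
    repls = [repl] * len(patterns) if isinstance(repl, str) else list(repl)
    if len(patterns) != len(repls):
        raise ValueError("pattern and repl lists must have the same length")
    for p, r in zip(patterns, repls):
        # re.DOTALL lets "." match newlines, so one match can span lines.
        text = re.compile(p, flags=re.DOTALL).sub(r, text)
    return text


# A non-greedy pattern that removes a comment spanning two lines.
cleaned = replace_content("keep <!-- multi\nline --> this", r"<!--.*?-->", "")
# Two patterns, each paired with the replacement at the same index.
swapped = replace_content("cat and dog", ["cat", "dog"], ["feline", "canine"])
```

Replacements are applied in list order, so a later pattern sees the text already modified by earlier ones.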

__init__(pattern: str | List[str] | None = None, repl: str | List[str] = '', *args, **kwargs)

Initialization method.

Parameters:
  • pattern – regular expression pattern or list of patterns to search for within the text; default is None

  • repl – replacement string or list of replacement strings; default is the empty string

  • args – extra positional arguments

  • kwargs – extra keyword arguments

process_batched(samples)
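As a rough picture of what batched processing means here, the sketch below applies one compiled pattern to every text in a batch. It rests on assumptions not stated on this page: that samples is a column-oriented dict mapping field names to lists, and that the texts live under a "text" key. The function name process_batched_sketch and the text_key parameter are hypothetical.

```python
import re
from typing import Dict, List


def process_batched_sketch(samples: Dict[str, List[str]],
                           pattern: str,
                           repl: str = "",
                           text_key: str = "text") -> Dict[str, List[str]]:
    """Illustrative only: rewrite every text in the batch in one call."""
    # Compile once per batch, with the same flag the operator documents.
    compiled = re.compile(pattern, flags=re.DOTALL)
    # Assumption: the batch stores all texts as a list under text_key.
    samples[text_key] = [compiled.sub(repl, t) for t in samples[text_key]]
    return samples


batch = {"text": ["id=123 ok", "no digits here", "a1b22c"]}
result = process_batched_sketch(batch, r"\d+", "#")
```

Compiling the pattern once and reusing it across the batch avoids repeating the compilation for each sample.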