data_juicer.ops.mapper.replace_content_mapper module

class data_juicer.ops.mapper.replace_content_mapper.ReplaceContentMapper(pattern: str | List[str] | None = None, repl: str | List[str] = '', *args, **kwargs)[source]

Bases: Mapper

Replaces content in the text that matches a specific regular expression pattern with a designated replacement string.

This operator searches the text for the pattern(s) given in pattern and replaces each match with the corresponding repl string. Both arguments accept either a single string or a list of strings; when lists are given, each pattern is paired with the replacement at the same position. The regular expressions are compiled with the re.DOTALL flag, so "." also matches newline characters and a single match can span multiple lines. If the number of patterns and the number of replacements do not match, a ValueError is raised. The operation is batched: it processes multiple samples per call.
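The behaviour described above can be approximated with the standard re module alone. The following is a minimal sketch, not the operator's actual implementation: the function name replace_content is hypothetical, and two details are assumptions rather than documented behaviour (a pattern of None leaves the text unchanged, and a single repl string is reused for every pattern in a list).

```python
import re
from typing import List, Optional, Union


def replace_content(
    text: str,
    pattern: Optional[Union[str, List[str]]] = None,
    repl: Union[str, List[str]] = "",
) -> str:
    """Hypothetical stand-in for the replacement logic described above."""
    # Assumption: with no pattern there is nothing to replace.
    if pattern is None:
        return text

    # Normalize a single pattern into a one-element list.
    patterns = [pattern] if isinstance(pattern, str) else list(pattern)

    # Assumption: a single replacement string applies to every pattern.
    repls = [repl] * len(patterns) if isinstance(repl, str) else list(repl)

    if len(patterns) != len(repls):
        raise ValueError(
            f"got {len(patterns)} pattern(s) but {len(repls)} replacement(s)"
        )

    # Apply the substitutions in order; re.DOTALL lets "." match newlines.
    for pat, rep in zip(patterns, repls):
        text = re.compile(pat, flags=re.DOTALL).sub(rep, text)
    return text


# A comment spanning two lines is removed because "." matches "\n".
print(replace_content("<!-- a\nb -->keep", r"<!--.*?-->"))

# Each pattern is paired with the replacement at the same position.
print(replace_content("foo 123 bar", [r"\d+", r"foo"], ["<NUM>", "baz"]))
```

Because the substitutions run one after another, a later pattern sees the output of the earlier ones, so the order of the list matters in this sketch.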

__init__(pattern: str | List[str] | None = None, repl: str | List[str] = '', *args, **kwargs)[source]

Initialization method.

Parameters:
  • pattern -- regular expression pattern(s) to search for within text

  • repl -- replacement string(s), default is empty string

  • args -- extra positional args

  • kwargs -- extra keyword args

process_batched(samples)[source]
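
The page gives no description of process_batched beyond stating that the operator is batched. As a rough illustration of what batched processing could look like, the sketch below assumes a batch is a dictionary mapping a field name to a list of values, one per sample, with the text stored under the key "text". That layout, the text_key parameter, and the function name process_batched_sketch are assumptions made for this example, not details confirmed by this page.

```python
import re
from typing import Dict, List, Pattern


def process_batched_sketch(
    samples: Dict[str, List[str]],
    compiled_patterns: List[Pattern[str]],
    repls: List[str],
    text_key: str = "text",  # assumed field name
) -> Dict[str, List[str]]:
    """Hypothetical batched loop: rewrite every text in the batch in place."""
    for idx, text in enumerate(samples[text_key]):
        # Apply each pattern/replacement pair to the current sample.
        for pat, rep in zip(compiled_patterns, repls):
            text = pat.sub(rep, text)
        samples[text_key][idx] = text
    return samples


# One batch holding three samples.
batch = {"text": ["call 555-1234", "no digits here", "id: 42\nid: 43"]}
patterns = [re.compile(r"\d+", flags=re.DOTALL)]
result = process_batched_sketch(batch, patterns, ["#"])
print(result["text"])
```

Handling a whole batch per call lets the patterns be compiled once and reused across all samples instead of being set up again for each one.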