replace_content_mapper¶
Replaces content in the text that matches a regular expression pattern with a designated replacement string.
This operator searches the text for the pattern(s) given in `pattern` and replaces each match with the corresponding `repl` string. Both arguments accept either a single string or a list of strings; when lists are given, each pattern is replaced by the replacement at the same position. The regular expressions are compiled with the `re.DOTALL` flag, so `.` also matches newline characters and a single pattern can match across multiple lines. If the number of patterns and the number of replacements do not match, a `ValueError` is raised. The operation is batched: it processes multiple samples at once.
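The behaviour described above can be approximated with the standard `re` module alone. The following is a minimal sketch, not the operator's actual implementation: the function name `replace_content` is hypothetical, and two details are assumptions rather than documented behaviour (a `pattern` of `None` leaves the batch unchanged, and a single `repl` string is shared by every pattern).

```python
import re
from typing import List, Union


def replace_content(
    texts: List[str],
    pattern: Union[str, List[str], None] = None,
    repl: Union[str, List[str]] = "",
) -> List[str]:
    """Hypothetical stand-in for the operator, built only on `re`."""
    if pattern is None:
        # Assumption: without a pattern the batch is returned unchanged.
        return list(texts)

    patterns = [pattern] if isinstance(pattern, str) else list(pattern)
    # Assumption: a single replacement string is shared by every pattern.
    repls = [repl] * len(patterns) if isinstance(repl, str) else list(repl)

    if len(patterns) != len(repls):
        raise ValueError("pattern and repl must have the same length")

    # re.DOTALL lets '.' match newlines, so a pattern can span several lines.
    compiled = [re.compile(p, flags=re.DOTALL) for p in patterns]

    out = []
    for text in texts:
        # Apply each pattern in order, pairing it with its replacement.
        for rx, r in zip(compiled, repls):
            text = rx.sub(r, text)
        out.append(text)
    return out


# A batch of two samples; the second contains a comment spanning two lines.
batch = ["id: 42\nname: x", "<!-- a\nb -->kept"]
result = replace_content(batch, [r"\d+", r"<!--.*?-->"], ["<NUM>", ""])
print(result)
```

The multi-line comment in the second sample is removed only because of `re.DOTALL`; without that flag, `.*?` would stop at the newline and the pattern would not match.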
Type: mapper
Tags: cpu, text
🔧 Parameter Configuration¶
| name | type | default | desc |
|---|---|---|---|
| `pattern` | typing.Union[str, typing.List[str], NoneType] | | regular expression pattern(s) to search for within text |
| `repl` | typing.Union[str, typing.List[str]] | `''` | replacement string(s), default is empty string |
| | | | extra args |
| | | | extra args |
📊 Effect demonstration¶
test_special_char_pattern_text¶
ReplaceContentMapper(pattern='●■', repl='<SPEC>')
📥 input data¶
['这是一个干净的文本。Including Chinese and English.', '◆●■►▼▲▴∆▻▷❖♡□', '多个●■►▼这样的特殊字符可以►▼▲▴∆吗?', '未指定的●■☛₨➩►▼▲特殊字符会☻▷❖被删掉吗??']
📤 output data¶
['这是一个干净的文本。Including Chinese and English.', '◆<SPEC>►▼▲▴∆▻▷❖♡□', '多个<SPEC>►▼这样的特殊字符可以►▼▲▴∆吗?', '未指定的<SPEC>☛₨➩►▼▲特殊字符会☻▷❖被删掉吗??']
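✨ explanation¶
This example demonstrates how the operator replaces a specific sequence of special characters (in this case, ‘●■’) with a designated replacement string (‘<SPEC>’). Samples that do not contain the sequence, and special characters outside it, are left unchanged.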
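The same transformation can be reproduced with plain `re.sub`, which may help when checking a pattern before putting it into a pipeline. This is only an illustration using the standard library, not a call into the operator itself; the variable names are made up for the sketch.

```python
import re

inputs = [
    '这是一个干净的文本。Including Chinese and English.',
    '◆●■►▼▲▴∆▻▷❖♡□',
    '多个●■►▼这样的特殊字符可以►▼▲▴∆吗?',
    '未指定的●■☛₨➩►▼▲特殊字符会☻▷❖被删掉吗??',
]

# '●■' is a literal two-character sequence, not a character class:
# only '●' immediately followed by '■' is replaced.
rx = re.compile('●■', flags=re.DOTALL)
outputs = [rx.sub('<SPEC>', text) for text in inputs]

for line in outputs:
    print(line)
```

To replace either character on its own, a character class such as `[●■]` would be needed instead.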
test_raw_digit_pattern_text¶
ReplaceContentMapper(pattern='\\d+(?:,\\d+)*', repl='<DIGIT>')
📥 input data¶
['这是一个123。Including 456 and English.']
📤 output data¶
['这是一个<DIGIT>。Including <DIGIT> and English.']
✨ explanation¶
In this example, the operator replaces sequences of digits (e.g., ‘123’, ‘456’) in the text with a designated replacement string (‘<DIGIT>’).
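As a quick check of what the digit pattern matches, the sketch below applies it with the standard `re` module. The second sample string is invented for illustration and does not come from the operator's tests; it shows that the optional `(?:,\d+)*` group makes a comma-grouped number count as a single match.

```python
import re

# The pattern from the example, written as a raw string: '\\d+(?:,\\d+)*'
# in a normal Python string literal is the same regular expression.
digit_rx = re.compile(r'\d+(?:,\d+)*', flags=re.DOTALL)

sample = '这是一个123。Including 456 and English.'
replaced = digit_rx.sub('<DIGIT>', sample)
print(replaced)

# Hypothetical extra input: a number with thousands separators.
grouped = digit_rx.sub('<DIGIT>', 'Total: 1,234,567 items')
print(grouped)
```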