clean_email_mapper
Cleans email addresses from text samples using a regular expression.
This operator removes or replaces email addresses in the text based on a regular expression pattern. By default, it uses a standard pattern to match email addresses, but a custom pattern can be provided. The matched email addresses are replaced with a specified replacement string, which defaults to an empty string. The operation is applied to each text sample in the batch. If no email address is found in a sample, it remains unchanged.
Type: mapper
Tags: cpu, text
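As a rough illustration of the behavior described above, the following sketch applies `re.sub` to every string in a batch. It is not the operator's implementation: `strip_emails` is a hypothetical helper, and the pattern is an assumption chosen for illustration that may differ from the operator's actual default.

```python
import re

# Assumed, simplified email pattern for illustration only;
# the operator's real default pattern may differ.
ASSUMED_EMAIL_PATTERN = re.compile(r"[A-Za-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+")


def strip_emails(batch):
    """Return a new list with every pattern match removed from each string."""
    return [ASSUMED_EMAIL_PATTERN.sub("", text) for text in batch]
```

Calling `strip_emails(["contact: alice@example.com today", "no address here"])` removes the address from the first string and returns the second one unchanged. Whitespace around a removed address is not collapsed.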
🔧 Parameter Configuration
name | type | default | desc |
---|---|---|---|
`pattern` | `typing.Optional[str]` | `None` | Regular expression pattern to search for within the text. |
`repl` | `str` | `''` | Replacement string; defaults to an empty string. |
`args` | | | Extra positional arguments. |
`kwargs` | | | Extra keyword arguments. |
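How a custom pattern and a replacement string might interact can be sketched with the standard `re` module. This is only a sketch under stated assumptions: `clean_with` and `FALLBACK_PATTERN` are hypothetical names, and the fallback pattern is an assumed stand-in rather than the operator's confirmed default.

```python
import re
from typing import List, Optional

# Hypothetical fallback used when no pattern is supplied (an assumption).
FALLBACK_PATTERN = r"[A-Za-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+"


def clean_with(texts: List[str], pattern: Optional[str] = None, repl: str = "") -> List[str]:
    """Substitute `repl` for every match of `pattern` (or the fallback) in each text."""
    regex = re.compile(FALLBACK_PATTERN if pattern is None else pattern)
    return [regex.sub(repl, text) for text in texts]


samples = ["mail bob@corp.example or eve@other.org"]

# Fallback pattern with a custom replacement string.
masked = clean_with(samples, repl="<EMAIL>")

# Custom pattern that targets a single domain; other addresses are left alone.
corp_only = clean_with(samples, pattern=r"[\w.+-]+@corp\.example")
```

Because the replacement is handed to `re.sub` in this sketch, a replacement string containing backslashes would be interpreted as an escape or group reference. Plain markers such as `<EMAIL>` avoid that.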
📊 Effect demonstration
test_clean_email
CleanEmailMapper()
📥 input data
['happy day euqdh@cjqi.com', '请问你是谁dasoidhao@1264fg.45om', 'ftp://examplema-nièrdash@hqbchd.ckdhnfes.cds', '👊23da44sh12@46hqb12chd.ckdhnfes.comd.dasd.asd.dc']
📤 output data
['happy day ', '请问你是谁dasoidhao@1264fg.45om', 'ftp://examplema-niè', '👊']
✨ explanation
This example shows the default behavior of CleanEmailMapper: every substring of the 'text' field that matches the email pattern is removed. Here 'euqdh@cjqi.com', 'rdash@hqbchd.ckdhnfes.cds', and '23da44sh12@46hqb12chd.ckdhnfes.comd.dasd.asd.dc' are removed, leaving only the surrounding text (including the trailing space in 'happy day '). The second sample, '请问你是谁dasoidhao@1264fg.45om', contains nothing the default pattern recognizes as an email address, so it is returned unchanged.
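To see which substrings a pattern would treat as email addresses in these samples, `re.findall` can list the matches before anything is removed. The pattern below is an assumed stand-in, chosen because it reproduces the output shown above. It is not confirmed to be the operator's default.

```python
import re

# Assumed stand-in pattern (not confirmed to be the operator's default).
pattern = re.compile(r"[A-Za-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+")

samples = [
    "happy day euqdh@cjqi.com",
    "请问你是谁dasoidhao@1264fg.45om",
    "ftp://examplema-nièrdash@hqbchd.ckdhnfes.cds",
    "👊23da44sh12@46hqb12chd.ckdhnfes.comd.dasd.asd.dc",
]

# One list of matched substrings per sample; an empty list means no match.
matches = [pattern.findall(s) for s in samples]

# The same pattern used for removal, mirroring the default (empty) replacement.
cleaned = [pattern.sub("", s) for s in samples]
```

Under this assumed pattern, the second sample yields an empty match list because the text after its only dot ('45om') starts with digits, while the pattern requires letters there. The third sample matches only from 'rdash' onward, since 'è' is outside the ASCII character class used for the local part.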
test_replace_email
CleanEmailMapper(repl='<EMAIL>')
📥 input data
['happy day euqdh@cjqi.com', '请问你是谁dasoidhao@1264fg.45om', 'ftp://examplema-nièrdash@hqbchd.ckdhnfes.cds', '👊23da44sh12@46hqb12chd.ckdhnfes.comd.dasd.asd.dc']
📤 output data
['happy day <EMAIL>', '请问你是谁dasoidhao@1264fg.45om', 'ftp://examplema-niè<EMAIL>', '👊<EMAIL>']
✨ explanation
In this case, the CleanEmailMapper is configured to replace email addresses with a specific string ‘