clean_ip_mapper
Cleans IPv4 and IPv6 addresses from text samples.
This operator removes or replaces IPv4 and IPv6 addresses in the text. It uses a regular expression to identify and clean the IP addresses. By default, it replaces the IP addresses with an empty string, effectively removing them. The operator can be configured with a custom pattern and replacement string. If no pattern is provided, a default pattern for both IPv4 and IPv6 addresses is used. The operator processes samples in batches.
- Uses a regular expression to find and clean IP addresses.
- Replaces each IP address it finds with a specified replacement string.
- The default replacement string is empty, so IP addresses are removed.
- Uses a custom regular expression pattern instead of the default, if one is provided.
- Processes samples in batches for efficiency.
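The mechanism described above, a regex substitution applied to every sample in a batch, can be sketched with Python's standard re module. This is a minimal sketch, not Data-Juicer's implementation: the combined pattern is a simplified assumption (IPv4 octets limited to 0-255, and only the uncompressed eight-group IPv6 form, with no "::" shorthand), and remove_ips is a hypothetical helper.

```python
import re
from typing import List

# Simplified stand-ins for the operator's built-in pattern (an assumption,
# not Data-Juicer's actual regex).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"  # one IPv4 octet, 0-255
IPV4 = rf"{OCTET}(?:\.{OCTET}){{3}}"                      # four dot-separated octets
IPV6 = r"(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}"        # full 8-group IPv6 form only
IP_RE = re.compile(rf"{IPV6}|{IPV4}")


def remove_ips(batch: List[str]) -> List[str]:
    """Hypothetical helper: replace every IP match with '' in each sample."""
    return [IP_RE.sub("", text) for text in batch]


print(remove_ips(["server 192.168.0.1 is up", "v6 2001:0db8:0000:0000:0000:ff00:0042:8329 ok"]))
# ['server  is up', 'v6  ok']
```

Because the pattern has no word-boundary anchors, it also matches addresses embedded in longer strings such as URLs.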
Type: mapper
Tags: cpu, text
🔧 Parameter Configuration
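| name | type | default | desc |
|---|---|---|---|
| pattern | typing.Optional[str] | None | Regular expression pattern to search for in the text. If None, the built-in IPv4/IPv6 pattern is used. |
| repl | str | '' | Replacement string. The default empty string removes the matched addresses. |
| args | | | Extra positional arguments. |
| kwargs | | | Extra keyword arguments. |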
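The parameter semantics can be illustrated with a small hypothetical class, IpCleanerSketch: pattern=None falls back to a default pattern, and repl replaces every match. The IPv4-only default pattern, the dict-of-lists batch layout with a "text" column, and the process_batch method are all assumptions for illustration, not the operator's actual API.

```python
import re
from typing import Dict, List, Optional

# Assumed default pattern for this sketch: IPv4 only, each octet 0-255.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
_DEFAULT_PATTERN = rf"{_OCTET}(?:\.{_OCTET}){{3}}"


class IpCleanerSketch:
    """Hypothetical stand-in mimicking the documented pattern/repl behavior."""

    def __init__(self, pattern: Optional[str] = None, repl: str = "", *args, **kwargs):
        # pattern=None falls back to the default; extra args are accepted and ignored.
        self.regex = re.compile(_DEFAULT_PATTERN if pattern is None else pattern)
        self.repl = repl

    def process_batch(self, samples: Dict[str, List[str]], text_key: str = "text") -> Dict[str, List[str]]:
        # A batch is assumed to be a dict of columns; only the text column is rewritten.
        samples[text_key] = [self.regex.sub(self.repl, t) for t in samples[text_key]]
        return samples


masker = IpCleanerSketch(repl="<IP>")
print(masker.process_batch({"text": ["test of ip 234.128.124.123", "34.0.124.123"]}))
# {'text': ['test of ip <IP>', '<IP>']}
```

A custom pattern changes what counts as an address: with pattern=r"\d+\.\d+\.\d+\.\d+", the sketch also matches 174.1421.237.246, which its octet-bounded default ignores.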
📊 Effect demonstration
test_ipv4
CleanIpMapper()
📥 input data
['test of ip 234.128.124.123', '34.0.124.123', 'ftp://example.com/188.46.244.216my-page.html', 'ft174.1421.237.246my']
📤 output data
['test of ip ', '', 'ftp://example.com/my-page.html', 'ft174.1421.237.246my']
✨ explanation
This example demonstrates the operator's default behavior: IPv4 addresses are removed from the text, and the rest of the text is left unchanged. For instance, '234.128.124.123' is removed from the first sample, leaving 'test of ip '; the second sample consists only of an address, so it becomes an empty string; and the address embedded in the URL of the third sample is removed as well. The last sample, 'ft174.1421.237.246my', is returned unchanged because 1421 is not a valid octet (each IPv4 octet must be between 0 and 255), so the string contains no valid IPv4 address.
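The fourth input in this example is left unchanged by the default pattern. An independent check with Python's standard ipaddress module (not part of the operator) shows why: of the dotted candidates taken from the example inputs, only 174.1421.237.246 fails to parse, because one of its octets exceeds 255.

```python
import ipaddress


def is_valid_ipv4(candidate: str) -> bool:
    """Return True if candidate parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


# Dotted candidates taken from the example inputs above.
candidates = ["234.128.124.123", "34.0.124.123", "188.46.244.216", "174.1421.237.246"]
for candidate in candidates:
    print(candidate, is_valid_ipv4(candidate))
# Prints True for the first three candidates and False for 174.1421.237.246.
```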
test_replace_ipv4
CleanIpMapper(repl='<IP>')
📥 input data
['test of ip 234.128.124.123', '34.0.124.123', 'ftp://example.com/188.46.244.216my-page.html', 'ft174.1421.237.246my']
📤 output data
['test of ip <IP>', '<IP>', 'ftp://example.com/<IP>my-page.html', 'ft174.1421.237.246my']
✨ explanation
This example shows how the operator can be configured to replace IPv4 addresses with a custom string, ‘