# remove_specific_chars_mapper

Removes specific characters from text samples.

This operator removes specified characters from the text. The characters to remove can be given as a single string or as a list of strings. If none are specified, the default set `◆●■►▼▲▴∆▻▷❖♡□` is used; characters outside the configured set are left untouched. The operator builds a regular expression pattern that matches any of the specified characters and replaces each match with an empty string. Samples are processed in batches for efficiency.

Type: **mapper**

Tags: cpu, text

## 🔧 Parameter Configuration

| name | type | default | desc |
|--------|------|--------|------|
| `chars_to_remove` | typing.Union[str, typing.List[str]] | `'◆●■►▼▲▴∆▻▷❖♡□'` | a list or a string including all |
| `args` | | `''` | extra args |
| `kwargs` | | `''` | extra args |

## 📊 Effect demonstration

### test_complete_html_text

```python
RemoveSpecificCharsMapper()
```

#### 📥 input data

['这是一个干净的文本。Including Chinese and English.', '◆●■►▼▲▴∆▻▷❖♡□', '►This is a dirty text ▻ 包括中文和英文', '多个●■►▼这样的特殊字符可以►▼▲▴∆吗?', '未指定的●■☛₨➩►▼▲特殊字符会☻▷❖被删掉吗??']

#### 📤 output data

['这是一个干净的文本。Including Chinese and English.', '', 'This is a dirty text  包括中文和英文', '多个这样的特殊字符可以吗?', '未指定的☛₨➩特殊字符会☻被删掉吗??']
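
The removal step described at the top of this page can be approximated with the standard `re` module. The following is a minimal sketch, not the operator's actual implementation: the helper name `remove_chars` and the constant `DEFAULT_CHARS` are hypothetical, and the pattern is built by escaping each entry and joining with `|`, which is an assumption about one reasonable way to match "any of the specified characters".

```python
import re

# Default character set, copied from the parameter table above.
DEFAULT_CHARS = '◆●■►▼▲▴∆▻▷❖♡□'


def remove_chars(texts, chars_to_remove=DEFAULT_CHARS):
    """Hypothetical helper: strip every listed character from a batch of texts.

    `chars_to_remove` may be a string (each character is removed) or a
    list of strings (each entry is removed).
    """
    # Escape every entry so regex metacharacters such as '.' are treated literally.
    parts = [re.escape(c) for c in chars_to_remove if c]
    if not parts:
        # Nothing to remove: return the batch unchanged.
        return list(texts)
    pattern = re.compile('|'.join(parts))
    # Process the whole batch in one pass, replacing matches with ''.
    return [pattern.sub('', text) for text in texts]


if __name__ == '__main__':
    batch = ['多个●■►▼这样的特殊字符可以►▼▲▴∆吗?', '◆●■►▼▲▴∆▻▷❖♡□']
    print(remove_chars(batch))
    # A custom set passed as a list of strings.
    print(remove_chars(['a-b.c'], ['-', '.']))
```

With the default set, symbols such as `☛₨➩☻` survive because they are not listed, which matches the last sample in the demonstration above.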