data_juicer.ops.mapper.remove_table_text_mapper module
- class data_juicer.ops.mapper.remove_table_text_mapper.RemoveTableTextMapper(min_col: Annotated[int, Field(ge=2, le=20)] = 2, max_col: Annotated[int, Field(ge=2, le=20)] = 20, *args, **kwargs)
Bases: Mapper
Mapper to remove table text from text samples.
Tables are matched with a regular expression, and only tables whose number of columns falls between min_col and max_col are removed.
- __init__(min_col: Annotated[int, Field(ge=2, le=20)] = 2, max_col: Annotated[int, Field(ge=2, le=20)] = 20, *args, **kwargs)
Initialization method.
- Parameters:
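min_col – The minimum number of columns a table must have to be removed. An integer from 2 to 20; defaults to 2.
max_col – The maximum number of columns a table may have to be removed. An integer from 2 to 20; defaults to 20.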
args – Extra positional arguments.
kwargs – Extra keyword arguments.
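To make the effect of min_col and max_col concrete, the following is a minimal standalone sketch of column-range table removal. It is not the pattern RemoveTableTextMapper uses internally. The helper name remove_table_text, the tab-separated cell convention, and the rule that a table is two or more consecutive rows with the same column count are all assumptions made for illustration.

```python
import re

# Assumption for this sketch: table cells are separated by a tab character.
CELL_SEP = re.compile(r"\t")


def remove_table_text(text: str, min_col: int = 2, max_col: int = 20) -> str:
    """Drop runs of 2+ consecutive lines that share an in-range column count.

    Hypothetical helper, not part of data_juicer.
    """
    lines = text.split("\n")
    kept = []
    i = 0
    while i < len(lines):
        n_cols = len(CELL_SEP.split(lines[i]))
        j = i + 1
        # Extend the run while the following lines have the same column count.
        while j < len(lines) and len(CELL_SEP.split(lines[j])) == n_cols:
            j += 1
        is_table = min_col <= n_cols <= max_col and j - i >= 2
        if not is_table:
            kept.extend(lines[i:j])
        i = j
    return "\n".join(kept)


sample = "Intro line.\nname\tage\nAnn\t30\nBob\t25\nClosing line."
cleaned = remove_table_text(sample)
untouched = remove_table_text(sample, min_col=3)
```

With the defaults, the three tab-separated rows form a two-column table and are dropped, so cleaned holds only the intro and closing lines. With min_col=3 the same table falls outside the range and untouched equals the original sample.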