data_juicer.ops.mapper.remove_table_text_mapper module

class data_juicer.ops.mapper.remove_table_text_mapper.RemoveTableTextMapper(min_col: Annotated[int, Field(ge=2, le=20)] = 2, max_col: Annotated[int, Field(ge=2, le=20)] = 20, *args, **kwargs)[source]

Bases: Mapper

Mapper to remove table text from text samples.

A regular expression is used to detect tables. Only tables whose number of columns lies between min_col and max_col are removed.
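The column-range rule can be illustrated with a minimal sketch. It is not the library's actual pattern. The function name remove_tables, the definition of a row (a line of exactly N non-space tokens separated by a single space, tab, or "|"), and the definition of a table (two or more consecutive such rows that start after a newline) are all assumptions made for illustration.

```python
import re


def remove_tables(text: str, min_col: int = 2, max_col: int = 20) -> str:
    """Hypothetical sketch of column-count-based table removal.

    Assumptions (not the library's actual regex): a row is a line of exactly
    N non-space tokens separated by single spaces, tabs, or '|'; a table is
    two or more consecutive rows with the same N, starting after a newline.
    """
    for n_sep in range(min_col - 1, max_col):
        # n_sep separators => n_sep + 1 columns, so columns span min_col..max_col.
        row = r"\S+?(?:[ |\t]\S+?){%d}\n+" % n_sep
        # The lookbehind anchors a match to a line start (the first line never matches).
        table = re.compile(r"(?<=\n)(?:%s){2,}" % row)
        text = table.sub("", text)
    return text


doc = "Intro line\na b c\nd e f\ng h i\nOutro line\n"
print(remove_tables(doc))  # the 3-column block is dropped
print(remove_tables(doc, min_col=4))  # 3 columns < min_col, so text is unchanged
```

In this sketch, consecutive short prose lines (for example two lines of two words each) also look like a 2-column table, so raising min_col trades recall for precision.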

__init__(min_col: Annotated[int, Field(ge=2, le=20)] = 2, max_col: Annotated[int, Field(ge=2, le=20)] = 20, *args, **kwargs)[source]

Initialization method.

Parameters:
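  • min_col -- The minimum number of columns a table must have to be removed. Must be between 2 and 20; defaults to 2.

  • max_col -- The maximum number of columns a table may have to be removed. Must be between 2 and 20; defaults to 20.

  • args -- Extra positional arguments.

  • kwargs -- Extra keyword arguments.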
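The Ge(2)/Le(20) constraints in the signature mean each bound must be an integer from 2 to 20. The helper below is a hypothetical, stdlib-only sketch that mirrors those constraints; it is not part of Data-Juicer, and when or whether the library itself validates these values is not stated on this page. Note that the signature constrains each bound independently and does not, by itself, require min_col <= max_col.

```python
def check_col_bounds(min_col: int = 2, max_col: int = 20) -> tuple:
    """Hypothetical helper mirroring the Ge(2)/Le(20) metadata in the signature."""
    for name, value in (("min_col", min_col), ("max_col", max_col)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
        if not 2 <= value <= 20:
            raise ValueError(f"{name} must be between 2 and 20, got {value}")
    return min_col, max_col


print(check_col_bounds(3, 5))
```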

process_batched(samples)[source]
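This page gives no description of process_batched. As a hedged illustration only, assuming a batch arrives as a dict that maps each field name to a list with one value per sample (as in Hugging Face datasets batched mapping) and that the text lives under a key such as "text", a batched mapper rewrites that one column and leaves the others untouched. The names process_batched_sketch and drop_tab_lines are hypothetical, and the toy cleaner stands in for real table removal.

```python
from typing import Callable, Dict, List


def process_batched_sketch(
    samples: Dict[str, list],
    clean: Callable[[str], str],
    text_key: str = "text",
) -> Dict[str, list]:
    """Hypothetical stand-in for a batched mapper step.

    Assumes a batch is a dict of columns (one list entry per sample) and that
    only the text column is rewritten.
    """
    texts: List[str] = samples[text_key]
    samples[text_key] = [clean(text) for text in texts]
    return samples


def drop_tab_lines(text: str) -> str:
    # Toy cleaner for illustration only: drops every line that contains a tab.
    return "\n".join(line for line in text.split("\n") if "\t" not in line)


batch = {"text": ["keep\na\tb\nc\td\nend", "no table here"], "meta": ["m1", "m2"]}
print(process_batched_sketch(batch, clean=drop_tab_lines))
```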