data_juicer.ops.mapper.extract_tables_from_html_mapper module
- class data_juicer.ops.mapper.extract_tables_from_html_mapper.ExtractTablesFromHtmlMapper(tables_field_name: str = 'html_tables', retain_html_tags: bool = False, include_header: bool = True, *args, **kwargs)
Bases: Mapper
Mapper to extract tables from HTML content.
- __init__(tables_field_name: str = 'html_tables', retain_html_tags: bool = False, include_header: bool = True, *args, **kwargs)
Initialization method.
- Parameters:
  - tables_field_name – Field name to store the extracted tables.
  - retain_html_tags – If True, retains HTML tags in the tables; otherwise, removes them.
  - include_header – If True, includes the table header; otherwise, excludes it. This parameter is effective only when retain_html_tags is False and applies solely to the extracted table content.
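The interaction between retain_html_tags and include_header can be easier to see in a small standalone sketch. The function below, extract_tables_sketch, is a hypothetical stand-in written with the standard library only; it is not the mapper's implementation. It assumes that a header row is any row containing <th> cells and that stripped tables are returned as nested lists of cell strings, neither of which is confirmed by the documentation above.

```python
import re

_FLAGS = re.DOTALL | re.IGNORECASE


def extract_tables_sketch(html, retain_html_tags=False, include_header=True):
    """Hypothetical illustration of the documented options (not library code)."""
    # Collect every <table>...</table> block, tags included.
    tables = re.findall(r"<table[^>]*>.*?</table>", html, flags=_FLAGS)
    if retain_html_tags:
        # include_header has no effect in this mode, as the docs note.
        return tables

    result = []
    for table in tables:
        rows = []
        for row in re.findall(r"<tr[^>]*>(.*?)</tr>", table, flags=_FLAGS):
            # Assumption: a row is a header row if it contains <th> cells.
            is_header = re.search(r"<th[\s>]", row, flags=re.IGNORECASE) is not None
            if is_header and not include_header:
                continue
            cells = re.findall(r"<t[dh][^>]*>(.*?)</t[dh]>", row, flags=_FLAGS)
            # Drop nested tags and surrounding whitespace from each cell.
            rows.append([re.sub(r"<[^>]+>", "", cell).strip() for cell in cells])
        if rows:
            result.append(rows)
    return result


html = (
    "<p>Intro</p><table><tr><th>Name</th><th>Age</th></tr>"
    "<tr><td>Ann</td><td><b>30</b></td></tr></table>"
)
with_header = extract_tables_sketch(html)
without_header = extract_tables_sketch(html, include_header=False)
raw = extract_tables_sketch(html, retain_html_tags=True)
```

In this sketch, with_header keeps the Name/Age row, without_header keeps only the data row, and raw holds the original table markup unchanged regardless of include_header. The real mapper stores its result in the sample under the field named by tables_field_name; the exact stored format should be checked against the library source.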