data_juicer.ops.mapper.clean_html_mapper module

class data_juicer.ops.mapper.clean_html_mapper.CleanHtmlMapper(*args, **kwargs)

Bases: Mapper

Mapper to clean HTML code in text samples.
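As a rough illustration of what cleaning HTML from a text sample involves, here is a minimal sketch that uses only Python's standard `html.parser` module. It is not the operator's implementation, which may use a different parser and treat lists, whitespace, or entities differently. `_TextExtractor` and `strip_html` are hypothetical names, and skipping `<script>`/`<style>` contents is an assumption made for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and drops all tags."""

    _SKIP = {"script", "style"}  # assumption: their contents are not visible text

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Called for each run of text between tags; keep it unless inside script/style.
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(raw_html: str) -> str:
    """Return the text content of raw_html with markup removed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(strip_html("<p>Hello <b>world</b> &amp; friends</p>"))  # Hello world & friends
```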

__init__(*args, **kwargs)

Initialization method.

Parameters:
  • args – extra positional arguments

  • kwargs – extra keyword arguments

process_batched(samples)
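`process_batched` operates on a batch of samples rather than a single sample. The sketch below assumes, hypothetically, that a batch is column-oriented (a dict mapping field names to equal-length lists, as in Hugging Face `datasets` batched mapping) and that the text is stored under a `"text"` key. `clean_html_batch` and `_strip_html` are illustrative names, not part of the Data-Juicer API, and the standard-library parser stands in for whatever the operator actually uses.

```python
from html.parser import HTMLParser


def _strip_html(raw_html: str) -> str:
    """Return the text content of raw_html with tags removed (stdlib sketch)."""
    parts: list[str] = []
    parser = HTMLParser(convert_charrefs=True)
    parser.handle_data = parts.append  # collect every text run between tags
    parser.feed(raw_html)
    parser.close()
    return "".join(parts)


def clean_html_batch(samples: dict[str, list], text_key: str = "text") -> dict[str, list]:
    """Hypothetical batched step: clean every text in the batch, leave other fields alone."""
    samples[text_key] = [_strip_html(text) for text in samples[text_key]]
    return samples


batch = {"text": ["<p>one</p>", "<div>two <i>three</i></div>"], "id": [1, 2]}
print(clean_html_batch(batch))  # {'text': ['one', 'two three'], 'id': [1, 2]}
```

Processing a whole column per call lets the mapper avoid per-sample overhead when the surrounding pipeline maps in batches.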