data_juicer.ops.mapper.remove_comments_mapper module

class data_juicer.ops.mapper.remove_comments_mapper.RemoveCommentsMapper(doc_type: str | List[str] = 'tex', inline: bool = True, multiline: bool = True, *args, **kwargs)

Bases: Mapper

Mapper to remove comments from different kinds of documents.

Currently, only ‘tex’ is supported as a document type.
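The constructor's `doc_type` accepts either a single string or a list of strings. The sketch below shows one way such an argument could be normalized and validated. `normalize_doc_types` and `SUPPORTED_DOC_TYPES` are hypothetical names, and raising `ValueError` for an unsupported type is an assumption: the documentation does not say how types other than ‘tex’ are handled.

```python
from typing import List, Union

# Hypothetical registry; the documentation only states that 'tex' is supported.
SUPPORTED_DOC_TYPES = {"tex"}


def normalize_doc_types(doc_type: Union[str, List[str]]) -> List[str]:
    """Turn a single type or a list of types into a validated list (sketch)."""
    types = [doc_type] if isinstance(doc_type, str) else list(doc_type)
    unsupported = [t for t in types if t not in SUPPORTED_DOC_TYPES]
    if unsupported:
        # Assumption: reject unknown types instead of silently ignoring them.
        raise ValueError(f"unsupported doc_type(s): {unsupported}")
    return types


print(normalize_doc_types("tex"))    # ['tex']
print(normalize_doc_types(["tex"]))  # ['tex']
```

Accepting both forms lets a config pass a bare string for the common single-type case while still allowing a list once more types exist.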

__init__(doc_type: str | List[str] = 'tex', inline: bool = True, multiline: bool = True, *args, **kwargs)

Initialization method.

Parameters:
  • doc_type – Type (or list of types) of document from which to remove comments.

  • inline – Whether to remove inline comments.

  • multiline – Whether to remove multiline comments.

  • args – Extra positional arguments.

  • kwargs – Extra keyword arguments.
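TeX marks a comment with an unescaped `%`, and the comment runs to the end of the line. The following minimal sketch illustrates what the `inline` and `multiline` flags might control; it is not the operator's implementation. It assumes that `inline` covers a comment that follows other content on the same line, and that `multiline` covers lines consisting only of a comment. `strip_tex_comments` and `find_unescaped_percent` are hypothetical names.

```python
def find_unescaped_percent(line: str) -> int:
    """Index of the first '%' not escaped by an odd number of backslashes, or -1."""
    backslashes = 0
    for i, ch in enumerate(line):
        if ch == "\\":
            backslashes += 1
            continue
        if ch == "%" and backslashes % 2 == 0:
            return i  # '\\%' (line break then '%') still starts a comment
        backslashes = 0
    return -1


def strip_tex_comments(text: str, inline: bool = True, multiline: bool = True) -> str:
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        newline = line[len(body):]
        idx = find_unescaped_percent(body)
        if idx == -1:
            out.append(line)  # no comment on this line
        elif body[:idx].strip() == "":
            # Assumption: a line holding only a comment is what `multiline` targets.
            if not multiline:
                out.append(line)
        elif inline:
            # Keep the content before '%', dropping trailing whitespace.
            out.append(body[:idx].rstrip() + newline)
        else:
            out.append(line)
    return "".join(out)


sample = (
    "Intro 50\\% off % drop this\n"
    "% full-line comment\n"
    "  % indented full-line comment\n"
    "Row\\\\% comment after a line break\n"
    "End\n"
)
print(strip_tex_comments(sample))
```

Counting preceding backslashes, rather than checking only the previous character, keeps the escaped `\%` in the first line while still treating the `%` after the `\\` line break in the fourth line as a comment.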

process_batched(samples)
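`process_batched` receives a batch of samples. The minimal sketch below shows batch-level processing under two assumptions: a batch is a dict mapping column names to equal-length lists (the layout Hugging Face `datasets` passes when `map` is called with `batched=True`), and the texts sit under a `text` key. `process_batched_sketch` and the toy cleaner `drop_percent_lines` are hypothetical.

```python
from typing import Callable, Dict, List


def process_batched_sketch(
    samples: Dict[str, List],
    clean: Callable[[str], str],
    text_key: str = "text",  # hypothetical default column name
) -> Dict[str, List]:
    """Apply `clean` to every text in a column-oriented batch (sketch)."""
    # Only the text column is rewritten; other columns pass through unchanged.
    samples[text_key] = [clean(text) for text in samples[text_key]]
    return samples


def drop_percent_lines(text: str) -> str:
    # Toy cleaner: drop lines whose first non-space character is '%'.
    return "".join(
        line for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith("%")
    )


batch = {"text": ["a\n% note\nb\n", "no comments\n"], "id": [1, 2]}
print(process_batched_sketch(batch, drop_percent_lines))
# {'text': ['a\nb\n', 'no comments\n'], 'id': [1, 2]}
```

Working on whole columns keeps the per-call overhead low when the dataset library hands over many samples at once.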