data_juicer.ops.mapper.remove_bibliography_mapper module

class data_juicer.ops.mapper.remove_bibliography_mapper.RemoveBibliographyMapper(*args, **kwargs)

Bases: Mapper

Removes bibliography sections at the end of LaTeX documents.

This operator identifies and removes bibliography sections in LaTeX documents. It uses a regular expression to match common bibliography-related commands such as \appendix, \begin{references}, \begin{thebibliography}, and \bibliography. The matched sections are removed from the text. The operator processes samples in batches for efficiency.
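The regex-based removal can be illustrated with a minimal, standalone sketch. It assumes that a match on any of the listed commands, together with everything after it to the end of the text, is deleted (consistent with the operator targeting sections at the end of a document). The exact pattern, the helper names `remove_bibliography` and `process_batched_sketch`, the `"text"` key, and the column-oriented batch layout (a dict mapping a field name to a list of strings) are illustrative assumptions, not the library's verbatim implementation.

```python
import re

# Hypothetical pattern: the first bibliography-style command plus, via the
# trailing ".*$" and re.DOTALL, everything that follows it to the end of the text.
BIB_PATTERN = re.compile(
    r"(\\appendix"
    r"|\\begin\{references\}"
    r"|\\begin\{REFERENCES\}"
    r"|\\begin\{thebibliography\}"
    r"|\\bibliography\{.*\}"
    r").*$",
    flags=re.DOTALL,
)


def remove_bibliography(text: str) -> str:
    """Drop everything from the first matched bibliography command onward."""
    return BIB_PATTERN.sub("", text)


def process_batched_sketch(samples: dict, text_key: str = "text") -> dict:
    """Apply remove_bibliography to every text in a column-oriented batch (assumed layout)."""
    samples[text_key] = [remove_bibliography(t) for t in samples[text_key]]
    return samples


doc = "\\section{Intro}\nSome text.\n\\bibliography{refs}\nmore"
print(repr(remove_bibliography(doc)))  # '\\section{Intro}\nSome text.\n'
```

Because `re.sub` matches at the leftmost position, the earliest of the listed commands marks the cut point; text containing none of them is returned unchanged.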

__init__(*args, **kwargs)

Initialization method.

Parameters:
  • args – extra positional arguments

  • kwargs – extra keyword arguments

process_batched(samples)