data_juicer.ops.mapper.remove_bibliography_mapper module

class data_juicer.ops.mapper.remove_bibliography_mapper.RemoveBibliographyMapper(*args, **kwargs)

Bases: Mapper

Removes bibliography sections at the end of LaTeX documents.

This operator identifies and removes bibliography sections in LaTeX documents. It uses a regular expression to match common bibliography commands such as \appendix, \begin{references}, \begin{thebibliography}, and \bibliography. Each match is removed together with all text that follows it, so the document is truncated at the first such command. The operator processes samples in batch mode for efficiency.
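As a rough illustration of the regex approach described above, here is a minimal standalone sketch. The pattern BIB_PATTERN and the helper remove_bibliography are hypothetical names reconstructed from the description; they are not copied from Data-Juicer, and the library's actual pattern may differ. The sketch matches the first listed command and, via re.DOTALL, everything after it through the end of the text.

```python
import re

# Hypothetical pattern built from the commands listed above (not Data-Juicer's code).
# re.DOTALL lets ".*" span newlines, so a match runs from the command to the end.
BIB_PATTERN = re.compile(
    r"(\\appendix"
    r"|\\begin\{references\}"
    r"|\\begin\{thebibliography\}"
    r"|\\bibliography\{.*?\}"
    r").*$",
    flags=re.DOTALL,
)


def remove_bibliography(text: str) -> str:
    """Drop the first bibliography-like command and all text that follows it."""
    return BIB_PATTERN.sub("", text)


doc = (
    "\\section{Intro}\n"
    "Main body text.\n"
    "\\begin{thebibliography}{9}\n"
    "\\bibitem{a} Some reference.\n"
    "\\end{thebibliography}\n"
)
print(repr(remove_bibliography(doc)))
```

Because the regex engine returns the leftmost match, the cut happens at whichever listed command appears first, so an \appendix placed before the references also removes the appendix.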

__init__(*args, **kwargs)

Initialization method.

Parameters:
  • args -- extra positional arguments

  • kwargs -- extra keyword arguments

process_batched(samples)
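The following is a minimal standalone sketch of what a batched call could look like. It assumes that a batch is a columnar dict mapping each field name to an equal-length list and that the text lives under a key named "text"; the text_key parameter and the pattern are hypothetical stand-ins, not the operator's confirmed internals. Only the text column is rewritten, and every other column passes through unchanged.

```python
import re
from typing import Dict, List

# Hypothetical pattern (same idea as the class description; not Data-Juicer's code).
BIB_PATTERN = re.compile(
    r"(\\appendix|\\begin\{references\}|\\begin\{thebibliography\}"
    r"|\\bibliography\{.*?\}).*$",
    flags=re.DOTALL,
)


def process_batched(
    samples: Dict[str, List[str]], text_key: str = "text"
) -> Dict[str, List[str]]:
    """Strip bibliography tails from every text in a columnar batch.

    Assumption: samples maps column names to lists of equal length, and
    text_key (hypothetical default "text") names the column to clean.
    """
    samples[text_key] = [BIB_PATTERN.sub("", text) for text in samples[text_key]]
    return samples


batch = {
    "text": [
        "Intro.\n\\bibliography{refs}\n",
        "No references here.",
    ],
    "meta": ["a", "b"],
}
out = process_batched(batch)
print(out["text"])
```

Processing a whole column with one list comprehension avoids per-sample call overhead, which is the usual motivation for a batched mapper.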