data_juicer.format.load module

data_juicer.format.load.load_formatter(dataset_path, text_keys=None, suffixes=None, add_suffix=False, **kwargs) → BaseFormatter

Load the appropriate formatter for different types of data formats.

Parameters:
  • dataset_path -- path to a dataset file or a dataset directory

  • text_keys -- names of the fields that store the sample text. Default: None

  • suffixes -- suffixes of the files to read. Default: None

  • add_suffix -- whether to add the file suffix to the dataset meta. Default: False

Returns:

A dataset formatter (a BaseFormatter instance).
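
To illustrate what choosing "the appropriate formatter for different types of data formats" can involve, the sketch below picks the dominant file suffix under a dataset path, optionally restricted by a suffix list. It is a minimal, standard-library illustration only: the helper name pick_suffix and its "most common suffix wins" rule are assumptions made for this example, and it is not the implementation used by load_formatter.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Optional


def pick_suffix(dataset_path: str,
                suffixes: Optional[Iterable[str]] = None) -> Optional[str]:
    """Hypothetical helper (not part of data_juicer).

    Returns the most common file suffix found at dataset_path, which may be
    a single file or a directory. If suffixes is given, only files with one
    of those suffixes are counted. Returns None when nothing matches.
    """
    root = Path(dataset_path)
    # A single file is treated as a one-element dataset.
    if root.is_file():
        files = [root]
    else:
        files = [p for p in root.rglob("*") if p.is_file()]

    # Normalize the allowed suffixes to lowercase with a leading dot.
    allowed = None
    if suffixes is not None:
        allowed = {
            (s if s.startswith(".") else "." + s).lower() for s in suffixes
        }

    counts = Counter(
        p.suffix.lower()
        for p in files
        if p.suffix and (allowed is None or p.suffix.lower() in allowed)
    )
    if not counts:
        return None
    # The suffix with the highest file count decides the format.
    return counts.most_common(1)[0][0]
```

A caller could map the returned suffix to a format-specific loader; that mapping is left out here because the source does not describe it.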