data_juicer.format.load module

data_juicer.format.load.load_formatter(dataset_path, text_keys=None, suffixes=None, add_suffix=False, **kwargs) → BaseFormatter

Load the formatter that matches the data format of the given dataset.

Parameters:
  • dataset_path – Path to dataset file or dataset directory

  • text_keys – key names of the fields that store the sample text. Default: None

  • suffixes – suffixes of the files that will be read. Default: None

  • add_suffix – whether to add the file suffix to dataset meta. Default: False

Returns:

a dataset formatter.
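
A minimal usage sketch follows, based only on the signature documented above. The wrapper name build_formatter, the example path, the "text" key, and the ".jsonl" suffix are hypothetical; passing text_keys and suffixes as lists, and writing the suffix with a leading dot, are assumptions rather than documented requirements.

```python
def build_formatter(dataset_path, text_keys=None, suffixes=None):
    """Return a formatter for dataset_path via the documented load_formatter call.

    build_formatter is a hypothetical convenience wrapper, not part of Data-Juicer.
    """
    # Imported lazily so this module can be imported even when
    # Data-Juicer is not installed.
    from data_juicer.format.load import load_formatter

    return load_formatter(
        dataset_path,
        text_keys=text_keys,   # assumed here to be a list of field names
        suffixes=suffixes,     # assumed here to be a list such as [".jsonl"]
        add_suffix=True,       # ask for the file suffix to be added to the dataset meta
    )


if __name__ == "__main__":
    # Hypothetical path and key name; replace them with your own dataset.
    formatter = build_formatter(
        "data/corpus.jsonl",
        text_keys=["text"],
        suffixes=[".jsonl"],
    )
    print(type(formatter).__name__)
```

Leaving suffixes as None keeps the default behavior described in the parameter list; the sketch passes it explicitly only to show where the argument goes.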