data_juicer.format.load module
- data_juicer.format.load.load_formatter(dataset_path, text_keys=None, suffixes=None, add_suffix=False, **kwargs) → BaseFormatter
Load the formatter appropriate to the data format of the given dataset.
- Parameters:
dataset_path – path to the dataset file or dataset directory
text_keys – key names of the fields that store the sample text. Default: None
suffixes – the suffixes of the files that will be read. Default: None
add_suffix – whether to add the file suffix to the dataset metadata. Default: False
- Returns:
a dataset formatter.
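A minimal usage sketch for the signature above may help. It writes a tiny JSONL file and passes its path to load_formatter. The sample data, the "text" field name, and passing text_keys as a list are assumptions made for illustration; the import is guarded so that the snippet still runs where data_juicer is not installed.

```python
import json
import tempfile
from pathlib import Path

# Write a tiny JSONL dataset to a temporary directory (illustrative data only).
tmp_dir = Path(tempfile.mkdtemp())
dataset_path = tmp_dir / "demo.jsonl"
samples = [{"text": "first sample"}, {"text": "second sample"}]
dataset_path.write_text(
    "\n".join(json.dumps(sample) for sample in samples) + "\n",
    encoding="utf-8",
)

try:
    from data_juicer.format.load import load_formatter
except ImportError:
    load_formatter = None  # data_juicer is not installed; skip the call below

if load_formatter is not None:
    # Assumption: text_keys accepts a list of field names.
    # suffixes and add_suffix are left at their documented defaults.
    formatter = load_formatter(str(dataset_path), text_keys=["text"])
    print(type(formatter).__name__)
```

The returned object is documented only as a BaseFormatter, so the sketch prints its concrete class name rather than assuming which formatter is selected.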