data_juicer.format.json_formatter module

class data_juicer.format.json_formatter.JsonFormatter(dataset_path, suffixes=None, **kwargs)[source]

Bases: LocalFormatter

This class loads and formats JSON-type files.

The default suffixes are ['.json', '.jsonl', '.jsonl.zst'].

SUFFIXES = ['.json', '.jsonl', '.jsonl.zst']
__init__(dataset_path, suffixes=None, **kwargs)[source]

Initialization method.

Parameters:
  • dataset_path -- path to a dataset file or a dataset directory

  • suffixes -- only files with these suffixes are processed

  • kwargs -- extra keyword arguments
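
To make the interplay of dataset_path and suffixes concrete, here is a minimal, self-contained sketch of suffix-based file selection. It is not the library's implementation: the helper name collect_files, the recursive directory search, and the sorted return value are all assumptions made for illustration. Only the default suffix list is taken from the SUFFIXES attribute documented above.

```python
from pathlib import Path

# Mirrors the documented SUFFIXES attribute of JsonFormatter.
DEFAULT_SUFFIXES = ['.json', '.jsonl', '.jsonl.zst']


def collect_files(dataset_path, suffixes=None):
    """Hypothetical helper: list files at dataset_path matching the suffixes.

    dataset_path may be a single file or a directory. Searching directories
    recursively is an assumption of this sketch, not documented behavior.
    """
    suffixes = tuple(suffixes or DEFAULT_SUFFIXES)
    path = Path(dataset_path)
    # A single file is its own candidate; a directory is walked recursively.
    candidates = [path] if path.is_file() else path.rglob('*')
    # Match on the full file name, because a compound suffix such as
    # '.jsonl.zst' cannot be detected with Path.suffix (which yields '.zst').
    return sorted(
        str(p) for p in candidates
        if p.is_file() and p.name.endswith(suffixes)
    )
```

Calling collect_files('data/', suffixes=['.jsonl']) on a hypothetical data/ directory would return only the '.jsonl' files beneath it, while omitting suffixes falls back to the three defaults.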