data_juicer.config

data_juicer.config.init_configs(args=None)
Initialize the jsonargparse parser and parse configs from one or more of the following sources:
  1. POSIX-style command-line args;

  2. config files in YAML (a superset of JSON) or Jsonnet;

  3. environment variables;

  4. hard-coded defaults.
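The way these four sources combine can be illustrated with a small, standard-library-only sketch. It does not call jsonargparse or Data-Juicer. The functions `parse_cli`, `read_env` and `resolve_config`, the `DJ_` environment-variable prefix, and the example keys are all hypothetical. The sketch also assumes a precedence read off the list order (an earlier source overrides a later one); jsonargparse documents its own, more detailed precedence rules.

```python
from typing import Any, Dict, Iterable, List, Mapping


def parse_cli(args: List[str]) -> Dict[str, str]:
    """Turn a hypothetical ['--key', 'value', ...] list into a dict of strings."""
    if len(args) % 2 != 0:
        raise ValueError("expected '--key value' pairs")
    parsed: Dict[str, str] = {}
    for flag, value in zip(args[::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a flag, got {flag!r}")
        parsed[flag[2:].replace("-", "_")] = value
    return parsed


def read_env(environ: Mapping[str, str], keys: Iterable[str],
             prefix: str = "DJ_") -> Dict[str, str]:
    """Pick up hypothetical DJ_<KEY> variables for the known keys."""
    found: Dict[str, str] = {}
    for key in keys:
        name = prefix + key.upper()
        if name in environ:
            found[key] = environ[name]
    return found


def resolve_config(defaults: Dict[str, Any], file_cfg: Dict[str, Any],
                   environ: Mapping[str, str], args: List[str]) -> Dict[str, Any]:
    # Apply sources from lowest to highest (assumed) precedence; each update
    # overrides earlier ones. No type conversion is done, so values taken
    # from the environment or the command line stay strings.
    cfg = dict(defaults)                     # 4. hard-coded defaults
    cfg.update(read_env(environ, defaults))  # 3. environment variables
    cfg.update(file_cfg)                     # 2. config file contents
    cfg.update(parse_cli(args))              # 1. command-line args
    return cfg


defaults = {"np": 4, "export_path": "out.jsonl", "project_name": "demo"}
file_cfg = {"np": 8}  # stands in for the parsed contents of a YAML file
# In real use the environment mapping would be os.environ.
cfg = resolve_config(defaults, file_cfg, {"DJ_PROJECT_NAME": "from_env"},
                     ["--export-path", "result.jsonl"])
print(cfg)  # → {'np': 8, 'export_path': 'result.jsonl', 'project_name': 'from_env'}
```

Each key ends up with the value from the highest-precedence source that sets it; keys that no source overrides keep their hard-coded defaults.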

Parameters:

args – list of params, e.g., ['--config', 'cfg.yaml'], default None.

Returns:

a global cfg object used by the Executor or Analyzer

data_juicer.config.export_config(cfg, path, format='yaml', skip_none=True, skip_check=True, overwrite=False, multifile=True)

Save the config object to a file. Several of the parameters below are passed through to jsonargparse when saving.

Parameters:
  • cfg – cfg object to save (Namespace type)

  • path – the save path

  • format – output format: 'yaml', 'json', 'json_indented', or 'parser_mode'

  • skip_none – Whether to exclude entries whose value is None.

  • skip_check – Whether to skip parser checking.

  • overwrite – Whether to overwrite existing files.

  • multifile – Whether to save multiple config files by using the __path__ metas.
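To make the effect of `skip_none` and `overwrite` concrete, here is a minimal sketch using only the standard library. The name `export_config_sketch` is hypothetical, it writes JSON only, and it ignores `skip_check` and `multifile`, which depend on jsonargparse internals. Raising `FileExistsError` for an existing file is likewise an assumption of the sketch, not documented behavior of `export_config`.

```python
import argparse
import json
import os
import tempfile


def export_config_sketch(cfg: argparse.Namespace, path: str,
                         skip_none: bool = True, overwrite: bool = False) -> None:
    """Hypothetical JSON-only stand-in illustrating skip_none and overwrite."""
    if os.path.exists(path) and not overwrite:
        # With overwrite=False (the default), refuse to replace an existing file.
        raise FileExistsError(f"{path} already exists")
    data = vars(cfg)
    if skip_none:
        # Drop entries whose value is None.
        data = {k: v for k, v in data.items() if v is not None}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


cfg = argparse.Namespace(project_name="demo", np=4, text_keys=None)
out_path = os.path.join(tempfile.mkdtemp(), "cfg.json")
export_config_sketch(cfg, out_path)
with open(out_path, encoding="utf-8") as f:
    print(json.load(f))  # → {'project_name': 'demo', 'np': 4}
```

Calling the sketch a second time on the same path fails unless `overwrite=True` is passed, and passing `skip_none=False` keeps `text_keys` in the output as `null`.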

data_juicer.config.merge_config(ori_cfg, new_cfg: Dict)

Merge configuration from new_cfg into ori_cfg

Parameters:
  • ori_cfg – the original configuration object, expected to be a jsonargparse Namespace

  • new_cfg – the configuration to merge in, expected to be a dict or a jsonargparse Namespace

Returns:

cfg_after_merge
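The general idea of merging one configuration into another can be sketched with `argparse.Namespace` standing in for the jsonargparse Namespace. This is a hypothetical illustration, not Data-Juicer's implementation: `merge_config_sketch` and the nested `executor` section are invented for the example, it assumes every key in `new_cfg` overrides (or adds to) `ori_cfg`, and the real function may apply additional rules not shown here.

```python
import argparse
import copy
from typing import Any, Dict, Union


def merge_config_sketch(ori_cfg: argparse.Namespace,
                        new_cfg: Union[Dict[str, Any], argparse.Namespace]
                        ) -> argparse.Namespace:
    """Hypothetical merge: values from new_cfg override a copy of ori_cfg."""
    merged = copy.deepcopy(ori_cfg)  # leave the caller's object untouched
    updates = vars(new_cfg) if isinstance(new_cfg, argparse.Namespace) else new_cfg
    for key, value in updates.items():
        current = getattr(merged, key, None)
        if isinstance(current, argparse.Namespace) and isinstance(
                value, (dict, argparse.Namespace)):
            # Merge nested sections key by key instead of replacing them wholesale.
            value = merge_config_sketch(current, value)
        setattr(merged, key, value)
    return merged


ori = argparse.Namespace(np=4, export_path="out.jsonl",
                         executor=argparse.Namespace(batch_size=100, verbose=False))
merged = merge_config_sketch(ori, {"np": 16, "executor": {"verbose": True}})
print(merged.np, merged.executor.batch_size, merged.executor.verbose)  # → 16 100 True
print(ori.np)  # → 4
```

Returning a deep copy instead of mutating `ori_cfg` is a choice made for the sketch; the documented function may update `ori_cfg` in place.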