data_juicer.config package
Submodules
data_juicer.config.config module
- data_juicer.config.config.init_configs(args: List[str] | None = None, which_entry: object | None = None)
- Initialize the jsonargparse parser and parse configs from one of:
POSIX-style command-line args;
config files in YAML (JSON and Jsonnet supersets);
environment variables;
hard-coded defaults.
- Parameters:
args – list of params, e.g., ['--config', 'cfg.yaml']; default None.
which_entry – which entry point to initialize configs for (executor/analyzer)
- Returns:
a global cfg object used by the Executor or Analyzer
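The four config sources listed above can be illustrated with a minimal standard-library sketch. This is not Data-Juicer's implementation, which relies on jsonargparse. The function name resolve_config, the DEMO_ environment prefix, the keys project_name and np, and the precedence order (defaults, then environment, then config file, then explicit command-line values) are all assumptions made for illustration.

```python
import argparse
import json
import os


def resolve_config(args=None, env=None):
    """Resolve settings from defaults, environment, a config file and CLI args.

    Hypothetical helper: names, keys and precedence are illustrative only.
    """
    env = os.environ if env is None else env

    # 1. Hard-coded defaults (hypothetical keys).
    cfg = {"project_name": "demo", "np": 1}

    # 2. Environment variables using a hypothetical DEMO_ prefix.
    for key in cfg:
        env_key = "DEMO_" + key.upper()
        if env_key in env:
            # Cast the string from the environment to the default's type.
            cfg[key] = type(cfg[key])(env[env_key])

    parser = argparse.ArgumentParser()
    parser.add_argument("--config")
    parser.add_argument("--project_name")
    parser.add_argument("--np", type=int)
    parsed = parser.parse_args(args)

    # 3. Config file named on the command line (JSON keeps this stdlib-only).
    if parsed.config:
        with open(parsed.config, encoding="utf-8") as f:
            cfg.update(json.load(f))

    # 4. Explicit command-line values override everything else.
    for key in ("project_name", "np"):
        value = getattr(parsed, key)
        if value is not None:
            cfg[key] = value

    return argparse.Namespace(**cfg)
```

Calling resolve_config(["--np", "4"], env={"DEMO_NP": "2"}) yields a namespace whose np comes from the command line rather than the environment.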
- data_juicer.config.config.init_setup_from_cfg(cfg: Namespace)
Perform extra setup tasks after parsing the config file or command line:
create the working directory and a log directory;
update the cache directory;
update the checkpoint directory and the temp_dir used by tempfile.
- Parameters:
cfg – an original cfg
- Returns:
an updated cfg
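The directory setup described above might look roughly like the following sketch. It uses only the standard library and is not the library's code. The function name setup_dirs and the field names work_dir, log_dir and temp_dir are assumptions, and cache and checkpoint handling are left out.

```python
import os
import tempfile
from argparse import Namespace


def setup_dirs(cfg):
    """Create working and log directories and point tempfile at cfg.temp_dir.

    Hypothetical helper: field names are assumptions for illustration.
    """
    os.makedirs(cfg.work_dir, exist_ok=True)

    # Keep logs in a sub-directory of the working directory.
    log_dir = os.path.join(cfg.work_dir, "log")
    os.makedirs(log_dir, exist_ok=True)
    cfg.log_dir = log_dir

    # tempfile.tempdir is the module-level default used by tempfile functions.
    if getattr(cfg, "temp_dir", None):
        os.makedirs(cfg.temp_dir, exist_ok=True)
        tempfile.tempdir = cfg.temp_dir

    return cfg
```

Setting tempfile.tempdir changes process-wide state, so a caller that needs isolation would have to save and restore the previous value.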
- data_juicer.config.config.sort_op_by_types_and_names(op_name_classes)
Split the op items by op type, sort each group by name, then concatenate the groups.
- Parameters:
op_name_classes – a list of op modules
- Returns:
sorted op list; each item is a pair of op_name and op_class
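The split, sort and concatenate steps can be sketched as follows. The function name sort_ops, the type_order tuple and the rule that an op's type is the suffix of its name are assumptions for illustration. The actual grouping rule and group order used by the library are not stated on this page.

```python
def sort_ops(op_items, type_order=("mapper", "filter", "deduplicator", "selector")):
    """Group (op_name, op_class) pairs by type suffix and sort each group by name.

    Hypothetical helper: the suffix rule and type order are assumptions.
    """
    groups = {op_type: [] for op_type in type_order}
    others = []  # ops whose name matches no known type suffix

    for name, cls in op_items:
        for op_type in type_order:
            if name.endswith("_" + op_type):
                groups[op_type].append((name, cls))
                break
        else:
            others.append((name, cls))

    # Concatenate the groups in type order, each sorted by op name only.
    ordered = []
    for op_type in type_order:
        ordered.extend(sorted(groups[op_type], key=lambda item: item[0]))
    ordered.extend(sorted(others, key=lambda item: item[0]))
    return ordered
```

Sorting on the name alone avoids comparing the class objects, which generally do not define an ordering.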
- data_juicer.config.config.namespace_to_arg_list(namespace, prefix='', includes=None, excludes=None)
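This page gives no description for namespace_to_arg_list. Judging only from its name and signature, it plausibly flattens a namespace into a list of command-line arguments. The sketch below shows one way such a conversion could work. The function name to_arg_list, the --key=value output format, the dotted keys for nested namespaces, and the meaning given to includes and excludes are all assumptions, not documented behaviour.

```python
from argparse import Namespace


def to_arg_list(namespace, prefix="", includes=None, excludes=None):
    """Flatten a (possibly nested) Namespace into '--key=value' strings.

    Hypothetical helper: the output format and filter semantics are assumptions.
    """
    arg_list = []
    for key, value in vars(namespace).items():
        full_key = prefix + key
        if isinstance(value, Namespace):
            # Recurse into nested namespaces using a dotted prefix.
            arg_list.extend(to_arg_list(value, full_key + ".", includes, excludes))
            continue
        if value is None:
            continue  # nothing to pass for unset values
        if includes is not None and full_key not in includes:
            continue
        if excludes is not None and full_key in excludes:
            continue
        arg_list.append(f"--{full_key}={value}")
    return arg_list
```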
- data_juicer.config.config.export_config(cfg: Namespace, path: str, format: str = 'yaml', skip_none: bool = True, skip_check: bool = True, overwrite: bool = False, multifile: bool = True)
Save the config object. Some of the parameters come from jsonargparse.
- Parameters:
cfg – cfg object to save (Namespace type)
path – the save path
format – 'yaml', 'json', 'json_indented', 'parser_mode'
skip_none – Whether to exclude entries whose value is None.
skip_check – Whether to skip parser checking.
overwrite – Whether to overwrite existing files.
multifile – Whether to save multiple config files by using the __path__ metas.
- Returns:
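To make the skip_none and overwrite flags concrete, here is a minimal standard-library sketch that writes a flat namespace as JSON. It is not export_config itself: the function name save_config is hypothetical, only JSON output is covered, and nested namespaces, parser checking and multi-file output are not handled.

```python
import json
import os
from argparse import Namespace


def save_config(cfg, path, skip_none=True, overwrite=False):
    """Write a flat Namespace to a JSON file.

    Hypothetical helper illustrating the skip_none and overwrite flags.
    """
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists; pass overwrite=True to replace it")

    # Optionally drop entries whose value is None.
    data = {k: v for k, v in vars(cfg).items() if not (skip_none and v is None)}

    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return path
```

Refusing to overwrite by default mirrors the overwrite=False default in the signature above and protects an existing config from accidental replacement.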
- data_juicer.config.config.merge_config(ori_cfg: Namespace, new_cfg: Namespace)
Merge the configuration in new_cfg into ori_cfg.
- Parameters:
ori_cfg – the original configuration object, expected to be a jsonargparse Namespace
new_cfg – the configuration object to merge in, expected to be a dict or a jsonargparse Namespace
- Returns:
cfg_after_merge
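The merge semantics can be approximated with the sketch below, which uses argparse.Namespace as a stand-in for the jsonargparse Namespace. The function name merge is hypothetical, and the choices made here (values from new_cfg win, nested namespaces are merged recursively, and the original object is left unmodified) are assumptions. The real function may update ori_cfg in place or treat some keys specially.

```python
from argparse import Namespace


def merge(ori_cfg, new_cfg):
    """Return a copy of ori_cfg updated with the entries of new_cfg.

    Hypothetical helper: new_cfg may be a dict or a Namespace.
    """
    updates = vars(new_cfg) if isinstance(new_cfg, Namespace) else dict(new_cfg)
    merged = Namespace(**vars(ori_cfg))  # shallow copy of the original

    for key, value in updates.items():
        current = getattr(merged, key, None)
        if isinstance(current, Namespace) and isinstance(value, (dict, Namespace)):
            # Merge nested sections instead of replacing them wholesale.
            setattr(merged, key, merge(current, value))
        else:
            setattr(merged, key, value)
    return merged
```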
- data_juicer.config.config.prepare_side_configs(ori_config: str | Namespace | Dict)
- Parse the config if ori_config is a string holding the path to a config file in
yaml, yml or json format.
- Parameters:
ori_config – a config dict, or a string holding the path to a config file in yaml, yml or json format
- Returns:
a config dict
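The input handling described above can be sketched with the standard library alone. The function name load_side_config is hypothetical. The sketch reads only .json files, because YAML parsing needs a third-party package, and converting a Namespace with vars() is an assumption about how that input type would be treated.

```python
import json
from argparse import Namespace


def load_side_config(ori_config):
    """Normalize a path, Namespace or dict into a plain config dict.

    Hypothetical helper: only .json paths are supported in this sketch.
    """
    if isinstance(ori_config, str):
        if not ori_config.endswith(".json"):
            raise ValueError("this sketch only reads .json config files")
        with open(ori_config, encoding="utf-8") as f:
            return json.load(f)
    if isinstance(ori_config, Namespace):
        return dict(vars(ori_config))
    if isinstance(ori_config, dict):
        return ori_config
    raise TypeError(f"unsupported config type: {type(ori_config).__name__}")
```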
Module contents
- data_juicer.config.init_configs(args: List[str] | None = None, which_entry: object | None = None)
- Initialize the jsonargparse parser and parse configs from one of:
POSIX-style command-line args;
config files in YAML (JSON and Jsonnet supersets);
environment variables;
hard-coded defaults.
- Parameters:
args – list of params, e.g., ['--config', 'cfg.yaml']; default None.
which_entry – which entry point to initialize configs for (executor/analyzer)
- Returns:
a global cfg object used by the Executor or Analyzer
- data_juicer.config.get_init_configs(cfg: Namespace | Dict)
Set the init configs of Data-Juicer for cfg.
- data_juicer.config.export_config(cfg: Namespace, path: str, format: str = 'yaml', skip_none: bool = True, skip_check: bool = True, overwrite: bool = False, multifile: bool = True)
Save the config object. Some of the parameters come from jsonargparse.
- Parameters:
cfg – cfg object to save (Namespace type)
path – the save path
format – 'yaml', 'json', 'json_indented', 'parser_mode'
skip_none – Whether to exclude entries whose value is None.
skip_check – Whether to skip parser checking.
overwrite – Whether to overwrite existing files.
multifile – Whether to save multiple config files by using the __path__ metas.
- Returns:
- data_juicer.config.merge_config(ori_cfg: Namespace, new_cfg: Namespace)
Merge the configuration in new_cfg into ori_cfg.
- Parameters:
ori_cfg – the original configuration object, expected to be a jsonargparse Namespace
new_cfg – the configuration object to merge in, expected to be a dict or a jsonargparse Namespace
- Returns:
cfg_after_merge
- data_juicer.config.prepare_side_configs(ori_config: str | Namespace | Dict)
- Parse the config if ori_config is a string holding the path to a config file in
yaml, yml or json format.
- Parameters:
ori_config – a config dict, or a string holding the path to a config file in yaml, yml or json format
- Returns:
a config dict