data_juicer.core.ray_exporter module

class data_juicer.core.ray_exporter.RayExporter(export_path, export_type=None, export_shard_size=0, keep_stats_in_res_ds=True, keep_hashes_in_res_ds=False, **kwargs)[source]

Bases: object

The RayExporter class exports a Ray dataset to files in a specified format.

__init__(export_path, export_type=None, export_shard_size=0, keep_stats_in_res_ds=True, keep_hashes_in_res_ds=False, **kwargs)[source]

Initialization method.

Parameters:

export_path -- the path to export the dataset to.
export_type -- the format type of the exported dataset.
export_shard_size -- the approximate size of each shard of the exported dataset. Defaults to 0, which means the dataset is exported using Ray's default settings.
keep_stats_in_res_ds -- whether to keep stats in the result dataset.
keep_hashes_in_res_ds -- whether to keep hashes in the result dataset.
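
To illustrate what the two `keep_*` flags control, the following minimal sketch works on plain Python dict rows rather than a Ray dataset. The column names `__dj__stats__` and `__dj__hash` and the helper `select_columns` are assumptions made for illustration; they are not taken from this page, and the actual implementation may differ.

```python
# Hypothetical column names, assumed for illustration only.
STATS_COLUMN = "__dj__stats__"
HASH_COLUMN = "__dj__hash"


def select_columns(rows, keep_stats_in_res_ds=True, keep_hashes_in_res_ds=False):
    """Return copies of the rows without the columns that the flags exclude."""
    dropped = set()
    if not keep_stats_in_res_ds:
        dropped.add(STATS_COLUMN)
    if not keep_hashes_in_res_ds:
        dropped.add(HASH_COLUMN)
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


rows = [{"text": "hello", STATS_COLUMN: {"lang_score": 0.9}, HASH_COLUMN: "abc123"}]

# With the documented defaults, stats are kept and hashes are removed.
default_result = select_columns(rows)
```

Under the documented defaults (`keep_stats_in_res_ds=True`, `keep_hashes_in_res_ds=False`), only the hash column is removed in this sketch.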

export(dataset, columns=None)[source]

Export method for a dataset.

Parameters:

dataset -- the dataset to export.
columns -- the columns to export.

Returns:

static write_json(dataset, export_path, **kwargs)[source]

Export method for JSON/JSONL target files.

Parameters:

dataset -- the dataset to export.
export_path -- the path to store the exported dataset.
kwargs -- extra arguments.

Returns:

static write_webdataset(dataset, export_path, **kwargs)[source]

Export method for webdataset target files.

Parameters:

dataset -- the dataset to export.
export_path -- the path to store the exported dataset.
kwargs -- extra arguments.

Returns:

static write_others(dataset, export_path, **kwargs)[source]

Export method for target files in other formats.

Parameters:

dataset -- the dataset to export.
export_path -- the path to store the exported dataset.
kwargs -- extra arguments.

Returns:
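
The three static writers suggest a dispatch keyed on the export type. The sketch below shows one way such routing could look, using stand-in writer functions and a hypothetical `pick_writer` helper. The suffix-based fallback used when `export_type` is `None` is an assumption, not behavior documented on this page.

```python
import os


# Stand-ins for the static methods; each only reports which writer was chosen.
def write_json(dataset, export_path, **kwargs):
    return "json"


def write_webdataset(dataset, export_path, **kwargs):
    return "webdataset"


def write_others(dataset, export_path, **kwargs):
    return "others"


def pick_writer(export_path, export_type=None):
    """Choose a writer from the export type (hypothetical routing)."""
    if export_type is None:
        # Assumption: fall back to the file suffix of the export path.
        export_type = os.path.splitext(export_path)[1].lstrip(".")
    if export_type in ("json", "jsonl"):
        return write_json
    if export_type == "webdataset":
        return write_webdataset
    # Every remaining format goes to the generic writer.
    return write_others


writer = pick_writer("outputs/result.jsonl")
```

Keeping the writers as static functions with a shared `(dataset, export_path, **kwargs)` signature lets the caller select one by name and invoke it uniformly.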