data_juicer.utils.webdataset_utils module
- data_juicer.utils.webdataset_utils.reconstruct_custom_webdataset_format(samples, field_mapping: Dict[str, str] | None = None)
Reconstruct the original dataset into the WebDataset format. Every key in the result can be specified through the field_mapping argument, a dict that maps each target field key in the result format to the corresponding source field key in the original samples.
- Parameters:
samples -- the batch of input samples to be reconstructed
field_mapping -- the field mapping used to construct the remaining fields.
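
The direction of field_mapping (target key to source key) is easy to get backwards, so a minimal sketch of that semantics may help. It does not call data_juicer: `remap_fields_sketch` is a hypothetical stand-in, and the batch layout (a dict of column lists), the field names `image_path` and `caption`, and the behavior when no mapping is given are all assumptions made for illustration, not confirmed details of the library.

```python
from typing import Any, Dict, List, Optional


def remap_fields_sketch(
    samples: Dict[str, List[Any]],
    field_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, List[Any]]:
    """Hypothetical stand-in showing a target-key -> source-key mapping."""
    if not field_mapping:
        # Assumption: with no mapping there is nothing to rename,
        # so the batch is handed back unchanged.
        return samples
    # Each key of field_mapping names a field in the result;
    # each value names the field to read from the original samples.
    return {target: samples[source] for target, source in field_mapping.items()}


# Assumed batch layout: one list of values per field.
batch = {
    "image_path": ["a.jpg", "b.jpg"],
    "caption": ["a cat", "a dog"],
}

# Keys are target fields, values are source fields.
mapping = {"jpg": "image_path", "txt": "caption"}

result = remap_fields_sketch(batch, mapping)
print(sorted(result.keys()))
```

The keys `jpg` and `txt` follow the common WebDataset habit of naming fields after file extensions; the real function may treat other keys or value types differently.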