data_juicer.utils.webdataset_utils module
- data_juicer.utils.webdataset_utils.reconstruct_custom_webdataset_format(samples, field_mapping: Dict[str, str] | None = None)
Reconstruct the original samples into the WebDataset format. Every key can be specified through the field_mapping argument, a dict that maps each target field key in the result format to the source field key in the original samples.
- Parameters:
samples – the input samples batch to be reconstructed
field_mapping – a dict mapping target field keys to source field keys, used to construct the remaining fields. Defaults to None.
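The direction of field_mapping (target key to source key) is easy to get backwards, so here is a minimal sketch of the idea. It does not call data_juicer: remap_fields is a hypothetical stand-in written for illustration, and it assumes the batch is a dict of lists and that an empty or missing mapping leaves the batch unchanged. The field names (image_path, caption, jpg, txt) are also made up for the example. The library's real behaviour may differ.

```python
from typing import Any, Dict, List, Optional


def remap_fields(
    samples: Dict[str, List[Any]],
    field_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, List[Any]]:
    """Hypothetical stand-in, NOT the data_juicer implementation.

    Builds a new batch whose keys are the target keys of field_mapping
    and whose values come from the corresponding source keys.
    """
    # Assumption: with no mapping given, the batch is returned as-is.
    if not field_mapping:
        return samples
    # Keys of field_mapping are target fields, values are source fields.
    return {target: samples[source] for target, source in field_mapping.items()}


# Assumed batch layout: one list of values per field.
batch = {
    "image_path": ["a.jpg", "b.jpg"],
    "caption": ["a cat", "a dog"],
}

# Target key (result format) -> source key (original samples).
mapping = {"jpg": "image_path", "txt": "caption"}

result = remap_fields(batch, mapping)
print(result)
```

With this mapping, the values stored under image_path in the original batch appear under jpg in the result, and those under caption appear under txt.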