data_juicer.utils.webdataset_utils module

data_juicer.utils.webdataset_utils.reconstruct_custom_webdataset_format(samples, field_mapping: Dict[str, str] | None = None)[source]

Reconstruct the original dataset into the WebDataset format. Every key can be specified through the field_mapping argument, a dict that maps each target field key in the resulting format to the corresponding source field key in the original samples.

Parameters:
  • samples -- the input samples batch to be reconstructed

  • field_mapping -- a dict mapping target field keys to source field keys, used to construct the remaining fields. Defaults to None.
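
The direction of field_mapping (target key to source key) is easy to get backwards, so a small illustration may help. The sketch below does not call data_juicer and is not the library's implementation: remap_fields_sketch is a hypothetical helper, the batch is assumed to be a dict of column lists, and the field names (image_path, caption, jpg, txt) are made up for the example.

```python
from typing import Any, Dict, List, Optional


def remap_fields_sketch(
    samples: Dict[str, List[Any]],
    field_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, List[Any]]:
    """Hypothetical stand-in showing the target -> source mapping direction.

    Assumption: `samples` is a batch stored as a dict of column lists.
    """
    # Assumption: with no mapping given, the batch is passed through unchanged.
    if not field_mapping:
        return samples
    # Each key of the mapping is a field in the result;
    # each value names the field to read from the original samples.
    return {target: samples[source] for target, source in field_mapping.items()}


# Made-up batch with two samples.
batch = {
    "image_path": ["a.jpg", "b.jpg"],
    "caption": ["a cat", "a dog"],
}

# Target key on the left, source key on the right.
mapping = {"jpg": "image_path", "txt": "caption"}

result = remap_fields_sketch(batch, mapping)
print(result)
```

Here the result contains the keys jpg and txt, filled from the image_path and caption columns respectively. How the real function treats source fields that are not listed in field_mapping is not stated above, so the sketch leaves that out.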