data_juicer.ops.grouper.naive_reverse_grouper module

class data_juicer.ops.grouper.naive_reverse_grouper.NaiveReverseGrouper(batch_meta_export_path=None, *args, **kwargs)

Bases: Grouper

Split batched samples into individual samples.

This operator processes a dataset by splitting each batched sample into individual samples. It also handles batch metadata and can optionally export it.

- If a sample contains 'batch_meta', that field is separated from the sample and can be exported to a specified path.
- The remaining data is converted from a dictionary of lists to a list of dictionaries, which unbatches the sample.
- If batch_meta_export_path is provided, the batch metadata is written to that file in JSON format, one entry per line (JSON Lines).
- If the dataset contains no samples, it is returned unchanged.
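The per-sample conversion described above can be sketched in plain Python. This is a minimal illustration, not the operator's source: it assumes a batched sample is a plain dict whose per-sample fields are equal-length lists, and that the metadata key is literally "batch_meta" (the real operator may use an internal field constant). The function name split_batched_sample is hypothetical, and rejecting ragged batches is a choice of this sketch.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed key name taken from the docstring; the real operator may differ.
BATCH_META_KEY = "batch_meta"


def split_batched_sample(
    sample: Dict[str, Any],
) -> Tuple[List[Dict[str, Any]], Optional[Any]]:
    """Turn one batched sample (dict of lists) into a list of dicts.

    Returns the individual samples and the detached batch metadata
    (None when the sample carries no batch metadata).
    """
    # Detach the batch-level metadata so it is not copied into every row.
    batch_meta = sample.get(BATCH_META_KEY)
    fields = {k: v for k, v in sample.items() if k != BATCH_META_KEY}
    if not fields:
        return [], batch_meta

    # Sketch assumption: every per-sample column has the same length.
    lengths = {len(v) for v in fields.values()}
    if len(lengths) != 1:
        raise ValueError(f"ragged batch, column lengths: {sorted(lengths)}")
    batch_size = lengths.pop()

    # Row i takes the i-th element of every column.
    rows = [{k: v[i] for k, v in fields.items()} for i in range(batch_size)]
    return rows, batch_meta


batched = {
    "text": ["a", "b", "c"],
    "id": [1, 2, 3],
    BATCH_META_KEY: {"source": "demo"},
}
rows, meta = split_batched_sample(batched)
print(rows)  # [{'text': 'a', 'id': 1}, {'text': 'b', 'id': 2}, {'text': 'c', 'id': 3}]
print(meta)  # {'source': 'demo'}
```

Keeping the metadata out of the rows means it is stored once per batch rather than duplicated into every unbatched sample.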

__init__(batch_meta_export_path=None, *args, **kwargs)

Initialization method.

Parameters:
  • batch_meta_export_path – path of the file to which the batch metadata is exported. If it is None, the batch metadata is dropped.

  • args – extra positional arguments.

  • kwargs – extra keyword arguments.
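The effect of batch_meta_export_path can be illustrated with a small sketch. The helper export_batch_metas is hypothetical: it assumes each batch's metadata is JSON-serializable and writes it as JSON Lines, and creating a missing parent directory is a choice of this sketch rather than documented behavior.

```python
import json
import os
import tempfile
from typing import Any, Iterable, Optional


def export_batch_metas(batch_metas: Iterable[Any], path: Optional[str]) -> None:
    """Write one JSON object per line, or drop the metadata when path is None."""
    if path is None:
        return  # mirrors the documented "drop the batch meta" behavior
    # Sketch assumption: create the parent directory if it does not exist.
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for meta in batch_metas:
            f.write(json.dumps(meta, ensure_ascii=False) + "\n")


out_path = os.path.join(tempfile.mkdtemp(), "batch_meta.jsonl")
export_batch_metas([{"batch": 0}, {"batch": 1}], out_path)
with open(out_path, encoding="utf-8") as f:
    lines = [json.loads(line) for line in f]
print(lines)  # [{'batch': 0}, {'batch': 1}]
```

Writing one object per line lets downstream tools stream the metadata back with a simple line-by-line json.loads, as the read-back above does.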

process(dataset)

Dataset -> dataset. Splits every batched sample in the input dataset into individual samples.

Parameters:

dataset – input dataset

Returns:

dataset of individual (unbatched) samples.
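As a rough end-to-end sketch of process, the function below flattens a whole dataset and collects the batch metadata it detaches. It assumes a plain list of dicts stands in for the dataset object the real operator receives, the metadata key "batch_meta" is taken from the docstring as an assumption, and reverse_group is a hypothetical name; exporting the collected metadata is left out.

```python
from typing import Any, Dict, List, Tuple

BATCH_META_KEY = "batch_meta"  # assumed key name, as in the docstring


def reverse_group(
    dataset: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Any]]:
    """Flatten a list of batched samples; also return the collected batch metas."""
    if len(dataset) == 0:
        return dataset, []  # empty input is returned unchanged

    samples: List[Dict[str, Any]] = []
    batch_metas: List[Any] = []
    for batch in dataset:
        if BATCH_META_KEY in batch:
            batch_metas.append(batch[BATCH_META_KEY])
        columns = {k: v for k, v in batch.items() if k != BATCH_META_KEY}
        # zip(*columns.values()) yields one tuple of field values per row.
        samples.extend(dict(zip(columns, row)) for row in zip(*columns.values()))
    return samples, batch_metas


dataset = [
    {"text": ["a", "b"], BATCH_META_KEY: {"batch": 0}},
    {"text": ["c"], BATCH_META_KEY: {"batch": 1}},
]
samples, metas = reverse_group(dataset)
print(samples)  # [{'text': 'a'}, {'text': 'b'}, {'text': 'c'}]
print(metas)  # [{'batch': 0}, {'batch': 1}]
```

The number of output samples equals the sum of the batch sizes (2 + 1 = 3 here), while the metadata list has one entry per batch that carried it.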