data_juicer.ops.grouper.key_value_grouper module¶
- class data_juicer.ops.grouper.key_value_grouper.KeyValueGrouper(group_by_keys: List[str] | None = None, *args, **kwargs)[源代码]¶
基类:
Grouper
Groups samples into batches based on values in specified keys.
This operator groups samples by the values of the given keys, which can be nested. If no keys are provided, it defaults to using the text key. It uses a naive grouping strategy to batch samples with identical key values. The resulting dataset is a list of batched samples, where each batch contains samples that share the same key values. This is useful for organizing data by specific attributes or features.