data_juicer.ops.grouper.key_value_grouper module

class data_juicer.ops.grouper.key_value_grouper.KeyValueGrouper(group_by_keys: List[str] | None = None, *args, **kwargs)

Bases: Grouper

Group samples into batched samples according to the values of the given keys.
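The grouping rule can be pictured as computing one signature per sample from the values of the configured keys: samples with equal signatures end up in the same batch. The sketch below is a minimal illustration, assuming flat (non-nested) key names; group_signature is a hypothetical helper, and the JSON-based signature is an assumption, not necessarily how KeyValueGrouper hashes values internally.

```python
import json
from typing import Any, Dict, List


def group_signature(sample: Dict[str, Any], keys: List[str]) -> str:
    """Hypothetical helper (not part of data_juicer): build a hashable
    signature from the values of `keys`; samples whose values match on
    every key get the same signature and therefore fall into one group."""
    # sort_keys=True makes nested dicts order-insensitive, and serializing
    # to a string lets unhashable values such as lists act as group values.
    return json.dumps({k: sample[k] for k in keys}, sort_keys=True, default=str)


a = {"text": "x", "meta": {"lang": "en", "src": "web"}}
b = {"text": "y", "meta": {"src": "web", "lang": "en"}}
c = {"text": "z", "meta": {"lang": "zh", "src": "web"}}

same = group_signature(a, ["meta"]) == group_signature(b, ["meta"])  # True
diff = group_signature(a, ["meta"]) == group_signature(c, ["meta"])  # False
```

Serializing to a string is one simple way to make arbitrary sample values usable as dictionary keys; a production implementation may hash differently.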

__init__(group_by_keys: List[str] | None = None, *args, **kwargs)

Initialization method.

Parameters:
  • group_by_keys -- keys whose values determine the groups; samples with identical values for all of these keys are placed in the same batch. Nested keys such as "__dj__stats__.text_len" are supported. Defaults to [self.text_key].

  • args -- extra positional arguments

  • kwargs -- extra keyword arguments
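To illustrate the two documented behaviours of group_by_keys, nested dotted keys and the fallback to the text key, here is a minimal sketch. It assumes that a nested key is resolved by splitting on "." and descending one dict level per segment, and that the text key is named "text"; get_nested and resolve_group_keys are hypothetical helpers, not data_juicer functions.

```python
from typing import Any, Dict, List, Optional


def get_nested(sample: Dict[str, Any], dotted_key: str) -> Any:
    """Hypothetical helper: resolve a dotted key such as
    "__dj__stats__.text_len" by walking nested dicts one segment at a time."""
    value: Any = sample
    for part in dotted_key.split("."):
        value = value[part]  # raises KeyError if a segment is missing
    return value


def resolve_group_keys(group_by_keys: Optional[List[str]],
                       text_key: str = "text") -> List[str]:
    """Hypothetical helper mirroring the documented default:
    use [text_key] when no grouping keys are given."""
    return [text_key] if group_by_keys is None else group_by_keys


sample = {"text": "hello", "__dj__stats__": {"text_len": 5}}
text_len = get_nested(sample, "__dj__stats__.text_len")  # 5
default_keys = resolve_group_keys(None)                   # ["text"]
```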

process(dataset)

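Dataset --> dataset: group the samples of the input dataset by the values of group_by_keys and merge each group into one batched sample.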

Parameters:
  • dataset -- input dataset

Returns:

dataset of batched samples.
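For intuition about the shape of the returned batched samples, the sketch below assumes that samples are plain Python dicts sharing one schema and that a batched sample stores each field as a list of its members' values, with groups kept in first-seen order. batch_by_keys is a hypothetical stand-in for process(), not the data_juicer implementation, and it does not resolve nested keys.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def batch_by_keys(samples: List[Dict[str, Any]],
                  keys: List[str]) -> List[Dict[str, List[Any]]]:
    """Hypothetical stand-in for process(): group samples whose values for
    `keys` are equal, then merge each group into one batched sample."""
    if not samples:
        return []  # an empty input yields no batches
    groups: Dict[tuple, List[Dict[str, Any]]] = defaultdict(list)
    for sample in samples:
        # JSON-serialize each value so lists and dicts can be grouped on
        group_id = tuple(json.dumps(sample[k], sort_keys=True, default=str)
                         for k in keys)
        groups[group_id].append(sample)
    # Assumes every sample has the same fields (one dataset schema);
    # each field of a batched sample becomes the list of member values.
    return [{field: [m[field] for m in members] for field in members[0]}
            for members in groups.values()]


rows = [{"text": "a", "lang": "en"},
        {"text": "b", "lang": "zh"},
        {"text": "c", "lang": "en"}]
batches = batch_by_keys(rows, ["lang"])
# [{"text": ["a", "c"], "lang": ["en", "en"]}, {"text": ["b"], "lang": ["zh"]}]
```

Each batched sample holds one group, so the number of batches equals the number of distinct value combinations of the grouping keys.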