data_juicer.ops.selector.random_selector module

class data_juicer.ops.selector.random_selector.RandomSelector(select_ratio: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] | None = None, select_num: Annotated[int, Gt(gt=0)] | None = None, *args, **kwargs)[source]

Bases: Selector

Selector that randomly selects samples.

__init__(select_ratio: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] | None = None, select_num: Annotated[int, Gt(gt=0)] | None = None, *args, **kwargs)[source]

Initialization method.

Parameters:
  • select_ratio – The fraction of samples to select, in the range [0, 1]. When both select_ratio and select_num are set, whichever of the two yields fewer samples is applied.

  • select_num – The number of samples to select; must be greater than 0. When both select_ratio and select_num are set, whichever of the two yields fewer samples is applied.

  • args – extra positional args

  • kwargs – extra keyword args
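
The rule for combining select_ratio and select_num can be sketched in plain Python. This is a minimal illustration, not the library's implementation: resolve_select_count is a hypothetical helper, and both the rounding of the ratio-based count (floor) and the behaviour when neither parameter is set (keep everything) are assumptions.

```python
import math
from typing import Optional


def resolve_select_count(total: int,
                         select_ratio: Optional[float] = None,
                         select_num: Optional[int] = None) -> int:
    """Hypothetical helper: how many samples to keep out of `total`.

    Mirrors the documented rule: when both parameters are set, the one
    that yields the smaller number of samples wins.
    """
    candidates = []
    if select_ratio is not None:
        # Assumption: the ratio-based count is rounded down.
        candidates.append(math.floor(total * select_ratio))
    if select_num is not None:
        # A count cannot exceed the dataset size.
        candidates.append(min(select_num, total))
    if not candidates:
        # Assumption: with neither parameter set, nothing is filtered out.
        return total
    return min(candidates)


# With 1000 samples, a ratio of 0.5 allows 500 but select_num caps it at 200.
print(resolve_select_count(1000, select_ratio=0.5, select_num=200))
# Here the ratio is the tighter limit: 0.25 * 1000 = 250 < 500.
print(resolve_select_count(1000, select_ratio=0.25, select_num=500))
```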

process(dataset)[source]

Dataset -> dataset: randomly selects samples from the input dataset.

Parameters:

dataset – input dataset

Returns:

selected dataset.
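
The dataset-in, dataset-out contract of process can be illustrated with a plain list standing in for the dataset. This is a sketch under stated assumptions, not the actual RandomSelector code: random_subset is a hypothetical function, the seed parameter is added only for reproducibility, and sampling without replacement is assumed.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_subset(samples: Sequence[T], count: int, seed: int = 0) -> List[T]:
    """Hypothetical stand-in for a random selector over a list of samples.

    Returns `count` distinct samples and leaves the input untouched.
    """
    rng = random.Random(seed)  # local generator, so global state is unaffected
    count = max(0, min(count, len(samples)))  # clamp to a valid size
    # Assumption: selection is without replacement.
    return rng.sample(list(samples), count)


dataset = [{"text": f"sample {i}"} for i in range(10)]
selected = random_subset(dataset, count=3)
print(len(selected))  # 3
```

The input list keeps all ten samples; only the returned list is reduced.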