data_juicer.ops.selector.topk_specified_field_selector module
- class data_juicer.ops.selector.topk_specified_field_selector.TopkSpecifiedFieldSelector(field_key: str = '', top_ratio: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] | None = None, topk: Annotated[int, Gt(gt=0)] | None = None, reverse: bool = True, *args, **kwargs)[source]
Bases: Selector
Selector that sorts samples by the value of a specified field and keeps the top ones.
- __init__(field_key: str = '', top_ratio: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] | None = None, topk: Annotated[int, Gt(gt=0)] | None = None, reverse: bool = True, *args, **kwargs)[source]
Initialization method.
- Parameters:
field_key -- Key of the field whose value is used to sort and select samples. For a multi-level (nested) field, the key levels need to be separated by '.'.
top_ratio -- Ratio of top samples to select, between 0 and 1 inclusive. Samples are selected if their specified field values rank within this top fraction of the sorted samples. When both topk and top_ratio are set, whichever yields the smaller number of samples is applied.
topk -- Number of top samples to select; must be greater than 0. Samples are selected if their specified field values rank within the first topk of the sorted samples. When both topk and top_ratio are set, whichever yields the smaller number of samples is applied.
reverse -- Determines the sorting order. If reverse=True, samples are sorted in descending order.
args -- extra positional args
kwargs -- extra keyword args
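The selection rule described by these parameters can be sketched in plain Python. This is a minimal illustration, not the operator's actual implementation: the helper names `get_nested` and `select_top`, the sample data, and the choice to round `top_ratio * len(samples)` down are all assumptions made for the example.

```python
import heapq
from typing import Any, Dict, List, Optional


def get_nested(sample: Dict[str, Any], field_key: str) -> Any:
    """Resolve a '.'-separated key such as 'meta.score' in a nested dict."""
    value: Any = sample
    for key in field_key.split("."):
        value = value[key]
    return value


def select_top(
    samples: List[Dict[str, Any]],
    field_key: str,
    top_ratio: Optional[float] = None,
    topk: Optional[int] = None,
    reverse: bool = True,
) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the documented parameter semantics."""
    # Nothing to select on: return the samples unchanged.
    if not field_key or (top_ratio is None and topk is None):
        return list(samples)

    # Collect the sample count implied by each limit that is set.
    limits = []
    if top_ratio is not None:
        # Assumption: the ratio-based count is rounded down.
        limits.append(int(top_ratio * len(samples)))
    if topk is not None:
        limits.append(topk)

    # When both are set, the smaller number of samples applies.
    select_num = min(limits)

    # reverse=True keeps the largest values (descending order);
    # reverse=False keeps the smallest values (ascending order).
    pick = heapq.nlargest if reverse else heapq.nsmallest
    return pick(select_num, samples, key=lambda s: get_nested(s, field_key))


samples = [
    {"text": "a", "meta": {"score": 0.2}},
    {"text": "b", "meta": {"score": 0.9}},
    {"text": "c", "meta": {"score": 0.5}},
    {"text": "d", "meta": {"score": 0.7}},
    {"text": "e", "meta": {"score": 0.1}},
    {"text": "f", "meta": {"score": 0.4}},
]

# top_ratio=0.5 allows 3 samples, topk=2 allows 2: the smaller limit applies.
both = select_top(samples, "meta.score", top_ratio=0.5, topk=2)
print([s["text"] for s in both])  # ['b', 'd']

# reverse=False selects from the low end instead.
lowest = select_top(samples, "meta.score", topk=2, reverse=False)
print([s["text"] for s in lowest])  # ['e', 'a']
```

Going by the signature above, the corresponding operator configuration would be `TopkSpecifiedFieldSelector(field_key='meta.score', top_ratio=0.5, topk=2)`; how the operator rounds the ratio-based count or handles ties is not stated here.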