data_juicer.ops.filter.specified_field_filter module
- class data_juicer.ops.filter.specified_field_filter.SpecifiedFieldFilter(field_key: str = '', target_value: List = [], *args, **kwargs)
Bases: Filter
Filter samples based on the specified field information.
This operator checks whether the value of a specified field in each sample appears in a given list of target values. If it does not, the sample is filtered out. The field can be a multi-level key, with levels separated by dots. The target value is a list of acceptable values for the field. If the field value is not a list or tuple, it is converted to a single-element list for comparison. A sample is retained only if every value in the field equals one of the target values.
Uses the ‘field_key’ and ‘target_value’ parameters.
Supports multi-level field keys, e.g., ‘level1.level2’.
Converts non-list/tuple field values to a list for comparison.
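The retention rule described above can be sketched in plain Python. This is a minimal illustration based only on the description here, not the operator's actual source: `keep_sample` is a hypothetical helper, the sample data is invented, and raising `KeyError` on a missing key is an assumption of the sketch.

```python
from typing import Any, Dict, List


def keep_sample(sample: Dict[str, Any], field_key: str,
                target_value: List[Any]) -> bool:
    """Hypothetical helper mirroring the documented retention rule."""
    value: Any = sample
    # Walk the nested dict, one level per dot-separated key part.
    # Assumption: a missing key simply raises KeyError here.
    for part in field_key.split("."):
        value = value[part]
    # Wrap scalars so single values and sequences are handled the same way.
    if not isinstance(value, (list, tuple)):
        value = [value]
    # Keep the sample only if every field value is an accepted target value.
    return all(v in target_value for v in value)


# Invented sample data for illustration.
samples = [
    {"meta": {"lang": "en"}},
    {"meta": {"lang": ["en", "zh"]}},
    {"meta": {"lang": "fr"}},
]
kept = [s for s in samples if keep_sample(s, "meta.lang", ["en", "zh"])]
```

In this sketch the first two samples are kept and the third is dropped, because 'fr' is not among the target values.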
- __init__(field_key: str = '', target_value: List = [], *args, **kwargs)
Initialization method.
- Parameters:
field_key – The key of the field to filter on. For multi-level (nested) fields, the levels need to be separated by ‘.’.
target_value – The list of acceptable values for the specified field. Samples whose field values all appear in this list are retained.
args – extra positional args
kwargs – extra keyword args