data_juicer.ops.filter.general_field_filter module
- class data_juicer.ops.filter.general_field_filter.GeneralFieldFilter(filter_condition: str = '', *args, **kwargs)
Bases: Filter
Filter to keep samples based on a general field filter condition.
The filter condition is a string that can include logical operators (and/or) and chain comparisons, for example: "10 < num <= 30 and text != 'nothing here' and __dj__meta__.a == 3". Dotted names such as __dj__meta__.a refer to nested fields. The condition is evaluated for each sample, and only samples that satisfy it are kept. The result of the evaluation is stored in the sample's stats under the key 'general_field_filter_condition'. If the filter condition is empty, or if this stat has already been computed for the sample, the condition is not re-evaluated.
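To make the condition syntax concrete, the sketch below evaluates such a string against a plain dict using Python's standard `ast` module. It is an illustration under stated assumptions, not Data-Juicer's parser: the names `evaluate_condition`, `_eval`, and `_resolve` are hypothetical, only `and`/`or`, the six comparison operators, literal constants, and bare or dotted field names are supported, and a missing field simply raises `KeyError`.

```python
import ast
import operator
from typing import Any, Mapping

# Comparison operators supported by this sketch (an assumption; the real
# operator may accept a different set).
_CMP_OPS = {
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def _resolve(node: ast.expr, sample: Mapping[str, Any]) -> Any:
    """Look up a bare or dotted field name, e.g. `num` or `__dj__meta__.a`."""
    if isinstance(node, ast.Name):
        return sample[node.id]
    if isinstance(node, ast.Attribute):
        # `x.y` is treated as nested dict access: sample["x"]["y"].
        return _resolve(node.value, sample)[node.attr]
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def _eval(node: ast.expr, sample: Mapping[str, Any]) -> Any:
    if isinstance(node, ast.BoolOp):
        # all()/any() consume the generator lazily, so and/or short-circuit.
        values = (_eval(v, sample) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        # Chain comparison: `a < b <= c` means `a < b and b <= c`.
        left = _eval(node.left, sample)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _CMP_OPS:
                raise ValueError(f"unsupported operator: {type(op).__name__}")
            right = _eval(comparator, sample)
            if not _CMP_OPS[type(op)](left, right):
                return False
            left = right
        return True
    if isinstance(node, ast.Constant):
        return node.value
    return _resolve(node, sample)


def evaluate_condition(condition: str, sample: Mapping[str, Any]) -> bool:
    """Hypothetical helper: parse `condition` as an expression and evaluate it."""
    tree = ast.parse(condition, mode="eval")
    return bool(_eval(tree.body, sample))


if __name__ == "__main__":
    sample = {"num": 20, "text": "hello", "__dj__meta__": {"a": 3}}
    cond = "10 < num <= 30 and text != 'nothing here' and __dj__meta__.a == 3"
    print(evaluate_condition(cond, sample))
```

Walking the parsed tree rather than calling `eval()` keeps the accepted syntax explicit and avoids executing arbitrary code from a config string.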
- __init__(filter_condition: str = '', *args, **kwargs)
Initialization method.
- Parameters:
filter_condition – the filter condition as a string. It can include logical operators (and/or) and chain comparisons, for example: "10 < num <= 30 and text != 'nothing here' and __dj__meta__.a == 3".
- compute_stats_single(sample, context=False)
Compute the stats for the sample that are used as metrics to decide whether to filter it.
- Parameters:
sample – input sample.
context – whether to store context information of intermediate vars in the sample temporarily.
- Returns:
sample with computed stats
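The compute-then-decide flow described above can be sketched as follows. This is a hypothetical stand-in, not the library's code: `compute_stats_sketch` and `keep_sample_sketch` are illustrative names, the per-sample stats field `"__dj__stats__"` and treating an empty condition as "keep" are assumptions, the `context` parameter is not modeled, and the condition evaluator is passed in as a callable so the sketch stays self-contained.

```python
from typing import Any, Callable, Dict

# The stats key comes from the documentation above; the name of the
# per-sample stats field is an assumption made for this sketch.
STATS_FIELD = "__dj__stats__"
CONDITION_KEY = "general_field_filter_condition"


def compute_stats_sketch(
    sample: Dict[str, Any],
    condition: str,
    evaluate: Callable[[str, Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Hypothetical stand-in for compute_stats_single."""
    stats = sample.setdefault(STATS_FIELD, {})
    if CONDITION_KEY in stats:
        # Already computed for this sample: do not re-evaluate.
        return sample
    # Assumption: an empty condition keeps every sample.
    stats[CONDITION_KEY] = bool(evaluate(condition, sample)) if condition else True
    return sample


def keep_sample_sketch(sample: Dict[str, Any]) -> bool:
    """Hypothetical keep/drop decision based on the stored stat."""
    return bool(sample[STATS_FIELD][CONDITION_KEY])


if __name__ == "__main__":
    # A trivial stand-in evaluator for the condition "10 < num <= 30".
    def evaluate(cond: str, s: Dict[str, Any]) -> bool:
        return 10 < s["num"] <= 30

    samples = [{"num": 5}, {"num": 20}]
    kept = [
        s for s in samples
        if keep_sample_sketch(compute_stats_sketch(s, "10 < num <= 30", evaluate))
    ]
    print([s["num"] for s in kept])
```

Splitting stats computation from the keep/drop decision mirrors the two-stage shape the documentation describes: the stored stat can be inspected or reused later without evaluating the condition again.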