data_juicer.ops.filter.specified_field_filter module

class data_juicer.ops.filter.specified_field_filter.SpecifiedFieldFilter(field_key: str = '', target_value: List = [], *args, **kwargs)[source]

Bases: Filter

Filter based on specified field information.

If the value of the specified field in a sample is not among the specified target values, the sample is filtered out.

__init__(field_key: str = '', target_value: List = [], *args, **kwargs)[source]

Initialization method.

Parameters:
  • field_key – The key of the field whose value is used for filtering. For nested (multi-level) fields, the key levels must be separated by '.'.

  • target_value – The list of accepted values for the specified field. Samples whose field value falls within this list are retained.

  • args – extra positional args

  • kwargs – extra keyword args
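
The '.'-separated form of field_key can be illustrated with a small standalone sketch. It does not import data_juicer; the helper name resolve_field and the sample layout are hypothetical, and the error behavior for a missing key is an assumption rather than the operator's documented behavior.

```python
from typing import Any, Mapping


def resolve_field(sample: Mapping[str, Any], field_key: str) -> Any:
    """Walk a nested mapping using a '.'-separated key (hypothetical helper)."""
    value: Any = sample
    for part in field_key.split("."):
        # Assumption: a missing level raises KeyError here; the real
        # operator may report a missing key differently.
        value = value[part]
    return value


# Hypothetical sample with two levels of nesting under "meta".
sample = {"text": "hello", "meta": {"lang": "en", "source": {"name": "wiki"}}}

print(resolve_field(sample, "meta.lang"))         # en
print(resolve_field(sample, "meta.source.name"))  # wiki
```

Under this reading, a field_key of 'meta.lang' selects the value stored at sample['meta']['lang'].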

compute_stats_single(sample)[source]

Compute stats for the sample which is used as a metric to decide whether to filter this sample.

Parameters:
  • sample – input sample.

  • context – whether to store context information of intermediate vars in the sample temporarily.

Returns:

sample with computed stats

process_single(sample)[source]

Sample-level decision: maps a sample to a Boolean.

Parameters:

sample – sample to decide whether to filter

Returns:

True to keep the sample, False to filter it out
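
The keep-or-filter rule described for this class can be sketched as a plain function. This is an illustrative approximation, not the library's implementation: the name keep_sample is hypothetical, and two behaviors are assumptions made for the sketch (an empty field_key or target_value keeps every sample, and a list-valued field is kept only if all of its elements are accepted).

```python
from typing import Any, Dict, List


def keep_sample(sample: Dict[str, Any], field_key: str,
                target_value: List[Any]) -> bool:
    """Return True to keep the sample, False to filter it out (sketch)."""
    # Assumption: with no key or no targets configured, nothing is filtered.
    if not field_key or not target_value:
        return True

    # Resolve the '.'-separated key level by level.
    value: Any = sample
    for part in field_key.split("."):
        value = value[part]

    # Assumption: a list- or tuple-valued field passes only if every
    # element is among the accepted values.
    values = value if isinstance(value, (list, tuple)) else [value]
    return all(v in target_value for v in values)


# Hypothetical samples; only the "meta.lang" field matters here.
samples = [
    {"text": "hello", "meta": {"lang": "en"}},
    {"text": "bonjour", "meta": {"lang": "fr"}},
    {"text": "mixed", "meta": {"lang": ["en", "zh"]}},
]

kept = [s for s in samples if keep_sample(s, "meta.lang", ["en", "zh"])]
print([s["text"] for s in kept])  # ['hello', 'mixed']
```

In this sketch the French sample is dropped because 'fr' is not among the accepted values, while the other two samples are retained.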