data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper module

data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper.is_noun(word)[source]
data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper.compare_text_index(text1, text2)[source]
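The helpers above have no descriptions on this page. Going only by their names, `compare_text_index` compares two captions and `is_noun` tests a single word. The sketch below illustrates that idea under stated assumptions. It is not the module's implementation. It uses the standard-library `difflib.SequenceMatcher` to find words that appear in only one of the two captions, and it accepts a `keep` predicate to filter those words. A noun test such as `is_noun` could serve as that predicate. One way to build such a test, also an assumption, is to use a part-of-speech tagger like NLTK's `pos_tag` and check for the Penn Treebank tags NN, NNS, NNP and NNPS. The name `compare_text_sketch` and its return shape (two word lists) are hypothetical.

```python
import difflib
import re
from typing import Callable, List, Tuple


def _tokenize(text: str) -> List[str]:
    # Lowercase, drop punctuation, and split on whitespace.
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def compare_text_sketch(
    text1: str,
    text2: str,
    keep: Callable[[str], bool] = lambda word: True,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: words found only in text1 and only in text2.

    Assumption: this mirrors the *idea* suggested by compare_text_index's
    name, not its actual behavior or return value.
    """
    words1, words2 = _tokenize(text1), _tokenize(text2)
    only1: List[str] = []
    only2: List[str] = []
    matcher = difflib.SequenceMatcher(a=words1, b=words2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "delete"/"replace" spans hold words removed from text1;
        # "insert"/"replace" spans hold words added in text2.
        if tag in ("delete", "replace"):
            only1.extend(words1[i1:i2])
        if tag in ("insert", "replace"):
            only2.extend(words2[j1:j2])
    set1, set2 = set(only1), set(only2)
    # Drop words that changed position but occur on both sides, then apply
    # the caller's filter (e.g. a noun test).
    diff1 = [w for w in only1 if w not in set2 and keep(w)]
    diff2 = [w for w in only2 if w not in set1 and keep(w)]
    return diff1, diff2


if __name__ == "__main__":
    print(compare_text_sketch("A red car parked near a tree.",
                              "A red bus parked near a tree."))
```

For the two captions in the usage line, the changed words are "car" and "bus". Those are the words whose image regions a difference-area pipeline would presumably try to localize.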
data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper.iou_filter(samples, iou_thresh)[source]
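`iou_filter(samples, iou_thresh)` is also undocumented here. Its name suggests that it filters regions, such as segmentation boxes, by comparing their intersection-over-union (IoU) against a threshold. The sketch below rests on several assumptions:

- Boxes use `(x1, y1, x2, y2)` corner format with continuous coordinates.
- The boxes are already sorted by priority.
- Filtering works by greedy suppression in the style of non-maximum suppression: a box is kept only if its IoU with every previously kept box is at most `iou_thresh`.

The names `iou` and `iou_filter_sketch`, the box format, and returning the indices of kept boxes are all assumptions. None of them is confirmed by this page.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) corners, continuous coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_filter_sketch(boxes: Sequence[Box], iou_thresh: float) -> List[int]:
    """Hypothetical greedy suppression over boxes given in priority order.

    Returns indices of boxes whose IoU with every already-kept box is
    at most iou_thresh.
    """
    kept: List[int] = []
    for i, box in enumerate(boxes):
        if all(iou(box, boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    print(iou_filter_sketch(candidates, iou_thresh=0.5))
```

In the usage line, the second box overlaps the first with an IoU of 0.81, so it is suppressed. The disjoint third box is kept. Greedy suppression is quadratic in the number of boxes, which is typically acceptable for modest numbers of candidate regions.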
class data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper.Difference_Area_Generator_Mapper(image_pair_similarity_filter_args: Dict | None = {}, image_segment_mapper_args: Dict | None = {}, image_text_matching_filter_args: Dict | None = {}, *args, **kwargs)[source]

Bases: Mapper

A fused operator that runs a sequence of OPs on the same batch, allowing fine-grained control over data processing. The wrapped stages are an image pair similarity filter, an image segmentation mapper, and an image-text matching filter. Each stage is configured through one of the constructor arguments below.

__init__(image_pair_similarity_filter_args: Dict | None = {}, image_segment_mapper_args: Dict | None = {}, image_text_matching_filter_args: Dict | None = {}, *args, **kwargs)[source]

Initialization.

Parameters:
  • image_pair_similarity_filter_args -- Arguments for the image pair similarity filter, which controls similarity filtering between image pairs. If left as the default empty dict, these fixed values are used: min_score_1=0.1, max_score_1=1.0, min_score_2=0.1, max_score_2=1.0, hf_clip="openai/clip-vit-base-patch32", num_proc=1.

  • image_segment_mapper_args -- Arguments for the image segmentation mapper, which controls the segmentation process. If left as the default empty dict, these fixed values are used: imgsz=1024, conf=0.05, iou=0.5, model_path="FastSAM-x.pt".

  • image_text_matching_filter_args -- Arguments for the image-text matching filter, which controls matching between cropped image regions and text descriptions. If left as the default empty dict, these fixed values are used: min_score=0.1, max_score=1.0, hf_blip="Salesforce/blip-itm-base-coco", num_proc=1.
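The documented fallback values can also be written out explicitly. The sketch below spells out each argument dict in full and assumes that the keys map directly onto the wrapped OPs' parameters as listed above. This page does not say whether a non-empty dict is merged with these defaults or replaces them. Supplying every key avoids relying on either behavior. The override dict `custom_segment_args` and its `conf` value are illustrative only. The constructor call is left as a comment because it needs data_juicer and the model weights installed.

```python
# Documented fallback values used when each *_args dict is left empty ({}).
image_pair_similarity_filter_args = {
    "min_score_1": 0.1,
    "max_score_1": 1.0,
    "min_score_2": 0.1,
    "max_score_2": 1.0,
    "hf_clip": "openai/clip-vit-base-patch32",
    "num_proc": 1,
}
image_segment_mapper_args = {
    "imgsz": 1024,
    "conf": 0.05,
    "iou": 0.5,
    "model_path": "FastSAM-x.pt",
}
image_text_matching_filter_args = {
    "min_score": 0.1,
    "max_score": 1.0,
    "hf_blip": "Salesforce/blip-itm-base-coco",
    "num_proc": 1,
}

# Hypothetical override: copy the full dict and change one value, so no key
# silently falls back to an unknown default (merge semantics are undocumented).
custom_segment_args = {**image_segment_mapper_args, "conf": 0.25}

# Assumed usage (requires data_juicer and model weights):
# from data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper import (
#     Difference_Area_Generator_Mapper,
# )
# op = Difference_Area_Generator_Mapper(
#     image_pair_similarity_filter_args=image_pair_similarity_filter_args,
#     image_segment_mapper_args=custom_segment_args,
#     image_text_matching_filter_args=image_text_matching_filter_args,
# )
```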

process_single(samples, rank=None)[source]

Processes data at the sample level: one sample in, one processed sample out (sample → sample).

Parameters:

samples -- the sample to process

Returns:

the processed sample