data_juicer.ops.mapper.image_tagging_mapper module

class data_juicer.ops.mapper.image_tagging_mapper.ImageTaggingMapper(tag_field_name: str = 'image_tags', *args, **kwargs)

Bases: Mapper

Generates image tags for each image in the sample.

This operator processes images to generate descriptive tags. It uses a Hugging Face model to analyze the images and produce relevant tags. The tags are stored in the specified field, defaulting to ‘image_tags’. If the tags are already present in the sample, the operator will not recompute them. For samples without images, an empty tag array is assigned. The generated tags are sorted by frequency and stored as a list of strings.
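The behavior described above can be illustrated with a minimal, library-independent sketch. It is not the operator's actual implementation: `tag_sample`, `stub_tagger`, and the `images` key are hypothetical names introduced for illustration, the tagging model is replaced by a stub that returns fixed raw tags, and it is assumed that one list of tags is stored per image.

```python
from collections import Counter
from typing import Callable, Dict, List


def stub_tagger(image: str) -> List[str]:
    """Hypothetical stand-in for the tagging model: returns raw tags, possibly repeated."""
    return ["sofa", "cat", "cat"]


def tag_sample(
    sample: Dict,
    tagger: Callable[[str], List[str]],
    image_key: str = "images",           # assumed name of the image field
    tag_field_name: str = "image_tags",  # same default as the operator
) -> Dict:
    # Tags already present: do not recompute them.
    if tag_field_name in sample:
        return sample

    images = sample.get(image_key) or []

    # No images: assign an empty tag array.
    if not images:
        sample[tag_field_name] = []
        return sample

    tags_per_image = []
    for image in images:
        counts = Counter(tagger(image))
        # most_common() orders tags from most to least frequent.
        tags_per_image.append([tag for tag, _ in counts.most_common()])

    sample[tag_field_name] = tags_per_image
    return sample


tagged = tag_sample({"images": ["a.jpg"]}, stub_tagger)
print(tagged["image_tags"])
```

With the stub above, "cat" appears twice and "sofa" once, so "cat" is listed first. Whether the real operator stores tags per image or merged across images should be checked against its source.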

__init__(tag_field_name: str = 'image_tags', *args, **kwargs)

Initialization method.

Parameters:
  • tag_field_name – the name of the field in which to store the tags. It is 'image_tags' by default.

  • args – extra positional args

  • kwargs – extra keyword args

process_single(sample, rank=None, context=False)

Sample-level processing: takes a single sample and returns the processed sample (sample -> sample).

Parameters:
  • sample – sample to process

Returns:
  processed sample