image_captioning_mapper
Generates image captions using a Hugging Face model and appends them to samples.
This operator generates captions for the images in each input sample using a specified Hugging Face model. It can generate multiple candidate captions per image and supports three modes for deciding which ones to retain: 'random_any', 'similar_one_simhash', and 'all'. In 'random_any' mode, one caption chosen at random is retained. In 'similar_one_simhash' mode, the caption most similar to the original text, as measured by SimHash, is retained. In 'all' mode, all generated captions are concatenated and retained. The `keep_original_sample` parameter controls whether the original sample is kept or discarded. If both `prompt` and `prompt_key` are set, `prompt_key` takes precedence.
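The three retention modes can be illustrated with a small standalone sketch. This is not the operator's implementation: the toy `simhash` function (whitespace tokens, MD5-based 64-bit fingerprint), the `hamming` helper, the `retain_captions` name, and the space used to join captions in 'all' mode are all assumptions made for illustration.

```python
import hashlib
import random

BITS = 64  # fingerprint width used by this toy SimHash


def simhash(text: str) -> int:
    """Toy SimHash: every whitespace token votes on each of the 64 bits."""
    votes = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            votes[i] += 1 if (value >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are positive overall.
    return sum(1 << i for i, v in enumerate(votes) if v > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def retain_captions(original_text, candidates, mode="random_any", rng=None):
    """Return the list of captions kept under the given retention mode."""
    rng = rng or random.Random()
    if mode == "random_any":
        return [rng.choice(candidates)]
    if mode == "similar_one_simhash":
        target = simhash(original_text)
        # Smallest Hamming distance means most similar fingerprint.
        return [min(candidates, key=lambda c: hamming(simhash(c), target))]
    if mode == "all":
        # Assumption: captions are joined with a single space.
        return [" ".join(candidates)]
    raise ValueError(f"unknown retention mode: {mode!r}")
```

In this sketch a smaller Hamming distance between fingerprints is treated as higher similarity, which is the usual way SimHash fingerprints are compared.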
Type: mapper
Tags: cpu, hf, multimodal
🔧 Parameter Configuration
| name | type | default | desc |
|---|---|---|---|
| | `<class 'str'>` | | Hugging Face model name used to generate captions |
| | `<class 'bool'>` | | |
| | `typing.Annotated[int, Gt(gt=0)]` | | how many candidate captions to generate |
| | `<class 'str'>` | | retain strategy for the generated captions ('random_any', 'similar_one_simhash', or 'all') |
| `keep_original_sample` | `<class 'bool'>` | | whether to keep the original sample. If |
| `prompt` | `typing.Optional[str]` | | a string prompt to guide generation by the BLIP-2 model |
| `prompt_key` | `typing.Optional[str]` | | the key name of the field in each sample that stores the prompt |
| | | | extra args |
| | | | extra args |
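The precedence rule for `prompt` and `prompt_key`, and the `Gt(gt=0)` constraint on the number of candidate captions, can be sketched as follows. The helper names `resolve_prompt` and `check_caption_count` are hypothetical, and falling back to `prompt` when a sample has no value under `prompt_key` is an assumption, not documented behavior.

```python
from typing import Optional


def resolve_prompt(sample: dict,
                   prompt: Optional[str] = None,
                   prompt_key: Optional[str] = None) -> Optional[str]:
    """Choose the prompt for one sample; a value stored under prompt_key wins."""
    if prompt_key is not None and sample.get(prompt_key):
        return sample[prompt_key]
    # Assumption: fall back to the global prompt (possibly None) otherwise.
    return prompt


def check_caption_count(count: int) -> int:
    """Mirror the Gt(gt=0) constraint: the count must be a positive integer."""
    if isinstance(count, bool) or not isinstance(count, int) or count <= 0:
        raise ValueError(f"caption count must be an integer > 0, got {count!r}")
    return count
```

For example, `resolve_prompt({"my_prompt": "a photo of"}, prompt="describe", prompt_key="my_prompt")` picks the per-sample prompt stored under the hypothetical `my_prompt` field rather than the global one.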
📊 Effect demonstration
not available