data_juicer.ops.mapper.image_captioning_from_gpt4v_mapper module
- data_juicer.ops.mapper.image_captioning_from_gpt4v_mapper.call_gpt_vision_api(api_key, system_prompt, user_prompt, base64_image, max_tokens=500, temperature=1.0, model='gpt-4-vision-preview')
- class data_juicer.ops.mapper.image_captioning_from_gpt4v_mapper.ImageCaptioningFromGPT4VMapper(mode: str = 'description', api_key: str = '', max_token: int = 500, temperature: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] = 1.0, system_prompt: str = '', user_prompt: str = '', user_prompt_key: str | None = None, keep_original_sample: bool = True, any_or_all: str = 'any', *args, **kwargs)
Bases: Mapper
Mapper that generates samples whose texts are produced by gpt-4-vision from the images in each sample.
- __init__(mode: str = 'description', api_key: str = '', max_token: int = 500, temperature: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] = 1.0, system_prompt: str = '', user_prompt: str = '', user_prompt_key: str | None = None, keep_original_sample: bool = True, any_or_all: str = 'any', *args, **kwargs)
Initialization method.
- Parameters:
mode – the kind of text to generate from images; one of [‘reasoning’, ‘description’, ‘conversation’, ‘custom’]. Default is ‘description’.
api_key – the API key to authenticate the request.
max_token – the maximum number of tokens to generate. Default is 500.
temperature – controls the randomness of the output (ranges from 0 to 1). Default is 1.0.
system_prompt – a string prompt that sets the context of the conversation and gives gpt4-vision global guidance or rules so that it generates responses in the expected way. This parameter is used when mode is set to ‘custom’.
user_prompt – a string prompt that guides gpt4-vision’s generation for each sample. It is “” by default, which means no prompt is provided.
user_prompt_key – the name of the sample field that stores a per-sample prompt. It is used to set different prompts for different samples. If it is None, the prompt given in the “user_prompt” parameter is used. It is None by default.
keep_original_sample – whether to keep the original sample. If set to False, the final dataset contains only the generated text and the original text is removed. It is True by default.
any_or_all – keep this sample with ‘any’ or ‘all’ strategy of all images. ‘any’: keep this sample if any images meet the condition. ‘all’: keep this sample only if all images meet the condition.
args – extra positional arguments
kwargs – extra keyword arguments
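The parameter rules above can be illustrated with a minimal, self-contained sketch. It does not import Data-Juicer and is not the library's implementation: the helper names `validate_params` and `resolve_user_prompt` are hypothetical. Raising `ValueError` on bad input, and falling back to `user_prompt` when the keyed field is missing, are assumptions made for illustration. The sketch only mirrors the documented constraints (allowed `mode` values, `temperature` between 0 and 1, `any_or_all` being ‘any’ or ‘all’) and the documented choice between a per-sample prompt and the shared `user_prompt`.

```python
from typing import Optional

# Allowed values taken from the parameter descriptions above.
ALLOWED_MODES = ("reasoning", "description", "conversation", "custom")
ALLOWED_STRATEGIES = ("any", "all")


def validate_params(mode: str = "description",
                    temperature: float = 1.0,
                    any_or_all: str = "any") -> dict:
    """Hypothetical helper: check the documented constraints.

    Raising ValueError is an assumption of this sketch, not a
    statement about how the operator itself reports errors.
    """
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    if not 0 <= temperature <= 1:
        raise ValueError(f"temperature must be in [0, 1], got {temperature}")
    if any_or_all not in ALLOWED_STRATEGIES:
        raise ValueError(
            f"any_or_all must be one of {ALLOWED_STRATEGIES}, got {any_or_all!r}")
    return {"mode": mode, "temperature": temperature, "any_or_all": any_or_all}


def resolve_user_prompt(sample: dict,
                        user_prompt: str = "",
                        user_prompt_key: Optional[str] = None) -> str:
    """Hypothetical helper: choose the prompt for one sample.

    If user_prompt_key is given and the sample holds a string under
    that key, use it; otherwise fall back to the shared user_prompt.
    """
    if user_prompt_key is not None:
        per_sample = sample.get(user_prompt_key)
        if isinstance(per_sample, str):
            return per_sample
    return user_prompt


if __name__ == "__main__":
    params = validate_params(mode="custom", temperature=0.2)
    sample = {"images": ["cat.jpg"], "prompt": "Describe the animal."}
    print(params)
    print(resolve_user_prompt(sample, "Describe the image.", "prompt"))
```

With Data-Juicer installed, the same values would presumably be passed straight to the constructor documented above, for example `ImageCaptioningFromGPT4VMapper(mode='custom', temperature=0.2, user_prompt_key='prompt')`, together with a valid `api_key`.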