data_juicer.ops.mapper.mllm_mapper module

class data_juicer.ops.mapper.mllm_mapper.MllmMapper(hf_model: str = 'llava-hf/llava-v1.6-vicuna-7b-hf', max_new_tokens=256, temperature=0.2, top_p=None, num_beams=1, *args, **kwargs)[source]

Bases: Mapper

Mapper to use MLLMs for visual question answering tasks. Recommended models:

  • llava-hf/llava-v1.6-vicuna-7b-hf

  • Qwen/Qwen2-VL-7B-Instruct

__init__(hf_model: str = 'llava-hf/llava-v1.6-vicuna-7b-hf', max_new_tokens=256, temperature=0.2, top_p=None, num_beams=1, *args, **kwargs)[source]

Initialization method.

Parameters:
  • hf_model – Hugging Face model ID.

  • max_new_tokens – the maximum number of new tokens generated by the model.
  • temperature – sampling temperature controlling the randomness of the generated text; the higher the temperature, the more random and creative the output.

  • top_p – nucleus sampling threshold: the next token is sampled only from the smallest set of tokens whose cumulative probability reaches top_p.

  • num_beams – beam search width; larger values explore more candidate sequences, generally improving the quality of the generated text at the cost of speed.

  • args – extra positional arguments.

  • kwargs – extra keyword arguments.
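To illustrate how temperature and top_p interact during decoding, here is a minimal, self-contained sketch of temperature-scaled softmax and nucleus (top-p) candidate selection. It is an illustration of the general sampling technique only, not Data-Juicer's or the model's actual implementation; the function names and toy logits are invented for this example.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Dividing logits by the temperature before softmax: low temperature
    # sharpens the distribution, high temperature flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_candidates(probs, top_p):
    # Keep the smallest set of tokens (by descending probability) whose
    # cumulative probability reaches top_p; sampling is restricted to it.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.1]           # toy vocabulary of 4 tokens
cool = softmax_with_temperature(logits, 0.2)
warm = softmax_with_temperature(logits, 2.0)
print(nucleus_candidates(cool, 0.9))    # low temperature: only the top token survives
print(nucleus_candidates(warm, 0.9))    # high temperature: more tokens survive
```

With temperature 0.2 the top token absorbs nearly all probability mass, so the 0.9 nucleus contains a single candidate; at temperature 2.0 the mass spreads out and the nucleus widens, which is why higher temperature yields more varied text.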

process_single(sample=None, rank=None)[source]

Process one sample at the sample level (sample –> sample).

Parameters:

sample – sample to process

Returns:

processed sample