data_juicer.ops.mapper.detect_main_character_mapper module

class data_juicer.ops.mapper.detect_main_character_mapper.DetectMainCharacterMapper(mllm_mapper_args: Dict | None = {}, filter_min_character_num: int = 0, *args, **kwargs)

Bases: Mapper

Extract all main character names based on the given image and its caption.

__init__(mllm_mapper_args: Dict | None = {}, filter_min_character_num: int = 0, *args, **kwargs)

Initialization.

Parameters:
  • mllm_mapper_args – Arguments for the multimodal language model mapper, which controls caption generation for bounding-box regions. An empty dict (the default) uses these fixed values: max_new_tokens=256, temperature=0.2, top_p=None, num_beams=1, hf_model="llava-hf/llava-v1.6-vicuna-7b-hf".

  • filter_min_character_num – Minimum number of main characters. Samples whose image contains fewer main characters than this threshold are filtered out.
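
The two parameters above can be pictured with a small, library-independent sketch. It is not taken from the operator's source: `DEFAULT_MLLM_ARGS`, `resolve_mllm_args`, and `keeps_sample` are hypothetical names, and the assumption that user-supplied keys are overlaid on the documented fixed values (rather than replacing them wholesale) is ours.

```python
from typing import Any, Dict, List, Optional

# Fixed values listed in the mllm_mapper_args description.
DEFAULT_MLLM_ARGS: Dict[str, Any] = {
    "max_new_tokens": 256,
    "temperature": 0.2,
    "top_p": None,
    "num_beams": 1,
    "hf_model": "llava-hf/llava-v1.6-vicuna-7b-hf",
}


def resolve_mllm_args(user_args: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: overlay user-supplied keys on the documented defaults."""
    resolved = dict(DEFAULT_MLLM_ARGS)  # copy, so the defaults are never mutated
    resolved.update(user_args or {})    # None and {} both fall back to the defaults
    return resolved


def keeps_sample(main_characters: List[str], filter_min_character_num: int = 0) -> bool:
    """Hypothetical helper: keep a sample only if it reaches the threshold."""
    return len(main_characters) >= filter_min_character_num
```

Under these assumptions, `resolve_mllm_args({"temperature": 0.7})` changes only the temperature, and the default threshold of 0 keeps every sample, including one with no detected characters.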

process_single(samples, rank=None)

Processes one sample at a time: sample -> sample.

Parameters:

samples – the sample to process

Returns:

processed sample
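
The sample-in, sample-out contract of process_single can be illustrated with a toy stand-in. `ToyCharacterMapper`, the `"text"` and `"characters"` keys, and the capitalized-word heuristic are all hypothetical and chosen only for illustration; the real operator relies on a multimodal model and its output fields are not described here.

```python
from typing import Any, Dict, Optional


class ToyCharacterMapper:
    """Hypothetical stand-in, not the real DetectMainCharacterMapper."""

    def process_single(self, samples: Dict[str, Any], rank: Optional[int] = None) -> Dict[str, Any]:
        # Shallow-copy the input so the caller's dict is left untouched.
        processed = dict(samples)
        # Toy heuristic: treat title-cased words in the caption as character names.
        caption = samples.get("text", "")
        processed["characters"] = [word for word in caption.split() if word.istitle()]
        return processed


sample = {"text": "Alice meets Bob at noon"}
result = ToyCharacterMapper().process_single(sample)
```

The point of the sketch is the shape of the call: one sample dict goes in, and one processed sample dict comes out.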