data_juicer.ops.mapper.image_diffusion_mapper module

class data_juicer.ops.mapper.image_diffusion_mapper.ImageDiffusionMapper(hf_diffusion: str = 'CompVis/stable-diffusion-v1-4', trust_remote_code: bool = False, torch_dtype: str = 'fp32', revision: str = 'main', strength: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] = 0.8, guidance_scale: float = 7.5, aug_num: Annotated[int, Gt(gt=0)] = 1, keep_original_sample: bool = True, caption_key: str | None = None, hf_img2seq: str = 'Salesforce/blip2-opt-2.7b', save_dir: str = None, *args, **kwargs)[source]

Bases: Mapper

Generates images with a diffusion model.

__init__(hf_diffusion: str = 'CompVis/stable-diffusion-v1-4', trust_remote_code: bool = False, torch_dtype: str = 'fp32', revision: str = 'main', strength: Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=1)])] = 0.8, guidance_scale: float = 7.5, aug_num: Annotated[int, Gt(gt=0)] = 1, keep_original_sample: bool = True, caption_key: str | None = None, hf_img2seq: str = 'Salesforce/blip2-opt-2.7b', save_dir: str = None, *args, **kwargs)[source]

Initialization method.

Parameters:
  • hf_diffusion -- Name of the diffusion model on Hugging Face used to generate the images.

  • torch_dtype -- The floating-point type used to load the diffusion model. Must be one of ['fp32', 'fp16', 'bf16'].

  • revision -- The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier allowed by Git.

  • strength -- The extent to which the reference image is transformed. Must be between 0 and 1. The image is used as a starting point, and the higher the strength, the more noise is added to it. The number of denoising steps depends on the amount of noise initially added. When strength is 1, the added noise is at its maximum and the denoising process runs for the full number of iterations specified in num_inference_steps, so a value of 1 essentially ignores the image.

  • guidance_scale -- A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.

  • aug_num -- The number of images to generate with the diffusion model. Must be greater than 0.

  • keep_original_sample -- Whether to keep the original sample. If set to False, only the generated samples remain in the final dataset and the original samples are removed. Defaults to True.

  • caption_key -- The key name of the field in each sample that stores the caption for each image. It can be a string if each sample contains only one image; otherwise, it should be a list. If it is None, ImageDiffusionMapper generates a caption for each image.

  • hf_img2seq -- Name of the model on Hugging Face used to generate captions when caption_key is None.

  • save_dir -- The directory where generated image files will be stored. If not specified, outputs will be saved in the same directory as their corresponding input files. This path can alternatively be defined by setting the DJ_PRODUCED_DATA_DIR environment variable.
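
The constraints listed above can be sketched in plain Python. This is only an illustration: `check_diffusion_args` and `effective_denoising_steps` are hypothetical helpers that are not part of data_juicer, and the step rule (roughly `strength * num_inference_steps`) is an assumption about how an image-to-image diffusion pipeline typically shortens its schedule, not something this page states.

```python
ALLOWED_DTYPES = ("fp32", "fp16", "bf16")


def check_diffusion_args(strength=0.8, aug_num=1, torch_dtype="fp32"):
    """Hypothetical helper: re-check the documented parameter constraints."""
    if not 0 <= strength <= 1:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    if not isinstance(aug_num, int) or aug_num <= 0:
        raise ValueError(f"aug_num must be a positive integer, got {aug_num}")
    if torch_dtype not in ALLOWED_DTYPES:
        raise ValueError(f"torch_dtype must be one of {ALLOWED_DTYPES}")
    return {"strength": strength, "aug_num": aug_num, "torch_dtype": torch_dtype}


def effective_denoising_steps(num_inference_steps, strength):
    """Assumed rule: only a strength-sized fraction of the schedule is run."""
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    args = check_diffusion_args(strength=0.5, aug_num=2, torch_dtype="fp16")
    # num_inference_steps=50 is an illustrative value, not a documented default.
    print(args, effective_denoising_steps(50, args["strength"]))
```

Under this assumed rule, a strength of 1 runs the whole schedule and a strength of 0 runs no denoising steps, which matches the description of strength as the degree of departure from the reference image.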

process_batched(samples, rank=None, context=False)[source]

Note

This is a batched_OP whose input and output types are both lists. Suppose there are $N$ input sample lists, each with batch size $b$, and let $M$ denote aug_num. The total number of samples after generation is $(1+M)Nb$.
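
As a quick check of the arithmetic in this note, the count can be written as a small function. `total_samples` is a hypothetical helper, not part of data_juicer, and the `keep_original=False` branch (dropping the leading 1) is an assumption drawn from the description of keep_original_sample rather than from this note.

```python
def total_samples(num_lists, batch_size, aug_num, keep_original=True):
    """Hypothetical helper: sample count after generation, (1 + M) * N * b."""
    if num_lists < 0 or batch_size < 0 or aug_num <= 0:
        raise ValueError("counts must be non-negative and aug_num positive")
    # Each input sample yields aug_num generated samples; the extra 1 is the
    # original sample, assumed to be kept only when keep_original is True.
    per_sample = aug_num + (1 if keep_original else 0)
    return per_sample * num_lists * batch_size


if __name__ == "__main__":
    # N = 3 sample lists, b = 4 samples each, M = 2 generated per sample.
    print(total_samples(3, 4, 2))
```

With $N = 3$, $b = 4$, and $M = 2$, the formula gives $(1+2) \cdot 3 \cdot 4 = 36$ samples.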

Parameters:

samples

Returns: