data_juicer.ops.mapper.image_remove_background_mapper module

class data_juicer.ops.mapper.image_remove_background_mapper.ImageRemoveBackgroundMapper(alpha_matting: bool = False, alpha_matting_foreground_threshold: int = 240, alpha_matting_background_threshold: int = 10, alpha_matting_erode_size: int = 10, bgcolor: Tuple[int, int, int, int] | None = None, save_dir: str = None, *args, **kwargs)[source]

Bases: Mapper

Mapper to remove the background of images.

This operator removes the background from every image in a sample, using the rembg library. When alpha_matting is enabled, alpha matting is applied with the configured foreground threshold, background threshold, and erosion size. Output images are always saved in PNG format. Set bgcolor to give the cutout image a custom background color. Processed images are written to the directory given by save_dir, or to the same directory as the corresponding input file if save_dir is not provided. The source_file field in the sample is updated to reflect the new file paths.
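To make the options above concrete, here is a minimal sketch of calling rembg directly with the same settings. The helper names build_rembg_kwargs and remove_background are hypothetical and not part of Data-Juicer, and whether the operator invokes rembg in exactly this way is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def build_rembg_kwargs(
    alpha_matting: bool = False,
    alpha_matting_foreground_threshold: int = 240,
    alpha_matting_background_threshold: int = 10,
    alpha_matting_erode_size: int = 10,
    bgcolor: Optional[Tuple[int, int, int, int]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the documented options into one dict."""
    return {
        "alpha_matting": alpha_matting,
        "alpha_matting_foreground_threshold": alpha_matting_foreground_threshold,
        "alpha_matting_background_threshold": alpha_matting_background_threshold,
        "alpha_matting_erode_size": alpha_matting_erode_size,
        "bgcolor": bgcolor,
    }


def remove_background(input_path: str, output_path: str, **options: Any) -> None:
    """Hypothetical helper: cut out one image with rembg and save it as PNG."""
    # Imported lazily so build_rembg_kwargs works without rembg or Pillow installed.
    from PIL import Image
    from rembg import remove

    with Image.open(input_path) as img:
        # rembg.remove returns a PIL image when it is given a PIL image.
        cutout = remove(img, **build_rembg_kwargs(**options))
    cutout.save(output_path, format="PNG")
```

A call such as remove_background("cat.jpg", "cat.png", alpha_matting=True) would then write a single cutout. PNG is a natural output format here because it can store the alpha channel that the cutout carries.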

__init__(alpha_matting: bool = False, alpha_matting_foreground_threshold: int = 240, alpha_matting_background_threshold: int = 10, alpha_matting_erode_size: int = 10, bgcolor: Tuple[int, int, int, int] | None = None, save_dir: str = None, *args, **kwargs)[source]

Initialization method.

Parameters:
  • alpha_matting – (bool, optional) Flag indicating whether to use alpha matting. Defaults to False.

  • alpha_matting_foreground_threshold – (int, optional) Foreground threshold for alpha matting. Defaults to 240.

  • alpha_matting_background_threshold – (int, optional) Background threshold for alpha matting. Defaults to 10.

  • alpha_matting_erode_size – (int, optional) Erosion size for alpha matting. Defaults to 10.

  • bgcolor – (Optional[Tuple[int, int, int, int]], optional) Background color for the cutout image. Defaults to None.

  • save_dir – (str, optional) The directory where generated image files will be stored. If not specified, outputs are saved in the same directory as their corresponding input files. This path can alternatively be defined by setting the DJ_PRODUCED_DATA_DIR environment variable. Defaults to None.

  • *args – (Optional[Any]) Additional positional arguments.

  • **kwargs – (Optional[Any]) Additional keyword arguments.
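The parameters above might be combined as in the following sketch. The helpers check_bgcolor and make_mapper are hypothetical, and reading bgcolor as an RGBA tuple with each channel in 0..255 is an assumption based on how rembg treats the value; the signature here only states four integers.

```python
from typing import Optional, Tuple

RGBA = Tuple[int, int, int, int]


def check_bgcolor(bgcolor: Optional[RGBA]) -> Optional[RGBA]:
    """Hypothetical check: accept None or four integers in 0..255 (assumed RGBA)."""
    if bgcolor is None:
        return None
    if len(bgcolor) != 4 or any(
        not isinstance(c, int) or not 0 <= c <= 255 for c in bgcolor
    ):
        raise ValueError(f"bgcolor must be four integers in 0..255, got {bgcolor!r}")
    return tuple(bgcolor)


def make_mapper(
    save_dir: Optional[str] = None,
    bgcolor: Optional[RGBA] = (255, 255, 255, 255),
):
    """Hypothetical factory: alpha matting on, documented default thresholds."""
    # Imported lazily: this needs data-juicer and its rembg dependency installed.
    from data_juicer.ops.mapper.image_remove_background_mapper import (
        ImageRemoveBackgroundMapper,
    )

    return ImageRemoveBackgroundMapper(
        alpha_matting=True,
        alpha_matting_foreground_threshold=240,
        alpha_matting_background_threshold=10,
        alpha_matting_erode_size=10,
        bgcolor=check_bgcolor(bgcolor),
        save_dir=save_dir,
    )
```

For example, make_mapper(save_dir="./cutouts") would request an opaque white background, while make_mapper(bgcolor=None) would leave the background unset.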

process_single(sample, context=False)[source]

Sample-level processing: takes one sample and returns the processed sample (sample –> sample).

Parameters:

sample – sample to process

Returns:

processed sample
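The sample-in, sample-out contract could be exercised roughly as follows. This is a sketch only: make_sample and apply_to_sample are hypothetical helpers, and "images" is assumed to be the key under which the operator looks for image paths.

```python
import copy
from typing import Any, Dict, List


def make_sample(image_paths: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: build a minimal sample holding only image paths."""
    # "images" is an assumed key name; adjust it to the configured image key.
    return {"images": list(image_paths)}


def apply_to_sample(op: Any, sample: Dict[str, Any]) -> Dict[str, Any]:
    """Run a sample-level mapper on a copy so the caller's dict stays unchanged."""
    return op.process_single(copy.deepcopy(sample))
```

With an ImageRemoveBackgroundMapper instance bound to op, apply_to_sample(op, make_sample(["./images/cat.jpg"])) would return the processed sample. Copying first is a design choice for this sketch, so the original sample can still be compared against the result.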