data_juicer.utils.lazy_loader module

A LazyLoader class for on-demand module loading with uv integration.

data_juicer.utils.lazy_loader.get_toml_file_path()[source]

Get the path to the pyproject.toml file.

data_juicer.utils.lazy_loader.get_uv_lock_path()[source]

Get the path to the uv.lock file.

class data_juicer.utils.lazy_loader.LazyLoader(module_name: str, package_name: str = None, package_url: str = None, auto_install: bool = True)[source]

Bases: ModuleType

Lazily import a module, mainly to avoid pulling in large dependencies. Uses uv for fast dependency installation when available.
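The deferred-import idea can be illustrated with a minimal sketch. The class name SimpleLazyModule is hypothetical and this is not the data_juicer implementation: it omits package-name resolution, uv integration, and auto-installation, and only shows how a ModuleType subclass can postpone the real import until an attribute is first accessed.

```python
import importlib
import types


class SimpleLazyModule(types.ModuleType):
    """Hypothetical stand-in that defers the real import until first use."""

    def __init__(self, module_name: str):
        super().__init__(module_name)
        self._module_name = module_name
        self._module = None  # filled in on first attribute access

    def _load(self):
        # Import the real module once and cache it.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module

    def __getattr__(self, item):
        # Called only when normal lookup fails, i.e. for names that
        # live on the real module rather than on this proxy.
        return getattr(self._load(), item)


json_lazy = SimpleLazyModule("json")  # nothing is imported at this point
print(json_lazy.dumps({"a": 1}))      # first access triggers the import
```

Because `__getattr__` is only consulted after ordinary attribute lookup fails, the proxy's own fields (`_module_name`, `_module`) never trigger an import.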

classmethod get_package_name(module_name: str) → str[source]

Convert a module name to its corresponding package name.

Parameters:

module_name – The name of the module (e.g., ‘cv2’, ‘PIL’)

Returns:

The corresponding package name (e.g., ‘opencv-python’, ‘Pillow’)

Return type:

str
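As a rough illustration of this kind of conversion, the sketch below uses a small lookup table with a fallback to the top-level module name. The table MODULE_TO_PACKAGE and the function guess_package_name are hypothetical; the mapping actually used by get_package_name may be larger or built differently.

```python
# Hypothetical lookup table; the real mapping inside data_juicer may differ.
MODULE_TO_PACKAGE = {
    "cv2": "opencv-python",
    "PIL": "Pillow",
}


def guess_package_name(module_name: str) -> str:
    """Return a pip package name for a module name.

    Falls back to the top-level module name when no special case is known.
    """
    base = module_name.split(".")[0]
    return MODULE_TO_PACKAGE.get(base, base)


print(guess_package_name("cv2"))
print(guess_package_name("PIL.Image"))
```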

classmethod reset_dependencies_cache()[source]

Reset the dependencies cache.

classmethod get_all_dependencies()[source]

Get all dependencies, prioritizing uv.lock if available. Falls back to pyproject.toml if uv.lock is not found or fails to parse.

Returns:

A dictionary mapping module names to their full package specifications, e.g., {'numpy': 'numpy>=1.26.4,<2.0.0', 'pandas': 'pandas>=2.0.0'}

Return type:

dict
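To make the shape of the returned dictionary concrete, here is a minimal sketch that turns a list of requirement strings into such a mapping. The helper specs_to_mapping is hypothetical, it does not read uv.lock or pyproject.toml, and as a simplification it keys the result by the distribution name at the start of each specification rather than by import name.

```python
import re


def specs_to_mapping(specs):
    """Map the leading package name of each spec to the full spec string."""
    mapping = {}
    for spec in specs:
        spec = spec.strip()
        # The name is the leading run of letters, digits, '.', '_' and '-'.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec)
        if match:
            mapping[match.group(0)] = spec
    return mapping


deps = specs_to_mapping(["numpy>=1.26.4,<2.0.0", "pandas>=2.0.0"])
print(deps)
```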

classmethod check_packages(package_specs, pip_args=None)[source]

Check if packages are installed and install them if needed.

Parameters:
  • package_specs – A list of package specifications to check/install. Can be package names or URLs (e.g., ‘torch’ or ‘git+https://github.com/…’)

  • pip_args – Optional list of additional arguments to pass to pip install command (e.g., ['--no-deps', '--upgrade'])
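The check-then-install flow might look roughly like the sketch below. Both helpers are hypothetical and use plain pip through the standard library only; the real method may prefer uv when it is available, and its handling of URLs and version constraints is not shown here. The install command is only built, not executed.

```python
import sys
from importlib import metadata


def is_installed(package_name: str) -> bool:
    """Return True if a distribution with this name is installed."""
    try:
        metadata.version(package_name)
        return True
    except metadata.PackageNotFoundError:
        return False


def build_install_command(package_specs, pip_args=None):
    """Build (but do not run) a pip install command for the given specs."""
    return [sys.executable, "-m", "pip", "install",
            *(pip_args or []), *package_specs]


cmd = build_install_command(["torch"], pip_args=["--no-deps"])
print(cmd[1:])
```

A caller could pass the resulting list to `subprocess.check_call` for any package that `is_installed` reports as missing.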

__init__(module_name: str, package_name: str = None, package_url: str = None, auto_install: bool = True)[source]

Initialize the LazyLoader.

Parameters:
  • module_name – The name of the module to import (e.g., ‘cv2’, ‘ray.data’, ‘torchvision.models’)

  • package_name – The name of the pip package to install (e.g., ‘opencv-python’, ‘ray’, ‘torchvision’). If None, the base module name is used (e.g., ‘ray’ for ‘ray.data’).

  • package_url – The URL to install the package from (e.g., git+https://github.com/…)

  • auto_install – Whether to automatically install missing dependencies
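How the constructor arguments could combine into a single install target is sketched below. The function resolve_install_target is hypothetical, and the precedence of package_url over package_name is an assumption; only the fallback to the base module name is stated in the parameter description above.

```python
from typing import Optional


def resolve_install_target(module_name: str,
                           package_name: Optional[str] = None,
                           package_url: Optional[str] = None) -> str:
    """Pick what to hand to the installer for a given module."""
    if package_url:
        # Assumption: an explicit URL takes precedence over a package name.
        return package_url
    if package_name:
        return package_name
    # Documented default: use the base module name, e.g. 'ray' for 'ray.data'.
    return module_name.split(".")[0]


print(resolve_install_target("ray.data"))
print(resolve_install_target("cv2", package_name="opencv-python"))
```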