data_juicer.utils.lazy_loader module

A LazyLoader class for on-demand module loading with uv integration.

data_juicer.utils.lazy_loader.get_toml_file_path()

Get the path to pyproject.toml file.

data_juicer.utils.lazy_loader.get_uv_lock_path()

Get the path to uv.lock file.

class data_juicer.utils.lazy_loader.LazyLoader(module_name: str, package_name: str = None, package_url: str = None, auto_install: bool = True)

Bases: ModuleType

Lazily import a module, mainly to avoid pulling in large dependencies. Uses uv for fast dependency installation when available.
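The general idea of a lazily loaded module can be sketched as follows. This is a minimal illustration of the pattern (a ModuleType subclass that defers the real import until an attribute is first accessed), not the data_juicer implementation; the class name SimpleLazyModule and its private attributes are hypothetical, and automatic installation is left out.

```python
import importlib
import types


class SimpleLazyModule(types.ModuleType):
    """Hypothetical stand-in: imports the real module on first attribute access."""

    def __init__(self, module_name: str):
        super().__init__(module_name)
        self._target_name = module_name
        self._loaded = None  # real module object, filled in on first use

    def _load(self):
        # Import once and cache the result.
        if self._loaded is None:
            self._loaded = importlib.import_module(self._target_name)
        return self._loaded

    def __getattr__(self, item):
        # Only called when normal attribute lookup fails, i.e. for
        # names that live on the real module rather than on this proxy.
        return getattr(self._load(), item)


lazy_json = SimpleLazyModule("json")   # nothing is imported here
text = lazy_json.dumps([1])            # the import happens on this access
```

Creating the proxy is cheap; the cost of importing a heavy dependency is paid only by code paths that actually use it.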

classmethod get_package_name(module_name: str) -> str

Convert a module name to its corresponding package name.

Parameters:

module_name -- The name of the module (e.g., 'cv2', 'PIL')

Returns:

The corresponding package name (e.g., 'opencv-python', 'Pillow')

Return type:

str
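A module-to-package conversion of this kind can be sketched with a lookup table. The table and the function name guess_package_name below are hypothetical; only the two pairs shown come from the examples above, and the real method may use a different or larger mapping.

```python
# Hypothetical lookup table; the two entries mirror the documented examples.
_MODULE_TO_PACKAGE = {
    "cv2": "opencv-python",
    "PIL": "Pillow",
}


def guess_package_name(module_name: str) -> str:
    """Return the pip package for a top-level module name.

    Falls back to the module name itself when no special mapping is
    known (an assumption made for this sketch).
    """
    return _MODULE_TO_PACKAGE.get(module_name, module_name)
```

For most packages the import name and the distribution name match, so the table only needs the exceptions.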

classmethod reset_dependencies_cache()

Reset the dependencies cache.

classmethod get_all_dependencies()

Get all dependencies, prioritizing uv.lock if available. Falls back to pyproject.toml if uv.lock is not found or fails to parse.

Returns:

A dictionary mapping module names to their full package specifications

e.g. {'numpy': 'numpy>=1.26.4,<2.0.0', 'pandas': 'pandas>=2.0.0'}

Return type:

dict
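The shape of the returned dictionary can be illustrated by indexing a list of requirement strings by name. This is a sketch under assumptions: the helper index_by_name is hypothetical, it keys entries by the distribution name parsed from each specification, and it does not read uv.lock or pyproject.toml. How the real method derives module names is not described here.

```python
import re

# Leading run of characters allowed in a distribution name.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def index_by_name(specs):
    """Map each requirement's name to its full specification string."""
    mapping = {}
    for spec in specs:
        spec = spec.strip()
        match = _NAME_RE.match(spec)
        if match:  # skip lines that do not start with a valid name
            mapping[match.group(0)] = spec
    return mapping


deps = index_by_name(["numpy>=1.26.4,<2.0.0", "pandas>=2.0.0"])
```

With the two specifications above, deps has the same form as the example dictionary in the description.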

classmethod check_packages(package_specs, pip_args=None)

Check if packages are installed and install them if needed.

Parameters:
  • package_specs -- A list of package specifications to check/install. Can be package names or URLs (e.g., 'torch' or 'git+https://github.com/...')

  • pip_args -- Optional list of additional arguments to pass to pip install command (e.g., ['--no-deps', '--upgrade'])
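A check-then-install step along these lines can be sketched with importlib.metadata. The helper names find_missing and build_install_command are hypothetical, the sketch only builds the install command rather than running it, it ignores version constraints, and it treats every URL specification as not yet installed. None of this is claimed to match the internals of check_packages.

```python
import re
import sys
from importlib import metadata

_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def find_missing(package_specs):
    """Return the specs whose distribution cannot be found locally."""
    missing = []
    for spec in package_specs:
        match = _NAME_RE.match(spec)
        if "://" in spec or match is None:
            # URLs cannot be looked up by name in this sketch.
            missing.append(spec)
            continue
        try:
            metadata.version(match.group(0))
        except metadata.PackageNotFoundError:
            missing.append(spec)
    return missing


def build_install_command(specs, pip_args=None, use_uv=False):
    """Build (but do not run) an install command for the given specs."""
    if use_uv:
        base = ["uv", "pip", "install"]
    else:
        base = [sys.executable, "-m", "pip", "install"]
    return base + list(pip_args or []) + list(specs)
```

A caller could choose use_uv with shutil.which("uv") is not None and pass the resulting list to subprocess.run.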

__init__(module_name: str, package_name: str = None, package_url: str = None, auto_install: bool = True)

Initialize the LazyLoader.

Parameters:
  • module_name -- The name of the module to import (e.g., 'cv2', 'ray.data', 'torchvision.models')

  • package_name -- The name of the pip package to install (e.g., 'opencv-python', 'ray', 'torchvision'). If None, the base module name is used (e.g., 'ray' for 'ray.data')

  • package_url -- The URL to install the package from (e.g., git+https://github.com/...)

  • auto_install -- Whether to automatically install missing dependencies
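How these parameters might combine into a single install target can be sketched as below. The function resolve_install_target is hypothetical, and the precedence it applies (package_url first, then package_name, then the base module name) is an assumption made for illustration; only the base-module fallback is stated in the parameter description.

```python
from typing import Optional


def resolve_install_target(
    module_name: str,
    package_name: Optional[str] = None,
    package_url: Optional[str] = None,
) -> str:
    """Pick what to hand to the installer for a given module."""
    if package_url:
        # Assumed precedence: an explicit URL wins over a package name.
        return package_url
    if package_name:
        return package_name
    # Documented fallback: 'ray.data' -> 'ray'.
    return module_name.split(".")[0]
```

For example, resolve_install_target('ray.data') yields 'ray', while passing package_name='opencv-python' for 'cv2' yields 'opencv-python'.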