data_juicer.utils.lazy_loader module
A LazyLoader class for on-demand module loading with uv integration.
- class data_juicer.utils.lazy_loader.LazyLoader(module_name: str, package_name: str = None, package_url: str = None, auto_install: bool = True)
Bases: ModuleType
Lazily import a module, mainly to avoid pulling in large dependencies. Uses uv for fast dependency installation when available.
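The lazy-import idea described above can be sketched with a small `ModuleType` subclass. This is a minimal illustration and not data_juicer's implementation: `MiniLazyModule` and its private attributes are hypothetical names, and the sketch omits installation entirely.

```python
import importlib
import types


class MiniLazyModule(types.ModuleType):
    """Hypothetical stand-in for a lazy loader; not data_juicer's code."""

    def __init__(self, module_name: str):
        super().__init__(module_name)
        # Stored in the instance __dict__, so normal lookup finds them
        # without going through __getattr__.
        self._target_name = module_name
        self._real_module = None

    def _load(self) -> types.ModuleType:
        # Import the real module once and cache it.
        if self._real_module is None:
            self._real_module = importlib.import_module(self._target_name)
        return self._real_module

    def __getattr__(self, item: str):
        # Called only when normal attribute lookup fails, i.e. for names
        # that live on the real module. Dunder probes are not forwarded.
        if item.startswith("__") and item.endswith("__"):
            raise AttributeError(item)
        return getattr(self._load(), item)


json_lazy = MiniLazyModule("json")   # this object has not triggered an import yet
text = json_lazy.dumps({"a": 1})     # first attribute access performs the import
```

Deferring the import to first attribute access means a heavy dependency costs nothing until code actually uses it.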
- classmethod get_package_name(module_name: str) → str
Convert a module name to its corresponding package name.
- Parameters:
module_name – The name of the module (e.g., 'cv2', 'PIL')
- Returns:
The corresponding package name (e.g., 'opencv-python', 'Pillow')
- Return type:
str
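A module-to-package lookup of this kind could be sketched as follows. The table holds only the two pairs mentioned in the examples above; `MODULE_TO_PACKAGE` and `guess_package_name` are hypothetical names, and the fallback to the module name itself is an assumption, not documented behavior.

```python
# Illustrative table only; the two entries come from the documented examples.
MODULE_TO_PACKAGE = {
    "cv2": "opencv-python",
    "PIL": "Pillow",
}


def guess_package_name(module_name: str) -> str:
    """Hypothetical helper mapping an import name to a pip package name."""
    # Assumption: modules missing from the table are published under
    # the same name they are imported with.
    return MODULE_TO_PACKAGE.get(module_name, module_name)
```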
- classmethod get_all_dependencies()
Get all dependencies, prioritizing uv.lock if available. Falls back to pyproject.toml if uv.lock is not found or fails to parse.
- Returns:
A dictionary mapping module names to their full package specifications, e.g. {'numpy': 'numpy>=1.26.4,<2.0.0', 'pandas': 'pandas>=2.0.0'}
- Return type:
dict
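To make the return shape and the fallback order concrete, here is a rough sketch under stated assumptions. It does not read uv.lock or pyproject.toml; `index_dependencies` and `load_with_fallback` are hypothetical helpers, the two loaders are passed in as plain callables, and the dictionary is keyed by the distribution name at the start of each requirement string (which matches the module name for packages such as numpy and pandas, but not in general).

```python
import re

# Matches the distribution name at the start of a requirement string.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def index_dependencies(specs):
    """Hypothetical helper: map each requirement's name to its full spec."""
    deps = {}
    for spec in specs:
        match = _NAME_RE.match(spec)
        if match is None:
            continue  # skip entries that do not start with a package name
        deps[match.group(1)] = spec.strip()
    return deps


def load_with_fallback(primary, fallback):
    """Use the preferred source; fall back if it raises or returns nothing."""
    try:
        result = primary()
    except Exception:
        result = None
    return result if result else fallback()
```

In this sketch `primary` would stand for a uv.lock reader and `fallback` for a pyproject.toml reader.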
- classmethod check_packages(package_specs, pip_args=None)
Check if packages are installed and install them if needed.
- Parameters:
package_specs – A list of package specifications to check/install. Can be package names or URLs (e.g., 'torch' or 'git+https://github.com/…')
pip_args – Optional list of additional arguments to pass to the pip install command (e.g., ['--no-deps', '--upgrade'])
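A check-then-install flow might look roughly like the sketch below. It is not the library's implementation: `find_missing` and `build_install_command` are hypothetical helpers, `find_missing` handles plain package names only (not URLs or version specifiers), and the command is built but never executed. Preferring `uv pip install` when a `uv` executable is on the PATH mirrors the class description; the exact detection logic is an assumption.

```python
import shutil
import sys
from importlib import metadata


def find_missing(package_names):
    """Return the names that have no installed distribution metadata."""
    missing = []
    for name in package_names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def build_install_command(package_specs, pip_args=None, use_uv=None):
    """Build (but do not run) an install command, preferring uv if present."""
    if use_uv is None:
        # Assumption: uv counts as available when it is found on the PATH.
        use_uv = shutil.which("uv") is not None
    if use_uv:
        base = ["uv", "pip", "install"]
    else:
        base = [sys.executable, "-m", "pip", "install"]
    return base + list(pip_args or []) + list(package_specs)
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` to perform the installation.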
- __init__(module_name: str, package_name: str = None, package_url: str = None, auto_install: bool = True)
Initialize the LazyLoader.
- Parameters:
module_name – The name of the module to import (e.g., 'cv2', 'ray.data', 'torchvision.models')
package_name – The name of the pip package to install (e.g., 'opencv-python', 'ray', 'torchvision'). If None, the base module name is used (e.g., 'ray' for 'ray.data').
package_url – The URL to install the package from (e.g., git+https://github.com/…)
auto_install – Whether to automatically install missing dependencies
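How these parameters could combine into a single install target is sketched below. `resolve_install_target` is a hypothetical helper, and the precedence of `package_url` over `package_name` is an assumption; only the base-module fallback is stated in the parameter description. The URL in the usage lines is a placeholder.

```python
def resolve_install_target(module_name, package_name=None, package_url=None):
    """Hypothetical helper: choose the string handed to the installer."""
    if package_url:
        # Assumption: an explicit URL takes precedence over a package name.
        return package_url
    if package_name:
        return package_name
    # Documented fallback: the base module name, e.g. 'ray' for 'ray.data'.
    return module_name.split(".")[0]


target_default = resolve_install_target("ray.data")
target_named = resolve_install_target("cv2", package_name="opencv-python")
target_url = resolve_install_target(
    "mypkg", package_url="git+https://example.com/org/mypkg.git"  # placeholder URL
)
```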