
base

Base data module framework that provides abstract interfaces for data pipeline components. It defines the common structure and behavior shared by all data processing modules in the system.

BaseDataModule

Bases: BaseModule

Abstract base class for all data processing modules in the pipeline.

Provides a common interface and metadata management for data operations. All concrete data modules must inherit from this class and implement the run method.

Attributes:

| Name | Type | Description |
|---|---|---|
| module_type | DataModuleType | Type classification of the data module, taken from the DataModuleType enum |
| name | str | Unique identifier for the module instance |
| config | Optional[Dict[str, Any]] | Module-specific configuration parameters |
| metadata | Optional[Dict[str, Any]] | Additional metadata for tracking and debugging |
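To show what "inherit from this class and implement the run method" means in practice, here is a minimal, stdlib-only sketch. It is an assumption-laden stand-in, not rm_gallery code: the real BaseDataModule builds on BaseModule with pydantic Field declarations, and DataSample's real fields are not shown on this page, so both are replaced by hypothetical dataclasses. The MinLengthFilter module and its "min_len" config key are invented for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class DataModuleType(Enum):
    # Only the member this sketch needs; its value matches the enum documented on this page
    PROCESS = "processor"


@dataclass
class DataSample:
    # Hypothetical stand-in for rm_gallery's DataSample (real fields not shown here)
    text: str


@dataclass
class BaseDataModule(ABC):
    # Stdlib stand-in carrying the four documented attributes (the real class uses pydantic)
    module_type: DataModuleType
    name: str
    config: Optional[Dict[str, Any]] = None
    metadata: Optional[Dict[str, Any]] = None

    @abstractmethod
    def run(self, input_data: List[DataSample], **kwargs) -> List[DataSample]:
        raise NotImplementedError


@dataclass
class MinLengthFilter(BaseDataModule):
    # Hypothetical PROCESS module: keeps samples with at least config["min_len"] characters
    def run(self, input_data: List[DataSample], **kwargs) -> List[DataSample]:
        min_len = (self.config or {}).get("min_len", 0)
        return [s for s in input_data if len(s.text) >= min_len]


filt = MinLengthFilter(
    module_type=DataModuleType.PROCESS,
    name="min_length_filter",
    config={"min_len": 5},
)
kept = filt.run([DataSample("hi"), DataSample("hello world")])
print([s.text for s in kept])  # ['hello world']
```

Because run is declared with abstractmethod on an ABC, the stand-in base class cannot be instantiated directly; only subclasses that supply run can.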

Source code in rm_gallery/core/data/base.py
from abc import abstractmethod
from typing import Any, Dict, List, Optional, Union

from pydantic import Field

# BaseModule, BaseDataSet and DataSample come from elsewhere in rm_gallery
# (import paths not shown in this excerpt); DataModuleType is defined in this same module.


class BaseDataModule(BaseModule):
    """
    Abstract base class for all data processing modules in the pipeline.

    Provides a common interface and metadata management for data operations.
    All concrete data modules must inherit from this class and implement the run method.

    Attributes:
        module_type: Type classification of the data module from DataModuleType enum
        name: Unique identifier for the module instance
        config: Module-specific configuration parameters
        metadata: Additional metadata for tracking and debugging
    """

    module_type: DataModuleType = Field(..., description="module type")
    name: str = Field(..., description="module name")
    config: Optional[Dict[str, Any]] = Field(None, description="module config")
    metadata: Optional[Dict[str, Any]] = Field(None, description="metadata")

    @abstractmethod
    def run(self, input_data: Union[BaseDataSet, List[DataSample]], **kwargs):
        """
        Execute the module's data processing logic.

        Args:
            input_data: Input dataset or list of data samples to process
            **kwargs: Additional runtime parameters specific to the module

        Returns:
            Processed data in the form of BaseDataSet or List[DataSample]

        Raises:
            NotImplementedError: If not implemented by concrete subclass
        """
        # Raise explicitly so a subclass that calls super().run() gets the documented error
        raise NotImplementedError

    def get_module_info(self) -> Dict[str, Any]:
        """
        Retrieve comprehensive module information for debugging and monitoring.

        Returns:
            Dict containing module type, name, configuration, and metadata,
            used for pipeline introspection and debugging.
        """
        # config is annotated as a plain dict, which has no pydantic model_dump();
        # return a shallow copy instead
        config_dict = dict(self.config) if self.config else None
        return {
            "type": self.module_type.value,
            "name": self.name,
            "config": config_dict,
            "metadata": self.metadata,
        }
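Because every module shares the same run signature, modules can be chained by feeding one module's output into the next, which is the kind of workflow the BUILD type is described as orchestrating. The sketch below shows that chaining idea only; SequentialBuilder, ListLoader and Deduplicate are hypothetical names, plain strings stand in for DataSample objects, and the base class is a stdlib stand-in rather than rm_gallery's actual builder.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class DataModuleType(Enum):
    # Subset of the documented enum members
    BUILD = "builder"
    LOAD = "loader"
    PROCESS = "processor"


@dataclass
class BaseDataModule(ABC):
    # Stdlib stand-in for the documented class (the real one is pydantic-based)
    module_type: DataModuleType
    name: str
    config: Optional[Dict[str, Any]] = None
    metadata: Optional[Dict[str, Any]] = None

    @abstractmethod
    def run(self, input_data, **kwargs):
        raise NotImplementedError


@dataclass
class ListLoader(BaseDataModule):
    # Hypothetical LOAD module: ignores its input and returns config["items"]
    def run(self, input_data, **kwargs):
        return list((self.config or {}).get("items", []))


@dataclass
class Deduplicate(BaseDataModule):
    # Hypothetical PROCESS module: drops repeats, keeping the first occurrence
    def run(self, input_data, **kwargs):
        return list(dict.fromkeys(input_data))


@dataclass
class SequentialBuilder(BaseDataModule):
    # Hypothetical BUILD module: passes each stage's output to the next stage
    stages: List[BaseDataModule] = field(default_factory=list)

    def run(self, input_data, **kwargs):
        data = input_data
        for stage in self.stages:
            data = stage.run(data, **kwargs)
        return data


builder = SequentialBuilder(
    module_type=DataModuleType.BUILD,
    name="demo_build",
    stages=[
        ListLoader(DataModuleType.LOAD, "list_loader", config={"items": ["a", "b", "a"]}),
        Deduplicate(DataModuleType.PROCESS, "dedup"),
    ],
)
print(builder.run(None))  # ['a', 'b']
```

Treating the builder itself as a BaseDataModule keeps the interface uniform: a whole pipeline can be nested as a single stage inside a larger one.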

get_module_info()

Retrieve comprehensive module information for debugging and monitoring.

Returns:

| Type | Description |
|---|---|
| Dict[str, Any] | Dict with the keys "type" (the DataModuleType value string), "name", "config", and "metadata"; used for pipeline introspection and debugging |
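The shape of the returned dictionary follows directly from the method body: the enum member is reported by its string value, and config and metadata pass through. The sketch below reproduces that logic on a hypothetical stdlib stand-in (DemoModule), treating config as a plain dict to match its Optional[Dict[str, Any]] annotation; the "jsonl_loader" name and "path" key are illustrative values only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DataModuleType(Enum):
    LOAD = "loader"


@dataclass
class DemoModule:
    # Hypothetical stand-in carrying the four documented attributes
    module_type: DataModuleType
    name: str
    config: Optional[Dict[str, Any]] = None
    metadata: Optional[Dict[str, Any]] = None

    def get_module_info(self) -> Dict[str, Any]:
        # Shallow-copy the config dict; report the enum by its string value
        config_dict = dict(self.config) if self.config else None
        return {
            "type": self.module_type.value,
            "name": self.name,
            "config": config_dict,
            "metadata": self.metadata,
        }


module = DemoModule(DataModuleType.LOAD, "jsonl_loader", config={"path": "data.jsonl"})
info = module.get_module_info()
print(info)
# {'type': 'loader', 'name': 'jsonl_loader', 'config': {'path': 'data.jsonl'}, 'metadata': None}
```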


run(input_data, **kwargs) abstractmethod

Execute the module's data processing logic.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| input_data | Union[BaseDataSet, List[DataSample]] | Input dataset or list of data samples to process | required |
| **kwargs | | Additional runtime parameters specific to the module | {} |

Returns:

| Type | Description |
|---|---|
| | Processed data in the form of BaseDataSet or List[DataSample] |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | If the method is not implemented by a concrete subclass |
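Since input_data may arrive either as a dataset or as a bare list of samples, a concrete run typically normalizes it first. The sketch below is one way to do that under stated assumptions: BaseDataSet and DataSample are hypothetical stdlib stand-ins, the dataset attribute name samples is an assumption (the real BaseDataSet fields are not shown on this page), and the limit keyword is an invented example of a module-specific runtime parameter passed through **kwargs.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataSample:
    # Hypothetical stand-in for rm_gallery's DataSample
    text: str


@dataclass
class BaseDataSet:
    # Hypothetical stand-in; the attribute name `samples` is an assumption
    name: str
    samples: List[DataSample] = field(default_factory=list)


def as_sample_list(input_data: Union[BaseDataSet, List[DataSample]]) -> List[DataSample]:
    """Normalize either accepted input form to a plain list of samples."""
    if isinstance(input_data, BaseDataSet):
        return list(input_data.samples)
    return list(input_data)


def run(input_data: Union[BaseDataSet, List[DataSample]], **kwargs) -> List[DataSample]:
    # `limit` is a hypothetical module-specific runtime parameter
    limit = kwargs.get("limit")
    samples = as_sample_list(input_data)
    return samples[:limit] if limit is not None else samples


ds = BaseDataSet(name="demo", samples=[DataSample("a"), DataSample("b"), DataSample("c")])
print(len(run(ds)))                                           # 3
print(len(run([DataSample("x"), DataSample("y")], limit=1)))  # 1
```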


DataModuleType

Bases: Enum

Enumeration of supported data module types for categorizing processing components.

Each type represents a distinct stage in the data pipeline:

- BUILD: Orchestrates the entire data pipeline workflow
- LOAD: Ingests data from external sources
- PROCESS: Transforms and filters existing data
- ANNOTATION: Adds labels and metadata to data
- EXPORT: Outputs data to various target formats

Source code in rm_gallery/core/data/base.py

from enum import Enum


class DataModuleType(Enum):
    """
    Enumeration of supported data module types for categorizing processing components.

    Each type represents a distinct stage in the data pipeline:
    - BUILD: Orchestrates the entire data pipeline workflow
    - LOAD: Ingests data from external sources
    - PROCESS: Transforms and filters existing data
    - ANNOTATION: Adds labels and metadata to data
    - EXPORT: Outputs data to various target formats
    """

    # Member values are the strings reported under "type" by BaseDataModule.get_module_info()
    BUILD = "builder"
    LOAD = "loader"
    PROCESS = "processor"
    ANNOTATION = "annotator"
    EXPORT = "exporter"
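DataModuleType is a plain Enum whose values are lowercase role strings, so a member can be recovered from the string that get_module_info() stores under "type". The snippet below re-declares the members listed above so it runs on its own and shows the standard Enum lookups by value and by name.

```python
from enum import Enum


class DataModuleType(Enum):
    # Members and values as listed in the source above
    BUILD = "builder"
    LOAD = "loader"
    PROCESS = "processor"
    ANNOTATION = "annotator"
    EXPORT = "exporter"


print(DataModuleType.LOAD.value)         # loader
print(DataModuleType("annotator"))       # DataModuleType.ANNOTATION
print(DataModuleType["EXPORT"].value)    # exporter
print([m.name for m in DataModuleType])  # ['BUILD', 'LOAD', 'PROCESS', 'ANNOTATION', 'EXPORT']
```

Lookup by value (DataModuleType("annotator")) raises ValueError for an unknown string, which makes it a convenient check when reading module info back from logs or configs.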