build
Data Build Module - core data pipeline orchestrator for end-to-end data processing. Coordinates loading, processing, annotation, and export stages with flexible configuration.
DataBuilder
Bases: BaseDataModule
Main pipeline orchestrator that coordinates all data processing stages.
Manages the complete data workflow from raw input to final export format, executing each stage in sequence while maintaining data integrity and logging.
Attributes:
Name | Type | Description |
---|---|---|
`load_module` | `Optional[DataLoader]` | Optional data loading component for ingesting external data |
`process_module` | `Optional[DataProcessor]` | Optional processing component for filtering and transforming data |
`annotation_module` | `Optional[DataAnnotator]` | Optional annotation component for adding labels and metadata |
`export_module` | `Optional[DataExporter]` | Optional export component for outputting data in target formats |
Source code in rm_gallery/core/data/build.py
__init__(name, config=None, metadata=None, **modules)
Initialize the data build pipeline with specified modules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | Unique identifier for the pipeline instance | required |
`config` | `Optional[Dict[str, Any]]` | Pipeline-level configuration parameters | `None` |
`metadata` | `Optional[Dict[str, Any]]` | Additional metadata for tracking and debugging | `None` |
`**modules` | | Keyword arguments for individual pipeline modules | `{}` |
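To illustrate how the `**modules` keyword arguments could map onto the four optional attributes, here is a minimal sketch. `SketchBuilder` and `MODULE_SLOTS` are hypothetical stand-ins written for this example, not the `rm_gallery` class; the slot names are taken from the Attributes table, and rejecting unknown keywords is an assumption that the real constructor may not share.

```python
from typing import Any, Dict, Optional

# The four module slots named in the Attributes table.
MODULE_SLOTS = ("load_module", "process_module", "annotation_module", "export_module")


class SketchBuilder:
    """Hypothetical stand-in for DataBuilder; not the rm_gallery implementation."""

    def __init__(
        self,
        name: str,
        config: Optional[Dict[str, Any]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **modules: Any,
    ) -> None:
        self.name = name
        self.config = config or {}
        self.metadata = metadata or {}
        # Assumption: unknown keywords are rejected; the real class may ignore them.
        unknown = set(modules) - set(MODULE_SLOTS)
        if unknown:
            raise TypeError(f"unknown module keyword(s): {sorted(unknown)}")
        # Slots that were not passed stay None, meaning the stage is not configured.
        for slot in MODULE_SLOTS:
            setattr(self, slot, modules.get(slot))


# Only a processing stage is supplied; the other three slots remain None.
builder = SketchBuilder("demo", config={"seed": 0}, process_module=object())
configured = [slot for slot in MODULE_SLOTS if getattr(builder, slot) is not None]
print(configured)  # ['process_module']
```

Leaving unset slots as `None` is what lets a later run step decide, per stage, whether there is anything to execute.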
run(input_data=None, **kwargs)
Execute the complete data processing pipeline with all configured stages.
Processes data through sequential stages: loading → processing → annotation → export. Each stage is optional and only executed if the corresponding module is configured.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`input_data` | `Union[BaseDataSet, List[DataSample], None]` | Initial dataset, list of samples, or None for load-only pipelines | `None` |
`**kwargs` | | Additional runtime parameters passed to individual modules | `{}` |
Returns:
Type | Description |
---|---|
`BaseDataSet` | Final processed dataset after all stages complete |
Raises:
Type | Description |
---|---|
`Exception` | If any pipeline stage fails, with detailed error logging |
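The sequential, skip-if-unconfigured behaviour described for `run` can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `run_stages` and the `Stage` alias are hypothetical names, stages are modelled as plain callables (how the real modules are invoked is not shown in this page), and plain lists stand in for `BaseDataSet`.

```python
import logging
from typing import Any, Callable, List, Optional, Tuple

logger = logging.getLogger("pipeline_sketch")

# A stage is a (name, callable) pair; None means the stage is not configured.
Stage = Tuple[str, Optional[Callable[[Any], Any]]]


def run_stages(stages: List[Stage], input_data: Any = None) -> Any:
    """Run configured stages in order, skipping any stage set to None."""
    data = input_data
    for stage_name, stage in stages:
        if stage is None:
            continue  # optional stage that was not configured
        try:
            data = stage(data)
        except Exception:
            # Log with traceback, then let the caller see the original error.
            logger.exception("stage %r failed", stage_name)
            raise
    return data


stages: List[Stage] = [
    ("load", lambda _: [3, -1, 2]),                    # ignores input, produces data
    ("process", lambda xs: [x for x in xs if x > 0]),  # filter step
    ("annotation", None),                              # not configured, skipped
    ("export", lambda xs: sorted(xs)),
]
result = run_stages(stages)
print(result)  # [2, 3]
```

Passing `input_data=None` works here because the first stage produces the data itself, which mirrors the "load-only" case in the parameter table.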
create_builder(name, config=None, **modules)
Factory function to create a data build module with specified configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | Unique identifier for the pipeline | required |
`config` | `Optional[Dict[str, Any]]` | Pipeline configuration parameters | `None` |
`**modules` | | Individual module instances to include in the pipeline | `{}` |
Returns:
Type | Description |
---|---|
`DataBuilder` | Configured DataBuilder instance ready for execution |
create_builder_from_yaml(config_path)
Create a data build module from a YAML configuration file.
Supports comprehensive pipeline configuration including data sources, processing operators, annotation settings, and export formats.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`config_path` | `str` | Path to YAML configuration file | required |
Returns:
Type | Description |
---|---|
`DataBuilder` | Fully configured DataBuilder instance based on YAML specification |
Raises:
Type | Description |
---|---|
`FileNotFoundError` | If configuration file does not exist |
`ValueError` | If configuration format is invalid |
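The two documented failure modes can be illustrated with a small loader sketch. `read_yaml_config` and `check_config` are hypothetical helpers, not part of `rm_gallery`. The sketch assumes PyYAML is installed and checks only that the top level is a mapping, because the actual configuration schema is not described in this page; what the real function treats as an invalid format may differ.

```python
from pathlib import Path
from typing import Any, Dict


def check_config(raw: Any) -> Dict[str, Any]:
    """Reject parsed YAML whose top level is not a mapping."""
    if not isinstance(raw, dict):
        raise ValueError(
            f"expected a mapping at the top level, got {type(raw).__name__}"
        )
    return raw


def read_yaml_config(config_path: str) -> Dict[str, Any]:
    """Read a YAML file, raising the same error types the table lists."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"configuration file not found: {path}")
    # PyYAML is third-party; imported here so the path check works without it.
    import yaml

    with path.open("r", encoding="utf-8") as handle:
        try:
            raw = yaml.safe_load(handle)
        except yaml.YAMLError as exc:
            # Surface parse errors as ValueError (an assumption of this sketch).
            raise ValueError(f"invalid YAML in {path}: {exc}") from exc
    return check_config(raw)
```

Checking for the file before parsing keeps the two error cases distinct: a missing path never reaches the YAML parser, and a malformed file never looks like a missing one.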