data_juicer.core package¶
Submodules¶
data_juicer.core.adapter module¶
data_juicer.core.analyzer module¶
data_juicer.core.data module¶
data_juicer.core.executor module¶
data_juicer.core.exporter module¶
data_juicer.core.monitor module¶
- class data_juicer.core.monitor.Monitor[source]¶
Bases: object
Monitor resource utilization and other information during data processing.
Resource utilization dict (one for each func):

    {
        'time': 10,
        'sampling interval': 0.5,
        'resource': [
            {'timestamp': xxx, 'CPU count': xxx, 'GPU free mem.': xxx, …},
            {'timestamp': xxx, 'CPU count': xxx, 'GPU free mem.': xxx, …},
        ]
    }
Based on the structure above, the resource utilization analysis result adds extra fields at the first level:

    {
        'time': 10,
        'sampling interval': 0.5,
        'resource': […],
        'resource_analysis': {
            'GPU free mem.': {'max': xxx, 'min': xxx, 'avg': xxx},
        }
    }
Only those fields in DYNAMIC_FIELDS will be analyzed.
- DYNAMIC_FIELDS = {'Available mem.', 'CPU util.', 'Free mem.', 'GPU free mem.', 'GPU used mem.', 'GPU util.', 'Mem. util.', 'Used mem.'}¶
- static monitor_current_resources()[source]¶
Detect the resource utilization of the current environment/machine. All “util.” values are ratios in the range [0.0, 1.0]. All “mem.” values are in MB.
- static analyze_resource_util_list(resource_util_list)[source]¶
Analyze the resource utilization for a given resource util list. Compute {‘max’, ‘min’, ‘avg’} of resource metrics for each dict item.
- static analyze_single_resource_util(resource_util_dict)[source]¶
Analyze the resource utilization for a single resource util dict. Compute {‘max’, ‘min’, ‘avg’} of each resource metric.
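The analysis step described above can be illustrated with a minimal standalone sketch. It does not call data_juicer: `analyze_single` and `analyze_list` are hypothetical stand-ins for the two static methods, the `DYNAMIC_FIELDS` set is copied from the attribute listed above, and skipping fields that never appear in the samples is an assumption rather than documented behavior.

```python
# Copied from the DYNAMIC_FIELDS attribute documented above.
DYNAMIC_FIELDS = {
    'Available mem.', 'CPU util.', 'Free mem.', 'GPU free mem.',
    'GPU used mem.', 'GPU util.', 'Mem. util.', 'Used mem.',
}


def analyze_single(resource_util_dict):
    """Hypothetical stand-in: add 'resource_analysis' to one util dict."""
    samples = resource_util_dict.get('resource', [])
    analysis = {}
    for field in sorted(DYNAMIC_FIELDS):
        values = [s[field] for s in samples if field in s]
        if not values:
            continue  # assumption: fields that were never sampled are skipped
        analysis[field] = {
            'max': max(values),
            'min': min(values),
            'avg': sum(values) / len(values),
        }
    result = dict(resource_util_dict)  # shallow copy; input is left unchanged
    result['resource_analysis'] = analysis
    return result


def analyze_list(resource_util_list):
    """Hypothetical stand-in: analyze every util dict in a list."""
    return [analyze_single(d) for d in resource_util_list]


example = {
    'time': 10,
    'sampling interval': 0.5,
    'resource': [
        {'timestamp': 0.0, 'CPU count': 8, 'GPU free mem.': 1000.0},
        {'timestamp': 0.5, 'CPU count': 8, 'GPU free mem.': 500.0},
    ],
}
analyzed = analyze_single(example)
print(analyzed['resource_analysis'])
# {'GPU free mem.': {'max': 1000.0, 'min': 500.0, 'avg': 750.0}}
```

Note that 'CPU count' is not in `DYNAMIC_FIELDS`, so it gets no entry under 'resource_analysis' in this sketch.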
- static monitor_func(func, args=None, sample_interval=0.5)[source]¶
Process the input dataset and probe related information for each OP in the specified operator list.
For now, we support the following targets to probe:
“resource”: resource utilization for each OP.
“speed”: average processing speed for each OP.
The probe result is a list and each item in the list is the probe result for each OP.
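One way a function could be monitored at a fixed sampling interval is sketched below, using only the standard library. This is not the library's implementation: `monitor_func_sketch` and `sample_resources` are hypothetical names, the sampler records only a timestamp and CPU count, and treating a dict `args` as keyword arguments and a list or tuple as positional arguments is an assumption.

```python
import os
import threading
import time


def sample_resources():
    """Hypothetical sampler; a real one would also report memory and GPU stats."""
    return {'timestamp': time.time(), 'CPU count': os.cpu_count()}


def monitor_func_sketch(func, args=None, sample_interval=0.5):
    """Run func while sampling resources; return (result, util_dict)."""
    if args is None:
        args = {}
    samples = []
    stop = threading.Event()

    def sampler():
        # Take one sample immediately, then one per interval until stopped.
        while True:
            samples.append(sample_resources())
            if stop.wait(sample_interval):
                break

    worker = threading.Thread(target=sampler, daemon=True)
    start = time.time()
    worker.start()
    try:
        # Assumption: dict args are keyword arguments, sequences are positional.
        result = func(**args) if isinstance(args, dict) else func(*args)
    finally:
        stop.set()
        worker.join()
    elapsed = time.time() - start
    util = {
        'time': elapsed,
        'sampling interval': sample_interval,
        'resource': samples,
    }
    return result, util


def busy(n):
    """Toy workload used only for this example."""
    time.sleep(0.25)
    return n * 2


output, util = monitor_func_sketch(busy, args={'n': 21}, sample_interval=0.1)
print(output, sorted(util))
```

The returned dict has the same first-level keys as the resource utilization dict described in the class docstring, so it could be passed to an analysis step that computes max, min, and avg per field.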