data_juicer.core.data.schema module
- class data_juicer.core.data.schema.Schema(column_types: Dict[str, Any], columns: List[str])
Bases: object
Dataset schema representation.
- column_types
Mapping of column names to their types
- Type:
Dict[str, Any]
- columns
List of column names in order
- Type:
List[str]
- column_types: Dict[str, Any]
- columns: List[str]
- classmethod map_hf_type_to_python(feature)
Map HuggingFace feature type to Python type.
Recursively maps nested types (e.g., List[str], Dict[str, int]).
Examples
Value('string') -> str
Sequence(Value('int32')) -> List[int]
Dict({'text': Value('string')}) -> Dict[str, Any]
- Parameters:
feature -- HuggingFace feature type
- Returns:
Corresponding Python type
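The recursive mapping described for map_hf_type_to_python can be illustrated with a minimal, dependency-free sketch. It is not the data_juicer implementation: the `Value` and `Sequence` classes below are hypothetical stand-ins for the HuggingFace feature types of the same name, `map_feature_sketch` is an illustrative name, and the dtype table is an assumption that the real method may extend or handle differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Hypothetical stand-ins for the HuggingFace `Value` and `Sequence`
# feature types, so the sketch runs without the `datasets` package.
@dataclass
class Value:
    dtype: str


@dataclass
class Sequence:
    feature: Any


# Assumed dtype table; the real method may recognise more dtypes.
_DTYPE_TO_PYTHON = {
    "string": str,
    "int32": int,
    "int64": int,
    "float32": float,
    "float64": float,
    "bool": bool,
}


def map_feature_sketch(feature: Any) -> Any:
    """Map a (stand-in) feature to a Python type, recursing into sequences."""
    if isinstance(feature, Value):
        # Scalar leaf: look up the dtype, fall back to Any if unknown.
        return _DTYPE_TO_PYTHON.get(feature.dtype, Any)
    if isinstance(feature, Sequence):
        # Recurse on the element type, then wrap it in List[...].
        return List[map_feature_sketch(feature.feature)]
    if isinstance(feature, dict):
        # Nested struct: reported as Dict[str, Any], as in the documented example.
        return Dict[str, Any]
    return Any


nested = map_feature_sketch(Sequence(Sequence(Value("string"))))
```

Here `nested` is built by two levels of recursion, one per enclosing `Sequence`, which is the behaviour the "recursively maps nested types" note refers to.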
- classmethod map_ray_type_to_python(ray_type: DataType)
Map Ray/Arrow data type to Python type.
- Parameters:
ray_type -- PyArrow DataType
- Returns:
Corresponding Python type
- __init__(column_types: Dict[str, Any], columns: List[str]) -> None
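The constructor signature suggests that a schema is a plain container pairing a name-to-type mapping with an ordered list of column names. The sketch below shows that relationship using a stand-in class; `SchemaSketch` is a hypothetical name introduced only so the example runs without data_juicer installed, and it is assumed (not confirmed by this page) that every entry in `columns` has a key in `column_types`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Hypothetical stand-in mirroring the documented fields of Schema.
@dataclass
class SchemaSketch:
    column_types: Dict[str, Any]  # column name -> Python type
    columns: List[str]            # column names, in dataset order


schema = SchemaSketch(
    column_types={"text": str, "score": float},
    columns=["text", "score"],
)

# Walk the columns in their declared order and look up each type.
ordered_types = [schema.column_types[name] for name in schema.columns]
```

Keeping `columns` as a separate list makes the column order explicit rather than relying on the iteration order of the mapping.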