data_juicer.core.data.schema module

class data_juicer.core.data.schema.Schema(column_types: Dict[str, Any], columns: List[str])[source]

Bases: object

Dataset schema representation.

column_types

Mapping of column names to their types

Type:

Dict[str, Any]

columns

List of column names in order

Type:

List[str]
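
To illustrate how the two attributes relate, here is a minimal sketch. `SchemaSketch` is a hypothetical stand-in that only mirrors the documented fields (`column_types`, `columns`); it is not the library class, and the column names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SchemaSketch:
    """Hypothetical stand-in mirroring the documented Schema fields."""

    column_types: Dict[str, Any]  # column name -> type
    columns: List[str]  # column names, in dataset order


# Example columns are assumptions made up for this sketch.
schema = SchemaSketch(
    column_types={"text": str, "tokens": List[str], "meta": Dict[str, Any]},
    columns=["text", "tokens", "meta"],
)

# Walk the columns in order and look up each type by name.
for name in schema.columns:
    print(name, schema.column_types[name])
```

Keeping `columns` as a separate list makes the column order explicit, while `column_types` provides lookup by name.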

column_types: Dict[str, Any]
columns: List[str]
classmethod from_hf_features(features: Features)[source]
classmethod from_ray_schema(schema)[source]
classmethod map_hf_type_to_python(feature)[source]

Map HuggingFace feature type to Python type.

Recursively maps nested types (e.g., List[str], Dict[str, int]).

Example

Value('string') -> str
Sequence(Value('int32')) -> List[int]
Dict({'text': Value('string')}) -> Dict[str, Any]

Parameters:

feature -- HuggingFace feature type

Returns:

Corresponding Python type
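
The recursive mapping described above might look roughly like the following sketch. It is not the library's implementation: `Value` and `Sequence` here are hypothetical minimal stand-ins for the HuggingFace feature classes (so the snippet runs without `datasets` installed), `map_feature_to_python` is an illustrative name, and the dtype prefixes it checks are assumptions.

```python
from typing import Any, Dict, List


class Value:
    """Hypothetical stand-in for a scalar feature such as Value('string')."""

    def __init__(self, dtype: str):
        self.dtype = dtype


class Sequence:
    """Hypothetical stand-in for a list-like feature wrapping an inner feature."""

    def __init__(self, feature):
        self.feature = feature


def map_feature_to_python(feature):
    """Map a (stand-in) feature to a Python type, recursing into sequences."""
    if isinstance(feature, Value):
        dtype = feature.dtype
        if dtype in ("string", "large_string"):
            return str
        if dtype.startswith(("int", "uint")):
            return int
        if dtype.startswith("float"):
            return float
        if dtype == "bool":
            return bool
        return Any  # unknown scalar dtype: fall back to Any
    if isinstance(feature, Sequence):
        # Recurse so nested sequences become nested List[...] types.
        return List[map_feature_to_python(feature.feature)]
    if isinstance(feature, dict):
        # Matches the documented example: dict features map to Dict[str, Any].
        return Dict[str, Any]
    return Any


print(map_feature_to_python(Value("string")))
print(map_feature_to_python(Sequence(Value("int32"))))
print(map_feature_to_python({"text": Value("string")}))
```

Falling back to `Any` for unrecognized features is a choice made for this sketch; the actual method may handle such cases differently.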

classmethod map_ray_type_to_python(ray_type: DataType)[source]

Map Ray/Arrow data type to Python type.

Parameters:

ray_type -- PyArrow DataType

Returns:

Corresponding Python type

__init__(column_types: Dict[str, Any], columns: List[str]) → None