Schema
schema
T = TypeVar('T') module-attribute
unify_schemas(this: pa.Schema, other: Optional[pa.Schema]) -> pa.Schema
infer_schema(data: Dict[str, Any], metadata: Optional[Dict[str, Any]] = None) -> Self
validate(schema: pa.Schema, data: dict) -> bool
Validates a dictionary against a PyArrow schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| schema | Schema | The PyArrow schema to validate against. | required |
| data | dict | The dictionary to validate. | required |
Raises:
| Type | Description |
|---|---|
| ValidationError | If the dictionary does not match the schema. |
Examples:
>>> schema = pa.schema([pa.field('id', pa.int64()), pa.field('name', pa.string())])
>>> valid_dict = {'id': 1, 'name': 'Alice'}
>>> validate(schema, valid_dict)
>>> invalid_dict = {'id': '1', 'name': 'Alice'}
>>> validate(schema, invalid_dict)
Traceback (most recent call last):
...
ValidationError: ...
get_schema(data: T) -> Tuple[T, pa.Schema]
Extracts the schema from a PyArrow schema or a generator of PyArrow record batches.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Any | The PyArrow schema or generator of record batches. | required |
Returns:
| Type | Description |
|---|---|
| Tuple[Any, Schema] | The data and the schema. |
from_dataclass(cls: T) -> pa.Schema
Converts a dataclass to a PyArrow schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cls | Type[T] | The dataclass to convert. | required |
Returns:
| Type | Description |
|---|---|
| Schema | The PyArrow schema. |
Examples:
>>> import dataclasses
>>> @dataclasses.dataclass
... class Record:
...     id: int
...     name: str
>>> from_dataclass(Record)
pyarrow.Schema([...])
get_schema_from_dataclass(*args, **kwargs) -> pa.Schema
Alias for from_dataclass.
Examples:
>>> import dataclasses
>>> @dataclasses.dataclass
... class Record:
...     id: int
...     name: str
>>> get_schema_from_dataclass(Record)
pyarrow.Schema([...])