Schema

schema

T = TypeVar('T')  (module attribute)

unify_schemas(this: pa.Schema, other: Optional[pa.Schema]) -> pa.Schema

infer_schema(data: Dict[str, Any], metadata: Optional[Dict[str, Any]] = None) -> Self

validate(schema: pa.Schema, data: dict) -> bool

Validates a dictionary against a PyArrow schema.

Parameters:

Name    Type    Description                              Default
schema  Schema  The PyArrow schema to validate against.  required
data    dict    The dictionary to validate.              required

Raises:

Type             Description
ValidationError  If the dictionary does not match the schema.

Examples:

>>> schema = pa.schema([pa.field('id', pa.int64()), pa.field('name', pa.string())])
>>> valid_dict = {'id': 1, 'name': 'Alice'}
>>> validate(schema, valid_dict)
>>> invalid_dict = {'id': '1', 'name': 'Alice'}
>>> validate(schema, invalid_dict)
Traceback (most recent call last):
...
ValidationError: ...

get_schema(data: T) -> Tuple[T, pa.Schema]

Extracts the schema from a PyArrow schema or a generator of PyArrow record batches.

Parameters:

Name  Type  Description                                         Default
data  Any   The PyArrow schema or generator of record batches.  required

Returns:

Type                Description
Tuple[Any, Schema]  The data and the schema.

from_dataclass(cls: T) -> pa.Schema

Converts a dataclass to a PyArrow schema.

Parameters:

Name  Type     Description                Default
cls   Type[T]  The dataclass to convert.  required

Returns:

Type    Description
Schema  The PyArrow schema.

Examples:

>>> import dataclasses
>>> @dataclasses.dataclass
... class Record:
...     id: int
...     name: str
>>> from_dataclass(Record)
pyarrow.Schema([...])

get_schema_from_dataclass(*args, **kwargs) -> pa.Schema

Alias for from_dataclass.

Examples:

>>> import dataclasses
>>> @dataclasses.dataclass
... class Record:
...     id: int
...     name: str
>>> get_schema_from_dataclass(Record)
pyarrow.Schema([...])