Loaders

loaders

P = ParamSpec('P') module-attribute

R = TypeVar('R') module-attribute

F = TypeVar('F', bound=Callable[..., Any]) module-attribute

loaders: Dict[str, DatasetLoader] = {} module-attribute

DatasetLoader

Bases: BaseDatasetLoader

func = func instance-attribute

name = name or self.func.__name__ instance-attribute

extensions = extensions instance-attribute

path_arg = path_arg instance-attribute

wraps = wraps instance-attribute

__init__(func: Callable[..., Any], name: Optional[str] = None, extensions: Optional[list[str]] = None, path_arg: Optional[str] = None, wraps: Optional[Callable[P, R]] = None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| func | Callable[..., Any] | The function to decorate. | required |
| name | Optional[str] | The name of the loader. When None, the name of func (func.__name__) is used. | None |
| extensions | Optional[list[str]] | The file extensions that the loader supports. | None |
| path_arg | Optional[str] | The name of the argument that holds the path. | None |
| wraps | Optional[Callable[..., Any]] | The function to wrap. | None |
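The attribute list and constructor above can be illustrated with a small stand-in. This is only a sketch, not the library's implementation: `SketchLoader` and `make_rows` are hypothetical names, and the class reproduces nothing beyond the documented attributes and the forwarding behaviour of `__call__`.

```python
from typing import Any, Callable, Dict, List, Optional


class SketchLoader:
    """Hypothetical stand-in that mirrors the documented attributes."""

    def __init__(
        self,
        func: Callable[..., Any],
        name: Optional[str] = None,
        extensions: Optional[List[str]] = None,
        path_arg: Optional[str] = None,
        wraps: Optional[Callable[..., Any]] = None,
    ) -> None:
        self.func = func
        # Fall back to the function's own name when no name is given.
        self.name = name or self.func.__name__
        self.extensions = extensions
        self.path_arg = path_arg
        self.wraps = wraps

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Forward all arguments to the wrapped function unchanged.
        return self.func(*args, **kwargs)


def make_rows(count: int) -> List[Dict[str, int]]:
    """Hypothetical loader function used only for this illustration."""
    return [{"id": index} for index in range(count)]


default_named = SketchLoader(make_rows)
custom_named = SketchLoader(make_rows, name="rows")
rows_from_call = default_named(2)
```

Here `default_named.name` is taken from the function, while `custom_named.name` is the explicit value.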

__call__(*args: P.args, **kwargs: P.kwargs) -> R

Call the loader function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| args | tuple | The arguments to pass to the function. | () |
| kwargs | dict | The keyword arguments to pass to the function. | {} |

Returns:

| Type | Description |
| --- | --- |
| R | The result of the function. |

bind(*args: P.args, **kwargs: P.kwargs) -> Callable[..., R]

Bind arguments to the loader function.

Notes

This method creates a partial function with pre-filled positional and keyword arguments. Binding the arguments in this way helps make the fingerprint of the dataset more distinctive.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| args | tuple | The arguments to pre-fill. | () |
| kwargs | dict | The keyword arguments to pre-fill. | {} |

Returns:

| Type | Description |
| --- | --- |
| Callable[..., R] | The partial function. |
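The idea of pre-filling arguments can be sketched with the standard library's `functools.partial`. This is an illustration under assumptions, not the library's code: `bind_sketch` and `load_rows` are hypothetical names, and whether `bind` itself uses `functools.partial` is not stated in this reference.

```python
from functools import partial
from typing import Any, Callable, Dict, List


def load_rows(path: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Hypothetical loader function used only for this illustration."""
    return [{"path": path, "row": index} for index in range(limit)]


def bind_sketch(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Callable[..., Any]:
    # Pre-fill positional and keyword arguments; the result can be
    # called later with no further arguments.
    return partial(func, *args, **kwargs)


bound = bind_sketch(load_rows, "data.jsonl", limit=2)
bound_rows = bound()
```

The pre-filled values remain inspectable on the partial object (`bound.args` and `bound.keywords`), so two bindings of the same function with different arguments can be told apart.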

dataloader(func: Union[F, str, None] = None, name: Optional[str] = None, extensions: Optional[list[str]] = None, wraps: Optional[Callable[..., Any]] = None, path_arg: Optional[str] = None) -> Union[F, Callable[[F], F]]

dataloader(func: F, name: Optional[str] = None, extensions: Optional[list[str]] = None, wraps: Optional[Callable[P, R]] = None, path_arg: Optional[str] = None) -> F
dataloader(name: str, extensions: Optional[list[str]] = None, wraps: Optional[Callable[P, R]] = None, path_arg: Optional[str] = None) -> Callable[[F], F]

Decorator to register a function as a dataset loader.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| func | Union[Callable[..., Any], str, None] | The function to decorate, or the loader name when the decorator is called with arguments. | None |
| name | Optional[str] | The name of the loader. | None |
| extensions | Optional[list[str]] | The file extensions that the loader supports. | None |
| wraps | Optional[Callable[..., Any]] | The function to wrap. | None |
| path_arg | Optional[str] | The name of the argument that holds the path. | None |

Returns:

| Type | Description |
| --- | --- |
| DatasetLoader | The dataset loader. |
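The two overloads above correspond to two call styles: a bare decorator, and a decorator called with a name and options. A minimal sketch of that pattern follows. All names here (`dataloader_sketch`, `registry`, `load_plain`, `load_table`) are hypothetical, the extension values are illustrative, and unlike the documented function this sketch returns the original function rather than a `DatasetLoader`.

```python
from typing import Any, Callable, Dict, List, Optional, Union

# Hypothetical registry, loosely analogous to the module-level `loaders` dict.
registry: Dict[str, Callable[..., Any]] = {}


def dataloader_sketch(
    func: Union[Callable[..., Any], str, None] = None,
    name: Optional[str] = None,
    extensions: Optional[List[str]] = None,
) -> Any:
    # Called as @dataloader_sketch("name"): the first positional is the name.
    if isinstance(func, str):
        name, func = func, None

    def register(target: Callable[..., Any]) -> Callable[..., Any]:
        loader_name = name or target.__name__
        target.loader_name = loader_name  # type: ignore[attr-defined]
        target.loader_extensions = extensions  # type: ignore[attr-defined]
        registry[loader_name] = target
        return target

    # Bare usage passes the function directly; otherwise return the decorator.
    return register if func is None else register(func)


@dataloader_sketch
def load_plain(path: str) -> List[Dict[str, str]]:
    return [{"path": path}]


@dataloader_sketch("tabular", extensions=["csv", "tsv"])
def load_table(path: str) -> List[Dict[str, str]]:
    return [{"path": path}]
```

After both definitions, `registry` holds one entry keyed by the function name and one keyed by the explicit name.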

load_json(path: Union[str, Path], encoding: str = 'utf-8') -> Generator[List[Dict], None, None]

Load a dataset from a JSON file.

Parameters:

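| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Union[str, Path] | The path to the file. | required |
| encoding | str | The encoding of the file. | 'utf-8' |

Returns:

| Type | Description |
| --- | --- |
| Generator[List[Dict], None, None] | A generator over the loaded dataset. |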
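A generator with this signature could look roughly like the sketch below. It is not the library's implementation: `load_json_sketch` is a hypothetical name, and treating a top-level JSON object as a single-record dataset is an assumption made for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Generator, List, Union


def load_json_sketch(
    path: Union[str, Path], encoding: str = "utf-8"
) -> Generator[List[Dict[str, Any]], None, None]:
    with open(path, encoding=encoding) as handle:
        data = json.load(handle)
    # Assumption: a top-level object is wrapped as a one-record dataset.
    yield data if isinstance(data, list) else [data]


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_text(json.dumps([{"id": 1}, {"id": 2}]), encoding="utf-8")
    # The generator yields once; next() retrieves the list of records.
    json_records = next(load_json_sketch(sample))
```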

load_jsonl(path: Union[str, Path], encoding: str = 'utf-8') -> Generator[List[Dict], None, None]

Load a dataset from a JSONL file.

Parameters:

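| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Union[str, Path] | The path to the file. | required |
| encoding | str | The encoding of the file. | 'utf-8' |

Returns:

| Type | Description |
| --- | --- |
| Generator[List[Dict], None, None] | A generator over the loaded dataset. |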
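For JSONL, each non-empty line is a separate JSON document. The sketch below shows one way to read such a file with the same signature. `load_jsonl_sketch` is a hypothetical name, and both skipping blank lines and yielding all records as one list are assumptions, not confirmed behaviour of `load_jsonl`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Generator, List, Union


def load_jsonl_sketch(
    path: Union[str, Path], encoding: str = "utf-8"
) -> Generator[List[Dict[str, Any]], None, None]:
    records: List[Dict[str, Any]] = []
    with open(path, encoding=encoding) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                # Assumption: blank lines are ignored rather than rejected.
                continue
            records.append(json.loads(line))
    yield records


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text('{"id": 1}\n\n{"id": 2}\n', encoding="utf-8")
    jsonl_records = next(load_jsonl_sketch(sample))
```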

load_csv(path: Union[str, Path], encoding: str = 'utf-8') -> Generator[List[Dict], None, None]

Load a dataset from a CSV/TSV file.

Parameters:

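| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Union[str, Path] | The path to the file. | required |
| encoding | str | The encoding of the file. | 'utf-8' |

Returns:

| Type | Description |
| --- | --- |
| Generator[List[Dict], None, None] | A generator over the loaded dataset. |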
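Reading CSV and TSV through one function can be sketched with the standard library's `csv.DictReader`. `load_csv_sketch` is a hypothetical name, and choosing the delimiter from the file suffix is an assumption for the example; this reference does not say how `load_csv` distinguishes the two formats.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Generator, List, Union


def load_csv_sketch(
    path: Union[str, Path], encoding: str = "utf-8"
) -> Generator[List[Dict[str, str]], None, None]:
    # Assumption: a .tsv suffix means tab-separated, anything else comma-separated.
    delimiter = "\t" if Path(path).suffix.lower() == ".tsv" else ","
    with open(path, encoding=encoding, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        # Each row becomes a dict keyed by the header line; values stay strings.
        yield [dict(row) for row in reader]


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_text("id,name\n1,a\n2,b\n", encoding="utf-8")
    tsv_path = Path(tmp) / "sample.tsv"
    tsv_path.write_text("id\tname\n1\ta\n2\tb\n", encoding="utf-8")
    csv_records = next(load_csv_sketch(csv_path))
    tsv_records = next(load_csv_sketch(tsv_path))
```

Both files produce the same records, since only the delimiter differs.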