Loaders
loaders
P = ParamSpec('P') module-attribute
R = TypeVar('R') module-attribute
F = TypeVar('F', bound=Callable[..., Any]) module-attribute
loaders: Dict[str, DatasetLoader] = {} module-attribute
DatasetLoader
Bases: BaseDatasetLoader
func = func instance-attribute
name = name or self.func.__name__ instance-attribute
extensions = extensions instance-attribute
path_arg = path_arg instance-attribute
wraps = wraps instance-attribute
__init__(func: Callable[..., Any], name: Optional[str] = None, extensions: Optional[list[str]] = None, path_arg: Optional[str] = None, wraps: Optional[Callable[P, R]] = None)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| func | Callable[..., Any] | The function to decorate. | required |
| name | Optional[str] | The name of the loader, by default None. | None |
| extensions | Optional[list[str]] | The extensions that the loader supports, by default None. | None |
| path_arg | Optional[str] | The name of the argument that is the path, by default None. | None |
| wraps | Optional[Callable[..., Any]] | The function to wrap, by default None. | None |
__call__(*args: P.args, **kwargs: P.kwargs) -> R
Call the loader function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| args | tuple | The arguments to pass to the function. | () |
| kwargs | dict | The keyword arguments to pass to the function. | {} |
Returns:
| Type | Description |
|---|---|
| R | The result of the function. |
bind(*args: P.args, **kwargs: P.kwargs) -> Callable[..., R]
Bind arguments to the loader function.
Notes
Use this method to create a partial function with pre-filled positional and keyword arguments. Binding arguments this way also helps make the fingerprint of the dataset unique.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| args | tuple | The arguments to pre-fill. | () |
| kwargs | dict | The keyword arguments to pre-fill. | {} |
Returns:
| Type | Description |
|---|---|
| Callable[..., R] | The partial function. |
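As a rough illustration of what pre-filling arguments looks like, the sketch below applies `functools.partial` to a plain function. `read_rows` is a hypothetical stand-in for a loader function, and the use of `partial` is only an assumption about how `bind` might behave; the actual implementation may differ.

```python
from functools import partial


def read_rows(path, encoding="utf-8", limit=None):
    """Hypothetical loader function, used only for this illustration."""
    return {"path": path, "encoding": encoding, "limit": limit}


# Pre-fill the keyword arguments; the path is supplied at call time.
bound = partial(read_rows, encoding="latin-1", limit=10)
result = bound("data.csv")
```

The bound callable carries its pre-filled arguments with it (`bound.keywords` here), which is one plausible reason binding could feed into a more specific fingerprint.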
dataloader(func: Union[F, str, None] = None, name: Optional[str] = None, extensions: Optional[list[str]] = None, wraps: Optional[Callable[..., Any]] = None, path_arg: Optional[str] = None) -> Union[F, Callable[[F], F]]
dataloader(func: F, name: Optional[str] = None, extensions: Optional[list[str]] = None, wraps: Optional[Callable[P, R]] = None, path_arg: Optional[str] = None) -> F
dataloader(name: str, extensions: Optional[list[str]] = None, wraps: Optional[Callable[P, R]] = None, path_arg: Optional[str] = None) -> Callable[[F], F]
Decorator to register a function as a dataset loader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| func | Union[Callable[..., Any], str, None] | The function to decorate, by default None. | None |
| name | Optional[str] | The name of the loader, by default None. | None |
| extensions | Optional[list[str]] | The extensions that the loader supports, by default None. | None |
| wraps | Optional[Callable[..., Any]] | The function to wrap, by default None. | None |
| path_arg | Optional[str] | The name of the argument that is the path, by default None. | None |
Returns:
| Type | Description |
|---|---|
| DatasetLoader | The dataset loader. |
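The two overloads suggest the decorator can be applied either bare or with a name as its first argument. The sketch below shows one common way to support both forms. It is not the library's code: `register`, `registry`, and the two example functions are hypothetical names, and the sketch stores the plain function rather than wrapping it in a `DatasetLoader`.

```python
from typing import Any, Callable, Dict, Optional, Union

# Hypothetical registry for this sketch; it mirrors the module-level
# `loaders` dict in spirit only.
registry: Dict[str, Callable[..., Any]] = {}


def register(
    func: Union[Callable[..., Any], str, None] = None,
    name: Optional[str] = None,
):
    # Called as @register("some_name"): the first argument is the name.
    if isinstance(func, str):
        name, func = func, None

    def decorator(f: Callable[..., Any]) -> Callable[..., Any]:
        # Fall back to the function's own name when none was given.
        registry[name or f.__name__] = f
        return f

    # Called as bare @register: decorate immediately.
    if func is not None:
        return decorator(func)
    # Called with arguments: return the decorator to be applied next.
    return decorator


@register
def load_text(path):
    return f"text:{path}"


@register("tsv")
def load_tab_separated(path):
    return f"tsv:{path}"
```

After both definitions, `registry` holds the entries `"load_text"` and `"tsv"`.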
load_json(path: Union[str, Path], encoding: str = 'utf-8') -> Generator[List[Dict], None, None]
Load a dataset from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[str, Path] | The path to the file. | required |
| encoding | str | The encoding of the file, by default "utf-8". | 'utf-8' |
Returns:
| Type | Description |
|---|---|
| Generator[List[Dict], None, None] | A generator over the loaded dataset, per the signature. |
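For orientation, here is a minimal sketch of a generator-style JSON reader with the same parameters. `read_json` is a hypothetical stand-in, not the library function, and wrapping a single top-level object in a list is an assumption made only so that the sketch always yields a list of dicts.

```python
import json
import tempfile
from pathlib import Path


def read_json(path, encoding="utf-8"):
    """Hypothetical reader: yield the parsed file as one list of dicts."""
    with open(path, encoding=encoding) as f:
        data = json.load(f)
    # Assumption: a single top-level object is wrapped in a list.
    yield data if isinstance(data, list) else [data]


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_text('[{"id": 1}, {"id": 2}]', encoding="utf-8")
    batches = list(read_json(sample))
```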
load_jsonl(path: Union[str, Path], encoding: str = 'utf-8') -> Generator[List[Dict], None, None]
Load a dataset from a JSONL file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[str, Path] | The path to the file. | required |
| encoding | str | The encoding of the file, by default "utf-8". | 'utf-8' |
Returns:
| Type | Description |
|---|---|
| Generator[List[Dict], None, None] | A generator over the loaded dataset, per the signature. |
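JSONL stores one JSON object per line, so a reader can parse the file lazily. The sketch below illustrates that idea with a hypothetical `iter_jsonl` helper; yielding one dict per line and skipping blank lines are assumptions of the sketch and may not match what `load_jsonl` yields.

```python
import json
import tempfile
from pathlib import Path


def iter_jsonl(path, encoding="utf-8"):
    """Hypothetical reader: yield one dict per non-empty line."""
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text('{"id": 1}\n\n{"id": 2}\n', encoding="utf-8")
    # Consume the generator while the temporary file still exists.
    records = list(iter_jsonl(sample))
```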
load_csv(path: Union[str, Path], encoding: str = 'utf-8') -> Generator[List[Dict], None, None]
Load a dataset from a CSV/TSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[str, Path] | The path to the file. | required |
| encoding | str | The encoding of the file, by default "utf-8". | 'utf-8' |
Returns:
| Type | Description |
|---|---|
| Generator[List[Dict], None, None] | A generator over the loaded dataset, per the signature. |
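A CSV or TSV file can be read row by row into dicts with the standard library's `csv.DictReader`. The sketch below is an illustration under stated assumptions: `iter_csv` and its `delimiter` parameter are hypothetical, and how `load_csv` itself chooses between comma and tab separators is not documented here.

```python
import csv
import tempfile
from pathlib import Path


def iter_csv(path, encoding="utf-8", delimiter=","):
    """Hypothetical reader: yield one dict per row, keyed by the header."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline="") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            yield dict(row)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.tsv"
    sample.write_text("id\tname\n1\tada\n2\tgrace\n", encoding="utf-8")
    # Consume the generator while the temporary file still exists.
    rows = list(iter_csv(sample, delimiter="\t"))
```

Note that `csv.DictReader` returns every value as a string, so any numeric conversion would have to happen afterwards.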