Datasets
Torch datasets
Torch datasets implement the dataset interface suggested in the PyTorch Datasets & DataLoaders guide.
These datasets are compatible with torch.utils.data.DataLoader.
In-memory list datasets
mozuma.torch.datasets.ListDataset
dataclass
Simple dataset that contains a list of objects in memory
Attributes:
Name | Type | Description |
---|---|---|
objects | Sequence[_DatasetType] | List of objects of the dataset |
mozuma.torch.datasets.ListDatasetIndexed
dataclass
Simple dataset that contains a list of objects in memory with custom indices
Attributes:
Name | Type | Description |
---|---|---|
indices | Sequence[_IndicesType] | Indices to track objects in results |
objects | Sequence[_DatasetType] | List of objects of the dataset |
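To make the difference between the two in-memory classes concrete, here is a minimal sketch in plain Python. `SketchListDataset` and `SketchListDatasetIndexed` are hypothetical stand-ins, not mozuma's own classes. The sketch assumes that the plain list dataset uses each object's position as its index, which the descriptions above do not state explicitly.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, Tuple, TypeVar

_IndicesType = TypeVar("_IndicesType")
_DatasetType = TypeVar("_DatasetType")


@dataclass
class SketchListDataset(Generic[_DatasetType]):
    """Hypothetical stand-in: objects held in memory, position used as index (assumption)."""

    objects: Sequence[_DatasetType]

    def __getitem__(self, index: int) -> Tuple[int, _DatasetType]:
        # Return an (index, value) pair, with the position acting as the index
        return index, self.objects[index]

    def __len__(self) -> int:
        return len(self.objects)


@dataclass
class SketchListDatasetIndexed(Generic[_IndicesType, _DatasetType]):
    """Hypothetical stand-in: same as above, but with caller-supplied indices."""

    indices: Sequence[_IndicesType]
    objects: Sequence[_DatasetType]

    def __getitem__(self, index: int) -> Tuple[_IndicesType, _DatasetType]:
        # The custom index travels with the object so results can be traced back
        return self.indices[index], self.objects[index]

    def __len__(self) -> int:
        return len(self.objects)


plain = SketchListDataset(objects=["cat", "dog"])
indexed = SketchListDatasetIndexed(
    indices=["img-001", "img-002"], objects=["cat", "dog"]
)
```

With custom indices, an element can be matched to an external identifier (here a made-up image key) rather than to its position in the list.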
Local files datasets
mozuma.torch.datasets.LocalBinaryFilesDataset
dataclass
Dataset that reads a list of local file names and returns their content as bytes
Attributes:
Name | Type | Description |
---|---|---|
paths | Sequence[_PathLike] | List of paths to files |
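The behaviour described above (a list of paths in, file content as bytes out) can be sketched with the standard library alone. `SketchBinaryFilesDataset` is a hypothetical stand-in and not mozuma's implementation. Using the path itself as the returned index is an assumption made for this illustration.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Sequence, Tuple, Union

_PathLike = Union[str, Path]


@dataclass
class SketchBinaryFilesDataset:
    """Hypothetical stand-in: reads each listed file and returns its raw bytes."""

    paths: Sequence[_PathLike]

    def __getitem__(self, index: int) -> Tuple[_PathLike, bytes]:
        path = self.paths[index]
        # Assumption: the path doubles as the index returned with the content
        return path, Path(path).read_bytes()

    def __len__(self) -> int:
        return len(self.paths)


# Usage on a temporary file so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "example.bin"
    file_path.write_bytes(b"\x00\x01\x02")
    dataset = SketchBinaryFilesDataset(paths=[file_path])
    index, content = dataset[0]
```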
Image datasets
mozuma.torch.datasets.ImageDataset
dataclass
Dataset that returns PIL.Image.Image from a dataset of images in bytes format
Attributes:
Name | Type | Description |
---|---|---|
binary_files_dataset | TorchDataset[_IndicesType, bytes] | Dataset to load images. Usually a |
resize_image_size | tuple[int, int] \| None | Optionally reduce the image size on load |
mode | str \| None | Optional mode to apply when loading the image. See PIL |
Bounding box datasets
mozuma.torch.datasets.ImageBoundingBoxDataset
dataclass
Dataset that returns a tuple of Image and bounding box data from a list of file names and bounding box information
Attributes:
Name | Type | Description |
---|---|---|
image_dataset | TorchDataset[_IndicesType, Image] | The dataset of images |
bounding_boxes | Sequence[BatchBoundingBoxesPrediction[np.ndarray]] | The bounding box predictions for all given images |
crop_image | bool | Whether to crop the image at the bounding box when loading it |
Training datasets
mozuma.torch.datasets.TorchTrainingDataset
dataclass
Dataset for training that returns a tuple (payload, target), where payload is the value returned by dataset and target is the corresponding element in targets.
Attributes:
Name | Type | Description |
---|---|---|
dataset | TorchDataset[_IndicesType, _DatasetType] | A TorchDataset |
targets | Sequence[_TargetsType] | Training target for each element of the dataset |
Note
The length of targets must match the size of the dataset.
Warning
TorchTrainingDataset doesn't work as is with Torchvision datasets in torchvision.datasets.
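The (payload, target) pairing described above can be sketched in plain Python. `SketchTrainingDataset` and `Squares` are hypothetical names used only for illustration, not mozuma classes. Checking the lengths at construction time is an assumption of this sketch, as is treating whatever the wrapped dataset returns as the payload.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass
class SketchTrainingDataset:
    """Hypothetical stand-in pairing each dataset element with a training target."""

    dataset: Any  # any object exposing __getitem__ and __len__
    targets: Sequence[Any]

    def __post_init__(self) -> None:
        # Assumption: fail early when targets and dataset sizes differ
        if len(self.targets) != len(self.dataset):
            raise ValueError("targets must have the same length as dataset")

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        # payload is whatever the wrapped dataset returns for this position
        payload = self.dataset[index]
        return payload, self.targets[index]

    def __len__(self) -> int:
        return len(self.dataset)


class Squares:
    """Toy wrapped dataset returning (index, value) pairs."""

    def __getitem__(self, index: int) -> Tuple[int, int]:
        if not 0 <= index < 3:
            raise IndexError(index)
        return index, index**2

    def __len__(self) -> int:
        return 3


training = SketchTrainingDataset(dataset=Squares(), targets=["a", "b", "c"])
payload, target = training[2]
```

In this sketch the payload for position 2 is the pair returned by the wrapped dataset, and the target is the third entry of the targets list.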
Write your own dataset
The dataset type depends on the runner used; refer to the runner list to know which type to implement.
Below is the list of dataset protocols specified by mozuma.
mozuma.torch.datasets.TorchDataset
PyTorch dataset protocol
To be used with the PyTorch runners, a TorchDataset should expose two functions: __getitem__ and __len__.
__getitem__(self, index: int) -> Tuple[+_IndicesType, +_DatasetType]
special
Get an item of the dataset by index
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the element to get | required |
Returns:
Type | Description |
---|---|
Tuple[_IndicesType, _DatasetType] | A tuple of the dataset index of the element and the value of the element |
__len__(self) -> int
special
Length of the dataset
Returns:
Type | Description |
---|---|
int | The length of the dataset |
getitem_indices(self, index: int) -> +_IndicesType
The value of dataset indices at index position
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | The position of the element to get | required |
Returns:
Type | Description |
---|---|
_IndicesType | The value of the indices at the position |
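As an illustration of the protocol described above, the following sketch defines a small custom dataset exposing `__getitem__`, `__len__` and `getitem_indices`. `WordLengthDataset` is a hypothetical example class. It does not import mozuma, and its compatibility with a specific runner is an assumption that should be checked against the runner list.

```python
from typing import List, Tuple


class WordLengthDataset:
    """Hypothetical custom dataset: the word is the index, its length is the value."""

    def __init__(self, words: List[str]) -> None:
        self.words = words

    def __getitem__(self, index: int) -> Tuple[str, int]:
        # Return the (dataset index, value) tuple for the given position
        return self.getitem_indices(index), len(self.words[index])

    def __len__(self) -> int:
        # Number of elements in the dataset
        return len(self.words)

    def getitem_indices(self, index: int) -> str:
        # Value of the dataset index at this position, without computing the value
        return self.words[index]


dataset = WordLengthDataset(["torch", "dataset"])
items = [dataset[i] for i in range(len(dataset))]
```

Keeping `getitem_indices` separate from `__getitem__` lets a caller look up the index of an element without paying the cost of loading its value, which matters when values are expensive to produce (for example, images read from disk).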