Datasets

Torch datasets

Torch datasets implement the dataset interface suggested in the PyTorch Datasets & Dataloaders guide. These datasets are compatible with torch.utils.data.DataLoader.

In-memory list datasets

mozuma.torch.datasets.ListDataset dataclass

Simple dataset that contains a list of objects in memory

Attributes:

objects (Sequence[_DatasetType]): List of objects of the dataset
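
As a rough illustration of the behaviour described above, the following sketch mimics an in-memory list dataset using only the standard library. The class name `SketchListDataset` is hypothetical and this is not mozuma's implementation; it assumes that each item is returned as an (index, object) pair, in line with the TorchDataset protocol documented at the end of this page.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, Tuple, TypeVar

_T = TypeVar("_T")


@dataclass
class SketchListDataset(Generic[_T]):
    """Hypothetical stand-in for an in-memory list dataset."""

    objects: Sequence[_T]

    def __getitem__(self, index: int) -> Tuple[int, _T]:
        # Assumption: the position in the list doubles as the dataset index
        return index, self.objects[index]

    def __len__(self) -> int:
        return len(self.objects)


dataset = SketchListDataset(["a", "b", "c"])
first = dataset[0]  # pair of (position, object)
```

Returning the position together with the object lets a consumer match each result back to its input, even when items are batched or shuffled.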

mozuma.torch.datasets.ListDatasetIndexed dataclass

Simple dataset that contains a list of objects in memory with custom indices

Attributes:

indices (Sequence[_IndicesType]): Indices to track objects in results

objects (Sequence[_DatasetType]): List of objects of the dataset
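
The idea of custom indices can be sketched as follows. `SketchListDatasetIndexed` is a hypothetical name, and the length check in `__post_init__` is an assumption added for the example rather than documented mozuma behaviour.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, Tuple, TypeVar

_I = TypeVar("_I")
_T = TypeVar("_T")


@dataclass
class SketchListDatasetIndexed(Generic[_I, _T]):
    """Hypothetical stand-in for a list dataset with custom indices."""

    indices: Sequence[_I]
    objects: Sequence[_T]

    def __post_init__(self) -> None:
        # Assumption: one index per object is required
        if len(self.indices) != len(self.objects):
            raise ValueError("indices and objects must have the same length")

    def getitem_indices(self, index: int) -> _I:
        # Custom index stored at the given position
        return self.indices[index]

    def __getitem__(self, index: int) -> Tuple[_I, _T]:
        return self.indices[index], self.objects[index]

    def __len__(self) -> int:
        return len(self.objects)


dataset = SketchListDatasetIndexed(
    indices=["cat.txt", "dog.txt"], objects=["meow", "woof"]
)
second = dataset[1]  # pair of (custom index, object)
```

Here the custom index (a file name in this made-up example) travels with each object instead of the integer position.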

Local files datasets

mozuma.torch.datasets.LocalBinaryFilesDataset dataclass

Dataset that reads a list of local file names and returns their content as bytes

Attributes:

paths (Sequence[_PathLike]): List of paths to files
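
A minimal sketch of a dataset that reads local files as bytes could look like the code below. `SketchBinaryFilesDataset` is a hypothetical name, and using the path itself as the dataset index is an assumption made for the example.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Sequence, Tuple, Union

PathLike = Union[str, Path]


@dataclass
class SketchBinaryFilesDataset:
    """Hypothetical stand-in for a dataset of local binary files."""

    paths: Sequence[PathLike]

    def __getitem__(self, index: int) -> Tuple[PathLike, bytes]:
        path = self.paths[index]
        # Assumption: the path serves as the dataset index.
        # The file is only read when the item is requested.
        return path, Path(path).read_bytes()

    def __len__(self) -> int:
        return len(self.paths)


# Usage on a temporary file so that the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.bin"
    sample_path.write_bytes(b"\x00\x01\x02")
    dataset = SketchBinaryFilesDataset([sample_path])
    returned_path, content = dataset[0]
```

Reading lazily in `__getitem__` keeps memory use proportional to the batch being loaded rather than to the whole list of files.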

Image datasets

mozuma.torch.datasets.ImageDataset dataclass

Dataset that returns PIL.Image.Image from a dataset of images in bytes format

Attributes:

binary_files_dataset (TorchDataset[_IndicesType, bytes]): Dataset to load images. Usually a LocalBinaryFilesDataset.

resize_image_size (tuple[int, int] | None): Optionally reduce the image size on load

mode (str | None): Optional mode to apply when loading the image. See PIL Image.draft parameters.

Bounding box datasets

mozuma.torch.datasets.ImageBoundingBoxDataset dataclass

Dataset that returns tuple of Image and bounding box data from a list of file names and bounding box information

Attributes:

image_dataset (TorchDataset[_IndicesType, Image]): The dataset of images

bounding_boxes (Sequence[BatchBoundingBoxesPrediction[np.ndarray]]): The bounding boxes predictions for all given images

crop_image (bool): Whether to crop the image at the bounding box when loading it

Training datasets

mozuma.torch.datasets.TorchTrainingDataset dataclass

Dataset for training that returns a tuple (payload, target), where payload is the value returned by dataset and target is the corresponding element in targets.

Attributes:

dataset (TorchDataset[_IndicesType, _DatasetType]): A TorchDataset

targets (Sequence[_TargetsType]): Training target for each element of the dataset

Note

Length of targets must match the size of the dataset.

Warning

TorchTrainingDataset does not work with Torchvision datasets in torchvision.datasets.
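
The pairing of payloads with targets can be sketched as below. `SketchTrainingDataset` is a hypothetical name. The sketch assumes that the wrapped dataset returns (index, payload) pairs and that the index is passed through next to the (payload, target) tuple; the length check mirrors the note above. A plain list of tuples stands in for the wrapped dataset to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass
class SketchTrainingDataset:
    """Hypothetical stand-in pairing dataset payloads with training targets."""

    dataset: Any  # anything with __len__ and __getitem__ -> (index, payload)
    targets: Sequence[Any]

    def __post_init__(self) -> None:
        # The length of targets must match the size of the dataset
        if len(self.targets) != len(self.dataset):
            raise ValueError("targets and dataset must have the same length")

    def __getitem__(self, index: int) -> Tuple[Any, Tuple[Any, Any]]:
        dataset_index, payload = self.dataset[index]
        # Assumption: the dataset index is kept alongside (payload, target)
        return dataset_index, (payload, self.targets[index])

    def __len__(self) -> int:
        return len(self.dataset)


# A list of (index, payload) tuples acts as the wrapped dataset here
wrapped = [(0, "img0"), (1, "img1")]
training = SketchTrainingDataset(dataset=wrapped, targets=["cat", "dog"])
sample = training[1]
```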

Write your own dataset

The dataset type depends on the runner used; refer to the runner list to know which type to implement.

Below is the list of dataset protocols specified by mozuma:

mozuma.torch.datasets.TorchDataset

PyTorch dataset protocol

To be used with the PyTorch runners, a TorchDataset should expose two functions: __getitem__ and __len__.

__getitem__(self, index: int) -> Tuple[+_IndicesType, +_DatasetType] special

Get an item of the dataset by index

Parameters:

index (int, required): The index of the element to get

Returns:

Tuple[_IndicesType, _DatasetType]: A tuple of dataset index of the element and value of the element

__len__(self) -> int special

Length of the dataset

Returns:

int: The length of the dataset

getitem_indices(self, index: int) -> +_IndicesType

The value of dataset indices at index position

Parameters:

index (int, required): The position of the element to get

Returns:

_IndicesType: The value of the indices at the position
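
To make the protocol more concrete, here is a minimal sketch of a custom class that exposes the three methods listed above. `SquaresDataset` and its string keys are hypothetical and chosen only for illustration; the class relies on the standard library alone and does not import mozuma, so whether a given runner accepts it should be checked against the runner list.

```python
from typing import Tuple


class SquaresDataset:
    """Hypothetical custom dataset yielding n squared, keyed by a string."""

    def __init__(self, size: int) -> None:
        self.size = size

    def getitem_indices(self, index: int) -> str:
        # Value of the dataset index at the given position
        return f"item-{index}"

    def __getitem__(self, index: int) -> Tuple[str, int]:
        if not 0 <= index < self.size:
            raise IndexError(index)
        # Tuple of (dataset index, value of the element)
        return self.getitem_indices(index), index * index

    def __len__(self) -> int:
        return self.size


dataset = SquaresDataset(4)
items = [dataset[i] for i in range(len(dataset))]
```

Keeping `getitem_indices` separate from `__getitem__` makes it possible to look up the index of an element without computing or loading its value.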