# Getting started
## Code-based usage
This guide runs through the inference of a PyTorch ResNet model pre-trained on ImageNet.

First, we need to create a dataset of images; for this we will use an `ImageDataset`.
```python
from mozuma.torch.datasets import LocalBinaryFilesDataset, ImageDataset

# Getting a dataset of images
dataset = ImageDataset(
    LocalBinaryFilesDataset(
        paths=[
            "../tests/fixtures/cats_dogs/cat_0.jpg",
            "../tests/fixtures/cats_dogs/cat_90.jpg",
        ]
    )
)
```
- See Datasets for a list of available datasets.
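Hard-coding paths works for a small example, but for a real folder of images the path list can be built programmatically before handing it to `LocalBinaryFilesDataset`. A minimal, stdlib-only sketch (the temporary directory and file names below are made up for illustration):

```python
from pathlib import Path
from tempfile import TemporaryDirectory

# Create a throwaway directory with a couple of empty stand-in files
# (substitutes for a real image folder in this sketch).
with TemporaryDirectory() as tmp:
    for name in ("cat_0.jpg", "cat_90.jpg", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")

    # Collect only the JPEG files, sorted for a reproducible ordering
    paths = sorted(str(p) for p in Path(tmp).glob("*.jpg"))

print(len(paths))  # the .txt file is filtered out by the glob pattern
```

The resulting `paths` list can then be passed to `LocalBinaryFilesDataset(paths=paths)` exactly as in the example above.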
Next, we need to load the ResNet PyTorch module, specifying the `resnet18` architecture. The model is initialised with weights pre-trained on ImageNet[^1].
```python
from mozuma.models.resnet import torch_resnet_imagenet

# Model definition
resnet = torch_resnet_imagenet("resnet18")
```
- List of all models
Once the model is initialised, we need to define what we want to do with it. In this case, we'll run an inference loop using the `TorchInferenceRunner`. Note that we pass two callbacks to the runner: `CollectFeaturesInMemory` and `CollectLabelsInMemory`. They will be called to save the features and labels in memory.
```python
from mozuma.callbacks import CollectFeaturesInMemory, CollectLabelsInMemory
from mozuma.torch.options import TorchRunnerOptions
from mozuma.torch.runners import TorchInferenceRunner

# Creating the callbacks to collect data
features = CollectFeaturesInMemory()
labels = CollectLabelsInMemory()

# Getting the torch runner for inference
runner = TorchInferenceRunner(
    model=resnet,
    dataset=dataset,
    callbacks=[features, labels],
    options=TorchRunnerOptions(tqdm_enabled=True),
)
```
Now that the runner is initialised, we run it with the `run` method. The callbacks accumulate the features and labels in memory, and we can print their content.
```python
# Executing inference
runner.run()

# Printing the features
print(features.indices, features.features)

# Printing the labels
print(labels.indices, labels.labels)
```
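A common next step with collected features is similarity search between images. The sketch below does not use mozuma: it assumes `features.features` behaves like a NumPy array of shape `(n_images, n_dims)` and replaces it with a small made-up array (real ResNet-18 features would be 512-dimensional) to compare the two images with cosine similarity:

```python
import numpy as np

# Stand-in for features.features: 2 images x 4 feature dimensions
feats = np.array([
    [0.1, 0.8, 0.3, 0.4],
    [0.2, 0.7, 0.4, 0.3],
])

# L2-normalise each row; cosine similarity is then a plain dot product
normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
similarity = float(normed[0] @ normed[1])

print(similarity)  # close to 1.0 for near-identical feature vectors
```

With real features, `features.indices` identifies which dataset entry each row of the similarity computation refers to.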
## Command-line interface
MoZuMa exposes a command-line interface. See `python -m mozuma -h` for a list of available commands.
For instance, one can run a ResNet model against a list of local images; the command prints the results (features and labels) in JSON format.
Similarly, we can extract the key-frames from videos:
```shell
python -m mozuma run ".keyframes.selectors.resnet_key_frame_selector(resnet18, 10)" *.mp4 --file-type vi --batch-size 1
```
[^1]: Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. IEEE, 2009.