# Getting started
## Code-based usage
This guide runs through the inference of a PyTorch ResNet model pre-trained on ImageNet.

First, we need to create a dataset of images; for this we will use an `ImageDataset`.
```python
from mozuma.torch.datasets import LocalBinaryFilesDataset, ImageDataset

# Getting a dataset of images
dataset = ImageDataset(
    LocalBinaryFilesDataset(
        paths=[
            "../tests/fixtures/cats_dogs/cat_0.jpg",
            "../tests/fixtures/cats_dogs/cat_90.jpg",
        ]
    )
)
```
- See Datasets for a list of available datasets.
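Hard-coding paths works for a small example, but for a real folder of images the path list can be built programmatically before handing it to `LocalBinaryFilesDataset`. A minimal, stdlib-only sketch (the temporary directory and file names below are made up for illustration):

```python
from pathlib import Path
from tempfile import TemporaryDirectory

# Create a throwaway directory with a couple of empty stand-in files
# (substitutes for a real image folder in this sketch).
with TemporaryDirectory() as tmp:
    for name in ("cat_0.jpg", "cat_90.jpg", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")

    # Collect only the JPEG files, sorted for a reproducible ordering
    paths = sorted(str(p) for p in Path(tmp).glob("*.jpg"))

print(len(paths))  # the .txt file is filtered out by the glob pattern
```

The resulting `paths` list can then be passed to `LocalBinaryFilesDataset(paths=paths)` exactly as in the example above.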
Next, we need to load the ResNet PyTorch module, specifying the `resnet18` architecture. The model is initialised with weights pre-trained on ImageNet[^1].
```python
from mozuma.models.resnet import torch_resnet_imagenet

# Model definition
resnet = torch_resnet_imagenet("resnet18")
```
- List of all models
Once the model is initialised, we need to define what we want to do with it. In this case, we'll run an inference loop using the `TorchInferenceRunner`. Note that we pass two callbacks to the runner: `CollectFeaturesInMemory` and `CollectLabelsInMemory`. They will be called to save the features and labels in memory.
```python
from mozuma.callbacks import CollectFeaturesInMemory, CollectLabelsInMemory
from mozuma.torch.options import TorchRunnerOptions
from mozuma.torch.runners import TorchInferenceRunner

# Creating the callbacks to collect data
features = CollectFeaturesInMemory()
labels = CollectLabelsInMemory()

# Getting the torch runner for inference
runner = TorchInferenceRunner(
    model=resnet,
    dataset=dataset,
    callbacks=[features, labels],
    options=TorchRunnerOptions(tqdm_enabled=True),
)
```
Now that the runner is initialised, we run it with the `run` method. The callbacks accumulate the features and labels in memory, and we can print their content.
```python
# Executing inference
runner.run()

# Printing the features
print(features.indices, features.features)

# Printing the labels
print(labels.indices, labels.labels)
```
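A common next step with collected features is similarity search between images. The sketch below does not use mozuma: it assumes `features.features` behaves like a NumPy array of shape `(n_images, n_dims)` and replaces it with a small made-up array (real ResNet-18 features would be 512-dimensional) to compare the two images with cosine similarity:

```python
import numpy as np

# Stand-in for features.features: 2 images x 4 feature dimensions
feats = np.array([
    [0.1, 0.8, 0.3, 0.4],
    [0.2, 0.7, 0.4, 0.3],
])

# L2-normalise each row; cosine similarity is then a plain dot product
normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
similarity = float(normed[0] @ normed[1])

print(similarity)  # close to 1.0 for near-identical feature vectors
```

With real features, `features.indices` identifies which dataset entry each row of the similarity computation refers to.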
## Command-line interface
MoZuMa exposes a command-line interface. See `python -m mozuma -h` for a list of available commands.
For instance, one can run a ResNet model against a list of local images; the command prints the results (features and labels) in JSON format.
Similarly, we can extract the key-frames from videos:
```shell
python -m mozuma run ".keyframes.selectors.resnet_key_frame_selector(resnet18, 10)" *.mp4 --file-type vi --batch-size 1
```
[^1]: Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. IEEE, 2009.