VinVL
A pre-trained large-scale object-attribute detection (OD) model based on the ResNeXt-152 C4 architecture [1]. The OD model is first trained on a large corpus that combines multiple public object detection datasets, including COCO, OpenImages (OI), Objects365, and Visual Genome (VG). It is then fine-tuned on the VG dataset alone, since VG is the only one of these datasets with attribute labels (see issue #120). It predicts objects from 1594 classes with attributes from 524 classes. See the code and the paper for details.
Pre-trained models
mozuma.models.vinvl.pretrained.torch_vinvl_detector
VinVL object detection model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
score_threshold | float | | required |
attr_score_threshold | float | | required |
device | torch.device | PyTorch device attribute to initialise model. | required |
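The two threshold parameters carry no description here. As a minimal sketch, assuming they behave like typical detection thresholds (objects scoring below `score_threshold` are dropped, and attributes scoring below `attr_score_threshold` are dropped from the objects that remain), the filtering could look like the following. The `Detection` class and `filter_detections` function are hypothetical names used for illustration only and are not part of mozuma.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Detection:
    """Hypothetical container for one detected object (not a mozuma type)."""

    label: str
    score: float
    # Maps attribute name to its confidence score
    attributes: Dict[str, float] = field(default_factory=dict)


def filter_detections(
    detections: List[Detection],
    score_threshold: float,
    attr_score_threshold: float,
) -> List[Detection]:
    """Assumed threshold semantics: keep objects scoring at least
    score_threshold, and keep only attributes scoring at least
    attr_score_threshold on each kept object."""
    kept = []
    for det in detections:
        if det.score < score_threshold:
            continue  # object is not confident enough
        attrs = {
            name: s
            for name, s in det.attributes.items()
            if s >= attr_score_threshold
        }
        kept.append(Detection(det.label, det.score, attrs))
    return kept


# Made-up scores, for illustration only
raw = [
    Detection("dog", 0.92, {"brown": 0.81, "wet": 0.12}),
    Detection("ball", 0.30, {"red": 0.95}),
]
filtered = filter_detections(raw, score_threshold=0.5, attr_score_threshold=0.5)
print(filtered)
```

Under these assumptions, raising `score_threshold` trades recall for precision on objects, while `attr_score_threshold` independently controls how many attributes are reported per object.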
Base model
The VinVL model is an implementation of a TorchModel.
mozuma.models.vinvl.modules.TorchVinVLDetectorModule
VinVL object detection model
Attributes:
Name | Type | Description |
---|---|---|
score_threshold | float | |
attr_score_threshold | float | |
device | torch.device | PyTorch device attribute to initialise model. |
Provider store
See the stores documentation for usage.
mozuma.models.vinvl.stores.VinVLStore
Pre-trained model states for VinVL
These are identified by training_id=vinvl.
1. Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, and Jianfeng Gao. VinVL: Revisiting visual representations in vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5579–5588. June 2021.