- Requires the latest MXNet. Set the environment variable with `export MXNET_CUDNN_AUTOTUNE_DEFAULT=0`.
- Install the Python package `mxnet` (CPU inference only) or `mxnet-cu90` (GPU training), then `cython`, then `opencv-python matplotlib pycocotools tqdm`.
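You can also check the setup from Python. The sketch below is not part of this repository. It sets `MXNET_CUDNN_AUTOTUNE_DEFAULT` for the current process, which should happen before `import mxnet`. It then reports which of the packages above cannot be imported. The pip-to-module mapping (`opencv-python` imports as `cv2`, `cython` as `Cython`) is standard. `REQUIRED` and `missing_packages` are hypothetical names.

```python
import importlib.util
import os

# Same effect as `export MXNET_CUDNN_AUTOTUNE_DEFAULT=0`, but only for this
# process. Set it before importing mxnet so the value is already in place.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

# pip package name -> importable top-level module name
REQUIRED = {
    "mxnet (or mxnet-cu90)": "mxnet",
    "cython": "Cython",
    "opencv-python": "cv2",
    "matplotlib": "matplotlib",
    "pycocotools": "pycocotools",
    "tqdm": "tqdm",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found on sys.path."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```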
Download any of the following models to the current directory and run `python3 demo.py --dataset $Dataset$ --network $Network$ --params $MODEL_FILE$ --image $YOUR_IMAGE$` to run inference on a single image. For example, `python3 demo.py --dataset voc --network vgg16 --params vgg16_voc0712.params --image myimage.jpg`; optionally add `--gpu 0` to use a GPU.

Different networks have different configurations, and different datasets have different object class names, so you must pass both explicitly as command-line arguments.
Network | Dataset | Imageset | Reference mAP | Result mAP | Link |
---|---|---|---|---|---|
vgg16 | voc | 07/07 | 69.9 | 70.23 | Dropbox |
vgg16 | voc | 07++12/07 | 73.2 | 75.97 | Dropbox |
resnet101 | voc | 07++12/07 | 76.4 | 79.35 | Dropbox |
vgg16 | coco | train2017/val2017 | 21.2 | 22.8 | Dropbox |
resnet101 | coco | train2017/val2017 | 27.2 | 26.1 | Dropbox |
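The demo command line above can also be assembled from Python. Here is a minimal sketch that uses a hypothetical helper, `demo_command`. It only accepts the network and dataset values that appear in the table; `demo.py` itself may accept others. It returns an argument list suitable for `subprocess.run` and does not launch anything itself.

```python
import shlex

# Values that appear in the model table above (assumed to be the useful ones).
NETWORKS = ("vgg16", "resnet101")
DATASETS = ("voc", "coco")


def demo_command(dataset, network, params, image, gpu=None):
    """Build the argv list for a single-image demo.py run (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}, expected one of {DATASETS}")
    if network not in NETWORKS:
        raise ValueError(f"unknown network {network!r}, expected one of {NETWORKS}")
    cmd = ["python3", "demo.py",
           "--dataset", dataset,   # determines the object class names
           "--network", network,   # determines the network configuration
           "--params", params,
           "--image", image]
    if gpu is not None:
        cmd += ["--gpu", str(gpu)]  # optional: run inference on this GPU
    return cmd


if __name__ == "__main__":
    cmd = demo_command("voc", "vgg16", "vgg16_voc0712.params", "myimage.jpg", gpu=0)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Validating the two names up front turns a typo into an immediate error instead of a confusing failure inside the script.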
Make a directory `data` and follow the py-faster-rcnn data preparation instructions.

- Pascal VOC should be in `data/VOCdevkit`, containing `VOC2007`, `VOC2012` and `annotations`.
- MSCOCO should be in `data/coco`, containing `train2017`, `val2017`, `annotations/instances_train2017.json` and `annotations/instances_val2017.json`.
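Before training, you can confirm that the layout matches. The sketch below copies the paths listed above into a dictionary and reports any that are missing. `EXPECTED` and `missing_paths` are hypothetical names, and the check only tests that each path exists, not that its contents are valid.

```python
from pathlib import Path

# Expected layout relative to the repository root, as listed above.
EXPECTED = {
    "voc": ["data/VOCdevkit/VOC2007",
            "data/VOCdevkit/VOC2012",
            "data/VOCdevkit/annotations"],
    "coco": ["data/coco/train2017",
             "data/coco/val2017",
             "data/coco/annotations/instances_train2017.json",
             "data/coco/annotations/instances_val2017.json"],
}


def missing_paths(dataset, root="."):
    """Return the expected paths for `dataset` that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED[dataset] if not (root / p).exists()]


if __name__ == "__main__":
    for name in EXPECTED:
        print(name, "missing:", missing_paths(name) or "none")
```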
- VGG16 should be at `model/vgg16-0000.params`, from the MXNet model zoo.
- ResNet should be at `model/resnet-101-0000.params`, from the MXNet model zoo.
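The `-0000` suffix follows MXNet's checkpoint naming convention, `<prefix>-<epoch>.params`, with the epoch zero-padded to four digits. This is the parameter file name that `mx.model.save_checkpoint` writes. The sketch below uses hypothetical helper names to split such a path into the `(prefix, epoch)` pair that `mx.model.load_checkpoint(prefix, epoch)` takes. By the same convention, `model/vgg16-0010.params` in the evaluation example is presumably the epoch-10 checkpoint.

```python
import re

# MXNet checkpoint parameter files are named "<prefix>-<epoch:04d>.params".
# The greedy prefix keeps hyphens such as the one in "resnet-101".
_CKPT = re.compile(r"^(?P<prefix>.+)-(?P<epoch>\d{4})\.params$")


def split_checkpoint(path):
    """Split a .params path into (prefix, epoch)."""
    m = _CKPT.match(path)
    if m is None:
        raise ValueError(f"not an MXNet checkpoint name: {path!r}")
    return m.group("prefix"), int(m.group("epoch"))


def checkpoint_path(prefix, epoch):
    """Inverse of split_checkpoint."""
    return "%s-%04d.params" % (prefix, epoch)


if __name__ == "__main__":
    print(split_checkpoint("model/resnet-101-0000.params"))
```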
Use `python3 train.py --dataset $Dataset$ --network $Network$ --pretrained $IMAGENET_MODEL_FILE$ --gpus $GPUS$` to train, for example `python3 train.py --dataset voc --network vgg16 --pretrained model/vgg16-0000.params --gpus 0,1`.
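`--gpus` takes a comma-separated list of device ids. A common MXNet pattern, assumed here rather than confirmed from `train.py`, turns that string into a list of contexts. That list goes to the training module, which splits each batch across the devices. The parsing half can be sketched in plain Python; `parse_gpus` is a hypothetical name.

```python
def parse_gpus(spec):
    """Turn a --gpus value such as "0,1" into a list of integer device ids."""
    ids = [int(tok) for tok in spec.split(",") if tok.strip()]
    if not ids:
        raise ValueError("no GPU ids given")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU id in {spec!r}")
    return ids


# With MXNet installed, the contexts would typically be built as:
#   ctx = [mx.gpu(i) for i in parse_gpus("0,1")]
if __name__ == "__main__":
    print(parse_gpus("0,1"))
```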
Use `python3 test.py --dataset $Dataset$ --network $Network$ --params $MODEL_FILE$ --gpu $GPU$` to evaluate, for example `python3 test.py --dataset voc --network vgg16 --params model/vgg16-0010.params --gpu 0`.
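For Pascal VOC, mAP is the mean over classes of the per-class average precision (AP). The VOC 2007 devkit uses 11-point interpolated AP, and later VOC releases use the area under the interpolated precision curve. The NumPy sketch below shows both variants for illustration. It is not taken from `test.py`, and the names `voc_ap` and `mean_ap` are hypothetical. COCO numbers are presumably computed with pycocotools, which follows a different protocol.

```python
import numpy as np


def voc_ap(recall, precision, use_07_metric=True):
    """Average precision for one class from its recall/precision curve.

    `recall` and `precision` are computed over detections sorted by
    descending confidence, so `recall` is non-decreasing.
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    if use_07_metric:
        # VOC2007 11-point metric: average the best precision reached at
        # recall >= t for t = 0.0, 0.1, ..., 1.0.
        ap = 0.0
        for t in np.linspace(0.0, 1.0, 11):
            above = recall >= t
            ap += (precision[above].max() if above.any() else 0.0) / 11.0
        return float(ap)
    # Otherwise: area under the interpolated precision curve.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Replace each precision with the maximum precision at any higher recall.
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def mean_ap(per_class_ap):
    """mAP: the mean of the per-class AP values."""
    return float(np.mean(per_class_ap))
```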
- May 25, 2016: We released Fast R-CNN implementation.
- July 6, 2016: We released Faster R-CNN implementation.
- July 23, 2016: We updated to MXNet module solver.
- Oct 10, 2016: tornadomeet released approximate end-to-end training.
- Oct 30, 2016: We updated to MXNet module inference.
- Jan 19, 2017: We accelerated our pipeline and supported ResNet training.
- Jun 22, 2018: We simplified the code.
This repository used code from MXNet, Fast R-CNN, Faster R-CNN, Caffe, tornadomeet/mx-rcnn and the MS COCO API.
Thanks to tornadomeet for end-to-end experiments and to MXNet contributors for helpful discussions.
- Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems." In Neural Information Processing Systems, Workshop on Machine Learning Systems, 2015.
- Ross Girshick. "Fast R-CNN." In Proceedings of the IEEE International Conference on Computer Vision, 2015.
- Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
- Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. "Caffe: Convolutional architecture for fast feature embedding." In Proceedings of the ACM International Conference on Multimedia, 2014.
- Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. "The PASCAL Visual Object Classes (VOC) Challenge." International Journal of Computer Vision 88, no. 2 (2010): 303-338.
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. "ImageNet: A large-scale hierarchical image database." In Computer Vision and Pattern Recognition, IEEE Conference on, 2009.
- Karen Simonyan, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." In Computer Vision and Pattern Recognition, IEEE Conference on, 2016.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. "Microsoft COCO: Common Objects in Context." In European Conference on Computer Vision, pp. 740-755. Springer International Publishing, 2014.