Experimenting with multi-view CNNs (MVCNN) in Caffe. Work in progress.
Ideally, we want to demonstrate that the performance of CNNs for image classification can be improved by giving the network multiple perspectives (of the same frame). Alternatively, one can reduce the depth (and thus the computation) of a CNN and still keep acceptable classification accuracy by providing multi-perspective inputs.
- Original paper by Su et al.
- Original git repository
- Dataset: ModelNet40v1
- Network: AlexNet topology for now; lighter CNNs will be tried afterwards
Network | Accuracy |
---|---|
alexnet | 1.5% |
alexnet-ft | 85.39% |
mvcnn12 | 88.4% |
mvcnn12-ft | 90.8% |
- Re-evaluate the accuracy of vanilla AlexNet on ModelNet40v1: 1.5% is clearly too low (uniform guessing over 40 classes would give 2.5%)
- Study the effect of the number of views, from 3 to 12
- Explore/define where to put the view-pooling layer
- Poor ModelNet40v1 dataset: the images are CAD renderings, so it is not certain the approach will work on real-world images
- Multi-view images of large objects in demos
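
For reference, the view-pooling step and the "number of views" study from the list above can be sketched in plain Python. This is a minimal illustration, not the Caffe layer used in this repository. It assumes view pooling is an element-wise max over per-view feature vectors, and that views are sub-sampled evenly from the available ones. The names `view_pool` and `pick_views` are hypothetical helpers introduced only for this example.

```python
from typing import List, Sequence


def view_pool(view_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over per-view feature vectors (assumed pooling rule)."""
    if not view_features:
        raise ValueError("need at least one view")
    length = len(view_features[0])
    if any(len(f) != length for f in view_features):
        raise ValueError("all views must have the same feature length")
    # zip(*...) groups the i-th feature of every view together.
    return [max(column) for column in zip(*view_features)]


def pick_views(num_available: int, num_wanted: int) -> List[int]:
    """Indices of num_wanted views spread evenly over num_available views."""
    if not 1 <= num_wanted <= num_available:
        raise ValueError("num_wanted must be between 1 and num_available")
    return [(i * num_available) // num_wanted for i in range(num_wanted)]


# Toy example: 3 views, each described by a 3-dimensional feature vector.
features = [
    [0.1, 0.9, 0.3],
    [0.4, 0.2, 0.8],
    [0.7, 0.5, 0.6],
]
pooled = view_pool(features)   # one descriptor for the whole object
subset = pick_views(12, 3)     # which of 12 rendered views to keep
print(pooled, subset)
```

Where the pooling happens (after which convolutional or fully connected layer) only changes which feature vectors are passed to `view_pool`; the pooling rule itself stays the same.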