Few-shot object detection has drawn increasing attention in the field of robotic exploration, where robots are required to find unseen objects given only a few online-provided examples. Despite recent efforts to achieve online processing capabilities, the slow inference speed of low-powered robots fails to meet the demands of real-time detection, making existing methods impractical for autonomous exploration. These methods still face performance and efficiency challenges, mainly due to unreliable features and exhaustive class loops. In this work, we propose a new paradigm, AirShot, and discover that, by fully exploiting the valuable correlation map, AirShot yields a more robust and faster few-shot object detection system that is more applicable to the robotics community. The core module, Top Prediction Filter (TPF), operates on multi-scale correlation maps in both the training and inference stages. During training, TPF supervises the generation of a more representative correlation map; during inference, it reduces looping iterations by selecting top-ranked classes, thus cutting computational costs while improving performance. Surprisingly, this dual functionality is generally effective and efficient across various off-the-shelf models. Exhaustive experiments on the COCO2017, VOC2014, and SubT datasets demonstrate that TPF significantly boosts the efficacy and efficiency of most off-the-shelf models, achieving up to 36.4% precision improvement along with 56.3% faster inference. We also open-source the DARPA Subterranean (SubT) Dataset for Few-shot Object Detection.
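For intuition, below is a minimal sketch of the top-ranked class selection idea behind TPF. This is not the actual AirShot implementation; the global-max-pooling scoring and the function name are illustrative assumptions.

```python
import torch

def tpf_select_classes(correlation_maps: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Score each class's correlation map and keep only the top-k classes.

    correlation_maps: tensor of shape (num_classes, H, W) holding
    query-support correlation responses, one map per candidate class.
    """
    # Collapse each map to a scalar confidence via global max pooling;
    # a high peak suggests the class is likely present in the query image.
    class_scores = correlation_maps.flatten(start_dim=1).max(dim=1).values
    # The downstream detection head then loops only over these indices
    # instead of over every class, saving computation.
    return torch.topk(class_scores, k=min(k, class_scores.numel())).indices

# Example: 80 candidate classes with 32x32 correlation maps.
selected = tpf_select_classes(torch.rand(80, 32, 32), k=5)
print(selected)  # the detector loop now covers 5 classes instead of 80
```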
- Release SubT Dataset
- Release Pre-trained Checkpoints
- Release Code
- Prepare Website w/ Videos
Access the data and annotations through the following link: Dataset
Access the pre-trained checkpoints and data through the following link: Pre-trained Checkpoints
We provide the official implementation here to reproduce the results without fine-tuning, using a ResNet-101 backbone, on:
- the COCO-2017 validation set
- the VOC-2012 validation set
Expected dataset structure:
coco/
  annotations/
    instances_{train,val}2017.json
    person_keypoints_{train,val}2017.json
  {train,val}2017/
VOC2012/
  annotations/
    json files
  JPEGImages/
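Before running the scripts, you can sanity-check this layout with a short snippet like the one below. The path list simply mirrors the structure above, assuming the datasets live under datasets/ as in the steps that follow; adjust it if your splits differ.

```python
from pathlib import Path

# Mirrors the expected layout above, relative to the datasets/ directory.
required = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
    "VOC2012/annotations",
    "VOC2012/JPEGImages",
]
root = Path("datasets")
for rel in required:
    print(("ok     " if (root / rel).exists() else "MISSING"), rel)
```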
Download and unzip the support files (COCO json files) from MEGA/BaiduNet (pwd: 1134) into:
datasets/
  coco/
    new_annotations/
Download and unzip the support files (VOC json files) from MEGA/BaiduNet (pwd: 1134) into:
datasets/
  voc/
    new_annotations/
Run the script:
cd datasets
bash generate_support_data.sh
To generate a different number of shots (the default is 1 shot), modify lines 190, 213, and 269 of 4_gen_support_pool_10_shot.py.
Download the base R-101 model into /output.
Start training:
bash train.sh
Training necessarily runs in two stages, base training followed by further fine-tuning, with a different config file loaded for each stage.
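Conceptually, train.sh does something like the following; the script name train_net.py and the config file names are hypothetical placeholders, so check train.sh for the actual ones.

```python
import subprocess

# Stage 1 trains on base classes; stage 2 fine-tunes for the few-shot setting.
# Both the script and config names below are placeholders, not the real ones.
stages = [
    ("base training", "configs/base_training.yaml"),
    ("fine-tuning", "configs/few_shot_finetune.yaml"),
]
for name, config in stages:
    print(f"=== {name} ({config}) ===")
    subprocess.run(["python", "train_net.py", "--config-file", config], check=True)
```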
Run evaluation:
bash test.sh
This code is a pre-release; changes made for modularization have not been fully verified yet and will be fixed soon. If you find any problems, emails or issues are welcome.
If AirShot motivates your work or is used as a baseline, please consider citing us:
@inproceedings{wang2024airshot,
title = {{AirShot}: Efficient Few-Shot Detection for Autonomous Exploration},
author = {Wang, Zihan and Li, Bowen and Wang, Chen and Scherer, Sebastian},
booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2024},
url = {https://arxiv.org/pdf/2404.05069.pdf}
}