
🎉🎉🎉 EOV-Seg (Accepted by AAAI 2025)

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Hongwei Niu¹, Jie Hu², Jianghang Lin¹, Guannan Jiang³, Shengchuan Zhang¹

¹Xiamen University, ²National University of Singapore, ³Contemporary Amperex Technology Co., Limited (CATL)

[Paper] [Demo] [BibTeX]


💡 Abstract

Open-vocabulary panoptic segmentation aims to segment and classify everything in diverse scenes across an unbounded vocabulary. Existing methods typically employ either a two-stage or a single-stage framework. The two-stage framework crops the image multiple times using masks produced by a mask generator and then extracts features from each crop, while the single-stage framework relies on a heavyweight mask decoder that compensates for the lack of spatial position information through self-attention and cross-attention in multiple stacked Transformer blocks. Both approaches incur substantial computational overhead, hindering inference efficiency. To fill this efficiency gap, we propose EOV-Seg, a novel single-stage, shared, efficient, and spatial-aware framework for open-vocabulary panoptic segmentation. EOV-Seg innovates in two aspects. First, a Vocabulary-Aware Selection (VAS) module improves the semantic comprehension of aggregated visual features and alleviates the feature-interaction burden on the mask decoder. Second, a Two-way Dynamic Embedding Experts (TDEE) module efficiently exploits the spatial-awareness capabilities of a ViT-based CLIP backbone. To the best of our knowledge, EOV-Seg is the first open-vocabulary panoptic segmentation framework designed for efficiency: it runs faster than state-of-the-art methods while achieving competitive performance. With COCO training only, EOV-Seg achieves 24.5 PQ, 32.1 mIoU, and 11.6 FPS on the ADE20K dataset, and its inference is 4-19 times faster than state-of-the-art methods. Notably, with a ResNet50 backbone, EOV-Seg runs at 23.8 FPS with only 71M parameters on a single RTX 3090 GPU.
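
To make the Vocabulary-Aware Selection idea above concrete, here is a minimal, illustrative PyTorch sketch of one plausible reading of VAS: visual tokens are scored against the CLIP text (vocabulary) embeddings and re-weighted before entering the mask decoder. The module name, shapes, and the gating formulation are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VocabularyAwareSelection(nn.Module):
    """Illustrative sketch (not the official VAS): re-weight visual
    features by their similarity to CLIP text embeddings so the mask
    decoder receives vocabulary-aware features."""

    def __init__(self, vis_dim: int, txt_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, txt_dim)  # align visual to text space

    def forward(self, vis_feats: torch.Tensor, txt_embeds: torch.Tensor):
        # vis_feats: (B, N, vis_dim) aggregated visual tokens
        # txt_embeds: (K, txt_dim) CLIP embeddings of the K vocabulary names
        v = F.normalize(self.proj(vis_feats), dim=-1)           # (B, N, txt_dim)
        t = F.normalize(txt_embeds, dim=-1)                     # (K, txt_dim)
        sim = torch.einsum("bnd,kd->bnk", v, t)                 # (B, N, K)
        gate = sim.max(dim=-1).values.sigmoid().unsqueeze(-1)   # per-token score
        return vis_feats * gate                                 # vocabulary-aware features

if __name__ == "__main__":
    vas = VocabularyAwareSelection(vis_dim=256, txt_dim=512)
    out = vas(torch.randn(2, 100, 256), torch.randn(80, 512))
    print(out.shape)  # torch.Size([2, 100, 256])
```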


📋 Table of content

  1. 🛠️ Installation
  2. 🎯 Model Zoo
  3. 🧰 Usage
    1. Prepare Datasets
    2. Training
    3. Evaluation
    4. Inference
  4. 🔍 Citation
  5. 📜 License
  6. 💖 Acknowledgement

🛠️ Installation

```bash
conda create --name eov-seg python=3.8 -y
conda activate eov-seg
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

pip install -U opencv-python
git clone [email protected]:facebookresearch/detectron2.git
python -m pip install -e detectron2
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git

git clone https://github.com/nhw649/EOV-Seg.git
cd EOV-Seg
pip install -r requirements.txt
```
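
A quick sanity check, assuming the steps above completed cleanly, to confirm that PyTorch sees the GPU and that detectron2 and panopticapi import:

```python
# Quick environment sanity check for the installation above.
import torch
import detectron2
import panopticapi  # noqa: F401  (import check only)

print("torch:", torch.__version__)            # expect 1.13.0+cu117
print("CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)
```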

🎯 Model Zoo

**Open-vocabulary panoptic segmentation**

| Name | Backbone | PQ | SQ | RQ | AP | mIoU | FPS | Params | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EOV-Seg (S) | ResNet50 | 15.1 | 57.0 | 18.9 | 7.2 | 21.9 | 23.8 | 71M | ckpt |
| EOV-Seg (M) | ResNet50x4 | 18.7 | 63.5 | 23.2 | 8.5 | 25.5 | 18.4 | 127M | ckpt |
| EOV-Seg (L) | ConvNeXt-L | 24.5 | 70.2 | 30.1 | 13.7 | 32.1 | 11.6 | 225M | ckpt |

**Open-vocabulary semantic segmentation**

| Name | Backbone | A-847 | PC-459 | A-150 | PC-59 | PAS-20 | FPS | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EOV-Seg (S) | ResNet50 | 6.6 | 11.5 | 21.9 | 46.0 | 87.2 | 24.5 | ckpt |
| EOV-Seg (M) | ResNet50x4 | 7.8 | 12.2 | 25.5 | 51.8 | 91.2 | 18.9 | ckpt |
| EOV-Seg (L) | ConvNeXt-L | 12.8 | 16.8 | 32.1 | 56.9 | 94.8 | 11.8 | ckpt |

🧰 Usage

1. Please follow this guide to prepare the datasets for training. The data should be organized as follows (a small sanity-check script is given after the tree):

```
datasets/
    coco/
        annotations/
        {train, val}2017/
        panoptic_{train, val}2017/
        panoptic_semseg_{train, val}2017/
        stuffthingmaps_detectron2/
    ADEChallengeData2016/
        images/
        annotations/
        annotations_instance/
        annotations_detectron2/
        ade20k_panoptic_{train, val}/
        ade20k_panoptic_{train,val}.json
        ade20k_instance_{train,val}.json
    ADE20K_2021_17_01/
        images/
        images_detectron2/
        annotations_detectron2/
    VOCdevkit/
        VOC2012/
            Annotations/
            JPEGImages/
            ImageSets/
                Segmentation/
        VOC2010/
            JPEGImages/
            trainval/
            trainval_merged.json
    pascal_voc_d2/
        images/
        annotations_pascal21/
        annotations_pascal20/
    pascal_ctx_d2/
        images/
        annotations_ctx59/
        annotations_ctx459/
```
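
Before launching training, it can save time to confirm the layout above is in place. A minimal sketch (the directory list below is a subset of the tree above; extend it as needed):

```python
# Hypothetical helper: verify that the expected dataset folders exist
# before launching training. Paths mirror the tree above.
from pathlib import Path

EXPECTED = [
    "coco/annotations",
    "coco/panoptic_train2017",
    "coco/panoptic_semseg_train2017",
    "ADEChallengeData2016/images",
    "ADE20K_2021_17_01/images",
    "VOCdevkit/VOC2012/JPEGImages",
]

root = Path("datasets")
missing = [p for p in EXPECTED if not (root / p).exists()]
if missing:
    print("Missing dataset folders:")
    for p in missing:
        print("  -", p)
else:
    print("All expected dataset folders found.")
```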
2. To train a model, use:

```bash
# For ConvNeXt-Large variant
python train_net.py --num-gpus 4 --config-file configs/eov_seg/eov_seg_convnext_l.yaml
# For ResNet-50x4 variant
python train_net.py --num-gpus 4 --config-file configs/eov_seg/eov_seg_r50x4.yaml
# For ResNet-50 variant
python train_net.py --num-gpus 4 --config-file configs/eov_seg/eov_seg_r50.yaml
```
3. To evaluate a model's performance, use:

```bash
# For ConvNeXt-Large variant
python train_net.py --config-file configs/eov_seg/eov_seg_convnext_l.yaml --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
# For ResNet-50x4 variant
python train_net.py --config-file configs/eov_seg/eov_seg_r50x4.yaml --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
# For ResNet-50 variant
python train_net.py --config-file configs/eov_seg/eov_seg_r50.yaml --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
4. To run the inference demo with pre-trained models, use:

```bash
python demo/demo.py --config-file configs/eov_seg/eov_seg_convnext_l.yaml \
                    --input input_dir/ \
                    --output output_dir/ \
                    --opts MODEL.WEIGHTS /path/to/checkpoint_file
```

🔍 Citation

```bibtex
@article{niu2024eov,
  title={EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation},
  author={Niu, Hongwei and Hu, Jie and Lin, Jianghang and Zhang, Shengchuan},
  journal={arXiv preprint arXiv:2412.08628},
  year={2024}
}
```

📜 License

EOV-Seg is released under the Apache 2.0 license. Please review the LICENSE file carefully, especially if you are using our code for commercial purposes.

💖 Acknowledgement
