
🎉🎉🎉 EOV-Seg (Accepted by AAAI 2025)

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Hongwei Niu¹, Jie Hu², Jianghang Lin¹, Guannan Jiang³, Shengchuan Zhang¹

¹Xiamen University, ²National University of Singapore, ³Contemporary Amperex Technology Co., Limited (CATL)

[Paper] [Demo] [BibTeX]


💡 Abstract

Open-vocabulary panoptic segmentation aims to segment and classify everything in diverse scenes across an unbounded vocabulary. Existing methods typically employ either a two-stage or a single-stage framework. The two-stage framework crops the image multiple times using masks produced by a mask generator and then extracts features from each crop, while the single-stage framework relies on a heavyweight mask decoder that compensates for the lack of spatial position information through self-attention and cross-attention in multiple stacked Transformer blocks. Both approaches incur substantial computational overhead, hindering inference efficiency. To fill this efficiency gap, we propose EOV-Seg, a novel single-stage, shared, efficient, and spatial-aware framework for open-vocabulary panoptic segmentation. EOV-Seg innovates in two aspects. First, a Vocabulary-Aware Selection (VAS) module improves the semantic comprehension of aggregated visual features and alleviates the feature-interaction burden on the mask decoder. Second, a Two-way Dynamic Embedding Experts (TDEE) module efficiently exploits the spatial-awareness capabilities of a ViT-based CLIP backbone. To the best of our knowledge, EOV-Seg is the first open-vocabulary panoptic segmentation framework designed for efficiency: it runs faster than state-of-the-art methods while achieving competitive performance. With COCO training only, EOV-Seg achieves 24.5 PQ, 32.1 mIoU, and 11.6 FPS on the ADE20K dataset, and its inference is 4-19 times faster than state-of-the-art methods. Notably, with a ResNet50 backbone, EOV-Seg runs at 23.8 FPS with only 71M parameters on a single RTX 3090 GPU.
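
To make the Vocabulary-Aware Selection idea above concrete, here is a minimal, illustrative PyTorch sketch of one plausible reading of VAS: visual tokens are scored against the CLIP text (vocabulary) embeddings and re-weighted before entering the mask decoder. The module name, shapes, and the gating formulation are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VocabularyAwareSelection(nn.Module):
    """Illustrative sketch (not the official VAS): re-weight visual
    features by their similarity to CLIP text embeddings so the mask
    decoder receives vocabulary-aware features."""

    def __init__(self, vis_dim: int, txt_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, txt_dim)  # align visual to text space

    def forward(self, vis_feats: torch.Tensor, txt_embeds: torch.Tensor):
        # vis_feats: (B, N, vis_dim) aggregated visual tokens
        # txt_embeds: (K, txt_dim) CLIP embeddings of the K vocabulary names
        v = F.normalize(self.proj(vis_feats), dim=-1)           # (B, N, txt_dim)
        t = F.normalize(txt_embeds, dim=-1)                     # (K, txt_dim)
        sim = torch.einsum("bnd,kd->bnk", v, t)                 # (B, N, K)
        gate = sim.max(dim=-1).values.sigmoid().unsqueeze(-1)   # per-token score
        return vis_feats * gate                                 # vocabulary-aware features

if __name__ == "__main__":
    vas = VocabularyAwareSelection(vis_dim=256, txt_dim=512)
    out = vas(torch.randn(2, 100, 256), torch.randn(80, 512))
    print(out.shape)  # torch.Size([2, 100, 256])
```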


📋 Table of content

  1. 🛠️ Installation
  2. 🎯 Model Zoo
  3. 🧰 Usage
    1. Prepare Datasets
    2. Training
    3. Evaluation
    4. Inference
  4. 🔍 Citation
  5. 📜 License
  6. 💖 Acknowledgement

🛠️ Installation

```bash
conda create --name eov-seg python=3.8 -y
conda activate eov-seg
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

pip install -U opencv-python
git clone [email protected]:facebookresearch/detectron2.git
python -m pip install -e detectron2
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git

git clone https://github.com/nhw649/EOV-Seg.git
cd EOV-Seg
pip install -r requirements.txt
```
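
A quick sanity check, assuming the steps above completed cleanly, to confirm that PyTorch sees the GPU and that detectron2 and panopticapi import:

```python
# Quick environment sanity check for the installation above.
import torch
import detectron2
import panopticapi  # noqa: F401  (import check only)

print("torch:", torch.__version__)            # expect 1.13.0+cu117
print("CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)
```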

🎯 Model Zoo

**Open-vocabulary panoptic segmentation**

| Name | Backbone | PQ | SQ | RQ | AP | mIoU | FPS | Params | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EOV-Seg (S) | ResNet50 | 15.1 | 57.0 | 18.9 | 7.2 | 21.9 | 23.8 | 71M | ckpt |
| EOV-Seg (M) | ResNet50x4 | 18.7 | 63.5 | 23.2 | 8.5 | 25.5 | 18.4 | 127M | ckpt |
| EOV-Seg (L) | ConvNeXt-L | 24.5 | 70.2 | 30.1 | 13.7 | 32.1 | 11.6 | 225M | ckpt |

**Open-vocabulary semantic segmentation**

| Name | Backbone | A-847 | PC-459 | A-150 | PC-59 | PAS-20 | FPS | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EOV-Seg (S) | ResNet50 | 6.6 | 11.5 | 21.9 | 46.0 | 87.2 | 24.5 | ckpt |
| EOV-Seg (M) | ResNet50x4 | 7.8 | 12.2 | 25.5 | 51.8 | 91.2 | 18.9 | ckpt |
| EOV-Seg (L) | ConvNeXt-L | 12.8 | 16.8 | 32.1 | 56.9 | 94.8 | 11.8 | ckpt |

🧰 Usage

1. Please follow this guide to prepare the datasets for training. The data should be organized as follows (a small sanity-check script is given after the tree):

```
datasets/
    coco/
        annotations/
        {train, val}2017/
        panoptic_{train, val}2017/
        panoptic_semseg_{train, val}2017/
        stuffthingmaps_detectron2/
    ADEChallengeData2016/
        images/
        annotations/
        annotations_instance/
        annotations_detectron2/
        ade20k_panoptic_{train, val}/
        ade20k_panoptic_{train,val}.json
        ade20k_instance_{train,val}.json
    ADE20K_2021_17_01/
        images/
        images_detectron2/
        annotations_detectron2/
    VOCdevkit/
        VOC2012/
            Annotations/
            JPEGImages/
            ImageSets/
                Segmentation/
        VOC2010/
            JPEGImages/
            trainval/
            trainval_merged.json
    pascal_voc_d2/
        images/
        annotations_pascal21/
        annotations_pascal20/
    pascal_ctx_d2/
        images/
        annotations_ctx59/
        annotations_ctx459/
```
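
Before launching training, it can save time to confirm the layout above is in place. A minimal sketch (the directory list below is a subset of the tree above; extend it as needed):

```python
# Hypothetical helper: verify that the expected dataset folders exist
# before launching training. Paths mirror the tree above.
from pathlib import Path

EXPECTED = [
    "coco/annotations",
    "coco/panoptic_train2017",
    "coco/panoptic_semseg_train2017",
    "ADEChallengeData2016/images",
    "ADE20K_2021_17_01/images",
    "VOCdevkit/VOC2012/JPEGImages",
]

root = Path("datasets")
missing = [p for p in EXPECTED if not (root / p).exists()]
if missing:
    print("Missing dataset folders:")
    for p in missing:
        print("  -", p)
else:
    print("All expected dataset folders found.")
```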
2. To train a model, use:

```bash
# For ConvNeXt-Large variant
python train_net.py --num-gpus 4 --config-file configs/eov_seg/eov_seg_convnext_l.yaml
# For ResNet-50x4 variant
python train_net.py --num-gpus 4 --config-file configs/eov_seg/eov_seg_r50x4.yaml
# For ResNet-50 variant
python train_net.py --num-gpus 4 --config-file configs/eov_seg/eov_seg_r50.yaml
```
3. To evaluate a model's performance, use:

```bash
# For ConvNeXt-Large variant
python train_net.py --config-file configs/eov_seg/eov_seg_convnext_l.yaml --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
# For ResNet-50x4 variant
python train_net.py --config-file configs/eov_seg/eov_seg_r50x4.yaml --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
# For ResNet-50 variant
python train_net.py --config-file configs/eov_seg/eov_seg_r50.yaml --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
4. To run the inference demo with pre-trained models, use:

```bash
python demo/demo.py --config-file configs/eov_seg/eov_seg_convnext_l.yaml \
                    --input input_dir/ \
                    --output output_dir/ \
                    --opts MODEL.WEIGHTS /path/to/checkpoint_file
```

🔍 Citation

```bibtex
@article{niu2024eov,
  title={EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation},
  author={Niu, Hongwei and Hu, Jie and Lin, Jianghang and Zhang, Shengchuan},
  journal={arXiv preprint arXiv:2412.08628},
  year={2024}
}
```

📜 License

EOV-Seg is released under the Apache 2.0 license. Please review the LICENSE file carefully, especially if you are using our code for commercial purposes.

💖 Acknowledgement
