ICCV2023-Papers-with-Code

A collection of ICCV 2023 papers and open-source projects (papers with code)!

2160 papers accepted!

ICCV 2023 accepted paper IDs: https://t.co/A0mCH8gbOi

Note 1: Everyone is welcome to submit issues to share ICCV 2023 papers and open-source projects!

Note 2: For papers from previous top CV conferences, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision

ICCV 2021

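If you want to keep up with the latest and best CV papers, open-source projects, and learning resources, scan the QR code to join the 【CVer Academic Exchange Group】! Let's learn from each other and improve together~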

【ICCV 2023 Open-Source Papers: Table of Contents】

Backbone
CLIP
MAE
GAN
GNN
MLP
NAS
OCR
NeRF
DETR
Prompt
Diffusion Models
Avatars
ReID (Re-identification)
Long-Tail Distribution
Vision Transformer
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Visual Tracking
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Medical Image Classification
Medical Image Segmentation
Video Object Segmentation
Video Instance Segmentation
Referring Image Segmentation
Image Matting
Low-level Vision
Super-Resolution
Denoising
Deblur
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Semantic Scene Completion
3D Registration
3D Human Pose Estimation
3D Human Mesh Estimation
Medical Image
Image Generation
Video Generation
Image Editing
Video Editing
Video Understanding
Human Motion Generation
Low-light Image Enhancement
Scene Text Recognition
Image Retrieval
Image Fusion
Trajectory Prediction
Crowd Counting
Video Quality Assessment
Others

Avatars

Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

Paper: https://arxiv.org/abs/2303.17606
Code: https://github.com/songrise/AvatarCraft

Backbone

Rethinking Mobile Block for Efficient Attention-based Models

Paper: https://arxiv.org/abs/2301.01146
Code: https://github.com/zhangzjn/EMO

CLIP

PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

Paper: https://arxiv.org/abs/2307.15199
Code: https://PromptStyler.github.io/

CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation

Paper: https://arxiv.org/abs/2308.15226
Code: http://www.github.com/devaansh100/CLIPTrans

NeRF

IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis

Homepage: https://zju3dv.github.io/intrinsic_nerf/
Paper: https://arxiv.org/abs/2210.00647
Code: https://github.com/zju3dv/IntrinsicNeRF

Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

Paper: https://arxiv.org/abs/2303.17606
Code: https://github.com/songrise/AvatarCraft

FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis

Homepage: https://shawn615.github.io/flipnerf/
Code: https://github.com/shawn615/FlipNeRF
Paper: https://arxiv.org/abs/2306.17723

Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields

Homepage: https://wbhu.github.io/projects/Tri-MipRF
Paper: https://arxiv.org/abs/2307.11335
Code: https://github.com/wbhu/Tri-MipRF

Diffusion Models

PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment

Paper: https://arxiv.org/abs/2306.15667
Code: https://github.com/facebookresearch/PoseDiffusion

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Paper: https://arxiv.org/abs/2303.09833
Code: https://github.com/vvictoryuki/FreeDoM

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Paper: https://arxiv.org/abs/2307.10816
Code: https://github.com/Sierkinhane/BoxDiff

BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction

Paper: https://arxiv.org/abs/2211.14304
Code: https://github.com/BarqueroGerman/BeLFusion

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

Paper: https://arxiv.org/abs/2303.06840
Code: https://github.com/Zhaozixiang1228/MMIF-DDFM

DIRE for Diffusion-Generated Image Detection

Paper: https://arxiv.org/abs/2303.09295
Code: https://github.com/ZhendongWang6/DIRE

Prompt

Read-only Prompt Optimization for Vision-Language Few-shot Learning

Paper: https://arxiv.org/abs/2308.14960
Code: https://github.com/mlvlab/RPO

Introducing Language Guidance in Prompt-based Continual Learning

Paper: https://arxiv.org/abs/2308.15827
Code: None

Vision-Language

Read-only Prompt Optimization for Vision-Language Few-shot Learning

Paper: https://arxiv.org/abs/2308.14960
Code: https://github.com/mlvlab/RPO

Object Detection

FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs

Paper: https://arxiv.org/abs/2301.06719
Code: https://github.com/yh-pengtu/FemtoDet

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

Paper: https://arxiv.org/abs/2207.13085
Code: https://github.com/Atten4Vis/GroupDETR

Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

Paper: https://arxiv.org/abs/2205.09613
Code: https://github.com/LiewFeng/imTED

ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation

Paper: https://arxiv.org/abs/2308.09242
Code: https://github.com/iSEE-Laboratory/ASAG

Visual Tracking

Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers

Paper: https://arxiv.org/abs/2307.04129
Code: https://github.com/ZHU-Zhiyu/High-Rank_RGB-Event_Tracker

Semantic Segmentation

Segment Anything

Homepage: https://segment-anything.com/
Paper: https://arxiv.org/abs/2304.02643
Code: https://github.com/facebookresearch/segment-anything

MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2304.09913
Code: https://github.com/shjo-april/MARS

FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation

Paper: https://arxiv.org/abs/2307.07245
Code: https://github.com/TY-Shi/FreeCOS

Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation

Paper: https://arxiv.org/abs/2211.14512
Code: https://github.com/yyliu01

Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement

Paper: https://arxiv.org/abs/2307.09362
Code: https://github.com/w1oves/DTP

Video Object Segmentation

Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus

Paper: https://arxiv.org/abs/2207.01203
Code: https://github.com/lxa9867/R2VOS

Video Instance Segmentation

DVIS: Decoupled Video Instance Segmentation Framework

Paper: https://arxiv.org/abs/2306.03413
Code: https://github.com/zhang-tao-whu/DVIS

Medical Image Classification

BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification

Paper: https://arxiv.org/abs/2203.01937
Code: https://github.com/cyh-0/BoMD

Medical Image Segmentation

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Paper: https://arxiv.org/abs/2301.00785
Code: https://github.com/ljwztc/CLIP-Driven-Universal-Model

Low-level Vision

Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive

Paper: https://arxiv.org/abs/2305.19862
Code: https://github.com/shangwei5/SelfDRSC

Super-Resolution

Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution

Paper: https://arxiv.org/abs/2303.08942
Code: https://github.com/Zhaozixiang1228/GDSR-SSDNet

3D Point Cloud

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

Homepage: https://ldkong.com/Robo3D
Paper: https://arxiv.org/abs/2303.17597
Code: https://github.com/ldkong1205/Robo3D

Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models

Paper: https://arxiv.org/abs/2304.07221
Code: https://github.com/zyh16143998882/ICCV23-IDPT

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

Paper: https://arxiv.org/abs/2308.09247
Code: None

3D Object Detection

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

Paper: https://arxiv.org/abs/2206.01256
Code: https://github.com/megvii-research/PETR

DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection

Paper: https://arxiv.org/abs/2304.13031
Code: https://github.com/AIR-DISCOVER/DQS3D

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

Paper: https://arxiv.org/abs/2304.14340
Code: https://github.com/yichen928/SparseFusion

StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

Paper: https://arxiv.org/abs/2303.11926
Code: https://github.com/exiawsh/StreamPETR.git

Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

Paper: https://arxiv.org/abs/2301.01283
Code: https://github.com/junjie18/CMT.git

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

Paper: https://arxiv.org/abs/2304.09801
Homepage: https://chongjiange.github.io/metabev.html
Code: https://github.com/ChongjianGE/MetaBEV

Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling

Paper: https://arxiv.org/abs/2307.07944
Code: https://github.com/zhuoxiao-chen/ReDB-DA-3Ddet

SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection

Paper: https://arxiv.org/abs/2307.11477
Code: https://github.com/mengtan00/SA-BEV

3D Semantic Segmentation

Rethinking Range View Representation for LiDAR Segmentation

Homepage: https://ldkong.com/RangeFormer
Paper: https://arxiv.org/abs/2303.05367
Code: None

3D Object Tracking

MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors

Paper: https://arxiv.org/abs/2303.05071
Code: https://github.com/slothfulxtx/MBPTrack3D

Video Understanding

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Paper: https://arxiv.org/abs/2303.16058
Code: https://github.com/OpenGVLab/unmasked_teacher

Image Generation

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Paper: https://arxiv.org/abs/2303.09833
Code: https://github.com/vvictoryuki/FreeDoM

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Paper: https://arxiv.org/abs/2307.10816
Code: https://github.com/Sierkinhane/BoxDiff

Video Generation

Simulating Fluids in Real-World Still Images

Homepage: https://slr-sfs.github.io/
Paper: https://arxiv.org/abs/2204.11335
Code: https://github.com/simon3dv/SLR-SFS

Image Editing

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

Paper: https://arxiv.org/abs/2304.02051
Code: https://github.com/aimagelab/multimodal-garment-designer

Video Editing

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

Homepage: https://fate-zero-edit.github.io/
Paper: https://arxiv.org/abs/2303.09535
Code: https://github.com/ChenyangQiQi/FateZero

Human Motion Generation

BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction

Paper: https://arxiv.org/abs/2211.14304
Code: https://github.com/BarqueroGerman/BeLFusion

Low-light Image Enhancement

Implicit Neural Representation for Cooperative Low-light Image Enhancement

Paper: https://arxiv.org/abs/2303.11722
Code: https://github.com/Ysz2022/NeRCo

Scene Text Detection

Scene Text Recognition

Self-supervised Character-to-Character Distillation for Text Recognition

Paper: https://arxiv.org/abs/2211.00288
Code: https://github.com/TongkunGuan/CCD

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

Paper: https://arxiv.org/abs/2305.14758
Code: https://github.com/simplify23/MRN
Chinese write-up: https://zhuanlan.zhihu.com/p/643948935

Image Retrieval

Zero-Shot Composed Image Retrieval with Textual Inversion

Paper: https://arxiv.org/abs/2303.15247
Code: https://github.com/miccunifi/SEARLE

Image Fusion

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

Paper: https://arxiv.org/abs/2303.06840
Code: https://github.com/Zhaozixiang1228/MMIF-DDFM

Trajectory Prediction

EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting

Homepage: https://inhwanbae.github.io/publication/eigentrajectory/
Paper: https://arxiv.org/abs/2307.09306
Code: https://github.com/InhwanBae/EigenTrajectory

Crowd Counting

Point-Query Quadtree for Crowd Counting, Localization, and More

Paper: https://arxiv.org/abs/2308.13814
Code: https://github.com/cxliu0/PET

Video Quality Assessment

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

Paper: https://arxiv.org/abs/2211.04894
Code: https://github.com/VQAssessment/DOVER

Others

MotionBERT: A Unified Perspective on Learning Human Motion Representations

Homepage: https://motionbert.github.io/
Paper: https://arxiv.org/abs/2210.06551
Code: https://github.com/Walter0807/MotionBERT

Graph Matching with Bi-level Noisy Correspondence

Paper: https://arxiv.org/pdf/2212.04085.pdf
Code: https://github.com/Lin-Yijie/Graph-Matching-Networks/tree/main/COMMON

LDL: Line Distance Functions for Panoramic Localization

Paper: https://arxiv.org/abs/2308.13989
Code: https://github.com/82magnolia/panoramic-localization

Active Neural Mapping

Homepage: https://zikeyan.github.io/active-INR/index.html
Paper: https://arxiv.org/abs/2308.16246
Code: https://zikeyan.github.io/active-INR/index.html#

Reconstructing Groups of People with Hypergraph Relational Reasoning

Paper: https://arxiv.org/abs/2308.15844
Code: https://github.com/boycehbz/GroupRec
