ICCV2023-Papers-with-Code

A collection of ICCV 2023 papers and open-source projects (papers with code)!

2160 papers accepted!

ICCV 2023 accepted paper IDs: https://t.co/A0mCH8gbOi

Note 1: Everyone is welcome to open an issue to share ICCV 2023 papers and open-source projects!

Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

ICCV 2021

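If you want to keep up with the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the 【CVer Academic Exchange Group】! Let's learn from each other and improve together.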

【ICCV 2023 Open-Source Papers Directory】

Avatars

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

Paper: https://arxiv.org/abs/2303.17606

Code: https://github.com/songrise/AvatarCraft

Backbone

Rethinking Mobile Block for Efficient Attention-based Models

CLIP

PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation

NeRF

IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis

Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields

Diffusion Models

PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

DIRE for Diffusion-Generated Image Detection

Prompt

Read-only Prompt Optimization for Vision-Language Few-shot Learning

Introducing Language Guidance in Prompt-based Continual Learning

Vision-Language

Read-only Prompt Optimization for Vision-Language Few-shot Learning

Object Detection

FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation

Visual Tracking

Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers

Semantic Segmentation

Segment Anything

MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation

FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation

Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation

Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement

Video Object Segmentation

Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus

Video Instance Segmentation

DVIS: Decoupled Video Instance Segmentation Framework

Medical Image Classification

BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification

Medical Image Segmentation

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Low-level Vision

Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive

Super-Resolution

Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution

3D Point Cloud

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

3D Object Detection

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling

SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection

3D Semantic Segmentation

Rethinking Range View Representation for LiDAR Segmentation

3D Object Tracking

MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors

Video Understanding

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Image Generation

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Video Generation

Simulating Fluids in Real-World Still Images

Image Editing

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

Video Editing

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

Human Motion Generation

BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction

Low-light Image Enhancement

Implicit Neural Representation for Cooperative Low-light Image Enhancement

Scene Text Detection

Scene Text Recognition

Self-supervised Character-to-Character Distillation for Text Recognition

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

Image Retrieval

Zero-Shot Composed Image Retrieval with Textual Inversion

Image Fusion

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

Trajectory Prediction

EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting

Crowd Counting

Point-Query Quadtree for Crowd Counting, Localization, and More

Video Quality Assessment

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

Others

MotionBERT: A Unified Perspective on Learning Human Motion Representations

Graph Matching with Bi-level Noisy Correspondence

LDL: Line Distance Functions for Panoramic Localization

Active Neural Mapping

Reconstructing Groups of People with Hypergraph Relational Reasoning