CVPR-2022

Papers and code from CVPR 2022, including the scripts used to extract them.
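The extraction scripts themselves are not reproduced in this listing. As a rough sketch only, and assuming the tables are stored as tab-separated text with a `Paper Id`, `Paper Title`, `Link` header under each category heading, the rows could be read back into records like this. The helper name `parse_paper_tables` and the record keys are hypothetical and are not taken from the repository's scripts.

```python
def parse_paper_tables(text):
    """Parse category headings followed by tab-separated paper rows.

    Assumed layout (hypothetical helper, not the repository's own script):
    a heading line without tabs, a 'Paper Id<TAB>Paper Title<TAB>Link'
    header row, then one row per paper.
    """
    papers = []
    category = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        fields = line.split("\t")
        if len(fields) == 1:
            # A line with no tab is treated as a category heading.
            category = fields[0]
        elif fields[0] == "Paper Id":
            continue  # skip the column header row
        elif fields[0].isdigit():
            papers.append(
                {
                    "category": category,
                    "paper_id": int(fields[0]),
                    "title": fields[1],
                }
            )
    return papers


sample = (
    "Machine Learning\n"
    "\n"
    "Paper Id\tPaper Title\tLink\n"
    "11954\tEfficient Deep Embedded Subspace Clustering\tPaper\n"
    "6978\tActive Learning for Open-Set Annotation\tPaper\n"
)

for paper in parse_paper_tables(sample):
    print(paper["category"], paper["paper_id"], paper["title"], sep=" | ")
```

Lines without a tab that come before the first table, such as the title line, are also read as headings under this assumption. They are overwritten as soon as a real category heading appears.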

Machine Learning

Paper Id	Paper Title	Link
11954	Efficient Deep Embedded Subspace Clustering	Paper
11402	Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers	Paper
9445	CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data	Paper
8776	Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning	Paper
6978	Active Learning for Open-Set Annotation	Paper
9075	Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training	Paper
6601	Robust Optimization As Data Augmentation for Large-Scale Graphs	Paper
6298	A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty	Paper
6106	The Devil Is in the Margin: Margin-Based Label Smoothing for Network Calibration	Paper
6705	Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector	Paper
10071	GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning	Paper
7829	Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning	Paper
5988	A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration	Paper
2503	Learning To Learn by Jointly Optimizing Neural Architecture and Weights	Paper
9806	Learning To Prompt for Continual Learning	Paper
2016	Meta-Attention for ViT-Backed Continual Learning	Paper
1343	Multi-Frame Self-Supervised Depth With Transformers	Paper
10018	Continual Learning With Lifelong Vision Transformer	Paper
780	Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation	Paper
4874	Revisiting Random Channel Pruning for Neural Network Compression	Paper
8330	Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase	Paper
9551	Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning	Paper
10484	Towards Robust and Reproducible Active Learning Using Neural Networks	Paper
7082	Non-Iterative Recovery From Nonlinear Observations Using Generative Models	Paper
11093	Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders	Paper
4542	Robust Combination of Distributed Gradients Under Adversarial Perturbations	Paper
11143	Do Learned Representations Respect Causal Relationships?	Paper
11220	How Much More Data Do I Need? Estimating Requirements for Downstream Tasks	Paper
8156	Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees	Paper
11131	Contrastive Test-Time Adaptation	Paper
448	AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation	Paper
1561	Selective-Supervised Contrastive Learning With Noisy Labels	Paper
7807	RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks	Paper
3279	Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction	Paper

Statistical Methods

Paper Id	Paper Title	Link
3348	Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels	Paper
7912	Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design	Paper
8877	Learning Structured Gaussians To Approximate Deep Ensembles	Paper
11673	Out-of-Distribution Generalization With Causal Invariant Transformations	Paper
8393	Split Hierarchical Variational Compression	Paper
9244	Implicit Feature Decoupling With Depthwise Quantization	Paper
282	Understanding Uncertainty Maps in Vision With Statistical Testing	Paper

Optimization Methods

Paper Id	Paper Title	Link
785	A Hybrid Quantum-Classical Algorithm for Robust Fitting	Paper
5911	A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching	Paper
6021	FastDOG: Fast Discrete Optimization on GPU	Paper
9232	Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization	Paper
10092	AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks	Paper
11171	Training Quantised Neural Networks With STE Variants: The Additive Noise Annealing Algorithm	Paper
2028	AME: Attention and Memory Enhancement in Hyper-Parameter Optimization	Paper
11189	Efficient Maximal Coding Rate Reduction by Variational Forms	Paper
10155	A Unified Framework for Implicit Sinkhorn Differentiation	Paper
6845	Computing Wasserstein-p Distance Between Images With Linear Cost	Paper
9064	An Iterative Quantum Approach for Transformation Estimation From Point Sets	Paper

Deep Learning Architectures & Techniques

Paper Id	Paper Title	Link
116	Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training?	Paper
5389	BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule	Paper
7704	Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search	Paper
4143	Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search	Paper
5167	GreedyNASv2: Greedier Search With a Greedy Path Filter	Paper
1115	Neural Architecture Search With Representation Mutual Information	Paper
7148	Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search	Paper
8841	Knowledge Distillation With the Reused Teacher Classifier	Paper
2812	Self-Distillation From the Last Mini-Batch for Consistency Regularization	Paper
142	Decoupled Knowledge Distillation	Paper
7053	Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs	Paper
123	A ConvNet for the 2020s	Paper
7254	Beyond Fixation: Dynamic Window Visual Transformer	Paper
7867	Lite Vision Transformer With Enhanced Self-Attention	Paper
7428	Swin Transformer V2: Scaling Up Capacity and Resolution	Paper
4325	The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy	Paper
9412	MulT: An End-to-End Multitask Learning Transformer	Paper
3664	Towards Robust Vision Transformer	Paper
9773	DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers	Paper
2434	MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens	Paper
1032	NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition	Paper
2029	TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation	Paper
4853	Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation	Paper
10350	Scaling Vision Transformers	Paper
7298	Bridged Transformer for Vision and Point Cloud 3D Object Detection	Paper
1981	CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows	Paper
3562	TransMix: Attend To Mix for Vision Transformers	Paper
2388	MiniViT: Compressing Vision Transformers With Weight Multiplexing	Paper
11460	Fine-Tuning Image Transformers Using Learnable Memory	Paper
4430	Patch Slimming for Efficient Vision Transformers	Paper
5093	CMT: Convolutional Neural Networks Meet Vision Transformers	Paper
6795	Multimodal Token Fusion for Vision Transformers	Paper

Recognition: Detection, Categorization, Retrieval

Paper Id	Paper Title	Link
2257	Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation	Paper
8811	Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model	Paper
184	Sign Language Video Retrieval With Free-Form Textual Queries	Paper
11661	FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback	Paper
4918	Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation	Paper
9957	ESCNet: Gaze Target Detection With the Understanding of 3D Scenes	Paper
2489	Interactive Multi-Class Tiny-Object Detection	Paper
9614	Weakly Supervised Rotation-Invariant Aerial Object Detection Network	Paper
8402	Large Loss Matters in Weakly Supervised Multi-Label Classification	Paper
8000	MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning	Paper
1233	FreeSOLO: Learning To Segment Objects Without Annotations	Paper
2645	Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection	Paper
3784	SIOD: Single Instance Annotated per Category per Image for Object Detection	Paper
4574	Towards Robust Adaptive Object Detection Under Noisy Annotations	Paper
3139	Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection	Paper
3751	Salvage of Supervision in Weakly Supervised Object Detection	Paper
6430	Label, Verify, Correct: A Simple Few Shot Object Detection Method	Paper
944	Background Activation Suppression for Weakly Supervised Object Localization	Paper
4063	Bridging the Gap Between Classification and Localization for Weakly Supervised Object Localization	Paper
2560	Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery	Paper
6708	Cloth-Changing Person Re-Identification From a Single Image With Gait Prediction and Regularization	Paper
1508	Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation	Paper
4122	Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification	Paper
10049	Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification	Paper
7097	Towards Total Recall in Industrial Anomaly Detection	Paper
1207	H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection	Paper
4192	Geometric and Textural Augmentation for Domain Gap Reduction	Paper
10135	General Incremental Learning With Domain-Aware Categorical Representations	Paper
491	DST: Dynamic Substitute Training for Data-Free Black-Box Attack	Paper
8711	ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation	Paper

Segmentation, Grouping and Shape Analysis

Paper Id	Paper Title	Link
6126	Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation	Paper
2899	Generalized Few-Shot Semantic Segmentation	Paper
9018	Learning Non-Target Knowledge for Few-Shot Semantic Segmentation	Paper
4783	Decoupling Zero-Shot Semantic Segmentation	Paper
1590	Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation	Paper
1034	ContrastMask: Contrastive Learning To Segment Every Thing	Paper
7789	The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference	Paper
2539	AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation	Paper
1707	APES: Articulated Part Extraction From Sprite Sheets	Paper
2544	GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation	Paper
6790	CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision	Paper
5602	Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images	Paper
3446	C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image	Paper
6268	CRIS: CLIP-Driven Referring Image Segmentation	Paper
7820	MatteFormer: Transformer-Based Image Matting via Prior-Tokens	Paper
3851	Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation	Paper
3405	Pyramid Grafting Network for One-Stage High Resolution Saliency Detection	Paper
2123	Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection	Paper
4573	Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation	Paper
5002	GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings	Paper
587	Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport	Paper
3312	CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly	Paper
1743	RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures	Paper
3978	Discovering Objects That Can Move	Paper
2604	PatchFormer: An Efficient Point Transformer With Patch Attention	Paper
4099	Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap	Paper
3933	SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation	Paper
4983	An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation	Paper
7469	Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation	Paper
4583	Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders	Paper

3D From Single Images

Paper Id	Paper Title	Link
41	360MonoDepth: High-Resolution 360° Monocular Depth Estimation	Paper
4391	Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction	Paper
6088	DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation	Paper
2989	MonoGround: Detecting Monocular 3D Objects From the Ground	Paper
2686	3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow	Paper
2657	Toward Practical Monocular Indoor Depth Estimation	Paper
4692	Focal Length and Object Pose Estimation via Render and Compare	Paper
6311	CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields	Paper
2116	Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images	Paper
8082	Layered Depth Refinement With Mask Guidance	Paper
1031	HEAT: Holistic Edge Attention Transformer for Structured Reconstruction	Paper
931	BARC: Learning To Regress 3D Dog Shape From Images by Exploiting Breed Information	Paper
8688	Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving	Paper
816	What's in Your Hands? 3D Reconstruction of Generic Objects in Hands	Paper
7814	3D Moments From Near-Duplicate Photos	Paper
5766	Neural Window Fully-Connected CRFs for Monocular Depth Estimation	Paper
9095	PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors	Paper
3717	CroMo: Cross-Modal Learning for Monocular Depth Estimation	Paper
258	φ-SfT: Shape-From-Template With a Physics-Based Deformation Model	Paper
923	Human-Aware Object Placement for Visual Environment Reconstruction	Paper
11298	AutoRF: Learning 3D Object Radiance Fields From Single View Observations	Paper
7080	Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation	Paper
2163	MonoScene: Monocular 3D Semantic Scene Completion	Paper
12016	GenDR: A Generalized Differentiable Renderer	Paper
4069	MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer	Paper
7078	ROCA: Robust CAD Model Retrieval and Alignment From a Single Image	Paper

Photogrammetry and Remote Sensing

Paper Id	Paper Title	Link
971	HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening	Paper
990	Revisiting Near/Remote Sensing With Geospatial Attention	Paper
2718	Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening	Paper
3511	Mutual Information-Driven Pan-Sharpening	Paper
3982	Sparse and Complete Latent Organization for Geospatial Semantic Segmentation	Paper
5907	The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions	Paper
4025	Oriented RepPoints for Aerial Object Detection	Paper
6403	Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction	Paper
8986	PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images	Paper
8832	Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites	Paper

Low-Level Vision

Paper Id	Paper Title	Link
163	Bilateral Video Magnification Filter	Paper
4527	Neural Data-Dependent Transform for Learned Image Compression	Paper
4329	Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence	Paper
4093	Deep Generalized Unfolding Networks for Image Restoration	Paper
3967	Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling	Paper
9885	XYDeblur: Divide and Conquer for Single Image Deblurring	Paper
8572	Abandoning the Bayer-Filter To See in the Dark	Paper
9293	RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution	Paper
8149	All-in-One Image Restoration for Unknown Corruption	Paper
9697	Modeling sRGB Camera Noise With Normalizing Flows	Paper
3788	A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift	Paper
1431	Video Frame Interpolation Transformer	Paper
1412	The Devil Is in the Details: Window-Based Attention for Image Compression	Paper
1176	Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction	Paper
3387	RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs	Paper
3051	AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement	Paper
2882	HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging	Paper
2182	HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging	Paper
6342	Learning To Zoom Inside Camera Imaging Pipeline	Paper
335	Towards an End-to-End Framework for Flow-Guided Video Inpainting	Paper
2141	Context-Aware Video Reconstruction for Rolling Shutter Cameras	Paper
5516	CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image	Paper
4529	Global Matching With Overlapping Attention for Optical Flow Estimation	Paper
1482	CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow	Paper
1048	Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression	Paper
4286	Video Demoiréing With Relation-Based Temporal Consistency	Paper
6635	Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images	Paper
5086	Deep Constrained Least Squares for Blind Image Super-Resolution	Paper
12027	Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model	Paper
5762	Unsupervised Homography Estimation With Coplanarity-Aware GAN	Paper

Behavior Analysis

Paper Id	Paper Title	Link
2656	Self-Supervised Keypoint Discovery in Behavioral Videos	Paper
874	Learning To Align Sequential Actions in the Wild	Paper
7245	Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination	Paper
4809	End-to-End Human-Gaze-Target Detection With Transformers	Paper
7132	Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis	Paper
9590	MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction	Paper
10192	Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction	Paper
7946	End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps	Paper
754	Learning Affordance Grounding From Exocentric Images	Paper

Vision Applications & Systems

Paper Id	Paper Title	Link
1915	3D Scene Painting via Semantic Image Synthesis	Paper
6370	Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography	Paper
2264	ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection	Paper
2112	Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches	Paper
5892	Image Disentanglement Autoencoder for Steganography Without Embedding	Paper
1885	Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection	Paper
5934	Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning	Paper
1616	Density-Preserving Deep Point Cloud Compression	Paper
9360	Graph-Context Attention Networks for Size-Varied Deep Graph Matching	Paper
968	TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions	Paper
1872	ObjectFormer for Image Manipulation Detection and Localization	Paper
7760	Sequential Voting With Relational Box Fields for Active Object Detection	Paper
6580	Efficient Classification of Very Large Images With Tiny Objects	Paper
6468	Partially Does It: Towards Scene-Level FG-SBIR With Partial Input	Paper
6025	Long-Term Visual Map Sparsification With Heterogeneous GNN	Paper
141	Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association	Paper
10095	DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation	Paper
3621	Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring	Paper
1079	Rethinking Image Cropping: Exploring Diverse Compositions From Global Views	Paper
1680	Defensive Patches for Robust Recognition in the Physical World	Paper
8380	Semi-Supervised Video Paragraph Grounding With Contrastive Encoder	Paper
5336	Large-Scale Pre-Training for Person Re-Identification With Noisy Labels	Paper
1146	Meta Distribution Alignment for Generalizable Person Re-Identification	Paper
5429	FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction	Paper
2926	It's About Time: Analog Clock Reading in the Wild	Paper
9312	Consistency Driven Sequential Transformers Attention Model for Partially Observable Scenes	Paper
9923	SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles	Paper
9662	Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers	Paper
9541	Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving	Paper

Video Analysis & Understanding

Paper Id	Paper Title	Link
1811	UnweaveNet: Unweaving Activity Stories	Paper
7769	Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos	Paper
1319	Audio-Adaptive Activity Recognition Across Video Domains	Paper
6385	Frame-Wise Action Representations for Long Videos via Sequence Contrastive Learning	Paper
9349	Image Based Reconstruction of Liquids From 2D Surface Detections	Paper
1579	Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency	Paper
6891	How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs	Paper
2102	Programmatic Concept Learning for Human Motion Description and Synthesis	Paper
4326	Learning To Recognize Procedural Activities With Distant Supervision	Paper
6761	Implicit Motion Handling for Video Camouflaged Object Detection	Paper
11553	Dynamic Scene Graph Generation via Anticipatory Pre-Training	Paper
1845	Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization	Paper
3930	OCSampler: Compressing Videos to One Clip With Single-Step Sampling	Paper
5670	A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting	Paper
3981	TubeFormer-DeepLab: Video Mask Transformer	Paper
2673	ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization	Paper
2928	GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation	Paper
8639	STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction	Paper
3656	Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos	Paper
5386	End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection	Paper
5430	Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision	Paper
2018	Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement	Paper
6082	A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information	Paper
8138	Long-Short Temporal Contrastive Learning of Video Transformers	Paper
4525	Scene Consistency Representation Learning for Video Scene Segmentation	Paper
1024	Unsupervised Pre-Training for Temporal Action Localization Tasks	Paper
7000	Contrastive Learning for Unsupervised Video Highlight Detection	Paper
8133	Deformable Video Transformer	Paper
8415	Recurring the Transformer for Video Action Recognition	Paper

Image & Video Synthesis and Generation

Paper Id	Paper Title	Link
5438	Text to Image Generation With Semantic-Spatial Aware GAN	Paper
107	StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis	Paper
5345	Blended Diffusion for Text-Driven Editing of Natural Images	Paper
5128	Make It Move: Controllable Image-to-Video Generation With Text Descriptions	Paper
5317	Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model	Paper
10144	A Style-Aware Discriminator for Controllable Image Translation	Paper
8904	Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint	Paper
10441	Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks	Paper
8356	FlexIT: Towards Flexible Semantic Image Translation	Paper
4022	Modulated Contrast for Versatile Image Synthesis	Paper
8146	QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation	Paper
9818	Self-Supervised Dense Consistency Regularization for Image-to-Image Translation	Paper
155	Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation	Paper
2538	InstaFormer: Instance-Aware Image-to-Image Translation With Transformer	Paper
648	Unsupervised Image-to-Image Translation With Generative Prior	Paper
133	StylizedNeRF: Consistent 3D Scene Stylization As Stylized NeRF via 2D-3D Mutual Learning	Paper
30	NeRF-Editing: Geometry Editing of Neural Radiance Fields	Paper
8276	GeoNeRF: Generalizing NeRF With Geometry Priors	Paper
5276	Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation	Paper
10588	AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields	Paper
5174	HDR-NeRF: High Dynamic Range Neural Radiance Fields	Paper
3703	NeRFReN: Neural Radiance Fields With Reflections	Paper
4368	Neural Point Light Fields	Paper
697	3D-Aware Image Synthesis via Learning Structural and Textural Representations	Paper
6895	GIRAFFE HD: A High-Resolution 3D-Aware Generative Model	Paper
1474	Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis	Paper
5736	Bi-Level Doubly Variational Learning for Energy-Based Latent Variable Models	Paper
5811	High-Resolution Image Harmonization via Collaborative Dual Transformations	Paper
10156	Brain-Supervised Image Editing	Paper

Face & Gestures

Paper Id	Paper Title	Link
4047	HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network	Paper
6529	Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC	Paper
8566	Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning	Paper
5578	Enhancing Face Recognition With Self-Supervised 3D Reconstruction	Paper
5996	Learning To Learn Across Diverse Data Biases in Deep Face Recognition	Paper
7320	An Efficient Training Approach for Very Large Scale Face Recognition	Paper
4045	MogFace: Towards a Deeper Appreciation on Face Detection	Paper
7382	Exploring Frequency Adversarial Attacks for Face Forgery Detection	Paper
7163	End-to-End Reconstruction-Classification Learning for Face Forgery Detection	Paper
3804	Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing	Paper
9981	Privacy-Preserving Online AutoML for Domain-Specific Face Detection	Paper
891	Simulated Adversarial Testing of Face Recognition Models	Paper
5782	Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing	Paper
2510	Towards Semi-Supervised Deep Facial Expression Recognition With an Adaptive Confidence Margin	Paper
5638	Towards Accurate Facial Landmark Detection via Cascaded Transformers	Paper
3038	PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer	Paper
5557	GazeOnce: Real-Time Multi-Person Gaze Estimation	Paper
3783	Generalizing Gaze Estimation With Rotation Consistency	Paper
4512	Face Relighting With Geometrically Consistent Shadows	Paper
2485	HairMapper: Removing Hair From Portraits Using GANs	Paper
5664	Learning To Restore 3D Face From In-the-Wild Degraded Images	Paper

Document Analysis & Understanding

Paper Id	Paper Title	Link
2898	Open-Set Text Recognition via Character-Context Decoupling	Paper
3331	Neural Collaborative Graph Machines for Table Structure Recognition	Paper
4051	Revisiting Document Image Dewarping by Grid Regularization	Paper
4161	Syntax-Aware Network for Handwritten Mathematical Expression Recognition	Paper
4743	Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection	Paper
5258	Fourier Document Restoration for Robust Document Dewarping and Recognition	Paper
6276	XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding	Paper
7348	SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition	Paper
2703	Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer	Paper
3686	TableFormer: Table Structure Understanding With Transformers	Paper
8352	Knowledge Mining With Scene Text for Fine-Grained Recognition	Paper
11454	PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents	Paper

Vision & Language

Paper Id	Paper Title	Link
2043	Towards Implicit Text-Guided 3D Shape Generation	Paper
9380	Towards Language-Free Training for Text-to-Image Generation	Paper
7612	ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic	Paper
4952	EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching	Paper
7374	Hierarchical Modular Network for Video Captioning	Paper
3770	SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning	Paper
3222	End-to-End Generative Pretraining for Multimodal Video Captioning	Paper
4855	Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning	Paper
8115	Scaling Up Vision-Language Pre-Training for Image Captioning	Paper
9270	Comprehending and Ordering Semantics for Image Captioning	Paper
11498	NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge	Paper
814	Injecting Semantic Concepts Into End-to-End Image Captioning	Paper
1613	DIFNet: Boosting Visual Information Flow for Image Captioning	Paper
8224	VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning	Paper
7848	Show, Deconfound and Tell: Image Captioning With Causal Inference	Paper
9257	EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval	Paper
11667	CLIPstyler: Image Style Transfer With a Single Text Condition	Paper
4042	HairCLIP: Design Your Hair by Text and Reference Image	Paper
1965	DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting	Paper
11622	On Guiding Visual Attention With Language Specification	Paper
9610	UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog	Paper
10953	Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer	Paper
10338	LiT: Zero-Shot Transfer With Locked-Image Text Tuning	Paper
851	GroupViT: Semantic Segmentation Emerges From Text Supervision	Paper
1404	ReSTR: Convolution-Free Referring Image Segmentation Using Transformers	Paper
1565	LAVT: Language-Aware Vision Transformer for Referring Image Segmentation	Paper
7782	An Empirical Study of Training End-to-End Vision-and-Language Transformers	Paper
7761	Are Multimodal Transformers Robust to Missing Modality?	Paper

3D From Multi-View & Sensors

Paper Id	Paper Title	Link
4834	NeurMiPs: Neural Mixture of Planar Experts for View Synthesis	Paper
4419	FWD: Real-Time Novel View Synthesis With Forward Warping and Depth	Paper
441	SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images	Paper
11049	Fast, Accurate and Memory-Efficient Partial Permutation Synchronization	Paper
2015	Learning To Find Good Models in RANSAC	Paper
9080	Optimizing Elimination Templates by Greedy Parameter Search	Paper
11523	GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision	Paper
2580	HARA: A Hierarchical Approach for Robust Rotation Averaging	Paper
4166	RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging	Paper
11316	A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors	Paper
4211	ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance	Paper
6651	Self-Supervised Neural Articulated Shape and Appearance Models	Paper
6645	Virtual Elastic Objects	Paper
3282	Decoupling Makes Weakly Supervised Local Feature Better	Paper
1667	JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints	Paper
640	ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging	Paper
9217	DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering	Paper
8789	Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis	Paper
1269	Structured Local Radiance Fields for Human Avatar Modeling	Paper
4685	High-Fidelity Human Avatars From a Single RGB Camera	Paper
5827	Forecasting Characteristic 3D Poses of Human Actions	Paper
817	Virtual Correspondence: Humans as a Cue for Extreme-View Geometry	Paper
869	BEHAVE: Dataset and Method for Tracking Human Object Interactions	Paper
3549	Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives	Paper
8956	RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation	Paper
9005	NPBG++: Accelerating Neural Point-Based Graphics	Paper
5409	Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows	Paper
875	Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos	Paper

Motion & Tracking

Paper Id	Paper Title	Link
8292	TransforMatcher: Match-to-Match Attention for Semantic Correspondence	Paper
1610	Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences	Paper
2606	Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning	Paper
6011	Transforming Model Prediction for Tracking	Paper
10078	Ranking-Based Siamese Visual Tracking	Paper
3860	Correlation-Aware Deep Tracking	Paper
3825	Global Tracking via Ensemble of Local Trackers	Paper
909	Global Tracking Transformers	Paper
1198	Unified Transformer Tracker for Object Tracking	Paper
9651	Transformer Tracking With Cyclic Shifting Window Attention	Paper
7487	Spiking Transformers for Event-Based Single Object Tracking	Paper
6379	Adiabatic Quantum Computing for Multi Object Tracking	Paper
8065	HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction	Paper
2493	Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking	Paper
9395	TrackFormer: Multi-Object Tracking With Transformers	Paper
4294	Learning of Global Objective for Network Flow in Multi-Object Tracking	Paper
5264	LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking	Paper
3128	Multi-Object Tracking Meets Moving UAV	Paper
912	Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline	Paper
2683	Unsupervised Domain Adaptation for Nighttime Aerial Tracking	Paper
6998	Learning Optical Flow With Kernel Patch Attention	Paper
5798	Towards Understanding Adversarial Robustness of Optical Flow Networks	Paper
5641	DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow	Paper

Pose Estimation & Tracking

Paper Id	Paper Title	Link
8367	Multi-Person Extreme Motion Prediction	Paper
51	Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation	Paper
9962	AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation	Paper
4071	Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation	Paper
6971	Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation	Paper
10385	Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video	Paper
6843	Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization	Paper
3768	Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation	Paper
2364	Location-Free Human Pose Estimation	Paper
1083	MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation	Paper
7104	Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision	Paper
1897	Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors	Paper
5115	PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound	Paper
10409	Differentiable Dynamics for Articulated 3D Human Motion Reconstruction	Paper
4352	COAP: Compositional Articulated Occupancy of People	Paper
6849	Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video	Paper
6924	SC²-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration	Paper
3094	MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video	Paper
770	Putting People in Their Place: Monocular Regression of 3D People in Depth	Paper
4288	FLAG: Flow-Based 3D Avatar Generation From Sparse Observations	Paper
896	GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping	Paper
933	Capturing and Inferring Dense Full-Body Human-Scene Contact	Paper
3301	BodyMap: Learning Full-Body Dense Correspondence Map	Paper
1209	ICON: Implicit Clothed Humans Obtained From Normals	Paper

Transfer_Low-Shot_Long-Tail Learning

Paper Id	Paper Title	Link
7748	Generating Representative Samples for Few-Shot Classification	Paper
2919	Matching Feature Sets for Few-Shot Image Classification	Paper
2525	Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations	Paper
6602	Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection	Paper
9011	Forward Compatible Few-Shot Class-Incremental Learning	Paper
10780	Constrained Few-Shot Class-Incremental Learning	Paper
9441	Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference	Paper
9456	EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning	Paper
10053	Few-Shot Learning With Noisy Labels	Paper
7988	Ranking Distance Calibration for Cross-Domain Few-Shot Learning	Paper
10614	Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning	Paper
2507	Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning	Paper
8242	Learning To Memorize Feature Hallucination for One-Shot Image Generation	Paper
48	A Closer Look at Few-Shot Image Generation	Paper
4470	Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition	Paper
2309	Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability	Paper
1534	Transferability Estimation Using Bhattacharyya Class Separability	Paper
9832	Revisiting the Transferability of Supervised Pretraining: An MLP Perspective	Paper
5990	Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data	Paper
6400	Which Model To Transfer? Finding the Needle in the Growing Haystack	Paper
7918	Does Robustness on ImageNet Transfer to Downstream Tasks?	Paper
9779	What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors	Paper
3815	OW-DETR: Open-World Detection Transformer	Paper
9180	Unseen Classes at a Later Time? No Problem	Paper
6901	Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism	Paper
5542	On Generalizing Beyond Domains in Cross-Domain Continual Learning	Paper
10123	Online Continual Learning on a Contaminated Data Stream With Blurry Task Boundaries	Paper
2527	DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion	Paper
544	Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning	Paper
2321	En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning	Paper
5161	VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning	Paper
5950	Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning	Paper
8438	KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning	Paper
6727	Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis	Paper
4846	WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery	Paper

Motion, Tracking, Registration, Vision & X, and Theory

Paper Id	Paper Title	Link
1812	MeMOT: Multi-Object Tracking With Memory	Paper
2326	Unsupervised Learning of Accurate Siamese Tracking	Paper
1995	Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds	Paper
3616	GMFlow: Learning Optical Flow via Global Matching	Paper
10012	GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking	Paper
3417	SNUG: Self-Supervised Neural Dynamic Garments	Paper
6431	Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction	Paper
10207	Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation	Paper
4351	Context-Aware Sequence Alignment Using 4D Skeletal Augmentation	Paper
10467	Enabling Equivariance for Arbitrary Lie Groups	Paper
1089	RAMA: A Rapid Multicut Algorithm on GPU	Paper
103	Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks	Paper
6427	RCP: Recurrent Closest Point for Point Cloud	Paper
6607	Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis	Paper
6810	Balanced Multimodal Learning via On-the-Fly Gradient Modulation	Paper

3D from Multiview & Sensors, Learning for Vision, Explainable Vision, and Privacy

Paper Id	Paper Title	Link
3870	Block-NeRF: Scalable Large Scene Neural View Synthesis	Paper
5472	SceneSqueezer: Learning To Compress Scene for Camera Relocalization	Paper
7077	Light Field Neural Rendering	Paper
8204	Extracting Triangular 3D Models, Materials, and Lighting From Images	Paper
8722	Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3)	Paper
1461	Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models	Paper
6131	It's All in the Teacher: Zero-Shot Quantization Brought Closer to the Teacher	Paper
6484	NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks	Paper
7060	Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention	Paper
5966	Parameter-Free Online Test-Time Adaptation	Paper
10272	Patch-Level Representation Learning for Self-Supervised Vision Transformers	Paper
11845	Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization	Paper
9568	Mixed Differential Privacy in Computer Vision	Paper
2663	DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis	Paper
11405	Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning	Paper

Computer Vision Theory

Paper Id	Paper Title	Link
11527	On the Instability of Relative Pose Estimation and RANSAC's Role	Paper
1458	Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training	Paper
2845	Global Sensing and Measurements Reuse for Image Compressed Sensing	Paper
7248	Maximum Consensus by Weighted Influences of Monotone Boolean Functions	Paper
8398	MS²DG-Net: Progressive Correspondence Learning via Multiple Sparse Semantics Dynamic Graph	Paper
6292	Styleformer: Transformer Based Generative Adversarial Networks With Style Vector	Paper
9212	Scanline Homographies for Rolling-Shutter Plane Absolute Pose	Paper

Self_Semi_Meta & Unsupervised Learning

Paper Id	Paper Title	Link
4675	Self-Supervised Models Are Continual Learners	Paper
5592	The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization	Paper
5983	Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning	Paper
7932	SimMIM: A Simple Framework for Masked Image Modeling	Paper
8651	Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning	Paper
7363	UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning	Paper
6763	Contrastive Conditional Neural Processes	Paper
1945	One-Bit Active Query With Contrastive Pairs	Paper
496	HCSC: Hierarchical Contrastive Selective Coding	Paper
4560	Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging	Paper
9291	Hierarchical Self-Supervised Representation Learning for Movie Understanding	Paper
7239	Anomaly Detection via Reverse Distillation From One-Class Embedding	Paper
8177	Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning	Paper
3636	DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning	Paper
5723	Learning To Collaborate in Decentralized Learning of Personalized Models	Paper
8083	Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph	Paper
1264	DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning	Paper
1835	Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning	Paper
1139	Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes	Paper
1554	Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors	Paper
2856	Spectral Unsupervised Domain Adaptation for Visual Recognition	Paper
1408	DATA: Domain-Aware and Task-Aware Self-Supervised Learning	Paper
2449	Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-Learning	Paper
4337	DeepDPM: Deep Clustering With an Unknown Number of Clusters	Paper
7785	PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions	Paper
9990	Robust Outlier Detection by De-Biasing VAE Likelihoods	Paper
3489	Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data	Paper
1420	CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding	Paper
10336	Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation	Paper
3423	DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation	Paper
8154	WildNet: Learning Domain Generalized Semantic Segmentation From the Wild	Paper
5616	UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation	Paper
4410	Semi-Supervised Semantic Segmentation With Error Localization Network	Paper
621	Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation	Paper
750	Integrative Few-Shot Learning for Classification and Segmentation	Paper
4568	GanOrCon: Are Generative Models Useful for Few-Shot Segmentation?	Paper
8214	SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis	Paper
1055	CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs	Paper

Privacy and Federated Learning

Paper Id	Paper Title	Link
93	GradViT: Gradient Inversion of Vision Transformers	Paper
9396	Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings	Paper
7502	CD²-pFed: Cyclic Distillation-Guided Channel Decoupling for Model Personalization in Federated Learning	Paper
6925	APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers	Paper
6650	Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning	Paper
6121	Robust Federated Learning With Noisy and Heterogeneous Clients	Paper
9724	Federated Learning With Position-Aware Neurons	Paper
10112	Layer-Wised Model Aggregation for Personalized Federated Learning	Paper
4369	FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning	Paper
2897	FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction	Paper
1250	Differentially Private Federated Learning With Local Regularization and Sparsification	Paper
1234	Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage	Paper
5568	Learn From Others and Be Yourself in Heterogeneous Federated Learning	Paper
3953	RSCFed: Random Sampling Consensus Federated Semi-Supervised Learning	Paper
2956	Federated Class-Incremental Learning	Paper
7881	Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning	Paper
8257	FedCorr: Multi-Stage Federated Learning for Label Noise Correction	Paper
6027	ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning	Paper

Explainable Computer Vision

Paper Id	Paper Title	Link
1096	Cycle-Consistent Counterfactuals by Latent Transformations	Paper
5428	Consistent Explanations by Contrastive Learning	Paper
6357	Towards Better Understanding Attribution Methods	Paper
7285	Proto2Proto: Can You Recognize the Car, the Way I Do?	Paper
7606	Do Explanations Explain? Model Knows Best	Paper
7668	HINT: Hierarchical Neuron Concept Explainer	Paper
7825	Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes	Paper
7404	What Do Navigation Agents Learn About Their Environment?	Paper
11789	A Framework for Learning Ante-Hoc Explainable Models via Concepts	Paper
778	Exploiting Explainable Metrics for Augmented SGD	Paper
8195	FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks	Paper
10710	Interactive Disentanglement: Learning Concepts by Interacting With Their Prototype Representations	Paper
6365	B-Cos Networks: Alignment Is All We Need for Interpretability	Paper
4303	The Flag Median and FlagIRLS	Paper

Transparency, Fairness, Accountability, Privacy & Ethics in Vision

Paper Id	Paper Title	Link
112	Learning Fair Classifiers With Partially Annotated Group Labels	Paper
5065	Estimating Structural Disparities for Face Models	Paper
6022	Estimating Example Difficulty Using Variance of Gradients	Paper
6962	Fairness-Aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models	Paper
9906	Fair Contrastive Learning for Facial Attribute Classification	Paper
6582	Leveraging Adversarial Examples To Quantify Membership Information Leakage	Paper
10915	Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers	Paper
11713	Deep Unlearning via Randomized Conditionally Independent Hessians	Paper
284	Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets	Paper
11071	A Study on the Distribution of Social Biases in Self-Supervised Learning Visual Models	Paper

Vision & X

Paper Id	Paper Title	Link
2658	Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices?	Paper
649	Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation	Paper
2266	SEEG: Semantic Energized Co-Speech Gesture Generation	Paper
715	Mix and Localize: Localizing Sound Sources in Mixtures	Paper
2204	Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation	Paper
7217	IntentVizor: Towards Generic Query Guided Interactive Video Summarization	Paper
11551	M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers	Paper
5355	Finding Fallen Objects via Asynchronous Audio-Visual Integration	Paper
6187	Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory	Paper
6676	Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization	Paper
10849	Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language	Paper
11001	It's Time for Artistic Correspondence in Music and Video	Paper
11391	Self-Supervised Object Detection From Audio-Visual Correspondence	Paper
8361	More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech	Paper
2475	ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer	Paper
7892	A Probabilistic Graphical Model Based on Neural-Symbolic Reasoning for Visual Relationship Detection	Paper

Image & Video Synthesis and Generation (I)

Paper Id	Paper Title	Link
4818	Diffusion Autoencoders: Toward a Meaningful and Decodable Representation	Paper
6519	Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps	Paper
11253	Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values	Paper
908	Ensembling Off-the-Shelf Models for GAN Training	Paper
10490	Marginal Contrastive Correspondence for Guided Image Generation	Paper
3437	GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation	Paper
5452	High-Resolution Image Synthesis With Latent Diffusion Models	Paper
3874	Vector Quantized Diffusion Model for Text-to-Image Synthesis	Paper
5265	ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation	Paper
790	Dataset Distillation by Matching Training Trajectories	Paper
6337	Continual Predictive Learning From Videos	Paper
11474	Motion-Adjustable Neural Implicit Video Representation	Paper
2561	Splicing ViT Features for Semantic Appearance Transfer	Paper
1064	MAT: Mask-Aware Transformer for Large Hole Image Inpainting	Paper
2344	Day-to-Night Image Synthesis for Training Nighttime Neural ISPs	Paper
5874	Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness	Paper
3576	Few-Shot Head Swapping in the Wild	Paper
5059	ClothFormer: Taming Video Virtual Try-On in All Module	Paper

Human Pose Estimation & Tracking, Localization, and Object Pose Estimation

Paper Id	Paper Title	Link
4380	Adversarial Parametric Pose Prior	Paper
4450	Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation	Paper
4806	PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision	Paper
2492	Generalizable Human Pose Triangulation	Paper
1181	GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras	Paper
1468	Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory	Paper
4014	Contextual Instance Decoupling for Robust Multi-Person Pose Estimation	Paper
2202	End-to-End Multi-Person Pose Estimation With Transformers	Paper
4534	Meta Agent Teaming Active Learning for Pose Estimation	Paper
3411	Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation	Paper
6194	Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer	Paper
8628	Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture	Paper
1445	LASER: LAtent SpacE Rendering for 2D Visual Localization	Paper
8152	Learning To Detect Scene Landmarks for Camera Localization	Paper
4196	Geometric Transformer for Fast and Robust Point Cloud Registration	Paper
7968	ARCS: Accurate Rotation and Correspondence Search	Paper
3628	FisherMatch: Semi-Supervised Rotation Regression via Entropy-Based Filtering	Paper
10439	Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation	Paper

Efficient Learning & Inference

Paper Id	Paper Title	Link
5660	CAFE: Learning To Condense Dataset by Aligning Features	Paper
9135	Lite-MDETR: A Lightweight Multi-Modal Detector	Paper
703	DeeCap: Dynamic Early Exiting for Efficient Image Captioning	Paper
10864	Searching the Deployable Convolution Neural Networks for GPUs	Paper
6685	Active Learning by Feature Mixing	Paper
6585	When To Prune? A Policy Towards Early Structural Pruning	Paper
11185	Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning	Paper
9318	How Well Do Sparse ImageNet Models Transfer?	Paper
9388	Rep-Net: Efficient On-Device Learning via Feature Reprogramming	Paper
4954	CHEX: CHannel EXploration for CNN Model Compression	Paper
3533	HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks	Paper
2934	AdaViT: Adaptive Vision Transformers for Efficient Image Recognition	Paper
1772	Cross-Image Relational Knowledge Distillation for Semantic Segmentation	Paper
724	Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error	Paper
3958	IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization	Paper
9796	DECORE: Deep Compression With Reinforcement Learning	Paper
11195	Towards Efficient and Scalable Sharpness-Aware Minimization	Paper
1088	AEGNN: Asynchronous Event-Based Graph Neural Networks	Paper
4078	DiSparse: Disentangled Sparsification for Multitask Model Compression	Paper
1836	Multi-Modal Extreme Classification	Paper
11241	A Sampling-Based Approach for Efficient Clustering in Large Datasets	Paper
11776	Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction	Paper
6380	Learnable Lookup Table for Neural Network Quantization	Paper
8374	Instance-Aware Dynamic Neural Network Quantization	Paper
10529	Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation	Paper
3265	Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction	Paper
5233	Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation	Paper
11646	PokeBNN: A Binary Pursuit of Lightweight Accuracy	Paper
2031	Automated Progressive Learning for Efficient Training of Vision Transformers	Paper
1417	DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos	Paper
9190	Channel Balancing for Accurate Quantization of Winograd Convolutions	Paper
9054	ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching	Paper
8230	Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs	Paper
4843	AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation	Paper
9210	TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing	Paper
185	SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems	Paper
6367	TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed	Paper

Physics-Based Vision and Shape-From-X

Paper Id	Paper Title	Link
793	DiLiGenT10²: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation	Paper
1453	Universal Photometric Stereo Network Using Global Lighting Contexts	Paper
2355	Uncertainty-Aware Deep Multi-View Photometric Stereo	Paper
5441	Fast Light-Weight Near-Field Photometric Stereo	Paper
4990	Glass Segmentation Using Intensity and Spectral Polarization Cues	Paper
1557	Shape From Polarization for Complex Scenes in the Wild	Paper
6107	Deep Depth From Focus With Differential Focus Volume	Paper
7381	Optimal LED Spectral Multiplexing for NIR2RGB Translation	Paper
8076	Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements	Paper
8196	NAN: Noise-Aware NeRFs for Burst-Denoising	Paper
3129	Estimating Fine-Grained Noise Model via Contrastive Learning	Paper
11094	Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders	Paper
1021	MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution	Paper
6350	PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images	Paper

Visual Reasoning

Paper Id	Paper Title	Link
10159	Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors	Paper
1531	Learning To Anticipate Future With Dynamic Context Removal	Paper
11115	Self-Supervised Spatial Reasoning on Multi-View Line Drawings	Paper
5634	Contextual Debiasing for Visual Recognition With Causal Mechanisms	Paper

Security, Transparency, Fairness, Accountability, Privacy & Ethics in Vision

Paper Id	Paper Title	Link
3468	Adversarial Texture for Fooling Person Detectors in the Physical World	Paper
4109	Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World	Paper
5922	Enhancing Classifier Conservativeness and Robustness by Polynomiality	Paper
5448	Backdoor Attacks on Self-Supervised Learning	Paper
6583	Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks	Paper
3994	Few-Shot Backdoor Defense Using Shapley Estimation	Paper
10910	Better Trigger Inversion Optimization in Backdoor Scanning	Paper
7051	Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees	Paper
9002	Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching	Paper
6908	LAS-AT: Adversarial Training With Learnable Attack Strategy	Paper
4589	Subspace Adversarial Training	Paper
5403	Pyramid Adversarial Training Improves ViT Performance	Paper
12025	Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations	Paper
2245	Robust Image Forgery Detection Over Online Social Network Shared Images	Paper
6270	Quantifying Societal Bias Amplification in Image Captioning	Paper

Image & Video Synthesis and Generation (II); Video Analysis & Understanding

Paper Id	Paper Title	Link
725	Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models	Paper
706	GAN-Supervised Dense Visual Alignment	Paper
8416	Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator	Paper
8925	Text2Mesh: Text-Driven Neural Stylization for Meshes	Paper
6649	StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation	Paper
7720	Physical Simulation Layer for Accurate 3D Modeling	Paper
717	Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time	Paper
3579	Neural Texture Extraction and Distribution for Controllable Person Image Synthesis	Paper
6545	I M Avatar: Implicit Morphable Head Avatars From Videos	Paper
549	RCL: Recurrent Continuous Localization for Temporal Action Detection	Paper
4317	Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection	Paper
729	MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition	Paper
9219	TubeR: Tubelet Transformer for Video Action Detection	Paper
8613	MixFormer: End-to-End Tracking With Iterative Mixed Attention	Paper

Recognition, Learning for Vision, and Robot Vision

Paper Id	Paper Title	Link
5905	DN-DETR: Accelerate DETR Training by Introducing Query DeNoising	Paper
7010	Proper Reuse of Image Classification Features Improves Object Detection	Paper
8646	Boosting 3D Object Detection by Simulating Multimodality on Point Clouds	Paper
10578	TransVPR: Transformer-Based Place Recognition With Multi-Level Attention Aggregation	Paper
9856	Disentangling Visual Embeddings for Attributes and Objects	Paper
1856	QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection	Paper
5517	Unknown-Aware Object Detection: Learning What You Don't Know From Videos in the Wild	Paper
3247	Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks	Paper
5972	Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective	Paper
3349	Calibrating Deep Neural Networks by Pairwise Constraints	Paper
7691	Lifelong Graph Learning	Paper
11327	OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks	Paper
10810	Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation	Paper
11529	Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches	Paper
9157	Cross-View Transformers for Real-Time Map-View Semantic Segmentation	Paper

Self_Semi_Meta & Unsupervised Learning

Paper Id	Paper Title	Link
8542	Label Matching Semi-Supervised Object Detection	Paper
10433	Multidimensional Belief Quantification for Label-Efficient Meta-Learning	Paper
11752	Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples	Paper
5537	Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification	Paper
9804	Class-Aware Contrastive Semi-Supervised Learning	Paper
4181	Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework	Paper
10916	Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo	Paper
2296	Learning Where To Learn in Cross-View Self-Supervised Learning	Paper
2487	Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective	Paper
2869	SimMatch: Semi-Supervised Learning With Similarity Matching	Paper
540	Active Teacher for Semi-Supervised Object Detection	Paper
943	Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection	Paper
5807	Self-Supervised Learning of Object Parts for Semantic Segmentation	Paper
4603	MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection	Paper
6024	Scale-Equivalent Distillation for Semi-Supervised Object Detection	Paper
6654	A Self-Supervised Descriptor for Image Copy Detection	Paper
10678	Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut	Paper
9521	CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification	Paper
11648	Semi-Supervised Few-Shot Learning via Multi-Factor Clustering	Paper
2306	CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning	Paper
2589	Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data	Paper
3172	A Simple Data Mixing Prior for Improving Self-Supervised Learning	Paper
3375	DETReg: Unsupervised Pretraining With Region Priors for Object Detection	Paper
4354	Sound and Visual Representation Learning With Multiple Pretraining Tasks	Paper
4601	UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training	Paper
1744	Weakly Supervised Object Localization As Domain Adaption	Paper
7762	Debiased Learning From Naturally Imbalanced Pseudo-Labels	Paper
3414	Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning	Paper
1546	Masked Feature Prediction for Self-Supervised Visual Pre-Training	Paper
6171	Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency	Paper
11064	Id-Free Person Similarity Learning	Paper
5962	End-to-End Semi-Supervised Learning for Video Action Detection	Paper
11772	Probabilistic Representations for Video Contrastive Learning	Paper
5904	Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition	Paper
5668	BEVT: BERT Pretraining of Video Transformers	Paper
7678	Generative Cooperative Learning for Unsupervised Video Anomaly Detection	Paper
9976	When Does Contrastive Visual Representation Learning Work?	Paper
596	The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization	Paper
5267	What Matters for Meta-Learning Vision Regression Tasks?	Paper

Robot Vision

Paper Id	Paper Title	Link
689	IFOR: Iterative Flow Minimization for Robotic Object Rearrangement	Paper
2734	TCTrack: Temporal Contexts for Aerial Tracking	Paper
2846	AKB-48: A Real-World Articulated Object Knowledge Base	Paper
4440	3DAC: Learning Attribute Compression for Point Clouds	Paper
4521	Simple but Effective: CLIP Embeddings for Embodied AI	Paper
2359	Multi-Robot Active Mapping via Neural Bipartite Graph Matching	Paper
2464	Continuous Scene Representations for Embodied AI	Paper
2923	Interactron: Embodied Adaptive Object Detection	Paper
1761	Online Learning of Reusable Abstract Models for Object Goal Navigation	Paper
3195	RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization	Paper
2684	UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation	Paper
9736	Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation	Paper
10404	Upright-Net: Learning Upright Orientation for 3D Point Cloud	Paper

Computer Vision for Social Good

Paper Id	Paper Title	Link
7865	DeepFake Disrupter: The Detector of DeepFake Is My Friend	Paper
3350	HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization	Paper
7457	Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources	Paper
9423	Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection	Paper

Adversarial Attack & Defense

Paper Id	Paper Title	Link
8193	Transferable Sparse Adversarial Attack	Paper
8898	Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection	Paper
10026	Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability	Paper
8063	Improving Adversarial Transferability via Neuron Attribution-Based Attacks	Paper
10779	Complex Backdoor Detection by Symmetric Feature Differencing	Paper
10243	Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer	Paper
11148	Zero-Query Transfer Attacks on Context-Aware Object Detectors	Paper
6302	360-Attack: Distortion-Aware Perturbations From Perspective-Views	Paper
11210	Label-Only Model Inversion Attacks via Boundary Repulsion	Paper
11207	Merry Go Round: Rotate a Frame and Fool a DNN	Paper
1485	Cross-Modal Transferable Adversarial Attacks From Images to Videos	Paper
10629	BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning	Paper
11521	Investigating Top-k White-Box and Transferable Black-Box Attack	Paper
7175	Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution	Paper
2830	Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack	Paper
3325	Towards Efficient Data Free Black-Box Adversarial Attack	Paper
3931	Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network	Paper
11451	Certified Patch Robustness via Smoothed Vision Transformers	Paper
5540	Towards Practical Certifiable Patch Defense With Vision Transformer	Paper
4282	On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles	Paper
7361	3DeformRS: Certifying Spatial Deformations on Point Clouds	Paper
4302	Stereoscopic Universal Perturbations Across Different Architectures and Datasets	Paper
4407	Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations	Paper
10883	Bounded Adversarial Attack on Deep Content Features	Paper
9811	DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints	Paper
10212	Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart	Paper
10905	Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness	Paper
7360	Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input	Paper
2205	Adversarial Eigen Attack on Black-Box Models	Paper
7620	Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond	Paper
4422	Enhancing Adversarial Training With Second-Order Statistics of Weights	Paper
9176	Towards Data-Free Model Stealing in a Hard Label Setting	Paper
9218	Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients	Paper
10096	DTA: Physical Camouflage Attacks Using Differentiable Transformation Network	Paper
1841	Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity	Paper
201	Enhancing Adversarial Robustness for Deep Metric Learning	Paper
5230	Shape-Invariant 3D Adversarial Point Clouds	Paper
5789	Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon	Paper
6161	Exploring Effective Data for Surrogate Training Towards Black-Box Attack	Paper
11698	NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models	Paper
5970	Dual-Key Multimodal Backdoors for Visual Question Answering	Paper
6546	Proactive Image Manipulation Detection	Paper

Representation Learning

Paper Id	Paper Title	Link
2347	Unified Contrastive Learning in Image-Text-Label Space	Paper
9927	AlignMixup: Improving Representations by Interpolating Aligned Features	Paper
2419	On the Road to Online Adaptation for Semantic Image Segmentation	Paper
5236	ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation	Paper
3487	Kernelized Few-Shot Object Detection With Efficient Integral Aggregation	Paper
186	Neural Mean Discrepancy for Efficient Out-of-Distribution Detection	Paper
8477	A Structured Dictionary Perspective on Implicit Neural Representations	Paper
10563	LARGE: Latent-Based Regression Through GAN Semantics	Paper
6667	Rethinking Controllable Variational Autoencoders	Paper
9016	Learning Canonical F-Correlation Projection for Compact Multiview Representation	Paper
6288	Cross-Architecture Self-Supervised Video Representation Learning	Paper
4418	Improving Video Model Transfer With Dynamic Representation Learning	Paper
5928	Self-Supervised Image Representation Learning With Geometric Set Consistency	Paper
246	HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging	Paper
4037	Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling	Paper
7362	DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds	Paper
9356	Neural Convolutional Surfaces	Paper
10032	Representing 3D Shapes With Probabilistic Directed Distance Fields	Paper
3030	H4D: Human 4D Modeling by Learning Neural Compositional Representation	Paper
518	Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification	Paper
1275	Contrastive Regression for Domain Adaptation on Gaze Estimation	Paper
9822	Forward Compatible Training for Large-Scale Embedding Retrieval Systems	Paper
4945	Improving Subgraph Recognition With Variational Graph Information Bottleneck	Paper
2508	Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss	Paper
4145	Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species	Paper

Computational Photography

Paper Id	Paper Title	Link
4111	Deep Stereo Image Compression via Bi-Directional Coding	Paper
8934	RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion	Paper
4213	Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer	Paper
2554	Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels	Paper
3021	SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization	Paper
3470	Automatic Color Image Stitching Using Quaternion Rank-1 Alignment	Paper
6712	SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing	Paper
4112	Degree-of-Linear-Polarization-Based Color Constancy	Paper
2170	Point Cloud Color Constancy	Paper
265	Boosting View Synthesis With Residual Transfer	Paper
5780	Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection	Paper
4413	Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging	Paper
479	PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition	Paper
7596	Multimodal Material Segmentation	Paper
6384	Occlusion-Aware Cost Constructor for Light Field Depth Estimation	Paper
815	Learning Neural Light Fields With Ray-Space Embedding	Paper
2268	Acquiring a Dynamic Light Field Through a Single-Shot Coded Image	Paper
4415	Gravitationally Lensed Black Hole Emission Tomography	Paper
5058	Deep Saliency Prior for Reducing Visual Distraction	Paper
8388	Personalized Image Aesthetics Assessment With Rich Attributes	Paper
6382	Artistic Style Discovery With Independent Components	Paper

Scene Analysis & Understanding

Paper Id	Paper Title	Link
2004	Noisy Boundaries: Lemon or Lemonade for Semi-Supervised Instance Segmentation?	Paper
5105	Partial Class Activation Attention for Semantic Segmentation	Paper
7261	Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers	Paper
4156	Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation	Paper
7427	Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation	Paper
4593	Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation	Paper
1567	L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation	Paper
573	Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data	Paper
1307	Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation	Paper
2748	Bending Reality: Distortion-Aware Transformers for Adapting to Panoramic Semantic Segmentation	Paper
4586	MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation	Paper
8233	NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night	Paper
6032	Fast Point Transformer	Paper
7468	RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior	Paper
807	ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes	Paper
1738	DisARM: Displacement Aware Relation Module for 3D Detection	Paper
2722	Learning Object Context for Novel-View Scene Layout Generation	Paper
2166	Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts	Paper
348	Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image	Paper
5927	Raw High-Definition Radar for Multi-Task Learning	Paper
2343	Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation	Paper
7169	UKPGAN: A General Self-Supervised Keypoint Detector	Paper
5057	Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos	Paper

Navigation & Autonomous Driving

Paper Id	Paper Title	Link
104	Rethinking Efficient Lane Detection via Curve Modeling	Paper
5623	Exploiting Temporal Relations on Radar Perception for Autonomous Driving	Paper
1321	Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective	Paper
3631	BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement	Paper
9886	ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning	Paper
607	Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion	Paper
1397	Vehicle Trajectory Prediction Works, but Not Everywhere	Paper
9659	LTP: Lane-Based Trajectory Prediction for Autonomous Driving	Paper
2468	ONCE-3DLanes: Building Monocular 3D Lane Detection	Paper
10899	Towards Driving-Oriented Metric for Lane Detection Models	Paper
6918	Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes	Paper
5120	LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection	Paper
1664	DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection	Paper
9131	A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation	Paper
7091	Forecasting From LiDAR via Future Object Detection	Paper
5998	RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding	Paper
3364	Learning From All Vehicles	Paper
10331	Is Mapping Necessary for Realistic PointGoal Navigation?	Paper
9772	Symmetry-Aware Neural Architecture for Embodied Visual Exploration	Paper
6482	Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles	Paper
10621	Topology Preserving Local Road Network Estimation From Single Onboard Camera Image	Paper
6744	Coupling Vision and Proprioception for Navigation of Legged Robots	Paper
10063	Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation	Paper
9391	3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection	Paper
4385	Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior	Paper
2537	SelfD: Self-Learning Large-Scale Driving Policies From the Web	Paper
5244	Towards Real-World Navigation With Deep Differentiable Planners	Paper
10481	Privacy Preserving Partial Localization	Paper
6490	Efficient Large-Scale Localization by Global Instance Recognition	Paper
5459	CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data	Paper

Vision & Graphics

Paper Id	Paper Title	Link
84	De-Rendering 3D Objects in the Wild	Paper
4234	Neural Fields As Learnable Kernels for 3D Reconstruction	Paper
3715	HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing	Paper
2744	3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies	Paper
2410	Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian	Paper
9567	Deep Image-Based Illumination Harmonization	Paper
1834	Glass: Geometric Latent Augmentation for Shape Spaces	Paper
1559	PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes	Paper
1478	Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes	Paper
9364	Neural Mesh Simplification	Paper
6486	SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters	Paper
7818	CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation	Paper
7841	UNIST: Unpaired Neural Implicit Shape Translation Network	Paper
1800	CoNeRF: Controllable Neural Radiance Fields	Paper
6407	Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling	Paper
8338	Modeling Indirect Illumination for Inverse Rendering	Paper
3519	Neural Head Avatars From Monocular RGB Videos	Paper
2341	DeepCurrents: Learning Implicit Representations of Shapes With Boundaries	Paper

Biometrics, Face & Gestures, and Medical Image Analysis

Paper Id	Paper Title	Link
4335	Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination	Paper
5110	AnyFace: Free-Style Text-To-Face Synthesis and Manipulation	Paper
5301	General Facial Representation Learning in a Visual-Linguistic Manner	Paper
5269	Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection	Paper
1219	Detecting Deepfakes With Self-Blended Images	Paper
5967	3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces	Paper
9638	Evaluation-Oriented Knowledge Distillation for Deep Face Recognition	Paper
6682	AdaFace: Quality Adaptive Margin for Face Recognition	Paper
6920	Moving Window Regression: A Novel Approach to Ordinal Regression	Paper
10531	FaceFormer: Speech-Driven 3D Facial Animation With Transformers	Paper
11053	Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in “In-the-Wild” Videos	Paper
229	Deep Decomposition for Stochastic Normal-Abnormal Transport	Paper
3114	DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification	Paper
10426	Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification	Paper
10994	Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations	Paper

Scene & Shape Analysis and Understanding

Paper Id	Paper Title	Link
4710	VRDFormer: End-to-End Video Visual Relation Detection With Transformers	Paper
720	Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation	Paper
9896	Visual Acoustic Matching	Paper
5847	The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation	Paper
4283	Learning Multiple Dense Prediction Tasks From Partially Annotated Data	Paper
9443	PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning	Paper
5513	Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture	Paper
5826	FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation	Paper
3020	Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding	Paper
11019	Equivariant Point Cloud Analysis via Learning Orientations for Message Passing	Paper
2137	Surface Representation for Point Clouds	Paper
3284	Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds	Paper
3846	3D Common Corruptions and Data Augmentation	Paper
4027	INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation	Paper
11446	How Much Does Input Data Type Impact Final Face Model Accuracy?	Paper

Datasets & Evaluation, Action & Event Recognition, and Visual Question Answering

Paper Id	Paper Title	Link
7484	Ego4D: Around the World in 3,000 Hours of Egocentric Video	Paper
10504	TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting	Paper
5075	Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding	Paper
2465	vCLIMB: A Novel Video Class Incremental Learning Benchmark	Paper
2221	Opening Up Open World Tracking	Paper
1795	Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions	Paper
8910	CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters	Paper
11289	Failure Modes of Domain Generalization Algorithms	Paper
9398	A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes	Paper
6567	Grounding Answers for Visual Questions Asked by Visually Impaired People	Paper
6719	Learning To Answer Questions in Dynamic Audio-Visual Scenarios	Paper
1780	Episodic Memory Question Answering	Paper
11561	ScanQA: 3D Question Answering for Spatial Scene Understanding	Paper
5943	Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles	Paper
8893	BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild	Paper

Scene Analysis and Understanding

Paper Id	Paper Title	Link
98	Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation	Paper
2849	Structured Sparse R-CNN for Direct Scene Graph Generation	Paper
10248	PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation	Paper
3738	RU-Net: Regularized Unrolling Network for Scene Graph Generation	Paper
1142	Fine-Grained Predicates Learning for Scene Graph Generation	Paper
3323	HL-Net: Heterophily Learning Network for Scene Graph Generation	Paper
10227	SGTR: End-to-End Scene Graph Generation With Transformer	Paper
6703	Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs	Paper
8205	RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition	Paper
10369	Spatial Commonsense Graph for Object Localisation in Partial Scenes	Paper
4148	The Pedestrian Next to the Lamppost: Adaptive Object Graphs for Better Instantaneous Mapping	Paper
7832	Category-Aware Transformer Network for Better Human-Object Interaction Detection	Paper
7619	Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection	Paper
3379	Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection	Paper
10087	Human-Object Interaction Detection via Disentangled Transformer	Paper
5684	MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection	Paper
7237	GaTector: A Unified Framework for Gaze Object Prediction	Paper
6242	STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes	Paper
7926	Crowd Counting in the Frequency Domain	Paper
3876	Boosting Crowd Counting via Multifaceted Attention	Paper
6137	Rethinking Spatial Invariance of Convolutional Networks for Object Counting	Paper
6322	Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing	Paper
5725	Collaborative Transformers for Grounded Situation Recognition	Paper

Action and Event Recognition

Paper Id	Paper Title	Link
2817	Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos	Paper
5042	SVIP: Sequence VerIfication for Procedures in Videos	Paper
3292	Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency	Paper
5855	Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization	Paper
3084	GateHUB: Gated History Unit With Background Suppression for Online Action Detection	Paper
7477	E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition	Paper
4495	Hybrid Relation Guided Set Matching for Few-Shot Action Recognition	Paper
3385	Spatio-Temporal Relation Modeling for Few-Shot Action Recognition	Paper
9787	Alignment-Uniformity Aware Representation Learning for Zero-Shot Video Classification	Paper
1862	Cross-Modal Representation Learning for Zero-Shot Action Recognition	Paper
6938	Cross-Modal Background Suppression for Audio-Visual Event Localization	Paper
3142	Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization	Paper
1068	An Empirical Study of End-to-End Temporal Action Detection	Paper
11191	Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval	Paper
9295	DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition	Paper
730	MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection	Paper
2917	Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition	Paper
9674	AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition	Paper
5856	UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection	Paper
3946	Detector-Free Weakly Supervised Group Activity Recognition	Paper
2870	Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading	Paper
2752	Efficient Two-Stage Detection of Human-Object Interactions With a Novel Unary-Pairwise Transformer	Paper
517	Interactiveness Field in Human-Object Interactions	Paper
2258	GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection	Paper
7690	Object-Relation Reasoning Graph for Action Recognition	Paper
4315	UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection	Paper
1483	Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition	Paper
9379	SPAct: Self-Supervised Privacy Preservation for Action Recognition	Paper
818	Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering	Paper
28	InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition	Paper
11846	Learning Video Representations of Human Motion From Synthetic Data	Paper
6314	Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos	Paper

Biometrics

Paper Id	Paper Title	Link
1752	EyePAD++: A Distillation-Based Approach for Joint Eye Authentication and Presentation Attack Detection Using Periocular Images	Paper
3373	Gait Recognition in the Wild With Dense 3D Representations and a Benchmark	Paper
1403	Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification	Paper
3765	Lagrange Motion Analysis and View Embeddings for Improved Gait Recognition	Paper
9404	DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover's Distance Improves Out-of-Distribution Face Identification	Paper
1311	Learning Second Order Local Anomaly for General Face Forgery Detection	Paper
4821	PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition	Paper
10637	Face2Exp: Combating Data Biases for Facial Expression Recognition	Paper
11994	Local-Adaptive Face Recognition via Graph-Based Meta-Clustering and Regularized Adaptation	Paper

Face and Gestures

Paper Id	Paper Title	Link
4811	EMOCA: Emotion Driven Monocular Face Capture and Animation	Paper
6513	Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality	Paper
2290	FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset	Paper
4969	ImFace: A Nonlinear 3D Morphable Face Model With Implicit Neural Representations	Paper
3883	Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling	Paper
736	RigNeRF: Fully Controllable Neural 3D Portraits	Paper
5362	HeadNeRF: A Real-Time NeRF-Based Parametric Head Model	Paper
7738	Sparse to Dense Dynamic 3D Facial Expression Generation	Paper
812	Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion	Paper
7201	Speech Driven Tongue Animation	Paper
6728	Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition	Paper
980	gDNA: Towards Generative Detailed Neural Avatars	Paper
1874	GraFormer: Graph-Oriented Transformer for 3D Pose Estimation	Paper
10976	Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation	Paper
501	Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis	Paper
2836	PINA: Learning a Personalized Implicit Neural Avatar From a Single RGB-D Video Sequence	Paper
4356	The Wanderings of Odysseus in 3D Scenes	Paper
6883	OSSO: Obtaining Skeletal Shape From Outside	Paper
11477	LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds	Paper
3402	Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression	Paper
2046	Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation	Paper
6216	LISA: Learning Implicit Shape and Appearance of Hands	Paper
3384	MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image	Paper
5835	Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation	Paper
7098	Low-Resource Adaptation for Personalized Co-Speech Gesture Generation	Paper
921	D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions	Paper

Medical, Biological and Cell Microscopy

Paper Id	Paper Title	Link
1104	Synthetic Generation of Face Videos With Plethysmograph Physiology	Paper
9240	Contour-Hugging Heatmaps for Landmark Detection	Paper
4486	Which Images To Label for Few-Shot Medical Landmark Detection?	Paper
4473	Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography	Paper
8680	Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization	Paper
8210	Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution	Paper
6627	Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations	Paper
3856	Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation	Paper
6449	BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation	Paper
6189	Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis	Paper
9027	Towards Low-Cost and Efficient Malaria Detection	Paper
5588	ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification	Paper
2696	Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification	Paper
9084	M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer	Paper
121	Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis	Paper
10799	HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet	Paper
2649	DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations	Paper
10420	Clean Implicit 3D Structure From Noisy 2D STEM Images	Paper
7672	Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks	Paper
4123	Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment	Paper
3819	Learning Optimal K-Space Acquisition and Reconstruction Using Physics-Informed Neural Networks	Paper
2466	NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration	Paper
4362	SMPL-A: Modeling Person-Specific Deformable Anatomy	Paper
1830	DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis	Paper
1826	Affine Medical Image Registration With Coarse-To-Fine Vision Transformer	Paper
9880	Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow	Paper
1002	Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization	Paper
6023	Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation	Paper
6328	FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis	Paper
8360	Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning	Paper
5948	CellTypeGraph: A New Geometric Computer Vision Benchmark	Paper
8619	ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics	Paper

Datasets and Evaluation

Paper Id	Paper Title	Link
378	FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos	Paper
675	Multi-Dimensional, Nuanced and Subjective - Measuring the Perception of Facial Expressions	Paper
10327	DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image	Paper
583	OakInk: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction	Paper
9029	PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking	Paper
6336	Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification	Paper
3564	JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection	Paper
1672	DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion	Paper
9778	Egocentric Prediction of Action Target in 3D	Paper
1950	HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction	Paper
10801	Amodal Panoptic Segmentation	Paper
8175	Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark	Paper
4070	YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset	Paper
9179	The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting	Paper
10392	3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos	Paper
8328	AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval	Paper
4732	A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection	Paper
2077	Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities	Paper
8123	Optimal Correction Cost for Object Detection Evaluation	Paper
7936	GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains	Paper
6061	ABO: Dataset and Benchmarks for Real-World 3D Object Understanding	Paper
11100	Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation	Paper
4346	ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes	Paper
4313	DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation	Paper
4793	Open Challenges in Deep Stereo: The Booster Dataset	Paper
2647	No-Reference Point Cloud Quality Assessment via Domain Adaptation	Paper
1637	Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network	Paper
2810	How Good Is Aesthetic Ability of a Fashion Model?	Paper
656	Instance-Wise Occlusion and Depth Orders in Natural Scenes	Paper
7655	PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects	Paper
436	Replacing Labeled Real-Image Datasets With Auto-Generated Contours	Paper
7315	V2C: Visual Voice Cloning	Paper
6786	M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining	Paper
11067	It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection	Paper
4520	From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering	Paper
718	Point Cloud Pre-Training With Natural 3D Structures	Paper
1658	The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift	Paper
9913	AutoMine: An Unmanned Mine Dataset	Paper
11097	SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis	Paper
4797	BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations	Paper
2027	Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task	Paper
8222	Unifying Panoptic Segmentation for Autonomous Driving	Paper
10407	DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection	Paper
3296	SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation	Paper
11670	Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions	Paper
