Papers and Code from CVPR 2022, including scripts to extract them

riaz/CVPR-2022

CVPR-2022

Machine Learning

Paper Id Paper Title Link
11954 Efficient Deep Embedded Subspace Clustering Paper
11402 Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers Paper
9445 CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data Paper
8776 Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning Paper
6978 Active Learning for Open-Set Annotation Paper
9075 Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training Paper
6601 Robust Optimization As Data Augmentation for Large-Scale Graphs Paper
6298 A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty Paper
6106 The Devil Is in the Margin: Margin-Based Label Smoothing for Network Calibration Paper
6705 Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector Paper
10071 GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning Paper
7829 Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning Paper
5988 A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration Paper
2503 Learning To Learn by Jointly Optimizing Neural Architecture and Weights Paper
9806 Learning To Prompt for Continual Learning Paper
2016 Meta-Attention for ViT-Backed Continual Learning Paper
1343 Multi-Frame Self-Supervised Depth With Transformers Paper
10018 Continual Learning With Lifelong Vision Transformer Paper
780 Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation Paper
4874 Revisiting Random Channel Pruning for Neural Network Compression Paper
8330 Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase Paper
9551 Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning Paper
10484 Towards Robust and Reproducible Active Learning Using Neural Networks Paper
7082 Non-Iterative Recovery From Nonlinear Observations Using Generative Models Paper
11093 Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders Paper
4542 Robust Combination of Distributed Gradients Under Adversarial Perturbations Paper
11143 Do Learned Representations Respect Causal Relationships? Paper
11220 How Much More Data Do I Need? Estimating Requirements for Downstream Tasks Paper
8156 Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees Paper
11131 Contrastive Test-Time Adaptation Paper
448 AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation Paper
1561 Selective-Supervised Contrastive Learning With Noisy Labels Paper
7807 RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks Paper
3279 Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction Paper

Statistical Methods

Paper Id Paper Title Link
3348 Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels Paper
7912 Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design Paper
8877 Learning Structured Gaussians To Approximate Deep Ensembles Paper
11673 Out-of-Distribution Generalization With Causal Invariant Transformations Paper
8393 Split Hierarchical Variational Compression Paper
9244 Implicit Feature Decoupling With Depthwise Quantization Paper
282 Understanding Uncertainty Maps in Vision With Statistical Testing Paper

Optimization Methods

Paper Id Paper Title Link
785 A Hybrid Quantum-Classical Algorithm for Robust Fitting Paper
5911 A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching Paper
6021 FastDOG: Fast Discrete Optimization on GPU Paper
9232 Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization Paper
10092 AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks Paper
11171 Training Quantised Neural Networks With STE Variants: The Additive Noise Annealing Algorithm Paper
2028 AME: Attention and Memory Enhancement in Hyper-Parameter Optimization Paper
11189 Efficient Maximal Coding Rate Reduction by Variational Forms Paper
10155 A Unified Framework for Implicit Sinkhorn Differentiation Paper
6845 Computing Wasserstein-p Distance Between Images With Linear Cost Paper
9064 An Iterative Quantum Approach for Transformation Estimation From Point Sets Paper

Deep Learning Architectures & Techniques

Paper Id Paper Title Link
116 Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training? Paper
5389 BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule Paper
7704 Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search Paper
4143 Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search Paper
5167 GreedyNASv2: Greedier Search With a Greedy Path Filter Paper
1115 Neural Architecture Search With Representation Mutual Information Paper
7148 Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search Paper
8841 Knowledge Distillation With the Reused Teacher Classifier Paper
2812 Self-Distillation From the Last Mini-Batch for Consistency Regularization Paper
142 Decoupled Knowledge Distillation Paper
7053 Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs Paper
123 A ConvNet for the 2020s Paper
7254 Beyond Fixation: Dynamic Window Visual Transformer Paper
7867 Lite Vision Transformer With Enhanced Self-Attention Paper
7428 Swin Transformer V2: Scaling Up Capacity and Resolution Paper
4325 The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy Paper
9412 MulT: An End-to-End Multitask Learning Transformer Paper
3664 Towards Robust Vision Transformer Paper
9773 DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers Paper
2434 MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens Paper
1032 NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition Paper
2029 TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Paper
4853 Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation Paper
10350 Scaling Vision Transformers Paper
7298 Bridged Transformer for Vision and Point Cloud 3D Object Detection Paper
1981 CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows Paper
3562 TransMix: Attend To Mix for Vision Transformers Paper
2388 MiniViT: Compressing Vision Transformers With Weight Multiplexing Paper
11460 Fine-Tuning Image Transformers Using Learnable Memory Paper
4430 Patch Slimming for Efficient Vision Transformers Paper
5093 CMT: Convolutional Neural Networks Meet Vision Transformers Paper
6795 Multimodal Token Fusion for Vision Transformers Paper

Recognition: Detection, Categorization, Retrieval

Paper Id Paper Title Link
2257 Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation Paper
8811 Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model Paper
184 Sign Language Video Retrieval With Free-Form Textual Queries Paper
11661 FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback Paper
4918 Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation Paper
9957 ESCNet: Gaze Target Detection With the Understanding of 3D Scenes Paper
2489 Interactive Multi-Class Tiny-Object Detection Paper
9614 Weakly Supervised Rotation-Invariant Aerial Object Detection Network Paper
8402 Large Loss Matters in Weakly Supervised Multi-Label Classification Paper
8000 MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning Paper
1233 FreeSOLO: Learning To Segment Objects Without Annotations Paper
2645 Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection Paper
3784 SIOD: Single Instance Annotated per Category per Image for Object Detection Paper
4574 Towards Robust Adaptive Object Detection Under Noisy Annotations Paper
3139 Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection Paper
3751 Salvage of Supervision in Weakly Supervised Object Detection Paper
6430 Label, Verify, Correct: A Simple Few Shot Object Detection Method Paper
944 Background Activation Suppression for Weakly Supervised Object Localization Paper
4063 Bridging the Gap Between Classification and Localization for Weakly Supervised Object Localization Paper
2560 Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery Paper
6708 Cloth-Changing Person Re-Identification From a Single Image With Gait Prediction and Regularization Paper
1508 Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation Paper
4122 Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification Paper
10049 Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification Paper
7097 Towards Total Recall in Industrial Anomaly Detection Paper
1207 H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection Paper
4192 Geometric and Textural Augmentation for Domain Gap Reduction Paper
10135 General Incremental Learning With Domain-Aware Categorical Representations Paper
491 DST: Dynamic Substitute Training for Data-Free Black-Box Attack Paper
8711 ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation Paper

Segmentation, Grouping and Shape Analysis

Paper Id Paper Title Link
6126 Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation Paper
2899 Generalized Few-Shot Semantic Segmentation Paper
9018 Learning Non-Target Knowledge for Few-Shot Semantic Segmentation Paper
4783 Decoupling Zero-Shot Semantic Segmentation Paper
1590 Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation Paper
1034 ContrastMask: Contrastive Learning To Segment Every Thing Paper
7789 The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference Paper
2539 AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation Paper
1707 APES: Articulated Part Extraction From Sprite Sheets Paper
2544 GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation Paper
6790 CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision Paper
5602 Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images Paper
3446 C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image Paper
6268 CRIS: CLIP-Driven Referring Image Segmentation Paper
7820 MatteFormer: Transformer-Based Image Matting via Prior-Tokens Paper
3851 Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation Paper
3405 Pyramid Grafting Network for One-Stage High Resolution Saliency Detection Paper
2123 Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection Paper
4573 Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation Paper
5002 GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings Paper
587 Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport Paper
3312 CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly Paper
1743 RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures Paper
3978 Discovering Objects That Can Move Paper
2604 PatchFormer: An Efficient Point Transformer With Patch Attention Paper
4099 Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap Paper
3933 SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation Paper
4983 An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation Paper
7469 Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation Paper
4583 Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders Paper

3D From Single Images

Paper Id Paper Title Link
41 360MonoDepth: High-Resolution 360° Monocular Depth Estimation Paper
4391 Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction Paper
6088 DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation Paper
2989 MonoGround: Detecting Monocular 3D Objects From the Ground Paper
2686 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow Paper
2657 Toward Practical Monocular Indoor Depth Estimation Paper
4692 Focal Length and Object Pose Estimation via Render and Compare Paper
6311 CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields Paper
2116 Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images Paper
8082 Layered Depth Refinement With Mask Guidance Paper
1031 HEAT: Holistic Edge Attention Transformer for Structured Reconstruction Paper
931 BARC: Learning To Regress 3D Dog Shape From Images by Exploiting Breed Information Paper
8688 Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving Paper
816 What's in Your Hands? 3D Reconstruction of Generic Objects in Hands Paper
7814 3D Moments From Near-Duplicate Photos Paper
5766 Neural Window Fully-Connected CRFs for Monocular Depth Estimation Paper
9095 PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors Paper
3717 CroMo: Cross-Modal Learning for Monocular Depth Estimation Paper
258 φ-SfT: Shape-From-Template With a Physics-Based Deformation Model Paper
923 Human-Aware Object Placement for Visual Environment Reconstruction Paper
11298 AutoRF: Learning 3D Object Radiance Fields From Single View Observations Paper
7080 Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation Paper
2163 MonoScene: Monocular 3D Semantic Scene Completion Paper
12016 GenDR: A Generalized Differentiable Renderer Paper
4069 MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer Paper
7078 ROCA: Robust CAD Model Retrieval and Alignment From a Single Image Paper

Photogrammetry and Remote Sensing

Paper Id Paper Title Link
971 HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening Paper
990 Revisiting Near/Remote Sensing With Geospatial Attention Paper
2718 Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening Paper
3511 Mutual Information-Driven Pan-Sharpening Paper
3982 Sparse and Complete Latent Organization for Geospatial Semantic Segmentation Paper
5907 The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions Paper
4025 Oriented RepPoints for Aerial Object Detection Paper
6403 Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction Paper
8986 PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images Paper
8832 Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites Paper

Low-Level Vision

Paper Id Paper Title Link
163 Bilateral Video Magnification Filter Paper
4527 Neural Data-Dependent Transform for Learned Image Compression Paper
4329 Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence Paper
4093 Deep Generalized Unfolding Networks for Image Restoration Paper
3967 Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling Paper
9885 XYDeblur: Divide and Conquer for Single Image Deblurring Paper
8572 Abandoning the Bayer-Filter To See in the Dark Paper
9293 RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution Paper
8149 All-in-One Image Restoration for Unknown Corruption Paper
9697 Modeling sRGB Camera Noise With Normalizing Flows Paper
3788 A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift Paper
1431 Video Frame Interpolation Transformer Paper
1412 The Devil Is in the Details: Window-Based Attention for Image Compression Paper
1176 Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction Paper
3387 RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs Paper
3051 AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement Paper
2882 HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging Paper
2182 HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging Paper
6342 Learning To Zoom Inside Camera Imaging Pipeline Paper
335 Towards an End-to-End Framework for Flow-Guided Video Inpainting Paper
2141 Context-Aware Video Reconstruction for Rolling Shutter Cameras Paper
5516 CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image Paper
4529 Global Matching With Overlapping Attention for Optical Flow Estimation Paper
1482 CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow Paper
1048 Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression Paper
4286 Video Demoiréing With Relation-Based Temporal Consistency Paper
6635 Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images Paper
5086 Deep Constrained Least Squares for Blind Image Super-Resolution Paper
12027 Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model Paper
5762 Unsupervised Homography Estimation With Coplanarity-Aware GAN Paper

Behavior Analysis

Paper Id Paper Title Link
2656 Self-Supervised Keypoint Discovery in Behavioral Videos Paper
874 Learning To Align Sequential Actions in the Wild Paper
7245 Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination Paper
4809 End-to-End Human-Gaze-Target Detection With Transformers Paper
7132 Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis Paper
9590 MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction Paper
10192 Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction Paper
7946 End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps Paper
754 Learning Affordance Grounding From Exocentric Images Paper

Vision Applications & Systems

Paper Id Paper Title Link
1915 3D Scene Painting via Semantic Image Synthesis Paper
6370 Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography Paper
2264 ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection Paper
2112 Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches Paper
5892 Image Disentanglement Autoencoder for Steganography Without Embedding Paper
1885 Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection Paper
5934 Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning Paper
1616 Density-Preserving Deep Point Cloud Compression Paper
9360 Graph-Context Attention Networks for Size-Varied Deep Graph Matching Paper
968 TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions Paper
1872 ObjectFormer for Image Manipulation Detection and Localization Paper
7760 Sequential Voting With Relational Box Fields for Active Object Detection Paper
6580 Efficient Classification of Very Large Images With Tiny Objects Paper
6468 Partially Does It: Towards Scene-Level FG-SBIR With Partial Input Paper
6025 Long-Term Visual Map Sparsification With Heterogeneous GNN Paper
141 Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association Paper
10095 DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation Paper
3621 Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring Paper
1079 Rethinking Image Cropping: Exploring Diverse Compositions From Global Views Paper
1680 Defensive Patches for Robust Recognition in the Physical World Paper
8380 Semi-Supervised Video Paragraph Grounding With Contrastive Encoder Paper
5336 Large-Scale Pre-Training for Person Re-Identification With Noisy Labels Paper
1146 Meta Distribution Alignment for Generalizable Person Re-Identification Paper
5429 FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction Paper
2926 It's About Time: Analog Clock Reading in the Wild Paper
9312 Consistency Driven Sequential Transformers Attention Model for Partially Observable Scenes Paper
9923 SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles Paper
9662 Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers Paper
9541 Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving Paper

Video Analysis & Understanding

Paper Id Paper Title Link
1811 UnweaveNet: Unweaving Activity Stories Paper
7769 Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos Paper
1319 Audio-Adaptive Activity Recognition Across Video Domains Paper
6385 Frame-Wise Action Representations for Long Videos via Sequence Contrastive Learning Paper
9349 Image Based Reconstruction of Liquids From 2D Surface Detections Paper
1579 Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency Paper
6891 How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs Paper
2102 Programmatic Concept Learning for Human Motion Description and Synthesis Paper
4326 Learning To Recognize Procedural Activities With Distant Supervision Paper
6761 Implicit Motion Handling for Video Camouflaged Object Detection Paper
11553 Dynamic Scene Graph Generation via Anticipatory Pre-Training Paper
1845 Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization Paper
3930 OCSampler: Compressing Videos to One Clip With Single-Step Sampling Paper
5670 A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting Paper
3981 TubeFormer-DeepLab: Video Mask Transformer Paper
2673 ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization Paper
2928 GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation Paper
8639 STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction Paper
3656 Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos Paper
5386 End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection Paper
5430 Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision Paper
2018 Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement Paper
6082 A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information Paper
8138 Long-Short Temporal Contrastive Learning of Video Transformers Paper
4525 Scene Consistency Representation Learning for Video Scene Segmentation Paper
1024 Unsupervised Pre-Training for Temporal Action Localization Tasks Paper
7000 Contrastive Learning for Unsupervised Video Highlight Detection Paper
8133 Deformable Video Transformer Paper
8415 Recurring the Transformer for Video Action Recognition Paper

Image & Video Synthesis and Generation

Paper Id Paper Title Link
5438 Text to Image Generation With Semantic-Spatial Aware GAN Paper
107 StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis Paper
5345 Blended Diffusion for Text-Driven Editing of Natural Images Paper
5128 Make It Move: Controllable Image-to-Video Generation With Text Descriptions Paper
5317 Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model Paper
10144 A Style-Aware Discriminator for Controllable Image Translation Paper
8904 Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint Paper
10441 Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks Paper
8356 FlexIT: Towards Flexible Semantic Image Translation Paper
4022 Modulated Contrast for Versatile Image Synthesis Paper
8146 QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation Paper
9818 Self-Supervised Dense Consistency Regularization for Image-to-Image Translation Paper
155 Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation Paper
2538 InstaFormer: Instance-Aware Image-to-Image Translation With Transformer Paper
648 Unsupervised Image-to-Image Translation With Generative Prior Paper
133 StylizedNeRF: Consistent 3D Scene Stylization As Stylized NeRF via 2D-3D Mutual Learning Paper
30 NeRF-Editing: Geometry Editing of Neural Radiance Fields Paper
8276 GeoNeRF: Generalizing NeRF With Geometry Priors Paper
5276 Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation Paper
10588 AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields Paper
5174 HDR-NeRF: High Dynamic Range Neural Radiance Fields Paper
3703 NeRFReN: Neural Radiance Fields With Reflections Paper
4368 Neural Point Light Fields Paper
697 3D-Aware Image Synthesis via Learning Structural and Textural Representations Paper
6895 GIRAFFE HD: A High-Resolution 3D-Aware Generative Model Paper
1474 Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis Paper
5736 Bi-Level Doubly Variational Learning for Energy-Based Latent Variable Models Paper
5811 High-Resolution Image Harmonization via Collaborative Dual Transformations Paper
10156 Brain-Supervised Image Editing Paper

Face & Gestures

Paper Id Paper Title Link
4047 HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network Paper
6529 Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC Paper
8566 Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning Paper
5578 Enhancing Face Recognition With Self-Supervised 3D Reconstruction Paper
5996 Learning To Learn Across Diverse Data Biases in Deep Face Recognition Paper
7320 An Efficient Training Approach for Very Large Scale Face Recognition Paper
4045 MogFace: Towards a Deeper Appreciation on Face Detection Paper
7382 Exploring Frequency Adversarial Attacks for Face Forgery Detection Paper
7163 End-to-End Reconstruction-Classification Learning for Face Forgery Detection Paper
3804 Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing Paper
9981 Privacy-Preserving Online AutoML for Domain-Specific Face Detection Paper
891 Simulated Adversarial Testing of Face Recognition Models Paper
5782 Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing Paper
2510 Towards Semi-Supervised Deep Facial Expression Recognition With an Adaptive Confidence Margin Paper
5638 Towards Accurate Facial Landmark Detection via Cascaded Transformers Paper
3038 PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer Paper
5557 GazeOnce: Real-Time Multi-Person Gaze Estimation Paper
3783 Generalizing Gaze Estimation With Rotation Consistency Paper
4512 Face Relighting With Geometrically Consistent Shadows Paper
2485 HairMapper: Removing Hair From Portraits Using GANs Paper
5664 Learning To Restore 3D Face From In-the-Wild Degraded Images Paper

Document Analysis & Understanding

Paper Id Paper Title Link
2898 Open-Set Text Recognition via Character-Context Decoupling Paper
3331 Neural Collaborative Graph Machines for Table Structure Recognition Paper
4051 Revisiting Document Image Dewarping by Grid Regularization Paper
4161 Syntax-Aware Network for Handwritten Mathematical Expression Recognition Paper
4743 Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection Paper
5258 Fourier Document Restoration for Robust Document Dewarping and Recognition Paper
6276 XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding Paper
7348 SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition Paper
2703 Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer Paper
3686 TableFormer: Table Structure Understanding With Transformers Paper
8352 Knowledge Mining With Scene Text for Fine-Grained Recognition Paper
11454 PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents Paper

Vision & Language

Paper Id Paper Title Link
2043 Towards Implicit Text-Guided 3D Shape Generation Paper
9380 Towards Language-Free Training for Text-to-Image Generation Paper
7612 ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Paper
4952 EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching Paper
7374 Hierarchical Modular Network for Video Captioning Paper
3770 SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning Paper
3222 End-to-End Generative Pretraining for Multimodal Video Captioning Paper
4855 Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning Paper
8115 Scaling Up Vision-Language Pre-Training for Image Captioning Paper
9270 Comprehending and Ordering Semantics for Image Captioning Paper
11498 NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge Paper
814 Injecting Semantic Concepts Into End-to-End Image Captioning Paper
1613 DIFNet: Boosting Visual Information Flow for Image Captioning Paper
8224 VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning Paper
7848 Show, Deconfound and Tell: Image Captioning With Causal Inference Paper
9257 EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval Paper
11667 CLIPstyler: Image Style Transfer With a Single Text Condition Paper
4042 HairCLIP: Design Your Hair by Text and Reference Image Paper
1965 DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting Paper
11622 On Guiding Visual Attention With Language Specification Paper
9610 UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog Paper
10953 Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer Paper
10338 LiT: Zero-Shot Transfer With Locked-Image Text Tuning Paper
851 GroupViT: Semantic Segmentation Emerges From Text Supervision Paper
1404 ReSTR: Convolution-Free Referring Image Segmentation Using Transformers Paper
1565 LAVT: Language-Aware Vision Transformer for Referring Image Segmentation Paper
7782 An Empirical Study of Training End-to-End Vision-and-Language Transformers Paper
7761 Are Multimodal Transformers Robust to Missing Modality? Paper

3D From Multi-View & Sensors

Paper Id Paper Title Link
4834 NeurMiPs: Neural Mixture of Planar Experts for View Synthesis Paper
4419 FWD: Real-Time Novel View Synthesis With Forward Warping and Depth Paper
441 SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images Paper
11049 Fast, Accurate and Memory-Efficient Partial Permutation Synchronization Paper
2015 Learning To Find Good Models in RANSAC Paper
9080 Optimizing Elimination Templates by Greedy Parameter Search Paper
11523 GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision Paper
2580 HARA: A Hierarchical Approach for Robust Rotation Averaging Paper
4166 RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging Paper
11316 A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors Paper
4211 ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance Paper
6651 Self-Supervised Neural Articulated Shape and Appearance Models Paper
6645 Virtual Elastic Objects Paper
3282 Decoupling Makes Weakly Supervised Local Feature Better Paper
1667 JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints Paper
640 ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging Paper
9217 DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering Paper
8789 Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis Paper
1269 Structured Local Radiance Fields for Human Avatar Modeling Paper
4685 High-Fidelity Human Avatars From a Single RGB Camera Paper
5827 Forecasting Characteristic 3D Poses of Human Actions Paper
817 Virtual Correspondence: Humans as a Cue for Extreme-View Geometry Paper
869 BEHAVE: Dataset and Method for Tracking Human Object Interactions Paper
3549 Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives Paper
8956 RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation Paper
9005 NPBG++: Accelerating Neural Point-Based Graphics Paper
5409 Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows Paper
875 Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos Paper

Motion & Tracking

Paper Id Paper Title Link
8292 TransforMatcher: Match-to-Match Attention for Semantic Correspondence Paper
1610 Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences Paper
2606 Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning Paper
6011 Transforming Model Prediction for Tracking Paper
10078 Ranking-Based Siamese Visual Tracking Paper
3860 Correlation-Aware Deep Tracking Paper
3825 Global Tracking via Ensemble of Local Trackers Paper
909 Global Tracking Transformers Paper
1198 Unified Transformer Tracker for Object Tracking Paper
9651 Transformer Tracking With Cyclic Shifting Window Attention Paper
7487 Spiking Transformers for Event-Based Single Object Tracking Paper
6379 Adiabatic Quantum Computing for Multi Object Tracking Paper
8065 HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction Paper
2493 Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking Paper
9395 TrackFormer: Multi-Object Tracking With Transformers Paper
4294 Learning of Global Objective for Network Flow in Multi-Object Tracking Paper
5264 LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking Paper
3128 Multi-Object Tracking Meets Moving UAV Paper
912 Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline Paper
2683 Unsupervised Domain Adaptation for Nighttime Aerial Tracking Paper
6998 Learning Optical Flow With Kernel Patch Attention Paper
5798 Towards Understanding Adversarial Robustness of Optical Flow Networks Paper
5641 DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow Paper

Pose Estimation & Tracking

Paper Id Paper Title Link
8367 Multi-Person Extreme Motion Prediction Paper
51 Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation Paper
9962 AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation Paper
4071 Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation Paper
6971 Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation Paper
10385 Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video Paper
6843 Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization Paper
3768 Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation Paper
2364 Location-Free Human Pose Estimation Paper
1083 MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation Paper
7104 Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision Paper
1897 Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors Paper
5115 PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound Paper
10409 Differentiable Dynamics for Articulated 3D Human Motion Reconstruction Paper
4352 COAP: Compositional Articulated Occupancy of People Paper
6849 Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video Paper
6924 SC²-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration Paper
3094 MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video Paper
770 Putting People in Their Place: Monocular Regression of 3D People in Depth Paper
4288 FLAG: Flow-Based 3D Avatar Generation From Sparse Observations Paper
896 GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping Paper
933 Capturing and Inferring Dense Full-Body Human-Scene Contact Paper
3301 BodyMap: Learning Full-Body Dense Correspondence Map Paper
1209 ICON: Implicit Clothed Humans Obtained From Normals Paper

Transfer / Low-Shot / Long-Tail Learning

Paper Id Paper Title Link
7748 Generating Representative Samples for Few-Shot Classification Paper
2919 Matching Feature Sets for Few-Shot Image Classification Paper
2525 Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations Paper
6602 Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection Paper
9011 Forward Compatible Few-Shot Class-Incremental Learning Paper
10780 Constrained Few-Shot Class-Incremental Learning Paper
9441 Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference Paper
9456 EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning Paper
10053 Few-Shot Learning With Noisy Labels Paper
7988 Ranking Distance Calibration for Cross-Domain Few-Shot Learning Paper
10614 Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning Paper
2507 Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning Paper
8242 Learning To Memorize Feature Hallucination for One-Shot Image Generation Paper
48 A Closer Look at Few-Shot Image Generation Paper
4470 Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition Paper
2309 Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability Paper
1534 Transferability Estimation Using Bhattacharyya Class Separability Paper
9832 Revisiting the Transferability of Supervised Pretraining: An MLP Perspective Paper
5990 Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data Paper
6400 Which Model To Transfer? Finding the Needle in the Growing Haystack Paper
7918 Does Robustness on ImageNet Transfer to Downstream Tasks? Paper
9779 What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors Paper
3815 OW-DETR: Open-World Detection Transformer Paper
9180 Unseen Classes at a Later Time? No Problem Paper
6901 Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism Paper
5542 On Generalizing Beyond Domains in Cross-Domain Continual Learning Paper
10123 Online Continual Learning on a Contaminated Data Stream With Blurry Task Boundaries Paper
2527 DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion Paper
544 Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning Paper
2321 En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning Paper
5161 VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning Paper
5950 Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning Paper
8438 KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning Paper
6727 Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis Paper
4846 WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery Paper

Motion, Tracking, Registration, Vision & X, and Theory

Paper Id Paper Title Link
1812 MeMOT: Multi-Object Tracking With Memory Paper
2326 Unsupervised Learning of Accurate Siamese Tracking Paper
1995 Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds Paper
3616 GMFlow: Learning Optical Flow via Global Matching Paper
10012 GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking Paper
3417 SNUG: Self-Supervised Neural Dynamic Garments Paper
6431 Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction Paper
10207 Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation Paper
4351 Context-Aware Sequence Alignment Using 4D Skeletal Augmentation Paper
10467 Enabling Equivariance for Arbitrary Lie Groups Paper
1089 RAMA: A Rapid Multicut Algorithm on GPU Paper
103 Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks Paper
6427 RCP: Recurrent Closest Point for Point Cloud Paper
6607 Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis Paper
6810 Balanced Multimodal Learning via On-the-Fly Gradient Modulation Paper

3D from Multiview & Sensors, Learning for Vision, Explainable Vision, and Privacy

Paper Id Paper Title Link
3870 Block-NeRF: Scalable Large Scene Neural View Synthesis Paper
5472 SceneSqueezer: Learning To Compress Scene for Camera Relocalization Paper
7077 Light Field Neural Rendering Paper
8204 Extracting Triangular 3D Models, Materials, and Lighting From Images Paper
8722 Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3) Paper
1461 Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models Paper
6131 It's All in the Teacher: Zero-Shot Quantization Brought Closer to the Teacher Paper
6484 NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks Paper
7060 Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention Paper
5966 Parameter-Free Online Test-Time Adaptation Paper
10272 Patch-Level Representation Learning for Self-Supervised Vision Transformers Paper
11845 Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization Paper
9568 Mixed Differential Privacy in Computer Vision Paper
2663 DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis Paper
11405 Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning Paper

Computer Vision Theory

Paper Id Paper Title Link
11527 On the Instability of Relative Pose Estimation and RANSAC's Role Paper
1458 Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training Paper
2845 Global Sensing and Measurements Reuse for Image Compressed Sensing Paper
7248 Maximum Consensus by Weighted Influences of Monotone Boolean Functions Paper
8398 MS²DG-Net: Progressive Correspondence Learning via Multiple Sparse Semantics Dynamic Graph Paper
6292 Styleformer: Transformer Based Generative Adversarial Networks With Style Vector Paper
9212 Scanline Homographies for Rolling-Shutter Plane Absolute Pose Paper

Self-, Semi-, Meta- & Unsupervised Learning

Paper Id Paper Title Link
4675 Self-Supervised Models Are Continual Learners Paper
5592 The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization Paper
5983 Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning Paper
7932 SimMIM: A Simple Framework for Masked Image Modeling Paper
8651 Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning Paper
7363 UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning Paper
6763 Contrastive Conditional Neural Processes Paper
1945 One-Bit Active Query With Contrastive Pairs Paper
496 HCSC: Hierarchical Contrastive Selective Coding Paper
4560 Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging Paper
9291 Hierarchical Self-Supervised Representation Learning for Movie Understanding Paper
7239 Anomaly Detection via Reverse Distillation From One-Class Embedding Paper
8177 Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning Paper
3636 DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning Paper
5723 Learning To Collaborate in Decentralized Learning of Personalized Models Paper
8083 Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph Paper
1264 DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning Paper
1835 Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning Paper
1139 Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes Paper
1554 Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors Paper
2856 Spectral Unsupervised Domain Adaptation for Visual Recognition Paper
1408 DATA: Domain-Aware and Task-Aware Self-Supervised Learning Paper
2449 Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-Learning Paper
4337 DeepDPM: Deep Clustering With an Unknown Number of Clusters Paper
7785 PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions Paper
9990 Robust Outlier Detection by De-Biasing VAE Likelihoods Paper
3489 Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data Paper
1420 CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding Paper
10336 Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation Paper
3423 DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation Paper
8154 WildNet: Learning Domain Generalized Semantic Segmentation From the Wild Paper
5616 UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation Paper
4410 Semi-Supervised Semantic Segmentation With Error Localization Network Paper
621 Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation Paper
750 Integrative Few-Shot Learning for Classification and Segmentation Paper
4568 GanOrCon: Are Generative Models Useful for Few-Shot Segmentation? Paper
8214 SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis Paper
1055 CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs Paper

Privacy and Federated Learning

Paper Id Paper Title Link
93 GradViT: Gradient Inversion of Vision Transformers Paper
9396 Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings Paper
7502 CD²-pFed: Cyclic Distillation-Guided Channel Decoupling for Model Personalization in Federated Learning Paper
6925 APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers Paper
6650 Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning Paper
6121 Robust Federated Learning With Noisy and Heterogeneous Clients Paper
9724 Federated Learning With Position-Aware Neurons Paper
10112 Layer-Wised Model Aggregation for Personalized Federated Learning Paper
4369 FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning Paper
2897 FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction Paper
1250 Differentially Private Federated Learning With Local Regularization and Sparsification Paper
1234 Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage Paper
5568 Learn From Others and Be Yourself in Heterogeneous Federated Learning Paper
3953 RSCFed: Random Sampling Consensus Federated Semi-Supervised Learning Paper
2956 Federated Class-Incremental Learning Paper
7881 Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning Paper
8257 FedCorr: Multi-Stage Federated Learning for Label Noise Correction Paper
6027 ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning Paper

Explainable Computer Vision

Paper Id Paper Title Link
1096 Cycle-Consistent Counterfactuals by Latent Transformations Paper
5428 Consistent Explanations by Contrastive Learning Paper
6357 Towards Better Understanding Attribution Methods Paper
7285 Proto2Proto: Can You Recognize the Car, the Way I Do? Paper
7606 Do Explanations Explain? Model Knows Best Paper
7668 HINT: Hierarchical Neuron Concept Explainer Paper
7825 Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes Paper
7404 What Do Navigation Agents Learn About Their Environment? Paper
11789 A Framework for Learning Ante-Hoc Explainable Models via Concepts Paper
778 Exploiting Explainable Metrics for Augmented SGD Paper
8195 FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks Paper
10710 Interactive Disentanglement: Learning Concepts by Interacting With Their Prototype Representations Paper
6365 B-Cos Networks: Alignment Is All We Need for Interpretability Paper
4303 The Flag Median and FlagIRLS Paper

Transparency, Fairness, Accountability, Privacy & Ethics in Vision

Paper Id Paper Title Link
112 Learning Fair Classifiers With Partially Annotated Group Labels Paper
5065 Estimating Structural Disparities for Face Models Paper
6022 Estimating Example Difficulty Using Variance of Gradients Paper
6962 Fairness-Aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models Paper
9906 Fair Contrastive Learning for Facial Attribute Classification Paper
6582 Leveraging Adversarial Examples To Quantify Membership Information Leakage Paper
10915 Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers Paper
11713 Deep Unlearning via Randomized Conditionally Independent Hessians Paper
284 Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets Paper
11071 A Study on the Distribution of Social Biases in Self-Supervised Learning Visual Models Paper

Vision & X

Paper Id Paper Title Link
2658 Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices? Paper
649 Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation Paper
2266 SEEG: Semantic Energized Co-Speech Gesture Generation Paper
715 Mix and Localize: Localizing Sound Sources in Mixtures Paper
2204 Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation Paper
7217 IntentVizor: Towards Generic Query Guided Interactive Video Summarization Paper
11551 M³L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers Paper
5355 Finding Fallen Objects via Asynchronous Audio-Visual Integration Paper
6187 Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory Paper
6676 Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization Paper
10849 Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language Paper
11001 It's Time for Artistic Correspondence in Music and Video Paper
11391 Self-Supervised Object Detection From Audio-Visual Correspondence Paper
8361 More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Paper
2475 ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer Paper
7892 A Probabilistic Graphical Model Based on Neural-Symbolic Reasoning for Visual Relationship Detection Paper

Image & Video Synthesis and Generation (I)

Paper Id Paper Title Link
4818 Diffusion Autoencoders: Toward a Meaningful and Decodable Representation Paper
6519 Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps Paper
11253 Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values Paper
908 Ensembling Off-the-Shelf Models for GAN Training Paper
10490 Marginal Contrastive Correspondence for Guided Image Generation Paper
3437 GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation Paper
5452 High-Resolution Image Synthesis With Latent Diffusion Models Paper
3874 Vector Quantized Diffusion Model for Text-to-Image Synthesis Paper
5265 ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation Paper
790 Dataset Distillation by Matching Training Trajectories Paper
6337 Continual Predictive Learning From Videos Paper
11474 Motion-Adjustable Neural Implicit Video Representation Paper
2561 Splicing ViT Features for Semantic Appearance Transfer Paper
1064 MAT: Mask-Aware Transformer for Large Hole Image Inpainting Paper
2344 Day-to-Night Image Synthesis for Training Nighttime Neural ISPs Paper
5874 Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness Paper
3576 Few-Shot Head Swapping in the Wild Paper
5059 ClothFormer: Taming Video Virtual Try-On in All Module Paper

Human Pose Estimation & Tracking, Localization, and Object Pose Estimation

Paper Id Paper Title Link
4380 Adversarial Parametric Pose Prior Paper
4450 Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation Paper
4806 PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision Paper
2492 Generalizable Human Pose Triangulation Paper
1181 GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras Paper
1468 Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory Paper
4014 Contextual Instance Decoupling for Robust Multi-Person Pose Estimation Paper
2202 End-to-End Multi-Person Pose Estimation With Transformers Paper
4534 Meta Agent Teaming Active Learning for Pose Estimation Paper
3411 Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation Paper
6194 Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer Paper
8628 Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture Paper
1445 LASER: LAtent SpacE Rendering for 2D Visual Localization Paper
8152 Learning To Detect Scene Landmarks for Camera Localization Paper
4196 Geometric Transformer for Fast and Robust Point Cloud Registration Paper
7968 ARCS: Accurate Rotation and Correspondence Search Paper
3628 FisherMatch: Semi-Supervised Rotation Regression via Entropy-Based Filtering Paper
10439 Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation Paper

Efficient Learning & Inference

Paper Id Paper Title Link
5660 CAFE: Learning To Condense Dataset by Aligning Features Paper
9135 Lite-MDETR: A Lightweight Multi-Modal Detector Paper
703 DeeCap: Dynamic Early Exiting for Efficient Image Captioning Paper
10864 Searching the Deployable Convolution Neural Networks for GPUs Paper
6685 Active Learning by Feature Mixing Paper
6585 When To Prune? A Policy Towards Early Structural Pruning Paper
11185 Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning Paper
9318 How Well Do Sparse ImageNet Models Transfer? Paper
9388 Rep-Net: Efficient On-Device Learning via Feature Reprogramming Paper
4954 CHEX: CHannel EXploration for CNN Model Compression Paper
3533 HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks Paper
2934 AdaViT: Adaptive Vision Transformers for Efficient Image Recognition Paper
1772 Cross-Image Relational Knowledge Distillation for Semantic Segmentation Paper
724 Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error Paper
3958 IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization Paper
9796 DECORE: Deep Compression With Reinforcement Learning Paper
11195 Towards Efficient and Scalable Sharpness-Aware Minimization Paper
1088 AEGNN: Asynchronous Event-Based Graph Neural Networks Paper
4078 DiSparse: Disentangled Sparsification for Multitask Model Compression Paper
1836 Multi-Modal Extreme Classification Paper
11241 A Sampling-Based Approach for Efficient Clustering in Large Datasets Paper
11776 Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction Paper
6380 Learnable Lookup Table for Neural Network Quantization Paper
8374 Instance-Aware Dynamic Neural Network Quantization Paper
10529 Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation Paper
3265 Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction Paper
5233 Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation Paper
11646 PokeBNN: A Binary Pursuit of Lightweight Accuracy Paper
2031 Automated Progressive Learning for Efficient Training of Vision Transformers Paper
1417 DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos Paper
9190 Channel Balancing for Accurate Quantization of Winograd Convolutions Paper
9054 ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching Paper
8230 Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs Paper
4843 AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation Paper
9210 TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing Paper
185 SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems Paper
6367 TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed Paper

Physics-Based Vision and Shape-From-X

Paper Id Paper Title Link
793 DiLiGenT10²: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation Paper
1453 Universal Photometric Stereo Network Using Global Lighting Contexts Paper
2355 Uncertainty-Aware Deep Multi-View Photometric Stereo Paper
5441 Fast Light-Weight Near-Field Photometric Stereo Paper
4990 Glass Segmentation Using Intensity and Spectral Polarization Cues Paper
1557 Shape From Polarization for Complex Scenes in the Wild Paper
6107 Deep Depth From Focus With Differential Focus Volume Paper
7381 Optimal LED Spectral Multiplexing for NIR2RGB Translation Paper
8076 Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements Paper
8196 NAN: Noise-Aware NeRFs for Burst-Denoising Paper
3129 Estimating Fine-Grained Noise Model via Contrastive Learning Paper
11094 Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders Paper
1021 MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution Paper
6350 PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images Paper

Visual Reasoning

Paper Id Paper Title Link
10159 Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors Paper
1531 Learning To Anticipate Future With Dynamic Context Removal Paper
11115 Self-Supervised Spatial Reasoning on Multi-View Line Drawings Paper
5634 Contextual Debiasing for Visual Recognition With Causal Mechanisms Paper

Security, Transparency, Fairness, Accountability, Privacy & Ethics in Vision

Paper Id Paper Title Link
3468 Adversarial Texture for Fooling Person Detectors in the Physical World Paper
4109 Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World Paper
5922 Enhancing Classifier Conservativeness and Robustness by Polynomiality Paper
5448 Backdoor Attacks on Self-Supervised Learning Paper
6583 Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks Paper
3994 Few-Shot Backdoor Defense Using Shapley Estimation Paper
10910 Better Trigger Inversion Optimization in Backdoor Scanning Paper
7051 Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees Paper
9002 Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching Paper
6908 LAS-AT: Adversarial Training With Learnable Attack Strategy Paper
4589 Subspace Adversarial Training Paper
5403 Pyramid Adversarial Training Improves ViT Performance Paper
12025 Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations Paper
2245 Robust Image Forgery Detection Over Online Social Network Shared Images Paper
6270 Quantifying Societal Bias Amplification in Image Captioning Paper

Image & Video Synthesis and Generation (II); Video Analysis & Understanding

Paper Id Paper Title Link
725 Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models Paper
706 GAN-Supervised Dense Visual Alignment Paper
8416 Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator Paper
8925 Text2Mesh: Text-Driven Neural Stylization for Meshes Paper
6649 StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation Paper
7720 Physical Simulation Layer for Accurate 3D Modeling Paper
717 Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time Paper
3579 Neural Texture Extraction and Distribution for Controllable Person Image Synthesis Paper
6545 I M Avatar: Implicit Morphable Head Avatars From Videos Paper
549 RCL: Recurrent Continuous Localization for Temporal Action Detection Paper
4317 Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection Paper
729 MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition Paper
9219 TubeR: Tubelet Transformer for Video Action Detection Paper
8613 MixFormer: End-to-End Tracking With Iterative Mixed Attention Paper

Recognition, Learning for Vision, and Robot Vision

Paper Id Paper Title Link
5905 DN-DETR: Accelerate DETR Training by Introducing Query DeNoising Paper
7010 Proper Reuse of Image Classification Features Improves Object Detection Paper
8646 Boosting 3D Object Detection by Simulating Multimodality on Point Clouds Paper
10578 TransVPR: Transformer-Based Place Recognition With Multi-Level Attention Aggregation Paper
9856 Disentangling Visual Embeddings for Attributes and Objects Paper
1856 QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection Paper
5517 Unknown-Aware Object Detection: Learning What You Don't Know From Videos in the Wild Paper
3247 Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks Paper
5972 Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective Paper
3349 Calibrating Deep Neural Networks by Pairwise Constraints Paper
7691 Lifelong Graph Learning Paper
11327 OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks Paper
10810 Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation Paper
11529 Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches Paper
9157 Cross-View Transformers for Real-Time Map-View Semantic Segmentation Paper

Self-, Semi-, Meta- & Unsupervised Learning

Paper Id Paper Title Link
8542 Label Matching Semi-Supervised Object Detection Paper
10433 Multidimensional Belief Quantification for Label-Efficient Meta-Learning Paper
11752 Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples Paper
5537 Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification Paper
9804 Class-Aware Contrastive Semi-Supervised Learning Paper
4181 Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework Paper
10916 Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo Paper
2296 Learning Where To Learn in Cross-View Self-Supervised Learning Paper
2487 Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective Paper
2869 SimMatch: Semi-Supervised Learning With Similarity Matching Paper
540 Active Teacher for Semi-Supervised Object Detection Paper
943 Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection Paper
5807 Self-Supervised Learning of Object Parts for Semantic Segmentation Paper
4603 MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection Paper
6024 Scale-Equivalent Distillation for Semi-Supervised Object Detection Paper
6654 A Self-Supervised Descriptor for Image Copy Detection Paper
10678 Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut Paper
9521 CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification Paper
11648 Semi-Supervised Few-Shot Learning via Multi-Factor Clustering Paper
2306 CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning Paper
2589 Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data Paper
3172 A Simple Data Mixing Prior for Improving Self-Supervised Learning Paper
3375 DETReg: Unsupervised Pretraining With Region Priors for Object Detection Paper
4354 Sound and Visual Representation Learning With Multiple Pretraining Tasks Paper
4601 UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training Paper
1744 Weakly Supervised Object Localization As Domain Adaption Paper
7762 Debiased Learning From Naturally Imbalanced Pseudo-Labels Paper
3414 Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning Paper
1546 Masked Feature Prediction for Self-Supervised Visual Pre-Training Paper
6171 Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency Paper
11064 Id-Free Person Similarity Learning Paper
5962 End-to-End Semi-Supervised Learning for Video Action Detection Paper
11772 Probabilistic Representations for Video Contrastive Learning Paper
5904 Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition Paper
5668 BEVT: BERT Pretraining of Video Transformers Paper
7678 Generative Cooperative Learning for Unsupervised Video Anomaly Detection Paper
9976 When Does Contrastive Visual Representation Learning Work? Paper
596 The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization Paper
5267 What Matters for Meta-Learning Vision Regression Tasks? Paper

Robot Vision

Paper Id Paper Title Link
689 IFOR: Iterative Flow Minimization for Robotic Object Rearrangement Paper
2734 TCTrack: Temporal Contexts for Aerial Tracking Paper
2846 AKB-48: A Real-World Articulated Object Knowledge Base Paper
4440 3DAC: Learning Attribute Compression for Point Clouds Paper
4521 Simple but Effective: CLIP Embeddings for Embodied AI Paper
2359 Multi-Robot Active Mapping via Neural Bipartite Graph Matching Paper
2464 Continuous Scene Representations for Embodied AI Paper
2923 Interactron: Embodied Adaptive Object Detection Paper
1761 Online Learning of Reusable Abstract Models for Object Goal Navigation Paper
3195 RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization Paper
2684 UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation Paper
9736 Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation Paper
10404 Upright-Net: Learning Upright Orientation for 3D Point Cloud Paper

Computer Vision for Social Good

Paper Id Paper Title Link
7865 DeepFake Disrupter: The Detector of DeepFake Is My Friend Paper
3350 HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization Paper
7457 Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources Paper
9423 Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection Paper

Adversarial Attack & Defense

Paper Id Paper Title Link
8193 Transferable Sparse Adversarial Attack Paper
8898 Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection Paper
10026 Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability Paper
8063 Improving Adversarial Transferability via Neuron Attribution-Based Attacks Paper
10779 Complex Backdoor Detection by Symmetric Feature Differencing Paper
10243 Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer Paper
11148 Zero-Query Transfer Attacks on Context-Aware Object Detectors Paper
6302 360-Attack: Distortion-Aware Perturbations From Perspective-Views Paper
11210 Label-Only Model Inversion Attacks via Boundary Repulsion Paper
11207 Merry Go Round: Rotate a Frame and Fool a DNN Paper
1485 Cross-Modal Transferable Adversarial Attacks From Images to Videos Paper
10629 BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning Paper
11521 Investigating Top-k White-Box and Transferable Black-Box Attack Paper
7175 Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution Paper
2830 Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack Paper
3325 Towards Efficient Data Free Black-Box Adversarial Attack Paper
3931 Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network Paper
11451 Certified Patch Robustness via Smoothed Vision Transformers Paper
5540 Towards Practical Certifiable Patch Defense With Vision Transformer Paper
4282 On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles Paper
7361 3DeformRS: Certifying Spatial Deformations on Point Clouds Paper
4302 Stereoscopic Universal Perturbations Across Different Architectures and Datasets Paper
4407 Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations Paper
10883 Bounded Adversarial Attack on Deep Content Features Paper
9811 DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints Paper
10212 Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart Paper
10905 Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness Paper
7360 Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input Paper
2205 Adversarial Eigen Attack on Black-Box Models Paper
7620 Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond Paper
4422 Enhancing Adversarial Training With Second-Order Statistics of Weights Paper
9176 Towards Data-Free Model Stealing in a Hard Label Setting Paper
9218 Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients Paper
10096 DTA: Physical Camouflage Attacks Using Differentiable Transformation Network Paper
1841 Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity Paper
201 Enhancing Adversarial Robustness for Deep Metric Learning Paper
5230 Shape-Invariant 3D Adversarial Point Clouds Paper
5789 Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon Paper
6161 Exploring Effective Data for Surrogate Training Towards Black-Box Attack Paper
11698 NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models Paper
5970 Dual-Key Multimodal Backdoors for Visual Question Answering Paper
6546 Proactive Image Manipulation Detection Paper

Representation Learning

Paper Id Paper Title Link
2347 Unified Contrastive Learning in Image-Text-Label Space Paper
9927 AlignMixup: Improving Representations by Interpolating Aligned Features Paper
2419 On the Road to Online Adaptation for Semantic Image Segmentation Paper
5236 ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation Paper
3487 Kernelized Few-Shot Object Detection With Efficient Integral Aggregation Paper
186 Neural Mean Discrepancy for Efficient Out-of-Distribution Detection Paper
8477 A Structured Dictionary Perspective on Implicit Neural Representations Paper
10563 LARGE: Latent-Based Regression Through GAN Semantics Paper
6667 Rethinking Controllable Variational Autoencoders Paper
9016 Learning Canonical F-Correlation Projection for Compact Multiview Representation Paper
6288 Cross-Architecture Self-Supervised Video Representation Learning Paper
4418 Improving Video Model Transfer With Dynamic Representation Learning Paper
5928 Self-Supervised Image Representation Learning With Geometric Set Consistency Paper
246 HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging Paper
4037 Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling Paper
7362 DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds Paper
9356 Neural Convolutional Surfaces Paper
10032 Representing 3D Shapes With Probabilistic Directed Distance Fields Paper
3030 H4D: Human 4D Modeling by Learning Neural Compositional Representation Paper
518 Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification Paper
1275 Contrastive Regression for Domain Adaptation on Gaze Estimation Paper
9822 Forward Compatible Training for Large-Scale Embedding Retrieval Systems Paper
4945 Improving Subgraph Recognition With Variational Graph Information Bottleneck Paper
2508 Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss Paper
4145 Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species Paper

Computational Photography

Paper Id Paper Title Link
4111 Deep Stereo Image Compression via Bi-Directional Coding Paper
8934 RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion Paper
4213 Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer Paper
2554 Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels Paper
3021 SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization Paper
3470 Automatic Color Image Stitching Using Quaternion Rank-1 Alignment Paper
6712 SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing Paper
4112 Degree-of-Linear-Polarization-Based Color Constancy Paper
2170 Point Cloud Color Constancy Paper
265 Boosting View Synthesis With Residual Transfer Paper
5780 Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection Paper
4413 Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging Paper
479 PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition Paper
7596 Multimodal Material Segmentation Paper
6384 Occlusion-Aware Cost Constructor for Light Field Depth Estimation Paper
815 Learning Neural Light Fields With Ray-Space Embedding Paper
2268 Acquiring a Dynamic Light Field Through a Single-Shot Coded Image Paper
4415 Gravitationally Lensed Black Hole Emission Tomography Paper
5058 Deep Saliency Prior for Reducing Visual Distraction Paper
8388 Personalized Image Aesthetics Assessment With Rich Attributes Paper
6382 Artistic Style Discovery With Independent Components Paper

Scene Analysis & Understanding

Paper Id Paper Title Link
2004 Noisy Boundaries: Lemon or Lemonade for Semi-Supervised Instance Segmentation? Paper
5105 Partial Class Activation Attention for Semantic Segmentation Paper
7261 Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers Paper
4156 Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation Paper
7427 Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation Paper
4593 Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation Paper
1567 L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation Paper
573 Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data Paper
1307 Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation Paper
2748 Bending Reality: Distortion-Aware Transformers for Adapting to Panoramic Semantic Segmentation Paper
4586 MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation Paper
8233 NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night Paper
6032 Fast Point Transformer Paper
7468 RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior Paper
807 ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes Paper
1738 DisARM: Displacement Aware Relation Module for 3D Detection Paper
2722 Learning Object Context for Novel-View Scene Layout Generation Paper
2166 Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts Paper
348 Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image Paper
5927 Raw High-Definition Radar for Multi-Task Learning Paper
2343 Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation Paper
7169 UKPGAN: A General Self-Supervised Keypoint Detector Paper
5057 Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos Paper

Navigation & Autonomous Driving

Paper Id Paper Title Link
104 Rethinking Efficient Lane Detection via Curve Modeling Paper
5623 Exploiting Temporal Relations on Radar Perception for Autonomous Driving Paper
1321 Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective Paper
3631 BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement Paper
9886 ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning Paper
607 Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion Paper
1397 Vehicle Trajectory Prediction Works, but Not Everywhere Paper
9659 LTP: Lane-Based Trajectory Prediction for Autonomous Driving Paper
2468 ONCE-3DLanes: Building Monocular 3D Lane Detection Paper
10899 Towards Driving-Oriented Metric for Lane Detection Models Paper
6918 Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes Paper
5120 LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection Paper
1664 DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection Paper
9131 A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation Paper
7091 Forecasting From LiDAR via Future Object Detection Paper
5998 RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding Paper
3364 Learning From All Vehicles Paper
10331 Is Mapping Necessary for Realistic PointGoal Navigation? Paper
9772 Symmetry-Aware Neural Architecture for Embodied Visual Exploration Paper
6482 Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles Paper
10621 Topology Preserving Local Road Network Estimation From Single Onboard Camera Image Paper
6744 Coupling Vision and Proprioception for Navigation of Legged Robots Paper
10063 Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation Paper
9391 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection Paper
4385 Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior Paper
2537 SelfD: Self-Learning Large-Scale Driving Policies From the Web Paper
5244 Towards Real-World Navigation With Deep Differentiable Planners Paper
10481 Privacy Preserving Partial Localization Paper
6490 Efficient Large-Scale Localization by Global Instance Recognition Paper
5459 CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data Paper

Vision & Graphics

Paper Id Paper Title Link
84 De-Rendering 3D Objects in the Wild Paper
4234 Neural Fields As Learnable Kernels for 3D Reconstruction Paper
3715 HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing Paper
2744 3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies Paper
2410 Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian Paper
9567 Deep Image-Based Illumination Harmonization Paper
1834 Glass: Geometric Latent Augmentation for Shape Spaces Paper
1559 PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes Paper
1478 Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes Paper
9364 Neural Mesh Simplification Paper
6486 SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters Paper
7818 CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation Paper
7841 UNIST: Unpaired Neural Implicit Shape Translation Network Paper
1800 CoNeRF: Controllable Neural Radiance Fields Paper
6407 Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling Paper
8338 Modeling Indirect Illumination for Inverse Rendering Paper
3519 Neural Head Avatars From Monocular RGB Videos Paper
2341 DeepCurrents: Learning Implicit Representations of Shapes With Boundaries Paper

Biometrics, Face & Gestures, and Medical Image Analysis

Paper Id Paper Title Link
4335 Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination Paper
5110 AnyFace: Free-Style Text-To-Face Synthesis and Manipulation Paper
5301 General Facial Representation Learning in a Visual-Linguistic Manner Paper
5269 Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection Paper
1219 Detecting Deepfakes With Self-Blended Images Paper
5967 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces Paper
9638 Evaluation-Oriented Knowledge Distillation for Deep Face Recognition Paper
6682 AdaFace: Quality Adaptive Margin for Face Recognition Paper
6920 Moving Window Regression: A Novel Approach to Ordinal Regression Paper
10531 FaceFormer: Speech-Driven 3D Facial Animation With Transformers Paper
11053 Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in “In-the-Wild” Videos Paper
229 Deep Decomposition for Stochastic Normal-Abnormal Transport Paper
3114 DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification Paper
10426 Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification Paper
10994 Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations Paper

Scene & Shape Analysis and Understanding

Paper Id Paper Title Link
4710 VRDFormer: End-to-End Video Visual Relation Detection With Transformers Paper
720 Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation Paper
9896 Visual Acoustic Matching Paper
5847 The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation Paper
4283 Learning Multiple Dense Prediction Tasks From Partially Annotated Data Paper
9443 PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning Paper
5513 Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture Paper
5826 FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation Paper
3020 Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding Paper
11019 Equivariant Point Cloud Analysis via Learning Orientations for Message Passing Paper
2137 Surface Representation for Point Clouds Paper
3284 Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds Paper
3846 3D Common Corruptions and Data Augmentation Paper
4027 INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation Paper
11446 How Much Does Input Data Type Impact Final Face Model Accuracy? Paper

Datasets & Evaluation, Action & Event Recognition, and Visual Question Answering

Paper Id Paper Title Link
7484 Ego4D: Around the World in 3,000 Hours of Egocentric Video Paper
10504 TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting Paper
5075 Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding Paper
2465 vCLIMB: A Novel Video Class Incremental Learning Benchmark Paper
2221 Opening Up Open World Tracking Paper
1795 Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions Paper
8910 CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters Paper
11289 Failure Modes of Domain Generalization Algorithms Paper
9398 A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes Paper
6567 Grounding Answers for Visual Questions Asked by Visually Impaired People Paper
6719 Learning To Answer Questions in Dynamic Audio-Visual Scenarios Paper
1780 Episodic Memory Question Answering Paper
11561 ScanQA: 3D Question Answering for Spatial Scene Understanding Paper
5943 Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles Paper
8893 BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild Paper

Scene Analysis and Understanding

Paper Id Paper Title Link
98 Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation Paper
2849 Structured Sparse R-CNN for Direct Scene Graph Generation Paper
10248 PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation Paper
3738 RU-Net: Regularized Unrolling Network for Scene Graph Generation Paper
1142 Fine-Grained Predicates Learning for Scene Graph Generation Paper
3323 HL-Net: Heterophily Learning Network for Scene Graph Generation Paper
10227 SGTR: End-to-End Scene Graph Generation With Transformer Paper
6703 Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs Paper
8205 RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition Paper
10369 Spatial Commonsense Graph for Object Localisation in Partial Scenes Paper
4148 The Pedestrian Next to the Lamppost: Adaptive Object Graphs for Better Instantaneous Mapping Paper
7832 Category-Aware Transformer Network for Better Human-Object Interaction Detection Paper
7619 Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection Paper
3379 Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection Paper
10087 Human-Object Interaction Detection via Disentangled Transformer Paper
5684 MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection Paper
7237 GaTector: A Unified Framework for Gaze Object Prediction Paper
6242 STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes Paper
7926 Crowd Counting in the Frequency Domain Paper
3876 Boosting Crowd Counting via Multifaceted Attention Paper
6137 Rethinking Spatial Invariance of Convolutional Networks for Object Counting Paper
6322 Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing Paper
5725 Collaborative Transformers for Grounded Situation Recognition Paper

Action and Event Recognition

Paper Id Paper Title Link
2817 Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos Paper
5042 SVIP: Sequence VerIfication for Procedures in Videos Paper
3292 Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency Paper
5855 Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization Paper
3084 GateHUB: Gated History Unit With Background Suppression for Online Action Detection Paper
7477 E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition Paper
4495 Hybrid Relation Guided Set Matching for Few-Shot Action Recognition Paper
3385 Spatio-Temporal Relation Modeling for Few-Shot Action Recognition Paper
9787 Alignment-Uniformity Aware Representation Learning for Zero-Shot Video Classification Paper
1862 Cross-Modal Representation Learning for Zero-Shot Action Recognition Paper
6938 Cross-Modal Background Suppression for Audio-Visual Event Localization Paper
3142 Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization Paper
1068 An Empirical Study of End-to-End Temporal Action Detection Paper
11191 Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval Paper
9295 DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition Paper
730 MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection Paper
2917 Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition Paper
9674 AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition Paper
5856 UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection Paper
3946 Detector-Free Weakly Supervised Group Activity Recognition Paper
2870 Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading Paper
2752 Efficient Two-Stage Detection of Human-Object Interactions With a Novel Unary-Pairwise Transformer Paper
517 Interactiveness Field in Human-Object Interactions Paper
2258 GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection Paper
7690 Object-Relation Reasoning Graph for Action Recognition Paper
4315 UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection Paper
1483 Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition Paper
9379 SPAct: Self-Supervised Privacy Preservation for Action Recognition Paper
818 Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering Paper
28 InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition Paper
11846 Learning Video Representations of Human Motion From Synthetic Data Paper
6314 Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos Paper

Biometrics

Paper Id Paper Title Link
1752 EyePAD++: A Distillation-Based Approach for Joint Eye Authentication and Presentation Attack Detection Using Periocular Images Paper
3373 Gait Recognition in the Wild With Dense 3D Representations and a Benchmark Paper
1403 Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification Paper
3765 Lagrange Motion Analysis and View Embeddings for Improved Gait Recognition Paper
9404 DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover's Distance Improves Out-of-Distribution Face Identification Paper
1311 Learning Second Order Local Anomaly for General Face Forgery Detection Paper
4821 PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition Paper
10637 Face2Exp: Combating Data Biases for Facial Expression Recognition Paper
11994 Local-Adaptive Face Recognition via Graph-Based Meta-Clustering and Regularized Adaptation Paper

Face and Gestures

Paper Id Paper Title Link
4811 EMOCA: Emotion Driven Monocular Face Capture and Animation Paper
6513 Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality Paper
2290 FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset Paper
4969 ImFace: A Nonlinear 3D Morphable Face Model With Implicit Neural Representations Paper
3883 Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling Paper
736 RigNeRF: Fully Controllable Neural 3D Portraits Paper
5362 HeadNeRF: A Real-Time NeRF-Based Parametric Head Model Paper
7738 Sparse to Dense Dynamic 3D Facial Expression Generation Paper
812 Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion Paper
7201 Speech Driven Tongue Animation Paper
6728 Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition Paper
980 gDNA: Towards Generative Detailed Neural Avatars Paper
1874 GraFormer: Graph-Oriented Transformer for 3D Pose Estimation Paper
10976 Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation Paper
501 Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis Paper
2836 PINA: Learning a Personalized Implicit Neural Avatar From a Single RGB-D Video Sequence Paper
4356 The Wanderings of Odysseus in 3D Scenes Paper
6883 OSSO: Obtaining Skeletal Shape From Outside Paper
11477 LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds Paper
3402 Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression Paper
2046 Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation Paper
6216 LISA: Learning Implicit Shape and Appearance of Hands Paper
3384 MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image Paper
5835 Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation Paper
7098 Low-Resource Adaptation for Personalized Co-Speech Gesture Generation Paper
921 D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions Paper

Medical, Biological and Cell Microscopy

Paper Id Paper Title Link
1104 Synthetic Generation of Face Videos With Plethysmograph Physiology Paper
9240 Contour-Hugging Heatmaps for Landmark Detection Paper
4486 Which Images To Label for Few-Shot Medical Landmark Detection? Paper
4473 Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography Paper
8680 Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization Paper
8210 Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution Paper
6627 Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations Paper
3856 Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation Paper
6449 BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation Paper
6189 Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis Paper
9027 Towards Low-Cost and Efficient Malaria Detection Paper
5588 ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification Paper
2696 Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification Paper
9084 M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer Paper
121 Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis Paper
10799 HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet Paper
2649 DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations Paper
10420 Clean Implicit 3D Structure From Noisy 2D STEM Images Paper
7672 Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks Paper
4123 Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment Paper
3819 Learning Optimal K-Space Acquisition and Reconstruction Using Physics-Informed Neural Networks Paper
2466 NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration Paper
4362 SMPL-A: Modeling Person-Specific Deformable Anatomy Paper
1830 DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis Paper
1826 Affine Medical Image Registration With Coarse-To-Fine Vision Transformer Paper
9880 Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow Paper
1002 Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization Paper
6023 Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation Paper
6328 FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis Paper
8360 Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning Paper
5948 CellTypeGraph: A New Geometric Computer Vision Benchmark Paper
8619 ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics Paper

Datasets and Evaluation

Paper Id Paper Title Link
378 FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos Paper
675 Multi-Dimensional, Nuanced and Subjective - Measuring the Perception of Facial Expressions Paper
10327 DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image Paper
583 OakInk: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction Paper
9029 PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking Paper
6336 Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification Paper
3564 JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection Paper
1672 DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Paper
9778 Egocentric Prediction of Action Target in 3D Paper
1950 HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction Paper
10801 Amodal Panoptic Segmentation Paper
8175 Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark Paper
4070 YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset Paper
9179 The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting Paper
10392 3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos Paper
8328 AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval Paper
4732 A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection Paper
2077 Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities Paper
8123 Optimal Correction Cost for Object Detection Evaluation Paper
7936 GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains Paper
6061 ABO: Dataset and Benchmarks for Real-World 3D Object Understanding Paper
11100 Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation Paper
4346 ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes Paper
4313 DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation Paper
4793 Open Challenges in Deep Stereo: The Booster Dataset Paper
2647 No-Reference Point Cloud Quality Assessment via Domain Adaptation Paper
1637 Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network Paper
2810 How Good Is Aesthetic Ability of a Fashion Model? Paper
656 Instance-Wise Occlusion and Depth Orders in Natural Scenes Paper
7655 PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects Paper
436 Replacing Labeled Real-Image Datasets With Auto-Generated Contours Paper
7315 V2C: Visual Voice Cloning Paper
6786 M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining Paper
11067 It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection Paper
4520 From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering Paper
718 Point Cloud Pre-Training With Natural 3D Structures Paper
1658 The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift Paper
9913 AutoMine: An Unmanned Mine Dataset Paper
11097 SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis Paper
4797 BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations Paper
2027 Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task Paper
8222 Unifying Panoptic Segmentation for Autonomous Driving Paper
10407 DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection Paper
3296 SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation Paper
11670 Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions Paper
