Papers and Code from CVPR 2022, including scripts to extract them
Paper Id | Paper Title | Link |
---|---|---|
11954 | Efficient Deep Embedded Subspace Clustering | Paper |
11402 | Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers | Paper |
9445 | CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data | Paper |
8776 | Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning | Paper |
6978 | Active Learning for Open-Set Annotation | Paper |
9075 | Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training | Paper |
6601 | Robust Optimization As Data Augmentation for Large-Scale Graphs | Paper |
6298 | A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty | Paper |
6106 | The Devil Is in the Margin: Margin-Based Label Smoothing for Network Calibration | Paper |
6705 | Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector | Paper |
10071 | GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning | Paper |
7829 | Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning | Paper |
5988 | A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration | Paper |
2503 | Learning To Learn by Jointly Optimizing Neural Architecture and Weights | Paper |
9806 | Learning To Prompt for Continual Learning | Paper |
2016 | Meta-Attention for ViT-Backed Continual Learning | Paper |
1343 | Multi-Frame Self-Supervised Depth With Transformers | Paper |
10018 | Continual Learning With Lifelong Vision Transformer | Paper |
780 | Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation | Paper |
4874 | Revisiting Random Channel Pruning for Neural Network Compression | Paper |
8330 | Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase | Paper |
9551 | Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning | Paper |
10484 | Towards Robust and Reproducible Active Learning Using Neural Networks | Paper |
7082 | Non-Iterative Recovery From Nonlinear Observations Using Generative Models | Paper |
11093 | Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders | Paper |
4542 | Robust Combination of Distributed Gradients Under Adversarial Perturbations | Paper |
11143 | Do Learned Representations Respect Causal Relationships? | Paper |
11220 | How Much More Data Do I Need? Estimating Requirements for Downstream Tasks | Paper |
8156 | Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees | Paper |
11131 | Contrastive Test-Time Adaptation | Paper |
448 | AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation | Paper |
1561 | Selective-Supervised Contrastive Learning With Noisy Labels | Paper |
7807 | RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks | Paper |
3279 | Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction | Paper |
Paper Id | Paper Title | Link |
---|---|---|
3348 | Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels | Paper |
7912 | Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design | Paper |
8877 | Learning Structured Gaussians To Approximate Deep Ensembles | Paper |
11673 | Out-of-Distribution Generalization With Causal Invariant Transformations | Paper |
8393 | Split Hierarchical Variational Compression | Paper |
9244 | Implicit Feature Decoupling With Depthwise Quantization | Paper |
282 | Understanding Uncertainty Maps in Vision With Statistical Testing | Paper |
Paper Id | Paper Title | Link |
---|---|---|
785 | A Hybrid Quantum-Classical Algorithm for Robust Fitting | Paper |
5911 | A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching | Paper |
6021 | FastDOG: Fast Discrete Optimization on GPU | Paper |
9232 | Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization | Paper |
10092 | AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks | Paper |
11171 | Training Quantised Neural Networks With STE Variants: The Additive Noise Annealing Algorithm | Paper |
2028 | AME: Attention and Memory Enhancement in Hyper-Parameter Optimization | Paper |
11189 | Efficient Maximal Coding Rate Reduction by Variational Forms | Paper |
10155 | A Unified Framework for Implicit Sinkhorn Differentiation | Paper |
6845 | Computing Wasserstein-p Distance Between Images With Linear Cost | Paper |
9064 | An Iterative Quantum Approach for Transformation Estimation From Point Sets | Paper |
Paper Id | Paper Title | Link |
---|---|---|
116 | Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training? | Paper |
5389 | BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule | Paper |
7704 | Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search | Paper |
4143 | Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search | Paper |
5167 | GreedyNASv2: Greedier Search With a Greedy Path Filter | Paper |
1115 | Neural Architecture Search With Representation Mutual Information | Paper |
7148 | Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search | Paper |
8841 | Knowledge Distillation With the Reused Teacher Classifier | Paper |
2812 | Self-Distillation From the Last Mini-Batch for Consistency Regularization | Paper |
142 | Decoupled Knowledge Distillation | Paper |
7053 | Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs | Paper |
123 | A ConvNet for the 2020s | Paper |
7254 | Beyond Fixation: Dynamic Window Visual Transformer | Paper |
7867 | Lite Vision Transformer With Enhanced Self-Attention | Paper |
7428 | Swin Transformer V2: Scaling Up Capacity and Resolution | Paper |
4325 | The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy | Paper |
9412 | MulT: An End-to-End Multitask Learning Transformer | Paper |
3664 | Towards Robust Vision Transformer | Paper |
9773 | DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers | Paper |
2434 | MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens | Paper |
1032 | NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition | Paper |
2029 | TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation | Paper |
4853 | Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation | Paper |
10350 | Scaling Vision Transformers | Paper |
7298 | Bridged Transformer for Vision and Point Cloud 3D Object Detection | Paper |
1981 | CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows | Paper |
3562 | TransMix: Attend To Mix for Vision Transformers | Paper |
2388 | MiniViT: Compressing Vision Transformers With Weight Multiplexing | Paper |
11460 | Fine-Tuning Image Transformers Using Learnable Memory | Paper |
4430 | Patch Slimming for Efficient Vision Transformers | Paper |
5093 | CMT: Convolutional Neural Networks Meet Vision Transformers | Paper |
6795 | Multimodal Token Fusion for Vision Transformers | Paper |
Paper Id | Paper Title | Link |
---|---|---|
2257 | Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation | Paper |
8811 | Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model | Paper |
184 | Sign Language Video Retrieval With Free-Form Textual Queries | Paper |
11661 | FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback | Paper |
4918 | Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation | Paper |
9957 | ESCNet: Gaze Target Detection With the Understanding of 3D Scenes | Paper |
2489 | Interactive Multi-Class Tiny-Object Detection | Paper |
9614 | Weakly Supervised Rotation-Invariant Aerial Object Detection Network | Paper |
8402 | Large Loss Matters in Weakly Supervised Multi-Label Classification | Paper |
8000 | MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning | Paper |
1233 | FreeSOLO: Learning To Segment Objects Without Annotations | Paper |
2645 | Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection | Paper |
3784 | SIOD: Single Instance Annotated per Category per Image for Object Detection | Paper |
4574 | Towards Robust Adaptive Object Detection Under Noisy Annotations | Paper |
3139 | Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection | Paper |
3751 | Salvage of Supervision in Weakly Supervised Object Detection | Paper |
6430 | Label, Verify, Correct: A Simple Few Shot Object Detection Method | Paper |
944 | Background Activation Suppression for Weakly Supervised Object Localization | Paper |
4063 | Bridging the Gap Between Classification and Localization for Weakly Supervised Object Localization | Paper |
2560 | Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery | Paper |
6708 | Cloth-Changing Person Re-Identification From a Single Image With Gait Prediction and Regularization | Paper |
1508 | Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation | Paper |
4122 | Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification | Paper |
10049 | Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification | Paper |
7097 | Towards Total Recall in Industrial Anomaly Detection | Paper |
1207 | H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection | Paper |
4192 | Geometric and Textural Augmentation for Domain Gap Reduction | Paper |
10135 | General Incremental Learning With Domain-Aware Categorical Representations | Paper |
491 | DST: Dynamic Substitute Training for Data-Free Black-Box Attack | Paper |
8711 | ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation | Paper |
Paper Id | Paper Title | Link |
---|---|---|
6126 | Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation | Paper |
2899 | Generalized Few-Shot Semantic Segmentation | Paper |
9018 | Learning Non-Target Knowledge for Few-Shot Semantic Segmentation | Paper |
4783 | Decoupling Zero-Shot Semantic Segmentation | Paper |
1590 | Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation | Paper |
1034 | ContrastMask: Contrastive Learning To Segment Every Thing | Paper |
7789 | The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference | Paper |
2539 | AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation | Paper |
1707 | APES: Articulated Part Extraction From Sprite Sheets | Paper |
2544 | GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation | Paper |
6790 | CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision | Paper |
5602 | Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images | Paper |
3446 | C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image | Paper |
6268 | CRIS: CLIP-Driven Referring Image Segmentation | Paper |
7820 | MatteFormer: Transformer-Based Image Matting via Prior-Tokens | Paper |
3851 | Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation | Paper |
3405 | Pyramid Grafting Network for One-Stage High Resolution Saliency Detection | Paper |
2123 | Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection | Paper |
4573 | Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation | Paper |
5002 | GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings | Paper |
587 | Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport | Paper |
3312 | CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly | Paper |
1743 | RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures | Paper |
3978 | Discovering Objects That Can Move | Paper |
2604 | PatchFormer: An Efficient Point Transformer With Patch Attention | Paper |
4099 | Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap | Paper |
3933 | SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation | Paper |
4983 | An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation | Paper |
7469 | Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation | Paper |
4583 | Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders | Paper |
Paper Id | Paper Title | Link |
---|---|---|
41 | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Paper |
4391 | Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction | Paper |
6088 | DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation | Paper |
2989 | MonoGround: Detecting Monocular 3D Objects From the Ground | Paper |
2686 | 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow | Paper |
2657 | Toward Practical Monocular Indoor Depth Estimation | Paper |
4692 | Focal Length and Object Pose Estimation via Render and Compare | Paper |
6311 | CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields | Paper |
2116 | Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images | Paper |
8082 | Layered Depth Refinement With Mask Guidance | Paper |
1031 | HEAT: Holistic Edge Attention Transformer for Structured Reconstruction | Paper |
931 | BARC: Learning To Regress 3D Dog Shape From Images by Exploiting Breed Information | Paper |
8688 | Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving | Paper |
816 | What's in Your Hands? 3D Reconstruction of Generic Objects in Hands | Paper |
7814 | 3D Moments From Near-Duplicate Photos | Paper |
5766 | Neural Window Fully-Connected CRFs for Monocular Depth Estimation | Paper |
9095 | PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors | Paper |
3717 | CroMo: Cross-Modal Learning for Monocular Depth Estimation | Paper |
258 | φ-SfT: Shape-From-Template With a Physics-Based Deformation Model | Paper |
923 | Human-Aware Object Placement for Visual Environment Reconstruction | Paper |
11298 | AutoRF: Learning 3D Object Radiance Fields From Single View Observations | Paper |
7080 | Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation | Paper |
2163 | MonoScene: Monocular 3D Semantic Scene Completion | Paper |
12016 | GenDR: A Generalized Differentiable Renderer | Paper |
4069 | MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer | Paper |
7078 | ROCA: Robust CAD Model Retrieval and Alignment From a Single Image | Paper |
Paper Id | Paper Title | Link |
---|---|---|
971 | HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening | Paper |
990 | Revisiting Near/Remote Sensing With Geospatial Attention | Paper |
2718 | Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening | Paper |
3511 | Mutual Information-Driven Pan-Sharpening | Paper |
3982 | Sparse and Complete Latent Organization for Geospatial Semantic Segmentation | Paper |
5907 | The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions | Paper |
4025 | Oriented RepPoints for Aerial Object Detection | Paper |
6403 | Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction | Paper |
8986 | PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images | Paper |
8832 | Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites | Paper |
Paper Id | Paper Title | Link |
---|---|---|
163 | Bilateral Video Magnification Filter | Paper |
4527 | Neural Data-Dependent Transform for Learned Image Compression | Paper |
4329 | Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence | Paper |
4093 | Deep Generalized Unfolding Networks for Image Restoration | Paper |
3967 | Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling | Paper |
9885 | XYDeblur: Divide and Conquer for Single Image Deblurring | Paper |
8572 | Abandoning the Bayer-Filter To See in the Dark | Paper |
9293 | RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution | Paper |
8149 | All-in-One Image Restoration for Unknown Corruption | Paper |
9697 | Modeling sRGB Camera Noise With Normalizing Flows | Paper |
3788 | A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift | Paper |
1431 | Video Frame Interpolation Transformer | Paper |
1412 | The Devil Is in the Details: Window-Based Attention for Image Compression | Paper |
1176 | Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction | Paper |
3387 | RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs | Paper |
3051 | AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement | Paper |
2882 | HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging | Paper |
2182 | HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging | Paper |
6342 | Learning To Zoom Inside Camera Imaging Pipeline | Paper |
335 | Towards an End-to-End Framework for Flow-Guided Video Inpainting | Paper |
2141 | Context-Aware Video Reconstruction for Rolling Shutter Cameras | Paper |
5516 | CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image | Paper |
4529 | Global Matching With Overlapping Attention for Optical Flow Estimation | Paper |
1482 | CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow | Paper |
1048 | Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression | Paper |
4286 | Video Demoiréing With Relation-Based Temporal Consistency | Paper |
6635 | Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images | Paper |
5086 | Deep Constrained Least Squares for Blind Image Super-Resolution | Paper |
12027 | Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model | Paper |
5762 | Unsupervised Homography Estimation With Coplanarity-Aware GAN | Paper |
Paper Id | Paper Title | Link |
---|---|---|
2656 | Self-Supervised Keypoint Discovery in Behavioral Videos | Paper |
874 | Learning To Align Sequential Actions in the Wild | Paper |
7245 | Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination | Paper |
4809 | End-to-End Human-Gaze-Target Detection With Transformers | Paper |
7132 | Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis | Paper |
9590 | MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction | Paper |
10192 | Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction | Paper |
7946 | End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps | Paper |
754 | Learning Affordance Grounding From Exocentric Images | Paper |
Paper Id | Paper Title | Link |
---|---|---|
1915 | 3D Scene Painting via Semantic Image Synthesis | Paper |
6370 | Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography | Paper |
2264 | ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection | Paper |
2112 | Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches | Paper |
5892 | Image Disentanglement Autoencoder for Steganography Without Embedding | Paper |
1885 | Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection | Paper |
5934 | Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning | Paper |
1616 | Density-Preserving Deep Point Cloud Compression | Paper |
9360 | Graph-Context Attention Networks for Size-Varied Deep Graph Matching | Paper |
968 | TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions | Paper |
1872 | ObjectFormer for Image Manipulation Detection and Localization | Paper |
7760 | Sequential Voting With Relational Box Fields for Active Object Detection | Paper |
6580 | Efficient Classification of Very Large Images With Tiny Objects | Paper |
6468 | Partially Does It: Towards Scene-Level FG-SBIR With Partial Input | Paper |
6025 | Long-Term Visual Map Sparsification With Heterogeneous GNN | Paper |
141 | Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association | Paper |
10095 | DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation | Paper |
3621 | Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring | Paper |
1079 | Rethinking Image Cropping: Exploring Diverse Compositions From Global Views | Paper |
1680 | Defensive Patches for Robust Recognition in the Physical World | Paper |
8380 | Semi-Supervised Video Paragraph Grounding With Contrastive Encoder | Paper |
5336 | Large-Scale Pre-Training for Person Re-Identification With Noisy Labels | Paper |
1146 | Meta Distribution Alignment for Generalizable Person Re-Identification | Paper |
5429 | FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction | Paper |
2926 | It's About Time: Analog Clock Reading in the Wild | Paper |
9312 | Consistency Driven Sequential Transformers Attention Model for Partially Observable Scenes | Paper |
9923 | SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles | Paper |
9662 | Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers | Paper |
9541 | Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving | Paper |
Paper Id | Paper Title | Link |
---|---|---|
1811 | UnweaveNet: Unweaving Activity Stories | Paper |
7769 | Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos | Paper |
1319 | Audio-Adaptive Activity Recognition Across Video Domains | Paper |
6385 | Frame-Wise Action Representations for Long Videos via Sequence Contrastive Learning | Paper |
9349 | Image Based Reconstruction of Liquids From 2D Surface Detections | Paper |
1579 | Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency | Paper |
6891 | How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs | Paper |
2102 | Programmatic Concept Learning for Human Motion Description and Synthesis | Paper |
4326 | Learning To Recognize Procedural Activities With Distant Supervision | Paper |
6761 | Implicit Motion Handling for Video Camouflaged Object Detection | Paper |
11553 | Dynamic Scene Graph Generation via Anticipatory Pre-Training | Paper |
1845 | Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization | Paper |
3930 | OCSampler: Compressing Videos to One Clip With Single-Step Sampling | Paper |
5670 | A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting | Paper |
3981 | TubeFormer-DeepLab: Video Mask Transformer | Paper |
2673 | ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization | Paper |
2928 | GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation | Paper |
8639 | STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction | Paper |
3656 | Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos | Paper |
5386 | End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection | Paper |
5430 | Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision | Paper |
2018 | Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement | Paper |
6082 | A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information | Paper |
8138 | Long-Short Temporal Contrastive Learning of Video Transformers | Paper |
4525 | Scene Consistency Representation Learning for Video Scene Segmentation | Paper |
1024 | Unsupervised Pre-Training for Temporal Action Localization Tasks | Paper |
7000 | Contrastive Learning for Unsupervised Video Highlight Detection | Paper |
8133 | Deformable Video Transformer | Paper |
8415 | Recurring the Transformer for Video Action Recognition | Paper |
Paper Id | Paper Title | Link |
---|---|---|
5438 | Text to Image Generation With Semantic-Spatial Aware GAN | Paper |
107 | StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis | Paper |
5345 | Blended Diffusion for Text-Driven Editing of Natural Images | Paper |
5128 | Make It Move: Controllable Image-to-Video Generation With Text Descriptions | Paper |
5317 | Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model | Paper |
10144 | A Style-Aware Discriminator for Controllable Image Translation | Paper |
8904 | Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint | Paper |
10441 | Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks | Paper |
8356 | FlexIT: Towards Flexible Semantic Image Translation | Paper |
4022 | Modulated Contrast for Versatile Image Synthesis | Paper |
8146 | QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation | Paper |
9818 | Self-Supervised Dense Consistency Regularization for Image-to-Image Translation | Paper |
155 | Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation | Paper |
2538 | InstaFormer: Instance-Aware Image-to-Image Translation With Transformer | Paper |
648 | Unsupervised Image-to-Image Translation With Generative Prior | Paper |
133 | StylizedNeRF: Consistent 3D Scene Stylization As Stylized NeRF via 2D-3D Mutual Learning | Paper |
30 | NeRF-Editing: Geometry Editing of Neural Radiance Fields | Paper |
8276 | GeoNeRF: Generalizing NeRF With Geometry Priors | Paper |
5276 | Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation | Paper |
10588 | AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields | Paper |
5174 | HDR-NeRF: High Dynamic Range Neural Radiance Fields | Paper |
3703 | NeRFReN: Neural Radiance Fields With Reflections | Paper |
4368 | Neural Point Light Fields | Paper |
697 | 3D-Aware Image Synthesis via Learning Structural and Textural Representations | Paper |
6895 | GIRAFFE HD: A High-Resolution 3D-Aware Generative Model | Paper |
1474 | Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis | Paper |
5736 | Bi-Level Doubly Variational Learning for Energy-Based Latent Variable Models | Paper |
5811 | High-Resolution Image Harmonization via Collaborative Dual Transformations | Paper |
10156 | Brain-Supervised Image Editing | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4047 | HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network | Paper |
6529 | Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC | Paper |
8566 | Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning | Paper |
5578 | Enhancing Face Recognition With Self-Supervised 3D Reconstruction | Paper |
5996 | Learning To Learn Across Diverse Data Biases in Deep Face Recognition | Paper |
7320 | An Efficient Training Approach for Very Large Scale Face Recognition | Paper |
4045 | MogFace: Towards a Deeper Appreciation on Face Detection | Paper |
7382 | Exploring Frequency Adversarial Attacks for Face Forgery Detection | Paper |
7163 | End-to-End Reconstruction-Classification Learning for Face Forgery Detection | Paper |
3804 | Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing | Paper |
9981 | Privacy-Preserving Online AutoML for Domain-Specific Face Detection | Paper |
891 | Simulated Adversarial Testing of Face Recognition Models | Paper |
5782 | Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing | Paper |
2510 | Towards Semi-Supervised Deep Facial Expression Recognition With an Adaptive Confidence Margin | Paper |
5638 | Towards Accurate Facial Landmark Detection via Cascaded Transformers | Paper |
3038 | PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer | Paper |
5557 | GazeOnce: Real-Time Multi-Person Gaze Estimation | Paper |
3783 | Generalizing Gaze Estimation With Rotation Consistency | Paper |
4512 | Face Relighting With Geometrically Consistent Shadows | Paper |
2485 | HairMapper: Removing Hair From Portraits Using GANs | Paper |
5664 | Learning To Restore 3D Face From In-the-Wild Degraded Images | Paper |
Paper Id | Paper Title | Link |
---|---|---|
2898 | Open-Set Text Recognition via Character-Context Decoupling | Paper |
3331 | Neural Collaborative Graph Machines for Table Structure Recognition | Paper |
4051 | Revisiting Document Image Dewarping by Grid Regularization | Paper |
4161 | Syntax-Aware Network for Handwritten Mathematical Expression Recognition | Paper |
4743 | Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection | Paper |
5258 | Fourier Document Restoration for Robust Document Dewarping and Recognition | Paper |
6276 | XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding | Paper |
7348 | SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition | Paper |
2703 | Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer | Paper |
3686 | TableFormer: Table Structure Understanding With Transformers | Paper |
8352 | Knowledge Mining With Scene Text for Fine-Grained Recognition | Paper |
11454 | PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents | Paper |
Paper Id | Paper Title | Link |
---|---|---|
2043 | Towards Implicit Text-Guided 3D Shape Generation | Paper |
9380 | Towards Language-Free Training for Text-to-Image Generation | Paper |
7612 | ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic | Paper |
4952 | EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching | Paper |
7374 | Hierarchical Modular Network for Video Captioning | Paper |
3770 | SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning | Paper |
3222 | End-to-End Generative Pretraining for Multimodal Video Captioning | Paper |
4855 | Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning | Paper |
8115 | Scaling Up Vision-Language Pre-Training for Image Captioning | Paper |
9270 | Comprehending and Ordering Semantics for Image Captioning | Paper |
11498 | NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge | Paper |
814 | Injecting Semantic Concepts Into End-to-End Image Captioning | Paper |
1613 | DIFNet: Boosting Visual Information Flow for Image Captioning | Paper |
8224 | VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning | Paper |
7848 | Show, Deconfound and Tell: Image Captioning With Causal Inference | Paper |
9257 | EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval | Paper |
11667 | CLIPstyler: Image Style Transfer With a Single Text Condition | Paper |
4042 | HairCLIP: Design Your Hair by Text and Reference Image | Paper |
1965 | DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting | Paper |
11622 | On Guiding Visual Attention With Language Specification | Paper |
9610 | UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog | Paper |
10953 | Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer | Paper |
10338 | LiT: Zero-Shot Transfer With Locked-Image Text Tuning | Paper |
851 | GroupViT: Semantic Segmentation Emerges From Text Supervision | Paper |
1404 | ReSTR: Convolution-Free Referring Image Segmentation Using Transformers | Paper |
1565 | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | Paper |
7782 | An Empirical Study of Training End-to-End Vision-and-Language Transformers | Paper |
7761 | Are Multimodal Transformers Robust to Missing Modality? | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4834 | NeurMiPs: Neural Mixture of Planar Experts for View Synthesis | Paper |
4419 | FWD: Real-Time Novel View Synthesis With Forward Warping and Depth | Paper |
441 | SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images | Paper |
11049 | Fast, Accurate and Memory-Efficient Partial Permutation Synchronization | Paper |
2015 | Learning To Find Good Models in RANSAC | Paper |
9080 | Optimizing Elimination Templates by Greedy Parameter Search | Paper |
11523 | GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision | Paper |
2580 | HARA: A Hierarchical Approach for Robust Rotation Averaging | Paper |
4166 | RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging | Paper |
11316 | A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors | Paper |
4211 | ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance | Paper |
6651 | Self-Supervised Neural Articulated Shape and Appearance Models | Paper |
6645 | Virtual Elastic Objects | Paper |
3282 | Decoupling Makes Weakly Supervised Local Feature Better | Paper |
1667 | JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints | Paper |
640 | ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging | Paper |
9217 | DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering | Paper |
8789 | Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis | Paper |
1269 | Structured Local Radiance Fields for Human Avatar Modeling | Paper |
4685 | High-Fidelity Human Avatars From a Single RGB Camera | Paper |
5827 | Forecasting Characteristic 3D Poses of Human Actions | Paper |
817 | Virtual Correspondence: Humans as a Cue for Extreme-View Geometry | Paper |
869 | BEHAVE: Dataset and Method for Tracking Human Object Interactions | Paper |
3549 | Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives | Paper |
8956 | RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation | Paper |
9005 | NPBG++: Accelerating Neural Point-Based Graphics | Paper |
5409 | Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows | Paper |
875 | Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos | Paper |
Paper Id | Paper Title | Link |
---|---|---|
8292 | TransforMatcher: Match-to-Match Attention for Semantic Correspondence | Paper |
1610 | Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences | Paper |
2606 | Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning | Paper |
6011 | Transforming Model Prediction for Tracking | Paper |
10078 | Ranking-Based Siamese Visual Tracking | Paper |
3860 | Correlation-Aware Deep Tracking | Paper |
3825 | Global Tracking via Ensemble of Local Trackers | Paper |
909 | Global Tracking Transformers | Paper |
1198 | Unified Transformer Tracker for Object Tracking | Paper |
9651 | Transformer Tracking With Cyclic Shifting Window Attention | Paper |
7487 | Spiking Transformers for Event-Based Single Object Tracking | Paper |
6379 | Adiabatic Quantum Computing for Multi Object Tracking | Paper |
8065 | HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction | Paper |
2493 | Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking | Paper |
9395 | TrackFormer: Multi-Object Tracking With Transformers | Paper |
4294 | Learning of Global Objective for Network Flow in Multi-Object Tracking | Paper |
5264 | LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking | Paper |
3128 | Multi-Object Tracking Meets Moving UAV | Paper |
912 | Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline | Paper |
2683 | Unsupervised Domain Adaptation for Nighttime Aerial Tracking | Paper |
6998 | Learning Optical Flow With Kernel Patch Attention | Paper |
5798 | Towards Understanding Adversarial Robustness of Optical Flow Networks | Paper |
5641 | DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow | Paper |
Paper Id | Paper Title | Link |
---|---|---|
8367 | Multi-Person Extreme Motion Prediction | Paper |
51 | Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation | Paper |
9962 | AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation | Paper |
4071 | Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation | Paper |
6971 | Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation | Paper |
10385 | Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video | Paper |
6843 | Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization | Paper |
3768 | Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation | Paper |
2364 | Location-Free Human Pose Estimation | Paper |
1083 | MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation | Paper |
7104 | Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision | Paper |
1897 | Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors | Paper |
5115 | PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound | Paper |
10409 | Differentiable Dynamics for Articulated 3D Human Motion Reconstruction | Paper |
4352 | COAP: Compositional Articulated Occupancy of People | Paper |
6849 | Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video | Paper |
6924 | SC2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration | Paper |
3094 | MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video | Paper |
770 | Putting People in Their Place: Monocular Regression of 3D People in Depth | Paper |
4288 | FLAG: Flow-Based 3D Avatar Generation From Sparse Observations | Paper |
896 | GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping | Paper |
933 | Capturing and Inferring Dense Full-Body Human-Scene Contact | Paper |
3301 | BodyMap: Learning Full-Body Dense Correspondence Map | Paper |
1209 | ICON: Implicit Clothed Humans Obtained From Normals | Paper |
Paper Id | Paper Title | Link |
---|---|---|
7748 | Generating Representative Samples for Few-Shot Classification | Paper |
2919 | Matching Feature Sets for Few-Shot Image Classification | Paper |
2525 | Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations | Paper |
6602 | Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection | Paper |
9011 | Forward Compatible Few-Shot Class-Incremental Learning | Paper |
10780 | Constrained Few-Shot Class-Incremental Learning | Paper |
9441 | Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference | Paper |
9456 | EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning | Paper |
10053 | Few-Shot Learning With Noisy Labels | Paper |
7988 | Ranking Distance Calibration for Cross-Domain Few-Shot Learning | Paper |
10614 | Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning | Paper |
2507 | Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning | Paper |
8242 | Learning To Memorize Feature Hallucination for One-Shot Image Generation | Paper |
48 | A Closer Look at Few-Shot Image Generation | Paper |
4470 | Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition | Paper |
2309 | Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability | Paper |
1534 | Transferability Estimation Using Bhattacharyya Class Separability | Paper |
9832 | Revisiting the Transferability of Supervised Pretraining: An MLP Perspective | Paper |
5990 | Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data | Paper |
6400 | Which Model To Transfer? Finding the Needle in the Growing Haystack | Paper |
7918 | Does Robustness on ImageNet Transfer to Downstream Tasks? | Paper |
9779 | What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors | Paper |
3815 | OW-DETR: Open-World Detection Transformer | Paper |
9180 | Unseen Classes at a Later Time? No Problem | Paper |
6901 | Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism | Paper |
5542 | On Generalizing Beyond Domains in Cross-Domain Continual Learning | Paper |
10123 | Online Continual Learning on a Contaminated Data Stream With Blurry Task Boundaries | Paper |
2527 | DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion | Paper |
544 | Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning | Paper |
2321 | En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning | Paper |
5161 | VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning | Paper |
5950 | Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning | Paper |
8438 | KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning | Paper |
6727 | Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis | Paper |
4846 | WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery | Paper |
Paper Id | Paper Title | Link |
---|---|---|
1812 | MeMOT: Multi-Object Tracking With Memory | Paper |
2326 | Unsupervised Learning of Accurate Siamese Tracking | Paper |
1995 | Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds | Paper |
3616 | GMFlow: Learning Optical Flow via Global Matching | Paper |
10012 | GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking | Paper |
3417 | SNUG: Self-Supervised Neural Dynamic Garments | Paper |
6431 | Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction | Paper |
10207 | Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation | Paper |
4351 | Context-Aware Sequence Alignment Using 4D Skeletal Augmentation | Paper |
10467 | Enabling Equivariance for Arbitrary Lie Groups | Paper |
1089 | RAMA: A Rapid Multicut Algorithm on GPU | Paper |
103 | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks | Paper |
6427 | RCP: Recurrent Closest Point for Point Cloud | Paper |
6607 | Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis | Paper |
6810 | Balanced Multimodal Learning via On-the-Fly Gradient Modulation | Paper |
Paper Id | Paper Title | Link |
---|---|---|
3870 | Block-NeRF: Scalable Large Scene Neural View Synthesis | Paper |
5472 | SceneSqueezer: Learning To Compress Scene for Camera Relocalization | Paper |
7077 | Light Field Neural Rendering | Paper |
8204 | Extracting Triangular 3D Models, Materials, and Lighting From Images | Paper |
8722 | Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3) | Paper |
1461 | Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models | Paper |
6131 | It's All in the Teacher: Zero-Shot Quantization Brought Closer to the Teacher | Paper |
6484 | NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks | Paper |
7060 | Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention | Paper |
5966 | Parameter-Free Online Test-Time Adaptation | Paper |
10272 | Patch-Level Representation Learning for Self-Supervised Vision Transformers | Paper |
11845 | Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization | Paper |
9568 | Mixed Differential Privacy in Computer Vision | Paper |
2663 | DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis | Paper |
11405 | Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning | Paper |
Paper Id | Paper Title | Link |
---|---|---|
11527 | On the Instability of Relative Pose Estimation and RANSAC's Role | Paper |
1458 | Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training | Paper |
2845 | Global Sensing and Measurements Reuse for Image Compressed Sensing | Paper |
7248 | Maximum Consensus by Weighted Influences of Monotone Boolean Functions | Paper |
8398 | MS2DG-Net: Progressive Correspondence Learning via Multiple Sparse Semantics Dynamic Graph | Paper |
6292 | Styleformer: Transformer Based Generative Adversarial Networks With Style Vector | Paper |
9212 | Scanline Homographies for Rolling-Shutter Plane Absolute Pose | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4675 | Self-Supervised Models Are Continual Learners | Paper |
5592 | The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization | Paper |
5983 | Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning | Paper |
7932 | SimMIM: A Simple Framework for Masked Image Modeling | Paper |
8651 | Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning | Paper |
7363 | UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning | Paper |
6763 | Contrastive Conditional Neural Processes | Paper |
1945 | One-Bit Active Query With Contrastive Pairs | Paper |
496 | HCSC: Hierarchical Contrastive Selective Coding | Paper |
4560 | Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging | Paper |
9291 | Hierarchical Self-Supervised Representation Learning for Movie Understanding | Paper |
7239 | Anomaly Detection via Reverse Distillation From One-Class Embedding | Paper |
8177 | Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning | Paper |
3636 | DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning | Paper |
5723 | Learning To Collaborate in Decentralized Learning of Personalized Models | Paper |
8083 | Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph | Paper |
1264 | DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning | Paper |
1835 | Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning | Paper |
1139 | Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes | Paper |
1554 | Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors | Paper |
2856 | Spectral Unsupervised Domain Adaptation for Visual Recognition | Paper |
1408 | DATA: Domain-Aware and Task-Aware Self-Supervised Learning | Paper |
2449 | Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-Learning | Paper |
4337 | DeepDPM: Deep Clustering With an Unknown Number of Clusters | Paper |
7785 | PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions | Paper |
9990 | Robust Outlier Detection by De-Biasing VAE Likelihoods | Paper |
3489 | Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data | Paper |
1420 | CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding | Paper |
10336 | Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation | Paper |
3423 | DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation | Paper |
8154 | WildNet: Learning Domain Generalized Semantic Segmentation From the Wild | Paper |
5616 | UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation | Paper |
4410 | Semi-Supervised Semantic Segmentation With Error Localization Network | Paper |
621 | Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation | Paper |
750 | Integrative Few-Shot Learning for Classification and Segmentation | Paper |
4568 | GanOrCon: Are Generative Models Useful for Few-Shot Segmentation? | Paper |
8214 | SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis | Paper |
1055 | CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs | Paper |
Paper Id | Paper Title | Link |
---|---|---|
93 | GradViT: Gradient Inversion of Vision Transformers | Paper |
9396 | Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings | Paper |
7502 | CD2-pFed: Cyclic Distillation-Guided Channel Decoupling for Model Personalization in Federated Learning | Paper |
6925 | APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers | Paper |
6650 | Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning | Paper |
6121 | Robust Federated Learning With Noisy and Heterogeneous Clients | Paper |
9724 | Federated Learning With Position-Aware Neurons | Paper |
10112 | Layer-Wised Model Aggregation for Personalized Federated Learning | Paper |
4369 | FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning | Paper |
2897 | FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction | Paper |
1250 | Differentially Private Federated Learning With Local Regularization and Sparsification | Paper |
1234 | Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage | Paper |
5568 | Learn From Others and Be Yourself in Heterogeneous Federated Learning | Paper |
3953 | RSCFed: Random Sampling Consensus Federated Semi-Supervised Learning | Paper |
2956 | Federated Class-Incremental Learning | Paper |
7881 | Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning | Paper |
8257 | FedCorr: Multi-Stage Federated Learning for Label Noise Correction | Paper |
6027 | ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning | Paper |
Paper Id | Paper Title | Link |
---|---|---|
1096 | Cycle-Consistent Counterfactuals by Latent Transformations | Paper |
5428 | Consistent Explanations by Contrastive Learning | Paper |
6357 | Towards Better Understanding Attribution Methods | Paper |
7285 | Proto2Proto: Can You Recognize the Car, the Way I Do? | Paper |
7606 | Do Explanations Explain? Model Knows Best | Paper |
7668 | HINT: Hierarchical Neuron Concept Explainer | Paper |
7825 | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes | Paper |
7404 | What Do Navigation Agents Learn About Their Environment? | Paper |
11789 | A Framework for Learning Ante-Hoc Explainable Models via Concepts | Paper |
778 | Exploiting Explainable Metrics for Augmented SGD | Paper |
8195 | FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks | Paper |
10710 | Interactive Disentanglement: Learning Concepts by Interacting With Their Prototype Representations | Paper |
6365 | B-Cos Networks: Alignment Is All We Need for Interpretability | Paper |
4303 | The Flag Median and FlagIRLS | Paper |
Paper Id | Paper Title | Link |
---|---|---|
112 | Learning Fair Classifiers With Partially Annotated Group Labels | Paper |
5065 | Estimating Structural Disparities for Face Models | Paper |
6022 | Estimating Example Difficulty Using Variance of Gradients | Paper |
6962 | Fairness-Aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models | Paper |
9906 | Fair Contrastive Learning for Facial Attribute Classification | Paper |
6582 | Leveraging Adversarial Examples To Quantify Membership Information Leakage | Paper |
10915 | Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers | Paper |
11713 | Deep Unlearning via Randomized Conditionally Independent Hessians | Paper |
284 | Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets | Paper |
11071 | A Study on the Distribution of Social Biases in Self-Supervised Learning Visual Models | Paper |
Paper Id | Paper Title | Link |
---|---|---|
2658 | Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices? | Paper |
649 | Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation | Paper |
2266 | SEEG: Semantic Energized Co-Speech Gesture Generation | Paper |
715 | Mix and Localize: Localizing Sound Sources in Mixtures | Paper |
2204 | Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation | Paper |
7217 | IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Paper |
11551 | M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers | Paper |
5355 | Finding Fallen Objects via Asynchronous Audio-Visual Integration | Paper |
6187 | Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory | Paper |
6676 | Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization | Paper |
10849 | Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language | Paper |
11001 | It's Time for Artistic Correspondence in Music and Video | Paper |
11391 | Self-Supervised Object Detection From Audio-Visual Correspondence | Paper |
8361 | More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech | Paper |
2475 | ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer | Paper |
7892 | A Probabilistic Graphical Model Based on Neural-Symbolic Reasoning for Visual Relationship Detection | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4818 | Diffusion Autoencoders: Toward a Meaningful and Decodable Representation | Paper |
6519 | Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps | Paper |
11253 | Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values | Paper |
908 | Ensembling Off-the-Shelf Models for GAN Training | Paper |
10490 | Marginal Contrastive Correspondence for Guided Image Generation | Paper |
3437 | GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation | Paper |
5452 | High-Resolution Image Synthesis With Latent Diffusion Models | Paper |
3874 | Vector Quantized Diffusion Model for Text-to-Image Synthesis | Paper |
5265 | ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation | Paper |
790 | Dataset Distillation by Matching Training Trajectories | Paper |
6337 | Continual Predictive Learning From Videos | Paper |
11474 | Motion-Adjustable Neural Implicit Video Representation | Paper |
2561 | Splicing ViT Features for Semantic Appearance Transfer | Paper |
1064 | MAT: Mask-Aware Transformer for Large Hole Image Inpainting | Paper |
2344 | Day-to-Night Image Synthesis for Training Nighttime Neural ISPs | Paper |
5874 | Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness | Paper |
3576 | Few-Shot Head Swapping in the Wild | Paper |
5059 | ClothFormer: Taming Video Virtual Try-On in All Module | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4380 | Adversarial Parametric Pose Prior | Paper |
4450 | Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation | Paper |
4806 | PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision | Paper |
2492 | Generalizable Human Pose Triangulation | Paper |
1181 | GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras | Paper |
1468 | Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory | Paper |
4014 | Contextual Instance Decoupling for Robust Multi-Person Pose Estimation | Paper |
2202 | End-to-End Multi-Person Pose Estimation With Transformers | Paper |
4534 | Meta Agent Teaming Active Learning for Pose Estimation | Paper |
3411 | Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation | Paper |
6194 | Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer | Paper |
8628 | Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture | Paper |
1445 | LASER: LAtent SpacE Rendering for 2D Visual Localization | Paper |
8152 | Learning To Detect Scene Landmarks for Camera Localization | Paper |
4196 | Geometric Transformer for Fast and Robust Point Cloud Registration | Paper |
7968 | ARCS: Accurate Rotation and Correspondence Search | Paper |
3628 | FisherMatch: Semi-Supervised Rotation Regression via Entropy-Based Filtering | Paper |
10439 | Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation | Paper |
Paper Id | Paper Title | Link |
---|---|---|
5660 | CAFE: Learning To Condense Dataset by Aligning Features | Paper |
9135 | Lite-MDETR: A Lightweight Multi-Modal Detector | Paper |
703 | DeeCap: Dynamic Early Exiting for Efficient Image Captioning | Paper |
10864 | Searching the Deployable Convolution Neural Networks for GPUs | Paper |
6685 | Active Learning by Feature Mixing | Paper |
6585 | When To Prune? A Policy Towards Early Structural Pruning | Paper |
11185 | Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning | Paper |
9318 | How Well Do Sparse ImageNet Models Transfer? | Paper |
9388 | Rep-Net: Efficient On-Device Learning via Feature Reprogramming | Paper |
4954 | CHEX: CHannel EXploration for CNN Model Compression | Paper |
3533 | HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks | Paper |
2934 | AdaViT: Adaptive Vision Transformers for Efficient Image Recognition | Paper |
1772 | Cross-Image Relational Knowledge Distillation for Semantic Segmentation | Paper |
724 | Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error | Paper |
3958 | IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization | Paper |
9796 | DECORE: Deep Compression With Reinforcement Learning | Paper |
11195 | Towards Efficient and Scalable Sharpness-Aware Minimization | Paper |
1088 | AEGNN: Asynchronous Event-Based Graph Neural Networks | Paper |
4078 | DiSparse: Disentangled Sparsification for Multitask Model Compression | Paper |
1836 | Multi-Modal Extreme Classification | Paper |
11241 | A Sampling-Based Approach for Efficient Clustering in Large Datasets | Paper |
11776 | Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction | Paper |
6380 | Learnable Lookup Table for Neural Network Quantization | Paper |
8374 | Instance-Aware Dynamic Neural Network Quantization | Paper |
10529 | Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation | Paper |
3265 | Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction | Paper |
5233 | Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation | Paper |
11646 | PokeBNN: A Binary Pursuit of Lightweight Accuracy | Paper |
2031 | Automated Progressive Learning for Efficient Training of Vision Transformers | Paper |
1417 | DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos | Paper |
9190 | Channel Balancing for Accurate Quantization of Winograd Convolutions | Paper |
9054 | ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching | Paper |
8230 | Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs | Paper |
4843 | AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation | Paper |
9210 | TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing | Paper |
185 | SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems | Paper |
6367 | TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed | Paper |
Paper Id | Paper Title | Link |
---|---|---|
793 | DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation | Paper |
1453 | Universal Photometric Stereo Network Using Global Lighting Contexts | Paper |
2355 | Uncertainty-Aware Deep Multi-View Photometric Stereo | Paper |
5441 | Fast Light-Weight Near-Field Photometric Stereo | Paper |
4990 | Glass Segmentation Using Intensity and Spectral Polarization Cues | Paper |
1557 | Shape From Polarization for Complex Scenes in the Wild | Paper |
6107 | Deep Depth From Focus With Differential Focus Volume | Paper |
7381 | Optimal LED Spectral Multiplexing for NIR2RGB Translation | Paper |
8076 | Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements | Paper |
8196 | NAN: Noise-Aware NeRFs for Burst-Denoising | Paper |
3129 | Estimating Fine-Grained Noise Model via Contrastive Learning | Paper |
11094 | Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders | Paper |
1021 | MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution | Paper |
6350 | PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images | Paper |
Paper Id | Paper Title | Link |
---|---|---|
10159 | Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors | Paper |
1531 | Learning To Anticipate Future With Dynamic Context Removal | Paper |
11115 | Self-Supervised Spatial Reasoning on Multi-View Line Drawings | Paper |
5634 | Contextual Debiasing for Visual Recognition With Causal Mechanisms | Paper |
Paper Id | Paper Title | Link |
---|---|---|
3468 | Adversarial Texture for Fooling Person Detectors in the Physical World | Paper |
4109 | Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World | Paper |
5922 | Enhancing Classifier Conservativeness and Robustness by Polynomiality | Paper |
5448 | Backdoor Attacks on Self-Supervised Learning | Paper |
6583 | Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks | Paper |
3994 | Few-Shot Backdoor Defense Using Shapley Estimation | Paper |
10910 | Better Trigger Inversion Optimization in Backdoor Scanning | Paper |
7051 | Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees | Paper |
9002 | Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching | Paper |
6908 | LAS-AT: Adversarial Training With Learnable Attack Strategy | Paper |
4589 | Subspace Adversarial Training | Paper |
5403 | Pyramid Adversarial Training Improves ViT Performance | Paper |
12025 | Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations | Paper |
2245 | Robust Image Forgery Detection Over Online Social Network Shared Images | Paper |
6270 | Quantifying Societal Bias Amplification in Image Captioning | Paper |
Paper Id | Paper Title | Link |
---|---|---|
725 | Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models | Paper |
706 | GAN-Supervised Dense Visual Alignment | Paper |
8416 | Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator | Paper |
8925 | Text2Mesh: Text-Driven Neural Stylization for Meshes | Paper |
6649 | StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation | Paper |
7720 | Physical Simulation Layer for Accurate 3D Modeling | Paper |
717 | Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time | Paper |
3579 | Neural Texture Extraction and Distribution for Controllable Person Image Synthesis | Paper |
6545 | I M Avatar: Implicit Morphable Head Avatars From Videos | Paper |
549 | RCL: Recurrent Continuous Localization for Temporal Action Detection | Paper |
4317 | Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection | Paper |
729 | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | Paper |
9219 | TubeR: Tubelet Transformer for Video Action Detection | Paper |
8613 | MixFormer: End-to-End Tracking With Iterative Mixed Attention | Paper |
Paper Id | Paper Title | Link |
---|---|---|
5905 | DN-DETR: Accelerate DETR Training by Introducing Query DeNoising | Paper |
7010 | Proper Reuse of Image Classification Features Improves Object Detection | Paper |
8646 | Boosting 3D Object Detection by Simulating Multimodality on Point Clouds | Paper |
10578 | TransVPR: Transformer-Based Place Recognition With Multi-Level Attention Aggregation | Paper |
9856 | Disentangling Visual Embeddings for Attributes and Objects | Paper |
1856 | QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection | Paper |
5517 | Unknown-Aware Object Detection: Learning What You Don't Know From Videos in the Wild | Paper |
3247 | Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks | Paper |
5972 | Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective | Paper |
3349 | Calibrating Deep Neural Networks by Pairwise Constraints | Paper |
7691 | Lifelong Graph Learning | Paper |
11327 | OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks | Paper |
10810 | Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation | Paper |
11529 | Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches | Paper |
9157 | Cross-View Transformers for Real-Time Map-View Semantic Segmentation | Paper |
Paper Id | Paper Title | Link |
---|---|---|
8542 | Label Matching Semi-Supervised Object Detection | Paper |
10433 | Multidimensional Belief Quantification for Label-Efficient Meta-Learning | Paper |
11752 | Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples | Paper |
5537 | Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification | Paper |
9804 | Class-Aware Contrastive Semi-Supervised Learning | Paper |
4181 | Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework | Paper |
10916 | Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo | Paper |
2296 | Learning Where To Learn in Cross-View Self-Supervised Learning | Paper |
2487 | Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective | Paper |
2869 | SimMatch: Semi-Supervised Learning With Similarity Matching | Paper |
540 | Active Teacher for Semi-Supervised Object Detection | Paper |
943 | Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection | Paper |
5807 | Self-Supervised Learning of Object Parts for Semantic Segmentation | Paper |
4603 | MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection | Paper |
6024 | Scale-Equivalent Distillation for Semi-Supervised Object Detection | Paper |
6654 | A Self-Supervised Descriptor for Image Copy Detection | Paper |
10678 | Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut | Paper |
9521 | CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification | Paper |
11648 | Semi-Supervised Few-Shot Learning via Multi-Factor Clustering | Paper |
2306 | CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning | Paper |
2589 | Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data | Paper |
3172 | A Simple Data Mixing Prior for Improving Self-Supervised Learning | Paper |
3375 | DETReg: Unsupervised Pretraining With Region Priors for Object Detection | Paper |
4354 | Sound and Visual Representation Learning With Multiple Pretraining Tasks | Paper |
4601 | UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training | Paper |
1744 | Weakly Supervised Object Localization As Domain Adaption | Paper |
7762 | Debiased Learning From Naturally Imbalanced Pseudo-Labels | Paper |
3414 | Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning | Paper |
1546 | Masked Feature Prediction for Self-Supervised Visual Pre-Training | Paper |
6171 | Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency | Paper |
11064 | Id-Free Person Similarity Learning | Paper |
5962 | End-to-End Semi-Supervised Learning for Video Action Detection | Paper |
11772 | Probabilistic Representations for Video Contrastive Learning | Paper |
5904 | Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition | Paper |
5668 | BEVT: BERT Pretraining of Video Transformers | Paper |
7678 | Generative Cooperative Learning for Unsupervised Video Anomaly Detection | Paper |
9976 | When Does Contrastive Visual Representation Learning Work? | Paper |
596 | The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization | Paper |
5267 | What Matters for Meta-Learning Vision Regression Tasks? | Paper |
Paper Id | Paper Title | Link |
---|---|---|
689 | IFOR: Iterative Flow Minimization for Robotic Object Rearrangement | Paper |
2734 | TCTrack: Temporal Contexts for Aerial Tracking | Paper |
2846 | AKB-48: A Real-World Articulated Object Knowledge Base | Paper |
4440 | 3DAC: Learning Attribute Compression for Point Clouds | Paper |
4521 | Simple but Effective: CLIP Embeddings for Embodied AI | Paper |
2359 | Multi-Robot Active Mapping via Neural Bipartite Graph Matching | Paper |
2464 | Continuous Scene Representations for Embodied AI | Paper |
2923 | Interactron: Embodied Adaptive Object Detection | Paper |
1761 | Online Learning of Reusable Abstract Models for Object Goal Navigation | Paper |
3195 | RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization | Paper |
2684 | UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation | Paper |
9736 | Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation | Paper |
10404 | Upright-Net: Learning Upright Orientation for 3D Point Cloud | Paper |
Paper Id | Paper Title | Link |
---|---|---|
7865 | DeepFake Disrupter: The Detector of DeepFake Is My Friend | Paper |
3350 | HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization | Paper |
7457 | Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources | Paper |
9423 | Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection | Paper |
Paper Id | Paper Title | Link |
---|---|---|
8193 | Transferable Sparse Adversarial Attack | Paper |
8898 | Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection | Paper |
10026 | Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability | Paper |
8063 | Improving Adversarial Transferability via Neuron Attribution-Based Attacks | Paper |
10779 | Complex Backdoor Detection by Symmetric Feature Differencing | Paper |
10243 | Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer | Paper |
11148 | Zero-Query Transfer Attacks on Context-Aware Object Detectors | Paper |
6302 | 360-Attack: Distortion-Aware Perturbations From Perspective-Views | Paper |
11210 | Label-Only Model Inversion Attacks via Boundary Repulsion | Paper |
11207 | Merry Go Round: Rotate a Frame and Fool a DNN | Paper |
1485 | Cross-Modal Transferable Adversarial Attacks From Images to Videos | Paper |
10629 | BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning | Paper |
11521 | Investigating Top-k White-Box and Transferable Black-Box Attack | Paper |
7175 | Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution | Paper |
2830 | Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack | Paper |
3325 | Towards Efficient Data Free Black-Box Adversarial Attack | Paper |
3931 | Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network | Paper |
11451 | Certified Patch Robustness via Smoothed Vision Transformers | Paper |
5540 | Towards Practical Certifiable Patch Defense With Vision Transformer | Paper |
4282 | On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles | Paper |
7361 | 3DeformRS: Certifying Spatial Deformations on Point Clouds | Paper |
4302 | Stereoscopic Universal Perturbations Across Different Architectures and Datasets | Paper |
4407 | Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations | Paper |
10883 | Bounded Adversarial Attack on Deep Content Features | Paper |
9811 | DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints | Paper |
10212 | Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart | Paper |
10905 | Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness | Paper |
7360 | Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input | Paper |
2205 | Adversarial Eigen Attack on Black-Box Models | Paper |
7620 | Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond | Paper |
4422 | Enhancing Adversarial Training With Second-Order Statistics of Weights | Paper |
9176 | Towards Data-Free Model Stealing in a Hard Label Setting | Paper |
9218 | Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients | Paper |
10096 | DTA: Physical Camouflage Attacks Using Differentiable Transformation Network | Paper |
1841 | Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity | Paper |
201 | Enhancing Adversarial Robustness for Deep Metric Learning | Paper |
5230 | Shape-Invariant 3D Adversarial Point Clouds | Paper |
5789 | Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon | Paper |
6161 | Exploring Effective Data for Surrogate Training Towards Black-Box Attack | Paper |
11698 | NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models | Paper |
5970 | Dual-Key Multimodal Backdoors for Visual Question Answering | Paper |
6546 | Proactive Image Manipulation Detection | Paper |
Paper Id | Paper Title | Link |
---|---|---|
2347 | Unified Contrastive Learning in Image-Text-Label Space | Paper |
9927 | AlignMixup: Improving Representations by Interpolating Aligned Features | Paper |
2419 | On the Road to Online Adaptation for Semantic Image Segmentation | Paper |
5236 | ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation | Paper |
3487 | Kernelized Few-Shot Object Detection With Efficient Integral Aggregation | Paper |
186 | Neural Mean Discrepancy for Efficient Out-of-Distribution Detection | Paper |
8477 | A Structured Dictionary Perspective on Implicit Neural Representations | Paper |
10563 | LARGE: Latent-Based Regression Through GAN Semantics | Paper |
6667 | Rethinking Controllable Variational Autoencoders | Paper |
9016 | Learning Canonical F-Correlation Projection for Compact Multiview Representation | Paper |
6288 | Cross-Architecture Self-Supervised Video Representation Learning | Paper |
4418 | Improving Video Model Transfer With Dynamic Representation Learning | Paper |
5928 | Self-Supervised Image Representation Learning With Geometric Set Consistency | Paper |
246 | HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging | Paper |
4037 | Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling | Paper |
7362 | DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds | Paper |
9356 | Neural Convolutional Surfaces | Paper |
10032 | Representing 3D Shapes With Probabilistic Directed Distance Fields | Paper |
3030 | H4D: Human 4D Modeling by Learning Neural Compositional Representation | Paper |
518 | Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification | Paper |
1275 | Contrastive Regression for Domain Adaptation on Gaze Estimation | Paper |
9822 | Forward Compatible Training for Large-Scale Embedding Retrieval Systems | Paper |
4945 | Improving Subgraph Recognition With Variational Graph Information Bottleneck | Paper |
2508 | Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss | Paper |
4145 | Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4111 | Deep Stereo Image Compression via Bi-Directional Coding | Paper |
8934 | RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion | Paper |
4213 | Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer | Paper |
2554 | Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels | Paper |
3021 | SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization | Paper |
3470 | Automatic Color Image Stitching Using Quaternion Rank-1 Alignment | Paper |
6712 | SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing | Paper |
4112 | Degree-of-Linear-Polarization-Based Color Constancy | Paper |
2170 | Point Cloud Color Constancy | Paper |
265 | Boosting View Synthesis With Residual Transfer | Paper |
5780 | Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection | Paper |
4413 | Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging | Paper |
479 | PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition | Paper |
7596 | Multimodal Material Segmentation | Paper |
6384 | Occlusion-Aware Cost Constructor for Light Field Depth Estimation | Paper |
815 | Learning Neural Light Fields With Ray-Space Embedding | Paper |
2268 | Acquiring a Dynamic Light Field Through a Single-Shot Coded Image | Paper |
4415 | Gravitationally Lensed Black Hole Emission Tomography | Paper |
5058 | Deep Saliency Prior for Reducing Visual Distraction | Paper |
8388 | Personalized Image Aesthetics Assessment With Rich Attributes | Paper |
6382 | Artistic Style Discovery With Independent Components | Paper |
Paper Id | Paper Title | Link |
---|---|---|
2004 | Noisy Boundaries: Lemon or Lemonade for Semi-Supervised Instance Segmentation? | Paper |
5105 | Partial Class Activation Attention for Semantic Segmentation | Paper |
7261 | Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers | Paper |
4156 | Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation | Paper |
7427 | Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation | Paper |
4593 | Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation | Paper |
1567 | L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation | Paper |
573 | Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data | Paper |
1307 | Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation | Paper |
2748 | Bending Reality: Distortion-Aware Transformers for Adapting to Panoramic Semantic Segmentation | Paper |
4586 | MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation | Paper |
8233 | NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night | Paper |
6032 | Fast Point Transformer | Paper |
7468 | RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior | Paper |
807 | ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes | Paper |
1738 | DisARM: Displacement Aware Relation Module for 3D Detection | Paper |
2722 | Learning Object Context for Novel-View Scene Layout Generation | Paper |
2166 | Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts | Paper |
348 | Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image | Paper |
5927 | Raw High-Definition Radar for Multi-Task Learning | Paper |
2343 | Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation | Paper |
7169 | UKPGAN: A General Self-Supervised Keypoint Detector | Paper |
5057 | Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos | Paper |
Paper Id | Paper Title | Link |
---|---|---|
104 | Rethinking Efficient Lane Detection via Curve Modeling | Paper |
5623 | Exploiting Temporal Relations on Radar Perception for Autonomous Driving | Paper |
1321 | Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective | Paper |
3631 | BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement | Paper |
9886 | ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning | Paper |
607 | Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion | Paper |
1397 | Vehicle Trajectory Prediction Works, but Not Everywhere | Paper |
9659 | LTP: Lane-Based Trajectory Prediction for Autonomous Driving | Paper |
2468 | ONCE-3DLanes: Building Monocular 3D Lane Detection | Paper |
10899 | Towards Driving-Oriented Metric for Lane Detection Models | Paper |
6918 | Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes | Paper |
5120 | LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection | Paper |
1664 | DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection | Paper |
9131 | A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation | Paper |
7091 | Forecasting From LiDAR via Future Object Detection | Paper |
5998 | RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding | Paper |
3364 | Learning From All Vehicles | Paper |
10331 | Is Mapping Necessary for Realistic PointGoal Navigation? | Paper |
9772 | Symmetry-Aware Neural Architecture for Embodied Visual Exploration | Paper |
6482 | Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles | Paper |
10621 | Topology Preserving Local Road Network Estimation From Single Onboard Camera Image | Paper |
6744 | Coupling Vision and Proprioception for Navigation of Legged Robots | Paper |
10063 | Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation | Paper |
9391 | 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection | Paper |
4385 | Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior | Paper |
2537 | SelfD: Self-Learning Large-Scale Driving Policies From the Web | Paper |
5244 | Towards Real-World Navigation With Deep Differentiable Planners | Paper |
10481 | Privacy Preserving Partial Localization | Paper |
6490 | Efficient Large-Scale Localization by Global Instance Recognition | Paper |
5459 | CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data | Paper |
Paper Id | Paper Title | Link |
---|---|---|
84 | De-Rendering 3D Objects in the Wild | Paper |
4234 | Neural Fields As Learnable Kernels for 3D Reconstruction | Paper |
3715 | HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing | Paper |
2744 | 3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies | Paper |
2410 | Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian | Paper |
9567 | Deep Image-Based Illumination Harmonization | Paper |
1834 | Glass: Geometric Latent Augmentation for Shape Spaces | Paper |
1559 | PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes | Paper |
1478 | Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes | Paper |
9364 | Neural Mesh Simplification | Paper |
6486 | SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters | Paper |
7818 | CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation | Paper |
7841 | UNIST: Unpaired Neural Implicit Shape Translation Network | Paper |
1800 | CoNeRF: Controllable Neural Radiance Fields | Paper |
6407 | Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling | Paper |
8338 | Modeling Indirect Illumination for Inverse Rendering | Paper |
3519 | Neural Head Avatars From Monocular RGB Videos | Paper |
2341 | DeepCurrents: Learning Implicit Representations of Shapes With Boundaries | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4335 | Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination | Paper |
5110 | AnyFace: Free-Style Text-To-Face Synthesis and Manipulation | Paper |
5301 | General Facial Representation Learning in a Visual-Linguistic Manner | Paper |
5269 | Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection | Paper |
1219 | Detecting Deepfakes With Self-Blended Images | Paper |
5967 | 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces | Paper |
9638 | Evaluation-Oriented Knowledge Distillation for Deep Face Recognition | Paper |
6682 | AdaFace: Quality Adaptive Margin for Face Recognition | Paper |
6920 | Moving Window Regression: A Novel Approach to Ordinal Regression | Paper |
10531 | FaceFormer: Speech-Driven 3D Facial Animation With Transformers | Paper |
11053 | Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in “In-the-Wild” Videos | Paper |
229 | Deep Decomposition for Stochastic Normal-Abnormal Transport | Paper |
3114 | DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification | Paper |
10426 | Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification | Paper |
10994 | Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4710 | VRDFormer: End-to-End Video Visual Relation Detection With Transformers | Paper |
720 | Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation | Paper |
9896 | Visual Acoustic Matching | Paper |
5847 | The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation | Paper |
4283 | Learning Multiple Dense Prediction Tasks From Partially Annotated Data | Paper |
9443 | PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning | Paper |
5513 | Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture | Paper |
5826 | FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation | Paper |
3020 | Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding | Paper |
11019 | Equivariant Point Cloud Analysis via Learning Orientations for Message Passing | Paper |
2137 | Surface Representation for Point Clouds | Paper |
3284 | Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds | Paper |
3846 | 3D Common Corruptions and Data Augmentation | Paper |
4027 | INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation | Paper |
11446 | How Much Does Input Data Type Impact Final Face Model Accuracy? | Paper |
Paper Id | Paper Title | Link |
---|---|---|
7484 | Ego4D: Around the World in 3,000 Hours of Egocentric Video | Paper |
10504 | TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting | Paper |
5075 | Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding | Paper |
2465 | vCLIMB: A Novel Video Class Incremental Learning Benchmark | Paper |
2221 | Opening Up Open World Tracking | Paper |
1795 | Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | Paper |
8910 | CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters | Paper |
11289 | Failure Modes of Domain Generalization Algorithms | Paper |
9398 | A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes | Paper |
6567 | Grounding Answers for Visual Questions Asked by Visually Impaired People | Paper |
6719 | Learning To Answer Questions in Dynamic Audio-Visual Scenarios | Paper |
1780 | Episodic Memory Question Answering | Paper |
11561 | ScanQA: 3D Question Answering for Spatial Scene Understanding | Paper |
5943 | Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles | Paper |
8893 | BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild | Paper |
Paper Id | Paper Title | Link |
---|---|---|
98 | Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation | Paper |
2849 | Structured Sparse R-CNN for Direct Scene Graph Generation | Paper |
10248 | PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation | Paper |
3738 | RU-Net: Regularized Unrolling Network for Scene Graph Generation | Paper |
1142 | Fine-Grained Predicates Learning for Scene Graph Generation | Paper |
3323 | HL-Net: Heterophily Learning Network for Scene Graph Generation | Paper |
10227 | SGTR: End-to-End Scene Graph Generation With Transformer | Paper |
6703 | Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs | Paper |
8205 | RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition | Paper |
10369 | Spatial Commonsense Graph for Object Localisation in Partial Scenes | Paper |
4148 | The Pedestrian Next to the Lamppost: Adaptive Object Graphs for Better Instantaneous Mapping | Paper |
7832 | Category-Aware Transformer Network for Better Human-Object Interaction Detection | Paper |
7619 | Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection | Paper |
3379 | Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection | Paper |
10087 | Human-Object Interaction Detection via Disentangled Transformer | Paper |
5684 | MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection | Paper |
7237 | GaTector: A Unified Framework for Gaze Object Prediction | Paper |
6242 | STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes | Paper |
7926 | Crowd Counting in the Frequency Domain | Paper |
3876 | Boosting Crowd Counting via Multifaceted Attention | Paper |
6137 | Rethinking Spatial Invariance of Convolutional Networks for Object Counting | Paper |
6322 | Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing | Paper |
5725 | Collaborative Transformers for Grounded Situation Recognition | Paper |
Paper Id | Paper Title | Link |
---|---|---|
2817 | Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos | Paper |
5042 | SVIP: Sequence VerIfication for Procedures in Videos | Paper |
3292 | Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency | Paper |
5855 | Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization | Paper |
3084 | GateHUB: Gated History Unit With Background Suppression for Online Action Detection | Paper |
7477 | E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition | Paper |
4495 | Hybrid Relation Guided Set Matching for Few-Shot Action Recognition | Paper |
3385 | Spatio-Temporal Relation Modeling for Few-Shot Action Recognition | Paper |
9787 | Alignment-Uniformity Aware Representation Learning for Zero-Shot Video Classification | Paper |
1862 | Cross-Modal Representation Learning for Zero-Shot Action Recognition | Paper |
6938 | Cross-Modal Background Suppression for Audio-Visual Event Localization | Paper |
3142 | Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization | Paper |
1068 | An Empirical Study of End-to-End Temporal Action Detection | Paper |
11191 | Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval | Paper |
9295 | DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition | Paper |
730 | MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection | Paper |
2917 | Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition | Paper |
9674 | AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition | Paper |
5856 | UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | Paper |
3946 | Detector-Free Weakly Supervised Group Activity Recognition | Paper |
2870 | Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading | Paper |
2752 | Efficient Two-Stage Detection of Human-Object Interactions With a Novel Unary-Pairwise Transformer | Paper |
517 | Interactiveness Field in Human-Object Interactions | Paper |
2258 | GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection | Paper |
7690 | Object-Relation Reasoning Graph for Action Recognition | Paper |
4315 | UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection | Paper |
1483 | Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition | Paper |
9379 | SPAct: Self-Supervised Privacy Preservation for Action Recognition | Paper |
818 | Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering | Paper |
28 | InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition | Paper |
11846 | Learning Video Representations of Human Motion From Synthetic Data | Paper |
6314 | Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos | Paper |
Paper Id | Paper Title | Link |
---|---|---|
1752 | EyePAD++: A Distillation-Based Approach for Joint Eye Authentication and Presentation Attack Detection Using Periocular Images | Paper |
3373 | Gait Recognition in the Wild With Dense 3D Representations and a Benchmark | Paper |
1403 | Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification | Paper |
3765 | Lagrange Motion Analysis and View Embeddings for Improved Gait Recognition | Paper |
9404 | DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover's Distance Improves Out-of-Distribution Face Identification | Paper |
1311 | Learning Second Order Local Anomaly for General Face Forgery Detection | Paper |
4821 | PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition | Paper |
10637 | Face2Exp: Combating Data Biases for Facial Expression Recognition | Paper |
11994 | Local-Adaptive Face Recognition via Graph-Based Meta-Clustering and Regularized Adaptation | Paper |
Paper Id | Paper Title | Link |
---|---|---|
4811 | EMOCA: Emotion Driven Monocular Face Capture and Animation | Paper |
6513 | Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality | Paper |
2290 | FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset | Paper |
4969 | ImFace: A Nonlinear 3D Morphable Face Model With Implicit Neural Representations | Paper |
3883 | Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling | Paper |
736 | RigNeRF: Fully Controllable Neural 3D Portraits | Paper |
5362 | HeadNeRF: A Real-Time NeRF-Based Parametric Head Model | Paper |
7738 | Sparse to Dense Dynamic 3D Facial Expression Generation | Paper |
812 | Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion | Paper |
7201 | Speech Driven Tongue Animation | Paper |
6728 | Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition | Paper |
980 | gDNA: Towards Generative Detailed Neural Avatars | Paper |
1874 | GraFormer: Graph-Oriented Transformer for 3D Pose Estimation | Paper |
10976 | Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation | Paper |
501 | Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis | Paper |
2836 | PINA: Learning a Personalized Implicit Neural Avatar From a Single RGB-D Video Sequence | Paper |
4356 | The Wanderings of Odysseus in 3D Scenes | Paper |
6883 | OSSO: Obtaining Skeletal Shape From Outside | Paper |
11477 | LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds | Paper |
3402 | Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression | Paper |
2046 | Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation | Paper |
6216 | LISA: Learning Implicit Shape and Appearance of Hands | Paper |
3384 | MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image | Paper |
5835 | Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation | Paper |
7098 | Low-Resource Adaptation for Personalized Co-Speech Gesture Generation | Paper |
921 | D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions | Paper |
Paper Id | Paper Title | Link |
---|---|---|
1104 | Synthetic Generation of Face Videos With Plethysmograph Physiology | Paper |
9240 | Contour-Hugging Heatmaps for Landmark Detection | Paper |
4486 | Which Images To Label for Few-Shot Medical Landmark Detection? | Paper |
4473 | Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography | Paper |
8680 | Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization | Paper |
8210 | Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution | Paper |
6627 | Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations | Paper |
3856 | Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation | Paper |
6449 | BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation | Paper |
6189 | Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis | Paper |
9027 | Towards Low-Cost and Efficient Malaria Detection | Paper |
5588 | ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification | Paper |
2696 | Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification | Paper |
9084 | M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer | Paper |
121 | Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis | Paper |
10799 | HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet | Paper |
2649 | DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations | Paper |
10420 | Clean Implicit 3D Structure From Noisy 2D STEM Images | Paper |
7672 | Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks | Paper |
4123 | Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment | Paper |
3819 | Learning Optimal K-Space Acquisition and Reconstruction Using Physics-Informed Neural Networks | Paper |
2466 | NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration | Paper |
4362 | SMPL-A: Modeling Person-Specific Deformable Anatomy | Paper |
1830 | DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis | Paper |
1826 | Affine Medical Image Registration With Coarse-To-Fine Vision Transformer | Paper |
9880 | Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow | Paper |
1002 | Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization | Paper |
6023 | Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation | Paper |
6328 | FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis | Paper |
8360 | Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning | Paper |
5948 | CellTypeGraph: A New Geometric Computer Vision Benchmark | Paper |
8619 | ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics | Paper |
Paper Id | Paper Title | Link |
---|---|---|
378 | FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos | Paper |
675 | Multi-Dimensional, Nuanced and Subjective - Measuring the Perception of Facial Expressions | Paper |
10327 | DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image | Paper |
583 | OakInk: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction | Paper |
9029 | PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking | Paper |
6336 | Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification | Paper |
3564 | JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection | Paper |
1672 | DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion | Paper |
9778 | Egocentric Prediction of Action Target in 3D | Paper |
1950 | HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction | Paper |
10801 | Amodal Panoptic Segmentation | Paper |
8175 | Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark | Paper |
4070 | YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset | Paper |
9179 | The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting | Paper |
10392 | 3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos | Paper |
8328 | AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval | Paper |
4732 | A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection | Paper |
2077 | Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities | Paper |
8123 | Optimal Correction Cost for Object Detection Evaluation | Paper |
7936 | GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains | Paper |
6061 | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding | Paper |
11100 | Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation | Paper |
4346 | ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes | Paper |
4313 | DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation | Paper |
4793 | Open Challenges in Deep Stereo: The Booster Dataset | Paper |
2647 | No-Reference Point Cloud Quality Assessment via Domain Adaptation | Paper |
1637 | Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network | Paper |
2810 | How Good Is Aesthetic Ability of a Fashion Model? | Paper |
656 | Instance-Wise Occlusion and Depth Orders in Natural Scenes | Paper |
7655 | PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects | Paper |
436 | Replacing Labeled Real-Image Datasets With Auto-Generated Contours | Paper |
7315 | V2C: Visual Voice Cloning | Paper |
6786 | M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining | Paper |
11067 | It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection | Paper |
4520 | From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering | Paper |
718 | Point Cloud Pre-Training With Natural 3D Structures | Paper |
1658 | The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift | Paper |
9913 | AutoMine: An Unmanned Mine Dataset | Paper |
11097 | SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis | Paper |
4797 | BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations | Paper |
2027 | Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task | Paper |
8222 | Unifying Panoptic Segmentation for Autonomous Driving | Paper |
10407 | DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection | Paper |
3296 | SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation | Paper |
11670 | Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions | Paper |