Deep Learning Paper Summaries

Introduction

This repo houses summaries of various exciting works in the field of Deep Learning. You can contribute summaries of your own. Check out our contributing guide to get started. Happy Reading & Summarizing!

Summaries

2024

GARField: Group Anything with Radiance Fields [Paper][Review]
- Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa, CVPR 2024
Image Hijacks: Adversarial Images can Control Generative Models at Runtime [Paper][Review]
- Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons, ICML 2024
AI CONTROL: IMPROVING SAFETY DESPITE INTENTIONAL SUBVERSION [Paper][Review]
- Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger, ICML 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation [Paper][Review]
- Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan, ECCV 2024
THINK BEFORE YOU SPEAK: Training Language Models With Pause Tokens [Paper][Review]
- Sachin Goyal, Ziwei Ji, Ankit Rawat, Aditya Menon, Sanjiv Kumar, Vaishnavh Nagarajan, ICLR 2024
WARM: On the Benefits of Weight Averaged Reward Models [Paper][Review]
- Alexandre Ramé, Nino Vieillard, Léonard Hussenot, Robert Dadashi, Geoffrey Cideron, Olivier Bachem, Johan Ferret, ICML 2024
Matryoshka Diffusion Models [Paper][Review]
- Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind & Navdeep Jaitly, ICLR 2024
INSTRUCTSCENE: INSTRUCTION-DRIVEN 3D INDOOR SCENE SYNTHESIS WITH SEMANTIC GRAPH PRIOR [Paper][Review]
- Chenguo Lin & Yadong Mu, ICLR 2024

2023

Ablating Concepts in Text-to-Image Diffusion Models [Paper][Review]
- Nupur Kumari, Bingliang Zhang, Sheng-Yu Wang, Eli Shechtman, Richard Zhang, Jun-Yan Zhu, ICCV 2023
DIRE for Diffusion-Generated Image Detection [Paper][Review]
- Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, Houqiang Li, ICCV 2023
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [Paper][Review]
- Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman, CVPR 2023
Multi-Concept Customization of Text-to-Image Diffusion [Paper][Review]
- Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman & Jun-Yan Zhu, CVPR 2023
Segment Anything [Paper][Review]
- Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick, ICCV 2023
Siamese Masked Autoencoders [Paper][Review]
- Agrim Gupta, Jiajun Wu, Jia Deng, Li Fei-Fei, NIPS 2023
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion [Paper][Review]
- Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or, ICLR 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation [Paper][Review]
- Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou, ICCV 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models [Paper][Review]
- Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter & Matt Fredrikson
What do Neural Networks Learn in Image Classification? A Frequency Shortcut Perspective [Paper][Review]
- Shunxin Wang, Raymond Veldhuis, Christoph Brune, Nicola Strisciuglio, ICCV 2023

2022

GAN-based image steganography for enhancing security via adversarial attack and pixel-wise deep fusion [Paper][Review]
- Chao Yuan, Hongxia Wang, Peisong He, Jie Luo, Bin Li, Springer Multimedia tools and applications 2022
Human-level play in the game of Diplomacy by combining language models with strategic reasoning [Paper][Review]
- Meta Fundamental AI Research Diplomacy Team (FAIR), Anton Bakhtin, Noam Brown, Emily Dinan, Science Journal 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [Paper][Review]
- Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi, NIPS 2022
Learning Video Representations from Large Language Models [Paper][Review]
- Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar, Facebook AI Research (Meta AI), University of Texas at Austin

2021

CLIP (Contrastive Language–Image Pre-training) [Paper][Review]
- Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever, ICML 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Paper][Review]
- Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, ICLR 2021
w2v-BERT: Combining Contrastive Learning and Masked Language Modelling for Self-Supervised Speech Pre-Training [Paper][Review]
- Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu
Rainbow Memory: Continual Learning with a Memory of Diverse Samples [Paper][Review]
- Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, Jonghyun Choi, CVPR 2021
Center-based 3D Object Detection and Tracking [Paper][Review]
- Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl (UT Austin), CVPR 2021
GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds [Paper][Review]
- Zekun Hao, Arun Mallya, Serge Belongie, Ming-Yu Liu, ICCV 2021
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [Paper][Review]
- Michael Niemeyer, Andreas Geiger, CVPR 2021
Creative Sketch Generation [Paper][Review]
- Songwei Ge, Devi Parikh, Vedanuj Goswami & C. Lawrence Zitnick, ICLR 2021
Binary TTC: A Temporal Geofence for Autonomous Navigation [Paper][Review]
- Abhishek Badki, Orazio Gallo, Jan Kautz, Pradeep Sen, CVPR 2021
On The Frequency Bias of Generative Models [Paper][Review]
- Katja Schwarz, Yiyi Liao, Andreas Geiger, NeurIPS 2021

2020

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension [Paper][Review]
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer, ACL 2020
Machine-Unlearning [Paper][Review]
- Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot, IEEE 2020
Big Bird: Transformers for Longer Sequences [Paper][Review]
- Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed, NeurIPS 2020
Feature Fusion Attention Network for Single Image Dehazing [Paper][Review]
- Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, Huizhu Jia, AAAI 2020
Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild [Paper][Review]
- Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi, CVPR 2020
You Only Train Once: Loss-conditional training of deep networks [Paper][Review]
- Alexey Dosovitskiy, Josip Djolonga, ICLR 2020
GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce [Paper][Review]
- Sean Bell, Yiqun Liu, Sami Alsheikh, Yina Tang, Ed Pizzi, M. Henning, Karun Singh, Omkar Parkhi, Fedor Borisyuk, KDD 2020
Semantically multi-modal image synthesis [Paper][Review]
- Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai, CVPR 2020
Learning to Simulate Dynamic Environments with GameGAN [Paper][Review]
- Seung Wook Kim, Yuhao Zhou, Jonah Philion, Antonio Torralba, Sanja Fidler, CVPR 2020
Adversarial Policies: Attacking deep reinforcement learning [Paper][Review]
- Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell, ICLR 2020
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning [Paper][Review]
- Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko, NeurIPS 2020

2019

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks [Paper][Review]
- Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee, NIPS 2019
Stand-Alone Self-Attention in Vision Models [Paper][Review]
- Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens, NIPS 2019
Zero-Shot Entity Linking by Reading Entity Descriptions [Paper][Review]
- Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee, ACL-2019
Do you know that Florence is packed with visitors? Evaluating state-of-the-art models of speaker commitment [Paper][Review]
- Nanjiang Jiang and Marie-Catherine de Marneffe, ACL-2019
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations [Paper][Review]
- Vincent Sitzmann, Michael Zollhofer, Gordon Wetzstein, NIPS-2019
Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts [Paper][Review]
- Rui Xia, Zixiang Ding, ACL-2019
Putting an End to End-to-End: Gradient-Isolated Learning of Representations [Paper][Review]
- Sindy Löwe, Peter O'Connor, Bastiaan S. Veeling, NIPS-2019
Bridging the Gap between Training and Inference for Neural Machine Translation [Paper][Review]
- Wen Zhang, Yang Feng, Fandong Meng, Di You, Qun Liu, ACL-2019
Designing and Interpreting Probes with Control Tasks [Paper][Review]
- John Hewitt, Percy Liang, EMNLP-2019
Specializing Word Embeddings (for Parsing) by Information Bottleneck [Paper][Review]
- Xiang Lisa Li, Jason Eisner, EMNLP-2019
vGraph: A Generative Model for Joint Community Detection and Node Representational Learning [Paper][Review]
- Fan-Yun Sun, Meng Qu, Jordan Hoffmann, Chin-Wei Huang, Jian Tang, NIPS-2019
Uniform convergence may be unable to explain generalization in deep learning [Paper][Review]
- Vaishnavh Nagarajan, J. Zico Kolter, NIPS-2019
SinGAN: Learning a Generative Model from a Single Natural Image [Paper][Review]
- Tamar Rott Shaham, Tali Dekel, Tomer Michaeli, ICCV-2019
Graph U-Nets [Paper][Review]
- Hongyang Gao, Shuiwang Ji, ICML-2019
Feature Denoising for Improving Adversarial Robustness [Paper][Review]
- Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He, CVPR-2019
This Looks Like That: Deep Learning for Interpretable Image Recognition [Paper][Review]
- Chaofan Chen, Oscar Li, Chaofan Tao, Alina Jade Barnett, Jonathan Su, Cynthia Rudin, NIPS-2019

2018

Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing [Paper][Review]
- Deniz Engin, Anil Genc, Hazim Kemal Ekenel, CVPR 2018
A Style-Based Generator Architecture for Generative Adversarial Networks [Paper][Review]
- Tero Karras, Samuli Laine, Timo Aila, IEEE CVPR 2018
CyCADA: Cycle-Consistent Adversarial Domain Adaptation [Paper][Review]
- Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei A. Efros, Trevor Darrell, ICML-2018

2017

Attention Is All You Need [Paper][Review_1][Review_2]
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, NeurIPS 2017
Unpaired Image-to-Image Translation using Cycle Consistent Adversarial Networks [Paper][Review]
- Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros, ICCV-2017
Densely Connected Convolutional Networks [Paper][Review]
- Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger, CVPR-2017
On Calibration of Modern Neural Networks [Paper][Review]
- Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger, ICML-2017

2016

Siamese Recurrent Architectures for Learning Sentence Similarity [Paper][Review]
- Jonas Mueller, Aditya Thyagarajan, AAAI-2016

Contributing

We appreciate all contributions to the set of summaries. Please refer to CONTRIBUTING.md for the contribution guidelines.

Acknowledgements

papers_we_read is an open source repository that welcomes any contribution and feedback. We hope this collection of summaries helps the DL community get started with reading and understanding research papers, which is a valuable skill in research. Most of our contributors are students enrolled in undergraduate programmes. We are grateful for all the contributions that help improve this collection of summaries.

License

This repo is open-sourced under the MIT License.
