- Neural Networks are Decision Trees
- Cross-Validation Bias due to Unsupervised Preprocessing
- The Forward-Forward Algorithm: Some Preliminary Investigations
- LoRA: Low-Rank Adaptation of Large Language Models (included here as it has applications beyond LLMs)
- Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

ViT related:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
- Training data-efficient image transformers & distillation through attention
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- (CLIP) Learning Transferable Visual Models From Natural Language Supervision

Diffusion related:
- Taming Transformers for High-Resolution Image Synthesis (VQGAN)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Training language models to follow instructions with human feedback
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- Toolformer: Language Models Can Teach Themselves to Use Tools

Contributions are very welcome! Please share them back with the wider community (and get credited for it).
Please have a look at the CONTRIBUTING guidelines, and read about our licensing policy.