This repo lists some interesting LLM-related papers, with keywords summarizing each paper's main ideas and, for some entries, a small illustrative code sketch.
- Fine-Tuning LLaMA for Multi-Stage Text Retrieval
- Keywords: Dense Retriever (RepLLaMA), Pointwise Reranker (RankLLaMA), Contrastive loss, MS MARCO, BEIR
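
  A minimal sketch of the contrastive (InfoNCE-style) loss used to train dense retrievers such as RepLLaMA, assuming one positive passage and a set of hard negatives per query; the variable names and batch layout are illustrative, not the paper's code:

  ```python
  import torch
  import torch.nn.functional as F

  def contrastive_loss(q_emb, pos_emb, neg_emb, temperature=0.05):
      """InfoNCE-style loss: pull each query toward its positive passage and
      away from its hard negatives."""
      # q_emb: (B, D), pos_emb: (B, D), neg_emb: (B, N, D), all L2-normalized
      pos_scores = (q_emb * pos_emb).sum(-1, keepdim=True)      # (B, 1)
      neg_scores = torch.einsum("bd,bnd->bn", q_emb, neg_emb)   # (B, N)
      logits = torch.cat([pos_scores, neg_scores], dim=1) / temperature
      labels = torch.zeros(q_emb.size(0), dtype=torch.long)     # positive sits at index 0
      return F.cross_entropy(logits, labels)

  # toy usage
  B, N, D = 4, 7, 16
  q = F.normalize(torch.randn(B, D), dim=-1)
  pos = F.normalize(torch.randn(B, D), dim=-1)
  neg = F.normalize(torch.randn(B, N, D), dim=-1)
  print(contrastive_loss(q, pos, neg))
  ```
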
- Improving Text Embeddings with Large Language Models
- Keywords: Synthetic Data Generation, Mistral-7B, Contrastive loss, Multilingual Retrieval, BEIR, MTEB
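
  A minimal sketch of last-token pooling, the trick used here to get a single embedding out of a decoder-only model like Mistral-7B; it assumes right-padded batches, and the function name is illustrative:

  ```python
  import torch
  import torch.nn.functional as F

  def last_token_pool(hidden_states, attention_mask):
      """Take each sequence's final non-padding hidden state as its embedding."""
      # hidden_states: (B, T, D), attention_mask: (B, T) with 1 for real tokens
      last_idx = attention_mask.sum(dim=1) - 1              # index of last real token
      batch_idx = torch.arange(hidden_states.size(0))
      emb = hidden_states[batch_idx, last_idx]              # (B, D)
      return F.normalize(emb, dim=-1)

  # toy usage: two right-padded sequences
  h = torch.randn(2, 5, 8)
  mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
  print(last_token_pool(h, mask).shape)  # torch.Size([2, 8])
  ```
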
- QLoRA: Efficient Finetuning of Quantized LLMs
- Keywords: Low Rank Adapters (LoRA), 4-bit NormalFloat (NF4), Double Quantization, Paged Optimizers
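
  A from-scratch sketch of a LoRA adapter wrapped around a frozen linear layer; in QLoRA the frozen base weights would additionally be stored in 4-bit NormalFloat (NF4) with double quantization, which is omitted here for simplicity:

  ```python
  import torch
  import torch.nn as nn

  class LoRALinear(nn.Module):
      """Frozen base layer plus a trainable low-rank update:
      y = W x + (alpha / r) * B(A x)."""
      def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
          super().__init__()
          self.base = base
          for p in self.base.parameters():
              p.requires_grad_(False)                  # freeze pretrained weights
          self.lora_A = nn.Linear(base.in_features, r, bias=False)
          self.lora_B = nn.Linear(r, base.out_features, bias=False)
          nn.init.zeros_(self.lora_B.weight)           # adapter starts as a no-op
          self.scaling = alpha / r

      def forward(self, x):
          return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

  # toy usage
  layer = LoRALinear(nn.Linear(64, 64), r=8)
  print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
  ```
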
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Keywords: Self-attention, IO-aware, GPU high bandwidth memory (HBM), GPU SRAM, Block-Sparse Attention
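
  A plain-PyTorch sketch of the tiling and online-softmax idea behind FlashAttention for a single head: keys/values are processed block by block with running max/normalizer statistics, so the full attention matrix is never materialized (the real kernel also handles masking and schedules the HBM-to-SRAM transfers explicitly):

  ```python
  import torch

  def tiled_attention(q, k, v, block_size=64):
      """Exact attention computed one key/value block at a time."""
      Tq, d = q.shape
      scale = d ** -0.5
      out = torch.zeros_like(q)
      running_max = torch.full((Tq, 1), float("-inf"))
      running_sum = torch.zeros(Tq, 1)
      for start in range(0, k.shape[0], block_size):
          kb, vb = k[start:start + block_size], v[start:start + block_size]
          scores = (q @ kb.T) * scale                            # (Tq, block)
          block_max = scores.max(dim=-1, keepdim=True).values
          new_max = torch.maximum(running_max, block_max)
          correction = torch.exp(running_max - new_max)          # rescale old stats
          p = torch.exp(scores - new_max)
          running_sum = running_sum * correction + p.sum(dim=-1, keepdim=True)
          out = out * correction + p @ vb
          running_max = new_max
      return out / running_sum

  # toy check against ordinary attention
  q, k, v = torch.randn(128, 32), torch.randn(256, 32), torch.randn(256, 32)
  ref = torch.softmax((q @ k.T) * k.shape[-1] ** -0.5, dim=-1) @ v
  print(torch.allclose(tiled_attention(q, k, v), ref, atol=1e-5))  # True
  ```
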
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Keywords: KV cache, PagedAttention, Classical Virtual Memory, Paging techniques, vLLM
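
  A toy sketch of the bookkeeping idea behind PagedAttention: the KV cache is split into fixed-size blocks and each sequence keeps a block table mapping logical positions to physical blocks, much like virtual-memory paging; class and method names are illustrative, not vLLM's API:

  ```python
  class PagedKVCache:
      """Toy block-table bookkeeping. Real vLLM stores key/value tensors per
      block; here we only track token ids to show the indirection."""
      def __init__(self, num_blocks: int, block_size: int):
          self.block_size = block_size
          self.free_blocks = list(range(num_blocks))  # physical block ids
          self.storage = {}                           # physical block id -> tokens
          self.block_tables = {}                      # sequence id -> physical block ids

      def append(self, seq_id: int, token: int):
          table = self.block_tables.setdefault(seq_id, [])
          if not table or len(self.storage[table[-1]]) == self.block_size:
              block = self.free_blocks.pop()          # allocate a block on demand
              self.storage[block] = []
              table.append(block)
          self.storage[table[-1]].append(token)

      def tokens(self, seq_id: int):
          return [t for b in self.block_tables.get(seq_id, []) for t in self.storage[b]]

  # toy usage: 10 tokens land in three 4-slot physical blocks
  cache = PagedKVCache(num_blocks=8, block_size=4)
  for t in range(10):
      cache.append(seq_id=0, token=t)
  print(cache.block_tables[0], cache.tokens(0))
  ```
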
- Mixtral of Experts
- Keywords: Mixtral 8x7B, Sparse Mixture of Experts (SMoE)
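
  A sketch of a sparse mixture-of-experts feed-forward layer with top-2 routing, roughly the structure Mixtral 8x7B uses (8 experts, 2 active per token); the dimensions and names are illustrative:

  ```python
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class SparseMoE(nn.Module):
      """Router picks the top-k experts per token; outputs are combined with
      the softmax of the selected router logits."""
      def __init__(self, dim=32, hidden=64, num_experts=8, top_k=2):
          super().__init__()
          self.top_k = top_k
          self.router = nn.Linear(dim, num_experts, bias=False)
          self.experts = nn.ModuleList([
              nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
              for _ in range(num_experts)
          ])

      def forward(self, x):                           # x: (tokens, dim)
          logits = self.router(x)                     # (tokens, num_experts)
          weights, experts = logits.topk(self.top_k, dim=-1)
          weights = F.softmax(weights, dim=-1)        # renormalize over chosen experts
          out = torch.zeros_like(x)
          for slot in range(self.top_k):
              for e in range(len(self.experts)):
                  mask = experts[:, slot] == e        # tokens routed to expert e
                  if mask.any():
                      out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
          return out

  # toy usage
  moe = SparseMoE()
  print(moe(torch.randn(5, 32)).shape)  # torch.Size([5, 32])
  ```
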
- Mistral 7B
- Keywords: Grouped-Query Attention (GQA), Sliding Window Attention (SWA), Rolling Buffer Cache, Pre-fill and Chunking
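
  A toy sketch of the rolling buffer cache that pairs with sliding-window attention: position i writes to slot i % window, so cache memory stays constant however long the sequence gets; the names are illustrative:

  ```python
  class RollingKVCache:
      """Keep only the last `window` cache entries by overwriting slots modulo
      the window size."""
      def __init__(self, window: int):
          self.window = window
          self.buffer = [None] * window   # would hold (key, value) tensors in practice

      def insert(self, pos: int, kv):
          self.buffer[pos % self.window] = kv

      def visible(self, pos: int):
          """Entries a token at `pos` may attend to (at most `window` most recent)."""
          start = max(0, pos - self.window + 1)
          return [self.buffer[p % self.window] for p in range(start, pos + 1)]

  # toy usage
  cache = RollingKVCache(window=4)
  for pos in range(10):
      cache.insert(pos, kv=f"kv{pos}")
  print(cache.visible(9))  # ['kv6', 'kv7', 'kv8', 'kv9']
  ```
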
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Keywords: Grouped-Query Attention (GQA), Context Length 4k, 2.0T Tokens
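
  A minimal sketch of grouped-query attention, where several query heads share one key/value head and the KV cache shrinks accordingly (causal masking omitted for brevity; the names and shapes are illustrative):

  ```python
  import torch

  def grouped_query_attention(q, k, v, num_kv_heads):
      """Repeat each KV head across its group of query heads, then attend.
      MHA is num_kv_heads == num_query_heads; MQA is num_kv_heads == 1."""
      # q: (B, Hq, T, d), k/v: (B, Hkv, T, d)
      group = q.shape[1] // num_kv_heads
      k = k.repeat_interleave(group, dim=1)           # (B, Hq, T, d)
      v = v.repeat_interleave(group, dim=1)
      scores = q @ k.transpose(-2, -1) * q.shape[-1] ** -0.5
      return torch.softmax(scores, dim=-1) @ v

  # toy usage: 8 query heads sharing 2 KV heads
  B, Hq, Hkv, T, d = 1, 8, 2, 16, 32
  q, k, v = torch.randn(B, Hq, T, d), torch.randn(B, Hkv, T, d), torch.randn(B, Hkv, T, d)
  print(grouped_query_attention(q, k, v, Hkv).shape)  # torch.Size([1, 8, 16, 32])
  ```
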
- LLaMA: Open and Efficient Foundation Language Models
- Keywords: Pre-normalization, SwiGLU activation function, Rotary Positional Embeddings (RoPE), Context Length 2k, 1.0T Tokens
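
  A minimal sketch of rotary positional embeddings (RoPE), using one common convention that pairs adjacent channels; LLaMA-style implementations pair channels differently, but the rotation idea is the same:

  ```python
  import torch

  def rotary_embedding(x, base=10000.0):
      """Rotate each channel pair by an angle proportional to token position,
      so query-key dot products depend on relative distance."""
      # x: (T, d) with even d
      T, d = x.shape
      pos = torch.arange(T, dtype=torch.float32)[:, None]                    # (T, 1)
      inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)   # (d/2,)
      angles = pos * inv_freq                                                # (T, d/2)
      cos, sin = angles.cos(), angles.sin()
      x1, x2 = x[:, 0::2], x[:, 1::2]                                        # channel pairs
      rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
      return rotated.flatten(start_dim=1)                                    # back to (T, d)

  # toy usage
  print(rotary_embedding(torch.randn(6, 8)).shape)  # torch.Size([6, 8])
  ```
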
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
- Keywords: RL from AI Feedback (RLAIF), Generating AI labels, Self Improvement
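
  A sketch of the AI-feedback labeling step: a judge model is asked to compare two candidate responses, and the parsed preference becomes the label used for reward-model training; `query_llm` is a hypothetical stand-in for whatever judge-model call you use:

  ```python
  def build_judge_prompt(prompt: str, response_a: str, response_b: str) -> str:
      """Format a pairwise comparison prompt for the AI labeler."""
      return (
          "Given the prompt and two candidate responses, reply with the single "
          "letter of the better response.\n\n"
          f"Prompt: {prompt}\n\nResponse A: {response_a}\n\nResponse B: {response_b}\n\n"
          "Better response (A or B):"
      )

  def ai_preference_label(prompt, response_a, response_b, query_llm):
      """Build one preference example labeled by the judge model instead of a human."""
      verdict = query_llm(build_judge_prompt(prompt, response_a, response_b)).strip().upper()
      chosen, rejected = (response_a, response_b) if verdict.startswith("A") else (response_b, response_a)
      return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

  # toy usage with a stand-in judge that always answers "A"
  fake_judge = lambda judge_prompt: "A"
  print(ai_preference_label("Explain RLAIF.", "RLAIF replaces human raters with an LLM.", "idk", fake_judge))
  ```
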
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
- Keywords: LLM Hallucination, RAG, Knowledge Retrieval, CoNLI, CoVe
- A Survey of Large Language Models
- Keywords: PLMs, LLMs, Pre-training, Adaptation tuning, Capacity evaluation
- RAGAS, a framework for evaluating RAG pipelines: https://docs.ragas.io/en/stable/
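
  A minimal usage sketch, loosely following an earlier version of the RAGAS quickstart; the metric and dataset-column names may have changed in the current release and a configured LLM API key is assumed, so check the linked docs:

  ```python
  from datasets import Dataset
  from ragas import evaluate
  from ragas.metrics import faithfulness, answer_relevancy, context_precision

  # one evaluation row: the question, the retrieved contexts, the generated
  # answer, and a reference answer (needed by some metrics)
  rows = {
      "question": ["What does PagedAttention partition?"],
      "contexts": [["PagedAttention partitions the KV cache into fixed-size blocks."]],
      "answer": ["It partitions the KV cache into fixed-size blocks."],
      "ground_truth": ["The KV cache."],
  }

  result = evaluate(
      Dataset.from_dict(rows),
      metrics=[faithfulness, answer_relevancy, context_precision],
  )
  print(result)
  ```
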