This repository provides the implementation for our paper "ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification". ZipCache is an adaptive mixed-precision KV cache quantization method for LLMs: it identifies salient tokens and keeps them at higher precision while quantizing the rest more aggressively.
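To convey the core idea, here is a minimal sketch of mixed-precision KV cache quantization. This is NOT the paper's actual implementation: the function names, the 4-bit default, the per-token quantization granularity, and the externally supplied `salient_idx` set are illustrative assumptions only (ZipCache itself identifies salient tokens via attention-based saliency metrics).

```python
# Illustrative sketch only, not ZipCache's real code: salient tokens are
# stored in full precision, all other tokens are quantized to low-bit
# integers with a uniform asymmetric quantizer.

def quantize(vec, bits=4):
    """Uniform asymmetric quantization of one token's KV vector."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / (2 ** bits - 1)
    if scale == 0.0:  # constant vector: avoid division by zero
        scale = 1.0
    q = [round((v - lo) / scale) for v in vec]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Reconstruct an approximate float vector from quantized values."""
    return [x * scale + lo for x in q]

def compress_kv(cache, salient_idx, bits=4):
    """Keep salient tokens in full precision; quantize the rest."""
    packed = []
    for i, vec in enumerate(cache):
        if i in salient_idx:
            packed.append(("fp", vec))  # salient token: stored as-is
        else:
            packed.append(("int", quantize(vec, bits)))
    return packed

def decompress_kv(packed):
    """Undo compress_kv, returning float vectors for every token."""
    return [payload if tag == "fp" else dequantize(*payload)
            for tag, payload in packed]
```

Salient tokens are reconstructed exactly, while quantized tokens incur at most half a quantization step of per-element error.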
Follow the steps below to set up ZipCache.
Create a virtual environment and install the dependencies listed in `requirements.txt`. Then install `flash-attn` and `zipcache`:

```shell
pip install packaging ninja
pip install flash-attn --no-build-isolation
pip install -e .
```
Download a pretrained LLaMA model from Hugging Face and set `MODEL_PATH` in `zipcache_generation_demo.py` to the local model directory, then run the demo:

```shell
python3 zipcache_generation_demo.py
```
If you find this work useful for your research, please consider citing:
```bibtex
@article{he2024zipcache,
  title={ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification},
  author={He, Yefei and Zhang, Luoming and Wu, Weijia and Liu, Jing and Zhou, Hong and Zhuang, Bohan},
  journal={arXiv preprint arXiv:2405.14256},
  year={2024}
}
```