Accepts a Hugging Face model ID, automatically downloads the model, and quantizes it with Bits And Bytes (BNB). The whole process can run on CPU and RAM alone with acceptable performance (about 10 minutes to quantize a 90 GB model on my machine).
Generates q4_k_m, q5_k_m, and q8_0 quants by default.
Remember to export your Hugging Face token like so:

```bash
export HUGGING_FACE_HUB_TOKEN="YOUR_TOKEN"
```
Example usage:

```bash
python3 quantizeHFmodel.py fireworks-ai/firefunction-v1
```
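For context, here's a minimal sketch of what a BNB 4-bit quantized load looks like with `transformers` and `bitsandbytes`. This is illustrative only, not the script's actual implementation; the model ID and quantization parameters are just examples:

```python
# Illustrative sketch of BNB 4-bit quantization via transformers.
# NOT the exact code in quantizeHFmodel.py; it just shows the BNB
# mechanism the README refers to.
# Requires: pip install transformers bitsandbytes torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "fireworks-ai/firefunction-v1"  # example model from above

# NF4 4-bit config; these are common defaults, not necessarily
# what quantizeHFmodel.py uses.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate decide weight placement
)
```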
I'm also hosting quantizeHQQ here. It does the same thing, except it quantizes with HQQ (https://github.com/mobiusml/hqq), which theoretically yields a higher-quality quant. However, it takes a huge amount of VRAM to do, on the order of >100 GB. I don't have that, but if it's useful to you, more power to you.
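For the curious, a minimal sketch of HQQ quantization through transformers' built-in `HqqConfig` (available in recent transformers versions) is below. Again, this is illustrative, not quantizeHQQ's actual code, and the parameters are example values:

```python
# Illustrative sketch of HQQ quantization via transformers' HqqConfig.
# NOT quantizeHQQ's exact code.
# Requires a recent transformers plus the hqq package:
#   pip install transformers hqq
from transformers import AutoModelForCausalLM, HqqConfig

model_id = "fireworks-ai/firefunction-v1"  # example model

# 4-bit HQQ config; nbits/group_size are example values.
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # large models need a lot of VRAM, as noted above
)
```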