krychu/llama (forked from meta-llama/llama)

Inference code for LLaMA models on CPU and Mac M1/M2 GPU
Llama 2 on CPU, and Mac M1/M2 GPU

This is a fork of https://github.com/facebookresearch/llama that runs on CPU and Mac M1/M2 GPU (mps) if available.
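The device-selection logic this implies can be sketched as below. This is a minimal illustration, not the fork's actual code; `pick_device` is a hypothetical helper, and in a real PyTorch program the flag would come from `torch.backends.mps.is_available()`.

```python
def pick_device(mps_available: bool) -> str:
    # Mirror the behavior described above: run on the Mac M1/M2 GPU
    # via the "mps" backend when it is available, otherwise fall
    # back to "cpu". (Assumption: in the real code this flag comes
    # from torch.backends.mps.is_available().)
    return "mps" if mps_available else "cpu"

print(pick_device(True))   # mps
print(pick_device(False))  # cpu
```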

Installation and usage are identical to the upstream repository, so please refer to the official instructions there.


MacBook Pro M1 with 7B model:

  • MPS (default): ~4.3 words per second
  • CPU: ~0.67 words per second

During text generation, an extra message also reports how many tokens have been generated and the speed (tokens per second) at which they are being produced.
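A report like that can be produced by timing the decoding loop, as in the sketch below. This is an illustration of the idea, not the fork's actual implementation; `report_generation_speed` and `generate_step` are hypothetical names.

```python
import time

def report_generation_speed(generate_step, num_tokens: int) -> float:
    # Time a token-generation loop and report tokens per second.
    # generate_step stands in for one decoding step of the model.
    start = time.perf_counter()
    for _ in range(num_tokens):
        generate_step()
    elapsed = time.perf_counter() - start
    tps = num_tokens / elapsed
    print(f"generated {num_tokens} tokens at {tps:.1f} tokens/sec")
    return tps

# Usage with a dummy decoding step that just sleeps briefly:
report_generation_speed(lambda: time.sleep(0.001), 20)
```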
