Llama2 Flask AI Server using LLamaCPP

Supported:

CPU
M1 Metal GPU
CUDA GPU

Warning:

Docker runs at least 5 times slower on macOS because of virtualization. Use a native installation instead.
To use the GPU on Linux, the graphics card drivers must be installed on the host system, and if Docker is used, the "Cuda Container Toolkit" must also be installed.
Model files are downloaded automatically from HuggingFace into the "models" directory. If you run under Docker, this directory must be shared with the container.
On macOS, xcode-select --install is required.
Approximate RAM/VRAM requirements: 7b needs about 16GB, 13b about 32GB, 70b about 140GB.
The 70b model should not be run on CPU.
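The memory figures above can be turned into a quick pre-flight check. The following is only a sketch: the names APPROX_MEMORY_GB, fits_in_memory and largest_model_for are hypothetical helpers and are not part of this repository, and the numbers are the approximate values from the note above.

```python
# Approximate RAM/VRAM needs in GB, copied from the warning note above.
APPROX_MEMORY_GB = {"7b": 16, "13b": 32, "70b": 140}


def fits_in_memory(model_size: str, available_gb: float) -> bool:
    """Return True if the available memory meets the approximate requirement."""
    return available_gb >= APPROX_MEMORY_GB[model_size]


def largest_model_for(available_gb: float, gpu: bool):
    """Pick the largest model size that fits, skipping 70b when running on CPU."""
    candidates = [
        size
        for size, need in APPROX_MEMORY_GB.items()
        if need <= available_gb and (gpu or size != "70b")
    ]
    # Rank the candidates by their memory requirement; None if nothing fits.
    return max(candidates, key=APPROX_MEMORY_GB.get) if candidates else None
```

For example, largest_model_for(24, gpu=True) selects "7b", since 13b needs roughly 32GB.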

Dependencies/Model:

ENV:

The predefined models are in the src/models.py file.

MODEL=7b-Q4KM-CHAT
HOST=0.0.0.0
PORT=3000
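As an illustration of how these three variables might be consumed, here is a minimal sketch. How src/server.py actually reads them is not shown in this README, so the load_settings function is hypothetical, and the fallback defaults simply mirror the sample values above.

```python
import os


def load_settings(env=None):
    """Read MODEL, HOST and PORT, falling back to the sample values above."""
    # Accept an explicit mapping so the function can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    return {
        "model": env.get("MODEL", "7b-Q4KM-CHAT"),
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "3000")),  # env values are strings
    }
```

Passing a plain dict, such as load_settings({"PORT": "8080"}), overrides only the port and keeps the other defaults.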

Installation

Mac M1/M2 Metal GPU

git clone <repo>
sh ./install_mac.sh

Linux for CPU

git clone <repo>
sh ./install_linux.sh

Linux for GPU

Install the CUDA driver

git clone <repo>
sh ./install_linux.sh

Docker (Only Linux, very slow on MacOS)

Install the CUDA driver
Install the "Cuda Container Toolkit"

docker compose up --remove-orphans --build

Run

python3 src/server.py

API

HTTP server: 127.0.0.1:3000

Endpoint: http://127.0.0.1:3000/ask

Raw JSON request body:

{
    "textContext": "My name is Cesur Apaydın",
    "pdfContextBase64": "",
    "prompts": [
        "What is person's full name?",
        "What is person's role?",
        "What are their skills?"
    ]
}
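
A client for the request body above could look roughly like the sketch below. Several details are assumptions, because this README does not state them: that the endpoint expects a POST with a JSON Content-Type, that pdfContextBase64 holds the standard Base64 encoding of the raw PDF bytes, and that the response can be read as text. The helper names build_payload, build_request and ask are hypothetical.

```python
import base64
import json
import urllib.request

ASK_URL = "http://127.0.0.1:3000/ask"


def build_payload(text_context, prompts, pdf_bytes=b""):
    """Build the JSON body; an empty PDF yields an empty Base64 string."""
    return {
        "textContext": text_context,
        "pdfContextBase64": base64.b64encode(pdf_bytes).decode("ascii"),
        "prompts": list(prompts),
    }


def build_request(payload, url=ASK_URL):
    """Wrap the payload in a POST request (assumed method) with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(payload, url=ASK_URL):
    """Send the request to a running server and return the raw response text."""
    with urllib.request.urlopen(build_request(payload, url)) as response:
        return response.read().decode("utf-8")
```

With the server running, ask(build_payload("My name is Cesur Apaydın", ["What is person's full name?"])) would send the first prompt from the sample. The response format is not documented here, so the sketch returns it unparsed.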

cesurapp/llama2-pdf

Folders and files

Latest commit

History

Repository files navigation

Llama2 Flask AI Server using LLamaCPP

Installation

Mac M1/M2 Metal GPU

Linux for CPU

Linux for GPU

Docker (Only Linux, very slow on MacOS)

Run

API

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages