Releases: 3Simplex/llama.cpp

b3942

18 Oct 15:24
afd9909
rpc : backend refactoring (#9912)

* rpc : refactor backend

Use structs for RPC request/response messages

* rpc : refactor server
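
As a rough illustration of what "use structs for RPC request/response messages" can look like, the sketch below packs a fixed-layout struct into a byte buffer and validates the payload size on decode. The struct names, fields, and helper functions are hypothetical and chosen for illustration only; they are not taken from the actual backend code in this release.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical message types for illustration; not the real RPC definitions.
#pragma pack(push, 1)
struct alloc_buffer_req {
    uint64_t size;        // requested buffer size in bytes
};

struct alloc_buffer_rsp {
    uint64_t remote_ptr;  // opaque handle on the server side
    uint64_t remote_size; // size actually allocated
};
#pragma pack(pop)

// Copy a trivially copyable message into a byte buffer for sending.
template <typename T>
std::vector<uint8_t> serialize(const T & msg) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "message must be trivially copyable");
    std::vector<uint8_t> out(sizeof(T));
    std::memcpy(out.data(), &msg, sizeof(T));
    return out;
}

// Decode a message; returns false if the payload size does not match the struct.
template <typename T>
bool deserialize(const std::vector<uint8_t> & in, T & msg) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "message must be trivially copyable");
    if (in.size() != sizeof(T)) {
        return false;
    }
    std::memcpy(&msg, in.data(), sizeof(T));
    return true;
}
```

One message type per command keeps the wire layout in a single place, and the size check rejects truncated or mismatched payloads before any field is read.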

b3895

07 Oct 18:13
f1af42f
Update building for Android (#9672)

* docs : clarify building Android on Termux

* docs : update building Android on Termux

* docs : add cross-compiling for Android

* cmake : link dl explicitly for Android

b3855

01 Oct 14:03
a90484c
llama : print correct model type for Llama 3.2 1B and 3B

b3711

09 Sep 13:47
8e6e2fb
CUDA: fix variable name conflict for Windows build (#9382)

b3660

03 Sep 17:50
b69a480
readme : refactor API section + remove old hot topics

b3640

28 Aug 16:33
66b039a
docker : update CUDA images (#9213)

b3613

21 Aug 17:13
fc54ef0
server : support reading arguments from environment variables (#9105)

* server : support reading arguments from environment variables

* add -fa and -dt

* readme : specify non-arg env var
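
A minimal sketch of the general pattern of reading an argument from the environment: an explicit command-line value is used if present, otherwise the environment variable, otherwise a default. The function name, the variable name `EXAMPLE_ARG_CTX_SIZE`, and the precedence order are assumptions for illustration and are not taken from the server's actual implementation.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: resolve one option value.
// Precedence (assumed): command line, then environment variable, then default.
std::string resolve_option(const std::string & cli_value,
                           const char * env_name,
                           const std::string & fallback) {
    if (!cli_value.empty()) {
        return cli_value;              // explicit argument wins
    }
    const char * env = std::getenv(env_name);
    if (env != nullptr && env[0] != '\0') {
        return env;                    // fall back to the environment
    }
    return fallback;                   // neither was provided
}
```

For example, `resolve_option("", "EXAMPLE_ARG_CTX_SIZE", "512")` would return the value of `EXAMPLE_ARG_CTX_SIZE` when it is set and non-empty, and `"512"` otherwise.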

b3576

12 Aug 14:24
84eb2f4
docs: introduce gpustack and gguf-parser (#8873)

* readme: introduce gpustack

GPUStack is an open-source GPU cluster manager for running large
language models, which uses llama.cpp as the backend.

Signed-off-by: thxCode

* readme: introduce gguf-parser

GGUF Parser is a tool to review/check the GGUF file and estimate the
memory usage without downloading the whole model.



b3569

11 Aug 16:41
8cd1bcf
flake.lock: Update (#8979)

b3549

08 Aug 13:57
afd27f0
scripts : sync cann files (#0)