Releases · 3Simplex/llama.cpp
b4295
b4288
llama : use cmake for swift build (#10525)
* llama : use cmake for swift build
* swift : <> -> ""
* ci : remove make
* ci : disable ios build
* Revert "swift : <> -> """ (reverts commit d39ffd9556482b77d4ea5b118b453fc1c097a31d)
* ci : try fix ios build
* ci : cont
* ci : cont

Co-authored-by: Georgi Gerganov <[email protected]>
b4271
sync : ggml
b4265
ggml : add predefined list of CPU backend variants to build (#10626)
* ggml : add predefined list of CPU backend variants to build
* update CPU dockerfiles
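When backends are compiled as dynamic libraries (GGML_BACKEND_DL), these predefined CPU variants let the loader pick the best match for the host CPU at runtime. Below is a minimal sketch of loading and listing the available devices, assuming the public ggml-backend loading API; it is not part of this release's diff:

```cpp
// Minimal sketch (assumes a GGML_BACKEND_DL build and the public
// ggml-backend loading API; not part of this release's diff).
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Load all dynamic backends found next to the executable; with the
    // predefined CPU variants built, the best one for the host CPU
    // (e.g. sse42/avx2/avx512) is selected.
    ggml_backend_load_all();

    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }
    return 0;
}
```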
b4254
server : (web ui) Various improvements, now use vite as bundler (#10599)
* hide buttons in dropdown menu
* use npm as deps manager and vite as bundler
* fix build
* fix build (2)
* fix responsive on mobile
* fix more problems on mobile
* sync build
* (test) add CI step for verifying build
* fix ci
* force rebuild .hpp files
* cmake: clean up generated files pre build
b4248
llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636)
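LLAMA_API marks the function for export from the shared library; without it, llama_chat_builtin_templates could not be resolved when linking against libllama. A hedged usage sketch, assuming the llama.h declaration int32_t llama_chat_builtin_templates(const char ** output, size_t len), which fills output with template names and returns the total count:

```cpp
// Sketch of calling the now-exported function from C++; the two-call
// pattern (query count, then fill) is an assumption based on the
// declaration in llama.h.
#include <cstdio>
#include <vector>
#include "llama.h"

int main() {
    // First call with no buffer reports how many built-in templates exist.
    int32_t n = llama_chat_builtin_templates(nullptr, 0);

    std::vector<const char *> names(n);
    llama_chat_builtin_templates(names.data(), names.size());

    for (const char * name : names) {
        printf("%s\n", name);
    }
    return 0;
}
```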
b4164
server : add speculative decoding support (#10455)
* server : add speculative decoding support
* server : add helper function slot.can_speculate()
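Speculative decoding pairs the main model with a smaller draft model: the draft proposes tokens which the main model then verifies in a single batch. A hedged sketch of the kind of guard the new slot.can_speculate() helper performs; the field names below (ctx_draft, n_draft_max) are hypothetical, not the server's actual members:

```cpp
// Illustrative sketch of a can_speculate() guard; members are hypothetical.
#include <cstdio>

struct server_slot {
    void * ctx_draft   = nullptr; // context of the draft model, if one was loaded
    int    n_draft_max = 16;      // max draft tokens proposed per step

    bool can_speculate() const {
        // speculation requires a loaded draft model and room for draft tokens
        return ctx_draft != nullptr && n_draft_max > 0;
    }
};

int main() {
    server_slot slot;
    printf("can speculate: %s\n", slot.can_speculate() ? "yes" : "no");
    return 0;
}
```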
b4153
ci: Update oneAPI runtime dll packaging (#10428)
These are the minimum runtime DLL dependencies for oneAPI 2025.0.
b4145
vulkan: predicate max operation in soft_max shaders (#10437)
Fixes #10434
b4132
cuda : fix CUDA_FLAGS not being applied (#10403)