Code for AI on GKE guide series #1228

ganochenkodg · 2024-04-04T10:58:38Z

Kustomize patches to run various quantized models in vLLM and TGI runtimes.
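
For readers unfamiliar with the layout, a Kustomize overlay of this kind could look roughly like the sketch below. The `../base` directory is a hypothetical placeholder for wherever the unquantized serving manifest lives and is not taken from this PR. Only the `patch.yaml` file name comes from the paths under review.

```yaml
# kustomization.yaml: illustrative sketch only, not a file from this PR.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Hypothetical base directory that holds its own kustomization.yaml and
# the unquantized serving Deployment.
resources:
- ../base

# Patch that layers the quantization settings on top of the base.
patches:
- path: patch.yaml
```

Under those assumptions, `kubectl apply -k <overlay-directory>` would render the base, apply the patch, and submit the result to the cluster.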

brandonroyal · 2024-07-19T15:28:36Z

ai-ml/llm-serving-gemma/vllm/vllm-2b-awq/patch.yaml

+        - --quantization=awq
+        env:
+        - name: MODEL_ID
+          value: dganochenko/gemma-2b-AWQ


Please change this to value: google/gemma-2b-AWQ so it points to the right repository.

It's already done.
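
A sketch of how the patch might read after this change, assuming it is a strategic merge patch against a Deployment. The Deployment name `vllm-gemma-deployment`, the container name `inference-server`, and the `--model=$(MODEL_ID)` argument are hypothetical and do not appear in the diff above. The `--quantization=awq` flag and the `MODEL_ID` variable come from the diff, and the model value is the one requested in the review.

```yaml
# patch.yaml: illustrative sketch only; names marked "hypothetical" are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma-deployment      # hypothetical; must match the base Deployment
spec:
  template:
    spec:
      containers:
      - name: inference-server     # hypothetical; containers are merged by name
        args:
        # A strategic merge patch replaces the whole args list, so every
        # argument the container needs has to be restated here.
        - --model=$(MODEL_ID)      # hypothetical; Kubernetes expands $(MODEL_ID)
        - --quantization=awq
        env:
        # env entries are merged by name, so only MODEL_ID is overridden.
        - name: MODEL_ID
          value: google/gemma-2b-AWQ
```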

brandonroyal · 2024-07-19T15:30:08Z

ai-ml/llm-serving-gemma/vllm/vllm-2b.yaml

Small detail, but can we remove this file change?

File recovered

brandonroyal · 2024-07-19T15:30:44Z

ai-ml/llm-serving-gemma/vllm/vllm-7b-awq/patch.yaml

+        - --quantization=awq
+        env:
+        - name: MODEL_ID
+          value: dganochenko/gemma-7b-AWQ


Please change to value: google/gemma-7b-AWQ to ensure we're pointing at the right repository

It's already done.

brandonroyal · 2024-07-19T16:10:27Z

Looks good. @TarasRudko @ganochenkodg Can we mark this as ready for review?

snippet-bot · 2024-07-19T20:52:02Z

No region tags are edited in this PR.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.

brandonroyal · 2024-07-19T21:26:43Z

ai-ml/llm-serving-gemma/vllm/vllm-2b.yaml

Is deleting this file intentional? It looks like deleting it will break this doc:
https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-vllm

bourgeoisor

Looks good from my end

ganochenkodg added 9 commits March 26, 2024 01:02

add kustomizations and rwo-rom disk

654d3a5

update patches

044edad

fix

c9be370

use another models repo

f29811e

updates

a8e777e

fixed patches

290ce81

update

f4c8d3a

add more patches

2433471

last changes

dddeb2a

ganochenkodg requested review from rbarberop, alizaidis, yoshi-approver and a team as code owners April 4, 2024 10:58

ganochenkodg marked this pull request as draft April 4, 2024 10:58

ganochenkodg and others added 3 commits April 4, 2024 13:12

add headers

32f6147

add headers

3a18f50

Merge branch 'main' into llm-serving-optimization

16467bd

ganochenkodg changed the title ~~Code for AI ob GKE guide series~~ Code for AI on GKE guide series Apr 4, 2024

ganochenkodg and others added 12 commits April 5, 2024 08:40

remove chat models, add parallel examples

fb917c1

add token patches

c20fe1c

quickfix

7ff501b

add kvcache patch

4b87a5e

Merge branch 'GoogleCloudPlatform:main' into llm-serving-optimization

5fbd3e0

Merge branch 'GoogleCloudPlatform:main' into llm-serving-optimization

688b044

Merge branch 'GoogleCloudPlatform:main' into llm-serving-optimization

2ce17ad

add files for Optimizing LLM weights preloading on GKE

39da52c

Modify vllm-awq-7b model

ac5ebb9

Change VLLM GPTQ quantization example

1184775

Update TGI quantization example to use llama base manifest

cb5a559

Clean up - llama and doc#2 samples removing

c54b8f3

brandonroyal suggested changes Jul 19, 2024


Cleanup

a8ed659

brandonroyal approved these changes Jul 19, 2024


remove unused file

a77495d

ganochenkodg marked this pull request as ready for review July 19, 2024 20:51

ganochenkodg and others added 2 commits July 19, 2024 22:52

Merge branch 'main' into llm-serving-optimization

7288fea

Add unintentionally deleted file

144acee

brandonroyal approved these changes Jul 22, 2024


bourgeoisor approved these changes Jul 22, 2024


bourgeoisor merged commit 077a971 into GoogleCloudPlatform:main Jul 22, 2024
4 checks passed
