Code for AI on GKE guide series #1228

Merged on Jul 22, 2024 (28 commits)

ganochenkodg
Contributor

  • Kustomize patches to run various quantized models in vLLM and TGI runtimes.
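
For readers unfamiliar with the pattern, here is a minimal sketch of what one such Kustomize patch might look like for the vLLM runtime. The `--quantization=awq` argument and the `MODEL_ID` variable come from the diff excerpts in this review, and the model value is the one the reviewer asks for. The Deployment name, the container name, and the `--model=$(MODEL_ID)` argument are assumptions made for illustration and are not taken from this PR. The file would be referenced from a `patches:` entry in a `kustomization.yaml`.

```yaml
# awq-patch.yaml: strategic merge patch (illustrative sketch, not the PR's actual file)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma-deployment    # hypothetical; must match the Deployment in the base
spec:
  template:
    spec:
      containers:
      - name: inference-server   # hypothetical; must match the container name in the base
        args:
        # args is a plain list of strings, so a strategic merge patch
        # replaces the base list wholesale rather than appending to it.
        - --model=$(MODEL_ID)    # assumed; Kubernetes expands $(MODEL_ID) from env
        - --quantization=awq
        env:
        # env entries are merged by name, so only MODEL_ID is overridden.
        - name: MODEL_ID
          value: google/gemma-2b-AWQ
```

Because containers and `env` entries merge by `name`, a patch like this only needs to list the fields that differ per model, which keeps one patch per quantized variant small.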

@ganochenkodg marked this pull request as draft on April 4, 2024, 10:58
@ganochenkodg changed the title from "Code for AI ob GKE guide series" to "Code for AI on GKE guide series" on Apr 4, 2024
- --quantization=awq
env:
- name: MODEL_ID
  value: dganochenko/gemma-2b-AWQ
Contributor

Please change this to value: google/gemma-2b-AWQ so it points to the right repository.

Contributor Author

it's done already

Contributor

Small detail but can we remove this file change?

Contributor Author

Sure, done

Contributor

File recovered

- --quantization=awq
env:
- name: MODEL_ID
  value: dganochenko/gemma-7b-AWQ
Contributor

Please change this to value: google/gemma-7b-AWQ to ensure we're pointing at the right repository.

Contributor Author

it's done already

@brandonroyal
Contributor

Looks good. @TarasRudko @ganochenkodg Can we mark this as ready for review?

@ganochenkodg marked this pull request as ready for review on July 19, 2024, 20:51

snippet-bot bot commented Jul 19, 2024

No region tags are edited in this PR.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.

Contributor

Is deleting this file intentional? It looks like deleting it will break this doc:
https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-vllm

Member

@bourgeoisor left a comment

Looks good from my end

@bourgeoisor merged commit 077a971 into GoogleCloudPlatform:main on Jul 22, 2024
4 checks passed