Issue getting Llama3 8b running on GKE #43
Comments
Hi Francesco, any updates?
I just re-tried this with
@tengomucho Unfortunately that didn't work. I used the same manifests as above with the changes you mentioned. I also rebuilt the Docker image with the latest changes from main. What TPU are you running on? Is it possible that the v5e node is not big enough, and it's unable to use multiple nodes? I can try on a v5p if that's better.
I tried on a
Hmm, I don't see why my K8s config would be any different from that. Is there a prebuilt public Docker image I can test out?
Let me cook one up for you. I'll do it on Monday and get back to you.
Any update on this? I had the same issue with GKE; none of the Hugging Face models work (gemma-2b, Mistral, Llama, etc.). There are no errors in the logs either, just a hang at "Info: Warming up model" for Gemma. For Mistral it is a little bit different:
At the same time, I was able to run the following example test inside the GKE pod that was created.
@tengomucho, any comment on the optimum-tpu GKE issues, or potentially a public image?
Hey, sorry it took me longer than planned to get this done, but you should be able to test this TGI image:
Thank you, @tengomucho. It got stuck/hung at the same "Warming up" step:
Umh, strange, I just tested it and it worked fine. I tested with this command line, BTW:
And it took ~12s to warm up:
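The exact command line was not preserved above; as a rough sketch, a typical TGI launch on a TPU host looks something like the following. The image name, model ID, and sizing flags here are illustrative assumptions, not the exact command that was used:

```bash
# Illustrative only: a typical TGI container launch on a TPU host.
# <tgi-tpu-image> is a placeholder for the image under test.
docker run -p 8080:80 \
    --shm-size 16G \
    --privileged \
    -e HF_TOKEN=${HF_TOKEN} \
    <tgi-tpu-image> \
    --model-id meta-llama/Meta-Llama-3-8B \
    --max-input-length 1024 \
    --max-total-tokens 2048
```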
I believe it is GKE-specific.
@tengomucho I'm seeing the same thing. I retried the deployment manifest I pasted above, but with the image
This one works for me:
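For reference, a minimal sketch of that kind of deployment, assuming a v5e node pool; the image, names, topology, and chip count below are placeholders to adapt to your cluster, not the manifest actually posted:

```bash
# Hypothetical GKE manifest for a TGI server on a TPU v5e node pool.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-tpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tgi-tpu
  template:
    metadata:
      labels:
        app: tgi-tpu
    spec:
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
        cloud.google.com/gke-tpu-topology: 2x4
      containers:
      - name: tgi
        image: <tgi-tpu-image>   # placeholder: the image under test
        args: ["--model-id", "meta-llama/Meta-Llama-3-8B"]
        ports:
        - containerPort: 80
        resources:
          limits:
            google.com/tpu: 8    # number of TPU chips on the node
---
apiVersion: v1
kind: Service
metadata:
  name: tgi-tpu
spec:
  selector:
    app: tgi-tpu
  ports:
  - port: 8000
    targetPort: 80
EOF
```

After applying, `kubectl get pods` should show the TGI pod scheduling onto the TPU node pool.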
Thanks, @liurupeng. As of 2024-06-26 22:43:50.501 EDT, I assume the TGI model should be up and running, but the curl validation command throws a connection refused error (I tried both container ports 80 and 8000):
Did you try the curl connection to validate?
@rick-c-goog I ran the below command:
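A typical validation sequence, assuming a service named tgi-tpu exposing port 8000 as in the sketch above, would be:

```bash
# Forward the service port locally (service name and port are assumptions).
kubectl port-forward svc/tgi-tpu 8000:8000

# In another terminal, hit TGI's /generate endpoint on the forwarded port.
curl 127.0.0.1:8000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 20}}'
```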
Thanks, @liurupeng. The port-forward curl to 127.0.0.1 is working, and the busybox curl to the service cluster IP worked afterwards as well.
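As a sketch of that in-cluster check (note that stock busybox ships wget rather than curl, so the hypothetical throwaway pod below uses wget; flag support can vary by busybox build):

```bash
# Reach the service by its cluster DNS name from a temporary pod.
kubectl run tgi-check --rm -it --image=busybox --restart=Never -- \
    wget -qO- \
    --header='Content-Type: application/json' \
    --post-data='{"inputs":"Hello","parameters":{"max_new_tokens":20}}' \
    http://tgi-tpu:8000/generate
```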
@tengomucho I am testing optimum-tpu with
@Bihan For now we have only tested
@tengomucho Thank you for the quick reply. Do you think testing with a v2-8 or v3-8 would require major modifications?
I'm trying to deploy Llama3 8b on GKE using optimum-tpu but am running into some trouble.
I'm following the instructions here: https://github.com/huggingface/optimum-tpu/tree/main/text-generation-inference. I built the Docker image using the make command mentioned there.
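For anyone reproducing this, the build step would be roughly the following, assuming the tpu-tgi Makefile target from the repository README (the target name may have changed since):

```bash
# Assumed build steps for the TGI TPU Docker image.
git clone https://github.com/huggingface/optimum-tpu.git
cd optimum-tpu
make tpu-tgi   # builds the TGI TPU image
```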
The server starts booting up but gets stuck at "Warming up model". See the logs below:
Here's my config:
Any ideas?