
Add Runpod Provider #157

Open
wants to merge 2 commits into main
Conversation

pandyamarut
Why this PR
We want to add Runpod as a remote inference provider for Llama Stack. Runpod model serving endpoints are OpenAI-compatible, so this provider is intended to be used with them.

What does this PR include?

  1. Integration with the Distribution.
  2. OpenAI as the client.

How did we test?
After setting the configuration (providing the endpoint_url and api_key, and keeping the other settings at their defaults), we launched a server using:

llama stack run remote_runpod --port 8080
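For reference, the provider entry in the run config might look roughly like the sketch below. The provider id and field names are assumptions based on the endpoint_url/api_key settings described above, not verbatim from this PR:

```yaml
# Hypothetical run-config fragment -- field names assumed from the
# endpoint_url / api_key settings mentioned in this PR description.
inference:
  provider_type: remote::runpod
  config:
    endpoint_url: https://<your-runpod-endpoint>/v1
    api_key: <RUNPOD_API_KEY>
```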

  1. Invoke the call (streaming):
    curl -X POST http://localhost:8080/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"hello world, write me a 2 sentence poem about the moon", "role": "user"}],"stream":true}'

Response:

data: {"event":{"event_type":"start","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"Here","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"'s","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"complete","delta":"","logprobs":null,"stop_reason":"end_of_turn"}}
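The streamed events above use the usual SSE `data:` framing, so reassembling the text client-side is just a matter of collecting the `progress` deltas until the `complete` event. A minimal sketch (the helper name is ours, not part of the PR):

```python
import json

def parse_stream_deltas(lines):
    """Join the text deltas from Llama Stack SSE 'data:' lines."""
    deltas = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        event = json.loads(line[len("data:"):])["event"]
        if event["event_type"] == "progress":
            deltas.append(event["delta"])
        elif event["event_type"] == "complete":
            break
    return "".join(deltas)

# Sample lines taken from the streaming response above.
sample = [
    'data: {"event":{"event_type":"start","delta":"","logprobs":null,"stop_reason":null}}',
    'data: {"event":{"event_type":"progress","delta":"Here","logprobs":null,"stop_reason":null}}',
    "data: {\"event\":{\"event_type\":\"progress\",\"delta\":\"'s\",\"logprobs\":null,\"stop_reason\":null}}",
    'data: {"event":{"event_type":"complete","delta":"","logprobs":null,"stop_reason":"end_of_turn"}}',
]
print(parse_stream_deltas(sample))  # Here's
```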
  2. Invoke the call (non-streaming):
    curl -X POST http://localhost:8080/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"hello world, write me a 2 sentence poem about the moon", "role": "user"}],"stream":false}'

Response:

data: {"completion_message":{"role":"assistant","content":"Here's a 2-sentence poem about the moon:\n\nThe moon glows softly in the midnight sky, \nA beacon of peace, as it drifts gently by.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
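In the non-streaming case the completion arrives as a single JSON object, so unpacking it is straightforward. A sketch with the content abbreviated from the response above:

```python
import json

# Abbreviated version of the non-streaming response body shown above.
raw = (
    '{"completion_message":{"role":"assistant",'
    '"content":"A beacon of peace, as it drifts gently by.",'
    '"stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}'
)
msg = json.loads(raw)["completion_message"]
print(msg["stop_reason"])  # end_of_turn
print(msg["content"])
```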

Marut Pandya and others added 2 commits September 30, 2024 03:52
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 30, 2024
@pandyamarut (Author)

@ashwinb @yanxi0830 @hardikjshah when can I expect review? Thanks.

@ashwinb (Contributor) commented Oct 3, 2024

Thanks for the PR @pandyamarut! We are putting together a few tests in the repository now so we can make sure inference works reliably (especially w.r.t. tool calling, etc.) wherever we are dealing with openai-compatible endpoints. Usually we vastly prefer a raw token API (e.g., HuggingFace's text-generation one https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/tgi/tgi.py#L136). Expect some changes around here in a couple days. I will post an update when this happens. There are a couple other inference-related PRs also which are kind of languishing without review because of this issue.
