
Add Runpod Provider #157

Open
wants to merge 2 commits into main
Conversation

pandyamarut
Why this PR
We want to add Runpod as a remote inference provider for Llama Stack. Runpod model serving endpoints are OpenAI-compatible, so this provider is intended to be used with them.

What does this PR include?

  1. Integration with the Distribution.
  2. OpenAI as the client.

How did we test?
After setting the configuration (providing the endpoint_url and api_key, and keeping the other settings at their defaults), we launched a server using:

llama stack run remote_runpod --port 8080
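For reference, the provider entry in the run config might look roughly like the sketch below. The provider id and field names are assumptions based on the endpoint_url/api_key settings described above, not verbatim from this PR:

```yaml
# Hypothetical run-config fragment -- field names assumed from the
# endpoint_url / api_key settings mentioned in this PR description.
inference:
  provider_type: remote::runpod
  config:
    endpoint_url: https://<your-runpod-endpoint>/v1
    api_key: <RUNPOD_API_KEY>
```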

  1. Invoke the call (streaming):
    curl -X POST http://localhost:8080/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"hello world, write me a 2 sentence poem about the moon", "role": "user"}],"stream":true}'

Response:

data: {"event":{"event_type":"start","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"Here","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"'s","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"complete","delta":"","logprobs":null,"stop_reason":"end_of_turn"}}
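The streamed events above use the usual SSE `data:` framing, so reassembling the text client-side is just a matter of collecting the `progress` deltas until the `complete` event. A minimal sketch (the helper name is ours, not part of the PR):

```python
import json

def parse_stream_deltas(lines):
    """Join the text deltas from Llama Stack SSE 'data:' lines."""
    deltas = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        event = json.loads(line[len("data:"):])["event"]
        if event["event_type"] == "progress":
            deltas.append(event["delta"])
        elif event["event_type"] == "complete":
            break
    return "".join(deltas)

# Sample lines taken from the streaming response above.
sample = [
    'data: {"event":{"event_type":"start","delta":"","logprobs":null,"stop_reason":null}}',
    'data: {"event":{"event_type":"progress","delta":"Here","logprobs":null,"stop_reason":null}}',
    "data: {\"event\":{\"event_type\":\"progress\",\"delta\":\"'s\",\"logprobs\":null,\"stop_reason\":null}}",
    'data: {"event":{"event_type":"complete","delta":"","logprobs":null,"stop_reason":"end_of_turn"}}',
]
print(parse_stream_deltas(sample))  # Here's
```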
  2. Invoke the call (non-streaming):
    curl -X POST http://localhost:8080/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"hello world, write me a 2 sentence poem about the moon", "role": "user"}],"stream":false}'

Response:

data: {"completion_message":{"role":"assistant","content":"Here's a 2-sentence poem about the moon:\n\nThe moon glows softly in the midnight sky, \nA beacon of peace, as it drifts gently by.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
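In the non-streaming case the completion arrives as a single JSON object, so unpacking it is straightforward. A sketch with the content abbreviated from the response above:

```python
import json

# Abbreviated version of the non-streaming response body shown above.
raw = (
    '{"completion_message":{"role":"assistant",'
    '"content":"A beacon of peace, as it drifts gently by.",'
    '"stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}'
)
msg = json.loads(raw)["completion_message"]
print(msg["stop_reason"])  # end_of_turn
print(msg["content"])
```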

Marut Pandya and others added 2 commits September 30, 2024 03:52
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 30, 2024
@pandyamarut (Author)

@ashwinb @yanxi0830 @hardikjshah when can I expect review? Thanks.

@ashwinb (Contributor) commented Oct 3, 2024

Thanks for the PR @pandyamarut! We are putting together a few tests in the repository now so we can make sure inference works reliably (especially w.r.t. tool calling, etc.) wherever we are dealing with openai-compatible endpoints. Usually we vastly prefer a raw token API (e.g., HuggingFace's text-generation one https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/tgi/tgi.py#L136). Expect some changes around here in a couple days. I will post an update when this happens. There are a couple other inference-related PRs also which are kind of languishing without review because of this issue.
