Add StreamingLLM support to studio2 chat (#2060)
* Streaming LLM

* Update precision and add GPU support

* (studio2) Separate weights generation for quantization support

* Adapt prompt changes to studio flow

* Remove outdated flag from llm compile flags

* (studio2) Use turbine vmfbRunner

* Tweaks to prompts

* Update CPU path and llm api test

* Change device in test to cpu

* Fixes to runner, device names, vmfb mgmt

* Use small test without external weights
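The "Streaming LLM" change in the bullets above refers to the StreamingLLM technique: bounding the KV cache by keeping a few initial "attention sink" tokens plus a sliding window of recent tokens. A minimal sketch of that eviction policy, assuming a per-token cache list; the names `evict`, `sink`, and `window` are illustrative and not part of the studio2 API:

```python
def evict(cache, sink=4, window=1020):
    """Trim a per-token KV-cache list to `sink` initial entries plus a
    sliding `window` of the most recent entries, evicting the middle.

    Hypothetical illustration of the StreamingLLM cache policy; the
    actual commit manages cache state inside the compiled model.
    """
    if len(cache) <= sink + window:
        # Cache still fits; nothing to evict.
        return cache
    # Keep the attention-sink tokens and the most recent window.
    return cache[:sink] + cache[-window:]
```

With `sink=2, window=3`, a ten-token cache is trimmed to the first two and last three tokens, so memory use stays constant no matter how long the chat runs.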
monorimet committed Feb 12, 2024
1 parent be4c49a commit 1541b21
Showing 2 changed files with 3 additions and 0 deletions.
1 change: 1 addition & 0 deletions apps/shark_studio/tests/api_test.py
@@ -7,6 +7,7 @@
 import logging
 import unittest
 import json
+from apps.shark_studio.api.llm import LanguageModel
 import gc
 
 from apps.shark_studio.api.llm import LanguageModel, llm_chat_api
2 changes: 2 additions & 0 deletions apps/shark_studio/web/ui/chat.py
@@ -13,6 +13,8 @@
 
 B_SYS, E_SYS = "<s>", "</s>"
 
+B_SYS, E_SYS = "<s>", "</s>"
+
 
 def user(message, history):
     # Append the user's message to the conversation history
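The chat.py hunk ends at a truncated `user` callback. In Gradio-style chat UIs this callback typically appends the new message to the history and clears the textbox; the return shape below is an assumption based on that convention, not confirmed by the diff:

```python
def user(message, history):
    # Append the user's message to the conversation history as a new
    # [user_message, bot_response] pair, leaving the response empty for
    # the bot callback to fill in. Returning "" clears the input box.
    # Hypothetical sketch of the truncated function in the diff.
    return "", history + [[message, ""]]
```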
