Add StreamingLLM support to studio2 chat (#2060)
* Streaming LLM
* Update precision and add GPU support
* (studio2) Separate weights generation for quantization support
* Adapt prompt changes to studio flow
* Remove outdated flag from LLM compile flags
* (studio2) Use turbine vmfbRunner
* Tweaks to prompts
* Update CPU path and LLM API test
* Change device in test to CPU
* Fixes to runner, device names, vmfb management
* Use small test without external weights