Add StreamingLLM support to studio2 chat (#2060)
* Streaming LLM
* Update precision and add GPU support
* (studio2) Separate weights generation for quantization support
* Adapt prompt changes to studio flow
* Remove outdated flag from LLM compile flags
* (studio2) Use turbine vmfbRunner
* Tweaks to prompts
* Update CPU path and LLM API test
* Change device in test to CPU
* Fixes to runner, device names, vmfb management
* Use small test without external weights