[LLM] add deploy server #9581
base: develop
Conversation
Thanks for your contribution!
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #9581      +/-  ##
===========================================
+ Coverage    52.97%   52.98%    +0.01%
===========================================
  Files          703      703
  Lines       110981   110982        +1
===========================================
+ Hits         58788    58809       +21
+ Misses       52193    52173       -20
===========================================

☔ View full report in Codecov by Sentry.
<h1 align="center"><b><em>大模型服务化部署</em></b></h1> | ||
|
||
*该部署工具是基于英伟达Triton框架专为服务器场景的大模型服务化部署而设计。它提供了支持gRPC、HTTP协议的服务接口,以及流式Token输出能力。底层推理引擎支持连续批处理、weight only int8、后训练量化(PTQ)等加速优化策略,为用户带来易用且高性能的部署体验。* |
Please also add a usage introduction to /llm/readme.md.
# Quick Start
Does this have to be used with a specific image? Can't it be deployed freely, the way vLLM can?
# Download the model
wget https://paddle-qa.bj.bcebos.com/inference_model/Meta-Llama-3-8B-Instruct-A8W8C8.tar
mkdir Llama-3-8B-A8W8C8 && tar -xf Meta-Llama-3-8B-Instruct-A8W8C8.tar -C Llama-3-8B-A8W8C8
Could more out-of-the-box quantized models be provided?
```
from fastdeploy_client.chatbot import ChatBot
```
Hmm, is FastDeploy (fd) still required? Could installing fd be problematic — what are the version and compatibility requirements?
I suggest not using the fd naming for this folder; it is easily confused.
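For reference, here is a minimal sketch of how the client snippet above might be used end to end. The constructor arguments and the `generate`/`stream_generate` method names are assumptions for illustration; they are not confirmed by this diff, so check the fastdeploy_client documentation for the actual API.

```python
from fastdeploy_client.chatbot import ChatBot

# Assumed API: the hostname/port arguments and method names below are
# illustrative only, not confirmed by this PR's diff.
chatbot = ChatBot(hostname="127.0.0.1", port=8000)

# Blocking call: send one prompt and wait for the full response.
result = chatbot.generate("Hello, who are you?")
print(result)

# Streaming call: consume tokens as the server produces them.
for token in chatbot.stream_generate("Hello, who are you?"):
    print(token, end="", flush=True)
```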
REQUIRED_PACKAGES = fin.read()

setuptools.setup(
    name="fastdeploy-client",
Could it be renamed to something like paddle or paddlenlp?
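To make the naming concern concrete, a short hypothetical illustration: the distribution name declared in setup.py is what users pip-install, while the import path uses the underscore form, and both read as if they belonged to the separate FastDeploy project.

```python
# Hypothetical illustration of the naming overlap the reviewers point out.
# The distribution is installed under the setup.py name:
#
#   pip install fastdeploy-client
#
# while code imports the underscore form; both are easily mistaken
# for the FastDeploy project itself:
from fastdeploy_client.chatbot import ChatBot
```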
docker run --gpus all --shm-size 5G --network=host --privileged --cap-add=SYS_PTRACE \
    -v ${MODEL_PATH}:/models/ \
    -dit registry.baidubce.com/paddlepaddle/fastdeploy:llm-serving-cuda123-cudnn9-v1.2 \
Can the dependency on a specific Docker image be removed? Requiring a specific image reduces ease of use.
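Once the container is running, a quick readiness probe can confirm the server is reachable. Since the tool is described as Triton-based, the sketch below assumes Triton's default HTTP port (8000) and readiness endpoint; both are assumptions, not details confirmed by this PR.

```python
import requests

# Assumption: the serving container exposes Triton's default HTTP port
# and readiness route; adjust if the image uses a different mapping.
resp = requests.get("http://localhost:8000/v2/health/ready", timeout=5)
print("server ready" if resp.status_code == 200 else f"not ready (HTTP {resp.status_code})")
```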
PR types
New features
PR changes
Others
Description
Add an LLM deploy server.