[LLM] add deploy server #9581

Open · wants to merge 1 commit into develop

Conversation

kevincheng2 commented:

PR types

New features

PR changes

Others

Description

Add an LLM deploy server.


paddle-bot bot commented Dec 9, 2024

Thanks for your contribution!


codecov bot commented Dec 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.98%. Comparing base (753436a) to head (2e1c5a8).
Report is 2 commits behind head on develop.

Additional details and impacted files
```
@@             Coverage Diff             @@
##           develop    #9581      +/-   ##
===========================================
+ Coverage    52.97%   52.98%   +0.01%     
===========================================
  Files          703      703              
  Lines       110981   110982       +1     
===========================================
+ Hits         58788    58809      +21     
+ Misses       52193    52173      -20     
```



<h1 align="center"><b><em>Large Model Serving Deployment</em></b></h1>

*This deployment tool is built on NVIDIA's Triton framework and designed specifically for serving large language models in server scenarios. It provides service interfaces supporting the gRPC and HTTP protocols, along with streaming token output. The underlying inference engine supports acceleration and optimization strategies such as continuous batching, weight-only INT8, and post-training quantization (PTQ), delivering an easy-to-use and high-performance deployment experience.*
Collaborator commented:

Please also add a usage introduction in /llm/readme.md.


# Quick Start

Collaborator commented:

Does this have to be used with a specific image? Can it not be deployed freely, like vLLM?

```
# Download the model
wget https://paddle-qa.bj.bcebos.com/inference_model/Meta-Llama-3-8B-Instruct-A8W8C8.tar
mkdir Llama-3-8B-A8W8C8 && tar -xf Meta-Llama-3-8B-Instruct-A8W8C8.tar -C Llama-3-8B-A8W8C8
```
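
For environments without wget, the same download-and-extract step can also be done from Python. A minimal sketch mirroring the shell commands above (URL and directory names taken from that snippet):

```
import tarfile
import urllib.request
from pathlib import Path

# Same archive and target directory as the shell commands above.
URL = "https://paddle-qa.bj.bcebos.com/inference_model/Meta-Llama-3-8B-Instruct-A8W8C8.tar"
archive = Path("Meta-Llama-3-8B-Instruct-A8W8C8.tar")
target = Path("Llama-3-8B-A8W8C8")

# Fetch the tarball, then unpack it into the target directory.
urllib.request.urlretrieve(URL, archive)
target.mkdir(exist_ok=True)
with tarfile.open(archive) as tar:
    tar.extractall(target)
```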

Collaborator commented:

Could more out-of-the-box quantized models be provided?



```
from fastdeploy_client.chatbot import ChatBot
```
Collaborator commented:

Hmm, is fd still required? Will installing fd cause problems, e.g. version requirements or compatibility requirements?

Collaborator commented:

I suggest dropping fd from this folder's name; it is easy to confuse.
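
To make the quoted client snippet self-contained, here is a minimal usage sketch. The constructor arguments (`hostname`, `port`), the port number, the `topp` parameter, and the `generate`/`stream_generate` methods are assumptions about this client's API, not confirmed by the excerpt above:

```
from fastdeploy_client.chatbot import ChatBot

# Assumed API: hostname/port kwargs are not shown in the excerpt above.
chatbot = ChatBot(hostname="127.0.0.1", port=8810)

# Blocking call: returns the full completion at once.
result = chatbot.generate("hello", topp=0.8)
print(result)

# Streaming call: yields tokens as the server produces them.
for res in chatbot.stream_generate("hello"):
    print(res)
```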




```
REQUIRED_PACKAGES = fin.read()

setuptools.setup(
    name="fastdeploy-client",
```
Collaborator commented:

Could it be renamed to something like paddle or paddlenlp?


```
docker run --gpus all --shm-size 5G --network=host --privileged --cap-add=SYS_PTRACE \
    -v ${MODEL_PATH}:/models/ \
    -dit registry.baidubce.com/paddlepaddle/fastdeploy:llm-serving-cuda123-cudnn9-v1.2 \
```
Collaborator commented:

Can the dependency on a specific Docker image be removed? Relying on a specific Docker image reduces ease of use.
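
Relatedly, when scripting against the container started above, a liveness wait like the following can help. The port (8810) and the plain TCP probe are assumptions, not taken from this PR; consult the image's documentation for a real health endpoint:

```
import socket
import time

def wait_for_server(host="127.0.0.1", port=8810, timeout=300):
    """Poll until a TCP connection to the serving port succeeds, or time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # A successful connect suggests the server process is listening.
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            time.sleep(5)
    return False

if __name__ == "__main__":
    print("server up:", wait_for_server())
```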
