Releases: VectorInstitute/vector-inference
v0.4.0.post1
- Fixed an incorrect dependency
- Updated README files
v0.4.0
- Onboarded various new models and new model types: text embedding models and reward reasoning models.
- Added `metrics` command that streams performance metrics for the inference server.
- Enabled more launch command options: `--max-num-seqs`, `--model-weights-parent-dir`, `--pipeline-parallelism`, `--enforce-eager` (see the sketch after these notes).
- Improved support for launching custom models.
- Improved command response time.
- Improved visuals for the `list` command.
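For reference, the new options and the `metrics` command might be combined as follows. This is a minimal sketch: the model name, flag values, and Slurm job ID are illustrative, and `metrics` is assumed to take the job ID reported by `launch`.

```bash
# Launch a model with the new options (model name and values are illustrative)
vec-inf launch Meta-Llama-3.1-8B-Instruct \
    --max-num-seqs 256 \
    --model-weights-parent-dir /path/to/model-weights \
    --pipeline-parallelism True \
    --enforce-eager True

# Stream performance metrics for the running server
# (assumes metrics takes the Slurm job ID printed by launch)
vec-inf metrics 12345678
```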
v0.3.3
v0.3.2
v0.3.1
v0.3.0
- Added `vec-inf` CLI (usage sketch below):
  - Install `vec-inf` via pip
  - `launch` command to launch models
  - `status` command to check inference server status
  - `shutdown` command to stop inference server
  - `list` command to see all available models
- Upgraded `vllm` to `0.5.4`
- Added support for new model families:
  - Llama 3.1 (including 405B)
  - Gemma 2
  - Phi 3
  - Mistral Large
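A first session with the CLI might look like the following. This is a minimal sketch: the model name and job ID are illustrative, and `status`/`shutdown` are assumed to take the Slurm job ID reported by `launch`.

```bash
# Install the CLI from PyPI
pip install vec-inf

# See all available models, then launch one (model name is illustrative)
vec-inf list
vec-inf launch Meta-Llama-3.1-8B-Instruct

# Check on the server, then stop it
# (assumes both commands take the Slurm job ID printed by launch)
vec-inf status 12345678
vec-inf shutdown 12345678
```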
v0.2.1
v0.2.0
v0.1.1
v0.1.0
Easy-to-use high-throughput LLM inference on Slurm clusters using vLLM
Supported models and variants:
- Command R plus
- DBRX: Instruct
- Llama 2: 7b, 7b-chat, 13b, 13b-chat, 70b, 70b-chat
- Llama 3: 8B, 8B-Instruct, 70B, 70B-Instruct
- Mixtral: 8x7B-Instruct-v0.1, 8x22B-v0.1, 8x22B-Instruct-v0.1
Supported functionalities:
- Completions and chat completions (request sketch below)
- Logits generation
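Since the server is backed by vLLM, these are exposed through its OpenAI-compatible HTTP API, so a chat completion can be requested with a plain `curl` call. This is a minimal sketch: the host, port, and model name are illustrative and depend on where the Slurm job lands.

```bash
# Chat completion against the launched server
# (host, port, and model name are illustrative)
curl http://gpu-node-01:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```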