Releases: VectorInstitute/vector-inference

v0.4.0.post1

28 Nov 19:23
dba901b
  • Fixed an incorrect dependency
  • Updated README files

v0.4.0

28 Nov 18:21
d221dae
  • Onboarded several new models and two new model types: text embedding models and reward reasoning models.
  • Added a metrics command that streams performance metrics from a running inference server.
  • Added more launch command options: --max-num-seqs, --model-weights-parent-dir, --pipeline-parallelism, and --enforce-eager (see the sketch after this list).
  • Improved support for launching custom models.
  • Improved command response time.
  • Improved visuals for the list command.
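
A sketch of how the new options and the metrics command fit together, assuming metrics takes the Slurm job ID reported by launch (the model name, path, and boolean value shapes below are illustrative, not confirmed by these notes):

    # Launch with the new options
    vec-inf launch Meta-Llama-3.1-8B-Instruct \
        --max-num-seqs 256 \
        --model-weights-parent-dir /path/to/model/weights \
        --pipeline-parallelism True \
        --enforce-eager True

    # Stream performance metrics from the running server
    vec-inf metrics <slurm_job_id>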

v0.3.3

03 Sep 21:53
d10758d
  • Added a missing package to dependencies
  • Fixed pre-commit hooks
  • Linted and formatted code
  • Updated outdated examples

v0.3.2

03 Sep 18:27
39b98a2
  • Added support for custom models: users can now launch any model whose architecture is supported by vLLM (see the sketch after this list)
  • Minor updates to multi-node job launching to better support custom models
  • Added Llama3-OpenBioLLM-70B to the supported model list
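
A minimal sketch of the custom-model flow, assuming launch accepts an arbitrary model name (my-custom-model is hypothetical, and any extra resource flags a custom model might need are omitted here):

    # Works as long as the architecture is supported by vLLM
    vec-inf launch my-custom-model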

v0.3.1

29 Aug 13:41
f43d7bf
  • Added a model-name argument to the list command to show the default setup of a specific supported model (see the example after this list)
  • Improved command option descriptions
  • Restructured the models directory
  • Added default values for launching custom models
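
For example, assuming the model name is passed positionally (the name below is illustrative):

    # Show all available models
    vec-inf list

    # Show the default setup for one supported model
    vec-inf list Meta-Llama-3.1-8B-Instruct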

v0.3.0

29 Aug 06:09
156dfa5
  • Added vec-inf CLI (see the workflow sketch after this list):

    • Install vec-inf via pip
    • launch command to launch models
    • status command to check inference server status
    • shutdown command to stop inference server
    • list command to see all available models
  • Upgraded vLLM to 0.5.4

  • Added support for new model families:

    • Llama 3.1 (including 405B)
    • Gemma 2
    • Phi 3
    • Mistral Large
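
A sketch of the basic workflow with the new CLI, assuming status and shutdown take the Slurm job ID reported by launch (the model name is illustrative):

    pip install vec-inf

    # Launch an inference server for a supported model
    vec-inf launch Meta-Llama-3.1-8B-Instruct

    # Check on the server, and stop it when done (<job_id> comes from the launch output)
    vec-inf status <job_id>
    vec-inf shutdown <job_id>

    # See all available models
    vec-inf list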

v0.2.1

06 Jul 15:58
2c43a25
  • Added CodeLlama
  • Updated model variant names for Llama 2 in the README

v0.2.0

04 Jul 14:29
635e13f
  • Updated the default environment to use a Singularity container and added the associated Dockerfile
  • Updated vLLM to 0.5.0, added VLM support (LLaVA-1.5 and LLaVA-NeXT), and updated the example scripts
  • Refactored the repo structure to simplify the model onboarding and update process

v0.1.1

23 May 20:32
  • Updated vLLM to 0.4.2, which resolves the "flash attention package not found" issue
  • Updated instructions for using the default environment to prevent/resolve the "NCCL not found" error

v0.1.0

24 Apr 20:21
0784588

Easy-to-use high-throughput LLM inference on Slurm clusters using vLLM

Supported models and variants:

  • Command R+
  • DBRX: Instruct
  • Llama 2: 7b, 7b-chat, 13b, 13b-chat, 70b, 70b-chat
  • Llama 3: 8B, 8B-Instruct, 70B, 70B-Instruct
  • Mixtral: 8x7B-Instruct-v0.1, 8x22B-v0.1, 8x22B-Instruct-v0.1

Supported functionalities:

  • Completions and chat completions (see the request sketch after this list)
  • Logits generation
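
Once a server is running, requests go straight to it; a minimal sketch assuming the standard vLLM OpenAI-compatible completions endpoint (host, port, and model name are placeholders):

    curl http://<server_host>:<port>/v1/completions \
        -H "Content-Type: application/json" \
        -d '{
              "model": "Meta-Llama-3-8B-Instruct",
              "prompt": "The capital of France is",
              "max_tokens": 16
            }'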