v0.0.11: SDXL, Llama v2 training and inference, Inf2-powered TGI
SDXL Export and Inference
The Optimum CLI now supports compiling components of the SDXL pipeline for inference on Neuron devices (inf2/trn1).
Below is an example of compiling SDXL models. You can compile either on an inf2 instance (inf2.8xlarge or larger recommended) or on a CPU-only instance (in that case, disable validation with `--disable-validation`):
```bash
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/
```
Then run inference with the `NeuronStableDiffusionXLPipeline` class:
```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]
```
- Add sdxl exporter support by @JingyaHuang in #203
- Add Stable Diffusion XL inference support by @JingyaHuang in #212
Llama v1, v2 Inference
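Llama checkpoints can be exported and run through the `NeuronModelForCausalLM` class. A minimal sketch, assuming the usual optimum-neuron export arguments; the model id and compilation settings below are illustrative:

```python
# Minimal sketch, assuming the NeuronModelForCausalLM API; the model id and
# compilation arguments below are illustrative, not prescriptive.
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

# export=True compiles the checkpoint for Neuron cores on first load
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    batch_size=1,
    num_cores=2,
    auto_cast_type="f16",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("What is deep learning?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```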
Llama v2 Training
- Llama v2 training support by @michaelbenayoun in #211
- Llama v1 training fix by @michaelbenayoun in #211
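For fine-tuning, a minimal sketch assuming the `NeuronTrainer`/`NeuronTrainingArguments` classes behave as drop-ins for their `transformers` counterparts; the dataset and hyperparameters are illustrative:

```python
# Minimal sketch, assuming NeuronTrainer is a drop-in for transformers.Trainer;
# dataset and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a small causal-LM dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

training_args = NeuronTrainingArguments(
    output_dir="llama2-neuron",
    per_device_train_batch_size=1,
    bf16=True,
)
trainer = NeuronTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```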
TGI
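This release brings an Inf2-powered build of Text Generation Inference (TGI). Once a server is running, it speaks the standard TGI HTTP API; a minimal client sketch, where the host, port, and generation parameters are illustrative:

```python
# Minimal sketch of a client request against a running TGI endpoint;
# the host, port, and parameters below are illustrative.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 64},
    },
)
print(response.json()["generated_text"])
```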
Major bugfixes
- `neuron_parallel_compile`, `ParallelLoader` and Zero-1 fixes for torchneuron 8+ by @michaelbenayoun in #200
- flan-t5 fix: `T5Parallelizer`, `NeuronCacheCallback` and `NeuronHash` refactors by @michaelbenayoun in #207
- Fix `optimum-cli` broken by the optimum 1.13.0 release by @JingyaHuang in #217
Other changes
- Bump Inference APIs to Neuron 2.13 by @JingyaHuang in #206
- Add a log for SD when applying optimized attention & lazy loading for pipelines by @JingyaHuang in #208
- Cancel concurrent CIs for inference by @JingyaHuang in #218
- fix(tgi): typer does not support Union types by @dacorvo in #219
- Bump neuron-cc version to 1.18.* by @JingyaHuang in #224
Full Changelog: v0.0.10...v0.0.11