Describe a requested feature

I wonder if there's any plan to support 8bit inference in parallelformers. Right now, we can load 🤗 transformers models in 8bit like here (see the loading sketch at the end of this issue).

However, it's not possible to parallelize() the model with parallelformers, since only fp16 mode is supported at the moment.

Expected behavior

If 8bit inference could be supported, it would be good to add another argument, analogous to the existing fp16 one, e.g.:

from transformers import AutoModelForCausalLM
from parallelformers import parallelize

model = AutoModelForCausalLM.from_pretrained(model_name)
parallelize(model, num_gpus=2, int8=True, verbose='detail')

# or one argument for the precision mode, where dtype can be either "int8", "fp16", or "fp32" (default)
# parallelize(model, num_gpus=2, dtype='int8', verbose='detail')
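For comparison, this is roughly how parallelize() is called today with the existing fp16 flag (based on the usage shown in the parallelformers README; the model id is only an example). The proposed int8/dtype argument would slot in next to it:

from transformers import AutoModelForCausalLM
from parallelformers import parallelize

# example checkpoint; any supported causal LM works
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")

# current API: fp16 is the only reduced-precision switch
parallelize(model, num_gpus=2, fp16=True, verbose='detail')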
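And for reference, a minimal sketch of the 8bit loading mentioned at the top of this issue, assuming a transformers version with the bitsandbytes (LLM.int8()) integration and accelerate installed; the model id is only a placeholder:

from transformers import AutoModelForCausalLM

# Load the checkpoint with 8-bit (LLM.int8()) weights via bitsandbytes.
# device_map="auto" lets accelerate place the quantized weights on the available GPUs.
model_name = "bigscience/bloom-3b"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)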