INT8 support #39

Open
volkerha opened this issue Oct 6, 2022 · 0 comments
Labels
enhancement · New feature or request

Comments

volkerha commented Oct 6, 2022

Describe a requested feature

I wonder if there's any plan to support 8-bit inference in parallelformers. Right now, we can load 🤗 Transformers models in 8-bit (via load_in_8bit, backed by bitsandbytes), e.g.:

from transformers import AutoModelForCausalLM  # load_in_8bit requires bitsandbytes

model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)

However, it's not possible to parallelize() the model with parallelformers since only fp16 mode is supported at the moment.
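
For context, here is roughly what usage looks like today; a minimal sketch assuming the fp16/fp32-only parallelize() signature from the parallelformers README, with model_name standing in for any 🤗 Hub checkpoint:

from transformers import AutoModelForCausalLM
from parallelformers import parallelize

# Current behavior: the model is loaded in full precision and, when
# parallelized, can only be cast to fp16 (fp16=True) or kept in fp32.
model = AutoModelForCausalLM.from_pretrained(model_name)
parallelize(model, num_gpus=2, fp16=True, verbose='detail')

# A model loaded with load_in_8bit=True (bitsandbytes Int8 layers) has no
# corresponding option here, which is what this request is about.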

Expected behavior

If 8-bit inference could be supported, it would be good to add another argument alongside fp16, e.g.

from transformers import AutoModelForCausalLM
from parallelformers import parallelize

model = AutoModelForCausalLM.from_pretrained(model_name)
parallelize(model, num_gpus=2, int8=True, verbose='detail')

# or a single precision argument, where dtype can be "int8", "fp16", or "fp32" (default):
# parallelize(model, num_gpus=2, dtype='int8', verbose='detail')
volkerha added the enhancement label Oct 6, 2022