GPTQ causes poorly generated text #540
Comments
cc @HDCharles, could you help take a look? I think the issue might be in the custom code.
@MDK8888 I'd also suggest doing a proper eval as well. Maybe you can evaluate the wikitext perplexity and compare the results before and after quantization?
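The perplexity comparison suggested above boils down to exponentiating the mean per-token negative log-likelihood the model assigns to a held-out corpus such as wikitext. A minimal sketch (the function name and the example NLL values are illustrative, not from the thread):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood).

    token_nlls: per-token NLL values in nats, e.g. the cross-entropy
    losses a language model produces on each wikitext token.
    """
    nlls = list(token_nlls)
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical comparison: a small perplexity increase after quantization
# is expected; a large jump usually points to a quantization bug.
fp_ppl = perplexity([1.74, 1.70, 1.78])   # full-precision model
q_ppl = perplexity([1.80, 1.76, 1.84])    # quantized model
```

In practice you would collect the NLLs by running the model over wikitext with a sliding window and feeding the per-token losses into a function like this.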
I doubt an eval would help as a first step here; per @MDK8888 the outputs were just garbage. That said, @MDK8888, it's quite difficult and unrealistic for us to debug numerics issues in custom quantization implementations, so is there any way you can create a minimal repro using the existing GPTQ implementation here in ao?
Hey, I will try to create a minimal repro over the weekend using the existing GPTQ implementation in ao and share the results here. @jerryzh168 @msaroufim thanks for responding!
Hey, sorry for the late response! I tried working with the existing GPTQ implementation in ao, but I was getting a bit confused by the MultiInputs. I have the repository with my progress linked here: https://github.com/MDK8888/GPTQTest - I will keep working on this throughout the week to try and get it to work.
Hey, I was able to fix the issue in GPTFast 0.3.1. As it turns out, the linear layers in a lot of transformers actually have a bias :)
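The fix above points at a classic pitfall: GPTQ quantizes only the weight matrix, so if the quantized forward pass forgets to re-apply the layer's bias, the output distribution shifts and generation degrades into repeated tokens. A pure-Python sketch of the idea (function names and the symmetric per-row scheme are illustrative, not the thread's actual code):

```python
def quantize_rows(weight, n_bits=4):
    """Symmetric per-row (per-output-channel) quantization of a weight
    matrix given as a list of rows. Returns integer codes and per-row
    scales. The bias is deliberately NOT touched: it must be kept in
    floating point and re-applied in the forward pass.
    """
    qmax = 2 ** (n_bits - 1) - 1
    codes, scales = [], []
    for row in weight:
        scale = max(abs(v) for v in row) / qmax or 1.0
        codes.append([round(v / scale) for v in row])
        scales.append(scale)
    return codes, scales

def quantized_linear(x, codes, scales, bias):
    """y = dequant(W) @ x + b. Passing zeros for `bias` here reproduces
    the kind of output shift the fix above addressed."""
    return [
        sum(q * s * xi for q, xi in zip(row, x)) + b
        for row, s, b in zip(codes, scales, bias)
    ]

weight = [[1.0, -2.0], [0.5, 0.25]]
bias = [10.0, -1.0]
codes, scales = quantize_rows(weight)
y = quantized_linear([1.0, 1.0], codes, scales, bias)
```

The design point is that the bias contributes at full precision regardless of how aggressively the weights are quantized, so dropping it introduces a large, systematic error rather than ordinary quantization noise.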
Hey, I'm the creator of GPTFast, which scales the techniques outlined in gpt-fast to more models. I use a combination of AutoGPTQ and the GPTQ quantization methods here when I quantize, but the quality of the generated text after quantization is poor, often repeating a single token many times. My quantization method is linked here: https://github.com/MDK8888/GPTFast/blob/master/GPTFast/Core/Quantize/GPTQ/Quantizers/GPTQModelQuantizer.py.