
GPTQ causes poorly generated text #540

Closed · MDK8888 opened this issue Jul 25, 2024 · 6 comments

MDK8888 commented Jul 25, 2024

Hey, I'm the creator of GPTFast, which scales the techniques outlined in gpt-fast to more models. I quantize using a combination of AutoGPTQ and the GPTQ quantization methods here, but the quality of the generated text after quantization is poor: the model often repeats a single token many times. My quantization method is linked here: https://github.com/MDK8888/GPTFast/blob/master/GPTFast/Core/Quantize/GPTQ/Quantizers/GPTQModelQuantizer.py.

jerryzh168 (Contributor) commented:

cc @HDCharles, could you help take a look? I think the issue might be in the custom code.

jerryzh168 (Contributor) commented:

@MDK8888 I'd also suggest doing a proper eval: maybe you can measure wikitext perplexity and compare the results before and after quantization?
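
A minimal sketch of that kind of perplexity comparison, using the standard Hugging Face sliding-window recipe rather than ao's own eval harness (`model_id`, `max_length`, and `stride` are placeholders, not values from this issue):

```python
# Sketch of a wikitext-2 perplexity check to compare a model before/after
# quantization. Standard HF sliding-window recipe; assumes a causal LM.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

def wikitext_perplexity(model, tokenizer, max_length=2048, stride=512):
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
    seq_len = encodings.input_ids.size(1)

    nlls, n_tokens, prev_end = [], 0, 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        target_len = end - prev_end  # only score tokens not seen before
        input_ids = encodings.input_ids[:, begin:end].to(model.device)
        targets = input_ids.clone()
        targets[:, :-target_len] = -100  # mask the overlapping context
        with torch.no_grad():
            out = model(input_ids, labels=targets)
        nlls.append(out.loss * target_len)  # un-average the window loss
        n_tokens += target_len
        prev_end = end
        if end == seq_len:
            break
    return torch.exp(torch.stack(nlls).sum() / n_tokens)

# tok = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id).eval()
# ppl_fp = wikitext_perplexity(model, tok)        # baseline
# ppl_q  = wikitext_perplexity(quantized_model, tok)  # after GPTQ
```

If quantization is working, the two perplexities should be close; a large jump (or a perplexity in the hundreds) usually points at a bug rather than quantization error.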

msaroufim (Member) commented:

I doubt an eval would help as a first step here; per @MDK8888, the outputs were just garbage.

That said, @MDK8888, it's quite difficult and unrealistic for us to debug numerics issues in custom quantization implementations, so is there any way you can create a minimal repro using the existing GPTQ implementation here in ao?

MDK8888 (Author) commented Jul 27, 2024

Hey, I will try to create a minimal repro over the weekend using the existing GPTQ implementation in ao and share the results here. @jerryzh168 @msaroufim, thanks for responding!

MDK8888 (Author) commented Jul 31, 2024

Hey, sorry for the late response! I tried working with the existing GPTQ implementation in ao, but I got a bit confused by the MultiInput wrapper. My progress so far is in this repository: https://github.com/MDK8888/GPTQTest - I will keep working on it throughout the week to try to get it working.
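
For anyone hitting the same confusion, here is a rough, standalone illustration of the MultiInput idea as it appears in ao's GPTQ flow: a container that carries several calibration inputs through one forward pass so each layer can observe all of them while accumulating its Hessian statistics. This is not ao's actual class; names and details are illustrative only.

```python
# Illustrative-only sketch of the MultiInput concept used for GPTQ
# calibration. NOT torchao's real API: a wrapper holding many calibration
# tensors, so a single traced forward pass exposes all of them to a layer.
import torch
import torch.nn as nn

class MultiInput:
    def __init__(self, values):
        self.values = list(values)  # one tensor per calibration example

class LinearObserver(nn.Module):
    """Wraps a linear layer and accumulates X^T X over every input seen."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        self.H = torch.zeros(linear.in_features, linear.in_features)

    def forward(self, x):
        if isinstance(x, MultiInput):
            outs = []
            for v in x.values:
                flat = v.reshape(-1, v.shape[-1])
                self.H += flat.T @ flat  # GPTQ-style Hessian estimate
                outs.append(self.linear(v))
            return MultiInput(outs)  # stays wrapped for downstream layers
        return self.linear(x)

# layer = LinearObserver(nn.Linear(16, 32))
# batch = MultiInput([torch.randn(1, 8, 16) for _ in range(4)])
# out = layer(batch)  # layer.H now sums X^T X over all 4 inputs
```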

MDK8888 (Author) commented Aug 22, 2024

Hey, I was able to fix the issue in GPTFast 0.3.1 - as it turns out, the linear layers in many transformer models actually have a bias :)
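
For context, this failure mode is easy to reproduce: if you quantize and pack the weight but never carry `nn.Linear.bias` over to the replacement module, every projection is shifted and generation quickly degenerates into repeated tokens. A minimal sketch of a bias-preserving swap (illustrative per-channel int8 weight-only quantization, not GPTFast's actual code):

```python
# Sketch of a weight-only quantized linear that keeps the original bias.
# Illustrative only: per-channel symmetric int8, not GPTFast's real code.
import torch
import torch.nn as nn

class QuantizedLinear(nn.Module):
    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()
        scale = (w.abs().amax(dim=1, keepdim=True) / 127.0).clamp(min=1e-8)
        self.register_buffer("w_int8", torch.round(w / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        # The fix: copy the bias across instead of silently dropping it.
        self.bias = None if linear.bias is None else nn.Parameter(
            linear.bias.detach().clone())

    def forward(self, x):
        w = self.w_int8.to(x.dtype) * self.scale.to(x.dtype)  # dequantize
        return nn.functional.linear(x, w, self.bias)

# lin = nn.Linear(64, 64)   # bias=True by default in PyTorch
# qlin = QuantizedLinear(lin)
# x = torch.randn(2, 64)
# print(torch.allclose(lin(x), qlin(x), atol=0.1))  # close, bias intact
```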

MDK8888 closed this as completed Aug 22, 2024