How to make quantised models work faster on CPU machines #132
-
The assumption is that quantised models are better suited to CPU machines than their non-quantised counterparts. So what governs the performance of quantised embedding models on these CPU machines, both during embedding generation and at inference time? The number of CPU cores, the amount of RAM, or both? Will a machine with more cores be able to process a higher number of queries in parallel with good performance, or does it also need more RAM to do that? Please advise.
Replies: 2 comments
-
Hey @TheRabidWolverine, a couple of things:
I hope this answers your questions. Please feel free to ask more follow-up questions!
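A rough way to see how much core count matters in practice is to time embedding generation at different thread counts. The sketch below assumes fastembed's TextEmbedding class (which runs quantised ONNX models on CPU via ONNX Runtime) and its `threads` argument; the model name, corpus size, and thread counts are illustrative only, not a recommendation:

```python
# Sketch: time quantised embedding generation at different thread counts.
# Assumes fastembed's TextEmbedding with a `threads` argument that sizes the
# ONNX Runtime intra-op thread pool; adjust names to your fastembed version.
import time

from fastembed import TextEmbedding

documents = ["How do CPU cores and RAM affect quantised embedding models?"] * 512

for threads in (1, 2, 4, 8):
    # Fresh instance per setting so the ONNX Runtime session is created
    # with the new thread-pool size.
    model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5", threads=threads)
    start = time.perf_counter()
    # embed() yields vectors lazily; materialise the generator to time the full run.
    _ = list(model.embed(documents))
    elapsed = time.perf_counter() - start
    print(f"{threads} thread(s): {elapsed:.2f} s")
```

Each iteration rebuilds the model on purpose: reusing one instance would keep the thread pool created with the first setting.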
-
Thanks Nirant. Do more cores necessarily make the performance faster?
Closed #132 as resolved.