Port optimizations from HEaaN.mlir paper #635
Comments
In the paper they also describe a potential optimization of modulo arithmetic by removing the use of a modulo operator (
becomes
We can avoid the potential division in the remainder operation and are left with only a runtime multiplication and bit shift, since the Barrett ratio can be computed statically. We could use this optimization directly in the current NTT lowering. Further, they describe the use of a data-flow analysis to be able to reduce the number of So I would propose the following steps to implement the paper's optimizations:
Looking for feedback on all aspects of the proposed solution, especially operation names.
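For anyone following along, here is a minimal Python sketch of the textbook Barrett reduction the comment above refers to. It is not taken from the paper or from the existing lowering: the function names `barrett_ratio` and `barrett_reduce`, the choice of `k` as the bit length of the modulus, and the example prime 7681 are all assumptions made for illustration. The point it shows is that the only division happens when computing the ratio, which depends only on the modulus and can therefore be done ahead of time.

```python
def barrett_ratio(q: int, k: int) -> int:
    """Precompute floor(2^(2k) / q). Depends only on q and k (hypothetical helper)."""
    assert 0 < q < (1 << k)
    return (1 << (2 * k)) // q


def barrett_reduce(x: int, q: int, k: int, ratio: int) -> int:
    """Return x mod q for 0 <= x < 2^(2k) without a runtime division."""
    assert 0 <= x < (1 << (2 * k))
    t = (x * ratio) >> (2 * k)  # estimate of floor(x / q), at most one too small
    r = x - t * q               # lies in [0, 2q)
    return r - q if r >= q else r  # one conditional subtraction normalizes r


# Assumed example values: 7681 is a small prime often used in NTT demos.
q = 7681
k = q.bit_length()
ratio = barrett_ratio(q, k)  # computed once, ahead of time

a, b = 1234, 5678
reduced = barrett_reduce(a * b, q, k, ratio)  # replaces (a * b) % q
```

The bound `x < 2^(2k)` covers a product of two already-reduced values, which is the case that matters inside an NTT butterfly. In a fixed-width setting the intermediate product `x * ratio` needs more bits than `x` itself, so the operand width would have to be chosen with that in mind.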
Nice! I'm excited to see the difference when applied to the NTT lowering :)
Just to check me: for computing the Barrett ratio, during
Hmm yes that's a good point. I'm a little curious how just an operation will play out. Do we need an attribute on the polynomial type itself to mark that it is normalized? Without it I would think that a
Another possibility I was considering while writing #675 is that we should ensure this invariant holds always. I didn't ultimately do it in that PR because I found some confusing behavior around remsi/remui (either BOTH operands are signed or BOTH are unsigned, which is wrong both ways if you have But we could consider that. I think @AlexanderViand-Intel should chime in since this would have to be compatible with polynomial ISA considerations. Otherwise I think this is a great plan.
Yes exactly, we can compute the ratio from half the operand bit-width and Hm good point. Thinking out loud: the optimizations would happen after We could use an encoding in the tensor to denote it is normalised wrt. some Another option would be to introduce the I can start now with adding the
https://dl.acm.org/doi/pdf/10.1145/3591228
The lowest-hanging fruit for us seems to be loop fusion passes, which we could apply to the polynomial dialect after ntt is lowered to affine in the mlir-polynomial-to-llvm pipeline.
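As a toy illustration of what loop fusion buys, the Python sketch below compares two elementwise passes over a coefficient list with a single fused pass. It says nothing about how the MLIR passes are implemented. The function names, the modulus 7681, and the scale-then-add workload are assumptions chosen only to keep the example small.

```python
Q = 7681  # assumed example modulus


def scale_then_add_unfused(xs, c, d):
    """Two loops: the first materializes an intermediate list, the second rereads it."""
    tmp = [(x * c) % Q for x in xs]
    return [(t + d) % Q for t in tmp]


def scale_then_add_fused(xs, c, d):
    """One loop: each element is scaled and shifted while it is still at hand."""
    return [((x * c) % Q + d) % Q for x in xs]


coeffs = [3, 7680, 42, 0, 1024]
unfused = scale_then_add_unfused(coeffs, 17, 5)
fused = scale_then_add_fused(coeffs, 17, 5)
```

Both versions compute the same values. The fused one avoids the intermediate buffer and the second traversal, which is the kind of saving one would hope for once the lowered loops sit next to each other.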