
add --convert-tensor-to-scalars pass #763

Draft · wants to merge 4 commits into main from detensorize
Conversation

@AlexanderViand-Intel (Collaborator) commented Jul 1, 2024

This is a helper pass to remove tensor.insert/extract (with constant indices) and (statically shaped) tensor<...> types from the IR by effectively "unrolling" tensor<axbx!element_type> into a TypeRange of a*b copies of !element_type.
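For illustration, here's a minimal before/after sketch of what the pass does (i32 stands in for the element type for readability; in the real use case the elements are polynomials, and the function is illustrative, not from this PR):

```mlir
// Before: straight-line tensor.extract ops with constant indices
// on a statically shaped tensor.
func.func @sum(%t : tensor<2xi32>) -> i32 {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %a = tensor.extract %t[%c0] : tensor<2xi32>
  %b = tensor.extract %t[%c1] : tensor<2xi32>
  %s = arith.addi %a, %b : i32
  return %s : i32
}

// After --convert-tensor-to-scalars (conceptually): the tensor<2xi32>
// argument has been unrolled into two individual i32 arguments,
// and the tensor.extract ops are gone.
func.func @sum(%a : i32, %b : i32) -> i32 {
  %s = arith.addi %a, %b : i32
  return %s : i32
}
```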

This is necessary for targeting RLWE HW accelerators, the first generation of which "understand" only polynomial operations and datatypes. Specifically, this is necessary because passes such as -bgv-to-polynomial produce polynomial ops on tensors (e.g., adding two standard ciphertexts lowers to a polynomial addition of two tensor<2x!polynomial.polynomial> values).
While -convert-elementwise-to-affine (see #524) lowers this to (loops over) polynomial operations on individual polynomial values (and the loops can be unrolled via -full-loop-unroll), the resulting IR still contains various tensor operations, primarily tensor.insert/extract.
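Schematically (again with i32 standing in for the polynomial element type), -convert-elementwise-to-affine produces something like the following, and -full-loop-unroll then turns the loop body into straight-line tensor.extract/insert ops with constant indices:

```mlir
// Illustrative sketch, not actual pass output.
func.func @elementwise_add(%a : tensor<2xi32>, %b : tensor<2xi32>) -> tensor<2xi32> {
  %init = tensor.empty() : tensor<2xi32>
  // The elementwise op has become a loop of scalar ops, glued
  // together by tensor.extract/tensor.insert traffic.
  %out = affine.for %i = 0 to 2 iter_args(%acc = %init) -> (tensor<2xi32>) {
    %x = tensor.extract %a[%i] : tensor<2xi32>
    %y = tensor.extract %b[%i] : tensor<2xi32>
    %s = arith.addi %x, %y : i32
    %next = tensor.insert %s into %acc[%i] : tensor<2xi32>
    affine.yield %next : tensor<2xi32>
  }
  return %out : tensor<2xi32>
}
```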

This PR introduces a simple pass that replaces statically shaped tensor types with a TypeRange of dim1*dim2*... copies of the element type, via the OneToNTypeConversion framework. Note: this framework apparently exists because the standard DialectConversion and its associated TypeConverter are very broken when it comes to handling 1->N type conversions (as I found out the hard way).

In addition to the type conversion itself, there are also patterns that translate tensor operations to corresponding operations on the ValueRange, but the list is fairly incomplete right now: there are only patterns for tensor.from_elements and tensor.insert. Surprisingly, this is already enough for my primary use case when combined with folding, because left-over ValueRanges are materialized as tensor.from_elements, and folding then takes care of the corresponding tensor.extract ops. However, I'd like to make this more robust and actually support any tensor op that can conceptually be "unrolled" this way.
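Concretely, the from_elements/extract folding interaction mentioned above looks like this (sketch with i32 elements):

```mlir
func.func @fold_example(%x : i32, %y : i32) -> i32 {
  // A left-over ValueRange is materialized as tensor.from_elements ...
  %t = tensor.from_elements %x, %y : tensor<2xi32>
  %c1 = arith.constant 1 : index
  // ... and tensor.extract at a constant index folds directly to %y,
  // so after folding no tensor ops remain.
  %e = tensor.extract %t[%c1] : tensor<2xi32>
  return %e : i32
}
```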

I'm posting this as a draft because I'd love suggestions for test cases beyond what the -bgv-to-polynomial -convert-elementwise-to-affine -full-loop-unroll pipeline produces, especially since I might want to propose #524 and this pass for upstream inclusion at some point.

Open ToDos for this PR before it's ready-for-review:

  • Add (a wide range of) tests (both HE-pipeline-derived and generic)
  • Handle additional ops (e.g. tensor.extract_slice/tensor.insert_slice)
  • Add a cut-off size as a pass option with a sane default, to prevent very large tensors from being "unrolled" accidentally (in our use case, tensors will be of single-digit sizes)

Related ToDos:

  • Address polynomial.ntt/intt/mul_scalar, which aren't ElementwiseMappable and are therefore not handled by -convert-elementwise-to-affine (PolyToStandard: handling tensors of poly? #143)
  • Write a tutorial on 1->N type conversions, in the hopes of saving others from having to discover all the DialectConversion bugs/TODOs/FIXMEs around this the hard way.

@j2kun (Collaborator) commented Jul 5, 2024

> This is necessary for targeting RLWE HW accelerators

Explicitly, this means it has no control flow?

@AlexanderViand-Intel (Collaborator, Author) replied:

> > This is necessary for targeting RLWE HW accelerators
>
> Explicitly, this means it has no control flow?

Yes, this was something that came up at the HW summit in Leuven last month: at this point, most of the accelerators don't seem to have a native sense of control flow, expecting the host to supply a simple stream of instructions.

@AlexanderViand-Intel (Collaborator, Author) commented Jul 5, 2024

> Related ToDos:

That turned out to be easier than expected, see PR here: #769
EDIT: celebrated a bit too early, as this only deals with ops like mul_scalar (which is ElementwiseMappable) but not ntt/intt.
EDIT2: Actually, maybe this does already solve it! We can probably work around this through the right pass ordering, as ntt/intt are added in a separate pass 😄

  1. --bgv-to-polynomial
  2. --convert-elementwise-to-affine
  3. --convert-polynomial-mul-to-ntt
  4. --canonicalize (once #749, "Fix upstream polynomial canonicalization rules", is resolved)
  5. --full-loop-unroll
  6. --convert-tensor-to-scalars
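As a single invocation this would look something like the following (a sketch; input.mlir is a placeholder, and the exact flag registration in heir-opt may differ):

```
heir-opt input.mlir \
  --bgv-to-polynomial \
  --convert-elementwise-to-affine \
  --convert-polynomial-mul-to-ntt \
  --canonicalize \
  --full-loop-unroll \
  --convert-tensor-to-scalars
```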

AlexanderViand-Intel force-pushed the detensorize branch 10 times, most recently from 7876f1a to e600f01 on July 11, 2024