-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PolyToStandard: handling tensors of poly? #143
Comments
@j2kun very kindly reminded me we can just multi-dim the tensors |
The |
Yep :) multi-dim it is. Haha it totally slipped my mind. I'll update with some code, I believe I need to add a type converter for ranked tensors, and then handle some tensor ops. What's really strange is there's no interface pattern for this :/ which makes it really weird. Even if the type converter exists, I need to add support for converting ops like |
Let's chat a bit about the type conversion when you get back to this. I don't think the type conversion should be particularly hard... |
From our HEIR meeting today I suggested we should be able to find a way to use upstream passes to map elementwise operations to loops, perhaps via linalg. @AlexanderViand-Intel could you post your sample IR? |
Sure, here's a minimal example, something like this also appears when lowering func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
%0 = polynomial.add(%arg0, %arg1) : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
} Trying to lower this with Since the whole idea of traits like #map = affine_map<(d0) -> (d0)>
func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
%0 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%arg0, %arg1 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) outs(%arg0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) {
^bb0(%in: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, %in_0: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, %out: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>):
%1 = polynomial.add(%in, %in_0) : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
linalg.yield %1 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
} -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
} However, from here I'm a bit stuck, none of the passes I tried would further simplify this in a meaningful way (e.g., convert it to loop). Note that applying #map = affine_map<(d0) -> (d0)>
func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
%0 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%arg0, %arg1 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) outs(%arg0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) {
^bb0(%in: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, %in_0: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, %out: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>):
%1 = builtin.unrealized_conversion_cast %in : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
%2 = builtin.unrealized_conversion_cast %in_0 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
%cst = arith.constant dense<33538049> : tensor<1024xi26>
%3 = arith.extsi %1 : tensor<1024xi25> to tensor<1024xi26>
%4 = arith.extsi %2 : tensor<1024xi25> to tensor<1024xi26>
%5 = arith.addi %3, %4 : tensor<1024xi26>
%6 = arith.remsi %5, %cst : tensor<1024xi26>
%7 = arith.trunci %6 : tensor<1024xi26> to tensor<1024xi25>
%8 = builtin.unrealized_conversion_cast %7 : tensor<1024xi25> to !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
linalg.yield %8 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
} -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
} From here, we could think about how these #map = affine_map<(d0) -> (d0)>
func.func @test_bin_ops(%arg0: tensor<2x1024xi25>, %arg1: tensor<2x1024xi25>) -> tensor<2x1024xi25> {
%0 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%arg0, %arg1 : tensor<2x1024xi25>, tensor<2x1024xi25>) outs(%arg0 : tensor<2x1024xi25>) {
^bb0(%in: tensor<1024xi25>, %in_0: tensor<1024xi25>, %out: tensor<1024xi25>):
%cst = arith.constant dense<33538049> : tensor<1024xi26>
%3 = arith.extsi %in : tensor<1024xi25> to tensor<1024xi26>
%4 = arith.extsi %in_0 : tensor<1024xi25> to tensor<1024xi26>
%5 = arith.addi %3, %4 : tensor<1024xi26>
%6 = arith.remsi %5, %cst : tensor<1024xi26>
%7 = arith.trunci %6 : tensor<1024xi26> to tensor<1024xi25>
linalg.yield %7 : tensor<1024xi25>
} -> tensor<2x1024xi25>
return %0 : tensor<2x1024xi25>
} Unfortunately, this doesn't work because |
I think if we had a pass that converted elementwise mappable things to #map = affine_map<(d0) -> (d0)>
func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
%0 = affine.for %i = 0 to 2 iter_args(%a = %arg0) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
%in = tensor.extract %arg0[%i] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
%in_0 = tensor.extract %arg1[%i] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
%1 = polynomial.add(%in, %in_0) : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
%t = tensor.insert %1 into %a[%i] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
affine.yield %t : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
}
return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
} We could then handle the type conversion gracefully with multi-dimensional tensors. I haven't yet found a pass that would do this (either going through |
@AlexanderViand-Intel #504 should show you how to convert to affine loops, though it still runs into the problem of a failed type conversion in |
Even my "manually converted" affine loop thing above runs into failed type conversion issues, which I think is because the type converter doesn't realize that it can convert from |
I see what you're saying, but I would expect it to just type-convert the content of the memref opaquely. Maybe the problem is that you can't have a memref of tensors? Anyhow, if you have time to root-cause that type conversion issue, that'd be helpful. Otherwise as least we have a clear path to convert elementwise to loops now :) |
I don't think it's memref related, because lowering my example above (no memrefs, just affine.for) with the func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
%0 = affine.for %arg2 = 0 to 2 iter_args(%arg3 = %arg0) -> (tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) {
%extracted = tensor.extract %arg0[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
%1 = builtin.unrealized_conversion_cast %extracted : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
%extracted_0 = tensor.extract %arg1[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
%2 = builtin.unrealized_conversion_cast %extracted_0 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
%cst = arith.constant dense<33538049> : tensor<1024xi26>
%3 = arith.extsi %1 : tensor<1024xi25> to tensor<1024xi26>
%4 = arith.extsi %2 : tensor<1024xi25> to tensor<1024xi26>
%5 = arith.addi %3, %4 : tensor<1024xi26>
%6 = arith.remsi %5, %cst : tensor<1024xi26>
%7 = arith.trunci %6 : tensor<1024xi26> to tensor<1024xi25>
%8 = builtin.unrealized_conversion_cast %7 : tensor<1024xi25> to !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
%inserted = tensor.insert %8 into %arg3[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
affine.yield %inserted : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
}
return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
} If I add an explicit typeconversion then we get a bit closer (note the func has been handled properly): func.func @test_bin_ops(%arg0: tensor<1024x2xi25>, %arg1: tensor<1024x2xi25>) -> tensor<1024x2xi25> {
%0 = builtin.unrealized_conversion_cast %arg1 : tensor<1024x2xi25> to tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
%1 = builtin.unrealized_conversion_cast %arg0 : tensor<1024x2xi25> to tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
%2 = affine.for %arg2 = 0 to 2 iter_args(%arg3 = %1) -> (tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) {
%extracted = tensor.extract %1[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
%4 = builtin.unrealized_conversion_cast %extracted : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
%extracted_0 = tensor.extract %0[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
%5 = builtin.unrealized_conversion_cast %extracted_0 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
%cst = arith.constant dense<33538049> : tensor<1024xi26>
%6 = arith.extsi %4 : tensor<1024xi25> to tensor<1024xi26>
%7 = arith.extsi %5 : tensor<1024xi25> to tensor<1024xi26>
%8 = arith.addi %6, %7 : tensor<1024xi26>
%9 = arith.remsi %8, %cst : tensor<1024xi26>
%10 = arith.trunci %9 : tensor<1024xi26> to tensor<1024xi25>
%11 = builtin.unrealized_conversion_cast %10 : tensor<1024xi25> to !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
%inserted = tensor.insert %11 into %arg3[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
affine.yield %inserted : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
}
%3 = builtin.unrealized_conversion_cast %2 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> to tensor<1024x2xi25>
return %3 : tensor<1024x2xi25>
} but it's still not quite there, because the |
I think I have an idea of how to handle this (and also #505), WIP here. It nearly works, but right now only with PS: this seems like it would pop up nearly as frequently as the |
The "tensor of tensor" PR (#508) lays the groundwork for supporting poly operations on tensors, but it's not a full end-to-end solution because it doesn't support the code generated by the Instead of adding "memref of memref" helpers and then trying to figure out how to elegantly handle the raising from |
Reopening just to clean up the remaining comments |
Digging this back up, as I noticed (as part of starting on #559) that #ring = #polynomial.ring<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>
%0= polynomial.ntt %some_poly : !polynomial.polynomial<#ring> -> tensor<1024xi64, #ring> // normal NTT
%1= polynomial.ntt %some_poly_tensor : tensor<2x!polynomial.polynomial<#ring>> -> tensor<2x1024xi64, #ring> // tensor NTT and so the ElementwiseMappable verifier will complain that "all non-scalar operands/results must have the same shape and base type". In the future, it might be interesting to start a conversation upstream about the possibility of expanding ElementwiseMappable, but for now I'll just avoid adding the trait to the op and instead add a todo for custom patterns for |
I'm re-opening this issue, as the problem mentioned above appeared again as part of #763 and therefore needs attention. |
@mr0re1 brought up a great review comment about lowering tensors of poly's to standard. https://github.com/google/heir/pull/134/files#r1312057016
Tensors of polys arise from BGV lowerings (and likely all other scheme lowerings to poly). When lowering to standard, we convert poly's to a tensor of ints.
Tensors of tensors are disallowed in MLIR - they are invalid tensor element types.
We have a few options here:
(1) Flatten the tensor: e.g. if we have a tensor polynomials, each of which are lowered to
tensor<1024xi64>
, then a size 2 tensor would becometensor<2048xi64>
. However, this would appear to severely complicate the lowering math.(2) Detensorize/lower to linalg and then lower to standard? E.g. inputs to
PolyToStandard
would not includetensors
.(3) It feels like I'm missing something - surely type conversion with tensors has been done before?
The text was updated successfully, but these errors were encountered: