-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding vector dialect support to Calyx? #1777
Comments
Hi, @medbzkst—thanks for the super interesting suggestion! I will freely admit I'm no expert on either vector nor the SCFToCalyx pass infrastructure. However, the prospect of lowering data-parallel MLIR ops to hardware seems totally tractable. The main question I'd have is: can this be accomplished (perhaps in a lower-performance way) with existing MLIR passes? For example, there is VectorToSCF. Does it suffice, for correctness only, to use this and then lower the resulting vector-free code to Calyx? I'd be super interested to know if this causes any problems. I expect that to generate not-great hardware, but it would be extremely useful as a baseline. It's not clear to me whether a better lowering would go directly from vector to Calyx or if some intermediate point would make more sense, but I'd be willing to believe that the direct route is good. |
It occurs to me to mention @jiahanxie353, who has been ramping up on the CIRCT-related Calyx infrastructure. Is this something you'd have an opinion about, or even be interested in contributing to? |
Hi @medbzkst thanks for bringing up this super interesting conversion! While I'm not an expert on vector either, my current understanding is that lowering from Vector to SCF and then from SCF to Calyx seems more reasonable. The reason is SCF's
I'm definitely interested! I can delve more into it and return to the thread with some insights once I've formulated some thoughts. |
Thank you for the feedback @sampsyo and @jiahanxie353. Indeed the first idea is to try to use the available infrastructure, and intuitively it would be a great baseline for correctness and performance. However, I came across some problems while I was trying to do so. The VectorToScf pass seems to be not as generic as it sounds. From the top of the source code file you can see the comment "This file implements lowering of vector transfer operations to SCF.". So it is precisely there to lower the transfer operations. Even if I give it a run on a very simple MLIR module that receives 2 vectors as input, sum them up, and return the output, it does not behave as expected in the sense that it does not convert that to an SCF for or while loop even by setting the You can give it a try if you want on this MLIR module:
By running Now the idea was, let's convert that manually (by an external script as a temporary solution before implementing a pass for that) to the desired loop implementation in SCF. The problem happened with those operations that iterate and check a condition, i.e. the masked operations. For example a gather operation if thought of as a loop, it would require checking whether to grab an element from memory or skip it and use a pass-through element. Even less sophisticated than that, a maskedload operation would do the same. The problem here is that the loop will have more than one control block which MLIR does not seem to be happy with using the while loop (perhaps that's why there is no pass available that does that?). Using the for loop it would also be problematic because those operations yield nothing, but at least it is at the level of CIRCT here. Finally, the last chance was to lower everything to CF and lower from there to Calyx as it will be just about a bunch of blocks with some controls. However, this means having backedges in the CFG since we have loops, which is not lowerable so far. For those reasons, I believe there should be a way to have a VectorToCalyx pass. Most especially when dealing with the masked operations, it would be probably more optimal to decide on how to connect components at the pass level and have a direct Calyx code optimised for that. |
Intriguing. I think it might be useful, if we were to go down the "direct lowering" road, to try hand-writing some Calyx programs that perform data-parallel vector-ish operations and see how those look… |
@medbzkst another possible place to look is #1585. @xerpi implemented a Halide-to-Calyx compiler that implements lowering for vector operations to Calyx but it wasn't merged into CIRCT. Perhaps that's a good place to start? Also, if you have questions about Calyx, we can add you to the Calyx slack and directly answer questions there as well. If you'd like to join, please send me an email. |
Thanks @medbzkst for the detailed issue description!
This is surprising... Anyway, I think we should still keep VectorToSCF pass. Because I think structured control flow is just very important for vector operations. And I hope VectorToSCF is still under development, so they will eventually support full functionalities? Otherwise it's so surprising that they couldn't event support vectors add...
With this manual temporary solution of basic case of At the end of the day, I would imagine we can have code like: func.func @maskedload(%base_vector: memref<?xf32>, %mask: memref<8xi1>, %pass_thru: memref<8xf32>) -> memref<8xf32> {
%result = scf.for %index = 0 to %size step 1 {
%mask_val = load %mask[%index]
%cond = cmpi eq %mask_val, %c1
%result_next = scf.if %cond {
%val = load %base_vector[%index]
scf.yield %val
} else {
%val = load %pass_thru[%index]
scf.yield %val
}
scf.yield %result_next
}
return %result
} So as long as we support lowering
Agree with these comments. Given that vector is already pretty low level, we can expose vectors to Calyx directly and exploit parallelisms there. Summing up, the solution I come up with is using a mix of SCF and Calyx, where SCF is in charge of the control flow, and Calyx to exploit "data-parallel vector-ish operations". And I can
Please let me know what you think |
Hey @medbzkst! I'm going to close this issue but please feel free to re-open this or a new issue if you'd like to discuss another problem/need help with something! |
I am targeting the lowering of MLIR codes that contain functions that receive memrefs as input, transform them into vectors (using load, maskedload, or gather operations), perform several operations using vector dialect, and then spit out a memref as an output (using store, maskedstore, or scatter operation). The thing is that CIRCT does not support lowering vector dialect operations to Calyx. However, vector dialect is an abstraction that can be directly concretised into hardware. One way to see that Calyx is suitable for that is the Dahlia and the systolic array generator frontends.
So what about generating hardware that interfaces with memories (memrefs as inputs and outputs) and performs all the required vector operations as they are instructed in vector operations? Maybe a conversion pass
vector-to-calyx
?The text was updated successfully, but these errors were encountered: