[Question]: About the support for efficient Winograd convolution in SPU #738
Comments
Hi @anakinxc Thanks for your quick response! I have currently implemented the Winograd conv class in Flax in the SPU frontend, but the comm improvement is far from expectations. Should I instead re-implement the Winograd convolution directly in the libspu/kernel/hlo/convolution.cc you mentioned here? If so, for a model definition such as ResNet18 in the SPU frontend, should I keep the conv layers defined as regular nn.Conv, or something else? Thanks. |
Yes, regular conv should be fine. |
Hi @anakinxc. Thanks for your response and for keeping this issue active. In recent days I have been trying to implement the C++ backend op for Winograd in SPU, but I am running into some difficulties because of my unfamiliarity with C++, so I would sincerely appreciate your guidance and assistance. Apologies in advance if these are trivial questions. As previously discussed, I have re-implemented the convolution in libspu/kernel/hlo/convolution.cc, closely following the python/torch version here. The modified convolution.cc is:
Please note that I only convert 3x3 convs into the Winograd version; the others are kept as regular convs. The input/output tile sizes are fixed to 4/2 respectively (a plaintext sketch of this tile computation is given after this comment). This implementation is a one-to-one port of the torch version, but it didn't work. I have some issues:
SPU is a grand and complex project, so it is quite difficult for me to fully understand it. If possible, could you please describe the modifications required for the above code in as much detail as possible? Thank you very much. |
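For reference, here is a minimal plaintext NumPy sketch of the F(2x2, 3x3) tile computation described above (4x4 input tile, 2x2 output tile), using the standard Lavin-Gray transform matrices. It only illustrates the algorithm in the clear; it is not the modified convolution.cc.

```python
# F(2x2, 3x3) Winograd convolution for a single 4x4 input tile d and
# 3x3 kernel g (standard Lavin-Gray matrices). Illustration only.
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G   = np.array([[1.0,  0.0, 0.0],
                [0.5,  0.5, 0.5],
                [0.5, -0.5, 0.5],
                [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(d, g):
    """Y = A^T [ (G g G^T) * (B^T d B) ] A  -> the 2x2 output tile."""
    U = G @ g @ G.T         # kernel transform (can be precomputed offline)
    V = B_T @ d @ B_T.T     # input transform
    M = U * V               # element-wise product replaces the sliding dot products
    return A_T @ M @ A_T.T  # output transform

# Sanity check against a direct 3x3 valid convolution on the same tile.
rng = np.random.default_rng(0)
d, g = rng.random((4, 4)), rng.random((3, 3))
ref = np.array([[(d[i:i+3, j:j+3] * g).sum() for j in range(2)] for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), ref)
```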
Hi @warpoons Nice work! Thanks. |
Hi @warpoons, SPU [...] |
Hi @anakinxc. Thanks for your response!
Could you please provide a specific example of the definition of one of the Winograd transformation matrices [...]?
Thanks!
Does this mean that [...]?
Hi @tpppppub. Thanks for your response! Would you mind taking a look at whether the following usage of [...]
|
Your understanding of [...]
|
Hi @tpppppub. Thanks for your quick and detailed response! I have modified the [...]
After modifying the [...], a compilation error occurred (pasted below).
Would you mind taking a look at what went wrong there and possible solutions? Thanks! :) |
libspu/kernel/hlo/convolution.cc:62:41: error: invalid initialization of reference of type 'const spu::Value&' from expression of type 'long int'
|
Hi @tpppppub. Thanks for pointing that out! In the modified version, I have used [...]
Would you mind taking a look at what went wrong here and possible solutions? Sorry for taking your time. Thanks! :) |
matmul only supports <= 2D dots. Seems you are doing a 4D dot. |
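To illustrate the distinction in plain NumPy (this mirrors the semantics, not SPU's API): a 2D matmul contracts a single pair of axes, so a 4D operand has to be either reshaped down to 2D first or contracted directly with a tensordot.

```python
import numpy as np

a = np.random.rand(8, 16, 4, 4)   # e.g. (batch, channel, tile_h, tile_w)
b = np.random.rand(4, 4)

# 2D matmul only: flatten the leading axes, multiply, restore the shape.
flat = a.reshape(-1, 4) @ b                 # (8*16*4, 4) @ (4, 4)
out1 = flat.reshape(8, 16, 4, 4)

# The same contraction expressed directly on the 4D tensor.
out2 = np.tensordot(a, b, axes=([3], [0]))  # contract last axis of a with first of b
assert np.allclose(out1, out2)
```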
Hi @anakinxc. Thanks for your reminder! I realized that matmul does not fit the 4D case, so I switched to hal::tensordot.
But the profile of ResNet18 on CIFAR-100 was still weird:
It can be noted that the comms of [...] Additionally, I noted that the Winograd conv uses three hal::tensordot calls, two for the input/kernel transformations and one for the GEMM, whereas the regular conv uses hal::tensordot only once. Note that for a fixed model with known weights, the kernel transformation can be done offline. Therefore, I replaced the hal::tensordot in the kernel transformation with [...]
The total comm decreased slightly but is still far from expectations. So confusing! Would you mind taking a look at this, and are there any suggestions? BTW, I used 2PC.json with "protocol": "SEMI2K", "field": "FM64" in this test. Thanks! :) |
Hi @warpoons Please try to benchmark with [...]. Looks like your implementation requires 7 tensordots. A tensordot with non-integer inputs will introduce a trunc. One thing you can try is to reduce the number of tensordots. |
Hi @anakinxc. Thanks for your suggestion! I have benchmarked with [...]
There seems to be no difference here. As for the number of tensordot calls: the input transform B^T d B takes 2, the kernel transform G g G^T takes 2, the GEMM takes 1, and the output transform A^T m A takes 2.
Therefore, there are in total 2+2+1+2 = 7 tensordots for one Winograd conv. This might be the reason for the high trunc and total comm. I have also tried merging all the tensordots into one expression, but the profile didn't change. Are there any feasible optimizations? Thanks! :) |
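One hedged way to reason about the truncation count (using the standard fixed-point rule that only multiplications which raise the fixed-point scale need a trunc; whether SPU applies exactly this rule is an assumption): B^T and A^T contain only integers, so those four tensordots should not need truncation in principle, and the G transform touches only the public weights and can run offline. That would leave the GEMM as the only step that inherently needs a trunc. A toy tally under those assumptions:

```python
# Hypothetical trunc tally for one F(2x2,3x3) Winograd conv, assuming only
# multiplies that raise the fixed-point scale require a truncation.
steps = [
    ("input transform  B^T d B", 2, "public integer matrix",   0),
    ("kernel transform G g G^T", 2, "offline, public weights", 0),
    ("GEMM U * V",               1, "secret x fixed-point",    1),
    ("output transform A^T m A", 2, "public integer matrix",   0),
]
total_dots  = sum(n for _, n, _, _ in steps)
total_trunc = sum(n * t for _, n, _, t in steps)
print(f"tensordots: {total_dots}, truncations needed in principle: {total_trunc}")
# -> tensordots: 7, truncations needed in principle: 1
```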
Merging into one expression does not reduce the number of calls. Trying to figure out how many truncations you actually need might be a better idea. |
Thank you @anakinxc. As you said earlier, matmul is suitable for 2D dots while tensordot supports >2D dots. An additional question: does matmul also introduce 1 trunc, just like tensordot? |
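For background on where the trunc comes from in either case (standard fixed-point arithmetic; the fractional-bit count below is an illustrative choice, not SPU's): multiplying two fixed-point values doubles the scale, and the single truncation is what brings it back down, independent of which op issued the multiply.

```python
# Plain-integer simulation of fixed-point multiply-then-truncate.
F = 18                                   # fractional bits (illustrative)
def enc(x): return round(x * (1 << F))   # real -> fixed point
def dec(v): return v / (1 << F)          # fixed point -> real

x, y = enc(1.5), enc(2.25)
raw   = x * y                            # scale is now 2^(2F)
trunc = raw >> F                         # the one trunc that follows the multiply
assert abs(dec(trunc) - 1.5 * 2.25) < 1e-4
```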
Thanks for the patient guidance and discussion. My problem has been solved, and this issue can be closed as completed. Thanks, all! |
Issue Type
Build/Install
Modules Involved
MPC protocol
Have you reproduced the bug with SPU HEAD?
Yes
Have you searched existing issues?
Yes
SPU Version
spu 0.9.0.dev20240311
OS Platform and Distribution
Ubuntu 18.04.6 LTS by WSL
Python Version
3.10
Compiler Version
GCC 11.3.0
Current Behavior?
Hi dear SPU team,
In recent weeks I have been working on using SPU to evaluate the private inference cost of Winograd-based ConvNets; Winograd reduces the number of multiplications and thus promises a lighter comm cost.
BTW, using Winograd for more efficient PPML is an emerging direction and has been shown to improve comm efficiency in several recent works.
But I found that the reduction of multiplications does NOT actually decrease the comm in SPU. For a single conv layer, the respective comms are: standard conv: 759296 bytes; Winograd-EWMM (element-wise matmul) conv: 6291456 bytes; Winograd-GEMM (general matmul) conv: 3964928 bytes. That is, the Winograd variants increase the comm by 8.28x and 5.22x, respectively, and the increase mainly comes from the frequent invocation of truncations. For details please see here.
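For context on the saving one would naively expect (standard Winograd arithmetic, not SPU-specific): F(2x2, 3x3) computes a 2x2 output tile with one multiplication per element of the 4x4 transformed tile, versus 9 multiplications per output for the direct method.

```python
m, r = 2, 3                    # output tile size, kernel size
direct   = (m * m) * (r * r)   # 4 outputs x 9 mults each = 36
winograd = (m + r - 1) ** 2    # 4x4 transformed tile = 16 mults
print(direct / winograd)       # 2.25x fewer multiplications per tile
```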
So if I now want to modify the C++ backend op for Winograd in SPU, could you tell me where I should start and which part of SPU I should look at first? Thanks! (If possible, I am willing to contribute Winograd support to the SPU community.)
Standalone code to reproduce the issue
Relevant log output