DNN fallback CPU ops #226 #780
Conversation
Thank you for your contribution @WangYuyao
The code has a few issues integrating with its CUDA counterparts; I would deal with these when integrating your efforts. Unfortunately, the test cases you introduced are failing (locally and in the GitHub CI). I assume they passed when you tested before submitting the PR?
Yes, all the test cases were run before submitting, and they all passed on my machine.
Hi @corepointer, thanks for looking into this PR! I've run the test suite locally on my machine yesterday and all tests passed (both in the …). Thanks for pointing out that there are inconsistencies with the CUDA counterparts. Sure, it would be great if you could address them or give @WangYuyao some hints on how to fix those himself. @WangYuyao told me that he would be happy to continue working on this contribution over the summer to increase its value for DAPHNE.
Hi @WangYuyao & @pdamme,
Dear @corepointer, the commit with ID fe41fef is the implementation of the DNN CPU ops, which passed on both my machine and @pdamme's. The later commits are my attempt to implement a DNN example in DaphneDSL, including adding the respective DaphneDSL built-in functions, registering the DNN CPU kernels, etc. However, I failed in this part: for example, the kernel function max_pooling returned the message "std::bad_array_new_length", and I am now trying to debug it.
Hi! [1] https://en.cppreference.com/w/cpp/memory/new/bad_array_new_length
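For context on the exception mentioned above: std::bad_array_new_length is thrown when the length expression of a `new[]` turns out negative or exceeds the implementation limit [1]. Below is a minimal sketch of how a pooling-style output-size computation can trigger it; the formula and variable names are purely illustrative and not taken from the DAPHNE kernels:

```cpp
#include <iostream>
#include <new>

int main() {
    // Illustrative pooling-style output-size formula: if the pooling window
    // is larger than the input, the computed length goes negative.
    long long in = 2, window = 5, stride = 1;
    long long outSize = (in - window) / stride + 1;   // = -2 here

    try {
        int * buf = new int[outSize];   // throws std::bad_array_new_length
        delete[] buf;
    }
    catch(const std::bad_array_new_length & e) {
        std::cerr << "caught: " << e.what() << '\n';
    }
    return 0;
}
```

Printing the computed output dimensions of max_pooling right before the allocation is a quick way to check for this kind of underflow.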
Thank you for your reply; I will try to fix my code following your guidance! Additionally, would it be convenient for you to communicate via email? Thank you for your patience!
Hi!
- All of the DNN-related kernel test cases defined their own utility function "check".
- For all other kernels, we typically call this function "checkXYZ", whereby XYZ is the kernel name, thereby ensuring a unique function name (see the sketch below).
- Calling the function "check" for every kernel looks correct (the unit tests are separate cpp files and, thus, should be separate compilation units).
- However, it seems like that caused some kind of undefined behavior.
- Perhaps this behavior is related to the way catch2 handles the test case cpp files? I can't really tell the reason.
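One plausible explanation for the behavior described above (a sketch, not a diagnosis of the actual DAPHNE test code): if two test .cpp files linked into the same test binary both define a non-static, non-inline function with the same name but different bodies, the one-definition rule is violated and the linker may silently use one body for both files. Renaming the helper per kernel, or giving it internal linkage, avoids the collision. The kernel name, the simple vector-based check, and the catch2 include path below are assumptions:

```cpp
// MaxPoolingTest.cpp (sketch of one test translation unit)
#include <catch2/catch.hpp>   // adjust to the catch2 version/include layout in use
#include <cstddef>
#include <vector>

// Unique name per test file ("checkMaxPooling" instead of a shared "check");
// `static` additionally gives it internal linkage, so an identically named
// helper in another .cpp file cannot collide with it.
static void checkMaxPooling(const std::vector<double> & exp,
                            const std::vector<double> & res) {
    REQUIRE(exp.size() == res.size());
    for(std::size_t i = 0; i < exp.size(); i++)
        CHECK(exp[i] == Approx(res[i]));
}

TEST_CASE("max_pooling produces the expected result", "[dnn][max_pooling]") {
    std::vector<double> expected = {4.0, 6.0};
    std::vector<double> result   = {4.0, 6.0};   // stand-in for the kernel output
    checkMaxPooling(expected, result);
}
```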
I must correct myself. I noticed that I had commented out the new unit test cases for the DNN-related kernels when I ran the tests... When I don't comment out the test cases, they fail. However, I think I have found the cause. It's not directly related to the kernels, but to a subtle issue with the test case code itself: each of those test cases had a little helper function, and those were all called "check". However, now the CI tests fail with a different problem, already in the "Initialize Containers" step, i.e., unrelated to the DAPHNE source code.
@corepointer: Do you have an idea what is going wrong there?
If the non-unique name …
Sorry, just hit the wrong button. This PR is still open. |
Dear @corepointer, I found 2 ways of defining kernels. One uses a namespace and the other follows the pattern "struct for partial template specialization - convenience function - (partial) template specializations for different data/value types". Is there any difference between them? Additionally, in the namespace-based approach I found a namespace function named 'Forward'. Does this mean that originally there should also be a function named 'Backward' to implement the backward ops of the DNN, and that, correspondingly, the DNN functions in DaphneDSL will look like 'Conv2d.forward' and 'Conv2d.backward'? Similarly, to integrate both GPU and CPU versions of the DNN ops, what should the functions look like in DaphneDSL? Will there be an argument like "use_cuda == 1", or a namespace function like 'conv2d.cuda' vs. 'conv2d', to distinguish which version is used? Thank you for your guidance!
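For readers unfamiliar with the second pattern mentioned above, here is a minimal sketch of the "struct for partial template specialization - convenience function - (partial) template specializations" layout. The names MyOp/myOp and the Matrix placeholder type are illustrative, not the actual DAPHNE code. The practical reason for the struct is that C++ function templates cannot be partially specialized, while class templates can, so the struct lets a kernel be specialized per data type while staying generic in the value type:

```cpp
#include <cstddef>

// Placeholder standing in for a runtime data type such as a dense matrix.
template<typename VT>
struct Matrix {
    std::size_t numRows, numCols;
    VT * values;
};

// 1) Struct for partial template specialization: the primary template is
//    deleted so that unsupported type combinations fail to compile.
template<class DTRes, class DTArg>
struct MyOp {
    static void apply(DTRes *& res, const DTArg * arg) = delete;
};

// 2) Convenience function: call sites use this instead of naming the struct.
template<class DTRes, class DTArg>
void myOp(DTRes *& res, const DTArg * arg) {
    MyOp<DTRes, DTArg>::apply(res, arg);
}

// 3) Partial specialization for one data type, still generic in the value type VT.
template<typename VT>
struct MyOp<Matrix<VT>, Matrix<VT>> {
    static void apply(Matrix<VT> *& res, const Matrix<VT> * arg) {
        // ... the actual kernel logic would go here ...
        res = new Matrix<VT>{arg->numRows, arg->numCols, nullptr};
    }
};

int main() {
    Matrix<double> arg{2, 3, nullptr};
    Matrix<double> * res = nullptr;
    myOp(res, &arg);   // dispatches to the Matrix<double> specialization
    delete res;
    return 0;
}
```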
Dear @WangYuyao,

In the namespace of a kernel, e.g., …

In the DSL I went for wrapper functions. But that's not merged to main yet and lives in the branch of PR #734. This is also how it's done in SystemDS and allows encapsulating utility code like …

The decision whether the CPU or the GPU version is called will primarily be made by the DAPHNE compiler. At the moment, the compiler pass that marks operations as GPU ops is quite primitive. But if an operation has the CUDASupport trait and the input is adequately sized, then the compiler will generate a GPU instruction instead of a CPU instruction. But only if you compiled with CUDA support and run with the …

hth, Mark
Dear @corepointer, thank you for your time!
Hi, sorry for the delayed answer. Two things caught my eye: …
hth, Mark
Dear @corepointer, the implementation of the BatchNormBackward GPU op and the corresponding test case is finished. I found that there is no implementation of it in the main repository, and it would be my pleasure if mine could become part of #734, provided it works correctly. Besides, I noticed that there are conflicts in 'kernels.json'; could you please give me some guidance on how to fix them? Thank you for your help!
As I mentioned in my email, I fixed the issue in the course of rebasing your commits on top of main. So this will also cause conflicts. Maybe we can fix this next week.
Gladly!
Thank you for your effort @WangYuyao 👍 After rearranging some things and some cleanup, I merged your contribution to the main branch. I did, however, not force push to your fork's main branch. This is why this PR is now marked as closed and not merged.
AMLS Exercise "DNN fallback CPU ops #226"
Yuyao Wang