DNN fallback CPU ops #226 #780

Closed
wants to merge 21 commits into from
Conversation

@WangYuyao (Contributor)

AMLS Exercise "DNN fallback CPU ops #226"
Yuyao Wang

@corepointer (Collaborator) left a comment

Thank you for your contribution @WangYuyao
The code has a few issues integrating with its CUDA counterparts; I would deal with these when integrating your efforts. Unfortunately, the test cases you introduced are failing (both locally and in the GitHub CI). I assume they ran successfully when you tested before submitting the PR?

@WangYuyao (Contributor Author)

Yes, all the test cases were run before submitting, and they all passed on my machine.

@pdamme added the label "AMLS summer 2024" (Student project for the Architecture of ML Systems lecture at TU Berlin, summer 2024) on Jul 30, 2024
@pdamme (Collaborator) commented Jul 30, 2024

Hi @corepointer, thanks for looking into this PR! I ran the test suite locally on my machine yesterday and all tests passed (both in the daphne-dev (X86-64) container and in the GitHub Actions CI container)...

Thanks for pointing out that there are inconsistencies with the CUDA counterparts. Sure, it would be great if you could address them or give @WangYuyao some hints on how to fix them himself.

@WangYuyao told me that he would be happy to continue working on this contribution over the summer to increase its value for DAPHNE.

@corepointer (Collaborator)

Hi @WangYuyao & @pdamme,
I ran the tests again and they are still failing for me (running in the daphne-dev container in both debug and release mode). If you want to continue your work to get to the bottom of these errors, that would of course be highly appreciated :)
For the integration with the main branch: besides some issues that arise if you try to compile with CUDA enabled, there will be some additional refactoring required to align with what's already there and what's coming from #734 (mostly naming scheme changes).

@WangYuyao (Contributor Author)

Dear @corepointer, the commit with ID fe41fef contains the implementation of the DNN CPU ops, which passed on both my machine and pdamme's. The later commits are an attempt to implement a DNN example in DaphneDSL, including adding the respective DaphneDSL built-in functions, registering the DNN CPU kernels, etc. However, I have not succeeded with this part yet. For example, the kernel function max_pooling returned the message "std::bad_array_new_length", and I am now trying to debug it.

@corepointer (Collaborator)

Hi!
According to [1], this exception can happen for various reasons. Maybe a negative number is passed to some result matrix allocation or something like this? Unknown dimensions are represented by -1, so this could be a reason for the error. You could check what sizes are used there with a debugger.
hth, Mark

[1] https://en.cppreference.com/w/cpp/memory/new/bad_array_new_length
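
For illustration, here is a minimal sketch of that failure mode, using hypothetical names rather than DAPHNE code: when an unknown dimension (represented as -1) reaches a result allocation, operator new[] receives a negative array length and throws std::bad_array_new_length.

```cpp
#include <iostream>
#include <new>

// Hypothetical sketch (not the actual kernel code): a result buffer sized from
// inferred dimensions, where -1 marks an unknown dimension.
static double* allocateResult(long long numRows, long long numCols) {
    // If numRows is still -1 here, the array length is negative and
    // operator new[] throws std::bad_array_new_length.
    return new double[numRows * numCols];
}

int main() {
    try {
        allocateResult(-1, 16); // an unknown row count slipped through
    } catch (const std::bad_array_new_length &e) {
        std::cerr << "caught: " << e.what() << '\n';
    }
    return 0;
}
```

Validating the inferred sizes right before the allocation (and reporting which dimension is still -1) turns this into a much more actionable error message.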

@WangYuyao (Contributor Author)

Thank you for your reply; I will try to fix my code following your guidance!

Additionally, would it be convenient for you to communicate via email?

Thank you for your patience!

@corepointer (Collaborator)

Hi!
Yes sure. Please connect us @pdamme
Best, Mark

- All of the DNN-related kernel test cases defined their own utility function "check".
- For all other kernels, we typically call this function "checkXYZ", whereby XYZ is the kernel name; thereby ensuring a unique function name.
- Calling the function "check" for every kernel looks correct (the unit tests are separate cpp files and, thus, should be separate compilation units).
- However, it seems like that caused some kind of undefined behavior.
- Perhaps this behavior is related to the way catch2 handles the test case cpp files? I can't really tell the reason (see the sketch below).
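
For illustration, a hypothetical reconstruction (made-up names and signatures) of how identically named helpers can misbehave even across separate .cpp files: if the helpers are function templates, each translation unit emits its instantiations with vague (weak) linkage, the linker keeps only one definition per mangled name, and some test files silently call a helper that was written for a different kernel. That is an ODR violation, and it surfaces as failing tests rather than as a duplicate-symbol error.

```cpp
// Hypothetical illustration, not the actual test code.
//
//   // BiasAddForwardTest.cpp
//   template<class DT> void check(const DT* exp, const DT* res) { /* bias_add asserts */ }
//
//   // MaxPoolingForwardTest.cpp
//   template<class DT> void check(const DT* exp, const DT* res) { /* pooling asserts */ }
//
// Identical name and signature in two translation units: the linker may keep a
// single instantiation and bind both test files to it. Two safe alternatives:

// 1) a unique name per kernel, as in the fix (e.g., checkBiasAddForward):
template<class DT>
void checkBiasAddForward(const DT* exp, const DT* res) { /* bias_add asserts */ }

// 2) internal linkage via an anonymous namespace, so every translation unit
//    keeps its own private definition:
namespace {
template<class DT>
void check(const DT* exp, const DT* res) { /* kernel-specific asserts */ }
}
```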
@pdamme (Collaborator) commented Aug 4, 2024

I must correct myself regarding my earlier report that all tests passed locally: I noticed that I had commented out the new unit test cases for the DNN-related kernels when I ran the tests... When I don't comment out the test cases, they fail. However, I think I have found the cause. It's not directly related to the kernels, but to a subtle issue with the test case code itself. Each of those test cases had a little helper function, and those were all called check. When we give them unique names, e.g., checkBiasAddForward, all test cases pass on my machine in the daphne-dev container. See my commit for details. I don't fully understand why that led to a problem, since the unit tests should be independent compilation units, but maybe there is a problem when they are used by catch2; I can only speculate...

However, now the CI tests fail with a different problem, already in the "Initialize Containers" step, i.e., unrelated to the DAPHNE source code:

failed to register layer: write /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so: no space left on device

@corepointer: Do you have an idea what is going wrong there?

@pdamme (Collaborator) commented Aug 4, 2024

If the non-unique name "check" is a problem, then we might want to change that for the CUDA-based DNN kernel test cases, too.

@pdamme closed this on Aug 4, 2024
@pdamme reopened this on Aug 4, 2024
@pdamme (Collaborator) commented Aug 4, 2024

Sorry, just hit the wrong button. This PR is still open.

@WangYuyao (Contributor Author)

Dear @corepointer,
I have some questions about the form of kernel op definitions in DAPHNE.

I found two ways of defining kernels: one uses a namespace, and the other follows the pattern 'struct for partial template specialization - convenience function - (partial) template specializations for different data/value types'. Is there any difference between them?

Additionally, in the namespace-based approach I found a namespace member named 'Forward'. Does that mean there should originally also be a 'Backward' to implement the backward ops of the DNN, so that the corresponding DaphneDSL DNN functions would look like 'Conv2d.forward' and 'Conv2d.backward'?

Similarly, to integrate both the GPU and CPU versions of the DNN ops, what should the functions look like in DaphneDSL? Will there be an argument like "use_cuda == 1", or separate functions like 'conv2d.cuda' and 'conv2d' to distinguish which version is used?

Thank you for your guidance!

@corepointer (Collaborator)

Dear @WangYuyao,
The namespaces in the C++ kernel implementations are just an organizational measure. I mostly omitted the convenience function, as most code that calls the kernels is generated anyway. Also, Pooling is a special case where I put the opcode into the template parameters; normally this would be a function parameter of the convenience function, which passes it on to the apply() method.

In the namespace of a kernel, e.g., CUDA::NN::Softmax, there should be the structs Forward and Backward, each with an apply() method. That is where the forward and backward passes for the Softmax layer can be implemented (see the sketch below).

In the DSL I went for wrapper functions. That's not merged into main yet and lives in the branch of PR #734. This is also how it's done in SystemDS; it allows encapsulating utility code like conv2d.init(...), and you can easily exchange a script-based conv2d for a kernel-based call. So conv2d.forward(...) would call conv2d(...) appropriately. The backward functions I'd name the same way as in SystemDS (e.g., conv2d_backward_filter and conv2d_backward_data); with a wrapper script you can then call both via conv2d.backward(...). I haven't implemented that in #734 yet, though.

The decision whether the CPU or the GPU version is called will primarily be made by the DAPHNE compiler. At the moment the compiler pass that marks operations as GPU ops is quite primitive, but if an operation has the CUDASupport trait and the input is adequately sized, the compiler will generate a GPU instruction instead of a CPU instruction. This only happens if you compiled with CUDA support and run with the --cuda CLI parameter or set use_cuda=true in the UserConfig.json file. There will be a mechanism to pass extra information to the compiler via special decorations in the DSL, but that feature is not implemented yet.

hth, Mark
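
For illustration, a minimal sketch of the kernel layout described above, with simplified placeholder names and signatures (the real kernels additionally take a DaphneContext, allocate their results, handle the pooling opcode, etc.); treat it as a sketch of the pattern rather than the actual DAPHNE interface:

```cpp
// Hypothetical sketch, not the actual DAPHNE headers.
template<typename VT> class DenseMatrix; // stand-in for DAPHNE's dense matrix type

namespace NN::Softmax {

// Struct for partial template specialization over result/argument types.
template<typename DTRes, typename DTArg>
struct Forward {
    static void apply(DTRes*& res, const DTArg* data /*, DaphneContext* ctx */);
};

template<typename DTRes, typename DTArg>
struct Backward {
    static void apply(DTRes*& res, const DTArg* grad, const DTArg* data /*, ... */);
};

// Convenience function (often omitted, since generated code calls the structs directly).
template<typename DTRes, typename DTArg>
void forward(DTRes*& res, const DTArg* data) {
    Forward<DTRes, DTArg>::apply(res, data);
}

// (Partial) template specialization for a concrete data/value type combination.
template<typename VT>
struct Forward<DenseMatrix<VT>, DenseMatrix<VT>> {
    static void apply(DenseMatrix<VT>*& res, const DenseMatrix<VT>* data) {
        // forward pass for dense inputs would go here
    }
};

} // namespace NN::Softmax
```

Under this layout, the CUDA counterparts would live under a CUDA:: prefix (e.g., CUDA::NN::Softmax), presumably with the same Forward/Backward structure, keeping the CPU fallback and the GPU kernels interchangeable from the compiler's point of view.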

@WangYuyao (Contributor Author)

Dear @corepointer ,
I have just uploaded my attempt at implementing the CUDA version of the batch_norm_backward op and the corresponding test file. However, when I tried to test the code with the command "./build.sh --no-deps --target run_tests && ./test.sh -nb batch_norm_bwd* -d yes --cuda", the CUDA version was not executed. Could you please check my test file and give me some guidance on it?

Thank you for your time!

@corepointer (Collaborator)

Hi,

Sorry for the delayed answer.

Two things caught my eye:

  • You don't compile with CUDA support. You need to add --cuda to the build.sh parameters. Do a build.sh --clean -y before switching from a build without CUDA to one with CUDA support.
  • Once you compile with CUDA support, you'll notice that your test code does not compile (clashing function names and a few other things).

hth, Mark

@WangYuyao (Contributor Author)

Dear @corepointer,

The implementation of the BatchNormBackward GPU op and the corresponding test case is finished. I found that there is no implementation of it in the main repository, and it would be my pleasure if mine could become part of #734, provided it works correctly.

Besides, I noticed that there are conflicts in 'kernels.json'; could you please give me some guidance on how to resolve them?

Thank you for your help!

@corepointer (Collaborator)

As I mentioned in my email, I fixed the issue in the course of rebasing your commits on top of main. So this will also cause conflicts. Maybe we can fix this next week.
Regards,
Mark

@WangYuyao (Contributor Author) commented Aug 16, 2024

Gladly.

@corepointer (Collaborator)

Thank you for your effort @WangYuyao 👍

After rearranging some things and some cleanup, I merged your contribution into the main branch. I did, however, not force-push to your fork's main branch, which is why this PR is now marked as closed rather than merged.
