DNN fallback CPU ops #226 #780

Closed
wants to merge 21 commits into from
Conversation

@WangYuyao (Contributor)

AMLS Exercise "DNN fallback CPU ops #226"
Yuyao Wang

@corepointer (Collaborator) left a comment

Thank you for your contribution @WangYuyao
The code has a few issues integrating with its CUDA counterparts; I would deal with these when integrating your efforts. Unfortunately, the test cases you introduced are failing (both locally and in the GitHub CI). I assume they ran successfully when you tested before submitting the PR?

@WangYuyao (Contributor Author)

Yes, all the test cases were run before submitting, and they all passed on my machine.

@pdamme added the label "AMLS summer 2024" (Student project for the Architecture of ML Systems lecture at TU Berlin, summer 2024) on Jul 30, 2024
@pdamme (Collaborator) commented Jul 30, 2024

Hi @corepointer, thanks for looking into this PR! I ran the test suite locally on my machine yesterday and all tests passed (both in the daphne-dev (X86-64) container and in the GitHub Actions CI container)...

Thanks for pointing out that there are inconsistencies with the CUDA counterparts. Sure, it would be great if you could address them or give @WangYuyao some hints on how to fix them himself.

@WangYuyao told me that he would be happy to continue working on this contribution over the summer to increase its value for DAPHNE.

@corepointer (Collaborator)

Hi @WangYuyao & @pdamme,
I ran the tests again and they are still failing for me (running in the daphne-dev container in both debug and release mode). If you want to continue your work to get to the bottom of these errors, that would of course be highly appreciated :)
For the integration with the main branch: besides some issues that arise if you try to compile with CUDA enabled, there will be some additional refactoring required to align with what's already there and what's coming from #734 (mostly naming scheme changes).

@WangYuyao (Contributor Author)

Dear @corepointer, the commit with ID fe41fef contains the implementation of the DNN CPU ops, which passed on both my machine and pdamme's. The later commits are an attempt to implement a DNN example in DaphneDSL, including adding the respective DaphneDSL built-in functions, registering the DNN CPU kernels, etc. However, I have not succeeded with this part yet. For example, the kernel function max_pooling returned the message "std::bad_array_new_length", and I am now trying to debug it.

@corepointer (Collaborator)

Hi!
According to [1], this exception can happen for various reasons. Maybe a negative number is passed to some result matrix allocation or something like this? Unknown dimensions are represented by -1, so this could be a reason for the error. You could check what sizes are used there with a debugger.
hth, Mark

[1] https://en.cppreference.com/w/cpp/memory/new/bad_array_new_length
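
For illustration, here is a minimal sketch of that failure mode, using hypothetical names rather than DAPHNE code: when an unknown dimension (represented as -1) reaches a result allocation, operator new[] receives a negative array length and throws std::bad_array_new_length.

```cpp
#include <iostream>
#include <new>

// Hypothetical sketch (not the actual kernel code): a result buffer sized from
// inferred dimensions, where -1 marks an unknown dimension.
static double* allocateResult(long long numRows, long long numCols) {
    // If numRows is still -1 here, the array length is negative and
    // operator new[] throws std::bad_array_new_length.
    return new double[numRows * numCols];
}

int main() {
    try {
        allocateResult(-1, 16); // an unknown row count slipped through
    } catch (const std::bad_array_new_length &e) {
        std::cerr << "caught: " << e.what() << '\n';
    }
    return 0;
}
```

Validating the inferred sizes right before the allocation (and reporting which dimension is still -1) turns this into a much more actionable error message.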

@WangYuyao (Contributor Author)

Thank you for your reply; I will try to fix my code following your guidance!

Additionally, would it be convenient for you to communicate via email?

Thank you for your patience!

@corepointer (Collaborator)

Hi!
Yes sure. Please connect us @pdamme
Best, Mark

- All of the DNN-related kernel test cases defined their own utility function "check".
- For all other kernels, we typically call this function "checkXYZ", whereby XYZ is the kernel name; thereby ensuring a unique function name.
- Calling the function "check" for every kernel looks correct (the unit tests are separate cpp files and, thus, should be separate compilation units).
- However, it seems like that caused some kind of undefined behavior.
- Perhaps this behavior is related to the way catch2 handles the test case cpp files? I can't really tell the reason (see the sketch below).
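
For illustration, a hypothetical reconstruction (made-up names and signatures) of how identically named helpers can misbehave even across separate .cpp files: if the helpers are function templates, each translation unit emits its instantiations with vague (weak) linkage, the linker keeps only one definition per mangled name, and some test files silently call a helper that was written for a different kernel. That is an ODR violation, and it surfaces as failing tests rather than as a duplicate-symbol error.

```cpp
// Hypothetical illustration, not the actual test code.
//
//   // BiasAddForwardTest.cpp
//   template<class DT> void check(const DT* exp, const DT* res) { /* bias_add asserts */ }
//
//   // MaxPoolingForwardTest.cpp
//   template<class DT> void check(const DT* exp, const DT* res) { /* pooling asserts */ }
//
// Identical name and signature in two translation units: the linker may keep a
// single instantiation and bind both test files to it. Two safe alternatives:

// 1) a unique name per kernel, as in the fix (e.g., checkBiasAddForward):
template<class DT>
void checkBiasAddForward(const DT* exp, const DT* res) { /* bias_add asserts */ }

// 2) internal linkage via an anonymous namespace, so every translation unit
//    keeps its own private definition:
namespace {
template<class DT>
void check(const DT* exp, const DT* res) { /* kernel-specific asserts */ }
}
```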
@pdamme (Collaborator) commented Aug 4, 2024

I must correct myself regarding my earlier report that all tests passed locally: I noticed that I had commented out the new unit test cases for the DNN-related kernels when I ran the tests... When I don't comment out the test cases, they fail. However, I think I have found the cause. It's not directly related to the kernels, but to a subtle issue with the test case code itself. Each of those test cases had a little helper function, and those were all called check. When we give them unique names, e.g., checkBiasAddForward, all test cases pass on my machine in the daphne-dev container. See my commit for details. I don't fully understand why that led to a problem, since the unit tests should be independent compilation units, but maybe there is a problem when they are used by catch2; I can only speculate...

However, now the CI tests fail with a different problem, already in the "Initialize Containers" step, i.e., unrelated to the DAPHNE source code:

failed to register layer: write /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so: no space left on device

@corepointer: Do you have an idea what is going wrong there?

@pdamme (Collaborator) commented Aug 4, 2024

If the non-unique name "check" is a problem, then we might want to change that for the CUDA-based DNN kernel test cases, too.

@pdamme closed this on Aug 4, 2024
@pdamme reopened this on Aug 4, 2024
@pdamme (Collaborator) commented Aug 4, 2024

Sorry, just hit the wrong button. This PR is still open.

@WangYuyao (Contributor Author)

Dear @corepointer,
I have some questions about the form of kernel op definitions in DAPHNE.

I found two ways of defining kernels: one uses a namespace, and the other follows the pattern 'struct for partial template specialization - convenience function - (partial) template specializations for different data/value types'. Is there any difference between them?

Additionally, in the namespace-based approach I found a namespace member named 'Forward'. Does that mean there should originally also be a 'Backward' to implement the backward ops of the DNN, so that the corresponding DaphneDSL DNN functions would look like 'Conv2d.forward' and 'Conv2d.backward'?

Similarly, to integrate both the GPU and CPU versions of the DNN ops, what should the functions look like in DaphneDSL? Will there be an argument like "use_cuda == 1", or separate functions like 'conv2d.cuda' and 'conv2d' to distinguish which version is used?

Thank you for your guidance!

@corepointer (Collaborator)

Dear @WangYuyao,
The namespaces in the C++ kernel implementations are just an organizational measure. I mostly omitted the convenience function, as most code that calls the kernels is generated anyway. Also, Pooling is a special case where I put the opcode into the template parameters; normally this would be a function parameter of the convenience function, which passes it on to the apply() method.

In the namespace of a kernel, e.g., CUDA::NN::Softmax, there should be the structs Forward and Backward, each with an apply() method. That is where the forward and backward passes for the Softmax layer can be implemented (see the sketch below).

In the DSL I went for wrapper functions. That's not merged into main yet and lives in the branch of PR #734. This is also how it's done in SystemDS; it allows encapsulating utility code like conv2d.init(...), and you can easily exchange a script-based conv2d for a kernel-based call. So conv2d.forward(...) would call conv2d(...) appropriately. The backward functions I'd name the same way as in SystemDS (e.g., conv2d_backward_filter and conv2d_backward_data); with a wrapper script you can then call both via conv2d.backward(...). I haven't implemented that in #734 yet, though.

The decision whether the CPU or the GPU version is called will primarily be made by the DAPHNE compiler. At the moment the compiler pass that marks operations as GPU ops is quite primitive, but if an operation has the CUDASupport trait and the input is adequately sized, the compiler will generate a GPU instruction instead of a CPU instruction. This only happens if you compiled with CUDA support and run with the --cuda CLI parameter or set use_cuda=true in the UserConfig.json file. There will be a mechanism to pass extra information to the compiler via special decorations in the DSL, but that feature is not implemented yet.

hth, Mark
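
For illustration, a minimal sketch of the kernel layout described above, with simplified placeholder names and signatures (the real kernels additionally take a DaphneContext, allocate their results, handle the pooling opcode, etc.); treat it as a sketch of the pattern rather than the actual DAPHNE interface:

```cpp
// Hypothetical sketch, not the actual DAPHNE headers.
template<typename VT> class DenseMatrix; // stand-in for DAPHNE's dense matrix type

namespace NN::Softmax {

// Struct for partial template specialization over result/argument types.
template<typename DTRes, typename DTArg>
struct Forward {
    static void apply(DTRes*& res, const DTArg* data /*, DaphneContext* ctx */);
};

template<typename DTRes, typename DTArg>
struct Backward {
    static void apply(DTRes*& res, const DTArg* grad, const DTArg* data /*, ... */);
};

// Convenience function (often omitted, since generated code calls the structs directly).
template<typename DTRes, typename DTArg>
void forward(DTRes*& res, const DTArg* data) {
    Forward<DTRes, DTArg>::apply(res, data);
}

// (Partial) template specialization for a concrete data/value type combination.
template<typename VT>
struct Forward<DenseMatrix<VT>, DenseMatrix<VT>> {
    static void apply(DenseMatrix<VT>*& res, const DenseMatrix<VT>* data) {
        // forward pass for dense inputs would go here
    }
};

} // namespace NN::Softmax
```

Under this layout, the CUDA counterparts would live under a CUDA:: prefix (e.g., CUDA::NN::Softmax), presumably with the same Forward/Backward structure, keeping the CPU fallback and the GPU kernels interchangeable from the compiler's point of view.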

@WangYuyao (Contributor Author)

Dear @corepointer ,
I have just uploaded my attempt at implementing the CUDA version of the batch_norm_backward op and the corresponding test file. However, when I tried to test the code with the command "./build.sh --no-deps --target run_tests && ./test.sh -nb batch_norm_bwd* -d yes --cuda", the CUDA version was not executed. Could you please check my test file and give me some guidance on it?

Thank you for your time!

@corepointer (Collaborator)

Hi,

Sorry for the delayed answer.

Two things caught my eye:

  • You don't compile with CUDA support. You need to add --cuda to the build.sh parameters. Do a build.sh --clean -y before switching from a build without CUDA to one with CUDA support.
  • Once you compile with CUDA support, you'll notice that your test code does not compile (clashing function names and a few other things).

hth, Mark

@WangYuyao (Contributor Author)

Dear @corepointer,

The implementation of the BatchNormBackward GPU op and the corresponding test case is finished. I found that there is no implementation of it in the main repository, and it would be my pleasure if mine could become part of #734, provided it works correctly.

Besides, I noticed that there are conflicts in 'kernels.json'; could you please give me some guidance on how to resolve them?

Thank you for your help!

@corepointer (Collaborator)

As I mentioned in my email, I fixed the issue in the course of rebasing your commits on top of main. So this will also cause conflicts. Maybe we can fix this next week.
Regards,
Mark

@WangYuyao (Contributor Author) commented Aug 16, 2024

Gladly.

@corepointer (Collaborator)

Thank you for your effort @WangYuyao 👍

After rearranging some things and some cleanup, I merged your contribution into the main branch. I did, however, not force-push to your fork's main branch, which is why this PR is now marked as closed rather than merged.
