Cuda heat example w quaditer #913
base: master
Changes from 139 commits
I think we are missing the analogous benchmark using `QuadraturePointIterator`.
For benchmarking purposes, we should separate the setup (and hence its cost) from the actual assembly performance, e.g.:
On my machine, this reveals that memory transfers are still happening inside the actual assembly code:
Hence, we should adapt all objects before the kernel launch. This has the added advantage that we do not need to transfer everything from CPU to GPU memory every time we assemble.
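To illustrate the suggested split, here is a minimal sketch assuming CUDA.jl and BenchmarkTools.jl. The kernel `scale_kernel!` and the wrapper `assemble_gpu!` are hypothetical stand-ins for the element-assembly kernel, not code from this PR. All device data is created once in the untimed setup, so the timed call only measures the kernel launch and synchronization. For structured objects (e.g. a DoF handler or cell values), the same idea would apply via Adapt.jl's `adapt(CuArray, obj)`, provided the types define `Adapt.adapt_structure`.

```julia
using CUDA, BenchmarkTools

# Hypothetical toy kernel standing in for the element-assembly kernel:
# each thread writes one scaled entry of `inp` into `out`.
function scale_kernel!(out, inp, α)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        @inbounds out[i] = α * inp[i]
    end
    return nothing
end

# Hypothetical "assembly" entry point: launches the kernel on data that
# already lives on the GPU and waits for it to finish.
function assemble_gpu!(out_d, inp_d)
    threads = 256
    blocks = cld(length(out_d), threads)
    CUDA.@sync @cuda threads=threads blocks=blocks scale_kernel!(out_d, inp_d, 2f0)
    return out_d
end

# Setup (not timed): allocate and move the data to the GPU once.
n = 1_000_000
inp_d = CuArray(rand(Float32, n))
out_d = CUDA.zeros(Float32, n)

# Timed region: only the launch plus synchronization. The device arrays are
# interpolated with `$`, so no host-to-device copy happens inside the benchmark.
@btime assemble_gpu!($out_d, $inp_d)
```

If memory transfers still show up in a profile of the timed region (e.g. with `CUDA.@profile`), some object is still being converted to GPU memory inside the assembly call rather than during setup.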
Furthermore, the actual allocation should be
instead of
This file is essentially user code. Please keep it here for now until we decide where exactly to put it (e.g. a how-to or its own package).
I have some trouble understanding how
Can you quickly point me to the relevant code?
Okay, this is actually a tough design issue that I faced, and to be honest I improvised a lot on this one. Essentially, the local matrices are transferred in two steps:
_to_localdh