Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flexible GPU and CPU Usage for Gravity #170

Merged
merged 23 commits into from
Jun 12, 2024
Merged

Conversation

spencerw
Copy link
Member

@spencerw spencerw commented May 20, 2024

Although calculating gravity on the GPU is generally much faster, doing so introduces a significant amount of extra overhead. When using multistepping, it's often the case that using the GPU for gravity on higher rungs actually slows things down.

This PR introduces a new parameter 'nGpuMinParts', which triggers a tree walk on the CPU (instead of the GPU) whenever the number of active particles falls below the threshold.

@spencerw
Copy link
Member Author

Note that there are still a couple of 'todo' items before this can be merged. Most importantly, enabling the periodic force calculation currently causes a hang whenever the gravity tree walk happens on the CPU. @trquinn I was hoping you might have some ideas as to why that could be happening.

@spencerw
Copy link
Member Author

spencerw commented May 21, 2024

The hang during the gravity calculation was caused by a misplaced #ifdef inside of calculateEwald and is now fixed.

Some of the 'useckloop' code still needs to be incorporated and there are a couple of places where indentation needs to be updated, but a code review would definitely be helpful at this point. In particular, this PR involved creating a considerable amount of duplicated code, to ensure that the CPU tree walk can be done with or without the CUDA compile flag. Moving things into helper functions cuts down on this, but it's definitely made the code messier and harder to read.

@spencerw
Copy link
Member Author

spencerw commented Jun 5, 2024

@trquinn Are we sure that removing 'bUseCpu' as a member from the TreePiece and ListCompute classes is what we want to do? I count at least 15 function headers that will need to be changed if we instead pass it down as an argument from Main::startGravity to all of the places it's needed.

@trquinn
Copy link
Member

trquinn commented Jun 5, 2024

I think it's fine to keep the bUseCpu as Treepiece and ListCompute attributes. I recommend NOT putting them under an #ifdef CUDA

Copy link
Member

@trquinn trquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comment cleanup and possible removal of some ifdefs.

Compute.cpp Outdated Show resolved Hide resolved
Compute.h Outdated Show resolved Hide resolved
DataManager.h Outdated Show resolved Hide resolved
ParallelGravity.h Outdated Show resolved Hide resolved
ParallelGravity.h Outdated Show resolved Hide resolved
TreePiece.cpp Outdated Show resolved Hide resolved
TreePiece.cpp Outdated Show resolved Hide resolved
TreePiece.cpp Outdated Show resolved Hide resolved
TreePiece.cpp Outdated Show resolved Hide resolved
Compute.cpp Outdated Show resolved Hide resolved
@spencerw
Copy link
Member Author

All of the requested changes have been made. Note that the review was started before my previous commit (5f7275e), so some of the change requests are marked as 'outdated'.

It sounds like you noticed the new commit (implementing the ifdef trick to get rid of duplicated code) partway through, however.

Copy link
Member

@trquinn trquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change is correcting a comment, the other is a bug, although on a code path that is rarely used.

Also: I'm getting memory errors on "teststep" when running on the A100s with 1 node, 2 processes (i.e. 2 GPUs), 2 PEs/process. In short

ibrun -n 2 ../ChaNGa.smp.cuda ++ppn 2 +setcpuaffinity +commap 0,8 +pemap 1-2,9-10 ./test_pg.param

The error is

Calculating gravity (tree bucket, theta = 0.700000) ... Fatal CUDA Error an illegal memory access was encountered at HostCUDA.cu:609.
Return value 700 from 'cudaFree(ptr->d_list)'.------------- Processor 3 Exiting: Caught Signal ------------
Reason: Aborted
Fatal CUDA Error an illegal memory access was encountered at HostCUDA.cu:588.
Return value 700 from 'cudaMalloc(&ptr->d_bucketMarkers, markerSize)'.[v330-020.ls6.tacc.utexas.edu:mpispawn_0][readline] Unexpected En
d-Of-File on file descriptor 5. MPI process died?

Compute.cpp Outdated Show resolved Hide resolved
ParallelGravity.cpp Outdated Show resolved Hide resolved
@trquinn trquinn merged commit 94506ea into N-BodyShop:master Jun 12, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants