Excessive device memory needed for systems with many periodic copies #2

Open
marcelloPuligheddu opened this issue Oct 24, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Owner

marcelloPuligheddu commented Oct 24, 2024

Running a system in which the exchange needs many periodic cell copies to accurately compute the HFX
(where many means more than ~60) requires us to reserve a lot of device memory to compute the integrals.

The issue comes from the fact that, for NG pb and NP primitives, before screening we need to reserve:

NG * (NG * NP * NP ) * (NG * NP * NP ) * VRR_BS(L) for the vrr and eco and
NG * (NG * NP * NP ) * (NG * NP * NP ) * HRR_BS(L) for the hrr.
Screening reduces the ket and bra, not the first NG, and in practical tests it does not seem to help much.

At high L, NG and NP this gets quite big,
e.g. L = 3333, NG = 100, NP = 3 leads to 27 * 1M * BS > 10 GB before screening for a single set.
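
A rough way to see how fast this grows is to evaluate the reservation formula directly. The sketch below is only an illustration: `block_size` stands in for VRR_BS(L) or HRR_BS(L), whose actual values are not given here, `bytes_per_element=8` assumes double precision, and `bra_keep` / `ket_keep` are hypothetical fractions of bra and ket pairs that survive screening. Taken exactly as written, the formula gives 81 * 1M elements per unit of block size for NG = 100 and NP = 3 (the 27 * 1M quoted above would correspond to NP^3 rather than NP^4), so treat the exact prefactor as uncertain.

```python
import math


def hfx_buffer_elements(ng, n_prim, block_size, bra_keep=1.0, ket_keep=1.0):
    """Elements reserved for one set: NG * (NG*NP*NP) * (NG*NP*NP) * block_size.

    bra_keep and ket_keep are hypothetical screening survival fractions.
    They shrink the bra and ket pair counts only; the leading NG factor
    is never reduced, which mirrors the behaviour described in the issue.
    """
    pairs = ng * n_prim * n_prim
    bra = math.ceil(pairs * bra_keep)
    ket = math.ceil(pairs * ket_keep)
    return ng * bra * ket * block_size


def to_gib(elements, bytes_per_element=8):
    """Convert an element count to GiB (8 bytes assumes double precision)."""
    return elements * bytes_per_element / 2**30


if __name__ == "__main__":
    # Prefactor only (block_size = 1), no screening.
    print(hfx_buffer_elements(100, 3, 1))  # 81000000
    # Same system if screening kept half of the bra and half of the ket pairs.
    print(hfx_buffer_elements(100, 3, 1, bra_keep=0.5, ket_keep=0.5))  # 20250000
    # Size in GiB for an assumed, purely illustrative block size of 20.
    print(to_gib(hfx_buffer_elements(100, 3, 20)))
```

Even when screening removes half of the bra and half of the ket, the total only drops by a factor of four, because the leading NG is untouched.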

Note that we only compute full sets[L], so we have no way of splitting this calculation.

This makes these calculations impossible with the current approach. At the moment we get a cudaMalloc failure, so at least the error is somewhat meaningful.
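
One possible way to make the failure more informative would be to compare the estimated reservation against the free device memory before allocating. The snippet below is a minimal sketch of that check, not what the code currently does: `check_device_allocation` is a hypothetical helper, and `free_bytes` is assumed to be obtained elsewhere (in the CUDA runtime, `cudaMemGetInfo` reports free and total device memory).

```python
def check_device_allocation(required_bytes, free_bytes, label="buffer"):
    """Raise a descriptive MemoryError if the request cannot fit on device.

    Hypothetical helper: free_bytes is assumed to be supplied by the caller,
    for example from a cudaMemGetInfo query made just before the allocation.
    """
    if required_bytes > free_bytes:
        raise MemoryError(
            f"{label}: need {required_bytes / 2**30:.2f} GiB on device, "
            f"only {free_bytes / 2**30:.2f} GiB free; "
            "consider changing the basis set or eps_schwarz"
        )
    return True


if __name__ == "__main__":
    try:
        # Illustrative numbers: a 30 GiB request on a device with 16 GiB free.
        check_device_allocation(30 * 2**30, 16 * 2**30, label="vrr")
    except MemoryError as err:
        print(err)
```

The message names the buffer and the sizes involved, which a bare allocation failure does not.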

The only workarounds at the moment are to change the basis set or eps_schwarz, or to use much larger GPUs, so not exactly great.
