Running a system in which the exchange needs many periodic cell copies to accurately compute the HFX
(where many means more than ~60) requires us to reserve a lot of memory on the device to compute the integrals.
The issue comes from the fact that, for NG pb and NP primitives, before screening we need to reserve:
NG * (NG * NP * NP) * (NG * NP * NP) * VRR_BS(L) for the vrr and eco, and
NG * (NG * NP * NP) * (NG * NP * NP) * HRR_BS(L) for the hrr.
Screening reduces the ket and bra, but not the first NG, and in practical tests it does not seem to help much.
At high L, NG and NP this gets quite big,
e.g. L = 3333, NG = 100, NP = 3 leads to 27 * 1M * BS > 10 GB before screening for a single set.
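To get a feel for how fast this grows, here is a minimal sketch that evaluates the reservation formula above. The function names and the `block_bytes` parameter (standing in for VRR_BS(L) or HRR_BS(L) expressed in bytes) are made up for illustration and are not part of the code base. Evaluated literally, the formula gives 81 * 1M entries for NG = 100, NP = 3 (NP enters four times), so the 27 * 1M quoted above may be counting something slightly different; either way the NG^3 growth dominates.

```python
def hfx_buffer_elements(ng: int, n_prim: int) -> int:
    """Number of (NG, bra, ket) entries reserved before screening.

    Follows the formula in the issue text:
        NG * (NG * NP * NP) * (NG * NP * NP)
    """
    bra = ng * n_prim * n_prim
    ket = ng * n_prim * n_prim
    return ng * bra * ket


def hfx_buffer_bytes(ng: int, n_prim: int, block_bytes: int) -> int:
    """Bytes reserved for one set, given a per-entry block size in bytes.

    block_bytes is a hypothetical stand-in for VRR_BS(L) or HRR_BS(L);
    the real values depend on L and are not given in the issue.
    """
    return hfx_buffer_elements(ng, n_prim) * block_bytes


if __name__ == "__main__":
    # Illustrative block sizes only, not measured values.
    for block_bytes in (128, 512, 2048):
        gb = hfx_buffer_bytes(100, 3, block_bytes) / 1e9
        print(f"block_bytes={block_bytes}: {gb:.1f} GB")
```

Because the count scales as NG^3, doubling the number of cell copies multiplies the reservation by 8, which is why screening only the bra and ket does little for large NG.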
Note that we only compute full sets[L], so we have no way of splitting this calculation.
This makes these calculations impossible with the current approach. At the moment we get a cudaMalloc failure, so at least the error is somewhat meaningful.
The only workarounds at the moment are to change the basis set or eps_schwarz, or to use much larger GPUs, so not exactly great.