Skip to content

ClarkResearchGroup/cuDMRG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 

Repository files navigation

cuDMRG

Testing Cupy-based DMRG implementation.

python >= 3.7, numpy >= 1.19, conda install -c conda-forge cupy cutensor cudatoolkit=11

If Cupy is missing, this project automatically switches to Numpy.

To get some sense of how Cupy compares with Numpy, it is recommended to run benchmarks in https://gist.github.com/fukatani/4702aa05aed255cd25f42e77d0a22e37

In general, two types of operations are most common in DMRG: (1) tensor multiplication; (2) SVD or Eigen decomposition. For CUDA, the first is super fast and efficient (this project can switch between "transpose + matrix multiplication + transpose", or the cutensor implementation of https://github.com/springer13/tcl). However, SVD or Eigen decomposition does not benefit as much from a GPU, and it comes with big overhead. For dense SVD and Eigen solvers, GPU's speedup against CPU is only visible when matrix size is in the thousands, but this speedup grows with matrix size. Hopefully more sparse solvers can be added in the future, which will likely benefit the block sparse structure of quantum-number DMRG with the massive parallelism of GPU.

Latest benchmark of the current code on a EVGA RTX 3090 FTW3 Ultra (L=100, svd error = 1e-16, max bond dimension=1000, Heisenberg chain):

[21:01:54 cuDMRG.apps.dmrgINFO] sweep = 0, E = -43.14847950275731, max_dim = 4
[21:01:54 cuDMRG.apps.dmrgINFO] sweep = 1, E = -44.09681238309175, max_dim = 16
[21:01:55 cuDMRG.apps.dmrgINFO] sweep = 2, E = -44.12544610855322, max_dim = 64
[21:01:59 cuDMRG.apps.dmrgINFO] sweep = 3, E = -44.12734668170454, max_dim = 256
[21:02:17 cuDMRG.apps.dmrgINFO] sweep = 4, E = -44.12767996796869, max_dim = 600
[21:02:57 cuDMRG.apps.dmrgINFO] sweep = 5, E = -44.12773208020941, max_dim = 912
[21:03:51 cuDMRG.apps.dmrgINFO] sweep = 6, E = -44.12773892512647, max_dim = 1000
[21:04:46 cuDMRG.apps.dmrgINFO] sweep = 7, E = -44.12773980075967, max_dim = 1000
[21:05:32 cuDMRG.apps.dmrgINFO] sweep = 8, E = -44.12773988714249, max_dim = 1000
[21:06:07 cuDMRG.apps.dmrgINFO] sweep = 9, E = -44.12773989317154, max_dim = 1000

This is actually faster than on CPU with ITensor (built with parallel Intel MKL fully enabled on AMD Ryzen 5600X).

About

Testing cupy-based DMRG implementation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages