An experimental CuPy-based DMRG implementation.
Requirements: Python >= 3.7, NumPy >= 1.19; for GPU support, conda install -c conda-forge cupy cutensor cudatoolkit=11
If CuPy is not available, this project automatically falls back to NumPy.
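The fallback can be done with a single conditional import. A minimal sketch of the pattern (the names `xp` and `GPU_ENABLED` are illustrative, not this project's actual module layout):

```python
try:
    import cupy as xp          # GPU backend if CuPy is installed
    GPU_ENABLED = True
except ImportError:
    import numpy as xp         # otherwise fall back to NumPy on the CPU
    GPU_ENABLED = False

def zeros(shape, dtype=float):
    """Allocate an array on whichever backend was selected at import time."""
    return xp.zeros(shape, dtype=dtype)
```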
To get a sense of how CuPy compares with NumPy, it is recommended to run the benchmarks at https://gist.github.com/fukatani/4702aa05aed255cd25f42e77d0a22e37
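For a quick sanity check without the gist, a hand-rolled comparison of a single dense matrix multiplication might look like the sketch below (the matrix size is an arbitrary choice; note the explicit synchronization needed to time asynchronous GPU kernels):

```python
import time
import numpy as np
import cupy as cp

n = 4000
a_cpu = np.random.rand(n, n)
a_gpu = cp.asarray(a_cpu)

t0 = time.perf_counter()
np.dot(a_cpu, a_cpu)
print(f"NumPy matmul: {time.perf_counter() - t0:.3f} s")

cp.dot(a_gpu, a_gpu)                 # warm-up (kernel compilation, memory pool)
cp.cuda.Device().synchronize()
t0 = time.perf_counter()
cp.dot(a_gpu, a_gpu)
cp.cuda.Device().synchronize()       # wait for the asynchronous GPU kernel to finish
print(f"CuPy matmul:  {time.perf_counter() - t0:.3f} s")
```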
In general, two types of operations dominate DMRG: (1) tensor contraction and (2) SVD or eigen-decomposition. On CUDA the first is fast and efficient (this project can switch between "transpose + matrix multiplication + transpose" and the cuTENSOR implementation of https://github.com/springer13/tcl); see the sketch below. SVD and eigen-decomposition, however, benefit far less from the GPU and carry significant overhead: for dense SVD and eigensolvers, the GPU's speedup over the CPU only becomes visible once the matrix dimension reaches the thousands, although the speedup keeps growing with matrix size. Hopefully sparse solvers can be added in the future, which would let the block-sparse structure of quantum-number-conserving DMRG benefit from the massive parallelism of the GPU.
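As an illustration of the first strategy, here is a minimal sketch (not this project's actual kernels) that contracts one tensor index via the transpose + matrix multiplication + transpose route, checks it against einsum, and calls the dense GPU SVD discussed above; swapping the import to NumPy runs the same code on the CPU:

```python
import cupy as cp   # replace with "import numpy as cp" for the CPU path

def contract_transpose_matmul(T, M):
    """Contract T[a, b, c] with M[b, d] over b -> result[a, d, c],
    using the transpose + matmul + transpose strategy."""
    a, b, c = T.shape
    _, d = M.shape
    tmp = T.transpose(0, 2, 1).reshape(a * c, b)    # bring contracted index last
    out = tmp @ M                                   # (a*c, b) @ (b, d) -> (a*c, d)
    return out.reshape(a, c, d).transpose(0, 2, 1)  # back to (a, d, c)

T = cp.random.rand(200, 4, 200)
M = cp.random.rand(4, 4)
ref = cp.einsum("abc,bd->adc", T, M)                # reference contraction
assert cp.allclose(contract_transpose_matmul(T, M), ref)

# The dense GPU SVD referred to above; only competitive with the CPU
# once the matrix dimensions reach the thousands.
U, S, Vh = cp.linalg.svd(cp.random.rand(1000, 1000), full_matrices=False)
```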
Latest benchmark of the current code on an EVGA RTX 3090 FTW3 Ultra (L = 100, SVD error = 1e-16, max bond dimension = 1000, Heisenberg chain):
[21:01:54 cuDMRG.apps.dmrgINFO] sweep = 0, E = -43.14847950275731, max_dim = 4
[21:01:54 cuDMRG.apps.dmrgINFO] sweep = 1, E = -44.09681238309175, max_dim = 16
[21:01:55 cuDMRG.apps.dmrgINFO] sweep = 2, E = -44.12544610855322, max_dim = 64
[21:01:59 cuDMRG.apps.dmrgINFO] sweep = 3, E = -44.12734668170454, max_dim = 256
[21:02:17 cuDMRG.apps.dmrgINFO] sweep = 4, E = -44.12767996796869, max_dim = 600
[21:02:57 cuDMRG.apps.dmrgINFO] sweep = 5, E = -44.12773208020941, max_dim = 912
[21:03:51 cuDMRG.apps.dmrgINFO] sweep = 6, E = -44.12773892512647, max_dim = 1000
[21:04:46 cuDMRG.apps.dmrgINFO] sweep = 7, E = -44.12773980075967, max_dim = 1000
[21:05:32 cuDMRG.apps.dmrgINFO] sweep = 8, E = -44.12773988714249, max_dim = 1000
[21:06:07 cuDMRG.apps.dmrgINFO] sweep = 9, E = -44.12773989317154, max_dim = 1000
This is actually faster than running the same calculation on the CPU with ITensor (built with multithreaded Intel MKL fully enabled, on an AMD Ryzen 5600X).