From 443130275cf890dfe289429ceff8b6def4294faa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ole=20Sch=C3=BCtt?= Date: Tue, 14 Nov 2023 12:10:42 +0100 Subject: [PATCH] precommit: Wrap Markdown lines after 100 characters --- INSTALL.md | 561 ++++++++++----------- README.md | 51 +- README_cmake.md | 176 +++---- benchmarks/Fayalite-FIST/README.md | 20 +- benchmarks/QMMM_CBD_PHY/README.md | 20 +- benchmarks/QMMM_ClC/README.md | 28 +- benchmarks/QMMM_MQAE/README.md | 22 +- benchmarks/QS/README.md | 52 +- benchmarks/QS_DM_LS/README.md | 54 +- benchmarks/QS_LiH_HFX/README.md | 75 ++- benchmarks/QS_low_scaling_postHF/README.md | 42 +- benchmarks/QS_mp2_rpa/128-H2O/README.md | 7 +- benchmarks/QS_mp2_rpa/32-H2O/README.md | 14 +- benchmarks/QS_mp2_rpa/64-H2O/README.md | 23 +- benchmarks/QS_mp2_rpa/README.md | 18 +- benchmarks/QS_ot_ls/README.md | 8 +- benchmarks/QS_pao_ml_tio2/README.md | 7 +- benchmarks/QS_single_node/README.md | 7 +- benchmarks/QS_stmv/README.md | 32 +- benchmarks/README.md | 68 ++- data/NNP/bulkH2O-jcp2020-cnnp/README.md | 13 +- docs/README.md | 5 +- docs/getting-started/spack.md | 49 +- docs/methods/pao-ml.md | 182 +++---- docs/units.md | 29 +- src/README.md | 3 +- src/dbm/README.md | 22 +- src/dbt/README.md | 6 +- src/grid/README.md | 35 +- src/pw/fpga/README.md | 8 +- src/start/python/README.md | 21 +- tests/Fist/EAM_LIB/README.md | 8 +- tests/README.md | 36 +- tools/benchmark_plots/README.md | 34 +- tools/dashboard/README.md | 37 +- tools/docker/README.md | 9 +- tools/doxify/README.md | 128 +++-- tools/input_editing/emacs/README.md | 40 +- tools/input_editing/vim/README.md | 7 +- tools/plan_mpi_omp/README.md | 159 +++--- tools/precommit/README.md | 62 +-- tools/precommit/precommit_server.py | 2 +- tools/toolchain/README.md | 231 ++++----- 43 files changed, 1116 insertions(+), 1295 deletions(-) diff --git a/INSTALL.md b/INSTALL.md index 9ba785ddf2..2ecbb3fa5b 100644 --- a/INSTALL.md +++ b/INSTALL.md @@ -3,17 +3,18 @@ ## 1. 
Acquire the code For users, the preferred method is to [download a release](https://github.com/cp2k/cp2k/releases/)
-(use the versioned tarballs, `cp2k-X.Y.tar.bz2`). For developers, the preferred
-method is to [download from Git](./README.md#downloading-cp2k-source-code).
+(use the versioned tarballs, `cp2k-X.Y.tar.bz2`). For developers, the preferred method is to
+[download from Git](./README.md#downloading-cp2k-source-code).

For more details on downloading CP2K, see .

## 2. Install prerequisites

-The easiest way to build CP2K with all its dependencies is as a [Docker container](./tools/docker/README.md).
+The easiest way to build CP2K with all its dependencies is as a
+[Docker container](./tools/docker/README.md).

-Alternatively, the [toolchain script](./tools/toolchain/install_cp2k_toolchain.sh)
-can also be run directly.
+Alternatively, the [toolchain script](./tools/toolchain/install_cp2k_toolchain.sh) can also be run
+directly.

For a complete introduction to the toolchain script, see the [README](./tools/toolchain/README.md).

@@ -33,16 +34,15 @@ cd tools/toolchain/
--with-fftw=system --with-reflapack=no --enable-cuda
```

-- Once the script has completed successfully, follow the instructions given at
-  the end of its output. Note that the pre-built arch files provided by the
-  toolchain are for the GNU compiler, users must adapt them for other compilers.
-  It is possible to use the provided [arch files](./arch) as guidance.
+- Once the script has completed successfully, follow the instructions given at the end of its
+  output. Note that the pre-built arch files provided by the toolchain are for the GNU compiler;
+  users must adapt them for other compilers. It is possible to use the provided [arch files](./arch)
+  as guidance.

There are [arch files](./arch) for a few specific platforms (e.g.
[Linux-gnu-x86_64](./arch/Linux-gnu-x86_64.psmp),
-[Linux-intel-x86_64](./arch/Linux-intel-x86_64.psmp))
-which include a toolchain build.
-Sourcing such an arch file in the cp2k folder launches a toolchain build, e.g.
+[Linux-intel-x86_64](./arch/Linux-intel-x86_64.psmp)) which include a toolchain build. Sourcing such
+an arch file in the cp2k folder launches a toolchain build, e.g.

```
source ./arch/Linux-gnu-x86_64.psmp
```

@@ -58,39 +58,37 @@ Check also the corresponding [HowTos](https://www.cp2k.org/howto/) for
[Apple M1 (macOS)](https://www.cp2k.org/howto:compile_on_macos/) and
[Cray XC40/50 (Piz Daint, CSCS)](https://www.cp2k.org/howto:compile_on_cray_cscs/).

-Sub-points here discuss prerequisites needed to build CP2K. Copies of the
-recommended versions of 3rd party software can be downloaded from .
+Sub-points here discuss prerequisites needed to build CP2K. Copies of the recommended versions of
+3rd party software can be downloaded from .

-Generally, CP2K supports only one version for each of its dependencies.
-These are defined by the [toolchain scripts](./tools/toolchain/scripts/).
-Other versions might work too, but we don't test them. So, your mileage may vary.
+Generally, CP2K supports only one version for each of its dependencies. These are defined by the
+[toolchain scripts](./tools/toolchain/scripts/). Other versions might work too, but we don't test
+them. So, your mileage may vary.

### 2a. GNU make (required, build system)

-GNU make should be on your system (gmake or make on linux) and used for the build,
-go to download from .
+GNU make should be on your system (gmake or make on linux) and used for the build, go to
+ download from .

### 2b. Python (required, build system)

-Python 3.5+ is needed to run the dependency generator. On most system Python is
-already installed. For more information visit:
+Python 3.5+ is needed to run the dependency generator. On most systems Python is already installed.
+For more information visit:

### 2c. Fortran and C Compiler (required, build system)

-A Fortran 2008 compiler and matching C99 compiler should be installed on your system.
-We have good experience with gcc/gfortran (gcc >=4.6 works, later version recommended).
-Be aware that some compilers have bugs that might cause them to fail (internal
-compiler errors, segfaults) or, worse, yield a mis-compiled CP2K. Report bugs to
-compiler vendors; they (and we) have an interest in fixing them. A list of tested
-compiler can be found [here](https://www.cp2k.org/dev:compiler_support).
+A Fortran 2008 compiler and matching C99 compiler should be installed on your system. We have good
+experience with gcc/gfortran (gcc >=4.6 works, later versions recommended). Be aware that some
+compilers have bugs that might cause them to fail (internal compiler errors, segfaults) or, worse,
+yield a mis-compiled CP2K. Report bugs to compiler vendors; they (and we) have an interest in fixing
+them. A list of tested compilers can be found [here](https://www.cp2k.org/dev:compiler_support).

Always run a `make -j test` (See point 5.) after compilation to identify these problems.

### 2d. BLAS and LAPACK (required, base functionality)

-BLAS and LAPACK should be installed. Using vendor-provided libraries can make a
-very significant difference (up to 100%, e.g., ACML, MKL, ESSL), not all optimized
-libraries are bug free. Use the latest versions available, use the interfaces
-matching your compiler, and download all patches!
+BLAS and LAPACK should be installed. Using vendor-provided libraries can make a very significant
+difference (up to 100%, e.g., ACML, MKL, ESSL), but not all optimized libraries are bug free. Use
+the latest versions available, use the interfaces matching your compiler, and download all patches!

- The canonical BLAS and LAPACK can be obtained from the Netlib repository:
  -
  -
@@ -101,128 +99,118 @@ matching your compiler, and download all patches!
  -
  -

-Please note that the BLAS/LAPACK implementation used by CP2K needs to be
-thread-safe (OpenMP).
Examples are the sequential variant of the Intel MKL, -the Cray libsci, the OpenBLAS OpenMP variant and the reference BLAS/LAPACK packages. -If compiling with MKL, users must -define `-D__MKL` to ensure the code is thread-safe. MKL with multiple OpenMP -threads in CP2K requires that CP2K was compiled with the Intel compiler. -If the `cpp` precompiler is used in a separate precompilation step in combination -with the Intel Fortran compiler, `-D__INTEL_COMPILER` must be added explicitly -(the Intel compiler sets `__INTEL_COMPILER` otherwise automatically). +Please note that the BLAS/LAPACK implementation used by CP2K needs to be thread-safe (OpenMP). +Examples are the sequential variant of the Intel MKL, the Cray libsci, the OpenBLAS OpenMP variant +and the reference BLAS/LAPACK packages. If compiling with MKL, users must define `-D__MKL` to ensure +the code is thread-safe. MKL with multiple OpenMP threads in CP2K requires that CP2K was compiled +with the Intel compiler. If the `cpp` precompiler is used in a separate precompilation step in +combination with the Intel Fortran compiler, `-D__INTEL_COMPILER` must be added explicitly (the +Intel compiler sets `__INTEL_COMPILER` otherwise automatically). -On the Mac, BLAS and LAPACK may be provided by Apple's Accelerate framework. -If using this framework, `-D__ACCELERATE` must be defined to account for some -interface incompatibilities between Accelerate and reference BLAS/LAPACK. +On the Mac, BLAS and LAPACK may be provided by Apple's Accelerate framework. If using this +framework, `-D__ACCELERATE` must be defined to account for some interface incompatibilities between +Accelerate and reference BLAS/LAPACK. -When building on/for Windows using the Minimalist GNU for Windows (MinGW) environment, -you must set `-D__MINGW`, `-D__NO_STATM_ACCESS` and `-D__NO_SOCKETS` to avoid -undefined references during linking, respectively errors while printing the statistics. 
+When building on/for Windows using the Minimalist GNU for Windows (MinGW) environment, you must set
+`-D__MINGW`, `-D__NO_STATM_ACCESS` and `-D__NO_SOCKETS` to avoid undefined references during
+linking and errors while printing the statistics, respectively.

### 2e. MPI and SCALAPACK (optional, required for MPI parallel builds)

-MPI (version 3) and SCALAPACK are needed for parallel code.
-(Use the latest versions available and download all patches!).
+MPI (version 3) and SCALAPACK are needed for parallel code. (Use the latest versions available and
+download all patches!).

-:warning: Note that your MPI installation must match the used Fortran compiler.
-If your computing platform does not provide MPI,
-there are several freely available alternatives:
+:warning: Note that your MPI installation must match the Fortran compiler used. If your computing
+platform does not provide MPI, there are several freely available alternatives:

-- MPICH2 MPI:
-  (may require `-fallow-argument-mismatch` when building with GCC 10)
+- MPICH2 MPI: (may require `-fallow-argument-mismatch` when
+  building with GCC 10)
- OpenMPI MPI:
- ScaLAPACK:
  -
  -
-  - ScaLAPACK can be part of ACML or cluster MKL.
-    These libraries are recommended if available.
-  - Recently a [ScaLAPACK installer](http://www.netlib.org/scalapack/scalapack_installer.tgz)
-    has been added that simplifies the installation.
+  - ScaLAPACK can be part of ACML or cluster MKL. These libraries are recommended if available.
+  - Recently a [ScaLAPACK installer](http://www.netlib.org/scalapack/scalapack_installer.tgz) has
+    been added that simplifies the installation.

-CP2K assumes that the MPI library implements MPI version 3. Older
-versions of MPI (e.g., MPI 2.0) are not supported and the old flag `-D__MPI_VERSION` in
-the arch file will be ignored. CP2K can make use of the mpi_f08 module. If its use is requested,
-set the flag `-D__MPI_F08`.
+CP2K assumes that the MPI library implements MPI version 3.
Older versions of MPI (e.g., MPI 2.0) +are not supported and the old flag `-D__MPI_VERSION` in the arch file will be ignored. CP2K can make +use of the mpi_f08 module. If its use is requested, set the flag `-D__MPI_F08`. ### 2f. FFTW (optional, improved performance of FFTs) -FFTW can be used to improve FFT speed on a wide range of architectures. -It is strongly recommended to install and use FFTW3. The current version of CP2K -works with FFTW 3.X (use `-D__FFTW3`). It can be downloaded from +FFTW can be used to improve FFT speed on a wide range of architectures. It is strongly recommended +to install and use FFTW3. The current version of CP2K works with FFTW 3.X (use `-D__FFTW3`). It can +be downloaded from -:warning: Note that FFTW must know the Fortran compiler you will use in order to -install properly (e.g., `export F77=gfortran` before configure if you intend to -use gfortran). +:warning: Note that FFTW must know the Fortran compiler you will use in order to install properly +(e.g., `export F77=gfortran` before configure if you intend to use gfortran). -:warning: Note that on machines and compilers which support SSE you can configure -FFTW3 with `--enable-sse2`. Compilers/systems that do not align memory (NAG f95, -Intel IA32/gfortran) should either not use `--enable-sse2` or otherwise set the -define `-D__FFTW3_UNALIGNED` in the arch file. Since CP2K is OpenMP parallelized, -the FFTW3 threading library libfftw3_threads (or libfftw3_omp) is required. +:warning: Note that on machines and compilers which support SSE you can configure FFTW3 with +`--enable-sse2`. Compilers/systems that do not align memory (NAG f95, Intel IA32/gfortran) should +either not use `--enable-sse2` or otherwise set the define `-D__FFTW3_UNALIGNED` in the arch file. +Since CP2K is OpenMP parallelized, the FFTW3 threading library libfftw3_threads (or libfftw3_omp) is +required. ### 2g. 
LIBINT (optional, enables methods including HF exchange) -- Hartree-Fock exchange (optional, use `-D__LIBINT`) - requires the LIBINT package to be installed. -- Recommended way to build LIBINT: Download a CP2K-configured LIBINT library - from [libint-cp2k](https://github.com/cp2k/libint-cp2k). Build and install - LIBINT by following the instructions provided there. Note that using a library - configured for higher maximum angular momentum will increase build time and - binary size of CP2K executable (assuming static linking). -- CP2K is not hardwired to these provided libraries and any other LIBINT - library (version >= 2.5.0) should be compatible as long as it was compiled - with `--enable-eri=1` and default ordering. -- Avoid debugging information (`-g` flag) for compiling LIBINT since this will - increase library size by a large factor. -- In the arch file of CP2K: add `-D__LIBINT` to the `DFLAGS`. - Add `-L$(LIBINT_DIR)/lib -lint2 -lstdc++` to `LIBS` and `-I$(LIBINT_DIR)/include` - to `FCFLAGS`. `lstdc++` is needed if you use the GNU C++ compiler. -- Libint 1 is no longer supported and the previously needed flags - `-D__LIBINT_MAX_AM` and `-D__LIBDERIV_MAX_AM1` are ignored. -- `-D__MAX_CONTR=4` (default=2) can be used to compile efficient contraction - kernels up to l=4, but the build time will increase accordingly. +- Hartree-Fock exchange (optional, use `-D__LIBINT`) requires the LIBINT package to be installed. +- Recommended way to build LIBINT: Download a CP2K-configured LIBINT library from + [libint-cp2k](https://github.com/cp2k/libint-cp2k). Build and install LIBINT by following the + instructions provided there. Note that using a library configured for higher maximum angular + momentum will increase build time and binary size of CP2K executable (assuming static linking). 
+- CP2K is not hardwired to these provided libraries and any other LIBINT library (version >= 2.5.0) + should be compatible as long as it was compiled with `--enable-eri=1` and default ordering. +- Avoid debugging information (`-g` flag) for compiling LIBINT since this will increase library size + by a large factor. +- In the arch file of CP2K: add `-D__LIBINT` to the `DFLAGS`. Add + `-L$(LIBINT_DIR)/lib -lint2 -lstdc++` to `LIBS` and `-I$(LIBINT_DIR)/include` to `FCFLAGS`. + `lstdc++` is needed if you use the GNU C++ compiler. +- Libint 1 is no longer supported and the previously needed flags `-D__LIBINT_MAX_AM` and + `-D__LIBDERIV_MAX_AM1` are ignored. +- `-D__MAX_CONTR=4` (default=2) can be used to compile efficient contraction kernels up to l=4, but + the build time will increase accordingly. ### 2h. LIBXSMM (optional, improved performance for matrix multiplication) - A library for matrix operations and deep learning primitives: . -- Add `-D__LIBXSMM` to enable it, with suitable include and library paths, - e.g., `FCFLAGS += -I${LIBXSMM_DIR}/include -D__LIBXSMM` - and `LIBS += -L${LIBXSMM_DIR}/lib -lxsmmf -lxsmm -ldl` +- Add `-D__LIBXSMM` to enable it, with suitable include and library paths, e.g., + `FCFLAGS += -I${LIBXSMM_DIR}/include -D__LIBXSMM` and + `LIBS += -L${LIBXSMM_DIR}/lib -lxsmmf -lxsmm -ldl` - LIBSMM is not used if LIBXSMM is enabled. ### 2i. CUDA (optional, improved performance on GPU systems) -- Specify OFFLOAD_CC (e.g., `OFFLOAD_CC = nvcc`) and - OFFLOAD_FLAGS (e.g., `OFFLOAD_FLAGS = -O3 -g -w --std=c++11`) variables. - Remember to include the support for the C++11 standard. +- Specify OFFLOAD_CC (e.g., `OFFLOAD_CC = nvcc`) and OFFLOAD_FLAGS (e.g., + `OFFLOAD_FLAGS = -O3 -g -w --std=c++11`) variables. Remember to include the support for the C++11 + standard. - Use `-D__OFFLOAD_CUDA` to generally enable support for Nvidia GPUs -- Use the `-D__DBCSR_ACC` and `OFFLOAD_TARGET = cuda` to enable - accelerator support for matrix multiplications. 
+- Use the `-D__DBCSR_ACC` and `OFFLOAD_TARGET = cuda` to enable accelerator support for matrix
+  multiplications.
- Add `-lstdc++ -lcudart -lnvrtc -lcuda -lcublas` to LIBS.
-- Specify the GPU type (e.g., `GPUVER = P100`),
-  possible values are K20X, K40, K80, P100, V100, A100.
-- Specify the C++ compiler (e.g., `CXX = g++`) and the CXXFLAGS to support
-  the C++11 standard.
-- CUFFT 7.0 has a known bug and is therefore disabled by default.
-  NVIDIA's webpage list a patch (an upgraded version cufft i.e. >= 7.0.35)
-  - use this together with `-D__HAS_PATCHED_CUFFT_70`.
-- Use `-D__OFFLOAD_PROFILING` to turn on Nvidia Tools Extensions.
-  It requires to link `-lnvToolsExt`.
+- Specify the GPU type (e.g., `GPUVER = P100`), possible values are K20X, K40, K80, P100, V100,
+  A100.
+- Specify the C++ compiler (e.g., `CXX = g++`) and the CXXFLAGS to support the C++11 standard.
+- CUFFT 7.0 has a known bug and is therefore disabled by default. NVIDIA's webpage lists a patch (an
+  upgraded cufft version, i.e. >= 7.0.35); use this together with `-D__HAS_PATCHED_CUFFT_70`.
+- Use `-D__OFFLOAD_PROFILING` to turn on Nvidia Tools Extensions. It requires linking
+  `-lnvToolsExt`.
- Link to a blas/scalapack library that accelerates large DGEMMs (e.g., libsci_acc)
- Use `-D__NO_OFFLOAD_GRID` to disable the GPU backend of the grid library.
- Use `-D__NO_OFFLOAD_DBM` to disable the GPU backend of the sparse tensor library.
-- Use `-D__NO_OFFLOAD_PW` to disable the GPU backend of FFTs
-  and associated gather/scatter operations.
+- Use `-D__NO_OFFLOAD_PW` to disable the GPU backend of FFTs and associated gather/scatter
+  operations.

### 2j. LIBXC (optional, wider choice of xc functionals)

-- The version 5.1.0 (or later) of LIBXC can be downloaded from
-- CP2K does not make use of fourth derivates such that LIBXC may be configured
-  with './configure --disable-lxc \'.
-- During the installation, the directories `$(LIBXC_DIR)/lib`
-  and `$(LIBXC_DIR)/include` are created.
-- Add `-D__LIBXC` to DFLAGS, `-I$(LIBXC_DIR)/include` to FCFLAGS
-  and `-L$(LIBXC_DIR)/lib -lxcf03 -lxc` to LIBS.
+- The version 5.1.0 (or later) of LIBXC can be downloaded from
+  
+- CP2K does not make use of fourth derivatives, so LIBXC may be configured with './configure
+  --disable-lxc \'.
+- During the installation, the directories `$(LIBXC_DIR)/lib` and `$(LIBXC_DIR)/include` are
+  created.
+- Add `-D__LIBXC` to DFLAGS, `-I$(LIBXC_DIR)/include` to FCFLAGS and
+  `-L$(LIBXC_DIR)/lib -lxcf03 -lxc` to LIBS.
- :warning: Note that the deprecated flags `-D__LIBXC2` and `-D__LIBXC3` are ignored.

### 2k. ELPA (optional, improved performance for diagonalization)

@@ -234,19 +222,19 @@ Library ELPA for the solution of the eigenvalue problem

- During the installation the `libelpa_openmp.a` is created.
- Minimal supported version of ELPA is 2018.05.001.
- Add `-D__ELPA` to `DFLAGS`
-- Add `-D__ELPA_NVIDIA_GPU`, `-D__ELPA_AMD_GPU`, or `-D__ELPA_INTEL_GPU`
-  to `DFLAGS` to enable GPU support for the respective vendor.
+- Add `-D__ELPA_NVIDIA_GPU`, `-D__ELPA_AMD_GPU`, or `-D__ELPA_INTEL_GPU` to `DFLAGS` to enable GPU
+  support for the respective vendor.
- Add `-I$(ELPA_INCLUDE_DIR)/modules` to `FCFLAGS`
- Add `-I$(ELPA_INCLUDE_DIR)/elpa` to `FCFLAGS`
- Add `-L$(ELPA_DIR)` to `LDFLAGS`
- Add `-lelpa` to `LIBS`
-- For specific architectures it can be better to install specifically optimized
-  kernels (see BG) and/or employ a higher optimization level to compile it.
+- For specific architectures it can be better to install specifically optimized kernels (see BG)
+  and/or employ a higher optimization level to compile it.

### 2l. cuSOLVERMp (experimental, improved performance for diagonalization on Nvidia GPUs)

-NVIDIA cuSOLVERMp is a high-performance, distributed-memory, GPU-accelerated library
-that provides tools for the solution of dense linear systems and eigenvalue problems.
+NVIDIA cuSOLVERMp is a high-performance, distributed-memory, GPU-accelerated library that provides +tools for the solution of dense linear systems and eigenvalue problems. - cuSOLVERMp replaces the ScaLapack `SYEVD` to improve the performance of the diagonalization - A version of cuSOLVERMp can be downloaded from . @@ -255,29 +243,28 @@ that provides tools for the solution of dense linear systems and eigenvalue prob ### 2m. PEXSI (optional, low scaling SCF method) -The Pole EXpansion and Selected Inversion (PEXSI) method requires the PEXSI -library and two dependencies (ParMETIS or PT-Scotch and SuperLU_DIST). +The Pole EXpansion and Selected Inversion (PEXSI) method requires the PEXSI library and two +dependencies (ParMETIS or PT-Scotch and SuperLU_DIST). -- Download PEXSI (www.pexsi.org) and install it and its dependencies by - following its README.md. +- Download PEXSI (www.pexsi.org) and install it and its dependencies by following its README.md. - PEXSI versions 0.10.x have been tested with CP2K. Older versions are not supported. - PEXSI needs to be built with `make finstall`. In the arch file of CP2K: -- Add `-lpexsi_${SUFFIX} -llapack -lblas -lsuperlu_dist_3.3 -lparmetis -lmetis`, - and their paths (with `-L$(LIB_DIR)`) to LIBS. -- It is important that a copy of LAPACK and BLAS is placed before and after these - libraries (replace `-llapack` and `-lblas` with the optimized versions as needed). -- In order to link in PT-Scotch instead of ParMETIS replace `-lparmetis -lmetis` - with: `-lptscotchparmetis -lptscotch -lptscotcherr -lscotchmetis -lscotch -lscotcherr` +- Add `-lpexsi_${SUFFIX} -llapack -lblas -lsuperlu_dist_3.3 -lparmetis -lmetis`, and their paths + (with `-L$(LIB_DIR)`) to LIBS. +- It is important that a copy of LAPACK and BLAS is placed before and after these libraries (replace + `-llapack` and `-lblas` with the optimized versions as needed). 
+- In order to link in PT-Scotch instead of ParMETIS replace `-lparmetis -lmetis` with:
+  `-lptscotchparmetis -lptscotch -lptscotcherr -lscotchmetis -lscotch -lscotcherr`
- Add `-I$(PEXSI_DIR)/fortran/` to FCFLAGS.
- Add `-D__LIBPEXSI` to DFLAGS.

Below are some additional hints that may help in the compilation process:

-- For building PT-Scotch, the flag `-DSCOTCH_METIS_PREFIX` in `Makefile.inc`
-  must not be set and the flag `-DSCOTCH_PTHREAD` must be removed.
+- For building PT-Scotch, the flag `-DSCOTCH_METIS_PREFIX` in `Makefile.inc` must not be set and the
+  flag `-DSCOTCH_PTHREAD` must be removed.
- For building SuperLU_DIST with PT-Scotch, you must set the following in `make.inc`:

```shell
PARMETISLIB = -lptscotchparmetis -lptscotch -lptscotcherr
```

@@ -287,8 +274,8 @@

### 2n. QUIP (optional, wider range of interaction potentials)

-QUIP - QUantum mechanics and Interatomic Potentials Support for QUIP can be
-enabled via the flag `-D__QUIP`.
+QUIP - QUantum mechanics and Interatomic Potentials. Support for QUIP can be enabled via the flag
+`-D__QUIP`.

For more information see .

@@ -316,40 +303,37 @@ SIRIUS is a domain specific library for electronic structure calculations.

### 2r. FPGA (optional, plane wave FFT calculations)

-- Use `-D__PW_FPGA` to enable FPGA support for PW (fft) calculations.
-  Currently tested only for Intel Stratix 10 and Arria 10 GX1150 FPGAs.
-- Supports single precision and double precision fft calculations with the use
-  of dedicated APIs.
+- Use `-D__PW_FPGA` to enable FPGA support for PW (fft) calculations. Currently tested only for
+  Intel Stratix 10 and Arria 10 GX1150 FPGAs.
+- Supports single precision and double precision fft calculations with the use of dedicated APIs.
- Double precision is the default API chosen when set using the `-D__PW_FPGA` flag.
-- Single precision can be set using an additional `-D__PW_FPGA_SP` flag along
-  with the `-D__PW_FPGA` flag.
+- Single precision can be set using an additional `-D__PW_FPGA_SP` flag along with the `-D__PW_FPGA` + flag. - Kernel code must be synthesized separately and copied to a specific location. -- See - for the kernel code and instructions for synthesis. -- Read `src/pw/fpga/README.md` - for information on the specific location to copy the binaries to. +- See for the kernel code and instructions for synthesis. +- Read `src/pw/fpga/README.md` for information on the specific location to copy the binaries to. - Currently supported FFT3d sizes - 16^3, 32^3, 64^3. -- Include aocl compile flags and `-D__PW_FPGA -D__PW_FPGA_SP` to `CFLAGS`, - aocl linker flags to `LDFLAGS` and aocl libs to `LIBS`. +- Include aocl compile flags and `-D__PW_FPGA -D__PW_FPGA_SP` to `CFLAGS`, aocl linker flags to + `LDFLAGS` and aocl libs to `LIBS`. - When building FPGA and OFFLOAD together then `-D__NO_OFFLOAD_PW` must be used. ### 2s. COSMA (Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm) -- COSMA is an alternative for the pdgemm routine included in ScaLAPACK. - The library supports both CPU and GPUs. +- COSMA is an alternative for the pdgemm routine included in ScaLAPACK. The library supports both + CPU and GPUs. - Add `-D__COSMA` to the DFLAGS to enable support for COSMA. - See for more information. ### 2t. LibVori (Voronoi Integration for Electrostatic Properties from Electron Density) -- LibVori is a library which enables the calculation of electrostatic properties - (charge, dipole vector, quadrupole tensor, etc.) via integration of the total - electron density in the Voronoi cell of each atom. +- LibVori is a library which enables the calculation of electrostatic properties (charge, dipole + vector, quadrupole tensor, etc.) via integration of the total electron density in the Voronoi cell + of each atom. - Add `-D__LIBVORI` to the DFLAGS to enable support for LibVori. - See for more information. 
-- LibVori also enables support for the BQB file format for compressed trajectories, - please see for more information as well as - the `bqbtool` to inspect BQB files. +- LibVori also enables support for the BQB file format for compressed trajectories, please see + for more information as well as the `bqbtool` to inspect BQB + files. ### 2u. Torch (Machine Learning Framework needed for NequIP) @@ -358,89 +342,84 @@ SIRIUS is a domain specific library for electronic structure calculations. ### 2v. ROCM/HIP (Support for AMD GPU) -The code for the HIP based grid backend was developed and tested on Mi100 but -should work out of the box on Nvidia hardware as well. +The code for the HIP based grid backend was developed and tested on Mi100 but should work out of the +box on Nvidia hardware as well. - Use `-D__OFFLOAD_HIP` to generally enable support for AMD GPUs - Use `-D__NO_OFFLOAD_GRID` to disable the GPU backend of the grid library. - Use `-D__NO_OFFLOAD_DBM` to disable the GPU backend of the sparse tensor library. -- Use `-D__NO_OFFLOAD_PW` to disable the GPU backend of FFTs - and associated gather/scatter operations. +- Use `-D__NO_OFFLOAD_PW` to disable the GPU backend of FFTs and associated gather/scatter + operations. - Add `GPUVER=Mi50, Mi60, Mi100, Mi250` - Add `OFFLOAD_CC = hipcc` -- Add `-lamdhip64` to the `LIBS` variable -- Add `OFFLOAD_FLAGS = '-fopenmp -m64 -pthread -fPIC -D__GRID_HIP -O2 --offload-arch=gfx908 --rocm-path=$(ROCM_PATH)'` where `ROCM_PATH` is the path - where the rocm sdk resides. Architectures Mi250 (gfx90a), Mi100 (gfx908), - Mi50 (gfx906) the hip backend for the grid library supports nvidia hardware - as well. It uses the same code and can be used to validate the backend in case - of access to Nvidia hardware only. To get the compilation working, follow - the steps above and set the `OFFLOAD_FLAGS` with right `nvcc` parameters - (see the cuda section of this document). 
The environment variable `HIP_PLATFORM`
-  should be set to `HIP_PLATFORM=nvidia` to indicate to hipcc to use the
-  nvcc compiler instead.
-- Specify the C++ compiler (e.g., `CXX = g++`). Remember to set the
-  CXXFLAGS flags to support C++11 standard and OpenMP.
-- When the HIP backend is enabled for DBCSR using `-D__DBCSR_ACC`, then add
-  `-D__HIP_PLATFORM_AMD__` to `CXXFLAGS` and set `OFFLOAD_TARGET = hip`.
-- Use `-D__OFFLOAD_PROFILING` to turn on the AMD ROC TX and Tracer libray.
-  It requires to link `-lroctx64 -lroctracer64`.
+- Add `-lamdhip64` to the `LIBS` variable
+- Add
+  `OFFLOAD_FLAGS = '-fopenmp -m64 -pthread -fPIC -D__GRID_HIP -O2 --offload-arch=gfx908 --rocm-path=$(ROCM_PATH)'`
+  where `ROCM_PATH` is the path where the rocm sdk resides. Supported architectures are Mi250
+  (gfx90a), Mi100 (gfx908), and Mi50 (gfx906). The hip backend for the grid library supports Nvidia
+  hardware as well. It uses the same code and can be used to validate the backend in case of access
+  to Nvidia hardware only. To get the compilation working, follow the steps above and set the
+  `OFFLOAD_FLAGS` with the right `nvcc` parameters (see the cuda section of this document). The
+  environment variable `HIP_PLATFORM` should be set to `HIP_PLATFORM=nvidia` to indicate to hipcc
+  to use the nvcc compiler instead.
+- Specify the C++ compiler (e.g., `CXX = g++`). Remember to set the CXXFLAGS flags to support C++11
+  standard and OpenMP.
+- When the HIP backend is enabled for DBCSR using `-D__DBCSR_ACC`, then add `-D__HIP_PLATFORM_AMD__`
+  to `CXXFLAGS` and set `OFFLOAD_TARGET = hip`.
+- Use `-D__OFFLOAD_PROFILING` to turn on the AMD ROC TX and Tracer library. It requires linking
+  `-lroctx64 -lroctracer64`.

### 2w. OpenCL Devices

-OpenCL devices are currently supported for DBCSR and can cover GPUs and other devices.
-Kernels can be automatically tuned like for the CUDA/HIP backend in DBCSR.
-Note: the OpenCL backend uses some
-functionality from LIBXSMM (dependency).
- -- Installing an OpenCL runtime depends on the operating system and the device vendor. - Debian for instance brings two packages called `opencl-headers` and - `ocl-icd-opencl-dev` which can be present in addition to a vendor-specific - installation. The OpenCL header files are only necessary if CP2K/DBCSR is compiled - from source. Please note, some implementations ship with outdated OpenCL - headers which can prevent using latest features (if an application discovers such - features only at compile-time). When building from source, for instance - `libOpenCL.so` is sufficient (ICD loader) at link-time. However, an Installable - Client Driver (ICD) is finally necessary at runtime. -- Nvidia CUDA, AMD HIP, and Intel OneAPI are fully equipped with an OpenCL runtime - (if `opencl-headers` package is not installed, CPATH can be needed to point into - such an installation, similarly `LIBRARY_PATH` for finding `libOpenCL.so` - at link-time). Installing a minimal or stand-alone OpenCL is also possible, - e.g., following the instructions for Debian (or Ubuntu) as given for every - [release](https://github.com/intel/compute-runtime/releases) of the +OpenCL devices are currently supported for DBCSR and can cover GPUs and other devices. Kernels can +be automatically tuned like for the CUDA/HIP backend in DBCSR. Note: the OpenCL backend uses some +functionality from LIBXSMM (dependency). + +- Installing an OpenCL runtime depends on the operating system and the device vendor. Debian for + instance brings two packages called `opencl-headers` and `ocl-icd-opencl-dev` which can be present + in addition to a vendor-specific installation. The OpenCL header files are only necessary if + CP2K/DBCSR is compiled from source. Please note, some implementations ship with outdated OpenCL + headers which can prevent using latest features (if an application discovers such features only at + compile-time). 
When building from source, `libOpenCL.so` (the ICD loader) is sufficient
+  at link-time. However, an Installable Client Driver (ICD) is ultimately required at runtime.
+- Nvidia CUDA, AMD HIP, and Intel OneAPI are fully equipped with an OpenCL runtime (if the
+  `opencl-headers` package is not installed, `CPATH` may need to point into such an installation;
+  similarly, `LIBRARY_PATH` for finding `libOpenCL.so` at link-time). Installing a minimal or
+  stand-alone OpenCL is also possible, e.g., following the instructions for Debian (or Ubuntu) as
+  given for every [release](https://github.com/intel/compute-runtime/releases) of the
 [Intel Compute Runtime](https://github.com/intel/compute-runtime).
-- CP2K's toolchain supports `--enable-opencl` to select DBCSR's OpenCL backend.
-  This can be combined with `--enable-cuda` (`--gpu-ver` is then imposed) to
-  use a GPU for CP2K's grid and DBM/DBT components (no OpenCL support yet).
-- For manually writing an ARCH-file add `-D__OPENCL` and `-D__DBCSR_ACC` to `CFLAGS`,
-  and add `-lOpenCL` to the `LIBS` variable, i.e., `OFFLOAD_CC` and `OFFLOAD_FLAGS`
-  can duplicate `CC` and `CFLAGS` (no special offload compiler needed). Please also
-  set `OFFLOAD_TARGET = opencl` to enable the OpenCL backend in DBCSR. For OpenCL,
-  it is not necessary to specify a GPU version (e.g., `GPUVER = V100` would map to
-  `exts/dbcsr/src/acc/opencl/smm/params/tune_multiply_V100.csv`). In fact, `GPUVER`
-  limits tuned parameters to the specified GPU, and by default all tuned parameters
-  are embedded (`exts/dbcsr/src/acc/opencl/smm/params/*.csv`) and applied at runtime.
-  If auto-tuned parameters are not available for DBCSR, well-chosen defaults will
-  be used to populate kernels at runtime. Refer to the toolchain method (above)
-  for an ARCH-file that blends, e.g., OpenCL and CUDA.
-- Auto-tuned parameters are embedded into the binary, i.e., CP2K does not rely on
-  a hard-coded location.
Setting `OPENCL_LIBSMM_SMM_PARAMS=/path/to/csv-file`
-  environment variable can supply parameters for an already built application,
-  or `OPENCL_LIBSMM_SMM_PARAMS=0` can disable using tuned parameters.
-- The environment variable `ACC_OPENCL_VERBOSE=2` prints information about
-  kernels generated at runtime (and thereby checks the installation).
-- Refer to for, e.g., environment variables,
-  or how to tune kernels (auto tuned parameters).
+- CP2K's toolchain supports `--enable-opencl` to select DBCSR's OpenCL backend. This can be combined
+  with `--enable-cuda` (`--gpu-ver` is then imposed) to use a GPU for CP2K's grid and DBM/DBT
+  components (no OpenCL support yet).
+- When manually writing an ARCH-file, add `-D__OPENCL` and `-D__DBCSR_ACC` to `CFLAGS`, and add
+  `-lOpenCL` to the `LIBS` variable, i.e., `OFFLOAD_CC` and `OFFLOAD_FLAGS` can duplicate `CC` and
+  `CFLAGS` (no special offload compiler needed). Please also set `OFFLOAD_TARGET = opencl` to enable
+  the OpenCL backend in DBCSR. For OpenCL, it is not necessary to specify a GPU version (e.g.,
+  `GPUVER = V100` would map to `exts/dbcsr/src/acc/opencl/smm/params/tune_multiply_V100.csv`). In
+  fact, `GPUVER` limits tuned parameters to the specified GPU, and by default all tuned parameters
+  are embedded (`exts/dbcsr/src/acc/opencl/smm/params/*.csv`) and applied at runtime. If auto-tuned
+  parameters are not available for DBCSR, well-chosen defaults will be used to populate kernels at
+  runtime. Refer to the toolchain method (above) for an ARCH-file that blends, e.g., OpenCL and
+  CUDA.
+- Auto-tuned parameters are embedded into the binary, i.e., CP2K does not rely on a hard-coded
+  location. Setting the `OPENCL_LIBSMM_SMM_PARAMS=/path/to/csv-file` environment variable can
+  supply parameters for an already built application, or `OPENCL_LIBSMM_SMM_PARAMS=0` can disable
+  using tuned parameters.
+
+- The environment variable `ACC_OPENCL_VERBOSE=2` prints information about kernels generated at
+  runtime (and thereby checks the installation).
+- Refer to for, e.g., environment variables, or how to tune kernels
+  (auto-tuned parameters).
 
 ### 2x. matrix-matrix multiplication offloading on GPU using SPLA
 
-The SPLA library is a hard dependency of SIRIUS but can also be used as a
-standalone library. It provides a generic interface to the blas gemm family with
-offloading on GPU. Offloading supports both CUDA and ROCM.
+The SPLA library is a hard dependency of SIRIUS but can also be used as a standalone library. It
+provides a generic interface to the BLAS gemm family with offloading on GPU. Offloading supports
+both CUDA and ROCm.
 
-To make the functionality available, add the flag `-D__SPLA -D__OFFLOAD_GEMM` to
-the `DFLAGS` variable and compile SPLA with Fortran interface and GPU support.
-Please note that only the functions replacing the dgemm calls with
-`offload_dgemm` will eventually be offloaded to the GPU. The SPLA library has
-internal criteria to decide if it is worth to do the operation on GPU or not.
+To make the functionality available, add the flags `-D__SPLA -D__OFFLOAD_GEMM` to the `DFLAGS`
+variable and compile SPLA with the Fortran interface and GPU support. Please note that only the
+functions replacing the dgemm calls with `offload_dgemm` will eventually be offloaded to the GPU.
+The SPLA library has internal criteria to decide whether it is worthwhile to perform the operation
+on the GPU.
 Calls to `offload_dgemm` also accept pointers on GPU or a combination of them.
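The ARCH-file settings described in the OpenCL and SPLA sections above can be sketched as a single
fragment. This is an illustrative sketch only, not a drop-in ARCH file: the base values of `CC`,
`CFLAGS`, and `LIBS` depend on the local platform, and the exact spelling of the variables should be
checked against the provided [arch files](./arch).

```
# Sketch of ARCH-file additions (illustrative; adapt to your platform's ARCH file).
# DBCSR OpenCL backend:
CFLAGS  += -D__OPENCL -D__DBCSR_ACC
LIBS    += -lOpenCL                  # ICD loader at link-time; an ICD is required at runtime
OFFLOAD_CC     = $(CC)               # no special offload compiler needed for OpenCL
OFFLOAD_FLAGS  = $(CFLAGS)
OFFLOAD_TARGET = opencl
# SPLA gemm offloading (requires SPLA built with Fortran interface and GPU support):
DFLAGS  += -D__SPLA -D__OFFLOAD_GEMM
```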