From 443130275cf890dfe289429ceff8b6def4294faa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ole=20Sch=C3=BCtt?= Date: Tue, 14 Nov 2023 12:10:42 +0100 Subject: [PATCH] precommit: Wrap Markdown lines after 100 characters --- INSTALL.md | 561 ++++++++++----------- README.md | 51 +- README_cmake.md | 176 +++---- benchmarks/Fayalite-FIST/README.md | 20 +- benchmarks/QMMM_CBD_PHY/README.md | 20 +- benchmarks/QMMM_ClC/README.md | 28 +- benchmarks/QMMM_MQAE/README.md | 22 +- benchmarks/QS/README.md | 52 +- benchmarks/QS_DM_LS/README.md | 54 +- benchmarks/QS_LiH_HFX/README.md | 75 ++- benchmarks/QS_low_scaling_postHF/README.md | 42 +- benchmarks/QS_mp2_rpa/128-H2O/README.md | 7 +- benchmarks/QS_mp2_rpa/32-H2O/README.md | 14 +- benchmarks/QS_mp2_rpa/64-H2O/README.md | 23 +- benchmarks/QS_mp2_rpa/README.md | 18 +- benchmarks/QS_ot_ls/README.md | 8 +- benchmarks/QS_pao_ml_tio2/README.md | 7 +- benchmarks/QS_single_node/README.md | 7 +- benchmarks/QS_stmv/README.md | 32 +- benchmarks/README.md | 68 ++- data/NNP/bulkH2O-jcp2020-cnnp/README.md | 13 +- docs/README.md | 5 +- docs/getting-started/spack.md | 49 +- docs/methods/pao-ml.md | 182 +++---- docs/units.md | 29 +- src/README.md | 3 +- src/dbm/README.md | 22 +- src/dbt/README.md | 6 +- src/grid/README.md | 35 +- src/pw/fpga/README.md | 8 +- src/start/python/README.md | 21 +- tests/Fist/EAM_LIB/README.md | 8 +- tests/README.md | 36 +- tools/benchmark_plots/README.md | 34 +- tools/dashboard/README.md | 37 +- tools/docker/README.md | 9 +- tools/doxify/README.md | 128 +++-- tools/input_editing/emacs/README.md | 40 +- tools/input_editing/vim/README.md | 7 +- tools/plan_mpi_omp/README.md | 159 +++--- tools/precommit/README.md | 62 +-- tools/precommit/precommit_server.py | 2 +- tools/toolchain/README.md | 231 ++++----- 43 files changed, 1116 insertions(+), 1295 deletions(-) diff --git a/INSTALL.md b/INSTALL.md index 9ba785ddf2..2ecbb3fa5b 100644 --- a/INSTALL.md +++ b/INSTALL.md @@ -3,17 +3,18 @@ ## 1. 
Acquire the code For users, the preferred method is to [download a release](https://github.com/cp2k/cp2k/releases/)
-(use the versioned tarballs, `cp2k-X.Y.tar.bz2`). For developers, the preferred
-method is to [download from Git](./README.md#downloading-cp2k-source-code).
+(use the versioned tarballs, `cp2k-X.Y.tar.bz2`). For developers, the preferred method is to
+[download from Git](./README.md#downloading-cp2k-source-code).

For more details on downloading CP2K, see .

## 2. Install prerequisites

-The easiest way to build CP2K with all its dependencies is as a [Docker container](./tools/docker/README.md).
+The easiest way to build CP2K with all its dependencies is as a
+[Docker container](./tools/docker/README.md).

-Alternatively, the [toolchain script](./tools/toolchain/install_cp2k_toolchain.sh)
-can also be run directly.
+Alternatively, the [toolchain script](./tools/toolchain/install_cp2k_toolchain.sh) can also be run
+directly.

For a complete introduction to the toolchain script, see the [README](./tools/toolchain/README.md).

@@ -33,16 +34,15 @@ cd tools/toolchain/
--with-fftw=system --with-reflapack=no --enable-cuda
```

-- Once the script has completed successfully, follow the instructions given at
-  the end of its output. Note that the pre-built arch files provided by the
-  toolchain are for the GNU compiler, users must adapt them for other compilers.
-  It is possible to use the provided [arch files](./arch) as guidance.
+- Once the script has completed successfully, follow the instructions given at the end of its
+  output. Note that the pre-built arch files provided by the toolchain are for the GNU compiler;
+  users must adapt them for other compilers. It is possible to use the provided [arch files](./arch)
+  as guidance.

There are [arch files](./arch) for a few specific platforms (e.g.
[Linux-gnu-x86_64](./arch/Linux-gnu-x86_64.psmp),
-[Linux-intel-x86_64](./arch/Linux-intel-x86_64.psmp))
-which include a toolchain build.
-Sourcing such an arch file in the cp2k folder launches a toolchain build, e.g.
+[Linux-intel-x86_64](./arch/Linux-intel-x86_64.psmp)) which include a toolchain build. Sourcing such
+an arch file in the cp2k folder launches a toolchain build, e.g.

```
source ./arch/Linux-gnu-x86_64.psmp
```

@@ -58,39 +58,37 @@ Check also the corresponding [HowTos](https://www.cp2k.org/howto/) for
[Apple M1 (macOS)](https://www.cp2k.org/howto:compile_on_macos/) and
[Cray XC40/50 (Piz Daint, CSCS)](https://www.cp2k.org/howto:compile_on_cray_cscs/).

-Sub-points here discuss prerequisites needed to build CP2K. Copies of the
-recommended versions of 3rd party software can be downloaded from .
+Sub-points here discuss prerequisites needed to build CP2K. Copies of the recommended versions of
+3rd party software can be downloaded from .

-Generally, CP2K supports only one version for each of its dependencies.
-These are defined by the [toolchain scripts](./tools/toolchain/scripts/).
-Other versions might work too, but we don't test them. So, your mileage may vary.
+Generally, CP2K supports only one version for each of its dependencies. These are defined by the
+[toolchain scripts](./tools/toolchain/scripts/). Other versions might work too, but we don't test
+them. So, your mileage may vary.

### 2a. GNU make (required, build system)

-GNU make should be on your system (gmake or make on linux) and used for the build,
-go to download from .
+GNU make should be on your system (gmake or make on linux) and used for the build, go to
+ download from .

### 2b. Python (required, build system)

-Python 3.5+ is needed to run the dependency generator. On most system Python is
-already installed. For more information visit:
+Python 3.5+ is needed to run the dependency generator. On most systems Python is already installed.
+For more information visit:

### 2c. Fortran and C Compiler (required, build system)

-A Fortran 2008 compiler and matching C99 compiler should be installed on your system.
-We have good experience with gcc/gfortran (gcc >=4.6 works, later version recommended).
-Be aware that some compilers have bugs that might cause them to fail (internal
-compiler errors, segfaults) or, worse, yield a mis-compiled CP2K. Report bugs to
-compiler vendors; they (and we) have an interest in fixing them. A list of tested
-compiler can be found [here](https://www.cp2k.org/dev:compiler_support).
+A Fortran 2008 compiler and matching C99 compiler should be installed on your system. We have good
+experience with gcc/gfortran (gcc >=4.6 works, later versions recommended). Be aware that some
+compilers have bugs that might cause them to fail (internal compiler errors, segfaults) or, worse,
+yield a mis-compiled CP2K. Report bugs to compiler vendors; they (and we) have an interest in fixing
+them. A list of tested compilers can be found [here](https://www.cp2k.org/dev:compiler_support).

Always run a `make -j test` (See point 5.) after compilation to identify these problems.

### 2d. BLAS and LAPACK (required, base functionality)

-BLAS and LAPACK should be installed. Using vendor-provided libraries can make a
-very significant difference (up to 100%, e.g., ACML, MKL, ESSL), not all optimized
-libraries are bug free. Use the latest versions available, use the interfaces
-matching your compiler, and download all patches!
+BLAS and LAPACK should be installed. Using vendor-provided libraries can make a very significant
+difference (up to 100%, e.g., ACML, MKL, ESSL), but not all optimized libraries are bug free. Use
+the latest versions available, use the interfaces matching your compiler, and download all patches!

- The canonical BLAS and LAPACK can be obtained from the Netlib repository:
  -
  -
@@ -101,128 +99,118 @@ matching your compiler, and download all patches!
  -
  -

-Please note that the BLAS/LAPACK implementation used by CP2K needs to be
-thread-safe (OpenMP).
Examples are the sequential variant of the Intel MKL, -the Cray libsci, the OpenBLAS OpenMP variant and the reference BLAS/LAPACK packages. -If compiling with MKL, users must -define `-D__MKL` to ensure the code is thread-safe. MKL with multiple OpenMP -threads in CP2K requires that CP2K was compiled with the Intel compiler. -If the `cpp` precompiler is used in a separate precompilation step in combination -with the Intel Fortran compiler, `-D__INTEL_COMPILER` must be added explicitly -(the Intel compiler sets `__INTEL_COMPILER` otherwise automatically). +Please note that the BLAS/LAPACK implementation used by CP2K needs to be thread-safe (OpenMP). +Examples are the sequential variant of the Intel MKL, the Cray libsci, the OpenBLAS OpenMP variant +and the reference BLAS/LAPACK packages. If compiling with MKL, users must define `-D__MKL` to ensure +the code is thread-safe. MKL with multiple OpenMP threads in CP2K requires that CP2K was compiled +with the Intel compiler. If the `cpp` precompiler is used in a separate precompilation step in +combination with the Intel Fortran compiler, `-D__INTEL_COMPILER` must be added explicitly (the +Intel compiler sets `__INTEL_COMPILER` otherwise automatically). -On the Mac, BLAS and LAPACK may be provided by Apple's Accelerate framework. -If using this framework, `-D__ACCELERATE` must be defined to account for some -interface incompatibilities between Accelerate and reference BLAS/LAPACK. +On the Mac, BLAS and LAPACK may be provided by Apple's Accelerate framework. If using this +framework, `-D__ACCELERATE` must be defined to account for some interface incompatibilities between +Accelerate and reference BLAS/LAPACK. -When building on/for Windows using the Minimalist GNU for Windows (MinGW) environment, -you must set `-D__MINGW`, `-D__NO_STATM_ACCESS` and `-D__NO_SOCKETS` to avoid -undefined references during linking, respectively errors while printing the statistics. 
+When building on/for Windows using the Minimalist GNU for Windows (MinGW) environment, you must set
+`-D__MINGW`, `-D__NO_STATM_ACCESS` and `-D__NO_SOCKETS` to avoid undefined references during
+linking and errors while printing the statistics, respectively.

### 2e. MPI and SCALAPACK (optional, required for MPI parallel builds)

-MPI (version 3) and SCALAPACK are needed for parallel code.
-(Use the latest versions available and download all patches!).
+MPI (version 3) and SCALAPACK are needed for parallel code. (Use the latest versions available and
+download all patches!).

-:warning: Note that your MPI installation must match the used Fortran compiler.
-If your computing platform does not provide MPI,
-there are several freely available alternatives:
+:warning: Note that your MPI installation must match the Fortran compiler used. If your computing
+platform does not provide MPI, there are several freely available alternatives:

-- MPICH2 MPI:
-  (may require `-fallow-argument-mismatch` when building with GCC 10)
+- MPICH2 MPI: (may require `-fallow-argument-mismatch` when
+  building with GCC 10)
- OpenMPI MPI:
- ScaLAPACK:
  -
  -
-  - ScaLAPACK can be part of ACML or cluster MKL.
-    These libraries are recommended if available.
-  - Recently a [ScaLAPACK installer](http://www.netlib.org/scalapack/scalapack_installer.tgz)
-    has been added that simplifies the installation.
+  - ScaLAPACK can be part of ACML or cluster MKL. These libraries are recommended if available.
+  - Recently a [ScaLAPACK installer](http://www.netlib.org/scalapack/scalapack_installer.tgz) has
+    been added that simplifies the installation.

-CP2K assumes that the MPI library implements MPI version 3. Older
-versions of MPI (e.g., MPI 2.0) are not supported and the old flag `-D__MPI_VERSION` in
-the arch file will be ignored. CP2K can make use of the mpi_f08 module. If its use is requested,
-set the flag `-D__MPI_F08`.
+CP2K assumes that the MPI library implements MPI version 3.
Older versions of MPI (e.g., MPI 2.0) +are not supported and the old flag `-D__MPI_VERSION` in the arch file will be ignored. CP2K can make +use of the mpi_f08 module. If its use is requested, set the flag `-D__MPI_F08`. ### 2f. FFTW (optional, improved performance of FFTs) -FFTW can be used to improve FFT speed on a wide range of architectures. -It is strongly recommended to install and use FFTW3. The current version of CP2K -works with FFTW 3.X (use `-D__FFTW3`). It can be downloaded from +FFTW can be used to improve FFT speed on a wide range of architectures. It is strongly recommended +to install and use FFTW3. The current version of CP2K works with FFTW 3.X (use `-D__FFTW3`). It can +be downloaded from -:warning: Note that FFTW must know the Fortran compiler you will use in order to -install properly (e.g., `export F77=gfortran` before configure if you intend to -use gfortran). +:warning: Note that FFTW must know the Fortran compiler you will use in order to install properly +(e.g., `export F77=gfortran` before configure if you intend to use gfortran). -:warning: Note that on machines and compilers which support SSE you can configure -FFTW3 with `--enable-sse2`. Compilers/systems that do not align memory (NAG f95, -Intel IA32/gfortran) should either not use `--enable-sse2` or otherwise set the -define `-D__FFTW3_UNALIGNED` in the arch file. Since CP2K is OpenMP parallelized, -the FFTW3 threading library libfftw3_threads (or libfftw3_omp) is required. +:warning: Note that on machines and compilers which support SSE you can configure FFTW3 with +`--enable-sse2`. Compilers/systems that do not align memory (NAG f95, Intel IA32/gfortran) should +either not use `--enable-sse2` or otherwise set the define `-D__FFTW3_UNALIGNED` in the arch file. +Since CP2K is OpenMP parallelized, the FFTW3 threading library libfftw3_threads (or libfftw3_omp) is +required. ### 2g. 
LIBINT (optional, enables methods including HF exchange) -- Hartree-Fock exchange (optional, use `-D__LIBINT`) - requires the LIBINT package to be installed. -- Recommended way to build LIBINT: Download a CP2K-configured LIBINT library - from [libint-cp2k](https://github.com/cp2k/libint-cp2k). Build and install - LIBINT by following the instructions provided there. Note that using a library - configured for higher maximum angular momentum will increase build time and - binary size of CP2K executable (assuming static linking). -- CP2K is not hardwired to these provided libraries and any other LIBINT - library (version >= 2.5.0) should be compatible as long as it was compiled - with `--enable-eri=1` and default ordering. -- Avoid debugging information (`-g` flag) for compiling LIBINT since this will - increase library size by a large factor. -- In the arch file of CP2K: add `-D__LIBINT` to the `DFLAGS`. - Add `-L$(LIBINT_DIR)/lib -lint2 -lstdc++` to `LIBS` and `-I$(LIBINT_DIR)/include` - to `FCFLAGS`. `lstdc++` is needed if you use the GNU C++ compiler. -- Libint 1 is no longer supported and the previously needed flags - `-D__LIBINT_MAX_AM` and `-D__LIBDERIV_MAX_AM1` are ignored. -- `-D__MAX_CONTR=4` (default=2) can be used to compile efficient contraction - kernels up to l=4, but the build time will increase accordingly. +- Hartree-Fock exchange (optional, use `-D__LIBINT`) requires the LIBINT package to be installed. +- Recommended way to build LIBINT: Download a CP2K-configured LIBINT library from + [libint-cp2k](https://github.com/cp2k/libint-cp2k). Build and install LIBINT by following the + instructions provided there. Note that using a library configured for higher maximum angular + momentum will increase build time and binary size of CP2K executable (assuming static linking). 
+- CP2K is not hardwired to these provided libraries and any other LIBINT library (version >= 2.5.0) + should be compatible as long as it was compiled with `--enable-eri=1` and default ordering. +- Avoid debugging information (`-g` flag) for compiling LIBINT since this will increase library size + by a large factor. +- In the arch file of CP2K: add `-D__LIBINT` to the `DFLAGS`. Add + `-L$(LIBINT_DIR)/lib -lint2 -lstdc++` to `LIBS` and `-I$(LIBINT_DIR)/include` to `FCFLAGS`. + `lstdc++` is needed if you use the GNU C++ compiler. +- Libint 1 is no longer supported and the previously needed flags `-D__LIBINT_MAX_AM` and + `-D__LIBDERIV_MAX_AM1` are ignored. +- `-D__MAX_CONTR=4` (default=2) can be used to compile efficient contraction kernels up to l=4, but + the build time will increase accordingly. ### 2h. LIBXSMM (optional, improved performance for matrix multiplication) - A library for matrix operations and deep learning primitives: . -- Add `-D__LIBXSMM` to enable it, with suitable include and library paths, - e.g., `FCFLAGS += -I${LIBXSMM_DIR}/include -D__LIBXSMM` - and `LIBS += -L${LIBXSMM_DIR}/lib -lxsmmf -lxsmm -ldl` +- Add `-D__LIBXSMM` to enable it, with suitable include and library paths, e.g., + `FCFLAGS += -I${LIBXSMM_DIR}/include -D__LIBXSMM` and + `LIBS += -L${LIBXSMM_DIR}/lib -lxsmmf -lxsmm -ldl` - LIBSMM is not used if LIBXSMM is enabled. ### 2i. CUDA (optional, improved performance on GPU systems) -- Specify OFFLOAD_CC (e.g., `OFFLOAD_CC = nvcc`) and - OFFLOAD_FLAGS (e.g., `OFFLOAD_FLAGS = -O3 -g -w --std=c++11`) variables. - Remember to include the support for the C++11 standard. +- Specify OFFLOAD_CC (e.g., `OFFLOAD_CC = nvcc`) and OFFLOAD_FLAGS (e.g., + `OFFLOAD_FLAGS = -O3 -g -w --std=c++11`) variables. Remember to include the support for the C++11 + standard. - Use `-D__OFFLOAD_CUDA` to generally enable support for Nvidia GPUs -- Use the `-D__DBCSR_ACC` and `OFFLOAD_TARGET = cuda` to enable - accelerator support for matrix multiplications. 
+- Use the `-D__DBCSR_ACC` and `OFFLOAD_TARGET = cuda` to enable accelerator support for matrix
+  multiplications.
- Add `-lstdc++ -lcudart -lnvrtc -lcuda -lcublas` to LIBS.
-- Specify the GPU type (e.g., `GPUVER = P100`),
-  possible values are K20X, K40, K80, P100, V100, A100.
-- Specify the C++ compiler (e.g., `CXX = g++`) and the CXXFLAGS to support
-  the C++11 standard.
-- CUFFT 7.0 has a known bug and is therefore disabled by default.
-  NVIDIA's webpage list a patch (an upgraded version cufft i.e. >= 7.0.35)
-  - use this together with `-D__HAS_PATCHED_CUFFT_70`.
-- Use `-D__OFFLOAD_PROFILING` to turn on Nvidia Tools Extensions.
-  It requires to link `-lnvToolsExt`.
+- Specify the GPU type (e.g., `GPUVER = P100`), possible values are K20X, K40, K80, P100, V100,
+  A100.
+- Specify the C++ compiler (e.g., `CXX = g++`) and the CXXFLAGS to support the C++11 standard.
+- CUFFT 7.0 has a known bug and is therefore disabled by default. NVIDIA's webpage lists a patch (an
+  upgraded cufft version, i.e. >= 7.0.35); use this together with `-D__HAS_PATCHED_CUFFT_70`.
+- Use `-D__OFFLOAD_PROFILING` to turn on Nvidia Tools Extensions. It requires linking
+  `-lnvToolsExt`.
- Link to a blas/scalapack library that accelerates large DGEMMs (e.g., libsci_acc)
- Use `-D__NO_OFFLOAD_GRID` to disable the GPU backend of the grid library.
- Use `-D__NO_OFFLOAD_DBM` to disable the GPU backend of the sparse tensor library.
-- Use `-D__NO_OFFLOAD_PW` to disable the GPU backend of FFTs
-  and associated gather/scatter operations.
+- Use `-D__NO_OFFLOAD_PW` to disable the GPU backend of FFTs and associated gather/scatter
+  operations.

### 2j. LIBXC (optional, wider choice of xc functionals)

-- The version 5.1.0 (or later) of LIBXC can be downloaded from
-- CP2K does not make use of fourth derivates such that LIBXC may be configured
-  with './configure --disable-lxc \'.
-- During the installation, the directories `$(LIBXC_DIR)/lib`
-  and `$(LIBXC_DIR)/include` are created.
-- Add `-D__LIBXC` to DFLAGS, `-I$(LIBXC_DIR)/include` to FCFLAGS
-  and `-L$(LIBXC_DIR)/lib -lxcf03 -lxc` to LIBS.
+- The version 5.1.0 (or later) of LIBXC can be downloaded from
+  
+- CP2K does not make use of fourth derivatives, so LIBXC may be configured with './configure
+  --disable-lxc \'.
+- During the installation, the directories `$(LIBXC_DIR)/lib` and `$(LIBXC_DIR)/include` are
+  created.
+- Add `-D__LIBXC` to DFLAGS, `-I$(LIBXC_DIR)/include` to FCFLAGS and
+  `-L$(LIBXC_DIR)/lib -lxcf03 -lxc` to LIBS.
- :warning: Note that the deprecated flags `-D__LIBXC2` and `-D__LIBXC3` are ignored.

### 2k. ELPA (optional, improved performance for diagonalization)

@@ -234,19 +222,19 @@ Library ELPA for the solution of the eigenvalue problem

- During the installation the `libelpa_openmp.a` is created.
- Minimal supported version of ELPA is 2018.05.001.
- Add `-D__ELPA` to `DFLAGS`
-- Add `-D__ELPA_NVIDIA_GPU`, `-D__ELPA_AMD_GPU`, or `-D__ELPA_INTEL_GPU`
-  to `DFLAGS` to enable GPU support for the respective vendor.
+- Add `-D__ELPA_NVIDIA_GPU`, `-D__ELPA_AMD_GPU`, or `-D__ELPA_INTEL_GPU` to `DFLAGS` to enable GPU
+  support for the respective vendor.
- Add `-I$(ELPA_INCLUDE_DIR)/modules` to `FCFLAGS`
- Add `-I$(ELPA_INCLUDE_DIR)/elpa` to `FCFLAGS`
- Add `-L$(ELPA_DIR)` to `LDFLAGS`
- Add `-lelpa` to `LIBS`
-- For specific architectures it can be better to install specifically optimized
-  kernels (see BG) and/or employ a higher optimization level to compile it.
+- For specific architectures it can be better to install specifically optimized kernels (see BG)
+  and/or employ a higher optimization level to compile it.

### 2l. cuSOLVERMp (experimental, improved performance for diagonalization on Nvidia GPUs)

-NVIDIA cuSOLVERMp is a high-performance, distributed-memory, GPU-accelerated library
-that provides tools for the solution of dense linear systems and eigenvalue problems.
+NVIDIA cuSOLVERMp is a high-performance, distributed-memory, GPU-accelerated library that provides +tools for the solution of dense linear systems and eigenvalue problems. - cuSOLVERMp replaces the ScaLapack `SYEVD` to improve the performance of the diagonalization - A version of cuSOLVERMp can be downloaded from . @@ -255,29 +243,28 @@ that provides tools for the solution of dense linear systems and eigenvalue prob ### 2m. PEXSI (optional, low scaling SCF method) -The Pole EXpansion and Selected Inversion (PEXSI) method requires the PEXSI -library and two dependencies (ParMETIS or PT-Scotch and SuperLU_DIST). +The Pole EXpansion and Selected Inversion (PEXSI) method requires the PEXSI library and two +dependencies (ParMETIS or PT-Scotch and SuperLU_DIST). -- Download PEXSI (www.pexsi.org) and install it and its dependencies by - following its README.md. +- Download PEXSI (www.pexsi.org) and install it and its dependencies by following its README.md. - PEXSI versions 0.10.x have been tested with CP2K. Older versions are not supported. - PEXSI needs to be built with `make finstall`. In the arch file of CP2K: -- Add `-lpexsi_${SUFFIX} -llapack -lblas -lsuperlu_dist_3.3 -lparmetis -lmetis`, - and their paths (with `-L$(LIB_DIR)`) to LIBS. -- It is important that a copy of LAPACK and BLAS is placed before and after these - libraries (replace `-llapack` and `-lblas` with the optimized versions as needed). -- In order to link in PT-Scotch instead of ParMETIS replace `-lparmetis -lmetis` - with: `-lptscotchparmetis -lptscotch -lptscotcherr -lscotchmetis -lscotch -lscotcherr` +- Add `-lpexsi_${SUFFIX} -llapack -lblas -lsuperlu_dist_3.3 -lparmetis -lmetis`, and their paths + (with `-L$(LIB_DIR)`) to LIBS. +- It is important that a copy of LAPACK and BLAS is placed before and after these libraries (replace + `-llapack` and `-lblas` with the optimized versions as needed). 
+- In order to link in PT-Scotch instead of ParMETIS replace `-lparmetis -lmetis` with:
+  `-lptscotchparmetis -lptscotch -lptscotcherr -lscotchmetis -lscotch -lscotcherr`
- Add `-I$(PEXSI_DIR)/fortran/` to FCFLAGS.
- Add `-D__LIBPEXSI` to DFLAGS.

Below are some additional hints that may help in the compilation process:

-- For building PT-Scotch, the flag `-DSCOTCH_METIS_PREFIX` in `Makefile.inc`
-  must not be set and the flag `-DSCOTCH_PTHREAD` must be removed.
+- For building PT-Scotch, the flag `-DSCOTCH_METIS_PREFIX` in `Makefile.inc` must not be set and the
+  flag `-DSCOTCH_PTHREAD` must be removed.
- For building SuperLU_DIST with PT-Scotch, you must set the following in `make.inc`:

```shell
PARMETISLIB = -lptscotchparmetis -lptscotch -lptscotcherr
```

@@ -287,8 +274,8 @@

### 2n. QUIP (optional, wider range of interaction potentials)

-QUIP - QUantum mechanics and Interatomic Potentials Support for QUIP can be
-enabled via the flag `-D__QUIP`.
+QUIP - QUantum mechanics and Interatomic Potentials. Support for QUIP can be enabled via the flag
+`-D__QUIP`.

For more information see .

@@ -316,40 +303,37 @@ SIRIUS is a domain specific library for electronic structure calculations.

### 2r. FPGA (optional, plane wave FFT calculations)

-- Use `-D__PW_FPGA` to enable FPGA support for PW (fft) calculations.
-  Currently tested only for Intel Stratix 10 and Arria 10 GX1150 FPGAs.
-- Supports single precision and double precision fft calculations with the use
-  of dedicated APIs.
+- Use `-D__PW_FPGA` to enable FPGA support for PW (fft) calculations. Currently tested only for
+  Intel Stratix 10 and Arria 10 GX1150 FPGAs.
+- Supports single precision and double precision fft calculations with the use of dedicated APIs.
- Double precision is the default API chosen when set using the `-D__PW_FPGA` flag.
-- Single precision can be set using an additional `-D__PW_FPGA_SP` flag along
-  with the `-D__PW_FPGA` flag.
+- Single precision can be set using an additional `-D__PW_FPGA_SP` flag along with the `-D__PW_FPGA` + flag. - Kernel code must be synthesized separately and copied to a specific location. -- See - for the kernel code and instructions for synthesis. -- Read `src/pw/fpga/README.md` - for information on the specific location to copy the binaries to. +- See for the kernel code and instructions for synthesis. +- Read `src/pw/fpga/README.md` for information on the specific location to copy the binaries to. - Currently supported FFT3d sizes - 16^3, 32^3, 64^3. -- Include aocl compile flags and `-D__PW_FPGA -D__PW_FPGA_SP` to `CFLAGS`, - aocl linker flags to `LDFLAGS` and aocl libs to `LIBS`. +- Include aocl compile flags and `-D__PW_FPGA -D__PW_FPGA_SP` to `CFLAGS`, aocl linker flags to + `LDFLAGS` and aocl libs to `LIBS`. - When building FPGA and OFFLOAD together then `-D__NO_OFFLOAD_PW` must be used. ### 2s. COSMA (Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm) -- COSMA is an alternative for the pdgemm routine included in ScaLAPACK. - The library supports both CPU and GPUs. +- COSMA is an alternative for the pdgemm routine included in ScaLAPACK. The library supports both + CPU and GPUs. - Add `-D__COSMA` to the DFLAGS to enable support for COSMA. - See for more information. ### 2t. LibVori (Voronoi Integration for Electrostatic Properties from Electron Density) -- LibVori is a library which enables the calculation of electrostatic properties - (charge, dipole vector, quadrupole tensor, etc.) via integration of the total - electron density in the Voronoi cell of each atom. +- LibVori is a library which enables the calculation of electrostatic properties (charge, dipole + vector, quadrupole tensor, etc.) via integration of the total electron density in the Voronoi cell + of each atom. - Add `-D__LIBVORI` to the DFLAGS to enable support for LibVori. - See for more information. 
-- LibVori also enables support for the BQB file format for compressed trajectories, - please see for more information as well as - the `bqbtool` to inspect BQB files. +- LibVori also enables support for the BQB file format for compressed trajectories, please see + for more information as well as the `bqbtool` to inspect BQB + files. ### 2u. Torch (Machine Learning Framework needed for NequIP) @@ -358,89 +342,84 @@ SIRIUS is a domain specific library for electronic structure calculations. ### 2v. ROCM/HIP (Support for AMD GPU) -The code for the HIP based grid backend was developed and tested on Mi100 but -should work out of the box on Nvidia hardware as well. +The code for the HIP based grid backend was developed and tested on Mi100 but should work out of the +box on Nvidia hardware as well. - Use `-D__OFFLOAD_HIP` to generally enable support for AMD GPUs - Use `-D__NO_OFFLOAD_GRID` to disable the GPU backend of the grid library. - Use `-D__NO_OFFLOAD_DBM` to disable the GPU backend of the sparse tensor library. -- Use `-D__NO_OFFLOAD_PW` to disable the GPU backend of FFTs - and associated gather/scatter operations. +- Use `-D__NO_OFFLOAD_PW` to disable the GPU backend of FFTs and associated gather/scatter + operations. - Add `GPUVER=Mi50, Mi60, Mi100, Mi250` - Add `OFFLOAD_CC = hipcc` -- Add `-lamdhip64` to the `LIBS` variable -- Add `OFFLOAD_FLAGS = '-fopenmp -m64 -pthread -fPIC -D__GRID_HIP -O2 --offload-arch=gfx908 --rocm-path=$(ROCM_PATH)'` where `ROCM_PATH` is the path - where the rocm sdk resides. Architectures Mi250 (gfx90a), Mi100 (gfx908), - Mi50 (gfx906) the hip backend for the grid library supports nvidia hardware - as well. It uses the same code and can be used to validate the backend in case - of access to Nvidia hardware only. To get the compilation working, follow - the steps above and set the `OFFLOAD_FLAGS` with right `nvcc` parameters - (see the cuda section of this document). 
The environment variable `HIP_PLATFORM`
-  should be set to `HIP_PLATFORM=nvidia` to indicate to hipcc to use the
-  nvcc compiler instead.
-- Specify the C++ compiler (e.g., `CXX = g++`). Remember to set the
-  CXXFLAGS flags to support C++11 standard and OpenMP.
-- When the HIP backend is enabled for DBCSR using `-D__DBCSR_ACC`, then add
-  `-D__HIP_PLATFORM_AMD__` to `CXXFLAGS` and set `OFFLOAD_TARGET = hip`.
-- Use `-D__OFFLOAD_PROFILING` to turn on the AMD ROC TX and Tracer libray.
-  It requires to link `-lroctx64 -lroctracer64`.
+- Add `-lamdhip64` to the `LIBS` variable
+- Add
+  `OFFLOAD_FLAGS = '-fopenmp -m64 -pthread -fPIC -D__GRID_HIP -O2 --offload-arch=gfx908 --rocm-path=$(ROCM_PATH)'`
+  where `ROCM_PATH` is the path where the rocm sdk resides. Supported architectures are Mi250
+  (gfx90a), Mi100 (gfx908), and Mi50 (gfx906). The hip backend for the grid library supports Nvidia
+  hardware as well. It uses the same code and can be used to validate the backend in case of access
+  to Nvidia hardware only. To get the compilation working, follow the steps above and set the
+  `OFFLOAD_FLAGS` with the right `nvcc` parameters (see the cuda section of this document). The
+  environment variable `HIP_PLATFORM` should be set to `HIP_PLATFORM=nvidia` to indicate to hipcc
+  to use the nvcc compiler instead.
+- Specify the C++ compiler (e.g., `CXX = g++`). Remember to set the CXXFLAGS flags to support C++11
+  standard and OpenMP.
+- When the HIP backend is enabled for DBCSR using `-D__DBCSR_ACC`, then add `-D__HIP_PLATFORM_AMD__`
+  to `CXXFLAGS` and set `OFFLOAD_TARGET = hip`.
+- Use `-D__OFFLOAD_PROFILING` to turn on the AMD ROC TX and Tracer library. It requires linking
+  `-lroctx64 -lroctracer64`.

### 2w. OpenCL Devices

-OpenCL devices are currently supported for DBCSR and can cover GPUs and other devices.
-Kernels can be automatically tuned like for the CUDA/HIP backend in DBCSR.
-Note: the OpenCL backend uses some
-functionality from LIBXSMM (dependency).
- -- Installing an OpenCL runtime depends on the operating system and the device vendor. - Debian for instance brings two packages called `opencl-headers` and - `ocl-icd-opencl-dev` which can be present in addition to a vendor-specific - installation. The OpenCL header files are only necessary if CP2K/DBCSR is compiled - from source. Please note, some implementations ship with outdated OpenCL - headers which can prevent using latest features (if an application discovers such - features only at compile-time). When building from source, for instance - `libOpenCL.so` is sufficient (ICD loader) at link-time. However, an Installable - Client Driver (ICD) is finally necessary at runtime. -- Nvidia CUDA, AMD HIP, and Intel OneAPI are fully equipped with an OpenCL runtime - (if `opencl-headers` package is not installed, CPATH can be needed to point into - such an installation, similarly `LIBRARY_PATH` for finding `libOpenCL.so` - at link-time). Installing a minimal or stand-alone OpenCL is also possible, - e.g., following the instructions for Debian (or Ubuntu) as given for every - [release](https://github.com/intel/compute-runtime/releases) of the +OpenCL devices are currently supported for DBCSR and can cover GPUs and other devices. Kernels can +be automatically tuned like for the CUDA/HIP backend in DBCSR. Note: the OpenCL backend uses some +functionality from LIBXSMM (dependency). + +- Installing an OpenCL runtime depends on the operating system and the device vendor. Debian for + instance brings two packages called `opencl-headers` and `ocl-icd-opencl-dev` which can be present + in addition to a vendor-specific installation. The OpenCL header files are only necessary if + CP2K/DBCSR is compiled from source. Please note, some implementations ship with outdated OpenCL + headers which can prevent using latest features (if an application discovers such features only at + compile-time). 
When building from source, `libOpenCL.so` (the ICD loader) is sufficient
+  at link-time. However, an Installable Client Driver (ICD) is ultimately required at runtime.
+- Nvidia CUDA, AMD HIP, and Intel OneAPI are fully equipped with an OpenCL runtime (if the
+  `opencl-headers` package is not installed, `CPATH` may need to point into such an installation;
+  similarly, `LIBRARY_PATH` for finding `libOpenCL.so` at link-time). Installing a minimal or
+  stand-alone OpenCL is also possible, e.g., following the instructions for Debian (or Ubuntu) as
+  given for every [release](https://github.com/intel/compute-runtime/releases) of the
 [Intel Compute Runtime](https://github.com/intel/compute-runtime).
-- CP2K's toolchain supports `--enable-opencl` to select DBCSR's OpenCL backend.
-  This can be combined with `--enable-cuda` (`--gpu-ver` is then imposed) to
-  use a GPU for CP2K's grid and DBM/DBT components (no OpenCL support yet).
-- For manually writing an ARCH-file add `-D__OPENCL` and `-D__DBCSR_ACC` to `CFLAGS`,
-  and add `-lOpenCL` to the `LIBS` variable, i.e., `OFFLOAD_CC` and `OFFLOAD_FLAGS`
-  can duplicate `CC` and `CFLAGS` (no special offload compiler needed). Please also
-  set `OFFLOAD_TARGET = opencl` to enable the OpenCL backend in DBCSR. For OpenCL,
-  it is not necessary to specify a GPU version (e.g., `GPUVER = V100` would map to
-  `exts/dbcsr/src/acc/opencl/smm/params/tune_multiply_V100.csv`). In fact, `GPUVER`
-  limits tuned parameters to the specified GPU, and by default all tuned parameters
-  are embedded (`exts/dbcsr/src/acc/opencl/smm/params/*.csv`) and applied at runtime.
-  If auto-tuned parameters are not available for DBCSR, well-chosen defaults will
-  be used to populate kernels at runtime. Refer to the toolchain method (above)
-  for an ARCH-file that blends, e.g., OpenCL and CUDA.
-- Auto-tuned parameters are embedded into the binary, i.e., CP2K does not rely on
-  a hard-coded location.
Setting `OPENCL_LIBSMM_SMM_PARAMS=/path/to/csv-file`
-  environment variable can supply parameters for an already built application,
-  or `OPENCL_LIBSMM_SMM_PARAMS=0` can disable using tuned parameters.
-- The environment variable `ACC_OPENCL_VERBOSE=2` prints information about
-  kernels generated at runtime (and thereby checks the installation).
-- Refer to for, e.g., environment variables,
-  or how to tune kernels (auto tuned parameters).
+- CP2K's toolchain supports `--enable-opencl` to select DBCSR's OpenCL backend. This can be combined
+  with `--enable-cuda` (`--gpu-ver` is then imposed) to use a GPU for CP2K's grid and DBM/DBT
+  components (no OpenCL support yet).
+- When manually writing an ARCH-file, add `-D__OPENCL` and `-D__DBCSR_ACC` to `CFLAGS`, and add
+  `-lOpenCL` to the `LIBS` variable, i.e., `OFFLOAD_CC` and `OFFLOAD_FLAGS` can duplicate `CC` and
+  `CFLAGS` (no special offload compiler needed). Please also set `OFFLOAD_TARGET = opencl` to enable
+  the OpenCL backend in DBCSR. For OpenCL, it is not necessary to specify a GPU version (e.g.,
+  `GPUVER = V100` would map to `exts/dbcsr/src/acc/opencl/smm/params/tune_multiply_V100.csv`). In
+  fact, `GPUVER` limits tuned parameters to the specified GPU, and by default all tuned parameters
+  are embedded (`exts/dbcsr/src/acc/opencl/smm/params/*.csv`) and applied at runtime. If auto-tuned
+  parameters are not available for DBCSR, well-chosen defaults will be used to populate kernels at
+  runtime. Refer to the toolchain method (above) for an ARCH-file that blends, e.g., OpenCL and
+  CUDA.
+- Auto-tuned parameters are embedded into the binary, i.e., CP2K does not rely on a hard-coded
+  location. Setting the `OPENCL_LIBSMM_SMM_PARAMS=/path/to/csv-file` environment variable can
+  supply parameters for an already built application, or `OPENCL_LIBSMM_SMM_PARAMS=0` can disable
+  using tuned parameters.
+
+- The environment variable `ACC_OPENCL_VERBOSE=2` prints information about kernels generated at
+  runtime (and thereby checks the installation).
+- Refer to for, e.g., environment variables, or how to tune kernels
+  (auto-tuned parameters).
 
 ### 2x. matrix-matrix multiplication offloading on GPU using SPLA
 
-The SPLA library is a hard dependency of SIRIUS but can also be used as a
-standalone library. It provides a generic interface to the blas gemm family with
-offloading on GPU. Offloading supports both CUDA and ROCM.
+The SPLA library is a hard dependency of SIRIUS but can also be used as a standalone library. It
+provides a generic interface to the BLAS gemm family with offloading on GPU. Offloading supports
+both CUDA and ROCm.
 
-To make the functionality available, add the flag `-D__SPLA -D__OFFLOAD_GEMM` to
-the `DFLAGS` variable and compile SPLA with Fortran interface and GPU support.
-Please note that only the functions replacing the dgemm calls with
-`offload_dgemm` will eventually be offloaded to the GPU. The SPLA library has
-internal criteria to decide if it is worth to do the operation on GPU or not.
+To make the functionality available, add the flags `-D__SPLA -D__OFFLOAD_GEMM` to the `DFLAGS`
+variable and compile SPLA with the Fortran interface and GPU support. Please note that only the
+functions replacing the dgemm calls with `offload_dgemm` will eventually be offloaded to the GPU.
+The SPLA library has internal criteria to decide whether it is worthwhile to perform the operation
+on the GPU.
 Calls to `offload_dgemm` also accept pointers on GPU or a combination of them.
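The ARCH-file settings described in the OpenCL and SPLA sections above can be sketched as a single
fragment. This is an illustrative sketch only, not a drop-in ARCH file: the base values of `CC`,
`CFLAGS`, and `LIBS` depend on the local platform, and the exact spelling of the variables should be
checked against the provided [arch files](./arch).

```
# Sketch of ARCH-file additions (illustrative; adapt to your platform's ARCH file).
# DBCSR OpenCL backend:
CFLAGS  += -D__OPENCL -D__DBCSR_ACC
LIBS    += -lOpenCL                  # ICD loader at link-time; an ICD is required at runtime
OFFLOAD_CC     = $(CC)               # no special offload compiler needed for OpenCL
OFFLOAD_FLAGS  = $(CFLAGS)
OFFLOAD_TARGET = opencl
# SPLA gemm offloading (requires SPLA built with Fortran interface and GPU support):
DFLAGS  += -D__SPLA -D__OFFLOAD_GEMM
```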