From 1b6f62e97f2f90e4a05c68b210c2fd8124fe7b66 Mon Sep 17 00:00:00 2001
From: Jan Wilhelm <JWilhelm@users.noreply.github.com>
Date: Tue, 30 Jul 2024 10:21:02 +0200
Subject: [PATCH] improve convergence of GW bandstructure calculation & improve
 manual

---
 docs/methods/properties/bandstructure_gw.md   |  74 ++--
 .../properties/optical/bethe-salpeter.md      | 108 +----
 src/common/bibliography.F                     |   8 +-
 src/gw_communication.F                        | 200 ++++++++-
 src/gw_large_cell_gamma.F                     |   6 +-
 src/gw_small_cell_full_kp.F                   | 391 +-----------------
 src/gw_utils.F                                | 306 ++++++++++++--
 src/input_cp2k_mp2.F                          |   4 +-
 src/input_cp2k_properties_dft.F               |   5 +-
 src/post_scf_bandstructure_utils.F            |   6 +
 src/rpa_gw.F                                  |  43 +-
 .../07_G0W0_periodic_H2Te_kp_sampling_SCF.inp |  80 ++++
 tests/QS/regtest-scalable-gw/TEST_FILES       |   5 +-
 13 files changed, 662 insertions(+), 574 deletions(-)
 create mode 100644 tests/QS/regtest-scalable-gw/07_G0W0_periodic_H2Te_kp_sampling_SCF.inp
diff --git a/docs/methods/properties/bandstructure_gw.md b/docs/methods/properties/bandstructure_gw.md
index 5ef480ccdb..caa1899773 100644
--- a/docs/methods/properties/bandstructure_gw.md
+++ b/docs/methods/properties/bandstructure_gw.md
@@ -4,8 +4,8 @@
 energies of molecules. In this tutorial, we describe the band gap problem of DFT to motivate the
 usage of *GW*, we briefly discuss the theoretical framework of *GW* and the *GW* implementation in
 CP2K. We also provide input and output files of *GW* calculations performed with CP2K. The *GW*
-theory of this tutorial is taken from [](#Golze2019) and many useful details on *GW* can be found in
-this review.
+theory of this tutorial is taken from \[[](#Golze2019)\] and many useful details on *GW* can be
+found in this review.
 
 ## 1. The band gap problem of DFT
 
@@ -25,12 +25,12 @@ electronic band structure of a solid. This approximation comes with limitations:
 
 - When using one of the common GGA exchange-correlation (xc) functionals, the band gap in the KS-DFT
   band structure $\varepsilon_{n\mathbf{k}}^\text{DFT}$ is much too small compared to experimental
-  band gaps (Fig. 26 in [](#Golze2019)). Even with the exact xc functional, the band gap in the
+  band gaps (Fig. 26 in \[[](#Golze2019))\]. Even with the exact xc functional, the band gap in the
   KS-DFT band structure $\varepsilon_{n\mathbf{k}}^\text{DFT}$ will be too small due to the
   derivative discontinuity.
 
-- The KS-DFT band structure $\varepsilon_{n\mathbf{k}}^\text{DFT}$ is insensitive to screening by
-  the environment. As an example, the KS eigenvalue gap
+- The GGA band structure $\varepsilon_{n\mathbf{k}}^\text{DFT}$ is insensitive to screening by the
+  environment. As an example, the GGA eigenvalue gap
   $\varepsilon_{n=\text{LUMO}}^\text{DFT}-\varepsilon_{n=\text{HOMO}}^\text{DFT}$ of a molecule is
   almost identical for the molecule in gas phase and the molecule being on a surface. In experiment
   however, the surface induces an image-charge effect which can reduce the HOMO-LUMO gap of the
@@ -80,9 +80,10 @@ $\varepsilon_{n\mathbf{k}}^\text{DFT}$ and we add the xc contribution from
 $\braket{\psi_{n\mathbf{k}}|
 \Sigma^{G_0W_0}(\varepsilon_{n\mathbf{k}}^{G_0W_0}) |\psi_{n\mathbf{k}}}$.
 
-The DFT xc functional can influence the *G*<sub>0</sub>*W*<sub>0</sub> quasiparticle energies. For
-molecules, it is recommended to start the *G*<sub>0</sub>*W*<sub>0</sub> calculation from the PBE0
-functional and for solids, from the PBE functional.
+The DFT xc start functional of *G*<sub>0</sub>*W*<sub>0</sub> can influence the
+*G*<sub>0</sub>*W*<sub>0</sub> quasiparticle energies. For molecules, it is recommended to start the
+*G*<sub>0</sub>*W*<sub>0</sub> calculation from the PBE0 functional and for solids, from the PBE
+functional.
 
 CP2K also allows to perform eigenvalue-selfconsistency in $G$ (ev*GW*<sub>0</sub>) and
 eigenvalue-selfconsistency in $G$ and in $W$ (ev*GW*). For solids, it has been shown that band gaps
@@ -91,18 +92,19 @@ from ev*GW*$_0$@PBE can be in better agreement with experimental band gaps than
 
 CP2K contains three different *GW* implementations:
 
-- *GW* for molecules [](#Wilhelm2016) (Sec. 3)
+- *GW* for molecules \[[](#Wilhelm2016)\] (Sec. 3)
 - *GW* for computing the band structure of a solid with small unit cell with *k*-point sampling in
   DFT (publication in preparation TODO: insert reference, Sec. 4)
-- *GW* for computing the band structure of a large cell in a Γ-only approach [](#Graml2024) (Sec. 5)
+- *GW* for computing the band structure of a large cell in a Γ-only approach \[[](#Graml2024)\]
+  (Sec. 5)
 
 In the following, we will discuss details and usage of these *GW* implementations.
 
 ## 3. *GW* for molecules
 
-For starting a *G*<sub>0</sub>*W*<sub>0</sub>, ev*GW*<sub>0</sub> or ev*GW* calculation, one needs
-to set the [RUN_TYPE](#CP2K_INPUT.GLOBAL.RUN_TYPE) to `ENERGY` and one needs to put the following
-section:
+For starting a *G*<sub>0</sub>*W*<sub>0</sub>, ev*GW*<sub>0</sub> or ev*GW* calculation for a
+molecule, one needs to set the [RUN_TYPE](#CP2K_INPUT.GLOBAL.RUN_TYPE) to `ENERGY` and one needs to
+put the following section:
 
 ```
 &XC
@@ -124,19 +126,21 @@ section:
 In the upper *GW* section, the following keywords have been used:
 
 - [QUADRATURE_POINTS](#CP2K_INPUT.FORCE_EVAL.DFT.XC.WF_CORRELATION.RI_RPA.QUADRATURE_POINTS): Number
-  of imaginary-frequency points used for computing the self-energy. 100 points are usually enough
-  for converging quasiparticle energies within 10 meV.
+  of imaginary-frequency points used for computing the self-energy (Eq. (21) in
+  \[[](#Wilhelm2016))\]. 100 points are usually enough for converging quasiparticle energies within
+  10 meV.
 
 - [SELF_CONSISTENCY](#CP2K_INPUT.FORCE_EVAL.DFT.XC.WF_CORRELATION.RI_RPA.GW.SELF_CONSISTENCY):
   Determines which *GW* self-consistency (*G*<sub>0</sub>*W*<sub>0</sub>, ev*GW*<sub>0</sub> or
   ev*GW*) is used to calculate the *GW* quasiparticle energies.
 
 The numerical precision of the *GW* implementation is 10 meV compared to reference calculations, for
-example the *GW*100 test set [](#vanSetten2015). To help new users get familiar with the *GW*
+example the *GW*100 test set \[[](#vanSetten2015)\]. To help new users get familiar with the *GW*
 implementation in CP2K, we recommend starting by reproducing the HOMO and LUMO
 *G*<sub>0</sub>*W*<sub>0</sub>@PBE quasiparticle energies of the H<sub>2</sub>O molecule from the
 GW100 test set. The reference values are $\varepsilon_\text{HOMO}^{G_0W_0\text{@PBE}}$ = -11.97 eV
-and $\varepsilon_\text{LUMO}^{G_0W_0\text{@PBE}}$ = 2.37 eV; input and output files are available
+and $\varepsilon_\text{LUMO}^{G_0W_0\text{@PBE}}$ = 2.37 eV; CP2K input and output files for the
+*G*<sub>0</sub>*W*<sub>0</sub>@PBE calculation of the H<sub>2</sub>O molecule are available
 [here](https://github.com/cp2k/cp2k-examples/tree/master/bandstructure_gw/1_H2O_GW100).
 
 The following settings from DFT will also have an influence on *GW* quasiparticle energies:
@@ -145,19 +149,11 @@ The following settings from DFT will also have an influence on *GW* quasiparticl
   *G*<sub>0</sub>*W*<sub>0</sub>, ev*GW*<sub>0</sub> or ev*GW* calculation. For molecules, we
   recommend either ev*GW*<sub>0</sub>@PBE or *G*<sub>0</sub>*W*<sub>0</sub>@PBE0. For further
   guidance on selecting an appropriate DFT starting functional and self-consistency scheme for your
-  system, you may consult [](#Golze2019) or the following video:
-
-```{youtube} 1vUuethWhbs
----
-url_parameters: ?start=5563
-align: center
-privacy_mode:
----
-```
+  system, you may consult \[[](#Golze2019)\].
 
 - [BASIS_SET](#CP2K_INPUT.FORCE_EVAL.SUBSYS.KIND.BASIS_SET): The basis set is of Gaussian type and
   can have strong influence on the quasiparticle energies. For computing quasiparticle energies, a
-  basis set extrapolation is necessary, see Fig. 2a in [](#Wilhelm2016) and we recommend
+  basis set extrapolation is necessary, see Fig. 2a in \[[](#Wilhelm2016)\] and we recommend
   all-electron GAPW calculations with correlation-consistent basis sets from the
   <a href="https://www.basissetexchange.org/" target="_blank">EMSL database</a>. For computing the
   HOMO-LUMO gap from *GW*, we recommend augmented basis sets, for example
@@ -247,11 +243,11 @@ details in forthcoming publication (TODO Link):
   radius of truncated Coulomb metric in Å. A larger cutoff leads to faster the RI basis set
   convergence, but also computational cost increases. A cutoff of 7 Å is an accurate choice.
 - [&SOC](#CP2K_INPUT.FORCE_EVAL.PROPERTIES.BANDSTRUCTURE.SOC): Activates spin-orbit coupling (SOC
-  from Hartwigsen-Goedecker-Hutter pseudopotentials \[[Hartwigsen1998](#Hartwigsen1998)\]. SOC also
+  from Hartwigsen-Goedecker-Hutter pseudopotentials \[[Hartwigsen1998](#Hartwigsen1998)\]). SOC also
   needs `POTENTIAL_FILE_NAME  GTH_SOC_POTENTIALS`.
 - [&BANDSTRUCTURE_PATH](#CP2K_INPUT.FORCE_EVAL.PROPERTIES.BANDSTRUCTURE.BANDSTRUCTURE_PATH): Specify
   the *k*-path in the Brillouin zone for computing the band structure. Relative *k*-coordinates are
-  needed which you can retrieve for your crystal structure from [](#Setyawan2010).
+  needed which you can retrieve for your crystal structure from \[[](#Setyawan2010)\].
 
 We recommend TZVP-MOLOPT basis sets together with GTH/HGH pseudopotentials, see basis set
 convergence study in (TODOref). At present, 2d-periodic boundary conditions are supported, 1d- and
@@ -271,21 +267,21 @@ hours on 20 large-memory nodes).
 
 ## 5. *GW* for large cells in Γ-only approach
 
-For a large unit cell, a Γ-only *GW* algorithm is available in CP2K [](#Graml2024). The requirement
-on the cell is that elements of the density matrix decay by several orders of magnitude when the two
-basis functions of the matrix element have a distance similar to the cell size. As rule of thumb,
-for a 2d material, a 9x9 unit cell is large enough for the Γ-only algorithm, see Fig. 1 in
-[](#Graml2024).
+For a large unit cell, a Γ-only *GW* algorithm is available in CP2K \[[](#Graml2024)\]. The
+requirement on the cell is that elements of the density matrix decay by several orders of magnitude
+when the two basis functions of the matrix element have a distance similar to the cell size. As rule
+of thumb, for a 2d material, a 9x9 unit cell is large enough for the Γ-only algorithm, see Fig. 1 in
+\[[](#Graml2024)\].
 
 The input file for a Γ-only *GW* calculation is identical as for *GW* for small cells with *k*-point
-sampling except that the `&KPOINTS` section in DFT needs to be avoided. An exemplary input and
+sampling except that the `&KPOINTS` section in DFT needs to be removed. An exemplary input and
 output is available
-\[<a href="https://github.com/cp2k/cp2k-examples/tree/master/bandstructure_gw/5_9x9_supercell_2d_MoS2" target="_blank">here</a>\]
+\[<a href="https://github.com/cp2k/cp2k-examples/tree/master/bandstructure_gw/5_9x9_supercell_2d_MoS2" target="_blank">here</a>\].
 Running the input file requires access to a large computer (the calculation took 2.5 hours on 32
 nodes on Noctua2 cluster in Paderborn). The computational parameters from this input file reach
 numerical convergence of the band gap within ~ 50 meV (TZVP basis set, 10 time and frequency
-points). Detailed convergence tests are available in the SI, Table S1 of [](#Graml2024) We recommend
-the numerical parameters from the input file for large-scale GW calculations. The code prints
-restart files with ending .matrix that can be used to restart a crashed calculation.
+points). Detailed convergence tests are available in the SI, Table S1 of \[[](#Graml2024)\] We
+recommend the numerical parameters from the input file for large-scale GW calculations. The code
+prints restart files with ending .matrix that can be used to restart a crashed calculation.
 
 In case anything does not work, please feel free to contact jan.wilhelm (at) ur.de.
diff --git a/docs/methods/properties/optical/bethe-salpeter.md b/docs/methods/properties/optical/bethe-salpeter.md
index ba9bd2aa1c..7f67f70e12 100644
--- a/docs/methods/properties/optical/bethe-salpeter.md
+++ b/docs/methods/properties/optical/bethe-salpeter.md
@@ -18,11 +18,12 @@ The following ingredients are necessary for computing $\Omega^{(n)}$:
 - Occupied Kohn-Sham (KS) orbitals $\varphi_i(\mathbf{r})$ and empty KS orbitals
   $\varphi_a(\mathbf{r})$ from a DFT calculation, where $i=1,\ldots,N_\mathrm{occ}$ and
   $a=N_\mathrm{occ}+1,\ldots,N_\mathrm{occ}+N_\mathrm{empty}$,
-- $GW$ eigenvalues $\varepsilon_i^{GW}$ and $\varepsilon_a^{GW}$ of corresponding KS orbitals.
+- *GW* eigenvalues $\varepsilon_i^{GW}$ and $\varepsilon_a^{GW}$ of corresponding KS orbitals.
 
-It is possible to use $G_0W_0$, ev$GW_0$ or ev$GW$ eigenvalues, see details in [GW] and in Ref.
-\[[](#Golze2019)\], i.e. we perform BSE@$G_0W_0$/ev$GW_0$/ev$GW$@DFT. Thus, also input parameters
-for a DFT and $GW$ calculation influence the BSE calcualation (see full input in Sec.
+It is possible to use *G*<sub>0</sub>*W*<sub>0</sub>, ev*GW*<sub>0</sub> or ev*GW* eigenvalues, see
+details in [GW] and in Ref. \[[](#Golze2019)\], i.e. we perform
+BSE@*G*<sub>0</sub>*W*<sub>0</sub>/ev*GW*<sub>0</sub>/ev*GW*@DFT. Thus, also input parameters for a
+DFT and *GW* calculation influence the BSE calcualation (see full input in Sec.
 [3.1](#header-input-file)). We obtain optical properties from BSE solving the following generalized
 eigenvalue problem that involves the block matrix $ABBA$:
 
@@ -86,8 +87,9 @@ For starting a BSE calculation one needs to set the [RUN_TYPE](#CP2K_INPUT.GLOBA
 In the upper GW/BSE section, the following keywords have been used:
 
 - [SELF_CONSISTENCY](#CP2K_INPUT.FORCE_EVAL.DFT.XC.WF_CORRELATION.RI_RPA.GW.SELF_CONSISTENCY):
-  Determines which GW self-consistency ($G_0W_0$, ev$GW_0$ or ev$G_0W_0$) is used to calculate the
-  single-particle GW energies $\varepsilon_p^{GW}$ needed in the BSE calculation.
+  Determines which *GW* self-consistency (*G*<sub>0</sub>*W*<sub>0</sub>, ev*GW*<sub>0</sub> or
+  ev*GW*) is used to calculate the single-particle *GW* energies $\varepsilon_p^{GW}$ needed in the
+  BSE calculation.
 
 - [TDA](#CP2K_INPUT.FORCE_EVAL.DFT.XC.WF_CORRELATION.RI_RPA.GW.BSE.TDA): Three options available:
 
@@ -123,8 +125,8 @@ The following settings from DFT will also have an influence on the BSE excitatio
 - [XC_FUNCTIONAL](#CP2K_INPUT.FORCE_EVAL.DFT.XC.XC_FUNCTIONAL): Choose between one of the available
   xc-functionals. The starting point can have a profound influence on the excitation energies
   \[[Bruneval2015](#Bruneval2015), [](#Gui2018), [](#Jacquemin2017)\]. We either recommend the PBE
-  functional as DFT starting point when using BSE@ev$GW_0$@PBE or the PBE0 functional, when using
-  BSE@$G_0W_0$@PBE0 (see also
+  functional as DFT starting point when using BSE@ev*GW*<sub>0</sub>@PBE or the PBE0 functional,
+  when using BSE@*G*<sub>0</sub>*W*<sub>0</sub>@PBE0 (see also
   [SELF_CONSISTENCY](#CP2K_INPUT.FORCE_EVAL.DFT.XC.WF_CORRELATION.RI_RPA.GW.SELF_CONSISTENCY)).
 
 - [BASIS_SET](#CP2K_INPUT.FORCE_EVAL.SUBSYS.KIND.BASIS_SET): Specify the basis set, which affects
@@ -143,7 +145,7 @@ We have benchmarked the numerical precision of our BSE implementation and we fou
 agreement within only 10 meV compared to the BSE implementation in FHI aims \[[](#Liu2020)\].
 
 The current BSE implementation in CP2K works for molecules. The inclusion of periodic boundary
-conditions in a $\Gamma$-only approach and with full $k$-point sampling is work in progress.
+conditions in a Γ-only approach and with full *k*-point sampling is work in progress.
 
 (header-example)=
 
@@ -153,7 +155,7 @@ conditions in a $\Gamma$-only approach and with full $k$-point sampling is work
 
 ### 3.1 Input file
 
-In this section, we provide a minimal example of a BSE calculation on $\mathrm{H}_2$. For the
+In this section, we provide a minimal example of a BSE calculation on H<sub>2</sub>. For the
 calculation you need the input file BSE_H2.inp and the aug-cc-DZVP basis
 ([Download](https://github.com/cp2k/cp2k-examples/tree/master/bethe-salpeter/H2)).
 
@@ -163,83 +165,13 @@ Please copy both files into your working directory and run CP2K by
 mpirun -n 1 cp2k.psmp BSE_H2.inp
 ```
 
-which requires 12 GB RAM and takes roughly 2 minutes on 1 core. You can find the output file also in
-the [Download](https://github.com/cp2k/cp2k-examples/tree/master/bethe-salpeter/H2).
-
-```none
-&GLOBAL
-  PROJECT  H2
-  RUN_TYPE ENERGY
-&END GLOBAL
-&FORCE_EVAL
-  METHOD Quickstep
-  &DFT
-    BASIS_SET_FILE_NAME BASIS-aug               ! Custom Basis set file (aug-cc-pVDZ and aug-cc-pVDZ-RIFIT 
-    POTENTIAL_FILE_NAME POTENTIAL               ! from the Basis Set Exchange library - www.basissetexchange.org/)
-    &QS
-      METHOD GAPW                               ! All electron calculation
-      EPS_DEFAULT 1.0E-16
-      EPS_PGF_ORB 1.0E-16
-    &END QS
-    &POISSON
-      PERIODIC NONE
-      PSOLVER MT
-    &END
-    &SCF
-      SCF_GUESS RESTART
-      EPS_SCF 1e-7
-    &END SCF
-    &XC
-      &XC_FUNCTIONAL PBE0                       ! Choice of functional has a profound influence on the results
-      &END XC_FUNCTIONAL
-      &WF_CORRELATION
-        &RI_RPA                                 ! In the RI_RPA and the GW section, additional numerical parameters, e.g.
-          &GW
-            SELF_CONSISTENCY      G0W0          ! can be changed to EV_GW0 or EV_GW
-            &BSE  
-              TDA                 TDA+ABBA      ! Diagonalizing ABBA and A
-              SPIN_CONFIG         SINGLET       ! or TRIPLET
-              NUM_PRINT_EXC       20            ! Number of printed excitations
-              ENERGY_CUTOFF_OCC   -1            ! Set to positive numbers (eV) to
-              ENERGY_CUTOFF_EMPTY -1            ! truncate matrices A_ia,jb and B_ia,jb
-            &END BSE
-          &END GW
-        &END RI_RPA
-      &END WF_CORRELATION
-    &END XC
-  &END DFT
-  &SUBSYS
-    &CELL
-      ABC 20 20 20
-      PERIODIC NONE
-    &END CELL
-    &COORD
-      H 0.0000 0.0000 0.0000                    ! H2 molecule geometry from GW100 Paper
-      H 0.0000 0.0000 0.74144
-    &END COORD
-    &TOPOLOGY
-      &CENTER_COORDINATES
-      &END
-    &END TOPOLOGY
-    &KIND H
-      BASIS_SET ORB    aug-cc-pVDZ              ! For production runs, the basis set should be checked for convergence.
-      BASIS_SET RI_AUX aug-cc-pVDZ-RIFIT        ! In general, pVDZ should be a solid choice.
-      POTENTIAL ALL
-    &END KIND
-    &KIND O
-      BASIS_SET ORB    aug-cc-pVDZ
-      BASIS_SET RI_AUX aug-cc-pVDZ-RIFIT
-      POTENTIAL ALL
-    &END KIND
-  &END SUBSYS
-&END FORCE_EVAL
-```
-
-The basis sets `aug-cc-pVDZ` and `aug-cc-pVDZ-RIFIT` in `BASIS-aug` can be obtained from the Basis
-Set Exchange Library:
+which requires 12 GB RAM and takes roughly 2 minutes on 1 core. You can find the input and output
+file [here](https://github.com/cp2k/cp2k-examples/tree/master/bethe-salpeter/H2). We use the basis
+sets `aug-cc-pVDZ` and `aug-cc-pVDZ-RIFIT` from the file `BASIS-aug`. These basis sets can be
+obtained from the Basis Set Exchange Library:
 <a href="https://www.basissetexchange.org/basis/aug-cc-pvdz/format/cp2k/?version=1&elements=1" target="_blank">aug-cc-pVDZ</a>,
 <a href="https://www.basissetexchange.org/basis/aug-cc-pvdz-rifit/format/cp2k/?version=1&elements=1" target="_blank">aug-cc-pVDZ-RIFIT</a>.
-The geometry for $\mathrm{H}_2$ was taken from [](#vanSetten2015).
+The geometry for H<sub>2</sub> was taken from \[[](#vanSetten2015)\].
 
 ### 3.2 Output
 
@@ -277,14 +209,14 @@ printed:
  BSE|            4      1 =>     5                     -full-             0.7076
 ```
 
-In the case of the $\mathrm{H}_2$, the lowest excitation *n* = 1 is mainly built up by a transition
-from the HOMO (i=1) to the LUMO (a=2), what is apparent from
+In the case of H<sub>2</sub>, the lowest excitation *n* = 1 is mainly built up by a transition from
+the HOMO (i=1) to the LUMO (a=2), what is apparent from
 $X_{i=\text{HOMO},a=\text{LUMO}}^{(n=1)}=0.663$, and also contains a considerable contribution from
 the 1=>4 (HOMO=>LUMO+2) transition ($X_{i=\text{HOMO},a=\text{LUMO+2}}^{(n=1)}=0.2549$ ).
 
 ### 3.3 Large scale calculations
 
-Going to larger systems is a challenge for a $GW$+BSE-calculation, since the memory consumption
+Going to larger systems is a challenge for a *GW*+BSE-calculation, since the memory consumption
 increases with $N_\mathrm{occ}^2 N_\mathrm{empty}^2$. The used $N_\mathrm{occ}$, $N_\mathrm{empty}$
 and the required memory of a calculation are printed in the output file to estimate the memory
 consumption. We recommend to use several nodes to provide the required memory. In the following, we
diff --git a/src/common/bibliography.F b/src/common/bibliography.F
index 7d7696b7cc..558e1128a2 100644
--- a/src/common/bibliography.F
+++ b/src/common/bibliography.F
@@ -95,7 +95,7 @@ MODULE bibliography
                             Caldeweyher2017, Caldeweyher2019, Caldeweyher2020, Freeman1977, Gruneis2009, &
                             Stein2022, Stein2024, &
                             Blase2018, Blase2020, Bruneval2015, Golze2019, Gui2018, Jacquemin2017, Liu2020, &
-                            Sander2015, Schreiber2008, Hueser2013, vanSetten2015, Setyawan2010
+                            Sander2015, Schreiber2008, vanSetten2015, Setyawan2010
 
 CONTAINS
 
@@ -1890,12 +1890,6 @@ SUBROUTINE add_all_references()
                          source="The Journal of chemical physics", volume="128", pages="134110", &
                          year=2008, doi="10.1063/1.2889385")
 
-      CALL add_reference(key=Hueser2013, &
-                         authors=s2a("Hueser, F", "Olsen, T", "Thygesen, KS"), &
-                         title="Quasiparticle GW calculations for solids, molecules, and two-dimensional materials", &
-                         source="Physical Review B", volume="87", &
-                         year=2013, doi="10.1103/PhysRevB.87.235132")
-
       CALL add_reference(key=vanSetten2015, &
                          authors=s2a("van Setten, MJ", "Caruso, F", "Sharifzadeh, S", "Ren, X", "Scheffler, M", &
                                      "Liu, F", "Lischner, J", "Lin, L", "Deslippe, JR", "Louie, SG", "Yang, C", &
diff --git a/src/gw_communication.F b/src/gw_communication.F
index b3cd7d65e6..cb869bb04a 100644
--- a/src/gw_communication.F
+++ b/src/gw_communication.F
@@ -14,10 +14,12 @@ MODULE gw_communication
    USE cp_dbcsr_api,                    ONLY: &
         dbcsr_copy, dbcsr_create, dbcsr_filter, dbcsr_finalize, dbcsr_get_info, &
         dbcsr_get_stored_coordinates, dbcsr_iterator_blocks_left, dbcsr_iterator_next_block, &
-        dbcsr_iterator_start, dbcsr_iterator_stop, dbcsr_iterator_type, dbcsr_release, &
-        dbcsr_reserve_all_blocks, dbcsr_reserve_blocks, dbcsr_set, dbcsr_type
-   USE cp_dbcsr_operations,             ONLY: copy_fm_to_dbcsr
-   USE cp_fm_types,                     ONLY: cp_fm_type
+        dbcsr_iterator_start, dbcsr_iterator_stop, dbcsr_iterator_type, dbcsr_p_type, &
+        dbcsr_release, dbcsr_reserve_all_blocks, dbcsr_reserve_blocks, dbcsr_set, dbcsr_type
+   USE cp_dbcsr_operations,             ONLY: copy_dbcsr_to_fm,&
+                                              copy_fm_to_dbcsr
+   USE cp_fm_types,                     ONLY: cp_fm_get_info,&
+                                              cp_fm_type
    USE dbt_api,                         ONLY: dbt_clear,&
                                               dbt_copy,&
                                               dbt_copy_matrix_to_tensor,&
@@ -38,7 +40,8 @@ MODULE gw_communication
 
    CHARACTER(len=*), PARAMETER, PRIVATE :: moduleN = 'gw_communication'
 
-   PUBLIC :: local_dbt_to_global_mat, fm_to_local_tensor
+   PUBLIC :: local_dbt_to_global_mat, fm_to_local_tensor, fm_to_local_array, local_array_to_fm, &
+             local_dbt_to_global_fm
 
    TYPE buffer_type
       REAL(KIND=dp), DIMENSION(:), POINTER  :: msg => NULL()
@@ -749,4 +752,191 @@ SUBROUTINE local_matrix_to_global_matrix(mat_local, mat_global, para_env)
 
    END SUBROUTINE local_matrix_to_global_matrix
 
+! **************************************************************************************************
+!> \brief ...
+!> \param fm_S ...
+!> \param array_S ...
+!> \param weight ...
+!> \param add ...
+! **************************************************************************************************
+   SUBROUTINE fm_to_local_array(fm_S, array_S, weight, add)
+
+      TYPE(cp_fm_type), DIMENSION(:)                     :: fm_S
+      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: array_S
+      REAL(KIND=dp), OPTIONAL                            :: weight
+      LOGICAL, OPTIONAL                                  :: add
+
+      CHARACTER(LEN=*), PARAMETER                        :: routineN = 'fm_to_local_array'
+
+      INTEGER                                            :: handle, i, i_row_local, img, j, &
+                                                            j_col_local, n_basis, ncol_local, &
+                                                            nimages, nrow_local
+      INTEGER, DIMENSION(:), POINTER                     :: col_indices, row_indices
+      LOGICAL                                            :: my_add
+      REAL(KIND=dp)                                      :: my_weight
+      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: array_tmp
+
+      CALL timeset(routineN, handle)
+
+      my_weight = 1.0_dp
+      IF (PRESENT(weight)) my_weight = weight
+
+      my_add = .FALSE.
+      IF (PRESENT(add)) my_add = add
+
+      n_basis = SIZE(array_S, 1)
+      nimages = SIZE(array_S, 3)
+
+      ! checks
+      CPASSERT(SIZE(array_S, 2) == n_basis)
+      CPASSERT(SIZE(fm_S) == nimages)
+      CPASSERT(LBOUND(array_S, 1) == 1)
+      CPASSERT(LBOUND(array_S, 2) == 1)
+      CPASSERT(LBOUND(array_S, 3) == 1)
+
+      CALL cp_fm_get_info(matrix=fm_S(1), &
+                          nrow_local=nrow_local, &
+                          ncol_local=ncol_local, &
+                          row_indices=row_indices, &
+                          col_indices=col_indices)
+
+      IF (.NOT. my_add) array_S(:, :, :) = 0.0_dp
+      ALLOCATE (array_tmp(SIZE(array_S, 1), SIZE(array_S, 2), SIZE(array_S, 3)))
+      array_tmp(:, :, :) = 0.0_dp
+
+      DO img = 1, nimages
+         DO i_row_local = 1, nrow_local
+
+            i = row_indices(i_row_local)
+
+            DO j_col_local = 1, ncol_local
+
+               j = col_indices(j_col_local)
+
+               array_tmp(i, j, img) = fm_S(img)%local_data(i_row_local, j_col_local)
+
+            END DO ! j_col_local
+         END DO ! i_row_local
+      END DO ! img
+
+      CALL fm_S(1)%matrix_struct%para_env%sync()
+      CALL fm_S(1)%matrix_struct%para_env%sum(array_tmp)
+      CALL fm_S(1)%matrix_struct%para_env%sync()
+
+      array_S(:, :, :) = array_S(:, :, :) + my_weight*array_tmp(:, :, :)
+
+      CALL timestop(handle)
+
+   END SUBROUTINE fm_to_local_array
+
+! **************************************************************************************************
+!> \brief ...
+!> \param array_S ...
+!> \param fm_S ...
+!> \param weight ...
+!> \param add ...
+! **************************************************************************************************
+   SUBROUTINE local_array_to_fm(array_S, fm_S, weight, add)
+      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: array_S
+      TYPE(cp_fm_type), DIMENSION(:)                     :: fm_S
+      REAL(KIND=dp), OPTIONAL                            :: weight
+      LOGICAL, OPTIONAL                                  :: add
+
+      CHARACTER(LEN=*), PARAMETER                        :: routineN = 'local_array_to_fm'
+
+      INTEGER                                            :: handle, i, i_row_local, img, j, &
+                                                            j_col_local, n_basis, ncol_local, &
+                                                            nimages, nrow_local
+      INTEGER, DIMENSION(:), POINTER                     :: col_indices, row_indices
+      LOGICAL                                            :: my_add
+      REAL(KIND=dp)                                      :: my_weight, S_ij
+
+      CALL timeset(routineN, handle)
+
+      my_weight = 1.0_dp
+      IF (PRESENT(weight)) my_weight = weight
+
+      my_add = .FALSE.
+      IF (PRESENT(add)) my_add = add
+
+      n_basis = SIZE(array_S, 1)
+      nimages = SIZE(array_S, 3)
+
+      ! checks
+      CPASSERT(SIZE(array_S, 2) == n_basis)
+      CPASSERT(SIZE(fm_S) == nimages)
+      CPASSERT(LBOUND(array_S, 1) == 1)
+      CPASSERT(LBOUND(array_S, 2) == 1)
+      CPASSERT(LBOUND(array_S, 3) == 1)
+
+      CALL cp_fm_get_info(matrix=fm_S(1), &
+                          nrow_local=nrow_local, &
+                          ncol_local=ncol_local, &
+                          row_indices=row_indices, &
+                          col_indices=col_indices)
+
+      DO img = 1, nimages
+
+         DO i_row_local = 1, nrow_local
+
+            i = row_indices(i_row_local)
+
+            DO j_col_local = 1, ncol_local
+
+               j = col_indices(j_col_local)
+
+               IF (my_add) THEN
+                  S_ij = fm_S(img)%local_data(i_row_local, j_col_local) + &
+                         array_S(i, j, img)*my_weight
+               ELSE
+                  S_ij = array_S(i, j, img)*my_weight
+               END IF
+               fm_S(img)%local_data(i_row_local, j_col_local) = S_ij
+
+            END DO ! j_col_local
+
+         END DO ! i_row_local
+
+      END DO ! img
+
+      CALL timestop(handle)
+
+   END SUBROUTINE local_array_to_fm
+
+! **************************************************************************************************
+!> \brief ...
+!> \param t_R ...
+!> \param fm_R ...
+!> \param mat_global ...
+!> \param mat_local ...
+!> \param bs_env ...
+! **************************************************************************************************
+   SUBROUTINE local_dbt_to_global_fm(t_R, fm_R, mat_global, mat_local, bs_env)
+      TYPE(dbt_type), DIMENSION(:)                       :: t_R
+      TYPE(cp_fm_type), DIMENSION(:)                     :: fm_R
+      TYPE(dbcsr_p_type)                                 :: mat_global, mat_local
+      TYPE(post_scf_bandstructure_type), POINTER         :: bs_env
+
+      CHARACTER(LEN=*), PARAMETER :: routineN = 'local_dbt_to_global_fm'
+
+      INTEGER                                            :: handle, i_cell, n_images
+
+      CALL timeset(routineN, handle)
+
+      n_images = SIZE(t_R)
+
+      CPASSERT(n_images == SIZE(fm_R))
+
+      DO i_cell = 1, n_images
+         CALL dbcsr_set(mat_global%matrix, 0.0_dp)
+         CALL dbcsr_set(mat_local%matrix, 0.0_dp)
+         CALL local_dbt_to_global_mat(t_R(i_cell), mat_local%matrix, mat_global%matrix, &
+                                      bs_env%para_env)
+         CALL copy_dbcsr_to_fm(mat_global%matrix, fm_R(i_cell))
+      END DO
+
+      CALL timestop(handle)
+
+   END SUBROUTINE local_dbt_to_global_fm
+
 END MODULE gw_communication
diff --git a/src/gw_large_cell_gamma.F b/src/gw_large_cell_gamma.F
index 06e74b449b..d042b4c90e 100644
--- a/src/gw_large_cell_gamma.F
+++ b/src/gw_large_cell_gamma.F
@@ -639,9 +639,9 @@ SUBROUTINE G_times_3c(t_3c_for_G, t_G, t_M, bs_env, atoms_AO_1, atoms_AO_2, atom
 
       CALL timeset(routineN, handle)
 
-      ! JW: bounds_IL and bounds_k do not safe any operations, but maybe communication
-      !     maybe remove "bounds_1=bounds_IL, &" and "bounds_2=bounds_k, &" later and
-      !     check whether performance improves
+      ! JW bounds_IL and bounds_k do not safe any operations, but maybe communication
+      !    maybe remove "bounds_1=bounds_IL, &" and "bounds_2=bounds_k, &" later and
+      !    check whether performance improves
 
       bounds_IL(1:2) = [bs_env%i_ao_start_from_atom(atoms_IL(1)), &
                         bs_env%i_ao_end_from_atom(atoms_IL(2))]
diff --git a/src/gw_small_cell_full_kp.F b/src/gw_small_cell_full_kp.F
index 282f01a2b0..dcde44f0f6 100644
--- a/src/gw_small_cell_full_kp.F
+++ b/src/gw_small_cell_full_kp.F
@@ -11,26 +11,14 @@
 !> \date 05.2024
 ! **************************************************************************************************
 MODULE gw_small_cell_full_kp
-   USE cp_blacs_env,                    ONLY: cp_blacs_env_type
    USE cp_cfm_types,                    ONLY: cp_cfm_create,&
                                               cp_cfm_get_info,&
                                               cp_cfm_release,&
                                               cp_cfm_to_cfm,&
                                               cp_cfm_to_fm,&
                                               cp_cfm_type
-   USE cp_dbcsr_api,                    ONLY: dbcsr_create,&
-                                              dbcsr_distribution_release,&
-                                              dbcsr_distribution_type,&
-                                              dbcsr_p_type,&
-                                              dbcsr_release,&
-                                              dbcsr_set,&
-                                              dbcsr_type,&
-                                              dbcsr_type_no_symmetry
-   USE cp_dbcsr_operations,             ONLY: copy_dbcsr_to_fm,&
-                                              cp_dbcsr_dist2d_to_dist
    USE cp_fm_types,                     ONLY: cp_fm_create,&
                                               cp_fm_get_diag,&
-                                              cp_fm_get_info,&
                                               cp_fm_release,&
                                               cp_fm_set_all,&
                                               cp_fm_type
@@ -40,9 +28,10 @@ MODULE gw_small_cell_full_kp
                                               dbt_create,&
                                               dbt_destroy,&
                                               dbt_type
-   USE distribution_2d_types,           ONLY: distribution_2d_type
-   USE gw_communication,                ONLY: fm_to_local_tensor,&
-                                              local_dbt_to_global_mat
+   USE gw_communication,                ONLY: fm_to_local_array,&
+                                              fm_to_local_tensor,&
+                                              local_array_to_fm,&
+                                              local_dbt_to_global_fm
    USE gw_kp_to_real_space_and_back,    ONLY: add_ikp_to_all_rs,&
                                               fm_add_ikp_to_rs,&
                                               fm_trafo_rs_to_ikp,&
@@ -50,26 +39,19 @@ MODULE gw_small_cell_full_kp
    USE gw_utils,                        ONLY: add_R,&
                                               analyt_conti_and_print,&
                                               de_init_bs_env,&
+                                              get_V_tr_R,&
                                               is_cell_in_index_to_cell,&
+                                              power,&
                                               time_to_freq
    USE kinds,                           ONLY: dp
    USE kpoint_coulomb_2c,               ONLY: build_2c_coulomb_matrix_kp_small_cell
-   USE libint_2c_3c,                    ONLY: libint_potential_type
    USE machine,                         ONLY: m_walltime
    USE mathconstants,                   ONLY: z_one,&
                                               z_zero
    USE parallel_gemm_api,               ONLY: parallel_gemm
-   USE particle_methods,                ONLY: get_particle_set
-   USE particle_types,                  ONLY: particle_type
    USE post_scf_bandstructure_types,    ONLY: post_scf_bandstructure_type
    USE post_scf_bandstructure_utils,    ONLY: get_all_VBM_CBM_bandgaps
-   USE qs_environment_types,            ONLY: get_qs_env,&
-                                              qs_environment_type
-   USE qs_kind_types,                   ONLY: qs_kind_type
-   USE qs_neighbor_list_types,          ONLY: neighbor_list_set_p_type,&
-                                              release_neighbor_list_sets
-   USE qs_tensors,                      ONLY: build_2c_integrals,&
-                                              build_2c_neighbor_lists
+   USE qs_environment_types,            ONLY: qs_environment_type
 #include "./base/base_uses.f90"
 
    IMPLICIT NONE
@@ -102,8 +84,8 @@ SUBROUTINE gw_calc_small_cell_full_kp(qs_env, bs_env)
       ! G^occ_µλ(i|τ|,k) = sum_n^occ C_µn(k)^* e^(-|(ϵ_nk-ϵ_F)τ|) C_λn(k)
       ! G^vir_µλ(i|τ|,k) = sum_n^vir C_µn(k)^* e^(-|(ϵ_nk-ϵ_F)τ|) C_λn(k)
       ! k-point k -> cell S: G^occ/vir_µλ^S(i|τ|) = sum_k w_k G^occ/vir_µλ(i|τ|,k) e^(ikS)
-      ! χ_PQ^R(iτ) = sum_λR1νR2 [ sum_µS (µR1-S νR2 | P0) G^vir_µλ^S(i|τ|) ]
-      !                         [ sum_σS (σR2-S λR1 | QR) G^occ_σν^S(i|τ|) ]
+      ! χ_PQ^R(iτ) = sum_λR1νR2 [ sum_µS (µR1-S νR2 | P0) G^vir_λµ^S(i|τ|) ]
+      !                         [ sum_σS (σR2-S λR1 | QR) G^occ_νσ^S(i|τ|) ]
       CALL compute_chi(bs_env)
 
       ! χ_PQ^R(iτ) -> χ_PQ(iω,k) -> ε_PQ(iω,k) -> W_PQ(iω,k) -> Ŵ(iω,k) = M^-1(k)*W(iω,k)*M^-1(k)
@@ -187,7 +169,6 @@ SUBROUTINE compute_chi(bs_env)
 
                   ! 3-cells check because in M^vir_νR2,λR1,QR (step 3.): R2 is index on ν
                   IF (.NOT. cell_found) CYCLE
-
                   ! 2. M^occ/vir_λR1,νR2,P0 = sum_µS (λR1 µR2-S | P0) G^occ/vir_νµ^S(iτ)
                   CALL G_times_3c(Gocc_S, t_Gocc, bs_env, i_cell_R1, i_cell_R2)
                   CALL G_times_3c(Gvir_S, t_Gvir, bs_env, i_cell_R2, i_cell_R1)
@@ -225,7 +206,7 @@ SUBROUTINE compute_chi(bs_env)
 
    END SUBROUTINE compute_chi
 
-! **************************************************************************************************
+! *************************************************************************************************
 !> \brief ...
 !> \param R ...
 !> \param template ...
@@ -460,193 +441,6 @@ SUBROUTINE G_occ_vir(bs_env, tau, G_S, ispin, occ, vir)
 
    END SUBROUTINE G_occ_vir
 
-! **************************************************************************************************
-!> \brief ...
-!> \param t_R ...
-!> \param fm_R ...
-!> \param mat_global ...
-!> \param mat_local ...
-!> \param bs_env ...
-! **************************************************************************************************
-   SUBROUTINE local_dbt_to_global_fm(t_R, fm_R, mat_global, mat_local, bs_env)
-      TYPE(dbt_type), DIMENSION(:)                       :: t_R
-      TYPE(cp_fm_type), DIMENSION(:)                     :: fm_R
-      TYPE(dbcsr_p_type)                                 :: mat_global, mat_local
-      TYPE(post_scf_bandstructure_type), POINTER         :: bs_env
-
-      CHARACTER(LEN=*), PARAMETER :: routineN = 'local_dbt_to_global_fm'
-
-      INTEGER                                            :: handle, i_cell, n_images
-
-      CALL timeset(routineN, handle)
-
-      n_images = SIZE(t_R)
-
-      CPASSERT(n_images == SIZE(fm_R))
-
-      DO i_cell = 1, n_images
-         CALL dbcsr_set(mat_global%matrix, 0.0_dp)
-         CALL dbcsr_set(mat_local%matrix, 0.0_dp)
-         CALL local_dbt_to_global_mat(t_R(i_cell), mat_local%matrix, mat_global%matrix, &
-                                      bs_env%para_env)
-         CALL copy_dbcsr_to_fm(mat_global%matrix, fm_R(i_cell))
-      END DO
-
-      CALL timestop(handle)
-
-   END SUBROUTINE local_dbt_to_global_fm
-
-! **************************************************************************************************
-!> \brief ...
-!> \param fm_S ...
-!> \param array_S ...
-!> \param weight ...
-!> \param add ...
-! **************************************************************************************************
-   SUBROUTINE fm_to_local_array(fm_S, array_S, weight, add)
-
-      TYPE(cp_fm_type), DIMENSION(:)                     :: fm_S
-      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: array_S
-      REAL(KIND=dp), OPTIONAL                            :: weight
-      LOGICAL, OPTIONAL                                  :: add
-
-      CHARACTER(LEN=*), PARAMETER                        :: routineN = 'fm_to_local_array'
-
-      INTEGER                                            :: handle, i, i_row_local, img, j, &
-                                                            j_col_local, n_basis, ncol_local, &
-                                                            nimages, nrow_local
-      INTEGER, DIMENSION(:), POINTER                     :: col_indices, row_indices
-      LOGICAL                                            :: my_add
-      REAL(KIND=dp)                                      :: my_weight
-      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: array_tmp
-
-      CALL timeset(routineN, handle)
-
-      my_weight = 1.0_dp
-      IF (PRESENT(weight)) my_weight = weight
-
-      my_add = .FALSE.
-      IF (PRESENT(add)) my_add = add
-
-      n_basis = SIZE(array_S, 1)
-      nimages = SIZE(array_S, 3)
-
-      ! checks
-      CPASSERT(SIZE(array_S, 2) == n_basis)
-      CPASSERT(SIZE(fm_S) == nimages)
-      CPASSERT(LBOUND(array_S, 1) == 1)
-      CPASSERT(LBOUND(array_S, 2) == 1)
-      CPASSERT(LBOUND(array_S, 3) == 1)
-
-      CALL cp_fm_get_info(matrix=fm_S(1), &
-                          nrow_local=nrow_local, &
-                          ncol_local=ncol_local, &
-                          row_indices=row_indices, &
-                          col_indices=col_indices)
-
-      IF (.NOT. my_add) array_S(:, :, :) = 0.0_dp
-      ALLOCATE (array_tmp(SIZE(array_S, 1), SIZE(array_S, 2), SIZE(array_S, 3)))
-      array_tmp(:, :, :) = 0.0_dp
-
-      DO img = 1, nimages
-         DO i_row_local = 1, nrow_local
-
-            i = row_indices(i_row_local)
-
-            DO j_col_local = 1, ncol_local
-
-               j = col_indices(j_col_local)
-
-               array_tmp(i, j, img) = fm_S(img)%local_data(i_row_local, j_col_local)
-
-            END DO ! j_col_local
-         END DO ! i_row_local
-      END DO ! img
-
-      CALL fm_S(1)%matrix_struct%para_env%sync()
-      CALL fm_S(1)%matrix_struct%para_env%sum(array_tmp)
-      CALL fm_S(1)%matrix_struct%para_env%sync()
-
-      array_S(:, :, :) = array_S(:, :, :) + my_weight*array_tmp(:, :, :)
-
-      CALL timestop(handle)
-
-   END SUBROUTINE fm_to_local_array
-
-! **************************************************************************************************
-!> \brief ...
-!> \param array_S ...
-!> \param fm_S ...
-!> \param weight ...
-!> \param add ...
-! **************************************************************************************************
-   SUBROUTINE local_array_to_fm(array_S, fm_S, weight, add)
-      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: array_S
-      TYPE(cp_fm_type), DIMENSION(:)                     :: fm_S
-      REAL(KIND=dp), OPTIONAL                            :: weight
-      LOGICAL, OPTIONAL                                  :: add
-
-      CHARACTER(LEN=*), PARAMETER                        :: routineN = 'local_array_to_fm'
-
-      INTEGER                                            :: handle, i, i_row_local, img, j, &
-                                                            j_col_local, n_basis, ncol_local, &
-                                                            nimages, nrow_local
-      INTEGER, DIMENSION(:), POINTER                     :: col_indices, row_indices
-      LOGICAL                                            :: my_add
-      REAL(KIND=dp)                                      :: my_weight, S_ij
-
-      CALL timeset(routineN, handle)
-
-      my_weight = 1.0_dp
-      IF (PRESENT(weight)) my_weight = weight
-
-      my_add = .FALSE.
-      IF (PRESENT(add)) my_add = add
-
-      n_basis = SIZE(array_S, 1)
-      nimages = SIZE(array_S, 3)
-
-      ! checks
-      CPASSERT(SIZE(array_S, 2) == n_basis)
-      CPASSERT(SIZE(fm_S) == nimages)
-      CPASSERT(LBOUND(array_S, 1) == 1)
-      CPASSERT(LBOUND(array_S, 2) == 1)
-      CPASSERT(LBOUND(array_S, 3) == 1)
-
-      CALL cp_fm_get_info(matrix=fm_S(1), &
-                          nrow_local=nrow_local, &
-                          ncol_local=ncol_local, &
-                          row_indices=row_indices, &
-                          col_indices=col_indices)
-
-      DO img = 1, nimages
-
-         DO i_row_local = 1, nrow_local
-
-            i = row_indices(i_row_local)
-
-            DO j_col_local = 1, ncol_local
-
-               j = col_indices(j_col_local)
-
-               IF (my_add) THEN
-                  S_ij = fm_S(img)%local_data(i_row_local, j_col_local) + &
-                         array_S(i, j, img)*my_weight
-               ELSE
-                  S_ij = array_S(i, j, img)*my_weight
-               END IF
-               fm_S(img)%local_data(i_row_local, j_col_local) = S_ij
-
-            END DO ! j_col_local
-
-         END DO ! i_row_local
-
-      END DO ! img
-
-      CALL timestop(handle)
-
-   END SUBROUTINE local_array_to_fm
-
 ! **************************************************************************************************
 !> \brief ...
 !> \param t_Gocc ...
@@ -945,82 +739,6 @@ SUBROUTINE compute_Minv_and_Vsqrt(bs_env, qs_env, M_inv_V_sqrt, M_inv, V_sqrt)
 
    END SUBROUTINE compute_Minv_and_Vsqrt
 
-! **************************************************************************************************
-!> \brief ...
-!> \param matrix ...
-!> \param exponent ...
-!> \param eps ...
-! **************************************************************************************************
-   SUBROUTINE power(matrix, exponent, eps)
-      COMPLEX(KIND=dp), DIMENSION(:, :)                  :: matrix
-      REAL(KIND=dp)                                      :: exponent, eps
-
-      CHARACTER(len=*), PARAMETER                        :: routineN = 'power'
-
-      COMPLEX(KIND=dp), ALLOCATABLE, DIMENSION(:, :)     :: eigenvectors
-      COMPLEX(KIND=dp), DIMENSION(:), POINTER            :: work
-      COMPLEX(KIND=dp), DIMENSION(:, :), POINTER         :: A
-      INTEGER                                            :: handle, i, info, liwork, lrwork, lwork, n
-      INTEGER, DIMENSION(:), POINTER                     :: iwork
-      REAL(KIND=dp)                                      :: pos_eval
-      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:)           :: eigenvalues
-      REAL(KIND=dp), DIMENSION(:), POINTER               :: rwork
-
-      CALL timeset(routineN, handle)
-
-      ! code by Ole Schütt
-      IF (SIZE(matrix, 1) /= SIZE(matrix, 2)) CPABORT("expected square matrix")
-
-      ! make matrix perfectly Hermitian
-      matrix(:, :) = 0.5_dp*(matrix(:, :) + CONJG(TRANSPOSE(matrix(:, :))))
-
-      n = SIZE(matrix, 1)
-      ALLOCATE (iwork(1), rwork(1), work(1), A(n, n), eigenvalues(n), eigenvectors(n, n))
-
-      A(:, :) = matrix ! ZHEEVD will overwrite A
-      ! work space query
-      lwork = -1
-      lrwork = -1
-      liwork = -1
-
-      CALL ZHEEVD('V', 'U', n, A(1, 1), n, eigenvalues(1), &
-                  work(1), lwork, rwork(1), lrwork, iwork(1), liwork, info)
-      lwork = INT(REAL(work(1), dp))
-      lrwork = INT(REAL(rwork(1), dp))
-      liwork = iwork(1)
-
-      DEALLOCATE (iwork, rwork, work)
-      ALLOCATE (iwork(liwork))
-      iwork(:) = 0
-      ALLOCATE (rwork(lrwork))
-      rwork(:) = 0.0_dp
-      ALLOCATE (work(lwork))
-      work(:) = CMPLX(0.0_dp, 0.0_dp, KIND=dp)
-
-      CALL ZHEEVD('V', 'U', n, A(1, 1), n, eigenvalues(1), &
-                  work(1), lwork, rwork(1), lrwork, iwork(1), liwork, info)
-
-      eigenvectors(:, :) = A(:, :)
-
-      IF (info /= 0) CPABORT("diagonalization failed")
-
-      DO i = 1, n
-         IF (eigenvalues(i) > eps) THEN
-            pos_eval = (eigenvalues(i))**(0.5_dp*exponent)
-         ELSE
-            pos_eval = 0.0_dp
-         END IF
-         eigenvectors(:, i) = eigenvectors(:, i)*pos_eval
-      END DO
-
-      CALL ZGEMM("N", "C", n, n, n, z_one, eigenvectors, n, eigenvectors, n, z_zero, matrix, n)
-
-      DEALLOCATE (iwork, rwork, work, A, eigenvalues, eigenvectors)
-
-      CALL timestop(handle)
-
-   END SUBROUTINE power
-
 ! **************************************************************************************************
 !> \brief ...
 !> \param matrix ...
@@ -1087,7 +805,7 @@ SUBROUTINE mult_W_with_Minv(W_R, MWM_R, bs_env, qs_env)
          ! invert M_PQ(k)
          CALL power(M_inv, -1.0_dp, 0.0_dp)
 
-         ! W(k) = sum_R e^ikR W^R [k-mesh is not extrapolated for stable mult. with M^-1(k) ]
+         ! W(k) = sum_R e^ikR W^R [only R in the supercell that is determined by the SCF k-mesh]
          CALL trafo_rs_to_ikp(W_R, W_k, &
                               bs_env%kpoints_scf_desymm%index_to_cell, &
                               bs_env%kpoints_scf_desymm%xkp(1:3, ikp))
@@ -1352,93 +1070,6 @@ SUBROUTINE get_Minv_Vtr_Minv_R(Mi_Vtr_Mi_R, bs_env, qs_env)
 
    END SUBROUTINE get_Minv_Vtr_Minv_R
 
-! **************************************************************************************************
-!> \brief ...
-!> \param V_tr_R ...
-!> \param pot_type ...
-!> \param regularization_RI ...
-!> \param bs_env ...
-!> \param qs_env ...
-! **************************************************************************************************
-   SUBROUTINE get_V_tr_R(V_tr_R, pot_type, regularization_RI, bs_env, qs_env)
-      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: V_tr_R
-      TYPE(libint_potential_type)                        :: pot_type
-      REAL(KIND=dp)                                      :: regularization_RI
-      TYPE(post_scf_bandstructure_type), POINTER         :: bs_env
-      TYPE(qs_environment_type), POINTER                 :: qs_env
-
-      CHARACTER(LEN=*), PARAMETER                        :: routineN = 'get_V_tr_R'
-
-      INTEGER                                            :: handle, img, nimages_scf_desymm
-      INTEGER, ALLOCATABLE, DIMENSION(:)                 :: sizes_RI
-      INTEGER, DIMENSION(:), POINTER                     :: col_bsize, row_bsize
-      TYPE(cp_blacs_env_type), POINTER                   :: blacs_env
-      TYPE(cp_fm_type), ALLOCATABLE, DIMENSION(:)        :: fm_V_tr_R
-      TYPE(dbcsr_distribution_type)                      :: dbcsr_dist
-      TYPE(dbcsr_type), ALLOCATABLE, DIMENSION(:)        :: mat_V_tr_R
-      TYPE(distribution_2d_type), POINTER                :: dist_2d
-      TYPE(neighbor_list_set_p_type), DIMENSION(:), &
-         POINTER                                         :: sab_RI
-      TYPE(particle_type), DIMENSION(:), POINTER         :: particle_set
-      TYPE(qs_kind_type), DIMENSION(:), POINTER          :: qs_kind_set
-
-      CALL timeset(routineN, handle)
-
-      NULLIFY (sab_RI, dist_2d)
-
-      CALL get_qs_env(qs_env=qs_env, &
-                      blacs_env=blacs_env, &
-                      distribution_2d=dist_2d, &
-                      qs_kind_set=qs_kind_set, &
-                      particle_set=particle_set)
-
-      ALLOCATE (sizes_RI(bs_env%n_atom))
-      CALL get_particle_set(particle_set, qs_kind_set, nsgf=sizes_RI, basis=bs_env%basis_set_RI)
-      CALL build_2c_neighbor_lists(sab_RI, bs_env%basis_set_RI, bs_env%basis_set_RI, &
-                                   pot_type, "2c_nl_RI", qs_env, sym_ij=.FALSE., &
-                                   dist_2d=dist_2d)
-      CALL cp_dbcsr_dist2d_to_dist(dist_2d, dbcsr_dist)
-      ALLOCATE (row_bsize(SIZE(sizes_RI)))
-      ALLOCATE (col_bsize(SIZE(sizes_RI)))
-      row_bsize(:) = sizes_RI
-      col_bsize(:) = sizes_RI
-
-      nimages_scf_desymm = bs_env%nimages_scf_desymm
-      ALLOCATE (mat_V_tr_R(nimages_scf_desymm))
-      CALL dbcsr_create(mat_V_tr_R(1), "(RI|RI)", dbcsr_dist, dbcsr_type_no_symmetry, &
-                        row_bsize, col_bsize)
-      DEALLOCATE (row_bsize, col_bsize)
-
-      DO img = 2, nimages_scf_desymm
-         CALL dbcsr_create(mat_V_tr_R(img), template=mat_V_tr_R(1))
-      END DO
-
-      CALL build_2c_integrals(mat_V_tr_R, 0.0_dp, qs_env, sab_RI, bs_env%basis_set_RI, &
-                              bs_env%basis_set_RI, pot_type, do_kpoints=.TRUE., &
-                              ext_kpoints=bs_env%kpoints_scf_desymm, &
-                              regularization_RI=regularization_RI)
-
-      ALLOCATE (fm_V_tr_R(nimages_scf_desymm))
-      DO img = 1, nimages_scf_desymm
-         CALL cp_fm_create(fm_V_tr_R(img), bs_env%fm_RI_RI%matrix_struct)
-         CALL copy_dbcsr_to_fm(mat_V_tr_R(img), fm_V_tr_R(img))
-         CALL dbcsr_release(mat_V_tr_R(img))
-      END DO
-
-      IF (.NOT. ALLOCATED(V_tr_R)) THEN
-         ALLOCATE (V_tr_R(bs_env%n_RI, bs_env%n_RI, nimages_scf_desymm))
-      END IF
-
-      CALL fm_to_local_array(fm_V_tr_R, V_tr_R)
-
-      CALL cp_fm_release(fm_V_tr_R)
-      CALL dbcsr_distribution_release(dbcsr_dist)
-      CALL release_neighbor_list_sets(sab_RI)
-
-      CALL timestop(handle)
-
-   END SUBROUTINE get_V_tr_R
-
 ! **************************************************************************************************
 !> \brief ...
 !> \param t_1d ...
diff --git a/src/gw_utils.F b/src/gw_utils.F
index 915069fbec..4e717f71f1 100644
--- a/src/gw_utils.F
+++ b/src/gw_utils.F
@@ -29,10 +29,16 @@ MODULE gw_utils
                                               cp_cfm_type
    USE cp_control_types,                ONLY: dft_control_type
    USE cp_dbcsr_api,                    ONLY: dbcsr_create,&
+                                              dbcsr_distribution_release,&
+                                              dbcsr_distribution_type,&
                                               dbcsr_p_type,&
-                                              dbcsr_set
+                                              dbcsr_release,&
+                                              dbcsr_set,&
+                                              dbcsr_type,&
+                                              dbcsr_type_no_symmetry
    USE cp_dbcsr_operations,             ONLY: copy_dbcsr_to_fm,&
                                               copy_fm_to_dbcsr,&
+                                              cp_dbcsr_dist2d_to_dist,&
                                               dbcsr_allocate_matrix_set,&
                                               dbcsr_deallocate_matrix_set
    USE cp_files,                        ONLY: close_file,&
@@ -53,7 +59,10 @@ MODULE gw_utils
         dbt_clear, dbt_create, dbt_destroy, dbt_filter, dbt_iterator_blocks_left, &
         dbt_iterator_next_block, dbt_iterator_start, dbt_iterator_stop, dbt_iterator_type, &
         dbt_mp_environ_pgrid, dbt_pgrid_create, dbt_pgrid_destroy, dbt_pgrid_type, dbt_type
+   USE distribution_2d_types,           ONLY: distribution_2d_type
+   USE gw_communication,                ONLY: fm_to_local_array
    USE gw_integrals,                    ONLY: build_3c_integral_block
+   USE gw_kp_to_real_space_and_back,    ONLY: trafo_rs_to_ikp
    USE input_constants,                 ONLY: do_potential_truncated,&
                                               large_cell_Gamma,&
                                               ri_rpa_g0w0_crossing_newton,&
@@ -70,6 +79,7 @@ MODULE gw_utils
    USE kpoint_types,                    ONLY: get_kpoint_info,&
                                               kpoint_create,&
                                               kpoint_type
+   USE libint_2c_3c,                    ONLY: libint_potential_type
    USE libint_wrapper,                  ONLY: cp_libint_static_cleanup,&
                                               cp_libint_static_init
    USE machine,                         ONLY: m_memory,&
@@ -105,8 +115,11 @@ MODULE gw_utils
    USE qs_kind_types,                   ONLY: get_qs_kind,&
                                               qs_kind_type
    USE qs_ks_methods,                   ONLY: qs_ks_build_kohn_sham_matrix
-   USE qs_neighbor_list_types,          ONLY: neighbor_list_set_p_type
-   USE qs_tensors,                      ONLY: build_3c_integrals,&
+   USE qs_neighbor_list_types,          ONLY: neighbor_list_set_p_type,&
+                                              release_neighbor_list_sets
+   USE qs_tensors,                      ONLY: build_2c_integrals,&
+                                              build_2c_neighbor_lists,&
+                                              build_3c_integrals,&
                                               build_3c_neighbor_lists,&
                                               get_tensor_occupancy,&
                                               neighbor_list_3c_destroy
@@ -124,7 +137,7 @@ MODULE gw_utils
 
    PUBLIC :: create_and_init_bs_env_for_gw, de_init_bs_env, get_i_j_atoms, &
              kpoint_init_cell_index_simple, compute_xkp, time_to_freq, analyt_conti_and_print, &
-             add_R, is_cell_in_index_to_cell
+             add_R, is_cell_in_index_to_cell, get_V_tr_R, power
 
    CHARACTER(len=*), PARAMETER, PRIVATE :: moduleN = 'gw_utils'
 
@@ -198,6 +211,8 @@ SUBROUTINE create_and_init_bs_env_for_gw(qs_env, bs_env, bs_sec)
 
          CALL trafo_V_xc_R_to_kp(qs_env, bs_env)
 
+         CALL heuristic_RI_regularization(qs_env, bs_env)
+
       END SELECT
 
       CALL setup_time_and_frequency_minimax_grid(bs_env)
@@ -416,8 +431,8 @@ SUBROUTINE setup_kpoints_chi_eps_W(bs_env, kpoints)
 
       CHARACTER(LEN=*), PARAMETER :: routineN = 'setup_kpoints_chi_eps_W'
 
-      INTEGER                                            :: exp_two, handle, i_dim, n_dim, nkp, &
-                                                            nkp_extra, nkp_orig, u
+      INTEGER                                            :: handle, i_dim, n_dim, nkp, nkp_extra, &
+                                                            nkp_orig, u
       INTEGER, DIMENSION(3)                              :: nkp_grid, nkp_grid_extra, periodic
       REAL(KIND=dp)                                      :: exp_s_p, n_dim_inv
 
@@ -445,21 +460,15 @@ SUBROUTINE setup_kpoints_chi_eps_W(bs_env, kpoints)
 
             SELECT CASE (bs_env%small_cell_full_kp_or_large_cell_Gamma)
             CASE (large_cell_Gamma)
+
                nkp_grid(i_dim) = 4
                nkp_grid_extra(i_dim) = 6
+
             CASE (small_cell_full_kp)
-               ! for small cell, use the smallest 2^n x 2^m x 2^k k-mesh which is larger than
-               ! the SCF k-point grid; extrapolate with 2^(n+1) x 2^(m+1) x 2^(k+1) k-mesh
-               exp_two = 4
-               DO WHILE (.TRUE.)
-                  IF (exp_two > bs_env%kpoints_scf_desymm%nkp_grid(i_dim)) THEN
-                     nkp_grid(i_dim) = exp_two
-                     nkp_grid_extra(i_dim) = exp_two*2
-                     EXIT
-                  ELSE
-                     exp_two = exp_two*2
-                  END IF
-               END DO
+
+               nkp_grid(i_dim) = bs_env%kpoints_scf_desymm%nkp_grid(i_dim)*4
+               nkp_grid_extra(i_dim) = bs_env%kpoints_scf_desymm%nkp_grid(i_dim)*8
+
             END SELECT
 
          CASE DEFAULT
@@ -1300,6 +1309,11 @@ SUBROUTINE set_parallelization_parameters(qs_env, bs_env)
       u = bs_env%unit_nr
       IF (u > 0) THEN
          WRITE (u, '(T2,A,I47)') 'Group size for tensor operations', bs_env%group_size_tensor
+         IF (bs_env%group_size_tensor > 1 .AND. bs_env%n_atom < 5) THEN
+            WRITE (u, '(T2,A)') 'The requested group size is > 1 which can lead to bad performance.'
+            WRITE (u, '(T2,A)') 'Using more memory per MPI process might improve performance.'
+            WRITE (u, '(T2,A)') '(Also increase MEMORY_PER_PROC when using more memory per process.)'
+         END IF
       END IF
 
       CALL timestop(handle)
@@ -1490,19 +1504,13 @@ SUBROUTINE set_heuristic_parameters(bs_env, qs_env)
 
       bs_env%eps_eigval_mat_RI = 0.0_dp
 
-      IF (bs_env%input_regularization_RI > 0.0_dp) THEN
+      IF (bs_env%input_regularization_RI > -1.0E-12_dp) THEN
          bs_env%regularization_RI = bs_env%input_regularization_RI
       ELSE
-
          ! default case:
 
          ! 1. for periodic systems, we use the regularized resolution of the identity per default
-         SELECT CASE (bs_env%small_cell_full_kp_or_large_cell_Gamma)
-         CASE (large_cell_Gamma)
-            bs_env%regularization_RI = 1.0E-2_dp
-         CASE (small_cell_full_kp)
-            bs_env%regularization_RI = 1.0E-3_dp
-         END SELECT
+         bs_env%regularization_RI = 1.0E-2_dp
 
          ! 2. for molecules, no regularization is necessary
          IF (SUM(bs_env%periodic) == 0) bs_env%regularization_RI = 0.0_dp
@@ -1528,6 +1536,7 @@ SUBROUTINE set_heuristic_parameters(bs_env, qs_env)
          WRITE (u, FMT="(T2,2A,F15.1,A)") "Cutoff radius for the truncated Coulomb ", &
             "operator in RI metric:", bs_env%ri_metric%cutoff_radius*angstrom, " Å"
          WRITE (u, FMT="(T2,A,ES48.1)") "Regularization parameter of RI ", bs_env%regularization_RI
+         WRITE (u, FMT="(T2,A,I53)") "Lattice sum size for V(k):", bs_env%size_lattice_sum_V
       END IF
 
       CALL timestop(handle)
@@ -2754,6 +2763,251 @@ SUBROUTINE trafo_V_xc_R_to_kp(qs_env, bs_env)
 
    END SUBROUTINE trafo_V_xc_R_to_kp
 
+! **************************************************************************************************
+!> \brief ...
+!> \param qs_env ...
+!> \param bs_env ...
+! **************************************************************************************************
+   SUBROUTINE heuristic_RI_regularization(qs_env, bs_env)
+      TYPE(qs_environment_type), POINTER                 :: qs_env
+      TYPE(post_scf_bandstructure_type), POINTER         :: bs_env
+
+      CHARACTER(LEN=*), PARAMETER :: routineN = 'heuristic_RI_regularization'
+
+      COMPLEX(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)  :: M
+      INTEGER                                            :: handle, ikp, ikp_local, n_RI, nkp, &
+                                                            nkp_local, u
+      REAL(KIND=dp)                                      :: cond_nr, cond_nr_max, max_ev, &
+                                                            max_ev_ikp, min_ev, min_ev_ikp
+      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: M_R
+
+      CALL timeset(routineN, handle)
+
+      ! compute M^R_PQ = <phi_P,0|V^tr(rc)|phi_Q,R> for RI metric
+      CALL get_V_tr_R(M_R, bs_env%ri_metric, 0.0_dp, bs_env, qs_env)
+
+      nkp = bs_env%nkp_chi_eps_W_orig_plus_extra
+      n_RI = bs_env%n_RI
+
+      nkp_local = 0
+      DO ikp = 1, nkp
+         ! trivial parallelization over k-points
+         IF (MODULO(ikp, bs_env%para_env%num_pe) .NE. bs_env%para_env%mepos) CYCLE
+         nkp_local = nkp_local + 1
+      END DO
+
+      ALLOCATE (M(n_RI, n_RI, nkp_local))
+
+      ikp_local = 0
+      cond_nr_max = 0.0_dp
+      min_ev = 1000.0_dp
+      max_ev = -1000.0_dp
+
+      DO ikp = 1, nkp
+
+         ! trivial parallelization
+         IF (MODULO(ikp, bs_env%para_env%num_pe) .NE. bs_env%para_env%mepos) CYCLE
+
+         ikp_local = ikp_local + 1
+
+         ! M(k) = sum_R e^ikR M^R
+         CALL trafo_rs_to_ikp(M_R, M(:, :, ikp_local), &
+                              bs_env%kpoints_scf_desymm%index_to_cell, &
+                              bs_env%kpoints_chi_eps_W%xkp(1:3, ikp))
+
+         ! compute condition number of M_PQ(k)
+         CALL power(M(:, :, ikp_local), 1.0_dp, 0.0_dp, cond_nr, min_ev_ikp, max_ev_ikp)
+
+         IF (cond_nr > cond_nr_max) cond_nr_max = cond_nr
+         IF (max_ev_ikp > max_ev) max_ev = max_ev_ikp
+         IF (min_ev_ikp < min_ev) min_ev = min_ev_ikp
+
+      END DO ! ikp
+
+      CALL bs_env%para_env%max(cond_nr_max)
+
+      u = bs_env%unit_nr
+      IF (u > 0) THEN
+         WRITE (u, FMT="(T2,A,ES34.1)") "Min. abs. eigenvalue of RI metric matrix M(k)", min_ev
+         WRITE (u, FMT="(T2,A,ES34.1)") "Max. abs. eigenvalue of RI metric matrix M(k)", max_ev
+         WRITE (u, FMT="(T2,A,ES50.1)") "Max. condition number of M(k)", cond_nr_max
+      END IF
+
+      CALL timestop(handle)
+
+   END SUBROUTINE heuristic_RI_regularization
+
+! **************************************************************************************************
+!> \brief ...
+!> \param V_tr_R ...
+!> \param pot_type ...
+!> \param regularization_RI ...
+!> \param bs_env ...
+!> \param qs_env ...
+! **************************************************************************************************
+   SUBROUTINE get_V_tr_R(V_tr_R, pot_type, regularization_RI, bs_env, qs_env)
+      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:, :, :)     :: V_tr_R
+      TYPE(libint_potential_type)                        :: pot_type
+      REAL(KIND=dp)                                      :: regularization_RI
+      TYPE(post_scf_bandstructure_type), POINTER         :: bs_env
+      TYPE(qs_environment_type), POINTER                 :: qs_env
+
+      CHARACTER(LEN=*), PARAMETER                        :: routineN = 'get_V_tr_R'
+
+      INTEGER                                            :: handle, img, nimages_scf_desymm
+      INTEGER, ALLOCATABLE, DIMENSION(:)                 :: sizes_RI
+      INTEGER, DIMENSION(:), POINTER                     :: col_bsize, row_bsize
+      TYPE(cp_blacs_env_type), POINTER                   :: blacs_env
+      TYPE(cp_fm_type), ALLOCATABLE, DIMENSION(:)        :: fm_V_tr_R
+      TYPE(dbcsr_distribution_type)                      :: dbcsr_dist
+      TYPE(dbcsr_type), ALLOCATABLE, DIMENSION(:)        :: mat_V_tr_R
+      TYPE(distribution_2d_type), POINTER                :: dist_2d
+      TYPE(neighbor_list_set_p_type), DIMENSION(:), &
+         POINTER                                         :: sab_RI
+      TYPE(particle_type), DIMENSION(:), POINTER         :: particle_set
+      TYPE(qs_kind_type), DIMENSION(:), POINTER          :: qs_kind_set
+
+      CALL timeset(routineN, handle)
+
+      NULLIFY (sab_RI, dist_2d)
+
+      CALL get_qs_env(qs_env=qs_env, &
+                      blacs_env=blacs_env, &
+                      distribution_2d=dist_2d, &
+                      qs_kind_set=qs_kind_set, &
+                      particle_set=particle_set)
+
+      ALLOCATE (sizes_RI(bs_env%n_atom))
+      CALL get_particle_set(particle_set, qs_kind_set, nsgf=sizes_RI, basis=bs_env%basis_set_RI)
+      CALL build_2c_neighbor_lists(sab_RI, bs_env%basis_set_RI, bs_env%basis_set_RI, &
+                                   pot_type, "2c_nl_RI", qs_env, sym_ij=.FALSE., &
+                                   dist_2d=dist_2d)
+      CALL cp_dbcsr_dist2d_to_dist(dist_2d, dbcsr_dist)
+      ALLOCATE (row_bsize(SIZE(sizes_RI)))
+      ALLOCATE (col_bsize(SIZE(sizes_RI)))
+      row_bsize(:) = sizes_RI
+      col_bsize(:) = sizes_RI
+
+      nimages_scf_desymm = bs_env%nimages_scf_desymm
+      ALLOCATE (mat_V_tr_R(nimages_scf_desymm))
+      CALL dbcsr_create(mat_V_tr_R(1), "(RI|RI)", dbcsr_dist, dbcsr_type_no_symmetry, &
+                        row_bsize, col_bsize)
+      DEALLOCATE (row_bsize, col_bsize)
+
+      DO img = 2, nimages_scf_desymm
+         CALL dbcsr_create(mat_V_tr_R(img), template=mat_V_tr_R(1))
+      END DO
+
+      CALL build_2c_integrals(mat_V_tr_R, 0.0_dp, qs_env, sab_RI, bs_env%basis_set_RI, &
+                              bs_env%basis_set_RI, pot_type, do_kpoints=.TRUE., &
+                              ext_kpoints=bs_env%kpoints_scf_desymm, &
+                              regularization_RI=regularization_RI)
+
+      ALLOCATE (fm_V_tr_R(nimages_scf_desymm))
+      DO img = 1, nimages_scf_desymm
+         CALL cp_fm_create(fm_V_tr_R(img), bs_env%fm_RI_RI%matrix_struct)
+         CALL copy_dbcsr_to_fm(mat_V_tr_R(img), fm_V_tr_R(img))
+         CALL dbcsr_release(mat_V_tr_R(img))
+      END DO
+
+      IF (.NOT. ALLOCATED(V_tr_R)) THEN
+         ALLOCATE (V_tr_R(bs_env%n_RI, bs_env%n_RI, nimages_scf_desymm))
+      END IF
+
+      CALL fm_to_local_array(fm_V_tr_R, V_tr_R)
+
+      CALL cp_fm_release(fm_V_tr_R)
+      CALL dbcsr_distribution_release(dbcsr_dist)
+      CALL release_neighbor_list_sets(sab_RI)
+
+      CALL timestop(handle)
+
+   END SUBROUTINE get_V_tr_R
+
+! **************************************************************************************************
+!> \brief ...
+!> \param matrix ...
+!> \param exponent ...
+!> \param eps ...
+!> \param cond_nr ...
+!> \param min_ev ...
+!> \param max_ev ...
+! **************************************************************************************************
+   SUBROUTINE power(matrix, exponent, eps, cond_nr, min_ev, max_ev)
+      COMPLEX(KIND=dp), DIMENSION(:, :)                  :: matrix
+      REAL(KIND=dp)                                      :: exponent, eps
+      REAL(KIND=dp), OPTIONAL                            :: cond_nr, min_ev, max_ev
+
+      CHARACTER(len=*), PARAMETER                        :: routineN = 'power'
+
+      COMPLEX(KIND=dp), ALLOCATABLE, DIMENSION(:, :)     :: eigenvectors
+      COMPLEX(KIND=dp), DIMENSION(:), POINTER            :: work
+      COMPLEX(KIND=dp), DIMENSION(:, :), POINTER         :: A
+      INTEGER                                            :: handle, i, info, liwork, lrwork, lwork, n
+      INTEGER, DIMENSION(:), POINTER                     :: iwork
+      REAL(KIND=dp)                                      :: pos_eval
+      REAL(KIND=dp), ALLOCATABLE, DIMENSION(:)           :: eigenvalues
+      REAL(KIND=dp), DIMENSION(:), POINTER               :: rwork
+
+      CALL timeset(routineN, handle)
+
+      ! code by Ole Schütt
+      IF (SIZE(matrix, 1) /= SIZE(matrix, 2)) CPABORT("expected square matrix")
+
+      ! make matrix perfectly Hermitian
+      matrix(:, :) = 0.5_dp*(matrix(:, :) + CONJG(TRANSPOSE(matrix(:, :))))
+
+      n = SIZE(matrix, 1)
+      ALLOCATE (iwork(1), rwork(1), work(1), A(n, n), eigenvalues(n), eigenvectors(n, n))
+
+      A(:, :) = matrix ! ZHEEVD will overwrite A
+      ! work space query
+      lwork = -1
+      lrwork = -1
+      liwork = -1
+
+      CALL ZHEEVD('V', 'U', n, A(1, 1), n, eigenvalues(1), &
+                  work(1), lwork, rwork(1), lrwork, iwork(1), liwork, info)
+      lwork = INT(REAL(work(1), dp))
+      lrwork = INT(REAL(rwork(1), dp))
+      liwork = iwork(1)
+
+      DEALLOCATE (iwork, rwork, work)
+      ALLOCATE (iwork(liwork))
+      iwork(:) = 0
+      ALLOCATE (rwork(lrwork))
+      rwork(:) = 0.0_dp
+      ALLOCATE (work(lwork))
+      work(:) = CMPLX(0.0_dp, 0.0_dp, KIND=dp)
+
+      CALL ZHEEVD('V', 'U', n, A(1, 1), n, eigenvalues(1), &
+                  work(1), lwork, rwork(1), lrwork, iwork(1), liwork, info)
+
+      eigenvectors(:, :) = A(:, :)
+
+      IF (info /= 0) CPABORT("diagonalization failed")
+
+      IF (PRESENT(cond_nr)) cond_nr = MAXVAL(ABS(eigenvalues))/MINVAL(ABS(eigenvalues))
+      IF (PRESENT(min_ev)) min_ev = MINVAL(ABS(eigenvalues))
+      IF (PRESENT(max_ev)) max_ev = MAXVAL(ABS(eigenvalues))
+
+      DO i = 1, n
+         IF (eigenvalues(i) > eps) THEN
+            pos_eval = (eigenvalues(i))**(0.5_dp*exponent)
+         ELSE
+            pos_eval = 0.0_dp
+         END IF
+         eigenvectors(:, i) = eigenvectors(:, i)*pos_eval
+      END DO
+
+      CALL ZGEMM("N", "C", n, n, n, z_one, eigenvectors, n, eigenvectors, n, z_zero, matrix, n)
+
+      DEALLOCATE (iwork, rwork, work, A, eigenvalues, eigenvectors)
+
+      CALL timestop(handle)
+
+   END SUBROUTINE power
+
 ! **************************************************************************************************
 !> \brief ...
 !> \param bs_env ...
diff --git a/src/input_cp2k_mp2.F b/src/input_cp2k_mp2.F
index ce0b10bfbd..329a2d28bd 100644
--- a/src/input_cp2k_mp2.F
+++ b/src/input_cp2k_mp2.F
@@ -682,7 +682,9 @@ SUBROUTINE create_ri_g0w0(section)
 
       CPASSERT(.NOT. ASSOCIATED(section))
       CALL section_create(section, __LOCATION__, name="GW", &
-                          description="Parameters influencing the RI-G0W0 method", &
+                          description="Parameters influencing GW calculations on molecules, "// &
+                          "see also 'Electronic band structure from GW', "// &
+                          "https://manual.cp2k.org/trunk/methods/properties/bandstructure_gw.html.", &
                           n_keywords=24, n_subsections=1, repeats=.FALSE.)
 
       NULLIFY (keyword, subsection)
diff --git a/src/input_cp2k_properties_dft.F b/src/input_cp2k_properties_dft.F
index 9d25be74dc..91c0f4ad77 100644
--- a/src/input_cp2k_properties_dft.F
+++ b/src/input_cp2k_properties_dft.F
@@ -2165,7 +2165,8 @@ SUBROUTINE create_soc_section(section)
                           "<https://doi.org/10.1103/PhysRevB.58.3641>, "// &
                           "$V_{\mu\nu}^{\mathrm{SOC}, (\alpha)} = "// &
                           "(\hbar/2) \langle \phi_\mu | \sum_l \Delta "// &
-                          "V_l^\mathrm{SO}(\mathbf{r},\mathbf{r}') L^{(\alpha)} | \phi_\nu \rangle, "// &
+                          "V_l^\mathrm{SO}(\mathbf{r},\mathbf{r}') "// &
+                          "L^{(\alpha)} | \phi_\nu \rangle, "// &
                           "\alpha = x, y, z$.", &
                           n_keywords=1, n_subsections=1, repeats=.FALSE.)
 
@@ -2184,7 +2185,7 @@ SUBROUTINE create_soc_section(section)
                           "\varepsilon_\mathrm{CBM}+E_\mathrm{window}/2]$. Might be necessary "// &
                           "to use for large systems to prevent numerical instabilities.", &
                           usage="ENERGY_WINDOW 5.0", &
-                          default_r_val=cp_unit_to_cp2k(value=-1.0_dp, unit_str="eV"), &
+                          default_r_val=cp_unit_to_cp2k(value=40.0_dp, unit_str="eV"), &
                           unit_str="eV")
       CALL section_add_keyword(section, keyword)
       CALL keyword_release(keyword)
diff --git a/src/post_scf_bandstructure_utils.F b/src/post_scf_bandstructure_utils.F
index baade39a1a..362289ee3e 100644
--- a/src/post_scf_bandstructure_utils.F
+++ b/src/post_scf_bandstructure_utils.F
@@ -1119,6 +1119,12 @@ SUBROUTINE dos_pdos_ldos(qs_env, bs_env)
             band_edges_G0W0_SOC%DBG = 1000.0_dp
          END IF
 
+         IF (bs_env%unit_nr > 0) THEN
+            WRITE (bs_env%unit_nr, '(A)') ''
+            WRITE (bs_env%unit_nr, '(T2,A,F43.1,A)') 'SOC requested, SOC energy window:', &
+               bs_env%energy_window_soc*evolt, ' eV'
+         END IF
+
       END IF
 
       IF (bs_env%do_ldos) THEN
diff --git a/src/rpa_gw.F b/src/rpa_gw.F
index c50e3bf6f0..19471d051c 100644
--- a/src/rpa_gw.F
+++ b/src/rpa_gw.F
@@ -4428,8 +4428,6 @@ SUBROUTINE continuation_pade(vec_gw_energ, vec_omega_fit_gw, &
                                            - Eigenval(n_level_gw_ref) + Eigenval_scf(n_level_gw_ref)
       END IF
 
-      iunit = cp_logger_get_default_unit_nr()
-
       IF (PRESENT(min_level_self_energy) .AND. PRESENT(max_level_self_energy)) THEN
          IF (n_level_gw_ref >= min_level_self_energy .AND. &
              n_level_gw_ref <= max_level_self_energy) THEN
@@ -4449,6 +4447,7 @@ SUBROUTINE continuation_pade(vec_gw_energ, vec_omega_fit_gw, &
       IF (ndos /= 0) THEN
          ! Hedin shift not implemented
          CPASSERT(.NOT. do_hedin_shift)
+         iunit = cp_logger_get_default_unit_nr()
          DO idos = 1, ndos
             omega_dos = dos_lower_bound + REAL(idos - 1, KIND=dp)*dos_precision
             omega_dos_pade_eval = omega_dos - e_fermi
@@ -4480,30 +4479,31 @@ SUBROUTINE continuation_pade(vec_gw_energ, vec_omega_fit_gw, &
       END IF
 
       IF (PRESENT(min_level_self_energy) .AND. PRESENT(max_level_self_energy)) THEN
-      IF (n_level_gw_ref >= min_level_self_energy .AND. &
-          n_level_gw_ref <= max_level_self_energy .AND. iunit > 0) THEN
+         iunit = cp_logger_get_default_unit_nr()
+         IF (n_level_gw_ref >= min_level_self_energy .AND. &
+             n_level_gw_ref <= max_level_self_energy .AND. iunit > 0) THEN
 
-         CALL open_file('self_energy_re_'//TRIM(string_level)//'.dat', unit_number=iunit, &
-                        file_status="UNKNOWN", file_action="WRITE")
-         DO idos = 1, ndos
-            omega_dos = dos_lower_bound + REAL(idos - 1, KIND=dp)*dos_precision
-            WRITE (iunit, '(F17.10, F17.10)') omega_dos*evolt, vec_sigma_real(idos)*evolt
-         END DO
+            CALL open_file('self_energy_re_'//TRIM(string_level)//'.dat', unit_number=iunit, &
+                           file_status="UNKNOWN", file_action="WRITE")
+            DO idos = 1, ndos
+               omega_dos = dos_lower_bound + REAL(idos - 1, KIND=dp)*dos_precision
+               WRITE (iunit, '(F17.10, F17.10)') omega_dos*evolt, vec_sigma_real(idos)*evolt
+            END DO
 
-         CALL close_file(iunit)
+            CALL close_file(iunit)
 
-         CALL open_file('self_energy_im_'//TRIM(string_level)//'.dat', unit_number=iunit, &
-                        file_status="UNKNOWN", file_action="WRITE")
-         DO idos = 1, ndos
-            omega_dos = dos_lower_bound + REAL(idos - 1, KIND=dp)*dos_precision
-            WRITE (iunit, '(F17.10, F17.10)') omega_dos*evolt, vec_sigma_imag(idos)*evolt
-         END DO
+            CALL open_file('self_energy_im_'//TRIM(string_level)//'.dat', unit_number=iunit, &
+                           file_status="UNKNOWN", file_action="WRITE")
+            DO idos = 1, ndos
+               omega_dos = dos_lower_bound + REAL(idos - 1, KIND=dp)*dos_precision
+               WRITE (iunit, '(F17.10, F17.10)') omega_dos*evolt, vec_sigma_imag(idos)*evolt
+            END DO
 
-         CALL close_file(iunit)
+            CALL close_file(iunit)
 
-         DEALLOCATE (vec_sigma_real)
-         DEALLOCATE (vec_sigma_imag)
-      END IF
+            DEALLOCATE (vec_sigma_real)
+            DEALLOCATE (vec_sigma_imag)
+         END IF
       END IF
 
       !*** perform crossing search
@@ -4571,6 +4571,7 @@ SUBROUTINE continuation_pade(vec_gw_energ, vec_omega_fit_gw, &
 
          END IF
 
+         iunit = cp_logger_get_default_unit_nr()
          CALL open_file(TRIM(filename), unit_number=iunit, file_status="UNKNOWN", file_action="WRITE")
 
          num_omega = 10000
diff --git a/tests/QS/regtest-scalable-gw/07_G0W0_periodic_H2Te_kp_sampling_SCF.inp b/tests/QS/regtest-scalable-gw/07_G0W0_periodic_H2Te_kp_sampling_SCF.inp
new file mode 100644
index 0000000000..508381d833
--- /dev/null
+++ b/tests/QS/regtest-scalable-gw/07_G0W0_periodic_H2Te_kp_sampling_SCF.inp
@@ -0,0 +1,80 @@
+&GLOBAL
+  PRINT_LEVEL SILENT
+  PROJECT scalable_GW
+  RUN_TYPE ENERGY
+  &TIMINGS
+    THRESHOLD 0.01
+  &END TIMINGS
+&END GLOBAL
+
+&FORCE_EVAL
+  METHOD Quickstep
+  &DFT
+    BASIS_SET_FILE_NAME BASIS_MOLOPT
+    BASIS_SET_FILE_NAME HFX_BASIS
+    BASIS_SET_FILE_NAME ./REGTEST_BASIS
+    POTENTIAL_FILE_NAME GTH_SOC_POTENTIALS
+    SORT_BASIS EXP
+    &KPOINTS
+      PARALLEL_GROUP_SIZE -1
+      SCHEME MONKHORST-PACK 1 4 4
+    &END KPOINTS
+    &MGRID
+      CUTOFF 100
+      REL_CUTOFF 20
+    &END MGRID
+    &QS
+      EPS_DEFAULT 1.0E-15
+      EPS_PGF_ORB 1.0E-15
+      METHOD GPW
+    &END QS
+    &SCF
+      ADDED_MOS -1
+      EPS_SCF 1.0E-5
+      MAX_SCF 100
+      SCF_GUESS ATOMIC
+    &END SCF
+    &XC
+      &XC_FUNCTIONAL PBE
+      &END XC_FUNCTIONAL
+    &END XC
+  &END DFT
+  &PROPERTIES
+    &BANDSTRUCTURE
+      &DOS
+        KPOINTS 1 2 2
+      &END DOS
+      &GW
+        EPS_FILTER 1.0E-5
+        NUM_TIME_FREQ_POINTS 6
+      &END GW
+      &SOC
+      &END SOC
+    &END BANDSTRUCTURE
+  &END PROPERTIES
+  &SUBSYS
+    &CELL
+      ABC [angstrom] 12.000 6.000 8.000
+      MULTIPLE_UNIT_CELL 1 1 1
+      PERIODIC YZ
+    &END CELL
+    &COORD
+      H  0.0 -0.5 -4.5
+      Te 0.5  0.0  4.5
+      H  0.0  0.5 -4.5
+    &END COORD
+    &KIND H
+      BASIS_SET ORB DZVP-GTH
+      BASIS_SET RI_AUX RI-dummy-regtest
+      POTENTIAL GTH-PBE-q1
+    &END KIND
+    &KIND Te
+      BASIS_SET ORB SZV-MOLOPT-SR-GTH
+      BASIS_SET RI_AUX RI-dummy-regtest
+      POTENTIAL GTH-PBE-q6
+    &END KIND
+    &TOPOLOGY
+      MULTIPLE_UNIT_CELL 1 1 1
+    &END TOPOLOGY
+  &END SUBSYS
+&END FORCE_EVAL
diff --git a/tests/QS/regtest-scalable-gw/TEST_FILES b/tests/QS/regtest-scalable-gw/TEST_FILES
index ede4580900..243681d1b3 100644
--- a/tests/QS/regtest-scalable-gw/TEST_FILES
+++ b/tests/QS/regtest-scalable-gw/TEST_FILES
@@ -1,11 +1,12 @@
 01_G0W0_periodic_H2O.inp                                    106     1e-03        9.861
 02_G0W0_IH_SOC_LDOS.inp                                     106     1e-03       17.826
 02_G0W0_IH_SOC_LDOS.inp                                     107     1e-03       11.344
-03_G0W0_bandstructure_IH_chain.inp                          106     5e-04       21.977
-03_G0W0_bandstructure_IH_chain.inp                          108     5e-04       21.692
+03_G0W0_bandstructure_IH_chain.inp                          106     5e-04       21.962
+03_G0W0_bandstructure_IH_chain.inp                          108     5e-04       21.678
 04_G0W0_SOC_TeH_chain_open_shell_kp_extrapol.inp            106     1e-03        5.174
 05_G0W0_SOC_TeH_chain_open_shell.inp                        106     5e-04        5.178
 05_G0W0_SOC_TeH_chain_open_shell.inp                        107     5e-04        0.306
 05_G0W0_SOC_TeH_chain_open_shell.inp                        108     5e-04        5.134
 06_G0W0_periodic_H2O_Hedin_shift.inp                        106     1e-03       10.417
+07_G0W0_periodic_H2Te_kp_sampling_SCF.inp                   108     5e-04       10.315
 #EOF