HFX-RI with k-Points

The resolution-of-the-identity (RI) technique is implemented for various methods in CP2K, each time in a slightly different flavor. Hartree-Fock exchange (HFX) has two distinct RI implementations: One for \(\Gamma\)-point calculations, and one for k-point sampling, which is covered on this page.

The implementation for RI-HFXk (with k-point sampling) is described in doi:10.1063/5.0189659. All files necessary to run the examples discussed below can be found here.

Application domain

The RI-HFX implementation with k-point sampling (RI-HFXk) is optimized for the simulation of small unit cells and dense k-point meshes. It is much more efficient than equivalent \(\Gamma\)-point supercell calculations.

Brief theory recap

In HFX calculations with k-point sampling, the exact-exchange contribution to the KS matrix at a given k-point is expressed as

\[ K_{\mu\nu}^\mathbf{k} = \sum_\mathbf{k'} P_{\sigma\lambda}^{\mathbf{k'}} (\mu^\mathbf{k}\sigma^{\mathbf{k'}} | \nu^\mathbf{k}\lambda^{\mathbf{k'}}) \]

In CP2K, most k-space matrices are obtained via a Fourier transform of their real-space counterpart. This is also the case for the exact-exchange matrix:

\[ K_{\mu\nu}^{\mathbf{k}} = \sum_\mathbf{R} e^{i\mathbf{k}\cdot\mathbf{R}}\ K^\mathbf{R}_{\mu\nu} \]

where \(\mathbf{R}\) is the translation vector for a periodic image of the simulation cell. The real-space matrix is expressed as:

\[ K^\mathbf{b}_{\mu\nu} = \sum_{\mathbf{a},\mathbf{c}}\ P_{\sigma,\lambda}^{\mathbf{c}}\ (\mu^\mathbf{0}\sigma^\mathbf{a}| \nu^\mathbf{b}\lambda^\mathbf{a+c}) \]

where \(\mathbf{a, b, c}\) are periodic image indices. Note that in \(\Gamma\)-point calculations, there is a single k-point with \(\mathbf{k} = 0\), leading to the same density matrix for all \(\mathbf{c}\). It can then be taken out of the sum, and the \(\Gamma\)-point formula from far above is recovered.

In the RI-HFXk method, the real-space exact-exchange matrices are calculated with a local atom-specific RI basis:

\[\begin{split} \begin{aligned} K_{\mu i,\nu j}^\textbf{b} = \sum_{\mathbf{a}, \mathbf{c}} \ &P^\mathbf{c}_{\sigma,\lambda} \ (\mu^\mathbf{0}_i\sigma^\mathbf{a}\lfloor P^\mathbf{0}_i)\ (P^\mathbf{0}_i\lfloor R^\mathbf{0}_i)^{-1}\ (R^\mathbf{0}_i | S^\mathbf{b}_j)\\ &(S^\mathbf{b}_j\lfloor Q^\mathbf{b}_j)^{-1}\ (Q^\mathbf{b}_j\lfloor \nu^\mathbf{b}_j \lambda^{\mathbf{a}+\mathbf{c}}) \end{aligned} \end{split}\]

where indices \(i,j\) refer to atoms, and \(K^\mathbf{b}_{\mu i,\nu j}\) correspond to the \(\mu,\nu\) AO pair in the \(i,j\) atomic block of the matrix for periodic image \(\mathbf{b}\). The local RI basis {\(P^\mathbf{0}_i\)} for atom \(i\) in the reference cell is composed of RI basis elements coming from all atoms within a sphere a radius \(R_\text{max}\) centered on atom \(i\). \(R_\text{max}\) is the extent of the most diffuse AO in the system. You can find more details in the paper accepted, not yet published paper.

Because building the real-space exchange matrices is the most expensive part of the calculation, the method has a constant cost regardless of the k-point mesh. When the number of k-point becomes high (~1000), that scaling becomes linear because a matrix has to be diagonalized at each k-point. The ADMM method can be seamlessly used with RI-HFXk.

Simple example (graphene band structure)

In this example, the band structure of graphene at the ADMM-PBE0 level of theory is calculated. We use the pob-TZVP-rev2 basis set because it is not diffuse, and therefore efficient. Note that more diffuse basis sets, such as the ccGRB family distributed with CP2K, seem to produce more robust results. As the ADMM auxiliary basis set, we use the lower quality pob-DZVP-rev2.

To calculate a band structure, we first converge a SCF cycle with a dense, general k-point mesh. In this case, we use a 19x19x1 Monkhorst-Pack mesh. Note that the special point K where the Dirac cone is located is not explicitly in the mesh. This allows for a simpler calculation since the system is treated as a semi-conductor. The mesh is however dense enough that the physics is well captured. After the SCF is converged, the collection of real-space KS matrices \(F^\textbf{b}_{\mu\nu}\) is back Fourier transformed to \(F^\textbf{k}_{\mu\nu}\) along the k-point path of interest specified in the input. The matrices are diagonalized, and their eigenvalues used as band energies. You can used the cp2k_bs2csv to transform the resulting CP2K output to a more workable CSV file.

Note that in this input file, we select a value of \(1.0\times 10^{-6}\) for EPS_PGF_ORB. This parameter controls the range of AOs, and threfore the extent of the local atom-specific RI basis sets. This value leads to particularly high accuracy. The default of \(1.0\times 10^{-5}\) is typically enough.

The HFX potential was selected as the Truncated Coulomb operator with a cutoff radius of \(R_C = 5.0\) Angstroms. As for all periodic HFX calculations, a limited range potential is required. In case of k-point calculations, a sphere with the radius of \(R_C\) must fit the in the BvK supercell (equivalent of L/2 requirement in \(\Gamma\)-point calculations). If the requirement is not met, a warning is issued. The ideal truncation radius to be used for converged results is system dependend, and careful testing should be done. In practice, most calculations are converged from \(R_C = 6.0\) Angstroms on.

Note that there is no specification for the RI metric in the input file below. The default is to use the same operator for the HFX potential and the RI metric (TC with \(R_C = 5.0\) Angstroms, in this case). Contrary to \(\Gamma\)-point calculations, using longer ranged RI metric does not dramatically affect speed in RI-HFXk, while insuring best possible accuracy.

As for most HFX calculations, the SCF convergence can be spedup by restarting from a converged PBE wavefunction. Note that in k-point restart files, the real-space density matrices are dumped. However, many more images are required for HFX calculation than for PBE, due to the non-locality of exact-exchange. Using a very tight value of EPS_PGF_ORB (e.g. \(1.0\times 10^{-12}\)) in the initial PBE calculation leads to a lot of images there as well. An example is provided in the example file bundle. The example bellow takes about 5 minutes to run on 32 CPUs if restarted from a PBE wavefunction, and 10 minutes otherwise.

  PROJECT graphene_kp
    !restarting from a converged PBE calculation lead to less SCF steps

    !Turning on the ADMM approximation

      !sometimes necessary when running small systems with a lot of CPUs
      !needs to be the same value as that in RI%EPS_PGF_ORB
      EPS_PGF_ORB 1.0E-6
    &END  QS

      CUTOFF 600
      REL_CUTOFF 60
      NGRIDS 5

      EPS_SCF 1.0E-06
      MAX_SCF 50
      !typically need lower threshold to start DIIS with k-points
      EPS_DIIS 0.05
    &END SCF

          SCALE_X 0.75
        FRACTION 0.25
          KP_NGROUPS 16
          !using a smaller than default EPS_PGF_ORB allows for a
          !more accurate calculation with a larger local RI basis
          EPS_PGF_ORB 1.0E-6
        &END RI
          !Always use a limited ranged potential in PBCs
          CUTOFF_RADIUS 5.0
      &END HF
    &END XC
        ADDED_MOS 5
          NPOINTS 50
          SPECIAL_POINT GAMMA 0.0000000000 0.0000000000 0.0000000000
          SPECIAL_POINT M 0.5000000000 0.0000000000 0.0000000000
          SPECIAL_POINT K 0.3333333333 0.3333333333 0.0000000000
          SPECIAL_POINT GAMMA 0.0000000000 0.0000000000 0.0000000000
        &END  KPOINT_SET
        FILE_NAME graphene_kp.bs
      !enough space between 2 sheets of graphene not to interact
      ABC 2.46 2.46 20.000
      ALPHA_BETA_GAMMA 90.0 90.0 120.0
      C 0.3333333 0.6666667 0.000
      C 0.6666667 0.3333333 0.000
    &KIND C
      BASIS_SET pob-TZVP-rev2
      BASIS_SET AUX_FIT pob-DZVP-rev2

Important input parameters

There are a few important input parameters for RI-HFXk calculations:

  • EPS_FILTER: the filtering threshold for sparse tensors. Works the same way as for \(\Gamma\)-point calculations (see above).

  • RI_METRIC: using the default value for the RI metric, which correspond to the choice of HFX potential, is the way to go. It insures best possible accuracy, while only marginally increasing the costs.

  • KP_NGROUPS: this is a performance keyword. During the calculation of the real-space exact-exchange matrices, the work is split among MPI subcommunicators. Using more groups drastically speeds up the calculation (efficienctly up to 16 groups, reasonably up to 32). This comes with a memory overhead though, as some data must be replicated on each subgroup. The total number of MPI ranks must be divisible by the number of groups.

  • EPS_PGF_ORB: generally determines the range of GTOs in the AO basis. As such, it also determines the extent of the local atom-specific RI basis used for RI-HFXk. The default value of \(1.0\times 10^{-5}\) has proven to be accurate and fast.

  • KP_USE_DELTA_P: when set to .TRUE. (default value), the next SCF step is calculated using the density matrix difference, rather than the full new density matrix. This helps with computational efficiency by increasing sparsity. If your calculation struggles to converge, you can try to turn this off.

The rest of the HF/RI input parameters related to k-point sampling (all with a KP_ prefix) have little to no impact, and their default values are good enough.