Low-scaling post Hartree-Fock
Brief theory recap
The low-scaling post-HF methods implemented in CP2K are SOS-MP2 and (direct) RPA. To achieve the desired low-scaling (sub-cubic), all equations are formulated in the atomic orbital (AO) basis, using the resolution-of-the-identity technique with a short range metric, and a numerical Laplace transform. A central quantity of post-HF is the collection of 4-center 2-electron electron repulsion integrals (ERIs). In the AO basis and within the RI approximation:
where \(\lfloor\) denotes the short range RI metric, and \(P,Q,R,S\) are RI basis elements. A short range metric increases the sparsity of the 3-center RI integrals \((\mu\sigma\lfloor P)\), resulting in more efficient sparse tensor contractions. The shortest possible ranged RI metric is the overlap. Using the truncated Coulomb with a short truncation radius (1.5-2.0 Angstroms) is recommended in sparser systems (e.g. water), while the overlap is typically sufficient for dense solids. The 3- and 2-center RI ERIs are calculated analytically with the libint library. The 2-center potential ERIs, \((Q|R)\), are calculated numerically with the GPW scheme, because it uses the (very) long range 1/r Coulomb interaction.
During post-HF calculations, the ERIs discussed above are contracted with various other quantities, with a specific order of operations for maximum performance. The details can be found in Wilhelm2016b for the initial RPA energy implementation. Bussy2023 covers the detailed implementation of SOS-MP2 and RPA nuclear gradients. Calculations for the energy, forces, MD, cell and gemoetry optimization are possible. While these low-scaling methods have more favorable scaling than their standard implementations (sub-cubic vs quartic), they suffer from a heavy prefactor. Therefore, they are only interesting for large to very large systems (hundreds of atoms), and will still consume a lot of resources. For smaller system, the standard MP2 and RPA implementations are recomended (they also have gradients implemented).
Simple examples
We cover two simple examples of low-scaling post-HF calculations: SOS-MP2 water MD, and titania RPA@PBE0 cell optimization. These examples are meant to be representative of the available options, and the parameters to pay attention to. They are also relatively afforable (although the titania input might require a few nodes on an HPC). To maintain a resonable cost, the systems are small and with low quality basis sets. In practice, such small systems should be simulated with the standard quartic scaling implementation. Moreover, it is never recommended to run post-HF calculations with double-zeta quality basis sets. All files required to run these examples can be found here.
SOS-MP2 liquid water
This example is a short 3 steps MD liquid water calculation (32 molecules). The RI-MP2 optimized cc-TZ and RI_TZ basis sets are used. Whenever available, one should use pre-optimized RI basis sets for better performance. By default, the RI basis is generated on the fly. While it yields accurate results, the automatically generated RI basis sets are larger, and therefore less efficient. Using a short range RI metric improves efficiency by introducing sparsity. However, in somewhat sparse systems like water, the overlap metric lacks accuracy (Bussy2023). Using the truncated Coulomb operator with a cutoff radius of 1.5-2.0 Angstroms as a RI metric is safer.
All post-HF calculations need an accurate and well converged SCF as a starting point. In this case, a pure Hartree-Fock calculation is performed. Note the use of the truncated Coulomb operator as HFX potential, with cutoff radius < L/2, as always required in periodic HF. Here, the value of 4.5 Angstroms is somewhat small, and simulating a larger water box would be advised.
A numerical Laplace transform is used to reduce the scaling of the method. The integral is replaced by a weighted sum over a few grid points, placed according to the MINIMAX algortihm. Typically, using 6-8 points is enough. CP2K can go up to 20 points, although more points do not necessarily increase accuracy (and sometime bring numerical instability). The MINIMAX quadrature depends on the band gap of the system and the total spread of the SCF eigenvalues. If the seleceted number of grid point is innapropriate, a warning is issued.
&GLOBAL
PROJECT water32
RUN_TYPE MD
PRINT_LEVEL MEDIUM
&END GLOBAL
&MOTION
&MD
STEPS 3
&END MD
&END MOTION
&FORCE_EVAL
&DFT
!cc-TZ: RI-MP2 optimized basis sets
BASIS_SET_FILE_NAME BASIS_RI_cc-TZ
POTENTIAL_FILE_NAME POTENTIAL
SORT_BASIS EXP
&MGRID
CUTOFF 600
REL_CUTOFF 50
NGRIDS 5
&END MGRID
&SCF
SCF_GUESS RESTART
EPS_SCF 1.0E-6
MAX_SCF 40
&END SCF
&XC
&XC_FUNCTIONAL NONE
&END XC_FUNCTIONAL
&HF
FRACTION 1.0
&INTERACTION_POTENTIAL
!TC potential with cutoff < L/2 for periodic HFX
POTENTIAL_TYPE TRUNCATED
CUTOFF_RADIUS 4.5
&END INTERACTION_POTENTIAL
&MEMORY
!maximum memory allocated to HFX ERI storage, per MPI rank
!the optimal number depends on the specifics of the computer
MAX_MEMORY 4000
&END MEMORY
&END HF
&WF_CORRELATION
!explicit opposite-spin scaling
SCALE_S 1.3
&RI_SOS_MP2
!MINIMAX quadrature by default
QUADRATURE_POINTS 6
&END RI_SOS_MP2
!Enabling low-scaling
&LOW_SCALING
MEMORY_CUT 3
&END LOW_SCALING
&RI
&RI_METRIC
!Short range RI metric for SOS-MP2
POTENTIAL_TYPE TRUNCATED
CUTOFF_RADIUS 1.5
&END RI_METRIC
&END RI
&INTEGRALS
ERI_METHOD GPW
&WFC_GPW
!Safe yet faster than default values
CUTOFF 200
REL_CUTOFF 40
&END WFC_GPW
&END INTEGRALS
&END WF_CORRELATION
&END XC
&END DFT
&SUBSYS
&CELL
ABC 9.8528 9.8528 9.8528
&END CELL
&TOPOLOGY
COORD_FILE_FORMAT XYZ
COORD_FILE_NAME ./H2O-32.xyz
&END TOPOLOGY
&KIND H
BASIS_SET cc-DZ
BASIS_SET RI_AUX RI_DZ
POTENTIAL GTH-HF
&END KIND
&KIND O
BASIS_SET cc-DZ
BASIS_SET RI_AUX RI_DZ
POTENTIAL GTH-HF
&END KIND
&END SUBSYS
&END FORCE_EVAL
RPA bulk titania
This second example is 2 steps of a cell optimization of titania at the RPA@PBE0 level (72 atoms). All low-scaling post-HF methods implemented in CP2K can be used with the ADMM (ADMM2 flavor) approximation to speed up the Hartree-Fock calculations for the SCF, the response forces, and EXX (in case of RPA). Here, the admm-dzp basis is used as an auxiliary ADMM basis set, and ccGRB-D as the primary basis. The RI basis is generated automatically.
The overlap RI metric is used in this example. For dense systems, this choice of metric offers a
good tradeoff between accuracy and efficiency. Note that the &HF
input section is repeated; it
appears once for the SCF specification, and once for the RPA. In RPA@PBE0, the SCF is first
converged at the PBE0 level of theory. Then, the RPA correlation energy is computed using the PBE0
orbitals and eigenvalues, and finally, the PBE0 exchange-correlation energy is replaced by the exact
exchange energy (EXX) calculated with the same orbitals. The &HF
section in RPA refers to that
last step. If taken to be the same as the SCF (except for FRACTION
and MEMORY
), the stored ERIs
can be reused and rescaled instead of recomputed, thus saving precious computational time. The
ADMM
keyword in RPA specifies that the EXX should be calculated within the ADMM approximation.
Note the truncated Coulomb potential used for HFX, and its cutoff radius of 4.5 Angstroms (< L/2),
necessary for periodic HFX calculations. This cutoff radius is on the shorter size, and using a
larger simulation cell would be advised.
The MINIMAX_QUADRATURE
is explicitely requested in the input. This is a necessary step for
efficiency, as the default grid follows a Clenshaw-Curtis scheme, which requires many more
integration points for the same accuracy.
&GLOBAL
PROJECT TiO2
RUN_TYPE CELL_OPT
PRINT_LEVEL MEDIUM
&END GLOBAL
&MOTION
&CELL_OPT
MAX_ITER 2
&END CELL_OPT
&END MOTION
&FORCE_EVAL
STRESS_TENSOR ANALYTICAL
&DFT
BASIS_SET_FILE_NAME BASIS_ccGRB_UZH
BASIS_SET_FILE_NAME BASIS_ADMM_UZH
POTENTIAL_FILE_NAME POTENTIAL_UZH
SORT_BASIS EXP
!automatically generated RI basis set
AUTO_BASIS RI_AUX SMALL
!enabling the ADMM2 approximation
&AUXILIARY_DENSITY_MATRIX_METHOD
METHOD BASIS_PROJECTION
ADMM_PURIFICATION_METHOD NONE
EXCH_CORRECTION_FUNC PBEX
&END AUXILIARY_DENSITY_MATRIX_METHOD
&MGRID
CUTOFF 600
REL_CUTOFF 50
NGRIDS 5
&END MGRID
&SCF
EPS_SCF 1.0E-6
MAX_SCF 20
&OT
PRECONDITIONER FULL_ALL
MINIMIZER DIIS
&END OT
&OUTER_SCF
MAX_SCF 5
EPS_SCF 1.0E-6
&END OUTER_SCF
&END SCF
&XC
&XC_FUNCTIONAL
&PBE
SCALE_X 0.75
&END PBE
&END XC_FUNCTIONAL
&HF
FRACTION 0.25
!always use a short range potential < L/2 in periodic HFX
&INTERACTION_POTENTIAL
POTENTIAL_TYPE TRUNCATED
CUTOFF_RADIUS 4.5
&END INTERACTION_POTENTIAL
&MEMORY
!maximum memory allocated to HFX ERI storage, per MPI rank
!the optimal number depends on the specifics of the computer
MAX_MEMORY 4000
&END MEMORY
&END HF
&WF_CORRELATION
&RI_RPA
MINIMAX_QUADRATURE
QUADRATURE_POINTS 6
!calculate EXX with ADMM, using the same HF section as in SCF
!so that the integrals can be resued without recomputing
ADMM
&HF
FRACTION 1.0
&INTERACTION_POTENTIAL
POTENTIAL_TYPE TRUNCATED
CUTOFF_RADIUS 4.5
&END INTERACTION_POTENTIAL
&END HF
&END RI_RPA
!enabling low-scaling
&LOW_SCALING
MEMORY_CUT 3
&END LOW_SCALING
&RI
!overlap RI metric is appropriate for dense systems
&RI_METRIC
POTENTIAL_TYPE IDENTITY
&END RI_METRIC
&END RI
&INTEGRALS
ERI_METHOD GPW
!Safe yet faster than default values
&WFC_GPW
CUTOFF 200
REL_CUTOFF 40
&END WFC_GPW
&END INTEGRALS
&END WF_CORRELATION
&END XC
&END DFT
&SUBSYS
&CELL
ABC 9.330 9.330 9.107
&END CELL
&TOPOLOGY
COORD_FILE_FORMAT XYZ
COORD_FILE_NAME ./TiO2.xyz
&END TOPOLOGY
&KIND Ti
BASIS_SET ccGRB-D-q12
BASIS_SET AUX_FIT admm-dz-q12
POTENTIAL GTH-PBE0-q12
&END KIND
&KIND O
BASIS_SET ccGRB-D-q6
BASIS_SET AUX_FIT admm-dz-q6
POTENTIAL GTH-PBE0-q6
&END KIND
&END SUBSYS
&END FORCE_EVAL
Important input parameters
There are a few importnat input parameters for low-scaling post-HF calculations:
SORT_BASIS
, in the DFT input section, should be set to EXP. This way, basis elements are sorted according to their exponents, leading to increased sparsity and performance.EPS_FILTER
, in the LOW_SCALING input section. The bottleneck of the low-scaling post-HF calculations consists of sparse tensor contractions. This parameter is the threshold for block sparsity during calculations, with a safe default of \(1.0\times10^{-9}\). A looser threshold might significantly speedup calculations (not recommended to go higher than \(1.0\times10^{-8}\)).MEMORY_CUT
, in the LOW_SCALING input section. This keyword influences the batching strategy for large tensor contractions. In post-HF, some tensor contractions are done in multiple steps, with large intermediate results. Storing these intermediates can lead to memory shortage. The value ofMEMORY_CUT
(default of 5) indicates how large tensors are split into batches, such that smaller intermediates can be stored. Note that MEMORY_CUT 3 does not mean that the total memory consumption of the program is divided by three (initial and final tensors are not affected, as well as the data concerning the rest of the calculation). A higher value reduces the memory footprint, but leads to performance overheads.RI METRIC
: the choice of RI metric is crucial for performance in periodic calculations. A
shorter ranged RI metric will generally be more efficient, while a longer ranged metric (e.g. TC
with a cutoff radius of 1.5 Angstrom) can be more accurate. See Bussy2023 for a full discussion.