NequIP and Allegro

This Colab tutorial illustrates how to train an equivariant neural network interatomic potential for bulk water using the Allegro framework. You will learn how to train a model, deploy it in production, and run molecular dynamics simulations in CP2K. The training and inference will be carried out on the GPU provided by the Colab environment.

Allegro is designed for constructing highly accurate and scalable interatomic potentials for molecular dynamics simulations. The methodology is described in detail in Musaelian2023. The open-source Allegro package, built on the NequIP framework, was developed by the Allegro and NequIP authors: A. Musaelian, S. Batzner, A. Johansson, L. Sun, C. J. Owen, M. Kornbluth and B. Kozinsky.

Input Section

Inference in CP2K is performed through the NEQUIP and ALLEGRO sections. For example, the ALLEGRO section (the NEQUIP section is analogous) reads:

&ALLEGRO
  ATOMS Si
  PARM_FILE_NAME Allegro/si-deployed.pth
  UNIT_COORDS angstrom
  UNIT_ENERGY eV
  UNIT_FORCES eV*angstrom^-1
&END ALLEGRO

Here, si-deployed.pth is the PyTorch model deployed with the Allegro framework, and the UNIT_* keywords give the units of the coordinates, energies and forces used by the model itself. A complete input file can be found in the Colab tutorial and in the regression tests (see Allegro_si_MD.inp).
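As a rough illustration of what the UNIT_* keywords describe, the sketch below converts between the model units used above (angstrom, eV, eV/angstrom) and the Hartree atomic units (Bohr, Hartree, Hartree/Bohr) that CP2K uses internally. It is not CP2K's implementation: the helper names are hypothetical and the constants are the CODATA 2018 values.

```python
# Sketch only (assumption): the unit arithmetic implied by UNIT_COORDS angstrom,
# UNIT_ENERGY eV and UNIT_FORCES eV*angstrom^-1. CP2K does this internally.

# CODATA 2018 conversion constants
HARTREE_IN_EV = 27.211386245988     # 1 Hartree expressed in eV
BOHR_IN_ANGSTROM = 0.529177210903   # 1 Bohr expressed in angstrom


def bohr_to_angstrom(x: float) -> float:
    """Coordinates: CP2K (Bohr) -> model (angstrom)."""
    return x * BOHR_IN_ANGSTROM


def ev_to_hartree(e: float) -> float:
    """Energy: model (eV) -> CP2K (Hartree)."""
    return e / HARTREE_IN_EV


def ev_per_angstrom_to_hartree_per_bohr(f: float) -> float:
    """Force component: model (eV/angstrom) -> CP2K (Hartree/Bohr)."""
    # eV -> Hartree divides by HARTREE_IN_EV; 1/angstrom -> 1/Bohr multiplies
    # by the number of angstrom per Bohr.
    return f * BOHR_IN_ANGSTROM / HARTREE_IN_EV
```

For example, ev_per_angstrom_to_hartree_per_bohr(1.0) gives about 0.0194, so a force of 1 eV/angstrom is roughly 0.0194 Hartree/Bohr.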

Input details

The ATOMS keyword expects a list of elements/kinds given in the same order as in the NequIP or Allegro YAML file. Likewise, the atomic coordinates in the COORD or TOPOLOGY section must be provided consistently with the YAML file. If either condition is violated, unphysical results are obtained. Such issues are usually easy to spot because the energy is significantly wrong: for example, inverting the order of the elements in the regression test regtest-nequip/NequIP_water.inp gives an error of the order of 1 eV with respect to the reference value. In addition, MD runs quickly become highly unstable.
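Because such mistakes are silent until the energies look wrong, a quick pre-run check can help. The sketch below compares the ATOMS list with the element order used to train the model and verifies that every element in the COORD section is listed in ATOMS. All names are hypothetical and this is not part of CP2K, NequIP or Allegro; it assumes `model_types` is copied by hand from the training YAML file (for example its chemical_symbols or type_names entry; the key name depends on the NequIP/Allegro version).

```python
# Hypothetical pre-run consistency check for the ATOMS keyword (sketch only).
from __future__ import annotations


def symbols_from_coord(coord_block: str) -> list[str]:
    """Element symbol (first column) of each non-empty COORD line."""
    return [line.split()[0] for line in coord_block.splitlines() if line.strip()]


def check_atoms_order(atoms: list[str], model_types: list[str],
                      coord_symbols: list[str]) -> list[str]:
    """Return the problems found; an empty list means no issue was detected."""
    problems = []
    # ATOMS must list the elements in exactly the order used for training.
    if atoms != model_types:
        problems.append(
            f"ATOMS order {atoms} differs from model type order {model_types}")
    # Every element appearing in COORD must be known to ATOMS.
    unknown = sorted(set(coord_symbols) - set(atoms))
    if unknown:
        problems.append(f"COORD contains elements not listed in ATOMS: {unknown}")
    return problems


if __name__ == "__main__":
    model_types = ["H", "O"]  # hypothetical order taken from the YAML file
    coord = """
    O  0.000 0.000 0.000
    H  0.757 0.586 0.000
    H -0.757 0.586 0.000
    """
    symbols = symbols_from_coord(coord)
    print(check_atoms_order(["H", "O"], model_types, symbols))  # consistent
    print(check_atoms_order(["O", "H"], model_types, symbols))  # order swapped
```

The check only catches ordering and missing-element mistakes; it cannot tell whether the model itself was trained on the correct data.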

Compiling CP2K with LibTorch

Running NequIP or Allegro models requires compiling CP2K with the LibTorch library. For CPU-only CP2K binaries, installing the toolchain with the flag --with-libtorch is sufficient. To benefit from the (often significant) GPU acceleration, download the precompiled LibTorch library for CUDA from https://pytorch.org; for example, for CUDA 11.8:

wget https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.0.0%2Bcu118.zip

After extracting the LibTorch CUDA binaries, run the toolchain script ./install_cp2k_toolchain.sh and pass the extraction path with the flag --with-libtorch=<path-to-libtorch-cuda>.
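The two steps above can be sketched in Python as follows. The first helper rebuilds the download URL; note that the `+` in the version string appears URL-encoded as `%2B`. The second helper assembles the toolchain invocation. Both helper names are hypothetical, and the URL pattern is an assumption generalized from the single CUDA 11.8 example, so check https://pytorch.org for the files that actually exist.

```python
# Sketch only: helper names are hypothetical and the URL pattern is an
# assumption extrapolated from the CUDA 11.8 example in the text.
from urllib.parse import quote


def libtorch_cuda_url(torch_version: str, cuda_tag: str) -> str:
    """Build a LibTorch (cxx11 ABI, shared with deps) download URL."""
    # quote() encodes '+' as '%2B', matching the example URL.
    version = quote(f"{torch_version}+{cuda_tag}")
    return (f"https://download.pytorch.org/libtorch/{cuda_tag}/"
            f"libtorch-cxx11-abi-shared-with-deps-{version}.zip")


def toolchain_command(libtorch_dir: str) -> list:
    """Toolchain invocation pointing CP2K at an extracted LibTorch directory."""
    return ["./install_cp2k_toolchain.sh", f"--with-libtorch={libtorch_dir}"]


if __name__ == "__main__":
    print(libtorch_cuda_url("2.0.0", "cu118"))
    print(" ".join(toolchain_command("/path/to/libtorch")))
```

The command list can be passed to subprocess.run(..., check=True) from within the toolchain directory; extracting the archive typically yields a directory named libtorch, whose absolute path is what --with-libtorch expects.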

Further Resources

For further information on NequIP, Allegro and equivariant neural networks (e3nn), see:

Allegro paper Musaelian2023 and code https://github.com/mir-group/allegro
NequIP paper Batzner2022 and code https://github.com/mir-group/nequip
A tutorial on LAMMPS by the NequIP/Allegro authors is available as a Colab notebook
For an introduction to e3nn see e3nn.org and doi:10.5281/zenodo.7430260