DBCSR

References: Borstnik2014, Schuett2016

Configuration options for the DBCSR (Distributed Block Compressed Sparse Row) library.
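The keywords below are set inside a `&DBCSR` block. This minimal sketch shows where that block goes. It assumes the usual CP2K layout, where DBCSR is a subsection of `&GLOBAL`; check the input tree of your CP2K version. The values shown are the documented defaults, so the block changes nothing.

```
&GLOBAL
  &DBCSR
    MM_DRIVER        AUTO   ! default: pick the best available host backend
    USE_COMM_THREAD  T      ! default: poll MPI from a dedicated thread
    COMM_THREAD_LOAD -1     ! default: let DBCSR decide the thread's workload
  &END DBCSR
&END GLOBAL
```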

Keyword descriptions

AVG_ELEMENTS_IMAGES: integer = 0

Usage: avg_elements_images 10000

Average number of elements (dense limit) per image, which also corresponds to the average number of elements exchanged between MPI processes during the operations. A zero or negative value means unlimited.
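For illustration, the fragment below sets an explicit per-image limit and keeps the default behaviour as a commented-out alternative. The value 10000 comes from the usage line above and is not a tuning recommendation. Per the description, any zero or negative value removes the limit.

```
&DBCSR
  AVG_ELEMENTS_IMAGES 10000   ! limit: about 10000 elements per image on average
  ! AVG_ELEMENTS_IMAGES 0     ! alternative: zero or negative means unlimited (default)
&END DBCSR
```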

COMM_THREAD_LOAD: integer = -1

Usage: comm_thread_load 50

If a communication thread is used, the percentage of the multiplication workload that this thread performs in addition to its communication tasks. A negative value leaves the decision up to DBCSR.
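As the description implies, COMM_THREAD_LOAD matters only when a communication thread is enabled. This sketch sets the two keywords together. It uses the value 50 from the usage line, which is illustrative only, to give the thread a 50% multiplication workload alongside its communication tasks.

```
&DBCSR
  USE_COMM_THREAD  T    ! enable the MPI polling thread (default)
  COMM_THREAD_LOAD 50   ! the thread also performs 50% multiplication workload
&END DBCSR
```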

MAX_ELEMENTS_PER_BLOCK: integer = 32

Usage: MAX_ELEMENTS_PER_BLOCK 32

Default block size used when converting dense matrices into blocked ones.

MM_DRIVER: enum = AUTO

Usage: mm_driver blas

Valid values:

  • AUTO Automatically choose the best available driver

  • BLAS BLAS (requires the BLAS library at link time)

  • MATMUL Fortran MATMUL

  • SMM Library optimised for Small Matrix Multiplies (requires the SMM library at link time)

  • XSMM LIBXSMM

References: Heinecke2016

Preferred backend for matrix block multiplications on the host.
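To select a specific backend instead of AUTO, change the one MM_DRIVER line. The sketch below requests LIBXSMM. It assumes CP2K was built and linked against LIBXSMM, as BLAS and SMM likewise require their libraries at link time. What happens when the requested library is missing depends on the build, and AUTO avoids the question.

```
&DBCSR
  MM_DRIVER XSMM   ! prefer LIBXSMM for host block multiplications
&END DBCSR
```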

MM_STACK_SIZE: integer = -1

Usage: mm_stack_size 1000

Size of the multiplication parameter stack. A negative value leaves the decision up to DBCSR.

MULTREC_LIMIT: integer = 512

Recursion limit of the cache-oblivious multrec algorithm.
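This keyword has no usage line of its own. Assuming it follows the other integer keywords, it would be set as shown below. The value 1024 is arbitrary and only illustrates overriding the default of 512.

```
&DBCSR
  MULTREC_LIMIT 1024   ! illustrative value; the default is 512
&END DBCSR
```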

NUM_LAYERS_3D: integer = 1

Usage: num_layers_3D 1

Number of layers for the 3D multiplication algorithm.

NUM_MULT_IMAGES: integer = 1

Usage: num_mult_images 2

Multiplicative factor for the number of virtual images.

N_SIZE_MNK_STACKS: integer = 3

Usage: n_size_mnk_stacks 2

Number of stacks to use for distinct atomic sizes (e.g., 2 for a system consisting mostly of water molecules).
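The water example presumably reflects that water contains two elements, O and H, whose atomic blocks differ in size with typical basis sets, which gives two distinct sizes. This is a sketch for such a system under that assumption.

```
&DBCSR
  N_SIZE_MNK_STACKS 2   ! mostly water: O and H blocks give two distinct sizes
&END DBCSR
```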

USE_COMM_THREAD: logical = T

Usage: use_comm_thread T

During multiplication, use a thread to periodically poll MPI so that outstanding messages progress to completion. This is beneficial on systems without a DMA-capable network adapter, e.g., Cray XE6.
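The description implies that the polling thread helps less on systems whose network adapter is DMA-capable. This sketch turns the thread off. Whether that helps is system-dependent and should be measured. With the thread disabled, COMM_THREAD_LOAD has nothing to act on.

```
&DBCSR
  USE_COMM_THREAD F   ! no dedicated MPI polling thread
&END DBCSR
```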

USE_MEMPOOLS_CPU: logical = F

Enable memory pools on the CPU.
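No usage line is given for this keyword. Assuming it behaves like the other logical keywords, it would be enabled as follows.

```
&DBCSR
  USE_MEMPOOLS_CPU T   ! enable CPU memory pools (default F)
&END DBCSR
```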

USE_MPI_ALLOCATOR: logical = F

Usage: use_mpi_allocator T

Use the MPI allocator to allocate the buffers used in MPI communications.

USE_MPI_RMA: logical = F

Usage: use_mpi_rma F

Use MPI remote memory access (RMA) for the communications of each image.
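Both MPI-related switches default to F. This sketch enables them together for illustration only. Whether RMA or the MPI allocator pays off likely depends on the MPI implementation and the network, so benchmark before adopting either.

```
&DBCSR
  USE_MPI_ALLOCATOR T   ! allocate communication buffers with the MPI allocator
  USE_MPI_RMA       T   ! use MPI remote memory access for each image
&END DBCSR
```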