DBCSR

References: Borstnik2014, Schuett2016

Configuration options for the DBCSR library.

Subsections

ACC
TENSOR

Keywords

AVG_ELEMENTS_IMAGES
COMM_THREAD_LOAD
MAX_ELEMENTS_PER_BLOCK
MM_DRIVER
MM_STACK_SIZE
MULTREC_LIMIT
NUM_LAYERS_3D
NUM_MULT_IMAGES
N_SIZE_MNK_STACKS
USE_COMM_THREAD
USE_MEMPOOLS_CPU
USE_MPI_ALLOCATOR
USE_MPI_RMA

Keyword descriptions

AVG_ELEMENTS_IMAGES: integer = 0 

Usage: avg_elements_images 10000

Average number of elements (dense limit) for each image, which also corresponds to the average number of elements exchanged between MPI processes during the operations. A negative or zero value means unlimited.

COMM_THREAD_LOAD: integer = -1 

Usage: comm_thread_load 50

If a communication thread is used, specify the percentage of the multiplication workload that the thread should perform in addition to its communication tasks. A negative value leaves the decision up to DBCSR.

MAX_ELEMENTS_PER_BLOCK: integer = 32 

Usage: MAX_ELEMENTS_PER_BLOCK 32

Default block size used when converting dense matrices into blocked ones.

MM_DRIVER: enum = AUTO 

Usage: mm_driver blas

Valid values:

AUTO Automatically choose the best available driver
BLAS BLAS (requires the BLAS library at link time)
MATMUL Fortran MATMUL
SMM Library optimised for Small Matrix Multiplies (requires the SMM library at link time)
XSMM LIBXSMM

References: Heinecke2016

Select the preferred backend for matrix block multiplications on the host.
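As a minimal sketch, the fragment below selects the BLAS backend instead of the default AUTO. It assumes the DBCSR section is nested inside the GLOBAL section of the input file, which this page does not state; adjust the enclosing section if your input layout differs.

```
&GLOBAL
  &DBCSR
    ! Assumption: DBCSR lives under GLOBAL.
    ! Prefer BLAS for host-side block multiplications (default is AUTO).
    MM_DRIVER BLAS
  &END DBCSR
&END GLOBAL
```

BLAS and SMM are only usable if the corresponding library was linked when the executable was built, so AUTO remains the safer choice when the build configuration is unknown.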

MM_STACK_SIZE: integer = -1 

Usage: mm_stack_size 1000

Size of multiplication parameter stack. A negative value leaves the decision up to DBCSR.

MULTREC_LIMIT: integer = 512 : Recursion limit of the cache-oblivious multrec algorithm.

NUM_LAYERS_3D: integer = 1 

Usage: num_layers_3D 1

Number of layers for the 3D multiplication algorithm.

NUM_MULT_IMAGES: integer = 1 

Usage: num_mult_images 2

Multiplicative factor for number of virtual images.

N_SIZE_MNK_STACKS: integer = 3 

Usage: n_size_mnk_stacks 2

Number of stacks to use for distinct atomic sizes (e.g., 2 for a system of mostly waters).
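The water example in the description could be written as the following sketch. The nesting under GLOBAL is an assumption, as is the reading that the two stacks correspond to the O and H block sizes.

```
&GLOBAL
  &DBCSR
    ! Assumption: DBCSR lives under GLOBAL.
    ! Two stacks for the two dominant atomic block sizes
    ! (assumed here to be those of O and H); default is 3.
    N_SIZE_MNK_STACKS 2
  &END DBCSR
&END GLOBAL
```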

USE_COMM_THREAD: logical = T 

Usage: use_comm_thread T

During multiplication, use a thread to periodically poll MPI to progress outstanding message completions. This is beneficial on systems without a DMA-capable network adapter, e.g., Cray XE6.
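USE_COMM_THREAD interacts with COMM_THREAD_LOAD, described earlier on this page. The sketch below enables the communication thread and asks it to take on 50% multiplication workload, reusing the value from the COMM_THREAD_LOAD usage line. The placement of DBCSR under GLOBAL is an assumption, and the value 50 is illustrative rather than a tuning recommendation.

```
&GLOBAL
  &DBCSR
    ! Assumption: DBCSR lives under GLOBAL.
    ! Poll MPI from a dedicated thread during multiplication (default is T).
    USE_COMM_THREAD T
    ! Multiplication workload (%) for that thread; -1 lets DBCSR decide.
    COMM_THREAD_LOAD 50
  &END DBCSR
&END GLOBAL
```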

USE_MEMPOOLS_CPU: logical = F : Enable memory pools on the CPU.

USE_MPI_ALLOCATOR: logical = F 

Usage: use_mpi_allocator T

Use the MPI allocator to allocate buffers used in MPI communications.

USE_MPI_RMA: logical = F 

Usage: use_mpi_rma F

Use RMA (remote memory access) for MPI communications.