DBCSR
References: Borstnik2014, Schuett2016
Configuration options for the DBCSR library.
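For orientation, a minimal sketch of how this section is placed in a CP2K input file (assuming the usual &GLOBAL nesting; the values shown are illustrative, not recommendations):

    &GLOBAL
      &DBCSR
        ! all keywords in this section are optional
        MM_DRIVER AUTO
        USE_COMM_THREAD T
      &END DBCSR
    &END GLOBAL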
Keyword descriptions
- AVG_ELEMENTS_IMAGES: integer = 0
Usage: avg_elements_images 10000
Average number of elements (dense limit) for each image, which also corresponds to the average number of elements exchanged between MPI processes during the operations. A negative or zero value means unlimited.
- COMM_THREAD_LOAD: integer = -1
Usage: comm_thread_load 50
If a communications thread is used, specify how much multiplication workload (%) the thread should perform in addition to communication tasks. A negative value leaves the decision up to DBCSR.
- MAX_ELEMENTS_PER_BLOCK: integer = 32
Usage: MAX_ELEMENTS_PER_BLOCK 32
Default block size for converting dense matrices into blocked ones.
- MM_DRIVER: enum = AUTO
Usage: mm_driver blas
Valid values:
- AUTO: Choose automatically the best available driver
- BLAS: BLAS (requires the BLAS library at link time)
- MATMUL: Fortran MATMUL
- SMM: Library optimised for Small Matrix Multiplies (requires the SMM library at link time)
- XSMM: LIBXSMM
References: Heinecke2016
Select the preferred backend for matrix block multiplications on the host.
- MM_STACK_SIZE: integer = -1
Usage: mm_stack_size 1000
Size of multiplication parameter stack. A negative value leaves the decision up to DBCSR.
- MULTREC_LIMIT: integer = 512
Recursion limit of the cache-oblivious multrec algorithm.
- NUM_LAYERS_3D: integer = 1
Usage: num_layers_3D 1
Number of layers for the 3D multiplication algorithm.
- NUM_MULT_IMAGES: integer = 1
Usage: num_mult_images 2
Multiplicative factor for number of virtual images.
- N_SIZE_MNK_STACKS: integer = 3
Usage: n_size_mnk_stacks 2
Number of stacks to use for distinct atomic sizes (e.g., 2 for a system consisting mostly of water molecules).
- USE_COMM_THREAD: logical = T
Usage: use_comm_thread T
During multiplication, use a thread to periodically poll MPI to progress outstanding message completions. This is beneficial on systems without a DMA-capable network adapter, e.g., Cray XE6.
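COMM_THREAD_LOAD only takes effect when the communication thread is enabled, so the two keywords are typically set together. A minimal sketch (the 50% share is an illustrative value, not a recommendation):

    &GLOBAL
      &DBCSR
        USE_COMM_THREAD T
        ! comm thread also performs ~50% of a regular thread's multiplication workload
        COMM_THREAD_LOAD 50
      &END DBCSR
    &END GLOBAL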
- USE_MEMPOOLS_CPU: logical = F
Enable memory pools on the CPU.
- USE_MPI_ALLOCATOR: logical = F
Usage: use_mpi_allocator T
Use MPI allocator to allocate buffers used in MPI communications.
- USE_MPI_RMA: logical = F
Usage: use_mpi_rma F
Use RMA (one-sided remote memory access) for MPI communications.
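Putting several of the keywords above together, a sketch of a performance-oriented &DBCSR block (the values are illustrative assumptions; good settings depend on the machine and on the matrix block structure):

    &GLOBAL
      &DBCSR
        ! requires the SMM library at link time (see MM_DRIVER above)
        MM_DRIVER SMM
        ! cap the average dense-limit element count per image; <= 0 means unlimited
        AVG_ELEMENTS_IMAGES 10000
        ! negative value leaves the stack size decision to DBCSR
        MM_STACK_SIZE -1
        ! use one-sided MPI communication
        USE_MPI_RMA T
      &END DBCSR
    &END GLOBAL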