References: Borstnik2014, Schuett2016
Configuration options for the DBCSR library.
Keyword descriptions
- AVG_ELEMENTS_IMAGES: integer = 0
Usage: avg_elements_images 10000
Average number of elements (dense limit) for each image, which also corresponds to the average number of elements exchanged between MPI processes during the operations. A negative or zero value means unlimited.
- COMM_THREAD_LOAD: integer = -1
Usage: comm_thread_load 50
If a communications thread is used, specify how much multiplication workload (%) the thread should perform in addition to communication tasks. A negative value leaves the decision up to DBCSR.
- MAX_ELEMENTS_PER_BLOCK: integer = 32
Default block size for turning dense matrices into blocked ones.
- MM_DRIVER: enum = AUTO
Usage: mm_driver blas
Valid values:
AUTO: Choose automatically the best available driver
BLAS: BLAS (requires the BLAS library at link time)
MATMUL
SMM: Library optimised for Small Matrix Multiplies (requires the SMM library at link time)
XSMM
References: Heinecke2016
Select the preferred backend for matrix block multiplications on the host.
- MM_STACK_SIZE: integer = -1
Usage: mm_stack_size 1000
Size of multiplication parameter stack. A negative value leaves the decision up to DBCSR.
- MULTREC_LIMIT: integer = 512
Recursion limit of the cache-oblivious multrec algorithm.
- NUM_LAYERS_3D: integer = 1
Usage: num_layers_3D 1
Number of layers for the 3D multiplication algorithm.
- NUM_MULT_IMAGES: integer = 1
Usage: num_mult_images 2
Multiplicative factor for number of virtual images.
- N_SIZE_MNK_STACKS: integer = 3
Usage: n_size_mnk_stacks 2
Number of stacks to use for distinct atomic sizes (e.g., 2 for a system of mostly waters).
- USE_COMM_THREAD: logical = T
Usage: use_comm_thread T
During multiplication, use a thread to periodically poll MPI to progress outstanding message completions. This is beneficial on systems without a DMA-capable network adapter, e.g. Cray XE6.
- USE_MEMPOOLS_CPU: logical = F
Enable memory pools on the CPU.
- USE_MPI_ALLOCATOR: logical = F
Usage: use_mpi_allocator T
Use the MPI allocator to allocate buffers used in MPI communications.
- USE_MPI_RMA: logical = F
Usage: use_mpi_rma F
Use RMA for MPI communications for each image, which also corresponds to the number of elements exchanged between MPI processes during the operations.
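
To show how several of these keywords could be combined, here is a minimal sketch of an input fragment. It assumes that the DBCSR section is nested inside the GLOBAL section of a CP2K input file, which this page does not state. The values are copied from the Usage lines above purely for illustration and are not tuning recommendations.

```
! Hypothetical fragment: the enclosing &GLOBAL section is an assumption.
&GLOBAL
  &DBCSR
    ! Prefer the BLAS backend for host-side block multiplications
    MM_DRIVER BLAS
    ! Fix the parameter stack size instead of letting DBCSR decide
    MM_STACK_SIZE 1000
    ! Two stacks, e.g. for a system of mostly waters
    N_SIZE_MNK_STACKS 2
    ! Poll MPI from a dedicated thread that also takes 50% multiplication workload
    USE_COMM_THREAD T
    COMM_THREAD_LOAD 50
  &END DBCSR
&END GLOBAL
```

Any keyword omitted from the section keeps the default listed in its entry above, so a fragment like this only needs to name the settings being changed.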