DBCSR
References: Borstnik2014, Schuett2016
Configuration options for the DBCSR library. A combined example input section is given after the keyword list below.
Keyword descriptions
- AVG_ELEMENTS_IMAGES: integer = 0 
 Usage: avg_elements_images 10000
Average number of elements (dense limit) for each image, which also corresponds to the average number of elements exchanged between MPI processes during the operations. A negative or zero value means unlimited.
- COMM_THREAD_LOAD: integer = -1 
 Usage: comm_thread_load 50
If a communications thread is used, specify how much multiplication workload (%) the thread should perform in addition to communication tasks. A negative value leaves the decision up to DBCSR.
- MAX_ELEMENTS_PER_BLOCK: integer = 32 
 Usage: MAX_ELEMENTS_PER_BLOCK 32
Default block size used when converting dense matrices into blocked ones.
- MM_DRIVER: enum = AUTO 
 Usage: mm_driver blas
Valid values:
  - AUTO: Choose automatically the best available driver
  - BLAS: BLAS (requires the BLAS library at link time)
  - MATMUL: Fortran MATMUL
  - SMM: Library optimised for Small Matrix Multiplies (requires the SMM library at link time)
  - XSMM: LIBXSMM
References: Heinecke2016
Select the preferred backend for matrix block multiplications on the host.
- MM_STACK_SIZE: integer = -1 
 Usage: mm_stack_size 1000
Size of multiplication parameter stack. A negative value leaves the decision up to DBCSR.
- MULTREC_LIMIT: integer = 512 
Recursion limit of the cache-oblivious multrec algorithm.
- NUM_LAYERS_3D: integer = 1 
 Usage: num_layers_3D 1
Number of layers for the 3D multiplication algorithm.
- NUM_MULT_IMAGES: integer = 1 
 Usage: num_mult_images 2
Multiplicative factor for the number of virtual images.
- N_SIZE_MNK_STACKS: integer = 3 
 Usage: n_size_mnk_stacks 2
Number of stacks to use for distinct atomic sizes (e.g., 2 for a system consisting mostly of water molecules).
- USE_COMM_THREAD: logical = T 
 Usage: use_comm_thread T
During multiplication, use a thread to periodically poll MPI in order to progress outstanding message completions. This is beneficial on systems without a DMA-capable network adapter, e.g. Cray XE6.
- USE_MEMPOOLS_CPU: logical = F 
Enable memory pools on the CPU.
- USE_MPI_ALLOCATOR: logical = F 
 Usage: use_mpi_allocator T
Use the MPI allocator to allocate buffers used in MPI communications.
- USE_MPI_RMA: logical = F 
 Usage: use_mpi_rma F
Use MPI remote memory access (RMA, one-sided communication) for data exchange between processes during multiplications.
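Taken together, these keywords are set in a &DBCSR input section. The following is a minimal sketch of such a section, assuming the usual placement inside &GLOBAL of a CP2K input file; the values are copied from the Usage examples above and are illustrative starting points rather than tuned recommendations, and the surrounding PROJECT and RUN_TYPE keywords are generic, hypothetical context rather than part of this section.

```
&GLOBAL
  PROJECT dbcsr_example          ! hypothetical project name, not part of the DBCSR section
  RUN_TYPE ENERGY
  &DBCSR
    MM_DRIVER BLAS               ! prefer the BLAS backend for block multiplications
    MM_STACK_SIZE 1000           ! explicit size of the multiplication parameter stack
    AVG_ELEMENTS_IMAGES 10000    ! cap the average number of elements per image
    USE_COMM_THREAD T            ! poll MPI from a dedicated communication thread
    COMM_THREAD_LOAD 50          ! that thread also takes 50% multiplication workload
  &END DBCSR
&END GLOBAL
```

Keywords left unset keep the defaults listed above; in particular, leaving MM_STACK_SIZE and COMM_THREAD_LOAD at their negative defaults hands those decisions back to DBCSR.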