CP_FM_GEMM

Section can be repeated.

Benchmark and test the cp_fm_gemm routines by multiplying C=A*B

Keywords

FORCE_BLOCKSIZE
GRID_2D
K
M
N
NCOL_BLOCK
NROW_BLOCK
N_LOOP
ROW_MAJOR
TRANSA
TRANSB
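
As a rough illustration of how these keywords fit together, the sketch below renders a `&CP_FM_GEMM` block from a Python dictionary. The helper name `render_cp_fm_gemm` and the chosen values are hypothetical and only serve as an example. Where the block has to be placed in a complete input file is not covered here.

```python
def render_cp_fm_gemm(settings):
    """Render a &CP_FM_GEMM input block from a dict of keyword -> value.

    Hypothetical helper for illustration; it does not validate keyword names.
    """
    lines = ["&CP_FM_GEMM"]
    for name, value in settings.items():
        if isinstance(value, bool):
            # Logical keywords are written as T or F.
            text = "T" if value else "F"
        elif isinstance(value, (list, tuple)):
            # Multi-valued keywords such as GRID_2D take space-separated values.
            text = " ".join(str(v) for v in value)
        else:
            text = str(value)
        lines.append(f"  {name.upper()} {text}")
    lines.append("&END CP_FM_GEMM")
    return "\n".join(lines)


block = render_cp_fm_gemm(
    {"K": 1024, "M": 512, "N": 1024, "GRID_2D": (4, 2), "TRANSA": True, "N_LOOP": 20}
)
print(block)
```

Since the section can be repeated, several such blocks with different sizes could be generated in a loop to benchmark a range of matrix shapes in one run.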

Keyword descriptions

FORCE_BLOCKSIZE: logical = F 

Lone keyword: T

Usage: FORCE_BLOCKSIZE

Forces the requested block size, even if this leaves a few processes without data.
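
To see why a forced block size can leave processes empty, the following sketch counts how many rows each process row owns in a block-cyclic distribution. It follows the logic of ScaLAPACK's NUMROC routine; the function name `local_count` and the example numbers (256 rows, 8 process rows) are assumptions made for illustration, not values taken from the test itself.

```python
def local_count(n, block, iproc, nprocs, src=0):
    """Rows (or columns) owned by process `iproc` in a block-cyclic layout.

    Mirrors the logic of ScaLAPACK's NUMROC: `n` elements are cut into
    blocks of size `block` and dealt out round-robin to `nprocs` processes,
    starting at process `src`.
    """
    dist = (nprocs + iproc - src) % nprocs  # distance from the source process
    nblocks = n // block                    # number of complete blocks
    count = (nblocks // nprocs) * block     # blocks every process receives
    extra = nblocks % nprocs                # leftover complete blocks
    if dist < extra:
        count += block                      # this process gets one more full block
    elif dist == extra:
        count += n % block                  # this process gets the partial block
    return count


# 256 rows on 8 process rows: block size 64 yields only 4 blocks.
rows_64 = [local_count(256, 64, p, 8) for p in range(8)]
rows_32 = [local_count(256, 32, p, 8) for p in range(8)]
print(rows_64)  # [64, 64, 64, 64, 0, 0, 0, 0]
print(rows_32)  # [32, 32, 32, 32, 32, 32, 32, 32]
```

With a block size of 64 there are fewer blocks than process rows, so half of the process rows hold no data. A block size of 32 distributes the same matrix evenly.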

GRID_2D: integer[2] = 1 1 

Usage: GRID_2D 64 16

Explicitly set the BLACS 2D processor layout. If the product of the two values differs from the number of MPI ranks, the setting is ignored and a default, nearly square layout is used.
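
The acceptance rule can be sketched as follows. The check on the product follows the description above. The fallback, which picks the largest divisor of the rank count that does not exceed its square root, is only one plausible way to obtain a nearly square grid and is an assumption; the layout CP2K actually chooses may differ, including in which dimension is larger. The name `choose_grid` is hypothetical.

```python
import math


def choose_grid(requested, nranks):
    """Return the (rows, cols) process grid for `nranks` MPI ranks.

    The requested grid is used only if rows * cols equals nranks.
    """
    nprow, npcol = requested
    if nprow * npcol == nranks:
        return (nprow, npcol)
    # Assumed fallback: the most nearly square factorization of nranks.
    nprow = math.isqrt(nranks)
    while nranks % nprow != 0:
        nprow -= 1
    return (nprow, nranks // nprow)


print(choose_grid((64, 16), 1024))  # (64, 16): product matches, request is kept
print(choose_grid((4, 4), 12))      # (3, 4): product is 16, request is ignored
```

Note that the default value `1 1` only matches a run on a single rank, so in parallel runs the default layout applies unless GRID_2D is set to a matching pair.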

K: integer = 256 

Usage: K 1024

Dimension 1 (number of rows) of C.

M: integer = 256 

Usage: M 1024

Inner dimension of the multiplication (the dimension summed over).

N: integer = 256 

Usage: N 1024

Dimension 2 (number of columns) of C.

NCOL_BLOCK: integer = 32 

Usage: NCOL_BLOCK 64

Block size for columns.

NROW_BLOCK: integer = 32 

Usage: NROW_BLOCK 64

Block size for rows.

N_LOOP: integer = 10 

Usage: N_LOOP 10

Number of cp_fm_gemm operations being timed (useful for small matrices).
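
When comparing runs, it can help to convert a measured time into a throughput figure. The sketch below uses the conventional count of 2*K*M*N floating-point operations per real matrix multiplication and assumes the elapsed time covers all N_LOOP repetitions. The function name `gemm_gflops` is hypothetical, and how the test itself reports its timings is not described here.

```python
def gemm_gflops(k, m, n, n_loop, elapsed_seconds):
    """Estimate GFLOP/s for n_loop multiplications of a (k x m) by (m x n) product.

    Uses the conventional 2*k*m*n operation count (one multiply and one add
    per inner-product term) for real matrices.
    """
    flops = 2.0 * k * m * n * n_loop
    return flops / elapsed_seconds / 1e9


# Default sizes (256 each) and N_LOOP 10, with a made-up elapsed time of 0.05 s.
rate = gemm_gflops(256, 256, 256, 10, 0.05)
print(f"{rate:.2f} GFLOP/s")
```

For small matrices a single multiplication may be too short to time reliably, which is why a larger N_LOOP is useful there.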

ROW_MAJOR: logical = T 

Lone keyword: T

Usage: ROW_MAJOR .FALSE.

Use a row-major BLACS grid.

TRANSA: logical = F 

Lone keyword: T

Usage: TRANSA

Transpose matrix A

TRANSB: logical = F 

Lone keyword: T

Usage: TRANSB

Transpose matrix B
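
The matrix shapes implied by K, M, N, TRANSA and TRANSB can be summarised in a short sketch. It takes C as K x N with inner dimension M, as stated in the keyword descriptions, and assumes the usual GEMM convention that a transposed operand is stored with its two dimensions swapped. That storage convention is an assumption about this test, and the name `gemm_shapes` is hypothetical.

```python
def gemm_shapes(k, m, n, transa=False, transb=False):
    """Return the stored (rows, cols) of A, B and C for C = op(A) * op(B).

    op(A) is k x m and op(B) is m x n, so C is k x n. A transposed operand
    is assumed to be stored with swapped dimensions.
    """
    a = (m, k) if transa else (k, m)
    b = (n, m) if transb else (m, n)
    c = (k, n)
    return {"A": a, "B": b, "C": c}


print(gemm_shapes(1024, 512, 256))
print(gemm_shapes(1024, 512, 256, transa=True))
```

Note that the roles of the letters differ from the common BLAS naming, where M and N are the dimensions of C and K is the inner dimension.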