SGEMM, DGEMM, CGEMM, and ZGEMM (Combined Matrix Multiplication and Addition for General Matrices, Their Transposes, or Conjugate Transposes)

Purpose

SGEMM and DGEMM can perform any one of the following combined matrix computations, using scalars α and β, matrices A and B or their transposes, and matrix C:

SGEMM and DGEMM Combined Matrix Computations
C ← αAB + βC      C ← αABᵀ + βC
C ← αAᵀB + βC     C ← αAᵀBᵀ + βC

CGEMM and ZGEMM can perform any one of the following combined matrix computations, using scalars α and β, matrices A and B, their transposes or their conjugate transposes, and matrix C:

CGEMM and ZGEMM Combined Matrix Computations
C ← αAB + βC      C ← αABᵀ + βC      C ← αABᴴ + βC
C ← αAᵀB + βC     C ← αAᵀBᵀ + βC     C ← αAᵀBᴴ + βC
C ← αAᴴB + βC     C ← αAᴴBᵀ + βC     C ← αAᴴBᴴ + βC

Table 1. Data Types
A, B, C, α, β              Subroutine
Short-precision real       SGEMM
Long-precision real        DGEMM
Short-precision complex    CGEMM
Long-precision complex     ZGEMM

Note: On certain processors, SIMD algorithms may be used if alignment requirements are met. For further details, see Use of SIMD Algorithms by Some Subroutines in the Libraries Provided by ESSL.

Syntax

Language     Syntax
Fortran      CALL SGEMM | DGEMM | CGEMM | ZGEMM (transa, transb, l, n, m, alpha, a, lda, b, ldb, beta, c, ldc)
C and C++    sgemm | dgemm | cgemm | zgemm (transa, transb, l, n, m, alpha, a, lda, b, ldb, beta, c, ldc);
CBLAS        cblas_sgemm | cblas_dgemm | cblas_cgemm | cblas_zgemm (cblas_layout, cblas_transa, cblas_transb, l, n, m, alpha, a, lda, b, ldb, beta, c, ldc);

On Entry
cblas_layout
indicates whether the input and output matrices are stored in row major order or column major order, where:
  • If cblas_layout = CblasRowMajor, the matrices are stored in row major order.
  • If cblas_layout = CblasColMajor, the matrices are stored in column major order.

Specified as: an object of enumerated type CBLAS_LAYOUT. It must be CblasRowMajor or CblasColMajor.
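
As a minimal sketch of what the two storage orders mean for addressing, the following C helper computes the linear offset of element (i, j) in an array with leading dimension ld. The enum and function names are hypothetical and are not part of ESSL or CBLAS. Indices are assumed to be zero-based, and ld is assumed to be the distance between consecutive columns (column major) or consecutive rows (row major).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_LAYOUT; not the ESSL/CBLAS type. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

/* Offset of element (i, j), zero-based, in an array with leading dimension ld. */
static size_t elem_offset(sketch_layout layout, size_t i, size_t j, size_t ld)
{
    assert(ld > 0);
    return (layout == SKETCH_COL_MAJOR) ? i + j * ld   /* columns are contiguous */
                                        : i * ld + j;  /* rows are contiguous */
}
```

With ld = 8, element (2, 1) sits at offset 10 in column major order and at offset 17 in row major order.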

transa
indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', Aᵀ is used in the computation.

If transa = 'C', Aᴴ is used in the computation.

Specified as: a single character; transa = 'N', 'T', or 'C'.

cblas_transa
indicates the form of matrix A to use in the computation, where:

If cblas_transa = CblasNoTrans, A is used in the computation.

If cblas_transa = CblasTrans, Aᵀ is used in the computation.

If cblas_transa = CblasConjTrans, Aᴴ is used in the computation.

Specified as: an object of enumerated type CBLAS_TRANSPOSE. It must be CblasNoTrans, CblasTrans, or CblasConjTrans.

transb
indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', Bᵀ is used in the computation.

If transb = 'C', Bᴴ is used in the computation.

Specified as: a single character; transb = 'N', 'T', or 'C'.

cblas_transb
indicates the form of matrix B to use in the computation, where:

If cblas_transb = CblasNoTrans, B is used in the computation.

If cblas_transb = CblasTrans, Bᵀ is used in the computation.

If cblas_transb = CblasConjTrans, Bᴴ is used in the computation.

Specified as: an object of enumerated type CBLAS_TRANSPOSE. It must be CblasNoTrans, CblasTrans, or CblasConjTrans.

l
is the number of rows in matrix C.

Specified as: an integer; 0 ≤ l ≤ ldc.

n
is the number of columns in matrix C.

Specified as: an integer; n ≥ 0.

m
has the following meaning, where:

If transa = 'N', it is the number of columns in matrix A.

If transa = 'T' or 'C', it is the number of rows in matrix A.

In addition:

If transb = 'N', it is the number of rows in matrix B.

If transb = 'T' or 'C', it is the number of columns in matrix B.

Specified as: an integer; m ≥ 0.

alpha
is the scalar α.
Specified as: a number of the data type indicated in Table 1.
a
is the matrix A, where:

If transa = 'N', A is used in the computation, and A has l rows and m columns.

If transa = 'T', Aᵀ is used in the computation, and A has m rows and l columns.

If transa = 'C', Aᴴ is used in the computation, and A has m rows and l columns.
Note: No data should be moved to form Aᵀ or Aᴴ; that is, the matrix A should always be stored in its untransposed form.
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 1, where:

If transa = 'N', its size must be lda by (at least) m.

If transa = 'T' or 'C', its size must be lda by (at least) l.

lda
is the leading dimension of the array specified for a.

Specified as: an integer; lda > 0 and:

If transa = 'N', lda ≥ l.

If transa = 'T' or 'C', lda ≥ m.
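
The role of the leading dimension can be sketched in C as follows. The helper below is hypothetical (it is not an ESSL routine) and assumes column-major storage with zero-based indexing. It copies an l by m matrix, listed row by row as it is printed in the examples, into an array whose leading dimension lda may exceed l. Only the first l entries of each column hold matrix elements; the remaining lda − l entries are padding that the helper leaves untouched.

```c
#include <assert.h>

/* Hypothetical helper (not an ESSL routine): copy an l by m matrix, given
   row by row, into a column-major array with leading dimension lda.
   Entries in rows l..lda-1 of each column are left untouched. */
static void pack_colmajor(int l, int m, const float *rows, float *a, int lda)
{
    assert(lda > 0 && lda >= l);      /* the constraint for transa = 'N' */
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < l; ++i)
            a[i + j * lda] = rows[i * m + j];
}
```

For a 3 by 2 matrix stored with lda = 4, element (i, j) lands at a[i + 4*j], so a[3] and a[7] are never written.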

b
is the matrix B, where:

If transb = 'N', B is used in the computation, and B has m rows and n columns.

If transb = 'T', Bᵀ is used in the computation, and B has n rows and m columns.

If transb = 'C', Bᴴ is used in the computation, and B has n rows and m columns.
Note: No data should be moved to form Bᵀ or Bᴴ; that is, the matrix B should always be stored in its untransposed form.
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 1, where:

If transb = 'N', its size must be ldb by (at least) n.

If transb = 'T' or 'C', its size must be ldb by (at least) m.

ldb
is the leading dimension of the array specified for b.

Specified as: an integer; ldb > 0 and:

If transb = 'N', ldb ≥ m.

If transb = 'T' or 'C', ldb ≥ n.

beta
is the scalar β.
Specified as: a number of the data type indicated in Table 1.
c
is the l by n matrix C.
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 1.
ldc
is the leading dimension of the array specified for c.

Specified as: an integer; ldc > 0 and ldc ≥ l.

On Return
c
is the l by n matrix C, containing the results of the computation. Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 1.

Notes

  1. All subroutines accept lowercase letters for the transa and transb arguments.
  2. For SGEMM and DGEMM, if you specify 'C' for the transa or transb argument, it is interpreted as though you specified 'T'.
  3. Matrix C must have no common elements with matrices A or B; otherwise, results are unpredictable. See Vector concepts.

Function

The combined matrix addition and multiplication is expressed as follows, where a_ik, b_kj, and c_ij are elements of matrices A, B, and C, respectively:

    c_ij ← α (Σ_{k=1}^{m} a_ik b_kj) + β c_ij,    for i = 1, ..., l and j = 1, ..., n

See references [42] and [48]. In the following three cases, no computation is performed:
  • l is 0.
  • n is 0.
  • β is 1 and α is 0.

Assuming the above conditions do not exist, if β ≠ 1 and m is 0, then βC is returned.
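
The computation and its special cases can be sketched as a plain triple loop in C. This is only an illustrative reference under stated assumptions, not ESSL's implementation: the name ref_sgemm is hypothetical, storage is assumed to be column major with zero-based indexing, only short-precision real data is handled, and skipping the read of C when β is 0 is an assumption made here so that C need not be initialized in that case.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical reference routine (not ESSL's implementation):
   C <- alpha*op(A)*op(B) + beta*C, column major, short-precision real. */
static void ref_sgemm(char transa, char transb, int l, int n, int m,
                      float alpha, const float *a, int lda,
                      const float *b, int ldb,
                      float beta, float *c, int ldc)
{
    /* For real data 'C' behaves like 'T'; lowercase letters are accepted. */
    int ta = toupper((unsigned char)transa) != 'N';
    int tb = toupper((unsigned char)transb) != 'N';

    assert(l >= 0 && n >= 0 && m >= 0 && ldc > 0 && ldc >= l);

    if (l == 0 || n == 0 || (alpha == 0.0f && beta == 1.0f))
        return;                        /* the three no-computation cases */

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < l; ++i) {
            float sum = 0.0f;          /* stays 0 when m is 0, so beta*C results */
            for (int k = 0; k < m; ++k) {
                float aik = ta ? a[k + i * lda] : a[i + k * lda];
                float bkj = tb ? b[j + k * ldb] : b[k + j * ldb];
                sum += aik * bkj;
            }
            /* Assumption: when beta is 0, the old C is not read at all. */
            float old = (beta == 0.0f) ? 0.0f : beta * c[i + j * ldc];
            c[i + j * ldc] = alpha * sum + old;
        }
    }
}
```

The transposed forms are obtained purely by swapping the index roles when reading a and b, which is why the arrays themselves always hold A and B in untransposed form.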

Special Usage

Equivalence Rules
The equivalence rules, defined for matrix multiplication of A and B in Special Usage, also apply to the matrix multiplication part of the computation performed by this subroutine. Use the equivalence rules when you want to transpose or conjugate transpose the multiplication part of the computation. When coding the calling sequences for these cases, code your matrix arguments and dimension arguments in the order indicated by the rule, and make sure that your input and output array C has dimensions large enough to hold the resulting matrix. See Example 4.
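
One such rule, the identity (XYᴴ)ᴴ = YXᴴ, can be checked numerically with the following C sketch. Both helpers are hypothetical and are not ESSL routines; they assume column-major storage, zero-based indexing, and C99 complex arithmetic from <complex.h>.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper (not an ESSL routine): C <- X * Y^H, column major,
   where X is p by m and Y is q by m, so C is p by q. */
static void mul_xyh(int p, int q, int m,
                    const float complex *x, int ldx,
                    const float complex *y, int ldy,
                    float complex *c, int ldc)
{
    assert(ldx >= p && ldy >= q && ldc >= p);
    for (int j = 0; j < q; ++j)
        for (int i = 0; i < p; ++i) {
            float complex sum = 0.0f;
            for (int k = 0; k < m; ++k)   /* (Y^H)(k,j) = conj(Y(j,k)) */
                sum += x[i + k * ldx] * conjf(y[j + k * ldy]);
            c[i + j * ldc] = sum;
        }
}

/* Returns 1 if the n by n matrix Q equals the conjugate transpose of P. */
static int is_conj_transpose(int n, const float complex *p, int ldp,
                             const float complex *q, int ldq)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            if (q[i + j * ldq] != conjf(p[j + i * ldp]))
                return 0;
    return 1;
}
```

Computing P = ABᴴ and Q = BAᴴ with mul_xyh (swapping the matrix arguments and their leading dimensions together) and then calling is_conj_transpose on the pair illustrates why the operands and dimension arguments must be reordered as a unit.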

Error conditions

Resource Errors
Unable to allocate internal work area.
Computational Errors
None
Input-Argument Errors
  1. cblas_layout ≠ CblasRowMajor or CblasColMajor
  2. lda, ldb, ldc ≤ 0
  3. l, m, n < 0
  4. l > ldc
  5. transa, transb ≠ 'N', 'T', or 'C'
  6. transa = 'N' and l > lda
  7. transa = 'T' or 'C' and m > lda
  8. cblas_transa ≠ CblasNoTrans, CblasTrans, or CblasConjTrans
  9. cblas_transa = CblasNoTrans and l > lda
  10. cblas_transa = CblasTrans or CblasConjTrans and m > lda
  11. transb = 'N' and m > ldb
  12. transb = 'T' or 'C' and n > ldb
  13. cblas_transb ≠ CblasNoTrans, CblasTrans, or CblasConjTrans
  14. cblas_transb = CblasNoTrans and m > ldb
  15. cblas_transb = CblasTrans or CblasConjTrans and n > ldb
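
The conditions that apply to the transa/transb (non-CBLAS) interface can be sketched as a pre-call check in C. The function below is hypothetical and is not part of ESSL. It returns 0 when all checks pass, otherwise the number of the first failing condition in the list above. The order in which the checks are made is an assumption; the list does not say in which order the library tests them.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical checker (not part of ESSL): returns 0 if the arguments pass,
   otherwise the list number of the first failing condition. */
static int check_gemm_args(char transa, char transb, int l, int n, int m,
                           int lda, int ldb, int ldc)
{
    int ta = toupper((unsigned char)transa);
    int tb = toupper((unsigned char)transb);

    if (lda <= 0 || ldb <= 0 || ldc <= 0)       return 2;
    if (l < 0 || m < 0 || n < 0)                return 3;
    if (l > ldc)                                return 4;
    if ((ta != 'N' && ta != 'T' && ta != 'C') ||
        (tb != 'N' && tb != 'T' && tb != 'C'))  return 5;
    if (ta == 'N' && l > lda)                   return 6;
    if (ta != 'N' && m > lda)                   return 7;
    if (tb == 'N' && m > ldb)                   return 11;
    if (tb != 'N' && n > ldb)                   return 12;
    return 0;
}

/* Debug-build guard: stop early instead of passing bad arguments on. */
#define ASSERT_GEMM_ARGS(...) assert(check_gemm_args(__VA_ARGS__) == 0)
```

For instance, the argument values of Example 1 ('N', 'N', l = 6, n = 4, m = 5, lda = 8, ldb = 6, ldc = 7) pass every check, whereas reducing lda to 5 trips condition 6.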

Examples

Example 1

This example shows the computation C ← αAB + βC, where A, B, and C are contained in larger arrays A, B, and C, respectively.

Call Statement and Input:
           TRANSA TRANSB  L   N   M  ALPHA  A  LDA  B  LDB  BETA  C  LDC
             |      |     |   |   |    |    |   |   |   |    |    |   |
CALL SGEMM( 'N'  , 'N'  , 6 , 4 , 5 , 1.0 , A , 8 , B , 6 , 2.0 , C , 7 )
                                       
        |  1.0   2.0  -1.0  -1.0   4.0 |
        |  2.0   0.0   1.0   1.0  -1.0 |
        |  1.0  -1.0  -1.0   1.0   2.0 |
A    =  | -3.0   2.0   2.0   2.0   0.0 |
        |  4.0   0.0  -2.0   1.0  -1.0 |
        | -1.0  -1.0   1.0  -3.0   2.0 |
        |   .     .     .     .     .  |
        |   .     .     .     .     .  |
                                       
                                 
        |  1.0  -1.0   0.0   2.0 |
        |  2.0   2.0  -1.0  -2.0 |
B    =  |  1.0   0.0  -1.0   1.0 |
        | -3.0  -1.0   1.0  -1.0 |
        |  4.0   2.0  -1.0   1.0 |
        |   .     .     .     .  |
                                 
                             
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
C    =  | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        |  .    .    .    .  |
                             
Output:
                                 
        | 24.0  13.0  -5.0   3.0 |
        | -3.0  -4.0   2.0   4.0 |
        |  4.0   1.0   2.0   5.0 |
C    =  | -2.0   6.0  -1.0  -9.0 |
        | -4.0  -6.0   5.0   5.0 |
        | 16.0   7.0  -4.0   7.0 |
        |   .     .     .     .  |
                                 
Example 2

This example shows the computation C ← αABᵀ + βC, where A and C are contained in larger arrays A and C, respectively, and B is the same size as array B in which it is contained.

Call Statement and Input:
           TRANSA TRANSB  L   N   M  ALPHA  A  LDA  B  LDB  BETA  C  LDC
             |      |     |   |   |    |    |   |   |   |    |    |   |
CALL SGEMM( 'N'  , 'T'  , 3 , 3 , 2 , 1.0 , A , 4 , B , 3 , 2.0 , C , 5 )
                    
        | 1.0  -3.0 |
A    =  | 2.0   4.0 |
        | 1.0  -1.0 |
        |  .     .  |
                    
                    
        | 1.0  -3.0 |
B    =  | 2.0   4.0 |
        | 1.0  -1.0 |
                    
                        
        | 0.5  0.5  0.5 |
        | 0.5  0.5  0.5 |
C    =  | 0.5  0.5  0.5 |
        |  .    .    .  |
        |  .    .    .  |
                        
Output:
                           
        | 11.0  -9.0   5.0 |
        | -9.0  21.0  -1.0 |
C    =  |  5.0  -1.0   3.0 |
        |   .     .     .  |
        |   .     .     .  |
                           
Example 3

This example shows the computation C ← αAB + βC using complex data, where A, B, and C are contained in larger arrays A, B, and C, respectively.

Call Statement and Input:
           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'N'  , 'N'  , 6 , 2 , 3 , ALPHA , A , 8 , B , 4 , BETA , C , 8 )
ALPHA    =  (1.0, 0.0)
BETA     =  (2.0, 0.0)
 
                                             
        | (1.0, 5.0)  (9.0, 2.0)  (1.0, 9.0) |
        | (2.0, 4.0)  (8.0, 3.0)  (1.0, 8.0) |
        | (3.0, 3.0)  (7.0, 5.0)  (1.0, 7.0) |
A    =  | (4.0, 2.0)  (4.0, 7.0)  (1.0, 5.0) |
        | (5.0, 1.0)  (5.0, 1.0)  (1.0, 6.0) |
        | (6.0, 6.0)  (3.0, 6.0)  (1.0, 4.0) |
        |     .           .           .      |
        |     .           .           .      |
                                             
                                 
        | (1.0, 8.0)  (2.0, 7.0) |
B    =  | (4.0, 4.0)  (6.0, 8.0) |
        | (6.0, 2.0)  (4.0, 5.0) |
        |     .           .      |
                                 
                                 
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
C    =  | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        |     .           .      |
        |     .           .      |
                                 
Output:
                                         
        | (-22.0, 113.0)  (-35.0, 142.0) |
        | (-19.0, 114.0)  (-35.0, 141.0) |
        | (-20.0, 119.0)  (-43.0, 146.0) |
C    =  | (-27.0, 110.0)  (-58.0, 131.0) |
        |   (8.0, 103.0)    (0.0, 112.0) |
        | (-55.0, 116.0)  (-75.0, 135.0) |
        |       .               .        |
        |       .               .        |
                                         
Example 4

This example shows how to obtain the conjugate transpose of ABᴴ:

    (ABᴴ)ᴴ = BAᴴ

This shows the conjugate transpose of the computation performed in Example 8 for CGEMUL, which uses the following calling sequence:
CALL CGEMUL( A , 4 , 'N' , B , 3 , 'C' , C , 4 , 3 , 2 , 3 )

You instead code the calling sequence for C ← βC + αBAᴴ, where β = 0, α = 1, and the array C has the correct dimensions to receive the transposed matrix. Because β is zero, βC = 0. For a description of all the matrix identities, see Special Usage.

Call Statement and Input:
           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'N'  , 'C'  , 3 , 3 , 2 , ALPHA , B , 3 , A , 4 , BETA , C , 4 )
ALPHA    =  (1.0, 0.0)
BETA     =  (0.0, 0.0)
 
                                  
        | (1.0, 3.0)  (-3.0, 2.0) |
B    =  | (2.0, 5.0)   (4.0, 6.0) |
        | (1.0, 1.0)  (-1.0, 9.0) |
                                  
                                  
        | (1.0, 2.0)  (-3.0, 2.0) |
A    =  | (2.0, 6.0)   (4.0, 5.0) |
        | (1.0, 2.0)  (-1.0, 8.0) |
        |     .            .      |
                                  
C        =(not relevant)
Output:
                                                     
        | (20.0,   1.0)  (18.0, 23.0)  (26.0,  23.0) |
C    =  | (12.0, -25.0)  (80.0,  2.0)  (56.0, -37.0) |
        | (24.0, -26.0)  (49.0, 37.0)  (76.0,  -2.0) |
        |      .              .             .        |
                                                     
Example 5

This example shows the computation C ← αAᵀBᴴ + βC using complex data, where A, B, and C are the same size as the arrays A, B, and C in which they are contained. Because β is zero, βC = 0. (Based on the dimensions of the matrices, A is actually a column vector, and C is actually a row vector.)

Call Statement and Input:
           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'T'  , 'C'  , 1 , 3 , 3 , ALPHA , A , 3 , B , 3 , BETA , C , 1 )
ALPHA    =  (1.0, 1.0)
BETA     =  (0.0, 0.0)
 
                      
        | (1.0,  2.0) |
A    =  | (2.0,  5.0) |
        | (1.0,  6.0) |
                      
                                               
        | (1.0, 6.0)  (-3.0, 4.0)   (2.0, 6.0) |
B    =  | (2.0, 3.0)   (4.0, 6.0)   (0.0, 3.0) |
        | (1.0, 3.0)  (-1.0, 6.0)  (-1.0, 9.0) |
                                               
C        =(not relevant)
Output:
                                                  
C    =  | (86.0, 44.0) (58.0, 70.0) (121.0, 55.0) |