PDGETRI and PZGETRI — General Matrix Inverse

Purpose

PDGETRI and PZGETRI compute the inverse of general matrix A. These subroutines use the results of the factorization of matrix A, produced by a preceding call to PDGETRF or PZGETRF, respectively. For details on the factorization, see PDGETRF and PZGETRF — General Matrix Factorization.

If n = 0, no computation is performed and the subroutine returns after doing some parameter checking.

Table 1. Data Types
Data Types
A, work	*ipvt*, iwork	Subroutine
Long-precision real	Integer	PDGETRI
Long-precision complex	Integer	PZGETRI

Syntax

Language	Syntax
Fortran	CALL PDGETRI \| PZGETRI (n, a, ia, ja, desc_a, ipvt, work, lwork, iwork, liwork, info)
C and C++	pdgetri \| pzgetri (n, a, ia, ja, desc_a, ipvt, work, lwork, iwork, liwork, info);

On Entry

n

is the order of the factored submatrix A.

Scope: global

Specified as: a fullword integer; n ≥ 0.

a

is the local part of the global general matrix A, containing the factorization of matrix A produced by a preceding call to PDGETRF or PZGETRF, respectively. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+n-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 1. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 ≤ ia ≤ M_A and ia+n-1 ≤ M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 ≤ ja ≤ N_A and ja+n-1 ≤ N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

desc_a	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If n = 0: M_A ≥ 0 Otherwise: M_A ≥ 1	Global
4	N_A	Number of columns in the global matrix	If n = 0: N_A ≥ 0 Otherwise: N_A ≥ 1	Global
5	MB_A	Row block size	MB_A ≥ 1	Global
6	NB_A	Column block size	NB_A ≥ 1	Global
7	RSRC_A	The process row of the p × q grid over which the first row of the global matrix is distributed	0 ≤ RSRC_A < p	Global
8	CSRC_A	The process column of the p × q grid over which the first column of the global matrix is distributed	0 ≤ CSRC_A < q	Global
9	LLD_A	The leading dimension of the local array	LLD_A ≥ max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

ipvt

is the local part of the global vector ipvt, containing the pivoting indices produced on a preceding call to PDGETRF or PZGETRF, respectively. This identifies the first element of the local array IPVT. This subroutine computes the location of the first element of the local subarray used, based on ia, desc_a, p, and myrow; therefore, the leading LOCp(ia+n-1) part of the local array IPVT must contain the local pieces of the leading ia+n-1 part of the global vector.

A copy of the vector ipvt, with a block size of MB_A and global index ia, is contained in each column of the process grid. The process row over which the first row of ipvt is distributed is RSRC_A.

Scope: local

Specified as: an array of (at least) length LOCp(ia+n-1), containing fullword integers, where ia ≤ (pivoting index values) ≤ ia+n-1. Details about the block-cyclic data distribution of global vector ipvt are stored in desc_a.

work

has the following meaning:

If lwork = 0, work is ignored.

If lwork ≠ 0, work is a work area used by this subroutine, where:

If lwork ≠ -1, then its size is (at least) of length lwork.
If lwork = -1, then its size is (at least) of length 1.

Scope: local

Specified as: an area of storage containing numbers of data type indicated in Table 1.

lwork

is the number of elements in array WORK.

Scope:

If lwork ≥ 0, lwork is local.
If lwork = -1, lwork is global.

Specified as: a fullword integer; where:

If lwork = 0, PDGETRI and PZGETRI dynamically allocate the work area used by the subroutine. The work area is deallocated before control is returned to the calling program. This option is an extension to the ScaLAPACK standard.
If lwork = -1, PDGETRI and PZGETRI perform a work area query and return the minimum size of work in work₁. No computation is performed and the subroutine returns after error checking is complete.
Otherwise, it must have the following value:
lwork = NUMROC(n + iroff, MB_A, myrow, iarow, nprow) * NB_A
where:
- iroff = mod(ia-1, MB_A)
- iarow = mod(RSRC_A + (ia-1)/MB_A, nprow)

iwork

has the following meaning:

If liwork = 0, iwork is ignored.

If liwork ≠ 0, iwork is a work area used by this subroutine, where:

If liwork ≠ -1, then its size is (at least) of length liwork.
If liwork = -1, then its size is (at least) of length 1.

Scope: local

Specified as: an area of storage containing fullword integers.

liwork

is the number of elements in array IWORK.

Scope:

If liwork ≥ 0, liwork is local.
If liwork = -1, liwork is global.

Specified as: a fullword integer; where:

If liwork = 0, PDGETRI and PZGETRI dynamically allocate the work area used by the subroutine. The work area is deallocated before control is returned to the calling program. This option is an extension to the ScaLAPACK standard.
If liwork = -1, PDGETRI and PZGETRI perform a work area query and return the minimum size of iwork in iwork₁. No computation is performed and the subroutine returns after error checking is complete.
Otherwise, use the following rules to determine the value to specify:
If nprow = npcol, then liwork = nq + NB_A

If nprow ≠ npcol, then liwork = nq + max(liwork1, liwork2, NB_A)
where:
- liwork1 = MB_A * iceil(iceil(mp, MB_A), lcm/nprow)
- liwork2 = NB_A * iceil(iceil(mq, NB_A), lcm/npcol)
- mp = NUMROC(M_A, MB_A, myrow, RSRC_A, nprow)
- mq = NUMROC(M_A, MB_A, mycol, CSRC_A, npcol)
- nq = NUMROC(N_A, NB_A, mycol, CSRC_A, npcol)
- lcm = ilcm(nprow, npcol)

info

See On Return.

On Return

a

is the updated local part of the global general matrix A, containing the inverse of matrix A.

Scope: local

Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 1.

work

is the work area used by this subroutine if lwork ≠ 0, where:

If lwork ≠ 0 and lwork ≠ -1, its size is (at least) of length lwork.

If lwork = -1, its size is (at least) of length 1.

Scope: local

Returned as: an area of storage, containing numbers of the data type indicated in Table 1, where:

If lwork ≥ 1 or lwork = -1, then work₁ is set to the minimum lwork value needed. Except for work₁, the contents of work are overwritten on return.

iwork

is the work area used by this subroutine if liwork ≠ 0, where:

If liwork ≠ 0 and liwork ≠ -1, then its size is (at least) of length liwork.

If liwork = -1, then its size is (at least) of length 1.

Scope: local

Returned as: an area of storage, where:

If liwork ≥ 1 or liwork = -1, then iwork₁ is set to the minimum liwork value and contains numbers of the data type indicated in Table 1. Except for iwork₁, the contents of iwork are overwritten on return.

info

has the following meaning:

If info = 0, global submatrix A is not singular, and the inverse completed normally.

If info > 0, global submatrix A is singular and the inverse could not be computed. For info = k, the corresponding diagonal element of U_{k, k} is exactly zero.

Scope: global

Returned as: a fullword integer; info ≥ 0.

Notes and Coding Rules

In your C program, argument info must be passed by reference.
The matrix and vector must have no common elements; otherwise, results are unpredictable.
The scalar data specified for input argument n must be the same for both PDGETRF/PZGETRF and PDGETRI/PZGETRI. In addition, the scalar data specified for input argument m in PDGETRF/PZGETRF must be the same as input argument n in both PDGETRF/PZGETRF and PDGETRI/PZGETRI.
If, however, you do not plan to call PDGETRI/PZGETRI after calling PDGETRF/PZGETRF, then input arguments m and n in PDGETRF/PZGETRF do not need to be equal.
The global submatrices for A and ipvt input to PDGETRI/PZGETRI must be the same as for the corresponding output arguments for PDGETRF/PZGETRF; and thus, the scalar data specified for ia, ja, and the contents of desc_a must also be the same.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument previous descriptions. For details, see Determining the number of rows and columns in local arrays and NUMROC — Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see Coding tips for optimizing parallel performance.
On both input and output, matrix A conforms to ScaLAPACK format.
The global general matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
The global general matrix A must be aligned on a block row boundary; that is, ia-1 must be a multiple of MB_A.
The block row offset of A must be equal to the block column offset of A; that is, mod(ia-1, MB_A) = mod(ja-1, NB_A).
There is no array descriptor for ipvt. It is a column-distributed vector with block size MB_A, local arrays of dimension LOCp(ia+n-1) by 1, and global index ia. A copy of this vector exists on each column of the process grid, and the process row over which the first column of ipvt is distributed is RSRC_A.

Error conditions

Computational Errors

Matrix A is a singular matrix. For details, see the description of the info argument.

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

This subroutine was called from outside the process grid.

Stage 4

n < 0
M_A < 0 and n = 0; M_A < 1 otherwise
N_A < 0 and n = 0; N_A < 1 otherwise
ia < 1
ja < 1
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A ≥ p
CSRC_A < 0 or CSRC_A ≥ q

Stage 5

If n ≠ 0:

ia > M_A
ja > N_A
ia+n-1 > M_A
ja+n-1 > N_A

In all cases:
MB_A ≠ NB_A
mod(ia-1, MB_A) ≠ mod(ja-1, NB_A)
mod(ia-1, MB_A) ≠ 0

Stage 6

LLD_A < max(1, LOCp(M_A))
lwork ≠ 0, lwork ≠ -1, and lwork < (minimum value). (For the minimum value, see the lwork argument description.)
liwork ≠ 0, liwork ≠ -1, and liwork < (minimum value). (For the minimum value, see the liwork argument description.)

Stage 7

Each of the following global input arguments are checked to determine whether its value differs from the value specified on process P₀₀:

n differs.
ia differs.
ja differs.
DTYPE_A differs.
M_A differs.
N_A differs.
MB_A differs.
NB_A differs.
RSRC_A differs.
CSRC_A differs.

Also:
lwork = -1 on a subset of processes.
liwork = -1 on a subset of processes.

Examples

Example 1

This example computes the inverse of a real matrix using the LU factorization computed by PDGETRF. The input ipvt vector and transformed matrix A are the output from PDGETRF, Example 1.

Note:

Because lwork = 0 and liwork = 0, PDGETRI dynamically allocates the work areas used by this subroutine.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
               N   A   IA  JA  DESC_A  IPVT  WORK  LWORK  IWORK LIWORK INFO
               |   |   |    |   |        |    |      |      |     |      |
CALL PDGETRI(  9 , A , 1 ,  1 ,DESC_A ,IPVT, WORK ,  0  , IWORK , 0  , INFO )

	Desc_A
DTYPE_	1
CTXT_	icontxt¹
M_	9
N_	9
MB_	3
NB_	3
RSRC_	1
CSRC_	0
LLD_	See ²

Note:

icontxt is the output of the BLACS_GRIDINIT call.
Each process should set the LLD_ as follows:
```
LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
```
In this example, LLD_A = 3 on P₀₀ and P₀₁, and LLD_A = 6 on P₁₀ and P₁₁.

Output:

Global general 9 × 9 inverted matrix A with block size 3 × 3:

B,D          0                  1                  2
                                                           
     | -2.4  2.5  0.0  |   0.0  0.0  0.0  |   0.0  0.0  0.1 |
 0   |  2.5 -5.0  2.5  |   0.0  0.0  0.0  |   0.0  0.0  0.0 |
     |  0.0  2.5 -5.0  |   2.5  0.0  0.0  |   0.0  0.0  0.0 |
     | ----------------|------------------|---------------- |
     |  0.0  0.0  2.5  |  -5.0  2.5  0.0  |   0.0  0.0  0.0 |
 1   |  0.0  0.0  0.0  |   2.5 -5.0  2.5  |   0.0  0.0  0.0 |
     |  0.0  0.0  0.0  |   0.0  2.5 -5.0  |   2.5  0.0  0.0 |
     | ----------------|------------------|---------------- |
     |  0.0  0.0  0.0  |   0.0  0.0  2.5  |  -5.0  2.5  0.0 |
 2   |  0.0  0.0  0.0  |   0.0  0.0  0.0  |   2.5 -5.0  2.5 |
     |  0.1  0.0  0.0  |   0.0  0.0  0.0  |   0.0  2.5 -2.4 |

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-----
1    |   P₀₀   |  P₀₁
-----|-------|-----
0    |   P₁₀   |  P₁₁
2    |       |

Note: The first row of A begins in the second row of the process grid.

Local arrays for A:

p,q  |               0                |        1
-----|--------------------------------|-----------------
     |  0.0  0.0  2.5  0.0  0.0  0.0  |  -5.0  2.5  0.0
 0   |  0.0  0.0  0.0  0.0  0.0  0.0  |   2.5 -5.0  2.5
     |  0.0  0.0  0.0  2.5  0.0  0.0  |   0.0  2.5 -5.0
-----|--------------------------------|-----------------
     | -2.4  2.5  0.0  0.0  0.0  0.1  |   0.0  0.0  0.0
     |  2.5 -5.0  2.5  0.0  0.0  0.0  |   0.0  0.0  0.0
     |  0.0  2.5 -5.0  0.0  0.0  0.0  |   2.5  0.0  0.0
 1   |  0.0  0.0  0.0 -5.0  2.5  0.0  |   0.0  0.0  2.5
     |  0.0  0.0  0.0  2.5 -5.0  2.5  |   0.0  0.0  0.0
     |  0.1  0.0  0.0  0.0  2.5 -2.4  |   0.0  0.0  0.0

The value of info is 0 on all processes.

Example 2

This example computes the inverse of a complex matrix using the LU factorization computed by PZGETRF. The input ipvt vector and transformed matrix A are the output from PZGETRF, Example 2.

Note:

Because lwork = 0 and liwork = 0, PZGETRI dynamically allocates the work areas used by this subroutine.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
               N   A   IA  JA  DESC_A  IPVT  WORK  LWORK  IWORK LIWORK INFO
               |   |   |    |   |        |    |      |      |     |      |
CALL PZGETRI(  9 , A , 1 ,  1 ,DESC_A ,IPVT ,WORK ,  0  , IWORK , 0  , INFO )

	Desc_A
DTYPE_	1
CTXT_	icontxt¹
M_	9
N_	9
MB_	3
NB_	3
RSRC_	1
CSRC_	0
LLD_A	See ²

Note:

icontxt is the output of the BLACS_GRIDINIT call.
Each process should set the LLD_ as follows:
```
LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
```
In this example, LLD_A = 3 on P₀₀ and P₀₁, and LLD_A = 6 on P₁₀ and P₁₁.

Output:

Global general 9 × 9 inverted complex matrix A with block size 3 × 3:

B,D                     0                                     1                                     2
                                                                                                                     
     |(-.17,-.42) (-.12, .13) (-.06, .15)  | (.00, .15)  (.05, .13)  (.09, .09)  | (.11, .05)  (.11, .00)  (.04,-.28) |
 0   | (.18, .43) (-.04,-.55) (-.06,-.03)  |(-.06,-.01) (-.05, .01) (-.04, .03)  |(-.03, .04) (-.01, .04)  (.11, .00) |
     | (.01, .00)  (.18, .43) (-.04,-.55)  |(-.06,-.03) (-.06,-.01) (-.05, .01)  |(-.04, .03) (-.03, .04)  (.11, .05) |
     | ------------------------------------|-------------------------------------|----------------------------------- |
     | (.01, .00)  (.01, .00)  (.18, .43)  |(-.04,-.55) (-.06,-.03) (-.06,-.01)  |(-.05, .01) (-.04, .03)  (.09, .09) |
 1   | (.00, .01)  (.01, .00)  (.01, .00)  | (.18, .43) (-.04,-.55) (-.06,-.03)  |(-.06,-.01) (-.05, .01)  (.05, .13) |
     | (.00, .01)  (.00, .01)  (.01, .00)  | (.01, .00)  (.18, .43) (-.04,-.55)  |(-.06,-.03) (-.06,-.01)  (.00, .15) |
     | ------------------------------------|-------------------------------------|----------------------------------- |
     | (.00, .01)  (.00, .01)  (.00, .01)  | (.01, .00)  (.01, .00)  (.18, .43)  |(-.04,-.55) (-.06,-.03) (-.06, .15) |
 2   | (.00, .01)  (.00, .01)  (.00, .01)  | (.00, .01)  (.01, .00)  (.01, .00)  | (.18, .43) (-.04,-.55) (-.12, .13) |
     |(-.01, .01)  (.00, .01)  (.00, .01)  | (.00, .01)  (.00, .01)  (.01, .00)  | (.01, .00)  (.18, .43) (-.17,-.42) |

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-----
1    |   P₀₀   |  P₀₁
-----|-------|-----
0    |   P₁₀   |  P₁₁
2    |       |

Note: The first row of A begins in the second row of the process grid.

Local arrays for A:

p,q  |                                   0                                     |                   1
-----|-------------------------------------------------------------------------|-----------------------------------
     | (.01, .00)  (.01, .00)  (.18, .43) (-.05, .01) (-.04, .03)  (.09, .09)  |(-.04,-.55) (-.06,-.03) (-.06,-.01)
 0   | (.00, .01)  (.01, .00)  (.01, .00) (-.06,-.01) (-.05, .01)  (.05, .13)  | (.18, .43) (-.04,-.55) (-.06,-.03)
     | (.00, .01)  (.00, .01)  (.01, .00) (-.06,-.03) (-.06,-.01)  (.00, .15)  | (.01, .00)  (.18, .43) (-.04,-.55)
-----|-------------------------------------------------------------------------|-----------------------------------
     |(-.17,-.42) (-.12, .13) (-.06, .15)  (.11, .05)  (.11, .00)  (.04,-.28)  | (.00, .15)  (.05, .13)  (.09, .09)
     | (.18, .43) (-.04,-.55) (-.06,-.03) (-.03, .04) (-.01, .04)  (.11, .00)  |(-.06,-.01) (-.05, .01) (-.04, .03)
     | (.01, .00)  (.18, .43) (-.04,-.55) (-.04, .03) (-.03, .04)  (.11, .05)  |(-.06,-.03) (-.06,-.01) (-.05, .01)
 1   | (.00, .01)  (.00, .01)  (.00, .01) (-.04,-.55) (-.06,-.03) (-.06, .15)  | (.01, .00)  (.01, .00)  (.18, .43)
     | (.00, .01)  (.00, .01)  (.00, .01)  (.18, .43) (-.04,-.55) (-.12, .13)  | (.00, .01)  (.01, .00)  (.01, .00)
     |(-.01, .01)  (.00, .01)  (.00, .01)  (.01, .00)  (.18, .43) (-.17,-.42)  | (.00, .01)  (.00, .01)  (.01, .00)

The value of info is 0 on all processes.