Understanding nzMatrix

nzMatrix Layered Architecture

nzMatrix has a layered architecture, consisting of an NZPLSQL layer with its API and a C++ layer, which is not exposed as an API. The C++ layer performs many of its functions by calling the Intel MKL library, including PBLAS and ScaLAPACK routines. PBLAS and ScaLAPACK are proven libraries of highly scalable distributed matrix routines that rely on BLACS and MPI for interprocess communication. For more information on UDAPs, refer to the User-Defined Analytic Process Developer's Guide.

Matrix Storage and Access

The Netezza Matrix Engine stores data in the currently connected database using standard Netezza database tables whose names begin with the prefix NZ_MAT_. As a best practice, set privileges to hide these tables from direct access by standard users. The standard Netezza Analytics privilege-granting scripts, create_inza_db.sh and create_inza_db_user.sh, are used to modify the permissions. Although elevated privileges allow direct access to nzMatrix tables, direct access is not recommended because it could result in data corruption or in code that is incompatible with future versions of nzMatrix.

The create_inza_db_user.sh script grants users the necessary privileges to execute the nzMatrix stored procedures registered in the NZM database. It is recommended that you create additional databases for connection purposes, rather than using the NZM database.

The nzMatrix NZPLSQL stored procedures allow interaction with matrices. nzMatrix provides routines for creating matrices from and saving matrices to user-accessible Netezza database tables in Row/Column/Value format. nzMatrix also provides routines for listing and deleting matrices stored in the database as well as for creating identity matrices, random matrices, and others.
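To illustrate the Row/Column/Value layout, the following minimal Python sketch converts a small dense matrix into (row, column, value) triples and back. It is not part of nzMatrix; the function names are hypothetical, and filling unlisted cells with 0.0 is an assumption of this sketch rather than documented nzMatrix behavior. Indices start at 1, as nzMatrix requires.

```python
def dense_to_rcv(matrix):
    """Convert a list-of-lists matrix into (row, col, value) triples.

    Row and column indices are 1-based, matching the nzMatrix rule
    that a zero index is not permitted.
    """
    triples = []
    for i, row in enumerate(matrix, start=1):
        for j, value in enumerate(row, start=1):
            triples.append((i, j, float(value)))
    return triples


def rcv_to_dense(triples, n_rows, n_cols):
    """Rebuild a dense matrix from (row, col, value) triples.

    Cells with no triple are left at 0.0 (an assumption of this sketch).
    """
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, value in triples:
        dense[i - 1][j - 1] = value  # shift 1-based indices to 0-based
    return dense


example = [[1.0, 2.0], [3.0, 4.0]]
triples = dense_to_rcv(example)
```

Each triple corresponds to one row of a three-column database table, which is why this layout maps naturally onto ordinary Netezza tables.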

When a given database is shared, the matrix namespace is also shared. Therefore, if a matrix is created while connected to a shared database, that matrix is accessible to any user who has the proper privileges to work in the shared database. This differs from tables, where, by default, users do not have visibility into other users' tables. Note that you cannot create a new matrix with the same name in the shared database without first deleting the previously created matrix.

nzMatrix General Limitations

nzMatrix uses the 64-bit double precision floating point approximate numeric data type for storage and computation of matrix element values. Row and column indices are stored as 32-bit integer values, allowing up to 2,147,483,647 rows and columns. Row and column indices begin at 1; a zero index value is not permitted.
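The index rule can be expressed as a simple range check. The sketch below is illustrative only; the helper name is hypothetical and nzMatrix performs its own validation.

```python
# Largest value of a signed 32-bit integer: 2,147,483,647.
MAX_INDEX = 2**31 - 1


def is_valid_index(index):
    """Return True if index is a legal nzMatrix row or column index.

    Legal indices are integers from 1 through MAX_INDEX inclusive.
    """
    return isinstance(index, int) and 1 <= index <= MAX_INDEX
```

A check like this is useful before loading Row/Column/Value data that was produced by a tool using 0-based indices, because such data must be shifted by one first.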

nzMatrix computations are automatically distributed across the Netezza appliance. Computations that use PBLAS and ScaLAPACK are automatically queued in first-in, first-out (FIFO) order and run one at a time. While a computational step that uses PBLAS or ScaLAPACK is running, any other user's attempt to start such a step is queued.
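The queuing behavior can be pictured with a toy model. This Python sketch only mimics the first-in, first-out ordering described above; the class and job names are hypothetical and do not reflect how the engine is implemented.

```python
from collections import deque


class FifoEngine:
    """Toy model of a queue that runs one job at a time, in arrival order."""

    def __init__(self):
        self._queue = deque()
        self.completed = []

    def submit(self, job_name):
        # New jobs always join the back of the queue.
        self._queue.append(job_name)

    def run_all(self):
        # Jobs leave from the front, so the earliest submission runs first.
        while self._queue:
            job = self._queue.popleft()
            self.completed.append(job)


engine = FifoEngine()
for name in ["user_a_multiply", "user_b_svd", "user_c_inverse"]:
    engine.submit(name)
engine.run_all()
```

In this model a long-running job at the front delays every job behind it, which is the practical consequence to keep in mind when several users share an appliance.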

To check which tasks are queued, if any, use the following SQL command:

CALL NZA..SP_MPI_STATS();

To abort a task, use the following SQL command, replacing the Job Task ID as needed:

CALL NZM..KILL_ENGINE(<job_task_ID>);

Calculations using PBLAS or ScaLAPACK consume S-Blade RAM for temporarily holding input matrices, intermediate work matrices, and result matrices. Each matrix element consumes 8 bytes. Exceeding available RAM may result in aborted computations and an "Out of memory" user error. Available RAM equals total RAM minus the RAM requirements of the Linux operating system, the Netezza software, and concurrent, unrelated queries.

For example, an IBM Netezza 1000-12 with 12 S-Blades and 16 GB of RAM per S-Blade, performing the matrix multiplication of two 65,000 x 65,000 element matrices, consumes its available RAM and takes approximately 1 hour. Performing the Singular Value Decomposition of a 45,000 x 45,000 element matrix also consumes the available RAM and takes approximately 7 hours. These are maximum limits and assume minimal concurrent queries.
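The raw storage behind the multiplication example can be estimated from the 8-bytes-per-element rule. The following Python sketch is a rough lower bound under stated assumptions: it counts only the two inputs and the result, ignores intermediate work matrices and the RAM used by the operating system and Netezza software, and treats 1 GB as 2^30 bytes.

```python
BYTES_PER_ELEMENT = 8  # 64-bit double precision


def matrix_bytes(rows, cols):
    """Bytes needed to hold the elements of a dense rows x cols matrix."""
    return rows * cols * BYTES_PER_ELEMENT


n = 65_000
one_matrix = matrix_bytes(n, n)

# Two input matrices plus one result matrix; work matrices are not counted.
multiply_bytes = 3 * one_matrix

# 12 S-Blades with 16 GB each (assuming 1 GB = 2**30 bytes).
total_ram = 12 * 16 * 2**30

fraction_of_total = multiply_bytes / total_ram
print(f"one matrix: {one_matrix / 1e9:.1f} GB (decimal)")
print(f"inputs + result: {multiply_bytes / 1e9:.1f} GB (decimal)")
print(f"share of total RAM: {fraction_of_total:.0%}")
```

Element storage alone accounts for roughly half of the total installed RAM in this estimate, before any work matrices or system overhead are considered.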

Scalability

Memory requirements and computation time grow rapidly as matrix dimensions increase. The amount of memory required to hold a matrix with n rows and n columns is proportional to n squared, and the number of operations required to multiply, invert, or decompose such matrices is proportional to approximately n cubed.
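As a worked illustration of these growth rates, the sketch below computes how much memory and operation count grow when a square matrix's dimension changes. The function name is hypothetical, and the cubic factor is the approximate rate stated above, not an exact operation count for any particular routine.

```python
def scaling_factors(n_old, n_new):
    """Return (memory_factor, operations_factor) for growing n_old to n_new.

    Memory scales with n squared; multiply, invert, and decompose
    operations scale approximately with n cubed.
    """
    ratio = n_new / n_old
    return ratio**2, ratio**3


memory_factor, operations_factor = scaling_factors(1000, 2000)
```

Doubling the dimension therefore quadruples the memory needed and multiplies the work by about eight, so a modest increase in matrix size can move a job from feasible to out of memory.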

Random Number Generators

Netezza Analytics provides a set of wrappers for the Intel Math Kernel Library random number generators (RNGs). The SQL API provides a set of stored procedures that generate matrices filled with pseudorandom values drawn from a given probability distribution. See the Netezza Matrix Engine Reference Guide for a description of each stored procedure. See the Intel Math Kernel Library Vector Statistical Library Notes for more information on the RNGs.

Bulk Algorithms

From Version 3, IBM Netezza Analytics provides bulk algorithms that you can use to work on many smaller matrices in parallel.

The bulk algorithms are:

Bulk matrix operations
Bulk linear regression
Bulk principal components analysis (PCA)

For detailed information about bulk algorithms, see the IBM Netezza In-Database Analytics Developer's Guide.