Reordering data after migration

Deployment options: Netezza Performance Server for Cloud Pak for Data System

Learn how to order data after you migrated from PureData System for Analytics to Netezza Performance Server.

The PureData System for Analytics (TwinFin, Mako, Striper) and Netezza Performance Server boxes have different data slice configurations. To cut down on the amount of data that needed scanning, your PureData System for Analytcis might have relied on well ordered tables and zonemaps. However, when you migrate data from one generation of a system to another generation, the migrated data is not going to be migrated on the same data slices, nor in the same order. This includes nzrestore, nz_restore, nz_migrate and other migration commands.

Note that when data is moved, the natural order of the data is disrupted. It is important that the natural order of the migrated data is restored once the migration process is completed. This ensures that you don't encounter any issues after migration.

Checking data ordering

To check whether or not a query is (or isn't) making use of zonemaps, you can look at the *.pln file.

Detail 98620_21 table 158475/159304.76/160935 scan 18/19.75/23

This is an example of a good use of zonemaps. Only about 1/1000th of the whole data had to be scanned for this table/query.

Detail 2730534_81 table 33122/34768.89/37139 scan 30619/32226.61/34278

For this table/query, a full table scan had to be performed.

So this will tell you everything you need to know about THIS query. But if you want to compare "apples to oranges", then you really need a copy of that other plan as well. To see how much data it had to scan.

Data reordering

For this, select large tables, the ones with more than 10 million rows. The larger the table, the more important this is if you want to be able to reduce the amount of data that has to be scanned for any given query.

CBT tables

If a table already has an ORGANIZE ON clause, run the GROOM command:
groom table <tableName> records all;
Non-CBT tables

You might be already sorting your table as part of the ETL processing and you know what columns work best for you for organizing the data.

If not, you can run the nz_sort_order script against the original source table. The script looks at the existing zonemap information for the table and tries to compute a % sorted for each of the columns. That information can help you choose the proper sort key.

Once you have decided on how to best sort the data, perform the following steps to reorder the data:
  1. Run the ALTER TABLE and ORGANIZE ON commands.

    As a result, your table is now recognized as a clustered base table.

  2. Run the GROOM command:
    groom table <tableName> records all;

    This command performs the actual sort/reorganization of the data in the table.

Alternatively, you can create a copy of the table and sort it on the fly by running the CREATE TABLE AS and ORDER BY commands. For more information about CTAS, see CREATE TABLE AS.