Preparing to expand Cloud Pak for Data System

Deployment options: Netezza Performance Server for Cloud Pak for Data System

This section outlines the preparatory procedures necessary before expansion.

Before you can expand your system, you must:

Complete multiple checks.
Prepare your data.
Ensure that you have enough space for the expansion.
Choose a redistribution method.

Verify Netezza Performance Server and Cloud Pak for Data System versions

Ensure that your systems (Netezza Performance Server and Cloud Pak for Data System) are on the following versions:

For Cloud Pak for Data System: 1.0.8.0 or later.
For Netezza Performance Server: 11.2.1.11 or later (but not 11.2.2.0 or later). This is applicable only for online redistribution.
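
The version constraints above can be expressed as a small check. The following is a minimal sketch, not an IBM-provided tool: the function names are hypothetical, the version strings are passed in as arguments (how you obtain them is left out), and the comparison assumes GNU `sort -V` is available.

```shell
# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Relies on GNU sort -V (version sort); this is an assumption about the host.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# cpds_version_ok VERSION (hypothetical helper):
# Cloud Pak for Data System must be 1.0.8.0 or later.
cpds_version_ok() {
  version_ge "$1" "1.0.8.0"
}

# nps_version_ok_for_online VERSION (hypothetical helper):
# Netezza Performance Server must be 11.2.1.11 or later, but below 11.2.2.0.
nps_version_ok_for_online() {
  version_ge "$1" "11.2.1.11" && ! version_ge "$1" "11.2.2.0"
}
```

For example, `nps_version_ok_for_online 11.2.1.12` succeeds, while `nps_version_ok_for_online 11.2.2.0` fails.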

Pre-expansion checks

Note:

If the system is part of NRS:

Find all the databases that are participating in NRS by running the nzdr list db command.
Remove all those databases from replication by using the nzdr delete db command. For details, see Deleting replication databases.

If the system is part of a replication setup:

Remove the currently replicating database configuration from NRS.
Set it up again from scratch to ensure it works properly with the expanded system.

Check system health.

Complete the checks to ensure that the following conditions are met:

System health is good.
Check whether there are any issues:
```
nzds -issues
```
Check whether the system needs to be rebalanced (which requires downtime) before the expansion:
```
nzds rebalance -check
```
Check AP issues and ensure that there are no open alerts.
```
ap issues
```

Check network health.

Collect logs by running the following command. Ensure that there are no errors in the logs.

apdiag collect --components hw/switch/ network/

Verify that the node model number was set correctly.

Model numbers must be the same for all nodes.

$ ap hw -d | grep -w node | awk -F '|' '{print $4 " "$7}'

Example:

$ ap hw -d | grep -w node | awk -F '|' '{print $4 " "$7}'
enclosure1.node1 7X21CTO1WW
enclosure1.node2 7X21CTO1WW
enclosure1.node3 7X21CTO1WW
enclosure1.node4 7X21CTO1WW
enclosure2.node1 7X21CTO1WW
enclosure2.node2 7X21CTO1WW
enclosure2.node3 7X21CTO1WW
enclosure2.node4 7X21CTO1WW
enclosure3.node1 7X21CTO1WW
enclosure3.node2 7X21CTO1WW
enclosure3.node3 7X21CTO1WW
enclosure3.node4 7X21CTO1WW
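
With many nodes, comparing model numbers by eye is error-prone. The following sketch counts the distinct model numbers in output of the form shown in the example. The function names are hypothetical, and the sketch assumes that each line has the node name in the first field and the model number in the second.

```shell
# distinct_models: reads "<node> <model>" lines on stdin and prints the
# number of distinct model numbers found in the second field.
distinct_models() {
  awk 'NF >= 2 {print $2}' | sort -u | wc -l | tr -d ' '
}

# models_match: succeeds only when exactly one distinct model is present.
models_match() {
  [ "$(distinct_models)" = "1" ]
}

# Assumed usage on a live system, piping in the command from this step:
#   ap hw -d | grep -w node | awk -F '|' '{print $4 " "$7}' | models_match \
#     && echo "All nodes report the same model number"
```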

Manually vacuum Netezza Performance Server a couple of days before the expansion.

Vacuuming the system can shorten the redistribution time and, by extension, the time that is needed to expand the system. Depending on the size of the catalog, the vacuum requires an extra system outage, for example, a 2-hour outage.

Stop Netezza Performance Server:
```
nzstop
```
Run a manual vacuum:
```
/nz/support/bin/nz_manual_vacuum
```
Start Netezza Performance Server:
```
nzstart
```

Check the speed of the disks.

Analyze the command logs to identify low speed disks. If you identify any slow disks, contact the Netezza Performance Server development team.

/nz/support/bin/nz_check_disk_scan_speeds -size 2 -cleanup

After the command finishes, you can see the following information:

Dropping table 'NZ_CHECK_DISK_SCAN_SPEEDS' now that the testing is complete.
DROP TABLE

Note: Certain actions may need to be repeated based on the outcomes of preceding steps. For instance, if an evaluation reveals insufficient disk space to facilitate online redistribution of all tables, consider optimizing disk space by cleaning databases, schemas, and tables, followed by reassessing the available free disk space.

Check RAID consistency

Run the nzraidcheck command two days before the expansion during the system idle time to detect bad pages or disk issues and validate primary and mirror data consistency.

/nz/kit/bin/adm/tools/nzraidcheck -mode checkOnly

Contact IBM if there are any issues.

Unlock read-only databases

The redistribution process needs exclusive write access to all databases. Unlock any locked databases. First run the nz_redr_db_lock_info tool.

/nz/support/bin/nz_redr_db_lock_info -d <directory>

If there are no locked databases, the output shows "No database needs to be unlocked", and no further action is needed.
If there are locked databases, the output shows "Run the following before expansion starts", followed by nzsql commands. Run the first nzsql command to unlock the databases.

Note: When you unlock read-only databases that are part of incremental restores, they lose the ability to continue their incremental restore. It is recommended to drop such databases to reduce redistribution time.
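
If you script this check, the two outcomes described above can be told apart by looking for the message text. This is a sketch only: the function name is hypothetical, and the exact message wording is taken from this page and might differ between tool versions, so unrecognized output is reported as unknown rather than guessed.

```shell
# lock_status: reads nz_redr_db_lock_info output on stdin and prints
#   none    - no database needs to be unlocked
#   locked  - the tool printed nzsql commands to run before expansion
#   unknown - neither message was found; inspect the output manually
lock_status() {
  _lock_out="$(cat)"
  case "$_lock_out" in
    *'No database needs to be unlocked'*)          echo "none" ;;
    *'Run the following before expansion starts'*) echo "locked" ;;
    *)                                             echo "unknown" ;;
  esac
}

# Assumed usage:
#   /nz/support/bin/nz_redr_db_lock_info -d <directory> | lock_status
```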

Clean up unwanted databases, schemas, and tables

Clean up unwanted databases, schemas, and tables (including grooming of tables that have many deleted rows). This reduces disk space consumption and data redistribution time.

Groom versioned tables

Run the following command to groom versioned tables:

/nz/support/bin/nz_altered_tables -groom

Prepare to preserve data order

Prepare to preserve data order two days before expansion. During the expansion process, data is redistributed across data slices. The natural order of data is changed and might impact query performance.

Run the nz_sort_order tool against each database to obtain recommendations for converting tables to CBTs (by adding organizing columns). For example:
```
/nz/support/bin/nz_sort_order <database Name> -recommend yes
```
Save the output files with recommendations that add organizing columns and then groom the tables. The recommendations are needed after the expansion and redistribution.

Note: Preserving data order is not required if you are going from an x to a 2x size system. For example, base+2 to base+4, base+4 to base+8, base+8 to base+16, and base+16 to base+32 do not require this step. However, any other configuration, such as base+2 to base+6, base+4 to base+6, base+8 to base+12, or base+32 to base+48, requires this step.
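
The doubling rule in the note can be written as a one-line test. The sketch below uses a hypothetical function name and reads "x" as the N in base+N, which matches every example listed in the note. Treat that reading as an assumption and confirm with IBM for configurations that the note does not list.

```shell
# sort_order_prep_required FROM TO
# FROM and TO are the N in "base+N" before and after the expansion.
# Succeeds (exit status 0) when the preserve-data-order step is required,
# that is, when TO is not exactly twice FROM.
sort_order_prep_required() {
  [ "$2" -ne $(( $1 * 2 )) ]
}

# Assumed usage:
#   if sort_order_prep_required 4 6; then
#     echo "Run nz_sort_order and save the recommendations"
#   fi
```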

Back up databases

Back up databases before expansion. If you are performing your regular backups, take the final increment before Netezza Performance Server expansion.

Prepare for post-expansion data validation

To capture row counts for post-expansion data validation, run the following command:

nz_db_table_row_count

Tip: Select some reports or queries to run for data validation and performance comparison, and capture the results.
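
One way to use the captured row counts is to save the output before the expansion, capture it again afterward, and compare the two files. The sketch below assumes that the tool lists tables in the same order on both runs. The function name and file names are hypothetical.

```shell
# compare_row_counts BEFORE_FILE AFTER_FILE
# Succeeds when the two captures are identical. Otherwise it prints a
# unified diff of the differences and returns a non-zero status.
compare_row_counts() {
  diff -u "$1" "$2"
}

# Assumed usage (file names are examples only):
#   nz_db_table_row_count > rowcounts_before.txt    # before the expansion
#   nz_db_table_row_count > rowcounts_after.txt     # after redistribution
#   compare_row_counts rowcounts_before.txt rowcounts_after.txt
```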

Ensure sufficient disk space

Note: This step is applicable only if online redistribution is chosen.

Determine the number of data slices that the system will have after expansion. You can obtain this from IBM or by multiplying the number of SPU enclosures by 96. For example, if expanding to a Base+8 system, the new data slice count is 768.
Tip: To calculate new data slice count: number of SPU enclosures * 96.
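
The calculation is simple enough to do by hand, but a short helper avoids slips when you script the space estimate. The function name is hypothetical, and the example assumes that a Base+8 system has 8 SPU enclosures, which is consistent with the 768 figure above.

```shell
# new_data_slice_count ENCLOSURES: prints ENCLOSURES * 96.
new_data_slice_count() {
  echo $(( $1 * 96 ))
}

new_data_slice_count 8   # Base+8 example from the text: prints 768
```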

Run the following command to determine whether all tables can be redistributed by using online redistribution. Use the output when choosing a redistribution method in Redistribution methods.

/nz/support/bin/nz_redistribute -SpaceEstimate <new data slice count>

Output

If there is sufficient disk space to redistribute all tables online:

nz_redistribute -SpaceEstimate <new data slice count>

     # Of Dataslices
     --------------------
        Before Expansion:  120
         After Expansion:  160

     Dataslice Sizing
     --------------------
          Total Capacity:  195.31  GiB
              USED (max):  133.64  GiB
              FREE (min):   61.66  GiB

     Largest Table
     --------------------
                    Name:  SAMPLE_DATABASE.ADMIN.CUSTOMER_ORDERS
      DSlice AVG Storage:   21.07  GiB
      DSlice MAX Storage:   27.95  GiB

     Estimation Summation
     --------------------
     Total space needed (per dataslice) to do the online redistribution:  128.19  GiB

      Should be adequate:   67.12  GiB  to spare

If the disk space is insufficient to redistribute all tables online:

nz_redistribute -SpaceEstimate <new data slice count>

     # Of Dataslices
     --------------------
        Before Expansion:  120
         After Expansion:  160

     Dataslice Sizing
     --------------------
          Total Capacity:  195.31  GiB
              USED (max):  181.02  GiB
              FREE (min):   14.29  GiB

     Largest Table
     --------------------
                    Name:  SAMPLE_DATABASE.ADMIN.WEB_HITS
      DSlice AVG Storage:   50.83  GiB
      DSlice MAX Storage:   61.56  GiB

     Estimation Summation
     --------------------
     Total space needed (per dataslice) to do the online redistribution:  197.33  GiB

         INSUFFICIENT BY:    2.02  GiB

Proceed to Redistribution methods, or try to free up more disk space to cover the INSUFFICIENT BY amount per data slice and then repeat the nz_redistribute -SpaceEstimate command before choosing the redistribution method.
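
If you repeat the estimate several times while freeing space, a small filter can pull the verdict out of the report. The sketch below matches the "Should be adequate" and "INSUFFICIENT BY" lines as they appear in the sample output on this page. The function name is hypothetical, and the report layout might differ in other versions of the tool, so unrecognized output is reported as unknown.

```shell
# space_verdict: reads nz_redistribute -SpaceEstimate output on stdin and
# prints one of:
#   adequate
#   insufficient <amount> <unit>   (for example: insufficient 2.02 GiB)
#   unknown                        (neither summary line was found)
space_verdict() {
  awk '
    /INSUFFICIENT BY:/    { print "insufficient " $(NF-1) " " $NF; found = 1; exit }
    /Should be adequate:/ { print "adequate"; found = 1; exit }
    END { if (!found) print "unknown" }
  '
}

# Assumed usage:
#   /nz/support/bin/nz_redistribute -SpaceEstimate <new data slice count> | space_verdict
```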