Data set configuration

Learn how to configure the data sets to be analyzed.

Sample data sets

The Netezza Performance Server Analytics document set uses standard data-mining data sets to show how various functions and stored procedures behave in normal operation. The data sets also illustrate how the components of the product might be used in real-world scenarios.

The sample data sets used by the documentation are not included with Netezza Performance Server Analytics. An administrator must download them from the internet and install them to Netezza Performance Server before they can be used. The data cannot be used directly from the downloaded files, so a script is provided that creates the required tables, transforms the downloaded data, and loads it onto the system. The data sets are not required to use the product, but you must acquire the following data sets to run the documentation examples:

Retail
    URL: fimi.ua.ac.be/data/
    File: retail.dat.gz (click the .gz link)

CensusIncome
    URL: archive.ics.uci.edu/ml/databases/census-income/
    File: census.tar.gz

WineQuality
    URL: archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/
    File: winequality-white.csv

Adult
    URL: archive.ics.uci.edu/ml/machine-learning-databases/adult
    File: adult.data

Soybean
    URL: archive.ics.uci.edu/ml/machine-learning-databases/soybean
    Files: soybean-large.data and soybean-large.test

Iris
    URL: archive.ics.uci.edu/ml/machine-learning-databases/iris/
    File: iris.data

Installing sample data sets

  1. Download each data set file to a local machine. If a file is compressed (for example, a file with the extension .gz), do not decompress it.
  2. Log in to the host as user nz.
  3. Create a directory in which to store the downloaded data sets, for example:
    /nz/export/ae/utilities/bin/testData
  4. Transfer the data set files to the newly created directory. Do not change the file names.
  5. Navigate to the following directory:
    /nz/export/ae/utilities/bin
  6. Run the installation script by entering one of the following commands:
    • If the sample data set files are in the directory /nz/export/ae/utilities/bin/testData:
      ./loadTestTables.sh
    • If the sample data set files are in a different directory:
      ./loadTestTables.sh path_to_directory
    Because of the large amounts of data that the files contain, the script might run for several minutes. This is normal.
    After the script finishes, it automatically deletes the temporary files that it created. However, the downloaded data files and the log files are not deleted and remain on the host. If you do not want to retain them, delete them manually.

If the script is re-run, all sample data is deleted from the database and the corresponding tables are dropped. Then, the tables are re-created and the original sample data is reinserted.
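
Before running the installation script, it can help to confirm that every expected file is in place under its original name. The following is a minimal sketch, not part of the product: the function name missing_sample_files and the variable names are hypothetical, and the file list is simply copied from the data set list in this topic.

```shell
#!/bin/sh
# Pre-flight check (sketch): confirm that each sample data file named in
# this topic is present before loadTestTables.sh is run.

# Default directory, taken from the installation steps.
DATA_DIR_DEFAULT=/nz/export/ae/utilities/bin/testData

# File names exactly as listed for the sample data sets.
SAMPLE_FILES="retail.dat.gz census.tar.gz winequality-white.csv adult.data soybean-large.data soybean-large.test iris.data"

# missing_sample_files [DIR]   (hypothetical helper)
# Prints the name of each expected file that is absent from DIR, one per
# line, and returns non-zero if at least one file is missing.
missing_sample_files() {
    dir=${1:-$DATA_DIR_DEFAULT}
    rc=0
    for f in $SAMPLE_FILES; do
        if [ ! -f "$dir/$f" ]; then
            echo "$f"
            rc=1
        fi
    done
    return $rc
}

# Example use, left commented out so that sourcing this file has no side effects:
# if missing_sample_files /path/to/directory; then
#     cd /nz/export/ae/utilities/bin && ./loadTestTables.sh /path/to/directory
# fi
```

If the function prints nothing and returns zero, all seven files are present and the script can be started as described in step 6.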

Netezza Performance Server Cartridge Manager (nzcm)

Cartridge management for Netezza Performance Server Analytics is performed using the Netezza Performance Server Cartridge Manager (nzcm) utility. Use nzcm to install, uninstall, register, unregister, and otherwise administer cartridges.

Installing nzcm

Netezza Performance Server Analytics is distributed as a collection of cartridges in the form of .nzc files. You must extract these files from the full Netezza Performance Server Analytics package. You can extract and access the cartridges and the Netezza Performance Server Cartridge Manager (nzcm) through the Netezza Performance Server Analytics installation utility.

On the appliance host, take the following steps:
  1. Log in to the host as user nz.
  2. Go to the directory that contains the following file:
    nz-analytics-v<version>.zip
  3. Run the following command:
    unzip nz-analytics-v<version>.zip
    The unzip utility must be used to extract the file; gunzip cannot be used. This command creates a directory with the name nzcmrepo under the directory where the files were extracted.
  4. Go to the nzcmrepo subdirectory, typically /nz/var/inza/nzcmrepo.
  5. Locate the nzcm file to determine the release number. The file is named in the form nzcm-<version>.
  6. Extract the file:
    tar -xf nzcm-<version>
  7. After the file is extracted, go to the nzcm directory:
    cd /nz/var/inza/nzcmrepo/nzcm-<version>
  8. Install nzcm:
    ./install.sh

    The script installs nzcm to the /nz/var/nzcm directory and the repository is configured automatically.

  9. As instructed by the output of the install.sh script, run:
    source ~/.bashrc
  10. Change to the directory that contains the cartridge and group files:
    cd /nz/var/inza/nzcmrepo
  11. Confirm that the target directory, /nz/var/nzcm/nzcmrepo, is empty.
  12. Copy the cartridge and group files to the target directory:
    cp -f *.nzc /nz/var/nzcm/nzcmrepo/
    cp -f *.grp /nz/var/nzcm/nzcmrepo/
    This completes the installation of nzcm and its cartridge repository.
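
The last three steps can be sketched as a small shell function that refuses to copy anything unless the destination is empty. This is an illustration only: the function name copy_cartridges is hypothetical, and treating /nz/var/nzcm/nzcmrepo (the destination of the cp commands in step 12) as the target directory is an assumption.

```shell
#!/bin/sh
# copy_cartridges SRC DST   (hypothetical helper)
# Copies the *.nzc and *.grp files from SRC to DST, but only if DST
# exists and is empty. Returns non-zero without copying otherwise.
copy_cartridges() {
    src=$1
    dst=$2
    if [ ! -d "$dst" ]; then
        echo "target directory not found: $dst" >&2
        return 1
    fi
    # ls -A lists every entry except . and .. ; any output means DST is not empty.
    if [ -n "$(ls -A "$dst")" ]; then
        echo "target directory is not empty: $dst" >&2
        return 1
    fi
    for ext in nzc grp; do
        for f in "$src"/*."$ext"; do
            # Skip the literal pattern when no file matches it.
            [ -e "$f" ] || continue
            cp -f "$f" "$dst"/ || return 1
        done
    done
}

# Example use with the paths from the steps above (commented out):
# copy_cartridges /nz/var/inza/nzcmrepo /nz/var/nzcm/nzcmrepo
```

Checking for an empty destination first mirrors step 11. Files with other extensions in the source directory are left untouched.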