Loading data from cloud: AWS S3 and IBM Cloud Object Storage

Deployment options: Netezza Performance Server for Cloud Pak for Data SystemNetezza Performance Server for Cloud Pak for Data

You can load data from external tables on cloud into a database table on a Netezza Performance Server host system.

Before you begin

  • If you want the database table to contain only the records from the external table, make sure that the database table is empty before you load the data.
  • If you load from an external table into a database table that is not empty, the load is successful, but the new records are appended to the existing records.

About this task

You can load data from AWS S3 and IBM Cloud Object Storage. You can load all data types (including Unicode) and file types (decompressed and compressed formats).

To create an external table, you must be the admin user or have the CREATE EXTERNAL TABLE administration privilege. You must also have permission to read from the path of the data object.

Procedure

  1. Create an external table by issuing the CREATE EXTERNAL TABLE command with the REMOTESOURCE option.
    CREATE EXTERNAL TABLE TABLE SAMEAS TABLE USING (
    DATAOBJECT DATAOBJECT
    REMOTESOURCE REMOTESOURCE
    DELIM DELIM
    UNIQUEID UNIQUEID
    ACCESSKEYID ACCESSKEYID
    SECRETACCESSKEY SECRETACCESSKEY
    DEFAULTREGION DEFAULTREGION
    BUCKETURL BUCKETURL
    ENDPOINT ENDPOINT
    MULTIPARTSIZEMB MULTIPARTSIZEMB 
    );
    Example:
    CREATE EXTERNAL TABLE emp_backup SAMEAS emp USING (
    DATAOBJECT ('/employee_data.dat')
    REMOTESOURCE 'S3' 
    DELIM '|' 
    UNIQUEID 'samplebackup' 
    ACCESSKEYID 'xxxxx'
    SECRETACCESSKEY 'xxxxx'
    DEFAULTREGION 'ap-geo' 
    BUCKETURL 'my.backup.bucket' 
    ENDPOINT 's3.us-east.cloud-object-storage.appdomain.cloud' 
    MULTIPARTSIZEMB '50'
    );
    The DATAOBJECT file specification must refer to a valid file on the cloud platform. For the REMOTESOURCE option, use s3.
    Note: The example showcases external table to IBM Cloud Object Storage. For IBM Cloud Object Storage, ENDPOINT is mandatory. For AWS S3, the ENDPOINT option is not mandatory.
    This table lists all the additional parameters, which are necessary if you want to use the REMOTESOURCE option.
    Option Distribution Description
    UNIQUEID Optional The namespace, which is used to group data in the cloud bucket.
    ACCESSKEYID Mandatory Key generated on AWS/IBM Cloud Object Storage.
    SECRETACCESSKEY Mandatory Secret access key generated on AWS/IBM Cloud Object Storage.
    DEFAULTREGION Mandatory Region of the bucket.
    BUCKETURL Mandatory Name of the bucket.
    MULTIPARTSIZEMB Optional The default is 105. This value is in MB. It indicates the size of each part in a multipart upload. (Max 105 MB of buffer can be uploaded in one request)
    Tip: Use values such as 50 MB and higher. If you use 10 MB, software issues might occur.
    ENDPOINT Optional for AWS.

    Mandatory for IBM Cloud Object Storage.

    The region URL to access your bucket.
  2. Insert data from the external table into the table on the Netezza Performance Server host.
    INSERT INTO TABLE SELECT * FROM TABLE;
    Example:
    INSERT INTO emp SELECT * FROM emp_backup;