Deployment options:
Netezza Performance Server for Cloud Pak for Data SystemNetezza Performance Server for Cloud Pak for Data
You can load data
from external tables on cloud into a database table on a Netezza Performance Server host system.
Before you begin
- If you want the database table to contain only the records from the external table, make sure
that the database table is empty before you load the data.
- If you load from an external table into a database table that is not empty, the load is
successful, but the new records are appended to the existing records.
About this task
You can load data from AWS S3 and IBM Cloud Object Storage. You can load all data types
(including Unicode) and file types (decompressed and compressed formats). To create an external
table, you must be the admin user or have the CREATE EXTERNAL TABLE
administration privilege. You must also have permission to read from the path of the data
object.
Procedure
-
Create an external table by issuing the CREATE EXTERNAL TABLE command with
the REMOTESOURCE
option.
CREATE EXTERNAL TABLE TABLE SAMEAS TABLE USING (
DATAOBJECT DATAOBJECT
REMOTESOURCE REMOTESOURCE
DELIM DELIM
UNIQUEID UNIQUEID
ACCESSKEYID ACCESSKEYID
SECRETACCESSKEY SECRETACCESSKEY
DEFAULTREGION DEFAULTREGION
BUCKETURL BUCKETURL
ENDPOINT ENDPOINT
MULTIPARTSIZEMB MULTIPARTSIZEMB
);
Example:
CREATE EXTERNAL TABLE emp_backup SAMEAS emp USING (
DATAOBJECT ('/employee_data.dat')
REMOTESOURCE 'S3'
DELIM '|'
UNIQUEID 'samplebackup'
ACCESSKEYID 'xxxxx'
SECRETACCESSKEY 'xxxxx'
DEFAULTREGION 'ap-geo'
BUCKETURL 'my.backup.bucket'
ENDPOINT 's3.us-east.cloud-object-storage.appdomain.cloud'
MULTIPARTSIZEMB '50'
);
The
DATAOBJECT file specification must refer to a valid file on the
cloud platform. For the
REMOTESOURCE option, use
s3
.
Note: The
example showcases external table to IBM Cloud Object Storage. For IBM Cloud Object Storage,
ENDPOINT is mandatory. For AWS S3, the ENDPOINT option is
not mandatory.
This table lists all the additional parameters, which are necessary if you
want to use the
REMOTESOURCE option.
Option |
Distribution |
Description |
UNIQUEID |
Optional |
The namespace, which is used to group data in the cloud bucket. |
ACCESSKEYID |
Mandatory |
Key generated on AWS/IBM Cloud Object Storage. |
SECRETACCESSKEY |
Mandatory |
Secret access key generated on AWS/IBM Cloud Object Storage. |
DEFAULTREGION |
Mandatory |
Region of the bucket. |
BUCKETURL |
Mandatory |
Name of the bucket. |
MULTIPARTSIZEMB |
Optional |
The default is 105. This value is in MB. It indicates the size of each part in a multipart
upload. (Max 105 MB of buffer can be uploaded in one request) Tip: Use values such as 50
MB and higher. If you use 10 MB, software issues might occur.
|
ENDPOINT |
Optional for AWS. Mandatory for IBM Cloud Object Storage.
|
The region URL to access your bucket. |
- Insert data from the external table into the table on the Netezza Performance Server
host.
INSERT INTO TABLE SELECT * FROM TABLE;
Example:
INSERT INTO emp SELECT * FROM emp_backup;