Teradata
The Teradata target writes data to one or more tables in a Teradata database. For information about supported versions, see Supported systems and versions.
You can use the Teradata target to write new data or change data capture (CDC) data to Teradata. When processing new data, the target can load data to Teradata using the COPY command. When processing CDC data, the target uses the MERGE command.
The Teradata target writes data from record fields to table columns based on matching names. The target can compensate for data drift by creating new columns and tables in Teradata when new fields and table references appear in records.
The Teradata target uses a JDBC URL to connect to the Teradata database. When you configure the target, you specify the JDBC URL and credentials to use to connect to the database.
You configure the Teradata database and tables to use. You specify the load method, error handling, and staging details.
You can optionally enable data drift. You can have the target create new tables, as needed. You can also specify whether to create all new columns as String instead of inferring the type.
Before you use the Teradata target, you must complete several prerequisite tasks.
Prerequisites
- Install the Teradata driver for the database.
- If using external staging, create a Teradata Authorization for the stage. For information on creating a Teradata Authorization, see the Teradata documentation.
- To use the COPY command to load new data, complete the COPY prerequisites.
- To use the MERGE command to load CDC data, complete the MERGE prerequisites.
Installing the Teradata driver
Before you use the Teradata target, install the Teradata driver for the database. You cannot access the database until you install the required driver.
You install the driver into the Teradata stage library,
streamsets-datacollector-teradata-lib, which includes the
target.
For information about installing additional drivers, see Install external libraries.
COPY prerequisites
When processing new data, you can configure the target to use the COPY command to load data to Teradata tables.
Using the COPY command to load data requires a role with one of the following sets of access privileges:
- Required privileges when using a local stage:
Object Type Privilege Table SELECT, INSERT - Required privileges when using an external stage:
Object Type Privilege Authorization USAGE Table SELECT, INSERT
MERGE prerequisites
When processing CDC data, you can configure the target to use the MERGE command to load data to Teradata tables.
Using the MERGE command to load data requires a role with one of the following sets of access privileges:
- Required privileges when using a local stage:
Object Type Privilege Table SELECT, INSERT, UPDATE, DELETE - Required privileges when using an external stage:
Object Type Privilege Authorization USAGE Table SELECT, INSERT, UPDATE, DELETE
Load methods
- When using local staging, the target uses Teradata FastLoad capabilities to create temporary tables.
- When using external staging, the target uses external tables to create temporary tables.
The target can load data to Teradata tables using the following methods:
- COPY command for new data
- The COPY command, the default load method, performs a bulk synchronous load to Teradata, treating all records as INSERTS. Use this method to write new data to Teradata tables.
- MERGE command for CDC data
- Like the COPY command, the MERGE command performs a bulk synchronous load to Teradata. But instead of treating all records as INSERT, it inserts, updates, upserts, and deletes records as appropriate. Use this method to write change data capture (CDC) data to Teradata tables using CRUD operations.
Use the recommended guidelines to optimize for performance and cost-effectiveness.
For more information about the COPY or MERGE commands, see the Teradata documentation.
Primary key location
When you configure the Teradata target to write CDC data using the MERGE command, you must provide primary key information. You can configure the target to access primary key information in the following ways:
- Use JDBC record header attribute
- The target uses primary key columns stored in the
jdbc.primaryKeySpecificationrecord header attribute. - Use specified columns for each table
- The target uses the primary key columns that you specify in the Table Key Columns properties.
Performance optimization
Use the following tips to optimize for performance and cost-effectiveness when using the Teradata target:
- Increase the batch size
- The maximum batch size is determined by the source in the flow and typically has a default value of 1,000 records. To take advantage of the Teradata loading abilities when writing to Teradata tables using the COPY or MERGE commands, increase the maximum batch size in the flow source to 20,000-50,000 records. Be sure to increase the Data Collector Java heap size, as needed.
- Use multiple threads
- When writing to Teradata tables using the COPY command, you can use multiple threads to improve performance by including a multithreaded source in the flow. When Data Collector resources allow, using multiple threads enables processing multiple batches of data concurrently. As with increasing the batch size, when using multiple threads, you should make sure that the Data Collector is sized appropriately.
- Enable additional connections to Teradata
- When writing to multiple Teradata tables using the COPY or MERGE commands, increase the number of connections that the Teradata target makes to Teradata. Each additional connection allows the target to write to an additional table, concurrently.
Specifying tables
- Single table
- To write data to a single table, simply enter table name as
follows:
<table_name> - Multiple tables
- To write data to multiple tables, specify a field in the record that defines the database and tables.
Use the Table property on the Teradata tab to specify the tables to write to.
Enabling data drift handling
The Teradata target can automatically compensate for changes in column or table requirements, also known as data drift.
- Create new columns
-
The target can create new columns in Teradata tables when new fields appear in records. For example, if a record suddenly includes a new
Address2field, the target creates a newAddress2column in the target table.By default, the target creates new columns based on the data in the new fields, such as creating a Double column for decimal data. You can, however, configure the target to create all new columns as String.
- Create new tables
- The target can create new
tables as needed. For example, say the target writes data to
tables based on the region name in the
Regionfield. When a newSW-3region shows up in a record, the target creates a newSW-3table in Teradata and writes the record to the new table.
Generated data types
When creating new tables or creating new columns in existing tables, the Teradata target uses field names to generate the new column names.
| Record Field Data Type | Teradata Column Data Type |
|---|---|
| Byte | Byte |
| Byte Array | Varbyte |
| Char | Character |
| String | Varchar |
| Decimal | Decimal |
| Long | Bigint |
| Float | Float |
| Integer | Integer |
| Short | Smallint |
| Time | Time |
| Date | Date |
| Datetime | Timestamp |
| Zoned_Datetime | Timestamp With Time Zone |
| Boolean | Varchar |
| Double | Float |
| File_Ref | Varchar |
The target does not support nested Data Collector data types: List, List-Map, and Map. By default, the target treats fields with invalid data types as an error record. You can configure the target to ignore fields with invalid data types, as described in Missing fields and fields with invalid types.
Creating tables
If you configure the Teradata target to handle data drift, you can also configure the target to create tables. Once configured, the target creates tables when the specified or needed tables do not exist in Teradata.
- Table
- The target creates a table if the table specified in the Table property does not exist.
- Table columns
- In the created tables, the target determines the columns from the first batch of data processed. The target infers the data types from the data in the batch.
- Primary key columns
- In the created tables, the target creates primary key columns. If
configured to use the COPY command for new data, the target uses the
primary key information in the
jdbc.primaryKeySpecificationrecord header attribute to create primary key columns.
Staging location
The Teradata target first stages the flow data in text files in a specified location. Then, the target sends the COPY or MERGE command to Teradata to process the staged files.
- Amazon S3
- After selecting Amazon S3 as the staging location, specify the existing S3 bucket to stage the files to. You also specify the credentials that the target uses to connect to Amazon S3.
- ADLS Gen2
- After selecting ADLS Gen2 as the staging location, specify the name of the existing Azure account and storage container to stage the files to. You then configure the target to use the appropriate authentication method to connect to Azure Data Lake Storage Gen2.
- Azure Blob Storage
- After selecting Azure Blob Storage as the staging location, specify the fully-qualified domain name of the Blob Storage account and the name of the storage container or file system to stage files to. You then configure the target to use the appropriate authentication method to connect to Azure Blob Storage.
- Google Cloud Storage
- After selecting Google Cloud Storage as the staging location, specify the project ID and an existing Google Cloud Storage bucket to use. You also define credential provider details.
- Local FileSystem
- After selecting Local FileSystem as the staging location, specify the local directory to stage the files to and a prefix for the staged files. You can optionally configure a limit on the number of FastLoad sessions for staging. For more information on the Teradata FastLoad utility, see the Teradata documentation.
For all staging locations, you specify the stage file name prefix and whether the target removes a staged file after its data is written to Teradata tables.
Amazon S3 credentials
When you configure the target to connect to an Amazon S3 staging location, the target must pass credentials to Amazon Web Services. Use one of the following methods to pass AWS credentials:
- Instance profile
- When Data Collector runs on an Amazon EC2 instance that has an associated instance profile, Data Collector uses the instance profile credentials to automatically authenticate with AWS.
- AWS access key pair
-
When Data Collector does not run on an Amazon EC2 instance or when the EC2 instance doesn’t have an instance profile, you can connect with an AWS access key pair.
To connect with an AWS access key pair, on the Staging tab, set the Authentication Method property to AWS Keys. Then define the access key ID and secret access key.Tip: To secure sensitive information such as access key pairs, you can use runtime resources or credential stores.
Azure authentication information
When you configure the target to connect to an ADLS Gen2 or Azure Blob Storage staging location, you select the authentication method that the target uses to connect to Azure Data Lake Storage Gen2 or Azure Blob Storage.
- OAuth with Service Principal
- Connections made with OAuth with Service Principal authentication
require the following information:
- Application ID - Application ID for the Azure Active Directory Data Collector
application. Also known as the client ID.
For information on accessing the application ID from the Azure portal, see the Azure documentation.
- Endpoint - OAuth 2.0 token endpoint for
the Azure Active Directory v1.0 application for Data Collector. For example:
https://login.microsoftonline.com/<uuid>/oauth2/token. - Application Key - Authentication key or client secret
for the Azure Active Directory application. Also known as the
client secret.
For information on accessing the application key from the Azure portal, see the Azure documentation.
- Application ID - Application ID for the Azure Active Directory Data Collector
application. Also known as the client ID.
- Azure Managed Identity
- Connections made with Azure Managed Identity authentication
require the following information:
- Application ID - Application ID for the Azure Active Directory Data Collector
application. Also known as the client ID.
For information on accessing the application ID from the Azure portal, see the Azure documentation.
- Application ID - Application ID for the Azure Active Directory Data Collector
application. Also known as the client ID.
- Shared Key
- Connections made with Shared Key authentication require the following
information:
- Account Shared Key - Shared access key that Azure
generated for the storage account.
For more information on accessing the shared access key from the Azure portal, see the Azure documentation.
- Account Shared Key - Shared access key that Azure
generated for the storage account.
- SAS Token
-
Connections made with SAS Token authentication require the following information:
- Azure SAS Token - Shared access signature
(SAS) token that provides secure access to the needed resources
in Azure Blob Storage.
For more information on SAS tokens for storage containers, see the Azure documentation.
SAS Token authentication is available only for Azure Blob Storage staging.
- Azure SAS Token - Shared access signature
(SAS) token that provides secure access to the needed resources
in Azure Blob Storage.
Google Cloud credentials
Before staging data on Google Cloud Storage, the Teradata target must pass credentials to Google Cloud.
- Google Cloud default credentials
- Credentials in a file
- Credentials in a stage property
For details on how to configure each option, see Security in Google Cloud stages.
Row generation
When writing a record to a table, the Teradata target includes all record fields in
the resulting row, by default. The target uses the root field, /,
as the basis for the resulting row.
You can configure the Row Field property to specify a map or list-map field in the record as the basis for the row. The resulting record includes only the data from the specified map or list-map field and excludes all other record data. Use this functionality when the data that you want to write to Teradata exists in a single map or list-map field within the record.
If you want to use the root field, but do not want to include all fields in the resulting row, you can configure the target to ignore all specified first-level fields.
/, as the basis for the resulting row. A record contains the
following fields:{
"name": "Jane Smith",
"id": "557",
"address": {
"street": "101 3rd St",
"city": "Huntsville",
"state": "NC",
"zipcode": "27023"
}
}The target treats the address map field as a field with an
invalid data type, processing the field as an error record by default. You can configure
the target to ignore the field and process the remaining record data, as described
in Missing fields and fields with invalid types.
Missing fields and fields with invalid types
By default, the target treats records with missing fields or with invalid data types in fields as error records. You can configure the target to ignore missing fields or fields with invalid data types, replacing the data in the field with an empty value.
You can specify a default value to use for each data type on the Data Defaults tab. For
example, you might define the default value for a missing String field or a String field with an
invalid data type as none or not_applicable.
To configure the target to ignore missing fields and fields with invalid data types, select the Ignore Missing Fields and the Ignore Fields with Invalid Types properties on the Data Defaults tab.
CRUD operation processing
The Teradata target can insert, update, upsert, or delete data when you configure the target to process CDC data. When processing CDC data, the target uses the MERGE command to write data to Teradata.
sdc.operation.type record header attribute. The target
performs operations based on the following numeric values:- 1 for INSERT
- 2 for DELETE
- 3 for UPDATE
- 4 for UPSERT
If
your flow has a CRUD-enabled source that processes changed data, the
target simply reads the operation type from the sdc.operation.type
header attribute that the source generates. If your flow has a non-CDC
source, you can use the Expression Evaluator processor or a scripting processor to
define the record header attribute. For more information about Data Collector changed data processing and a list of
CDC-enabled sources, see Processing changed data.
Configuring a Teradata target
About this task
Configure a Teradata target to write data to Teradata tables.