Specifying an InfoSphere DataStage ustring character set

The InfoSphere® DataStage® Teradata interface operators perform character set conversions between the Teradata database character sets and the multi-byte Unicode ustring field type data.

You use the -db_cs option of the Teradata interface operators to specify an ICU character set for mapping strings between the database and the Teradata operators. The default value is Latin1.

Its syntax is:


teraread | terawrite -db_cs icu_character_set

For example:


terawrite -db_cs ISO-8859-5
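
To make the idea of mapping between a database character set and UTF-16 concrete, the following is a minimal Python sketch. It is an illustration only: Python's built-in codecs stand in for the ICU converters, the helper names to_database_bytes and from_database_bytes are hypothetical, and none of this is the operators' actual implementation.

```python
# Illustration only: Python codecs stand in for the ICU converters.
# The function names below are hypothetical, not part of DataStage.

def to_database_bytes(text: str, charset: str) -> bytes:
    """Encode Unicode text into the database character set (write direction)."""
    return text.encode(charset)

def from_database_bytes(data: bytes, charset: str) -> str:
    """Decode database bytes into Unicode text (read direction)."""
    return data.decode(charset)

sample = "Привет"  # six Cyrillic characters, all present in ISO-8859-5

# ISO-8859-5 is a single-byte character set: one byte per character.
db_bytes = to_database_bytes(sample, "iso8859_5")

# Reading back recovers the original Unicode text.
round_trip = from_database_bytes(db_bytes, "iso8859_5")

# The same text held as UTF-16 uses two bytes per character here.
utf16_bytes = round_trip.encode("utf-16-le")

print(len(db_bytes), len(utf16_bytes), round_trip == sample)
```

A character that has no representation in the chosen character set cannot be encoded this way, which is why the character set should match the data stored in the database.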

Your database character set specification controls the following conversions:

  • SQL statements are mapped to your specified character set before they are sent to the database via the native Teradata APIs.
  • If you do not want your SQL statements converted to this character set, set the APT_TERA_NO_SQL_CONVERSION environment variable. This variable forces the SQL statements to be converted to Latin1 instead of the character set specified by the -db_cs option.
  • All char and varchar data read from the Teradata database is mapped from your specified character set to the ustring schema data type (UTF-16). If you do not specify the -db_cs option, string data is read into a string schema type without conversion.
  • The teraread operator converts a varchar(n) field to ustring[n/min], where min is the minimum size in bytes of the largest codepoint for the character set specified by -db_cs.
  • ustring schema type data written to a char or varchar column in the Teradata database is converted to your specified character set.
  • When writing a varchar(n) field, the terawrite operator schema type is ustring[n * max], where max is the maximum size in bytes of the largest codepoint for the character set specified by -db_cs.
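
The two length rules in the list above can be worked through with a small Python sketch. The function names are hypothetical, the min and max values in the example are invented for illustration rather than taken from any particular ICU character set, and rounding n/min down to a whole number is an assumption, since the rule does not say how a non-integer result is handled.

```python
# Sketch of the ustring length rules; names and sample values are hypothetical.

def read_ustring_length(n: int, min_bytes: int) -> int:
    """Length of the ustring produced by teraread for varchar(n): n / min.

    Assumption: a non-integer result is rounded down.
    """
    if n < 0 or min_bytes < 1:
        raise ValueError("n must be non-negative and min_bytes at least 1")
    return n // min_bytes

def write_ustring_length(n: int, max_bytes: int) -> int:
    """Length of the terawrite schema ustring for varchar(n): n * max."""
    if n < 0 or max_bytes < 1:
        raise ValueError("n must be non-negative and max_bytes at least 1")
    return n * max_bytes

# Single-byte character set (min = max = 1): lengths are unchanged.
print(read_ustring_length(30, 1), write_ustring_length(30, 1))

# Hypothetical multi-byte character set with min = 2 and max = 3.
print(read_ustring_length(30, 2), write_ustring_length(30, 3))
```

For a single-byte character set such as ISO-8859-5, both rules leave the length equal to n; the two lengths differ only for multi-byte character sets.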

No other environment variables are required to use the Teradata operators. All the required libraries are in /usr/lib, which should be on your PATH.

To speed up the start time of the load process slightly, you can set the APT_TERA_NO_PERM_CHECKS environment variable to bypass permission checking on several system tables that need to be readable during the load process.