File encoding

You can control the encoding of the files that the File connector reads or writes.

You can specify the file encoding in the following ways, which are listed in their order of precedence:
  1. As a value for the Encoding property in the stage editor.
  2. As a value for the charset attribute in a .osh schema file. You can use this method only if runtime column propagation is enabled and the connector uses metadata from a .osh schema file.
  3. As a value for the APT_IMPEXP_CHARSET environment variable.

To specify the file encoding, use the Encoding property. You can also specify the character set encoding of files in the charset attribute of the .osh schema file or in the APT_IMPEXP_CHARSET environment variable. If a value is specified in the Encoding property, the connector ignores the charset attribute in the .osh schema file and the environment variable. If the property is not set, the connector uses the value in the .osh schema file. If neither is set, the connector uses the value of the environment variable.
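
The precedence rules above can be sketched as a small lookup. This is an illustration only and is not the connector's actual implementation. The function name resolve_file_encoding and its parameters are hypothetical. The sketch assumes that schema_charset is passed only when runtime column propagation is enabled and the connector uses metadata from a .osh schema file. It returns None when no source supplies a value, because the fallback in that case is not described here.

```python
import os
from typing import Mapping, Optional


def resolve_file_encoding(
    encoding_property: Optional[str],
    schema_charset: Optional[str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first encoding that is set, in order of precedence.

    Hypothetical helper that only models the documented order:
    1. Encoding property, 2. charset attribute in the .osh schema file,
    3. APT_IMPEXP_CHARSET environment variable.
    """
    env = os.environ if environ is None else environ
    candidates = (
        encoding_property,              # 1. Encoding property in the stage editor
        schema_charset,                 # 2. charset attribute in the .osh schema file
        env.get("APT_IMPEXP_CHARSET"),  # 3. environment variable
    )
    for value in candidates:
        if value:  # skip values that are unset or empty
            return value
    return None  # nothing is set; the default is outside the scope of this sketch


# The Encoding property takes precedence over the other two sources.
print(resolve_file_encoding("UTF-8", "ISO-8859-1", {"APT_IMPEXP_CHARSET": "UTF-16"}))
```

Passing the environment as a mapping keeps the sketch easy to try without changing the real process environment.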

You can use the job NLS map or the project NLS map to specify the character set for columns of the Char, VarChar, and LongVarChar data types in your job. When you use the File connector to write data, the value that is specified in the NLS map is used for columns of the Char, VarChar, and LongVarChar data types. When you use the File connector to read data, the connector encodes columns of the Char, VarChar, and LongVarChar data types in the specified encoding. Columns of the NChar, NVarChar, and LongNVarChar data types are always encoded in UTF-16.

Note: The Implicit format supports only single-byte character encodings.

To ensure that data is not corrupted, use the same encoding for columns of Char, VarChar, and LongVarChar data types in all stages in the job. For more information about NLS configuration for other stages, see the related documentation.
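
The following minimal sketch shows the kind of corruption that a mismatch can cause. It uses only the Python standard library and does not involve the connector. The helper name reinterpret and the sample string are hypothetical. The sketch encodes text in one character set and decodes the resulting bytes in another, which is roughly what happens when two stages disagree about the encoding of a Char or VarChar column.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one character set and decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


original = "café"

# Same encoding on both sides: the text survives the round trip.
consistent = reinterpret(original, "utf-8", "utf-8")

# Written as UTF-8 but read as ISO-8859-1: the two bytes of the accented
# character are each decoded as a separate character.
mismatched = reinterpret(original, "utf-8", "iso-8859-1")

print(consistent)
print(mismatched)
```

In the mismatched case the result is longer than the original and the accented character is lost, which is why the same encoding should be used in every stage.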