Properties reference: File connector

This topic lists all properties that you can set to configure the stage.

Connection

For more information, see the Defining a connection topic.

File system
Select the file system to read files from or write files to.
  • Type: selection
  • Default: Local
  • Values:
    • Local
    • WebHDFS
    • HttpFS
    • NativeHDFS
Use custom URL
Select Yes to use a custom URL instead of the URL that is generated based on the Use SSL (HTTPS), Host, and Port properties.
  • Type: boolean
  • Default: false
Use SSL (HTTPS)
Select Yes to use Secure Sockets Layer (HTTPS).
  • Type: boolean
For more information, see the Configuring the truststore topic.
SSL Truststore certificates
SSL truststore (X.509) certificates in the PEM format.
  • Type: string
Use Kerberos
Select Yes to use Kerberos authentication.
  • Type: boolean
Use keytab
Select Yes to use a keytab instead of a password for Kerberos login.
  • Type: boolean
Host
Enter the name of the host.
  • Type: string
Port
Enter the port to connect to.
  • Type: integer
Service principal
Enter the service principal for the host. This property is required when the realm of the host is different from the realm of the user. The service principal for the web server has the format HTTP/FQDN@REALM.
  • Type: string
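For example, a service principal for a hypothetical namenode host in a hypothetical realm might look like this:
  HTTP/namenode1.example.com@EXAMPLE.COM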
User name
Enter the name of the user to connect as.
  • Type: string
Password
Enter the password for the specified user.
  • Type: protected string
Keytab
Enter the fully qualified path of the keytab for the specified user.
  • Type: string
Custom URL
Enter the base URL (http or https) for the WebHDFS or gateway server.
  • Type: string
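For example, a custom base URL for a hypothetical WebHDFS server might look like this (the host, port, and path are placeholders; the path that is required depends on the server configuration):
  http://namenode1.example.com:50070/webhdfs/v1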
Use Proxy
Select Yes to use a proxy server.
  • Type: boolean
  • Default: false
For more information, see the Configuring the Proxy topic.
Proxy host
Enter the host name of the proxy server.
  • Type: string
Proxy port
Enter the port for the proxy server.
  • Type: integer
Proxy user name
Enter the name of the user to connect to the proxy server as.
  • Type: string
Proxy password
Enter the password for the specified user.
  • Type: protected string
HDFS HA connection options
Use this property when HDFS high availability is enabled on the Hadoop cluster and the standby namenode and nameservice details need to be configured. HDFS high availability is supported only for the NativeHDFS and WebHDFS file system modes.
  • Type: boolean
  • Default: false
Nameservice ID
Use this property to specify the HDFS HA nameservice ID.
  • Type: string
  • Default: hadoop.ha.nameservice
Standby namenode(s)
Use this property to specify a list of the standby namenodes along with their port details. Separate the host and the port with a colon, and separate multiple namenodes with a semicolon, as shown in this example: HOST1:PORT1;HOST2:PORT2
  • Type: string
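For example, a value that lists two hypothetical standby namenodes might look like this:
  nn2.example.com:50070;nn3.example.com:50070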
Standby namenode service principal
Use this property to specify the service principal for the host on which the standby namenode is configured. The service principal is required when the realm of the standby namenode host is different from the realm of the user principal that accesses HDFS. The service principal for the web server has the format HTTP/FQDN@REALM.
  • Type: string
Advanced configuration options
Use the properties in this category to set any of the advanced configuration options that need to be set for any of the supported file system modes.
  • Type: category
HDFS client parameters
Use this property to specify native HDFS client configuration parameters as key=value pairs. On a Hadoop cluster, these parameters are typically set in core-site.xml or hdfs-site.xml. When the NativeHDFS file system mode is used, additional configuration parameters can be passed to the HDFS client through this property. Separate the parameters with semicolons, as shown in this example: hadoop.rpc.protection=privacy;hadoop.tmp.dir=logdir;
  • Type: string

Usage

Is reject link
Select Yes if the link is configured to carry rejected data from the source stage.
  • Type: boolean
  • Default: false
Write mode
Select Write single file to write a file per node, select Write multiple files to write multiple files per node (based on size, key value, or both), or select Delete to delete files.
  • Type: selection
  • Default: Write single file
  • Values:
    • Write single file
    • Write multiple files
    • Delete
For more information, see the Configuring the File connector as a target topic.
Read mode
Select Read single file to read from a single file, or select Read multiple files to read from the files that match a specified file prefix.
  • Type: selection
  • Default: Read multiple files
  • Values:
    • Read single file
    • Read multiple files
Exclude files
Specify a comma-separated list of file prefixes to exclude from the files that are read. If a prefix includes a comma, escape the comma by using a backslash (\).
  • Type: string
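For example, the following value excludes files that begin with the hypothetical prefixes tmp_ and backup,old (note the escaped comma in the second prefix):
  tmp_,backup\,old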
Wave handling properties
Use the properties in this category to define how data is handled when it is streamed in waves (that is, in batches) from the upstream stage. Typically, these properties are required for source stages that are configured to send data in waves.
  • Type: category
Append unique identifier
Use this property to specify whether a unique identifier is appended to the file name. When this property is set to Yes, the unique identifier is appended to the file name, and a new file is written for every wave of data that is streamed into the stage. When this property is set to No, the file is overwritten on every wave.
  • Type: boolean
  • Default: false
File size threshold
Specify the threshold for the file size in megabytes. Processing nodes start a new file each time the file size exceeds the threshold and a wave boundary is reached. Because the file is written only on a wave boundary, the threshold is a soft limit; the actual size of the file can be higher than the specified threshold, depending on the size of the wave.
  • Type: integer
  • Default: 1
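As a hypothetical illustration of the soft limit: if the threshold is set to 100 and a single wave delivers 130 megabytes of data, the file is closed only at the wave boundary, so the file can reach about 130 megabytes before a new file is started.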
If file exists
Specify what the connector does when it tries to write a file that already exists. Select Overwrite file to overwrite the file, Do not overwrite file to keep the existing file and stop the job, or Fail to stop the job with an error message.
  • Type: selection
  • Default: Overwrite file
  • Values:
    • Overwrite file
    • Do not overwrite file
    • Fail
Split file on key changes
Select Yes to create a new file when the key column value changes. Data must be sorted and partitioned for this option to work properly.
  • Type: boolean
  • Default: false
Key column
Specify the key column to use for splitting files. If a key column is not specified, the connector uses the first key column on the link.
  • Type: string
Case sensitive
Select Yes to make the key value case sensitive.
  • Type: boolean
  • Default: false
Use key value in file name
Select Yes to use the key value in the generated file name.
  • Type: boolean
  • Default: false
Exclude partition string
Select Yes to exclude the partition string each processing node appends to the file name.
  • Type: boolean
  • Default: false
Maximum file size
Specify the maximum file size in megabytes. Processing nodes will start a new file each time the size exceeds this value.
  • Type: integer
  • Default: 0
Force sequential
Select Yes to run the connector sequentially on one node.
  • Type: boolean
  • Default: false
Reject mode
Specify what the connector does when a record that contains invalid data is found in the source file. Select Continue to read the rest of the file, Fail to stop the job with an error message, or Reject to send the rejected data to a reject link.
  • Type: selection
  • Default: Continue
  • Values:
    • Continue
    • Fail
    • Reject
For more information, see the Rejecting records that contain errors topic.
Cleanup on failure
If a job fails, select whether the connector deletes the file or files that have been created.
  • Type: boolean
  • Default: true
File name column
Specify the name of the column to write the source file name to.
  • Type: string
File format
Specify the format of the files to read or write. The implicit file format specifies that the input to the file connector is in binary or string format without a delimiter.
  • Type: selection
  • Default: Delimited
  • Values:
    • Delimited
    • Comma-separated value (CSV)
    • Avro
    • Implicit
    • orc
    • parquet
    • sequencefile
Avro format properties
Avro configuration properties
  • Type: category
Output as JSON
Specify whether each row in the Avro file should be exported as JSON to a string column.
  • Type: boolean
  • Default: false
Avro format properties
Avro configuration properties
  • Type: category
Input as JSON
Specify whether each row in the Avro file should be imported from a JSON string.
  • Type: boolean
  • Default: false
Avro schema file
Specify the fully qualified path for a JSON file that defines the schema for the Avro file.
  • Type: string
Avro compression codec
Specify the compression algorithm that will be used to compress the data.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Deflate
    • Snappy
    • Bzip2
Array keys
If the file format is Avro in a target stage, normalization is controlled through array keys. Specify ITERATE() in the description of the corresponding array field in the column definition on the input tab of the File connector.
  • Type: string
ORC Settings
ORC configuration properties
  • Type: category
Stripe Size
Specify the stripe size for ORC files.
  • Type: integer
  • Default: 100000
Buffer Size
Specify the buffer size for ORC files.
  • Type: integer
  • Default: 10000
Compression Kind
Specify the compression mechanism for ORC files.
  • Type: selection
  • Default: SNAPPY
  • Values:
    • NONE
    • ZLIB
    • SNAPPY
Parquet settings
Parquet configuration properties
  • Type: category
Block size
Specify the block size for Parquet files.
  • Type: integer
  • Default: 10000000
Page size
Specify the page size for Parquet files.
  • Type: integer
  • Default: 10000
Compression type
Specify the compression mechanism for Parquet files.
  • Type: selection
  • Default: SNAPPY
  • Values:
    • NONE
    • SNAPPY
    • GZIP
    • LZO
Delimited format properties
Specify the file syntax for delimited files.
  • Type: category
Record limit
Specify the maximum number of records to read from the file per node. If a value is not specified for this property, the entire file is read.
  • Type: integer
Encoding
Specify the encoding of the files to read or write, for example, UTF-8.
  • Type: string
For more information, see the File encoding topic.
Include byte order mark
Specify whether to include a byte order mark in the file when the file encoding is a Unicode encoding such as UTF-8, UTF-16, or UTF-32.
  • Type: boolean
  • Default: false
Record definition
Select whether the record definition is provided to the connector from the source file, a delimited string, a file that contains a delimited string, or a schema file. When runtime column propagation is enabled, this metadata provides the column definitions. If a schema file is provided, the schema file overrides the values of formatting properties in the stage and the column definitions that are specified on the Columns page of the output link.
  • Type: selection
  • Default: None
  • Values:
    • None
    • File header
    • Delimited string
    • Delimited string in a file
    • Schema file
    • Infer schema
Definition source
If the record definition is a delimited string, enter a delimited string that specifies the names and data types of the fields. Use the format name:data_type, and separate the fields with the delimiter that is specified in the Field delimiter property. If the record definition is in a delimited string file or an OSH schema file, specify the full path of the file.
  • Type: string
For more information, see the Metadata formatting options topic.
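For example, with the default comma field delimiter, a delimited string for three hypothetical fields might look like this (the field names and data types are illustrative):
  id:int32,name:string,created:date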
First row is header
Select Yes if the first row of the file contains field headers and is not part of the data. If you select Yes, when the connector writes data, the field names will be the first row of the output. If runtime column propagation is enabled, metadata can be obtained from the first row of the file.
  • Type: boolean
  • Default: false
Include data types
Select Yes to append the data type to each field name that the connector writes in the first row of the output.
  • Type: boolean
  • Default: false
Field delimiter
Specify a string or one of the following values: <NL>, <CR>, <LF>, <TAB>. The string can include Unicode escape strings in the form \uNNNN, where NNNN is the Unicode character code.
  • Type: string
  • Default: ,
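For example, to use the pipe character (|) as the field delimiter, you can specify it either literally or as the equivalent Unicode escape string:
  \u007C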
Row delimiter
Specify a string or one of the following values: <NL>, <CR>, <LF>, <TAB>. The string can include Unicode escape strings in the form \uNNNN, where NNNN is the Unicode character code.
  • Type: string
  • Default: <NL>
Escape character
Specify the character to use to escape field and row delimiters. If an escape character exists in the data, the escape character is also escaped. Because escape characters require additional processing, do not specify a value for this property if you do not need to include escape characters in the data.
  • Type: string
Quotation mark
Specify the type of quotation mark that encloses field values in the file.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Double
    • Single
Null value
Specify the character or string that represents null values in the data. For a source stage, input data that has the value that you specify is set to null on the output link. For a target stage, in the output file that is written to the file system, null values are represented by the value that is specified for this property. To specify that an empty string represents a null value, specify "" (two double quotation marks).
  • Type: string
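For example, if a hypothetical source file represents missing values with the string NULL, set this property to NULL; to treat empty strings as nulls, set it to "" instead.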
Field formats
  • Type: category
For more information, see the Formatting options for Decimal, Time, Date, and Timestamp data types topic.
Decimal format
Specify a string that defines the format for fields that have the Decimal or Numeric data type.
  • Type: string
Date format
Specify a string that defines the format for fields that have the Date data type.
  • Type: string
Time format
Specify a string that defines the format for fields that have the Time data type.
  • Type: string
Timestamp format
Specify a string that defines the format for fields that have the Timestamp data type.
  • Type: string
Implicit format properties
Specify the file syntax for implicit files.
  • Type: category
Data format
Specify the type of implicit file.
  • Type: selection
  • Default: Binary
  • Values:
    • Binary
Record limit
Specify the maximum number of records to read from the file per node. If a value is not specified for this property, the entire file is read.
  • Type: integer
Encoding
Specify the encoding of the files to read or write, for example, UTF-8.
  • Type: string
For more information, see the File encoding topic.
Include byte order mark
Specify whether to include a byte order mark in the file when the file encoding is a Unicode encoding such as UTF-8, UTF-16, or UTF-32.
  • Type: boolean
  • Default: false
Record definition
Select whether the record definition is provided to the connector from the source file, a delimited string, a file that contains a delimited string, or a schema file. When runtime column propagation is enabled, this metadata provides the column definitions. If a schema file is provided, the schema file overrides the values of formatting properties in the stage and the column definitions that are specified on the Columns page of the output link.
  • Type: selection
  • Default: None
  • Values:
    • None
    • File header
    • Delimited string
    • Delimited string in a file
    • Schema file
Definition source
Enter a delimited string that specifies the name, data type, and length of each field. Use the format name:data_type[length], and separate the fields with the delimiter that is specified in the Field delimiter property. If the record definition is in a delimited string file or an OSH schema file, specify the full path of the file.
  • Type: string
For more information, see the Metadata formatting options topic.
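For example, a delimited string for two hypothetical fixed-length fields might look like this (the field names and lengths are illustrative):
  id:string[8],name:string[20]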
First row is header
Select Yes if the first row of the file contains field headers and is not part of the data. If you select Yes, when the connector writes data, the field names will be the first row of the output. If runtime column propagation is enabled, metadata can be obtained from the first row of the file.
  • Type: boolean
  • Default: false
Include data types
Select Yes to append the data type to each field name that the connector writes in the first row of the output.
  • Type: boolean
  • Default: false
Trace file
Specify the full path to a file to contain trace information from the parser for delimited files. Because writing to a trace file requires additional processing, specify a value for this property only during job development.
  • Type: string
Create or Use existing hive table
Select Yes to create or use an existing Hive table after data has been loaded to HDFS.
  • Type: boolean
  • Default: false
Use staging table
Select Yes to use a staging table. This option is enabled only when File format is set to Delimited.
  • Type: boolean
  • Default: false
Target table properties
  • Type: category
Table format
Use this property to set the format of the target table.
  • Type: selection
  • Default: parquet
  • Values:
    • orc
    • parquet
ORC compression
Use this property to set the compression type for the target table when the table format is ORC.
  • Type: selection
  • Default: ZLIB
  • Values:
    • NONE
    • ZLIB
    • SNAPPY
Parquet compression
Use this property to set the compression type for the target table when the table format is Parquet.
  • Type: selection
  • Default: SNAPPY
  • Values:
    • NONE
    • SNAPPY
    • GZIP
    • LZO
Stripe size
Specify the stripe size for the target table when the table format is ORC.
  • Type: integer
  • Default: 64
Table type
Use this property to set the type of the target table.
  • Type: selection
  • Default: External
  • Values:
    • External
    • Internal
Location
Use this property to set the location of the HDFS files that serve as storage for the Hive table.
  • Type: string
Drop staging table
Use this property to specify whether to drop the staging table. By default, the staging table is dropped after the target table is created. If you do not want the staging table to be removed, set this property to No.
  • Type: boolean
  • Default: true
Load into existing table
Use the properties in this category when you load data into an existing Hive table. The table can be partitioned or non-partitioned.
  • Type: category
Maximum number of dynamic partitions
Use this property to set the maximum number of dynamic partitions that can be created while loading into a partitioned table.
  • Type: integer
  • Default: 1000
Enable SSL
Select Yes if SSL is enabled on the Hive server.
  • Type: boolean
  • Default: false
SSL Truststore certificates
SSL truststore (X.509) certificates in the PEM format.
  • Type: string
Use Kerberos
Select Yes to use Kerberos authentication.
  • Type: boolean
  • Default: false
Use keytab
Select Yes to use a keytab instead of a password for Kerberos login.
  • Type: boolean
Hive host
Enter the name of the host. If a value is not specified, the value of the Host property is used.
  • Type: string
Hive port
Enter the port for Hive.
  • Type: integer
Hive user name
Enter the name of the user to connect to Hive as.
  • Type: string
Hive password
Enter the password for the specified user.
  • Type: protected string
Hive keytab
Enter the fully qualified path of the keytab for the specified user.
  • Type: string
Hive service principal
Use this property to specify the service principal for Hive. The service principal for the Hive service has the format hive/FQDN@REALM.
  • Type: string
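For example, a service principal for a hypothetical Hive server host in a hypothetical realm might look like this:
  hive/hiveserver1.example.com@EXAMPLE.COM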
Table
Enter the name of the table to create.
  • Type: string
Hive table type
Specify the Hive table type, external (the default) or internal.
  • Type: selection
  • Default: External
  • Values:
    • External
    • Internal
Create schema
Specify Yes to create the schema that is indicated in the fully qualified table name if the schema does not already exist. If Yes is specified and the table name does not contain a schema, the job fails. If Yes is specified and the schema already exists, the job does not fail.
  • Type: boolean
  • Default: false
Drop existing table
Specify Yes to drop the Hive table if it already exists, or No to append to the existing Hive table.
  • Type: boolean
  • Default: true
Additional driver attributes
Specify any additional driver-specific connection attributes. Enter the attributes in the name=value format, separated by semicolons if multiple attributes need to be specified. For information about the supported driver-specific attributes, see the Progress DataDirect driver documentation.
  • Type: string
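For example, a value that sets two driver attributes might look like this (the attribute names are illustrative; see the driver documentation for the attributes that your driver supports):
  AlternateServers=host2.example.com:10000;LoginTimeout=30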