Properties reference: File connector
This topic lists all properties that you can set to configure the stage.
Connection
For more information, see the Defining a connection topic.
- File system
- Select the file system to read files from or write files to.
- Type: selection
- Default: Local
- Values:
- Local
- WebHDFS
- HttpFS
- NativeHDFS
- Use custom URL
- Select Yes to use a custom URL instead of one that is generated based on the Use SSL (HTTPS), Host, and Port properties.
- Type: boolean
- Default: false
- Use SSL (HTTPS)
- Select Yes to use Secure Sockets Layer (HTTPS)
- Type: boolean
- SSL Truststore certificates
- SSL truststore (X.509) certificates in the PEM format.
- Type: string
- Use Kerberos
- Select Yes to use Kerberos
- Type: boolean
- Use keytab
- Select Yes to use keytab instead of password for Kerberos login
- Type: boolean
- Host
- Enter the name of the host.
- Type: string
- Port
- Enter the port to connect to.
- Type: integer
- Service principal
- Enter the service principal for the host. Use this property when the realm of the host differs from the realm of the user. The service principal for the web server has the format <B>HTTP/FQDN@REALM</B>.
- Type: string
- User name
- Enter the name of the user to connect as.
- Type: string
- Password
- Enter the password for the specified user.
- Type: protected string
- Keytab
- Enter the fully qualified path of the keytab for the specified user.
- Type: string
- Custom URL
- Enter the base URL (http or https) for the WebHDFS or gateway server
- Type: string
- Use Proxy
- Select Yes to use a proxy server.
- Type: boolean
- Default: false
- Proxy host
- Enter the host name of the proxy server.
- Type: string
- Proxy port
- Enter the port for the proxy server.
- Type: integer
- Proxy user name
- Enter the name of the user to connect to the proxy as.
- Type: string
- Proxy password
- Enter the password for the specified user
- Type: protected string
- HDFS HA connection options
- Use this property when <B>HDFS high availability</B> is enabled on the Hadoop cluster and the standby namenode and nameservice details need to be configured. HDFS high availability is supported only for the <B>NativeHDFS</B> and <B>WebHDFS</B> file system modes.
- Type: boolean
- Default: false
- Nameservice ID
- Use this property to specify the HDFS HA nameservice ID
- Type: string
- Default: hadoop.ha.nameservice
- Standby namenode(s)
- Use this property to specify a list of the standby namenode(s) along with their port details, separated by semicolons. Separate each host and port with a colon, as shown in the example: <B>HOST1:PORT1;HOST2:PORT2</B>
- Type: string
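The <B>HOST1:PORT1;HOST2:PORT2</B> value format above can be sketched as follows. This is an illustrative helper, not part of the connector; the host names are made up.

```python
def parse_standby_namenodes(value: str) -> list[tuple[str, int]]:
    """Split a semicolon-separated list of HOST:PORT entries."""
    nodes = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        host, _, port = entry.rpartition(":")
        nodes.append((host, int(port)))
    return nodes

print(parse_standby_namenodes("nn2.example.com:50070;nn3.example.com:50070"))
# → [('nn2.example.com', 50070), ('nn3.example.com', 50070)]
```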
- Standby namenode service principal
- Use this property to specify the service principal for the host on which the standby namenode is configured. The service principal is required when the realm of the standby namenode host differs from the realm of the user principal that accesses HDFS. The service principal for the web server has the format <B>HTTP/FQDN@REALM</B>.
- Type: string
- Advanced configuration options
- Use the properties in this category to set any advanced configuration options that need to be set for any of the supported file system modes.
- Type: category
- HDFS client parameters
- Use this property to specify HDFS native client configuration parameters as key=value pairs. When jobs run in the Hadoop cluster, these parameters are typically set in core-site.xml or hdfs-site.xml. When you use the NativeHDFS file system mode, you can pass additional configuration parameters to the HDFS client through this property. Separate the parameters with semicolons, as shown in the example: hadoop.rpc.protection=privacy;hadoop.tmp.dir=logdir;
- Type: string
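The key=value list format described above can be illustrated with a small sketch. The parameter names come from the documented example; the helper itself is hypothetical.

```python
def parse_hdfs_client_params(value: str) -> dict[str, str]:
    """Parse 'key=value' pairs separated by semicolons into a dict."""
    params = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # the documented example ends with a semicolon
        key, _, val = pair.partition("=")
        params[key.strip()] = val.strip()
    return params

print(parse_hdfs_client_params(
    "hadoop.rpc.protection=privacy;hadoop.tmp.dir=logdir;"))
# → {'hadoop.rpc.protection': 'privacy', 'hadoop.tmp.dir': 'logdir'}
```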
Usage
- Is reject link
- Select <B>Yes</B> if the link is configured to carry rejected data from the source stage.
- Type: boolean
- Default: false
- Write mode
- Select Write single file to write a file per node, select Write multiple files to write multiple files per node (based on size and/or key value), or select Delete to delete files.
- Type: selection
- Default: Write single file
- Values:
- Write single file
- Write multiple files
- Delete
- Read mode
- Select <B>Read single file</B> to read from a single file or <B>Read multiple files</B> to read from the files that match a specified file prefix.
- Type: selection
- Default: Read multiple files
- Values:
- Read single file
- Read multiple files
- Exclude files
- Specify a comma-separated list of file prefixes to exclude from the files that are read. If a prefix includes a comma, escape the comma by using a backslash (\).
- Type: string
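The comma-escaping rule above can be sketched as follows. The prefixes are illustrative, and this helper only demonstrates one plausible way to honor the backslash escape.

```python
import re

def split_exclude_prefixes(value: str) -> list[str]:
    """Split on commas that are not preceded by a backslash,
    then unescape '\\,' back to a literal comma."""
    parts = re.split(r"(?<!\\),", value)
    return [p.replace("\\,", ",") for p in parts]

print(split_exclude_prefixes(r"tmp_,backup\,old,log_"))
# → ['tmp_', 'backup,old', 'log_']
```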
- Wave handling properties
- Use the properties in this category to define how data is handled when it is streamed as waves (that is, in batches) from the upstream stage. Typically, these properties are required for source stages that are configured to send data in waves.
- Type: category
- Append unique identifier
- Use this property to specify whether a unique identifier is appended to the file name. When this property is set to Yes, the unique identifier is appended to the file name, and a new file is written for every wave of data that is streamed into the stage. When this property is set to No, the file is overwritten on every wave.
- Type: boolean
- Default: false
- File size threshold
- Specify the threshold for the file size in megabytes. Processing nodes start a new file each time the size exceeds the specified threshold and the wave boundary is reached. Because the file is written only on the wave boundary, the threshold is a soft limit; the actual size of the file can exceed the threshold, depending on the size of the wave.
- Type: integer
- Default: 1
- If file exists
- Specify what the connector does when it tries to write a file that already exists. Select <B>Overwrite file</B> to overwrite the existing file, <B>Do not overwrite file</B> to leave the existing file unchanged and stop the job, or <B>Fail</B> to stop the job with an error message.
- Type: selection
- Default: Overwrite file
- Values:
- Overwrite file
- Do not overwrite file
- Fail
- Split file on key changes
- Select Yes to create a new file when the key column value changes. The data must be sorted and partitioned for this option to work properly.
- Type: boolean
- Default: false
- Key column
- Specify the key column to use for splitting files. If not specified, the connector will use the first key column on the link.
- Type: string
- Case sensitive
- Select Yes to make the key value case sensitive.
- Type: boolean
- Default: false
- Use key value in file name
- Select Yes to use the key value in the generated file name.
- Type: boolean
- Default: false
- Exclude partition string
- Select Yes to exclude the partition string each processing node appends to the file name.
- Type: boolean
- Default: false
- Maximum file size
- Specify the maximum file size in megabytes. Processing nodes will start a new file each time the size exceeds this value.
- Type: integer
- Default: 0
- Force sequential
- Select Yes to run the connector sequentially on one node.
- Type: boolean
- Default: false
- Reject mode
- Specify what the connector does when a record that contains invalid data is found in the source file. Select <B>Continue</B> to read the rest of the file, <B>Fail</B> to stop the job with an error message, or <B>Reject</B> to send the rejected data to a reject link.
- Type: selection
- Default: Continue
- Values:
- Continue
- Fail
- Reject
- Cleanup on failure
- If a job fails, select whether the connector deletes the file or files that have been created.
- Type: boolean
- Default: true
- File name column
- Specify the name of the column to write the source file name to.
- Type: string
- File format
- Specify the format of the files to read or write. The implicit file format specifies that the input to the file connector is in binary or string format without a delimiter.
- Type: selection
- Default: Delimited
- Values:
- Delimited
- Comma-separated value (CSV)
- Avro
- Implicit
- orc
- parquet
- sequencefile
- Avro format properties
- Avro configuration properties
- Type: category
- Output as JSON
- Specify whether each row in the Avro file is exported as JSON to a string column.
- Type: boolean
- Default: false
- Avro format properties
- Avro configuration properties
- Type: category
- Input as JSON
- Specify whether each row in the Avro file is imported from a JSON string.
- Type: boolean
- Default: false
- Avro schema file
- Specify the fully qualified path for a JSON file that defines the schema for the Avro file.
- Type: string
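The schema file is a JSON document in the Avro schema format. The sketch below writes a minimal example to a temporary location; the record and field names are made up for illustration.

```python
import json
import os
import tempfile

# Illustrative Avro schema; the record and field names are hypothetical.
schema = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        # A union with "null" makes the field nullable.
        {"name": "balance", "type": ["null", "double"], "default": None},
    ],
}

path = os.path.join(tempfile.gettempdir(), "customer.avsc")
with open(path, "w") as f:
    json.dump(schema, f, indent=2)
print(path)  # pass this fully qualified path as the Avro schema file
```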
- Avro compression codec
- Specify the compression algorithm that will be used to compress the data.
- Type: selection
- Default: None
- Values:
- None
- Deflate
- Snappy
- Bzip2
- Array keys
- If the file format is Avro in a target stage, normalization is controlled through array keys. Specify <B>ITERATE()</B> in the description of the corresponding array field in the column definitions on the input tab of the file connector.
- Type: string
- ORC Settings
- ORC configuration properties
- Type: category
- Stripe Size
- Specify the stripe size for the ORC file.
- Type: integer
- Default: 100000
- Buffer Size
- Specify the buffer size for the ORC file.
- Type: integer
- Default: 10000
- Compression Kind
- Specify the compression mechanism.
- Type: selection
- Default: SNAPPY
- Values:
- NONE
- ZLIB
- SNAPPY
- Parquet settings
- Parquet configuration properties
- Type: category
- Block size
- Specify the Parquet block size.
- Type: integer
- Default: 10000000
- Page size
- Specify the Parquet page size.
- Type: integer
- Default: 10000
- Compression type
- Specify the compression mechanism.
- Type: selection
- Default: SNAPPY
- Values:
- NONE
- SNAPPY
- GZIP
- LZO
- Delimited format properties
- Specify the file syntax for delimited files.
- Type: category
- Record limit
- Specify the maximum number of records to read from the file per node. If a value is not specified for this property, the entire file is read.
- Type: integer
- Encoding
- Specify the encoding of the files to read or write, for example, UTF-8.
- Type: string
- Include byte order mark
- Specify whether to include a byte order mark in the file when the file encoding is a Unicode encoding such as UTF-8, UTF-16, or UTF-32.
- Type: boolean
- Default: false
- Record definition
- Select whether the record definition is provided to the connector from the source file, a delimited string, a file that contains a delimited string, or a schema file. When runtime column propagation is enabled, this metadata provides the column definitions. If a schema file is provided, the schema file overrides the values of formatting properties in the stage and the column definitions that are specified on the Columns page of the output link.
- Type: selection
- Default: None
- Values:
- None
- File header
- Delimited string
- Delimited string in a file
- Schema file
- Infer schema
- Definition source
- If the record definition is a delimited string, enter a delimited string that specifies the names and data types of the fields. Use the format name:data_type, and separate each field with the delimiter that is specified in the <B>Field delimiter</B> property. If the record definition is in a delimited string file or an Osh schema file, specify the full path of the file.
- Type: string
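The name:data_type format can be sketched as follows. The field names and data type names are illustrative, and the helper assumes the default comma field delimiter.

```python
def parse_definition_source(defn: str, field_delim: str = ",") -> list[tuple[str, str]]:
    """Parse a 'name:data_type' list separated by the field delimiter."""
    fields = []
    for item in defn.split(field_delim):
        name, _, dtype = item.strip().partition(":")
        fields.append((name, dtype))
    return fields

print(parse_definition_source("id:int32,name:string,amount:decimal"))
# → [('id', 'int32'), ('name', 'string'), ('amount', 'decimal')]
```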
- First row is header
- Select <B>Yes</B> if the first row of the file contains field headers and is not part of the data. If you select <B>Yes</B>, when the connector writes data, the field names will be the first row of the output. If runtime column propagation is enabled, metadata can be obtained from the first row of the file.
- Type: boolean
- Default: false
- Include data types
- Select <B>Yes</B> to append the data type to each field name that the connector writes in the first row of the output.
- Type: boolean
- Default: false
- Field delimiter
- Specify a string or one of the following values: <NL>, <CR>, <LF>, <TAB>. The string can include Unicode escape strings in the form \uNNNN where NNNN is the Unicode character code.
- Type: string
- Default: ,
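One plausible way to interpret a delimiter value as described above is sketched below. The mapping of the named tokens to control characters is an assumption for illustration (for example, <NL> is assumed to mean a newline).

```python
# Assumed mapping of the named delimiter tokens to control characters.
NAMED = {"<NL>": "\n", "<CR>": "\r", "<LF>": "\n", "<TAB>": "\t"}

def resolve_delimiter(value: str) -> str:
    """Resolve a delimiter property value to the actual character(s)."""
    if value in NAMED:
        return NAMED[value]
    # Decode Unicode escapes such as \u007C (the '|' character).
    return value.encode("ascii").decode("unicode_escape")

print(repr(resolve_delimiter("<TAB>")))    # '\t'
print(repr(resolve_delimiter(r"\u007C")))  # '|'
```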
- Row delimiter
- Specify a string or one of the following values: <NL>, <CR>, <LF>, <TAB>. The string can include Unicode escape strings in the form \uNNNN where NNNN is the Unicode character code.
- Type: string
- Default: <NL>
- Escape character
- Specify the character to use to escape field and row delimiters. If an escape character exists in the data, the escape character is also escaped. Because escape characters require additional processing, do not specify a value for this property if you do not need to include escape characters in the data.
- Type: string
- Quotation mark
- Type: selection
- Default: None
- Values:
- None
- Double
- Single
- Null value
- Specify the character or string that represents null values in the data. For a source stage, input data that has the value that you specify is set to null on the output link. For a target stage, in the output file that is written to the file system, null values are represented by the value that is specified for this property. To specify that an empty string represents a null value, specify "" (two double quotation marks).
- Type: string
- Field formats
- Type: category
- Decimal format
- Specify a string that defines the format for fields that have the Decimal or Numeric data type.
- Type: string
- Date format
- Specify a string that defines the format for fields that have the Date data type.
- Type: string
- Time format
- Specify a string that defines the format for fields that have the Time data type.
- Type: string
- Timestamp format
- Specify a string that defines the format for fields that have the Timestamp data type.
- Type: string
- Implicit format properties
- Specify the file syntax for implicit files.
- Type: category
- Data format
- Specify the type of implicit file
- Type: selection
- Default: Binary
- Values:
- Binary
- Record limit
- Specify the maximum number of records to read from the file per node. If a value is not specified for this property, the entire file is read.
- Type: integer
- Encoding
- Specify the encoding of the files to read or write, for example, UTF-8.
- Type: string
- Include byte order mark
- Specify whether to include a byte order mark in the file when the file encoding is a Unicode encoding such as UTF-8, UTF-16, or UTF-32.
- Type: boolean
- Default: false
- Record definition
- Select whether the record definition is provided to the connector from the source file, a delimited string, a file that contains a delimited string, or a schema file. When runtime column propagation is enabled, this metadata provides the column definitions. If a schema file is provided, the schema file overrides the values of formatting properties in the stage and the column definitions that are specified on the Columns page of the output link.
- Type: selection
- Default: None
- Values:
- None
- File header
- Delimited string
- Delimited string in a file
- Schema file
- Definition source
- Enter a delimited string that specifies the name, data type, and length of each field. Use the format name:data_type[length], and separate each field with the delimiter that is specified in the <B>Field delimiter</B> property. If the record definition is in a delimited string file or an Osh schema file, specify the full path of the file.
- Type: string
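The name:data_type[length] format can be sketched as follows. The field and type names are illustrative; the helper assumes the default comma delimiter.

```python
import re

# One plausible grammar for the 'name:data_type[length]' field definition.
FIELD_RE = re.compile(r"(\w+):(\w+)\[(\d+)\]")

def parse_implicit_fields(defn: str, delim: str = ",") -> list[tuple[str, str, int]]:
    """Parse fields of the form name:data_type[length]."""
    fields = []
    for item in defn.split(delim):
        m = FIELD_RE.fullmatch(item.strip())
        if not m:
            raise ValueError(f"bad field definition: {item!r}")
        name, dtype, length = m.groups()
        fields.append((name, dtype, int(length)))
    return fields

print(parse_implicit_fields("id:int32[4],name:string[20]"))
# → [('id', 'int32', 4), ('name', 'string', 20)]
```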
- First row is header
- Select <B>Yes</B> if the first row of the file contains field headers and is not part of the data. If you select <B>Yes</B>, when the connector writes data, the field names will be the first row of the output. If runtime column propagation is enabled, metadata can be obtained from the first row of the file.
- Type: boolean
- Default: false
- Include data types
- Select <B>Yes</B> to append the data type to each field name that the connector writes in the first row of the output.
- Type: boolean
- Default: false
- Trace file
- Specify the full path to a file to contain trace information from the parser for delimited files. Because writing to a trace file requires additional processing, specify a value for this property only during job development.
- Type: string
- Create or Use existing hive table
- Select <B>Yes</B> to create or use an existing Hive table after data has been loaded to HDFS.
- Type: boolean
- Default: false
- Use staging table
- Select <B>Yes</B> to use a staging table. This option is enabled only when <B>File format</B> is <B>Delimited</B>.
- Type: boolean
- Default: false
- Target table properties
- Type: category
- Table format
- Use this property to set the format of the target table
- Type: selection
- Default: parquet
- Values:
- orc
- parquet
- ORC compression
- Use this property to set the compression type for the target table when the table format is ORC
- Type: selection
- Default: ZLIB
- Values:
- NONE
- ZLIB
- SNAPPY
- Parquet compression
- Use this property to set the compression type for the target table when the table format is Parquet
- Type: selection
- Default: SNAPPY
- Values:
- NONE
- SNAPPY
- GZIP
- LZO
- Stripe size
- Specify the stripe size for the target table.
- Type: integer
- Default: 64
- Table type
- Use this property to set the type of the target table.
- Type: selection
- Default: External
- Values:
- External
- Internal
- Location
- Use this property to set the location of the HDFS files serving as storage for the Hive table
- Type: string
- Drop staging table
- Use this property to drop the staging table. By default, the staging table is dropped after the target table is created. If you do not want the staging table to be removed, set this property to No.
- Type: boolean
- Default: true
- Load into existing table
- Use the properties in this category when you load data into an existing Hive table. The table can be partitioned or non-partitioned.
- Type: category
- Maximum number of dynamic partitions
- Use this property to set the maximum number of dynamic partitions to be created while loading into a partitioned table.
- Type: integer
- Default: 1000
- Enable SSL
- Select <B>Yes</B> if SSL is enabled on the Hive server.
- Type: boolean
- Default: false
- SSL Truststore certificates
- SSL truststore (X.509) certificates in the PEM format.
- Type: string
- Use Kerberos
- Select Yes to use Kerberos
- Type: boolean
- Default: false
- Use keytab
- Select Yes to use keytab instead of password for Kerberos login
- Type: boolean
- Hive host
- Enter the name of the host. If not specified, the value specified in <B>Host name</B> will be used.
- Type: string
- Hive port
- Enter the port for Hive.
- Type: integer
- Hive user name
- Enter the name of the user to connect to Hive as.
- Type: string
- Hive password
- Enter the password for the specified user.
- Type: protected string
- Hive keytab
- Enter the fully qualified path of the keytab for the specified user.
- Type: string
- Hive service principal
- Use this property to specify the service principal for Hive. The service principal for the Hive service has the format <B>hive/FQDN@REALM</B>.
- Type: string
- Table
- Enter the name of the table to create.
- Type: string
- Hive table type
- Specify Hive table type, as external (default) or internal.
- Type: selection
- Default: External
- Values:
- External
- Internal
- Create schema
- Specify Yes to create the schema indicated in the fully qualified table name if it does not already exist. If Yes is specified and the table name does not contain a schema, the job will fail. If Yes is specified and the schema already exists, the job will not fail.
- Type: boolean
- Default: false
- Drop existing table
- Specify <B>Yes</B> to drop the Hive table if it already exists, or <B>No</B> to append to the existing Hive table.
- Type: boolean
- Default: true
- Additional driver attributes
- Specify any additional driver-specific connection attributes. Enter the attributes in the name=value format; if multiple attributes need to be specified, separate them with semicolons. For information about the supported driver-specific attributes, see the Progress DataDirect driver documentation.
- Type: string
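The name=value attribute list can be built as sketched below. The attribute names shown are examples only; consult the Progress DataDirect driver documentation for the names your driver actually supports.

```python
def format_driver_attributes(attrs: dict[str, str]) -> str:
    """Join name=value pairs with semicolons, as the property expects."""
    return ";".join(f"{name}={value}" for name, value in attrs.items())

# Hypothetical attribute names, for illustration only.
print(format_driver_attributes(
    {"EncryptionMethod": "SSL", "ValidateServerCertificate": "0"}))
# → EncryptionMethod=SSL;ValidateServerCertificate=0
```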