General Job Administration environment variables in DataStage
These environment variables are concerned with details about the running of DataStage® and QualityStage® parallel jobs.
APT_CLOBBER_OUTPUT environment variable in DataStage
The APT_CLOBBER_OUTPUT environment variable controls the overwriting of existing files or data sets.
By default, if an output file or data set already exists, DataStage issues an error and stops before the file or data set is overwritten, notifying you of the name conflict. Setting this variable to any value permits DataStage to overwrite existing files or data sets without a warning message.
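The following Python sketch models the documented rule under stated assumptions. It is not DataStage code: write_output and NameConflictError are hypothetical names. The sketch also assumes that "set to any value" means the variable only has to be present in the environment, so even a value such as 0 or false enables overwriting.

```python
import os
from typing import Mapping


class NameConflictError(Exception):
    """Raised when the output already exists and overwriting is not allowed."""


def write_output(path: str, data: str, env: Mapping[str, str]) -> None:
    """Hypothetical model of the APT_CLOBBER_OUTPUT check before writing output."""
    # Assumption: only the presence of the variable matters, not its value.
    clobber_allowed = "APT_CLOBBER_OUTPUT" in env
    if os.path.exists(path) and not clobber_allowed:
        # Default behavior: stop and report the name conflict.
        raise NameConflictError(f"output already exists: {path}")
    # Either the target is new or overwriting is permitted.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(data)
```

To check how your current shell would behave, you could pass os.environ as the env argument.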
APT_CONNECTION_PORT_RANGE environment variable in DataStage
APT_CONNECTION_PORT_RANGE changes the way TCP ports are chosen for interprocess communication. Setting this environment variable can reduce the start-up time for parallel jobs.
By default, the process manager tries to find an available port by sequentially searching for the port. When the APT_CONNECTION_PORT_RANGE environment variable is set, random TCP port numbers are generated for the conductor and player processes.
For conductor processes, random port numbers are generated in the range from the value of the APT_PM_STARTUP_PORT environment variable to that value plus the value of the APT_CONNECTION_PORT_RANGE environment variable. The default value of the APT_PM_STARTUP_PORT environment variable is 10000.
For player processes, random port numbers are generated in the range from the value of the APT_PLAYER_CONNECTION_PORT environment variable to that value plus the value of the APT_CONNECTION_PORT_RANGE environment variable. The default value of the APT_PLAYER_CONNECTION_PORT environment variable is 11000.
- If a random port is available, the process uses it to bind.
- If a random port is not available, the process tries the next port in the range.
- If the process reaches the end of the range without finding an available port, it wraps around to the beginning of the range and continues trying the next port.
- If none of the ports in the range are available, then the job fails.
- If the environment variable is set to zero or to an incorrect value, the range is reset so that it extends up to port 65535, the highest TCP port number. For example, with a starting port of 10000, the range becomes 65535 - 10000 = 55535. Setting the variable to zero is the best way to get the system to choose from the largest number of ports.
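The selection rules above can be sketched in Python as follows. This is a hypothetical model, not the process manager's implementation. effective_range and pick_port are illustrative names, and the is_free callback stands in for an actual bind attempt. The sketch makes two further assumptions: both ends of the range are usable, and an "incorrect value" is anything other than a positive integer that keeps the range at or below port 65535.

```python
import random
from typing import Callable, Optional

MAX_TCP_PORT = 65535


def effective_range(start_port: int, raw_value: str) -> int:
    """Return the size of the port range to search.

    Assumption: zero, a non-integer, a negative number, or a value that would
    run past port 65535 is treated as incorrect and stretches the range up to
    port 65535.
    """
    try:
        value = int(raw_value)
    except ValueError:
        value = 0
    if value <= 0 or start_port + value > MAX_TCP_PORT:
        return MAX_TCP_PORT - start_port
    return value


def pick_port(start_port: int, port_range: int,
              is_free: Callable[[int], bool],
              rng: random.Random) -> Optional[int]:
    """Pick a random port in [start_port, start_port + port_range], then scan
    forward one port at a time, wrapping to the start of the range.

    Returns None when no port in the range is free; the job would then fail.
    """
    size = port_range + 1  # assumption: both ends of the range are usable
    first = rng.randrange(size)
    for offset in range(size):
        candidate = start_port + (first + offset) % size
        if is_free(candidate):
            return candidate
    return None
```

In a real process, is_free would typically attempt socket.bind() on the candidate port. When each search starts at a random offset, concurrent processes are less likely to probe the same ports in the same order. That is one plausible reason the variable can reduce start-up time.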
APT_CONFIG_FILE environment variable in DataStage
The APT_CONFIG_FILE environment variable specifies the path name of the configuration file.
You might want to include this environment variable as a job parameter, so that you can specify the configuration file at job run time.
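The following examples show configuration files that define one, three, and five processing nodes. For example, you might open a one-node configuration file in an editor from the shell:
sh-4.4$ nano /px-storage/config/1node.apt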
{
node "node1"
{
fastname "$conductor"
pools ""
resource disk "/opt/ibm/PXService/Server/Datasets" {pool ""}
resource disk "/opt/ibm/PXService/Server/pds_files/node1" {pool "" "export" "node1" "node1a"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pools ""}
}
}
Example of a three-node configuration file:
{
node "node1"
{
fastname "$conductor"
pools "conductor"
resource disk "/px-storage/pds_files/node1" {pool "" "export" "node1"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
}
node "node2"
{
fastname "$pod"
pools ""
resource disk "/px-storage/pds_files/node2" {pool "" "export" "node2"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
}
node "node3"
{
fastname "$pod"
pools ""
resource disk "/px-storage/pds_files/node3" {pool "" "export" "node3"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
}
}
Example of a five-node configuration file:
{
node "node1"
{
fastname "$conductor"
pools "conductor"
resource disk "/px-storage/pds_files/node1" {pool "" "export" "node1"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
}
node "node2"
{
fastname "$pod"
pools ""
resource disk "/px-storage/pds_files/node2" {pool "" "export" "node2"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
}
node "node3"
{
fastname "$pod"
pools ""
resource disk "/px-storage/pds_files/node3" {pool "" "export" "node3"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
}
node "node4"
{
fastname "$pod"
pools ""
resource disk "/px-storage/pds_files/node4" {pool "" "export" "node4"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
}
node "node5"
{
fastname "$pod"
pools ""
resource disk "/px-storage/pds_files/node5" {pool "" "export" "node5"}
resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
}
}
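If you maintain several configuration files, you might generate them rather than edit them by hand. The following Python sketch is a hypothetical helper that reproduces the layout of the three-node and five-node examples above. make_config and node_block are illustrative names, not part of DataStage. In the generated files, node1 runs on $conductor in the conductor pool, and every additional node runs on $pod. The $conductor and $pod values are copied literally from the examples; the sketch does not interpret them.

```python
def node_block(name: str, fastname: str, pools: str,
               disk_root: str, scratch: str) -> str:
    """Render one node definition in the layout used by the examples."""
    return "\n".join([
        f'node "{name}"',
        "{",
        f'fastname "{fastname}"',
        f'pools "{pools}"',
        # Doubled braces produce literal { } in the f-string output.
        f'resource disk "{disk_root}/{name}" {{pool "" "export" "{name}"}}',
        f'resource scratchdisk "{scratch}" {{pool ""}}',
        "}",
    ])


def make_config(node_count: int,
                disk_root: str = "/px-storage/pds_files",
                scratch: str = "/opt/ibm/PXService/Server/scratch") -> str:
    """Hypothetical generator: node1 on the conductor, the rest on pods."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    blocks = [node_block("node1", "$conductor", "conductor", disk_root, scratch)]
    for i in range(2, node_count + 1):
        blocks.append(node_block(f"node{i}", "$pod", "", disk_root, scratch))
    # The whole file is wrapped in one outer pair of braces.
    return "{\n" + "\n".join(blocks) + "\n}\n"
```

You could write the returned text to a file and set APT_CONFIG_FILE to that file's path.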
APT_DISABLE_COMBINATION environment variable in DataStage
The APT_DISABLE_COMBINATION environment variable globally disables operator combining.
Operator combining is the default behavior of DataStage, in which any number of operators within a step are combined into one process where possible.
You might need to disable combining to facilitate debugging. Disabling combining generates more UNIX processes, and hence requires more system resources and memory. It also disables internal optimizations for job efficiency and run times.
APT_DONT_COMPRESS_BOUNDED_FIELDS environment variable in DataStage
Set the APT_DONT_COMPRESS_BOUNDED_FIELDS environment variable to suppress the conversion of bounded length fields to variable length fields when a copy operator writes to a data set.
APT_FILE_EXPORT_ADD_BOM environment variable in DataStage
Set the APT_FILE_EXPORT_ADD_BOM environment variable to direct the export operator to insert a byte order mark of the specified type at the beginning of the output file. You can specify one of the following values:
- none
- utf8
- utf16be
- utf16le
- utf32be
- utf32le
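Each value corresponds to a standard Unicode byte order mark. The following Python sketch uses the BOM constants in the standard codecs module to show the bytes that each setting is expected to place at the start of the file. bom_for is a hypothetical helper. The sketch covers only the BOM itself, not how the export operator encodes the rest of the data.

```python
import codecs

# Bytes each APT_FILE_EXPORT_ADD_BOM value is expected to write at the start
# of the output file (standard Unicode byte order marks).
BOM_BY_SETTING = {
    "none": b"",                      # no byte order mark
    "utf8": codecs.BOM_UTF8,          # EF BB BF
    "utf16be": codecs.BOM_UTF16_BE,   # FE FF
    "utf16le": codecs.BOM_UTF16_LE,   # FF FE
    "utf32be": codecs.BOM_UTF32_BE,   # 00 00 FE FF
    "utf32le": codecs.BOM_UTF32_LE,   # FF FE 00 00
}


def bom_for(setting: str) -> bytes:
    """Return the BOM bytes for a setting; hypothetical helper for illustration."""
    if setting not in BOM_BY_SETTING:
        raise ValueError(f"unsupported APT_FILE_EXPORT_ADD_BOM value: {setting!r}")
    return BOM_BY_SETTING[setting]
```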
APT_IMPORT_FORCE_QUOTE_DELIM environment variable in DataStage
Set the APT_IMPORT_FORCE_QUOTE_DELIM environment variable so that the import operator recognizes a quotation mark character as a closing quotation mark only when it is followed by the field's delimiter character.
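The following Python sketch illustrates the difference on a single record. split_record is a hypothetical function, not the import operator. With force_quote_delim=True, a quotation mark closes a quoted field only when the next character is the delimiter or the end of the record; treating end of record as a delimiter is an assumption. With force_quote_delim=False, the sketch closes the field at the first quotation mark and rejects any character other than the delimiter after it. That default is also an assumption, chosen only to make the contrast visible.

```python
from typing import List


def split_record(record: str, delim: str = ",", quote: str = '"',
                 force_quote_delim: bool = False) -> List[str]:
    """Hypothetical splitter for one record of optionally quoted fields."""
    fields: List[str] = []
    i, n = 0, len(record)
    while True:
        if i < n and record[i] == quote:
            j = record.find(quote, i + 1)
            # Forced mode: a quote not followed by the delimiter (or the end
            # of the record) is treated as data, so keep searching.
            while force_quote_delim and j != -1 and j + 1 < n and record[j + 1] != delim:
                j = record.find(quote, j + 1)
            if j == -1:
                raise ValueError("unterminated quoted field")
            fields.append(record[i + 1:j])
            i = j + 1
            if i < n and record[i] != delim:
                raise ValueError(f"unexpected {record[i]!r} after closing quote")
        else:
            # Unquoted field: everything up to the next delimiter.
            j = record.find(delim, i)
            end = n if j == -1 else j
            fields.append(record[i:end])
            i = end
        if i >= n:
            return fields
        i += 1  # step over the delimiter
```

For example, split_record('"say "hi" now",2', force_quote_delim=True) returns ['say "hi" now', '2']. The same call without the flag raises ValueError, because the quotation mark before hi is taken as the closing quotation mark.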
APT_ORCHHOME environment variable in DataStage
The APT_ORCHHOME environment variable must be set by all DataStage users to point to the top-level directory of the parallel engine installation.
APT_RECORD_TIMEOUT environment variable in DataStage
Set the APT_RECORD_TIMEOUT environment variable to define the number of seconds that the parallel engine (PX) framework waits before timing out when a virtual data set does not process a record.
A value less than or equal to zero disables the timeout.
APT_STARTUP_SCRIPT environment variable in DataStage
Set the APT_STARTUP_SCRIPT environment variable to run a startup shell script on all DataStage processing nodes before your job is run.
As part of running a job, DataStage creates a remote shell on all DataStage processing nodes on which the job runs. By default, the remote shell is given the same environment as the shell from which DataStage is started. However, you can write an optional startup shell script to modify the shell configuration of one or more processing nodes. If a startup script exists, DataStage runs it on remote shells before it runs your job.
The APT_STARTUP_SCRIPT environment variable specifies the script to be run. If it is not defined, DataStage searches ./startup.apt, $APT_ORCHHOME/etc/startup.apt and $APT_ORCHHOME/etc/startup, in that order. The APT_NO_STARTUP_SCRIPT environment variable disables running the startup script.
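The lookup order can be sketched in Python as follows. find_startup_script is a hypothetical helper that only models the documented order. It makes two assumptions: APT_NO_STARTUP_SCRIPT takes precedence over APT_STARTUP_SCRIPT, and an explicitly named script is used without checking whether it exists.

```python
import os
from typing import Mapping, Optional


def find_startup_script(env: Mapping[str, str], cwd: str = ".") -> Optional[str]:
    """Return the startup script that would run, or None if none applies."""
    if "APT_NO_STARTUP_SCRIPT" in env:
        return None  # running the startup script is disabled
    explicit = env.get("APT_STARTUP_SCRIPT")
    if explicit:
        return explicit  # assumption: used as given, existence not checked
    # Documented search order when APT_STARTUP_SCRIPT is not defined.
    candidates = [os.path.join(cwd, "startup.apt")]
    home = env.get("APT_ORCHHOME", "")
    if home:
        candidates += [os.path.join(home, "etc", "startup.apt"),
                       os.path.join(home, "etc", "startup")]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```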
APT_NO_STARTUP_SCRIPT environment variable in DataStage
The APT_NO_STARTUP_SCRIPT environment variable prevents DataStage from running a startup script before it runs your job.
By default, the APT_NO_STARTUP_SCRIPT environment variable is not set, and DataStage runs the startup script. If this variable is set, DataStage ignores the startup script. This setting might be useful when you are debugging a startup script. See also APT_STARTUP_SCRIPT.
APT_STARTUP_STATUS environment variable in DataStage
Set the APT_STARTUP_STATUS environment variable to cause messages to be generated as parallel job startup moves from phase to phase. This setting can be useful as a diagnostic if parallel job startup is failing.
QSM_DISABLE_DISTRIBUTE_COMPONENT environment variable in DataStage
Set the QSM_DISABLE_DISTRIBUTE_COMPONENT environment variable to ensure that control files for QualityStage jobs are not copied from the conductor node to one or more compute nodes.
In MPP or grid environments, the conductor and compute nodes might not share the project directory. If the project directory is not shared, some QualityStage jobs must copy control files from the conductor node to the compute nodes.
If the project directory is shared, the control files do not need to be copied. Copying the control files can cause file access issues. Set this environment variable to ensure that control files are not copied.
If you configure your grid environment to run DataStage jobs, this environment variable is set by default.
OSH_JOB_START_TIMEOUT environment variable in DataStage
Set the OSH_JOB_START_TIMEOUT environment variable to specify the number of seconds that the OshWrapper waits for the job to start before terminating it.
The default value is 600 seconds.
APT_JOB_MONITOR_TIMEOUT environment variable in DataStage
Set the APT_JOB_MONITOR_TIMEOUT environment variable to specify the number of minutes that a job can be idle before the runtime terminates it.
The default value is 120 minutes.
A value of 0 or less disables the runtime monitor for the job.
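The following Python sketch shows how the documented default and the disabling rule combine. idle_limit_seconds and should_terminate are hypothetical names. The sketch assumes that a job is terminated only when its idle time exceeds the limit, not when it equals it.

```python
from typing import Mapping, Optional

DEFAULT_IDLE_MINUTES = 120  # documented default for APT_JOB_MONITOR_TIMEOUT


def idle_limit_seconds(env: Mapping[str, str]) -> Optional[int]:
    """Return the idle limit in seconds, or None when the monitor is disabled."""
    raw = env.get("APT_JOB_MONITOR_TIMEOUT")
    minutes = DEFAULT_IDLE_MINUTES if raw is None else int(raw)
    if minutes <= 0:
        return None  # a value of 0 or less disables the runtime monitor
    return minutes * 60  # the variable is expressed in minutes


def should_terminate(idle_seconds: float, env: Mapping[str, str]) -> bool:
    """Assumption: termination happens only once the idle time exceeds the limit."""
    limit = idle_limit_seconds(env)
    return limit is not None and idle_seconds > limit
```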
APT_CACHE_VAULT_SECRETS environment variable in DataStage
Enables in-memory caching of vault secrets on the runtime instance so that subsequent runs can reuse them.
You can enable caching of vault secrets that are fetched during the job by setting the APT_CACHE_VAULT_SECRETS environment variable. The secrets are cached for a default period of 1 hour. You can clear this cache by running the cpdctl dsjob clear-vault-cache command, after which a status message is displayed.
cpdctl dsjob clear-vault-cache
...
Cache cleared successfully
Status code = 0
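Conceptually, the cache behaves like a time-to-live (TTL) cache. The following Python sketch is an illustration only. SecretCache, fetch, and clock are hypothetical names, and the real cache on the runtime instance might key and expire entries differently. The clear method plays the role of the cpdctl dsjob clear-vault-cache command.

```python
import time
from typing import Callable, Dict, Tuple

DEFAULT_TTL_SECONDS = 3600  # documented default caching period: 1 hour


class SecretCache:
    """Hypothetical in-memory TTL cache for secrets fetched from a vault."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl: float = DEFAULT_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, expiry)

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: reuse without contacting the vault
        value = self._fetch(name)
        self._entries[name] = (value, now + self._ttl)
        return value

    def clear(self) -> None:
        """Drop every cached secret, analogous to clear-vault-cache."""
        self._entries.clear()
```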
APT_DISABLE_ROOT_FORKJOIN environment variable in DataStage
APT_OVERRIDE_SYSTEM_LC_ALL environment variable in DataStage
This environment variable overrides the system LC_ALL setting. This variable has a String type.
APT_OVERRIDE_SYSTEM_LC_NUMERIC environment variable in DataStage
This environment variable overrides the system LC_NUMERIC setting. This variable has a String type.
APT_MAX_MSG_SIZE environment variable in DataStage
APT_MAX_MSG_SIZE specifies the maximum allowable message size for transferring messages between players. This variable is only meaningful when used in combination with APT_MAX_TRANSPORT_BLOCK_SIZE and APT_AUTO_TRANSPORT_BLOCK_SIZE. Its default value is “131072”. This variable has a Number type.
APT_MONITOR_MINTIME environment variable in DataStage
APT_MONITOR_MINTIME defines the minimum time interval, in seconds, that must elapse between successive internal row-count-based monitor updates. Its default value is “10”. This variable has a Number type.
APT_MONITOR_SIZE environment variable in DataStage
APT_MONITOR_SIZE defines the interval, in rows, between job monitor updates. Its default value is “50000”. This variable has a Number type.
APT_MONITOR_TIME environment variable in DataStage
APT_MONITOR_TIME defines the interval in seconds between job monitor updates. Its default value is “10”. This variable has a Number type.
APT_NO_ONE_NODE_COMBINING_OPTIMIZATION environment variable in DataStage
APT_NO_ONE_NODE_COMBINING_OPTIMIZATION controls whether the PX Engine performs extra optimization to combine as many processes as possible on a one-node configuration. Its default value is True. This variable has a Boolean type.
APT_NO_TRANSFER_BINDING environment variable in DataStage
APT_NO_TRANSFER_BINDING turns off a copy elimination optimization in the transfer logic. Its default value is False. This variable has a Boolean type.
APT_PARAM_VALUE_FILE environment variable in DataStage
APT_PARAM_VALUE_FILE provides remote engine support for file-based credentials. This variable has a String type.
APT_SCRATCH_COMPRESSION_BLOCK_SIZE environment variable in DataStage
APT_SCRATCH_COMPRESSION_BLOCK_SIZE defines the size of the data to compress and decompress as an individual block when sort data is written to scratch files. The value is specified in bytes, with a minimum value of 1 MB. Setting this variable helps reduce memory utilization when scratch compression is enabled by defining APT_TSORT_SCRATCH_COMPRESSION. When this variable is not defined, the whole scratch file is compressed and decompressed at one time. This variable has a Number type.
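The following Python sketch illustrates why block-wise compression bounds memory use: each block is compressed independently, so a reader needs only one decompressed block in memory at a time. The sketch uses zlib as a stand-in codec and a 4-byte length prefix per block, and neither is confirmed to match the scratch file format. write_blocks and read_blocks are hypothetical names. Treating 1 MB as 1,048,576 bytes and raising smaller sizes to that minimum are also assumptions.

```python
import struct
import zlib
from typing import BinaryIO, Iterator

MIN_BLOCK_SIZE = 1024 * 1024  # assumption: documented 1 MB minimum as 2**20 bytes


def write_blocks(data: bytes, out: BinaryIO, block_size: int = MIN_BLOCK_SIZE) -> int:
    """Compress data as independent blocks, each prefixed by its compressed length."""
    block_size = max(block_size, MIN_BLOCK_SIZE)  # assumption: enforce the minimum
    count = 0
    for start in range(0, len(data), block_size):
        packed = zlib.compress(data[start:start + block_size])
        out.write(struct.pack(">I", len(packed)))  # 4-byte big-endian length
        out.write(packed)
        count += 1
    return count


def read_blocks(src: BinaryIO) -> Iterator[bytes]:
    """Yield one decompressed block at a time, bounding memory by the block size."""
    while True:
        header = src.read(4)
        if not header:
            return
        (length,) = struct.unpack(">I", header)
        yield zlib.decompress(src.read(length))
```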
APT_TRANSFORM_STAGEVARS_ALWAYS_NULLABLE environment variable in DataStage
Setting this environment variable causes null-handling code to be generated for stage variables in all cases, even if the variable might never be set to null.
It is required for complete correctness because, by default, null-handling code is generated only from the point at which it is detected that the variable might contain null.
Code that is generated for the variable before that point has no null-handling code. Because stage variables retain their values across records, a variable might contain null in code that does not expect it, which leads to incorrect behavior. The behavior of APT_TRANSFORM_STAGEVARS_ALWAYS_NULLABLE is not the default because of the large potential change to generated code and the difficulty of building enough tests to help ensure correctness.
Its default value is False. This variable has a Boolean type.
CC_JVM_OPTIONS environment variable in DataStage
Set this variable to override the JVM options that are passed when connectors run. This variable has a String type.