General Job Administration environment variables in DataStage

These environment variables control how DataStage® and QualityStage® parallel jobs are run.

APT_CLOBBER_OUTPUT environment variable in DataStage

The APT_CLOBBER_OUTPUT environment variable controls the overwriting of existing files or data sets.

By default, if an output file or data set already exists, DataStage issues an error and stops before the file or data set is overwritten, notifying you of the name conflict. Setting this variable to any value permits DataStage to overwrite existing files or data sets without a warning message.
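The following minimal shell sketch shows the rule described above: an existing target is an error unless APT_CLOBBER_OUTPUT is set. The function name check_output_target is hypothetical and is not part of DataStage. Treating an empty value as "set" is an assumption, because the text only says "any value".

```shell
#!/bin/sh
# Hypothetical pre-flight check (not a DataStage command).
# ${APT_CLOBBER_OUTPUT+set} expands to "set" whenever the variable is
# defined, even with an empty value (assumption: empty counts as "any value").
check_output_target() {
  if [ -e "$1" ] && [ -z "${APT_CLOBBER_OUTPUT+set}" ]; then
    echo "error: output '$1' already exists" >&2
    return 1
  fi
  return 0
}
```

With APT_CLOBBER_OUTPUT unset, check_output_target fails for a path that already exists and succeeds for a new one. After APT_CLOBBER_OUTPUT=1, it succeeds for both.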

APT_CONNECTION_PORT_RANGE environment variable in DataStage

APT_CONNECTION_PORT_RANGE changes the way TCP ports are chosen for interprocess communication. Setting this environment variable can reduce the start-up time for parallel jobs.

By default, the process manager tries to find an available port by sequentially searching for the port. When the APT_CONNECTION_PORT_RANGE environment variable is set, random TCP port numbers are generated for the conductor and player processes.

For conductor processes, random port numbers are generated in the range from the value of the APT_PM_STARTUP_PORT environment variable to that value plus the value of the APT_CONNECTION_PORT_RANGE environment variable. The default value of the APT_PM_STARTUP_PORT environment variable is 10000.

For player processes, random port numbers are generated in the range from the value of the APT_PLAYER_CONNECTION_PORT environment variable to that value plus the value of the APT_CONNECTION_PORT_RANGE environment variable. The default value of the APT_PLAYER_CONNECTION_PORT environment variable is 11000.

Setting the APT_CONNECTION_PORT_RANGE environment variable results in the following behavior:
  • If the randomly chosen port is available, the process binds to it.
  • If the randomly chosen port is not available, the process tries the next port in the range.
  • If the process reaches the end of the range without finding a port that is available for binding, it goes back to the beginning of the range and continues with the next available port.
  • If none of the ports in the range are available, the job fails.
  • If the environment variable is set to zero or to an invalid value, the range is reset to 65535 minus the starting port. For example, if the starting port is 10000 (the APT_PM_STARTUP_PORT default), the range becomes 65535 - 10000 = 55535. Setting the variable to zero is the best way to let the system choose from the largest number of ports.
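The search steps in the list above can be sketched in shell as follows. This is a simulation only: pick_port, its OFFSET argument (standing in for the random choice), and its list of busy ports are hypothetical. The sketch does not open sockets and is not the DataStage process manager. It also assumes that the range is half-open, from START up to but not including START + RANGE, and that zero or a non-numeric value means a range of 65535 minus START. Both are readings of the text, not confirmed behavior.

```shell
#!/bin/sh
# Hypothetical simulation of the documented port search.
#   pick_port START RANGE OFFSET [BUSY_PORT...]
#   START  - APT_PM_STARTUP_PORT (conductor) or APT_PLAYER_CONNECTION_PORT (player)
#   RANGE  - APT_CONNECTION_PORT_RANGE; 0 or non-numeric -> 65535 - START (assumption)
#   OFFSET - the "random" offset into the range, passed in so results repeat
#   BUSY   - ports to treat as unavailable for binding
pick_port() {
  start=$1; range=$2; offset=$3
  shift 3
  case $range in
    ''|*[!0-9]*|0) range=$((65535 - start)) ;;
  esac
  i=0
  while [ "$i" -lt "$range" ]; do
    # Random port first, then the next ports, wrapping back to START.
    port=$((start + (offset + i) % range))
    busy=no
    for b in "$@"; do
      if [ "$b" -eq "$port" ]; then busy=yes; fi
    done
    if [ "$busy" = no ]; then
      echo "$port"
      return 0
    fi
    i=$((i + 1))
  done
  # Every port in the range was busy: the job would fail.
  echo "no available port in $start..$((start + range - 1))" >&2
  return 1
}
```

For example, pick_port 10000 10 3 10003 10004 prints 10005, because the random choice and the port after it are busy. pick_port 10000 3 2 10002 prints 10000, because the search wraps to the start of the range. The offset is an argument, not $RANDOM (which is a bash/ksh feature), so that the output is repeatable.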

APT_CONFIG_FILE environment variable in DataStage

The APT_CONFIG_FILE environment variable specifies the path name of the configuration file.

You might want to include this environment variable as a job parameter, so that you can specify the configuration file at job run time.

The following examples show 1-node, 3-node, and 5-node configuration files.
1-node configuration file (for example, /px-storage/config/1node.apt):
{
 node "node1"
 {
  fastname "$conductor"
  pools ""
  resource disk "/opt/ibm/PXService/Server/Datasets" {pool ""}
  resource disk "/opt/ibm/PXService/Server/pds_files/node1" {pool "" "export" "node1" "node1a"}
  resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pools ""}
 } 
}

3-node configuration file:
{
  node "node1"
  {
   fastname "$conductor" 
   pools "conductor"
   resource disk "/px-storage/pds_files/node1" {pool "" "export" "node1"}
   resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
  }
  node "node2"
  {
   fastname "$pod" 
   pools ""
   resource disk "/px-storage/pds_files/node2" {pool "" "export" "node2"}
   resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
  }
  node "node3"
  {
   fastname "$pod" 
   pools ""
   resource disk "/px-storage/pds_files/node3" {pool "" "export" "node3"}
   resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
  }
}

5-node configuration file:
{
  node "node1"
  {
   fastname "$conductor" 
   pools "conductor"
   resource disk "/px-storage/pds_files/node1" {pool "" "export" "node1"}
   resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
  }
  node "node2"
  {
   fastname "$pod" 
   pools ""
   resource disk "/px-storage/pds_files/node2" {pool "" "export" "node2"}
   resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
  }
  node "node3"
  {
   fastname "$pod" 
   pools ""
   resource disk "/px-storage/pds_files/node3" {pool "" "export" "node3"}
   resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
  } 
  node "node4"
  {
   fastname "$pod" 
   pools ""
   resource disk "/px-storage/pds_files/node4" {pool "" "export" "node4"}
   resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
  }
  node "node5"
  {
   fastname "$pod" 
   pools ""
   resource disk "/px-storage/pds_files/node5" {pool "" "export" "node5"}
   resource scratchdisk "/opt/ibm/PXService/Server/scratch" {pool ""}
  }
}
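When APT_CONFIG_FILE is exposed as a job parameter, it can help to check how many nodes a candidate file declares before you choose it. The helper below is a hypothetical sketch, not a DataStage tool. It counts lines of the form node "name", so it assumes that each node declaration starts on its own line, as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: count the `node "name"` declarations in a parallel
# configuration file. Takes the path as $1, or falls back to APT_CONFIG_FILE.
count_config_nodes() {
  cfg=${1:-${APT_CONFIG_FILE:-}}
  if [ -z "$cfg" ] || [ ! -r "$cfg" ]; then
    echo "configuration file not set or not readable: '$cfg'" >&2
    return 1
  fi
  # Lines such as `  node "node2"`; fastname/resource lines do not match.
  grep -c '^[[:space:]]*node[[:space:]]*"' "$cfg"
}
```

Run against the 3-node example, this helper would print 3. Run against the 5-node example, it would print 5.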

APT_DISABLE_COMBINATION environment variable in DataStage

The APT_DISABLE_COMBINATION environment variable globally disables operator combining.

Operator combining is the default behavior of DataStage, in which any number of operators within a step are combined into one process where possible.

You might need to disable combining to facilitate debugging. Disabling combining generates more UNIX processes, and hence requires more system resources and memory. It also disables internal optimizations for job efficiency and run times.

APT_DONT_COMPRESS_BOUNDED_FIELDS environment variable in DataStage

Set the APT_DONT_COMPRESS_BOUNDED_FIELDS environment variable to suppress the conversion of bounded length fields to variable length fields when a copy operator writes to a data set.

If the APT_DONT_COMPRESS_BOUNDED_FIELDS environment variable is not set, a modify adapter is generated in copy operators that are writing file data sets to convert bounded length fields to variable length fields for storage in the data set. The data set contains a modify adapter to convert the variable length fields back to bounded length fields.

APT_FILE_EXPORT_ADD_BOM environment variable in DataStage

Set the APT_FILE_EXPORT_ADD_BOM environment variable to direct the export operator to insert a byte order mark of the specified type at the beginning of the output file.

Set the APT_FILE_EXPORT_ADD_BOM environment variable to one of these values:
  • none
  • utf8
  • utf16be
  • utf16le
  • utf32be
  • utf32le
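Each value names a standard Unicode byte order mark. The following sketch maps each value to its bytes and checks whether an exported file starts with the UTF-8 mark. The function names bom_hex and has_utf8_bom are hypothetical. The byte sequences are the standard Unicode BOM encodings, not values read from DataStage.

```shell
#!/bin/sh
# Hypothetical helper: print, in hex, the Unicode BOM that an
# APT_FILE_EXPORT_ADD_BOM value names; "none" prints an empty line.
bom_hex() {
  case $1 in
    none)    echo "" ;;
    utf8)    echo "EF BB BF" ;;
    utf16be) echo "FE FF" ;;
    utf16le) echo "FF FE" ;;
    utf32be) echo "00 00 FE FF" ;;
    utf32le) echo "FF FE 00 00" ;;
    *)       echo "unsupported value: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeed if FILE begins with the UTF-8 BOM (EF BB BF).
has_utf8_bom() {
  [ "$(od -An -tx1 -N3 "$1" | tr -d ' \n')" = "efbbbf" ]
}
```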

APT_IMPORT_FORCE_QUOTE_DELIM environment variable in DataStage

Set the APT_IMPORT_FORCE_QUOTE_DELIM environment variable so that import recognizes only a closing quotation mark character that is followed by the field's delimiter character.
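The following sketch illustrates this rule on a single record. It uses awk inside a shell function. split_forced_quotes is hypothetical and is not the DataStage import operator. The sketch makes three assumptions. A quotation mark opens a quoted field only at the start of a field. The delimiter must come immediately after the closing quotation mark. The end of the record also counts as a valid closing position.

```shell
#!/bin/sh
# Hypothetical sketch: split one record on delimiter $2 (default ",").
# A quote that closes a field must be followed by the delimiter or by the
# end of the record; any other quote inside the field is kept as data.
# Prints one field per line.
split_forced_quotes() {
  printf '%s\n' "$1" | awk -v d="${2:-,}" '{
    n = length($0); field = ""; inq = 0
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      nxt = (i < n) ? substr($0, i + 1, 1) : ""
      if (!inq && c == "\"" && field == "") {
        inq = 1                      # opening quote at start of field
      } else if (inq && c == "\"" && (nxt == d || nxt == "")) {
        inq = 0                      # closing quote: delimiter or end follows
      } else if (!inq && c == d) {
        print field; field = ""      # unquoted delimiter ends the field
      } else {
        field = field c              # ordinary character or embedded quote
      }
    }
    print field
  }'
}
```

For the record "5" screen",TV, this sketch returns two fields: 5" screen and TV. The quotation mark after 5 is followed by a space, not the delimiter, so it is kept as data.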

APT_ORCHHOME environment variable in DataStage

The APT_ORCHHOME environment variable must be set by all DataStage users to point to the top-level directory of the parallel engine installation.

APT_RECORD_TIMEOUT environment variable in DataStage

Set the APT_RECORD_TIMEOUT environment variable to define the number of seconds that the parallel engine (PX) framework waits for a virtual data set to process a record before timing out.

A value less than or equal to zero disables the timeout.

APT_STARTUP_SCRIPT environment variable in DataStage

Set the APT_STARTUP_SCRIPT environment variable to run a startup shell script on all DataStage processing nodes before your job is run.

As part of running a job, DataStage creates a remote shell on all DataStage processing nodes on which the job runs. By default, the remote shell is given the same environment as the shell from which DataStage is started. However, you can write an optional startup shell script to modify the shell configuration of one or more processing nodes. If a startup script exists, DataStage runs it on remote shells before it runs your job.

The APT_STARTUP_SCRIPT environment variable specifies the script to be run. If it is not defined, DataStage searches for ./startup.apt, $APT_ORCHHOME/etc/startup.apt, and $APT_ORCHHOME/etc/startup, in that order. The APT_NO_STARTUP_SCRIPT environment variable disables running the startup script.
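The following sketch shows the lookup order described above. It only reports which script would be chosen and does not run any script. The function resolve_startup_script is hypothetical. The sketch makes three assumptions. An empty APT_STARTUP_SCRIPT counts as "not defined". APT_NO_STARTUP_SCRIPT counts as set even when its value is empty. The first file found in the search order is the one used.

```shell
#!/bin/sh
# Hypothetical helper: print the startup script that the documented search
# order would pick, or fail if startup scripts are disabled or none exists.
resolve_startup_script() {
  if [ -n "${APT_NO_STARTUP_SCRIPT+set}" ]; then
    return 1                      # APT_NO_STARTUP_SCRIPT disables the script
  fi
  if [ -n "${APT_STARTUP_SCRIPT:-}" ]; then
    echo "$APT_STARTUP_SCRIPT"    # explicit setting wins
    return 0
  fi
  set -- ./startup.apt
  if [ -n "${APT_ORCHHOME:-}" ]; then
    set -- "$@" "$APT_ORCHHOME/etc/startup.apt" "$APT_ORCHHOME/etc/startup"
  fi
  for f in "$@"; do               # first existing file in search order
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1                        # no startup script found
}
```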

APT_NO_STARTUP_SCRIPT environment variable in DataStage

The APT_NO_STARTUP_SCRIPT environment variable prevents DataStage from running a startup script before it runs your job.

By default, the APT_NO_STARTUP_SCRIPT environment variable is not set, and DataStage runs the startup script. If this variable is set, DataStage ignores the startup script. This setting might be useful when you are debugging a startup script. See also APT_STARTUP_SCRIPT.

APT_STARTUP_STATUS environment variable in DataStage

Set the APT_STARTUP_STATUS environment variable to cause messages to be generated as parallel job startup moves from phase to phase.

This setting can be useful as a diagnostic if parallel job startup is failing.

QSM_DISABLE_DISTRIBUTE_COMPONENT environment variable in DataStage

Set the QSM_DISABLE_DISTRIBUTE_COMPONENT environment variable to ensure that control files for QualityStage jobs are not copied from the conductor node to one or more compute nodes.

In MPP or grid environments, the conductor and compute nodes might not share the project directory. If the project directory is not shared, some QualityStage jobs must copy control files from the conductor node to the compute nodes.

If the project directory is shared, the control files do not need to be copied, and copying them can cause file access issues. Set this environment variable to ensure that control files are not copied.

If you configure your grid environment to run DataStage jobs, this environment variable is set by default.

OSH_JOB_START_TIMEOUT environment variable in DataStage

Set the OSH_JOB_START_TIMEOUT environment variable to specify the number of seconds that the OshWrapper waits for the job to start before terminating it.

The default value is 600 seconds.

APT_JOB_MONITOR_TIMEOUT environment variable in DataStage

Set the APT_JOB_MONITOR_TIMEOUT environment variable to specify the number of minutes that a job can remain idle before the runtime terminates it.

The default value is 120 minutes.

A value of 0 or less disables the runtime monitor for the job.
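The default and the disabling rule can be expressed as a small decision function. idle_job_terminated is hypothetical and does not query a running job. The sketch assumes integer minute values and termination once the idle time reaches the limit. The exact comparison the runtime uses is not documented here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a job idle for $1 minutes would be
# terminated under APT_JOB_MONITOR_TIMEOUT (unset or empty -> 120 minutes).
idle_job_terminated() {
  timeout=${APT_JOB_MONITOR_TIMEOUT:-120}
  if [ "$timeout" -le 0 ]; then
    return 1                  # 0 or less disables the runtime monitor
  fi
  [ "$1" -ge "$timeout" ]     # assumption: limit reached means terminate
}
```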

APT_CACHE_VAULT_SECRETS environment variable in DataStage

The APT_CACHE_VAULT_SECRETS environment variable enables in-memory caching of vault secrets on the runtime instance so that subsequent runs can reuse them.

To cache the vault secrets that are fetched during a job, set the APT_CACHE_VAULT_SECRETS environment variable. By default, secrets are cached for 1 hour. To clear the cache, run the cpdctl dsjob clear-vault-cache command, which prints a status message:

cpdctl dsjob clear-vault-cache 

...
Cache cleared successfully

Status code = 0