APT_DUMP_SCORE report in InfoSphere DataStage parallel jobs

To analyze job performance and diagnose problems in your jobs, you can review the report in the job log that is generated by enabling the APT_DUMP_SCORE environment variable.

The configuration file specifies the nature and amount of parallelism for a job, and the specific resources that are used to run the job. When a job is run, the data flow information in the compiled job is combined with the information in the configuration file to produce a detailed execution plan that is called the score. When the APT_DUMP_SCORE environment variable is set, a text representation of the score (a report) is written to the job log.

The report includes information about:
  • Where and how data is partitioned
  • Whether InfoSphere DataStage inserted extra operators in the flow
  • The degree of parallelism each operator runs with, and on which nodes
  • Where the data is buffered
Figure 1 displays a sample parallel job that was run in InfoSphere® DataStage®. The example APT_DUMP_SCORE report in this topic refers to this sample job. However, any parallel job in InfoSphere DataStage can have an APT_DUMP_SCORE report that is generated after the job is run.
Figure 1. A view of a parallel job in InfoSphere DataStage
Figure 1 shows a view of a parallel job that a user created and is running with InfoSphere DataStage.

To set the APT_DUMP_SCORE environment variable, open the Administrator client, and then click Parallel > Reporting. You can set the APT_DUMP_SCORE environment variable to true for a job, a project, or the entire system. If you set it to true for the entire system, all parallel jobs produce the report, which you can use in your development and test environments.

The following score report is for a small job. When you enable APT_DUMP_SCORE and then run a job, you typically see text like the following in the job log.
main_program: This step has 10 datasets:
ds0: {op0[1p] (sequential PacifBaseMCES)
	eOther(APT_ModulusPartitioner {key={ value=MBR_SYS_ID }
})<>eCollectAny
	op1[4p] (parallel RemDups.IndvIDs_in_Sort)}
ds1: {op1[4p] (parallel RemDups.IndvIDs_in_Sort)
	[pp] eSame=>eCollectAny
	op2[4p] (parallel RemDups)}
ds2: {op2[4p] (parallel RemDups) 
	[pp] eSame=>eCollectAny
	op6[4p] (parallel buffer(0))}
ds3: {op3[1p] (sequential PacifGalaxyMember) 
	eOther(APT_ModulusPartitioner {key={ value=MBR_SYS_ID } 
})<>eCollectAny
	op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)}
ds4: {op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)
	eOther(APT_HashPartitioner { key={ value=MBR_SYS_ID }
})#>eCollectAny
	op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID,
subArgs={asc}}}(0) in IndvIdJoin)} 
ds5: {op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, 
subArgs={asc}}}(0) in IndvIdJoin)
	[pp] eSame=>eCollectAny
	op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}
ds6: {op6[4p] (parallel buffer(0))
	[pp] eSame=>eCollectAny
	op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}
ds7: {op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)
	[pp] eAny=>eCollectAny
	op8[4p] (parallel 
APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 
in SplitTran2)}
ds8: {op8[4p] (parallel 
APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 
in SplitTran2)
	eSame=>eCollectAny
	op9[4p] (parallel buffer(1))}
ds9: {op9[4p] (parallel buffer(1))
	>>eCollectOther(APT_SortedMergeCollector { key={ value=MBR_SYS_ID,
		subArgs={ asc }
	}
})
	op10[1p] (sequential APT_RealFileExportOperator in 
HP37_OvaWestmember_extract_dat)}
It has 11 operators:
op0[1p] {(sequential PacifBaseMCES)
	on nodes (
		node1[op0,p0]
	)}
op1[4p] {(parallel RemDups.IndvIDs_in_Sort)
	on nodes (
		node1[op1,p0]
		node2[op1,p1]
		node3[op1,p2]
		node4[op1,p3]
	)}
op2[4p] {(parallel RemDups)
	on nodes (
		node1[op2,p0]
		node2[op2,p1]
		node3[op2,p2]
		node4[op2,p3]
	)}
op3[1p] {(sequential PacifGalaxyMember)
	on nodes (
		node2[op3,p0]
	)}
op4[4p] {(parallel IndvIdJoin.toIndvIdJoin_Sort)
	on nodes (
		node1[op4,p0]
		node2[op4,p1]
		node3[op4,p2]
		node4[op4,p3]
	)}
op5[4p] {(parallel inserted tsort operator {key={value=MBR_SYS_ID, 
subArgs={asc}}}(0) in IndvIdJoin)
	on nodes (
		node1[op5,p0]
		node2[op5,p1]
		node3[op5,p2]
		node4[op5,p3]
	)}
op6[4p] {(parallel buffer(0))
	on nodes (
		node1[op6,p0]
		node2[op6,p1]
		node3[op6,p2]
		node4[op6,p3]
	)}
op7[4p] {(parallel APT_JoinSubOperatorNC in IndvIdJoin)
	on nodes(
		node1[op7,p0]
		node2[op7,p1]
		node3[op7,p2]
		node4[op7,p3]
	)}
op8[4p] {(parallel
APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 
in SplitTran2)
	on nodes (
		node1[op8,p0]
		node2[op8,p1]
		node3[op8,p2]
		node4[op8,p3]
	)}
op9[4p] {(parallel buffer(1))
	on nodes (
		node1[op9,p0]
		node2[op9,p1]
		node3[op9,p2]
		node4[op9,p3]
	)}
op10[1p] {(sequential APT_RealFileExportOperator in 
HP37_OvaWestmember_extract_dat)
	on nodes (
		node2[op10,p0]
	)}
It runs 35 processes on 4 nodes.

In a typical job flow, operators are endpoints and data sets are the links between the operators. An exception is when data sets are used to output a file.

Each link on the job design might write data to a temporary data set that is passed to the next operator. These temporary data sets are only placed in the scratch disk space when an imposed limit is reached. A limit can be imposed due to environmental settings or physical memory limitations.

Each operator that is listed in the score generates a number of processes; how many depends on these settings (the sketch after this list cross-checks the total):
  • The established configuration file for the job
  • The node pool settings
  • The operator configured settings
  • The job flow environment variables, such as APT_DISABLE_COMBINATION, being set or not set
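The report's closing line, "It runs 35 processes on 4 nodes.", is the net effect of those settings: in this example, where no operators are combined, the process count equals the sum of the operator partition counts. As a rough cross-check, the following Python sketch recomputes that sum; it assumes the report text was saved to a hypothetical file named score.txt, and Python is used only for illustration.
import re
from pathlib import Path

# score.txt is a hypothetical file that holds the report text from the job log.
score = Path("score.txt").read_text()

# Operator headers start at the beginning of a line, for example "op1[4p] {(parallel ...".
partition_counts = [int(p) for p in re.findall(r"(?m)^op\d+\[(\d+)p\]", score)]
print(sum(partition_counts), "processes from", len(partition_counts), "operators")
For the example score, this prints 35 processes from 11 operators.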
In the report, the operator names are prefixed with op and suffixed with an incremental numeric value that starts at zero. Next to the operator name, the number of partitions that the engine gives to that operator is shown, followed by the letter p and enclosed in brackets, for example [1p]. In the following excerpt, the first operator is given only one partition, and the second operator is given four partitions.
op0[1p] {(sequential PacifBaseMCES)
	on nodes (
		node1[op0,p0]
	)}
op1[4p] {(parallel RemDups.IndvIDs_in_Sort)
	on nodes (
		node1[op1,p0]
		node2[op1,p1]
		node3[op1,p2]
		node4[op1,p3]
	)}

In the example above, the first operator is listed as PacifBaseMCES, which is the stage name in its entirety. However, the second operator is listed as RemDups.IndvIDs_in_Sort: the name IndvIDs is extended to indicate that the sort process was triggered by the Remove Duplicates stage.

Listed under each operator name are the specific nodes that the operator is tagged to run on. In the example, node1 is used for the first operator, and node1, node2, node3, and node4 are used for the second operator. The names of the nodes are defined in the job configuration file.
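Because the naming convention is regular, a long score can be scanned with a simple parser. The following rough Python sketch (again assuming the report text is in a hypothetical score.txt) lists each operator's number, name, execution mode, and partition count; operator names that span lines are captured only approximately.
import re
from pathlib import Path

score = Path("score.txt").read_text()  # hypothetical file that holds the report

# Match operator headers such as "op1[4p] {(parallel RemDups.IndvIDs_in_Sort)".
header = re.compile(r"(?m)^op(\d+)\[(\d+)p\] \{\((sequential|parallel)\s+(.+?)\)?$")
for num, parts, mode, name in header.findall(score):
    print(f"op{num}: {name} ({mode}, {parts} partition(s))")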

In the example above, the first two data sets in the group are:
ds0: {op0[1p] (sequential PacifBaseMCES)
	eOther(APT_ModulusPartitioner { key={ value=MBR_SYS_ID }
})<>eCollectAny
	op1[4p] (parallel RemDups.IndvIDs_in_Sort)}
ds1: {op1[4p] (parallel RemDups.IndvIDs_in_Sort)
	[pp] eSame=>eCollectAny
	op2[4p] (parallel RemDups)}
The name of the data set is provided first. Within the curly brackets, three stages are specified:
  • sequential PacifBaseMCES is the source of the data set - operator 0. This stage specifies that the data must be read sequentially, in a specific order, by the program. Because the user specified that the files must be read sequentially, this part of the job cannot run in parallel.
  • parallel RemDups.IndvIDs_in_Sort is the activity of the data set - operator 1. This stage specifies that the data can be read in a parallel structure, and therefore can run on multiple nodes.
  • parallel RemDups is the target of the data set - operator 2. The parallel RemDups operator is the final stage in which the data is transformed before the job completes.

The source and target are usually operators, although you might see a specific file name that is provided, which indicates that the operator is referencing and reading from a physical data set.

The first data set, ds0, partitions the data from the first operator (op0, running in 1 partition). The data set uses the APT_ModulusPartitioner class, which is sometimes referred to as Advanced Parallel Technology modulus, to partition the data. In this scenario, the modulus partitioning uses the key field MBR_SYS_ID. The partitioned data is sent to the second operator (op1, running in 4 partitions), which means that the data is split into 4 partitions by the modulus method.
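Conceptually, modulus partitioning sends each record to the partition given by the key value modulo the partition count. The following is only a sketch of the idea, not the engine's implementation:
# Conceptual sketch of modulus partitioning, not the engine's implementation:
# each record lands in partition (key mod partition_count).
def modulus_partition(mbr_sys_id: int, partition_count: int) -> int:
    return mbr_sys_id % partition_count

for mbr_sys_id in (1001, 1002, 1003, 1004, 1005):
    print(mbr_sys_id, "-> partition", modulus_partition(mbr_sys_id, 4))
Because records with the same MBR_SYS_ID always map to the same partition, key-based operations downstream, such as the duplicate removal, see all related records in one partition.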

The second data set, ds1, reads from the second operator (op1, running in 4 partitions). The second data set uses the eSame method to partition the data and sends it to the third operator (op2, running in 4 partitions). The value [pp] means preserved partitioning. Preserved partitioning is an option that is set by default when you define your jobs. If data must be repartitioned, the [pp] flag is overridden and a warning message is triggered.

In the example for the first data set, the eOther and eCollectAny input and target read methods are used. The second method indicates how the receiving operator collects the data.
  • In this example, eOther is the originating, or input, method for op0. It indicates that something outside the expected partitioning options is imposed, and that you need to look at the string within the parentheses, which here enclose APT_ModulusPartitioner. In this example, modulus partitioning is imposed.
  • eCollectAny is the target read method. Any records that are fed to this data set are collected in a round robin manner. The round robin behavior is less significant than the input partitioning method, which is eOther (APT_ModulusPartitioner) for ds0.
For the ds9 data set, where the operator uses the APT_SortedMergeCollector class, the eCollectOther method indicates where the actual collection occurs and is specified when you are referencing a sequential flat file.
ds8: {op8[4p] (parallel
APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 
in SplitTran2)
	eSame=>eCollectAny
	op9[4p] (parallel buffer(1))}
ds9: {op9[4p] (parallel buffer(1))
	>>eCollectOther(APT_SortedMergeCollector { key={ value=MBR_SYS_ID,
		subArgs={ asc }
	}
})
The report uses symbols to represent the partitioning method and the read method. See Table 1 for a description of the symbols.
Table 1. Symbols for parallel job structure

Symbol   Originating partitioning method   Target read method
->       Sequential                        Sequential
<>       Sequential                        Parallel
=>       Parallel                          Parallel (same)
#>       Parallel                          Parallel (not same)
>>       Parallel                          Sequential
>        No source                         No target
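When reading long scores, it can help to treat Table 1 as a lookup. A trivial Python sketch:
# Table 1 as a lookup table: symbol -> (originating partitioning method,
# target read method).
LINK_SYMBOLS = {
    "->": ("Sequential", "Sequential"),
    "<>": ("Sequential", "Parallel"),
    "=>": ("Parallel", "Parallel (same)"),
    "#>": ("Parallel", "Parallel (not same)"),
    ">>": ("Parallel", "Sequential"),
    ">":  ("No source", "No target"),
}

origin, target = LINK_SYMBOLS["<>"]  # the symbol that appears on ds0
print(f"originating method: {origin}, target read method: {target}")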

In the example above, the op0 operator runs first, in sequential mode on node node1, and sends data to the ds0 data set. The ds0 data set is partitioned with the modulus partitioning method, and the data goes from sequential to parallel (4 ways). The data is then sent to the op1 operator, which runs in parallel mode on node1, node2, node3, and node4. The op1 operator handles the collected data and sends the results to the ds1 data set. The ds1 data set passes the data to the op2 operator in the same partitioning order that it had for the op1 operator.

The report also describes when the parallel engine inserts an operator based on its internal analysis of each operator's requirements. For example, Join stages require that the data is sorted, but you are not required to supply the sort details. The engine automatically detects when a sort is required and inserts one when necessary. For example, consider the report for the op5 operator in the original example:
op5[4p] {(parallel inserted tsort operator {key={value=MBR_SYS_ID, 
subArgs={asc}}}(0) in IndvIdJoin)
	on nodes (
		node1[op5,p0]
		node2[op5,p1]
		node3[op5,p2]
		node4[op5,p3]
	)}
The tsort operator was inserted. As part of this insertion, the data is repartitioned based on the same key as the hash partitioning, as shown for the ds4 data set:
ds4: {op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)
	eOther(APT_HashPartitioner { key={ value=MBR_SYS_ID }
})#>eCollectAny
	op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID,
subArgs={asc}}}(0) in IndvIdJoin)}
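Conceptually, hash partitioning differs from modulus partitioning only in that the key is first run through a hash function, which also makes it usable for non-integer keys. The following is a sketch of the idea; Python's built-in hash() stands in for the engine's hash function, and its output for strings varies from run to run.
# Conceptual sketch of hash partitioning: hash the key, then take the result
# modulo the partition count. Python's hash() is a stand-in for the engine's
# hash function, so the assignments below vary between runs.
def hash_partition(key, partition_count: int) -> int:
    return hash(key) % partition_count

for key in ("A100", "A101", "B200"):
    print(key, "-> partition", hash_partition(key, 4))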
All of the partitioning and sorting provided in the example serves the Join stage:
ds5: {op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID,
subArgs={asc}}}(0) in IndvIdJoin)
		[pp] eSame=>eCollectAny
		op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}
[...]
op7[4p] {(parallel APT_JoinSubOperatorNC in IndvIdJoin)
	on nodes (
		node1[op7,p0]
		node2[op7,p1]
		node3[op7,p2]
		node4[op7,p3]
	)}

One potential problem that this particular dump score report reveals is that one of the two input links to the join (op7) is partitioned with the modulus method (ds0), while the other input link is partitioned by hash partitioning (ds4). The hash partitioning overrode the initial modulus partitioning request (ds3). The first modulus request was overridden because the engine detected that the job design did not supply the required fields. Key fields are frequently supplied in the wrong order, or the job uses different key fields that break the compatibility of the data order requirements for the downstream stages. It is therefore important to review the APT_DUMP_SCORE report and confirm that the parallel engine interpreted your job design as you intended.
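One way to spot this kind of mismatch quickly is to list the explicit partitioner, if any, on each data set. The following rough Python sketch uses the same hypothetical score.txt as above; links that use eSame or a collector are shown as "same/any", so partitioning carried through such links still has to be traced by eye.
import re
from pathlib import Path

score = Path("score.txt").read_text()  # hypothetical file that holds the report

# Split the report into one chunk per data set, then print the partitioner
# named on its producer link (eOther(...)), if any.
chunks = re.split(r"(?m)^(ds\d+):", score)[1:]
for name, body in zip(chunks[::2], chunks[1::2]):
    match = re.search(r"eOther\((APT_\w+)", body)
    print(name, "->", match.group(1) if match else "same/any")
For the example score, ds0 and ds3 report APT_ModulusPartitioner and ds4 reports APT_HashPartitioner, which makes the mixed partitioning ahead of the join easy to see.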

Buffer operators are inserted where a downstream operator is at risk of being overloaded with data while it is still processing. The op6 operator in the example is a buffer operator:
op6[4p] {(parallel buffer(0))
	on nodes (
		node1[op6,p0]
		node2[op6,p1]
		node3[op6,p2]
		node4[op6,p3]
	)}
Buffer operators are an attempt to produce a buffer zone:
  1. The buffer operator communicates with the upstream operator to slow down its sending of data.
  2. The buffer operator holds on to the data until the downstream operator is ready for the next block of data.

If your job is running slower than comparable jobs, look at the number of buffer operators. Buffer operators prevent race conditions between operators, which helps ensure that the operators process data in the correct order and prevents errors. Disabling buffering can cause severe problems that are difficult to analyze. However, better job design can reduce the amount of buffering that occurs.
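A quick way to gauge how much buffering a job incurs is to count the buffer operators in the score, as in this sketch (same hypothetical score.txt):
import re
from pathlib import Path

score = Path("score.txt").read_text()  # hypothetical file that holds the report

# Buffer operators appear as their own operator headers, for example
# "op6[4p] {(parallel buffer(0))".
buffer_ops = re.findall(r"(?m)^op\d+\[\d+p\] \{\(parallel buffer\s*\(\d+\)\)", score)
print(len(buffer_ops), "buffer operators in the score")
For the example score, this prints 2 (op6 and op9).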

Another type of operator that might appear in an APT_DUMP_SCORE report, although it is not shown in the example, is the combined operator. A combined operator is a single operator that is made up of multiple other operators. Combined operators reduce the total number of processes within a job by combining several operators into a single process. The following example shows a combined operator:
op1[2p] {(parallel APT_CombinedOperatorController:
	(APT_TransformOperatorImplV0S1_TrafoTest1_Transformer_1 in
Transformer_1)
		(Peek_2)
	) on nodes (
		node1[op1,p0]
		node2[op1,p1]
	)}

Data sets take up memory. As part of optimization, jobs try to combine multiple operators that handle data in the same way as the operators would separately, but with less memory impact. For example, when there is no requirement to change the partitioning or sort order of the data flow, data is handed directly to the next operator as soon as processing is completed in the prior operator.

In this example, two operators, a transform operator and a peek operator, are combined and run on two partitions.

When the job log indicates that an error occurred in APT_CombinedOperator, the APT_DUMP_SCORE report can help you identify which of the combined operators is causing the problem. To isolate the problem, enable the APT_DISABLE_COMBINATION environment variable, which prevents operators from being combined and can help you identify which stage has the error.
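Before you disable combination, you can also read the membership straight out of the score. The following sketch (same hypothetical score.txt, run against a score that contains combined operators) prints the operators that each combined operator wraps:
import re
from pathlib import Path

score = Path("score.txt").read_text()  # hypothetical file that holds the report

# Each APT_CombinedOperatorController lists its member operators in
# parentheses before its "on nodes" clause.
for inner in re.findall(r"APT_CombinedOperatorController:(.*?)\)\s*on nodes",
                        score, re.S):
    members = re.findall(r"\(\s*(\w+)", inner)
    print("combined operator wraps:", ", ".join(members))
For the combined-operator example above, this prints the transform operator and Peek_2, which tells you which stages to examine when an error is reported in APT_CombinedOperator.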