DEFINE UPDATE statement

The DEFINE UPDATE statement defines how to process data from the source record type.

Syntax

DEFINE UPDATE statement

Read syntax diagramSkip visual syntax diagram DEFINE UPDATE update_name VERSIONversion FROM record_nameSECTIONsection_name WHEREcondition TO &IBM_UPDATE_TARGETTABLEtable_name &IBM_CORRELATION AS &IBM_FILE_FORMAT LET clause GROUP BY clause SET clause
LET clause
Read syntax diagramSkip visual syntax diagramLET(,identifier =  expression)
GROUP BY clause
Read syntax diagramSkip visual syntax diagramGROUP BY NONEGROUP BY RECORDGROUP BY ALLGROUP BY(, field_name =  expression)DURATION integerSECONDSMINUTES
SET clause
Read syntax diagramSkip visual syntax diagramSET(ALL,field_name = accumulation)
accumulation
Read syntax diagramSkip visual syntax diagramSUM( expression)MIN( expression)MAX( expression)COUNT( expression)FIRST( expression)LAST( expression)AVG( expression column_name)

Parameters

DEFINE UPDATE update_name
Specifies the name of the update that you are defining to the System Data Engine. The update name must be unique. Verify that in the SHBODEFS data set that is used as the CDP concatenation library, and in the data set that is used as the USER concatenation library, no existing definition has the same name.
Important: The DEFINE UPDATE statements must be included after the DEFINE RECORD statement.
VERSION version
Specifies the version for the update. The maximum length for this value is 18 characters. You might want to specify this optional value for troubleshooting purposes.
FROM record_name
Specifies the source of the data for the update. It must refer to a previously defined record.
SECTION section_name
Controls the processing of repeated sections. This clause specifies that the source of the update is a repeated section section_name of the record record_name. A section is repeated if the keyword REPEATED is included when it is defined. For more information about how to define a section, see SECTION clause.

If you include this clause, the System Data Engine generates an internal record for each occurrence of the repeated section, and the source of the update is that internal record.

If the record record_name has repeated sections, and you omit the SECTION clause, the update can use only the data that is outside of the repeated sections, meaning that the repeated sections are not processed.

WHERE
Specifies that the update applies to only those source records or rows for which the condition that follows WHERE is true.
condition
Specifies the condition for the records or rows according to which the update applies. For more information, see Conditions.

Any identifier that is used in the condition must be the name of a field in the source record.

TO &IBM_UPDATE_TARGET; &IBM_CORRELATION; AS &IBM_FILE_FORMAT
These parameters must be included as shown in the syntax.
LET, GROUP BY, and SET clauses
These clauses specify the processing to be performed by the update. This includes advanced functions like GROUP BY(field_name=expression) and SET (field_name=accumulation) that are not commonly used.

The processing occurs in the order that the clauses are defined. You must specify at least one GROUP BY clause or SET clause.

The first clause in the definition uses the source records from record_name as input. Each subsequent clause uses the result of the preceding clause as input.

For more information about these clauses, see the following sections:

LET clause

Assigns names to expressions that are frequently used in subsequent clauses, which can simplify the DEFINE UPDATE definition and improve the efficiency of the update. For example, if you want to calculate a value from a field, and use that value in several expressions, you can assign the result of the calculation to a name in the LET clause, and refer to that name wherever the result of the calculation is required in the definition.

identifier
Specifies the name that is assigned to the result of the expression. The name can be any identifier that is distinct from the names of fields in the source and names that are defined in the same LET clause.
expression
Specifies the expression to whose result the name is assigned. The expression can use the names that are defined in the same LET clause. For more information about how to use expressions, see Expressions.

Any identifier that is used in this clause must be the name of a field in the source record, or a name that is introduced in one of the preceding clauses.

GROUP BY clause

Organizes records in groups according to specified values. The input to this clause is the source data that is specified in the FROM clause.

The result of GROUP BY processing is groups of input records, such that all records within each group have the same grouping values. All grouping values must be non-null. A row that has a null grouping value is not included in any group.

If you omit the GROUP BY clause, all input records are processed as one group.
Important: You must specify at least one of the following elements:
  • GROUP BY group_by_option
  • SET set_option

If the update definition has table definition, the GROUP BY clause must be consistent with that in DEFINE TABLE statement.

GROUP BY RECORD
With this option, all the records and repeated sections are recombined into a single output record. Use GROUP BY RECORD to have each input record produce a single output record. For example, for a section SECTION_A that repeated three times, this option outputs only one record.
GROUP BY NONE
With this option, all the records and repeated sections are treated individually. If you specify GROUP BY NONE, no grouping is performed, and each input record is processed individually. For example, for a section SECTION_A that repeated three times, this option outputs three records.
Tip: If there are no repeated sections, GROUP BY RECORD and GROUP BY NONE have the same behavior.
GROUP BY ALL
With this option, all input records are processed as a single group.
GROUP BY (field_name = expression)
With this option, all input records are processed according to the following values:
field_name
Specifies the field based on which the records are grouped. The field value cannot be a decimal or a long string.
expression
Specifies one grouping value. For more information about how to use expressions, see Expressions.

Any identifier that is used in an expression in any of the clauses must be the name of a field in the source record, or a name that is introduced in one of the preceding clauses.

You can have multiple (field_name = expression) entries, separated by commas.

DURATION integer SECONDS/MINUTES
Incoming records are organized in memory according to specified grouping value. Special control is required to ensure System Data Engine does not send a partially aggregated record out. DURATION option specifies the minimum amount of time that a temporary group of records will be retained in memory before it is sent to the Data Streamer.
Changing the DURATION value can affect the resource consumption. Follow these guidelines to determine a proper DURATION value:
  • DURATION is specified when GROUP BY organizes records according to a specified time, and full data aggregation is required. Specified time means that you might want to aggregate your data within a certain time.
  • Use a value that is a multiple of the time that is specified in GROUP BY. For example, if you aggregate data for 3 second intervals, then the DURATION value can be 3 seconds, 6 seconds, 9 seconds, 12 seconds etc.
  • The minimum value is 1 second. The maximum value is 60 minutes. A large value may cause storage buffer overrun if the rate of incoming records is high. Specify a value closest to the data aggregation requirement.
  • DURATION is not useful if GROUP BY does not aggregate data with any time fields.
  • Be default, the timestamp of an incoming record is determined by TIMESTAMP (SMFDATE, SMFTIME), where SMFDAT and SMFTME are the date and time from the SMF record header. For other data like non-SMF records, you can set a variable called CDP_TIMESTAMP in the Let clause or GROUP BY clause to specify the field that is used to identify the timestamps of incoming records.
  • If you omit DURATION, the System Data Engine might produce partially aggregated records. In stream mode, all data, aggregated or non-aggregated, in memory are always sent to the data streamer. In batch mode, all data, aggregated or non-aggregated, in memory are sent to the destination at end of the data set or when buffered.
CAUTION:
Using the GROUP BY clause and LET clause might cause storage buffer overrun.

SET clause

Summarizes the groups of records that result from the GROUP BY clause. The SET clause produces one record in the target output for each group. In that record, the grouping values are stored in the fields that are specified in the GROUP BY clause. The values of other fields are derived from all the records in the group, as specified in the SET clause.
Important: You must specify at least one of the following elements:
  • GROUP BY group_by_option
  • SET set_option

If the update definition has table definition, the GROUP BY clause must be consistent with that in DEFINE TABLE statement.

SET (ALL)
Specifies that an update object is to be created that contains all fields in the source record object. If a section named TRIPLETS is present in the source record object, no fields from that section are included in the generated update object.

If SECTION section_name is not specified, fields in repeated sections are also omitted from the generated update object. If SECTION section_name is specified, all fields inside and outside of the repeated sections are included in the generated update object.

If GROUP BY RECORD is specified with SET(ALL), all fields of the generated update object are field_name=FIRST(field_name).

Important: If SET (ALL) is specified, the LET clause is not valid.
SET (field_name=accumulation)

You can have multiple (field_name=accumulation) entries, separated by commas.

field_name
Specifies a field of the target output.
accumulation
Specifies how to derive the value to be stored in the field. It can be one of the following expressions:
SUM(expression)
Evaluates the expression expression for each record in the group. The value of SUM is the sum of all non-null values that are obtained. If the value of expression is null for all records in the group, the value of SUM is null. The expression must specify a numerical value.
MIN(expression)
Evaluates the expression expression for each record in the group. The value of MIN is the least of all non-null values that are obtained. If the value of expression is null for all records in the group, the value of MIN is null.
MAX(expression)
Evaluates the expression expression for each record in the group. The value of MAX is the greatest of all non-null values that are obtained. If the value of expression is null for all records in the group, the value of MAX is null.
COUNT(expression)
Evaluates the expression expression for each record in the group. The value of COUNT is an integer that is the total number of non-null values that are obtained.
FIRST(expression)
Evaluates the expression expression for each record in the group, in the order in which the records are processed. The value of FIRST is the first non-null value of expression. If the value of expression is null for all records in the group, the value of FIRST is null.
LAST(expression)
Evaluates the expression expression for each record in the group, in the order in which the records are processed. The value of LAST is the last non-null value of expression. If the value of expression is null for all records in the group, the value of LAST is null.
AVG(expression,field_name)
Evaluates the expression expression for each record in the group. The value of AVG is the average, or weighted average, of the values that are obtained, depending on the field_name value, which must name a field whose value is specified in the same SET clause. The value of field_name must be specified by using either COUNT or SUM. If field_name is specified by using COUNT, its value must be equal to the number of non-null values of the expression. The result of AVG is the average of all non-null values of the expression in the group. If the value of expression is null for all records in the group, the value of AVG is null. If field_name is specified by using SUM, the result of AVG is the weighted average of all non-null values of expression in the group. The argument of SUM obtained for the same record is used as the weight. If the value of expression is null for all records in the group, or the sum of all weights is 0, the value of AVG is null. The expression must specify a numeric value. The result of AVG is a floating-point field.
Any identifier that is used in an expression in any of the clauses must be the name of a field in the source record, or a name that is introduced in one of the preceding clauses. For more information about how to use expressions, see Expressions.

Examples

The following example of a DEFINE UPDATE statement specifies how the System Data Engine is to extract data from SMF record type 130 and type 140.

Example 1

DEFINE UPDATE SMF_130
   VERSION ’CDP.510’
   FROM SMF_130
   WHERE (SMF130PNAME='SYSTEMA')
   TO &IBM_UPDATE_TARGET
   &IBM_CORRELATION
   AS &IBM_FILE_FORMAT
   SET(ALL);
Example 2

DEFINE UPDATE SMF_140 
   VERSION CDP.510’ 
   FROM SMF_140 
   WHERE (SMF140PNAME='SYSTEMA') 
   TO &IBM_UPDATE_TARGET 
   &IBM_CORRELATION 
   AS &IBM_FILE_FORMAT 
   LET (C_S_TIME  = TIME(SMF140STME), 
        C_Q_TIME  = TIME(SMF140QTME)) 
   GROUP BY NONE 
   SET (C_SYSTEM_ID   = FIRST(SMF140SYSID), 
        C_A_SYSTEM_ID = FIRST(SMF140ASYID), 
        C_LAST_SEC    = SUM(INTERVAL(C_S_TIME,C_Q_TIME)));