DEFINE UPDATE statement
The DEFINE UPDATE statement defines how to process data from the source
record type.
Syntax
Parameters
DEFINE UPDATE update_name- Specifies the name of the update that you are defining to the System Data Engine. The update
name must be unique. Verify that in the
SHBODEFSdata set that is used as the CDP concatenation library, and in the data set that is used as the USER concatenation library, no existing definition has the same name.Important: The DEFINE UPDATE statements must be included after the DEFINE RECORD statement. VERSION version- Specifies the version for the update. The maximum length for this value is 18 characters. You might want to specify this optional value for troubleshooting purposes.
FROM record_name- Specifies the source of the data for the update. It must refer to a previously defined record.
SECTION section_name- Controls the processing of repeated sections. This clause specifies that the source of the
update is a repeated section
section_nameof the recordrecord_name. A section is repeated if the keywordREPEATEDis included when it is defined. For more information about how to define a section, see SECTION clause.If you include this clause, the System Data Engine generates an internal record for each occurrence of the repeated section, and the source of the update is that internal record.
If the record record_name has repeated sections, and you omit the
SECTIONclause, the update can use only the data that is outside of the repeated sections, meaning that the repeated sections are not processed.
WHERE- Specifies that the update applies to only those source records or rows for which the condition
that follows
WHEREis true.condition- Specifies the condition for the records or rows according to which the update applies. For more
information, see Conditions.
Any identifier that is used in the condition must be the name of a field in the source record.
TO &IBM_UPDATE_TARGET;&IBM_CORRELATION;AS &IBM_FILE_FORMAT- These parameters must be included as shown in the syntax.
LET,GROUP BY, andSETclauses- These clauses specify the processing to be performed by the update. This includes advanced
functions like
GROUP BY(field_name=expression)andSET (field_name=accumulation)that are not commonly used.The processing occurs in the order that the clauses are defined. You must specify at least one GROUP BY clause or SET clause.
The first clause in the definition uses the source records from
record_nameas input. Each subsequent clause uses the result of the preceding clause as input.For more information about these clauses, see the following sections:
LET clause
Assigns names to expressions that are frequently used in subsequent clauses, which can simplify
the DEFINE UPDATE definition and improve the efficiency of the update. For example,
if you want to calculate a value from a field, and use that value in several expressions, you can
assign the result of the calculation to a name in the LET clause, and refer to that name wherever
the result of the calculation is required in the definition.
identifier- Specifies the name that is assigned to the result of the expression. The name can be any identifier that is distinct from the names of fields in the source and names that are defined in the same LET clause.
expression- Specifies the expression to whose result the name is assigned. The expression can use the names
that are defined in the same LET clause. For more information about how to use expressions, see
Expressions.
Any identifier that is used in this clause must be the name of a field in the source record, or a name that is introduced in one of the preceding clauses.
GROUP BY clause
Organizes records in groups according to specified values. The input to this clause is the source
data that is specified in the FROM clause.
The result of GROUP BY processing is groups of input records, such that all
records within each group have the same grouping values. All grouping values must be non-null. A row
that has a null grouping value is not included in any group.
GROUP BY clause, all input records are processed as one
group.GROUP BY group_by_optionSET set_option
If the update definition has table definition, the GROUP BY clause must be
consistent with that in DEFINE TABLE statement.
GROUP BY RECORD- With this option, all the records and repeated sections are recombined into a single output
record. Use
GROUP BY RECORDto have each input record produce a single output record. For example, for a sectionSECTION_Athat repeated three times, this option outputs only one record. GROUP BY NONE- With this option, all the records and repeated sections are treated individually. If you specify
GROUP BY NONE, no grouping is performed, and each input record is processed individually. For example, for a sectionSECTION_Athat repeated three times, this option outputs three records.Tip: If there are no repeated sections,GROUP BY RECORDandGROUP BY NONEhave the same behavior. GROUP BY ALL- With this option, all input records are processed as a single group.
GROUP BY (field_name = expression)- With this option, all input records are processed according to the following values:
field_name- Specifies the field based on which the records are grouped. The field value cannot be a decimal or a long string.
expression- Specifies one grouping value. For more information about how to use expressions, see Expressions.
Any identifier that is used in an expression in any of the clauses must be the name of a field in the source record, or a name that is introduced in one of the preceding clauses.
You can have multiple
(field_name = expression)entries, separated by commas. DURATION integer SECONDS/MINUTES- Incoming records are organized in memory according to specified grouping value. Special control
is required to ensure System Data Engine does not send a partially aggregated record out.
DURATIONoption specifies the minimum amount of time that a temporary group of records will be retained in memory before it is sent to the Data Streamer.Changing theDURATIONvalue can affect the resource consumption. Follow these guidelines to determine a properDURATIONvalue:DURATIONis specified whenGROUP BYorganizes records according to a specified time, and full data aggregation is required. Specified time means that you might want to aggregate your data within a certain time.- Use a value that is a multiple of the time that is specified in
GROUP BY. For example, if you aggregate data for 3 second intervals, then theDURATIONvalue can be 3 seconds, 6 seconds, 9 seconds, 12 seconds etc. - The minimum value is 1 second. The maximum value is 60 minutes. A large value may cause storage buffer overrun if the rate of incoming records is high. Specify a value closest to the data aggregation requirement.
DURATIONis not useful ifGROUP BYdoes not aggregate data with any time fields.- Be default, the timestamp of an incoming record is determined by
TIMESTAMP(SMFDATE,SMFTIME), whereSMFDATandSMFTMEare the date and time from the SMF record header. For other data like non-SMF records, you can set a variable calledCDP_TIMESTAMPin theLetclause orGROUP BYclause to specify the field that is used to identify the timestamps of incoming records. - If you omit
DURATION, the System Data Engine might produce partially aggregated records. In stream mode, all data, aggregated or non-aggregated, in memory are always sent to the data streamer. In batch mode, all data, aggregated or non-aggregated, in memory are sent to the destination at end of the data set or when buffered.
GROUP BY clause and LET clause
might cause storage buffer overrun. SET clause
GROUP BY clause. The
SET clause produces one record in the target output for each group. In that record,
the grouping values are stored in the fields that are specified in the GROUP BY
clause. The values of other fields are derived from all the records in the group, as specified in
the SET clause.GROUP BY group_by_optionSET set_option
If the update definition has table definition, the GROUP BY clause must be
consistent with that in DEFINE TABLE statement.
SET (ALL)- Specifies that an update object is to be created that contains all fields in the source record
object. If a section named TRIPLETS is present in the source record object, no fields from that
section are included in the generated update object.
If
SECTION section_nameis not specified, fields in repeated sections are also omitted from the generated update object. IfSECTION section_nameis specified, all fields inside and outside of the repeated sections are included in the generated update object.If
GROUP BY RECORDis specified withSET(ALL), all fields of the generated update object arefield_name=FIRST(field_name).Important: IfSET (ALL)is specified, the LET clause is not valid. SET (field_name=accumulation)-
You can have multiple
(field_name=accumulation)entries, separated by commas.field_name- Specifies a field of the target output.
accumulation- Specifies how to derive the value to be stored in the field. It can be one of the following expressions:
SUM(expression)- Evaluates the expression
expressionfor each record in the group. The value of SUM is the sum of all non-null values that are obtained. If the value ofexpressionis null for all records in the group, the value of SUM is null. The expression must specify a numerical value. MIN(expression)- Evaluates the expression
expressionfor each record in the group. The value of MIN is the least of all non-null values that are obtained. If the value ofexpressionis null for all records in the group, the value of MIN is null. MAX(expression)- Evaluates the expression
expressionfor each record in the group. The value of MAX is the greatest of all non-null values that are obtained. If the value ofexpressionis null for all records in the group, the value of MAX is null. COUNT(expression)- Evaluates the expression
expressionfor each record in the group. The value of COUNT is an integer that is the total number of non-null values that are obtained. FIRST(expression)- Evaluates the expression
expressionfor each record in the group, in the order in which the records are processed. The value of FIRST is the first non-null value ofexpression. If the value of expression is null for all records in the group, the value of FIRST is null. LAST(expression)- Evaluates the expression
expressionfor each record in the group, in the order in which the records are processed. The value of LAST is the last non-null value ofexpression. If the value ofexpressionis null for all records in the group, the value of LAST is null. AVG(expression,field_name)- Evaluates the expression
expressionfor each record in the group. The value of AVG is the average, or weighted average, of the values that are obtained, depending on thefield_namevalue, which must name a field whose value is specified in the same SET clause. The value offield_namemust be specified by using either COUNT or SUM. Iffield_nameis specified by using COUNT, its value must be equal to the number of non-null values of the expression. The result of AVG is the average of all non-null values of the expression in the group. If the value of expression is null for all records in the group, the value of AVG is null. Iffield_nameis specified by using SUM, the result of AVG is the weighted average of all non-null values of expression in the group. The argument of SUM obtained for the same record is used as the weight. If the value of expression is null for all records in the group, or the sum of all weights is 0, the value of AVG is null. The expression must specify a numeric value. The result of AVG is a floating-point field.
Examples
The following example of a DEFINE UPDATE statement specifies how the System Data Engine is to extract data from SMF record type 130 and type 140.
- Example 1
-
DEFINE UPDATE SMF_130 VERSION ’CDP.510’ FROM SMF_130 WHERE (SMF130PNAME='SYSTEMA') TO &IBM_UPDATE_TARGET &IBM_CORRELATION AS &IBM_FILE_FORMAT SET(ALL); - Example 2
-
DEFINE UPDATE SMF_140 VERSION CDP.510’ FROM SMF_140 WHERE (SMF140PNAME='SYSTEMA') TO &IBM_UPDATE_TARGET &IBM_CORRELATION AS &IBM_FILE_FORMAT LET (C_S_TIME = TIME(SMF140STME), C_Q_TIME = TIME(SMF140QTME)) GROUP BY NONE SET (C_SYSTEM_ID = FIRST(SMF140SYSID), C_A_SYSTEM_ID = FIRST(SMF140ASYID), C_LAST_SEC = SUM(INTERVAL(C_S_TIME,C_Q_TIME)));
