SELECT function

The SELECT function combines, filters, and transforms complex message and database data.

Syntax

Read syntax diagramSkip visual syntax diagram1SELECTSelectClause FromClauseWhereClause
WHERE
Read syntax diagramSkip visual syntax diagramSelectClause =  <<---- ,------ << ExpressionASPathINSERTITEMExpressionCOUNT2MAXMINSUM(Expression)
Read syntax diagramSkip visual syntax diagramFromClause =  FROM <<---- ,------ << FieldReferenceASCorrelationName
Read syntax diagramSkip visual syntax diagramWhereClause =  WHERE Expression
Notes:
  • 1 You no longer require the enclosing parentheses in SELECT expressions. This does not prevent you using parentheses but, if they are present, they are merely normal, expression-scoping, parentheses.
  • 2 For the COUNT parameter only, you can specify the value of the following Expression as a single star (*).

Usage

The SELECT function is the usual and most efficient way of transforming messages. You can use SELECT to:
  • Comprehensively reformat messages
  • Access database tables
  • Make an output array that is a subset of an input array
  • Make an output array that contains only the values of an input array
  • Count the number of entries in an array
  • Select the minimum or maximum value from a number of entries in an array
  • Sum the values in an array

Introduction to SELECT

The SELECT function considers a message tree (or sub-tree) to consist of a number of rows and columns, rather like a database table. A FieldReference in a FROM clause identifies a field in a message tree. The identified field is regarded in the following ways:
  • The identified field is regarded as a row in a table.
  • The field's siblings are regarded as other rows of the same table.
  • The field's children are regarded as the table's columns.
Note: The FieldReference in a FROM clause can also be a table reference that refers directly to a real database table.

The return value of the SELECT function is typically another message tree that contains rows whose structure and content is determined by the SelectClause. The number of rows in the result is the sum of all the rows pointed to by all the field references and table references in the FROM clause, filtered by the WHERE clause; only those fields for which the WHERE clause evaluates to TRUE are included.

The return value of the SELECT function can also be scalar (see ITEM selections).

You can specify the SelectClause in several ways; see:
If you have created a message flow that contains one of the following nodes, and the ESQL that is associated with this node includes a SELECT function and a database reference, you must specify a value for the Data source property of the relevant node:
  • Compute
  • Database
  • Filter

Simple selections

To understand the SELECT function in more detail, first consider the following simple case:
  • The SelectClause consists of a number of expressions, each with an AS Path clause.
  • The FROM clause contains a single FieldReference and an AS CorrelationName clause.
The SELECT function creates a local, reference, correlation variable, whose name is given by the AS CorrelationName clause, and then steps, in turn, through each row of the list of rows derived from the FROM clause. For each row:
  1. The correlation variable is set to point to the current row.
  2. The WHERE clause (if present) is evaluated. If it evaluates to FALSE or unknown (null), nothing is added to the result tree and processing proceeds to the next row of the input. Otherwise processing proceeds to the next step.
  3. A new member is added to the result list.
  4. The SELECT clause expressions are evaluated and assigned to fields named as dictated by the AS Path clause. These fields are child fields of the new member of the result list.

Typically, both the SelectClause and the WHERE clause expressions use the correlation variable to access column values (that is, fields in the input message tree) and thus to build a new message tree containing data from the input message. The correlation variable is referred to by the name specified in the AS CorrelationName clause or, if an AS clause is not specified, by the final name in the FROM FieldReference (that is, the name after the last dot).

Note that:
  • Despite the analogy with a table, you are not restricted to accessing or creating messages with a flat, table-like, structure; you can access and build trees with arbitrarily deep folder structures.
  • You are not restricted to a column being a single value; a column can be a repeating list value or a structure.
These concepts are best understood by reference to the examples.

If the field reference is a TableReference, the operation is very similar. In this case, the input is a real database table and is thus restricted to the flat structures supported by databases. The result tree is still not so restricted, however.

If the FROM clause contains more than one field reference, the rightmost reference steps through each of its rows for each row in the next-to-rightmost reference, and so on. The total number of rows in the result is thus the product of the number of rows in each table. Such selects are known as joins and commonly use a WHERE clause that excludes most of these rows from the result. Joins are commonly used to add database data to messages.

The AS Path clause is optional. If it is unspecified, the integration node generates a default name according to the following rules:
  1. If the SelectClause expression is a reference to a field or a cast of a reference to a field, the name of the field is used.
  2. Otherwise the integration node uses the default names Column1, Column2, and so on.

Examples

The following example performs a SELECT on the table Parts in the schema Shop in the database DSN1. Because no WHERE clause exists, all rows are selected. Because the select clause expressions (for example, P.PartNumber) contain no AS clauses, the fields in the result adopt the same names:

SET PartsTable.Part[] = SELECT
  P.PartNumber,
  P.Description,
  P.Price
 FROM Database.DSN1.Shop.Parts AS P;

If the target of the SET statement (PartsTable) is a variable of type ROW, after the statement is executed PartsTable will have, as children of its root element, a field called Part for each row in the table. Each of the Part fields will have child fields called PartNumber, Description, and Price. The child fields will have values dictated by the contents of the table. (PartsTable could also be a reference into a message tree).

The next example performs a similar SELECT. This case differs from the last in that the SELECT is performed on the message tree produced by the first example (rather than on a real database table). The result is assigned into a subfolder of OutputRoot:

SET OutputRoot.XMLNS.Data.TableData.Part[] = SELECT
  P.PartNumber,
  P.Description,
  P.Price
 FROM PartsTable.Part[] AS P;

INSERT selections

The INSERT clause is an alternative to the AS clause. It assigns the result of the SelectClause expression (which must be a row) to the current new row itself, rather than to a child of it. The effect of this is to merge the row result of the expression into the row being generated by the SELECT. This differs from the AS clause, in that the AS clause always generates at least one child element before adding a result, whereas INSERT generates none. INSERT is useful when inserting data from other SELECT operations, because it allows the data to be merged without extra folders.

ITEM selections

The SelectClause can consist of the keyword ITEM and a single expression. The effect of this is to make the results nameless. That is, the result is a list of values of the type returned by the expression, rather than a row. This option has several uses:
  • In conjunction with a scalar expression and the THE function, it can be used to create a SELECT query that returns a single scalar value (for example, the price of a particular item from a table).
  • In conjunction with a CASE expression and ROW constructors, it can be used to create a SELECT query that creates or handles messages in which the structure of some rows (that is, repeats in the message) is different from others. This is useful for handling messages that have a repeating structure but in which the repeats do not all have the same structure.
  • In conjunction with a ROW constructor, it can be used to create a SELECT query that collapses levels of repetition in the input message.

Column function selections

The SelectClause can consist of one of the functions COUNT, MAX, MIN, and SUM operating on an expression. These functions are known as column functions. They return a single scalar value (not a list) giving the count, maximum, minimum, or sum of the values that Expression evaluated to in stepping through the rows of the FROM clause. If Expression evaluates to NULL for a particular row, the value is ignored, so that the function returns the count, maximum, minimum, or sum of the remaining rows.

For the COUNT function only, Expression can consist of a single star (*). This form counts the rows regardless of null values.

To make the result a useful reflection of the input message, Expression typically includes the correlation variable.

Typically, Expression evaluates to the same data type for each row. In these cases, the result of the MAX, MIN, and SUM functions are of the same data type as the operands. The returned values are not required to be all of the same type however, and if they are not, the normal rules of arithmetic apply. For example, if a field in a repeated message structure contains integer values for some rows and float values for others, the sum follows the normal rules for addition. The sum is of type float because the operation is equivalent to adding a number of integer and float values.

The result of the COUNT function is always an integer.

Differences between message and database selections

FROM expressions in which a correlation variable represents a row in a message behave slightly differently from those in which the correlation variable represents a row in a real database table.

In the message case, a path involving a star (*) has the normal meaning; it ignores the field's name and finds the first field that matches the other criteria (if any).

In the database case a star (*) has, for historical reasons, the special meaning of all fields. This special meaning requires advance knowledge of the definition of the database table and is supported only when querying the default database (that is, the database pointed to by the node's data source attribute). For example, the following queries return column name and value pairs only when querying the default database:
SELECT * FROM Database.Datasource.SchemaName.Table As A
SELECT A.* FROM Database.Datasource.SchemaName.Table As A
SELECT A FROM Database.Datasource.SchemaName.Table AS A

Specifying the SELECT expressions

SelectClause
SelectClause expressions can use any of the integration node's operators and functions in any combination. They can refer to the tables' columns, message fields, correlation names declared by containing SELECTs, and to any other declared variables or constants that are in scope.
AS Path
An AS Path expression is a relative path (that is, there is no correlation name) but is otherwise unrestricted in any way. For example, it can contain:
  • Indexes (for example, A.B.C[i])
  • Field-type specifiers (for example, A.B.(XML.Attribute)C )
  • Multipart paths (for example, A.B.C )
  • Name expressions (for example, A.B.{var})

Any expressions in these paths can also use any of the integration node's operators and functions in any combination. The expressions can refer to the tables' columns, message fields, correlation names declared by containing SELECTs, and any declared variables or constants.

FROM clause
FROM clause expressions can contain multiple database references, multiple message references, or a mixture of the two. You can join tables with tables, messages with messages, or tables with messages.

FROM clause FieldReferences can contain expressions of any kind (for example, Database.{DataSource}.{Schema}.Table1).

You can calculate a field, data source, schema, or table name at run time.

WHERE clause

The WHERE clause expression can use any of the integration node's operators and functions in any combination. It can refer to table columns, message fields, and any declared variables or constants.

However, be aware that the integration node treats the WHERE clause expression by examining the expression and deciding whether the whole expression can be evaluated by the database. If it can, it is given to the database. In order to be evaluated by the database, it must use only those functions and operators supported by the database.

The WHERE clause can, however, refer to message fields, correlation names declared by containing SELECT functions, and to any other declared variables or constants within scope.

If the whole expression cannot be evaluated by the database, the integration node looks for top-level AND operators and examines each sub-expression separately. It then attempts to give the database those sub-expressions that it can evaluate, leaving the integration node to evaluate the rest. You need to be aware of this situation for two reasons:
  1. Apparently trivial changes to WHERE clause expressions can have large effects on performance. You can determine how much of the expression was given to the database by examining a user trace.
  2. Some databases' functions exhibit subtle differences of behavior from those of the integration node.

Relation to the THE function

You can use the function THE (which returns the first element of a list) in conjunction with SELECT to produce a non-list result. This is useful, for example, when a SELECT query is required to return no more than one item. It is particularly useful in conjunction with ITEM (see ITEM selections).

Differences from the SQL standard

ESQL SELECT differs from database SQL SELECT in the following ways:
  • ESQL can produce tree-structured result data
  • ESQL can accept arrays in SELECT clauses
  • ESQL has the THE function and the ITEM and INSERT parameters
  • ESQL has no SELECT ALL function in this release
  • ESQL has no ORDER BY function in this release
  • ESQL has no SELECT DISTINCT function in this release
  • ESQL has no GROUP BY or HAVING parameters in this release
  • ESQL has no AVG column function in this release

Restrictions

The following restrictions apply to the current release:
  • When a SELECT command operates on more than one database table, all the tables must be in the same database instance. (That is, the TableReferences must not specify different data source names.)
  • If the FROM clause refers to both messages and tables, the tables must precede the messages in the list.
  • Using dynamic DSN, SCHEMA and TABLE names with 'SELECT *' statements is not supported. If you use a schema, table or datasource name as a variable (dynamic variables) in 'SELECT *' queries, the variables are not resolved to the correct set of schema or table names.
  • The WHERE clause of a SELECT statement cannot itself contain a SELECT statement that relies on results returned from the original SELECT if either SELECT statement is from database tables.