Begin Fix Pack 11.4.02 information

Block clause

A block clause is a convenient shortcut for expressing that certain values of the entities being compared are identical.

Each block clause uses the following syntax.
'block' ( entity_name 'on' <expr> )+

Syntax and semantics

The block clause of an entity resolution statement must specify exactly two blocking expressions, each associated with a blocking entity. The entity name specified for each blocking expression must be that of a top-level input entity declared in the from clause; nested entities are not allowed. The two blocking expressions cannot be associated with the same blocking entity.

If the from clause contains only a single top-level entity, then block clauses are not allowed.

The expression of a block clause cannot contain comparison predicates, nor can it contain a conjunction of predicates.
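The rules above can be summarized as a small checker. This is a hypothetical Python sketch, not the HIL compiler: expressions are modeled as tuple trees, the entity names are the aliases declared in the from clause, and the nested-entity rule is covered by listing only top-level entities as valid names.

```python
from typing import List, Tuple

# Hypothetical expression trees (not HIL's internal representation):
#   ("field", name) | ("lit", value) | ("call", fn, *args)
#   ("cmp", op, lhs, rhs) | ("and", lhs, rhs)
Expr = tuple

def has_forbidden(expr: Expr) -> bool:
    """True if expr contains a comparison predicate or a conjunction anywhere."""
    kind = expr[0]
    if kind in ("cmp", "and"):
        return True
    if kind == "call":
        return any(has_forbidden(arg) for arg in expr[2:])
    return False  # field references and literals are allowed

def check_block_clause(from_entities: List[str],
                       block: List[Tuple[str, Expr]]) -> List[str]:
    """Return the rule violations for a block clause (an empty list means valid).

    from_entities: names of the top-level entities in the from clause.
    block: (entity_name, expression) pairs in the order they appear.
    """
    errors = []
    if len(from_entities) < 2:
        errors.append("block clause not allowed: only one top-level entity")
    if len(block) != 2:
        errors.append("exactly two blocking expressions are required")
    names = [name for name, _ in block]
    if len(set(names)) != len(names):
        errors.append("both blocking expressions use the same entity")
    for name, expr in block:
        if name not in from_entities:
            errors.append(f"{name} is not a top-level entity in the from clause")
        if has_forbidden(expr):
            errors.append(f"blocking expression for {name} has a comparison or conjunction")
    return errors

# The block clause of the SocialEmps example below passes every check.
social_emps_block = [("s", ("call", "GetURLUserName", ("field", "url"))),
                     ("e", ("field", "short_name"))]
print(check_block_clause(["s", "e"], social_emps_block))  # []
```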

All field references in the expression of a block clause must refer to fields of the given blocking entity, because the entity name is automatically added as a prefix to every field reference in the blocking expression. Therefore, omit the entity name when you specify field names in a blocking expression.

In the following example, the first blocking expression refers to url; because the corresponding entity name in the block clause is s, that field internally becomes s.url. Similarly, the field short_name in the second blocking expression internally becomes e.short_name.
create link SocialEmps as
    select [ empid: e.id, upd_text: s.message ]
    from StatusUpdates s, Employees e
    block
      s on GetURLUserName(url),
      e on short_name
    match using
      match1:  GetURLSiteName(s.url) = "NAME_OF_SOCIAL_SITE",
      match2:  GetURLSiteName(s.url) = "NAME_OF_SOCIAL_FEED";
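To illustrate the automatic prefixing, the following Python sketch represents each blocking expression as a small tuple tree (a hypothetical representation, not HIL's internal one) and qualifies every field reference with the entity name given in the block clause.

```python
# Hypothetical expression trees: ("field", name) | ("lit", value) | ("call", fn, *args)

def qualify(expr, entity):
    """Prefix every bare field reference in expr with the blocking entity name."""
    kind = expr[0]
    if kind == "field":
        return ("field", f"{entity}.{expr[1]}")
    if kind == "call":
        return ("call", expr[1]) + tuple(qualify(arg, entity) for arg in expr[2:])
    return expr  # literals carry no field references

def render(expr):
    """Turn a tree back into HIL-like text for display."""
    kind = expr[0]
    if kind == "field":
        return expr[1]
    if kind == "call":
        return expr[1] + "(" + ", ".join(render(arg) for arg in expr[2:]) + ")"
    if kind == "lit":
        return '"' + str(expr[1]) + '"'
    raise ValueError(f"unsupported node: {kind}")

# block s on GetURLUserName(url), e on short_name
block = [("s", ("call", "GetURLUserName", ("field", "url"))),
         ("e", ("field", "short_name"))]
for entity, expr in block:
    print(entity, "on", render(expr), "->", render(qualify(expr, entity)))
# s on GetURLUserName(url) -> GetURLUserName(s.url)
# e on short_name -> e.short_name
```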

Usage notes

A block clause divides input views into value-based partitions and then joins the views on a partition-by-partition basis, where two partitions are joined only if their corresponding partition values are equal. For entity resolution statements like the preceding one, the high-level integration language effectively turns the block clause into an equality join condition and then applies that condition to every rule in the disjunctive match clause (if a match clause exists). The preceding query is therefore semantically equivalent to the following query.
create link SocialEmps as
    select [ empid: e.id, upd_text: s.message ]
    from StatusUpdates s, Employees e
    match using
      match1:  GetURLUserName(s.url) = e.short_name
               and GetURLSiteName(s.url) = "NAME_OF_SOCIAL_SITE",
      match2:  GetURLUserName(s.url) = e.short_name
               and GetURLSiteName(s.url) = "NAME_OF_SOCIAL_FEED";
More generally, when an entity resolution rule contains both a block clause and a match clause, such as the following statement:
block <blockPred>
match using <match1>, <match2>, ...
the statement is semantically defined as:
match using (<blockPred> AND <match1>) OR (<blockPred> AND <match2>) OR ...
Note: The use of OR in the example is purely for semantic demonstration. The high-level integration language does not support OR predicates explicitly.
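The following Python sketch models the partition-and-join reading of a block clause and checks it against the nested-loop reading (<blockPred> AND <match1>) OR (<blockPred> AND <match2>) for the SocialEmps example. It is a conceptual model only, not how HIL executes the statement. The helpers get_url_user_name and get_url_site_name, the host names standing in for NAME_OF_SOCIAL_SITE and NAME_OF_SOCIAL_FEED, and the sample records are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical stand-ins for the HIL functions used in the example.
def get_url_user_name(url):
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]

def get_url_site_name(url):
    return urlparse(url).netloc

SOCIAL_SITE = "social.example.com"  # placeholder for "NAME_OF_SOCIAL_SITE"
SOCIAL_FEED = "feed.example.com"    # placeholder for "NAME_OF_SOCIAL_FEED"

def blocked_join(left, right, left_key, right_key, rules):
    """Partition `right` on its blocking value, compare each `left` record only
    with the partition whose value is equal, and keep pairs any rule accepts."""
    partitions = defaultdict(list)
    for b in right:
        partitions[right_key(b)].append(b)
    return [(a, b) for a in left
            for b in partitions.get(left_key(a), [])
            if any(rule(a, b) for rule in rules)]

def disjunctive_match(left, right, left_key, right_key, rules):
    """Nested-loop reading: (blockPred AND match1) OR (blockPred AND match2) OR ..."""
    def block_pred(a, b):
        return left_key(a) == right_key(b)
    return [(a, b) for a in left for b in right
            if any(block_pred(a, b) and rule(a, b) for rule in rules)]

status_updates = [
    {"url": "https://social.example.com/jdoe", "message": "Hello"},
    {"url": "https://feed.example.com/asmith", "message": "News"},
    {"url": "https://other.example.com/jdoe", "message": "Ignored"},
]
employees = [{"id": 1, "short_name": "jdoe"}, {"id": 2, "short_name": "asmith"}]

s_key = lambda s: get_url_user_name(s["url"])  # s on GetURLUserName(url)
e_key = lambda e: e["short_name"]              # e on short_name
rules = [
    lambda s, e: get_url_site_name(s["url"]) == SOCIAL_SITE,  # match1
    lambda s, e: get_url_site_name(s["url"]) == SOCIAL_FEED,  # match2
]

blocked = blocked_join(status_updates, employees, s_key, e_key, rules)
assert blocked == disjunctive_match(status_updates, employees, s_key, e_key, rules)
print([(e["id"], s["message"]) for s, e in blocked])  # [(1, 'Hello'), (2, 'News')]
```

Storing one side in a dictionary keyed by blocking value means each record is compared only with records that share that value, which is why blocking can avoid evaluating the match rules on every pair.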

In some entity resolution flows, blocking is the first step and is used to partition the data. Alternatively, you can apply the match rules directly, without a block clause.

If an entity resolution rule contains a block clause but no match clause, then the block clause is internally transformed into a single match condition that contains the blocking join predicate.
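A minimal Python sketch of this case, again as a conceptual model rather than HIL's implementation: with no match rules, the partition join keeps every pair whose blocking values are equal, which is the same set of pairs that a single match condition containing only the blocking join predicate selects. The record shapes and the field names user and short_name are hypothetical.

```python
from collections import defaultdict

def block_only_join(left, right, left_key, right_key):
    """Block clause with no match clause: keep every pair from equal partitions."""
    partitions = defaultdict(list)
    for b in right:
        partitions[right_key(b)].append(b)
    return [(a, b) for a in left for b in partitions.get(left_key(a), [])]

def single_match_condition(left, right, left_key, right_key):
    """The same statement read as one match condition: match using <blockPred>."""
    return [(a, b) for a in left for b in right if left_key(a) == right_key(b)]

left = [{"user": "jdoe"}, {"user": "asmith"}, {"user": "nobody"}]
right = [{"short_name": "jdoe"}, {"short_name": "asmith"}, {"short_name": "jdoe"}]
lk = lambda a: a["user"]
rk = lambda b: b["short_name"]

pairs = block_only_join(left, right, lk, rk)
assert pairs == single_match_condition(left, right, lk, rk)
print(len(pairs))  # 3
```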



Last updated: 25 Jun 2015
End Fix Pack 11.4.02 information