
A block clause is a convenient shortcut for expressing that certain values of the entities being compared must be identical.
'block' ( entity_name 'on' <expr> )+
The block clause of an entity resolution statement must specify exactly two blocking expressions, each of which is associated with a blocking entity. The entity name that is specified for each blocking expression must be that of a top-level input entity that is declared in the from list (nested entities are not allowed). The two blocking expressions cannot be associated with the same blocking entity.
If the from clause contains only a single top-level entity, then block clauses are not allowed.
The expression of a block clause cannot contain comparison predicates, nor can it contain a conjunction of predicates.
All field references that are used in the expression of a block clause must be defined with respect to the blocking entity name that is given (the entity name is automatically added as a prefix to all field references in the blocking expression). Thus, when you specify field names in a blocking expression, the entity name for each field must be omitted.
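For example, the following statement blocks on the user name that is extracted from the URL of a status update and on the short name of an employee:
create link SocialEmps as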
select [ empid: e.id, upd_text: s.message ]
from StatusUpdates s, Employees e
block
s on GetURLUserName(url),
e on short_name
match using
match1: GetURLSiteName(s.url) = "NAME_OF_SOCIAL_SITE",
match2: GetURLSiteName(s.url) = "NAME_OF_SOCIAL_FEED";
The following statement, which does not use a block clause, is equivalent to the preceding one:
create link SocialEmps as
select [ empid: e.id, upd_text: s.message ]
from StatusUpdates s, Employees e
match using
match1: GetURLUserName(s.url) = e.short_name
and GetURLSiteName(s.url) = "NAME_OF_SOCIAL_SITE",
match2: GetURLUserName(s.url) = e.short_name
and GetURLSiteName(s.url) = "NAME_OF_SOCIAL_FEED";
In general, if an entity resolution statement has the form:
block <blockPred>
match using <match1>, <match2>, ...
then the functionality is semantically defined as:
match using (<blockPred> AND <match1>) OR (<blockPred> AND <match2>) OR ...
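The semantic definition above can be checked with a small Python sketch. It is illustrative only and is not how the language is implemented: `url_user_name` and `url_site_name` are hypothetical stand-ins for GetURLUserName and GetURLSiteName, the URL layout and the sample records are invented, and the site names are placeholders.

```python
from urllib.parse import urlparse

def url_user_name(url):
    # Hypothetical stand-in for GetURLUserName: first path segment of the URL.
    return urlparse(url).path.strip("/").split("/")[0]

def url_site_name(url):
    # Hypothetical stand-in for GetURLSiteName: host part of the URL.
    return urlparse(url).netloc

# Invented sample data for the two top-level entities.
updates = [
    {"url": "https://site.example/jdoe/1", "message": "hello"},
    {"url": "https://feed.example/asmith/7", "message": "lunch"},
    {"url": "https://other.example/jdoe/3", "message": "ignored"},
]
employees = [
    {"id": 1, "short_name": "jdoe"},
    {"id": 2, "short_name": "asmith"},
]

def block_pred(s, e):
    # Equality of the two blocking expressions (s on ..., e on ...).
    return url_user_name(s["url"]) == e["short_name"]

match_rules = [
    lambda s, e: url_site_name(s["url"]) == "site.example",  # match1
    lambda s, e: url_site_name(s["url"]) == "feed.example",  # match2
]

def link_with_block(updates, employees):
    # block <blockPred> combined with match rules <match1>, <match2>, ...
    return [
        (e["id"], s["message"])
        for s in updates for e in employees
        if block_pred(s, e) and any(rule(s, e) for rule in match_rules)
    ]

def link_expanded(updates, employees):
    # match using (<blockPred> AND <match1>) OR (<blockPred> AND <match2>) OR ...
    return [
        (e["id"], s["message"])
        for s in updates for e in employees
        if any(block_pred(s, e) and rule(s, e) for rule in match_rules)
    ]
```

Both functions return the same links for the sample data, because the blocking predicate distributes over the disjunction of match rules.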
In some entity resolution flows, blocking is the first step and is used to partition the data. Alternatively, you can use the match rules directly, without a block clause.
If an entity resolution rule contains a block clause but no match clause, then the block clause is internally transformed into a single match condition that contains the blocking join predicate.
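As a rough illustration of blocking as a partitioning step, the following Python sketch groups one entity by its blocking key and compares only records that fall into the same block. It is a sketch under stated assumptions, not a description of the actual execution strategy: the function `link_by_blocks`, the field names, and the sample records are all hypothetical. When no match rules are given, the blocking equality is the only link condition, which mirrors the case of a block clause without a match clause.

```python
from collections import defaultdict

def link_by_blocks(left, right, left_key, right_key, match_rules=()):
    # Partition the right-hand entity by its blocking key.
    blocks = defaultdict(list)
    for r in right:
        blocks[right_key(r)].append(r)

    links = []
    for l in left:
        # Only records in the same block are ever compared.
        for r in blocks.get(left_key(l), []):
            # With no match rules, the blocking equality alone decides the link.
            if not match_rules or any(rule(l, r) for rule in match_rules):
                links.append((l, r))
    return links

# Invented sample data; "user" plays the role of the extracted URL user name.
updates = [
    {"user": "jdoe", "site": "site.example"},
    {"user": "jdoe", "site": "other.example"},
    {"user": "nobody", "site": "site.example"},
]
employees = [
    {"id": 1, "short_name": "jdoe"},
    {"id": 2, "short_name": "asmith"},
]

def by_user(s):
    return s["user"]

def by_short_name(e):
    return e["short_name"]

def on_site(s, e):
    return s["site"] == "site.example"

# Block clause only: every pair that shares a blocking key is linked.
block_only = link_by_blocks(updates, employees, by_user, by_short_name)

# Block clause plus one match rule: pairs in a block must also satisfy the rule.
block_and_match = link_by_blocks(
    updates, employees, by_user, by_short_name, [on_site]
)
```

Partitioning first avoids comparing every status update with every employee, which is the usual motivation for blocking when the inputs are large.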