Handling conflicting information

When a Registration Record is created or updated, its property values might conflict with the property values in other Registration Records. They might also conflict with the property values in the associated Resource Records.

About this task

Conflicts arise in two different ways:
  • Two Registration Records for the same resource type have property values that satisfy the same identification rule. When they satisfy the same identification rule, they can be linked to the same Resource Record. However, the Registration Records also contain single-valued property values that conflict, and the conflicting values cannot co-exist in the same Resource Record. Multivalued properties with different values do not result in a conflict. The merged record includes the union of all values from all Registration Records for these properties.
  • Two Registration Records for the same resource type have different property values for the single-valued highest priority identification rule, so they cannot be linked to the same Resource Record. However, they have the same values for the properties that satisfy a lower priority, uniquely identifying identification rule, and this uniquely identifying set of property values cannot exist in two different Resource Records.

The Resource Registry resolves conflicts between two partially matching Registration Records by identifying one of the two Registration Records as "more reliable" and making everything consistent with that record. A Registration Record can be judged as more reliable if it contains newer data (a later observation time).

The general algorithm for creating reconciled Resource Records from two partially matching Registration Records is as follows:

Procedure

  1. Determine which of the two Registration Records is more reliable. In the absence of other information, the record with more recent observation times is more reliable.
  2. Create a Resource Record from the more reliable Registration Record (including all properties that are specified in the Resource Record's Resource Shape).
  3. Reconcile the less reliable Registration Record with the Resource Record as follows:
    1. If there is no conflict, then link the Registration Record to the same Resource Record. Merge in the properties from the Registration Record (as specified in the relevant Resource Shape).
    2. Else, if the conflict does not involve the highest priority identification rule for either the Registration Record or the Resource Record, then link the Registration Record to the same Resource Record. Invalidate conflicting property values in the Registration Record (they are subsumed by the more reliable values in the Resource Record). Merge in all valid properties from the Registration Record (as specified in the relevant Resource Shape).
    3. Else, if the conflict involves the highest priority identification rule for either the Registration Record or the Resource Record, invalidate any lower priority properties in the Registration Record that match, and create a second Resource Record from the still valid (non-matching) properties in the Registration Record (as specified in the relevant Resource Shape). Matching properties must be invalidated because two Resource Records cannot have the same values for any set of properties that make up a unique identification rule.
  4. This process results in either one or two Resource Records, with each of the Registration Records linked to one of the Resource Records.
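The pairwise procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Resource Registry's implementation: the names `RULES`, `Registration`, `Resource`, and `reconcile_pair` are hypothetical, every property is single-valued, each identification rule is a tuple of property names, and the Resource Shape is ignored (all valid properties are copied).

```python
from dataclasses import dataclass, field

# Hypothetical identification rules, highest priority first.
RULES = [("r1",), ("r2",), ("r3",)]

@dataclass
class Registration:
    name: str
    props: dict      # single-valued properties only (assumption)
    observed: int    # observation time; a larger value means newer data
    invalid: set = field(default_factory=set)

    def valid_props(self):
        return {k: v for k, v in self.props.items() if k not in self.invalid}

@dataclass
class Resource:
    props: dict
    linked: list = field(default_factory=list)

def rule_value(props, rule):
    """Values for a rule, or None if any of its properties is missing."""
    if all(p in props for p in rule):
        return tuple(props[p] for p in rule)
    return None

def top_rule(props):
    """Highest priority rule for which the record has all values."""
    return next((r for r in RULES if rule_value(props, r) is not None), None)

def reconcile_pair(a, b):
    # Step 1: the record with the later observation time is more reliable.
    more, less = (a, b) if a.observed >= b.observed else (b, a)
    # Step 2: build a Resource Record from the more reliable record.
    first = Resource(dict(more.valid_props()), [more.name])
    resources = [first]
    mine = less.valid_props()
    conflicting = [
        r for r in RULES
        if rule_value(mine, r) is not None
        and rule_value(first.props, r) is not None
        and rule_value(mine, r) != rule_value(first.props, r)
    ]
    tops = {top_rule(mine), top_rule(first.props)}
    if any(r in tops for r in conflicting):
        # Step 3.3: invalidate matching lower priority values, then
        # create a second Resource Record from what is still valid.
        for r in RULES:
            v = rule_value(mine, r)
            if v is not None and v == rule_value(first.props, r):
                less.invalid.update(r)
        resources.append(Resource(dict(less.valid_props()), [less.name]))
    else:
        # Steps 3.1 and 3.2: invalidate any conflicting values, then
        # link to the same Resource Record and merge the valid values.
        for k, v in mine.items():
            if k in first.props and first.props[k] != v:
                less.invalid.add(k)
        first.props.update(less.valid_props())
        first.linked.append(less.name)
    return resources

reg1 = Registration("reg1", {"r1": "A1", "r2": "B1", "r3": "C1"}, observed=2)
reg2 = Registration("reg2", {"r1": "A1", "r2": "B2", "r3": "C1"}, observed=1)
reg3 = Registration("reg3", {"r1": "A2", "r2": "B1", "r3": "C1"}, observed=1)
merged = reconcile_pair(reg1, reg2)  # conflict on r2 only: one Resource Record
split = reconcile_pair(reg1, reg3)   # conflict on r1: two Resource Records
```

In this sketch, `reg2` is linked to the same Resource Record as `reg1` and its `r2` value is invalidated, while `reg3` keeps only its `r1` value and receives its own Resource Record.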

Results

The following table illustrates the result from reconciling data from two Registration Records and three possible identification rules (R1, R2, R3). The table shows all combinations that involve at least one rule which matches and one rule which conflicts. Rule R1 has the highest priority and rule R3 has the lowest priority. R1 has possible values A1 or A2, R2 has possible values B1 or B2, and R3 has possible values C1 or C2. In this table, RegistrationRecord-1 is determined to be more reliable than RegistrationRecord-2.

In some of the combinations, the conflict is treated as a data update, and both Registration Records are linked to the first (yellow) Resource Record. In other cases, the conflict results in the creation of a second (blue) Resource Record, and the less reliable Registration Record is linked to that Resource Record. The Registration Record entries are colored to show which Resource Record they are associated with. Values in the less reliable RegistrationRecord-2 that are invalidated by the conflict resolution are shown in gray font. They are not reflected in the Resource Record and are ignored by subsequent reconciliation operations.

Matching and conflicting reconciliation example.

If a new or updated Registration Record matches portions of more than one other Registration Record, the reconciliation involves the complete set of matching Registration Records. This set includes all Registration Records matched by the new Registration Record, and all Registration Records matched by those records. Therefore, every Registration Record in the set matches at least one other Registration Record in the set, and none of them matches any Registration Record outside the set. The set forms a matching graph that has no matching connections to any other Registration Records.
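The complete set of matching Registration Records is the connected component of the matching graph that contains the new record. The following Python sketch collects that set with a breadth-first search. The names `RULES`, `matches`, and `matching_set` are hypothetical, and "match" is assumed to mean that both records have identical values for every property of at least one identification rule.

```python
from collections import deque

# Hypothetical identification rules, highest priority first.
RULES = [("r1",), ("r2",), ("r3",)]

def matches(a, b):
    """True if both records have identical values for at least one rule."""
    return any(
        all(p in a and p in b and a[p] == b[p] for p in rule)
        for rule in RULES
    )

def matching_set(start, records):
    """Names of all records reachable from `start` through matches."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for name, props in records.items():
            if name not in seen and matches(records[current], props):
                seen.add(name)
                queue.append(name)
    return seen

records = {
    "new": {"r1": "A1", "r2": "B1"},
    "x": {"r2": "B1", "r3": "C1"},   # matches "new" on r2
    "y": {"r3": "C1"},               # matches "x" on r3, but not "new"
    "z": {"r1": "A9", "r3": "C9"},   # matches nothing in the set
}
component = matching_set("new", records)
```

Here `component` contains `new`, `x`, and `y`: record `y` is included only because it matches `x`, and `z` is excluded because it matches no record in the set.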

With a set of partially matching Registration Records, the entire set can be reconciled by reconciling the records one at a time, starting with the most reliable. The following process describes a reconciliation sequence:

  1. Take the most reliable Registration Record and construct a Resource Record from it.
  2. While there are more Registration Records to be processed:
    1. Take the next most reliable Registration Record from the set.
    2. Reconcile it with all Resource Records which were created so far and which match the current Registration Record as follows:
      1. Process the Resource Records in order of decreasing identification rule priority.
      2. Perform each pair-wise reconciliation as per the previous discussion, ignoring any properties in the Registration Record that were invalidated by previous reconciliation operations.
      3. Because the data in the Resource Records always comes from the most reliable Registration Record, some additional properties in the current Registration Record might be invalidated by the reconciliation operation.
    3. This process might result in creation of a new Resource Record which can be added to the existing list for reconciliation with less reliable Registration Records.
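The sequence above can be sketched as a loop over the Registration Records in decreasing order of reliability. This is one possible reading of the steps, not the Resource Registry's implementation. The names `RULES`, `compare`, `valid_props`, and `reconcile_set` are hypothetical, and all properties are single-valued. The sketch also assumes that a record is linked to the first matching Resource Record with which it has no highest priority conflict, and that values matching any other Resource Record are invalidated.

```python
# Hypothetical identification rules, highest priority first.
RULES = [("r1",), ("r2",), ("r3",)]

def rule_value(props, rule):
    """Values for a rule, or None if any of its properties is missing."""
    if all(p in props for p in rule):
        return tuple(props[p] for p in rule)
    return None

def top_rule(props):
    """Highest priority rule for which the record has all values."""
    return next((r for r in RULES if rule_value(props, r) is not None), None)

def compare(reg, res):
    """Split the rules both records have values for into matches and conflicts."""
    match, conflict = [], []
    for rule in RULES:
        a, b = rule_value(reg, rule), rule_value(res, rule)
        if a is not None and b is not None:
            (match if a == b else conflict).append(rule)
    return match, conflict

def valid_props(props, bad):
    return {k: v for k, v in props.items() if k not in bad}

def reconcile_set(registrations):
    """registrations: list of (name, props, observed) tuples."""
    ordered = sorted(registrations, key=lambda r: r[2], reverse=True)
    resources = []   # each entry: {"props": {...}, "linked": [...]}
    invalid = {}     # registration name -> invalidated property names
    for name, props, _ in ordered:
        bad = invalid.setdefault(name, set())
        # Matching Resource Records, ordered by the priority of the
        # highest priority rule on which they match.
        candidates = []
        for res in resources:
            match, _ = compare(valid_props(props, bad), res["props"])
            if match:
                candidates.append((RULES.index(match[0]), res))
        candidates.sort(key=lambda c: c[0])
        target = None
        for _, res in candidates:
            current = valid_props(props, bad)  # skip invalidated values
            match, conflict = compare(current, res["props"])
            tops = {top_rule(current), top_rule(res["props"])}
            if target is None and not any(r in tops for r in conflict):
                # Treat as a data update: invalidate conflicting values.
                target = res
                bad.update([k for k, v in current.items()
                            if k in res["props"] and res["props"][k] != v])
            else:
                # Values that match a different Resource Record are invalidated.
                for rule in match:
                    bad.update(rule)
        if target is None:
            # No Resource Record can absorb this record: create a new one.
            target = {"props": {}, "linked": []}
            resources.append(target)
        target["props"].update(valid_props(props, bad))
        target["linked"].append(name)
    return resources, invalid

resources, invalid = reconcile_set([
    ("reg1", {"r1": "A1", "r2": "B1", "r3": "C1"}, 3),
    ("reg2", {"r1": "A1", "r2": "B2", "r3": "C1"}, 2),
    ("reg3", {"r1": "A2", "r2": "B1"}, 1),
])
```

In this example, `reg2` is merged into the Resource Record created from `reg1` after its `r2` value is invalidated. The `r1` value of `reg3` conflicts on the highest priority rule, so its matching `r2` value is invalidated and a second Resource Record is created from its `r1` value.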

After all Registration Records are processed, each Registration Record is linked to a Resource Record, and multiple Registration Records might be linked to the same Resource Record (merge scenario). Conflicting property values might be invalidated in some of the less reliable Registration Records.

Registration Records are never merged or split; only Resource Records are. However, values in Registration Records can be marked as invalid so that they are not used in Resource Records.