Nullability analysis

Nullability analysis is used to refine the existing nulls-allowed indicator metadata definition for columns based on the actual data values that are present in the column.

Function

Nullability analysis is useful if the original column nulls allowed indicator was set without knowledge or regard to the actual data values that the column would contain. If the column's nulls allowed indicator needs to be changed, the existing metadata for the column can be changed in the original data source, or can be used to define the column in a new target schema for the data.

Analysis technique

The decision to infer whether the column should allow nulls is based on the percentage of nulls that is present in the column relative to a percentage threshold.

System capability

During column analysis processing, the system constructs each column's frequency distribution and then analyzes each distinct value to determine if the column contains null values. If the column contains nulls, the frequency percentage of those nulls is compared to the nullability threshold (for example, 1%) applicable for that column. If the actual percentage of nulls is equal or greater than the nullability threshold percentage, the system infers that nulls should be allowed. If the actual null percentage is less than the nullability threshold percentage or there are no nulls in the column, the system infers that nulls are not allowed. This system-inferred nulls allowed indicator is then recorded in the repository as the inferred selection and becomes by default the chosen selection, as shown in the following example.

Figure 1. An example of the system-inferred nulls allowed indicator
Shows the system-inferred nulls allowed indicator recorded in the repository as the inferred selection.

User responsibility

You can view the nulls analysis when the column analysis review of columns is viewed. At the detailed column view, nullability analysis has its own panel for viewing results as part of the properties analysis tab. From this panel, you can accept the system-inferred nulls allowed indicator or can override the system's inference with the opposite choice. If you override the system-inferred nulls allowed indicator selection, the new selection is recorded in the repository as the chosen nulls allowed indicator. The process is ultimately completed when you review all of the column properties and mark the column property function as reviewed.

Interpreting results

Typically, the appropriate nulls allowed indicator setting for a column will be obvious to a user by the presence of nulls in the column.

For example, there can only be existing nulls in the column if the column currently is set to allow nulls. To change the column to nulls not allowed will require that the current null values be replaced by some other data value.

However, if there are no existing nulls in the column, it might be because no nulls have been entered or because the column is set to not allow nulls. In this case, you are free to set the nulls allowed indicator as you wish.

Like other column properties, there is an advantage to maintain the consistency of allowing nulls across columns in the data environment.

Decisions and actions

You can either accept the system-inferred nulls allowed indicator or override the inference with the other choice.

Once that decision has been made, you can continue to review the other column properties or can mark the column properties review as complete.

Performance considerations

There are no significant system performance considerations for the nullability analysis function.