SQL5104N or SQL5105N error returned when querying a Datalake table

When querying a Datalake table, an SQL5104N or SQL5105N error may be returned if a NULL value is read for a column that is defined as NOT NULL.

Symptoms

If a NULL value is read for a column defined as NOT NULL, the query fails with an SQL5104N or SQL5105N error.

Causes

Several scenarios can cause this situation. The most common is altering a Datalake table that already contains data to add a new column defined as NOT NULL. A query that reads the pre-existing data finds no value for the new column, so a NULL value is returned for a column that does not permit one. The same issue can occur when a table that was altered in this manner is imported from watsonx.data.
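As a sketch, the failure scenario looks like the following DDL sequence. The table and column names are hypothetical, and the exact statement syntax may vary by Db2 release:

```sql
-- Hypothetical example: table with existing data gains a NOT NULL column.
CREATE DATALAKE TABLE sales (id INT) STORED AS PARQUET;
INSERT INTO sales VALUES (1), (2);                         -- existing data rows
ALTER TABLE sales ADD COLUMN region VARCHAR(10) NOT NULL;  -- new NOT NULL column
SELECT * FROM sales;  -- existing rows have no value for REGION, so the read
                      -- surfaces a NULL in a non-nullable column (SQL5104N)
```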

In most cases, this will result in an SQL5104N error being returned:
SQL5104N  The statement failed because the data in HDFS cannot be mapped to 
the Hadoop table definition.  Hadoop table name: "mcknight.kathy".  Column 
index: "1".  Log entry identifier: "[BSL-1-1e2754f01]".  Reason: "NULL in a 
non-nullable column".  SQLSTATE=530AC
In some cases, an SQL5105N error may be returned instead, indicating that the data for a column is missing, as in the following example:
SQL5105N  The statement failed because a Big SQL component encountered an
error.  Component receiving the error: "BigSQL IO".  Component returning the
error: "UNKNOWN".  Log entry identifier: "[BSL-1-1d7af4eb8]".  Reason:
"Missing required field: c2".  SQLSTATE=58040

Resolving the problem

To avoid this issue, do not add a column with a NOT NULL constraint to a Datalake table that already contains data.

If the issue has already occurred, it can be resolved by altering the table to remove the NOT NULL constraint from the affected column.
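A minimal sketch of the corrective statement is shown below. The table and column names are hypothetical, and the supported ALTER syntax for Datalake tables may vary by Db2 release:

```sql
-- Hypothetical example: drop the NOT NULL constraint so that existing
-- rows with no value for the column can be read back as NULL.
ALTER TABLE sales ALTER COLUMN region DROP NOT NULL;
```

After the constraint is removed, queries against the existing data return NULL for the affected column instead of failing.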