IBM Support

Select * from <tablename> returns wrong number of rows. Why?

Technical Blog Post


Abstract

Select * from <tablename> returns wrong number of rows. Why?

Body

Hello db2 DBAs

I recently had an interesting situation, which seemed on the face of it highly unusual. So I wanted to share with this group.

PROBLEM:
Cust called in indicating he has inserted 32 rows into a table.  Then he runs the command
db2 "select count (*) from <tabname>"
returned correct 32 value.

Then he ran the command
db2 "select * from <tabname>"

However this returned just one row!!!   WHY ? db2 has encountered a bug or what?

My thought processes in debugging this was as follows:

1. Are we querying the right table under the right schema?
Cust confirmed yes he was.


2. Are the results happening via a remote GUI app or even from the db2 cli on the local server?
It was happening via db2 CLI within the server itself!!!  
So that eliminated a problem with the way the client was reporting the data!!

 

3. Any Row or column based access control in place against this table ?
No.

 

4. Since the behavior was easily reproducible, could it be an optimizer issue?
Possible.  Asked for the db2look output for the tablename involved.

That showed an interesting tidbit.  The table definition had an "informational constraint" as follows:
ALTER TABLE "Schema_name"."Tab_name"
        ADD UNIQUE
                ("column_name")
        NOT ENFORCED;  
        

And this explained what was going on under the covers for the seemingly incorrect result returned by db2!!

Since this constraint was defined as "NOT ENFORCED" db2 does not enforce uniquness. It expects the end user to make sure the column in question does not have any duplicate values.  This way db2 can avoid creating an index and use the index to return the actual values.   This is ideal in data warehouse environment where you end up loading millions or billions of rows into the table.  Time savings can be substantial.

However in this case cust had all the duplicate values for that column.  So db2 optimizer was "assuming" this column value to be unique and returned just one row satisfying that condition.  

 

MORAL of the story:  Beware of the way db2 would interpret the data when they contain informational constraints and not enforced.  You can shoot your foot if not careful.

The work around was to add the option DISABLE OPTIMIZATION for that constraint.  

That way db2 will not use that constraint for query optimization and return correct results.

 

Here is the link to our db2 Info center which documents this behavior.

http://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0023324.html?lang=en

Some snippets from that page.

"Informational RI (referential integrity) constraints are used to optimize query performance, the incremental processing of REFRESH IMMEDIATE MQT, and staging tables. Query results, MQT data, and staging tables might be incorrect if informational constraints are violated."

and

Specify an informational constraint only (emphasis added..) if the table data is independently known to conform to the constraint. Because the DB2® database manager does not enforce uniqueness for these constraints, if the table data violates the not-enforced constraint, incorrect results can occur. You cannot reference not-enforced primary key constraints in any enforced referential integrity definitions.

 

I also came across this blog post during my research, which confirmed our analysis

http://db2commerce.com/2014/06/10/informational-constraints-benefits-and-drawbacks/

Lesson well learned for both customer and me!!

Hope you enjoyed it!!  Feel free to leave your comments in this page.  I very much appreciate it.

Murali

 

[{"Business Unit":{"code":"BU029","label":"Data and AI"}, "Product":{"code":"SSEPGG","label":"DB2 for Linux- UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":""}]

UID

ibm11140868