Comments (6) Visits (13569)
The discussion about the pros and cons of NULL in SQL has been raging for as long as SQL has been around.
In this post I won't dare wading into this controversy.
For those interested in such matters I point to the paper "Much ado about nothing" by C.J. Date.
Here, instead I want to discuss the implications of NULL, given that it is part of DB2's SQL.
What is NULL?
NULL represents the absence of a value.
Technically it is not a value in itself. However we often use the phrase "NULL value".
But then we also use database where we should be using database management system...
The meaning of NULL in a column is to some degree up to the user to define.
For example if delivery_date is NULL, then this can mean that an item has not yet been delivered.
Or it could mean that the delivery date is unknown.
One thing is for certain, however: NULL never has the meaning of the neutral value of the type.
A good description for NULL, I find, is UNKNOWN. Most behavior of NULL can be described by that concept.
The type of NULL
The simple specification of the NULL keyword represents an unknown thing of an unknown type.
NULL can be cast to any type using a cast expression, such as CAST(NULL AS INTEGER)..
NULL can also be implicitly cast based on the context.
For example: SET x = NULL will cast NULL to the type of X before assigning x the state of NULL.
Prior to DB2 9.7 NULL was therefore only allowed in explicitly listed conditions.
Since DB2 9.7 introduced implicit casting NULL can appear nearly everywhere a value can appear.
But it can happen that DB2 cannot figure out the implied type of NULL.
In that case errors may be raised.
NULL as a function argument
The vast majority of functions supplied by DB2 accept NULL as input.
However, if any argument to a function is NULL the function returns NULL.
There are three notable classes of exception to that rule that I can think of though:
First, some functions are specifically intended to deal with NULL.
These functions include:
Second, most aggregate functions such as MAX and MIN ignore NULLs. The same is true for their scalar companions:
Comparing and testing for NULL
The only way to test for a NULL is to use the IS NULL or IS NOT NULL predicate:
WHERE and NULL
It is important to note that the WHERE clause in a select statement test for True.
Any row predicate that evaluates to False or NULL will be filtered out.
SET NULL -; CREATE OR REPLACE VARIABLE var INTEGER; SET var = 1; SELECT * FROM (VALUES (1), (NULL), (3)) AS T(c1) WHERE c1 = var; C1 ----------- 1 SET var = NULL; SELECT * FROM (VALUES (1), (NULL), (3)) AS T(c1) WHERE c1 = var; No rows were retrieved.
To find NULLs the following predicate is required:
SELECT * FROM (VALUES (1), (NULL), (3)) AS T(c1) WHERE c1 = var OR (c1 IS NULL AND var IS NULL); C1 ----------- -
NULL and CHECK constraints
A WHERE clause requires row to evaluate to True to allow it to pass.
A CHECK constraint on the other hand requires a row not to violate the constraint.
In other words a check constraint if it evaluates to True or Unknown.
CREATE TABLE emp(name VARCHAR(20), salary INTEGER CHECK (salary > 20000)); INSERT INTO emp VALUES ('John', NULL); SELECT * FROM emp; NAME
To disallow NULL the check constraint must test for the NULL or the column must be explicitly defined as NOT NULL.
DROP TABLE emp; CREATE TABLE emp(name VARCHAR(20), salary INTEGER CHECK (salary > 20000 AND salary IS NOT NULL)); INSERT INTO emp VALUES ('John', NULL); SQL0545N The requested operation is not allowed because a row does not satisfy the check constraint "SER
ORDER and NULL
A NULL is not bigger or smaller than any value of the types domain when NULLs are compared.
The result of such a comparison is always NULL or UNKNOWN itself.
But when order is imposed on a result set using an ORDER BY clause it makes sense to place all NULLs together.
SELECT * FROM (VALUES 1, NULL, 2, NULL, 3, NULL) AS T(c1) ORDER BY c1; C1 ----------- 1 2 3 - - - 6 rows were retrieved.
Presently an ORDER BY clause of a select query considers NULL to be "bigger" than any value in the type domain.
That is NULLs will always we sorted last in an ascending order and first is a descending order.
An ORDER BY clause of an OLAP function however can override this behavior and sort NULLs last or first as desired.
SELECT ROW_NUMBER() OVER(ORDER BY c1 ASC NULLS FIRST) AS rn, c1 FROM (VALUES 1, NULL, 2, NULL, 3, NULL) AS T(c1) ORDER BY c1; RN C1 ----
DISTINCT, GROUP BY and NULL
The DISTINCT for NULL behavior is interesting.
When counting the number of distinct values NULL is ignored)
SELECT COUNT(DISTINCT c1) AS distinct, COUNT(c1) AS count_c1, COUNT(*) AS count FROM (VALUES 1, NULL, 2, NULL, 2, NULL) AS T(c1) ; DISTINCT COUNT_C1 COUNT ----------- ----------- ----------- 2 3 6
Note that a regular COUNT(c1) already ignores the NULLs in compliance with teh general rule explained earlier for aggregate functions.
Using the DISTINCT keyword in the select list will treat all NULLs as one distinct "value":
SELECT DISTINCT c1 FROM (VALUES 1, NULL, 2, NULL, 2, NULL) AS T(c1) ; C1 ----------- 1 2 -
The same is true for GROUP BY:
SELECT c1 FROM (VALUES 1, NULL, 2, NULL, 2, NULL) AS T(c1) GROUP BY c1 ; C1 ----------- 1 2 -
Unique indices and NULL
Last but not least. In DB2 NULL is equal to NULL as far as uniqueness within an index is concerned.
For more on that I refer to my past BLOG post on unique where not null indexes.
I tried to collect all sorts of NULL related issues into one sport in this post.
No doubt I have omitted many. If you find holes, please comment and I will try to add the missing pieces.