Would you like to get a four-day, expert-guided expedition into DB2 10 for LUW?
If so, then strap on those boots and sign up for one of the many events happening soon around the world!
At the end of the class you can also get your DB2 10 certifications.
For more information and to register go here.
If you don't see a boot camp in your neighborhood, don't despair.
Instead, either check back once in a while or contact email@example.com with the subject "Bootcamp question".
As part of my job I work in a group called the SQL Language council (SLC).
The purpose of that group is roughly threefold:
I like those.
On the flip side, meetings can be excruciatingly boring and consumed by discussions of minute details.
I have a tendency not to be found in those.
But what is always the most intriguing about the SLC is the conscious attempt to not paint ourselves into a corner when it comes to defining language.
With product cycles of 1 1/2 to 3 years maintaining a vision across multiple releases is quite a challenge.
Therefore I love it when, at last, syntax and ideas that have been in place for over a decade finally become reality, as features pile on top of features and lock into place, adding up to more than the sum of their individual parts.
Generated column support is a tale worth retelling for that reason: it has gained more and more function over the releases, culminating in DB2 10.
Setting the stage
As always, let's take a look at an example to motivate the problem generated columns are meant to solve.
First assume an expensive function.
Perhaps this is a function deriving your credit rating based on numerous properties of your financial standing.
As a proxy for such a real-life expensive function I choose the "3a+1" problem once more.
The function will (let's hope) eventually converge to 1 for any positive whole number and return the number of steps it took.
CREATE OR REPLACE FUNCTION three_a_plus_one(IN n INTEGER)
  RETURNS INTEGER
  SPECIFIC three_a_plus_one
  CONTAINS SQL
  DETERMINISTIC
  NO EXTERNAL ACTION
BEGIN
  DECLARE a INTEGER;
  DECLARE steps INTEGER DEFAULT 0;
  SET a = n;
  WHILE a <> 1 DO
    SET a = CASE WHEN (a / 2) * 2 = a THEN a / 2 ELSE 3 * a + 1 END,
        steps = steps + 1;
  END WHILE;
  RETURN steps;
END
/

VALUES three_a_plus_one(27);

1
-----------
        111
Already for a small number such as 27 it takes 111 steps for the function to converge on 1.
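For readers who want to check the step counts outside of DB2, here is a minimal Python sketch of the same 3a+1 computation (the function name mirrors the SQL one; this is an illustration, not part of DB2):

```python
def three_a_plus_one(n):
    """Count the steps for n to reach 1 under the 3a+1 rule:
    halve even numbers, map odd numbers a to 3*a + 1."""
    a, steps = n, 0
    while a != 1:
        a = a // 2 if a % 2 == 0 else 3 * a + 1
        steps += 1
    return steps

print(three_a_plus_one(27))  # 111
```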
Now, because these values are so expensive to produce we want to pre-compute them and store the result in a table.
CREATE TABLE three_a (n INTEGER NOT NULL PRIMARY KEY,
                      steps INTEGER NOT NULL);

INSERT INTO three_a
  WITH rec(n, steps) AS (VALUES (1, three_a_plus_one(1))
                         UNION ALL
                         SELECT n + 1, three_a_plus_one(n + 1)
                           FROM rec
                           WHERE n < 10000)
  SELECT * FROM rec;
COMMIT;

SELECT * FROM three_a ORDER BY n FETCH FIRST 10 ROWS ONLY;

N           STEPS
----------- -----------
          1           0
          2           1
          3           7
          4           2
          5           5
          6           8
          7          16
          8           3
          9          19
         10           6

10 rows were retrieved.
We can query the table and for any n retrieve the number of steps.
But there are problems with such a table.
To deal with the first problem we can add a check constraint that ensures that the dependency is always true:
ALTER TABLE three_a ADD CONSTRAINT steps CHECK (steps = three_a_plus_one(n));
As long as we add the right values everything is fine:
INSERT INTO three_a VALUES(300000, three_a_plus_one(300000));

SELECT * FROM three_a WHERE n = 300000;

N           STEPS
----------- -----------
     300000          52
Let's add one more twist to this before taking a closer look.
Having to supply a value that is really generated, and which we can only get wrong, is annoying.
This is a good usage for a before trigger:
CREATE TRIGGER steps_gen
  BEFORE INSERT OR UPDATE ON three_a
  REFERENCING NEW AS n
  FOR EACH ROW
BEGIN
  SET n.steps = three_a_plus_one(n.n);
END;
/

INSERT INTO three_a(n) VALUES (300001);

SELECT * FROM three_a WHERE n = 300001;

N           STEPS
----------- -----------
     300001         189
Now, independent of whether we use the trigger to insert the row or we specify the function directly, we are effectively executing this expensive function twice.
The first execution is within the trigger or the VALUES clause.
The second execution is by the check constraint which does nothing else but double check what must be true by design.
The second problem, as previously mentioned, is that an existing application cannot exploit the precomputed values without being changed.
SELECT COUNT(*) FROM three_a WHERE three_a_plus_one(n) = 100;

1
-----------
         36

EXPLAIN PLAN FOR SELECT COUNT(*) FROM three_a WHERE three_a_plus_one(n) = 100;

ID     TYPE            OBJECT_SCHEMA        OBJECT_NAME          PREDICATE_TEXT
------ --------------- -------------------- -------------------- --------------------------------------------------
     1 RETURN
     2 GRPBY
     3 TBSCAN          SERGE                THREE_A              ( "SERGE "."THREE_A_PLUS_ONE"(Q1.N) = 100)
Expression Generated Columns
Generated columns are meant to solve both problems stated above:
Using an expression generated column, the above table can be expressed like this:
DROP TABLE three_a;

CREATE TABLE three_a(n     INTEGER NOT NULL PRIMARY KEY,
                     steps INTEGER NOT NULL GENERATED ALWAYS AS (three_a_plus_one(n)));
The syntax is very similar to that of an IDENTITY column (also introduced in DB2 7.1), except that instead of the identity properties an expression is provided.
Since the generated column acts as a check constraint there are several rules that apply to the expression to ensure the constraint is always true:
Identity columns by contrast do allow override.
INSERT INTO three_a(n)
  WITH rec(n) AS (VALUES (1)
                  UNION ALL
                  SELECT n + 1 FROM rec WHERE n < 100000)
  SELECT * FROM rec;
There was no need to specify the function anywhere in the insert.
In fact we did not even need to use the column at all.
DB2 will automatically inject the equivalent of a before trigger to produce the required value.
In contrast to the handcrafted schema we first used there is no need to check consistency of the database since DB2 is fully in control.
INSERT INTO three_a(n) VALUES (300000);

Optimized Statement:
-------------------
INSERT INTO SERGE.THREE_A AS Q3
  SELECT 300000, "SERGE "."THREE_A_PLUS_ONE"(300000)
  FROM (VALUES 1) AS Q1
DB2 has injected the SQL function as if it were user specified with no additional overhead.
In the above two INSERT statements the column has been omitted from the INSERT column list altogether.
If a column is omitted from the column list DB2 will default that column.
For a generated column defaulting implies generation of the computed value.
Therefore you can also specify the DEFAULT keyword as a placeholder for the value:
INSERT INTO three_a VALUES (300003, DEFAULT);
You may not, however, provide an explicit value, even if that value is correct:
INSERT INTO three_a VALUES (300005, three_a_plus_one(300005));
SQL0798N  A value cannot be specified for column "STEPS" which is defined as GENERATED ALWAYS.
Very similar to inserting, DB2 will maintain the generated column value on updates.
If a value the generated column depends on is changed, the generated column will be adjusted accordingly.
SELECT steps FROM three_a WHERE n = 300003;

STEPS
-----------
        158

UPDATE three_a SET n = 251111 WHERE n = 300003;

SELECT steps FROM three_a WHERE n = 251111;

STEPS
-----------
        256
Despite our not having defined an explicit check constraint, DB2 not only knows of its existence; it can also use that knowledge to rewrite the query:
SELECT COUNT(*) FROM three_a WHERE three_a_plus_one(n) = 100;

1
-----------
         36

Optimized Statement:
-------------------
SELECT Q3.$C0
  FROM (SELECT COUNT(*)
          FROM (SELECT $RID$
                  FROM SERGE.THREE_A AS Q1
                  WHERE (Q1.STEPS = 100)) AS Q2) AS Q3
Note how the optimized SQL returned by db2exfmt shows that the expression has been replaced by the column STEPS.
This opens up an interesting possibility.
If the generated column itself is indexed then the optimizer can choose that index after DB2 has exploited the generated column reference.
That is, de facto, a generated column provides function similar to an index on an expression.
CREATE INDEX steps ON three_a(steps);

SELECT COUNT(*) FROM three_a WHERE three_a_plus_one(n) = 100;

1
-----------
         36

        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
          1
        GRPBY
       (   2)
       6.81703
          1
         |
       38.8879
       IXSCAN
       (   3)
       6.81315
          1
         |
        10002
    INDEX: SERGE
        STEPS
         Q1
The difference to an index on an expression, of course, is that here the expression is actually materialized in the table.
This costs a bit more space. It does, however, also speed up queries that do table scans, such as a hash join over the expression.
Altering generated columns
So what do you do if you have a functionally dependent column today without using a generated column syntax?
If you want to exploit the functionality you want to make that column a generated column.
Another scenario may be where you want to replace the generated column with another expression, perhaps to fix a logic error.
To achieve that you can use the regular ALTER TABLE statement, but with a twist.
Since the table currently has data you cannot simply replace the expression.
When adding a true check constraint DB2 will never change the data in the table.
In this case however this may be necessary.
DB2 not only needs to validate rows, it also needs to fix them.
Therefore, whenever you add a generated column or make a non-generated column a generated column, you need to place the table into check pending mode and use SET INTEGRITY to recover from that once the ALTER TABLE is complete.
Altering columns to add or remove the generated column properties (along with identity and default) was introduced in DB2 8.2.
In the following example we will replace the three_a_plus_one() function with a faster inlined version:
CREATE OR REPLACE FUNCTION three_a_plus_one(IN n INTEGER)
  RETURNS INTEGER
  SPECIFIC three_a_plus_one
  CONTAINS SQL
  DETERMINISTIC
  NO EXTERNAL ACTION
BEGIN ATOMIC
  DECLARE a INTEGER;
  DECLARE steps INTEGER DEFAULT 0;
  SET a = n;
  WHILE a <> 1 DO
    SET a = CASE WHEN (a / 2) * 2 = a THEN a / 2 ELSE 3 * a + 1 END,
        steps = steps + 1;
  END WHILE;
  RETURN steps;
END
/
SQL0478N  DROP, ALTER, TRANSFER OWNERSHIP or REVOKE on object type "FUNCTION" cannot be processed because there is an object "SERGE.THREE_A", of type "TABLE", which depends on it.
The function cannot be replaced while the generated column is in place.
So we need to drop the generated column property, replace the function and then add it back.
ALTER TABLE three_a ALTER COLUMN steps DROP EXPRESSION;
DB2 10: Hiding generated columns
Remember how I stated above that one of the benefits of generated columns is that they can be exploited without changing the applications?
That was true for queries which use the expression. But any queries of the form SELECT * or INSERT INTO T VALUES will break when a new column is being added.
This has often been brought forward as a reason not to use generated columns.
In DB2 9.5, ROW CHANGE TOKEN and ROW CHANGE TIMESTAMP columns were added in support of optimistic locking strategies.
These new kinds of columns can be hidden from applications using an IMPLICITLY HIDDEN property.
In DB2 10, with temporal query support, this feature is available to all columns.
This includes generated columns!
You can hide them from select-star syntax as well as from the INSERT-column list.
This is very convenient!
To show the effect let's replay the example from the beginning.
But this time we will add the generated column after the fact.
DROP TABLE three_a;
Now we add the generated column with the goal of speeding up the select query.
We ultimately want to use an index on the expression.
SET INTEGRITY FOR three_a OFF;

ALTER TABLE three_a
Note that implicitly hidden columns can still be referenced explicitly:
SELECT steps FROM three_a WHERE n = 5678;

STEPS
-----------
        129
Some years ago I visited a business partner to help them overcome some performance problems on DB2.
They had a rather simple quality assurance workload which ran on a simple Windows PC.
On another DBMS that workload executed without a problem and with acceptable throughput.
On DB2 however that same workload completely overwhelmed the machine and throughput was, frankly, abysmal.
Needless to say the partner was not too amused.
We quickly discovered that the CPU was pegged at 100%.
We further discovered that DB2 saturated the CPU through compilation of SQL statements.
A dump of the package cache showed that it was flooded with simple update statements, each identical to the next except for the input values.
That in itself could not explain why the CPU usage was so high.
The workload was identical to the one run on the competitor's DBMS.
It turned out that there was a recursive update trigger using inline SQL PL.
So what looked like a simple statement was actually quite complex.
The other DBMS did not have the concept of inline SQL PL, so it did not have to pay for trigger compilation with each statement.
Once the problem was known, the fix was simple: using parameter markers, there was no need to keep compiling SQL statements.
The inline SQL PL provided an additional boost propelling DB2's performance past the other DBMS.
While the outcome was good, the partner needed to do work to achieve acceptable performance.
Lesson learned: DB2 must run really bad SQL really fast.
In DB2 9.7 two features were added which are meant to fix the issue we encountered:
A Problem Scenario
It is a best practice when working with DB2, or most other DBMSs, to use parameter markers or host variables when submitting dynamic SQL.
DB2 maintains a cache of recently executed SQL statements.
Every time a new SQL statement is submitted DB2 will match it to the existing cache entries.
If a match is found then there is no need to parse and compile.
DB2 can execute the SQL statement right away.
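As a sketch (and emphatically not DB2's actual implementation), the package cache behaves like a dictionary keyed by the exact statement text; compilation only happens on a miss:

```python
# Toy model of a package cache: compile only when the exact text is new.
cache = {}
compilations = 0

def execute(sql_text):
    global compilations
    plan = cache.get(sql_text)
    if plan is None:
        compilations += 1            # expensive: parse, compile, optimize
        plan = ("compiled", sql_text)
        cache[sql_text] = plan
    return plan                      # run the (possibly cached) plan

# Literal values inlined: every statement text is new, so each one compiles.
for i in range(5):
    execute("SELECT c1 + 1 FROM (VALUES(%d)) AS T(c1)" % i)
assert compilations == 5

# Parameter marker: one text, one compilation, five executions.
compilations = 0
cache.clear()
for i in range(5):
    execute("SELECT c1 + 1 FROM (VALUES(CAST(? AS INTEGER))) AS T(c1)")
assert compilations == 1
```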
Unfortunately many application developers do not follow that practice.
Even more unfortunate is that several abstraction APIs which generate SQL under the covers also do not use parameter markers.
Let's take a look at what happens by setting up a simple experiment.
VARIABLE elapsed INTEGER;

BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  WHILE i < 50000 DO
    SET txt = 'SELECT c1 + 1 FROM (VALUES(' || i || ')) AS T(c1)';
    PREPARE stmt FROM txt;
    OPEN cur;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/

PRINT elapsed

42

COLUMN STMT_TEXT FORMAT A80
SET WRAP OFF
SET LINESIZE 200

SELECT VARCHAR(SUBSTR(STMT_TEXT, 1, 80)) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL))
  WHERE STMT_TEXT LIKE 'SELECT c1%'
  FETCH FIRST 10 ROWS ONLY;

STMT_TEXT
--------------------------------------------------------------------------------
SELECT c1 + 1 FROM (VALUES(49993)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49998)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49894)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49940)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49963)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49925)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49906)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49968)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49994)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49898)) AS T(c1)

10 rows were retrieved.
So 50,000 executions of this simple SQL statement take about 42 seconds.
And each statement was compiled and inserted into the package cache.
The best fix
The above scenario represents bad SQL programming.
The correct way to write SQL is to use parameter markers and bind the values when the query is being executed.
What happens when we use a parameter marker?
FLUSH PACKAGE CACHE DYNAMIC;

BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  WHILE i < 50000 DO
    SET txt = 'SELECT c1 + 1 FROM (VALUES(CAST(? AS INTEGER))) AS T(c1)';
    PREPARE stmt FROM txt;
    OPEN cur USING i;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/

PRINT elapsed

2
That's a 20-fold speedup when using parameter markers!
Here DB2 will match the txt content with the previously generated plan and simply assign the stmt handle.
We can do even better by doing that ourselves: we pull the PREPARE out of the loop:
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  SET txt = 'SELECT c1 + 1 FROM (VALUES(CAST(? AS INTEGER))) AS T(c1)';
  PREPARE stmt FROM txt;
  WHILE i < 50000 DO
    OPEN cur USING i;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/

PRINT elapsed;

1
Finer-grained measurement would be better here.
We could use a technique discussed earlier in this blog to subtract timestamps exactly.
But this last part isn't really the point of investigation.
What we want to achieve is to speed up this test without changing the application!
Using the Statement Concentrator
The statement concentrator is a feature that has been introduced in DB2 9.7.
When the concentrator is turned on every dynamic SQL statement text which cannot be matched to an entry in the cache will undergo some very basic parsing.
In this parsing DB2 will find constant literals, such as 'Hello' and 5, and replace them with typed parameter markers.
For example, VALUES 5 gets replaced with the equivalent:
VALUES CAST(? AS INTEGER)
DB2 will then attempt to match the modified SQL statement text once more.
If a match is found then DB2 will execute the cached statement and supply the extracted constant literals.
If no match is found DB2 will compile the statement, place it in the cache and then execute it.
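The literal-extraction step can be sketched roughly as follows (this is an illustration, not DB2's actual parser; the regular expression and marker syntax are simplifications):

```python
import re

def concentrate(sql_text):
    """Replace integer and string literals with typed parameter markers,
    returning the normalized text plus the extracted literal values."""
    literals = []
    def replace(m):
        if m.group("s") is not None:
            literals.append(m.group("s"))
            return "CAST(? AS VARCHAR)"
        literals.append(int(m.group("n")))
        return "CAST(? AS INTEGER)"
    # Match quoted strings or standalone integer literals.
    pattern = re.compile(r"'(?P<s>[^']*)'|\b(?P<n>\d+)\b")
    return pattern.sub(replace, sql_text), literals

text, values = concentrate("SELECT c1 + 1 FROM (VALUES(27)) AS T(c1)")
# Both the 1 and the 27 become markers; the values travel separately,
# so statements differing only in literals now share one cache entry.
```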
The easiest way to turn on the statement concentrator is through the DB configuration.
This is an online change and will affect any subsequent compilations from any session:
UPDATE DB CFG USING STMT_CONC LITERALS;

FLUSH PACKAGE CACHE DYNAMIC;

BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  WHILE i < 50000 DO
    SET txt = 'SELECT c1 + 1 FROM (VALUES(' || i || ')) AS T(c1)';
    PREPARE stmt FROM txt;
    OPEN cur;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/

PRINT elapsed

4

SELECT VARCHAR(SUBSTR(STMT_TEXT, 1, 80)) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL))
  WHERE STMT_TEXT LIKE 'SELECT c1%'
  FETCH FIRST 10 ROWS ONLY;

STMT_TEXT
--------------------------------------------------------------------------------
SELECT c1 + :L0 FROM (VALUES(:L1 )) AS T(c1)
As you can see, the load executed many times faster now.
But the statement concentrator is no complete substitute for using parameter markers.
There is an overhead to shallow parsing.
There are several rules to consider when dealing with the statement concentrator:
The discussion about the pros and cons of NULL in SQL has been raging for as long as SQL has been around.
In this post I won't dare wade into this controversy.
For those interested in such matters I point to the paper "Much ado about nothing" by C.J. Date.
Here, instead I want to discuss the implications of NULL, given that it is part of DB2's SQL.
What is NULL?
NULL represents the absence of a value.
Technically it is not a value in itself; however, we often use the phrase "NULL value".
But then, we also say "database" where we should be saying "database management system"...
The meaning of NULL in a column is to some degree up to the user to define.
For example if delivery_date is NULL, then this can mean that an item has not yet been delivered.
Or it could mean that the delivery date is unknown.
One thing is for certain, however: NULL never has the meaning of the neutral value of the type.
A good description for NULL, I find, is UNKNOWN. Most behavior of NULL can be described by that concept.
The type of NULL
The simple specification of the NULL keyword represents an unknown thing of an unknown type.
NULL can be cast to any type using a cast expression, such as CAST(NULL AS INTEGER).
NULL can also be implicitly cast based on the context.
For example, SET x = NULL will cast NULL to the type of x before assigning x the state of NULL.
Prior to DB2 9.7, NULL was therefore only allowed in explicitly listed contexts.
Since DB2 9.7 introduced implicit casting NULL can appear nearly everywhere a value can appear.
But it can happen that DB2 cannot figure out the implied type of NULL.
In that case errors may be raised.
NULL as a function argument
The vast majority of functions supplied by DB2 accept NULL as input.
However, if any argument to a function is NULL the function returns NULL.
There are three notable classes of exception to that rule that I can think of though:
First, some functions are specifically intended to deal with NULL.
These functions include:
Second, most aggregate functions such as MAX and MIN ignore NULLs. The same is true for their scalar companions:
Comparing and testing for NULL
The only way to test for a NULL is to use the IS NULL or IS NOT NULL predicate:
WHERE and NULL
It is important to note that the WHERE clause in a select statement tests for True.
Any row predicate that evaluates to False or NULL will be filtered out.
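As a sketch, the same three-valued behavior can be modeled in Python, using None as a stand-in for both NULL and the truth value Unknown (an illustration only):

```python
# SQL's c1 = var yields Unknown whenever either side is NULL; the WHERE
# clause keeps a row only when the predicate is strictly True.
def sql_eq(a, b):
    if a is None or b is None:
        return None            # Unknown
    return a == b

def where(rows, predicate):
    return [row for row in rows if predicate(row) is True]

rows = [1, None, 3]
assert where(rows, lambda c1: sql_eq(c1, 1)) == [1]
assert where(rows, lambda c1: sql_eq(c1, None)) == []   # no row passes
```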
SET NULL -;

CREATE OR REPLACE VARIABLE var INTEGER;
SET var = 1;

SELECT * FROM (VALUES (1), (NULL), (3)) AS T(c1) WHERE c1 = var;

C1
-----------
          1

SET var = NULL;

SELECT * FROM (VALUES (1), (NULL), (3)) AS T(c1) WHERE c1 = var;

No rows were retrieved.
To find NULLs the following predicate is required:
SELECT * FROM (VALUES (1), (NULL), (3)) AS T(c1)
  WHERE c1 = var OR (c1 IS NULL AND var IS NULL);

C1
-----------
          -
NULL and CHECK constraints
A WHERE clause requires a row to evaluate to True to allow it to pass.
A CHECK constraint on the other hand requires a row not to violate the constraint.
In other words, a check constraint is satisfied if it evaluates to True or Unknown.
CREATE TABLE emp(name VARCHAR(20),
                 salary INTEGER CHECK (salary > 20000));

INSERT INTO emp VALUES ('John', NULL);

SELECT * FROM emp;

NAME                 SALARY
-------------------- -----------
John                           -
To disallow NULL the check constraint must test for the NULL or the column must be explicitly defined as NOT NULL.
DROP TABLE emp;
CREATE TABLE emp(name VARCHAR(20),
                 salary INTEGER CHECK (salary > 20000 AND salary IS NOT NULL));

INSERT INTO emp VALUES ('John', NULL);
SQL0545N  The requested operation is not allowed because a row does not satisfy the check constraint "SERGE.EMP.SQL120328111829320".

DROP TABLE emp;
CREATE TABLE emp(name VARCHAR(20),
                 salary INTEGER NOT NULL CHECK (salary > 20000));

INSERT INTO emp VALUES ('John', NULL);
SQL0407N  Assignment of a NULL value to a NOT NULL column "TBSPACEID=2, TABLEID=258, COLNO=1" is not allowed.
ORDER and NULL
When compared, a NULL is neither bigger nor smaller than any value of the type's domain.
The result of such a comparison is always NULL or UNKNOWN itself.
But when order is imposed on a result set using an ORDER BY clause it makes sense to place all NULLs together.
SELECT * FROM (VALUES 1, NULL, 2, NULL, 3, NULL) AS T(c1) ORDER BY c1;

C1
-----------
          1
          2
          3
          -
          -
          -

6 rows were retrieved.
Presently an ORDER BY clause of a select query considers NULL to be "bigger" than any value in the type domain.
That is, NULLs will always be sorted last in an ascending order and first in a descending order.
An ORDER BY clause of an OLAP function however can override this behavior and sort NULLs last or first as desired.
SELECT ROW_NUMBER() OVER(ORDER BY c1 ASC NULLS FIRST) AS rn, c1
  FROM (VALUES 1, NULL, 2, NULL, 3, NULL) AS T(c1)
  ORDER BY c1;

RN                   C1
-------------------- -----------
                   4           1
                   5           2
                   6           3
                   1           -
                   2           -
                   3           -

6 rows were retrieved.
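The two ordering policies can be mimicked in Python with a sort key that segregates None values (a sketch, not how DB2 sorts internally):

```python
def order_by(rows, nulls_first=False):
    """Sort values, grouping None (NULL) first or last as requested."""
    if nulls_first:
        key = lambda v: (0, 0) if v is None else (1, v)   # NULLS FIRST
    else:
        key = lambda v: (1, 0) if v is None else (0, v)   # NULLS LAST (default ASC)
    return sorted(rows, key=key)

rows = [1, None, 2, None, 3, None]
assert order_by(rows) == [1, 2, 3, None, None, None]
assert order_by(rows, nulls_first=True) == [None, None, None, 1, 2, 3]
```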
DISTINCT, GROUP BY and NULL
The DISTINCT behavior for NULL is interesting.
When counting the number of distinct values, NULL is ignored:
SELECT COUNT(DISTINCT c1) AS distinct,
       COUNT(c1) AS count_c1,
       COUNT(*) AS count
  FROM (VALUES 1, NULL, 2, NULL, 2, NULL) AS T(c1);

DISTINCT    COUNT_C1    COUNT
----------- ----------- -----------
          2           3           6
Note that a regular COUNT(c1) already ignores the NULLs, in compliance with the general rule explained earlier for aggregate functions.
Using the DISTINCT keyword in the select list will treat all NULLs as one distinct "value":
SELECT DISTINCT c1
  FROM (VALUES 1, NULL, 2, NULL, 2, NULL) AS T(c1);

C1
-----------
          1
          2
          -
The same is true for GROUP BY:
SELECT c1
  FROM (VALUES 1, NULL, 2, NULL, 2, NULL) AS T(c1)
  GROUP BY c1;

C1
-----------
          1
          2
          -
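The three counting rules and the "all NULLs form one group" behavior can be checked with a small Python model of the column above (None standing in for NULL; an illustration only):

```python
rows = [1, None, 2, None, 2, None]

count_star = len(rows)                                    # COUNT(*)          -> 6
count_c1 = sum(1 for r in rows if r is not None)          # COUNT(c1)         -> 3
count_distinct = len({r for r in rows if r is not None})  # COUNT(DISTINCT c1)-> 2

# SELECT DISTINCT / GROUP BY treat all NULLs as one distinct "value":
groups = {r for r in rows}    # three groups: 1, 2 and the single NULL group
assert (count_star, count_c1, count_distinct, len(groups)) == (6, 3, 2, 3)
```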
Unique indices and NULL
Last but not least: in DB2, NULL is equal to NULL as far as uniqueness within an index is concerned.
For more on that I refer to my past blog post on unique where not null indexes.
I tried to collect all sorts of NULL related issues into one spot in this post.
No doubt I have omitted many. If you find holes, please comment and I will try to add the missing pieces.
Some of you may have been using DB2 since as far back as DB2 7.1 (DB2 UDB V7.1, to be correct), or even longer.
In that release we introduced SQL Procedures, and SQL Functions.
When we shipped the features there was a bit of confusion, because everyone from sales to marketing to customers seemed to have a dire need for a name for the logical constructs we supported.
"It's just SQL," we said. ANSI SQL comprises more than just SELECT; it includes IF, BEGIN ... END, and so forth.
All we do is implement the SQL standard!
I learned an important lesson then: when you refuse to name a thing that wants to be named, the market does it for you.
So we ended up with contraptions such as PL/SQL (oops, that's Oracle) and SPL (IDS).
We finally ended up with SQL Procedure Language (SQL PL), which has the unfortunate property of being two acronyms, SQL PL, and hence hard to search for.
It also gets people confused with PL/SQL, leaving us with SQL/PL.
This slash from OS/2 will haunt us to retirement I fear.
Internally, however, we did have a name: PSM.
PSM is derived from the ANSI SQL/PSM section of the standard and stands for Persistent Stored Modules.
So, why didn't we use PSM externally? Simple: DB2 was missing the M for MODULE.
This post is all about that missing M which finally joined SQL PL in DB2 9.7.
In a nutshell a module in DB2 is an extension of the namespace for a number of objects types.
Objects in DB2 either have one part (TABLESPACE), or two parts (TABLE) to their name.
The first of the two parts is the schema.
Elements of modules
A module allows the following objects a third name part, which is called the module name.
The objects for which this is allowed presently are:
There is no particular reason why other objects are not supported.
What these supported objects have in common however is that they are what one would typically find in a library in a regular procedural language.
And a library is really what a module is meant to model in DB2 (and ANSI SQL).
Creating a module
In ANSI SQL/PSM a module is one single object, very much like a file in, say, C, which contains all sorts of functions and types.
PL/SQL in Oracle is very similar. There is a head and a body and that's it.
We in DB2 Development didn't like that very much.
It seems outdated, in an age of GUIs and version control systems, to think of a module as something monolithic.
We wanted the properties associated with libraries, but with finer control.
So in DB2 you assemble modules piece by piece. And you can alter them on a per-object basis.
Every module starts off empty:
CREATE OR REPLACE MODULE HR;
That looks a lot like a CREATE SCHEMA statement, but a module itself is part of a schema.
So you can create modules of the same name in multiple schemata.
In addition, access control is managed at the module level, something that is generally not available for schemata.
CREATE ROLE HR;
Now every member of the HR role will be able to reference all visible objects within the module.
I say visible here because libraries in most languages differentiate between private objects and public objects.
E.g. in C you may define an export file which tells the compiler/linker which symbols to export and make visible to users of the library.
Modules in DB2 support the same concept.
Adding prototypes to modules
Every object in the module can be either:
This is useful to implement recursive calls of routines, as well as to provide an external specification for reference by the user, which is needed to use the module.
Think of C-style header-files for comparison.
Since objects are added to the module we use the ALTER MODULE statement.
SET SCHEMA BLOG;
This is a valid specification for a module.
You cannot reference any object within the module without specifying the module itself.
Only within the module itself are references to its own objects by simple name allowed.
To reference a module you can either qualify the module with its schema name
Unlike tables modules are not resolved by CURRENT SCHEMA :
VALUES CURRENT SCHEMA;
Instead they resolve strictly by PATH.
SET PATH = CURRENT PATH, BLOG;
By strictly I mean that the resolution of the module is independent of matching rules for routines.
Only once the module is determined will DB2 look at the best match for a referenced routine within that module.
The first module of the right name within the path is selected.
Lastly you can create public synonyms on modules.
If no module with the right name is found on the path DB2 will consider a public synonym:
SET PATH = SYSTEM PATH;
For more information about name resolution in general you can refer to Scoping Rules in DB2.
Adding routine bodies
While the type and variable are fully defined, procedures have no bodies.
Any attempt to execute them will yield an error:
Before fixing that, we complete our schema with an employee table and a sequence:
SET SCHEMA = BLOG;
To provide the body for a previously specified routine we must use ADD and provide an exact match for the signature and any specified routine properties:
ALTER MODULE HR ADD PROCEDURE HIRE(INOUT emp hr.emp)
Now the procedures can be executed:
VARIABLE empid INTEGER;
Altering module elements
An advantage of modules over PL/SQL Packages is the ability to surgically change bits and pieces of it.
For example if we want to change the EMP type we can do that easily.
Currently DB2 does not support OR REPLACE for module elements.
We need to DROP the TYPE before adding it back in.
ALTER TABLE emp ADD COLUMN country VARCHAR(20) DEFAULT 'Canada';
As a side note: DB2 automatically re-validated the HIRE procedure after the change to the emp data type.
So far all our objects within the module have been public.
Adding private module elements
Let's make the computation of the employee id a bit more interesting.
We want the empid to be combined with a country code.
The translation of countries to codes will be stored in a private array.
And we will define a routine to encapsulate the sequence generation.
ALTER MODULE HR ADD TYPE countries AS INTEGER ARRAY[VARCHAR(20)];
Note that there is no requirement to have a separate specification for the routine.
The same is true for the public routine HIRE.
When we replace it we can immediately provide the body if that's what we want.
ALTER MODULE HR DROP PROCEDURE HIRE;
We forgot to fill in values for the countries!
This is an opportunity to introduce module initialization.
You can ADD a module procedure called SYS_INIT without parameters which is invoked by DB2 the first time an object in the module is referenced.
Note that there is no "un-init".
ALTER MODULE HR ADD PROCEDURE SYS_INIT
Sidenote: since my connection is the only one on my laptop and I did not activate my database explicitly, DB2 deactivated the database.
As a result I lost a few sequence values from the cache.
Module initialization is rarely used since module variables provide DEFAULT clauses. But not every variable can have defaults.
Also, SYS_INIT provides better control over the timing of the initialization.
When using external routines SYS_INIT can be useful to set up the environment.
I have shown the highlights of module creation. There is nothing left but switching off the lights.
One thing you will learn to appreciate about modules is that all the module's objects get dropped when you drop the module.
DROP MODULE BLOG.HR;
When we started down the path of making DB2 compatible with Oracle Applications we wondered how to demonstrate the capability.
It seemed unlikely that any application vendor would allow us to expose their Oracle Application to a public audience.
Getting a custom application seemed not much more likely.
But creating an artificial benchmark application would immediately cause suspicion that the workload is either too trivial or biased towards success.
After quite a bit of searching we found OpenBravo. This company sells an ERP product that is available in source code.
Since DB2 9.7, this product has become the standard application used to run migration boot camps for partners and customers.
OpenBravo has also become one of many formal data points against which we measure progressive improvements in compatibility.
So a few months ago we decided to record a live migration of OpenBravo and put it up on YouTube.
SergeRielau Tags: pl/sql single escape dynamic pl embedded quote prepare immediate cursor open string execute sql 2 Comments 65,293 Views
It is very common nowadays for SQL statements to be generated by the application itself or by some productivity middleware which abstracts the database.
On the server, however, within routines, triggers, or anonymous blocks, most SQL is fixed and will execute just the way the definer of the object specified it.
There are, however, exceptions to this rule:
So there is a need to compose dynamic SQL inside of SQL PL and in this post I'll illuminate the usage and some of the pitfalls.
Thanks to Cuong for bringing up this great question.
The most basic way to execute dynamic SQL is the EXECUTE IMMEDIATE statement.
Basic EXECUTE IMMEDIATE
If you provide it with a string it will pass it to DB2's compiler and execute the statement on the spot.
In this case we have presented a constant string. But the string can be an expression as well including bind variables.
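A minimal sketch of both flavors (the table and column names are of my own choosing):

```sql
-- A constant string:
BEGIN
  EXECUTE IMMEDIATE 'CREATE TABLE t1(c1 INTEGER)';
END;
/
-- A string built from an expression incorporating a variable's value:
BEGIN
  DECLARE tabname VARCHAR(128) DEFAULT 'T2';
  EXECUTE IMMEDIATE 'CREATE TABLE ' || tabname || '(c1 INTEGER)';
END;
/
```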
So far our examples were doing DDL, but you can also use EXECUTE IMMEDIATE for any other dynamic statement with the exception of queries.
One common scenario for dynamic statements is to force compilation at execution time of the SQL PL because some dependent object does not yet exist when the SQL PL is compiled.
In the following example we insert into a table that does not exist until we execute the block:
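A sketch of such a block: since the temperature table only comes into existence inside the block, the INSERT must be dynamic as well (the date column name is my assumption; tempC matches the later examples):

```sql
BEGIN
  EXECUTE IMMEDIATE 'CREATE TABLE temperature(d DATE, tempC INTEGER)';
  EXECUTE IMMEDIATE 'INSERT INTO temperature VALUES (CURRENT DATE, 20)';
END;
/
```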
In the examples above the entire SQL statement is defined by the composed string.
Any variables we used to compose the string do not appear as variables in the SQL statement.
Instead their values are incorporated into the string.
Let's insert multiple rows into the temperature table above and see what happens:
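Presumably the loop looked something like this sketch, gluing the values into each statement text as literals:

```sql
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  WHILE i < 10 DO
    -- every iteration composes a brand-new statement text
    EXECUTE IMMEDIATE 'INSERT INTO temperature VALUES (CURRENT DATE - '
                      || i || ' DAYS, ' || (20 - i) || ')';
    SET i = i + 1;
  END WHILE;
END;
/
```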
This worked, but was it efficient?
SELECT SUBSTR(STMT_TEXT, 1, 60) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, NULL))
  WHERE STMT_TEXT LIKE 'INSERT INTO temperature%';
It appears each statement was compiled individually. This is wasteful.
Instead we should recycle the statement and use parameter markers just like we would do from JDBC and ODBC.
EXECUTE IMMEDIATE, however, does not support parameter markers.
To exploit parameter markers we must separate the compilation (PREPARE) from the execution (EXECUTE).
The PREPARE statement will provide us with a handle which can then be used by EXECUTE repeatedly.
DELETE FROM temperature;

BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE text VARCHAR(1000);
  DECLARE stmt STATEMENT;
  SET text = 'INSERT INTO temperature VALUES(?, ?)';
  PREPARE stmt FROM text;
  WHILE i < 10 DO
    EXECUTE stmt USING CURRENT DATE - i DAYS, 20 - i;
    SET i = i + 1;
  END WHILE;
END;
/

SELECT SUBSTR(STMT_TEXT, 1, 40) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, NULL))
  WHERE STMT_TEXT = 'INSERT INTO temperature VALUES(?, ?)';

STMT_TEXT
----------------------------------------
INSERT INTO temperature VALUES(?, ?)
This time we only got one entry of the statement in the cache.
Note that the USING clause is optional.
If there are no parameters you can still use PREPARE/EXECUTE. You simply don't use the USING clause.
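For example (a sketch):

```sql
-- No parameter markers in the statement, so EXECUTE takes no USING clause
BEGIN
  DECLARE stmt STATEMENT;
  PREPARE stmt FROM 'DELETE FROM temperature';
  EXECUTE stmt;
END;
/
```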
In addition to input parameters there are also statements that return results.
The first three to discuss here are:
VARIABLE sum INTEGER;
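Continuing with the VARIABLE declared above, a sketch of retrieving a value with EXECUTE ... INTO (the use of a prepared VALUES statement and the query shape are my assumptions):

```sql
BEGIN
  DECLARE sum INTEGER;
  DECLARE stmt STATEMENT;
  PREPARE stmt FROM 'VALUES (SELECT SUM(tempC) FROM temperature)';
  EXECUTE stmt INTO sum;   -- the INTO clause receives the result
  SET :sum = sum;
END;
/
```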
Procedure invocation is very similar using a CALL statement.
What is interesting here is that any INOUT parameter needs to be specified twice: once in the INTO clause and once in the USING clause.
CREATE OR REPLACE PROCEDURE add(INOUT arg1 INTEGER, IN arg2 INTEGER, OUT sign INTEGER)
BEGIN
  SET arg1 = arg1 + arg2;
  SET sign = SIGN(arg2);
END;
/
VARIABLE arg1 INTEGER;
VARIABLE sign INTEGER;
BEGIN
  DECLARE arg1 INTEGER DEFAULT 10;
  DECLARE arg2 INTEGER DEFAULT -3;
  DECLARE sign INTEGER;
  DECLARE text VARCHAR(128) DEFAULT 'CALL add(?, ?, ?)';
  DECLARE stmt STATEMENT;
  PREPARE stmt FROM text;
  EXECUTE stmt INTO arg1, sign USING arg1, arg2;
  SET (:arg1, :sign) = (arg1, sign);
END;
/
PRINT arg1;
7
PRINT sign;
-1
If you prepare a SET statement with a function that contains OUT and INOUT parameters the same rules apply.
An anonymous block works just the same.
CREATE OR REPLACE FUNCTION add(INOUT arg1 INTEGER, IN arg2 INTEGER)
So how does one operate a dynamic cursor?
BEGIN
  DECLARE text VARCHAR(128);
  DECLARE cnt, sum INTEGER;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET text = 'SELECT COUNT(1), SUM(tempC) FROM temperature WHERE tempC < ?';
  PREPARE stmt FROM text;
  OPEN cur USING 15;
  FETCH cur INTO cnt, sum;
  CLOSE cur;
  SET (:cnt, :sum) = (cnt, sum);
END;
/
PRINT cnt;
4
PRINT sum;
50
As you can see, instead of declaring the cursor with a query we declare it with a statement handle.
The PREPARE remains unchanged.
Then, of course, instead of doing an EXECUTE we perform an OPEN, FETCH, CLOSE sequence as is usual for cursors.
Parameters are passed at the OPEN with the USING clause.
Dynamic SQL and string literals
The perhaps most common complication when building dynamic SQL inside of SQL PL is the usage of string literals.
To illustrate let's go all the way back to the first example of the employee table:
CREATE TABLE emp(name VARCHAR(20), salary INTEGER);

BEGIN
  EXECUTE IMMEDIATE 'INSERT INTO emp VALUES ('John', 23000)';
END;
/
SQL0104N An unexpected token "John" was found following "ERT INTO T VALUES ('".
Expected tokens may include: "CONCAT".
What went wrong here?
The single quote before "John" makes the parser believe the string starting with "INSERT" is done.
Now it's looking for a semicolon to finish the statement, or perhaps a concat operator to add more pieces to the string.
But really what we want is to produce a string with a single quote inside.
What we need here is an escape sequence.
Single quotes get escaped by doubling them up.
So instead of one single quote we need two single quotes (not one double quote!).
BEGIN
  EXECUTE IMMEDIATE 'INSERT INTO emp VALUES (''John'', 23000)';
END;
/
SELECT * FROM emp;

NAME                 SALARY
-------------------- -----------
John                 23000
So far so good.
Now let's spin this example a bit further and provide "Jack" via a separate variable:
BEGIN
  DECLARE text VARCHAR(20) DEFAULT 'Jack';
  EXECUTE IMMEDIATE 'INSERT INTO emp VALUES (' || text || ', 23000)';
END;
/
SQL0206N "JACK" is not valid in the context where it is used.

What's wrong this time?
VARIABLE text VARCHAR(40);
BEGIN
  DECLARE text VARCHAR(20) DEFAULT 'Jack';
  SET :text = 'INSERT INTO emp VALUES (' || text || ', 23000)';
END;
/
PRINT text;
'INSERT INTO emp VALUES (Jack, 23000)'
"Jack" is not a string at all!
There are no single quotes here.
Therefore "JACK" must be a variable, parameter, column, ...that's why we get the -206 error.
Again we need to ensure we produce the single quotes.
Typically this is done in the surrounding statement since the name may have been passed in or looked up.
BEGIN
  DECLARE text VARCHAR(20) DEFAULT 'Jack';
  EXECUTE IMMEDIATE 'INSERT INTO emp VALUES (''' || text || ''', 23000)';
END;
/
SELECT * FROM emp;

NAME                 SALARY
-------------------- -----------
John                 23000
Jack                 23000
You may need to squint to see this on a tablet.
These are three single-quotes each!
After the round bracket the first two single quotes produce the quote needed for "Jack" to be a string literal.
The third single quote concludes the "INSERT..." string.
The first single quote after the concat (||) operator starts the next string which immediately uses two more single quotes to conclude "Jack" as a string.
BEGIN
  DECLARE text VARCHAR(20) DEFAULT 'Jack';
  SET :text = 'INSERT INTO emp VALUES (''' || text || ''', 23000)';
END;
/
PRINT text;
'INSERT INTO emp VALUES ('Jack', 23000)'
SergeRielau Tags: overload default routine procedure parameter function 2 Comments 10,592 Views
In my last post I described how to improve maintainability of routines using named parameter invocation.
Named parameter invocation made it easier to keep the many parameters of routines organized without messing up the order.
Another way to make a routine invocation easier to read is by reducing the number of arguments I have to specify in the first place.
Let's go back to that same example. To underline that routines comprise procedures as well as functions, I use an inline function this time.
CREATE OR REPLACE FUNCTION total_compensation(IN salary        DECIMAL(9, 2),
                                              IN percent_bonus DECIMAL(4, 2),
                                              IN stocks_value  DECIMAL(9, 2))
  RETURNS DECIMAL(9, 2)
  RETURN salary + salary * percent_bonus / 100 + stocks_value;
Now, unless you are looking at a senior member of a company, stocks are unlikely to be part of the equation.
Similarly some employees, such as students and contractors, may not get a bonus.
So, instead of doing the following:
VALUES total_compensation(23000, 5.3, 0);
Wouldn't it be nice to have simpler versions of total_compensation?
VALUES total_compensation(23000, 5.3);
In the past the only way to achieve this in DB2 was by overloading the function:
CREATE OR REPLACE FUNCTION total_compensation(IN salary        DECIMAL(9, 2),
                                              IN percent_bonus DECIMAL(4, 2))
  RETURNS DECIMAL(9, 2)
  RETURN total_compensation(salary, percent_bonus, 0);
But this is very inconvenient!
While we have made the invocation nicer, we have made maintenance of the routine that much harder.
There are now multiple functions of the same name to manage.
There must be a better way...
DROP FUNCTION total_compensation(DECIMAL(), DECIMAL());
DEFAULT for parameters
DB2 9.7 introduces the ability to specify a DEFAULT clause for parameters very similar to that of global variables.
CREATE OR REPLACE FUNCTION total_compensation(IN salary        DECIMAL(9, 2),
                                              IN percent_bonus DECIMAL(4, 2),
                                              IN stocks_value  DECIMAL(9, 2) DEFAULT (0))
  RETURNS DECIMAL(9, 2)
  RETURN salary + salary * percent_bonus / 100 + stocks_value;
This is nice, but doesn't exactly knock one's socks off.
We can do more. How about deriving the bonus from the company's targets?
CREATE TABLE company_stats(year INTEGER, target DECFLOAT, achieved DECFLOAT);
INSERT INTO company_stats VALUES(2011, 1.1E9, 1.3E9);
Defaults can be any expression as long as the expression does not update the database or take external action.
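For instance, a default could be computed from the company_stats table above. A sketch (the 5 percent base, the scaling by the achieved-to-target ratio, and the hard-coded year are all illustrative assumptions):

```sql
CREATE OR REPLACE FUNCTION total_compensation(
    IN salary        DECIMAL(9, 2),
    IN percent_bonus DECIMAL(4, 2)
       DEFAULT (SELECT 5 * achieved / target
                  FROM company_stats
                  WHERE year = 2011),
    IN stocks_value  DECIMAL(9, 2) DEFAULT (0))
  RETURNS DECIMAL(9, 2)
  RETURN salary + salary * percent_bonus / 100 + stocks_value;
```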
Now, is there a way that we can default the bonus, but not the stocks?
There are actually two!
One option is to use the DEFAULT keyword, just like you would do in an UPDATE or INSERT statement when defaulting a column value.
VALUES total_compensation(31000, DEFAULT, 10000);
This works well for routines with few parameters.
But when the routine has more parameters and these parameters have names, then you can combine parameter defaulting with named parameter invocation:
VALUES total_compensation(31000, stocks_value => 10000);
Truly, in this case the value of these two features combined is bigger than the sum of the individual parts.
One more benefit of parameter defaulting should not be forgotten. You can now add parameters to routines without having to change the source code of all its invokers.
So if we want to add cash retention to the function all the previous examples of invoking total_compensation will continue to work unchanged:
CREATE OR REPLACE FUNCTION total_compensation
The rules for defaults are rather straightforward and intuitive:
Now if only all those administrative routines had sensible names and defaults....something for the DB2 Development to-do list.
SergeRielau Tags: function procedure syntax named invocation exec call position parameter routine 6,952 Views
DB2 for LUW supports functions with up to 90 parameters, and for procedures the limit is 32000.
While I hope few if any users actually reach these limits, routines with a dozen or more parameters are not uncommon.
Let's take a look at a procedure invocation:
CALL total_compensation(salary, percent_bonus, stocks, total_comp);
Chances are the procedure's definition looked something like:
CREATE OR REPLACE PROCEDURE total_compensation(IN salary DECIMAL(9, 2),
But can we be sure? What will happen if the application developer mixed up the order of the arguments?
Perhaps total_comp is the first parameter rather than the last?
This thought is troublesome enough, but now imagine using constants or parameter markers:
VARIABLE result DECIMAL(9,2);
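Presumably something along these lines, a sketch of the positional call with constants and a bind variable (the values mirror the commented version further down):

```sql
CALL total_compensation(23000, 3.4, 5000, :result);
```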
This is completely unreadable!
Sometimes developers do the following:
CALL total_compensation(23000 /* Salary */, 3.4 /* Bonus_Percent */, 5000 /* Stocks_Value */, :result);
This clearly improves readability, but there is no guarantee that there is no mix-up.
Named parameter invocation
DB2 9.7 introduced named parameter invocation for procedures and, shortly after, for other routines as well.
Named parameter invocation is not a new concept. Personally, I remember using this capability in LISP during my university years.
It has also been introduced into SQL by other vendors.
Normally, when invoking a routine the first argument is assigned to the first parameter, the second argument to the second parameter and so on.
This is called the positional syntax for routine invocation.
In named syntax, however, you associate the arguments of the routine with its parameters by name.
Here is an example:
CALL total_compensation( total_comp => :result,
Note that this syntax is not only more readable, it also allows me to mix up the order.
There is no need to remember which parameter goes where.
There is however a need to know the names of parameters.
Maintainable code comes at a price.
The rules for named parameter invocation are quite simple:
First, you have to know the exact parameter names and type them in. This is where IDEs could provide some real help.
But the bigger downside is that you now have a dependency on the parameter name used that is not recorded anywhere.
So if someone changes the names of a procedure's parameters, the invoking routine will fail to automatically recompile.
If you have good coding standards around CamelCasing, using underbars, and so on, then neither reason should be a big problem.
All in all I think named parameter invocation is a great feature addition to DB2 which aids in application development and maintenance.