SQL Tips for DB2 LUW
Please join me on June 21, 2012 12:30 - 2:00 PM ET for our newest installment of DB2 Tech Talk.
This Tech Talk continues the "deep dive" on the new DB2 10 and InfoSphere Warehouse 10 products.
Matthias Nicola of IBM labs covers the bi-temporal data management features which let you manage data at past or future points in time, consistently and cost-effectively. Replay at: www.idug-db2.com/
Some years ago I visited a business partner to help them overcome some performance problems on DB2.
They had a rather simple workload for quality assurance test which ran on a simple Windows PC.
On another DBMS that workload executed without a problem and with acceptable throughput.
On DB2 however that same workload completely overwhelmed the machine and throughput was, frankly, abysmal.
Needless to say the partner was not too amused.
We quickly discovered that the CPU was pegged at 100%.
We further discovered that DB2 saturated the CPU through compilation of SQL statements.
A dump of the package cache showed that it was flooded with simple update statements, each identical to the next except for the input values.
That in itself could not explain why the CPU usage was so high.
The workload was identical to the one that ran on the competitor's DBMS.
It turned out that there was a recursive update trigger using inline SQL PL.
So what looked like a simple statement was actually quite complex.
The other DBMS did not have the concept of inline SQL PL.
It did not have to pay for trigger compilation for each statement.
Once the problem was known the fix was simple:
Using parameter markers, there was no need to keep compiling SQL statements.
The inline SQL PL then provided an additional boost, propelling DB2's performance past the other DBMS.
While the outcome was good, the partner needed to do work to achieve acceptable performance.
Lesson learned: DB2 must run really bad SQL really fast.
In DB2 9.7, two features were added which are meant to fix the issue we encountered.
A Problem Scenario
It is a best practice when working with DB2, or most other DBMSs, to use parameter markers or host variables when submitting dynamic SQL.
DB2 maintains a cache of recently executed SQL statements.
Every time a new SQL statement is submitted DB2 will match it to the existing cache entries.
If a match is found then there is no need to parse and compile.
DB2 can execute the SQL statement right away.
Unfortunately many application developers do not follow that practice.
Even more unfortunate is that several abstract APIs which generate SQL under the covers also do not use parameter markers.
Let's take a look at what happens here by setting up a simple experiment.
VARIABLE elapsed INTEGER;

BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  WHILE i < 50000 DO
    SET txt = 'SELECT c1 + 1 FROM (VALUES(' || i || ')) AS T(c1)';
    PREPARE stmt FROM txt;
    OPEN cur;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/
PRINT elapsed
42

COLUMN STMT_TEXT FORMAT A80
SET WRAP OFF
SET LINESIZE 200
SELECT VARCHAR(SUBSTR(STMT_TEXT, 1, 80)) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL))
 WHERE STMT_TEXT LIKE 'SELECT c1%'
 FETCH FIRST 10 ROWS ONLY;

STMT_TEXT
--------------------------------------------------------------------------------
SELECT c1 + 1 FROM (VALUES(49993)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49998)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49894)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49940)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49963)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49925)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49906)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49968)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49994)) AS T(c1)
SELECT c1 + 1 FROM (VALUES(49898)) AS T(c1)

10 rows were retrieved.
So 50000 executions of this simple SQL Statement take about 42 seconds to execute.
And each statement was compiled and inserted into the package cache.
The best fix
The above scenario represents bad SQL programming.
The correct way to write SQL is to use parameter markers and bind the values when the query is being executed.
What happens when we use a parameter marker?
FLUSH PACKAGE CACHE DYNAMIC; BEGIN DECLARE i INTEGER DEFAULT 0; DECLARE txt VARCHAR(1000); DECLARE res INTEGER; DECLARE start TIMESTAMP; DECLARE stmt STATEMENT; DECLARE cur CURSOR FOR stmt; SET start = CURRENT TIMESTAMP; WHILE i < 50000 DO SET txt = 'SELECT c1 + 1 FROM (VALUES(CAST(? AS INTEGER))) AS T(c1)'; PREPARE stmt FROM txt; OPEN cur USING i; FETCH cur INTO res; CLOSE cur; SET i = i + 1; END WHILE; SET :elapsed = CURRENT TIMESTAMP - start; END; / PRINT elapsed 2
That's a 20-fold speedup when using parameter markers!
Here DB2 will match the content of txt with the previously generated plan and simply assign it to the stmt handle.
We can do even better by doing that matching ourselves: we pull the PREPARE outside the loop:
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  SET txt = 'SELECT c1 + 1 FROM (VALUES(CAST(? AS INTEGER))) AS T(c1)';
  PREPARE stmt FROM txt;
  WHILE i < 50000 DO
    OPEN cur USING i;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/
PRINT elapsed;
1
Finer-grained measurement would be better here.
We could use a technique discussed earlier in this blog to subtract timestamps exactly.
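For instance, a small helper along these lines would do; this is only a sketch, not the earlier post's actual code, and SECONDS_BETWEEN is a made-up name:

CREATE OR REPLACE FUNCTION SECONDS_BETWEEN(t1 TIMESTAMP, t2 TIMESTAMP)
  RETURNS BIGINT
  -- DAYS() counts days since 0001-01-01, MIDNIGHT_SECONDS() counts seconds
  -- since midnight; combining the two yields an exact difference in seconds
  RETURN (BIGINT(DAYS(t2)) - BIGINT(DAYS(t1))) * 86400
         + MIDNIGHT_SECONDS(t2) - MIDNIGHT_SECONDS(t1);

With that, SET :elapsed = SECONDS_BETWEEN(start, CURRENT TIMESTAMP) would report whole seconds reliably.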
But this last part isn't really the point of investigation.
What we want to achieve is to speed up this test without changing the application!
Using the Statement Concentrator
The statement concentrator is a feature introduced in DB2 9.7.
When the concentrator is turned on every dynamic SQL statement text which cannot be matched to an entry in the cache will undergo some very basic parsing.
In this parsing DB2 will find constant literals, such as 'Hello' or 5, and replace them with typed parameter markers.
For example, VALUES 5 gets replaced with the equivalent:
VALUES CAST(? AS INTEGER)
DB2 will then attempt to match the modified SQL statement text once more.
If a match is found then DB2 will execute the cached statement and supply the extracted constant literals.
If no match is found DB2 will compile the statement, place it in the cache and then execute it.
The easiest way to turn on the statement concentrator is through the database configuration.
This is online and will affect any subsequent compilation from any session:
UPDATE DB CFG USING STMT_CONC LITERALS;
FLUSH PACKAGE CACHE DYNAMIC;

BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  WHILE i < 50000 DO
    SET txt = 'SELECT c1 + 1 FROM (VALUES(' || i || ')) AS T(c1)';
    PREPARE stmt FROM txt;
    OPEN cur;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/
PRINT elapsed
4

SELECT VARCHAR(SUBSTR(STMT_TEXT, 1, 80)) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL))
 WHERE STMT_TEXT LIKE 'SELECT c1%'
 FETCH FIRST 10 ROWS ONLY;

STMT_TEXT
--------------------------------------------------------------------------------
SELECT c1 + :L0 FROM (VALUES(:L1 )) AS T(c1)
As you can see, the load executed many times faster now.
But the statement concentrator is no complete substitute for using parameter markers.
There is an overhead to shallow parsing.
There are several rules to consider when dealing with the statement concentrator. For example, a statement that already contains parameter markers is left untouched, and since the literal values are gone at compile time the optimizer can no longer exploit distribution statistics for them, which can hurt queries against skewed data.
In this post I want to introduce a handy procedure which can be used to suspend a session for a specified amount of time. A naive approach to waiting is to simply execute a tight loop in a stored procedure such as this:
--#SET TERMINATOR @
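The procedure listing itself appears to have been lost here; what follows is a minimal sketch of such a naive busy-wait, where the name NAIVE_SLEEP and the body are my assumptions, not the original code:

CREATE OR REPLACE PROCEDURE NAIVE_SLEEP(IN seconds INTEGER)
BEGIN
  DECLARE deadline TIMESTAMP;
  DECLARE spin BIGINT DEFAULT 0;
  SET deadline = CURRENT TIMESTAMP + seconds SECONDS;
  -- tight loop: keeps the agent busy until the deadline passes
  WHILE CURRENT TIMESTAMP < deadline DO
    SET spin = spin + 1;
  END WHILE;
END
@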
Sometimes naive works. Here it definitely does not.
The problem is that the loop consumes CPU while waiting. That is not acceptable.
You can test this out yourself by profiling the procedure using the SQL PL Profiler discussed earlier.
A better way is to truly suspend the thread, thus allowing DB2 to do other work with the CPU.
The C language provides a sleep(sec) function on Unix and Sleep(msec) on Windows.
Java offers Thread.sleep() as well.
Just for the fun of it let's use C today.
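The C listing itself is not reproduced here, so the following is only a sketch of the SQL side: how such a routine might be registered as an external procedure. The library and entry point in 'sleeplib!db2sleep' are assumptions; the C body would simply call sleep() on Unix or Sleep() on Windows as described above.

CREATE OR REPLACE PROCEDURE SLEEP(IN seconds INTEGER)
  LANGUAGE C
  PARAMETER STYLE SQL
  NO SQL
  NOT DETERMINISTIC
  EXTERNAL NAME 'sleeplib!db2sleep';  -- placeholder for library!entry-point

With that in place one can bracket a CALL SLEEP(5) between two VALUES CURRENT TIMESTAMP statements to verify the delay: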
VALUES CURRENT TIMESTAMP;
If you have an Oracle background you can create a DBMS_LOCK module for comfort:
CREATE OR REPLACE MODULE DBMS_LOCK;
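The rest of the module definition is not shown; one plausible shape delegates to the sleep procedure sketched above (Oracle's DBMS_LOCK.SLEEP accepts fractional seconds, which this simplification ignores):

--#SET TERMINATOR @
ALTER MODULE DBMS_LOCK
PUBLISH PROCEDURE SLEEP(IN seconds INTEGER)
BEGIN
  -- delegate to the sleep routine registered earlier; the call is qualified
  -- so it does not resolve back to the module's own SLEEP (MYSCHEMA is a
  -- placeholder for wherever that procedure lives)
  CALL MYSCHEMA.SLEEP(seconds);
END
@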
Some of you may know that I am engaged in enabling Oracle applications to DB2.
One of the frequent "issues" we see is the difference in how indexing works in Oracle and DB2 and, more specifically, the different semantics for unique indexes.
In DB2 a unique index requires that no two keys are the same.
Sounds simple and clear on the surface, but not so clear when you consider NULLs.
Generally speaking a NULL cannot be compared to a NULL and thus it is debatable whether one should allow more than one NULL key in a unique index. In the DB2 world only one NULL key may exist in a unique index.
In Oracle however you can have multiple NULL keys.
So does that mean in Oracle a NULL is not equal to a NULL? Not really.
Let's look closer and define what a "key" is.
A "key" is the set of columns making up an index. That is it may be one column or multiple columns.
So when we talk about a NULL key that is a key where all columns are NULL.
So when I say you can have multiple NULL keys in Oracle, that means a key with all NULLs can appear multiple times.
But a key where at least one column is NOT NULL cannot appear multiple times.
So this key can legally appear more than once in a unique index in Oracle:
(NULL, NULL)
But this key cannot:
(NULL, 1)
So even in Oracle, NULLs are equal to NULLs most of the time as far as unique keys are concerned.
So, if DB2 and Oracle are so close why are they not the same?
Simple: Oracle does not index NULL keys, ever. Unique or not, there is no such thing as an index entry for an all-NULL key.
As a result a UNIQUE index in Oracle simply does not know about NULL keys. It's literally blind to the issue.
Well, that's all nice as an explanation, but if you have a need for a unique index with this Oracle behavior, what do you do?
What is needed is an extra column that makes all-NULL keys unique but does not interfere with uniqueness enforcement when at least one column in the key is not NULL.
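Here is one way to get there, sketched for a single-column key c1; the sequence-and-trigger combination is my construction and may differ from the original approach:

CREATE TABLE t(c1 INTEGER, tie BIGINT);
CREATE SEQUENCE t_tie_seq;

-- give every all-NULL key its own tie-breaker; leave tie NULL otherwise so
-- that non-NULL keys are still enforced as unique
-- (a corresponding UPDATE trigger is omitted for brevity)
CREATE TRIGGER t_tie_ins
  NO CASCADE BEFORE INSERT ON t
  REFERENCING NEW AS n
  FOR EACH ROW
  WHEN (n.c1 IS NULL)
  SET n.tie = NEXT VALUE FOR t_tie_seq;

CREATE UNIQUE INDEX t_u ON t(c1, tie);

With this in place, multiple rows with a NULL key coexist while duplicate non-NULL values are still rejected: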
db2 => insert into t(c1) values 1, 2, 3, 4, 5, NULL;
Having demonstrated how to aggregate strings using XML in an earlier post I have been asked about how to do the inverse.
Given a string and a defined separator, produce a table with one row per substring.
So here is what I came up with:
Note that the order in which the strings are returned is not exactly defined.
CREATE OR REPLACE FUNCTION PK_BASE.SPLIT(text VARCHAR(32000), split VARCHAR(10))
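  -- the rest of the listing did not survive; what follows is a sketch of one
  -- possible implementation, with an assumed (seq, entry) result shape and a
  -- recursive common table expression doing the work
  RETURNS TABLE(seq INTEGER, entry VARCHAR(32000))
  RETURN
    WITH rec(seq, entry, rest) AS
      (VALUES (0, CAST(NULL AS VARCHAR(32000)), text || split)
       UNION ALL
       SELECT seq + 1,
              SUBSTR(rest, 1, LOCATE(split, rest) - 1),  -- piece before the separator
              SUBSTR(rest, LOCATE(split, rest) + LENGTH(split))
         FROM rec
        WHERE LOCATE(split, rest) > 0)
    SELECT seq, entry FROM rec WHERE seq > 0;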
Also note that the code supports empty strings. That is, if two split characters follow each other, the matching substring reported is empty (or NULL in VARCHAR2 mode).
SELECT * FROM TABLE(PK_BASE.SPLIT('123/45/6/789/abc/def/ghi', '/'));
But first, what's up with Serge?
OK, I haven't been posting for a long time, but I do have a good excuse.
OpenPages is a product IBM acquired some two years ago that is leading in the eGRC (enterprise Governance, Risk, and Compliance) space.
Luckily, OpenPages was not totally new to me. The product has undergone "blue washing" and now supports DB2 for LUW in Oracle compatibility mode.
The role of the network in application performance
With my old DB2 server hat on, performance used to be a function of reducing code path within DB2 and reducing disk I/O.
To be sure, reducing network trips between client and server was also on the agenda, and those who have seen my "SQL on Fire 1" and "SQL on Fire 2" presentations have witnessed me driving that point home.
There is one area however that I was totally blind towards: The amount of data that actually flows on the network.
DB2 uses the open DRDA standard protocol to flow statements and bind variables from the client to the server and resultsets or out parameter values back to the client. That's just what it is.
Now, prior to helping blue wash OpenPages I was engaged in the same effort with Tivoli Maximo.
Maximo on DB2 was slower than Maximo on Oracle!
All telltales on the application and the DBMS told a different story, however.
It took a while until we seriously started looking at the network as the only remaining cause.
We found out that small DB2 result sets were multiple times bigger on the wire than those from Oracle.
Why does this matter?
I'm no networking expert, but the more frames are transmitted, the higher the likelihood that one of them encounters a collision.
And if a collision is encountered, I'm told, the entire stream must be resubmitted, not just the lost frame.
Result Sets in DRDA are largely self describing. There is a header which describes what is flowing back.
But there is other information that can be of interest, such as flags for special properties of a column.
Other examples include information about the makeup of the query such as the qualified names of the tables in the FROM clause of the query and the original names of the table columns that feed the result set columns.
All this is additional metadata (aka "Extended Describe") for which the size is independent of the number of rows being returned.
In an OLTP environment where queries are fast to execute and mostly return just one row, often with few columns, this metadata can quickly become a multiple of the actual payload in size.
And this is exactly what happened here!
DB2's JDBC driver, by default asks for Extended Describe information because it cannot know whether the data will be requested at a later point or not.
The driver property that controls this, enableExtendedDescribe, accepts three values.
NO is the one we want to set here, unless we happen to call methods that actually need the extended metadata.
The proof is in the pudding
In the case of Tivoli Maximo disabling Extended Describe gathering in the JDBC driver solved the performance problem.
Last I heard Maximo on DB2 performs better than Maximo on Oracle.
Here at OpenPages, one key performance-critical benchmark that I used to test out the property (a simple update to the WebSphere configuration) gave an instant 10% boost.
So, give it a try and release that handbrake you have been driving around with for years!
In this episode of the DB2 Tech Talk Series Cliff Leung and Holly Hayes will discuss how you can use Optim Query Workload Tuner and Optim Performance Manager to easily get to the bottom of common database problems.
For those who have not looked at OPM and OQWT in a while, prepare to be amazed.
We hope to see you at this very informative Tech Talk on Thursday August 30th at 12:30PM - 2PM ET!
The DB2 family comprises three products: DB2 for Linux, UNIX, and Windows; DB2 for z/OS; and DB2 for i.
And each product has its own set of loyal and new customers.
Naturally different customers and applications mean that there are different needs for features when planning for new releases.
To maintain the highest level of compatibility between the platforms as new versions of DB2 are developed, and to enable partners and customers to share skills and applications, a number of measures are taken. One of them is the SQL Reference for Cross-Platform Development.
Cross-Platform SQL Reference Version 4
There is a new Version 4 of this SQL Reference for Cross-Platform Development available here.
The document is lined up with the current release of each of the three products.
If you want to get older versions of the SQL Reference or keep track of future versions, you can find them on developerWorks here.
Join me in this DB2 Tech Talk where I will provide an update on DB2's Oracle Compatibility features.
This technical tour is the third DB2 Tech Talk on the DB2 10 and InfoSphere Warehouse 10 product releases.
Additional webcasts will be offered throughout 2012 to provide an in-depth discussion on the features in these releases.
The talk takes place on May 31, 2012.
Also be reminded that on May 11, 2012 at 12:30 EST we will discuss Optimizing Storage Utilization for DB2 10.
Remember the days when DB2 re-entered the TPC-C fight after a long hiatus?
The time was DB2 8.1 FP4. I seriously do not remember years; time passes in releases.
While my colleagues were tuning code path, buffer pools, and disks, my team was looking at something more fundamental:
What could we do to make SQL as efficient as possible for TPC-C?
And, ideally, how could we make SQL more efficient for any class of OLTP workload?
In other words, how could we pour the most bang into the least SQL for a typical OLTP transaction?
The result was what we called the new SQL.
If I recall correctly it consisted of several enhancements.
In this post I want to dive into one of them: the ability to SELECT from INSERT, UPDATE, and DELETE statements.
DB2 8.1 FP4 was a long time ago, but there are still developers who are not yet familiar with this powerful feature.
So it warrants some attention.
One very common task in an OLTP system is that of order processing.
You can distinguish three phases of order processing: taking an order, dispatching it, and finally deleting it.
Taking an order
Taking an order comprises several steps, among them generating a new order id.
This is where the inline table function came into play in TPC-C.
Our schema is simple:
CREATE TABLE order(order_id INTEGER NOT NULL PRIMARY KEY,
                   stamp    TIMESTAMP DEFAULT CURRENT TIMESTAMP,
                   name     VARCHAR(20),
                   -- the DDL is cut off in the original; price and agent_id
                   -- are restored from the statements that use them below
                   price    INTEGER,
                   agent_id INTEGER);
To submit an order an order-id must be generated.
Different ways to do so exist in various products, for example identity columns and sequences.
Either two SQL statements need to be executed, or the statement needs to comply with very specific properties, e.g. a single-row insert with an identity column.
In response to this problem some vendors have invented extensions to INSERT, UPDATE and DELETE, such as a WITH RETURN clause.
The purpose of the clause is to pick up on new, changed or deleted rows and pass them back to the client or insert them into a table or variable.
The approach we took in DB2 has been radically different.
CREATE OR REPLACE SEQUENCE order_seq;

SELECT order_id, stamp
  FROM NEW TABLE(INSERT INTO order(order_id, name, price)
                 VALUES(NEXT VALUE FOR order_seq, 'John', 1000));

ORDER_ID    STAMP
----------- --------------------------
          1 2012-05-09 22:05:29.156000
What we did in DB2 was to simply expose the trigger transition table NEW TABLE in the from-clause.
So, when you put an INSERT statement into the from clause that insert statement is executed and in the process a transition table is produced.
That transition table, which includes all the modifications of any BEFORE triggers, can then be queried.
Note in the example above that we did not only pick up the generated "ORDER_ID", but also the value generated for the "STAMP" column.
How is that different from an INSERT WITH RETURN as available in Oracle?
INSERT WITH RETURN requires an INTO clause. That is, the returned values must leave SQL and be returned into a variable.
Once they are returned you can re-scan them for further processing.
Pushing INSERT into the from clause allows for the full power of SQL to be applied for further processing.
Any client language that knows how to process a cursor can immediately use the feature.
DB2 can handle multiple order inserts:
SELECT order_id, stamp
  FROM NEW TABLE(INSERT INTO order(order_id, name, price)
                 VALUES(NEXT VALUE FOR order_seq, 'John', 1000),
                       (NEXT VALUE FOR order_seq, 'Beth', 500),
                       (NEXT VALUE FOR order_seq, 'John', 1200));

ORDER_ID    STAMP
Better yet, DB2 can process the result. For example it can do totals on the orders:
SELECT order_id, stamp, name,
       SUM(price) OVER(ORDER BY order_id) AS total_price
  FROM NEW TABLE(INSERT INTO order(order_id, name, price)
                 VALUES(NEXT VALUE FOR order_seq, 'John', 1000),
                       (NEXT VALUE FOR order_seq, 'Beth', 500),
                       (NEXT VALUE FOR order_seq, 'John', 1200),
                       (NEXT VALUE FOR order_seq, 'Beth', 700),
                       (NEXT VALUE FOR order_seq, 'Jack', 1100));

ORDER_ID    STAMP    NAME    TOTAL_PRICE
The question which you should immediately ask is:
What is the overhead of NEW TABLE?
How is this different from inserting the rows into an array or temp table and then rescanning that array?
An explain of the statement above will show:
                     Rows
                    RETURN
                    (   1)
                     Cost
                      I/O
                      |
                       5
                    TBSCAN
                    (   2)
                    6.7822
                       1
                      |
                       5
                     SORT
                    (   3)
                    6.78117
                       1
                      |
                       5
                    TBSCAN
                    (   4)
                    6.77634
                       1
                      |
                       5
                     SORT
                    (   5)
                    6.7753
                       1
                      |
                       5
                    INSERT
                    (   6)
                    6.77286
                       1
              /------+------\
             5               0
          TBSCAN      TABLE: ADMINISTRATOR
          (   7)            ORDER
       8.88007e-005          Q5
             0
            |
             5
      TABFNC: SYSIBM
          GENROW
            Q1
Note the absence of any temp here. The INSERT (6) serves immediately as the input for the OLAP function.
SORT (5) handles the "ORDER BY order_id" of the SUM.
SORT (3) however is interesting:
3) SORT : (Sort)
   ...
   SORTKEY : (Sort Key column)
      NONE
This SORT doesn't actually sort. All it does is force the cursor to be INSENSITIVE.
This ensures that the INSERT is complete when the cursor is OPENed.
A common technique is to open such cursors as WITH HOLD and then COMMIT before first fetch.
That way no locks are being held while the cursor is being fetched.
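In embedded SQL the pattern looks roughly like this; it is a sketch, and the cursor and host variable names are illustrative:

DECLARE new_order CURSOR WITH HOLD FOR
  SELECT order_id, stamp
    FROM NEW TABLE(INSERT INTO order(order_id, name, price)
                   VALUES(NEXT VALUE FOR order_seq, 'John', 1000));

OPEN new_order;                              -- the INSERT executes at OPEN
COMMIT;                                      -- releases the locks; WITH HOLD keeps the cursor open
FETCH new_order INTO :v_order_id, :v_stamp;  -- fetch without holding locks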
Dispatching an order
After the order has been accepted it needs to be processed.
This can either happen in a single database transaction, or the workflow can consist of multiple transactions.
In the second case the order needs to be marked on the queue as being processed.
Only once the processing is completed in a later transaction can it be deleted.
Let's assume there are multiple agents who are fulfilling orders.
Here is an efficient way to do that, applying the same technology described for INSERT to an UPDATE statement.
CREATE OR REPLACE VARIABLE my_agent_id INTEGER;
SET my_agent_id = 1;

SELECT order_id, stamp, name, price
  FROM OLD TABLE(UPDATE (SELECT * FROM order
                          WHERE agent_id IS NULL
                          ORDER BY stamp
                          FETCH FIRST 1 ROW ONLY)
                 -- the statement is cut off in the original; the ORDER BY,
                 -- FETCH FIRST, and SET clauses are reconstructed from the
                 -- description that follows
                 SET agent_id = my_agent_id);
We have selected the oldest order which is unassigned (agent_id IS NULL).
That order has been updated with the local agent_id, and the row has then been returned as a query result.
Note the usage of OLD TABLE here. You can choose either the NEW or OLD transition table for UPDATE operations.
Generally the optimizer produces a better plan when you use the OLD TABLE, so prefer it whenever possible, in particular when the subject of the UPDATE is not a base table.
The plan is very tight:
                 Rows
                RETURN
                (   1)
                 Cost
                  I/O
                  |
                   1
                TBSCAN
                (   2)
                13.5567
                   2
                  |
                   1
                 SORT
                (   3)
                13.5561
                   2
                  |
                   1
                UPDATE
                (   4)
                13.5549
                   2
            /-----+-----\
           1             9
        FETCH     TABLE: ADMINISTRATOR
        (   5)          ORDER
        6.79517          Q1
           1
      /-----+------\
     9              9
  IXSCAN     TABLE: ADMINISTRATOR
  (   6)           ORDER
 0.0205156          Q2
     0
    |
     9
 INDEX: SYSIBM
SQL120509220520090
      Q2
After the cursor is opened the transaction can immediately be committed.
As a result, update locks are held for a minimal amount of time, with no chance of a deadlock.
Deleting an order
Once the order has been fulfilled it can easily be deleted from the queue by order_id.
Alternatively the order can be marked as processed in an extra column.
For the sake of this exercise however, let's assume we want to delete the order from the order table and archive it in a separate table.
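The DDL for the archive table is not part of this excerpt; here is a sketch that fits the statement and plan below, with a fulfilled column that records the archival time through its default:

CREATE TABLE order_archive(order_id  INTEGER NOT NULL PRIMARY KEY,
                           stamp     TIMESTAMP,
                           name      VARCHAR(20),
                           price     INTEGER,
                           fulfilled TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP);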
WITH del AS (SELECT * FROM OLD TABLE(DELETE FROM order WHERE order_id = 1))
SELECT fulfilled
  -- the statement is cut off in the original; the INSERT below is
  -- reconstructed from the description that follows
  FROM NEW TABLE(INSERT INTO order_archive(order_id, stamp, name, price)
                 SELECT order_id, stamp, name, price FROM del);
What we have built here is a pipeline of actions.
First we DELETE the order we have finished processing.
But we interrogate the OLD transition table to pass that information on to the order_archive table.
Finally we capture the timestamp when the order was fulfilled from the NEW transition table of the order_archive table.
Theoretically this could all be written in one nested query.
But these statements can be very complex.
What if the query contained joins where the same table is updated and selected from in the same query?
What if the same table is modified multiple times?
Therefore DB2 restricts NEW TABLE and OLD TABLE from occurring in nested queries or in the presence of joins.
Placing them into common table expressions however is allowed.
Common table expressions provide a natural order.
So if there are conflicts, DB2 will execute the queries in the WITH clause one at a time, in order of specification.
If however there are no conflicts DB2 can provide a very efficient access path:
                 Rows
                RETURN
                (   1)
                 Cost
                  I/O
                  |
                   1
                TBSCAN
                (   2)
                20.3274
                   3
                  |
                   1
                 SORT
                (   3)
                20.3268
                   3
                  |
                   1
                INSERT
                (   4)
                20.3256
                   3
            /-----+-----\
           1             1
        DELETE    TABLE: ADMINISTRATOR
        (   5)       ORDER_ARCHIVE
        13.5558          Q6
           2
      /-----+-----\
     1             9
  FETCH     TABLE: ADMINISTRATOR
  (   6)          ORDER
  6.78603          Q1
     1
   /-----+-----\
  1             9
IXSCAN   TABLE: ADMINISTRATOR
(   7)         ORDER
0.015546        Q2
   0
   |
    9
 INDEX: SYSIBM
SQL120509220520090
      Q2
Note how the DELETE and INSERT are stacked!
You can use the same technique also to stack INSERTs.
For example to split a staging table across multiple target tables.
I'll save that one for another day though, since it will introduce another fancy clause.