Making SQL PL and PL/SQL sensitive to changes in statistics, and more
Comments (4) Visits (6814)
Recently one of our major business partners submitted the following feature request:
"Please make all the SQL in my SQL Routines dynamic!"
Why, I asked, would you want to do that? If you make the SQL dynamic at lot of bad things would happen to your SQL:
There are probably more side-effects when switching from static SQL to dynamic. But these are the ones I can come up with immediately.
The answer I got was interesting:
"I'm OK with all conservative binding semantics, but I want to be sensitive to statistics changes.
Since the database is empty when the routines are created.
The execution plans will nearly guaranteed be sub-optimal."
So what this vendor wants is for SQL within routines and triggers to be recompiled whenever there is a change to any of the used objects.
This may be an update to the statistics or perhaps the addition of an index.
In case of dynamic SQL such operations cause so called "soft invalidation".
That is the cached SQL plans for the statements which depend on the changed objects cannot be reused for new invocations.
So the first new invocation of the the statement will cause a recompilation placing an updated version of the plan in the cache.
Well, if that's all you want fro static SQL, then DB2 can actually do that today.
Here I describe how.
As always we'll work of a running example. The following schema defines two tables "EMP" and "DEPT".
We then create a procedure which fetches an employee based on the emp id.
The *cough* "complexity" lies in the fact that there is a join.
SET SCHEMA = SAMPLE;
Insert some data.
INSERT INTO dept VALUES(1, 'Marketing');
Test the procedure.
CALL get_emp(1, ?, ?);
Exposing the problem
Using my SQL PL profiler we can easily display the executed plan:
A small problem. I had intended to start with a nested-loop join and argue that this would be a bad join choice as the number of rows increases.
DB2 outwitted me (again).
But there are less subtle ways to "encourage" a plan changes:
ALTER TABLE emp ADD PRIMARY KEY (id);
Nothing has changed! DB2 did not pick the new indices.
None of the changes we did to the table forced DB2 to invalidate the plan, so it didn't.
That's the premise of static SQL.
In order to get DB2 to re-consider the plan we must tell it to do so explicitly:
So in DB2's preferred mode of operation it is your responsibility to rebind static SQL after updating statistics or adding indexes.
But what if I want DB2 to take care of itself? I have already turned on the self tuning memory manager and automatic runstats collection.
I'm happy how that works for dynamic SQL and I want DB2 to take care of all my SQL PL or PL/SQL as well.
Can VALIDATE RUN do the job?
My original comeback to the vendor was to try VALIDATE RUN.
This BIND option will delay compilation of a static SQL statement until first execution - I thought.
So that will solve the problem.
Unfortunately during development of this scenario I discovered that I had missed a crucial property of VALIDATE RUN.
VALIDATE RUN is a fail-over mechanism only.
That is, only SQL that would otherwise raise a bind time error gets pushed out.
The purpose of VALIDATE RUN is to cross your fingers and hope by the time the SQL statement executes a missing object such as a user temporary table is available.
If your SQL has no such errors, which can be assumed, the SQL will be just as static as without VALIDATE RUN.
What VALIDATE RUN can't REOPT ONCE can.
The idea of using a BIND option however does hold water. We just have to pick a different option.
REOPT ONCE and REOPT ALWAYS are options which delay optimization of an SQL statement until the input values are available.
The idea is that given a specific set of input values a better plan can be generated.
When defining SQL Routines, triggers or even anonymous blocks there is no direct way to set the BIND options within the syntax.
Instead you set bind options either at an instance level using the DB2_
Let's redo our scenario using REOPT ONCE.
DROP TABLE SAMPLE.emp;
Enable REOPT ONCE for the following routine definitions.
Note that, despite the name, compiled triggers and anonymous blocks will also be affected.
It's good practice to clean up after ourselves.
We don't expect any changes yet. The plan should be identical to the original scenario:
Now, this is where it gets interesting:
ALTER TABLE emp ADD PRIMARY KEY (id);
Success! The plan changed without the need to rebind.
For the record
REOPT ONCE does not affect the semantics of static SQL:
One of the nice things about static SQL is that you can follow the who's using whom chains:
SELECT BNAME FROM SYSCAT.ROUTINEDEP
There is one more upside to using REOPT ONCE with SQL PL.
While the granularity of invalidation on a schema change is a package in static SQL, using REOPT ONCE will cause only directly affected statements to be invalidated.
This will greatly reduce the impact of schema evolution.
One last word of caution:
REOPT ONCE does assume that the plan gets better when given the first set of input values.
While this is true in most cases, it cannot be guaranteed!