The
information provided applies to the Informix Server Version 11.70.xC3
and later.
In this blog entry we will look in detail at the "SmartMart" steps
that are performed by the shell script introduced in the last blog
entry. The "SmartMart" feature was introduced with the recent release
of Informix Ultimate Warehouse Edition 11.70.FC3 and is a very helpful
tool for creating data marts that accelerate a specific query workload.
A query sent to the Informix Server can be accelerated by the
Informix Warehouse Accelerator only if an appropriate data mart meeting
the query's requirements exists in the accelerator. Acceleration of
partial queries (where the rest of the query would be executed by the
Informix Server) is not supported with current releases. Furthermore,
the accelerator can use only a single data mart for the execution of a
single query. This means it will not take data missing in one data mart
from a different data mart that happens to exist in the accelerator.
Therefore the data mart must contain all the needed columns of all the
tables involved in a specific query. Additionally, data marts are
created for the specific purpose of executing queries with many joins
between different tables exceptionally fast. To make this possible, the
data marts are built with an internal structure that greatly helps
performing the joins of the query. However, this requires specifying
these table joins in the data mart definition, so that the data mart
can be constructed accordingly.
An oversimplified solution might be to include the whole database,
i.e. all tables with all their columns, in a single data mart and to
specify all possible joins for this data mart, with the intent that
such a data mart will be able to accelerate all queries that may ever be
issued. While this may work with a small database [ like the
"iwadb" example database ;-) ], it most probably is not a
good approach for a real warehouse production database, for two reasons:
First, warehouse databases tend to be very big, as they contain all the
data that is available, even though much of it may never be needed by
any query. Typical examples are wide character columns that contain
extensive descriptions for some other column with an encoded key value.
Including such descriptive text columns in a data mart would
unnecessarily bloat the size of the data mart and subsequently require a
lot of memory, as the accelerator keeps all data in memory for
performance reasons. Second, a data mart with a lot of unneeded data
as ballast would diminish acceleration, as would unnecessarily specified
joins, especially if they are n:m joins (rather than 1:n joins).
To get the most acceleration, it is paramount to find the optimal
definition for a data mart: one that contains all the tables and columns
needed, but excludes columns and perhaps even whole tables that are not
required, and that specifies the necessary joins between the tables just
as they are used in the queries. For a data warehouse with a complex
database schema and a workload including reports, each possibly running
hundreds of different queries, it is quite a challenge to figure out the
optimal data mart definition.
The "SmartMart" feature offers a semi-automatic method for creating
the right data mart. While it automates the daunting task of query
workload analysis, it still lets you intervene where possibly useful.
The main goal is to produce an XML file containing the data mart
definition, that either can be used readily to create and load the data
mart, or can be manually modified to suit individual requirements before
using it to deploy the data mart. The main steps in the process are:
- Performing Update Statistics Low.
- Collecting query data for analysis. We call this "query probing".
- Analysing the collected data to form an internal representation of
a mart definition.
- Extracting the mart definition in XML format.
In the following we will look at each of these steps in detail.
Update Statistics Low
If statistics on the database have not yet been updated, it is
necessary to do this before beginning with the query probing. As a
minimum, UPDATE STATISTICS LOW should be run, so that the database
server has the basic information on all the tables and columns that is
later to be collected in the query probing step. As long as the schema
of the database doesn't get changed, it is sufficient to update the
statistics just once.
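For example, with the "iwadb" example database this can be done with a
single dbaccess command (a minimal sketch; substitute your own database
name as needed):
% echo "update statistics low;" | dbaccess iwadb -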
"Query Probing"
When query probing is switched on for a specific session, the
database server internally collects information about the queries run in
the session and stores this information in its memory. The collected
data can be looked at by querying the SMI (i.e. pseudo tables in the
sysmaster database) or by using the command onstat -g probe. The data
remains available until it is either actively cleared, or the Informix
Server instance gets shut down.
The SQL statement to clear any probing data possibly still existing
is SET ENVIRONMENT use_dwa 'probe cleanup'; . Executing this SQL
statement before starting a new query probing session is good practice
to avoid including old query data in a new mart definition. To verify
that the data was cleared, onstat -g probe can be executed. It should
not show any query information in its output.
The SQL statement SET ENVIRONMENT use_dwa 'probe start';
is used to activate the query probing for the remaining SQL statements
of the active session. Data for all these SQL statements will be collected
in memory and later be used for the analysis phase to build the internal
representation of the data mart. This may even include query data
collected earlier - if the data was not cleared using the SQL statement
with the "probe cleanup" value as described above.
Thus, using the "iwadb" example database, we can execute the following
commands to perform the query probing phase:
- Clear any data from previous query probing:
% echo "set environment use_dwa 'probe cleanup';" | dbaccess iwadb -
Database selected.
Environment set.
Database closed.
%
- Probe the queries of the workload without actually executing
the queries (to avoid a potentially long wait time). The workload is
contained in file q1.sql that came with the "iwadb" example
database and actually contains only a single query statement:
% ( echo "set explain on avoid_execute; \
set environment use_dwa 'probe start';"; \
cat q1.sql ) | dbaccess iwadb -
Database selected.
Explain set.
Environment set.
No rows found.
Database closed.
%
It is important that the command set environment use_dwa 'probe start';
and the query workload are executed in the same session. In the above
example this is achieved by grouping together the echo and the cat
commands in a subshell, using the parentheses. This way dbaccess will
treat all the output of these commands as input for a single database
session. The output "No rows found." is correct, because we chose to
not really execute the query workload.
- Optionally we can check the collected probing data:
% onstat -g probe
IBM Informix Dynamic Server Version 11.70.FC4 -- On-Line --
Up 6 days 19:20:45 -- 293100 Kbytes
DWA probing data for database iwadb:
statement 5:
columns: tabid[colno,...]
100[1,2]
101[1,3]
102[1,2]
103[1,2]
104[1,2]
105[1,2,3]
106[1,2,3,4,5]
109[1,5]
110[1,2]
111[1,2,4] f
joins: tabid[colno,...] = tabid[colno,...] (type) {u:unique}
105[3] = 101[1] (inner) u
106[3] = 104[1] (inner) u
106[5] = 102[1] (inner) u
106[2] = 103[1] (inner) u
106[4] = 105[1] (inner) u
101[3] = 100[1] (inner) u
109[5] = 101[1] (inner) u
110[2] = 109[1] (inner) u
111[2] = 106[1] (inner) u
111[1] = 110[1] (inner) u
%
The command onstat -g probe presents the data in a rather raw
format, only giving table IDs and column numbers (of the respective
table). In the first part of the output there is the list of tables
and their columns accessed by the query. The 'f' accompanying the
table with tabid 111 signifies that this table is the fact table of
the query. The second part of the output contains the join
information, where the 'u' signifies that there is a unique constraint
on the column of the parent table. This is important, as it will later
allow the preferred 1:n relationship, rather than an n:m relationship,
to be defined in the reference definition for the data mart.
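If you want to translate the table IDs and column numbers of the probing
output into actual table and column names, the system catalog of the
probed database can be queried. The following is a minimal sketch
(assuming the "iwadb" example database; in Informix, user tables have
tabid values of 100 and above):
% echo "select t.tabid, t.tabname, c.colno, c.colname \
    from systables t, syscolumns c \
    where c.tabid = t.tabid and t.tabid >= 100 \
    order by t.tabid, c.colno;" | dbaccess iwadb -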
Analysing Query Data
In the analysis phase the collected query probing data will be used
to determine the structure of a data mart that will be able to
accelerate all the queries that have been probed. The structure of the
data mart not only includes the needed tables and columns, but also the
join criteria between the tables. The result of the analysis basically
is the data mart definition, but it will be stored in an internal format
in special tables. These tables will be created if they do not yet
exist. Due to the internal method of handling these tables it is
necessary that the database containing them has logging enabled.
As real data warehouse databases often do not have any logging, it is
recommended to always use a separate database (with logging enabled)
specifically for the purpose of the analysis phase. This will also
separate the mart definition data from the data warehouse, which may get
re-created periodically (and thus the mart definition data would be
lost). If this database is used only for the purpose of query probing
analysis, then it should not need much dbspace. Once this separate
database with logging has been created, it can be utilized to analyse
query data for different data marts (also of different warehouse
databases).
The stored procedure probe2mart() is used to perform the analysis
of the query probing data, and the result will be stored permanently in
the tables of this database. Therefore the data can no longer get lost,
unlike the probing data from the previous phase that is kept in memory
only.
To avoid mixing existing data for a specific data mart in the
database with new probing data analysed in this phase, the old data in
the database first needs to be cleared. This is done by manually
executing a delete statement, specifying the name of the data mart:
delete from iwa_marts where martname='...';
Based upon the probing phase completed previously, we can now
proceed to the analysis phase with the following steps:
- Prepare the logging database named "smartmart_demodb"
for the new data mart "smart_iwadb1":
- If the separate database named "smartmart_demodb" with
logging does not yet exist, we create it now for the mart
definition data:
% echo "create database smartmart_demodb with log;" | dbaccess - -
Database created.
Database closed.
%
- If the database "smartmart_demodb" already exists, we execute
the above mentioned delete statement to make sure that no old
data for our new data mart named "smart_iwadb1" lingers in the
database:
% echo 'delete from iwa_marts where martname = "smart_iwadb1";' \
| dbaccess smartmart_demodb -
Database selected.
1 row(s) deleted.
Database closed.
%
- Analyse the probing data of our "iwadb" example database to form a
mart definition for a new data mart to be named "smart_iwadb1":
% echo "execute procedure probe2mart('iwadb', 'smart_iwadb1');" \
| dbaccess smartmart_demodb -
Database selected.
Routine executed.
Database closed.
%
We ran the probe2mart() stored procedure connected to the
database "smartmart_demodb". If they did not exist before, new
tables holding the mart definition in an internal format have been
created by the stored procedure in this database.
- Optionally we can look at the table "iwa_marts" in database
"smartmart_demodb" to check, whether a new mart definition was
created:
% echo "select * from iwa_marts;" | dbaccess smartmart_demodb -
Database selected.
mid 1
martname smart_iwadb1
database iwadb
1 row(s) retrieved.
Database closed.
%
The remaining data of the mart definition is kept in tables
"iwa_mtabs", "iwa_mcols", "iwa_mrefs" and "iwa_mrefcols", where user
"informix" is the owner of each of these tables. As knowledge of the
contents of these tables is not needed for successfully using the
"SmartMart" functionality, we will skip any further exploration.
- As the probing data has been analysed and the result is stored
permanently in the "smartmart_demodb" database, we can now safely
clear the probing data that was collected for the "iwadb" example
database:
% echo "set environment use_dwa 'probe cleanup';" | dbaccess iwadb -
Database selected.
Environment set.
Database closed.
%
With that all the needed mart definition data is stored in the tables
of the "smartmart_demodb" database and the analysis phase is
complete.
Extracting the Mart Definition
Finally we want to create the data mart. For that we first need to
extract the data mart definition in the needed XML format from the
"smartmart_demodb" database and put the XML text into a file. This file
can then be used to create the data mart, either utilizing the Example
Java CLI or the ISAO Studio GUI.
The extraction of the XML mart definition text from the
"smartmart_demodb" database is done with the stored procedure
genmartdef(), which takes only the data mart name as parameter.
However, if we just executed this stored procedure, the XML mart
definition text would be sent, as the large object that it is, to
standard output, together with the status messages of dbaccess. To
avoid this, the function lotofile() is used to put the large object
data into a specified file. That way the XML text ends up directly in a
file and is not mixed with the status messages of dbaccess (which
continue to appear on standard output).
The lotofile() function takes as parameters the large object,
the file name (where the appended '!' prevents the file name from being
extended with a unique, but otherwise unreadable, string) and the
specification that the file should be created in the client's
environment (rather than the server's).
- For our data mart named "smart_iwadb1" we extract the XML
definition text and put it into the local file
smart_iwadb1.xml:
% echo "select lotofile(genmartdef(martname),'smart_iwadb1.xml"'!'"', 'client') \
from iwa_marts where martname='smart_iwadb1';" \
| dbaccess smartmart_demodb -
Database selected.
(expression) smart_iwadb1.xml
1 row(s) retrieved.
Database closed.
%
% ls -l smart_iwadb1.xml
-rw-rw-r-- 1 informix informix 5348 Jul 19 16:41 smart_iwadb1.xml
%
- Using the created XML file, we can now create the data mart. In
the example we use the Example Java CLI to handle the data mart in
the remaining steps:
% java createMart dwa1 smart_iwadb1.xml
createMart dwa1 smart_iwadb1.xml
--------------------------------------------------------------------------
result message:
... The operation was completed successfully. ...
==========================================================================
%
% java listMarts dwa1
listMarts dwa1
--------------------------------------------------------------------------
result message:
... The operation was completed successfully. ...
--------------------------------------------------------------------------
mart list:
<?xml version="1.0" encoding="UTF-8" ?>
<dwa:martList xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa" version="1.0">
<mart name="smart_iwadb1" status="LoadPending" />
</dwa:martList>
==========================================================================
%
- Finally, to make the data mart active and usable, it needs to
be loaded with data:
% java loadMart dwa1 smart_iwadb1 NONE
loadMart dwa1 smart_iwadb1 NONE
--------------------------------------------------------------------------
result message:
... The operation was completed successfully. ...
==========================================================================
%
% java listMarts dwa1
listMarts dwa1
--------------------------------------------------------------------------
result message:
... The operation was completed successfully. ...
--------------------------------------------------------------------------
mart list:
<?xml version="1.0" encoding="UTF-8" ?>
<dwa:martList xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa" version="1.0">
<mart name="smart_iwadb1" status="Active"
memoryConsumptionInMB="10"
lastLoadTimestamp="2011-07-19T15:22:50.756321Z" />
</dwa:martList>
==========================================================================
%
Now our query workload can be accelerated by the new data mart that
we have just created. To verify that acceleration takes place, either
use a setting of 3 for the "use_dwa" session environment and check the
online.log output, or switch on the SQL explain output and look for
the remote query path.
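For example, the workload can be run with both verification options
switched on like this (a minimal sketch, assuming the q1.sql workload
file used above and that the numeric value is set with the same SET
ENVIRONMENT syntax as in the probing examples):
% ( echo "set environment use_dwa '3'; set explain on;"; \
    cat q1.sql ) | dbaccess iwadb -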
Example messages in the online.log file:
13:13:23 SELECT
SUM(p_quantity) as
13:13:23 Identified 1 candidate(s) AQTs for matching
13:13:23 try AQT iwadb:aqt2cb6f1cf-b584-42e4-8c05-1ff3d0a8d392
13:13:23 matched AQT iwadb:aqt2cb6f1cf-b584-42e4-8c05-1ff3d0a8d392
13:13:23 Matching and offloading successful.
13:13:23 Matching took 32 msec
Example output in the sqexplain.out file:
QUERY: DWA executed:(OPTIMIZATION TIMESTAMP: 07-21-2011 13:20:50)
------
SELECT
SUM(p_quantity) as Quantity,
...
1) iwadb@dwa1:dwa.aqt2cb6f1cf-b584-42e4-8c05-1ff3d0a8d392: REMOTE PATH
Remote SQL Request:
{QUERY {FROM dwa.aqt2cb6f1cf-b584-42e4-8c05-1ff3d0a8d392} {WHER
...
Note: The sqexplain.out file will also contain a section headed by
QUERY: IDS FYI:(OPTIMIZATION TIMESTAMP: ...). This is only information
for comparison, showing how the query would have been executed if it
had been performed by the database server itself. Do not get confused
by this.
In a later blog entry we will look at some more advanced aspects of
the "SmartMart" feature. These methods are mainly used to augment an
existing data mart definition so that it can also accelerate specific
queries newly added to the workload.