Blog Content/Topics List

The information provided applies to Informix Server and IWA Version 12.10.FC6 and newer.
The new release of Informix Server 12.10.FC6 (GA in November 2015) is now supported on the POWER8 Little Endian (LE) Linux platform. While this mainly concerns the Informix database server, the good news regarding the Informix Warehouse Accelerator (IWA) is that this new Informix database server running on POWER8 LE Linux also supports connection to and query acceleration with IWA. As always, IWA itself runs on the Intel 64-bit Linux platform.
Recently I found this new video on YouTube. It nicely demonstrates how easy and quick the installation of IWA is. Check it out.
The information provided applies to Informix Server and IWA Version 12.10.FC6 and newer.
With the new release of Informix 12.10.FC6 (GA in November 2015) comes the following new feature for the Informix Warehouse Accelerator (IWA), enhancing the information on query tasks.
The utility "ondwa", which is run in the IWA environment, accepts the option "tasks" to display a list of tasks currently ongoing in IWA. For each task, some general and some task-specific data is displayed in the output. A feature request from at least one customer asked for more information on tasks of type QUERY. While it is nice to see which tasks pertain to queries that are currently being worked on or queued, including how much memory each task has in use, it is even better to be able to figure out where a running query originates. While the Informix server can list current sessions with the command "onstat -g ses [ session_id ]", the output of "ondwa tasks" contained no reference for linking a running query task back to the Informix client session that issued the query.
Because different Informix server instances can connect to the same IWA, the information needed to link back to the origin of a query must also identify the Informix server instance. The implementation uses the numerical identifiers for both, because these can be transferred in a rather compact format compared to long-winded values like the actual name of an Informix server instance.
As a result, "ondwa tasks" output now shows the session ID of the client session in the Informix server and the server number of the Informix server instance for running query tasks. The client session ID corresponds to the one displayed by "onstat -g ses" and the server number of the Informix server instance corresponds to the value of the configuration parameter SERVERNUM. Following is an example of this new output format from the "ondwa tasks" command:
TaskManager tracking 3 task(s):
------------------+--------------------------------+---------+---------+--------+------------
Task 410390516044136472 (of type 'QUERY' with name Session 4 @ Servernum 74 - 0:00)
------------------+--------------------------------+---------+---------+--------+------------
Location | Status | Progr. | Upd. ms | Memory | Monitor
------------------+--------------------------------+---------+---------+--------+------------
Primary Node 0 | OPNQRY | 0 | 59 | 163K | Fine
-> Node 1 | Dim 110 | 0 | 3 | 18M | Fine
(Total Memory) | | | | 18M |
------------------+--------------------------------+---------+---------+--------+------------
Used Resources | Mart ID: 0 1 on node 0
------------------+--------------------------------+---------+---------+--------+------------
Task 227150306205499494 (of type 'QUERY' with name Query execution @ coordinator - 0:00)
------------------+--------------------------------+---------+---------+--------+------------
Location | Status | Progr. | Upd. ms | Memory | Monitor
------------------+--------------------------------+---------+---------+--------+------------
Primary Node 0 | Queued | 0 | 160 | 0 | Fine
(Total Memory) | | | | 0 |
------------------+--------------------------------+---------+---------+--------+------------
------------------+--------------------------------+---------+---------+--------+------------
Task 125819314589663232 (of type 'DAEMON' with name DRDADaemon - 12:06)
------------------+--------------------------------+---------+---------+--------+------------
Location | Status | Progr. | Upd. ms | Memory | Monitor
------------------+--------------------------------+---------+---------+--------+------------
Primary Node 0 | Running | 4 | 3078 | 0 | Fine
(Total Memory) | | | | 0 |
------------------+--------------------------------+---------+---------+--------+------------
Used Resources | DRDA device: 'lo' address: '127.0.0.1:21022' on node 0
| Unbound on node 0
------------------+--------------------------------+---------+---------+--------+------------
- End of Tasklist -
The session ID and server number information is sent from the Informix server to IWA at the time of preparing the query statement. Therefore, for queries that are only queued and have not yet reached the prepare phase, this new additional information is not displayed.
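Putting this together, linking a query task back to its originating client session might look like this (session ID 4 and server number 74 are taken from the example output above):

```shell
# On the IWA host: list the tasks; note the task name "Session 4 @ Servernum 74"
ondwa tasks

# On the Informix server instance whose SERVERNUM configuration parameter is 74:
# display details of the client session that issued the query
onstat -g ses 4
```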
The information provided applies to Informix Server and IWA Version 12.10.FC5 and newer.
Since 26 March 2015, Version 12.10.xC5 of the Informix server has been electronically available (so-called eGA). The free Developer or Time Limited Edition can be downloaded here.
This new version also contains a few enhancements for the Informix Warehouse Accelerator. Following is a brief description of each of these new enhancements:
-
UNITS operator for MONTH and YEAR
IWA now supports the UNITS operator for MONTH and YEAR. A common situation where this is useful is date or datetime arithmetic like the following example:
SELECT lead_time + 5 UNITS DAY FROM manufact;
-
Default UNITS DAY
An Informix specialty is that a date arithmetic argument defaults to the unit of days when the UNITS operator is omitted. This is now also supported by IWA. With that, the following query is equivalent to the example query shown above:
SELECT lead_time + 5 FROM manufact;
-
Support for function SQRT
The math function SQRT is now supported. It returns the positive square root of its argument, which must be a non-negative numeric expression. Example:
SELECT SQRT(9) FROM angles;
-
Operators CURRENT and SYSDATE
The CURRENT operator returns a DATETIME value with the date and time of day, showing the current instant. SYSDATE is identical to the CURRENT operator, except that the default precision of SYSDATE is YEAR TO FRACTION(5), while the default precision of CURRENT is YEAR TO FRACTION(3). Example:
SELECT * FROM tab1 WHERE datetime_column < CURRENT;
-
New command "ondwa listmarts"
"listmarts" is a new command for the "ondwa" utility which is run by the IWA administrator in the environment of IWA. It shows information about existing data marts in the accelerators. Example:
$ ondwa listmarts
Found 4 marts:
mart="cq_sqrt", martid="151", accelerator="IDS.dwa1", state="ACTIVE", epoch="1", lastload="2014-12-08 11:23:16.788831"
mart="iwadb_mart1", martid="152", accelerator="IDS.dwa1", state="ACTIVE", epoch="3", lastload="2015-01-09 17:44:40.822332", lastupdate="2015-01-13 16:43:49.048554"
mart="iwadb_mart2", martid="153", accelerator="IDS.dwa1", state="LOAD PENDING", epoch="0"
mart="iwadb_mart2", martid="154", accelerator="IDS.dwa2", state="LOAD PENDING", epoch="0"
Jon J Ritson of Oninit Consulting Ltd. has published "Easy IWA", a nice series of articles on how to get started with IWA. Quoting the abstract:
This article provides an overview of the Easy IWA Technical Articles Series. Using the TPCH database schema, Data Generation utilities and sample queries from http://www.tpc.org, this series covers everything to get up and running with IBM Informix Warehouse Accelerator including data generation, creation of an accelerator, loading a Data Mart and running a set of queries.
You find the introduction to "Easy IWA" here:
http://www.oninitgroup.com/easy-iwa-introduction-easy-iwa-technical-articles-series
From the introduction you will find links to the individual articles:
-
Easy IWA: Part 1 Got any data? - The TPCH database, data generation and IBM Informix 12.10.FC4
-
Easy IWA: Part 2 Creating an IBM Informix Warehouse Accelerator under 12.10.FC4
-
Easy IWA: Part 3 Creating and Loading Data Marts under IBM Informix 12.10.FC4 and IBM Informix Warehouse Accelerator
-
Easy IWA: Part 4 Running a selection of TPCH Queries accelerated by IBM Informix Warehouse Accelerator 12.10.FC4
I recommend "Easy IWA" especially to anyone who is interested in seeing IWA in action but shy of getting hands-on experience. The step-by-step instructions of "Easy IWA" help tremendously to get IWA up and running for the first time, and then to demonstrate real query acceleration.
The information provided applies to the new Informix Server and IWA Version 12.10.FC4.
In addition to the support for table synonyms (see last blog entry), version 12.10.xC4 allows the use of views when building a data mart. As with synonyms, a few things need to be considered when including views in data marts.
-
When a data mart includes views, the queries for which the data mart was built should also use these views (rather than the tables on which the views are based). This way, the query matching algorithm can reliably identify the data mart to be used. If queries use the base tables instead of the view, then they may not be eligible for acceleration, because the data mart may not be determined as appropriate for the query.
-
The database system catalog does not contain information on the number of data rows that a view produces. Therefore, the algorithms for query matching and query probing cannot obtain this information for views and cannot use it to determine the fact table of a query. For queries using views, it is therefore necessary that these algorithms can determine the fact table from the query syntax alone. It may be necessary to add the fact table optimizer directive to the queries to specify the fact table. This optimizer directive takes the form {+FACT table_or_view_name}.
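As a sketch, a query on a view with an explicit fact table directive might look like this (the view sales_v, the table customer, and their columns are illustrative names, not from the original):

```sql
-- Hypothetical view acting as the fact table of the query;
-- the directive tells the query matching algorithm which one it is.
SELECT {+FACT sales_v} c.region, SUM(s.amount)
FROM sales_v s, customer c
WHERE s.customer_id = c.customer_id
GROUP BY c.region;
```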
-
Because it is not possible to create an index on a view, the check during data mart creation cannot find an index to ensure uniqueness for a view that is the parent in a data mart reference. This would mean that the preferable 1:n cardinality references could not be used in a data mart where the reference parent is a view.
To still allow 1:n cardinality references even with views as parents, the checking for uniqueness can be disabled. This is done by issuing the following SQL statement before creating the data mart:
set environment use_dwa 'uniquecheck off';
The above statement disables the uniqueness checking for all 1:n cardinality references in the data mart, not just for those where the parent is a view. It is the responsibility of the database administrator to ensure that the keys of the view used as reference parent are truly unique. Using a 1:n cardinality when the keys in the reference parent are not unique will produce wrong query results!
-
The functionalities of data mart refresh and trickle feed are not supported for data marts that contain views.
While the above restrictions may look inconvenient, they may be a small price to pay for the new possibilities that view support in a data mart provides. For instance, a simple view can map a data type or a function not directly supported by IWA to something IWA can handle well. Or a view can select only some data of a very big table to be included in the data mart. This can be quite useful when a table contains a lot of historic data but only a portion of the most recent data is of interest for query acceleration.
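For example, a view restricting a large table to its most recent data might be defined like this (the table orders and its column order_date are illustrative names, not from the original):

```sql
-- Hypothetical: include only the last year of orders in the data mart.
CREATE VIEW recent_orders AS
    SELECT * FROM orders
    WHERE order_date >= TODAY - 365;
```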
On the other hand, defining a view with complex joins of several tables and then using it in a data mart may not be very efficient, because joining is what IWA itself is very good at. It would certainly cause long load times, because the Informix server would have to do all the processing for the complex view during the load.
The information provided applies to the new Informix Server and IWA Version 12.10.FC4.
A new functionality in version 12.10.FC4 is the support for table synonyms, so that they can be used to build a data mart. While this may sound trivial, there are a few details to consider when working with synonyms, which are explained in this blog entry.
-
In Informix databases, synonyms can be created to reference different types of database objects, for example tables, views, and sequences. The support for synonyms in data marts is restricted to synonyms that reference real tables; synonyms that reference views or sequences are not supported. It is still a very useful functionality, because the synonyms can reference remote tables. These include tables in a different database (residing in the same Informix server instance) as well as tables in a remote database (residing in a different Informix server instance).
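A synonym for a table in a remote database might be created like this (the database, server, and table names are illustrative):

```sql
-- Hypothetical: reference the customer table of the stores_demo database
-- residing in a different Informix server instance named remote_tcp.
CREATE SYNONYM customer_syn FOR stores_demo@remote_tcp:customer;
```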
-
In order to work correctly, the algorithms for query probing (used to generate a data mart definition from workload analysis) and for query matching (necessary to determine whether, and by which data mart, a query can be accelerated) must determine which table is the fact table of the query being considered. Depending on the query syntax, the query parser in the Informix server may be able to determine the fact table. If this is not possible, the table with the most data rows is taken as the fact table. Therefore, the algorithms must be able to get the number of rows of a table from the system catalog. This is implemented for tables in Informix server databases with version 12.10.xC4 and newer. It also means that synonyms referencing tables in remote Informix server instances older than version 12.10.xC4, or in foreign (non-Informix) databases, are not supported.
Queries may define the fact table with an optimizer directive of the form {+FACT table_name}. This method cannot be used with a synonym, because the synonym in the query is replaced with the referenced table before the query matching occurs. As a result, the fact table optimizer directive seemingly contains a table that is not part of the query (because the synonym remains in place in the optimizer directive). In this situation, it is possible that the algorithm determines the fact table incorrectly by choosing a wrong table from the query's table list instead of following the optimizer directive.
-
A data mart definition usually contains references that define the relationships between parent and child tables in the star or snowflake schema of the data mart. An important attribute of a reference in a data mart is the cardinality, which can be either 1:n, if the key columns of the parent table uniquely identify the data rows, or n:m otherwise. For better performance and lower memory consumption, references with a 1:n cardinality are much preferred. Usually, the required uniqueness of the parent table is enforced by a unique constraint, unique index, or primary key in the database. When creating a data mart with 1:n cardinality references, the existence of corresponding indexes is checked for the parent tables. If the parent table is a synonym, the check is done for the referenced table. However, due to the way the necessary index information is implemented in the system catalog, this information can be retrieved only for tables in databases within the same Informix server instance, not for tables in a remote database. Consequently, references with 1:n cardinality for parent tables in remote databases would not be allowed, because the existence of the required index cannot be checked.
In order to allow 1:n cardinality references even with a parent table in a remote database, it is possible to skip the checking for the index. When doing this, it is the responsibility of the database administrator to ensure that the keys of the parent table in the remote database are truly unique. Using a 1:n cardinality reference when the keys in the parent table are not unique will produce wrong results for queries accelerated with such a data mart!
The checking for parent table indexes that ensure the required uniqueness can be disabled by issuing the following SQL statement before creating the data mart:
set environment use_dwa 'uniquecheck off';
The above statement will disable the uniqueness checking for all 1:n cardinality references in the data mart, not just for those where the parent table is a synonym referencing a table in a remote database.
-
When the index information for a synonym cannot be checked because the referenced table is in a remote database (see above), this has consequences for query probing and the subsequent generation of the data mart definition with the stored procedure ifx_probe2mart(). Every reference in the generated data mart definition must contain the cardinality for the parent and child table. The cardinality of a parent table can only be defined as 1 if ifx_probe2mart() can retrieve the information on the corresponding index, to ensure that the keys in the parent table are unique. For tables in remote databases referenced by synonyms, ifx_probe2mart() therefore always creates an n:m cardinality reference rather than a 1:n cardinality reference. Unlike the checking during data mart creation, this behavior of ifx_probe2mart() cannot be altered with "set environment use_dwa 'uniquecheck off';".
When creating the data mart definition via ifx_probe2mart(), the only way to still get a 1:n reference for a parent table in a remote database is to first output the data mart definition to an XML file. This file can then be edited manually to change the generated n:m cardinality references to the desired 1:n cardinality. After this, the edited XML file is used to create the data mart. It must be noted (again) that it remains the administrator's responsibility to ensure the uniqueness of the keys in the remote parent table. If these keys are not unique, the data mart will produce wrong results.
-
The functionalities of data mart refresh and trickle feed are not supported for data marts that contain synonyms.
Similar considerations apply when using views, another new functionality of version 12.10.xC4 which will be discussed in the next blog entry.
The information provided applies to Informix Server and IWA Version 12.10.FC1 and newer.
The routines for data mart administration are implemented in the Informix server to run on their own virtual processor, the DWA-VP. This DWA-VP normally gets started automatically as soon as the first administrative operation is performed. Most of these administration commands run so fast that a single DWA-VP is sufficient to finish all operations without delay.
An exception is the loading of a data mart, which can take considerably longer, depending on the amount of data to be loaded. Moreover, the load processing keeps the DWA-VP so busy with transferring the data to IWA that concurrent administration commands for different data marts can be noticeably delayed. For example, a command to list the existing data marts while one data mart is being loaded may be blocked by the ongoing data load, possibly even until the load has completed. Additionally, it is possible that the newly loaded data mart is listed with status "LoadInProgress", even though the load finished before the listing is output to the user. From the point of view of the internal implementation this status may be correct, but for a user it is confusing, because by the time the answer arrives the status is already obsolete: it has changed to "Active" because the data load finished in the meantime.
To avoid the delay of administrative commands in such situations, the Informix server administrator can manually start an additional DWA-VP. This is done with the following command:
onmode -p +1 dwavp
The same rules apply as for other classes of virtual processors (VPs) started this way. The above command starts the additional DWA-VP dynamically (i.e. immediately), but does not make it permanent in the configuration file of the Informix server; after a restart of the Informix server, the command needs to be run again. If two DWA-VPs should always be running, without the need to issue the above command, the following line can be added to the configuration file of the Informix server:
VPCLASS dwavp,num=2
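Whether the additional DWA-VP is actually running can be checked with the usual listing of virtual processors:

```shell
# List all virtual processors of the Informix server instance;
# the class column should now show two "dwavp" entries.
onstat -g glo
```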
For completeness, it should be noted that it is not sensible to run several administration commands concurrently for the same data mart.
The information provided applies to the new Informix Server and IWA Version 12.10.FC3.
With the Internet of Things (IoT) becoming more and more prevalent in areas as different as science, business, services and even our personal lives, new methods are needed to efficiently use all the data that these 'things' generate. A good part of this data is generated at regular intervals, thus building a time series spanning a larger period of time. Examples are data from meteorological observations, stock trading, clinical patient monitoring, smart meter readings, or even our GPS tracks from last weekend's mountain hike. Typical questions on time series data are "what was the electricity consumption at a specific smart meter over a certain period?" or "what was the value of a specific stock at a certain time?" Enabled by the special data storage for time series data, the IBM Informix TimeSeries technology provides very efficient functions to answer such questions. However, warehouse-style analytic questions typically involve processing the time series as a whole, for example "what was the yearly average value of all traded stocks combined during the last 50 years?" Such queries are of a different class and still pose a formidable challenge to the Informix TimeSeries technology. This is where the IBM Informix Warehouse Accelerator (IWA) can improve performance significantly.
The combination of in-memory and columnar data storage for data marts lets IWA process time series data very efficiently. The algorithms in IWA are optimized for highly parallel processing of vectors in a multi-core environment, enabling IWA to perform aggregations very fast. And with frequency partitioning used for data compression in IWA, time series data in a data mart usually do not require more space than in the Informix TimeSeries data type in the database server. All this is optimal for analytic queries on time series data. However, to load time series data into an IWA data mart, the data must be accessible like normal data in relational tables. The Informix TimeSeries specific storage and functions for working with the data do not allow this in a straight forward way. Looking at a time series example in Informix's demo database demonstrates how different the Informix TimeSeries data type is and why it cannot be used as is for a data mart in IWA.
The following example from the Informix demo database shows the definition of an Informix TimeSeries data type and how it is used for a table:
create row type meter_data (
tstamp datetime year to fraction(5),
value decimal(14,3)
);
create table ts_data (
loc_esi_id char(20) not null ,
measure_unit varchar(10) not null ,
direction char(1) not null ,
multiplier timeseries(meter_data),
raw_reads timeseries(meter_data),
primary key (loc_esi_id,measure_unit,direction)
);
Running a normal SQL SELECT statement on the table defined above shows that the data is not stored in a normal, relational format. Even though the table in the demo database contains thousands of meter readings, the SELECT returns only 28 data rows, each holding many meter readings in special containers. The output shown below is shortened with '...' for readability:
loc_esi_id 4727354321000111
measure_unit KWH
direction P
raw_reads origin(2010-11-10 00:00:00.00000), calendar(cal15min),
container(raw_container), threshold(0), regular,
[(0.092), (0.084), (0.090), (0.085), (0.088), (0.088),
(0.085), (0.091), (0.083), (0.094), ... , (1.412)]
loc_esi_id 4727354321046021
measure_unit KWH
direction P
raw_reads origin(2010-11-10 00:00:00.00000), calendar(cal15min),
container(raw_container), threshold(0), regular,
[(0.041), (0.041), (0.040), (0.041), (0.041), (0.041),
(0.055), (0.073), (0.071), (0.068), ... , (0.023)]
...
28 row(s) retrieved.
In order to load the data into an IWA data mart, the data must be available in a normal, relational format so that IWA can request it with a normal SQL SELECT statement. Fortunately, Informix can accomplish this with the Virtual Table Interface (VTI), which is now also supported for this purpose by IWA. The advantage of using the VTI is that the data does not need to be stored in the relational format, which would be a waste of storage space. Instead, as the name suggests, the VTI makes the data available in relational format virtually, without storing it either in memory or on disk. The demo database already has such a virtual table predefined. It was created as follows:
EXECUTE PROCEDURE TSContainerCreate('raw_container',
'rootdbs', 'meter_data', 100, 50);
EXECUTE PROCEDURE TSCreateVirtualTab('ts_data_v', 'ts_data',
'origin(2010-11-10 00:00:00.00000),calendar(cal15min),
container(raw_container),threshold(0),regular', 0,
'raw_reads');
Using this virtual table in the SQL SELECT statement, the time series data is returned as if it were stored in normal tables, one data row for each meter reading:
loc_esi_id 4727354321000111
measure_unit KWH
direction P
tstamp 2010-11-10 00:00:00.00000
value 0.092
loc_esi_id 4727354321000111
measure_unit KWH
direction P
tstamp 2010-11-10 00:15:00.00000
value 0.084
...
loc_esi_id 4727354321046021
measure_unit KWH
direction P
tstamp 2010-11-10 00:00:00.00000
value 0.041
loc_esi_id 4727354321046021
measure_unit KWH
direction P
tstamp 2010-11-10 00:15:00.00000
value 0.041
...
241920 row(s) retrieved.
In this format the data can be loaded into an IWA data mart. Since Informix version 12.10, IWA supports the use of VTI tables directly in a data mart definition. Once such a data mart has been created and loaded with the data, SQL queries selecting the time series data from the VTI table can be accelerated by IWA.
Using a simple query that just counts the meter readings in a TimeSeries, we can see the effect of IWA acceleration. In the example, the TimeSeries contains meter readings taken every 15 minutes from 100,000 smart meters over a period of 90 days. Therefore the TimeSeries table contains 100,000 data rows, whereas the VTI table presents almost 900,000,000 data rows (100,000 meters × 96 readings per day × 90 days = 864,000,000).
Querying the VTI table without acceleration:
SELECT COUNT(*) FROM ts_data_v WHERE value >= 0.001;
Run time: 1h 12min 52sec
Querying the TimeSeries table, using TimeSeries native function countif():
SELECT SUM(countif(raw_reads, 'value >= 0.001')) FROM ts_data;
Run time: 15min 40.068sec
Querying the VTI table with acceleration enabled:
SELECT COUNT(*) FROM ts_data_v WHERE value >= 0.001;
Run time: 1.327sec
For this example, IWA is configured on eight nodes of a hardware cluster. This enables IWA to use high parallelism when running the query, resulting in a response time of less than two seconds.
The draft version of a new IBM Redbook on the Informix Warehouse Accelerator has just been made available:
Query Acceleration for Business using Informix Warehouse Accelerator - A draft IBM Redbooks publication
Even before the actual publishing date you can access the complete draft book (as PDF file of 4 MB) at the following URL:
http://www.redbooks.ibm.com/redpieces/abstracts/sg248150.html?Open
You also have the opportunity to give your rating for the draft book!
Please use the link Rate this Draft on the right side of the above access page. The rating is simple: just give one to five stars in each of four categories and provide a short comment on the book. (No login etc. is required, and the rating will also help us to do new Redbooks on the Informix product family in the future.)
Quote from the Book Abstract:
This book provides a comprehensive look at the technology and architecture behind the system. It contains information about the tools, data synchronization, and query processing capabilities of Informix Warehouse Accelerator, and steps to implement data analysis by using Informix Warehouse Accelerator within an organization.
Table of contents:
-
Introduction to Informix Warehouse Accelerator
-
Designing data marts
-
Designing and sizing an Informix Warehouse Accelerator environment
-
Informix Warehouse Accelerator installation and configuration
-
Creating Informix Warehouse Accelerator data marts
-
Query execution
-
Managing Informix Warehouse Accelerator data mart
-
Informix Warehouse Accelerator server administration
-
Use of Cognos with Informix Warehouse Accelerator
-
Proof of concept for Informix Warehouse Accelerator
-
Appendix: Tools for Informix Warehouse Accelerator
Authors:
-
Whei-Jen Chen
-
Nigel Campbell
-
Martin Fuerderer
-
Frederick Ho
-
Holger Kirstein
-
Keshava Murthy
-
Ramanathan Sivaguru
The information provided applies to the new Informix Server and IWA Version 12.10.FC2.
The latest release 12.10.xC2 of the Informix server and the Informix Warehouse Accelerator (IWA), which became available a few days ago, contains two new features. Here is a brief description of each. (For detailed information on how to use external tables in a data mart, please refer to the topic Load data from external tables in the manual.)
-
New functions in the IWA Admin SQL-API to handle TimeSeries data
The amount of TimeSeries data often is huge, and new data gets added almost continuously (e.g. new smart meter readings may arrive as frequently as every 15 minutes). But often it is not necessary to have the complete history of TimeSeries data available for analysis. Many times, only a certain period of time (e.g. the last 24 or 12 months, the last quarter, the days of the current year, etc.) is of interest. For the data mart, this means that new data should be added frequently, and perhaps old data should be removed at the same time.
The new functions in the IWA Admin SQL-API use the concept of a 'time window' that limits the TimeSeries data to be included in a data mart. This 'time window' can be defined either by using a TimeSeries calendar, or by manually specifying begin and end times. Based on the defined 'time window' the TimeSeries data can be refreshed in the data mart. The new functions are:
-
ifx_TSDW_setCalendar()
-
ifx_TSDW_updatePartition()
-
ifx_TSDW_createWindow()
-
ifx_TSDW_dropWindow()
-
ifx_TSDW_moveWindow()
For detailed information on these functions, please refer to the topic SQL administration routines sorted by task in the manual.
Besides the above reference descriptions, the manual also contains two extensive examples that demonstrate how these functions can be used in practice.
-
Support for external tables in a data mart
Data that is interesting for analysis might not always be stored in the data warehouse or OLTP database that serves as the basis for data marts. Structured data from diverse sources can be integrated by using the external table capabilities of Informix. With Version 12.10.xC2 it is now possible to use such data in external tables in an IWA data mart for query acceleration. This has several advantages:
- Data in existing external tables does not need to be loaded into normal tables.
- Structured data in ASCII-delimited flat files can be loaded across platforms.
- When loading a data mart, data of a single external table can be read in parallel from multiple files.
- By using named pipes for external tables, storing the data in a file (on disk) can be avoided.
- When using named pipes, additional pre-processing of the data (transforming, cleansing, etc.) can be done without storing the intermediate results on disk.
- Utilization of external and normal tables can be mixed in a data mart as well as in queries.
- Accelerated queries can access more than one external table in a single query.
Creating a data mart by workload analysis and query probing (aka "SmartMart" feature) is supported for external tables. Performance of loading data from external tables into a data mart equals the load performance with normal tables.
Also supported is the functionality to load partitions of tables into an existing data mart. This can be used with external tables based on pipes to load new data. From the external side, just the new data is fed into the pipe. As only the new data appears in (the pipe of) the external table, ifx_loadPartMart() can be called to load all the data of the external table, which thus will add the new data to the data mart.
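As a sketch of such a pipe-based incremental load (the table, pipe path, and routine arguments below are assumptions for illustration, not taken from the manual):

```sql
-- Hypothetical sketch: an external table on a named pipe, used to feed new
-- fact data into an existing data mart.
CREATE EXTERNAL TABLE sales_ext
    SAMEAS sales
    USING (DATAFILES ("PIPE:/work/sales_pipe"), FORMAT "DELIMITED");

-- After the producer process has written only the new rows into the pipe,
-- load the external table's (new) data into the data mart:
EXECUTE FUNCTION ifx_loadPartMart('dwa1', 'sales_mart', 'sales_ext');
```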
With the external data actually loaded into the data mart, IWA has all the data available for query execution. This means that IWA is not restricted to queries that use only a single external table, as is the case for the Informix server. Instead, the whole data mart can be created from external tables only, and acceleratable queries are free to use all the external tables as if they were normal tables in the Informix server, joining them seamlessly. In fact, this makes IWA capable of universally accelerating analytic queries on structured data from almost any source, without the requirement to actually store the data on disk (in the Informix server).

Informix server and IWA used to accelerate queries on external data
Because Informix Server has no control over the data in an external table and how it may change, some restrictions apply when using external tables in a data mart:
- Uniqueness of the primary key values in a parent table (a prerequisite for 1:n references) cannot be guaranteed by the Informix server. The responsibility for this lies with the DBA, who can create a disabled index on external tables. This will then permit 1:n references in the data mart.
- The Informix optimizer needs to determine the fact table of queries, both for query probing and for query matching during acceleration. The optimizer might need the number of data rows in a table for this task (provided by the table statistics). To facilitate determining the fact table for external tables, the (approximate) number of rows must be specified when creating the external table, using the NUMROWS option. (Alternatively, the fact table can be specified explicitly for queries using optimizer directives.)
- There is no locking for external tables. Therefore data mart load operations must use the lock mode NONE if an external table is part of the data mart.
- If columns of data types DATE or MONEY of an external table are to be included in a data mart, then the format of these columns must be specified when creating the external table. This is necessary for the data mart load operation to correctly interpret the data from the external table. E.g. for a DATE column with values of the format 2013-06-30 in the external table, DBDATE 'Y4MD-' must be specified as a table option.
- Because the Informix server has no control over the data in an external table, it cannot collect any statistics on how it changes. Therefore, the features Automatic Data Mart Refresh and Trickle Feed are not supported for data marts containing data from external tables.
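Putting the NUMROWS and DBDATE points together, a CREATE EXTERNAL TABLE statement prepared for use in a data mart might look like the following sketch (the file path, column list, and numeric values are assumptions for illustration):

```sql
-- Hypothetical sketch: external table prepared for inclusion in a data mart.
CREATE EXTERNAL TABLE orders_ext
    ( order_num   INTEGER,
      order_date  DATE )
    USING (DATAFILES ("DISK:/work/orders.unl"),
           FORMAT "DELIMITED",
           NUMROWS 5000000,     -- approximate row count, helps fact table detection
           DBDATE "Y4MD-");     -- dates in the file look like 2013-06-30
```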
Reminder: The URL to the now updated manual IBM Informix Warehouse Accelerator Administration Guide, Version 12.10 is:
https://pic.dhe.ibm.com/infocenter/informix/v121/topic/com.ibm.acc.doc/acc.htm
Blog Content/Topics List
The configuration tips for a first-time setup apply to the Informix Server Version 11.70.FC7 and later and to Version 12.10.FC1. Based upon the background discussed in the last blog entry New Configuration Tips (1), we will now take a closer look at:
IWA Configuration on a SMP System
Probably the first consideration for configuring IWA on a SMP system is to determine whether IWA will use the system exclusively or share it with other application(s). For simplicity and better understanding, the following description assumes that IWA uses the SMP system exclusively. If IWA has to share the system with other applications, the allocation of resources will have to be adapted accordingly.
The major characteristic of an SMP machine is that all resources, CPU and memory, are accessible by all processes, enabling each IWA node to fully utilize the available resources. As described before, the architecture of IWA requires a minimum of two IWA nodes, also on a SMP system: a coordinator node and a worker node. Each of these has its specific tasks to perform, therefore this role separation is also useful on a SMP system. With the multithreading built into each IWA node, the single coordinator node as well as the single worker node are both capable of performing tasks in parallel, utilizing multiple CPUs and shared memory (SHM) among the threads of an IWA node. Using the values of the configuration parameters CORES_FOR_LOAD_THREADS_PERCENTAGE and CORES_FOR_SCAN_THREADS_PERCENTAGE and based on the number of existing CPU cores, each IWA node will determine how many parallel threads it will start for the particular tasks. With this, a single coordinator node and a single worker node will be sufficient for most IWA installations on a SMP system. The NUM_NODES configuration parameter should remain at the default value of 2. The coordinator node will share the resources of the SMP system with the single worker node. This applies to memory as well as CPU resources. Therefore, if the configuration parameters for CPU resources are specified, they need to be set accordingly.
Increasing the parameter NUM_NODES for IWA on a SMP system should be considered very carefully. On a SMP system with many CPU cores (e.g. 64 or more), it may turn out that the single IWA worker node starts so many parallel threads that the overhead of synchronizing them becomes a bottleneck. Only in such a case may IWA performance benefit from configuring multiple worker nodes by increasing the value of NUM_NODES, even on a SMP system. At the same time, the values of the parameters CORES_FOR_LOAD_THREADS_PERCENTAGE and CORES_FOR_SCAN_THREADS_PERCENTAGE should be lowered accordingly, as the parallel threads of multiple worker nodes will compete for the available CPU resources. As a result of the configuration change, each worker node should start fewer threads, decreasing the synchronization overhead between them. Parallelism will still be at a high level as the coordinator node will distribute the work among the multiple worker nodes. All this, however, comes at the high price of increased memory consumption. As each worker node has its own part of SHM, dimension table data will be duplicated into the SHM portion of each worker node. This may be acceptable for small dimension tables, but on a system with big dimension tables the available memory may not be sufficient. Adding to this increase of needed SHM is also the increase of temporary (private) memory needed for query acceleration. The threads of multiple worker nodes will have (at least portions of) dimension table data in their memory during query processing. Last but not least, changing NUM_NODES will require that existing data marts be dropped, re-created and re-loaded, so that the fact table data can be distributed correctly among the configured number of worker nodes. After having increased NUM_NODES, it may be a good idea to test the queries of a representative workload to verify that they still can be accelerated without failure due to memory limitation.
There is always exactly one coordinator node, and it can access all available resources on a SMP system.
For the configuration of IWA SHM, the parameter WORKER_SHM is the most important, as it mainly determines how big the data marts can be. Though the configured amount of memory is not allocated initially, SHM for data marts will be allocated as data marts get created and loaded with data. If the limit of WORKER_SHM is reached, no more data can be loaded. As IWA threads also use temporary (private) memory in addition to the SHM, for data load processing as well as query acceleration, WORKER_SHM must be configured to leave enough space for these additional memory requirements. On the SMP system the worker node also has to share resources with the coordinator node. The threads of the coordinator node also use private memory for the final processing of results. For generally small result sets, this may be almost negligible. But for large result sets, even when only selecting the first 10 data records, the amount of memory needed by the coordinator for result processing may be considerable. Hence, this also needs to be accounted for when determining the value for WORKER_SHM. To maximize memory and CPU utilization it may be worth considering different application scenarios.
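The parameters discussed in this entry (NUM_NODES, WORKER_SHM, CORES_FOR_LOAD_THREADS_PERCENTAGE, CORES_FOR_SCAN_THREADS_PERCENTAGE) are set in IWA's configuration file dwainst.conf. The following fragment is only an illustrative sketch for a dedicated SMP machine; the directory path, port, memory values, and units are assumptions that must be checked against the manual for your installation:

```ini
# Illustrative dwainst.conf fragment for a dedicated SMP system
# (all values are assumptions, not recommendations)
DWADIR=/opt/IBM/iwa/instance1
NUM_NODES=2                               # 1 coordinator + 1 worker (default)
START_PORT=21020
WORKER_SHM=150000                         # SHM for data marts; leave headroom
COORDINATOR_SHM=50000                     # SHM of the coordinator node
CORES_FOR_LOAD_THREADS_PERCENTAGE=100     # exclusive system: use all cores
CORES_FOR_SCAN_THREADS_PERCENTAGE=100
```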
By default, the configuration parameters for CPU utilization are set equally for both tasks, data loading and query acceleration. This should be optimal for an application scenario where loading data marts never coincides with ongoing query acceleration, as all available resources can be given to each task. E.g. if there is only a single data mart, there is no query acceleration while the data mart is loaded with data, as the data mart is not available during this time. It will go into state "Active" only after the load is complete. Hence data loading and query acceleration exclude each other and the tasks will not compete for resources. Thus the setting of WORKER_SHM does not need to account for the possibility of temporary memory being needed for both tasks at the same time.
In an application scenario where certain data marts are loaded while other data marts are in use for continued query acceleration, query acceleration performance may suffer noticeably during the phase of concurrent data loading. In such a specific case it may be desirable to give query acceleration preference over data mart loading tasks, and the configuration parameters CORES_FOR_SCAN_THREADS_PERCENTAGE and CORES_FOR_LOAD_THREADS_PERCENTAGE can be given different values. When specifying these parameters it should be noted that both parameter values are effective per IWA node. Data loading concurrent with query acceleration can also mean that more temporary (private) memory will be allocated, and WORKER_SHM may need to be lowered accordingly to accommodate this.
Another difference between application scenarios is whether query acceleration requests can arrive concurrently or are always strictly serial. Though IWA executes queries serially, there can be a certain overlap between the data scanning by a worker node and the final result processing (like ORDER BY or GROUP BY) by the coordinator node. Worker nodes can start processing a new query while the coordinator node is still busy with the final result processing of the previous query and relaying those results to the database server. If query acceleration requests arrive fast enough for this overlap to occur, worker threads as well as coordinator threads may need to allocate and use their temporary (private) memory at the same time. A lower value for WORKER_SHM may be necessary to satisfy this. In an extreme case, the temporary (private) memory of the worker node and that of the coordinator node can each be as big as the biggest data mart. To be on the safe side for successful query execution, WORKER_SHM would need to be set accordingly.
Blog Content/Topics List
The information provided applies to the new Informix Server and IWA Version
12.10.FC1.
The latest release of Informix Server and the Informix Warehouse
Accelerator (IWA), made available a few days ago, contains quite
useful new features that are described briefly in the list below.
( The new URL to the official Informix Warehouse Accelerator Version 12.10
manual is:
https://pic.dhe.ibm.com/infocenter/informix/v121/topic/com.ibm.acc.doc/acc.htm )
- Two new methods to refresh data in an existing data mart without
having to fully reload the complete data set into the data mart. Both
methods have the advantage that, compared to a full reload, less data
is moved and the data mart remains available for query acceleration
during the refresh operation.
- Automatic Partition Refresh (aka RefreshMart) checks in
the warehouse database for changed partitions. These are either
whole unfragmented tables or fragments of a fragmented table. Only
the data of partitions that have been changed will be loaded
again into the data mart. In addition, fragments detached from
tables in the warehouse database will have their corresponding
partition dropped from the data mart. Likewise, fragments newly
attached to tables in the warehouse database will have their
corresponding partition loaded into the data mart. This method
is most effective for warehouse databases containing fragmented
tables, where data is removed by detaching older fragments and
new data is added by attaching new fragments.
- Trickle Feed can be set up for a data mart to continuously
feed the data that gets inserted into the fact tables in small
portions (as a trickle) to the data mart. This is most useful
for warehouse databases where accelerated queries should utilize
the newest data, assuming that most of the data changes in
the warehouse database are inserts to fact tables. Other data
changes to the fact tables, i.e. updates and deletes, are not
propagated by trickle feed to the data mart. Optionally, the
trickle feed can be set up in a way to also perform partition
refresh of dimension tables in defined time intervals.
- Integration of TimeSeries data with IWA
Informix TimeSeries data normally is stored in the database in a
special format that maximizes space utilization and performance of
TimeSeries specific queries using the TimeSeries DataBlade
functions. To speed up general analytic queries that are not using
the TimeSeries specific syntax, but still should use the TimeSeries
data, IWA can now be utilized. The TimeSeries data can be loaded
into a data mart via the Virtual Table Interface.
- Enhanced support for SQL Elements
- Multiple queries combined with UNION or UNION ALL
into a single SELECT statement are now considered for acceleration.
Qualifying queries will be accelerated and their results will be
combined with results from the queries executed locally by the
Informix Server.
- When using the ANSI SQL Standard OLAP window functions,
now supported by the Informix Server, IWA can accelerate the underlying
SELECT, JOIN and PROJECT operations of the query. Before handing their
results to the user, Informix Server will perform the OLAP and ORDER BY
portions of the query.
- The new keywords NULLS FIRST or NULLS LAST as
options in the ORDER BY clause are now supported by IWA. Previously
NULL was always considered the lowest value and appeared in the sorting
order accordingly.
- Additional query syntax that can now be accelerated by IWA:
- Aggregate functions inside a CASE construct
- Multiple DISTINCTs
- COUNT of DISTINCT with CASE construct
- Integration with the IBM OpenAdmin Tool (OAT)
IWA and its data marts can now be administered using OAT Version
3.11. Tasks supported by OAT include creating an accelerator,
monitoring it, as well as creating, loading, monitoring and dropping
data marts. You can get OAT Version 3.11 from
http://www.openadmintool.org
.
- New SQL routines for IWA and data mart administration
IWA includes new procedures and functions that can be called
from any SQL client application. The names of these new routines
start with the prefix "ifx_", e.g. "ifx_createMart()". Usage of
these routines is simplified, as no handling of CLOB data is
needed.
- Privilege control for data mart administration
From now on, users need to have the required privileges to be able
to perform IWA and data mart administrative tasks. By default
user "informix" has these privileges. Other users need to have
the DBA role or must be granted the WAREHOUSE privilege explicitly.
The latter is done by running the "grant admin" command with the
argument "WAREHOUSE" via the SQL administration API.
- New utility "ondwachk"
This new utility script is used to check that the setup and
configuration of the Informix Server instance is valid for
accelerating queries using IWA. As such, the utility is part of the
Informix Server installation, located in $INFORMIXDIR/bin , and run
from within the Informix Server instance's environment. It will
quickly check a list of prerequisites for a successful connection
to IWA, like the matching of IWA and Informix Server versions, the
existence and configuration of a SmartBLOB space, whether an
appropriate listener thread is running, and whether a DWA VP is
started or can be started automatically, etc.
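As a sketch of how the new SQL routines and the WAREHOUSE privilege fit together (the user name, accelerator name, and data mart name are assumptions; the task() call follows the usual SQL administration API pattern in the sysadmin database):

```sql
-- Hypothetical sketch: grant the WAREHOUSE privilege to user "joe" via the
-- SQL administration API (connected to the sysadmin database as informix):
EXECUTE FUNCTION task('grant admin', 'joe', 'WAREHOUSE');

-- Afterwards "joe" can call the new "ifx_" routines from any SQL client, e.g.:
EXECUTE FUNCTION ifx_createMart('dwa1', 'salesmart');
```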
Blog Content/Topics List
Guy Lohman, Manager, Disruptive Information Management Architectures at
IBM's Almaden Research Center in California, explains in this new video
the Blink technology, developed by his group, and how it works at the
core of the Informix Warehouse Accelerator. In the 12 minutes of the
video, Guy Lohman also illustrates how the Informix Warehouse
Accelerator is perfectly suited for existing Informix Server
environments, complementing them in a unique way with this radically
progressive technology.
Watch the video on youtube:
http://www.youtube.com/watch?v=7Wy3uadexe8
Blog Content/Topics List
Read the independent Ovum Market Research report on the Informix
Warehouse Accelerator completed in July by analyst Madan Sheina:
"Informix Accelerates Analytic Integration into OLTP".
The 4-page document will be available as a PDF file until September 3, 2013: download.
Blog Content/Topics List
A good how-to description of all the steps to be performed is already provided as the topic Configuring Informix Warehouse Accelerator on secondary servers in a high-availability environment in the IBM Informix Warehouse Accelerator Administration Guide at the IBM Informix 11.70 Information Center.
Rather than repeating the content of the manual, I'll try here to give some background information.
- Why use a secondary server with the Informix Warehouse Accelerator?
- For query acceleration:
Within a high-availability (HA) environment of Informix servers, there is a primary server that is best suited for OLTP workload, where data is frequently changed, inserted or deleted. The primary server has direct access to the disks in order to perform all required data changes directly. A secondary server in such a HA environment can also perform changes to data (update, insert, delete), but in order to do so, it always needs to route the change requests to the primary server to get them executed and then replicated to all the secondary servers of the HA environment.
Therefore, a secondary server is more suited to performing queries that change only very little or no data at all. This is exactly the case for typical data warehouse queries comprising an OLAP workload. It is therefore quite desirable to let a secondary server perform such OLAP workloads and still benefit from acceleration by the Informix Warehouse Accelerator. With Release 11.70.FC5 this is now possible.
It could be argued that the primary server would not be strained by accelerated queries either, as they get executed by the accelerator rather than the Informix server. However, an OLAP workload, e.g. for a specific report, can contain queries dealing with temporary tables that are created and used on the fly. Such queries cannot be accelerated, but with the according configuration they can be executed on a secondary server without burdening the primary server. In such circumstances, using the secondary server is still beneficial to the overall workload distribution in the HA environment.
- For data mart administration:
Furthermore, loading data into a data mart in the accelerator is a demanding task for the database server, as all the data must be read from disk and sent over to the accelerator. Nevertheless, no changes to the user data need to occur in the database server. Only tiny amounts of metadata need to be created when creating a data mart, and even less such metadata needs to be changed after a data mart was loaded successfully, to reflect the new state "Active" for the data mart. Therefore, data mart administration, including the loading of a data mart with data, is also very well suited for a secondary server.
- Necessary configuration
- Updates on the secondary server:
In order to be able to perform the required changes of the small amounts of metadata, the secondary server still needs to be an "updatable secondary" server. Hence, the secondary server must be configured accordingly with the parameter UPDATABLE_SECONDARY in its onconfig file. Strictly speaking, this is not necessary for pure query acceleration. Nevertheless, it is recommended that every secondary server connecting to the accelerator be an "updatable secondary" server, to avoid errors and confusion when trying to perform some administrative action for the accelerator, e.g. after an HA failover has happened.
- Virtual processor for accelerator administration:
For the same reason, the secondary server must be configured with the VPCLASS parameter to run the required virtual processor (the "dwavp") handling administrative commands, and it must have access to the SmartBLOB space required as a staging area for user input and output of such administrative commands. If configuration parameters were changed or added, then the secondary server needs to be restarted to make them effective. With that, the secondary server basically is ready.
(For examples and detailed how-to on the configuration tasks please see the manual.)
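For illustration, the two onconfig settings on the secondary server might look like the following sketch (the numeric values are assumptions; see the manual for the recommended settings for your environment):

```ini
# Illustrative onconfig fragment for the secondary server
# (values are assumptions, not recommendations)
UPDATABLE_SECONDARY 2       # make the secondary an updatable secondary
VPCLASS dwavp,num=1         # virtual processor for accelerator administration
```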
- Connecting the secondary server to the accelerator
The only thing missing now is an actual connection to the accelerator. The necessary connection information is stored as a group entry in the sqlhosts file of the respective Informix server instance, where the accelerator name is represented as the group name.
Example for an accelerator named "dwa1":
dwa1 group - - c=1,a=524e61686a285e6368325d767...
dwa1_1 dwsoctcp 127.0.0.1 21022 g=dwa1
Here the long string following the option "a=" is the encoded authentication token that the database server uses when sending any requests to the accelerator. If all database servers of the HA environment to be connected to the accelerator share the same sqlhosts file (e.g. via a shared disk or file system), then it is sufficient to have established a connection to the accelerator once, from one of the database servers. All other database servers in the HA environment will then see the automatically created entry in the sqlhosts file and be able to utilize it. Thus even when the primary server has already established a connection to the accelerator, this will be sufficient for the secondary servers, as they will see the corresponding entry in the shared sqlhosts file. However, if the sqlhosts file is not shared, then the connection still should be established only once (if not already done), and the resulting sqlhosts entry of that database server should then be copied manually to the sqlhosts files of the other servers in the HA environment. Two cases are possible that need slightly different handling:
- The primary server of the HA environment already has an established connection to the accelerator: The connection information for this connection should also be used by the secondary server(s) intended to connect to the accelerator. As the sqlhosts file is not shared, the respective group entry needs to be copied manually to all secondary servers to be connected to the accelerator.
- None of the servers in the HA environment has a connection to the accelerator yet: A new connection from one of the chosen servers must be established to the accelerator as described in the manual. Once this is done, the created sqlhosts entry needs to be copied manually to the sqlhosts files of the other servers.
What happens if a new connection is established even though another server already has an established connection? If one database server of the HA environment already has an established connection to the accelerator, but a new connection is nevertheless established from a different database server (with a different sqlhosts file), then the existing connection of the first database server will be invalidated. This is because upon the new connection request, the accelerator server will generate a new authentication token for the same accelerator name. This will be stored correctly in the sqlhosts file of the database server requesting the new connection. But without sharing the sqlhosts file, the first database server will have no knowledge of the new authentication token. Its sqlhosts file will still have the old authentication token that has just been invalidated. When the first server attempts to use this old authentication token, the accelerator will reject any request. Therefore, when not sharing the same sqlhosts file among the database servers of a HA environment, some extra care may be needed regarding the handling of the accelerator's connection info and distributing it among the database servers involved. The same is true when merely renewing the authentication token, as some environments will require doing this regularly.
Once all chosen database servers of the HA environment have access to the accelerator, it is probably good practice to designate one of them as the "master" for all administrative actions regarding data marts in the accelerator, and to use the other connected database servers only for query acceleration. This will prevent confusion that could easily ensue when data marts are manipulated concurrently from different database servers, e.g. when a data mart is being loaded from one database server while at the same time a status change is attempted for the same data mart from a different database server.
Blog Content/Topics List