With the Internet of Things (IoT) becoming more and more prevalent in areas as different as science, business, services and even our personal lifes, new methods are needed to efficiently use all the data that these 'things' generate. A good part of this data is generated at regular intervals, thus building a time series spanning a larger period of time. Examples are data from meteorological observations, stock trading, clinical patient monitoring, smart meter readings, or even our GPS tracks from last weekends mountain hike. Examples for typical questions on time series data are "what was the electricity consumption at a specific smart meter over a certain period?" or "what was the value of a specific stock at a certain time?" Enabled by the special data storage for time series data, the IBM Informix TimeSeries technology provides very efficient functions to answer such questions. However, warehouse style analytic questions typically involve processing the time series as a whole, for example "what was the yearly average value of all traded stocks combined during the last 50 years?" Such queries are of a different class and still pose a formidable challenge to the Informix TimeSeries technology. This is where the IBM Informix Warehouse Accelerator (IWA) can improve performance significantly.
The combination of in-memory and columnar data storage for data marts lets IWA process time series data very efficiently. The algorithms in IWA are optimized for highly parallel processing of vectors in a multi-core environment, enabling IWA to perform aggregations very fast. And with frequency partitioning used for data compression in IWA, time series data in a data mart usually do not require more space than in the Informix TimeSeries data type in the database server. All this is optimal for analytic queries on time series data. However, to load time series data into an IWA data mart, the data must be accessible like normal data in relational tables. The Informix TimeSeries specific storage and functions for working with the data do not allow this in a straight forward way. Looking at a time series example in Informix's demo database demonstrates how different the Informix TimeSeries data type is and why it cannot be used as is for a data mart in IWA.
The following example from the Informix demo database shows the definition of an Informix TimeSeries data type and how it is used for a table:
create row type meter_data ( tstamp datetime year to fraction(5), value decimal(14,3) ); create table ts_data ( loc_esi_id char(20) not null , measure_unit varchar(10) not null , direction char(1) not null , multiplier timeseries(meter_data), raw_reads timeseries(meter_data), primary key (loc_esi_id,measure_unit,direction) );
Running a normal SQL SELECT statement on the table defined above shows, that the data is not stored in a normal, relational format. Even though the above table in the demo database contains thousands of meter readings, the SELECT returns only 28 data rows. Each data row holds many meter readings in special containers. The output shown below is shortened by '...' for readability:
loc_esi_id 4727354321000111 measure_unit KWH direction P raw_reads origin(2010-11-10 00:00:00.00000), calendar(cal15min), container(raw_container), threshold(0), regular, [(0.092), (0.084), (0.090), (0.085), (0.088), (0.088), (0.085), (0.091), (0.083), (0.094), ... , (1.412)] loc_esi_id 4727354321046021 measure_unit KWH direction P raw_reads origin(2010-11-10 00:00:00.00000), calendar(cal15min), container(raw_container), threshold(0), regular, [(0.041), (0.041), (0.040), (0.041), (0.041), (0.041), (0.055), (0.073), (0.071), (0.068), ... , (0.023)] ... 28 row(s) retrieved.
In order to load the data into an IWA data mart, the data must be available in a normal, relational format, so that it can be requested by IWA with a normal SQL SELECT statement. Fortunately, Informix can accomplish this with the Virtual Table Interface (VTI), which is now also supported for this purpose by IWA. The advantage of using the VTI is, that the data does not need to be stored in the relational format. This would be a waste of storage space. Instead, as the name suggests, the VTI makes the data available in relational format, but virtually without storing it, neither in memory nor on disk. The demo database already has such a virtual table predefined. It was created as follows:
EXECUTE PROCEDURE TSContainerCreate('raw_container', 'rootdbs', 'meter_data', 100, 50); EXECUTE PROCEDURE TSCreateVirtualTab('ts_data_v', 'ts_data', 'origin(2010-11-10 00:00:00.00000),calendar(cal15min), container(raw_container),threshold(0),regular', 0, 'raw_reads');
Using this virtual table in the SQL SELECT statement, the time series data is returned as if it would be stored in normal tables, one data row for each meter reading value:
loc_esi_id 4727354321000111 measure_unit KWH direction P tstamp 2010-11-10 00:00:00.00000 value 0.092 loc_esi_id 4727354321000111 measure_unit KWH direction P tstamp 2010-11-10 00:15:00.00000 value 0.084 ... loc_esi_id 4727354321046021 measure_unit KWH direction P tstamp 2010-11-10 00:00:00.00000 value 0.041 loc_esi_id 4727354321046021 measure_unit KWH direction P tstamp 2010-11-10 00:15:00.00000 value 0.041 ... 241920 row(s) retrieved.
In this format the data can be loaded into an IWA data mart. Since Informix version 12.10 IWA now supports the use of VTI tables directly in a data mart definition. Once such a data mart has been created and loaded with the data, SQL queries selecting the time series data from the VTI table can now be accelerated by IWA.
Once the data has been loaded into the IWA data mart, queries that use the VTI table can be accelerated by IWA. Using a simple query that just gets the number of meter readings in a TimeSeries, we can see the effect of IWA acceleration. In the example, the TimeSeries contains meter readings taken every 15 minutes from 100.000 smart meters over a period of 90 days. Therefore the TimeSeries table contains 100,000 data rows, whereas the VTI table presents almost 900,000,000 million data rows.
Querying the VTI table without acceleration: SELECT COUNT(*) FROM ts_data_v WHERE value >= 0.001; Run time: 1h 12min 52sec Querying the TimeSeries table, using TimeSeries native function countif(): SELECT SUM(countif(raw_reads, 'value >= 0.001')) FROM ts_data; Run time: 15min 40.068sec Querying the VTI table with acceleration enabled: SELECT COUNT(*) FROM ts_data_v WHERE value >= 0.001; Run time: 1.327sec
For this example, IWA is configured on eight nodes of a hardware cluster. This enables IWA to use high parallelism when running the query, resulting in a response time of less than two seconds.