It's been a while since I posted my last entry. I've been meeting with customers quite a bit over the last few months, here in the US, in Europe and in China. I even managed to get caught in the Icelandic cloud along the way. I feel these last 6 months have been some of the most exciting times for Informix. There is huge interest in our next release, which is code named Panther. On top of that we have a new executive VP of business development - Rob Thomas, a new enablement team led by Dilip Kikla, and our Informix editions have been completely revamped. In addition to all this, a brand new market has opened up for Informix - the smart meters market in energy and utilities.
Earlier this year we had several prototypes in the works with various customers in the E&U field. Since these engagements were spread around the world - the UK, Iceland, and India - no one noticed at first that a trend was building. What was happening is that we were winning each of the opportunities and quite easily at that. The real break came when an article was published on our win in the UK with Hildebrand. With that it became quite apparent to many people that we were a great fit for smart meters.
Based on the information in the article, several different E&U teams began working with us to understand why Informix was such a good fit for smart metering. The bottom line was that the Timeseries datablade, which no other RDBMS has, was the key. Smart metering is a classic example of the sort of thing that the Timeseries blade was built for - collect massive amounts of time stamped data very quickly and simultaneously run reports and do analysis on that data. This is what smart meters do - they collect data about energy consumption at a residence or business and then periodically send the information to be stored in a database for billing and data mining purposes. Applications that access the data do so in timestamp order. Operations can be as simple as pulling data for a particular meter, or could be finding the average daily usage for a particular zipcode, or even correlating energy usage with weather data.
This was exactly what Oncor, a smart meters provider in Texas, needed. They came to us looking for a way to handle 3.5 million meters and store data for 25 months. Their current solution, based on Oracle, was handling about 1 million meters and was taking too long to ingest the data and run reports. A proof of concept (POC) was set up, and with Informix plus the Timeseries blade load times went down from multiple hours to about 18 minutes for a day's worth of data for all meters. Also, reports that were taking hours to run would complete in 6 minutes with Informix, and in seconds if the data was already cached. Not only that, but with the intrinsic disk space savings you get with the Timeseries blade, disk space went from 1.3 TB down to about 350 GB for 90 days' worth of data for 1 million meters. With these kinds of results Oncor became a strong champion of Informix and has been reaching out to its customers to encourage them to have a look at Informix.
In addition to customers in the US other smart meter providers around the world have heard of these results and have begun contacting us. This has led to additional meetings and POCs with customers in the US, the UK, Holland, Denmark, and Germany, and I'm sure many more will follow.
Although this blog concentrates on smart meters there are many other applications that work with time series data within E&U. I'm sure as we get more established in the smart meter business there will be additional opportunities in other areas of E&U.
I wanted everyone to be aware that a continuous availability white paper is being written and you can have your voice heard by going to http://www.advancedatatools.com/Informix/Survey.html and filling out the survey. I would urge all of you to please take the survey, and blog about it if possible. Here is more information about what is being done:
This October 24-28 IBM will be hosting its Information on Demand Conference in Las Vegas and I urge you to come and attend. If you have not been to one of these events, it's a great opportunity to meet the Informix senior developers as well as the executive staff. In addition there are many hands-on labs and sessions to attend. In fact we will also be covering the content of our Panther release at this conference, so if you have not been part of the Panther EVP this will be a great opportunity to get a deep dive into the newest features and functions. If you are interested in going you need to hurry to take advantage of the early bird special ($500 off the normal fee) which expires on Aug 31. Here is the registration link: http://www-01.ibm.com/software/data/2010-conference/registration.html
The other thing that we will be doing in Las Vegas is conducting our customer advisory council meeting. The CAC meeting is used to discuss a range of topics such as what our future direction should be, customer solutions, support, documentation, etc. It is also a good opportunity to meet with IBM execs. In the past we have had meetings with Ambuj Goyal, Arvind Krishna, and Alyse Passarelli. If you are interested in becoming part of our customer advisory council please let me or anyone else on our team know. We're always happy to have more passionate Informix advocates in the CAC - it helps us make Informix a better product for everyone and helps you communicate directly with the Informix decision makers.
Following on the theme of communicating with IBM executives, I wanted to mention that the Informix team had a chance to spend a few days with Martin Wildberger, VP of Information Management Development, last week in Lenexa, Kansas. Martin took over from Arvind Krishna last year around October. It was great to find that Martin has the same dedication and devotion to the Informix business as Arvind. If you come to the IOD conference you should seek out Martin (or any of the other IBM execs) and introduce yourself.
I'm adding this entry at 35,000 feet. I'm on the way back from a customer meeting and the first thing I saw as I entered the plane was a large WiFi sticker on the side of the plane. It turns out that Delta now has 1800+ planes flying with WiFi enabled. I paid $12.95 and now I'm able to do anything on my laptop that I could have done back home (although I must admit I'm quite a bit more cramped). This may not be that new, but it's the first time I've had a chance to do this and I think it is pretty cool.
The other thing I think is pretty cool that I wanted to share is Informix support for time series data. Informix is the only relational database that has access methods and functions designed specifically for time series data. In case you were wondering, a time series is a set of related data that changes over time. For instance the stock trades for IBM, or the smart meter energy usage, over time. In a way you can think of a time series as an array, only instead of asking for the 1st, 2nd, and 3rd records in the array you can ask for the Jan 1st, Jan 2nd, and Jan 3rd items in the array - in other words an array accessed by time. Typical access to time series data is by time range, meaning queries tend to look at all the data in a time range for one series before moving to another time series.
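To make the "array accessed by time" idea concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not how Informix implements it; the `TimeSeries` class and its `add` and `between` methods are names I made up for this example.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class TimeSeries:
    """Hypothetical in-memory series: values kept sorted by timestamp."""

    def __init__(self):
        self._times = []   # sorted timestamps
        self._values = []  # values, parallel to _times

    def add(self, ts, value):
        # Insert in timestamp order so range lookups can use binary search.
        pos = bisect_right(self._times, ts)
        self._times.insert(pos, ts)
        self._values.insert(pos, value)

    def between(self, start, end):
        # Return all (timestamp, value) pairs with start <= timestamp <= end.
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


# Made-up sample prices, added out of order on purpose.
ibm = TimeSeries()
ibm.add(datetime(2010, 1, 3), 132.0)
ibm.add(datetime(2010, 1, 1), 130.5)
ibm.add(datetime(2010, 1, 4), 129.8)
ibm.add(datetime(2010, 1, 2), 131.2)

# "Give me Jan 1st through Jan 3rd" instead of "give me elements 0 through 2".
for ts, price in ibm.between(datetime(2010, 1, 1), datetime(2010, 1, 3)):
    print(ts.date(), price)
```

Because the entries are kept in time order, a range request touches one contiguous slice of the series, which is the access pattern described above.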
Informix has taken advantage of this and built storage which is optimized for this kind of access. We cluster data for a particular time series to minimize the number of I/Os needed to retrieve data for that series. This means that if you want all the IBM stock trades for Jan 1st, 2nd and 3rd, we ensure that those pieces of data are clustered together on the same physical disk page.
The other thing we do is a sort of compression to ensure the data is as small as possible. One of the things we found is that quite often time series data can be sparsely populated. Because of this we only create pages for time series where there is actually data. For instance, if you had stock data for IBM for 2007 and 2009 but not 2008 we will not reserve any space for 2008. Later when that data becomes available we will add it into the series.
Another way we save space is that we do not store any NULL value columns. If you define a time series to hold 6 columns of data and you enter a record that has 2 columns that contain NULL, those NULLs will not take any space. This is because we add a header to every record that indicates which columns (if any) are NULL. Relational databases typically use a value to indicate NULL, which means NULLs take the same amount of space as non-NULL values. Since NULL columns tend to be common in time series this can lead to a lot of space savings.
We also save space by not storing the timestamps if we don't have to. We can do this because it is very simple to calculate the timestamp of intervalized data. For example if you are storing smart meter data that comes in every 15 minutes all you need to know is the timestamp of the first entry in the time series. After that it is very easy to do the datetime math to calculate the timestamp for every other entry in the series.
To finish up with space savings, there is also space saved by not requiring an "id" attached to each record. For example, if you had a time series of stock data and used a standard relational schema you would have to add some sort of stock id to each record so that you can determine what stock the data belonged to. With our time series approach this is not required. We would store the "id" once and then every record would be associated with that entry and not need to have an "id" attached.
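The header idea can be sketched in a few lines of Python. To be clear, this is a made-up record layout for illustration (one bitmap byte followed by 8-byte floats), not the actual Informix on-disk format, and `encode_record` and `decode_record` are hypothetical helper names.

```python
import struct


def encode_record(values):
    """Pack up to 8 float columns; None stands for NULL and takes no payload space."""
    if len(values) > 8:
        raise ValueError("this sketch supports at most 8 columns")
    bitmap = 0
    payload = b""
    for pos, value in enumerate(values):
        if value is None:
            bitmap |= 1 << pos          # mark column as NULL in the header
        else:
            payload += struct.pack("<d", value)  # 8 bytes per non-NULL column
    return bytes([bitmap]) + payload


def decode_record(data, ncols):
    """Rebuild the column list from the header bitmap and the packed payload."""
    bitmap = data[0]
    values = []
    offset = 1
    for pos in range(ncols):
        if bitmap & (1 << pos):
            values.append(None)
        else:
            values.append(struct.unpack_from("<d", data, offset)[0])
            offset += 8
    return values


# 6 columns, 2 of them NULL, as in the example above.
record = [1.5, None, 3.0, None, 2.25, 4.0]
encoded = encode_record(record)
print(len(encoded), "bytes, versus", 6 * 8, "bytes if every column were stored")
print(decode_record(encoded, len(record)))
```

In this toy layout the two NULL columns cost nothing beyond their bits in the one-byte header.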
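The datetime math is simple enough to show. This is a small Python sketch assuming a fixed 15 minute interval; the function names `timestamp_at` and `index_of` and the sample readings are my own, chosen just for the example.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)  # assumed meter reporting interval


def timestamp_at(origin, index, interval=INTERVAL):
    """Timestamp of the entry at position `index`, counting from the origin."""
    return origin + index * interval


def index_of(origin, ts, interval=INTERVAL):
    """Position of the entry for timestamp `ts`; ts must fall on an interval boundary."""
    offset = ts - origin
    if offset < timedelta(0) or offset % interval != timedelta(0):
        raise ValueError("timestamp is not on an interval boundary")
    return offset // interval


# Only the origin is stored; the readings themselves carry no timestamps.
origin = datetime(2010, 1, 1, 0, 0)
readings = [1.2, 0.9, 1.1, 1.4]  # made-up kWh values

for i, kwh in enumerate(readings):
    print(timestamp_at(origin, i), kwh)
```

Going the other way, `index_of` turns a timestamp into a position, so a time range lookup becomes a simple slice of the stored values.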
We have just completed a POC for a smart meter company and one of the things that drove them to look at Informix and time series was this space savings. In their case they have 3.5 million smart meters each generating a record every 15 minutes, and they want to save 25 months' worth of data. Doing the math you get:
3,500,000 * 96 intervals/day * 760 days = 255,360,000,000 (about 255 billion records!)
Right off the bat we are going to save the size of an id, which is 8 bytes, and the size of a timestamp, which is 11 bytes. That is 19 bytes times 255 billion records, which is quite a savings. This is pretty much what we saw at the POC - we used about 1/3 the space that Oracle did.
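For anyone who wants to check the arithmetic, here it is as a few lines of Python. The constants come straight from the numbers above; the result is only the raw id and timestamp bytes avoided, and says nothing about indexes or other overhead.

```python
METERS = 3_500_000
INTERVALS_PER_DAY = 24 * 60 // 15   # one reading every 15 minutes = 96 per day
DAYS = 760                          # roughly 25 months
ID_BYTES = 8
TIMESTAMP_BYTES = 11

records = METERS * INTERVALS_PER_DAY * DAYS
saved_bytes = records * (ID_BYTES + TIMESTAMP_BYTES)

print(f"{records:,} records")                 # 255,360,000,000 records
print(f"{saved_bytes / 1e12:.2f} TB saved")   # 4.85 TB saved (decimal terabytes)
```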
The other savings will be in the number of rows that have to be managed. In the case of time series there is one huge row per meter - so 3.5 million rows vs 255 billion rows. Index maintenance, statistics maintenance, and storage management all become quite a problem when there are 255 billion rows to manage.
All in all I think this technology is pretty cool and something which we should use more often. In another entry I will go into some detail about the query side of time series which is equally a good story. I'll also talk about how the time series fits with BI queries.