There was a recent InformationWeek webcast sponsored by IBM on the topic of
Combining OLTP and Extreme Data Warehouse Performance in which Lester Knutsen of ADT and I discussed the Informix Warehouse Accelerator as part of the release for the Informix Ultimate Warehouse Edition. The webcast was attended by over 350 people and comments were overall quite positive. In this blog entry, I will attempt to answer the questions that were submitted. I consolidated some of the duplicate questions. Questions are answered in no particular order.
For a replay of the webcast, you can go to the following link:
For more information about the product, e.g. white paper and other collateral, you can use either of the following links:
Note that in the upcoming International Users’ Group conference held in Overland Park, Kansas, there will be additional dedicated sessions on this topic. For more information on IIUG Conference for 2011, please refer to the link:
Q: What is MOLAP/ROLAP/HOLAP?
A: MOLAP stands for Multi-dimensional OLAP, ROLAP stands for Relational OLAP, and HOLAP stands for Hybrid-OLAP.
MOLAP is the most common of the three in building multi-dimensional “cubes” which is essentially summary tables at the BI tools layer. Cognos uses a product called PowerPlay for its MOLAP product, and Hyperion has a similar thing. Once these cubes are built, queries can be answered very quickly if the cubes have the same dimensions used in the query. It can take considerable time to build the cubes, usually measured in hours. ROLAP essentially drills through to the backend relational database for the queries. And HOLAP, which is offered by products such as Microsoft SQL Server, does either or both, depending on the query. There are lots of materials written up on how this is done. With Informix IWA, we eliminate the need to build these cubes.
Q: Can IWA be also used to accelerate OLTP queries, and not only OLAP ones?
A: IWA's sweet spot, as mentioned in the presentation, is for OLAP queries against data warehouses where there is at least some form of Star Schema set up. That is, there needs to be a Fact Table with surrounding dimension tables. For OLTP queries, chances are that the transactions are short and would not need the type of acceleration provided by IWA, where it is scanning potentially millions and billions of records. However, if the OLTP schema can be defined in such a way that a data mart can be offloaded to IWA, then the queries will be accelerated. In some situations, e.g. operational data store, both OLTP and warehouse workloads share the same schema. In that case, IWA can be used to accelerate the OLAP queries while co-existing with OLTP. This is the basis for the mixed-workload I was referring to during the webcast.
Q: Is there any formula to determine how much memory you need for in memory database?
A: The amount of memory needed for IWA is dependent on the amount of compression that can be done on the data. Since we use columnar technology, data can be compressed quite well, as data in columns are usually quite similar. We use a rule of thumb of 3:1, i.e. 300 GB of table data means about 100 GB of main memory. There is no need to account for index or temp table space as IWA does not need them. Allow for additional memory if the same system is being used for IDS and other processing on the same system.
Lester's testing showed that he went from 110GB of raw data to 24GB in memory, which is much better than a ratio of 3:1.
Q: Can get special build to get along with IDS11.5 for iWA?
A: IWA is part of Informix Ultimate Warehouse edition and needs to run on 11.70xC2.
Q: If query using temp table, does IWA works in the same way (fast)?
A: Yes, for BI tools can use temp tables, queries that were run to retrieve the result onto the temp tables can be accelerated. The temp table processing itself is done on the IDS side, not in IWA.
Q: What is the largest size database you can have with the in memory db appliance? How do you handle high availability?
A: For the 1st release of IWA, the configuration in a single IBM x-Series machine, it has a maximum main memory capacity of 1.5 TB. With a 3:1 compression ratio, it would mean that the current maximum limit is about 4.5 TB of raw data. X-series is releasing a new system this year that will double that capacity. Note also that IWA can run on any Linux/Intel box, not necessarily from IBM. In an upcoming release, we will support running IWA on a Blade Server, thus allowing maximum scalability to multiple nodes (or blades).
In terms of high availability, the set of disks on IWA stores an in-memory image of what is in-memory for recovery purposes. This set of disk (much smaller than those that are stored on the Informix side) is only used to provide High Availability, not the actual database itself.
Q: How does the performance compare to Oracle Exadata appliances?
A: We have not done direct comparison to Exadata, but one of our large customer prospect is running a Proof of Concept (PoC) comparing the two. We will have much better data points once that are completed. We do believe that our numbers will be very competitive, and also at a much lower cost as compared to Exadata.
Q: IWA is based on MPP architecture; what is added at hardware level to make it MPP?
A: Actually IWA is not based on an MPP architecture. It is based on SMP. In fact, the recommended configuration for the 1st release is a single Intel box (with multiple cpus/cores). However, the original IWA design allows it to be deployed on a Blade Server with multiple nodes. We intend to support that configuration in an upcoming release. As I mentioned in the talk, we don't need extra hardware to do I/O for us, just the Intel box for which IWA resides. The fact that we use columnar technology and SIMD parallelism to scatter our "cells" to the different cores within the system provides us with a very distributed platform similar to what a MPP system would do. In a way, you can say that we combine the best of SMP and MPP onto a single system.
Q: Does IWA support locales other than default?
A: It is correct that as the 1st release, only the default locale is supported as indicated in our manual.
Only the en_us.8859-1 locale is supported.
We recognize this limitation and are working to add more locales for the upcoming fixpack. If there is interest for you company or an end customer, testing using the default locale will demonstrate the performance gain which can lead to successful implementation once we support the additional locales.
Q: Can you comment on costs? Of course it can vary a lot depending on what one want to do?, However can you comment on costs?
A: Cost is highly dependent on the amount of data that you plan to have in memory. As I mentioned during the talk, the rule of thumb is 3:1 so 300GB of raw data in your warehouse would require about 100GB of main memory. There is no need to account for index or temp space since IWA does not require them. The current eX5 system from IBM can contain up to 1.5TB of main memory at a very affordable price.
Q: What vendor/products is IBM positioning this Suite of products (IDS & IWA) against?
Q: How is IBM planning on Marketing/Selling this Suite of products (IDS & IWA) to the General public, the Non-Informix Community?
A: Actually, the two questions are sufficiently related that I think I can answer them together.
IWA is primarily targeted towards the Informix base. Obviously, the solution is generally applicable to non-Informix customers, but we believe there is sufficient need and interest within the customer base to generate sufficient demand. Our target is data warehouses that are less than 5 TB. As mentioned in the webcast, most analytics are done on warehouses of much smaller sizes regardless of how large the enterprise warehouse may be.
Q: Is there a problem\conflict if using CISCO\Riverbed (or any other) compression\acceleration with IWA
A: No, there is no conflict between Wide Area Network compression/acceleration with IWA. Once data comes over from the network to the IDS database, it is in uncompressed format. You may choose to apply IDS compression to store the data on IDS, but data in IWA uses completely different compression techniques and is not affected by other forms of compression.
Q: Is this solution is comparable to Oracle's Exadata?
A: I believe I addressed this somewhat during the call. The IWA solution from Informix is very much comparable to Oracle's Exadata approach, in that it tries to provide a substantial increase in performance for warehousing workloads. The architecture and approach is quite different because instead of utilizing another layer of hardware to perform I/O from the database, IWA eliminates I/O by using pure in-memory technologies, extreme compression and predicate evaluation with compressed data. This is how we can achieve an order of magnitude speedup at a fraction of the cost/admin required by standard relational systems.
Q: The data mart will still exist in the database and then loaded into memory?
A: Yes, database/datawarehouse resides on IDS, the data to be accelerated is loaded by IWA (by IWA). Then all subsequent queries that can be accelerated will be routed automatically to IWA and the answers will be returned to IDS. Applications that access the IDS database, e.g. for OLTP, will continue unchanged. Only the affected data mart (with a defined Fact Table) will be re-routed to IWA.
Q: Is the data loaded into IWA or meta-data gets loaded into IWA? If data is being loaded then is more storage needed?
A: Both IDS and IWA contain metadata for the "data mart" that's defined to be part of IWA. The data warehouse is stored on the IDS side, but not on IWA. It is only loaded into memory. So there is no requirement for additional storage on IWA. The disks shown on IWA are only for recovery purposes in case there is a failure to the IWA box. It is a memory image of what is in IWA.
Q: Putting entirely in memory? And what happens when there's a machine failure? No threads that flush to disk and persistence? Not smart.
Q: You said it was an 'all in memory only' solution, if true, what do you handle system failure, and disaster recovery planning.
A: These first two questions are referring to the same thing. The disks attached to IWA (not the ones on the IDS side that contains the database) are used for machine failure. It contains a memory image of what was loaded in memory. The amount of disk needed on IWA is limited to size of main memory configured, usually much smaller than the data base itself.
Q: So if I understand the diagram... The Warehouse accelerator is a separate box. All in memory solution. The timing listed in Fred's slide show time for queries. But how/when is the data loaded in to the box and what's the time to build the 'In Memory' warehouse that the query is run against?
A: The Beta customer (large shoe retailer) loaded 1 billion records of the Fact Table and two dimension tables (one with 3 million and another with about 330K rows) in about 30 minutes. The customer's comment was that since it used to take him 45 minutes to run a report, he gets his ROI within 2 queries. There are configuration parameters that will help with optimizing the load time.
Q: 512GB of memory? And what's the cost? What's reasonable? I thought the solution was all 'in memory' so why the 6 300GB 'fast' disk?
Q: Again, what's reasonable with respect to price.
A: Answer to both questions - for the recommended configuration using a System X server, the cost of the server including 512GB is around $50K to $60K, for which I believe is a very reasonable price. The software licenses for IWA depends on whether you are coming from the Informix Ultimate Edition ($160/PVU) or if you are buying a new license ($463/PVU). The 6 300GB disk is for recovery purposes as mentioned above.
Q: Will there be any Sessions, Tutorials or Labs for IWA and DW/BI offerings at the upcoming Informix conference in May?
A: There are several dedicated sessions for IWA at IIUG in May. We are also planning a live demo for accessing IWA in the Cloud. There will also be business partner sessions on this topic.