Over the last year, and especially with the release of Informix Warehouse Accelerator (IWA), we've improved performance and lowered TCO.
As many of you know, Stonebraker is associated with Vertica, a columnar database company recently acquired by HP.
Although his blog is admittedly biased (toward Vertica?), his contributions to RDBMS technology are too numerous to list in this blog, but you can see them here.
Informix bought his company, Illustra, and at one point he was also CTO of Informix.
I reviewed his assertions against what we've done. My comments follow each assertion.
- (Stonebraker) Star and snowflake schemas are a good idea in the data warehouse world.
(Keshav) I couldn't agree more. Using star and snowflake schemas has become best practice in the data warehousing world. Informix 11.70 added star and snowflake query optimizations to improve performance. The IWA implementation focuses on star and snowflake schemas: the data mart design, the compression techniques, and the query processing are all optimized for them.
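For readers new to the idea, here is a minimal sketch of a star schema and a typical star join, using Python's built-in sqlite3 module. The table and column names are hypothetical, and SQLite stands in for Informix purely for illustration; this does not show Informix 11.70's star-join optimizations or how IWA processes the query.

```python
import sqlite3

def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a tiny hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, year INTEGER);
        -- The fact table holds the measures plus a foreign key to each dimension.
        CREATE TABLE fact_sales (
            store_id INTEGER REFERENCES dim_store(store_id),
            date_id  INTEGER REFERENCES dim_date(date_id),
            amount   REAL
        );
    """)
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                     [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(10, 2010), (11, 2011)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 10, 100.0), (1, 11, 150.0),
                      (2, 10, 80.0), (2, 11, 120.0), (1, 11, 50.0)])

def sales_by_region(conn: sqlite3.Connection, year: int):
    """Star join: filter on a dimension, join it to the fact table, aggregate."""
    query = """
        SELECT s.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_store AS s ON f.store_id = s.store_id
        JOIN dim_date  AS d ON f.date_id  = d.date_id
        WHERE d.year = ?
        GROUP BY s.region
        ORDER BY s.region
    """
    return conn.execute(query, (year,)).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by_region(conn, 2011))  # [('East', 200.0), ('West', 120.0)]
```

A snowflake schema would go one step further and normalize a dimension, for example moving `region` out of `dim_store` into its own table.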
- Column stores will dominate the data warehouse market over time, replacing row stores.
In the last few years, columnar storage and access have improved warehouse query performance. The benefit of a column store comes to the fore when you combine it with compression, because the compression can be applied to the values themselves rather than to raw bit streams.
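A minimal sketch of why columnar layout and value-level compression go together, assuming simple dictionary encoding. This is a common textbook technique, not IWA's actual compression scheme, and the rows are hypothetical. Storing a column contiguously lets each distinct value be replaced by a small integer code, and a predicate can then be evaluated on the codes without decompressing the column.

```python
# Hypothetical rows: (region, product, amount)
rows = [
    ("East", "widget", 100),
    ("West", "gadget", 80),
    ("East", "widget", 150),
    ("East", "gadget", 50),
    ("West", "widget", 120),
]

def to_columns(rows):
    """Transpose a row layout into a column layout: one list per column."""
    return [list(col) for col in zip(*rows)]

def dictionary_encode(column):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = {}
    codes = []
    for value in column:
        # A new value gets the next free code; a repeated value reuses its code.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, codes

def sum_where(codes, dictionary, measure, value):
    """Evaluate SUM(measure) WHERE col = value using integer compares on the codes."""
    target = dictionary.get(value)
    if target is None:  # value never occurs in the column
        return 0
    return sum(m for c, m in zip(codes, measure) if c == target)

region, product, amount = to_columns(rows)
region_dict, region_codes = dictionary_encode(region)
print(region_dict)   # {'East': 0, 'West': 1}
print(region_codes)  # [0, 1, 0, 0, 1]
print(sum_where(region_codes, region_dict, amount, "East"))  # 300
```

With only two distinct regions, each code fits in a single bit, which is where the space savings come from on real, much larger columns.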
- The vast majority of data warehouses are not candidates for main memory or flash memory.
In warehouses, size matters: you can have terabytes of data in your warehouse. Hardware vendors recognize this; memory capacity is increasing and prices are falling. Recently, IBM announced eX5 servers with up to 6 terabytes of memory, and Intel announced Westmere processors with 10 cores.
A 2010 Gartner survey of warehouses found that 75% of data warehouses hold less than 5TB of data. IWA typically compresses 5TB to about 1.5TB, so there's a lot of room for growth. Once we have the MPP version, capacity will grow linearly with the number of nodes. So this statement should read: the vast majority of data warehouses ARE candidates for main memory!
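The back-of-envelope arithmetic behind this, using the post's own figures (5TB compressing to about 1.5TB) and assuming a server with 6TB of memory, the eX5 figure mentioned earlier. The helpers are a hypothetical sketch: real compression ratios depend on the data, and the calculation ignores memory needed by the OS, query processing, and the Informix server. The multi-node case simply assumes capacity scales linearly with nodes, as the post expects of an MPP version.

```python
# Typical ratio quoted in the post: ~5TB raw -> ~1.5TB in IWA (about 3.3x).
# This is data-dependent; treat it as an assumption, not a guarantee.
RATIO = 5.0 / 1.5

def compressed_size_tb(raw_tb, ratio=RATIO):
    """Estimated in-memory size of a raw warehouse after compression."""
    return raw_tb / ratio

def max_raw_tb(memory_tb_per_node, nodes=1, ratio=RATIO):
    """Largest raw warehouse that fits in memory, assuming linear scaling with nodes."""
    return memory_tb_per_node * nodes * ratio

print(round(compressed_size_tb(5.0), 2))   # 1.5
print(round(max_raw_tb(6.0), 1))           # 20.0
print(round(max_raw_tb(6.0, nodes=4), 1))  # 80.0
```

Under these assumptions, a single 6TB node holds roughly 20TB of raw data, four times the 5TB threshold that the survey says covers 75% of warehouses.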
- Massively parallel processor (MPP) systems will be omnipresent in this market.
MPP for warehousing has been proven across many contexts, configurations, hardware platforms, and vendors. At the same time, multi-core processors have added a lot of CPU power to a single node. Depending on your data warehouse size and performance requirements, an SMP system can provide a very good return on investment, along with the simplicity of running on a single node.
- “No knobs” is the only thing that makes any sense.
IWA is a no-knob accelerator: no indexes, no statistics, nothing to tune. You tell IWA how much memory and CPU to use, then simply load the data and start querying immediately. When there's nothing you can tune, there's nothing you have to tune.
- Appliances should be “software only.”
Informix Ultimate Warehouse Edition with Informix Warehouse Accelerator is available in the following configurations:
• IWA on Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11)
• IDS 11.70 + IWA code modules including IDS Stored Procedures
– Linux on Intel (64 bit)
– AIX on Power (64 bit)
– HPUX on Itanium (64 bit)
– Solaris on SPARC (64 bit)
We provide the software; you choose the hardware that's right for you.
- Hybrid workloads are not optimized by “one size fits all.”
ROW stores are well suited for OLTP, and COLUMN stores are well suited for warehousing. So far, we've seen databases support either a ROW store or a COLUMN store, hence Stonebraker's comment.
The Informix database server uses a ROW store, and its OLTP performance is well known and proven. It also has a number of features for warehouse management, such as time-cyclic data management (via fragmentation) and query optimization (hash joins, multi-index scans, and star and snowflake join optimization).
IWA uses deep columnar storage, optimized for extreme performance. See this paper for details.
So, with Informix Ultimate Edition, you get the best of both worlds: you can run pure OLTP, hybrid, or pure warehouse workloads on Informix, and do it all very well.
- Essentially all data warehouse installations want high availability (HA).
Informix has the best HA solution in the database industry. Our customers have used HDR (High-Availability Data Replication) and ER (Enterprise Replication) for 15 years, and MACH11 has extended this presence in the four years since its release. Flexible Grid in Informix 11.70 takes this to a new level by enabling easier management and replicating schema changes along with data.
Once you offload the data from the primary to IWA, you can accelerate queries from any node in the cluster or any HA server.
- DBMSs should support online reprovisioning.
Stonebraker's idea here is that DBMS instances should be elastic while staying ONLINE.
The snapshot of the data from Informix to IWA is taken ONLINE. When you need to reprovision the number of workers, you simply disable the marts, change the number of nodes, and reload the data ONLINE. IWA currently runs on a single machine, but the underlying architecture and implementation will eventually support an MPP environment.
- Virtualization often has performance problems in a DBMS world.
The performance issues come from two factors: I/O performance and CPU sharing. IWA does no disk I/O when processing queries, since the data is held in memory, so that's one factor taken care of. Virtualization is typically deployed because you have excess CPU capacity, and IWA makes full use of the CPU. We've tested IWA in virtualized and cloud environments and found you still get these incredible speeds, and we have internal users and customers validating this in virtualized and cloud environments.
So, we score quite well on these assertions!