The May 2011 edition of Communications of the ACM has an article by Michael Stonebraker.
It can also be found on the ACM blog: http://cacm.acm.org/blogs/blog-cacm/98136-my-top-10-assertions-about-data-warehouses/fulltext
Over the last year, and especially with the release of Informix Warehouse Accelerator (IWA), we've improved the performance and lowered the TCO.
As many of you know, Stonebraker is associated with Vertica, a columnar database company recently bought by HP.
Even though his blog is admittedly biased (towards Vertica?), his contributions to RDBMS are too numerous to mention in this blog, but you can see them here
Informix bought his company, Illustra, and at one point he was CTO of Informix as well.
I reviewed his assertions against what we've done. My comments are in blue.
- (Stonebraker) Star and snowflake schemas
are a good idea in the data warehouse world.
(Keshav) I couldn’t agree more.
Using star and snowflake schemas has become a best practice in the data
warehousing world. Informix 11.70 has added star and snowflake query
optimizations to improve performance. The IWA implementation focuses on star and snowflake
schemas. The data mart design, the compression techniques, and the query
processing are all optimized for star and snowflake schemas.
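For readers new to the term, a star schema is one central fact table joined to a few small dimension tables. Here is a minimal sketch using Python's built-in sqlite3 module, purely for illustration. The table and column names (sales, store, product) are made up for this example, and this is not Informix or IWA syntax.

```python
import sqlite3


def star_query():
    """Build a tiny star schema in memory and run a typical star join."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Dimension tables: small, descriptive attributes.
        CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
        -- Fact table: one row per sale, keyed by the dimensions.
        CREATE TABLE sales   (store_id INTEGER, product_id INTEGER, amount REAL);

        INSERT INTO store   VALUES (1, 'East'), (2, 'West');
        INSERT INTO product VALUES (10, 'Books'), (20, 'Music');
        INSERT INTO sales   VALUES (1, 10, 5.0), (1, 20, 7.0),
                                   (2, 10, 3.0), (2, 10, 4.0);
    """)
    # The fact table joins to each dimension, then we aggregate.
    rows = con.execute("""
        SELECT s.region, p.category, SUM(f.amount)
        FROM sales f
        JOIN store   s ON s.store_id   = f.store_id
        JOIN product p ON p.product_id = f.product_id
        GROUP BY s.region, p.category
        ORDER BY s.region, p.category
    """).fetchall()
    con.close()
    return rows
```

Calling star_query() returns one total per (region, category) pair. A snowflake schema is the same idea with the dimension tables themselves normalized into further tables.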
- Column stores will
dominate the data warehouse market over time, replacing row stores.
In the last few years, column storage and column-wise access have
improved warehouse query performance.
The benefit of a column store comes to the fore when you combine it
with compression. The compression
can and will be done on values instead of bit streams.
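To make "compression on the value" concrete, here is a minimal sketch of generic dictionary encoding over a single column. It is an illustration only and not a description of IWA's actual algorithm. The function names (dictionary_encode, decode, count_equal) are hypothetical.

```python
from collections import Counter


def dictionary_encode(column):
    """Replace each value with a small integer code.

    The most frequent values get the smallest codes, so a real
    implementation could store them in the fewest bits.
    """
    by_frequency = [value for value, _ in Counter(column).most_common()]
    codes = {value: code for code, value in enumerate(by_frequency)}
    return by_frequency, [codes[value] for value in column]


def decode(dictionary, encoded):
    """Recover the original column from the dictionary and the codes."""
    return [dictionary[code] for code in encoded]


def count_equal(dictionary, encoded, value):
    """Evaluate `column = value` directly on the codes, without decoding."""
    if value not in dictionary:
        return 0
    code = dictionary.index(value)
    return sum(1 for c in encoded if c == code)
```

Because every value in a column has the same type and often repeats, a column store can keep one small dictionary per column and compare codes instead of full values. That is much harder to do when values from different columns are interleaved in a row.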
- The vast majority of data
warehouses are not candidates for main memory or flash memory.
In warehouses, size matters. You can have terabytes of data in your
warehouse. The hardware vendors
recognize this and are increasing capacity while prices keep falling.
Recently, IBM announced eX5
servers with up to 6 terabytes capacity.
Intel announced Westmere with 10 cores.
Gartner did a survey of warehouses in 2010 and found 75% of data warehouses
have less than 5TB of data. IWA will
typically compress 5TB to about 1.5TB.
So, there’s a lot of room for growth.
Once we have the MPP version, the capacity will grow linearly. So, this statement should be: the vast majority of data warehouses ARE
candidates for main memory!
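The arithmetic behind that claim can be sketched as below. It assumes the 5TB to 1.5TB ratio holds for other data sizes and that the whole 6TB is memory available for compressed data. Both are simplifications, and the function names are hypothetical.

```python
# Compression ratio implied by "5TB to about 1.5TB" (roughly 3.3x).
TYPICAL_RATIO = 5.0 / 1.5


def compressed_size_tb(raw_tb, ratio=TYPICAL_RATIO):
    """Estimated in-memory size for a warehouse of raw_tb terabytes."""
    return raw_tb / ratio


def raw_capacity_tb(memory_tb, ratio=TYPICAL_RATIO):
    """Estimated raw warehouse size that fits in memory_tb of memory."""
    return memory_tb * ratio
```

Under these assumptions, a 6TB server could hold roughly 20TB of raw warehouse data, about four times the 5TB mark that 75% of warehouses fall under.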
- Massively parallel
processor (MPP) systems will be omnipresent in this market.
MPP for warehouses has been proven across many contexts,
configurations, hardware platforms and vendors. At
the same time, multi-core processors have added a lot of CPU power to a single
node. Depending on your data warehouse
size and performance requirements, an SMP system can provide very good return
on investment. SMP – simplicity
- “No knobs” is the only
thing that makes any sense.
IWA is a no-knob accelerator: no indexes, no statistics, nothing
to tune. You tell IWA how much memory
and CPU to use. Then, you simply load
the data and start querying immediately. When you cannot tune, there’s nothing to tune.
- Appliances should be
Informix Ultimate Warehouse Edition with Informix
Warehouse Accelerator is available in the following configurations.
IWA: Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11)
IDS 11.70 + IWA code modules, including IDS stored procedures:
    Linux on Intel (64-bit)
    AIX on Power (64-bit)
    HP-UX on Itanium (64-bit)
    Solaris on SPARC (64-bit)
We give the software.
You choose the right hardware for you.
- Hybrid workloads are not
optimized by “one size fits all.”
ROW stores are well suited for OLTP and COLUMN stores are
well suited for warehousing. So far,
we’ve seen databases support either a ROW store or a COLUMN store.
The Informix database server uses a ROW store. Informix OLTP performance is well known and
proven. It also has a number of features
for warehouse management, like time-cyclic data management (via fragmentation
features) and query optimization (hash joins, multi-index scans, star and
snowflake join optimization).
IWA uses deep columnar storage, optimized for extreme
performance. See this paper for details.
So, with Informix Ultimate Edition, you do get the best of
both worlds. You can run just OLTP,
hybrid or just warehouse workloads on Informix.
And do it all very well.
- Essentially all data
warehouse installations want high availability (HA).
Informix has the best HA solution in the database
industry. HDR and ER technologies have been used by our customers for 15 years. MACH11 has increased this presence in the
four years since its release. Flexible
Grid in Informix 11.7 takes this to a new level by enabling easier management and
replication of schema changes along with data.
Once you offload the data to IWA from the primary, you can
accelerate queries from any node in the cluster or HA server.
- DBMSs should support
Stonebraker’s idea here is the DBMS instances should be
The snapshot of the data from Informix to IWA is taken
ONLINE. When you need to reprovision the
number of workers, you simply disable the marts, change the number of nodes, and
reload the data ONLINE. IWA still runs
on a single machine, but the underlying architecture and implementation will
eventually support an MPP environment.
- Virtualization often has
performance problems in a DBMS world.
The performance issues are due to two
factors: IO performance and CPU
sharing. IWA does not do any IO, so
that’s one factor taken care of.
Virtualization is typically deployed when, or because, you have excess CPU
capacity. IWA maximizes CPU
usage. We’ve tested IWA in virtualized
and cloud environments and found you still get these incredible speeds. We have internal users and customers validating
this in virtualized and cloud environments.
So, we score quite well on these assertions!