Has EMC stooped so low that they have to resort to Hitachi math for their latest performance claims?
Readers might remember that just a few months ago, I had a blog post [Is this what HDS tells our mainframe clients?] pointing out the outlandish comparison Hitachi was using in their presentations. Their response was to cover it up, forcing me to follow up with my post [The Cover-up is worse than the original crime]. To their credit, they eventually removed the false and misleading information from their materials.
Now an avid reader of my blog has brought this to my attention. Apparently, EMC has been showing customers a presentation [Accelerating Storage Transformation with VMAX and VPLEX] with false and misleading comparison claims between IBM DS8000, HDS VSP and EMC VMAX 40K disk system performance.
(FTC Disclosure: This would be a good time to remind my readers that I work for IBM and own IBM stock. I do not endorse any of the EMC or HDS products mentioned in this post, and have no financial affiliation or investments directly with either EMC nor HDS. I am basing my information solely on the presentation posted on the internet and other sources publicly available, and not on any misrepresentations from EMC speakers at the various conferences where these charts might have been shown.)
The problem with misinformation is that it is not always obvious. The EMC presentation is quite pretty and professional-looking. It is the typical slick, attention-getting, low-content, over-simplified marketing puffery you have come to expect from EMC. There are two slides in particular that I have issue with.
This first graphic implies that IBM and HDS are nearly tied in performance, but that EMC VMAX 40K has nearly triple that bandwidth. Overall the slide has very little detail. That makes it difficult to determine what exactly is being claimed and whether a fair comparison is being made.
- The title claims that VMAX 40K is "#1 in High Bandwidth Apps". Only three disk systems are shown so the claim appears to be relative to only the three systems. The wording "High Bandwidth Apps" is confusing considering the cited numbers are for disk systems and no application is identified. By comparison, IBM SONAS can drive up to 105 GB/sec sequential bandwidth, nearly double what EMC claims for its VMAX 40K, so EMC is certainly not even close to #1.
- Is the workload random or sequential? That is not easy to determine. The use of "GB/s" along with the large block size of 128KB implies the I/O workload is sequential, which is great for some workloads like high performance computing, technical computing and video broadcasts. Random workloads, on the other hand, are usually measured in I/Os per second (IOPS) with a block size ranging 4KB to 64KB. (I am assuming the 128K blocks refers to 128KB block size, and not reading the same block of cache 128,000 times.)
- The slide states "Maximum Sustainable RRH Bandwidth 128K Blocks". The acronym "RRH" is not defined; but I suspect this refers to "random read hits". For random workloads, 100 percent random read hits from cache represents one corner of the infamous "four corners" test. Real-world workloads have a mix of reads and writes, and a mix of cache hits and cache misses. It is also unclear whether the hits are from standard data cache or from internal buffers in adapters (perhaps accessing the same blocks repeatedly) or something else. So is this really for a random workload, or a sequential workload?
(The term "Hitachi Math" was coined by an EMC blogger precisely to slam Hitachi Data Systems for their blatant use of four-corners results, claiming that spouting ridiculously large, but equally unrealistic, 100 percent random read hit results don't provide any useful information. I agree. There are much better industry-standard benchmarks available, such as SPC-1 for random workloads, SPC-2 for sequential workloads, and even benchmarks for specific applications, that represent real-world IT environments. To shame HDS for their use of four-corners results, only for EMC themselves to use similar figures in their own presentation is truly hypocritical of them!)
- The IBM system is identified as "DS8000". DS8000 is a generic family name that applies to multiple generations of systems first introduced in 2004. The specific model is not identified, but that is critical information. Is this a first generation DS8100, or the latest DS8800, or something in between?
The slide says "Full System Configs", but that is not defined and configuration details are not identified. Configuration details, also critical information in assessing system performance capabilities, are not specified. If the EMC box costs seven times more than IBM or HDS, would you really buy it to get 3x more performance? Is the EMC packed with the maximum amount of SSD? Were there any SSD in the IBM or HDS boxes to match?
The source of the claimed IBM DS8000 performance numbers is not identified. Did they run their own tests? While I cannot tell, the VMAX may have been configured with 64 Fibre Channel 8Gbps host connections. In that case each channel is theoretically capable of supporting about 800 MB/s at 100% channel utilization. Multiplying 64 x 800MB/s = 51.2GB/s, so did EMC just do the performance comparison on the back of a napkin, assuming there are no other bottlenecks in the system? Even then, I would not round up 51.2 to 52!
- Response times were not identified. For random I/Os, response time is a very important metric. It is possible that the Symmetrix was operating with some resources at 100% utilization to get the highest GB/s result, but that would likely make I/O response times unacceptable for real-world random I/O workloads.
IBM and HDS have both published Storage Performance Council [SPC] industry-standard performance benchmarks. EMC has not published any SPC benchmarks for VMAX systems. If EMC is interested in providing customers with audited, detailed performance information along with detailed configuration information, all based on benchmarks designed to represent real-world workloads, EMC can always publish SPC benchmark results as IBM and other vendors have done. In past blog fights, EMC resorts to the excuse that SPC isn't perfect, but can they really argue that vague and unrealistic claims cited in its presentation are better?
The second graphic is so absurd, you would think it came directly from Larry Ellison at an Oracle OpenWorld keynote session. EMC is comparing a configuration with VMAX 40K plus an EMC VFCache host-side flash memory cache card to a configuration with an IBM and HDS disk system without host-side flash memory cache also configured. The comparison is clearly apples-to-oranges. Other disk system configuration details are also omitted.
FAST VP is EMC's name for its sub-volume drive tiering feature, comparable to IBM Easy Tier and Hitachi's Dynamic Tiering. The graph implies that IBM and HDS can only achieve a modest increment improvement from their sub-volume tiering. I beg to differ. I have seen various cases where a small amount of SSD on IBM DS8000 series can drastically improve performance 200 to 400 percent.
The "DBClassify" shown on the graph is a tool run as part of an EMC professional services offering called Database Performance Tiering Assessment, makes recommendations for storing various database objects on different drive tiers based on object usage and importance. Do you really need to pay for professional services? With IBM Easy Tier, you just turn it on, and it works. No analysis required, no tools, no professional services, and no additional charge!
- VFCache is an optional product from EMC that currently has no integration whatsoever with VMAX. A fair comparison would have included a host-side flash memory cache (from any vendor) when the IBM or HDS storage system was configured. Or leave it out altogether and just focus on the sub-volume tiering comparison.
Keep in mind that EMC's VFCache supports only selected x86-based hosts. IBM has published a [Statement of Direction] indicating that it will also offer this for Power systems running AIX and Linux host-side flash memory cache integrated with DS8000 Easy Tier.
I feel EMC's claims about IBM DS8000 performance are vague and misleading. EMC appears to lack the kind of technical marketing integrity that IBM strives to attain. Since EMC is not able or willing to publish fair and meaningful performance comparisons, it is up to me to set the record straight and point out EMC's failings in this matter.
Reminder: It's not to late to register for my Webcast "Solving the Storage Capacity Crisis" on Tuesday, September 25. See my blog post [Upcoming events in September] to register!