Over at StorageMojo, Robin Harris writes in his post[The High-End Storage Melt-Down
Expect to hear a lot more about the SMB segment over the next 6 months.
Because the high-end market is sucking wind. NetApp and EMC are both reporting problems in the high-end. HP and IBM don’t break out as much detail but I’m sure they are feeling the chill as well.
With Hulk/Maui coming in Q2CY08, you should hold off on any 2nd tier storage purchases you can. I estimate that H/M will be about 30% per GB less than the current gear.
Robin blames the U.S. subprime mortgage mess, butI disagree with the term melt-down.
IBM doesn't publicly report subset numbers on individual product lines, but we are growing, albeit single-digit growth, on the high-end with our IBM System Storage DS8000 and DS6000 series products. Single digit growth is not "booming", but it is what we expected in this space, so it is not like we are"feeling the chill" as Robin stated.Obviously, if the U.S. market overall is doing poorly, then it must be from something else. IBM's success appears to be from organic growth in our Asia and Europe markets, and taking marketshare away from the top two contenders, EMC and HDS. Here are my thoughts why:
- EMC is remodeling its kitchen
Not happy with its status as #1 disk hardware specialty shop, EMC is admirably trying to redefine itself as an ["information infrastructure"] company, buying up software companies and introducing new storage services. [Byte and Switch] reports onEMC's recent acquisitions:
EMC is the latest vendor to pin its colors to the SaaS mast, revealing its plan to offer SaaS-based archiving services during its recent Innovation Day in Boston.IBM has offered[Managed Storage Services] foryears through our Global Technology Services (GTS) division. Gartner recognized IBM as the #1 leader in storageservices, with three times more revenues than EMC in this space.
EMC gave another clear indication of its SaaS intentions last month, when it spent $76 million to acquire online backup specialist Mozy.
As with a restaurant that is remodeling its kitchen, it can expect a temporary drop inrevenue. If it is done right, customers will come back to a bigger brighter restaurant. If not, the restaurant re-opens as a much smaller lesser version of itself. Recent events this year might incent EMC to get that kitchen done quickly:
- A recent [class-action lawsuit]might result in having EMC's "86 percent male" sales force goes to sexual harassment sensitivity training, takingtime away from selling high-end storage arrays in the field. Analysts consider "high-end" boxes as those costingover $300,000 US dollars. Because of the money involved, there is a lot of competition for high-end storage, so face-to-face time with prospective customers is crucial to making the sale.Anytime any vendor is mentioned in a lawsuit (andcertainly IBM has had its share in the past, as Chuck Hollis correctly points out in the comment below), priorities get shifted, and there is potential dip in revenues.
- Dell acquires EMC's rival EqualLogic. Dell resold EMC midrange storage, like CLARiiON, so this should notimpact their high-end storage sales. While Dell will be allowed to sell EMC until 2011, this new acquisition mightmean Dell leads with the EqualLogic offerings, and that could potentially reduce EMC revenues in the midrange space.
IBM went through a similar phase in the 1990's, redefining itself from an "IT Technology" company, intoa "Systems, Software and Services" company. These transitions can't be done in a quarter, or even a year, theytake several years. IBM lost business to EMC in the 1990s, but is back with a stronger portfolio in the 2000's, and so IBM's kitchen remodeling effort appears to be paying off. We will see what happens with EMC in a few years.
- HDS puts on the white lab coats
Meanwhile, HDS appears interested in taking over as #1 disk hardware specialty shop.For years, Hitachi was the stereotypical JCM (Japanese IBM-compatible manufacturer) that made well-engineered"me, too" storage arrays. They would see what innovators like IBM and EMC were doing, and copy them. Recently,however, they seemed to have changed strategy, introducing new featuresand functions on their high-end USP-V device, like[Dynamic Provisioning].
The problem is that customers don't want to feel like [Guinea pigs] in an experimental lab, especially withmission-critical data that they trust to their most-available, most-reliable high-end disk storage systems.Like IBM and EMC and the rest of the major storage vendors, Hitachi has top-notch engineers making quality products, but new features scare people, and so there is a lag in the adoption of new technologies.
In our youth, we might have preferred beer with recent born-on dates, and tequila aged less than 90 days. But as weget older, we switch to drinks like wine and whiskey, aged years, not weeks. The same is true for themarketplace. New start-ups and other "early adopters"might be willing to try fresh new features and functions on their storage systems, but more established enterprises prefer storage with more mature and stable microcode.Storage admins want to leave at the end of the day, knowing that the data will still be there the next morning. In tough financial times, many established companies want the technological equivalent to ["comfort food"], nothing spicy or exotic, but simplehearty fare that fills the belly and keeps you satisfied.
Recognizing this, IBM often introduces new features and functions on its midrange lines first, and position them accordingly. Once customers are comfortable with the concepts, IBM then can consider moving them into the high-end lines. For example, dynamic volume expansion was introduced on the DS4000 and SAN Volume Controller first, and once proven safe and effective, brought over to the DS8000 series. This strategy has served us well.
Well those are my theories. If you have a different explanation of why storage vendors are not doing well in thehigh-end, drop me a comment!
technorati tags: SMB, EMC, NetApp, DS8000, DS6000, HDS, Dell, EqualLogic, subprime, mortgage, USP, USP-V, Dynamic Provisioning, DS4000, SAN Volume Controller, SVC
It's official! My "blook" Inside System Storage - Volume I
is now available.
|This blog-based book, or “blook”, comprises the first twelve months of posts from this Inside System Storage blog,165 posts in all, from September 1, 2006 to August 31, 2007. Foreword by Jennifer Jones. 404 pages.|
- IT storage and storage networking concepts
- IBM strategy, hardware, software and services
- Disk systems, Tape systems, and storage networking
- Storage and infrastructure management software
- Second Life, Facebook, and other Web 2.0 platforms
- IBM’s many alliances, partners and competitors
- How IT storage impacts society and industry
You can choose between hardcover (with dust jacket) or paperback versions:
This is not the first time I've been published. I have authored articles for storage industry magazines, written large sections of IBM publications and manuals, submitted presentations and whitepapers to conference proceedings, and even had a short story published with illustrations by the famous cartoon writer[Ted Rall].
But I can say this is my first blook, and as far as I can tell, the first blook from IBM's many bloggers on DeveloperWorks, and the first blook about the IT storage industry.I got the idea when I saw [Lulu Publishing] run a "blook" contest. The Lulu Blooker Prize is the world's first literary prize devoted to "blooks"--books based on blogs or other websites, including webcomics. The [Lulu Blooker Blog] lists past year winners. Lulu is one of the new innovative "print-on-demand" publishers. Rather than printing hundredsor thousands of books in advance, as other publishers require, Lulu doesn't print them until you order them.
I considered cute titles like A Year of Living Dangerously, orAn Engineer in Marketing La-La land, or Around the World in 165 Posts, but settled on a title that matched closely the name of the blog.
In addition to my blog posts, I provide additional insights and behind-the-scenes commentary. If you go to the Luluwebsite above, you can preview an entire chapter in its entirety before purchase. I have added a hefty 56-page Glossary of Acronyms and Terms (GOAT) with over 900 storage-related terms defined, which also doubles as an index back to the post (or posts) that use or further explain each term.
So who might be interested in this blook?
- Business Partners and Sales Reps looking to give a nice gift to their best clients and colleagues
- Managers looking to reward early-tenure employees and retain the best talent
- IT specialists and technicians wanting a marketing perspective of the storage industry
- Mentors interested in providing motivation and encouragement to their proteges
- Educators looking to provide books for their classroom or library collection
- Authors looking to write a blook themselves, to see how to format and structure a finished product
- Marketing personnel that want to better understand Web 2.0, Second Life and social networking
- Analysts and journalists looking to understand how storage impacts the IT industry, and society overall
- College graduates and others interested in a career as a storage administrator
And yes, according to Lulu, if you order soon, you can have it by December 25.
technorati tags: IBM, blook, Volume I, Jennifer Jones, system, storage, strategy, hardware, software, services, disk, tape, networking, SAN, secondlife, Web2.0, facebook, Lulu, publishing, Blooker Prize, articles, magazines, proceedings, Ted Rall, insights, glossary, early-tenure, mentors, library, classroom, administrator, print, publish, on demand
In North America, today marks the start of the "Give 1 Get 1" program.
|Children using the XO laptop|
I first learned from this when I was reading about Timothy Ferriss' [LitLiberation project] on his [Four Hour Work Week] blog, and was surfing around for related ideas, and chanced upon this. I registered for a reminder, and it came today(the reminder, not the laptop itself).
Here's how the program works. You give $399 US dollars to the "One Laptop per Child" (OLPC)[laptop.org] organization for two laptops: One goes to a deserving child ina developing country, the second goes to you, for your own child, or to donate to a localcharity that helps children. This counts as a $199 purchase plus a $200 tax-deductible donation.For Americans, this is a [US 501(c)(3)] donation, and for Canadians and Mexicans, take advantage of the low-value of the US dollar!
If your employer matches donations, like IBM does, get them to match the $200donation for a third laptop, which goes to another child in a developing country. As for shipping, you pay only for the shipping of the one to you, each receiving country covers their own shipping. In my case, the shipping was another $24 US dollars for Arizona.No guarantees that it will arrive in time for the holidays this December, but it might.
To sweeten the deal, T-mobile throws in a year's worth of "Wi-Fi Hot Spot"that you can use for yourself, either with the XO laptop itself, or your regular laptop, iPhone, or otherWi-Fi enabled handheld device.
National Public Radio did a story last week on this:[The $100 Laptop Heads for Uganda]where they interview actor [Masi Oka], best known from the TV show ["Heroes"], who has agreed to be their spokesman.At the risk of sounding like their other spokesman, I thought I would cover the technology itself, inside the XO,and how this laptop represents IBM's concept of "Innovation that matters"!
The project was started by [Nicholas Negroponte] from [MIT University] as the "$100 laptop project". Once the final designwas worked out, it turns out it costs $188 US dollars to make, so they rounded it up to $200. This is stillan impressive price, and requires that hundreds of thousands of them be manufactured to justify ramping upthe assembly line.
Two of IBM's technology partners are behind this project. First is Advanced Micro Devices (AMD) that providesthe 433Mhz x86 processor, which is 75 percent slower than Thinkpad T60. Second is Red Hat,as this runs lean Fedora 6 version of Linux. Obviously, you couldn't have Microsoft Windows or Apple OS X, as both require significantly more resources.
The laptop is "child size", and would be considered in the [subnotebook] category. At 10" x 9" x 1.25", it is about the size of class textbook,can be carried easily in a child's backpack, or carried by itself with the integrated handle. When closed, it is sealedenough to be protected when carried in rain or dust storms. It weighs about 3.5 pounds, less than the 5.2 pounds of myThinkpad T60.
The XO is "green", not just in color, but also in energy consumption.This laptop can be powered by AC, or human power hand-crank, with workin place to get options for car-battery or solar power charging. Compared to the 20W normally consumed bytraditional laptops, the XO consumes 90 percent less, running at 2W or less. To accomplish this, there is no spinning disk inside. Instead, a 1GB FLASH drive holds 700MB of Linux, and gives you 300MB to hold your files. There isa slot for an MMC/SD flash card, and three USB 2.0 ports to connect to USB keys, printers or other remote I/O peripherals.
The XO flips around into three positions:
Standard laptop position has screen and keyboard. The water-tight keyboard comes in ten languages:International/English, Thai, Arabic, Spanish, Portuguese, West African, Urdu, Mongolian, Cyrillic, and Amharic.(I learned some Amharic, having lived five years with Ethiopians.)There does not appear be a VGA port, so don't be thinking this could be used as an alternative to project Powerpoint presentations onto a big screen.
Built-in 640x480 webcam, microphone and speakers allow the XO to be used as a communication device. Voice-over-IP (VOIP) client software, similar to Skype or [IBM Lotus Sametime], is pre-installed for this purpose.
The basic built-in communication are 802.1g (54Mbs) that you can use to surf the web usingthe Wi-Fi at your local Starbucks; and 802.1s which forms a "mesh network" with other XO laptops, and can surf theweb finding the one laptop nearby that is connected to the internet to share bandwidth. This eliminates the need to build a separate Wi-Fi hub at the school. There are USB-to-Ethernet and USB-to-Cellular converters, so that might be an alternative option.
Flipped vertically, the device can be read like a book.The screen can be changed between full-color and black-white, 200 dpi, with decent 1200x900 pixel resolution. The full-color is back-lit, and can be read in low-lighting. The black-white is not back-lit, consumes much less power, andcan be read in bright sunlight. In that regards, it is comparable to other [e-book devices], like a Cybook or Sony Reader.
Software includes a web-browser, document reader, word processor and RSS feed reader to read blogs.The OLPC identifies all of the software, libraries and interfaces they use, so that anyone that wants to developchildren software for this platform can do so.
- Game mode
With the keyboard flipped back, the 6" x 4.5" screen has directional controls and X/Y/A/B buttons to run games. This would make it comparable to a Nintendo DS or Playstation Portable (PSP). Again, the choice between back-lit color,or sunlight black-white screen modes apply. Some games are pre-installed.
So for $399, you could buy a Wi-Fi enabled[16GB iPod Touch
] for yourself, which does much the same thing, or you can make a difference in the world.I made my donation this morning, and suggest you--my dear readers in the US, Canada and Mexico--consider doing the same.Go to [www.laptopgiving.org
] for details.
Perhaps I wrapped up my exploration of disk system performance one day too early. (While it is Friday here in Malaysia, it is still only Thursday back home)
Barry Burke, EMC blogger (aka The Storage Anarchist) writes:
Aren't you mixing metrics here?
Miles per Gallon measures an effeciency ratio (amount of work done with a fixed amount of energy), not a speed ratio (distance traveled in a unit of time).
Given that IOPs and MB/s are the unit of "work" a storage array does, wouldn't the MPG equivalent for storage be more like IOPs per Watt or MB/s per Watt? Or maybe just simply Megabytes Stored per Watt (a typical "green" measurement)?
You appear to be intentionally avoiding the comparison of I/Os per Second and Megabytes per Second to Miles Per Hour?
May I ask why?
This is a fair question, Barry, so I will try to address it here.
It was not a typo, I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to explain concepts in Information Technology that might be harder to grasp. I chose MPG because it was closely related to IOPS and MB/s in four ways:
- MPG applies to all instances of a particular make and model. Before Henry Ford and the assembly line, cars were made one at a time, by a small team of craftsmen, and so there could be variety from one instance to another. Today, vehicles and storage systems are mass-produced in a manner that provides consistent quality. You can test one vehicle, and safely assume that all similar instances of the same make and model will have the similar mileage. The same is true for disk systems, test one disk system and you can assume that all others of the same make and model will have similar performance.
MPG has a standardized measurement benchmark that is publicly available. The US Environmental Protection Agency (EPA) is an easy analogy for the Storage Performance Council, providing the results of various offerings to chose from.
MPG has usage-specific benchmarks to reflect real-world conditions.The EPA offers City MPG for the type of driving you do to get to work, and Highway MPG, to reflect the type ofdriving on a cross-country trip. These serve as a direct analogy to SPC having SPC-1 for Online transaction processing (OLTP) and SPC-2 for large file transfers, database queries and video streaming.
MPG can be used for cost/benefit analysis.For example, one could estimate the amount of business value (miles travelled) for the amount of dollar investment (cost to purchase gallons of gasoline, at an assumed gas price). The EPA does this as part of their analysis. This is similar to the way IOPS and MB/s can be divided by the cost of the storage system being tested on SPC benchmark results. The business value of IOPS or MB/s depends on the application, but could relate to the number of transactions processed per hour, the number of music downloads per hour, or number of customer queries handled per hour, all of which can be assigned a specific dollar amount for analysis.
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, butdecided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
- Increasing MPH, and driving anywhere near the maximum rated MPH for a vehicle, can be reckless and dangerous,risking loss of human life and property damage. Even professional race car drivers will agree there are dangers involved. By contrast, processing I/O requests at maximum speed poses no additional risk to the data, nor possibledamage to any of the IT equipment involved.
- While most vehicles have top speeds in excess of 100 miles per hour, most Federal, State and Local speed limits prevent anyone from taking advantage of those maximums. Race-car drivers in NASCAR may be able to take advantage of maximum MPH of a vehicle, the rest of us can't. The government limits speed of vehicles precisely because of the dangers mentioned in the previous bullet. In contrast, processing I/O requests at faster speeds poses no such dangers, so the government poses no limits.
- Neither IOPS nor MB/s match MPH exactly.Earlier this week,I related IOPS to "Questions handled per hour" at the local public library, and MB/s to "Spoken words per minute" in those replies. If I tried to find a metric based on unit type to match the "per second" in IOPS and MB/s, then I would need to find a unit that equated to "I/O requests" or "MB transferred" rather than something related to "distance travelled".
In terms of time-based units, the closest I could come up with for IOPS was acceleration rate of zero-to-sixty MPH in a certain number of seconds. Speeding up to 60MPH, then slamming the breaks, and then back up to 60MPH, start-stop, start-stop, and so on, would reflect what IOPS is doing on a requestby request basis, but nobody drives like this (except maybe the taxi cab drivers here in Malaysia!)
Since vehicles are limited to speed limits in normal road conditions, the closest I could come up with for MB/s would be "passenger-miles per hour", such that high-occupancy vehicles like school buses could deliver more passengers than low-occupancy vehicles with only a few passengers.
Neither start-stops nor passenger-miles per hour have standardized benchmarks, so they don't work well for comparisonbetween vehicles.If you or anyone can come up with a metric that will help explain the relevance of standardized benchmarks better than the MPG that I already used, I would be interested in it.
You also mention, Barry, the term "efficiency" but mileage is about "fuel economy".Wikipedia is quick to point out that the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into fuel economy of cars. The same can be said about the performance of internal bandwidth ofthe backplane between controllers and faster HDD does not necessarily translate to external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s.
Oh, and yes, it's true - the DMX-4 is not the first high-end storage array to ship a 4Gb/s FC back-end. The USP-V, announced way back in May, has that honor (but only if it meets the promised first shipments in July 2007). DMX-4 will be in August '07, so I guess that leaves the DS8000 a distant 3rd.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such highSPC-1 benchmarks despite the fact that it still uses 2Gbps drives inside. Given that it doesn't matter between2Gbps and 4Gbps on the back-end, why would it matter which vendor came first, second or third, and why call it a "distant 3rd" for IBM? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with you're excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole"green initiative" and I used "Watts per GB" last month to explain about how tape is less energy-consumptive than paper.I see on your blog you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear if "requires less" means "slightly less" or "substantially less" in this context, and have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufacturers today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
technorati tags: IBM, EMC, Storage Anarchist, MPG, MPH, IOPS, NASCAR, Malaysia, Watts, GB, green, back-end, DMX-3, DMX-4, HDS, USP, USP-V, SPC, SPC-1, SPC-2, standardized, benchmarks, workload, DS8000, disk, storage, tape
Wrapping up this week's exploration on disk system performance, today I willcover the Storage Performance Council (SPC) benchmarks, and why I feel they are relevant to help customers make purchase decisions. This all started to address a comment from EMC blogger Chuck Hollis, who expressed his disappointment in IBM as follows:
You've made representations that SPC testing is somehow relevant to customers' environments, but offered nothing more than platitudes in support of that statement.
Apparently, while everyone else in the blogosphere merely states their opinions and moves on,IBM is held to a higher standard. Fair enough, we're used to that.Let's recap what we covered so far this week:
- Monday, I explained how seemingly simple questions like "Which is the tallestbuilding?" or "Which is the fastest disk system?" can be steeped in controversy.
- Tuesday, I explored what constitutes a disk system. While there are special storage systemsthat include HDD that offer tape-emulation, file-oriented access, or non-erasable non-rewriteable protection,it is difficult to get apples-to-apples comparisions with storage systems that don't offer these special features.I focused on the majority of general-purpose disk systems, those that are block-oriented, direct-access.
- Wednesday, I explored two metrics to measure storage performance, I/O requestsper second (IOPS) and Megabytes transferred per second (MB/s).
Today, I will explore ways to apply these metrics to measure and compare storageperformance.
Let's take, for example, an IBM System Storage DS8000 disk system. This has a controller thatsupports various RAID configurations, cache memory, and HDD inside one or more frames.Engineers who are testing individual components of this system might run specifictypes of I/O requests to test out the performance or validate certain processing.
- 100% read-hit, this means that all the I/O requests are to read data expectedto be in the cache.
- 100% read-miss, this means that all the I/O requests are to read data expectedNOT to be in the cache, and must go fetch the data from HDD.
- 100% write-hit, this means that all the I/O requests are to write data into cache.
- 100% write-miss, this means that all the I/O requests are to bypass the cache,and are immediately de-staged to HDD. Depending on the RAID configuration, this can result in actually reading or writing several blocks of data on HDD to satisfy thisI/O request.
Known affectionately in the industry as the "four corners" test, because you can show them on a box, with writes on the left, reads on the right,hits on the top, and misses on the bottom.Engineers are proud of these results, but these workloads do notreflect any practical production workload. At best, since all I/O requests are oneof these four types, the four corners provide an expectation range from the worst performance (most often write-missin the lower left corner)and the best performance (most often read-hit in the upper right corner) you might get with a real workload.
To understand what is needed to design a test that is more reflective of real business conditions,let's go back to yesterday's discussion of fuel economy of vehicles, with mileage measured in miles per gallon.The How Stuff Works websiteoffers the following description for the two measurements taken by the EPA:
- City MPG
The "city" program is designed to replicate an urban rush-hour driving experience in which the vehicle is started with the engine cold and is driven in stop-and-go traffic with frequent idling. The car or truck is driven for 11 miles and makes 23 stops over the course of 31 minutes, with an average speed of 20 mph and a top speed of 56 mph.
- Highway MPG
The "highway" program, on the other hand, is created to emulate rural and interstate freeway driving with a warmed-up engine, making no stops (both of which ensure maximum fuel economy). The vehicle is driven for 10 miles over a period of 12.5 minutes with an average speed of 48 mph and a top speed of 60 mph.
Why two different measurements? Not everyone drives in a city in stop-and-go traffic. Having only one measurement may not reflect the reality that you may travel long distances on the highway. Offering both city and highway measurements allows the consumers to decide which metric relates closer to their actual usage.
Should you expect your actual mileage to be the exact same as the standardized test?Of course not. Nobody drives exactly 11 miles in the city every morning with 23 stops along the way,or 10 miles on the highway at the exact speeds listed.The EPA's famous phrase "your mileage may vary" has been quickly adopted into popular culture's lexicon. All kinds of factors, like weather, distance, anddriving style can cause people to get better or worse mileage than thestandardized tests would estimate.
Want more accurate results that reflect your driving pattern, in specific conditions that you are most likely to drive in? You could rentdifferent vehicles for a week and drive them around yourself, keeping track of whereyou go, and how fast you drove, and how many gallons of gas you purchased, so thatyou can then repeat the process with another rental, and so on, and then use yourown findings to base your comparisons. Perhaps you find that your results are always20% worse than EPA estimates when you drive in the city, and 10% worse when you driveon the highway. Perhaps you have many mountains and hills where you drive, you drive too fast, you run the Air Conditioner too cold, or whatever.
If you did this with five or more vehicles, and ranked them best to worstfrom your own findings, and also ranked them best to worst based on the standardizedresults from the EPA, you likely will find the order to be the same. The vehiclewith the best standardized result will likely also have the best result from your ownexperience with the rental cars. The vehicle with the worst standardized result willlikely match the worst result from your rental cars.
(This will be one of my main points, that standardized estimates don't have to be accurate to beuseful in making comparisons. The comparisons and decisions you would make with estimatesare the same as you would have made with actual results, or customized estimates based on current workloads. Because the rankings are in the same order, they are relevant and useful for making decisions based on those comparisons.)
Most people shopping around for a new vehicle do not have the time or patience to do this with rental cars. Theycan use the EPA-certified standardized results to make a "ball-park" estimate on how much they will spendin gasoline per year, decide only on cars that might go a certain distancebetween two cities on a single tank of gas, or merely to provide ranking of thevehicles being considered. While mileage may not be the only metric used in making a purchase decision, it can certainly be used to help reduce your consideration setand factor in with other attributes, like number of cup-holders, or leather seats.
In this regard, the Storage Performance Council has developed two benchmarks that attempt to reflect normal business usage, similar to "City" and "Highway" driving measurements.
SPC-1 consists of a single workload designed to demonstrate the performance of a storage subsystem while performing the typical functions of business critical applications. Those applications are characterized by predominately random I/O operations and require both queries as well as update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.
SPC-2 consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominately by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below as well as examples of applications characterized by each workload.
- Large File Processing: Applications in a wide range of fields, which require simple sequential process of one or more large files such as scientific computing and large-scale financial processing.
- Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
- Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.
The SPC-2 benchmark was added when people suggested that not everyone runs OLTP anddatabase transactional update workloads, just as the "Highway" measurement was addedto address the fact that not everyone drives in the City.
If you are one of the customers out there willing to spend the time and resources to do your own performance benchmarking, either at your own data center, or with theassistance of a storage provider, I suspect most, if not all, the major vendors(including IBM, EMC and others), and perhaps even some of the smaller start-ups, would be glad to work with you.
If you want to gather performance data of your actual workloads, and use this to estimate how your performance might be with a new or different storage configuration, IBMhas tools to make these estimates, and I suspect (again) that most, if not all, of theother storage vendors have developed similar tools.
For the rest of you who are just looking to decide which storage vendors to invite on your next RFP, and which products you might like to investigate that matchthe level of performance you need for your next project or application deployment,than the SPC benchmarks might help you with this decision. If performance is importantto you, factor these benchmark comparisons with the rest of the attributes you arelooking for in a storage vendor and a storage system.
In my opinion, I feel that for some people, the SPC benchmarks provide some value in this decision making process. They are proportionally correct, in that even ifyour workload gets only a portion of the SPC estimate, that storage systems withfaster benchmarks will provide you better performance than storage systems with lower benchmark results. That is why I feel they can be relevant in makingvalid comparisons for purchase decisions.
Hopefully, I have provided enough "food for thought"on this subject to support why IBM participates in the Storage Performance Council, why the performance of the SAN Volume Controller can be compared to the performanceof other disk systems, and why we at IBM are proud of the recent benchmark results in our recent press release.
Enjoy the weekend!
technorati tags: IBM, SPC, EMC, Chuck Hollis, fastest, disk, system, SVC, HDD, storage, four corners, read-hit, read-miss, write-hit, write-miss, City, Highway, MPG, OLTP, SPC-1, SPC-2, benchmarks, file, database, video,
Well, this week I am in Maryland, just outside of Washington DC. It's a bit cold here.
Robin Harris over at StorageMojo put out this Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun about the results of two academic papers, one from Google, and another from Carnegie Mellon University (CMU). The papers imply that the disk drive module (DDM) manufacturers have perhaps misrepresented their reliability estimates, and asks major vendors to respond. So far, NetAppand EMC have responded.
I will not bother to re-iterate or repeat what others have said already, but make just a few points. Robin, you are free to consider this "my" official response if you like to post it on your blog, or point to mine, whatever is easier for you. Given that IBM no longer manufacturers the DDMs we use inside our disk systems, there may not be any reason for a more formal response.
- Coke and Pepsi buy sugar, Nutrasweet and Splenda from the same sources
Somehow, this doesn't surprise anyone. Coke and Pepsi don't own their own sugar cane fields, and even their bottlers are separate companies. Their job is to assemble the components using super-secret recipes to make something that tastes good.
IBM, EMC and NetApp don't make DDMs that are mentioned in either academic study. Different IBM storage systems uses one or more of the following DDM suppliers:
- Seagate (including Maxstor they acquired)
- Hitachi Global Storage Technologies, HGST (former IBM division sold off to Hitachi)
In the past, corporations like IBM was very "vertically-integrated", making every component of every system delivered.IBM was the first to bring disk systems to market, and led the major enhancements that exist in nearly all disk drives manufactured today. Today, however, our value-add is to take standard components, and use our super-secret recipe to make something that provides unique value to the marketplace. Not surprisingly, EMC, HP, Sun and NetApp also don't make their own DDMs. Hitachi is perhaps the last major disk systems vendor that also has a DDM manufacturing division.
So, my point is that disk systems are the next layer up. Everyone knows that individual components fail. Unlike CPUs or Memory, disks actually have moving parts, so you would expect them to fail more often compared to just "chips".
If you don't feel the MTBF or AFR estimates posted by these suppliers are valid, go after them, not the disk systems vendors that use their supplies. While IBM does qualify DDM suppliers for each purpose, we are basically purchasing them from the same major vendors as all of our competitors. I suspect you won't get much more than the responses you posted from Seagate and HGST.
- American car owners replace their cars every 59 months
According to a frequently cited auto market research firm, the average time before the original owner transfers their vehicle -- purchased or leased -- is currently 59 months.Both studies mention that customers have a different "definition" of failure than manufacturers, and often replace the drives before they are completely kaput. The same is true for cars. Americans give various reasons why they trade in their less-than-five-year cars for newer models. Disk technologies advance at a faster pace, so it makes sense to change drives for other business reasons, for speed and capacity improvements, lower power consumption, and so on.
The CMU study indicated that 43 percent of drives were replaced before they were completely dead.So, if General Motors estimated their cars lasted 9 years, and Toyota estimated 11 years, people still replace them sooner, for other reasons.
At IBM, we remind people that "data outlives the media". True for disk, and true for tape. Neither is "permanent storage", but rather a temporary resting point until the data is transferred to the next media. For this reason, IBM is focused on solutions and disk systems that plan for this inevitable migration process. IBM System Storage SAN Volume Controller is able to move active data from one disk system to another; IBM Tivoli Storage Manager is able to move backup copies from one tape to another; and IBM System Storage DR550 is able to move archive copies from disk and tape to newer disk and tape.
If you had only one car, then having that one and only vehicle die could be quite disrupting. However, companies that have fleet cars, like Hertz Car Rentals, don't wait for their cars to completely stop running either, they replace them well before that happens. For a large company with a large fleet of cars, regularly scheduled replacement is just part of doing business.
This brings us to the subject of RAID. No question that RAID 5 provides better reliability than having just a bunch of disks (JBOD). Certainly, three copies of data across separate disks, a variation of RAID 1, will provide even more protection, but for a price.
Robin mentions the "Auto-correlation" effect. Disk failures bunch up, so one recent failure might mean another DDM, somewhere in the environment, will probably fail soon also. For it to make a difference, it would (a) have to be a DDM in the same RAID 5 rank, and (b) have to occur during the time the first drive is being rebuilt to a spare volume.
- The human body replaces skin cells every day
So there are individual DDMs, manufactured by the suppliers above; disk systems, manufactured by IBM and others, and then your entire IT infrastructure. Beyond the disk system, you probably have redundant fabrics, clustered servers and multiple data paths, because eventually hardware fails.
People might realize that the human body replaces skin cells every day. Other cells are replaced frequently, within seven days, and others less frequently, taking a year or so to be replaced. I'm over 40 years old, but most of my cells are less than 9 years old. This is possible because information, data in the form of DNA, is moved from old cells to new cells, keeping the infrastructure (my body) alive.
Our clients should approach this in a more holistic view. You will replace disks in less than 3-5 years. While tape cartridges can retain their data for 20 years, most people change their tape drives every 7-9 years, and so tape data needs to be moved from old to new cartridges. Focus on your information, not individual DDMs.
What does this mean for DDM failures. When it happens, the disk system re-routes requests to a spare disk, rebuilding the data from RAID 5 parity, giving storage admins time to replace the failed unit. During the few hours this process takes place, you are either taking a backup, or crossing your fingers.Note: for RAID5 the time to rebuild is proportional to the number of disks in the rank, so smaller ranks can be rebuilt faster than larger ranks. To make matters worse, the slower RPM speeds and higher capacities of ATA disks means that the rebuild process could take longer than smaller capacity, higher speed FC/SCSI disk.
According to the Google study, a large portion of the DDM replacements had no SMART errors to warn that it was going to happen. To protect your infrastructure, you need to make sure you have current backups of all your data. IBM TotalStorage Productivity Center can help identify all the data that is "at risk", those files that have no backup, no copy, and no current backup since the file was most recently changed. A well-run shop keeps their "at risk" files below 3 percent.
So, where does that leave us?
- ATA drives are probably as reliable as FC/SCSI disk. Customers should chose which to use based on performance and workload characteristics. FC/SCSI drives are more expensive because they are designed to run at faster speeds, required by some enterprises for some workloads. IBM offers both, and has tools to help estimate which products are the best match to your requirements.
- RAID 5 is just one of the many choices of trade-offs between cost and protection of data. For some data, JBOD might be enough. For other data that is more mission critical, you might choose keeping two or three copies. Data protection is more than just using RAID, you need to also consider point-in-time copies, synchronous or asynchronous disk mirroring, continuous data protection (CDP), and backup to tape media. IBM can help show you how.
- Disk systems, and IT environments in general, are higher-level concepts to transcend the failures of individual components. DDM components will fail. Cache memory will fail. CPUs will fail. Choose a disk systems vendor that combines technologies in unique and innovative ways that take these possibilities into account, designed for no single point of failure, and no single point of repair.
So, Robin, from IBM's perspective, our hands are clean. Thank you for bringing this to our attention and for giving me the opportunity to highlight IBM's superiority at the systems level.
technorati tags: IBM, Seagate, Hitachi, HGST, EMC, NetApp, HP, HDS, Sun, Google, CMU, DDM, Fujitsu, MTBF, MTTF, AFR, ARR, JBOD, RAID, Tivoli, SVC, DR550, CDP, FC, SCSI, disk, tape, SAN,
Continuing my week in Chicago, for the IBM Storage Symposium 2008, we had sessions that focused on individual products. IBM System Storage SAN Volume Controller (SVC) was a popular topic.
- SVC - Everything you wanted to know, but were afraid to ask!
Bill Wiegand, IBM ATS, who has been working with SAN Volume Controller since it was first introduced in 2003. answered some frequently asked questions about IBM System Storage SAN Volume Controller.
- Do you have to upgrade all of your HBAs, switches and disk arrays to the recommended firmware levels before upgrading SVC? No. These are recommended levels, but not required. If you do plan to update firmware levels, focus on the host end first, switches next, and disk arrays last.
- How do we request special support for stuff not yet listed on the Interop Matrix?
Submit an RPQ/SCORE, same as for any other IBM hardware.
- How do we sign up for SVC hints and tips? Go to the IBM
[SVC Support Site] and select the "My Notifications" under the "Stay Informed" box on the right panel.
- When we call IBM for SVC support, do we select "Hardware" or "Software"?
While the SVC is a piece of hardware, there are very few mechanical parts involved. Unless there are sparks,
smoke, or front bezel buttons dangling from springs, select "Software". Most of the questions are
related to the software components of SVC.
- When we have SVC virtualizing non-IBM disk arrays, who should we call first?
IBM has world-renown service, with some of IT's smartest people working the queues. All of the major storage vendors play nice
as part of the [TSAnet Agreement when a mutual customer is impacted.
When in doubt, call IBM first, and if necessary, IBM will contact other vendors on your behalf to resolve.
- What is the difference between livedump and a Full System Dump?
Most problems can be resolved with a livedump. While not complete information, it is generally enough,
and is completely non-disruptive. Other times, the full state of the machine is required, so a Full System Dump
is requested. This involves rebooting one of the two nodes, so virtual disks may temporarily run slower on that
- What does "svc_snap -c" do?
The "svc_snap" command on the CLI generates a snap file, which includes the cluster error log and trace files from all nodes. The "-c" parameter includes the configuration and virtual-to-physical mapping that can be useful for
disaster recovery and problem determination.
- I just sent IBM a check to upgrade my TB-based license on my SVC, how long should I wait for IBM to send me a software license key?
IBM trusts its clients. No software license key will be sent. Once the check clears, you are good to go.
- During migration from old disk arrays to new disk arrays, I will temporarily have 79TB more disk under SVC management, do I need to get a temporary TB-based license upgrade during the brief migration period?
Nope. Again, we trust you. However, if you are concerned about this at all, contact IBM and they will print out
a nice "Conformance Letter" in case you need to show your boss.
- How should I maintain my Windows-based SVC Master Console or SSPC server?
Treat this like any other Windows-based server in your shop, install Microsoft-recommended Windows updates,
run Anti-virus scans, and so on.
- Where can I find useful "How To" information on SVC?
Specify "SAN Volume Controller" in the search field of the
[IBM Redbooks vast library of helpful books.
- I just added more managed disks to my managed disk group (MDG), can I get help writing a script to redistribute the extents to improve wide-striping performance?
Yes, IBM has scripting tools available for download on
[AlphaWorks]. For example, svctools will take
the output of the "lsinfo" command, and generate the appropriate SVC CLI to re-migrate the disks around to optimize
performance. Of course, if you prefer, you can use IBM Tivoli Storage Productivity Center instead for a more
- Any rules of thumb for sizing SVC deployments?
IBM's Disk Magic tool includes support for SVC deployments. Plan for 250 IOPS/TB for light workloads,
500 IOPS/TB for average workloads, and 750 IOPS/TB for heavy workloads.
- Can I migrate virtual disks from one manage disk group (MDG) to another of different extent size?
Yes, the new Vdisk Mirroring capability can be used to do this. Create the mirror for your Vdisk between the
two MDGs, wait for the copy to complete, and then split the mirror.
- Can I add or replace SVC nodes non-disruptively? Absolutely, see the Technotes
[SVC Node Replacement page.
- Can I really order an SVC EE in Flamingo Pink? Yes. While my blog post that started all
this [Pink It and Shrink It] was initially just some Photoshop humor, the IBM product manager for SVC accepted this color choice as an RPQ option.
The default color remains Raven Black.
technorati tags: IBM, SVC, Audacity of Cope, svc_snap, Flamingo pink, Raven black, non-disruptive, svctools, AlphaWorks
Continuing my ongoing discussion on Solid State Disk (SSD), fellow blogger BarryB (EMC) points out in his [latest post
Oh – and for the record TonyP, I don't think I ever said EMC was using a newer or different EFDs than IBM. I just asserted that EMC knows more than IBM about these EFDs and how they actually work a storage array under real-world workloads.
(Here "EFD" is refers to "Enterprise Flash Drive", EMC's marketing term for Single Layer Cell (SLC) NAND Flash non-volatile solid-state storage devices. Both IBM and EMC have been selling solid-state storage for quite some time now, but EMC felt that a new term was required to distinguish the SLC NAND Flash devices sold in their disk systems from solid-state devices sold in laptops or blade servers. The rest of the industry, including IBM, continues to use the term SSD to refer to these same SLC NAND Flash devices that EMC is referring to.)
The disagreement resulted from his earlier statement from his post[IBM's amazing...part deux]:
Although STEC asserts that IBM is using the latest ZeusIOPS drives, IBM is only offering the 73GB and 146GB STEC drives (EMC is shipping the latest ZeusIOPS drives in 200GB and 400GB capacities for DMX4 and V-Max, affording customers a lower $/GB, higher density and lower power/footprint per usable GB.)
Here is where I enjoy the subtleties between marketing and engineering. Does the above seem like he is saying EMC is using newer or different drives? What are typical readers expected to infer from the statement above?
- That there are four different drives from STEC, in four different capacities. In the HDD world, drives of different capacities are often different, and larger capacities are often newer than those of smaller capacities.
- That the 200GB and 400GB are the latest drives, and that 73GB and 146GB drives are not the latest.
- That STEC press release is making false or misleading claims.
Uncontested, some readers might infer the above and come to the wrong conclusions. I made an effort to set the record straight. I'll summarize with a simple table:
|Raw capacity||128 GB||256 GB||512 GB|
|Usable (conservative format)||73 GB||146 GB||300 GB|
|Usable (aggressive format)||100 GB||200 GB||400 GB|
So, we all agree now that the 256GB drives that are formatted as 146GB or 200GB are in fact the same drives, that IBM and EMC both sell the latest drives offered by STEC, and that the STEC press release was in fact correct in its claims.
I also wanted to emphasize that IBM chose the more conservative format on purpose. BarryB [did the math himself] and proved my key points:
- Under some write-intensive workloads, an aggressive format may not last the full five years. (But don't worry, BarryB assures us that EMC monitors these drives and replaces them when they fail within the five years under their warranty program.)
- Conservative formats with double the spare capacity happen to have roughly double the life expectancy.
I agree with BarryB that an aggressive format can offer a lower $/GB than the conservative format. Cost-conscious consumers often look for less-expensive alternatives, and are often willing to accept less-reliable or shorter life expectancy as a trade-off. However, "cost-conscious" is not the typical EMC targeted customer, who often pay a premiumfor the EMC label. To compensate, EMC offers RAID-6 and RAID-10 configurations to provide added protection. With a conservative format, RAID-5 provides sufficient protection.
(Just so BarryB won't accuse me of not doing my own math, a 7+P RAID-5 using conservative format 146GB drives would provide 1022GB of capacity, versus 4+4 RAID-10 configuration using aggressive format 200GB drives only 800GB total.)
In an ideal world, you the consumer would know exactly how many IOPS your application will generate over the next five years, exactly how much capacity you will require, be offered all three drives in either format to choose from, and make a smart business decision. Nothing, however, is ever this simple in IT.
technorati tags: IBM, SSD, EMC, EFD, SLC, NAND, Flash, disk, storage systems, life expectancy, reliability, capacity, Barry Burke, STEC, IOPS
My post last week [Solid State Disk on DS8000 Disk Systems
] kicked up some dust in the comment section.Fellow blogger BarryB (a member of the elite [Anti-Social Media gang from EMC
]) tried to imply that 200GB solid state disk (SSD) drives were different or better than the 146GB drives used in IBM System Storage DS8000 disk systems. I pointed out that they are the actual same physical drive, just formatted differently.
To explain the difference, I will first have to go back to regular spinning Hard Disk Drives (HDD). There are variances in manufacturing, so how do you make sure that a spinning disk has AT LEAST the amount of space you are selling it as? The solution is to include extra. This is the same way that rice, flour, and a variety of other commodities are sold. Legally, if it says you are buying a pound or kilo of flour, then it must be AT LEAST that much to be legal labeling. Including some extra is a safe way to comply with the law. In the case of disk capacity, having some spare capacity and the means to use it follows the same general concept.
(Disk capacity is measured in multiples of 1000, in this case a Gigabyte (GB) = 1,000,000,000 bytes, not to be confused with [Gibibyte (GiB)] = 1,073,741,824 bytes, based on multiples of 1024.)
Let's say a manufacturer plans to sell 146GB HDD. We know that in some cases there might be bad sectors on the disk that won't accept written data on day 1, and there are other marginally-bad sectors that might fail to accept written data a few years later, after wear and tear. A manufacturer might design a 156GB drive with 10GB of spare capacity and format this with a defective-sector table that redirects reads/writes of known bad sectors to good ones. When a bad sector is discovered, it is added to the table, and a new sector is assigned out of the spare capacity.Over time, the amount of space that a drive can store diminishes year after year, and once it drops below its rated capacity, it fails to meet its legal requirements. Based on averages of manufacturing runs and material variances, these could then be sold as 146GB drives, with a life expectancy of 3-5 years.
With Solid State Disk, the technology requires a lot of tricks and techniques to stay above the rated capacity. For example, you can format a 256GB drive as a conservative 146GB usable, with an additional 110GB (75 percent) spare capacity to handle all of the wear-leveling. You could lose up to 22GB of cells per year, and still have the rated capacity for the full five-year life expectancy.
Alternatively, you could take a more aggressive format, say 200GB usable, with only 56GB (28 percent) of spare capacity. If you lost 22GB of cells per year, then sometime during the third year, hopefully under warranty, your vendor could replace the drive with a fresh new one, and it should last the rest of the five year time frame. The failed drive, having 190GB or so usable capacity, could then be re-issued legally as a refurbished 146GB drive to someone else.
The wear and tear on SSD happens mostly during erase-write cycles, so for read-intensive workloads, such as boot disks for operating system images, the aggressive 200GB format might be fine, and might last the full five years.For traditional business applications (70 percent read, 30 percent write) or more write-intensive workloads, IBM feels the more conservative 146GB format is a safer bet.
This should be of no surprise to anyone. When it comes to the safety, security and integrity of our client's data, IBM has always emphasized the conservative approach.[Read More]
Well, it's Tuesday, which means IBM makes its announcements!
This week, IBM announces that it now supports 50GB Solid State Disk (SSD) in its [IBM System Storage EXP3000] disk systems.IBM has already made announcements about SSD enablement in the DS8000 and SAN Volume Controller (SVC), but now the EXP3000 brings SSD technology down to smaller System x server deployments.
Adoption of this new exciting technology is still in the early stages, despite the fact that IBM and other vendors have been touting this technology for a while. (For a quick blast to the past, here was my first post on the subject back from December 20, 2006: [Hybrid, Solid State and the future of RAID])Recently, fellow blogger BarryB admitted that EMC have only sold SSD to [hundreds of their customers], and to be fair, I suspect IBM's sales of SSD in its BladeCenter servers [available since July 2007] have been in similar single-digit percentage territory as well.
The advantage of today's announcement is that you can mix and match SSD drives with SAS and SATA drives in the EXP3000. You won't have to buy the entire drawer of SSD, you can start with just a few, depending on your business needs. On the other extreme, you can have up to two drawers, with 12 SSD drives each, for a total of 24 drives directly attached to System x servers via the ServeRAID MR10M SAS/SATA controller adapter.
technorati tags: IBM, System x, ServeRAID MR10M, SAS, SATA, 50GB, SSD, EMC, BladeCenter
Perhaps the recent financial meltdown is making storage vendors nervous.Both IBM and EMC gained market share in 3Q08, but EMC is acting strangelyat IBM's latest series of plays and announcements. Almost contradictory!
- Benchmarks bad, rely on your own in-house evaluations instead
Let's start with fellow blogger Barry Burke from EMC, who offers his latest post[Benchmarketing Badly] with commentaryabout Enterprise Strategy Group's [DS5300 Lab Validation Report]. The IBM System Storage DS5300 is one of IBM's latest midrange disk systems recently announced. Take for example this excerpt from BarryB's blog post:
"I was pleasantly surprised to learn that both IBM and ESG agree with me about the relevance and importance of the Storage Performance Council benchmarks.Nowhere in the ESG report says this, nor have I found any public statements from either IBM nor ESG that makes this claim. Instead, the ESG report explains that traditional benchmarks from the Storage Performance Council [SPC] focus on a single, specific workload, and ESG has chosen to complement this with a variety of other benchmarks to perform their product validation, including VMware's "VMmark", Oracle's Orion Utility, and Microsoft's JetStress.
That is, SPC's are a meaningless tool by which to measure or compare enterprise storage arrays."
Benchmarks provide prospective clients additional information to make purchasedecisions. IBM understands this, ESG understands this, and other well-respected companies like VMware, Oracle and Microsoft understand this. EMC is afraid that benchmarks mightencourage a client to "mistakenly" purchase a faster IBM product than a slower EMC product. Sunshine makes a great disinfectant, but EMC (and vampires) prefer their respective "prospects" remain in the dark.
Perhaps stranger still is BarryB's postscript. Here's an excerpt:
"... a customer here asked me if EMC would be willing to participate in an initiative to get multiple storage vendors to collaborate on truly representative real-world "enterprise-class" benchmarks, and I reassured him that I would personally sponsor active and objective participation in such an effort - IF he could get the others to join in with similar intent."
As I understand it, EMC was once part of the Storage Performance Council a long time ago, then chose to drop out of it. Why re-invent the wheel by creating yet another storage industry benchmark group? EMC is welcome to come back to SPC anytime! In addition to the SCP-1 and SPC-2 workloads, there is work underway for an SPC-3 benchmark. Each SPC workload provides additional insight for product comparisons to help with purchase decisions. If EMC can suggest an SPC-4 benchmark that it feels is more representative of real-world conditions, they are welcome to join the SPC party and make that a reality.
The old adage applies: ["It's better to light a candle than curse the darkness"]. EMC has been cursing the lack of what it considers to be acceptable benchmarks but has yet to offer anything more realistic or representative than SPC.What does EMC suggest you do instead? Get an evaluation box and run your own workloads and see for yourself! EMC has in the past offered evaluation units specifically for this purpose.
- In-house evaluations bad, it's a trap!
Certainly, if you have the time and staff to run your own evaluation, with your own applications in your own environment, then I agree with EMC that this can provide better insight for your particular situation than standardized benchmarks.
In fact, that is exactly what IBM is doing for IBM XIV storage units, which are designed for Web 2.0 and Digital Archive workloads that current SPC benchmarks don't focus on. Fellow blogger Chuck Hollis from EMC opines in his post[Get yer free XIV!]. Here's an excerpt:
"Now that I think about it, this could get ugly. Imagine a customer who puts one on the floor to evaluate it, and -- in a moment of desperation or inattention -- puts production data on the device.
Nobody was paying attention, and there you are. Now IBM comes calling for their box back, and you've got a choice as to whether to go ahead and sign the P.O., or migrate all your data off the thing. Maybe they'll sell you an SVC to do this?
Yuck. I bet that happens more than once. And I can't believe that IBM (or the folks at XIV) aren't aware of this potentially happening."
Perhaps Chuck is speaking from experience here, as this may have happened with customers with EMC evaluation boxes, and is afraid this could happen with IBM XIV. I don't see anything unique about IBM XIV in the above concern. Typical evaluations involve copying test data onto the box, test it out with some particular application or workload, and then delete the data no longer required. Repeat as needed. Moving data off an IBM XIV is aseasy as moving data off an EMC DMX, EMC CLARiiON or EMC Celerra, and I am sure IBM wouldgladly demonstrate this on any EMC gear you now have.
Thanks to its clever RAID-X implementation, losing data on an IBM XIV is less likely thanlosing data on any RAID-5 based disk array from any storage vendor. Of course, there will always be skeptics about new technology that will want to try the box out for themselves.
If EMC thought the IBM XIV had nothing unique to offer, that its performance was just "OK",and is not as easy to manage as IBM says it is, then you would think EMC would gladly encourage such evaluations and comparisons, right?
No, I think EMC is afraid that companies will discover what they already know, that IBM has quality products that would stand a fair chance of side-by-side comparisons with their own offerings.We have enough fear, uncertainty and doubt from our current meltdown of the global financial markets, don't let EMC add any more.
Have a safe and fun Halloween! If you need to add some light to your otherwise dark surroundings, consider some of these ideas for [Jack-O-Lanterns]!
technorati tags: IBM, DS5300, ESG, benchmarks, SPC, SPC-1, SPC-2, SPC-3, VMware, VMmark, Oracle, Orion, Microsoft, JetStress, EMC, BarryB, RAID-X, RAID-5, DMX, CLARiiON, Celerra, XIV, financial, global markets, crisis, meltdown, Halloween, Jack-O-Lantern
In my post yesterday [Spreading out the Re-Replication process
], fellow blogger BarryB [aka The Storage Anarchist
]raises some interesting points and questions in the comments section about the new IBM XIV Nextra architecture.I answer these below not just for the benefit of my friends at EMC, but also for my own colleagues within IBM,IBM Business Partners, Analysts and clients that might have similar questions.
- If RAID 5/6 makes sense on every other platform, why not so on the Web 2.0 platform?
Your attempt to justify the expense of Mirrored vs. RAID 5 makes no sense to me. Buying two drives for every one drive's worth of usable capacity is expensive, even with SATA drives. Isn't that why you offer RAID 5 and RAID 6 on the storage arrays that you sell with SATA drives?Let's take a look at various disk configurations, for example 3TB on 750GB SATA drives:
And if RAID 5/6 makes sense on every other platform, why not so on the (extremely cost-sensitive) Web 2.0 platform? Is faster rebuild really worth the cost of 40+% more spindles? Or is the overhead of RAID 6 really too much for those low-cost commodity servers to handle.
- JBOD: 4 drives
- JBOD here is industry slang for "Just a Bunch of Disks" and was invented as the term for "non-RAID".Each drive would be accessible independently, at native single-drive speed, with no data protection. Puttingfour drives in a single cabinet like this provides simplicity and convenience only over four separate drivesin their own enclosures.
- RAID-10: 8 drives
- RAID-10 is a combination of RAID-1 (mirroring) and RAID-0 (striping). In a 4x2 configuration, data is striped across disks 1-4,then these are mirrored across to disks 5-8. You get performance improvement and protection against a singledrive failure.
- RAID-5: 5 drives
- This would be a 4+P configuration, where there would be four drives' worth of data scattered across fivedrives. This gives you almost the same performance improvement as RAID-10, similar protection againstsingle drive failure, but with fewer drives per usable TB capacity.
- RAID-6: 6 drives
- This would be a 4+2P configuration, where the first P represents linear parity, and the second represents a diagonal parity. Similar in performance improvement as RAID-5, but protects against single and double drive failures, and still better than RAID-10 in terms of drives per TB usable capacity.
For all the RAID configurations, rebuild would require a spare drive, but often spares are shared among multiple RAID ranks, not dedicated to a single rank. To this end, you often have to have several spares per I/O loop, and a different set of spares for each kind of speed and capacity. If you had a mix of 15K/73GB, 10K/146GB, and 7200/500GB drives, then you would have three sets of spares to match.
In contrast, IBM XIV's innovative RAID-X approach doesn't requireany spare drives, just spare capacity on existing drives being used to hold data. The objects can be mirroredbetween any two types of drives, so no need to match one with another.
All of these RAID levels represent some trade-off between cost, protection and performance, and IBM offers each of theseon various disk systems platforms. Calculating parity is more complicated than just mirrored copies, but this can be done with specialized chips in cache memory to minimize performance impact.IBM generally recommends RAID-5 for high-performance FC disk, and RAID-6 for slower, large capacity SATA disk.
However, the questionassumes that the drive cost is a large portion of the overall "disk system" cost. It isn't. For example,Jon Toigo discusses the cost of EMC's new AX4 disk system in his post [National Storage Rip-Off Day]:
- EMC is releasing its low end Clariion AX4 SAS/SATA array with 3TB capacity for $8600. It ships with four 750GB SATA drives (which you and I could buy at list for $239 per unit). So, if the disk drives cost $956 (presumably far less for EMC), that means buyers of the EMC wares are paying about $7700 for a tin case, a controller/backplane, and a 4Gbps iSCSI or FC connector. Hmm.
- Dell is offering EMC’s AX4-5 with same configuration for $13,000 adding a 24/7 warranty.
(Note: I checked these numbers. $8599 is the list price that EMC has on its own website. External 750GB drivesavailable at my local Circuit City ranged from $189 to $329 list price. I could not find anything on Dell'sown website, but found [The Register] to confirm the $13,000 with 24x7 warranty figure.)
Disk capacity is a shrinking portion of the total cost of ownership (TCO). In addition to capacity, you are paying forcache, microcode and electronics of the system itself, along with software and services that are included in the mix,and your own storage administrators to deal with configuration and management. For more on this, see [XIV storage - Low Total Cost of Ownership].
- EMC Centera has been doing this exact type of blob striping and protection since 2002
As I've noted before, there's nothing "magic" about it - Centera has been employing the same type of object-level replication for years. Only EMC's engineers have figured out how to do RAID protection instead of mirroring to keep the hardware costs low while not sacrificing availability.
I agree that IBM XIV was not the first to do an object-level architecture, but it was one of the first to apply object-level technologies to the particular "use case" and "intended workload" of Web 2.0 applications.
RAID-5 based EMC Centera was designed insteadto hold fixed-content data that needed to be protected for a specific period of time, such as to meet government regulatory compliance requirements. This is data that you most likelywill never look at again unless you are hit with a lawsuit or investigation. For this reason, it is important to get it on the cheapest storage configuration as possible. Before EMC Centera, customers stored this data on WORM tape and optical media, so EMC came up with a disk-only alternative offering.IBM System Storage DR550 offers disk-level access for themost recent archives, with the ability to migrate to much less expensive tape for the long term retention. The end result is that storing on a blended disk-plus-tape solution can help reduce the cost by a factor of 5x to 7x, making RAID level discussion meaningless in this environment. For moreon this, see my post [OptimizingData Retention and Archiving].
While both the Centera and DR550 are based on SATA, neither are designed for Web 2.0 platforms.When EMC comes out with their own "me, too" version, they will probably make a similar argument.
- IBM XIV Nextra is not a DS8000 replacement
Nextra is anything but Enterprise-class storage, much less a DS8000 replacement. How silly of all those folks to suggest such a thing.
I did searches on the Web and could not find anybody, other than EMC employees, who suggested that IBM XIV Nextra architecture represented a replacement for IBM System Storage DS8000. The IBM XIV press release does not mentionor imply this, and certainly nobody I know at IBM has suggested this.
The DS8000 is designed for a different "use case" andset of "intended workloads" than what the IBM XIV was designed for. The DS8000 is the most popular disk systemfor our IBM System z mainframe platform, for activities like Online Transaction Processing (OLTP) and large databases, supporting ESCON and FICON attachment to high-speed 15K RPM FC drives. Web 2.0 customers that might chooseIBM XIV Nextra for their digital content might run their financial operations or metadata search indexes on DS8000.Different storage for different purposes.
As for the opinion that this is not "enterprise class", there are a variety of definitions that refer to this phrase.Some analysts look at "price band" of units that cost over $300,000 US dollars. Other analysts define this as beingattachable to mainframe servers via ESCON or FICON. Others use the term to refer to five-nines reliability, havingless than 5 minutes downtime per year. In this regard, based on the past two years experience at 40 customer locations,I would argue that it meets this last definition, with non-disruptive upgrades, microcode updates and hot-swappable components.
By comparison, when EMC introduced its object-level Centera architecture, nobody suggested it was the replacement for their Symmetrix or CLARiiON devices. Was it supposed to be?
- Given drive growth rates have slowed, improving utilization is mandatory to keep up with 60-70 percent CAGR
Look around you, Tony- all of your competitors are implementing thin provisioning specifically to drive physical utilization upwards towards 60-80%, and that's on top of RAID 5/RAID 6 storage and not RAID 1. Given that disk drive growth rates and $/GB cost savings have slowed significantly, improving utilization is mandatory just to keep up with the 60-70% CAGR of information growth.
Disk drive capacities have slowed for FC disk because much of the attention and investment has been re-directed to ATA technology. Dollar-per-GB price reduction is slowing for disks in general, as researchers are hitting physicallimitations to the amount of bits they can pack per square inch of disk media, and is now around 25 percent per year.The 60-70 percent Compound Annual Growth Rate (CAGR) is real, and can be even growing faster for Web 2.0providers. While hardware costs drop, the big ticket items to watch will be software, services and storage administrator labor costs.
To this end, IBM XIV Nextra offers thin provisioning and differential space-efficient snapshots. It is designed for 60-90 percent utilization, and can be expanded to larger capacities non-disruptively in a very scalable manner.
Well, I hope that helps clear some things up.
technorati tags: IBM, XIV, Nextra, EMC, BarryB, RAID-0, RAID-1, RAID-5, RAID-6, RAID-10, RAID-X, AX4, Dell, AX4-5, FC, SAS, SATA, iSCSI, TCO, blob, object-level, disk, storage, system, Centera, ESCON, FICON, Symmetrix, CLARiiON, ATA, CAGR, Web2.0
On his The Storage Architect
blog, Chris Evans wrote [Twofor the Price of One
]. He asks: why use RAID-1 compared to say a 14+2 RAID-6 configuration which would be much cheaper in terms of the disk cost?
Perhpaps without realizing it, answers itwith his post today [XIV part II
So, as a drive fails, all drives could be copying to all drives in an attempt to ensure the recreated lost mirrors are well distributed across the subsystem. If this is true, all drives would become busy for read/writes for the rebuild time, rather than rebuild overhead being isolated to just one RAID group.
Let me try to explain. (Note: This is an oversimplification of the actual algorithm in an effortto make it more accessible to most readers, based on written materials I have been provided as partof the acquisition.)
In a typical RAID environment, say 7+P RAID-5, you might have to read 7 drives to rebuild one drive, and in the case of a 14+2 RAID-6, reading 15 drives to rebuild one drive. It turns out the performance bottleneck is the one driveto write, and today's systems can rebuild faster Fibre Channel (FC) drives at about 50-55 MB/sec, and slower ATA disk at around 40-42 MB/sec. At these rates, a 750GB SATA rebuild would take at least 5 hours.
In the IBM XIV Nextra architecture, let's say we have 100 drives. We lose drive 13, and we need to re-replicate any at-risk 1MB objects.An object is at-risk if it is the last and only remaining copy on the system. A 750GB that is 90 percent full wouldhave 700,000 or so at-risk object re-replications to manage. These can be sorted by drive. Drive 1 might have about 7000 objects that need re-replication, drive 2might have slightly more, slightly less, and so on, up to drive 100. The re-replication of objects on these other 99 drives goes through three waves.
- Wave 1
Select 49 drives as "source volumes", and pair each randomly with a "destination volume". For example, drive 1 mapped todrive 87, drive 2 to drive 59, and so on. Initiate 49 tasks in parallel, each will re-replicate the blocks thatneed to be copied from the source volume to the destination volume.
- Wave 2
50 volumes left.Select another 49 drives as "source volumes", and pair each with a "destination volume". For example, drive 87 mapped todrive 15, drive 59 to drive 42, and so on. Initiate 49 tasks in parallel, each will re-replicate the blocks thatneed to be copied from the source volume to the destination volume.
- Wave 3
Only one drive left. We select the last volume as the source volume, pair it off with a random destination volume,and complete the process.
Each wave can take as little as 3-5 minutes. The actual algorithm is more complicated than this, as tasks complete early the source and volumes drives are available for re-assignment to another task, but you get the idea. XIV hasdemonstrated the entire process, identifying all at-risk objects, sorting them by drive location, randomly selectingdrive pairs, and then performing most of these tasks in parallel, can be done in 15-20 minutes. Over 40 customershave been using this architecture over the past 2 years, and by now all have probably experienced at least adrive failure to validate this methodology.
In the unlikely event that a second drive fails during this short time, only one of the 99 task fails. The other 98 tasks continue to helpprotect the data. By comparison, in a RAID-5 rebuild, no data is protected until all the blocks are copied.
As for requiring spare capacity on each drive to handle this case, the best disks in production environments aretypically only 85-90 percent full, leaving plenty of spare capacity to handle re-replication process. On average,Linux, UNIX and Windows systems tend to only fill disks 30 to 50 percent full, so the fear there is not enough sparecapacity should not be an issue.
The difference in cost between RAID-1 and RAID-5 becomes minimal as hardware gets cheaper and cheaper. For every $1 dollar you spend on storage hardware, you spend $5-$8 dollars managing the environment. As hardware gets cheaper still, it might even be worth making three copies of every 1MB object, the parallel processto perform re-replications would be the same. This could be done using policy-based management, some data gets triple-copied, and other data gets only double-copied, based on whether the user selected "premium" or "basic" service.
The beauty of this approach is that it works with 100 drives, 1000 drives, or even a million drives. Parallel processingis how supercomputers are able to perform feats of amazing mathematical computations so quickly, and how Web 2.0services like Google and Yahoo can perform web searches so quickly. Spreading the re-replication process acrossmany drives in parallel, rather than performing them serially onto a single drive, is just one of the many uniquefeatures of this new architecture.
technorati tags: Chris Evans, RAID-1, RAID-5, RAID-6, performance, bottleneck, FC, SATA, disk, system, IBM, XIV, Nextra, objects, re-replication, spare capacity
Well, we had another successful event in Second Life today.
Unlike our April 26 launch of our System Storage products for IBM Business Partners only, this time we decided this time to make it as a "Meet the Storage Experts" Q&A Panel format, and open up registration to everyone. Thesubject matter experts sat at the front of the room on four stools. We had six rows of chairs arrangedsemi-circularly.
Shown above, from left to right, are the avatars of our four experts:
- Steve Grillo
- IBM System Storage N series, focusing on recent N3000 disk system announcements
- Harold Pike (holding the microphone while speaking)
- IBM System Storage DS3000 and DS4000 series, focusing on recent DS3000 disk system announcements
- Eric Buckley
- IBM System Storage TS series, focusing on recent TS2230, TS3400 and TS7700 tape system announcements
- Pete Danforth
- IBM storage networking, focusing on recent IBM SAN256B director blade announcements
(you can read more about these products here:July announcements
While Eric was a veteran Second Lifer, having presented at our April event, the other three were trainedon how to raise their hand, speak into the microphone, sit on the stool, and so on. I want to thank allof our experts for putting in this effort!
The event was produced by Katrina H Smith. She did a great job, and made sure we were on top ofall the issues and tasks required to get the job done. Running a Second Life event is every bit ashard as running a real face-to-face event. We had several meetings to discuss venue details, placementof chairs, placement of product demos, audio/video recording, wall decorations, tee-shirt and coffee mug design, logistics, and so on.
I acted as moderator/emcee for the event. That is my back in the picture above. The process wassimple, modeled after the "Birds of a Feather" sessions at events like SHARE and the IBMStorage and Storage Networking Symposium. We threw out a list of topics the experts would cover,and people in the audience would "raise their left hand". I, as the moderator, would then walkover to each person, and hold out the microphone for them to ask the question. I would then repeat the question and ask the appropriate expert to provide an answer. We defined gestures onhow to "raise hand" and "put hand down" that we gave to each registered participant.
We had four dedicated "camera-avatars" in world to capture both video and screenshots.Our video editors are now working to edit "highlight videos" that we can use at future events, for training materials, and for our internal "BlueTube" online video system.
The room was filled with examples of each of our products, made into 3D objects that were dimensionallycorrect, and "textured" with photographs of the actual products. If you click on an object, you get a "notecard" that provided more information. Special thanks to Scott Bissmeyer for making all of theseobjects for us.
We made posters of each expert and placed them in all four corners of the room. On the bottom of each coffee mug was a picture of each of the experts, and if you walked under each of the posters, you were"dispensed" a coffee mug matching the expert shown in the poster.Participants could "Collect all Four!" When you bring the coffee mug up to takea sip, the picture on the bottom of the mug is exposed for all to see.And as a final give-away to the audience, we made a variety of event tee-shirts and polo-shirts.
At the end of the session, we asked everyone to click on the "Survey" kiosk near the exit door. We askedsix simple questions using SurveyMonkey.com that took only a fewminutes to process. We found asking questions immediately at the end of the event was the best way tocapture this feedback.
From a "Green" perspective, we had people registered from the following countries: US, India, Mexico,Australia, United Kingdom, Brazil, Germany, Argentina, Chile, China, Canada, and Venezuela. Second Lifeallows all these people who probably could not travel, or could not afford the time and expense to travel,to participate in a simulated face-to-face meeting without energy consumption of traditional travel methods.
More importantly, we got several leads for business. People often ask "Yes, but is there any businessassociated with this?" This time, there was, based on the answers to the questions, several avatars asked for a real sales call to follow-up on the products and offerings they were discussed.
With such a great success, we have already scheduled our next Second Life event, November 8. Mark your calendars! I'll postmore details on the registration process of the November event when available.
technorati tags: IBM, secondlife, meet, the, storage, experts, Steve Grillo, Harold Pike, Eric Buckley, Pete Danforth, Katrina Smith, Scott Bissmeyer, US, India, Mexico, Australia, UK, Brazil, Germany, Argentina, Chile, China, Canada, Venezuela, Green, business
The proof-of-concept that IBM Haifa research center developed back in 1998 became what we now call the iSCSI protocol.The book iSCSI: The Universal Storage Connection
introduces the history as follows:
In the fall of 1999 IBM and Cisco met to discuss the possibility of combining their SCSI-over-TCP/IP efforts. After Cisco saw IBM's demonstration of SCSI over TCP/IP, the two companies agreed to develop a proposal that would be taken to the IETF for standardization.
There are three ways to introduce iSCSI into your data center:
- Through a gateway, like the IBM System Storage N series gateway, that allows iSCSI-based servers connect to FC-based storage devices
- Through a SAN switch or director, a FC-based server can access iSCSI-based storage, an iSCSI-based server accessing FC-based storage, or even iSCSI-based servers attaching to iSCSI-based storage.
- Directly through the storage controller.
IBM has been delivering the first method with its successful IBM System Storage N series gateway products, buttoday we have announced additional support for the second and third methods.Here's a quick recap.
- New SAN director blades
Supporting the second method, IBM TotalStorage SAN256B Director is enhanced to deliver iSCSI functionality with a new M48 iSCSI Blade, which includes 16 ports (8 Fibre Channel ports; and 8 Ethernet ports for iSCSI connectivity). We also announced a new Fibre Channel M48 Blade which provides 10 Gbps Fibre Channel Inter Switch Link (ISL) connectivity between SAN256B Directors.
- Entry Level Disk Systems
Supporting the third method, IBM introduces new iSCSI-capabable disk systems, including the IBM System Storage DS3300 model using SAS drives, N3300 A10/A20 models and N3600 A10/A20 models supporting use of FC and SATA drives.The DS3000 Express models include frequently requested options, including appropriate host bus adapters (HBA) and cables. Likewise, we have announced hardware features for our IBM N series , such as TCP offload engine (TOE) network interface cards (NIC), HBA and cables.
With support for Boot-over-iSCSI, diskless rack-optimized and blade servers can boot Windows or Linux over Ethernet,eliminating the management hassles with internal disk.
All of this is part of IBM's overall push into the Small and Medium size Business marketplace, making it easier to shop for and buy from IBM and its many IBM Business Partners, easier to deploy and install storage, and easier tomanage the storage once you have it.
technorati tags: IBM, Cisco, iSCSI, gateways, announcement, DS3300, N3300, N3600, A10, A20, SAS, FC, SATA, HBA, TOE, NIC, cables, SCSI, TCP, boot-over-iSCSI, SMB, Ethernet