Comments (3) Visits (9330)
A few weeks ago, my Tivo(R) digital video recorder (DVR) died. All of my digital clocks in my house were flashing 12:00 so I suspect it wasa power strike while I was at the office. The only other item to die was the surge protector,and so it did what it was supposed to do, give up its own life to protect the rest of myequipment. Although somehow, it did not protect my Tivo.
I opened a problem ticket with Sony, and they sent me instructions on how to send itover to another state to get it repaired.Amusingly, the instructions included "Please make a backup of the drive contents beforesending the unit in for repair." Excuse me? How am I supposed to do that, exactly?
My model has only a single 80GB drive, and so my friend and I removed the drive and attachedit to one of our other systems to see if anything was salvageable. It failed every diagnostictest. There was just not enough to read to be usable elsewhere.
This is typical of many home systems. They are not designed for robust usage, high availability, nor any form of backup/recovery process. Some of the newer models havetwo drives in a RAID-1 mode configuration, but most have many single points of failure.
And certainly, it is not mission critical data. Life goes on without the last few episodesof Jack Bauer on "24", or the various Food Network shows that I recorded for items I planto bake some day. For the past few weeks, I have spent more time listening to the radioand reading books. Somehow, even though my television runs fine without my Tivo, watchingTV in "real time" just isn't the same.
I suspect that if you gave someone a method to do the backup, most would not bother to useit. People are now relying more and more heavily on their home
The smart people at the University of Pittsburgh manage five campuses and over 33,000 students, andneeded to create an enterprise storage solution that would give it three key benefits. Of course, they turnedto IBM, the number one overall storage hardware vendor, to deliver.
Here is what Jinx Walton, Director of Computing Services and Systems Development at the University of Pittsburgh, had to say about it...
"The University of Pittsburgh supports large enterprise systems, and the number and complexity of new systems continue to grow. To effectively manage these systems it was necessary to identify an enterprise storage solution that would leverage our existing investments in storage, make allocation of storage flexible and responsive to project needs, provide centralized management, and offer the reliability and stability we require. The integrated IBM storage solution met these requirements"
You can read the details in the official IBM press release.Read More]
Comment (1) Visits (9132)
Often, when looking at disk storage it is easy to focus on comparisons to other disk storage, but disruptive technologies cross boundaries. Already we have seen Flash Memory drives on the IBM BladeCenter, replacing traditional disk drives internal to each blade server. They are smaller than regular disk drives, but big enough to hold the operating system to boot from.
The New York Times has an article by John Markoff, Redefining the Architecture of Memory that talks about IBM's research on "Racetrack Memory".The article is a good read, but here are some interesting excerpts:
Now, if an idea that Stuart S. P. Parkin is kicking around in an I.B.M. lab here is on the money, electronic devices could hold 10 to 100 times the data in the same amount of space.
This technology has the potential to break some of the physical limitations that are currently worrying disk drive designers. I look forward to see how this plays out.Read More]
This Doonesbury cartoonabout Second Life reminded me about our September 20 event.
Registration for the "Meet the Storage Experts" event in Second Life will close this week fornext week's September 20 event. All IBMers, clients and IBM Business Partners are welcome to attend. We will focus this time on DS3000 and N series disk systems, tape systems,and IBM storage networking gear.
If you miss this one, we plan to have another one in November!Read More]
Comments (6) Visits (12678)
The Storage Architect writes in his post:
Array-based replication does have drawbacks; all externalised storage becomes dependent on the virtualising array. This makes replacement potentially complex. To date, HDS have not provided tools to seamlessly migrate away from one USP to another (as far as I am aware). In addition, there's the problem of "all your eggs in one basket"; any issue with the array (e.g. physical intervention like fire, loss of power, microcode bug etc) could result in loss of access to all of your data. Consider the upgrade scenario of moving to a higher level of code; if all data was virtualised through one array, you would want to be darn sure that both the upgrade process and the new code are going to work seamlessly...
I would argue that the IBM System Storage SAN Volume Controller (SVC) is more like the HDS USP, and less like the Invista. Both SVC and USP provide a common look and feel to the application server, both provide additional cache to external disk, both are able to provide a consistent set of copy services.
IBM designed the SVC so that upgrades can occur non-disruptively. You can replace the hardware nodes, one node at a time, while the SVC system is up and running, without disruption to reading and writing data on virtual disk. You can upgrade the software, one node at a time, while the SVC system is up and running, without disruption to reading and writing data on virtual disk. You can upgrade the firmware on the managed disk arrays behind the SVC, again, without disruption to reading and writing data on virtual disk.
More importantly, SVC has the ultimate "un-do" feature. It is called "image mode". If for any reason you want to take a virtual disk out of SVC management, you migrate over to an "image mode" LUN, and then disconnect it from SVC. The "image mode" LUN can then be used directly, with all the file system data in tact.
I define "virtualization" as technology that makes one set of resources look and feel like a different set of resources with more desirable characteristics. For SVC, the more desirable characteristics include choice of multi-pathing driver, consistent copy services, improved performance, etc. For EMC Invista, the question is "more desirable for whom?" EMC Invista seems more designed to meet EMC's needs, not its customers. EMC profits greatly from its EMC PowerPath multi-pathing driver, and from its SRDF copy services, so it appears to have designed a virtualization offering that:
A post from Dan over at Architectures of Control explains the anti-social nature of public benches. City planners, in an effort to discourage homeless people from sleeping on benches in parks or sidewalks, design benches that are so uncomfortableto use, that nobody uses them. These included benches made of metal that are too hot or too cold during certainmonths, benches slanted at an angle that dump you on the ground if you lay down, or benches that have dividers sothat you must be in an upright seated position to use.
This is not a disparagement of split-path switch-based designs. Rather, EMC's specific implementation appears to be designed for it to continuevendor lock-in for its multi-pathing driver, continuevendor lock-in for its copy services when used with EMC disk, and only provide slightly improved data migration capability for heterogeneous storage environments. Other switch-based solutions, such as those from Incipient or StoreAge, had different goals in mind.
Sadly, my IBM colleague BarryW and I have probably spent more words discussing Invista than all eleven EMC bloggers combined this year. While everyone in the industry is impressed how often EMC can sell "me, too" products with an incredibly large marketing budget, EMC appears not to have set aside funds for the Invista.
If a customer could design the ideal "storage virtualization" solution that would provide them the characteristics they desire the most from storage resources, it would not be anything like an Invista. While there are pros and cons between IBM's SVC and HDS's TagmaStore offerings, the reason both IBM and HDS are the market leaders in storage virtualization is because both companies are trying to provide value to the customer, just in different ways, and with different implementations.
Comments (6) Visits (10413)
When new technologies are introduced to the marketplace, it is normal for customers to be skeptical.
My sister is a mechanical engineer, so when she needs to configure a part or component, she candesign it on the computer, and then use a "Rapid Prototyping Machine"that acts like a 3D printer, to generate a plastic part that matches the specifications. Some machinesdo this by taking a hunk of plastic and cutting it down to the appropriate shape, and others use glue andpowder to assemble the piece.
But not everything is that simple. Harry Beckwith deals with the issue of selling services and software featuresin his book "Selling the Invisible". How do you sell a service before it is performed? How do you sell a softwarefeature based on new technology that the customer is not familiar with?
Our good friends over at NetApp, our technology partners for the IBM System Storage N series, developed a"storage savings estimator" tool that can provide good insight into the benefits of Advanced Single InstanceStorage (A-SIS) deduplication feature.
I decided to run the tool to analyze my own IBM Thinkpad C: drive (Windows operating system and programs) and D: drive ("My Documents" folder containing all my data files) to see how much storage savings thetool would estimate. Here are my results:
WINXP-C-07G (C: drive)Total Number of Directories: 1272Total Number of Files: 56265Total Number of Symbolic Links: 0Total Number of Hard Links: 41996Total Number of 4k Blocks: 2395884Total Number of 512b Blocks: 18944730Total Number of Blocks: 2395884Total Number of Hole Blocks: 290258Total Number of Unique Blocks: 1611792Percentage of Space Savings: 20.61Scan Start Time: Wed Sep 5 14:37:06 2007Scan End Time: Wed Sep 5 14:53:51 2007WINXP-D-07H (D: drive)Total Number of Directories: 507Total Number of Files: 7242Total Number of Symbolic Links: 0Total Number of Hard Links: 11744Total Number of 4k Blocks: 3954712Total Number of 512b Blocks: 31610595Total Number of Blocks: 3954712Total Number of Hole Blocks: 3204Total Number of Unique Blocks: 3524605Percentage of Space Savings: 10.79Scan Start Time: Wed Sep 5 14:21:16 2007Scan End Time: Wed Sep 5 14:34:30 2007
I am impressed with the results, and have a better understanding of the way A-SIS works. A-SIS looks at every4kB block of data, and creates a "fingerprint", a type of hash code of the contents. If two blocks have different "fingerprints", then the contents are known to be different. If two blocks have the same fingerprint, it is mathematically possible for them to be unique in content, so A-SIS schedules a byte-for-byte comparison to be sure they are indeed the same. This might happen hours after the block is initially written to disk, but is a much safer implementation, and does not slow down the applications writing data.
(In an effort to provide support "real time" as data was being written, earlier versions of deduplication
The estimator tool runs on any x86-based Laptop, personal computer or server, and can scan direct-attached, SAN-attached, or NAS-attached file systems. If you are a customer shopping around for deduplication, ask your IBM pre-sales technical support, storage sales rep, or IBM Business Partner to analyze your data. Tools like this can help make a simple cost-benefit analysis: the cost of licensing the A-SIS software feature versus the amount of storage savings.
technorati tags: IBM, Rapid prototyping, 3D printer, Harry Beckwith, Selling the Invisible, IBM, NetApp, Advanced Single Instance Storage, A-SIS, deduplication, fingerprint, hash code, EMC, flaw, MD5, Centera
Comments (4) Visits (12105)
The proof-of-concept that IBM Haifa research center developed back in 1998 became what we now call the iSCSI protocol.The book iSCSI: The Universal Storage Connection introduces the history as follows:
In the fall of 1999 IBM and Cisco met to discuss the possibility of combining their SCSI-over-TCP/IP efforts. After Cisco saw IBM's demonstration of SCSI over TCP/IP, the two companies agreed to develop a proposal that would be taken to the IETF for standardization.
There are three ways to introduce iSCSI into your data center:
IBM has been delivering the first method with its successful IBM System Storage N series gateway products, buttoday we have announced additional support for the second and third methods.Here's a quick recap.
With support for Boot-over-iSCSI, diskless rack-optimized and blade servers can boot Windows or Linux over Ethe All of this is part of IBM's overall push into the Small and Medium size Business marketplace, making it easier to shop for and buy from IBM and its many IBM Business Partners, easier to deploy and install storage, and easier tomanage the storage once you have it.
All of this is part of IBM's overall push into the Small and Medium size Business marketplace, making it easier to shop for and buy from IBM and its many IBM Business Partners, easier to deploy and install storage, and easier tomanage the storage once you have it.
In his blog Rough Type, Nick Carr asks Where is my CloudBook?and points to John Markoff's 2-part series in the New York Times on computing in the clouds.(Read it here: Part 1, Part 2)
At first, I thought he meant computing while in an airplane, but instead, he is talking about computing on a laptop or other hand-held device that does not have an internal disk drive, no installedoperating system, no internal data storage. Instead, the idea is that you boot from a CD, accessyour data, and even some of your programs, over the internet. John used an Ubuntu Linux LiveCD in his example.
This week, I am in Sao Paulo, Brazil, and was "in the clouds" for over 10 hours flying from Dallas to here.The one time I am guaranteed "off-line" from the internet is on the plane, and I spend enough time on planesthat I am able to get work done despite being "disconnected".
The same reasons people want to get out of having a disk drive on their laptop, are the reasons data centersare getting out of internal disk on their servers.
Booting from CD is especially clever. No more worrying about fixing your Windows registry, viruses,corrupted operating system files, or the cruft that accumulates on your C: drive that slowsyou down. The CD is the sameevery time, so it is like running your system with a freshly installed operating system every day.
The need for central repositories of data harkens back to the years of the IBM mainframe. Of course, whatmade sense back then continues to make sense now. The old 3270 terminals stored no data, and instead merelyprovided keyboard input and display text screen output to the vast amount of data stored on the central system.Today, the inputs are different, using your finger or mouse instead to point to what you want, sliding itacross to make things happen, and the output may now include photos, audio and video, but the concept isstill the same.
I carry my Ubuntu Linux LiveCD with me on every business trip. Combined with externally rewriteable media,such as a USB key, you can get work done even when you are in an airplane, and upload it whenyou are back on the net.
The IBM Storage and Storage Networking Symposium continues ...
technorati tags: Phil Allison, Fidelity National, FIS, PAIO, IBM, DS8300, Global Mirror, Ciena, CN2000, Cisco, 9216i, SDH, SONET, Gigabit, Ethernet, GigE, Layer-2, Layer-3, Fibre Channel, network impairment, Dave Canan, bare metal restore, bare machine recovery, Tivoli Storage Manager, TSM, ASR, Ghost, Cristie, CBMR, Linux, Windows PE, Bart PE, Bill+Giles, hospital, PACS, cardiology, radiology, storage, utilization, BOF, CKD, ESCON, FICON, FCP, SCSI, iSCSI, z/OS, z/VM[Read More]
Comments (2) Visits (11372)
The IBM Storage and Storage Networking Symposium in Las Vegas continues ...
I can tell that many people are feeling like they are "drinking from a firehose".IBM's success in storage reaches out to so many different aspects of information management,a variety of industries, and disciplines as varied as regulatory compliance and medical imaging.
technorati tags: IBM, storage, symposium, NAS, Vmware, N series, Allison Pate, Ron Henkhaus, DR550, express, Business Continuity, iostat, AIX, SLA, TS1120, tape, drive, LTO, LTO-4, Tony Abete, encryption, key, management, drinking firehose[Read More]
Comment (1) Visits (9179)
Registration is now open for our next "Meet the Storage Experts" event in Second Life. All IBMers, clients and IBM Business Partners are welcome to attend. We will focus this time on DS3000 and N series disk systems, tape systems,and IBM storage networking gear.
Comment (1) Visits (11674)
I have arrived safely in Las Vegas for the IBM System Storage and Storage Networking Symposium. This eventis held once every year. The gold sponsors were: Brocade, Cisco, Finisar, Servergraph, and VMware. Our silversponsor was Qlogic.
Barry Rudolph was the keynote speaker with "Storage for the Green Data Center", similar to his presentationfor Storage Networking World in April, but with new and improved slides.
I myself had a busy day. Here's a quick recap:
The last session I attended was "Storage .. to Optimize your ECM depoloyments" by Jerry Bower, now working for IBM as part of our recent acquisition of the Filenet company. ECM stands for Enterprise Content Management, and IBM is the market leader in this space. Jerry gave a great overview of IBM Content Manager software suite, our newly acquired Filenet portfolio, and the storage supported.
After the sessions was a reception at the Solution Center with dozens of exhibitor booths. For example,Optica Technologies had their PRIZM productswhich are able to connect FICON servers to ESCON storage devices.
technorati tags: IBM, storage, networking, symposium, Brocade, Cisco, Finisar, Servergraph, VMware, Qlogic, Barry Rudolph, green, datacenter, strategy, ILM, ITIL, SNIA, SMI-S, offering, disk, tape, software, SAN Volume Controller, SVC, David Snyder, Mark Prybylski, Jerry Bower, Filenet, ECM, Optica, FICON, ESCON[Read More]
Comment (1) Visits (9959)
I am back at "the Office" for a single day today. This happens often enough I need a name for it.Air Force pilots that practice landing and take-offs call them "Touch and Go", but I think I needsomething better. If you can think of a better phrase, let me know.
This week, I was in Hartford, CT, Somers, NY and our Corporate Headquarters in Armonk, in a varietyof meetings, some with editors of magazines, others with IBMers I have only spoken to over the phone andfinally got a chance to meet face to face.
I got back to Tucson last night, had meetings this morning in Second Life, then presented "Inf Sunday, I leave for Las Vegas for our upcoming IBM Storage and Storage Networking Symposium. We will cover the latest in our disk, tape, storage networking and related software.Do you have your tickets? If you plan to attend, and want to meet up with me, let me know.
Sunday, I leave for Las Vegas for our upcoming IBM Storage and Storage Networking Symposium. We will cover the latest in our disk, tape, storage networking and related software.Do you have your tickets? If you plan to attend, and want to meet up with me, let me know.Read More]
Last week, a writer for a magazine contacted us at IBM to confirm a quote that writing a Terabyte (TB) on disk saves 50,000 trees. I explained that this was cited from UC Berkeley's famousHow Much Information? 2003 study.
I thought of this today as I read Jefferson Graham's article "How many trees did your iPhone bill kill?" in the USA Today newspaper. Apparently, new Apple iPhone users were sent AT&T billing statements that detailed their every phone call, text message or internet access. Here's a video on YouTube from Justine Ezarik that shows the absurdity of a 300-page monthly phone bill:
To be fair, the USA Today article explains that AT&T also offers "summary billing" as well as "on-line billing", but apparently neither of these are the default choice. I can understand that phone companies send out bills on paper because not everyone who has a phone has internet access, but in the case of its iPhone customers, internet access is in the palm of your hands! Since all iPhone customers have internet access, and AT&T knows which customers are using an iPhone, it would make sense for either on-line billing or summary billing to be the default choice, and let only those that hate trees explicitly request the full billing option.
Sending a box of 300 pages of printed paper is expensive, both for the sender and the recipient. This informationcould have been shipped less expensively on computer media, a single floppy diskette or CDrom for example. Forthose who prefer getting this level of detail, a searchable digitized version might be more useful to the consumer.
Which brings me to the concept of Information Lifecycle Management (ILM). You can read my recent posts on ILM byclicking the Lifecycle tab on the right panel, or my now infamous post from last year about ILM for my iPod.
His recollection of the history and evolution of ILM fairly matches mine:
While the SNIA definition provides a vendor-independent platform to start the conversation, it can be intimidatingto some, and is difficult to memorize word for word.When I am briefing clients, especially high-level executives, they often ask for ILM to be explained in simpler terms. My simplified version is:
So ILM is not just a good idea to save a company money, it can keep them out of the court room, as well as help save the environment and not kill so many trees. Now that 100 percent of iPhone customers have internet access, and a goodnumber of non-iPhone customers have internet access at home, work, school or public library, it makes sense for companies to ask people to "opt-in" to getting their statements on paper, rather than forcing them to "opt-out".
technorati tags: IBM, Terabyte, TB, 50,000 trees, Jefferson Graham, USAtoday, Apple, iPhone, iPod, AT&T, Justine Ezarik, YouTube, Information, Lifecycle, Management, ILM, SNIA, EMC, Sun, StorageTek, HP, asset, laptops, expense, employees, privacy, exposure, liability, unethical tampering, unexpected loss, unauthorized access, opt-in, opt-out[Read More]
Comments (4) Visits (9580)
Jon W Toigo over at Drunkendata has had a great set of posts on his skepticism of storage vendors touting their "green storage" solutions. My apologies for my"unnecessary" use of quotation marks.
The ones I liked specifically were:
The last of which refers to this ComputerWorld article "EPA: U.S. needs more power plants to support data centers", which claims "from a technology perspective, the systems most responsible for gobbling up power are the relatively low-cost x86 servers ..." The article is based onthe recent EPA report that was just released.
Last month, in my post How manys Watts per Terabyte, I mentioned:
Some people find it surprising that it is often more cost-effective, and power-efficient, to run workloads on mainframe logical partitions (LPARs) than a stack of x86 servers running VMware.
Perhaps they won't be surprised any more. Here is an article in eWeek that explains how IBM isreducing energy costs 80% by consolidating 3,900 rack-optimized servers to 33 IBM System z mainframe servers, running Linux, in its own data centers. Since 1997, IBM has consolidated its 155 strategic worldwide data center locations down to just seven.
I am very pleased that IBM has invested heavily into Linux, with support across servers, storage, software andservices. Linux is allowing IBM to deliver clever, innovative solutions that may not be possible with other operating systems. If you are in storage, you should consider becoming more knowledgeable in Linux.
The older systems won't just end up in a landfill somewhere. Instead, the details are spelled out inthe IBM Press Release:
As part of the effort to protect the environment, IBM Global Asset Recovery Services, the refurbishment and recycling unit of IBM, will process and properly dispose of the 3,900 reclaimed systems. Newer units will be refurbished and resold through IBM's sales force and partner network, while older systems will be harvested for parts or sold for scrap. Prior to disposition, the machines will be scrubbed of all sensitive data. Any unusable e-waste will be properly disposed following environmentally compliant processes perfected over 20 years of leading environmental skill and experience in the area of IT asset disposition.
Whereas other vendors might think that some operational improvements will be enough, such as switching to higher-capacity SATA drives, or virtualizing x86 servers, IBM recognizes that sometimes more fundamental changes are required to effect real changes and real results.
I would like to welcome IBMer Barry Whyte to the blogosphere!
From his bio:
Barry Whyte is a 'Master Inventor' working in the Systems & Technology Group based in IBM Hursley, UK. Barry primarly works on the IBM SAN Volume Controller virtualization appliance. Barry graduated from The University of Glasgow in 1996 with a B.Sc (Hons) in Computing Science. In his 10 years at IBM he has worked on the successful Serial Storage Architecture (SSA) range of products and the follow-on Fibre Channel products used in the IBM DS8000 series. Barry joined the SVC development team soon after its inception and has held many positions before taking on his current role as SVC performance architect. Outside of work, Barry enjoys playing golf and all things to do with Rotary Engines.
To avoid confusion in future posts, I will refer to Barry Whyte as BarryW, and fellow EMC blogger Barry Burke (aka the Storage Anarchist) as BarryB.
I'm in Chicago this week, but it is actually HOTTER here than in my home town of Tucson, Arizona.
Comments (5) Visits (16454)
Perhaps I wrapped up my exploration of disk system performance one day too early. (While it is Friday here in Malaysia, it is still only Thursday back home)
Barry Burke, EMC blogger (aka The Storage Anarchist) writes:
Aren't you mixing metrics here?
This is a fair question, Barry, so I will try to address it here.
It was not a typo, I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to explain concepts in Information Technology that might be harder to grasp. I chose MPG because it was closely related to IOPS and MB/s in four ways:
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, butdecided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
You also mention, Barry, the term "efficiency" but mileage is about "fuel economy".Wikipedia is quick to point out that the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into fuel economy of cars. The same can be said about the performance of internal bandwidth ofthe backplane between controllers and faster HDD does not necessarily translate to external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such highSPC-1 benchmarks despite the fact that it still uses 2Gbps drives inside. Given that it doesn't matter between2Gbps and 4Gbps on the back-end, why would it matter which vendor came first, second or third, and why call it a "distant 3rd" for IBM? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with you're excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole"green initiative" and I used "Watts per GB" last month to explain about how tape is less energy-consumptive than paper.I see on your blog you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear if "requires less" means "slightly less" or "substantially less" in this context, and have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufacturers today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
technorati tags: IBM, EMC, Storage Anarchist, MPG, MPH, IOPS, NASCAR, Malaysia, Watts, GB, green, back-end, DMX-3, DMX-4, HDS, USP, USP-V, SPC, SPC-1, SPC-2, standardized, benchmarks, workload, DS8000, disk, storage, tape[Read More]
Comments (5) Visits (12682)
Wrapping up this week's exploration on disk system performance, today I willcover the Storage Performance Council (SPC) benchmarks, and why I feel they are relevant to help customers make purchase decisions. This all started to address a comment from EMC blogger Chuck Hollis, who expressed his disappointment in IBM as follows:
You've made representations that SPC testing is somehow relevant to customers' environments, but offered nothing more than platitudes in support of that statement.
Apparently, while everyone else in the blogosphere merely states their opinions and moves on,IBM is held to a higher standard. Fair enough, we're used to that.Let's recap what we covered so far this week:
Today, I will explore ways to apply these metrics to measure and compare storageperformance.
Let's take, for example, an IBM System Storage DS8000 disk system. This has a controller thatsupports various RAID configurations, cache memory, and HDD inside one or more frames.Engineers who are testing individual components of this system might run specifictypes of I/O requests to test out the performance or validate certain processing.
Known affectionately in the industry as the "four corners" test, because you can show them on a box, with writes on the left, reads on the right,hits on the top, and misses on the bottom.Engineers are proud of these results, but these workloads do notreflect any practical production workload. At best, since all I/O requests are oneof these four types, the four corners provide an expectation range from the worst performance (most often write-missin the lower left corner)and the best performance (most often read-hit in the upper right corner) you might get with a real workload.
To understand what is needed to design a test that is more reflective of real business conditions,let's go back to yesterday's discussion of fuel economy of vehicles, with mileage measured in miles per gallon.The How Stuff Works websiteoffers the following description for the two measurements taken by the EPA:
Why two different measurements? Not everyone drives in a city in stop-and-go traffic. Having only one measurement may not reflect the reality that you may travel long distances on the highway. Offering both city and highway measurements allows the consumers to decide which metric relates closer to their actual usage.
Should you expect your actual mileage to be the exact same as the standardized test?Of course not. Nobody drives exactly 11 miles in the city every morning with 23 stops along the way,or 10 miles on the highway at the exact speeds listed.The EPA's famous phrase "your mileage may vary" has been quickly adopted into popular culture's lexicon. All kinds of factors, like weather, distance, anddriving style can cause people to get better or worse mileage than thestandardized tests would estimate.
Want more accurate results that reflect your driving pattern, in specific conditions that you are most likely to drive in? You could rentdifferent vehicles for a week and drive them around yourself, keeping track of whereyou go, and how fast you drove, and how many gallons of gas you purchased, so thatyou can then repeat the process with another rental, and so on, and then use yourown findings to base your comparisons. Perhaps you find that your results are always20% worse than EPA estimates when you drive in the city, and 10% worse when you driveon the highway. Perhaps you have many mountains and hills where you drive, you drive too fast, you run the Air Conditioner too cold, or whatever.
If you did this with five or more vehicles, and ranked them best to worstfrom your own findings, and also ranked them best to worst based on the standardizedresults from the EPA, you likely will find the order to be the same. The vehiclewith the best standardized result will likely also have the best result from your ownexperience with the rental cars. The vehicle with the worst standardized result willlikely match the worst result from your rental cars.
(This will be one of my main points, that standardized estimates don't have to be accurate to beuseful in making comparisons. The comparisons and decisions you would make with estimatesare the same as you would have made with actual results, or customized estimates based on current workloads. Because the rankings are in the same order, they are relevant and useful for making decisions based on those comparisons.)
Most people shopping around for a new vehicle do not have the time or patience to do this with rental cars. Theycan use the EPA-certified standardized results to make a "ball-park" estimate on how much they will spendin gasoline per year, decide only on cars that might go a certain distancebetween two cities on a single tank of gas, or merely to provide ranking of thevehicles being considered. While mileage may not be the only metric used in making a purchase decision, it can certainly be used to help reduce your consideration setand factor in with other attributes, like number of cup-holders, or leather seats.
In this regard, the Storage Performance Council has developed two benchmarks that attempt to reflect normal business usage, similar to "City" and "Highway" driving measurements.
The SPC-2 benchmark was added when people suggested that not everyone runs OLTP anddatabase transactional update workloads, just as the "Highway" measurement was addedto address the fact that not everyone drives in the City.
If you are one of the customers out there willing to spend the time and resources to do your own performance benchmarking, either at your own data center, or with theassistance of a storage provider, I suspect most, if not all, the major vendors(including IBM, EMC and others), and perhaps even some of the smaller start-ups, would be glad to work with you.
If you want to gather performance data of your actual workloads, and use this to estimate how your performance might be with a new or different storage configuration, IBMhas tools to make these estimates, and I suspect (again) that most, if not all, of theother storage vendors have developed similar tools.
For the rest of you who are just looking to decide which storage vendors to invite on your next RFP, and which products you might like to investigate that matchthe level of performance you need for your next project or application deployment,than the SPC benchmarks might help you with this decision. If performance is importantto you, factor these benchmark comparisons with the rest of the attributes you arelooking for in a storage vendor and a storage system.
In my opinion, I feel that for some people, the SPC benchmarks provide some value in this decision making process. They are proportionally correct, in that even ifyour workload gets only a portion of the SPC estimate, that storage systems withfaster benchmarks will provide you better performance than storage systems with lower benchmark results. That is why I feel they can be relevant in makingvalid comparisons for purchase decisions.
Hopefully, I have provided enough "food for thought"on this subject to support why IBM participates in the Storage Performance Council, why the performance of the SAN Volume Controller can be compared to the performanceof other disk systems, and why we at IBM are proud of the recent benchmark results in our recent press release.
Enjoy the weekend!
technorati tags: IBM, SPC, EMC, Chuck Hollis, fastest, disk, system, SVC, HDD, storage, four corners, read-hit, read-miss, write-hit, write-miss, City, Highway, MPG, OLTP, SPC-1, SPC-2, benchmarks, file, database, video,[Read More]
Comments (2) Visits (8899)
Continuing our exploration this week into the performance of disk systems, today I will cover the metrics to measure performance. Why do people have metrics?
Several bloggers suggested that perhaps an analogy to vehicles would be reasonable, given that cars and trucks are expensive pieces of engineering equipment, and people make purchase decisions between different makes and models.
In the United States, the Environmental Protection Agency (EPA) government entity is responsible for measuringfuel economy of vehicles using the metric Miles Per Gallon (mpg).Specifically, these are U.S. miles (not nautical miles) and U.S. gallons, not imperial gallons. It is importantwhen defining metrics that you are precise on the units involved.
Since nearly all vehicles are driven by gallons of gasoline, and travel miles of distance, this is a great metric to use for comparing all kinds of vehicles, including motorcycles, cars, trucks and airplanes. The EPA has a fuel economy website to help people make these comp What about storage performance? What could we use as the "MPG"-like metric that would allow you to compare different makes and models of storage? The two most commonly used are I/O requests per second (IOPS) and Megabytes transferred per second (MB/s). To understand the difference in each one, let's go back to our analogy from yesterday's post. In this example, it might have only taken 1 second to actually provide the answer, but it might have taken 10-30 seconds to pick up the phone, hear the request, respond, and then hang up the phone. If one person is able to do this in 10 seconds, on average, then he can handle 360 questions per hour. If another person takes 30 seconds, then only 120 questions per hour. Many business applications read or write less than 4KB of information per I/O request, and as such the dominant factor is not the amount of time to transfer the data, but how quickly the disk system can respond to each request. IOPS is very much like counting "Questions handled per hour" at the public library. To be more specific on units, we may specify the specific block size of the request, say 512 bytes or 4096 bytes, to make comparisons consistent. Now suppose that instead of asking for something with a short answer, you ask the public library to read you the article from a magazine, identify all the movies and show times of a local theatre, or recite a work from Shakespeare. In this case, the time it took to pick up the phone and respond is very small compared to the time it takes to deliverthe information, and could be measured instead in words per minute. Some employees of the library may be faster talkers, having perhaps worked in auction houses in a prior job, and can deliver more words per minute than other employees. MB/s is very much like counting "Spoken words per minute" at the public library. To be more specific on units, we may request a specific amount of information, say the words contained in "Romeo and Juliet", to make comparisons consistent. Now that we understand the metrics involved, tomorrow we can discuss how to use these in the measurement process.
What about storage performance? What could we use as the "MPG"-like metric that would allow you to compare different makes and models of storage?
The two most commonly used are I/O requests per second (IOPS) and Megabytes transferred per second (MB/s). To understand the difference in each one, let's go back to our analogy from yesterday's post.
In this example, it might have only taken 1 second to actually provide the answer, but it might have taken 10-30 seconds to pick up the phone, hear the request, respond, and then hang up the phone. If one person is able to do this in 10 seconds, on average, then he can handle 360 questions per hour. If another person takes 30 seconds, then only 120 questions per hour. Many business applications read or write less than 4KB of information per I/O request, and as such the dominant factor is not the amount of time to transfer the data, but how quickly the disk system can respond to each request. IOPS is very much like counting "Questions handled per hour" at the public library. To be more specific on units, we may specify the specific block size of the request, say 512 bytes or 4096 bytes, to make comparisons consistent.
Now suppose that instead of asking for something with a short answer, you ask the public library to read you the article from a magazine, identify all the movies and show times of a local theatre, or recite a work from Shakespeare. In this case, the time it took to pick up the phone and respond is very small compared to the time it takes to deliverthe information, and could be measured instead in words per minute. Some employees of the library may be faster talkers, having perhaps worked in auction houses in a prior job, and can deliver more words per minute than other employees. MB/s is very much like counting "Spoken words per minute" at the public library. To be more specific on units, we may request a specific amount of information, say the words contained in "Romeo and Juliet", to make comparisons consistent.
Now that we understand the metrics involved, tomorrow we can discuss how to use these in the measurement process.
Comments (8) Visits (12452)
Yesterday, I started this week's topic discussing the various areas of exploration to helpunderstand our recent press release of the IBM System Storage SAN Volume Controller and itsimpressive SPC-1 and SPC-2 benchmark results that ranks it the fastest disk system in the industry.
Some have suggested that since the SVC has a unique design, it should be placed in its own category,and not compared to other disk systems. To address this, I would like to define what IBM meansby "disk system" and how it is comparable to other disk systems.
When I say "disk system", I am going to focus specifically on block-oriented direct-access storage systems, which I will define as:
One or more IT components, connected together, that function as a whole, to serve as a target forread and write requests for specific blocks of data.
Clarification: One could argue, and several do in various comments below, that there are other typesof storage systems that contain disks, some that emulate sequential access tape libraries, some that emulate file-systems through CIFS or NFS protocols, and some that support thestorage of archive objects and other fixed content. At the risk of looking like I may be including or excluding such to fit my purposes, I wanted to avoid appl
People who have been working a long time in the storage industry might be satisfied by this definition, thinkingof all the disk systems that would be included by this definition, and recognize that other types of storage liketape systems that are appropriately excluded.
Others might be scratching their heads, thinking to themselves "Huh?" So, I will provide some background, history, and additional explanation. Let's break up the definition into different phrases, and handle each separately.
So, the SAN Volume Controller is a disk system comprising of one to four node-pairs. Each node is a piece of IT equipment that have processors and cache. These node-pairs are connected to a pair of UPS power supplies to protect the cache memory holding writes that have not yet been de-staged. The combination of node-pairs and UPS acting as a whole, is able to serve as a target to SCSI commands sent over Fibre Channel cables on a Storage Area Network (SAN). To read some blocks of data, it uses its internal cache storage to satisfy the request, and for others, it goes out to external disk systems that contain the data required. All writes are satisfied immediately in cache on the SVC, and later de-staged to external disk when appropriate.
As of end of 2Q07, having reached our four-year anniversary for this product, IBM has sold over 9000 SVC nodes, which are part of more than 3100 SVC disk systems. These things are flying off the shelves, clocking in a 100% YTY growth over the amount we sold twelve months ago. Congratulations go to the SVC development team for their impressive feat of engineering that is starting to catch the attention of many customers and return astounding results!
So, now that I have explained why the SVC is considered a disk system, tomorrow I'll discuss metrics to measure performance.