Registration is now open for our next "Meet the Storage Experts" event in Second Life. All IBMers, clients and IBM Business Partners are welcome to attend. We will focus this time on DS3000 and N series disk systems, tape systems, and IBM storage networking gear.
I have arrived safely in Las Vegas for the IBM System Storage and Storage Networking Symposium. This event is held once every year. The gold sponsors were: Brocade, Cisco, Finisar, Servergraph, and VMware. Our silver sponsor was Qlogic.
Barry Rudolph was the keynote speaker with "Storage for the Green Data Center", similar to his presentation for Storage Networking World in April, but with new and improved slides.
I myself had a busy day. Here's a quick recap:
The last session I attended was "Storage .. to Optimize your ECM deployments" by Jerry Bower, now working for IBM as part of our recent acquisition of the Filenet company. ECM stands for Enterprise Content Management, and IBM is the market leader in this space. Jerry gave a great overview of the IBM Content Manager software suite, our newly acquired Filenet portfolio, and the storage supported.
After the sessions, there was a reception at the Solution Center with dozens of exhibitor booths. For example, Optica Technologies had their PRIZM products, which are able to connect FICON servers to ESCON storage devices.
technorati tags: IBM, storage, networking, symposium, Brocade, Cisco, Finisar, Servergraph, VMware, Qlogic, Barry Rudolph, green, datacenter, strategy, ILM, ITIL, SNIA, SMI-S, offering, disk, tape, software, SAN Volume Controller, SVC, David Snyder, Mark Prybylski, Jerry Bower, Filenet, ECM, Optica, FICON, ESCON
I am back at "the Office" for a single day today. This happens often enough that I need a name for it. Air Force pilots who practice landings and take-offs call them "Touch and Go", but I think I need something better. If you can think of a better phrase, let me know.
This week, I was in Hartford, CT, Somers, NY and our Corporate Headquarters in Armonk, in a variety of meetings, some with editors of magazines, others with IBMers I had only spoken to over the phone and finally got a chance to meet face to face.
I got back to Tucson last night, had meetings this morning in Second Life, then presented "Inf...
Sunday, I leave for Las Vegas for our upcoming IBM Storage and Storage Networking Symposium. We will cover the latest in our disk, tape, storage networking and related software. Do you have your tickets? If you plan to attend, and want to meet up with me, let me know.
Last week, a writer for a magazine contacted us at IBM to confirm a quote that writing a Terabyte (TB) on disk saves 50,000 trees. I explained that this was cited from UC Berkeley's famous How Much Information? 2003 study.
I thought of this today as I read Jefferson Graham's article "How many trees did your iPhone bill kill?" in the USA Today newspaper. Apparently, new Apple iPhone users were sent AT&T billing statements that detailed their every phone call, text message or internet access. Here's a video on YouTube from Justine Ezarik that shows the absurdity of a 300-page monthly phone bill:
To be fair, the USA Today article explains that AT&T also offers "summary billing" as well as "on-line billing", but apparently neither of these is the default choice. I can understand that phone companies send out bills on paper because not everyone who has a phone has internet access, but in the case of its iPhone customers, internet access is in the palm of their hands! Since all iPhone customers have internet access, and AT&T knows which customers are using an iPhone, it would make sense for either on-line billing or summary billing to be the default choice, and let only those who hate trees explicitly request the full billing option.
Sending a box of 300 pages of printed paper is expensive, both for the sender and the recipient. This information could have been shipped less expensively on computer media, a single floppy diskette or CD-ROM for example. For those who prefer getting this level of detail, a searchable digitized version might be more useful to the consumer.
Which brings me to the concept of Information Lifecycle Management (ILM). You can read my recent posts on ILM by clicking the Lifecycle tab on the right panel, or my now infamous post from last year about ILM for my iPod.
His recollection of the history and evolution of ILM fairly matches mine:
While the SNIA definition provides a vendor-independent platform to start the conversation, it can be intimidating to some, and is difficult to memorize word for word. When I am briefing clients, especially high-level executives, they often ask for ILM to be explained in simpler terms. My simplified version is:
So ILM is not just a good idea to save a company money, it can keep them out of the court room, as well as help save the environment and not kill so many trees. Now that 100 percent of iPhone customers have internet access, and a good number of non-iPhone customers have internet access at home, work, school or public library, it makes sense for companies to ask people to "opt-in" to getting their statements on paper, rather than forcing them to "opt-out".
technorati tags: IBM, Terabyte, TB, 50,000 trees, Jefferson Graham, USAtoday, Apple, iPhone, iPod, AT&T, Justine Ezarik, YouTube, Information, Lifecycle, Management, ILM, SNIA, EMC, Sun, StorageTek, HP, asset, laptops, expense, employees, privacy, exposure, liability, unethical tampering, unexpected loss, unauthorized access, opt-in, opt-out
Jon W Toigo over at Drunkendata has had a great set of posts on his skepticism of storage vendors touting their "green storage" solutions. My apologies for my "unnecessary" use of quotation marks.
The ones I liked specifically were:
The last of which refers to this ComputerWorld article "EPA: U.S. needs more power plants to support data centers", which claims "from a technology perspective, the systems most responsible for gobbling up power are the relatively low-cost x86 servers ..." The article is based on the recent EPA report that was just released.
Last month, in my post How many Watts per Terabyte, I mentioned:
Some people find it surprising that it is often more cost-effective, and power-efficient, to run workloads on mainframe logical partitions (LPARs) than a stack of x86 servers running VMware.
Perhaps they won't be surprised any more. Here is an article in eWeek that explains how IBM is reducing energy costs 80% by consolidating 3,900 rack-optimized servers to 33 IBM System z mainframe servers, running Linux, in its own data centers. Since 1997, IBM has consolidated its 155 strategic worldwide data center locations down to just seven.
I am very pleased that IBM has invested heavily in Linux, with support across servers, storage, software and services. Linux is allowing IBM to deliver clever, innovative solutions that may not be possible with other operating systems. If you are in storage, you should consider becoming more knowledgeable in Linux.
The older systems won't just end up in a landfill somewhere. Instead, the details are spelled out in the IBM Press Release:
As part of the effort to protect the environment, IBM Global Asset Recovery Services, the refurbishment and recycling unit of IBM, will process and properly dispose of the 3,900 reclaimed systems. Newer units will be refurbished and resold through IBM's sales force and partner network, while older systems will be harvested for parts or sold for scrap. Prior to disposition, the machines will be scrubbed of all sensitive data. Any unusable e-waste will be properly disposed following environmentally compliant processes perfected over 20 years of leading environmental skill and experience in the area of IT asset disposition.
Whereas other vendors might think that some operational improvements will be enough, such as switching to higher-capacity SATA drives, or virtualizing x86 servers, IBM recognizes that sometimes more fundamental changes are required to effect real changes and real results.
I would like to welcome IBMer Barry Whyte to the blogosphere!
From his bio:
Barry Whyte is a 'Master Inventor' working in the Systems & Technology Group based in IBM Hursley, UK. Barry primarily works on the IBM SAN Volume Controller virtualization appliance. Barry graduated from The University of Glasgow in 1996 with a B.Sc (Hons) in Computing Science. In his 10 years at IBM he has worked on the successful Serial Storage Architecture (SSA) range of products and the follow-on Fibre Channel products used in the IBM DS8000 series. Barry joined the SVC development team soon after its inception and has held many positions before taking on his current role as SVC performance architect. Outside of work, Barry enjoys playing golf and all things to do with Rotary Engines.
To avoid confusion in future posts, I will refer to Barry Whyte as BarryW, and fellow EMC blogger Barry Burke (aka the Storage Anarchist) as BarryB.
I'm in Chicago this week, but it is actually HOTTER here than in my home town of Tucson, Arizona.
Perhaps I wrapped up my exploration of disk system performance one day too early. (While it is Friday here in Malaysia, it is still only Thursday back home)
Barry Burke, EMC blogger (aka The Storage Anarchist) writes:
Aren't you mixing metrics here?
This is a fair question, Barry, so I will try to address it here.
It was not a typo, I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to explain concepts in Information Technology that might be harder to grasp. I chose MPG because it was closely related to IOPS and MB/s in four ways:
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, but decided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
You also mention, Barry, the term "efficiency", but mileage is about "fuel economy". Wikipedia is quick to point out that while the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into fuel economy of cars. The same can be said of disk systems: greater internal bandwidth on the backplane between controllers and faster HDD does not necessarily translate to external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such high SPC-1 benchmarks despite the fact that it still uses 2Gbps drives inside. Given that it doesn't matter between 2Gbps and 4Gbps on the back-end, why would it matter which vendor came first, second or third, and why call it a "distant 3rd" for IBM? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with your excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole "green initiative" and I used "Watts per GB" last month to explain how tape is less energy-consumptive than paper. I see on your blog you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear if "requires less" means "slightly less" or "substantially less" in this context, and I have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufactures today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
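As a rough illustration of the metric, here is a minimal Python sketch that computes Watts/GB for two systems measured under the same workload. The system names and all figures are hypothetical, chosen only to show the arithmetic; they are not measurements of any real product.

```python
def watts_per_gb(average_watts: float, usable_capacity_gb: float) -> float:
    """Return average power draw per gigabyte of usable capacity."""
    if usable_capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return average_watts / usable_capacity_gb


# Hypothetical measurements, both taken while running the same workload.
measurements = {
    "system_a": {"average_watts": 6000.0, "usable_capacity_gb": 100_000.0},
    "system_b": {"average_watts": 7500.0, "usable_capacity_gb": 150_000.0},
}

results = {name: watts_per_gb(**m) for name, m in measurements.items()}

# List the systems from lowest to highest Watts/GB.
for name, value in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.3f} W/GB")
```

In this made-up case, the system that draws more total power still comes out ahead per GB, which is why the metric only makes sense when capacity and workload are held comparable.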
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
technorati tags: IBM, EMC, Storage Anarchist, MPG, MPH, IOPS, NASCAR, Malaysia, Watts, GB, green, back-end, DMX-3, DMX-4, HDS, USP, USP-V, SPC, SPC-1, SPC-2, standardized, benchmarks, workload, DS8000, disk, storage, tape
Wrapping up this week's exploration on disk system performance, today I will cover the Storage Performance Council (SPC) benchmarks, and why I feel they are relevant to help customers make purchase decisions. This all started to address a comment from EMC blogger Chuck Hollis, who expressed his disappointment in IBM as follows:
You've made representations that SPC testing is somehow relevant to customers' environments, but offered nothing more than platitudes in support of that statement.
Apparently, while everyone else in the blogosphere merely states their opinions and moves on, IBM is held to a higher standard. Fair enough, we're used to that. Let's recap what we covered so far this week:
Today, I will explore ways to apply these metrics to measure and compare storage performance.
Let's take, for example, an IBM System Storage DS8000 disk system. This has a controller that supports various RAID configurations, cache memory, and HDD inside one or more frames. Engineers who are testing individual components of this system might run specific types of I/O requests to test out the performance or validate certain processing.
These are known affectionately in the industry as the "four corners" test, because you can show them on a box, with writes on the left, reads on the right, hits on the top, and misses on the bottom. Engineers are proud of these results, but these workloads do not reflect any practical production workload. At best, since all I/O requests are one of these four types, the four corners provide an expectation range from the worst performance (most often write-miss in the lower left corner) to the best performance (most often read-hit in the upper right corner) you might get with a real workload.
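To make the "expectation range" idea concrete, here is a small Python sketch. It assumes a hypothetical average response time for each of the four corners and a hypothetical workload mix, then shows that the blended average response time falls between the best and worst corner. None of these numbers come from a real disk system.

```python
from itertools import product

# Hypothetical average response time per I/O, in milliseconds, for each corner.
corner_ms = {
    ("read", "hit"): 0.2,
    ("read", "miss"): 6.0,
    ("write", "hit"): 0.5,
    ("write", "miss"): 8.0,
}

# The four corners are every combination of operation and cache outcome.
assert set(corner_ms) == set(product(("read", "write"), ("hit", "miss")))


def blended_response_ms(mix: dict) -> float:
    """Average response time for a workload given as fractions per corner."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("workload fractions must sum to 1")
    return sum(fraction * corner_ms[corner] for corner, fraction in mix.items())


# Hypothetical production-like mix of the four request types.
mix = {
    ("read", "hit"): 0.4,
    ("read", "miss"): 0.3,
    ("write", "hit"): 0.2,
    ("write", "miss"): 0.1,
}

blended = blended_response_ms(mix)
best, worst = min(corner_ms.values()), max(corner_ms.values())
print(f"best {best} ms <= blended {blended:.2f} ms <= worst {worst} ms")
```

Because the blended figure is a weighted average of the four corner values, it can never fall outside the range they define, whatever mix is chosen.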
To understand what is needed to design a test that is more reflective of real business conditions, let's go back to yesterday's discussion of fuel economy of vehicles, with mileage measured in miles per gallon. The How Stuff Works website offers the following description for the two measurements taken by the EPA:
Why two different measurements? Not everyone drives in a city in stop-and-go traffic. Having only one measurement may not reflect the reality that you may travel long distances on the highway. Offering both city and highway measurements allows the consumers to decide which metric relates closer to their actual usage.
Should you expect your actual mileage to be the exact same as the standardized test? Of course not. Nobody drives exactly 11 miles in the city every morning with 23 stops along the way, or 10 miles on the highway at the exact speeds listed. The EPA's famous phrase "your mileage may vary" has been quickly adopted into popular culture's lexicon. All kinds of factors, like weather, distance, and driving style can cause people to get better or worse mileage than the standardized tests would estimate.
Want more accurate results that reflect your driving pattern, in specific conditions that you are most likely to drive in? You could rent different vehicles for a week and drive them around yourself, keeping track of where you go, how fast you drive, and how many gallons of gas you purchase, so that you can then repeat the process with another rental, and so on, and then use your own findings to base your comparisons. Perhaps you find that your results are always 20% worse than EPA estimates when you drive in the city, and 10% worse when you drive on the highway. Perhaps you have many mountains and hills where you drive, you drive too fast, you run the Air Conditioner too cold, or whatever.
If you did this with five or more vehicles, and ranked them best to worst from your own findings, and also ranked them best to worst based on the standardized results from the EPA, you likely will find the order to be the same. The vehicle with the best standardized result will likely also have the best result from your own experience with the rental cars. The vehicle with the worst standardized result will likely match the worst result from your rental cars.
(This will be one of my main points, that standardized estimates don't have to be accurate to be useful in making comparisons. The comparisons and decisions you would make with estimates are the same as you would have made with actual results, or customized estimates based on current workloads. Because the rankings are in the same order, they are relevant and useful for making decisions based on those comparisons.)
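The ranking argument can be sketched in a few lines of Python. The vehicle names and mileage figures below are invented for illustration: the "own" numbers are all worse than the standardized ones, by varying amounts, yet the best-to-worst order comes out the same.

```python
def rank_best_to_worst(mpg_by_vehicle: dict) -> list:
    """Return vehicle names ordered from highest to lowest mileage."""
    return sorted(mpg_by_vehicle, key=mpg_by_vehicle.get, reverse=True)


# Hypothetical standardized city estimates (miles per gallon).
epa_city_mpg = {
    "sedan": 30.0,
    "hatchback": 34.0,
    "suv": 19.0,
    "pickup": 16.0,
    "minivan": 22.0,
}

# Hypothetical results from your own week with each rental,
# roughly 20% worse than the estimates but not by a fixed ratio.
own_city_mpg = {
    "sedan": 24.5,
    "hatchback": 26.8,
    "suv": 15.0,
    "pickup": 13.1,
    "minivan": 17.9,
}

epa_ranking = rank_best_to_worst(epa_city_mpg)
own_ranking = rank_best_to_worst(own_city_mpg)

print("EPA ranking:", epa_ranking)
print("Own ranking:", own_ranking)
print("Same order: ", epa_ranking == own_ranking)
```

The absolute numbers disagree for every vehicle, but a purchase decision based on either ranking would be the same.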
Most people shopping around for a new vehicle do not have the time or patience to do this with rental cars. They can use the EPA-certified standardized results to make a "ball-park" estimate on how much they will spend in gasoline per year, decide only on cars that might go a certain distance between two cities on a single tank of gas, or merely to provide ranking of the vehicles being considered. While mileage may not be the only metric used in making a purchase decision, it can certainly be used to help reduce your consideration set and factor in with other attributes, like number of cup-holders, or leather seats.
In this regard, the Storage Performance Council has developed two benchmarks that attempt to reflect normal business usage, similar to "City" and "Highway" driving measurements.
The SPC-2 benchmark was added when people suggested that not everyone runs OLTP and database transactional update workloads, just as the "Highway" measurement was added to address the fact that not everyone drives in the City.
If you are one of the customers out there willing to spend the time and resources to do your own performance benchmarking, either at your own data center, or with the assistance of a storage provider, I suspect most, if not all, the major vendors (including IBM, EMC and others), and perhaps even some of the smaller start-ups, would be glad to work with you.
If you want to gather performance data of your actual workloads, and use this to estimate how your performance might be with a new or different storage configuration, IBM has tools to make these estimates, and I suspect (again) that most, if not all, of the other storage vendors have developed similar tools.
For the rest of you who are just looking to decide which storage vendors to invite on your next RFP, and which products you might like to investigate that match the level of performance you need for your next project or application deployment, the SPC benchmarks might help you with this decision. If performance is important to you, factor these benchmark comparisons with the rest of the attributes you are looking for in a storage vendor and a storage system.
In my opinion, for some people, the SPC benchmarks provide some value in this decision-making process. They are proportionally correct, in that even if your workload gets only a portion of the SPC estimate, storage systems with faster benchmarks will provide you better performance than storage systems with lower benchmark results. That is why I feel they can be relevant in making valid comparisons for purchase decisions.
Hopefully, I have provided enough "food for thought" on this subject to support why IBM participates in the Storage Performance Council, why the performance of the SAN Volume Controller can be compared to the performance of other disk systems, and why we at IBM are proud of the recent benchmark results in our recent press release.
Enjoy the weekend!
technorati tags: IBM, SPC, EMC, Chuck Hollis, fastest, disk, system, SVC, HDD, storage, four corners, read-hit, read-miss, write-hit, write-miss, City, Highway, MPG, OLTP, SPC-1, SPC-2, benchmarks, file, database, video
Continuing our exploration this week into the performance of disk systems, today I will cover the metrics to measure performance. Why do people have metrics?
Several bloggers suggested that perhaps an analogy to vehicles would be reasonable, given that cars and trucks are expensive pieces of engineering equipment, and people make purchase decisions between different makes and models.
In the United States, the Environmental Protection Agency (EPA) is the government entity responsible for measuring fuel economy of vehicles using the metric Miles Per Gallon (mpg). Specifically, these are U.S. miles (not nautical miles) and U.S. gallons, not imperial gallons. It is important when defining metrics that you are precise on the units involved.
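To see why the choice of gallon matters, here is a short Python sketch that converts a mileage figure from U.S. gallons to imperial gallons. The gallon sizes in liters are the standard definitions; the function name and the 30 mpg input are just an example.

```python
# Standard definitions of the two gallons, in liters.
LITERS_PER_US_GALLON = 3.785411784
LITERS_PER_IMPERIAL_GALLON = 4.54609


def us_mpg_to_imperial_mpg(us_mpg: float) -> float:
    """Convert miles per U.S. gallon to miles per imperial gallon."""
    # An imperial gallon is larger, so the same car travels farther on one.
    return us_mpg * LITERS_PER_IMPERIAL_GALLON / LITERS_PER_US_GALLON


example_us_mpg = 30.0
print(f"{example_us_mpg} mpg (US) is about "
      f"{us_mpg_to_imperial_mpg(example_us_mpg):.1f} mpg (imperial)")
```

The same vehicle scores roughly 20% higher when quoted in imperial gallons, so two figures can only be compared once the units are known to match.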
Since nearly all vehicles are driven by gallons of gasoline, and travel miles of distance, this is a great metric to use for comparing all kinds of vehicles, including motorcycles, cars, trucks and airplanes. The EPA has a fuel economy website to help people make these comparisons.
What about storage performance? What could we use as the "MPG"-like metric that would allow you to compare different makes and models of storage?
The two most commonly used are I/O requests per second (IOPS) and Megabytes transferred per second (MB/s). To understand the difference in each one, let's go back to our analogy from yesterday's post.
In this example, it might have only taken 1 second to actually provide the answer, but it might have taken 10-30 seconds to pick up the phone, hear the request, respond, and then hang up the phone. If one person is able to do this in 10 seconds, on average, then he can handle 360 questions per hour. If another person takes 30 seconds, then only 120 questions per hour. Many business applications read or write less than 4KB of information per I/O request, and as such the dominant factor is not the amount of time to transfer the data, but how quickly the disk system can respond to each request. IOPS is very much like counting "Questions handled per hour" at the public library. To be more specific on units, we may specify the specific block size of the request, say 512 bytes or 4096 bytes, to make comparisons consistent.
Now suppose that instead of asking for something with a short answer, you ask the public library to read you the article from a magazine, identify all the movies and show times of a local theatre, or recite a work from Shakespeare. In this case, the time it took to pick up the phone and respond is very small compared to the time it takes to deliver the information, and could be measured instead in words per minute. Some employees of the library may be faster talkers, having perhaps worked in auction houses in a prior job, and can deliver more words per minute than other employees. MB/s is very much like counting "Spoken words per minute" at the public library. To be more specific on units, we may request a specific amount of information, say the words contained in "Romeo and Juliet", to make comparisons consistent.
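The arithmetic behind both metrics can be sketched in Python. The first function reproduces the library example (10 seconds per question gives 360 per hour, 30 seconds gives 120). The second relates the two metrics for a fixed block size; the 10,000 IOPS figure is a hypothetical input, and MB is taken here as 1,000,000 bytes, which is an assumption about units.

```python
def requests_per_hour(seconds_per_request: float) -> float:
    """How many requests one responder handles per hour, one at a time."""
    return 3600.0 / seconds_per_request


def throughput_mb_per_s(iops: float, block_size_bytes: int) -> float:
    """Data rate implied by a request rate at a fixed block size.

    Assumes 1 MB = 1,000,000 bytes.
    """
    return iops * block_size_bytes / 1_000_000


# The library example: a fast and a slow responder.
print(requests_per_hour(10))   # questions per hour at 10 seconds each
print(requests_per_hour(30))   # questions per hour at 30 seconds each

# Hypothetical disk system handling 10,000 requests per second of 4096 bytes.
print(throughput_mb_per_s(10_000, 4096))
```

With small blocks, even a high request rate moves only a modest amount of data, which is why IOPS is the more telling metric for such workloads and MB/s for large sequential transfers.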
Now that we understand the metrics involved, tomorrow we can discuss how to use these in the measurement process.
Yesterday, I started this week's topic discussing the various areas of exploration to help understand our recent press release of the IBM System Storage SAN Volume Controller and its impressive SPC-1 and SPC-2 benchmark results that rank it the fastest disk system in the industry.
Some have suggested that since the SVC has a unique design, it should be placed in its own category, and not compared to other disk systems. To address this, I would like to define what IBM means by "disk system" and how it is comparable to other disk systems.
When I say "disk system", I am going to focus specifically on block-oriented direct-access storage systems, which I will define as:
One or more IT components, connected together, that function as a whole, to serve as a target for read and write requests for specific blocks of data.
Clarification: One could argue, and several do in various comments below, that there are other types of storage systems that contain disks, some that emulate sequential access tape libraries, some that emulate file-systems through CIFS or NFS protocols, and some that support the storage of archive objects and other fixed content. At the risk of looking like I may be including or excluding such to fit my purposes, I wanted to avoid appl
People who have been working a long time in the storage industry might be satisfied by this definition, thinking of all the disk systems that would be included by this definition, and recognizing that other types of storage, like tape systems, are appropriately excluded.
Others might be scratching their heads, thinking to themselves "Huh?" So, I will provide some background, history, and additional explanation. Let's break up the definition into different phrases, and handle each separately.
So, the SAN Volume Controller is a disk system comprising one to four node-pairs. Each node is a piece of IT equipment that has processors and cache. These node-pairs are connected to a pair of UPS power supplies to protect the cache memory holding writes that have not yet been de-staged. The combination of node-pairs and UPS, acting as a whole, is able to serve as a target to SCSI commands sent over Fibre Channel cables on a Storage Area Network (SAN). To read some blocks of data, it uses its internal cache storage to satisfy the request, and for others, it goes out to external disk systems that contain the data required. All writes are satisfied immediately in cache on the SVC, and later de-staged to external disk when appropriate.
As of end of 2Q07, having reached our four-year anniversary for this product, IBM has sold over 9000 SVC nodes, which are part of more than 3100 SVC disk systems. These things are flying off the shelves, clocking in a 100% YTY growth over the amount we sold twelve months ago. Congratulations go to the SVC development team for their impressive feat of engineering that is starting to catch the attention of many customers and return astounding results!
So, now that I have explained why the SVC is considered a disk system, tomorrow I'll discuss metrics to measure performance.