Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Systems Client Experience Center in Tucson, Arizona, and a featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th anniversary with IBM Storage. He is the
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
The technology industry is full of trade-offs. Take, for example, solar cells that convert sunlight to electricity. Every hour, more energy hits the Earth in the form of sunlight than the entire planet consumes in a year. The general trade-off is between energy conversion efficiency and abundance of materials:
Get 9-11 percent efficiency using rare materials like indium (In), gallium (Ga) or cadmium (Cd).
Get only 6.7 percent efficiency using abundant materials like copper (Cu), tin (Sn), zinc (Zn), sulfur (S), and selenium (Se).
A second trade-off is exemplified by EMC's recent GeoProtect announcement. This appears similar to the geographic dispersal method introduced by a company called [CleverSafe]. The trade-off is between the amount of space to store one or more copies of data and the protection of data in the event of disaster. Here's an excerpt from fellow blogger Chuck Hollis (EMC) titled ["Cloud Storage Evolves"]:
"Imagine a average-sized Atmos network of 9 nodes, all in different time zones around the world. And imagine that we were using, say, a 6+3 protection scheme.
The implication is clear: any 3 nodes could be completely lost: failed, destroyed, seized by the government, etc.
-- and the information could be completely recovered from the surviving nodes."
For organizations worried about their information falling into the wrong hands (whether criminal or government sponsored!), any subset of the nodes would yield nothing of value -- not only would the information be presumably encrypted, but only a few slices of a far bigger picture would be lost.
Seized by the government? Falling into the wrong hands? Is EMC positioning Atmos as "Storage for Terrorists"? I can certainly appreciate the value of being able to protect 6PB of data with only 9PB of storage capacity, instead of keeping two copies of 6PB each. The trade-off, however, is that you will be accessing the majority of your data across your intranet, which could impact performance. But if you are in an illicit or illegal business that could have a third of your facilities "seized by the government", then perhaps you shouldn't house your data centers there in the first place. Having two copies of 6PB each, in two "friendly nations", might make more sense.
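The capacity side of this trade-off is simple arithmetic. The minimal Python sketch below assumes a generic k+m erasure-coding scheme (k data fragments plus m protection fragments, one fragment per node) and compares it with keeping full copies; the function names are illustrative only and are not part of any EMC or IBM product.

```python
def erasure_coded_capacity(data_pb: float, k: int, m: int) -> float:
    # k data fragments + m protection fragments per object:
    # raw capacity = data * (k + m) / k, and any m fragments may be lost.
    return data_pb * (k + m) / k


def mirrored_capacity(data_pb: float, copies: int) -> float:
    # Full copies: raw capacity = data * copies; (copies - 1) may be lost.
    return data_pb * copies


def min_remote_fragments(k: int, local_fragments: int = 1) -> int:
    # A read needs any k fragments; with one fragment per site, at most
    # `local_fragments` of them can come from the local site.
    return k - local_fragments


print(erasure_coded_capacity(6, 6, 3))  # 6+3 scheme: 9.0 PB raw for 6 PB
print(mirrored_capacity(6, 2))          # two full copies: 12 PB raw
print(min_remote_fragments(6))          # fragments fetched over the network per read
```

Under these assumptions, the 6+3 layout saves a quarter of the raw capacity compared with two full copies, but every read pulls at least five of its six fragments from other sites.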
(In reality, companies often keep way more than just two copies of data. It is not unheard of for companies to keep three to five copies scattered across two or three locations. Facebook keeps SIX copies of photographs you upload to their website.)
ChuckH argues that the governments that seize the three nodes won't have a complete copy of the data. However, merely having pieces of data is enough for governments to capture terrorists. Even if the striping is done at the smallest 512-byte block level, those 512 bytes of data might contain names, phone numbers, email addresses, credit cards or social security numbers. Hackers and computer forensics professionals take advantage of this.
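A minimal Python sketch of why fragments can still leak information, assuming data is striped round-robin in 512-byte blocks across 9 nodes with no encryption. The records are made-up sample data, and the striping function is hypothetical; real dispersal schemes (with parity fragments or transforms applied before dispersal) may behave differently.

```python
def make_records(n: int) -> bytes:
    # Synthetic, made-up records standing in for customer data.
    return b"".join(
        f"name=Customer{i:04d};phone=555-{i:04d};email=c{i}@example.com\n".encode()
        for i in range(n)
    )


def stripe(data: bytes, block_size: int = 512, nodes: int = 9) -> dict:
    # Hypothetical round-robin striping in the clear: block i goes to node i % nodes.
    stripes = {node: [] for node in range(nodes)}
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        stripes[block_index % nodes].append(data[offset:offset + block_size])
    return stripes


data = make_records(1_000)
stripes = stripe(data)
seized = b"".join(stripes[0])  # everything held by a single "seized" node
print(len(seized), "bytes on node 0, phone fields visible:", seized.count(b"phone="))
```

Even though node 0 holds only about one ninth of the data, each of its 512-byte blocks contains complete, readable records.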
You might ask yourself, "Why not just encrypt the data instead?" That brings me to the third trade-off, protection versus application performance. Over the past 30 years, companies have had a choice: they could encrypt and decrypt the data as needed using server CPU cycles, but this would slow down application processing. Every time you wanted to read or update a database record, more cycles would be consumed. This forced companies to be very selective about what data they encrypted: which columns or fields within a database, which email attachments, and which other documents or spreadsheets.
An initial attempt to address this was to introduce an outboard appliance between the server and the storage device. For example, the server would write data in the clear to the appliance, and the appliance would encrypt the data and pass it along to the tape drive. When retrieving data, the appliance would read the encrypted data from tape, decrypt it, and pass the data in the clear back to the server. However, this had the unintended consequence of requiring two to three times as many tape cartridges. Why? Because encrypted data does not compress well, so tape drives with built-in compression capabilities could not shrink the data down onto fewer tapes.
(I covered the importance of compressing data before encryption in my previous blog post
[Sock Sock Shoe Shoe].)
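To see why compression has to happen before encryption, here is a minimal Python sketch. Python's standard library has no AES, so it assumes a toy stream cipher (XOR with a SHA-256 keystream in counter mode) as a stand-in: like real ciphertext, its output looks random to a compressor. The `toy_encrypt` function is for illustration only and is not secure.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Toy stream cipher for illustration only (NOT AES, NOT secure):
    # XOR the data with a SHA-256 keystream generated in counter mode.
    # XOR is its own inverse, so calling it again decrypts.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


plaintext = b"ACCOUNT 0001 BALANCE 100.00\n" * 4000  # highly repetitive sample data
key = b"example-key"

# Appliance-style: encrypt first, then the tape drive tries to compress.
encrypt_then_compress = zlib.compress(toy_encrypt(plaintext, key))
# Drive-style: compress first, then encrypt the compressed stream.
compress_then_encrypt = toy_encrypt(zlib.compress(plaintext), key)

print(len(plaintext), len(encrypt_then_compress), len(compress_then_encrypt))
```

The encrypt-then-compress result comes out about as large as the plaintext, while the compress-then-encrypt result is a small fraction of it, which is the effect that left appliance-encrypted tapes uncompressed.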
As with the trade-off between energy efficiency and abundant materials, IBM eliminated this trade-off, here by offering compression and encryption on the tape drive itself. This is standard 256-bit AES encryption implemented on a chip, able to process data as it arrives at near line speed. So instead of having to choose between protecting your data and running your applications with acceptable performance, you can now do both: encrypt all of your data without having to be selective. This approach has been extended to disk drives, so that disk systems like the IBM System Storage DS8000 and DS5000 can support full-disk-encryption [FDE] drives.
Last week, I presented IBM's strategic initiative, the IBM Information Infrastructure, which is part of IBM's New Enterprise Data Center vision. This week, I will try to get around to talking about some of the products that support those solutions.
I was going to set the record straight on a variety of misunderstandings, rumors or speculations, but I think most have been taken care of already. IBM blogger BarryW covered the fact that SVC now supports XIV storage systems in his post [SVC and XIV], and addressed some of the FUD already. Here was my list:
Now that IBM has an IBM-branded model of XIV, IBM will discontinue (insert another product here)
I had seen speculation that XIV meant the demise of the N series, the DS8000 or IBM's partnership with LSI. However, the launch reminded people that IBM announced a new release of DS8000 features, new models of N series N6000, and the new DS5000 disk, so that squashes those rumors.
IBM XIV is a (insert tier level here) product
While there seems to be no industry standard or agreement on what a tier-1, tier-2 or tier-3 disk system is, there has been a lot of argument over which pigeon-hole category to put IBM XIV in. No question many people want tier-1 performance and functionality at tier-2 prices, and perhaps IBM XIV is a good step toward giving them this. In some circles, tier-1 means support for System z mainframes. The XIV does not have traditional z/OS CKD volume support, but Linux on System z partitions or guests can attach to XIV via SAN Volume Controller (SVC), or through the NFS protocol as part of the Scale-Out File Services (SoFS) implementation.
Whenever any radical, game-changing technology comes along, competitors with last century's products and architectures want to frame the discussion as if it were just another storage system. IBM plans to update its Disk Magic and other planning/modeling tools to help people determine which workloads would be a good fit with XIV.
IBM XIV lacks (insert missing feature here) in the current release
I am glad to see that the accusations that XIV had unprotected, unmirrored cache were retracted. XIV mirrors all writes in the cache of two separate modules, with ECC protection. XIV allows concurrent code load for bug fixes to the software. XIV offers many of the features that people enjoy in other disk systems, such as thin provisioning, writeable snapshots, remote disk mirroring, and so on. IBM XIV can be part of a bigger solution, through SVC, SoFS or GMAS, that provides the business value customers are looking for.
IBM XIV uses (insert block mirroring here) and is not as efficient for capacity utilization
It is interesting that this came from a competitor that still recommends RAID-1 or RAID-10 for its CLARiiON and DMX products. On the IBM XIV, each 1MB chunk is written on two different disks in different modules. When disks were expensive, how much usable space you got from a given set of HDDs was worth arguing about. Today, we sell you a big black box, with 79TB usable, for (insert dollar figure here). For those who feel 79TB is too big to swallow all at once, IBM offers "capacity on demand" pricing, where you can pay initially for as little as 22TB, but get all the performance, usability, functionality and advanced availability of the full box.
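A minimal Python sketch of the placement rule described above: each 1MB chunk gets two copies on disks in two different modules, so losing an entire module never loses a chunk. The 15-module, 12-disk geometry and the seeded random placement are assumptions for illustration, not XIV's actual distribution algorithm. With two copies of every chunk, usable capacity can be at most half of raw capacity, before spares and metadata are set aside.

```python
import random
from collections import Counter


def place_mirrored_chunks(num_chunks, modules=15, disks_per_module=12, seed=1):
    """Hypothetical layout: two copies of each chunk in two different modules."""
    rng = random.Random(seed)
    layout = []
    for _ in range(num_chunks):
        m1, m2 = rng.sample(range(modules), 2)  # two distinct modules
        layout.append(((m1, rng.randrange(disks_per_module)),
                       (m2, rng.randrange(disks_per_module))))
    return layout


def survives_module_loss(layout, failed_module):
    # Every chunk must still have at least one copy outside the failed module.
    return all(any(mod != failed_module for mod, _ in copies) for copies in layout)


layout = place_mirrored_chunks(20_000)
per_disk = Counter(copy for copies in layout for copy in copies)
print("survives any single module loss:",
      all(survives_module_loss(layout, m) for m in range(15)))
print("busiest vs. idlest disk (copies):", max(per_disk.values()), min(per_disk.values()))
```

Spreading copies pseudo-randomly across all disks is also what spreads the workload, which is the hot-spot avoidance argument made in the next section.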
IBM XIV consumes (insert number of Watts here) of energy
For every disk system, a portion of the energy consumption depends on the number of hard disk drives (HDD), and the remainder goes to UPS, power conversion, processors and cache memory. Again, the XIV is a big black box, and you can compare the 8.4 KW of this high-performance, low-cost, one-frame storage system with the wattage consumed by competitive two-frame (sometimes called two-bay) systems, if you are willing to take some trade-offs. To get comparable performance and hot-spot avoidance, competitors may need to over-provision or use faster, energy-consuming FC drives, and offer additional software to monitor and re-balance workloads across RAID ranks. To get comparable availability, competitors may need to move from RAID-5 to either RAID-1 or RAID-6. To get comparable usability, competitors may need more storage infrastructure management software to hide the inherent complexity of their multi-RAID design.
Of course, if energy consumption is a major concern for you, XIV can be part of IBM's many blended disk-and-tape solutions. When it comes to being green, you can't get any greener storage than tape! Blended disk-and-tape solutions help get the best of both worlds.
Well, I am glad I could help set the record straight. Let me know which other products you would like me to focus on next.
It's official! My "blook" Inside System Storage - Volume I is now available.
This blog-based book, or “blook”, comprises the first twelve months of posts from this Inside System Storage blog, 165 posts in all, from September 1, 2006 to August 31, 2007. Foreword by Jennifer Jones. 404 pages. Topics include:
IT storage and storage networking concepts
IBM strategy, hardware, software and services
Disk systems, Tape systems, and storage networking
Storage and infrastructure management software
Second Life, Facebook, and other Web 2.0 platforms
IBM’s many alliances, partners and competitors
How IT storage impacts society and industry
You can choose between hardcover (with dust jacket) or paperback versions:
This is not the first time I've been published. I have authored articles for storage industry magazines, written large sections of IBM publications and manuals, submitted presentations and whitepapers to conference proceedings, and even had a short story published with illustrations by the famous cartoonist [Ted Rall].
But I can say this is my first blook, and as far as I can tell, the first blook from IBM's many bloggers on developerWorks, and the first blook about the IT storage industry. I got the idea when I saw [Lulu Publishing] run a "blook" contest. The Lulu Blooker Prize is the world's first literary prize devoted to "blooks": books based on blogs or other websites, including webcomics. The [Lulu Blooker Blog] lists past winners. Lulu is one of the new, innovative "print-on-demand" publishers. Rather than printing hundreds or thousands of books in advance, as other publishers require, Lulu doesn't print them until you order them.
I considered cute titles like A Year of Living Dangerously, or An Engineer in Marketing La-La Land, or Around the World in 165 Posts, but settled on a title that closely matched the name of the blog.
In addition to my blog posts, I provide additional insights and behind-the-scenes commentary. If you go to the Lulu website above, you can preview an entire chapter before purchase. I have added a hefty 56-page Glossary of Acronyms and Terms (GOAT) with over 900 storage-related terms defined, which also doubles as an index back to the post (or posts) that use or further explain each term.
So who might be interested in this blook?
Business Partners and Sales Reps looking to give a nice gift to their best clients and colleagues
Managers looking to reward early-tenure employees and retain the best talent
IT specialists and technicians wanting a marketing perspective of the storage industry
Mentors interested in providing motivation and encouragement to their proteges
Educators looking to provide books for their classroom or library collection
Authors looking to write a blook themselves, to see how to format and structure a finished product
Marketing personnel that want to better understand Web 2.0, Second Life and social networking
Analysts and journalists looking to understand how storage impacts the IT industry, and society overall
College graduates and others interested in a career as a storage administrator
And yes, according to Lulu, if you order soon, you can have it by December 25.
I am still wiping the coffee off my computer screen, inadvertently sprayed when I took a sip while reading HDS' uber-blogger Hu Yoshida's post on storage virtualization and vendor lock-in.
HDS is a major vendor for disk storage virtualization, and Hu Yoshida has been around for a while, so I felt it was fair to disagree with some of the generalizations he made to set the record straight. He's been more careful ever since.
However, his latest post [The Greening of IT: Oxymoron or Journey to a New Reality] mentions an expert panel at SNW that included Mark O’Gara, Vice President of Infrastructure Management at Highmark. I was not at the SNW conference last week in Orlando, so I will just give an excerpt from Hu's account of what happened:
"Later I had the opportunity to have lunch with Mark O’Gara. Mark is a West Point graduate so he takes a very disciplined approach to addressing the greening of IT. He emphasized the need for measurements and setting targets. When he started out he did an analysis of power consumption based on vendor specifications and came up with a number of 513 KW for his data center infrastructure....
The physical measurements showed that the biggest consumers of power were in order: Business Intelligence Servers, SAN Storage, Robotic tape Library, and Virtual tape servers....
Another surprise may be that tape libraries are such large consumers of power. Since tape is not spinning most of the time they should consume much less power than spinning disk - right? Apparently not if they are sitting in a robotic tape library with a lot of mechanical moving parts and tape drives that have to accelerate and decelerate at tremendous speeds. A Virtual Tape Library with de-duplication factor of 25:1 and large capacity disks may draw significantly less power than a robotic tape library for a given amount of capacity.
Obviously, I know better than to sip coffee whenever reading Hu's blog. I am down here in South America this week, and the coffee is very hot and very delicious, so I am glad I didn't waste any on my laptop screen this time, especially while reading that last sentence!
In that report, a 5-year comparison found that a repository based on SATA disk was 23 times more expensive overall, and consumed 290 times more energy, than a tape library based on LTO-4 tape technology. The analysts even considered a disk-based Virtual Tape Library (VTL). Focusing just on backups, at a 20:1 deduplication ratio, the VTL solution was still 5 times more expensive than the tape library. If you use the 25:1 ratio that Hu Yoshida mentions in his post above, that would still be 4 times more expensive than a tape library.
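The step from 5 times to 4 times follows if you assume that VTL cost scales inversely with the deduplication ratio. A one-function Python sketch of that assumption (the report's own cost model may well be more detailed):

```python
def vtl_cost_multiple(base_multiple: float, base_ratio: float, new_ratio: float) -> float:
    # Assumes VTL cost scales inversely with the deduplication ratio.
    # This is a simplification: controllers, software and floor space
    # do not shrink just because fewer disks are needed.
    return base_multiple * base_ratio / new_ratio


print(vtl_cost_multiple(5, 20, 25))  # 4.0 times the cost of the tape library
```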
I am not disputing Mark O'Gara's disciplined approach. It is possible that Highmark is using a poorly written backup program, taking full backups every day, to an older non-IBM tape library, in a manner that causes no end of activity to the poor tape robotics inside. But rather than changing over to a VTL, perhaps Mark might be better off investigating the use of IBM Tivoli Storage Manager, using progressive backup techniques, appropriate policies, parameters and settings, to a more energy-efficient IBM tape library. In well-tuned backup workloads, the robotics are not very busy. The robot mounts the tape, and then the backup runs for a long time filling up that tape, all the while the robot sits idle waiting for another request.
(Update: My apologies to Mark and his colleagues at Highmark. The above paragraph implied that Mark was using bad products or had configured them incorrectly, and was inappropriate. Mark, my full apology [here])
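To illustrate why progressive incremental backup keeps tape robotics quiet, here is a simplified Python model of how much data goes to tape over a month. The 10TB data set, the 2 percent daily change rate and the strategy names are hypothetical, and the model ignores retention, file versioning and tape reclamation.

```python
def gb_backed_up(total_gb: float, daily_change_rate: float, days: int,
                 strategy: str) -> float:
    """Total data sent to tape over `days` under a simplified model."""
    if strategy == "daily_full":
        # Every day copies the entire data set again.
        return total_gb * days
    if strategy == "progressive_incremental":
        # One initial full backup, then only new or changed data each day.
        return total_gb + total_gb * daily_change_rate * (days - 1)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical site: 10 TB of data, 2% changing per day, over 30 days.
for strategy in ("daily_full", "progressive_incremental"):
    print(strategy, gb_backed_up(10_000, 0.02, 30, strategy))
```

Under these assumptions, daily fulls move 300,000 GB in a month versus about 15,800 GB for progressive incremental, and fewer bytes to tape means fewer cartridge mounts for the robot.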
If you do decide to go with a Virtual Tape Library, for reasons other than energy consumption, doesn't it make sense to buy it from a vendor that understands tape systems, rather than from one that focuses on disk systems? Tape system vendors like IBM, HP or Sun understand tape workloads as well as related backup and archive software, and can provide better guidance and recommendations based on years of experience. Asking a disk vendor for advice about tape systems, including Virtual Tape Libraries, is like asking your butcher for advice on different types of bread, or asking the bakery about various cuts of meat.
The butchers and bakers might give you answers, but it may not be the best advice.
Am I dreaming? On his Storagezilla blog, fellow blogger Mark Twomey (EMC) brags about EMC's standard benchmark results, in his post titled [Love Life. Love CIFS.]. Here is my take:
A Full 180-Degree Reversal
For the past several years, EMC bloggers have argued, both in comments on this blog and on their own blogs, that standard benchmarks are useless and should not be used to influence purchase decisions. While we all agree that "your mileage may vary", I find standard benchmarks useful as part of an overall approach to comparing and selecting which vendors to work with, which architectures or solution approaches to adopt, and which products or services to deploy. I am glad to see that EMC has finally joined the rest of the planet on this. I find it funny that this reversal sounds a lot like their reversal from "Tape is Dead" to "What? We never said tape was dead!"
Impressive CIFS Results
The Standard Performance Evaluation Corporation (SPEC) has developed a series of NFS benchmarks; the latest, [SPECsfs2008], added support for CIFS. On the CIFS side, EMC's results compare favorably against previous CIFS tests from other vendors.
On the NFS side, however, EMC is still behind Avere, BlueArc, Exanet, and IBM/NetApp. For example, EMC's combination of Celerra gateways in front of V-Max disk systems resulted in 110,621 OPS with overall response time of 2.32 milliseconds. By comparison, the IBM N series N7900 (tested by NetApp under their own brand, FAS6080) was able to do 120,011 OPS with 1.95 msec response time.
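The relative gap between the two published results quoted above can be computed directly. A small Python sketch (the dictionary layout is just for illustration):

```python
# Published NFS results quoted above: ops/sec and overall response time (msec).
celerra_vmax = {"ops": 110_621, "msec": 2.32}
n7900_fas6080 = {"ops": 120_011, "msec": 1.95}

ops_advantage = n7900_fas6080["ops"] / celerra_vmax["ops"] - 1
latency_reduction = 1 - n7900_fas6080["msec"] / celerra_vmax["msec"]

print(f"N7900/FAS6080: {ops_advantage:.1%} more ops/sec, "
      f"{latency_reduction:.1%} lower overall response time")
```

This works out to roughly 8.5 percent more operations per second at roughly 16 percent lower overall response time for the N7900/FAS6080 configuration.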
Even though Sun invented the NFS protocol in the early 1980s, it takes an EMC-like stance against standard benchmarks to measure it. Last year, fellow blogger Bryan Cantrill (Sun) gave his [Eulogy for a Benchmark]. I was going to make points about this, but fellow blogger Mike Eisler (NetApp) [already took care of it]. We can all learn from this. Companies that don't believe in standard benchmarks can either reverse course (as EMC has done), or continue their downhill decline until they are acquired by someone else.
(My condolences to those at Sun getting laid off. Those of you who hire on with IBM can get re-united with your former StorageTek buddies! Back then, StorageTek people left Sun in droves, knowing that Sun didn't understand the mainframe tape marketplace that StorageTek focused on. Likewise, many question how well Oracle will understand Sun's hardware business in servers and storage.)
What's in a Protocol?
Both CIFS and NFS have been around for decades, and comparisons can sometimes sound like religious debates. Traditionally, CIFS was used to share files between Windows systems, and NFS for Linux and UNIX platforms. However, Windows can also handle NFS, while Linux and UNIX systems can use CIFS. If you are using a recent level of VMware, you can use NFS as an alternative to Fibre Channel SAN to store your VMDK files.
The Bigger Picture
There is a significant shift going on from traditional database repositories to unstructured file content. Today, as much as [80 percent of data is unstructured]. Shipments this year are expected to grow 60 percent for file-based storage, and only 15 percent for block-based storage. With the focus on private and public clouds, NAS solutions will be the battleground for 2010.
So, I am glad to see EMC starting to cite standard benchmarks. Hopefully, SPC-1 and SPC-2 benchmarks are forthcoming?
As a consultant, I am often asked to help design the architecture for the information infrastructure. A useful analogy to gather requirements and preferences is the difference between area rugs and wall-to-wall carpeting. Area rugs are not secured to the floor and cover only a portion of the floor area. Carpets are generally tacked or cemented to the floor, often with an underlay of cushion padding, stretched across the entire floor surface, out to all four walls of each room.
Each has its pros and cons, and the choice is often a matter of preference. Some people like area rugs because they can choose a different style for each room, match the decor and color scheme of the furniture, and use the rugs to define each living space. Ever since paleolithic man put animal skins on the floor of the cave, people have recognized that cold, hard and ugly floors could be covered up with something soft and more attractive.
Others prefer wall-to-wall carpeting because they want to walk around the house barefoot, have their young children crawl on their hands and knees, and give the entire house a unified look and feel. This is often an inexpensive option when compared against the cost of individual rugs.
The same is true for an information infrastructure. Some prefer the "area rug" approach: this style of storage for their email, this other type of storage for their databases, and perhaps a third for their unstructured file systems. When customers ask what storage I would recommend for their SAP application, their Microsoft Exchange email environment, or their Business Intelligence (BI) software, I recognize they are taking this "area rug" approach.
Like area rugs, different storage systems can focus on the specific characteristics of each workload. This approach also insulates against company-wide changes, the dreaded "rip-and-replace" of replacing all of your storage with something from a different vendor. With "area rug" storage, you can support a dual-vendor or multi-vendor strategy, and upgrade or replace each on its own schedule.
Thanks to open standards and industry-standard benchmarks, changing out one storage solution for another is as simple as rolling up an area rug and putting another one of similar dimensions in its place.
Others may prefer the "wall-to-wall carpeting" approach: one disk system type, one tape library type, and one network type, providing unified management and minimizing the need for unique skills. Generally, the choice of NAS, SAN or iSCSI infrastructure is made company-wide, and might strongly influence the set of products that will support that decision. For example, those with a mix of mainframe and distributed servers looking for SAN-attached storage may look at an [IBM System Storage DS8000] and [TS3500 tape library] that can provide support for FICON and FCP.
Those looking at NAS or iSCSI might consider the IBM System Storage N series products, "unified storage" supporting iSCSI, FCP and NAS protocols. If you want the "wall-to-wall" carpet to stretch across all the sites in your globally integrated enterprise, IBM's scalable NAS product, Scale-Out File Services [SoFS], provides a global name space in combination with a clustered file system that provides incredible scalability and performance, based on field-proven technology used by the majority of the [Top 100 supercomputer] deployments.
IBM can help you design an information infrastructure that fits either approach.
Miles per Gallon measures an efficiency ratio (amount of work done with a fixed amount of energy), not a speed ratio (distance traveled in a unit of time).
Given that IOPs and MB/s are the unit of "work" a storage array does, wouldn't the MPG equivalent for storage be more like IOPs per Watt or MB/s per Watt? Or maybe just simply Megabytes Stored per Watt (a typical "green" measurement)?
You appear to be intentionally avoiding the comparison of I/Os per Second and Megabytes per Second to Miles Per Hour?
May I ask why?
This is a fair question, Barry, so I will try to address it here.
It was not a typo; I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to when explaining concepts in Information Technology that might be harder to grasp. I chose MPG because it is closely related to IOPS and MB/s in four ways:
MPG applies to all instances of a particular make and model. Before Henry Ford and the assembly line, cars were made one at a time by a small team of craftsmen, so there could be variety from one instance to another. Today, vehicles and storage systems are mass-produced in a manner that provides consistent quality. You can test one vehicle and safely assume that all instances of the same make and model will have similar mileage. The same is true for disk systems: test one disk system, and you can assume that all others of the same make and model will have similar performance.
MPG has a standardized measurement benchmark that is publicly available. The US Environmental Protection Agency (EPA) is an easy analogy for the Storage Performance Council (SPC), providing the results of various offerings to choose from.
MPG has usage-specific benchmarks to reflect real-world conditions. The EPA offers City MPG for the type of driving you do to get to work, and Highway MPG to reflect the type of driving on a cross-country trip. These serve as a direct analogy to SPC having SPC-1 for online transaction processing (OLTP) and SPC-2 for large file transfers, database queries and video streaming.
MPG can be used for cost/benefit analysis. For example, one could estimate the amount of business value (miles travelled) for the amount of dollar investment (cost to purchase gallons of gasoline, at an assumed gas price). The EPA does this as part of its analysis. This is similar to the way SPC benchmark results relate IOPS or MB/s to the cost of the storage system being tested. The business value of IOPS or MB/s depends on the application, but could relate to the number of transactions processed per hour, the number of music downloads per hour, or the number of customer queries handled per hour, all of which can be assigned a specific dollar amount for analysis.
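Both halves of the cost/benefit analogy reduce to the same division. In this Python sketch every number (30 MPG, $3.00 per gallon, a $500,000 system, 100,000 IOPS) is hypothetical, chosen only to show the arithmetic.

```python
def cost_per_unit_of_work(total_cost: float, units_of_work: float) -> float:
    """Dollars per unit of work: per mile for a car, per IOPS for storage."""
    return total_cost / units_of_work


# Hypothetical car: 30 MPG, gasoline at an assumed $3.00 per gallon.
miles = 300
fuel_cost = (miles / 30) * 3.00
print(f"${cost_per_unit_of_work(fuel_cost, miles):.2f} per mile")

# Hypothetical storage system: price and SPC-1-style IOPS result are made up.
system_price = 500_000.00
spc1_iops = 100_000
print(f"${cost_per_unit_of_work(system_price, spc1_iops):.2f} per IOPS")
```

Once a dollar value is attached to each transaction, download or query, the same ratio lets you compare that business value against the cost per IOPS.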
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, but decided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
Increasing MPH, and driving anywhere near the maximum rated MPH for a vehicle, can be reckless and dangerous, risking loss of human life and property damage. Even professional race car drivers will agree there are dangers involved. By contrast, processing I/O requests at maximum speed poses no additional risk to the data, nor any risk of damage to the IT equipment involved.
While most vehicles have top speeds in excess of 100 miles per hour, Federal, State and Local speed limits prevent most drivers from taking advantage of those maximums. Race-car drivers in NASCAR may be able to use the maximum MPH of a vehicle, but the rest of us can't. The government limits the speed of vehicles precisely because of the dangers mentioned in the previous bullet. In contrast, processing I/O requests at faster speeds poses no such dangers, so the government imposes no limits.
Neither IOPS nor MB/s match MPH exactly. Earlier this week, I related IOPS to "Questions handled per hour" at the local public library, and MB/s to "Spoken words per minute" in those replies. If I tried to find a metric based on unit type to match the "per second" in IOPS and MB/s, then I would need to find a unit that equated to "I/O requests" or "MB transferred" rather than something related to "distance travelled".
In terms of time-based units, the closest I could come up with for IOPS was the acceleration rate of zero-to-sixty MPH in a certain number of seconds. Speeding up to 60MPH, then slamming the brakes, and then back up to 60MPH, start-stop, start-stop, and so on, would reflect what IOPS is doing on a request-by-request basis, but nobody drives like this (except maybe the taxi cab drivers here in Malaysia!)
Since vehicles are limited to speed limits in normal road conditions, the closest I could come up with for MB/s would be "passenger-miles per hour", such that high-occupancy vehicles like school buses could deliver more passengers than low-occupancy vehicles with only a few passengers.
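The link between the two storage metrics is that throughput equals I/O rate times transfer size, much as passenger-miles depend on how many passengers each trip carries. A Python sketch with hypothetical workloads:

```python
def throughput_mib_per_s(iops: float, io_size_kib: float) -> float:
    """Throughput = I/O rate x transfer size (KiB converted to MiB)."""
    return iops * io_size_kib / 1024


# Hypothetical workloads: many small requests vs. few large ones.
small_random = throughput_mib_per_s(iops=10_000, io_size_kib=4)       # OLTP-like
large_sequential = throughput_mib_per_s(iops=200, io_size_kib=1024)   # streaming-like
print(small_random, large_sequential)
```

Ten thousand 4KiB requests per second move only about 39MiB/s, while 200 requests of 1MiB move 200MiB/s, which is why small-block (SPC-1 style) and large-transfer (SPC-2 style) results are reported in different units.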
Neither start-stops nor passenger-miles per hour have standardized benchmarks, so they don't work well for comparison between vehicles. If you or anyone can come up with a metric that will help explain the relevance of standardized benchmarks better than the MPG that I already used, I would be interested in it.
You also mention, Barry, the term "efficiency", but mileage is about "fuel economy". Wikipedia is quick to point out that while the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into fuel economy of cars. The same can be said of disk systems: faster internal backplane bandwidth between controllers and faster HDDs do not necessarily translate into better external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s.
Oh, and yes, it's true - the DMX-4 is not the first high-end storage array to ship a 4Gb/s FC back-end. The USP-V, announced way back in May, has that honor (but only if it meets the promised first shipments in July 2007). DMX-4 will be in August '07, so I guess that leaves the DS8000 a distant 3rd.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such high SPC-1 benchmark results despite the fact that it still uses 2Gbps drives inside. Given that it makes little difference whether the back-end runs at 2Gbps or 4Gbps, why would it matter which vendor came first, second or third, and why call IBM a "distant 3rd"? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with your excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole "green initiative", and I used "Watts per GB" last month to explain how tape is less energy-consumptive than paper. I see on your blog you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear whether "requires less" means "slightly less" or "substantially less" in this context, and I have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufactures today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
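To illustrate why energy and performance numbers are most useful when measured on the same run, here is a small Python sketch. The `Measurement` class and every number in it are hypothetical; the point is only that Watts/GB and IOPS/Watt, computed from the same benchmark workload, can rank two arrays differently.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    name: str
    watts: float       # average power draw during the benchmark run
    usable_gb: float   # usable capacity configured for the run
    iops: float        # benchmark result for the same workload

    def watts_per_gb(self) -> float:
        return self.watts / self.usable_gb

    def iops_per_watt(self) -> float:
        return self.iops / self.watts

# Hypothetical arrays measured under the same workload, for illustration only.
a = Measurement("array A", watts=12_000, usable_gb=100_000, iops=100_000)
b = Measurement("array B", watts=15_000, usable_gb=100_000, iops=150_000)
for m in (a, b):
    print(f"{m.name}: {m.watts_per_gb():.3f} W/GB, {m.iops_per_watt():.1f} IOPS/W")
```

In this made-up example, array A wins on Watts/GB while array B wins on IOPS per Watt, which is exactly why publishing both from one standardized benchmark would help customers make apples-to-apples comparisons.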
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
EMC Corporation (NYSE:EMC) today announced it has been positioned as a leader in the Forrester Wave™: Enterprise Open Systems Virtual Tape Library (VTL), Q1 2008 by Forrester Research, Inc. (January 31, 2008), an independent market and technology research firm. EMC achieved a position as a leader in the Forrester Wave report on virtual tape libraries based on the largest installed base of the EMC® Disk Library family of systems, its broad ecosystem interoperability. Virtual tape libraries emulate tape drives and work in conjunction with existing backup software applications, enabling fast backup and restoration of data by using high-capacity, low-cost disk drives.
EMC was the first major vendor in the open systems virtual tape library market as it introduced the EMC Disk Library in April 2004 and today is a leading provider of open systems virtual tape solutions, with systems that are designed for businesses and organizations of all sizes.
While the press release implies that "EDL equals VTL", Chuck tries to explain they are in fact very different. Here is an excerpt from his blog post:
Virtual Tape Libraries vs. Disk Libraries
As many of you know, VTLs have been around for a while. They use disk as a cache -- they buffer the incoming backup streams, do some housekeeping and stacking, then turn around and write tape efficiently. When you go to restore, you're usually coming back off of tape, unless the backup image in question is sitting in the disk cache.
Now, there is nothing wrong with the VTL approach, but it was conceived in a time when disks were horribly expensive. It was also pretty clear to many of us that disks were going to be a whole lot cheaper in the near future, and this fundamental assumption wouldn't be valid for much longer.
I kept thinking in terms of disk as a direct target for a backup application. No modifications to the backup application. Native speed of sequential disks for both backup and restore. Tape positioned as a backup to the backup. Use the strengths of the underlying array (e.g. CLARiiON) for performance, availability, management, etc.
We ended up calling the concept a "disk library" to differentiate from the VTLs that had come before it. It was a different value proposition and offering, based on the emergence of lower-cost disk media.
... It's nice to see we're at 1,100+ customers, and still going strong.
For those new to the blogosphere, there is a difference between "Press Releases", which are formal corporate communications, and "Blog Posts", which are informal opinions of the individual blogger and may or may not exactly match the views of their respective employer. As we've learned many times before, one should not treat terms like "first" or "leader" in corporate press releases literally! Let's explore each.
Was EDL the first "open systems" Virtual Tape Library?
This is implied by the Forrester report. Chuck mentions the "VTLs that had come before it" in his blog, and many people are aware that IBM and StorageTek had introduced mainframe-attached VTLs in the 1990s. But what about VTL for "open systems"?
(Hold aside for the moment that the IBM System z mainframe is an open system itself, with z/OS certified as a bona fide UNIX operating system by [the Open Group] standards body. Most analysts and research firms use the term to refer only to non-mainframe versions of UNIX and Windows. Alternative definitions for "open systems" can be found in [Web definitions or Wikipedia]. I will assume Forrester meant non-mainframe servers.)
IBM announced AIX non-mainframe attachment via SCSI connectivity to the IBM 3494 Virtual Tape Server (VTS) on Feb 16, 1999, with general availability on May 28, 1999. That's nearly FIVE YEARS before the April 2004 introduction of EDL. IBM VTS support for Sun Solaris and Microsoft Windows came shortly thereafter in November 2000, and support for HP-UX a bit later in June 2001. One of my 17 patents is for the software inside the IBM 3494 VTS, so like Chuck, I can take some pride in the success of this product.
(I don't remember if StorageTek, which was subsequently acquired by Sun, had ever supported non-mainframe operating systems with their Virtual Storage Manager [VSM] offering, but if they did, I am sure it was also before EMC.)
Last week, another EMC blogger, BarryB (aka [the Storage Anarchist]), took me to task in comments on my post [IBM now supports 1TB SATA drives]. He felt that IBM should not claim support, given that the software inside the IBM System Storage N series is developed by NetApp. He compared this to the situation of HP and Sun re-badging the HDS USP-V disk system. If someone else wrote the software, BarryB opines, IBM should not claim credit for it. I tried to explain how IBM provides added value and has full-time employees dedicated to N series development and support, but I doubt I have changed his mind.
Why do I bring that up? Because the EMC Disk Library runs OEM software from FalconStor. Basically, EMC is assembling a hardware/software solution with components provided by OEM suppliers. Hmmm? Sound familiar? Isn't this the pot calling the kettle black?
If there is a clear winner here, it is FalconStor itself. Perhaps one of the worst-kept industry secrets is that FalconStor software is also used in VTL offerings from Sun, Copan, and IBM, the latter embodied as the [IBM TS7520 Virtualization Engine] offering. If you like the concept of an EDL, but prefer instead one-stop shopping from an "information infrastructure" vendor, IBM can offer the TS7520 along with servers, software and services for a complete end-to-end solution.
Can EMC claim to be "a leader" in Virtual Tape Libraries?
During the measured quarter, IBM shipped its 10 millionth LTO-4 tape drive cartridge to Getty Images, the world's leading creator and distributor of still imagery, footage and multi-media products, as well as a recognized provider of other forms of premium digital content, including music. Getty Images is using the LTO-4 drives as part of a tiered infrastructure of IBM disk and tape solutions that help support the backup needs of their digital imagery;
IBM shipped more than 1,500 Petabytes of tape storage in Q3'07 alone;
During Q3'07, IBM shipped the 10,000th IBM System Storage TS3500 Tape Library. The TS3500 is a highly scalable tape library with support from 1 to 192 tape drives and up to 6,400 cartridge slots for open system, mainframe and virtual tape system attachment.
Let's take a look at the numbers. IBM has sold over 5,400 virtual tape libraries. Sun/STK has sold over 4,000 virtual tape libraries. Both are drastically more than the 1,100 mentioned in Chuck's post. Does IDC recognize EMC in third place? No, EMC chooses instead to declare EDL as disk arrays (probably to prop up their IDC "Disk Tracker" numbers), so they don't even earn an honorable mention under the virtual tape library category. That category, of course, includes the mainframe-attached models from IBM and Sun/STK. So, if EMC did call these tape systems instead, they might show up in third place, and as such EMC could claim to be "a leader" in much the same way an athlete can claim to be an "Olympic medalist" for winning the bronze in third place. (If you limit the count to just the FalconStor-based models from IBM, EMC, Sun and Copan, then EMC moves up to first or second, but then press release titles like "EMC a Leader in FalconStor-based non-mainframe Virtual Tape Libraries" can get too confusing.)
Chuck, if you are reading this, I feel you have every right to celebrate your involvement with the EDL. Despite having common software and hardware components, both IBM and EMC can rightfully declare their own unique value-add through their respective VTL offerings. Like the IBM N series, the EMC Disk Library is not diminished by the fact the software was written by someone else. BarryB might disagree.
Continuing my catch-up on past posts, Jon Toigo on his DrunkenData blog posted a ["bleg"] for information about deduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM's view. (Jon, as always, you have my permission to post this on your blog!)
Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is the IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTIER software. Larry Freeman from NetApp has already explained A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
Data agnostic. Diligent does not require content-awareness, format-awareness, or identification of the backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which have the best opportunity for deduplication benefits. The two main benefits are:
Keeping more backup data available online for fast recovery.
Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that theirs is best because it collapses two functions (de-dupe then ingest) into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
Will this meet my current business requirements?
Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine the necessary ingest rate, that is, how many "Terabytes per Hour" (TB/h) must be received, processed and stored from the backup software during the backup window. IBM intends to document its performance test results of specific software/hardware combinations to provide guidance for clients' purchase and planning decisions.
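The ingest-rate arithmetic is simple enough to sketch in a few lines of Python. The 24 TB nightly backup and 8-hour window below are hypothetical inputs, not IBM sizing guidance; decimal units are assumed (1 TB = 1,000,000 MB).

```python
def required_ingest_tb_per_hour(backup_tb: float, window_hours: float) -> float:
    """Minimum sustained ingest rate needed to finish a backup within its window."""
    if window_hours <= 0:
        raise ValueError("backup window must be positive")
    return backup_tb / window_hours

def tb_per_hour_to_mb_per_s(tb_per_hour: float) -> float:
    # Decimal units: 1 TB = 1,000,000 MB; 1 hour = 3,600 seconds.
    return tb_per_hour * 1_000_000 / 3600

# Hypothetical environment: 24 TB of backup data in an 8-hour window.
rate = required_ingest_tb_per_hour(backup_tb=24, window_hours=8)
print(f"{rate:.1f} TB/h = {tb_per_hour_to_mb_per_s(rate):.0f} MB/s")
```

Any candidate deduplication solution would then need to sustain at least that rate (here 3.0 TB/h, or roughly 833 MB/s) for the whole window, with some headroom for growth.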
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so it is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
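The capacity difference can be sketched with a little arithmetic. The Python below assumes a hypothetical 10 TB nightly backup kept for 30 nights at a 10:1 deduplication ratio, and models the post-process landing area as holding raw, not-yet-deduplicated backups; all names and numbers are illustrative, not measured results.

```python
def inline_capacity_tb(nightly_tb: float, retained_nights: int, dedup_ratio: float) -> float:
    # Only deduplicated data ever lands on the back-end disk.
    return nightly_tb * retained_nights / dedup_ratio

def post_process_capacity_tb(nightly_tb: float, retained_nights: int,
                             dedup_ratio: float, staged_nights: int = 1) -> float:
    # The same deduplicated repository, plus a landing area holding raw backups
    # until post-processing catches up (more nights if it falls behind).
    repository = inline_capacity_tb(nightly_tb, retained_nights, dedup_ratio)
    return repository + nightly_tb * staged_nights

print(inline_capacity_tb(10, 30, 10))           # 30.0
print(post_process_capacity_tb(10, 30, 10))     # 40.0
print(post_process_capacity_tb(10, 30, 10, 2))  # 50.0
```

In this made-up case, the landing area adds a third more disk, and it grows further if post-processing falls a night behind.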
Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what already is stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from 2KB to 16MB in size.
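To make the "sequence of extents" idea concrete, here is a toy Python sketch. It is not Diligent's actual algorithm, which this post does not describe in detail: it uses a brute-force search for the longest matching byte range, a tiny `min_match` instead of the 2KB-to-16MB extents mentioned above, and hypothetical names (`Ref`, `Literal`, `factor`, `rebuild`).

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Ref:        # pointer to bytes already stored in the repository
    offset: int
    length: int

@dataclass(frozen=True)
class Literal:    # new, unique bytes that must be stored
    data: bytes

Extent = Union[Ref, Literal]

def longest_match(repo: bytes, data: bytes, start: int) -> Ref:
    """Brute-force byte-for-byte comparison: longest repo run equal to data[start:]."""
    best = Ref(0, 0)
    for off in range(len(repo)):
        n = 0
        while (start + n < len(data) and off + n < len(repo)
               and repo[off + n] == data[start + n]):
            n += 1
        if n > best.length:
            best = Ref(off, n)
    return best

def factor(repo: bytes, data: bytes, min_match: int = 8) -> List[Extent]:
    """Describe data as a list of extents: pointers to existing bytes or new bytes."""
    extents: List[Extent] = []
    pending = bytearray()
    i = 0
    while i < len(data):
        m = longest_match(repo, data, i)
        if m.length >= min_match:
            if pending:
                extents.append(Literal(bytes(pending)))
                pending.clear()
            extents.append(m)
            i += m.length
        else:
            pending.append(data[i])
            i += 1
    if pending:
        extents.append(Literal(bytes(pending)))
    return extents

def rebuild(repo: bytes, extents: List[Extent]) -> bytes:
    """Reconstitute the original bytes by following the extent list."""
    out = bytearray()
    for e in extents:
        if isinstance(e, Ref):
            out += repo[e.offset:e.offset + e.length]
        else:
            out += e.data
    return bytes(out)

repo = b"The quick brown fox jumps over the lazy dog. "
new = b"The quick brown cat jumps over the lazy dog. "
ext = factor(repo, new)
print(ext)
```

Only the three bytes `cat` are new; the rest of the incoming "file" is stored as two pointers into data already on disk, and `rebuild` returns a bit-perfect copy.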
De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of [non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return a "bit-perfect" copy of the data that was written. There are laws against changing the format. For example, suppose an original document in Microsoft Word format is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate the bit-perfect copy. Certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes can have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance of a collision might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, which does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing stored block.
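The "hash, then verify" safeguard described above can be sketched in Python as follows. This is a minimal illustration, not NetApp's or IBM's code: `BlockStore` is hypothetical, and the deliberately weak `weak_hash` exists only to force collisions so the byte-for-byte check has something to catch.

```python
import hashlib
from typing import Callable, Dict, List

def sha256_digest(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

class BlockStore:
    """Block-level dedup keyed by a hash, confirmed by a full byte-for-byte compare."""

    def __init__(self, hash_fn: Callable[[bytes], bytes] = sha256_digest) -> None:
        self.hash_fn = hash_fn
        self.blocks: List[bytes] = []
        self.index: Dict[bytes, List[int]] = {}   # digest -> candidate block ids

    def put(self, block: bytes) -> int:
        digest = self.hash_fn(block)
        for block_id in self.index.get(digest, []):
            # A matching digest is only a hint; compare the actual bytes
            # before treating the new block as a duplicate.
            if self.blocks[block_id] == block:
                return block_id
        self.blocks.append(block)
        block_id = len(self.blocks) - 1
        self.index.setdefault(digest, []).append(block_id)
        return block_id

def weak_hash(block: bytes) -> bytes:
    # Deliberately terrible hash: every block of the same length collides.
    return bytes([len(block) % 256])

store = BlockStore(hash_fn=weak_hash)
ids = [store.put(b"AAAA"), store.put(b"BBBB"), store.put(b"AAAA")]
print(ids, len(store.blocks))
```

Even though `b"AAAA"` and `b"BBBB"` share a hash value, the verification step keeps them as separate blocks; a hash-only design would have silently returned the wrong data for the second block.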
Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read the data off the back-end disk, using a variety of forensic tools, and piece together strings of personal information, such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working with open industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems. Until then, clients concerned about encryption should consider OS-based or application-based encryption from the backup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBM Diligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
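Why would different keys reduce the duplicates found? The Python sketch below shows the effect with a toy cipher: `toy_encrypt` is NOT real encryption (it XORs the data with a SHA-256-derived keystream and uses no per-message nonce), and it is not how TSM encrypts. It simply demonstrates that identical plaintext encrypted under two keys no longer looks identical to a deduplication engine.

```python
import hashlib

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy, insecure XOR cipher with a SHA-256 counter-mode keystream (illustration only)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))

record = b"same backup data sent twice"
same_key = [toy_encrypt(b"key-1", record), toy_encrypt(b"key-1", record)]
two_keys = [toy_encrypt(b"key-1", record), toy_encrypt(b"key-2", record)]
print(len(set(same_key)))  # 1: the second copy can still be deduplicated
print(len(set(two_keys)))  # 2: no duplicate is visible to the dedup engine
```

Note that this toy is deterministic by design; ciphers that add a random initialization vector per message would hide duplicates even when the same key is used.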
Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are intact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge from this set was unreadable or misplaced, it might impact access to many TBs of data, or render the entire system unusable.
In the case of 1PB of back-end disk for IBM Diligent, you would have to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives in parallel, the complete process could take several days. This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
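The back-of-the-envelope numbers behind this RTO concern can be checked with a short Python sketch. It assumes LTO-4 native figures (800GB per cartridge, 120MB/s per drive), no compression, decimal units, and 24 drives streaming continuously; real restores would add mount, seek, and verification time.

```python
import math

def tapes_needed(capacity_tb: float, cartridge_gb: float) -> int:
    return math.ceil(capacity_tb * 1000 / cartridge_gb)

def restore_hours(capacity_tb: float, drives: int, drive_mb_per_s: float) -> float:
    seconds = capacity_tb * 1_000_000 / (drives * drive_mb_per_s)
    return seconds / 3600

# Assumptions: LTO-4 native capacity and rate, 1PB = 1000TB, drives always streaming.
print(tapes_needed(1000, 800))                                # 1250 cartridges
print(f"{restore_hours(1000, 24, 120) / 24:.1f} days with 24 drives")
```

Under these optimistic assumptions, roughly 1,250 cartridges and about four days pass before the first individual file can be recovered.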
Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supported by analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larry suggests, MS Exchange and SharePoint could be good use cases that represent the possible savings from squeezing out duplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with "update in place" applications, that is, high levels of random writes for non-appending updates. OLTP and database workloads would not benefit from deduplication.
Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced in software first, and then, as implementations mature, move into hardware to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performs LZH compression through software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get an improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], I presented a session on Storage for the Green Data Center, and attended a System x session on Greening the Data Center. Since they were related, I thought I would cover both in this post.
Storage for the Green Data Center
I presented this topic in four general categories:
Drivers and Metrics - I explained the three key drivers for consuming less energy, and the two key metrics: Power Usage Effectiveness (PUE) and Data Center Infrastructure Efficiency (DCiE).
Storage Technologies - I compared the four key storage media types: Solid State Drives (SSD), high-speed (15K RPM) FC and SAS hard disk, slower (7200 RPM) SATA disk, and tape. I had comparison slides that showed how IBM disk was more energy efficient than the competition; for example, the DS8700 consumes less energy than EMC Symmetrix when compared with the exact same number and type of physical drives. Likewise, IBM LTO-5 and TS1130 tape drives consume less energy than comparable HP or Oracle/Sun tape drives.
Integrated Systems - IBM combines multiple storage tiers in a set of integrated systems managed by smart software. For example, the IBM DS8700 provides [Easy Tier] for smart data placement and movement across solid-state drives and spinning disk. I also covered several blended disk-and-tape solutions, such as the Information Archive and SONAS.
Actions and Next Steps - I wrapped up the talk with actions that data center managers can take to help them be more energy efficient, from deploying the IBM Rear Door Heat Exchanger to improving the management of their data.
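The two metrics from the first category are reciprocals of each other: PUE is total facility power divided by IT equipment power, and DCiE is IT equipment power as a percentage of total facility power. A minimal Python sketch, using a hypothetical facility drawing 1,300 kW in total with 1,000 kW reaching the IT equipment:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT power (1.0 is ideal)."""
    return total_facility_kw / it_equipment_kw

def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Center Infrastructure Efficiency: IT power as a percentage of facility power."""
    return 100.0 * it_equipment_kw / total_facility_kw

# Hypothetical facility: 1,300 kW total draw, 1,000 kW of it used by IT equipment.
print(pue(1300, 1000))             # 1.3
print(round(dcie(1300, 1000), 1))  # 76.9
```

Lower PUE (and higher DCiE) means less energy goes to power conversion, cooling and lighting for every Watt that does useful IT work.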
Greening of the Data Center
Janet Beaver, IBM Senior Manager of Americas Group facilities for Infrastructure and Facilities, presented on IBM's success in becoming more energy efficient. The price of electricity has gone up 10 percent per year, and in some locations, 30 percent. For every 1 Watt used by IT equipment, there are an additional 27 Watts for power, cooling and other uses to keep the IT equipment comfortable. At IBM, data centers represent only 6 percent of total floor space, but 45 percent of all energy consumption. Janet covered two specific data centers, Boulder and Raleigh.
At Boulder, IBM keeps a 48-hour reserve of gasoline (to generate electricity in case of an outage from the power company) and 48 hours of chilled water. Many power outages last less than 10 minutes, which can easily be handled by the UPS systems. At least 25 percent of the Computer Room Air Conditioners (CRAC) are also on UPS, so that there is some cooling during those minutes, within the ASHRAE guidelines of 72-80 degrees Fahrenheit. Since gasoline gets stale, IBM runs the generators once a month, which serves as a monthly test of the system and clears out the lines to make room for fresh fuel.
The IBM Boulder data center is the largest in the company: 300,000 square feet (the equivalent of five football fields)! Because of its location in Colorado, IBM enjoys "free cooling" using outside air 63 percent of the year, resulting in a PUE rating of 1.3. Electricity costs only 4.5 US cents per kWh. The center also uses 1 million kWh per year of wind energy.
The Raleigh data center is only 100,000 square feet, with a PUE rating of 1.4. The Raleigh area enjoys "free cooling" 44 percent of the year, and electricity costs 5.7 US cents per kWh. The Leadership in Energy and Environmental Design [LEED] program has been updated to certify data centers. The IBM Boulder data center has achieved LEED Silver certification, and the IBM Raleigh data center has LEED Gold certification.
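To see how PUE and electricity price combine, here is a Python sketch using the Boulder and Raleigh figures quoted above. The 1,000 kW constant IT load is a hypothetical assumption (actual loads were not given), and the model ignores demand charges, seasonal variation, and the wind energy purchase.

```python
HOURS_PER_YEAR = 8760

def annual_cost_usd(it_load_kw: float, pue: float, cents_per_kwh: float) -> float:
    """Yearly electricity cost for the whole facility, given a constant IT load."""
    facility_kwh = it_load_kw * pue * HOURS_PER_YEAR
    return facility_kwh * cents_per_kwh / 100

# Hypothetical 1,000 kW IT load at each site, using the PUE and price figures above.
boulder = annual_cost_usd(1000, 1.3, 4.5)
raleigh = annual_cost_usd(1000, 1.4, 5.7)
print(f"Boulder ${boulder:,.0f}  Raleigh ${raleigh:,.0f}")
```

For the same IT work, the lower PUE and cheaper power at Boulder (about $512,000 versus about $699,000 per year in this example) compound, which is why free cooling and electricity rates weigh so heavily in site selection.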
Free cooling, electricity costs, and disaster susceptibility are just three of the 25 criteria IBM uses to locate its data centers. In addition to the 7 data centers it manages for its own operations, and 5 data centers for web hosting, IBM manages over 400 data centers of other clients.
It seems that Green IT initiatives are more important to the storage-oriented attendees than the x86-oriented folks. I suspect that is because many System x servers are deployed in small and medium businesses that do not have data centers, per se.
This week I was aboard the Queen Mary in Long Beach, California! This was a business event organized by [Key Info Systems], a valued IBM Business Partner. Key Info resells IBM servers, storage and switches.
The Queen Mary retired in 1967, and has been converted into a hotel and events venue. The locals just parked their cars and walked on board, but I got to stay Tuesday through Thursday in one of the cabins. It was long and narrow, with round windows! There were four dials for the bathtub: Cold Salt, Hot Fresh, Cold Fresh, and Hot Salt.
Stepping on the boat was like walking back in time through history! If you decide to go see it, check out the [Art Deco bar] at the front of the Promenade deck. The ship is still in the water, but is permanently docked. It is sectioned off to prevent the ocean waves from affecting it, so we did not have the nauseating back-and-forth motion normally associated with cruise ships.
(It is with a bit of irony that we are on the Queen Mary just days after the tragedy of the [Costa Concordia], the largest Italian cruise ship, which ran aground near Isola del Giglio. The captain will have to explain how he [fell into a lifeboat] before he had a chance to wait for everyone else to get safely off the shipwreck. He was certainly no [Captain Sully]! I am thankful that most of the 4,200 people survived the incident.)
Lief Morin, Founder and Chief Executive for Key Info Systems, kicked off the meeting with highlights of 2011 successes. I have known Lief for years, as Key Info comes to the Tucson EBC on a frequent basis. This event was designed to give his sellers an update of what is the latest for each product line, and what to look forward to in the next 12-18 months.
The next speaker was from Vision Solutions, which provides High Availability solutions for IBM i on Power Systems. In 2010, their company nearly doubled in size with the acquisition of Double-Take, which provides data replication for x86 servers running Windows, Linux, VMware, Hyper-V and other hypervisors. The capabilities of Double-Take sounded similar to what IBM offers with [Tivoli Storage Manager FastBack] and [Tivoli Storage Manager for Virtual Environments].
Dinner at Sir Winston's
Rather than take the "Ghosts and Legends" tour, I opted for dinner at the Queen Mary's signature restaurant, Sir Winston's. This is a fancy place, so dress accordingly. If you want the Raspberry soufflé, order it early as it takes 30 minutes to prepare!
[Storwize V7000], including the new Storwize V7000 Unified configuration
Storage is an important part of the Key Info Systems revenue stream, so I was glad to have lots of questions and interactions from the audience.
Murder Mystery Dinner
The acting troupe from [Dinner Detective] put on quite the show for us! With all that is going on in the world, it is good to laugh out loud every now and then.
In other murder mystery dinners I have participated in, each person is assigned a "character" and given a script of what to say and when to say it. This was different: we got to pick our own characters. I chose "Doctor Watson", from the Sherlock Holmes series. Several attendees thought it was a play on [IBM Watson], the computer that figured out the clues on the Jeopardy! television game show and has since been [put to work at Wellpoint] to help out the Healthcare industry.
After the "murder" happened, two actors portraying policemen selected members of the audience to answer questions. We didn't get a script of what to say, so everyone had to "ad lib". I was singled out as a suspect, and had fun playing along in character. One of the attendees afterwards said he was impressed that I was able to fabricate such amusing and elaborate responses to their personal and embarrassing questions. As a public speaker for IBM, I have had a lot of practice thinking quickly on my feet.
Fibre Channel and Ethernet Switches
The next two speakers gave us an update on Fibre Channel and Ethernet switches, and their thoughts on the inevitability of Fibre Channel over Ethernet (FCoE). One of the exciting new developments is the [Brocade Network Subscription] which creates a flexible pay-per-use Ethernet port rental model for customers. This is especially timely given the Financial Accounting Standards Board proposed [FASB Change 13] that affects operating leases in the balance sheet.
With the Brocade Network Subscription, you pay monthly for the ports you are using. Need more ports? Brocade will install the added gear. Use fewer ports? Brocade will take the equipment back. There is no term endpoint or residual value as with traditional leasing, so when you are done using the equipment, you can give it back any time. This is ideal for companies that may need a lot of Ethernet ports for the next 2-3 years, but then plan to taper down, and don't want to get stuck with a long-term commitment or capital depreciation.
The last speaker was from VMware. IBM is the #1 reseller of VMware, and VMware commands an impressive 81 percent market share in the x86 virtualization space. The speaker presented VMware's strategy going forward, which aligns well with IBM's own strategy, to help companies Cloud-enable their existing IT infrastructures, in preparation for eventual moves to Hybrid or Public cloud deployments.
Special thanks to Lief Morin for sponsoring this event, Raquel Hernandez from IBM for coordinating my travel, and Pete, Christina and Kendrell from Key Info Systems for organizing the activities!
Well, it's Tuesday, and so it is "announcement day" again! Actually, for me it is Wednesday morning here in Mumbai, India, but since I was "press embargoed" from talking about these enhancements until 4pm EDT, I had to wait until Wednesday morning here to talk about them.
World's Fastest 1TB tape drive
IBM announced its new enterprise [TS1130 tape drive] and corresponding [TS3500 tape library support]. This one has a funny back-story. Last week, while we were preparing the Press Release, we debated whether to compare the 1TB per-cartridge capacity against Sun's Enterprise T10000 (500GB), which it doubles, or against LTO-4 (800GB). The question changed when Sun announced on Monday that they too had a 1TB tape drive, so instead of saying that we had the "World's First 1TB tape drive", we quickly changed this to the "World's Fastest 1TB tape drive". At a top speed of 160MB/sec, IBM's TS1130 is 33 percent faster than Sun's latest announcement. Sun was rather vague about when they will actually ship their new units, so IBM may still end up being first to deliver as well.
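The percentages in that story are simple ratios. Here is a minimal Python sketch that reproduces them; the only step not stated above is working backwards from "33 percent faster" to the rate it implies for Sun's drive, which is an inference from the claim, not a published spec.

```python
def percent_more(a: float, b: float) -> float:
    """Return how many percent larger a is than b."""
    return (a / b - 1.0) * 100.0


TS1130_GB_PER_CARTRIDGE = 1000.0  # 1TB native capacity
TS1130_MB_PER_SEC = 160.0         # top speed quoted above

# The two capacity comparisons debated for the press release.
vs_t10000 = percent_more(TS1130_GB_PER_CARTRIDGE, 500.0)  # Sun T10000, 500GB
vs_lto4 = percent_more(TS1130_GB_PER_CARTRIDGE, 800.0)    # LTO-4, 800GB

# Working backwards: the rate that "33 percent faster" implies for the other drive.
implied_other_rate = TS1130_MB_PER_SEC / 1.33

print(f"vs T10000: {vs_t10000:.0f}% more capacity")
print(f"vs LTO-4:  {vs_lto4:.0f}% more capacity")
print(f"implied rate of the other drive: about {implied_other_rate:.0f} MB/sec")
```

"Double" only holds against the 500GB drive; against LTO-4 the gain is 25 percent, which is part of why the wording took some debate.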
While EMC and other disk-only vendors have stopped claiming that "tape is dead", these recent announcements from IBM and Sun indicate that tape is indeed alive and well. IBM is able to carry technologies from disk, such as the Giant Magneto Resistive (GMR) head, over to its tape offerings, which means much of the R&D for disk applies to tape, keeping both forms of storage well invested. Tape continues to be the "greenest" storage option, more energy efficient than disk, optical, film, microfiche and even paper.
On the LTO front, IBM enhanced the reporting capabilities of its [TS3310] midrange tape library. This includes identifying the resource utilization of the drives, reporting on media integrity, and improved diagnostics to support library-managed encryption.
IBM System Storage DR550
As a blended disk-and-tape solution, the [IBM System Storage DR550] easily replaces the EMC Centera to meet compliance storage requirements. IBM announced that it has greatly expanded the DR550's scalability, with support for 1TB disk drives as well as attachment to either IBM's or Sun's 1TB tape drives.
Massive Array of Idle Disks (MAID)
IBM now offers a "Sleep Mode" in the firmware of the [IBM System Storage DCS9550], which is often called "Massive Array of Idle Disks" (MAID) or spin-down capability. This can reduce the amount of power consumed during idle times.
That's a lot of exciting stuff. I'm off to breakfast now.
Well, it's Tuesday again, and that means more announcements from IBM!
In conjunction with IBM's new [System z10 Business Class (BC)] mainframe designed for Small and Medium-sized Businesses (SMB), IBM also announced related storage product enhancements.
Yes, it's alive! Contrary to the FUD you might have read from our competitors, IBM continues to sell thousands and thousands of IBM System Storage DS6800 disk systems, and now enhances them with the option for 450GB 15K RPM drives. What is nice about these 450GB drives is that they are as fast as or faster* than 300GB drives, so the typical trade-off between performance and capacity does not apply.
(* I compared the Seagate 15K.6 (450GB) and 15K.5 (300GB) models on average seek time (read and write) and full seek time (read and write). This may or may not result in application performance improvements, depending on workload pattern. Your mileage may vary.)
Our clients report back that these are incredibly stable systems that they don't have to worry about. This enhancement applies to both the [511/EX1 models] and [522/EX2 models].
Understanding that clients want complete solutions from single vendors, IBM offers synergy between System z and the IBM System Storage DS8000 disk systems. The latest R4.1 microcode upgrade offers two key features on the various models [2107,
zHPF - High Performance FICON for System z. IBM was able to increase the throughput on 4 Gbps links. For OLTP workloads randomly accessing 4KB blocks, IBM internal tests showed zHPF doubled performance from 13,000 IOPS to 26,000 IOPS per channel. For sequential workloads, such as batch processing, zHPF increased performance 50 percent, from 350 MB/sec to 525 MB/sec.
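To see how the two zHPF results relate, here is a quick Python check of the quoted figures. The bandwidth line assumes 4KB means 4096 bytes; that conversion is mine, not part of IBM's test report.

```python
def percent_gain(before: float, after: float) -> float:
    """Percentage improvement going from `before` to `after`."""
    return (after / before - 1.0) * 100.0


# Figures from IBM's internal zHPF tests quoted above.
oltp_gain = percent_gain(13_000, 26_000)  # IOPS per channel, 4KB random
seq_gain = percent_gain(350, 525)         # MB/sec, sequential

# Assumption: 4KB = 4096 bytes. Bandwidth actually moved by the OLTP test.
oltp_mb_per_sec = 26_000 * 4096 / 1_000_000

print(f"OLTP: +{oltp_gain:.0f}%  sequential: +{seq_gain:.0f}%")
print(f"OLTP bandwidth at 26,000 IOPS: about {oltp_mb_per_sec:.0f} MB/sec")
```

Even after doubling, the small-block OLTP test moves only about 106 MB/sec, far below the 525 MB/sec sequential figure, which is why random workloads are quoted in IOPS and sequential ones in MB/sec.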
In February, IBM previewed [Incremental Resync] for z/OS Metro Global Mirror. However, some concepts are better explained with pictures.
One way to set up a 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z, you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and also have site "A" sending data over to site "C" asynchronously using z/OS Global Mirror. This is called "z/OS Metro Global Mirror".
In the past, if the disk system in site A failed, you would switch over to site B, which would have to resend all the data to site C to resynchronize it. This is because site B was not tracking what the System Data Mover (SDM) reader had or had not yet processed.
With DS8000 R4.1, the "incremental resync" function, used along with IBM HyperSwap, requires site B to send and resync only the data that was in flight when the outage occurred. Because only this limited amount of in-flight data is sent, rather than the complete volume of data, "Incremental Resync" can resynchronize the data 95% faster, and also greatly decreases your bandwidth requirements. This shortens the window of exposure should a subsequent outage occur.
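The difference between a full and an incremental resync comes down to bookkeeping: if site B remembers which updates site C has not yet confirmed, it only has to resend those. Below is a toy Python model of that idea. The `ChangeTracker` class, the per-track granularity and the example volume sizes are all hypothetical illustrations, not how the DS8000 microcode actually records changes, and the resulting ratio is not IBM's 95% figure.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class ChangeTracker:
    """Hypothetical site-B bookkeeping: tracks updated at site A but not
    yet confirmed as applied at site C by the System Data Mover."""
    pending: Set[int] = field(default_factory=set)

    def record_write(self, track: int) -> None:
        self.pending.add(track)

    def confirm_applied(self, tracks: Iterable[int]) -> None:
        self.pending.difference_update(tracks)


def tracks_to_resync(total_tracks: int, tracker: Optional[ChangeTracker]) -> int:
    """Number of tracks site B must send to site C after a failover."""
    if tracker is None:
        # No tracking: site B cannot tell what site C has, so resend everything.
        return total_tracks
    # Incremental resync: only the in-flight (unconfirmed) tracks are resent.
    return len(tracker.pending)


# Example: a 100,000-track volume where 5,000 tracks were written and
# 4,000 of those had already been applied at site C before the outage.
tracker = ChangeTracker()
for t in range(5_000):
    tracker.record_write(t)
tracker.confirm_applied(range(4_000))

full = tracks_to_resync(100_000, None)
incremental = tracks_to_resync(100_000, tracker)
print(full, incremental)
```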
Introduced originally in 1997 as the IBM Virtual Tape Server (VTS), the [IBM System Storage TS7700] series supports Grid capability to replicate tape image data across locations. Here's a quick recap of today's announcement:
Existing TS7740 can be upgraded up to 9TB of disk cache. New models can have up to 13TB of disk cache.
A new "tape-less" TS7720 that has up to 70TB of disk cache.
Integrated Library Management support. I discussed [Integrated Removable Media Manager (IRMM)] before, and this is basically IRMM inside. For those with TS3500 tape libraries, this support eliminates the need for a separate IBM 3953 L05 Library Manager.
TS1130 back-end tape drive support. These are the fastest 1TB drives in the industry, with built-in encryption support, and can now be used as the physical tape back-end for the virtual tape TS7740 repository.
While our competitors might be boarding up their windows in preparation for the downturn in the US economy, IBM continues to generate solid results. The San Jose Mercury News has an article that discusses this, titled [IBM's 3Q profit strong on global sales]. There has never been a better time to buy from, or invest in, IBM!
Well, this week I am in Maryland, just outside of Washington DC. It's a bit cold here.
Robin Harris over at StorageMojo put out this Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun about the results of two academic papers, one from Google, and another from Carnegie Mellon University (CMU). The papers imply that the disk drive module (DDM) manufacturers may have misrepresented their reliability estimates, and Robin asks the major vendors to respond. So far, NetApp and EMC have responded.
I will not repeat what others have said already, but will make just a few points. Robin, you are free to consider this "my" official response if you would like to post it on your blog, or point to mine, whichever is easier for you. Given that IBM no longer manufactures the DDMs we use inside our disk systems, there may not be any reason for a more formal response.
Coke and Pepsi buy sugar, NutraSweet and Splenda from the same sources
Somehow, this doesn't surprise anyone. Coke and Pepsi don't own their own sugar cane fields, and even their bottlers are separate companies. Their job is to assemble the components using super-secret recipes to make something that tastes good.
IBM, EMC and NetApp don't make the DDMs mentioned in either academic study. Different IBM storage systems use one or more of the following DDM suppliers:
Seagate (including Maxtor, which they acquired)
Hitachi Global Storage Technologies, HGST (former IBM division sold off to Hitachi)
In the past, corporations like IBM were very "vertically integrated", making every component of every system delivered. IBM was the first to bring disk systems to market, and led the major enhancements that exist in nearly all disk drives manufactured today. Today, however, our value-add is to take standard components and use our super-secret recipe to make something that provides unique value to the marketplace. Not surprisingly, EMC, HP, Sun and NetApp also don't make their own DDMs. Hitachi is perhaps the last major disk systems vendor that also has a DDM manufacturing division.
So, my point is that disk systems are the next layer up. Everyone knows that individual components fail. Unlike CPUs or memory, disks actually have moving parts, so you would expect them to fail more often than "chips".
If you don't feel the MTBF or AFR estimates posted by these suppliers are valid, go after them, not the disk systems vendors that use their supplies. While IBM does qualify DDM suppliers for each purpose, we are basically purchasing them from the same major vendors as all of our competitors. I suspect you won't get much more than the responses you posted from Seagate and HGST.
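Suppliers usually quote MTBF in hours, while the two studies measured annual replacement rates, so it helps to convert one into the other. This minimal Python sketch uses the standard conversion under a constant-failure-rate (exponential) assumption, which is exactly the assumption the field studies call into question; the MTBF values in the loop are illustrative, not any supplier's published figures.

```python
import math

HOURS_PER_YEAR = 8760


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate implied by a datasheet MTBF, assuming a
    constant failure rate (exponentially distributed lifetimes)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for mtbf in (1_000_000, 1_200_000, 1_500_000):
    print(f"MTBF {mtbf:>9,} h -> AFR {afr_from_mtbf(mtbf):.2%}")
```

If observed replacement rates come in well above these numbers, that gap is what the studies are pointing at, and it is a question for the DDM suppliers.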
American car owners replace their cars every 59 months
According to a frequently cited auto market research firm, the average time before the original owner transfers their vehicle, whether purchased or leased, is currently 59 months. Both studies mention that customers have a different "definition" of failure than manufacturers, and often replace drives before they are completely kaput. The same is true for cars. Americans give various reasons why they trade in their less-than-five-year-old cars for newer models. Disk technologies advance at a faster pace, so it makes sense to change drives for other business reasons: speed and capacity improvements, lower power consumption, and so on.
The CMU study indicated that 43 percent of drives were replaced before they were completely dead. So, even if General Motors estimated their cars lasted 9 years, and Toyota estimated 11 years, people would still replace them sooner, for other reasons.
At IBM, we remind people that "data outlives the media". True for disk, and true for tape. Neither is "permanent storage", but rather a temporary resting point until the data is transferred to the next media. For this reason, IBM is focused on solutions and disk systems that plan for this inevitable migration process. IBM System Storage SAN Volume Controller is able to move active data from one disk system to another; IBM Tivoli Storage Manager is able to move backup copies from one tape to another; and IBM System Storage DR550 is able to move archive copies from disk and tape to newer disk and tape.
If you had only one car, then having that one and only vehicle die could be quite disruptive. However, companies with fleet cars, like Hertz Car Rentals, don't wait for their cars to completely stop running; they replace them well before that happens. For a large company with a large fleet of cars, regularly scheduled replacement is just part of doing business.
This brings us to the subject of RAID. No question that RAID 5 provides better reliability than having just a bunch of disks (JBOD). Certainly, three copies of data across separate disks, a variation of RAID 1, will provide even more protection, but for a price.
Robin mentions the "Auto-correlation" effect. Disk failures bunch up, so one recent failure might mean another DDM, somewhere in the environment, will probably fail soon also. For it to make a difference, it would (a) have to be a DDM in the same RAID 5 rank, and (b) have to occur during the time the first drive is being rebuilt to a spare volume.
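The two conditions above can be put into rough numbers. This Python sketch estimates the chance that another drive in the same RAID 5 rank fails before the rebuild finishes, assuming exponential lifetimes. The `correlation_factor` parameter is a hypothetical knob I added to mimic the auto-correlation effect by inflating the failure rate; the rank size, rebuild time and MTBF in the example are illustrative only.

```python
import math


def p_second_failure(disks_in_rank: int, rebuild_hours: float,
                     mtbf_hours: float, correlation_factor: float = 1.0) -> float:
    """Chance that another drive in the same RAID 5 rank fails before the
    rebuild finishes, assuming exponential lifetimes.

    correlation_factor is a hypothetical knob: 1.0 means independent
    failures; values above 1.0 inflate the failure rate to mimic the
    auto-correlation effect (failures bunching up)."""
    survivors = disks_in_rank - 1
    rate_per_hour = correlation_factor / mtbf_hours
    return 1.0 - math.exp(-survivors * rate_per_hour * rebuild_hours)


base = p_second_failure(8, 10, 1_000_000)
bunched = p_second_failure(8, 10, 1_000_000, correlation_factor=10.0)
print(f"independent: {base:.4%}  bunched (x10): {bunched:.4%}")
```

Both a longer rebuild window and a larger rank raise the probability, which is why the rebuild time discussed below matters.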
The human body replaces skin cells every day
So there are individual DDMs, manufactured by the suppliers above; disk systems, manufactured by IBM and others, and then your entire IT infrastructure. Beyond the disk system, you probably have redundant fabrics, clustered servers and multiple data paths, because eventually hardware fails.
You may know that the human body replaces skin cells every day. Other cells are replaced frequently, within seven days, and others less frequently, taking a year or so to be replaced. I'm over 40 years old, but most of my cells are less than 9 years old. This is possible because information, data in the form of DNA, is moved from old cells to new cells, keeping the infrastructure (my body) alive.
Our clients should take a more holistic view. You will likely replace disks within 3-5 years. While tape cartridges can retain their data for 20 years, most people change their tape drives every 7-9 years, so tape data needs to be moved from old to new cartridges. Focus on your information, not individual DDMs.
What does this mean for DDM failures? When one happens, the disk system re-routes requests to a spare disk, rebuilding the data from RAID 5 parity, giving storage admins time to replace the failed unit. During the few hours this process takes, you are either taking a backup, or crossing your fingers. Note: for RAID 5, the time to rebuild is proportional to the number of disks in the rank, so smaller ranks can be rebuilt faster than larger ranks. To make matters worse, the slower RPM speeds and higher capacities of ATA disks mean that the rebuild process could take longer than for smaller-capacity, higher-speed FC/SCSI disks.
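A simple two-bottleneck model shows why rank size and drive type both matter: the controller must read every surviving drive in the rank, and the spare must absorb a full drive's worth of writes, and whichever takes longer sets the rebuild time. This Python sketch is an assumption-laden approximation, not how any IBM controller schedules rebuilds, and every drive and controller figure in it is hypothetical.

```python
def rebuild_hours(disks_in_rank: int, capacity_gb: float,
                  controller_mb_per_sec: float, spare_write_mb_per_sec: float) -> float:
    """Rough RAID 5 rebuild time under a simple two-bottleneck model.

    The controller must read every surviving drive in the rank, and the spare
    must absorb a full drive's worth of writes; whichever takes longer wins."""
    read_mb = (disks_in_rank - 1) * capacity_gb * 1000
    controller_hours = read_mb / controller_mb_per_sec / 3600
    spare_hours = capacity_gb * 1000 / spare_write_mb_per_sec / 3600
    return max(controller_hours, spare_hours)


# Hypothetical drive and controller figures, for comparison only.
fc = rebuild_hours(8, 146, controller_mb_per_sec=200, spare_write_mb_per_sec=80)
ata = rebuild_hours(8, 500, controller_mb_per_sec=200, spare_write_mb_per_sec=50)
ata_small_rank = rebuild_hours(4, 500, controller_mb_per_sec=200, spare_write_mb_per_sec=50)
print(f"FC 8-disk rank:  {fc:.1f} h")
print(f"ATA 8-disk rank: {ata:.1f} h")
print(f"ATA 4-disk rank: {ata_small_rank:.1f} h")
```

In the smaller ATA rank, the spare's write speed rather than the rank size becomes the limit, so shrinking ranks only helps up to a point.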
According to the Google study, a large portion of the DDM replacements had no SMART errors to warn that it was going to happen. To protect your infrastructure, you need to make sure you have current backups of all your data. IBM TotalStorage Productivity Center can help identify all the data that is "at risk", those files that have no backup, no copy, and no current backup since the file was most recently changed. A well-run shop keeps their "at risk" files below 3 percent.
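The "at risk" test described above is easy to express. This Python sketch is a hypothetical illustration, not the TotalStorage Productivity Center interface: `FileRecord` and the sample paths and dates are made up, and it only checks backups, not other copies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FileRecord:
    path: str
    modified: datetime
    last_backup: Optional[datetime]  # None: never backed up


def at_risk(files: List[FileRecord]) -> List[FileRecord]:
    """Files with no backup, or whose latest backup predates the last change."""
    return [f for f in files
            if f.last_backup is None or f.last_backup < f.modified]


def at_risk_percent(files: List[FileRecord]) -> float:
    return 100.0 * len(at_risk(files)) / len(files) if files else 0.0


files = [
    FileRecord("/db/orders.dat", datetime(2007, 2, 20), datetime(2007, 2, 21)),
    FileRecord("/home/a/notes.txt", datetime(2007, 2, 22), datetime(2007, 2, 21)),
    FileRecord("/tmp/report.pdf", datetime(2007, 2, 19), None),
    FileRecord("/etc/app.conf", datetime(2007, 1, 5), datetime(2007, 2, 21)),
]
pct = at_risk_percent(files)
print(f"{pct:.0f}% at risk; target is below 3%")
```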
So, where does that leave us?
ATA drives are probably as reliable as FC/SCSI disk. Customers should choose which to use based on performance and workload characteristics. FC/SCSI drives are more expensive because they are designed to run at faster speeds, required by some enterprises for some workloads. IBM offers both, and has tools to help estimate which products are the best match to your requirements.
RAID 5 is just one of the many choices of trade-offs between cost and protection of data. For some data, JBOD might be enough. For other data that is more mission critical, you might choose keeping two or three copies. Data protection is more than just using RAID, you need to also consider point-in-time copies, synchronous or asynchronous disk mirroring, continuous data protection (CDP), and backup to tape media. IBM can help show you how.
Disk systems, and IT environments in general, are higher-level concepts to transcend the failures of individual components. DDM components will fail. Cache memory will fail. CPUs will fail. Choose a disk systems vendor that combines technologies in unique and innovative ways that take these possibilities into account, designed for no single point of failure, and no single point of repair.
So, Robin, from IBM's perspective, our hands are clean. Thank you for bringing this to our attention and for giving me the opportunity to highlight IBM's superiority at the systems level.
Continuing this week's theme about new products that were mentioned in last week's launch, today I will cover the new [S24 and S54 frames].
Before these new frames, customers had two choices for their tape cartridges: keep them in an automated tape library, or on an external shelf. Most of the critics of tape focus almost entirely on the problems related to the latter. When tapes are placed outside of automation, you need human intervention to find and fetch the tapes, tapes can be misplaced or misfiled, tapes can be dropped, tapes can get liquids spilled on them, and so on. These problems just don't happen when tapes are stored in automated tape libraries.
Until now, the number of cartridges was limited by the surface area of the wall accessible by the robotic picker. Whether the robot rotates in a circle picking from dodecagon walls, or moves back and forth along long rectangular walls, the problem was the same.
But what about tapes that may not need to be readily accessible, but should still be automated? With the new high-density frames, you can now stack tapes several cartridges deep, in spring-loaded deep shelves that push the tape cartridges up to the front one at a time. The high-density frame design might have been inspired by the famous [Pez] candy dispenser, but at 70.9 inches, it does not beat the [World's Tallest Pez Dispenser].
(Note: PEZ® is a registered trademark of Pez Candy, Inc.)
In a regular cartridge-only frame, like the D23, you have slots for 200 cartridges on the left and 200 cartridges on the right, and the robotic picker can pull out and push back cartridges into any of these slot positions. In the new S24, there are still 200 slots on the left, now referred to as "tier 0", but up to 800 cartridges on the right. Each slot on the right holds up to four 3592 cartridges; the position immediately reachable by the picker is referred to as "tier 1", and the ones tucked behind are "tier 2", "tier 3" and "tier 4".
[Diagram: S24 frame, showing one deep slot holding tape B in tier 1, with tapes C, D and E behind it in tiers 2, 3 and 4]
We have fun slow-motion videos we show customers on how these work. For example, in the diagram above, let's suppose you want to fetch Tape E in the "tier 4" position. The following sequence happens:
Robotic picker pulls "tier 1" tape cartridge B, and pushes it into another shelf slot. Tapes C, D and E get pushed up to be Tiers 1, 2 and 3 now.
Robotic picker pulls "tier 1" tape cartridge C, and puts it in another shelf slot. Tapes D and E get pushed up to be Tiers 1 and 2 now.
Robotic picker pulls "tier 1" tape cartridge D, and puts it in another shelf slot. Tape E gets pushed up to be Tier 1 now.
Robotic picker pulls "tier 1" tape cartridge E, this is the tape we wanted, and can move it to the drive.
The other three cartridges (B, C and D) are then pulled out of the temporary slot, and pushed back into their original order.
In this manner, the most recently referenced tape cartridges will be immediately accessible, and the least referenced ones will eventually migrate to the deeper tiers. The 3592 cartridges can be used with either TS1120 or TS1130 drives. Each cartridge can hold up to 3TB of data (1TB raw, at 3:1 compression), so the entire frame could hold 3PB in just 10 square feet of floor space. Five D23 frames could be consolidated down to two S24 frames. The S24 frame comes with "Capacity on Demand" pricing options. The base model of the S24 has just tiers 0, 1 and 2, for a total capacity of 600 cartridges. You can then license tiers 3 and 4 later when needed.
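The fetch sequence above, and the way frequently used cartridges drift toward the front, can be sketched by modeling one deep slot as a Python list with tier 1 at index 0. Two assumptions are mine, since the text doesn't say: the parked cartridges all go to a single temporary slot, and a dismounted cartridge is pushed back into its original slot at tier 1.

```python
from typing import List


def fetch(slot: List[str], wanted: str) -> List[str]:
    """Simulate fetching `wanted` from one spring-loaded deep slot.

    slot[0] is the tier 1 position the picker can reach; later entries sit
    deeper (tier 2, 3, 4). Mutates `slot` and returns the picker's moves."""
    if wanted not in slot:
        raise ValueError(f"{wanted} is not in this slot")
    moves: List[str] = []
    parked: List[str] = []
    while slot[0] != wanted:
        tape = slot.pop(0)          # pull the tier 1 cartridge; the spring moves the rest up
        parked.append(tape)         # park it in another shelf slot
        moves.append(f"park {tape}")
    slot.pop(0)
    moves.append(f"take {wanted} to drive")
    for tape in reversed(parked):   # push back deepest-first to restore the original order
        slot.insert(0, tape)
        moves.append(f"return {tape}")
    return moves


def return_to_slot(slot: List[str], tape: str) -> None:
    """Assumption: a dismounted cartridge is pushed back in at tier 1."""
    slot.insert(0, tape)


slot = ["B", "C", "D", "E"]         # tier 1 .. tier 4, as in the example above
for move in fetch(slot, "E"):
    print(move)
return_to_slot(slot, "E")
print(slot)
```

After one round trip, E sits in tier 1 and B, C and D have each moved one tier deeper, which is the gradual sorting by reference the paragraph describes.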
The S54 is similar in operation, but for LTO cartridges. It works with any mix of LTO-1, LTO-2, LTO-3 and LTO-4 cartridges. The left side holds tier 0 as before, but the right side holds up to five LTO cartridges deep. For Capacity on Demand pricing, the base model supports 660 cartridges (tiers 0, 1 and 2), with options to upgrade for an additional 660 cartridges. The total of 1320 cartridges could hold up to 2.1 PB of data (at 2:1 compression). One S54 frame could replace three traditional S53 frames that held only 440 LTO cartridges each.
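The capacity figures for both frames follow from slots per tier times tiers times capacity per cartridge. This Python sketch reproduces them; the 220 slots per tier for the S54 is my inference from the 660-cartridge, three-tier base model, and 1.6TB per LTO-4 cartridge is simply 800GB at the 2:1 compression quoted above.

```python
def frame_capacity(slots_per_tier: int, tiers: int, tb_per_cartridge: float):
    """Return (cartridges, TB) for a frame with `tiers` rows of `slots_per_tier` slots."""
    cartridges = slots_per_tier * tiers
    return cartridges, cartridges * tb_per_cartridge


# S24: 200 slots per tier; tier 0 on the left, tiers 1-4 on the right.
# 3592 cartridge: 1TB raw, 3TB at 3:1 compression.
s24_full, s24_tb = frame_capacity(200, 5, 3.0)
s24_base, _ = frame_capacity(200, 3, 3.0)  # tiers 0, 1 and 2

# S54: 220 slots per tier (inferred), tier 0 plus five deep on the right.
# LTO-4: 800GB raw, 1.6TB at 2:1 compression.
s54_full, s54_tb = frame_capacity(220, 6, 1.6)

print(f"S24: {s24_base} base, {s24_full} full, {s24_tb / 1000:.1f} PB")
print(f"S54: {s54_full} full, {s54_tb / 1000:.1f} PB")
print(f"D23 frames (400 slots) replaced by two S24s: {2 * s24_full // 400}")
print(f"S53 frames (440 slots) replaced by one S54: {s54_full // 440}")
```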
If you have both TS1100 series and LTO drives in your TS3500 tape library, then you can have both S24 and S54 frames side by side.
Continuing my week in Chicago, for the IBM Storage Symposium 2008, I attended two presentations on XIV.
XIV Storage - Best Practices
Izhar Sharon, IBM Technical Sales Specialist for XIV, presented best practices for using XIV in various environments. He started out explaining the innovative XIV architecture: through massive parallelism, a SATA-based disk system from IBM can outperform FC-based disk systems from other vendors. He used a sports analogy:
"The men's world record for running 800 meters was set in 1997 by Wilson Kipketer of Denmark in a time of 1:41.11.
However, if you have eight men running, 100 meters each, they will all cross the finish line in about 10 seconds."
Since XIV is already self-tuning, what kind of best practices are left to present? Izhar presented best practices for software, hosts, switches and storage virtualization products that attach to the XIV. Here are some quick points:
Use as many paths as possible.
IBM does not require you to purchase and install multipathing software as other competitors might. Instead, the XIV relies on the multipathing capabilities inherent to each operating system. For multipathing preference, choose Round-Robin, which is now available on AIX and VMware vSphere 4.0, for example. Otherwise, fixed-path is preferred over most-recently-used (MRU).
Encourage parallel I/O requests.
XIV architecture does not subscribe to the outdated notion of a "global cache". Instead, the cache is distributed across the modules to reduce performance bottlenecks. Each HBA on the XIV can handle about 1400 requests. If you have fewer than 1400 hosts attached to the XIV, you can further increase parallel I/O requests by specifying a large queue depth in the host bus adapter (HBA). An HBA queue depth of 64 is a good start. Additional settings might be required in the BIOS, operating system or application for multiple threads and processes.
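One way to reason about the 1400-request figure is as a budget shared by the hosts talking to one XIV port: if every host filled its HBA queue at once, the total should stay within that budget. This Python sketch is my own back-of-the-envelope rule of thumb built on that assumption, not an IBM formula; it simply caps the suggested per-host queue depth at the starting value of 64.

```python
XIV_REQUESTS_PER_PORT = 1400  # approximate figure quoted in the session


def suggested_queue_depth(hosts_per_port: int, start: int = 64) -> int:
    """Largest per-host HBA queue depth (capped at `start`) that keeps the
    worst case, every host filling its queue at once, within one port's budget.

    A back-of-the-envelope rule of thumb, not an IBM formula."""
    if hosts_per_port < 1:
        raise ValueError("need at least one host per port")
    budget = XIV_REQUESTS_PER_PORT // hosts_per_port
    return max(1, min(start, budget))


for hosts in (4, 20, 40, 200):
    print(hosts, "hosts ->", suggested_queue_depth(hosts))
```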
For sequential workloads, select a host stripe size less than 1MB. For random workloads, select a host stripe size larger than 1MB. Set rr_min_io between ten (10) and the queue depth (typically 64); setting it to half of the queue depth is a good starting point.
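On Linux, rr_min_io is the device-mapper multipath (multipath.conf) setting for how many I/Os are sent down one path before round-robin moves to the next; some newer releases use rr_min_io_rq instead, so check your distribution's documentation. This small Python sketch just encodes the session's guidance; the function names are hypothetical.

```python
def suggested_rr_min_io(queue_depth: int) -> int:
    """Half the HBA queue depth, kept within the range [10, queue_depth]."""
    if queue_depth < 10:
        raise ValueError("queue depth should be at least 10 for this rule")
    return min(max(queue_depth // 2, 10), queue_depth)


def host_stripe_hint(workload: str) -> str:
    """Host stripe size guidance from the session: <1MB sequential, >1MB random."""
    hints = {"sequential": "less than 1MB", "random": "larger than 1MB"}
    return hints[workload]


print("rr_min_io", suggested_rr_min_io(64))
print("sequential stripe:", host_stripe_hint("sequential"))
print("random stripe:", host_stripe_hint("random"))
```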
If you have long-running batch jobs, consider breaking them up into smaller steps and running them in parallel.
Define fewer, larger LUNs
Generally, you no longer need to define many small LUNs, a practice that was often required on traditional disk systems. This means that you can now define just 1 or 2 LUNs per application, and greatly simplify management. If your application must have multiple LUNs in order to do multiple threads or concurrent I/O requests, then, by all means, define multiple LUNs.
Modern Database Management Systems (DBMS) like DB2 and Oracle already parallelize their I/O requests, so there is no need for host-based striping across many logical volumes. XIV already stripes the data for you. If you use Oracle Automatic Storage Management (ASM), use 8MB to 16MB extent sizes for optimal performance.
For those virtualizing XIV with SAN Volume Controller (SVC), define managed disks as 1632GB LUNs, in multiples of six LUNs per managed disk group (MDG), to balance across the six interface modules. Set the SVC extent size to 1GB.
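Here is a quick planning sketch of that SVC guidance in Python: carve the usable XIV capacity into 1632GB LUNs, round the count down to a multiple of six, and note how many 1GB SVC extents each managed disk provides. The usable capacities in the example are hypothetical, and "per module" is only a bookkeeping view of the six-way balance.

```python
LUN_GB = 1632          # managed disk size recommended in the session
INTERFACE_MODULES = 6  # LUN count kept to a multiple of this
SVC_EXTENT_GB = 1


def plan_mdisks(usable_gb: int):
    """Return (LUN count, LUNs per interface module, SVC extents per MDisk)."""
    luns = usable_gb // LUN_GB
    luns -= luns % INTERFACE_MODULES  # round down to a multiple of six
    return luns, luns // INTERFACE_MODULES, LUN_GB // SVC_EXTENT_GB


luns, per_module, extents = plan_mdisks(50_000)  # hypothetical usable capacity
print(f"{luns} LUNs of {LUN_GB}GB, {per_module} per module, {extents} extents each")
```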
XIV is ideal for VMware. Create big LUNs for your VMFS that you can access via FCP or iSCSI.
Organize data to simplify Snapshots.
You no longer need to separate logs from databases for performance reasons. However, for some backup products like IBM Tivoli Storage Manager (TSM) for Advanced Copy Services (ACS), you might want to keep them separate for snapshot reasons. Generally, putting all data for an application on one big LUN greatly simplifies administration and snapshot processing, without losing performance. If you define multiple LUNs for an application, simply put them into the same "consistency group" so that their snapshots are all taken together.
OS boot image disks can be snapshot before applying any patches, updates or application software, so that if there are any problems, you can reboot to the previous image.
Employ sizing tools to plan for capacity and performance.
The SAP Quicksizer tool can be used for new SAP deployments, employing either the user-based or throughput-based sizing model approach. The result is in a mythical unit called "SAPS", which represents 0.4 IOPS for ERP/OLTP workloads, and 0.6 IOPS for BI/BW and OLAP workloads.
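Using the ratios quoted in the session, converting a Quicksizer result to an IOPS target is a single multiplication. This Python sketch applies them; the 20,000 SAPS input is a hypothetical example.

```python
IOPS_PER_SAPS = {"ERP/OLTP": 0.4, "BI/BW/OLAP": 0.6}  # ratios quoted in the session


def saps_to_iops(saps: float, workload: str) -> float:
    """Translate a SAPS sizing result into an approximate IOPS target."""
    return saps * IOPS_PER_SAPS[workload]


for workload in IOPS_PER_SAPS:
    print(workload, round(saps_to_iops(20_000, workload)), "IOPS")
```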
If you already have SAP or other applications running, use actual I/O measurements. IBM Business Partners and field technical sales specialists have an updated version of Disk Magic that can help size XIV configurations from PERFMON and iostat figures.
Lee La Frese, IBM STSM for Enterprise Storage Performance Engineering, presented internal lab test results for the XIV under various workloads, based on the latest hardware/software levels [announced two weeks ago]. Three workloads were tested:
Web 2.0 (80/20/40) - 80 percent READ, 20 percent WRITE, 40 percent cache hits for READ. YouTube, Flickr, and the growing list at [GoWeb20] are applications with heavy read activity, but because of [long-tail effects], may not be as cache friendly.
Social Networking (50/50/50) - 50 percent READ, 50 percent WRITE, 50 percent cache hits for READ. Lotus Connections, Microsoft SharePoint, and many other [social networking] applications are more write intensive.
Database (70/30/50) - 70 percent READ, 30 percent WRITE, 50 percent cache hits for READ. The traditional workload characteristics for most business applications, especially databases like DB2 and Oracle on Linux, UNIX and Windows servers.
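A rough way to compare these three mixes is to estimate the disk operations each one generates behind the cache. This Python sketch uses a simplified model of my own: every read miss goes to disk once, and every host write lands on two disks because XIV keeps two copies of data. It ignores write coalescing and everything else a real cache does, so treat the output as relative, not as lab results.

```python
from typing import NamedTuple


class Mix(NamedTuple):
    read_pct: int
    write_pct: int
    read_hit_pct: int


WORKLOADS = {
    "Web 2.0": Mix(80, 20, 40),
    "Social Networking": Mix(50, 50, 50),
    "Database": Mix(70, 30, 50),
}


def backend_iops(host_iops: int, mix: Mix, write_copies: int = 2) -> int:
    """Disk operations behind `host_iops` under a simplified model:
    read misses go to disk once; each host write lands on `write_copies`
    disks. Ignores write coalescing and other cache effects."""
    reads = host_iops * mix.read_pct // 100
    read_misses = reads * (100 - mix.read_hit_pct) // 100
    writes = host_iops * mix.write_pct // 100
    return read_misses + writes * write_copies


for name, mix in WORKLOADS.items():
    print(f"{name:18} {backend_iops(10_000, mix):>6} back-end IOPS per 10,000 host IOPS")
```

Under this model the write-heavy social networking mix puts the most load on the disks, even though it has the best read hit ratio of the three.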
The results were quite impressive. There was more than enough performance for tier 2 application workloads,and most tier 1 applications. The performance was nearly linear from the smallest 6-module to the largest 15-module configuration. Some key points:
A full 15-module XIV overwhelms a single SVC 8F4 node-pair. For a full XIV, consider 4 to 8 nodes of the 8F4 model, or 2 to 4 nodes of the 8G4. For read-intensive, cache-friendly workloads, an SVC in front of XIV was able to deliver over 300,000 IOPS.
A single-node TS7650G ProtecTIER can handle 6 to 9 XIV modules. Two nodes of TS7650G were needed to drive a full 15-module XIV. A single-node TS7650 in front of XIV was able to ingest 680 MB/sec on the seventh day of a test workload with a 17 percent per-day change rate, using 64 virtual drives. Reading the data back got over 950 MB/sec.
For SAP environments where response times of 20-30 msec are acceptable, the 15-module XIV delivered over 60,000 IOPS. Reducing the load to 25,000-30,000 IOPS cut the response time to a faster 10-15 msec.
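Little's Law (average I/Os in flight = throughput × response time) ties these two SAP data points together. This Python sketch plugs in the midpoints of the quoted ranges; choosing midpoints is my simplification.

```python
def outstanding_ios(iops: float, response_ms: float) -> float:
    """Little's Law: average I/Os in flight = throughput x response time."""
    return iops * response_ms / 1000.0


busy = outstanding_ios(60_000, 25.0)     # midpoint of 20-30 msec
lighter = outstanding_ios(27_500, 12.5)  # midpoints of 25,000-30,000 IOPS and 10-15 msec
print(f"at 60,000 IOPS: {busy:.0f} I/Os in flight")
print(f"at 27,500 IOPS: {lighter:.0f} I/Os in flight")
```

Roughly doubling the throughput took about four times as many I/Os in flight, consistent with queues building up as the system is pushed harder.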
These were all done as internal lab tests. Your mileage may vary.
Not surprisingly, XIV was quite the popular topic here this week at the Storage Symposium. There were many more sessions, but these were the only two that I attended.
Well, it's Tuesday, and you know what that means? IBM announcements!
Today we had several for the IBM System Storage product line. Here are some of them:
DS8000 gets thinner, leaner and faster
The 4.3 level of microcode for the IBM System Storage DS8000 series disk systems [announced enhancements] for both fixed block architecture (FBA) LUNs and count key data (CKD) volumes.
For FBA LUNs that attach to Linux, UNIX and Windows distributed systems, IBM announced native DS8000 Thin Provisioning support. Of course, many people already had this by putting IBM System Storage SAN Volume Controller (SVC) in front, but now DS8000 clients without SVC can also achieve the benefits of thin provisioning. This support also makes quick initialization a whopping 2.6 times faster.
For CKD volumes attached to z/OS on System z mainframes, IBM announced zHPF multitrack support for z/OS 1.9 and above. zHPF (High Performance FICON for System z) can now handle multitrack I/O transfers for even better performance with zFS, HFS, PDSE, and extended striped data sets.
XIV gets better connected
A lot of XIV [announced enhancements] and preview announcements centered around better connectivity. Here's a rundown:
Better host attachment connectivity by beefing up the interface modules that hold the FCP and iSCSI interface cards. XIV disk arrays have 3 to 6 of these in different configurations, and since they both manage their own disks and receive host I/O requests for other disks, they are basically doing double duty. These interface modules can now be ordered as [Dual-CPU] modules.
Better infrastructure management by connecting XIV with the industry standard SMI-S interface to IBM Tivoli Storage Productivity Center. Now, XIV can be part of the single pane of glass console that manages all of your other disk arrays, tape libraries and SAN fabrics.
Better copy services for backups by connecting XIV with IBM Tivoli Storage Manager Advanced Copy Services. TSM for Advanced Copy Services is application aware and can coordinate XIV Snapshots similar to its current support for SVC and DS8000 FlashCopy capabilities.
Better connectivity to security systems by supporting LDAP credentials. Before, you had an individual userid and password for each XIV, and these were probably different from all the other userid/password combinations you have for every other box on your data center floor. IBM is working on getting all products to support the Lightweight Directory Access Protocol, or [LDAP], so that we can reach the nirvana of "single sign-on", one userid/password per administrator for all IT devices in the company.
Better support with flexible warranty periods and non-disruptive code load options.
Better remote copy support by connecting to sites far, far away. IBM previewed that it will provide native asynchronous disk mirroring from one XIV to another XIV. Before this, XIV's synchronous mirroring was limited to 300km distances. Many of our clients do long-distance global mirroring of their XIV today behind an SVC, but again, for those that don't yet have an SVC, this can be a reasonable alternative.
TS7650 ProtecTIER data deduplication appliance now offers "no dedupe" option
In what some might consider a surprising move, IBM announced a "no dedupe" licensing option on its premier deduplication solution, which somewhat reminds me of IBM's NOCOPY option on DS8000 FlashCopy. At first I thought "Are you kidding me?!?!" However, this new license option allows the TS7650 appliance to compete on an even playing field with other virtual tape libraries (VTL) that do not offer deduplication capability. It also allows the TS7650 to be used for data that doesn't dedupe very well, such as seismic recordings, satellite images, or what have you. There are also clients who do not yet feel comfortable deduping their financial records for compliance reasons. This option now allows IBM to withdraw the TS7530 non-dedupe library from marketing. Having one technology that does both dedupe and no-dedupe is better than offering two separate libraries based on different technologies.
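Why some data barely dedupes can be shown with a generic sketch. This Python example uses fixed-size chunks keyed by SHA-256 hashes, a common textbook approach; it is only an illustration, and ProtecTIER's HyperFactor technology uses its own method, not this one. Random bytes stand in for data like seismic recordings that has no repeated content.

```python
import hashlib
import os


def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Logical bytes / stored bytes using fixed-size chunks keyed by SHA-256.

    A generic illustration only, not ProtecTIER's HyperFactor method."""
    if not data:
        return 1.0
    stored = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        stored.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return len(data) / sum(stored.values())


daily_fulls = os.urandom(64 * 1024) * 7  # the same data backed up seven times
seismic = os.urandom(7 * 64 * 1024)      # stand-in for data with no repeats
print(f"repeated backups: {dedupe_ratio(daily_fulls):.1f}:1")
print(f"seismic-like data: {dedupe_ratio(seismic):.1f}:1")
```

When the ratio is 1:1, deduplication only adds processing, which is the case the "no dedupe" license option addresses.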
The ProtecTIER series also announced [IP remote distance replication]. This can be used to replicate virtual tape cartridges in one ProtecTIER over to another ProtecTIER at a remote location. You can decide to replicate all or just a subset of your virtual tapes, and this feature can be used to migrate, merge or split ProtecTIER configurations as your needs grow. Before this support, our TS7650G clients replicated the disk repository using native disk array replication technology, such as Global Mirror on the DS8000, but that meant that all data was replicated over to the secondary site. Now, with this new IP replication feature, you can be selective, and replicate only those virtual tapes that are mission critical.
The appliance now supports up to 36TB of disk capacity, and the new "IBM i" operating system on System i servers, formerly known as i5/OS.
GPFS does Windows
IBM's General Parallel File System (GPFS) has the lion's share of the file systems used in the [Top 500 Supercomputers]. For a while, it was limited to just Linux and AIX operating system support, but version 3.3 [extends this to Windows 2008 on 64-bit architectures]. GPFS is the file system used in IBM's Scale-Out File Services, the underlying technology of IBM's Cloud Computing and Storage offerings.
Well, it's Tuesday, and that means IBM announcements! Today is bigger than usual, as there are a lot of Dynamic Infrastructure announcements throughout the company with a common theme: cloud computing and smart business systems that support the new way of doing things. Today, IBM announced its new "IBM Smart Archive" strategy that integrates software, storage, servers and services into solutions that help meet the challenges of today and tomorrow. IBM has spent the past few years working across its various divisions and acquisitions to ensure that our clients have complete end-to-end solutions.
IBM is introducing new "Smart Business Systems" that can be used on-premises for private-cloud configurations, as well as by cloud-computing companies to offer IT as a service.
IBM [Information Archive] is the first to be unveiled, a disk-only or blended disk-and-tape Information Infrastructure solution that offers a "unified storage" approach with amazing flexibility for dealing with various archive requirements:
For those with applications using the IBM Tivoli Storage Manager (TSM) or IBM System Storage Archive Manager (SSAM) API of the IBM System Storage DR550 data retention solution, the Information Archive will provide a direct migration, supporting this API for existing applications.
For those with IBM N series using SnapLock or the File System Gateway of the DR550, the Information Archive will support various NAS protocols, deployed in stages, including NFS, CIFS, HTTP and FTP access, with Non-Erasable, Non-Rewriteable (NENR) enforcement that is compatible with current IBM N series SnapLock usage.
For those using NAS devices with PACS applications to store X-rays and other medical images, the Information Archive will provide similar NAS protocol interfaces. Information Archive will support both read-only data such as X-rays, as well as read/write data such as Electronic Medical Records.
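NENR enforcement boils down to two rules: a stored object cannot be overwritten, and it cannot be deleted until its retention period expires. The toy in-memory store below illustrates just those two rules under that assumption; the class and method names are hypothetical and do not represent the Information Archive, SnapLock or SSAM interfaces.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple


class RetentionError(Exception):
    """Raised when an operation would break NENR rules."""


class NenrStore:
    """Toy NENR store: no overwrites, and no deletes before expiry."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, datetime]] = {}

    def put(self, name: str, data: bytes, retain_for: timedelta,
            now: datetime) -> None:
        if name in self._objects:
            raise RetentionError(f"{name} is non-rewriteable")
        self._objects[name] = (data, now + retain_for)

    def get(self, name: str) -> bytes:
        return self._objects[name][0]

    def delete(self, name: str, now: datetime) -> None:
        _, expires = self._objects[name]
        if now < expires:
            raise RetentionError(f"{name} is retained until {expires:%Y-%m-%d}")
        del self._objects[name]


t0 = datetime(2009, 1, 1, tzinfo=timezone.utc)
store = NenrStore()
store.put("xray-0001.dcm", b"image bytes", timedelta(days=7 * 365), now=t0)
try:
    store.delete("xray-0001.dcm", now=t0 + timedelta(days=30))
except RetentionError as err:
    print("blocked:", err)
```

Read/write data such as Electronic Medical Records would simply live outside these rules, which is why a unified archive needs both modes.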
Information Archive is not just for compliance data that was previously sent to WORM optical media. Instead, it can handle all kinds of data: rewriteable data, read-only data, and data that needs to be locked down for tamper protection. It can handle structured databases, emails, videos and unstructured files, as well as objects stored through the SSAM API.
The Information Archive has all the server, storage and software integrated together into a single machine type/model number. It is based on IBM's General Parallel File System (GPFS) to provide incredible scalability, the same clustered file system used by many of the top 500 supercomputers. Initially, Information Archive will support up to 304TB raw capacity of disk and petabytes of tape. You can read the [Spec Sheet] for other technical details.
For those who prefer a more "customized" approach, similar to IBM Scale-Out File Services (SoFS), IBM has [Smart Business Storage Cloud]. IBM Global Services can customize a solution that is best for you, using many of the same technologies. In fact, IBM Global Services announced a variety of new cloud-computing services to help enterprises determine the best approach.
In a related announcement, IBM introduced [LotusLive iNotes], which you can think of as a "business-ready" version of Google Apps, Gmail and Google Calendar. IBM is focused on security and reliability, but leaves out the advertising and data mining that people have been forced to tolerate from consumer-oriented Web 2.0-based solutions. IBM clients who are already familiar with the on-premises version of Lotus Notes will have no trouble using LotusLive iNotes.
There was actually a lot more announced today, which I will try to get to in later posts.
Last week, on January 31, two of my colleagues retired from IBM. At IBM, retirements always happen on the last day of the month. Here are my memories of each, listed alphabetically by last name.
Mark Doumas retires after working 32 years with IBM. Mark was my manager for a few months in 2003. Back then, IBM was working on launching a variety of new products, including the IBM SAN File System (SFS), the IBM SAN Volume Controller (SVC), a new release of Tivoli Storage Manager (TSM), and TotalStorage Productivity Center (TPC), which was later renamed to IBM Tivoli Storage Productivity Center.
Mark was manager of the portfolio management team, and I was asked to manage the tape systems portfolio. I am no stranger to tape, as one of my 19 patents is for the pre-migration feature of the IBM 3494 Virtual Tape Server (VTS). The portfolio included LTO and Enterprise tape drives, tape libraries and virtual tape systems. My job was to help decide how much of IBM's money we should invest in each product area. This was less of a technical role, and more of a business-oriented project management position.
Portfolio management is actually part of a chain of project management roles. At the lowest level are team leads that manage individual features, referred to as line items of a release. Release managers are responsible for all the line items of a particular release. Product managers determine which line items will be shipped in which release, and often have to balance across three or more releases. Architects help determine which products in a portfolio should have certain features. Since I was chief architect for DFSMS and Productivity Center, stepping up to portfolio manager was naturally the next rung on the career ladder.
(Side note: If you were wondering why I was only a few months on the job, it was because I was offered an even better position as Technical Evangelist for SVC. See my 2007 blog post [The Art of Evangelism] for a humorous glimpse of the kind of trouble I got in with that title on my business card!)
While my stint in this role was brief, I am still considered an honorary member of the tape development team. Nearly every week I present an overview of our tape systems portfolio at the Tucson Executive Briefing Center, or on the road at conferences and marketing events.
This year, 2012, marks the 60th anniversary of IBM Tape, but I will save that for a future post!
Jim is an IBM Fellow for IBM Systems and Technology Group. There are only 73 IBM Fellows currently working for IBM, and this is the highest honor IBM can bestow on an employee. He has been working with IBM since 1968 and now retires after 44 years! Jim was tasked with predicting the future of IT, and with helping to drive strategic direction for IBM. Cost pressures, requirements for growth, accelerating innovation and changing business needs help influence this direction.
Many consider Jim one of the fathers of server virtualization. For those who think VMware invented the concept of running multiple operating systems on a single host machine, guess again! IBM developed the first server hypervisor in 1967, and introduced the industry's first [official VM product on August 2, 1972] for the mainframe.
When I joined IBM in 1986, my first job was to work on what was then called DFHSM software for the MVS operating system. Each software engineer had unlimited access to his or her own VM instance of a mainframe for development and testing. This was way better than what we had in college, having to share time on systems for only a few minutes or hours per day. Today, DFHSM is now called the DFSMShsm component of DFSMS, an element of the z/OS operating system.
At various conferences like [SHARE] and [WAVV] we celebrated VM's 25th anniversary in 1997, and its 30th anniversary in 2002. Today, it is called z/VM and IBM continues to invest in its future. Last October, IBM announced the [z/VM 6.2] release, which provides Live Guest Relocation (LGR) to seamlessly move VM guest images from one mainframe to another, similar to PowerVM's Live Partition Mobility or VMware's VMotion.
Lately, it seems employees at other companies jump from job to job, and from employer to employer, on average every 4.1 years. According to [National Longitudinal Surveys] conducted by the [U.S. Government's Bureau of Labor Statistics], the average baby boomer holds 11 jobs. In contrast, it is quite common to see IBMers work the majority of their career at IBM.
The next time you have a tasty beverage in your hand, raise your glass! To Mark and Jim, you have earned our respect, and you both have certainly earned your retirement!
Continuing my week in Chicago for the IBM Storage and Storage Networking Symposium and System x and BladeCenter Technical Conference, I presented a variety of topics.
Hybrid Storage for a Green Data Center
The cost of power and cooling has risen to become the #1 concern for many data centers. I presented the following hybrid storage solutions that combine disk with tape. These provide the best of both worlds: the high-performance access time of disk with the lower costs and reduced energy consumption of tape.
IBM [System Storage DR550] - IBM's Non-erasable, Non-rewriteable (NENR) storage for archive and compliance data retention
IBM Grid Medical Archive Solution [GMAS] - IBM's multi-site grid storage for PACS applications and electronic medical records [EMR]
IBM Scale-out File Services [SoFS] - IBM's scalable NAS solution that combines a global name space with a clustered GPFS file system, serving as the ideal basis for IBM's own [Cloud Computing and Storage] offerings
Not only do these help reduce energy costs, they provide an overall lower total cost of ownership (TCO) than traditional WORM optical or disk-only storage configurations.
The Convergence of Networks - Understanding SAN, NAS and iSCSI in the Data Center Network
This turned out to be my most popular session. Many companies are at a crossroads in choosing data and storage networking solutions in light of recent announcements from IBM and others. In the span of 75 minutes, I covered:
Block storage concepts, storage virtualization and RAID levels
File system concepts, how file systems map files to block storage
Network Attached Storage, the history of the NFS and CIFS protocols, Pros and Cons of using NAS
Storage Area Networks, the history of SAN protocols including ESCON, FICON and FCP, Pros and Cons of using SAN
IP SAN technologies, iSCSI and Fibre Channel over Ethernet (FCoE), Pros and Cons of using this approach
Network Convergence with InfiniBand and Fibre Channel over Convergence Enhanced Ethernet (FCoCEE), why InfiniBand was not historically adopted in the marketplace as a storage protocol, and the features and enhancements of Convergence Enhanced Ethernet (CEE) needed to merge NAS, SAN and iSCSI traffic onto a single converged data center network [DCN]
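Of the block storage concepts above, RAID levels are the easiest to show in a few lines. Here is a minimal sketch of single-parity (RAID-5 style) protection for one stripe, assuming equal-length blocks: the parity block is the XOR of the data blocks, so any one lost block equals the XOR of all survivors. It ignores rotating parity placement across drives, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Return the data blocks followed by their XOR parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover a single missing block (None) as the XOR of the survivors."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost block")
    restored = list(stripe)
    restored[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return restored


stripe = make_stripe([b"ABCD", b"EFGH", b"IJKL"])  # 3 data disks + 1 parity
degraded = list(stripe)
degraded[1] = None                                  # simulate a failed drive
print(rebuild(degraded) == stripe)
```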
Yes, it was a lot of information to cover, but I managed to get it done on time.
IBM Tivoli Storage Productivity Center version 4.1 Overview and Update
In conferences like these, there are two types of product-level presentations. An "Overview" explains how products work today to those who are not familiar with them. An "Update" explains what's new in this version of the product for those who are already familiar with previous releases. I decided to combine these into one session for IBM's new version of [Tivoli Storage Productivity Center]. I was one of the original lead architects of this product many years ago, and was able to share many personal experiences about its evolution in development and in the field at client facilities. Analysts have repeatedly rated IBM Productivity Center as one of the top Storage Resource Management (SRM) tools available in the marketplace.
Information Lifecycle Management (ILM) Overview
Can you believe I have been doing ILM since 1986? I was the lead architect for DFSMS, which provides ILM support for z/OS mainframes. In 2003-2005, I spent 18 months in the field performing ILM assessments for clients, and now there are dozens of IBM practitioners in Global Technology Services and STG Lab Services who do this full time. This is a topic I cover frequently at the IBM Executive Briefing Center [EBC], because it addresses several top business challenges:
Reducing costs and simplifying management
Improving efficiency of personnel and application workloads
Managing risks and regulatory compliance
IBM has a solution based on five "entry points". The advantage of this approach is that it allows our consultants to craft the right solution to meet the specific requirements of each client situation. These entry points are:
Tiered Information Infrastructure - we don't limit ourselves to just "Tiered Storage", as storage is only part of a complete [information infrastructure] of servers, networks and storage
Storage Optimization and Virtualization - including virtual disk, virtual tape and virtual file solutions
Process Enhancement and Automation - an important part of ILM are the policies and procedures, such as IT Infrastructure Library [ITIL] best practices
Archive and Retention - space management and data retention solutions for email, database and file systems
I did not get as many attendees as I had hoped for this last one, as I was competing head-to-head in the same time slot as Lee La Frese covering IBM's DS8000 performance with Solid State Disk (SSD) drives, John Sing covering Cloud Computing and Storage with SoFS, and Eric Kern covering IBM Cloudburst.
I am glad that I was able to make all of my presentations at the beginning of the week, so that I can then sit back and enjoy the rest of the sessions as a pure attendee.
We had a great event today! This was a first-of-a-kind product launch, using Second Life as the medium. We invited IBM Business Partners, industry analysts and reporters from the Press to have their "avatars" in-world to watch us launch new tape systems, archive and retention systems, and disk systems announced this month.
Andy Monshaw, IBM System Storage General Manager, welcomed everyone to the event, and introduced our three speakers. He mentioned that this was a great innovative way to meet, collaborate and forge relationships without the carbon pollution associated with travel required by a more traditional face-to-face meeting. We had attendees from the USA, UK, Germany, Sweden, Italy, Colombia, and Brazil.
All the attendees were given a "goody bag" that contained IBM BP-logo clothing, animations and gestures to be used during the meeting.
Eric Buckley, one of our marketing managers for tape systems, introduced our complete line of LTO 4 tape systems, as well as the TS7520 Virtualization Engine, a virtual tape library for Windows, UNIX and Linux servers. Eric had a virtual 3-D version of an LTO cartridge that is photo-realistic and dimensionally correct.
Funda Eceral, our solutions manager for archive and retention offerings, presented the new version of the IBM System Storage DR550, the DR550 file system gateway, and the IBM System Storage Multilevel Grid Archive Manager. At first we thought we would "pass the microphone" from speaker to speaker, but it turned out to be easier just to give all three speakers their own microphone.
Last, but not least, was David Tareen, marketing manager for disk systems, covering the entry-level DS3000 Express disk system bundles designed for our SMB clients. David used a black-and-brown pointer stick to point out specific things on the charts.
After the presentations, Kristie Bell, VP of Marketing for IBM System Storage, hosted a Question & Answer (Q&A) panel. Avatars raised their left hands to indicate they had a question.
We thought it would be a good idea to have a few minutes at the end to socialize over a cup of coffee. This involved making a "coffee machine" that dispensed coffee, and the appropriate animations and gestures so that everyone could sip the coffee, and hold the coffee at waist level when they were talking.
The event was held upstairs in one of the conference rooms of the IBM Briefing Center, located on "IBM 8" island. Many people went to the ground floor to look at the many IBM System Storage products on display. Unlike a picture on a web page, Second Life gives you a 3-D view so that you can walk around each product and get a feel for the size and shape of the hardware.
We had four photographers and camera-persons on hand to capture still shots, video, audio, and chat text, and are working now to combine them for marketing collateral. I want to thank the builders, script programmers, animators, clothing designers, speakers, editors, and channel enablement team for making this event such a great success!
Two weeks ago, I mentioned in my post [Pulse 2008 - Day 2 Breakout sessions] that Henk de Ruiter from ABN Amro bank presented his success story implementing Information Lifecycle Management (ILM) across his various data centers. I am no stranger to ABN Amro, having helped the "ABN" and "Amro" banks merge their mainframe data in 1991. Henk has agreed to let me share more of this success story with my readers here on my blog:
Back in December 2005, Henk and his colleagues came to visit the IBM Tucson Executive Briefing Center (EBC) to hear about IBM products and services. At the time, I was part of our "STG Lab Services" team that performed ILM assessments at client locations. I explained to ABN Amro that the ILM methodology does not require an all-IBM solution, and that ILM could even provide benefits with their current mix of storage, software and service providers. The ABN Amro team liked what I had to say, and my team was commissioned to perform ILM assessments at three of their data centers:
Sao Paulo (Brazil)
Chicago, IL (USA)
Each data center had its own management, its own decision making, and its own set of issues, so we structured each ILM assessment independently. When we presented our results, we showed what each data center could do better with their existing mixed bag of storage, software and service providers, and also showed how much better their life would be with IBM storage, software and services. They agreed to give IBM a chance to prove it, and so a new "Global Storage Study" was launched to take the recommendations from our three ILM studies and flesh out the details to make a globally integrated enterprise work for them. Once completed, it was renamed the "Global Storage Solution" (GSS).
Henk summarized the above with "I am glad to see Tony Pearson in the audience, who was instrumental to making this all happen." As with many client testimonials, he presented a few charts on who ABN Amro is today: the 12th largest bank worldwide, and 8th largest in Europe. They operate in 53 countries and manage over a trillion euros in assets.
They have over 20 data centers, with about 7 PB of disk and over 20 PB of tape, both growing at 50 to 70 percent CAGR. About 2/3 of their operations are now outsourced to IBM Global Services; the remaining 1/3 is non-IBM equipment managed by a different service provider.
ABN Amro deployed IBM TotalStorage Productivity Center, various IBM System Storage DS family disk systems, SAN Volume Controller (SVC), Tivoli Storage Manager (TSM), Tivoli Provisioning Manager (TPM), and several other products. Armed with these products, they performed the following:
Clean Up. IBM uses the term "rationalization" to refer to the assignment of business value, to avoid confusion with the term "classification", which many in IT associate with identifying ownership and read and write authorization levels. Often, in the initial phases of an ILM deployment, a portion of the data is determined to be eligible for clean up, either to be moved to a lower-cost tier or to be deleted immediately. ABN Amro and IBM set a goal to identify at least 20 percent of their data for clean up.
New tiers. Rather than traditional "storage tiers", which are often just Tier 1 for Fibre Channel disk and Tier 2 for SATA disk, ABN Amro and IBM came up with seven "information infrastructure tiers" that incorporate service levels, availability and protection status. They are:
High-performance, Highly-available disk with Remote replication.
High-performance, Highly-available disk (no remote replication)
Mid-performance, high-capacity disk with Remote replication
Mid-performance, high-capacity disk (no remote replication)
Non-erasable, Non-rewriteable (NENR) storage employing a blended disk and tape solution.
Enterprise Virtual Tape Library with remote replication and back-end physical tape
Mid-performance physical tape
These tiers are applied equally across their mainframe and distributed platforms. All of the tiers are priced per "primary GB", so any additional capacity required for replication or point-in-time copies, either local or remote, is folded into the base price. ABN Amro felt a mission-critical application on Windows or UNIX deserves the same Tier 1 service level as a mission-critical mainframe application. Exactly!
Deployed storage virtualization for disk and tape. This involved the SAN Volume Controller and the IBM TS7000 series library.
Implemented workflow automation. The key product here is IBM Tivoli Provisioning Manager.
Started an investigation into HSM on distributed platforms. This would be policy-based space management to migrate less frequently accessed Windows or UNIX data to the TSM pool.
While the deployment is not yet complete, ABN Amro feels they have already recognized business value:
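A policy for that kind of space management is typically a filter on last-access age and file size. The sketch below assumes a hypothetical 90-day idle threshold and 1 MiB minimum size, and orders candidates oldest first so the coldest data moves first; the `FileInfo` type and thresholds are illustrative only, and a real HSM client would also handle the actual move (typically leaving a stub behind), which is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    last_access: datetime


def migration_candidates(files: List[FileInfo], now: datetime,
                         min_idle_days: int = 90,
                         min_size_bytes: int = 1 << 20) -> List[FileInfo]:
    """Files idle longer than min_idle_days and at least min_size_bytes,
    oldest first, so the coldest data is migrated to the pool first."""
    cutoff = now - timedelta(days=min_idle_days)
    eligible = [f for f in files
                if f.last_access < cutoff and f.size_bytes >= min_size_bytes]
    return sorted(eligible, key=lambda f: f.last_access)


now = datetime(2008, 3, 1)
files = [
    FileInfo("/projects/plan.doc", 5 << 20, datetime(2008, 2, 20)),
    FileInfo("/archive/2006.zip", 200 << 20, datetime(2006, 12, 31)),
    FileInfo("/archive/notes.txt", 2 << 10, datetime(2007, 1, 15)),
    FileInfo("/scans/q3.tif", 40 << 20, datetime(2007, 9, 30)),
]
for f in migration_candidates(files, now):
    print("migrate", f.path)
```

Skipping small files is a common design choice because a tiny file saves little primary space but still costs a recall when someone opens it.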
Reduced cost by identifying data that should be stored on lower tiers
Simplified management, consolidated across all operating systems (mainframe, UNIX, Windows)
Increased utilization of existing storage resources
Reduced manual effort through policy-based automation, which can lead to fewer human errors and faster adaptability to new business opportunities
Standardized backup and other operational procedures
Henk and the rest of ABN Amro are quite pleased with the progress so far, although the recent takeover of ABN AMRO by a consortium of banks means that the model has so far been implemented only in Europe. Further rollout depends on the storage strategy of the new owners. Nonetheless, I am glad that I was able to work with Henk, Jason, Barbara, Steve, Tom, Dennis, Craig and others to be part of this from the beginning and to see it roll out successfully over the years.
IBM makes another breakthrough today with an announcement about tape data density. Unlike hard disk drive technologies that are hitting physical limits, IBM is proving that tape technology still has plenty of life in its future.
When I first started working for IBM in Tucson, back in 1986, a 3420 tape reel held only 180MB of data, and a 3480 tape cartridge improved this to 200MB of data. Today's enterprise tapes, like 3592 cartridges for the TS1130 drives, or LTO4 cartridges for the IBM TS1040 drives, are half-inch wide, half-mile long, and can store 1 TB or more of data per cartridge, depending on how well the data can compress. To increase cartridge capacity, designers can make changes in three dimensions:
Wider tape: The film industry tried this, going from 35mm to 70mm film, only to find that most cinemas did not want to upgrade their equipment. Keeping the media dimensions to half inch wide allows much of the engineering hardware to continue unchanged.
Longer tape: The problem with longer tape is that either the reel inside gets fatter, or you need to develop thinner media to fit within the existing cartridge dimensions. Fatter reels mean bigger external cartridge dimensions, forcing changes to shelving units, cartridge trays, and carrying units. The media just can't get any thinner without risking becoming more brittle.
Denser bit recording: once a convenient width and length were established, improving bit density turned out to be the best way to increase cartridge capacity.
Working with FujiFilm Corporation of Japan, my colleagues at the IBM Research facility in Zurich were able to demonstrate an incredible 29.5 Gigabits per square inch, nearly 40 times more dense than today's commercial tape technology. In the near future, we will be able to hold a 35TB tape cartridge in our hand. A lot went into making this happen: improved giant magnetoresistive read/write heads, better servo patterns to stay on track, narrower tracks less than a micron wide, and better signal-to-noise processing. To learn more, you can read the [Press Release] or watch this quick [4-minute YouTube video].
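With width and length held constant, cartridge capacity is simply areal density times recording area, so it grows linearly with density. Here is a back-of-envelope sketch of that arithmetic. The 800 m tape length is an illustrative assumption, not a figure from the announcement, and `usable_fraction` is a hypothetical allowance for servo bands, guard bands and format overhead, so the default of 1.0 gives an upper bound rather than a real cartridge rating.

```python
def raw_capacity_tb(density_gbit_per_in2: float, width_in: float,
                    length_m: float, usable_fraction: float = 1.0) -> float:
    """Cartridge capacity in decimal TB from areal density x recording area.

    usable_fraction is an assumed allowance for servo bands, guard bands
    and format overhead; 1.0 gives an upper bound.
    """
    length_in = length_m / 0.0254       # metres -> inches
    area_in2 = width_in * length_in
    gigabits = density_gbit_per_in2 * area_in2 * usable_fraction
    return gigabits / 8 / 1000          # Gbit -> GB -> TB


# Half-inch tape at the demonstrated 29.5 Gbit/in^2; the 800 m length is
# an illustrative assumption, not a figure from the announcement.
print(f"upper bound: {raw_capacity_tb(29.5, 0.5, 800):.1f} TB")
```

With these assumed inputs it prints an upper bound of about 58.1 TB, which leaves room for the format overhead that a real 35TB cartridge would have to absorb.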
Federal Rules for Civil Procedures (FRCP) will increase adoption of unstructured data classification, email archive systems and CAS.
CAS continues to flounder, but the rest I can agree with. Regulations are being adopted worldwide. Japan has its own Sarbanes-Oxley (SOX) style legislation going into effect in 2008. IBM TotalStorage Productivity Center for Data is a great tool to help classify unstructured file systems. IBM CommonStore for email supports both Microsoft Exchange and Lotus Domino, and can be connected to the IBM System Storage DR550 for compliance storage.
Unified storage systems (combined file and block storage target systems) will become increasingly attractive in 2007, because of their ease of use and simplicity.
I agree with this one also. Our sales of IBM N series in 2006 were great, and we look to continue this strong growth in 2007. The IBM N series brings together FCP, iSCSI and NAS protocols into one disk system. With the SnapLock(tm) feature, N series can store both rewriteable data and non-erasable, non-rewriteable data on the same box. Combine the N series gateway on the front-end with SAN Volume Controller on the back-end, and you have an even more powerful combination.
Distributed ROBO backup to disk will emerge as the fastest growing data protection solution in 2007.
IDC had a similar prediction for 2006. ROBO refers to "Remote Office/Branch Office", so ROBO backup deals with how to back up data that is out in the various remote locations. Do you back it up locally, or send it to a central location? Fortunately, IBM Tivoli Storage Manager (TSM) supports both ways, and IBM has introduced small disk and tape drives and auto-loaders that can be used in smaller environments like this. I don't know whether "backup to disk" will be the fastest growing, but I certainly agree that a variety of ROBO-related issues will be of interest this year.
2007 will be remembered as the year iSCSI SAN took off because of the much reduced pricing for 10 Gbit iSCSI and the continued deployment of 10 Gbit iSCSI targets.
While I agree that iSCSI is important, I can't say 2007 will be remembered for anything. We have terrible memories for these things. Ask someone what year Personal Computers (PC) took off, and they will tell you about Apple's famous 1984 commercial. Ask someone when the Internet took off, or when cell phones took off, and I suspect most will give widely different answers, most likely based on their own experience.
For the longest time, I resisted getting a cell phone. I had a roll of quarters in my car, and when I needed to make a call, I stopped at the nearest pay phone and made the call. In 1998, pay phones disappeared. You can't find them anymore. That was the year cell phones took off, at least for me.
Back to iSCSI, now that you can intermix iSCSI and SAN on the same infrastructure, either through intelligent multi-protocol switches available from your local IBM rep, or through an N series gateway, you can bring iSCSI technology in slowly and gradually. Low-cost copper wiring for 10 Gbps Ethernet makes all this very practical.
Another up-and-coming technology is AoE, or ATA-over-Ethernet. Same idea as iSCSI, but taken down to the ATA level.
CDP will emerge as an important feature on comprehensive data protection products instead of a separate managed product.
Here, CDP stands for Continuous Data Protection. Normal backups work like a point-and-shoot camera, taking a picture of the data once a day, at midnight for example. CDP can record all the little changes like a video camera, with the option to rewind or fast-forward to a specific point in the day. IBM Tivoli CDP for Files, for example, is an excellent complement to IBM Tivoli Storage Manager.
The technology is not really new, as it has been implemented as "logs" or "journals" on databases like DB2 and Oracle, as well as business applications like SAP R/3.
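The "video camera" idea is essentially a time-ordered journal that can be replayed up to any instant. Below is a minimal sketch under simplifying assumptions: one file, timestamps in hours since midnight, and whole versions kept per write (real journals typically record only the changes). The `CdpJournal` class is hypothetical and is not the Tivoli CDP for Files interface.

```python
import bisect
from typing import List


class CdpJournal:
    """Toy CDP journal for a single file (hypothetical class).

    Every write is timestamped, so the file can be restored as of any
    instant, not just as of the nightly backup. For simplicity it keeps
    whole versions rather than just the changed bytes.
    """

    def __init__(self, initial: bytes = b"") -> None:
        self.initial = initial
        self._times: List[float] = []
        self._versions: List[bytes] = []

    def record_write(self, timestamp: float, content: bytes) -> None:
        if self._times and timestamp < self._times[-1]:
            raise ValueError("journal entries must arrive in time order")
        self._times.append(timestamp)
        self._versions.append(content)

    def as_of(self, timestamp: float) -> bytes:
        """Rewind: contents after the last write at or before `timestamp`."""
        i = bisect.bisect_right(self._times, timestamp)
        return self.initial if i == 0 else self._versions[i - 1]


# Hours since midnight; the nightly backup only ever saw b"v0".
journal = CdpJournal(initial=b"v0")
journal.record_write(9.0, b"v1")
journal.record_write(14.5, b"v2")
print(journal.as_of(12.0))
```

A midnight backup could only return `b"v0"`, while the journal can also return the 12:00 state, which is the whole point of CDP.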
The prediction here, however, relates to packaging. Will vendors "package" CDP into existing backup products, possibly as a separately priced feature, or will they leave it as a separate product that perhaps, as in IBM's case, is already well integrated?
The VTL market growth will continue at a much reduced rate as backup products provide equivalent features directly to disk. Deduplication will extend the VTL market temporarily in 2007.
VTL here refers to Virtual Tape Library, such as IBM TS7700 or TS7510 Virtualization Engine. IBM introduced the first one in 1997, the IBM 3494 Virtual Tape Server, and we have remained number one in marketshare for virtual tape ever since. I find it amusing that people are now just looking at VTL technology to help with their Disk-to-Disk-to-Tape (D2D2T) efforts, when IBM Tivoli Storage Manager has already had the capability to backup to disk, then move to tape, since 1993.
As for deduplication, if you need the end-target box to deduplicate your backups, then perhaps you should investigate why you are doing this in the first place. People take full-volume backups and keep too many copies of them, when more sophisticated backup software like Tivoli Storage Manager can implement backup policies to avoid this with a progressive backup scheme. Or maybe you need to investigate why you store multiple copies of the same data on disk; perhaps NAS or a clustered file system like IBM General Parallel File System (GPFS) could instead provide a single copy accessible to many servers.
The reason you don't see deduplication on the mainframe is that DFSMS for z/OS already allows multiple servers to share a single instance of data, and has been doing so since the early 1980s. I often joke with clients at the Tucson Executive Briefing Center that you can run a business with a million data sets on the mainframe, and that there are probably a million files on just the laptops in the room, but few would attempt to run their business that way.
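The storage gap between nightly full backups and a progressive incremental scheme is easy to estimate. The sketch below assumes a hypothetical 1,000 GB of primary data, a 2% daily change rate and 30 days of retention, and it ignores expiration, reclamation and version limits, so treat it as rough arithmetic rather than a TSM sizing method.

```python
def full_backup_gb(primary_gb: float, days_retained: int) -> float:
    """Nightly full backups, each kept for the whole retention window."""
    return primary_gb * days_retained


def progressive_incremental_gb(primary_gb: float, daily_change: float,
                               days_retained: int) -> float:
    """One initial full copy, then only changed data each night.

    Simplification: every changed GB is kept as a new version for the
    whole window; expiration and reclamation are ignored.
    """
    return primary_gb + primary_gb * daily_change * (days_retained - 1)


# Assumed workload: 1,000 GB of primary data, 2% daily change, 30 days kept.
print(full_backup_gb(1000, 30))
print(progressive_incremental_gb(1000, 0.02, 30))
```

Under these assumptions the full-backup scheme stores 30,000 GB against about 1,580 GB for progressive incremental, leaving far less duplicate data for a target box to deduplicate.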
Optical storage that looks, feels and acts like NAS and puts archive data online, will make dramatic inroads in 2007.
Marc says he's going out on a limb here, and it's good to make at least one risky prediction. IBM used to have an optical library that emulated disk, called the IBM 3995. Lack of interest and advances in technology encouraged IBM to withdraw it. A small backlash ensued, so IBM now offers the IBM 3996 for the System p and System i clients that really, really want optical.
As for optical making data available "online", it takes about 20 seconds to load an optical cartridge, so I would consider this more "nearline" than online. Tape is still in the 40-60 second range to load and position to data, so optical is still at an advantage.
Optical eliminates the "hassles of tape"? Tape data is good for 20 years, and optical for 100 years, but nobody keeps drives around that long anyway. In general, our clients change drives every 6-8 years, and migrate the data from old to new. This is only a hassle if you didn't plan for this inevitable movement. IBM Tivoli Storage Manager, IBM System Storage Archive Manager, and the IBM System Storage DR550 all make this migration very simple and easy, and can do it with either optical or tape.
The Blu-ray vs. HD DVD debate will continue through 2007 in the consumer world. I don't see this being a major player in more conservative data centers, where a big investment in the wrong choice could be costly, even if the price-per-TB is temporarily in line with current tape technologies. IBM and others are investing a lot of Research and Development funding to continue the downward price curve for tape, and I'm not sure that optical can keep up that pace.
Well, that's my take. It is a sunny day here in China, and I have more meetings to attend.
It's Tuesday, which means IBM makes its announcements. We had several for the IBM System Storage product line. Here's a quick recap.
The IBM System Storage DS3000 now offers DC power models.New DC powered models of the DS3200, DS3400, and EXP3000 are well suited for Telco industry environments, as theseare NEBS and ETSI compliant and are powered by an industry standard 48 volt DC power source.
Also, the IBM System Storage N series now supports 750GB SATA drives in the EXN1000 drawer.
IBM Virtualization Engine TS7740 now supports 3-cluster grids. Unlike 3-way disk mirroring, such as IBM Metro/Global Mirror for the DS8000, which enforces a primary, secondary and tertiary copy, the grid implementation of TS7740 tape virtualization allows for any-to-any mirroring. Existing standalone TS7740 clusters can be converted to grid-enabled ones. A "Copy Export" feature allows virtual tapes to be exported onto physical tape. And in keeping with our theme of "enabling business flexibility", performance throughput can now be purchased in 100 MB/sec increments, up to 600 MB/sec, to match your workload bandwidth requirements.
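To illustrate the "buy only the throughput you need" idea, here is a minimal Python sketch that picks the smallest number of 100 MB/sec increments covering a given workload. The helper name, and the assumption that the first increment counts toward the total, are illustrative only and not IBM's actual ordering rules.

```python
import math

INCREMENT_MB_S = 100  # purchasable step size quoted in the announcement
MAX_MB_S = 600        # maximum throughput quoted in the announcement


def increments_needed(workload_mb_s: float) -> int:
    """Smallest number of 100 MB/sec increments that covers the workload.

    Hypothetical helper: assumes the first increment counts toward the total.
    """
    if workload_mb_s <= 0:
        raise ValueError("workload must be positive")
    steps = math.ceil(workload_mb_s / INCREMENT_MB_S)
    if steps * INCREMENT_MB_S > MAX_MB_S:
        raise ValueError("workload exceeds the 600 MB/sec maximum")
    return steps


if __name__ == "__main__":
    for workload in (80, 250, 600):
        print(f"{workload} MB/sec -> {increments_needed(workload)} increment(s)")
```

Rounding up rather than to the nearest step keeps the purchased bandwidth at or above the workload's peak.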
The IBM System Storage TS1120 drives installed in the IBM System Storage™ TS3400 Tape Library can now be attached to System z platforms using the IBM System Storage™ TS1120 Tape Controller. Before this, the TS3400 could only be attached to UNIX, Windows and Linux systems.
The IBM System Storage TS2230 Express is offered as an external stand-alone or rack-mountable unit. This model incorporates the new LTO IBM Ultrium 3 Serial Attached SCSI (SAS) Half-High Tape Drive, and a 3 Gbps single-port SAS interface for connection to a wide spectrum of distributed system servers running Microsoft Windows and Linux.
IBM has added the Cisco MDS 9124 for IBM System Storage entry-level fabric switch as an Express offering and part of the IBM Express Advantage Program. Express offerings are specifically created for mid-market companies and are well suited for workgroup storage applications like e-mail serving, collaborative databases and web serving. They bring enterprise-class performance, scalability and features to small and medium-sized companies, and are easy to use, highly scalable, and cost-effective. This will make it easier for IBM Business Partners to provide fabric switch connectivity for:
Storage consolidation solutions with IBM System Storage™ DS4000 Express disk arrays, especially the DS4700 Express.
Backup / restore solutions with IBM System Storage™ TS3000 Tape Libraries, such as the TS3200.
Archive and Retention
Ordering large configurations of the IBM System Storage Grid Access Manager just got a lot easier. New features enable configurations greater than 500 TB to be submitted as a single order. No change in the actual product, just an improvement in the ordering process.
For System p and System i servers, the IBM 3996 Optical library now supports Gen 2 60GB optical cartridges. These can be read/write or WORM cartridges.
I'm off to Denver, Colorado this week. I hope it is cooler there than it is down here in Tucson, Arizona.
With all the announcements we had in June, it is easy for some of the more subtle enhancements to get overlooked. While I was in Orlando for the IBM Edge conference, I was able to blog about some of the key featured announcements. Later, when I got back from Orlando to Tucson, I blogged about [More IBM Storage Announcements]. For IBM's Scale-Out Network Attached Storage (SONAS), I had simply:
"SONAS v1.3.2 adds support for management by the newly announced IBM Tivoli Storage Productivity Center v5.1 release. Also, IBM now officially supports Gateway configurations that have the storage nodes connected to XIV or Storwize V7000 disk systems. These gateway configurations offer new flexible choices and options for our ever-expanding set of clients."
In my defense, IBM numbers its software releases with version.release.modification, so 1.3.2 is Version 1, Release 3, Modification 2. Generally, modification announcements don't get much attention. The big announcement for v1.3.0 of SONAS happened last October; see my blog post [October 2011 Announcements - Part I] or the nice summary post [IBM Scale-out Network Attached Storage 1.3.0] from fellow blogger Roger Luethy.
Here is a diagram showing the three configurations of SONAS.
I have covered the SONAS Appliance model in depth in previous blogs, with options for fast and slow disk speeds, choice of RAID protection levels, a collection of enterprise-class software features provided at no additional charge, and interfaces to support a variety of third party backup and anti-virus checking software.
The basics haven't changed. The SONAS appliance consists of 2 to 32 interface nodes, 2 to 60 storage nodes, and up to 7,200 disk drives. The maximum configuration takes up 17 frames and holds 21.6PB of raw disk capacity, which is about 17PB usable when RAID6 is configured. Each interface node has one or two hex-core processors with up to 144GB of RAM, and can deliver up to 3.5GB/sec of performance. This makes IBM SONAS the fastest performing and most scalable disk system in IBM's System Storage product line.
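The capacity figures above follow from simple arithmetic: 7,200 drives at 3TB each is 21.6PB raw. Here is a minimal Python sketch of that calculation, assuming (not stated in the text) that the RAID6 arrays are 8 data + 2 parity, and ignoring spares and file system overhead. The aggregate interface throughput is a naive ceiling that assumes perfectly linear scaling across nodes.

```python
def sonas_capacity_pb(drives: int = 7_200, drive_tb: float = 3.0,
                      data_disks: int = 8, parity_disks: int = 2):
    """Return (raw_pb, usable_pb) using decimal units (1 PB = 1000 TB).

    Assumes every array is RAID6 with data_disks + parity_disks members
    and ignores spares and file system overhead.
    """
    raw_pb = drives * drive_tb / 1000
    usable_pb = raw_pb * data_disks / (data_disks + parity_disks)
    return raw_pb, usable_pb


def peak_interface_gb_s(nodes: int = 32, per_node_gb_s: float = 3.5) -> float:
    """Naive ceiling that assumes perfectly linear scaling across nodes."""
    return nodes * per_node_gb_s


if __name__ == "__main__":
    raw, usable = sonas_capacity_pb()
    print(f"raw {raw:.1f} PB, usable about {usable:.1f} PB (RAID6 8+2)")
    print(f"naive aggregate ceiling: {peak_interface_gb_s():.0f} GB/sec")
```

With 8+2 arrays the result is about 17.3PB usable, consistent with the "about 17PB" quoted above.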
I thought I would go a bit deeper on the gateway models. These models support up to ten storage nodes, organized in pairs. The key difference is that instead of internal disk controllers, the storage nodes connect to external disk systems. There is enough space in the base SONAS rack to hold up to six interface nodes, or you can add a second rack if you need more interface nodes for increased performance.
SONAS with XIV gateway
XIV offers a clever approach to storage that allows for incredibly fast access to data on relatively slow 7200 RPM drives. By scattering data across all drives and taking advantage of parallel processing, rebuild times for a failed 3TB drive are less than 75 minutes. Compare that to typical rebuild times for 3TB drives that could take as much as 9-10 hours under active I/O loads!
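To put those rebuild times in perspective, here is a minimal Python sketch of the average rate needed to re-protect 3TB in each window. It assumes the worst case, a full drive's worth of data, and decimal units; the helper name is hypothetical.

```python
def implied_rate_mb_s(capacity_tb: float, minutes: float) -> float:
    """Average MB/sec needed to re-protect capacity_tb in the given time.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes a full drive's
    worth of data must be rebuilt, which is a worst case.
    """
    return capacity_tb * 1_000_000 / (minutes * 60)


if __name__ == "__main__":
    for label, minutes in [("XIV, 75 minutes", 75),
                           ("traditional, 9 hours", 9 * 60),
                           ("traditional, 10 hours", 10 * 60)]:
        print(f"{label}: {implied_rate_mb_s(3, minutes):.0f} MB/sec")
```

The 75-minute case works out to roughly 667 MB/sec, far more than a single 7200 RPM drive typically sustains, which is why spreading the rebuild across all drives in parallel matters.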
In this configuration, each pair of storage nodes can connect through external SAN Fabric switches to one or two XIV storage systems. How simple is that? These can be the original XIV systems that support 1TB and 2TB drives, or the new XIV Gen3 systems that support 400GB Solid-state drives (SSD) and 3TB spinning disk drives. In both cases, you can acquire additional storage capacity in increments as small as 12 drives (one XIV module holds 12 drives).
The maximum configuration of ten XIV boxes could hold 1,800 drives. At 3TB per drive, that would be 2.4PB usable capacity.
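Why is 2.4PB usable so much less than the raw capacity? A minimal Python sketch, assuming full XIV racks of 15 modules x 12 drives: XIV keeps two copies of every data partition, so usable capacity can be at most half of raw, and reserved spare capacity accounts for the rest.

```python
XIV_SYSTEMS = 10        # maximum for the SONAS with XIV gateway, per the text
DRIVES_PER_XIV = 180    # full XIV rack: 15 modules x 12 drives
DRIVE_TB = 3
STATED_USABLE_PB = 2.4  # usable figure quoted in the text


def xiv_gateway_capacity():
    """Return (drives, raw_pb, mirrored_ceiling_pb) in decimal units."""
    drives = XIV_SYSTEMS * DRIVES_PER_XIV
    raw_pb = drives * DRIVE_TB / 1000
    mirrored_ceiling_pb = raw_pb / 2  # two copies of every partition
    return drives, raw_pb, mirrored_ceiling_pb


if __name__ == "__main__":
    drives, raw, ceiling = xiv_gateway_capacity()
    print(f"{drives} drives, {raw:.1f} PB raw, at most {ceiling:.1f} PB after mirroring")
    print(f"quoted usable / raw = {STATED_USABLE_PB / raw:.0%}")
```

The quoted 2.4PB is about 44% of the 5.4PB raw, just under the 50% ceiling that mirroring imposes.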
The SONAS with XIV gateway does not require the XIV devices to be dedicated for SONAS purposes. Rather, you can assign some XIV storage space for the SONAS, and the rest is available for other servers. In this manner, SONAS just looks like another set of Linux-based servers to the XIV storage system. This in effect gives you "Unified Storage", with a full complement of NAS protocols from the SONAS side (NFS, CIFS, FTP, HTTPS, SCP) as well as block-based protocols directly from the XIV (FCP, iSCSI).
SONAS with Storwize V7000 gateway
The other gateway offering is the SONAS with Storwize V7000. Like the SONAS with XIV gateway model, you connect a pair of SONAS storage nodes to 1 or 2 Storwize V7000 disk systems. However, you do not need a SAN Fabric switch in between. You can instead connect the SONAS storage nodes directly to the Storwize V7000 control enclosures.
To acquire additional storage capacity, you can purchase a single drive at a time. That's right. Not 12 drives, or 60 drives, at a time, but one at a time. The Storwize V7000 supports a wide range of SSD, SAS and NL-SAS drives at different sizes, speeds and capacities. The drives can be configured into various RAID protection levels: RAID 0, 1, 3, 5, 6 and 10.
Each Storwize V7000 control enclosure can have up to nine expansion drawers. If you choose the 2.5-inch 24-bay models, you can have up to 480 drives per storage node pair, for a total of 2,400 drives. If you choose the 3.5-inch 12-bay models, you can have up to 240 drives per node pair, 1,200 drives total. At 3TB per drive, this could be 3.6PB of raw capacity. The usable capacity would depend on which RAID level you selected. Of course, you don't have to limit yourself to one size or the other. Feel free to mix 2.5-inch and 3.5-inch drawers to provide different storage pool capabilities.
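The drive counts above multiply out as follows. This minimal Python sketch uses only figures from the text: one control enclosure plus nine expansions per Storwize V7000, up to two V7000 systems per storage node pair, and ten storage nodes organized as five pairs.

```python
ENCLOSURES_PER_V7000 = 1 + 9  # one control enclosure plus up to nine expansions
V7000_PER_NODE_PAIR = 2       # each storage node pair attaches to 1 or 2 systems
NODE_PAIRS = 5                # ten storage nodes, organized in pairs


def max_drives(bays_per_enclosure: int):
    """Return (drives per storage node pair, drives in total)."""
    per_pair = bays_per_enclosure * ENCLOSURES_PER_V7000 * V7000_PER_NODE_PAIR
    return per_pair, per_pair * NODE_PAIRS


if __name__ == "__main__":
    for bays, label in [(24, "2.5-inch 24-bay"), (12, "3.5-inch 12-bay")]:
        per_pair, total = max_drives(bays)
        print(f"{label}: {per_pair} per node pair, {total} total")
    _, total_35 = max_drives(12)
    print(f"3.5-inch raw capacity at 3TB per drive: {total_35 * 3 / 1000:.1f} PB")
```

Mixing drawer types simply lands the total somewhere between the two extremes.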
All three SONAS configurations support Active Cloud Engine. This is a collection of features that differentiate SONAS from the other scale-out NAS wannabes in the marketplace:
Policy-driven Data Placement -- Different files can be directed to different storage pools. You no longer have to associate certain file systems to certain storage technologies.
High-speed Scan Engine -- SONAS can scan 10 million files per minute, per node. These scans can be used to drive data migration, backups, expirations, or replications, for example. It is over 100 times faster than traditional walk-the-directory-tree approaches employed by other NAS solutions.
Policy-driven Migration -- You can migrate files from one storage pool to another, based on age, days since last reference, size, and other criteria. The files can be moved from disk to disk, or moved out of SONAS and stored on external media, such as tape or a virtual tape library. A lot of data stored on NAS systems is dormant, with little or no likelihood of being looked at again. Why waste money keeping that kind of data on expensive disk? With SONAS, moving those files to tape can save lots of money. The files are stubbed in the SONAS file system, so that an access request to a file automatically triggers a recall to fetch the data from tape back to the SONAS system.
Policy-driven Expiration -- SONAS can help you keep your system clean, by helping you decide what files should be deleted. This is especially useful for things like logs and traces that tend to just hang around until someone deletes them manually.
WAN Caching -- This allows one SONAS to act as a "Cloud Storage Gateway" for another SONAS at a remote location connected by a Wide Area Network (WAN). Let's say your main data center has a large SONAS repository of files, and a small branch office has a smaller SONAS. This allows all locations to have a "Global" view of all the interconnected SONAS systems, with a high-speed user experience for local LAN-based access to the most recent and frequently used files.
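To make the placement, migration and expiration ideas concrete, here is a minimal Python model of the decision logic. SONAS itself expresses these as policy rules in its own syntax; this is not that syntax, and the pool names, file suffixes and age thresholds are hypothetical. The scan estimate uses the 10 million files per minute per node figure from the text and assumes linear scaling across nodes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical pool names and thresholds, for illustration only.
LOG_SUFFIXES = (".log", ".trc")
MIGRATE_AFTER = timedelta(days=180)
EXPIRE_LOGS_AFTER = timedelta(days=90)
SCAN_RATE_PER_NODE = 10_000_000  # files per minute per node, from the text


@dataclass
class FileInfo:
    path: str
    last_access: datetime


def placement_pool(path: str) -> str:
    """Policy-driven placement: choose a pool for a new file by its name."""
    return "nearline_pool" if path.endswith(LOG_SUFFIXES) else "fast_pool"


def lifecycle_action(f: FileInfo, now: datetime) -> str:
    """Expiration is checked before migration, so stale logs are deleted."""
    idle = now - f.last_access
    if f.path.endswith(LOG_SUFFIXES) and idle > EXPIRE_LOGS_AFTER:
        return "expire"
    if idle > MIGRATE_AFTER:
        return "migrate_to_tape"  # in SONAS, a stub would remain for recall
    return "keep"


def scan_minutes(file_count: int, nodes: int) -> float:
    """Time for a full metadata scan, assuming linear scaling across nodes."""
    return file_count / (SCAN_RATE_PER_NODE * nodes)


if __name__ == "__main__":
    now = datetime(2012, 7, 1)
    files = [
        FileInfo("/proj/app.log", now - timedelta(days=120)),
        FileInfo("/proj/design.pdf", now - timedelta(days=400)),
        FileInfo("/proj/budget.xls", now - timedelta(days=3)),
    ]
    for f in files:
        print(f.path, placement_pool(f.path), lifecycle_action(f, now))
    print(f"1 billion files on 4 nodes: {scan_minutes(1_000_000_000, 4):.0f} minutes")
```

The fast scan is what makes this practical: evaluating rules against a billion files is a matter of minutes rather than a multi-day directory walk.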
If you want to learn more, see the [IBM SONAS landing page]. Next week, I will be across the Pacific Ocean in [Taipei], to teach an IBM Top Gun class to sales reps and IBM Business Partners. "Selling SONAS" will be one of the topics I will be covering!
Today was the "First Ever Live Virtual Virtualization Tech Fair" sponsored by IBM and VMware. This was a 1-day event hosted by Unisfair.
The day included presentations delivered over a conference call, along with exhibition booths.
We had an exhibition booth exclusively for "storage virtualization" featuring our IBM System Storage SAN Volume Controller (disk virtualization) and IBM System Storage TS7520 Virtualization Engine (a virtual tape library, or VTL).
People who were logged in were represented in silhouette form. When someone walked into the booth, our army of "booth reps" were able to chat with them and answer their questions. They could also peruse the various online materials we made available about each product.
Here are some of my observations:
A lot of questions were related to IBM's support for VMware. Although VMware is currently owned by EMC, pending a spin-off IPO, IBM is its biggest reseller, given IBM's vast experience in server virtualization. Ironically, IBM's SAN Volume Controller supports VMware better than EMC's own storage virtualization product, Invista.
People who were also familiar with Second Life thought this 2-D "silhouette" version eliminated the need to configure and dress up your avatar, as is required to participate in Second Life events. However, being able only to chat, send e-mail and show web pages seemed less immersive than what Second Life can offer.
This event generated over 60 leads. We will pass on the contact information to the appropriate sales team.
Over on his Backup Blog, fellow blogger Scott Waterhouse from EMC has a post titled [Backup Sucks: Reason #38]. Here is an excerpt:
Unfortunately, we have not been able to successfully leverage economies of scale in the world of backup and recovery. If it costs you $5 to backup a given amount of data, it probably costs you $50 to back up 10 times that amount of data, and $500 to back up 100 times that amount of data.
If anybody can figure out how to get costs down to $40 for 10 times the amount of data, and $300 for 100 times the amount of data, they will have an irrefutable advantage over anybody that has not been able to leverage economies of scale.
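Scott's numbers are easier to compare as cost per unit of data. This small Python calculation uses only the figures in the excerpt; the helper name is just for illustration.

```python
def cost_per_unit(total_cost: float, units: int) -> float:
    return total_cost / units


# Figures from the excerpt: units of data -> backup cost in dollars.
LINEAR = {1: 5, 10: 50, 100: 500}
SCALED = {1: 5, 10: 40, 100: 300}

if __name__ == "__main__":
    for units in (1, 10, 100):
        lin = cost_per_unit(LINEAR[units], units)
        sca = cost_per_unit(SCALED[units], units)
        saving = 1 - SCALED[units] / LINEAR[units]
        print(f"{units:>3}x data: ${lin:.2f}/unit vs ${sca:.2f}/unit ({saving:.0%} less)")
```

Linear scaling holds the unit cost flat at $5, while the target in the excerpt drops it to $4 and then $3, a 40% saving at 100 times the data.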
I suspect that where Scott uses "we" in the above excerpt, he is referring to EMC in general, with products like Legato. Fortunately, IBM has scalable backup solutions, using either a hardware approach or one purely with software.
The hardware approach involves using deduplication hardware technology as the storage pool for IBM Tivoli Storage Manager (TSM). Using this approach, IBM Tivoli Storage Manager would receive data from dozens, hundreds or even thousands of client nodes, and the backup copies would be sent to an IBM TS7650 ProtecTIER data deduplication appliance, IBM TS7650G gateway, or IBM N series with A-SIS. In most cases, companies have standardized on the operating systems and applications used on these nodes, and multiple copies of the same data reside across employee laptops. As a result, the more nodes you back up, the greater the benefits of scale.
Perhaps your budget isn't big enough to handle new hardware purchases at this time, in this economy. Have no fear, IBM also offers deduplication built right into the IBM Tivoli Storage Manager v6 software itself. You can use a sequential-access disk storage pool for this. TSM scans and identifies duplicate chunks of data in the backup copies, as well as in archive and HSM data, and reclaims the space when duplicates are found.
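To make "identifying duplicate chunks" concrete, here is a minimal Python sketch of hash-based deduplication. It uses fixed-size 4 KiB chunks and SHA-256 fingerprints for simplicity; this is a generic illustration, not how TSM or ProtecTIER actually chunk or index data, and the function names are hypothetical.

```python
import hashlib


def dedup_store(streams, chunk_size=4096):
    """Split each stream into fixed-size chunks and keep each unique chunk once.

    Returns (store, recipes): store maps SHA-256 digest -> chunk bytes, and
    each recipe is the ordered list of digests needed to rebuild one stream.
    """
    store, recipes = {}, []
    for data in streams:
        recipe = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # duplicate chunks are not stored again
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Reassemble a stream from its chunk recipe."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Three "laptops" share the same 8 KiB standard image plus a small unique file.
    image = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(256))
    laptops = [image + b"notes for user %d" % n for n in range(3)]
    store, recipes = dedup_store(laptops)
    logical = sum(len(d) for d in laptops)
    stored = sum(len(c) for c in store.values())
    print(f"{logical} bytes backed up, {stored} bytes stored, {len(store)} unique chunks")
    assert all(restore(store, r) == d for r, d in zip(recipes, laptops))
```

The shared image is stored once no matter how many laptops carry it, which is why standardized clients deduplicate so well as the node count grows.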
If your company is using a backup software product that doesn't scale well, perhaps now is a good time to switch over to IBM Tivoli Storage Manager. TSM is perhaps the most scalable backup software product in the marketplace, giving IBM an "irrefutable advantage" over the competition.
For those of us in the northern hemisphere, yesterday was this year's Winter Solstice, representing the shortest amount of daylight between sunrise and sunset. So today, I thought I would blog on my thoughts on managing scarcity.
Earlier in my career, I had the pleasure to serve as "administrative assistant" to Nora Denzel for the week at a storage conference. My job was to make her look good at the conference, which, if you know Nora, doesn't take much. Later, she left IBM to work at HP, and I got to hear her speak at a conference. The one thing I remember most was her statement that the whole point of "management" was to manage scarcity: not enough money in the budget, not enough people to implement change, or not enough resources to accomplish a task. (Nora, I have no idea where you are today, so if you are reading this, send me a note.)
Of course, the flip side to this is that resources that are in abundance are generally taken for granted. Priorities are focused on what is most scarce. Let's examine some of the resources involved in an IT storage environment:
Capacity - while everyone complains that they are "running out of space", the truth is that most external disk attached to Linux, UNIX, or Windows systems is only 20-40% full of data. Many years ago, I visited an insurance company to talk about a new product called IBM Tivoli Storage Manager. This company had 7TB of disk on their mainframe, and another 7TB of disk scattered on various UNIX and Windows machines. In the room were TWO storage admins for the mainframe, and 45 storage admins for the distributed systems. My first question was "Why so many people for the mainframe? Certainly one of you could manage all of it yourself, perhaps on Wednesday afternoons?" Their response was that they acted as each other's backup, in case one goes on vacation for two weeks. My follow-up question to the rest of the audience was: "When was the last time you took two weeks vacation?" Mainframes fill their disk and tape storage comfortably at over 80-90% full of data, primarily because they have a more mature, robust set of management software, like DFSMS.
Labor - by this I mean skilled labor able to manage storage for a corporation. Some companies I have visited keep their new hires off production systems for the first two years, working only on test or development systems until then. Of course, labor is more expensive in some countries than others. Last year, I was doing a whiteboard session on-site for a client in China, and the last dry-erase pen ran out of ink. I asked for another pen, and they instead sent someone to go refill it. I asked whether it wouldn't be cheaper just to buy another pen, and they said "No, labor is cheap, but ink is expensive." Despite this, China does complain that there is a shortage of a skilled IT labor force, so if you are looking for a job, start learning Mandarin.
Power and Cooling - Most data centers are located on raised floors, with large trunks of electrical power and huge air conditioning systems to deal with all the heat generated from each machine. I have visited the data centers of clients that are now forced to make storage decisions based on power and cooling consumption, because the costs to upgrade their aging buildings are too high. Leading the charge is IBM, with technology advancements in chips, cards, and complete systems that use less power and generate less heat. While energy is still fairly cheap in the grand scheme of things, fears of Global Warming and declining oil supplies have put the costs of power and cooling in the news lately. In 1956, Hubbert predicted the US would reach peak oil production by 1965-1970 (it happened in 1971), and this year Simmons estimated that world-wide oil production had already begun its decline in 2005. Smart companies like Google have moved their server farms to places like Oregon in the Pacific Northwest for cheaper hydroelectric power.
Bandwidth - Last year IBM introduced 4Gbps Fibre Channel and FICON SAN networking gear, along with the servers and storage needed to complete the solution. 4Gbps equates to about 400 MB/sec of data throughput. By comparison, iSCSI typically runs on 1Gbps Ethernet, but has so much overhead that you only get about 80 MB/sec. Next year, we may see both 8 Gbps SAN and 10 GbE iSCSI, providing 800 MB/sec throughput. My experience is that the SAN is not the bottleneck; instead, people run out of bandwidth at the server or storage end first. They may not have a million dollars to buy the fastest IBM System p5 servers, or may not have enough host adapters at the storage system end.
Floorspace - I end with floorspace because it reminds me that many "shortages" are temporary or artificially created. Floorspace is only in short supply because you don't want to knock down a wall, or build a new building, to handle your additional storage requirements. In 1997, Tihamer Toth-Fejel wrote an article for the National Space Society newsletter that estimated that "...Everybody on Earth could live comfortably in the USA on only 15% of our land area, with a population density between that of Chicago and San Francisco. Using agricultural yields attained widely now, the rest of the U.S. would be sufficient to grow enough food for everyone. The rest of the planet, 93.7% of it, would be completely empty." Of course, back in 1997 the world population was only 5.9 billion, and this year it is over 6.5 billion.
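Two of the resources above reduce to simple arithmetic. Here is a minimal Python sketch, assuming the midpoints of the quoted utilization ranges (30% and 85%), a hypothetical 10 TB of data, and the standard line rates for each link (4.25 Gbaud for 4Gbps FC, 8.5 Gbaud for 8Gbps FC, 1.25 Gbaud for Gigabit Ethernet 1000BASE-X). All three use 8b/10b encoding, so every 10 bits on the wire carry 8 bits of data.

```python
def disk_needed_tb(data_tb: float, utilization: float) -> float:
    """Disk capacity needed to hold data_tb at a given fill level (0-1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return data_tb / utilization


def payload_mb_s(gbaud: float, data_bits: int = 8, line_bits: int = 10) -> float:
    """Link payload in MB/sec (1 MB = 10**6 bytes) after 8b/10b line encoding,
    before any protocol overhead."""
    return gbaud * 1e9 * data_bits / line_bits / 8 / 1e6


LINKS = {"4Gbps Fibre Channel": 4.25, "8Gbps Fibre Channel": 8.5, "1Gbps Ethernet": 1.25}

if __name__ == "__main__":
    # Capacity: 10 TB of data (a hypothetical amount) at the midpoint fill levels.
    for label, util in [("distributed, 30% full", 0.30), ("mainframe, 85% full", 0.85)]:
        print(f"{label}: {disk_needed_tb(10, util):.1f} TB of disk")
    # Bandwidth: nominal line rates in Gbaud.
    for name, gbaud in LINKS.items():
        print(f"{name}: {payload_mb_s(gbaud):.0f} MB/sec")
    # ~80 MB/sec of iSCSI throughput is about 64% of the 125 MB/sec ceiling.
    print(f"iSCSI efficiency: {80 / payload_mb_s(1.25):.0%}")
```

The 425 and 850 MB/sec results are why 4Gbps and 8Gbps FC are usually quoted as roughly 400 and 800 MB/sec, and the capacity lines show that the same data needs almost three times as much disk at 30% utilization as at 85%.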
This last point brings me back to the concept of food, and I am not talking about doughnuts in the conference room, or pizza while making year-end storage upgrades. I'm talking about the food you work so hard to provide for yourself and your family. The folks at Oxfam came up with a simple analogy. If 20 people sit down at your table, representing the world’s population:
3 would be served a gourmet, multi-course meal, while sitting at a decorated table on a cushioned chair.
5 would eat rice and beans with a fork and sit on a simple cushion.
12 would wait in line to receive a small portion of rice that they would eat with their hands while sitting on the floor.
So for those of you planning a special meal next Monday, be thankful you are one of the lucky three, and hopeful that IBM will continue to lead the IT industry to help out the other seventeen.
Use more efficient disk media, such as high-capacity SATA disk drives
Both are great recommendations, but why limit yourself to what EMC offers? Your x86-based machines are only a subset of your servers, and disk is only a subset of your storage. IBM takes a more holistic approach, looking at the entire data center.
VMware is a great product, and IBM is its top reseller. But in addition to VMware, there are other solutions for the x86-based servers, like Xen and Microsoft Virtual Server. IBM's System p, System i, and System z product lines all support logical partitioning.
To compare the energy effectiveness of server virtualization, consider a metric that can apply across platforms. For example, for an e-mail server, consider watts per mailbox. If you have, say, 15,000 users, you can calculate how many watts you are consuming to manage their mailboxes on your current environment, and compare that with running them on VMware, or logical partitions on other servers. Some people find it surprising that it is often more cost-effective, and power-efficient, to run workloads on mainframe logical partitions (LPARs) than a stack of x86 servers running VMware.
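The watts-per-mailbox metric is simple division, which is what makes it portable across platforms. Here is a minimal Python sketch; the two environments and their power draws are made-up placeholders, not measurements, so substitute metered figures from your own data center.

```python
MAILBOXES = 15_000  # user count from the example above


def watts_per_mailbox(total_watts: float, mailboxes: int = MAILBOXES) -> float:
    return total_watts / mailboxes


# Hypothetical measured power draw (servers plus their share of storage).
ENVIRONMENTS = {
    "environment A (current)": 12_000,
    "environment B (consolidated)": 4_500,
}

if __name__ == "__main__":
    for name, watts in ENVIRONMENTS.items():
        print(f"{name}: {watts_per_mailbox(watts):.2f} W per mailbox")
```

Because the denominator is the business workload rather than the hardware, the same comparison works whether the candidate is a rack of x86 servers, VMware guests, or mainframe LPARs.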
More efficient Media
SATA and FATA disks support higher capacities and run at slower RPM speeds, thus using fewer watts per terabyte. A terabyte stored on 73GB high-speed 15K RPM drives consumes more watts than the same terabyte stored using 500GB SATA. Chuck correctly identifies that tape is more power-efficient than disk, but then argues that paper is more power-efficient than tape. But paper is not necessarily more efficient than tape.
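The watts-per-terabyte argument is mostly about drive count. A minimal Python sketch, ignoring RAID and spares; the per-drive wattages are assumed placeholders, so check vendor data sheets for real figures.

```python
import math


def drives_for(capacity_gb: float, drive_gb: float) -> int:
    """Whole drives needed to hold capacity_gb (ignores RAID and spares)."""
    return math.ceil(capacity_gb / drive_gb)


def watts_for(capacity_gb: float, drive_gb: float, watts_per_drive: float) -> float:
    return drives_for(capacity_gb, drive_gb) * watts_per_drive


if __name__ == "__main__":
    # Per-drive wattages are assumed placeholders, not vendor specifications.
    fc_15k = watts_for(1000, 73, watts_per_drive=15.0)
    sata = watts_for(1000, 500, watts_per_drive=10.0)
    print(f"1 TB on 73GB 15K drives: {drives_for(1000, 73)} drives, {fc_15k:.0f} W")
    print(f"1 TB on 500GB SATA drives: {drives_for(1000, 500)} drives, {sata:.0f} W")
```

Fourteen drives versus two means that even if each SATA drive drew as much power as a 15K RPM drive, the high-capacity option would still use a fraction of the energy per terabyte.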
ESG analyst Steve Duplessie divides data into Dynamic and Persistent. The best place to put dynamic data is on disk, and here is where evaluation of FC/SAS versus SATA/FATA comes into play. Persistent data, on the other hand, can be stored on paper, microfiche, optical or tape media. All of these shelf-resident media consume no electricity, nor generate any heat that would require additional cooling.
A study by scientists at the Lawrence Berkeley National Laboratory titled High-Tech Means High-Efficiency: The Business Case for Energy Management in High-Tech Industries indicates that data centers consume 15 to 100 times more energy per square foot than traditional office space. Storing persistent data in traditional office space can save a huge amount of energy. Steve Duplessie feels the ratio of dynamic to persistent data is 1:10 today, but is likely to grow to 1:100 in the near future, making energy-efficient storage of persistent data ever more important to our environment.
Data centers consume nearly 5000 Megawatts in the USA alone, and 14000 Megawatts worldwide. To put that in perspective, Hungary, where I was last week, can generate up to 8000 Megawatts for the entire country (and they were using 7400 Megawatts last week as a result of their current heat wave, causing them grave concern).
Back in the 1990s, one of the insurance companies IBM worked with kept data on paper in manila folders, and armies of young adults on roller skates were dispatched throughout large warehouses of shelves to get the appropriate folder in response to customer service inquiries. Digitizing this paper into electronic format greatly reduced the need for this amount of warehouse space, and improved the time to retrieve the data.
A typical file storage box (12 inch x 12 inch x 18 inch) containing typed pages, single-spaced, double-sided, in 12 point font could hold perhaps 100MB. The same box could hold a hundred or more LTO or 3592 tape cartridges, each storing hundreds of GB of information. That's roughly a million-to-one improvement in space efficiency, and on a watts-per-TB basis, it translates into substantial savings in office air conditioning and lighting.
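Here is the box comparison as a minimal Python calculation. The cartridge count and capacity are assumptions: 100 cartridges at 500GB native each, which is, for example, a 3592 JA cartridge written by a TS1120 drive.

```python
def density_ratio(cartridges: int, cartridge_gb: int, paper_box_mb: int = 100) -> int:
    """How many times more data a box of tape cartridges holds than a box of paper."""
    return cartridges * cartridge_gb * 1000 // paper_box_mb


if __name__ == "__main__":
    # Assumed: 100 cartridges at 500GB native each.
    print(f"{density_ratio(100, 500):,} to 1")
```

With these assumptions the ratio is half a million to one; at the "or more" end of the cartridge count, or with larger cartridges, it reaches the million-to-one figure.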
To learn more about IBM's Project Big Green, watch this introductory video, which used Second Life for the animation.
We've been quite busy here at the Tucson Executive Briefing Center. I am often asked to explain the relationship between IBM's various storage products. While automakers don't have to explain why they sell sports coupes, pickup trucks and minivans, this analogy does not adequately cover IT storage products. So, I have come up with a new analogy that seems to be a better fit: foundations and flavorings.
All over the world, meals are often made up of a foundation, perhaps rice, potatoes or pasta, covered with some form of flavoring: sauces, pieces of meat or fish, grated cheese and spices. In Puerto Rico, I had dishes where the foundation was mashed bananas called [plantains]. Sandwich shops often let you pick your choice of bread, the foundation, and then your meats and cheeses, the flavorings. At our local steakhouse, [McMahon's], the menu lists a set of steaks as the foundation, such as Rib Eye, Filet Mignon, Prime Rib or New York Strip, and various flavorings, such as sauces and rubs to cover the steak. Last night, I had the Delmonico steak with the Cristiani sauce consisting of Portobello mushrooms, garlic and aged Romano cheese.
This serves as a useful analogy for IBM's storage strategy. Allowing the foundations and flavorings to be separately orderable greatly simplifies the selection menu and provides a nearly any-to-any approach to meeting a variety of client needs. Let's take a look at both.
IBM's foundation products are the DS family [DS3000, DS4000, DS5000, DS6000 and DS8000 series], [DS9900 series], and [XIV] for disk, and the TS family [TS1000, TS2000, TS3000] series for tape drives and libraries. In much the same way you might prefer brown rice instead of white rice, or linguine instead of penne pasta, you might find the attributes of one storage foundation more attractive based on its performance, scalability and availability features for your particular application workloads.
Fellow IBM blogger Barry Whyte discusses SVC at great length on his [Storage Virtualization] blog. Flavoring disk foundation storage with SAN Volume Controller can provide you additional features and functions, and help improve the scalability, performance or availability characteristics. For example, if you have DS4000, DS8000 and XIV, you might use SVC to provide a consistent methodology for asynchronous replication, a form of consistent "flavoring" if you will.
N series Gateways
The [N series gateways] offer flavoring to disk foundation, including unified NAS, iSCSI and FCP protocol host attachment, and application-aware capabilities. (As for our IBM N series appliances or "filers", these could be foundational storage behind an SVC, but that's perhaps a topic for another post.)
SoFS provides a global namespace with clustered NAS access to files. This is a blended disk-and-tape solution with built-in backup and Information Lifecycle Management [ILM]. Policies can be used to place different files onto different tiers of storage, automate the movement from tier to tier, including migration to tape, and even expiration when the data is no longer needed.
The [IBM System Storage DR550] provides Non-erasable, Non-rewriteable (NENR) flavoring to storage. While the DR550 comes with internal disk storage, it can front end a tape library filled with WORM cartridges. The DR550 has been paired up with small libraries (TS3200 or TS3310) as well as larger libraries like the TS3500.
The IBM Grid Medical Archive Solution [GMAS] provides a variety of capabilities for storing and accessing medical images, using a blended disk-and-tape approach. This allows hospital and clinic networks to provide access for doctors and radiologists from multiple locations.
Many of the flavorings are called "gateways". The IBM TS7650G flavors disk to provide a virtual tape library [VTL] with inline data deduplication capability. Recent performance tests pairing the TS7650G flavoring with XIV foundation storage found this combination to be an excellent match.
Let me know what you think. Does this help you understand IBM's storage strategy and acquisitions? Enter your comments below.
Well, we had another successful event in Second Life today.
Unlike our April 26 launch of our System Storage products for IBM Business Partners only, this time we decided to use a "Meet the Storage Experts" Q&A Panel format, and open up registration to everyone. The subject matter experts sat at the front of the room on four stools. We had six rows of chairs arranged semi-circularly.
Shown above, from left to right, are the avatars of our four experts:
IBM System Storage N series, focusing on recent N3000 disk system announcements
Harold Pike (holding the microphone while speaking)
IBM System Storage DS3000 and DS4000 series, focusing on recent DS3000 disk system announcements
IBM System Storage TS series, focusing on recent TS2230, TS3400 and TS7700 tape system announcements
IBM storage networking, focusing on recent IBM SAN256B director blade announcements
While Eric was a veteran Second Lifer, having presented at our April event, the other three were trained on how to raise their hand, speak into the microphone, sit on the stool, and so on. I want to thank all of our experts for putting in this effort!
The event was produced by Katrina H Smith. She did a great job, and made sure we were on top of all the issues and tasks required to get the job done. Running a Second Life event is every bit as hard as running a real face-to-face event. We had several meetings to discuss venue details, placement of chairs, placement of product demos, audio/video recording, wall decorations, tee-shirt and coffee mug design, logistics, and so on.
I acted as moderator/emcee for the event. That is my back in the picture above. The process was simple, modeled after the "Birds of a Feather" sessions at events like SHARE and the IBM Storage and Storage Networking Symposium. We threw out a list of topics the experts would cover, and people in the audience would "raise their left hand". I, as the moderator, would then walk over to each person and hold out the microphone for them to ask the question. I would then repeat the question and ask the appropriate expert to provide an answer. We defined gestures for how to "raise hand" and "put hand down" that we gave to each registered participant.
We had four dedicated "camera-avatars" in world to capture both video and screenshots. Our video editors are now working to edit "highlight videos" that we can use at future events, for training materials, and for our internal "BlueTube" online video system.
The room was filled with examples of each of our products, made into 3D objects that were dimensionally correct, and "textured" with photographs of the actual products. If you click on an object, you get a "notecard" that provides more information. Special thanks to Scott Bissmeyer for making all of these objects for us.
We made posters of each expert and placed them in all four corners of the room. On the bottom of each coffee mug was a picture of one of the experts, and if you walked under each of the posters, you were "dispensed" a coffee mug matching the expert shown in the poster. Participants could "Collect all Four!" When you bring the coffee mug up to take a sip, the picture on the bottom of the mug is exposed for all to see. And as a final give-away to the audience, we made a variety of event tee-shirts and polo-shirts.
At the end of the session, we asked everyone to click on the "Survey" kiosk near the exit door. We asked six simple questions using SurveyMonkey.com that took only a few minutes to complete. We found asking questions immediately at the end of the event was the best way to capture this feedback.
From a "Green" perspective, we had people registered from the following countries: US, India, Mexico, Australia, United Kingdom, Brazil, Germany, Argentina, Chile, China, Canada, and Venezuela. Second Life allows people who probably could not travel, or could not afford the time and expense to travel, to participate in a simulated face-to-face meeting without the energy consumption of traditional travel.
More importantly, we got several business leads. People often ask "Yes, but is there any business associated with this?" This time there was: based on the answers to their questions, several avatars asked for a real sales call to follow up on the products and offerings that were discussed.
After such a great success, we have already scheduled our next Second Life event for November 8. Mark your calendars! I'll post more details on the registration process for the November event when they are available.
I've blogged about some of these videos already, but since there are probably a few of you out there buying the brand new Apple iPhone and looking for YouTube videos to play on it, these links might provide some entertainment on your new handheld device.
Next week has the "Fourth of July" Independence Day holiday in the USA smack in the middle of the week, so I expect the blogosphere to quiet down a bit. Whether you are working next week or not, in the USA or elsewhere, take some time to enjoy your friends and family.
Many people have asked me if there was any logic with the IBM naming convention of IBM Systems branded servers. Here's your quick and easy cheat sheet:
System x -- "x" for cross-platform architecture. Technologies from our mainframe and UNIX servers were brought into chips that sit next to the Intel or AMD processors to provide a more reliable x86 server experience. For example, some models have a POWER processor-based Remote Supervisor Adapter (RSA).
System p -- "p" for POWER architecture.
System z -- "z" for zero downtime, zero exposures. Our lawyers prefer "near-zero", but this is about as close as you get to ["six-nines" availability] in our industry, with the highest level of security and encryption. No other vendor comes close, so you get the idea.
But what about the "i" for System i? Officially, it stands for "Integrated" in that it could integrate different applications running on different operating systems onto a [COMMON] platform. Options were available to insert Intel-based processor cards that ran Windows, or attach special cables that allowed separate System x servers running Windows to attach to a System i. Both allowed Windows applications to share the internal LAN and SAN inside the System i machine. Later, IBM allowed [AIX on System i] and [Linux on Power] operating systems to run as well.
From a storage perspective, we often joked that the "i" stood for "island", as most System i machines used internal disk, or attached externally to only a few selected models of disk from IBM and EMC that had special support for the non-standard 520-byte disk block size of i5/OS. On the IBM side, this meant only our popular IBM System Storage DS6000 and DS8000 series disk systems. This block-size requirement applies only to disk; for tape, i5/OS supports both IBM TS1120 and LTO tape systems. For the most part, System i machines stood separate from the mainframe and from the rest of the Linux, UNIX and Windows distributed servers on the data center floor.
Often, when I am talking to customers, they ask when product xyz will be supported on System z or System i. I explain that IBM's strategy is not to make all storage devices connect via ESCON/FICON or support non-standard block sizes, but rather to get the servers to use the standard 512-byte block size, Fibre Channel and other standard protocols. (The old adage applies: If you can't get Mohamed to move to the mountain, get the mountain to move to Mohamed.)
On the System z mainframe, we are 60 percent there, allowing three of the five operating systems (z/VM, z/VSE and Linux) to access FCP-based disk and tape devices (four out of six if you include [OpenSolaris for the mainframe]). But what about System i? As the characters on the popular television show [LOST] would say: It's time to get off the island!
Last week, IBM announced the new [i5/OS V6R1 operating system] with features that will greatly improve the use of external storage on this platform. Check this out:
POWER6-based System i 570 model server
Our latest, most powerful POWER processor comes to the System i platform. The 570 model will be the first in the System i family of servers to use the new processor technology, with up to 16 (sixteen!) POWER6 processors running at 4.7 GHz in each machine. The advantage of the new processors is an increased Commercial Processing Workload (CPW) rating: 31 percent greater than the POWER5+ version and 72 percent greater than the POWER5 version. CPW is the "MIPS" or "TeraFlops" rating for comparing System i servers. Here is the [Announcement Letter].
Fibre Channel Adapter for System i hardware
That's right, these are [Smart IOAs], so an I/O Processor (IOP) is no longer required! You can even perform the Initial Program Load (IPL) directly from SAN-attached tape. This brings System i into the 21st century for Business Continuity options.
Virtual I/O Server (VIOS)
[Virtual I/O Server] has been around for System p machines, but is now available on System i as well. It allows multiple logical partitions (LPARs) to access shared resources like Ethernet cards and FCP host bus adapters. In the case of storage, the VIOS handles the 520-byte to 512-byte conversion, so that i5/OS systems can now read and write to standard FCP devices like the IBM System Storage DS4800 and DS4700 disk systems.
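To picture what a 520-byte to 512-byte conversion involves, here is a hypothetical sketch, not the actual VIOS algorithm. It assumes each 520-byte i5/OS sector is an 8-byte header followed by 512 bytes of data, and that eight logical sectors are packed into nine physical 512-byte sectors, with the eight headers gathered into the ninth. The function names and the grouping of eight are illustrative assumptions.

```python
LOGICAL_SECTOR = 520   # i5/OS sector size
PHYSICAL_SECTOR = 512  # standard FCP disk sector size
HEADER = LOGICAL_SECTOR - PHYSICAL_SECTOR  # assumed 8-byte header per sector
GROUP = 8              # hypothetical: 8 logical sectors per packing group


def pack_group(logical):
    """Pack 8 logical 520-byte sectors into 9 physical 512-byte sectors."""
    if len(logical) != GROUP or any(len(s) != LOGICAL_SECTOR for s in logical):
        raise ValueError("expected 8 sectors of 520 bytes each")
    # The first 8 physical sectors carry the 512-byte data portions unchanged.
    data = [s[HEADER:] for s in logical]
    # The 9th physical sector holds the 8 headers (64 bytes), zero-padded.
    headers = b"".join(s[:HEADER] for s in logical).ljust(PHYSICAL_SECTOR, b"\x00")
    return data + [headers]


def unpack_group(physical):
    """Rebuild the 8 logical 520-byte sectors from 9 physical sectors."""
    if len(physical) != GROUP + 1 or any(len(p) != PHYSICAL_SECTOR for p in physical):
        raise ValueError("expected 9 sectors of 512 bytes each")
    headers = physical[GROUP]
    return [headers[i * HEADER:(i + 1) * HEADER] + physical[i] for i in range(GROUP)]


# Usable fraction of raw capacity under this scheme: 8 data sectors out of 9.
EFFICIENCY = GROUP / (GROUP + 1)

if __name__ == "__main__":
    sectors = [bytes([i]) * HEADER + bytes([100 + i]) * PHYSICAL_SECTOR for i in range(GROUP)]
    packed = pack_group(sectors)
    assert unpack_group(packed) == sectors
    print(f"{len(packed)} physical sectors, {EFFICIENCY:.1%} usable")  # 9 physical sectors, 88.9% usable
```

Under this assumed layout, about 88.9 percent of the raw 512-byte capacity holds i5/OS data; treat that as a property of the sketch, not a published VIOS figure.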
IBM System Storage DS4000 series
Initially, we have certified the DS4700 and DS4800 disk systems to work with i5/OS, but more devices are planned. This means that you can now share your DS4700 between i5/OS and your other Linux, UNIX and Windows servers, and take advantage of a mix of FC and SATA disk capacities, RAID6 protection, and so on.
To call [IBM PowerVM] the "VMware for the POWER architecture" would not quite do it justice. In combination with VIOS, IBM PowerVM can run a variety of AIX, Linux and i5/OS guest images. The "Live Partition Mobility" feature allows you to easily move guest images from one system to another while they are running, just like VMotion for x86 machines.
And while we are on the topic of x86, PowerVM can also provide a Linux x86 emulation environment to run x86-compiled applications. While many Linux applications could be recompiled from source code for the POWER architecture "as is", others required perhaps 1-2 percent modification to port them over, and that was too much for some software development houses. Now we can run most x86-compiled Linux application binaries, unmodified, on POWER architecture servers.
BladeCenter JS22 Express
The POWER6-based [JS22 Express blade] can run i5/OS, taking advantage of PowerVM and VIOS to access all of the BladeCenter resources. The BladeCenter lets you mix and match POWER and x86-based blades in the same chassis, providing the ultimate in flexibility.
Well, it's Tuesday, which means it's time to look at recent announcements. While I was on vacation last week, IBM made a lot of storage announcements on October 23. Josh Krischer gives his summary on WikiBon [October 2007 Review]. Austin Modine of The Register went so far as to say that [IBM goes crazy with storage system updates].
IBM System Storage DS8000 series
This is "Release 3" software/microcode upgrades on our existing "Turbo" hardware.
IBM FlashCopy SE -- Here "SE" stands for Space Efficient. Rather than allocating a full 100 percent of the space for the FlashCopy destination, you can set aside just a fraction, which holds only the changed blocks, similar to what IBM already offers on the DS4000 series.
Dynamic Volume Expansion -- In the past, if you needed more space for a LUN, you had to carve out a new one elsewhere, copy the data over from the old LUN to the new one, and then either re-use the old LUN or leave it stranded. With this enhancement, you can just expand the LUN in place, making it bigger as needed, similar to what IBM already offers on the DS4000 series and SAN Volume Controller. This applies to CKD volumes for the System z mainframe users out there as well.
Storage Pool Striping -- striping volumes across RAID ranks to eliminate or reduce hot-spots and provide better load balancing. Many customers used SAN Volume Controller in front of the DS8000 to do this, but now you can do it natively in the DS8000 itself.
z/OS Global Mirror Multiple Reader -- for System z customers, "z/OS Global Mirror" is the new name for XRC. This enhancement improves the throughput of sending updates to the remote disaster recovery location.
DS Storage Manager enhancements -- the element manager software has been enhanced, and is pre-installed on the new IBM System Storage Productivity Center, which I will talk about below.
Intermix of DS8000 machine types -- this is especially useful to allow new frames to have co-terminating warranties with the base units. In other words, as you expand your system, you can ensure that the entire chunk of iron runs out of warranty at the same time, which simplifies your decision to upgrade or contract for extended service.
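The space-efficient idea behind FlashCopy SE can be sketched as a copy-on-write snapshot. This is a simplified, hypothetical model (the class and method names are illustrative, not a DS8000 interface): the snapshot reserves only a fraction of the volume and saves a block's original contents the first time the source block is overwritten, so its space consumption grows only with the number of changed blocks.

```python
class CowSnapshot:
    """Hypothetical copy-on-write snapshot over a list of blocks."""

    def __init__(self, source, reserve_blocks):
        self.source = source            # live volume (list of block contents)
        self.reserve = reserve_blocks   # space set aside: a fraction of the volume
        self.saved = {}                 # block index -> original contents

    def write_source(self, index, data):
        # Before the first overwrite of a block, preserve its original contents.
        if index not in self.saved:
            if len(self.saved) >= self.reserve:
                raise RuntimeError("snapshot reserve exhausted")
            self.saved[index] = self.source[index]
        self.source[index] = data

    def read_snapshot(self, index):
        # Changed blocks come from the reserve; unchanged ones from the source.
        return self.saved.get(index, self.source[index])


if __name__ == "__main__":
    volume = [f"block{i}" for i in range(100)]
    snap = CowSnapshot(volume, reserve_blocks=10)  # 10 percent of the volume reserved
    snap.write_source(3, "new3")
    snap.write_source(3, "newer3")                 # second write: nothing more saved
    print(snap.read_snapshot(3), volume[3], len(snap.saved))  # block3 newer3 1
```

What a real product does when the reserve fills up is a policy decision; the sketch simply raises an error.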
One of the biggest complaints about IBM TotalStorage Productivity Center is that it is software that needs to be installed on its own server, and that this installation process can take a day or two. Why wait? Now you can have a hardware console with the DS8000 Storage Manager software, SVC Admin Console software, and IBM TotalStorage Productivity Center "Basic Edition" pre-installed. Here are the key features:
Pre-installed and tested console
DS8000 R3 GUI integration
Cohabitation of SVC 4.2.1 GUI and CIMOM
Automated device discovery
Asset and capacity reporting, including tape library support
Our "Release 9" applies across the board, from N3000 to N5000 to N7000 series models, including new host bus adapters and the new Data ONTAP 7.2.4 release level.
The Virtual File Manager (VFM) was announced as one of our latest [Storage Virtualization Solutions]. VFM provides a global namespace that aggregates the file systems from Linux, UNIX, and Windows file servers, as well as N series storage, into a consolidated environment.
The TS7520, IBM's virtual tape library (VTL) for the distributed systems platform, has been enhanced to provide:
Up to 12TB of disk cache, using 750GB SATA disk.
F05 tape frames installed as TS7520 base units, connected through a 32-port Fibre Channel switch
Support for LTO generation 4 tape drives, both as virtual tape drives and as physical tape drives within IBM automated tape libraries attached to the TS7520. This allows you to use the encryption capabilities of LTO4.
The DS3000 series now supports SATA disk, and can be attached to AIX and Linux on System p servers. This applies to the DS3200, DS3300 and DS3400 models. See the [DS3000 Announcement Letter] for more details.
During lunch, people were able to take a look at our solutions. Here are Dan Thompson and Brett Cooper striking a pose.
Hyper-Efficient Backup and Recovery
The afternoon was kicked off by Dr. Daniel Sabbah, IBM General Manager of Tivoli software. He started with some shocking statistics: 42 percent of small companies have experienced data loss, 32 percent have lost data forever. IBM has a solution that offers "Unified Recovery Management". This involves a combination of periodic backups, frequent snapshots, and remote mirroring.
IBM Tivoli Storage Manager (TSM) was introduced in 1993, and was the first backup software solution to support backup to disk storage pools. Today, TSM is also part of Cloud Computing services, including IBM Information Protection Services. IBM announced today a new bundle called IBM Storwize Rapid Application Backup, which combines the IBM Storwize V7000 midrange disk system, Tivoli FlashCopy Manager, and implementation services, with a full three-year hardware and software warranty. This could be used, for example, to protect a Microsoft Exchange email system with 9000 mailboxes.
IBM also announced that its TS7600 ProtecTIER data deduplication solutions have been enhanced to support many-to-many bidirectional remote mirroring. Last year, the University of Pittsburgh Medical Center (UPMC) reported that they were averaging a 24x data deduplication factor in their environment using IBM ProtecTIER.
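A deduplication factor such as 24x is the ratio of logical data written to physical capacity consumed. The sketch below computes that ratio using simple fixed-size chunking and SHA-256 fingerprints from Python's standard library. It is a generic illustration of the arithmetic only, not how ProtecTIER itself finds duplicate data.

```python
import hashlib


def dedup_ratio(stream: bytes, chunk_size: int = 4096) -> float:
    """Logical bytes divided by bytes of unique chunks (fixed-size chunking)."""
    if not stream:
        return 1.0
    unique = {}
    for offset in range(0, len(stream), chunk_size):
        chunk = stream[offset:offset + chunk_size]
        # Identical chunks share a fingerprint, so each is stored only once.
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return len(stream) / sum(unique.values())


def space_saved(ratio: float) -> float:
    """Fraction of capacity avoided at a given deduplication ratio."""
    return 1 - 1 / ratio


if __name__ == "__main__":
    # 64 KiB of "backup" data made of 16 distinct 4 KiB chunks, backed up 24 times.
    backup = b"".join(hashlib.sha256(str(i).encode()).digest() * 128 for i in range(16))
    print(dedup_ratio(backup * 24))      # 24.0
    print(f"{space_saved(24):.1%}")      # 95.8%
```

In other words, a 24x factor means roughly 96 percent of the raw capacity that would otherwise be needed is avoided.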
"You are out of your mind if you think you can live without tape!"
-- Dick Crosby, Director of System Administration, Estes
The new IBM TS1140 enterprise class tape drive processes 2.3 TB per hour, and provides a density of 1.2 PB per square foot. The new 3599 tape media can hold 4TB of data uncompressed, or up to 10TB at a 2.5x compression ratio.
The United States Golf Association [USGA] uses IBM's backup cloud, which manages over 100PB of data from 750 locations across five continents.
Customer Testimonial - Graybar
Randy Miller, Manager of Technical System Administration at Graybar, provided the next client testimonial. Graybar is an employee-owned company focused on supply-chain management, serving as a distributor for electrical, lighting, security, power and cooling equipment.
Their problem was that they had 240 different locations, and expecting local staff to handle tape backups was not working out well. They centralized their backups at their main data center. If a system fails at one of their many remote locations, they can rebuild a new machine at the main data center across a high-speed LAN, and then ship it overnight to the remote location. As a result, the remote location has a system up and running by 10:30am, faster than local staff could have recovered it from tape. In effect, Graybar implemented a "private cloud" for backup in the 1990s, long before the concept was "cool" or "popular".
In 2001, they had an 18TB SAP ERP application data repository. To back it up, they took it down for 1 minute per day, six days a week, and for 15 minutes on Sundays. The result was less than 99.8 percent availability. To fix this, they switched to XIV, and use snapshots that are non-disruptive and do not impact application performance.
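The availability figure follows directly from the backup outage schedule. A quick check, assuming the backup outages were the only downtime: six 1-minute outages plus one 15-minute outage is 21 minutes out of the 10,080 minutes in a week.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080


def weekly_availability(outages_minutes):
    """Fraction of the week the application is up, given outage lengths in minutes."""
    down = sum(outages_minutes)
    return (MINUTES_PER_WEEK - down) / MINUTES_PER_WEEK


# Graybar's 2001 backup schedule: 1 minute on six days, 15 minutes on Sunday.
schedule = [1] * 6 + [15]
print(f"{weekly_availability(schedule):.3%}")  # 99.792%
```

Even a minute a day adds up: 21 minutes a week is enough to keep availability just under 99.8 percent.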
Over 85 percent of the servers at Graybar are virtualized.
Their next challenge is Disaster Recovery. Currently, they have two data centers, one in St. Louis and the other in Kansas City. However, in the aftermath of Japan's earthquake, they realized there is a nuclear power plant between their two locations, so a single incident could impact both data centers. They are working with IBM, their trusted advisor, to investigate a three-site solution.
This week, May 15-22, I am in Auckland, New Zealand teaching IBM Storage Top Gun sales class. Next week, I will be in Sydney, Australia.
Ten years ago, I travelled to New York City with my colleague, Randy Fleenor, to present the latest in IBM tape technology for its 50th anniversary. On Thursday evening that week, the latest movie in the Star Wars saga, Episode II: Attack of the Clones, had just been released, and it was being shown using the new Digital Light Processing (DLP) technology just around the corner at the Ziegfeld theater! This movie was the first major live-action film to be shot entirely on digital video. George Lucas saw that digital video was the future, and started the process moving forward with this film.
I convinced Randy to join me. We arrived at 11:10pm; the movie was scheduled to start at 11pm, so we figured we had only missed a few previews. We walked into a completely empty lobby. I asked at the ticket counter for two tickets to the 11pm show, and was told it was sold out, that there was a huge line around the building of people waiting to see the 1:00am show, and that we might get in to see the 3:00am show.
Randy and I had meetings on Friday morning, so we were not going to wait in line all night to see a 3am show! Just then, a young man came out of the theater. He said his girlfriend couldn't make it, and he wanted a refund for his two tickets. I pulled out a twenty-dollar bill and offered to buy them directly at face value, and the theater employees approved the transaction. The seats were in the front row of the balcony section. By then we had missed all the previews and a short bit of the movie, but that was alright with us.
(FTC Disclosure: I am both an employee and stockholder in IBM. The U.S. Federal Trade Commission may consider this a paid, celebrity endorsement of LTO-5 tapes and the LTFS technology. References to other companies are for illustrative purposes and do not represent an endorsement of their products or services.)
Digital recording is ideal for all types of video, including movies, television, and commercial advertisements.
The latest excitement is over IBM's Linear Tape File System™ (LTFS), which IBM donated to the IT industry as open source so that everyone in the world can benefit. This allows tape cartridges to be treated like USB memory sticks, the ultimate in portability of data. It is supported for Windows, Mac OS, and Linux, and already well embraced by the Media-and-Entertainment (M&E) industry.
"The move to IBM technology has helped the network shrink its archive from 1,507 to just 388 square feet, representing dramatic systems and energy-cost savings."
"AlphaTV has been broadcasting since 1996, creating and storing all forms of video entertainment, from soap operas and documentaries, to movies and sporting events, and creating a vast video archive along the way. Initially, AlphaTV archived its programming on Sony Beta SP format video cassettes that stored up to 90 minutes of content. Not long after, in need of storage that offered greater density, it turned to DVCPRO format videos that stored up to 120 minutes. But even that format was not allowing the network to keep pace with its ballooning archive, a storage infrastructure that by 2011 spanned more than 1,507 square feet."
"'A Greek TV series stored on 100 DVCPRO tapes took up four shelves in our library, whereas one LTO-5 cartridge now takes up the space of a deck of playing cards,' Constantinos Colombus, chief technology officer at AlphaTV, said in a statement."
"IBM LTFS, an intuitive and graphical file system that provides direct access to data on LTO 5 drives, has enabled AlphaTV to manage, move and share video files much like they can with disk-management systems, by simply dragging and dropping. As a result, file management is easier to do and far more efficient, said Colombus."
To prepare for this anniversary, I spoke with Brad Johns, of [Brad Johns Consulting]. Brad was head of IBM tape marketing for a while, and ran tape customer councils to gather feedback from our largest customers. Brad was my mentor in marketing at IBM from 2003-2007 and has since retired from IBM to start his own consulting practice.
The comparison was made between Crossroads Systems' StrongBox® with an enterprise tape library and LTO-5 tapes using LTFS, versus a unified disk storage system offering NAS protocols on high-capacity 3TB drives. The findings: the tape-based archive had nearly 80 percent lower TCO than the disk-based solution!
You don't have to be in the middle of the Greek economy to realize that is a good value!
I have created blog categories, based on our System Storage offering matrix, which you can track individually:
Disk systems, including the IBM System Storage DS Family of products, SAN Volume Controller, N series, as well as features unique to these products, such as FlashCopy, Metro Mirror, or SnapLock.
Tape systems, including the IBM System Storage TS Family of products, tape-related products in the Virtualization Engine portfolio, drives, libraries and even tape media.
Storage Networking offerings, from Brocade, McData, Cisco and others, such as switches, routers and directors.
Infrastructure management, including IBM TotalStorage Productivity Center software, IBM Tivoli Provisioning Manager, IBM Tivoli Intelligent Orchestrator, and IBM Tivoli Storage Process Manager.
Business Continuity, including IBM Tivoli Storage Manager, Tivoli CDP for Files, Productivity Center for Replication software component, Continuous Availability for Windows (CAW), Continuous Availability for AIX (CAA).
Lifecycle and Retention offerings, including our IBM System Storage DR550, DR550 Express, GPFS, Tivoli Storage Manager Space Management for UNIX, Tivoli Storage Manager HSM for Windows, and DFSMS.
Storage services, including consulting, assessments, design, deployment, management and outsourcing.
In Monday's post, [IBM Information Infrastructure launches today], I explained how this strategic initiative fits into IBM's New Enterprise Data Center vision. The launch was presented at the IBM Storage and Storage Networking Symposium to over 400 attendees in Montpellier, France, with corresponding standing-room-only crowds in New York and Tokyo.
This post will focus on Information Retention, the third of the four-part series this week.
Here's another short 2-minute video, this one on Information Retention.
Let's start with some interesting statistics. Fellow blogger Robin Harris on his StorageMojo blog has an interesting post, [Our changing file workloads], which discusses the findings of a study titled "Measurement and Analysis of Large-Scale Network File System Workloads" [14-page PDF]. This paper was a collaboration between researchers from the University of California Santa Cruz and our friends at NetApp. Here's an excerpt from the study:
Compared to Previous Studies:
Both of our workloads are more write-oriented. Read to write byte ratios have significantly decreased.
Read-write access patterns have increased 30-fold relative to read-only and write-only access patterns.
Most bytes are transferred in longer sequential runs. These runs are an order of magnitude larger.
Most bytes transferred are from larger files. File sizes are up to an order of magnitude larger.
Files live an order of magnitude longer. Fewer than 50 percent are deleted within a day of creation.
Files are rarely re-opened. Over 66 percent are re-opened once and 95 percent fewer than five times.
File re-opens are temporally related. Over 60 percent of re-opens occur within a minute of the first.
A small fraction of clients account for a large fraction of file activity. Fewer than 1 percent of clients account for 50 percent of file requests.
Files are infrequently shared by more than one client. Over 76 percent of files are never opened by more than one client.
File sharing is rarely concurrent and sharing is usually read-only. Only 5 percent of files opened by multiple clients are concurrent and 90 percent of sharing is read-only.
Most file types do not have a common access pattern.
Why are files being kept ten times longer than before? Because the information still has value:
Provide historical context
Gain insight into specific situations, market segment demographics, or trends in the greater marketplace
Help innovate new ideas for products and services
Make better, smarter decisions
National Public Radio (NPR) had an interesting piece the other day. By analyzing old photos, a researcher for Cold War Analysis was able to identify an interesting [pattern for Russian presidents]. (Be sure to listen to the 3-minute audio to hear a hilarious song about the results!)
Which brings me to my own collection of "old photos". I bought my first digital camera in the year 2000, and have taken over 15,000 pictures since then. Before that, I used a 35mm film camera, getting the negatives developed and prints made. Some of these date back to my years in high school and college. I have a mix of sizes, from 3x5, 4x6 and 5x7 inches, and sometimes I got double prints. Only a small portion are organized into scrapbooks. The rest are in envelopes, prints and negatives, in boxes taking up half of the linen closet in my house. Following the success of the [Library of Congress using flickr], I decided the best way to organize these was to have them digitized first. There are several ways to do this.
Scanning them on a flatbed scanner is just too time-consuming: lift the lid, place one or a few prints face down on the glass, close the lid, press the button, and then repeat. I estimate 70 percent of my photos are in [landscape orientation], and 30 percent in [portrait mode]. I can either spend extra time to orient each photo correctly on the glass, or rotate the digital image later.
I was pleased to learn that my Fujitsu ScanSnap S510 sheet-feed scanner can take in a short stack (a dozen or so) of photos and generate a JPEG file for each. I can select 150, 300 or 600 dpi, and five levels of JPEG compression. All the photos feed in portrait mode, which I can then rotate later on the computer once digitized. A command line tool called [ImageMagick] can help automate the rotations. While I highly recommend the ScanSnap scanner, this is still a time-consuming process for thousands of photos.
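Since every photo feeds in portrait mode, the landscape shots need a quarter turn afterwards. A minimal sketch, assuming ImageMagick's mogrify command is installed and on the PATH; its -rotate option rotates an image, and mogrify overwrites the file in place, so work on copies. The folder and the list of files to turn are hypothetical, since the scanner cannot tell which photos were landscape. By default the script only prints the commands it would run.

```python
import shutil
import subprocess
from pathlib import Path


def rotation_commands(files, degrees=90):
    """Build one ImageMagick mogrify command per file (mogrify edits in place)."""
    return [["mogrify", "-rotate", str(degrees), str(f)] for f in files]


def rotate(files, degrees=90, dry_run=True):
    commands = rotation_commands(files, degrees)
    if dry_run:
        for cmd in commands:
            print(" ".join(cmd))
        return commands
    if shutil.which("mogrify") is None:
        raise FileNotFoundError("ImageMagick's mogrify was not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop on the first failure
    return commands


if __name__ == "__main__":
    # Hypothetical folder and list of landscape shots needing a quarter turn;
    # pass degrees=-90 for photos that need to turn the other way.
    scans = Path("scans")
    landscape = [scans / "img0001.jpg", scans / "img0003.jpg"]
    rotate(landscape, degrees=90)  # dry run: prints the commands only
```

Keeping the dry run as the default makes it easy to check the file list before any originals are modified.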
"The best way to save your valuable photos may be by eliminating the paper altogether. Consider making digital images of all your photos."
Here's how it works: You ship your prints (or slides, or negatives) to their facility in Irvine, California. They have a huge machine that scans them all at 300 dpi with no compression, and they send back your photos and a DVD containing digitized versions in JPEG format, all for only 50 US dollars per thousand photos, plus shipping and handling. I don't think I could even hire someone locally to run my scanner for that!
The deal got better when I contacted them. For people like me with accounts on Facebook, flickr, MySpace or Blogger, they will [scan your first 1000 photos for free] (plus shipping and handling). I selected a thousand 4x6" photos from my vast collection, organized them into eight stacks with rubber bands, and sent them off in a shoe box. The photos get scanned in landscape mode, so I spent about four hours preparing what I sent them, making sure they were all face up, with the top of each picture oriented to either the top or the left edge. For the envelopes that had double prints, I "deduplicated" them so that only one set got scanned.
The box weighed seven pounds, and cost about 10 US dollars to send from Tucson to Irvine via UPS on Tuesday. Everything came back the following Monday, all my photos plus the DVD, for 20 US dollars shipping and handling. Each digital image is about 1.5MB in size, roughly 1800x1200 pixels, so they all easily fit on a single DVD. The quality is the same as if I had scanned them at 300 dpi on my own scanner, and comparable to the 2-megapixel cameras on most cell phones. Certainly not the high-res photos I take with my Canon PowerShot, but suitable enough for email or Web sites. So, for about 30 US dollars, I got my first batch of 1000 photos scanned.
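The numbers in this paragraph hang together, as a back-of-the-envelope check shows. Every figure comes from the post except the 4.7 GB capacity of a single-layer DVD, which is an assumption about the disc they used; sizes use decimal units (1 GB = 1000 MB).

```python
# Figures from the post: 4x6-inch prints scanned at 300 dpi, about 1.5 MB each,
# 1,000 photos per batch, $10 to ship out and $20 to ship back.
DPI = 300
PRINT_INCHES = (6, 4)
PHOTOS = 1000
MB_PER_PHOTO = 1.5
DVD_GB = 4.7  # assumption: single-layer DVD


def pixels(inches, dpi):
    """Pixel dimensions of a print scanned at the given resolution."""
    return tuple(side * dpi for side in inches)


width, height = pixels(PRINT_INCHES, DPI)
megapixels = width * height / 1e6
batch_gb = PHOTOS * MB_PER_PHOTO / 1000
cost_per_photo = (10 + 20) / PHOTOS

print(width, height, megapixels)           # 1800 1200 2.16
print(batch_gb, batch_gb <= DVD_GB)        # 1.5 True
print(f"${cost_per_photo:.2f} per photo")  # $0.03 per photo
```

So 300 dpi on a 4x6 print is exactly 1800x1200 pixels (about 2 megapixels), a 1.5 GB batch fits comfortably on one DVD, and the free-scan offer works out to about three cents per photo in shipping.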
ScanMyPhotos.com offers a variety of extra-cost options, like rotating each file to the correct landscape or portrait orientation, color correction, exact sequence order, hosting them on their Web site for 30 days to share with friends and family, and extra copies of the DVD. Each of these represents a trade-off between having them do it for me for an additional fee, or spending my own time doing it -- either beforehand in the preparation, or afterwards managing the digital files -- so I can appreciate that.
Perhaps the weirdest option was to have your original box returned for an extra $9.95. If you don't have a huge collection of empty shoe boxes in your garage, you can buy a similarly sized cardboard box for only $3.49 at the local office supply store, so I don't understand this one. The box they return all your photos in can easily be used for the next batch.
I opted not to get any of these extras. The one option I think they should add would be to just discard the prints and send back only the DVD. Or better yet, discard the prints and email me an ISO file of the DVD that I can burn myself on my own computer. Why pay extra shipping to send the entire box of prints back to me, just so that I can dump them in the trash myself? I will keep the negatives, in case I ever need to re-print at high resolution.
Overall, I am thoroughly delighted with the service, and will now send the rest of my photos in for processing, and reclaim my linen closet for more important things. Now that I know that a thousand 4x6 prints weigh 7 pounds, I can estimate how many photos I have left to do, and decide which discount bulk option to choose.
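The weight-based estimate is a simple proportion: if 1,000 4x6 prints weigh 7 pounds, each pound of prints is roughly 143 photos. A sketch, assuming the remaining prints are similar in size and paper stock to the first batch (a mix of 3x5 and 5x7 prints would skew it); the 20-pound example weight is hypothetical.

```python
PRINTS_PER_BATCH = 1000
BATCH_POUNDS = 7.0


def estimate_prints(pounds):
    """Estimate the number of 4x6 prints from their total weight."""
    return round(pounds * PRINTS_PER_BATCH / BATCH_POUNDS)


def batches_needed(pounds, batch_size=1000):
    """Number of batches to order, rounding up."""
    prints = estimate_prints(pounds)
    return -(-prints // batch_size)  # ceiling division


if __name__ == "__main__":
    remaining = 20.0  # hypothetical weight of the remaining prints, in pounds
    print(estimate_prints(remaining), batches_needed(remaining))  # 2857 3
```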
With my photos digitized, I will be able to do all the things that IBM talks about with Information Retention:
Place them on an appropriate storage tier. I can keep them on disk, tape or optical media.
Easily move them from one storage tier to another. Copying digital files in bulk is straightforward, and as new technologies develop, I can refresh the bits onto new media, to avoid the "obsolescence of CDs and DVDs" as discussed in this article in [PC World].
Share them with friends and family, either through email, on my Tivo (yes, my Tivo is networked to my Mac and PC and has the option to do this!), or by uploading them to a photo-oriented service like [Kodak Gallery or flickr].
Keep multiple copies in separate locations. I could easily burn another copy of the DVD myself and store it in my safe deposit box or my desk at work. With all of the regional disasters like hurricanes, an alternative might be to back up all your files, including your digitized photos, with an online backup service like [IBM Information Protection Services], from last year's acquisition of Arsenal Digital.
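Refreshing the bits onto new media is only safe if you confirm that the copies match the originals. A minimal sketch using only Python's standard library: it copies every file from an old tier to a new one and compares SHA-256 checksums. The directory names in the demo are hypothetical placeholders created in a temporary folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def refresh(src: Path, dst: Path) -> list:
    """Copy every file under src to dst; return the files whose checksums differ."""
    mismatches = []
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        target = dst / file.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, target)  # copies data plus timestamps
        if sha256(file) != sha256(target):
            mismatches.append(file)
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical "old DVD" and "new tier" directories for the demo.
        old, new = Path(tmp, "old_dvd"), Path(tmp, "new_tier")
        (old / "2001").mkdir(parents=True)
        (old / "2001" / "img0001.jpg").write_bytes(b"demo image bytes")
        bad = refresh(old, new)
        print("all copies verified" if not bad else f"{len(bad)} files differ")
```

Keeping the checksums alongside the files also lets you re-verify each copy years later, before the next refresh.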
If the prospect of preserving my high school and college memories for the next few decades seems extreme, consider that the [Long Now Foundation] is focused on retaining information for centuries. They are even suggesting that we start representing years with five digits, e.g., 02008, to handle the deca-millennium bug which will come into effect 8,000 years from now. IBM researchers are also working on [long-term preservation technologies and open standards] to help in this area.
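The Long Now's five-digit convention amounts to zero-padding the year to five places. A tiny sketch; the function name is illustrative.

```python
def long_now_year(year: int) -> str:
    """Format a year with five digits, Long Now style (2008 -> '02008')."""
    if not 0 <= year <= 99999:
        raise ValueError("year must fit in five digits")
    return f"{year:05d}"


if __name__ == "__main__":
    print(long_now_year(2008))  # 02008
    print(10000 - 2008)         # 7992 years until four-digit years overflow
```

One practical benefit of fixed-width years is that they sort correctly as plain strings, so "10000" lands after "02008" rather than before it.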
For those who only read the first and last paragraphs of each post, here is my recap: Information Retention is about managing [information throughout its lifecycle], using policy-based automation to help with placement, movement and expiration. An "active archive" of information serves to help gain insight, innovate, and make better decisions. Disk, tape, and blended disk-and-tape solutions can all play a part in a tiered information infrastructure for long-term retention of information.
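Policy-based automation for placement, movement and expiration comes down to rules evaluated against each file's metadata. The sketch below is hypothetical: the tier names, the 30-day and 365-day idle thresholds, and the seven-year retention period are illustrative assumptions, not IBM product defaults.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileInfo:
    name: str
    created: date
    last_access: date


# Hypothetical policy thresholds.
HOT_DAYS = 30
WARM_DAYS = 365
RETENTION = timedelta(days=7 * 365)


def place(f: FileInfo, today: date) -> str:
    """Return the action or tier for a file: 'expire', 'disk', 'blended' or 'tape'."""
    if today - f.created > RETENTION:
        return "expire"               # retention period has run out
    idle = (today - f.last_access).days
    if idle <= HOT_DAYS:
        return "disk"
    if idle <= WARM_DAYS:
        return "blended"              # e.g. disk cache in front of tape
    return "tape"


if __name__ == "__main__":
    today = date(2008, 9, 12)
    files = [
        FileInfo("q3-report.doc", date(2008, 7, 1), date(2008, 9, 1)),
        FileInfo("2007-budget.xls", date(2007, 1, 5), date(2007, 12, 20)),
        FileInfo("2006-photos.jpg", date(2006, 3, 1), date(2006, 4, 1)),
        FileInfo("1999-scan.tif", date(1999, 6, 1), date(2001, 2, 2)),
    ]
    for f in files:
        print(f.name, place(f, today))
```

Running the same rules on a schedule is what turns one-time placement into ongoing movement and, eventually, expiration.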
Lagasse, Inc. sells janitorial supplies, such as mops, cleaning chemicals, waste receptacles, and garbage can liners. Of the 1000 employees of Lagasse nationwide, about 200 associates were located in New Orleans at their main Headquarters, primary customer care center, and primary IT computing center.
Amazingly, Lagasse did not have a formally documented BCP (Business Continuity Plan), but more of a BCI (Business Continuity Idea). They chose to take a ["donut tire"] approach, putting older, previous-generation equipment at their DR site. They knew that in the event of a disaster, they would not be processing as many transactions per second. That was a business trade-off they could accept.
They evaluated all the different threat scenarios for impact and likelihood, and focused on hurricanes and floods. They had experienced previous hurricanes, learning from each, with the most recent being Hurricane Ivan in 2004 and Hurricane Dennis in 2005. From this, they were able to categorize three levels of DR recovery:
Tier 1 - The most mission-critical, which for them related to picking, packing and shipping products.
Tier 2 - The next most important, focused on maintaining good customer service
Tier 3 - Everything else, including reporting and administrative functions
The time-line of events went as follows:
The US Government issues a warning that a hurricane may hit New Orleans
August 27 - 7pm
Lagasse declares a disaster and starts recovery procedures at an existing IT facility in Chicago, owned by their parent company. A temporary "Southeast" headquarters was set up in Atlanta. Remote call centers were identified in Dallas, Atlanta, San Antonio, and Miami.
August 28 - just after midnight
In just five hours, they recovered their "Tier 1" applications.
August 28 - 7:30pm
In just over 24 hours, they recovered their "Tier 2" applications.
August 29 - 6am
The hurricane makes landfall. With 73 levees breached, the city of New Orleans is flooded.
The following week
Lagasse was fully operational, and recorded their second and third best sales days ever.
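As a quick sanity check on the "five hours" and "just over 24 hours" figures, the timeline can be turned into elapsed times. This sketch assumes all the times above are in the same local time zone and approximates "just after midnight" as 12:05am; both are assumptions, not details from the presentation.

```python
from datetime import datetime

# Times from the timeline above, assumed to share one time zone.
# "Just after midnight" is approximated as 12:05am (an assumption).
DECLARED = datetime(2005, 8, 27, 19, 0)
TIER1_UP = datetime(2005, 8, 28, 0, 5)
TIER2_UP = datetime(2005, 8, 28, 19, 30)
LANDFALL = datetime(2005, 8, 29, 6, 0)


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed wall-clock hours between two timestamps."""
    return (end - start).total_seconds() / 3600


if __name__ == "__main__":
    print(f"Tier 1 recovered after {hours_between(DECLARED, TIER1_UP):.1f} h")
    print(f"Tier 2 recovered after {hours_between(DECLARED, TIER2_UP):.1f} h")
    print(f"Margin before landfall: {hours_between(TIER2_UP, LANDFALL):.1f} h")
```

By this arithmetic, Tier 2 was back up about 24.5 hours after the declaration, roughly 10.5 hours before the storm came ashore.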
I was quite impressed with the company's policy for how it treats employees during a disaster. During a disaster, people naturally put their families ahead of their jobs. If any associate was asked to work during a disaster, the company would take care of:
The safety of their family
The safety of their pets. (In the weeks following this hurricane, I sponsored people in Tucson to go to New Orleans to attend to lost and stray dogs and cats, many of which were left behind when rescuers picked up people from their rooftops.)
Any emergency repairs to secure the home they leave behind
Marshall felt that if you don't know the names of the spouse and kids of your key employees, you are not emotionally invested enough to be successful during a disaster.
For communications, cell phones were useless. People could call out on them, but anyone with a cell phone with a 504 area code had difficulty receiving calls, as the calls had to be processed through New Orleans. Instead, they used Voice over IP (VoIP) to redirect calls to whichever remote call center each associate went to. Laptops, Citrix, VPN and email were powerful tools during this process. They did not have Instant Messaging (IM) at the time.
While the disk and tapes needed to recover Tiers 1 and 2 were already in Chicago, the tapes for Tier 3 were stored locally by a third-party provider. When Lagasse asked for their DR tapes back, the third party refused, citing their [force majeure] clause. Force majeure is a common clause in many business contracts that frees parties from liability during major disasters. Marshall advised everyone to strike any "force majeure" clauses from future third-party DR protection contracts.
Hurricane Katrina hit the US hard, killing over 1400 people, and America still has not fully recovered. The recovery of the city of New Orleans has been slow. Massive relocations have caused a deficit of talent in the area, not just in IT, but also in medicine, education and other professions. The result has been degraded social services, encouraging others to relocate as well. Some have called it the "liberation effect": a major event that causes people to move to a new location or take on a new career in a different field.
On a personal note, I was in New Orleans for a conference the week prior to landfall, and helped clients with their recoveries the weeks after. For more on how IBM Business Continuity Recovery Services (BCRS) helped clients during Hurricane Katrina, see the following [media coverage].
Next Monday, September 1, 2008, marks my two year "blogoversary" for this blog!
I won't be blogging on Monday, of course, because that is [Labor Day] holiday here in the United States.
(From a Canadian colleague: the US is not the only country that celebrates Labor Day in early September. Canada also celebrates Labour Day on the first Monday in September. It's the only holiday (other than Christmas/New Year's) where we are in sync with the US. Our Thanksgiving Days are different, as is your July 4 vs our July 1. But for Labour Day we are one with the Borg...)
(From an Australian colleague: each state and territory of Australia has its own day to celebrate Labour Day, see [Australia Public Holidays])
Much of the rest of the world celebrates Labor Day on May 1, but the USA celebrates it on the first Monday of September, which this year lands on September 1. Although the day was originally intended as a "day off for working citizens", IBM is kind enough to give managers and marketing personnel the day off too. (Not that anyone is going to notice the lack of press releases next Monday, right?)
I started this blog on September 1, 2006 as part of IBM's big ["50 Years of Disk Systems Innovation"] campaign. IBM introduced the first commercial disk system on September 13, 1956, so the 50th anniversary was in 2006. Last year, IBM celebrated the 55th anniversary of tape systems.
Several readers have asked me why I haven't talked about current events, such as the Olympic Games in Beijing, or the U.S. national conventions in the race for U.S. President. I have to remind them of one of the key precepts of the IBM blogging guidelines:
8. Respect your audience. Don’t use ethnic slurs, personal insults, obscenity, or engage in any conduct that would not be acceptable in IBM’s workplace. You should also show proper consideration for others’ privacy and for topics that may be considered objectionable or inflammatory - such as politics and religion.
I made subtle references to my senator from Arizona, John McCain, in my post [ILM for my iPod], and to Barack Obama in my post [Searching for matching information]. I don't think anyone would mind if I send a "Happy Birthday!" wish to both of them. Senator McCain turns 72 years old today, and Senator Obama turned 47 years old earlier this month.
And lastly, Tucson itself [celebrates its 233rd birthday this entire month]. That's right, Tucson, the 32nd largest city in the USA and headquarters for IBM System Storage, is older than the USA itself. While the Tucson area has been continuously inhabited by humans for over 3500 years, it officially became Tucson on August 20, 1775.
Fellow blogger Justin Thorp has opined that [blogging is like jogging]. Some days, you are just too busy to do it, and other days, you make time for it, because you know it is important. For the record, it is not my job to blog for IBM; that assignment ended in September 2007. I continue to blog anyway because I have benefited from it, both personally and professionally. I want to thank all of you readers out there for making this blog a great success! Being named one of the top 10 blogs of the IT storage industry by Network World, winning two back-to-back Brand Impact awards from Liquid Agency, and recently earning a "31" Technorati ranking have really helped keep me going.
So, I look forward to next month, and to beginning my third year on this blog. I am sure there will be lots of surprises and announcements in the coming weeks and months, so I will have plenty to write about.
In last week's System Storage Portfolio Top Gun class in Dallas, some of the students were not familiar with Really Simple Syndication (RSS). For the uninitiated, this can be intimidating. I thought a quick overview of what I've done might help:
Choose a "feed reader". I chose Bloglines, but there are many others.
Use Technorati to search other blogs for keywords or phrases I am looking for.
When I find a blog that I want to keep tracking, I "add" it to my subscription list on Bloglines. Just hit "add" and paste in the URL of the blog you want to track. Bloglines will figure out the RSS feed details. I track eight blogs at the moment, but some people with lots of time on their hands track 20 or more. It is easy to unsubscribe, so don't be afraid to try some out for a few days.
Since I was actually going to run a blog of my own, I read a few books on the topic. One I recommend is "Naked Conversations" by Robert Scoble and Shel Israel, both experienced bloggers.
Finally, I am not big on spell checking, but most places have the option to preview your post or comment before it actually gets posted, which is not a bad idea if you use any HTML tags.
For a quick taste of blogging, consider using Data Storage Blogger Feed Reader. This has a lot of blogs on the topic of storage, already added and categorized for your convenience, ready for your perusal.
I am sure there are many other ways to enjoy the Blogosphere, but this works for me.
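For the curious, here is roughly what a feed reader does behind the scenes: it fetches the blog's RSS document and pulls out each item's title and link. This is a minimal sketch using Python's standard XML parser on a small, made-up RSS 2.0 sample; real feeds carry more fields, and Bloglines' own implementation is not something I know.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS 2.0 document standing in for a real blog feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Storage Blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def list_items(feed_xml: str) -> list:
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


if __name__ == "__main__":
    for title, link in list_items(SAMPLE_FEED):
        print(title, link)
```

A reader simply repeats this for every subscribed feed on a schedule and shows you the items it has not seen before.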
The title of this post is inspired by Baxter Black's [latest book]. Rather than a recap of the break-out sessions, I thought I would comment on a few sentences, phrases or comments I heard in the afternoon and evening.
Stop buying storage from EMC or NetApp
The lunch was sponsored by Symantec. Rod Soderbery presented "Taking the cost out of cost savings", explaining some ideas to reduce IT costs immediately.
First, he suggested to "stop buying storage" from EMC or NetApp, which charge a premium for tier-one products. Instead, Rod suggested that people should "think like a Web company" and buy only storage products based on commodity hardware to save money, and use SRM software to identify areas of poor storage utilization. IBM's TotalStorage Productivity Center software is often used to help with this analysis.
His other suggestions were to adopt thin provisioning, data deduplication, and virtualization. The discussion at my table started with someone asking, "How do we adopt those functions without buying new storage capacity with those features already built in?" I explained that IBM's SAN Volume Controller (SVC), N series gateways, and TS7650G ProtecTIER virtual tape gateway can all provide one or more of these features to your existing disk storage capacity.
IBM and HP are leaders in blade servers
In the session "Future of Server and OS: Disappearing Boundaries", the audience confirmed by electronic survey that IBM and HP are the leaders in blade servers, although blades represent only 8-10 percent of the overall server market.
Interestingly, 22 percent of the audience has deployed both x86 and non-x86 (POWER, SPARC, etc.) blade servers. The presenters considered this an interesting insight.
Another survey of the audience found that 3 percent considered Sun/STK their primary storage vendor. One of the presenters was delighted that Sun is still hanging in there.
IBM Business Partners deliver the best of IBM and mask the worst
Elaine Lennox, IBM VP, and Mark Wyllie, CEO of Flagship Solutions Group, Inc., presented back-to-back IBM-sponsored sessions. Elaine presented IBM's vision, the New Enterprise Data Center, and the challenges that demand a smarter planet.
Mark focused on his company's experience working with IBM through Innovation Workshops. These are assessments that help clients identify where they are now and where they want to be, and then build action plans to address the gaps.
Cats and Dogs, Oil and Water, Microsoft Windows and Mission-critical applications, what do all of these have in common?
NEC Corporation of America sponsored some sessions on x86-based solutions they have to offer. The first part, titled "Rats Nests, Snow Drifts and Trailers", focused on unified storage, and the second part, presented by Michael Nixon, focused on how to bring Microsoft Windows servers into the data center for mission-critical applications.
The Economy might be slowing, but storage is still growing
Two analysts co-presented "The Enterprise Storage Scenario". Unlike computing capacity, there is no on/off switch for storage demand, neither from applications nor from end-users. The cost of power for storage is expected to triple by 2013. Virtual servers, including VMware and Microsoft's Hyper-V, will drive the need for shared external disk storage. A survey of the audience found 20 percent were expecting to purchase additional storage capacity in 4Q08.
When someone reaches age 52, they expect to coast the rest of their career
At dinner with analysts, discussion of the financial meltdown and bailouts was unavoidable, including everyone's views about the proposed bailout of the Big 3 automakers. I can't defend Ford, GM and Chrysler paying their people US$70 per hour, when their US counterparts at Toyota or Honda are paid only $45 to $50 per hour.
However, I have a close friend who retired after 20 years working for the fire department, and a cousin who retired after 20 years serving in the Navy (the US Navy, not the Bolivian Navy), and both are still in their forties. A long time ago, IT professionals retired after 30 years, in some cases with 50 to 60 percent of their base pay as their pension for the rest of their lives. A 52-year-old who has worked 30 years might expect to spend the rest of his life playing golf and pursuing other hobbies. This is not "coasting"; it is called "retirement". The few colleagues I have seen work 35 to 40 years did so because they enjoyed the challenge of work at IBM. They enjoyed solving tough engineering problems and helping customers. As long as they were having fun on the job, IBM was glad to keep their wealth of experience on board and actively engaged.
Unfortunately, many people rely on their own investments in the stock market for retirement, rather than company pensions. With the current financial crisis, I suspect many people my age are reconsidering their previous retirement plans.
We're going to need more trains!
I took the monorail back to my hotel. The ride includes funny announcements and statistics, including this gem:
"Since 1940, Las Vegas has doubled in population every ten years, which means that by the year 2230, we will have over 1 trillion people calling Las Vegas home. We're going to need more trains!"
That wraps up Tuesday, Day 2 of my attendance here! Now for some sleep.
For those who missed it, IBM announced last Tuesday encryption capability for the TS1120, our enterprise tape drive that reads and writes 3592 cartridges. Do you need special cartridges for this? No! Use the same ones you have already been using!
The IBM Storage and Storage Networking Symposium in Las Vegas continues ...
N series and VMware
Jeff Barnett presented how VMware manages disk image files in its VMFS repository, and how N series offers a better alternative. Virtual machines can access N series volumes directly.
Business Continuity with System i
Allison Pate presented the various Business Continuity options for System i. Many customers use internal storage for System i, but this hampers Business Continuity efforts. Instead, you can have IBM System Storage DS8000 or DS6000 series disk systems provide disk mirroring between clustered systems.
There was a lot of interest in the DR550, one of our many compliance storage solutions. Ron Henkhaus presented an overview of our DR550 and DR550 Express offerings. Unlike competitive disk-only solutions, such as the EMC Centera, the DR550 allows you to attach an automated tape library, managing large amounts of fixed content data at a much lower cost point. It also has encryption, for both disk and tape data.
Open Systems Disk Management
Siebo Friesenborg presented the various steps needed to troubleshoot performance problems with open systems, including the use of "iostat" on AIX systems as an example, and the steps you can take to establish formal Service Level Agreements (SLA) between the IT department and the various lines of business.
IBM Encryption - TS1120 and LTO-4 encryption comparison
Tony Abete presented TS1120 and LTO-4 encryption techniques. Deploying encryption is more than just choosing a tape drive. There are a variety of factors involved, such as whether to manage the keys from the application, the operating system, or the library manager. You need policies to decide when to encrypt tapes and when not to, and how to generate your keys, store them, and share them with the business partners, suppliers and service providers to whom you send tapes.
I can tell that many people are feeling like they are "drinking from a firehose". IBM's success in storage reaches out to so many different aspects of information management, a variety of industries, and disciplines as varied as regulatory compliance and medical imaging.
IBM had some big announcements today. The theme for today's announcement was "Protected Information", as there are many reasons to protect your most strategic asset, your information. Let's do a quick run-down of a few of them.
IBM LTO generation 4
LTO 4 provides encryption at the drive level, and supports WORM cartridges similar to LTO 3. It continues the LTO consortium's strategy of higher capacity and faster performance. If you have LTO 1 or LTO 2, now is a good time to consider upgrading your tape technology. The combination of encryption and WORM protects your information against unauthorized access and unethical tampering. Support spans from our largest automated tape library (TS3500) to our smallest drives.
TS7520 Virtualization Engine
The TS7520 replaces the TS7510, providing enhanced Virtual Tape Library (VTL) capability. When you hear "storage virtualization" you often think of disk, but IBM invented "tape storage virtualization" and this product continues that leadership.
Support for Half-high LTO 3 drives
The TS3100 and TS3200 now support half-high LTO 3 drives, which means you can have twice the number of drives in each unit. LTO 4 drives can read and write to LTO 3 media, so this provides additional investment protection.
IBM System Storage DR550 File System Gateway
This new offering provides much-needed CIFS and NFS access to the DR550, the world's most flexible compliance-and-retention storage available. A large body of ISVs already supports the DR550 today, and with this new gateway, the list is even longer. The DR550 provides encryption for both disk and tape data, as well as policy-based non-erasable, non-rewriteable enforcement, designed for compliance with government regulations like the Sarbanes-Oxley Act, HIPAA, and many others.
IBM System Storage SAN32B-3 switch
This is the first major deliverable from Brocade since their acquisition of McDATA. This powerful switch packs 4 Gbps support into a small 1U form factor. Start with 16 ports, then add ports in increments of 8, to a maximum of 32 ports.
I've provided all the links, so that you can delve deeply into all the data sheets.
A lot of people ask me about IBM branding, as we have recently changed brands. In the past we had two separate brands, one for servers (eServer) and one for storage (TotalStorage). That would be fine if we wanted to promote their independence, but customers today want synergy between servers and storage: they want systems that work well together.
Last year, in response to market feedback, we created a new brand, "IBM Systems", and put all the server and storage product lines under one roof. Over time, we will transition from TotalStorage to System Storage naming. This will occur with new products, and with major versions of existing products.
Two other phrases you will hear in the names of our offerings are "Virtualization Engine" and "Express". These are portfolio identifiers. The Virtualization Engine identifier was created to emphasize our leadership in system virtualization, and we have products that span product lines with this identifier.
The Express identifier was created to emphasize our focus on Small and Medium sized business (SMB). It spans not just servers and storage, but across other offerings from other IBM divisions.
Of course, just renaming products and services isn't enough. Systems don't work together just because they have similar names, are covered in similar "Apple white" plastic, or have similar black bezels. Obviously, thoughtful and collaborative design are needed, with the appropriate amounts of engineering and testing. IBM is aligning its server and storage development so that the IBM Systems brand keeps its promise.
To get beyond the simple statistics of vendor popularity, we looked at the number and combinations of vendors with which enterprises work. Many were customers of one or two storage providers, but the rest were customers of up to six storage providers. More than one-third were customers of systems vendors only, bypassing storage specialists.
Comparisons between solutions vendors and storage component vendors are not new. One could argue that this can be compared to supermarkets and specialty shops.
Supermarkets offer everything you need to prepare a meal. You can buy your meat, bread, cheese, and extras all with one-stop shopping. In a sense, IBM, HP, Sun and Dell offer this to clients who prefer this approach. Not surprisingly, the two leaders in overall storage hardware, IBM and HP, are also the two best able to offer a complete set of software, services, servers and storage.
IBM and HP are also the leaders in tape. While Forrester reports that many large enterprises in North America prefer to buy disk from storage specialists, others have found that customers prefer to buy their tape from solution providers. Recently, Byte and Switch reported in "LTO Hits New Milestones" that the LTO consortium (IBM, HP, and Quantum) has collectively shipped over 2 million LTO tape drives and over 80 million LTO tape cartridges. Perhaps this is because tape is part of an overall backup, archive or space management solution, and customers trust a solution vendor over a storage specialist.
Where possible, IBM brings synergy between its servers and storage. For example, we just announced the IBM BladeCenter Boot Disk System, a 2U high unit that supports up to 28 blade servers, ideal for applications running under Windows or Linux, and helping to reduce energy consumption for those interested in a "Green" data center.
Some people prefer buying their meat at the slaughterhouse, bread at the French pastry shop, and so on. Storage specialists focus on just storage, leaving the rest of the solution, like servers, to be purchased separately from someone else. Storage vendors like NetApp, EMC, HDS and others offer storage components to customers that like to do their own "system integration", or to those that are large enough to hire their own "systems integrator".
Storage specialists recognize that not everybody is a "specialty shop" shopper. HDS has done well selling its disk through solution vendors like HP and Sun. EMC sells its gear through solution vendor Dell.
Interestingly, I have met clients who prefer to buy IBM System Storage N series from IBM, because IBM is a solution vendor, and others who prefer to buy comparable NetApp equipment directly from NetApp, because it is a storage component vendor.
I mostly buy my groceries at a supermarket, but have, on occasion, bought something from the local butcher, baker or candlestick maker. And if you are ever in Tucson, you might be able to find Mexican tamales sold by a complete stranger standing outside of a Walgreens pharmacy, the ultimate extreme of specialization. You can get a dozen tamales for ten bucks, and in my experience they are usually quite good. Theoretically, if you get sick, or they don't taste right, you have no recourse, and will probably never see that stranger again to complain to. (And no, before I get flamed, I am not implying any major vendor mentioned above is like this tamale vendor.)
Of course, nothing is starkly black and white, and comparisons like this are just to help provide context and perspective,but if you are looking to have a complete IT solutionthat works, from software and servers to storage and financing, come to the vendor you can trust, IBM.
Based on this success, and perhaps because I am also fluent in Spanish, I was asked to help with Proyecto Ceibal, the team for OLPC Uruguay. Normally the XS school server resides at the school itself, so that even if the internet connection is disrupted or limited, the school kids can continue to access each other and the web cache content until the connection is restored. However, with a diverse development team with people in the United States, Uruguay, and India, we first looked for Linux hosting providers that would agree to provide free or low-cost monthly access. We spent (make that "wasted") the month of May investigating. Most that I talked to were not interested in having a customized Linux kernel on non-standard hardware on their shop floor, and wanted instead to offer their own standard Linux build on existing standard servers, managed by their own system administrators, or were not interested in providing it for free. Since the XS-163 kernel is customized for the x86 architecture, it is one of those exceptions where we could not host it on an IBM POWER or mainframe as a virtual guest.
This got picked up as an [idea] for Google's [Summer of Code], and we are mentoring Tarun, a 19-year-old student, to act as lead software developer. However, summer was fast approaching, and we wanted this ready for the next semester. In June, our project leader, Greg, came up with a new plan: build a machine and have it connected at an internet service provider that would cover the cost of bandwidth and be willing to accept remote administration. We found a volunteer organization to cover this -- Thank you Glen and Vicki!
We found a location, so the request to me sounded simple enough: put together a PC from commodity parts that meets the requirements of the customized Linux kernel, the latest release being called [XS-163]. The server would have two disk drives, three Ethernet ports, and 2GB of memory, and be installed with the customized XS-163 software, SSHD for remote administration, Apache web server, PostgreSQL database and PHP programming language. Of course, the team wanted this for as little cost as possible, and wanted me to document the process so that it could be repeated elsewhere. Some stretch goals included a dual-boot with Debian 4.0 Etch Linux for development/test purposes, an alternative database such as MySQL for testing, a backup procedure, and a Recovery-DVD in case something goes wrong.
Some interesting things happened:
The XS-163 is shipped as an ISO file representing a LiveCD bootable Linux that will wipe your system clean and lay down the exact customized software for a one-drive, three-Ethernet-port server. Since it is based on Red Hat's Fedora 7 Linux, I found it helpful to install that instead, and experiment with moving sections of code over. This is similar to geneticists extracting the DNA from the cell of a pit bull and putting it into the cell of a poodle. I would not recommend this for anyone not familiar with Linux.
I also experimented with modifying the pre-built XS-163 CD image by cracking open the squashfs, hacking the contents, and then putting it back together and burning a new CD. This provided some interesting insight, but in the end I was able to do it all from the standard XS-163 image.
Once I figured out the appropriate "scaffolding" required, I managed to proceed quickly, with running versions of XS-163, plain vanilla Fedora 7, and Debian 4, in a multi-boot configuration.
The BIOS "raid" capability was really more like BIOS-assisted RAID for Windows operating system drivers. This "fake raid" wasn't supported by Linux, so I used Linux's built-in "software raid" instead, which allowed some partitions to be raid-mirrored, and other partitions to be un-mirrored. Why not mirror everything? With two 160GB SATA drives, you have three choices:
No RAID, for a total space of 320GB
RAID everything, for a total space of 160GB
Tiered information infrastructure, use RAID for some partitions, but not all.
The last approach made sense, as a lot of the data is cached web page images, easily retrievable from the internet. This also allowed me to have some "scratch space" for downloading large files and so on. For example, 90GB mirrored, containing the OS images, settings and critical applications, plus 70GB on each drive for scratch space and web cache, gives a total of 230GB of disk space, roughly 44 percent more than an all-RAID solution.
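The capacity arithmetic for the three choices can be checked with a few lines of Python. This is a sketch of the calculation only, assuming two identical 160GB drives and that any mirrored region occupies the same amount of space on both drives.

```python
DRIVE_GB = 160  # each of the two SATA drives


def usable_gb(mirrored_gb: int, drives: int = 2) -> int:
    """Usable space when `mirrored_gb` of every drive is RAID-1 mirrored
    and the rest of each drive is left as independent, unmirrored space."""
    unmirrored = (DRIVE_GB - mirrored_gb) * drives
    return mirrored_gb + unmirrored  # a mirrored region counts only once


if __name__ == "__main__":
    print("No RAID:        ", usable_gb(0), "GB")    # 320
    print("RAID everything:", usable_gb(160), "GB")  # 160
    print("Tiered (90GB):  ", usable_gb(90), "GB")   # 230
    gain = usable_gb(90) / usable_gb(160) - 1
    print(f"Gain over all-RAID: {gain:.2%}")
```

Every gigabyte moved out of the mirrored region gains one extra gigabyte of usable space, at the cost of that data having no second copy, which is why it only makes sense for easily re-fetched data like the web cache.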
While [Linux LVM2] provides software-based "storage virtualization" similar to the hardware-based IBM System Storage SAN Volume Controller (SVC), putting the different "root" directories of my many OS images on it was a bad idea. Linux, like most operating systems, expects things to be in the same place as when it last shut down, but in a multi-boot environment, you might boot the first OS and move things around, and then when you try to boot the second OS, it doesn't work anymore, corrupts what it does find, or hangs with a "kernel panic". In the end, I decided to use RAID non-LVM partitions for the root directories, and to use LVM2 only for data that is not needed at boot time.
While they are both Linux, Debian and Fedora were different enough to cause me headaches. Settings were different, parameters were different, file directories were different. Not quite as religious as MacOS-versus-Windows, but you get the picture.
During this time, the facility was out getting a domain name, IP address, subnet mask and so on, so I tested with my internal 192.168.x.y addresses and figured I would change them to the real values the day I shipped the unit. (I'll find out next week if that was the right approach!)
Afraid that something might go wrong while I am in Tokyo, Japan next week (July 7-11), or Mumbai, India the following week (July 14-18), I added a Secure Shell [SSH] daemon that runs automatically at boot time. This involves putting each remote admin's public key on the server, while each admin keeps their own private key on their own client machine. I know all about public/private key pairs, as IBM is a leader in encryption technology, and was the first to deliver built-in encryption with the IBM System Storage TS1120 tape drive.
To give users access to all their files from any OS image, I had to either (a) keep identical copies everywhere, or (b) have a shared partition. The latter turned out to be the best choice, with an LVM2 logical volume for the "/home" directory that is shared among all of the OS images. As we develop the application, we might find other directories that make sense to share as well.
For developing across platforms, I wanted the Ethernet devices (eth0, eth1, and so on) to match the actual ports they are supposed to be connected to in a static IP configuration. Most people use DHCP, so it doesn't matter, but the XS software requires this, so it did. For example, "eth0" is the 1 Gbps port to the WAN, and "eth1/eth2" are the two 10/100 Mbps PCI NICs to other servers. Binding the network interface names to specific hardware ports was done differently on Fedora and Debian, but I got it working.
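One common way to pin interface names to hardware ports on udev-based distributions is a persistent-net rule that matches each port's MAC address. The post does not say exactly how it was done on each OS, so treat this as a hedged sketch: it generates udev rule lines from a name-to-MAC table, and the MAC addresses are hypothetical placeholders. The rules file location and exact syntax vary by distribution and udev version.

```python
# Hypothetical MAC addresses; the real ones come from each NIC (e.g. `ip link`).
PORTS = {
    "eth0": "00:11:22:33:44:55",  # onboard 1 Gbps port to the WAN
    "eth1": "00:11:22:33:44:66",  # first 10/100 Mbps PCI NIC
    "eth2": "00:11:22:33:44:77",  # second 10/100 Mbps PCI NIC
}


def udev_rules(ports: dict) -> str:
    """Build one udev rule per port that names the interface by MAC address."""
    lines = []
    for name, mac in sorted(ports.items()):
        lines.append(
            f'SUBSYSTEM=="net", ACTION=="add", '
            f'ATTR{{address}}=="{mac.lower()}", NAME="{name}"'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(udev_rules(PORTS), end="")
```

Keying on the MAC address rather than on driver load order is what keeps "eth0" on the same physical port no matter which OS image boots.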
While it was a stretch goal to develop a backup method, one that could perform Bare Machine Recovery from media burned to DVD, it turned out I needed to do this anyway just to avoid losing my work in case things went wrong. I used an external USB drive to develop the process, and got everything to fit onto a single 4GB DVD. Using IBM Tivoli Storage Manager (TSM) for this seemed overkill, and [Mondo Rescue] didn't handle LVM2+RAID as well as I wanted, so I chose [partimage] instead, which backs up each primary partition, mirrored partition, or LVM2 logical volume, keeping all the time stamps, ownerships, and symbolic links intact. It can chop up the output into fixed-size pieces, which is helpful if you are going to burn them on 700MB CDs or 4.7GB DVDs. In my case, my FAT32-formatted external USB disk drive can't handle files bigger than 2GB, so this feature was helpful for that as well. I standardized on 660 MiB (about 692 MB) per piece, since that met all criteria.
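Here is a quick check that a 660 MiB piece size satisfies all three limits and that a roughly 4GB backup fits on one DVD. The limits are taken from the post (700MB CD, 4.7GB DVD, 2GB per file on the USB drive); treating "700MB" as 700 MiB and "4GB"/"4.7GB" as decimal gigabytes are my assumptions, and the function names are illustrative only.

```python
MIB = 1024 * 1024
CHUNK = 660 * MIB             # piece size from the post: 692,060,160 bytes
CD_BYTES = 700 * MIB          # nominal "700MB" CD (assumed to mean MiB)
DVD_BYTES = 4_700_000_000     # single-layer "4.7GB" DVD, decimal gigabytes
FILE_LIMIT = 2 * 1000**3      # the 2GB per-file limit stated in the post


def fits_everywhere(chunk: int) -> bool:
    """True if one piece fits on a CD, a DVD, and the USB drive's file limit."""
    return chunk <= min(CD_BYTES, DVD_BYTES, FILE_LIMIT)


def pieces_needed(total_bytes: int, chunk: int = CHUNK) -> int:
    """Number of pieces a backup of total_bytes is split into (ceiling)."""
    return -(-total_bytes // chunk)


def pieces_per_disc(disc_bytes: int, chunk: int = CHUNK) -> int:
    """Whole pieces that fit on one disc."""
    return disc_bytes // chunk


if __name__ == "__main__":
    print(CHUNK, "bytes per piece; fits everywhere:", fits_everywhere(CHUNK))
    print("pieces for a 4GB backup:", pieces_needed(4 * 1000**3))
    print("pieces per DVD:", pieces_per_disc(DVD_BYTES))
```

Six pieces are needed for about 4GB of backup, and six pieces fit on one single-layer DVD, with a single piece still small enough for a CD.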
The folks at [SysRescCD] saved the day. The standard "SysRescueCD" assigned eth0, eth1, and eth2 differently than the three base OS images did, but the nice folks in France who write SysRescCD created a customized [kernel parameter that allowed the assignments to be fixed per MAC address] in support of this project. With this in place, I was able to make a live Boot-CD that brings up SSH, with all the users, passwords, and Ethernet devices matching the hardware. I installed this LiveCD as the "Rescue Image" on the hard disk itself, and also made a Recovery-DVD that boots up just like the Boot-CD, but contains the 4GB of backup files.
For testing, I used Linux's built-in Kernel-based Virtual Machine [KVM], which works like VMware but is open source and included in the 2.6.20 kernels that I am using. IBM is the leading reseller of VMware and has been doing server virtualization for the past 40 years, so I am comfortable with the technology. The XS-163 platform, with its Apache and PostgreSQL servers, hosts [Moodle], an open source class management system, and the combination is memory-intensive enough that I did not want to incur the overhead of running production this way, but it was great for testing!
With all this in place, the system is designed so that the facility does not need a Linux system administrator or an XS-163/Moodle expert on site. Instead, all we need is someone to insert the Boot-CD or Recovery-DVD and reboot the system if needed.
Just before packing up the unit for shipment, I changed the IP addresses to the values they need at the destination facility, updated the [GRUB boot loader] default, and made a final backup, which I burned onto the Recovery-DVD. Hopefully, it will work just by turning on the unit [headless], without any keyboard, monitor, or configuration required. Fingers crossed!
So, thanks to the rest of my team: Greg, Glen, Vicki, Tarun, Marcel, Pablo and Said. I am very excited to be part of this, and look forward to seeing this become something remarkable!
IDC announced that IBM was #1 in storage hardware (disk and tape combined) for 2006. Here are some excerpts from the IBM press release:
The newly released May 2007 report by leading industry analyst firm IDC, "Worldwide Combined Disk and Tape Storage 2006 Market Share Update," shows IBM in the #1 overall position for all disk and tape storage hardware for the full year 2006.
In a total disk and tape storage hardware segment that increased to $28.2 billion in 2006, IBM captured 22.2 percent of the combined revenue for full year 2006, besting HP's 20.9 percent and EMC's 13.2 percent.
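The press release gives market shares rather than dollar figures. Multiplying them out gives a rough sense of the revenue gaps; note this is my own arithmetic on the quoted percentages, not numbers published in the IDC report.

```python
# Figures quoted from the IDC excerpt above.
TOTAL_MARKET_BILLIONS = 28.2  # combined disk + tape hardware, full year 2006
SHARE_PERCENT = {"IBM": 22.2, "HP": 20.9, "EMC": 13.2}


def revenue_billions(total: float, share_percent: float) -> float:
    """Dollar revenue (in billions) implied by a percentage share of a market."""
    return total * share_percent / 100


if __name__ == "__main__":
    for vendor, pct in SHARE_PERCENT.items():
        print(f"{vendor}: about ${revenue_billions(TOTAL_MARKET_BILLIONS, pct):.2f} billion")
    lead = revenue_billions(TOTAL_MARKET_BILLIONS,
                            SHARE_PERCENT["IBM"] - SHARE_PERCENT["HP"])
    print(f"IBM lead over HP: about ${lead * 1000:.0f} million")
```

That works out to roughly $6.26 billion for IBM, $5.89 billion for HP, and $3.72 billion for EMC, so the 1.3-point lead over HP is worth about $367 million.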
Five years ago, IBM was only #3 in this area. Is this new standing the result of IBM doing things better, or of HP and EMC doing things poorly? Probably a little of both, but since it's not polite to point out the flaws of others in a blog, I will focus on what IBM is doing right, and I think our leadership in tape accounts for a good measure of this.
The resurgence of tape comes from a variety of factors:
The focus on being "green", to reduce power and cooling costs. Tape is the cheapest storage in this regard, since a tape cartridge consumes no power sitting in a library slot or on a shelf; power is used only by the drive while the cartridge is being read or written.
Government regulations that require more data to be stored for longer periods of time, such as the Federal Rules of Civil Procedure (FRCP), Sarbanes-Oxley, SEC regulations, and so on.
The widening gap in dollars per MB. Advancements in tape are outpacing disk: disk improvement is slowing to about 25% year over year, but tape continues its 30-40% improvement curve. A solution like Information Lifecycle Management (ILM) that moves older, less valuable data from disk to tape can result in excellent cost savings.
Exciting "combined storage" solutions like the IBM System Storage DR550 and the IBM Grid Medical Archive Solution (GMAS) that combine disk and tape with internal hierarchical storage management of data, based on policies.
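To see why a few points of annual improvement add up to a widening gap, here is a small Python calculation. It assumes (my interpretation, not stated above) that an improvement rate r means MB per dollar is multiplied by (1 + r) each year; under that assumption, tape's cost advantage over disk grows by a factor of ((1 + t) / (1 + d))^n after n years, where t and d are the tape and disk rates. The function name `relative_advantage` is hypothetical.

```python
def relative_advantage(tape_rate: float, disk_rate: float, years: int) -> float:
    """Factor by which tape's MB-per-dollar lead over disk grows after `years`.

    Assumption: a rate of 0.25 means MB per dollar is multiplied by 1.25 each year.
    """
    return ((1 + tape_rate) / (1 + disk_rate)) ** years


if __name__ == "__main__":
    DISK_RATE = 0.25                # about 25% per year for disk, per the text
    for tape_rate in (0.30, 0.40):  # the 30-40% range cited for tape
        factors = [relative_advantage(tape_rate, DISK_RATE, n) for n in range(1, 6)]
        print(f"tape at {tape_rate:.0%}: " + ", ".join(f"{f:.2f}x" for f in factors))
```

Over five years this comes to roughly 1.22x at the low end of the tape range and 1.76x at the high end, so the gap compounds rather than staying fixed.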