Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is a Master Inventor and Senior IT Specialist for the IBM System Storage product line at the IBM Executive Briefing Center in Tucson, Arizona, and featured contributor to IBM's developerWorks. In 2011, Tony celebrated his 25th anniversary with IBM Storage on the same day as IBM's Centennial. He is the author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson)
Well, it's Tuesday again, and today we had our third big storage launch of 2009! A lot got announced today as part of IBM's big "Dynamic Infrastructure" marketing campaign. I will just focus on the disk-related announcements today:
IBM System Storage DS8700
IBM adds a new model to its DS8000 series with the [IBM System Storage DS8700]. Earlier this month, fellow blogger and arch-nemesis Barry Burke from EMC posted [R.I.P DS8300] on the mistaken assumption that the new DS8700 meant that DS8300 was going away, or that anyone who bought a DS8300 recently would be out of luck. Obviously, I could not respond until today's announcement, as the last thing I want to do is lose my job disclosing confidential information. BarryB is wrong on both counts:
IBM will continue to sell the DS8100 and DS8300, in addition to the new DS8700.
Clients can upgrade their existing DS8100 or DS8300 systems to DS8700.
BarryB's latest post [What's In a Name - DS8700] is fair game, given all the fun and ridicule everyone had at his expense over EMC's "V-Max" name.
So the DS8700 is new hardware with only 4 percent new software. On the hardware side, it uses faster POWER6 processors instead of POWER5+, has faster PCI-e buses instead of the RIO-G loops, and faster four-port device adapters (DAs) for added bandwidth between cache and drives. The DS8700 can be ordered as a single-frame dual 2-way that supports up to 128 drives and 128GB of cache, or as a dual 4-way, consisting of one primary frame, and up to four expansion frames, with up to 384GB of cache and 1024 drives.
Not mentioned explicitly in the announcements were the things the DS8700 does not support:
ESCON attachment - Now that FICON is well-established for the mainframe market, there is no need to support the slower, bulkier ESCON options. This greatly reduced testing effort. The 2-way DS8700 can support up to 16 four-port FICON/FCP host adapters, and the 4-way can support up to 32 host adapters, for a maximum of 128 ports. The FICON/FCP host adapter ports can auto-negotiate between 4Gbps, 2Gbps and 1Gbps as needed.
LPAR mode - When IBM and HDS introduced LPAR mode back in 2004, it sounded like a great idea the engineers came up with. Most other major vendors followed our lead to offer similar "partitioning". However, it turned out to be what we call in the storage biz a "selling apple" not a "buying apple". In other words, something the salesman can offer as a differentiating feature, but that few clients actually use. It turned out that supporting both LPAR and non-LPAR modes merely doubled the testing effort, so IBM got rid of it for the DS8700.
Update: I have been reminded that both IBM and HDS delivered LPAR mode within a month of each other back in 2004, so it was wrong for me to imply that HDS followed IBM's lead when obviously development happened in both companies for the most part concurrently prior to that. EMC was late to the "partition" party, but who's keeping track?
Initial performance tests show up to 50 percent improvement for random workloads, and up to 150 percent improvement for sequential workloads, and up to 60 percent improvement in background data movement for FlashCopy functions. The results varied slightly between Fixed Block (FB) LUNs and Count-Key-Data (CKD) volumes, and I hope to see some SPC-1 and SPC-2 benchmark numbers published soon.
The DS8700 is compatible for Metro Mirror, Global Mirror, and Metro/Global Mirror with the rest of the DS8000 series, as well as the ESS model 750, ESS model 800 and DS6000 series.
New 600GB FC and FDE drives
IBM now offers [600GB drives] for the DS4700 and DS5020 disk systems, as well as the EXP520 and EXP810 expansion drawers. In each case, we are able to pack up to 16 drives into a 3U enclosure.
Personally, I think the DS5020 should have been given a DS4xxx designation, as it resembles the DS4700 more than the other models of the DS5000 series. Back in 2006-2007, I was the marketing strategist for the IBM System Storage product line, and part of my job involved all of the meetings to name or rename products. Mostly I gave reasons why products should NOT be renamed, and why it was important to name the products correctly at the beginning.
IBM System Storage SAN Volume Controller hardware and software
Fellow IBM Master Inventor Barry Whyte has been covering the latest on the [SVC 2145-CF8 hardware]. IBM put out a press release last week on this, and today is the formal announcement with prices and details. Barry's latest post
[SVC CF8 hardware and SSD in depth] covers just part of the entire
The other part of the announcement was the [SVC 5.1 software] which can be loaded
on earlier SVC models 8F2, 8F4, and 8G4 to gain better performance and functionality.
To avoid confusion on what is hardware machine type/model (2145-CF8 or 2145-8A4) and what is software program (5639-VC5 or 5639-VW2), IBM has introduced two new [Solution Offering Identifiers]:
5465-028 Standard SAN Volume Controller
5465-029 Entry Edition SAN Volume Controller
The latter is designed for smaller deployments and supports only a single SVC node-pair managing up to 150 disk drives. It is available in Raven Black or Flamingo Pink.
EXN3000 and EXP5060 Expansion Drawers
IBM offers the [EXN3000 for the IBM N series]. These expansion drawers can pack 24 drives in a 4U enclosure. The drives can be either all-SAS or all-SATA, in 300GB, 450GB, 500GB and 1TB capacities.
The [EXP5060 for the IBM DS5000 series] is a high-density expansion drawer that can pack up to 60 drives into a 4U enclosure. A DS5100 or DS5300
can handle up to eight of these expansion drawers, for a total of 480 drives.
Pre-installed with Tivoli Storage Productivity Center Basic Edition. Basic Edition can be upgraded with license keys to support Data, Disk and Standard Edition to extend support and functionality to report and manage XIV, N series, and non-IBM disk systems.
Pre-installed with Tivoli Key Lifecycle Manager (TKLM). This can be used to manage the Full Disk Encryption (FDE) encryption-capable disk drives in the DS8000 and DS5000, as well as LTO and TS1100 series tape drives.
IBM Tivoli Storage FlashCopy Manager v2.1
The [IBM Tivoli Storage FlashCopy Manager V2.1] replaces two products in one. IBM used
to offer IBM Tivoli Storage Manager for Copy Services (TSM for CS) that protected Windows application data, and IBM Tivoli Storage Manager for Advanced Copy Services (TSM for ACS) that protected AIX application data.
The new product has some excellent advantages. FlashCopy Manager offers application-aware backup of LUNs containing SAP, Oracle, DB2, SQL server and Microsoft Exchange data. It can support IBM DS8000, SVC and XIV point-in-time copy functions, as well as the Volume Shadow Copy Services (VSS) interfaces of the IBM DS5000, DS4000 and DS3000 series disk systems. It is priced by the amount of TB you copy, not on the speed or number of CPU processors inside the server.
Don't let the name fool you. IBM FlashCopy Manager does not require that you use Tivoli Storage Manager (TSM) as your backup product. You can run IBM FlashCopy Manager on its own, and it will manage your FlashCopy target versions on disk, and these can be backed up to tape or another disk using any backup product. However, if you are lucky enough to also be using TSM, then there is optional integration that allows TSM to manage the target copies, move them to tape, inventory them in its DB2 database, and provide complete reporting.
Yup, that's a lot to announce in one day. And this was just the disk-related portion of the launch!
Last week, I presented IBM's strategic initiative, the IBM Information Infrastructure, which is part of IBM's New Enterprise Data Center vision. This week, I will try to get around to talking about some of the products that support those solutions.
I was going to set the record straight on a variety of misunderstandings, rumors or speculations, but I think most have been taken care of already. IBM blogger BarryW covered the fact that SVC now supports XIV storage systems, in his post [SVC and XIV], and addressed some of the FUD already. Here was my list:
Now that IBM has an IBM-branded model of XIV, IBM will discontinue (insert another product here)
I had seen speculation that XIV meant the demise of the N series, the DS8000 or IBM's partnership with LSI. However, the launch reminded people that IBM announced a new release of DS8000 features, new models of N series N6000, and the new DS5000 disk, so that squashes those rumors.
IBM XIV is a (insert tier level here) product
While there seems to be no industry-standard or agreement for what a tier-1, tier-2 or tier-3 disk system is, there seemed to be a lot of argument over what pigeon-hole category to put IBM XIV in. No question many people want tier-1 performance and functionality at tier-2 prices, and perhaps IBM XIV is a good step at giving them this. In some circles, tier-1 means support for System z mainframes. The XIV does not have traditional z/OS CKD volume support, but Linux on System z partitions or guests can attach to XIV via SAN Volume Controller (SVC), or through NFS protocol as part of the Scale-Out File Services (SoFS) implementation.
Whenever any radical game-changing technology comes along, competitors with last century's products and architectures want to frame the discussion that it is just yet another storage system. IBM plans to update its Disk Magic and other planning/modeling tools to help people determine which workloads would be a good fit with XIV.
IBM XIV lacks (insert missing feature here) in the current release
I am glad to see that the accusations that XIV had unprotected, unmirrored cache were retracted. XIV mirrors all writes in the cache of two separate modules, with ECC protection. XIV allows concurrent code load for bug fixes to the software. XIV offers many of the features that people enjoy in other disk systems, such as thin provisioning, writeable snapshots, remote disk mirroring, and so on. IBM XIV can be part of a bigger solution, either through SVC, SoFS or GMAS, that provides the business value customers are looking for.
IBM XIV uses (insert block mirroring here) and is not as efficient for capacity utilization
It is interesting that this came from a competitor that still recommends RAID-1 or RAID-10 for its CLARiiON and DMX products. On the IBM XIV, each 1MB chunk is written on two different disks in different modules. When disks were expensive, how much usable space you got for a given set of HDDs was worthy of argument. Today, we sell you a big black box, with 79TB usable, for (insert dollar figure here). For those who feel 79TB is too big to swallow all at once, IBM offers "capacity on demand" pricing, where you can pay initially for as little as 22TB, but get all the performance, usability, functionality and advanced availability of the full box.
IBM XIV consumes (insert number of Watts here) of energy
For every disk system, a portion of the energy is consumed by the hard disk drives (HDD) and the remainder by the UPS, power conversion, processors and cache memory. Again, the XIV is a big black box, and you can compare the 8.4 kW of this high-performance, low-cost storage one-frame system with the wattage consumed by competitive two-frame (sometimes called two-bay) systems, if you are willing to take some trade-offs. To get comparable performance and hot-spot avoidance, competitors may need to over-provision or use faster, energy-consuming FC drives, and offer additional software to monitor and re-balance workloads across RAID ranks. To get comparable availability, competitors may need to drop from RAID-5 down to either RAID-1 or RAID-6. To get comparable usability, competitors may need more storage infrastructure management software to hide the inherent complexity of their multi-RAID design.
Of course, if energy consumption is a major concern for you, XIV can be part of IBM's many blended disk-and-tape solutions. When it comes to being green, you can't get any greener storage than tape! Blended disk-and-tape solutions help get the best of both worlds.
Well, I am glad I could help set the record straight. Let me know what other products you would like me to focus on next.
The technology industry is full of trade-offs. Take for example solar cells that convert sunlight to electricity. Every hour, more energy hits the Earth in the form of sunlight than the entire planet consumes in an entire year. The general trade-off is between energy conversion efficiency versus abundance of materials:
Get 9-11 percent efficiency using rare materials like indium (In), gallium (Ga) or cadmium (Cd).
Get only 6.7 percent efficiency using abundant materials like copper (Cu), tin (Sn), zinc (Zn), sulfur (S), and selenium (Se)
A second trade-off is exemplified by EMC's recent GeoProtect announcement. This appears similar to the geographic dispersal method introduced by a company called [CleverSafe]. The trade-off is between the amount of space to store one or more copies of data and the protection of data in the event of disaster. Here's an excerpt from fellow blogger Chuck Hollis (EMC) titled ["Cloud Storage Evolves"]:
"Imagine a average-sized Atmos network of 9 nodes, all in different time zones around the world. And imagine that we were using, say, a 6+3 protection scheme.
The implication is clear: any 3 nodes could be completely lost: failed, destroyed, seized by the government, etc.
-- and the information could be completely recovered from the surviving nodes."
For organizations worried about their information falling into the wrong hands (whether criminal or government sponsored!), any subset of the nodes would yield nothing of value -- not only would the information be presumably encrypted, but only a few slices of a far bigger picture would be lost.
Seized by the government? Falling into the wrong hands? Is EMC positioning ATMOS as "Storage for Terrorists"? I can certainly appreciate the value of being able to protect 6PB of data with only 9PB of storage capacity, instead of keeping two copies of 6PB each. The trade-off is that you will be accessing the majority of your data across your intranet, which could impact performance. But, if you are in an illicit or illegal business that could have a third of your facilities "seized by the government", then perhaps you shouldn't house your data centers there in the first place. Having two copies of 6PB each, in two "friendly nations", might make more sense.
(In reality, companies often keep way more than just two copies of data. It is not unheard of for companies to keep three to five copies scattered across two or three locations. Facebook keeps SIX copies of photographs you upload to their website.)
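The capacity arithmetic behind this trade-off can be sketched in a few lines of Python. The function names are hypothetical, and the remote-read estimate assumes exactly one slice per node, which the excerpt suggests but does not state.

```python
from fractions import Fraction


def dispersed_raw_pb(usable_pb, data_slices, parity_slices):
    """Raw capacity for a k+m dispersal scheme: usable * (k + m) / k."""
    return usable_pb * Fraction(data_slices + parity_slices, data_slices)


def replicated_raw_pb(usable_pb, copies):
    """Raw capacity when keeping whole copies of the data."""
    return usable_pb * copies


def min_remote_fraction(data_slices):
    """Assumption: one slice per node, so at most one of the k slices
    needed to rebuild an object can be local to the reader."""
    return Fraction(data_slices - 1, data_slices)


usable = 6  # PB of data to protect
dispersed = dispersed_raw_pb(usable, data_slices=6, parity_slices=3)
two_copies = replicated_raw_pb(usable, copies=2)
remote = min_remote_fraction(6)

print(f"6+3 dispersal: {dispersed} PB raw")
print(f"Two full copies: {two_copies} PB raw")
print(f"Slices fetched from other nodes per read: at least {remote}")
```

Under these assumptions the 6+3 scheme saves 3PB of raw capacity over two full copies, at the price of fetching most of every object from other sites.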
ChuckH argues that the governments that seize the three nodes won't have a complete copy of the data. However, merely having pieces of data is enough for governments to capture terrorists. Even if the striping is done at the smallest 512-byte block level, those 512 bytes of data might contain names, phone numbers, email addresses, credit cards or social security numbers. Hackers and computer forensics professionals take advantage of this.
You might ask yourself, "Why not just encrypt the data instead?" That brings me to the third trade-off, protection versus application performance. Over the past 30 years, companies had a choice: they could encrypt and decrypt the data as needed, using server CPU cycles, but this would slow down application processing. Every time you wanted to read or update a database record, more cycles would be consumed. This forced companies to be very selective on what data they encrypted, which columns or fields within a database, which email attachments, and other documents or spreadsheets.
An initial attempt to address this was to introduce an outboard appliance between the server and the storage device. For example, the server would write to the appliance with data in the clear, the appliance would encrypt the data, and pass it along to the tape drive. When retrieving data, the appliance would read the encrypted data from tape, decrypt it, and pass the data in the clear back to the server. However, this had the unintended consequences of using 2x to 3x more tape cartridges. Why? Because the encrypted data does not compress well, so tape drives with built-in compression capabilities would not be able to shrink down the data onto fewer tapes.
(I covered the importance of compressing data before encryption in my previous blog post
[Sock Sock Shoe Shoe].)
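Why the order matters can be shown with a small Python sketch. The cipher here is a toy keystream built from SHA-256, used only as a stand-in so the example needs nothing beyond the standard library. It is not AES, not secure, and not what any appliance or tape drive actually runs. The sample record and key are made up.

```python
import hashlib
import zlib


def toy_keystream_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Illustration only: NOT secure and NOT AES. Applying it twice with
    the same key returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


# Highly repetitive sample data, typical of what compresses well.
record = b"name=Jane Doe;phone=555-0100;city=Tucson;\n"
plaintext = record * 2000
key = b"hypothetical-demo-key"

# Sock, then shoe: compress first, encrypt second.
compress_then_encrypt = toy_keystream_cipher(zlib.compress(plaintext), key)

# Shoe, then sock: encrypt first, then try to compress the result.
encrypt_then_compress = zlib.compress(toy_keystream_cipher(plaintext, key))

print(f"original:              {len(plaintext)} bytes")
print(f"compress then encrypt: {len(compress_then_encrypt)} bytes")
print(f"encrypt then compress: {len(encrypt_then_compress)} bytes")
```

Encrypted output looks like random noise, so the compressor finds no repeated patterns to remove. That is the effect behind the extra tape cartridges described above.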
As with the trade-off between energy efficiency and abundant materials, IBM eliminated this trade-off by offering compression and encryption on the tape drive itself. This is standard 256-bit AES encryption implemented on a chip, able to process the data as it arrives at near line speed. So now, instead of having to choose between protecting your data or running your applications with acceptable performance, you can do both: encrypt all of your data without having to be selective. This approach has been extended over to disk drives, so that disk systems like the IBM System Storage DS8000 and DS5000 can support full-disk-encryption [FDE] drives.
It's official! My "blook" Inside System Storage - Volume I is now available.
This blog-based book, or “blook”, comprises the first twelve months of posts from this Inside System Storage blog, 165 posts in all, from September 1, 2006 to August 31, 2007. Foreword by Jennifer Jones. 404 pages.
IT storage and storage networking concepts
IBM strategy, hardware, software and services
Disk systems, Tape systems, and storage networking
Storage and infrastructure management software
Second Life, Facebook, and other Web 2.0 platforms
IBM’s many alliances, partners and competitors
How IT storage impacts society and industry
You can choose between hardcover (with dust jacket) or paperback versions:
This is not the first time I've been published. I have authored articles for storage industry magazines, written large sections of IBM publications and manuals, submitted presentations and whitepapers to conference proceedings, and even had a short story published with illustrations by the famous cartoon writer [Ted Rall].
But I can say this is my first blook, and as far as I can tell, the first blook from IBM's many bloggers on DeveloperWorks, and the first blook about the IT storage industry. I got the idea when I saw [Lulu Publishing] run a "blook" contest. The Lulu Blooker Prize is the world's first literary prize devoted to "blooks"--books based on blogs or other websites, including webcomics. The [Lulu Blooker Blog] lists past year winners. Lulu is one of the new innovative "print-on-demand" publishers. Rather than printing hundreds or thousands of books in advance, as other publishers require, Lulu doesn't print them until you order them.
I considered cute titles like A Year of Living Dangerously, or An Engineer in Marketing La-La land, or Around the World in 165 Posts, but settled on a title that matched closely the name of the blog.
In addition to my blog posts, I provide additional insights and behind-the-scenes commentary. If you go to the Lulu website above, you can preview an entire chapter before purchase. I have added a hefty 56-page Glossary of Acronyms and Terms (GOAT) with over 900 storage-related terms defined, which also doubles as an index back to the post (or posts) that use or further explain each term.
So who might be interested in this blook?
Business Partners and Sales Reps looking to give a nice gift to their best clients and colleagues
Managers looking to reward early-tenure employees and retain the best talent
IT specialists and technicians wanting a marketing perspective of the storage industry
Mentors interested in providing motivation and encouragement to their proteges
Educators looking to provide books for their classroom or library collection
Authors looking to write a blook themselves, to see how to format and structure a finished product
Marketing personnel that want to better understand Web 2.0, Second Life and social networking
Analysts and journalists looking to understand how storage impacts the IT industry, and society overall
College graduates and others interested in a career as a storage administrator
And yes, according to Lulu, if you order soon, you can have it by December 25.
Tonight PBS plans to air Season 38, Episode 6 of NOVA, titled [Smartest Machine On Earth]. Here is an excerpt from the station listing:
"What's so special about human intelligence and will scientists ever build a computer that rivals the flexibility and power of a human brain? In "Artificial Intelligence," NOVA takes viewers inside an IBM lab where a crack team has been working for nearly three years to perfect a machine that can answer any question. The scientists hope their machine will be able to beat expert contestants in one of the USA's most challenging TV quiz shows -- Jeopardy, which has entertained viewers for over four decades. "Artificial Intelligence" presents the exclusive inside story of how the IBM team developed the world's smartest computer from scratch. Now they're racing to finish it for a special Jeopardy airdate in February 2011. They've built an exact replica of the studio at its research lab near New York and invited past champions to compete against the machine, a big black box code-named Watson after IBM's founder, Thomas J. Watson. But will Watson be able to beat out its human competition?"
Like most supercomputers, Watson runs the Linux operating system. The system runs 2,880 cores (90 IBM Power 750 servers, four sockets each, eight cores per socket) to achieve 80 [TeraFlops]. TeraFlops is the unit of measure for supercomputers, representing a trillion floating point operations per second. By comparison, Hans Moravec, principal research scientist at the Robotics Institute of Carnegie Mellon University (CMU), estimates that the [human brain is about 100 TeraFlops]. So, in the three seconds that Watson gets to calculate its response, it would have processed 240 trillion operations.
Several readers of my blog have asked for details on the storage aspects of Watson. Basically, it is a modified version of IBM Scale-Out NAS [SONAS] that IBM offers commercially, but running Linux on POWER instead of Linux-x86. System p expansion drawers of SAS 15K RPM 450GB drives, 12 drives each, are dual-connected to two storage nodes, for a total of 21.6TB of raw disk capacity. The storage nodes use IBM's General Parallel File System (GPFS) to provide clustered NFS access to the rest of the system. Each Power 750 has minimal internal storage mostly to hold the Linux operating system and programs.
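The Watson figures quoted so far can be cross-checked with simple arithmetic. The sketch below uses only the numbers from the text. The drive and drawer counts are inferred from the 21.6TB raw total (assuming decimal units) and are not stated in the post.

```python
# Compute side: cores and operations per answer window.
servers = 90
sockets_per_server = 4
cores_per_socket = 8
total_cores = servers * sockets_per_server * cores_per_socket

teraflops = 80            # trillion floating point operations per second
think_time_seconds = 3
trillion_ops_per_answer = teraflops * think_time_seconds

# Storage side: inferred counts, assuming 1TB = 1000GB.
raw_capacity_gb = 21_600
drive_capacity_gb = 450
drives_per_drawer = 12
drive_count = raw_capacity_gb // drive_capacity_gb
drawer_count = drive_count // drives_per_drawer

print(f"cores: {total_cores}")
print(f"trillion operations in {think_time_seconds}s: {trillion_ops_per_answer}")
print(f"inferred drives: {drive_count}, inferred drawers: {drawer_count}")
```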
When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, "The actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1TB." For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained; Watson is NOT going to the internet searching for answers.
In my post yesterday [Spreading out the Re-Replication process], fellow blogger BarryB [aka The Storage Anarchist] raises some interesting points and questions in the comments section about the new IBM XIV Nextra architecture. I answer these below not just for the benefit of my friends at EMC, but also for my own colleagues within IBM, IBM Business Partners, Analysts and clients that might have similar questions.
If RAID 5/6 makes sense on every other platform, why not so on the Web 2.0 platform?
Your attempt to justify the expense of Mirrored vs. RAID 5 makes no sense to me. Buying two drives for every one drive's worth of usable capacity is expensive, even with SATA drives. Isn't that why you offer RAID 5 and RAID 6 on the storage arrays that you sell with SATA drives?
And if RAID 5/6 makes sense on every other platform, why not so on the (extremely cost-sensitive) Web 2.0 platform? Is faster rebuild really worth the cost of 40+% more spindles? Or is the overhead of RAID 6 really too much for those low-cost commodity servers to handle.
Let's take a look at various disk configurations, for example 3TB on 750GB SATA drives:
JBOD: 4 drives
JBOD here is industry slang for "Just a Bunch of Disks" and was invented as the term for "non-RAID". Each drive would be accessible independently, at native single-drive speed, with no data protection. Putting four drives in a single cabinet like this provides simplicity and convenience only over four separate drives in their own enclosures.
RAID-10: 8 drives
RAID-10 is a combination of RAID-1 (mirroring) and RAID-0 (striping). In a 4x2 configuration, data is striped across disks 1-4, then these are mirrored across to disks 5-8. You get performance improvement and protection against a single drive failure.
RAID-5: 5 drives
This would be a 4+P configuration, where there would be four drives' worth of data scattered across five drives. This gives you almost the same performance improvement as RAID-10, similar protection against single drive failure, but with fewer drives per usable TB capacity.
RAID-6: 6 drives
This would be a 4+2P configuration, where the first P represents linear parity, and the second represents a diagonal parity. Similar in performance improvement to RAID-5, but protects against single and double drive failures, and still better than RAID-10 in terms of drives per TB usable capacity.
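The drive counts in the list above follow from a simple rule per layout. Here is a minimal Python sketch, assuming a single rank and no spare drives. The function name and layout labels are illustrative only.

```python
def drives_needed(usable_gb, drive_gb, layout):
    """Drives required for one rank of the given layout (spares excluded)."""
    data_drives = -(-usable_gb // drive_gb)  # ceiling division
    extra_drives = {
        "JBOD": 0,               # no protection
        "RAID-10": data_drives,  # a mirror for every data drive
        "RAID-5": 1,             # one drive's worth of parity
        "RAID-6": 2,             # two drives' worth of parity
    }
    return data_drives + extra_drives[layout]


usable_gb, drive_gb = 3000, 750  # 3TB on 750GB SATA drives

for layout in ("JBOD", "RAID-10", "RAID-5", "RAID-6"):
    total = drives_needed(usable_gb, drive_gb, layout)
    efficiency = usable_gb / (total * drive_gb)
    print(f"{layout}: {total} drives, {efficiency:.0%} of raw capacity usable")
```

Larger configurations would normally be built from several ranks, each with its own parity, so this single-rank sketch should not be scaled up as-is.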
For all the RAID configurations, rebuild would require a spare drive, but often spares are shared among multiple RAID ranks, not dedicated to a single rank. To this end, you often have to have several spares per I/O loop, and a different set of spares for each kind of speed and capacity. If you had a mix of 15K/73GB, 10K/146GB, and 7200/500GB drives, then you would have three sets of spares to match.
In contrast, IBM XIV's innovative RAID-X approach doesn't require any spare drives, just spare capacity on existing drives being used to hold data. The objects can be mirrored between any two types of drives, so no need to match one with another.
All of these RAID levels represent some trade-off between cost, protection and performance, and IBM offers each of these on various disk systems platforms. Calculating parity is more complicated than just mirrored copies, but this can be done with specialized chips in cache memory to minimize performance impact. IBM generally recommends RAID-5 for high-performance FC disk, and RAID-6 for slower, large capacity SATA disk.
However, the question assumes that the drive cost is a large portion of the overall "disk system" cost. It isn't. For example, Jon Toigo discusses the cost of EMC's new AX4 disk system in his post [National Storage Rip-Off Day]:
EMC is releasing its low end Clariion AX4 SAS/SATA array with 3TB capacity for $8600. It ships with four 750GB SATA drives (which you and I could buy at list for $239 per unit). So, if the disk drives cost $956 (presumably far less for EMC), that means buyers of the EMC wares are paying about $7700 for a tin case, a controller/backplane, and a 4Gbps iSCSI or FC connector. Hmm.
Dell is offering EMC’s AX4-5 with same configuration for $13,000 adding a 24/7 warranty.
(Note: I checked these numbers. $8599 is the list price that EMC has on its own website. External 750GB drives available at my local Circuit City ranged from $189 to $329 list price. I could not find anything on Dell's own website, but found [The Register] to confirm the $13,000 with 24x7 warranty figure.)
Disk capacity is a shrinking portion of the total cost of ownership (TCO). In addition to capacity, you are paying for cache, microcode and electronics of the system itself, along with software and services that are included in the mix, and your own storage administrators to deal with configuration and management. For more on this, see [XIV storage - Low Total Cost of Ownership].
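Using only the list prices quoted above, the share of the system price attributable to the drives can be worked out directly. This is a rough sketch. Actual component costs to the vendor are not known and are presumably lower.

```python
system_list_price = 8600   # quoted AX4 price, USD
drive_list_price = 239     # quoted per-drive list price, USD
drive_count = 4

drive_cost = drive_list_price * drive_count
everything_else = system_list_price - drive_cost
drive_share = drive_cost / system_list_price

print(f"drives:          ${drive_cost}")
print(f"everything else: ${everything_else}")
print(f"drive share of list price: {drive_share:.1%}")
```

On these figures the drives account for roughly a ninth of the list price, which is why the choice of RAID level moves the total less than one might expect.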
EMC Centera has been doing this exact type of blob striping and protection since 2002
As I've noted before, there's nothing "magic" about it - Centera has been employing the same type of object-level replication for years. Only EMC's engineers have figured out how to do RAID protection instead of mirroring to keep the hardware costs low while not sacrificing availability.
I agree that IBM XIV was not the first to do an object-level architecture, but it was one of the first to apply object-level technologies to the particular "use case" and "intended workload" of Web 2.0 applications.
RAID-5 based EMC Centera was designed instead to hold fixed-content data that needed to be protected for a specific period of time, such as to meet government regulatory compliance requirements. This is data that you most likely will never look at again unless you are hit with a lawsuit or investigation. For this reason, it is important to get it on the cheapest storage configuration possible. Before EMC Centera, customers stored this data on WORM tape and optical media, so EMC came up with a disk-only alternative offering. IBM System Storage DR550 offers disk-level access for the most recent archives, with the ability to migrate to much less expensive tape for the long term retention. The end result is that storing on a blended disk-plus-tape solution can help reduce the cost by a factor of 5x to 7x, making RAID level discussion meaningless in this environment. For more on this, see my post [Optimizing Data Retention and Archiving].
While both the Centera and DR550 are based on SATA, neither is designed for Web 2.0 platforms. When EMC comes out with their own "me, too" version, they will probably make a similar argument.
IBM XIV Nextra is not a DS8000 replacement
Nextra is anything but Enterprise-class storage, much less a DS8000 replacement. How silly of all those folks to suggest such a thing.
I did searches on the Web and could not find anybody, other than EMC employees, who suggested that IBM XIV Nextra architecture represented a replacement for IBM System Storage DS8000. The IBM XIV press release does not mentionor imply this, and certainly nobody I know at IBM has suggested this.
The DS8000 is designed for a different "use case" andset of "intended workloads" than what the IBM XIV was designed for. The DS8000 is the most popular disk systemfor our IBM System z mainframe platform, for activities like Online Transaction Processing (OLTP) and large databases, supporting ESCON and FICON attachment to high-speed 15K RPM FC drives. Web 2.0 customers that might chooseIBM XIV Nextra for their digital content might run their financial operations or metadata search indexes on DS8000.Different storage for different purposes.
As for the opinion that this is not "enterprise class", there are a variety of definitions for this phrase. Some analysts look at the "price band" of units that cost over $300,000 US dollars. Other analysts define this as being attachable to mainframe servers via ESCON or FICON. Others use the term to refer to five-nines reliability, meaning roughly five minutes of downtime per year. In this regard, based on the past two years' experience at 40 customer locations, I would argue that the Nextra meets this last definition, with non-disruptive upgrades, microcode updates and hot-swappable components.
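To put the five-nines definition in numbers, here is a minimal sketch in Python (my own illustration, not an IBM tool) that converts a count of nines into a yearly downtime budget. It assumes a 365-day year, and the function name is made up for this example. Strictly speaking, five nines works out to a little over five minutes per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (5 -> 99.999%)."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```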
By comparison, when EMC introduced its object-level Centera architecture, nobody suggested it was the replacement for their Symmetrix or CLARiiON devices. Was it supposed to be?
Given drive growth rates have slowed, improving utilization is mandatory to keep up with 60-70 percent CAGR
Look around you, Tony- all of your competitors are implementing thin provisioning specifically to drive physical utilization upwards towards 60-80%, and that's on top of RAID 5/RAID 6 storage and not RAID 1. Given that disk drive growth rates and $/GB cost savings have slowed significantly, improving utilization is mandatory just to keep up with the 60-70% CAGR of information growth.
Disk drive capacity growth has slowed for FC disk because much of the attention and investment has been re-directed to ATA technology. Dollar-per-GB price reduction is slowing for disks in general, as researchers are hitting physical limitations to the amount of bits they can pack per square inch of disk media, and is now around 25 percent per year. The 60-70 percent Compound Annual Growth Rate (CAGR) is real, and may be growing even faster for Web 2.0 providers. While hardware costs drop, the big ticket items to watch will be software, services and storage administrator labor costs.
To this end, IBM XIV Nextra offers thin provisioning and differential space-efficient snapshots. It is designed for 60-90 percent utilization, and can be expanded to larger capacities non-disruptively in a very scalable manner.
(Note: The following paragraphs have been updated to clarify the performance tests involved.)
This time, IBM breaks the 1 million IOPS barrier, achieved by running a test workload consisting of a 70/30 mix of random 4K requests. That is 70 percent reads, 30 percent writes, with 4KB blocks. The throughput achieved was 3.5 times that obtained by running the identical workload on the fastest IBM storage system today (IBM System Storage SAN Volume Controller 4.3), and an estimated EIGHT* times the performance of EMC DMX. With an average response time under 1 millisecond, this solution would be ideal for online transaction processing (OLTP) such as financial recordings or airline reservations.
(*)Note: EMC has not yet published ANY benchmarks of their EMC DMX box with SSD enterprise flash drives (EFD). However, I believe that the performance bottleneck is in their controller and not the back-end SSD or FC HDD media, so I have given EMC the benefit of the doubt and estimated that their latest EMC DMX4 is as fast as an [IBM DS8300 Turbo] with Fibre Channel drives. If or when EMC publishes benchmarks, the marketplace can make more accurate comparisons. Your mileage may vary.
IBM used 4 TB of Solid State Disk (SSD) behind its IBM SAN Volume Controller (SVC) technology to achieve this amazing result. Not only does this represent a significantly smaller footprint, but it uses only 55 percent of the power and cooling.
The SSD drives are made by [Fusion IO] and are different from the STEC-made drives used by EMC.
The SVC addresses the one key problem clients face today with competitive disk systems that support SSD enterprise flash drives: choosing what data to park on those expensive drives? How do you decide which LUNs, which databases, or which files should be permanently resident on SSD? With SVC's industry-leading storage virtualization capability, you are not forced to decide. You can move data into SSD and back out again non-disruptively, as needed to meet performance requirements. This could be handy for quarter-end or year-end processing, for example.
Continuing my week in Chicago, for the IBM Storage Symposium 2008, we had sessions that focused on individual products. IBM System Storage SAN Volume Controller (SVC) was a popular topic.
SVC - Everything you wanted to know, but were afraid to ask!
Bill Wiegand, IBM ATS, who has been working with SAN Volume Controller since it was first introduced in 2003, answered some frequently asked questions about IBM System Storage SAN Volume Controller.
Do you have to upgrade all of your HBAs, switches and disk arrays to the recommended firmware levels before upgrading SVC? No. These are recommended levels, but not required. If you do plan to update firmware levels, focus on the host end first, switches next, and disk arrays last.
How do we request special support for stuff not yet listed on the Interop Matrix?
Submit an RPQ/SCORE, same as for any other IBM hardware.
How do we sign up for SVC hints and tips? Go to the IBM
[SVC Support Site] and select the "My Notifications" under the "Stay Informed" box on the right panel.
When we call IBM for SVC support, do we select "Hardware" or "Software"?
While the SVC is a piece of hardware, there are very few mechanical parts involved. Unless there are sparks,
smoke, or front bezel buttons dangling from springs, select "Software". Most of the questions are
related to the software components of SVC.
When we have SVC virtualizing non-IBM disk arrays, who should we call first?
IBM has world-renowned service, with some of IT's smartest people working the queues. All of the major storage vendors play nice
as part of the [TSAnet Agreement] when a mutual customer is impacted.
When in doubt, call IBM first, and if necessary, IBM will contact other vendors on your behalf to resolve.
What is the difference between livedump and a Full System Dump?
Most problems can be resolved with a livedump. While not complete information, it is generally enough,
and is completely non-disruptive. Other times, the full state of the machine is required, so a Full System Dump
is requested. This involves rebooting one of the two nodes, so virtual disks may temporarily run slower on that
What does "svc_snap -c" do? The "svc_snap" command on the CLI generates a snap file, which includes the cluster error log and trace files from all nodes. The "-c" parameter includes the configuration and virtual-to-physical mapping that can be useful for
disaster recovery and problem determination.
I just sent IBM a check to upgrade my TB-based license on my SVC, how long should I wait for IBM to send me a software license key?
IBM trusts its clients. No software license key will be sent. Once the check clears, you are good to go.
During migration from old disk arrays to new disk arrays, I will temporarily have 79TB more disk under SVC management, do I need to get a temporary TB-based license upgrade during the brief migration period?
Nope. Again, we trust you. However, if you are concerned about this at all, contact IBM and they will print out
a nice "Conformance Letter" in case you need to show your boss.
How should I maintain my Windows-based SVC Master Console or SSPC server?
Treat this like any other Windows-based server in your shop, install Microsoft-recommended Windows updates,
run Anti-virus scans, and so on.
Where can I find useful "How To" information on SVC?
Specify "SAN Volume Controller" in the search field of the
[IBM Redbooks] vast library of helpful books.
I just added more managed disks to my managed disk group (MDG), can I get help writing a script to redistribute the extents to improve wide-striping performance?
Yes, IBM has scripting tools available for download on
[AlphaWorks]. For example, svctools will take
the output of the "lsinfo" command, and generate the appropriate SVC CLI to re-migrate the disks around to optimize
performance. Of course, if you prefer, you can use IBM Tivoli Storage Productivity Center instead for a more
Any rules of thumb for sizing SVC deployments?
IBM's Disk Magic tool includes support for SVC deployments. Plan for 250 IOPS/TB for light workloads,
500 IOPS/TB for average workloads, and 750 IOPS/TB for heavy workloads.
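As a rough illustration of these rules of thumb, here is a minimal Python sketch. It is not Disk Magic and does not model SVC in any way; the function name and the 100TB capacity are assumptions made up for the example, and only the three IOPS/TB figures come from the answer above.

```python
# Rule-of-thumb planning densities quoted above, in IOPS per TB.
IOPS_PER_TB = {"light": 250, "average": 500, "heavy": 750}


def estimate_iops(capacity_tb: float, workload: str) -> float:
    """Back-of-the-envelope IOPS to plan for, given managed capacity in TB."""
    return capacity_tb * IOPS_PER_TB[workload]


if __name__ == "__main__":
    capacity_tb = 100  # hypothetical deployment size
    for workload in IOPS_PER_TB:
        print(f"{workload}: plan for {estimate_iops(capacity_tb, workload):,.0f} IOPS")
```

Treat the result as a starting point for a proper sizing exercise, not a replacement for one.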
Can I migrate virtual disks from one managed disk group (MDG) to another of different extent size?
Yes, the new Vdisk Mirroring capability can be used to do this. Create the mirror for your Vdisk between the
two MDGs, wait for the copy to complete, and then split the mirror.
Can I add or replace SVC nodes non-disruptively? Absolutely, see the Technotes
[SVC Node Replacement] page.
Can I really order an SVC EE in Flamingo Pink? Yes. While my blog post that started all
this [Pink It and Shrink It] was initially just some Photoshop humor, the IBM product manager for SVC accepted this color choice as an RPQ option.
The default color remains Raven Black.
Miles per Gallon measures an efficiency ratio (amount of work done with a fixed amount of energy), not a speed ratio (distance traveled in a unit of time).
Given that IOPs and MB/s are the unit of "work" a storage array does, wouldn't the MPG equivalent for storage be more like IOPs per Watt or MB/s per Watt? Or maybe just simply Megabytes Stored per Watt (a typical "green" measurement)?
You appear to be intentionally avoiding the comparison of I/Os per Second and Megabytes per Second to Miles Per Hour?
May I ask why?
This is a fair question, Barry, so I will try to address it here.
It was not a typo, I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to explain concepts in Information Technology that might be harder to grasp. I chose MPG because it was closely related to IOPS and MB/s in four ways:
MPG applies to all instances of a particular make and model. Before Henry Ford and the assembly line, cars were made one at a time, by a small team of craftsmen, and so there could be variety from one instance to another. Today, vehicles and storage systems are mass-produced in a manner that provides consistent quality. You can test one vehicle, and safely assume that all similar instances of the same make and model will have similar mileage. The same is true for disk systems: test one disk system and you can assume that all others of the same make and model will have similar performance.
MPG has a standardized measurement benchmark that is publicly available. The US Environmental Protection Agency (EPA) is an easy analogy for the Storage Performance Council, providing the results of various offerings to choose from.
MPG has usage-specific benchmarks to reflect real-world conditions. The EPA offers City MPG for the type of driving you do to get to work, and Highway MPG, to reflect the type of driving on a cross-country trip. These serve as a direct analogy to SPC having SPC-1 for Online transaction processing (OLTP) and SPC-2 for large file transfers, database queries and video streaming.
MPG can be used for cost/benefit analysis. For example, one could estimate the amount of business value (miles travelled) for the amount of dollar investment (cost to purchase gallons of gasoline, at an assumed gas price). The EPA does this as part of their analysis. This is similar to the way IOPS and MB/s can be divided by the cost of the storage system being tested on SPC benchmark results. The business value of IOPS or MB/s depends on the application, but could relate to the number of transactions processed per hour, the number of music downloads per hour, or number of customer queries handled per hour, all of which can be assigned a specific dollar amount for analysis.
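The cost/benefit point in the last item can be sketched in a few lines of Python. The two systems, their prices and their IOPS results below are entirely hypothetical numbers invented for illustration, not published SPC results, and the function name is my own.

```python
def dollars_per_iops(system_price_usd: float, benchmark_iops: float) -> float:
    """Price-performance: dollars of tested configuration per benchmark IOPS."""
    return system_price_usd / benchmark_iops


if __name__ == "__main__":
    # Hypothetical systems: (price in US dollars, benchmark IOPS).
    systems = {
        "System A": (500_000, 100_000),
        "System B": (300_000, 50_000),
    }
    for name, (price, iops) in systems.items():
        print(f"{name}: ${dollars_per_iops(price, iops):.2f} per IOPS")
```

In this made-up case the more expensive box is the better value per IOPS, which is the same kind of comparison one makes between gas price and miles per gallon.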
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, but decided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
Increasing MPH, and driving anywhere near the maximum rated MPH for a vehicle, can be reckless and dangerous, risking loss of human life and property damage. Even professional race car drivers will agree there are dangers involved. By contrast, processing I/O requests at maximum speed poses no additional risk to the data, nor possible damage to any of the IT equipment involved.
While most vehicles have top speeds in excess of 100 miles per hour, most Federal, State and Local speed limits prevent anyone from taking advantage of those maximums. Race-car drivers in NASCAR may be able to take advantage of the maximum MPH of a vehicle, but the rest of us can't. The government limits the speed of vehicles precisely because of the dangers mentioned in the previous bullet. In contrast, processing I/O requests at faster speeds poses no such dangers, so the government imposes no limits.
Neither IOPS nor MB/s match MPH exactly. Earlier this week, I related IOPS to "Questions handled per hour" at the local public library, and MB/s to "Spoken words per minute" in those replies. If I tried to find a metric based on unit type to match the "per second" in IOPS and MB/s, then I would need to find a unit that equated to "I/O requests" or "MB transferred" rather than something related to "distance travelled".
In terms of time-based units, the closest I could come up with for IOPS was acceleration rate of zero-to-sixty MPH in a certain number of seconds. Speeding up to 60MPH, then slamming the brakes, and then back up to 60MPH, start-stop, start-stop, and so on, would reflect what IOPS is doing on a request by request basis, but nobody drives like this (except maybe the taxi cab drivers here in Malaysia!)
Since vehicles are limited to speed limits in normal road conditions, the closest I could come up with for MB/s would be "passenger-miles per hour", such that high-occupancy vehicles like school buses could deliver more passengers than low-occupancy vehicles with only a few passengers.
Neither start-stops nor passenger-miles per hour have standardized benchmarks, so they don't work well for comparison between vehicles. If you or anyone can come up with a metric that will help explain the relevance of standardized benchmarks better than the MPG that I already used, I would be interested in it.
You also mention, Barry, the term "efficiency" but mileage is about "fuel economy". Wikipedia is quick to point out that while the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into fuel economy of cars. The same can be said about storage: faster internal bandwidth on the backplane between controllers and faster HDD do not necessarily translate into external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s.
Oh, and yes, it's true - the DMX-4 is not the first high-end storage array to ship a 4Gb/s FC back-end. The USP-V, announced way back in May, has that honor (but only if it meets the promised first shipments in July 2007). DMX-4 will be in August '07, so I guess that leaves the DS8000 a distant 3rd.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such high SPC-1 benchmarks despite the fact that it still uses 2Gbps drives inside. Given that it doesn't matter between 2Gbps and 4Gbps on the back-end, why would it matter which vendor came first, second or third, and why call it a "distant 3rd" for IBM? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with your excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole "green initiative" and I used "Watts per GB" last month to explain about how tape is less energy-consumptive than paper. I see on your blog you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear if "requires less" means "slightly less" or "substantially less" in this context, and I have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufactures today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
Well, it's 2008, which could mark the end to RAID5 and mark the beginnings of a new disk storage architecture. IBM starts the year with exciting news, acquiring new disk technology from a small start-up called XIV, led by former-EMCer Moshe Yanai. Moshe was ousted publicly in 2001 from his position as EMC's VP of engineering, and formed his own company. It didn't take long for EMC bloggers to poke fun at this already. Mark Twomey, in his StorageZilla blog, had mentioned XIV before back in August, [XIV], and again today in [IBM Buys XIV].
To address the new requirements associated with next generation digital content, IBM chose XIV and its NEXTRA™ architecture for its ability to scale dynamically, heal itself in the event of failure, and self-tune for optimum performance, all while eliminating the significant management burden typically associated with rapid growth environments. The architecture also is designed to automatically optimize resource utilization of all the components within the system, which can allow for easier management and configuration and improved performance and data availability.
"We are pleased to become a significant part of the IBM family, allowing for our unique storage architecture, our engineers and our storage industry experience to be part of IBM's overall storage business," said Moshe Yanai, chairman, XIV. "We believe the level of technological innovation achieved by our development team is unparalleled in the storage industry. Combining our storage architectural advancements with IBM's world-wide research, sales, service, manufacturing, and distribution capabilities will provide us with the ability to have these technologies tackle the emerging Web 2.0 technology needs and reach every corner of the world."
The NEXTRA architecture has been in production for more than two years, with more than four petabytes of capacity being used by customers today.
Current disk arrays were designed for online transaction processing (OLTP) databases. The focus was on using the fastest, most expensive 10K and 15K RPM Fibre Channel drives, with clever caching algorithms for quick small updates of large relational databases. However, the world is changing, and people now are looking for storage designed for digital media, archives, and other Web 2.0 applications.
One problem that NEXTRA architecture addresses is RAID rebuild. In a standard RAID5 6+P+S configuration of 146GB 10K RPM drives, the loss of one disk drive module (DDM) was recovered by reconstructing the data from parity of the other drives onto the spare drive. The process took 46 minutes or longer, depending on how busy the system was doing other things. During this time, if a second drive in the same rank fails, all 876GB of data are lost. Double-drive failures are rare, but unpleasant when they happen, and hopefully you have a backup on tape to recover the data from. Moving to slower, less expensive SATA drives made this situation worse. The drives have higher capacity, but run at slower speeds. When a SATA drive fails in a RAID5 array, it could take several hours to rebuild, and that is more time exposure for a second drive failure. A rebuild for a 750GB SATA drive would take five hours or more, with 4.5 TB of data at risk during the process if a second drive failure occurs.
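The exposure figures above follow from simple arithmetic, sketched here in Python. The function names are my own, and the implied rebuild rates are merely derived from the quoted times (using 1GB = 1000MB); they are not measured values.

```python
def data_at_risk_gb(data_drives: int, drive_gb: float) -> float:
    """Usable data in a RAID5 rank, all of it lost on a second drive failure."""
    return data_drives * drive_gb


def implied_rebuild_mb_per_s(drive_gb: float, rebuild_minutes: float) -> float:
    """Average rate needed to rebuild one drive in the given time (1GB = 1000MB)."""
    return drive_gb * 1000 / (rebuild_minutes * 60)


if __name__ == "__main__":
    # 6+P+S rank: six data drives, one parity, one spare.
    print(data_at_risk_gb(6, 146), "GB at risk with 146GB FC drives")
    print(data_at_risk_gb(6, 750), "GB at risk with 750GB SATA drives")
    print(f"{implied_rebuild_mb_per_s(146, 46):.1f} MB/s implied for the FC rebuild")
    print(f"{implied_rebuild_mb_per_s(750, 300):.1f} MB/s implied for the SATA rebuild")
```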
The Nextra architecture doesn't use traditional RAID ranks or spare DDMs. Instead, data is carved up into 1MB objects, and each object is stored on two physically-separate drives. In the event of a DDM loss, all the data is readable from the second copies that are spread across hundreds of drives. New copies are made on the empty disk space of the remaining system. This process can be done for a lost 750GB drive in under 20 minutes. A double-drive failure would only lose those few objects that were on both drives, so perhaps 1 to 2 percent of the total data stored on that logical volume.
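To see why a double-drive failure touches only a small share of the objects, here is a minimal sketch in Python of an idealized model. It assumes that each object's two copies land on a uniformly random pair of distinct drives, which is my simplification and not a description of the actual Nextra placement algorithm. The drive counts are hypothetical, and the model is not meant to reproduce the 1 to 2 percent estimate above, which depends on the real layout of a volume.

```python
from math import comb


def fraction_of_all_objects_lost(drives: int) -> float:
    """Share of all objects whose two copies sit on one specific pair of drives."""
    return 1 / comb(drives, 2)


def fraction_of_failed_drive_lost(drives: int) -> float:
    """Share of one failed drive's objects whose partner copy is on the other failed drive."""
    return 1 / (drives - 1)


if __name__ == "__main__":
    for n in (50, 100, 200):  # hypothetical system sizes
        print(
            f"{n} drives: {fraction_of_failed_drive_lost(n):.2%} of a failed drive's "
            f"objects, {fraction_of_all_objects_lost(n):.4%} of all objects"
        )
```

Under this model the loss shrinks as drives are added, whereas a RAID5 rank loses everything in the rank no matter how large the system is.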
Losing 1 to 2 percent of data might be devastating to a large relational database, as this could impact the entire access to the internal structure. However, this box was designed for unstructured content, like medical images, music, videos, Web pages, and other discrete files. In the event of a double-drive failure, individual files would be recovered, such as with IBM Tivoli Storage Manager backup software.
IBM will continue to offer high-speed disk arrays like the IBM System Storage DS8000 and DS4800 for OLTP applications, and offer NEXTRA for this new surge in digital content of unstructured data. Recognizing this trend, disk drive module manufacturers will phase out 10K RPM drives, and focus on 15K RPM for OLTP, and low-speed SATA for everything else.
Update: This blog post was focused on the version of XIV box available as of January 2008 that was built by XIV prior to the IBM acquisition. IBM has since made a major revision, made available August 2008, that addresses a variety of workloads, including database, OLTP, email, as well as digital content and unstructured files. Contact your IBM representative or IBM Business Partner for the latest details!
Bottom line, IBM continues to celebrate the new year, while the EMC folks in Hopkinton, MA will continue to nurse their hangovers. Now that's a good way to start the new year!
As a consultant, I am often asked to help design the architecture for the information infrastructure. A useful analogy to gather requirements and preferences is the difference between area rugs and wall-to-wall carpeting. Area rugs are not secured to the floor and cover only a portion of the floor area. Carpets are generally tacked or cemented to the floor, often with an underlay of cushion padding, stretched across the entire floor surface, out to all four walls of each room.
Each has its pros and cons, and often is a matter of preference. Some people like area rugs because they can choose a different style for each room, match the decor and color scheme of furniture, and use these to define each living space. Ever since paleolithic man put animal skins on the floor of their cave, people have recognized that cold, hard and ugly floors could be covered up with something soft and more attractive. Others prefer wall-to-wall carpeting because they want to walk around the house barefoot, have their young children crawl on their hands and knees, and give the entire house a unified look and feel. This is often an inexpensive option when compared against the cost of individual rugs.
The same is true for an information infrastructure. Some prefer the "area rug" approach: this style of storage for their email, this other type of storage for their databases, and perhaps a third for their unstructured file systems. When customers ask what storage I would recommend for their SAP application, or their Microsoft Exchange email environment, or their Business Intelligence (BI) software, I recognize they are taking this "area rug" approach.
Like area rugs, having different storage can focus on specific attributes of the workload characteristics. It also insulates against company-wide changes, the dreaded "rip-and-replace" of replacing all of your storage with something from a different vendor. With "area rug" storage, you can support a dual-vendor or multi-vendor strategy, and upgrade or replace each on its own schedule.
Thanks to open standards and industry-standard benchmarks, changing out one storage solution for another is as simple as rolling up an area rug, and putting another one in its place that is similar in size dimensions.
Others may prefer the "wall-to-wall carpeting" approach: one disk system type, one tape library type, one network type, that provides unified management and minimizes the need for unique skills. Generally, the choice of NAS, SAN or iSCSI infrastructure is made company-wide, and might strongly influence the set of products that will support that decision. For example, those with a mix of mainframe and distributed servers looking for SAN-attached storage may look at an [IBM System Storage DS8000] and [TS3500 tape library] that can provide support for FICON and FCP.
Those looking at NAS or iSCSI might consider the IBM System Storage N series products, "unified storage" supporting iSCSI, FCP and NAS protocols. If you want the "wall-to-wall" to stretch across all the sites in your globally integrated enterprise, IBM's scalable NAS product, Scale-Out File Services [SoFS], provides a global name space in combination with a clustered file system that provides incredible scalability and performance based on field-proven technology used by the majority of the [Top 100 supercomputer] deployments.
IBM can help you design an information infrastructure that fits either approach.
Well, it's Tuesday again, and that means more IBM announcements!
Today, IBM announced the enhanced IBM System Storage DS3200 disk system. It is part of our DS3000 series: the DS3200 is SAS-attach, the DS3300 is iSCSI-attach, and the DS3400 is FC-attach. All of them support up to 48 drives, which can be a mix of SAS and SATA drives.
The DS3200 supports the following operating environments (see IBM's [Interop Matrix] for details):
Linux (both Linux-x86 and Linux on POWER)
With today's announcements, the DS3200 can be used to boot from, as well as contain data. This is ideal to combine with IBM BladeCenter. With the IBM BladeCenter you can have 14 blades, either x86 or POWER based processors, attached to a DS3200 via SAS switch modules in the back of the chassis.
Let's take an example of how this can be used for a Scale-Out File Services [SoFS] deployment.
First, we start with servers. We could use three [IBM System x3650] servers, but this would use up all six of the direct-attach ports. Instead, we'll choose the [BladeCenter H chassis], with three HS21 blades for SoFS, which leaves us with eleven empty blade slots for a management node or other blades to run applications.
SAS connectivity modules
The IBM BladeCenter [SAS Connectivity Module] allows the blade servers to connect to a DS3200. Two of them fit right in the back of the BladeCenter chassis, providing full redundancy without consuming additional rack space.
DS3200 and EXP3000 expansion drawers
We'll have one DS3200 controller with twelve internal drives, and three expansion EXP3000 drawers with twelve drives each, for a total of 48 drives. Using 1TB SATA, this would be 48 TB raw capacity.
The end result? You get a 48TB NAS scalable storage solution, supporting up to 7500 concurrent CIFS and NFS users, with up to 700 MB/sec with large block transfers. By using BladeCenter, you can expand performance by adding more blades to the chassis, or let some blades running SAP or Oracle RAC have direct read/write access to the SoFS data.
Just another example of how IBM can bring together all the components of a solution to provide customer value!
Fellow Blogger BarryB mentions "chunk size" in his post [Blinded by the light], as it relates to Symmetrix Virtual Provisioning capability. Here is an excerpt:
I mean, seriously, who else but someone who's already implemented thin provisioning would really understand the implications of "chunk" size enough to care?
For those of you who don't know what the heck "chunk size" means (now listen up you folks over at IBM who have yet to implement thin provisioning on your own storage products), a "chunk" is the term used (and I think even trademarked by 3PAR) to refer to the unit of actual storage capacity that is assigned to a thin device when it receives a write to a previously unallocated region of the device.
For reference, Hitachi USP-V uses I think a 42MB chunk, XIV NEXTRA is definitely 1MB, and 3PAR uses 16K or 256K (depending upon how you look at it).
Thin Provisioning currently offered in IBM System Storage N series was technically "implemented" by NetApp, and the Thin Provisioning that will be offered in our IBM XIV Nextra systems will have been acquired from XIV. Let me remind you that many of EMC's products were developed by other companies first, then later acquired by EMC, so no need for you to throw rocks from your glass houses in Hopkinton.
"Thin provisioning" was first introduced by StorageTek in the 1990's and sold by IBM under the name of RAMAC Virtual Array (RVA). An alternative approach is "Dynamic Volume Expansion" (DVE). Rather than giving the host application a huge 2TB LUN while actually using only 50GB for data, DVE was based on the idea that you give out only the 50GB they need now, but can expand in place as more space is required. This was specifically designed to avoid the biggest problem with "Thin Provisioning" which back then was called "Net Capacity Load" on the IBM RVA, but today is now referred to as "over-subscription". It gave Storage Administrators greater control over their environment with no surprises.
In the same manner as Thin Provisioning, DVE requires a "chunk size" to work with. Let's take a look:
On the DS4000 series, we use the term "segment size", and indicate that the choice of a segment size can have some influence on performance in both IOPS and throughput. Smaller segment sizes increase the request rate (IOPS) by allowing multiple disk drives to respond to multiple requests. Large segment sizes increase the data transfer rate (Mbps) by allowing multiple disk drives to participate in one I/O request. The segment size does not actually change what is stored in cache, just what is stored on the disk itself. It turns out in practice there is no advantage in using smaller sizes with RAID 1; only in a few instances does this help with RAID-5 if you can write a full stripe at once to calculate parity on outgoing data. For most business workloads, 64KB or 128KB are recommended. DVE expands by the same number of segments across all disks in the RAID rank, so for example in a 12+P rank using 128KB segment sizes, the chunk size would be thirteen segments, about 1.6MB in size.
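The 12+P example works out as follows. This is just the arithmetic from the paragraph above written as a short Python sketch; the function name is my own and this is not a DS4000 interface.

```python
def dve_chunk_kb(data_drives: int, parity_drives: int, segment_kb: int) -> int:
    """DVE growth unit: one segment on every drive in the RAID rank."""
    return (data_drives + parity_drives) * segment_kb


if __name__ == "__main__":
    chunk_kb = dve_chunk_kb(data_drives=12, parity_drives=1, segment_kb=128)
    print(f"{chunk_kb} KB, or {chunk_kb / 1024:.3f} MB")  # thirteen 128KB segments
```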
SAN Volume Controller
On the SAN Volume Controller, we call this "extent size" and allow it to be various values from 16MB to 512MB. Initially, IBM only managed four million extents, so this table was used to explain the maximum amount that could be managed by an SVC system (up to 8 nodes) depending on extent size selected.
IBM thought that since we externalized "segment size" on the DS4000, we should do the same for the SAN Volume Controller. As it turned out, SVC is so fast up in the cache, that we could not measure any noticeable performance difference based on extent size. We did have a few problems. First, clients who chose 16MB and then grew beyond the 64TB maximum addressable discovered that perhaps they should have chosen something larger. Second, clients called in our help desk to ask what size to choose and how to determine the size that was right for them. Third, we allowed people to choose different extent sizes per managed disk group, but that prevents movement or copies between groups. You can only copy between groups that use the same extent size. The general recommendation now is to specify 256MB size, and use that for all managed disk groups across the data center.
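The 64TB ceiling for 16MB extents falls straight out of the four-million-extent limit. A quick sketch, assuming "four million" means 2^22 extents in binary units and that the intermediate extent sizes are powers of two (both are my assumptions for illustration):

```python
MAX_EXTENTS = 4 * 1024 * 1024  # assumption: "four million" = 2**22 extents


def max_capacity_tb(extent_mb: int, max_extents: int = MAX_EXTENTS) -> float:
    """Maximum addressable capacity in TB for a given extent size in MB."""
    return extent_mb * max_extents / (1024 * 1024)  # MB -> TB


# Assumed power-of-two extent sizes between the 16MB and 512MB endpoints
for extent_mb in (16, 32, 64, 128, 256, 512):
    print(f"{extent_mb:>4} MB extents -> {max_capacity_tb(extent_mb):>5.0f} TB")
```

Under these assumptions, the recommended 256MB extent size works out to 1024TB, or 1PB, of addressable capacity.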
The latest SVC expanded maximum addressability to 8PB, still more than most people have today in their shops.
Getting smarter each time we introduce new function, we chose 1GB chunks for the DS8000. Based on a mainframe background, most CKD volumes are 3GB, 9GB, or 27GB in size, and so 1GB chunks simplified this approach. Spreading these 1GB chunks across multiple RAID ranks greatly reduced hot-spots that afflict other RAID-based systems. (Rather than fix the problem by re-designing the architecture, EMC will offer to sell you software to help you manually move data around inside the Symmetrix after the hot-spot is identified.)
Unlike EMC's virtual provisioning, IBM DS8000 dynamic volume expansion does work on CKD volumes for our System z mainframe customers.
The trade-off in each case was between granularity and table space. Smaller chunks allow finer control on the exact amount allocated for a LUN or volume, but larger chunks reduce the number of chunks managed. With our advanced caching algorithms, changes in chunk size did not noticeably impact performance. It is best just to come up with a convenient size, and either configure it as fixed in the architecture, or externalize it as a parameter with a good default value.
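Here is a small sketch of that trade-off. The volume size is a hypothetical number I picked for illustration, and the helper is not taken from any product.

```python
import math


def allocation(volume_mb: int, chunk_mb: int) -> tuple:
    """Return (chunks to track, MB of slack lost to rounding up)."""
    chunks = math.ceil(volume_mb / chunk_mb)
    return chunks, chunks * chunk_mb - volume_mb


volume_mb = 50 * 1024 + 100  # hypothetical volume: 50GB plus 100MB
for chunk_mb in (16, 256, 1024):
    chunks, slack_mb = allocation(volume_mb, chunk_mb)
    print(f"{chunk_mb:>4} MB chunks: {chunks:>4} table entries, {slack_mb:>3} MB slack")
```

Small chunks waste almost nothing but need thousands of table entries for one volume; 1GB chunks need only a few dozen entries at the cost of up to a chunk of slack.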
Meanwhile, back at EMC, BarryB indicates that they haven't determined the "optimal" chunk size for their new function. They plan to run tests and experiments to determine which size offers the best performance, and then make that a fixed value configured into the DMX-4. I find this funny coming from the same EMC that won't participate in [standardized SPC benchmarks] because they feel that performance is a personal and private matter between a customer and their trusted storage vendor, that all workloads are different, and you get the idea. Here's another excerpt:
Back at the office, they've taking to calling these "chunks" Thin Device Extents (note the linkage back to EMC's mainframe roots), and the big secret about the actual Extent size is...(wait for it...w.a.i.t...for....it...)...the engineers haven't decided yet!
That's right...being the smart bunch they are, they have implemented Symmetrix Virtual Provisioning in a manner that allows the Extent size to be configured so that they can test the impact on performance and utilization of different sizes with different applications, file systems and databases. Of course, they will choose the optimal setting before the product ships, but until then, there will be a lot of modeling, simulation, and real-world testing to ensure the setting is "optimal."
Finally, BarryB wraps up this section poking fun at the chunk sizes chosen by other disk manufacturers. I don't know why HDS chose 42MB for their chunk size, but it has a great [Hitchhiker's Guide to the Galaxy] sound to it, answering the ultimate question to life, the universe and everything. Hitachi probably went to their Deep Thought computer and asked how big should their "chunk size" be for their USP-V, and the computer said: 42. Makes sense to me.
I have to agree that anything smaller than 1MB is probably too small. Here's the last excerpt:
Now, many customers and analysts I've spoken to have in fact noted that Hitachi's "chunk" size is almost ridiculously large; others have suggested that 3PAR's chunks are so small as to create performance problems (I've seen data that supports that theory, by the way).
Well, here's the thing: the "right" chunk size is extremely dependent upon the internal architecture of the implementation, and the intersection of that ideal with the actual write distribution pattern of the host/application/file system/database.
So my suggestion to EMC is, please, please, please take as much time as you need to come up with the perfect "chunk size" for this, one that handles all workloads across a variety of operating systems and applications, from solid-state Flash drives to 1TB SATA disk. Take months or years, as long as it takes. The rest of the world is in no hurry, as thin provisioning or dynamic volume expansion is readily available on most other disk systems today.
Maybe if you ask HDS nicely, they might let you ask their computer.
Continuing my catch-up on past posts, Jon Toigo on his DrunkenData blog, posted a ["bleg"] for information about deduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM's view. (Jon, as always, you have my permission to post this on your blog!)
Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTier software. Larry Freeman from NetApp already explains A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
Data agnostic. Diligent does not require content-awareness, format-awareness nor identification of backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which have the best opportunity for deduplication benefits. The two main benefits are:
Keeping more backup data available online for fast recovery.
Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that their’s is best because it collapses two functions — de-dupe then ingest — into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
Will this meet my current business requirements?
Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine the necessary ingest rate. How many "Terabytes per Hour" (TB/h) must be received, processed and stored from the backup software during the backup window? IBM intends to document its performance test results of specific software/hardware combinations to provide guidance to clients' purchase and planning decisions.
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what already is stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from 2KB to 16MB in size.
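To illustrate the idea of a file as a sequence of pointers to variable-length extents, here is a toy sketch. It is emphatically not the Diligent algorithm: it leans on Python's general-purpose `difflib.SequenceMatcher` to find matching byte ranges, and the `ingest` and `reconstitute` helpers, the `min_match` threshold and the sample data are all hypothetical.

```python
from difflib import SequenceMatcher


def ingest(store: bytearray, incoming: bytes, min_match: int = 4) -> list:
    """Append only unseen byte ranges to store; return (offset, length) pointers."""
    existing = bytes(store)  # snapshot of what is already on the "back-end disk"
    matcher = SequenceMatcher(None, existing, incoming, autojunk=False)
    recipe, pos = [], 0
    for match in matcher.get_matching_blocks():
        if match.size < min_match:
            continue  # too short to be worth a pointer (also skips the end marker)
        if match.b > pos:  # unique bytes before this match: store them
            recipe.append((len(store), match.b - pos))
            store += incoming[pos:match.b]
        recipe.append((match.a, match.size))  # point at data already stored
        pos = match.b + match.size
    if pos < len(incoming):  # trailing unique bytes
        recipe.append((len(store), len(incoming) - pos))
        store += incoming[pos:]
    return recipe


def reconstitute(store: bytearray, recipe: list) -> bytes:
    """Follow the pointers to rebuild the original byte string."""
    return b"".join(bytes(store[off:off + length]) for off, length in recipe)


store = bytearray()
monday = b"monday full backup of payroll data"
tuesday = b"tuesday full backup of payroll data"
r1 = ingest(store, monday)
r2 = ingest(store, tuesday)
print(len(monday) + len(tuesday), "bytes written,", len(store), "bytes stored")
```

Because matches come from comparing the actual bytes rather than from hash codes, rebuilding a file from its pointers returns exactly what was written.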
De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of [non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return back a "bit-perfect" copy of the data that was written. There are laws against changing the format. For example, an original document was in Microsoft Word format, but is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate the bit-perfect copy. Certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code calculation as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance for collisions might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, that does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing block stored.
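The difference between trusting a hash code and verifying byte-for-byte can be shown in a few lines. This is only an illustrative sketch, not how A-SIS or any other product is implemented; the class name and the deliberately weak hash in the demo are my own inventions to force a collision.

```python
import hashlib


class VerifiedDedupStore:
    """Toy block store: a hash finds candidates, a byte compare confirms them."""

    def __init__(self, hash_fn=None):
        self._hash = hash_fn or (lambda block: hashlib.sha256(block).digest())
        self._buckets = {}  # hash code -> list of distinct blocks with that code

    def put(self, block: bytes) -> bool:
        """Store a block; return True only if it is a verified duplicate."""
        bucket = self._buckets.setdefault(self._hash(block), [])
        for existing in bucket:
            if existing == block:  # full byte-for-byte comparison
                return True
        bucket.append(block)  # same hash code, different bytes: keep both
        return False

    def block_count(self) -> int:
        return sum(len(bucket) for bucket in self._buckets.values())


# A deliberately terrible hash (block length) so that collisions are guaranteed
store = VerifiedDedupStore(hash_fn=len)
print(store.put(b"AAAA"))  # False: new data
print(store.put(b"BBBB"))  # False: collides with AAAA, but is kept, not discarded
print(store.put(b"AAAA"))  # True: verified duplicate
```

A store that skipped the comparison would have thrown away `BBBB` as a "duplicate" of `AAAA`.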
Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read the data off the back-end disk, using a variety of forensic tools, and piece together strings of personal information, such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working with open industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems. Until then, clients concerned about encryption should consider OS-based or application-based encryption from the backup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBM Diligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are intact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge from this set was unreadable or misplaced, it might impact the access to many TBs of data, or render the entire system unusable.
In the case of 1PB of back-end disk for IBM Diligent, you would have to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives in parallel, it could take you several days for the complete process. This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
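For a back-of-envelope check on those numbers, here is a sketch. The inputs are my assumptions, not measurements: LTO-4 cartridges at 800GB native capacity, drives streaming at the 120MB/sec native rate, 24 drives running in parallel, decimal units, and no allowance for mounts, seeks or compression.

```python
import math


def restore_estimate(total_bytes: int, cartridge_bytes: int,
                     drives: int, drive_bytes_per_sec: int) -> tuple:
    """Return (cartridges needed, days to stream everything back to disk)."""
    tapes = math.ceil(total_bytes / cartridge_bytes)
    seconds = total_bytes / (drives * drive_bytes_per_sec)
    return tapes, seconds / 86_400


tapes, days = restore_estimate(
    total_bytes=10**15,               # 1PB of back-end disk
    cartridge_bytes=800 * 10**9,      # assumed: LTO-4 native capacity
    drives=24,                        # assumed: "dozens" of drives in parallel
    drive_bytes_per_sec=120 * 10**6,  # assumed: LTO-4 native streaming rate
)
print(f"{tapes} cartridges, about {days:.1f} days of pure streaming time")
```

Real-world restores would be slower still once cartridge mounts and less-than-ideal streaming are factored in.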
Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supported by Analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larry suggests, MS Exchange and SharePoint could be good use cases that represent the possible savings for squeezing out duplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with “update in place” applications, that is high levels of random writes for non-appending updates. OLTP and Database workloads would not benefit from deduplication.
Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced on software first, and then as implementations mature, get hardware-based to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performs LZH compression through software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
In this case, it is not chess pieces, but FUD being slung around like mud between vendors. EMC blogger Chuck Hollis' post [Products vs. Features] correctly points out that IBM has invented nearly everything useful in IT, and sadly a few things we wish we hadn't. Gene Amdahl, who left IBM to start his own company, is credited for coining the phrase describing IBM's innovative sales techniques. Wikipedia has a nice write up on the history of [Fear, Uncertainty and Doubt (FUD)].
Nowadays, when you hear "FUD" most storage administrators immediately think of EMC, who have taken this method to a new level of art-form. Take for example two EMC entries from fellow blogger BarryB, on his Storage Anarchist blog: [Not Dead Yet] and [Pushing Daisies]. The first is a reference to a funny scene from a Monty Python movie, and the second one is referring to a terrible new television program called "Pushing Daisies". (In this show, the main character can bring a dead person back to life for sixty seconds, just long enough to ask a few questions on behalf of his detective friend. He must touch the person again within 60 seconds, or someone else randomly dies instead. I am not a fan of this concept, and found it a bit morbid and creepy. But I digress.)
It is true I was on vacation the past two weeks, but this was group travel I booked over six months ago before we had the exact dates lined up for our various announcements, and not a last-minute celebration of my recent new job assignment. I got all my assignments for this announcement turned in before leaving for my trip. I never thought of checking with fellow IBM blogger BarryW to make sure that we don't have overlapping vacation schedules, leaving the "blogosphere" unmanned, so to speak, but it is not a bad idea. Fortunately, our IBM PR team was able to make their rebuttal through other means. You can read the recap on Techworld [Marketing Wars by Proxy].
Several astute readers on my blog, however, requested that I add my two cents. Let's take a look at some of BarryB's comments:
...most DS8300's are to this day most frequently bundled as "free" storage with IBM mainframe and server sales.
We just shipped our 15,000th box, so for this absurd statement to be true, more than half would have to be given away as part of a server-and-storage deal? Actually, about a third of our DS8000 sales are sold with servers in the same bundle, and while we do provide discounts from the official list price, that is not the same as "free". The other two thirds are sold into accounts to be used with the existing servers already deployed. So BarryB, your math doesn't work out. (Perhaps you've been taking Hitachi math lessons???)
It is interesting however, that when we do a 4-year TCO comparison, between a normally-discounted DS8000 versus free EMC DMX4 hardware, IBM still has the lower cost, given that most of the price-gouging from EMC happens after the initial sale, through software features, annual Powerpath renewals and MES upgrades. If you are an EMC customer, and you are planning to add more capacity to your DMX, ask EMC to charge you no more than what you originally paid on a dollar-per-GB basis for the initial capacity. That's only fair, right?
...No thin provisioning, or even a commitment to thin provisioning. Just crickets. (Celerra support since Jan 2006...
EMC DMX does not have thin provisioning available today either, so BarryB brings up Celerra, their NAS box? IBM System Storage N series NAS box also has thin provisioning, so if you want thin provisioning you can buy a NAS box from EMC or IBM. Thin provisioning makes sense using NAS protocols, as there are actual commands to "delete a file" that can then free up the related blocks in a thin-provisioned environment. The only way to do this with block-oriented protocols is to get the OS to notify the storage device that blocks can be freed up. As it turns out, IBM's z/OS has such support, which we developed specifically for our thin-provisioning support in our IBM RAMAC Virtual Array disk systems back in the 1990s. For block-oriented devices on most other operating systems, thin provisioning may not be all that it is cracked up to be.
No SATA drives (only DMX-4 supports native SATA-II drives, since Aug’07)
A few people are confused on this. IBM DS8000 has supported FATA drives for quite some time now. These have the same slower speeds and higher capacities as SATA, but are technically NOT the same as SATA. FATA drives are designed to provide better protection against vibrational shock, to improve reliability of the drives. IBM felt that if the data was important enough to put on a high-end system, it should get better-than-SATA treatment. If you really want SATA, try our IBM System Storage N series, DS4000 or DS3000 models.
No RAID 6 (DMX-3 has supported multi-dimensional RAID since Q1’07, DMX-4 since Aug'07, ...
IBM N series supports RAID6, but we called it RAID-DP and that confused some people. Same thing, DP stands for Dual Parity, protecting against a double-disk failure. We also just announced RAID6 on our DS4000 series, by the way.
No 4Gb back-end (USP-V since May '07, DMX-4 since Aug’07)
I found this one odd, since BarryB himself in an earlier post explained why 4Gbps back-end made no difference to DMX4 performance in this post [DMX-4 and Oh So Much More], which I will put into a different color so you can tell it is from a different post:
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s. Transmit times are really only a tiny portion of I/O overhead, and just don't make that much difference when a massively-cached system is pre-fetching reads, buffering/delaying writes and reordering I/O requests to minimize seek times. Not that 4Gb/s won't help some applications, but most people just won't see any noticeable difference.
In this case, BarryB is right. The IBM DS8000's 2Gbps back-end is not a performance bottleneck. The DS8000 with a 2Gbps back-end is faster than DMX4 with a 4Gbps back-end for business application workloads. EMC doesn't publish SPC benchmarks to deny this, so you will just have to take our word on this.
Still only 1024 maximum disk drives (DMX-3 & 4 support up to 2400 drives, USP-V supports 1152)
I would be curious to see how many customers have more than 1024 drives on any high-end disk array. As we learned back in [Day 2 Storage Symposium], the average DS8100 has 17.4 TB, and DS8300 has 41.5 TB capacity. Using 500GB drives, that's only 83 spindles. Even with 73GB drives, that's 568 spindles. Plenty of room for growth, so I am not convinced that higher theoretical upper architectural limits are worth discussing here.
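The spindle counts are simple division, shown here as a sketch. It uses decimal units and raw capacity only; RAID overhead and spares are ignored, and the helper name is mine.

```python
MAX_DRIVES = 1024  # the DS8000 architectural limit BarryB cites


def spindles(capacity_tb: float, drive_gb: int) -> int:
    """Approximate number of drives needed to hold the given raw capacity."""
    return round(capacity_tb * 1000 / drive_gb)


for model, capacity_tb in (("DS8100", 17.4), ("DS8300", 41.5)):
    for drive_gb in (500, 73):
        count = spindles(capacity_tb, drive_gb)
        print(f"{model} at {capacity_tb} TB on {drive_gb}GB drives: "
              f"{count} spindles ({count / MAX_DRIVES:.0%} of the limit)")
```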
Still only two HARD LPARs (partitions) ..., and even IBM’s mid-tier products support more than 2 storage partitions (in this same announcement)
IBM's two LPARs are TWICE what EMC DMX offers. I don't even know why anyone from EMC would bring this up? While EMC is enjoying their success with VMware, they lack the experience to carry this over to their storage lines. Until EMC offers MORE THAN TWO of any kind of partitions on their high-end offerings, there just is no credibility here. As for our "storage partitions" on our DS4000 line, that is an unfortunate mis-understanding of the press release. On the DS4000, the term "storage partition" is really "LUN masking", dividing up only which disks can be accessed by which hosts, and not dividing up any processor or cache capacity. So this is not the same as any LPAR concept on any other system. For example, a DS4000 with 64 partitions can be attached to 64 hosts, or 64 host-clusters like a Windows MSCS environment or AIX HACMP.
No native Ethernet replication or iSCSI support (Symmetrix has had since 2002)
Again, I found this one odd. On another EMC post, [Vigorous Debates], Chad Sakac mentions that only 2% of Symmetrix are sold with IP ports, not sure if this is for Ethernet replication, iSCSI attachment, or both (Again, I will use a different color):
On the Symm business (a huge part of EMC’s business – the IP ports are included on 2% of deals. That’s a fact.
Just because an engineer can put a feature or function on a box, doesn't mean there is business sense to do so. I would hate for IBM to invest millions of dollars on native iSCSI support, only to have 2% of our DS8000 boxes sold with that feature. Customers who have DS8000 on FC SANs already deployed can easily add iSCSI support either through their SAN switches, or by fronting the DS8000 with an N series gateway. Most customers looking for native iSCSI are the smaller no-SAN-deployed SMB customers, and for them, we have both the DS3300 and the various N series models to choose from.
Well that's my two cents. The DS8000 series remains a strategic part of the IBM System Storage offering matrix, with continued investment in the development, as well as on-going research that we can leverage throughout the IBM company. I would like to read your thoughts on this, so post me a comment below.
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], I presented a session on Storage for the Green Data Center, and attended a System x session on Greening the Data Center. Since they were related, I thought I would cover both in this post.
Storage for the Green Data Center
I presented this topic in four general categories:
Drivers and Metrics - I explained the three key drivers for consuming less energy, and the two key metrics: Power Usage Effectiveness (PUE) and Data Center Infrastructure Efficiency (DCiE).
Storage Technologies - I compared the four key storage media types: Solid State Drives (SSD), high-speed (15K RPM) FC and SAS hard disk, slower (7200 RPM) SATA disk, and tape. I had comparison slides that showed how IBM disk was more energy efficient than competition, for example DS8700 consumes less energy than EMC Symmetrix when compared with the exact same number and type of physical drives. Likewise, IBM LTO-5 and TS1130 tape drives consume less energy than comparable HP or Oracle/Sun tape drives.
Integrated Systems - IBM combines multiple storage tiers in a set of integrated systems managed by smart software. For example, the IBM DS8700 offers [Easy Tier] to offer smart data placement and movement across Solid-State drives and spinning disk. I also covered several blended disk-and-tape solutions, such as the Information Archive and SONAS.
Actions and Next Steps - I wrapped up the talk with actions that data center managers can take to help them be more energy efficient, from deploying the IBM Rear Door Heat Exchanger to improving the management of their data.
Greening of the Data Center
Janet Beaver, IBM Senior Manager of Americas Group facilities for Infrastructure and Facilities, presented on IBM's success in becoming more energy efficient. The price of electricity has gone up 10 percent per year, and in some locations, 30 percent. For every 1 Watt used by IT equipment, there are an additional 27 Watts for power, cooling and other uses to keep the IT equipment comfortable. At IBM, data centers represent only 6 percent of total floor space, but 45 percent of all energy consumption. Janet covered two specific data centers, Boulder and Raleigh.
At Boulder, IBM keeps 48 hours reserve of gasoline (to generate electricity in case of outage from the power company) and 48 hours of chilled water. Many power outages are less than 10 minutes, which can easily be handled by the UPS systems. At least 25 percent of the Computer Room Air Conditioners (CRAC) are also on UPS as well, so that there is some cooling during those minutes, within the ASHRAE guidelines of 72-80 degrees Fahrenheit. Since gasoline gets stale, IBM runs the generators once a month, which serves as a monthly test of the system, and clears out the lines to make room for fresh fuel.
The IBM Boulder data center is the largest in the company: 300,000 square feet (the equivalent of five football fields)! Because of its location in Colorado, IBM enjoys "free cooling" using outside air temperature 63 percent of the year, resulting in a PUE rating of 1.3. Electricity is only 4.5 US cents per kWh. The center also uses 1 million kWh per year of wind energy.
The Raleigh data center is only 100,000 square feet, with a PUE rating of 1.4. The Raleigh area enjoys 44 percent "free cooling" and electricity costs 5.7 US cents per kWh. The Leadership in Energy and Environmental Design [LEED] has been updated to certify data centers. The IBM Boulder data center has achieved LEED Silver certification, and IBM Raleigh data center has LEED Gold certification.
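For readers less familiar with the two metrics from my session: PUE is total facility power divided by IT equipment power, and DCiE is simply its reciprocal expressed as a percentage. A short sketch applying that to the two PUE ratings above (the function names are mine):

```python
def dcie_percent(pue: float) -> float:
    """DCiE is the reciprocal of PUE, expressed as a percentage."""
    return 100.0 / pue


def overhead_watts_per_it_watt(pue: float) -> float:
    """Watts of power, cooling and other overhead for each watt of IT load."""
    return pue - 1.0


for site, pue in (("Boulder", 1.3), ("Raleigh", 1.4)):
    print(f"{site}: PUE {pue} -> DCiE {dcie_percent(pue):.1f}%, "
          f"{overhead_watts_per_it_watt(pue):.1f} W overhead per IT watt")
```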
Free cooling, electricity costs, and disaster susceptibility are just three of the 25 criteria IBM uses to locate its data centers. In addition to the 7 data centers it manages for its own operations, and 5 data centers for web hosting, IBM manages over 400 data centers of other clients.
It seems that Green IT initiatives are more important to the storage-oriented attendees than the x86-oriented folks. I suspect that is because many System x servers are deployed in small and medium businesses that do not have data centers, per se.
Continuing my week in Chicago, for the IBM Storage Symposium 2008, I attended two presentations on XIV.
XIV Storage - Best Practices
Izhar Sharon, IBM Technical Sales Specialist for XIV, presented best practices using XIV in various environments. He started out explaining the innovative XIV architecture: a SATA-based disk system from IBM can outperform FC-based disk systems from other vendors using massive parallelism. He used a sports analogy:
"The men's world record for running 800 meters was set in 1997 by Wilson Kipketer of Denmark in a time of 1:41.11.
However, if you have eight men running, 100 meters each, they will all cross the finish line in about 10 seconds."
Since XIV is already self-tuning, what kind of best practices are left to present? Izhar presented best practices for software, hosts, switches and storage virtualization products that attach to the XIV. Here are some quick points:
Use as many paths as possible.
IBM does not require you to purchase and install multipathing software as other competitors might. Instead, the XIV relies on multipathing capabilities inherent to each operating system. For multipathing preference, choose Round-Robin, which is now available on AIX and VMware vSphere 4.0, for example. Otherwise, fixed-path is preferred over most-recently-used (MRU).
Encourage parallel I/O requests.
XIV architecture does not subscribe to the outdated notion of a "global cache". Instead, the cache is distributed across the modules, to reduce performance bottlenecks. Each HBA on the XIV can handle about 1400 requests. If you have fewer than 1400 hosts attached to the XIV, you can further increase parallel I/O requests by specifying a large queue depth in the host bus adapter (HBA). An HBA queue depth of 64 is a good start. Additional settings might be required in the BIOS, operating system or application for multiple threads and processes.
For sequential workloads, select a host stripe size less than 1MB. For random workloads, select a host stripe size larger than 1MB. Set rr_min_io between ten (10) and the queue depth (typically 64); half of the queue depth is a good starting point.
If you have long-running batch jobs, consider breaking them up into smaller steps and running them in parallel.
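To make the rr_min_io rule of thumb concrete, here is a minimal Python sketch. The helper name `suggest_rr_min_io` is hypothetical and is not an IBM tool; it only encodes the guidance above (start at half the queue depth, and stay between 10 and the queue depth).

```python
def suggest_rr_min_io(queue_depth: int = 64) -> int:
    """Return a starting rr_min_io value for a given HBA queue depth.

    Rule of thumb from the session: half the queue depth,
    kept within the range [10, queue_depth].
    """
    if queue_depth < 10:
        # The recommended range 10..queue_depth would be empty.
        raise ValueError("queue depth below 10 leaves no valid range")
    half = queue_depth // 2
    return max(10, min(half, queue_depth))


if __name__ == "__main__":
    # Hypothetical queue depths, chosen only for illustration.
    for depth in (16, 32, 64, 128):
        print(f"queue depth {depth}: start with rr_min_io = {suggest_rr_min_io(depth)}")
```

Treat the result as a starting point for testing, not a tuned value.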
Define fewer, larger LUNs
Generally, you no longer need to define many small LUNs, a practice that was often required on traditional disk systems. This means that you can now define just 1 or 2 LUNs per application, and greatly simplify management. If your application must have multiple LUNs in order to do multiple threads or concurrent I/O requests, then, by all means, define multiple LUNs.
Modern Data Base Management Systems (DBMS) like DB2 and Oracle already parallelize their I/O requests, so there is no need for host-based striping across many logical volumes. XIV already stripes the data for you. If you use Oracle Automatic Storage Management (ASM), use 8MB to 16MB extent sizes for optimal performance.
For those virtualizing XIV with SAN Volume Controller (SVC), define managed disks as 1632GB LUNs, in multiples of six LUNs per managed disk group (MDG), to balance across the six interface modules. Set the SVC extent size to 1GB.
XIV is ideal for VMware. Create big LUNs for your VMFS that you can access via FCP or iSCSI.
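The SVC layout guidance above (1632GB managed disks, in multiples of six per managed disk group) is simple arithmetic, sketched here in Python. The function name `plan_mdisks` and the capacity figures are hypothetical, chosen only for illustration.

```python
from typing import Tuple

MDISK_SIZE_GB = 1632     # managed disk (LUN) size recommended in the session
INTERFACE_MODULES = 6    # LUN count per MDG should be a multiple of this


def plan_mdisks(usable_gb: int) -> Tuple[int, int]:
    """Return (number of 1632GB managed disks, leftover GB) for one MDG.

    The count is rounded down to a multiple of six so the LUNs
    balance evenly across the six XIV interface modules.
    """
    fit = usable_gb // MDISK_SIZE_GB
    count = fit - (fit % INTERFACE_MODULES)
    leftover = usable_gb - count * MDISK_SIZE_GB
    return count, leftover


if __name__ == "__main__":
    # Hypothetical usable capacities in GB.
    for capacity in (27000, 79000):
        count, leftover = plan_mdisks(capacity)
        print(f"{capacity} GB -> {count} managed disks, {leftover} GB left over")
```

Any leftover capacity could be used for LUNs outside SVC; that choice is beyond what the session covered.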
Organize data to simplify Snapshots.
You no longer need to separate logs from databases for performance reasons. However, for some backup products like IBM Tivoli Storage Manager (TSM) for Advanced Copy Services (ACS), you might want to keep them separate for snapshot reasons. Generally, putting all data for an application on one big LUN greatly simplifies administration and snapshot processing, without losing performance. If you define multiple LUNs for an application, simply put them into the same "consistency group" so that they are all snapshot together.
OS boot image disks can be snapshot before applying any patches, updates or application software, so that if there are any problems, you can reboot to the previous image.
Employ sizing tools to plan for capacity and performance.
The SAP Quicksizer tool can be used for new SAP deployments, employing either the user-based or throughput-based sizing model approach. The result is in a mythical unit called "SAPS", which represents 0.4 IOPS for ERP/OLTP workloads, and 0.6 IOPS for BI/BW and OLAP workloads.
If you already have SAP or other applications running, use actual I/O measurements. IBM Business Partners and field technical sales specialists have an updated version of Disk Magic that can help size XIV configurations from PERFMON and iostat figures.
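As a quick illustration of the SAPS conversion factors quoted above, here is a small Python sketch. The function name, the workload keys, and the sample SAPS figure are hypothetical; only the 0.4 and 0.6 IOPS-per-SAPS factors come from the session.

```python
# IOPS per SAPS, as quoted in the session.
IOPS_PER_SAPS = {
    "oltp": 0.4,   # ERP/OLTP workloads
    "olap": 0.6,   # BI/BW and OLAP workloads
}


def saps_to_iops(saps: float, workload: str) -> float:
    """Convert a Quicksizer SAPS result into an estimated IOPS requirement."""
    try:
        factor = IOPS_PER_SAPS[workload]
    except KeyError:
        raise ValueError(f"unknown workload type: {workload!r}") from None
    return saps * factor


if __name__ == "__main__":
    sample_saps = 10_000  # hypothetical Quicksizer output
    for kind in ("oltp", "olap"):
        print(f"{sample_saps} SAPS ({kind}): about {saps_to_iops(sample_saps, kind):.0f} IOPS")
```

This gives only a first estimate; actual I/O measurements remain the better input when they exist.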
Lee La Frese, IBM STSM for Enterprise Storage Performance Engineering, presented internal lab test results for the XIV under various workloads, based on the latest hardware/software levels [announced two weeks ago]. Three workloads were tested:
Web 2.0 (80/20/40) - 80 percent READ, 20 percent WRITE, 40 percent cache hits for READ. YouTube, Flickr, and the growing list at [GoWeb20] are applications with heavy read activity, but because of [long-tail effects], may not be as cache friendly.
Social Networking (50/50/50) - 50 percent READ, 50 percent WRITE, 50 percent cache hits for READ. Lotus Connections, Microsoft Sharepoint, and many other [social networking] uses are more write intensive.
Database (70/30/50) - 70 percent READ, 30 percent WRITE, 50 percent cache hits for READ. The traditional workload characteristics for most business applications, especially databases like DB2 and Oracle on Linux, UNIX and Windows servers.
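To show what the three-number notation implies, the Python sketch below estimates how many reads per second miss cache and must be served from disk. It is a simplification of my own, not part of the lab methodology: it ignores writes, prefetch and destaging, and the 10,000 IOPS front-end rate is a hypothetical figure.

```python
# (read %, write %, read cache-hit %) for the three tested workloads.
WORKLOADS = {
    "Web 2.0": (80, 20, 40),
    "Social Networking": (50, 50, 50),
    "Database": (70, 30, 50),
}


def read_miss_iops(front_end_iops: float, read_pct: int, hit_pct: int) -> float:
    """Reads per second that miss cache: total IOPS x read share x miss share."""
    return front_end_iops * read_pct * (100 - hit_pct) / 10_000


if __name__ == "__main__":
    total = 10_000  # hypothetical front-end IOPS
    for name, (read_pct, _write_pct, hit_pct) in WORKLOADS.items():
        misses = read_miss_iops(total, read_pct, hit_pct)
        print(f"{name}: {misses:.0f} read misses per second out of {total} IOPS")
```

The Web 2.0 mix has the most read misses in this sketch, which matches the remark that long-tail content is less cache friendly.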
The results were quite impressive. There was more than enough performance for tier 2 application workloads, and most tier 1 applications. The performance was nearly linear from the smallest 6-module to the largest 15-module configuration. Some key points:
A full 15-module XIV overwhelms a single SVC 8F4 node-pair. For a full XIV, consider 4 to 8 nodes of the 8F4 model, or 2 to 4 nodes of the 8G4. For read-intensive cache-friendly workloads, an SVC in front of XIV was able to deliver over 300,000 IOPS.
A single-node TS7650G ProtecTIER can handle 6 to 9 XIV modules. Two nodes of TS7650G were needed to drive a full 15-module XIV. A single-node TS7650 in front of XIV was able to ingest 680 MB/sec on the seventh day of a test workload with a 17 percent per-day change rate, using 64 virtual drives. Reading the data back got over 950 MB/sec.
For SAP environments where response times of 20-30 msec are acceptable, the 15-module XIV delivered over 60,000 IOPS. Reducing the load to 25,000-30,000 IOPS cut the response time to a faster 10-15 msec.
These were all done as internal lab tests. Your mileage may vary.
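One way to read the SAP figures is through Little's law, a general queueing result (average requests in flight = throughput x response time). This was not part of the presentation, and the midpoints used below (25 msec, 27,500 IOPS, 12.5 msec) are my own assumptions taken from the quoted ranges.

```python
def outstanding_ios(iops: float, response_msec: float) -> float:
    """Little's law: average number of I/O requests in flight."""
    return iops * response_msec / 1000.0


if __name__ == "__main__":
    # Midpoints of the quoted ranges, assumed for illustration only.
    heavy = outstanding_ios(60_000, 25)
    light = outstanding_ios(27_500, 12.5)
    print(f"60,000 IOPS at 25 msec: about {heavy:.0f} requests in flight")
    print(f"27,500 IOPS at 12.5 msec: about {light:.0f} requests in flight")
```

Seen this way, the faster response time goes together with far fewer requests queued inside the system at any moment.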
Not surprisingly, XIV was quite the popular topic here this week at the Storage Symposium. There were many more sessions, but these were the only two that I attended.
Earlier this week, EMC announced its Symmetrix V-Max, following two trends in the industry:
Using Roman numerals. The "V" here is for FIVE, as this is the successor to the DMX-3 and DMX-4. EMC might have gotten the idea from IBM's success with the XIV (which does refer to the number 14, specifically the 14th class of a Talpiot program in Israel that the founders of XIV graduated from).
Adding "-Max", "-Monkey" or "2.0" at the end of things to make them sound more cool and to appeal to a younger, hipper audience. EMC might have gotten this idea from Pepsi-Max (... a taste of storage for the next generation?)
I took a cue from President Obama and waited a few days to collect my thoughts and do my homework before responding. Special thanks to fellow blogger ChuckH for giving me a [handy list of reactions] to pick and choose from. It appears that the EMC marketing machine feels it is acceptable for their own folks to claim that EMC is doing something first, or that others are catching up to EMC, but when other vendors do likewise, then that is just pathetic or incoherent. Here are a few reactions already from fellow bloggers:
This was a major announcement for EMC, addressing many of the problems, flaws and weaknesses of the earlier DMX-3 and DMX-4 deliverables. Here's my read on this:
Now you can have as many FCP ports (128) as an IBM System Storage DS8300, although the maximum number of FICON ports is still short, and no mention of ESCON support. The Ethernet ports appear to be 1Gb, not the new 10GbE you might expect.
Support for System z mainframe
V-Max adds some new support to catch up with the DS8000, like Extended Address Volumes (EAV). EMC is still not quite there yet. IBM DS8000 continues to be the best, most feature-rich storage option if you have System z mainframe servers.
Both the IBM DS8000 and HDS USP-V beat the DMX-4 in performance, and in some cases the DMX-4 even lost to the IBM XIV, so EMC had to do something about it. EMC chooses not to participate in industry-standard performance benchmarks like those from the [Storage Performance Council], which limits them to vague comparisons against older EMC gear. I'll give EMC engineers the benefit of the doubt and say that V-Max is now "comparably as fast as HDS and IBM offerings".
Getting "V" in the name
The "V" appears to be for the Roman numeral five, not to be confused with the external heterogeneous storage virtualization that HDS USP-V and IBM SVC provide. There is no mention of synergy with EMC's failed "Invista" product, and I see no support for attaching other vendors' disk to the back of this thing.
Switch to Intel processor
Apple switched its computers from PowerPC to Intel-based, and now EMC follows in the same path. There are some custom ASICs still in V-Max, so it is not as pure as IBM's offerings.
Modular, XIV-like Scale-out Architecture
Actually, the packaging appears to follow the familiar system bays and storage bays of the DMX-4 and DMX-4 950 models, but architecturally offers XIV-like attachment across a common switch network between "engines", EMC's term for interface modules.
Non-disruptive data migration
IBM's SoFS, DR550 and GMAS have this already, as does anything connected behind an IBM SAN Volume Controller.
A long time ago, IBM used to have midrange disk storage systems called "FAStT" which stood for Fibre Array Storage Technology, so this might have given EMC the idea for their "Fully Automated Storage Tiering" acronym. The concept appears similar to what IBM introduced back in 2007 for the Scale-Out File Services [SoFS], which not only provides policy-based placement, movement and expiration on different disk tiers, but includes tape tiers as well for a complete solution. I don't see anything in the V-Max announcement that says it will support tape anytime soon.
And whatever happened to EMC's Atmos? Wasn't that supposed to be EMC's new direction in storage?
Zero-data loss Three-site replication
IBM already calls this Metro/Global Mirror for its IBM DS8000 series, but EMC chose to call it SRDF/EDP for Extended Distance Protection.
Ease of Use
The most significant part of the announcement is that EMC is finally focusing on ease-of-use. In addition to reducing the requirement for "Bin File" modifications, this box has a redesigned user interface to focus on usability issues. For past DMX models, EMC customers had to either hire EMC to do tasks for them that were just too difficult otherwise, or buy expensive software like their EMC Control Center to manage them. EMC will continue to sell DMX-4 boxes for a while, as they are probably supply-constrained on the V-Max side, but I doubt they will retro-fit these new features back to DMX-3 and DMX-4.
When IBM announced its acquisition of XIV over a year ago now, customers were knocking down our doors to get one. This caught two particular groups looking like a [deer in headlights]:
EMC Symmetrix sales force: Some of the smarter ones left EMC to go sell IBM XIV, leaving EMC short-handed and having to announce they [were hiring during their layoffs]. Obviously, a few of the smart ones stayed behind, to convince their management to build something like the V-Max.
IBM DS8000 sales force: If clients are not happy with their existing EMC Symmetrix, why don't they just buy an IBM DS8000 instead? What does XIV have that DS8000 doesn't?
Let me contrast this with the situation Microsoft Windows is currently facing.
I am often asked by friends to help them pick out laptops and personal computers. I use Linux, Windows and MacOS, so have personal experience with all three operating systems.
Linux is cheaper, offers the power-user the most options for supporting older, less-powerful equipment, but I wouldn't have my Mom use it. While distributions like Ubuntu are making great strides, it is just too difficult for some people.
MacOS is nice, I like it, it works out of the box with little or no customization and an intuitive interface. However, some of my friends don't make IBM-level salaries, and have to watch their budget.
In their "I'm a PC" campaign, Microsoft is fighting both fronts. Let's examine two commercials:
In the first commercial, a young eight-year-old puts together a video from pictures of toy animals and some background music. The message: "Windows is easier to use than Linux!" If they really wanted to send this message, they should have shown senior citizens instead.
In the second commercial, a young college student is asked to find a laptop with a 17 inch screen, and a variety of other qualifications, for under $1000 US. The only model at the Apple store below this price had a 13 inch screen, but she finds a Windows-based system that had this size screen and met all the other qualifications. The message: "Windows-based hardware from a variety of competitors is less expensive than hardware from Apple!"
Both Microsoft and Apple charge a premium for ease-of-use. In the storage world, things are completely opposite. Vendors don't charge a premium for ease-of-use. In fact, some of the easiest to use are also the least expensive.
If you just have Windows and Linux, you can get an entry level system like the IBM DS3000 series, which has only a few features and can be set up in six simple steps.
Next, if you have a more interesting mix of operating systems, Linux, Windows and some flavors of UNIX like IBM AIX, HP-UX or Sun Solaris, then you might want the features and functions of pricier midrange offerings. More options means that configuration and deployment is more difficult, however.
Finally, if you are a serious Fortune 500 company, running your mission critical applications on System z or System i centralized systems in a big data center, then you might be willing to pay top dollar for the most feature-rich offerings of an Enterprise-class machine. Thankfully you have an army of highly-trained staff to handle the highest levels of complexity.
IBM's DS8000, HDS USP-V and EMC's Symmetrix are the key players in the Enterprise-class space. They tried to be ["all things to all people"], er.. perhaps all things to all platforms. All of the features and functions came at a price, not just in dollars, but in complexity and difficulty. You needed highly skilled storage admins using expensive storage management software, or be willing to hire the storage vendor's premium services to get the job done.
IBM recognized this trend early. IBM's SVC, N series and now XIV all offer ease-of-use with enterprise-class features and functions, at lower total cost of ownership than traditional enterprise-class systems. IBM is not the only one, of course, as smaller storage start-ups like 3PAR, Pillar Data Systems, Compellent, and to some extent Dell's EqualLogic all recognized this and developed clever offerings as well.
While IBM's XIV may not have been the first to introduce a modular, scale-out architecture using commodity parts managed by sophisticated ease-of-use interfaces, its success might have been the kick-in-the-butt EMC needed to follow the rest of the industry in this direction.
Well, it's Tuesday, and that means IBM announcements! Today is bigger, as there are a lot of Dynamic Infrastructure announcements throughout the company with a common theme, cloud computing and smart business systems that support the new way of doing things. Today, IBM announced its new "IBM Smart Archive" strategy that integrates software, storage, servers and services into solutions that help meet the challenges of today and tomorrow. IBM has been spending the past few years working across its various divisions and acquisitions to ensure that our clients have complete end-to-end solutions.
IBM is introducing new "Smart Business Systems" that can be used on-premises for private-cloud configurations, as well as by cloud-computing companies to offer IT as a service.
IBM [Information Archive] is the first to be unveiled, a disk-only or blended disk-and-tape Information Infrastructure solution that offers a "unified storage" approach with amazing flexibility for dealing with various archive requirements:
For those with applications using the IBM Tivoli Storage Manager (TSM) or IBM System Storage Archive Manager (SSAM) API of the IBM System Storage DR550 data retention solution, the Information Archive will provide a direct migration, supporting this API for existing applications.
For those with IBM N series using SnapLock or the File System Gateway of the DR550, the Information Archive will support various NAS protocols, deployed in stages, including NFS, CIFS, HTTP and FTP access, with Non-Erasable, Non-Rewriteable (NENR) enforcement that is compatible with current IBM N series SnapLock usage.
For those using NAS devices with PACS applications to store X-rays and other medical images, the Information Archive will provide similar NAS protocol interfaces. Information Archive will support both read-only data such as X-rays, as well as read/write data such as Electronic Medical Records.
Information Archive is not just for compliance data that was previously sent to WORM optical media. Instead, it can handle all kinds of data, rewriteable data, read-only data, and data that needs to be locked down for tamper protection. It can handle structured databases, emails, videos and unstructured files, as well as objects stored through the SSAM API.
The Information Archive has all the server, storage and software integrated together into a single machine type/model number. It is based on IBM's General Parallel File System (GPFS) to provide incredible scalability, the same clustered file system used by many of the top 500 supercomputers. Initially, Information Archive will support up to 304TB raw capacity of disk and Petabytes of tape. You can read the [Spec Sheet] for other technical details.
For those who prefer a more "customized" approach, similar to IBM Scale-Out File Services (SoFS), IBM has [Smart Business Storage Cloud]. IBM Global Services can customize a solution that is best for you, using many of the same technologies. In fact, IBM Global Services announced a variety of new cloud-computing services to help enterprises determine the best approach.
In a related announcement, IBM announced [LotusLive iNotes], which you can think of as a "business-ready" version of Google's GoogleApps, Gmail and GoogleCalendar. IBM is focused on security and reliability but leaves out the advertising and data mining that people have been forced to tolerate from consumer-oriented Web 2.0-based solutions. IBM's clients that are already familiar with on-premises version of Lotus Notes will have no trouble using LotusLive iNotes.
There was actually a lot more announced today, which I will try to get to in later posts.
While most of the post is accurate and well-stated, two opinions in particular caught my eye. I'll be nice and call them opinions, since these are blogs, and always subject to interpretation. I'll put quotes around them so that people will correctly relate these to Hu, and not me.
"Storage virtualization can only be done in a storage controller. Currently Hitachi is the only vendor to provide this." -- Hu Yoshida
Hu, I enjoy all of your blog entries, but you should know better. HDS is a fairly recent newcomer to the storage virtualization arena, and since IBM has been doing this for decades, I will bring you and the rest of the readers up to speed. I am not starting a blog-fight, just want to provide some additional information for clients to consider when making choices in the marketplace.
First, let's clarify the terminology. I will use 'storage' in the broad sense, including anything that can hold 1's and 0's, including memory, spinning disk media, and plastic tape media. These all have different mechanisms and access methods, based on their physical geometry and characteristics. The concept of 'virtualization' is any technology that makes one set of resources look like another set of resources with more preferable characteristics, and this applies to storage as well as servers and networks. Finally, 'storage controller' is any device with the intelligence to talk to a server and handle its read and write requests.
Second, let's take a look at all the different flavors of storage virtualization that IBM has developed over the past 30 years.
IBM introduces the S/370 with the OS/VS1 operating system. "VS" here refers to virtual storage, and in this case internal server memory was swapped out to physical disk. Using a table mapping, disk was made to look like an extension of main memory.
IBM introduces the IBM 3850 Mass Storage System (MSS). Until this time, programs that ran on mainframes had to be acutely aware of the device types being written, as each device type had different block, track and cylinder sizes, so a program written for one device type would have to be modified to work with a different device type. The MSS was able to take four 3350 disks, and a lot of tapes, and make them look like older 3330 disks, since most programs were still written for the 3330 format. The MSS was a way to deliver new 3350 disk to a 3330-oriented ecosystem, and greatly reduce the cost by handling tape on the back end. The table mapping was one virtual 3330 disk (100 MB) to two physical tapes (50 MB each). Back then, all of the mainframe disk systems had separate controllers. The 3850 used a 3831 controller that talked to the servers.
IBM invents Redundant Array of Independent Disks (RAID) technology. The table mapping is one or more virtual "Logical Units" (or "LUNs") to two or more physical disks. Data is striped, mirrored and paritied across the physical drives, making the LUNs look and feel like disks, but with faster performance and higher reliability than the physical drives they were mapped to. RAID could be implemented in the server as software, on top or embedded into the operating system, in the host bus adapter, or on the controller itself. The vendor that provided the RAID software or HBA did not have to be the same as the vendor that provided the disk, so in a sense, this avoided "vendor lock-in". Today, RAID is almost always done in the external storage controller.
IBM introduces the Personal Computer. One of the features of DOS is the ability to make a "RAM drive". This is technology that runs in the operating system to make internal memory look and feel like an external drive letter. Applications that already knew how to read and write to drive letters could work unmodified with these new RAM drives. This had the advantage that the files would be erased when the system was turned off, so it was perfect for temporary files. Of course, other operating systems today have this feature, UNIX has a /tmp directory in memory, and z/OS uses VIO storage pools.
This is important, as memory would be made to look like disk externally, as "cache", in the 1990s.
IBM AIX v3 introduces Logical Volume Manager (LVM). LVM maps the LUNs from external RAID controllers into virtual disks inside the UNIX server. The mapping can combine the capacity of multiple physical LUNs into a large internal volume. This was all done by software within the server, completely independent of the storage vendor, so again no lock-in.
IBM introduces the Virtual Tape Server (VTS). This was a disk array that emulated a tape library. A mapping of virtual tapes to physical tapes was done to allow full utilization of larger and larger tape cartridges. While many people today mistakenly equate "storage virtualization" with "disk virtualization", in reality it can be implemented on other forms of storage. The disk array was referred to as the "Tape Volume Cache". By using disk, the VTS could mount an empty "scratch" tape instantaneously, since no physical tape had to be mounted for this purpose.
Contradicting its "tape is dead" mantra, EMC later developed its CLARiiON disk library that emulates a virtual tape library (VTL).
IBM introduces the SAN Volume Controller. It involves mapping virtual disks to managed disks that could be from different frames from different vendors. Like other controllers, the SVC has multiple processors and cache memory, with the intelligence to talk to servers, and is similar in functionality to the controller components you might find inside monolithic "controller+disk" configurations like the IBM DS8300, EMC Symmetrix, or HDS TagmaStore USP. SVC can map the virtual disk to physical disk one-for-one in "image mode", as HDS does, or can also map virtual disks across physical managed disks, using a similar mapping table, to provide advantages like performance improvement through striping. You can take any virtual disk out of the SVC system simply by migrating it back to "image mode" and disconnecting the LUN from management. Again, no vendor lock-in.
The HDS USP and NSC can run as regular disk systems without virtualization, or the virtualization can be enabled to allow external disks from other vendors. HDS usually counts all USP and NSC sold, but never mentions what percentage of these have external disks attached in virtualization mode. Either they don't track this, or they are too embarrassed to publish the number. (My guess: single digit percentage).
Few people remember that IBM also introduced virtualization in both controller+disk and SAN switch form factors. The controller+disk version was called "SAN Integration Server", but people didn't like the "vendor lock-in" having to buy the internal disk from IBM. They preferred having it all external disk, with plenty of vendor choices. This is perhaps why Hitachi now offers a disk-less version of the NSC 55, in an attempt to be more like IBM's SVC.
IBM also had introduced the IBM SVC for Cisco 9000 blade. Our clients didn't want to upgrade their SAN switch networking gear just to get the benefits of disk virtualization. Perhaps this is the same reason EMC has done so poorly with its "Invista" offering.
So, bottom line, storage virtualization can, and has, been delivered in the operating system software, in the server's host bus adapter, inside SAN switches, and in storage controllers. It can be delivered anywhere in the path between application and physical media. Today, the two major vendors that provide disk virtualization "in the storage controller" are IBM and HDS, and the three major vendors that provide tape virtualization "in the storage controller" are IBM, Sun/STK, and EMC. All of these involve a mapping of logical to physical resources. Hitachi uses a one-for-one mapping, whereas IBM additionally offers more sophisticated mappings as well.
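The difference between a one-for-one mapping and a striped mapping can be shown with a toy mapping table in Python. This is only an illustration of the concept; the function names and the extent numbering are hypothetical and do not describe how SVC or any HDS product lays out data internally.

```python
from typing import List, Tuple

# Each entry maps a virtual extent number to (physical disk, extent on that disk).
Mapping = List[Tuple[int, int]]


def image_mode(n_extents: int, disk: int = 0) -> Mapping:
    """One-for-one: virtual extent i lives at extent i of a single disk."""
    return [(disk, i) for i in range(n_extents)]


def striped(n_extents: int, n_disks: int) -> Mapping:
    """Round-robin striping: consecutive virtual extents land on different disks."""
    if n_disks < 1:
        raise ValueError("need at least one physical disk")
    return [(i % n_disks, i // n_disks) for i in range(n_extents)]


if __name__ == "__main__":
    print("image mode:", image_mode(6))
    print("striped across 3 disks:", striped(6, 3))
```

With the striped table, six consecutive extents are spread over three disks, so sequential requests can be serviced by several drives at once. With the image-mode table, the virtual disk is an unchanged copy of one physical LUN, which is why it can be detached from the virtualization layer without conversion.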
Every year, I teach hundreds of sellers how to sell IBM storage products. I have been doing this since the late 1990s, and it is one task that has carried forward from one job to another as I transitioned through various roles from development, to marketing, to consulting.
This week, I am in the city of [Taipei] to teach a Top Gun sales class, part of IBM's [Sales Training] curriculum. This is only my second time here on the island of Taiwan.
As you can see from this photo, Taipei is a large city with just row after row of buildings. The metropolitan area has about seven million people, and I saw lots of construction for more on my ride in from the airport.
The student body consists of IBM Business Partners and field sales reps eager to learn how to become better sellers. Typically, some of the students might have just been hired on, just finished IBM Sales School, a few might have transferred from selling other product lines, while others are established storage sellers looking for a refresher on the latest solutions and technologies.
I am part of the teach team comprised of seven instructors from different countries. Here is what the week entails for me:
Monday - I will present "Selling Scale-Out NAS Solutions" that covers the IBM SONAS appliance and gateway configurations, and be part of a panel discussion on Disk with several other experts.
Tuesday - I have two topics, "Selling Disk Virtualization Solutions" and "Selling Unified Storage Solutions", which cover the IBM SAN Volume Controller (SVC), Storwize V7000 and Storwize V7000 Unified products.
Wednesday - I will explain how to position and sell IBM products against the competition.
Thursday - I will present "Selling Infrastructure Management Solutions" and "Selling Unified Recovery Management Solutions", which focus on the IBM Tivoli Storage portfolio, including Tivoli Storage Productivity Center, Tivoli Storage Manager (TSM), and Tivoli Storage FlashCopy Manager (FCM). The day ends with the dreaded "Final Exam".
Friday - The students will present their "Team Value Workshop" presentations, and the class concludes with a formal graduation ceremony for the subset of students who pass. A few outstanding students will be honored with "Top Gun" status.
These are the solution areas I present most often as a consultant at the IBM Executive Briefing Center in Tucson, so I can provide real-life stories of different client situations to help illustrate my examples.
The weather here in Taipei calls for rain every day! I was able to take this photo on Sunday morning while it was still nice and clear, but later in the afternoon, we had quite the downpour. I am glad I brought my raincoat!
Well, it's Tuesday again, and that means more announcements from IBM!
In conjunction with IBM's new [System z10 Business Class (BC)] mainframe designed for Small and Medium-sized Businesses (SMB), IBM also announced related storage product enhancements.
Yes, it's alive! Contrary to the FUD you might have read from our competitors, IBM continues to sell thousands and thousands of IBM System Storage DS6800 disk systems, and now enhances them with the option for 450GB 15K RPM drives. What is nice about these 450GB drives is that they are as fast or faster* than 300GB drives, so the typical trade-off between performance and capacity does not apply.
(* I compared Seagate 15.6K (450GB) with 15.5K (300GB) models on four measures: average seek time (read), average seek time (write), full seek time (read) and full seek time (write).
This may or may not result in application performance improvements, depending on workload pattern. Your mileage may vary.)
Our clients report back that these are incredibly stable systems that they don't have to worry about. This enhancement applies to both the [511/EX1 models] and [522/EX2 models].
Understanding that clients want complete solutions from single vendors, IBM offers synergy between System z and the IBM System Storage DS8000 disk systems. The latest R4.1 microcode upgrade offers two key features on the various models [2107,
zHPF - High Performance FICON for System z. IBM was able to increase the throughput on 4 Gbps links. For OLTP workloads randomly accessing 4KB blocks, IBM internal tests showed zHPF doubled performance from 13,000 IOPSto 26,000 IOPS per channel. For sequential workloads, such as batch processing, zHPF increased performance 50 percent, from 350 MB/sec to 525 MB/sec.
In February, IBM previewed [Incremental Resync] for z/OS Metro Global Mirror. However, some concepts are better explained with pictures.
One way to set up a 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z, you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and also have site "A" sending data over to site "C" asynchronously using z/OS Global Mirror. This is called "z/OS Metro Global Mirror".
In the past, if the disk system in site A failed, you would switch over to site B, which would have to resend all the data again to site C to be resynchronized. This is because site B was not tracking what the System Data Mover (SDM) reader had or had not yet processed.
With DS8000 R4.1, the "incremental resync" function, along with IBM HyperSwap, requires site B to send and resync only the data that was in flight when the outage occurred. When you compare the difference in sending this limited amount of in-flight data with the traditional complete volume of data, you can see how "Incremental Resync" can resynchronize the data 95% faster, and also greatly decrease your bandwidth requirements. This reduces the risk in case a subsequent outage occurs.
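The arithmetic behind a saving of this size is straightforward, as the Python sketch below shows. All the numbers in it are hypothetical: I assume a 10,000 GB volume set, 500 GB of in-flight data (5 percent), and a 1 Gbps link between site B and site C. Only the "95% faster" figure comes from IBM's announcement.

```python
def transfer_hours(data_gb: float, link_gbps: float) -> float:
    """Hours to send data_gb gigabytes over a link of link_gbps gigabits/sec."""
    seconds = data_gb * 8 / link_gbps
    return seconds / 3600


def resync_saving_pct(full_gb: float, inflight_gb: float) -> float:
    """Percentage of transfer avoided by sending only the in-flight data."""
    return 100 * (full_gb - inflight_gb) / full_gb


if __name__ == "__main__":
    full_gb, inflight_gb, link_gbps = 10_000, 500, 1.0  # hypothetical values
    print(f"full resync:        {transfer_hours(full_gb, link_gbps):.1f} hours")
    print(f"incremental resync: {transfer_hours(inflight_gb, link_gbps):.1f} hours")
    print(f"saving:             {resync_saving_pct(full_gb, inflight_gb):.0f} percent")
```

The saving depends entirely on how much data was in flight at the time of the outage, so your own ratio may be quite different.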
Introduced originally in 1997 as the IBM Virtual Tape Server (VTS), the [IBM System Storage TS7700] series supports Grid capability to replicate tape image data across locations. Here's a quick recap of today's announcement:
Existing TS7740 can be upgraded up to 9TB of disk cache. New models can have up to 13TB of disk cache.
A new "tape-less" TS7720 that has up to 70TB of disk cache.
Integrated Library Management support. I discussed [Integrated Removable Media Manager (IRMM)] before, and this is basically IRMM inside. For those with TS3500 tape libraries, this support eliminates the need for a separate IBM 3953 L05 Library Manager.
TS1130 back-end tape drive support. These are the fastest 1TB drives in the industry, with support for built-in encryption, and now can be used as the physical tape back-end for the virtual tape TS7740 repository.
While our competitors might be boarding up their windows in preparation for the economic downturn in the USA, IBM continues to generate solid results. San Jose Mercury News has an article that discusses this, titled [IBM's 3Q profit strong on global sales]. There has never been a better time to buy from, or invest in, IBM!
EMC Corporation (NYSE:EMC) today announced it has been positioned as a leader in the Forrester Wave™: Enterprise Open Systems Virtual Tape Library (VTL), Q1 2008 by Forrester Research, Inc. (January 31, 2008), an independent market and technology research firm. EMC achieved a position as a leader in the Forrester Wave report on virtual tape libraries based on the largest installed base of the EMC® Disk Library family of systems, its broad ecosystem interoperability. Virtual tape libraries emulate tape drives and work in conjunction with existing backup software applications, enabling fast backup and restoration of data by using high-capacity, low-cost disk drives.
EMC was the first major vendor in the open systems virtual tape library market as it introduced the EMC Disk Library in April 2004 and today is a leading provider of open systems virtual tape solutions, with systems that are designed for businesses and organizations of all sizes.
While the press release implies that "EDL equals VTL", Chuck tries to explain they are in fact very different. Here is an excerpt from his blog post:
Virtual Tape Libraries vs. Disk Libraries
As many of you know, VTLs have been around for a while. They use disk as a cache -- they buffer the incoming backup streams, do some housekeeping and stacking, then turn around and write tape efficiently. When you go to restore, you're usually coming back off of tape, unless the backup image in question is sitting in the disk cache.
Now, there is nothing wrong with the VTL approach, but it was conceived in a time when disks were horribly expensive. It was also pretty clear to many of us that disks were going to be a whole lot cheaper in the near future, and this fundamental assumption wouldn't be valid for much longer.
I kept thinking in terms of disk as a direct target for a backup application. No modifications to the backup application. Native speed of sequential disks for both backup and restore. Tape positioned as a backup to the backup. Use the strengths of the underlying array (e.g. CLARiiON) for performance, availability, management, etc.
We ended up calling the concept a "disk library" to differentiate from the VTLs that had come before it. It was a different value proposition and offering, based on the emergence of lower-cost disk media.
... It's nice to see we're at 1,100+ customers, and still going strong.
For those new to the blogosphere, there is a difference between "Press Releases" as formal corporate communications versus "Blog Posts" which are informal opinions of the individual blogger, which may or may not match exactly the views of their respective employer. As we've learned many times before, one should not treat terms like "first" or "leader" in corporate press releases literally! Let's explore each.
Was EDL the first "open systems" Virtual Tape Library?
This is implied by the Forrester report. Chuck mentions the "VTLs that had come before it" in his blog, and many people are aware that IBM and StorageTek had introduced mainframe-attached VTLs in the 1990s. But what about VTL for "open systems"?
(Hold aside for the moment that the IBM System z mainframe is an open system itself, with z/OS certified as a bona fide UNIX operating system by [the Open Group] standards body. Most analysts and research firms usually refer only to the non-mainframe versions of UNIX and Windows. Alternative definitions for "open systems" can be found in [Web definitions or Wikipedia]. I will assume Forrester meant non-mainframe servers.)
IBM announced AIX non-mainframe attachment via SCSI connectivity to the IBM 3494 Virtual Tape Server (VTS) on Feb 16, 1999, with general availability on May 28, 1999. That's nearly FIVE YEARS before the April 2004 introduction of EDL. IBM VTS support for Sun Solaris and Microsoft Windows came shortly thereafter in November 2000, and support for HP-UX a bit later in June 2001. One of my 17 patents is for the software inside the IBM 3494 VTS, so like Chuck, I can take some pride in the success of the product.
(I don't remember if StorageTek, which was subsequently acquired by Sun, had ever supported non-mainframe operating systems with their Virtual Storage Manager [VSM] offering, but if they did, I am sure it was also before EMC.)
Last week, another EMC blogger, BarryB (aka [the Storage Anarchist]), took me to task in comments on my post [IBM now supports 1TB SATA drives]. He felt that IBM should not claim support, given that the software inside the IBM System Storage N series is developed by NetApp. He compared this to the situation of HP and Sun re-badging the HDS USP-V disk system. If someone else wrote the software, BarryB opines, IBM should not claim credit for it. I tried to explain how IBM provides added value and has full-time employees dedicated to N series development and support, but doubt I have changed his mind.
Why do I bring that up? Because the EMC Disk Library runs OEM software from FalconStor. Basically EMC is assembling a hardware/software solution with components provided from OEM suppliers. Hmmm? Sound familiar? Who is calling the kettle black?
If there is a clear winner here, it is FalconStor itself. Perhaps one of the worst kept industry secrets is that FalconStor software is also used in VTL offerings from Sun, Copan, and IBM, the latter embodied as the [IBM TS7520 Virtualization Engine] offering. If you like the concept of an EDL, but prefer instead one-stop shopping from an "information infrastructure" vendor, IBM can offer the TS7520 along with servers, software and services for a complete end-to-end solution.
Can EMC claim to be "a leader" in Virtual Tape Libraries?
During the measured quarter, IBM shipped its 10 millionth LTO-4 tape drive cartridge to Getty Images, the world's leading creator and distributor of still imagery, footage and multi-media products, as well as a recognized provider of other forms of premium digital content, including music. Getty Images is using the LTO-4 drives as part of a tiered infrastructure of IBM disk and tape solutions that help support the backup needs of their digital imagery;
IBM shipped more than 1,500 Petabytes of tape storage in Q3'07 alone;
During Q3'07, IBM shipped the 10,000th IBM System Storage TS3500 Tape Library. The TS3500 is a highly scalable tape library with support from 1 to 192 tape drives and up to 6,400 cartridge slots for open system, mainframe and virtual tape system attachment.
Let's take a look at the numbers. IBM has sold over 5,400 virtual tape libraries. Sun/STK has sold over 4,000 virtual tape libraries. Both are drastically more than the 1,100 mentioned in Chuck's post. Does IDC recognize EMC in third place? No, EMC chooses instead to declare EDL as disk arrays (probably to prop up their IDC "Disk Tracker" numbers), so they don't even earn an honorable mention under the virtual tape library category. This of course includes the number of mainframe-attached models from IBM and Sun/STK. So, if EMC did call these tape systems instead, they might show up in third place, and as such EMC could claim to be "a leader" in much the same way an athlete can claim to be an "Olympic medalist" winning the bronze for third place. (If you limit the count to just the FalconStor-based models from IBM, EMC, Sun and Copan, then EMC moves up to first or second, but then press release titles like "EMC a Leader in FalconStor-based non-mainframe Virtual Tape Libraries" can get too confusing.)
Chuck, if you are reading this, I feel you have every right to celebrate your involvement with the EDL. Despite having common software and hardware components, both IBM and EMC can rightfully declare their own unique value-add through their respective VTL offerings. Like the IBM N series, the EMC Disk Library is not diminished by the fact the software was written by someone else. BarryB might disagree.
You may not be the right person to ask but I am asking everyone so "How do you see hybrid disk drives?"
(For the record, I am not immediately related to Robert. At one point, "Pearson" was the 12th most common surname in the USA, but now doesn't even make the Top 100.)
Robert, I would like to encourage you and everyone else to ask questions. Don't worry if I am the wrong person to ask, as I probably know the right person within IBM. Some people have called me the "Kevin Bacon" of Storage, as I am often less than six degrees away from the right person, having worked in IBM Storage for over 20 years.
For those not familiar with hybrid drives, there is a good write-up in Wikipedia.
Unfortunately, most of the people I would consult on this question, such as those from Market Intelligence or Research, are on vacation for the holidays, so, Robert, I will have to rely on my trusted 78-card Tarot deck and answer you with a five-card throw.
Your first card, Robert, is the Hermit. This card represents "introspection". The best I/O is no I/O, which means that if applications can keep the information they need inside server memory, you can avoid the bus bandwidth limitations of going to external storage devices. Where external storage makes sense is when data is shared between servers, or when the single server is limited to a set amount of internal memory. So, consider maxing out the memory in your server first (IBM would be glad to sell you more internal memory!!!), then consider outside solid-state or hybrid devices. 32-bit Windows, for example, has an architectural limit of 4GB.
Your second card, Robert, is the Four of Cups, representing "apathy". On the card, you see three cups together, with the fourth cup being delivered from a cloud. This reminds me that we have three storage tiers already (memory, disk, tape), and introducing a fourth tier into the mix may not garner much excitement. For the mainframe, IBM introduced a Solid-State Device, called the Coupling Facility, which can be accessed from multiple System z servers. It is used heavily by DFSMS and DB2 to hold shared information. However, given some customers' apathy towards Information Lifecycle Management, which includes "tiered storage", introducing yet another tier that forces people to decide what data goes where may be another challenge.
Your third card, Robert, is the Chariot, which represents "Speed, Determination, and Will". In some cases, solid state disks are faster for reading, but can be slower for writing. In the case of a hybrid drive, where the memory acts as a front-end cache, read-hits would be faster, but read-misses might be slower. While the idea of stopping the drives during inactivity will reduce power consumption, spinning up and slowing down the disk may incur additional performance penalties. At the time of this post, the fastest disk system remains the IBM SAN Volume Controller, based on SPC-1 and SPC-2 benchmarks in excess of those published for other devices.
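The trade-off between read-hits, read-misses and spin-up delays can be sketched as a weighted average. This is a simplified model of my own, and every latency and ratio below is a hypothetical placeholder rather than a measurement of any real hybrid drive.

```python
def average_read_ms(hit_ratio: float,
                    cache_ms: float,
                    disk_ms: float,
                    miss_penalty_ms: float = 0.0,
                    spin_up_ms: float = 0.0,
                    spun_down_fraction: float = 0.0) -> float:
    """Expected read latency for a drive with a front-end cache.

    A read-hit costs cache_ms. A read-miss costs the normal disk access,
    plus any extra cache-lookup penalty, plus the spin-up delay weighted
    by how often the platters are found stopped.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    if not 0.0 <= spun_down_fraction <= 1.0:
        raise ValueError("spun_down_fraction must be between 0 and 1")
    miss_ms = disk_ms + miss_penalty_ms + spun_down_fraction * spin_up_ms
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * miss_ms


# Hypothetical numbers for illustration only.
plain_disk = average_read_ms(0.0, cache_ms=0.1, disk_ms=8.0)
hybrid_busy = average_read_ms(0.5, cache_ms=0.1, disk_ms=8.0)
hybrid_idle = average_read_ms(0.5, cache_ms=0.1, disk_ms=8.0,
                              spin_up_ms=3000.0, spun_down_fraction=0.1)

print(f"Plain disk:                {plain_disk:8.2f} ms")
print(f"Hybrid, platters spinning: {hybrid_busy:8.2f} ms")
print(f"Hybrid, often spun down:   {hybrid_idle:8.2f} ms")
```

With these made-up inputs, the hybrid drive beats the plain disk only while its platters stay spinning. Once a tenth of the misses have to wait for a spin-up, the average is far worse than the plain disk.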
Your fourth card, Robert, is the Eight of Pentacles, which represents "Diligence, Hard work". The pentacles are coins with five-sided stars on them, and this often represents money. Our research team has projected that spinning disk will continue to be a viable and profitable storage media for at least another eight years.
Your fifth and last card, Robert, is the World, which normally represents "Accomplishment", but since it is turned upside down, the meaning is reversed to "Limitation". Some hybrid disks, and some types of solid state memory in general, do have limitations in the number of write cycles they can handle. Those unhappy with the frequency and slowness of rebuilds on SATA disk may find similar problems with hybrid drives. For that reason, businesses may not trust using hybrid drives for their busiest, mission-critical applications, but certainly might use them for archive data with lower write-cycle requirements.
The tarot cards are never wrong, but certainly interpretations of the cards can be.
Continuing my ongoing discussion on Solid State Disk (SSD), fellow blogger BarryB (EMC) points out in his [latest post]:
Oh – and for the record TonyP, I don't think I ever said EMC was using a newer or different EFDs than IBM. I just asserted that EMC knows more than IBM about these EFDs and how they actually work a storage array under real-world workloads.
(Here "EFD" refers to "Enterprise Flash Drive", EMC's marketing term for Single-Level Cell (SLC) NAND Flash non-volatile solid-state storage devices. Both IBM and EMC have been selling solid-state storage for quite some time now, but EMC felt that a new term was required to distinguish the SLC NAND Flash devices sold in their disk systems from solid-state devices sold in laptops or blade servers. The rest of the industry, including IBM, continues to use the term SSD to refer to these same SLC NAND Flash devices that EMC is referring to.)
Although STEC asserts that IBM is using the latest ZeusIOPS drives, IBM is only offering the 73GB and 146GB STEC drives (EMC is shipping the latest ZeusIOPS drives in 200GB and 400GB capacities for DMX4 and V-Max, affording customers a lower $/GB, higher density and lower power/footprint per usable GB.)
Here is where I enjoy the subtleties between marketing and engineering. Does the above seem like he is saying EMC is using newer or different drives? What are typical readers expected to infer from the statement above?
That there are four different drives from STEC, in four different capacities. In the HDD world, drives of different capacities are often different, and larger capacities are often newer than those of smaller capacities.
That the 200GB and 400GB are the latest drives, and that 73GB and 146GB drives are not the latest.
That STEC press release is making false or misleading claims.
Uncontested, some readers might infer the above and come to the wrong conclusions. I made an effort to set the record straight. I'll summarize with a simple table:
Usable (conservative format)
Usable (aggressive format)
So, we all agree now that the 256GB drives that are formatted as 146GB or 200GB are in fact the same drives, that IBM and EMC both sell the latest drives offered by STEC, and that the STEC press release was in fact correct in its claims.
I also wanted to emphasize that IBM chose the more conservative format on purpose. BarryB [did the math himself] and proved my key points:
Under some write-intensive workloads, an aggressive format may not last the full five years. (But don't worry, BarryB assures us that EMC monitors these drives and replaces them when they fail within the five years under their warranty program.)
Conservative formats with double the spare capacity happen to have roughly double the life expectancy.
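For those who want to check the "double the spare capacity" point, here is a minimal sketch using the 256GB drive mentioned above, formatted as either 146GB or 200GB. Treating life expectancy as proportional to spare capacity is a simplifying assumption of this sketch, not a formula published by STEC, IBM or EMC.

```python
RAW_GB = 256  # raw capacity of the drive, per the discussion above


def spare_gb(raw_gb: int, usable_gb: int) -> int:
    """Capacity held back from the host as spare."""
    if usable_gb > raw_gb:
        raise ValueError("usable capacity cannot exceed raw capacity")
    return raw_gb - usable_gb


conservative_spare = spare_gb(RAW_GB, 146)
aggressive_spare = spare_gb(RAW_GB, 200)
spare_ratio = conservative_spare / aggressive_spare

print(f"Conservative format spare: {conservative_spare} GB")
print(f"Aggressive format spare:   {aggressive_spare} GB")
print(f"Ratio:                     {spare_ratio:.2f}x")
```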
I agree with BarryB that an aggressive format can offer a lower $/GB than the conservative format. Cost-conscious consumers often look for less-expensive alternatives, and are often willing to accept less-reliable or shorter life expectancy as a trade-off. However, "cost-conscious" is not the typical EMC targeted customer, who often pays a premium for the EMC label. To compensate, EMC offers RAID-6 and RAID-10 configurations to provide added protection. With a conservative format, RAID-5 provides sufficient protection.
(Just so BarryB won't accuse me of not doing my own math, a 7+P RAID-5 using conservative format 146GB drives would provide 1022GB of capacity, versus a 4+4 RAID-10 configuration using aggressive format 200GB drives, which provides only 800GB total.)
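The math in the parenthetical above can be reproduced with a short sketch. The function name and the simplified capacity rules (one drive of parity for RAID-5, two for RAID-6, half the drives for RAID-10, no hot spares or formatting overhead) are my own illustration.

```python
def usable_gb(raid_level: str, total_drives: int, drive_gb: int) -> int:
    """Usable capacity of one RAID rank, ignoring spares and overhead."""
    if raid_level == "RAID-5":
        data_drives = total_drives - 1   # one drive's worth of parity
    elif raid_level == "RAID-6":
        data_drives = total_drives - 2   # two drives' worth of parity
    elif raid_level == "RAID-10":
        if total_drives % 2:
            raise ValueError("RAID-10 needs an even number of drives")
        data_drives = total_drives // 2  # every drive is mirrored
    else:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    if data_drives < 1:
        raise ValueError("not enough drives for this RAID level")
    return data_drives * drive_gb


# Eight drives in both cases: 7+P RAID-5 versus 4+4 RAID-10.
conservative_raid5 = usable_gb("RAID-5", 8, 146)
aggressive_raid10 = usable_gb("RAID-10", 8, 200)

print(f"7+P RAID-5, 146GB drives:  {conservative_raid5} GB")
print(f"4+4 RAID-10, 200GB drives: {aggressive_raid10} GB")
```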
In an ideal world, you the consumer would know exactly how many IOPS your application will generate over the next five years, exactly how much capacity you will require, be offered all three drives in either format to choose from, and make a smart business decision. Nothing, however, is ever this simple in IT.
"EqualLogic didn’t get 2,000 customers because people were dying to use iSCSI. It got them because it built systems that scale dynamically and because a system the size of Montana can be managed by someone as clueless as my ex-wife."
As with any acquisition, people might be asking if this is a "match made in heaven" that makes strong business sense, or another HP-Compaq debacle. Back in September, I posted [Supermarkets and Specialty Shops] to explain how the storage marketplace has two market segments. Internally, IBM distinguishes between "clients" and "customers". Clients are those that buy services and complete solutions from a one-stop systems vendor, such as IBM, HP, Sun, or Dell, or systems integrator like IBM, CSC or EDS. Customers are those that buy products and components, from the systems vendors I just mentioned, as well as from individual specialty shops, like EMC, HDS, or NetApp.
To reach the growing "supermarket" segment, specialty shops are dependent on systems vendors to OEM or resell their kit: EMC disk through Dell, HDS disk through Sun and HP, NetApp through IBM. Until now, EqualLogic had to make their living as a "specialty" shop, but iSCSI appeals more to SMB than large enterprises, and SMB tend to be in the "supermarket" segment, so they partnered with Sun. Here is the timeline of this likely awkward and strained relationship:
I am not surprised that I haven't seen anything in the blogosphere yet from HP, Dell or Sun. I suspect this news means that Sun won't be reselling Dell's EqualLogic boxes anymore, and perhaps there is nothing more for Sun bloggers Randy Chalfant or Nigel Dessau to add to that. HP and Dell are practically non-existent in the storage blogosphere, so I didn't expect much from them either.
I did, however, expect EMC to put in their spin, given that Dell resells EMC disk, and accounts for perhaps 15% of their revenues. Now that Dell has multiple offerings, they will be instructing their channel reps when to lead with EqualLogic versus when to sell EMC, for now, until 2011, at which point they may simplify their storage sales model to just EqualLogic. I don't know if Dell would do that in 2011. Depending on how quickly the decline happens, EMC may have to increase the prices of their gear, or cut into their development budgets, to make up for this loss.
I started this post because of a comment from EMC blogger Chuck Hollis, who speculates how this will impact [Dell, EqualLogic and EMC]. In that post, he expresses his opinion (which I will put into a different color):
"Speculation is pretty evenly split. Neither HP nor IBM have a good, entry-level iSCSI product."
If he had left out the word "good", then that would just be a false statement, but adding the word "good" reduces this to merely an opinion of IBM products that I disagree with. (I have no experience with whatever HP sells in this category, nor have I talked to any customers about their experiences, so I will neither agree nor disagree with Chuck's opinion of the HP half of his statement.) As for the term "Entry-level", this is fairly well defined by analysts as a storage system under $50,000 US Dollars. Actually, IBM has three good offerings.
Our basic, lowest-price model is the IBM System Storage DS3300, which does iSCSI only, like the EqualLogic offerings. This supports both SAS and SATA disks, and can attach to our System x and System p server product lines.
Our smallest model of our fancier IBM System Storage N series not only supports iSCSI, but also CIFS, NFS, HTTP, FTP, and FCP protocols, what we call "Unified Storage". The iSCSI feature is included at no additional charge, and small customers can start with this, then scale up to larger N3600, N5000 or N7000 models, and add more protocols and software features, as their business grows.
Our next larger model, but still entry-level, is the N3600. Since the N series supports a unified multi-protocol platform, with features like SnapLock for regulatory compliance and SnapMirror for remote disk mirroring, it easily replaces any mix of EMC "C-boxes": Centera, Celerra, and CLARiiON.
Both the DS3300 and the N series support the various Business Applications I have discussed this week: Microsoft Exchange, Lotus Domino, SAP, Oracle, Siebel, JD Edwards and PeopleSoft. N series offers SnapManager for various applications to make the business value even that much better.
Chuck speculates that Dell did this to compete better against rival HP, but that doesn't make sense, since he feels HP didn't have much to offer in this space. Perhaps Dell did this to compete better against IBM, the number one vendor in storage hardware, according to IDC. Looking at what IBM and NetApp have to offer, Dell may have realized that they didn't have competitive disk systems from their reselling relationship with EMC, looked elsewhere and found EqualLogic. Meanwhile, EqualLogic probably felt that Sun was going out of business, or not yet fully supportive of IP SAN environments, and decided to ["switch horses midstream"].
Continuing my week in Chicago for the IBM Storage and Storage Networking Symposium and System x and BladeCenter Technical Conference, I presented a variety of topics.
Hybrid Storage for a Green Data Center
The cost of power and cooling has risen to be a #1 concern among data centers. I presented the following hybrid storage solutions that combine disk with tape. These provide the best of both worlds, the high performance access time of disk with the lower costs and reduced energy consumption of tape.
IBM [System Storage DR550] - IBM's Non-erasable, Non-rewriteable (NENR) storage for archive and compliance data retention
IBM Grid Medical Archive Solution [GMAS] - IBM's multi-site grid storage for PACS applications and electronic medical records [EMR]
IBM Scale-out File Services [SoFS] - IBM's scalable NAS solution that combines a global name space with a clustered GPFS file system, serving as the ideal basis for IBM's own [Cloud Computing and Storage] offerings
Not only do these help reduce energy costs, they provide an overall lower total cost of ownership (TCO) than traditional WORM optical or disk-only storage configurations.
The Convergence of Networks - Understanding SAN, NAS and iSCSI in the Data Center Network
This turned out to be my most popular session. Many companies are at a crossroads in choosing data and storage networking solutions in light of recent announcements from IBM and others. In the span of 75 minutes, I covered:
Block storage concepts, storage virtualization and RAID levels
File system concepts, how file systems map files to block storage
Network Attached Storage, the history of the NFS and CIFS protocols, Pros and Cons of using NAS
Storage Area Networks, the history of SAN protocols including ESCON, FICON and FCP, Pros and Cons of using SAN
IP SAN technologies, iSCSI and Fibre Channel over Ethernet (FCoE), Pros and Cons of using this approach
Network Convergence with Infiniband and Fibre Channel over Convergence Enhanced Ethernet (FCoCEE), why Infiniband was not adopted historically in the marketplace as a storage protocol, and the features and enhancements of Convergence Enhanced Ethernet (CEE) needed to merge NAS, SAN and iSCSI traffic onto a single converged data center network [DCN]
Yes, it was a lot of information to cover, but I managed to get it done on time.
IBM Tivoli Storage Productivity Center version 4.1 Overview and Update
In conferences like these, there are two types of product-level presentations. An "Overview" explains how products work today to those who are not familiar with them. An "Update" explains what's new in this version of the product for those who are already familiar with previous releases. I decided to combine these into one session for IBM's new version of [Tivoli Storage Productivity Center]. I was one of the original lead architects of this product many years ago, and was able to share many personal experiences about its evolution in development and in the field at client facilities. Analysts have repeatedly rated IBM Productivity Center as one of the top Storage Resource Management (SRM) tools available in the marketplace.
Information Lifecycle Management (ILM) Overview
Can you believe I have been doing ILM since 1986? I was the lead architect for DFSMS which provides ILM support for z/OS mainframes. In 2003-2005, I spent 18 months in the field performing ILM assessments for clients, and now there are dozens of IBM practitioners in Global Technology Services and STG Lab Services that do this full time. This is a topic I cover frequently at the IBM Executive Briefing Center [EBC], because it addresses several top business challenges:
Reducing costs and simplifying management
Improving efficiency of personnel and application workloads
Managing risks and regulatory compliance
IBM has a solution based on five "entry points". The advantage of this approach is that it allows our consultants to craft the right solution to meet the specific requirements of each client situation. These entry points are:
Tiered Information Infrastructure - we don't limit ourselves to just "Tiered Storage" as storage is only part of a complete [information infrastructure] of servers, networks and storage
Storage Optimization and Virtualization - including virtual disk, virtual tape and virtual file solutions
Process Enhancement and Automation - an important part of ILM is the set of policies and procedures, such as IT Infrastructure Library [ITIL] best practices
Archive and Retention - space management and data retention solutions for email, database and file systems
I did not get as many attendees as I had hoped for this last one, as I was competing head-to-head in the same time slot as Lee La Frese covering IBM's DS8000 performance with Solid State Disk (SSD) drives, John Sing covering Cloud Computing and Storage with SoFS, and Eric Kern covering IBM Cloudburst.
I am glad that I was able to make all of my presentations at the beginning of the week, so that I can then sit back and enjoy the rest of the sessions as a pure attendee.
Last week, I was in Austin, and had dinner at [Rudy's Country Store and BBQ]. They offer their self-proclaimed "Worst BBQ in Austin!" with brisket, sausage and other meats by weight. I got a beer, some potato salad, and creamed corn, all at additional cost, of course. When I went to the cashier to pay, I was offered all the white bread I wanted at no additional charge. Are you kidding me? You are going to charge me for beer, but give me 8 to 12 complimentary slices of white bread (practically half a loaf)? Honestly, I consider bread and beer to be basically the same functional food item, differing only in solid versus liquid form. I chose to have only four slices. The food was awesome!
I am reminded of that from my latest exchange with EMC. It didn't take long after IBM's announcement yesterday of IBM's continued investment in its strategic product set, IBM System Storage DS8000 series, that competitors responded. In particular, fellow blogger BarryB from EMC has a post [DS8000 Finally Gets Thin Provisioning] that pokes fun at the new Thin Provisioning feature.
Interestingly, the attack is not on the technical implementation, which is straightforward and rock-solid, but rather that the feature is charged at a flat rate of $69,000 US dollars (list price) per disk array. BarryB claims that recently EMC Corporate has decided to reduce the price of their own thin provisioning, called Symmetrix Virtual Provisioning (VP), on a select subset of models of their storage portfolio, although I have not found an EMC press release to confirm. In other words, EMC will bury the cost of thin provisioning into the total cost for new sales, and stop shafting, er.. over-charging their existing Symmetrix customers that are interested in licensing this feature.
BarryB claims this was a lucky coincidence that his blog post happened just days before IBM's announcement.
(Update: While the timing appears suspicious, I am not accusing Mr. Burke of any wrongdoing or insider information of IBM's plans, nor am I aware of any investigations on this matter from the SEC or any other government agency, and apologize if my previous attempt at humor suggested otherwise. BarryB claims that the reduction in price was motivated to counter HDS's publicly announced "Switch It On" program, and that it is not a secret that EMC reduced VP pricing weeks ago, effective beginning 3Q09, just not widely advertised in any formal EMC press releases. Perhaps this new VP pricing was only disclosed to just EMC's existing Symmetrix customers, Business Partners, and employees. Perhaps EMC's decision not to announce this in a Press Release was to avoid upsetting all the EMC CLARiiON customers that continue to pay for Thin Provisioning, or to avoid a long line of existing VP customers asking for refunds. In any case, people are innocent until proven otherwise, and BarryB rightfully deserves the presumption of innocence in this regard. I'm sorry, BarryB, for any trouble my previous comments may have caused you.)
Instead, let's explore some events over the past year that have led up to this.
Let's start with what EMC previously charged for this feature. Software features like this often follow a common pricing method, based per TB, so larger configurations pay more, but tiered in a manner that larger configurations pay less per TB, combined with a yearly maintenance cost.
(Updated: EMC has asked me nicely not to post their actual list prices, so I will provide rough estimates instead. According to BarryB, these are no longer the current prices, so I present them as historical figures for comparison purposes only.)
Initial List price
Software Maintenance (SWMA) percentage
Software Maintenance per year
Number of years
Software License Cost (4 years)
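The pricing method described above (a per-TB price, tiered so that larger configurations pay less per TB, plus a yearly maintenance percentage over four years) can be sketched as follows. Every tier boundary, price and percentage here is a hypothetical placeholder of my own, not an EMC or IBM list price, and charging maintenance in each of the four years is also an assumption.

```python
# (upper bound of tier in TB, price per TB in USD) -- hypothetical values
TIERS = [(10.0, 5000.0), (50.0, 4000.0), (float("inf"), 3000.0)]


def initial_list_price(capacity_tb: float, tiers=TIERS) -> float:
    """Sum the price of each tier's slice of the total capacity."""
    total = 0.0
    lower = 0.0
    for upper, per_tb in tiers:
        if capacity_tb <= lower:
            break
        total += (min(capacity_tb, upper) - lower) * per_tb
        lower = upper
    return total


def license_cost(capacity_tb: float, swma_pct: float, years: int) -> float:
    """Initial price plus software maintenance for the given years."""
    initial = initial_list_price(capacity_tb)
    maintenance_per_year = initial * swma_pct
    return initial + maintenance_per_year * years


for tb in (20.0, 100.0):
    cost = license_cost(tb, swma_pct=0.15, years=4)
    print(f"{tb:5.0f} TB: ${cost:,.0f} over 4 years, "
          f"${initial_list_price(tb) / tb:,.0f} per TB up front")
```

Under these placeholder tiers, the larger configuration pays more in total but less per TB, which is the shape of the pricing curve described above.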
Holy cow! How did EMC get away charging so much for this? To be fair, these are often deeply discounted, a practice common among the industry. However, it was easy for IBMers to show EMC customers that putting SVC or N series gateways in front of their existing EMC disks was more cost effective. Both SVC and N series, as well as IBM's XIV, provide thin provisioning at no additional charge.
HDS offers their own thin provisioning called Hitachi Dynamic Provisioning. Hitachi also offers an SVC-like capability to virtualize storage behind the USP-V. However, I suspect that fewer than 10 percent of their install base actually licensed this capability because it cost so much. Under the cost pressure from IBM's thin provisioning capabilities in SVC, XIV and N series, Hitachi launched its ["Switch It On"] marketing campaign to activate virtualization and provide some features at no additional charge, including the first 10TB of Hitachi Dynamic Provisioning.
Last week, Martin Glassborow on his StorageBod blog, argued that EMC and HDS should [Set the Wide Stripes Free]. Here is an excerpt:
HDS and EMC are both extremely guilty in this regard, both Virtual Provisioning and Dynamic Provisioning cost me extra as an end-user to license. But this is the technology upon which all future block-based storage arrays will be built. If you guys want to improve the TCO and show that you are serious about reducing the complexity to manage your arrays, you will license for free. You will encourage the end-user to break free from the shackles of complexity and you will improve the image of Tier-1 storage in the enterprise.
Martin is using the term "free" in two contexts above. In the Linux community, we are careful to clarify "free, as in free speech" or "free, as in free beer". Technically, EMC's virtual provisioning is neither, as one has to purchase the hardware to get the feature, so the term "at no additional charge" is more legally correct.
However, the discussion of "free beer" brings me back to my first paragraph about Rudy's BBQ. Nearly everyone eats bread, with the exception of those with [Celiac Disease] that causes an intolerance for gluten protein in wheat, so burying the cost of white bread in the base cost of the BBQ meat is reasonable. In contrast, not everyone drinks beer, and there are probably several people who would complain if the cost of beer was included in the cost of the BBQ meat, so charging separately for beer makes business sense.
The same applies in the storage industry. When all (or most) customers of a product can benefit from a feature, it makes sense to include it at no additional charge. When a significant subset might not want to pay a higher base price because they won't use or benefit from a feature, it makes sense to make it optionally priced.
For the IBM SVC, XIV and N series, all customers can benefit from thin provisioning, so it is included at no additional charge.
For the IBM System Storage DS8000, perhaps some 30 to 40 percent of our clients have only System z and/or System i servers attached, and therefore would not benefit from this new thin provisioning. It may seem unfair to raise the price on everybody. The $69,000 flat rate was competitively priced against the prices EMC, HDS and 3PAR were charging for similar capability, and lower than the cost to add a new SVC cluster in front of the DS8000. IBM also charges an annual maintenance, but far lower than what others charged as well.
(Note: These list prices are approximate, and vary slightly based on whether you are on legacy, ESA, Servicesuite or ServiceElect software and subscription (S&S) service plans, and the machine type/model. The tables were too complicated to include here in this post, so these numbers are rounded for comparison purposes only.)
IBM flat rate
Software Maintenance per year (approx)
Number of years
Software License Cost (4 years)
Pricing is more art than science. Getting the right pricing structure that appears fair to everyone involved can be a complicated process.
Well, it's Tuesday, and you know what that means? IBM announcements!
Today we had several for the IBM System Storage product line. Here are some of them:
DS8000 gets thinner, leaner and faster
The 4.3 level of microcode for the IBM System Storage DS8000 series disk systems [announced enhancements] for both fixed block architecture (FBA) LUNs and count key data (CKD) volumes.
For FBA LUNs that attach to Linux, UNIX and Windows distributed systems, IBM announced DS8000 Thin Provisioning native support. Of course, many people already had this by putting IBM System Storage SAN Volume Controller (SVC) in front, but now DS8000 clients out there without SVC can also achieve the benefits of thin provisioning. This support also makes quick initialization a whopping 2.6 times faster.
For CKD volumes attached to z/OS on System z mainframes, IBM announced zHPF multitrack support for z/OS 1.9 and above. zHPF provides High Performance FICON, and can now handle multitrack I/O transfers for even better performance for zFS, HFS, PDSE, and extended striped data sets.
XIV gets better connected
A lot of XIV [announced enhancements] and preview announcements centered around better connectivity. Here's a rundown:
Better host attachment connectivity by beefing up the interface modules that hold the FCP and iSCSI interface cards. XIV disk arrays have 3 to 6 of these in different configurations, and since they manage their own disks as well as receive host I/O requests for other disks, they are basically doing double-duty. These interface modules can now be ordered as [Dual-CPU] modules.
Better infrastructure management by connecting XIV with the industry standard SMI-S interface to IBM Tivoli Storage Productivity Center. Now, XIV can be part of the single pane of glass console that manages all of your other disk arrays, tape libraries and SAN fabrics.
Better copy services for backups by connecting XIV with IBM Tivoli Storage Manager Advanced Copy Services. TSM for Advanced Copy Services is application aware and can coordinate XIV Snapshots similar to its current support for SVC and DS8000 FlashCopy capabilities.
Better connectivity to security systems by supporting LDAP credentials. Before, you had individual userid and passwords for each XIV, and these were probably different than all the other userid/password combinations you have for every other box on your data center floor. IBM is working on getting all products to support the Lightweight Directory Access Protocol, or [LDAP], so that we can reach the nirvana of "single sign-on", one userid/password per administrator for all IT devices in the company.
Better support with flexible warranty periods and non-disruptive code load options.
Better remote copy support by connecting to sites far, far away. IBM previewed that it will provide asynchronous disk mirroring from one XIV to another XIV natively. Before this, XIV's synchronous mirroring was limited to 300km distances. Many of our clients do long distance global mirroring of their XIV today behind an SVC, but again, for those out there that don't yet have an SVC, this can be a reasonable alternative.
TS7650 ProtecTIER data deduplication appliance now offers "no dedupe" option
In what some might consider a surprising move, IBM announced a "no dedupe" licensing option on their premier deduplication solution, which somewhat reminds me of IBM's NOCOPY option on DS8000 FlashCopy. At first I thought "Are you kidding me?!?!" However, this new license option allows the TS7650 appliance to compete on an even playing field with other virtual tape libraries (VTL) that do not offer deduplication capability. It also allows TS7650 to be used for data that doesn't dedupe very well, such as seismic recordings, satellite images, or what have you. There are also clients who do not yet feel comfortable deduping their financial records for compliance reasons. This option now allows IBM to withdraw from marketing the TS7530 non-dedupe library. Having one technology that does both dedupe and no-dedupe is better than offering two separate libraries based on different technologies.
The ProtecTIER series also announced [IP remote distance replication]. This can be used to replicate virtual tape cartridges in one ProtecTIER over to another ProtecTIER at a remote location. You can decide to replicate all or just a subset of your virtual tapes, and this feature can be used to migrate, merge or split ProtecTIER configurations as your needs grow. Before this support, our TS7650G clients replicated the disk repository using native disk array replication technology, such as Global Mirror on the DS8000, but that meant that all data was replicated over to the secondary site. Now, with this new IP replication feature, you can be selective, and replicate only those virtual tapes that are mission critical.
The appliance now supports up to 36TB of disk capacity, and the new "IBM i" operating system on System i servers, formerly known as i5/OS.
GPFS does Windows
IBM's General Parallel File System (GPFS) has the lion's share of the market for file systems used in the [Top 500 Supercomputers]. For a while, it was limited to just Linux and AIX operating system support, but version 3.3 [extends this to Windows 2008 on 64-bit architectures]. GPFS is the file system used in IBM's Scale-Out File Services, the underlying technology of IBM's Cloud Computing and Storage offerings.
"The murals in restaurants are on par with the food in museums." --- Peter De Vries
The quote above applies to blogs as well. Those about competitive products of which the blogger has little to no hands-on experience tend to be terribly misleading or technically inaccurate. We saw this last month as Sun Microsystems' Jeff Savit tried to discuss the IBM System z10 EC mainframe.
This time, it comes from EMC bloggers discussing NetApp equipment, and by association, IBM System Storage N series gear. I was going to comment on the ridiculous posts by fellow bloggers from EMC about the SnapLock compliance feature on the NetApp, but my buddies at NetApp had already done this for me, saving me the trouble.
The hysterical nature of writing from EMC, and the calm responses from NetApp, speak volumes about the cultures of both companies.
The key point is that none of the "Non-erasable, Non-Rewriteable" (NENR) storage out there are certified as compliant by any government agency on the planet. Governments just aren't in the business of certifying such things. The best you can get is a third-party consultant, such as [Cohasset Associates], to help make decisions that are best for each particular situation.
In addition to SnapLock on N series, IBM offers the [IBM System Storage DR550], WORM tape and optical systems, all of which have been deemed compliant to the U.S. Securities and Exchange Commission [SEC 17a-4] federal regulations by Cohasset Associates. For medical patient records and images like X-rays, IBM offers the Grid Medical Archive Solution [GMAS] designed to meet the requirements of the U.S. Health Insurance Portability and Accountability Act [HIPAA]. For other government or industry regulations, consult with your legal counsel.
Well, this week I am in Maryland, just outside of Washington DC. It's a bit cold here.
Robin Harris over at StorageMojo put out this Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun about the results of two academic papers, one from Google, and another from Carnegie Mellon University (CMU). The papers imply that the disk drive module (DDM) manufacturers have perhaps misrepresented their reliability estimates, and the letter asks major vendors to respond. So far, NetApp and EMC have responded.
I will not bother to repeat what others have said already, but make just a few points. Robin, you are free to consider this "my" official response if you would like to post it on your blog, or point to mine, whatever is easier for you. Given that IBM no longer manufactures the DDMs we use inside our disk systems, there may not be any reason for a more formal response.
Coke and Pepsi buy sugar, Nutrasweet and Splenda from the same sources
Somehow, this doesn't surprise anyone. Coke and Pepsi don't own their own sugar cane fields, and even their bottlers are separate companies. Their job is to assemble the components using super-secret recipes to make something that tastes good.
IBM, EMC and NetApp don't make the DDMs that are mentioned in either academic study. Different IBM storage systems use one or more of the following DDM suppliers:
Seagate (including Maxtor, which they acquired)
Hitachi Global Storage Technologies, HGST (former IBM division sold off to Hitachi)
In the past, corporations like IBM were very "vertically-integrated", making every component of every system delivered. IBM was the first to bring disk systems to market, and led the major enhancements that exist in nearly all disk drives manufactured today. Today, however, our value-add is to take standard components, and use our super-secret recipe to make something that provides unique value to the marketplace. Not surprisingly, EMC, HP, Sun and NetApp also don't make their own DDMs. Hitachi is perhaps the last major disk systems vendor that also has a DDM manufacturing division.
So, my point is that disk systems are the next layer up. Everyone knows that individual components fail. Unlike CPUs or Memory, disks actually have moving parts, so you would expect them to fail more often compared to just "chips".
If you don't feel the MTBF or AFR estimates posted by these suppliers are valid, go after them, not the disk systems vendors that use their supplies. While IBM does qualify DDM suppliers for each purpose, we are basically purchasing them from the same major vendors as all of our competitors. I suspect you won't get much more than the responses you posted from Seagate and HGST.
American car owners replace their cars every 59 months
According to a frequently cited auto market research firm, the average time before the original owner transfers their vehicle -- purchased or leased -- is currently 59 months. Both studies mention that customers have a different "definition" of failure than manufacturers, and often replace the drives before they are completely kaput. The same is true for cars. Americans give various reasons why they trade in their less-than-five-year-old cars for newer models. Disk technologies advance at a faster pace, so it makes sense to change drives for other business reasons, for speed and capacity improvements, lower power consumption, and so on.
The CMU study indicated that 43 percent of drives were replaced before they were completely dead. So, if General Motors estimated their cars lasted 9 years, and Toyota estimated 11 years, people still replace them sooner, for other reasons.
At IBM, we remind people that "data outlives the media". True for disk, and true for tape. Neither is "permanent storage", but rather a temporary resting point until the data is transferred to the next media. For this reason, IBM is focused on solutions and disk systems that plan for this inevitable migration process. IBM System Storage SAN Volume Controller is able to move active data from one disk system to another; IBM Tivoli Storage Manager is able to move backup copies from one tape to another; and IBM System Storage DR550 is able to move archive copies from disk and tape to newer disk and tape.
If you had only one car, then having that one and only vehicle die could be quite disruptive. However, companies that have fleet cars, like Hertz Car Rentals, don't wait for their cars to completely stop running either; they replace them well before that happens. For a large company with a large fleet of cars, regularly scheduled replacement is just part of doing business.
This brings us to the subject of RAID. No question that RAID 5 provides better reliability than having just a bunch of disks (JBOD). Certainly, three copies of data across separate disks, a variation of RAID 1, will provide even more protection, but for a price.
Robin mentions the "Auto-correlation" effect. Disk failures bunch up, so one recent failure might mean another DDM, somewhere in the environment, will probably fail soon also. For it to make a difference, it would (a) have to be a DDM in the same RAID 5 rank, and (b) have to occur during the time the first drive is being rebuilt to a spare volume.
The human body replaces skin cells every day
So there are individual DDMs, manufactured by the suppliers above; disk systems, manufactured by IBM and others, and then your entire IT infrastructure. Beyond the disk system, you probably have redundant fabrics, clustered servers and multiple data paths, because eventually hardware fails.
People might realize that the human body replaces skin cells every day. Other cells are replaced frequently, within seven days, and others less frequently, taking a year or so to be replaced. I'm over 40 years old, but most of my cells are less than 9 years old. This is possible because information, data in the form of DNA, is moved from old cells to new cells, keeping the infrastructure (my body) alive.
Our clients should approach this in a more holistic view. You will replace disks in less than 3-5 years. While tape cartridges can retain their data for 20 years, most people change their tape drives every 7-9 years, and so tape data needs to be moved from old to new cartridges. Focus on your information, not individual DDMs.
What does this mean for DDM failures? When one happens, the disk system re-routes requests to a spare disk, rebuilding the data from RAID 5 parity, giving storage admins time to replace the failed unit. During the few hours this process takes place, you are either taking a backup, or crossing your fingers. Note: for RAID 5 the time to rebuild is proportional to the number of disks in the rank, so smaller ranks can be rebuilt faster than larger ranks. To make matters worse, the slower RPM speeds and higher capacities of ATA disks mean that the rebuild process could take longer than on smaller capacity, higher speed FC/SCSI disk.
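To make the rebuild step concrete, here is a minimal sketch of how one lost block in a RAID 5 stripe can be recomputed from the surviving blocks and the XOR parity. The function names and the tiny 4-byte blocks are illustrative assumptions, not a description of how any particular disk system implements it.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks):
    """Recompute the single missing block of a stripe from all surviving
    blocks (data plus parity). This works because data XOR parity is zero."""
    return xor_blocks(surviving_blocks)

# Hypothetical stripe: three data blocks plus one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate losing the second data block and rebuilding it onto a spare.
survivors = [data[0], data[2], parity]
rebuilt = rebuild_missing(survivors)
```

Note that every surviving disk in the rank has to be read to rebuild the missing one, which is why rebuild time grows with the size of the rank.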
According to the Google study, a large portion of the DDM replacements had no SMART errors to warn that it was going to happen. To protect your infrastructure, you need to make sure you have current backups of all your data. IBM TotalStorage Productivity Center can help identify all the data that is "at risk", those files that have no backup, no copy, and no current backup since the file was most recently changed. A well-run shop keeps their "at risk" files below 3 percent.
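As a rough illustration of the "at risk" idea, the sketch below flags files that have never been backed up, or that changed after their most recent backup, and reports the percentage. The `FileRecord` fields and sample data are hypothetical, and this is not how Productivity Center itself computes the figure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileRecord:
    path: str
    modified: float               # last-change time, seconds since epoch
    last_backup: Optional[float]  # time of most recent backup, or None if never

def is_at_risk(f: FileRecord) -> bool:
    """At risk: never backed up, or changed since the last backup."""
    return f.last_backup is None or f.last_backup < f.modified

def at_risk_percent(files) -> float:
    """Percentage of files that are currently at risk."""
    if not files:
        return 0.0
    return 100.0 * sum(is_at_risk(f) for f in files) / len(files)

# Hypothetical sample: two of these four files are at risk.
files = [
    FileRecord("/proj/a.doc", modified=100.0, last_backup=200.0),
    FileRecord("/proj/b.doc", modified=300.0, last_backup=200.0),
    FileRecord("/proj/c.doc", modified=50.0, last_backup=None),
    FileRecord("/proj/d.doc", modified=10.0, last_backup=20.0),
]
```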
So, where does that leave us?
ATA drives are probably as reliable as FC/SCSI disk. Customers should choose which to use based on performance and workload characteristics. FC/SCSI drives are more expensive because they are designed to run at faster speeds, required by some enterprises for some workloads. IBM offers both, and has tools to help estimate which products are the best match to your requirements.
RAID 5 is just one of the many choices of trade-offs between cost and protection of data. For some data, JBOD might be enough. For other data that is more mission critical, you might choose keeping two or three copies. Data protection is more than just using RAID; you also need to consider point-in-time copies, synchronous or asynchronous disk mirroring, continuous data protection (CDP), and backup to tape media. IBM can help show you how.
Disk systems, and IT environments in general, are higher-level concepts to transcend the failures of individual components. DDM components will fail. Cache memory will fail. CPUs will fail. Choose a disk systems vendor that combines technologies in unique and innovative ways that take these possibilities into account, designed for no single point of failure, and no single point of repair.
So, Robin, from IBM's perspective, our hands are clean. Thank you for bringing this to our attention and for giving me the opportunity to highlight IBM's superiority at the systems level.
I got some interesting queries about IBM's Scale-Out File Services [SoFS] that I mentioned in my post yesterday [Area rugs versus Wall-to-Wall carpeting]. I thought I would provide some additional details of the product.
SoFS combines three key features: a global namespace, a clustered file system, and Information Lifecycle Management (ILM). Let's tackle each one.
Global Name Space
A long time ago, IBM acquired a company called Transarc that developed Andrew File System (AFS) and Distributed File System (DFS). These both provided global namespace capability, meaning that all of your files could be accessible from a single URL file tree. Imagine if you have data centers in Tucson, Austin, Raleigh and Chicago. Normally, to access files from each city, you would have to mount a unique IP address for that location, and then to get to files in a different city, you'd have to mount a second, and so on. But with a global namespace, you could mount a single drive letter Z: and access files simply by using Z:/Tucson/abc or Z:/Austin/xyz. IBM uses its DFS to make this happen.
Just because you have access to a global namespace doesn't give you read/write authority to every file. IBM SoFS has full NTFS Access Control List (ACL) support, so that only those authorized to read or write data can access the files. A "hide unreadable" feature provides what I like to call "parental controls": you don't even get to see in your directory listing any file or subdirectory that you don't have access to. For example, if there is a directory with 50 projects, but you only have authority to three projects, then you only see the three subdirectories related to those projects, and nothing else.
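The two ideas above, one path tree spanning several sites and listings filtered by access rights, can be sketched in a few lines. This is only a toy model: the server names, the `SITE_SERVERS` table and the set-based ACL are hypothetical, and real DFS and NTFS ACL handling is far richer.

```python
# Hypothetical mapping from top-level namespace directories to per-site file servers.
SITE_SERVERS = {
    "Tucson": "nas-tucson.example.com",
    "Austin": "nas-austin.example.com",
}

def resolve(path):
    """Map a global path such as 'Z:/Tucson/abc' to (server, local path)."""
    drive, site, *rest = path.split("/")
    return SITE_SERVERS[site], "/" + "/".join(rest)

def visible_entries(entries, acl, user):
    """'Hide unreadable': list only the entries this user is allowed to read."""
    return [name for name in entries if user in acl.get(name, set())]

# Example: four project directories, but tony may read only two of them.
projects = ["p1", "p2", "p3", "p4"]
acl = {"p1": {"tony"}, "p3": {"tony", "ann"}, "p4": {"ann"}}
```

With this data, `resolve("Z:/Tucson/abc")` points at the Tucson server, and `visible_entries(projects, acl, "tony")` returns only `p1` and `p3`, so the other directories never appear in the listing.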
There are other ways to get a global namespace. IBM also offers the IBM System Storage N series Virtual File Manager, Brocade offers Storage/X, and F5 acquired Acopia. These all work by putting a box in front of a set of independent NAS storage units, and giving you a single mount point to represent all of the file systems managed behind the scenes. This however can sometimes be a bottleneck for performance.
Clustered File System
Often, when you have a lot of data in one place, you are also expected to deliver that data to lots of clients with relatively good performance. Otherwise, end users revolt and get their own internal direct attach storage. To solve this, you need a clustered architecture that provides access in parallel to the data.
First, we start with a node that is optimized for CIFS and NFS access. We have clocked our node to run CIFS at 577 MB/sec, and NFS at 880 MB/sec, through a 10GbE pipe between a single client and a single SoFS node. Compare that to the 400 MB/sec you get today with 4 Gbps FCP, or the 800 MB/sec you will get if you upgrade to 8 Gbps FCP, and quickly you recognize that this is comparable performance for demanding workloads.
Then, you combine multiple nodes together, and have them all be able to read/write any file in the file system, and front-end that with a load-balancing Virtual IP address (VIPA) that spreads the requests around, and you've got yourself a lean and mean machine for accessing data.
In 2005, IBM delivered [ASC Purple] with the world's fastest file system. 1536 nodes were able to access billions of files in the 2 petabytes of data. The record of 126 GB/sec access to a single file was set, and has yet to be beaten by any other vendor since. This same file system is used in SoFS, as well as a variety of other IBM storage offerings.
The back-end storage can be SAS or FC-attached, from the DS3200 to our mighty DS8300 Turbo, as well as our IBM System Storage DCS9550 and SAN Volume Controller (SVC), and a variety of tape libraries.
Information Lifecycle Management
Lastly, we get to ILM. With SoFS, you can have different tiers of storage, high-speed SAS or FC disk, low-speed FATA and SATA disk, and even tape. Policy-based automation allows you to place any file onto any disk tier when created, and other policies can migrate or delete the data, triggered by certain thresholds, age, or other criteria. The advantage is that this is on a file by file basis, so Z:/Tucson/Project could have a bunch of files, some of them on my FC disk, some of them on my SATA, and some on tape. The file path doesn't change when they move, and different files in the same directory can be on different tiers.
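Here is a minimal sketch of what file-by-file, policy-driven tiering looks like in principle. The rule thresholds, tier names and `FileInfo` fields are hypothetical and chosen only for illustration; the actual SoFS policy language is not shown here.

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    path: str
    size_mb: int
    days_since_access: int
    tier: str = "fc"   # assume new files land on the fastest tier

# Hypothetical migration rules, checked in order: (predicate, target tier).
MIGRATION_RULES = [
    (lambda f: f.days_since_access > 365, "tape"),
    (lambda f: f.days_since_access > 30, "sata"),
]

def apply_policies(files):
    """Move each file to the tier of the first matching rule.
    Only the tier changes; the file path stays the same."""
    for f in files:
        for matches, target in MIGRATION_RULES:
            if matches(f):
                f.tier = target
                break
    return files

project = [
    FileInfo("Z:/Tucson/Project/design.doc", 2, days_since_access=3),
    FileInfo("Z:/Tucson/Project/q3-results.xls", 40, days_since_access=90),
    FileInfo("Z:/Tucson/Project/2005-archive.tar", 9000, days_since_access=800),
]
apply_policies(project)
```

After the policies run, the three files in the same directory sit on FC disk, SATA disk and tape respectively, while their paths are unchanged.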
Data movement is bi-directional. If you know you will be using a set of files for an upcoming job, say perhaps quarter-end or year-end processing, you can pre-fetch those files from tape and move them to your fastest disk pool.
There is also integrated backup support. Typically, a large NAS environment is difficult to back up. Traditional methods take days to scan the directory tree looking for files in need of backup. A single SoFS node can scan a billion files in 95 minutes, and 8 nodes in a cluster can scan a billion files in under 15 minutes.
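The scan figures above translate into per-second rates with a little arithmetic. The sketch below works them out; since the 8-node figure is "under 15 minutes", the cluster rate and scaling efficiency computed here are lower bounds.

```python
FILES = 1_000_000_000

def files_per_second(files, minutes):
    """Average scan rate for a given file count and elapsed time."""
    return files / (minutes * 60)

single_node = files_per_second(FILES, 95)   # roughly 175,000 files/sec
eight_nodes = files_per_second(FILES, 15)   # at least about 1.1 million files/sec

speedup = eight_nodes / single_node         # 95 / 15, a bit over 6x
efficiency = speedup / 8                    # fraction of perfect linear scaling
```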
Recovery is even more impressive. When you recover, SoFS brings back the entire directory structure first, with all the file names in place. This would make it appear that all the data is restored, but actually it is still on tape. When you access individual files, it will then drive the recovery of that file, so your applications and end users basically determine the priority of the recovery. Traditional methods would wait until every file was restored before letting anyone access the system.
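The recall-on-access behavior described above can be modeled with a small class: names are visible at once, and data comes back only when a file is first read. The class and its dictionary-backed "tape" are hypothetical stand-ins meant to show the idea, not the SoFS implementation.

```python
class LazyRestore:
    """Toy model: directory entries come back first, file data on first access."""

    def __init__(self, tape_archive):
        self._tape = tape_archive               # path -> bytes, stands in for tape
        self._disk = {}                         # files whose data is back on disk
        self.namespace = sorted(tape_archive)   # all file names visible immediately

    def read(self, path):
        """Return file contents, recalling them from 'tape' on first touch."""
        if path not in self._disk:
            self._disk[path] = self._tape[path]
        return self._disk[path]

    def restored(self):
        """Paths whose data has actually been brought back so far."""
        return sorted(self._disk)

# Example: both names appear right away, but only /b has been recalled.
recovery = LazyRestore({"/a": b"alpha", "/b": b"beta"})
first_read = recovery.read("/b")
```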
SoFS is part of IBM's [Blue Cloud] initiative that was launched last November 2007. Of course, IBM isn't the only one competing in this space. HDS has partnered with BlueArc, HP has acquired PolyServe, and Sun acquired CFS for their Lustre file system. Isilon and Exanet are start-up companies with some offerings. EMC acquired Rainfinity, and has hinted at a Hulk/Maui project that they might deliver later this year or perhaps in 2009, but by then might be a dollar-short and a day-late.
But why wait? IBM SoFS is available today and is orders of magnitude more scalable!
Last July, IBM and EMC traded blog postings over SPC-1 benchmark results. Fellow EMC blogger Chuck Hollis wrote his post [Does Anyone Take The SPC Seriously?]. Here is an excerpt:
I think most storage users have figured this out. We've never done an SPC test, and probably will never do one. Anyone is free, however, to download the SPC code, lash it up to their CLARiiON, and have at it.
I responded with [Getting Under EMC Skin], and then followed up with a series explaining IBM SVC and SPC benchmarks here:
So what is the good news? Yesterday, our friends at NetApp took up Chuck's challenge and posted results on their FAS3040 as well as their EMC CLARiiON devices. IBM sells the FAS3040 under the name IBM System Storage N5300 disk system. Knowing that NetApp maintains excellent performance when it is doing point-in-time copies, NetApp ran the benchmark both with and without point-in-time copies on both boxes. I include DS4700 and DS4800 as well for comparison purposes, but only have them without FlashCopy running.
NetApp FAS3040 (IBM N5300)
NetApp FAS3040 (IBM N5300)
EMC CLARiiON CX3-40
IBM DS4700 Express
EMC CLARiiON CX3-40
One would expect some performance degradation with a box running point-in-time copies at the same time it is reading and writing data. The NetApp/IBM N5300 does not degrade by much, while EMC's drops by a significant amount.
So what is the bad news? Last October, I welcomed HDS USP-V to the [Super High-End Club], but now we need to invite Texas Memory Systems as well. In 2006, I posted [Hybrid, Solid State and the future of RAID], and poked fun at Texas Memory Systems for using the slogan "World's Fastest Storage" when, at the time, that honor belonged to IBM SAN Volume Controller instead. The VP of Texas Memory Systems, Woody Hutsell, explained that the only reason their solid-state disk system, RAMSAN-320, didn't have faster results is that they didn't have the fastest IBM server to run against it. It may not surprise you that nearly everyone's SPC benchmarks use IBM servers because IBM has the fastest servers as well. I didn't have a million-dollar System p UNIX server to send Woody for this, but it looks like they have finally gotten one, and a new RAMSAN-400 device, as they have posted their latest results.
Texas Memory Systems RAMSAN-400
IBM SAN Volume Controller 4.2
EMC doesn't publish numbers for their Symmetrix box, despite their announcement of faster SSD drives. They claim that SSD drives make their overall disk system performance faster, but without SPC benchmarks, we will never know. If you have a Symmetrix, this YouTube video may help you decide where it belongs:
The weather has warmed up here in Tucson so I started my Spring Cleaning early this year and unearthed from my garage a [Bankers Box] full of floppy diskettes.
IBM invented the floppy disk back in 1971, and continued to make improvements and enhancements through the 1980s and 1990s. It will be one of the many inventions celebrated as part of IBM's Centennial (100-year) anniversary. Here is an example [T-shirt]
IBM needed a way to send out small updates and patches for microcode of devices out in client locations. IBM had drives that could write information, and sent out "read-only" drives to the customer locations to receive these updates. These were flexible plastic circles with a magnetic coating, and placed inside a square paper sleeve. Imagine a floppy disk the size of a piece of standard paper. The 8-inch floppy fit conveniently in a manila envelope, sendable by standard mail, and could hold nearly 80KB of data.
I've been using floppies for the past thirty years. Here's some of my fondest memories:
While still in high school, my friend Franz Kurath and I formed "Pearson Kurath Systems", a software development firm. We wrote computer programs to run on UNIX and Personal Computers for small businesses here in Tucson. Whenever we developed a clever piece of code, a subroutine or procedure, we would save it on a floppy disk and re-use it for our next project. We wrote in the BASIC language, and our databases were simple Comma-Separated Values (CSV) flat files.
The 5.25-inch floppies we used could hold 360KB, and were flexible like the 8-inch models. Later versions of these 5.25-inch floppies would be able to hold as much as 1.2MB of data. We would convert single-sided floppies into double-sided ones by cutting out a notch in the outer sleeve. Covering up the notches would mark them as read-only.
The 3.5-inch floppies were introduced with a hard plastic shell, with the selling point that you can slap on a mailing label and postage and send it "as is" without the need for a separate envelope. These new 3.5-inch floppies would carry "HD" for high density 720KB, and double-sided versions could hold 1.44MB of data. The term "diskette" was used to associate these new floppies with [hard-shelled tape cassettes]. Sliding a plastic tab would allow floppies to be marked "read-only". IBM has the patent on this clever invention.
Continuing our computer programming business in college, Franz and I took out a bank loan to buy our first Personal Computer, for over $5000 USD. Until then, we had to use equipment belonging to each client. The banks we went to didn't understand why we needed a computer, and suggested we just track our expenses on traditional green-and-white ledger paper. Back then, personal computers were for balancing your checkbook, playing games and organizing your collection of cooking recipes. But for us, it was a production machine. A computer with both 5.25-inch and 3.5-inch drives could copy files from one format to another as needed. The boost in productivity paid for itself within months.
Apple launched its Macintosh computer in 1984, with a built-in 3.5-inch disk drive as standard equipment. Here is a YouTube video of an [astronaut ejecting a floppy disk] from an Apple computer in space.
In my senior year at the University of Arizona, my roommate Dave had borrowed my backpack to hold his lunch for a bike ride. He thought he had taken everything out, but forgot to remove my 3.5-inch floppy diskette containing files for my senior project. By the time he got back, the diskette was covered in banana pulp. I was able to rescue my data by cracking open the plastic outer shell, cleaning the flexible magnetic media in soapy water, placing it back into the plastic shell of a second diskette, and then copying the data off to a third diskette.
After graduating from college, Franz and I went our separate ways. I went to work for IBM, and Franz went to work for [Chiat/Day], the advertising agency famous for the 1984 Macintosh commercial. We still keep in touch through Facebook.
At IBM, I was given a 3270 terminal to do my job, and would not be assigned a personal computer until years later. Once I had a personal computer at home and at work, the floppy diskette became my "briefcase". I could download a file or document at work, take it home, work on it til the wee hours of the morning, and then come back the next morning with the updated effort.
To help prepare me for client visits and public speaking at conferences, IBM loaned me out to local schools to teach. This included teaching Computer Science 101 at Pima Community College. When asked by a student whether to use "disc" or "disk", I wrote a big letter "C" on the left side of the chalkboard, and a big letter "K" on the right side. If it is round, I told the students while pointing at the letter "C", like a CD-ROM or DVD, use "disc". If it has corners, pointing to corners of the letter "K", like a floppy diskette or hard disk drive, use "disk".
On one of my business trips to visit a client, we discovered the client had experienced a problem that we had just recently fixed. Normally, this would have meant cutting a Program Temporary Fix (PTF) to a 3480 tape cartridge at an IBM facility, and sending it to the client by mail. Unwilling to wait, I offered to download the PTF onto a floppy diskette on my laptop, upload it from a PC connected to their systems, and apply it there. This involved a bit of REXX programming to deal with the differences between ASCII and EBCDIC character sets, but it worked, and a few hours later they were able to confirm the fix worked.
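For readers who have never dealt with the ASCII versus EBCDIC difference, here is a small illustration. The original work was done in REXX; this sketch uses Python's built-in `cp037` codec, which is one common EBCDIC code page (US/Canada) and is only an assumption about which code page would apply. It covers text translation only; binary portions of a fix would have to be transferred untranslated.

```python
text = "APPLY PTF"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")   # cp037: one common EBCDIC code page

# The same characters have different byte values in the two encodings.
print(ascii_bytes.hex())
print(ebcdic_bytes.hex())

def ascii_to_ebcdic(data: bytes) -> bytes:
    """Translate ASCII text bytes to EBCDIC (code page 037)."""
    return data.decode("ascii").encode("cp037")

def ebcdic_to_ascii(data: bytes) -> bytes:
    """Translate EBCDIC (code page 037) text bytes back to ASCII."""
    return data.decode("cp037").encode("ascii")
```

For example, the letter "A" is byte 0x41 in ASCII but 0xC1 in EBCDIC, so a file copied byte-for-byte between a PC and a mainframe is unreadable as text until it is translated.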
In 1998, Apple would signal the beginning of the end of the floppy disk era, announcing their latest "iMac" would not come with an internal built-in floppy drive. David Adams has a great article on this titled [The iMac and the Floppy Drive: A Conspiracy Theory]. You can get external floppy drives that connect via USB, so not having an internal drive is no longer a big deal.
While teaching a Top Gun class to a mix of software and hardware sales reps, one of the students asked what a "U" was. He had noticed "2U" and "3U" next to various products and wondered what that was referring to. The "U" represents the [standard unit of measure for height of IT equipment in standard racks]. To help them visualize, I explained that a 5.25-inch floppy disk was "3U" in size, and a 3.5-inch floppy diskette was "2U". Thus, a "U" is 1.75 inches, the thinnest dimension on a two-by-four piece of lumber. Servers that were only 1U tall would be referred to as "pizza boxes" for having similar dimensions.
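The arithmetic behind that visualization is easy to check. This small Python sketch uses only the figures quoted above (1U = 1.75 inches); the function name is just for illustration.

```python
# Sketch: express a height in rack units, using 1U = 1.75 inches.
RACK_UNIT_INCHES = 1.75


def rack_units(height_inches: float) -> float:
    """Return how many rack units (U) a given height in inches represents."""
    return height_inches / RACK_UNIT_INCHES


for name, inches in [("5.25-inch floppy disk", 5.25),
                     ("3.5-inch floppy diskette", 3.5)]:
    print(f"{name}: {rack_units(inches):g}U")
```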
Every year, right around November or so, my friends and family bring me their old computers for me to wipe clean. Either I would re-load them with the latest Ubuntu Linux so that their kids could use it for homework, or I would donate it to charity. Last November, I got a computer that could not boot from a CD-ROM, forcing me to build a bootable floppy. This gave me a chance to check out the various 1-disk and 2-disk versions of Linux and other rescue disks. I also have a 3-disk set of floppies for booting OS/2 in command line mode.
So while this unexpected box of nostalgia derailed my efforts to clean out my garage this weekend, it did inspire me to try to get some of the old files off them and onto my PC hard drive. I have already retrieved some low-res photographs, some emails I sent out, and trip reports I wrote. While floppy diskettes were notorious for being unreliable, and this box of floppies has been in the heat and cold for many Arizonan summers and winters, I am amazed that I was able to read the data off most of them so far, all the way back to data written in 1989. While the data is readable, in most cases I can't render it into useful information. This brings up a few valuable lessons:
Backups are not Archives
Some of the files are in proprietary formats, such as my backups for TurboTax software. I would need a PC running a correct level of Windows operating system, and that particular software, just to restore the data. TurboTax shipped new software every year, and I don't know how forward or backward-compatible each new release was.
Another set of floppies are labeled as being in "FDBACK" format. I have no idea what these are. Each floppy has just two files, "backup.001" and "control.001", for example.
Backups are intended solely to protect against unexpected loss from broken hardware or corrupted data. If you plan to keep data as archives for long-term retention, use archive formats that will last a long time, so that you can make sense of them later.
Operating System Compatibility
Windows 7 and all of my favorite flavors of Linux are able to recognize the standard "FAT" file system that nearly all of my floppies are written in. Sadly, I have some files that were compressed under the OS/2 operating system using software called "Stacker". I may have to stand up an OS/2 machine just to check out what is actually on those floppies.
You can't judge a book by its cover
Floppies were a convenient form of data interchange. Sometimes, I reused commercially-labeled floppies to hold personal files. So, just because a floppy says "America On-Line (AOL) version 2.5 Installation", I can't just toss it away. It might actually contain something else entirely. This means I need to mount each floppy to check on its actual contents.
So what will I do with the floppies I can't read, can't write, and can't format? I think I will convert them into a [retro set of coasters], to protect my new living room furniture from hot and cold beverages.
He feels I was unfair to accuse EMC of "proprietary interfaces" without spelling out what I was referring to. Here are just two, along with the whines we hear from customers that relate to them.
EMC Powerpath multipathing driver
Typical whine: "I just paid a gazillion dollars to renew my annual EMC Powerpath license, so you will have to come back in 12 months with your SVC proposal. I just can't see explaining to my boss that an SVC eliminates the need for EMC Powerpath, throwing away all the good money we just spent on it, or to explain that EMC chooses not to support SVC as one of Powerpath's many supported devices."
EMC SRDF command line interface
Typical whine: "My storage admins have written tons of scripts that all invoke EMC SRDF command line interfaces to manage my disk mirroring environment, and I would hate for them to re-write this to use IBM's (also proprietary) command line interfaces instead."
Certainly BarryB is correct that IBM still has a few remaining "proprietary" items of its own. IBM has been in business over 80 years, but it was only the last 10-15 years that IBM made a strategic shift away from proprietary and over to open standards and interfaces. The transformation to "openness" is not yet complete, but we have made great progress. Take these examples:
The System z mainframe - IBM had opened the interfaces so that both Amdahl and Fujitsu made compatible machines. Unlike Apple, which forbids cloning of this nature, IBM is now the single source for mainframes because the other two competitors could not keep up with IBM's progress and advancements in technology.
Update: Due to legal reasons, the statements referring to Hercules and other S/390 emulators have been removed.
The z/OS operating system - While it is possible to run Linux on the mainframe, most people associate the z/OS operating system with the mainframe. This was opened up with UNIX System Services to satisfy requests from various governments. It is now a full-fledged UNIX operating system, recognized by the [Open Group] that certifies it as such.
As BarryB alludes, the unique interfaces for disk attachment to System z known as Count-Key-Data (CKD) were published so that both EMC and HDS can offer disk systems to compete with IBM's high-end disk offerings. Linux on System z supports standard Fibre Channel, allowing you to attach an IBM SVC and anyone's storage. Both z/OS and Linux on System z support NAS storage, so IBM N series, NetApp, even EMC Celerra could be used in that case.
The System i itself is still proprietary, but recently IBM announced that it will now support standard block size (512 bytes) instead of the awkward 528 byte blocks that only IBM and EMC support today. That means that any storage vendor will be able to sell disk to the System i environment.
Advanced copy services, like FlashCopy and Metro Mirror, are as proprietary as the similar offerings from EMC and HDS, with the exception that IBM has licensed them to both EMC and HDS. Thanks to cross-licensing, you can do [FlashCopy on EMC] equipment. Getting all the storage vendors to agree to open standards for these copy services is still a work in progress under [SNIA], but at least people who have coded z/OS JCL batch jobs that invoke FlashCopy utilities can work the same between IBM and EMC equipment.
So for those out there who thought that my comment about EMC's proprietary interfaces in any way implied that IBM did not have any of its own, the proverbial ["pot calling the kettle black"] so to speak, I apologize.
BarryB shows off his [PhotoShop skills] with the graphic below. I take it as a compliment to be compared to an All-American icon of business success.
TonyP and Monopoly's Mr. Pennybags Separated at Birth?
However, BarryB meant it as a reference back to a time long ago when IBM was a monopoly of the IT industry, which according to [IBM's History], ended in 1973. In other words, IBM stopped being a monopoly before EMC ever existed as a company, and long before I started working for IBM myself.
The anti-trust lawsuit that BarryB mentions happened in 1969, which forced IBM to separate some of the software from its hardware offerings, and prevented IBM from making various acquisitions for years to follow, forcing IBM instead into technology partnerships. I'm glad that's all behind us now!
We've been quite busy here at the Tucson Executive Briefing Center. I am often asked to explain the relationship between IBM's various storage products. While automakers don't have to explain why they sell sports coupes, pickup trucks and minivans, this analogy does not adequately cover IT storage products. So, I have come up with a new analogy that seems to be a better fit: foundations and flavorings.
All over the world, meals are often comprised of a foundation, perhaps rice, potatoes or pasta, covered with some form of flavoring, sauces, pieces of meat or fish, grated cheese and spices. In Puerto Rico, I had dishes where the foundation was mashed bananas called [plantains]. Sandwich shops often let you pick your choice of bread, the foundation, and then your meats and cheeses, the flavorings. At our local steakhouse, [McMahon's], the menu lists a set of steaks, the foundation such as Rib Eye, Filet Mignon, Prime Rib or New York Strip, and various flavorings, such as sauces and rubs to cover the steak. Last night, I had the Delmonico steak with the Cristiani sauce consisting of Portobello mushrooms, garlic and aged Romano cheese.
This serves as a useful analogy for IBM's storage strategy. Allowing the foundations and flavorings to be separately orderable greatly simplifies the selection menu and provides a nearly any-to-any approach to meeting a variety of client needs. Let's take a look at both.
IBM's foundation products are the DS family [DS3000, DS4000, DS5000, DS6000 and DS8000 series], [DS9900 series], and [XIV] for disk, and the TS family [TS1000, TS2000, TS3000] series for tape drives and libraries. In much the same way you might prefer brown rice instead of white rice, or linguine instead of penne pasta, you might find the attributes of one storage foundation more attractive based on its performance, scalability and availability features for your particular application workloads.
Fellow IBM blogger Barry Whyte discusses SVC at great length on his [Storage Virtualization] blog. Flavoring disk foundation storage with SAN Volume Controller can provide you additional features and functions, and help improve the scalability, performance or availability characteristics. For example, if you have DS4000, DS8000 and XIV, you might use SVC to provide a consistent methodology for asynchronous replication, a form of consistent "flavoring" if you will.
N series Gateways
The [N series gateways] offer flavoring to disk foundation, including unified NAS, iSCSI and FCP protocol host attachment, and application aware capabilities. (As for our IBM N series appliances or "filers", these could be foundational storage behind an SVC, but that's perhaps a topic for another post.)
SoFS provides a global namespace with clustered NAS access to files. This is a blended disk-and-tape solution with built-in backup and Information Lifecycle Management [ILM]. Policies can be used to place different files onto different tiers of storage, automate the movement from tier to tier, including migration to tape, and even expiration when the data is no longer needed.
The [IBM System Storage DR550] provides Non-erasable, Non-rewriteable (NENR) flavoring to storage. While the DR550 comes with internal disk storage, it can front end a tape library filled with WORM cartridges. The DR550 has been paired up with small libraries (TS3200 or TS3310) as well as larger libraries like the TS3500.
The IBM Grid Medical Archive Solution [GMAS] provides a variety of capabilities for storing and accessing medical images, using a blended disk-and-tape approach. This allows hospital and clinic networks to provide access for doctors and radiologists from multiple locations.
Many of the flavorings are called "gateways". The IBM TS7650G flavors disk to provide a virtual tape library [VTL] with inline data deduplication capability. Recent performance tests pairing the TS7650G flavoring with XIV foundation storage found this combination to be an excellent match.
Let me know what you think. Does this help you understand IBM's storage strategy and acquisitions? Enter your comments below.
Well, it's Tuesday, and so it is "announcement day" again! Actually, for me it is Wednesday morning here in Mumbai, India, but since I was "press embargoed" until 4pm EDT in talking about these enhancements, I had to wait until Wednesday morning here to talk about them.
World's Fastest 1TB tape drive
IBM announced its new enterprise [TS1130 tape drive] and corresponding [TS3500 tape library support]. This one has a funny back-story. Last week while we were preparing the Press Release, we debated on whether we should compare the 1TB per cartridge capacity as double that of Sun's Enterprise T10000 (500GB), or LTO-4 (800GB). The problem changed when Sun announced on Monday they too had a 1TB tape drive, so now instead of saying that we had the "World's First 1TB tape drive", we quickly changed this to the "World's Fastest 1TB tape drive" instead. At 160MB/sec top speed, IBM's TS1130 is 33 percent faster than Sun's latest announcement. Sun was rather vague about when they will actually ship their new units, so IBM may still end up being first to deliver as well.
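To put the speed claim in perspective, here is a small Python sketch that works only from the two figures quoted above (160MB/sec and "33 percent faster"). It assumes decimal units (1TB = 1,000,000MB) and a sustained native rate with no compression, so treat the results as back-of-the-envelope numbers rather than vendor specifications.

```python
# Sketch: what the "33 percent faster" claim implies, and how long a
# full 1TB cartridge takes to stream at a given sustained rate.

def implied_baseline(rate_mb_s: float, percent_faster: float) -> float:
    """Rate of the slower drive, given the faster rate and the percent gap."""
    return rate_mb_s / (1 + percent_faster / 100)


def seconds_to_stream(capacity_tb: float, rate_mb_s: float) -> float:
    """Seconds to stream a cartridge end to end (assumes 1TB = 1,000,000MB)."""
    return capacity_tb * 1_000_000 / rate_mb_s


slower = implied_baseline(160, 33)
print(f"Implied competing rate: about {slower:.0f}MB/sec")
print(f"1TB at 160MB/sec: {seconds_to_stream(1, 160) / 60:.0f} minutes")
print(f"1TB at the implied rate: {seconds_to_stream(1, slower) / 60:.0f} minutes")
```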
While EMC and other disk-only vendors have stopped claiming that "tape is dead", these recent announcements from IBM and Sun indicate that indeed tape is alive and well. IBM is able to borrow technologies from disk, such as the Giant Magneto Resistive (GMR) head over to its tape offerings, which means much of the R&D for disk applies to tape, keeping both forms of storage well invested. Tape continues to be the "greenest" storage option, more energy efficient than disk, optical, film, microfiche and even paper.
On the LTO front, IBM enhanced the reporting capabilities of its [TS3310] midrange tape library. This includes identifying the resource utilization of the drives, reporting on media integrity, and improved diagnostics to support library-managed encryption.
IBM System Storage DR550
As a blended disk-and-tape solution, the [IBM System Storage DR550] easily replaces the EMC Centera to meet compliance storage requirements. IBM announced that we have greatly expanded its scalability, being able to support both 1TB disk drives, as well as being able to attach to either IBM or Sun's 1TB tape drives.
Massive Array of Idle Disks (MAID)
IBM now offers a "Sleep Mode" in the firmware of the [IBM System Storage DCS9550], which is often called "Massive Array of Idle Disks" (MAID) or spin-down capability. This can reduce the amount of power consumed during idle times.
That's a lot of exciting stuff. I'm off to breakfast now.
Two weeks ago, I mentioned in my post [Pulse 2008 - Day 2 Breakout sessions] that Henk de Ruiter from ABN Amro bank presented his success story implementing Information Lifecycle Management (ILM) across his various data centers. I am no stranger to ABN Amro, having helped "ABN" and "Amro" banks merge their mainframe data in 1991. Henk has agreed to let me share with my readers more of this success story here on my blog:
Back in December 2005, Henk and his colleagues had come to visit the IBM Tucson Executive Briefing Center (EBC) to hear about IBM products and services. At the time, I was part of our "STG Lab Services" team that performed ILM assessments at client locations. I explained to ABN Amro that the ILM methodology does not require an all-IBM solution, and that ILM could even provide benefits with their current mix of storage, software and service providers. The ABN Amro team liked what I had to say, and my team was commissioned to perform ILM assessments at three of their data centers:
Sao Paulo (Brazil)
Chicago, IL (USA)
Each data center had its own management, its own decision making, and its own set of issues, so we structured each ILM assessment independently. When we presented our results, we showed what each data center could do better with their existing mixed bag of storage, software and service providers, and also showed how much better their life would be with IBM storage, software and services. They agreed to give IBM a chance to prove it, and so a new "Global Storage Study" was launched to take the recommendations from our three ILM studies, and flesh out the details to make a globally-integrated enterprise work for them. Once completed, it was renamed the "Global Storage Solution" (GSS).
Henk summarized the above with "I am glad to see Tony Pearson in the audience, who was instrumental to making this all happen." As with many client testimonials, he presented a few charts on who ABN Amro is today, the 12th largest bank worldwide, 8th largest in Europe. They operate in 53 countries and manage over a trillion euros in assets.
They have over 20 data centers, with about 7 PB of disk, and over 20 PB of tape, both growing at 50 to 70 percent CAGR. About 2/3 of their operations are now outsourced to IBM Global Services; the remaining 1/3 is non-IBM equipment managed by a different service provider.
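To get a feel for what 50 to 70 percent compound annual growth means for those starting capacities, here is a short Python sketch. The three-year horizon is my own assumption for illustration and was not part of Henk's presentation.

```python
# Sketch: project storage capacity under compound annual growth (CAGR).
# Starting points (7 PB disk, 20 PB tape) and the 50-70 percent range come
# from the text; the three-year horizon is an illustrative assumption.

def project_capacity(current_pb: float, cagr: float, years: int) -> float:
    """Capacity after `years` of compound growth at rate `cagr` (0.5 = 50%)."""
    return current_pb * (1 + cagr) ** years


YEARS = 3
for label, start_pb in [("disk", 7), ("tape", 20)]:
    low = project_capacity(start_pb, 0.50, YEARS)
    high = project_capacity(start_pb, 0.70, YEARS)
    print(f"{label}: {start_pb} PB today, "
          f"{low:.1f} to {high:.1f} PB in {YEARS} years")
```

At these rates capacity more than triples in three years, which helps explain the interest in identifying data for clean up and in lower-cost tiers.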
ABN Amro deployed IBM TotalStorage Productivity Center, various IBM System Storage DS family disk systems, SAN Volume Controller (SVC), Tivoli Storage Manager (TSM), Tivoli Provisioning Manager (TPM), and several other products. Armed with these products, they performed the following:
Clean Up. IBM uses the term "rationalization" to relate to the assignment of business value, to avoid confusion with the term "classification" which many in IT relate to identifying ownership, read and write authorization levels. Often, in the initial phases of an ILM deployment, a portion of the data is determined to be eligible for clean up, either to be moved to a lower-cost tier or deleted immediately. ABN Amro and IBM set a goal to identify at least 20 percent of their data for clean up.
New tiers. Rather than traditional "storage tiers" which are often just Tier 1 for Fibre Channel disk and Tier 2 for SATA disk, ABN Amro and IBM came up with seven "information infrastructure tiers" that incorporate service levels, availability and protection status. They are:
High-performance, Highly-available disk with Remote replication.
High-performance, Highly-available disk (no remote replication)
Mid-performance, high-capacity disk with Remote replication
Mid-performance, high-capacity disk (no remote replication)
Non-erasable, Non-rewriteable (NENR) storage employing a blended disk and tape solution.
Enterprise Virtual Tape Library with remote replication and back-end physical tape
Mid-performance physical tape
These tiers are applied equally across their mainframe and distributed platforms. All of the tiers are priced per "primary GB", so any additional capacity required for replication or point-in-time copies, either local or remote, is folded into the base price. ABN Amro felt a mission-critical application on Windows or UNIX deserves the same Tier 1 service level as a mission-critical mainframe application. Exactly!
Deployed storage virtualization for disk and tape. This involved the SAN Volume Controller and IBM TS7000 series library.
Implemented workflow automation. The key product here is IBM Tivoli Provisioning Manager
Started an investigation for HSM on distributed. This would be policy-based space management to migrate less frequently accessed data to the TSM pool for Windows or UNIX data.
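The policy-based space management mentioned in the last item can be pictured with a small Python sketch. This is not how TSM expresses or evaluates its policies; the record type, the 90-day idle threshold and the file paths are all hypothetical, chosen only to show the idea of selecting less frequently accessed files as migration candidates.

```python
# Sketch: pick migration candidates by last-access age.
# Hypothetical illustration only; the 90-day threshold, the FileRecord type
# and the paths are assumptions, not TSM policy syntax or behavior.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_access: date


def migration_candidates(files, today, idle_days=90):
    """Return the files that have not been accessed in the last `idle_days`."""
    cutoff = today - timedelta(days=idle_days)
    return [f for f in files if f.last_access < cutoff]


files = [
    FileRecord("/data/old_report.doc", 1_000, date(2008, 1, 5)),
    FileRecord("/data/active.db", 5_000, date(2008, 4, 20)),
]
for f in migration_candidates(files, today=date(2008, 5, 1)):
    print(f"migrate {f.path} ({f.size_bytes} bytes)")
```

A real policy would typically also weigh file size, ownership and business value, in line with the "rationalization" step described earlier.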
While the deployment is not yet complete, ABN Amro feels they have already recognized business value:
Reduced cost by identifying data that should be stored on lower tiers
Simplified management, consolidated across all operating systems (mainframe, UNIX, Windows)
Increased utilization of existing storage resources
Reduced manual effort through policy-based automation, which can lead to fewer human errors and faster adaptability to new business opportunities
Standardized backup and other operational procedures
Henk and the rest of ABN Amro are quite pleased with the progress so far, although recent developments in terms of the takeover of ABN AMRO by a consortium of banks means that the model is only implemented so far in Europe. Further rollout depends on the storage strategy of the new owners. Nonetheless, I am glad that I was able to work with Henk, Jason, Barbara, Steve, Tom, Dennis, Craig and others to be part of this from the beginning and be able to see it roll out successfully over the years.
While EMC bloggers garnered media attention last year pointing out the faulty mathematics from HDS, an astute reader pointed me to EMC's own [DMX-4 specification sheet], updated for its 1TB SATA disk. I've chosen just the minimum and maximum number of drives RAID-6 data points for non-mainframe platforms:
In the first two rows, the numbers appear as expected. For example, 96 drives would be 12 sets of 6+2 RAID ranks, meaning 72 drives' worth of data, so nearly 36TB for 500GB drives, and nearly 72TB for 1TB drives. With 14+2 RAID-6, then you would have 84 drives' worth of data, so 42TB and 84TB respectively match expectations.
Where EMC appears to miscalculate is with 20x more drives, as the numbers don't match up. For 1920 drives in RAID-6, you would expect 20x more usable capacity than the 96 drive configurations. For 6+2 configurations, one would expect 720TB and 1440TB respectively. For 14+2 configurations, one would expect 840TB and 1680TB, respectively.
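The expected values above follow from simple arithmetic, which this Python sketch reproduces. It computes raw data capacity only, assuming decimal terabytes and ignoring spares and formatting overhead (which is why the text says "nearly"); the function name is mine.

```python
# Sketch: expected usable capacity for RAID-6 ranks of a given shape.
# Assumptions: decimal TB, no hot spares, no formatting overhead.

def raid6_usable_tb(drives: int, data: int, parity: int, drive_tb: float) -> float:
    """Usable TB when `drives` are split into full (data+parity) RAID ranks."""
    ranks = drives // (data + parity)   # only complete ranks count
    return ranks * data * drive_tb


for drives in (96, 1920):
    for data in (6, 14):
        for drive_tb in (0.5, 1.0):
            usable = raid6_usable_tb(drives, data, 2, drive_tb)
            print(f"{drives} drives, {data}+2, {drive_tb}TB drives: {usable:g}TB")
```

Since 1920 is exactly 20 times 96, every result for the large configuration is exactly 20 times the corresponding small one.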
Perhaps EMC DMX-4 can't address more than 600TB for the entire system? Does EMC purposely limit the benefits of these larger drives? It does raise the question of why someone might go from 500GB to 1TB drives, if the maximum configuration only gives about 40TB more capacity. Fellow IBM blogger Barry Whyte questioned the use of SATA in an expensive DMX-4 system, in his post [One Box Fits All - Or Does It], and now perhaps there are good reasons to question 1TB from a capacity perspective as well.
Well, it's Tuesday, which means IBM makes its announcements!
This week, IBM announces that it now supports 50GB Solid State Disk (SSD) in its [IBM System Storage EXP3000] disk systems.IBM has already made announcements about SSD enablement in the DS8000 and SAN Volume Controller (SVC), but now the EXP3000 brings SSD technology down to smaller System x server deployments.
Adoption of this new exciting technology is still in the early stages, despite the fact that IBM and other vendors have been touting this technology for a while. (For a quick blast to the past, here was my first post on the subject back from December 20, 2006: [Hybrid, Solid State and the future of RAID]) Recently, fellow blogger BarryB admitted that EMC have only sold SSD to [hundreds of their customers], and to be fair, I suspect IBM's sales of SSD in its BladeCenter servers [available since July 2007] have been in similar single-digit percentage territory as well.
The advantage of today's announcement is that you can mix and match SSD drives with SAS and SATA drives in the EXP3000. You won't have to buy the entire drawer of SSD, you can start with just a few, depending on your business needs. On the other extreme, you can have up to two drawers, with 12 SSD drives each, for a total of 24 drives directly attached to System x servers via the ServeRAID MR10M SAS/SATA controller adapter.
Storage Networking World conference is over, and the buzz from the analysts appears to be focused on Xiotech's low-cost RAID brick (LCRB) called Intelligent Storage Element, or ISE.
(Full disclosure: I work for IBM, not Xiotech, in case there weren't enough IBM references on this blog page to remind you of that. I am writing this piece entirely from publicly available sources of information, and not from any internal working relationships between IBM and Xiotech. Xiotech is a member of the IBM BladeCenter alliance and our two companies collaborate together in that regard.)
Fellow blogger Jon Toigo in his DrunkenData blog posted [I’m Humming “ISE ISE Baby” this Week] and then a follow-up post [ISE Launches]. I looked up Xiotech's SPC-1 benchmark numbers for the Emprise 5000 with both 73GB and 146GB drives, and at 8,202 IOPS per TB, it does not seem to be as fast as the IBM SAN Volume Controller's 11,354 IOPS per TB. Xiotech offers an impressive 5 year warranty (by comparison, IBM offers up to 4 years, and EMC I think is still only 90 days). Jon also wrote a review in [Enterprise Systems] that goes into more detail about the ISE.
Fellow blogger Robin Harris in his StorageMojo blog posted [SNW update - Xiotech’s ISE and the dilithium solution], feeling that Xiotech should win the "Best Announcement at SNW" prize. He points to the cool video on the [Xiotech website]. In that video, they claim 91,000 IOPS. Given that it took forty (40) 73GB drives (or 4 datapacs) in the previous example to get 8,202 IOPS for 1TB usable, I am guessing the 91,000 IOPS is probably 44 datapacs (440 drives) glommed together, representing 11TB usable. The ISE design appears very similar to the "data modules" used in IBM's XIV Nextra system.
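My guess of 44 datapacs can be reproduced with a few lines of Python. The sketch assumes IOPS scale linearly with usable capacity, which is a simplification on my part and not something Xiotech has claimed; the constants come from the figures quoted above.

```python
# Sketch: back out a configuration from a claimed IOPS figure.
# Assumption: IOPS scale linearly with usable capacity (a simplification).
IOPS_PER_TB = 8202        # SPC-1 result quoted above, per usable TB
DATAPACS_PER_TB = 4       # 4 datapacs gave 1TB usable
DRIVES_PER_DATAPAC = 10   # 40 drives across 4 datapacs


def estimate_config(target_iops: int):
    """Return (usable TB, datapacs, drives) needed for `target_iops`."""
    usable_tb = round(target_iops / IOPS_PER_TB)   # nearest whole TB
    datapacs = usable_tb * DATAPACS_PER_TB
    return usable_tb, datapacs, datapacs * DRIVES_PER_DATAPAC


tb, datapacs, drives = estimate_config(91_000)
print(f"{tb}TB usable, {datapacs} datapacs, {drives} drives")
```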
Fellow blogger Mark Twomey from EMC in his StorageZilla blog posted [Xiotech: Industry second], which correctly points out that Xiotech's 520-byte block (512 bytes plus extra for added integrity) was not the first in the industry. Mark explains that EMC CLARiiON had this since the early 1990's, and implies in the title that this must have been the first in the industry, making Xiotech an industry second. Sorry Mark, both EMC and Xiotech were late to the game. IBM had been using 520-byte blocksize on its disk since 1980 with the System/38. This system morphed to the AS/400, and the blocksize was bumped up to 522 bytes in 1990, and is now called the System i, where the blocksize was bumped up yet again to 528 bytes in 2007.
While IBM was clever to do this, it actually means fewer choices for our System i clients, being only able to choose external disk systems that explicitly support these non-standard blocksize values, such as the IBM System Storage DS8000 and DS6000 series. (Yes, BarryB, IBM still sells the DS6000!) The DS6000 was specifically designed with the System i and smaller System z mainframes in mind, and in that niche does very well. Fortunately, as I mentioned in my February post [Getting off the island - the new i5/OS V6R1], IBM has now used virtualization, in the form of the VIOS logical partition, to allow i5/OS systems to attach to standard 512-byte block devices, greatly expanding the storage choices for our clients.
(Side note: SNW happens twice per year, so the challenge is having something new and fresh to talk about each time. While Andy Monshaw, General Manager of IBM System Storage, highlighted some of the many emerging technologies in his keynote address, IBM shipped on many of them prior to his last appearance in October 2007: thin provisioning in the IBM System Storage N series, deduplication in the IBM System Storage N series Advanced Single Instance Storage (A-SIS) feature, and Solid State Disk (SSD) drives in the IBM BladeCenter HS21-XM models. Of course, not everyone buys IBM gear the first day it is available, and IBM is not the only vendor to offer these technologies. My point is that for many people, these are still not yet deployed in their own data center, and so they are still in the future for them. However, since these IBM deliveries happened more than six months ago, they're old news in the eyes of the SNW attendees. While those who follow IBM closely would know that, others like [Britney Spears] may not.)
Back in the 1990s, when IBM was developing the IBM SAN Volume Controller (SVC), we generically called the managed disk arrays that were being virtualized by the SVC as "low-cost RAID brick" or LCRB. The IBM DS3400 is a good example of this. However, as we learned, SVC is not just for LCRB, it adds value in front of all kinds of disk systems, including the not-so-low-cost EMC DMX and IBM DS8000 disk systems. ISE might make a reasonable back-end managed disk device for IBM SVC to virtualize. This gives you the new cool features of Xiotech's ISE, with IBM SVC's faster performance, more robust functionality and advanced copy services.
Next week, I'll be in South America in meetings with IBM Business Partners and storage sales reps.
Continuing my week in Chicago, for the IBM Storage Symposium 2008, I attended several sessions intended to answer the questions of the audience.
In an effort to be cute, the System x team have a "Meet the xPerts" session at their System x and BladeCenter Technical Conference, so the storage side decided to do the same. Traditionally, these have been called "Birds of a Feather", "Q&A Panel", or "Free-for-All". They allow anyone to throw out a question, and have the experts in the room, either IBM, Business Partner or another client, answer the question from their experience.
Meet the Experts - Storage for z/OS environments
Here were some of the questions answered:
I've seen terms like "z/OS", "zSeries" and "System z" used interchangeably, can you help clarify what this particular session is about?
IBM's current mainframe servers are all named "System z", such as our System z9 or System z10. These replace the older zSeries models of hardware. z/OS is one of the six operating systems that run on this hardware platform. The other five are z/VM, z/VSE, z/TPF, Linux and OpenSolaris. The focus of this session will be storage attached and used for z/OS specifically, including discussions of Omegamon and DFSMS software products.
What can we do to reduce our MIPS-based software licensing costs from our third party vendors?
Consider using IBM System z Integrated Information Processor
What about 8 Gbps FICON?
IBM has already announced [FICON Express8] host bus adapter (HBA) cards, that will auto-negotiate to 4Gbps and 2Gbps speeds. If you don't need full 8Gbps speed now, you can still get the Express8 cards, but put 4/2/1 Gbps SFP ports instead. Currently, LongWave (LW) is only supported to 4km at 8Gbps speed.
I want to use Global Mirror for my DS8100 to my remote DS8100, but also make test copies of my production data to an older ESS 800 I have locally. Any suggestions? Yes, consider using FlashCopy to simplify this process.
I have Global Mirror (GM) running now successfully with DSCLI, and now want to deploy IBM Tivoli Storage Productivity Center for Replication. Is that possible? Yes, Productivity Center for Replication will detect existing GM relationships, and start managing them.
I have already deployed HyperPAV and zHPF, is there any value in getting Solid-State Drives as well?
HyperPAV and zHPF impact CONN time, but SSD impacts DISC time, so they are mutually complementary.
How should I size my FlashCopy SE pool? SE refers to "Space Efficient", which stores only the changes between the source and destination copies of each LUN or CKD volume involved. General recommendation is to start with 20 percent and adjust accordingly.
How many RAID ranks should I configure per DS8000 extent pool? IBM recommends 4 to 8 ranks per pool.
Meet the Experts: Storage for Linux, UNIX and Windows distributed systems
This session was focused on storage systems attached to distributed servers, as well as products from Tivoli used to manage them. Here were some of the questions answered:
When we migrated from Tivoli Storage Manager v5 to v6, we lost our favorite "Operational Reporting" tool. How can we get TOR back? You now get the new Tivoli Common Reporting tool.
How can we identify appropriate port distribution for multiple SVC node pairs for load balancing?
IBM Tivoli Storage Productivity Center v4.1 has hot-spot analysis with recommendations for Vdisk migrations.
We tried TotalStorage Productivity Center way back when, but the frequent upgrades were killing us. How has it been lately? It has been much more stable since v3.3, and completely renamed to Tivoli Storage Productivity Center to avoid association with versions 1 and 2 of the predecessor product. The new "lightweight agents" feature of v4.1 resolves many of the problems you were experiencing.
We have over 1600 SVC virtual disks, how do we handle this in IBM Tivoli Storage Productivity Center? Use the Filter capability in combination with clever naming conventions for your virtual disks.
How can we be clever when we are limited to only 15 characters? Ok. We understand.
We are currently using an SSPC with Windows 2003 and 2GB memory, but we are only using the Productivity Center for Replication feature of it. Can we move the DB2 database over to a Windows 2008 server with 4GB of memory?
Consider using the IBM Tivoli Storage Productivity Center for Replication software instead of SSPC for special
circumstances like this.
We love the XIV GUI, how soon will all other IBM storage products have it also? As with every acquisition,
IBM evaluates if there are technologies from new products that can be carried back to existing products.
We are currently using 12 ports on our existing XIV, and love it so much we plan to buy a second frame, but are concerned about consuming another 12 ports on our SAN switch. Any suggestions? Yes, use only six ports per frame. Just because you have more ports, doesn't mean you are required to use them.
We have heard there are concerns from the legal community about using deduplication technology, any ideas how to address that?
Nobody here in the room is a lawyer, and you should consult legal counsel for any particular situation.
None of the IBM offerings intended for non-erasable, non-rewriteable (NENR) data retention records (DR550, WORM tape, N series SnapLock) support dedupe today, and none of IBM's deduplication offerings (TS7650, N series A-SIS, TSM) make any claims for fit-for-purpose for compliance regulatory storage. However, be assured that all of IBM's dedupe technology involves byte-for-byte comparisons so that you never lose any data due to false hash collisions. For all IBM compliance storage, what you write will be read back in the correct sequence of ones and zeros.
Fellow blogger Chuck Hollis from EMC has a post titled [Whither Frankenstorage] causing quite a stir in the [Stor-o-Sphere]. He is not the first EMC blogger to use this phrase; I credit [BarryB] for coining the term back in September 2008. Frankenstein serves as the ideal icon for EMC's FUD machine. In the novel, Dr. Frankenstein was attempting to do something nobody else had ever attempted, to create human life from various dead body parts, a process full of uncertainty and doubt, with frightful results.
Perhaps it was a coincidence that I discussed IBM's storage strategy in my post [Foundations and Flavorings] on January 28, shortly followed by NetApp's announcing V-series gateway [support of Texas Memory Systems' RamSan-500] on February 3. These two events might have been the trigger that pushed ChuckH over the edge to put pen to paper... finger to keyboard.
Flinging FUD in all directions was ChuckH's not-so-subtle way to remind the world that EMC is the only major storage vendor to not offer a successful storage virtualization product. Without first-hand experience with well-designed storage virtualization, ChuckH conjectures that a configuration matching intelligent front-ends to reliable back-ends might be more expensive, might be more difficult to manage, or might be harder to support.
(Note: Rest assured, IBM can demonstrate that a modular approach, combining intelligent front-ends to reliable back-ends can help reduce costs, be easier to manage, and be fully supported. Contact your local IBM Business Partner or storage sales rep for details.)
My favorite was from Nigel Poulton's post on [Ruptured Monkey]. Here's an excerpt:
In fact, I'm fairly certain that EMC don't back away from customers who run HP or IBM servers and say "sorry we cant help you here, an end to end HP or IBM solution would be much better for you when it comes to troubleshooting……. putting our storage in would only add extra layers of complexity and make things messy….."
On most other days, ChuckH has well-written, insightful blog posts that show that EMC brings some value to the industry. I could have made a snarky reference to [Dr Jekyll and Mr Hyde], or suggested that this post proves that nobody at EMC is editing or reviewing Chuck's thoughts before they get posted. But it's too late, Chuck already got the message, and added the following to bring the discussion back to civility:
When considering the broad range of storage media service levels available today (flash, FC, SATA, spin-down, etc.) what's the best way to offer these media choices in an array? Is the answer (a) combine smaller arrays from different vendors together behind a virtualization head, or (b) invest the time and effort to build arrays that can directly support all of these media types?
Would anyone like to try a cogent response to the question posed, please?
To address ChuckH's question, Nigel's post gave me the idea to use today's celebration of the 200th birthday of [Charles Darwin].
Over millions of years, Charles Darwin argued, evolution results in change in the inherited traits of a population of organisms from one generation to the next. A key component of this is a biological process called [mitosis] that allows a single cell to split and become two cells. In some cases, these individual daughter cells can then specialize to specific functions, such as nerve cells, muscle cells or bone cells. Over time, adaptations that work well carry forward, and those that don't get left behind.
I find it interesting that before [On the Origin of Species] was published in 1859, works of fiction like Mary Shelley's [Frankenstein] had monsters being "created", and afterward, monsters were the result of mutation or selective adaptation.
Nigel compares EMC's monolithic approach to placing an intelligent front-end with a reliable back-end as "One man band, where one guy is trying playing all the instruments himself" versus the "Philharmonic Orchestra". I would take it one step further, comparing single-cell organisms to multi-cell life forms.
Innovative companies like Google and Amazon can't wait for a completely integrated solution from a major IT vendor to meet their needs. Why should they? There are open standards, and ways to interconnect the best intelligence into a [dynamic infrastructure®]. You don't need to wait another million years to see which way the IT marketplace considers the better approach. Just look at the last 60 years. Back then, computer systems were fully integrated: server, storage, and the wires that connected them were all inside a huge container. Then, mitosis happened, and IBM created external tape storage in 1952, and external disk storage in 1956. Open standards for interfaces allowed third party manufacturers like HDS, StorageTek and EMC to offer plug-compatible storage devices.
On the server side, it didn't take long for functionality in mainframes to split off. Mitosis happened again, with front-end UNIX systems processing incoming data, and mainframes handling the back-end databases and printing. The client-server era replaced dumb terminals with more intelligent desktops and workstations, and these could handle the front-end processing to display information, with the back-end storage and number-crunching being handled by the UNIX and mainframe systems they connected to. Connections between desktops and servers, and from servers to storage, have also evolved, from thousands of direct-attach cables to networks of switches and directors.
Charles Darwin was particularly interested in cases where evolution happened faster or slower than in other cases. While IBM and Microsoft encouraged third-party innovations on the PC side, Apple resisted mitosis, trying to keep its machines pure single-cell, integrated solutions. For the same reasons that you can't fight the laws of nature, Apple ended up having to support I/O ports to external devices. Thanks to open standards like USB and Firewire, you can connect third-party storage to Apple computers. My little Mac Mini at home has more devices hanging off it than any of my Windows or Linux boxes! And Apple's iPod is successful because its iTunes software runs on both Windows and Mac OS operating systems.
Every time mitosis happens in the IT industry, it opens up opportunities to specialize, to innovate, to adapt to a dynamically changing world. When mitosis is suppressed, you get limiting products and frustrated engineers leaving to form their own start-up companies. But when mitosis is encouraged, you get successful products, solutions and partnerships positioned for a smarter planet.
On his blog post on preparation, Seth Godin mentioned an appropriate Swedish saying:
There is no bad weather, just bad clothing.
Appropriate because it snowed here in Tucson, Arizona on Sunday evening, leaving many of us here figuring out how to drive through the stuff on Monday. In my entire lifetime, I have only witnessed snow down in the Tucson valley a handful of times. It got me thinking about coats, and the wonderful schemes for coat check rooms, as an analogy for data access. A lot of people ask me to compare and contrast one technology with another, say block-level virtualization with content-addressable storage, and so on, and I always try to find a good analogy to help explain things.
Let's start with the setting. It is snowing outside and people are wearing coats. When they come inside, they check their coats at a coat check room, a large room with rows and rows of racks with hangers. A coat check attendant takes your coat and puts it on a hanger, and gives you a ticket or other identifier that will allow you to retrieve your coat later. The ticket must have sufficient information to retrieve the coat quickly, rather than searching rows and rows of hangers for it.
Block-based disk storage
You walk to the coat-check desk, tell the attendant to hang your coat on a specific hanger, say hanger number 387. When you come back, you ask for the coat on hanger 387. The coat-check attendant knows exactly where hanger 387 is, and is able to retrieve it quickly. Most disk systems use this approach, including IBM SAN Volume Controller and DS family of disk systems.
Name-based disk storage
You walk to the coat-check desk, tell the person the name that you want to call your coat. An empty hanger is located, and a list of coat names, with their associated hanger number, is then kept. Upon return, you ask for your coat by name, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used by the IBM System Storage DR550, N series for NAS storage, and the IBM Healthcare and Life Sciences Grid Medical Archive Solution (GMAS).
Content-addressable storage (CAS)
You walk to the coat-check desk and hand them your coat. The attendant weighs your coat, checks the brand, the size, the number of buttons and zippers, types it all in, and the computer spits out a "hash code" from 1 to 99999. An empty hanger is found, and the hash code is associated to the hanger number. Upon return, you provide the hash code you were given, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used for some non-erasable, non-rewriteable storage, such as the EMC Centera.
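To make the three coat-check schemes concrete, here is a minimal sketch in Python. The class names and the in-memory lists are hypothetical, and SHA-256 stands in for the "hash code from 1 to 99999" purely for illustration; none of this reflects how any of the products named above are actually implemented.

```python
import hashlib


class BlockStore:
    """Block-based: the caller chooses the hanger number itself."""

    def __init__(self):
        self.hangers = {}  # hanger number -> coat

    def put(self, hanger_number, coat):
        self.hangers[hanger_number] = coat

    def get(self, hanger_number):
        return self.hangers[hanger_number]


class NameStore:
    """Name-based: the caller chooses a name; the store keeps a name list."""

    def __init__(self):
        self.hangers = []  # the hanger number is the list index
        self.index = {}    # name -> hanger number

    def put(self, name, coat):
        self.hangers.append(coat)
        self.index[name] = len(self.hangers) - 1

    def get(self, name):
        return self.hangers[self.index[name]]


class ContentStore:
    """Content-addressable: the ticket is computed from the coat itself."""

    def __init__(self):
        self.hangers = []
        self.index = {}    # hash code -> hanger number

    def put(self, coat):
        ticket = hashlib.sha256(coat).hexdigest()
        if ticket not in self.index:   # same hash code: assumed to be the same coat
            self.hangers.append(coat)
            self.index[ticket] = len(self.hangers) - 1
        return ticket

    def get(self, ticket):
        return self.hangers[self.index[ticket]]
```

In the first two schemes the caller supplies the identifier; in the third the store hands one back, and checking in an identical coat twice yields the same ticket and uses only one hanger.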
IBM invented hash codes in 1953 as a way to speed up searches. For example, if you want to look up a word in the dictionary, knowing the first letter of the word makes it much quicker, because you can thumb directly to that section. A hash code was intended to give a more even distribution, so that if a million words are stored in a "hash code dictionary" then you would calculate the hash code, then look up only that section of words associated with that specific hash code number.
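The "hash code dictionary" idea can be sketched in a few lines of Python. The section count of 64 and the use of CRC-32 are assumptions chosen only to keep the example small and repeatable; they are not taken from any historical implementation.

```python
import zlib

NUM_SECTIONS = 64  # hypothetical number of dictionary sections


def hash_code(word):
    # CRC-32 is used here only because it gives the same value on every run.
    return zlib.crc32(word.encode("utf-8")) % NUM_SECTIONS


def build_dictionary(words):
    # One list of words per hash code, like one section per first letter.
    sections = [[] for _ in range(NUM_SECTIONS)]
    for word in words:
        sections[hash_code(word)].append(word)
    return sections


def contains(sections, word):
    # Only the single section matching the word's hash code is searched.
    return word in sections[hash_code(word)]
```

With a million words spread evenly over the sections, each lookup scans roughly one sixty-fourth of the words instead of all of them.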
A problem arises when you generate "hash codes" for storage. It is possible for two different pieces of data to resolve to the same hash code. When an application tries to write a piece of data, and it resolves to a hash code that already exists, that is called a collision. One response is to compare the incoming data to the data that is already stored and confirm they are identical, but that can be time consuming. The other response is to just assume they are identical, and reject the secondary copy, a process often referred to as "de-duplication".
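The two responses to a collision can be compared with a small Python sketch. The hash function here is deliberately weak (byte sum modulo 100) so that collisions are easy to trigger, and the class and parameter names are hypothetical.

```python
def weak_hash(data):
    # Deliberately poor hash so that different data collides easily.
    return sum(data) % 100


class DedupStore:
    def __init__(self, verify):
        self.verify = verify  # True: compare bytes; False: trust the hash code
        self.by_hash = {}     # hash code -> list of stored byte strings

    def write(self, data):
        """Return True if data was stored, False if rejected as a duplicate."""
        bucket = self.by_hash.setdefault(weak_hash(data), [])
        if not bucket:
            bucket.append(data)
            return True
        if self.verify:
            if data in bucket:   # byte-for-byte comparison
                return False     # genuine duplicate, safe to reject
            bucket.append(data)  # collision on different data: keep it
            return True
        return False             # assume identical and reject

    def stored(self):
        return [d for bucket in self.by_hash.values() for d in bucket]
```

Writing b"ab" and then b"ba" (same byte sum, different content) keeps both items when verify=True, but silently drops the second when verify=False. Real systems use far stronger hash functions, which makes such a collision much less likely but does not remove the possibility.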
What's the chance of getting a collision for data that is really different? Let's take for example the famous Birthday paradox. Suppose the coat check room assigned the hanger based on your birthday (month and day). How many coats before you run the risk of having two people turn in coats with the same birthday? After only 23 people, the likelihood is 50%. At 60 people, it goes up to 99%.
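The birthday figures can be checked with a short calculation. This sketch assumes 365 equally likely birthdays and ignores leap days; the function name is my own.

```python
def collision_probability(people, slots=365):
    """Chance that at least two of `people` share one of `slots` values."""
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i values already taken.
        p_all_distinct *= (slots - i) / slots
    return 1.0 - p_all_distinct


print(round(collision_probability(23), 3))  # about 0.507
print(round(collision_probability(60), 3))  # about 0.994
```

The same function works for any hash code range by changing `slots`, for example 99999 for the coat-check hash codes described above.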
For this reason, IBM does not offer content-addressable storage. For non-erasable, non-rewriteable storage, the IBM System Storage DR550 requires the application to give each object a name, and that name is then used to store the data, eliminating the possibility that data might accidentally be thrown away.
The title of this post is inspired by Baxter Black's [latest book]. Rather than a recap of the break-out sessions, I thought I would comment on a few sentences, phrases or comments I heard in the afternoon and evening.
Stop buying storage from EMC or NetApp
The lunch was sponsored by Symantec. Rod Soderbery presented "Taking the cost out of cost savings", explaining some ideas to reduce IT costs immediately.
First, he suggested to "stop buying storage" from EMC or NetApp that charge a premium for tier-one products. Instead, Rod suggested that people should "think like a Web company" and buy only storage products based on commodity hardware to save money, and to use SRM software to identify areas of poor storage utilization. IBM's TotalStorage Productivity Center software is often used to help with this analysis.
His other suggestions were to adopt thin provisioning, data deduplication, and virtualization. The discussion at my table started with someone asking, "How do we adopt those functions without buying new storage capacity with those features already built-in?" I explained that IBM's SAN Volume Controller (SVC), N series gateways, and TS7650G ProtecTIER virtual tape gateway can all provide one or more of these features to your existing disk storage capacity.
IBM and HP are leaders in blade servers
In the session "Future of Server and OS: Disappearing Boundaries", the audience confirmed by electronic survey that IBM and HP are the leaders in blade servers, although blades represent only 8-10 percent of the overall server market.
Interestingly, 22 percent of the audience has deployed both x86 and non-x86 (POWER, SPARC, etc.) blade servers. The presenters considered this an interesting insight.
Another survey of the audience found that 3 percent considered Sun/STK as their primary storage vendor. One of the presenters was delighted that Sun is still hanging in there.
IBM Business Partners deliver the best of IBM and mask the worst
Elaine Lennox, IBM VP, and Mark Wyllie, CEO of Flagship Solutions Group, Inc. presented IBM-sponsored back to back sessions. Elaine presented IBM's vision, the New Enterprise Data Center, and the challenges that demand a smarter planet.
Mark focused on his company's experience working with IBM through Innovation Workshops. These are assessments that can help you identify where you are now, where you want to be, and then action plans to address the gaps.
Cats and Dogs, Oil and Water, Microsoft Windows and Mission-critical applications, what do all of these have in common?
NEC Corporation of America sponsored some sessions on some x86-based solutions they have to offer. The first part, titled "Rats Nests, Snow Drifts and Trailers", focused on unified storage, and the second part, presented by Michael Nixon, focused on how to bring Microsoft Windows servers into the data center for mission-critical applications.
The Economy might be slowing, but storage is still growing
Two analysts co-presented "The Enterprise Storage Scenario". Unlike computing capacity, there is no on/off switch for storage, not from applications nor from end-users. The cost of power for storage is expected to triple by 2013. Virtual servers, including VMware and Microsoft's Hyper-V, will drive the need for shared external disk storage. A survey of the audience found 20 percent were expecting to purchase additional storage capacity in 4Q08.
When someone reaches age 52, they expect to coast the rest of their career
At dinner with analysts, the discussion of financial meltdown and bailouts is unavoidable, including everyone's views about the proposed bailout of the Big 3 automakers. I can't defend Ford, GM and Chrysler paying their people $70 (US) per hour, when their US counterparts at Toyota or Honda are only paid $45 to $50 per hour.
However, I have a close friend who retired after 20 years working for the fire department, and a cousin who retired after 20 years serving in the Navy (the US Navy, not the Bolivian Navy), and both are still in their forties in age. A long time ago, IT professionals retired after 30 years, in some cases with 50 to 60 percent of their base pay as their pension for the rest of their lives. A 52-year-old that has worked 30 years might expect to enjoy the rest of his old age playing golf and pursuing other hobbies. This is not "coasting", it is called "retirement". The few of my colleagues that I have seen who worked 35 to 40 years did so because they enjoyed the challenge of work at IBM. They enjoyed solving tough engineering problems and helping customers. As long as they were having fun on the job, IBM was glad to keep their wealth of experience on board and actively engaged.
Unfortunately, many people rely on their own investments in the stock market for retirement, rather than company pensions. With the current financial crisis, I suspect many people my age are reconsidering their previous retirement plans.
We're going to need more trains!
I took the monorail back to my hotel. The ride includes funny announcements and statistics, including this gem:
"Since 1940, Las Vegas has doubled in population every ten years, which means that by the year 2230, we will have over 1 trillion people calling Las Vegas home. We're going to need more trains!"
That wraps up Tuesday, Day 2 of my attendance here! Now for some sleep.
Well, it's Tuesday again, and that means IBM announcements!
We've got a variety of storage-related items today, so here's my quick recap:
DS5020 and EXP520 disk systems
[IBM System Storage DS5020]
provides the functional replacement for DS4700 disk systems. Each combines a controller
and 16 drives in a compact 3U package.
The EXP520 expansion drawer provides an additional 16 drives per 3U drawer. A DS5020 can
support up to six additional EXP520 drawers, for a total of 112 drives per system.
The DS5020 supports both 8 Gbps FC as well as 1GbE iSCSI.
New Remote Support Manager (DS-RSM model RS2)
The [IBM System Storage DS-RSM Model RS2]
supports up to 50 disk systems, any mix of DS3000, DS4000 and DS5000 series.
It includes "call home" support, which is really "email home", sending error alerts to IBM
if there are any problems. The RSM also allows IBM to dial-in to perform diagnostics before
arrival, reducing the time needed to resolve a problem. The model RS2 is a beefier model
with more processing power than the prior generation RS1.
New Ethernet Switches
With the increased interest in iSCSI protocol, and the new upcoming Fibre Channel over
Convergence Enhanced Ethernet (FCoCEE), IBM's re-entrance into the ethernet switch market
has drawn a lot of interest.
The [IBM Ethernet Switch r-series]
offers 4-slot, 8-slot, 16-slot, and 32-slot models. Each slot can handle either
16 10GbE ports, or 48 1GbE ports. This means up to 1,536 ports.
The [c-series] now offers a
24-port model. This is either 24 copper and 4 fiber optic, or 24 fiber optic.
The "hybrid fiber" SFP fiber optic can handle either single or multi-mode, eliminating the
need to commit to one or the other, providing greater data center flexibility.
The [IBM Ethernet Switch B24X]
offers 24 fiber optic ports (that can handle 10GbE or 1GbE) and 4 copper ports (10/100/1000 MbE RJ45).
Storage Optimization and Integration Services
[IBM Storage Optimization and
Integration Services] are available. IBM service consultants use IBM's own
Storage Enterprise Resource Planner (SERP) software to evaluate your environment and provide
recommendations on how to improve your information infrastructure. This can be especially
helpful if you are looking at deploying server virtualization like VMware or Hyper-V.
As people look towards deploying a dynamic infrastructure, these new offerings can be a
I've blogged about some of these videos already, but since there are probably a few out there buying the brand new Apple iPhone looking for YouTube videos to play on them, these links might provide some example entertainment on your new handheld device.
Next week has the "Fourth of July" Independence Day holiday in the USA smack in the middle of the week, so I expect the blogosphere to quiet down a bit. So whether you are working next week or not, in the USA or elsewhere, take some time to enjoy your friends and family.
I'm glad to be back home in Tucson for a few weeks. All of these conferences kept me from keeping up with what was going on in the blogosphere.
A few of us at IBM found it odd that EMC would announce their new Geographically Dispersed Disaster Restart (GDDR) the week BEFORE their "EMC World" conference. Why not announce everything all at once at the conference instead? Were they worried that admitting the "Maui" software is still many months away would carry that much of a negative stigma? The decision probably went something like this:
EMCer #1: GDDR is finally ready, should we announce now, or wait ONE week to make it part of the things we announce at EMC World?
EMCer #2: We are not announcing much at EMC World and what people really want us to talk about, Maui, we aren't delivering for a while. Why can't people understand we are a company of hardware engineers, not software programmers! So, better not be associated with that quagmire at all.
EMCer #1: Yes, boss, I see your point. We'll announce this week then.
My fellow blogger and intellectual sparring partner, Barry Burke, on his Storage Anarchist blog, posted [are you wasting money on your mainframe dr solution?] to bring up the GDDR announcement. The key difference is that IBM GDPS works with IBM, EMC and HDS equipment, being the fair-and-balanced folks that IBM clients have come to expect, but it appears EMC GDDR works only with EMC equipment. Because GDDR does less, it also costs less. I can accept that. You get what you pay for. Of course, IBM does have a variety of protection levels, one of which will probably meet your budget and your business continuity needs.
To correct Barry's misperception, companies that buy IBM mainframe servers do have a choice. They can purchase their operating system from IBM, get their Linux or OpenSolaris from someone else like Red Hat or Novell, or build their own OS distribution from readily available open source. And unlike other servers that might require at least one OS partition from the vendor, IBM mainframes can run 100 percent Linux. GDPS supports a mix of OS data. z/OS and Linux data can all be managed by GDPS. Companies that own mainframes know this. I can forgive the misperception from Barry, as EMC is focused on distributed servers instead, and many in their company may not have much exposure to mainframe technology, or have ever spoken to mainframe customers.
But what almost had me fall out of my chair was this little nugget from his post:
"If you're an IBM mainframe customer, you are - by definition - IBM's profit stream."
Honestly, is there anyone out there that does not realize that IBM is a for-profit corporation? In contrast, Barry would like his readers to believe that EMC is selling GDDR at cost, and that EMC is a non-profit organization. While IBM has been delivering actual solutions that our clients want, EMC continues to rumor that someday they might get around to offering something worthwhile. In the last six months, the shareholders have interpreted both strategies for what they really are, and the stock prices reflect that:
(Disclosure: I own IBM stock. I do not own EMC stock. Stock price comparisons by Yahoo were based on publicly reported information. The colors blue and red to represent IBM and EMC, respectively, were selected by Yahoo's graph-making facility. The color red does not necessarily imply EMC is losing money or having financial troubles.)
Of course, I for one would love to help Barry's dream of EMC non-profitability come true. If anyone has any suggestions how we can help EMC approach this goal, please post a comment below.
Lakota Industries made news with the introduction of its [Sarah-Cuda Hunting Bow], named after moose-hunting U.S. Vice Presidential nominee and Governor of Alaska [Sarah Palin]. This has all the same features as their other high-end hunting bows, but is lighter, smaller and available in Pink Camo. This "pink-it-and-shrink-it" move was designed to broaden the market share of hunting bows by reaching out to the needs of women hunters.
Not to be outdone, today, at the Storage Networking World Conference, IBM announced the new IBM System Storage SAN Volume Controller Entry Edition [SVC EE].
The new SVC Entry Edition, available in Flamingo Pink* or traditional Raven Black.
* RPQ required. Default color is Raven Black.
You might be thinking: "Wait! IBM SVC is already the leading storage virtualization product among SMB clients today, why introduce a less expensive model?" With the global economy in the tank, IBM thought it would be nice to help out our smaller SMB clients with this new option.
This new offering is actually a combination of new software (SVC 4.3.1) and new hardware (2145-8A4). Here are the key differences:
Licensing: SVC Classic, by usable capacity managed, up to 8 PB; SVC EE, by number of disk drives, up to 60 drives
Hardware: SVC Classic, 2145-4F2, 8F2, 8F4, 8G4, 8A4
Node-pairs: SVC Classic, 1, 2, 3 or 4 node-pairs, depending on performance requirements; SVC EE, only one node-pair needed
Copy services: SVC Classic, FlashCopy, Metro Mirror and Global Mirror, licensed by subset of capacity used; SVC EE, FlashCopy, Metro Mirror and Global Mirror, but with simplified licensing
The SVC EE is not a "dumbed-down" version of the SVC Classic. It has all the features and functions of the SVC Classic, including thin provisioning with "Space-efficient volumes", Quality of Service (QoS) performance prioritization for more important applications, point-in-time FlashCopy, and both synchronous and asynchronous disk mirroring (Metro and Global Mirror).
While IBM has not yet published SPC-1 benchmarks for it, IBM is positioning the SVC EE as roughly 60 percent of the performance, at 60 percent of the list price, of a comparable SVC Classic 2145-8G4 configuration. The SVC Classic is already one of the fastest disk systems in the industry. By comparison, the SVC EE is twice as fast as the original SVC 2145-4F2 introduced five years ago. If you outgrow the SVC EE, no problem! The 2145-8A4 can be used in traditional SVC Classic mode, and the SVC EE software can be converted into the SVC Classic software license for upgrade purposes, protecting your original investment!
For those considering an HP EVA 4400 or EMC CX-4 disk system, you might want to look at combining an SVC EE with [IBM System Storage DS3400] disk. The combination offers more features and capabilities, and helps reduce your IT costs at the same time.
And if you are worried you can't afford it right now, IBM Global Financing is offering a ["Why Wait?" world-wide deferral of interest and payments] for 90 days, so you don't have to make your first payment until 2009, applicable to all IBM System Storage products, including the SVC EE, SVC Classic and DS3400 disk systems.
This week is Thanksgiving holiday in the USA, so I thought a good theme would be things I am thankful for.
I'll start with saying that I am thankful EMC has finally announced Atmos last week. This was the "Maui" part of the Hulk/Maui rumors we heard over a year ago. To quickly recap, Atmos is EMC's latest storage offering for global-scale storage intended for Web 2.0 and Digital Archive workloads. Atmos can be sold as just software, or combined with Infiniflex, EMC's bulk, high-density commodity disk storage systems. Atmos supports traditional NFS/CIFS file-level access, as well as SOAP/REST object protocols.
I'm thankful for various reasons, here's a quick list:
It's hard to compete against "vaporware"
Back in the 1990s, IBM was trying to sell its actual disk systems against StorageTek's rumored "Iceberg" project. It took StorageTek some four years to get this project out, but in the meantime, we were comparing actual versus possibility. The main feature is what we now call "Thin Provisioning". Ironically, StorageTek's offering was not commercially successful until IBM agreed to resell this as the IBM RAMAC Virtual Array (RVA).
Until last week, nobody knew the full extent of what EMC was going to deliver on the many Hulk/Maui theories. Several hinted as to what it could have been, and I am glad to see that Atmos falls short of those rumored possibilities. This is not to say that Atmos can't reach its potential, and certainly some of the design is clever, such as offering native SOAP/REST access.
Instead, IBM now can compare Atmos/Infiniflex directly to the features and capabilities of IBM's Scale Out File Services [SoFS], which offers a global-scale multi-site namespace with policy-based data movement, IBM System Storage Multilevel Grid Access Manager [GAM] that manages geographically distributed information, and IBM [XIV Storage System] that offers high-density bulk storage.
Web 2.0 and Digital Archive workloads justify new storage architectures
When I presented SoFS and XIV earlier this year, I mentioned they were designed for the fast-growing Web 2.0 and Digital Archive workloads that were unique enough to justify their own storage architectures. One criticism was that SoFS appeared to duplicate what could be achieved with dozens of IBM N series NAS boxes connected with Virtual File Manager (VFM). Why invent a new offering with a new architecture?
With the Atmos announcement, EMC now agrees with IBM that the Web 2.0 and Digital Archive workloads represent a unique enough "use case" to justify a new approach.
New offerings for new workloads will not impact existing offerings for existing workloads
I find it amusing that EMC is quickly defending that Atmos will not eat into its DMX business, which is exactly the FUD they threw out about IBM XIV versus DS8000 earlier this year. In reality, neither the DS8000 nor the DMX were used much for Web 2.0 and Digital Archive workloads in the past. Companies like Google, Amazon and others had to either build their own from piece parts, or use low-cost midrange disk systems.
Rather, the DS8000 and DMX can now focus on the workloads they were designed for, such as database applications on mainframe servers.
Cloud-Oriented Storage (COS)
Just when you thought we had enough terminology already, EMC introduces yet another three-letter acronym [TLA]. Kudos to EMC for coining phrases to help move new concepts forward.
Now, when an RFP asks for Cloud-oriented storage, I am thankful this phrase will help serve as a trigger for IBM to lead with SoFS and XIV storage offerings.
Digital archives are different than Compliance Archives
EMC was also quick to point out that object-storage Atmos was different from their object-storage EMC Centera. The former being for "digital archives" and the latter for "compliance archives". Different workloads, different use cases, different offerings.
Ever since IBM introduced its [IBM System Storage DR550] several years ago, EMC Centera has been playing catch-up to match IBM's many features and capabilities. I am thankful the Centera team was probably too busy to incorporate Atmos capabilities, so it was easier to make Atmos a separate offering altogether. This allows the IBM DR550 to continue to compete against Centera's existing feature set.
Micro-RAID arrays, logical file and object-level replication
I am thankful that one of the Atmos policy-based features is replicating individual objects, rather than LUN-based replication and protection. SoFS supports this for logical files regardless of their LUN placement, GAM supports replication of files and medical images across geographical sites in the grid, and the XIV supports this for 1MB chunks regardless of their hard disk drive placement. The 1MB chunk size was based on the average object size from established Web 2.0 and Digital Archive workloads.
I tried to explain the RAID-X capability of the XIV back in January, under much criticism that replication should only be done at the LUN level. I am thankful that Marc Farley on StorageRap coined the phrase [Micro-RAID array] to help move this new concept further. Now, file-level, object-level and chunk-level replication can be considered mainstream.
Much larger minimum capacity increments
The original XIV in January was 51TB capacity per rack, and this went up to 79TB per rack for the most recent IBM XIV Release 2 model. Several complained that nobody would purchase disk systems at such increments. Certainly, small and medium size businesses may not consider XIV for that reason.
I am thankful Atmos offers 120TB, 240TB and 360TB sizes. The companies that purchase disk for Web 2.0 and Digital Archive workloads do purchase disk capacity in these large sizes. Service providers add capacity to the "Cloud" to support many of their end-clients, and so purchasing disk capacity to rent back out represents a revenue-generating opportunity.
Renewed attention on SOAP and REST protocols
IBM and Microsoft have been pushing SOA and Web Services for quite some time now. REST, which stands for [Representational State Transfer], allows static and dynamic HTML message passing over standard HTTP. SOAP, which originally stood for [Simple Object Access Protocol] and was later also expanded as "Service Oriented Architecture Protocol", takes this one step further, allowing different applications to send "envelopes" containing messages and data between applications using HTTP, RPC, SMTP and a variety of other underlying protocols. Typically, these messages are simple text surrounded by XML tags, easily stored as files, or rows in databases, and served up by SOAP nodes as needed.
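To make the "envelope" idea concrete, here is a minimal sketch in Python that wraps a request in a SOAP 1.1 Envelope and Body and reads it back. Only the SOAP 1.1 envelope namespace is standard; the application namespace, the operation name and the parameter names are made up for illustration and do not come from SoFS, XIV, Atmos or any other product.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace, used only for this illustration.
APP_NS = "urn:example:storage"


def build_envelope(operation: str, params: dict) -> str:
    """Wrap a simple request in a SOAP 1.1 Envelope/Body and return it as text."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(request, f"{{{APP_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_params(xml_text: str) -> dict:
    """Extract the parameters of the first request element inside the Body."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    request = body[0]
    # Strip the "{namespace}" prefix that ElementTree puts on each tag.
    return {child.tag.split("}", 1)[1]: child.text for child in request}


if __name__ == "__main__":
    message = build_envelope("GetObject", {"objectId": "42"})
    print(message)
    print(read_params(message))
```

Because the result is plain XML text, the same string could be written to a file, stored in a database row, or sent over HTTP or SMTP, which is the point made above.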
It's hard to show leadership until there are followers
IBM's leadership sometimes goes unnoticed until followers create "me, too!" offerings or establish similar business strategies. IBM's leadership in Cloud and Grid computing is no exception. Atmos is the latest me-too product offering in this space, trying pretty much to address the same challenges that SoFS and XIV were designed for.
So, perhaps EMC is thankful that IBM has already paved the way, breaking through the ice on their behalf. I am thankful that perhaps I won't have to deal with as much FUD about SoFS, GAM and XIV anymore.
Now that IBM XIV has proven that 1TB SATA drives are safe for high-end tier-1 enterprise class use, we extended DS8000 support to include SATA drives as well. DS8000 supports RAID-6 and RAID-10 for these.
Intelligent Write Caching
IBM Research conducts extensive investigations into improved algorithms for cache management. Intelligent Write Caching boosts performance for both temporal and spatial locality.
Remote Pair FlashCopy®
This allows you to FlashCopy volume A to volume B, with Volume B remotely mirrored to Volume C at a secondary location, via Metro Mirror. This allows you to have a consistent copy of your data at both locations.
IBM was the first in the industry to deliver tape-drive encryption, so it makes sense that IBM is also the first in the industry to deliver disk-drive encryption. These are 15K rpm drives in standard 146GB, 300GB and 450GB capacities. As with tape, encrypting at the disk device eliminates the huge overhead from server-based encryption methods.
Solid State Drive (SSD)
You can also have Solid State Disk drives in your DS8000, in 73GB and 146GB capacities, protected by RAID-5. If you are wondering what data to put on these much-faster drives, IBM has taken the work and worry out by having intelligence in DB2 to optimize what gets placed on SSD to get the most performance improvement.
IBM System Storage XIV
Continuing the incredible marketplace excitement over its Cloud-Optimized Storage [XIV series], IBM now has announced [new capacity options]. The IBM XIV R2 that we announced in August 2008 was a fixed 15 module configuration. In the new configurations, you can start with as little as six modules, representing a 40% partial rack of the original full model. Here is a table that shows the details:
[Table: Usable Capacity (TB), Fibre Channel Ports, Cache Memory (GB)]
IBM System Storage N series
And last, but not least, we have two new models in IBM's [N6000 series]. The [N6060] has model A12 (single controller) and model A22 (dual controller). These are disk-less controllers that you can configure in either appliance mode or gateway mode. In appliance mode, you can attach disk drawers such as the EXN1000, EXN2000 or EXN4000. In gateway mode, you attach external disk systems, such as the IBM DS8000 or XIV above.
It's ruggedized to handle earthquakes. IBM brings a feature that we've had for a while on other disk systems to the N series with a collection of bolts and anchors to secure the rack from physical tremors.
It's instrumented for IBM Active Energy Manager, a component of IBM Systems Director. New iPDUs are designed to help measure and monitor energy management components. As companies get more concerned about the fate of the planet, monitoring energy consumption can help reduce carbon footprint.
I'll cover the rest of the announcements tomorrow!
Many people have asked me if there was any logic with the IBM naming convention of IBM Systems branded servers. Here's your quick and easy cheat sheet:
System x -- "x" for cross-platform architecture. Technologies from our mainframe and UNIX servers were brought into chips that sit next to the Intel or AMD processors to provide a more reliable x86 server experience. For example, some models have a POWER processor-based Remote Supervisor Adapter (RSA).
System p -- "p" for POWER architecture.
System z -- "z" for Zero-downtime, zero-exposures. Our lawyers prefer "near-zero", but this is about as close as you get to ["six-nines" availability] in our industry. With the highest level of security and encryption, no other vendor comes close, so you get the idea.
But what about the "i" for System i? Officially, it stands for "Integrated" in that it could integrate different applications running on different operating systems onto a [COMMON] platform. Options were available to insert Intel-based processor cards that ran Windows, or attach special cables that allowed separate System x servers running Windows to attach to a System i. Both allowed Windows applications to share the internal LAN and SAN inside the System i machine. Later, IBM allowed [AIX on System i] and [Linux on Power] operating systems to run as well.
From a storage perspective, we often joked that the "i" stood for "island", as most System i machines used internal disk, or attached externally to only a few selected models of disk from IBM and EMC that had special support for i5/OS using a special, non-standard 520-byte disk block size. This meant only our popular IBM System Storage DS6000 and DS8000 series disk systems were available. This block size requirement only applies to disk. For tape, i5/OS supports both IBM TS1120 and LTO tape systems. For the most part, System i machines stood separate from the mainframe, and the rest of the Linux, UNIX and Windows distributed servers on the data center floor.
Often, when I am talking to customers, they ask when product xyz will be supported on System z or System i. I explain that IBM's strategy is not to make all storage devices connect via ESCON/FICON or support non-standard block sizes, but rather to get the servers to use standard 512-byte block size, Fibre Channel and other standard protocols. (The old adage applies: If you can't get Mohamed to move to the mountain, get the mountain to move to Mohamed).
On the System z mainframe, we are 60 percent there, allowing three of the five operating systems (z/VM, z/VSE and Linux) to access FCP-based disk and tape devices. (Four out of six if you include [OpenSolaris for the mainframe].) But what about System i? As the characters on the popular television show [LOST] would say: It's time to get off the island!
Last week, IBM announced the new [i5/OS V6R1 operating system] with features that will greatly improve the use of external storage on this platform. Check this out:
POWER6-based System i 570 model server
Our latest, most powerful POWER processor brought to the System i platform. The 570 model will be the first in the System i family of servers to make use of new processing technology, using up to 16 (sixteen!) POWER6 processors (running at 4.7GHz) in each machine. The advantage of the new processors is the increased commercial processing workload (CPW) rating, 31 percent greater than the POWER5+ version and 72 percent greater than the POWER5 version. CPW is the "MIPS" or "TeraFlops" rating for comparing System i servers. Here is the [Announcement Letter].
Fibre Channel Adapter for System i hardware
That's right, these are [Smart IOAs], so an I/O Processor (IOP) is no longer required! You can even boot the Initial Program Load (IPL) directly from SAN-attached tape. This brings System i to the 21st century for Business Continuity options.
Virtual I/O Server (VIOS)
[Virtual I/O Server] has been around for System p machines, but is now available on System i as well. This allows multiple logical partitions (LPARs) to access resources like Ethernet cards and FCP host bus adapters. In the case of storage, the VIOS handles the 520-byte to 512-byte conversion, so that i5/OS systems can now read and write to standard FCP devices like the IBM System Storage DS4800 and DS4700 disk systems.
IBM System Storage DS4000 series
Initially, we have certified DS4700 and DS4800 disk systems to work with i5/OS, but more devices are in plan. This means that you can now share your DS4700 between i5/OS and your other Linux, UNIX and Windows servers, take advantage of a mix of FC and SATA disk capacities, RAID6 protection, and so on.
To call [IBM PowerVM] the "VMware for the POWER architecture" would not quite do it justice. In combination with VIOS, IBM PowerVM is able to run a variety of AIX, Linux and i5/OS guest images. The "Live Partition Mobility" feature allows you to easily move guest images from one system to another, while they are running, just like VMotion for x86 machines.
And while we are on the topic of x86, PowerVM is also able to represent a Linux-x86 emulation base to run x86-compiled applications. While many Linux applications could be recompiled from source code for the POWER architecture "as is", others required perhaps 1-2 percent modification to port them over, and that was too much for some software development houses. Now, we can run most x86-compiled Linux application binaries in their original form on POWER architecture servers.
BladeCenter JS22 Express
The POWER6-based [JS22 Express blade] can run i5/OS, taking advantage of PowerVM and VIOS to access all of the BladeCenter resources. The BladeCenter lets you mix and match POWER and x86-based blades in the same chassis, providing the ultimate in flexibility.
With all the announcements we had in June, it is easy for some of the more subtle enhancements to get overlooked. While I was at Orlando for the IBM Edge conference, I was able to blog about some of the key featured announcements. Then, later, when I got back from Orlando to Tucson, I was able to then blog about [More IBM Storage Announcements]. For IBM's Scale-Out Network Attach Storage (SONAS), I had simply:
"SONAS v1.3.2 adds support for management by the newly announced IBM Tivoli Storage Productivity Center v5.1 release. Also, IBM now officially supports Gateway configurations that have the storage nodes connected to XIV or Storwize V7000 disk systems. These gateway configurations offer new flexible choices and options for our ever-expanding set of clients."
In my defense, IBM numbers its software releases with version.release.modification, so 1.3.2 is Version 1, Release 3, Modification 2. Generally, modification announcements don't get much attention. The big announcement for v1.3.0 of SONAS happened last October, see my blog post [October 2011 Announcements - Part I] or the nice summary post [IBM Scale-out Network Attached Storage 1.3.0] from fellow blogger Roger Luethy.
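For readers who like to see the numbering scheme spelled out, here is a small Python sketch that splits a version.release.modification string into its three parts and checks whether two levels differ only at the modification level. The names `VRM`, `parse_vrm` and `is_modification_only` are my own for illustration, not anything from an IBM tool.

```python
from typing import NamedTuple


class VRM(NamedTuple):
    """Version, release and modification, as in 1.3.2."""
    version: int
    release: int
    modification: int


def parse_vrm(text: str) -> VRM:
    """Parse strings such as "1.3.2" or "v1.3.2" into a VRM tuple."""
    parts = text.lstrip("vV").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected version.release.modification, got {text!r}")
    return VRM(*(int(part) for part in parts))


def is_modification_only(old: VRM, new: VRM) -> bool:
    """True when version and release match and only the modification went up."""
    return old[:2] == new[:2] and new.modification > old.modification


if __name__ == "__main__":
    print(parse_vrm("v1.3.2"))
    print(is_modification_only(parse_vrm("1.3.0"), parse_vrm("1.3.2")))
```

By this test, going from 1.3.0 to 1.3.2 is a modification-only change, which is why it drew less attention than the 1.3.0 announcement.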
Here is a diagram showing the three configurations of SONAS.
I have covered the SONAS Appliance model in depth in previous blogs, with options for fast and slow disk speeds, choice of RAID protection levels, a collection of enterprise-class software features provided at no additional charge, and interfaces to support a variety of third party backup and anti-virus checking software.
The basics haven't changed. The SONAS appliance consists of 2 to 32 interface nodes, 2 to 60 storage nodes, and up to 7,200 disk drives. The maximum configuration takes up 17 frames and holds 21.6PB of raw disk capacity, which is about 17PB usable space when RAID6 is configured. An interface node has one or two hex-core processors with up to 144GB of RAM to offer up to 3.5GB/sec performance each. This makes IBM SONAS the fastest performing and most scalable disk system in IBM's System Storage product line.
I thought I would go a bit deeper on the gateway models. These models support up to ten storage nodes, organized in pairs. The key difference is that instead of internal disk controllers, the storage nodes connect to external disk systems. There is enough space in the base SONAS rack to hold up to six interface nodes, or you can add a second rack if you need more interface nodes for increased performance.
SONAS with XIV gateway
XIV offers a clever approach to storage that allows for incredibly fast access to data on relatively slow 7200 RPM drives. By scattering data across all drives and taking advantage of parallel processing, rebuild times for a failed 3TB drive are less than 75 minutes. Compare that to typical rebuild times for 3TB drives that could take as much as 9-10 hours under active I/O loads!
In the configuration, each pair of storage nodes can connect to external SAN Fabric switches that then connect to one or two XIV storage systems. How simple is that? These can be the original XIV systems that support 1TB and 2TB drives, or the new XIV Gen3 systems that support 400GB Solid-state drives (SSD) and 3TB spinning disk drives. In both cases, you can acquire additional storage capacity as little as 12 drives at a time (one XIV module holds 12 drives).
The maximum configuration of ten XIV boxes could hold 1,800 drives. At 3TB per drive, that would be 2.4PB usable capacity.
The SONAS with XIV gateway does not require the XIV devices to be dedicated for SONAS purposes. Rather, you can assign some XIV storage space for the SONAS, and the rest is available for other servers. In this manner, SONAS just looks like another set of Linux-based servers to the XIV storage system. This in effect gives you "Unified Storage", with a full complement of NAS protocols from the SONAS side (NFS, CIFS, FTP, HTTPS, SCP) as well as block-based protocols directly from the XIV (FCP, iSCSI).
SONAS with Storwize V7000 gateway
The other gateway offering is the SONAS with Storwize V7000. Like the SONAS with XIV gateway model, you connect a pair of SONAS storage nodes to 1 or 2 Storwize V7000 disk systems. However, you do not need a SAN Fabric switch in between. You can instead connect the SONAS storage nodes directly to the Storwize V7000 control enclosures.
To acquire additional storage capacity, you can purchase a single drive at a time. That's right. Not 12 drives, or 60 drives, at a time, but one at a time. The Storwize V7000 supports a wide range of SSD, SAS and NL-SAS drives at different sizes, speeds and capacities. The drives can be configured into various RAID protection levels: RAID 0, 1, 3, 5, 6 and 10.
Each Storwize V7000 control enclosure can have up to nine expansion drawers. If you choose the 2.5-inch 24-bay models, you can have up to 480 drives per storage node pair, for a total of 2,400 drives. If you choose the 3.5-inch 12-bay models, you can have up to 240 drives per node pair, 1,200 drives total. At 3TB per drive, this could be 3.6PB of raw capacity. The usable PB would depend on which RAID level you selected. Of course, you don't have to limit yourself all to one size or the other. Feel free to mix 2.5-inch and 3.5-inch drawers to provide different storage pool capabilities.
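The drive counts above follow from simple multiplication, which the Python sketch below walks through. It assumes the figures given in this post: one control enclosure plus up to nine expansion drawers per Storwize V7000, up to two Storwize V7000 systems per storage node pair, and ten storage nodes (five pairs) per gateway. The constant and function names are mine, chosen for illustration.

```python
# Figures taken from the text; the names are illustrative only.
ENCLOSURES_PER_V7000 = 1 + 9   # one control enclosure plus up to nine expansion drawers
V7000_PER_NODE_PAIR = 2        # each storage node pair connects to 1 or 2 systems
NODE_PAIRS = 5                 # up to ten storage nodes, organized in pairs


def max_drives(bays_per_enclosure: int) -> tuple:
    """Return (drives per storage node pair, drives in total)."""
    per_pair = bays_per_enclosure * ENCLOSURES_PER_V7000 * V7000_PER_NODE_PAIR
    return per_pair, per_pair * NODE_PAIRS


def raw_capacity_pb(drives: int, tb_per_drive: float) -> float:
    """Raw capacity in PB, using decimal units (1PB = 1000TB)."""
    return drives * tb_per_drive / 1000


if __name__ == "__main__":
    for bays in (24, 12):
        per_pair, total = max_drives(bays)
        print(f"{bays}-bay enclosures: {per_pair} drives per node pair, {total} total")
    _, total_lff = max_drives(12)
    print(f"Raw capacity with 3TB drives: {raw_capacity_pb(total_lff, 3)} PB")
```

Usable capacity would be lower than the raw figure and depends on the RAID level chosen, so the sketch deliberately stops at raw capacity.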
All three SONAS configurations support Active Cloud Engine. This is a collection of features that differentiate SONAS from the other scale-out NAS wannabees in the marketplace:
Policy-driven Data Placement -- Different files can be directed to different storage pools. You no longer have to associate certain file systems to certain storage technologies.
High-speed Scan Engine -- SONAS can scan 10 million files per minute, per node. These scans can be used to drive data migration, backups, expirations, or replications, for example. It is over 100 times faster than traditional walk-the-directory-tree approaches employed by other NAS solutions.
Policy-driven Migration -- You can migrate files from one storage pool to another, based on age, days since last reference, size, and other criteria. The files can be moved from disk to disk, or moved out of SONAS and stored on external media, such as tape or a virtual tape library. A lot of data stored on NAS systems is dormant, with little or no likelihood of being looked at again. Why waste money keeping that kind of data on expensive disk? With SONAS, moving those files to tape can save lots of money. The files are stubbed in the SONAS file system, so that an access request to a file will automatically trigger a recall to fetch the data from tape back to the SONAS system.
Policy-driven Expiration -- SONAS can help you keep your system clean, by helping you decide what files should be deleted. This is especially useful for things like logs and traces that tend to just hang around until someone deletes them manually.
WAN Caching -- This allows one SONAS to act as a "Cloud Storage Gateway" for another SONAS at a remote location connected by Wide Area Network (WAN). Let's say your main data center has a large SONAS repository of files, and a small branch office has a smaller SONAS. This allows all locations to have a "Global" view of all the interconnected SONAS systems, with a high-speed user experience for local LAN-based access to the most recent and frequently used files.
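To illustrate the idea behind policy-driven migration, here is a rough Python sketch that picks files to move from one storage pool to another based on days since last reference and size. This is not the SONAS policy language or any IBM API; the `FileInfo` and `MigrationRule` classes, the pool names and the thresholds are all hypothetical, chosen only to show how a rule is evaluated against the results of a scan.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical scan result for one file."""
    path: str
    size_bytes: int
    days_since_access: int
    pool: str


@dataclass(frozen=True)
class MigrationRule:
    """Hypothetical rule: move idle files from one pool to another."""
    from_pool: str
    to_pool: str
    min_days_idle: int
    min_size_bytes: int = 0

    def matches(self, info: FileInfo) -> bool:
        return (
            info.pool == self.from_pool
            and info.days_since_access >= self.min_days_idle
            and info.size_bytes >= self.min_size_bytes
        )


def plan_migration(files: Iterable[FileInfo], rule: MigrationRule) -> List[Tuple[str, str]]:
    """Return (path, target pool) pairs for every file the rule selects."""
    return [(info.path, rule.to_pool) for info in files if rule.matches(info)]


if __name__ == "__main__":
    scanned = [
        FileInfo("/logs/a.trc", 10_000, 400, "disk"),
        FileInfo("/db/hot.dat", 5_000_000, 1, "disk"),
        FileInfo("/old/b.img", 2_000_000, 200, "tape"),
    ]
    rule = MigrationRule(from_pool="disk", to_pool="tape", min_days_idle=180)
    print(plan_migration(scanned, rule))
```

An expiration policy would look much the same, with the action being a delete rather than a move to another pool.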
If you want to learn more, see the [IBM SONAS landing page]. Next week, I will be across the Pacific Ocean in [Taipei], to teach IBM Top Gun class to sales reps and IBM Business Partners. "Selling SONAS" will be one of the topics I will be covering!
We had a great event today! This was a first-of-a-kind product launch, using Second Life as the medium. We invited IBM Business Partners, industry analysts and reporters from the Press to have their "avatars" in-world to watch us launch new tape systems, archive and retention systems, and disk systems announced this month.
Andy Monshaw, IBM System Storage General Manager, welcomed everyone to the event, and introduced our three speakers. He mentioned that this was a great innovative way to meet, collaborate and forge relationships without the carbon pollution associated with travel required by a more traditional face-to-face meeting. We had attendees from the USA, UK, Germany, Sweden, Italy, Colombia, and Brazil.
All the attendees were given a "goody bag" that contained IBM BP-logo clothing, animations and gestures to be used during the meeting.
Eric Buckley, one of our marketing managers for tape systems, introduced our complete line of LTO 4 tape systems, as well as the TS7520 Virtualization Engine, a virtual tape library for Windows, UNIX and Linux servers. Eric had a virtual 3-D version of an LTO cartridge that is photo-realistic and dimensionally correct.
Funda Eceral, our solutions manager for archive and retention offerings, presented the new version of the IBM System Storage DR550, the DR550 file system gateway, and the IBM System Storage Multilevel Grid Archive Manager. At first we thought we would "pass the microphone" from speaker to speaker, but it turned out to be easier just to give all three speakers their own microphone.
Last, but not least, was David Tareen, marketing manager for disk systems, covering the entry-level DS3000 Express disk system bundles designed for our SMB clients. David used a black-and-brown pointer stick to point out specific things on the charts.
After the presentations, Kristie Bell, VP of Marketing for IBM System Storage, hosted a Question & Answer (Q&A) panel. Avatars raised their left hand to indicate they had a question.
We thought it would be a good idea to have a few minutes at the end to socialize over a cup of coffee. This involved making a "coffee machine" that dispensed coffee, and the appropriate animations and gestures so that everyone could sip the coffee, and hold the coffee at waist level when they were talking.
The event was held upstairs in one of the conference rooms of the IBM Briefing Center, located on "IBM 8" island. Many people went to the ground floor to look at the many IBM System Storage products on display. Unlike a picture on a web-page, Second Life gives you a 3-D view, so you can walk around each product and get a feel for the size and shape of the hardware.
We had four photographers and camera-persons on hand to capture still shots, video, audio, and chat text, and are working now to combine them for marketing collateral. I want to thank the builders, script programmers, animators, clothing designers, speakers, editors, and channel enablement team for making this event such a great success!
A lot of people ask me about IBM branding, as we have recently changed brands. In the past we had two separate brands, one for servers (eServer) and one for storage (TotalStorage). These would be fine if we wanted to promote their independence, but customers today want synergy between servers and storage, they want systems that work well together.
Last year, in response to market feedback, we created a new brand, "IBM Systems" and put all the server and storage product lines under one roof. Over time, we will transition from TotalStorage to System Storage naming. This will occur with new products, and major versions of existing products.
Two other phrases you will hear in the names of our offerings are "Virtualization Engine" and "Express". These are portfolio identifiers. The Virtualization Engine identifier was created to emphasize our leadership in system virtualization, and we have products that span product lines with this identifier.
The Express identifier was created to emphasize our focus on Small and Medium sized business (SMB). It spans not just servers and storage, but across other offerings from other IBM divisions.
Of course, just renaming products and services isn't enough. Systems don't work together just because they have similar names, are covered in similar "Apple white" plastic, or have similar black bezels. Obviously, thoughtful and collaborative design is needed, with the appropriate amounts of engineering and testing. IBM is aligning its server and storage development so that the IBM Systems brand keeps its promise.
Perhaps the recent financial meltdown is making storage vendors nervous. Both IBM and EMC gained market share in 3Q08, but EMC is acting strangely at IBM's latest series of plays and announcements. Almost contradictory!
Benchmarks bad, rely on your own in-house evaluations instead
Let's start with fellow blogger Barry Burke from EMC, who offers his latest post [Benchmarketing Badly] with commentary about Enterprise Strategy Group's [DS5300 Lab Validation Report]. The IBM System Storage DS5300 is one of IBM's latest midrange disk systems recently announced. Take for example this excerpt from BarryB's blog post:
"I was pleasantly surprised to learn that both IBM and ESG agree with me about the relevance and importance of the Storage Performance Council benchmarks.
That is, SPC's are a meaningless tool by which to measure or compare enterprise storage arrays."
Nowhere does the ESG report say this, nor have I found any public statements from either IBM or ESG that make this claim. Instead, the ESG report explains that traditional benchmarks from the Storage Performance Council [SPC] focus on a single, specific workload, and ESG has chosen to complement this with a variety of other benchmarks to perform their product validation, including VMware's "VMmark", Oracle's Orion Utility, and Microsoft's JetStress.
Benchmarks provide prospective clients additional information to make purchase decisions. IBM understands this, ESG understands this, and other well-respected companies like VMware, Oracle and Microsoft understand this. EMC is afraid that benchmarks might encourage a client to "mistakenly" purchase a faster IBM product rather than a slower EMC product. Sunshine makes a great disinfectant, but EMC (and vampires) prefer their respective "prospects" remain in the dark.
Perhaps stranger still is BarryB's postscript. Here's an excerpt:
"... a customer here asked me if EMC would be willing to participate in an initiative to get multiple storage vendors to collaborate on truly representative real-world "enterprise-class" benchmarks, and I reassured him that I would personally sponsor active and objective participation in such an effort - IF he could get the others to join in with similar intent."
As I understand it, EMC was once part of the Storage Performance Council a long time ago, then chose to drop out of it. Why re-invent the wheel by creating yet another storage industry benchmark group? EMC is welcome to come back to SPC anytime! In addition to the SPC-1 and SPC-2 workloads, there is work underway for an SPC-3 benchmark. Each SPC workload provides additional insight for product comparisons to help with purchase decisions. If EMC can suggest an SPC-4 benchmark that it feels is more representative of real-world conditions, they are welcome to join the SPC party and make that a reality.
The old adage applies: ["It's better to light a candle than curse the darkness"]. EMC has been cursing the lack of what it considers to be acceptable benchmarks but has yet to offer anything more realistic or representative than SPC. What does EMC suggest you do instead? Get an evaluation box and run your own workloads and see for yourself! EMC has in the past offered evaluation units specifically for this purpose.
In-house evaluations bad, it's a trap!
Certainly, if you have the time and staff to run your own evaluation, with your own applications in your own environment, then I agree with EMC that this can provide better insight for your particular situation than standardized benchmarks.
In fact, that is exactly what IBM is doing for IBM XIV storage units, which are designed for Web 2.0 and Digital Archive workloads that current SPC benchmarks don't focus on. Fellow blogger Chuck Hollis from EMC opines in his post [Get yer free XIV!]. Here's an excerpt:
"Now that I think about it, this could get ugly. Imagine a customer who puts one on the floor to evaluate it, and -- in a moment of desperation or inattention -- puts production data on the device.
Nobody was paying attention, and there you are. Now IBM comes calling for their box back, and you've got a choice as to whether to go ahead and sign the P.O., or migrate all your data off the thing. Maybe they'll sell you an SVC to do this?
Yuck. I bet that happens more than once. And I can't believe that IBM (or the folks at XIV) aren't aware of this potentially happening."
Perhaps Chuck is speaking from experience here, as this may have happened with customers with EMC evaluation boxes, and is afraid this could happen with IBM XIV. I don't see anything unique about IBM XIV in the above concern. Typical evaluations involve copying test data onto the box, testing it out with some particular application or workload, and then deleting the data no longer required. Repeat as needed. Moving data off an IBM XIV is as easy as moving data off an EMC DMX, EMC CLARiiON or EMC Celerra, and I am sure IBM would gladly demonstrate this on any EMC gear you now have.
Thanks to its clever RAID-X implementation, losing data on an IBM XIV is less likely than losing data on any RAID-5 based disk array from any storage vendor. Of course, there will always be skeptics about new technology who will want to try the box out for themselves.
If EMC thought the IBM XIV had nothing unique to offer, that its performance was just "OK", and that it is not as easy to manage as IBM says it is, then you would think EMC would gladly encourage such evaluations and comparisons, right?
No, I think EMC is afraid that companies will discover what they already know, that IBM has quality products that would stand a fair chance in side-by-side comparisons with their own offerings. We have enough fear, uncertainty and doubt from our current meltdown of the global financial markets, don't let EMC add any more.
Have a safe and fun Halloween! If you need to add some light to your otherwise dark surroundings, consider some of these ideas for [Jack-O-Lanterns]!
Array-based replication does have drawbacks; all externalised storage becomes dependent on the virtualising array. This makes replacement potentially complex. To date, HDS have not provided tools to seamlessly migrate away from one USP to another (as far as I am aware). In addition, there's the problem of "all your eggs in one basket"; any issue with the array (e.g. physical intervention like fire, loss of power, microcode bug etc) could result in loss of access to all of your data. Consider the upgrade scenario of moving to a higher level of code; if all data was virtualised through one array, you would want to be darn sure that both the upgrade process and the new code are going to work seamlessly...
The final option is to use fabric-based virtualisation and at the moment this means Invista and SVC. SVC is an interesting one as it isn't an array and it isn't a fabric switch, but it does effectively provide switching capabilities. Although I think SVC is a good product, there are inevitably going to be some drawbacks, most notably those similar issues to array-based virtualisation (Barry/Tony, feel free to correct me if SVC has a non-disruptive replacement path).
I would argue that the IBM System Storage SAN Volume Controller (SVC) is more like the HDS USP, and less like the Invista. Both SVC and USP provide a common look and feel to the application server, both provide additional cache to external disk, both are able to provide a consistent set of copy services.
IBM designed the SVC so that upgrades can occur non-disruptively. You can replace the hardware nodes, one node at a time, while the SVC system is up and running, without disruption to reading and writing data on virtual disk. You can upgrade the software, one node at a time, while the SVC system is up and running, without disruption to reading and writing data on virtual disk. You can upgrade the firmware on the managed disk arrays behind the SVC, again, without disruption to reading and writing data on virtual disk.
More importantly, SVC has the ultimate "un-do" feature. It is called "image mode". If for any reason you want to take a virtual disk out of SVC management, you migrate over to an "image mode" LUN, and then disconnect it from SVC. The "image mode" LUN can then be used directly, with all the file system data in tact.
I define "virtualization" as technology that makes one set of resources look and feel like a different set of resources with more desirable characteristics. For SVC, the more desirable characteristics include choice of multi-pathing driver, consistent copy services, improved performance, etc. For EMC Invista, the question is "more desirable for whom?" EMC Invista seems more designed to meet EMC's needs, not its customers. EMC profits greatly from its EMC PowerPath multi-pathing driver, and from its SRDF copy services, so it appears to have designed a virtualization offering that:
Continues the use of EMC PowerPath as a multi-pathing driver. SVC supports drivers that are provided at no charge to the customer, as well as those built-in to each operating system like MPIO.
and, continues the use of array-based copy services like SRDF of the underlying disk. SVC provides consistent copy services regardless of storage vendor being managed.
A post from Dan over at Architectures of Control explains the anti-social nature of public benches. City planners, in an effort to discourage homeless people from sleeping on benches in parks or sidewalks, design benches that are so uncomfortable to use that nobody uses them. These include benches made of metal that are too hot or too cold during certain months, benches slanted at an angle that dump you on the ground if you lie down, and benches that have dividers so that you must be in an upright seated position to use them.
This is not a disparagement of split-path switch-based designs. Rather, EMC's specific implementation appears to be designed for it to continue vendor lock-in for its multi-pathing driver, continue vendor lock-in for its copy services when used with EMC disk, and only provide slightly improved data migration capability for heterogeneous storage environments. Other switch-based solutions, such as those from Incipient or StoreAge, had different goals in mind.
Sadly, my IBM colleague BarryW and I have probably spent more words discussing Invista than all eleven EMC bloggers combined this year. While everyone in the industry is impressed how often EMC can sell "me, too" products with an incredibly large marketing budget, EMC appears not to have set aside funds for the Invista.
If a customer could design the ideal "storage virtualization" solution that would provide them the characteristics they desire the most from storage resources, it would not be anything like an Invista. While there are pros and cons between IBM's SVC and HDS's TagmaStore offerings, the reason both IBM and HDS are the market leaders in storage virtualization is because both companies are trying to provide value to the customer, just in different ways, and with different implementations.
Based on this success, and perhaps because I am also fluent in Spanish, I was asked to help with Proyecto Ceibal, the team for OLPC Uruguay. Normally the XS school server resides at the school location itself, so that even if the internet connection is disrupted or limited, the school kids can continue to access each other and the web cache content until internet connection is resumed. However, with a diverse development team with people in the United States, Uruguay, and India, we first looked to Linux hosting providers that would agree to provide free or low-cost monthly access. We spent (make that "wasted") the month of May investigating. Most that I talked to were not interested in having a customized Linux kernel on non-standard hardware on their shop floor, and wanted instead to offer their own standard Linux build on existing standard servers, managed by their own system administrators, or were not interested in providing it for free. Since the XS-163 kernel is customized for the x86 architecture, it is one of those exceptions where we could not host it on an IBM POWER or mainframe as a virtual guest.
This got picked up as an [idea] for Google's [Summer of Code] and we are mentoring Tarun, a 19-year-old student, to act as lead software developer. However, summer was fast approaching, and we wanted this ready for the next semester. In June, our project leader, Greg, came up with a new plan: build a machine and have it connected at an internet service provider that would cover the cost of bandwidth, and be willing to accept this with remote administration. We found a volunteer organization to cover this -- Thank you Glen and Vicki!
We found a location, so the request to me sounded simple enough: put together a PC from commodity parts that meet the requirements of the customized Linux kernel, the latest release being called [XS-163]. The server would have two disk drives, three Ethernet ports, and 2GB of memory; and be installed with the customized XS-163 software, SSHD for remote administration, Apache web server, PostgreSQL database and PHP programming language. Of course, the team wanted this for as little cost as possible, and for me to document the process, so that it could be repeated elsewhere. Some stretch goals included having a dual-boot with Debian 4.0 Etch Linux for development/test purposes, an alternative database such as MySQL for testing, a backup procedure, and a Recover-DVD in case something goes wrong.
Some interesting things happened:
The XS-163 is shipped as an ISO file representing a LiveCD bootable Linux that will wipe your system clean and lay down the exact customized software for a one-drive, three-Ethernet-port server. Since it is based on Red Hat's Fedora 7 Linux base, I found it helpful to install that instead, and experiment moving sections of code over. This is similar to geneticists extracting the DNA from the cell of a pit bull and putting it into the cell of a poodle. I would not recommend this for anyone not familiar with Linux.
I also experimented with modifying the pre-built XS-163 CD image by cracking open the squashfs, hacking the contents, and then putting it back together and burning a new CD. This provided some interesting insight, but in the end I was able to do it all from the standard XS-163 image.
Once I figured out the appropriate "scaffolding" required, I managed to proceed quickly, with running versions of XS-163, plain vanilla Fedora 7, and Debian 4, in a multi-boot configuration.
The BIOS "raid" capability was really more like BIOS-assisted RAID for Windows operating system drivers. This "fake raid" wasn't supported by Linux, so I used Linux's built-in "software raid" instead, which allowed some partitions to be raid-mirrored, and other partitions to be un-mirrored. Why not mirror everything? With two 160GB SATA drives, you have three choices:
No RAID, for a total space of 320GB
RAID everything, for a total space of 160GB
Tiered information infrastructure, use RAID for some partitions, but not all.
The last approach made sense, as a lot of the data is cached web page images, and is easily retrievable from the internet. This also allowed me to have some "scratch space" for downloading large files and so on. For example, 90GB mirrored that contained the OS images, settings and critical applications, and 70GB on each drive for scratch and web cache, results in a total of 230GB of disk space, which is a 43 percent improvement over an all-RAID solution.
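The capacity arithmetic for the three choices can be checked with a few lines of Python. This is only a sketch of the numbers quoted above; the function name and its parameters are made up for illustration.

```python
def usable_capacity_gb(drive_gb, mirrored_gb, drives=2):
    """Usable space when `mirrored_gb` of each drive is RAID-1 mirrored
    and the remainder of each drive is left un-mirrored."""
    if mirrored_gb > drive_gb:
        raise ValueError("mirrored portion cannot exceed drive size")
    scratch_per_drive = drive_gb - mirrored_gb
    # The mirrored region counts once; scratch space counts on every drive.
    return mirrored_gb + scratch_per_drive * drives

no_raid = usable_capacity_gb(160, 0)      # nothing mirrored
all_raid = usable_capacity_gb(160, 160)   # everything mirrored
tiered = usable_capacity_gb(160, 90)      # 90GB mirrored, 70GB scratch per drive

# Improvement of the tiered layout over mirroring everything, in percent.
gain_pct = (tiered - all_raid) / all_raid * 100

print(no_raid, all_raid, tiered, gain_pct)
```

The tiered layout lands between the two extremes: the mirrored 90GB is counted once, and the 70GB of scratch space is counted on both drives.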
While [Linux LVM2] provides software-based "storage virtualization" similar to the hardware-based IBM System Storage SAN Volume Controller (SVC), it was a bad idea putting different "root" directories of my many OS images on there. Linux, as with most operating systems, expects things to be in the same place as when it last shut down, but in a multi-boot environment, you might boot the first OS, move things around, and then when you try to boot the second OS, it doesn't work anymore, or corrupts what it does find, or hangs with a "kernel panic". In the end, I decided to use RAID non-LVM partitions for the root directories, and only use LVM2 for data that is not needed at boot time.
While they are both Linux, Debian and Fedora were different enough to cause me headaches. Settings were different, parameters were different, file directories were different. Not quite as religious as MacOS-versus-Windows, but you get the picture.
During this time, the facility was out getting a domain name, IP address, subnet mask and so on, so I tested with my internal 192.168.x.y and figured I would change this to whatever it should be the day I shipped the unit. (I'll find out next week if that was the right approach!)
Afraid that something might go wrong while I am in Tokyo, Japan next week (July 7-11), or Mumbai, India the following week (July 14-18), I added a Secure Shell [SSH] daemon that runs automatically at boot time. This involves putting the public key on the server, and each remote admin has their own private key on their own client machine. I know all about public/private key pairs, as IBM is a leader in encryption technology, and was the first to deliver built-in encryption with the IBM System Storage TS1120 tape drive.
Giving users access to all their files from any OS image required that I either (a) have identical copies everywhere, or (b) have a shared partition. The latter turned out to be the best choice, with an LVM2 logical volume for the "/home" directory that is shared among all of the OS images. As we develop the application, we might find other directories that make sense to share as well.
For developing across platforms, I wanted the Ethernet devices (eth0, eth1, and so on) to match the actual ports they are supposed to be connected to in a static IP configuration. Most people use DHCP so it doesn't matter, but the XS software requires this, so it did. For example, "eth0" as the 1 Gbps port to the WAN, and "eth1/eth2" as the two 10/100 Mbps PCI NIC cards to other servers. Naming the internet interfaces to specific hardware ports was different on Fedora and Debian, but I got it working.
While it was a stretch goal to develop a backup method, one that could perform Bare Machine Recovery from media burned to DVD, it turned out I needed to do this anyways just to prevent me from losing my work in case things went wrong. I used an external USB drive to develop the process, and got everything to fit onto a single 4GB DVD. Using IBM Tivoli Storage Manager (TSM) for this seemed overkill, and [Mondo Rescue] didn't handle LVM2+RAID as well as I wanted, so I chose [partimage] instead, which backs up each primary partition, mirrored partition, or LVM2 logical volume, keeping all the time stamps, ownerships, and symbolic links intact. It has the ability to chop up the output into fixed-size pieces, which is helpful if you are going to burn them on 700MB CDs or 4.7GB DVDs. In my case, my FAT32-formatted external USB disk drive can't handle files bigger than 2GB, so this feature was helpful for that as well. I standardized on 660 MiB [about 692MB] per piece, since that met all criteria.
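As a sanity check on the piece size, the short Python sketch below converts 660 binary megabytes to bytes and compares it against the three limits mentioned above. The limits are taken at face value from the text as decimal sizes, and the 4GB backup total is treated as a round number for illustration only.

```python
import math

MIB = 1024 * 1024
PIECE_BYTES = 660 * MIB  # 660 binary megabytes, roughly 692 million bytes

# Limits as quoted in the text, interpreted as decimal sizes (an assumption).
LIMITS = {
    "700MB CD": 700 * 10**6,
    "4.7GB DVD": 4700 * 10**6,
    "2GB file limit on the USB drive": 2 * 10**9,
}

def fits_everywhere(piece_bytes, limits=LIMITS):
    """True when one piece is no larger than every listed limit."""
    return all(piece_bytes <= limit for limit in limits.values())

def pieces_needed(total_bytes, piece_bytes=PIECE_BYTES):
    """Number of fixed-size pieces needed to hold a backup of total_bytes."""
    return math.ceil(total_bytes / piece_bytes)

print(PIECE_BYTES, fits_everywhere(PIECE_BYTES), pieces_needed(4 * 10**9))
```

A single piece size that satisfies the smallest limit works for all three media, which is why one value could be used throughout.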
The folks at [SysRescCD] saved the day. The standard "SysRescueCD" assigned eth0, eth1, and eth2 differently than the three base OS images, but the nice folks in France that write SysRescCD created a customized [kernel parameter that allowed the assignments to be fixed per MAC address] in support of this project. With this in place, I was able to make a live Boot-CD that brings up SSH, with all the users, passwords, and Ethernet devices to match the hardware. I installed this LiveCD as the "Rescue Image" on the hard disk itself, and also made a Recovery-DVD that boots up just like the Boot-CD, but contains the 4GB of backup files.
For testing, I used Linux's built-in Kernel-based Virtual Machine [KVM], which works like VMware, but is open source and included in the 2.6.20 kernels that I am using. IBM is the leading reseller of VMware and has been doing server virtualization for the past 40 years, so I am comfortable with the technology. The XS-163 platform with Apache and PostgreSQL servers serves as a platform for [Moodle], an open source class management system, and the combination is memory-intensive enough that I did not want to incur the overheads of running production in this manner, but it was great for testing!
With all this in place, it is designed to not need a Linux system admin or XS-163/Moodle expert at the facility. Instead, all we need is someone to insert the Boot-CD or Recover-DVD and reboot the system if needed.
Just before packing up the unit for shipment, I changed the IP addresses to the values they need at the destination facility, updated the [GRUB boot loader] default, and made a final backup, which I burned to the Recover-DVD. Hopefully, it works by just turning on the unit, [headless], without any keyboard, monitor or configuration required. Fingers crossed!
So, thanks to the rest of my team: Greg, Glen, Vicki, Tarun, Marcel, Pablo and Said. I am very excited to be part of this, and look forward to seeing this become something remarkable!
Well, it's Tuesday again, and we have more IBM announcements.
XIV asynchronous mirror
For those not using XIV behind SAN Volume Controller, [XIV now offers native asynchronous mirroring] support to another XIV far, far away. Unlike other disk systems that are limited to two or three sites, an XIV can mirror to up to 15 other sites. The mirroring can be at the individual volume, or a consistency group of multiple volumes. Each mirror pair can have its own recovery point objective (RPO). For example, a consistency group of mission critical application data might be given an RPO of 30 seconds, but less important data might be given an RPO of 20 minutes. This allows the XIV to prioritize packets it sends across the network.
As with XIV synchronous mirror, this new asynchronous mirror feature can send the data over either its
Fibre Channel ports (via FCIP) or its Ethernet ports.
The IBM System Storage SAN384B and SAN768B directors now offer [two new blades!]
A 24-port FCoCEE blade where each port can handle 10Gb Convergence Enhanced Ethernet (CEE). CEE can be used to transmit Fibre Channel, TCP/IP, iSCSI and other Ethernet protocols. This connects directly to the server's converged network adapter (CNA) cards.
A 24-port mixed blade, with 12 FC ports (1Gbps, 2Gbps, 4Gbps, 8Gbps), 10 Ethernet ports (1GbE) and 2 Ethernet ports (10GbE). This would connect to traditional server NIC, TOE and HBA cards as well as traditional NAS, iSCSI and FC based storage devices.
IBM also announced the IBM System Storage [SAN06B-R Fibre Channel router]. This has 16 FC ports (1Gbps up to 8Gbps) and six Ethernet ports (1GbE), with support for both FC routing as well as FCIP extended distance support.
With the holiday season coming up at the end of the year, now is a great time to ask Santa for a new shiny pair of XIV systems, and some extra networking gear to connect them.
IDC announced that IBM was #1 in storage hardware (disk and tape combined) for 2006. Here are some excerpts from the IBM press release:
The newly released May 2007 report by leading industry analyst firm IDC, "Worldwide Combined Disk and Tape Storage 2006 Market Share Update," shows IBM in the #1 overall position for all disk and tape storage hardware for the full year 2006.
In a total disk and tape storage hardware segment that increased to $28.2 billion in 2006, IBM captured 22.2 percent of the combined revenue for full year 2006, besting HP's 20.9 percent and EMC's 13.2 percent.
Five years ago, IBM was only #3 in this area, but is this new standing from IBM doing things better, or HP and EMC doing things poorly? Probably a little of both, but since it's not polite to point out the flaws of others in a blog, I will focus on what IBM is doing right, and I think our leadership in tape accounts for a good measure of this.
The resurgence of tape comes from a variety of factors:
The focus on being "green", to reduce energy, power and cooling costs. Tape is the cheapest storage in this regard, as the tape cartridges only consume power when read or written.
Government regulations where more data must be stored for longer periods of time, such as the Federal Rules of Civil Procedure (FRCP), Sarbanes-Oxley, SEC regulations, and so on.
The widening gap in dollars per MB. Advancements in tape are outpacing disk. Disk is slowing down to about 25% improvement year on year, but tape continues its 30-40% improvement curve. A solution like Information Lifecycle Management (ILM) that moves older less valuable data from disk to tape can result in excellent cost savings.
Exciting "combined storage" solutions like the IBM System Storage DR550 and the IBM Grid Medical Archive Solution (GMAS) that combine disk and tape with internal hierarchy storage management of data, based on policies.
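To see why the gap in dollars per MB widens, here is a small Python sketch that compounds the yearly improvement rates quoted above. It assumes that "improvement" means the cost per MB falls by that fraction each year, and it normalizes both media to the same starting cost, which is purely an illustrative simplification and not a real price comparison.

```python
def relative_cost(annual_improvement, years):
    """Cost per MB after `years`, relative to today, if the cost falls
    by `annual_improvement` (for example 0.25 for 25%) every year."""
    return (1.0 - annual_improvement) ** years

def tape_to_disk_ratio(years, disk_rate=0.25, tape_rate=0.30):
    """How tape cost per MB compares to disk cost per MB after `years`,
    with both normalized to 1.0 today (hypothetical starting point)."""
    return relative_cost(tape_rate, years) / relative_cost(disk_rate, years)

for year in range(0, 6):
    low = tape_to_disk_ratio(year, tape_rate=0.30)
    high = tape_to_disk_ratio(year, tape_rate=0.40)
    print(f"year {year}: tape/disk cost ratio between {high:.2f} and {low:.2f}")
```

Because the rates compound, even a few percentage points of difference per year keeps pulling the two curves apart.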
Whew! I am glad that is over. The BarryB circus has left town, he has decided to [move on to other topics], and I am now left to clean up the ["circus gold"] left behind. I would like to remind everyone that all of these discussions have been about the architecture, not the product. IBM will come out with its own version of a product based on Nextra later in 2008, which may be different than the product that XIV currently sells to its customers.
RAID-X does not protect against double-drive failures as well as RAID-6, but it's very close
BarryB calls this the "Elephant in the room", that RAID-6 protects better against double-drive failures. I don't dispute that. He also credits me with the term "RAID-X", but I got this directly from the XIV guys. It turns out this was already a term used among academic research circles for [distributed RAID environments]. Meanwhile, Jon Toigo feels the term RAID-X sounds like a brand of bug spray in his post [XIV Architecture: What’s Not to Like?] Perhaps IBM can change this to RAID-5.99 instead.
If you measure risk of a second drive failing during the rebuild or re-replication process of a first drive failure, you can measure the exposure by multiplying the amount of GB at risk by the number of hours during which the second failure could occur, resulting in a unit of "GB-hours". Here I list best-case rebuild times; your mileage may vary depending on whether other workloads exist on the system competing for resources. Notice that 8-disk configurations of RAID-10 and RAID-5 for smaller FC disk are in the triple digits, and larger SATA disk in five digits, but that with RAID-X it is only single digits. That is orders of magnitude closer to the ideal.
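To make the GB-hours measure concrete, here is a minimal Python sketch. The drive sizes and the rebuild rate of 100 GB per hour are hypothetical values chosen for illustration; they are not the best-case figures referred to above, and the model assumes the whole drive stays at risk for the entire rebuild.

```python
def gb_hours(gb_at_risk, exposure_hours):
    """Exposure measure: amount of data at risk times how long it is at risk."""
    return gb_at_risk * exposure_hours

def rebuild_exposure(drive_gb, rebuild_gb_per_hour):
    """GB-hours for a traditional rebuild onto a spare, assuming the full
    drive capacity is exposed until the rebuild completes."""
    hours = drive_gb / rebuild_gb_per_hour
    return gb_hours(drive_gb, hours)

RATE = 100.0  # hypothetical rebuild rate in GB per hour

small = rebuild_exposure(500, RATE)
large = rebuild_exposure(1000, RATE)
print(small, large, large / small)
```

With the rebuild rate held fixed, a drive twice as large holds twice the data and takes twice as long to rebuild, so the exposure in this model grows fourfold.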
For each RAID type, the risk is proportional to the square of the individual drive size. Double the drive size causes the risk to be four times greater. This is not the first time this has been discussed. In [Is RAID-5 Getting Old?], Ramskov quotes NetApp's response in Robin Harris' [NetApp Weighs In On Disks]:
...protecting online data only via RAID 5 today verges on professional malpractice.
As disks get older, RAID-6 will not be able to protect against 3-drive failures. A chart similar to the one above could show the risk to data after the second drive fails and both rebuilds are going on, compared to the risk of a third drive failure during this time. The RAID-X scheme protects much better against 3-drive failures than RAID-6.
Nothing in the Nextra architecture prevents a RAID-6, Triple-copy, or other blob-level scheme
In much the same way that EMC Centera is RAID-5 based for its blobs, there is nothing in the Nextra architecture that prevents taking additional steps to provide even better protection, using a RAID-6 scheme, making three copies of the data instead of two copies, or something even more advanced. The current two-copy scheme for RAID-X is better than all the RAID-5 and RAID-10 systems out in the marketplace today.
Mirrored Cache won't protect against Cosmic rays, but ECC detection/correction does
BarryB incorrectly states that since some implementations of cache are non-mirrored, this implies they are unprotected against Cosmic rays. Mirroring does not protect against bit-flips unless both copies are compared for differences. Unfortunately, even if you compared them, the best you can do is detect that they are different; there is no way of knowing which version is correct. Mirroring cache is normally done to protect uncommitted writes. Reads in cache are expendable copies of data already written to disk, so ECC detection/correction schemes are adequate protection. ECC is like RAID for DRAM memory. A single bit-flip can be corrected, multiple bit-flips can be detected. In the case of detection, the cache copy is discarded and read fresh again from disk. IBM DS8000, XIV and probably most other major vendor offerings use ECC of some kind. BarryB is correct that some cheaper entry-level and midrange offerings from other vendors might cut corners in this area. I don't doubt BarryB's assertion that the ECC method used in the EMC products may be differently implemented than the ECC in the IBM DS8000, but that doesn't mean the IBM DS8000's ECC implementation is flawed.
ECC protection is important for all RAID systems that perform rebuild, and even more important the larger the GB-hours listed in the table above.
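The "correct one flip, detect two" behavior can be illustrated with a textbook extended Hamming code. The Python sketch below protects 4 data bits with 4 check bits; it is a teaching example only and makes no claim about the ECC actually used in the DS8000, XIV or any EMC product.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of bits).
    Index 0 is an overall parity bit; indexes 1..7 form a Hamming(7,4) code."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of 1s even
    return code

def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions holding a 1
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Odd number of flips: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not fixable.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble

word = encode(0b1011)
word[6] ^= 1                                # simulate one bit-flip
print(decode(word))
```

In a read cache, the "uncorrectable" case corresponds to discarding the cached copy and reading the data fresh from disk.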
XIV is designed for high-utilization, not less than 50 percent
I mentioned that the typical Linux, UNIX or Windows LUN is only 30-50 percent full, and perhaps BarryB thought I was referring to the typical "XIV customer". This average is for all disk storage systems connected to these operating systems, based on IBM market research and analyst reports. The XIV is expected to run at much higher utilization rates, and offers features like "thin provisioning" and "differential snapshot" to make this simple to implement in practice.
Most often, disks don't fail without warning. Usually, they give out temporary errors first, and then fail permanently. The XIV architecture allows for pre-emptive self-repair, initiating the re-replication process after detecting temporary errors, rather than waiting for a complete drive failure.
I had mentioned that this process used "spare capacity, not spare drives" but I was notified that there are three spare drives per system to ensure that there is enough spare capacity, so I stand corrected.
New drives don't have to match the same speed/capacity as the old drives, so three to five years from now, when it might be hard to find a matching 500GB SATA drive anymore, you won't have to.
No RAID scheme eliminates backups or Business Continuity Planning
The XIV supports both synchronous and asynchronous disk mirroring to remote locations. Backup software will be able to backup data from the XIV to tape. A double drive failure would require a "recovery action", either from the disk mirror, or from tape, for the few GB of data that need to be recovered.
A third alternative is to allow end-users to receive backups of their own user-generated content. For example, I have over 15,000 photos uploaded over the past six years to Kodak Photo Gallery, which I use to share with my friends and family. For about $180 US dollars, they will cut DVDs containing all of my uploaded files and send them to me, so that I do not have to worry about Kodak losing my photos. In many cases, if a company or product fails to deliver on its promises, the most you will get is your money back, but for "free services" like HotMail, FreeDrive, FlickR and others, you didn't pay anything in the first place, and they may point out this limitation of liability in the "terms of service".
XIV can be used for databases and other online transaction processing
The XIV will have FCP and iSCSI interfaces, and systems can use these to store any kind of data you want. I mentioned that the design was intended for large volumes of unstructured digital content, but there is nothing to prevent the use of other workloads. In today's Wall Street Journal article [To Get Back Into the Storage Game, IBM Calls In an Old Foe]:
Today, XIV's Nextra system is used by Bank Leumi, a large Israeli bank, and a few other customers for traditional data-storage tasks such as recording hundreds of transactions a minute.
BarryB, thanks for calling the truce. I look forward to talking about other topics myself. These past two weeks have been exhausting!
I welcome HDS into the "Super High-End" club. Those who follow my blog might remember that I suggested that analysts like IDC that use "Entry Level", "Midrange" and "Enterprise" as categories may need a New Category: Super High End.
I was not surprised to see EMC, who now drops further down in perception, dispute HDS's recent SPC-1 benchmarks. Fellow blogger BarryB from EMC posted [IBM vs. Hitachi] on his Storage Anarchist blog, which points out that IBM's SAN Volume Controller (SVC) is still much faster, and less expensive, than USP-V.
So, just in case you haven't seen all the press releases, here is a quick recap on the results: IBM SVC 4.2 is still in first place, then HDS USP-V, then IBM System Storage DS8300. Just for comparison, I include our IBM System Storage DS4800 midrange disk results, so you can appreciate the difference between midrange and high-end. There are other products from other vendors; I just point out a few from IBM and HDS here in this graph.
******************************************************************** 272,505 IOPS - IBM SVC 4.2
************************************************** 200,245 IOPS - HDS USP-V
******************************* 123,033 IOPS - IBM DS8300
*********** 45,014 IOPS - IBM DS4800
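For anyone who wants to redraw this kind of text graph with their own numbers, a short Python sketch follows. The scale of one asterisk per 4,000 IOPS is my assumption, chosen because it gives bars of about the same length as the ones above.

```python
# SPC-1 results quoted in this post: (system, IOPS).
results = [
    ("IBM SVC 4.2", 272_505),
    ("HDS USP-V", 200_245),
    ("IBM DS8300", 123_033),
    ("IBM DS4800", 45_014),
]

def bar_line(name, iops, iops_per_star=4000):
    """One line of the chart: a bar of asterisks followed by the label."""
    stars = "*" * round(iops / iops_per_star)
    return f"{stars} {iops:,} IOPS - {name}"

chart = [bar_line(name, iops) for name, iops in results]
print("\n".join(chart))
```

Changing `iops_per_star` rescales every bar consistently, which keeps the relative lengths meaningful.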
HDS tried to come up with a phrase "Enterprise Storage System" for comparison that would leave the SVC 4.2 out. Given that the SVC has five nines (99.999%) availability, has non-disruptive upgrade and firmware update capability, has more than two processors typical of midrange products, and can connect to mainframes via z/VM, z/VSE and Linux on System z operating systems, there is no reason to pretend SVC isn't Enterprise-class.
The irony is that EMC now looks very lonely being one of the last remaining major storage vendors not to participate in standardized benchmarks that help customers make purchase decisions, as mentioned both by IBM's BarryW in [I guess that only leaves EMC] and by HDS's Claus Mikkelsen in [Olympics of Storage].
Earlier this year, EMC's Chuck Hollis opined in [Storage Scorecard] that the EMC DMX and HDS TagmaStore USP were high-end boxes; I would speculate that both would fall somewhere between the DS4800 and DS8300 on the graph above. If that is the case, it is impressive that HDS was able to re-engineer their USP-V to be 2-3x faster than its predecessor, the USP.
Not all workloads are the same, and your mileage may vary. While I can't speak to HDS, the folks over at EMC have assured me, in written comments on this blog, that there is nothing preventing their customers from publishing their own performance comparisons between EMC and non-EMC equipment. I would encourage every customer to do this, between IBM and HDS, HDS and EMC, and between IBM and EMC, to help shed even more light on this area. In fact, you can even run your own SPC benchmarks to see how your own environment compares to the ones published.
Of course, performance is just one attribute on which to choose a storage vendor, and to choose specific products, models or features. For more information about Storage Performance Council and the SPC-1 and SPC-2 benchmarks, see my week-long series on SPC benchmarks, which are listed in reverse chronological order.
Go to the official Storage Performance Council website to read the details of the SPC-1 results.
On Tuesday, I covered much of the Feb 26 announcements, but left the IBM System Storage DS8000 for today so that it can have its own special focus.
Many of the enhancements relate to z/OS Global Mirror, which we formerly called eXtended Remote Copy or "XRC", not to be confused with our "regular" Global Mirror that applies to all data. For those not familiar with z/OS Global Mirror, here is how it works. The production mainframe writes updates to the DS8000, and the DS8000 keeps track of these in cache until a "reader" can pull them over to the secondary location. The "reader" is called System Data Mover (SDM), which runs in its own address space under the z/OS operating system. Thanks to some work my team did several years ago, z/OS Global Mirror was able to extend beyond z/OS volumes and include Linux on System z data. Linux on System z can use a "Compatible Disk Layout" (CDL) format (now the default) that meets all the requirements to be included in the copy session.
IBM has over 300 deployments of z/OS Global Mirror, mostly banks, brokerages and insurance companies. The feature can keep tens of thousands of volumes in one big "consistency group" and asynchronously mirror them to any distance on the planet, with the secondary copy recovery point objective (RPO) only a few seconds behind the primary.
Extended Distance FICON
Extended Distance FICON is an enhancement to the industry-standard FICON architecture (FC-SB-3) that can help avoid degradation of performance at extended distances by implementing a new protocol for "persistent" Information Unit (IU) pacing. This deals with the number of packets in flight between servers and storage separated by long distances, and can keep a link fully utilized at 4Gbps FICON up to 50 kilometers. This is particularly important for the z/OS Global Mirror "reader" System Data Mover (SDM). By having many "reads" in flight, this enhancement can help reduce the need for spoofing or channel-extender equipment, or allow you to choose lower-cost channel extenders based on "frame-forwarding" technology. All of this helps reduce your total cost of ownership (TCO) for a complete end-to-end solution.
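Why the number of packets in flight matters can be shown with simple bandwidth-delay arithmetic. The Python sketch below uses rule-of-thumb assumptions that are mine, not from the announcement: about 5 microseconds per kilometer for light in fiber, roughly 400 MB/sec of usable throughput on a 4Gbps link, and a hypothetical 8 KiB unit size. It illustrates the general principle only and does not describe how IU pacing is specified in the FICON standard.

```python
import math

def round_trip_seconds(distance_km, us_per_km=5.0):
    """Round-trip propagation delay over fiber (rule-of-thumb 5 us per km)."""
    return 2 * distance_km * us_per_km * 1e-6

def bytes_in_flight(distance_km, throughput_bytes_per_s=400e6):
    """Data that must be outstanding to keep the link busy for one round trip."""
    return throughput_bytes_per_s * round_trip_seconds(distance_km)

def units_in_flight(distance_km, unit_bytes, throughput_bytes_per_s=400e6):
    """How many units of `unit_bytes` must be in flight to fill the link."""
    needed = bytes_in_flight(distance_km, throughput_bytes_per_s)
    return math.ceil(needed / unit_bytes)

for km in (10, 50, 100):
    print(km, "km:", round(bytes_in_flight(km)), "bytes,",
          units_in_flight(km, 8192), "units of 8 KiB (hypothetical size)")
```

The required amount of outstanding data grows linearly with distance, so a sender that is limited to a small, fixed number of units in flight leaves the link idle for part of every round trip.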
This feature will be available in March as a no-charge update to the DS8000 microcode. For more details, see the [IBM Press Release]
z/OS Global Mirror process offload to zIIP processors
To understand this one, you need to understand the different "specialty engines" available on the System z.
On distributed systems where you run a single application on a single piece of server hardware, you might pay "per server", "per processor" or lately "per core" for dual-core and quad-core processors. Software vendors were looking for a way to charge smaller companies less, and larger companies more. However, you might end up paying the same whether you use a 1GHz Intel or a 4GHz Intel processor, even though the latter can do four times more work per unit time.
The mainframe has a few processors for hundreds or thousands of business applications. In the beginning, all engines on a mainframe were general-purpose "Central Processor" or CP engines. Based on their cycle rate, IBM was able to publish the number of Million Instructions per Second (MIPS) that a machine with a given number of CP engines can do. With the introduction of side co-processors, this was changed to "Millions of Service Units" or MSU. Software licensing can charge per MSU, and this allows applications running in as little as one percent of a processor to get appropriately charged.
One of the first specialty engines was the IFL, the "Integrated Facility for Linux". This was a CP designated to only run z/VM and Linux on the mainframe. You could "buy" an IFL on your mainframe much cheaper than a CP, and none of your z/OS application software would count it in the MSU calculations because z/OS can't run on the IFL. This made it very practical to run new Linux workloads.
In 2004, IBM introduced "z Application Assist Processor" (zAAP) engines to run Java, and in 2006, the "z Integrated Information Processor" (zIIP) engines to run database and background data movement activities. Not having these counted in the MSU number for business applications greatly reduced the cost of mainframe software.
Tuesday's announcement is that the SDM "reader" will now run in a zIIP engine, reducing the costs for applications that run on that machine. Note that the CP, IFL, zAAP and zIIP engines are all identical cores. The z10 EC has up to 64 of these (16 quad-core) and you can designate any core as any of these engine types.
Faster z/OS Global Mirror Incremental Resync
One way to set up a 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z, you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and also have site "A" sending data over to site "C" using z/OS Global Mirror. This is called "Metro z/OS Global Mirror" or "MzGM" for short.
In the past, if the disk in site A failed, you would switch over to site B, and then send all the data all over again. This is because site B was not tracking what the SDM reader had or had not yet processed. With Tuesday's announcement, IBM has developed an "incremental resync" where site B figures out what the incremental delta is to connect to the z/OS Global Mirror at site "C", and this is 95% faster than sending all the data over.
IBM Basic HyperSwap for z/OS
What if you are sending all of your data from one location to another, and one disk system fails? Do you declare a disaster and switch over entirely? With HyperSwap, you only switch over the disk systems, but leave the rest of the servers alone. In the past, this involved hiring IBM Global Technology Services to implement a Geographically Dispersed Parallel Sysplex (GDPS) with software that monitors the situation and updates the z/OS operating system when a HyperSwap has occurred. All application I/O that was writing to the primary location is automatically re-routed to the disks at the secondary location. HyperSwap can do this for all the disk systems involved, allowing applications at the primary location to continue running uninterrupted.
HyperSwap is a very popular feature, but not everyone has implemented the advanced GDPS capabilities. To address this, IBM now offers "Basic HyperSwap", which is actually going to be shipped as IBM TotalStorage Productivity Center for Replication Basic Edition for System z. This will run in a z/OS address space, and use either the DB2 RDBMS you already have, or provide an Apache Derby database for those few out there who don't have DB2 on their mainframe already.
Update: There has been some confusion on this last point, so let me explain the key differences between the different levels of service:
Basic HyperSwap: single-site high availability for the disk systems only
GDPS/PPRC HyperSwap Manager: single- or multi-site high availability for the disk systems, plus some entry-level disaster recovery capability
GDPS/PPRC: highly automated end-to-end disaster recovery solution for servers, storage and networks
I apologize to all my colleagues who thought I implied that Basic HyperSwap was a full replacement for the more full-function GDPS service offerings.
Extended Address Volumes (EAV)
Up until now, the largest volume you could have was only 54 GB in size, and many customers still are using 3 GB and 9 GB volume sizes. Now, IBM will introduce 223 GB volumes. You can have any kind of data set on these volumes, but only VSAM data sets can reside on cylinders beyond the first 65,280. That is because many applications still think that 65,280 is the largest cylinder number you can have.
This is important because a mainframe, or a set of mainframes clustered together, can only have about 60,000 disk volumes total. The 60,000 is actually the Unit Control Block (UCB) limit, and besides disk volumes, you can have "virtual" PAVs that serve as an alias to existing volumes to provide concurrent access.
Aside from the first item, the Extended Distance FICON, the other enhancements are "preview announcements" which means that IBM has not yet worked out the final details of price, packaging or delivery date. In many cases, the work is done, has been tested in our labs, or running beta in select client locations, but for completeness I am required to make the following disclaimer:
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Availability, prices, ordering information, and terms and conditions will be provided when the product is announced for general availability.
Yesterday, I started this week's topic discussing the various areas of exploration to help understand our recent press release of the IBM System Storage SAN Volume Controller and its impressive SPC-1 and SPC-2 benchmark results that rank it the fastest disk system in the industry.
Some have suggested that since the SVC has a unique design, it should be placed in its own category, and not compared to other disk systems. To address this, I would like to define what IBM means by "disk system" and how it is comparable to other disk systems.
When I say "disk system", I am going to focus specifically on block-oriented direct-access storage systems, which I will define as:
One or more IT components, connected together, that function as a whole, to serve as a target for read and write requests for specific blocks of data.
Clarification: One could argue, and several do in various comments below, that there are other types of storage systems that contain disks, some that emulate sequential access tape libraries, some that emulate file-systems through CIFS or NFS protocols, and some that support the storage of archive objects and other fixed content. At the risk of looking like I may be including or excluding such to fit my purposes, I wanted to avoid apples-to-oranges comparisons between very different access methods. I will limit this exploration to block-oriented, direct-access devices. We can explore these other types of storage systems in later posts.
People who have been working a long time in the storage industry might be satisfied by this definition, thinking of all the disk systems that would be included by it, and recognizing that other types of storage, like tape systems, are appropriately excluded.
Others might be scratching their heads, thinking to themselves "Huh?" So, I will provide some background, history, and additional explanation. Let's break up the definition into different phrases, and handle each separately.
read and write requests
Let's start with "read and write requests", which we often lump together generically as input/output request, or just I/O request. Typically an I/O request is initiated by a host, over a cable or network, to a target. The target responds with acknowledgment, data, or failure indication. A host can be a server, workstation, personal computer, laptop or other IT device that is capable of initiating such requests, and a target is a device or system designed to receive and respond to such requests.
(An analogy might help. A woman picks up the phone and dials the number of the public library down the street. A man working at the library hears the phone ring, and answers it with "Welcome to the Public Library! How can I help you?" She asks "What is the capital city of Ethiopia?" He replies "Addis Ababa." Satisfied with this response, she hangs up. In this example, the query for information was the I/O request, initiated by the lady, to the public library target.)
Today, there are three popular ways I/O requests are made:
CCW commands over OEMI, ESCON or FICON cables
SCSI commands over SCSI, Fibre Channel or SAS cables
SCSI commands over Ethernet cables, wireless or other IP communication methods
specific blocks of data
In 1956, IBM was the first to deliver a disk system. It was different from tape because it was a "direct access storage device" (the acronym DASD is still used today by some mainframe programmers). Tape was a sequential medium: it could handle commands like "read the next block" or "write the next block", but it could not read a specific block without first reading past the blocks in front of it, nor could it write over an existing block without risking overwriting the contents of blocks past it.
The nature of a "block" of data varies. It is represented by a sequence of bytes of specific length. The length is determined in a variety of ways.
CCW commands assume a Count-Key-Data (CKD) format for disk, meaning that tracks are fixed in size, but a track can consist of one or more blocks, which can be fixed or variable in length. Some blocks can span off the end of one track, and over to another track. Typical block sizes in this case are 8000 to 22000 bytes.
SCSI commands assume a Fixed-Block-Architecture (FBA) format for disk, where all blocks are the same size, almost always a power of two, such as 512 or 4096 bytes. A few operating systems, however, such as i5/OS on IBM System i machines, use a block size that doesn't follow this power-of-two rule.
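To make the fixed-block idea concrete, here is a minimal sketch of why equal-sized blocks allow direct access: the location of any block can be computed from its number alone. The function name and the default 512-byte block size are my own illustrative choices, not part of any SCSI specification.

```python
from typing import Tuple


def block_to_byte_range(block_number: int, block_size: int = 512) -> Tuple[int, int]:
    """Return (start, end) byte offsets of a fixed-size block; end is exclusive.

    Hypothetical helper for illustration only. Because every block has the
    same size, no other block has to be read to find this one.
    """
    if block_number < 0 or block_size <= 0:
        raise ValueError("block_number must be >= 0 and block_size must be > 0")
    start = block_number * block_size
    return start, start + block_size


if __name__ == "__main__":
    print(block_to_byte_range(10))        # 512-byte blocks
    print(block_to_byte_range(10, 4096))  # 4096-byte blocks
```

Contrast this with tape, where reaching block 10 means passing over the blocks in front of it first.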
one or more IT components
You may find one or more of the following IT components in a disk system:
Motorized platter(s) covered in magnetic coating, with a read/write head that moves over the surface. These are often referred to as Hard Disk Drives (HDD) or Disk Drive Modules (DDM), and are manufactured by companies like Seagate or Hitachi Global Storage Technologies.
A set of HDD can be accessed individually, affectionately known as JBOD for Just-a-bunch-of-disk, or collectively in a RAID configuration.
Memory can act as the high-speed cache in front of slower storage, or as the storage itself. For example, the solid state disk that IBM announced last week is entirely memory storage, using Flash technology.
Lately, there are two popular packaging methods for disk systems:
Monolithic -- all the components you need connected together inside a big refrigerator-sized unit, with options to attach additional frames. The IBM System Storage DS8000, EMC Symmetrix DMX-4 and HDS TagmaStore USP-V all fit this category.
Modular -- components that fit into standard 19-inch racks, often the size of the vegetable drawer inside a refrigerator, that can be connected externally with other components, if necessary, to make a complete disk system. The IBM System Storage DS6000, DS4000, and DS3000 series, as well as our SVC and N series, fall into this category.
Regardless of packaging, the general design is that a "controller" receives a request from its host attachment port, and uses its processors and cache storage to either satisfy the request, or pass the request to the appropriate HDD, and the results are sent back through the host attachment port.
In all of the monolithic systems, as well as some of the modular ones, the controller and HDD storage are contained in the same unit. On other modular systems, the controller is one system, and the HDD storage is in a separate system, and they are cabled together.
serve as a target
The last part is that a disk system must be able to satisfy some or all requests that come to it.
(Using the same analogy used above, when the lady asked her question, the guy at the public library knew the answer from memory, and replied immediately. However, for other questions, he might need to look up the answer in a book, do a search on the internet, or call another library on her behalf.)
Some disk systems are cache-only controllers. For these, either the I/O request is satisfied as a read-hit or write-hit in cache, or it is not, and has to go to the HDD. The IBM DS4800 and N series gateways are examples of this type of controller.
Other systems may have controller and disk, but support additional disk attachment. In this case, either the I/O request is handled by the cache or internal disk, or it has to go out to external HDD to satisfy the request. IBM DS3000 series, DS4100, DS4700, and our N series appliance models, all fall into this category.
So, the SAN Volume Controller is a disk system comprising one to four node-pairs. Each node is a piece of IT equipment that has processors and cache. These node-pairs are connected to a pair of UPS power supplies to protect the cache memory holding writes that have not yet been de-staged. The combination of node-pairs and UPS, acting as a whole, is able to serve as a target for SCSI commands sent over Fibre Channel cables on a Storage Area Network (SAN). To read some blocks of data, it uses its internal cache storage to satisfy the request, and for others, it goes out to external disk systems that contain the data required. All writes are satisfied immediately in cache on the SVC, and later de-staged to external disk when appropriate.
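The read-hit, read-miss and write-then-de-stage behavior described above can be sketched in a few lines. This is a toy model under my own assumptions (a Python dict stands in for the external disk systems, there is no cache eviction, no mirroring between nodes, and the class and method names are hypothetical); it is not how the SVC is actually implemented.

```python
class CachingController:
    """Toy write-back cache in front of slower external storage (illustrative only)."""

    def __init__(self, external_disk):
        self.external_disk = external_disk  # dict: block number -> data
        self.cache = {}                     # blocks currently held in cache
        self.dirty = set()                  # written blocks not yet de-staged

    def read(self, block):
        if block in self.cache:             # read hit: satisfied from cache
            return self.cache[block]
        data = self.external_disk[block]    # read miss: go out to external disk
        self.cache[block] = data
        return data

    def write(self, block, data):
        self.cache[block] = data            # write is complete once it is in cache
        self.dirty.add(block)

    def destage(self):
        """Copy all pending writes to external disk; return how many were written."""
        count = len(self.dirty)
        for block in sorted(self.dirty):
            self.external_disk[block] = self.cache[block]
        self.dirty.clear()
        return count


if __name__ == "__main__":
    disk = {0: b"old", 1: b"data"}
    controller = CachingController(disk)
    controller.write(0, b"new")
    print(disk[0], controller.read(0))      # disk still holds the old contents
    controller.destage()
    print(disk[0])                          # now updated
```

The window between `write` and `destage` is exactly what the UPS units protect: the only copy of the new data lives in cache memory during that time.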
As of end of 2Q07, having reached our four-year anniversary for this product, IBM has sold over 9000 SVC nodes, which are part of more than 3100 SVC disk systems. These things are flying off the shelves, clocking in a 100% YTY growth over the amount we sold twelve months ago. Congratulations go to the SVC development team for their impressive feat of engineering that is starting to catch the attention of many customers and return astounding results!
So, now that I have explained why the SVC is considered a disk system, tomorrow I'll discuss metrics to measure performance.
Today, I'll cover the announcements related to our IBM System Storage N series disk systems, which tie in with the Valentine's Day theme nicely. The phrase we use for "unified storage" is that N series allows you to "share the closet, not necessarily the clothes". Couples recognize the value of a shared closet over having one closet for just the man's clothes, and a separate closet for just the woman's clothes. (For some couples, the man's closet would be terribly underutilized!) By analogy, the N series allows you to share one solution for LUNs that can be accessed via FCP or iSCSI protocols, and NAS file systems that can be accessed via NFS and CIFS protocols. In most data centers, Windows and UNIX applications are about as likely to share files as men and women are to wear each other's clothes, so the analogy is intact.
Let's take a look at what got announced:
N7700 and N7900
There are actually [eight new high-end N series] models. The N7900 has 4 processors and 32GB of cache. The N7700 has 2 processors and 16GB cache. Each has two appliance models (A11 single node and A21 dual node) and two gateway models (G11 single node and G21 dual node).
The appliance models support both FC and SATA disk. The N7900 A models support a maximum of 1176 drives; the N7700 A models support 840 drives. The gateway models provide FCP, iSCSI and NAS host access through external disk attachment. The N7900 gateway models support 1176 LUNs on external disk systems; the N7700 gateway models support 840 external LUNs.
N series now supports 1 TB SATA disk
The [EXN1000 expansion drawer] can now have up to fourteen 1TB SATA drives. This is in addition to previous announcements supporting 500GB and 750GB drive capacities. These drawers support the entire N series line.
With 1 TB drives, the N7900 now supports up to 1176 TB of raw capacity, which is over 1PB of usable data in 12+2P RAID-DP mode. This is greater than the internal disk capacity limits of current IBM DS8000, EMC DMX and HDS USP-V models.
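The raw-versus-usable arithmetic above is easy to check. This is a back-of-the-envelope sketch that assumes every drive sits in a full 12+2 group, with no hot spares and no file system overhead, and uses decimal units (1 PB = 1000 TB). The function name is my own.

```python
def usable_capacity_tb(drive_count, drive_tb, data_drives, parity_drives):
    """Usable TB when drives are arranged in full data+parity RAID groups.

    Simplified estimate: ignores spares, formatting and file system overhead.
    """
    group_size = data_drives + parity_drives
    full_groups = drive_count // group_size
    return full_groups * data_drives * drive_tb


if __name__ == "__main__":
    raw = 1176 * 1                                  # 1176 drives of 1 TB each
    usable = usable_capacity_tb(1176, 1, 12, 2)     # 12 data + 2 parity per group
    print(raw, "TB raw,", usable, "TB usable")
```

With 1176 drives there are 84 groups of 14, each contributing 12 TB of data, which comes to 1008 TB and so just clears the 1 PB mark.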
At the low end, both the N3300 and N3600 now support 500GB, 750GB and 1TB SATA drives in addition to the SAS drives they supported.
SnapManager for Microsoft SharePoint
There is a new SnapManager in town. This one is for Microsoft SharePoint data. See the announcement for the [N3300 and N3600] for details.
On Jan 24, IBM signed agreements with [Ingram Micro, Tech Data, and Synnex], to distribute the N Series products and work with IBM to recruit new solution providers to the line. These three are all well-respected world-class distribution providers, so we are glad to have increased our partnership with them on this.
To get beyond the simple statistics of vendor popularity, we looked at the number and combinations of vendors with which enterprises work. Many were customers of one or two storage providers, but the rest were customers of up to six storage providers. More than one-third were customers of systems vendors only, bypassing storage specialists.
Comparisons between solutions vendors and storage component vendors are not new. One could argue that this can be compared to supermarkets and specialty shops.
Supermarkets offer everything you need to prepare a meal. You can buy your meat, bread, cheese, and extras all with one-stop shopping. In a sense, IBM, HP, Sun and Dell are offering this to clients who prefer this approach. Not surprisingly, the two leaders in overall storage hardware, IBM and HP, are also the two best to offer a complete set of software, services, servers and storage.
IBM and HP are also the leaders in tape. While Forrester reports that many large enterprises in North America prefer to buy disk from storage specialists, others have found that customers prefer to buy their tape from solution providers. Recently, Byte and Switch reported in LTO Hits New Milestones that the LTO consortium (IBM, HP, and Quantum) has collectively shipped over 2 million LTO tape drives, and over 80 million LTO tape cartridges. Perhaps this is because tape is part of an overall backup, archive or space management solution, and customers trust a solution vendor over a storage specialist.
Where possible, IBM brings synergy between its servers and storage. For example, we just announced the IBM BladeCenter Boot Disk System, a 2U high unit that supports up to 28 blade servers, ideal for applications running under Windows or Linux, and helping to reduce the energy consumption for those interested in a "Green" data center.
Some people prefer buying their meat at the slaughterhouse, bread at the French pastry shop, and so on. Storage specialists focus on just storage, leaving the rest of the solution, like servers, to be purchased separately from someone else. Storage vendors like NetApp, EMC, HDS and others offer storage components to customers that like to do their own "system integration", or to those that are large enough to hire their own "systems integrator".
Storage specialists recognize that not everybody is a "specialty shop" shopper. HDS has done well selling their disk through solution vendors like HP and Sun. EMC sells its gear through solution vendor Dell.
Interestingly, I have met clients who prefer to buy IBM System Storage N series from IBM, because IBM is a solution vendor, and others that prefer to buy comparable NetApp equipment directly from NetApp, because they are a storage component vendor.
I mostly buy my groceries at a supermarket, but have, on occasion, bought something from the local butcher, baker or candlestick maker. And if you are ever in Tucson, you might be able to find Mexican tamales sold by a complete stranger standing outside of a Walgreens pharmacy, the ultimate extreme of specialization. You can get a dozen tamales for ten bucks, and in my experience they are usually quite good. Theoretically, if you get sick, or they don't taste right, you have no recourse, and will probably never see that stranger again to complain to. (And no, before I get flamed, I am not implying any major vendor mentioned above is like this tamale vendor.)
Of course, nothing is starkly black and white, and comparisons like this are just to help provide context and perspective, but if you are looking to have a complete IT solution that works, from software and servers to storage and financing, come to the vendor you can trust, IBM.
On his The Storage Architect blog, Chris Evans wrote [Two for the Price of One]. He asks: why use RAID-1 compared to say a 14+2 RAID-6 configuration which would be much cheaper in terms of the disk cost? Perhaps without realizing it, he answers it with his post today [XIV part II]:
So, as a drive fails, all drives could be copying to all drives in an attempt to ensure the recreated lost mirrors are well distributed across the subsystem. If this is true, all drives would become busy for read/writes for the rebuild time, rather than rebuild overhead being isolated to just one RAID group.
Let me try to explain. (Note: This is an oversimplification of the actual algorithm in an effort to make it more accessible to most readers, based on written materials I have been provided as part of the acquisition.)
In a typical RAID environment, say 7+P RAID-5, you might have to read 7 drives to rebuild one drive, and in the case of a 14+2 RAID-6, read 15 drives to rebuild one drive. It turns out the performance bottleneck is the one drive being written, and today's systems can rebuild faster Fibre Channel (FC) drives at about 50-55 MB/sec, and slower ATA disk at around 40-42 MB/sec. At these rates, a 750GB SATA rebuild would take about 5 hours.
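The five-hour figure follows directly from dividing capacity by the write rate of the single target drive. A quick sketch, assuming decimal units (1 GB = 1000 MB) and a constant rebuild rate with no competing host I/O; the function name is my own.

```python
def rebuild_hours(capacity_gb, write_mb_per_sec):
    """Hours to rewrite a whole drive at a constant rate (decimal units)."""
    seconds = capacity_gb * 1000 / write_mb_per_sec
    return seconds / 3600


if __name__ == "__main__":
    for rate in (40, 42):
        print(f"750 GB at {rate} MB/sec: {rebuild_hours(750, rate):.2f} hours")
```

In practice a rebuild shares the drives with production workload, so real-world times tend to be longer than this idealized number.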
In the IBM XIV Nextra architecture, let's say we have 100 drives. We lose drive 13, and we need to re-replicate any at-risk 1MB objects. An object is at-risk if it is the last and only remaining copy on the system. A 750GB drive that is 90 percent full would have 700,000 or so at-risk object re-replications to manage. These can be sorted by drive. Drive 1 might have about 7000 objects that need re-replication, drive 2 might have slightly more or slightly less, and so on, up to drive 100. The re-replication of objects on these other 99 drives goes through three waves.
Select 49 drives as "source volumes", and pair each randomly with a "destination volume". For example, drive 1 mapped to drive 87, drive 2 to drive 59, and so on. Initiate 49 tasks in parallel, each of which re-replicates the blocks that need to be copied from the source volume to the destination volume.
50 volumes left. Select another 49 drives as "source volumes", and pair each with a "destination volume". For example, drive 87 mapped to drive 15, drive 59 to drive 42, and so on. Initiate 49 tasks in parallel, each of which re-replicates the blocks that need to be copied from the source volume to the destination volume.
Only one drive left. We select the last volume as the source volume, pair it off with a random destination volume, and complete the process.
Each wave can take as little as 3-5 minutes. The actual algorithm is more complicated than this: as tasks complete early, the source and destination drives become available for re-assignment to another task, but you get the idea. XIV has demonstrated that the entire process (identifying all at-risk objects, sorting them by drive location, randomly selecting drive pairs, and then performing most of these tasks in parallel) can be done in 15-20 minutes. Over 40 customers have been using this architecture over the past 2 years, and by now all have probably experienced at least one drive failure to validate this methodology.
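The three waves can be sketched as a small scheduling routine. To be clear, this models only my simplified description above, not the actual XIV algorithm: every surviving drive acts as a source exactly once, no drive takes part in more than one task per wave, and there is no early re-assignment of idle drives. The function name `plan_waves` is hypothetical.

```python
import random


def plan_waves(surviving_drives, rng=None):
    """Group source->destination copy tasks into waves (simplified illustration).

    Each surviving drive appears as a source in exactly one wave, and within
    a wave no drive is used by more than one task.
    """
    drives = list(surviving_drives)
    if len(drives) < 2:
        raise ValueError("need at least two surviving drives")
    rng = rng or random.Random()
    pending = drives[:]                  # drives that have not yet been a source
    rng.shuffle(pending)
    max_tasks = len(drives) // 2         # each task occupies two drives
    waves = []
    while pending:
        sources = pending[:max_tasks]
        pending = pending[max_tasks:]
        busy = set(sources)
        candidates = [d for d in drives if d not in busy]
        destinations = rng.sample(candidates, len(sources))
        waves.append(list(zip(sources, destinations)))
    return waves


if __name__ == "__main__":
    survivors = [d for d in range(1, 101) if d != 13]   # drive 13 has failed
    for number, wave in enumerate(plan_waves(survivors), start=1):
        print(f"wave {number}: {len(wave)} parallel tasks")
```

For 99 surviving drives this yields waves of 49, 49 and 1 tasks, matching the walk-through. The point of the exercise is that no single drive becomes the write bottleneck, unlike a traditional rebuild onto one spare.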
In the unlikely event that a second drive fails during this short time, only one of the 99 tasks fails. The other 98 tasks continue to help protect the data. By comparison, in a RAID-5 rebuild, no data is protected until all the blocks are copied.
As for requiring spare capacity on each drive to handle this case, the best disks in production environments are typically only 85-90 percent full, leaving plenty of spare capacity to handle the re-replication process. On average, Linux, UNIX and Windows systems tend to only fill disks 30 to 50 percent full, so the fear that there is not enough spare capacity should not be an issue.
The difference in cost between RAID-1 and RAID-5 becomes minimal as hardware gets cheaper and cheaper. For every dollar you spend on storage hardware, you spend $5-$8 managing the environment. As hardware gets cheaper still, it might even be worth making three copies of every 1MB object; the parallel process to perform re-replications would be the same. This could be done using policy-based management: some data gets triple-copied, and other data gets only double-copied, based on whether the user selected "premium" or "basic" service.
The beauty of this approach is that it works with 100 drives, 1000 drives, or even a million drives. Parallel processing is how supercomputers are able to perform feats of amazing mathematical computations so quickly, and how Web 2.0 services like Google and Yahoo can perform web searches so quickly. Spreading the re-replication process across many drives in parallel, rather than performing them serially onto a single drive, is just one of the many unique features of this new architecture.
Use more efficient disk media, such as high-capacity SATA disk drives
Both are great recommendations, but why limit yourself to what EMC offers? Your x86-based machines are only a subset of your servers, and disk is only a subset of your storage. IBM takes a more holistic approach, looking at the entire data center.
VMware is a great product, and IBM is its top reseller. But in addition to VMware, there are other solutions for the x86-based servers, like Xen and Microsoft Virtual Server. IBM's System p, System i, and System z product lines all support logical partitioning.
To compare the energy effectiveness of server virtualization, consider a metric that can apply across platforms. For example, for an e-mail server, consider watts per mailbox. If you have, say, 15,000 users, you can calculate how many watts you are consuming to manage their mailboxes on your current environment, and compare that with running them on VMware, or logical partitions on other servers. Some people find it surprising that it is often more cost-effective, and power-efficient, to run workloads on mainframe logical partitions (LPARs) than a stack of x86 servers running VMware.
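A metric like watts per mailbox is simple to compute once you have measured power draw. The sketch below uses the 15,000 users from the example; the wattage figures are placeholders I made up purely to show the calculation, not measurements of any real platform.

```python
def watts_per_mailbox(total_watts, mailboxes):
    """Average power draw attributable to each mailbox."""
    if mailboxes <= 0:
        raise ValueError("mailboxes must be positive")
    return total_watts / mailboxes


if __name__ == "__main__":
    users = 15000
    # Placeholder wattages for illustration only; substitute your own measurements.
    environments = {
        "current environment": 12000,
        "virtualized alternative": 6000,
    }
    for name, watts in environments.items():
        print(f"{name}: {watts_per_mailbox(watts, users):.2f} watts per mailbox")
```

Because the metric is normalized by the work delivered rather than by server count, it can be compared across x86, System p, System i and System z platforms.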
More efficient Media
SATA and FATA disks support higher capacities, and run at slower RPM speeds, thus using fewer watts per terabyte. A terabyte stored on 73GB high-speed 15K RPM drives consumes more watts than the same terabyte stored using 500GB SATA. Chuck correctly identifies that tape is more power-efficient than disk, but then argues that paper is more power-efficient than tape. But paper is not necessarily more efficient than tape.
ESG analyst Steve Duplessie divides up data between Dynamic vs. Persistent. The best place to put dynamic data is on disk, and here is where evaluation of FC/SAS versus SATA/FATA comes into play. Persistent data, on the other hand, can be stored on paper, microfiche, optical or tape media. All of these shelf-resident media consume no electricity, nor generate any heat that would require additional cooling.
A study by scientists at the Lawrence Berkeley National Laboratory titled High-Tech Means High-Efficiency: The Business Case for Energy Management in High-Tech Industries indicates that data centers consume 15 to 100 times more energy per square foot than traditional office space. Storing persistent data in traditional office space can save a huge amount of energy. Steve Duplessie feels the ratio of dynamic to persistent data is 1:10 today, but is likely to grow to 1:100 in the near future, making energy-efficient storage of persistent data ever more important to our environment.
Data centers consume nearly 5000 Megawatts in the USA alone, 14000 Megawatts worldwide. To put that in perspective, the country of Hungary I was in last week can generate up to 8000 Megawatts for the entire country (and they were using 7400 Megawatts last week as a result of their current heat wave, causing them grave concern).
Back in the 1990's, one of the insurance companies IBM worked with kept data on paper in manila folders, and armies of young adults in roller skates were dispatched throughout the large warehouses of shelves to get the appropriate folder in response to customer service inquiries. Digitizing this paper into electronic format greatly reduced the need for this amount of warehouse space, as well as improved the time to retrieve the data.
A typical file storage box (12 inch x 12 inch x 18 inch) containing typed pages single-spaced, double-sided, 12 point font could hold perhaps 100MB. The same box could hold a hundred or more LTO or 3592 tape cartridges, each storing hundreds of GB of information. That's a million-to-one improvement of space-efficiency, and from a watts-per-TB basis, translates to substantial improvement in standard office air conditioning and lighting conditions.
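The order of magnitude here can be checked with a few lines. The 100MB-per-box and hundred-cartridge figures come from the paragraph above; the per-cartridge capacities I plug in are assumed round numbers for illustration, and the function name is my own.

```python
def space_efficiency_ratio(cartridges_per_box, gb_per_cartridge, paper_mb_per_box):
    """How many boxes of paper one box of tape cartridges replaces (decimal units)."""
    tape_mb_per_box = cartridges_per_box * gb_per_cartridge * 1000
    return tape_mb_per_box / paper_mb_per_box


if __name__ == "__main__":
    # Cartridge capacities below are assumed values, not product specifications.
    for gb in (500, 1000):
        ratio = space_efficiency_ratio(100, gb, 100)
        print(f"{gb} GB cartridges: about {ratio:,.0f} to 1")
```

Depending on cartridge capacity, the ratio lands somewhere between several hundred thousand and a million to one, which is the ballpark the comparison is meant to convey.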
To learn more about IBM's Project Big Green, watch this introductory video which used Second Life for the animation.
Today was the "First Ever Live Virtual Virtualization Tech Fair" sponsored by IBM and VMware. This was a 1-day event hosted by Unisfair.
The day included presentations done at a conference call, along with exhibition booths.
We had an exhibition booth exclusively for "storage virtualization" featuring our IBM System Storage SAN Volume Controller (disk virtualization) and IBM System Storage TS7520 Virtualization Engine (a virtual tape library, or VTL).
People who were logged in were represented in silhouette form. When someone walked into the booth, our army of "booth reps" were able to chat with them and answer their questions. They could also peruse the various online materials we made available about each product.
Here are some of my observations:
A lot of questions were related to IBM's support for VMware. Although VMware is currently owned by EMC, pending a spin-off IPO, IBM is its biggest reseller, given IBM's vast experience in server virtualization. Ironically, IBM's SAN Volume Controller supports VMware better than EMC's own storage virtualization product, Invista.
People also familiar with Second Life thought this 2-D "silhouette" version eliminated the need to configure and dress up your avatar as is required in participating in Second Life events. However, being only able to chat, send e-mail and show web pages seemed less immersive than what Second Life can offer.
This event generated over 60 leads. We will pass on the contact information to the appropriate sales team.
Well it's Tuesday, and ["election day"] here in the USA, and again IBM has more announcements.
IBM announced [IBM Tivoli Key Lifecycle Manager v1.0] (TKLM) to manage encryption keys. This provides a graphical interface to manage encryption keys, including retention criteria when sharing keys with other companies.
TKLM is supported on AIX, Solaris, Windows, Red Hat and SUSE Linux. IBM plans to offer TKLM for z/OS in 2009. TKLM can be used with Firefox or Internet Explorer web browser. This will include the Encryption Key Manager (EKM) that IBM offered initially to support encryption keys for the TS1120, TS1130, and LTO-4 drives.
While this is needed today for tape, IBM positions this software to also manage the encryption keys for "Full Drive Encryption" (FDE) disk drive modules (DDM) in IBM disk systems in 2009.
This month (September, 2006) marks our 50th anniversary of the disk system. The first disk system was the 350 Disk Storage Unit, designed to attach to the IBM 305 RAMAC mainframe computer, both introduced to the world in September, 1956.
Well, we had another successful event in Second Life today.
Unlike our April 26 launch of our System Storage products for IBM Business Partners only, this time we decided to use a "Meet the Storage Experts" Q&A Panel format, and open up registration to everyone. The subject matter experts sat at the front of the room on four stools. We had six rows of chairs arranged semi-circularly.
Shown above, from left to right, are the avatars of our four experts:
IBM System Storage N series, focusing on recent N3000 disk system announcements
Harold Pike (holding the microphone while speaking)
IBM System Storage DS3000 and DS4000 series, focusing on recent DS3000 disk system announcements
IBM System Storage TS series, focusing on recent TS2230, TS3400 and TS7700 tape system announcements
IBM storage networking, focusing on recent IBM SAN256B director blade announcements
While Eric was a veteran Second Lifer, having presented at our April event, the other three were trained on how to raise their hand, speak into the microphone, sit on the stool, and so on. I want to thank all of our experts for putting in this effort!
The event was produced by Katrina H Smith. She did a great job, and made sure we were on top of all the issues and tasks required to get the job done. Running a Second Life event is every bit as hard as running a real face-to-face event. We had several meetings to discuss venue details, placement of chairs, placement of product demos, audio/video recording, wall decorations, tee-shirt and coffee mug design, logistics, and so on.
I acted as moderator/emcee for the event. That is my back in the picture above. The process was simple, modeled after the "Birds of a Feather" sessions at events like SHARE and the IBM Storage and Storage Networking Symposium. We threw out a list of topics the experts would cover, and people in the audience would "raise their left hand". I, as the moderator, would then walk over to each person, and hold out the microphone for them to ask the question. I would then repeat the question and ask the appropriate expert to provide an answer. We defined gestures on how to "raise hand" and "put hand down" that we gave to each registered participant.
We had four dedicated "camera-avatars" in world to capture both video and screenshots. Our video editors are now working to edit "highlight videos" that we can use at future events, for training materials, and for our internal "BlueTube" online video system.
The room was filled with examples of each of our products, made into 3D objects that were dimensionallycorrect, and "textured" with photographs of the actual products. If you click on an object, you get a "notecard" that provided more information. Special thanks to Scott Bissmeyer for making all of theseobjects for us.
We made posters of each expert and placed them in all four corners of the room. On the bottom of each coffee mug was a picture of each of the experts, and if you walked under each of the posters, you were"dispensed" a coffee mug matching the expert shown in the poster.Participants could "Collect all Four!" When you bring the coffee mug up to takea sip, the picture on the bottom of the mug is exposed for all to see.And as a final give-away to the audience, we made a variety of event tee-shirts and polo-shirts.
At the end of the session, we asked everyone to click on the "Survey" kiosk near the exit door. We askedsix simple questions using SurveyMonkey.com that took only a fewminutes to process. We found asking questions immediately at the end of the event was the best way tocapture this feedback.
From a "Green" perspective, we had people registered from the following countries: US, India, Mexico,Australia, United Kingdom, Brazil, Germany, Argentina, Chile, China, Canada, and Venezuela. Second Lifeallows all these people who probably could not travel, or could not afford the time and expense to travel,to participate in a simulated face-to-face meeting without energy consumption of traditional travel methods.
More importantly, we got several leads for business. People often ask "Yes, but is there any businessassociated with this?" This time, there was, based on the answers to the questions, several avatars asked for a real sales call to follow-up on the products and offerings they were discussed.
With such a great success, we have already scheduled our next Second Life event, November 8. Mark your calendars! I'll postmore details on the registration process of the November event when available.
It's Tuesday, which means IBM makes its announcements. We had several for the IBM System Storage product line. Here's a quick recap.
The IBM System Storage DS3000 now offers DC power models. New DC powered models of the DS3200, DS3400, and EXP3000 are well suited for Telco industry environments, as these are NEBS and ETSI compliant and are powered by an industry standard 48 volt DC power source.
Also, the IBM System Storage N series now supports 750GB SATA drives available for the EXN1000 drawer.
IBM Virtualization Engine TS7740 now supports 3-cluster grids. Unlike 3-way replication on disk mirroring, such as IBM Metro/Global Mirror for the DS8000 that enforces a primary, secondary and tertiary copy, the grid implementation of TS7740 tape virtualization allows for any-to-any mirroring. Existing standalone TS7740 clusters can be converted to grid-enabled. A "Copy Export" feature allows virtual tapes to be exported onto physical tape. And in keeping with our theme of "enabling business flexibility", performance throughput can now be purchased in 100 MB/sec increments, up to 600 MB/sec, to match your workload bandwidth requirements.
The IBM System Storage TS1120 drives installed in the IBM System Storage™ TS3400 Tape Library can now be attached to System z platforms using the IBM System Storage™ TS1120 Tape Controller. Before this, the TS3400 could only be attached to UNIX, Windows and Linux systems.
The IBM System Storage TS2230 Express is offered as an external stand-alone or rack-mountable unit. This model incorporates the new LTO IBM Ultrium 3 Serial Attached SCSI (SAS) Half-High Tape Drive, and a 3 Gbps single port SAS interface for a connection to a wide spectrum of distributed system servers that support Microsoft Windows and Linux systems.
IBM has added the Cisco MDS 9124 for IBM System Storage entry-level fabric switch as an Express offering and part of the IBM Express Advantage Program. Express offerings are specifically created for mid-market companies and are well suited for workgroup storage applications like e-mail serving, collaborative databases and web serving. They bring enterprise-class performance, scalability and features to small and medium-sized companies and are easy to use, highly scalable, and cost-effective. This will make it easier for IBM Business Partners to provide fabric switch connectivity for:
Storage consolidation solutions with IBM System Storage™ DS4000 Express disk arrays, especially the DS4700 Express.
Backup / restore solutions with IBM System Storage™ TS3000 Tape Libraries, such as the TS3200.
Archive and Retention
Ordering large configurations of the IBM System Storage Grid Access Manager just got a lot easier. New features enable configurations greater than 500 TB to be submitted as a single order. No change in the actual product, just an improvement in the ordering process.
For System p and System i servers, the IBM 3996 Optical library now supports Gen 2 60GB optical cartridges. These can be read/write or WORM cartridges.
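As a rough illustration of the TS7740 throughput increments mentioned above, here is a minimal sketch of how a required bandwidth might map onto the purchasable steps. The 100 MB/sec step and 600 MB/sec ceiling come from the announcement; the function name and the rule of rounding up to the next whole increment are my own assumptions, not an IBM sizing tool.

```python
import math

INCREMENT_MB_PER_SEC = 100  # step size quoted in the announcement
MAX_MB_PER_SEC = 600        # ceiling quoted in the announcement


def throughput_to_purchase(required_mb_per_sec):
    """Round a workload requirement up to the next purchasable increment.

    Hypothetical helper: assumes at least one increment is always bought
    and that requirements above the ceiling cannot be met by one cluster.
    """
    if required_mb_per_sec <= 0:
        raise ValueError("required throughput must be positive")
    increments = math.ceil(required_mb_per_sec / INCREMENT_MB_PER_SEC)
    purchased = increments * INCREMENT_MB_PER_SEC
    if purchased > MAX_MB_PER_SEC:
        raise ValueError("requirement exceeds the 600 MB/sec maximum")
    return purchased


if __name__ == "__main__":
    # A workload needing 250 MB/sec would land on the 300 MB/sec step.
    print(throughput_to_purchase(250))
```

The point is simply that you pay for bandwidth in steps that track your workload, rather than buying the maximum up front.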
I'm off to Denver, Colorado this week. I hope it is cooler there than it is down here in Tucson, Arizona.
This is a reasonable question. Since Invista 2.0 came out months ago in August, and Invista 2.1 is rumored to be out by end of this month, why put out a press release now, rather than just wait a few weeks? The significant part of this announcement was that EMC finally has their first customer reference. To be fair, getting a customer to agree to be a reference is difficult for any vendor. Some non-profits and government agencies have rules against it, and some corporations just don't want to be bothered by journalists, or take phone calls from other prospective customers. I suspect EMC wanted to put the good folks from Purdue University in front of the cameras and microphones before they:
In Moore's terminology, Purdue University would be a "technology enthusiast", interested in exploring the technology of the EMC Invista. Universities by their very nature often see themselves as early adopters, willing to take big risks in hopes of reaping big rewards. The chasm happens later, when there are a lot of early adopters, all willing to be reference accounts. The mainstream market (shown here as pragmatists, conservatives, and skeptics) is unwilling to accept reference claims from early adopters, searching instead for moderate gains from minimal risks. They prefer references from customers that are similar in size and industry. Whether a vendor can get a product to cross this chasm is the focus of the book.
Why "SAN" virtualization?
Technically, Invista is "storage" virtualization, not "SAN" virtualization. Virtualization is any technology that makes one set of resources look and feel like a different set of resources, preferably with more desirable characteristics. You can virtualize servers, SANs, and storage resources.
Virtual SAN (VSAN) technology, supported by the Cisco MDS 9500 Series Multilayer Director Switch, partitions a single physical SAN into multiple VSANs, allowing different business functions and requirements to share a common physical infrastructure.
How does Invista advance Cisco's VSAN functionality? It doesn't, but that doesn't make the title a falsehood, or the press release by association full of lies. If you read the entire press release, EMC correctly states that Invista is "storage" virtualization. Some storage virtualization products, like EMC Invista and IBM System Storage SAN Volume Controller (SVC), require a SAN as a platform on which to perform their magic. Marketing people might use the term "SAN" to refer not just to the network gear that provides the plumbing, but also to the storage devices that are attached to the SAN. In that light, the use of "SAN virtualization" can be understood in the title.
More importantly, it appears that EMC no longer requires that you purchase new SAN equipment from them with Invista. When the Invista first came out, it cost over a quarter-million US dollars to cover the cost of the intelligent switches, but with the price drop to $100K, I imagine this means they assume everyone has an appropriately-supported intelligent switch already deployed.
Why this architecture?
In his post [Storage Virtualization and Invista 2.0], EMC blogger ChuckH does a fair job explaining why EMC went in this direction for Invista, and how it is different from other storage virtualization products.
Most storage virtualization products are cache-based. The world's first disk storage virtualization product, the IBM 3850 Mass Storage System, introduced in 1974, and the first tape virtualization product, the IBM 3494 Virtual Tape Server, introduced in 1997, both used disk cache in front of tape storage. Later virtualization products, like IBM SVC and HDS USP-V, use DRAM memory cache in front of disk storage, but the concept is the same. People are comfortable with cache-based solutions, because the technology is mature and well proven in the marketplace, and they are excited and delighted that these can offer the following features in a mixed heterogeneous disk environment:
instantaneous point-in-time copy
None of these features are provided by Invista, as there is no cache in the switch. Instead, Invista is a "packet cracker"; it cracks open each FCP packet, inspects and modifies the contents, then passes the FCP packet along to the appropriate storage device. This process slows down each read and write by some amount, perhaps 20 microseconds. The disadvantage of slowing down every read and write is offset by other benefits, like non-disruptive data migration.
To compensate for Invista's inability to provide these features, EMC offers a second solution called EMC RecoverPoint, which is an in-band cache-based appliance similar in design to SVC, but maps all virtual disks one-to-one to physical disks. It offers remote distance asynchronous mirroring between heterogeneous devices. EMC supports RecoverPoint in front of Invista, but if you are considering buying both to get the combined set of features, you might as well buy an IBM SVC or HDS USP-V instead: one system rather than two, which is much less complicated. IBM SVC and HDS USP-V have both "crossed the chasm", having sold thousands of units to every type and size of customer.
Hopefully, this answers the questions you might have about EMC Invista.
When times are tough, people revert back to their "default programming", and companies search for their "core strengths". The Redwoods Group calls this the [Native Language Theory]. Here's an excerpt:
A young carpenter immigrates to the United States from Italy, unable to speak a word of English. Upon arrival, he moves into a small apartment by himself and begins looking for a job in construction. With some luck and a lot of hard work, he quickly lands a job at a local construction site. Over the coming weeks he learns how to say “hello” and “goodbye” to his English-only coworkers. As time goes on, he is able to learn more complex phrases and commands and is now able to begin taking on jobs that better match his level of expertise.
Several years after the carpenter moved to the US, he now speaks fluent English and has started a family with an American woman and now speaks only English on the job site and at home. One afternoon, while hammering at the framing of a new home, the carpenter strikes his thumb. In what language does he curse? Italian, of course.
We believe that this story illustrates the nature of reacting to difficult, stressful, and, yes, painful situations by reverting to what you know best. This is the reason that coaches ask their players to make certain actions “instinctual” – simply, when times get tough, we fall back on our native language.
Last September, in my post [Supermarkets and Specialty Shops] I mentioned how Forrester Research identified two kinds of IT vendors selling storage. On one side were the "information infrastructure" companies (IBM, HP, Sun, and Dell) that focus on providing one-stop shopping for clients that want all parts of an IT solution, including servers, storage, software and services. These I compared to "supermarkets".
On the other side were the storage component vendors (EMC, HDS, NetApp, and many others) that focus on specific storage components. These I compared to "specialty shops", like butchers, bakers and candlestick makers. These often appeal to customers with big enough IT staffs with the skills to do their own system integration. The key difference seems to be that the supermarkets are client-focused, and the specialty shops are technology-focused, and different people prefer to do business with one side or another. This came in handy last November to explain Dell's acquisition of EqualLogic and discuss [IBM Entry-Level iSCSI offerings].
Some recent news seems to fit this model, in relation to the Native Language Theory.
Several argued that EMC was in the process of shifting sides, from disk specialty shop over to an everything-but-servers supermarket. Certainly many of its acquisitions in software, services, and VMware would support the notion that perhaps they are going through an identity crisis. The immediate beneficiary was HDS, the #2 disk specialty shop, which passed up EMC with innovative features in its USP-V disk system.
However, times are tough, especially in the U.S. economy that many storage vendors are focused on. EMC appears to have found its native language, going back to its roots of solid state storage systems that they started with back in 1979. This week EMC announced [Symmetrix DMX-4 support of Flash drives]. Several bloggers review the technology involved:
Overall, a smart move for EMC to go back to its technology-focused disk specialty shop mode and go head-to-head against the HDS threat. With Web 2.0 workloads moving off these monolithic solutions and onto [clustered storage more appropriate for "cloud computing"], large enterprise-class disk systems like the IBM System Storage DS8000 and EMC DMX-4 can shift focus to what they do best: online transaction processing (OLTP) and large databases. However, I noticed the EMC press release mentions EMC as an "information infrastructure" company, so perhaps they still haven't resolved their identity crisis.
After Sun acquired the StorageTek specialty shop, they too had a bit of an identity crisis. Fortunately, they realized their core strengths were on the "supermarket" side, moved storage in with servers in their latest restructuring, changed their NYSE symbol from SUNW to JAVA, and reset their focus on providing end-to-end solutions like IBM. For example, fellow blogger Taylor Allis from Sun mentions their latest in "clustered storage" in his post [IBM Buys XIV - Good Move].
In an ironic twist, some of today's leading manufacturers of server computers are also among the companies moving most aggressively to reduce their need for servers and other hardware components. Hewlett-Packard, for instance, is in the midst of a project to slash the number of data centers it operates from 85 to 6 and to cut the number of servers it uses by 30 percent. Now, Sun Microsystems is upping the stakes. Brian Cinque, the data center architect in Sun's IT department, says the company's goal is to close down all its internal data centers by 2015. "Did I just say 0 data centers?" he writes on his blog. "Yes! Our goal is to reduce our entire data center presence by 2015."
While Nick feels this is ironic for Sun, known for UNIX servers based on their SPARC chip technology, I don't. Sun has shifted from being technology-focused to being client-focused. This is where the marketplace is going, and the supermarket vendors, being client-focused, are best positioned to adapt to this new world. In a sense, Sun found its roots. Nick summarizes this as: "The network, to spin the old Sun slogan, becomes the data center."
So, each move seems to strengthen their respective identities back to their origins, or at least help them communicate that to the market.
For those of us in the northern hemisphere, yesterday was this year's Winter Solstice, representing the shortest amount of daylight between sunrise and sunset. So today, I thought I would blog my thoughts on managing scarcity.
Earlier in my career, I had the pleasure to serve as "administrative assistant" to Nora Denzel for the week at a storage conference. My job was to make her look good at the conference, which if you know Nora, doesn't take much. Later, she left IBM to work at HP, and I got to hear her speak at a conference, and the one thing that I remember most was her statement that the whole point of "management" was to manage scarcity, as in not enough money in the budget, not enough people to implement change, or not enough resources to accomplish a task. (Nora, I have no idea where you are today, so if you are reading this, send me a note).
Of course, the flip-side to this is that resources that are in abundance are generally taken for granted. Priorities are focused on what is most scarce. Let's examine some of the resources involved in an IT storage environment:
Capacity - while everyone complains that they are "running out of space", the truth is that most external disk attached to Linux, UNIX, or Windows systems contains only 20-40% data. Many years ago, I visited an insurance company to talk about a new product called IBM Tivoli Storage Manager. This company had 7TB of disk on their mainframe, and another 7TB of disk scattered on various UNIX and Windows machines. In the room were TWO storage admins for the mainframe, and 45 storage admins for the distributed systems. My first question was "why so many people for the mainframe, certainly one of you could manage all of it yourself, perhaps on Wednesday afternoons?" Their response was that they acted as each other's backup, in case one goes on vacation for two weeks. My follow-up question to the rest of the audience was: "When was the last time you took two weeks vacation?" Mainframes fill their disk and tape storage comfortably at 80-90% full of data, primarily because they have a more mature, robust set of management software, like DFSMS.
Labor - by this I mean skilled labor able to manage storage for a corporation. Some companies I have visited keep their new-hires off production systems for the first two years, working only on test or development systems until then. Of course, labor is more expensive in some countries than others. Last year, I was doing a whiteboard session on-site for a client in China, and the last dry-erase pen ran out of ink. I asked for another pen, and they instead sent someone to go re-fill it. I asked wouldn't it be cheaper just to buy another pen, and they said "No, labor is cheap, but ink is expensive." Despite this, China does complain that there is a shortage of a skilled IT labor force, so if you are looking for a job, start learning Mandarin.
Power and Cooling - Most data centers are located on raised floors, with large trunks of electrical power and huge air conditioning systems to deal with all the heat generated from each machine. I have visited the data centers of clients that are forced now to make decisions on storage based on power and cooling consumption, because the costs to upgrade their aging buildings are too high. Leading the charge is IBM, with technology advancements in chips, cards, and complete systems that use less power, and generate less heat. While energy is still fairly cheap in the grand scheme of things, with fears of Global Warming and declining oil supplies, the costs of power and cooling have gotten some news lately. In 1956, Hubbert predicted the US would reach peak oil supplies by 1965-1970 (it happened in 1971), and this year Simmons estimated that world-wide oil production began its decline already in 2005. Smart companies like Google have moved their server farms to places like Oregon in the Pacific Northwest for cheaper hydroelectric power.
Bandwidth - Last year IBM introduced 4Gbps Fibre Channel and FICON SAN networking gear, along with the servers and storage needed to complete the solution. 4Gbps equates to about 400 MB/sec in data throughput. By comparison, iSCSI is typically run on 1Gbps Ethernet, but has so much overhead that you only get about 80 MB/sec. Next year, we may see both 8 Gbps SAN, and 10 GbE iSCSI, to provide 800 MB/sec throughputs. My experience is that the SAN is not the bottleneck; instead, people run out of bandwidth at the server or storage end first. They may not have a million dollars to buy the fastest IBM System p5 servers, or may not have enough host adapters at the storage system end.
Floorspace - I end with floorspace because it reminds me that many "shortages" are temporary or artificially created. Floorspace is only in short supply because you don't want to knock down a wall, or build a new building, to handle your additional storage requirements. In 1997, Tihamer Toth-Fejel wrote an article for the National Space Society newsletter that estimated that "...Everybody on Earth could live comfortably in the USA on only 15% of our land area, with a population density between that of Chicago and San Francisco. Using agricultural yields attained widely now, the rest of the U.S. would be sufficient to grow enough food for everyone. The rest of the planet, 93.7% of it, would be completely empty." Of course, back in 1997 the world population was only 5.9 billion, and this year it is over 6.5 billion.
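To put numbers on the Capacity and Bandwidth points above, here is a small back-of-the-envelope sketch. The 7TB, 2-admin and 45-admin figures, the utilization ranges, and the 4 Gbps and 80 MB/sec figures all come from the text. Applying the general utilization ranges to that one insurance company, and the "about 10 line bits per data byte" rule of thumb for Fibre Channel, are assumptions made for illustration, as are the function names.

```python
# Figures quoted in the post; the helper names are hypothetical.
DISK_TB = 7.0             # TB on each side (mainframe and distributed)
MAINFRAME_ADMINS = 2
DISTRIBUTED_ADMINS = 45


def tb_per_admin(capacity_tb, admins):
    """Installed capacity managed per storage administrator."""
    return capacity_tb / admins


def stored_data_range_tb(capacity_tb, low_util, high_util):
    """Range of actual data held, given a utilization range as fractions."""
    return capacity_tb * low_util, capacity_tb * high_util


def fc_throughput_mb_per_sec(line_rate_gbps):
    """Assumed rule of thumb: about 10 line bits per byte of data,
    chosen because it matches the post's 4 Gbps ~ 400 MB/sec figure."""
    return line_rate_gbps * 1000 / 10


def implied_efficiency(line_rate_gbps, observed_mb_per_sec):
    """Observed throughput as a fraction of the raw line rate (8 bits/byte)."""
    raw_mb_per_sec = line_rate_gbps * 1000 / 8
    return observed_mb_per_sec / raw_mb_per_sec


if __name__ == "__main__":
    print(f"Mainframe:   {tb_per_admin(DISK_TB, MAINFRAME_ADMINS):.2f} TB per admin")
    print(f"Distributed: {tb_per_admin(DISK_TB, DISTRIBUTED_ADMINS):.2f} TB per admin")

    # Assumption: the general 80-90% and 20-40% ranges apply to this shop.
    lo, hi = stored_data_range_tb(DISK_TB, 0.80, 0.90)
    print(f"Mainframe data:   {lo:.1f} to {hi:.1f} TB")
    lo, hi = stored_data_range_tb(DISK_TB, 0.20, 0.40)
    print(f"Distributed data: {lo:.1f} to {hi:.1f} TB")

    print(f"4 Gbps FC: about {fc_throughput_mb_per_sec(4):.0f} MB/sec")
    print(f"8 Gbps FC: about {fc_throughput_mb_per_sec(8):.0f} MB/sec")
    print(f"1 Gbps iSCSI at 80 MB/sec: {implied_efficiency(1, 80):.0%} of raw line rate")
```

Under these assumptions, each mainframe admin looks after 3.5 TB while each distributed admin looks after well under a quarter of a terabyte, and the 80 MB/sec iSCSI figure works out to roughly two-thirds of what a 1 Gbps link could carry in raw bits.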
This last point brings me back to the concept of food, and I am not talking about doughnuts in the conference room, or pizza while making year-end storage upgrades. I'm talking about the food you work so hard to provide for yourself and your family. The folks at Oxfam came up with a simple analogy. If 20 people sit down at your table, representing the world’s population:
3 would be served a gourmet, multi-course meal, while sitting at a decorated table and a cushioned chair.
5 would eat rice and beans with a fork and sit on a simple cushion
12 would wait in line to receive a small portion of rice that they would eat with their hands while sitting on the floor.
So for those of you planning a special meal next Monday, be thankful you are one of the lucky three, and hopeful that IBM will continue to lead the IT industry to help out the other seventeen.
In last week's System Storage Portfolio Top Gun class in Dallas, some of the students were not familiar with Really Simple Syndication (RSS). For the uninitiated, this can be intimidating. I thought a quick overview of what I've done might help:
Chose a "feed reader". I chose Bloglines but there are many others.
Use Technorati to search other blogs for keywords or phrases I am looking for.
When I find a blog that I would like to continue tracking, I "add" it to my subscription list on Bloglines. Just hit "add" and copy the URL of the blog you want to track. Bloglines will figure out the RSS keywords required. I track eight blogs at the moment, but some people with lots of time on their hands track 20 or more. It is easy to unsubscribe, so don't be afraid to try some out for a few days.
Since I was actually going to run a blog of my own, I read a few books on the topic. One I recommend is "Naked Conversations" by Robert Scoble and Shel Israel, both experienced bloggers.
Finally, I am not big on spell checking, but most places have the option to preview your post or comment before it actually gets posted, which is not a bad idea if you use any HTML tags.
For a quick taste of blogging, consider using Data Storage Blogger Feed Reader. This has a lot of blogs on the topic of storage, already added and categorized for your convenience, ready for your perusal.
I am sure there are many other ways to enjoy the Blogosphere, but this works for me.
For those in the US, a comedian named Carlos Mencia has a great TV show, Mind of Mencia, and one of my favorite segments is "Why the @#$% is this news!" where he goes about showing blatantly obvious things that were reported in various channels.
So, when I saw that IBM once again, for the third year in a row, has the fastest disk system, the IBM System Storage SAN Volume Controller (SVC), based on widely-accepted industry benchmarks representing typical business workloads, I thought, "Do I really want to blog about this, and sound like a broken record, repeating my various statements of the past of how great SVC is?" It's like reminding people that IBM has had more US patents than any other company, every year, for the past 14 years.
(Last year, I received comments from Woody Hutsell, VP of Texas Memory Systems, because I pointed out that their "World's Fastest Storage"® cache-only system was not as fast as IBM's SVC. You can read my opinions, and the various comments that ensued, here and here.)
That all changed when EMC uber-blogger Chuck Hollis forgot his own Lessons in Marketing when he posted his rant "Does Anyone Take The SPC Seriously?" That's like asking "Does anyone take book and movie reviews seriously?" Of course they do! In fact, if a movie doesn't make a big deal of its "Two thumbs up!" rating, you know it did not sit well with the reviewers. It's even more critical for books. I guess this latest news from SPC really got under EMC's skin.
For medium and large size businesses, storage is expensive, and customers want to do as much research as possible ahead of time to make informed decisions. A lot of money is at stake, and often, once you choose a product, you are stuck with that vendor for many years to come, sometimes paying software renewals after only 90 days, and hardware maintenance renewals after only a year when the warranty runs out.
Customers shopping for storage like the idea of a standardized test that is representative, so they can compare one vendor's claims with another. The Storage Performance Council (SPC), much like the Transaction Processing Performance Council (TPC) for servers, requires full disclosure of the test environment so people can see what was measured and make their own judgement on whether or not it reflects their workloads. Chuck pours scorn on SPC but I think we should point to TPC-C as a great success story and ask why he thinks the same can't happen for storage? Server performance is also a complicated subject, but people compare TPC-C and TPC-H benchmarks all the time.
Note: This blog post has been updated. I am retracting comments that were unfair generalizations. The next two paragraphs are different than originally posted.
Chuck states that "Anyone is free, however, to download the SPC code, lash it up to their CLARiiON, and have at it." I encourage every customer to do this with whatever disk systems they already have installed. Judge for yourself how each benchmark compares to your experience with your application workload, and consider publishing the results for the benefit of others, or at least send me the results, so that I can understand better all of these "use cases" that Chuck talks about so often. I agree that real-world performance measurements using real applications and real data are always going to be more accurate and more relevant to that particular customer. Unfortunately, few or no such results are made public. They are noticeably absent. With thousands of customers running with storage from all the major storage vendors, as well as storage from smaller start-up companies, I would expect more performance comparison data to be readily available.
In my opinion, customers would benefit by seeing the performance results obtained by others. SPC benchmarks help to fill this void, giving guidance to customers who have not yet purchased the equipment and are deciding which vendors to work with, and which products to put into their consideration set.
Truth is, benchmarks are just one of the many ways to evaluate storage vendors and their products. There are also customer references, industry awards, and corporate statements of a company's financial health, strategy and vision. Like anything, it is information to weigh against other factors when making expensive decisions. And I am sure the SPC would be glad to hear of any suggestions for a third SPC-3 benchmark, if the first two don't provide you enough guidance.
So, if you are not delighted with the performance you are getting from your storage now, or would benefit by having even faster I/O, consider improving its performance by adding SAN Volume Controller. SVC is like salt or soy sauce, it makes everything taste better. IBM would be glad to help you with a try-and-buy or proof-of-concept approach, and even help you compare the performance, before and after, with whatever gear you have now. You might just be surprised how much better life is with SVC. And if, for some reason, the performance boost you experience for your unique workload is only 10-30% better with SVC, you are free to tell the world about your disappointment.