Well, it's Tuesday again, but this time, today we had our third big storage launch of 2009! A lot got announced today as part of IBM's big "Dynamic Infrastructure" marketing campaign. I will just focus on the
disk-related announcements today:
- IBM System Storage DS8700
IBM adds a new model to its DS8000 series with the
[IBM System Storage DS8700]. Earlier this month, fellow blogger and arch-nemesis Barry Burke from EMC posted [R.I.P DS8300] on this mistaken assumption that the new DS8700 meant that DS8300 was going away, or that anyone who bought a DS8300 recently would be out of luck. Obviously, I could not respond until today's announcement, as the last thing I want to do is lose my job disclosing confidential information. BarryB is wrong on both counts:
- IBM will continue to sell the DS8100 and DS8300, in addition to the new DS8700.
- Clients can upgrade their existing DS8100 or DS8300 systems to DS8700.
BarryB's latest post [What's In a Name - DS8700] is fair game, given all the fun and ridicule everyone had at his expense over EMC's "V-Max" name.
So the DS8700 is new hardware with only 4 percent new software. On the hardware side, it uses faster POWER6 processors instead of POWER5+, has faster PCI-e buses instead of the RIO-G loops, and faster four-port device adapters (DAs) for added bandwidth between cache and drives. The DS8700 can be ordered as a single-frame dual 2-way that supports up to 128 drives and 128GB of cache, or as a dual 4-way, consisting of one primary frame, and up to four expansion frames, with up to 384GB of cache and 1024 drives.
Not mentioned explicitly in the announcements were the things the DS8700 does not support:
- ESCON attachment - Now that FICON is well-established for the mainframe market, there is no need to support the slower, bulkier ESCON options. This greatly reduced testing effort. The 2-way DS8700 can support up to 16 four-port FICON/FCP host adapters, and the 4-way can support up to 32 host adapters, for a maximum of 128 ports. The FICON/FCP host adapter ports can auto-negotiate between 4Gbps, 2Gbps and 1Gbps as needed.
- LPAR mode - When IBM and HDS introduced LPAR mode back in 2004, it sounded like a great idea the engineers came up with. Most other major vendors followed our lead to offer similar "partitioning". However, it turned out to be what we call in the storage biz a "selling apple" not a "buying apple". In other words, something the salesman can offer as a differentiating feature, but that few clients actually use. It turned out that supporting both LPAR and non-LPAR modes merely doubled the testing effort, so IBM got rid of it for the DS8700.
Update: I have been reminded that both IBM and HDS delivered LPAR mode within a month of each other back in 2004, so it was wrong for me to imply that HDS followed IBM's lead when obviously development happened in both companies for the most part concurrently prior to that. EMC was late to the "partition" party, but who's keeping track?
Initial performance tests show up to 50 percent improvement for random workloads, and up to 150 percent improvement for sequential workloads, and up to 60 percent improvement in background data movement for FlashCopy functions. The results varied slightly between Fixed Block (FB) LUNs and Count-Key-Data (CKD) volumes, and I hope to see some SPC-1 and SPC-2 benchmark numbers published soon.
The DS8700 is compatible for Metro Mirror, Global Mirror, and Metro/Global Mirror with the rest of the DS8000 series, as well as the ESS model 750, ESS model 800 and DS6000 series.
- New 600GB FC and FDE drives
IBM now offers [600GB drives] for the DS4700 and DS5020 disk systems, as well as the EXP520 and EXP810 expansion drawers. In each case, we are able to pack up to 16 drives into a 3U enclosure.
Personally, I think the DS5020 should have been given a DS4xxx designation, as it resembles the DS4700
more than the other models of the DS5000 series. Back in 2006-2007, I was the marketing strategist for IBM System Storage product line, and part of my job involved all of the meetings to name or rename products. Mostly I gave reasons why products should NOT be renamed, and why it was important to name the products correctly at the beginning.
- IBM System Storage SAN Volume Controller hardware and software
Fellow IBM master inventory Barry Whyte has been covering the latest on the [SVC 2145-CF8 hardware]. IBM put out a press release last week on this, and today is the formal announcement with prices and details. Barry's latest post
[SVC CF8 hardware and SSD in depth] covers just part of the entire
The other part of the announcement was the [SVC 5.1 software] which can be loaded
on earlier SVC models 8F2, 8F4, and 8G4 to gain better performance and functionality.
To avoid confusion on what is hardware machine type/model (2145-CF8 or 2145-8A4) and what is software program (5639-VC5 or 5639-VW2), IBM has introduced two new [Solution Offering Identifiers]:
- 5465-028 Standard SAN Volume Controller
- 5465-029 Entry Edition SAN Volume Controller
The latter is designed for smaller deployments, supports only a single SVC node-pair managing up to
150 disk drives, available in Raven Black or Flamingo Pink.
- EXN3000 and EXP5060 Expansion Drawers
IBM offers the [EXN3000 for the IBM N series]. These expansion drawers can pack 24 drives in a 4U enclosure. The drives can either be all-SAS, or all-SATA, supporting 300GB, 450GB, 500GB and 1TB size capacity drives.
The [EXP5060 for the IBM DS5000 series] is a high-density expansion drawer that can pack up to 60 drives into a 4U enclosure. A DS5100 or DS5300
can handle up to eight of these expansion drawers, for a total of 480 drives.
- IBM System Storage Productivity Center v1.4
The latest [System Storage Productivity Center (SSPC) v1.4] can manage all of your DS3000, DS4000, DS5000, DS6000, DS8000 series disk, and SAN Volume Controller. You can get the SSPC built in two modes:
- Pre-installed with Tivoli Storage Productivity Center Basic Edition. Basic Edition can be upgraded with license keys to support Data, Disk and Standard Edition to extend support and functionality to report and manage XIV, N series, and non-IBM disk systems.
- Pre-installed with Tivoli Key Lifecycle Manager (TKLM). This can be used to manage the Full Disk Encryption (FDE) encryption-capable disk drives in the DS8000 and DS5000, as well as LTO and TS1100 series tape drives.
- IBM Tivoli Storage FlashCopy Manager v2.1
The [IBM Tivoli Storage FlashCopy Manager V2.1] replaces two products in one. IBM used
to offer IBM Tivoli Storage Manager for Copy Services (TSM for CS) that protected Windows application data, and IBM Tivoli Storage Manager for Advanced Copy Services (TSM for ACS) that protected AIX application data.
The new product has some excellent advantages. FlashCopy Manager offers application-aware backup of LUNs containing SAP, Oracle, DB2, SQL server and Microsoft Exchange data. It can support IBM DS8000, SVC and XIV point-in-time copy functions, as well as the Volume Shadow Copy Services (VSS) interfaces of the IBM DS5000, DS4000 and DS3000 series disk systems. It is priced by the amount of TB you copy, not on the speed or number of CPU processors inside the server.
Don't let the name fool you. IBM FlashCopy Manager does not require that you use Tivoli Storage Manager (TSM) as your backup product. You can run IBM FlashCopy Manager on its own, and it will manage your FlashCopy target versions on disk, and these can be backed up to tape or another disk using any backup product. However, if you are lucky enough to also be using TSM, then there is optional integration that allows TSM to manage the target copies, move them to tape, inventory them in its DB2 database, and provide complete reporting.
Yup, that's a lot to announce in one day. And this was just the disk-related portion of the launch!
technorati tags: ds8000, disk, ds8700, exn3, svc, cf8, 2145-c58, DS5000, DS4000, DS3000, DS5020, DS4700, DS5100, DS5300, SSPC, TKLM, FlashCopy+Manager, Tivoli, Storage+Manager, TSM, DB2, Oracle, SAP, SQL, Microsoft+Exchange, VSS, Windows, AIX, N+series, XIV
Tonight PBS plans to air Season 38, Episode 6 of NOVA, titled [Smartest Machine On Earth]. Here is an excerpt from the station listing:
"What's so special about human intelligence and will scientists ever build a computer that rivals the flexibility and power of a human brain? In "Artificial Intelligence," NOVA takes viewers inside an IBM lab where a crack team has been working for nearly three years to perfect a machine that can answer any question. The scientists hope their machine will be able to beat expert contestants in one of the USA's most challenging TV quiz shows -- Jeopardy, which has entertained viewers for over four decades. "Artificial Intelligence" presents the exclusive inside story of how the IBM team developed the world's smartest computer from scratch. Now they're racing to finish it for a special Jeopardy airdate in February 2011. They've built an exact replica of the studio at its research lab near New York and invited past champions to compete against the machine, a big black box code -- named Watson after IBM's founder, Thomas J. Watson. But will Watson be able to beat out its human competition?"
Craig Rhinehart offers
[10 Things You Need to Know About the Technology Behind Watson].
An artist has come up with this clever
Dr. Jon Lenchner from IBM Research has a series of posts on
[How Watson "sees", "hears", and "speaks"] and [Selected Nuances].
Like most supercomputers, Watson runs the Linux operating system. The system runs 2,880 cores (90 IBM Power 750 servers, four sockets each, eight cores per socket) to achieve 80 [TeraFlops]. TeraFlops is the unit of measure for supercomputers, representing a trillion floating point operations. By comparison, Hans Morvec, principal research scientist at the Robotics Institute of Carnegie Mellon University (CMU) estimates that the [human brain is about 100 TeraFlops]. So, in the three seconds that Watson gets to calculate its response, it would have processed 240 trillion operations.
Several readers of my blog have asked for details on the storage aspects of Watson. Basically, it is a modified version of IBM Scale-Out NAS [SONAS] that IBM offers commercially, but running Linux on POWER instead of Linux-x86. System p expansion drawers of SAS 15K RPM 450GB drives, 12 drives each, are dual-connected to two storage nodes, for a total of 21.6TB of raw disk capacity. The storage nodes use IBM's General Parallel File System (GPFS) to provide clustered NFS access to the rest of the system. Each Power 750 has minimal internal storage mostly to hold the Linux operating system and programs.
When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, "The actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1TB." For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained, Watson is NOT going to the internet searching for answers.
On ZDnet, Steven J. Vaughan-Nichols welcomes our new [Linux Penguin Jeopardy overlords]. I have to say I share his enthusiasm!
technorati tags: IBM, Nova, Watson, #ibmwatson, Jeopardy, POWER7, p750, supercomputer, TeraFlops, disk, SONAS, GPFS, SAS, Craig Rhinehart, Jon Lenchner, Hans Morvec, Carnegie Mellon University, CMU
It's official! My "blook" Inside System Storage - Volume I
is now available.
|This blog-based book, or “blook”, comprises the first twelve months of posts from this Inside System Storage blog,165 posts in all, from September 1, 2006 to August 31, 2007. Foreword by Jennifer Jones. 404 pages.|
- IT storage and storage networking concepts
- IBM strategy, hardware, software and services
- Disk systems, Tape systems, and storage networking
- Storage and infrastructure management software
- Second Life, Facebook, and other Web 2.0 platforms
- IBM’s many alliances, partners and competitors
- How IT storage impacts society and industry
You can choose between hardcover (with dust jacket) or paperback versions:
This is not the first time I've been published. I have authored articles for storage industry magazines, written large sections of IBM publications and manuals, submitted presentations and whitepapers to conference proceedings, and even had a short story published with illustrations by the famous cartoon writer[Ted Rall].
But I can say this is my first blook, and as far as I can tell, the first blook from IBM's many bloggers on DeveloperWorks, and the first blook about the IT storage industry.I got the idea when I saw [Lulu Publishing] run a "blook" contest. The Lulu Blooker Prize is the world's first literary prize devoted to "blooks"--books based on blogs or other websites, including webcomics. The [Lulu Blooker Blog] lists past year winners. Lulu is one of the new innovative "print-on-demand" publishers. Rather than printing hundredsor thousands of books in advance, as other publishers require, Lulu doesn't print them until you order them.
I considered cute titles like A Year of Living Dangerously, orAn Engineer in Marketing La-La land, or Around the World in 165 Posts, but settled on a title that matched closely the name of the blog.
In addition to my blog posts, I provide additional insights and behind-the-scenes commentary. If you go to the Luluwebsite above, you can preview an entire chapter in its entirety before purchase. I have added a hefty 56-page Glossary of Acronyms and Terms (GOAT) with over 900 storage-related terms defined, which also doubles as an index back to the post (or posts) that use or further explain each term.
So who might be interested in this blook?
- Business Partners and Sales Reps looking to give a nice gift to their best clients and colleagues
- Managers looking to reward early-tenure employees and retain the best talent
- IT specialists and technicians wanting a marketing perspective of the storage industry
- Mentors interested in providing motivation and encouragement to their proteges
- Educators looking to provide books for their classroom or library collection
- Authors looking to write a blook themselves, to see how to format and structure a finished product
- Marketing personnel that want to better understand Web 2.0, Second Life and social networking
- Analysts and journalists looking to understand how storage impacts the IT industry, and society overall
- College graduates and others interested in a career as a storage administrator
And yes, according to Lulu, if you order soon, you can have it by December 25.
technorati tags: IBM, blook, Volume I, Jennifer Jones, system, storage, strategy, hardware, software, services, disk, tape, networking, SAN, secondlife, Web2.0, facebook, Lulu, publishing, Blooker Prize, articles, magazines, proceedings, Ted Rall, insights, glossary, early-tenure, mentors, library, classroom, administrator, print, publish, on demand
In my post yesterday [Spreading out the Re-Replication process
], fellow blogger BarryB [aka The Storage Anarchist
]raises some interesting points and questions in the comments section about the new IBM XIV Nextra architecture.I answer these below not just for the benefit of my friends at EMC, but also for my own colleagues within IBM,IBM Business Partners, Analysts and clients that might have similar questions.
- If RAID 5/6 makes sense on every other platform, why not so on the Web 2.0 platform?
Your attempt to justify the expense of Mirrored vs. RAID 5 makes no sense to me. Buying two drives for every one drive's worth of usable capacity is expensive, even with SATA drives. Isn't that why you offer RAID 5 and RAID 6 on the storage arrays that you sell with SATA drives?Let's take a look at various disk configurations, for example 3TB on 750GB SATA drives:
And if RAID 5/6 makes sense on every other platform, why not so on the (extremely cost-sensitive) Web 2.0 platform? Is faster rebuild really worth the cost of 40+% more spindles? Or is the overhead of RAID 6 really too much for those low-cost commodity servers to handle.
- JBOD: 4 drives
- JBOD here is industry slang for "Just a Bunch of Disks" and was invented as the term for "non-RAID".Each drive would be accessible independently, at native single-drive speed, with no data protection. Puttingfour drives in a single cabinet like this provides simplicity and convenience only over four separate drivesin their own enclosures.
- RAID-10: 8 drives
- RAID-10 is a combination of RAID-1 (mirroring) and RAID-0 (striping). In a 4x2 configuration, data is striped across disks 1-4,then these are mirrored across to disks 5-8. You get performance improvement and protection against a singledrive failure.
- RAID-5: 5 drives
- This would be a 4+P configuration, where there would be four drives' worth of data scattered across fivedrives. This gives you almost the same performance improvement as RAID-10, similar protection againstsingle drive failure, but with fewer drives per usable TB capacity.
- RAID-6: 6 drives
- This would be a 4+2P configuration, where the first P represents linear parity, and the second represents a diagonal parity. Similar in performance improvement as RAID-5, but protects against single and double drive failures, and still better than RAID-10 in terms of drives per TB usable capacity.
For all the RAID configurations, rebuild would require a spare drive, but often spares are shared among multiple RAID ranks, not dedicated to a single rank. To this end, you often have to have several spares per I/O loop, and a different set of spares for each kind of speed and capacity. If you had a mix of 15K/73GB, 10K/146GB, and 7200/500GB drives, then you would have three sets of spares to match.
In contrast, IBM XIV's innovative RAID-X approach doesn't requireany spare drives, just spare capacity on existing drives being used to hold data. The objects can be mirroredbetween any two types of drives, so no need to match one with another.
All of these RAID levels represent some trade-off between cost, protection and performance, and IBM offers each of theseon various disk systems platforms. Calculating parity is more complicated than just mirrored copies, but this can be done with specialized chips in cache memory to minimize performance impact.IBM generally recommends RAID-5 for high-performance FC disk, and RAID-6 for slower, large capacity SATA disk.
However, the questionassumes that the drive cost is a large portion of the overall "disk system" cost. It isn't. For example,Jon Toigo discusses the cost of EMC's new AX4 disk system in his post [National Storage Rip-Off Day]:
- EMC is releasing its low end Clariion AX4 SAS/SATA array with 3TB capacity for $8600. It ships with four 750GB SATA drives (which you and I could buy at list for $239 per unit). So, if the disk drives cost $956 (presumably far less for EMC), that means buyers of the EMC wares are paying about $7700 for a tin case, a controller/backplane, and a 4Gbps iSCSI or FC connector. Hmm.
- Dell is offering EMC’s AX4-5 with same configuration for $13,000 adding a 24/7 warranty.
(Note: I checked these numbers. $8599 is the list price that EMC has on its own website. External 750GB drivesavailable at my local Circuit City ranged from $189 to $329 list price. I could not find anything on Dell'sown website, but found [The Register] to confirm the $13,000 with 24x7 warranty figure.)
Disk capacity is a shrinking portion of the total cost of ownership (TCO). In addition to capacity, you are paying forcache, microcode and electronics of the system itself, along with software and services that are included in the mix,and your own storage administrators to deal with configuration and management. For more on this, see [XIV storage - Low Total Cost of Ownership].
- EMC Centera has been doing this exact type of blob striping and protection since 2002
As I've noted before, there's nothing "magic" about it - Centera has been employing the same type of object-level replication for years. Only EMC's engineers have figured out how to do RAID protection instead of mirroring to keep the hardware costs low while not sacrificing availability.
I agree that IBM XIV was not the first to do an object-level architecture, but it was one of the first to apply object-level technologies to the particular "use case" and "intended workload" of Web 2.0 applications.
RAID-5 based EMC Centera was designed insteadto hold fixed-content data that needed to be protected for a specific period of time, such as to meet government regulatory compliance requirements. This is data that you most likelywill never look at again unless you are hit with a lawsuit or investigation. For this reason, it is important to get it on the cheapest storage configuration as possible. Before EMC Centera, customers stored this data on WORM tape and optical media, so EMC came up with a disk-only alternative offering.IBM System Storage DR550 offers disk-level access for themost recent archives, with the ability to migrate to much less expensive tape for the long term retention. The end result is that storing on a blended disk-plus-tape solution can help reduce the cost by a factor of 5x to 7x, making RAID level discussion meaningless in this environment. For moreon this, see my post [OptimizingData Retention and Archiving].
While both the Centera and DR550 are based on SATA, neither are designed for Web 2.0 platforms.When EMC comes out with their own "me, too" version, they will probably make a similar argument.
- IBM XIV Nextra is not a DS8000 replacement
Nextra is anything but Enterprise-class storage, much less a DS8000 replacement. How silly of all those folks to suggest such a thing.
I did searches on the Web and could not find anybody, other than EMC employees, who suggested that IBM XIV Nextra architecture represented a replacement for IBM System Storage DS8000. The IBM XIV press release does not mentionor imply this, and certainly nobody I know at IBM has suggested this.
The DS8000 is designed for a different "use case" andset of "intended workloads" than what the IBM XIV was designed for. The DS8000 is the most popular disk systemfor our IBM System z mainframe platform, for activities like Online Transaction Processing (OLTP) and large databases, supporting ESCON and FICON attachment to high-speed 15K RPM FC drives. Web 2.0 customers that might chooseIBM XIV Nextra for their digital content might run their financial operations or metadata search indexes on DS8000.Different storage for different purposes.
As for the opinion that this is not "enterprise class", there are a variety of definitions that refer to this phrase.Some analysts look at "price band" of units that cost over $300,000 US dollars. Other analysts define this as beingattachable to mainframe servers via ESCON or FICON. Others use the term to refer to five-nines reliability, havingless than 5 minutes downtime per year. In this regard, based on the past two years experience at 40 customer locations,I would argue that it meets this last definition, with non-disruptive upgrades, microcode updates and hot-swappable components.
By comparison, when EMC introduced its object-level Centera architecture, nobody suggested it was the replacement for their Symmetrix or CLARiiON devices. Was it supposed to be?
- Given drive growth rates have slowed, improving utilization is mandatory to keep up with 60-70 percent CAGR
Look around you, Tony- all of your competitors are implementing thin provisioning specifically to drive physical utilization upwards towards 60-80%, and that's on top of RAID 5/RAID 6 storage and not RAID 1. Given that disk drive growth rates and $/GB cost savings have slowed significantly, improving utilization is mandatory just to keep up with the 60-70% CAGR of information growth.
Disk drive capacities have slowed for FC disk because much of the attention and investment has been re-directed to ATA technology. Dollar-per-GB price reduction is slowing for disks in general, as researchers are hitting physicallimitations to the amount of bits they can pack per square inch of disk media, and is now around 25 percent per year.The 60-70 percent Compound Annual Growth Rate (CAGR) is real, and can be even growing faster for Web 2.0providers. While hardware costs drop, the big ticket items to watch will be software, services and storage administrator labor costs.
To this end, IBM XIV Nextra offers thin provisioning and differential space-efficient snapshots. It is designed for 60-90 percent utilization, and can be expanded to larger capacities non-disruptively in a very scalable manner.
Well, I hope that helps clear some things up.
technorati tags: IBM, XIV, Nextra, EMC, BarryB, RAID-0, RAID-1, RAID-5, RAID-6, RAID-10, RAID-X, AX4, Dell, AX4-5, FC, SAS, SATA, iSCSI, TCO, blob, object-level, disk, storage, system, Centera, ESCON, FICON, Symmetrix, CLARiiON, ATA, CAGR, Web2.0
Continuing my week in Chicago, for the IBM Storage Symposium 2008, we had sessions that focused on individual products. IBM System Storage SAN Volume Controller (SVC) was a popular topic.
- SVC - Everything you wanted to know, but were afraid to ask!
Bill Wiegand, IBM ATS, who has been working with SAN Volume Controller since it was first introduced in 2003. answered some frequently asked questions about IBM System Storage SAN Volume Controller.
- Do you have to upgrade all of your HBAs, switches and disk arrays to the recommended firmware levels before upgrading SVC? No. These are recommended levels, but not required. If you do plan to update firmware levels, focus on the host end first, switches next, and disk arrays last.
- How do we request special support for stuff not yet listed on the Interop Matrix?
Submit an RPQ/SCORE, same as for any other IBM hardware.
- How do we sign up for SVC hints and tips? Go to the IBM
[SVC Support Site] and select the "My Notifications" under the "Stay Informed" box on the right panel.
- When we call IBM for SVC support, do we select "Hardware" or "Software"?
While the SVC is a piece of hardware, there are very few mechanical parts involved. Unless there are sparks,
smoke, or front bezel buttons dangling from springs, select "Software". Most of the questions are
related to the software components of SVC.
- When we have SVC virtualizing non-IBM disk arrays, who should we call first?
IBM has world-renown service, with some of IT's smartest people working the queues. All of the major storage vendors play nice
as part of the [TSAnet Agreement when a mutual customer is impacted.
When in doubt, call IBM first, and if necessary, IBM will contact other vendors on your behalf to resolve.
- What is the difference between livedump and a Full System Dump?
Most problems can be resolved with a livedump. While not complete information, it is generally enough,
and is completely non-disruptive. Other times, the full state of the machine is required, so a Full System Dump
is requested. This involves rebooting one of the two nodes, so virtual disks may temporarily run slower on that
- What does "svc_snap -c" do?
The "svc_snap" command on the CLI generates a snap file, which includes the cluster error log and trace files from all nodes. The "-c" parameter includes the configuration and virtual-to-physical mapping that can be useful for
disaster recovery and problem determination.
- I just sent IBM a check to upgrade my TB-based license on my SVC, how long should I wait for IBM to send me a software license key?
IBM trusts its clients. No software license key will be sent. Once the check clears, you are good to go.
- During migration from old disk arrays to new disk arrays, I will temporarily have 79TB more disk under SVC management, do I need to get a temporary TB-based license upgrade during the brief migration period?
Nope. Again, we trust you. However, if you are concerned about this at all, contact IBM and they will print out
a nice "Conformance Letter" in case you need to show your boss.
- How should I maintain my Windows-based SVC Master Console or SSPC server?
Treat this like any other Windows-based server in your shop, install Microsoft-recommended Windows updates,
run Anti-virus scans, and so on.
- Where can I find useful "How To" information on SVC?
Specify "SAN Volume Controller" in the search field of the
[IBM Redbooks vast library of helpful books.
- I just added more managed disks to my managed disk group (MDG), can I get help writing a script to redistribute the extents to improve wide-striping performance?
Yes, IBM has scripting tools available for download on
[AlphaWorks]. For example, svctools will take
the output of the "lsinfo" command, and generate the appropriate SVC CLI to re-migrate the disks around to optimize
performance. Of course, if you prefer, you can use IBM Tivoli Storage Productivity Center instead for a more
- Any rules of thumb for sizing SVC deployments?
IBM's Disk Magic tool includes support for SVC deployments. Plan for 250 IOPS/TB for light workloads,
500 IOPS/TB for average workloads, and 750 IOPS/TB for heavy workloads.
- Can I migrate virtual disks from one manage disk group (MDG) to another of different extent size?
Yes, the new Vdisk Mirroring capability can be used to do this. Create the mirror for your Vdisk between the
two MDGs, wait for the copy to complete, and then split the mirror.
- Can I add or replace SVC nodes non-disruptively? Absolutely, see the Technotes
[SVC Node Replacement page.
- Can I really order an SVC EE in Flamingo Pink? Yes. While my blog post that started all
this [Pink It and Shrink It] was initially just some Photoshop humor, the IBM product manager for SVC accepted this color choice as an RPQ option.
The default color remains Raven Black.
technorati tags: IBM, SVC, Audacity of Cope, svc_snap, Flamingo pink, Raven black, non-disruptive, svctools, AlphaWorks
The technology industry is full of trade-offs. Take for example solar cells that convert sunlight to electricity. Every hour, more energy hits the Earth in the form of sunlight than the entire planet consumes in an entire year. The general trade-off is between energy conversion efficiency versus abundance of materials:
- Get 9-11 percent efficiency using rare materials like indium (In), gallium (Ga) or cadmium (Cd).
- Get only 6.7 percent efficiency using abundant materials like copper (Cu), tin (Sn), zinc (Zn), sulfur (S), and selenium (Se)
IBM has eliminated this trade-off with a record-setting breakthrough last week, demonstrating 9.6 percent efficiency [thin film solar cells using earth-abundant materials].
A second trade-off is exemplified by EMC's recent GeoProtect announcement. This appears similar to the geographic dispersal method introduced by a company called [CleverSafe]. The trade-off is between the amount of space to store one or more copies of data and the protection of data in the event of disaster. Here's an excerpt from fellow blogger Chuck Hollis (EMC) titled ["Cloud Storage Evolves"]:
"Imagine a average-sized Atmos network of 9 nodes, all in different time zones around the world. And imagine that we were using, say, a 6+3 protection scheme.
The implication is clear: any 3 nodes could be completely lost: failed, destroyed, seized by the government, etc.
-- and the information could be completely recovered from the surviving nodes."
For organizations worried about their information falling into the wrong hands (whether criminal or government sponsored!), any subset of the nodes would yield nothing of value -- not only would the information be presumably encrypted, but only a few slices of a far bigger picture would be lost.
Seized by the government? falling into the wrong hands? Is EMC positioning ATMOS as "Storage for Terrorists"? I can certainly appreciate the value of being able to protect 6PB of data with only 9PB of storage capacity, instead of keeping two copies of 6PB each, the trade-off means that you will be accessing the majority of your data across your intranet, which could impact performance. But, if you are in an illicit or illegal business that could have a third of your facilities "seized by the government", then perhaps you shouldn't house your data centers there in the first place. Having two copies of 6PB each, in two "friendly nations", might make more sense.
(In reality, companies often keep way more than just two copies of data. It is not unheard of for companies to keep three to five copies scattered across two or three locations. Facebook keeps SIX copies of photographs you upload to their website.)
ChuckH argues that the governments that seize the three nodes won't have a complete copy of the data. However, merely having pieces of data is enough for governments to capture terrorists. Even if the striping is done at the smallest 512-byte block level, those 512 bytes of data might contain names, phone numbers, email addresses, credit cards or social security numbers. Hackers and computer forensics professionals take advantage of this.
You might ask yourself, "Why not just encrypt the data instead?" That brings me to the third trade-off, protection versus application performance. Over the past 30 years, companies had a choice, they could encrypt and decrypt the data as needed, using server CPU cycles, but this would slow down application processing. Every time you wanted to read or update a database record, more cycles would be consumed. This forced companies to be very selective on what data they encrypted, which columns or fields within a database, which email attachments, and other documents or spreadsheets.
An initial attempt to address this was to introduce an outboard appliance between the server and the storage device. For example, the server would write to the appliance with data in the clear, the appliance would encrypt the data, and pass it along to the tape drive. When retrieving data, the appliance would read the encrypted data from tape, decrypt it, and pass the data in the clear back to the server. However, this had the unintended consequences of using 2x to 3x more tape cartridges. Why? Because the encrypted data does not compress well, so tape drives with built-in compression capabilities would not be able to shrink down the data onto fewer tapes.
(I covered the importance of compressing data before encryption in my previous blog post
[Sock Sock Shoe Shoe].)
Like the trade-off between energy efficiency and abundant materials, IBM eliminated the trade-off by offering compression and encryption on the tape drive itself. This is standard 256-bit AES encryption implemented on a chip, able to process the data as it arrives at near line speed. So now, instead of having to choose between protecting your data or running your applications with acceptable performance, you can now do both, encrypt all of your data without having to be selective. This approach has been extended over to disk drives, so that disk systems like the IBM System Storage DS8000 and DS5000 can support full-disk-encryption [FDE] drives.
Certainly, something to think about!
technorati tags: , sunlight, solar cells, electricity, indium, gallium, cadmium, copper, tin, zinc, sulfur, selenium, thin+film, efficiency, EMC, Chuck Hollis, GeoProtect, Cleversafe, governement, seizure, Facebook, terrorists, encryption, forensics, hackers, protection, performance, disk, tape
So here we are in January, named after the two-faced Roman god Janus, who in their mythology was the god of gates and doors, and beginnings and endings.
-- Roger von Oech[Our "Janus-Like" Powers]
Well, it's 2008, which could mark the end to RAID5 and mark the beginnings of a new disk storagearchitecture. IBM starts the year with exciting news, acquiring new disk technology from a smallstart-up called XIV, led by former-EMCer Moshe Yanai. Moshe was ousted publicly in 2001 from hisposition as EMC's VP of engineering, and formed his own company. It didn't take long for EMC bloggersto poke fun at this already. Mark Twomey, in his StorageZilla blog, had mentioned XIV before back in August,[XIV], and again todayin [IBM Buys XIV].
The following is an excerpt from the [IBM Press Release]:
To address the new requirements associated with next generation digital content, IBM chose XIV and its NEXTRA™ architecture for its ability to scale dynamically, heal itself in the event of failure, and self-tune for optimum performance, all while eliminating the significant management burden typically associated with rapid growth environments. The architecture also is designed to automatically optimize resource utilization of all the components within the system, which can allow for easier management and configuration and improved performance and data availability.
"We are pleased to become a significant part of the IBM family, allowing for our unique storage architecture, our engineers and our storage industry experience to be part of IBM's overall storage business," said Moshe Yanai, chairman, XIV. "We believe the level of technological innovation achieved by our development team is unparalleled in the storage industry. Combining our storage architectural advancements with IBM's world-wide research, sales, service, manufacturing, and distribution capabilities will provide us with the ability to have these technologies tackle the emerging Web 2.0 technology needs and reach every corner of the world."
The NEXTRA architecture has been in production for more than two years, with more than four petabytes of capacity being used by customers today.
Current disk arrays were designed for online transaction processing (OLTP) databases. The focus was onusing fastest most expensive 10K and 15K RPM Fibre Channel drives, with clever caching algorithmsfor quick small updates of large relational databases. However, the world is changing, and peoplenow are looking for storage designed for digital media, archives, and other Web 2.0 applications.
One problem that NEXTRA architecture addresses is RAID rebuild. In a standard RAID5 6+P+S configuration of 146GB 10K RPM drives, the loss of one disk drive module (DDM) was recovered by reconstructing the data from parity of the other drives onto the spare drive. The process took46 minutes or longer, depending on how busy the system was doing other things. During this time,if a second drive in the same rank fails, all 876GB of data are lost. Double-drive failures are rare,but unpleasant when they happen, and hopefully you have a backup on tape to recover the data from.Moving to slower, less expensive SATA drives made this situation worse. The drives have highercapacity, but run at slower speeds. When a SATA drive fails in a RAID5 array, it could take severalhours to rebuild, and that is more time exposure for a second drive failure. A rebuild for a 750GBSATA drive would take five hours or more,with 4.5 TB of data at risk during the process if a second drive failure occurs.
The Nextra architecture doesn't use traditional RAID ranks or spare DDMs. Instead, data is carved up into 1MBobjects, and each object is stored on two physically-separate drives. In the event of a DDM loss, allthe data is readable from the second copies that are spread across hundreds of drives. New copies aremade on the empty disk space of the remaining system. This process can be done for a lost 750GB drive in under20 minutes. A double-drive failure would only lose those few objects that were on both drives, so perhaps1 to 2 percent of the total data stored on that logical volume.
Losing 1 to 2 percent of data might be devastating to a large relational database, as this could impactthe entire access to the internal structure. However, this box was designed for unstructuredcontent, like medical images, music, videos, Web pages, and other discrete files. In the event of a double-drivefailure, individual files would be recovered, such as with IBM Tivoli Storage Manager backup software.
IBM will continue to offer high-speed disk arrays like the IBM System Storage DS8000 and DS4800 for OLTP applications, and offer NEXTRA for this new surge in digital content of unstructured data. Recognizing this trend, diskdrive module manufacturers will phase out 10K RPM drives, and focus on 15K RPM for OLTP, and low-speedSATA for everything else.
Update: This blog post was focused on the version of XIV box available as of January 2008 that was built by XIV prior to the IBM acquisition. IBM has since made a major revision, made available August 2008 thataddresses a variety of workloads, including database, OLTP, email, as well as digital content and unstructuredfiles. Contact your IBM or IBM Business Partner for the latest details!
Bottom line, IBM continues to celebrate the new year, while the EMC folks in Hopkington, MA will continue to nurse their hangovers. Now that's a good way to start the new year!
technorati tags: Janus, two-faced, Roman god, Roger Von Oech, IBM, RAID5, XIV, EMC, Moshe Yanai, Mark Twomey, StorageZilla, NEXTRA, double-drive failure, rebuild, HDD, DDM, HDD, digital content, unstructured data
IBM once again delivers storage innovation!
(Note: The following paragraphs have been updated to clarify the performance tests involved.)
This time, IBM breaks the 1 million IOPS barrier, achieved by running a test workload consisting of a 70/30 mix of random 4K requests. That is 70 percent reads, 30 percent writes, with 4KB blocks. The throughput achieved was 3.5x times that obtained by running the identical workload on the fastest IBM storage system today (IBM System Storage SAN Volume Controller 4.3),
and an estimated EIGHT* times the performance of EMC DMX. With an average response time under 1 millisecond, this solution would be ideal for online transaction processing (OLTP) such as financial recordings or airline reservations.
(*)Note: EMC has not yet published ANY benchmarks of their EMC DMX box with SSD enterprise flash drives (EFD). However, I believe that the performance bottleneck is in their controller and not the back-end SSD or FC HDD media, so I have givenEMC the benefit of the doubt and estimated that their latest EMC DMX4 is as fast as an[IBMDS8300 Turbo] with Fibre Channel drives. If or when EMC publishes benchmarks, the marketplace can make more accurate comparisons. Your mileage may vary.
IBM used 4 TB of Solid State Disk (SSD) behind its IBM SAN Volume Controller (SVC) technology to achieve this amazing result. Not only does this represent a significantly smaller footprint, but it uses only 55 percent of the power and cooling.
The SSD drives are made by [Fusion IO] and are different than those used by EMC made by STEC.
The SVC addresses the one key problem clients face today with competitive disk systems that support SSD enterprise flash drives: choosing what data to park on those expensive drives? How do you decide which LUNs, which databases, or which files should be permanently resident on SSD? With SVC's industry-leading storage virtualization capability, you are not forced to decide. You can move data into SSD and back out again non-disruptively, as needed to meet performance requirements. This could be handy for quarter-end or year-end processing, for example.
For more on this, see the [IBM Press Release] or thearticles in [Network World] by Jon Brodkin, and [Cnet News] by Brooke Crothers.
Our clients have often told us at IBM that performance is one of their top purchase criteria. IBM once again has shown that it listens to the marketplace!
technorati tags: IBM, SVC, million, IOPS, EMC, DMX, Network World, Cnet, Jon Brodkin, Brooke Crothers, benchmark, leading, performance, SSD, EFD, FC, HDD, disk, systems, media
Last week, I presented IBM's strategic initiative, the IBM Information Infrastructure, which is part of IBM's New Enterprise Data Center vision. This week, I will try to get around to talking about some of theproducts that support those solutions.
There has been a lot of attention on XIV in the past few weeks, so I will start with that. Steve Duplessie, anIT industry analyst from Enterprise Strategy Group (ESG) had a post [Adaptec buys Aristos, Tom Cruise, XIV, and Logical Assumptions] with some interesting observations and some sage advice.Val Bercovici on his NetApp Exposed blog, has a post [Has Storage Swift-Blogging Finally Jumped the Shark?] which blasts EMC for their negativity.
(For those not in the USA, swift-blogging is a reference tofalse accusations and negative remarks made during the U.S. 2004 presidential election by the[Swift Boat Veterans], and ["jumping the shark"] is a reference to [a TV show that ran out of interesting and relevant topics].For movie sequels, the comparable phrase is ["nuke the fridge"] in reference to the most recent Indiana Jones' movie.)
I was going to set the record straight on a variety of misunderstandings, rumors or speculations, but I think most have been taken care of already. IBM blogger BarryW covered the fact that SVC now supports XIV storage systems, in his post[SVC and XIV],and addressed some of the FUD already. Here was my list:
- Now that IBM has an IBM-branded model of XIV, IBM will discontinue (insert another product here)
I had seen speculation that XIV meant the demise of the N series, the DS8000 or IBM's partnership with LSI.However, the launch reminded people that IBM announced a new release of DS8000 features, new models of N series N6000,and the new DS5000 disk, so that squashes those rumors.
- IBM XIV is a (insert tier level here) product
While there seems to be no industry-standard or agreement for what a tier-1, tier-2 or tier-3 disk system is, there seemed to be a lot of argument over what pigeon-hole category to put IBM XIV in. No question many people want tier-1 performance and functionality at tier-2 prices, and perhaps IBM XIV is a good step at giving them this. In some circles, tier-1 means support for System z mainframes. The XIV does not have traditional z/OS CKD volume support, but Linux on System z partitions or guests can attach to XIV via SAN Volume Controller (SVC), or through NFS protocol as part of the Scale-Out File Services (SoFS) implementation.
Whenever any radicalgame-changing technology comes along, competitors with last century's products and architectures want to frame the discussion that it is just yet another storage system. IBM plans to update its Disk Magic and otherplanning/modeling tools to help people determine which workloads would be a good fit with XIV.
- IBM XIV lacks (insert missing feature here) in the current release
I am glad to see that the accusations that XIV had unprotected, unmirrored cache were retracted. XIV mirrors all writes in the cache of two separate modules, with ECC protection. XIV allows concurrent code loadfor bug fixes to the software. XIV offers many of the features that people enjoy in other disksystems, such as thin provisioning, writeable snapshots, remote disk mirroring, and so on.IBM XIV can be part of a bigger solution, either through SVC, SoFS or GMAS that provide thebusiness value customers are looking for.
- IBM XIV uses (insert block mirroring here) and is not as efficient for capacity utilization
It is interesting that this came from a competitor that still recommends RAID-1 or RAID-10 for itsCLARiiON and DMX products.On the IBM XIV, each 1MB chunk is written on two different disks in different modules. When disks wereexpensive, how much usable space for a given set of HDD was worthy of argument. Today, we sell you abig black box, with 79TB usable, for (insert dollar figure here). For those who feel 79TB istoo big to swallow all at once, IBM offers "capacity on demand" pricing, where you can pay initially for as littleas 22TB, but get all the performance, usability, functionality and advanced availability of the full box.
- IBM XIV consumes (insert number of Watts here) of energy
For every disk system, a portion of the energy is consumed by the number of hard disk drives (HDD) andthe remainder to UPS, power conversion, processors and cache memory consumption. Again, the XIV is a bigblack box, and you can compare the 8.4 KW of this high-performance, low-cost storage one-frame system with thewattage consumed by competitive two-frame (sometimes called two-bay) systems, if you are willing to take some trade-offs. To getcomparable performance and hot-spot avoidance, competitors may need to over-provision or use faster, energy-consuming FC drives, and offer additional software to monitor and re-balance workloads across RAID ranks.To get comparable availability, competitors may need to drop from RAID-5 down to either RAID-1 or RAID-6.To get comparable usability, competitors may need more storage infrastructure management software to hide theinherent complexity of their multi-RAID design.
Of course, if energy consumption is a major concern for you, XIV can be part of IBM's many blended disk-and-tapesolutions. When it comes to being green, you can't get any greener storage than tape! Blended disk-and-tapesolutions help get the best of both worlds.
Well, I am glad I could help set the record straight. Let me know what other products people you would like me to focus on next.
technorati tags: IBM, XIV, disk, storage, system, Steve Duplessie, ESG, Val Bercovici, NetApp, BarryW, SVC, DS8000, N6000, DS5000, mainframe, z/OS, CKD, SoFS, NFS, ECC, HDD, RAID, UPS, availability, reliability, performance, usability, blended disk-and-tape, green
As a consultant, I am often asked to help design the architecture for the information infrastructure. A usefulanalogy to gather requirements and preferences is the difference between area rugs
and wall-to-wall carpeting
. Arearugs are not secured to the floor and cover only a portion of the floor area. Carpets are generally tacked or cemented to the floor, often with an underlay of cushion padding, stretched across the entire floor surface, out to all four walls of each room.
Each has its pros and cons, and often is a matter of preference. Some people like area rugs because they can choosea different style for each room, match the decor and color scheme of furniture, and use these to define each livingspace. Ever since paleolithic man put animal skins on the floor of their cave, people recognize that cold, hard andugly floors could be covered up with something soft and more attractive.Others prefer wall-to-wall carpeting because they want to walk around the house barefoot, have their young children crawl on their hands and knees, and give the entire house a unified look and feel. This is often an inexpensive option when compared against the cost of individual rugs.
The same is true for an information infrastructure. For some, they prefer the "area rug" approach: this style ofstorage for their email, this other type of storage for their databases, and perhaps a third for their unstructuredfile systems. When customers ask what storage would I recommend for their SAP application, or their Microsoft Exchangeemail environment, or their Business Intelligence (BI) software, I recognize they are taking this "area rug" approach.
Like area rugs, having different storage can focus on specific attributes of the workload characteristics. It alsoinsulates against company-wide changes, the dreaded "rip-and-replace" of replacing all of your storage with somethingfrom a different vendor. With "area rug" storage, you can support a dual-vendor or multi-vendor strategy, and upgrade or replace each on its own schedule.
Thanks to open standards and industry-standard benchmarks, changing out one storage solution for another is assimple as rolling up an area rug, and putting another one in its place that is similar in size dimensions.
|Others may prefer "wall-to-wall carpeting" approach: one disk system type, one tape library type,one network type, that provides unified management and minimizes the needs for unique skills. Generally, the choice of NAS, SAN or iSCSI infrastrucutre is done company-wide, and might strongly influence the set of products that will support that decision. For example, those with a mix of mainframe and distributed servers looking for SAN-attached storage may look at an [IBM System Storage DS8000] and [TS3500 tape library] that can provide support for FICON and FCP.|
Those looking at NAS or iSCSI might consider the IBM System Storage N series products, "unified storage" supporting iSCSI, FCP and NAS protocols. If you want the "wall-to-wall" to stretch across all the sites in your globally integrated enterprise, IBM's scalable NAS product, Scale-Out File Services[SoFS], provides a global name spacein combination with a clustered file system that provides incredible scalability and performance based on field-proven technology used by the majority of the [Top 100 supercomputer] deployments.
IBM can help you design an information infrastructure that fits either approach.
technorati tags: IBM, DS8000, TS3500, NAS, SAN, iSCSI, FCP, FICON, mainframe, distributed, SoFS, supercomputer
Continuing my week in Chicago, for the IBM Storage Symposium 2008, I attended two presentations on XIV.
- XIV Storage - Best Practices
Izhar Sharon, IBM Technical Sales Specialist for XIV, presented best practices using XIV in various environments.He started out explaining the innovative XIV architecture: a SATA-based disk system from IBM can outperformFC-based disk systems from other vendors using massive parallelism. He used a sports analogy:
"The men's world record for running 800 meters was set in 1997 by Wilson Kipketer of Denmark in a time of 1:41.11.
However, if you have eight men running, 100 meters each, they will all cross the finish line in about 10 seconds."
Since XIV is already self-tuning, what kind of best practices are left to present? Izhar presented best practicesfor software, hosts, switches and storage virtualization products that attach to the XIV. Here's some quickpoints:
- Use as many paths as possible.
IBM does not require you to purchase and install multipathing software as other competitors might. Instead, theXIV relies on multipathing capabilities inherent to each operating system.For multipathing preference, choose Round-Robin, which is now available onAIX and VMware vSphere 4.0, for example. Otherwise, fixed-path is preferred over most-recently-used (MRU).
- Encourage parallel I/O requests.
XIV architecture does not subscribe to the outdated notion of a "global cache". Instead, the cache is distributed across the modules, to reduce performance bottlenecks. Each HBA on the XIV can handle about 1400requests. If you have fewer than 1400 hosts attached to the XIV, you can further increase parallel I/O requests by specifying a large queue depth in the host bus adapter (HBA).An HBA queue depth of 64 is a good start. Additional settings mightbe required in the BIOS, operating system or application for multiple threads and processes.
For sequential workloads, select host stripe size less than 1MB. For random, select host stripe size larger than 1MB. Set rr_min_io between ten(10) and the queue depth(typically 64), setting it to half of the queue depth is a good starting point.
If you have long-running batch jobs, consider breaking them up into smaller steps and run in parallel.
- Define fewer, larger LUNs
Generally, you no longer need to define many small LUNs, a practice that was often required on traditionaldisk systems. This means that you can now define just 1 or 2 LUNs per application, and greatly simplifymanagement. If your application must have multiple LUNs in order to do multiple threads or concurrent I/O requests, then, by all means, define multiple LUNs.
Modern Data Base Management Systems (DBMS) like DB2 and Oracle already parallelize their I/O requests, sothere is no need for host-based striping across many logical volumes. XIV already stripes the data for you.If you use Oracle Automated Storage Management (ASM), use 8MB to 16MB extent sizes for optimal performance.
For those virtualizing XIV with SAN Volume Controller (SVC), define manage disks as 1632GB LUNs, in multiple of six LUNs per managed disk group (MDG), to balance across the six interface modules. Define SVC extent size to 1GB.
XIV is ideal for VMware. Create big LUNs for your VMFS that you can access via FCP or iSCSI.
- Organize data to simplify Snapshots.
You no longer need to separate logs from databases for performance reasons. However, for some backup productslike IBM Tivoli Storage Manager (TSM) for Advanced Copy Services (ACS), you might want to keep them separatefor snapshot reasons. Gernally, putting all data for an application on one big LUNgreatly simplifies administration and snapshot processing, without losing performance.If you define multiple LUNs for an application, simply put them into the same "consistencygroup" so that they are all snapshot together.
OS boot image disks can be snapshot before applying any patches, updates or application software, so that ifthere are any problems, you can reboot to the previous image.
- Employ sizing tools to plan for capacity and performance.
The SAP Quicksizer tool can be used for new SAP deployments, employing either the user-based orthroughput-based sizing model approach. The result is in mythical unit called "SAPS", which represents0.4 IOPS for ERP/OLTP workloads, and 0.6 IOPS for BI/BW and OLAP workloads.
If you already have SAP or other applications running, use actual I/O measurements. IBM Business Partners and field technical sales specialists have an updated version of Disk Magic that can help size XIV configurations fromPERFMON and iostat figures.
- XIV Performance
Lee La Frese, IBM STSM for Enteprise Storage Performance Engineering, presented internal lab test results forthe XIV under various workloads, based on the latest hardware/software levels [announced two weeks ago]. Three workloadswere tested:
The results were quite impressive. There was more than enough performance for tier 2 application workloads,and most tier 1 applications. The performance was nearly linear from the smallest 6-module to the largest 15-module configuration. Some key points:
- Web 2.0 (80/20/40) - 80 percent READ, 20 percent WRITE, 40 percent cache hits for READ.YouTube, FlickR, and the growing list at [GoWeb20] are applications with heavy read activity, but because of[long-tail effects], may not be as cache friendly.
- Social Networking (50/50/50) - 50 percent READ, 50 percent WRITE, 50 percent cache hits for READ.Lotus Connections, Microsoft Sharepoint, and many other [social networking] usage are more write intensive.
- Database (70/30/50) - 70 percent READ, 30 percent WRITE, 50 percent cache hits for READ.The traditional workload characteristics for most business applications, especially databases like DB2 andOracle on Linux, UNIX and Windows servers.
- A full 15-module XIV overwhelms a single SVC 8F4 node-pair. For a full XIV, consider 4 to 8 nodes 8F4 models, or 2 to 4 nodes of an 8G4. For read-intensive cache-friendly workloads, an SVC in front of XIV was able to deliver over 300,000 IOPS.
- A single node TS7650G ProtecTIER can handle 6 to 9 XIV modules. Two nodes of TS7650G were needed to drivea full 15-module XIV. A single node TS7650 in front of XIV was able to ingest 680 MB/sec on the seventh day with17 percent per-day change rate test workload using 64 virtual drives. Reading the data back got over 950 MB/sec.
- For SAP environments where response time 20-30 msec are acceptable, the 15-module XIV delivered over 60,000 IOPS. Reducing this down to 25,000-30,000 cut the msec response time to a faster 10-15 msec.
These were all done as internal lab tests. Your mileage may vary.
Not surprisingly, XIV was quite the popular topic here this week at the Storage Symposium. There were many moresessions, but these were the only two that I attended.
technorati tags: IBM, XIV, SATA, best practices, performance, Wilson Kipketer, massive parallelism, HBA, DBMS, Oracle, ASM, DB2, SVC, VMware, VMFS, TSM, Tivoli, SAP, Quicksizer, SAPS, PERFMON, iostat, Disk+Magic, TS7650G, ProtecTIER
Well, it's Tuesday again, and that means more IBM announcements!
Today, IBM announced the enhanced IBM System Storage DS3200 disk system.It is in our DS3000 series, the DS3200 is SAS-attach, DS3300 is iSCSI-attach, and DS3400 is FC-attach. All of them support up to 48 drives, which can be a mix of SAS and SATA drives.
The DS3200 supports the following operating environments (see IBM's [Interop Matrix] for details):
- Microsoft Windows
- Linux (both Linux-x86 and Linux on POWER)
- Sun Solaris
- Novell NetWare
With today's announcements, the DS3200 can be used to boot from, as well as contain data. This is ideal to combine with IBM BladeCenter. With the IBM BladeCenter you can have 14 blades, either x86 or POWER based processors, attached to a DS3200 via SAS switch modules in the back of the chassis.
Let's take an example of how this can be used for a Scale-Out File Services[SoFS] deployment.
First, we start with servers. We can have either three [IBM System x3650] servers, but this would use up all six of the direct-attach ports. Instead, we'll choose the [BladeCenter H chassis], with three HS21 blades for SoFS, and that leaves us with eleven empty blade slots we could put in a management node, or other blades to run applications.
- SAS connectivity modules
The IBM BladeCenter [SAS Connectivity Module] allows the blade servers to connect to a DS3200. Two of them fit right in the back of the BladeCenter chassis, providing full redundancy without consuming additional rack space.
- DS3200 and EXP3000 expansion drawers
We'll have one DS3200 controller with twelve internal drives, and three expansion EXP3000 drawers with twelve drives each, for a total of 48 drives. Using 1TB SATA, this would be 48 TB raw capacity.
The end result? You get a 48TB NAS scalable storage solution, supporting up to 7500 concurrent CIFS and NFS users, with up to 700 MB/sec with large block transfers. By using BladeCenter, you can expand performance by adding more blades to the Chassis, or have some blades running SAP or Oracle RAC have direct read/write access to the SoFS data.
Just another example on how IBM can bring together all the components of a solution to provide customer value!
technorati tags: IBM, DS3200, BladeCenter, Linux, AIX, Windows, Solaris, VMware, NetWare, POWER, SAS, EXP3000, SATA, CIFS, NFS, SoFS
Well, it's Tuesday, and that means IBM announcements! Today is bigger, as there are a lot of Dynamic Infrastructure announcements throughout the company with a common theme, cloud computing and smart business systems that support the new way of doing things. Today, IBM announced its new "IBM Smart Archive" strategy that integrates software, storage, servers and services into solutions that help meet the challenges of today and tomorrow. IBM has been spending the past few years working across its various divisions and acquisitions to ensure that our clients have complete end-to-end solutions.
IBM is introducing new "Smart Business Systems" that can be used on-premises for private-cloud configurations, as well as by cloud-computing companies to offer IT as a service.
IBM [Information Archive
] is the first to be unveiled, a disk-only or blended disk-and-tape Information Infrastructure solution that offers a "unified storage" approach with amazing flexibility for dealing with various archive requirements:
- For those with applications using the IBM Tivoli Storage Manager (TSM) or IBM System Storage Archive Manager (SSAM) API of the IBM System Storage DR550 data retention solution, the Information Archive will provide a direct migration, supporting this API for existing applications.
- For those with IBM N series using SnapLock or the File System Gateway of the DR550, the Information Archive will support various NAS protocols, deployed in stages, including NFS, CIFS, HTTP and FTP access, with Non-Erasable, Non-Rewriteable (NENR) enforcement that are compatible with current IBM N series SnapLock usage.
- For those using NAS devices with PACS applications to store X-rays and other medical images, the Information Archive will provide similar NAS protocol interfaces. Information Archive will support both read-only data such as X-rays, as well as read/write data such as Electronic Medical Records.
Information Archive is not just for compliance data that was previously sent to WORM optical media. Instead, it can handle all kinds of data, rewriteable data, read-only data, and data that needs to be locked down for tamper protection. It can handle structured databases, emails, videos and unstructured files, as well as objects stored through the SSAM API.
The Information Archive has all the server, storage and software integrated together into a single machine type/model number. It is based on IBM's General Parallel File System (GPFS) to provide incredible scalability, the same clustered file system used by many of the top 500 supercomputers. Initially, Information Archive will support up to 304TB raw capacity of disk and Petabytes of tape. You can read the [Spec Sheet
] for other technical details.
For those who prefer a more "customized" approach, similar to IBM Scale-Out File Services (SoFS), IBM has [Smart Business Storage Cloud
]. IBM Global Services can customize a solution that is best for you, using many of the same technologies. In fact, IBM Global Services announced a variety of new cloud-computing services to help enterprises determine the best approach.
In a related announcement, IBM announced [LotusLive iNotes
], which you can think of as a "business-ready" version of Google's GoogleApps, Gmail and GoogleCalendar. IBM is focused on security and reliability but leaves out the advertising and data mining that people have been forced to tolerate from consumer-oriented Web 2.0-based solutions. IBM's clients that are already familiar with on-premises version of Lotus Notes will have no trouble using LotusLive iNotes.
There was actually a lot more announced today, which I will try to get to in later posts.
technorati tags: IBM, Dynamic Infrastructure, Smart Archive, Information Archive, Information Infrastructure, TSM, SSAM, WORM, NENR, DR550, GMAS, N series, SnapLock, compliance, disk, tape, storage, GPFS, LotusLive, iNotes, SoFS, Google, GoogleApps, Gmail, GoogleCalendar
Earlier this week, EMC announced its Symmetrix V-Max, following two trends in the industry:
- Using Roman numerals. The "V" here is for FIVE, as this is the successor to the DMX-3 and DMX-4. EMC might have gotten the idea from IBM's success with the XIV (which does refer to the number 14, specifically the 14th class of a Talpiot program in Israel that the founders of XIV graduated from).
- Adding "-Max", "-Monkey" or "2.0" at the end of things to make them sound more cool and to appeal to a younger, hipper audience. EMC might have gotten this idea from Pepsi-Max (... a taste of storage for the next generation?)
I took a cue from President Obama and waited a few days to collect my thoughts and do my homework before responding.Special thanks to fellow blogger ChuckH in giving me a [handy list of reactions] for me to pick and choose from. It appears that EMC marketing machine feels it is acceptable for their own folks to claim that EMC is doing something first, or that others are catching up to EMC, but when other vendors do likewise, then that is just pathetic or incoherent. Here are a few reactions already from fellow bloggers:
This was a major announcement for EMC, addressing many of the problems, flaws and weaknesses of the earlier DMX-3 and DMX-4 deliverables. Here's my read on this:
- More ports
Now you can have as many FCP ports (128) as an IBM System Storage DS8300, although the maximum number of FICON ports is still short, and no mention of ESCON support. The Ethernet ports appear to be 1Gb, not the new 10GbE you might expect.
- Support for System z mainframe
V-Max adds some new support to catch up with the DS8000, like Extended Address Volumes (EAV). EMC is still not quite there yet. IBM DS8000 continues to be the best, most feature-rich storage option if you have System z mainframe servers.
- Improved Performance
Both the IBM DS8000 and HDS USP-V beat the DMX-4 in performance, and in some cases the DMX-4 even lost to the IBM XIV, so EMC had to do something about it. EMC chooses not to participate in industry-standard performance benchmarks like those from the [Storage Performance Council], which limits them to vague comparisons against older EMC gear. I'll give EMC engineers the benefit of the doubt and say that now V-Max is now "comparably as fast as HDS and IBM offerings".
- Getting "V" in the name
The "V" appears to be for the roman number five, not to be confused with external heterogeneous storage virtualization that HDS USP-V and IBM SVC provide. There is no mention of synergy with EMC's failed "Invista" product, and I see no support for attaching other vendors disk to the back of this thing.
- Switch to Intel processor
Apple switched its computers from PowerPC to Intel-based, and now EMC follows in the same path. There are some custom ASICs still in V-Max, so it is not as pure as IBM's offerings.
- Modular, XIV-like Scale-out Architecture
Actually, the packaging appears to follow the familiar system bays and storage bays of the DMX-4 and DMX-4 950 models, but architecturally offers XIV-like attachment across a common switch network between "engines", EMC's term for interface modules.
- Non-disruptive data migration
IBM's SoFS, DR550 and GMAS have this already, as does as anything connected behind an IBM SAN Volume Controller.
A long time ago, IBM used to have midrange disk storage systems called "FAStT" which stood for Fibre Array Storage Technology, so this might have given EMC the idea for their "Fully Automated Storage Tiering" acronym. The concept appears similar to what IBM introduced back in 2007 for the Scale-Out-File Services [SofS] which not only provides policy-based placement, movement and expiration on different disk tiers, includes tape tiers as well for a complete solution. I don't see anything in the V-Max announcement that it will support tape anytime soon.
And what ever happend to EMC's Atmos? Wasn't that supposed to be EMC's new direction in storage?
- Zero-data loss Three-site replication
IBM already calls this Metro/Global Mirror for its IBM DS8000 series, but EMC chose to call it SRDF/EDP for Extended Distance Protection.
- Ease of Use
The most significant part of the announcement is that EMC is finally focusing on ease-of-use.In addition to reducing the requirement for "Bin File" modifications, this box has a redesigned user interface to focus on usability issues. For past DMX models, EMC customers had to either hire EMC to do tasks for them that were just to difficult otherwise, or buy expensive software like their EMC Control Center to manage. EMC willcontinue to sell DMX-4 boxes for a while, as they are probably supply-constrained on the V-Max side, but I doubt they will retro-fit these new features back to DMX-3 and DMX-4.
When IBM announced its acquisition of XIV over a year ago now, customers were knocking down our doors to get one. This caught two particular groups looking like a [deer in headlights]:
- EMC Symmetrix sales force: Some of the smarter ones left EMC to go sell IBM XIV, leaving EMC short-handed and having to announce they [were hiring during their layoffs]. Obviously, a few of the smart ones stayed behind, to convince their management to build something like the V-Max.
- IBM DS8000 sales force: If clients are not happy with their existing EMC Symmetrix, why don't they just buy an IBM DS8000 instead? What does XIV have that DS8000 doesn't?
Let me contrast this with the situation Microsoft Windows is currently facing.
|I am often asked by friends to help them pick out laptops and personal computers. I use Linux, Windows and MacOS, so have personal experience with all three operating systems.|
Linux is cheaper, offers the power-user the most options for supporting older, less-powerfulequipment, but I wouldn't have my Mom use it. While distributions like Ubuntu are makinggreat strides, it is just too difficult for some people.
MacOS is nice, I like it, it works out of the box with little or no customization and an intuitive interface. However, some of my friends don't make IBM-level salaries, and have to watch their budget.
In their "I'm a PC" campaign, Microsoft is fighting both fronts. Let's examine two commercials:
- In the first commercial, a young eight-year-old puts together a video from pictures oftoy animals and some background music.The message: "Windows is easier to use than Linux!" If they really wanted to send this message, they should have shown senior citizens instead.
- In the second commercial, a young college student is asked to find a laptop with 17 inchscreen, and a variety of other qualifications, for under $1000 US dollars. The only modelat the Apple store below this price had a 13 inch screen, but she finds a Windows-based system that had this size screen and met all the other qualifications. The message: "Windows-based hardware from a variety of competitors are less expensive than hardware from Apple!"
Both Microsoft and Apple charge a premium for ease-of-use.In the storage world, things are completely opposite. Vendors don't charge a premium forease-of-use. In fact, some of the easiest to use are also the least expensive.
- If you just have Windows and Linux, you can get some entry level system likethe IBM DS3000 series, only a few features, and can be set up in six simple steps.
- Next, if you have a more interesting mix of operating systems, Linux, Windows and some flavorsof UNIX like IBM AIX, HP-UX or Sun Solaris, then you might want the features and functionsof more pricier midrange offerings. More options means that configuration and deploymentis more difficult, however.
- Finally, if you are serious Fortune 500 company, running your mission critical applications on System z or System i centralized systems in a big data center, that you might be willing to pay top dollar for the most feature-rich offerings of an Enterprise-class machine.Thankfully you have an army of highly-trained staff to handle the highest levels of complexity.
IBM's DS8000, HDS USP-V and EMC's Symmetrix are the key players in the Enterprise-classspace. They tried to be ["all things to all people"], er.. perhaps all things to allplatforms. All of the features and functions came at a price, not just in dollars, butin complexity and difficulty. You needed highly skilled storage admins using expensive storage management software, or be willing to hirethe storage vendor's premium services to get the job done.
Alan Cooper wrote a book that would change that mindset, titled[The Inmates Are Running the Asylum: Why High-Tech Products Drive Us Crazy and How to Restore the Sanity]. This book showed why the situation had gotten out ofhand, and to demand better ease-of-use from all vendors. It was required reading for anyone working in or with IBM's "User Centered Design" teams.I highly recommend this book.
IBM recognized this trend early. IBM's SVC, N series and now XIV all offer ease-of-use withenterprise-class features and functions, at lower total cost of ownership than traditional enterprise-class systems. IBM is not the only one, of course, as smaller storage start-ups like 3PAR,Pillar Data Systems, Compellent, and to some extent Dell's EqualLogic all recognized thisand developed clever offerings as well.
While IBM's XIV may not have been the first to introduce a modular, scale-out architectureusing commodity parts managed by sophisticated ease-of-use interfaces, its success might have been the kick-in-the-butt EMC needed to follow the rest of the industry in this direction.
technorati tags: IBM, DS8000, XIV, SVC, EMC, Symmetrix, V-Max, DMX-3, DMX-4, Chuck Hollis, HDS, USP-V, FCP, FICON, ESCON, storage+virtualization, FAStT
Fellow Blogger BarryB mentions "chunk size" in his post [Blinded by the light
],as it relates to Symmetrix Virtual Provisioning capability. Here is an excerpt:
I mean, seriously, who else but someone who's already implemented thin provisioning would really understand the implications of "chunk" size enough to care?
For those of you who don't know what the heck "chunk size" means (now listen up you folks over at IBM who have yet to implement thin provisioning on your own storage products), a "chunk" is the term used (and I think even trademarked by 3PAR) to refer to the unit of actual storage capacity that is assigned to a thin device when it receives a write to a previously unallocated region of the device.For reference, Hitachi USP-V uses I think a 42MB chunk, XIV NEXTRA is definitely 1MB, and 3PAR uses 16K or 256K (depending upon how you look at it).
Thin Provisioning currently offered in IBM System Storage N serieswas technically "implemented" by NetApp, and that the Thin Provisioning that will be offered in our IBM XIV Nextrasystems will have been acquired from XIV. Lest I remind you that many of EMC's products were developed by other companies first, then later acquired by EMC, so no need for you to throw rocks from your glass houses in Hopkington.
"Thin provisioning" was first introduced by StorageTek in the 1990's and sold by IBM under the name of RAMAC Virtual Array (RVA). An alternative approach is "Dynamic Volume Expansion" (DVE). Rather than giving the host application a huge 2TB LUN but actually only use 50GB for data, DVE was based on the idea that you only give out 50GB they need now, but could expand in place as more space was required. This was specifically designed to avoid the biggest problem with "Thin Provisioning" which back then was called "Net Capacity Load" on the IBM RVA, but today is now referred to as "over-subscription". It gave Storage Administrators greater control over their environment with no surprises.
In the same manner as Thin Provisioning, DVE requires a "chunk size" to work with. Let's take a look:
- DS4000 series
On the DS4000 series, we use the term "segment size", and indicate that the choice of a segment size can have some influence on performance in both IOPS and throughput. Smaller segment sizes increase the request rate (IOPS) by allowing multiple disk drives to respond to multiple requests. Large segment sizes increase the data transfer rate(Mbps) by allowing multiple disk drives to participate in one I/O request. The segment size does not actually change what is stored in cache, just what is stored on the disk itself.It turns out in practice there is no advantage in using smaller sizes with RAID 1; only in a few instances does this help with RAID-5 if you can writea full stripe at once to calculate parity on outgoing data. For most business workloads, 64KB or 128KB are recommended. DVE expands by the same number of segments across all disks in the RAID rank, so for example in a 12+P rank using 128KB segment sizes, the chunk size would be thirteen segments, about 1.6MB in size.
- SAN Volume Controller
On the SAN Volume Controller, we call this "extent size" and allow it to be various values 64MB to 512MB. Initially,IBM only managed four million extents, so this table was used to explain the maximum amount that could be managedby an SVC system (up to 8 nodes) depending on extent size selected.
|Extent Size||Maximum Addressable|
IBM thought that since we externalized "segment size" on the DS4000, we should do the same for the SANVolume Controller. As it turned out, SVC is so fast up in the cache, that we could not measure any noticeable performance difference based on extent size. We did have a few problems. First, clients who chose 16MB andthen grew beyond the 64TB maximum addressable discovered that perhaps they should have chosen something larger.Second, clients called in our help desk to ask what size to choose and how to determine the size that was rightfor them. Third, we allowed people to choose different extent sizes per managed disk group, but that preventsmovement or copies between groups. You can only copy between groups that use the same extent size. The generalrecommendation now is to specify 256MB size, and use that for all managed disk groups across the data center.
The latest SVC expanded maximum addressability to 8PB, still more than most people have today in their shops.
- DS8000 series
Getting smarter each time we introduce new function, we chose 1GB chunks for the DS8000. Based on a mainframebackground, most CKD volumes are 3GB, 9GB, or 27GB in size, and so 1GB chunks simplified this approach. Spreadingthese 1GB chunks across multiple RAID ranks greatly reduced hot-spots that afflict other RAID-based systems.(Rather than fix the problem by re-designing the architecture, EMC will offer to sell you software to help you manually move data around inside the Symmetrix after the hot-spot is identified)
Unlike EMC's virtual positioning, IBM DS8000 dynamic volume expansion does work on CKD volumes for our System z mainframe customers.
The trade-off in each case was between granularity and table space. Smaller chunks allow finer control on the exact amount allocated for a LUN or volume, but larger chunks reduced the number of chunks managed. With our advanced caching algorithms, changes in chunk size did not noticeably impact performance. It is best just to come up with a convenient size, and either configure it as fixed in the architecture, or externalize it as a parameter with a good default value.
Meanwhile, back at EMC, BarryB indicates that they haven't determined the "optimal" chunk size for their newfunction. They plan to run tests and experiments to determine which size offers the best performance, and thenmake that a fixed value configured into the DMX-4. I find this funny coming from the same EMC that won't participate in [standardized SPC benchmarks] because they feel that performance is a personal and private matter between a customer and their trusted storage vendor, that all workloads are different, and you get the idea. Here's another excerpt:
Back at the office, they've taking to calling these "chunks" Thin Device Extents (note the linkage back to EMC's mainframe roots), and the big secret about the actual Extent size is...(wait for it...w.a.i.t...for....it...)...the engineers haven't decided yet!
That's right...being the smart bunch they are, they have implemented Symmetrix Virtual Provisioning in a manner that allows the Extent size to be configured so that they can test the impact on performance and utilization of different sizes with different applications, file systems and databases. Of course, they will choose the optimal setting before the product ships, but until then, there will be a lot of modeling, simulation, and real-world testing to ensure the setting is "optimal."
Finally, BarryB wraps up this section poking fun at the chunk sizes chosen by other disk manufacturers. I don't knowwhy HDS chose 42MB for their chunk size, but it has a great[Hitchiker's Guide to the Galaxy]sound to it, answering the ultimate question to life, the universe and everything. Hitachi probably went to theirDeep Thought computer and asked how big should their "chunk size" be for their USP-V, and the computer said: 42.Makes sense to me.
I have to agree that anything smaller than 1MB is probably too small. Here's the last excerpt:
Now, many customers and analysts I've spoken to have in fact noted that Hitachi's "chunk" size is almost ridiculously large; others have suggested that 3PAR's chunks are so small as to create performance problems (I've seen data that supports that theory, by the way).
Well, here's the thing: the "right" chunk size is extremely dependent upon the internal architecture of the implementation, and the intersection of that ideal with the actual write distribution pattern of the host/application/file system/database.
So my suggestion to EMC is, please, please, please take as much time as you need to come up with the perfect"chunk size" for this, one that handles all workloads across a variety of operating systems and applications, from solid-state Flash drives to 1TB SATA disk. Take months or years, as long as it takes. The rest of the world is in no hurry, as thin provisioning or dynamic volume expansion is readily available on most other disk systems today.
Maybe if you ask HDS nicely, they might let you ask their computer.
technorati tags: IBM, thin provisioning, XIV, Nextra, N series, chunk size, BarryB, EMC, Symmetrix, virtual provisioning, 3PAR, Hitachi, HDS, USP-V, StorageTek, RAMAC Virtual Array, RVA, dynamic volume expansion, DVE, 42MB, Hitchhiker's Guide, CKD, System z, mainframe, SATA, DS8000, DS4000, SAN Volume Controller, SVC
Continuing my catch-up on past posts, Jon Toigo on his DrunkenData
blog, posted a ["bleg"
] for information aboutdeduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM'sview. (Jon, as always, you have my permission to post this on your blog!)
- Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTier software. Larry Freeman from NetApp already explains A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
- Data agnostic. Diligent does not require content-awareness, format-awareness nor identification of backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
- Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
- Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
- Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
- InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which has the best opportunity for deduplication benefits. The two main benefits are:
- Keeping more backup data available online for fast recovery.
- Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
- Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that their’s is best because it collapses two functions — de-dupe then ingest — into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
- Will this meet my current business requirements?
- Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
- What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine thenecessary ingest rate. How many "Terabytes per Hour" (TB/h) must be received, processed and stored from the backup software during the backup window. IBM intends to document its performance test results of specific software/hardwarecombinations to provide guidance to clients' purchase and planning decisions.
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
- Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what already is stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from2KB to 16MB in size.
- De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
- De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of[non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return back a "bit-perfect" copy of the data that was written. Thereare laws against changing the format. For example, an original document was in Microsoft Word format, but is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate the bit-perfect copy. Certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code calculation as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance for collisions might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, that does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing block stored.
- Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read thedata off the back-end disk, using a variety of forensic tools, and piece together strings of personal information,such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working withopen industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems.Until then, clients concerned about encryption should consider OS-based or application-based encryption from thebackup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBMDiligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
- Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are in-tact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge fromthis set was unreadable or misplaced, it might impact the access to many TBs of data, or render the entire systemunusable.
In the case of a 1PB of back-end disk for IBM Diligent, you would be having to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives in parallel, could take you several days for the complete process.This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
- Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supportedby Analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
- De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larrysuggests, MS Exchange and SharePoint could be good use cases that represent the possible savings for squeezing outduplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with “update in place” applications, that is high levels of random writes for non-appending updates. OLTP and Database workloads would not benefit from deduplication.
- Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced on software first, and then as implementations mature, get hardware-based to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performsLZH compression through software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
technorati tags: IBM, Diligent, Jon Toigo, DrunkenData, bleg, deduplication, A-SIS, NetApp, ProtecTier, inline, post-process, back-end, disk, data integrity, hash, collision, ingest rate, VTL, non-repudiation, extent, bit-perfect, Microsoft Word, Adobe PDF, diff, Black Hat, encryption, compression, Hifn, FC, SATA
Continuing my week in Chicago for the IBM Storage and Storage Networking Symposium and System x and BladeCenter Technical Conference, I presented a variety of topics.
- Hybrid Storage for a Green Data Center
The cost of power and cooling has risen to be a #1 concern among data centers. I presented the following hybrid storage solutions that combine disk with tape. These provide the best of both worlds, the high performance access time of disk with the lower costs and reduced energy consumption of tape.
- IBM [System Storage DR550] - IBM's Non-erasable, Non-rewriteable (NENR) storage for archive and compliance data retention
- IBM Grid Medical Archive Solution [GMAS] - IBM's multi-site grid storage for PACS applications and electronic medical records[EMR]
- IBM Scale-out File Services [SoFS] - IBM's scalable NAS solution that combines a global name space with a clustered GPFS file system, serving as the ideal basis for IBM's own[Cloud Computing and Storage] offerings
Not only do these help reduce energy costs, they provide an overall lower total cost of ownership (TCO) thantraditional WORM optical or disk-only storage configurations.
- The Convergence of Networks - Understanding SAN, NAS and iSCSI in the Data Center Network
This turned out to be my most popular session. Many companies are at a crossroads in choosing data and storage networking solutions in light of recent announcements from IBM and others. In the span of 75 minutes, I covered:
- Block storage concepts, storage virtualization and RAID levels
- File system concepts, how file systems map files to block storage
- Network Attach Storage, the history of the NFS and CIFS protocols, Pros and Cons of using NAS
- Storage Area Networks, the history of SAN protocols including ESCON, FICON and FCP, Pros and Cons of using SAN
- IP SAN technologies, iSCSI and Fibre Channel over Ethernet (FCoE), Pros and Cons of using this approach
- Network Convergence with Infiniband and Fibre Channel over Convergence Enhanced Ethernet (FCoCEE), why Infiniband was not adopted historically in the marketplace as a storage protocol, and the features and enhancements of Convergence Enhanced Ethernet (CEE) needed to merge NAS, SAN and iSCSI traffic onto a single converged data center network [DCN]
Yes, it was a lot of information to cover, but I managed to get it done on time.
- IBM Tivoli Storage Productivity Center version 4.1 Overview and Update
In conferences like these, there are two types of product-level presentations. An "Overview" explains howproducts work today to those who are not familiar with it. An "Update" explains what's new in this version of the product for those who are already familiar with previous releases. I decided to combine these into one sessionfor IBM's new version of [Tivoli Storage Productivity Center].I was one of the original lead architects of this product many years ago, and was able to share many personalexperiences about its evolution in development and in the field at client facilities.Analysts have repeatedly rated IBM Productivity Center as one of the top Storage Resource Management (SRM) tools available in the marketplace.
- Information Lifecycle Management (ILM) Overview
Can you believe I have been doing ILM since 1986? I was the lead architect for DFSMS which provides ILM support for z/OS mainframes. In 2003-2005, I spent 18 months in the field performingILM assessments for clients, and now there are dozens of IBM practitioners in Global Technology Services andSTG Lab Services that do this full time. This is a topic I cover frequently at the IBM Executive Briefing Center[EBC], because it addressesseveral top business challenges:
- Reducing costs and simplifying management
- Improving efficiency of personnel and application workloads
- Managing risks and regulatory compliance
IBM has a solution based on five "entry points". The advantage of this approach is that it allows our consultants to craft the right solution to meet the specific requirements of each client situation. These entry points are:
- Enterprise Content Management [ECM]
- Tiered Information Infrastructure - we don't limit ourselves to just "Tiered Storage" as storage is only part of a complete[information infrastructure] of servers,networks and storage
- Storage Optimization and Virtualization - including virtual disk, virtual tape and virtual file solutions
- Process Enhancement and Automation - an important part of ILM are the policies and procedures, such as IT Infrastructure Library [ITIL] best practices
- Archive and Retention - space management and data retention solutions for email, database and file systems
I did not get as many attendees as I had hoped for this last one, as I was competing head-to-head in the same time slot as Lee La Frese covering IBM's DS8000 performance with Solid State Disk (SSD) drives, John Sing covering Cloud Computing and Storage with SoFS, and Eric Kern covering IBM Cloudburst.
I am glad that I was able to make all of my presentations at the beginning of the week, so that I can then sit back and enjoy the rest of the sessions as a pure attendee.
technorati tags: IBM, Symp09, storage symposium, hybrid storage, DR550, NENR, WORM, GMAS, SoFS, PACS, EMR, NAS, GPFS, SAN, iSCSI, FCoE, FCoCEE, CEE, DCN, TCO, RAID, ESCON, FICON, Infiniband, Tivoli, Productivity Center, ILM, virtualization, ITIL, DS8000, SSD, Cloudburst, Information Infrastructure
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], I presented a session on Storage for the Green Data Center, and attended a System x session on Greening the Data Center. Since they were related, I thought I would cover both in this post.
- Storage for the Green Data Center
I presented this topic in four general categories:
- Drivers and Metrics - I explained the three key drivers for consuming less energy, and the two key metrics: Power Usage Effectiveness (PUE) and Data Center Infrastructure Efficiency (DCiE).
- Storage Technologies - I compared the four key storage media types: Solid State Drives (SSD), high-speed (15K RPM) FC and SAS hard disk, slower (7200 RPM) SATA disk, and tape. I had comparison slides that showed how IBM disk was more energy efficient than competition, for example DS8700 consumes less energy than EMC Symmetrix when compared with the exact same number and type of physical drives. Likewise, IBM LTO-5 and TS1130 tape drives consume less energy than comparable HP or Oracle/Sun tape drives.
- Integrated Systems - IBM combines multiple storage tiers in a set of integrated systems managed by smart software. For example, the IBM DS8700 offers [Easy Tier] to offer smart data placement and movement across Solid-State drives and spinning disk. I also covered several blended disk-and-tape solutions, such as the Information Archive and SONAS.
- Actions and Next Steps - I wrapped up the talk with actions that data center managers can take to help them be more energy efficient, from deploying the IBM Rear Door Heat Exchanger, or improving the management of their data.
- Greening of the Data Center
Janet Beaver, IBM Senior Manager of Americas Group facilities for Infrastructure and Facilities, presented on IBM's success in becoming more energy efficient. The price of electricity has gone up 10 percent per year, and in some locations, 30 percent. For every 1 Watt used by IT equipment, there are an additional 27 Watts for power, cooling and other uses to keep the IT equipment comfortable. At IBM, data centers represent only 6 percent of total floor space, but 45 percent of all energy consumption. Janet covered two specific data centers, Boulder and Raleigh.
At Boulder, IBM keeps 48 hours reserve of gasoline (to generate electricity in case of outage from the power company) and 48 hours of chilled water. Many power outages are less than 10 minutes, which can easily be handled by the UPS systems. At least 25 percent of the Computer Room Air Conditioners (CRAC) are also on UPS as well, so that there is some cooling during those minutes, within the ASHRAE guidelines of 72-80 degrees Fahrenheit. Since gasoline gets stale, IBM runs the generators once a month, which serves as a monthly test of the system, and clears out the lines to make room for fresh fuel.
The IBM Boulder data center is the largest in the company: 300,000 square feet (the equivalent of five football fields)! Because of its location in Colorado, IBM enjoys "free cooling" using outside air temperature 63 percent of the year, resulting in a PUE of 1.3 rating. Electricity is only 4.5 US cents per kWh. The center also uses 1 Million KwH per year of wind energy.
The Raleigh data center is only 100,000 Square feet, with a PUE 1.4 rating. The Raleigh area enjoys 44 percent "free cooling" and electricity costs at 5.7 US cents per kWh. The Leadership in Energy and Environmental Design [LEED] has been updated to certify data centers. The IBM Boulder data center has achieved LEED Silver certification, and IBM Raleigh data center has LEED Gold certification.
Free cooling, electricity costs, and disaster susceptibility are just three of the 25 criteria IBM uses to locate its data centers. In addition to the 7 data centers it manages for its own operations, and 5 data centers for web hosting, IBM manages over 400 data centers of other clients.
It seems that Green IT initiatives are more important to the storage-oriented attendees than the x86-oriented folks. I suspect that is because many System x servers are deployed in small and medium businesses that do not have data centers, per se.
technorati tags: IBM, Technical University, Green Data Center, PUE, DCiE, Free Cooling, ASHRAE, LEED, SSD, Disk, Tape, SONAS, Archive
Perhaps I wrapped up my exploration of disk system performance one day too early. (While it is Friday here in Malaysia, it is still only Thursday back home)
Barry Burke, EMC blogger (aka The Storage Anarchist) writes:
Aren't you mixing metrics here?
Miles per Gallon measures an effeciency ratio (amount of work done with a fixed amount of energy), not a speed ratio (distance traveled in a unit of time).
Given that IOPs and MB/s are the unit of "work" a storage array does, wouldn't the MPG equivalent for storage be more like IOPs per Watt or MB/s per Watt? Or maybe just simply Megabytes Stored per Watt (a typical "green" measurement)?
You appear to be intentionally avoiding the comparison of I/Os per Second and Megabytes per Second to Miles Per Hour?
May I ask why?
This is a fair question, Barry, so I will try to address it here.
It was not a typo, I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to explain concepts in Information Technology that might be harder to grasp. I chose MPG because it was closely related to IOPS and MB/s in four ways:
- MPG applies to all instances of a particular make and model. Before Henry Ford and the assembly line, cars were made one at a time, by a small team of craftsmen, and so there could be variety from one instance to another. Today, vehicles and storage systems are mass-produced in a manner that provides consistent quality. You can test one vehicle, and safely assume that all similar instances of the same make and model will have the similar mileage. The same is true for disk systems, test one disk system and you can assume that all others of the same make and model will have similar performance.
MPG has a standardized measurement benchmark that is publicly available. The US Environmental Protection Agency (EPA) is an easy analogy for the Storage Performance Council, providing the results of various offerings to chose from.
MPG has usage-specific benchmarks to reflect real-world conditions.The EPA offers City MPG for the type of driving you do to get to work, and Highway MPG, to reflect the type ofdriving on a cross-country trip. These serve as a direct analogy to SPC having SPC-1 for Online transaction processing (OLTP) and SPC-2 for large file transfers, database queries and video streaming.
MPG can be used for cost/benefit analysis.For example, one could estimate the amount of business value (miles travelled) for the amount of dollar investment (cost to purchase gallons of gasoline, at an assumed gas price). The EPA does this as part of their analysis. This is similar to the way IOPS and MB/s can be divided by the cost of the storage system being tested on SPC benchmark results. The business value of IOPS or MB/s depends on the application, but could relate to the number of transactions processed per hour, the number of music downloads per hour, or number of customer queries handled per hour, all of which can be assigned a specific dollar amount for analysis.
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, butdecided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
- Increasing MPH, and driving anywhere near the maximum rated MPH for a vehicle, can be reckless and dangerous,risking loss of human life and property damage. Even professional race car drivers will agree there are dangers involved. By contrast, processing I/O requests at maximum speed poses no additional risk to the data, nor possibledamage to any of the IT equipment involved.
- While most vehicles have top speeds in excess of 100 miles per hour, most Federal, State and Local speed limits prevent anyone from taking advantage of those maximums. Race-car drivers in NASCAR may be able to take advantage of maximum MPH of a vehicle, the rest of us can't. The government limits speed of vehicles precisely because of the dangers mentioned in the previous bullet. In contrast, processing I/O requests at faster speeds poses no such dangers, so the government poses no limits.
- Neither IOPS nor MB/s match MPH exactly.Earlier this week,I related IOPS to "Questions handled per hour" at the local public library, and MB/s to "Spoken words per minute" in those replies. If I tried to find a metric based on unit type to match the "per second" in IOPS and MB/s, then I would need to find a unit that equated to "I/O requests" or "MB transferred" rather than something related to "distance travelled".
In terms of time-based units, the closest I could come up with for IOPS was acceleration rate of zero-to-sixty MPH in a certain number of seconds. Speeding up to 60MPH, then slamming the breaks, and then back up to 60MPH, start-stop, start-stop, and so on, would reflect what IOPS is doing on a requestby request basis, but nobody drives like this (except maybe the taxi cab drivers here in Malaysia!)
Since vehicles are limited to speed limits in normal road conditions, the closest I could come up with for MB/s would be "passenger-miles per hour", such that high-occupancy vehicles like school buses could deliver more passengers than low-occupancy vehicles with only a few passengers.
Neither start-stops nor passenger-miles per hour have standardized benchmarks, so they don't work well for comparisonbetween vehicles.If you or anyone can come up with a metric that will help explain the relevance of standardized benchmarks better than the MPG that I already used, I would be interested in it.
You also mention, Barry, the term "efficiency" but mileage is about "fuel economy".Wikipedia is quick to point out that the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into fuel economy of cars. The same can be said about the performance of internal bandwidth ofthe backplane between controllers and faster HDD does not necessarily translate to external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s.
Oh, and yes, it's true - the DMX-4 is not the first high-end storage array to ship a 4Gb/s FC back-end. The USP-V, announced way back in May, has that honor (but only if it meets the promised first shipments in July 2007). DMX-4 will be in August '07, so I guess that leaves the DS8000 a distant 3rd.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such highSPC-1 benchmarks despite the fact that it still uses 2Gbps drives inside. Given that it doesn't matter between2Gbps and 4Gbps on the back-end, why would it matter which vendor came first, second or third, and why call it a "distant 3rd" for IBM? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with you're excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole"green initiative" and I used "Watts per GB" last month to explain about how tape is less energy-consumptive than paper.I see on your blog you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear if "requires less" means "slightly less" or "substantially less" in this context, and have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufacturers today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
technorati tags: IBM, EMC, Storage Anarchist, MPG, MPH, IOPS, NASCAR, Malaysia, Watts, GB, green, back-end, DMX-3, DMX-4, HDS, USP, USP-V, SPC, SPC-1, SPC-2, standardized, benchmarks, workload, DS8000, disk, storage, tape
I am still wiping the coffee off my computer screen, inadvertently sprayed when I took a sip while reading HDS' uber-blogger Hu Yoshida's post on storage virtualization andvendor lock-in
. This blog appears to be the text version of theirfunny video
While most of the post is accurate and well-stated, two opinions particular caught my eye. I'll be nice and call them opinions, since these are blogs, and always subject to interpretation. I'll put quotes around them so that people will correctly relate these to Hu, and not me.
"Storage virtualization can only be done in a storage controller. Currently Hitachi is the only vendor to provide this."
-- Hu Yoshida
Hu, I enjoy all of your blog entries, but you should know better. HDS is fairly new-comer to the storage virtualization arena, so since IBM has been doing this for decades, I will bring you and the rest of the readers up to speed. I am not starting a blog-fight, just want to provide some additional information for clients to consider when making choices in the marketplace.
First, let's clarify the terminology. I will use 'storage' in the broad sense, including anything that can hold 1's and 0's, including memory, spinning disk media, and plastic tape media. These all have different mechanisms and access methods, based on their physical geometry and characteristics. The concept of 'virtualization' is any technology that makes one set of resources look like another set of resources with more preferable characteristics, and this applies to storage as well as servers and networks. Finally, 'storage controller' is any device with the intelligence to talk to a server and handle its read and write requests.
Second, let's take a look at all the different flavors of storage virtualization that IBM has developed over the past 30 years.
IBM introduces the S/370 with the OS/VS1 operating system. "VS" here refers to virtual storage, and in this case internal server memory was swapped out to physical disk. Using a table mapping, disk was made to look like an extension of main memory.
IBM introduces the IBM 3850 Mass Storage System (MSS). Until this time, programs that ran on mainframes had to be acutely aware of the device types being written, as each device type had different block, track and cylinder sizes, so a program written for one device type would have to be modified to work with a different device type. The MSS was able to take four 3350 disks, and a lot of tapes, and make them look like older 3330 disks, since most programs were still written for the 3330 format. The MSS was a way to deliver new 3350 disk to a 3330-oriented ecosystem, and greatly reduce the cost by handling tape on the back end. The table mapping was one virtual 3330 disk (100 MB) to two physical tapes (50 MB each). Back then, all of the mainframe disk systems had separate controllers. The 3850 used a 3831 controller that talked to the servers.
IBM invents Redundant Array of Independent Disk (RAID) technology. The table mapping is one or more virtual "Logical Units" (or "LUNs") to two or more physical disks. Data is striped, mirrored and paritied across the physical drives, making the LUNs look and feel like disks, but with faster performance and higher reliability than the physical drives they were mapped to. RAID could be implemented in the server as software, on top or embedded into the operating system, in the host bus adapter, or on the controller itself. The vendor that provided the RAID software or HBA did not have to be the same as the vendor that provided the disk, so in a sense, this avoided "vendor lock-in".Today, RAID is almost always done in the external storage controller.
IBM introduces the Personal Computer. One of the features of DOS is the ability to make a "RAM drive". This is technology that runs in the operating system to make internal memory look and feel like an external drive letter. Applications that already knew how to read and write to drive letters could work unmodified with these new RAM drives. This had the advantage that the files would be erased when the system was turned off, so it was perfect for temporary files. Of course, other operating systems today have this feature, UNIX has a /tmp directory in memory, and z/OS uses VIO storage pools.
This is important, as memory would be made to look like disk externally, as "cache", in the 1990s.
IBM AIX v3 introduces Logical Volume Manager (LVM). LVM maps the LUNs from external RAID controllers into virtual disks inside the UNIX server. The mapping can combine the capacity of multiple physical LUNs into a large internal volume. This was all done by software within the server, completely independent of the storage vendor, so again no lock-in.
IBM introduces the Virtual Tape Server (VTS). This was a disk array that emulated a tape library. A mapping of virtual tapes to physical tapes was done to allow full utilization of larger and larger tape cartridges. While many people today mistakenly equate "storage virtualization" with "disk virtualization", in reality it can be implemented on other forms of storage. The disk array was referred to as the "Tape Volume Cache". By using disk, the VTS could mount an empty "scratch" tape instantaneously, since no physical tape had to be mounted for this purpose.
Contradicting its "tape is dead" mantra, EMC later developed its CLARiiON disk library that emulates a virtual tape library (VTL).
IBM introduces the SAN Volume Controller. It involves mapping virtual disks to manage disks that could be from different frames from different vendors. Like other controllers, the SVC has multiple processors and cache memory, with the intelligence to talk to servers, and is similar in functionality to the controller components you might find inside monolithic "controller+disk" configurations like the IBM DS8300, EMC Symmetrix, or HDS TagmaStore USP. SVC can map the virtual disk to physical disk one-for-one in "image mode", as HDS does, or can also map virtual disks across physical managed disks, using a similar mapping table, to provide advantages like performance improvement through striping. You can take any virtual disk out of the SVC system simply by migrating it back to "image mode" and disconnecting the LUN from management. Again, no vendor lock-in.
The HDS USP and NSC can run as regular disk systems without virtualization, or the virtualization can be enabled to allow external disks from other vendors. HDS usually counts all USP and NSC sold, but never mention what percentage these have external disks attached in virtualization mode. Either they don't track this, or too embarrassed to publish the number. (My guess: single digit percentage).
Few people remember that IBM also introduced virtualization in both controller+disk and SAN switch form factors. The controller+disk version was called "SAN Integration Server", but people didn't like the "vendor lock-in" having to buy the internal disk from IBM. They preferred having it all external disk, with plenty of vendor choices. This is perhaps why Hitachi now offers a disk-less version of the NSC 55, in an attempt to be more like IBM's SVC.
IBM also had introduced the IBM SVC for Cisco 9000 blade. Our clients didn't want to upgrade their SAN switch networking gear just to get the benefits of disk virtualization. Perhaps this is the same reason EMC has done so poorly with its "Invista" offering.
So, bottom line, storage virtualization can, and has, been delivered in the operating system software, in the server's host bus adapter, inside SAN switches, and in storage controllers. It can be delivered anywhere in the path between application and physical media. Today, the two major vendors that provide disk virtualization "in the storage controller" are IBM and HDS, and the three major vendors that provide tape virtualization "in the storage controller" are IBM, Sun/STK, and EMC. All of these involve a mapping of logical to physical resources. Hitachi uses a one-for-one mapping, whereas IBM additionally offers more sophisticated mappings as well.
technorati tags: IBM, disk, tape, storage, virtualization, Hu Yoshia, HDS, Hitachi, TagmaStore, USP, NSC, disk-less, SAN Volume Controller, LVM, AIX, RAID, SAN, blade, Sun, STK, Cisco, EMC, Invista,