There's some good discussion in the comments section over at Robin Harris' StorageMojo
blog for hispost [Building a 1.8 Exabyte Data Center
].To summarize, a student is working on a research archive and asked Robin Harris for his opinion. The archive will consist of 20-40 million files averaging 90 GB in size each, for a total of 1800 PB or 1.8 EB. By comparison, anIBM DS8300 with five frames tops out at 512TB, so it would take nearly 3600 of these to hold 1.8 EB. While this might seem like a ridiculous amount of data, I think the discussion is valid as our world is certainly headed in that direction.
IBM works with a lot of research firms, and the solution is to put most of this data on tape, with just enough disk for specific analysis. Robin mentions a configurion with Sun Fire 4540 disk systems (aka Thumper). Despite Sun Microsystems' recent [$1.7 Billion dollar quarterly loss], I think even the experts at Sun would recommend a blended disk-and-tape solution for this situation.
Take for example IBM's Scale Out File Services [SoFS] which today handles 2-3 billion files in a single global file system, so 20-40 million would present no problem. SoFS supports a mix of disk and tape, with built-in movement, so that files that were referenced would automatically be moved to disk when needed, and moved back to tape when no longer required, based on policies set by the administrator. Depending on the analysis, you may only need 1 PB or less of disk to perform the work, which can easily be accomplished with a handful of disk systems, such as IBM DS8300 or IBM XIV, for example.
The rest would be on tape. Let's consider using the IBM TS3500 with [S24 High Density] frames. A singleTS3500 tape library with fifteen of these HD frames could hold 45PB of data, assuming 3:1 compression on 1TB-size 3592 cartridges. You wouldneed 40 (forty) of these libraries to get to the full 1800 PB required, and these could hold even more as higher capacity cartridges are developed. IBM has customers with over 40 tape libraries today (not all with these HD frames, of course), but the dimensions and scale that IBM is capable lies within this scope.
(For LTO fans, fifteen S54 frames would hold 32PB of data, assuming 2:1 compression on 800GB-size LTO-4 cartridges.so you would need 57 libraries instead of 40 in the above example.)
This blended disk-and-tape approach would drastically reduce the floorspace and electricity requirements when compared against all-disk configurations discussed in the post.
People are rediscovering tape in a whole new light. ComputerWorld recently came out with an 11-page Technology Brief titled [The Business Value of Tape Storage],sponsored by Dell. (Note: While Dell is a competitor to IBM for some aspects of their business, they OEM their tape storage systems from IBM, so in that respect, I can refer to them as a technology partner.) Here are some excerpts from the ComputerWorld brief:
For IT managers, the question isnot whether to use tape, but whereand how to best use tape as part of acomprehensive, tiered storage architecture.In the modern storage architecture,tape plays a role not onlyin data backup, but also in long-termarchiving and compliance.
“Long-term archiving is the primaryreason any company shoulduse tape these days,” says MikeKarp, senior analyst at EnterpriseManagement Associates in Boulder,Colo. Companies are increasinglylikely to use disk in conjunctionwith tape for backup, but for long-termarchiving needs, tape remainsunbeatable.
After factoring inacquisition costs of equipment andmedia, as well as electricity and datacenter floor space, Clipper Groupfound that the total cost of archivingsolutions based on SATA disk, theleast expensive disk, was up to 23times more expensive than archivingsolutions involving tape. Calculatingenergy costs for the competing approaches,the costs for disk jumpedto 290 times that of tape.
“Tape isalways the winner anywhere costtrumps anything else,” says Karp.No matter how the cost is figured,tape is less expensive.
Beyond IT familiarity with tape,analysts point to other reasons whyorganizations will likely keep tapein their IT storage infrastructures.Energy savings, for example, is themost recent reason to stick withtape. “The economics of tape arepretty compelling, especially whenyou figure in the cost of power,”Schulz says.
So, whether you are planning for an Exabyte-scale data center, or merely questioning the logic of a disk-for-everything storage approach, you might want to consider tape. It's "green" for the environment, and less expensive on your budget.
technorati tags: Robin Harris, StorageMojo, Exabyte, Data Center, IBM, blended, disk-and-tape, Sun, Huge Quarterly Loss, Thumper, SoFS, DS8300, XIV, N series, TS3500, S24, 3592, S54, LTO, LTO-4, ComputerWorld, Dell, Mike Karp, Greg Schulz
Perhaps the recent financial meltdown is making storage vendors nervous.Both IBM and EMC gained market share in 3Q08, but EMC is acting strangelyat IBM's latest series of plays and announcements. Almost contradictory!
- Benchmarks bad, rely on your own in-house evaluations instead
Let's start with fellow blogger Barry Burke from EMC, who offers his latest post[Benchmarketing Badly] with commentaryabout Enterprise Strategy Group's [DS5300 Lab Validation Report]. The IBM System Storage DS5300 is one of IBM's latest midrange disk systems recently announced. Take for example this excerpt from BarryB's blog post:
"I was pleasantly surprised to learn that both IBM and ESG agree with me about the relevance and importance of the Storage Performance Council benchmarks.Nowhere in the ESG report says this, nor have I found any public statements from either IBM nor ESG that makes this claim. Instead, the ESG report explains that traditional benchmarks from the Storage Performance Council [SPC] focus on a single, specific workload, and ESG has chosen to complement this with a variety of other benchmarks to perform their product validation, including VMware's "VMmark", Oracle's Orion Utility, and Microsoft's JetStress.
That is, SPC's are a meaningless tool by which to measure or compare enterprise storage arrays."
Benchmarks provide prospective clients additional information to make purchasedecisions. IBM understands this, ESG understands this, and other well-respected companies like VMware, Oracle and Microsoft understand this. EMC is afraid that benchmarks mightencourage a client to "mistakenly" purchase a faster IBM product than a slower EMC product. Sunshine makes a great disinfectant, but EMC (and vampires) prefer their respective "prospects" remain in the dark.
Perhaps stranger still is BarryB's postscript. Here's an excerpt:
"... a customer here asked me if EMC would be willing to participate in an initiative to get multiple storage vendors to collaborate on truly representative real-world "enterprise-class" benchmarks, and I reassured him that I would personally sponsor active and objective participation in such an effort - IF he could get the others to join in with similar intent."
As I understand it, EMC was once part of the Storage Performance Council a long time ago, then chose to drop out of it. Why re-invent the wheel by creating yet another storage industry benchmark group? EMC is welcome to come back to SPC anytime! In addition to the SCP-1 and SPC-2 workloads, there is work underway for an SPC-3 benchmark. Each SPC workload provides additional insight for product comparisons to help with purchase decisions. If EMC can suggest an SPC-4 benchmark that it feels is more representative of real-world conditions, they are welcome to join the SPC party and make that a reality.
The old adage applies: ["It's better to light a candle than curse the darkness"]. EMC has been cursing the lack of what it considers to be acceptable benchmarks but has yet to offer anything more realistic or representative than SPC.What does EMC suggest you do instead? Get an evaluation box and run your own workloads and see for yourself! EMC has in the past offered evaluation units specifically for this purpose.
- In-house evaluations bad, it's a trap!
Certainly, if you have the time and staff to run your own evaluation, with your own applications in your own environment, then I agree with EMC that this can provide better insight for your particular situation than standardized benchmarks.
In fact, that is exactly what IBM is doing for IBM XIV storage units, which are designed for Web 2.0 and Digital Archive workloads that current SPC benchmarks don't focus on. Fellow blogger Chuck Hollis from EMC opines in his post[Get yer free XIV!]. Here's an excerpt:
"Now that I think about it, this could get ugly. Imagine a customer who puts one on the floor to evaluate it, and -- in a moment of desperation or inattention -- puts production data on the device.
Nobody was paying attention, and there you are. Now IBM comes calling for their box back, and you've got a choice as to whether to go ahead and sign the P.O., or migrate all your data off the thing. Maybe they'll sell you an SVC to do this?
Yuck. I bet that happens more than once. And I can't believe that IBM (or the folks at XIV) aren't aware of this potentially happening."
Perhaps Chuck is speaking from experience here, as this may have happened with customers with EMC evaluation boxes, and is afraid this could happen with IBM XIV. I don't see anything unique about IBM XIV in the above concern. Typical evaluations involve copying test data onto the box, test it out with some particular application or workload, and then delete the data no longer required. Repeat as needed. Moving data off an IBM XIV is aseasy as moving data off an EMC DMX, EMC CLARiiON or EMC Celerra, and I am sure IBM wouldgladly demonstrate this on any EMC gear you now have.
Thanks to its clever RAID-X implementation, losing data on an IBM XIV is less likely thanlosing data on any RAID-5 based disk array from any storage vendor. Of course, there will always be skeptics about new technology that will want to try the box out for themselves.
If EMC thought the IBM XIV had nothing unique to offer, that its performance was just "OK",and is not as easy to manage as IBM says it is, then you would think EMC would gladly encourage such evaluations and comparisons, right?
No, I think EMC is afraid that companies will discover what they already know, that IBM has quality products that would stand a fair chance of side-by-side comparisons with their own offerings.We have enough fear, uncertainty and doubt from our current meltdown of the global financial markets, don't let EMC add any more.
Have a safe and fun Halloween! If you need to add some light to your otherwise dark surroundings, consider some of these ideas for [Jack-O-Lanterns]!
technorati tags: IBM, DS5300, ESG, benchmarks, SPC, SPC-1, SPC-2, SPC-3, VMware, VMmark, Oracle, Orion, Microsoft, JetStress, EMC, BarryB, RAID-X, RAID-5, DMX, CLARiiON, Celerra, XIV, financial, global markets, crisis, meltdown, Halloween, Jack-O-Lantern
This is page 34 of Sequoia Capital's[56-slide presentation] about the current financial meltdown. In the past, IT spending tracked closely to the rest of the economy, but the latest downturn has not yet reflected in IT spend.
The rest of the deck is worth going through, with interesting stats presented in a clear manner.
technorati tags: IBM, Sequoia Capital, tech spending, financial, crisis, meltdown, downturn
Well, it's Tuesday again, and that means more IBM announcements!
- Storage Area Network (SAN)
IBM and Cisco announced [three new blades] for the Cisco MDS 9500 seriesdirectors: 24-port 8 Gbps, 48-port 8 Gbps, and 4/44 blended. The 4/44blended has 4 of the faster 8 Gbps ports, and 44 of the 4 Gpbs ports,so that you can auto-negotiate down to 1 Gbps for your older gear, andstill take advantage of the faster 8 Gbps speeds during the transition.
On the Brocade side, IBM announced the newIBM System Storage Data Center Fabric Manager [DCFM] V10 software. This replaces the products formerly known as BrocadeFabric Manager and McData Enterprise Fabric Connection Manager (EFCM).This software can support up to 24 distinct fabrics, up to 9000 ports,including a mix of FCP, FICON, FCIP and iSCSI protocols.
And if you need help setting up your SAN, IBM has recently renamed itsservices, formerly known as "IBM Implementation Services for SAN fabric components" to ["IBM Storage and Data Product Services - IBM Implementation Services for storage software - SAN fabric components"]. There is no change to the services actually provided, which have been available since 2003, but IBM felt that renaming it would make good press coverage.
(On a related note, I heard that Microsoft is planning to rename "Windows Vista" to "Windows 7" next year! Like we say here in Tucson,if it ends in "-ista" it is going to fail in the marketplace! Perhaps EMC should rename their storage virtualization product to "In-7"?).
- IBM System Storage DR550
IBM announced today that it now supports [RAID 6 onthe DR550] compliance and retention storage system.
There are a few RAID-5 based EMC Centera customers out there who have notyet switched over to the IBM DR550, and now this might be just the littlenudge they need. For long-term retention of regulatory compliance data,RAID-5 doesn't cut it, you need an advanced RAID scheme, such as RAID-6, RAID-DP or RAID-X.
The DR550 provides non-erasable, non-rewriteable (NENR) storage supportto keep retention-managed data on disk and tape media. It supports 1 TBSATA disk drives and 1TB tape cartridges to provide high capacity at lowcost and "green" low energy consumption.
- IBM System Storage N series
Several of our disk systems got improved and enhanced. Let's start withthe IBM System Storage N series[hardware and software] enhancements. IBM now offers high-speed 450GB 15K RPM drives. These are Fibre Channel (FC) drives for the EXN4000 expansion drawers, and Serial Attached SCSI (SAS) drives for the entry-levelN3300 and N3600 models.
The "gateway" models now support a variety of functions that were formerlyonly available on the appliance models. This includes Advanced Single Instance Storage (A-SIS), Disk Sanitization, and FlexScale.
A-SIS is IBM's "other" deduplication function, and I talked about this in my post [A-SIS Storage Savings Estimator Tool]. Disk Sanitization will physicallywrite ones and zeros over existing data to eliminate it, what IBM sometimes calls "Data Shredding".
The last feature, FlexScale, might be new for many. It is software toenable to use of the "Performance Accelerator Module" (PAM). The PAM isa PCI-Express card with 16GB on-board RAM that acts as a secondary cachebehind main memory of the N series controller. Depending on the model,you can have one to five of these cards fit into the controller itself,boosting random read performance, metadata access, and write block destage.
- IBM System Storage DS5000
IBM's latest entry into the DS family has been hugely successful.In addition to Linux, Windows and AIX, the DS5000 now supports [Novell Netware and Sun Solaris] operating systems.
For infrastructure management, IBM has enhanced the Remote Support Manager [RSM]that supports DS3000 and DS4000 has been extended to support DS5000 as well. This software can monitor up to 50 disk systems, will e-mail alerts to IBM when something goes wrong, and allow IBM to dial in via modem to get more diagnostic information to improve service to the client. Also, the IBM System Storage Productivity Center [SSPC]which now supports the DS8000 and SAN Volume Controller (SVC) has been extended to also support the DS5000.
- IBM XIV Storage System
In addition to 1-year and 3-year maintenance agreements, IBM now offers[2-year, 4-year and 5-year] software maintenance agreements.
- RFID labels for IBM tape media
IBM 3589 (20-pack of LTO cartridges) and IBM 3599 (20-pack of 3592 cartridges for TS1100 series)now offer [RFID labels]. These labels match the volume serial (VOLSER) with a 216-bit unique identifier and 256 bits of user-defined content. This can help with tape inventory,and to prevent people from walking out of the building with a tape cartridge stuffed in their jacket.
- 32GB memory stick
While not technically part of the IBM System Storage matrix of offerings, Lenovo announced their new[Essential Memory Key] which holds 32GB of memory and workswith both USB 1.1 and USB 2.0 protocols.
I wish I could say this is it for the IBM announcements for October, given that this is the last Tuesday of the month, but there are three days left, so there might be just a few more!
technorati tags: IBM, SAN, Cisco, MDS9500, DCFM, BFM, EFCM, FCP, FICON, FCIP, iSCSI, Windows Vista, Windows 7, EMC, Centera, DR550, RAID-6, RAID-DP, RAID-X, NENR, FC, EXN4000, SAS, N3300, N3600, A-SIS, Disk Sanitization, FlexScale, PAM, DS5000, Netware, Solaris, DS3000, DS4000, DS8000, SVC, XIV, RFID, 3589, 3599, LTO, 3592, tape, cartridges, VOLSER
In collaboration with [The Feminist Press] and the[National Science Foundation], IBM launched today a new Web site called ["Under the Microscope"]to encourage young women to pursue education and careers in science, technology, engineering and math (STEM).
The site is filled with information. One item I found particularly interesting was Science Debate 2008's[14 Questions about Science] where the top two U.S. presidential candidates answer questions about science. Barack Obama's answers inDemocratic blue, and John McCain's answers in Republican red.
This is just one of the ways IBM is trying to reach out and help our next generation.
technorati tags: IBM, Under The Microscope, Feminist Press, NSF, STEM, education, science, debate, Barack Obama, John McCain