IBM works with a lot of research firms, and the solution is to put most of this data on tape, with just enough disk for specific analysis. Robin mentions a configurion with Sun Fire 4540 disk systems (aka Thumper). Despite Sun Microsystems' recent [$1.7 Billion dollar quarterly loss], I think even the experts at Sun would recommend a blended disk-and-tape solution for this situation.
Take for example IBM's Scale Out File Services [SoFS] which today handles 2-3 billion files in a single global file system, so 20-40 million would present no problem. SoFS supports a mix of disk and tape, with built-in movement, so that files that were referenced would automatically be moved to disk when needed, and moved back to tape when no longer required, based on policies set by the administrator. Depending on the analysis, you may only need 1 PB or less of disk to perform the work, which can easily be accomplished with a handful of disk systems, such as IBM DS8300 or IBM XIV, for example.
The rest would be on tape. Let's consider using the IBM TS3500 with [S24 High Density] frames. A singleTS3500 tape library with fifteen of these HD frames could hold 45PB of data, assuming 3:1 compression on 1TB-size 3592 cartridges. You wouldneed 40 (forty) of these libraries to get to the full 1800 PB required, and these could hold even more as higher capacity cartridges are developed. IBM has customers with over 40 tape libraries today (not all with these HD frames, of course), but the dimensions and scale that IBM is capable lies within this scope.
(For LTO fans, fifteen S54 frames would hold 32PB of data, assuming 2:1 compression on 800GB-size LTO-4 cartridges.so you would need 57 libraries instead of 40 in the above example.)
This blended disk-and-tape approach would drastically reduce the floorspace and electricity requirements when compared against all-disk configurations discussed in the post.
People are rediscovering tape in a whole new light. ComputerWorld recently came out with an 11-page Technology Brief titled [The Business Value of Tape Storage],sponsored by Dell. (Note: While Dell is a competitor to IBM for some aspects of their business, they OEM their tape storage systems from IBM, so in that respect, I can refer to them as a technology partner.) Here are some excerpts from the ComputerWorld brief:
For IT managers, the question isnot whether to use tape, but whereand how to best use tape as part of acomprehensive, tiered storage architecture.In the modern storage architecture,tape plays a role not onlyin data backup, but also in long-termarchiving and compliance.
“Long-term archiving is the primaryreason any company shoulduse tape these days,” says MikeKarp, senior analyst at EnterpriseManagement Associates in Boulder,Colo. Companies are increasinglylikely to use disk in conjunctionwith tape for backup, but for long-termarchiving needs, tape remainsunbeatable.
After factoring inacquisition costs of equipment andmedia, as well as electricity and datacenter floor space, Clipper Groupfound that the total cost of archivingsolutions based on SATA disk, theleast expensive disk, was up to 23times more expensive than archivingsolutions involving tape. Calculatingenergy costs for the competing approaches,the costs for disk jumpedto 290 times that of tape.
“Tape isalways the winner anywhere costtrumps anything else,” says Karp.No matter how the cost is figured,tape is less expensive.
Beyond IT familiarity with tape,analysts point to other reasons whyorganizations will likely keep tapein their IT storage infrastructures.Energy savings, for example, is themost recent reason to stick withtape. “The economics of tape arepretty compelling, especially whenyou figure in the cost of power,”Schulz says.
So, whether you are planning for an Exabyte-scale data center, or merely questioning the logic of a disk-for-everything storage approach, you might want to consider tape. It's "green" for the environment, and less expensive on your budget.
technorati tags: Robin Harris, StorageMojo, Exabyte, Data Center, IBM, blended, disk-and-tape, Sun, Huge Quarterly Loss, Thumper, SoFS, DS8300, XIV, N series, TS3500, S24, 3592, S54, LTO, LTO-4, ComputerWorld, Dell, Mike Karp, Greg Schulz