Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson, Arizona, and a featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
In my presentations in Australia and New Zealand, I mentioned that people were re-discovering the benefits of removable media. While floppy diskettes were a convenient way of passing information from one person to another, they unfortunately did not have enough capacity. In today's world, you may need Gigabytes or Terabytes of re-writeable storage with a file system interface that can easily be passed from one person to another. In this post, I explore three options.
(FCC Disclaimer: I work for IBM, and IBM has no business relationship with Cirago at the time of this writing. Cirago has not paid me to mention their product, but instead provided me a free loaner that I promised to return to them after my evaluation is completed. This post should not be considered an endorsement for Cirago's products. List prices for Cirago and IBM products were determined from publicly available sources for the United States, and may vary in different countries. The views expressed herein may not necessarily reflect the views and opinions of either IBM or Cirago.)
I took a few photos so you can see what exactly this device looks like. Basically, it is a plastic box that holds a single naked disk drive. It has four little rubber feet so that it does not slip on your desk surface.
The inside is quite simple. The power and SATA connections match those of either a standard 3.5 inch drive, or the smaller form factor (SFF) 2.5 inch drive. However, to my dismay, it does not handle EIDE drives which I have a ton of. After taking apart six different computer systems, I found only one had SATA drives for me to try this unit out with.
The unit comes with a USB cable and AC/DC power adapter. In my case, I found the USB 3.0 cable too short for my liking. My tower systems are under my desk, and I like keeping docking stations like this on top of the desk, within easy reach. That wasn't going to happen because the USB cable was not long enough.
Instead, I ended up putting it half-way in between, behind my desk, sitting on another spare system. Not ideal, but in theory there are USB-extension cables that probably could fix this.
Here it is with the drive inside. I had a 3.5 inch Western Digital [1600AAJS drive] 160 GB, SATA 3 Gbps, 8 MB Cache, 7200 RPM.
To compare the performance, I used a dual-core AMD [Athlon X2] system that I had built for my 2008 [One Laptop Per Child] project. I ran the tests with the drive externally in the Cirago docking station, then ran the same tests with the same drive internally on the native SATA controller. Although the Cirago documentation indicated that Windows was required, I used Ubuntu Linux 10.04 LTS just fine, using the flexible I/O [fio] benchmarking tool against an ext3 file system.
Sequential Write - a common use for external disk drive is backup.
Random read - randomly read files ranging from 5KB to 10MB in size.
Random mixed - randomly read/write files (50/50 mix) ranging from 5KB to 10MB in size.
[Results table not preserved. Surviving labels: Random Mixed (50/50); Latency (msec) read and write; Bandwidth (KB/s) read and write.]
For sequential write, the Cirago performed well, only about 15 percent slower than native SATA. For random workloads, however, it was 30-40 percent slower. If you are wondering why I did not get USB 3.0 speeds, there are several factors involved here. First, with overheads, 5 Gbps USB 3.0 is expected to get only about 400 MB/sec. My SATA 2.0 controller maxes out at 375 MB/sec, and my USB 2.0 ports on my system are rated for 57 MB/sec, but with overheads will only get 20-25 MB/sec. Most spinning drives only get 75 to 110 MB/sec. Even solid-state drives top out at 250 MB/sec for sustained activity. Despite all that, my internal SATA drive only got 16 MB/sec, and externally with the Cirago 14 MB/sec in sustained write activity.
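The interface arithmetic in the paragraph above can be checked with a short sketch. This is an illustration only: the function names are hypothetical, and the 8b/10b encoding factor (used by SATA and USB 3.0) is an assumption added here, not something stated in the post. The 16 and 14 MB/sec inputs are the rounded sustained-write figures quoted above.

```python
def usable_mb_per_sec(line_rate_mbps, payload_bits=8, line_bits=8):
    """Convert a line rate in megabits/sec to decimal megabytes/sec.

    payload_bits/line_bits models line encoding overhead; 8 and 10 gives
    8b/10b encoding (an assumption added for this sketch).
    """
    return line_rate_mbps * payload_bits / (line_bits * 8)


def percent_slower(baseline, measured):
    """How much slower `measured` is than `baseline`, as a percentage."""
    return (baseline - measured) / baseline * 100


usb3_raw = usable_mb_per_sec(5000)             # 625.0 MB/sec before any overhead
usb3_encoded = usable_mb_per_sec(5000, 8, 10)  # 500.0 MB/sec after 8b/10b encoding
sata2_raw = usable_mb_per_sec(3000)            # 375.0 MB/sec, the figure quoted above
usb2_raw = usable_mb_per_sec(480)              # 60.0 MB/sec before overhead
write_gap = percent_slower(16, 14)             # 12.5 percent, from the rounded figures
```

Protocol overhead comes on top of the encoding overhead, which is consistent with the roughly 400 MB/sec expectation for USB 3.0 mentioned above.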
Here is the mess that is inside my system. The slot for drive 2 was blocked by cables, memory chips and the heat sink for my processor. It is possible to damage a system just trying to squeeze between these obstacles.
However, the point of this post is "removable media". Having to open up the case and insert the second drive and wire it up to the correct SATA port was a pain, and certainly a more difficult challenge than the average PC user wishes to tackle.
Price-wise, the Cirago lists for $49 USD, and the 160GB drive I used lists for $69, so the combined $118 is about what you would pay for a fully integrated external USB drive. However, if you had lots of loose drives, then this could be more convenient and start to save you some money.
IBM RDX disk backup system
Another problem with the Cirago approach is that the disk drives are naked, with printed circuit board (PCB) exposed. When not in the docking station, where do you put your drive? Did you keep the [anti-static ESD bag] that it came in when you bought it? And once inside the bag, now what? Do you want to just stack it up in a pile with your other pieces of equipment?
To solve this, IBM offers the RDX backup system. These are fully compatible with other RDX systems from Dell, HP, Imation, NEC, Quantum, and Tandberg Data. The concept is to have a docking station that takes removable, rugged plastic-coated disk-enclosed cartridges. The docking station can be part of the PC itself, similar to how CD/DVD drives are installed, or as a stand-alone USB 2.0 system, capable of processing data up to 25 MB/sec.
The idea is not new. About 10 years ago we had [Iomega "zip" drives] that offered disk-enclosed cartridges with capacities of 100, 250 and 750MB. Iomega had its fair share of problems with the zip drive, which was ranked in 2006 as the 15th worst technology product of all time, and the company was eventually bought out by EMC two years later (as if EMC has not had enough failures on its own!)
The problem with zip drives was that they did not hold as much as CD or DVD media, and were more expensive. By comparison, IBM RDX cartridges come in 160GB to 750GB in size, at list prices starting at $127 USD.
IBM LTO tape with Long-Term File System
Removable media is not just for backup. Disk cartridges, like the IBM RDX above, have the advantage of being random access, but most tape is accessed sequentially. IBM has solved this also, with the new IBM Long Term File System [LTFS], available for LTO-5 tape cartridges.
With LTFS, the LTO-5 tape cartridge now can act as a super-large USB memory stick for passing information from one person to the next. The LTO-5 cartridge can handle up to 3TB of compressed data at up to SAS speeds of 140 MB/sec. An LTO-5 tape cartridge lists for only $87 USD.
The LTO-5 drives, such as the IBM [TS2250 drive], can read LTO-3, LTO-4 and LTO-5 cartridges, and can write LTO-4 and LTO-5 cartridges, in a manner that is fully compatible with LTO drives from HP or Quantum. LTO-3, LTO-4 and LTO-5 cartridges are available in WORM or rewriteable formats. LTO-4 and LTO-5 cartridges can be encrypted with 256-bit AES built-in encryption. With three drive manufacturers, and seven cartridge manufacturers, there is no threat of vendor lock-in with this approach.
These three options offer various trade-offs in price, performance, security and convenience. Not surprisingly, tape continues to be the cheapest option.
Well, it's Tuesday again, and you know what that means! IBM Announcements! Typically, IBM System Storage has three to five major product launches per year. Making announcements every Tuesday would be too frequent, and having one big announcement every two or three years would be too far apart. Worldwide combined revenues for storage hardware and software grew double digits last year, comparing full-year 2011 to 2010, and I am sure that 2012 will also be a good year for IBM! This week we have announcements for both disk and tape, but since 2012 is the 60th (Diamond) Anniversary for tape, I will start with tape systems.
TS1140 support for JA/JJ tape cartridges
The TS1140 enterprise tape drive was announced at the [Storage Innovation Executive Summit] last May. It supported a new E07 format on three different new tape cartridges. Model "JC" was the 4.0TB standard re-writeable tape, "JY" was the 4.0TB WORM tape, and "JK" was the 500GB economy tape that was less expensive, but offered faster random access.
Generally, IBM has adopted an N-2 read, N-1 write [backward compatibility]. This means that the TS1140 could read E05 and E06 formatted tapes on JB and JX media, and could write E06 format on JB and JX media. However, there are a lot of older JA and JJ media, especially as part of TS7740 environments, so IBM now supports TS1140 drives to read J1A formatted JA and JJ media. This is not just for TS7740 environments. Any TS1140 in stand-alone or tape library configurations will support this as well.
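The N-2 read, N-1 write rule can be sketched in a few lines. This is a minimal illustration, not IBM's implementation: the function names are hypothetical, the ordering of the formats is inferred from the paragraph above, and the sketch ignores media types (JA, JJ, JB, JX, JC), which also matter in practice.

```python
# Recording formats, oldest first (ordering inferred from the text above).
FORMATS = ["J1A", "E05", "E06", "E07"]


def can_read(drive_format, tape_format, read_back=2, extra_read=()):
    """True if the drive reads the tape format under an N-`read_back` rule.

    `extra_read` lists older formats supported as explicit exceptions.
    """
    gap = FORMATS.index(drive_format) - FORMATS.index(tape_format)
    return 0 <= gap <= read_back or tape_format in extra_read


def can_write(drive_format, tape_format, write_back=1):
    """True if the drive writes the tape format under an N-`write_back` rule."""
    gap = FORMATS.index(drive_format) - FORMATS.index(tape_format)
    return 0 <= gap <= write_back


# A TS1140 (E07) under the plain rule, then with the J1A read exception.
plain_rule = can_read("E07", "J1A")                           # False
with_exception = can_read("E07", "J1A", extra_read=("J1A",))  # True
writes_e05 = can_write("E07", "E05")                          # False
writes_e06 = can_write("E07", "E06")                          # True
```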
TS7700 R2.1 enhancements
IBM is a leader in tape virtualization with or without physical tape as back-end media. There are two hardware models of the [IBM Virtualization Engine TS7700 family] for the IBM System z mainframe. These virtual libraries are referred to as "clusters" in IBM literature.
The TS7740 Virtual Tape Library supports putting virtual tape images on disk first, then moving less-active data to physical tape, which I covered in my blog post [IBM Announcements - July 2007].
A unique feature of the TS7700 series is support for a Grid configuration, which allows up to six different TS7700 clusters to be grouped into a single instance image. These clusters can be in local or remote locations, connected via WAN or LAN connections.
R2.1 is the latest software release of IBM's successful TS7700 series.
True Sync Mode Copy. Before R2.1, the TS7700 offered "immediate mode copy". An application would write to a virtual tape, and when it was done with the tape and performed an unmount, the TS7700 would then replicate the tape contents to a secondary cluster on the grid. With True Sync Mode, data contents are replicated per implicit or explicit SYNC points. This is another IBM first in the IT tape industry.
Remote Mount Fail-over. When you have two or more TS7700 clusters in a grid configuration, you can do remote mounts. We've added fail-over multi-pathing up to four paths, so that if a link to a remote cluster is down, it will try one of the others instead.
Parallel Copies and Pre-Migration. One of my 19 patents is for the pre-migration feature for the IBM 3494 Virtual Tape Server (VTS) that carries forward into the TS7700, and is also used in the SONAS and Information Archive products. However, when the grid architecture was introduced, the engineers decided not to allow pre-migration and copies to secondary clusters to occur concurrently. Now these two operations can be done in parallel.
Merge two grids into one grid. Now that we can support up to six clusters in a single grid, we have people with 2-cluster and 3-cluster grids looking to merge them into one. Of course, all the logical and physical volume serials (VOLSER) must be unique!
Accelerate off JA/JJ Media. There are a lot of older JA and JJ media still in TS7700 libraries. This feature allows customers to speed up the transition to newer physical tape media.
Copy Export to E06 format on JB media. This one is clever, and I have to say I would have never thought about it. Let's say you have a TS7740 with TS1140 drives, but you want to export some virtual tapes to physical media to be sent to someone who only has a TS7740 connected with older TS1130 drives. These older drives can't read new JC media nor make sense of the E07 format. This feature will let you export to older JB media in E06 format so that it will be fully readable at the new location on the TS1130 drives.
Copy Export Merge service offering. Thanks to mergers and acquisitions, it is sometimes necessary to split off a portion of data from a TS7700 grid. In the past, IBM supported sending this export to a completely empty TS7700 library, but this new service offering allows the export to be merged into an existing TS7700 that already contains data.
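Two of the items above lend themselves to a small sketch: trying alternate paths when a remote mount fails, and checking that volume serials are unique before merging two grids. Everything here is hypothetical and for illustration only. The function names, the `mount` callable, and the try-in-order policy are assumptions, not the TS7700's actual behavior or interface.

```python
MAX_PATHS = 4  # the fail-over feature described above supports up to four paths


def remote_mount(paths, mount):
    """Try each path in order and return (path, result) for the first success.

    `mount` is a hypothetical callable that raises ConnectionError when
    the link behind a path is down.
    """
    if not 1 <= len(paths) <= MAX_PATHS:
        raise ValueError(f"expected 1 to {MAX_PATHS} paths, got {len(paths)}")
    for path in paths:
        try:
            return path, mount(path)
        except ConnectionError:
            continue  # this link is down, so fall through to the next path
    raise ConnectionError("all paths to the remote cluster are down")


def volser_conflicts(grid_a, grid_b):
    """Return the volume serials that appear in both grids, sorted."""
    return sorted(set(grid_a) & set(grid_b))
```

A merge would only proceed when `volser_conflicts` returns an empty list.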
LTFS-SDE support for Mac OS X 10.7 Lion
How do people still not know about the Linear Tape File System [LTFS]? I mentioned this in my blogs back in 2010 in [April], [September], and [November]. Last year, LTFS won the [NAB Show Pick Hits Award] and an [Emmy] for revolutionizing the use of digital tape in Television broadcasting.
In layman's terms, the Single Drive Edition [LTFS-SDE] allows a tape cartridge to be treated like a USB memory stick. It is supported on the LTO5 tape drives for systems running various levels of Windows, Linux and Mac OS X. Prior to this announcement, IBM supported Leopard (10.5.6) and Snow Leopard (10.6), and now supports the Mac OS X 10.7 "Lion" release.
IBM first introduced Solid-State Drives (SSD) back in 2007 where it made sense the most, in [drive-for-drive replacements on blade servers in the IBM BladeCenter]. Blade servers typically only have a single drive, and SSD are both faster and use less energy on a drive-for-drive comparison, so this provided immediate benefit. Today, SSD are available on a variety of System x and POWER system servers.
In 2008, IBM rocked the world by being the first to reach [1 Million IOPS with Project Quicksilver]. This was an all-SSD configuration which many considered unrealistic (at the time), but it showed the potential for solid state drives.
When the [XIV Gen3 was Announced - July 2011], each module included a 1.8-inch "SSD-Ready" slot in the back. IBM made a Statement of Direction that IBM would someday offer SSD drives to put in these slots. Today's announcement is that IBM has finalized the qualification process, so now XIV Gen3 clients can have 400GB of usable non-volatile SSD read cache added to each module. This SSD can be added to existing XIV Gen3 boxes in the field, or it can be factory-installed in new shipments. If you have a 15-module XIV, that's 6TB of additional read cache! This SSD is entirely managed by the XIV Gen3, so you won't have to spend weeks reading manuals or specifying configuration parameters.
When you carve volumes on the XIV, you now have an option to enable or disable use of the SSD cache for each volume. Since XIV is being used in private and public cloud deployments, this offers the ability to offer premium performance at premium prices. The use of SSD is complementary to IBM XIV Quality of Service (QoS) performance levels, which are determined by host instead.
Well, that's the first major IBM System Storage launch of 2012. Let me know what you think in the comment section below.
Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) has decided to customize it for its own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extensions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) sticking to the SQL standard, and expecting the reader to customize the examples to their particular RDBMS; or (b) sticking to a single RDBMS implementation, and offering examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various RDBMS differ from standard SQL syntax, with actual working examples for different RDBMS.
You might be thinking to yourself that only an idiot would work in a place that required knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via [Shadow IT].
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2 Express-C] that runs on Windows, Linux, Mac OS, and Solaris.)
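For a quick feel of the basics listed above (creating tables, inserting, updating and deleting rows, and a join), here is a minimal sketch. It uses Python's built-in sqlite3 module purely for convenience. SQLite is my choice for illustration, not one of the examples from the book, and the table and column names are made up.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create two tables and define their columns.
cur.execute("CREATE TABLE drives (id INTEGER PRIMARY KEY, model TEXT, capacity_gb INTEGER)")
cur.execute("CREATE TABLE owners (drive_id INTEGER, owner TEXT)")

# Insert rows. The '?' placeholder style is specific to this driver.
cur.executemany(
    "INSERT INTO drives (id, model, capacity_gb) VALUES (?, ?, ?)",
    [(1, "A", 160), (2, "B", 750), (3, "C", 100)],
)
cur.executemany(
    "INSERT INTO owners (drive_id, owner) VALUES (?, ?)",
    [(1, "tony"), (2, "jon")],
)

# Update a field, then delete a row.
cur.execute("UPDATE drives SET capacity_gb = 320 WHERE id = 1")
cur.execute("DELETE FROM drives WHERE id = 3")

# Query with a join across the two tables.
rows = cur.execute(
    "SELECT d.model, d.capacity_gb, o.owner "
    "FROM drives d JOIN owners o ON o.drive_id = d.id "
    "ORDER BY d.id"
).fetchall()
conn.close()
# rows is [("A", 320, "tony"), ("B", 750, "jon")]
```

Even in something this small, details such as column types and the placeholder style are where implementations tend to diverge.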
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question struck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza was not in the IBM System Storage product line. It is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
This week, Hitachi Ltd. announced their next generation disk storage virtualization array, the Virtual Storage Platform, following on the success of its USP V line. It didn't take long for fellow blogger Chuck Hollis (EMC) to comment on this in his blog post [Hitachi's New VSP: Separating The Wheat From The Chaff]. Here are some excerpts:
"Well, we all knew that Hitachi (through HDS and HP) would be announcing some sort of refresh to their high-end storage platform sooner or later.
As EMC is Hitachi's only viable competitor in this part of the market, I think people are expecting me to say something.
If you're a high-end storage kind of person, your universe is basically a binary star: EMC and Hitachi orbiting each other, with the interesting occasional sideshow from other vendors trying to claim relevance in this space."
Chuck implies that neither Hewlett-Packard (HP) nor Hitachi Data Systems (HDS) as vendors provide any value-add to the box manufactured by Hitachi Ltd., so he combines them into a single category. I suspect the HP and HDS folks might disagree with that opinion.
When I reminded Chuck that IBM was also a major player in the high-end disk space, his response included the following gem:
"Many of us in the storage industry believe that IBM currently does not field a competitive high-end storage platform. IDC market share numbers bear out this assertion, as you probably know."
While Chuck is certainly entitled to his own beliefs and opinions, believing the world is flat does not make it so. Certainly, I doubt IDC or any other market research firm has put out a survey asking "Do you think IBM offers a competitive high-end disk storage platform?" Of course, if Chuck is basing his opinion on anecdotal conversations with existing EMC customers, I can certainly see how he might have formed this misperception. However, IDC market share numbers don't support Chuck's assertion at all.
There is no industry-standard definition of what is a "high-end" or "enterprise-class" disk system. Some define high-end as having the option for mainframe attachment via ESCON and/or FICON protocol. Others might focus on features, functionality, scalability and availability of 99.999 percent or higher. Others insist high-end requires block-oriented protocols like FC and iSCSI, rather than file-based protocols like NFS and CIFS.
For the most demanding mission-critical mix of random and sequential workloads, IBM offers the [IBM System Storage DS8000 series] high-end disk system which connects to mainframes and distributed servers, via FCP and FICON attachment, and supports a variety of drive types and RAID levels. The features that HP and HDS are touting today for the VSP are already available on the IBM DS8000, including sub-LUN automatic tiering between Solid-State drives and spinning disk, called [Easy Tier], thin provisioning, wide striping, point-in-time copies, and long distance synchronous and asynchronous replication.
There are lots of analysts that track market share for the IT storage industry, but since Chuck mentions [IDC] specifically, I reviewed the most recent IDC data, published a few weeks ago in their "IDC Worldwide Quarterly Disk Storage Tracker" for 2Q 2010, representing April 1 to June 30, 2010 sales. Just in case any of the rankings have changed over time, I also looked at the previous four quarters: 2Q 2009, 3Q 2009, 4Q 2009 and 1Q 2010.
(Note: IDC considers its analysis proprietary. Out of respect for their business model, I will not publish any of the actual facts and figures they have collected. If you would like to get any of the IDC data to form your own opinion, contact them directly.)
In the case of IDC, they divide the disk systems into three storage classes: entry-level, midrange and high-end. Their definition of "high-end" is external RAID-protected disk storage that sells for $250,000 USD or more, representing roughly 25 to 30 percent of the external disk storage market overall. Here are IDC's rankings of the four major players for high-end disk systems:
By either measure of market share, units (disk systems) or revenue (US dollars), IDC reports that IBM high-end disk outsold both HDS and HP combined. This has been true for the past five quarters. If a smaller start-up vendor has single digit percent market share, I could accept it being counted as part of Chuck's "occasional sideshow from other vendors trying to claim relevance", but IBM high-end disk has consistently had 20 to 30 percent market share over the past five quarters!
Not all of these high-end disk systems are connected to mainframes. According to IDC data, only about 15 to 25 percent of these boxes are counted under their "Mainframe" topology.
Chuck further writes:
"It's reasonable to expect IBM to sell a respectable amount of storage with their mainframes using a protocol of their own design -- although IBM's two competitors in this rather proprietary space (notably EMC and Hitachi) sell more together than does IBM."
The IDC data doesn't support that claim either, Chuck. By either measure of market share, units (disk systems) or revenue (US dollars), IDC reports that IBM disk for mainframes outsold all other vendors (including EMC, HDS, and HP) combined. And again, this has been true for the past five quarters. Here is the IDC ranking for mainframe disk storage:
IBM has over 50 percent market share in this case, primarily because IBM System Storage DS8000 is the industry leader in mainframe-related features and functions, and offers synergy with the rest of the z/Architecture stack.
So Chuck, I am not picking a fight with you or asking you to retract or correct your blog post. Your main theme, that the new VSP presents serious competition to EMC's VMAX high-end disk arrays, is certainly something I can agree with. Congratulations to HDS and HP for putting forth what looks like a viable alternative to EMC's VMAX.
To learn more about IBM's upcoming products, register for next week's webcast "Taming the Information Explosion with IBM Storage" featuring Dan Galvan, IBM Vice President, and Steve Duplessie, Senior Analyst and Founder of Enterprise Strategy Group (ESG).
Continuing my post-week coverage of the [Data Center 2010 conference], Wednesday evening we had six hospitality suites. These are fun informal get-togethers sponsored by various companies. I present them in the order that I attended them.
Intel - The Silver Lining
Intel called their suite "The Silver Lining". Magician Joel Bauer wowed the crowds with amazing tricks.
Intel handed out branded "Snuggies". I had to explain to this guy that he was wearing his backwards.
i/o - Wrestling with your Data Center?
Newcomer "i/o" named their suite "Wrestling with your Data Center?" They invited attendees frustrated with their data centers to don inflated Sumo Wrestling suits.
APC by Schneider Electric - Margaritaville
This will be the last year for Margaritaville, a theme that APC has used now for several years at this conference.
Cisco - Fire and Ice
Cisco had "Fire and Ice" with half the room decorated in Red for fire, and White for ice.
This is Ivana, welcoming people to the "Ice" side.
This is Peter, on the "Fire" side. Cisco tried to have opposites on both sides, savory food on one side, sweets on the other.
CA Technologies - Can you Change the Game?
CA Technologies offered various "sports games", with a DJ named "Coach".
Compellent - Get "Refreshed" at the Fluid Data Hospitality Suite
Compellent chose a low-key format, "lights out" approach with a live guitarist. They had hourly raffles for prizes, but it was too dark to read the raffle ticket numbers.
Of the six, my favorite was Intel. The food was awesome, the Snuggies were hilarious, and the magician was incredibly good. I would like to thank Intel for providing me super-secret inside access to their Cloud Computing training resources and for the Snuggie!
Continuing my catch-up on past posts, Jon Toigo on his DrunkenData blog posted a ["bleg"] for information about deduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM's view. (Jon, as always, you have my permission to post this on your blog!)
Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTier software. Larry Freeman from NetApp already explains A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
Data agnostic. Diligent does not require content-awareness, format-awareness nor identification of backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which have the best opportunity for deduplication benefits. The two main benefits are:
Keeping more backup data available online for fast recovery.
Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that their’s is best because it collapses two functions — de-dupe then ingest — into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
Will this meet my current business requirements?
Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine the necessary ingest rate: how many "Terabytes per Hour" (TB/h) must be received, processed and stored from the backup software during the backup window? IBM intends to document its performance test results of specific software/hardware combinations to provide guidance to clients' purchase and planning decisions.
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what already is stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from 2KB to 16MB in size.
De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of [non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return a "bit-perfect" copy of the data that was written. There are laws against changing the format. For example, suppose an original document in Microsoft Word format is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate the bit-perfect copy. Certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code calculation as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance for collisions might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, that does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing block stored.
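To make the difference concrete, here is a toy sketch of a block store that uses a hash code only to find candidate duplicates, and then confirms each candidate with a full byte-for-byte comparison before discarding anything. This is not the A-SIS or IBM Diligent implementation; the class name, method names and the deliberately weak hash function are all hypothetical, used only to force a collision in a small example.

```python
import hashlib
from typing import Callable, Dict, List


def sha256_digest(block: bytes) -> bytes:
    """Default hash code: SHA-256 of the block contents."""
    return hashlib.sha256(block).digest()


def first_byte_hash(block: bytes) -> bytes:
    """Deliberately weak hash (hypothetical) so that collisions are easy to trigger."""
    return block[:1]


class VerifiedBlockStore:
    """Toy store: hash lookup finds candidates, byte comparison confirms duplicates."""

    def __init__(self, hash_fn: Callable[[bytes], bytes] = sha256_digest) -> None:
        self._hash_fn = hash_fn
        self._blocks: List[bytes] = []                # unique blocks, addressed by index
        self._candidates: Dict[bytes, List[int]] = {}  # hash code -> indexes of stored blocks

    def write(self, block: bytes) -> int:
        """Store a block and return its index, reusing an index only for identical bytes."""
        digest = self._hash_fn(block)
        for index in self._candidates.get(digest, []):
            if self._blocks[index] == block:  # full byte-for-byte check
                return index
        # Either a new hash code, or a collision with different data: keep the new block.
        self._blocks.append(block)
        index = len(self._blocks) - 1
        self._candidates.setdefault(digest, []).append(index)
        return index

    def read(self, index: int) -> bytes:
        return self._blocks[index]

    def unique_block_count(self) -> int:
        return len(self._blocks)


store = VerifiedBlockStore(hash_fn=first_byte_hash)
first = store.write(b"alpha")
second = store.write(b"apple")   # same weak hash code as "alpha", different bytes
third = store.write(b"alpha")    # true duplicate
print(first == third, first != second, store.read(second))
```

A store that trusted the hash code alone would have treated "apple" as a duplicate of "alpha" and returned the wrong bytes on retrieval, which is exactly the exposure described above.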
Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read the data off the back-end disk, using a variety of forensic tools, and piece together strings of personal information, such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working with open industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems. Until then, clients concerned about encryption should consider OS-based or application-based encryption from the backup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBM Diligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are intact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge from this set was unreadable or misplaced, it might impact the access to many TBs of data, or render the entire system unusable.
In the case of 1PB of back-end disk for IBM Diligent, you would have to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives in parallel, it could take you several days for the complete process. This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
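A back-of-the-envelope sketch shows where "over a thousand tapes" and "several days" come from. The figures below are assumptions for illustration only: LTO-4 native capacity (800 GB) and native transfer rate (120 MB/s), 24 drives standing in for "dozens", and ideal streaming with no time counted for mounts, seeks or retries.

```python
PB = 10**15
GB = 10**9
MB = 10**6


def tapes_needed(total_bytes: int, tape_capacity_bytes: int) -> int:
    """Number of cartridges needed to hold total_bytes (integer ceiling division)."""
    return -(-total_bytes // tape_capacity_bytes)


def restore_hours(total_bytes: int, drives: int, drive_bytes_per_sec: int) -> float:
    """Ideal streaming time in hours to read total_bytes across parallel drives."""
    return total_bytes / (drives * drive_bytes_per_sec) / 3600


# Assumed values, not from the original post.
TAPE_CAPACITY = 800 * GB   # LTO-4 native capacity
DRIVE_RATE = 120 * MB      # LTO-4 native transfer rate, bytes per second
DRIVES = 24                # hypothetical stand-in for "dozens of tape drives"

cartridges = tapes_needed(1 * PB, TAPE_CAPACITY)
hours = restore_hours(1 * PB, DRIVES, DRIVE_RATE)
print(f"{cartridges} cartridges, about {hours / 24:.1f} days of streaming")
```

Real restores would take longer than this ideal figure once cartridge mounts, positioning and any unreadable media are taken into account.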
Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supported by analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larry suggests, MS Exchange and SharePoint could be good use cases that represent the possible savings for squeezing out duplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with “update in place” applications, that is, high levels of random writes for non-appending updates. OLTP and Database workloads would not benefit from deduplication.
Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced in software first, and then as implementations mature, move to hardware to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performs LZH compression through software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
On Wikibon, David Floyer has an article titled [SAS Drives Tier 1 to New Levels of Green] that focuses on the energy efficiency benefits of newer Serial-Attach SCSI (SAS) drives over older Fibre Channel (FC) drives. This makes sense, as R&D budgets have been spent on making newer technologies more "green".
Of course, people might consider this an [apples-to-oranges] comparison. Not only are we changing from FC to SAS technology, we are also changing from 3.5-inch drives to small form factor (SFF) 2.5-inch drives. It seems odd to specify 2000 drives, when only two of the five scale up to that level. Few systems in production, from any vendor, have more than 1000 drives, so that would have been a fairer comparison.
However, Hu's conclusion that the combination of SAS and SFF provides better performance and energy efficiency for both IBM DS8800 and HDS VSP than FC-based alternatives from any vendor seems reasonably supported by the data.
Meanwhile, fellow blogger David Merrill (HDS) pokes fun at IBM DS8800 in Figure 2 in his post [Winner o’ the green]. This second comparison was for 4PB of raw capacity, which 4 of the 5 can handle easily using 2TB SATA drives, but the DS8800 is based on SAS technology and does not support 2TB SATA drives. A performance-oriented configuration with four distinct DS8800 boxes employing 600GB SAS drives is used instead, causing the data for the DS8800 to stick out like a sore thumb, or perhaps more intentionally as a middle finger.
The main take-away here is that IBM offers both the DS8700 for capacity-optimized workloads, and the DS8800 for performance-optimized workloads. Some competitors may have been spreading FUD that the DS8700 was withdrawn last month; it wasn't. As you can see from the data presented, there are times where a DS8700 might be preferable to a DS8800, depending on the type of workloads you plan to deploy. IBM offers both, and will continue to support existing DS8700 and DS8800 units in the field for many years to come.
Did you miss IBM Pulse 2013 this week? I wasn't there either, having scheduled visits with clients in Washington DC this week, only to have those meetings cancelled due to the [U.S. sequestration cuts].
Fortunately, there are plenty of videos and materials to review from the event. Here's a [12-minute video] interview between Laura DuBois, Program VP of Storage for industry analyst firm [IDC], and fellow IBM executive Steve "Woj" Wojtowecz, VP of Tivoli Storage and Networking Software.
(Update: Apparently, IBM had not secured re-distribution rights from IDC to post this video prior to my blog post. IBM now has full permission to distribute. My apologies for any inconvenience last week.)
The two discuss client opportunities and requirements for storage clouds and compute clouds. Client cloud storage requirements include backup and archive clouds, file storage clouds, and storage that supports compute cloud environments.
Each quarter since 2006, the [IBM Migration Factory] team has tallied the number of clients who have moved to IBM servers and storage systems from competitive hardware. Well, I've just seen the latest numbers, for the third quarter of 2010, and it looks like we set a new quarterly record with nearly 400 total migrations to IBM from Oracle/Sun and HP.
It's clear that companies and governments worldwide are seeing greater value in IBM systems, while Oracle and HP watch their customer bases erode. In just this past 3Q 2010, nearly 400 clients have moved over to IBM -- almost all of them from Oracle/Sun and HP. Of these, 286 clients migrated to IBM Power Systems, running AIX, Linux and IBM i operating systems, from competitors alone -- nearly 175 from Oracle/Sun and nearly 100 from HP. The number of migrations to IBM Power Systems through the first three quarters of 2010 is nearly 800, already exceeding the total for all of last year by more than 200.
Let's do the math.... Since IBM established its Migration Factory program in 2006, more than 4,500 clients have switched to IBM. More than 1,000 from Oracle/Sun and HP joined the exodus this year alone. In less than five years, almost 3,000 of these clients -- including more than 1,500 from Oracle/Sun and more than 1,000 from HP -- have chosen to run their businesses on IBM's Power Systems. That's more than a client per day making the move to IBM!
And as the servers go, so goes the storage. Clients are re-discovering IBM as a server and storage powerhouse, offering a strong portfolio in servers, disk and tape systems, and how synergies between servers and storage can provide them real business benefits.
Adding it all up, it's clear that IBM's multi-billion dollar investment in helping to build a smarter planet with workload-optimized systems is paying off -- and that, more and more, clients are selecting IBM over the competition to help them meet their business needs.
Guest Post: The following post was written by Tom Rauchut, IBM Infrastructure Architect and Advanced Technical Sales Specialist for Tivoli Automation. Tom is at IBM Pulse 2011 in Las Vegas this week, and has offered to send his observations.
The expo opened last night. There are so many fantastic demos and product experts. Las Vegas has a Tivoli buzz on right now.
In this case, it is not chess pieces, but FUD being slung around like mud between vendors. EMC blogger Chuck Hollis' post [Products vs. Features] correctly points out that IBM has invented nearly everything useful in IT, and sadly a few things we wish we hadn't. Gene Amdahl, who left IBM to start his own company, is credited with coining the phrase describing IBM's innovative sales techniques. Wikipedia has a nice write up on the history of [Fear, Uncertainty and Doubt (FUD)].
Nowadays, when you hear "FUD" most storage administrators immediately think of EMC, who have taken this method to a new level of art-form. Take for example two EMC entries from fellow blogger BarryB, on his Storage Anarchist blog: [Not Dead Yet, and Pushing Daisies]. The first is a reference to a funny scene from a Monty Python movie, and the second one is referring to a terrible new television program called "Pushing Daisies". (In this show, the main character can bring a dead person back to life for sixty seconds, just long enough to ask a few questions on behalf of his detective friend. He must touch the person again within 60 seconds, or someone else randomly dies instead. I am not a fan of this concept, and found it a bit morbid and creepy. But I digress.)
It is true I was on vacation the past two weeks, but this was group travel I booked over six months ago before we had the exact dates lined up for our various announcements, and not a last-minute celebration of my recent new job assignment. I got all my assignments for this announcement turned in before leaving for my trip. I never thought of checking with fellow IBM blogger BarryW to make sure that we don't have overlapping vacation schedules, leaving the "blogosphere" unmanned, so to speak, but it is not a bad idea. Fortunately, our IBM PR team was able to make their rebuttal through other means. You can read the recap on Techworld [Marketing Wars by Proxy].
Several astute readers on my blog, however, requested that I add my two cents. Let's take a look at some of BarryB's comments:
...most DS8300's are to this day most frequently bundled as "free" storage with IBM mainframe and server sales.
We just shipped our 15,000th box, so for this absurd statement to be true, more than half would have to be given away as part of a server-and-storage deal? Actually, about a third of our DS8000 sales are sold with servers in the same bundle, and while we do provide discounts from the official list price, that is not the same as "free". The other two thirds are sold into accounts to be used with the existing servers already deployed. So BarryB, your math doesn't work out. (Perhaps you've been taking Hitachi math lessons???)
It is interesting however, that when we do a 4-year TCO comparison, between a normally-discounted DS8000 versus free EMC DMX4 hardware, IBM still has the lower cost, given that most of the price-gouging from EMC happens after the initial sale, through software features, annual Powerpath renewals and MES upgrades. If you are an EMC customer, and you are planning to add more capacity to your DMX, ask EMC to charge you no more than what you originally paid on a dollar-per-GB basis for the initial capacity. That's only fair, right?
...No thin provisioning, or even a commitment to thin provisioning. Just crickets. (Celerra support since Jan 2006...
EMC DMX does not have thin provisioning available today either, so BarryB brings up Celerra, their NAS box? IBM System Storage N series NAS box also has thin provisioning, so if you want thin provisioning you can buy a NAS box from EMC or IBM. Thin provisioning makes sense using NAS protocols, as there are actual commands to "delete a file" that can then free up the related blocks in a thin-provisioned environment. The only way to do this with block-oriented protocols is to get the OS to notify the storage device that blocks can be freed up. As it turns out, IBM's z/OS has such support, which we developed specifically for our thin-provisioning support in our IBM RAMAC Virtual Array disk systems back in the 1990s. For block-oriented devices on most other operating systems, thin provisioning may not be all that it is cracked up to be.
No SATA drives (only DMX-4 supports native SATA-II drives, since Aug’07)
A few people are confused on this. IBM DS8000 has supported FATA drives for quite some time now. These have the same slower speeds and higher capacities as SATA, but are technically NOT the same as SATA. FATA drives are designed to provide better protection against vibrational shock, to improve reliability of the drives. IBM felt that if the data was important enough to put on a high-end system, it should get better-than-SATA treatment. If you really want SATA, try our IBM System Storage N series, DS4000 or DS3000 models.
No RAID 6 (DMX-3 has supported multi-dimensional RAID since Q1’07, DMX-4 since Aug'07, ...
IBM N series supports RAID6, but we called it RAID-DP and that confused some people. Same thing, DP stands for Dual Parity, protecting against a double-disk failure. We also just announced RAID6 on our DS4000 series, by the way.
No 4Gb back-end (USP-V since May '07, DMX-4 since Aug’07)
I found this one odd, since BarryB himself in an earlier post explained why 4Gbps back-end made no difference to DMX4 performance in this post [DMX-4 and Oh So Much More], which I will put into a different color so you can tell it is from a different post:
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s. Transmit times are really only a tiny portion of I/O overhead, and just don't make that much difference when a massively-cached system is pre-fetching reads, buffering/delaying writes and reordering I/O requests to minimize seek times. Not that 4Gb/s won't help some applications, but most people just won't see any noticeable difference.
In this case, BarryB is right. The IBM DS8000's 2Gbps back-end is not a performance bottleneck. The DS8000 with a 2Gbps back-end is faster than DMX4 with a 4Gbps back-end for business application workloads. EMC doesn't publish SPC benchmarks to deny this, so you will just have to take our word on this.
Still only 1024 maximum disk drives (DMX-3 & 4 support up to 2400 drives, USP-V supports 1152)
I would be curious to see how many customers have more than 1024 drives on any high-end disk array. As we learned back in [Day 2 Storage Symposium], the average DS8100 has 17.4 TB, and DS8300 has 41.5 TB capacity. Using 500GB drives, the DS8300 average is only 83 spindles. Even with 73GB drives, that's 568 spindles. Plenty of room for growth, so I am not convinced that higher theoretical upper architectural limits are worth discussing here.
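The spindle counts are simple division of the average DS8300 capacity by the drive size. A minimal sketch, assuming decimal units (1 TB = 1000 GB) and ignoring RAID overhead and spares, which the figures above also leave out; the function name is hypothetical:

```python
def spindles_for_capacity(capacity_tb: float, drive_gb: float) -> float:
    """Raw drive count needed for a capacity, with no RAID or spare overhead."""
    return capacity_tb * 1000 / drive_gb


DS8300_AVERAGE_TB = 41.5  # average capacity quoted in the post
MAX_DRIVES = 1024         # DS8000 drive limit quoted by BarryB

for drive_gb in (500, 73):
    count = spindles_for_capacity(DS8300_AVERAGE_TB, drive_gb)
    share = 100 * count / MAX_DRIVES
    print(f"{drive_gb}GB drives: about {int(count)} spindles ({share:.0f}% of {MAX_DRIVES})")
```

Even the small-drive case uses a little over half of the 1024-drive limit.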
Still only two HARD LPARs (partitions) ..., and even IBM’s mid-tier products support more than 2 storage partitions (in this same announcement)
IBM's two LPARs are TWICE what EMC DMX offers. I don't even know why anyone from EMC would bring this up. While EMC is enjoying their success with VMware, they lack the experience to carry this over to their storage lines. Until EMC offers MORE THAN TWO of any kind of partitions on their high-end offerings, there just is no credibility here. As for our "storage partitions" on our DS4000 line, that is an unfortunate mis-understanding of the press release. On the DS4000, the term "storage partition" is really "LUN masking", dividing up only which disks can be accessed by which hosts, and not dividing up any processor or cache capacity. So this is not the same as any LPAR concept on any other system. For example, a DS4000 with 64 partitions can be attached to 64 hosts, or 64 host-clusters like a Windows MSCS environment or AIX HACMP.
No native Ethernet replication or iSCSI support (Symmetrix has had since 2002)
Again, I found this one odd. On another EMC post, [Vigorous Debates], Chad Sakac mentions that only 2% of Symmetrix are sold with IP ports, not sure if this is for Ethernet replication, iSCSI attachment, or both (Again, I will use a different color):
On the Symm business (a huge part of EMC’s business – the IP ports are included on 2% of deals. That’s a fact.
Just because an engineer can put a feature or function on a box doesn't mean there is business sense to do so. I would hate for IBM to invest millions of dollars on native iSCSI support, only to have 2% of our DS8000 boxes sold with that feature. Customers who have DS8000 on FC SANs already deployed can easily add iSCSI support either through their SAN switches, or by fronting the DS8000 with an N series gateway. Most customers looking for native iSCSI are the smaller no-SAN-deployed SMB customers, and for them, we have both the DS3300 and the various N series models to choose from.
Well, that's my two cents. The DS8000 series remains a strategic part of the IBM System Storage offering matrix, with continued investment in the development, as well as on-going research that we can leverage throughout the IBM company. I would like to read your thoughts on this; post me a comment below.
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], I presented a session on Storage for the Green Data Center, and attended a System x session on Greening the Data Center. Since they were related, I thought I would cover both in this post.
Storage for the Green Data Center
I presented this topic in four general categories:
Drivers and Metrics - I explained the three key drivers for consuming less energy, and the two key metrics: Power Usage Effectiveness (PUE) and Data Center Infrastructure Efficiency (DCiE).
Storage Technologies - I compared the four key storage media types: Solid State Drives (SSD), high-speed (15K RPM) FC and SAS hard disk, slower (7200 RPM) SATA disk, and tape. I had comparison slides that showed how IBM disk was more energy efficient than competition, for example DS8700 consumes less energy than EMC Symmetrix when compared with the exact same number and type of physical drives. Likewise, IBM LTO-5 and TS1130 tape drives consume less energy than comparable HP or Oracle/Sun tape drives.
Integrated Systems - IBM combines multiple storage tiers in a set of integrated systems managed by smart software. For example, the IBM DS8700 offers [Easy Tier] to offer smart data placement and movement across Solid-State drives and spinning disk. I also covered several blended disk-and-tape solutions, such as the Information Archive and SONAS.
Actions and Next Steps - I wrapped up the talk with actions that data center managers can take to help them be more energy efficient, from deploying the IBM Rear Door Heat Exchanger to improving the management of their data.
Greening of the Data Center
Janet Beaver, IBM Senior Manager of Americas Group facilities for Infrastructure and Facilities, presented on IBM's success in becoming more energy efficient. The price of electricity has gone up 10 percent per year, and in some locations, 30 percent. For every 1 Watt used by IT equipment, there are an additional 27 Watts for power, cooling and other uses to keep the IT equipment comfortable. At IBM, data centers represent only 6 percent of total floor space, but 45 percent of all energy consumption. Janet covered two specific data centers, Boulder and Raleigh.
At Boulder, IBM keeps 48 hours reserve of gasoline (to generate electricity in case of outage from the power company) and 48 hours of chilled water. Many power outages are less than 10 minutes, which can easily be handled by the UPS systems. At least 25 percent of the Computer Room Air Conditioners (CRAC) are also on UPS, so that there is some cooling during those minutes, within the ASHRAE guidelines of 72-80 degrees Fahrenheit. Since gasoline gets stale, IBM runs the generators once a month, which serves as a monthly test of the system, and clears out the lines to make room for fresh fuel.
The IBM Boulder data center is the largest in the company: 300,000 square feet (the equivalent of five football fields)! Because of its location in Colorado, IBM enjoys "free cooling" using outside air temperature 63 percent of the year, resulting in a PUE rating of 1.3. Electricity is only 4.5 US cents per kWh. The center also uses 1 million kWh per year of wind energy.
The Raleigh data center is only 100,000 square feet, with a PUE rating of 1.4. The Raleigh area enjoys 44 percent "free cooling" and electricity costs 5.7 US cents per kWh. The Leadership in Energy and Environmental Design [LEED] has been updated to certify data centers. The IBM Boulder data center has achieved LEED Silver certification, and IBM Raleigh data center has LEED Gold certification.
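For readers who want to relate the two metrics, PUE is total facility power divided by IT equipment power, and DCiE is its reciprocal expressed as a percentage. A minimal sketch applying those standard definitions to the Boulder and Raleigh ratings quoted above; the function names are hypothetical:

```python
def dcie_percent(pue: float) -> float:
    """DCiE (%) is IT power as a share of total facility power, i.e. 100 / PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 100.0 / pue


def overhead_watts_per_it_watt(pue: float) -> float:
    """Power, cooling and other overhead consumed for each Watt of IT load."""
    return pue - 1.0


for site, pue in (("Boulder", 1.3), ("Raleigh", 1.4)):
    print(
        f"{site}: PUE {pue} -> DCiE {dcie_percent(pue):.1f}%, "
        f"{overhead_watts_per_it_watt(pue):.1f} W overhead per IT Watt"
    )
```

A lower PUE means a higher DCiE, so Boulder's greater share of free cooling shows up as the better score on both metrics.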
Free cooling, electricity costs, and disaster susceptibility are just three of the 25 criteria IBM uses to locate its data centers. In addition to the 7 data centers it manages for its own operations, and 5 data centers for web hosting, IBM manages over 400 data centers of other clients.
It seems that Green IT initiatives are more important to the storage-oriented attendees than the x86-oriented folks. I suspect that is because many System x servers are deployed in small and medium businesses that do not have data centers, per se.
I have arrived safely in San Francisco, and was able to check in at the hotel, pick up my registration badge for Oracle OpenWorld 2011, and attend the first keynote session. This is the largest Oracle OpenWorld event to date, with over 45,000 attendees from 117 different countries. There are 520,000 square feet of exhibition floor, and over 2,400 educational sessions. The conference is spread across the different buildings of the Moscone center, as well as nearby hotels. On average, attendees will walk seven miles during the week.
Larry Ellison was the keynote speaker for this first kick-off session. He focused almost exclusively on server and storage hardware. He feels that business is all about moving data, not doing integer math.
At the beginning of 2011, Oracle had only sold about 1,000 Exadata, but they have a sales target to sell an additional 3,000 Exadata boxes by year end.
The Exadata offers up to 10x columnar compression, and has 10x faster bandwidth (40Gbps Infiniband versus 4Gbps FCP). If you have a 100TB database, it would take up only 10TB of disk with this approach. He claims that the 90TB of disk you don't have to buy can then be used to buy more DRAM and/or Flash SSD.
(Realistically, since SSD is 15x more expensive than spinning disk, you can only purchase about 6TB of Flash for the 90TB you save on disk!)
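The arithmetic behind that parenthetical can be sketched as follows, using only the figures quoted above (10x compression, Flash at 15x the price of spinning disk per TB). The function names are hypothetical, and the price multiple is treated as a simple per-TB ratio:

```python
def compressed_size_tb(raw_tb: float, compression_ratio: float) -> float:
    """Disk capacity needed after compression at the given ratio."""
    return raw_tb / compression_ratio


def flash_tb_for_disk_savings(saved_disk_tb: float, flash_price_multiple: float) -> float:
    """Flash capacity purchasable with the money not spent on saved disk capacity."""
    return saved_disk_tb / flash_price_multiple


raw_tb = 100.0
on_disk_tb = compressed_size_tb(raw_tb, 10.0)             # 10 TB actually stored
saved_tb = raw_tb - on_disk_tb                             # 90 TB of disk not purchased
flash_tb = flash_tb_for_disk_savings(saved_tb, 15.0)       # 6 TB of Flash
print(f"Stored: {on_disk_tb} TB, saved: {saved_tb} TB, Flash affordable: {flash_tb} TB")
```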
Larry claims the design point for Exadata and Exalogic was to offer a system that was more powerful than IBM's fastest P795 computer, but cheaper than commodity x86 hardware. His secret is to "Parallel everything" for faster performance, and no single points of failure (SPOF). Exadata offers up to 10-50x faster query, and 4-10x faster OLTP. To keep costs low, Exadata uses all commodity hardware except the Infiniband. He cited various customer examples:
A company replaced 36 Teradata with 3 Exadata, and the result was an application that ran 8x faster.
Banco Chile 9x faster than previous system
Deutsche Post 60x faster
Sogeti gets 60x faster backups.
French bank BNP Paribas 17x faster and no change to applications.
Procter & Gamble 18x faster
Merck 5x faster
Turkcell 250TB compressed to 25TB, 10x faster
The problem was that in each example, the comparison was against the previous system, which varies and could have been an older Sun system, or an old system from HP, IBM or Dell. Perhaps it was a Freudian slip, but Larry mistakenly said "Paralyze" your applications, when he probably meant to "Parallelize".
Of all their 380,000 Oracle customers, 70 percent have SPARC/Solaris and/or Linux. Last week, Oracle announced the new SPARC-T4, which Larry claimed was 5x faster than the previous SPARC-T3. Larry feels that for the first time ever, a non-IBM CPU can challenge the long-standing reign of the IBM POWER series processor. Larry admitted that the IBM POWER7 chip actually did some tasks faster than the SPARC-T4, so his work is not yet done, but they plan to offer a new SPARC-T5 next year that will be 2x better than the SPARC-T4.
Larry compared the I/O bandwidth of servers based on SPARC-T4 to those based on POWER7, and found that the SPARC-T4 has double the I/O bandwidth, for a cost that was only about 1/4 the cost of a mainframe. IBM offers both: POWER7-based servers for CPU-intensive workloads, and System z (S/390)-based systems for I/O-intensive workloads. Larry feels that even though POWER7 is superior to SPARC-T4 for mathematical calculations, all business applications are focused on I/O-bandwidth to move data, not computations.
Larry claims the new SPARC-T4 can do 1.2 million IOPS. He uses 40 Gbps Infiniband instead of traditional SAN-attached FCP solutions.
A new "box" called Exalytics combines their commodity hardware platform with a heuristic adaptive in-memory cache, their latest "me-too" solution that compares with what IBM already offers in [IBM SolidDB]. In fact, their me-too is not even internally developed, but rather the result of an acquisition of a company called "Times Ten". I thought it was interesting that the only piece of Oracle software mentioned during Larry's 90-minute speech was this piece of acquired technology. The new Exalytics product can run on a small rack and grow, analyzing relational data, non-relational OLAP, as well as unstructured documents. The result is what Larry called "the Speed of Light".
He also mentioned that Bob Shimp would kick off the Cloud later in the week. Larry himself has gone from thinking that Cloud was a stupid, over-marketed term that nobody has deployed over the past few years, to a complete believer, claiming that over 20 live demos will be given this year on Cloud.
Perhaps the funniest quote was his motivation to use Infiniband as the interconnect:
"Ethernet was invented by Xerox when I was a child."
-- Larry Ellison
Here are some sessions that IBM is featuring on Monday. Note the first two are Solution Spotlight sessions at the IBM Booth #1111 where I will be most of the time.
IBM Cloud Computing Solutions for Oracle
10/03/11, 10:30 a.m. – 11:00 a.m., Solution Spotlight, Booth #1111 Moscone South
Presenter: Chuck Calio, Technical Strategist, IBM Systems & Technology Group
IBM is recognized in the IT industry as one of the "Big 6" cloud providers, along with Amazon, Google, Microsoft, Salesforce and Yahoo. This session will highlight how IBM Cloud offerings apply to Oracle applications.
Lowering Cost and increasing efficiency in your long term support of Oracle EPM and BI
10/03/11, 3:00 p.m. – 3:30 p.m., Solution Spotlight, Booth #1111 Moscone South
Presenter: Matthew Angelstad, IBM Global Business Solutions - Oracle EPM (Hyperion) Practice Lead
In 2007, Oracle acquired Hyperion, a leading provider of performance management software. This session will show how IBM helps Oracle clients unify Enterprise Performance Management (EPM) and Business Intelligence (BI) in a cost-effective manner, supporting a broad range of strategic, financial and operational management processes.
Application Strategy: Charting the Course for Maximum Business Value
10/03/11, 3:30 p.m. – 4:30 p.m., OpenWorld session #39061
Presenter: Mike Marchildon, IBM
The industry is undergoing a shift from a single Enterprise Resource Planning (ERP) application to second-generation platforms containing diverse yet interdependent systems. This shift presents opportunities and challenges for both IT and the business.
This week I was aboard the Queen Mary in Long Beach, California! This was a business event organized by [Key Info Systems], a valued IBM Business Partner. Key Info resells IBM servers, storage and switches.
The Queen Mary retired in 1967, and has been converted into a hotel and events venue. The locals just parked their car and walked on board, but I got to stay Tuesday through Thursday in one of the cabins. It was long and narrow, with round windows! There were four dials for the bathtub: Cold Salt, Hot Fresh, Cold Fresh, and Hot Salt.
Stepping on the boat was like walking back in time through history! If you decide to go see it, check out the [Art Deco bar] at the front of the Promenade deck. The ship is still in the water, but is permanently docked. It is sectioned off to prevent the ocean waves from affecting it, so we did not have the nauseating rocking back and forth normally associated with cruise ships.
(It is with a bit of irony that we are on the Queen Mary just days after the tragedy of the [Costa Concordia], the largest Italian cruise ship that ran aground near Isola del Giglio. The captain will have to explain how he [fell into a lifeboat] before he had a chance to wait for everyone else to get safely off the shipwreck. He was certainly no [Captain Sully]! I am thankful that most of the 4,200 people survived the incident.)
Lief Morin, Founder and Chief Executive for Key Info Systems, kicked off the meeting with highlights of 2011 successes. I have known Lief for years, as Key Info comes to the Tucson EBC on a frequent basis. This event was designed to give his sellers an update of what is the latest for each product line, and what to look forward to in the next 12-18 months.
The next speaker was from Vision Solutions, which provides High Availability solutions for IBM i on Power Systems. In 2010, their company nearly doubled in size with the acquisition of Double-Take, which provides data replication for x86 servers running Windows, Linux, VMware, Hyper-V and other hypervisors. The capabilities of Double-Take sounded similar to what IBM offers with [Tivoli Storage Manager FastBack] and [Tivoli Storage Manager for Virtual Environments].
Dinner at Sir Winston's
Rather than take the "Ghosts and Legends" tour, I opted for dinner at the Queen Mary's signature restaurant, Sir Winston's. This is a fancy place, so dress accordingly. If you want the Raspberry soufflé, order it early as it takes 30 minutes to prepare!
[Storwize V7000], including the new Storwize V7000 Unified configuration
Storage is an important part of the Key Info Systems revenue stream, so I was glad to have lots of questions and interactions from the audience.
Murder Mystery Dinner
The acting troupe from [Dinner Detective] put on quite the show for us! With all that is going on in the world, it is good to laugh out loud every now and then.
In other murder mystery dinners I have participated in, each person is assigned a "character" and given a script of what to say and when to say it. This was different: we got to pick our own characters. I chose "Doctor Watson", from the Sherlock Holmes series. Several attendees thought it was a double meaning with [IBM Watson], the computer that figured out the clues on the Jeopardy! television game show, and has since been [put to work at Wellpoint] to help out the Healthcare industry.
After the "murder" happened, two actors portraying policemen selected members of the audience to answer questions. We didn't get a script of what to say, so everyone had to "ad lib". I was singled out as a suspect, and had fun playing along in character. One of the attendees afterwards said he was impressed that I was able to fabricate such amusing and elaborate responses to their personal and embarrassing questions. As a public speaker for IBM, I have had a lot of practice thinking quickly on my feet.
Fibre Channel and Ethernet Switches
The next two speakers gave us an update on Fibre Channel and Ethernet switches, and their thoughts on the inevitability of Fibre Channel over Ethernet (FCoE). One of the exciting new developments is the [Brocade Network Subscription] which creates a flexible pay-per-use Ethernet port rental model for customers. This is especially timely given the Financial Accounting Standards Board proposed [FASB Change 13] that affects operating leases in the balance sheet.
With the Brocade Network Subscription, you pay monthly for the ports you are using. Need more ports? Brocade will install the added gear. Use fewer ports? Brocade will take the equipment back. There is no term endpoint or residual value like traditional leasing, so when you are done using the equipment, give it back any time. This is ideal for companies that may need to have a lot of Ethernet ports for the next 2-3 years, but then plan to taper down, and don't want to get stuck with a long-term commitment or capital depreciation.
The last speaker was from VMware. IBM is the #1 reseller of VMware, and VMware commands an impressive 81 percent marketshare in the x86 virtualization space. The speaker presented VMware's strategy going forward, which aligns well with IBM's own strategy, to help companies Cloud-enable their existing IT infrastructures, in preparation for eventual moves to Hybrid or Public cloud deployments.
Special thanks to Lief Morin for sponsoring this event, Raquel Hernandez from IBM for coordinating my travel, and Pete, Christina and Kendrell from Key Info Systems for organizing the activities!
I'm down here in Australia, where the government has been a bit stalled for the past two weeks, known formally as being managed by the [Caretaker government]. Apparently, there is a gap between the outgoing administration and the incoming administration, and the caretaker government is doing as little as possible until the new regime takes over. They are still counting votes, including in some cases dummy ballots known as "donkey votes", the Australian version of the hanging chad. Three independent parties are also trying to decide which major party they will support to finalize the process.
While we are on the topic of a government stalled, I feel bad for the state of Virginia in the United States. Apparently, one of their supposedly high-end enterprise class EMC Symmetrix DMX storage systems, supporting 26 different state agencies in Virginia, crashed on August 25th and now more than a week later, many of those agencies are still down, including the Department of Motor Vehicles and the Department of Taxation and Revenue.
Many of the articles in the press on this event have focused on what this means for the reputation of EMC. Not surprisingly, EMC says that this failure is unprecedented, but really this is just one in a long series of failures from EMC. It reminds me of the last time EMC had a public failure, with a dual-controller CLARiiON a few months ago that halted another company's operations. There is nothing unique in the physical equipment itself; all IT gear can break or be taken down by some outside force, such as a natural disaster. The real question, though, is why haven’t EMC and the State Government been able to restore operations many days after the hardware was fixed?
In the Boston Globe, Zeus Kerravala, a data storage analyst at Yankee Group in Boston, is quoted as saying that such a high-profile breakdown could undermine EMC’s credibility with large businesses and government agencies. “I think it’s extremely important for them,’’ said Kerravala. “When you see a failure of this magnitude, and their inability to get a customer like the state of Virginia up and running almost immediately, all companies ought to look at that and raise their eyebrows.’’
Was the backup and disaster recovery solution capable of the scale and service level requirements needed by vital state agencies? Had they tested their backups to ensure they were running correctly, and had they tested their recovery plans? Were they monitoring the success of recent backup operations?
Eventually, the systems will be back up and running, fines and penalties will be paid, and perhaps the guy who chose to go with EMC might feel bad enough to give back that new set of golf clubs, or whatever ridiculously expensive gift EMC reps might offer to government officials these days to influence the purchase decision making process.
(Note: I am not accusing any government employee in particular working at the state of Virginia of any wrongdoing, and mention this only as a possibility of what might have happened. I am sure the media will dig into that possibility soon enough during their investigations, so no sense in me discussing that process any further.)
So what lessons can we learn from this?
Lesson 1: You don't just buy technology, you also are choosing to work with a particular vendor
IBM stands behind its products. Choosing a product strictly on its speeds and feeds misses the point. A study IBM and Mercer Consulting Group conducted back in 2007 found that only 20 percent of the purchase decision for storage was from the technical capabilities. The other 80 percent were called "wrapper attributes", such as who the vendor was, their reputation, the service, support and warranty options.
Lesson 2: Losing a single disk system is a disaster, so disaster recovery plans should apply
IBM has a strong Business Continuity and Recovery Services (BCRS) group to help companies and government agencies develop their BC/DR plans. In the planning process, various possible incidents are identified, recovery point objectives (RPO) and recovery time objectives (RTO) are set, and then appropriate action plans are documented on how to deal with them. For example, if the state of Virginia had an RPO of 48 hours, and an RTO of 5 days, then when the failure occurred on August 25, they could have recovered up to August 23 level data (48 hours prior to the incident) and be up and running by August 30 (five days after the incident). I don't personally know what RPO and RTO they planned for, but it certainly seems like they have missed them already.
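To make the RPO/RTO arithmetic above concrete, here is a minimal Python sketch. The function name is my own invention for illustration, and the year 2010 is an assumption (the example above only gives the month and day). It simply subtracts the RPO from the incident time and adds the RTO to it.

```python
from datetime import datetime, timedelta

def recovery_targets(incident, rpo_hours, rto_days):
    """Return (oldest acceptable data point, latest acceptable restore time)."""
    recovery_point = incident - timedelta(hours=rpo_hours)
    restore_deadline = incident + timedelta(days=rto_days)
    return recovery_point, restore_deadline

# Hypothetical figures from the example: RPO of 48 hours, RTO of 5 days.
# The year is assumed; only "August 25" is given above.
incident = datetime(2010, 8, 25)
point, deadline = recovery_targets(incident, rpo_hours=48, rto_days=5)

print(point.date())     # 2010-08-23
print(deadline.date())  # 2010-08-30
```

Any agency still down after the second date has, by definition, missed its RTO.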
Lesson 3: BC/DR Plans only work if you practice them often enough
Sadly, many companies and government agencies make plans, but never practice them, so they have no idea if the plans will work as expected, or if they are fundamentally flawed. Just as we often have fire drills that force everyone to stop what they are doing and vacate the office building, anyone with an IT department needs to practice BC/DR plans often enough, not only to ensure the plan itself is solid, but also so that the people involved know what to do and their respective roles in the recovery process.
Lesson 4: This can serve as a wake-up call to consider Cloud Computing as an alternative option
Are you still doing IT in your own organization? Do you feel all of the IT staff have been adequately trained for the job? If your biggest disk system completely failed, not just a minor single or double drive failure, but a huge EMC-like failure, would your IT department know how to recover in less than five days? Perhaps this will serve as a wake-up call to consider alternative IT delivery options. The advantage of big Cloud Service Providers (Microsoft, Google, Yahoo, Amazon, SalesForce.com and of course, IBM) is that they are big enough to have worked out all the BC/DR procedures, and have enough resources to switch over to in case any individual disk system fails.
Recently, I spoke with Jarrett Potts, my long-time friend and former IBM colleague, who now works as Director of Strategic Marketing over at STORServer. If you have never heard of STORServer, it is a company that makes purpose-built backup appliances.
What is a Backup Appliance? It is an integrated solution of hardware and software that serves a single purpose: backup and recovery. STORServer Enterprise Backup Appliance (EBA) combines IBM's high-end x86 M4 server, IBM disk and tape storage, and IBM Tivoli Storage Manager (TSM) backup software.
(Fun Fact: The 2012 IBM year-end financial results were announced last month. IBM not only continues its #1 lead in servers overall, but has the #1 marketshare for high-end x86 servers, market-leading disk and tape storage hardware, and market leading backup software.)
To determine the appropriate size of your backup appliance, the folks at STORServer help you every step of the way. They figure out the number of TB you will backup every day, and even help configure all of the TSM server parameters to achieve the policies that make the most sense for your organization.
The appliance can backup every type of data, from databases and Virtual Machines (VMs) to documents, spreadsheets, and other unstructured data.
Are you then left with a solution too complicated to run yourself? No. The STORServer Console is an easy-to-use GUI for ongoing monitoring and maintenance. Plus, your friends at STORServer are only a phone call away in case you have any questions.
(FTC Disclosure: I work for IBM, and STORServer is an approved IBM Business Partner that uses IBM hardware and software to build their solution. I have no financial interest in STORServer, and was not paid by STORServer to mention their company or products on my blog. This post may be considered a celebrity endorsement of STORServer and its Enterprise Backup Appliances.)
Perhaps my readers feel that I am a bit biased in describing a TSM-based solution, and you want a second opinion. No worries, I understand. In the latest 165-page [2012 DCIG Backup Appliance Buyer's Guide], the STORServer models ranked very high. Here is an excerpt:
"Nowhere is this demand for purpose built appliances more evident than in the rise of purpose
built backup appliances (PBBAs) over the last few years and their anticipated growth rate
going forward. A recent market analysis performed by IDC found that worldwide PBBA revenue totaled $2.4 billion in 2011 which was a 42.4 percent increase over the prior year.
This scoring came into play in preparing this Buyer's Guide as the STORServer EBA 3100 model scored so highly overall that it fell outside of the two (2) standard deviations that DCIG generally uses as a guideline for inclusion and exclusion of products.
The reason DCIG included this model in this Buyer's Guide whereas in other situations it might not is that DCIG is unaware of any other backup appliance(s) from any other providers that come close to matching the EBA 3100's software and hardware attributes. As such, DCIG felt it would be doing STORServer specifically and the market generally a disservice by not highlighting in this Buyer's Guide that such a backup appliance existed and was generally available for purchase."
Backup Appliance Models
STORServer EBA 3100
Symantec NetBackup 5220 Backup Appliance
STORServer EBA 2100
STORServer EBA 1100
STORServer EBA 800
Symantec Backup Exec 3600 Appliance
The STORServer is ideal for small and medium-sized businesses (SMB), but can scale quite large to handle business growth. If you are unhappy with your current backup environment, and feel now is the time to look around for a better way of taking backups, you won't go wrong choosing a solution based on IBM's market-leading server and storage hardware with Tivoli Storage Manager software.
Bill Bauman, IBM System x Field Technical Support Specialist and System x University celebrity, presented the differences between Grid, SOA and Cloud Computing. I thought this was an odd combination to compare and contrast, but his presentation was well attended.
Grid - this is when two or more independently owned and managed computers are brought together to solve a problem. Some research facilities do this. IBM helped four hospitals connect their computers together into a grid to help analyze breast cancer. IBM also supports the [World Community Grid] which allows your personal computer to be connected to the grid and help process calculations.
SOA - SOA, which stands for Service Oriented Architecture, is an approach to building business applications as a combination of loosely-coupled black-box components orchestrated to deliver a well-defined level of service by linking together business processes. I often explain SOA as the business version of Web 2.0. You can download a free copy of the eBook "SOA for Dummies" at the [IBM Smart SOA] landing page.
Cloud - A Cloud is a dynamic, scalable, expandable, and completely contractible architecture. It may consist of multiple, disparate, on-premise and off-premise hardware and virtualized platforms hosting legacy, fully installed, stateless, or virtualized instances of operating systems and application workloads.
Tom Vezina, IBM Advanced Technical Sales Specialist, presented "Chaos to Cloud Computing". Survey results show that roughly 70 percent of cloud spend will be for private clouds, and 30 percent for public, hybrid or community clouds. Of the key motivations for public cloud, 77 percent of respondents cited reducing costs, 72 percent time to value, and 50 percent improving reliability.
Tom ran over 500 "server utilization" studies for x86 deployments during the past eight years. Of these, the worst was 0.52 percent CPU utilization, the best was 13.4 percent, and the average was 6.8 percent. When IBM mentions that 85 percent of server capacity is idle, it is mostly due to x86 servers. At this rate, it seems easy to put five to 20 guest images onto a machine. However, many companies encounter "VM stall" where they get stuck after only 25 percent of their operating system images are virtualized.
He feels the problem is that most Physical-to-Virtual (P2V) migrations are manual efforts. There are tools available like Novell [PlateSpin Recon] to help automate and reduce the total number of hours spent per migration.
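As a rough back-of-the-envelope check on the utilization figures from Tom's studies, here is a small Python sketch. The function name and the 70 percent target utilization are my own assumptions, not something Tom presented, and the estimate looks only at CPU, ignoring memory, I/O and workload peaks.

```python
import math

def guests_per_host(avg_util_pct, target_util_pct=70.0):
    """Estimate how many guests fit on one host, by CPU utilization alone.

    Assumes every guest behaves like the measured average server and that
    the host hardware matches the original servers (a big simplification).
    """
    if avg_util_pct <= 0:
        raise ValueError("average utilization must be positive")
    return math.floor(target_util_pct / avg_util_pct)

# Utilization figures quoted from the studies; the target is assumed.
print(guests_per_host(13.4))  # 5   (best-utilized servers)
print(guests_per_host(6.8))   # 10  (average servers)
```

Under these assumptions, the best and average cases land inside the "five to 20" range mentioned in the session.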
System x KVM Solutions
Boy, I walked into this one. Many of IBM's cloud offerings are based on the Linux hypervisor called Kernel-based Virtual Machine [KVM] instead of VMware or Microsoft Hyper-V. However, this session was about the "other KVM": keyboard video and mouse switches, which thankfully, IBM has renamed to Console Managers to avoid confusion. Presenters Ben Hilmus (IBM) and Steve Hahn (Avocent) presented IBM's line of Local Console Managers (LCM) and Global Console Managers (GCM) products.
LCM are the traditional KVM switches that people are familiar with. A single keyboard, video and mouse can select among hundreds of servers to perform maintenance or check on status. GCM adds KVM-over-IP capabilities, which means that now you can access selected systems over the Ethernet from a laptop or personal computer. Both LCM and GCM allow for two-level tiering, which means that you can have an LCM in each rack, and an LCM or GCM that points to each rack, greatly increasing the number of servers that can be managed from a single pane of glass.
Many servers have a "service processor" to manage the rest of the machine. IBM RSA II, HP iLO, and Dell DRAC4 are some examples. These allow you to turn on and off selected servers. IBM BladeCenter offers a Management Module that allows the chassis to be connected to a Console Manager and select a specific blade server inside. These can also be used with VMware viewer, Virtual Network Computing (VNC), or Remote Desktop Protocol (RDP).
IBM's offerings are unique in that you can have an optical CD/DVD drive or USB external storage attached at the LCM or GCM, and make it look like the storage is attached to the selected server. This can be used to install or upgrade software, transfer log files, and so on. Another great use, and apparently the motivation for having this session in the "Federal Track", is that the USB can be used to attach a reader for a smart card, known as a Common Access Card [CAC] used by various government agencies. This provides two-factor authentication [TFA]. For example, to log into the system, you enter your password (something you know) and swipe your employee badge smart card (something you have). The combination is validated at the selected server to provide access.
I find it amusing that server people limit themselves to server sessions, and storage people to storage sessions. Sometimes, you have to step "outside your comfort zone" and learn something new, something different. Open your eyes and look around a bit. You might just be surprised what you find.
(FTC note: I work for IBM. IBM considers Novell a strategic Linux partner. Novell did not provide me a copy of PlateSpin Recon, I have no experience using it, and I mention it only in context of the presentation made. IBM resells Avocent solutions, and we use LCM gear in the Tucson Executive Briefing Center.)
Recently, a client asked how to backup their IBM PureData System for Analytics devices. IBM had [acquired Netezza in November 2010], and later renamed their TwinFin devices as the IBM PureData for Analytics, powered by Netezza.
The [IBM PureData System for Analytics] is incredibly fast for performing deep, ad-hoc analytics. However, the people who use them are "data scientists", not backup experts.
Likewise, there are backup administrators who may not be familiar with the unique characteristics of this expert-integrated system to know what backup options are available.
As with the rest of the IBM PureSystems line, the IBM PureData System for Analytics (or, PDA for short) has a combination of servers, storage and switches inside.
In a full-frame PDA, there are two servers in Active/Passive mode. These coordinate activity to FPGA-based blade servers, which have parallel access to hundreds of disk drives, storing nearly 200 TB of compressed database data. A system can span up to four frames.
But what do you backup? And why? You don't need to worry about backing up the Linux operating system or NPS server code. That is considered firmware, and if anything ever got corrupted, IBM would help restore it for you. System-wide metadata, such as the host catalog and global users, groups, and permissions should be backed up periodically to protect against data corruption.
There are a number of reasons to backup your user databases:
As part of firmware upgrade/downgrade
To transfer data to another system
Protect against hardware failure / disaster
Protect against data corruption
The PDA has three backup formats. You can backup the entire user database in compressed format, backup individual tables in compressed format, or export to a text-format file.
Compressed format is faster, but can only be restored to the same PDA, or a PDA that has the same or higher level of NPS firmware. The text-format is slower, but can be used to restore to lower levels of NPS firmware, or to other database systems.
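The trade-off between the two formats boils down to a simple rule, which can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and representing NPS levels as (major, minor) tuples is my own simplification.

```python
def choose_backup_format(source_nps, target_nps=None):
    """Pick a backup format based on where the data will be restored.

    source_nps / target_nps are (major, minor) tuples, a simplification
    of real NPS firmware levels. target_nps=None means the restore
    target is some other database system, not a PDA.
    """
    if target_nps is not None and target_nps >= source_nps:
        return "compressed"  # faster; needs same or higher NPS level
    return "text"            # slower; portable to lower levels or other systems

print(choose_backup_format((6, 0), (6, 0)))  # compressed
print(choose_backup_format((7, 0), (6, 0)))  # text
print(choose_backup_format((6, 0), None))    # text
```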
There are basically two methods to backup your PDA. The first is called the "Filesystem" method. Basically, you can attach an external storage device to the NPS server, and use the built-in command line interface (CLI) to store the backups onto its file system.
On NPS version 6, the nzhostbackup will backup the /nz/data directory which stores the system tables, database catalogs, configuration files, query plans, and cached executable code for the SPU blade servers.
(I have heard that the nzhostbackup will get deprecated in NPS version 7, but I only have access to version 6. As always, [RTFM] for your specific NPS code level.)
The nzbackup with the users parameter will backup the global users, groups and permissions. This is included in the /nz/data backup contents from the nzhostbackup command, but you may want to backup and restore these separately.
The nzbackup with the db parameter will backup a user database in compressed format. To backup individual tables, use the CREATE EXTERNAL TABLE command, which can create compressed or text-format exports.
You may find that your databases are so large, they will exceed the limits of the filesystem on the external storage device. For SAN or NAS deployments, I recommend the IBM Storwize V7000 Unified with IBM General Parallel File System (GPFS). However, if you are using something else, you may need to use the "nz_backup" scripts provided which split up the backup images into smaller pieces that most other filesystems can handle.
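To illustrate the general idea of splitting a large backup image into pieces that fit under a filesystem's file-size limit, here is a minimal Python sketch. To be clear, this is not the "nz_backup" script itself, whose internals I have not examined; the function name, the piece naming scheme and the tiny sizes in the demo are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

def split_file(path, piece_bytes, buffer_bytes=1024 * 1024):
    """Split a file into numbered pieces of at most piece_bytes each."""
    src = Path(path)
    pieces = []
    index = 0
    with src.open("rb") as infile:
        while True:
            piece_path = src.with_name(f"{src.name}.{index:04d}")
            written = 0
            with piece_path.open("wb") as out:
                # Copy in small buffers so large images never sit in memory.
                while written < piece_bytes:
                    chunk = infile.read(min(buffer_bytes, piece_bytes - written))
                    if not chunk:
                        break
                    out.write(chunk)
                    written += len(chunk)
            if written == 0:
                piece_path.unlink()  # nothing left to copy; drop the empty piece
                break
            pieces.append(piece_path)
            index += 1
    return pieces

# Demo with a 10-byte stand-in for a backup image and a 4-byte limit.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "backup.img"
    image.write_bytes(b"x" * 10)
    pieces = split_file(image, piece_bytes=4)
    sizes = [p.stat().st_size for p in pieces]

print(sizes)  # [4, 4, 2]
```

Restoring is the reverse: concatenate the pieces in numeric order before handing the image back to the restore command.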
The PDA comes with 10GbE ports that you can use to attach a NAS storage device over a Local Area Network (LAN), or you can add Fibre Channel Protocol (FCP) ports and connect over a Storage Area Network (SAN). To keep things simple, I will refer to whichever network you choose as the "Backup Network" in the drawings.
The second method for backup is called the "External Backup Software" method. As you have probably guessed, it involves sending the backups to a supported software product like IBM Tivoli Storage Manager (or, TSM for short).
In this case, the PDA acts as a client node, similar to a laptop, desktop, or application server with internal disk. Backup data is sent over the LAN to the designated TSM server, and the TSM server in turn writes over the SAN to its storage hierarchy of disk, virtual tape and/or physical tape resources.
Backups can be done by command "on demand", or automated on a schedule. For the /nz/data directory, direct the nzhostbackup command to send the backup copy to local disk, then use TSM's dsmc archive command to transfer this backup copy to the TSM server.
For nzbackup with the users or db parameters, you can send the data directly to the appropriate TSM server by specifying the connector and connectorArgs parameters.
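As a rough sketch of how these steps might be scripted, the Python below only builds the command lines as lists of arguments; it does not run anything. Please treat the exact spelling as an assumption on my part: the dash-prefixed flags, the "tsm" connector name, the file path and the database name are placeholders for illustration, so check the documentation for your NPS and TSM levels before using anything like this.

```python
import shlex

def nzbackup_to_tsm(database=None, users=False, connector_args=None):
    """Build (not run) an nzbackup command that targets the TSM connector.

    Flag spellings and the connector name are assumed, not verified.
    """
    if bool(database) == users:
        raise ValueError("choose exactly one of database or users")
    argv = ["nzbackup"]
    argv += ["-users"] if users else ["-db", database]
    argv += ["-connector", "tsm"]
    if connector_args:
        argv += ["-connectorArgs", connector_args]
    return argv

def host_backup_then_archive(local_file):
    """Two-step flow for /nz/data: back up to local disk, then archive to TSM."""
    return [["nzhostbackup", local_file], ["dsmc", "archive", local_file]]

# Hypothetical path and database name, for illustration only.
commands = host_backup_then_archive("/backups/nzhost.tar.gz")
commands.append(nzbackup_to_tsm(users=True))
commands.append(nzbackup_to_tsm(database="SALES"))

for argv in commands:
    print(shlex.join(argv))
```

Keeping the commands as argument lists makes it easy to hand them to a scheduler later, once the real syntax has been confirmed.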
To reduce traffic on the TSM Server, an intermediary "TSM Proxy Node" can be put in between. In this case, the PDA sends the backup to the Proxy Node, the Proxy Node uses a "LAN Free Storage Agent" to send the backups directly to the virtual tape and/or physical tape, and then notifies the TSM Server to update its system catalog to record which tape holds these new backups.
Another configuration involves installing the TSM LAN Free storage agent directly on the PDA. While this will require FCP ports to be added and consume more CPU resources on the NPS server, it eliminates most of the LAN traffic, allowing the PDA to send its backups directly to virtual or physical tape.
Continuing my drawn out coverage of IBM's big storage launch of February 9, today I'll cover the IBM System Storage TS7680 ProtecTIER data deduplication gateway for System z.
On the host side, TS7680 connects to mainframe systems running z/OS or z/VM over FICON attachment, emulating an automated tape library with 3592-J1A devices. The TS7680 includes two controllers that emulate the 3592 C06 model, with 4 FICON ports each. Each controller emulates up to 128 virtual 3592 tape drives, for a total of 256 virtual drives per TS7680 system. The mainframe sees up to 1 million virtual tape cartridges, up to 100GB raw capacity each, before compression. For z/OS, the automated library has full SMS Tape and Integrated Library Management capability that you would expect.
Inside, the two control units are both connected to a redundant pair cluster of ProtecTIER engines running the HyperFactor deduplication algorithm, which is able to process the deduplication inline, as data is ingested, rather than the post-process approach that other deduplication solutions use. These engines are similar to the TS7650 gateway machines for distributed systems.
On the back end, these ProtecTIER deduplication engines are then connected to external disk, up to 1PB. If you get 25x data deduplication ratio on your data, that would be 25PB of mainframe data stored on only 1PB of physical disk. The disk can be any disk supported by ProtecTIER over FCP protocol, not just the IBM System Storage DS8000, but also the IBM DS4000, DS5000 or IBM XIV storage system, various models of EMC and HDS, and of course the IBM SAN Volume Controller (SVC) with all of its supported disk systems.
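The capacity figures quoted for the TS7680 are simple multiplication, as this small Python sketch shows. The constant and function names are my own, and I am assuming decimal units (1 PB = 1,000,000 GB).

```python
CONTROLLERS = 2
DRIVES_PER_CONTROLLER = 128
MAX_CARTRIDGES = 1_000_000
GB_PER_CARTRIDGE = 100
GB_PER_PB = 1_000_000  # decimal units assumed

total_virtual_drives = CONTROLLERS * DRIVES_PER_CONTROLLER
max_virtual_pb = MAX_CARTRIDGES * GB_PER_CARTRIDGE // GB_PER_PB

def logical_pb(physical_pb, dedup_ratio):
    """Mainframe data that fits on the back-end disk at a given dedup ratio."""
    return physical_pb * dedup_ratio

print(total_virtual_drives)  # 256 virtual tape drives
print(max_virtual_pb)        # 100 PB of virtual cartridge capacity
print(logical_pb(1, 25))     # 25 PB stored on 1 PB of physical disk
```

The 25x ratio is only the example used above; your actual ratio depends on your data.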
You may not be the right person to ask but I am asking everyone so "How do you see hybrid disk drives?"
(For the record, I am not immediately related to Robert. At one point, "Pearson" was the 12th most common surname in the USA, but now doesn't even make the Top 100.)
Robert, I would like to encourage you and everyone else to ask questions. Don't worry if I am the wrong person to ask, as I probably know the right person within IBM. Some people have called me the "Kevin Bacon" of Storage, as I am often less than six degrees away from the right person, having worked in IBM Storage for over 20 years.
For those not familiar with hybrid drives, there is a good write-up in Wikipedia.
Unfortunately, most of the people I would consult on this question, such as those from Market Intelligence or Research, are on vacation for the holidays, so, Robert, I will have to rely on my trusted 78-card Tarot deck and answer you with a five-card throw.
Your first card, Robert, is the Hermit. This card represents "introspection". The best I/O is no I/O, which means that if applications can keep the information they need inside server memory, you can avoid the bus bandwidth limitations of going to external storage devices. Where external storage makes sense is when data is shared between servers, or when the single server is limited to a set amount of internal memory. So, consider maxing out the memory in your server first (IBM would be glad to sell you more internal memory!!!), then consider outside solid-state or hybrid devices. 32-bit Windows, for example, has an architectural limit of 4GB.
Your second card, Robert, is the Four of Cups, representing "apathy". On the card, you see three cups together, with the fourth cup being delivered from a cloud. This reminds me that we have three storage tiers already (memory, disk, tape), and introducing a fourth tier into the mix may not garner much excitement. For the mainframe, IBM introduced a Solid-State Device, called the Coupling Facility, which can be accessed from multiple System z servers. It is used heavily by DFSMS and DB2 to hold shared information. However, given some customers' apathy towards Information Lifecycle Management which includes "tiered storage", introducing yet another tier that forces people to decide what data goes where may be another challenge.
Your third card, Robert, is the Chariot, which represents "Speed, Determination, and Will". In some cases, solid state disks are faster for reading, but can be slower for writing. In the case of a hybrid drive, where the memory acts as a front-end cache, read-hits would be faster, but read-misses might be slower. While the idea of stopping the drives during inactivity will reduce power consumption, spinning up and slowing down the disk may incur additional performance penalties. At the time of this post, the fastest disk system remains the IBM SAN Volume Controller, based on SPC-1 and SPC-2 benchmarks in excess of those published for other devices.
Your fourth card, Robert, is the Eight of Pentacles, which represents "Diligence, Hard work". The pentacles are coins with five-sided stars on them, and this often represents money. Our research team has projected that spinning disk will continue to be a viable and profitable storage media for at least another eight years.
Your fifth and last card, Robert, is the World, which normally represents "Accomplishment", but since it is turned upside down, the meaning is reversed to "Limitation". Some hybrid disks, and some types of solid state memory in general, do have limitations in the number of write cycles they can handle. Those unhappy with the frequency and slowness of rebuilds on SATA disk may find similar problems with hybrid drives. For that reason, businesses may not trust using hybrid drives for their busiest, mission-critical applications, but certainly might use them for archive data with lower write-cycle requirements.
The tarot cards are never wrong, but certainly interpretations of the cards can be.
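For those who prefer arithmetic to tarot, the Chariot card's point about read-hits, read-misses and spin-up penalties can be put into a simple expected-latency formula. The Python sketch below is only a toy model: the function name is mine, and every latency number in the example is an illustrative assumption, not a measurement of any real drive.

```python
def average_read_ms(hit_ratio, cache_ms, disk_ms, spinup_ms=0.0, asleep_fraction=0.0):
    """Expected read latency for a drive with a front-end memory cache.

    A miss pays for the cache lookup, then the disk access, plus the
    spin-up delay for the fraction of misses that find the disk stopped.
    """
    miss_ms = cache_ms + disk_ms + asleep_fraction * spinup_ms
    return hit_ratio * cache_ms + (1 - hit_ratio) * miss_ms

# Assumed numbers: 0.1 ms cache hit, 10 ms disk access, 3000 ms spin-up.
always_spinning = average_read_ms(0.9, 0.1, 10.0)
often_asleep = average_read_ms(0.9, 0.1, 10.0, spinup_ms=3000.0, asleep_fraction=0.5)

print(round(always_spinning, 2))
print(round(often_asleep, 2))
```

With these made-up numbers, a 90 percent hit ratio looks great while the disk keeps spinning, but the average balloons once half of the misses have to wait for a spin-up.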