It's official! My "blook" Inside System Storage - Volume I is now available.
This blog-based book, or “blook”, comprises the first twelve months of posts from this Inside System Storage blog, 165 posts in all, from September 1, 2006 to August 31, 2007. Foreword by Jennifer Jones. 404 pages.
- IT storage and storage networking concepts
- IBM strategy, hardware, software and services
- Disk systems, Tape systems, and storage networking
- Storage and infrastructure management software
- Second Life, Facebook, and other Web 2.0 platforms
- IBM’s many alliances, partners and competitors
- How IT storage impacts society and industry
You can choose between hardcover (with dust jacket) or paperback versions:
This is not the first time I've been published. I have authored articles for storage industry magazines, written large sections of IBM publications and manuals, submitted presentations and whitepapers to conference proceedings, and even had a short story published with illustrations by the famous cartoon writer [Ted Rall].
But I can say this is my first blook, and as far as I can tell, the first blook from IBM's many bloggers on DeveloperWorks, and the first blook about the IT storage industry. I got the idea when I saw [Lulu Publishing] run a "blook" contest. The Lulu Blooker Prize is the world's first literary prize devoted to "blooks"--books based on blogs or other websites, including webcomics. The [Lulu Blooker Blog] lists past years' winners. Lulu is one of the new innovative "print-on-demand" publishers. Rather than printing hundreds or thousands of books in advance, as other publishers require, Lulu doesn't print them until you order them.
I considered cute titles like A Year of Living Dangerously, or An Engineer in Marketing La-La land, or Around the World in 165 Posts, but settled on a title that closely matched the name of the blog.
In addition to my blog posts, I provide additional insights and behind-the-scenes commentary. If you go to the Lulu website above, you can preview an entire chapter before purchase. I have added a hefty 56-page Glossary of Acronyms and Terms (GOAT) with over 900 storage-related terms defined, which also doubles as an index back to the post (or posts) that use or further explain each term.
So who might be interested in this blook?
- Business Partners and Sales Reps looking to give a nice gift to their best clients and colleagues
- Managers looking to reward early-tenure employees and retain the best talent
- IT specialists and technicians wanting a marketing perspective of the storage industry
- Mentors interested in providing motivation and encouragement to their proteges
- Educators looking to provide books for their classroom or library collection
- Authors looking to write a blook themselves, to see how to format and structure a finished product
- Marketing personnel that want to better understand Web 2.0, Second Life and social networking
- Analysts and journalists looking to understand how storage impacts the IT industry, and society overall
- College graduates and others interested in a career as a storage administrator
And yes, according to Lulu, if you order soon, you can have it by December 25.
technorati tags: IBM, blook, Volume I, Jennifer Jones, system, storage, strategy, hardware, software, services, disk, tape, networking, SAN, secondlife, Web2.0, facebook, Lulu, publishing, Blooker Prize, articles, magazines, proceedings, Ted Rall, insights, glossary, early-tenure, mentors, library, classroom, administrator, print, publish, on demand
While HDS blogger Hu Yoshida and IBM blogger Barry Whyte make a [great case for why you should buy IBM SAN Volume Controller], my favorite arch-nemesis and fellow blogger BarryB on his Storage Anarchist blog feels the SVC is "blue spray paint".
BarryB's latest round of red-meat rhetoric is his amusing post [This is like déjà vu all over again], titled after a [quote from Yogi Berra]. BarryB pokes fun at Andy Monshaw's comments in Chris Preimesberger's eWeek article [IBM's Big Storage Picture], and my post earlier this week about Sun's "Open Storage" initiative [Simply Dinners and Open Storage from Sun], as if the two were somehow connected.
He feels I was unfair to accuse EMC of "proprietary interfaces" without spelling out what I was referring to. Here are just two, along with the whines we hear from customers that relate to them.
- EMC Powerpath multipathing driver
Typical whine: "I just paid a gazillion dollars to renew my annual EMC Powerpath license, so you will have to come back in 12 months with your SVC proposal. I just can't see explaining to my boss that an SVC eliminates the need for EMC Powerpath, throwing away all the good money we just spent on it, or to explain that EMC chooses not to support SVC as one of Powerpath's many supported devices."
- EMC SRDF command line interface
Typical whine: "My storage admins have written tons of scripts that all invoke EMC SRDF command line interfaces to manage my disk mirroring environment, and I would hate for them to re-write this to use IBM's (also proprietary) command line interfaces instead."
Certainly BarryB is correct that IBM still has a few remaining "proprietary" items of its own. IBM has been in business over 80 years, but it was only the last 10-15 years that IBM made a strategic shift away from proprietary and over to open standards and interfaces. The transformation to "openness" is not yet complete, but we have made great progress. Take these examples:
- The System z mainframe - IBM had opened the interfaces so that both Amdahl and Fujitsu made compatible machines. Unlike Apple, which forbids cloning of this nature, IBM is now the single source for mainframes because the other two competitors could not keep up with IBM's progress and advancements in technology.
Update: Due to legal reasons, the statements referring to Hercules and other S/390 emulators have been removed.
- The z/OS operating system - While it is possible to run Linux on the mainframe, most people associate the z/OS operating system with the mainframe. This was opened up with UNIX System Services to satisfy requests from various governments. It is now a full-fledged UNIX operating system, recognized by the [Open Group] that certifies it as such.
- As BarryB alludes, the unique interface for disk attachment to System z known as Count-Key-Data (CKD) was published so that both EMC and HDS can offer disk systems to compete with IBM's high-end disk offerings. Linux on System z supports standard Fibre Channel, allowing you to attach an IBM SVC and anyone's storage. Both z/OS and Linux on System z support NAS storage, so IBM N series, NetApp, even EMC Celerra could be used in that case.
- The System i itself is still proprietary, but recently IBM announced that it will now support standard block size (512 bytes) instead of the awkward 520-byte blocks that only IBM and EMC support today. That means that any storage vendor will be able to sell disk to the System i environment.
- Advanced copy services, like FlashCopy and Metro Mirror, are as proprietary as the similar offerings from EMC and HDS, with the exception that IBM has licensed them to both EMC and HDS. Thanks to cross-licensing, you can do [FlashCopy on EMC] equipment. Getting all the storage vendors to agree to open standards for these copy services is still a work in progress under [SNIA], but at least people who have coded z/OS JCL batch jobs that invoke FlashCopy utilities can work the same between IBM and EMC equipment.
So for those out there who thought that my comment about EMC's proprietary interfaces in any way implied that IBM did not have any of its own, the proverbial ["pot calling the kettle black"] so to speak, I apologize.
BarryB shows off his [PhotoShop skills] with the graphic below. I take it as a compliment to be compared to an All-American icon of business success.
|TonyP and Monopoly's Mr. Pennybags|
Separated at Birth?
However, BarryB meant it as a reference back to a time long ago when IBM was a monopoly of the IT industry, which according to [IBM's History], ended in 1973. In other words, IBM stopped being a monopoly before EMC ever existed as a company, and long before I started working for IBM myself.
The anti-trust lawsuit that BarryB mentions happened in 1969, which forced IBM to separate some of the software from its hardware offerings, and prevented IBM from making various acquisitions for years to follow, forcing IBM instead into technology partnerships. I'm glad that's all behind us now!
technorati tags: HDS, Hu Yoshida, IBM, Barry Whyte, SVC, BarryB, Storage Anarchist, blue, spray paint, red-meat rhetoric, Yogi Berra, Andy Monshaw, Chris Preimesberger, eWeek, Open storage, Sun, proprietary interfaces, mainframe, z/OS, UNIX, Open+Group, CKD, NAS, NetApp, Photoshop
Well, this week I am in Maryland, just outside of Washington DC. It's a bit cold here.
Robin Harris over at StorageMojo put out this Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun about the results of two academic papers, one from Google, and another from Carnegie Mellon University (CMU). The papers imply that the disk drive module (DDM) manufacturers have perhaps misrepresented their reliability estimates, and Robin asks major vendors to respond. So far, NetApp and EMC have responded.
I will not repeat what others have said already, but will make just a few points. Robin, you are free to consider this "my" official response if you would like to post it on your blog, or point to mine, whatever is easier for you. Given that IBM no longer manufactures the DDMs we use inside our disk systems, there may not be any reason for a more formal response.
- Coke and Pepsi buy sugar, Nutrasweet and Splenda from the same sources
Somehow, this doesn't surprise anyone. Coke and Pepsi don't own their own sugar cane fields, and even their bottlers are separate companies. Their job is to assemble the components using super-secret recipes to make something that tastes good.
IBM, EMC and NetApp don't make the DDMs that are mentioned in either academic study. Different IBM storage systems use one or more of the following DDM suppliers:
- Seagate (including Maxtor, which they acquired)
- Hitachi Global Storage Technologies, HGST (former IBM division sold off to Hitachi)
In the past, corporations like IBM were very "vertically-integrated", making every component of every system delivered. IBM was the first to bring disk systems to market, and led the major enhancements that exist in nearly all disk drives manufactured today. Today, however, our value-add is to take standard components, and use our super-secret recipe to make something that provides unique value to the marketplace. Not surprisingly, EMC, HP, Sun and NetApp also don't make their own DDMs. Hitachi is perhaps the last major disk systems vendor that also has a DDM manufacturing division.
So, my point is that disk systems are the next layer up. Everyone knows that individual components fail. Unlike CPUs or Memory, disks actually have moving parts, so you would expect them to fail more often compared to just "chips".
If you don't feel the MTBF or AFR estimates posted by these suppliers are valid, go after them, not the disk systems vendors that use their supplies. While IBM does qualify DDM suppliers for each purpose, we are basically purchasing them from the same major vendors as all of our competitors. I suspect you won't get much more than the responses you posted from Seagate and HGST.
- American car owners replace their cars every 59 months
According to a frequently cited auto market research firm, the average time before the original owner transfers their vehicle -- purchased or leased -- is currently 59 months. Both studies mention that customers have a different "definition" of failure than manufacturers, and often replace the drives before they are completely kaput. The same is true for cars. Americans give various reasons why they trade in their less-than-five-year-old cars for newer models. Disk technologies advance at a faster pace, so it makes sense to change drives for other business reasons: speed and capacity improvements, lower power consumption, and so on.
The CMU study indicated that 43 percent of drives were replaced before they were completely dead. So, if General Motors estimated their cars lasted 9 years, and Toyota estimated 11 years, people still replace them sooner, for other reasons.
At IBM, we remind people that "data outlives the media". True for disk, and true for tape. Neither is "permanent storage", but rather a temporary resting point until the data is transferred to the next media. For this reason, IBM is focused on solutions and disk systems that plan for this inevitable migration process. IBM System Storage SAN Volume Controller is able to move active data from one disk system to another; IBM Tivoli Storage Manager is able to move backup copies from one tape to another; and IBM System Storage DR550 is able to move archive copies from disk and tape to newer disk and tape.
If you had only one car, then having that one and only vehicle die could be quite disrupting. However, companies that have fleet cars, like Hertz Car Rentals, don't wait for their cars to completely stop running either, they replace them well before that happens. For a large company with a large fleet of cars, regularly scheduled replacement is just part of doing business.
This brings us to the subject of RAID. No question that RAID 5 provides better reliability than having just a bunch of disks (JBOD). Certainly, three copies of data across separate disks, a variation of RAID 1, will provide even more protection, but for a price.
Robin mentions the "Auto-correlation" effect. Disk failures bunch up, so one recent failure might mean another DDM, somewhere in the environment, will probably fail soon also. For it to make a difference, it would (a) have to be a DDM in the same RAID 5 rank, and (b) have to occur during the time the first drive is being rebuilt to a spare volume.
- The human body replaces skin cells every day
So there are individual DDMs, manufactured by the suppliers above; disk systems, manufactured by IBM and others, and then your entire IT infrastructure. Beyond the disk system, you probably have redundant fabrics, clustered servers and multiple data paths, because eventually hardware fails.
People might realize that the human body replaces skin cells every day. Other cells are replaced frequently, within seven days, and others less frequently, taking a year or so to be replaced. I'm over 40 years old, but most of my cells are less than 9 years old. This is possible because information, data in the form of DNA, is moved from old cells to new cells, keeping the infrastructure (my body) alive.
Our clients should approach this in a more holistic view. You will replace disks in less than 3-5 years. While tape cartridges can retain their data for 20 years, most people change their tape drives every 7-9 years, and so tape data needs to be moved from old to new cartridges. Focus on your information, not individual DDMs.
What does this mean for DDM failures? When one happens, the disk system re-routes requests to a spare disk, rebuilding the data from RAID 5 parity, giving storage admins time to replace the failed unit. During the few hours this process takes place, you are either taking a backup, or crossing your fingers. Note: for RAID 5 the time to rebuild is proportional to the number of disks in the rank, so smaller ranks can be rebuilt faster than larger ranks. To make matters worse, the slower RPM speeds and higher capacities of ATA disks mean that the rebuild process could take longer than it does for smaller capacity, higher speed FC/SCSI disk.
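To put a rough number on the "crossing your fingers" window, here is a minimal Python sketch. It assumes each surviving drive fails independently at a constant rate derived from an annual failure rate (AFR). That independence is my simplifying assumption, and since disk failures tend to bunch up, the sketch probably understates the real risk. The function name, the 3 percent AFR, the rank sizes and the rebuild times are all hypothetical values chosen for illustration, not measurements.

```python
import math

HOURS_PER_YEAR = 24 * 365


def second_failure_probability(surviving_drives, afr, rebuild_hours):
    """Chance that at least one surviving drive in the rank fails mid-rebuild.

    Assumes independent failures at a constant rate (exponential model),
    which is an illustrative simplification, not a vendor formula.
    """
    # Convert the annual failure rate into an hourly hazard rate.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    # Probability that a single drive survives the whole rebuild window.
    p_one_survives = math.exp(-rate_per_hour * rebuild_hours)
    # All surviving drives must make it through for the rank to stay intact.
    return 1.0 - p_one_survives ** surviving_drives


# Hypothetical comparison: a small, fast rank versus a large, slow one.
small_rank = second_failure_probability(surviving_drives=3, afr=0.03, rebuild_hours=4)
large_rank = second_failure_probability(surviving_drives=7, afr=0.03, rebuild_hours=12)
print(f"small rank: {small_rank:.6%}  large rank: {large_rank:.6%}")
```

Under these assumptions the larger rank is more exposed on both counts: more drives that could fail, and a longer window in which they can.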
According to the Google study, a large portion of the DDM replacements had no SMART errors to warn that it was going to happen. To protect your infrastructure, you need to make sure you have current backups of all your data. IBM TotalStorage Productivity Center can help identify all the data that is "at risk", those files that have no backup, no copy, and no current backup since the file was most recently changed. A well-run shop keeps their "at risk" files below 3 percent.
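The "at risk" definition above is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how IBM TotalStorage Productivity Center works internally. The `FileRecord` type, its field names and the sample paths are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileRecord:
    path: str
    modified: float                # time of last change, seconds since epoch
    last_backup: Optional[float]   # time of last backup, None if never backed up


def is_at_risk(rec: FileRecord) -> bool:
    """A file is at risk if it has no backup, or changed after its last backup."""
    return rec.last_backup is None or rec.last_backup < rec.modified


def at_risk_percent(records: List[FileRecord]) -> float:
    """Percentage of files with no current backup."""
    if not records:
        return 0.0
    return 100.0 * sum(is_at_risk(r) for r in records) / len(records)


# Hypothetical inventory: one of these four files changed after its last backup.
inventory = [
    FileRecord("/data/a.db", modified=100.0, last_backup=200.0),
    FileRecord("/data/b.db", modified=100.0, last_backup=150.0),
    FileRecord("/data/c.log", modified=300.0, last_backup=250.0),
    FileRecord("/data/d.cfg", modified=50.0, last_backup=60.0),
]
print(f"{at_risk_percent(inventory):.1f} percent at risk")
```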
So, where does that leave us?
- ATA drives are probably as reliable as FC/SCSI disk. Customers should choose which to use based on performance and workload characteristics. FC/SCSI drives are more expensive because they are designed to run at faster speeds, required by some enterprises for some workloads. IBM offers both, and has tools to help estimate which products are the best match to your requirements.
- RAID 5 is just one of the many choices of trade-offs between cost and protection of data. For some data, JBOD might be enough. For other data that is more mission critical, you might choose keeping two or three copies. Data protection is more than just using RAID, you need to also consider point-in-time copies, synchronous or asynchronous disk mirroring, continuous data protection (CDP), and backup to tape media. IBM can help show you how.
- Disk systems, and IT environments in general, are higher-level concepts to transcend the failures of individual components. DDM components will fail. Cache memory will fail. CPUs will fail. Choose a disk systems vendor that combines technologies in unique and innovative ways that take these possibilities into account, designed for no single point of failure, and no single point of repair.
So, Robin, from IBM's perspective, our hands are clean. Thank you for bringing this to our attention and for giving me the opportunity to highlight IBM's superiority at the systems level.
technorati tags: IBM, Seagate, Hitachi, HGST, EMC, NetApp, HP, HDS, Sun, Google, CMU, DDM, Fujitsu, MTBF, MTTF, AFR, ARR, JBOD, RAID, Tivoli, SVC, DR550, CDP, FC, SCSI, disk, tape, SAN,
I'm glad to be back home in Tucson for a few weeks. All of these conferences kept me from catching up on what was going on in the blogosphere.
A few of us at IBM found it odd that EMC would announce their new Geographically Dispersed Disaster Restart (GDDR) the week BEFORE their "EMC World" conference. Why not announce everything at once at the conference? Were they worried that admitting that "Maui" software is still many months away carries that much of a negative stigma? The decision probably went something like this:
EMCer #1: GDDR is finally ready, should we announce now, or wait ONE week to make it part of the things we announce at EMC World?
EMCer #2: We are not announcing much at EMC World and what people really want us to talk about, Maui, we aren't delivering for a while. Why can't people understand we are a company of hardware engineers, not software programmers! So, better not be associated with that quagmire at all.
EMCer #1: Yes, boss, I see your point. We'll announce this week then.
My fellow blogger and intellectual sparring partner, Barry Burke, on his Storage Anarchist blog, posted [are you wasting money on your mainframe dr solution?] to bring up the GDDR announcement. The key difference is that IBM GDPS works with IBM, EMC and HDS equipment, being the fair-and-balanced folks that IBM clients have come to expect, but it appears EMC GDDR works only with EMC equipment. Because GDDR does less, it also costs less. I can accept that. You get what you pay for. Of course, IBM does have a variety of protection levels, and one will probably meet your budget and your business continuity needs.
To correct Barry's misperception, companies that buy IBM mainframe servers do have a choice. They can purchase their operating system from IBM, get their Linux or OpenSolaris from someone else like Red Hat or Novell, or build their own OS distribution from readily available open source. And unlike other servers that might require at least one OS partition from the vendor, IBM mainframes can run 100 percent Linux. GDPS supports a mix of OS data. z/OS and Linux data can all be managed by GDPS. Companies that own mainframes know this. I can forgive the misperception from Barry, as EMC is focused on distributed servers instead, and many in their company may not have much exposure to mainframe technology, or have ever spoken to mainframe customers.
But what almost had me fall out of my chair was this little nugget from his post:
"If you're an IBM mainframe customer, you are - by definition - IBM's profit stream."
Honestly, is there anyone out there who does not realize that IBM is a for-profit corporation? In contrast, Barry would like his readers to believe that EMC is selling GDDR at cost, and that EMC is a non-profit organization. While IBM has been delivering actual solutions that our clients want, EMC continues to float rumors that someday they might get around to offering something worthwhile. In the last six months, the shareholders have interpreted both strategies for what they really are, and the stock prices reflect that:
(courtesy of [finance.yahoo.com])
(Disclosure: I own IBM stock. I do not own EMC stock. Stock price comparisons by Yahoo were based on publicly reported information. The colors blue and red to represent IBM and EMC, respectively, were selected by Yahoo's graph-making facility. The color red does not necessarily imply EMC is losing money or having financial troubles.)
Of course, I for one would love to help Barry's dream of EMC non-profitability come true. If anyone has any suggestions how we can help EMC approach this goal, please post a comment below.
technorati tags: IBM, GDPS, EMC, GDDR, Maui, EMC World, HDS, Yahoo Finance, stock price, non-profit, strategy, shareholders
Two weeks ago, I mentioned in my post [Pulse 2008 - Day 2 Breakout sessions] that Henk de Ruiter from ABN Amro bank presented his success story implementing Information Lifecycle Management (ILM) across his various data centers. I am no stranger to ABN Amro, having helped "ABN" and "Amro" banks merge their mainframe data in 1991. Henk has agreed to let me share with my readers more of this success story here on my blog:
Back in December 2005, Henk and his colleagues had come to visit the IBM Tucson Executive Briefing Center (EBC) to hear about IBM products and services. At the time, I was part of our "STG Lab Services" team that performed ILM assessments at client locations. I explained to ABN Amro that the ILM methodology does not require an all-IBM solution, and that ILM could even provide benefits with their current mix of storage, software and service providers. The ABN Amro team liked what I had to say, and my team was commissioned to perform ILM assessments at three of their data centers:
- Amsterdam (Netherlands)
- Sao Paulo (Brazil)
- Chicago, IL (USA)
Each data center had its own management, its own decision making, and its own set of issues, so we structured each ILM assessment independently. When we presented our results, we showed what each data center could do better with their existing mixed bag of storage, software and service providers, and also showed how much better their life would be with IBM storage, software and services. They agreed to give IBM a chance to prove it, and so a new "Global Storage Study" was launched to take the recommendations from our three ILM studies, and flesh out the details to make a globally-integrated enterprise work for them. Once completed, it was renamed the "Global Storage Solution" (GSS).
Henk summarized the above with "I am glad to see Tony Pearson in the audience, who was instrumental to making this all happen." As with many client testimonials, he presented a few charts on who ABN Amro is today, the 12th largest bank worldwide, 8th largest in Europe. They operate in 53 countries and manage over a trillion euros in assets.
They have over 20 data centers, with about 7 PB of disk, and over 20 PB of tape, both growing at 50 to 70 percent CAGR. About 2/3 of their operations are now outsourced to IBM Global Services; the remaining 1/3 is non-IBM equipment managed by a different service provider.
ABN Amro deployed IBM TotalStorage Productivity Center, various IBM System Storage DS family disk systems, SAN Volume Controller (SVC), Tivoli Storage Manager (TSM), Tivoli Provisioning Manager (TPM), and several other products. Armed with these products, they performed the following:
- Clean Up. IBM uses the term "rationalization" to relate to the assignment of business value, to avoid confusion with the term "classification" which many in IT relate to identifying ownership, read and write authorization levels. Often, in the initial phases of an ILM deployment, a portion of the data is determined to be eligible for clean up, either moved to a lower-cost tier or deleted immediately. ABN Amro and IBM set a goal to identify at least 20 percent of their data for clean up.
- New tiers. Rather than traditional "storage tiers" which are often just Tier 1 for Fibre Channel disk and Tier 2 for SATA disk, ABN Amro and IBM came up with seven "information infrastructure tiers" that incorporate service levels, availability and protection status. They are:
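To get a feel for what 50 to 70 percent CAGR means for capacity planning, here is a minimal Python sketch of the compound growth arithmetic. The starting capacities come from the figures above. The function names and the three-year horizon are my own illustrative choices.

```python
import math


def capacity_after(start_pb, cagr, years):
    """Projected capacity in PB after compounding annual growth."""
    return start_pb * (1.0 + cagr) ** years


def doubling_time_years(cagr):
    """Years needed for capacity to double at a given compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + cagr)


# Project the 7 PB of disk and 20 PB of tape at both ends of the growth range.
for cagr in (0.50, 0.70):
    disk = capacity_after(7.0, cagr, years=3)
    tape = capacity_after(20.0, cagr, years=3)
    print(f"CAGR {cagr:.0%}: disk {disk:.1f} PB, tape {tape:.1f} PB after 3 years, "
          f"doubling every {doubling_time_years(cagr):.2f} years")
```

At either end of the range, capacity doubles in well under two years, which is why identifying data eligible for clean up or lower tiers pays off quickly.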
- High-performance, Highly-available disk with Remote replication.
- High-performance, Highly-available disk (no remote replication)
- Mid-performance, high-capacity disk with Remote replication
- Mid-performance, high-capacity disk (no remote replication)
- Non-erasable, Non-rewriteable (NENR) storage employing a blended disk and tape solution.
- Enterprise Virtual Tape Library with remote replication and back-end physical tape
- Mid-performance physical tape
These tiers are applied equally across their mainframe and distributed platforms. All of the tiers are priced per "primary GB", so any additional capacity required for replication or point-in-time copies, either local or remote, is folded into the base price. ABN Amro felt a mission-critical application on Windows or UNIX deserves the same Tier 1 service level as a mission-critical mainframe application. Exactly!
- Deployed storage virtualization for disk and tape. This involved the SAN Volume Controller and IBM TS7000 series library.
- Implemented workflow automation. The key product here is IBM Tivoli Provisioning Manager.
- Started an investigation for HSM on distributed. This would be policy-based space management to migrate less frequently accessed data to the TSM pool for Windows or UNIX data.
While the deployment is not yet complete, ABN Amro feels they have already recognized business value:
- Reduced cost by identifying data that should be stored on lower tiers
- Simplified management, consolidated across all operating systems (mainframe, UNIX, Windows)
- Increased utilization of existing storage resources
- Reduced manual effort through policy-based automation, which can lead to fewer human errors and faster adaptability to new business opportunities
- Standardized backup and other operational procedures
Henk and the rest of ABN Amro are quite pleased with the progress so far, although recent developments in terms of the takeover of ABN AMRO by a consortium of banks mean that the model has so far been implemented only in Europe. Further rollout depends on the storage strategy of the new owners. Nonetheless, I am glad that I was able to work with Henk, Jason, Barbara, Steve, Tom, Dennis, Craig and others to be part of this from the beginning and be able to see it roll out successfully over the years.
For more about what was presented at the Pulse 2008 conference, see the videos of the keynote speakers at [IBM Pulse - YouTube channel]!
technorati tags: IBM, ABN Amro, Henk de Ruiter, merge, mainframe, Tucson, Executive Briefing Center, EBC, STG, Lab Services, ILM, Amsterdam, Netherlands, Sao Paulo, Brazil, Chicago, Global Storage, study, solution, Productivity Center, DS8000, DS4000, SVC, storage tiers, rationalization, NENR, FC, SATA, Windows, UNIX, TS7000, HSM
Over on his Backup Blog, fellow blogger Scott Waterhouse from EMC has a post titled [Backup Sucks: Reason #38]. Here is an excerpt:
Unfortunately, we have not been able to successfully leverage economies of scale in the world of backup and recovery. If it costs you $5 to backup a given amount of data, it probably costs you $50 to back up 10 times that amount of data, and $500 to back up 100 times that amount of data.
If anybody can figure out how to get costs down to $40 for 10 times the amount of data, and $300 for 100 times the amount of data, they will have an irrefutable advantage over anybody that has not been able to leverage economies of scale.
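One way to put a number on the economies of scale in the excerpt is to fit a simple power law, cost = base × scale^k, where k = 1 means cost grows in lockstep with data and k < 1 means real economies of scale. The power-law model is my own framing, not something from Scott's post. The dollar figures are the ones he quotes, and the function name is hypothetical.

```python
import math


def scaling_exponent(base_cost, scaled_cost, scale_factor):
    """Exponent k in cost = base_cost * scale_factor**k (power-law assumption)."""
    return math.log(scaled_cost / base_cost) / math.log(scale_factor)


# Scott's "no economies of scale" case: $5 -> $50 at 10x the data.
linear = scaling_exponent(5.0, 50.0, 10.0)
# His targets: $40 at 10x the data, and $300 at 100x the data.
target_10x = scaling_exponent(5.0, 40.0, 10.0)
target_100x = scaling_exponent(5.0, 300.0, 100.0)

print(f"linear k={linear:.3f}, 10x target k={target_10x:.3f}, "
      f"100x target k={target_100x:.3f}")
```

Under this framing, both targets correspond to an exponent of roughly 0.9, so even a modest amount of sharing across backups would meet the bar he sets.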
I suspect that where Scott mentions "we" in the above excerpt, he is referring to EMC in general, with products like Legato. Fortunately, IBM has scalable backup solutions, using either a hardware approach, or one purely with software.
- Hardware Approach
The hardware approach involves using deduplication hardware technology as the storage pool for IBM Tivoli Storage Manager (TSM). Using this approach, IBM Tivoli Storage Manager would receive data from dozens, hundreds or even thousands of client nodes, and the backup copies would be sent to an IBM TS7650 ProtecTIER data deduplication appliance, IBM TS7650G gateway, or IBM N series with A-SIS. In most cases, companies have standardized on the operating systems and applications used on these nodes, and multiple copies of data reside across employee laptops. As a result, as you have more nodes backing up, you are able to achieve benefits of scale.
- Software Approach
Perhaps your budget isn't big enough to handle new hardware purchases at this time, in this economy. Have no fear, IBM also offers deduplication built right into the IBM Tivoli Storage Manager v6 software itself. You can use a sequential-access disk storage pool for this. TSM scans and identifies duplicate chunks of data in the backup copies, as well as in archive and HSM data, and reclaims the space when duplicates are found.
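For readers new to the idea, here is a minimal Python sketch of chunk-level deduplication in general. It is emphatically not TSM's algorithm: the fixed 4 KB chunk size, the SHA-256 fingerprint and the in-memory dictionary are all illustrative assumptions, and the function names are hypothetical. The point is only to show why more nodes with similar data means more savings.

```python
import hashlib


def dedupe(blobs, chunk_size=4096):
    """Store each unique chunk once and keep a per-blob list of chunk digests."""
    store = {}     # digest -> chunk bytes, one copy per unique chunk
    recipes = []   # one list of digests per input blob, in order
    for blob in blobs:
        recipe = []
        for offset in range(0, len(blob), chunk_size):
            chunk = blob[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)   # keep only the first copy seen
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Rebuild the original blob from its chunk digests."""
    return b"".join(store[digest] for digest in recipe)


# Two hypothetical backups that share a common 4 KB chunk of "A" bytes.
backup_1 = b"A" * 8192 + b"B" * 4096
backup_2 = b"A" * 4096 + b"C" * 4096
store, recipes = dedupe([backup_1, backup_2])

logical = len(backup_1) + len(backup_2)
physical = sum(len(chunk) for chunk in store.values())
print(f"logical {logical} bytes, stored {physical} bytes")
```

Production systems typically add refinements such as variable-size chunking and persistent indexes, but the store-once, reference-many principle is the same.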
If your company is using a backup software product that doesn't scale well, perhaps now is a good time to switch over to IBM Tivoli Storage Manager. TSM is perhaps the most scalable backup software product in the marketplace, giving IBM an "irrefutable advantage" over the competition.
technorati tags: IBM, Scott Waterhouse, EMC, Legato, Tivoli, TSM, deduplication, ProtecTIER, N series, A-SIS
NetworkWorld has compiled Interlude with storage videos, a follow-up to last year's Yikes! Exploding Servers.
I've blogged about some of these videos already, but since there are probably a few out there buying the brand new Apple iPhone looking for YouTube videos to play on them, these links might provide some example entertainment on your new handheld device.
Next week has the "Fourth of July" Independence Day holiday in the USA smack in the middle of the week, so I expect the blogosphere to quiet down a bit. So whether you are working next week or not, in the USA or elsewhere, take some time to enjoy your friends and family.
technorati tags: NetworkWorld, storage, videos, HP, IBM, EMC, HDS, Sun, exploding, servers, Apple, iPhone, YouTube
Many people have asked me if there was any logic with the IBM naming convention of IBM Systems branded servers. Here's your quick and easy cheat sheet:
- System x -- "x" for cross-platform architecture. Technologies from our mainframe and UNIX servers were brought into chips that sit next to the Intel or AMD processors to provide a more reliable x86 server experience. For example, some models have a POWER processor-based Remote Supervisor Adapter (RSA).
- System p -- "p" for POWER architecture.
- System z -- "z" for Zero-downtime, zero-exposures. Our lawyers prefer "near-zero", but this is about as close as you get to ["six-nines" availability] in our industry, with the highest level of security and encryption, no other vendor comes close, so you get the idea.
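For a sense of scale on "six-nines", the downtime budget is simple arithmetic: 99.9999 percent availability leaves about 31.5 seconds of downtime per year. The short Python sketch below shows the calculation. The function name is hypothetical and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def allowed_downtime_seconds(nines):
    """Yearly downtime budget for an availability of N nines (e.g. 6 -> 99.9999%)."""
    unavailability = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


# Compare common availability levels, from three nines up to six nines.
for n in (3, 4, 5, 6):
    print(f"{n} nines: {allowed_downtime_seconds(n):,.1f} seconds per year")
```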
But what about the "i" for System i? Officially, it stands for "Integrated" in that it could integrate different applications running on different operating systems onto a [COMMON] platform. Options were available to insert Intel-based processor cards that ran Windows, or attach special cables that allowed separate System x servers running Windows to attach to a System i. Both allowed Windows applications to share the internal LAN and SAN inside the System i machine. Later, IBM allowed [AIX on System i] and [Linux on Power] operating systems to run as well.
From a storage perspective, we often joked that the "i" stood for "island", as most System i machines used internal disk, or attached externally to only a few selected models of disk from IBM and EMC that had special support for i5/OS using a special, non-standard 520-byte disk block size. This meant only our popular IBM System Storage DS6000 and DS8000 series disk systems were available. This block size requirement only applies to disk. For tape, i5/OS supports both IBM TS1120 and LTO tape systems. For the most part, System i machines stood separate from the mainframe, and the rest of the Linux, UNIX and Windows distributed servers on the data center floor.
Often, when I am talking to customers, they ask when product xyz will be supported on System z or System i. I explain that IBM's strategy is not to make all storage devices connect via ESCON/FICON or support non-standard block sizes, but rather to get the servers to use standard 512-byte block size, Fibre Channel and other standard protocols. (The old adage applies: If you can't get Mohamed to move to the mountain, get the mountain to move to Mohamed.)
On the System z mainframe, we are 60 percent there, allowing three of the five operating systems (z/VM, z/VSE and Linux) to access FCP-based disk and tape devices. (Four out of six if you include [OpenSolaris for the mainframe].) But what about System i? As the characters on the popular television show [LOST] would say: It's time to get off the island!
Last week, IBM announced the new [i5/OS V6R1 operating system] with features that will greatly improve the use of external storage on this platform. Check this out:
- POWER6-based System i 570 model server
Our latest, most powerful POWER processor brought to the System i platform. The 570 model will be the first in the System i family of servers to make use of new processing technology, using up to 16 (sixteen!) POWER6 processors (running at 4.7 GHz) in each machine. The advantage of the new processors is the increased commercial processing workload (CPW) rating, 31 percent greater than the POWER5+ version and 72 percent greater than the POWER5 version. CPW is the "MIPS" or "TeraFlops" rating for comparing System i servers. Here is the [Announcement Letter].
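As a quick sanity check on how those two percentages relate to each other, here is a small sketch. The baseline value of 100 is an arbitrary placeholder I picked for illustration, not a published CPW rating.

```python
# Relative CPW ratings implied by the two percentages quoted above.
# The baseline of 100.0 is an arbitrary placeholder, not a published CPW figure.
power5 = 100.0
power6 = power5 * 1.72        # POWER6 is 72 percent above POWER5
power5_plus = power6 / 1.31   # POWER6 is 31 percent above POWER5+

# Implied step from POWER5 to POWER5+ on the same relative scale.
gain_5_to_5plus = power5_plus / power5 - 1.0
print(f"Implied POWER5+ gain over POWER5: {gain_5_to_5plus:.1%}")
```

In other words, the two figures together imply that the POWER5+ generation was itself roughly 31 percent above POWER5.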
- Fibre Channel Adapter for System i hardware
That's right, these are [Smart IOAs], so an I/O Processor (IOP) is no longer required! You can even boot the Initial Program Load (IPL) directly from SAN-attached tape. This brings System i to the 21st century for Business Continuity options.
- Virtual I/O Server (VIOS)
[Virtual I/O Server] has been around for System p machines, but is now available on System i as well. This allows multiple logical partitions (LPARs) to access resources like Ethernet cards and FCP host bus adapters. In the case of storage, the VIOS handles the 520-byte to 512-byte conversion, so that i5/OS systems can now read and write to standard FCP devices like the IBM System Storage DS4800 and DS4700 disk systems.
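To give a feel for what a 520-byte to 512-byte conversion involves, here is a minimal sketch. It assumes one plausible layout, packing eight 520-byte sectors (4,160 bytes) into nine 512-byte blocks (4,608 bytes) with zero padding. That grouping and the function names `pack_group` and `unpack_group` are my own illustration, not a description of the actual VIOS implementation.

```python
SECTOR_520 = 520     # i5/OS disk sector size
BLOCK_512 = 512      # standard block size on FCP-attached disk
GROUP_SECTORS = 8    # assumed grouping: 8 * 520 = 4160 bytes
GROUP_BLOCKS = 9     # assumed grouping: 9 * 512 = 4608 bytes


def pack_group(sectors):
    """Pack eight 520-byte sectors into nine zero-padded 512-byte blocks."""
    if len(sectors) != GROUP_SECTORS:
        raise ValueError("expected exactly eight sectors")
    if any(len(s) != SECTOR_520 for s in sectors):
        raise ValueError("every sector must be 520 bytes")
    payload = b"".join(sectors)
    padded = payload.ljust(GROUP_BLOCKS * BLOCK_512, b"\x00")
    return [padded[i:i + BLOCK_512] for i in range(0, len(padded), BLOCK_512)]


def unpack_group(blocks):
    """Recover the eight 520-byte sectors from nine 512-byte blocks."""
    if len(blocks) != GROUP_BLOCKS:
        raise ValueError("expected exactly nine blocks")
    payload = b"".join(blocks)[:GROUP_SECTORS * SECTOR_520]
    return [payload[i:i + SECTOR_520] for i in range(0, len(payload), SECTOR_520)]


# Round trip with dummy data: each sector is filled with its own index.
sectors = [bytes([n]) * SECTOR_520 for n in range(GROUP_SECTORS)]
blocks = pack_group(sectors)
restored = unpack_group(blocks)
```

Under this assumed layout, the price of using standard disk is some padding: 448 of every 4,608 bytes on disk carry no data.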
- IBM System Storage DS4000 series
Initially, we have certified DS4700 and DS4800 disk systems to work with i5/OS, but more devices are in plan. This means that you can now share your DS4700 between i5/OS and your other Linux, UNIX and Windows servers, take advantage of a mix of FC and SATA disk capacities, RAID6 protection, and so on.
- IBM PowerVM
To call [IBM PowerVM] the "VMware for the POWER architecture" would not quite do it justice. In combination with VIOS, IBM PowerVM is able to run a variety of AIX, Linux and i5/OS guest images. The "Live Partition Mobility" feature allows you to easily move guest images from one system to another, while they are running, just like VMotion for x86 machines.
And while we are on the topic of x86, PowerVM is also able to represent a Linux-x86 emulation base to run x86-compiled applications. While many Linux applications could be recompiled from source code for the POWER architecture "as is", others required perhaps 1-2 percent modification to port them over, and that was too much for some software development houses. Now, we can run most x86-compiled Linux application binaries in their original form on POWER architecture servers.
- BladeCenter JS22 Express
The POWER6-based [JS22 Express blade] can run i5/OS, taking advantage of PowerVM and VIOS to access all of the BladeCenter resources. The BladeCenter lets you mix and match POWER and x86-based blades in the same chassis, providing the ultimate in flexibility.
Now that's exciting!
technorati tags: IBM, System x, System p, System i, System z, island, COMMON, AIX, Linux, POWER, POWER6, Windows, EMC, DS6000, DS8000, TS1120, LTO, ESCON, FICON, 520-byte, z/VM, z/VSE, z/OS, z/TPF, OpenSolaris, mainframe, LOST, CPW, x86, VMware, VMotion, BladeCenter, JS22, i5/OS, V6R1, PowerVM, VIOS, LPAR, DS4700, DS4800, LTO, disk, SAN, tape, storage
On Tuesday, I covered much of the Feb 26 announcements, but left the IBM System Storage DS8000 for today so that it can have its own special focus.
Many of the enhancements relate to z/OS Global Mirror, which we formerly called eXtended Remote Copy or "XRC", not to be confused with our "regular" Global Mirror that applies to all data. For those not familiar with z/OS Global Mirror, here is how it works. The production mainframe writes updates to the DS8000, and the DS8000 keeps track of these in cache until a "reader" can pull them over to the secondary location. The "reader" is called System Data Mover (SDM), which runs in its own address space under the z/OS operating system. Thanks to some work my team did several years ago, z/OS Global Mirror was able to extend beyond z/OS volumes and include Linux on System z data. Linux on System z can use a "Compatible Disk Layout" (CDL) format (now the default) that meets all the requirements to be included in the copy session.
IBM has over 300 deployments of z/OS Global Mirror, mostly banks, brokerages and insurance companies. The feature can keep tens of thousands of volumes in one big "consistency group" and asynchronously mirror them to any distance on the planet, with the secondary copy recovery point objective (RPO) only a few seconds behind the primary.
- Extended Distance FICON
Extended Distance FICON is an enhancement to the industry-standard FICON architecture (FC-SB-3) that can help avoid degradation of performance at extended distances by implementing a new protocol for "persistent" Information Unit (IU) pacing. This deals with the number of packets in flight between servers and storage separated by long distances, and can keep a link fully utilized at 4 Gbps FICON up to 50 kilometers. This is particularly important for the z/OS Global Mirror "reader", the System Data Mover (SDM). By having many "reads" in flight, this enhancement can help reduce the need for spoofing or channel-extender equipment, or allow you to choose lower-cost channel extenders based on "frame-forwarding" technology. All of this helps reduce your total cost of ownership (TCO) for a complete end-to-end solution.
This feature will be available in March as a no-charge update to the DS8000 microcode. For more details, see the [IBM Press Release].
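To see why the number of IUs in flight matters over distance, here is a back-of-the-envelope sketch. It assumes light travels through fiber at roughly 5 microseconds per kilometer, a usable data rate of about 400 MB/s on a 4 Gbps link, and a hypothetical 8 KB of payload per IU. The function name `ius_in_flight` and the IU size are my own illustration, not values taken from the FICON specification.

```python
import math


def ius_in_flight(distance_km, rate_bytes_per_s, iu_bytes, us_per_km=5.0):
    """Estimate how many IUs must be outstanding to keep a link busy.

    The sender has to cover one full round trip with data before the
    first acknowledgement can come back from the far end.
    """
    round_trip_s = 2 * distance_km * us_per_km * 1e-6
    bytes_in_flight = rate_bytes_per_s * round_trip_s
    return math.ceil(bytes_in_flight / iu_bytes)


# 50 km at about 400 MB/s with hypothetical 8 KB IUs.
needed_50km = ius_in_flight(50, 400e6, 8192)
# The same link stretched to 100 km needs about twice as many.
needed_100km = ius_in_flight(100, 400e6, 8192)
print(needed_50km, needed_100km)
```

The point of the sketch is the proportionality: double the distance and you need roughly double the outstanding IUs, which is why a fixed, small pacing window starves a long link.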
- z/OS Global Mirror process offload to zIIP processors
To understand this one, you need to understand the different "specialty engines" available on the System z.
On distributed systems where you run a single application on a single piece of server hardware, you might pay "per server", "per processor" or lately "per core" for dual-core and quad-core processors. Software vendors were looking for a way to charge smaller companies less, and larger companies more. However, you might end up paying the same whether you use a 1 GHz Intel or a 4 GHz Intel processor, even though the latter can do four times more work per unit time.
The mainframe has a few processors for hundreds or thousands of business applications. In the beginning, all engines on a mainframe were general-purpose "Central Processor" or CP engines. Based on their cycle rate, IBM was able to publish the number of Millions of Instructions per Second (MIPS) that a machine with a given number of CP engines can do. With the introduction of side co-processors, this was changed to "Millions of Service Units" or MSU. Software licensing can charge per MSU, and this allows applications running in as little as one percent of a processor to get appropriately charged.
One of the first specialty engines was the IFL, the "Integrated Facility for Linux". This was a CP designated to only run z/VM and Linux on the mainframe. You could "buy" an IFL on your mainframe much cheaper than a CP, and none of your z/OS application software would count it in the MSU calculations because z/OS can't run on the IFL. This made it very practical to run new Linux workloads.
In 2004, IBM introduced "z Application Assist Processor" (zAAP) engines to run Java, and in 2006, the "z Integrated Information Processor" (zIIP) engines to run database and background data movement activities. Not counting these in the MSU number for business applications greatly reduced the cost of mainframe software.
Tuesday's announcement is that the SDM "reader" will now run in a zIIP engine, reducing the costs for applications that run on that machine. Note that the CP, IFL, zAAP and zIIP engines are all identical cores. The z10 EC has up to 64 of these (16 quad-core) and you can designate any core as any of these engine types.
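Here is a deliberately simplified sketch of why moving the SDM to a zIIP lowers the bill. It assumes software is charged only on the MSUs consumed on general-purpose CP engines. The workload figures and the function name `billable_msu` are made up for illustration, and real mainframe pricing is more involved than this.

```python
def billable_msu(workloads):
    """Sum the MSUs for work dispatched on general-purpose CP engines only.

    Each workload is an (engine_type, msu) pair. Work running on
    specialty engines (IFL, zAAP, zIIP) is left out of the total.
    """
    return sum(msu for engine, msu in workloads if engine == "CP")


# Hypothetical figures: business applications plus the SDM "reader".
before = [("CP", 400), ("CP", 30)]     # SDM running on a CP engine
after = [("CP", 400), ("zIIP", 30)]    # SDM offloaded to a zIIP engine

print(billable_msu(before), billable_msu(after))
```

The same amount of work gets done either way. Only the portion that counts toward MSU-based charges changes.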
- Faster z/OS Global Mirror Incremental Resync
One way to set up 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z, you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and also have site "A" sending data over to site "C" using z/OS Global Mirror. This is called "Metro z/OS Global Mirror" or "MzGM" for short.
In the past, if the disk in site A failed, you would switch over to site B, and then send all the data all over again. This is because site B was not tracking what the SDM reader had or had not yet processed. With Tuesday's announcement, IBM has developed an "incremental resync" where site B figures out what the incremental delta is to connect to the z/OS Global Mirror at site "C", and this is 95% faster than sending all the data over.
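The idea of an incremental delta can be sketched in a few lines. This is a generic change-tracking illustration under my own assumptions (track numbers kept in sets, a function named `incremental_delta`, made-up volume sizes), not a description of how the DS8000 microcode actually tracks updates.

```python
def incremental_delta(changed_at_b, already_read_by_sdm):
    """Return the tracks site B still has to send on to site C.

    changed_at_b: track numbers updated at site B via Metro Mirror.
    already_read_by_sdm: track numbers the SDM reader already processed.
    """
    return sorted(set(changed_at_b) - set(already_read_by_sdm))


# Hypothetical numbers: one million tracks, 60,000 of them recently
# changed, and the first 10,000 of those already pulled over to site C.
total_tracks = 1_000_000
changed = range(0, 60_000)
already_read = range(0, 10_000)

delta = incremental_delta(changed, already_read)
fraction_to_send = len(delta) / total_tracks
print(len(delta), fraction_to_send)
```

With these made-up numbers, only a small fraction of the data has to cross the wire instead of all of it, which is the intuition behind the time savings.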
- IBM Basic HyperSwap for z/OS
What if you are sending all of your data from one location to another, and one disk system fails? Do you declare a disaster and switch over entirely? With HyperSwap, you only switch over the disk systems, but leave the rest of the servers alone. In the past, this involved hiring IBM Global Technology Services to implement a Geographically Dispersed Parallel Sysplex (GDPS) with software that monitors the situation and updates the z/OS operating system when a HyperSwap has occurred. All application I/O that was writing to the primary location is automatically re-routed to the disks at the secondary location. HyperSwap can do this for all the disk systems involved, allowing applications at the primary location to continue running uninterrupted.
HyperSwap is a very popular feature, but not everyone has implemented the advanced GDPS capabilities. To address this, IBM now offers "Basic HyperSwap", which is actually going to be shipped as IBM TotalStorage Productivity Center for Replication Basic Edition for System z. This will run in a z/OS address space, and use either the DB2 RDBMS you already have, or provide you an Apache Derby database for those few out there who don't have DB2 on their mainframe already.
Update: There has been some confusion on this last point, so let me explain the key differences between the different levels of service:
- Basic HyperSwap: single-site high availability for the disk systems only
- GDPS/PPRC HyperSwap Manager: single- or multi-site high availability for the disk systems, plus some entry-level disaster recovery capability
- GDPS/PPRC: highly automated end-to-end disaster recovery solution for servers, storage and networks
I apologize to all my colleagues who thought I implied that Basic HyperSwap was a full replacement for the more full-function GDPS service offerings.
- Extended Address Volumes (EAV)
Up until now, the largest volume you could have was only 54 GB in size, and many customers still are using 3 GB and 9 GB volume sizes. Now, IBM will introduce 223 GB volumes. You can have any kind of data set on these volumes, but only VSAM data sets can reside on cylinders beyond the first 65,280. That is because many applications still think that 65,280 is the largest cylinder number you can have.
This is important because a mainframe, or a set of mainframes clustered together, can only have about 60,000 disk volumes total. The 60,000 is actually the Unit Control Block (UCB) limit, and besides disk volumes, you can have "virtual" PAVs that serve as an alias to existing volumes to provide concurrent access.
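The arithmetic behind that point is simple enough to sketch. This uses only the figures quoted above (about 60,000 UCBs, 54 GB and 223 GB volumes) and treats every UCB as a disk volume, which overstates things since PAV aliases use UCBs too. The function name `max_capacity_pb` is mine.

```python
UCB_LIMIT = 60_000   # approximate limit on addressable devices
OLD_VOLUME_GB = 54   # largest volume before EAV
EAV_VOLUME_GB = 223  # largest volume with EAV


def max_capacity_pb(volume_gb, volumes=UCB_LIMIT):
    """Upper bound on addressable disk capacity, in decimal petabytes."""
    return volume_gb * volumes / 1_000_000


old_ceiling = max_capacity_pb(OLD_VOLUME_GB)
eav_ceiling = max_capacity_pb(EAV_VOLUME_GB)
growth = eav_ceiling / old_ceiling
print(f"{old_ceiling:.2f} PB -> {eav_ceiling:.2f} PB ({growth:.1f}x)")
```

So with the volume count capped, bigger volumes are the only way to raise the ceiling, here by a factor of a little over four.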
Aside from the first item, the Extended Distance FICON, the other enhancements are "preview announcements" which means that IBM has not yet worked out the final details of price, packaging or delivery date. In many cases, the work is done, has been tested in our labs, or running beta in select client locations, but for completeness I am required to make the following disclaimer:
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Availability, prices, ordering information, and terms and conditions will be provided when the product is announced for general availability.
technorati tags: IBM, z10 EC, DS8000, z/OS Global Mirror, XRC, SDM, CDL, RPO, FICON, dual-core, quad-core, Intel, MIPS, MSU, zAAP, IFL, zIIP, Hyperswap, DB2, Apache, Derby, UCB, VSAM, EAV
Marshall Lancaster from [United Stationers Technology Services] presented how their [Lagasse] subsidiary successfully survived [Hurricane Katrina in New Orleans] in 2005. I feel this was one of the best presentations of the week, here at the [Data Center Conference].
Lagasse, Inc. sells janitorial supplies, such as mops, cleaning chemicals, waste receptacles, and garbage can liners. Of the 1000 employees of Lagasse nationwide, about 200 associates were located in New Orleans at their main Headquarters, primary customer care center, and primary IT computing center.
Amazingly, Lagasse did not have a formally documented BCP (Business Continuity Plan) but more of a BCI (Business Continuity Idea). They chose to take a ["donut tire"] approach, putting older previous-generation equipment at their DR site. They knew that in the event of a disaster, they would not be processing as many transactions per second. That was a business trade-off they could accept.
They evaluated all the different threat scenarios for impact and likelihood, and focused on hurricanes and floods. They had experienced previous hurricanes, learning from each, with the most recent being 2004 Hurricane Ivan and 2005 Hurricane Dennis. From this, they were able to categorize three levels of DR recovery:
- Tier 1 - The most mission-critical, which for them related to picking, packing and shipping products.
- Tier 2 - The next most important, focused on maintaining good customer service
- Tier 3 - Everything else, including reporting and administrative functions
The time-line of events went as follows:
- August 25
- The US Government issues warning that a hurricane may hit New Orleans
- August 27 - 7pm
- Lagasse declares a disaster, starts recovery procedures to an existing IT facility in Chicago, owned by their parent company. A temporary "Southeast" Headquarters was set up in Atlanta. Remote call centers were identified in Dallas, Atlanta, San Antonio, and Miami.
- August 28 - just after midnight
- In just five hours, they recovered their "Tier 1" applications.
- August 28 - 7:30pm
- In just over 24 hours, they recovered their "Tier 2" applications.
- August 29 - 6am
- The Hurricane hits land. With 73 levees breached, the city of New Orleans was flooded.
- The following week
- Lagasse was fully operational, and recorded their second and third best sales days ever.
I was quite impressed with their company's policy for how they treat their employees during a disaster. At many companies, people prioritize their families, not their jobs, during a disaster. If any associate was asked to work during a disaster, the company would take care of:
- The safety of their family
- The safety of their pets. (In the weeks following this hurricane, I sponsored people in Tucson to go to New Orleans to attend to lost and stray dogs and cats, many of which were left behind when rescuers picked up people from their rooftops.)
- Any emergency repairs to secure the home they leave behind
Marshall felt that if you don't know the names of the spouse and kids of your key employees, you are not emotionally-invested enough to be successful during a disaster.
For communications, cell phones were useless. They could call out on them, but anyone with a cell phone with a 504 area code had difficulty receiving calls, as the calls had to be processed through New Orleans. Instead, they used Voice over IP (VoIP) to redirect calls to whichever remote call center each associate went to. Laptops, Citrix, VPN and email were considered powerful tools during this process. They did not have Instant Messaging (IM) at the time.
While the disk and tapes needed to recover Tiers 1 and 2 were already in Chicago, the tapes for Tier 3 were stored locally by a third-party provider. When Lagasse asked for their DR tapes back, the third party refused, based on their [force majeure] clause. Force majeure is a common clause in many business contracts to free parties from liability during major disasters. Marshall advised everyone to strike any "force majeure" clauses out of any future third-party DR protection contracts.
Hurricane Katrina hit the US hard, killing over 1400 people, and America still has not fully recovered. The recovery of the city of New Orleans has been slow. Massive relocations have caused a deficit of talent in the area, not just IT talent, but also in the areas of medicine, education and other professions. The result has been degraded social services, encouraging others to relocate as well. Some have called it the "liberation effect", a major event that causes people to move to a new location or take on a new career in a different field.
On a personal note, I was in New Orleans for a conference the week prior to landfall, and helped clients with their recoveries the weeks after. For more on how IBM Business Continuity Recovery Services (BCRS) helped clients during Hurricane Katrina, see the following [media coverage].
technorati tags: Marshall Lancaster, United Stationers, Lagasse, BCP, Hurricane Katrina, Business Continuity, Disaster Recovery, donut tire, New Orleans, VoIP, Citrix, VPN, liberation effect, Chicago, force majeure, BCRS