Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Tony Pearson is a Master Inventor and Senior IT Specialist for the IBM System Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2011, Tony celebrated his 25th year anniversary with IBM Storage on the same day as the IBM's Centennial. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson
Every year, I teach hundreds of sellers how to sell IBM storage products. I have been doing this since the late 1990s, and it is one task that has carried forward from one job to another as I transitioned through various roles from development, to marketing, to consulting.
This week, I am in the city of Taipei [Taipei] to teach Top Gun sales class, part of IBM's [Sales Training] curriculum. This is only my second time here on the island of Taiwan.
As you can see from this photo, Taipei is a large city with just row after row of buildings. The metropolitan area has about seven million people, and I saw lots of construction for more on my ride in from the airport.
The student body consists of IBM Business Partners and field sales reps eager to learn how to become better sellers. Typically, some of the students might have just been hired on, just finished IBM Sales School, a few might have transferred from selling other product lines, while others are established storage sellers looking for a refresher on the latest solutions and technologies.
I am part of the teach team comprised of seven instructors from different countries. Here is what the week entails for me:
Monday - I will present "Selling Scale-Out NAS Solutions" that covers the IBM SONAS appliance and gateway configurations, and be part of a panel discussion on Disk with several other experts.
Tuesday - I have two topics, "Selling Disk Virtualization Solutions" and "Selling Unified Storage Solutions", which cover the IBM SAN Volume Controller (SVC), Storwize V7000 and Storwize V7000 Unified products.
Wednesday - I will explain how to position and sell IBM products against the competition.
Thursday - I will present "Selling Infrastructure Management Solutions" and "Selling Unified Recovery Management Solutions", which focus on the IBM Tivoli Storage portfolio, including Tivoli Storage Productivity Center, Tivoli Storage Manager (TSM), and Tivoli Storage FlashCopy Manager (FCM). The day ends with the dreaded "Final Exam".
Friday - The students will present their "Team Value Workshop" presentations, and the class concludes with a formal graduation ceremony for the subset of students who pass. A few outstanding students will be honored with "Top Gun" status.
These are the solution areas I present most often as a consultant at the IBM Executive Briefing Center in Tucson, so I can provide real-life stories of different client situations to help illustrate my examples.
The weather here in Taipei calls for rain every day! I was able to take this photo on Sunday morning while it was still nice and clear, but later in the afternoon, we had quite the downpour. I am glad I brought my raincoat!
While most of the post is accurate and well-stated, two opinions particular caught my eye. I'll be nice and call them opinions, since these are blogs, and always subject to interpretation. I'll put quotes around them so that people will correctly relate these to Hu, and not me.
"Storage virtualization can only be done in a storage controller. Currently Hitachi is the only vendor to provide this." -- Hu Yoshida
Hu, I enjoy all of your blog entries, but you should know better. HDS is fairly new-comer to the storage virtualization arena, so since IBM has been doing this for decades, I will bring you and the rest of the readers up to speed. I am not starting a blog-fight, just want to provide some additional information for clients to consider when making choices in the marketplace.
First, let's clarify the terminology. I will use 'storage' in the broad sense, including anything that can hold 1's and 0's, including memory, spinning disk media, and plastic tape media. These all have different mechanisms and access methods, based on their physical geometry and characteristics. The concept of 'virtualization' is any technology that makes one set of resources look like another set of resources with more preferable characteristics, and this applies to storage as well as servers and networks. Finally, 'storage controller' is any device with the intelligence to talk to a server and handle its read and write requests.
Second, let's take a look at all the different flavors of storage virtualization that IBM has developed over the past 30 years.
IBM introduces the S/370 with the OS/VS1 operating system. "VS" here refers to virtual storage, and in this case internal server memory was swapped out to physical disk. Using a table mapping, disk was made to look like an extension of main memory.
IBM introduces the IBM 3850 Mass Storage System (MSS). Until this time, programs that ran on mainframes had to be acutely aware of the device types being written, as each device type had different block, track and cylinder sizes, so a program written for one device type would have to be modified to work with a different device type. The MSS was able to take four 3350 disks, and a lot of tapes, and make them look like older 3330 disks, since most programs were still written for the 3330 format. The MSS was a way to deliver new 3350 disk to a 3330-oriented ecosystem, and greatly reduce the cost by handling tape on the back end. The table mapping was one virtual 3330 disk (100 MB) to two physical tapes (50 MB each). Back then, all of the mainframe disk systems had separate controllers. The 3850 used a 3831 controller that talked to the servers.
IBM invents Redundant Array of Independent Disk (RAID) technology. The table mapping is one or more virtual "Logical Units" (or "LUNs") to two or more physical disks. Data is striped, mirrored and paritied across the physical drives, making the LUNs look and feel like disks, but with faster performance and higher reliability than the physical drives they were mapped to. RAID could be implemented in the server as software, on top or embedded into the operating system, in the host bus adapter, or on the controller itself. The vendor that provided the RAID software or HBA did not have to be the same as the vendor that provided the disk, so in a sense, this avoided "vendor lock-in".Today, RAID is almost always done in the external storage controller.
IBM introduces the Personal Computer. One of the features of DOS is the ability to make a "RAM drive". This is technology that runs in the operating system to make internal memory look and feel like an external drive letter. Applications that already knew how to read and write to drive letters could work unmodified with these new RAM drives. This had the advantage that the files would be erased when the system was turned off, so it was perfect for temporary files. Of course, other operating systems today have this feature, UNIX has a /tmp directory in memory, and z/OS uses VIO storage pools.
This is important, as memory would be made to look like disk externally, as "cache", in the 1990s.
IBM AIX v3 introduces Logical Volume Manager (LVM). LVM maps the LUNs from external RAID controllers into virtual disks inside the UNIX server. The mapping can combine the capacity of multiple physical LUNs into a large internal volume. This was all done by software within the server, completely independent of the storage vendor, so again no lock-in.
IBM introduces the Virtual Tape Server (VTS). This was a disk array that emulated a tape library. A mapping of virtual tapes to physical tapes was done to allow full utilization of larger and larger tape cartridges. While many people today mistakenly equate "storage virtualization" with "disk virtualization", in reality it can be implemented on other forms of storage. The disk array was referred to as the "Tape Volume Cache". By using disk, the VTS could mount an empty "scratch" tape instantaneously, since no physical tape had to be mounted for this purpose.
Contradicting its "tape is dead" mantra, EMC later developed its CLARiiON disk library that emulates a virtual tape library (VTL).
IBM introduces the SAN Volume Controller. It involves mapping virtual disks to manage disks that could be from different frames from different vendors. Like other controllers, the SVC has multiple processors and cache memory, with the intelligence to talk to servers, and is similar in functionality to the controller components you might find inside monolithic "controller+disk" configurations like the IBM DS8300, EMC Symmetrix, or HDS TagmaStore USP. SVC can map the virtual disk to physical disk one-for-one in "image mode", as HDS does, or can also map virtual disks across physical managed disks, using a similar mapping table, to provide advantages like performance improvement through striping. You can take any virtual disk out of the SVC system simply by migrating it back to "image mode" and disconnecting the LUN from management. Again, no vendor lock-in.
The HDS USP and NSC can run as regular disk systems without virtualization, or the virtualization can be enabled to allow external disks from other vendors. HDS usually counts all USP and NSC sold, but never mention what percentage these have external disks attached in virtualization mode. Either they don't track this, or too embarrassed to publish the number. (My guess: single digit percentage).
Few people remember that IBM also introduced virtualization in both controller+disk and SAN switch form factors. The controller+disk version was called "SAN Integration Server", but people didn't like the "vendor lock-in" having to buy the internal disk from IBM. They preferred having it all external disk, with plenty of vendor choices. This is perhaps why Hitachi now offers a disk-less version of the NSC 55, in an attempt to be more like IBM's SVC.
IBM also had introduced the IBM SVC for Cisco 9000 blade. Our clients didn't want to upgrade their SAN switch networking gear just to get the benefits of disk virtualization. Perhaps this is the same reason EMC has done so poorly with its "Invista" offering.
So, bottom line, storage virtualization can, and has, been delivered in the operating system software, in the server's host bus adapter, inside SAN switches, and in storage controllers. It can be delivered anywhere in the path between application and physical media. Today, the two major vendors that provide disk virtualization "in the storage controller" are IBM and HDS, and the three major vendors that provide tape virtualization "in the storage controller" are IBM, Sun/STK, and EMC. All of these involve a mapping of logical to physical resources. Hitachi uses a one-for-one mapping, whereas IBM additionally offers more sophisticated mappings as well.
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], I presented a session on Storage for the Green Data Center, and attended a System x session on Greening the Data Center. Since they were related, I thought I would cover both in this post.
Storage for the Green Data Center
I presented this topic in four general categories:
Drivers and Metrics - I explained the three key drivers for consuming less energy, and the two key metrics: Power Usage Effectiveness (PUE) and Data Center Infrastructure Efficiency (DCiE).
Storage Technologies - I compared the four key storage media types: Solid State Drives (SSD), high-speed (15K RPM) FC and SAS hard disk, slower (7200 RPM) SATA disk, and tape. I had comparison slides that showed how IBM disk was more energy efficient than competition, for example DS8700 consumes less energy than EMC Symmetrix when compared with the exact same number and type of physical drives. Likewise, IBM LTO-5 and TS1130 tape drives consume less energy than comparable HP or Oracle/Sun tape drives.
Integrated Systems - IBM combines multiple storage tiers in a set of integrated systems managed by smart software. For example, the IBM DS8700 offers [Easy Tier] to offer smart data placement and movement across Solid-State drives and spinning disk. I also covered several blended disk-and-tape solutions, such as the Information Archive and SONAS.
Actions and Next Steps - I wrapped up the talk with actions that data center managers can take to help them be more energy efficient, from deploying the IBM Rear Door Heat Exchanger, or improving the management of their data.
Greening of the Data Center
Janet Beaver, IBM Senior Manager of Americas Group facilities for Infrastructure and Facilities, presented on IBM's success in becoming more energy efficient. The price of electricity has gone up 10 percent per year, and in some locations, 30 percent. For every 1 Watt used by IT equipment, there are an additional 27 Watts for power, cooling and other uses to keep the IT equipment comfortable. At IBM, data centers represent only 6 percent of total floor space, but 45 percent of all energy consumption. Janet covered two specific data centers, Boulder and Raleigh.
At Boulder, IBM keeps 48 hours reserve of gasoline (to generate electricity in case of outage from the power company) and 48 hours of chilled water. Many power outages are less than 10 minutes, which can easily be handled by the UPS systems. At least 25 percent of the Computer Room Air Conditioners (CRAC) are also on UPS as well, so that there is some cooling during those minutes, within the ASHRAE guidelines of 72-80 degrees Fahrenheit. Since gasoline gets stale, IBM runs the generators once a month, which serves as a monthly test of the system, and clears out the lines to make room for fresh fuel.
The IBM Boulder data center is the largest in the company: 300,000 square feet (the equivalent of five football fields)! Because of its location in Colorado, IBM enjoys "free cooling" using outside air temperature 63 percent of the year, resulting in a PUE of 1.3 rating. Electricity is only 4.5 US cents per kWh. The center also uses 1 Million KwH per year of wind energy.
The Raleigh data center is only 100,000 Square feet, with a PUE 1.4 rating. The Raleigh area enjoys 44 percent "free cooling" and electricity costs at 5.7 US cents per kWh. The Leadership in Energy and Environmental Design [LEED] has been updated to certify data centers. The IBM Boulder data center has achieved LEED Silver certification, and IBM Raleigh data center has LEED Gold certification.
Free cooling, electricity costs, and disaster susceptibility are just three of the 25 criteria IBM uses to locate its data centers. In addition to the 7 data centers it manages for its own operations, and 5 data centers for web hosting, IBM manages over 400 data centers of other clients.
It seems that Green IT initiatives are more important to the storage-oriented attendees than the x86-oriented folks. I suspect that is because many System x servers are deployed in small and medium businesses that do not have data centers, per se.
Earlier this week, EMC announced its Symmetrix V-Max, following two trends in the industry:
Using Roman numerals. The "V" here is for FIVE, as this is the successor to the DMX-3 and DMX-4. EMC might have gotten the idea from IBM's success with the XIV (which does refer to the number 14, specifically the 14th class of a Talpiot program in Israel that the founders of XIV graduated from).
Adding "-Max", "-Monkey" or "2.0" at the end of things to make them sound more cool and to appeal to a younger, hipper audience. EMC might have gotten this idea from Pepsi-Max (... a taste of storage for the next generation?)
I took a cue from President Obama and waited a few days to collect my thoughts and do my homework before responding.Special thanks to fellow blogger ChuckH in giving me a [handy list of reactions] for me to pick and choose from. It appears that EMC marketing machine feels it is acceptable for their own folks to claim that EMC is doing something first, or that others are catching up to EMC, but when other vendors do likewise, then that is just pathetic or incoherent. Here are a few reactions already from fellow bloggers:
This was a major announcement for EMC, addressing many of the problems, flaws and weaknesses of the earlier DMX-3 and DMX-4 deliverables. Here's my read on this:
Now you can have as many FCP ports (128) as an IBM System Storage DS8300, although the maximum number of FICON ports is still short, and no mention of ESCON support. The Ethernet ports appear to be 1Gb, not the new 10GbE you might expect.
Support for System z mainframe
V-Max adds some new support to catch up with the DS8000, like Extended Address Volumes (EAV). EMC is still not quite there yet. IBM DS8000 continues to be the best, most feature-rich storage option if you have System z mainframe servers.
Both the IBM DS8000 and HDS USP-V beat the DMX-4 in performance, and in some cases the DMX-4 even lost to the IBM XIV, so EMC had to do something about it. EMC chooses not to participate in industry-standard performance benchmarks like those from the [Storage Performance Council], which limits them to vague comparisons against older EMC gear. I'll give EMC engineers the benefit of the doubt and say that now V-Max is now "comparably as fast as HDS and IBM offerings".
Getting "V" in the name
The "V" appears to be for the roman number five, not to be confused with external heterogeneous storage virtualization that HDS USP-V and IBM SVC provide. There is no mention of synergy with EMC's failed "Invista" product, and I see no support for attaching other vendors disk to the back of this thing.
Switch to Intel processor
Apple switched its computers from PowerPC to Intel-based, and now EMC follows in the same path. There are some custom ASICs still in V-Max, so it is not as pure as IBM's offerings.
Modular, XIV-like Scale-out Architecture
Actually, the packaging appears to follow the familiar system bays and storage bays of the DMX-4 and DMX-4 950 models, but architecturally offers XIV-like attachment across a common switch network between "engines", EMC's term for interface modules.
Non-disruptive data migration
IBM's SoFS, DR550 and GMAS have this already, as does as anything connected behind an IBM SAN Volume Controller.
A long time ago, IBM used to have midrange disk storage systems called "FAStT" which stood for Fibre Array Storage Technology, so this might have given EMC the idea for their "Fully Automated Storage Tiering" acronym. The concept appears similar to what IBM introduced back in 2007 for the Scale-Out-File Services [SofS] which not only provides policy-based placement, movement and expiration on different disk tiers, includes tape tiers as well for a complete solution. I don't see anything in the V-Max announcement that it will support tape anytime soon.
And what ever happend to EMC's Atmos? Wasn't that supposed to be EMC's new direction in storage?
Zero-data loss Three-site replication
IBM already calls this Metro/Global Mirror for its IBM DS8000 series, but EMC chose to call it SRDF/EDP for Extended Distance Protection.
Ease of Use
The most significant part of the announcement is that EMC is finally focusing on ease-of-use.In addition to reducing the requirement for "Bin File" modifications, this box has a redesigned user interface to focus on usability issues. For past DMX models, EMC customers had to either hire EMC to do tasks for them that were just to difficult otherwise, or buy expensive software like their EMC Control Center to manage. EMC willcontinue to sell DMX-4 boxes for a while, as they are probably supply-constrained on the V-Max side, but I doubt they will retro-fit these new features back to DMX-3 and DMX-4.
When IBM announced its acquisition of XIV over a year ago now, customers were knocking down our doors to get one. This caught two particular groups looking like a [deer in headlights]:
EMC Symmetrix sales force: Some of the smarter ones left EMC to go sell IBM XIV, leaving EMC short-handed and having to announce they [were hiring during their layoffs]. Obviously, a few of the smart ones stayed behind, to convince their management to build something like the V-Max.
IBM DS8000 sales force: If clients are not happy with their existing EMC Symmetrix, why don't they just buy an IBM DS8000 instead? What does XIV have that DS8000 doesn't?
Let me contrast this with the situation Microsoft Windows is currently facing.
I am often asked by friends to help them pick out laptops and personal computers. I use Linux, Windows and MacOS, so have personal experience with all three operating systems.
Linux is cheaper, offers the power-user the most options for supporting older, less-powerfulequipment, but I wouldn't have my Mom use it. While distributions like Ubuntu are makinggreat strides, it is just too difficult for some people.
MacOS is nice, I like it, it works out of the box with little or no customization and an intuitive interface. However, some of my friends don't make IBM-level salaries, and have to watch their budget.
In their "I'm a PC" campaign, Microsoft is fighting both fronts. Let's examine two commercials:
In the first commercial, a young eight-year-old puts together a video from pictures oftoy animals and some background music.The message: "Windows is easier to use than Linux!" If they really wanted to send this message, they should have shown senior citizens instead.
In the second commercial, a young college student is asked to find a laptop with 17 inchscreen, and a variety of other qualifications, for under $1000 US dollars. The only modelat the Apple store below this price had a 13 inch screen, but she finds a Windows-based system that had this size screen and met all the other qualifications. The message: "Windows-based hardware from a variety of competitors are less expensive than hardware from Apple!"
Both Microsoft and Apple charge a premium for ease-of-use.In the storage world, things are completely opposite. Vendors don't charge a premium forease-of-use. In fact, some of the easiest to use are also the least expensive.
If you just have Windows and Linux, you can get some entry level system likethe IBM DS3000 series, only a few features, and can be set up in six simple steps.
Next, if you have a more interesting mix of operating systems, Linux, Windows and some flavorsof UNIX like IBM AIX, HP-UX or Sun Solaris, then you might want the features and functionsof more pricier midrange offerings. More options means that configuration and deploymentis more difficult, however.
Finally, if you are serious Fortune 500 company, running your mission critical applications on System z or System i centralized systems in a big data center, that you might be willing to pay top dollar for the most feature-rich offerings of an Enterprise-class machine.Thankfully you have an army of highly-trained staff to handle the highest levels of complexity.
IBM's DS8000, HDS USP-V and EMC's Symmetrix are the key players in the Enterprise-classspace. They tried to be ["all things to all people"], er.. perhaps all things to allplatforms. All of the features and functions came at a price, not just in dollars, butin complexity and difficulty. You needed highly skilled storage admins using expensive storage management software, or be willing to hirethe storage vendor's premium services to get the job done.
IBM recognized this trend early. IBM's SVC, N series and now XIV all offer ease-of-use withenterprise-class features and functions, at lower total cost of ownership than traditional enterprise-class systems. IBM is not the only one, of course, as smaller storage start-ups like 3PAR,Pillar Data Systems, Compellent, and to some extent Dell's EqualLogic all recognized thisand developed clever offerings as well.
While IBM's XIV may not have been the first to introduce a modular, scale-out architectureusing commodity parts managed by sophisticated ease-of-use interfaces, its success might have been the kick-in-the-butt EMC needed to follow the rest of the industry in this direction.
You may not be the right person to ask but I am asking everyone so "How do you see hybrid disk drives?"
(For the record, I am not immediately related to Robert. At onepoint, "Pearson" was the 12th most common surname in the USA, but now doesn't even make the Top 100.)
Robert, I would like to encourage you and everyone else to ask questions, don't worry if I am the wrong person to ask, asprobably I know the right person within IBM. Some people have called me the "Kevin Bacon" of Storage,as I am often less than six degrees away from the right person, having worked in IBM Storage for over 20 years.
For those not familiar with hybrid drives, there is a good write-up in Wikipedia.
Unfortunately, most of the people I would consult on this question, such as those from Market Intelligence or Research, are on vacation for the holidays, so, Robert, I will have to rely on my trusted 78-card Tarot deck and answer you with a five-card throw.
Your first card, Robert, is the Hermit. This card represents "introspection". The best I/O is no I/O, which means that if applications can keep the information they need inside server memory, you can avoid the bus bandwidth limitations to going to external storage devices. Where external storage makes sense is when data is shared between servers, or when the single server is limited to a set amount of internal memory. So, consider maxing out the memory in your server first (IBM would be glad to sell you more internal memory!!!), then consider outside solid-state or hybrid devices. Windows for example has an architectural limit of 4GB.
Your second card, Robert, is the Four of Cups, representing "apathy".On the card, you see three cups together, with the fourth cup being delivered from a cloud. This reminds me thatwe have three storage tiers already (memory,disk,tape), and introducing a fourth tier into the mix may not garnermuch excitement. For the mainframe, IBM introduced a Solid-State Device, call the Coupling Facility, which can be accessed from multipleSystem z servers. It is used heavily by DFSMS and DB2 to hold shared information. However, given some customer's apathytowards Information Lifecycle Management which includes "tiered storage", introducing yet another tier that forcespeople to decide what data goes where may be another challenge.
Your third card, Robert, is the Chariot, which represents "Speed, Determination,and Will". In some cases, solid state disk are faster for reading, but can be slower for writing. In the case of ahybrid drive, where the memory acts as a front-end cache, read-hits would be faster, but read-misses might be slower.While the idea of stopping the drives during inactivity will reduce power consumption, spinning up and slowing downthe disk may incur additional performance penalties. At the time of this post, the fastest disk system remains the IBM SAN Volume Controller, based on SPC-1 and SPC-2 benchmarks in excess of those published for other devices.
Your fourth card, Robert, is the Eight of Pentacles, which represents"Diligence, Hard work". The pentacles are coins with five-sided stars on them, and this often represents money.Our research team has projected that spinning disk will continue to be a viable and profitable storage media for at least anothereight years.
Your fifth and last card, Robert, is the World, which normallyrepresents "Accomplishment", but since it is turned upside down, the meaning is reversed to "Limitation". Some Hybriddisks, and some types of solid state memory in general, do have limitations in the number of write cycles they can handle. For thoseunhappy with the frequency and slowness for rebuilds on SATA disk may find similar problems with hybrid drives.For that reason, businesses may not trust using hybrid drives for their busiest, mission-critical applications, but certainlymight use it for archive data with lower write-cycle requirements.
The tarot cards are never wrong, but certainly interpretations of the cards can be.
EMC Corporation (NYSE:EMC) today announced it has been positioned as a leader in the Forrester Wave™: Enterprise Open Systems Virtual Tape Library (VTL), Q1 2008 by Forrester Research, Inc. (January 31, 2008), an independent market and technology research firm. EMC achieved a position as a leader in the Forrester Wave report on virtual tape libraries based on the largest installed base of the EMC® Disk Library family of systems, its broad ecosystem interoperability. Virtual tape libraries emulate tape drives and work in conjunction with existing backup software applications, enabling fast backup and restoration of data by using high-capacity, low-cost disk drives.
EMC was the first major vendor in the open systems virtual tape library market as it introduced the EMC Disk Library in April 2004 and today is a leading provider of open systems virtual tape solutions, with systems that are designed for businesses and organizations of all sizes.
While the press release implies that "EDL equals VTL", Chuck tries to explain they are in fact very different. Here is an excerpt from his blog post:
Virtual Tape Libraries vs. Disk Libraries
As many of you know, VTLs have been around for a while. They use disk as a cache -- they buffer the incoming backup streams, do some housekeeping and stacking, then turn around and write tape efficiently. When you go to restore, you're usually coming back off of tape, unless the backup image in question is sitting in the disk cache.
Now, there is nothing wrong with the VTL approach, but it was conceived in a time when disks were horribly expensive. It was also pretty clear to many of us that disks were going to be a whole lot cheaper in the near future, and this fundamental assumption wouldn't be valid for much longer.
I kept thinking in terms of disk as a direct target for a backup application. No modifications to the backup application. Native speed of sequential disks for both backup and restore. Tape positioned as a backup to the backup. Use the strengths of the underlying array (e.g. CLARiiON) for performance, availability, management, etc.
We ended up calling the concept a "disk library" to differentiate from the VTLs that had come before it. It was a different value proposition and offering, based on the emergence of lower-cost disk media.
... It's nice to see we're at 1,100+ customers, and still going strong.
For those new to the blogosphere, there is a difference between "Press Releases" as formalcorporate communications versus "Blog Posts" which are informal opinions of the individual blogger, whichmay or may not match exactly the views of their respective employer.As we've learned many times before, one should not treat termslike "first" or "leader" in corporate press releases literally! Let's explore each.
Was EDL the first "open systems" Virtual Tape Library?
This is implied by the Forrester report. Chuck mentions the "VTLs that had came before it" in his blog, and many people are aware that IBM and StorageTek had introduced mainframe-attached VTLs in the 1990s. But what about VTL for "open systems"?
(Hold aside for the moment that IBM System zmainframe is an open system itself, with z/OS certified as a bona fide UNIX operating system by the [the Open Group] standards body. Most analysts and research firms usually refer only to the non-mainframe versions of UNIX and Windows. Alternative definitions for "open systems" can be foundin [Web definitions or Wikipedia]. I will assume Forrester meantnon-mainframe servers.)
IBM announced AIX non-mainframe attachment via SCSI connectivity to the IBM 3494 Virtual Tape Server (VTS) on Feb 16, 1999, with general availability in May 28, 1999. That's nearly FIVE YEARS before the April 2004 introduction of EDL. IBM VTS support for Sun Solaris and Microsoft Windows came shortly thereafter in November 2000, and support for HP-UX a bit later in June 2001. One of my 17 patents is for the software inside the IBM 3494 VTS, so like Chuck, I can takesome pride in the success of a successful product.
(I don't remember if StorageTek, which was subsequently acquired by Sun, had ever supported non-mainframe operating systems with their Virtual Storage Manager[VSM] offering, but if they did, I am sure it was also before EMC.)
Last week, another EMC blogger, BarryB (aka [the Storage Anarchist]),took me to task in comments on my post [IBM now supports 1TB SATA drives]. He felt that IBM should not claim support, given that the software inside the IBM System Storage N series is developed by NetApp. He compared this to the situation of HP and Sun re-badging the HDS USP-V disk system. If someone else wrote the software, BarryB opines, IBM should not claim credit for it. I tried to explain how IBM provides added value and has full-time employees dedicated to N series development and support, butdoubt I have changed his mind.
Why do I bring that up? Because the EMC Disk Library runs OEM software from FalconStor. Basically EMC is assembling a hardware/software solution with components provided from OEM suppliers. Hmmm? Sound familiar? Who is calling the kettle black?
If there is a clear winner here, it is FalconStor itself.Perhaps one of the worst kept industry secrets is that FalconStor software is also used in VTL offerings from Sun, Copan, and IBM, the latter embodied as the [IBM TS7520 Virtualization Engine] offering. If you like the concept of an EDL,but prefer instead one-stop shopping from an "information infrastructure" vendor, IBM can offer the TS7520 along with servers, software and services for a complete end-to-end solution.
Can EMC claim to be "a leader" in Virtual Tape Libraries?
During the measured quarter, IBM shipped its 10 millionth LTO-4 tape drive cartridge to Getty Images, the world's leading creator and distributor of still imagery, footage and multi-media products, as well as a recognized provider of other forms of premium digital content, including music. Getty Images is using the LTO-4 drives as part of a tiered infrastructure of IBM disk and tape solutions that help support the backup needs of their digital imagery;
IBM shipped more than 1,500 Petabytes of tape storage in Q3'07 alone;
During Q3'07, IBM shipped the 10,000th IBM System Storage TS3500 Tape Library. The TS3500 is a highly scalable tape library with support from 1 to 192 tape drives and up to 6,400 cartridge slots for open system, mainframe and virtual tape system attachment.
Let's take a look at the numbers. IBM has sold over 5,400 virtual tape libraries. Sun/STK has sold over 4,000 virtual tape libraries. Both are drastically more than the 1,100 mentioned in Chuck's post. Does IDC recognize EMC in third place? No, EMC chooses instead to declare EDL as disk arrays (probably toprop up their IDC "Disk Tracker" numbers), so they don't even earn an honorable mention under the virtual tape librarycategory. This of course includes the number of mainframe-attached models from IBM and Sun/STK. So, if EMC did call these tape systems instead, they might showup in third place, and as such EMC could claim to be "a leader" in much the same way an athlete can claim to be an "Olympic medalist" winning the bronze for third place. (If you limit thecount to just the FalconStor-based models from IBM, EMC, Sun and Copan, then EMC moves up to first or second, but then press release titles like "EMC a Leader in FalconStor-based non-mainframe Virtual Tape Libraries" can get too confusing.)
Chuck, if you are reading this, I feel you have every right to celebrate your involvement with the EDL. Despite having common software and hardware components, both IBM and EMC can rightfully declare their own unique value-add through their respective VTL offerings. Like the IBM N series, the EMC Disk Library is not diminished by the fact the software was written by someone else. BarryB might disagree.
Well, it's Tuesday again, and that means more announcements from IBM!
In conjunction with IBM's new [System z10 Business Class (BC)] mainframe designed for Small and Medium-sized Businesses (SMB), IBM also announced related storage productenhancements.
Yes, it's alive! Contrary to the FUD you might have read from our competitors, IBM continues to sell thousands and thousands of IBM System Storage DS6800 disk systems, and now enhances them with the optionfor 450GB 15K RPM drives. What is nice about these 450GB drives is that they are as fast or faster* than 300GBdrives, so the typical trade-off between performance and capacity do not apply.
(* I compared Seagate 15.6K (450GB) with 15.5K (300GB) models.
Avg Seek time (Read)
Avg Seek time (Write)
Full Seek time (Read)
Full Seek time (Write)
This may or may not result in application performance improvements, depending on workload pattern. Your mileage may vary.)
Our clients report back that these are incredibly stable systems that they don't have toworry about. This enhancement applies to both the [511/EX1 models] and [522/EX2 models].
Understanding that clients want complete solutions from single vendors, IBM offers synergy between System z and the IBM System Storage DS8000 disk systems. The latest R4.1 microcode upgrade offers two key features onthe various models [2107,
zHPF - High Performance FICON for System z. IBM was able to increase the throughput on 4 Gbps links. For OLTP workloads randomly accessing 4KB blocks, IBM internal tests showed zHPF doubled performance from 13,000 IOPSto 26,000 IOPS per channel. For sequential workloads, such as batch processing, zHPF increased performance 50 percent, from 350 MB/sec to 525 MB/sec.
In February, IBM previewed[IncrementalResync] for z/OS Metro Global Mirror. However, some concepts are better explained with pictures.
One way to set up a 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z, you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and also have site "A" sending data over to site "C" asynchronously using z/OS Global Mirror. This is called "z/OS Metro Global Mirror".
In the past, if the disk system in site A failed, you would switch over to site B, which would have to resend send all the data again to site C to be resynchronized. This is because site B was not tracking what the System Data Mover (SDM) reader had or had not yet processed.
With DS8000 4.1, the "incremental resync" function that, along with using IBM HyperSwap, requires site B to only send and resync the data that was in-flight when the outage occurred. When you compare the difference in sending this limited amount of in-flight data with the traditional complete volume of data, you can see how "Incremental Resync" can resynchronize the data 95% faster, and also greatly decrease your bandwidth requirements. This reduces the risk in case a subsequent outage occurs.
Introduced originally in 1997 as the IBM Virtual Tape Server (VTS), the [IBMSystem Storage TS7700] series supports Grid capabilityto replicate tape image data across locations. Here's a quick recap of today's announcement:
Existing TS7740 can be upgraded up to 9TB of disk cache. New models can have up to 13TB of disk cache.
A new "tape-less" TS7720 that has up to 70TB of disk cache.
Integrate Library Management support. I discussed[IntegratedRemovable Media Manager (IRMM)] before, and this is basically IRMM inside. For those with TS3500 tape libraries,this support eliminates the need for a separate IBM 3953 L05 Library Manager.
TS1130 back-end tape drive support. These are the fastest 1TB drives in the industry, with support of built-in encryption, and now can be used asthe physical tape back-end for the virtual tape TS7740 repository.
While our competitors might be boarding up their windows in preparation for the economic downturn in the USAeconomy, IBM remains generating solid results. San Jose Mercury News has an article that discusses this titled[IBM's 3Q profit strong on global sales].There has never been a better time to buy from, or invest in, IBM!
Continuing my week in Chicago, for the IBM Storage Symposium 2008, I attended two presentations on XIV.
XIV Storage - Best Practices
Izhar Sharon, IBM Technical Sales Specialist for XIV, presented best practices using XIV in various environments.He started out explaining the innovative XIV architecture: a SATA-based disk system from IBM can outperformFC-based disk systems from other vendors using massive parallelism. He used a sports analogy:
"The men's world record for running 800 meters was set in 1997 by Wilson Kipketer of Denmark in a time of 1:41.11.
However, if you have eight men running, 100 meters each, they will all cross the finish line in about 10 seconds."
Since XIV is already self-tuning, what kind of best practices are left to present? Izhar presented best practicesfor software, hosts, switches and storage virtualization products that attach to the XIV. Here's some quickpoints:
Use as many paths as possible.
IBM does not require you to purchase and install multipathing software as other competitors might. Instead, theXIV relies on multipathing capabilities inherent to each operating system.For multipathing preference, choose Round-Robin, which is now available onAIX and VMware vSphere 4.0, for example. Otherwise, fixed-path is preferred over most-recently-used (MRU).
Encourage parallel I/O requests.
XIV architecture does not subscribe to the outdated notion of a "global cache". Instead, the cache is distributed across the modules, to reduce performance bottlenecks. Each HBA on the XIV can handle about 1400requests. If you have fewer than 1400 hosts attached to the XIV, you can further increase parallel I/O requests by specifying a large queue depth in the host bus adapter (HBA).An HBA queue depth of 64 is a good start. Additional settings mightbe required in the BIOS, operating system or application for multiple threads and processes.
For sequential workloads, select host stripe size less than 1MB. For random, select host stripe size larger than 1MB. Set rr_min_io between ten(10) and the queue depth(typically 64), setting it to half of the queue depth is a good starting point.
If you have long-running batch jobs, consider breaking them up into smaller steps and run in parallel.
Define fewer, larger LUNs
Generally, you no longer need to define many small LUNs, a practice that was often required on traditionaldisk systems. This means that you can now define just 1 or 2 LUNs per application, and greatly simplifymanagement. If your application must have multiple LUNs in order to do multiple threads or concurrent I/O requests, then, by all means, define multiple LUNs.
Modern Data Base Management Systems (DBMS) like DB2 and Oracle already parallelize their I/O requests, sothere is no need for host-based striping across many logical volumes. XIV already stripes the data for you.If you use Oracle Automated Storage Management (ASM), use 8MB to 16MB extent sizes for optimal performance.
For those virtualizing XIV with SAN Volume Controller (SVC), define manage disks as 1632GB LUNs, in multiple of six LUNs per managed disk group (MDG), to balance across the six interface modules. Define SVC extent size to 1GB.
XIV is ideal for VMware. Create big LUNs for your VMFS that you can access via FCP or iSCSI.
Organize data to simplify Snapshots.
You no longer need to separate logs from databases for performance reasons. However, for some backup productslike IBM Tivoli Storage Manager (TSM) for Advanced Copy Services (ACS), you might want to keep them separatefor snapshot reasons. Gernally, putting all data for an application on one big LUNgreatly simplifies administration and snapshot processing, without losing performance.If you define multiple LUNs for an application, simply put them into the same "consistencygroup" so that they are all snapshot together.
OS boot image disks can be snapshot before applying any patches, updates or application software, so that ifthere are any problems, you can reboot to the previous image.
Employ sizing tools to plan for capacity and performance.
The SAP Quicksizer tool can be used for new SAP deployments, employing either the user-based orthroughput-based sizing model approach. The result is in mythical unit called "SAPS", which represents0.4 IOPS for ERP/OLTP workloads, and 0.6 IOPS for BI/BW and OLAP workloads.
If you already have SAP or other applications running, use actual I/O measurements. IBM Business Partners and field technical sales specialists have an updated version of Disk Magic that can help size XIV configurations fromPERFMON and iostat figures.
Lee La Frese, IBM STSM for Enteprise Storage Performance Engineering, presented internal lab test results forthe XIV under various workloads, based on the latest hardware/software levels [announced two weeks ago]. Three workloadswere tested:
Web 2.0 (80/20/40) - 80 percent READ, 20 percent WRITE, 40 percent cache hits for READ.YouTube, FlickR, and the growing list at [GoWeb20] are applications with heavy read activity, but because of[long-tail effects], may not be as cache friendly.
Social Networking (50/50/50) - 50 percent READ, 50 percent WRITE, 50 percent cache hits for READ.Lotus Connections, Microsoft Sharepoint, and many other [social networking] usage are more write intensive.
Database (70/30/50) - 70 percent READ, 30 percent WRITE, 50 percent cache hits for READ.The traditional workload characteristics for most business applications, especially databases like DB2 andOracle on Linux, UNIX and Windows servers.
The results were quite impressive. There was more than enough performance for tier 2 application workloads,and most tier 1 applications. The performance was nearly linear from the smallest 6-module to the largest 15-module configuration. Some key points:
A full 15-module XIV overwhelms a single SVC 8F4 node-pair. For a full XIV, consider 4 to 8 nodes 8F4 models, or 2 to 4 nodes of an 8G4. For read-intensive cache-friendly workloads, an SVC in front of XIV was able to deliver over 300,000 IOPS.
A single node TS7650G ProtecTIER can handle 6 to 9 XIV modules. Two nodes of TS7650G were needed to drivea full 15-module XIV. A single node TS7650 in front of XIV was able to ingest 680 MB/sec on the seventh day with17 percent per-day change rate test workload using 64 virtual drives. Reading the data back got over 950 MB/sec.
For SAP environments where response time 20-30 msec are acceptable, the 15-module XIV delivered over 60,000 IOPS. Reducing this down to 25,000-30,000 cut the msec response time to a faster 10-15 msec.
These were all done as internal lab tests. Your mileage may vary.
Not surprisingly, XIV was quite the popular topic here this week at the Storage Symposium. There were many moresessions, but these were the only two that I attended.
Well, it's Tuesday, and that means IBM announcements! Today is bigger, as there are a lot of Dynamic Infrastructure announcements throughout the company with a common theme, cloud computing and smart business systems that support the new way of doing things. Today, IBM announced its new "IBM Smart Archive" strategy that integrates software, storage, servers and services into solutions that help meet the challenges of today and tomorrow. IBM has been spending the past few years working across its various divisions and acquisitions to ensure that our clients have complete end-to-end solutions.
IBM is introducing new "Smart Business Systems" that can be used on-premises for private-cloud configurations, as well as by cloud-computing companies to offer IT as a service.
IBM [Information Archive] is the first to be unveiled, a disk-only or blended disk-and-tape Information Infrastructure solution that offers a "unified storage" approach with amazing flexibility for dealing with various archive requirements:
For those with applications using the IBM Tivoli Storage Manager (TSM) or IBM System Storage Archive Manager (SSAM) API of the IBM System Storage DR550 data retention solution, the Information Archive will provide a direct migration, supporting this API for existing applications.
For those with IBM N series using SnapLock or the File System Gateway of the DR550, the Information Archive will support various NAS protocols, deployed in stages, including NFS, CIFS, HTTP and FTP access, with Non-Erasable, Non-Rewriteable (NENR) enforcement that are compatible with current IBM N series SnapLock usage.
For those using NAS devices with PACS applications to store X-rays and other medical images, the Information Archive will provide similar NAS protocol interfaces. Information Archive will support both read-only data such as X-rays, as well as read/write data such as Electronic Medical Records.
Information Archive is not just for compliance data that was previously sent to WORM optical media. Instead, it can handle all kinds of data, rewriteable data, read-only data, and data that needs to be locked down for tamper protection. It can handle structured databases, emails, videos and unstructured files, as well as objects stored through the SSAM API.
The Information Archive has all the server, storage and software integrated together into a single machine type/model number. It is based on IBM's General Parallel File System (GPFS) to provide incredible scalability, the same clustered file system used by many of the top 500 supercomputers. Initially, Information Archive will support up to 304TB raw capacity of disk and Petabytes of tape. You can read the [Spec Sheet] for other technical details.
For those who prefer a more "customized" approach, similar to IBM Scale-Out File Services (SoFS), IBM has [Smart Business Storage Cloud]. IBM Global Services can customize a solution that is best for you, using many of the same technologies. In fact, IBM Global Services announced a variety of new cloud-computing services to help enterprises determine the best approach.
In a related announcement, IBM announced [LotusLive iNotes], which you can think of as a "business-ready" version of Google's GoogleApps, Gmail and GoogleCalendar. IBM is focused on security and reliability but leaves out the advertising and data mining that people have been forced to tolerate from consumer-oriented Web 2.0-based solutions. IBM's clients that are already familiar with on-premises version of Lotus Notes will have no trouble using LotusLive iNotes.
There was actually a lot more announced today, which I will try to get to in later posts.
The weather has warmed up here in Tucson so I started my Spring Cleaning early this year and unearthed from my garage a [Bankers Box] full of floppy diskettes.
IBM invented the floppy disk back in 1971, and continued to make improvements and enhancements through the 1980s and 1990s. It will be one of the many inventions celebrated as part of IBM's Centennial (100-year) anniversary. Here is an example [T-shirt]
IBM needed a way to send out small updates and patches for microcode of devices out in client locations. IBM had drives that could write information, and sent out "read-only" drives to the customer locations to receive these updates. These were flexible plastic circles with a magnetic coating, and placed inside a square paper sleeve. Imagine a floppy disk the size of a piece of standard paper. The 8-inch floppy fit conveniently in a manila envelope, sendable by standard mail, and could hold nearly 80KB of data.
I've been using floppies for the past thirty years. Here's some of my fondest memories:
While still in high school, my friend Franz Kurath and I formed "Pearson Kurath Systems", a software development firm. We wrote computer programs to run on UNIX and Personal Computers for small businesses here in Tucson. Whenever we developed a clever piece of code, a subroutine or procedure, we would save it on a floppy disk and re-use it for our next project. We wrote in the BASIC language, and our databases were simple Comma-Separated-Variable (CSV) flat files.
The 5.25-inch floppies we used could hold 360KB, and were flexible like the 8-inch models. Later versions of these 5.25-inch floppies would be able to hold as much as 1.2MB of data. We would convert single-sided floppies into double-sided ones by cutting out a notch in the outer sleeve. Covering up the notches would mark them as read-only.
The 3.5-inch floppies were introduced with a hard plastic shell, with the selling point that you can slap on a mailing label and postage and send it "as is" without the need for a separate envelope. These new 3.5-inch floppies would carry "HD" for high density 720KB, and double-sided versions could hold 1.44MB of data. The term "diskette" was used to associate these new floppies with [hard-shelled tape cassettes]. Sliding a plastic tab would allow floppies to be marked "read-only". IBM has the patent on this clever invention.
Continuing our computer programming business in college, Franz and I took out a bank loan to buy our first Personal Computer, for over $5000 dollars USD. Until then, we had to use equipment belonging to each client. The banks we went to didn't understand why we needed a computer, and suggested we just track our expenses on traditional green-and-white ledger paper. Back then, peronsal computers were for balancing your checkbook, playing games and organizing your collection of cooking recipies. But for us, it was a production machine. A computer with both 5.25-inch and 3.5-inch drives could copy files from one format to another as needed. The boost in productivity paid for itself within months.
Apple launched its Macintosh computer in 1984, with a built-in 3.5-inch disk drive as standard equipment. Here is a YouTube video of an [astronaut ejecting a floppy disk] from an Apple computer in space.
In my senior year at the University of Arizona, my roommate Dave had borrowed my backpack to hold his lunch for a bike ride. He thought he had taken everything out, but forgot to remove my 3.5-inch floppy diskette containing files for my senior project. By the time he got back, the diskette was covered in banana pulp. I was able to rescue my data by cracking open the plastic outer shell, cleaning the flexible magnetic media in soapy water, placing it back into the plastic shell of a second diskette, and then copied the data off to a third diskette.
After graduating from college, Franz and I went our separate ways. I went to work for IBM, and Franz went to work for [Chiat/Day], the advertising agency famous for the 1984 Macintosh commercial. We still keep in touch through Facebook.
At IBM, I was given a 3270 terminal to do my job, and would not be assigned a personal computer until years later. Once I had a personal computer at home and at work, the floppy diskette became my "briefcase". I could download a file or document at work, take it home, work on it til the wee hours of the morning, and then come back the next morning with the updated effort.
To help prepare me for client visits and public speaking at conferences, IBM loaned me out to local schools to teach. This included teaching Computer Science 101 at Pima Community College. When asked by a student whether to use "disc" or "disk", I wrote a big letter "C" on the left side of the chalkboard, and a big letter "K" on the right side. If it is round, I told the students while pointing at the letter "C", like a CD-ROM or DVD, use "disc". If it has corners, pointing to corners of the letter "K", like a floppy diskette or hard disk drive, use "disk".
On one of my business trips to visit a client, we discovered the client had experienced a problem that we had just recently fixed. Normally, this would have meant cutting a Program Trouble Fix (PTF) to a 3480 tape cartridge at an IBM facility, and send it to the client by mail. Unwilling to wait, I offered to download the PTF onto a floppy diskette on my laptop, upload it from a PC connected to their systems, and apply it there. This involved a bit of REXX programming to deal with the differences between ASCII and EBCDIC character sets, but it worked, and a few hours later they were able to confirm the fix worked.
In 1998, Apple would signal the begining of the end of the floppy disk era, announcing their latest "iMac" would not come with an internal built-in floppy drive. David Adams has a great article on this titled [The iMac and the Floppy Drive: A Conspiracy Theory]. You can get external floppy drives that connect via USB, so not having an internal drive is no longer a big deal.
While teaching a Top Gun class to a mix of software and hardware sales reps, one of the students asked what a "U" was. He had noticed "2U" and "3U" next to various products and wondered what that was referring to. The "U" represents the [standard unit of measure for height of IT equipment in standard racks]. To help them visualize, I explained that a 5.25-inch floppy disk was "3U" in size, and a 3.5-inch floppy diskette was "2U". Thus, a "U" is 1.75 inches, the thinnest dimension on a two-by-four piece of lumber. Servers that were only 1U tall would be referred to as "pizza boxes" for having similar dimensions.
Every year, right around November or so, my friends and family bring me their old computers for me to wipe clean. Either I would re-load them with the latest Ubuntu Linux so that their kids could use it for homework, or I would donate it to charity. Last November, I got a computer that could not boot from a CD-ROM, forcing me to build a bootable floppy. This gave me a chance to check out the various 1-disk and 2-disk versions of Linux and other rescue disks. I also have a 3-disk set of floppies for booting OS/2 in command line mode.
So while this unexpected box of nostalgia derailed my efforts to clean out my garage this weekend, it did inspire me to try to get some of the old files off them and onto my PC hard drive. I have already retrieved some low-res photographs, some emails I sent out, and trip reports I wrote. While floppy diskettes were notorious for being unreliable, and this box of floppies has been in the heat and cold for many Arizonan summers and winters, I am amazed that I was able to read the data off most of them so far, all the way back to data written in 1989. While the data is readable, in most cases I can't render it into useful information. This brings up a few valuable lessons:
Backups are not Archives
Some of the files are in proprietary formats, such as my backups for TurboTax software. I would need a PC running a correct level of Windows operating system, and that particular software, just to restore the data. TurboTax shipped new software every year, and I don't know how forward or backward-compatible each new release was.
Another set of floppies are labeled as being in "FDBACK" format. I have no idea what these are. Each floppy has just two files, "backup.001" and "control.001", for example.
Backups are intended solely to protect against unexpected loss from broken hardware or corrupted data. If you plan to keep data as archives for long-term retention, use archive formats that will last a long time, so that you can make sense of them later.
Operating System Compatibility
Windows 7 and all of my favorite flavors of Linux are able to recognize the standard "FAT" file system that nearly all of my floppies are written in. Sadly, I have some files that were compressed under OS/2 operating system using software called "Stacker". I may have to stand up an OS/2 machine just to check out what is actually on those floppies.
You can't judge a book by its cover
Floppies were a convenient form of data interchange. Sometimes, I reused commercially-labeled floppies to hold personal files. So, just because a floppy says "America On-Line (AOL) version 2.5 Installation", I can't just toss it away. It might actually contain something else entirely. This means I need to mount each floppy to check on its actual contents.
So what will I do with the floppies I can't read, can't write, and can't format? I think I will convert them into a [retro set of coasters], to protect my new living room furniture from hot and cold beverages.
Well, this week I am in Maryland, just outside of Washington DC. It's a bit cold here.
Robin Harris over at StorageMojo put out this Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun about the results of two academic papers, one from Google, and another from Carnegie Mellon University (CMU). The papers imply that the disk drive module (DDM) manufacturers have perhaps misrepresented their reliability estimates, and asks major vendors to respond. So far, NetAppand EMC have responded.
I will not bother to re-iterate or repeat what others have said already, but make just a few points. Robin, you are free to consider this "my" official response if you like to post it on your blog, or point to mine, whatever is easier for you. Given that IBM no longer manufacturers the DDMs we use inside our disk systems, there may not be any reason for a more formal response.
Coke and Pepsi buy sugar, Nutrasweet and Splenda from the same sources
Somehow, this doesn't surprise anyone. Coke and Pepsi don't own their own sugar cane fields, and even their bottlers are separate companies. Their job is to assemble the components using super-secret recipes to make something that tastes good.
IBM, EMC and NetApp don't make DDMs that are mentioned in either academic study. Different IBM storage systems uses one or more of the following DDM suppliers:
Seagate (including Maxstor they acquired)
Hitachi Global Storage Technologies, HGST (former IBM division sold off to Hitachi)
In the past, corporations like IBM was very "vertically-integrated", making every component of every system delivered.IBM was the first to bring disk systems to market, and led the major enhancements that exist in nearly all disk drives manufactured today. Today, however, our value-add is to take standard components, and use our super-secret recipe to make something that provides unique value to the marketplace. Not surprisingly, EMC, HP, Sun and NetApp also don't make their own DDMs. Hitachi is perhaps the last major disk systems vendor that also has a DDM manufacturing division.
So, my point is that disk systems are the next layer up. Everyone knows that individual components fail. Unlike CPUs or Memory, disks actually have moving parts, so you would expect them to fail more often compared to just "chips".
If you don't feel the MTBF or AFR estimates posted by these suppliers are valid, go after them, not the disk systems vendors that use their supplies. While IBM does qualify DDM suppliers for each purpose, we are basically purchasing them from the same major vendors as all of our competitors. I suspect you won't get much more than the responses you posted from Seagate and HGST.
American car owners replace their cars every 59 months
According to a frequently cited auto market research firm, the average time before the original owner transfers their vehicle -- purchased or leased -- is currently 59 months.Both studies mention that customers have a different "definition" of failure than manufacturers, and often replace the drives before they are completely kaput. The same is true for cars. Americans give various reasons why they trade in their less-than-five-year cars for newer models. Disk technologies advance at a faster pace, so it makes sense to change drives for other business reasons, for speed and capacity improvements, lower power consumption, and so on.
The CMU study indicated that 43 percent of drives were replaced before they were completely dead.So, if General Motors estimated their cars lasted 9 years, and Toyota estimated 11 years, people still replace them sooner, for other reasons.
At IBM, we remind people that "data outlives the media". True for disk, and true for tape. Neither is "permanent storage", but rather a temporary resting point until the data is transferred to the next media. For this reason, IBM is focused on solutions and disk systems that plan for this inevitable migration process. IBM System Storage SAN Volume Controller is able to move active data from one disk system to another; IBM Tivoli Storage Manager is able to move backup copies from one tape to another; and IBM System Storage DR550 is able to move archive copies from disk and tape to newer disk and tape.
If you had only one car, then having that one and only vehicle die could be quite disrupting. However, companies that have fleet cars, like Hertz Car Rentals, don't wait for their cars to completely stop running either, they replace them well before that happens. For a large company with a large fleet of cars, regularly scheduled replacement is just part of doing business.
This brings us to the subject of RAID. No question that RAID 5 provides better reliability than having just a bunch of disks (JBOD). Certainly, three copies of data across separate disks, a variation of RAID 1, will provide even more protection, but for a price.
Robin mentions the "Auto-correlation" effect. Disk failures bunch up, so one recent failure might mean another DDM, somewhere in the environment, will probably fail soon also. For it to make a difference, it would (a) have to be a DDM in the same RAID 5 rank, and (b) have to occur during the time the first drive is being rebuilt to a spare volume.
The human body replaces skin cells every day
So there are individual DDMs, manufactured by the suppliers above; disk systems, manufactured by IBM and others, and then your entire IT infrastructure. Beyond the disk system, you probably have redundant fabrics, clustered servers and multiple data paths, because eventually hardware fails.
People might realize that the human body replaces skin cells every day. Other cells are replaced frequently, within seven days, and others less frequently, taking a year or so to be replaced. I'm over 40 years old, but most of my cells are less than 9 years old. This is possible because information, data in the form of DNA, is moved from old cells to new cells, keeping the infrastructure (my body) alive.
Our clients should approach this in a more holistic view. You will replace disks in less than 3-5 years. While tape cartridges can retain their data for 20 years, most people change their tape drives every 7-9 years, and so tape data needs to be moved from old to new cartridges. Focus on your information, not individual DDMs.
What does this mean for DDM failures. When it happens, the disk system re-routes requests to a spare disk, rebuilding the data from RAID 5 parity, giving storage admins time to replace the failed unit. During the few hours this process takes place, you are either taking a backup, or crossing your fingers.Note: for RAID5 the time to rebuild is proportional to the number of disks in the rank, so smaller ranks can be rebuilt faster than larger ranks. To make matters worse, the slower RPM speeds and higher capacities of ATA disks means that the rebuild process could take longer than smaller capacity, higher speed FC/SCSI disk.
According to the Google study, a large portion of the DDM replacements had no SMART errors to warn that it was going to happen. To protect your infrastructure, you need to make sure you have current backups of all your data. IBM TotalStorage Productivity Center can help identify all the data that is "at risk", those files that have no backup, no copy, and no current backup since the file was most recently changed. A well-run shop keeps their "at risk" files below 3 percent.
So, where does that leave us?
ATA drives are probably as reliable as FC/SCSI disk. Customers should chose which to use based on performance and workload characteristics. FC/SCSI drives are more expensive because they are designed to run at faster speeds, required by some enterprises for some workloads. IBM offers both, and has tools to help estimate which products are the best match to your requirements.
RAID 5 is just one of the many choices of trade-offs between cost and protection of data. For some data, JBOD might be enough. For other data that is more mission critical, you might choose keeping two or three copies. Data protection is more than just using RAID, you need to also consider point-in-time copies, synchronous or asynchronous disk mirroring, continuous data protection (CDP), and backup to tape media. IBM can help show you how.
Disk systems, and IT environments in general, are higher-level concepts to transcend the failures of individual components. DDM components will fail. Cache memory will fail. CPUs will fail. Choose a disk systems vendor that combines technologies in unique and innovative ways that take these possibilities into account, designed for no single point of failure, and no single point of repair.
So, Robin, from IBM's perspective, our hands are clean. Thank you for bringing this to our attention and for giving me the opportunity to highlight IBM's superiority at the systems level.
Last week, I was in Austin, and had dinner at [Rudy's Country Store and BBQ]. They offer their self-proclaimed "Worst BBQ in Austin!" with brisket, sausage and other meats by weight. I got a beer, some potato salad, and creamed corn, all at additional cost, of course. When I went to the cashier to pay, I was offered all the white bread I wanted at no additional charge. Are you kidding me? You are going to charge me for beer, but give me 8 to 12 complimentary slices of white bread (practically half a loaf)? Honestly, I consider bread and beer to be basically the same functional food item, differing only in solid versus liquid form. I chose to have only four slices. The food was awesome!
I am reminded of that from my latest exchange with EMC.It didn't take long after IBM's announcement yesterday of IBM's continued investment in its strategic product set, IBM System Storage DS8000 series, that competitors responded. In particular, fellow blogger BarryB from EMC has a post [DS8000 Finally Gets Thin Provisioning] that pokes fun at the new Thin Provisioning feature.
Interestingly, the attack is not on the technical implementation, which is straightforward and rock-solid, but rather that the feature is charged at a flat rate of $69,000 US dollars (list price) per disk array. BarryB claims that recently EMC Corporate has decided to reduce the price of their own thin provisioning, called Symmetrix Virtual Provisioning (VP) on select subset of models of their storage portfolio, although I have not found an EMC press release to confirm. In other words, EMC will bury the cost of thin provisioning into the total cost for new sales, and stop shafting, er.. over-charging their existing Symmetrix customers that are interesting in licensing this feature.
BarryB claims this was a lucky coincidence that his blog post happened just days before IBM's announcement.
(Update: While the timing appears suspicious, I am not accusing Mr. Burke in anywrongdoing of insider information of IBM's plans, nor am I aware of any investigations on this matter from the SEC or any other government agency, and apologize if my previous attempt at humor suggested otherwise. BarryB claimsthat the reduction in price was motivated to counter publicly announced HDS's "Switch In On" program, that it is not a secret thatEMC reduced VP pricing weeks ago, effective beginning 3Q09, just not widely advertised in any formal EMC press releases.Perhaps this new VP pricing was only disclosed to just EMC's existing Symmetrix customers, Business Partners, and employees. Perhaps EMC's decision not to announce this in a Press Release was to avoid upsetting all the EMC CLARiiON customers that continue to pay for Thin Provisioning, or to avoid a long line of existing VP customers asking for refunds. In any case, people are innocent until proven otherwise, and BarryB rightfully deserves the presumption of innocence in this regard. I'm sorry, BarryB, for any trouble my previous comments may have caused you.)
Instead, let's explore some events over the past year that have led up to this.
Let's start with what EMC previously charged for this feature. Software features like this often follow a common pricing method, based per TB, so larger configurations pay more, but tiered in a manner that larger configurations pay less per TB, combined with a yearly maintenance cost.
(Updated: EMC has asked me nicely not to post their actual list prices,so I will provide rough estimates instead. According to BarryB, these are no longer the current prices, soI present them as historical figures for comparison purposes only.)
Initial List price
Software Maintenance (SWMA) percentage
Software Maintenance per year
Number of years
Software License Cost (4 years)
Holy cow! How did EMC get away charging so much for this? To be fair, these are often deeply discounted, a practice common among the industry. However, it was easy for IBMers to show EMC customers that putting SVC or N series gateways in front of their existing EMC disks was more cost effective. Both SVC and N series, as well as IBM's XIV, provide thin provisioning at no additional charge.
HDS offers their own thin provisioning called Hitachi Dynamic Provisioning.Hitachi also offers an SVC-like capability to virtualize storage behind the USP-V. However, I suspect thatfewer than 10 percent of their install base actually licensed this capability because it cost so much. Under the cost pressure from IBM's thin provisioning capabilities in SVC, XIV and N series, Hitachi launched its ["Switch It On"] marketing campaign to activate virtualization and provide some features at no additional charge, including the first 10TB of Hitachi Dynamic Provisioning.
Last week, Martin Glassborow on his StorageBod blog, argued that EMC and HDS should[Set the Wide Stripes Free]. Here is an excerpt:
HDS and EMC are both extremely guilty in this regard, both Virtual Provisioning and Dynamic Provisioning cost me extra as an end-user to license. But this is the technology upon which all future block-based storage arrays will be built. If you guys want to improve the TCO and show that you are serious about reducing the complexity to manage your arrays, you will license for free. You will encourage the end-user to break free from the shackles of complexity and you will improve the image of Tier-1 storage in the enterprise.
Martin is using the term "free" in two contexts above. In the Linux community, we are careful to clarify "free, as in free speech" or "free, as in free beer". Technically, EMC's virtual provisioning is neither, as one has to purchase the hardware to get the feature, so the term "at no additional charge" is more legally correct.
However, the discussion of "free beer" brings me back to my first paragraph about Rudy's BBQ. Nearly everyone eats bread, with the exception of those with [Celiac Disease] that causesan intolerance for gluten protein in wheat, so burying the cost of white bread in the base cost of the BBQ meat is reasonable. In contrast, not everyone drinks beer, and there are probably several people whowould complain if the cost of beer was included in the cost of the BBQ meat, so charging separately forbeer makes business sense.
The same applies in the storage industry. When all (or most) customers of a product can benefit from a feature, it makes sense to include it at no additional charge. When a significant subset might not want to pay a higher base price because they won't use or benefit from a feature, it makes sense to make it optionally priced.
For the IBM SVC, XIV and N series, all customers can benefit from thin provisioning, so it is included at no additional charge.
For the IBM System Storage DS8000, perhaps some 30 to 40 percent of our clients have only System z and/or System i servers attached, and therefore would not benefit from this new thin provisioning. It may seem unfair to raise the price on everybody. The $69,000 flat rate was competitively priced against the prices EMC, HDS and 3PAR were charging for similar capability, and lower than the cost to add a new SVC cluster in front of the DS8000. IBM also charges an annual maintenance, but far lower than what others charged as well.
(Note: These list prices are approximate, and vary slightly based on whether you are on legacy, ESA, Servicesuite or ServiceElect software and subscription (S&S) service plans, and the machine type/model. The tables were too complicated to include here in this post, so these numbers are rounded for comparison purposes only.)
IBM flat rate
Software Maintenance per year (approx)
Number of years
Software License Cost (4 years)
Pricing is more art than science. Getting the right pricing structure that appears fair to everyone involved can be a complicated process.
He feels I was unfair to accuse EMC of "proprietary interfaces" without spelling out what I was referring to. Here arejust two, along with the whines we hear from customers that relate to them.
EMC Powerpath multipathing driver
Typical whine: "I just paid a gazillion dollars to renew my annual EMC Powerpath license, so you will have to come back in 12 months with your SVC proposal. I just can't see explaining to my boss that an SVC eliminates the need for EMC Powerpath, throwing away all the good money we just spent on it, or to explain that EMC chooses not to support SVC as one of Powerpath's many supported devices."
EMC SRDF command line interface
Typical whine: "My storage admins have written tons of scripts that all invoke EMC SRDF command line interfacesto manage my disk mirroring environment, and I would hate for them to re-write this to use IBM's (also proprietary) command line interfaces instead."
Certainly BarryB is correct that IBM still has a few remaining "proprietary" items of its own. IBM has been in business over 80 years, but it was only the last 10-15 years that IBM made a strategic shift away from proprietary and over to open standards and interfaces. The transformation to "openness" is not yet complete, but we have made great progress. Take these examples:
The System z mainframe - IBM had opened the interfaces so that both Amdahl and Fujitsu made compatible machines.Unlike Apple which forbids cloning of this nature, IBM is now the single source for mainframes because the other twocompetitors could not keep up with IBM's progress and advancements in technology.
Update: Due to legal reasons, the statements referring to Hercules and other S/390 emulators havebeen removed.
The z/OS operating system - While it is possible to run Linux on the mainframe, most people associate the z/OSoperating system with the mainframe. This was opened up with UNIX System Services to satisfy requests from variousgovernments. It is now a full-fledged UNIX operating system, recognized by the [Open Group] that certifies it as such.
As BarryB alludes, the unique interfaces for disk attachment to System z known as Count-Key-Data (CKD) was published so that both EMC and HDS can offer disk systems to compete with IBM's high-end disk offerings. Linux on System zsupports standard Fibre Channel, allowing you to attach an IBM SVC and anyone's storage. Both z/OS and Linux on System z support NAS storage, so IBM N series, NetApp, even EMC Celerra could be used in that case.
The System i itself is still proprietary, but recently IBM announced that it will now support standard block size (512 bytes) instead of the awkward 528 byte blocks that only IBM and EMC support today. That means that any storage vendor will be ableto sell disk to the System i environment.
Advanced copy services, like FlashCopy and Metro Mirror, are as proprietary as the similar offerings from EMCand HDS, with the exception that IBM has licensed them to both EMC and HDS. Thanks to cross-licensing, you can do [FlashCopy on EMC] equipment. Getting all the storage vendors to agree to open standards for these copy services is still workin progress under [SNIA], but at least people who have coded z/OS JCL batchjobs that invoke FlashCopy utilities can work the same between IBM and EMC equipment.
So for those out there who thought that my comment about EMC's proprietary interfaces in any way implied thatIBM did not have any of its own, the proverbial ["pot calling the kettle black"] so to speak, I apologize.
BarryB shows off his [PhotoShop skills] with the graphic below. I take it as a compliment to be compared to anAll-American icon of business success.
TonyP and Monopoly's Mr. Pennybags Separated at Birth?
However, BarryB meant it as a reference back to long time ago when IBMwas a monopoly of the IT industry, which according to [IBM's History], ended in 1973. In other words, IBMstopped being a monopoly before EMC ever existed as a company, and long before I started working for IBM myself.
The anti-trust lawsuit that BarryB mentions happened in 1969, which forced IBM to separate some of the software from its hardware offerings, and prevented IBM from making various acquisitions for years to follow, forcing IBM instead into technology partnerships. I'm glad that's all behind us now!
Well, it's Tuesday, and you know what that means? IBM announcements!
Today we had several for the IBM System Storage product line. Here are some of them:
DS8000 gets thinner, leaner and faster
The 4.3 level of microcode for the IBM System Storage DS8000 series disk systems [announced enhancements] for both fixed block architecture (FBA) LUNs and count key data (CKD) volumes.
For FBA LUNs that attach to Linux, UNIX and Windows distributed systems, IBM announced DS8000 Thin Provisioning native support. Of course, many people already had this by putting IBM System Storage SAN Volume Controller (SVC) in front, but now DS8000 clients out there without SVC can also achieve benefits ofthin provisioning. This support also improves quick initialization a whopping 2.6 times faster.
For CKD volumes attached to z/OS on System z mainframes, IBM announced zHPF multitrack support for z/OS 1.9 and above. zHPF provide high performance FICON performance, and can now handle multitrack I/O transfers foreven better performance for zFS, HFS, PDSE, and extended striped data sets.
XIV gets better connected
A lot of XIV[announced enhancements] and preview announcements centered around better connectivity. Here's a run down:
Better host attachment connectivity by beefing up the interface modules that hold the FCP and iSCSI interface cards. XIV disk arrays have 3 to 6 of these in different configurations, and since they manage both their own disks,as well as receive host I/O requests for other disks, are basically doing double-duty.These interface modules can now be ordered as [Dual-CPU] modules.
Better infrastructure management by connecting XIV with the industry standard SMI-S interface to IBM Tivoli Storage Productivity Center. Now, XIV can be part of the single pane of glass console that manages all of your other disk arrays, tape libraries and SAN fabrics.
Better copy services for backups by connecting XIV with IBM Tivoli Storage Manager Advanced Copy Services. TSM for Advanced Copy Services is application aware and can coordinate XIV Snapshots similar to its current support for SVC and DS8000 FlashCopy capabilities.
Better connectivity to security systems by supporting LDAP credentials. Before, you had individual userid and passwords for each XIV, and these were probably different than all the other userid/password combinations you have for every other box on your data center floor. IBM is working on getting all products to support theLightweight Directory Access Protocol, or [LDAP] so that we can reach the nirvana of "single sign-on",one userid/password per administrator for all IT devices in the company.
Better support with flexible warranty periods and non-disruptive code load options.
Better remote copy support by connecting to sites far, far away. IBM previewed that it will provideasynchronous disk mirroring from one XIV to another XIV natively. Before this, XIV's synchronous mirroring was limited to 300km distances. Many of our clients do long distance global mirroring of their XIV today behind an SVC, but again, for those out there that don't yet have an SVC, this can be a reasonable alternative.
TS7650 ProtecTIER data deduplication appliance now offers "no dedupe" option
In what some might consider a surprising move, IBM announced a "no dedupe" licensing option on their premiere deduplication solution, which somewhat reminds me of IBM's NOCOPY option on DS8000 FlashCopy. At first I thought "Are you kidding me?!?!" However, this new license option allows the TS7650 appliance to compete with other virtual tape libraries (VTL) that do not offer deduplication capability on an even playing field. It also allows TS7650 to be used for data that doesn'tdedupe very well, such as seismic recordings, satellite images, or what have you. There are also clients who do not yet feel comfortable to dedupe their financial records for compliance reasons.This option now allows IBM to withdraw from marketing the TS7530 non-dedupe library. Having one technology thatdoes both dedupe and no-dedupe is better than offering two separate libraries based on different technologies.
The ProtecTIER series also announced [IP remote distance replication]. This can be used to replicate virtualtape cartridges in one ProtecTIER over to another ProtecTIER at a remote location. You can decide to replicateall or just a subset of your virtual tapes, and this feature can be used to migrate, merge or split ProtecTIERconfigurations as your needs grow. Before this support, our TS7650G clients replicated the disk repositoryusing native disk array replication technology, such as Global Mirror on the DS8000, but that meant that all data was replicated over to the secondary site. Now, with this new IP replication feature, you can be selective, and replicate only those virtual tapes that are mission critical.
The appliance now supports up to 36TB of disk capacity, and the new "IBM i" operating system on System i servers,formerly known as i5/OS.
GPFS does Windows
IBM's General Parallel File System (GPFS) has the lion's marketshare of file systems used in the [Top 500 Supercomputers]. For a while, it was limited to just Linux and AIX operating system support, but version 3.3 [extends this to Windows 2008 on 64-bit architectures]. GPFS isthe file system used in IBM's Scale-Out File Services, the underlying technology of IBM's Cloud Computing and Storage offerings.
Well, it's Tuesday, and so it is "announcement day" again! Actually, for me it is Wednesday morning herein Mumbai, India, but since I was "press embargoed" until 4pm EDT in talking about these enhancements, I had to wait until Wednesday morning here to talk about them.
World's Fastest 1TB tape drive
IBM announced its new enterprise [TS1130 tape drive]and corresponding [TS3500 tape library support]. This one has a funny back-story. Last week while we were preparing the Press Release, we debated on whether we should compare the 1TB per cartridge capacity as double that of Sun's Enterprise T10000 (500GB), or LTO-4 (800GB). The problem changed when Sun announced on Monday they too had a 1TB tape drive, so now instead ofsaying that we had the "World's First 1TB tape drive", we quickly changed this to the "World's Fastest 1TB tape drive" instead. At 160MB/sec top speed, IBM's TS1130 is 33 percent faster than Sun's latest announcement. Sun was rather vague when they will actually ship their new units, so IBM may still end up being first to deliver as well.
While EMC and other disk-only vendors have stopped claiming that "tape is dead", these recent announcements from IBM and Sun indicate that indeed tape is alive and well. IBM is able to borrow technologies from disk, such as the Giant Magneto Resistive (GMR) head over to its tape offerings, which means much of the R&D for disk applies to tape, keeping both forms ofstorage well invested. Tape continues to be the "greenest" storage option, more energy efficient than disk, optical, film, microfiche and even paper.
On the LTO front, IBM enhanced the reporting capabilities of its[TS3310] midrange tape library. This includes identifying the resource utilization of the drives, reporting on media integrity, and improved diagnostics to support library-managed encryption.
IBM System Storage DR550
As a blended disk-and-tape solution, the [IBM System Storage DR550] easily replaces the EMC Centera to meet compliance storagerequirements. IBM announced that we have greatly expanded its scalability, being able to support both 1TBdisk drives, as well as being able to attach to either IBM or Sun's 1TB tape drives.
Massive Array of Idle Disks (MAID)
IBM now offers a "Sleep Mode" in the firmware of the [IBM System Storage DCS9550], which is often called "Massive Array of Idle Disks" (MAID) or spin-down capability. This can reduce the amount of power consumed during idle times.
That's a lot of exciting stuff. I'm off to breakfast now.