Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Well it's Tuesday again, and you know what that means... IBM announcements! Yesterday, at the IBM Edge conference here in Orlando, Florida, IBM announced its new apporach to storage, and a whole bunch of storage products, enhancements, and services. I will focus on some key ones here, and save the rest for next week.
IBM SAN Volume Controller (SVC) v6.4
The SVC is IBM's enterprise-class storage hypervisor. The latest software release, v6.4, can be installed on any SVC hardware, from the 2145-8F2 introduced back in 2005, to newer models like the 2145-CG8. Here are the key features:
Fibre Channel over Ethernet (FCoE) -- This is complete end-to-end support. For SVC units with 10GbE ports, these ports can be now be used for FCoE. This allows hosts to attach to SVC via FCoE, allows SVC node-to-node communication for clustering, and allows SVC to communicate to back-end devices via FCoE.
Real-Time Compression -- IBM ported over the patent Random Access Compression Engine (RACE) from the Real-Time Compression Appliances to SVC v6.4. This allows primary data, accessed via block-based protocols, to be compressed up to 80 percent. This feature is an extra priced feature by TB.
Non-Disruptive Volume move between I/O Groups -- If you don't already have SVC, you don't need to worry about this. For existing SVC customers, this allows volumes to be associated with two or more I/O groups, and that you can add or remove I/O groups non-disruptively. For example, if you want to move a volume from IOG1 to IOG2, then you add IOG2 to the list of I/O groups for the volume, let the multi-pathing software discover the additional paths, the remove IOG1, which then marks the previous IOG1 paths inactive. All this can be done while applications read and write data.
Dedicate FCP ports for Replication -- If you activate the two 10GbE Ethernet ports for FCoE, you can free up two FCP ports that you can dedicate for long-distance Metro Mirror or Global Mirror.
If you have SVC today, but are running an old release like v4.3 or v5.1, I recommennd you upgrade up to at least v6.2.05 release now. This release has been out for a year and is very stable, and serves as a great platform for a later upgrade to SVC v6.4.
IBM Storwize V7000 v6.4
The Storwize V7000 is IBM's midrange storage hypervisor. The latest software release, v6.4, can be installed on existing block-only Storwize V7000 units in the field. The Storwize V7000 v6.4 gets all the features listed above, as well as the following:
Four-way clustering -- Previously, you could cluster two Storwize V7000 controller enclosures together (4 canisters total). To cluster three or four controllers required an RPQ. Now, IBM supports up to four Storwize V7000 controller enclosures (8 canisters) without an RPQ.
Direct Fibre Channel attach -- A lot of people are using Storwize V7000 inside single-rack configurations, so it makes sense not to require a SAN switch for just a few Windows, Linux or VMware servers. An RPQ is now available to allow this to happen.
IBM Tivoli Storage Productivity Center (TPC) v5.1
TPC is already ranked one of the best Storage Infrastructure Management software in the market, and this release will just solidify its lead. Key features include:
Upward integration to higher level management systems
A new, intuitive, easy-to-use web-based GUI inspired by the XIV GUI
Integration of COGNOS to be able to generate and customize reports
Support for SONAS systems
There are several presentations on TPC this week that will go into more detail. Check out the [TPC Facebook page].
My latest book Inside System Storage: Volume IV is now available!
Yes, can you believe it? I have published my fourth volume in my "Inside System Storage" series! It is available in three formats:
Hardcover with dust jacket
eBook (Adobe Acrobat PDF)
You can order this, and all my other books, in all formats, directly from my [Author Spotlight] page. The paperback will also be available soon from other online booksellers, search for ISBN 978-1-105-72213-4.
IBM DS3500 Express
The DS3500 is our entry-level block-based device, designed specifically for random I/O workloads. This includes databases, email repositories, traditional business applications, and on-line transactional workloads. Here are the new features:
Dynamic Disk Pooling, similar to what XIV does to reduce disk rebuild times, but using a RAID-6 like approach per chunk of data.
Thin Provisioning using Dynamic Disk Pooling
Asynchronous Logical Unit Access (ALUA) failover
Enhanced FlashCopy, improved scalability, consistency groups and rollback support
VMware API for Array Integration (VAAI) support. This includes Write Same, Extended Copy, and Atomic Test & Set.
The DS3500 replaces the previous models of DS3200, DS3300 and DS3400 models.
The DCS3700 is our entry-level/midrange block-based device, replacing the DCS9900 model, designed specifically for sequential I/O workloads. This includes Big Data analytics, Hadoop, High Performance Computing (HPC), video surveillance, and television broadcasting. It holds 60 drives in a 4U controller enclosure.
When I turned on the television last weekend, I saw large waves of water knock down rows of small houses. I thought I had caught the end of a bad Godzilla movie, but sadly it was not movie special effects. Mother Nature can be quite destructive. Over the past four days, Japan has been hit hard by a series of earthquakes and resulting tsunami.
(Note: Disasters can happen anywhere and at any time. Last month, New Zealand had an earthquake as well. It is best to always be prepared. If you haven't done so lately, check out the latest recommendations from the US Government [Ready.Gov] website.)
Several have asked me how this tragedy in Japan might affect IBM and its clients. Here is what I have gathered from various sources. All IBM Japan employees have survived, are safe and reporting no major injuries. IBM has four major facilities, near central part of the country around Tokyo, far from Sendai, the epicenter. All IBM buildings are still standing and operational. A few sections of Tokyo are affected by scheduled brown-outs in an effort to save electricity. Employees are asked to telecommute (a.k.a. work from home) to minimize traffic congestion.
Hakozaki - Headquarters and executive briefing center
Makuhari - Technical Center, where we often hold conferences and other events
Yamato - Research Facility, where R&D is done for IBM tape storage products
Toyosu - Service Delivery Center
I have been to Japan many times throughout my career. Back in the summer of 1995, IBM sent me to Osaka to help out clients in the aftermath of the Great Hanshin eartquake near Kobe. I remember it well, sending an email back to my team saying "It is 1995, and here in Japan it is 95 degrees and 95 percent humidiy." It was seven months after the earthquake, but people were still living in cardboard boxes and make-shift tents.
Many people asked if I will be going back to Japan to help out. I speak Japanese, can make sense of the Japanese Katakana characters on computer monitors, and am an expert in Disaster Recovery. However, the IBM Japan team is doing an awesome job helping our clients restore their data and recovery their business operations. Of course, if IBM needs me in Japan, I will gladly go, but so far, it doesn't seem that I am needed there.
Well, it's Tuesday again, and you know what that means! IBM announcements!
Today, I am in New York visiting clients. The weather is a lot nicer than I expected. Here is a picture of the Hudson River through some trees with leaves turning color. Something we don't see in Tucson! Our cactus and pine trees stay green year-round!
The announcements today center around the IBM PureSystems family of expert integrated systems. The PureFlex is based on Flex System components. The Flex System chassis is 10U high that hold 14 bays, consisting of 7 rows by 2 columns. Computer and Storage nodes fit in the front, and switches, fans and power supplies in the back. Here is a quick recap:
IBM Flex System Compute Nodes
The x220 Compute Node is a single-bay low-power 2-socket x86 server. The x440 Compute Node is a powerful double-bay (1 row, 2 columns). The p260 Compute Node is a single-bay server based on the latest POWER7+ CPU processor.
IBM Flex System Expansion Nodes
Do you remember those old movies where a motorcycle would have a sidecar that could hold another passenger, or extra cargo? IBM introduces "Expansion Nodes" for the x200 series single-bay Compute nodes. The idea here is that in a single column, you have one bay for the Compute node, and then on the side in the next bay (same column) you have an Expanions node. There are two choices:
Storage Expansion Node allows you to have eight additional drives
PCIe Expansion Node allows to to have four PCIe cards, which could include the SSD-based PCIe cards from IBM's recent acquisition, Texas Memory Systems.
There are times where one or two internal drives are just not enough storage for a single server, and these expanion nodes could just be the perfect solution for some use cases.
IBM Flex System V7000 Storage Node
I saved the best for last! The Flex System V7000 Storage Node is basically the IBM Storwize V7000 repackaged to fit into the Flex System chassis. This means that in the front of the chassis, the Flex System V7000 takes up four bays (2 rows by 2 columns). In the back of the chassis are the power supplies, fans and switches.
The new Flex System V7000 supports everything the Storwize V7000 does except the upgrade to "Unified" through file modules. For those who want to have Storwize V7000 Unified in their PureFlex systems, IBM will continue to offer the outside-the-chassis original Storwize V7000 that can have two file modules added for NFS, CIFS, HTTPS, FTP and SCP protocol support.
IBM Flex System Converged Network Switch
The Converged Network Switch provide Fibre Channel over Ethernet (FCoE) directly from the chassis. This eliminates the need for a separate "Top-of-Rack" switch, and allows the new Flex System V7000 Storage Node to externally virtualize FCoE-based disk arrays.
Patterns of Expertise for Infrastructure
The original patterns of expertise focused on the PureApplication Systems. Now IBM has added some for the Infrastructure on PureFlex systems.
IBM has sold over 1,000 Flex System and PureFlex systems, across 40 different countries around the world, since their introduction a few months ago in April! These latest enhancements will help solidify IBM's industry leadership,
While most of the post is accurate and well-stated, two opinions particular caught my eye. I'll be nice and call them opinions, since these are blogs, and always subject to interpretation. I'll put quotes around them so that people will correctly relate these to Hu, and not me.
"Storage virtualization can only be done in a storage controller. Currently Hitachi is the only vendor to provide this." -- Hu Yoshida
Hu, I enjoy all of your blog entries, but you should know better. HDS is fairly new-comer to the storage virtualization arena, so since IBM has been doing this for decades, I will bring you and the rest of the readers up to speed. I am not starting a blog-fight, just want to provide some additional information for clients to consider when making choices in the marketplace.
First, let's clarify the terminology. I will use 'storage' in the broad sense, including anything that can hold 1's and 0's, including memory, spinning disk media, and plastic tape media. These all have different mechanisms and access methods, based on their physical geometry and characteristics. The concept of 'virtualization' is any technology that makes one set of resources look like another set of resources with more preferable characteristics, and this applies to storage as well as servers and networks. Finally, 'storage controller' is any device with the intelligence to talk to a server and handle its read and write requests.
Second, let's take a look at all the different flavors of storage virtualization that IBM has developed over the past 30 years.
IBM introduces the S/370 with the OS/VS1 operating system. "VS" here refers to virtual storage, and in this case internal server memory was swapped out to physical disk. Using a table mapping, disk was made to look like an extension of main memory.
IBM introduces the IBM 3850 Mass Storage System (MSS). Until this time, programs that ran on mainframes had to be acutely aware of the device types being written, as each device type had different block, track and cylinder sizes, so a program written for one device type would have to be modified to work with a different device type. The MSS was able to take four 3350 disks, and a lot of tapes, and make them look like older 3330 disks, since most programs were still written for the 3330 format. The MSS was a way to deliver new 3350 disk to a 3330-oriented ecosystem, and greatly reduce the cost by handling tape on the back end. The table mapping was one virtual 3330 disk (100 MB) to two physical tapes (50 MB each). Back then, all of the mainframe disk systems had separate controllers. The 3850 used a 3831 controller that talked to the servers.
IBM invents Redundant Array of Independent Disk (RAID) technology. The table mapping is one or more virtual "Logical Units" (or "LUNs") to two or more physical disks. Data is striped, mirrored and paritied across the physical drives, making the LUNs look and feel like disks, but with faster performance and higher reliability than the physical drives they were mapped to. RAID could be implemented in the server as software, on top or embedded into the operating system, in the host bus adapter, or on the controller itself. The vendor that provided the RAID software or HBA did not have to be the same as the vendor that provided the disk, so in a sense, this avoided "vendor lock-in".Today, RAID is almost always done in the external storage controller.
IBM introduces the Personal Computer. One of the features of DOS is the ability to make a "RAM drive". This is technology that runs in the operating system to make internal memory look and feel like an external drive letter. Applications that already knew how to read and write to drive letters could work unmodified with these new RAM drives. This had the advantage that the files would be erased when the system was turned off, so it was perfect for temporary files. Of course, other operating systems today have this feature, UNIX has a /tmp directory in memory, and z/OS uses VIO storage pools.
This is important, as memory would be made to look like disk externally, as "cache", in the 1990s.
IBM AIX v3 introduces Logical Volume Manager (LVM). LVM maps the LUNs from external RAID controllers into virtual disks inside the UNIX server. The mapping can combine the capacity of multiple physical LUNs into a large internal volume. This was all done by software within the server, completely independent of the storage vendor, so again no lock-in.
IBM introduces the Virtual Tape Server (VTS). This was a disk array that emulated a tape library. A mapping of virtual tapes to physical tapes was done to allow full utilization of larger and larger tape cartridges. While many people today mistakenly equate "storage virtualization" with "disk virtualization", in reality it can be implemented on other forms of storage. The disk array was referred to as the "Tape Volume Cache". By using disk, the VTS could mount an empty "scratch" tape instantaneously, since no physical tape had to be mounted for this purpose.
Contradicting its "tape is dead" mantra, EMC later developed its CLARiiON disk library that emulates a virtual tape library (VTL).
IBM introduces the SAN Volume Controller. It involves mapping virtual disks to manage disks that could be from different frames from different vendors. Like other controllers, the SVC has multiple processors and cache memory, with the intelligence to talk to servers, and is similar in functionality to the controller components you might find inside monolithic "controller+disk" configurations like the IBM DS8300, EMC Symmetrix, or HDS TagmaStore USP. SVC can map the virtual disk to physical disk one-for-one in "image mode", as HDS does, or can also map virtual disks across physical managed disks, using a similar mapping table, to provide advantages like performance improvement through striping. You can take any virtual disk out of the SVC system simply by migrating it back to "image mode" and disconnecting the LUN from management. Again, no vendor lock-in.
The HDS USP and NSC can run as regular disk systems without virtualization, or the virtualization can be enabled to allow external disks from other vendors. HDS usually counts all USP and NSC sold, but never mention what percentage these have external disks attached in virtualization mode. Either they don't track this, or too embarrassed to publish the number. (My guess: single digit percentage).
Few people remember that IBM also introduced virtualization in both controller+disk and SAN switch form factors. The controller+disk version was called "SAN Integration Server", but people didn't like the "vendor lock-in" having to buy the internal disk from IBM. They preferred having it all external disk, with plenty of vendor choices. This is perhaps why Hitachi now offers a disk-less version of the NSC 55, in an attempt to be more like IBM's SVC.
IBM also had introduced the IBM SVC for Cisco 9000 blade. Our clients didn't want to upgrade their SAN switch networking gear just to get the benefits of disk virtualization. Perhaps this is the same reason EMC has done so poorly with its "Invista" offering.
So, bottom line, storage virtualization can, and has, been delivered in the operating system software, in the server's host bus adapter, inside SAN switches, and in storage controllers. It can be delivered anywhere in the path between application and physical media. Today, the two major vendors that provide disk virtualization "in the storage controller" are IBM and HDS, and the three major vendors that provide tape virtualization "in the storage controller" are IBM, Sun/STK, and EMC. All of these involve a mapping of logical to physical resources. Hitachi uses a one-for-one mapping, whereas IBM additionally offers more sophisticated mappings as well.
Continuing coverage of my week in Washington DC for the annual [2010 System Storage Technical University], I attended several XIV sessions throughout the week. There were many XIV sessions. I could not attend all of them. Jack Arnold, one of my colleagues at the IBM Tucson Executive Briefing Center, often presents XIV to clients and Business Partners. He covered all the basics of XIV architecture, configuration, and features like snapshots and migration. Carlos Lizarralde presented "Solving VMware Challenges with XIV". Ola Mayer presented "XIV Active Data Migration and Disaster Recovery".
Here is my quick recap of two in particular that I attended:
XIV Client Success Stories - Randy Arseneau
Randy reported that IBM had its best quarter ever for the XIV, reflecting an unexpected surge shortly after my blog post debunking the DDF myth last April. He presented successful case studies of client deployments. Many followed a familiar pattern. First, the client would only purchase one or two XIV units. Second, the client would beat the crap out of them, putting all kinds of stress from different workloads. Third, the client would discover that the XIV is really as amazing as IBM and IBM Business Partners have told them. Finally, in the fourth phase, the client would deploy the XIV for mission-critical production applications.
A large US bank holding company managed to get 5.3 GB/sec from a pair of XIV boxes for their analytics environment. They now have 14 XIV boxes deployed in mission-critical applications.
A large equipment manufacturer compared the offerings among seven different storage vendors, and IBM XIV came out the winner. They now have 11 XIV boxes in production and another four boxes for development/test. They have moved their entire VMware infrastructure to IBM XIV, running over 12,000 guest instances.
A financial services company bought their first XIV in early 2009 and now has 34 XIV units in production attached to a variety of Windows, Solaris, AIX, Linux servers and VMware hosts. Their entire Microsoft Exchange was moved from HP and EMC disk to IBM XIV, and experienced noticeable performance improvement.
When a University health system replaced two competitive disk systems with XIV, their data center temperature dropped from 74 to 68 degrees Fahrenheit. In general, XIV systems are 20 to 30 percent more energy efficient per usable TB than traditional disk systems.
A service provider that had used EMC disk systems for over 10 years evaluated the IBM XIV versus upgrading to EMC V-Max. The three year total cost of ownership (TCO) of EMC's V-Max was $7 Million US dollars higher, so EMC counter-proposed CLARiiON CX4 instead. But, in the end, IBM XIV proved to be the better fit, and now the customer is happy having made the switch.
The manager of an information communications technology service provider was impressed that the XIV was up and running in just a couple of days. They now have over two dozen XIV systems.
Another XIV client had lost all of their Computer Room Air Conditioning (CRAC) units for several hours. The data center heated up to 126 degrees Fahrenheit, but the customer did not lose any data on either of their two XIV boxes, which continued to run in these extreme conditions.
Optimizing XIV Performance - Brian Cormody
This session was an update from the [one presented last year] by Izhar Sharon. Brian presented various best practices for optimizing the performance when using specific application workloads with IBM XIV disk systems.
Oracle ASM: Many people allocate lots of small LUNs, because this made sense a long time ago when all you had was just a bunch of disks (JBOD). In fact, many of the practices that DBAs use to configure databases across disks become unnecessary with XIV. Wth XIV, you are better off allocating a few number of very large LUNs from the XIV. The best option was a 1-volume ASM pool with 8MB AU stripe. A single LUN can contain multiple Oracle databases. A single LUN can be used to store all of the logs.
VMware: Over 70 percent of XIV customers use it with VMware. For VMFS, IBM recommends allocating a few number of large LUNs. You can specify the maximum of 2181 GB. Do not use VMware's internal LUN extension capability, as IBM XIV already has thin provisioning and works better to allow XIV to do this for you. XIV Snapshots provide crash-consistent copies without all the VMware overhead of VMware Snapshots.
SAP: For planning purposes, the "SAPS" unit equates roughly to 0.4 IOPS for ERP OLTP workloads, and 0.6 IOPS for BW/BI OLAP workloads. In general, an XIV can deliver 25-30,000 IOPS at 10-15 msec response time, and 60,000 IOPS at 30 msec response time. With SAP, our clients have managed to get 60,000 IOPS at less than 15 msec.
Microsoft Exchange: Even my friends in Redmond could not believe how awesome XIV was during ESRP testing. Five Exchange 2010 servers connected two a pair of XIV boxes using the new 2TB drawers managed 40,000 mailboxes at the high profile (0.15 IOPS per mailbox). Another client found four XIV boxes (720 drives) was able to handle 60,000 mailboxes (5GB max), which would have taken over 4000 drives if internal disk drives were used instead. Who said SANs are obsolete for MS Exchange?
Asynchronous Replication: IBM now has an "Async Calculator" to model and help design an XIV async replication solution. In general, dark fiber works best, and MPLS clouds had the worst results. The latest 10.2.2 microcode for the IBM XIV can now handle 10 Mbps at less than 250 msec roundtrip. During the initial sync between locations, IBM recommends setting the "schedule=never" to consume as much bandwidth as possible. If you don't trust the bandwidth measurements your telco provider is reporting, consider testing the bandwidth yourself with [iPerf] open source tool.
In my presentations in Australia and New Zealand, I mentioned that people were re-discovering the benefits of removable media. While floppy diskettes were convenient way of passing information from one person to another, they unfortunately did not have enough capacity. In today's world, you may need Gigabytes or Terabytes of re-writeable storage with a file system interface that can easily be passed from one person to another. In this post, I explore three options.
(FCC Disclaimer: I work for IBM, and IBM has no business relationship with Cirago at the time of this writing. Cirago has not paid me to mention their product, but instead provided me a free loaner that I promised to return to them after my evaluation is completed. This post should not be considered an endorsement for Cirago's products. List prices for Cirago and IBM products were determined from publicly available sources for the United States, and may vary in different countries. The views expressed herein may not necessarily reflect the views and opinions of either IBM or Cirago.)
I took a few photos so you can see what exactly this device looks like. Basically, it is a plastic box that holds a single naked disk drive. It has four little rubber feet so that it does not slip on your desk surface.
The inside is quite simple. The power and SATA connections match those of either a standard 3.5 inch drive, or the smaller form factor (SFF) 2.5 inch drive. However, to my dismay, it does not handle EIDE drives which I have a ton of. After taking apart six different computer systems, I found only one had SATA drives for me to try this unit out with.
The unit comes with a USB cable and AC/DC power adapter. In my case, I found the USB 3.0 cable too short for my liking. My tower systems are under my desk, but I like keeping docking stations like this on the top of the desk, within easy reach, but that wasn't going to happen because the USB cable was not long enough.
Instead, I ended up putting it half-way in between, behind my desk, sitting on another spare system. Not ideal, but in theory there are USB-extension cables that probably could fix this.
Here it is with the drive inside. I had a 3.5 inch Western Digital [1600AAJS drive] 160 GB, SATA 3 Gbps, 8 MB Cache, 7200 RPM.
To compare the performance, I used a dual-core AMD [Athlon X2] system that I had built for my 2008 [One Laptop Per Child] project. To compare the performance, I ran with the drive externally in the Cirago docking station, then ran the same tests with the same drive internally on the native SATA controller. Although the Cirago documentation indicated that Windows was required, I used Ubuntu Linux 10.04 LTS just fine, using the flexible I/O [fio] benchmarking tool against an ext3 file system.
Sequential Write - a common use for external disk drive is backup.
Random read - randomly read files ranging from 5KB to 10MB in size.
Random mixed - randomly read/write files (50/50 mix) ranging from 5KB to 10MB in size.
Random Mixed (50/50)
Latency (msec) read
Latency (msec) write
Bandwidth (KB/s) read
Bandwidth (KB/s) write
For sequential write, the Cirago performed well, only about 15 percent slower than native SATA. For random workloads, however, it was 30-40 percent slower. If you are wondering why I did not get USB 3.0 speeds, there are several factors involved here. First, with overheads, 5 Gbps USB 3.0 is expected to get only about 400 MB/sec. My SATA 2.0 controller maxes out at 375 MB/sec, and my USB 2.0 ports on my system are rated for 57 MB/sec, but with overheads will only get 20-25 MB/sec. Most spinning drives only get 75 to 110 MB/sec. Even solid-state drives top out at 250 MB/sec for sustained activity. Despite all that, my internal SATA drive only got 16 MB/sec, and externally with the Cirago 14 MB/sec in sustained write activity.
Here is the mess that is inside my system. The slot for drive 2 was blocked by cables, memory chips and the heat sink for my processor. It is possible to damage a system just trying to squeeze between these obstacles.
However, the point of this post is "removable media". Having to open up the case and insert the second drive and wire it up to the correct SATA port was a pain, and certainly a more difficult challenge than the average PC user wishes to tackle.
Price-wise, the Cirago lists for $49 USD, and the 160GB drive I used lists for $69, so the combination $118 is about what you would pay for a fully integrated external USB drive. However, if you had lots of loose drives, then this could be more convenient and start to save you some money.
IBM RDX disk backup system
Another problem with the Cirago approach is that the disk drives are naked, with printed circuit board (PCB) exposed. When not in the docking station, where do you put your drive? Did you keep the [anti-static ESD bag] that it came in when you bought it? And once inside the bag, now what? Do you want to just stack it up in a pile with your other pieces of equipment?
To solve this, IBM offers the RDX backup system. These are fully compatible with other RDX sytems from Dell, HP, Imation, NEC, Quantum, and Tandberg Data. The concept is to have a docking station that takes removable, rugged plastic-coated disk-enclosed cartridges. The docking station can be part of the PC itself, similar to how CD/DVD drives are installed, or as a stand-alone USB 2.0 system, capable of processing data up to 25 MB/sec.
The idea is not new, about 10 years ago we had [Iomega "zip" drives] that offered disk-enclosed cartridges with capacities of 100, 250 and 750MB in size. Iomega had its fair share of problems with the zip drive, which were ranked in 2006 as the 15th worst technology product of all time, and were eventually were bought out by EMC two years later (as if EMC has not had enough failures on its own!)
The problem with zip drives was that they did not hold as much as CD or DVD media, and were more expensive. By comparison, IBM RDX cartridges come in 160GB to 750GB in size, at list prices starting at $127 USD.
IBM LTO tape with Long-Term File System
Removable media is not just for backup. Disk cartridges, like the IBM RDX above, had the advantage of being random access, but most tape are accessed sequentially. IBM has solved this also, with the new IBM Long Term File System [LTFS], available for LTO-5 tape cartridges.
With LFTS, the LTO-5 tape cartridge now can act as a super-large USB memory stick for passing information from one person to the next. The LTO-5 cartridge can handle up to 3TB of compressed data at up to SAS speeds of 140 MB/sec. An LTO-5 tape cartridge lists for only $87 USD.
The LTO-5 drives, such as the IBM [TS2250 drive] can read LTO-3, LTO-4 and LTO-5cartridges, and can write LTO-4 and LTO-5 cartridges, in a manner that is fully compatible with LTO drives from HP or Quantum. LTO-3, LTO-4 and LTO-5 cartridges are available in WORM or rewriteable formats. LTO-4 and LTO-5 cartridges can be encrypted with 256-bit AES built-in encryption. With three drive manufacturers, and seven cartridge manufacturers, there is no threat of vendor lock-in with this approach.
These three options offer various trade-offs in price, performance, security and convenience. Not surprisingly, tape continues to be the cheapest option.
Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) has decided to customize it for their own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extentions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) stick to the SQL standard, and expect the reader to customize the examples to their particular DBMS; or (b) stick to a single RDBMS implemenation, and offer examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various DBMS are different than standard SQL syntax, with actual working examples for different RDBMS.
You might be thinking to yourself that only an idiot would work in a place that had to require knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via the [Shadow IT].
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2-C Express] that runs on Windows, Linux, Mac OS, and Solaris.)
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question stuck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza was not in the IBM System Storage product line, it is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
Well, it's Tuesday again, and you know what that means! IBM Announcements! Typically, IBM System Storage has three to five major product launches per year. Making announcements every Tuesday would have been two frequent, and having one big announcement every two or three years would be too far apart. Worldwide combined revenues for storage hardware and software grew double digits last year, comparing full-year 2011 to the prior 2010 year, and I am sure that 2012 will also be a good year for IBM as well! This week we have announcements for both disk and tape, but since 2012 is the 60th Diamond Anniversary for tape, I will start with tape systems first.
TS1140 support for JA/JJ tape cartridges
The TS1140 enterprise tape drive was announced at the [Storage Innovation Executive Summit] last May. It supported a new E07 format on three different new tape cartridges. Models "JC" was 4.0TB standard re-writeable tapes, "JY" was 4.0TB WORM tapes, and "JK" were 500GB economy tapes that were less expensive, but offered faster random access.
Generally, IBM has adopted an N-2 read, N-1 write [backward compatibility]. This means that the TS1140 could read E05 and E06 formatted tapes on JB and JX media, and could write E06 format on JB and JX media. However, there are a lot of older JA and JJ media, especially as part of TS7740 environments, so IBM now supports TS1140 drives to read J1A formatted JA and JJ media. This is not just for TS7740 environments, any TS1140 in stand-alone or tape library configurations will support this as well.
TS7700 R2.1 enhancements
IBM is a leader in tape virtualization with or without physical tape as back-end media. There are two hardware models of the [IBM Virtualization Engine TS7700 family] for the IBM System z mainframe. These virtual libraries are referred to as "clusters" in IBM literature.
The TS7740 Virtual Tape Library supports putting virtual tape images on disk first, then move less-active data to physical tape, which I covered in my blog post [IBM Announcements - July 2007].
A unique feature of the TS7700 series is support for a Grid configuration, which allows up to six different TS7700 clusters to be grouped into a single instance image. These clusters can be in local or remote locations, connected via WAN or LAN connections.
R2.1 is the latest software release of this successful IBM's TS7700 series.
True Sync Mode Copy. Before R2.1, the TS7700 offered "immediate mode copy". An application would write to a virtual tape, and when it was done with the tape and performed an unmount, the TS7700 would then replicate the tape contents to a secondary cluster on the grid. With True Sync Mode, data contents are replicated per implicit or explicit SYNC points. This is another IBM first in the IT tape industry.
Remote Mount Fail-over. When you have two or more TS7700 clusters in a grid configuration, you can do remote mounts. We've added fail-over multi-pathing up to four paths, so that if a link to a remote cluster is down, it will try one of the others instead.
Parallel Copies and Pre-Migration. On of my 19 patents is for the pre-migration feature for the IBM 3494 Virtual Tape Server (VTS) that carries forward into the TS7700, and is also used in the SONAS and Information Archive products. However, when the grid architecture was introduced, the engineers decided not to allow pre-migration and copies to secondary clusters to occur concurrently. Now these two operations can be done in parallel.
Merge two grids into one grid. Now that we can support up to six clusters into a single grid, we have people with 2-cluster and 3-cluster grids looking to merge them into one. Of course, all the logical and physical volume serials (VOLSER) must be unique!
Accelerate off JA/JJ Media. There are a lot of older JA and JJ media still in TS7700 libraries. This feature allows customers to speed up the transition to newer physical tape media.
Copy Export to E06 format on JB media. This one is clever, and I have to say I would have never thought about it. Let's say you have a TS7740 with TS1140 drives, but you want to export some virtual tapes to physical media to be sent to someone who only has a TS7740 connected with older TS1130 drives. These older drives can't read new JC media nor make sense of the E07 format. This feature will let you export to older JB media in E06 format so that it will be fully readable at the new location on the TS1130 drives.
Copy Export Merge service offering. Thanks to mergers and acquisitions, it is sometimes necessary to split off a portion of data from a TS7700 grid. In the past, IBM supported sending this export to a completely empty TS7700 library, but this new service offerings allows the export to be merged into an existing TS7700 that already contains data.
LTFS-SDE support for Mac OS X 10.7 Lion
How do people still not yet know about the Linear Tape File System [LTFS]? I mentioned this in my blogs back in 2010 in [April], [September], and [November]. Last year, LTFS was the [NAB Show Pick Hits Award] and an [Emmy] for revolutionizing the use of digital tape in Television broadcasting.
In layman's terms, the Single Drive Edition [LTFS-SDE] allows a tape cartridge to be treated like USB memory stick. It is supported on the LTO5 tape drives for systems running various levels of Windows, Linux and Mac OS X. Prior to this announcement, IBM supported Snow Leopard (10.5.6) and Leopard (10.6), and now supports Mac OS X 10.7 "Lion" release.
IBM first introduced Solid-State Drives (SSD) back in 2007 where it made sense the most, in [drive-for-drive replacements on blade servers in the IBM BladeCenter]. Blade servers typically only have a single drive, and SSD are both faster and use less energy on a drive-for-drive comparison, so this provided immediate benefit. Today, SSD are available on a variety of System x and POWER system servers.
In 2008, IBM rocked the world by being the first to reach [1 Million IOPS with Project Quicksilver]. This was an all-SSD configuration which many considered unrealistic (at the time), but it showed the potential for solid state drives.
When the [XIV Gen3 was Announced - July 2011], each module included an 1.8-inch "SSD-Ready" slot in the back. IBM made a Statement of Direction that IBM would someday offer SSD drives to put in these slots. Today's announcement is that IBM has finalized the qualification process, so now XIV Gen3 clients can have 400GB of usable non-volatile SSD read cache added to each module. This SSD can be added to existing XIV Gen3 boxes in the field, or it can be factory-installed in new shipments. If you have a 15-module XIV, that's 6TB of additional read cache! This SSD is entirely managed by the XIV Gen3, so you won't have to spend weeks reading manuals or specifying configuration parameters.
When you carve volumes on the XIV, you now have an option to enable or disable use of the SSD cache for each volume. Since XIV is being used in private and public cloud deployments, this offers the ability to offer premium performance at premium prices. The use of SSD is complementary to IBM XIV Quality of Service (QoS) performance levels, which are determined by host instead.
Well, that's the first major IBM System Storage launch of 2012. Let me know what you think in the comment section below.
This week, Hitachi Ltd. announced their next generation disk storage virtualization array, the Virtual Storage Platform, following on the success of its USP V line. It didn't take long for fellow blogger Chuck Hollis (EMC) to comment on this in his blog post [Hitachi's New VSP: Separating The Wheat From The Chaff]. Here are some excerpts:
"Well, we all knew that Hitachi (through HDS and HP) would be announcing some sort of refresh to their high-end storage platform sooner or later.
As EMC is Hitachi's only viable competitor in this part of the market, I think people are expecting me to say something.
If you're a high-end storage kind of person, your universe is basically a binary star: EMC and Hitachi orbiting each other, with the interesting occasional sideshow from other vendors trying to claim relevance in this space."
Chuck implies that neither Hewlett-Packard (HP) nor Hitachi Data Systems (HDS) as vendors provide any value-add from the box manufactured by Hitachi Ltd. so combines them into a single category. I suspect the HP and HDS folks might disagree with that opinion.
When I reminded Chuck that IBM was also a major player in the high-end disk space, his response included the following gem:
"Many of us in the storage industry believe that IBM currently does not field a competitive high-end storage platform. IDC market share numbers bear out this assertion, as you probably know."
While Chuck is certainly entitled to his own beliefs and opinions, believing the world is flat does not make it so. Certainly, I doubt IDC or any other market research firm has put out a survey asking "Do you think IBM offers a competitive high-end disk storage platform?" Of course, if Chuck is basing his opinion on anecdotal conversations with existing EMC customers, I can certainly see how he might have formed this misperception. However, IDC market share numbers don't support Chuck's assertion at all.
There is no industry-standard definition of what is a "high-end" or "enterprise-class" disk system. Some define high-end as having the option for mainframe attachment via ESCON and/or FICON protocol. Others might focus on features, functionality, scalability and high 99.999+ percent availability. Others insist high-end requires block-oriented protocols like FC and iSCSI, rather than file-based protocols like NAS and CIFS.
For the most demanding mission-critical mix of random and sequential workloads, IBM offers the [IBM System Storage DS8000 series] high-end disk system which connects to mainframes and distributed servers, via FCP and FICON attachment, and supports a variety of drive types and RAID levels. The features that HP and HDS are touting today for the VSP are already available on the IBM DS8000, including sub-LUN automatic tiering between Solid-State drives and spinning disk, called [Easy Tier], thin provisioning, wide striping, point-in-time copies, and long distance synchronous and asynchronous replication.
There are lots of analysts that track market share for the IT storage industry, but since Chuck mentions [IDC] specifically, I reviewed the most recent IDC data, published a few weeks ago in their "IDC Worldwide Quarter Disk Storage Tracker" for 2Q 2010, representing April 1 to June 30, 2010 sales. Just in case any of the rankings have changed over time, I also looked at the previous four quarters: 2Q 2009, 3Q 2009, 4Q 2009 and 1Q 2010.
(Note: IDC considers its analysis proprietary, out of respect for their business model I will not publish any of the actual facts and figures they have collected. If you would like to get any of the IDC data to form your own opinion, contact them directly.)
In the case of IDC, they divide the disk systems into three storage classes: entry-level, midrange and high-end. Their definition of "high-end" is external RAID-protected disk storage that sells for $250,000 USD or more, representing roughly 25 to 30 percent of the external disk storage market overall. Here are IDC's rankings of the four major players for high-end disk systems:
By either measure of market share, units (disk systems) or revenue (US dollars), IDC reports that IBM high-end disk outsold both HDS and HP combined. This has been true for the past five quarters. If a smaller start-up vendor has single digit percent market share, I could accept it being counted as part of Chuck's "occasional sideshow from other vendors trying to claim relevance", but IBM high-end disk has consistently had 20 to 30 percent market share over the past five quarters!
Not all of these high-end disk systems are connected to mainframes. According to IDC data, only about 15 to 25 percent of these boxes are counted under their "Mainframe" topology.
Chuck further writes:
"It's reasonable to expect IBM to sell a respectable amount of storage with their mainframes using a protocol of their own design -- although IBM's two competitors in this rather proprietary space (notably EMC and Hitachi) sell more together than does IBM."
The IDC data doesn't support that claim either, Chuck. By either measure of market share, units (disk systems) or revenue (US dollars), IDC reports that IBM disk for mainframes outsold all other vendors (including EMC, HDS, and HP) combined. And again, this has been true for the past five quarters. Here is the IDC ranking for mainframe disk storage:
IBM has over 50 percent market share in this case, primarily because IBM System Storage DS8000 is the industry leader in mainframe-related features and functions, and offers synergy with the rest of the z/Architecture stack.
So Chuck, I am not picking a fight with you or asking you to retract or correct your blog post. Your main theme, that the new VSP presents serious competition to EMC's VMAX high-end disk arrays, is certainly something I can agree with. Congratulations to HDS and HP for putting forth what looks like a viable alternative to EMC's VMAX.
To learn more about IBM's upcoming products, register for next week's webcast "Taming the Information Explosion with IBM Storage" featuring Dan Galvan, IBM Vice President, and Steve Duplessie, Senior Analyst and Founder of Enterprise Storage Group (ESG).
Continuing my post-week coverage of the [Data Center 2010 conference], Wednesday evening we had six hospitality suites. These are fun informal get-togethers sponsored by various companies. I present them in the order that I attended them.
Intel - The Silver Lining
Intel called their suite "The Silver Lining". Magician Joel Bauer wowed the crowds with amazing tricks.
Intel handed out branded "Snuggies". I had to explain to this guy that he was wearing his backwards.
i/o - Wrestling with your Data Center?
New-comer "i/o" named their suite "Wrestling with your Data Center?" They invited attendees frustrated with their data centers to don inflated Sumo Wrestling suits.
APC by Schneider Electric - Margaritaville
This will be the last year for Margaritaville, a theme that APC has used now for several years at this conference.
Cisco - Fire and Ice
Cisco had "Fire and Ice" with half the room decorated in Red for fire, and White for ice.
This is Ivana, welcoming people to the "Ice" side.
This is Peter, on the "Fire" side. Cisco tried to have opposites on both sides, savory food on one side, sweets on the other.
CA Technologies - Can you Change the Game?
CA Technologies offered various "sports games", with a DJ named "Coach".
Compellent - Get "Refreshed" at the Fluid Data Hospitality Suite
Compellent chose a low-key format, "lights out" approach with a live guitarist. They had hourly raffles for prizes, but it was too dark to read the raffle ticket numbers.
Of the six, my favorite was Intel. The food was awesome, the Snuggies were hilarious, and the magician was incredibly good. I would like to think Intel for providing me super-secret inside access to their Cloud Computing training resources and for the Snuggie!
Continuing my catch-up on past posts, Jon Toigo on his DrunkenData blog, posted a ["bleg"] for information aboutdeduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM'sview. (Jon, as always, you have my permission to post this on your blog!)
Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTier software. Larry Freeman from NetApp already explains A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
Data agnostic. Diligent does not require content-awareness, format-awareness nor identification of backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which has the best opportunity for deduplication benefits. The two main benefits are:
Keeping more backup data available online for fast recovery.
Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that their’s is best because it collapses two functions — de-dupe then ingest — into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
Will this meet my current business requirements?
Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine thenecessary ingest rate. How many "Terabytes per Hour" (TB/h) must be received, processed and stored from the backup software during the backup window. IBM intends to document its performance test results of specific software/hardwarecombinations to provide guidance to clients' purchase and planning decisions.
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what already is stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from2KB to 16MB in size.
De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of[non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return back a "bit-perfect" copy of the data that was written. Thereare laws against changing the format. For example, an original document was in Microsoft Word format, but is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate the bit-perfect copy. Certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code calculation as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance for collisions might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, that does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing block stored.
Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read thedata off the back-end disk, using a variety of forensic tools, and piece together strings of personal information,such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working withopen industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems.Until then, clients concerned about encryption should consider OS-based or application-based encryption from thebackup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBMDiligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are in-tact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge fromthis set was unreadable or misplaced, it might impact the access to many TBs of data, or render the entire systemunusable.
In the case of a 1PB of back-end disk for IBM Diligent, you would be having to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives in parallel, could take you several days for the complete process.This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supportedby Analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larrysuggests, MS Exchange and SharePoint could be good use cases that represent the possible savings for squeezing outduplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with “update in place” applications, that is high levels of random writes for non-appending updates. OLTP and Database workloads would not benefit from deduplication.
Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced on software first, and then as implementations mature, get hardware-based to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performsLZH compression through software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
Did you miss IBM Pulse 2013 this week? I wasn't there either, having scheduled visits with clients in Washington DC this week, only to have those meetings cancelled due to the [U.S. sequestration cuts].
Fortunately, there are plenty of videos and materials to review from the event. Here's a [12-minute video] interview between Laura DuBois, Program VP of Storage for industry analyst firm [IDC], and fellow IBM executive Steve "Woj" Wojtowecz, VP of Tivoli Storage and Networking Software.
(Update: Apparently, IBM had not secured re-distribution rights from IDC to post this video prior to my blog post. IBM now has full permission to distribute. My apologies for any inconvenience last week.)
The two discuss client opportunities and requirements for storage clouds and compute clouds. Client cloud storage requirements include backup and archive clouds, file storage clouds, and storage that supports compute cloud environments.
Guest Post: The following post was written by Tom Rauchut, IBM Infrastructure Architect and Advanced Technical Sales Specialist for Tivoli Automation. Tom is at IBM Pulse 2011 for Las Vegas this week, and has offered to send his observations.
The expo opened last night. There are so many fantastic demos and product experts. Las Vegas has a Tivoli buzz on right now.
On Wikibon, David Floyer has an article titled [SAS Drives Tier 1 to New Levels of Green] that focuses on the energy efficiency benefits of newer Serial-Attach SCSI (SAS) drives over older Fibre Channel (FC) drives. This makes sense, as R&D budgets have been spent on making newer technologies more "green".
Of course, people might consider this an [apples-to-oranges] comparison. Not only are we changing from FC to SAS technology, we are also changing from 3.5-inch drives to small form factor (SFF) 2.5-inch drives. It seems odd to specify 2000 drives, when only two of the five scale up to that level. Few systems in production, from any vendor, have more than 1000 drives, so it would have seemed that would have been a fairer comparison.
However, Hu's conclusion that the combination of SAS and SFF provides better performance and energy efficiency for both IBM DS8800 and HDS VSP than FC-based alternatives from any vendor seems reasonably supported by the data.
Meanwhile, fellow blogger David Merrill (HDS) pokes fun at IBM DS8800 in Figure 2 in his post [Winner o’ the green]. This second comparison was for 4PB of raw capacity, which 4 of the 5 can handle easily using 2TB SATA drives, but the DS8800 is based on SAS technology and does not support 2TB SATA drives. A performance-oriented configuration with four distinct DS8800 boxes employing 600GB SAS drives is used instead, causing the data for the DS8800 to stick out like a sore thumb, or perhaps more intentionally as a middle finger.
The main take-away here is that IBM offers both the DS8700 for capacity-optimized workloads, and the DS8800 for performance-optimized workloads. Some competitors may have been spreading FUD that the DS8700 was withdrawn last month, it wasn't. As you can see from the data presented, there are times where a DS8700 might be more preferable than a DS8800, depending on the type of workloads you plan to deploy. IBM offers both, and will continue to support existing DS8700 and DS8800 units in the field for many years to come.
Each quarter since 2006, the [IBM Migration Factory] team has tallied the number of clients who have moved to IBM severs and storage systems from competitive hardware. We'll I've just seen the latest numbers, for the third quarter of 2010, and it looks like we set a new quarterly record with nearly 400 total migrations to IBM from Oracle/Sun and HP.
It's clear that companies and governments worldwide are seeing greater value in IBM systems, while Oracle and HP watch their customer bases erode. In just this past 3Q 2010, nearly 400 clients have moved over to IBM -- almost all of them from Oracle/Sun and HP. Of these, 286 clients migrated to IBM Power Systems, running AIX, Linux and IBM i operating systems, from competitors alone -- nearly 175 from Oracle/Sun and nearly 100 from HP. The number of migrations to IBM Power Systems through the first three quarters of 2010 is nearly 800, already exceeding the total for all of last year by more than 200.
Let's do the math.... Since IBM established its Migration Factory program in 2006, more than 4,500 clients have switched to IBM. More than 1,000 from Oracle/Sun and HP joined the exodus this year alone. In less than five years, almost 3,000 of these clients -- including more than 1,500 from Oracle/Sun and more than 1,000 from HP -- have chosen to run their businesses on IBM's Power Systems. That's more than a client per day making the move to IBM!
And as the servers go, so goes the storage. Clients are re-discovering IBM as a server and storage powerhouse, offering a strong portfolio in servers, disk and tape systems, and how synergies between servers and storage can provide them real business benefits.
Adding it all up, it's clear that IBM's multi-billion dollar investment in helping to build a smarter planet with workload-optimized systems is paying off -- and that, more and more, clients are selecting IBM over the competition to help them meet their business needs.