Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Well, I'm back safely from my tour of Asia. I am glad to report that Tokyo, Beijing and Kuala Lumpur are pretty much how I remember them from the last time I was there in each city. I have since been fighting jet lag by watching the last thirteen episodes of LOST season 6 and the series finale.
Recently, I have started seeing a lot of buzz on the term "Storage Federation". The concept is not new, but rather based on the work in database federation, first introduced in 1985 by [A federated architecture for information management] by Heimbigner and McLeod. For those not familiar with database federation, you can take several independent autonomous databases, and treat them as one big federated system. For example, this would allow you to issue a single query and get results across all the databases in the federated system. The advantage is that it is often easier to federate several disparate heterogeneous databases than to merge them into a single database. [IBM Infosphere Federation Server] is a market leader in this space, with the capability to federate DB2, Oracle and SQL Server databases.
Storage expansion: You want to increase the storage capacity of an existing storage system that cannot accommodate the total amount of capacity desired. Storage Federation allows you to add additional storage capacity by adding a whole new system.
Storage migration: You want to migrate from an aging storage system to a new one. Storage Federation allows the joining of the two systems and the evacuation from storage resources on the first onto the second and then the first system is removed.
Safe system upgrades: System upgrades can be problematic for a number of reasons. Storage Federation allows a system to be removed from the federation and be re-inserted again after the successful completion of the upgrade.
Load balancing: Similar to storage expansion, but on the performance axis, you might want to add additional storage systems to a Storage Federation in order to spread the workload across multiple systems.
Storage tiering: In a similar light, storage systems in a Storage Federation could have different capacity/performance ratios that you could use for tiering data. This is similar to the idea of dynamically re-striping data across the disk drives within a single storage system, such as with 3PAR's Dynamic Optimization software, but extends the concept to cross storage system boundaries.
To some extent, IBM SAN Volume Controller (SVC), XIV, Scale-Out NAS (SONAS), and Information Archive (IA) offer most, if not all, of these capabilities. EMC claims its VPLEX will be able to offer storage federation, but only with other VPLEX clusters, which brings up a good question. What about heterogenous storage federation? Before anyone accuses me of throwing stones at glass houses, let's take a look at each IBM solution:
IBM SAN Volume Controller
The IBM SAN Volume Controller has been doing storage federation since 2003. Not only can IBM SAN Volume Controller bring together storage from a variety of heterogenous storage, the SVC cluster itself can be a mix of different hardware models. You can have a 2145-8A4 node pair, 2145-8G4 node pair, and the new 2145-CF8 node pair, all combined together into a single SVC cluster. Upgrading SVC hardware nodes in an SVC cluster is always non-disruptive.
IBM XIV storage system
The IBM XIV has two kinds of independent modules. Data modules have processor, cache and 12 disks. Interface modules are data modules with additional processor, FC and Ethernet (iSCSI) adapters. Because these two modules play different roles in an XIV "colony", that number of each type is predetermined. Entry-level six-module systems have 2 interface and 4 data modules. Full 15-module systems have 6 interface and 9 data modules. Individual modules can be added or removed non-disruptively in an XIV.
IBM Scale-Out NAS
The SONAS is comprised of three kinds of nodes that work together in concert. A management node, one or more interface nodes, and two or more storage nodes. The storage nodes are paired to manage up to 240 nodes in a storage pod. Individual interface or data nodes can be added or removed non-disruptively in the SONAS. The underlying technology, the General Parallel File System, has been doing storage federation since 1996 for some of the largest top 500 supercomputers in the world.
IBM Information Archive (IA)
For the IA, there are 1, 2 or 3 nodes, which manages a set of collections. A collection can either be file-based using industry-standard NAS protocols, or object-based using the popular System Storage™ Archive Manager (SSAM) interface. Normally, you have as many collections as you have nodes, but nodes are powerful enough to manage two collections to provide N-1 availability. This allows a node to be removed, and a new node added into the IA "colony", in a non-disruptive manner.
Even in an ant colony, there are only a few types of ants, with typically one queen, several males, and lots of workers. But all the ants are red. You don't see colonies that mix between different species of ants. For databases, federation was a way to avoid the much harder task of merging databases from different platforms. For storage, I am surprised people have latched on to the term "federation", given our mixed results in the other "federations" we have formed, which I have conveniently (IMHO) ranked from least effective to most effective:
The Union of Soviet Socialist Republics (USSR)
My father used to say, "If the Soviet Union were in charge of the Sahara desert, they would run out of sand in 50 years." The [Soviet Union] actually lasted 68 years, from 1922 to 1991.
The United Nations (UN)
After the previous League of Nations failed, the UN was formed in 1945 to facilitate cooperation in international law, international security, economic development, social progress, human rights, and the achieving of world peace by stopping wars between countries, and to provide a platform for dialogue.
The European Union (EU)
With the collapse of the Greek economy, and the [rapid growth of debt] in the UK, Spain and France, there are concerns that the EU might not last past 2020.
The United States of America (USA)
My own country is a federation of states, each with its own government. California's financial crisis was compared to the one in Greece. My own state of Arizona is under boycott from other states because of its recent [immigration law]. However, I think the US has managed better than the EU because it has evolved over the past 200 years.
The Organization of the Petroleum Exporting Countries [OPEC]
Technically, OPEC is not a federation of cooperating countries, but rather a cartel of competing countries that have agreed on total industry output of oil to increase individual members' profits. Note that it was a non-OPEC company, BP, that could not "control their output" in what has now become the worst oil spill in US history. OPEC was formed in 1960, and is expected to collapse sometime around 2030 when the world's oil reserves run out. Matt Savinar has a nice article on [Life After the Oil Crash].
United Federation of Planets
The [Federation] fictitiously described in the Star Trek series appears to work well, an optimistic view of what federations could become if you let them evolve long enough.
Given the mixed results with "federation", I think I will avoid using the term for storage, and stick to the original term "scale-out architecture".
Here I am, day 11 of a 17-day business trip, on my last leg of the trip this week, in Kuala Lumpur in Malaysia. I have been flooded with requests to give my take on EMC's latest re-interpretation of storage virtualization, VPLEX.
I'll leave it to my fellow IBM master inventor Barry Whyte to cover the detailed technical side-by-side comparison. Instead, I will focus on the business side of things, using Simon Sinek's Why-How-What sequence. Here is a [TED video] from Garr Reynold's post
[The importance of starting from Why].
Let's start with the problem we are trying to solve.
Problem: migration from old gear to new gear, old technology to new technology, from one vendor to another vendor, is disruptive, time-consuming and painful.
Given that IT storage is typically replaced every 3-5 years, then pretty much every company with an internal IT department has this problem, the exception being those companies that don't last that long, and those that use public cloud solutions. IT storage can be expensive, so companies would like their new purchases to be fully utilized on day 1, and be completely empty on day 1500 when the lease expires. I have spoken to clients who have spent 6-9 months planning for the replacement or removal of a storage array.
A solution to make the data migration non-disruptive would benefit the clients (make it easier for their IT staff to keep their data center modern and current) as well as the vendors (reduce the obstacle of selling and deploying new features and functions). Storage virtualization can be employed to help solve this problem. I define virtualization as "technology that makes one set of resources look and feel like a different set of resources, preferably with more desirable characteristics.". By making different storage resources, old and new, look and feel like a single type of resource, migration can be performed without disrupting applications.
Before VPLEX, here is a breakdown of each solution:
Non-disruptive tech refresh, and a unified platform to provide management and functionality across heterogeneous storage.
Non-disruptive tech refresh, and a unified platform to provide management and functionality between internal tier-1 HDS storage, and external tier-2 heterogeneous storage.
Non-disruptive tech refresh, with unified multi-pathing driver that allows host attachment of heterogeneous storage.
New in-band storage virtualization device
Add in-band storage virtualization to existing storage array
New out-of-band storage virtualization device with new "smart" SAN switches
SAN Volume Controller
HDS USP-V and USP-VM
For IBM, the motivation was clear: Protect customers existing investment in older storage arrays and introduce new IBM storage with a solution that allows both to be managed with a single set of interfaces and provide a common set of functionality, improving capacity utilization and availability. IBM SAN Volume Controller eliminated vendor lock-in, providing clients choice in multi-pathing driver, and allowing any-to-any migration and copy services. For example, IBM SVC can be used to help migrate data from an old HDS USP-V to a new HDS USP-V.
With EMC, however, the motivation appeared to protect software revenues from their PowerPath multi-pathing driver, TimeFinder and SRDF copy services. Back in 2005, when EMC Invista was first announced, these three software represented 60 percent of EMC's bottom-line profit. (Ok, I made that last part up, but you get my point! EMC charges a lot for these.)
Back in 2006, fellow blogger Chuck Hollis (EMC) suggested that SVC was just a [bump in the wire] which could not possibly improve performance of existing disk arrays. IBM showed clients that putting cache(SVC) in front of other cache(back end devices) does indeed improve performance, in the same way that multi-core processors successfully use L1/L2/L3 cache. Now, EMC is claiming their cache-based VPLEX improves performance of back-end disk. My how EMC's story has changed!
So now, EMC announces VPLEX, which sports a blend of SVC-like and Invista-like characteristics. Based on blogs, tweets and publicly available materials I found on EMC's website, I have been able to determine the following comparison table. (Of course, VPLEX is not yet generally available, so what is eventually delivered may differ.)
Scalable, 1 to 4 node-pairs
One size fits all, single pair of CPCs
SVC-like, 1 to 4 director-pairs
Works with any SAN switches or directors
Required special "smart" switches (vendor lock-in)
SVC-like, works with any SAN switches or directors
Broad selection of IBM Subsystem Device Driver (SDD) offered at no additional charge, as well as OS-native drivers Windows MPIO, AIX MPIO, Solaris MPxIO, HP-UX PV-Links, VMware MPP, Linux DM-MP, and comercial third-party driver Symantec DMP.
Limited selection, with focus on priced PowerPath driver
Invista-like, PowerPath and Windows MPIO
Read cache, and choice of fast-write or write-through cache, offering the ability to improve performance.
No cache, Split-Path architecture cracked open Fibre Channel packets in flight, delayed every IO by 20 nanoseconds, and redirected modified packets to the appropriate physical device.
SVC-like, Read and write-through cache, offering the ability to improve performance.
Space-Efficient Point-in-Time copies
SVC FlashCopy supports up to 256 space-efficient targets, copies of copies, read-only or writeable, and incremental persistent pairs.
Like Invista, No
Remote distance mirror
Choice of SVC Metro Mirror (synchronous up to 300km) and Global Mirror (asynchronous), or use the functionality of the back-end storage arrays
No native support, use functionality of back-end storage arrays, or purchase separate product called EMC RecoverPoint to cover this lack of functionality
Limited synchronous remote-distance mirror within VPLEX (up to 100km only), no native asynchronous support, use functionality of back-end storage arrays
Provides thin provisioning to devices that don't offer this natively
Like Invista, No
SVC Split-Cluster allows concurrent read/write access of data to be accessed from hosts at two different locations several miles apart
I don't think so
PLEX-Metro, similar in concept but implemented differently
Non-disruptive tech refresh
Can upgrade or replace storage arrays, SAN switches, and even the SVC nodes software AND hardware themselves, non-disruptively
Tech refresh for storage arrays, but not for Invista CPCs
Tech refresh of back end devices, and upgrade of VPLEX software, non-disruptively. Not clear if VPLEX engines themselves can be upgraded non-disruptively like the SVC.
Heterogeneous Storage Support
Broad support of over 140 different storage models from all major vendors, including all CLARiiON, Symmetrix and VMAX from EMC, and storage from many smaller startups you may not have heard of
Invista-like. VPLEX claims to support a variety of arrays from a variety of vendors, but as far as I can find, only DS8000 supported from the list of IBM devices. Fellow blogger Barry Burke (EMC) suggests [putting SVC between VPLEX and third party storage devices] to get the heterogeneous coverage most companies demand.
Back-end storage requirement
Must define quorum disks on any IBM or non-IBM back end storage array. SVC can run entirely on non-IBM storage arrays
HP SVSP-like, requires at least one EMC storage array to hold metadata
SVC 2145-CF8 model supports up to four solid-state drives (SSD) per node that can treated as managed disk to store end-user data
Invista-like. VPLEX has an internal 30GB SSD, but this is used only for operating system and logs, not for end-user data.
In-band virtualization solutions from IBM and HDS dominate the market. Being able to migrate data from old devices to new ones non-disruptively turned out to be only the [tip of the iceberg] of benefits from storage virtualization. In today's highly virtualized server environment, being able to non-disruptively migrate data comes in handy all the time. SVC is one of the best storage solutions for VMware, Hyper-V, XEN and PowerVM environments. EMC watched and learned in the shadows, taking notes of what people like about the SVC, and decided to follow IBM's time-tested leadership to provide a similar offering.
EMC re-invented the wheel, and it is round. On a scale from Invista (zero) to SVC (ten), I give EMC's new VPLEX a six.
Over on the Tivoli Storage Blog, there is an exchange over the concept of a "Storage Hypervisor". This started with fellow IBMer Ron Riffe's blog post [Enabling Private IT for Storage Cloud -- Part I], with a promise to provide parts 2 and 3 in the next few weeks. Here's an excerpt:
"Storage resources are virtualized. Do you remember back when applications ran on machines that really were physical servers (all that “physical” stuff that kept everything in one place and slowed all your processes down)? Most folks are rapidly putting those days behind them.
In August, Gartner published a paper [Use Heterogeneous Storage Virtualization as a Bridge to the Cloud] that observed “Heterogeneous storage virtualization devices can consolidate a diverse storage infrastructure around a common access, management and provisioning point, and offer a bridge from traditional storage infrastructures to a private cloud storage environment” (there’s that “cloud” language). So, if I’m going to use a storage hypervisor as a first step toward cloud enabling my private storage environment, what differences should I expect? (good question, we get that one all the time!)
The basic idea behind hypervisors (server or storage) is that they allow you to gather up physical resources into a pool, and then consume virtual slices of that pool until it’s all gone (this is how you get the really high utilization). The kicker comes from being able to non-disruptively move those slices around. In the case of a storage hypervisor, you can move a slice (or virtual volume) from tier to tier, from vendor to vendor, and now, from site to site all while the applications are online and accessing the data. This opens up all kinds of use cases that have been described as “cloud”. One of the coolest is inter-site application migration.
A good storage hypervisor helps you be smart.
Application owners come to you for storage capacity because you’re responsible for the storage at your company. In the old days, if they requested 500GB of capacity, you allocated 500GB off of some tier-1 physical array – and there it sat. But then you discovered storage hypervisors! Now you tell that application owner he has 500GB of capacity… What he really has is a 500GB virtual volume that is thin provisioned, compressed, and backed by lower-tier disks. When he has a few data blocks that get really hot, the storage hypervisor dynamically moves just those blocks to higher tier storage like SSD’s. His virtual disk can be accessed anywhere across vendors, tiers and even datacenters. And in the background you have changed the vendor storage he is actually sitting on twice because you found a better supplier. But he doesn’t know any of this because he only sees the 500GB virtual volume you gave him. It’s 'in the cloud'."
"Let’s start with a quick walk down memory lane. Do you remember what your data protection environment looked like before virtualization? There was a server with an operating system and an application… and that thing had a backup agent on it to capture backup copies and send them someplace (most likely over an IP network) for safe keeping. It worked, but it took a lot of time to deploy and maintain all the agents, a lot of bandwidth to transmit the data, and a lot of disk or tapes to store it all. The topic of data protection has modernized quite a bit since then.
Fast forward to today. Modernization has come from three different sources – the server hypervisor, the storage hypervisor and the unified recovery manager. The end result is a data protection environment that captures all the data it needs in one coordinated snapshot action, efficiently stores those snapshots, and provides for recovery of just about any slice of data you could want. It’s quite the beautiful thing."
At this point, you might scratch your head and ask "Does this Storage Hypervisor exist, or is this just a theoretical exercise?" The answer of course is "Yes, it does exist!" Just like VMware offers vSphere and vCenter, IBM offers block-level disk virtualization through the SAN Volume Controller(SVC) and Storwize V7000 products, with a full management support from Tivoli Storage Productivity Center Standard Edition.
SVC has supported every release of VMware since the 2.5 version. IBM is the leading reseller of VMware, so it makes sense for IBM and VMware development to collaborate and make sure all the products run smoothly together. SVC presents volumes that can be formatted for VMFS file system to hold your VMDK files, accessible via FCP protocol. IBM and VMware have some key synergies:
Management integration with Tivoli Storage Productivity Center and VMware vCenter plug-in
VAAI support: Hardware-assisted locking, hardware-assisted zeroing, and hardware-assisted copying. Some of the competitors, like EMC VPLEX, don't have this!
Space-efficient FlashCopy. Let's say you need 250 VM images, all running a particular level of Windows. A boot volume of 20GB each would consume 5000GB (5 TB) of capacity. Instead, create a Golden Master volume. Then, take 249 copies with space-efficient FlashCopy, which only consumes space for the modified portions of the new volumes. For each copy, make the necessary changes like unique hostname and IP address, changing only a few blocks of data each. The end result? 250 unique VM boot volumes in less than 25GB of space, a 200:1 reduction!
Support for VMware's Site Recovery Manager using SVC's Metro Mirror or Global Mirror features for remote-distance replication.
Data center federation. SVC allows you to seamlessly do vMotion from one datacenter to another using its "stretched cluster" capability. Basically, SVC makes a single image of the volume available to both locations, and stores two physical copies, one in each location. You can lose either datacenter and still have uninterrupted access to your data. VMware's HA or Fault Tolerance features can kick in, same as usual.
But unlike tools that work only with VMware, IBM's storage hypervisor works with a variety of server virtualization technologies, including Microsoft Hyper-V, Xen, OracleVM, Linux KVM, PowerVM, z/VM and PR/SM. This is important, as a recent poll on the Hot Aisle blog indicates that [44 percent run 2 or more server hypervisors]!
Join the conversation! The virtual dialogue on this topic will continue in a [live group chat] this Friday, September 23, 2011 from 12 noon to 1pm EDT. Join me and about 20 other top storage bloggers, key industry analysts and IBM Storage subject matter experts to discuss storage hypervisors and get questions answered about improving your private storage environment.
I always try to catch a session from Jim Blue, who works in our "SAN Central" center of competency team. This session was a long list of useful hints and tips, based on his many years of experience helping clients.
SAN Zoning works by inclusion, limiting the impact of failing devices. The best approach is to zone by individual initiator port. The default policy for your SAN zoning should be "deny".
Ports should be named to identify who, what, where and how.
While many people know not to mix both disk and tape devices on the same HBA, Jim also recommends not mixing dissimilar disks, test and production, FCP and FICON.
The sweet spot is FOUR paths. Too many paths can impact performance.
When making changes to redundant fabrics, make changes to the first fabric, then allow sufficient time before making the same changes to the other fabric.
Use software tools like Tivoli Storage Productivity Center (Standard Edition) to validate all changes to your SAN fabric.
Do not mix 62.5 and 50.0 micron technology.
Use port caps to disable inactive ports. In one amusing anecdote, he mention that an uncovered port was hit by sunlight every day, sending error messages that took a while to figure out.
Save your SAN configuration to non-SAN storage for backup
Consider firmware about two months old to be stable
Rule of thumb for estimating IOPS: 75-100 IOPS per 7200 RPM drive, 120-150 IOPS per 10K RPM drive, and 150-200 IOPS per 15K RPM drive.
Decide whether your shop is just-in-time or just-in-case provisioning. Just-in-time gets additional capacity on demand as needed, and just-in-case over-provisions to avoid scrambling last minute.
Avoid oversubscribing your inter-switch links (ISL). Aim for around 7:1 to 10:1 ratio.
Don't go cheap on bandwidth between sites for long-distance replication
Next Generation Network Fabrics - Strategy and Innovations
Mike Easterly, IBM Director of Global Field Marketing, presented IBM System Networking strategy, in light of IBM's recent acquisition of Blade Network Technologies (BNT). BNT is used in 350 of the Fortune 500 companies, and is ranked #2 behind Cisco in sales of non-core Ethernet switches (based on number of units sold).
Based on a recent survey, companies are upgrading their Ethernet networks for a variety of reasons:
56 percent for Live Partition Mobility and VMware Vmotion
45 percent for integrated compute stacks, like IBM CloudBurst
43 percent for private, public and hybrid cloud computing deployments
40 percent for network convergences
Many companies adopt a three-level approach, with core directors, distribution switches, and then access switches at the edge that connect servers and storage devices. IBM's BNT allows you to flatten the network to lower latency by collapsing the access and distribution levels into one.
IBM's strategy is to focus on BNT for the access/distribution level, and to continue its strategic partnerships for the core level.
IBM BNT provides better price/performance and lower energy consumption. To help with hot-aisle/cold-aisle rack deployments, IBM BNT provides both F and R models. F models have ports on the front, and R models have ports in the rear.
IBM BNT supports virtual fabric and HW-offload iSCSI traffic, and future-enabled for FCoE. Support for TRILL (transparent interconnect of lots of links) and OpenFlow will be implemented through software updates to the switches.
While Cisco Nexus 1000v is focused on VMware Enterprise Plus, IBM BNT's VMready works with VMware, Hyper-V, Linux KVM, XEN, OracleVM, and PowerVM. This allows single pane of management of VMready and ESX vSwitches.
In preparation for Converged Enhanced Ethernet (CEE), IBM BNT will provide full 40GbE support sometime next year, and offer switches that support 100GbE uplinks. IBM offers extended length cables, including passive SFP+ DAC at 8.5 meters, and 10Gbase-T Cat7 cables up to 100 meters.
Inter-datacenter Workload Mobility with VMware vSphere and SAN Volume Controller (SVC)
This session was co-presented between Bill Wiegand, IBM Advanced Technical Services, and Rawley Burbridge, IBM VMware and midrange storage consultant. IBM is the leader in storage virtualization product (SVC), and is the leading reseller of VMware.
Like MetroCluster on IBM N series, or EMC's VPLEX Metro, the IBM SAN Volume Controller can support a stretched cluster across distance that allows virtual machines to move seamlessly from one datacenter to another. This is a feature IBM introduced with SVC 5.1 back in 2009. This can be used for PowerVM Live Partition Mobility, VMware vMotion, and Hyper-V Quick Migration.
SVC stretched cluster can help with both Disaster Avoidance and Disaster Recovery. For Disaster Avoidance, in anticipation of an outage, VMs can be moved to the secondary datacenter. For Disaster Recover, additional automation, such as VMware High Availability (HA) is needed to restart the VMs at the secondary datacenter.
IBM stretched cluster is further improved with a feature called Volume Mirroring (formerly vDisk Mirroring) which creates two physical copies of one logical volume. To the VMware ESX hosts, there is only one volume, regardless of which datacenter it is in. The two physical copies can be on any kind of managed disk, as there is no requirement or dependency of copy services on the back-end storage arrays.
Another recent improvement is the idea of spreading the three quorum disks to three different locations or "failure domains". One in each data center, and a third one in a separate building, somewhere in between the other two, perhaps.
Of course, there are regional disasters that could affect both datacenters. For this reason, SVC stretched cluster volumes can be replicated to a third location up to 8000 km away. This can be done with any back-end disk arrays, as again there is not requirement for copy services from the managed devices. SVC takes care of it all.
Networking is going to be very important for a variety of transformational projects going forward in the next five years.