Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is a Master Inventor and Senior IT Specialist for the IBM System Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2011, Tony celebrated his 25th year anniversary with IBM Storage on the same day as the IBM's Centennial. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson
For the longest time, people thought that humans could not run a mile in less than four minutes. Then, in 1954, [Sir Roger Bannister] beat that perception, and shortly thereafter, once he showed it was possible, many other runners were able to achieve this also. The same is being said now about the IBM Watson computer which appeared this week against two human contestants on Jeopardy!
(2014 Update: A lot has happened since I originally wrote this blog post! I intended this as a fun project for college students to work on during their summer break. However, IBM is concerned that some businesses might be led to believe they could simply stand up their own systems based entirely on open source and internally developed code for business use. IBM recommends instead the [IBM InfoSphere BigInsights] which packages much of the software described below. IBM has also launched a new "Watson Group" that has [Watson-as-a-Service] capabilities in the Cloud. To raise awareness to these developments, IBM has asked me to rename this post from IBM Watson - How to build your own "Watson Jr." in your basement to the new title IBM Watson -- How to replicate Watson hardware and systems design for your own use in your basement. I also took this opportunity to improve the formatting layout.)
Often, when a company demonstrates new techology, these are prototypes not yet ready for commercial deployment until several years later. IBM Watson, however, was made mostly from commercially available hardware, software and information resources. As several have noted, the 1TB of data used to search for answers could fit on a single USB drive that you buy at your local computer store.
Take a look at the [IBM Research Team] to determine how the project was organized. Let's decide what we need, and what we don't in our version for personal use:
Do we need it for personal use?
Yes, That's you. Assuming this is a one-person project, you will act as Team Lead.
Yes, I hope you know computer programming!
No, since this version for personal use won't be appearing on Jeopardy, we won't need strategy on wager amounts for the Daily Double, or what clues to pick next. Let's focus merely on a computer that can accept a question in text, and provide an answer back, in text.
Yes, this team focused on how to wire all the hardware together. We need to do that, although this version for personal use will have fewer components.
Optional. For now, let's have this version for personal use just return its answer in plain text. Consider this Extra Credit after you get the rest of the system working. Consider using [eSpeak], [FreeTTS], or the Modular Architecture for Research on speech sYnthesis [MARY] Text-to-Speech synthesizers.
Yes, I will explain what this is, and why you need it.
Yes, we will need to get information for personal use to process
Yes, this team developed a system for parsing the question being asked, and to attach meaning to the different words involved.
No, this team focused on making IBM Watson optimized to answer in 3 seconds or less. We can accept a slower response, so we can skip this.
(Disclaimer: As with any Do-It-Yourself (DIY) project, I am not responsible if you are not happy with your version for personal use I am basing the approach on what I read from publicly available sources, and my work in Linux, supercomputers, XIV, and SONAS. For our purposes, this version for personal use is based entirely on commodity hardware, open source software, and publicly available sources of information. Your implementation will certainly not be as fast or as clever as the IBM Watson you saw on television.)
Step 1: Buy the Hardware
Supercomputers are built as a cluster of identical compute servers lashed together by a network. You will be installing Linux on them, so if you can avoid paying extra for Microsoft Windows, that would save you some money. Here is your shopping list:
Three x86 hosts, with the following:
64-bit quad-core processor, either Intel-VT or AMD-V capable,
8GB of DRAM, or larger
300GB of hard disk, or larger
CD or DVD Read/Write drive
Computer Monitor, mouse and keyboard
Ethernet 1GbE 4-port hub, and appropriate RJ45 cables
Surge protector and Power strip
Local Console Monitor (LCM) 4-port switch (formerly known as a KVM switch) and appropriate cables. This is optional, but will make it easier during the development. Once your implementation is operational, you will only need the monitor and keyboard attached to one machine. The other two machines can remain "headless" servers.
Step 2: Establish Networking
IBM Watson used Juniper switches running at 10Gbps Ethernet (10GbE) speeds, but was not connected to the Internet while playing Jeopardy! Instead, these Ethernet links were for the POWER7 servers to talk to each other, and to access files over the Network File System (NFS) protocol to the internal customized SONAS storage I/O nodes.
The implementation will be able to run "disconnected from the Internet" as well. However, you will need Internet access to download the code and information sources. For our purposes, 1GbE should be sufficient. Connect your Ethernet hub to your DSL or Cable modem. Connect all three hosts to the Ethernet switch. Connect your keyboard, video monitor and mouse to the LCM, and connect the LCM to the three hosts.
Step 3: Install Linux and Middleware
To say I use Linux on a daily basis is an understatement. Linux runs on my Android-based cell phone, my laptop at work, my personal computers at home, most of our IBM storage devices from SAN Volume Controller to XIV to SONAS, and even on my Tivo at home which recorded my televised episodes of Jeopardy!
For this project, you can use any modern Linux distribution that supports KVM. IBM Watson used Novel SUSE Linux Enterprise Server [SLES 11]. Alternatively, I can also recommend either Red Hat Enterprise Linux [RHEL 6] or Canonical [Ubuntu v10]. Each distribution of Linux comes in different orientations. Download the the 64-bit "ISO" files for each version, and burn them to CDs.
Graphical User Interface (GUI) oriented, often referred to as "Desktop" or "HPC-Head"
Command Line Interface (CLI) oriented, often referred to as "Server" or "HPC-Compute"
Guest OS oriented, to run in a Hypervisor such as KVM, Xen, or VMware. Novell calls theirs "Just Enough Operating System" [JeOS].
For this version for personal use, I have chosen a [multitier architecture], sometimes referred to as an "n-tier" or "client/server" architecture.
Host 1 - Presentation Server
For the Human-Computer Interface [HCI], the IBM Watson received categories and clues as text files via TCP/IP, had a [beautiful avatar] representing a planet with 42 circles streaking across in orbit, and text-to-speech synthesizer to respond in a computerized voice. Your implementation will not be this sophisticated. Instead, we will have a simple text-based Query Panel web interface accessible from a browser like Mozilla Firefox.
Host 1 will be your Presentation Server, the connection to your keyboard, video monitor and mouse. Install the "Desktop" or "HPC Head Node" version of Linux. Install [Apache Web Server and Tomcat] to run the Query Panel. Host 1 will also be your "programming" host. Install the [Java SDK] and the [Eclipse IDE for Java Developers]. If you always wanted to learn Java, now is your chance. There are plenty of books on Java if that is not the language you normally write code.
While three little systems doesn't constitute an "Extreme Cloud" environment, you might like to try out the "Extreme Cloud Administration Tool", called [xCat], which was used to manage the many servers in IBM Watson.
Host 2 - Business Logic Server
Host 2 will be driving most of the "thinking". Install the "Server" or "HPC Compute Node" version of Linux. This will be running a server virtualization Hypervisor. I recommend KVM, but you can probably run Xen or VMware instead if you like.
Host 3 - File and Database Server
Host 3 will hold your information sources, indices, and databases. Install the "Server" or "HPC Compute Node" version of Linux. This will be your NFS server, which might come up as a question during the installation process.
Technically, you could run different Linux distributions on different machines. For example, you could run "Ubuntu Desktop" for host 1, "RHEL 6 Server" for host 2, and "SLES 11" for host 3. In general, Red Hat tries to be the best "Server" platform, and Novell tries to make SLES be the best "Guest OS".
My advice is to pick a single distribution and use it for everything, Desktop, Server, and Guest OS. If you are new to Linux, choose Ubuntu. There are plenty of books on Linux in general, and Ubuntu in particular, and Ubuntu has a helpful community of volunteers to answer your questions.
Step 4: Download Information Sources
You will need some documents for your implementation to process.
IBM Watson used a modified SONAS to provide a highly-available clustered NFS server. For this version, we won't need that level of sophistication. Configure Host 3 as the NFS server, and Hosts 1 and 2 as NFS clients. See the [Linux-NFS-HOWTO] for details. To optimize performance, host 3 will be the "official master copy", but we will use a Linux utility called rsync to copy the information sources over to the hosts 1 and 2. This allows the task engines on those hosts to access local disk resources during question-answer processing.
We will also need a relational database. You won't need a high-powered IBM DB2. Your implementation can do fine with something like [Apache Derby] which is the open source version of IBM CloudScape from its Informix acquisition. Set up Host 3 as the Derby Network Server, and Hosts 1 and 2 as Derby Network Clients. For more about structured content in relational databases, see my post [IBM Watson - Business Intelligence, Data Retrieval and Text Mining].
Linux includes a utility called wget which allows you to download content from the Internet to your system. What documents you decide to download is up to you, based on what types of questions you want answered. For example, if you like Literature, check out the vast resources at [FullBooks.com]. You can automate the download by writing a shell script or program to invoke wget to all the places you want to fetch data from. Rename the downloaded files to something unique, as often they are just "index.html". For more on wget utility, see [IBM Developerworks].
Step 5: The Query Panel - Parsing the Question
Next, we need to parse the question and have some sense of what is being asked for. For this we will use [OpenNLP] for Natural Language Processing, and [OpenCyc] for the conceptual logic reasoning. See Doug Lenat presenting this 75-minute video [Computers versus Common Sense]. To learn more, see the [CYC 101 Tutorial].
Unlike Jeopardy! where Alex Trebek provides the answer and contestants must respond with the correct question, we will do normal Question-and-Answer processing. To keep things simple, we will limit questions to the following formats:
Who is ...?
Where is ...?
When did ... happen?
What is ...?
Host 1 will have a simple Query Panel web interface. At the top, a place to enter your question, and a "submit" button, and a place at the bottom for the answer to be shown. When "submit" is pressed, this will pass the question to "main.jsp", the Java servlet program that will start the Question-answering analysis. Limiting the types of questions that can be posed will simplify hypothesis generation, reduce the candidate set and evidence evaluation, allowing the analytics processing to continue in reasonable time.
Step 6: Unstructured Information Management Architecture
The "heart and soul" of IBM Watson is Unstructured Information Management Architecture [UIMA]. IBM developed this, then made it available to the world as open source. It is maintained by the [Apache Software Foundation], and overseen by the Organization for the Advancement of Structured Information Standards [OASIS].
Basically, UIMA lets you scan unstructured documents, gleam the important points, and put that into a database for later retrieval. In the graph above, DBs means 'databases' and KBs means 'knowledge bases'. See the 4-minute YouTube video of [IBM Content Analytics], the commercial version of UIMA.
Starting from the left, the Collection Reader selects each document to process, and creates an empty Common Analysis Structure (CAS) which serves as a standardized container for information. This CAS is passed to Analysis Engines , composed of one or more Annotators which analyze the text and fill the CAS with the information found. The CAS are passed to CAS Consumers which do something with the information found, such as enter an entry into a database, update an index, or update a vote count.
(Note: This point requires, what we in the industry call a small matter of programming, or [SMOP]. If you've always wanted to learn Java programming, XML, and JDBC, you will get to do plenty here. )
If you are not familiar with UIMA, consider this [UIMA Tutorial].
Step 7: Parallel Processing
People have asked me why IBM Watson is so big. Did we really need 2,880 cores of processing power? As a supercomputer, the 80 TeraFLOPs of IBM Watson would place it only in 94th place on the [Top 500 Supercomputers]. While IBM Watson may be the [Smartest Machine on Earth], the most powerful supercomputer at this time is the Tianhe-1A with more than 186,000 cores, capable of 2,566 TeraFLOPs.
To determine how big IBM Watson needed to be, the IBM Research team ran the DeepQA algorithm on a single core. It took 2 hours to answer a single Jeopardy question! Let's look at the performance data:
Number of cores
Time to answer one Jeopardy question
Single IBM Power750 server
< 4 minutes
Single rack (10 servers)
< 30 seconds
IBM Watson (90 servers)
< 3 seconds
The old adage applies, [many hands make for light work]. The idea is to divide-and-conquer. For example, if you wanted to find a particular street address in the Manhattan phone book, you could dispatch fifty pages to each friend and they could all scan those pages at the same time. This is known as "Parallel Processing" and is how supercomputers are able to work so well. However, not all algorithms lend well to parallel processing, and the phrase [nine women can't have a baby in one month] is often used to remind us of this.
Fortuantely, UIMA is designed for parallel processing. You need to install UIMA-AS for Asynchronous Scale-out processing, an add-on to the base UIMA Java framework, supporting a very flexible scale-out capability based on JMS (Java Messaging Services) and ActiveMQ. We will also need Apache Hadoop, an open source implementation used by Yahoo Search engine. Hadoop has a "MapReduce" engine that allows you to divide the work, dispatch pieces to different "task engines", and the combine the results afterwards.
Host 2 will run Hadoop and drive the MapReduce process. Plan to have three KVM guests on Host 1, four on Host 2, and three on Host 3. That means you have 10 task engines to work with. These task engines can be deployed for Content Readers, Analysis Engines, and CAS Consumers. When all processing is done, the resulting votes will be tabulated and the top answer displayed on the Query Panel on Host 1.
Step 8: Testing
To simplify testing, use a batch processing approach. Rather than entering questions by hand in the Query Panel, generate a long list of questions in a file, and submit for processing. This will allow you to fine-tune the environment, optimize for performance, and validate the answers returned.
There you have it. By the time you get your implementation fully operational, you will have learned a lot of useful skills, including Linux administration, Ethernet networking, NFS file system configuration, Java programming, UIMA text mining analysis, and MapReduce parallel processing. Hopefully, you will also gain an appreciation for how difficult it was for the IBM Research team to accomplish what they had for the Grand Challenge on Jeopardy! Not surprisingly, IBM Watson is making IBM [as sexy to work for as Apple, Google or Facebook], all of which started their business in a garage or a basement with a system as small as this version for personal use.
In the prank, I indicated that I had submitted my video to the [Arizona International Film Festival], of AIFF for short, which coincidently was running April 1-20, and that it had won an award. I invited everyone who read my blog to see me accept the award at a ceremony at 6:00pm on April 1 at the Fox Theater, followed by the 8:00pm showing of another award-winning film.
I didn't submit the video, the video didn't win any award, and I was not invited to the award ceremony. I did, however, plan to see the movie at 8:00pm.
When I got there, I learned that a dozen of my friends, not realizing it was a prank, showed up, asking for me. The AIFF was quite amused, and invited me to award ceremony still going on. The other filmmakers were impressed I had concocted such an elaborate social media campaign!
A slideshow is another style of video, animating still images to music. The [Ken Burns effect] was named after the technique fellow filmmaker Ken Burns used in his documentaries.
In 2010, I worked with the XIV team to address FUD that our competitors were flinging about double drive failures. My blog post [Double Drive Failure Debunked: XIV Two Years Later] set the record straight and put this issue to rest once and for all. XIV sales shot up dramatically after this post went public!
Are you tired of hearing about Cloud Computing without having any hands-on experience? Here's your chance. IBM has recently launched its IBM Development and Test Cloud beta. This gives you a "sandbox" to play in. Here's a few steps to get started:
Generate a "key pair". There are two keys. A "public" key that will reside in the cloud, and a "private" key that you download to your personal computer. Don't lose this key.
Request an IP address. This step is optional, but I went ahead and got a static IP, so I don't have to type in long hostnames like "vm353.developer.ihost.com".
Request storage space. Again, this step is optional, but you can request a 50GB, 100GB and 200GB LUN. I picked a 200GB LUN. Note that each instance comes with some 10 to 30GB storage already. The advantage to a storage LUN is that it is persistent, and you can mount it to different instances.
Start an "instance". An "instance" is a virtual machine, pre-installed with whatever software you chose from the "asset catalog". These are Linux images running under Red Hat Enterprise Virtualization (RHEV) which is based on Linux's kernel virtual machine (KVM). When you start an instance, you get to decide its size (small, medium, or large), whether to use your static IP address, and where to mount your storage LUN. On the examples below, I had each instance with a static IP and mounted the storage LUN to /media/storage subdirectory. The process takes a few minutes.
So, now that you are ready to go, what instance should you pick from the catalog? Here are three examples to get you started:
IBM WebSphere sMASH Application Builder
Base OS server to run LAMP stack
Next, I decided to try out one of the base OS images. There are a lot of books on Linux, Apache, MySQL and PHP (LAMP) which represents nearly 70 percent of the web sites on the internet. This instance let's you install all the software from scratch. Between Red Hat and Novell SUSE distributions of Linux, Red Hat is focused on being the Hypervisor of choice, and SUSE is focusing on being the Guest OS of choice. Most of the images on the "asset catalog" are based on SLES 10 SP2. However, there was a base OS image of Red Hat Enterprise Linux (RHEL) 5.4, so I chose that.
To install software, you either have to find the appropriate RPM package, or download a tarball and compile from source. To try both methods out, I downloaded tarballs of Apache Web Server and PHP, and got the RPM packages for MySQL. If you just want to learn SQL, there are instances on the asset catalog with DB2 and DB2 Express-C already pre-installed. However, if you are already an expert in MySQL, or are following a tutorial or examples based on MySQL from a classroom textbook, or just want a development and test environment that matches what your company uses in production, then by all means install MySQL.
This is where my SSH client comes in handy. I am able to login to my instance and use "wget" to fetch the appropriate files. An alternative is to use "SCP" (also part of PuTTY) to do a secure copy from your personal computer up to the instance. You will need to do everything via command line interface, including editing files, so I found this [VI cheat sheet] useful. I copied all of the tarballs and RPMs on my storage LUN ( /media/storage ) so as not to have to download them again.
Compiling and configuring them is a different matter. By default, you login as an end user, "idcuser" (which stands for IBM Developer Cloud user). However, sometimes you need "root" level access. Use "sudo bash" to get into root level mode, and this allows you to put the files where they need to be. If you haven't done a configure/make/make install in awhile, here's your chance to relive those "glory days".
In the end, I was able to confirm that Apache, MySQL and PHP were all running correctly. I wrote a simple index.php that invoked phpinfo() to show all the settings were set correctly. I rebooted the instance to ensure that all of the services started at boot time.
Rational Application Developer over VDI
This last example, I started an instance pre-installed with Rational Application Developer (RAD), which is a full Integrated Development Environment (IDE) for Java and J2EE applications. I used the "NX Client" to launch a virtual desktop image (VDI) which in this case was Gnome on SLES 10 SP2. You might want to increase the screen resolution on your personal computer so that the VDI does not take up the entire screen.
From this VDI, you can launch any of the programs, just as if it were your own personal computer. Launch RAD, and you get the familiar environment. I created a short Java program and launched it on the internal WebSphere Application Server test image to confirm it was working correctly.
If you are thinking, "This is too good to be true!" there is a small catch. The instances are only up and running for 7 days. After that, they go away, and you have to start up another one. This includes any files you had on the local disk drive. You have a few options to save your work:
Copy the files you want to save to your storage LUN. This storage LUN appears persistent, and continues to exist after the instance goes away.
Take an "image" of your "instance", a function provided in the IBM Developer and Test Cloud. If you start a project Monday morning, work on it all week, then on Friday afternoon, take an "image". This will shutdown your instance, and backup all of the files to your own personal "asset catalog" so that the next time you request an instance, you can chose that "image" as the starting point.
Another option is to request an "extension" which gives you another 7 days for that instance. You can request up to five unique instances running at the same time, so if you wanted to develop and test a multi-host application, perhaps one host that acts as the front-end web server, another host that does some kind of processing, and a third host that manages the database, this is all possible. As far as I can tell, you can do all the above from either a Windows, Mac or Linux personal computer.
Getting hands-on access to Cloud Computing really helps to understand this technology!
Back in June, I mentioned this blog was [Moving to MyDeveloperWorks] which is based on IBM Lotus Connections.
Finally, the move is complete for all bloggers. If you are having problems with the redirects, you might need to unsubscribe and re-subscribe in your RSS feed reader. Here are the new links for several IBM bloggers that have moved over:
IBM has chosen three particular Software Defined Environments. At one end, IBM is a platinum sponsor of OpenStack which supports x86 servers, POWER systems and z System mainframes. A problem with open source projects like this, however, is that they can be a bit like putting together IKEA furniture from pieces in a box: "Some assembly required."
At the other end, highly proprietary environments from VMware and Microsoft bring enterprise-ready out-of-the-box solutions. However, nobody wants to be limited to just x86-based solutions. IBM offers the best of both worlds, basing its IBM Cloud and SmartCloud software on OpenStack standards, but providing enterprise-ready solutions for x86, POWER Systems and z System mainframes. This includes IBM Cloud Manager with OpenStack, IBM Cloud Orchestrator, and IBM SmartCloud Cost Management software products.
(Analogy: If open source solutions were vanilla ice cream, and proprietary solutions were chocolate ice cream, then IBM Cloud and SmartCloud is vanilla ice cream with chocolate sauce on top! This is the same approach IBM used for WebSphere Application Server, based on Apache web server, and IBM BigInsights, based on Hadoop analytics.)
For some people, software defined can also refer to how the resources are deployed. Rather than using specialized hardware, solutions based on industry-standard hardware can be delivered either as pre-built appliances, services in the Cloud, or as software-only products.
Back in the 1990s, IBM came up with the [Seascape Storage Enterprise Architecture], deciding to focus the design of its storage systems to be based, where possible and practical, on industry-standard components.
Let's review a few products:
IBM SAN Volume Controller (SVC) and Storwize V7000: IBM storage hypervisors were originally designed to run on industry-standard x86 servers. The IBM scientists at Almaden Research Center referred to this as the "COMmodity PArts Storage System" (COMPASS) architecture.
That is still mostly true 12 years later, but SVC and Storwize V7000 does have specialized hardware, including host bus adapter cards and the [Intel® QuickAssist] chip for Real-time Compression.
IBM DS8000 disk system: The DS8000 is based on off-the-shelf IBM POWER servers. Originally, you could only purchase POWER-based servers from IBM, but now thanks to the [OpenPOWER Foundation], you now have more options.
The DS8000 does use some specialized hardware for its host and device adapters, taking advantage of ASICs and FPGAs to optimize performance.
IBM XIV storage system: IBM acquired XIV back in 2008, but its design is very similar to Seascape architecture. All of the Intellectual Property was in the software, installed on industry-standard x86 servers, cache memory, host bus adapters and 7200 RPM nearline disk drives. I joked that the entire hardware bill-of-materials could be ordered directly from the CDW catalog!
IBM FlashSystem: IBM is #1 rank in the All-Flash Array market. Rather than using off-the-shelf commodity Solid-State drives (SSD), the IBM FlashSystem employs specialized hardware based on FPGAs to optimize performance.
IBM FlashSystem came from the recent acquisition of Texas Memory Systems, and was not designed under the IBM Seascape architecture.
Combining the method the resources are controlled and managed with the way storage is deployed results in a quadrant. Let's take a look at this from a storage perspective:
Traditional storage products that are based on specialized hardware that do not support Software Defined Environment APIs.
Storage products that are based on specialized hardware, but have been enhanced to support Software Defined Environment APIs. For OpenStack, this refers to Cinder and Swift interfaces. For VMware, this would include VAAI, VASA and VADP interfaces and vCenter Console plug-ins.
Storage products that are basically software, either installed on pre-built hardware appliances, offered as services in the Cloud, or software you deploy on your own industry-standard hardware. Unfortunately, this category does not support software defined environment APIs, and so proprietary interfaces require administrator-intensive involvement instead.
Storage software for industry-standard hardware. You purchase the appropriate server, cache memory, flash and disk drives as needed. This category could also extend to pre-built appliance versions of this software, or as services in the Cloud. APIs for software defined environments are available to deploy this with self-service automation.
IBM Spectrum Storage is a family of Category IV software offerings. Here are the products announced:
Based on technology from...
IBM Spectrum Control™
Simplified control and optimization of storage and data infrastructure
SmartCloud Virtual Storage Center, Tivoli Storage Productivity Center
IBM Spectrum Protect™
Single point of administration for data backup and recovery
Tivoli Storage Manager
IBM Spectrum Accelerate™
Accelerating speed of deployment and access to data for new workloads
XIV storage system
IBM Spectrum Virtualize™
Storage virtualization that frees client data from IT boundaries
SAN Volume Controller
IBM Spectrum Scale™
High-performance, scalable storage manages exabytes of unstructured data
GPFS and codename:Elastic Storage
IBM Spectrum Archive™
Enables easy access to long term storage of low activity data
Linear Tape File System (LTFS)
Last year, IDC recognized IBM as #1 in this new emerging software defined storage market. This announcement reinforces IBM's lead in this area. See the [Press Release] for details.
We have a lot to cover, so I will do the quick recap today, and then go in-depth on subsequent posts.
IBM FlashSystem 840 and V840
The FlashSystem now offers a high-voltage 1300W power supply. There are two supplies providing redundancy. In the unlikely event that you are doing maintenance on one of them, the other supply handles all the workload. With the original power supply, the system slowed down the clock speeds to reduce electrical demand. The new power supplies can handle full performance.
Also, the Graphical User Interface (GUI) now holds 300 days of performance data with pan-and-zoom capability. Five predefined graphs showing key performance metrics with additional user-defined metrics available for visualization.
The new v7.4 level of microcode combines features from v7.2.7 and v7.3 into a single code base.
In previous 3-site mirroring implementations, you had A-to-B-to-C cascading. Metro Mirror would get the data from A-to-B, then Global Mirror would copy B-to-C. Multiple Target Peer-to-Peer Remote Copy (PPRC) feature number 7025 allows you to have two separate paths of the data: A-to-B and separately A-to-C. Some folks refer to this as a "star" configuration.
For System z mainframe clients, the new v7.4 introduces new zHyperWrite for DB2 database logs, enhances zGM (XRC) write pacing, and extends Easy Tier automated-tiering API to allow z/OS applications to influence placement on different tiers of storage.
The High Performance Flash Enclosures (HPFE) that IBM introduced last May for the "A" frames are now available for "B" frames. You can have four HPFE in A, and another 4 in B.
DS8870 now offers 600 GB 15K rpm SAS and 1.6 TB 2.5-inch SSD encryption drives for additional capacity and cost performance options to meet data growth demands within the same space. Both support data-at-rest encryption.
Lastly, we have upgraded the OpenStack Cinder driver to the latest Juno release, including features like volume replication and volume retype.
The latest SAN switch is a slim 1U high box that can be configured with 12 or 24 ports. These are 16Bps ports that can auto-negotiate down to 8Gbps, 4Gbps and 2Gbps. These are easy to set up, and can be managed with the IBM Network Advisor management software.
GPFS is the core technology for IBM's "Codename: Elastic Storage" initiative.
You have several options. First, you can purchase just the GPFS software itself. It runs natively on AIX, Windows and Linux, and can be extended to support other operating systems through the use of NAS protocols like NFS or CIFS. Today, the Linux support which was previously just x86 and POWER has been extended to include Linux on System z mainframes as well.
GPFS v4.1 offers "Native RAID" support, with de-clustered RAID in 8+2P and 8+3P configurations. Like the IBM XIV Storage System, this scatters the data across many drives, and can tolerate drive failures better than traditional RAID-5 configurations.
Another option is to get a pre-configured "Converged" appliance that combines servers, storage and hardware. We already offer SONAS and the Storwize V7000 Unified, but IBM now offers the "GPFS Storage Server" running on the new P822L Linux-on-Power servers, RHEL v7, and and GPFS v4.1 with Native RAID to twin-tailed attached DCS3700 expansion drawers. Since GPFS provides the RAID, no need for DCS37000 controllers, saving clients substantial costs.
The IBM Storwize family includes SAN Volume Controller, Storwize V7000, Storwize V7000 Unified, Storwize V5000, Storwize V3700 and Storwize V3500.
The big announcement is that IBM now offers data-at-rest encryption for block data on internal drives in the new generation of Storwize V7000 and V7000 Unified models. There is no performance impact, and no need to purchase new SED-capable drives.
Data-at-rest encryption helps in several ways. First, it protects data if a drive is pulled out and taken away maliciously. Second, it protects data if the drive fails and you want to send it back to the manufacturer for replacement. Third, it allows you to perform a "secure erase" so that the data can be sold or re-purposed without fear of anyone reading previous data.
Initially, the encryption key management is built-in, with the keys stored on a USB memory stick physically attached to the model. In the future, IBM will extend this support to SVC, extend this support to external virtualized drives, and extend this support to IBM Security Key Lifecycle Manager (SKLM).
Other announcements include 16Gbps adapters for SVC, Storwize V7000 and V7000 Unified. The entire Storwize family will also enjoy both 1.8TB 10K RPM 2.5-inch drives, and 6TB 7200RPM 3.5-inch drives
See the Announcement Letter (available later this month) for details.
New TS1150 enterprise tape drives
The anticipation is over! The new TS1150 tape drive has been announced, with 10TB raw un-compressed "JD" media cartridge capacity and 360 MB/sec throughput performance. The new drive is read/write compatible with TS1140 on JC, JY and JK media cartridges.
For the virtual tape libraries for the System z platform, IBM offers two models. The TS7740 had a small amount of disk front ending tape library of physical tape. The TS7720 had a large amount of disk with no tape library.
But then the person carrying the chocolate bar bumped into the person carrying the jar of peanut butter, and the rest is history. IBM will now allow tape attach on TS7720, best of both worlds! Large disk cache plus tape library attach.
Tape-attached TS7720 configurations can have up to eight partitions, with different partitions have different policies. Some might move data from disk cache to tape more aggressively, while other partitions may keep data on disk for longer periods, or indefinitely if needed.
Logical tape volumes can now be up to 25GB in size.
The DCS3700 is IBM's entry-level disk system for sequential-oriented workloads. Today, IBM announced new disk drive options: 400GB 2.5-inch SSD, 800 GB 2.5-inch SSD, and 1.2TB 10K RPM 2.5-inch drive. All of these offer T10 Protection Information (PI) data integrity.
If Eskimos have 37 words for "snow", then EMC has perhaps a similar number of names for "failure". I have already covered a few of their past attempts, including [ATMOS], [Invista], and [VPLEX]. Last week, EMC introduced its latest, called XtremeIO.
But rather than focus on XtremeIO's many shortcomings, I thought it would be better to point out the highlights of IBM's All-Flash array, IBM FlashSystem.
But first, a quick story.
Two years ago, I worked the booth at [Oracle OpenWorld 2011]. After a conference attendee had visited the booths of Violin Memory and Pure Storage, he asked me why IBM did not have an all-Flash array.
Of course IBM did, and I showed him the [Storwize V7000]. For example, a 2U model with 18 SSD drives of 400GB each, configured in two RAID-5 ranks 7+P+S could offer 5.6 TB of space, running up to 250,000 IOPS at sub-millisecond response times.
Why didn't IBM advertise the Storwize V7000 as an all-Flash array? I though the question was silly at the time, since the Storwize V7000 supported SSD, 15K, 10K and 7200 RPM spinning disk, it seemed obvious that it could be configured with only SSD if you chose.
Since then, IBM has added 800GB support to the Storwize V7000, doubling the capacity. More importantly, IBM acquired Texas Memory Systems, and offers a much better all-Flash array.
Flash can be deployed in three levels. The first is in the server itself, such as with PCiE cards containing Flash chips, limited to applications running on that server only.
The second option is a hybrid disk system, that can intermix Flash-based Solid State Drives (SSD) with regular spinning hard disk drives (HDD). These can be attached to many servers.
The problem with this approach is that when Flash is packaged to pretend to be spinning disk, it undermines some of the performance benefits. Traditional disk system architectures using SCSI commands over Device adapter loops can introduce added latency.
The third fits snuggly in the middle: all-Flash arrays designed from the ground up to be only Flash.
Whereas SSD can typically achieve an I/O latency in the 300 to 1000 microseconds range, IBM FlashSystem can process I/O in the 25 to 110 microsecond range. That is a huge difference!
(FTC Disclosure: The U.S. Federal Trade Commission requires that I mention that I am an IBM employee, and that this post may be considered a paid, celebrity endorsement of both the IBM FlashSystem and IBM Storwize family of products. I have no financial interest in EMC, do not endorse the XtremeIO mentioned here, and was not paid to mention their company or products in any manner.)
Fellow blogger and IBM Master Inventor Barry Whyte has a great comparison table in his blog post [Extreme Blogging]. I thought I would add an added column for the Storwize V7000 with 18 Solid State drives.
IBM FlashSystem 820
IBM Storwize V7000 with SSD
20 Terabytes: 1U
11 Terabytes: 2U
7 Terabytes: 6U
I/O latency (microseconds)
110us (~5x faster)
Maximum I/O per second
NAND Flash type
While it is easy to show that EMC's XtremeIO does not hold a candle to IBM FlashSystems, I think it is more amusing that it is not even as good as a Storwize V7000 with SSD that IBM offered two years ago, long before [EMC acquired XtremeIO company] back in May 2012.
The first day of the residency started with introductions. Our emcee and project leader is Vasfi Gucer from IBM Austin lab. There are 17 participants (referred to as "residents") from the USA and various countries including Brazil, Canada and Sweden.
Michael Fork presenting. I am sitting on the far left side
in the pink shirt. Photo taken by Tina Williams.
To set the right expectations, Tina Williams (IBM Social Media ITSO Projects Program Manager) explained what was going to happen this week.
In a typical "residency", residents are brought together for 4-6 weeks to write an [IBM Redbook] which are often how-to guides written in a very conversational tone.
This residency is different. A bunch of social media and Cloud experts have been brought together to share experiences and to build up skills to write individual blog posts about IBM Cloud offerings. I was invited as both a world-reknown blogger as well as a Cloud expert. Everyone who signed up for this commits to write at least six blog posts about Cloud sometime in the next 90 days.
(Residents who do not have their own blogs can post to the IBM [Thoughts on Cloud] group blog Publishing is part of our promotion process, and writing blogs consistently over a period of time counts!)
Jennifer Turner (IBM Worldwide Cloud Marketing Manager) explained IBM Cloud Social Media Initiative. Five years ago, IBM was one of the top 5 Cloud service providers, then a whole bunch of things happened, and we fell out of the top 5 list, and now with the recent [IBM acquisition of SoftLayer], we are in the top 5 again!
Michael Fork (IBM CloudFirst Lead Architect), presented the latest about SoftLayer. Wow! He did a great job, and am glad to have him as a contact in case I have future questions from clients at the Tucson Executive Briefing Center.
Mohsin Syed [@mohsinusyed], IBM Development Manager, presented [IBM Social Media Analytics], combining Hadoop-style analytics using IBM BigInsights, DB2 database and Cognos reporting. IBM can do [sentiment analysis] to determine positive and negative comments in various languages. This product was formerly known as Cognos Consumer Insight.
I was the last speaker of the day. As one of the top bloggers in both the IT Storage Industry, and company-wide within IBM, I was invited to provide a few tips on blogging to the newbies in the audience. Jeff Antley, the "co-owner" of my blog [Inside System Storage] who works on the IBM developerWorks team, was there on hand to help answer questions.
(IBM requires all highly-visible corporate blogs like mine to have at least two owners. Jeff is an expert at HTML, CSS and other web design and has been immensely helpful in getting my blog looking nicer.)
Well it's Tuesday again, and you know what that means.. IBM announcements! Today, IBM announces that next Monday marks the 60th anniversary of first commercial digital tape storage system! I am on the East coast this week visiting clients, but plan to be back in Tucson in time for the cake and fireworks next Monday.
1925 - masking tape (which 3M sold under its newly announced Scotch® brand)
1930 - clear cellulose-based tape (today, when people say Scotch tape, they usually are referring to the cellulose version)
1935 - Allgemeine Elektrizitatsgesellschaft (AEG) presents Magnetophon K1, audio recording on analog tape
1942 - Duct tape
1947 - Bing Crosby adopts audio recording for his radio program. This eliminated him doing the same program live twice per day, perhaps the first example of using technology for "deduplication".
According to the IBM Archives the [IBM 726 tape drive was formally announced May 21, 1952]. It was the size of a refrigerator, and the tape reel was the size of a large pizza. The next time you pull a frozen pizza from your fridge, you can remember this month's celebration!
When I first joined IBM in 1986, there were three kinds of IBM tape. The round reel called 3420, and the square cartridge called 3480, and the tubes that contained a wide swath of tape stored in honeycomb shelves called the [IBM 3850 Mass Storage System].
My first job at IBM was to work on DFHSM, which was specifically started in 1977 to manage the IBM 3850, and later renamed to the DFSMShsm component of the DFSMS element of the z/OS operating system. This software was instrumental in keeping disk and tape at high 80-95 percent utilization rates on mainframe servers.
While visiting a client in Detroit, the client loved their StorageTek tape automation silo, but didn't care for the StorageTek drives inside were incompatible with IBM formats. They wanted to put IBM drives into the StorageTek silos. I agreed it was a good idea, and brought this back to the attention of development. In a contentious meeting with management and engineers, I presented this feedback from the client.
Everyone in the room said IBM couldn't do that. I asked "Why not?" The software engineers I spoke to already said they could support it. With StorageTek at the brink of Chapter 11 bankruptcy, I argued that IBM drives in their tape automation would ease the transition of our mainframe customers to an all-IBM environment.
Was the reason related to business/legal concerns, or was their a hardware issue? It turned out to be a little of both. On the business side, IBM had to agree to work with StorageTek on service and support to its mutual clients in mixed environments. On the technical side, the drive had to be tilted 12 degrees to line up with the robotic hand. A few years later, the IBM silo-compatible 3592 drive was commercially available.
Rather than put StorageTek completely out of business, it had the opposite effect. Now that IBM drives can be put in StorageTek libraries, everyone wanted one, basically bringing StorageTek back to life. This forced IBM to offer its own tape automation libraries.
In 1993, I filed my first patent. It was for the RECYCLE function in DFHSM to consolidate valid data from partial tapes to fresh new tapes. Before my patent, the RECYCLE function selected tapes alphabetically, by volume serial (VOLSER). My patent evaluated all tapes based on how full they were, and sorted them least-full to most-full, to maximize the return of cartridges.
Different tape cartridges can hold different amounts of data, especially with different formats on the same media type, with or without compression, so calculating the percentage full turned out to be a tricky algorithm that continues to be used in mainframe environments today.
The patent was popular for cross-licensing, and IBM has since filed additional patents for this invention in other countries to further increase its license revenue for intellectual property.
In 1997, IBM launched the IBM 3494 Virtual Tape Server (VTS), the first virtual tape storage device, blending disk and tape to optimal effect. This was based off the IBM 3850 Mass Storage Systems, which was the first virtual disk system, that used 3380 disk and tape to emulate the older 3350 disk systems.
In the VTS, tape volume images would be emulated as files on a disk system, then later moved to physical tape. We would call the disk the "Tape Volume Cache", and use caching algorithms to decide how long to keep data in cache, versus destage to tape. However, there were only a few tape drives, and sometimes when the VTS was busy, there were no tape drives available to destage the older images, and the cache would fill up.
I had already solved this problem in DFHSM, with a function called pre-migration. The idea was to pre-emptively copy data to tape, but leave it also on disk, so that when it needed to be destaged, all we had to do was delete the disk copy and activate the tape copy. We patented using this idea for the VTS, and it is still used in the successor models of IBM Sysem Storage TS7740 virtual tape libraries today.
Today, tape continues to be the least expensive storage medium, about 15 to 25 times less expensive, dollar-per-GB, than disk technologies. A dollar of today's LTO-5 tape can hold 22 days worth of MP3 music at 192 Kbps recording. A full TS1140 tape cartridge can hold 2 million copies of the book "War and Peace".
(If you have not read the book, Woody Allen took a speed reading course and read the entire novel in just 20 minutes. He summed up the novel in three words: "It involves Russia." By comparison, in the same 20 minutes, at 650MB/sec, the TS1140 drive can read this novel over and over 390,000 times.)
If you have your own "war stories" about tape, I would love to hear them, please consider posting a comment below.
Tonight PBS plans to air Season 38, Episode 6 of NOVA, titled [Smartest Machine On Earth]. Here is an excerpt from the station listing:
"What's so special about human intelligence and will scientists ever build a computer that rivals the flexibility and power of a human brain? In "Artificial Intelligence," NOVA takes viewers inside an IBM lab where a crack team has been working for nearly three years to perfect a machine that can answer any question. The scientists hope their machine will be able to beat expert contestants in one of the USA's most challenging TV quiz shows -- Jeopardy, which has entertained viewers for over four decades. "Artificial Intelligence" presents the exclusive inside story of how the IBM team developed the world's smartest computer from scratch. Now they're racing to finish it for a special Jeopardy airdate in February 2011. They've built an exact replica of the studio at its research lab near New York and invited past champions to compete against the machine, a big black box code -- named Watson after IBM's founder, Thomas J. Watson. But will Watson be able to beat out its human competition?"
Like most supercomputers, Watson runs the Linux operating system. The system runs 2,880 cores (90 IBM Power 750 servers, four sockets each, eight cores per socket) to achieve 80 [TeraFlops]. TeraFlops is the unit of measure for supercomputers, representing a trillion floating point operations. By comparison, Hans Morvec, principal research scientist at the Robotics Institute of Carnegie Mellon University (CMU) estimates that the [human brain is about 100 TeraFlops]. So, in the three seconds that Watson gets to calculate its response, it would have processed 240 trillion operations.
Several readers of my blog have asked for details on the storage aspects of Watson. Basically, it is a modified version of IBM Scale-Out NAS [SONAS] that IBM offers commercially, but running Linux on POWER instead of Linux-x86. System p expansion drawers of SAS 15K RPM 450GB drives, 12 drives each, are dual-connected to two storage nodes, for a total of 21.6TB of raw disk capacity. The storage nodes use IBM's General Parallel File System (GPFS) to provide clustered NFS access to the rest of the system. Each Power 750 has minimal internal storage mostly to hold the Linux operating system and programs.
When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, "The actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1TB." For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained, Watson is NOT going to the internet searching for answers.
“In times of universal deceit, telling the truth will be a revolutionary act.”
-- George Orwell
Well, it has been over two years since I first covered IBM's acquisition of the XIV company. Amazingly, I still see a lot of misperceptions out in the blogosphere, especially those regarding double drive failures for the XIV storage system. Despite various attempts to [explain XIV resiliency] and to [dispel the rumors], there are still competitors making stuff up, putting fear, uncertainty and doubt into the minds of prospective XIV clients.
Clients love the IBM XIV storage system! In this economy, companies are not stupid. Before buying any enterprise-class disk system, they ask the tough questions, run evaluation tests, and all the other due diligence often referred to as "kicking the tires". Here is what some IBM clients have said about their XIV systems:
“3-5 minutes vs. 8-10 hours rebuild time...”
-- satisfied XIV client
“...we tested an entire module failure - all data is re-distributed in under 6 hours...only 3-5% performance degradation during rebuild...”
-- excited XIV client
“Not only did XIV meet our expectations, it greatly exceeded them...”
In this blog post, I hope to set the record straight. It is not my intent to embarrass anyone in particular, so instead will focus on a fact-based approach.
Fact: IBM has sold THOUSANDS of XIV systems
XIV is "proven" technology with thousands of XIV systems in company data centers. And by systems, I mean full disk systems with 6 to 15 modules in a single rack, twelve drives per module. That equates to hundreds of thousands of disk drives in production TODAY, comparable to the number of disk drives studied by [Google], and [Carnegie Mellon University] that I discussed in my blog post [Fleet Cars and Skin Cells].
Fact: To date, no customer has lost data as a result of a Double Drive Failure on XIV storage system
This has always been true, both when XIV was a stand-alone company and since the IBM acquisition two years ago. When examining the resilience of an array to any single or multiple component failures, it's important to understand the architecture and the design of the system and not assume all systems are alike. At it's core, XIV is a grid-based storage system. IBM XIV does not use traditional RAID-5 or RAID-10 method, but instead data is distributed across loosely connected data modules which act as independent building blocks. XIV divides each LUN into 1MB "chunks", and stores two copies of each chunk on separate drives in separate modules. We call this "RAID-X".
Spreading all the data across many drives is not unique to XIV. Many disk systems, including EMC CLARiiON-based V-Max, HP EVA, and Hitachi Data Systems (HDS) USP-V, allow customers to get XIV-like performance by spreading LUNs across multiple RAID ranks. This is known in the industry as "wide-striping". Some vendors use the terms "metavolumes" or "extent pools" to refer to their implementations of wide-striping. Clients have coined their own phrases, such as "stripes across stripes", "plaid stripes", or "RAID 500". It is highly unlikely that an XIV will experience a double drive failure that ultimately requires recovery of files or LUNs, and is substantially less vulnerable to data loss than an EVA, USP-V or V-Max configured in RAID-5. Fellow blogger Keith Stevenson (IBM) compared XIV's RAID-X design to other forms of RAID in his post [RAID in the 21st Centure].
Fact: IBM XIV is designed to minimize the likelihood and impact of a double drive failure
The independent failure of two drives is a rare occurrence. More data has been lost from hash collisions on EMC Centera than from double drive failures on XIV, and hash collisions are also very rare. While the published worst-case time to re-protect from a 1TB drive failure for a fully-configured XIV is 30 minutes, field experience shows XIV regaining full redundancy on average in 12 minutes. That is 40 times less likely than a typical 8-10 hour window for a RAID-5 configuration.
A lot of bad things can happen in those 8-10 hours of traditional RAID rebuild. Performance can be seriously degraded. Other components may be affected, as they share cache, connected to the same backplane or bus, or co-dependent in some other manner. An engineer supporting the customer onsite during a RAID-5 rebuild might pull the wrong drive, thereby causing a double drive failure they were hoping to avoid. Having IBM XIV rebuild in only a few minutes addresses this "human factor".
In his post [XIV drive management], fellow blogger Jim Kelly (IBM) covers a variety of reasons why storage admins feel double drive failures are more than just random chance. XIV avoids load stress normally associated with traditional RAID rebuild by evenly spreading out the workload across all drives. This is known in the industry as "wear-leveling". When the first drive fails, the recovery is spread across the remaining 179 drives, so that each drive only processes about 1 percent of the data. The [Ultrastar A7K1000] 1TB SATA disk drives that IBM uses from HGST have specified 1.2 million hours mean-time-between-failures [MTBF] would average about one drive failing every nine months in a 180-drive XIV system. However, field experience shows that an XIV system will experience, on average, one drive failure per 13 months, comparable to what companies experience with more robust Fibre Channel drives. That's innovative XIV wear-leveling at work!
Fact: In the highly unlikely event that a DDF were to occur, you will have full read/write access to nearly all of your data on the XIV, all but a few GB.
Even though it has NEVER happened in the field, some clients and prospects are curious what a double drive failure on an XIV would look like. First, a critical alert message would be sent to both the client and IBM, and a "union list" is generated, identifying all the chunks in common. The worst case on a 15-module XIV fully loaded with 79TB data is approximately 9000 chunks, or 9GB of data. The remaining 78.991 TB of unaffected data are fully accessible for read or write. Any I/O requests for the chunks in the "union list" will have no response yet, so there is no way for host applications to access outdated information or cause any corruption.
(One blogger compared losing data on the XIV to drilling a hole through the phone book. Mathematically, the drill bit would be only 1/16th of an inch, or 1.60 millimeters for you folks outside the USA. Enough to knock out perhaps one character from a name or phone number on each page. If you have ever seen an actor in the movies look up a phone number in a telephone booth then yank out a page from the phone book, the XIV equivalent would be cutting out 1/8th of a page from an 1100 page phone book. In both cases, all of the rest of the unaffected information is full accessible, and it is easy to identify which information is missing.)
If the second drive failed several minutes after the first drive, the process for full redundancy is already well under way. This means the union list is considerably shorter or completely empty, and substantially fewer chunks are impacted. Contrast this with RAID-5, where being 99 percent complete on the rebuild when the second drive fails is just as catastrophic as having both drives fail simultaneously.
Fact: After a DDF event, the files on these few GB can be identified for recovery.
Once IBM receives notification of a critical event, an IBM engineer immediately connects to the XIV using remote service support method. There is no need to send someone physically onsite, the repair actions can be done remotely. The IBM engineer has tools from HGST to recover, in most cases, all of the data.
Any "union" chunk that the HGST tools are unable to recover will be set to "media error" mode. The IBM engineer can provide the client a list of the XIV LUNs and LBAs that are on the "media error" list. From this list, the client can determine which hosts these LUNs are attached to, and run file scan utility to the file systems that these LUNs represent. Files that get a media error during this scan will be listed as needing recovery. A chunk could contain several small files, or the chunk could be just part of a large file. To minimize time, the scans and recoveries can all be prioritized and performed in parallel across host systems zoned to these LUNs.
As with any file or volume recovery, keep in mind that these might be part of a larger consistency group, and that your recovery procedures should make sense for the applications involved. In any case, you are probably going to be up-and-running in less time with XIV than recovery from a RAID-5 double failure would take, and certainly nowhere near "beyond repair" that other vendors might have you believe.
Fact: This does not mean you can eliminate all Disaster Recovery planning!
To put this in perspective, you are more likely to lose XIV data from an earthquake, hurricane, fire or flood than from a double drive failure. As with any unlikely disaster, it is best to have a disaster recovery plan than to hope it never happens. All disk systems that sit on a single datacenter floor are vulnerable to such disasters.
For mission-critical applications, IBM recommends using disk mirroring capability. IBM XIV storage system offers synchronous and asynchronous mirroring natively, both included at no additional charge.
Every year, March 31 marks "World Backup Day". Sadly, many people forget the importance of backing up their critical information. This is not just true for businesses, non-profit organizations and government agencies, but also for all of your personal information that you keep on computer devices.
My friends over at Cloudwards had developed an awesome infographic related to World Backup Day. Here it is.
(FTC Disclosure: I work for IBM, which has no business relationship with Cloudwards. Cloudwards does not itself provide backup services, but rather reviews services provided by others. This post should not be considered an endorsement of Cloudwards or their reviews.)
Courtesy of: Cloudwards.net
I hope you find this information helpful and informative!
Well it's Tuesday again, and you know what that means? IBM Announcements!
Today, IBM announced an exciting new addition to the IBM System Storage™ product line, [IBM Spectrum Storage™], a family of software defined storage offerings.
To understand its significance, I need to explain a few things first. Software defined storage is part of a larger concept of software defined environment.
How is software defined environment different than what you have now? In every data center, you need to map business requirements of an application workload with an appropriate set of IT infrastructure, including server, network and storage resources.
The traditional approach involves an application owner or database administrator reviewing the business requirements documented for the application, calling the server, network and storage administrators, who match those requirements to appropriate IT hardware and notify the folks in facilities to rack and stack the gear accordingly.
In a software defined environment, Application Programming Interfaces (API), Service Level Agreements (SLA) and Orchestration workflows can automate the request for the appropriate resources. This is referred as the "Control Plane".
Responding to these requests, the software can provision the appropriate server, network and storage resources required. Server, network and storage virtualization, standard interfaces and deployment technologies exist to make this practical. This is referred to as the "Data Plane".
Any time new a way of doing things is introduced into the world, there could be some resistance. Let's tackle the three most frequently stated objections:
"IT infrastructure resources are rare and expensive! Administrators need to control or approve how resources are doled out!" An objection to self-service automation is the fear that employees would take too much.
If you have a bank account, Automated Teller Machines (ATM) can restrict the amount of cash you can take out, based on what is appropriate per request, or per day, with an upper limit of what you have in your personal checking or savings account. You enter your debit card and PIN into the "Control Plane" keypad and out comes a stack of 20-dollar bills from the "Data Plane" slot. In a software defined environment, you can limit requests through quotas and resource pools.
"Some application workloads are more important than others! Another objection is that every workload will be treated in the same standard way, mission critical workloads and dev/test would be treated alike.
At the gas station, you can select different levels of octane gasoline. You enter your credit card and zip code into the "Control Plane" keypad and selected octane comes out of the "Data Plane" hose. In a software defined environment, resources can be provisioned with different Quality of Service (QoS) levels.
"Different applications require different combinations of resources!" Another objection is the fear that fixed combinations of server, storage and network resources will be stifling to innovation and productivity.
At the vending machine, you can choose which candy bar and which chips to have with whatever soft drink you choose for lunch. You enter your bills and coins into the "Control Plane" slot, select the row letter and column number for your snack of choice, and then fetch your purchases from the "Data Plane" flap. In a software defined environment, a Service Catalog can offer a virtual menu of different server, network and storage resources to be combined together as needed.
These concerns are addressed well enough in software defined environments, in general, and with IBM Spectrum Storage family of products, in particular.
(Nostalgia: I remember the days before self-service automation. At the bank, I had to stand in line at the bank until I could to talk to a human bank teller to get cash from my savings account. At the gas station, human gas attendants would come out and pump the gas for me, check my oil and wash my windshield. And at a restaurant, I felt like I waited an eternity from the time I ordered my meal to the time the human short-order cook had it ready and human wait staff delivered it to my table. These all seem silly today, doesn't it?)
Jamie Thomas, IBM General Manager of Storage and Software Defined Environments
Jamie announced [IBM Elastic Storage], a new offering that is available as a software defined storage solution, based on IBM's General Parallel File System (GPFS) technology already deployed at 45,000 installations.
IBM Elastic Storage provides a global name view across data center locations. It can manage up to a Yotabyte of information, combining Flash, disk and tape resources. It supports OpenStack interfaces, Hadoop and standard POSIX file system conventions.
IBM Elastic Storage provides automated tiering to move data from different storage media types. Infrequently accessed files can be migrated to tape and automatically recalled back to disk when required. Unlike traditional storage, it allows you to smoothly grow or shrink your storage infrastructure without application disruption or outages.
IBM Elastic Storage software can run on a cluster of x86 and/or POWER-based servers, and can be used with internal disk, commodity storage, or advanced storage systems from IBM or other vendors.
IBM partnered with various clients in different industries in a special beta program. Jamie led a client panel to discuss their experiences with IBM Elastic Storage:
Alan Malek, Director of IT, Cypress Semiconductor.
"Total cycle time is key". Over the past 31 years, they bought whatever file storage was available. Now, with IBM Elastic Storage, the performance was very consistent for their engineering workloads with full load balancing.
Russell Schneider, Principal Storage Consultant, Jeskell.
Russell's company works with a lot of federal agencies, "Big Data has become Bigger Data". For example, research on Global Warming and Climate Change requires a large amount of storage across agencies.
In another example, when the tsunami hit Japan a few years ago, an agency here in the USA realized they had 14PB of data stored as a single copy in a data center at sea level less than a mile from the coast. They realized they needed to have a secondary copy, and an option to cache to a third location depending on regional disasters.
Matthew Richards, Products, OwnCloud.
For those not familiar with OwnCloud, it provides a Dropbox-like file sharing service, but in the Enterprise, with on-premise storage. It has been fully tested and certified with IBM Elastic Storage to provide a secure file sharing platform.
With IBM Elastic Storage, they were able to scale linearly up to 20,000 users, and are now testing 100,000 users. The need to have intelligent access to files at scale is what Matthew likes about IBM Elastic Storage.
Dr. Michael Factor, IBM Distinguished Engineer at IBM Research
Michael started out explaining there are three areas for storage: block, file and object. The fastest growing type of data is unstructured fixed content with associated metadata. This is ideal for object storage. Michael has been working with OpenStack Swift, an open source interface defined for object storage. He defined "storlets" as follows:
Storlets extend an object store by moving computation to the data -- filtering, transforming, analyzing -- instead of bringing data to the computation.
Storlets have been deployed on a variety of European Union research projects. For example, in partnership with Phillips, a pathology storlet can count the number of cancer cells in an image. By bringing the computation to the data, it eliminates having to transfer large amounts of data over the network.
Storlets can run on-premise and on IBM's SoftLayer IaaS cloud offering.
Bruce Hillsberg, IBM Director of Storage Systems at IBM Research
Bruce led another panel discussion, this time of IBM storage experts:
Vincent Hsu, IBM Fellow and CTO of Storage.
The problem is the isolation of data into "storage silos". Isolation causes problems in managing large amounts of data at scale, and costs more as storage is not fully utilized. IBM Elastic Storage brings everything together, eliminating storage silos.
Michael explained how IBM works with clients all over the world to ensure that storage solutions meet client requirements. For example, storlets can be used to use rich metadata to manage photographs, and display them based on GPS satellite location, or other content that makes it easier to manage these images.
IBM Elastic Storage will support OpenStack Cinder and Swift interfaces. IBM is a platinum sponsor of OpenStack foundation, and is now its second most prolific contributor, with hundreds of full-time employees working on this.
Tom Clark, IBM Distinguished Engineer, Chief Architect, Storage Software, Cloud & Smarter Infrastructure.
Storage Management is a critical piece of Software Defined Storage. This is done in three ways:
The use of analytics to optimize the deployment of storage, based on workload requirements. Storage admins set policies, and then IBM Elastic Storage analytics gather metrics and then optimize data placement and movement based on these policies. IBM Elastic Storage has 70 percent lower TCO that competitive offerings.
The focus on backup services. Backups are not just for data protection, but rather can be used to duplicate or replicate data for testing, for training, and for other purposes. IBM Elastic Storage is fully supported by IBM Tivoli Storage Manager.
Being able to support Hybrid Cloud environments, where some data can be on-premise, and other data off-premise. Storage Management challenges will need to deal with this possibility. IBM Elastic Storage is well positioned for this.
Carl Kraenzel, IBM Distinguished Engineer, Director of Watson Cloud Technology and Support.
Watson is ground-breaking technology, and IBM Elastic Storage technology was at the heart of the Watson that was first introduced in 2011.
To consider IBM Elastic Storage based on lower-cost and higher-scalability is not the full picture. Rather, this is an important platform for Cognitive Computing, which we are just at the tip of the iceberg in exploring. IT systems need to be aware of the context of what we are doing.
While the Grand Challenge demonstration on Jeopardy! was exciting, it is time we stop playing games and apply IBM Elastic Storage to business, to help with health care and medical research, and other problems in society. IBM has already deployed this at Anderson Cancer Center and Memorial Sloan Kettering Cancer Center, for example.
Tom Rosamilia provided closing remarks. IBM Elastic Storage is not just for new workloads in Cloud, Analytics, Mobile and Social (CAMS) but also traditional workloads as well. IBM Elastic Storage provides "data democracy" and allows for "better rested storage administrators" that make fewer mistakes.
Tom opened the floor for questions from the audience:
Q1. Data integrity, not just security but also quality? IBM Elastic Storage has end-to-end data integrity checking built-in.
Q2. How does IT transition from full control to auto-pilot? IBM allows you to tap into existing storage. This is not rip-and-replace. With storage virtualization, IBM hides the complexity that normally requires full control over specific assets.
Q3. Storage admins would rather have a root canal without Novocaine than move their data. What is IBM doing to offer automation to help storage admins move to this new infrastructure? IBM storage virtualization breaks that hard link between applications and specific storage devices. IBM Elastic Storage eliminates application downtime previously associated with data movement.
Tom Rosamilia assured the audience that IBM is fully committed to its storage portfolio. IBM Elastic Storage is not just about the profoundness of what IBM announced today, but also where IBM is investing in the future of storage.
Recently, a client asked how to backup their IBM PureData System for Analytics devices. IBM had [acquired Netezza in November 2010], and later renamed their TwinFin devices as the IBM PureData for Analytics, powered by Netezza.
The [IBM PureData System for Analytics] is incredibly fast for performing deep, ad-hoc analytics. However, the people who use them are "data scientists", not backup experts.
Likewise, there are backup administrators who may not be familiar with the unique characteristics of this expert-integrated system to know what backup options are available.
As with the rest of the IBM PureSystems line, the IBM PureData System for Analytics (or, PDA for short) has a combination of servers, storage and switches inside.
In a full-frame PDA, there are two servers in Active/Passive mode, these coordinate activity to FPGA-based blade servers, which have parallel access to hundreds of disk drives, storing nearly 200 TB of compressed database data. A system can span up to four frames.
But what do you backup? And why? You don't need to worry about backing up the Linux operating system or NPS server code, that is considered firmware and if anything every got corrupted, IBM would help restore it for you. System-wide metadata, such as the host catalog and global users, groups, and permissions should be backed up periodically to protect against data corruption.
There are a number of reasons to backup your user databases:
As part of firmware upgrade/downgrade
To transfer data to another system
Protect against hardware failure / disaster
Protect against data corruption
The PDA has three backup formats. You can backup the entire user database in compressed format, backup individual tables in compressed format, or export to a text-format file.
Compressed format is faster, but can only be restored to the same PDA, or a PDA that has the same or higher level of NPS firmware. The text-format is slower, but can be used to restore to lower levels of NPS firmware, or to other database systems.
There are basically two methods to backup your PDA. The first is called the "Filesystem" method. Basically, you can attach an external storage device to the NPS server, and use the built-in command line interface (CLI) to store the backups onto its file system.
On NPS version 6, the nzhostbackup will backup the /nz/data directory which stores the system tables, database catalogs, configuration files, query plans, and cached executable code for the SPU blade servers.
(I have heard that the nzhostbackup will get deprecated in NPS version 7, but I only have access to version 6. As always, [RTFM] for your specific NPS code level.)
The nzbackup with the users parameter will backup the global users, groups and permissions. This is included in the /nz/data backup contents from the nzhostbackup command, but you may want to backup and restore these separately.
The nzbackup with the db parameter will backup a user database in compressed format. To backup individual tables, use the CREATE EXTERNAL TABLE command, which can create compressed or text-format exports.
You may find that your databases are so large, they will exceed the limits of the filesystem on the external storage device. For SAN or NAS deployments, I recommend the IBM Storwize V7000 Unified with IBM General Parallel File System (GPFS). However, if you are using something else, you may need to use the "nz_backup" scripts provided which split up the backup images into smaller pieces that most other filesystems can handle.
The PDA comes with 10GbE Ethernet ports that you can attach a NAS storage device over a Local Area Network (LAN), or add Fibre Channel Protocol (FCP) ports and connect over a Storage Area Network (SAN). To keep things simple, I will refer to whichever network you decide as the "Backup Network" in the drawings.
The second method for backup is called the "External Backup Software" method. As you have probably guessed, it involves sending the backups to a supported software product like IBM Tivoli Storage Manager (or, TSM for short).
In this case, the PDA acts as a client node, similar to a laptop, desktop, or application server with internal disk. Backup data is sent over the LAN to the designated TSM server, and the TSM server in turn writes over the SAN to its storage hierarchy of disk, virtual tape and/or physical tape resources.
Backups can be done by command "on demand", or automated on a schedule. For the /nz/data directory, direct the nzhostbackup command to send the backup copy to local disk, then use TSM's dsmc archive command to transfer this backup copy to the TSM server.
For nzbackup with the users or db parameters, you can send the data directly to the appropriate TSM server by specifying the connector and connectorArgs parameters.
To reduce traffic on the TSM Server, an intermediary "TSM Proxy Node" can be put in between. In this case, the PDA sends the backup to the Proxy Node, the Proxy Node uses a "LAN Free Storage Agent" to send the backups directly to the virtual tape and/or physical tape, and then notifies the TSM Server to updates its system catalog to record which tape holds these new backups.
Another configuration involves installing the TSM LAN Free storage agent directly on the PDA. While this will require FCP ports to be added and consume more CPU resources on the NPS server, it eliminates most of the LAN traffic, allowing the PDA to send its backups directly to virtual or physical tape.
Well it's Tuesday again, and you know what that means? IBM announcements! Many of the announcements were made by IBM Executives at the [IBM Pulse 2014 conference].
IBM BlueMix is the newest cloud offering from IBM, providing Platform-as-a-Service (PaaS) offering based on the Cloud Foundry open source project that promises to deliver enterprise-level features and services that are easy to integrate into cloud applications.
This week, my fifth-line manager Tom Rosamilia, IBM Senior Vice President IBM Systems & Technology Group and Integrated Supply Chain made two announcements at Pulse. First, in additional to x86-based servers, SoftLayer will also offer POWER-based servers to run AIX, IBM i and [Linux on POWER] applications.
Second, SoftLayer will support PureApplication Patterns of Expertise. What is a pattern of expertise? It can be as simple as a virtual machine encapsulated in [Open Virtual Format], to more dynamic architectures, packaged with required platform services, that are deployed and managed by the system according to a set of policies.
Patterns simplify and automate tasks across the lifecycle of the application. Customers and partners alike are [seeing significant reductions in cost and time] across the application lifecycle with the deployment of a PureApplication System.
Also, this week at Pulse, Robert LaBlanc, IBM Senior Vice President of Software and Cloud Solutions, announced [IBM plans to Acquire Cloudant] which offers an open, cloud Database-as-a-Service (DBaaS) that helps organizations simplify mobile, web app and big data development efforts.
When I introduced [SmartCloud Virtual Storage Center] back in October 2012, I mentioned that it was a great solution for large enterprise that have all of their disk behind SAN Volume Controller (SVC).
To reach smaller accounts, IBM has announced two new offerings:
IBM SmartCloud Virtual Storage Entry for customers that have less than 250TB of disk behind two or four SVC nodes. It is priced per terabyte, by the amount of capacity that is virtualized.
IBM SmartCloud Virtual Storage for Storwize Family for customers that have other Storwize family products (Storwize V7000 or V5000, for example). It is priced per the number of storage enclosures that are managed by the Storwize family hardware.
From the photo, the marketing people staggered the various components to give it a stylized [Dagwood Sandwich] effect. I can assure you that these are just standard 19-inch rack components that fit into 6U of space in standard IT racks.
Starting top to bottom, we have the first FlashSystem V840 Control Enclosure, its 1U-high UPS, a second FlashSystem V840 Control Enclosure and its UPS, and finally a 2U-high FlashSystem V840 Storage Enclosure.
You can have up to a dozen Flash modules, either 2TB or 4TB size, for a maximum of 40TB usable RAID-protected capacity. These can be protected with AES 256-bit encryption. The FlashSystem modules are front-loaded, and slide in and out for easy maintenance.
The system is fully redundant and hot-swappable with concurrent code load to ensure high availability.
(Update: In the comments, readers thought that this was nothing more than just a two-node SVC with FlashSystem 840. There are differences, so I have added the following table.)
SVC with FlashSystem 840
Cabling from controllers to storage
Through SAN fabric ports
Direct attach from V840 Controllers to V840 Storage Enclosures
Call Home Support
GUI screen branding
The system is fully VMware-certified, supporting VAAI interfaces, and an SRA for VMware's Site Recovery Manager (SRM). With Real-time Compression, you can get up to 80 percent capacity savings for workloads like Virtual Desktop Infrastructure (VDI). That in effect gives you up to 5x (200TB) of virtual capacity in 6U of rack space!
You can either keep it as an All-Flash array, or you can virtualize external IBM and non-IBM disk systems, and use the Flash capacity in the Storage Enclosure for IBM's Easy Tier automated sub-volume tiering and data migration. With or without external storage, the FlashSystem V840 can provide local and remote mirroring and point-in-time copies.
However, I was speaking to various clients in Winnipeg, Canada Tuesday and Wednesday this week, so marketing moved the announcement date to today to accommodate my schedule. Sometimes, being the #1 most influential IBM employee in storage comes in handy!)
Here, then, is a quick review of the storage portion of today's announcements.
IBM FlashSystem 840
The [IBM FlashSystem 840] offers twice the capacity as its predecessors, the 810 and 820, with up to 48TB in a dense 2U package.
(Quick recap of previous models: Both the FlashSystem 810 and 820 supported ECC-protected memory and Variable-striped RAID (VSR). The [FlashSystem 810] supported RAID-0 striped across the modules, and the [FlashSystem 820] supported two-dimensional 2D-RAID across modules for higher availability. Fellow blogger Jim Kelley (IBM) on his Storage Buddhist blog has a great post on this: [IBM FlashSystem: Feeding the Hogs].
The new FlashSystem 840 in effect replaces both, so you can choose RAID-0 striping or 2D-RAID, along with your ECC-protected memory and Variable-striped RAID. It offers hot-swappable Flash modules, redundant components, and non-disruptive concurrent code load (CCL).
The FlashSystem 840 also introduces military-grade AES-XTS 256 bit encryption to provide added protection to your data.
For host attachment, you have some great choices: 16Gb/8Gb/4Gb auto-negotiated Fibre Channel (FCP), 40Gb InfiniBand QDR, and 10Gb FCoE. Whatever you decide, you get 90 microsecond writes, and 135 microsecond reads.
Since its introduction just over a year ago, IBM has sold FlashSystem to over 1,000 clients! For more on how this compares to other all-flash arrays, read my previous post about [IBM FlashSystem].
Adding SAN Volume Controller provides some key advantages, including Real-time compression, Thin provisioning, FlashCopy point-in-time copies, Stretched Cluster support, Easy Tier sub-LUN automated tiering, and remote copy services like Metro Mirror (synchronous) and Global Mirror (asynchronous).
Adding the SVC also changes the host attachment options: 8Gb/4Gb/2Gb Fibre Channel (FCP), 1Gb and 10Gb iSCSI, and 10Gb FCoE. Depending on the options and features you choose, the SVC layer adds a modest 60 to 100 microseconds to each read and write.
Each SVC node dedicates four of its six cores, and 2GB of its 24GB cache, to use with compression. Those interested in beefing up compression performance, either with FlashSystems or with any other disk, can choose the "Compression Hardware Upgrade Boosts Base I/O Efficiency" (affectionately known as the CHUBBIE) RPQ 8S1296 for SVC systems with software version 184.108.40.206 or higher. Basically, this RPQ adds another 6-core CPU and another 24GB of cache, so that each node can dedicate 8 cores for compression, and 26GB of cache for compression processing. Initial test results show this can increase performance 3x!
IBM Network Advisor
The [IBM Network Advisor v12.1] management software provides comprehensive management for data, storage and converged networks. This single application can deliver end-to-end visibility and insight across different network types--it supports Fibre Channel SANs (including Gen 5 Fibre Channel platform), IBM FICON and IBM b-type SAN FCoE networks--and provides new features to manage your Brocade and IBM b-type SAN switches.
Cisco MDS 9710 Multilayer Director
The [Cisco MDS 9710 Multilayer Director] is mainframe-ready, with full support for System z FICON and Fibre Channel protocol (FCP) environments. This director supports eight module slots for a maximum of 384 ports.
Well it's Tuesday again, and you know what that means? IBM Announcements!
You might be thinking, didn't IBM just have a [huge storage announcement October 8, 2013]? You would be right! IBM's $1B additional investment in Storage has been like shot of adrenaline in getting new features and functions out sooner to our clients.
DS8870 Disk System Release 7.2
New IBM POWER7+ controllers. The previous models of DS8870 were based on the POWER7 controllers, and these new models have POWER7+ processors. This change enhances the performance across the board, from mainframe to distributed systems, from sequential to random. Customers with existing POWER7-based models will be able to do MES upgrade to the new POWER7+ next year.
For comparison with older DS8000 models, here are some internal IBM measurements we took for Database workloads on both z/OS(mainframe) and Distributed systems with typical 70% read, 30% write and 50% cache hit:
IBM Internal Measurements (thousands of IOPS)
DB Distributed systems
New 1.2TB (10K RPM) and 4TB (7200 RPM) self-encrypting enterprise drives (SED). This is a 33% boost over the previous 900GB and 3TB drives previously available. As with all the other drives in the DS8870, these new drives include the encryption chip right on the drive itself, offering encryption with scalability.
Improved security. Release 7.2 will support the U.S. National Institute of Standards and Technology [NIST.gov]] 800-131A specification, raising the 96-bit encryption to the required 112 bits on the customer IP network. This involves updates to the security firmware, management software and digital signatures on code loads.
Metro Mirror enhancement for System z. By avoiding serial conflicts of updated blocks, this enhancement can boost performance up to 100 percent when using Metro Mirror with z/OS applications on System z mainframes.
Easy Tier™ reporting and graphs to determine optimal mix. Now you can see for yourself how sub-LUN automated tiering is helping your applications.
Easy Tier Workload Categorization
New workload visuals help clients and IBM technical specialists compare activity across tiers within and across pools to help determine optimal drive mix for current workloads
Easy Tier Data Movement Daily Report
New Easy Tier summary report every 24 hours illustrating data migration activity (5-minute intervals) can help visualize migration types and patterns for current workloads
Easy Tier Workload Skew Curve
Shows skew of all workloads across the system in a graph to help clients visualize and accurately tier configurations when adding capacity or a new system Clients can import data into Disk Magic
All-Flash Optimization. Yesterday, in my post [IBM FlashSystem versus EMC XtremeIO], I mentioned that any hybrid systems like the IBM Storwize V7000 that can support a mix of SSD and HDD can obviously be configured as SSD-only. Apparently, that was not obvious to many readers, so I apologize. For the DS8870, you can configure an all-Flash (SSD only) configuration, and Release 7.2 added some optimization when configured with SSD only.
1,056 drives 15K @146GB in RAID-10
224 drives SSD @400GB in RAID-5
Same - Usable 72 TB
70 percent faster
33 percent less floor space required
62 percent less energy consumed
(Note: Performance results based on measurements and projections using IBM benchmarks in a controlled environment.)
OpenStack™ support DS8870 now offers the [OpenStack Cinder] interface for block LUN allocations in OpenStack environments. IBM is a Platinum sponsor of OpenStack, and Opentack is the strategic platform for IBM private and hybrid clouds.
XIV Storage System
Following on the heels of the [XIV enhancements announced], IBM has now added 800GB Solid State Drives (SSD) as Read cache for its 4TB drive-based models.
DCS3860 Disk System
The DCS3860 is the next generation of the DCS3700 disk system. Designed with Linux-x86 servers in mind, the system offers direct SAS host attachement, 24GB of cache, and 60 drives in a compact 4U drawer. Like its predecessor, the drives are stored on five pull-out trays, with twelve hot-swappable drives per tray. You can add up to five more expansion units, with 60 drives each, for a total of 360 drives in 24U rack space.
These new models will help our clients deploy new workloads and consolidate existing workloads.
Each resident presented at least six proposals for blog post ideas. A proposal included a title and short description of what it would entail. Titles had to be less than 70 characters, and the short descriptions were typically just a few sentences.
These were presented to the entire team, and we picked them apart, suggested better wording for the titles, or different ways to approach the topic.
"I treat others respectfully, attacking ideas and not people. I also welcome respectful disagreement with my own ideas.
I believe in intellectual property rights, providing links, citing sources, and crediting inspiration where appropriate.
I disclose my material relationships, policies and business practices. My readers will know the difference between editorial, advertorial, and advertising, should I choose to have it. If I do sponsored or paid posts, they are clearly marked.
When collaborating with marketers and PR professionals, I handle myself professionally and abide by basic journalistic standards.
I always present my honest opinions to the best of my ability.
I own my words. Even if I occasionally have to eat them."
Words to live by.
The residents spent most of the day working on our blogs from the proposals that were approved. The target was around 400 to 600 words in length, with one or two stock photos.
IBM is the #1 vendor for Social Business tools, so it makes sense for us to use our own stuff to facilitate the submission process. The residents submit their blog posts to IBM Connections as an activity in the Cloud Social Media Residency community. All of the resources we used, and all the presentations we saw, are all here in the community.
As an incentive, prizes were given out to those who submitted the most posts by end of the day.
We were given certificates for completing the class, and a "Redbooks Thought Leader" emblem to put on our blog.
Ryan Boyles took a group photo! If it seems that the photo is slightly askew, it is to make me look taller. Yes, I could have used GIMP to fix the orientation, but why bother? I look tall! Woo hoo! I will have to remember this technique for future group photos.
ITSO Cloud Social Media Residency, Oct 2013. Photo by Ryan Boyles.
Lastly, I would like to thank Vasfi, Tamikia, Hillary, Caroline, Ric, Jane, LeeAnne, Tina, Karen, Michael, Shelbee, Farzad, Stewart, Arun, Eric, Chris, Hans, Odilon, Mohsin, Wolfgang and the rest of the ITSO team for a wonderful job organizing this week!
The gondolier propels the boat with an oar, and stopped rowing a few times to belt out beautiful Italian songs.
Truly impressed, I asked the gondolier how long was the training for this job. "Six weeks!" he answered. Wow! Where can I learn to sing like that in six weeks?
He clarified. No, the Venetian hotel hires competent singers, and then spends six weeks to teach them to row the gondola. Duh!
I asked Vasfi Gucer, our ITSO project leader for this residency, why there were so many Cloud topics on the agenda for this social media training. He explained it was just as important to emphasize "why" people need to be passionate about Cloud, in addition to the "what" and "how" of blogging.
This reminded me of this quote from fellow author Hugh MacLeod. I highly recommend his series of books.
"Blogging requires passion and authority. Which leaves out most people."
--- Hugh MacLeod.
Vasfi had invited Cloud experts who already have the authority to blog, and the point of this residency is for the residents to become passionate in sharing their expertise.
Here are some of the people that spoke on Cloud:
Ric Telford, IBM VP of Cloud Services
Ric Telford shared with us IBM's point of view of where the Cloud industry is going. He has been in this job position since 2009, and shared with us the history of how the IBM Cloud business has evolved in the past four years.
Jane Munn, IBM VP Business Line Executive for Cloud hardware
As the Center of Competency on Cloud for all 12 IBM Executive Briefing Centers in my group, I had to report to Jane Munn on a frequent basis. I was pretty candid on those calls on what we should change, and I am glad to see that many of my suggestions have been implemented, or being considered for 2014.
Michael Fork, IBM Lead Architect for Hosted Private Cloud
Michael Fork gave two great presentations, one on [IBM SoftLayer] Cloud services, and the second on IBM's support of open standards, such as [OpenStack] and Cloud Foundry.
Hans Zai, IBM Cloud Service Line Leader; and Odilon Magroski Goulart Junior, IBM Technical Solution Architect
All the residents had to present in front of the class on their expertise. Hans and Odilon presented their work on [IBM SmartCloud for SAP Applications]. Hans is from Sweden, and Odilon from Brazil, so their perspectives on this was quite interesting.
When IBM renamed LotusLive to [SmartCloud for Social Business], I thought this would be the naming convention for all of our Software-as-a-Service (SaaS) offerings.
But SmartCloud for SAP Applications is a Platform-as-a-Service, providing the SAP environment as a platform, which allows clients to then deploy their customized SAP applications on this platform.
What did I present on for my "Share your expertise" session? IBM System Storage, of course! Storage is a critical part of Cloud!
So, my gentle readers, what topics do you want me to write about that combines Storage and Cloud? Enter your suggestions in the comments below.
"SmartCloud Enterprise Object Storage is switching from 3rd-party Nirvanix to its internal IBM Softlayer. This one involves more in-depth explanation which I will save for another post."
It's time to make good on that promise! Here is a quick diagram to help visualize the agreement (with sincere apologies to [Jessica Hagy]!) but not to scale, of course!
Last month, Nirvanix announced it was shutting down October 15. Here was the exact wording from their website:
For the past seven years, we have worked to deliver cloud storage solutions. We have concluded that we must begin a wind-down of our business and we need your active participation to achieve the best outcome.
We are dedicating the resources we can to assisting our customers in either returning their data or transitioning their data to alternative providers who provide similar services including IBM SoftLayer, Amazon S3, Google Storage or Microsoft Azure.
We have an agreement with IBM, and a team from IBM is ready to help you. In addition, we have established a higher speed connection with some companies to increase the rate of data transfer from Nirvanix to their servers.
We are working hard to have resources available through October 15 to assist you with the transition process, and have set up a rapid response team that can be reached at (619) 764-5650 [press 2 for customer support during normal business hours] or (888) 791-0365 after business hours, or contact firstname.lastname@example.org.
Please check back to this web page periodically for status updates.
We thank you for your support and patience.
The Nirvanix team
UPDATE ON NIRVANIX
On October 1, 2013, Nirvanix voluntarily sought Chapter 11 bankruptcy protection in order to pursue all alternatives to maximize value for its creditors while continuing its efforts to provide the best possible transition for customers."
In response, IBM put out this press release:
"In light of reports that Nivanix has decided to soon cease operations, IBM is moving quickly to help clients of our Nivanix-based Object Storage offering to move their data to other solutions such as the robust and highly scalable IBM SoftLayer Object Storage or IBM's persistent storage solution."
To understand why this is a big deal, consider the difference between Cloud Computing and Cloud Storage. Cloud Computing is like buying gasoline at your favorite gas station. If the station is closed, you can just drive a few blocks to another gas station. The ease with which customers can switch from one Cloud Compute provider to another is part of the appeal, forcing Cloud Compute providers to be extremely efficient at what they do to offer the lowest price.
Cloud Storage is completely different, more like a safety-deposit box at the bank, or a storage unit to hold all of your boxes of tax receipts. Now if you have a small amount stored away in a safety-deposit box, this is probably just a minor inconvenience. You can take out the contents and store at home, or find another bank and open a new safe deposit account.
However, if you have a lot stored in a storage unit, it may be more difficult.
For example, I am in the process of remodeling my home, so I have moved a lot of my stuff to a 400 cubic-foot storage unit during the process. There were a variety of storage units within miles of my home. Some are fully air-conditioned, some offered 24x7 access, while others are not air-conditioned, or only allowed access during business hours. It has taken me several weekends to box up and move them to the storage unit. My car only holds 12-14 boxes at a time, so many trips were involved.
If the Storage Unit company told me that they were closing down, and that I would have to move all of these boxes to another facility, I would have to hire moving professionals to do all the work. This is in effect what companies need to do with their data. They must take the data off Nirvanix systems, and either store it in-house, or find another cloud storage provider.
IBM offers three options:
IBM [SoftLayer Object Storage] offering which is an OpenStack Swift-based Object Storage solution. IBM's SoftLayer object based storage solution provides a robust, highly scalable solution, with the ability to retrieve and leverage data the way you want to, and grow when you need. You can choose to store your objects in Dallas, Texas (USA), Amsterdam (Europe), and/or Singapore (Asia).
SCE persistent storage solution where you will be able to manage storage resources by attaching an instance during the instance creation process.
An alternate storage solution of your choice. Yes, IBM will help you move your data to Amazon, Google, Microsoft, etc. While technically competitors, IBM also has strategic partnerships in place with each to facilitate the movement.
These options are not just for IBM's SmartCloud Enterprise Object Storage clients. Nirvanix has named IBM the savior for all of its other non-IBM customers as well. Why IBM? Well, IBM is one of the most recognized names in the IT industry. Not just one of the biggest Cloud Service providers, IBM also has an army of professionals in its Global Services division to help.
Well, it's Tuesday again, and you know what that means? Announcements!
Today, IBM's announcements are designed to change the economics of big data analytics, cloud, mobile and social media.
[Software Defined Environments] require [Software Defined Storage], combining storage virtualization with open, extensible, industry-led interfaces. The IBM SmartCloud Virtual Storage Center (VSC) and IBM Storwize Family are the market leaders in storage virtualization. SmartCloud VSC, Storwize Family, and XIV support the industry-led OpenStack interfaces.
Here are some of the announcements today:
IBM Storwize® Family
The [SAN Volume Controller] was first introduced 10 years ago, in 2003. Today, clients enjoy these storage virtualization capabilities across a variety of offerings, known collectively as the [IBM Storwize Family].
IBM adds a new member to the Storwize Family. In addition to SAN Volume Controller, Storwize V7000, Storwize V7000 Unified, Flex System V7000, Storwize V3700, and Storwize V3500, IBM is announcing the [IBM Storwize V5000]. Here's a quick side-by-side comparison:
Scalability: Maximum configuration
Four control enclosures clustered together, 36 expansion enclosures, 960 drives, 64GB cache
Two control enclosures clustered together, 12 expansion enclosures, 336 drives, 32GB cache
One control enclosure, 4 expansion enclosures, 120 drives, 8GB cache upgradeable to 16GB, optional Turbo performance
8Gbps FCP and 1GbE iSCSI standard; optional 10GbE iSCSI/FCoE.
Can upgrade to Storwize V7000 Unified by adding NAS File Modules to add support for CIFS, NFS, HTTPS, SCP and FTP protocols
1GbE iSCSI, 6Gbps SAS, 8Gbps FCP and 10GbE iSCSI/FCoE Standard
1GbE iSCSI, 6Gbps SAS standard; optional 8Gbps FCP and 10GbE iSCSI/FCoE
Storage virtualization/Data Migration
Internal virtualization, Data Migration standard; optional external virtualization
Internal virtualization, Data Migration standard; optional external virtualization
Internal virtualization, Data Migration (external devices can be attached to ingest data only) standard
optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes
optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes
optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes
Sub-LUN Automated Tiering
Easy Tier standard
optional Easy Tier
optional Easy Tier
VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
Storwize V7000, V5000 and V37000 now support larger 800GB SSD drives. Previously, they only support SSD drives up to 400GB.
VMware 5.5 and VASA support. VMware ships every release with built-in support for all members of the IBM Storwize Family, but it bears repeating here just in case you were interested. IBM is a leading reseller of VMware, so it makes sense for IBM's storage devices to support everything that VMware customers could possibly want in terms of VMware integration. IBM SmartCloud VSC, Storwize Family, and XIV Storage System are no exception!
New IP-based replication driving lower costs for replication. Previously, Metro Mirror, Global Mirror and Global Mirror with Change Volumes were FCP-based, and many clients bought extra equipment to run FCP packets over long-distance IP (known as FCIP). Now, clients can replicate across distnace natively without FCIP routers, and use IP-based connections natively.
In my blog posts covering [Edge 2013 - Day 3 Solution Center], I mentioned that IBM has certified Bridgeworks' SANSlide 150SVCV7K unit that provides a Riverbed-like WAN Optimization for long-distance replication. Now, IBM has fully integrated Bridgeworks' SANSlide network optimization technology directly into Storwize Family!
All members of the Storwize Family will support 1GbE remote disk replication, and this will be extended to 10GbE support at a later date.
The [Storwize V3700] is now offered in 48-volt Direct Current (DC) models, [NEBS/ETSI compliance] for Telecommunications companies that require this, and now support 4TB drives.
When we introduced [IBM SmartCloud Storage Access] in February, it was to offer self-service, automated policy-based provisioning for file storage on the SONAS and Storwize V7000 Unified. Today, we add self-service, automated policy-based provisioning for block storage. The first products to be supported are SmartCloud VSC, the entire Storwize Family, and XIV Storage Systems. In addition to the web portal, the Storage Cloud Integration API enables 3rd party ISV applications to support SmartCloud Storage Access.
Storage admins will no longer need to be bothered with tedious provisioning requests, freeing up more time for them to work on more strategic, transformational projects.
[IBM SmartCloud Virtual Storage Center] was introduced last year, combining SAN Volume Controller, Tivoli Storage Productivity Center, Tivoli FlashCopy Mangaer and the Storage Analytics Engine into a single license. The initial offering provided the cross-platform "Tiered Storage Optimization" that provided recommendations for what LUNs should be moved from one disk array to another to manage performance vs. cost. Today, IBM is first to market with an automated version, moving LUNs automatically from one disk array to another.
[SmartCloud Enterprise Object Storage] is switching from 3rd-party Nirvanix to its internal IBM Softlayer. This one involves more in-depth explanation which I will save for another post.
IBM XIV Storage System
As part of the [due diligence] team for IBM to acquire the XIV company back in 2007, I am glad to see how this system has evolved since then. I have certainly [blogged quite a bit on XIV] over the years.
Earlier this year, IBM introduced Hyper-Scale Mobility which allows the storage admin to move LUNs non-disruptively from one XIV frame to another. Today, Hyper-Scale Cross-system Consistency Groups allows you to have snapshots of collections of volumes across multiple XIV frames, up to 3PB of capacity snapped at the same instance of time.
The current supported releases of OpenStack are Folsom and Grizzly, and the newest release is Havana. XIV now offers OpenStack Cinder interfaces at the Havana level.
XIV now offers a RESTful API for monitoring and provisioning. [REST] is a de-facto standard in WEB services and cloud implementations. XIV's RESTful API is a programmatic management interface that follows REST principles:
Resources are identified by global identifiers (URIs)
Data is sent as JSON/XML over HTTP
Manipulations of resources are done by HTTP methods (GET, PUT, POST, DELETE)
The interface is Stateless and Hypertext driven
The interface is universally supported, programming language and platform agnostic. For monitoring, the following GET example could show the list of volumes on a particular XIV storage system:
For provisioning, the following PUT example could create "vol1" on that XIV storage system.
IBM SmartCloud Storage Access to allow self-service provisioning (see the SmartCloud section above).
Data-at-Rest encryption, using Self-Encrypting Drives (SED). XIV will encrypt the data, and IBM's Security Key Lifecycle Manager (SKLM) or Tivoli Key Lifecycle Manager (TKLM). If you have an XIV already, you may already have SED drives ready to use! The XIV will also encrypt the data on the SSD drives used for persistent read-cache.
Other new and enhanced offerings
For our mainframe clients, the Virtualization Engine TS7700 now supports 60 percent more capacity, and can now support 8Gbps FICON attachement.
N series N3000, N6000 and N7000 support new disk drive types and sizes, as well as Data OnTap 8.2 Cluster mode. You can now lash together up to 16 N series together into a SONAS-like single system image.
Cisco MDS 9710 Multilayer Director for IBM® System Networking is a new 16 Gbps SAN director with robust security to support multi-tenancy cloud configurations.
Whew! That is a lot of things to discuss in one post. Since they were all related, I did not want to split it up into parts.
Wrapping up my coverage of the of the [IBM Edge 2013] conference, I have some photos of people I ran into at the Solutions Center.
Leslie Hattig and Lisa Stone, both account managers for [MarkIII Systems], an IBM Business Partner located in Houston, TX. These ladies are inseparable BFFs, I have never seen one without the other! I first met them at the [Storage Symposium in Chicago] back in 2009.
Stacy Tabor was our Community Manager for the [Storage Community]. This community covers IT Storage challenges, hot topics, architecture and solutions. You'll find industry news, videos, blog discussion threads on timely topics, exclusive analyst white papers and experts opinions. I am a frequent contributor, myself, and thank Stacy for her past service. She helped run a "Social Media Hour" at Edge for all the bloggers like me to get to meet each other.
I could not resist getting a picture with this Las Vegas Cirque du Soleil] dancer. This was an invitation-only event, sponsored by IBM Business Partners, that I was invited to during the Social Media Hour. (See, it pays to be social!) I think the visual effects of the flag she was waving turned out really well in the picture! And yes, in case you are wondering, that is my favorite grape-flavored beverage (GFB) in my left hand. Posing for this picture was quite the balancing act, but then I am also a certified yoga instructor, so I was able to manage!
Tanaz Sowdagar is an IBM Storage Rep for our Business Development Team. This includes finding other companies to OEM our technology and re-brand it under their own names. I have worked with Tanaz for many years, helping answer questions that potential OEM parnters have about our products and technologies for this purpose.
This was Michelle, my Conference Room Monitor. Each room had one, scanning the bar-codes on each badge for all the attendees, keeping count of the number of people for each session, supporting anything the speaker needs, like getting the A/V guy to come help set up the laptop projector.
Since this was Friday, last day of the conference. I decide to dress casually, consistent with many company's [Casual Friday] dress code policy. I am wearing the "IBM Edge Rocks" tee-shirt given out at the concert and Solutions center the first few nights.
Getting this shot right took several takes, as the man I handed my camera to had apparently never seen a digital camera before, did not know how to focus, and some
Finally, leaving Las Vegas, I sat next to Mrs. Joey Clark, wife of "Bulldog" Clark of the Utah band [Blammity Blam]. She also sometimes plays violin with the band. She is a newly-wed, and not sure if Joey is her name, or her husband's name. (Joey, if you are out there, and want me to correctly identify you, please write a comment in the section below.)
What I have learned however, is that if a beautiful girl is sitting next to me on the plane, she will either talk to me the entire flight, implying that she is single, or mention within the first 30 seconds of conversation that she is married. Sadly for me, it was the latter.
(We were both flying on to Dallas, TX, whereupon she was going to visit her parents in Florida, and I was on my way to Sao Paulo, Brazil to get stuck there amongst the protesters in what is now called the [V for Vinegar movement], but I will save that for another blog post!)
Well, that wraps up my coverage of Edge 2013. I am sorry it took so many months to cover all the material, but I did not want to have it go uncovered much longer.
Next year's [Edge 2014] is expected to be bigger and better. It will in Las Vegas again, but this time at the Venetian Hotel, May 19-23, 2014. I plan to be there!
Monday marked the first official day of [IBM Edge 2013] conference. This is actually three conferences in one: Executive Edge for the high-level executives, Winning Edge for the Business Partners, and Technical Edge for storage administrators and IT manager/directors. I attended the latter.
The General Session was kicked off by an awesome drumbeat-heavy song performed by a band from North Carolina called [Delta Rae]. Their use of drums reminded me of Adam Ant.
Deon Newman, IBM VP of Marketing, Systems and Technology Group, North America, served as today's master of ceremonies. He was pleased to announce there were more then 4,700 attendees at this event -- representing more than 60 countries -- a huge increase over the attendance we had last year. Here are my notes of the opening General Session:
Stephen Leonard, IBM General Manager, Sales, Systems & Technology Group
Consumers expect an always-on technology experience. We, as consumers, are leaving a trail of data that is getting wider and wider every day. Data is the new "natural resource", but plentiful and never ending.
In 1996, about 29 percent of IT spend was for adminstration and management, today it has grown to 68 percent. Some 34 percent of IT projects deploy late.
Stephen emphasized the themes of Smarter Computing: (a) systems that are designed for the data, (b) software-defined environments, that are (c) open and collaborative.
Stephen cited a customer example from [Jaguar Land Rover], a manufacturer of sporty automobiles and rugged 4x4 vehicles. IBM developed a ["Virtual Dealership"] for them. Rather that trying to maintain additional physical bricks-and-mortar facilities, which can be expensive to staff and fill with vehicles across their wide portfolio, the virtual dealership allows prospective customers to try out vehicles through simulation. This virtual dealership could be taken to where prospective clients are, such as a sporting event or shopping mall.
Ed Walsh, IBM VP of Marketing, System Storage and Networking
Ed presented the "data economics" of all-Flash arrays. IBM recently acquired Texas Memory Systems, and renamed the RamSan products to IBM FlashSystem, and committed to invest an additional $1 Billion US dollars in flash technologies.
On a $-per-IOPS basis, IBM FlashSystems can be 30 percent lower total-cost-of-ownership TCO than disk-based alternatives. The cost of Flash is offset by 17 percent fewer servers from having higher CPU utilization rates, resuling in 38 percent lower software license fees. Flash is also more efficient, with 74 percent lower in environmental costs, and 35 percent lower operational support costs. For many situations, Flash is the solution for poorly written software applications.
Ed also mentioned IBM's strong support for open source and open standards. Over the past 15 years, IBM as been a major contributor for open source efforts like Linux, Eclipse and Apache. IBM continues that tradition, with contributions to OpenStack and Hadoop.
Without going into any details, Ed also hinted that IBM announced 65 new or refreshed products in Storage, Networking and PureSystems. The details of each announcement would be explained during the break-out sessions during the week.
Charles Long, Founder and CEO of Centerline Digital
[Centerline Digital] does computer-generated animations in support of corporate marketing efforts.
(FTC disclosure: I work for IBM, and have worked closely with Centerline Digital marketing agency when I was the chief marketing strategist for System Storage back in 2006-2007. I was not paid or provided any products or services to mention any of the clients mentioned in this post.)
Charles indicates that internet technologies have converted "Analog dollars to digital pennies". Using IBM PureFlex with Storwize V7000 storage, real-time compression, and Tivoli Endpoint Manager, Centerline was able to drastically improve their business. He feels the old joke of "Better, Faster, Cheaper - Choose Any Two!" no longer applies with IBM solutions!
Ambuj Goyal, IBM General Manager, System Storage and Networking
Formerly my fifth-line manager in charge of Software and Systems, Ambuj switched to be the General Manager of System Storage and Networking group earlier this year.
In his former roles, Ambuj managed software and hardware product lines, but he feels storage is a completely different animal. In the past, clients focused on choosing the best servers, then chose their storage as an afterthought. Today, Ambuj feels that processors are now a commodity, and that storage is becoming the forethought.
Ambuj also highlighted the evolution of IBM's Software-Defined Environment:
In 2003, IBM introduced its the SAN Volume Controller, a storage hypervisor. Now, over 10,000 clients enjoy the benefits of a Software-Defined Environment using SAN Volume Controller.
SmartCloud Virtual Storage Center represents the "third generation" for policy-driven management, combining SAN Volume Controller, Tivoli Storage Productivity Center, FlashCopy Manager and the Storage Analytics Engine.
IBM is trying to help people keep their business critical apps running securely, to be able to start quickly, add value and functions at scale, and to leverage all of this data-intensive solutions to help drive new business and gain customer insight.
Joseph Balsamo, VP of Platform Engineering at Prudential Insurance
While the IT department of [Prudential Insurance] is focused on the three V's -- Volume, Velocity and Variety -- Joe is more focused on solutions, status and cost. His mission was to strengthen the role of IT as a partner through business aligned services. Prudential has deployed XIV, N series, SAN Volume Controller (SVC) and Storwize V7000 disk systems, with the following results:
Reduced their $-per-IOPS by 75 percent
No additional storage administrators
85 percent utilization through thick-to-thin migrations
Reduced their $-per-MB by 50 percent
Reduced their 72-hour RPO to 15 seconds
These benefits were achieved over the past 24 months of deployment.
Paulo Carvao, IBM Vice President, North America Systems & Technology Group
Paulo is Deon Newman's boss. He presented BlueInsight, IBM's internal "Business Analytics" cloud accessible by over 200,000 users, with over 1 PB of content.
Inside IBM, the deployment of a Smarter Infrastructure has allowed for 25 percent capacity growth at flat IT budget, with 30,000 fewer Megawatts and 103,000 square feet.
Why is this significant? Today's disk writes each bit of information across 1200 atoms, and the smallest number of atoms that can retain information is 12 bits, so sometime in the next 7 to 10 years, the improvements in magnetic bit density for disk will stop.
For silicon chips, the smallest practical feature is 7 nanometers, about 35 atoms wide. We are quickly approaching that limit also.
I can already tell that it's going to be a busy week! Follow me on twitter (@az990tony) and tag your posts and tweets with #IBMedge hashtag.
Continuing this week's theme about the future, fellow blogger, published author, and futurist David Houle is coming out with a new book this month titled [Entering the Shift Age]. This is a follow-on to his book, [The Shift Age].
Since this book cites IBM studies explicitly, his PR department asked me to review it. If you are an aspiring author that has a book you want me review, and it relates to the topics my blog covers like Cloud, Big Data, storage, and the explosion of information, feel free to send me a copy!
(FTC Disclosure: I work for IBM. I was not paid by anyone to mention this book on my blog. I was provided an "Uncorrected Advanced Copy" of this book at no cost to me for this review. I do not know David Houle personally, have not read any of his prior works, nor have I ever seen him speak at public events. This post is neither a paid nor celebrity endorsement of this author, his book, nor any other books by this author.)
First, let's get a few details out of the way:
Title:Entering the Shift Age, 284 pages Author: David Houle, futurist Genre: Non-fiction, trends and predictions
Publisher: Sourcebooks, Inc. Publish date: January 2013
As I mentioned in my post [Historians vs. Futurists], there is only one past, but there are many potential futures. There seems to be as many futurists out there as there are potential futures. I suspect not everyone will agree with all that David has written. However, this reminds me of one of my favorite quotes:
"When two futurists always agree, one is no longer necessary." -- old Italian adage
In his book, David asks a series of thought-provoking questions, then answers them with his views and opinions on how the future will roll out:
Is humanity now entering a new age that is different than the Information age?
If so, what should we call it?
Which forces are driving this new age?
How will this impact various aspects and institutions of society?
David feels humanity is indeed entering a new age, which he calls the Shift Age. This is driven by three forces: the shift to globalization of culture and politics, the flow of power and influence to individuals, and the acceleration of electronic connectedness.
In a sense, David is like a hunter-gatherer from the Stone age, hunting down trends and gathering ideas from others. In much the same way my compost brings renewed purpose to the rinds and pits of my fruits and vegetables, David's book does a good job paraphrasing the works of many of today's leading futurists.
David predicts the decade we are now in, the 2010's, will mark the end of the Information age, a transition period to this new era, that will lead to transformations in government, education, health, technology, and energy.
Over the past two weeks, I had time to enjoy a variety of movies. I had seen several whose stories wrapped around key moments of transition.
"Gone with the Wind", as well as the new offering "Lincoln" from Steven Spielberg. Both are set in the 1860's, the time of the [American Civil War], pitting the Industrial-age forces of the North, against the Agricultural-age economy of the South. This time saw the transition from slavery to freedom.
"Doctor Zhivago", set in the time of World War I, on the German-Russian front, as well as the Russian Revolution of 1917, and the resulting Civil War between the Red Guard and the White Army. This saw the transition from a Russian government ruled by Czars, to one ruled by the people through Communism.
"Lawrence of Arabia", also set in the time of World War I, but south in Arabia. T. E. Lawrence was able to bring several warring Arab tribes together to defeat the Turks, and was a key figure in the transition to an Arab National Council.
Some might call these completely unexpected [Black Swan] events, while others might feel they are merely fortunate (or misfortunate) sequences of events that led to inevitable social change. Has something happened, or will something happen later this decade, that will drive us to leave the Information Age?
David's previous book, The Shift Age, was published back in 2007, and a lot has happened in the past six years: a global financial melt-down recession; the Arab Spring uprisings in the Middle East; Barack Obama was elected and re-elected; man-made climate change in the form of hurricanes, tsunamis and superstorms hit various parts of the world; brush fires lit up Australia, and BP's Deepwater Horizon oil rig exploded off the Gulf coast, just to name a few.
David's new book reflects the impact of these recent events, from discussions on his [Evolutionshift] blog, to Q&A sessions he has after his public speaking presentations. For those who are not interested in the wide array of topics he covers in this one book, David also offers [a dozen different mini-eBooks] that cover specific topics like [Technology, Energy and Health].
My Rating: Moist and Flaky
Who should read this book: If you are a time-traveler from 1975 that came to this decade to learn all about what your future has in store, but can only select one book to read before you zoom back to your own time period, this would be the book I recommend.
I do not want to imply this is a quick read, or one that you can't put down once you start reading it. Just like you should not gulp down a full bottle of cheap Vodka in one sitting, this book should be read over a series of days, as I did, so that you can mull over in your mind the different points and thoughts he is trying to convey.
If you store your VMware bits on external SAN or NAS-based disk storage systems, this post is for you. The subject of the post, VM Volumes, is a potential storage management game changer!
Fellow blogger Stephen Foskett mentioned VM Volumes in his [Introducing VMware vSphere Storage Features] presentation at IBM Edge 2012 conference. His session on VMware's storage features included VMware APIs for Array Integration (VAAI), VMware Array Storage Awareness (VASA), vCenter plug-ins, and a new concept he called "vVol", now more formally known as VM Volumes. This post provides a follow-up to this, describing the VM Volumes concepts, architecture, and value proposition.
"VM Volumes" is a future architecture that VMware is developing in collaboration with IBM and other major storage system vendors. So far, very little information about VM Volumes has been released. At VMworld 2012 Barcelona, VMware highlights VM Volumes for the first time and IBM demonstrates VM Volumes with the IBM XIV Storage System (more about this demo below). VM Volumes is worth your attention -- when it becomes generally available, everyone using storage arrays will have to reconsider their storage management practices in a VMware environment -- no exaggeration!
But enough drama. What is this all about?
(Note: for the sake of clarity, this post refers to block storage only. However, the VM Volumes feature applies to NAS systems as well. Special thanks to Yossi Siles and the XIV development team for their help on this post!)
The VM Volumes concept is simple: VM disks are mapped directly to special volumes on a storage array system, as opposed to storing VMDK files on a vSphere datastore.
The following images illustrate the differences between the two storage management paradigms.
You may still be asking yourself: bottom line, how will I benefit from VM Volumes?
Well, take a VM snapshot for example. With VM Volumes, vSphere can simply offload the operation by invoking a hardware snapshot of the hardware volume. This has significant implications:
VM-Granularity: Only the right VMs are copied (with datastores, backing up or cloning individual-VM portions of hardware snapshot of a datastore would require more complex configuration, tools and work)
Hardware Offload: No ESXi server resources are consumed
XIV advantage: With XIV, snapshots consume no space upfront and are completed instantly.
Here's the first takeaway: With VM Volumes, advanced storage services (which cost a lot when you buy a storage array), will become available at an individual VM level. In a cloud world, this means that applications can be provisioned easily with advanced storage services, such as snapshots and mirroring.
Now, let's take a closer look at another relevant scenario where VM Volumes will make a lot of difference - provisioning an application with special mirroring requirements:
VM Volumes case: The application is ordered via the private cloud portal. The requestor checks a box requesting an asynchronous mirror. He changes the default RPO for his needs. When the request is submitted, the process wraps up automatically: Volumes are created on one of the storage arrays, configured with a mirror and RPO exactly as specified. A few minutes later, the requestor receives an automatic mail pointing to the application virtual machine.
Datastores case #1: As may be expected, a datastore that is mirrored with the special RPO does not exist. As a result, the automated workflow sets a pending status on the request, creates an urgent ticket to a VMware administrator and aborts. When the VMware admin handles that ticket, she re-assigns the ticket to the storage administrator, asking for a new volume which is mirrored with the special RPO, and mapped to the right ESXi cluster. The next day, the volume is created; the ticket is re-assigned to the storage admin, with the new LUN being pointed to. The VMware administrator follows and creates the datastore on top of it. Since the automated workflow was aborted, the admin re-assigns the ticket to the cloud administrator, who sometime later completes the application provisioning manually.
Datastores case #2: Luckily for the requestor, a datastore that is mirrored with the special RPO does exist. However, that particular datastore is consuming space from a high performance XIV Gen3 system with SSD caching, while the application does not require that level of performance, so the workflow requires a storage administrator approval. The approval is given to save time, but the storage administrator opens a ticket for himself to create a new volume on another array, as well as a follow-up ticket for the VMware admin to create a new datastore using the new volume and migrate the application to the other datastore. In this case, provisioning was relatively rapid, but required manual follow up, involving the two administrators.
Here's the second takeaway: With VM Volumes, management is simplified, and end-to-end automation is much more applicable. The reason is that there are no datastores. Datastores physically group VMs that may otherwise be totally unrelated, and require close coordination between storage and VMware administrators.
Now, the above mainly focuses on the VMware or cloud administrator perspective. How does VM Volumes impact storage management?
VM's are the new hosts: Today, storage administrators have visibility of physical hosts in their management environment. In a non-virtualized environment, this visibility is very helpful. The storage administrator knows exactly which applications in a data center are storage-provisioned or affected by storage management operations because the applications are running on well-known hosts. However, in virtualized environments the association of an application to a physical host is temporary. To keep at least the same level of visibility as in physical environments, VMs should become part of the storage management environment, like hosts. Hosts are still interesting, for example to manage physical storage mapping, but without VM visibility, storage administrators will know less about their operation than they are used to, or need to. VM Volumes enables such visibility, because volumes are provided to individual VMs. The XIV VM Volumes demonstration at VMworld Barcelona, although experimental, shows a view of VM volumes, in XIV's management GUI.
Here's a screenshot:
That's not all!
Storage Profiles and Storage Containers: A Storage Profile is a vSphere specification of a set of storage services. A storage profile can include properties like thin or thick provisioning, mirroring definition, snapshot policy, minimum IOPS, etc.
Storage administrators define a portfolio of supported storage services, maintained as a set of storage profiles, and published (via VASA integration) to vSphere.
VMware or cloud administrators define the required storage profiles for specific applications
VMware and storage administrators need to coordinate the typical storage requirements and the automatically-available storage services. When a request to provision an application is made, the associated storage profiles are matched against the published set of available storage profiles. The matching published profiles will be used to create volumes, which will be bound to the application VMs. All that will happen automatically.
Note that when a VM is created today, a datastore must be specified. With VM Volumes, a new management entity called Storage Container (also known as Capacity Pool) replaces the use of datastore as a management object. Each Storage Container exposes a subset of the available storage profiles, as appropriate. The storage container also has a capacity quota.
Here are some more takeaways:
New way to interface vSphere and storage management: Storage administrators structure and publish storage services to vSphere via storage profiles and storage containers.
Automated provisioning, out of the box: The provisioning process automatically matches application-required storage profiles against storage profiles available from the specified storage containers. There is no need to build custom scripts and custom processes to automate storage provisioning to applications
The XIV advantage:
XIV services are very simple to define and publish. The typical number of available storage profiles would be low. It would also be easy to define application storage profiles.
XIV provides consistent high performance, up to very high capacity utilization levels, without any maintenance. As a result, automated provisioning (which inherently implies less human attention) will not create an elevated risk of reduced performance.
Note: A storage vendor VASA provider is required to support VM Volumes, storage profiles, storage containers and automated provisioning. The IBM Storage VASA provider runs as a standalone service that needs to be deployed on a server.
To summarize the VM Volumes value proposition:
Streamline cloud operation by providing storage services at VM and application level, enabling end-to-end provisioning automation, and unifying VMware and storage administration around volumes and VMs.
Increase storage array ROI, improve vSphere scalability and response time, and reduce cloud provisioning lag, by offloading VM-level provisioning, failover, backup, storage migration, storage space recycling, monitoring, and more, to the storage array, using advanced storage operations such as mirroring and snapshots.
Simplify the adoption of VM Volumes using XIV, with smaller and simpler sets of storage profiles. Apply XIV's supreme fast cloning to individual VMs, and keep automation risks at bay with XIV's consistent high performance.
Until you can get your hands on a VM Volumes-capable environment, the VMware and IBM developer groups will be collaborating and working hard to realize this game-changing feature. The above information is definitely expected to trigger your questions or comments, and our development teams are eager to learn from them and respond. Enter your comments below, and I will try to answer them, and help shape the next post on this subject. There's much more to be told.
This month, I am pleased to announce the new [IBM STG Executive Briefing Center] website, representing a huge improvement over the previous website we had been using over the past two years. STG refers to IBM's Systems and Technology Group, the division that focuses on servers, storage, switches and the system software that makes them run. This new website is for the dozen STG EBCs that span the globe. The new website reminds me of this famous quote:
"Perfection is achieved, not when there is nothing left to add, but when there is nothing left to take away"
-- Antoine de Saint-Exupery
Let's take a quick look at what makes it so much better.
The previous website required registration. At every briefing, those of us who work in the EBCs had to pass around a sign-up sheet for email addresses from each attendee so that we could send them an invitation to register for the site. We would have a hard time reading people's handwriting, resulting in some emails coming back rejected.
Inspired by self-service gas stations, automated teller machines, and the many self-service portals of Cloud Computing, the new website has everything up-front, without registration. IBM Business Partners and sales representatives can easily request a briefing at any of the dozen briefing centers represented!
IBM-managed and IBM-hosted
We had a difficult time explaining to our attendees why our previous website was hosted on a lone machine and maintained by a third party. Think about it, IBM manages the data centers of over 400 clients. IBM has provided web hosting to the most mission critical workloads, with high levels of availability and reliability, and is recognized as one of the "Big 5" Cloud companies. I have done web design myself in my career, and we were terribly disappointed with the third party chosen to create and maintain our previous website, constantly having to point out errors in their HTML and CSS.
For the new website, IBM took back control. Staff from each EBC, myself included, came up with a simple page to bring the essence of each location to life. Special thanks to my colleage Hal Jennings, from the Austin EBC, for bringing this altogether!
Despite two years of manually registering attendees to use the previous website, Google Analytics showed that few people visited, and the few that did spent little time exploring the vast repository of content.
The new website is vastly simpler. The front page points to all twelve EBCs, and a single mouse click gets you to the location you are interested in, with all the details you need to make a decision to book a briefing, and the contact information to make it happen.
Elimination of Wasted and Duplicate Effort
In the previous website, we spent as much as 15 hours just to create, voice over, edit and produce a single 15-minute recorded presentation. Less than six percent of the previous website visitors watched more than five minutes of these videos, making us feel that most of our effort was wasted.
The EBC staff kept wasting their time, month after month, thanks to all-stick, no-carrot tactics that mandated minimums for contributions for more and more content that nobody was ever looking at. Even more disappointing was that much of our work duplicated the formal responsibilities of our IBM Marketing team. They weren't happy about this either, causing confusion between the roles of our two teams.
Finally, we said enough was enough! The new STG EBC website is a marvel in minimalism. If you want to see presentations, videos, expert profiles, or partake in on-going conversations, I welcome you to visit the [IBM Expert Network], the [IBM Storage YouTube Channel], and the [Storage Community] where they belong.
Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) has decided to customize it for their own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extentions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) stick to the SQL standard, and expect the reader to customize the examples to their particular DBMS; or (b) stick to a single RDBMS implemenation, and offer examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various DBMS are different than standard SQL syntax, with actual working examples for different RDBMS.
You might be thinking to yourself that only an idiot would work in a place that had to require knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via the [Shadow IT].
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2-C Express] that runs on Windows, Linux, Mac OS, and Solaris.)
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question stuck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza was not in the IBM System Storage product line, it is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
This week, I am in beautiful Sao Paulo, Brazil, teaching Top Gun class to IBM Business Partners and sales reps. Traditionally, we have "Tape Thursday" where we focus on our tape systems, from tape drives, to physical and virtual tape libraries. IBM is the number #1 tape vendor, and has been for the past eight years.
(The alliteration doesn't translate well here in Brazil. The Portuguese word for tape is "fita", and Thursday here is "quinta-feira", but "fita-quinta-feira" just doesn't have the same ring to it.)
In the class, we discussed how to handle common misperceptions and myths about tape. Here are a few examples:
Myth 1: Tape processing is manually intensive
In my July 2007 blog post [Times a Million], I coined the phrase "Laptop Mentality" to describe the problem most people have dealing with data center decisions. Many folks extend linearly their experiences using their PCs, workstations or laptops to apply to the data center, unable to comprehend large numbers or solutions that take advantage of the economies of scale.
For many, the only experience dealing with tape was manual. In the 1980s, we made "mix tapes" on little cassettes, and in the 1990s we recorded our favorite television shows on VHS tapes in the VCR. Today, we have playlists on flash or disk-based music players, and record TV shows on disk-based video recorders like Tivo. The conclusion is that tapes are manual, and disk are not.
Manual processing of tapes ended in 1987, with the introduction of a silo-like tape library from StorageTek. IBM quickly responded with its own IBM 3495 Tape Library Data Server in 1992. Today, clients have many tape automation choices, from the smallest IBM TS2900 Tape Autoloader that has one drive and nine cartridges, all the way to the largest IBM TS3500 multiple-library shuttle complex that can hold exabytes of data. These tape automation systems eliminate most of the manual handling of cartridges in day-to-day operations.
Myth 2: Tape media is less reliable than disk media
For any storage media to be unreliable is to return the wrong information that is different than what was originally stored. There are only two ways for this to happen: if you write a "zero" but read back a "one", or write a "one" and read a "zero". This is called a bit error. Every storage media has a "bit error rate" that is the average likelihood for some large amount of data written.
According to the latest [LTO Bit Error rates, 2012 March], today's tape expects only 1 bit error per 10E17 bits written (about 100 Petabytes). This is 10 times more reliable than Enterprise SAS disk (1 bit per 10E16), and 100 times more reliable than Enterprise-class SATA disk (1 bit per 10E15).
Tape is the media used in "black boxes" for airplanes. When an airplane crashes, the black box is retrieved and used to investigate the causes of the crash. In 1986, the Space Shuttle Challenger exploded 73 seconds after take-off. The tapes in the black box sat on the ocean floor for six weeks before being recovered. Amazingly, IBM was able to successfully restore [90 percent of the block data, and 100 percent of voice data].
Analysts are quite upset when they are quoted out of context, but in this case, Gartner never said anything closely similar to this. Nor did the other analysts that Curtis investigated for similar claims. What Garnter did say was that disk provides an attractive alternative storage media for backup which can increase the performance of the recovery process.
Back in the 1990s, Savur Rao and I developed a patent to help backup DB2 for z/OS by using the FlashCopy feature of IBM's high-end disk system. The software method to coordinate the FlashCopy snapshots with the database application and maintain multiple versions was implemented in the DFSMShsm component of DFSMS. A few years later, this was part of a set of patents IBM cross-licensed to Microsoft for them to implement a similar software for Windows called Data Protection Manager (DPM). IBM has since introduced its own version for distributed systems called IBM Tivoli FlashCopy Manager that runs not just on Windows, but also AIX, Linux, HP-UX and Solaris operating systems.
Curtis suspects the "71 percent" citation may have been propogated by an ambitious product manager of Microsoft's Data Protection Manager, back in 2006, perhaps to help drive up business to their new disk-based backup product. Certainly, Microsoft was not the only vendor to disparage tape in this manner.
A few years ago, an [EMC failure brought down the State of Virginia] due to not just a component failure it its production disk system, but then made it worse by failing to recover from the disk-based remote mirror copy. Fortunately, the data was able to be restored from tape over the next four days. If you wonder why nobody at EMC says "Tape is Dead" anymore, perhaps it is because tape saved their butts that week.
(FTC Disclosure: I work for IBM and this post can be considered a paid, celebrity endorsement for all of the IBM tape and software products mentioned on this post. I own shares of stock in both IBM and Google, and use Google's Gmail for my personal email, as well as many other Google services. While IBM, Google and Microsoft can be considered competitors to each other in some areas, IBM has working relationships with both companies on various projects. References in this post to other companies like EMC are merely to provide illustrative examples only, based on publicly available information. IBM is part of the Linear Tape Open (LTO) consortium.)
Myth 4: Vendors and Manufacturers are no longer investing in tape technology
IBM and others are still investing Research and Development (R&D) dollars to improve tape technology. What people don't realize is that much of the R&D spent on magnetic media can be applied across both disk and tape, such as IBM's development of the Giant Magnetoresistance read/write head, or [GMR] for short.
Most recently, IBM made another major advancement with tape with the introduction of the Linear Tape File Systems (LTFS). This allows greater portability to share data between users, and between companies, but treating tape cartridges much like USB memory sticks or pen drives. You can read more in my post [IBM and Fox win an Emmy for LTFS technology]!
Next month, IBM celebrates the 60th anniversary for tape. It is good to see that tape continues to be a vibrant part of the IT industry, and to IBM's storage business!
Well, it's Tuesday again, and you know what that means!
This Thursday is the Thanksgiving holiday here in the United States, so instead of announcing IBM products, I wanted to announce the general availability of my latest book, [Inside System Storage: Volume III].
This book includes blog posts from May 2008 to March 2009, along with the ever popular behind-the-scenes commentary on what was going on during IBM's launch of the Information Infrastructure initiative.
Do you know someone who celebrates Chanukah, Christmas, Kwanza, or the Winter Solstice, and have a hard time finding the right gift?
Do you know a client or IBM Business Partner that would appreciate a nominally-priced gift to thank them for their business?
Do you know someone newly hired into IBM or another IT company that could benefit from behind-the-scenes insight and commentary?
As with the other two volumes, Inside System Storage: Volume III is available in your choice of paperback, hardcover, and eBook (Adobe PDF) format.
In the spirit of Thanksgiving, I would like to thank my editor, Susan Pollard, who put in the extra effort, working evenings and weekends, to get this book done in time for the upcoming holiday season. For those outside the United States, there is an American tradition to shop in brick-and-mortar stores on Black Friday (the day after Thanksgiving) and to shop on-line for books like mine on Cyber Monday (the Monday after Thanksgiving).
I would also like to thank my publisher, Lulu.com, for upgrading me to "Spotlight" level, so now I have a spotlight page titled [Books Written by Tony Pearson], making it easy for you to order any of my books in various formats.
And last, but not least, I would like to thank all my friends and family that were supportive these past few difficult months while I was putting this book together.
Next month, I will be in Las Vegas, Dec 4-8, speaking at Gartner's [Data Center Conference]. If you order a book today, and bring it with you to the IBM booth at the Solution Expo, I can sign it for you!
This week, IBM made over a dozen announcements related to IBM storage products. Here is part 2 of my overview:
IBM System Storage® DS8000 series microcode
One of the advantages of acquiring XIV as IBM's other high-end disk system, is that it allows the DS8000 team to focus on the IBM i and z/OS operating systems. As a result, IBM DS8000 has over half the mainframe-attach market share.
For both the DS8700 and DS8800 models, IBM Easy Tier now support sub-LUN automated tiering across three storage tiers: Solid-State Drives, high-performance spinning disk drives (15K and 10K RPM), and high-capacity disk drives (7200 RPM).
For System z customers, the latest DS8000 microcode has synergy with z/OS and GDPS, now supporting 4x larger EAV volumes, faster high-performance FICON (zHPF), and Workload Manager (WLM) integration with the I/O Priority Manager. IBM has a world record SAP performance of 59 million account postings per hour. DB2 v10 for z/OS queries were measured at 11x faster using the new zHPF feature.
IBM System Storage® DS8800 systems
On the hardware side, the DS8800 now supports a fourth frame to hold a total over 1,500 disk drives. Yes, we have customers that three frames wasn't enough, and they wanted more.
IBM is now also offering new drive options. Small Form Factor (2.5 inch) drives now include 300GB 15K RPM drives, and a 900GB 10K RPM drives. But wait! There's more! The DS8800 is no longer a SFF-only box, it now allows for mixing in Large form factor (3.5 inch) drives, starting with the 3TB NL-SAS 7200 RPM drive.
IBM XIV® Storage System Gen3
We announced the XIV Gen3 already, but we have two enhancements.
First, we now offer a model based entirely on 3TB NL-SAS drives. If you are thinking, what IBM is going to put 3TB drives into everything? Yup. Once we go through all the pain and suffering of qualifying a drive, we make sure we get our money's worth!
Secondly, we have now an iPad application to manage the XIV. This has nothing to do with Apple CEO Steve Jobs passing away last week, it was merely coincidence.
IBM Real-time Compression Appliances™ STN6500 and STN6800 V3.8
The latest software for RtCA now supports Microsoft SMB v2, and enhanced reporting so that storage admins know exactly the benefits of the compression ratios of different file extensions.
IBM System Storage EXP2500 Express®
The EXP2500 is for direct-attach situations, like the IBM BladeCenter. IBM adds LFF 3.5-inch 3TB NL-SAS drives, SFF 2.5-inch 300GB 15K RPM SAS drives, and 900GB 7200 RPM NL-SAS drives.
My colleague Curtis Neal refers to these as "B.F.D" announcements, which of course stands for Bigger, Faster, Denser!
Last week, fellow IBMer Ron Riffe started his three-part series on the Storage Hypervisor. I discussed Part I already in my previous post [Storage Hypervisor Integration with VMware]. We wrapped up the week with a Live Chat with over 30 IT managers, industry analysts, independent bloggers, and IBM storage experts.
"The idea of shopping from a catalog isn’t new and the cost efficiency it offers to the supplier isn’t new either. Public storage cloud service providers seized on the catalog idea quickly as both a means of providing a clear description of available services to their clients, and of controlling costs. Here’s the idea… I can go to a public cloud storage provider like Amazon S3, Nirvanix, Google Storage for Developers, or any of a host of other providers, give them my credit card, and get some storage capacity. Now, the “kind” of storage capacity I get depends on the service level I choose from their catalog.
Most of today’s private IT environments represent the complete other end of the pendulum swing – total customization. Every application owner, every business unit, every department wants to have complete flexibility to customize their storage services in any way they want. This expectation is one of the reasons so many private IT environments have such a heavy mix of tier-1 storage. Since there is no structure around the kind of requests that are coming in, the only way to be prepared is to have a disk array that could service anything that shows up. Not very efficient… There has to be a middle ground.
Private storage clouds are a little different. Administrators we talk to aren’t generally ready to let all their application owners and departments have the freedom to provision new storage on their own without any control. In most cases, new capacity requests still need to stop off at the IT administration group. But once the request gets there, life for the IT administrator is sweet!
Here comes the request from an application owner for 500GB of new “Database” capacity (one of the options available in the storage service catalog) to be attached to some server. After appropriate approvals, the administrator can simply enter the three important pieces of information (type of storage = “Database”, quantity = 500GB, name of the system authorized to access the storage) and click the “Go” button (in TPC SE it’s actually a “Run now” button) to automatically provision and attach the storage. No more complicated checklists or time consuming manual procedures.
A storage hypervisor increases the utilization of storage resources, and optimizes what is most scarce in your environment. For Linux, UNIX and Windows servers, you typically see utilization rates of 20 to 35 percent, and this can be raised to 55 to 80 percent with a storage hypervisor. But what is most scarce in your environment? Time! In a competitive world, it is not big animals eating smaller ones as much as fast ones eating the slow.
Want faster time-to-market? A storage hypervisor can help reduce the time it takes to provision storage, from weeks down to minutes. If your business needs to react quickly to changes in the marketplace, you certainly don't want your IT infrastructure to slow you down like a boat anchor.
Want more time with your friends and family? A storage hypervisor can migrate the data non-disruptively, during the week, during the day, during normal operating hours, instead of scheduling down-time on an evenings and weekends. As companies adopt a 24-by-7 approach to operations, there are fewer and fewer opportunities in the year for scheduled outages. Some companies get stuck paying maintenance after their warranty expires, because they were not able to move the data off in time.
Want to take advantage of the new Solid-State Drives? Most admins don't have time to figure out what applications, workloads or indexes would best benefit from this new technology? Let your storage hypervisor automated tiering do this for you! In fact, a storage hypervisor can gather enough performance and usage statistics to determine the characteristics of your workload in advance, so that you can predict whether solid-state drives are right for you, and how much benefit you would get from them.
Want more time spent on strategic projects? A storage hypervisor allows any server to connect to any storage. This eliminates the time wasted to determine when and how, and let's you focus on the what and why of your more strategic transformational projects.
If this sounds all too familiar, it is similar to the benefits that one gets from a server hypervisor -- better utilization of CPU resources, optimizing the management and administration time, with the agility and flexibility to deploy new technologies in and decommission older ones out.
"Server virtualization is a fairly easy concept to understand: Add a layer of software that allows processing capability to work across multiple operating environments. It drives both efficiency and performance because it puts to good use resources that would otherwise sit idle.
Storage virtualization is a different animal. It doesn't free up capacity that you didn't know you had. Rather, it allows existing storage resources to be combined and reconfigured to more closely match shifting data requirements. It's a subtle distinction, but one that makes a lot of difference between what many enterprises expect to gain from the technology and what it actually delivers."
Jon Toigo on his DrunkenData blog brings back the sanity with his post [Once More Into the Fray]. Here is an excerpt:
"What enables me to turn off certain value-add functionality is that it is smarter and more efficient to do these functions at a storage hypervisor layer, where services can be deployed and made available to all disk, not to just one stand bearing a vendor’s three letter acronym on its bezel. Doesn’t that make sense?
I think of an abstraction layer. We abstract away software components from commodity hardware components so that we can be more flexible in the delivery of services provided by software rather than isolating their functionality on specific hardware boxes. The latter creates islands of functionality, increasing the number of widgets that must be managed and requiring the constant inflation of the labor force required to manage an ever expanding kit. This is true for servers, for networks and for storage.
Can we please get past the BS discussion of what qualifies as a hypervisor in some guy’s opinion and instead focus on how we are going to deal with the reality of cutting budgets by 20% while increasing service levels by 10%. That, my friends, is the real challenge of our times."
Did you miss out on last Friday's Live Chat? We are doing it again this Friday, covering parts I and II of Ron's posts, so please join the conversation! The virtual dialogue on this topic will continue in another [Live Chat] on September 30, 2011 from 12 noon to 1pm Eastern Time.
Can you believe it has been five years since I started blogging?
(If you absolutely abhor the navel-gazing associated with blogging-about-blogging posts, then by all means stop reading now!)
Back in July 2005, IBM decided to merge together two brands, IBM eServer and IBM TotalStorage, into a single all-encompassing "IBM Systems" brand. Thus TotalStorage brand became the "IBM System Storage" product line of the "IBM Systems" brand. The next six months was spent renaming some (not all) of the products. The following January, I was named the Marketing Strategist for this new product line, with the mission to help promote the new naming convention.
We looked at possibly doing a regularly-scheduled podcast, but nobody back then, including myself, were familar with audio editing tools. Instead, we chose a blog. Most blogs at IBM are internal, safely hidden behind the firewall, accessible only to IBM employees. I wanted mine to be different, to be accessible to the public, clients, prospects, IBM Business Partners, and yes, even those working for IBM's various competitors. One thing I like about blogs is that if you have a typo, or make a mistake, you can go back and correct it after it has posted.
Marketing through social media is quite different than traditional marketing techniques. Management was supportive, but legal wanted to review and approval everything I wrote before I posted it onto my blog. Official IBM Press Releases, for example, go through a dozen reviews before they are finally made public. I refused. This kind of review and approval would ruin the blogging process.
Fortunately, this blog was not my first attempt at technical writing. Our legal counsel reviewed my past trip reports from various conferences, and decided to let me blog without review. Occasionally, someone will reivew my blog once already posted, and ask me to make some corrections. It reminds me of my favorite saying used heavily within IBM:
Despite these delays, we managed to launch this blog in September 2006, just in time to celebrate the 50th anniversary of disk systems. IBM introduced the industry's first commercial disk system on September 13, 1956.
Over the years, this blog has helped sales reps and IBM Business Partners close deals, and address the FUD their prospects heard from competition. I have helped my readers get in touch with the right people within IBM. And, I have "sent the elevator back down", helping other IBMers launch their own blogs, including [Barry Whyte], [Elisabeth Stahl], and [Anthony Vandewerdt].
Today, bloggers have a profound impact on the world. Not everyone has a positive view on this. Bloggers and other users of social media have been seen as whistle-blowers for fraudulent corporations, as activists against corrupt governments and dictators, and as subject matter experts and fact checkers referenced during television and radio newscasts. In a recent movie, one of the major characters was a trouble-making blogger, and another character describes his blogging as nothing more than "graffiti with punctuation."
I want to thank all of my readers for making this the #1 most influential blog on IBM DeveloperWorks in 2011! This blog has been [published in a series of books], Inside System Storage Volume I and Volume II. And yes, before you all ask in the comments below, I am actively working on Volume III.
For a bit of nostalgia, I invite you to read my first 21 blog posts that I posted back in [September 2006].
After the amount of flack Jon Toigo had to endure for not giving advanced notice to his upcoming Webcast, I thought I would better remind people about my own Webinar that is happening next Tuesday, August 23.
So here's the scoop, next Tuesday I will be presenting [The Future of Storage], August 23, 1pm to 2pm EDT. You can register to attend at the [Infoboom Registration Page]. Infoboom is a social community for business and IT leaders of small and midsize businesses brought to you by IBM.
But that's not all! After the webinar, I will then travel to various cities for face-to-face lectures. Here are the first two:
September 7 - Indianapolis
September 8 - Boston area
If you are near either of these two locations, contact your local IBM storage specialist or IBM business partner to participate.
The IBM Storwize V7000 was introduced last October, and has proven to be wildly successful. I saw two awesome reviews recently of the IBM Storwize V7000 disk system that I thought I would bring to your attention.
The first review is [IBM Storwize V7000] from Roger Howorth of ZDNet UK. Here are some quotes:
"Under the hood, the Storwize V7000 is built from technologies originally developed for IBM's enterprise-class storage systems, so the V7000 benefits from a comprehensive set of high-end features that have been scaled down for mid-range buyers."
"Initial configuration couldn't be simpler."
"We really liked the layout and functionality of the GUI."
"Storwize V7000 is virtual storage that offers efficiency and flexibility through built-in SSD optimization and "thin provisioning" technologies while enabling users to virtualize and re-use existing disk systems..."
"Storwize V7000 advanced functionality also enables non-disruptive migration of data from existing storage, simplifying implementation and minimizing disruption to users."
"The Storwize V7000 graphical user interface is a browser-based, easy to navigate intuitive GUI."
"ESG Lab found that getting started with the Storwize V7000 disk system was intuitive and straightforward."
"Easy Tier increases the efficiency and simplicity of deploying SSD drives."
Full VMware Vstorage API for Array Integration (VAAI). Back in 2008, VMware announced new vStorage APIs for its vSphere ESX hypervisor: vStorage API for Site Recovery Manager, vStorage API for Data Potection, vStorage API for Multipathing. Last July, VMware added a new API called vStorage API for Array Integration [VAAI] which offers three primitives:
Hardware-assisted Blocks zeroing. Sometimes referred to as "Write Same", this SCSI command will zero out a large section of blocks, presumably as part of a VMDK file. This can then be used to reclaim space on the XIV on thin-provisioned LUNs.
Hardware-assisted Copy. Make an XIV snapshot of data without any I/O on the server hardware.
Hardware-assisted locking. On mainframes, this is call Parallel Access Volumes (PAV). Instead of locking an entire LUN using standard SCSI reserve commands, this primitive allows an ESX host to lock just an individual block so as not to interfere with other hosts accessing other blocks on that same LUN.
Quality of Service (QoS) Performance Classes.
When XIV was first released, it treated all hosts and all data the same, even when deployed for a variety of different applications. This worked for some clients, such as [Medicare y Mucho Más]. They migrated their databases, file servers and email system from EMC CLARiiON to an IBM XIV Storage System. In conjunction with VMware, the XIV provides a highly flexible and scalable virtualized architecture, which enhances the company's business agility.
However, other clients were skeptical, and felt they needed additional "nobs" to prioritize different workloads. The new 10.2.4 microcode allows you to define four different "performance classes". This is like the door of a nightclub. All the regular people are waiting in a long line, but when a celebrity in a limo arrives, the bouncer unclips the cord, and lets the celebrity in. For each class, you provide IOPS and/or MB/sec targets, and the XIV manages to those goals. Performance classes are assigned to each host based on their value to the business.
Offline Initialization for Asynchronous Mirror.
Internally, we called this Truck Mode. Normally, when a customer decides to start using Asynchronous Mirror, they already have a lot of data at the primary location, and so there is a lot of data to send over to the new XIV box at the secondary location. This new feature allows the data to be dumped to tape at the primary location. Those tapes are shipped to the secondary location and restored on the empty XIV. The two XIV boxes are then connected for Asynchronous Mirroring, and checksums of each 64KB block are compared to determine what has changed at the primary during this "tape delivery time". This greatly reduces the time it takes for the two boxes to get past the initial synchronization phase.
IP-based Replication. When IBM first launched the Storwize V7000 last October, people commented that the one feature they felt missing was IP-based replication. Sure, we offered FCP-based replication as most other Enterprise-class disk systems offer today, but many midrange systems also offer IP-based repliation to reduce the need for expensive FCIP routers. [IBM Tivoli Storage FastBack for Storwize V7000] provides IP-based replication for Storwize V7000 systems.
Network Attached Storage
IBM announced two new models of the IBM System Storage N series. The midrange N6240 supports up to 600 drives, replacing the N6040 system. The entry-level N6210 supports up to 240 drives, and replaces the N3600 system. Details for both are available on the latest [data sheet].
IBM Real-Time Compression appliances work with all N series models to provide additional storage efficiency. Last October, I provided the [Product Name Decoder Ring] for the STN6500 and STN6800 models. The STN6500 supports 1 GbE ports, and the STN6800 supports 10GbE ports (or a mix of 10GbE and 1GbE, if you prefer). The IBM versions of these models were announced last December, but some people were on vacation and might have missed it. For more details of this, read the [Resources page], the [landing page], or [watch this video].
IBM System Storage DS3000 series
IBM System Storage [DS3524 Express DC and EXP3524 Express DC] models are powered with direct current (DC) rather than alternating current (AC). The DS3524 packs dual controllers and two dozen small-form factor (2.5 inch) drives in a compact 2U-high rack-optimized module. The EXP3524 provides addition disk capacity that can be attached to the DS3524 for expansion.
Large data centers, especially those in the Telecommunications Industry, receive AC from their power company, then store it in a large battery called an Uninterruptible Power Supply (UPS). For DC-powered equipment, they can run directly off this battery source, but for AC-powered equipment, the DC has to be converted back to AC, and some energy is lost in the conversion. Thus, having DC-powered equipment is more energy efficient, or "green", for the IT data center.
Whether you get the DC-powered or AC-powered models, both are NEBS-compliant and ETSI-compliant.
New Tape Drive Options for Autoloaders and Libraries
IBM System Storage [TS2900 Autoloader] is a compact 1U-high tape system that supports one LTO drive and up to 9 tape cartridges. The TS2900 can support either an LTO-3, LTO-4 or LTO-5 half-height drive.
IBM System Storage [TS3100 and TS3200 Tape Libraries] were also enhanced. The TS3100 can accomodate one full-height LTO drive, or two half-height drives, and hold up to 24 cartridges. The TS3200 offers twice as many drives and space for cartridges.
From New York, Rolf went to London, Paris, Madrid, Morocco, Cairo, South Africa, Bangkok Thailand, Malaysia, Singapore, New Zealand, Australia, and then back to United States. I was hoping to run into him while I was in Australia and New Zealand last month, but our schedules did not line up.
Travelingwithout baggage is more than just a convenience, it is a metaphor for the philosophy that we should keep only what we need, and leave behind what we don't. This was the approach taken by IBM in the design of the IBM Storwize V7000 midrange disk system.
The IBM Storwize V7000 disk system consists of 2U enclosures. Controller enclosures have dual-controllers and drives. Expansion enclosures have just drives. Enclosures can have either 24 smaller form factor (SFF) 2.5-inch drives, or twelve larger 3.5-inch drives. A controller enclosure can be connected up to nine expansion enclosures.
The drives are all connected via 6 Gbps SAS, and come in a variety of speeds and sizes: 300GB Solid-State Drive (SSD); 300GB/450GB/600GB high-speed 10K RPM; and 2TB low-speed 7200 RPM drives. The 12-bay enclosures can be intermixed with 24-bay enclosures on the same system, and within an enclosure different speeds and sizes can be intermixed. A half-rack system (20U) could hold as much as 480TB of raw disk capacity.
This new system, freshly designed entirely within IBM, competes directly against systems that carry a lot of baggage, including the HDS AMS, HP EVA, an EMC CLARiiON CX4 systems. Instead, we decided to keep the what we wanted from our other successful IBM products.
Inspired by our successful XIV storage system, IBM has developed a web-based GUI that focuses on ease-of-use. This GUI uses the latest HTML5 and dojo widgets to provide an incredible user experience.
Borrowed from our IBM DS8000 high-end disk systems, state-of-the-art device adapters provide 6 Gbps SAS connectivity with a variety of RAID levels: 0, 1, 5, 6, and 10.
From our SAN Volume Controller, the embedded [ SVC 6.1 firmware] provides all of the features and functions normally associated with enterprise-class systems, including Easy Tier sub-LUN automated tiering between Solid-State Drives and Spinning disk, thin provisioning, external disk virtualization, point-in-time FlashCopy, disk mirroring, built-in migration capability, and long-distance synchronous and asynchronous replication.
Finally, the various "internal NDA" that kept me from publishing this sooner have expired, so now I have the long-awaited [Inside System Storage: Volume II], documenting IBM's transformation in its storage strategy, including behind-the-scenes commentary about IBM's acquisitions of XIV and Diligent. Available initially in paperback form. I am still working on the hard cover and eBook editions.
For those who have not yet read my first book, Inside System Storage: Volume I, it is still available from my publisher Lulu, in [hard cover], [paperback] and [eBook] editions.
IBM System Storage DS8800
A lesson IBM learned long ago was not to make radical changes to high-end disk systems, as clients who run mission-critical applications are more concerned about reliability, availability and serviceability than they are performance or functionality. Shipping any product before it was ready meant painfully having to fix the problems in the field instead.
(EMC apparently is learning this same lesson now with their VMAX disk system. Their Engenuity code from Symmetrix DMX4 was ported over to new CLARiiON-based hardware. With several hundred boxes in the field, they have already racked up over 150 severity 1 problems, roughly half of these resulted in data loss or unavailability issues. For the sake of our mutual clients that have both IBM servers and EMC disk, I hope they get their act together soon.)
To avoid this, IBM made incremental changes to the successful design and architecture of its predecessors. The new DS8800 shares 85 percent of the stable microcode from the DS8700 system. Functions like Metro Mirror, Global Mirror, and Metro/Global Mirror, are compatible with all of the previous models of the DS8000 series, as well as previous models of the IBM Enterprise Storage Server (ESS) line.
The previous models of DS8000 series were designed to take in cold air from both front and back, and route the hot air out the top, known as chimney design. However, many companies are re-arranging their data centers into separate cold aisles and hot aisles. The new DS8800 has front-to-back cooling to help accommodate this design.
My colleague Curtis Neal would call the rest of this a "BFD" announcement, which of course stands for "Bigger, Faster and Denser". The new DS8800 scales-up to more drives than its DS8700 predecessor, and can scale-out from a single-frame 2-way system to a multi-frame 4-way system. IBM has upgraded to faster 5GHz POWER6+ processors, with dual-core 8 Gbps FC and FICON host adapters, 8 Gbps device adapters, and 6 Gbps SAS connectivity to smaller form factor (SFF) 2.5-inch SAS drives. IBM Easy Tier will provide sub-LUN automated tiering between Solid-State Drives and spinning disk. The denser packaging with SFF drives means that we can pack over 1000 drives in only three frames, compared to five frames required for the DS8700.
The [IBM System Storage SAN Volume Controller] software release v6.1 brings Easy Tier sub-LUN automated tiering to the rest of the world. IBM Easy Tier moves the hottest, most active extents up to Solid-State Drives (SSD) and moves the coldest, least active down to spinning disk. This works whether the SSD is inside the SVC 2145-CF8 nodes, or in the managed disk pool.
Tired of waiting for EMC to finally deliver FAST v2 for your VMAX? It has been 18 months since they first announced that someday they would have sub-LUN automatic tiering. What is taking them so long? Why not virtualize your VMAX with SVC, and you can have it sooner!
SVC 6.1 also upgrades to a sexy new web-based GUI, which like the one for the IBM Storwize V7000, is based on the latest HTML5 and dojo widget standards. Inspired by the popular GUI from the IBM XIV Storage System, this GUI has greatly improved ease-of-use.
A client asked me to explain "Nearline storage" to them. This was easy, I thought, as I started my IBM career on DFHSM, now known as DFSMShsm for z/OS, which was created in 1977 to support the IBM 3850 Mass Storage System (MSS), a virtual storage system that blended disk drives and tape cartridges with robotic automation. Here is a quick recap:
Online storage is immediately available for I/O. This includes DRAM memory, solid-state drives (SSD), and always-on spinning disk, regardless of rotational speed.
Nearline storage is not immediately available, but can be made online quickly without human intervention. This includes optical jukeboxes, automated tape libraries, as well as spin-down massive array of idle disk (MAID) technologies.
Offline storage is not immediately available, and requires some human intervention to bring online. This can include USB memory sticks, CD/DVD optical media, shelf-resident tape cartridges, or other removable media.
Sadly, it appears a few storage manufacturers and vendors have been misusing the term "Nearline" to refer to "slower online" spinning disk drives. I find this [June 2005 technology paper from Seagate], and this [2002 NetApp Press Release], the latter of which included this contradiction for their "NearStore" disk array. Here is the excerpt:
"Providing online access to reference information—NetApp nearline storage solutions quickly retrieve and replicate reference and archive information maintained on cost-effective storage—medical images, financial models, energy exploration charts and graphs, and other data-intensive records can be stored economically and accessed in multiple locations more quickly than ever"
Which is it, "online access" or "nearline storage"?
If a client asked why slower drives consume less energy or generate less heat, I could explain that, but if they ask why slower drives must have SATA connections, that is a different discussion. The speed of a drive and its connection technology are for the most part independent. A 10K RPM drive can be made with FC, SAS or SATA connection.
I am opposed to using "Nearlne" just to distinguish between four-digit speeds (such as 5400 or 7200 RPM) versus "online" for five-digit speeds (10,000 and 15,000 RPM). The difference in performance between 10K RPM and 7200 RPM spinning disks is miniscule compared to the differences between solid-state drives and any spinning disk, or the difference between spinning disk and tape.
I am also opposed to using the term "Nearline" for online storage systems just because they are targeted for the typical use cases like backup, archive or other reference information that were previously directed to nearline devices like automated tape libraries.
Can we all just agree to refer to drives as "fast" or "slow", or give them RPM rotational speed designations, rather than try to incorrectly imply that FC and SAS drives are always fast, and SATA drives are always slow? Certainly we don't need new terms like "NL-SAS" just to represent a slower SAS connected drive.
Well, it feels like Tuesday and you know what that means... "IBM Announcement Day!" Actually, today is Wednesday, but since Monday was Memorial Day holiday here in the USA, my week is day-shifted. Yesterday, IBM announced its latest IBM FlashCopy Manager v2.2 release. Fellow blogger, Del Hoobler (IBM) has also posted something on this out atthe [Tivoli Storage Blog].
IBM FlashCopy Manager replaces two previous products. One was called Tivoli Storage Manager for Copy Services, the other was called Tivoli Storage Manager for Advanced Copy Services. To say people were confused between these two was an understatement, the first was for Windows, and the second was for UNIX and Linux operating systems. The solution? A new product that replaces both of these former products to support Windows, UNIX and Linux! Thus, IBM FlashCopy Manager was born. I introduced this product back in 2009 in my post [New DS8700 and other announcements].
IBM Tivoli Storage FlashCopy Manager provides what most people with "N series SnapManager envy" are looking for: application-aware point-in-time copies. This product takes advantage of the underlying point-in-time interfaces available on various disk storage systems:
FlashCopy on the DS8000 and SAN Volume Controller (SVC)
Snapshot on the XIV storage system
Volume Shadow Copy Services (VSS) interface on the DS3000, DS4000, DS5000 and non-IBM gear that supports this Microsoft Windows protocol
For Windows, IBM FlashCopy Manager can coordinate the backup of Microsoft Exchange and SQL Server. The new version 2.2 adds support for Exchange 2010 and SQL Server 2008 R2. This includes the ability to recover an individual mailbox or mail item from an Exchange backup. The data can be recovered directly to an Exchange server, or to a PST file.
For UNIX and Linux, IBM FlashCopy Manager can coordinate the backup of DB2, SAP and Oracle databases. Version 2.2 adds support specific Linux and Solaris operating systems, and provides a new capability for database cloning. Basically, database cloning restores a database under a new name with all the appropriate changes to allow its use for other purposes, like development, test or education training. A new "fcmcli" command line interface allows IBM FlashCopy Manager to be used for custom applications or file systems.
A common misperception is that IBM FlashCopy Manager requires IBM Tivoli Storage Manager backup software to function. That is not true. You have two options:
In Stand-alone mode, it's just you, the application, IBM FlashCopy Manager and your disk system. IBM FlashCopy Manager coordinates the point-in-time copies, maintains the correct number of versions, and allows you to backup and restore directly disk-to-disk.
Unified Recovery Management with Tivoli Storage Manager
Of course, the risk with relying only on point-in-time copies is that in most cases, they are on the same disk system as the original data. The exception being virtual disks from the SAN Volume Controller. IBM FlashCopy Manager can be combined with IBM Tivoli Storage Manager so that the point-in-time copies can be copied off to a local or remote TSM server, so that if the disk system that contains both the source and the point-in-time copies fails, you have a backup copy from TSM. In this approach, you can still restore from the point-in-time copies, but you can also restore from the TSM backups as well.
IBM FlashCopy Manager is an excellent platform to connect application-aware fucntionality with hardware-based copy services.
Well, I'm back safely from my tour of Asia. I am glad to report that Tokyo, Beijing and Kuala Lumpur are pretty much how I remember them from the last time I was there in each city. I have since been fighting jet lag by watching the last thirteen episodes of LOST season 6 and the series finale.
Recently, I have started seeing a lot of buzz on the term "Storage Federation". The concept is not new, but rather based on the work in database federation, first introduced in 1985 by [A federated architecture for information management] by Heimbigner and McLeod. For those not familiar with database federation, you can take several independent autonomous databases, and treat them as one big federated system. For example, this would allow you to issue a single query and get results across all the databases in the federated system. The advantage is that it is often easier to federate several disparate heterogeneous databases than to merge them into a single database. [IBM Infosphere Federation Server] is a market leader in this space, with the capability to federate DB2, Oracle and SQL Server databases.
Storage expansion: You want to increase the storage capacity of an existing storage system that cannot accommodate the total amount of capacity desired. Storage Federation allows you to add additional storage capacity by adding a whole new system.
Storage migration: You want to migrate from an aging storage system to a new one. Storage Federation allows the joining of the two systems and the evacuation from storage resources on the first onto the second and then the first system is removed.
Safe system upgrades: System upgrades can be problematic for a number of reasons. Storage Federation allows a system to be removed from the federation and be re-inserted again after the successful completion of the upgrade.
Load balancing: Similar to storage expansion, but on the performance axis, you might want to add additional storage systems to a Storage Federation in order to spread the workload across multiple systems.
Storage tiering: In a similar light, storage systems in a Storage Federation could have different capacity/performance ratios that you could use for tiering data. This is similar to the idea of dynamically re-striping data across the disk drives within a single storage system, such as with 3PAR's Dynamic Optimization software, but extends the concept to cross storage system boundaries.
To some extent, IBM SAN Volume Controller (SVC), XIV, Scale-Out NAS (SONAS), and Information Archive (IA) offer most, if not all, of these capabilities. EMC claims its VPLEX will be able to offer storage federation, but only with other VPLEX clusters, which brings up a good question. What about heterogenous storage federation? Before anyone accuses me of throwing stones at glass houses, let's take a look at each IBM solution:
IBM SAN Volume Controller
The IBM SAN Volume Controller has been doing storage federation since 2003. Not only can IBM SAN Volume Controller bring together storage from a variety of heterogenous storage, the SVC cluster itself can be a mix of different hardware models. You can have a 2145-8A4 node pair, 2145-8G4 node pair, and the new 2145-CF8 node pair, all combined together into a single SVC cluster. Upgrading SVC hardware nodes in an SVC cluster is always non-disruptive.
IBM XIV storage system
The IBM XIV has two kinds of independent modules. Data modules have processor, cache and 12 disks. Interface modules are data modules with additional processor, FC and Ethernet (iSCSI) adapters. Because these two modules play different roles in an XIV "colony", that number of each type is predetermined. Entry-level six-module systems have 2 interface and 4 data modules. Full 15-module systems have 6 interface and 9 data modules. Individual modules can be added or removed non-disruptively in an XIV.
IBM Scale-Out NAS
The SONAS is comprised of three kinds of nodes that work together in concert. A management node, one or more interface nodes, and two or more storage nodes. The storage nodes are paired to manage up to 240 nodes in a storage pod. Individual interface or data nodes can be added or removed non-disruptively in the SONAS. The underlying technology, the General Parallel File System, has been doing storage federation since 1996 for some of the largest top 500 supercomputers in the world.
IBM Information Archive (IA)
For the IA, there are 1, 2 or 3 nodes, which manages a set of collections. A collection can either be file-based using industry-standard NAS protocols, or object-based using the popular System Storage™ Archive Manager (SSAM) interface. Normally, you have as many collections as you have nodes, but nodes are powerful enough to manage two collections to provide N-1 availability. This allows a node to be removed, and a new node added into the IA "colony", in a non-disruptive manner.
Even in an ant colony, there are only a few types of ants, with typically one queen, several males, and lots of workers. But all the ants are red. You don't see colonies that mix between different species of ants. For databases, federation was a way to avoid the much harder task of merging databases from different platforms. For storage, I am surprised people have latched on to the term "federation", given our mixed results in the other "federations" we have formed, which I have conveniently (IMHO) ranked from least effective to most effective:
The Union of Soviet Socialist Republics (USSR)
My father used to say, "If the Soviet Union were in charge of the Sahara desert, they would run out of sand in 50 years." The [Soviet Union] actually lasted 68 years, from 1922 to 1991.
The United Nations (UN)
After the previous League of Nations failed, the UN was formed in 1945 to facilitate cooperation in international law, international security, economic development, social progress, human rights, and the achieving of world peace by stopping wars between countries, and to provide a platform for dialogue.
The European Union (EU)
With the collapse of the Greek economy, and the [rapid growth of debt] in the UK, Spain and France, there are concerns that the EU might not last past 2020.
The United States of America (USA)
My own country is a federation of states, each with its own government. California's financial crisis was compared to the one in Greece. My own state of Arizona is under boycott from other states because of its recent [immigration law]. However, I think the US has managed better than the EU because it has evolved over the past 200 years.
The Organization of the Petroleum Exporting Countries [OPEC]
Technically, OPEC is not a federation of cooperating countries, but rather a cartel of competing countries that have agreed on total industry output of oil to increase individual members' profits. Note that it was a non-OPEC company, BP, that could not "control their output" in what has now become the worst oil spill in US history. OPEC was formed in 1960, and is expected to collapse sometime around 2030 when the world's oil reserves run out. Matt Savinar has a nice article on [Life After the Oil Crash].
United Federation of Planets
The [Federation] fictitiously described in the Star Trek series appears to work well, an optimistic view of what federations could become if you let them evolve long enough.
Given the mixed results with "federation", I think I will avoid using the term for storage, and stick to the original term "scale-out architecture".
Here I am, day 11 of a 17-day business trip, on my last leg of the trip this week, in Kuala Lumpur in Malaysia. I have been flooded with requests to give my take on EMC's latest re-interpretation of storage virtualization, VPLEX.
I'll leave it to my fellow IBM master inventor Barry Whyte to cover the detailed technical side-by-side comparison. Instead, I will focus on the business side of things, using Simon Sinek's Why-How-What sequence. Here is a [TED video] from Garr Reynold's post
[The importance of starting from Why].
Let's start with the problem we are trying to solve.
Problem: migration from old gear to new gear, old technology to new technology, from one vendor to another vendor, is disruptive, time-consuming and painful.
Given that IT storage is typically replaced every 3-5 years, then pretty much every company with an internal IT department has this problem, the exception being those companies that don't last that long, and those that use public cloud solutions. IT storage can be expensive, so companies would like their new purchases to be fully utilized on day 1, and be completely empty on day 1500 when the lease expires. I have spoken to clients who have spent 6-9 months planning for the replacement or removal of a storage array.
A solution to make the data migration non-disruptive would benefit the clients (make it easier for their IT staff to keep their data center modern and current) as well as the vendors (reduce the obstacle of selling and deploying new features and functions). Storage virtualization can be employed to help solve this problem. I define virtualization as "technology that makes one set of resources look and feel like a different set of resources, preferably with more desirable characteristics.". By making different storage resources, old and new, look and feel like a single type of resource, migration can be performed without disrupting applications.
Before VPLEX, here is a breakdown of each solution:
Non-disruptive tech refresh, and a unified platform to provide management and functionality across heterogeneous storage.
Non-disruptive tech refresh, and a unified platform to provide management and functionality between internal tier-1 HDS storage, and external tier-2 heterogeneous storage.
Non-disruptive tech refresh, with unified multi-pathing driver that allows host attachment of heterogeneous storage.
New in-band storage virtualization device
Add in-band storage virtualization to existing storage array
New out-of-band storage virtualization device with new "smart" SAN switches
SAN Volume Controller
HDS USP-V and USP-VM
For IBM, the motivation was clear: Protect customers existing investment in older storage arrays and introduce new IBM storage with a solution that allows both to be managed with a single set of interfaces and provide a common set of functionality, improving capacity utilization and availability. IBM SAN Volume Controller eliminated vendor lock-in, providing clients choice in multi-pathing driver, and allowing any-to-any migration and copy services. For example, IBM SVC can be used to help migrate data from an old HDS USP-V to a new HDS USP-V.
With EMC, however, the motivation appeared to protect software revenues from their PowerPath multi-pathing driver, TimeFinder and SRDF copy services. Back in 2005, when EMC Invista was first announced, these three software represented 60 percent of EMC's bottom-line profit. (Ok, I made that last part up, but you get my point! EMC charges a lot for these.)
Back in 2006, fellow blogger Chuck Hollis (EMC) suggested that SVC was just a [bump in the wire] which could not possibly improve performance of existing disk arrays. IBM showed clients that putting cache(SVC) in front of other cache(back end devices) does indeed improve performance, in the same way that multi-core processors successfully use L1/L2/L3 cache. Now, EMC is claiming their cache-based VPLEX improves performance of back-end disk. My how EMC's story has changed!
So now, EMC announces VPLEX, which sports a blend of SVC-like and Invista-like characteristics. Based on blogs, tweets and publicly available materials I found on EMC's website, I have been able to determine the following comparison table. (Of course, VPLEX is not yet generally available, so what is eventually delivered may differ.)
Scalable, 1 to 4 node-pairs
One size fits all, single pair of CPCs
SVC-like, 1 to 4 director-pairs
Works with any SAN switches or directors
Required special "smart" switches (vendor lock-in)
SVC-like, works with any SAN switches or directors
Broad selection of IBM Subsystem Device Driver (SDD) offered at no additional charge, as well as OS-native drivers Windows MPIO, AIX MPIO, Solaris MPxIO, HP-UX PV-Links, VMware MPP, Linux DM-MP, and comercial third-party driver Symantec DMP.
Limited selection, with focus on priced PowerPath driver
Invista-like, PowerPath and Windows MPIO
Read cache, and choice of fast-write or write-through cache, offering the ability to improve performance.
No cache, Split-Path architecture cracked open Fibre Channel packets in flight, delayed every IO by 20 nanoseconds, and redirected modified packets to the appropriate physical device.
SVC-like, Read and write-through cache, offering the ability to improve performance.
Space-Efficient Point-in-Time copies
SVC FlashCopy supports up to 256 space-efficient targets, copies of copies, read-only or writeable, and incremental persistent pairs.
Like Invista, No
Remote distance mirror
Choice of SVC Metro Mirror (synchronous up to 300km) and Global Mirror (asynchronous), or use the functionality of the back-end storage arrays
No native support, use functionality of back-end storage arrays, or purchase separate product called EMC RecoverPoint to cover this lack of functionality
Limited synchronous remote-distance mirror within VPLEX (up to 100km only), no native asynchronous support, use functionality of back-end storage arrays
Provides thin provisioning to devices that don't offer this natively
Like Invista, No
SVC Split-Cluster allows concurrent read/write access of data to be accessed from hosts at two different locations several miles apart
I don't think so
PLEX-Metro, similar in concept but implemented differently
Non-disruptive tech refresh
Can upgrade or replace storage arrays, SAN switches, and even the SVC nodes software AND hardware themselves, non-disruptively
Tech refresh for storage arrays, but not for Invista CPCs
Tech refresh of back end devices, and upgrade of VPLEX software, non-disruptively. Not clear if VPLEX engines themselves can be upgraded non-disruptively like the SVC.
Heterogeneous Storage Support
Broad support of over 140 different storage models from all major vendors, including all CLARiiON, Symmetrix and VMAX from EMC, and storage from many smaller startups you may not have heard of
Invista-like. VPLEX claims to support a variety of arrays from a variety of vendors, but as far as I can find, only DS8000 supported from the list of IBM devices. Fellow blogger Barry Burke (EMC) suggests [putting SVC between VPLEX and third party storage devices] to get the heterogeneous coverage most companies demand.
Back-end storage requirement
Must define quorum disks on any IBM or non-IBM back end storage array. SVC can run entirely on non-IBM storage arrays
HP SVSP-like, requires at least one EMC storage array to hold metadata
SVC 2145-CF8 model supports up to four solid-state drives (SSD) per node that can treated as managed disk to store end-user data
Invista-like. VPLEX has an internal 30GB SSD, but this is used only for operating system and logs, not for end-user data.
In-band virtualization solutions from IBM and HDS dominate the market. Being able to migrate data from old devices to new ones non-disruptively turned out to be only the [tip of the iceberg] of benefits from storage virtualization. In today's highly virtualized server environment, being able to non-disruptively migrate data comes in handy all the time. SVC is one of the best storage solutions for VMware, Hyper-V, XEN and PowerVM environments. EMC watched and learned in the shadows, taking notes of what people like about the SVC, and decided to follow IBM's time-tested leadership to provide a similar offering.
EMC re-invented the wheel, and it is round. On a scale from Invista (zero) to SVC (ten), I give EMC's new VPLEX a six.
Last week, I presented "An Introduction to Cloud Computing" for two hours to the local Institute of Management Accountants [IMA] for their Continuing Professional Education [CPE]. Since I present IBM's leadership in Cloud Storage offerings, I have had to become an expert in Cloud Computing overall. The audience was a mix of bookkeepers, accountants, auditors, comptrollers, CPAs, and accounting teachers.
Here is a sample of the questions I took during and after my presentation:
If I need to shut down host machine, I lose all my virtual machines as well?
No, it is possible to seemlessly move virtual machines from one host to another. If you need to shut down a host machine, move all the VMs to other hosts, then you can shut down the empty host without impacting business.
Does the SaaS provider have to build their own app, can they not buy an app and then rent it out?
Yes, but they won't have competitive differentiation, and the software development they buy from will want a big cut of the action. SaaS developers that build their own applications can keep more of the profits for themselves.
How do backups work in cloud computing? Do I have to contact someone at the cloud computing company to find the backup tape?
Large datacenters often keep the most recent backups on disk, and older versions on tape in automated tape libraries that can fetch your backup in less than 2 minutes. Because of this, there is no need to talk to anyone, you can schedule or invoke your own backups, and often perform the recovery yourself using self-service tools.
Last month, my sister tried to rent a car during the week the Tucson Gem Show, but they were out of cars she wanted to drive. Could this happen with Cloud Computing?
Not likely. With rental cars, the cars have to be physically in Tucson to rent them. Rental companies could have brought cars down from Phoenix to satisfy demand. With Cloud Computing, it is all accessible over the global network, you are not limited to the cloud providers nearest you.
Is there a reason why Amazon Web Services (AWS) charges more for a Windows image than a Linux image?
Yes, Amazon and Microsoft have a patent cross-licensing agreement where Amazon pays Microsoft for the priveledge of offering Windows-based images on their EC2 cloud infrastructure. It just makes business sense to pass those costs onto the consumer. Linux is a free open source operating system, and is often the better choice.
So if we rent a machine from Amazon, they send it to my accounting office? What exactly am I getting for 12 cents per hour?
No. The computer remains in their datacenter. You get a virtual machine that runs 1.2Ghz Intel processor, with 1700MB of RAM, and 160GB of hard disk space, with Windows operating system running on it, comparable to a machine you can get at the local BestBuy, but instead of it running in the next room, it is running in a datacenter somewhere else in the United States with electricity and air conditioning.
You access it remotely from your desktop or laptop PC.
Why would I ever rent more than one computer?
It depends on your workload. For example, Derek Gottfrid at the New York Times needed to convert 11 million articles from TIFF format to PDF format so that he could put them up on the web. This would have taken him months using a single computer, so he rented 100 computers and got the entire stack converted in 24 hours, for a cost of about $240. See the articles [Self-Service, Prorated, Super Computing] and [TimesMachine] for details.
What about throughput? Won't I need to run cables from my accounting office to this cloud computing data center?
You will need connectivity, most likely from connections provided by your local telephone or cable company, or through the Internet. Certainly, there can be cases where direct privately-owned fiber optic cables, known as "dark fiber", can directly connect consumers to local Cloud service providers, for added security.
What about medical records? Will Cloud Computing help the Healthcare industry?
Yes, hospitals are finding that digitizing their records greatly reduces costs. IBM offers the Grid Medical Archive Solution [GMAS] as a private cloud storage solution to store X-ray images and other electronic medical records on disk and tape, and these records can be accessed from multiple hospitals and clinics, wherever the doctor or patient happens to be.
The advantage of personal computers was individualization, I could put on my own choices of software, and customize my own settings, won't we lose this with Cloud Computing?
Yes, customized software and settings cost companies millions of dollars with help desk calls. Cloud Computing attempts to provide some standardization, reducing the amount of effort to support IT operations.
Won't putting all the computers into a big datacenter make them more vulnerable to hackers?
Security is a well-known concern, but this is being addressed with encryption, access control lists, multi-tenancy isolation, and VPN connections.
My daughter has a BlackBerry or iPod or something, and when we mentioned that someone in Phoenix wore a monkey suit to avoid photo-radar speed cameras, she was able to pull up a picture on her little hand-held thing, is this the future?
Yes, mobile phones and other hand-held devices now have internet access to take advantage of Cloud Computing services. People will be able to access the information they need from wherever they happen to be. (You can see the picture here: [Man Dons Mask for Speed-Camera Photos])
IBM offers a variety of Cloud Computing services, as well as customized solutions and integrated systems that can be deployed on-premises behind your corporate firewall. To learn more, go to [ibm.com/cloud].
The second speaker was local celebrity Dan Ryan presenting the financials for the upcoming [Rosemont Copper] mining operations. Copper is needed for emerging markets, such as hybrid vehicles and wind turbines. Copper is a major industry in Arizona.
This week I got a comment on my blog post [IBM Announces another SSD Disk offering!]. The exchange involved Solid State Disk storage inside the BladeCenter and System x server line. Sandeep offered his amazing performance results, but we have no way to get in contact with him. So, for those interested, I have posted on SlideShare.net a quick five-chart presentation on recent tests with various SSD offerings on the eX5 product line here:
Continuing my drawn out coverage of IBM's big storage launch of February 9, today I'll cover the IBM System Storage TS7680 ProtecTIER data deduplication gateway for System z.
On the host side, TS7680 connects to mainframe systems running z/OS or z/VM over FICON attachment, emulating an automated tape library with 3592-J1A devices. The TS7680 includes two controllers that emulate the 3592 C06 model, with 4 FICON ports each. Each controller emulates up to 128 virtual 3592 tape drives, for a total of 256 virtual drives per TS7680 system. The mainframe sees up to 1 million virtual tape cartridges, up to 100GB raw capacity each, before compression. For z/OS, the automated library has full SMS Tape and Integrated Library Management capability that you would expect.
Inside, the two control units are both connected to a redundant pair cluster of ProtecTIER engines running the HyperFactor deduplication algorithm that is able to process the deduplication inline, as data is ingested, rather than post-process that other deduplication solutions use. These engines are similar to the TS7650 gateway machines for distributed systems.
On the back end, these ProtecTIER deduplication engines are then connected to external disk, up to 1PB. If you get 25x data deduplication ratio on your data, that would be 25PB of mainframe data stored on only 1PB of physical disk. The disk can be any disk supported by ProtecTIER over FCP protocol, not just the IBM System Storage DS8000, but also the IBM DS4000, DS5000 or IBM XIV storage system, various models of EMC and HDS, and of course the IBM SAN Volume Controller (SVC) with all of its supported disk systems.
It's Tuesday, and that means more IBM announcements!
I haven't even finished blogging about all the other stuff that got announced last week, and here we are with more announcements. Since IBM's big [Pulse 2010 Conference] is next week, I thought I would cover this week's announcement on Tivoli Storage Manager (TSM) v6.2 release. Here are the highlights:
Client-Side Data Deduplication
This is sometimes referred to as "source-side" deduplication, as storage admins can get confused on which servers are clients in a TSM client-server deployment. The idea is to identify duplicates at the TSM client node, before sending to the TSM server. This is done at the block level, so even files that are similar but not identical, such as slight variations from a master copy, can benefit. The dedupe process is based on a shared index across all clients, and the TSM server, so if you have a file that is similar to a file on a different node, the duplicate blocks that are identical in both would be deduplicated.
This feature is available for both backup and archive data, and can also be useful for archives using the IBM System Storage Archive Manager (SSAM) v6.2 interface.
Simplified management of Server virtualization
TSM 6.2 improves its support of VMware guests by adding auto-discovery. Now, when you spontaneously create a new virtual machine OS guest image, you won't have to tell TSM, it will discover this automatically! TSM's legendary support of VMware Consolidated Backup (VCB) now eliminates the manual process of keeping track of guest images. TSM also added support of the Vstorage API for file level backup and recovery.
While IBM is the #1 reseller of VMware, we also support other forms of server virtualization. In this release, IBM adds support for Microsoft Hyper-V, including support using Microsoft's Volume Shadow Copy Services (VSS).
Automated Client Deployment
Do you have clients at all different levels of TSM backup-archive client code deployed all over the place? TSM v6.2 can upgrade these clients up to the latest client level automatically, using push technology, from any client running v5.4 and above. This can be scheduled so that only certain clients are upgraded at a time.
Simultaneous Background Tasks
The TSM server has many background administrative tasks:
Migration of data from one storage pool to another, based on policies, such as moving backups and archives on a disk pool over to a tape pools to make room for new incoming data.
Storage pool backup, typically data on a disk pool is copied to a tape pool to be kept off-site.
Copy active data. In TSM terminology, if you have multiple backup versions, the most recent version is called the active version, and the older versions are called inactive. TSM can copy just the active versions to a separate, smaller disk pool.
In previous releases, these were done one at a time, so it could make for a long service window. With TSM v6.2, these three tasks are now run simultaneously, in parallel, so that they all get done in less time, greatly reducing the server maintenance window, and freeing up tape drives for incoming backup and archive data. Often, the same file on a disk pool is going to be processed by two or more of these scheduled tasks, so it makes sense to read it once and do all the copies and migrations at one time while the data is in buffer memory.
Enhanced Security during Data Transmission
Previous releases of TSM offered secure in-flight transmission of data for Windows and AIX clients. This security uses Secure Socket Layer (SSL) with 256-bit AES encryption. With TSM v6.2, this feature is expanded to support Linux, HP-UX and Solaris.
Improved support for Enterprise Resource Planning (ERP) applications
I remember back when we used to call these TDPs (Tivoli Data Protectors). TSM for ERP allows backup of ERP applications, seemlessly integrating with database-specific tools like IBM DB2, Oracle RMAN, and SAP BR*Tools. This allows one-to-many and many-to-one configurations between SAP servers and TSM servers. In other words, you can have one SAP server backup to several TSM servers, or several SAP servers backup to a single TSM server. This is done by splitting up data bases into "sub-database objects", and then process each object separately. This can be extremely helpful if you have databases over 1TB in size. In the event that backing up an object fails and has to be re-started, it does not impact the backup of the other objects.
Continuing on the [IBM Storage Launch of February 9], John Sing has offered to write the following guest post about the [announcement] of IBM Scale Out Network Attached Storage [IBM SONAS]. John and I have known each other for a while, traveled the world to work with clients and speak at conferences. He is an Executive IT Consultant on the SONAS team.
Guest Post written by John Sing, IBM San Jose, California
What is IBM SONAS? It’s many things, so let’s start with this list:
It’s IBM’s delivery of a productized, pre-packaged Scale Out NAS global virtual file server, delivered in a easy-to-use appliance
IBM’s solution for large enterprise file-based storage requirements, where massive scale in capacity and extreme performance is required, especially for today’s modern analytics-based Competitive Advantage IT applications
Scales to many petabytes of usable storage and billions of files in a single global namespace
Provides integrated central management, central deployment of petabyte levels of storage
Modular commercial-off-the-shelf [COTS] building blocks. I/O, storage, network capacity scale independently of each other. Up to 30 interface nodes and 60 storage nodes, in an IBM General Parallel File System [GPFS]-based cluster. Each 10Gb CEE interface node port is capable of streaming at 900 MB/sec
Files are written in block-sized chunks, striped over as many multiple disk drives in parallel – aggregating throughput on a massive scale (both read and write), as well as providing auto-tuning, auto-balancing
Functionality delivered via one program product, IBM SONAS Software, which provides all of above functions, along with clustered CIFS, NFS v2/v3 with session auto-failover, FTP, high availability, and more
IBM SONAS makes automated tiered storage achievable and realistic at petabyte levels:
Integrated high performance parallel scan engine capable of identifying files at over 10 million files per minute per node
Integrated parallel data movement engine to physically relocate the data within tiered storage
And we’re just scratching the surface. IBM has plans to deploy additional protocols, storage hardware options, and software features.
However, the real question of interest should be, “who really needs that much storage capacity and throughput horsepower?”
The answer may surprise you. IMHO, the answer is: almost any modern enterprise that intends to stay competitive. Hmmm…… Consider this: the reason that IT exists today is no longer to simply save cost (that may have been true 10 years ago). Everyone is reducing cost… but how much competitive advantage is purchased through “let’s cut our IT budget by 10% this year”?
Notice that in today’s world, there are (many) bright people out there, changing our world every day through New Intelligence Competitive Advantage analytics-based IT applications such as real time GPS traffic data, real time energy monitoring and redirection, real time video feed with analytics, text analytics, entity analytics, real time stream computing, image recognition applications, HDTV video on demand, etc. Think of how GPS industry, cell phone / Twitter / Facebook, iPhone and iPad applications, as examples, are creating whole new industries and markets almost overnight.
Then start asking yourself, “What's behind these Competitive Advantage IT applications – as they are the ones that are driving all my storage growth? Why do they need so much storage? What do those applications mean for my storage requirements?”
To be “real-time”, long-held IT paradigms are being broken every day. Things like “data proximity”: we can no longer can extract terabytes of data from production databases and load them to a data warehouse – where’s the “real-time” in that? Instead, today’s modern analytics-based applications demand:
Multiple processes and servers (sometimes numbering in the 100s) simultaneously ….
Running against hundreds of terabytes of data of live production data, streaming in from expanding number of smarter sensors, input devices, users
Producing digital image-intensive results that must be programatically sent to an ever increasing number of mobile devices in geographically dispersed storage
Requiring parallel performance levels, that used to be the domain only of High Performance Computing (HPC)
This is a major paradigm shift in storage – and that is the solution and storage capabilities that IBM SONAS is designed to address. And of course, you should be able to save significant cost through the SONAS global virtual file server consolidation and virtualization as well.
Certainly, this topic warrants more discussion. If you found it interesting, contact me, your local IBM Business Partner or IBM Storage rep to discuss Competitive Advantage IT applications and SONAS further.
Wrapping up my coverage of the Data Center Conference 2009, the week ends with a celebration. This year we had six "Hospitality Suites" sponsored by various different vendors. Each suite has its own theme, decorations and entertainment. The first suite was VMware's "Cloud 9 Ultra Lounge" which offered blue cotton candy martinis. IBM is the leading reseller of VMware.
When the red martini liquid was poured on top of the blue cotton candy, the result was a nasty muddish brown grey color. The guy on the left chose to get the martini without the blue cotton candy. We joked that this is perhaps a good metaphor for cloud computing in general. It looks good on paper, until you actually put it all together and realize it does not look as blue and puffy as you were expecting. However, it tasted good!
Next suite was sponsored by Cisco, one of IBM's storage networking partners. Cisco also decorated in blue, as the guy Jake in the middle demonstrates.
Next suite was sponsored by Brocade, our supplier for IBM-branded networking gear. They went with a red-and-black color scheme. Sadly, many of my pictures inside involved straight jackets and unicycles, so not appropriate for this blog. However, it was easy to remember that they were talking about their "extraordinary networks". Makes you want to help out Brocade by contacting your nearest IBM storage sales rep and buy yourself a SAN768B or two.
Somewhere along the way, we picked up Hawaiian leis at the "Margaritaville" Hospitality Suite, compliments of sponsor APC by Schneider Electric. We had the best "Filet Mignon" appetizers at "Club Dedupe" by our competitor DataDomain, and some fun with my friends over at Computer Associates' "Top Gun" suite. Pictured at right are Paula Koziol with Christian Barrera from Argentina. A good time was had by all.