Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is a Master Inventor and Senior IT Specialist for the IBM System Storage product line at the IBM Executive Briefing Center in Tucson, Arizona, and featured contributor to IBM's developerWorks. In 2011, Tony celebrated his 25th anniversary with IBM Storage on the same day as IBM's Centennial. He is author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson)
For the longest time, people thought that humans could not run a mile in less than four minutes. Then, in 1954, [Sir Roger Bannister] beat that perception, and shortly thereafter, once he showed it was possible, many other runners were able to achieve this also. The same is being said now about the IBM Watson computer which appeared this week against two human contestants on Jeopardy!
Often, when a company demonstrates new technology, it is showing prototypes that will not be ready for commercial deployment until several years later. IBM Watson, however, was made mostly from commercially available hardware, software and information resources. As several have noted, the 1TB of data used to search for answers could fit on a single USB drive that you buy at your local computer store.
Take a look at the [IBM Research Team] to determine how the project was organized. Let's decide what we need, and what we don't in our Watson Jr.:
Do we need it for Watson Jr.?
Yes, That's you. Assuming this is a one-person project, you will act as Team Lead.
Yes, I hope you know computer programming!
No, since Watson Jr. won't be appearing on Jeopardy, we won't need strategy on wager amounts for the Daily Double, or what clues to pick next. Let's focus merely on a computer that can accept a question in text, and provide an answer back, in text.
Yes, this team focused on how to wire all the hardware together. We need to do that, although Watson Jr. will have fewer components.
Optional. For now, let's have Watson Jr. just return its answer in plain text. Consider this Extra Credit after you get the rest of the system working. Consider using [eSpeak], [FreeTTS], or the Modular Architecture for Research on speech sYnthesis [MARY] Text-to-Speech synthesizers.
Yes, I will explain what this is, and why you need it.
Yes, we will need to get information for Watson Jr. to process.
Yes, this team developed a system for parsing the question being asked, and to attach meaning to the different words involved.
No, this team focused on making IBM Watson optimized to answer in 3 seconds or less. We can accept a slower response, so we can skip this.
(Disclaimer: As with any Do-It-Yourself (DIY) project, I am not responsible if you are not happy with your Watson Jr. I am basing the approach on what I read from publicly available sources, and my work in Linux, supercomputers, XIV, and SONAS. For our purposes, Watson Jr. is based entirely on commodity hardware, open source software, and publicly available sources of information. Your Watson Jr. will certainly not be as fast or as clever as the IBM Watson you saw on television.)
Step 1: Buy the Hardware
Supercomputers are built as a cluster of identical compute servers lashed together by a network. You will be installing Linux on them, so if you can avoid paying extra for Microsoft Windows, that would save you some money. Here is your shopping list:
Three x86 hosts, with the following:
64-bit quad-core processor, either Intel-VT or AMD-V capable,
8GB of DRAM, or larger
300GB of hard disk, or larger
CD or DVD Read/Write drive
Computer Monitor, mouse and keyboard
Ethernet 1GbE 4-port hub, and appropriate RJ45 cables
Surge protector and Power strip
Local Console Monitor (LCM) 4-port switch (formerly known as a KVM switch) and appropriate cables. This is optional, but will make it easier during the development. Once your Watson Jr. is operational, you will only need the monitor and keyboard attached to one machine. The other two machines can remain "headless" servers.
Step 2: Establish Networking
IBM Watson used Juniper switches running at 10Gbps Ethernet (10GbE) speeds, but was not connected to the Internet while playing Jeopardy! Instead, these Ethernet links were for the POWER7 servers to talk to each other, and to access files over the Network File System (NFS) protocol to the internal customized SONAS storage I/O nodes.
Watson Jr. will be able to run "disconnected from the Internet" as well. However, you will need Internet access to download the code and information sources. For our purposes, 1GbE should be sufficient. Connect your Ethernet hub to your DSL or Cable modem. Connect all three hosts to the Ethernet hub. Connect your keyboard, video monitor and mouse to the LCM, and connect the LCM to the three hosts.
Step 3: Install Linux and Middleware
To say I use Linux on a daily basis is an understatement. Linux runs on my Android-based cell phone, my laptop at work, my personal computers at home, most of our IBM storage devices from SAN Volume Controller to XIV to SONAS, and even on my Tivo at home which recorded my televised episodes of Jeopardy!
For this project, you can use any modern Linux distribution that supports KVM. IBM Watson used Novell SUSE Linux Enterprise Server [SLES 11]. Alternatively, I can also recommend either Red Hat Enterprise Linux [RHEL 6] or Canonical [Ubuntu v10]. Download the 64-bit "ISO" files for each version you need, and burn them to CDs. Each distribution of Linux comes in different orientations:
Graphical User Interface (GUI) oriented, often referred to as "Desktop" or "HPC-Head"
Command Line Interface (CLI) oriented, often referred to as "Server" or "HPC-Compute"
Guest OS oriented, to run in a Hypervisor such as KVM, Xen, or VMware. Novell calls theirs "Just Enough Operating System" [JeOS].
For Watson Jr., I have chosen a [multitier architecture], sometimes referred to as an "n-tier" or "client/server" architecture.
Host 1 - Presentation Server
For the Human-Computer Interface [HCI], the IBM Watson received categories and clues as text files via TCP/IP, had a [beautiful avatar] representing a planet with 42 circles streaking across in orbit, and text-to-speech synthesizer to respond in a computerized voice. Your Watson Jr. will not be this sophisticated. Instead, we will have a simple text-based Query Panel web interface accessible from a browser like Mozilla Firefox.
Host 1 will be your Presentation Server, the connection to your keyboard, video monitor and mouse. Install the "Desktop" or "HPC Head Node" version of Linux. Install [Apache Web Server and Tomcat] to run the Query Panel. Host 1 will also be your "programming" host. Install the [Java SDK] and the [Eclipse IDE for Java Developers]. If you always wanted to learn Java, now is your chance. There are plenty of books on Java if that is not the language you normally write code in.
While three little systems don't constitute an "Extreme Cloud" environment, you might like to try out the "Extreme Cloud Administration Tool", called [xCat], which was used to manage the many servers in IBM Watson.
Host 2 - Business Logic Server
Host 2 will be driving most of the "thinking". Install the "Server" or "HPC Compute Node" version of Linux. This will be running a server virtualization Hypervisor. I recommend KVM, but you can probably run Xen or VMware instead if you like.
Host 3 - File and Database Server
Host 3 will hold your information sources, indices, and databases. Install the "Server" or "HPC Compute Node" version of Linux. This will be your NFS server, which might come up as a question during the installation process.
Technically, you could run different Linux distributions on different machines. For example, you could run "Ubuntu Desktop" for host 1, "RHEL 6 Server" for host 2, and "SLES 11" for host 3. In general, Red Hat tries to be the best "Server" platform, and Novell tries to make SLES be the best "Guest OS".
My advice is to pick a single distribution and use it for everything, Desktop, Server, and Guest OS. If you are new to Linux, choose Ubuntu. There are plenty of books on Linux in general, and Ubuntu in particular, and Ubuntu has a helpful community of volunteers to answer your questions.
Step 4: Download Information Sources
You will need some documents for Watson Jr. to process.
IBM Watson used a modified SONAS to provide a highly-available clustered NFS server. For Watson Jr., we won't need that level of sophistication. Configure Host 3 as the NFS server, and Hosts 1 and 2 as NFS clients. See the [Linux-NFS-HOWTO] for details. To optimize performance, host 3 will be the "official master copy", but we will use a Linux utility called rsync to copy the information sources over to the hosts 1 and 2. This allows the task engines on those hosts to access local disk resources during question-answer processing.
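To make the rsync step a little more concrete, here is a minimal Python sketch that only builds and prints the rsync commands for pushing the master copy from Host 3 to the other two hosts. The directory path and the hostnames "host1" and "host2" are my own placeholders, not something prescribed above, and the sketch defaults to a dry run so nothing is copied until you have reviewed the commands.

```python
import shlex

# Hypothetical values: adjust to your own hostnames and directory layout.
SOURCE_DIR = "/srv/watsonjr/sources/"   # master copy on Host 3
TARGET_HOSTS = ["host1", "host2"]       # machines that receive a local copy


def build_rsync_command(source_dir, host, dry_run=True):
    """Return the rsync argument list that mirrors source_dir to the same path on host."""
    command = ["rsync", "--archive", "--compress", "--delete"]
    if dry_run:
        # --dry-run reports what would change without copying anything.
        command.append("--dry-run")
    command += [source_dir, f"{host}:{source_dir}"]
    return command


if __name__ == "__main__":
    for host in TARGET_HOSTS:
        # Print the command instead of running it, so it can be reviewed first.
        print(shlex.join(build_rsync_command(SOURCE_DIR, host)))
```

Once the printed commands look right, you could run them by hand, or pass the same argument list to subprocess.run with dry_run=False.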
We will also need a relational database. You won't need a high-powered IBM DB2. Watson Jr. can do fine with something like [Apache Derby] which is the open source version of IBM CloudScape from its Informix acquisition. Set up Host 3 as the Derby Network Server, and Hosts 1 and 2 as Derby Network Clients. For more about structured content in relational databases, see my post [IBM Watson - Business Intelligence, Data Retrieval and Text Mining].
Linux includes a utility called wget which allows you to download content from the Internet to your system. What documents you decide to download is up to you, based on what types of questions you want answered. For example, if you like Literature, check out the vast resources at [FullBooks.com]. You can automate the download by writing a shell script or program to invoke wget to all the places you want to fetch data from. Rename the downloaded files to something unique, as often they are just "index.html". For more on wget utility, see [IBM Developerworks].
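As one possible way to automate this, the Python sketch below derives a unique local filename for each URL, so that several pages named "index.html" do not overwrite each other, and builds the matching wget command. The naming scheme (host and path plus a short hash) and the "sources" directory are my own assumptions, and the example.com URLs are placeholders for whatever sites you choose.

```python
import hashlib
from urllib.parse import urlparse


def unique_filename(url):
    """Derive a readable, unique local filename from a URL."""
    parsed = urlparse(url)
    # Keep the host and path so the name stays recognizable.
    readable = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "download"
    # A short hash of the full URL keeps names distinct even when paths look alike.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{readable}_{digest}.html"


def build_wget_command(url, output_dir="sources"):
    """Return the wget argument list that saves url under its unique name."""
    return ["wget", "-q", "-O", f"{output_dir}/{unique_filename(url)}", url]


if __name__ == "__main__":
    # Placeholder URLs; replace them with the pages you want Watson Jr. to read.
    for url in ["http://example.com/books/index.html",
                "http://example.com/poems/index.html"]:
        print(" ".join(build_wget_command(url)))
```

The commands are printed rather than executed, so you can paste them into a shell script or hand the lists to subprocess.run once you are happy with the names.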
Step 5: The Query Panel - Parsing the Question
Next, we need to parse the question and have some sense of what is being asked for. For this we will use [OpenNLP] for Natural Language Processing, and [OpenCyc] for the conceptual logic reasoning. See Doug Lenat presenting this 75-minute video [Computers versus Common Sense]. To learn more, see the [CYC 101 Tutorial].
Unlike Jeopardy! where Alex Trebek provides the answer and contestants must respond with the correct question, we will do normal Question-and-Answer processing. To keep things simple, we will limit questions to the following formats:
Who is ...?
Where is ...?
When did ... happen?
What is ...?
Host 1 will have a simple Query Panel web interface. At the top, a place to enter your question, and a "submit" button, and a place at the bottom for the answer to be shown. When "submit" is pressed, this will pass the question to "main.jsp", the Java servlet program that will start the Question-answering analysis. Limiting the types of questions that can be posed will simplify hypothesis generation, reduce the candidate set and evidence evaluation, allowing the analytics processing to continue in reasonable time.
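Before wiring in OpenNLP, it can help to see how little is needed to recognize the four question formats. The Python sketch below is a plain regular-expression stand-in, not OpenNLP or OpenCyc, and the answer-type labels (PERSON, PLACE, DATE, DEFINITION) are names I made up for illustration.

```python
import re

# One pattern per supported question format; the captured group is the topic.
QUESTION_PATTERNS = [
    ("PERSON", re.compile(r"^who\s+is\s+(.+?)\s*\?$", re.IGNORECASE)),
    ("PLACE", re.compile(r"^where\s+is\s+(.+?)\s*\?$", re.IGNORECASE)),
    ("DATE", re.compile(r"^when\s+did\s+(.+?)\s+happen\s*\?$", re.IGNORECASE)),
    ("DEFINITION", re.compile(r"^what\s+is\s+(.+?)\s*\?$", re.IGNORECASE)),
]


def classify_question(question):
    """Return (answer_type, topic) for a supported question, or None otherwise."""
    text = question.strip()
    for answer_type, pattern in QUESTION_PATTERNS:
        match = pattern.match(text)
        if match:
            return answer_type, match.group(1)
    return None


if __name__ == "__main__":
    for sample in ["Who is Roger Bannister?", "How fast is Watson?"]:
        print(sample, "->", classify_question(sample))
```

Knowing the expected answer type up front is what lets the later stages discard candidates of the wrong kind, which is the simplification the restricted formats are meant to buy.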
Step 6: Unstructured Information Management Architecture
The "heart and soul" of IBM Watson is Unstructured Information Management Architecture [UIMA]. IBM developed this, then made it available to the world as open source. It is maintained by the [Apache Software Foundation], and overseen by the Organization for the Advancement of Structured Information Standards [OASIS].
Basically, UIMA lets you scan unstructured documents, glean the important points, and put that into a database for later retrieval. In the graph above, DBs means 'databases' and KBs means 'knowledge bases'. See the 4-minute YouTube video of [IBM Content Analytics], the commercial version of UIMA.
Starting from the left, the Collection Reader selects each document to process, and creates an empty Common Analysis Structure (CAS) which serves as a standardized container for information. This CAS is passed to Analysis Engines, composed of one or more Annotators which analyze the text and fill the CAS with the information found. The CAS are passed to CAS Consumers which do something with the information found, such as insert an entry into a database, update an index, or update a vote count.
(Note: This point requires what we in the industry call a small matter of programming, or [SMOP]. If you've always wanted to learn Java programming, XML, and JDBC, you will get to do plenty here.)
If you are not familiar with UIMA, consider this [UIMA Tutorial].
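To get a feel for the flow from Collection Reader to Annotator to CAS Consumer before diving into the real framework, here is a toy Python sketch. It does not use the UIMA API at all. The Cas class, the year annotator and the vote-counting consumer are all stand-ins I invented to mirror the roles described above.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Toy stand-in for a Common Analysis Structure: the text plus annotations."""
    document_text: str
    annotations: list = field(default_factory=list)


def collection_reader(documents):
    """Yield one empty CAS per document."""
    for text in documents:
        yield Cas(document_text=text)


def year_annotator(cas):
    """Annotate every four-digit year (1000-2999) with its character offsets."""
    for match in re.finditer(r"\b[12]\d{3}\b", cas.document_text):
        cas.annotations.append(("Year", match.start(), match.end(), match.group()))


def vote_consumer(cas_stream):
    """Tally how often each annotated value appears across all CASes."""
    votes = Counter()
    for cas in cas_stream:
        for _type, _begin, _end, value in cas.annotations:
            votes[value] += 1
    return votes


def run_pipeline(documents, annotators):
    """Read each document, run every annotator over its CAS, then consume the results."""
    def analyzed():
        for cas in collection_reader(documents):
            for annotate in annotators:
                annotate(cas)
            yield cas
    return vote_consumer(analyzed())


if __name__ == "__main__":
    docs = ["Roger Bannister ran a mile in under four minutes in 1954.",
            "The four-minute barrier fell in 1954."]
    print(run_pipeline(docs, [year_annotator]).most_common(1))
```

In real UIMA the same three roles are Java components described by XML descriptors, so treat this only as a mental model of who hands what to whom.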
Step 7: Parallel Processing
People have asked me why IBM Watson is so big. Did we really need 2,880 cores of processing power? As a supercomputer, the 80 TeraFLOPs of IBM Watson would place it only in 94th place on the [Top 500 Supercomputers]. While IBM Watson may be the [Smartest Machine on Earth], the most powerful supercomputer at this time is the Tianhe-1A with more than 186,000 cores, capable of 2,566 TeraFLOPs.
To determine how big IBM Watson needed to be, the IBM Research team ran the DeepQA algorithm on a single core. It took 2 hours to answer a single Jeopardy question! Let's look at the performance data:
Configuration (number of cores): time to answer one Jeopardy question
Single core (1 core): 2 hours
Single IBM Power750 server (32 cores): < 4 minutes
Single rack, 10 servers (320 cores): < 30 seconds
IBM Watson, 90 servers (2,880 cores): < 3 seconds
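The timings quoted above can be turned into a rough speedup and parallel efficiency figure. The Python sketch below assumes the 2,880 cores are spread evenly over the 90 servers, and it treats each "less than" bound as if it were the exact time, so the results are lower bounds rather than measured values.

```python
# Timings quoted in the text; the "<" bounds are treated as exact values,
# so the computed speedups are lower bounds on the real ones.
SINGLE_CORE_SECONDS = 2 * 60 * 60  # 2 hours on one core

CONFIGURATIONS = [
    # (label, servers, seconds per question)
    ("Single IBM Power750 server", 1, 4 * 60),
    ("Single rack (10 servers)", 10, 30),
    ("IBM Watson (90 servers)", 90, 3),
]

# Assumption: identical servers, so 2,880 cores over 90 servers.
CORES_PER_SERVER = 2880 // 90


def speedup_and_efficiency(servers, seconds):
    """Return (cores, speedup over one core, speedup per core)."""
    cores = servers * CORES_PER_SERVER
    speedup = SINGLE_CORE_SECONDS / seconds
    return cores, speedup, speedup / cores


if __name__ == "__main__":
    for label, servers, seconds in CONFIGURATIONS:
        cores, speedup, efficiency = speedup_and_efficiency(servers, seconds)
        print(f"{label}: {cores} cores, {speedup:.0f}x faster, {efficiency:.0%} efficiency")
```

An efficiency that stays fairly close to 100 percent as cores are added is a sign that the workload divides up well.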
The old adage applies, [many hands make for light work]. The idea is to divide-and-conquer. For example, if you wanted to find a particular street address in the Manhattan phone book, you could dispatch fifty pages to each friend and they could all scan those pages at the same time. This is known as "Parallel Processing" and is how supercomputers are able to work so well. However, not all algorithms lend themselves well to parallel processing, and the phrase [nine women can't have a baby in one month] is often used to remind us of this.
Fortunately, UIMA is designed for parallel processing. You need to install UIMA-AS for Asynchronous Scale-out processing, an add-on to the base UIMA Java framework, supporting a very flexible scale-out capability based on JMS (Java Message Service) and ActiveMQ. We will also need Apache Hadoop, an open source framework used by the Yahoo search engine. Hadoop has a "MapReduce" engine that allows you to divide the work, dispatch pieces to different "task engines", and then combine the results afterwards.
Host 2 will run Hadoop and drive the MapReduce process. Plan to have three KVM guests on Host 1, four on Host 2, and three on Host 3. That means you have 10 task engines to work with. These task engines can be deployed for Collection Readers, Analysis Engines, and CAS Consumers. When all processing is done, the resulting votes will be tabulated and the top answer displayed on the Query Panel on Host 1.
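The divide, dispatch and combine idea can be sketched in a few lines of Python. This is not Hadoop and does not use its API. It uses a thread pool from the standard library as a stand-in for the 10 task engines, and the candidate answers are invented sample data. Python threads will not make this particular tally any faster, so read it as an illustration of the structure rather than of the performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(items, chunk_count):
    """Divide items into chunk_count nearly equal slices, one per task engine."""
    size, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for index in range(chunk_count):
        end = start + size + (1 if index < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def map_votes(candidate_answers):
    """Map step: each task engine counts the candidate answers in its own slice."""
    return Counter(candidate_answers)


def reduce_votes(left, right):
    """Reduce step: merge two partial tallies into one."""
    return left + right


def tally(candidate_answers, task_engines=10):
    """Split the work, count each slice in parallel, then merge the partial counts."""
    chunks = split_into_chunks(candidate_answers, task_engines)
    with ThreadPoolExecutor(max_workers=task_engines) as pool:
        partial_tallies = list(pool.map(map_votes, chunks))
    return reduce(reduce_votes, partial_tallies, Counter())


if __name__ == "__main__":
    # Invented sample data: candidate answers proposed by different analysis runs.
    answers = ["Chicago"] * 7 + ["Toronto"] * 3 + ["Boston"] * 2
    print(tally(answers).most_common(1))
```

The top entry of the merged tally plays the role of the answer that would be shown on the Query Panel.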
Step 8: Testing
To simplify testing, use a batch processing approach. Rather than entering questions by hand in the Query Panel, generate a long list of questions in a file, and submit for processing. This will allow you to fine-tune the environment, optimize for performance, and validate the answers returned.
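A batch run could look something like the Python sketch below. The file layout (one question and its expected answer per line, separated by a tab) is my own assumption, and answer_question is a hypothetical hook for whatever function ends up submitting a question to your Watson Jr. and returning its text answer.

```python
def load_test_cases(lines):
    """Parse 'question<TAB>expected answer' lines, skipping blanks and # comments."""
    cases = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        question, expected = line.split("\t", 1)
        cases.append((question, expected))
    return cases


def run_batch(cases, answer_question):
    """Return (accuracy, failures) for the given answering function."""
    failures = []
    for question, expected in cases:
        actual = answer_question(question)
        # Compare case-insensitively so "arizona" and "Arizona" both count.
        if actual.strip().lower() != expected.strip().lower():
            failures.append((question, expected, actual))
    accuracy = 1 - len(failures) / len(cases) if cases else 0.0
    return accuracy, failures


if __name__ == "__main__":
    sample = ["# question<TAB>expected answer",
              "Where is Tucson?\tArizona"]
    # Stand-in answering function; replace it with a call into your own system.
    accuracy, failures = run_batch(load_test_cases(sample), lambda q: "Arizona")
    print(f"accuracy: {accuracy:.0%}, failures: {failures}")
```

Keeping the list of failures, and not just the score, makes it easier to see which kinds of questions need more tuning.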
There you have it. By the time you get your Watson Jr. fully operational, you will have learned a lot of useful skills, including Linux administration, Ethernet networking, NFS file system configuration, Java programming, UIMA text mining analysis, and MapReduce parallel processing. Hopefully, you will also gain an appreciation for how difficult it was for the IBM Research team to accomplish what they had for the Grand Challenge on Jeopardy! Not surprisingly, IBM Watson is making IBM [as sexy to work for as Apple, Google or Facebook], all of which started their business in a garage or a basement with a system as small as Watson Jr.
In the prank, I indicated that I had submitted my video to the [Arizona International Film Festival], or AIFF for short, which coincidentally was running April 1-20, and that it had won an award. I invited everyone who read my blog to see me accept the award at a ceremony at 6:00pm on April 1 at the Fox Theater, followed by the 8:00pm showing of another award-winning film.
I didn't submit the video, the video didn't win any award, and I was not invited to the award ceremony. I did, however, plan to see the movie at 8:00pm.
When I got there, I learned that a dozen of my friends, not realizing it was a prank, had shown up, asking for me. The AIFF staff were quite amused, and invited me to the award ceremony, which was still going on. The other filmmakers were impressed I had concocted such an elaborate social media campaign!
A slideshow is another style of video, animating still images to music. The [Ken Burns effect] was named after the technique fellow filmmaker Ken Burns used in his documentaries.
In 2010, I worked with the XIV team to address FUD that our competitors were flinging about double drive failures. My blog post [Double Drive Failure Debunked: XIV Two Years Later] set the record straight and put this issue to rest once and for all. XIV sales shot up dramatically after this post went public!
Are you tired of hearing about Cloud Computing without having any hands-on experience? Here's your chance. IBM has recently launched its IBM Development and Test Cloud beta. This gives you a "sandbox" to play in. Here are a few steps to get started:
Generate a "key pair". There are two keys. A "public" key that will reside in the cloud, and a "private" key that you download to your personal computer. Don't lose this key.
Request an IP address. This step is optional, but I went ahead and got a static IP, so I don't have to type in long hostnames like "vm353.developer.ihost.com".
Request storage space. Again, this step is optional, but you can request a 50GB, 100GB or 200GB LUN. I picked a 200GB LUN. Note that each instance comes with some 10 to 30GB storage already. The advantage to a storage LUN is that it is persistent, and you can mount it to different instances.
Start an "instance". An "instance" is a virtual machine, pre-installed with whatever software you chose from the "asset catalog". These are Linux images running under Red Hat Enterprise Virtualization (RHEV) which is based on Linux's kernel virtual machine (KVM). When you start an instance, you get to decide its size (small, medium, or large), whether to use your static IP address, and where to mount your storage LUN. In the examples below, I had each instance with a static IP and mounted the storage LUN to the /media/storage subdirectory. The process takes a few minutes.
So, now that you are ready to go, what instance should you pick from the catalog? Here are three examples to get you started:
IBM WebSphere sMASH Application Builder
Base OS server to run LAMP stack
Next, I decided to try out one of the base OS images. There are a lot of books on Linux, Apache, MySQL and PHP (LAMP) which represents nearly 70 percent of the web sites on the internet. This instance lets you install all the software from scratch. Between Red Hat and Novell SUSE distributions of Linux, Red Hat is focused on being the Hypervisor of choice, and SUSE is focusing on being the Guest OS of choice. Most of the images on the "asset catalog" are based on SLES 10 SP2. However, there was a base OS image of Red Hat Enterprise Linux (RHEL) 5.4, so I chose that.
To install software, you either have to find the appropriate RPM package, or download a tarball and compile from source. To try both methods out, I downloaded tarballs of Apache Web Server and PHP, and got the RPM packages for MySQL. If you just want to learn SQL, there are instances on the asset catalog with DB2 and DB2 Express-C already pre-installed. However, if you are already an expert in MySQL, or are following a tutorial or examples based on MySQL from a classroom textbook, or just want a development and test environment that matches what your company uses in production, then by all means install MySQL.
This is where my SSH client comes in handy. I am able to login to my instance and use "wget" to fetch the appropriate files. An alternative is to use "SCP" (also part of PuTTY) to do a secure copy from your personal computer up to the instance. You will need to do everything via command line interface, including editing files, so I found this [VI cheat sheet] useful. I copied all of the tarballs and RPMs on my storage LUN ( /media/storage ) so as not to have to download them again.
Compiling and configuring them is a different matter. By default, you login as an end user, "idcuser" (which stands for IBM Developer Cloud user). However, sometimes you need "root" level access. Use "sudo bash" to get into root level mode, and this allows you to put the files where they need to be. If you haven't done a configure/make/make install in a while, here's your chance to relive those "glory days".
In the end, I was able to confirm that Apache, MySQL and PHP were all running correctly. I wrote a simple index.php that invoked phpinfo() to show all the settings were set correctly. I rebooted the instance to ensure that all of the services started at boot time.
Rational Application Developer over VDI
For this last example, I started an instance pre-installed with Rational Application Developer (RAD), which is a full Integrated Development Environment (IDE) for Java and J2EE applications. I used the "NX Client" to launch a virtual desktop image (VDI) which in this case was Gnome on SLES 10 SP2. You might want to increase the screen resolution on your personal computer so that the VDI does not take up the entire screen.
From this VDI, you can launch any of the programs, just as if it were your own personal computer. Launch RAD, and you get the familiar environment. I created a short Java program and launched it on the internal WebSphere Application Server test image to confirm it was working correctly.
If you are thinking, "This is too good to be true!" there is a small catch. The instances are only up and running for 7 days. After that, they go away, and you have to start up another one. This includes any files you had on the local disk drive. You have a few options to save your work:
Copy the files you want to save to your storage LUN. This storage LUN appears persistent, and continues to exist after the instance goes away.
Take an "image" of your "instance", a function provided in the IBM Development and Test Cloud. For example, if you start a project Monday morning and work on it all week, you can take an "image" on Friday afternoon. This will shut down your instance and back up all of the files to your own personal "asset catalog", so that the next time you request an instance, you can choose that "image" as the starting point.
Another option is to request an "extension" which gives you another 7 days for that instance. You can request up to five unique instances running at the same time, so if you wanted to develop and test a multi-host application, perhaps one host that acts as the front-end web server, another host that does some kind of processing, and a third host that manages the database, this is all possible. As far as I can tell, you can do all the above from either a Windows, Mac or Linux personal computer.
Getting hands-on access to Cloud Computing really helps to understand this technology!
Back in June, I mentioned this blog was [Moving to MyDeveloperWorks] which is based on IBM Lotus Connections.
Finally, the move is complete for all bloggers. If you are having problems with the redirects, you might need to unsubscribe and re-subscribe in your RSS feed reader. Here are the new links for several IBM bloggers that have moved over:
If Eskimos have 37 words for "snow", then EMC has perhaps a similar number of names for "failure". I have already covered a few of their past attempts, including [ATMOS], [Invista], and [VPLEX]. Last week, EMC introduced its latest, called XtremIO.
But rather than focus on XtremIO's many shortcomings, I thought it would be better to point out the highlights of IBM's All-Flash array, IBM FlashSystem.
But first, a quick story.
Two years ago, I worked the booth at [Oracle OpenWorld 2011]. After a conference attendee had visited the booths of Violin Memory and Pure Storage, he asked me why IBM did not have an all-Flash array.
Of course IBM did, and I showed him the [Storwize V7000]. For example, a 2U model with 18 SSD drives of 400GB each, configured in two RAID-5 ranks 7+P+S could offer 5.6 TB of space, running up to 250,000 IOPS at sub-millisecond response times.
Why didn't IBM advertise the Storwize V7000 as an all-Flash array? I thought the question was silly at the time. Since the Storwize V7000 supported SSD as well as 15K, 10K and 7200 RPM spinning disk, it seemed obvious that it could be configured with only SSD if you chose.
Since then, IBM has added 800GB support to the Storwize V7000, doubling the capacity. More importantly, IBM acquired Texas Memory Systems, and offers a much better all-Flash array.
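The capacity figures above are easy to check. The short Python sketch below works through the arithmetic, assuming decimal units (1 TB = 1000 GB) and reading "7+P+S" as seven data drives plus one parity drive's worth of capacity and one spare in each rank. The function name is my own.

```python
def usable_capacity_tb(ranks, data_drives_per_rank, drive_gb):
    """Usable space in decimal terabytes: only the data drives in each rank count."""
    return ranks * data_drives_per_rank * drive_gb / 1000


# 7+P+S: 7 data drives, 1 drive of parity capacity, and 1 spare per rank.
DATA, PARITY, SPARE = 7, 1, 1
RANKS = 2
TOTAL_DRIVES = RANKS * (DATA + PARITY + SPARE)  # 18 drives in the 2U enclosure

if __name__ == "__main__":
    for drive_gb in (400, 800):
        capacity = usable_capacity_tb(RANKS, DATA, drive_gb)
        print(f"{TOTAL_DRIVES} x {drive_gb}GB SSD: {capacity:.1f} TB usable")
```

Only 14 of the 18 drives hold data, which is why 18 drives of 400GB each yield 5.6 TB rather than 7.2 TB, and why moving to 800GB drives doubles the usable space.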
Flash can be deployed in three levels. The first is in the server itself, such as with PCIe cards containing Flash chips, limited to applications running on that server only.
The second option is a hybrid disk system, that can intermix Flash-based Solid State Drives (SSD) with regular spinning hard disk drives (HDD). These can be attached to many servers.
The problem with this approach is that when Flash is packaged to pretend to be spinning disk, it undermines some of the performance benefits. Traditional disk system architectures using SCSI commands over Device adapter loops can introduce added latency.
The third fits snugly in the middle: all-Flash arrays designed from the ground up to be only Flash.
Whereas SSD can typically achieve an I/O latency in the 300 to 1000 microseconds range, IBM FlashSystem can process I/O in the 25 to 110 microsecond range. That is a huge difference!
(FTC Disclosure: The U.S. Federal Trade Commission requires that I mention that I am an IBM employee, and that this post may be considered a paid, celebrity endorsement of both the IBM FlashSystem and IBM Storwize family of products. I have no financial interest in EMC, do not endorse the XtremIO mentioned here, and was not paid to mention their company or products in any manner.)
Fellow blogger and IBM Master Inventor Barry Whyte has a great comparison table in his blog post [Extreme Blogging]. I thought I would add a column for the Storwize V7000 with 18 Solid State drives.
Capacity and rack space:
IBM FlashSystem 820: 20 Terabytes in 1U
IBM Storwize V7000 with SSD: 11 Terabytes in 2U
EMC XtremIO: 7 Terabytes in 6U
I/O latency (microseconds):
IBM FlashSystem 820: 110us (~5x faster)
Maximum I/O per second
NAND Flash type
While it is easy to show that EMC's XtremIO does not hold a candle to IBM FlashSystem, I think it is more amusing that it is not even as good as a Storwize V7000 with SSD that IBM offered two years ago, long before [EMC acquired XtremIO company] back in May 2012.
The first day of the residency started with introductions. Our emcee and project leader is Vasfi Gucer from IBM Austin lab. There are 17 participants (referred to as "residents") from the USA and various countries including Brazil, Canada and Sweden.
Michael Fork presenting. I am sitting on the far left side in the pink shirt. Photo taken by Tina Williams.
To set the right expectations, Tina Williams (IBM Social Media ITSO Projects Program Manager) explained what was going to happen this week.
In a typical "residency", residents are brought together for 4-6 weeks to write an [IBM Redbook], often a how-to guide written in a very conversational tone.
This residency is different. A bunch of social media and Cloud experts have been brought together to share experiences and to build up skills to write individual blog posts about IBM Cloud offerings. I was invited as both a world-renowned blogger and a Cloud expert. Everyone who signed up for this commits to write at least six blog posts about Cloud sometime in the next 90 days.
(Residents who do not have their own blogs can post to the IBM [Thoughts on Cloud] group blog. Publishing is part of our promotion process, and writing blogs consistently over a period of time counts!)
Jennifer Turner (IBM Worldwide Cloud Marketing Manager) explained IBM Cloud Social Media Initiative. Five years ago, IBM was one of the top 5 Cloud service providers, then a whole bunch of things happened, and we fell out of the top 5 list, and now with the recent [IBM acquisition of SoftLayer], we are in the top 5 again!
Michael Fork (IBM CloudFirst Lead Architect) presented the latest about SoftLayer. Wow! He did a great job, and I am glad to have him as a contact in case I have future questions from clients at the Tucson Executive Briefing Center.
Mohsin Syed [@mohsinusyed], IBM Development Manager, presented [IBM Social Media Analytics], combining Hadoop-style analytics using IBM BigInsights, DB2 database and Cognos reporting. IBM can do [sentiment analysis] to determine positive and negative comments in various languages. This product was formerly known as Cognos Consumer Insight.
I was the last speaker of the day. As one of the top bloggers in both the IT Storage Industry, and company-wide within IBM, I was invited to provide a few tips on blogging to the newbies in the audience. Jeff Antley, the "co-owner" of my blog [Inside System Storage] who works on the IBM developerWorks team, was there on hand to help answer questions.
(IBM requires all highly-visible corporate blogs like mine to have at least two owners. Jeff is an expert at HTML, CSS and other web design and has been immensely helpful in getting my blog looking nicer.)
Well, it's Tuesday again, and you know what that means... IBM announcements! Today, IBM announces that next Monday marks the 60th anniversary of the first commercial digital tape storage system! I am on the East coast this week visiting clients, but plan to be back in Tucson in time for the cake and fireworks next Monday.
1925 - masking tape (which 3M sold under its newly announced Scotch® brand)
1930 - clear cellulose-based tape (today, when people say Scotch tape, they usually are referring to the cellulose version)
1935 - Allgemeine Elektrizitatsgesellschaft (AEG) presents Magnetophon K1, audio recording on analog tape
1942 - Duct tape
1947 - Bing Crosby adopts audio recording for his radio program. This eliminated him doing the same program live twice per day, perhaps the first example of using technology for "deduplication".
According to the IBM Archives the [IBM 726 tape drive was formally announced May 21, 1952]. It was the size of a refrigerator, and the tape reel was the size of a large pizza. The next time you pull a frozen pizza from your fridge, you can remember this month's celebration!
When I first joined IBM in 1986, there were three kinds of IBM tape: the round reel called 3420, the square cartridge called 3480, and the tubes that contained a wide swath of tape stored in honeycomb shelves called the [IBM 3850 Mass Storage System].
My first job at IBM was to work on DFHSM, which was specifically started in 1977 to manage the IBM 3850, and later renamed to the DFSMShsm component of the DFSMS element of the z/OS operating system. This software was instrumental in keeping disk and tape at high 80-95 percent utilization rates on mainframe servers.
While visiting a client in Detroit, I learned that the client loved their StorageTek tape automation silo, but didn't like that the StorageTek drives inside were incompatible with IBM formats. They wanted to put IBM drives into the StorageTek silos. I agreed it was a good idea, and brought this back to the attention of development. In a contentious meeting with management and engineers, I presented this feedback from the client.
Everyone in the room said IBM couldn't do that. I asked "Why not?" The software engineers I spoke to already said they could support it. With StorageTek at the brink of Chapter 11 bankruptcy, I argued that IBM drives in their tape automation would ease the transition of our mainframe customers to an all-IBM environment.
Was the reason related to business/legal concerns, or was there a hardware issue? It turned out to be a little of both. On the business side, IBM had to agree to work with StorageTek on service and support to its mutual clients in mixed environments. On the technical side, the drive had to be tilted 12 degrees to line up with the robotic hand. A few years later, the IBM silo-compatible 3592 drive was commercially available.
Rather than put StorageTek completely out of business, it had the opposite effect. Now that IBM drives could be put in StorageTek libraries, everyone wanted one, basically bringing StorageTek back to life. This forced IBM to offer its own tape automation libraries.
In 1993, I filed my first patent. It was for the RECYCLE function in DFHSM to consolidate valid data from partial tapes to fresh new tapes. Before my patent, the RECYCLE function selected tapes alphabetically, by volume serial (VOLSER). My patent evaluated all tapes based on how full they were, and sorted them least-full to most-full, to maximize the return of cartridges.
Different tape cartridges can hold different amounts of data, especially with different formats on the same media type, with or without compression, so calculating the percentage full turned out to be a tricky algorithm that continues to be used in mainframe environments today.
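The difference between the two selection orders can be sketched in a few lines of Python. This is only an illustration of the idea, not the DFHSM code: the `Tape` record, its field names, and the sample capacities are hypothetical, and percent-full is simplified to valid data divided by the capacity of that cartridge's format.

```python
from dataclasses import dataclass


@dataclass
class Tape:
    volser: str        # volume serial
    capacity_mb: int   # hypothetical capacity for this cartridge format
    valid_mb: int      # data on the tape that is still valid

    @property
    def percent_full(self) -> float:
        # Capacity differs per cartridge, so compare percentages, not raw MB
        return 100.0 * self.valid_mb / self.capacity_mb


tapes = [
    Tape("A00001", 200, 150),   # 75 percent full
    Tape("B00002", 400, 40),    # 10 percent full
    Tape("C00003", 800, 200),   # 25 percent full
]

# Old behavior: alphabetical by VOLSER
alphabetical = [t.volser for t in sorted(tapes, key=lambda t: t.volser)]

# Patented idea: least-full first, so each recycled tape moves the least data
least_full_first = [t.volser for t in sorted(tapes, key=lambda t: t.percent_full)]

print(alphabetical)
print(least_full_first)
```

Recycling the emptiest tapes first frees the most cartridges for a given amount of data copied, which is the "return of cartridges" mentioned above.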
The patent was popular for cross-licensing, and IBM has since filed additional patents for this invention in other countries to further increase its license revenue for intellectual property.
In 1997, IBM launched the IBM 3494 Virtual Tape Server (VTS), the first virtual tape storage device, blending disk and tape to optimal effect. This was based off the IBM 3850 Mass Storage Systems, which was the first virtual disk system, that used 3380 disk and tape to emulate the older 3350 disk systems.
In the VTS, tape volume images would be emulated as files on a disk system, then later moved to physical tape. We would call the disk the "Tape Volume Cache", and use caching algorithms to decide how long to keep data in cache, versus destage to tape. However, there were only a few tape drives, and sometimes when the VTS was busy, there were no tape drives available to destage the older images, and the cache would fill up.
I had already solved this problem in DFHSM, with a function called pre-migration. The idea was to pre-emptively copy data to tape, but leave it also on disk, so that when it needed to be destaged, all we had to do was delete the disk copy and activate the tape copy. We patented using this idea for the VTS, and it is still used in the successor models of IBM System Storage TS7740 virtual tape libraries today.
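Here is a toy model of pre-migration in Python, to show why it helps when tape drives are scarce. The class and method names are hypothetical, and real caching policy is far more involved. The point is only that a pre-migrated volume can be destaged without needing a tape drive at that moment.

```python
class TapeVolumeCache:
    """Toy model: tracks which virtual volumes have a disk copy and a tape copy."""

    def __init__(self):
        self.on_disk = set()
        self.on_tape = set()

    def write(self, volume):
        # New virtual tape volumes land in the disk cache first
        self.on_disk.add(volume)

    def premigrate(self, volume):
        # Copy to tape while a drive happens to be free, but keep the disk copy
        if volume in self.on_disk:
            self.on_tape.add(volume)

    def destage(self, volume):
        """Free cache space. Returns True if a tape drive was needed right now."""
        needed_drive = volume not in self.on_tape
        if needed_drive:
            self.on_tape.add(volume)      # slow path: copy to tape at destage time
        self.on_disk.discard(volume)      # fast path: just delete the disk copy
        return needed_drive


cache = TapeVolumeCache()
cache.write("VOL001")
cache.write("VOL002")
cache.premigrate("VOL001")

print(cache.destage("VOL001"))   # pre-migrated, no drive needed
print(cache.destage("VOL002"))   # not pre-migrated, must wait for a drive
```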
Today, tape continues to be the least expensive storage medium, about 15 to 25 times less expensive, dollar-per-GB, than disk technologies. A dollar of today's LTO-5 tape can hold 22 days worth of MP3 music at 192 Kbps recording. A full TS1140 tape cartridge can hold 2 million copies of the book "War and Peace".
(If you have not read the book, Woody Allen took a speed reading course and read the entire novel in just 20 minutes. He summed up the novel in three words: "It involves Russia." By comparison, in the same 20 minutes, at 650MB/sec, the TS1140 drive can read this novel over and over 390,000 times.)
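For those who like to check the math, here is the arithmetic behind those comparisons as a short Python sketch. It uses only the figures quoted above and assumes decimal units (1 TB = 1,000,000 MB, 1 Kbps = 1,000 bits per second). The size of the novel is not stated anywhere; it is simply what the numbers imply.

```python
DRIVE_MB_PER_SEC = 650       # TS1140 read rate quoted above
SECONDS = 20 * 60            # the same 20 minutes
READS = 390_000              # times the novel is read in that window

mb_read = DRIVE_MB_PER_SEC * SECONDS               # total MB read in 20 minutes
novel_mb = mb_read / READS                         # implied size of one copy, in MB
cartridge_tb = 2_000_000 * novel_mb / 1_000_000    # 2 million copies, in TB

# 22 days of MP3 audio at 192 Kbps, expressed in GB
mp3_gb = (192_000 / 8) * (22 * 86_400) / 1e9

print(novel_mb, cartridge_tb, round(mp3_gb, 1))
```

So the two "War and Peace" figures agree with each other: about 2MB per copy, and 2 million copies in roughly 4TB.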
If you have your own "war stories" about tape, I would love to hear them, please consider posting a comment below.
Tonight PBS plans to air Season 38, Episode 6 of NOVA, titled [Smartest Machine On Earth]. Here is an excerpt from the station listing:
"What's so special about human intelligence and will scientists ever build a computer that rivals the flexibility and power of a human brain? In "Artificial Intelligence," NOVA takes viewers inside an IBM lab where a crack team has been working for nearly three years to perfect a machine that can answer any question. The scientists hope their machine will be able to beat expert contestants in one of the USA's most challenging TV quiz shows -- Jeopardy, which has entertained viewers for over four decades. "Artificial Intelligence" presents the exclusive inside story of how the IBM team developed the world's smartest computer from scratch. Now they're racing to finish it for a special Jeopardy airdate in February 2011. They've built an exact replica of the studio at its research lab near New York and invited past champions to compete against the machine, a big black box code -- named Watson after IBM's founder, Thomas J. Watson. But will Watson be able to beat out its human competition?"
Like most supercomputers, Watson runs the Linux operating system. The system runs 2,880 cores (90 IBM Power 750 servers, four sockets each, eight cores per socket) to achieve 80 [TeraFlops]. TeraFlops is the unit of measure for supercomputers, representing a trillion floating point operations per second. By comparison, Hans Moravec, principal research scientist at the Robotics Institute of Carnegie Mellon University (CMU), estimates that the [human brain is about 100 TeraFlops]. So, in the three seconds that Watson gets to calculate its response, it would have processed 240 trillion operations.
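The numbers in that paragraph multiply out as follows. This is just the arithmetic from the figures above, written as a small Python sketch; the variable names are mine.

```python
servers, sockets_per_server, cores_per_socket = 90, 4, 8
cores = servers * sockets_per_server * cores_per_socket   # total cores

teraflops = 80            # trillions of floating point operations per second
response_seconds = 3
trillion_ops = teraflops * response_seconds   # work done in the response window

brain_teraflops = 100     # the human brain estimate quoted above
fraction_of_brain = teraflops / brain_teraflops

print(cores, trillion_ops, fraction_of_brain)
```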
Several readers of my blog have asked for details on the storage aspects of Watson. Basically, it is a modified version of IBM Scale-Out NAS [SONAS] that IBM offers commercially, but running Linux on POWER instead of Linux-x86. System p expansion drawers of SAS 15K RPM 450GB drives, 12 drives each, are dual-connected to two storage nodes, for a total of 21.6TB of raw disk capacity. The storage nodes use IBM's General Parallel File System (GPFS) to provide clustered NFS access to the rest of the system. Each Power 750 has minimal internal storage mostly to hold the Linux operating system and programs.
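The raw capacity figure also tells you roughly how many drawers are involved. The drawer count is not stated above; the sketch below only derives what the quoted numbers imply, assuming decimal units (1 TB = 1,000 GB).

```python
DRIVE_GB = 450
DRIVES_PER_DRAWER = 12
RAW_TB = 21.6

drawer_tb = DRIVE_GB * DRIVES_PER_DRAWER / 1000   # raw TB per expansion drawer
drawers = round(RAW_TB / drawer_tb)               # implied number of drawers
drives = drawers * DRIVES_PER_DRAWER              # implied number of drives

print(drawer_tb, drawers, drives)
```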
When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, "The actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1TB." For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained; Watson is NOT going to the internet searching for answers.
“In times of universal deceit, telling the truth will be a revolutionary act.”
-- George Orwell
Well, it has been over two years since I first covered IBM's acquisition of the XIV company. Amazingly, I still see a lot of misperceptions out in the blogosphere, especially those regarding double drive failures for the XIV storage system. Despite various attempts to [explain XIV resiliency] and to [dispel the rumors], there are still competitors making stuff up, putting fear, uncertainty and doubt into the minds of prospective XIV clients.
Clients love the IBM XIV storage system! In this economy, companies are not stupid. Before buying any enterprise-class disk system, they ask the tough questions, run evaluation tests, and all the other due diligence often referred to as "kicking the tires". Here is what some IBM clients have said about their XIV systems:
“3-5 minutes vs. 8-10 hours rebuild time...”
-- satisfied XIV client
“...we tested an entire module failure - all data is re-distributed in under 6 hours...only 3-5% performance degradation during rebuild...”
-- excited XIV client
“Not only did XIV meet our expectations, it greatly exceeded them...”
In this blog post, I hope to set the record straight. It is not my intent to embarrass anyone in particular, so instead I will focus on a fact-based approach.
Fact: IBM has sold THOUSANDS of XIV systems
XIV is "proven" technology with thousands of XIV systems in company data centers. And by systems, I mean full disk systems with 6 to 15 modules in a single rack, twelve drives per module. That equates to hundreds of thousands of disk drives in production TODAY, comparable to the number of disk drives studied by [Google], and [Carnegie Mellon University] that I discussed in my blog post [Fleet Cars and Skin Cells].
Fact: To date, no customer has lost data as a result of a Double Drive Failure on XIV storage system
This has always been true, both when XIV was a stand-alone company and since the IBM acquisition two years ago. When examining the resilience of an array to any single or multiple component failures, it's important to understand the architecture and the design of the system and not assume all systems are alike. At its core, XIV is a grid-based storage system. IBM XIV does not use traditional RAID-5 or RAID-10 methods, but instead data is distributed across loosely connected data modules which act as independent building blocks. XIV divides each LUN into 1MB "chunks", and stores two copies of each chunk on separate drives in separate modules. We call this "RAID-X".
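To make the "two copies on separate drives in separate modules" rule concrete, here is a small Python sketch. It is not the XIV distribution algorithm, which is not described here; it assumes a simple pseudo-random placement on a full 15-module, 12-drives-per-module system, and the function names are hypothetical.

```python
import random

MODULES = 15
DRIVES_PER_MODULE = 12


def place_chunk(rng):
    """Pick (module, drive) for both copies of one chunk, in different modules."""
    m1, m2 = rng.sample(range(MODULES), 2)   # two distinct modules
    primary = (m1, rng.randrange(DRIVES_PER_MODULE))
    secondary = (m2, rng.randrange(DRIVES_PER_MODULE))
    return primary, secondary


def place_lun(size_mb, seed=0):
    """Place every 1MB chunk of a LUN; returns one (primary, secondary) per chunk."""
    rng = random.Random(seed)
    return [place_chunk(rng) for _ in range(size_mb)]


layout = place_lun(size_mb=1000)

# No chunk ever has both copies in the same module
print(all(p[0] != s[0] for p, s in layout))
```

Because the two copies never share a module, losing a single drive, or even a whole module, always leaves one good copy of every chunk.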
Spreading all the data across many drives is not unique to XIV. Many disk systems, including EMC CLARiiON-based V-Max, HP EVA, and Hitachi Data Systems (HDS) USP-V, allow customers to get XIV-like performance by spreading LUNs across multiple RAID ranks. This is known in the industry as "wide-striping". Some vendors use the terms "metavolumes" or "extent pools" to refer to their implementations of wide-striping. Clients have coined their own phrases, such as "stripes across stripes", "plaid stripes", or "RAID 500". It is highly unlikely that an XIV will experience a double drive failure that ultimately requires recovery of files or LUNs, and is substantially less vulnerable to data loss than an EVA, USP-V or V-Max configured in RAID-5. Fellow blogger Keith Stevenson (IBM) compared XIV's RAID-X design to other forms of RAID in his post [RAID in the 21st Centure].
Fact: IBM XIV is designed to minimize the likelihood and impact of a double drive failure
The independent failure of two drives is a rare occurrence. More data has been lost from hash collisions on EMC Centera than from double drive failures on XIV, and hash collisions are also very rare. While the published worst-case time to re-protect from a 1TB drive failure for a fully-configured XIV is 30 minutes, field experience shows XIV regaining full redundancy on average in 12 minutes. That makes a second drive failure during the exposure window 40 times less likely than during the typical 8-10 hour rebuild window of a RAID-5 configuration.
A lot of bad things can happen in those 8-10 hours of traditional RAID rebuild. Performance can be seriously degraded. Other components may be affected, as they share cache, are connected to the same backplane or bus, or are co-dependent in some other manner. An engineer supporting the customer onsite during a RAID-5 rebuild might pull the wrong drive, thereby causing the double drive failure they were hoping to avoid. Having IBM XIV rebuild in only a few minutes addresses this "human factor".
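The 40x figure is simply the ratio of the two exposure windows. The sketch below assumes the chance of a second, independent failure is proportional to how long the system is unprotected, which is a simplification.

```python
xiv_minutes = 12              # average time to regain full redundancy on XIV
raid5_hours = (8, 10)         # typical RAID-5 rebuild window

# How many times longer the RAID-5 window is, for each end of the range
ratios = [hours * 60 / xiv_minutes for hours in raid5_hours]

print(ratios)
```

The low end of the range, 8 hours, gives the 40x quoted above; a 10 hour rebuild would make it 50x.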
In his post [XIV drive management], fellow blogger Jim Kelly (IBM) covers a variety of reasons why storage admins feel double drive failures are more than just random chance. XIV avoids load stress normally associated with traditional RAID rebuild by evenly spreading out the workload across all drives. This is known in the industry as "wear-leveling". When the first drive fails, the recovery is spread across the remaining 179 drives, so that each drive only processes about 1 percent of the data. The [Ultrastar A7K1000] 1TB SATA disk drives that IBM uses from HGST have a specified 1.2 million hours mean-time-between-failures [MTBF], which would average about one drive failing every nine months in a 180-drive XIV system. However, field experience shows that an XIV system will experience, on average, one drive failure per 13 months, comparable to what companies experience with more robust Fibre Channel drives. That's innovative XIV wear-leveling at work!
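Here is where the "nine months" comes from, as a quick Python sketch. It assumes a constant failure rate, so that 180 drives fail 180 times as often as one, and it uses an average month of 730 hours.

```python
MTBF_HOURS = 1_200_000            # specified MTBF per drive
DRIVES = 180                      # fully configured XIV
HOURS_PER_MONTH = 365 * 24 / 12   # 730 hours in an average month

hours_between_failures = MTBF_HOURS / DRIVES
months_between_failures = hours_between_failures / HOURS_PER_MONTH

# Share of the rebuild work each of the remaining 179 drives handles
rebuild_share_pct = 100 / (DRIVES - 1)

print(round(months_between_failures, 1), round(rebuild_share_pct, 2))
```

The per-drive rebuild share works out to a bit over half a percent, in line with the rounded "about 1 percent" above.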
Fact: In the highly unlikely event that a DDF were to occur, you will have full read/write access to nearly all of your data on the XIV, all but a few GB.
Even though it has NEVER happened in the field, some clients and prospects are curious what a double drive failure on an XIV would look like. First, a critical alert message would be sent to both the client and IBM, and a "union list" is generated, identifying all the chunks in common. The worst case on a 15-module XIV fully loaded with 79TB data is approximately 9000 chunks, or 9GB of data. The remaining 78.991 TB of unaffected data are fully accessible for read or write. Any I/O requests for the chunks in the "union list" will have no response yet, so there is no way for host applications to access outdated information or cause any corruption.
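Conceptually, the "union list" is the set of chunks whose two copies both sit on the two failed drives. The Python sketch below illustrates that with a hypothetical four-chunk map and made-up drive names, then repeats the worst-case arithmetic from the paragraph above in decimal units.

```python
CHUNK_MB = 1


def union_list(chunk_map, failed_a, failed_b):
    """Return the ids of chunks whose two copies are both on the failed drives."""
    failed = {failed_a, failed_b}
    return [cid for cid, copies in chunk_map.items() if set(copies) == failed]


# Hypothetical map: chunk id -> (drive holding copy 1, drive holding copy 2)
chunk_map = {
    0: ("d1", "d7"),
    1: ("d7", "d9"),
    2: ("d9", "d1"),
    3: ("d2", "d7"),
}
affected = union_list(chunk_map, "d1", "d7")

worst_case_gb = 9000 * CHUNK_MB / 1000      # 9000 chunks of 1MB each
remaining_tb = 79 - worst_case_gb / 1000    # data still fully accessible

print(affected, worst_case_gb, round(remaining_tb, 3))
```

Chunks 1, 2 and 3 each lose one copy but still have the other, so only chunk 0 lands on the list.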
(One blogger compared losing data on the XIV to drilling a hole through the phone book. Mathematically, the drill bit would be only 1/16th of an inch, or 1.60 millimeters for you folks outside the USA. Enough to knock out perhaps one character from a name or phone number on each page. If you have ever seen an actor in the movies look up a phone number in a telephone booth then yank out a page from the phone book, the XIV equivalent would be cutting out 1/8th of a page from an 1100 page phone book. In both cases, all of the rest of the unaffected information is fully accessible, and it is easy to identify which information is missing.)
If the second drive failed several minutes after the first drive, the process for full redundancy is already well under way. This means the union list is considerably shorter or completely empty, and substantially fewer chunks are impacted. Contrast this with RAID-5, where being 99 percent complete on the rebuild when the second drive fails is just as catastrophic as having both drives fail simultaneously.
Fact: After a DDF event, the files on these few GB can be identified for recovery.
Once IBM receives notification of a critical event, an IBM engineer immediately connects to the XIV using the remote service support method. There is no need to send someone physically onsite; the repair actions can be done remotely. The IBM engineer has tools from HGST to recover, in most cases, all of the data.
Any "union" chunk that the HGST tools are unable to recover will be set to "media error" mode. The IBM engineer can provide the client a list of the XIV LUNs and LBAs that are on the "media error" list. From this list, the client can determine which hosts these LUNs are attached to, and run a file scan utility against the file systems that these LUNs represent. Files that get a media error during this scan will be listed as needing recovery. A chunk could contain several small files, or the chunk could be just part of a large file. To minimize time, the scans and recoveries can all be prioritized and performed in parallel across host systems zoned to these LUNs.
As with any file or volume recovery, keep in mind that these might be part of a larger consistency group, and that your recovery procedures should make sense for the applications involved. In any case, you are probably going to be up-and-running in less time with XIV than recovery from a RAID-5 double failure would take, and certainly nowhere near the "beyond repair" scenario that other vendors might have you believe.
Fact: This does not mean you can eliminate all Disaster Recovery planning!
To put this in perspective, you are more likely to lose XIV data from an earthquake, hurricane, fire or flood than from a double drive failure. As with any unlikely disaster, it is better to have a disaster recovery plan than to hope it never happens. All disk systems that sit on a single datacenter floor are vulnerable to such disasters.
For mission-critical applications, IBM recommends using disk mirroring capability. IBM XIV storage system offers synchronous and asynchronous mirroring natively, both included at no additional charge.
Well it's Tuesday again, and you know what that means? IBM announcements! Many of the announcements were made by IBM Executives at the [IBM Pulse 2014 conference].
IBM BlueMix is the newest cloud offering from IBM, providing a Platform-as-a-Service (PaaS) offering based on the Cloud Foundry open source project that promises to deliver enterprise-level features and services that are easy to integrate into cloud applications.
This week, my fifth-line manager Tom Rosamilia, IBM Senior Vice President IBM Systems & Technology Group and Integrated Supply Chain, made two announcements at Pulse. First, in addition to x86-based servers, SoftLayer will also offer POWER-based servers to run AIX, IBM i and [Linux on POWER] applications.
Second, SoftLayer will support PureApplication Patterns of Expertise. What is a pattern of expertise? It can be as simple as a virtual machine encapsulated in [Open Virtual Format], to more dynamic architectures, packaged with required platform services, that are deployed and managed by the system according to a set of policies.
Patterns simplify and automate tasks across the lifecycle of the application. Customers and partners alike are [seeing significant reductions in cost and time] across the application lifecycle with the deployment of a PureApplication System.
Also, this week at Pulse, Robert LeBlanc, IBM Senior Vice President of Software and Cloud Solutions, announced [IBM plans to Acquire Cloudant] which offers an open, cloud Database-as-a-Service (DBaaS) that helps organizations simplify mobile, web app and big data development efforts.
When I introduced [SmartCloud Virtual Storage Center] back in October 2012, I mentioned that it was a great solution for large enterprises that have all of their disk behind SAN Volume Controller (SVC).
To reach smaller accounts, IBM has announced two new offerings:
IBM SmartCloud Virtual Storage Entry for customers that have less than 250TB of disk behind two or four SVC nodes. It is priced per terabyte, by the amount of capacity that is virtualized.
IBM SmartCloud Virtual Storage for Storwize Family for customers that have other Storwize family products (Storwize V7000 or V5000, for example). It is priced per the number of storage enclosures that are managed by the Storwize family hardware.
(However, I was speaking to various clients in Winnipeg, Canada Tuesday and Wednesday this week, so marketing moved the announcement date to today to accommodate my schedule. Sometimes, being the #1 most influential IBM employee in storage comes in handy!)
Here, then, is a quick review of the storage portion of today's announcements.
IBM FlashSystem 840
The [IBM FlashSystem 840] offers twice the capacity of its predecessors, the 810 and 820, with up to 48TB in a dense 2U package.
(Quick recap of previous models: Both the FlashSystem 810 and 820 supported ECC-protected memory and Variable-striped RAID (VSR). The [FlashSystem 810] supported RAID-0 striped across the modules, and the [FlashSystem 820] supported two-dimensional 2D-RAID across modules for higher availability. Fellow blogger Jim Kelly (IBM) on his Storage Buddhist blog has a great post on this: [IBM FlashSystem: Feeding the Hogs].)
The new FlashSystem 840 in effect replaces both, so you can choose RAID-0 striping or 2D-RAID, along with your ECC-protected memory and Variable-striped RAID. It offers hot-swappable Flash modules, redundant components, and non-disruptive concurrent code load (CCL).
The FlashSystem 840 also introduces military-grade AES-XTS 256 bit encryption to provide added protection to your data.
For host attachment, you have some great choices: 16Gb/8Gb/4Gb auto-negotiated Fibre Channel (FCP), 40Gb InfiniBand QDR, and 10Gb FCoE. Whatever you decide, you get 90 microsecond writes, and 135 microsecond reads.
Since its introduction just over a year ago, IBM has sold FlashSystem to over 1,000 clients! For more on how this compares to other all-flash arrays, read my previous post about [IBM FlashSystem].
Adding SAN Volume Controller provides some key advantages, including Real-time compression, Thin provisioning, FlashCopy point-in-time copies, Stretched Cluster support, Easy Tier sub-LUN automated tiering, and remote copy services like Metro Mirror (synchronous) and Global Mirror (asynchronous).
Adding the SVC also changes the host attachment options: 8Gb/4Gb/2Gb Fibre Channel (FCP), 1Gb and 10Gb iSCSI, and 10Gb FCoE. Depending on the options and features you choose, the SVC layer adds a modest 60 to 100 microseconds to each read and write.
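Putting the two sets of latency figures together gives a rough end-to-end range. This Python sketch assumes the SVC overhead simply adds to the FlashSystem 840 latency, which is my simplification rather than a published measurement.

```python
flash_us = {"write": 90, "read": 135}   # FlashSystem 840 latencies, in microseconds
svc_added_us = (60, 100)                # range of overhead added by the SVC layer

# (best case, worst case) total latency for each operation
total_us = {
    op: (base + svc_added_us[0], base + svc_added_us[1])
    for op, base in flash_us.items()
}

print(total_us)
```

Even at the high end, that keeps reads and writes well under a quarter of a millisecond.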
Each SVC node dedicates four of its six cores, and 2GB of its 24GB cache, to use with compression. Those interested in beefing up compression performance, either with FlashSystems or with any other disk, can choose the "Compression Hardware Upgrade Boosts Base I/O Efficiency" (affectionately known as the CHUBBIE) RPQ 8S1296 for SVC systems with software version 18.104.22.168 or higher. Basically, this RPQ adds another 6-core CPU and another 24GB of cache, so that each node can dedicate 8 cores for compression, and 26GB of cache for compression processing. Initial test results show this can increase performance 3x!
IBM Network Advisor
The [IBM Network Advisor v12.1] management software provides comprehensive management for data, storage and converged networks. This single application can deliver end-to-end visibility and insight across different network types--it supports Fibre Channel SANs (including Gen 5 Fibre Channel platform), IBM FICON and IBM b-type SAN FCoE networks--and provides new features to manage your Brocade and IBM b-type SAN switches.
Cisco MDS 9710 Multilayer Director
The [Cisco MDS 9710 Multilayer Director] is mainframe-ready, with full support for System z FICON and Fibre Channel protocol (FCP) environments. This director supports eight module slots for a maximum of 384 ports.
Well it's Tuesday again, and you know what that means? IBM Announcements!
You might be thinking, didn't IBM just have a [huge storage announcement October 8, 2013]? You would be right! IBM's $1B additional investment in Storage has been like a shot of adrenaline in getting new features and functions out sooner to our clients.
DS8870 Disk System Release 7.2
New IBM POWER7+ controllers. The previous models of DS8870 were based on the POWER7 controllers, and these new models have POWER7+ processors. This change enhances the performance across the board, from mainframe to distributed systems, from sequential to random. Customers with existing POWER7-based models will be able to do an MES upgrade to the new POWER7+ next year.
For comparison with older DS8000 models, here are some internal IBM measurements we took for Database workloads on both z/OS (mainframe) and Distributed systems with typical 70% read, 30% write and 50% cache hit:
[Table: IBM Internal Measurements (thousands of IOPS); DB Distributed systems]
New 1.2TB (10K RPM) and 4TB (7200 RPM) self-encrypting enterprise drives (SED). This is a 33% boost in capacity over the 900GB and 3TB drives previously available. As with all the other drives in the DS8870, these new drives include the encryption chip right on the drive itself, offering encryption with scalability.
Improved security. Release 7.2 will support the U.S. National Institute of Standards and Technology [NIST.gov] 800-131A specification, raising the 96-bit encryption to the required 112 bits on the customer IP network. This involves updates to the security firmware, management software and digital signatures on code loads.
Metro Mirror enhancement for System z. By avoiding serial conflicts of updated blocks, this enhancement can boost performance up to 100 percent when using Metro Mirror with z/OS applications on System z mainframes.
Easy Tier™ reporting and graphs to determine optimal mix. Now you can see for yourself how sub-LUN automated tiering is helping your applications.
Easy Tier Workload Categorization
New workload visuals help clients and IBM technical specialists compare activity across tiers within and across pools to help determine optimal drive mix for current workloads
Easy Tier Data Movement Daily Report
New Easy Tier summary report every 24 hours illustrating data migration activity (5-minute intervals) can help visualize migration types and patterns for current workloads
Easy Tier Workload Skew Curve
Shows skew of all workloads across the system in a graph to help clients visualize and accurately tier configurations when adding capacity or a new system. Clients can import data into Disk Magic.
All-Flash Optimization. Yesterday, in my post [IBM FlashSystem versus EMC XtremeIO], I mentioned that any hybrid systems like the IBM Storwize V7000 that can support a mix of SSD and HDD can obviously be configured as SSD-only. Apparently, that was not obvious to many readers, so I apologize. For the DS8870, you can configure an all-Flash (SSD only) configuration, and Release 7.2 added some optimization when configured with SSD only.
HDD configuration: 1,056 drives 15K @146GB in RAID-10
All-Flash configuration: 224 drives SSD @400GB in RAID-5
Usable capacity: same, 72 TB
Performance: 70 percent faster
Floor space: 33 percent less required
Energy: 62 percent less consumed
(Note: Performance results based on measurements and projections using IBM benchmarks in a controlled environment.)
OpenStack™ support. DS8870 now offers the [OpenStack Cinder] interface for block LUN allocations in OpenStack environments. IBM is a Platinum sponsor of OpenStack, and OpenStack is the strategic platform for IBM private and hybrid clouds.
XIV Storage System
Following on the heels of the [XIV enhancements announced], IBM has now added 800GB Solid State Drives (SSD) as Read cache for its 4TB drive-based models.
DCS3860 Disk System
The DCS3860 is the next generation of the DCS3700 disk system. Designed with Linux-x86 servers in mind, the system offers direct SAS host attachment, 24GB of cache, and 60 drives in a compact 4U drawer. Like its predecessor, the drives are stored on five pull-out trays, with twelve hot-swappable drives per tray. You can add up to five more expansion units, with 60 drives each, for a total of 360 drives in 24U rack space.
These new models will help our clients deploy new workloads and consolidate existing workloads.
Each resident presented at least six proposals for blog post ideas. A proposal included a title and short description of what it would entail. Titles had to be less than 70 characters, and the short descriptions were typically just a few sentences.
These were presented to the entire team, and we picked them apart, suggested better wording for the titles, or different ways to approach the topic.
"I treat others respectfully, attacking ideas and not people. I also welcome respectful disagreement with my own ideas.
I believe in intellectual property rights, providing links, citing sources, and crediting inspiration where appropriate.
I disclose my material relationships, policies and business practices. My readers will know the difference between editorial, advertorial, and advertising, should I choose to have it. If I do sponsored or paid posts, they are clearly marked.
When collaborating with marketers and PR professionals, I handle myself professionally and abide by basic journalistic standards.
I always present my honest opinions to the best of my ability.
I own my words. Even if I occasionally have to eat them."
Words to live by.
The residents spent most of the day working on our blogs from the proposals that were approved. The target was around 400 to 600 words in length, with one or two stock photos.
IBM is the #1 vendor for Social Business tools, so it makes sense for us to use our own stuff to facilitate the submission process. The residents submit their blog posts to IBM Connections as an activity in the Cloud Social Media Residency community. All of the resources we used, and all the presentations we saw, are here in the community.
As an incentive, prizes were given out to those who submitted the most posts by end of the day.
We were given certificates for completing the class, and a "Redbooks Thought Leader" emblem to put on our blog.
Ryan Boyles took a group photo! If it seems that the photo is slightly askew, it is to make me look taller. Yes, I could have used GIMP to fix the orientation, but why bother? I look tall! Woo hoo! I will have to remember this technique for future group photos.
ITSO Cloud Social Media Residency, Oct 2013. Photo by Ryan Boyles.
Lastly, I would like to thank Vasfi, Tamikia, Hillary, Caroline, Ric, Jane, LeeAnne, Tina, Karen, Michael, Shelbee, Farzad, Stewart, Arun, Eric, Chris, Hans, Odilon, Mohsin, Wolfgang and the rest of the ITSO team for a wonderful job organizing this week!
The gondolier propelled the boat with an oar, and stopped rowing a few times to belt out beautiful Italian songs.
Truly impressed, I asked the gondolier how long the training was for this job. "Six weeks!" he answered. Wow! Where can I learn to sing like that in six weeks?
He clarified. No, the Venetian hotel hires competent singers, and then spends six weeks to teach them to row the gondola. Duh!
I asked Vasfi Gucer, our ITSO project leader for this residency, why there were so many Cloud topics on the agenda for this social media training. He explained it was just as important to emphasize "why" people need to be passionate about Cloud, in addition to the "what" and "how" of blogging.
This reminded me of this quote from fellow author Hugh MacLeod. I highly recommend his series of books.
"Blogging requires passion and authority. Which leaves out most people."
--- Hugh MacLeod.
Vasfi had invited Cloud experts who already have the authority to blog, and the point of this residency is for the residents to become passionate in sharing their expertise.
Here are some of the people that spoke on Cloud:
Ric Telford, IBM VP of Cloud Services
Ric Telford shared with us IBM's point of view of where the Cloud industry is going. He has been in this job position since 2009, and shared with us the history of how the IBM Cloud business has evolved in the past four years.
Jane Munn, IBM VP Business Line Executive for Cloud hardware
As the Center of Competency on Cloud for all 12 IBM Executive Briefing Centers in my group, I had to report to Jane Munn on a frequent basis. I was pretty candid on those calls on what we should change, and I am glad to see that many of my suggestions have been implemented, or are being considered for 2014.
Michael Fork, IBM Lead Architect for Hosted Private Cloud
Michael Fork gave two great presentations, one on [IBM SoftLayer] Cloud services, and the second on IBM's support of open standards, such as [OpenStack] and Cloud Foundry.
Hans Zai, IBM Cloud Service Line Leader; and Odilon Magroski Goulart Junior, IBM Technical Solution Architect
All the residents had to present in front of the class on their expertise. Hans and Odilon presented their work on [IBM SmartCloud for SAP Applications]. Hans is from Sweden, and Odilon from Brazil, so their perspectives on this were quite interesting.
When IBM renamed LotusLive to [SmartCloud for Social Business], I thought this would be the naming convention for all of our Software-as-a-Service (SaaS) offerings.
But SmartCloud for SAP Applications is a Platform-as-a-Service, providing the SAP environment as a platform, which allows clients to then deploy their customized SAP applications on this platform.
What did I present on for my "Share your expertise" session? IBM System Storage, of course! Storage is a critical part of Cloud!
So, my gentle readers, what topics do you want me to write about that combine Storage and Cloud? Enter your suggestions in the comments below.
"SmartCloud Enterprise Object Storage is switching from 3rd-party Nirvanix to its internal IBM Softlayer. This one involves more in-depth explanation which I will save for another post."
It's time to make good on that promise! Here is a quick diagram to help visualize the agreement (with sincere apologies to [Jessica Hagy]!) but not to scale, of course!
Last month, Nirvanix announced it was shutting down October 15. Here was the exact wording from their website:
For the past seven years, we have worked to deliver cloud storage solutions. We have concluded that we must begin a wind-down of our business and we need your active participation to achieve the best outcome.
We are dedicating the resources we can to assisting our customers in either returning their data or transitioning their data to alternative providers who provide similar services including IBM SoftLayer, Amazon S3, Google Storage or Microsoft Azure.
We have an agreement with IBM, and a team from IBM is ready to help you. In addition, we have established a higher speed connection with some companies to increase the rate of data transfer from Nirvanix to their servers.
We are working hard to have resources available through October 15 to assist you with the transition process, and have set up a rapid response team that can be reached at (619) 764-5650 [press 2 for customer support during normal business hours] or (888) 791-0365 after business hours, or contact firstname.lastname@example.org.
Please check back to this web page periodically for status updates.
We thank you for your support and patience.
The Nirvanix team
UPDATE ON NIRVANIX
On October 1, 2013, Nirvanix voluntarily sought Chapter 11 bankruptcy protection in order to pursue all alternatives to maximize value for its creditors while continuing its efforts to provide the best possible transition for customers."
In response, IBM put out this press release:
"In light of reports that Nirvanix has decided to soon cease operations, IBM is moving quickly to help clients of our Nirvanix-based Object Storage offering to move their data to other solutions such as the robust and highly scalable IBM SoftLayer Object Storage or IBM's persistent storage solution."
To understand why this is a big deal, consider the difference between Cloud Computing and Cloud Storage. Cloud Computing is like buying gasoline at your favorite gas station. If the station is closed, you can just drive a few blocks to another gas station. The ease with which customers can switch from one Cloud Compute provider to another is part of the appeal, forcing Cloud Compute providers to be extremely efficient at what they do to offer the lowest price.
Cloud Storage is completely different, more like a safety-deposit box at the bank, or a storage unit to hold all of your boxes of tax receipts. Now if you have a small amount stored away in a safety-deposit box, this is probably just a minor inconvenience. You can take out the contents and store them at home, or find another bank and open a new safety-deposit account.
However, if you have a lot stored in a storage unit, it may be more difficult.
For example, I am in the process of remodeling my home, so I have moved a lot of my stuff to a 400 cubic-foot storage unit during the process. There were a variety of storage units within miles of my home. Some were fully air-conditioned, some offered 24x7 access, while others were not air-conditioned, or only allowed access during business hours. It has taken me several weekends to box up my things and move them to the storage unit. My car only holds 12-14 boxes at a time, so many trips were involved.
If the Storage Unit company told me that they were closing down, and that I would have to move all of these boxes to another facility, I would have to hire moving professionals to do all the work. This is in effect what companies need to do with their data. They must take the data off Nirvanix systems, and either store it in-house, or find another cloud storage provider.
IBM offers three options:
IBM [SoftLayer Object Storage] offering which is an OpenStack Swift-based Object Storage solution. IBM's SoftLayer object based storage solution provides a robust, highly scalable solution, with the ability to retrieve and leverage data the way you want to, and grow when you need. You can choose to store your objects in Dallas, Texas (USA), Amsterdam (Europe), and/or Singapore (Asia).
SCE persistent storage solution where you will be able to manage storage resources by attaching an instance during the instance creation process.
An alternate storage solution of your choice. Yes, IBM will help you move your data to Amazon, Google, Microsoft, etc. While technically competitors, IBM also has strategic partnerships in place with each to facilitate the movement.
These options are not just for IBM's SmartCloud Enterprise Object Storage clients. Nirvanix has named IBM the savior for all of its other non-IBM customers as well. Why IBM? Well, IBM is one of the most recognized names in the IT industry. Not just one of the biggest Cloud Service providers, IBM also has an army of professionals in its Global Services division to help.
Well, it's Tuesday again, and you know what that means? Announcements!
Today, IBM's announcements are designed to change the economics of big data analytics, cloud, mobile and social media.
[Software Defined Environments] require [Software Defined Storage], combining storage virtualization with open, extensible, industry-led interfaces. The IBM SmartCloud Virtual Storage Center (VSC) and IBM Storwize Family are the market leaders in storage virtualization. SmartCloud VSC, Storwize Family, and XIV support the industry-led OpenStack interfaces.
Here are some of the announcements today:
IBM Storwize® Family
The [SAN Volume Controller] was first introduced 10 years ago, in 2003. Today, clients enjoy these storage virtualization capabilities across a variety of offerings, known collectively as the [IBM Storwize Family].
IBM adds a new member to the Storwize Family. In addition to SAN Volume Controller, Storwize V7000, Storwize V7000 Unified, Flex System V7000, Storwize V3700, and Storwize V3500, IBM is announcing the [IBM Storwize V5000]. Here's a quick side-by-side comparison:
Scalability: Maximum configuration
Storwize V7000: Four control enclosures clustered together, 36 expansion enclosures, 960 drives, 64GB cache
Storwize V5000: Two control enclosures clustered together, 12 expansion enclosures, 336 drives, 32GB cache
Storwize V3700: One control enclosure, 4 expansion enclosures, 120 drives, 8GB cache upgradeable to 16GB, optional Turbo performance

Host interfaces
Storwize V7000: 8Gbps FCP and 1GbE iSCSI standard; optional 10GbE iSCSI/FCoE. Can upgrade to Storwize V7000 Unified by adding NAS File Modules to add support for CIFS, NFS, HTTPS, SCP and FTP protocols
Storwize V5000: 1GbE iSCSI, 6Gbps SAS, 8Gbps FCP and 10GbE iSCSI/FCoE standard
Storwize V3700: 1GbE iSCSI, 6Gbps SAS standard; optional 8Gbps FCP and 10GbE iSCSI/FCoE

Storage virtualization/Data Migration
Storwize V7000: Internal virtualization, Data Migration standard; optional external virtualization
Storwize V5000: Internal virtualization, Data Migration standard; optional external virtualization
Storwize V3700: Internal virtualization, Data Migration (external devices can be attached to ingest data only) standard

Remote replication
Storwize V7000: optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes
Storwize V5000: optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes
Storwize V3700: optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes

Sub-LUN Automated Tiering
Storwize V7000: Easy Tier standard
Storwize V5000: optional Easy Tier
Storwize V3700: optional Easy Tier

Integration
Storwize V7000: VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
Storwize V5000: VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
Storwize V3700: VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
Storwize V7000, V5000 and V3700 now support larger 800GB SSD drives. Previously, they only supported SSD drives up to 400GB.
VMware 5.5 and VASA support. VMware ships every release with built-in support for all members of the IBM Storwize Family, but it bears repeating here just in case you were interested. IBM is a leading reseller of VMware, so it makes sense for IBM's storage devices to support everything that VMware customers could possibly want in terms of VMware integration. IBM SmartCloud VSC, Storwize Family, and XIV Storage System are no exception!
New IP-based replication driving lower costs for replication. Previously, Metro Mirror, Global Mirror and Global Mirror with Change Volumes were FCP-based, and many clients bought extra equipment to run FCP packets over long-distance IP (known as FCIP). Now, clients can replicate across distance over native IP connections, without FCIP routers.
In my blog posts covering [Edge 2013 - Day 3 Solution Center], I mentioned that IBM has certified Bridgeworks' SANSlide 150SVCV7K unit that provides a Riverbed-like WAN Optimization for long-distance replication. Now, IBM has fully integrated Bridgeworks' SANSlide network optimization technology directly into Storwize Family!
All members of the Storwize Family will support 1GbE remote disk replication, and this will be extended to 10GbE support at a later date.
The [Storwize V3700] is now offered in 48-volt Direct Current (DC) models with [NEBS/ETSI compliance] for telecommunications companies that require this, and now supports 4TB drives.
When we introduced [IBM SmartCloud Storage Access] in February, it was to offer self-service, automated policy-based provisioning for file storage on the SONAS and Storwize V7000 Unified. Today, we add self-service, automated policy-based provisioning for block storage. The first products to be supported are SmartCloud VSC, the entire Storwize Family, and XIV Storage Systems. In addition to the web portal, the Storage Cloud Integration API enables 3rd party ISV applications to support SmartCloud Storage Access.
Storage admins will no longer need to be bothered with tedious provisioning requests, freeing up more time for them to work on more strategic, transformational projects.
[IBM SmartCloud Virtual Storage Center] was introduced last year, combining SAN Volume Controller, Tivoli Storage Productivity Center, Tivoli FlashCopy Manager and the Storage Analytics Engine into a single license. The initial offering provided the cross-platform "Tiered Storage Optimization" that recommended which LUNs should be moved from one disk array to another to balance performance against cost. Today, IBM is first to market with an automated version, moving LUNs automatically from one disk array to another.
[SmartCloud Enterprise Object Storage] is switching from 3rd-party Nirvanix to its internal IBM Softlayer. This one involves more in-depth explanation which I will save for another post.
IBM XIV Storage System
As part of the [due diligence] team for IBM to acquire the XIV company back in 2007, I am glad to see how this system has evolved since then. I have certainly [blogged quite a bit on XIV] over the years.
Earlier this year, IBM introduced Hyper-Scale Mobility which allows the storage admin to move LUNs non-disruptively from one XIV frame to another. Today, Hyper-Scale Cross-system Consistency Groups allow you to have snapshots of collections of volumes across multiple XIV frames, up to 3PB of capacity snapped at the same instant in time.
The current supported releases of OpenStack are Folsom and Grizzly, and the newest release is Havana. XIV now offers OpenStack Cinder interfaces at the Havana level.
XIV now offers a RESTful API for monitoring and provisioning. [REST] is a de-facto standard in web services and cloud implementations. XIV's RESTful API is a programmatic management interface that follows REST principles:
Resources are identified by global identifiers (URIs)
Data is sent as JSON/XML over HTTP
Manipulations of resources are done by HTTP methods (GET, PUT, POST, DELETE)
The interface is Stateless and Hypertext driven
The interface is universally supported, programming language and platform agnostic. For monitoring, the following GET example could show the list of volumes on a particular XIV storage system:
For provisioning, the following PUT example could create "vol1" on that XIV storage system.
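As a minimal sketch of what a GET and a PUT in this style might look like, the Python below only builds the two HTTP requests and does not send them. The host name, the `/volumes` resource path, and the JSON field names (`name`, `size_gb`, `pool`) are all assumptions made for illustration, not the documented XIV API.

```python
import json
import urllib.request

# Hypothetical base URL and resource paths; the real XIV REST paths may differ.
BASE_URL = "https://xiv.example.com/api"


def build_list_volumes_request(base_url=BASE_URL):
    """GET: ask for the collection of volumes (monitoring)."""
    return urllib.request.Request(
        f"{base_url}/volumes",
        headers={"Accept": "application/json"},
        method="GET",
    )


def build_create_volume_request(name, size_gb, pool, base_url=BASE_URL):
    """PUT: create a named volume resource at its own URI (provisioning)."""
    # The JSON field names below are illustrative assumptions.
    body = json.dumps({"name": name, "size_gb": size_gb, "pool": pool})
    return urllib.request.Request(
        f"{base_url}/volumes/{name}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    requests_to_show = (
        build_list_volumes_request(),
        build_create_volume_request("vol1", 17, "pool1"),
    )
    for req in requests_to_show:
        # Show the HTTP method and the URI that identifies the resource.
        print(req.get_method(), req.full_url)
```

Passing either request object to `urllib.request.urlopen` would actually send it. Each request carries everything the server needs, which is what "stateless" means in the list of principles above.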
IBM SmartCloud Storage Access to allow self-service provisioning (see the SmartCloud section above).
Data-at-Rest encryption, using Self-Encrypting Drives (SED). XIV will encrypt the data, and IBM's Security Key Lifecycle Manager (SKLM) or Tivoli Key Lifecycle Manager (TKLM) will manage the encryption keys. If you have an XIV already, you may already have SED drives ready to use! The XIV will also encrypt the data on the SSD drives used for persistent read-cache.
Other new and enhanced offerings
For our mainframe clients, the Virtualization Engine TS7700 now supports 60 percent more capacity, and can now support 8Gbps FICON attachment.
N series N3000, N6000 and N7000 support new disk drive types and sizes, as well as Data OnTap 8.2 Cluster mode. You can now lash together up to 16 N series systems into a SONAS-like single system image.
Cisco MDS 9710 Multilayer Director for IBM® System Networking is a new 16 Gbps SAN director with robust security to support multi-tenancy cloud configurations.
Whew! That is a lot of things to discuss in one post. Since they were all related, I did not want to split it up into parts.
Wrapping up my coverage of the [IBM Edge 2013] conference, I have some photos of people I ran into at the Solutions Center.
Leslie Hattig and Lisa Stone, both account managers for [MarkIII Systems], an IBM Business Partner located in Houston, TX. These ladies are inseparable BFFs, I have never seen one without the other! I first met them at the [Storage Symposium in Chicago] back in 2009.
Stacy Tabor was our Community Manager for the [Storage Community]. This community covers IT Storage challenges, hot topics, architecture and solutions. You'll find industry news, videos, blog discussion threads on timely topics, exclusive analyst white papers and experts opinions. I am a frequent contributor, myself, and thank Stacy for her past service. She helped run a "Social Media Hour" at Edge for all the bloggers like me to get to meet each other.
I could not resist getting a picture with this Las Vegas [Cirque du Soleil] dancer. This was an invitation-only event, sponsored by IBM Business Partners, that I was invited to during the Social Media Hour. (See, it pays to be social!) I think the visual effects of the flag she was waving turned out really well in the picture! And yes, in case you are wondering, that is my favorite grape-flavored beverage (GFB) in my left hand. Posing for this picture was quite the balancing act, but then I am also a certified yoga instructor, so I was able to manage!
Tanaz Sowdagar is an IBM Storage Rep for our Business Development Team. This includes finding other companies to OEM our technology and re-brand it under their own names. I have worked with Tanaz for many years, helping answer questions that potential OEM partners have about our products and technologies for this purpose.
This was Michelle, my Conference Room Monitor. Each room had one, scanning the bar-codes on each badge for all the attendees, keeping count of the number of people for each session, supporting anything the speaker needs, like getting the A/V guy to come help set up the laptop projector.
Since this was Friday, the last day of the conference, I decided to dress casually, consistent with many companies' [Casual Friday] dress code policies. I am wearing the "IBM Edge Rocks" tee-shirt given out at the concert and Solutions Center the first few nights.
Getting this shot right took several takes, as the man I handed my camera to had apparently never seen a digital camera before, did not know how to focus, and some
Finally, leaving Las Vegas, I sat next to Mrs. Joey Clark, wife of "Bulldog" Clark of the Utah band [Blammity Blam]. She also sometimes plays violin with the band. She is a newly-wed, and I am not sure if Joey is her name, or her husband's name. (Joey, if you are out there, and want me to correctly identify you, please write a comment in the section below.)
What I have learned however, is that if a beautiful girl is sitting next to me on the plane, she will either talk to me the entire flight, implying that she is single, or mention within the first 30 seconds of conversation that she is married. Sadly for me, it was the latter.
(We were both flying on to Dallas, TX, whereupon she was going to visit her parents in Florida, and I was on my way to Sao Paulo, Brazil to get stuck there amongst the protesters in what is now called the [V for Vinegar movement], but I will save that for another blog post!)
Well, that wraps up my coverage of Edge 2013. I am sorry it took so many months to cover all the material, but I did not want to have it go uncovered much longer.
Next year's [Edge 2014] is expected to be bigger and better. It will be in Las Vegas again, but this time at the Venetian Hotel, May 19-23, 2014. I plan to be there!
Monday marked the first official day of [IBM Edge 2013] conference. This is actually three conferences in one: Executive Edge for the high-level executives, Winning Edge for the Business Partners, and Technical Edge for storage administrators and IT manager/directors. I attended the latter.
The General Session was kicked off by an awesome drumbeat-heavy song performed by a band from North Carolina called [Delta Rae]. Their use of drums reminded me of Adam Ant.
Deon Newman, IBM VP of Marketing, Systems and Technology Group, North America, served as today's master of ceremonies. He was pleased to announce there were more than 4,700 attendees at this event -- representing more than 60 countries -- a huge increase over the attendance we had last year. Here are my notes of the opening General Session:
Stephen Leonard, IBM General Manager, Sales, Systems & Technology Group
Consumers expect an always-on technology experience. We, as consumers, are leaving a trail of data that is getting wider and wider every day. Data is the new "natural resource", but plentiful and never ending.
In 1996, about 29 percent of IT spend was for administration and management; today it has grown to 68 percent. Some 34 percent of IT projects deploy late.
Stephen emphasized the themes of Smarter Computing: (a) systems that are designed for the data, (b) software-defined environments, that are (c) open and collaborative.
Stephen cited a customer example from [Jaguar Land Rover], a manufacturer of sporty automobiles and rugged 4x4 vehicles. IBM developed a ["Virtual Dealership"] for them. Rather than trying to maintain additional physical bricks-and-mortar facilities, which can be expensive to staff and fill with vehicles across their wide portfolio, the virtual dealership allows prospective customers to try out vehicles through simulation. This virtual dealership could be taken to where prospective clients are, such as a sporting event or shopping mall.
Ed Walsh, IBM VP of Marketing, System Storage and Networking
Ed presented the "data economics" of all-Flash arrays. IBM recently acquired Texas Memory Systems, renamed the RamSan products to IBM FlashSystem, and committed to invest an additional $1 billion (US) in flash technologies.
On a $-per-IOPS basis, IBM FlashSystems can have a 30 percent lower total cost of ownership (TCO) than disk-based alternatives. The cost of Flash is offset by needing 17 percent fewer servers, thanks to higher CPU utilization rates, resulting in 38 percent lower software license fees. Flash is also more efficient, with 74 percent lower environmental costs, and 35 percent lower operational support costs. For many situations, Flash is the solution for poorly written software applications.
Ed also mentioned IBM's strong support for open source and open standards. Over the past 15 years, IBM has been a major contributor to open source efforts like Linux, Eclipse and Apache. IBM continues that tradition, with contributions to OpenStack and Hadoop.
Without going into any details, Ed also hinted that IBM announced 65 new or refreshed products in Storage, Networking and PureSystems. The details of each announcement would be explained during the break-out sessions during the week.
Charles Long, Founder and CEO of Centerline Digital
[Centerline Digital] does computer-generated animations in support of corporate marketing efforts.
(FTC disclosure: I work for IBM, and have worked closely with Centerline Digital marketing agency when I was the chief marketing strategist for System Storage back in 2006-2007. I was not paid or provided any products or services to mention any of the clients mentioned in this post.)
Charles indicates that internet technologies have converted "Analog dollars to digital pennies". Using IBM PureFlex with Storwize V7000 storage, real-time compression, and Tivoli Endpoint Manager, Centerline was able to drastically improve their business. He feels the old joke of "Better, Faster, Cheaper - Choose Any Two!" no longer applies with IBM solutions!
Ambuj Goyal, IBM General Manager, System Storage and Networking
Formerly my fifth-line manager in charge of Software and Systems, Ambuj switched to be the General Manager of System Storage and Networking group earlier this year.
In his former roles, Ambuj managed software and hardware product lines, but he feels storage is a completely different animal. In the past, clients focused on choosing the best servers, then chose their storage as an afterthought. Today, Ambuj feels that processors are now a commodity, and that storage is becoming the forethought.
Ambuj also highlighted the evolution of IBM's Software-Defined Environment:
In 2003, IBM introduced the SAN Volume Controller, a storage hypervisor. Now, over 10,000 clients enjoy the benefits of a Software-Defined Environment using SAN Volume Controller.
SmartCloud Virtual Storage Center represents the "third generation" for policy-driven management, combining SAN Volume Controller, Tivoli Storage Productivity Center, FlashCopy Manager and the Storage Analytics Engine.
IBM is trying to help people keep their business critical apps running securely, to be able to start quickly, add value and functions at scale, and to leverage all of these data-intensive solutions to help drive new business and gain customer insight.
Joseph Balsamo, VP of Platform Engineering at Prudential Insurance
While the IT department of [Prudential Insurance] is focused on the three V's -- Volume, Velocity and Variety -- Joe is more focused on solutions, status and cost. His mission was to strengthen the role of IT as a partner through business aligned services. Prudential has deployed XIV, N series, SAN Volume Controller (SVC) and Storwize V7000 disk systems, with the following results:
Reduced their $-per-IOPS by 75 percent
No additional storage administrators
85 percent utilization through thick-to-thin migrations
Reduced their $-per-MB by 50 percent
Reduced their 72-hour RPO to 15 seconds
These benefits were achieved over the past 24 months of deployment.
Paulo Carvao, IBM Vice President, North America Systems & Technology Group
Paulo is Deon Newman's boss. He presented BlueInsight, IBM's internal "Business Analytics" cloud accessible by over 200,000 users, with over 1 PB of content.
Inside IBM, the deployment of a Smarter Infrastructure has allowed for 25 percent capacity growth at flat IT budget, while using 30,000 fewer Megawatts and 103,000 fewer square feet.
Why is this significant? Today's disk writes each bit of information across 1200 atoms, and the smallest number of atoms that can retain a bit of information is 12, so sometime in the next 7 to 10 years, the improvements in magnetic bit density for disk will stop.
For silicon chips, the smallest practical feature is 7 nanometers, about 35 atoms wide. We are quickly approaching that limit also.
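To put rough numbers on that, here is a small back-of-the-envelope calculation using only the figures quoted above: 1200 atoms per bit today, a floor of 12 atoms per bit, and 7 nanometers spanning about 35 atoms. Treating the 7-to-10-year window as a steady run of density doublings is my own simplifying assumption.

```python
import math

ATOMS_PER_BIT_TODAY = 1200  # figure quoted above for today's disk
ATOMS_PER_BIT_FLOOR = 12    # smallest group of atoms said to retain a bit

# Remaining headroom in magnetic bit density before the floor is reached.
headroom = ATOMS_PER_BIT_TODAY / ATOMS_PER_BIT_FLOOR
doublings = math.log2(headroom)
print(f"headroom: {headroom:.0f}x, or about {doublings:.1f} doublings")

# Assumption: density improves by steady doublings across the whole window.
for years in (7, 10):
    pace = years / doublings
    print(f"{years} years implies one doubling every {pace:.2f} years")

# Silicon: a 7 nm feature that is about 35 atoms wide.
nm_per_atom = 7 / 35
print(f"about {nm_per_atom:.1f} nm per atom")
```

In other words, disk has roughly a factor of 100 left, which is fewer than seven doublings of density, so a doubling every year or so would use it up within the quoted window.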
I can already tell that it's going to be a busy week! Follow me on Twitter (@az990tony) and tag your posts and tweets with the #IBMedge hashtag.
Continuing this week's theme about the future, fellow blogger, published author, and futurist David Houle is coming out with a new book this month titled [Entering the Shift Age]. This is a follow-on to his book, [The Shift Age].
Since this book cites IBM studies explicitly, his PR department asked me to review it. If you are an aspiring author with a book you want me to review, and it relates to the topics my blog covers like Cloud, Big Data, storage, and the explosion of information, feel free to send me a copy!
(FTC Disclosure: I work for IBM. I was not paid by anyone to mention this book on my blog. I was provided an "Uncorrected Advanced Copy" of this book at no cost to me for this review. I do not know David Houle personally, have not read any of his prior works, nor have I ever seen him speak at public events. This post is neither a paid nor celebrity endorsement of this author, his book, nor any other books by this author.)
First, let's get a few details out of the way:
Title: Entering the Shift Age, 284 pages
Author: David Houle, futurist
Genre: Non-fiction, trends and predictions
Publisher: Sourcebooks, Inc.
Publish date: January 2013
As I mentioned in my post [Historians vs. Futurists], there is only one past, but there are many potential futures. There seem to be as many futurists out there as there are potential futures. I suspect not everyone will agree with all that David has written. However, this reminds me of one of my favorite quotes:
"When two futurists always agree, one is no longer necessary." -- old Italian adage
In his book, David asks a series of thought-provoking questions, then answers them with his views and opinions on how the future will roll out:
Is humanity now entering a new age that is different than the Information age?
If so, what should we call it?
Which forces are driving this new age?
How will this impact various aspects and institutions of society?
David feels humanity is indeed entering a new age, which he calls the Shift Age. This is driven by three forces: the shift to globalization of culture and politics, the flow of power and influence to individuals, and the acceleration of electronic connectedness.
In a sense, David is like a hunter-gatherer from the Stone age, hunting down trends and gathering ideas from others. In much the same way my compost brings renewed purpose to the rinds and pits of my fruits and vegetables, David's book does a good job paraphrasing the works of many of today's leading futurists.
David predicts the decade we are now in, the 2010's, will mark the end of the Information age, a transition period to this new era, that will lead to transformations in government, education, health, technology, and energy.
Over the past two weeks, I had time to enjoy a variety of movies. I had seen several whose stories wrapped around key moments of transition.
"Gone with the Wind", as well as the new offering "Lincoln" from Steven Spielberg. Both are set in the 1860's, the time of the [American Civil War], pitting the Industrial-age forces of the North, against the Agricultural-age economy of the South. This time saw the transition from slavery to freedom.
"Doctor Zhivago", set in the time of World War I, on the German-Russian front, as well as the Russian Revolution of 1917, and the resulting Civil War between the Red Guard and the White Army. This saw the transition from a Russian government ruled by Czars, to one ruled by the people through Communism.
"Lawrence of Arabia", also set in the time of World War I, but south in Arabia. T. E. Lawrence was able to bring several warring Arab tribes together to defeat the Turks, and was a key figure in the transition to an Arab National Council.
Some might call these completely unexpected [Black Swan] events, while others might feel they are merely fortunate (or misfortunate) sequences of events that led to inevitable social change. Has something happened, or will something happen later this decade, that will drive us to leave the Information Age?
David's previous book, The Shift Age, was published back in 2007, and a lot has happened in the past six years: a global financial melt-down recession; the Arab Spring uprisings in the Middle East; Barack Obama was elected and re-elected; man-made climate change in the form of hurricanes, tsunamis and superstorms hit various parts of the world; brush fires lit up Australia, and BP's Deepwater Horizon oil rig exploded off the Gulf coast, just to name a few.
David's new book reflects the impact of these recent events, from discussions on his [Evolutionshift] blog, to Q&A sessions he has after his public speaking presentations. For those who are not interested in the wide array of topics he covers in this one book, David also offers [a dozen different mini-eBooks] that cover specific topics like [Technology, Energy and Health].
My Rating: Moist and Flaky
Who should read this book: If you are a time-traveler from 1975 that came to this decade to learn all about what your future has in store, but can only select one book to read before you zoom back to your own time period, this would be the book I recommend.
I do not want to imply this is a quick read, or one that you can't put down once you start reading it. Just like you should not gulp down a full bottle of cheap Vodka in one sitting, this book should be read over a series of days, as I did, so that you can mull over in your mind the different points and thoughts he is trying to convey.
If you store your VMware bits on external SAN or NAS-based disk storage systems, this post is for you. The subject of the post, VM Volumes, is a potential storage management game changer!
Fellow blogger Stephen Foskett mentioned VM Volumes in his [Introducing VMware vSphere Storage Features] presentation at IBM Edge 2012 conference. His session on VMware's storage features included VMware APIs for Array Integration (VAAI), VMware Array Storage Awareness (VASA), vCenter plug-ins, and a new concept he called "vVol", now more formally known as VM Volumes. This post provides a follow-up to this, describing the VM Volumes concepts, architecture, and value proposition.
"VM Volumes" is a future architecture that VMware is developing in collaboration with IBM and other major storage system vendors. So far, very little information about VM Volumes has been released. At VMworld 2012 Barcelona, VMware highlights VM Volumes for the first time and IBM demonstrates VM Volumes with the IBM XIV Storage System (more about this demo below). VM Volumes is worth your attention -- when it becomes generally available, everyone using storage arrays will have to reconsider their storage management practices in a VMware environment -- no exaggeration!
But enough drama. What is this all about?
(Note: for the sake of clarity, this post refers to block storage only. However, the VM Volumes feature applies to NAS systems as well. Special thanks to Yossi Siles and the XIV development team for their help on this post!)
The VM Volumes concept is simple: VM disks are mapped directly to special volumes on a storage array system, as opposed to storing VMDK files on a vSphere datastore.
The following images illustrate the differences between the two storage management paradigms.
You may still be asking yourself: bottom line, how will I benefit from VM Volumes?
Well, take a VM snapshot for example. With VM Volumes, vSphere can simply offload the operation by invoking a hardware snapshot of the hardware volume. This has significant implications:
VM-Granularity: Only the right VMs are copied (with datastores, backing up or cloning the individual-VM portions of a hardware snapshot of a datastore would require more complex configuration, tools and work)
Hardware Offload: No ESXi server resources are consumed
XIV advantage: With XIV, snapshots consume no space upfront and are completed instantly.
Here's the first takeaway: With VM Volumes, advanced storage services (which cost a lot when you buy a storage array), will become available at an individual VM level. In a cloud world, this means that applications can be provisioned easily with advanced storage services, such as snapshots and mirroring.
Now, let's take a closer look at another relevant scenario where VM Volumes will make a lot of difference - provisioning an application with special mirroring requirements:
VM Volumes case: The application is ordered via the private cloud portal. The requestor checks a box requesting an asynchronous mirror. He changes the default RPO for his needs. When the request is submitted, the process wraps up automatically: Volumes are created on one of the storage arrays, configured with a mirror and RPO exactly as specified. A few minutes later, the requestor receives an automatic mail pointing to the application virtual machine.
Datastores case #1: As may be expected, a datastore that is mirrored with the special RPO does not exist. As a result, the automated workflow sets a pending status on the request, creates an urgent ticket to a VMware administrator and aborts. When the VMware admin handles that ticket, she re-assigns the ticket to the storage administrator, asking for a new volume which is mirrored with the special RPO, and mapped to the right ESXi cluster. The next day, the volume is created; the ticket is re-assigned back to the VMware admin, pointing to the new LUN. The VMware administrator follows up and creates the datastore on top of it. Since the automated workflow was aborted, the admin re-assigns the ticket to the cloud administrator, who sometime later completes the application provisioning manually.
Datastores case #2: Luckily for the requestor, a datastore that is mirrored with the special RPO does exist. However, that particular datastore is consuming space from a high performance XIV Gen3 system with SSD caching, while the application does not require that level of performance, so the workflow requires a storage administrator approval. The approval is given to save time, but the storage administrator opens a ticket for himself to create a new volume on another array, as well as a follow-up ticket for the VMware admin to create a new datastore using the new volume and migrate the application to the other datastore. In this case, provisioning was relatively rapid, but required manual follow up, involving the two administrators.
Here's the second takeaway: With VM Volumes, management is simplified, and end-to-end automation is much more applicable. The reason is that there are no datastores. Datastores physically group VMs that may otherwise be totally unrelated, and require close coordination between storage and VMware administrators.
Now, the above mainly focuses on the VMware or cloud administrator perspective. How does VM Volumes impact storage management?
VMs are the new hosts: Today, storage administrators have visibility of physical hosts in their management environment. In a non-virtualized environment, this visibility is very helpful. The storage administrator knows exactly which applications in a data center are storage-provisioned or affected by storage management operations because the applications are running on well-known hosts. However, in virtualized environments the association of an application to a physical host is temporary. To keep at least the same level of visibility as in physical environments, VMs should become part of the storage management environment, like hosts. Hosts are still interesting, for example to manage physical storage mapping, but without VM visibility, storage administrators will know less about their operation than they are used to, or need to. VM Volumes enables such visibility, because volumes are provided to individual VMs. The XIV VM Volumes demonstration at VMworld Barcelona, although experimental, shows a view of VM volumes in XIV's management GUI.
Here's a screenshot:
That's not all!
Storage Profiles and Storage Containers: A Storage Profile is a vSphere specification of a set of storage services. A storage profile can include properties like thin or thick provisioning, mirroring definition, snapshot policy, minimum IOPS, etc.
Storage administrators define a portfolio of supported storage services, maintained as a set of storage profiles, and published (via VASA integration) to vSphere.
VMware or cloud administrators define the required storage profiles for specific applications
VMware and storage administrators need to coordinate the typical storage requirements and the automatically-available storage services. When a request to provision an application is made, the associated storage profiles are matched against the published set of available storage profiles. The matching published profiles will be used to create volumes, which will be bound to the application VMs. All that will happen automatically.
Note that when a VM is created today, a datastore must be specified. With VM Volumes, a new management entity called Storage Container (also known as Capacity Pool) replaces the use of datastore as a management object. Each Storage Container exposes a subset of the available storage profiles, as appropriate. The storage container also has a capacity quota.
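The matching step described above can be sketched in a few lines of Python. This is a simplified illustration, not the VASA interface: the names StorageProfile, StorageContainer and place, the profile attributes, and the pool names are all hypothetical, and a profile "matches" here only when it is identical to a published one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageProfile:
    """A named set of storage services (hypothetical attributes)."""
    name: str
    thin: bool
    mirrored: bool
    rpo_minutes: int = 0  # only meaningful when mirrored is True


@dataclass
class StorageContainer:
    """Replaces the datastore as the management object; has a capacity quota."""
    name: str
    quota_gb: int
    used_gb: int
    profiles: tuple  # the subset of published profiles this container exposes

    def can_satisfy(self, required, size_gb):
        fits_quota = self.used_gb + size_gb <= self.quota_gb
        return fits_quota and required in self.profiles


def place(required, size_gb, containers):
    """Return the name of the first container that exposes the required
    profile and has room under its quota, or None if there is no match."""
    for container in containers:
        if container.can_satisfy(required, size_gb):
            container.used_gb += size_gb
            return container.name
    return None


gold = StorageProfile("gold", thin=True, mirrored=True, rpo_minutes=30)
bronze = StorageProfile("bronze", thin=True, mirrored=False)
containers = [
    StorageContainer("pool-a", quota_gb=1000, used_gb=900, profiles=(bronze,)),
    StorageContainer("pool-b", quota_gb=2000, used_gb=500, profiles=(gold, bronze)),
]

r1 = place(gold, 200, containers)    # only pool-b exposes the gold profile
r2 = place(bronze, 50, containers)   # pool-a exposes bronze and has room
r3 = place(gold, 5000, containers)   # exceeds every quota, so no match
print(r1, r2, r3)
```

A real implementation would presumably match on capabilities rather than exact equality, but the shape is the same: the requestor names a profile and a size, and the container that can honor both is chosen without a ticket changing hands.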
Here are some more takeaways:
New way to interface vSphere and storage management: Storage administrators structure and publish storage services to vSphere via storage profiles and storage containers.
Automated provisioning, out of the box: The provisioning process automatically matches application-required storage profiles against storage profiles available from the specified storage containers. There is no need to build custom scripts and custom processes to automate storage provisioning to applications
The XIV advantage:
XIV services are very simple to define and publish. The typical number of available storage profiles would be low. It would also be easy to define application storage profiles.
XIV provides consistent high performance, up to very high capacity utilization levels, without any maintenance. As a result, automated provisioning (which inherently implies less human attention) will not create an elevated risk of reduced performance.
Note: A storage vendor VASA provider is required to support VM Volumes, storage profiles, storage containers and automated provisioning. The IBM Storage VASA provider runs as a standalone service that needs to be deployed on a server.
To summarize the VM Volumes value proposition:
Streamline cloud operation by providing storage services at VM and application level, enabling end-to-end provisioning automation, and unifying VMware and storage administration around volumes and VMs.
Increase storage array ROI, improve vSphere scalability and response time, and reduce cloud provisioning lag, by offloading VM-level provisioning, failover, backup, storage migration, storage space recycling, monitoring, and more, to the storage array, using advanced storage operations such as mirroring and snapshots.
Simplify the adoption of VM Volumes using XIV, with smaller and simpler sets of storage profiles. Apply XIV's supreme fast cloning to individual VMs, and keep automation risks at bay with XIV's consistent high performance.
Until you can get your hands on a VM Volumes-capable environment, the VMware and IBM developer groups will be collaborating and working hard to realize this game-changing feature. The above information is definitely expected to trigger your questions or comments, and our development teams are eager to learn from them and respond. Enter your comments below, and I will try to answer them, and help shape the next post on this subject. There's much more to be told.
This month, I am pleased to announce the new [IBM STG Executive Briefing Center] website, representing a huge improvement over the previous website we had been using over the past two years. STG refers to IBM's Systems and Technology Group, the division that focuses on servers, storage, switches and the system software that makes them run. This new website is for the dozen STG EBCs that span the globe. The new website reminds me of this famous quote:
"Perfection is achieved, not when there is nothing left to add, but when there is nothing left to take away"
-- Antoine de Saint-Exupery
Let's take a quick look at what makes it so much better.
The previous website required registration. At every briefing, those of us who work in the EBCs had to pass around a sign-up sheet for email addresses from each attendee so that we could send them an invitation to register for the site. We would have a hard time reading people's handwriting, resulting in some emails coming back rejected.
Inspired by self-service gas stations, automated teller machines, and the many self-service portals of Cloud Computing, the new website has everything up-front, without registration. IBM Business Partners and sales representatives can easily request a briefing at any of the dozen briefing centers represented!
IBM-managed and IBM-hosted
We had a difficult time explaining to our attendees why our previous website was hosted on a lone machine and maintained by a third party. Think about it, IBM manages the data centers of over 400 clients. IBM has provided web hosting to the most mission critical workloads, with high levels of availability and reliability, and is recognized as one of the "Big 5" Cloud companies. I have done web design myself in my career, and we were terribly disappointed with the third party chosen to create and maintain our previous website, constantly having to point out errors in their HTML and CSS.
For the new website, IBM took back control. Staff from each EBC, myself included, came up with a simple page to bring the essence of each location to life. Special thanks to my colleague Hal Jennings, from the Austin EBC, for bringing this all together!
Despite two years of manually registering attendees to use the previous website, Google Analytics showed that few people visited, and the few that did spent little time exploring the vast repository of content.
The new website is vastly simpler. The front page points to all twelve EBCs, and a single mouse click gets you to the location you are interested in, with all the details you need to make a decision to book a briefing, and the contact information to make it happen.
Elimination of Wasted and Duplicate Effort
In the previous website, we spent as much as 15 hours just to create, voice over, edit and produce a single 15-minute recorded presentation. Fewer than six percent of the previous website's visitors watched more than five minutes of these videos, making us feel that most of our effort was wasted.
The EBC staff kept wasting their time, month after month, thanks to all-stick, no-carrot tactics that mandated minimums for contributions for more and more content that nobody was ever looking at. Even more disappointing was that much of our work duplicated the formal responsibilities of our IBM Marketing team. They weren't happy about this either, causing confusion between the roles of our two teams.
Finally, we said enough was enough! The new STG EBC website is a marvel in minimalism. If you want to see presentations, videos, expert profiles, or partake in on-going conversations, I welcome you to visit the [IBM Expert Network], the [IBM Storage YouTube Channel], and the [Storage Community] where they belong.
Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) vendor has customized it for its own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extensions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS vendors want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) sticking to the SQL standard, and expecting the reader to customize the examples to their particular RDBMS; or (b) sticking to a single RDBMS implementation, and offering examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various RDBMS differ from standard SQL syntax, with actual working examples for different RDBMS.
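To give a flavor of the kind of difference involved, here is a small Python sketch of my own (it is not taken from the book). It builds the same "first n rows" query in three forms: the SQL standard's FETCH FIRST clause, which DB2 accepts, MySQL's LIMIT, and SQL Server's TOP. The function name first_rows and the table and column names are made up for the example.

```python
def first_rows(table, column, n, dialect):
    """Build a 'first n rows ordered by column' query for one dialect.

    Plain string formatting is used for illustration only; do not build
    queries this way from untrusted input.
    """
    base = f"SELECT * FROM {table} ORDER BY {column}"
    if dialect == "standard":   # SQL:2008 form, also accepted by DB2
        return f"{base} FETCH FIRST {n} ROWS ONLY"
    if dialect == "mysql":      # MySQL puts LIMIT at the end
        return f"{base} LIMIT {n}"
    if dialect == "sqlserver":  # SQL Server puts TOP right after SELECT
        return f"SELECT TOP {n} * FROM {table} ORDER BY {column}"
    raise ValueError(f"unknown dialect: {dialect}")


for dialect in ("standard", "mysql", "sqlserver"):
    print(f"{dialect:10} {first_rows('parts', 'price', 5, dialect)}")
```

Same question, three spellings. Multiply that by every feature in the language and you can see why porting code between RDBMS is rarely a copy-and-paste job.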
You might be thinking to yourself that only an idiot would work in a place that had to require knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via the [Shadow IT].
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2-C Express] that runs on Windows, Linux, Mac OS, and Solaris.)
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question struck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza was not in the IBM System Storage product line; it is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
This week, I am in beautiful Sao Paulo, Brazil, teaching a Top Gun class to IBM Business Partners and sales reps. Traditionally, we have "Tape Thursday" where we focus on our tape systems, from tape drives, to physical and virtual tape libraries. IBM is the #1 tape vendor, and has been for the past eight years.
(The alliteration doesn't translate well here in Brazil. The Portuguese word for tape is "fita", and Thursday here is "quinta-feira", but "fita-quinta-feira" just doesn't have the same ring to it.)
In the class, we discussed how to handle common misperceptions and myths about tape. Here are a few examples:
Myth 1: Tape processing is manually intensive
In my July 2007 blog post [Times a Million], I coined the phrase "Laptop Mentality" to describe the problem most people have dealing with data center decisions. Many folks extend linearly their experiences using their PCs, workstations or laptops to apply to the data center, unable to comprehend large numbers or solutions that take advantage of the economies of scale.
For many, the only experience dealing with tape was manual. In the 1980s, we made "mix tapes" on little cassettes, and in the 1990s we recorded our favorite television shows on VHS tapes in the VCR. Today, we have playlists on flash or disk-based music players, and record TV shows on disk-based video recorders like Tivo. The conclusion people draw is that tapes are manual, and disks are not.
Manual processing of tapes ended in 1987, with the introduction of a silo-like tape library from StorageTek. IBM quickly responded with its own IBM 3495 Tape Library Data Server in 1992. Today, clients have many tape automation choices, from the smallest IBM TS2900 Tape Autoloader that has one drive and nine cartridges, all the way to the largest IBM TS3500 multiple-library shuttle complex that can hold exabytes of data. These tape automation systems eliminate most of the manual handling of cartridges in day-to-day operations.
Myth 2: Tape media is less reliable than disk media
A storage medium is unreliable if it returns information that differs from what was originally stored. There are only two ways for this to happen: if you write a "zero" but read back a "one", or write a "one" and read back a "zero". This is called a bit error. Every storage medium has a "bit error rate", the average number of bit errors expected for a given large amount of data written.
According to the latest [LTO Bit Error rates, 2012 March], today's tape expects only 1 bit error per 10^17 bits written (about 100 petabits, or 12.5 petabytes). This is 10 times more reliable than Enterprise SAS disk (1 bit per 10^16), and 100 times more reliable than Enterprise-class SATA disk (1 bit per 10^15).
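For those who like to check the arithmetic, here is a short Python sketch. It assumes the quoted rates are read as one error in 10^17, 10^16 and 10^15 bits, and it uses decimal petabytes (10^15 bytes); the function name is mine.

```python
PB = 10 ** 15  # bytes in a decimal petabyte


def bytes_per_expected_error(exponent):
    """Average bytes written per bit error, for 1 error in 10**exponent bits."""
    return 10 ** exponent / 8  # eight bits per byte


# Exponents as read from the figures quoted above (an assumption).
rates = {
    "LTO tape": 17,
    "Enterprise SAS disk": 16,
    "Enterprise SATA disk": 15,
}

for media, exponent in rates.items():
    petabytes = bytes_per_expected_error(exponent) / PB
    advantage = 10 ** (rates["LTO tape"] - exponent)
    print(f"{media}: one bit error per {petabytes:g} PB written "
          f"(tape is {advantage}x better)")
```

Each step in the exponent is a factor of ten, which is where the "10 times" and "100 times" comparisons come from.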
Tape is the media used in "black boxes" for airplanes. When an airplane crashes, the black box is retrieved and used to investigate the causes of the crash. In 1986, the Space Shuttle Challenger exploded 73 seconds after take-off. The tapes in the black box sat on the ocean floor for six weeks before being recovered. Amazingly, IBM was able to successfully restore [90 percent of the block data, and 100 percent of voice data].
Analysts are quite upset when they are quoted out of context, but in this case, Gartner never said anything even close to this. Nor did the other analysts that Curtis investigated for similar claims. What Gartner did say was that disk provides an attractive alternative storage media for backup which can increase the performance of the recovery process.
Back in the 1990s, Savur Rao and I developed a patent to help backup DB2 for z/OS by using the FlashCopy feature of IBM's high-end disk system. The software method to coordinate the FlashCopy snapshots with the database application and maintain multiple versions was implemented in the DFSMShsm component of DFSMS. A few years later, this was part of a set of patents IBM cross-licensed to Microsoft for them to implement a similar software for Windows called Data Protection Manager (DPM). IBM has since introduced its own version for distributed systems called IBM Tivoli FlashCopy Manager that runs not just on Windows, but also AIX, Linux, HP-UX and Solaris operating systems.
Curtis suspects the "71 percent" citation may have been propagated by an ambitious product manager of Microsoft's Data Protection Manager, back in 2006, perhaps to help drive up business to their new disk-based backup product. Certainly, Microsoft was not the only vendor to disparage tape in this manner.
A few years ago, an [EMC failure brought down the State of Virginia]. The outage was due not just to a component failure in its production disk system; it was made worse by the failure to recover from the disk-based remote mirror copy. Fortunately, the data was restored from tape over the next four days. If you wonder why nobody at EMC says "Tape is Dead" anymore, perhaps it is because tape saved their butts that week.
(FTC Disclosure: I work for IBM and this post can be considered a paid, celebrity endorsement for all of the IBM tape and software products mentioned on this post. I own shares of stock in both IBM and Google, and use Google's Gmail for my personal email, as well as many other Google services. While IBM, Google and Microsoft can be considered competitors to each other in some areas, IBM has working relationships with both companies on various projects. References in this post to other companies like EMC are merely to provide illustrative examples only, based on publicly available information. IBM is part of the Linear Tape Open (LTO) consortium.)
Myth 4: Vendors and Manufacturers are no longer investing in tape technology
IBM and others are still investing Research and Development (R&D) dollars to improve tape technology. What people don't realize is that much of the R&D spent on magnetic media can be applied across both disk and tape, such as IBM's development of the Giant Magnetoresistance read/write head, or [GMR] for short.
Most recently, IBM made another major advancement with tape with the introduction of the Linear Tape File System (LTFS). This allows greater portability to share data between users, and between companies, by treating tape cartridges much like USB memory sticks or pen drives. You can read more in my post [IBM and Fox win an Emmy for LTFS technology]!
Next month, IBM celebrates the 60th anniversary for tape. It is good to see that tape continues to be a vibrant part of the IT industry, and to IBM's storage business!
Well, it's Tuesday again, and you know what that means!
This Thursday is the Thanksgiving holiday here in the United States, so instead of announcing IBM products, I wanted to announce the general availability of my latest book, [Inside System Storage: Volume III].
This book includes blog posts from May 2008 to March 2009, along with the ever popular behind-the-scenes commentary on what was going on during IBM's launch of the Information Infrastructure initiative.
Do you know someone who celebrates Chanukah, Christmas, Kwanzaa, or the Winter Solstice, and you have a hard time finding the right gift?
Do you know a client or IBM Business Partner that would appreciate a nominally-priced gift to thank them for their business?
Do you know someone newly hired into IBM or another IT company that could benefit from behind-the-scenes insight and commentary?
As with the other two volumes, Inside System Storage: Volume III is available in your choice of paperback, hardcover, and eBook (Adobe PDF) format.
In the spirit of Thanksgiving, I would like to thank my editor, Susan Pollard, who put in the extra effort, working evenings and weekends, to get this book done in time for the upcoming holiday season. For those outside the United States, there is an American tradition to shop in brick-and-mortar stores on Black Friday (the day after Thanksgiving) and to shop on-line for books like mine on Cyber Monday (the Monday after Thanksgiving).
I would also like to thank my publisher, Lulu.com, for upgrading me to "Spotlight" level, so now I have a spotlight page titled [Books Written by Tony Pearson], making it easy for you to order any of my books in various formats.
And last, but not least, I would like to thank all my friends and family that were supportive these past few difficult months while I was putting this book together.
Next month, I will be in Las Vegas, Dec 4-8, speaking at Gartner's [Data Center Conference]. If you order a book today, and bring it with you to the IBM booth at the Solution Expo, I can sign it for you!
This week, IBM made over a dozen announcements related to IBM storage products. Here is part 2 of my overview:
IBM System Storage® DS8000 series microcode
One of the advantages of acquiring XIV as IBM's other high-end disk system, is that it allows the DS8000 team to focus on the IBM i and z/OS operating systems. As a result, IBM DS8000 has over half the mainframe-attach market share.
For both the DS8700 and DS8800 models, IBM Easy Tier now supports sub-LUN automated tiering across three storage tiers: Solid-State Drives, high-performance spinning disk drives (15K and 10K RPM), and high-capacity disk drives (7200 RPM).
For System z customers, the latest DS8000 microcode has synergy with z/OS and GDPS, now supporting 4x larger EAV volumes, faster high-performance FICON (zHPF), and Workload Manager (WLM) integration with the I/O Priority Manager. IBM has a world record SAP performance of 59 million account postings per hour. DB2 v10 for z/OS queries were measured at 11x faster using the new zHPF feature.
IBM System Storage® DS8800 systems
On the hardware side, the DS8800 now supports a fourth frame to hold a total of over 1,500 disk drives. Yes, we have customers for whom three frames weren't enough, and they wanted more.
IBM is now also offering new drive options. Small Form Factor (2.5 inch) drives now include 300GB 15K RPM drives and 900GB 10K RPM drives. But wait! There's more! The DS8800 is no longer an SFF-only box; it now allows for mixing in Large Form Factor (3.5 inch) drives, starting with the 3TB NL-SAS 7200 RPM drive.
IBM XIV® Storage System Gen3
We announced the XIV Gen3 already, but we have two enhancements.
First, we now offer a model based entirely on 3TB NL-SAS drives. If you are thinking, "What, is IBM going to put 3TB drives into everything?" Yup. Once we go through all the pain and suffering of qualifying a drive, we make sure we get our money's worth!
Secondly, we now have an iPad application to manage the XIV. This has nothing to do with Apple CEO Steve Jobs passing away last week; it was merely coincidence.
IBM Real-time Compression Appliances™ STN6500 and STN6800 V3.8
The latest software for RtCA now supports Microsoft SMB v2, and enhanced reporting so that storage admins know exactly the benefits of the compression ratios of different file extensions.
IBM System Storage EXP2500 Express®
The EXP2500 is for direct-attach situations, like the IBM BladeCenter. IBM adds LFF 3.5-inch 3TB NL-SAS drives, SFF 2.5-inch 300GB 15K RPM SAS drives, and 900GB 7200 RPM NL-SAS drives.
My colleague Curtis Neal refers to these as "B.F.D" announcements, which of course stands for Bigger, Faster, Denser!
Last week, fellow IBMer Ron Riffe started his three-part series on the Storage Hypervisor. I discussed Part I already in my previous post [Storage Hypervisor Integration with VMware]. We wrapped up the week with a Live Chat with over 30 IT managers, industry analysts, independent bloggers, and IBM storage experts.
"The idea of shopping from a catalog isn’t new and the cost efficiency it offers to the supplier isn’t new either. Public storage cloud service providers seized on the catalog idea quickly as both a means of providing a clear description of available services to their clients, and of controlling costs. Here’s the idea… I can go to a public cloud storage provider like Amazon S3, Nirvanix, Google Storage for Developers, or any of a host of other providers, give them my credit card, and get some storage capacity. Now, the “kind” of storage capacity I get depends on the service level I choose from their catalog.
Most of today’s private IT environments represent the complete other end of the pendulum swing – total customization. Every application owner, every business unit, every department wants to have complete flexibility to customize their storage services in any way they want. This expectation is one of the reasons so many private IT environments have such a heavy mix of tier-1 storage. Since there is no structure around the kind of requests that are coming in, the only way to be prepared is to have a disk array that could service anything that shows up. Not very efficient… There has to be a middle ground.
Private storage clouds are a little different. Administrators we talk to aren’t generally ready to let all their application owners and departments have the freedom to provision new storage on their own without any control. In most cases, new capacity requests still need to stop off at the IT administration group. But once the request gets there, life for the IT administrator is sweet!
Here comes the request from an application owner for 500GB of new “Database” capacity (one of the options available in the storage service catalog) to be attached to some server. After appropriate approvals, the administrator can simply enter the three important pieces of information (type of storage = “Database”, quantity = 500GB, name of the system authorized to access the storage) and click the “Go” button (in TPC SE it’s actually a “Run now” button) to automatically provision and attach the storage. No more complicated checklists or time consuming manual procedures.
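The catalog idea boils down to a lookup plus three inputs. Here is a minimal Python sketch of that flow; it is purely illustrative and is not how TPC SE is implemented. The catalog entries, their attributes, the host name and the provision function are all hypothetical.

```python
# Hypothetical service catalog: each entry fixes the service level up front.
CATALOG = {
    "Database": {"tier": "tier-1", "mirrored": True},
    "Archive": {"tier": "tier-3", "mirrored": False},
}


def provision(storage_type, quantity_gb, host):
    """Turn the three inputs of a catalog request into a provisioning order."""
    if storage_type not in CATALOG:
        raise KeyError(f"{storage_type!r} is not in the storage service catalog")
    if quantity_gb <= 0:
        raise ValueError("quantity must be positive")
    # Everything else about the volume comes from the catalog entry,
    # so there is no per-request customization to negotiate.
    return {"host": host, "size_gb": quantity_gb, **CATALOG[storage_type]}


order = provision("Database", 500, "dbhost01")
print(order)
```

The request carries only type, quantity and host; the service level is decided once, when the catalog is defined, rather than every time someone asks for capacity.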
A storage hypervisor increases the utilization of storage resources, and optimizes what is most scarce in your environment. For Linux, UNIX and Windows servers, you typically see utilization rates of 20 to 35 percent, and this can be raised to 55 to 80 percent with a storage hypervisor. But what is most scarce in your environment? Time! In a competitive world, it is not big animals eating smaller ones as much as fast ones eating the slow.
Want faster time-to-market? A storage hypervisor can help reduce the time it takes to provision storage, from weeks down to minutes. If your business needs to react quickly to changes in the marketplace, you certainly don't want your IT infrastructure to slow you down like a boat anchor.
Want more time with your friends and family? A storage hypervisor can migrate the data non-disruptively, during the week, during the day, during normal operating hours, instead of scheduling down-time on evenings and weekends. As companies adopt a 24-by-7 approach to operations, there are fewer and fewer opportunities in the year for scheduled outages. Some companies get stuck paying maintenance after their warranty expires, because they were not able to move the data off in time.
Want to take advantage of the new Solid-State Drives? Most admins don't have time to figure out what applications, workloads or indexes would best benefit from this new technology. Let your storage hypervisor's automated tiering do this for you! In fact, a storage hypervisor can gather enough performance and usage statistics to determine the characteristics of your workload in advance, so that you can predict whether solid-state drives are right for you, and how much benefit you would get from them.
Want more time spent on strategic projects? A storage hypervisor allows any server to connect to any storage. This eliminates the time wasted to determine when and how, and lets you focus on the what and why of your more strategic transformational projects.
If this sounds all too familiar, it is similar to the benefits that one gets from a server hypervisor -- better utilization of CPU resources, optimizing the management and administration time, with the agility and flexibility to deploy new technologies in and decommission older ones out.
"Server virtualization is a fairly easy concept to understand: Add a layer of software that allows processing capability to work across multiple operating environments. It drives both efficiency and performance because it puts to good use resources that would otherwise sit idle.
Storage virtualization is a different animal. It doesn't free up capacity that you didn't know you had. Rather, it allows existing storage resources to be combined and reconfigured to more closely match shifting data requirements. It's a subtle distinction, but one that makes a lot of difference between what many enterprises expect to gain from the technology and what it actually delivers."
Jon Toigo on his DrunkenData blog brings back the sanity with his post [Once More Into the Fray]. Here is an excerpt:
"What enables me to turn off certain value-add functionality is that it is smarter and more efficient to do these functions at a storage hypervisor layer, where services can be deployed and made available to all disk, not to just one stand bearing a vendor’s three letter acronym on its bezel. Doesn’t that make sense?
I think of an abstraction layer. We abstract away software components from commodity hardware components so that we can be more flexible in the delivery of services provided by software rather than isolating their functionality on specific hardware boxes. The latter creates islands of functionality, increasing the number of widgets that must be managed and requiring the constant inflation of the labor force required to manage an ever expanding kit. This is true for servers, for networks and for storage.
Can we please get past the BS discussion of what qualifies as a hypervisor in some guy’s opinion and instead focus on how we are going to deal with the reality of cutting budgets by 20% while increasing service levels by 10%. That, my friends, is the real challenge of our times."
Did you miss out on last Friday's Live Chat? We are doing it again this Friday, covering parts I and II of Ron's posts, so please join the conversation! The virtual dialogue on this topic will continue in another [Live Chat] on September 30, 2011 from 12 noon to 1pm Eastern Time.
Can you believe it has been five years since I started blogging?
(If you absolutely abhor the navel-gazing associated with blogging-about-blogging posts, then by all means stop reading now!)
Back in July 2005, IBM decided to merge together two brands, IBM eServer and IBM TotalStorage, into a single all-encompassing "IBM Systems" brand. Thus, the TotalStorage brand became the "IBM System Storage" product line of the "IBM Systems" brand. The next six months were spent renaming some (not all) of the products. The following January, I was named the Marketing Strategist for this new product line, with the mission to help promote the new naming convention.
We looked at possibly doing a regularly-scheduled podcast, but nobody back then, including myself, was familiar with audio editing tools. Instead, we chose a blog. Most blogs at IBM are internal, safely hidden behind the firewall, accessible only to IBM employees. I wanted mine to be different, to be accessible to the public, clients, prospects, IBM Business Partners, and yes, even those working for IBM's various competitors. One thing I like about blogs is that if you have a typo, or make a mistake, you can go back and correct it after it has posted.
Marketing through social media is quite different than traditional marketing techniques. Management was supportive, but legal wanted to review and approve everything I wrote before I posted it onto my blog. Official IBM Press Releases, for example, go through a dozen reviews before they are finally made public. I refused. This kind of review and approval would ruin the blogging process.
Fortunately, this blog was not my first attempt at technical writing. Our legal counsel reviewed my past trip reports from various conferences, and decided to let me blog without review. Occasionally, someone will review a post after it has gone up, and ask me to make some corrections. It reminds me of my favorite saying used heavily within IBM:
Despite these delays, we managed to launch this blog in September 2006, just in time to celebrate the 50th anniversary of disk systems. IBM introduced the industry's first commercial disk system on September 13, 1956.
Over the years, this blog has helped sales reps and IBM Business Partners close deals, and address the FUD their prospects heard from competition. I have helped my readers get in touch with the right people within IBM. And, I have "sent the elevator back down", helping other IBMers launch their own blogs, including [Barry Whyte], [Elisabeth Stahl], and [Anthony Vandewerdt].
Today, bloggers have a profound impact on the world. Not everyone has a positive view on this. Bloggers and other users of social media have been seen as whistle-blowers for fraudulent corporations, as activists against corrupt governments and dictators, and as subject matter experts and fact checkers referenced during television and radio newscasts. In a recent movie, one of the major characters was a trouble-making blogger, and another character describes his blogging as nothing more than "graffiti with punctuation."
I want to thank all of my readers for making this the #1 most influential blog on IBM developerWorks in 2011! This blog has been [published in a series of books], Inside System Storage Volume I and Volume II. And yes, before you all ask in the comments below, I am actively working on Volume III.
For a bit of nostalgia, I invite you to read my first 21 blog posts that I posted back in [September 2006].
After the amount of flak Jon Toigo had to endure for not giving advance notice of his upcoming Webcast, I thought I had better remind people about my own Webinar that is happening next Tuesday, August 23.
So here's the scoop: next Tuesday, August 23, I will be presenting [The Future of Storage] from 1pm to 2pm EDT. You can register to attend at the [Infoboom Registration Page]. Infoboom is a social community for business and IT leaders of small and midsize businesses brought to you by IBM.
But that's not all! After the webinar, I will then travel to various cities for face-to-face lectures. Here are the first two:
September 7 - Indianapolis
September 8 - Boston area
If you are near either of these two locations, contact your local IBM storage specialist or IBM business partner to participate.
The IBM Storwize V7000 was introduced last October, and has proven to be wildly successful. I saw two awesome reviews recently of the IBM Storwize V7000 disk system that I thought I would bring to your attention.
The first review is [IBM Storwize V7000] from Roger Howorth of ZDNet UK. Here are some quotes:
"Under the hood, the Storwize V7000 is built from technologies originally developed for IBM's enterprise-class storage systems, so the V7000 benefits from a comprehensive set of high-end features that have been scaled down for mid-range buyers."
"Initial configuration couldn't be simpler."
"We really liked the layout and functionality of the GUI."
"Storwize V7000 is virtual storage that offers efficiency and flexibility through built-in SSD optimization and "thin provisioning" technologies while enabling users to virtualize and re-use existing disk systems..."
"Storwize V7000 advanced functionality also enables non-disruptive migration of data from existing storage, simplifying implementation and minimizing disruption to users."
"The Storwize V7000 graphical user interface is a browser-based, easy to navigate intuitive GUI."
"ESG Lab found that getting started with the Storwize V7000 disk system was intuitive and straightforward."
"Easy Tier increases the efficiency and simplicity of deploying SSD drives."
Full VMware vStorage API for Array Integration (VAAI). Back in 2008, VMware announced new vStorage APIs for its vSphere ESX hypervisor: vStorage API for Site Recovery Manager, vStorage API for Data Protection, vStorage API for Multipathing. Last July, VMware added a new API called vStorage API for Array Integration [VAAI] which offers three primitives:
Hardware-assisted Block zeroing. Sometimes referred to as "Write Same", this SCSI command will zero out a large section of blocks, presumably as part of a VMDK file. This can then be used to reclaim space on the XIV on thin-provisioned LUNs.
Hardware-assisted Copy. Make an XIV snapshot of data without any I/O on the server hardware.
Hardware-assisted locking. On mainframes, this is called Parallel Access Volumes (PAV). Instead of locking an entire LUN using standard SCSI reserve commands, this primitive allows an ESX host to lock just an individual block so as not to interfere with other hosts accessing other blocks on that same LUN.
Quality of Service (QoS) Performance Classes.
When XIV was first released, it treated all hosts and all data the same, even when deployed for a variety of different applications. This worked for some clients, such as [Medicare y Mucho Más]. They migrated their databases, file servers and email system from EMC CLARiiON to an IBM XIV Storage System. In conjunction with VMware, the XIV provides a highly flexible and scalable virtualized architecture, which enhances the company's business agility.
However, other clients were skeptical, and felt they needed additional "knobs" to prioritize different workloads. The new 10.2.4 microcode allows you to define four different "performance classes". This is like the door of a nightclub. All the regular people are waiting in a long line, but when a celebrity in a limo arrives, the bouncer unclips the cord, and lets the celebrity in. For each class, you provide IOPS and/or MB/sec targets, and the XIV manages to those goals. Performance classes are assigned to each host based on their value to the business.
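To make the idea of per-host performance classes concrete, here is a minimal sketch in Python. It is not the XIV implementation: the class names, host names, target values, and the decision to treat a target as an upper limit are all assumptions made for illustration, and only two of the four possible classes are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PerformanceClass:
    """A named class with optional IOPS and MB/sec targets (None = no target)."""
    name: str
    iops_target: Optional[int] = None
    mbps_target: Optional[int] = None

    def exceeds_target(self, iops: float, mbps: float) -> bool:
        """True if the observed rates are above either configured target."""
        over_iops = self.iops_target is not None and iops > self.iops_target
        over_mbps = self.mbps_target is not None and mbps > self.mbps_target
        return over_iops or over_mbps


# Hypothetical classes and host names, for illustration only.
classes = {
    "celebrity": PerformanceClass("celebrity"),  # no targets: never held back
    "regular": PerformanceClass("regular", iops_target=5000, mbps_target=200),
}
host_class = {"oltp-db01": "celebrity", "test-vm07": "regular"}


def should_throttle(host: str, iops: float, mbps: float) -> bool:
    """Decide whether a host's I/O should be held back, based on its class."""
    return classes[host_class[host]].exceeds_target(iops, mbps)
```

In nightclub terms, `should_throttle` is the bouncer: hosts in the "regular" class wait once they pass their targets, while the "celebrity" class walks straight in.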
Offline Initialization for Asynchronous Mirror.
Internally, we called this Truck Mode. Normally, when a customer decides to start using Asynchronous Mirror, they already have a lot of data at the primary location, and so there is a lot of data to send over to the new XIV box at the secondary location. This new feature allows the data to be dumped to tape at the primary location. Those tapes are shipped to the secondary location and restored on the empty XIV. The two XIV boxes are then connected for Asynchronous Mirroring, and checksums of each 64KB block are compared to determine what has changed at the primary during this "tape delivery time". This greatly reduces the time it takes for the two boxes to get past the initial synchronization phase.
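The checksum comparison step can be sketched in a few lines of Python. This is only an illustration of the idea, not XIV code: the function names are hypothetical, SHA-256 is an assumed choice of checksum, and the volumes are modeled as in-memory byte strings rather than real LUNs.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64KB blocks, as described above


def block_checksums(volume: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one checksum per fixed-size block of the volume image."""
    return [
        hashlib.sha256(volume[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(volume), block_size)
    ]


def blocks_to_resend(primary: bytes, secondary: bytes,
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of blocks whose checksums differ between primary and secondary."""
    p = block_checksums(primary, block_size)
    s = block_checksums(secondary, block_size)
    changed = [i for i, (a, b) in enumerate(zip(p, s)) if a != b]
    # Blocks that exist only on the primary (the volume grew) must also be sent.
    changed.extend(range(len(s), len(p)))
    return changed


# The secondary was restored from tape; the primary changed one block
# while the tapes were in transit.
restored = bytes(4 * BLOCK_SIZE)
current = bytearray(restored)
current[2 * BLOCK_SIZE + 10] = 0xFF
changed_blocks = blocks_to_resend(bytes(current), restored)
```

Only the blocks listed in `changed_blocks` need to cross the wire, which is why the initial synchronization finishes so much sooner than sending the whole volume.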
IP-based Replication. When IBM first launched the Storwize V7000 last October, people commented that the one feature they felt missing was IP-based replication. Sure, we offered FCP-based replication as most other Enterprise-class disk systems offer today, but many midrange systems also offer IP-based replication to reduce the need for expensive FCIP routers. [IBM Tivoli Storage FastBack for Storwize V7000] provides IP-based replication for Storwize V7000 systems.
Network Attached Storage
IBM announced two new models of the IBM System Storage N series. The midrange N6240 supports up to 600 drives, replacing the N6040 system. The entry-level N6210 supports up to 240 drives, and replaces the N3600 system. Details for both are available on the latest [data sheet].
IBM Real-Time Compression appliances work with all N series models to provide additional storage efficiency. Last October, I provided the [Product Name Decoder Ring] for the STN6500 and STN6800 models. The STN6500 supports 1 GbE ports, and the STN6800 supports 10GbE ports (or a mix of 10GbE and 1GbE, if you prefer). The IBM versions of these models were announced last December, but some people were on vacation and might have missed it. For more details of this, read the [Resources page], the [landing page], or [watch this video].
IBM System Storage DS3000 series
IBM System Storage [DS3524 Express DC and EXP3524 Express DC] models are powered with direct current (DC) rather than alternating current (AC). The DS3524 packs dual controllers and two dozen small-form factor (2.5 inch) drives in a compact 2U-high rack-optimized module. The EXP3524 provides additional disk capacity that can be attached to the DS3524 for expansion.
Large data centers, especially those in the Telecommunications Industry, receive AC from their power company, then store it in a large battery called an Uninterruptible Power Supply (UPS). For DC-powered equipment, they can run directly off this battery source, but for AC-powered equipment, the DC has to be converted back to AC, and some energy is lost in the conversion. Thus, having DC-powered equipment is more energy efficient, or "green", for the IT data center.
Whether you get the DC-powered or AC-powered models, both are NEBS-compliant and ETSI-compliant.
New Tape Drive Options for Autoloaders and Libraries
IBM System Storage [TS2900 Autoloader] is a compact 1U-high tape system that supports one LTO drive and up to 9 tape cartridges. The TS2900 can support either an LTO-3, LTO-4 or LTO-5 half-height drive.
IBM System Storage [TS3100 and TS3200 Tape Libraries] were also enhanced. The TS3100 can accommodate one full-height LTO drive, or two half-height drives, and hold up to 24 cartridges. The TS3200 offers twice as many drives and space for cartridges.
From New York, Rolf went to London, Paris, Madrid, Morocco, Cairo, South Africa, Bangkok Thailand, Malaysia, Singapore, New Zealand, Australia, and then back to United States. I was hoping to run into him while I was in Australia and New Zealand last month, but our schedules did not line up.
Traveling without baggage is more than just a convenience, it is a metaphor for the philosophy that we should keep only what we need, and leave behind what we don't. This was the approach taken by IBM in the design of the IBM Storwize V7000 midrange disk system.
The IBM Storwize V7000 disk system consists of 2U enclosures. Controller enclosures have dual-controllers and drives. Expansion enclosures have just drives. Enclosures can have either 24 smaller form factor (SFF) 2.5-inch drives, or twelve larger 3.5-inch drives. A controller enclosure can be connected to up to nine expansion enclosures.
The drives are all connected via 6 Gbps SAS, and come in a variety of speeds and sizes: 300GB Solid-State Drive (SSD); 300GB/450GB/600GB high-speed 10K RPM; and 2TB low-speed 7200 RPM drives. The 12-bay enclosures can be intermixed with 24-bay enclosures on the same system, and within an enclosure different speeds and sizes can be intermixed. A half-rack system (20U) could hold as much as 480TB of raw disk capacity.
This new system, freshly designed entirely within IBM, competes directly against systems that carry a lot of baggage, including the HDS AMS, HP EVA, and EMC CLARiiON CX4 systems. Instead, we decided to keep what we wanted from our other successful IBM products.
Inspired by our successful XIV storage system, IBM has developed a web-based GUI that focuses on ease-of-use. This GUI uses the latest HTML5 and dojo widgets to provide an incredible user experience.
Borrowed from our IBM DS8000 high-end disk systems, state-of-the-art device adapters provide 6 Gbps SAS connectivity with a variety of RAID levels: 0, 1, 5, 6, and 10.
From our SAN Volume Controller, the embedded [SVC 6.1 firmware] provides all of the features and functions normally associated with enterprise-class systems, including Easy Tier sub-LUN automated tiering between Solid-State Drives and spinning disk, thin provisioning, external disk virtualization, point-in-time FlashCopy, disk mirroring, built-in migration capability, and long-distance synchronous and asynchronous replication.
Finally, the various "internal NDA" that kept me from publishing this sooner have expired, so now I have the long-awaited [Inside System Storage: Volume II], documenting IBM's transformation in its storage strategy, including behind-the-scenes commentary about IBM's acquisitions of XIV and Diligent. Available initially in paperback form. I am still working on the hard cover and eBook editions.
For those who have not yet read my first book, Inside System Storage: Volume I, it is still available from my publisher Lulu, in [hard cover], [paperback] and [eBook] editions.
IBM System Storage DS8800
A lesson IBM learned long ago was not to make radical changes to high-end disk systems, as clients who run mission-critical applications are more concerned about reliability, availability and serviceability than they are performance or functionality. Shipping any product before it was ready meant painfully having to fix the problems in the field instead.
(EMC apparently is learning this same lesson now with their VMAX disk system. Their Enginuity code from Symmetrix DMX4 was ported over to new CLARiiON-based hardware. With several hundred boxes in the field, they have already racked up over 150 severity 1 problems, roughly half of which resulted in data loss or unavailability issues. For the sake of our mutual clients that have both IBM servers and EMC disk, I hope they get their act together soon.)
To avoid this, IBM made incremental changes to the successful design and architecture of its predecessors. The new DS8800 shares 85 percent of the stable microcode from the DS8700 system. Functions like Metro Mirror, Global Mirror, and Metro/Global Mirror, are compatible with all of the previous models of the DS8000 series, as well as previous models of the IBM Enterprise Storage Server (ESS) line.
The previous models of the DS8000 series were designed to take in cold air from both front and back, and route the hot air out the top, known as a chimney design. However, many companies are re-arranging their data centers into separate cold aisles and hot aisles. The new DS8800 has front-to-back cooling to help accommodate this design.
My colleague Curtis Neal would call the rest of this a "BFD" announcement, which of course stands for "Bigger, Faster and Denser". The new DS8800 scales-up to more drives than its DS8700 predecessor, and can scale-out from a single-frame 2-way system to a multi-frame 4-way system. IBM has upgraded to faster 5GHz POWER6+ processors, with dual-core 8 Gbps FC and FICON host adapters, 8 Gbps device adapters, and 6 Gbps SAS connectivity to smaller form factor (SFF) 2.5-inch SAS drives. IBM Easy Tier will provide sub-LUN automated tiering between Solid-State Drives and spinning disk. The denser packaging with SFF drives means that we can pack over 1000 drives in only three frames, compared to five frames required for the DS8700.
The [IBM System Storage SAN Volume Controller] software release v6.1 brings Easy Tier sub-LUN automated tiering to the rest of the world. IBM Easy Tier moves the hottest, most active extents up to Solid-State Drives (SSD) and moves the coldest, least active down to spinning disk. This works whether the SSD is inside the SVC 2145-CF8 nodes, or in the managed disk pool.
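The hot-up, cold-down idea can be illustrated with a short Python sketch. This is not the Easy Tier algorithm itself: the function name, the use of a simple I/O count as the "heat" measure, and the fixed number of SSD extent slots are all assumptions chosen to keep the example small.

```python
def plan_tiering(extent_io_counts: dict[int, int],
                 ssd_extent_slots: int) -> tuple[set[int], set[int]]:
    """Split extents into an SSD set (the hottest) and a spinning-disk set (the rest).

    extent_io_counts maps an extent id to the I/O count observed for it.
    Ties are broken by extent id so the result is deterministic.
    """
    ranked = sorted(extent_io_counts, key=lambda e: (-extent_io_counts[e], e))
    ssd = set(ranked[:ssd_extent_slots])
    hdd = set(ranked[ssd_extent_slots:])
    return ssd, hdd


# Hypothetical I/O counts for five extents, with room for two on SSD.
io_counts = {0: 12, 1: 9500, 2: 40, 3: 7200, 4: 3}
ssd, hdd = plan_tiering(io_counts, ssd_extent_slots=2)
```

Here extents 1 and 3 land on SSD and the rest stay on spinning disk. A real implementation would also weigh the cost of moving an extent against the benefit, and would re-evaluate as the workload shifts.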
Tired of waiting for EMC to finally deliver FAST v2 for your VMAX? It has been 18 months since they first announced that someday they would have sub-LUN automatic tiering. What is taking them so long? Why not virtualize your VMAX with SVC, and you can have it sooner!
SVC 6.1 also upgrades to a sexy new web-based GUI, which, like the one for the IBM Storwize V7000, is based on the latest HTML5 and dojo widget standards. Inspired by the popular GUI from the IBM XIV Storage System, this GUI has greatly improved ease-of-use.
A client asked me to explain "Nearline storage" to them. This was easy, I thought, as I started my IBM career on DFHSM, now known as DFSMShsm for z/OS, which was created in 1977 to support the IBM 3850 Mass Storage System (MSS), a virtual storage system that blended disk drives and tape cartridges with robotic automation. Here is a quick recap:
Online storage is immediately available for I/O. This includes DRAM memory, solid-state drives (SSD), and always-on spinning disk, regardless of rotational speed.
Nearline storage is not immediately available, but can be made online quickly without human intervention. This includes optical jukeboxes, automated tape libraries, as well as spin-down massive array of idle disk (MAID) technologies.
Offline storage is not immediately available, and requires some human intervention to bring online. This can include USB memory sticks, CD/DVD optical media, shelf-resident tape cartridges, or other removable media.
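The three definitions above reduce to two yes/no questions, which a small Python sketch makes explicit. The names `StorageClass` and `classify` are hypothetical, chosen only for this illustration.

```python
from enum import Enum


class StorageClass(Enum):
    ONLINE = "online"
    NEARLINE = "nearline"
    OFFLINE = "offline"


def classify(immediately_available: bool, needs_human: bool) -> StorageClass:
    """Classify storage using only the two criteria in the definitions above."""
    if immediately_available:
        return StorageClass.ONLINE
    return StorageClass.OFFLINE if needs_human else StorageClass.NEARLINE


# A 7200 RPM always-on disk, an automated tape library, and a shelf-resident tape.
slow_disk = classify(immediately_available=True, needs_human=False)
tape_library = classify(immediately_available=False, needs_human=False)
shelf_tape = classify(immediately_available=False, needs_human=True)
```

Note what is absent from the function signature: rotational speed and connection type play no part in the classification.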
Sadly, it appears a few storage manufacturers and vendors have been misusing the term "Nearline" to refer to "slower online" spinning disk drives. I found this [June 2005 technology paper from Seagate], and this [2002 NetApp Press Release], the latter of which included this contradiction for their "NearStore" disk array. Here is the excerpt:
"Providing online access to reference information—NetApp nearline storage solutions quickly retrieve and replicate reference and archive information maintained on cost-effective storage—medical images, financial models, energy exploration charts and graphs, and other data-intensive records can be stored economically and accessed in multiple locations more quickly than ever"
Which is it, "online access" or "nearline storage"?
If a client asked why slower drives consume less energy or generate less heat, I could explain that, but if they ask why slower drives must have SATA connections, that is a different discussion. The speed of a drive and its connection technology are for the most part independent. A 10K RPM drive can be made with an FC, SAS or SATA connection.
I am opposed to using "Nearline" just to distinguish between four-digit speeds (such as 5400 or 7200 RPM) versus "online" for five-digit speeds (10,000 and 15,000 RPM). The difference in performance between 10K RPM and 7200 RPM spinning disks is minuscule compared to the differences between solid-state drives and any spinning disk, or the difference between spinning disk and tape.
I am also opposed to using the term "Nearline" for online storage systems just because they are targeted for the typical use cases like backup, archive or other reference information that were previously directed to nearline devices like automated tape libraries.
Can we all just agree to refer to drives as "fast" or "slow", or give them RPM rotational speed designations, rather than try to incorrectly imply that FC and SAS drives are always fast, and SATA drives are always slow? Certainly we don't need new terms like "NL-SAS" just to represent a slower SAS-connected drive.
Well, it feels like Tuesday and you know what that means... "IBM Announcement Day!" Actually, today is Wednesday, but since Monday was Memorial Day holiday here in the USA, my week is day-shifted. Yesterday, IBM announced its latest IBM FlashCopy Manager v2.2 release. Fellow blogger, Del Hoobler (IBM) has also posted something on this out at the [Tivoli Storage Blog].
IBM FlashCopy Manager replaces two previous products. One was called Tivoli Storage Manager for Copy Services, the other was called Tivoli Storage Manager for Advanced Copy Services. To say people were confused between these two was an understatement. The first was for Windows, and the second was for UNIX and Linux operating systems. The solution? A new product that replaces both of these former products to support Windows, UNIX and Linux! Thus, IBM FlashCopy Manager was born. I introduced this product back in 2009 in my post [New DS8700 and other announcements].
IBM Tivoli Storage FlashCopy Manager provides what most people with "N series SnapManager envy" are looking for: application-aware point-in-time copies. This product takes advantage of the underlying point-in-time interfaces available on various disk storage systems:
FlashCopy on the DS8000 and SAN Volume Controller (SVC)
Snapshot on the XIV storage system
Volume Shadow Copy Services (VSS) interface on the DS3000, DS4000, DS5000 and non-IBM gear that supports this Microsoft Windows protocol
For Windows, IBM FlashCopy Manager can coordinate the backup of Microsoft Exchange and SQL Server. The new version 2.2 adds support for Exchange 2010 and SQL Server 2008 R2. This includes the ability to recover an individual mailbox or mail item from an Exchange backup. The data can be recovered directly to an Exchange server, or to a PST file.
For UNIX and Linux, IBM FlashCopy Manager can coordinate the backup of DB2, SAP and Oracle databases. Version 2.2 adds support for specific Linux and Solaris operating systems, and provides a new capability for database cloning. Basically, database cloning restores a database under a new name with all the appropriate changes to allow its use for other purposes, like development, test or education training. A new "fcmcli" command line interface allows IBM FlashCopy Manager to be used for custom applications or file systems.
A common misperception is that IBM FlashCopy Manager requires IBM Tivoli Storage Manager backup software to function. That is not true. You have two options:
In Stand-alone mode, it's just you, the application, IBM FlashCopy Manager and your disk system. IBM FlashCopy Manager coordinates the point-in-time copies, maintains the correct number of versions, and allows you to backup and restore directly disk-to-disk.
Unified Recovery Management with Tivoli Storage Manager
Of course, the risk with relying only on point-in-time copies is that in most cases, they are on the same disk system as the original data, the exception being virtual disks from the SAN Volume Controller. IBM FlashCopy Manager can be combined with IBM Tivoli Storage Manager so that the point-in-time copies can be copied off to a local or remote TSM server. Then, if the disk system that contains both the source and the point-in-time copies fails, you have a backup copy from TSM. In this approach, you can still restore from the point-in-time copies, but you can also restore from the TSM backups.
IBM FlashCopy Manager is an excellent platform to connect application-aware functionality with hardware-based copy services.
Well, I'm back safely from my tour of Asia. I am glad to report that Tokyo, Beijing and Kuala Lumpur are pretty much how I remember them from the last time I was there in each city. I have since been fighting jet lag by watching the last thirteen episodes of LOST season 6 and the series finale.
Recently, I have started seeing a lot of buzz on the term "Storage Federation". The concept is not new, but rather based on the work in database federation, first introduced in 1985 by [A federated architecture for information management] by Heimbigner and McLeod. For those not familiar with database federation, you can take several independent autonomous databases, and treat them as one big federated system. For example, this would allow you to issue a single query and get results across all the databases in the federated system. The advantage is that it is often easier to federate several disparate heterogeneous databases than to merge them into a single database. [IBM Infosphere Federation Server] is a market leader in this space, with the capability to federate DB2, Oracle and SQL Server databases.
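For readers who have not seen database federation in action, the "single query, many databases" idea can be sketched in Python using the standard `sqlite3` module. This is a toy illustration, not how IBM InfoSphere Federation Server works: the helper names, the table, and the sample rows are all invented, and the sketch assumes every member database shares the same schema and SQL dialect, which is exactly the hard part that real federation products handle.

```python
import sqlite3


def make_db(rows):
    """Create an independent in-memory database with a small customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def federated_query(databases, sql, params=()):
    """Run the same query against every member database and combine the rows."""
    results = []
    for conn in databases:
        results.extend(conn.execute(sql, params).fetchall())
    return results


# Two autonomous databases, each owned by a different (hypothetical) department.
sales_db = make_db([("Acme", "Tucson"), ("Globex", "Boston")])
support_db = make_db([("Initech", "Tucson")])

tucson_customers = federated_query(
    [sales_db, support_db],
    "SELECT name FROM customers WHERE city = ? ORDER BY name",
    ("Tucson",),
)
```

Neither database was merged into the other, yet the caller issues one query and gets one combined answer.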
Storage expansion: You want to increase the storage capacity of an existing storage system that cannot accommodate the total amount of capacity desired. Storage Federation allows you to add additional storage capacity by adding a whole new system.
Storage migration: You want to migrate from an aging storage system to a new one. Storage Federation allows the two systems to be joined, the data evacuated from the first system onto the second, and the first system then removed.
Safe system upgrades: System upgrades can be problematic for a number of reasons. Storage Federation allows a system to be removed from the federation and be re-inserted again after the successful completion of the upgrade.
Load balancing: Similar to storage expansion, but on the performance axis, you might want to add additional storage systems to a Storage Federation in order to spread the workload across multiple systems.
Storage tiering: In a similar light, storage systems in a Storage Federation could have different capacity/performance ratios that you could use for tiering data. This is similar to the idea of dynamically re-striping data across the disk drives within a single storage system, such as with 3PAR's Dynamic Optimization software, but extends the concept to cross storage system boundaries.
To some extent, IBM SAN Volume Controller (SVC), XIV, Scale-Out NAS (SONAS), and Information Archive (IA) offer most, if not all, of these capabilities. EMC claims its VPLEX will be able to offer storage federation, but only with other VPLEX clusters, which brings up a good question. What about heterogeneous storage federation? Before anyone accuses me of throwing stones at glass houses, let's take a look at each IBM solution:
IBM SAN Volume Controller
The IBM SAN Volume Controller has been doing storage federation since 2003. Not only can IBM SAN Volume Controller bring together storage from a variety of heterogeneous storage systems, the SVC cluster itself can be a mix of different hardware models. You can have a 2145-8A4 node pair, 2145-8G4 node pair, and the new 2145-CF8 node pair, all combined together into a single SVC cluster. Upgrading SVC hardware nodes in an SVC cluster is always non-disruptive.
IBM XIV storage system
The IBM XIV has two kinds of independent modules. Data modules have processor, cache and 12 disks. Interface modules are data modules with additional processor, FC and Ethernet (iSCSI) adapters. Because these two modules play different roles in an XIV "colony", the number of each type is predetermined. Entry-level six-module systems have 2 interface and 4 data modules. Full 15-module systems have 6 interface and 9 data modules. Individual modules can be added or removed non-disruptively in an XIV.
IBM Scale-Out NAS
The SONAS is composed of three kinds of nodes that work together in concert: a management node, one or more interface nodes, and two or more storage nodes. The storage nodes are paired to manage up to 240 drives in a storage pod. Individual interface or storage nodes can be added or removed non-disruptively in the SONAS. The underlying technology, the General Parallel File System, has been doing storage federation since 1996 for some of the largest top 500 supercomputers in the world.
IBM Information Archive (IA)
For the IA, there are 1, 2 or 3 nodes, which manage a set of collections. A collection can either be file-based using industry-standard NAS protocols, or object-based using the popular System Storage™ Archive Manager (SSAM) interface. Normally, you have as many collections as you have nodes, but nodes are powerful enough to manage two collections to provide N-1 availability. This allows a node to be removed, and a new node added into the IA "colony", in a non-disruptive manner.
Even in an ant colony, there are only a few types of ants, with typically one queen, several males, and lots of workers. But all the ants are red. You don't see colonies that mix between different species of ants. For databases, federation was a way to avoid the much harder task of merging databases from different platforms. For storage, I am surprised people have latched on to the term "federation", given our mixed results in the other "federations" we have formed, which I have conveniently (IMHO) ranked from least effective to most effective:
The Union of Soviet Socialist Republics (USSR)
My father used to say, "If the Soviet Union were in charge of the Sahara desert, they would run out of sand in 50 years." The [Soviet Union] actually lasted 68 years, from 1922 to 1991.
The United Nations (UN)
After the previous League of Nations failed, the UN was formed in 1945 to facilitate cooperation in international law, international security, economic development, social progress, human rights, and the achieving of world peace by stopping wars between countries, and to provide a platform for dialogue.
The European Union (EU)
With the collapse of the Greek economy, and the [rapid growth of debt] in the UK, Spain and France, there are concerns that the EU might not last past 2020.
The United States of America (USA)
My own country is a federation of states, each with its own government. California's financial crisis was compared to the one in Greece. My own state of Arizona is under boycott from other states because of its recent [immigration law]. However, I think the US has managed better than the EU because it has evolved over the past 200 years.
The Organization of the Petroleum Exporting Countries [OPEC]
Technically, OPEC is not a federation of cooperating countries, but rather a cartel of competing countries that have agreed on total industry output of oil to increase individual members' profits. Note that it was a non-OPEC company, BP, that could not "control their output" in what has now become the worst oil spill in US history. OPEC was formed in 1960, and is expected to collapse sometime around 2030 when the world's oil reserves run out. Matt Savinar has a nice article on [Life After the Oil Crash].
United Federation of Planets
The [Federation] fictitiously described in the Star Trek series appears to work well, an optimistic view of what federations could become if you let them evolve long enough.
Given the mixed results with "federation", I think I will avoid using the term for storage, and stick to the original term "scale-out architecture".
Here I am, day 11 of a 17-day business trip, on my last leg of the trip this week, in Kuala Lumpur in Malaysia. I have been flooded with requests to give my take on EMC's latest re-interpretation of storage virtualization, VPLEX.
I'll leave it to my fellow IBM Master Inventor Barry Whyte to cover the detailed technical side-by-side comparison. Instead, I will focus on the business side of things, using Simon Sinek's Why-How-What sequence. Here is a [TED video] from Garr Reynolds' post [The importance of starting from Why].
Let's start with the problem we are trying to solve.
Problem: migration from old gear to new gear, old technology to new technology, from one vendor to another vendor, is disruptive, time-consuming and painful.
Given that IT storage is typically replaced every 3-5 years, then pretty much every company with an internal IT department has this problem, the exception being those companies that don't last that long, and those that use public cloud solutions. IT storage can be expensive, so companies would like their new purchases to be fully utilized on day 1, and be completely empty on day 1500 when the lease expires. I have spoken to clients who have spent 6-9 months planning for the replacement or removal of a storage array.
A solution that makes data migration non-disruptive would benefit both the clients (making it easier for their IT staff to keep the data center modern and current) and the vendors (lowering the obstacles to selling and deploying new features and functions). Storage virtualization can be employed to help solve this problem. I define virtualization as "technology that makes one set of resources look and feel like a different set of resources, preferably with more desirable characteristics." By making different storage resources, old and new, look and feel like a single type of resource, migration can be performed without disrupting applications.
Before VPLEX, here is a breakdown of each vendor's solution, following the Why-How-What sequence:
Why
IBM: Non-disruptive tech refresh, and a unified platform to provide management and functionality across heterogeneous storage.
HDS: Non-disruptive tech refresh, and a unified platform to provide management and functionality between internal tier-1 HDS storage, and external tier-2 heterogeneous storage.
EMC: Non-disruptive tech refresh, with unified multi-pathing driver that allows host attachment of heterogeneous storage.
How
IBM: New in-band storage virtualization device
HDS: Add in-band storage virtualization to existing storage array
EMC: New out-of-band storage virtualization device with new "smart" SAN switches
What
IBM: SAN Volume Controller
HDS: USP-V and USP-VM
EMC: Invista
For IBM, the motivation was clear: protect customers' existing investment in older storage arrays and introduce new IBM storage with a solution that allows both to be managed with a single set of interfaces and provides a common set of functionality, improving capacity utilization and availability. IBM SAN Volume Controller eliminated vendor lock-in, providing clients choice in multi-pathing driver, and allowing any-to-any migration and copy services. For example, IBM SVC can be used to help migrate data from an old HDS USP-V to a new HDS USP-V.
With EMC, however, the motivation appeared to be to protect software revenues from their PowerPath multi-pathing driver, and their TimeFinder and SRDF copy services. Back in 2005, when EMC Invista was first announced, these three software products represented 60 percent of EMC's bottom-line profit. (Ok, I made that last part up, but you get my point! EMC charges a lot for these.)
Back in 2006, fellow blogger Chuck Hollis (EMC) suggested that SVC was just a [bump in the wire] which could not possibly improve performance of existing disk arrays. IBM showed clients that putting cache (SVC) in front of other cache (back-end devices) does indeed improve performance, in the same way that multi-core processors successfully use L1/L2/L3 cache. Now, EMC is claiming their cache-based VPLEX improves performance of back-end disk. My, how EMC's story has changed!
So now, EMC announces VPLEX, which sports a blend of SVC-like and Invista-like characteristics. Based on blogs, tweets and publicly available materials I found on EMC's website, I have been able to determine the following comparison table. (Of course, VPLEX is not yet generally available, so what is eventually delivered may differ.)
Scalability
SVC: Scalable, 1 to 4 node-pairs
Invista: One size fits all, single pair of CPCs
VPLEX: SVC-like, 1 to 4 director-pairs
SAN switch support
SVC: Works with any SAN switches or directors
Invista: Required special "smart" switches (vendor lock-in)
VPLEX: SVC-like, works with any SAN switches or directors
Multi-pathing drivers
SVC: Broad selection, with IBM Subsystem Device Driver (SDD) offered at no additional charge, as well as OS-native drivers Windows MPIO, AIX MPIO, Solaris MPxIO, HP-UX PV-Links, VMware MPP, Linux DM-MP, and commercial third-party driver Symantec DMP.
Invista: Limited selection, with focus on priced PowerPath driver
VPLEX: Invista-like, PowerPath and Windows MPIO
Cache
SVC: Read cache, and choice of fast-write or write-through cache, offering the ability to improve performance.
Invista: No cache. Split-Path architecture cracked open Fibre Channel packets in flight, delayed every IO by 20 nanoseconds, and redirected modified packets to the appropriate physical device.
VPLEX: SVC-like, read and write-through cache, offering the ability to improve performance.
Space-Efficient Point-in-Time copies
SVC: SVC FlashCopy supports up to 256 space-efficient targets, copies of copies, read-only or writeable, and incremental persistent pairs.
Invista: No
VPLEX: Like Invista, no
Remote distance mirror
SVC: Choice of SVC Metro Mirror (synchronous up to 300km) and Global Mirror (asynchronous), or use the functionality of the back-end storage arrays
Invista: No native support, use functionality of back-end storage arrays, or purchase separate product called EMC RecoverPoint to cover this lack of functionality
VPLEX: Limited synchronous remote-distance mirror within VPLEX (up to 100km only), no native asynchronous support, use functionality of back-end storage arrays
Thin provisioning
SVC: Provides thin provisioning to devices that don't offer this natively
Invista: No
VPLEX: Like Invista, no
Split-cluster access
SVC: SVC Split-Cluster allows concurrent read/write access to data from hosts at two different locations several miles apart
Invista: I don't think so
VPLEX: VPLEX-Metro, similar in concept but implemented differently
Non-disruptive tech refresh
SVC: Can upgrade or replace storage arrays, SAN switches, and even the SVC nodes themselves, software AND hardware, non-disruptively
Invista: Tech refresh for storage arrays, but not for Invista CPCs
VPLEX: Tech refresh of back-end devices, and upgrade of VPLEX software, non-disruptively. Not clear if VPLEX engines themselves can be upgraded non-disruptively like the SVC.
Heterogeneous Storage Support
SVC: Broad support of over 140 different storage models from all major vendors, including all CLARiiON, Symmetrix and VMAX from EMC, and storage from many smaller startups you may not have heard of
VPLEX: Invista-like. VPLEX claims to support a variety of arrays from a variety of vendors, but as far as I can find, only the DS8000 is supported from the list of IBM devices. Fellow blogger Barry Burke (EMC) suggests [putting SVC between VPLEX and third party storage devices] to get the heterogeneous coverage most companies demand.
Back-end storage requirement
SVC: Must define quorum disks on any IBM or non-IBM back-end storage array. SVC can run entirely on non-IBM storage arrays
VPLEX: HP SVSP-like, requires at least one EMC storage array to hold metadata
Solid-state drives
SVC: SVC 2145-CF8 model supports up to four solid-state drives (SSD) per node that can be treated as managed disk to store end-user data
VPLEX: Invista-like. VPLEX has an internal 30GB SSD, but this is used only for operating system and logs, not for end-user data.
In-band virtualization solutions from IBM and HDS dominate the market. Being able to migrate data from old devices to new ones non-disruptively turned out to be only the [tip of the iceberg] of benefits from storage virtualization. In today's highly virtualized server environment, being able to non-disruptively migrate data comes in handy all the time. SVC is one of the best storage solutions for VMware, Hyper-V, Xen and PowerVM environments. EMC watched and learned in the shadows, taking notes on what people like about the SVC, and decided to follow IBM's time-tested leadership to provide a similar offering.
EMC re-invented the wheel, and it is round. On a scale from Invista (zero) to SVC (ten), I give EMC's new VPLEX a six.
Last week, I presented "An Introduction to Cloud Computing" for two hours to the local Institute of Management Accountants [IMA] for their Continuing Professional Education [CPE]. Since I present IBM's leadership in Cloud Storage offerings, I have had to become an expert in Cloud Computing overall. The audience was a mix of bookkeepers, accountants, auditors, comptrollers, CPAs, and accounting teachers.
Here is a sample of the questions I took during and after my presentation:
If I need to shut down a host machine, do I lose all my virtual machines as well?
No, it is possible to seamlessly move virtual machines from one host to another. If you need to shut down a host machine, move all the VMs to other hosts, then you can shut down the empty host without impacting business.
Does the SaaS provider have to build their own app, can they not buy an app and then rent it out?
Yes, they can, but they won't have competitive differentiation, and the software developer they buy from will want a big cut of the action. SaaS providers that build their own applications can keep more of the profits for themselves.
How do backups work in cloud computing? Do I have to contact someone at the cloud computing company to find the backup tape?
Large datacenters often keep the most recent backups on disk, and older versions on tape in automated tape libraries that can fetch your backup in less than 2 minutes. Because of this, there is no need to talk to anyone, you can schedule or invoke your own backups, and often perform the recovery yourself using self-service tools.
Last month, my sister tried to rent a car during the week of the Tucson Gem Show, but they were out of the cars she wanted to drive. Could this happen with Cloud Computing?
Not likely. With rental cars, the cars have to be physically in Tucson to rent them. Rental companies could have brought cars down from Phoenix to satisfy demand. With Cloud Computing, it is all accessible over the global network, you are not limited to the cloud providers nearest you.
Is there a reason why Amazon Web Services (AWS) charges more for a Windows image than a Linux image?
Yes, Amazon and Microsoft have a patent cross-licensing agreement where Amazon pays Microsoft for the privilege of offering Windows-based images on their EC2 cloud infrastructure. It just makes business sense to pass those costs on to the consumer. Linux is a free open source operating system, and is often the better choice.
So if we rent a machine from Amazon, they send it to my accounting office? What exactly am I getting for 12 cents per hour?
No. The computer remains in their datacenter. You get a virtual machine with a 1.2GHz Intel processor, 1700MB of RAM, and 160GB of hard disk space, with the Windows operating system running on it. It is comparable to a machine you can get at the local BestBuy, but instead of running in the next room, it runs in a datacenter somewhere else in the United States with electricity and air conditioning. You access it remotely from your desktop or laptop PC.
Why would I ever rent more than one computer?
It depends on your workload. For example, Derek Gottfrid at the New York Times needed to convert 11 million articles from TIFF format to PDF format so that he could put them up on the web. This would have taken him months using a single computer, so he rented 100 computers and got the entire stack converted in 24 hours, for a cost of about $240. See the articles [Self-Service, Prorated, Super Computing] and [TimesMachine] for details.
What about throughput? Won't I need to run cables from my accounting office to this cloud computing data center?
You will need connectivity, most likely from connections provided by your local telephone or cable company, or through the Internet. Certainly, there can be cases where direct privately-owned fiber optic cables, known as "dark fiber", can directly connect consumers to local Cloud service providers, for added security.
What about medical records? Will Cloud Computing help the Healthcare industry?
Yes, hospitals are finding that digitizing their records greatly reduces costs. IBM offers the Grid Medical Archive Solution [GMAS] as a private cloud storage solution to store X-ray images and other electronic medical records on disk and tape, and these records can be accessed from multiple hospitals and clinics, wherever the doctor or patient happens to be.
The advantage of personal computers was individualization, I could put on my own choices of software, and customize my own settings, won't we lose this with Cloud Computing?
Yes, customized software and settings cost companies millions of dollars with help desk calls. Cloud Computing attempts to provide some standardization, reducing the amount of effort to support IT operations.
Won't putting all the computers into a big datacenter make them more vulnerable to hackers?
Security is a well-known concern, but this is being addressed with encryption, access control lists, multi-tenancy isolation, and VPN connections.
My daughter has a BlackBerry or iPod or something, and when we mentioned that someone in Phoenix wore a monkey suit to avoid photo-radar speed cameras, she was able to pull up a picture on her little hand-held thing, is this the future?
Yes, mobile phones and other hand-held devices now have internet access to take advantage of Cloud Computing services. People will be able to access the information they need from wherever they happen to be. (You can see the picture here: [Man Dons Mask for Speed-Camera Photos])
IBM offers a variety of Cloud Computing services, as well as customized solutions and integrated systems that can be deployed on-premises behind your corporate firewall. To learn more, go to [ibm.com/cloud].
The second speaker was local celebrity Dan Ryan presenting the financials for the upcoming [Rosemont Copper] mining operations. Copper is needed for emerging markets, such as hybrid vehicles and wind turbines. Copper is a major industry in Arizona.
This week I got a comment on my blog post [IBM Announces another SSD Disk offering!]. The exchange involved Solid State Disk storage inside the BladeCenter and System x server line. Sandeep offered his amazing performance results, but we have no way to get in contact with him. So, for those interested, I have posted on SlideShare.net a quick five-chart presentation on recent tests with various SSD offerings on the eX5 product line here:
Continuing my drawn out coverage of IBM's big storage launch of February 9, today I'll cover the IBM System Storage TS7680 ProtecTIER data deduplication gateway for System z.
On the host side, TS7680 connects to mainframe systems running z/OS or z/VM over FICON attachment, emulating an automated tape library with 3592-J1A devices. The TS7680 includes two controllers that emulate the 3592 C06 model, with 4 FICON ports each. Each controller emulates up to 128 virtual 3592 tape drives, for a total of 256 virtual drives per TS7680 system. The mainframe sees up to 1 million virtual tape cartridges, up to 100GB raw capacity each, before compression. For z/OS, the automated library has full SMS Tape and Integrated Library Management capability that you would expect.
Inside, both control units are connected to a redundant, clustered pair of ProtecTIER engines running the HyperFactor deduplication algorithm, which deduplicates inline, as data is ingested, rather than using the post-process approach of other deduplication solutions. These engines are similar to the TS7650 gateway machines for distributed systems.
On the back end, these ProtecTIER deduplication engines are then connected to external disk, up to 1PB. If you get 25x data deduplication ratio on your data, that would be 25PB of mainframe data stored on only 1PB of physical disk. The disk can be any disk supported by ProtecTIER over FCP protocol, not just the IBM System Storage DS8000, but also the IBM DS4000, DS5000 or IBM XIV storage system, various models of EMC and HDS, and of course the IBM SAN Volume Controller (SVC) with all of its supported disk systems.
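The capacity figures above are simple multiplication, and a few lines of Python make them easy to replay with your own numbers. This is only a back-of-the-envelope sketch: the function names are my own for illustration, the 25x ratio is the hypothetical from the paragraph above (your data may dedupe better or worse), and I am assuming decimal units (1PB = 1,000,000GB).

```python
# Back-of-the-envelope TS7680 sizing, using only figures quoted in this post.
# Function and constant names are illustrative, not part of any IBM tool.

CONTROLLERS = 2
VIRTUAL_DRIVES_PER_CONTROLLER = 128
MAX_VIRTUAL_CARTRIDGES = 1_000_000
CARTRIDGE_RAW_GB = 100
GB_PER_PB = 1_000_000  # assumption: decimal units


def logical_capacity_pb(physical_pb, dedup_ratio):
    """Mainframe data (in PB) that fits on the given physical disk."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return physical_pb * dedup_ratio


def physical_needed_pb(logical_pb, dedup_ratio):
    """Physical disk (in PB) needed to hold the given amount of data."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return logical_pb / dedup_ratio


total_virtual_drives = CONTROLLERS * VIRTUAL_DRIVES_PER_CONTROLLER
max_virtual_raw_pb = MAX_VIRTUAL_CARTRIDGES * CARTRIDGE_RAW_GB / GB_PER_PB

print(total_virtual_drives)          # 256 virtual drives per system
print(max_virtual_raw_pb)            # 100.0 PB of virtual cartridges, before compression
print(logical_capacity_pb(1, 25))    # 25 PB of data on 1PB of disk at 25x
```

Swap in a more conservative ratio, say 10x, to see how sensitive the sizing is to how well your particular data deduplicates.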
It's Tuesday, and that means more IBM announcements!
I haven't even finished blogging about all the other stuff that got announced last week, and here we are with more announcements. Since IBM's big [Pulse 2010 Conference] is next week, I thought I would cover this week's announcement on Tivoli Storage Manager (TSM) v6.2 release. Here are the highlights:
Client-Side Data Deduplication
This is sometimes referred to as "source-side" deduplication, as storage admins can get confused on which servers are clients in a TSM client-server deployment. The idea is to identify duplicates at the TSM client node, before sending to the TSM server. This is done at the block level, so even files that are similar but not identical, such as slight variations from a master copy, can benefit. The dedupe process is based on a shared index across all clients, and the TSM server, so if you have a file that is similar to a file on a different node, the duplicate blocks that are identical in both would be deduplicated.
This feature is available for both backup and archive data, and can also be useful for archives using the IBM System Storage Archive Manager (SSAM) v6.2 interface.
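For those who like to see the idea in code, here is a minimal Python sketch of source-side, block-level deduplication against a shared index. To be clear, this is not TSM's implementation: the fixed 4096-byte block size, the SHA-256 fingerprints, and the `DedupServer` and `backup` names are all assumptions I made up for illustration. The announcement does not say how TSM chunks or fingerprints data.

```python
import hashlib


class DedupServer:
    """Toy stand-in for a backup server holding the shared block index."""

    def __init__(self):
        self.index = {}   # fingerprint -> block contents
        self.files = {}   # (node, filename) -> ordered list of fingerprints

    def has_block(self, digest):
        return digest in self.index

    def store_block(self, digest, block):
        self.index[digest] = block

    def commit_file(self, node, name, digests):
        self.files[(node, name)] = list(digests)

    def restore(self, node, name):
        return b"".join(self.index[d] for d in self.files[(node, name)])


def backup(server, node, name, data, block_size=4096):
    """Back up data from a client node; return the bytes actually sent."""
    bytes_sent = 0
    digests = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # Ask the shared index first; only unknown blocks cross the wire.
        if not server.has_block(digest):
            server.store_block(digest, block)
            bytes_sent += len(block)
        digests.append(digest)
    server.commit_file(node, name, digests)
    return bytes_sent


if __name__ == "__main__":
    server = DedupServer()
    master = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    variant = b"A" * 4096 + b"X" * 4096 + b"C" * 4096  # one block differs
    print(backup(server, "node1", "master.doc", master))    # all three blocks sent
    print(backup(server, "node2", "variant.doc", variant))  # only the changed block sent
```

Note that the second file comes from a different node, yet two of its three blocks are never transmitted because the index is shared across all clients. A real product would also have to deal with insertions that shift block boundaries, which fixed-size blocks like these handle poorly.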
Simplified management of Server virtualization
TSM 6.2 improves its support of VMware guests by adding auto-discovery. Now, when you spontaneously create a new virtual machine OS guest image, you won't have to tell TSM, it will discover this automatically! TSM's legendary support of VMware Consolidated Backup (VCB) now eliminates the manual process of keeping track of guest images. TSM also added support of the vStorage API for file-level backup and recovery.
While IBM is the #1 reseller of VMware, we also support other forms of server virtualization. In this release, IBM adds support for Microsoft Hyper-V, including support using Microsoft's Volume Shadow Copy Services (VSS).
Automated Client Deployment
Do you have clients at all different levels of TSM backup-archive client code deployed all over the place? TSM v6.2 can upgrade these clients up to the latest client level automatically, using push technology, from any client running v5.4 and above. This can be scheduled so that only certain clients are upgraded at a time.
Simultaneous Background Tasks
The TSM server has many background administrative tasks:
Migration of data from one storage pool to another, based on policies, such as moving backups and archives on a disk pool over to a tape pool to make room for new incoming data.
Storage pool backup, typically data on a disk pool is copied to a tape pool to be kept off-site.
Copy active data. In TSM terminology, if you have multiple backup versions, the most recent version is called the active version, and the older versions are called inactive. TSM can copy just the active versions to a separate, smaller disk pool.
In previous releases, these were done one at a time, so it could make for a long service window. With TSM v6.2, these three tasks are now run simultaneously, in parallel, so that they all get done in less time, greatly reducing the server maintenance window, and freeing up tape drives for incoming backup and archive data. Often, the same file on a disk pool is going to be processed by two or more of these scheduled tasks, so it makes sense to read it once and do all the copies and migrations at one time while the data is in buffer memory.
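The "read it once" argument is easy to demonstrate with a toy model. The Python below is purely illustrative and assumes nothing about TSM internals: `DiskPool`, `StoredFile` and the two `run_*` functions are names I invented, and the three tasks are reduced to appending file names to lists. All it shows is that one combined pass produces the same results with a third of the disk reads.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    name: str
    active: bool  # True for the most recent backup version


class DiskPool:
    """Toy disk pool that counts how many times files are read."""

    def __init__(self, files):
        self._files = list(files)
        self.reads = 0

    def read_all(self):
        for f in self._files:
            self.reads += 1
            yield f


def run_one_at_a_time(pool):
    """Three separate passes, one per background task."""
    migrated = [f.name for f in pool.read_all()]
    offsite = [f.name for f in pool.read_all()]
    active_copy = [f.name for f in pool.read_all() if f.active]
    return migrated, offsite, active_copy


def run_simultaneously(pool):
    """One pass: each file is read once and handed to all three tasks."""
    migrated, offsite, active_copy = [], [], []
    for f in pool.read_all():
        migrated.append(f.name)         # migration to the tape pool
        offsite.append(f.name)          # storage pool backup
        if f.active:
            active_copy.append(f.name)  # copy active data only
    return migrated, offsite, active_copy


if __name__ == "__main__":
    files = [StoredFile("a", True), StoredFile("b", False), StoredFile("c", True)]
    old, new = DiskPool(files), DiskPool(files)
    assert run_one_at_a_time(old) == run_simultaneously(new)
    print(old.reads, new.reads)  # 9 reads versus 3 reads
```

In practice the savings depend on how many files are actually touched by more than one task, which this sketch simply assumes is all of them.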
Enhanced Security during Data Transmission
Previous releases of TSM offered secure in-flight transmission of data for Windows and AIX clients. This security uses Secure Sockets Layer (SSL) with 256-bit AES encryption. With TSM v6.2, this feature is expanded to support Linux, HP-UX and Solaris.
Improved support for Enterprise Resource Planning (ERP) applications
I remember back when we used to call these TDPs (Tivoli Data Protectors). TSM for ERP allows backup of ERP applications, seamlessly integrating with database-specific tools like IBM DB2, Oracle RMAN, and SAP BR*Tools. This allows one-to-many and many-to-one configurations between SAP servers and TSM servers. In other words, you can have one SAP server back up to several TSM servers, or several SAP servers back up to a single TSM server. This is done by splitting up databases into "sub-database objects", and then processing each object separately. This can be extremely helpful if you have databases over 1TB in size. In the event that backing up an object fails and has to be restarted, it does not impact the backup of the other objects.
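Here is a rough Python sketch of why splitting a database into sub-database objects helps with restarts. It is a thought experiment, not the TSM for ERP code: the object naming, the 1000GB object size, the retry limit and the `send` callback are all hypothetical choices of mine.

```python
def split_into_objects(db_name, size_gb, object_size_gb):
    """Split a database into named sub-database objects of bounded size."""
    if object_size_gb <= 0:
        raise ValueError("object_size_gb must be positive")
    objects = []
    remaining = size_gb
    part = 0
    while remaining > 0:
        chunk = min(object_size_gb, remaining)
        objects.append((f"{db_name}.part{part}", chunk))
        remaining -= chunk
        part += 1
    return objects


def backup_objects(objects, send, max_attempts=3):
    """Back up each object separately; a failure only restarts that object.

    Returns a dict of object name -> attempts used (None if it never succeeded).
    """
    attempts_used = {}
    for name, size_gb in objects:
        for attempt in range(1, max_attempts + 1):
            try:
                send(name, size_gb)
                attempts_used[name] = attempt
                break
            except IOError:
                continue  # retry this object only
        else:
            attempts_used[name] = None
    return attempts_used


if __name__ == "__main__":
    pending_failures = {"erp.part1": 1}  # simulate one transient failure

    def flaky_send(name, size_gb):
        if pending_failures.get(name, 0) > 0:
            pending_failures[name] -= 1
            raise IOError(f"transfer of {name} interrupted")

    objects = split_into_objects("erp", 2500, 1000)
    print(backup_objects(objects, flaky_send))
```

With a 2.5TB database split this way, the interrupted middle object is resent on its own, while the first and last objects go through on their first attempt. The `send` callback could just as easily route different objects to different servers, which is the one-to-many case described above.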
Continuing on the [IBM Storage Launch of February 9], John Sing has offered to write the following guest post about the [announcement] of IBM Scale Out Network Attached Storage [IBM SONAS]. John and I have known each other for a while, traveled the world to work with clients and speak at conferences. He is an Executive IT Consultant on the SONAS team.
Guest Post written by John Sing, IBM San Jose, California
What is IBM SONAS? It’s many things, so let’s start with this list:
It’s IBM’s delivery of a productized, pre-packaged Scale Out NAS global virtual file server, delivered in an easy-to-use appliance
IBM’s solution for large enterprise file-based storage requirements, where massive scale in capacity and extreme performance is required, especially for today’s modern analytics-based Competitive Advantage IT applications
Scales to many petabytes of usable storage and billions of files in a single global namespace
Provides integrated central management, central deployment of petabyte levels of storage
Modular commercial-off-the-shelf [COTS] building blocks. I/O, storage, network capacity scale independently of each other. Up to 30 interface nodes and 60 storage nodes, in an IBM General Parallel File System [GPFS]-based cluster. Each 10Gb CEE interface node port is capable of streaming at 900 MB/sec
Files are written in block-sized chunks, striped over many disk drives in parallel – aggregating throughput on a massive scale (both read and write), as well as providing auto-tuning, auto-balancing
Functionality delivered via one program product, IBM SONAS Software, which provides all of the above functions, along with clustered CIFS, NFS v2/v3 with session auto-failover, FTP, high availability, and more
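To make the striping item in the list above concrete, here is a minimal Python sketch of writing a file in block-sized chunks across several disks and reading it back. It assumes simple round-robin placement, which is an illustrative simplification and not a description of how GPFS actually places blocks; the function names and the tiny block size are hypothetical.

```python
def stripe(data, num_disks, block_size):
    """Split data into block-sized chunks and deal them round-robin to disks."""
    if num_disks <= 0 or block_size <= 0:
        raise ValueError("num_disks and block_size must be positive")
    disks = [[] for _ in range(num_disks)]
    for i, offset in enumerate(range(0, len(data), block_size)):
        disks[i % num_disks].append(data[offset:offset + block_size])
    return disks


def reassemble(disks):
    """Read the chunks back in the order they were striped."""
    chunks = []
    longest = max((len(d) for d in disks), default=0)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


if __name__ == "__main__":
    data = bytes(range(10))
    disks = stripe(data, num_disks=3, block_size=2)
    for n, disk in enumerate(disks):
        print(n, [chunk.hex() for chunk in disk])
    assert reassemble(disks) == data
```

Because consecutive chunks land on different disks, a large sequential read or write can keep every drive busy at once, which is where the aggregated throughput comes from.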
IBM SONAS makes automated tiered storage achievable and realistic at petabyte levels:
Integrated high performance parallel scan engine capable of identifying files at over 10 million files per minute per node
Integrated parallel data movement engine to physically relocate the data within tiered storage
And we’re just scratching the surface. IBM has plans to deploy additional protocols, storage hardware options, and software features.
However, the real question of interest should be, “who really needs that much storage capacity and throughput horsepower?”
The answer may surprise you. IMHO, the answer is: almost any modern enterprise that intends to stay competitive. Hmmm…… Consider this: the reason that IT exists today is no longer to simply save cost (that may have been true 10 years ago). Everyone is reducing cost… but how much competitive advantage is purchased through “let’s cut our IT budget by 10% this year”?
Notice that in today’s world, there are (many) bright people out there, changing our world every day through New Intelligence Competitive Advantage analytics-based IT applications such as real time GPS traffic data, real time energy monitoring and redirection, real time video feed with analytics, text analytics, entity analytics, real time stream computing, image recognition applications, HDTV video on demand, etc. Think of how the GPS industry, cell phone / Twitter / Facebook, iPhone and iPad applications, as examples, are creating whole new industries and markets almost overnight.
Then start asking yourself, “What's behind these Competitive Advantage IT applications – as they are the ones that are driving all my storage growth? Why do they need so much storage? What do those applications mean for my storage requirements?”
To be “real-time”, long-held IT paradigms are being broken every day. Things like “data proximity”: we can no longer extract terabytes of data from production databases and load them to a data warehouse – where’s the “real-time” in that? Instead, today’s modern analytics-based applications demand:
Multiple processes and servers (sometimes numbering in the 100s) simultaneously ….
Running against hundreds of terabytes of live production data, streaming in from an expanding number of smarter sensors, input devices, users
Producing digital image-intensive results that must be programmatically sent to an ever increasing number of mobile devices in geographically dispersed storage
Requiring parallel performance levels, that used to be the domain only of High Performance Computing (HPC)
This is a major paradigm shift in storage – and that is the solution and storage capabilities that IBM SONAS is designed to address. And of course, you should be able to save significant cost through the SONAS global virtual file server consolidation and virtualization as well.
Certainly, this topic warrants more discussion. If you found it interesting, contact me, your local IBM Business Partner or IBM Storage rep to discuss Competitive Advantage IT applications and SONAS further.
Wrapping up my coverage of the Data Center Conference 2009, the week ends with a celebration. This year we had six "Hospitality Suites" sponsored by various different vendors. Each suite has its own theme, decorations and entertainment. The first suite was VMware's "Cloud 9 Ultra Lounge" which offered blue cotton candy martinis. IBM is the leading reseller of VMware.
When the red martini liquid was poured on top of the blue cotton candy, the result was a nasty muddish brown grey color. The guy on the left chose to get the martini without the blue cotton candy. We joked that this is perhaps a good metaphor for cloud computing in general. It looks good on paper, until you actually put it all together and realize it does not look as blue and puffy as you were expecting. However, it tasted good!
Next suite was sponsored by Cisco, one of IBM's storage networking partners. Cisco also decorated in blue, as the guy Jake in the middle demonstrates.
Next suite was sponsored by Brocade, our supplier for IBM-branded networking gear. They went with a red-and-black color scheme. Sadly, many of my pictures inside involved straitjackets and unicycles, so they are not appropriate for this blog. However, it was easy to remember that they were talking about their "extraordinary networks". Makes you want to help out Brocade by contacting your nearest IBM storage sales rep and buying yourself a SAN768B or two.
Somewhere along the way, we picked up Hawaiian leis at the "Margaritaville" Hospitality Suite, compliments of sponsor APC by Schneider Electric. We had the best "Filet Mignon" appetizers at "Club Dedupe" by our competitor DataDomain, and some fun with my friends over at Computer Associates' "Top Gun" suite. Pictured at right are Paula Koziol with Christian Barrera from Argentina. A good time was had by all.
Well, it's Tuesday again, but this time, today we had our third big storage launch of 2009! A lot got announced today as part of IBM's big "Dynamic Infrastructure" marketing campaign. I will just focus on the disk-related announcements today:
IBM System Storage DS8700
IBM adds a new model to its DS8000 series with the [IBM System Storage DS8700]. Earlier this month, fellow blogger and arch-nemesis Barry Burke from EMC posted [R.I.P DS8300] on the mistaken assumption that the new DS8700 meant that DS8300 was going away, or that anyone who bought a DS8300 recently would be out of luck. Obviously, I could not respond until today's announcement, as the last thing I want to do is lose my job disclosing confidential information. BarryB is wrong on both counts:
IBM will continue to sell the DS8100 and DS8300, in addition to the new DS8700.
Clients can upgrade their existing DS8100 or DS8300 systems to DS8700.
BarryB's latest post [What's In a Name - DS8700] is fair game, given all the fun and ridicule everyone had at his expense over EMC's "V-Max" name.
So the DS8700 is new hardware with only 4 percent new software. On the hardware side, it uses faster POWER6 processors instead of POWER5+, has faster PCI-e buses instead of the RIO-G loops, and faster four-port device adapters (DAs) for added bandwidth between cache and drives. The DS8700 can be ordered as a single-frame dual 2-way that supports up to 128 drives and 128GB of cache, or as a dual 4-way, consisting of one primary frame, and up to four expansion frames, with up to 384GB of cache and 1024 drives.
Not mentioned explicitly in the announcements were the things the DS8700 does not support:
ESCON attachment - Now that FICON is well-established for the mainframe market, there is no need to support the slower, bulkier ESCON options. This greatly reduced testing effort. The 2-way DS8700 can support up to 16 four-port FICON/FCP host adapters, and the 4-way can support up to 32 host adapters, for a maximum of 128 ports. The FICON/FCP host adapter ports can auto-negotiate between 4Gbps, 2Gbps and 1Gbps as needed.
LPAR mode - When IBM and HDS introduced LPAR mode back in 2004, it sounded like a great idea the engineers came up with. Most other major vendors followed our lead to offer similar "partitioning". However, it turned out to be what we call in the storage biz a "selling apple" not a "buying apple". In other words, something the salesman can offer as a differentiating feature, but that few clients actually use. It turned out that supporting both LPAR and non-LPAR modes merely doubled the testing effort, so IBM got rid of it for the DS8700.
Update: I have been reminded that both IBM and HDS delivered LPAR mode within a month of each other back in 2004, so it was wrong for me to imply that HDS followed IBM's lead when obviously development happened in both companies for the most part concurrently prior to that. EMC was late to the "partition" party, but who's keeping track?
Initial performance tests show up to 50 percent improvement for random workloads, and up to 150 percent improvement for sequential workloads, and up to 60 percent improvement in background data movement for FlashCopy functions. The results varied slightly between Fixed Block (FB) LUNs and Count-Key-Data (CKD) volumes, and I hope to see some SPC-1 and SPC-2 benchmark numbers published soon.
The DS8700 is compatible for Metro Mirror, Global Mirror, and Metro/Global Mirror with the rest of the DS8000 series, as well as the ESS model 750, ESS model 800 and DS6000 series.
New 600GB FC and FDE drives
IBM now offers [600GB drives] for the DS4700 and DS5020 disk systems, as well as the EXP520 and EXP810 expansion drawers. In each case, we are able to pack up to 16 drives into a 3U enclosure.
Personally, I think the DS5020 should have been given a DS4xxx designation, as it resembles the DS4700 more than the other models of the DS5000 series. Back in 2006-2007, I was the marketing strategist for the IBM System Storage product line, and part of my job involved all of the meetings to name or rename products. Mostly I gave reasons why products should NOT be renamed, and why it was important to name the products correctly at the beginning.
IBM System Storage SAN Volume Controller hardware and software
Fellow IBM Master Inventor Barry Whyte has been covering the latest on the [SVC 2145-CF8 hardware]. IBM put out a press release last week on this, and today is the formal announcement with prices and details. Barry's latest post [SVC CF8 hardware and SSD in depth] covers just part of the entire announcement.
The other part of the announcement was the [SVC 5.1 software] which can be loaded on earlier SVC models 8F2, 8F4, and 8G4 to gain better performance and functionality.
To avoid confusion on what is hardware machine type/model (2145-CF8 or 2145-8A4) and what is software program (5639-VC5 or 5639-VW2), IBM has introduced two new [Solution Offering Identifiers]:
5465-028 Standard SAN Volume Controller
5465-029 Entry Edition SAN Volume Controller
The latter is designed for smaller deployments. It supports only a single SVC node-pair managing up to 150 disk drives, and is available in Raven Black or Flamingo Pink.
EXN3000 and EXP5060 Expansion Drawers
IBM offers the [EXN3000 for the IBM N series]. These expansion drawers can pack 24 drives in a 4U enclosure. The drives can be either all-SAS or all-SATA, in 300GB, 450GB, 500GB and 1TB capacities.
The [EXP5060 for the IBM DS5000 series] is a high-density expansion drawer that can pack up to 60 drives into a 4U enclosure. A DS5100 or DS5300 can handle up to eight of these expansion drawers, for a total of 480 drives.
Pre-installed with Tivoli Storage Productivity Center Basic Edition. Basic Edition can be upgraded with license keys to support Data, Disk and Standard Edition to extend support and functionality to report and manage XIV, N series, and non-IBM disk systems.
Pre-installed with Tivoli Key Lifecycle Manager (TKLM). This can be used to manage the Full Disk Encryption (FDE) encryption-capable disk drives in the DS8000 and DS5000, as well as LTO and TS1100 series tape drives.
IBM Tivoli Storage FlashCopy Manager v2.1
The [IBM Tivoli Storage FlashCopy Manager V2.1] replaces two products with one. IBM used to offer IBM Tivoli Storage Manager for Copy Services (TSM for CS) that protected Windows application data, and IBM Tivoli Storage Manager for Advanced Copy Services (TSM for ACS) that protected AIX application data.
The new product has some excellent advantages. FlashCopy Manager offers application-aware backup of LUNs containing SAP, Oracle, DB2, SQL Server and Microsoft Exchange data. It can support IBM DS8000, SVC and XIV point-in-time copy functions, as well as the Volume Shadow Copy Services (VSS) interfaces of the IBM DS5000, DS4000 and DS3000 series disk systems. It is priced by the number of TB you copy, not by the speed or number of CPU processors inside the server.
Don't let the name fool you. IBM FlashCopy Manager does not require that you use Tivoli Storage Manager (TSM) as your backup product. You can run IBM FlashCopy Manager on its own, and it will manage your FlashCopy target versions on disk, and these can be backed up to tape or another disk using any backup product. However, if you are lucky enough to also be using TSM, then there is optional integration that allows TSM to manage the target copies, move them to tape, inventory them in its DB2 database, and provide complete reporting.
Yup, that's a lot to announce in one day. And this was just the disk-related portion of the launch!
I saw this as an opportunity to promote the new IBM Tivoli Storage Manager v6.1 which offers a variety of new scalability features, and continues to provide excellent economies of scale for large deployments, in my post [IBM has scalable backup solutions].
"So does TSM scale? Sure! Just add more servers. But this is not an economy of scale. Nothing gets less expensive as the capacity grows. You get a more or less linear growth of costs that is directly correlated to the growth of primary storage capacity. (Technically, it costs will jump at regular and predictable intervals, by regular and predictable and equal amounts, as you add TSM servers to the infrastructure--but on average it is a direct linear growth. Assuming you are right sized right now, if you were to double your primary storage capacity, you would double the size of the TSM infrastructure, and double your associated costs.)"
I talked about inaccurate vendor FUD in my post [The murals in restaurants], and recently, I saw StorageBod's piece, [FUDdy Waters]. So what would "economies of scale" look like? Using Scott's own words:
Without Economies of Scale
"If it costs you $5 to backup a given amount of data, it probably costs you $50 to back up 10 times that amount of data, and $500 to back up 100 times that amount of data."
With Economies of Scale
"If anybody can figure out how to get costs down to $40 for 10 times the amount of data, and $300 for 100 times the amount of data, they will have an irrefutable advantage over anybody that has not been able to leverage economies of scale."
So, let's do some simple examples. I'll focus on a backup solution just for employee workstations, where each employee has 100GB of personal data to back up on their laptop or PC. We'll look at a one-person company, a ten-person company, and a hundred-person company.
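Before walking through the cases, the arithmetic in the two quotes above can be made explicit. The following is a minimal Python sketch that only divides the quoted dollar figures by the amount of data. The $5 baseline for the second scenario is an assumption carried over from the first quote, and the variable and function names are hypothetical.

```python
# Dollar figures come from the two quotes above. Keys are multiples of the
# original amount of data; values are total backup cost in dollars.
# Assumption: the "with economies of scale" scenario starts from the same
# $5 baseline as the linear one.
linear_costs = {1: 5, 10: 50, 100: 500}
scaled_costs = {1: 5, 10: 40, 100: 300}


def cost_per_unit(costs):
    """Return the cost per unit of data at each scale factor."""
    return {scale: cost / scale for scale, cost in costs.items()}


linear_unit = cost_per_unit(linear_costs)
scaled_unit = cost_per_unit(scaled_costs)

for scale in sorted(linear_costs):
    print(
        f"{scale:>3}x data: "
        f"linear ${linear_unit[scale]:.2f}/unit, "
        f"with economies of scale ${scaled_unit[scale]:.2f}/unit"
    )
```

In the linear scenario the cost per unit never changes, while in the second scenario it falls as the amount of data grows. That falling unit cost is what the three cases below are looking for.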
Case 1: The one-person company
Here the sole owner needs a backup solution. Here are all the steps she might perform:
Spend hours of time evaluating different backup products available, and make sure her operating system, file system and applications are supported
Spend hours shopping for external media. This could be an external USB disk drive, optical DVD drive, or tape drive. Confirm it is supported by the selected backup software.
Purchase the backup software, external drive, and if optical or tape, blank media cartridges.
Spend time learning the product, purchasing "Backup for Dummies" or a similar book, and/or taking a training class.
Install and configure the software
Operate the software, or set it up to run automatically, take the media offsite at the end of the day, and bring it back each morning
Case 2: The ten-person company
I guess if each of the ten employees went off and performed all of the same steps as above, there would be no economies of scale.
Fortunately, co-workers are amazingly efficient in avoiding unnecessary work.
Rather than have all ten people evaluate backup solutions, have one person do it. If everyone runs the same or similar operating system, file systems and applications, this can be done about the same as the one-person case.
Ditto on the storage media. Why should 10 people go off and evaluate their own storage media? One person can do it for all ten people in about the same time as it takes for one person.
Purchasing the software and hardware. Ok, here is where some costs may be linear, depending on your choices. Some software vendors give bulk discounts, so purchasing 10 seats of the same software could be less than 10 times the cost of one license. As for storage hardware, it might be possible to share drives and even media. Perhaps one or two storage systems can be shared by the entire team.
For a lot of backup software, most of the work is in the initial set up, then it runs automatically afterwards. That is the case for TSM. You create a "dsm.opt" file, and it can list all of the include/exclude files and other rules and policies. Once the first person sets this up, they share it with their co-workers.
Hopefully, if storage hardware was consolidated, such that you have fewer drives than people, you can probably have fewer people responsible for operations. For example, let's have the first five employees sharing one drive managed by Joe, and the second five employees sharing a second drive managed by Sally. Only two people need to spend time taking media offsite, bringing it back and so on.
Case 3: The hundred-person company
Again, it is possible that a hundred-person company consists of 10 departments of 10 people each, and they all follow the above approach independently, resulting in no economies of scale. But again, that is not likely.
Here one or a few people can invest time to evaluate backup solutions. Certainly far less than 100 times the effort for a one-person company.
Same with storage media. With 100 employees, you can now invest in a tape library with robotic automation.
Purchase of software and hardware. Again, discounts will probably apply for large deployments. Purchasing one tape library for all one hundred people costs less, and takes less effort, than 10 departments all making independent purchases.
With a hundred employees, you may have some differences in operating system, file systems and applications. Still, this might mean two to five versions of dsm.opt, and not 10 or 100 independent configurations.
Operations is where the big savings happen. TSM has "progressive incremental backup" so it only backs up changed data. Other backup schemes involve taking periodic full backups which tie up the network and consume a lot of back end resources. In head-to-head comparisons between IBM Tivoli Storage Manager and Symantec's NetBackup, IBM TSM was shown to use significantly less LAN bandwidth, less disk storage capacity, and fewer tape cartridges than NetBackup.
The savings are even greater with data deduplication. Either using hardware, like the IBM TS7650 ProtecTIER data deduplication solution, or software, like the data deduplication capability built into IBM TSM v6.1, you can take advantage of the fact that 100 employees might have a lot of common data between them.
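To illustrate the operations point, here is a rough Python sketch comparing the data sent over the network by a weekly-full-plus-daily-incremental scheme against a progressive incremental scheme. The 2 percent daily change rate, the 28-day window, and the function names are assumptions for illustration only. They are not measurements, and the code does not model how TSM or any other product works internally.

```python
def weekly_full_traffic(total_gb, daily_change, days):
    """GB sent when a full backup runs every 7th day and an
    incremental backup runs on the days in between."""
    moved = 0.0
    for day in range(days):
        if day % 7 == 0:
            moved += total_gb                 # periodic full backup
        else:
            moved += total_gb * daily_change  # changed data only
    return moved


def progressive_incremental_traffic(total_gb, daily_change, days):
    """GB sent with one initial full backup, then changed data only."""
    return total_gb + total_gb * daily_change * (days - 1)


total_gb = 100 * 100   # 100 employees x 100GB each, as in the example
daily_change = 0.02    # hypothetical: 2% of the data changes per day
days = 28              # hypothetical four-week window

full = weekly_full_traffic(total_gb, daily_change, days)
prog = progressive_incremental_traffic(total_gb, daily_change, days)

print(f"Weekly full + incrementals: {full:,.0f} GB")
print(f"Progressive incremental:    {prog:,.0f} GB")
```

Under these assumed numbers, the weekly-full scheme moves roughly three times as much data as the progressive incremental scheme. The gap depends entirely on the change rate and the length of the window.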
So, I have demonstrated how savings through economies of scale are achieved using IBM Tivoli Storage Manager. Adding one more person in each case is cheaper than the first person. The situation is not linear as Scott suggests. But what about larger deployments? IBM TS3500 Tape Library can hold one PB of data in only 10 square feet of data center floorspace. The IBM TS7650G gateway can manage up to 1 PB of disk, holding as much as 25 PB of backup copies. IT Analysts Tony Palmer, Brian Garrett and Lauren Whitehouse from Enterprise Strategy Group tried IBM TSM v6.1 out for themselves and wrote up a ["Lab Validation"] report. Here is an excerpt:
"Backup/recovery software that embeds data reduction technology can address all three of these factors handily. IBM TSM 6.1 now has native deduplication capabilities built into its Extended Edition (EE) as a no-cost option. After data is written to the primary disk pool, a deduplication operation can be scheduled to eliminate redundancy at the sub-file level. Data deduplication, as its name implies, identifies and eliminates redundant data.
TSM 6.1 also includes features that optimize TSM scalability and manageability to meet increasingly demanding service levels resulting from relentless data growth. The move from a proprietary back-end database to IBM DB2 improves scalability, availability, and performance without adding complexity; the DB2 database is automatically maintained and managed by TSM. IBM upgraded the monitoring and reporting capabilities to near real-time and completely redesigned the dashboard that provides visibility into the system. TSM and TSM EE include these enhanced monitoring and reporting capabilities at no cost."
The majority of Fortune 1000 customers use IBM Tivoli Storage Manager, and it is the backup software that IBM uses itself in its own huge data centers, including the cloud computing facilities. In combination with IBM Tivoli FastBack for remote office/branch office (ROBO) situations, and complemented with point-in-time and disk mirroring hardware capabilities such as IBM FlashCopy, Metro Mirror, and Global Mirror, IBM Tivoli Storage Manager can be an effective, scalable part of a complete Unified Recovery Management solution.
This week, scientists at IBM Research and the California Institute of
Technology announced a scientific advancement that could be a major
breakthrough in enabling the semiconductor industry to pack more power
and speed into tiny computer chips, while making them more energy
efficient and less expensive to manufacture. IBM is a leader in
solid-state technology, and this scientific breakthrough shows promise.
But first, a discussion of how solid-state chips are made in the first place. Basically, a round thin wafer is etched using [photolithography]
with lots of tiny transistor circuits. The same chip is repeated over
and over on a single wafer, and once the wafer is complete, it is
chopped up into little individual squares. Wikipedia has a nice article
on [semiconductor device fabrication], but I found this
[YouTube video] more illuminating.
Up until now, the industry was able to get features down to 22
nanometers, and was hitting physical limitations in getting down to
anything smaller. The new development from IBM and Caltech is to use
self-assembling DNA strands, folded into specific shapes using other
strands that act as staples, and then using these folded structures as
scaffolding to place in nanotubes. The result? Features as small as 6 nanometers. How cool is that?
While NAND Flash Solid-State Drives are available today, this new
technique can help develop newer, better technologies like Phase Change
Continuing my week in Chicago, for the IBM Storage Symposium 2008, we had sessions that focused on individual products. IBM System Storage SAN Volume Controller (SVC) was a popular topic.
SVC - Everything you wanted to know, but were afraid to ask!
Bill Wiegand, IBM ATS, who has been working with SAN Volume Controller since it was first introduced in 2003, answered some frequently asked questions about IBM System Storage SAN Volume Controller.
Do you have to upgrade all of your HBAs, switches and disk arrays to the recommended firmware levels before upgrading SVC? No. These are recommended levels, but not required. If you do plan to update firmware levels, focus on the host end first, switches next, and disk arrays last.
How do we request special support for stuff not yet listed on the Interop Matrix?
Submit an RPQ/SCORE, same as for any other IBM hardware.
How do we sign up for SVC hints and tips? Go to the IBM
[SVC Support Site] and select the "My Notifications" under the "Stay Informed" box on the right panel.
When we call IBM for SVC support, do we select "Hardware" or "Software"?
While the SVC is a piece of hardware, there are very few mechanical parts involved. Unless there are sparks,
smoke, or front bezel buttons dangling from springs, select "Software". Most of the questions are
related to the software components of SVC.
When we have SVC virtualizing non-IBM disk arrays, who should we call first?
IBM has world-renowned service, with some of IT's smartest people working the queues. All of the major storage vendors play nice
as part of the [TSAnet Agreement] when a mutual customer is impacted.
When in doubt, call IBM first, and if necessary, IBM will contact other vendors on your behalf to resolve.
What is the difference between livedump and a Full System Dump?
Most problems can be resolved with a livedump. While not complete information, it is generally enough,
and is completely non-disruptive. Other times, the full state of the machine is required, so a Full System Dump
is requested. This involves rebooting one of the two nodes, so virtual disks may temporarily run slower on that
What does "svc_snap -c" do? The "svc_snap" command on the CLI generates a snap file, which includes the cluster error log and trace files from all nodes. The "-c" parameter includes the configuration and virtual-to-physical mapping that can be useful for disaster recovery and problem determination.
I just sent IBM a check to upgrade my TB-based license on my SVC, how long should I wait for IBM to send me a software license key?
IBM trusts its clients. No software license key will be sent. Once the check clears, you are good to go.
During migration from old disk arrays to new disk arrays, I will temporarily have 79TB more disk under SVC management, do I need to get a temporary TB-based license upgrade during the brief migration period?
Nope. Again, we trust you. However, if you are concerned about this at all, contact IBM and they will print out
a nice "Conformance Letter" in case you need to show your boss.
How should I maintain my Windows-based SVC Master Console or SSPC server?
Treat this like any other Windows-based server in your shop, install Microsoft-recommended Windows updates,
run Anti-virus scans, and so on.
Where can I find useful "How To" information on SVC?
Specify "SAN Volume Controller" in the search field of the
[IBM Redbooks] vast library of helpful books.
I just added more managed disks to my managed disk group (MDG), can I get help writing a script to redistribute the extents to improve wide-striping performance?
Yes, IBM has scripting tools available for download on
[AlphaWorks]. For example, svctools will take
the output of the "lsinfo" command, and generate the appropriate SVC CLI to re-migrate the disks around to optimize
performance. Of course, if you prefer, you can use IBM Tivoli Storage Productivity Center instead for a more
Any rules of thumb for sizing SVC deployments?
IBM's Disk Magic tool includes support for SVC deployments. Plan for 250 IOPS/TB for light workloads,
500 IOPS/TB for average workloads, and 750 IOPS/TB for heavy workloads.
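As a rough illustration of these rules of thumb, the following Python sketch multiplies capacity by the quoted IOPS/TB figure for each workload class. The function name and the 40TB example are hypothetical, and a quick estimate like this is no substitute for a proper Disk Magic study.

```python
# IOPS-per-TB planning figures quoted in the answer above.
IOPS_PER_TB = {"light": 250, "average": 500, "heavy": 750}


def estimate_iops(capacity_tb, workload):
    """Estimate the IOPS to plan for, given capacity in TB and a
    workload class of "light", "average" or "heavy"."""
    if workload not in IOPS_PER_TB:
        raise ValueError(f"unknown workload class: {workload!r}")
    return capacity_tb * IOPS_PER_TB[workload]


# Hypothetical example: 40TB under SVC management.
for workload in ("light", "average", "heavy"):
    print(f"{workload:>7}: {estimate_iops(40, workload):,} IOPS")
```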
Can I migrate virtual disks from one managed disk group (MDG) to another of different extent size?
Yes, the new Vdisk Mirroring capability can be used to do this. Create the mirror for your Vdisk between the
two MDGs, wait for the copy to complete, and then split the mirror.
Can I add or replace SVC nodes non-disruptively? Absolutely, see the Technotes
[SVC Node Replacement] page.
Can I really order an SVC EE in Flamingo Pink? Yes. While my blog post that started all
this [Pink It and Shrink It] was initially just some Photoshop humor, the IBM product manager for SVC accepted this color choice as an RPQ option.
The default color remains Raven Black.
The focus on square footage resulted in higher density. This reminds me of the classic IBM commercial ["The Heist"] where Gil panics that the roomful of servers is missing, and Ned explains that it was all consolidated onto a single IBM server.
I suspect few people picked up on the fact that the acronym for ["new enterprise datacenter"] spells "Ned", our donut-eating hero in this series of videos.
Costs in the data center are proportional to power usage rather than space.
Power efficiency is more of a behavior problem than it is a technology problem.
This is definitely a step in the right direction. Both servers and storage systems consume a large portion of the energy on the data center floor. IBM Tivoli Usage and Accounting Manager can include energy consumption as part of the chargeback calculations.
Well, we had another successful event in Second Life today.
Unlike our April 26 launch of our System Storage products, which was for IBM Business Partners only, this time we decided to use a "Meet the Storage Experts" Q&A Panel format, and open up registration to everyone. The subject matter experts sat at the front of the room on four stools. We had six rows of chairs arranged semi-circularly.
Shown above, from left to right, are the avatars of our four experts:
IBM System Storage N series, focusing on recent N3000 disk system announcements
Harold Pike (holding the microphone while speaking)
IBM System Storage DS3000 and DS4000 series, focusing on recent DS3000 disk system announcements
IBM System Storage TS series, focusing on recent TS2230, TS3400 and TS7700 tape system announcements
IBM storage networking, focusing on recent IBM SAN256B director blade announcements
While Eric was a veteran Second Lifer, having presented at our April event, the other three were trained on how to raise their hand, speak into the microphone, sit on the stool, and so on. I want to thank all of our experts for putting in this effort!
The event was produced by Katrina H Smith. She did a great job, and made sure we were on top of all the issues and tasks required to get the job done. Running a Second Life event is every bit as hard as running a real face-to-face event. We had several meetings to discuss venue details, placement of chairs, placement of product demos, audio/video recording, wall decorations, tee-shirt and coffee mug design, logistics, and so on.
I acted as moderator/emcee for the event. That is my back in the picture above. The process was simple, modeled after the "Birds of a Feather" sessions at events like SHARE and the IBM Storage and Storage Networking Symposium. We threw out a list of topics the experts would cover, and people in the audience would "raise their left hand". I, as the moderator, would then walk over to each person, and hold out the microphone for them to ask the question. I would then repeat the question and ask the appropriate expert to provide an answer. We defined gestures on how to "raise hand" and "put hand down" that we gave to each registered participant.
We had four dedicated "camera-avatars" in world to capture both video and screenshots. Our video editors are now working to edit "highlight videos" that we can use at future events, for training materials, and for our internal "BlueTube" online video system.
The room was filled with examples of each of our products, made into 3D objects that were dimensionally correct, and "textured" with photographs of the actual products. If you clicked on an object, you got a "notecard" that provided more information. Special thanks to Scott Bissmeyer for making all of these objects for us.
We made posters of each expert and placed them in all four corners of the room. On the bottom of each coffee mug was a picture of one of the experts, and if you walked under each of the posters, you were "dispensed" a coffee mug matching the expert shown in the poster. Participants could "Collect all Four!" When you bring the coffee mug up to take a sip, the picture on the bottom of the mug is exposed for all to see. And as a final give-away to the audience, we made a variety of event tee-shirts and polo-shirts.
At the end of the session, we asked everyone to click on the "Survey" kiosk near the exit door. We asked six simple questions using SurveyMonkey.com that took only a few minutes to answer. We found asking questions immediately at the end of the event was the best way to capture this feedback.
From a "Green" perspective, we had people registered from the following countries: US, India, Mexico, Australia, United Kingdom, Brazil, Germany, Argentina, Chile, China, Canada, and Venezuela. Second Life allows all these people who probably could not travel, or could not afford the time and expense to travel, to participate in a simulated face-to-face meeting without the energy consumption of traditional travel methods.
More importantly, we got several leads for business. People often ask "Yes, but is there any business associated with this?" This time, there was. Based on the answers to the questions, several avatars asked for a real sales call to follow up on the products and offerings that were discussed.
With such a great success, we have already scheduled our next Second Life event, November 8. Mark your calendars! I'll post more details on the registration process of the November event when available.
My how time flies! It has been nearly a year since our new Tucson Executive Briefing Center had its [Ribbon Cutting Ceremony].
To celebrate this achievement, IBM asked me to write and direct a short film to remind everyone we are here to help clients solve problems, determine an appropriate strategy and make solid purchase decisions.
I have produced other videos for IBM. See my October 2013 blog post [Incorporating Videos] for other examples. This was my first time as writer/director for a project.
This video won't win any Oscars, but I would still like to thank the Academy, and my colleagues IBM VP Calline Sanchez, Lee Olguin, Joe Hayward and Kris Keller for agreeing to be filmed on camera. Behind the scenes, I want to thank IBM Fellow John Cohn for his superb narration, Andrew Greenfield as cinematographer and editor, Shelly Jost as creative consultant selecting the musical tracks, and Denise White for reviewing the screenplay. Finally, I want to thank our producer, Bill Terry, for funding this effort.
What do you think? Will it go viral? Enter your comments below!
I am no stranger to the Sugar learning platform, developed as part of the One Laptop per Child [OLPC] project.
As I mentioned in my post [Helping Young Students - part 1], I was part of the OLPC development team back in 2008, helped local volunteers deploy laptops to children in Nepal and Uruguay, mentored a college student in India, and learned a lot about the Python programming language in the process.
Sugar is now developed by Sugar Labs, a nonprofit spin-off of OLPC. The code is free and open source, and is available for many other machines, including as a "Desktop Environment" for Fedora Linux.
I kept my 40GB hard drive partitioned as follows. On the extended partition, sda5 will hold my system utilities, like Clonezilla and SystemRescue, and sda6 is my swap space, increased to 1500MB. Partition sda1 has Edubuntu 12.04 on it, and I will use sda2 to install Fedora with Sugar.
[Sugar-on-a-stick] is so named because it is designed so that each child has their own LiveUSB. This can run on a PC with Windows or Mac OS without affecting those operating systems, allowing a child to use Sugar in the classroom, then take the stick home and continue on their home PC.
A 2GB or greater USB memory stick can hold both Fedora and Sugar, and you can use it to boot your desktop. Unfortunately, it requires 1GB of RAM, and I have only 512MB.
Can I just run Sugar natively on a Fedora install? Yes, thanks to the [Sugar not "on a stick"] instructions, I can install Fedora first, then just:
Fedora Desktop Edition - this is a LiveCD that requires 1GB RAM.
Fedora Network Install - this is a bootable CD that then uses the Internet to download the rest of the files required. Use this if you (a) have a fast Internet connection, or (b) do not have a DVD drive on your system.
Fedora Install DVD - this has all the software on the DVD itself.
I chose method 3 and downloaded the appropriate ISO file. While F17 only requires 512MB of RAM to run, the graphical installer requires 768MB. The installation is fully explained in this [29-step F17 installation guide].
To get around this, I selected "Troubleshooting", which then let me select a low-graphics/text mode installation that ran well in 512MB. I installed both LXDE and Sugar, and everything worked fine!
Why both LXDE and Sugar? Well, Sugar is quite a different environment, and I wanted LXDE as an alternative for the admin and teacher to use.
"Unlike most other desktop environments, Sugar does not use the 'desktop', 'folder' and 'window' metaphors. Instead, Sugar's default full-screen activities require users to focus on only one program at a time. Sugar implements a novel file-handling metaphor (the Journal), which automatically saves the user's running program session and allows him or her to later use an interface to pull up their past works by date, activity used or file type."
Fedora Upgrader tool (FedUp) command line interface
Fedora upgrade script
As you can probably guess from the title of this post, I chose method 2 "FedUp" as it seemed to be the least invasive. I was unsure if method 1 "Clean Install" of F18 would work with 512MB of RAM, and I have been through enough horrors of failed yum upgrades on my own Red Hat Enterprise Linux [RHEL] at work to avoid method 3. Method 4 is just a script to automate the steps of method 3.
The steps are fairly straightforward. First, install the FedUp package, run "yum update" to ensure you have all the latest kernel and F17 packages for everything else, and reboot.
Then run the fedup-cli command, which upgrades all the packages to F18 level and creates a special kernel level that will then finish the install after the second reboot. It took a while, so I let it run unattended. I put the debug log on partition sda5 in case anything went wrong.
What could go wrong? Well, it turns out that fedup works by updating the Grub2 boot loader configuration, but my Grub2 resides on the sda1 partition instead, owned by my existing Edubuntu. The reboot did not give me the option to run the specialized kernel to finish the process.
Fixing this was a hot mess, but I managed to configure Grub2 on Fedora, complete the upgrade, and get everything working as before. However, even though it just came out last year, [F18 version is already out of support]! This means I get a second chance to do FedUp, this time to the F19 release. Oh boy! Fun!
While the second time went smoother, the problem was that F19 doesn't seem to run well in 512MB of RAM, and chances are F20 won't either.
So what have I learned from this?
Fedora is fully supported, has been around over 10 years, with a vibrant and helpful community.
Sugar is designed for kids, so adding a traditional desktop environment like XFCE or LXDE can be useful for an administrator or teacher.
Offering multiple Linux versions in a dual-boot or triple-boot approach may complicate the Grub2 loader configuration and maintenance.
Fedora's "rolling upgrade" approach means that someone will need to consider upgrading to later versions at least every school year or semester to maintain support. Running fedup-cli or any of the other upgrade methods may be too complicated for your average teacher.
If you have any experience with Fedora or Sugar in the classroom, comment below!