Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Systems Client Experience Center in Tucson, Arizona, and a featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
For the longest time, people thought that humans could not run a mile in less than four minutes. Then, in 1954, [Sir Roger Bannister] beat that perception, and shortly thereafter, once he showed it was possible, many other runners were able to achieve this also. The same is being said now about the IBM Watson computer which appeared this week against two human contestants on Jeopardy!
(2014 Update: A lot has happened since I originally wrote this blog post! I intended this as a fun project for college students to work on during their summer break. However, IBM is concerned that some businesses might be led to believe they could simply stand up their own systems based entirely on open source and internally developed code for business use. IBM recommends instead the [IBM InfoSphere BigInsights] which packages much of the software described below. IBM has also launched a new "Watson Group" that has [Watson-as-a-Service] capabilities in the Cloud. To raise awareness to these developments, IBM has asked me to rename this post from IBM Watson - How to build your own "Watson Jr." in your basement to the new title IBM Watson -- How to replicate Watson hardware and systems design for your own use in your basement. I also took this opportunity to improve the formatting layout.)
Often, when a company demonstrates new technology, it is a prototype that will not be ready for commercial deployment for several years. IBM Watson, however, was made mostly from commercially available hardware, software and information resources. As several have noted, the 1TB of data used to search for answers could fit on a single USB drive that you buy at your local computer store.
Take a look at the [IBM Research Team] to determine how the project was organized. Let's decide what we need, and what we don't in our version for personal use:
Do we need it for personal use?
Yes, that's you. Assuming this is a one-person project, you will act as Team Lead.
Yes, I hope you know computer programming!
No, since this version for personal use won't be appearing on Jeopardy, we won't need strategy on wager amounts for the Daily Double, or what clues to pick next. Let's focus merely on a computer that can accept a question in text, and provide an answer back, in text.
Yes, this team focused on how to wire all the hardware together. We need to do that, although this version for personal use will have fewer components.
Optional. For now, let's have this version for personal use just return its answer in plain text. Consider this Extra Credit after you get the rest of the system working. Consider using [eSpeak], [FreeTTS], or the Modular Architecture for Research on speech sYnthesis [MARY] Text-to-Speech synthesizers.
Yes, I will explain what this is, and why you need it.
Yes, we will need to get information for personal use to process.
Yes, this team developed a system for parsing the question being asked, and to attach meaning to the different words involved.
No, this team focused on making IBM Watson optimized to answer in 3 seconds or less. We can accept a slower response, so we can skip this.
(Disclaimer: As with any Do-It-Yourself (DIY) project, I am not responsible if you are not happy with your version for personal use. I am basing the approach on what I read from publicly available sources, and my work in Linux, supercomputers, XIV, and SONAS. For our purposes, this version for personal use is based entirely on commodity hardware, open source software, and publicly available sources of information. Your implementation will certainly not be as fast or as clever as the IBM Watson you saw on television.)
Step 1: Buy the Hardware
Supercomputers are built as a cluster of identical compute servers lashed together by a network. You will be installing Linux on them, so if you can avoid paying extra for Microsoft Windows, that would save you some money. Here is your shopping list:
Three x86 hosts, with the following:
64-bit quad-core processor, either Intel-VT or AMD-V capable,
8GB of DRAM, or larger
300GB of hard disk, or larger
CD or DVD Read/Write drive
Computer Monitor, mouse and keyboard
Ethernet 1GbE 4-port hub, and appropriate RJ45 cables
Surge protector and Power strip
Local Console Monitor (LCM) 4-port switch (formerly known as a KVM switch) and appropriate cables. This is optional, but will make it easier during the development. Once your implementation is operational, you will only need the monitor and keyboard attached to one machine. The other two machines can remain "headless" servers.
Step 2: Establish Networking
IBM Watson used Juniper switches running at 10Gbps Ethernet (10GbE) speeds, but was not connected to the Internet while playing Jeopardy! Instead, these Ethernet links were for the POWER7 servers to talk to each other, and to access files over the Network File System (NFS) protocol to the internal customized SONAS storage I/O nodes.
The implementation will be able to run "disconnected from the Internet" as well. However, you will need Internet access to download the code and information sources. For our purposes, 1GbE should be sufficient. Connect your Ethernet hub to your DSL or Cable modem. Connect all three hosts to the Ethernet hub. Connect your keyboard, video monitor and mouse to the LCM, and connect the LCM to the three hosts.
Step 3: Install Linux and Middleware
To say I use Linux on a daily basis is an understatement. Linux runs on my Android-based cell phone, my laptop at work, my personal computers at home, most of our IBM storage devices from SAN Volume Controller to XIV to SONAS, and even on my Tivo at home which recorded my televised episodes of Jeopardy!
For this project, you can use any modern Linux distribution that supports KVM. IBM Watson used Novell SUSE Linux Enterprise Server [SLES 11]. Alternatively, I can also recommend either Red Hat Enterprise Linux [RHEL 6] or Canonical [Ubuntu v10]. Download the 64-bit "ISO" files for each version, and burn them to CDs. Each distribution of Linux comes in different orientations:
Graphical User Interface (GUI) oriented, often referred to as "Desktop" or "HPC-Head"
Command Line Interface (CLI) oriented, often referred to as "Server" or "HPC-Compute"
Guest OS oriented, to run in a Hypervisor such as KVM, Xen, or VMware. Novell calls theirs "Just Enough Operating System" [JeOS].
For this version for personal use, I have chosen a [multitier architecture], sometimes referred to as an "n-tier" or "client/server" architecture.
Host 1 - Presentation Server
For the Human-Computer Interface [HCI], the IBM Watson received categories and clues as text files via TCP/IP, had a [beautiful avatar] representing a planet with 42 circles streaking across in orbit, and text-to-speech synthesizer to respond in a computerized voice. Your implementation will not be this sophisticated. Instead, we will have a simple text-based Query Panel web interface accessible from a browser like Mozilla Firefox.
Host 1 will be your Presentation Server, the connection to your keyboard, video monitor and mouse. Install the "Desktop" or "HPC Head Node" version of Linux. Install [Apache Web Server and Tomcat] to run the Query Panel. Host 1 will also be your "programming" host. Install the [Java SDK] and the [Eclipse IDE for Java Developers]. If you always wanted to learn Java, now is your chance. There are plenty of books on Java if that is not the language you normally write code in.
While three little systems don't constitute an "Extreme Cloud" environment, you might like to try out the "Extreme Cloud Administration Tool", called [xCat], which was used to manage the many servers in IBM Watson.
Host 2 - Business Logic Server
Host 2 will be driving most of the "thinking". Install the "Server" or "HPC Compute Node" version of Linux. This will be running a server virtualization Hypervisor. I recommend KVM, but you can probably run Xen or VMware instead if you like.
Host 3 - File and Database Server
Host 3 will hold your information sources, indices, and databases. Install the "Server" or "HPC Compute Node" version of Linux. This will be your NFS server, which might come up as a question during the installation process.
Technically, you could run different Linux distributions on different machines. For example, you could run "Ubuntu Desktop" for host 1, "RHEL 6 Server" for host 2, and "SLES 11" for host 3. In general, Red Hat tries to be the best "Server" platform, and Novell tries to make SLES be the best "Guest OS".
My advice is to pick a single distribution and use it for everything, Desktop, Server, and Guest OS. If you are new to Linux, choose Ubuntu. There are plenty of books on Linux in general, and Ubuntu in particular, and Ubuntu has a helpful community of volunteers to answer your questions.
Step 4: Download Information Sources
You will need some documents for your implementation to process.
IBM Watson used a modified SONAS to provide a highly-available clustered NFS server. For this version, we won't need that level of sophistication. Configure Host 3 as the NFS server, and Hosts 1 and 2 as NFS clients. See the [Linux-NFS-HOWTO] for details. To optimize performance, host 3 will be the "official master copy", but we will use a Linux utility called rsync to copy the information sources over to the hosts 1 and 2. This allows the task engines on those hosts to access local disk resources during question-answer processing.
We will also need a relational database. You won't need a high-powered IBM DB2. Your implementation can do fine with something like [Apache Derby] which is the open source version of IBM CloudScape from its Informix acquisition. Set up Host 3 as the Derby Network Server, and Hosts 1 and 2 as Derby Network Clients. For more about structured content in relational databases, see my post [IBM Watson - Business Intelligence, Data Retrieval and Text Mining].
Linux includes a utility called wget which allows you to download content from the Internet to your system. What documents you decide to download is up to you, based on what types of questions you want answered. For example, if you like Literature, check out the vast resources at [FullBooks.com]. You can automate the download by writing a shell script or program to invoke wget to all the places you want to fetch data from. Rename the downloaded files to something unique, as often they are just "index.html". For more on wget utility, see [IBM Developerworks].
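As a rough illustration of the renaming advice, here is a minimal sketch in Python that builds one wget command per URL and gives every download a distinct local name. The function names, the "sources" directory and the example.org URLs are all hypothetical; the only wget feature relied on is the -O option, which names the output file.

```python
import hashlib
from urllib.parse import urlparse


def unique_name(url):
    """Derive a distinct local filename from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    # Turn host and path into a readable stem such as "example.org_austen_index.html".
    stem = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "download"
    # A short hash of the full URL keeps names distinct even when stems collide.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{stem}.{digest}"


def wget_command(url, dest_dir="sources"):
    """Build the argument list for one wget call; -O sets the output file."""
    return ["wget", "-O", f"{dest_dir}/{unique_name(url)}", url]


if __name__ == "__main__":
    # Placeholder URLs; substitute the pages you actually want to fetch.
    urls = [
        "http://example.org/austen/index.html",
        "http://example.org/dickens/index.html",
    ]
    for url in urls:
        # Printing only; passing each list to subprocess.run(cmd, check=True)
        # would perform the real download.
        print(" ".join(wget_command(url)))
```

Both example pages are called "index.html", yet they end up under different local names, which is the point of the exercise.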
Step 5: The Query Panel - Parsing the Question
Next, we need to parse the question and have some sense of what is being asked for. For this we will use [OpenNLP] for Natural Language Processing, and [OpenCyc] for the conceptual logic reasoning. See Doug Lenat presenting this 75-minute video [Computers versus Common Sense]. To learn more, see the [CYC 101 Tutorial].
Unlike Jeopardy! where Alex Trebek provides the answer and contestants must respond with the correct question, we will do normal Question-and-Answer processing. To keep things simple, we will limit questions to the following formats:
Who is ...?
Where is ...?
When did ... happen?
What is ...?
Host 1 will have a simple Query Panel web interface. At the top, a place to enter your question, and a "submit" button, and a place at the bottom for the answer to be shown. When "submit" is pressed, this will pass the question to "main.jsp", the Java servlet program that will start the Question-answering analysis. Limiting the types of questions that can be posed will simplify hypothesis generation, reduce the candidate set and evidence evaluation, allowing the analytics processing to continue in reasonable time.
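To show why restricting the question formats helps, here is a minimal sketch of the first parsing pass, written in Python for brevity rather than as the Java servlet itself. The answer-type labels (PERSON, PLACE, DATE, DEFINITION) are my own illustrative assumptions and do not come from OpenNLP or OpenCyc.

```python
import re

# One pattern per supported question format; the labels are hypothetical.
PATTERNS = [
    ("PERSON", re.compile(r"^who is (.+)\?$", re.IGNORECASE)),
    ("PLACE", re.compile(r"^where is (.+)\?$", re.IGNORECASE)),
    ("DATE", re.compile(r"^when did (.+) happen\?$", re.IGNORECASE)),
    ("DEFINITION", re.compile(r"^what is (.+)\?$", re.IGNORECASE)),
]


def classify(question):
    """Return (answer_type, focus) for a supported question, else None."""
    text = question.strip()
    for answer_type, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return answer_type, match.group(1).strip()
    return None


if __name__ == "__main__":
    for q in ["Who is Sir Roger Bannister?", "Why is the sky blue?"]:
        print(q, "->", classify(q))
```

Knowing up front that the answer must be a person, a place, a date or a definition lets the later stages discard most candidate answers before any expensive analysis runs.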
Step 6: Unstructured Information Management Architecture
The "heart and soul" of IBM Watson is Unstructured Information Management Architecture [UIMA]. IBM developed this, then made it available to the world as open source. It is maintained by the [Apache Software Foundation], and overseen by the Organization for the Advancement of Structured Information Standards [OASIS].
Basically, UIMA lets you scan unstructured documents, glean the important points, and put that into a database for later retrieval. In the graph above, DBs means 'databases' and KBs means 'knowledge bases'. See the 4-minute YouTube video of [IBM Content Analytics], the commercial version of UIMA.
Starting from the left, the Collection Reader selects each document to process, and creates an empty Common Analysis Structure (CAS) which serves as a standardized container for information. This CAS is passed to Analysis Engines, composed of one or more Annotators which analyze the text and fill the CAS with the information found. The CAS are passed to CAS Consumers which do something with the information found, such as add an entry to a database, update an index, or update a vote count.
(Note: This point requires, what we in the industry call a small matter of programming, or [SMOP]. If you've always wanted to learn Java programming, XML, and JDBC, you will get to do plenty here. )
If you are not familiar with UIMA, consider this [UIMA Tutorial].
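To make the flow of Collection Reader, Annotator and CAS Consumer concrete, here is a toy sketch in plain Python. It does not use the UIMA API at all: the Cas class, the function names and the year-spotting annotator are simplified stand-ins I made up to show the shape of the pipeline, and a real implementation would be written in Java against the Apache UIMA SDK.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Toy stand-in for a Common Analysis Structure: text plus annotations."""
    text: str
    annotations: list = field(default_factory=list)


def collection_reader(documents):
    """Create one empty CAS per document."""
    for text in documents:
        yield Cas(text=text)


def year_annotator(cas):
    """Annotator: mark every four-digit year (1000-2099) found in the text."""
    for match in re.finditer(r"\b(1[0-9]{3}|20[0-9]{2})\b", cas.text):
        cas.annotations.append(("Year", match.start(), match.end(), match.group()))
    return cas


def vote_consumer(cases):
    """CAS Consumer: tally how often each annotated value was seen."""
    votes = Counter()
    for cas in cases:
        for kind, _start, _end, value in cas.annotations:
            votes[(kind, value)] += 1
    return votes


def run_pipeline(documents):
    return vote_consumer(year_annotator(cas) for cas in collection_reader(documents))


if __name__ == "__main__":
    docs = [
        "Roger Bannister ran the mile in 1954.",
        "In 1954 the four-minute barrier fell.",
    ]
    print(run_pipeline(docs).most_common(1))
```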
Step 7: Parallel Processing
People have asked me why IBM Watson is so big. Did we really need 2,880 cores of processing power? As a supercomputer, the 80 TeraFLOPs of IBM Watson would place it only in 94th place on the [Top 500 Supercomputers]. While IBM Watson may be the [Smartest Machine on Earth], the most powerful supercomputer at this time is the Tianhe-1A with more than 186,000 cores, capable of 2,566 TeraFLOPs.
To determine how big IBM Watson needed to be, the IBM Research team ran the DeepQA algorithm on a single core. It took 2 hours to answer a single Jeopardy question! Let's look at the performance data:
Configuration                 Number of cores   Time to answer one Jeopardy question
Single IBM Power750 server    32                < 4 minutes
Single rack (10 servers)      320               < 30 seconds
IBM Watson (90 servers)       2,880             < 3 seconds
The old adage applies, [many hands make for light work]. The idea is to divide-and-conquer. For example, if you wanted to find a particular street address in the Manhattan phone book, you could dispatch fifty pages to each friend and they could all scan those pages at the same time. This is known as "Parallel Processing" and is how supercomputers are able to work so well. However, not all algorithms lend well to parallel processing, and the phrase [nine women can't have a baby in one month] is often used to remind us of this.
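The performance figures above can be turned into a speedup estimate with a few lines of Python. This is only back-of-the-envelope arithmetic: the 2 hours on a single core and the 2,880 cores come from the text, the response times are upper bounds, and so the results are lower bounds on the real speedup.

```python
def speedup(baseline_seconds, parallel_seconds):
    """How many times faster the parallel run is than the baseline."""
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds, parallel_seconds, cores):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup(baseline_seconds, parallel_seconds) / cores


if __name__ == "__main__":
    single_core = 2 * 60 * 60  # 2 hours on one core, in seconds
    for label, seconds in [("one server", 4 * 60), ("one rack", 30), ("IBM Watson", 3)]:
        print(f"{label}: at least {speedup(single_core, seconds):.0f}x faster")
    print(f"IBM Watson efficiency: at least {efficiency(single_core, 3, 2880):.0%}")
```

Going from 2 hours to under 3 seconds is a speedup of at least 2,400 on 2,880 cores, or roughly 83 percent of ideal, which suggests the workload divides up very well.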
Fortunately, UIMA is designed for parallel processing. You need to install UIMA-AS for Asynchronous Scale-out processing, an add-on to the base UIMA Java framework, supporting a very flexible scale-out capability based on JMS (Java Message Service) and ActiveMQ. We will also need Apache Hadoop, an open source implementation used by the Yahoo Search engine. Hadoop has a "MapReduce" engine that allows you to divide the work, dispatch pieces to different "task engines", and then combine the results afterwards.
Host 2 will run Hadoop and drive the MapReduce process. Plan to have three KVM guests on Host 1, four on Host 2, and three on Host 3. That means you have 10 task engines to work with. These task engines can be deployed for Collection Readers, Analysis Engines, and CAS Consumers. When all processing is done, the resulting votes will be tabulated and the top answer displayed on the Query Panel on Host 1.
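Here is a small sketch of the divide, dispatch and combine idea in plain Python. It is not Hadoop code: a thread pool stands in for the ten task engines, and the passages and candidate answers are made-up sample data used only to show how partial vote counts are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(passages, candidates):
    """Map step: one task engine counts votes within its share of passages."""
    votes = Counter()
    for passage in passages:
        lowered = passage.lower()
        for candidate in candidates:
            if candidate.lower() in lowered:
                votes[candidate] += 1
    return votes


def split(items, parts):
    """Deal the items round-robin into the requested number of chunks."""
    return [items[i::parts] for i in range(parts)]


def map_reduce(passages, candidates, task_engines=10):
    """Dispatch chunks to workers, then reduce the partial counts into one tally."""
    chunks = split(passages, task_engines)
    total = Counter()
    with ThreadPoolExecutor(max_workers=task_engines) as pool:
        for partial in pool.map(lambda chunk: map_task(chunk, candidates), chunks):
            total.update(partial)  # Reduce step: add the partial counts together.
    return total.most_common()


if __name__ == "__main__":
    sample = [
        "Roger Bannister ran a mile in under four minutes.",
        "Bannister broke the barrier in 1954.",
        "John Landy also ran a sub-four-minute mile.",
    ]
    print(map_reduce(sample, ["Bannister", "Landy"]))
```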
Step 8: Testing
To simplify testing, use a batch processing approach. Rather than entering questions by hand in the Query Panel, generate a long list of questions in a file, and submit for processing. This will allow you to fine-tune the environment, optimize for performance, and validate the answers returned.
There you have it. By the time you get your implementation fully operational, you will have learned a lot of useful skills, including Linux administration, Ethernet networking, NFS file system configuration, Java programming, UIMA text mining analysis, and MapReduce parallel processing. Hopefully, you will also gain an appreciation for how difficult it was for the IBM Research team to accomplish what they had for the Grand Challenge on Jeopardy! Not surprisingly, IBM Watson is making IBM [as sexy to work for as Apple, Google or Facebook], all of which started their business in a garage or a basement with a system as small as this version for personal use.
As we get to larger and larger flash and spinning disk drives, a common question I get is whether to use RAID-5 versus RAID-6. Here is my take on the matter.
A quick review of basic probability statistics
Failure rates are based on probabilities. Take for example a traditional six-sided die, with numbers one through six represented as dots on each face. What are the chances that we can roll the die several times in a row and never roll a six? You might think that if there is a 1/6 (16.7 percent) chance to roll a six, then you would be guaranteed to hit a six within six rolls. That is not the case.
# of Rolls
Probability of no sixes (percent)
So, even after 24 rolls, there is more than 1 percent chance of not rolling a six at all. The formula is (1-1/6) to the 24th power.
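If you would rather check the dice figures yourself, this short Python sketch applies the same formula. The particular roll counts printed are just my choice of sample points.

```python
def prob_no_six(rolls):
    """Probability of never rolling a six in the given number of rolls."""
    return (1 - 1 / 6) ** rolls


if __name__ == "__main__":
    for rolls in (1, 6, 12, 24):
        print(f"{rolls:>3} rolls: {100 * prob_no_six(rolls):6.2f} percent")
```

After six rolls there is still about a one-in-three chance of having seen no six at all, and after 24 rolls the chance is about 1.26 percent.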
Let's say that rolling one to five is success, and rolling a six is a failure. Being successful requires that no sixes appear in a sequence of events. This is the concept I will use for the rest of this post. If you don't care for the math, jump down to the "Summary of Results" section below.
Error Correcting Codes (ECC) and Unrecoverable Read Errors (URE)
When I speak to my travel agent, I have to provide my six-character [Record Locator] code. Pronouncing individual letters can be error prone, so we use a "spelling alphabet".
The International Radiotelephony Spelling Alphabet, sometimes known as the [NATO phonetic alphabet], has 26 code words assigned to the 26 letters of the English alphabet in alphabetical order as follows: Alfa, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliett, Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Uniform, Victor, Whiskey, X-ray, Yankee, Zulu.
Foxtrot Golf Mike Oscar Victor Whiskey
Foxtrot Gold Mine Oscar Vector Whisker
Boxcart Golf Miko Boxcart Victor Whiskey
Having five or so characters to represent a single character may seem excessive, but you can see that this can be helpful when the communications link has static, or background noise is loud, as is often the case at the airport!
If spelling words are misheard, either (a) they are close enough to the original, like "Gold" for "Golf" or "Whisker" for "Whiskey", that the correct word is still known, or (b) they are not close enough, as when "Boxcart" could refer to either "Foxtrot" or "Oscar", but we can at least detect that a failure occurred.
For data transfers, or data that is written, and later read back, the functional equivalent is an Error Correcting Code [ECC], used in transmission and storage of data. Some basic ECC can correct a single bit error, and detect double bit errors as failures. More sophisticated ECC can correct multiple bit errors up to a certain number of bits, and detect most anything worse.
When reading a block, sector or page of data from a storage device, if the ECC detects an error, but is unable to correct the bits involved, we call this an "Unrecoverable Read Error", or URE for short.
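For readers who want to see "correct one bit, detect two" in action, here is a minimal sketch of an extended Hamming code in Python, protecting four data bits with four check bits. I picked this textbook code purely for illustration; it is an assumption on my part, and real storage devices use much stronger codes over whole sectors. The "unrecoverable" status plays the role of a URE.

```python
def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit         # extra parity bit over the whole word
    return word + [overall]


def decode(word):
    """Return (data_bits, status): 'ok', 'corrected' or 'unrecoverable'."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4      # 1-based position of a single bad bit
    parity_ok = sum(c) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flips: treat as a single-bit error and repair it.
        if syndrome:
            c[syndrome - 1] ^= 1
        status = "corrected"
    else:
        # Parity looks fine but the syndrome does not: two bits were flipped.
        return None, "unrecoverable"
    return [c[2], c[4], c[5], c[6]], status


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[4] ^= 1                          # simulate one bit going bad
    print(decode(word))
```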
Bit Error Rate (BER)
Different storage devices have different block, sector or page sizes. Some use 512 bytes, 4096 bytes or 8192 bytes, for example. To normalize likelihood of errors, the industry has simplified this to a single bit error rate or BER, represented often as a power of 10.
Bit Error Rate per read (BER)
Consumer HDD (PC/Laptops)
Enterprise 15k/10k/7200 rpm
Solid-State and Flash
IBM TS1150 tape
In other words, the chance that a bit is unreadable on optical media is 1 in 10 trillion (1E13), on enterprise 15k drives is 1 in 10 quadrillion, and on LTO-7 tape is 1 in 10 quintillion.
There are eight bits per byte, so reading 1 GB of data is like rolling the die eight billion times. The chance of successfully reading 1GB on DVD, then would be (1 - 1/1E13) to the 8 billionth power, or 99.92 percent, or conversely a 0.08 percent chance of failure.
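The same calculation can be sketched in Python. The function name is mine, and I assume decimal gigabytes (1 GB = 1E9 bytes). The log1p and expm1 functions are used because raising a number this close to 1 to a huge power loses precision with ordinary floating-point arithmetic.

```python
import math


def read_failure_probability(bytes_read, ber_exponent):
    """Chance of at least one unreadable bit, for a BER of 1 in 10**ber_exponent."""
    bits = bytes_read * 8
    per_bit = 10.0 ** -ber_exponent
    # 1 - (1 - per_bit)**bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-per_bit))


if __name__ == "__main__":
    p = read_failure_probability(1e9, 13)   # 1 GB at a BER of 1E13
    print(f"{100 * p:.2f} percent chance of failure")
```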
In this paper, Google had studied drive failure using an "Annual Failure Rate" or AFR. Here are two graphs from this paper:
This first graph shows AFR by age. Some drives fail in their first 3-6 months, often called "infant mortality". Then they are fairly reliable for a few years, down to 1.7 percent, then as they get older, they start to fail more often, up to 8.3 percent.
This second graph factors in how busy the drives are. Dividing the drive set into quartiles, "Low" represents the least busy drives (the bottom quartile), "Medium" represents the median two quartiles, and "High" represents the busiest drives, the top quartile. Not surprisingly, the busiest drives tend to fail more often than medium-busy drives.
Given an AFR, what are the chances a drive will fail in the next hour? There are 8,766 hours per year, so the success of a drive over the course of a year is like rolling the die 8,766 times. This allows us to calculate a "Drive Error Rate" or DER:
Drive Error Rate per hour (DER)
For example, an AFR=3 drive has a 1 in 287,800 chance of failing in a particular hour. The probability this drive will fail in the next 24 hours would be like rolling the die 24 times. The chance of surviving all 24 hours is (1-1/287,800) to the 24th power, so the chance of failure is one minus that, or roughly 0.008 percent.
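Here is a minimal Python sketch of the conversion from AFR to DER, assuming, as above, that each of the 8,766 hours in a year is an independent roll of the die. The function names are my own.

```python
import math

HOURS_PER_YEAR = 8766


def hourly_failure_rate(afr_percent):
    """Solve (1 - DER)**8766 = 1 - AFR for the per-hour failure chance DER."""
    return -math.expm1(math.log1p(-afr_percent / 100.0) / HOURS_PER_YEAR)


def failure_within(hours, afr_percent):
    """Chance that the drive fails at least once in the given number of hours."""
    der = hourly_failure_rate(afr_percent)
    return -math.expm1(hours * math.log1p(-der))


if __name__ == "__main__":
    der = hourly_failure_rate(3)
    print(f"AFR=3: 1 in {1 / der:,.0f} per hour")
    print(f"24 hours: {100 * failure_within(24, 3):.4f} percent")
```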
Let's take a typical RAID-5 rank with 600GB drives at 15K rpm, in a 7+P RAID-5 configuration.
During normal processing, if a URE occurs on an individual drive, RAID comes to the rescue. The system can rebuild the data from parity, and correct the broken block of data.
When a drive fails, however, we don't have this rescue, so a URE that occurs during the rebuild process is catastrophic. How likely is this? Data is read from the other seven drives, and written to a spare empty drive. At 8 bits per byte, reading 4200 GB of data is rolling the die 33.6 trillion times. The formula is then (1-1/1E16) to the 33.6 trillionth power, or approximately 0.372 percent chance of URE during the rebuild process.
The time to perform the rebuild depends heavily on the speed of the drive, and how busy the RAID rank is doing other work. Under heavy load, the rebuild might only run at 25 MB/sec, and under no workload perhaps 90 MB/sec. If we take a 60 MB/sec moderate rebuild rate, then it would take 10,000 seconds or nearly 3 hours. The chance that any of the seven drives fail during these three hours, at AFR=10 rolling the DER die (7 x 3) 21 times, results in a 0.025 percent chance of failure.
It is nearly 15 times more likely to get a URE failure than a second drive failure. A rebuild failure would happen with either of these, with a probability of 0.397 percent.
The situation gets worse with higher capacity Nearline drives. Let's do a RAID-5 rank with 6TB Nearline drives at 7200 rpm, in a 7+P configuration. The likelihood of URE reading 42 TB of data, is rolling the die 336 trillion times, or approximately 3.66 percent chance of URE failure. Yikes!
The time to rebuild is also going to take longer. A moderate rebuild rate might only be 30 MB/sec, so that rebuilding a 6TB drive would take 55 hours. The chance that one of the other seven drives fails during these 55 hours, assuming again AFR=10, is 0.462 percent.
This time, a URE failure is nearly eight times more likely than a double drive failure. The chance of a rebuild failure is 4.12 percent. Good thing you backed up to tape or object storage!
The math can be done easily using modern spreadsheet software. The URE failure rate is based on the quantity of data read from the remaining drives, so a 4+P with 600GB drives is the same as 8+P with 300GB drives. Both read 2.4 TB of data to recalculate from parity. The Double Drive failure rate is based on the number of drives being read times the number of hours during the rebuild. Slower, higher capacity drives take longer to rebuild. However, in both the 15K and 7200rpm examples, the chance of a URE failure was 8 to 15 times more likely than double drive failure.
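If you prefer a script to a spreadsheet, here is a minimal Python sketch of the RAID-5 calculation. The function and parameter names are mine, and I assume decimal gigabytes, a BER of 1E16 and AFR=10 as defaults. Under those assumptions the URE figure for the 600GB example comes out near 0.34 percent, a little below the number quoted above, so my inputs presumably differ slightly from the original spreadsheet; treat the output as an approximation.

```python
import math

HOURS_PER_YEAR = 8766


def at_least_once(per_trial, trials):
    """1 - (1 - per_trial)**trials, computed stably for very small per_trial."""
    return -math.expm1(trials * math.log1p(-per_trial))


def hourly_drive_failure(afr_percent):
    """Per-hour failure chance that compounds to the given AFR over one year."""
    return -math.expm1(math.log1p(-afr_percent / 100.0) / HOURS_PER_YEAR)


def raid5_rebuild_failure(drive_gb, surviving_drives, rebuild_hours,
                          ber_exponent=16, afr_percent=10.0):
    """Return (URE chance, second-drive-failure chance, chance of either)."""
    bits_read = drive_gb * 1e9 * 8 * surviving_drives   # decimal GB assumed
    p_ure = at_least_once(10.0 ** -ber_exponent, bits_read)
    p_drive = at_least_once(hourly_drive_failure(afr_percent),
                            surviving_drives * rebuild_hours)
    # Chance that at least one of the two events happens; for probabilities
    # this small it is very close to simply adding them.
    p_either = 1 - (1 - p_ure) * (1 - p_drive)
    return p_ure, p_drive, p_either


if __name__ == "__main__":
    for label, args in [("600GB 15K rpm", (600, 7, 3)), ("6 TB 7200rpm", (6000, 7, 55))]:
        ure, drive, either = raid5_rebuild_failure(*args)
        print(f"{label}: URE {100 * ure:.3f}, drive {100 * drive:.3f}, "
              f"either {100 * either:.3f} percent")
```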
Many of the problems associated with RAID-5 above can be mitigated with RAID-6.
After a single drive fails, any URE during rebuild can be corrected from parity. However, if a second drive fails during the rebuild process, then a URE on the remaining drives would be a problem.
Let's start with the 600GB 15k drives in a 6+P+Q RAID-6 configuration. The chance of a second drive failing is 0.0252 percent, as we calculated above. The likelihood of a URE is then based on the remaining six drives, 3600 GB of data. Doing the math, that is 0.319 percent chance. So, the chance of a URE during RAID-6 failure is the probability of both occurring, roughly 0.0000806 percent. Far more reliable than RAID-5!
Likewise, we can calculate the probability of a triple drive failure. After the second drive fails, the likelihood of a third drive at AFR=10, results in 0.00000546 percent.
Combining these, the chance of failure of rebuild is 0.0000861 percent.
Switching to 6 TB Nearline drives, in a 6+P+Q RAID-6 configuration, we can do the math in the same manner. The likelihood of URE and two drives failing is 0.0145 percent, and for triple drive failure is 0.00183 percent. Chance of rebuild failure is 0.0163 percent.
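The RAID-6 reasoning can be sketched the same way in Python. Again the names are mine, and the defaults (decimal gigabytes, BER of 1E16, AFR=10) are assumptions. With a 3-hour rebuild the drive-failure terms line up with the figures above, while the URE term comes out slightly lower, so expect small differences from the quoted totals.

```python
import math

HOURS_PER_YEAR = 8766


def at_least_once(per_trial, trials):
    """1 - (1 - per_trial)**trials, computed stably for very small per_trial."""
    return -math.expm1(trials * math.log1p(-per_trial))


def hourly_drive_failure(afr_percent):
    """Per-hour failure chance that compounds to the given AFR over one year."""
    return -math.expm1(math.log1p(-afr_percent / 100.0) / HOURS_PER_YEAR)


def raid6_rebuild_failure(drive_gb, data_drives, rebuild_hours,
                          ber_exponent=16, afr_percent=10.0):
    """Return (second failure plus URE, triple failure, sum of both)."""
    der = hourly_drive_failure(afr_percent)
    # After the first failure, data_drives + 1 drives remain and any may fail.
    p_second = at_least_once(der, (data_drives + 1) * rebuild_hours)
    # After a second failure, the remaining data_drives drives must be read cleanly.
    bits_read = drive_gb * 1e9 * 8 * data_drives        # decimal GB assumed
    p_ure = at_least_once(10.0 ** -ber_exponent, bits_read)
    p_third = at_least_once(der, data_drives * rebuild_hours)
    p_double_plus_ure = p_second * p_ure
    p_triple = p_second * p_third
    return p_double_plus_ure, p_triple, p_double_plus_ure + p_triple


if __name__ == "__main__":
    for label, args in [("600GB 15K rpm", (600, 6, 3)), ("6 TB 7200rpm", (6000, 6, 55))]:
        both, triple, total = raid6_rebuild_failure(*args)
        print(f"{label}: {100 * total:.7f} percent chance of rebuild failure")
```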
Summary of Results
Putting all the results in a table, we have the following:
Drive type       RAID-5 rebuild failure (percent)   RAID-6 rebuild failure (percent)
600GB 15K rpm    0.397                              0.0000861
6 TB 7200rpm     4.12                               0.0163
Hopefully, I have shown you how to calculate these yourself, so that you can plug in your own drive sizes, rebuild rates, and other parameters to convince yourself of this.
In all cases, RAID-6 drastically reduced the probability of rebuild failure. With modern cache-based systems, the write-penalty associated with additional parity generally does not impact application performance. As clients transition from faster 15K drives to slower, higher capacity 10K and 7200 rpm drives, I highly recommend using RAID-6 instead of RAID-5 in all cases.
In the prank, I indicated that I had submitted my video to the [Arizona International Film Festival], or AIFF for short, which coincidentally was running April 1-20, and that it had won an award. I invited everyone who read my blog to see me accept the award at a ceremony at 6:00pm on April 1 at the Fox Theater, followed by the 8:00pm showing of another award-winning film.
I didn't submit the video, the video didn't win any award, and I was not invited to the award ceremony. I did, however, plan to see the movie at 8:00pm.
When I got there, I learned that a dozen of my friends, not realizing it was a prank, had shown up, asking for me. The AIFF was quite amused, and invited me to the award ceremony still going on. The other filmmakers were impressed I had concocted such an elaborate social media campaign!
A slideshow is another style of video, animating still images to music. The [Ken Burns effect] was named after the technique fellow filmmaker Ken Burns used in his documentaries.
In 2010, I worked with the XIV team to address FUD that our competitors were flinging about double drive failures. My blog post [Double Drive Failure Debunked: XIV Two Years Later] set the record straight and put this issue to rest once and for all. XIV sales shot up dramatically after this post went public!
Are you tired of hearing about Cloud Computing without having any hands-on experience? Here's your chance. IBM has recently launched its IBM Development and Test Cloud beta. This gives you a "sandbox" to play in. Here are a few steps to get started:
Generate a "key pair". There are two keys. A "public" key that will reside in the cloud, and a "private" key that you download to your personal computer. Don't lose this key.
Request an IP address. This step is optional, but I went ahead and got a static IP, so I don't have to type in long hostnames like "vm353.developer.ihost.com".
Request storage space. Again, this step is optional, but you can request a 50GB, 100GB or 200GB LUN. I picked a 200GB LUN. Note that each instance comes with some 10 to 30GB storage already. The advantage to a storage LUN is that it is persistent, and you can mount it to different instances.
Start an "instance". An "instance" is a virtual machine, pre-installed with whatever software you chose from the "asset catalog". These are Linux images running under Red Hat Enterprise Virtualization (RHEV) which is based on Linux's kernel virtual machine (KVM). When you start an instance, you get to decide its size (small, medium, or large), whether to use your static IP address, and where to mount your storage LUN. In the examples below, I gave each instance a static IP and mounted the storage LUN to the /media/storage subdirectory. The process takes a few minutes.
So, now that you are ready to go, what instance should you pick from the catalog? Here are three examples to get you started:
IBM WebSphere sMASH Application Builder
Base OS server to run LAMP stack
Next, I decided to try out one of the base OS images. There are a lot of books on Linux, Apache, MySQL and PHP (LAMP) which represents nearly 70 percent of the web sites on the internet. This instance lets you install all the software from scratch. Between Red Hat and Novell SUSE distributions of Linux, Red Hat is focused on being the Hypervisor of choice, and SUSE is focusing on being the Guest OS of choice. Most of the images on the "asset catalog" are based on SLES 10 SP2. However, there was a base OS image of Red Hat Enterprise Linux (RHEL) 5.4, so I chose that.
To install software, you either have to find the appropriate RPM package, or download a tarball and compile from source. To try both methods out, I downloaded tarballs of Apache Web Server and PHP, and got the RPM packages for MySQL. If you just want to learn SQL, there are instances on the asset catalog with DB2 and DB2 Express-C already pre-installed. However, if you are already an expert in MySQL, or are following a tutorial or examples based on MySQL from a classroom textbook, or just want a development and test environment that matches what your company uses in production, then by all means install MySQL.
This is where my SSH client comes in handy. I am able to login to my instance and use "wget" to fetch the appropriate files. An alternative is to use "SCP" (also part of PuTTY) to do a secure copy from your personal computer up to the instance. You will need to do everything via command line interface, including editing files, so I found this [VI cheat sheet] useful. I copied all of the tarballs and RPMs on my storage LUN ( /media/storage ) so as not to have to download them again.
Compiling and configuring them is a different matter. By default, you log in as an end user, "idcuser" (which stands for IBM Developer Cloud user). However, sometimes you need "root" level access. Use "sudo bash" to get into root level mode, and this allows you to put the files where they need to be. If you haven't done a configure/make/make install in a while, here's your chance to relive those "glory days".
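For anyone who has not done it in a while, the configure/make/make install sequence can be sketched as below. This is only an illustration in Python that assembles the command lines without running them. The tarball name and install prefix are hypothetical placeholders, not the files I actually used, and the last step assumes you are already in a root-level shell.

```python
import shlex

def source_build_steps(tarball, prefix):
    """Return the classic tarball build sequence as a list of argv lists.

    Nothing is executed; the list only documents the order of the steps.
    """
    return [
        ["tar", "-xzf", tarball],               # unpack the downloaded source
        ["./configure", f"--prefix={prefix}"],  # run from the unpacked source directory
        ["make"],                               # compile
        ["make", "install"],                    # copy files into place (needs root)
    ]

# Hypothetical file name and prefix, for illustration only.
steps = source_build_steps("/media/storage/example-1.0.tar.gz", "/usr/local/example")
for argv in steps:
    print(shlex.join(argv))
```

Keeping the tarball on the storage LUN means the same steps can be repeated on a fresh instance without downloading anything again.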
In the end, I was able to confirm that Apache, MySQL and PHP were all running correctly. I wrote a simple index.php that invoked phpinfo() to show all the settings were set correctly. I rebooted the instance to ensure that all of the services started at boot time.
Rational Application Developer over VDI
For this last example, I started an instance pre-installed with Rational Application Developer (RAD), which is a full Integrated Development Environment (IDE) for Java and J2EE applications. I used the "NX Client" to launch a virtual desktop image (VDI) which in this case was Gnome on SLES 10 SP2. You might want to increase the screen resolution on your personal computer so that the VDI does not take up the entire screen.
From this VDI, you can launch any of the programs, just as if it were your own personal computer. Launch RAD, and you get the familiar environment. I created a short Java program and launched it on the internal WebSphere Application Server test image to confirm it was working correctly.
If you are thinking, "This is too good to be true!" there is a small catch. The instances are only up and running for 7 days. After that, they go away, and you have to start up another one. This includes any files you had on the local disk drive. You have a few options to save your work:
Copy the files you want to save to your storage LUN. This storage LUN appears persistent, and continues to exist after the instance goes away.
Take an "image" of your "instance", a function provided in the IBM Developer and Test Cloud. For example, you might start a project Monday morning, work on it all week, then take an "image" on Friday afternoon. This will shut down your instance and back up all of the files to your own personal "asset catalog", so that the next time you request an instance, you can choose that "image" as the starting point.
Another option is to request an "extension" which gives you another 7 days for that instance. You can request up to five unique instances running at the same time, so if you wanted to develop and test a multi-host application, perhaps one host that acts as the front-end web server, another host that does some kind of processing, and a third host that manages the database, this is all possible. As far as I can tell, you can do all the above from either a Windows, Mac or Linux personal computer.
Getting hands-on access to Cloud Computing really helps to understand this technology!
Back in June, I mentioned this blog was [Moving to MyDeveloperWorks] which is based on IBM Lotus Connections.
Finally, the move is complete for all bloggers. If you are having problems with the redirects, you might need to unsubscribe and re-subscribe in your RSS feed reader. Here are the new links for several IBM bloggers that have moved over:
Well, it's Tuesday again, and you know what that means? IBM Announcements!
I just got back from my vacation, so this is a guest post from my colleagues Moshe Weiss, Senior Manager, Development and Design, IBM Storage; and Diane Benjuya, Portfolio Marketing Manager for IBM Spectrum Accelerate.
1. What is IBM announcing?
Today IBM announces another leap forward in storage management, with the availability of IBM Hyper-Scale Manager version 5.1. In April 2016, when IBM announced IBM FlashSystem A9000 and A9000R, they also introduced a fully revamped GUI: IBM Hyper-Scale Manager 5.0. That version brought FlashSystem A9000/A9000R clients a terrific new storage management experience, with advanced look and feel, analytics tools, and other enhancements for managing smarter, with greater simplicity and in less time.
Hyper-Scale Manager 5.x dramatically reduces task time -- by 45% for this task
With Hyper-Scale Manager 5.1, IBM is bringing this exceptional GUI and unified user management experience across the entire set of Spectrum Accelerate-based products, which IBMers internally refer to as the "A family":
IBM Spectrum Accelerate software
IBM XIV Storage System
IBM Hyper-Scale Manager lets you view and move quickly across software-defined, disk based, and all-flash storage in seconds, equipping you with the information you need to ensure every application is performing at its peak.
2. What is innovative about the new GUI -- how does it help clients?
IBM Hyper-Scale Manager 5 makes storage management more insightful and easier in multiple ways, helping clients find info, act and troubleshoot faster. Concepts implemented include: web application with tablet-ready design, single page application, strong navigation scheme, smart filter with analytics, capacity trend/forecast, call for action, better communication using social media. All this helps users make fast, informed decisions while being able to see at a glance the impact of any change on the environment, including into the future. The IBM team designed it over the past three years, working closely with clients and using the Design Thinking methodology.
Get a holistic view of your storage
Provisioning, Monitoring and Troubleshooting
Find everything, get anywhere
Call for action!
The IBM team applied an "emotional design" approach that makes users feel emotionally attached to the GUI for its coolness and elegance -- making the experience not just more productive but also more pleasant.
Version 5.1 brings many exciting and important new features to ease the client's day to day activities. Here are some key ones:
Managing your "A Family" in one UI
Instantly gain insights, spot problematic areas
Integrated Capacity Analytics
4. Any unique features that will be focused on?
The IT industry is entering a cognitive era, right? So IBM has brought cognitive into the GUI. The GUI actually learns each user's habits and preferences over time and adapts the experience to the specific user.
5. How does 5.1 add value to the family of products based on Spectrum Accelerate software?
Hyper-Scale Manager makes this powerful family for private, public and hybrid block storage clouds that much more attractive and relevant. Just imagine yourself:
Waking up, driving to the office, opening the UI and seeing that your FlashSystem A9000 systems are doing worse than your XIV in terms of IOPS. Scary, but no worries.
You drill down to the specific FlashSystem A9000 by comparing IOPS. You find that a QoS performance class is deliberately reducing performance for the host. A quick analysis, and you find that it is due to the contract with the host. After a short chat with the host admin, you establish better terms, and decide to stop the IO limitation on the volumes and move them to a disk-based XIV to reduce dollar-per-TB cost.
You look for the best candidate by looking at the capacity trend/forecast charts for each XIV and at growth rate per month. You compare performance metrics and choose the preferred XIV to move the volumes to.
You migrate the volumes from the A9000 to the chosen XIV using the same interface, creating connectivity in one click. You then add the same host configuration as for the A9000 to the XIV in a second click. Then just map and monitor the new IO statistics with a third click. Easy!
Imagine carrying out your daily work and decisions -- creating volumes, monitoring, mirroring, troubleshooting and configuring -- across different systems of different types within the family in single clicks -- without the need to move between user interfaces. You can think of Hyper-Scale Manager 5.1 as a GUI come alive: a dynamic, breathing, thinking work enhancer that simplifies and helps you make the most of your investment.
Come see it in action! Register now for the [Live demo webinar], scheduled for Wednesday, November 9, 2016, from 10am to 11:30am MST!
Download the software from [IBM Fix Central]. Installation is one click and takes just seconds!
Here is an infographic!
Comments? Feedback? Enter them below. Both Moshe and Diane would be pleased to hear from you!
Today, I met with Teresa Ferraro and Mike Buttrum from FirstRain in their Manhattan office in downtown New York City. IBM recently contracted FirstRain to provide IBMers like myself with analytics on publicly-available news to keep us informed for business meetings. Here's how IBMers can get the most out of this service.
Basically, FirstRain takes a list and generates the best summaries of publicly-available news that are most relevant. You can organize into different channels. Here I have seven channels.
Companies to watch refer to existing or prospective clients that I plan to be talking with soon. Some of my colleagues are assigned to specific clients, so they can set this up once and enjoy the news for the rest of the year. I, on the other hand, meet with different clients every week, so I will be updating this list on a frequent basis.
I have divided the Competitors between major ones, and smaller startups. Since I am often working with business partners and distributors, I made that a separate channel as well.
For product lines, I picked three: Data migration, Data storage solutions, and Software defined storage.
For conferences where I don't know which companies will attend, such as the IBM Technical University, I can set up information by territory. Here is one for Brazil.
I also attend industry-oriented events, so I can pick those vertical markets that might be helpful with dinner conversations. In this example, I chose Energy, Electric Utilities and Gas Utilities.
Once you have your channels configured, you get your results in various sections:
Management Changes lists any changes in top C-level positions, who left the company, who got recently hired.
Key Developments indicates news like mergers and acquisitions and government regulations.
First Reads prioritizes the top six articles for your channel. You can access more, but these six will get you started as you have your morning coffee.
First Tweets gives you the six most relevant tweets, if those articles above were just "TL;DR"
A section on Business Influencers and Market Drivers is interesting to see who the big players are, and what topics are driving the most conversation. Here's an example from my Energy/Electric/Gas channel:
The Most Talked About section covers quotes and commentary about the most talked about companies in your channel.
With most news sources focused on politics, weather and celebrity gossip, it is nice to have a quicker, more focused approach to get the news I need to prepare for my client briefings. Special thanks to my hosts Teresa and Mike for their hospitality!
IBM has chosen three particular Software Defined Environments. At one end, IBM is a platinum sponsor of OpenStack which supports x86 servers, POWER systems and z System mainframes. A problem with open source projects like this, however, is that they can be a bit like putting together IKEA furniture from pieces in a box: "Some assembly required."
At the other end, highly proprietary environments from VMware and Microsoft bring enterprise-ready out-of-the-box solutions. However, nobody wants to be limited to just x86-based solutions. IBM offers the best of both worlds, basing its IBM Cloud and SmartCloud software on OpenStack standards, but providing enterprise-ready solutions for x86, POWER Systems and z System mainframes. This includes IBM Cloud Manager with OpenStack, IBM Cloud Orchestrator, and IBM SmartCloud Cost Management software products.
(Analogy: If open source solutions were vanilla ice cream, and proprietary solutions were chocolate ice cream, then IBM Cloud and SmartCloud is vanilla ice cream with chocolate sauce on top! This is the same approach IBM used for WebSphere Application Server, based on Apache web server, and IBM BigInsights, based on Hadoop analytics.)
For some people, software defined can also refer to how the resources are deployed. Rather than using specialized hardware, solutions based on industry-standard hardware can be delivered either as pre-built appliances, services in the Cloud, or as software-only products.
Back in the 1990s, IBM came up with the [Seascape Storage Enterprise Architecture], deciding to focus the design of its storage systems to be based, where possible and practical, on industry-standard components.
Let's review a few products:
IBM SAN Volume Controller (SVC) and Storwize V7000: IBM storage hypervisors were originally designed to run on industry-standard x86 servers. The IBM scientists at Almaden Research Center referred to this as the "COMmodity PArts Storage System" (COMPASS) architecture.
That is still mostly true 12 years later, but SVC and Storwize V7000 do have specialized hardware, including host bus adapter cards and the [Intel® QuickAssist] chip for Real-time Compression.
IBM DS8000 disk system: The DS8000 is based on off-the-shelf IBM POWER servers. Originally, you could only purchase POWER-based servers from IBM, but thanks to the [OpenPOWER Foundation], you now have more options.
The DS8000 does use some specialized hardware for its host and device adapters, taking advantage of ASICs and FPGAs to optimize performance.
IBM XIV storage system: IBM acquired XIV back in 2008, but its design is very similar to Seascape architecture. All of the Intellectual Property was in the software, installed on industry-standard x86 servers, cache memory, host bus adapters and 7200 RPM nearline disk drives. I joked that the entire hardware bill-of-materials could be ordered directly from the CDW catalog!
IBM FlashSystem: IBM is ranked #1 in the All-Flash Array market. Rather than using off-the-shelf commodity Solid-State drives (SSD), the IBM FlashSystem employs specialized hardware based on FPGAs to optimize performance.
IBM FlashSystem came from the recent acquisition of Texas Memory Systems, and was not designed under the IBM Seascape architecture.
Combining the method the resources are controlled and managed with the way storage is deployed results in a quadrant. Let's take a look at this from a storage perspective:
Traditional storage products that are based on specialized hardware that do not support Software Defined Environment APIs.
Storage products that are based on specialized hardware, but have been enhanced to support Software Defined Environment APIs. For OpenStack, this refers to Cinder and Swift interfaces. For VMware, this would include VAAI, VASA and VADP interfaces and vCenter Console plug-ins.
Storage products that are basically software, either installed on pre-built hardware appliances, offered as services in the Cloud, or software you deploy on your own industry-standard hardware. Unfortunately, this category does not support software defined environment APIs, and so proprietary interfaces require administrator-intensive involvement instead.
Storage software for industry-standard hardware. You purchase the appropriate server, cache memory, flash and disk drives as needed. This category could also extend to pre-built appliance versions of this software, or as services in the Cloud. APIs for software defined environments are available to deploy this with self-service automation.
IBM Spectrum Storage is a family of Category IV software offerings. Here are the products announced:
IBM Spectrum Control™: Simplified control and optimization of storage and data infrastructure. Based on technology from SmartCloud Virtual Storage Center and Tivoli Storage Productivity Center.
IBM Spectrum Protect™: Single point of administration for data backup and recovery. Based on technology from Tivoli Storage Manager.
IBM Spectrum Accelerate™: Accelerating speed of deployment and access to data for new workloads. Based on technology from the XIV storage system.
IBM Spectrum Virtualize™: Storage virtualization that frees client data from IT boundaries. Based on technology from SAN Volume Controller.
IBM Spectrum Scale™: High-performance, scalable storage manages exabytes of unstructured data. Based on technology from GPFS and codename:Elastic Storage.
IBM Spectrum Archive™: Enables easy access to long term storage of low activity data. Based on technology from Linear Tape File System (LTFS).
Last year, IDC recognized IBM as #1 in this new emerging software defined storage market. This announcement reinforces IBM's lead in this area. See the [Press Release] for details.
We have a lot to cover, so I will do the quick recap today, and then go in-depth on subsequent posts.
IBM FlashSystem 840 and V840
The FlashSystem now offers a high-voltage 1300W power supply. There are two supplies providing redundancy. In the unlikely event that you are doing maintenance on one of them, the other supply handles all the workload. With the original power supply, the system slowed down the clock speeds to reduce electrical demand. The new power supplies can handle full performance.
Also, the Graphical User Interface (GUI) now holds 300 days of performance data with pan-and-zoom capability. Five predefined graphs show key performance metrics, with additional user-defined metrics available for visualization.
The new v7.4 level of microcode combines features from v7.2.7 and v7.3 into a single code base.
In previous 3-site mirroring implementations, you had A-to-B-to-C cascading. Metro Mirror would get the data from A-to-B, then Global Mirror would copy B-to-C. Multiple Target Peer-to-Peer Remote Copy (PPRC) feature number 7025 allows you to have two separate paths of the data: A-to-B and separately A-to-C. Some folks refer to this as a "star" configuration.
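The difference between the two layouts is easy to see if you model each one as a small graph. The sketch below is purely a topology illustration in Python, not how the DS8000 implements PPRC; the function names are made up for this example, and it ignores any resynchronization features a real product may offer.

```python
# Each topology maps a site to the sites it replicates to directly.
CASCADE = {"A": ["B"], "B": ["C"], "C": []}          # A-to-B-to-C
MULTI_TARGET = {"A": ["B", "C"], "B": [], "C": []}   # A-to-B and, separately, A-to-C

def hops_from(topology, source):
    """Breadth-first walk: number of replication hops from source to each site."""
    hops = {source: 0}
    queue = [source]
    while queue:
        site = queue.pop(0)
        for target in topology[site]:
            if target not in hops:
                hops[target] = hops[site] + 1
                queue.append(target)
    return hops

def still_fed_if_lost(topology, source, lost):
    """Sites that can still receive updates from source when one site is lost."""
    pruned = {site: [t for t in targets if t != lost]
              for site, targets in topology.items() if site != lost}
    return sorted(site for site in hops_from(pruned, source) if site != source)

print(hops_from(CASCADE, "A"))        # C is two hops away from A
print(hops_from(MULTI_TARGET, "A"))   # B and C are each one hop away
print(still_fed_if_lost(CASCADE, "A", "B"))
print(still_fed_if_lost(MULTI_TARGET, "A", "B"))
```

In this simple model, losing site B cuts site C off in the cascaded layout, while in the multi-target layout site C keeps its own path from A.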
For System z mainframe clients, the new v7.4 introduces new zHyperWrite for DB2 database logs, enhances zGM (XRC) write pacing, and extends Easy Tier automated-tiering API to allow z/OS applications to influence placement on different tiers of storage.
The High Performance Flash Enclosures (HPFE) that IBM introduced last May for the "A" frames are now available for "B" frames. You can have four HPFEs in the A frame, and another four in the B frame.
DS8870 now offers 600 GB 15K rpm SAS and 1.6 TB 2.5-inch SSD encryption drives for additional capacity and cost performance options to meet data growth demands within the same space. Both support data-at-rest encryption.
Lastly, we have upgraded the OpenStack Cinder driver to the latest Juno release, including features like volume replication and volume retype.
The latest SAN switch is a slim 1U high box that can be configured with 12 or 24 ports. These are 16Gbps ports that can auto-negotiate down to 8Gbps, 4Gbps and 2Gbps. These are easy to set up, and can be managed with the IBM Network Advisor management software.
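Auto-negotiation simply means the link settles on the fastest speed that both ends support. A minimal Python sketch of that idea follows. The function name is hypothetical, and real Fibre Channel speed negotiation is a link-level protocol handled by the hardware, not something you would script.

```python
# Speeds the switch ports support, per the announcement: 16, 8, 4 and 2 Gbps.
SWITCH_PORT_SPEEDS_GBPS = (16, 8, 4, 2)

def negotiated_speed(device_speeds_gbps, port_speeds_gbps=SWITCH_PORT_SPEEDS_GBPS):
    """Return the fastest speed common to both ends, or None if there is none."""
    common = set(device_speeds_gbps) & set(port_speeds_gbps)
    return max(common) if common else None

# An older 8Gbps adapter that also supports 4 and 2 Gbps links up at 8 Gbps.
print(negotiated_speed([8, 4, 2]))
```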
GPFS is the core technology for IBM's "Codename: Elastic Storage" initiative.
You have several options. First, you can purchase just the GPFS software itself. It runs natively on AIX, Windows and Linux, and can be extended to support other operating systems through the use of NAS protocols like NFS or CIFS. Today, the Linux support which was previously just x86 and POWER has been extended to include Linux on System z mainframes as well.
GPFS v4.1 offers "Native RAID" support, with de-clustered RAID in 8+2P and 8+3P configurations. Like the IBM XIV Storage System, this scatters the data across many drives, and can tolerate drive failures better than traditional RAID-5 configurations.
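To put numbers on those configurations, here is a small Python sketch that compares usable capacity and tolerated drive failures for each stripe layout. The function name is made up for this example, and the 7+P RAID-5 layout is included only as a familiar point of comparison.

```python
def raid_profile(data_strips, parity_strips):
    """Usable fraction of raw capacity and number of concurrent failures tolerated."""
    width = data_strips + parity_strips
    return {
        "usable_fraction": data_strips / width,
        "tolerated_failures": parity_strips,
    }

for label, data, parity in [("RAID-5 7+P", 7, 1), ("8+2P", 8, 2), ("8+3P", 8, 3)]:
    profile = raid_profile(data, parity)
    print(f"{label}: {profile['usable_fraction']:.1%} usable, "
          f"survives {profile['tolerated_failures']} drive failure(s)")
```

The trade-off is a little less usable capacity per stripe in exchange for surviving two or three concurrent drive failures instead of one.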
Another option is to get a pre-configured "Converged" appliance that combines servers, storage and hardware. We already offer SONAS and the Storwize V7000 Unified, but IBM now offers the "GPFS Storage Server" running on the new P822L Linux-on-Power servers, RHEL v7, and GPFS v4.1 with Native RAID to twin-tailed attached DCS3700 expansion drawers. Since GPFS provides the RAID, there is no need for DCS3700 controllers, saving clients substantial costs.
The IBM Storwize family includes SAN Volume Controller, Storwize V7000, Storwize V7000 Unified, Storwize V5000, Storwize V3700 and Storwize V3500.
The big announcement is that IBM now offers data-at-rest encryption for block data on internal drives in the new generation of Storwize V7000 and V7000 Unified models. There is no performance impact, and no need to purchase new SED-capable drives.
Data-at-rest encryption helps in several ways. First, it protects data if a drive is pulled out and taken away maliciously. Second, it protects data if the drive fails and you want to send it back to the manufacturer for replacement. Third, it allows you to perform a "secure erase" so that the drive can be sold or re-purposed without fear of anyone reading previous data.
Initially, the encryption key management is built-in, with the keys stored on a USB memory stick physically attached to the model. In the future, IBM will extend this support to SVC, extend this support to external virtualized drives, and extend this support to IBM Security Key Lifecycle Manager (SKLM).
Other announcements include 16Gbps adapters for SVC, Storwize V7000 and V7000 Unified. The entire Storwize family will also enjoy both 1.8TB 10K RPM 2.5-inch drives, and 6TB 7200RPM 3.5-inch drives.
See the Announcement Letter (available later this month) for details.
New TS1150 enterprise tape drives
The anticipation is over! The new TS1150 tape drive has been announced, with 10TB raw un-compressed "JD" media cartridge capacity and 360 MB/sec throughput performance. The new drive is read/write compatible with TS1140 on JC, JY and JK media cartridges.
For the virtual tape libraries for the System z platform, IBM offers two models. The TS7740 had a small amount of disk front-ending a library of physical tape. The TS7720 had a large amount of disk with no tape library.
But then the person carrying the chocolate bar bumped into the person carrying the jar of peanut butter, and the rest is history. IBM will now allow tape attach on TS7720, best of both worlds! Large disk cache plus tape library attach.
Tape-attached TS7720 configurations can have up to eight partitions, with different partitions having different policies. Some might move data from disk cache to tape more aggressively, while other partitions may keep data on disk for longer periods, or indefinitely if needed.
Logical tape volumes can now be up to 25GB in size.
The DCS3700 is IBM's entry-level disk system for sequential-oriented workloads. Today, IBM announced new disk drive options: 400GB 2.5-inch SSD, 800 GB 2.5-inch SSD, and 1.2TB 10K RPM 2.5-inch drive. All of these offer T10 Protection Information (PI) data integrity.
Recently, a client asked how to back up their IBM PureData System for Analytics devices. IBM had [acquired Netezza in November 2010], and later renamed their TwinFin devices as the IBM PureData for Analytics, powered by Netezza.
The [IBM PureData System for Analytics] is incredibly fast for performing deep, ad-hoc analytics. However, the people who use them are "data scientists", not backup experts.
Likewise, there are backup administrators who may not be familiar with the unique characteristics of this expert-integrated system to know what backup options are available.
As with the rest of the IBM PureSystems line, the IBM PureData System for Analytics (or, PDA for short) has a combination of servers, storage and switches inside.
In a full-frame PDA, there are two servers in Active/Passive mode. These coordinate the activity of FPGA-based blade servers, which have parallel access to hundreds of disk drives, storing nearly 200 TB of compressed database data. A system can span up to four frames.
But what do you back up? And why? You don't need to worry about backing up the Linux operating system or NPS server code. That is considered firmware, and if anything ever got corrupted, IBM would help restore it for you. System-wide metadata, such as the host catalog and global users, groups, and permissions should be backed up periodically to protect against data corruption.
There are a number of reasons to back up your user databases:
As part of firmware upgrade/downgrade
To transfer data to another system
Protect against hardware failure / disaster
Protect against data corruption
The PDA has three backup formats. You can back up the entire user database in compressed format, back up individual tables in compressed format, or export to a text-format file.
Compressed format is faster, but can only be restored to the same PDA, or a PDA that has the same or higher level of NPS firmware. The text-format is slower, but can be used to restore to lower levels of NPS firmware, or to other database systems.
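The restore rule above can be written down as a small Python check. This is only a sketch of the rule as stated here: the function name is made up, and the NPS levels are modeled as simple (major, minor) tuples with hypothetical values.

```python
def can_restore(backup_format, source_nps, target_nps, same_system=False):
    """Apply the rule of thumb: compressed backups only go 'sideways or up'.

    backup_format: "compressed" or "text"
    source_nps / target_nps: NPS levels as comparable tuples, e.g. (6, 0)
    same_system: True when restoring to the PDA the backup was taken from
    """
    if backup_format == "text":
        return True   # text exports can go to lower NPS levels or other databases
    if backup_format == "compressed":
        return same_system or target_nps >= source_nps
    raise ValueError(f"unknown backup format: {backup_format!r}")

# Hypothetical levels, for illustration only.
print(can_restore("compressed", (6, 0), (7, 0)))   # same or higher level
print(can_restore("compressed", (7, 0), (6, 0)))   # lower level on another PDA
print(can_restore("text", (7, 0), (6, 0)))         # text format is portable
```

If there is any chance you will need to go back to a lower firmware level, taking a text-format export alongside the compressed backup keeps that door open.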
There are basically two methods to back up your PDA. The first is called the "Filesystem" method. Basically, you can attach an external storage device to the NPS server, and use the built-in command line interface (CLI) to store the backups onto its file system.
On NPS version 6, the nzhostbackup command will back up the /nz/data directory, which stores the system tables, database catalogs, configuration files, query plans, and cached executable code for the SPU blade servers.
(I have heard that the nzhostbackup will get deprecated in NPS version 7, but I only have access to version 6. As always, [RTFM] for your specific NPS code level.)
The nzbackup command with the users parameter will back up the global users, groups and permissions. This is included in the /nz/data backup contents from the nzhostbackup command, but you may want to back up and restore these separately.
The nzbackup command with the db parameter will back up a user database in compressed format. To back up individual tables, use the CREATE EXTERNAL TABLE command, which can create compressed or text-format exports.
You may find that your databases are so large, they will exceed the limits of the filesystem on the external storage device. For SAN or NAS deployments, I recommend the IBM Storwize V7000 Unified with IBM General Parallel File System (GPFS). However, if you are using something else, you may need to use the "nz_backup" scripts provided which split up the backup images into smaller pieces that most other filesystems can handle.
The PDA comes with 10GbE ports, to which you can attach a NAS storage device over a Local Area Network (LAN), or you can add Fibre Channel Protocol (FCP) ports and connect over a Storage Area Network (SAN). To keep things simple, I will refer to whichever network you decide as the "Backup Network" in the drawings.
The second method for backup is called the "External Backup Software" method. As you have probably guessed, it involves sending the backups to a supported software product like IBM Tivoli Storage Manager (or, TSM for short).
In this case, the PDA acts as a client node, similar to a laptop, desktop, or application server with internal disk. Backup data is sent over the LAN to the designated TSM server, and the TSM server in turn writes over the SAN to its storage hierarchy of disk, virtual tape and/or physical tape resources.
Backups can be done by command "on demand", or automated on a schedule. For the /nz/data directory, direct the nzhostbackup command to send the backup copy to local disk, then use TSM's dsmc archive command to transfer this backup copy to the TSM server.
For nzbackup with the users or db parameters, you can send the data directly to the appropriate TSM server by specifying the connector and connectorArgs parameters.
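The two flows described above can be sketched as command lines. The Python below only assembles argument lists and runs nothing. The exact spellings are assumptions to check against the manual for your NPS level: the leading dash on each parameter, the "tsm" connector value, and the colon-separated name=value layout for connectorArgs. The archive path, database name and connector argument shown are hypothetical placeholders.

```python
def host_backup_commands(local_archive):
    """Two steps for /nz/data: back up to local disk, then archive that copy to TSM."""
    return [
        ["nzhostbackup", local_archive],     # write the host backup to local disk
        ["dsmc", "archive", local_archive],  # hand the copy to the TSM server
    ]

def nzbackup_command(db=None, users=False, connector="tsm", connector_args=None):
    """Build an nzbackup command that sends data directly to the backup software.

    Exactly one of users=True or db=<name> must be given (an assumption made
    here to keep the sketch simple).
    """
    if users == (db is not None):
        raise ValueError("choose exactly one of users=True or db=<name>")
    argv = ["nzbackup"]
    argv += ["-users"] if users else ["-db", db]
    argv += ["-connector", connector]
    if connector_args:
        pairs = ":".join(f"{key}={value}" for key, value in connector_args.items())
        argv += ["-connectorArgs", pairs]
    return argv

# Hypothetical names, for illustration only.
print(host_backup_commands("/tmp/nzhost_backup.tar.gz"))
print(nzbackup_command(users=True))
print(nzbackup_command(db="salesdb", connector_args={"EXAMPLE_OPTION": "value"}))
```

Either flow can then be run on demand or dropped into a scheduler, as described above.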
To reduce traffic on the TSM Server, an intermediary "TSM Proxy Node" can be put in between. In this case, the PDA sends the backup to the Proxy Node, the Proxy Node uses a "LAN Free Storage Agent" to send the backups directly to the virtual tape and/or physical tape, and then notifies the TSM Server to update its system catalog to record which tape holds these new backups.
Another configuration involves installing the TSM LAN Free storage agent directly on the PDA. While this will require FCP ports to be added and consume more CPU resources on the NPS server, it eliminates most of the LAN traffic, allowing the PDA to send its backups directly to virtual or physical tape.
If Eskimos have 37 words for "snow", then EMC has perhaps a similar number of names for "failure". I have already covered a few of their past attempts, including [ATMOS], [Invista], and [VPLEX]. Last week, EMC introduced its latest, called XtremIO.
But rather than focus on XtremIO's many shortcomings, I thought it would be better to point out the highlights of IBM's All-Flash array, IBM FlashSystem.
But first, a quick story.
Two years ago, I worked the booth at [Oracle OpenWorld 2011]. After a conference attendee had visited the booths of Violin Memory and Pure Storage, he asked me why IBM did not have an all-Flash array.
Of course IBM did, and I showed him the [Storwize V7000]. For example, a 2U model with 18 SSD drives of 400GB each, configured in two RAID-5 ranks 7+P+S could offer 5.6 TB of space, running up to 250,000 IOPS at sub-millisecond response times.
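The capacity figure is straightforward arithmetic: in each 7+P+S rank, only the seven data drives count toward usable space. A quick Python sketch, with a made-up function name and decimal units (1 TB = 1000 GB):

```python
def raid5_config(ranks, data_drives, drive_gb, parity=1, spares=1):
    """Return (total drives, usable TB) for RAID-5 ranks laid out as data+P+S."""
    total_drives = ranks * (data_drives + parity + spares)
    usable_tb = ranks * data_drives * drive_gb / 1000   # decimal TB
    return total_drives, usable_tb

drives, usable = raid5_config(ranks=2, data_drives=7, drive_gb=400)
print(f"{drives} drives, {usable:.1f} TB usable")
```

Two ranks of 7+P+S account for all 18 drives, and 2 x 7 x 400 GB gives the 5.6 TB quoted above. Plugging in 800GB drives doubles that to 11.2 TB.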
Why didn't IBM advertise the Storwize V7000 as an all-Flash array? I thought the question was silly at the time. Since the Storwize V7000 supported SSD, 15K, 10K and 7200 RPM spinning disk, it seemed obvious that it could be configured with only SSD if you chose.
Since then, IBM has added 800GB support to the Storwize V7000, doubling the capacity. More importantly, IBM acquired Texas Memory Systems, and offers a much better all-Flash array.
Flash can be deployed in three levels. The first is in the server itself, such as with PCIe cards containing Flash chips, limited to applications running on that server only.
The second option is a hybrid disk system, that can intermix Flash-based Solid State Drives (SSD) with regular spinning hard disk drives (HDD). These can be attached to many servers.
The problem with this approach is that when Flash is packaged to pretend to be spinning disk, it undermines some of the performance benefits. Traditional disk system architectures using SCSI commands over Device adapter loops can introduce added latency.
The third fits snugly in the middle: all-Flash arrays designed from the ground up to be only Flash.
Whereas SSD can typically achieve an I/O latency in the 300 to 1000 microseconds range, IBM FlashSystem can process I/O in the 25 to 110 microsecond range. That is a huge difference!
(FTC Disclosure: The U.S. Federal Trade Commission requires that I mention that I am an IBM employee, and that this post may be considered a paid, celebrity endorsement of both the IBM FlashSystem and IBM Storwize family of products. I have no financial interest in EMC, do not endorse the XtremIO mentioned here, and was not paid to mention their company or products in any manner.)
Fellow blogger and IBM Master Inventor Barry Whyte has a great comparison table in his blog post [Extreme Blogging]. I thought I would add a column for the Storwize V7000 with 18 Solid State drives.
IBM FlashSystem 820
IBM Storwize V7000 with SSD
20 Terabytes: 1U
11 Terabytes: 2U
7 Terabytes: 6U
I/O latency (microseconds)
110us (~5x faster)
Maximum I/O per second
NAND Flash type
While it is easy to show that EMC's XtremIO does not hold a candle to IBM FlashSystems, I think it is more amusing that it is not even as good as a Storwize V7000 with SSD that IBM offered two years ago, long before [EMC acquired XtremIO company] back in May 2012.
The first day of the residency started with introductions. Our emcee and project leader is Vasfi Gucer from the IBM Austin lab. There are 17 participants (referred to as "residents") from the USA and various countries including Brazil, Canada and Sweden.
Michael Fork presenting. I am sitting on the far left side
in the pink shirt. Photo taken by Tina Williams.
To set the right expectations, Tina Williams (IBM Social Media ITSO Projects Program Manager) explained what was going to happen this week.
In a typical "residency", residents are brought together for 4-6 weeks to write an [IBM Redbook], which is often a how-to guide written in a very conversational tone.
This residency is different. A bunch of social media and Cloud experts have been brought together to share experiences and to build up skills to write individual blog posts about IBM Cloud offerings. I was invited as both a world-renowned blogger and a Cloud expert. Everyone who signed up for this commits to write at least six blog posts about Cloud sometime in the next 90 days.
(Residents who do not have their own blogs can post to the IBM [Thoughts on Cloud] group blog. Publishing is part of our promotion process, and writing blogs consistently over a period of time counts!)
Jennifer Turner (IBM Worldwide Cloud Marketing Manager) explained IBM Cloud Social Media Initiative. Five years ago, IBM was one of the top 5 Cloud service providers, then a whole bunch of things happened, and we fell out of the top 5 list, and now with the recent [IBM acquisition of SoftLayer], we are in the top 5 again!
Michael Fork (IBM CloudFirst Lead Architect), presented the latest about SoftLayer. Wow! He did a great job, and am glad to have him as a contact in case I have future questions from clients at the Tucson Executive Briefing Center.
Mohsin Syed [@mohsinusyed], IBM Development Manager, presented [IBM Social Media Analytics], combining Hadoop-style analytics using IBM BigInsights, DB2 database and Cognos reporting. IBM can do [sentiment analysis] to determine positive and negative comments in various languages. This product was formerly known as Cognos Consumer Insight.
I was the last speaker of the day. As one of the top bloggers in both the IT Storage Industry, and company-wide within IBM, I was invited to provide a few tips on blogging to the newbies in the audience. Jeff Antley, the "co-owner" of my blog [Inside System Storage] who works on the IBM developerWorks team, was there on hand to help answer questions.
(IBM requires all highly-visible corporate blogs like mine to have at least two owners. Jeff is an expert at HTML, CSS and other web design and has been immensely helpful in getting my blog looking nicer.)
Well, it's Tuesday again, and you know what that means... IBM announcements! Today, IBM announces that next Monday marks the 60th anniversary of the first commercial digital tape storage system! I am on the East coast this week visiting clients, but plan to be back in Tucson in time for the cake and fireworks next Monday.
1925 - masking tape (which 3M sold under its newly announced Scotch® brand)
1930 - clear cellulose-based tape (today, when people say Scotch tape, they usually are referring to the cellulose version)
1935 - Allgemeine Elektrizitatsgesellschaft (AEG) presents Magnetophon K1, audio recording on analog tape
1942 - Duct tape
1947 - Bing Crosby adopts audio recording for his radio program. This eliminated him doing the same program live twice per day, perhaps the first example of using technology for "deduplication".
According to the IBM Archives the [IBM 726 tape drive was formally announced May 21, 1952]. It was the size of a refrigerator, and the tape reel was the size of a large pizza. The next time you pull a frozen pizza from your fridge, you can remember this month's celebration!
When I first joined IBM in 1986, there were three kinds of IBM tape: the round reel called 3420, the square cartridge called 3480, and the tubes that contained a wide swath of tape stored in honeycomb shelves, called the [IBM 3850 Mass Storage System].
My first job at IBM was to work on DFHSM, which was specifically started in 1977 to manage the IBM 3850, and later renamed to the DFSMShsm component of the DFSMS element of the z/OS operating system. This software was instrumental in keeping disk and tape at high 80-95 percent utilization rates on mainframe servers.
While I was visiting a client in Detroit, the client told me they loved their StorageTek tape automation silo, but didn't like that the StorageTek drives inside were incompatible with IBM formats. They wanted to put IBM drives into the StorageTek silos. I agreed it was a good idea, and brought this back to the attention of development. In a contentious meeting with management and engineers, I presented this feedback from the client.
Everyone in the room said IBM couldn't do that. I asked "Why not?" The software engineers I spoke to already said they could support it. With StorageTek at the brink of Chapter 11 bankruptcy, I argued that IBM drives in their tape automation would ease the transition of our mainframe customers to an all-IBM environment.
Was the reason related to business/legal concerns, or was there a hardware issue? It turned out to be a little of both. On the business side, IBM had to agree to work with StorageTek on service and support to its mutual clients in mixed environments. On the technical side, the drive had to be tilted 12 degrees to line up with the robotic hand. A few years later, the IBM silo-compatible 3592 drive was commercially available.
Rather than putting StorageTek completely out of business, it had the opposite effect. Once IBM drives could be put in StorageTek libraries, everyone wanted one, basically bringing StorageTek back to life. This forced IBM to offer its own tape automation libraries.
In 1993, I filed my first patent. It was for the RECYCLE function in DFHSM to consolidate valid data from partial tapes to fresh new tapes. Before my patent, the RECYCLE function selected tapes alphabetically, by volume serial (VOLSER). My patent evaluated all tapes based on how full they were, and sorted them least-full to most-full, to maximize the return of cartridges.
Different tape cartridges can hold different amounts of data, especially with different formats on the same media type, with or without compression, so calculating the percentage full turned out to be a tricky algorithm that continues to be used in mainframe environments today.
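For the programmers in the audience, here is a minimal sketch of the least-full-first idea. This is not the DFHSM code, just an illustration under my own assumptions: the `Tape` class, its field names, and the sample capacities are all hypothetical, and "percent full" is simply valid data divided by the effective capacity of that particular cartridge and format.

```python
from dataclasses import dataclass


@dataclass
class Tape:
    volser: str         # volume serial (hypothetical sample values below)
    capacity_mb: float  # effective capacity for this cartridge and format
    valid_mb: float     # data on the tape that is still valid

    @property
    def percent_full(self) -> float:
        return 100.0 * self.valid_mb / self.capacity_mb


def recycle_order(tapes, threshold_pct=100.0):
    """Return tapes at or below the threshold, least-full first."""
    candidates = [t for t in tapes if t.percent_full <= threshold_pct]
    return sorted(candidates, key=lambda t: t.percent_full)


tapes = [
    Tape("A00001", capacity_mb=800, valid_mb=600),   # 75 percent full
    Tape("B00002", capacity_mb=2400, valid_mb=240),  # 10 percent full
    Tape("C00003", capacity_mb=800, valid_mb=320),   # 40 percent full
]
order = [t.volser for t in recycle_order(tapes)]
print(order)  # ['B00002', 'C00003', 'A00001']
```

Alphabetical selection would have started with A00001, the fullest tape, which moves the most data to free a single cartridge. Sorting by percent full empties the cheapest cartridges first.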
The patent was popular for cross-licensing, and IBM has since filed additional patents for this invention in other countries to further increase its license revenue for intellectual property.
In 1997, IBM launched the IBM 3494 Virtual Tape Server (VTS), the first virtual tape storage device, blending disk and tape to optimal effect. This was based on the IBM 3850 Mass Storage System, which was the first virtual disk system, using 3380 disk and tape to emulate the older 3350 disk systems.
In the VTS, tape volume images would be emulated as files on a disk system, then later moved to physical tape. We would call the disk the "Tape Volume Cache", and use caching algorithms to decide how long to keep data in cache, versus destage to tape. However, there were only a few tape drives, and sometimes when the VTS was busy, there were no tape drives available to destage the older images, and the cache would fill up.
I had already solved this problem in DFHSM, with a function called pre-migration. The idea was to pre-emptively copy data to tape, but leave it also on disk, so that when it needed to be destaged, all we had to do was delete the disk copy and activate the tape copy. We patented using this idea for the VTS, and it is still used in the successor models of IBM System Storage TS7740 virtual tape libraries today.
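To make the pre-migration idea concrete, here is a toy model. It is only a sketch under my own assumptions, not the VTS or DFHSM implementation: the `TapeVolumeCache` class and its method names are hypothetical. The point it shows is that a volume already copied to tape can be destaged without needing a tape drive at that moment.

```python
class TapeVolumeCache:
    """Toy model of a disk cache in front of physical tape (hypothetical names)."""

    def __init__(self):
        self.on_disk = set()  # volumes with a copy in the disk cache
        self.on_tape = set()  # volumes with a copy on physical tape

    def write(self, volume):
        self.on_disk.add(volume)

    def premigrate(self, volume):
        """Copy to tape ahead of time, but keep the disk copy."""
        if volume in self.on_disk:
            self.on_tape.add(volume)

    def destage(self, volume):
        """Free cache space. Returns True if a tape drive was needed right now."""
        needed_drive = volume not in self.on_tape
        if needed_drive:
            self.on_tape.add(volume)  # no tape copy yet, so it must be written now
        self.on_disk.discard(volume)  # delete the disk copy
        return needed_drive


cache = TapeVolumeCache()
cache.write("VOL001")
cache.write("VOL002")
cache.premigrate("VOL001")      # done earlier, while drives were idle
fast = cache.destage("VOL001")  # False: only the disk copy is deleted
slow = cache.destage("VOL002")  # True: has to wait for a tape drive
```

When the cache is filling up and all drives are busy, only the pre-migrated volumes can be released immediately.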
Today, tape continues to be the least expensive storage medium, about 15 to 25 times less expensive, dollar-per-GB, than disk technologies. A dollar of today's LTO-5 tape can hold 22 days worth of MP3 music at 192 Kbps recording. A full TS1140 tape cartridge can hold 2 million copies of the book "War and Peace".
(If you have not read the book, Woody Allen took a speed reading course and read the entire novel in just 20 minutes. He summed up the novel in three words: "It involves Russia." By comparison, in the same 20 minutes, at 650MB/sec, the TS1140 drive can read this novel over and over 390,000 times.)
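If you want to check my math, here is the arithmetic behind those figures as a short sketch. The only inputs are the numbers quoted above; the size of one copy of the novel and the cartridge capacity are derived from them, not taken from a spec sheet, and I am assuming decimal units (1 TB = 1,000,000 MB).

```python
# Figures quoted in the text
DRIVE_MB_PER_SEC = 650
SECONDS = 20 * 60            # Woody Allen's 20 minutes
READS_IN_20_MIN = 390_000
COPIES_PER_CARTRIDGE = 2_000_000

mb_read = DRIVE_MB_PER_SEC * SECONDS                        # 780,000 MB in 20 minutes
mb_per_copy = mb_read / READS_IN_20_MIN                     # implied size of one copy: 2.0 MB
cartridge_tb = COPIES_PER_CARTRIDGE * mb_per_copy / 1_000_000  # implied capacity: 4.0 TB

# 22 days of MP3 music at 192 Kbps, expressed in gigabytes
mp3_bytes_per_sec = 192_000 / 8
gb_per_dollar = 22 * 86_400 * mp3_bytes_per_sec / 1e9       # about 45.6 GB

print(mb_per_copy, cartridge_tb, round(gb_per_dollar, 1))
```

So the two "War and Peace" claims agree with each other at roughly 2 MB per copy, and a dollar of LTO-5 tape works out to roughly 45 GB.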
If you have your own "war stories" about tape, I would love to hear them, please consider posting a comment below.
Tonight PBS plans to air Season 38, Episode 6 of NOVA, titled [Smartest Machine On Earth]. Here is an excerpt from the station listing:
"What's so special about human intelligence and will scientists ever build a computer that rivals the flexibility and power of a human brain? In "Artificial Intelligence," NOVA takes viewers inside an IBM lab where a crack team has been working for nearly three years to perfect a machine that can answer any question. The scientists hope their machine will be able to beat expert contestants in one of the USA's most challenging TV quiz shows -- Jeopardy, which has entertained viewers for over four decades. "Artificial Intelligence" presents the exclusive inside story of how the IBM team developed the world's smartest computer from scratch. Now they're racing to finish it for a special Jeopardy airdate in February 2011. They've built an exact replica of the studio at its research lab near New York and invited past champions to compete against the machine, a big black box code-named Watson after IBM's founder, Thomas J. Watson. But will Watson be able to beat out its human competition?"
Like most supercomputers, Watson runs the Linux operating system. The system runs 2,880 cores (90 IBM Power 750 servers, four sockets each, eight cores per socket) to achieve 80 [TeraFlops]. TeraFlops is the unit of measure for supercomputers, representing a trillion floating point operations per second. By comparison, Hans Moravec, principal research scientist at the Robotics Institute of Carnegie Mellon University (CMU), estimates that the [human brain is about 100 TeraFlops]. So, in the three seconds that Watson gets to calculate its response, it would have processed 240 trillion operations.
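The core count and the 240 trillion figure follow directly from the numbers above. A quick sketch of the arithmetic, using only the figures quoted in this post (the variable names are mine):

```python
servers, sockets_per_server, cores_per_socket = 90, 4, 8
total_cores = servers * sockets_per_server * cores_per_socket   # 2,880 cores

watson_teraflops = 80     # trillion floating point operations per second
response_seconds = 3
trillion_ops = watson_teraflops * response_seconds              # 240 trillion operations

brain_teraflops = 100     # the estimate quoted above
fraction_of_brain = watson_teraflops / brain_teraflops          # 0.8

print(total_cores, trillion_ops, fraction_of_brain)  # 2880 240 0.8
```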
Several readers of my blog have asked for details on the storage aspects of Watson. Basically, it is a modified version of IBM Scale-Out NAS [SONAS] that IBM offers commercially, but running Linux on POWER instead of Linux-x86. System p expansion drawers of SAS 15K RPM 450GB drives, 12 drives each, are dual-connected to two storage nodes, for a total of 21.6TB of raw disk capacity. The storage nodes use IBM's General Parallel File System (GPFS) to provide clustered NFS access to the rest of the system. Each Power 750 has minimal internal storage mostly to hold the Linux operating system and programs.
When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, "The actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1TB." For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained; Watson is NOT going to the internet searching for answers.
“In times of universal deceit, telling the truth will be a revolutionary act.”
-- George Orwell
Well, it has been over two years since I first covered IBM's acquisition of the XIV company. Amazingly, I still see a lot of misperceptions out in the blogosphere, especially those regarding double drive failures for the XIV storage system. Despite various attempts to [explain XIV resiliency] and to [dispel the rumors], there are still competitors making stuff up, putting fear, uncertainty and doubt into the minds of prospective XIV clients.
Clients love the IBM XIV storage system! In this economy, companies are not stupid. Before buying any enterprise-class disk system, they ask the tough questions, run evaluation tests, and all the other due diligence often referred to as "kicking the tires". Here is what some IBM clients have said about their XIV systems:
“3-5 minutes vs. 8-10 hours rebuild time...”
-- satisfied XIV client
“...we tested an entire module failure - all data is re-distributed in under 6 hours...only 3-5% performance degradation during rebuild...”
-- excited XIV client
“Not only did XIV meet our expectations, it greatly exceeded them...”
In this blog post, I hope to set the record straight. It is not my intent to embarrass anyone in particular, so instead I will focus on a fact-based approach.
Fact: IBM has sold THOUSANDS of XIV systems
XIV is "proven" technology with thousands of XIV systems in company data centers. And by systems, I mean full disk systems with 6 to 15 modules in a single rack, twelve drives per module. That equates to hundreds of thousands of disk drives in production TODAY, comparable to the number of disk drives studied by [Google], and [Carnegie Mellon University] that I discussed in my blog post [Fleet Cars and Skin Cells].
Fact: To date, no customer has lost data as a result of a Double Drive Failure on XIV storage system
This has always been true, both when XIV was a stand-alone company and since the IBM acquisition two years ago. When examining the resilience of an array to any single or multiple component failures, it's important to understand the architecture and the design of the system and not assume all systems are alike. At its core, XIV is a grid-based storage system. IBM XIV does not use traditional RAID-5 or RAID-10 methods, but instead data is distributed across loosely connected data modules which act as independent building blocks. XIV divides each LUN into 1MB "chunks", and stores two copies of each chunk on separate drives in separate modules. We call this "RAID-X".
Spreading all the data across many drives is not unique to XIV. Many disk systems, including EMC CLARiiON-based V-Max, HP EVA, and Hitachi Data Systems (HDS) USP-V, allow customers to get XIV-like performance by spreading LUNs across multiple RAID ranks. This is known in the industry as "wide-striping". Some vendors use the terms "metavolumes" or "extent pools" to refer to their implementations of wide-striping. Clients have coined their own phrases, such as "stripes across stripes", "plaid stripes", or "RAID 500". It is highly unlikely that an XIV will experience a double drive failure that ultimately requires recovery of files or LUNs, and XIV is substantially less vulnerable to data loss than an EVA, USP-V or V-Max configured in RAID-5. Fellow blogger Keith Stevenson (IBM) compared XIV's RAID-X design to other forms of RAID in his post [RAID in the 21st Century].
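To illustrate the placement rule, here is a toy sketch. I want to be clear that this is not XIV's actual distribution algorithm, which is not described here; the random placement, the function names, and the seed are my own assumptions. The only property it demonstrates is the one stated above: each 1MB chunk gets two copies, on separate drives in separate modules.

```python
import random

MODULES = 15            # fully configured rack
DRIVES_PER_MODULE = 12


def place_chunk(rng):
    """Pick (module, drive) locations for the two copies, in different modules."""
    m1, m2 = rng.sample(range(MODULES), 2)  # two distinct modules
    primary = (m1, rng.randrange(DRIVES_PER_MODULE))
    secondary = (m2, rng.randrange(DRIVES_PER_MODULE))
    return primary, secondary


def place_lun(size_mb, seed=0):
    """Place every 1MB chunk of a LUN; returns a list of (primary, secondary)."""
    rng = random.Random(seed)
    return [place_chunk(rng) for _ in range(size_mb)]


layout = place_lun(size_mb=1000)
drives_used = {loc for copies in layout for loc in copies}
print(len(layout), "chunks spread over", len(drives_used), "drives")
```

Because the two copies never share a module, losing an entire module still leaves one good copy of every chunk.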
Fact: IBM XIV is designed to minimize the likelihood and impact of a double drive failure
The independent failure of two drives is a rare occurrence. More data has been lost from hash collisions on EMC Centera than from double drive failures on XIV, and hash collisions are also very rare. While the published worst-case time to re-protect from a 1TB drive failure for a fully-configured XIV is 30 minutes, field experience shows XIV regaining full redundancy on average in 12 minutes. That makes a second drive failure during the exposure window about 40 times less likely than during the typical 8-10 hour rebuild window for a RAID-5 configuration.
A lot of bad things can happen in those 8-10 hours of traditional RAID rebuild. Performance can be seriously degraded. Other components may be affected, as they share cache, are connected to the same backplane or bus, or are co-dependent in some other manner. An engineer supporting the customer onsite during a RAID-5 rebuild might pull the wrong drive, thereby causing the double drive failure they were hoping to avoid. Having IBM XIV rebuild in only a few minutes addresses this "human factor".
In his post [XIV drive management], fellow blogger Jim Kelly (IBM) covers a variety of reasons why storage admins feel double drive failures are more than just random chance. XIV avoids the load stress normally associated with traditional RAID rebuild by evenly spreading out the workload across all drives. This is known in the industry as "wear-leveling". When the first drive fails, the recovery is spread across the remaining 179 drives, so that each drive only processes about 1 percent of the data. The [Ultrastar A7K1000] 1TB SATA disk drives that IBM uses from HGST have a specified 1.2 million hours mean-time-between-failures [MTBF], which would average about one drive failing every nine months in a 180-drive XIV system. However, field experience shows that an XIV system will experience, on average, one drive failure per 13 months, comparable to what companies experience with more robust Fibre Channel drives. That's innovative XIV wear-leveling at work!
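For those who like to see the numbers, here is a rough sketch of where the "nine months" and "40 times" figures come from. It rests on a simplifying assumption of my own: drive failures are independent with a constant failure rate equal to 1/MTBF. Real drives do not behave that neatly, so treat this as a back-of-envelope check only.

```python
import math

MTBF_HOURS = 1_200_000        # specified MTBF per drive
DRIVES = 180
HOURS_PER_MONTH = 8766 / 12   # average month, about 730.5 hours

# Expected time between drive failures somewhere in a 180-drive system
months_between_failures = MTBF_HOURS / DRIVES / HOURS_PER_MONTH


def p_second_failure(window_hours, remaining_drives=DRIVES - 1):
    """Chance that any remaining drive fails within the window (exponential model)."""
    expected_failures = remaining_drives * window_hours / MTBF_HOURS
    return -math.expm1(-expected_failures)  # 1 - exp(-x), accurate for tiny x


p_xiv = p_second_failure(12 / 60)   # 12-minute average re-protection
p_raid5 = p_second_failure(8)       # low end of an 8-10 hour rebuild
ratio = p_raid5 / p_xiv

print(round(months_between_failures, 1), round(ratio))
```

Under these assumptions the system-wide figure comes out a little over nine months, and the 8-hour window is about 40 times more exposed than the 12-minute one.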
Fact: In the highly unlikely event that a DDF were to occur, you will have full read/write access to nearly all of your data on the XIV, all but a few GB.
Even though it has NEVER happened in the field, some clients and prospects are curious what a double drive failure on an XIV would look like. First, a critical alert message would be sent to both the client and IBM, and a "union list" is generated, identifying all the chunks in common. The worst case on a 15-module XIV fully loaded with 79TB data is approximately 9000 chunks, or 9GB of data. The remaining 78.991 TB of unaffected data are fully accessible for read or write. Any I/O requests for the chunks in the "union list" will have no response yet, so there is no way for host applications to access outdated information or cause any corruption.
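Here is a small sketch of what building a "union list" could look like. The data layout and the `union_list` function are hypothetical, made up for illustration: each chunk is recorded as a pair of (module, drive) locations, and a chunk lands on the list only if both of its copies sit on the two failed drives. The worst-case arithmetic at the end uses the figures quoted above.

```python
def union_list(layout, failed_a, failed_b):
    """Return indexes of chunks whose two copies are on exactly the two failed drives."""
    failed = {failed_a, failed_b}
    return [i for i, copies in enumerate(layout) if set(copies) == failed]


# Each entry: (location of copy 1, location of copy 2), as (module, drive)
layout = [
    ((0, 1), (3, 4)),  # both copies on the failed drives
    ((0, 1), (5, 2)),  # one good copy remains on module 5
    ((3, 4), (0, 1)),  # both copies on the failed drives
    ((2, 2), (3, 4)),  # one good copy remains on module 2
]
affected = union_list(layout, failed_a=(0, 1), failed_b=(3, 4))
print(affected)  # [0, 2]

# Worst case quoted above: about 9000 chunks of 1MB out of 79TB
worst_case_gb = 9000 * 1 / 1000
unaffected_tb = round(79 - worst_case_gb / 1000, 3)
print(worst_case_gb, unaffected_tb)  # 9.0 78.991
```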
(One blogger compared losing data on the XIV to drilling a hole through the phone book. Mathematically, the drill bit would be only 1/16th of an inch, or about 1.59 millimeters for you folks outside the USA. Enough to knock out perhaps one character from a name or phone number on each page. If you have ever seen an actor in the movies look up a phone number in a telephone booth then yank out a page from the phone book, the XIV equivalent would be cutting out 1/8th of a page from an 1100 page phone book. In both cases, all of the rest of the unaffected information is fully accessible, and it is easy to identify which information is missing.)
If the second drive failed several minutes after the first drive, the process for full redundancy is already well under way. This means the union list is considerably shorter or completely empty, and substantially fewer chunks are impacted. Contrast this with RAID-5, where being 99 percent complete on the rebuild when the second drive fails is just as catastrophic as having both drives fail simultaneously.
Fact: After a DDF event, the files on these few GB can be identified for recovery.
Once IBM receives notification of a critical event, an IBM engineer immediately connects to the XIV using the remote service support method. There is no need to send someone physically onsite; the repair actions can be done remotely. The IBM engineer has tools from HGST to recover, in most cases, all of the data.
Any "union" chunk that the HGST tools are unable to recover will be set to "media error" mode. The IBM engineer can provide the client a list of the XIV LUNs and LBAs that are on the "media error" list. From this list, the client can determine which hosts these LUNs are attached to, and run a file scan utility against the file systems that these LUNs represent. Files that get a media error during this scan will be listed as needing recovery. A chunk could contain several small files, or the chunk could be just part of a large file. To minimize time, the scans and recoveries can all be prioritized and performed in parallel across host systems zoned to these LUNs.
As with any file or volume recovery, keep in mind that these might be part of a larger consistency group, and that your recovery procedures should make sense for the applications involved. In any case, you are probably going to be up-and-running in less time with XIV than recovery from a RAID-5 double failure would take, and certainly nowhere near the "beyond repair" scenario that other vendors might have you believe.
Fact: This does not mean you can eliminate all Disaster Recovery planning!
To put this in perspective, you are more likely to lose XIV data from an earthquake, hurricane, fire or flood than from a double drive failure. As with any unlikely disaster, it is better to have a disaster recovery plan than to hope it never happens. All disk systems that sit on a single datacenter floor are vulnerable to such disasters.
For mission-critical applications, IBM recommends using disk mirroring capability. IBM XIV storage system offers synchronous and asynchronous mirroring natively, both included at no additional charge.
This week, I was reminded that back in 2011, Watson beat two human players, Ken Jennings and Brad Rutter on the TV game show "Jeopardy!" On his last response, Ken wrote "I for one welcome our new computer overlords." With IBM investing heavily in Cognitive Solutions, should people be worried, or welcome the new technology?
Back in 1950, Isaac Asimov proposed the "Three Laws of Robotics":
A robot may not injure a human being or, through inaction, allow a human being to come to harm.
A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Let's take a look at how Artificial Intelligence has been represented in the movies over the past few decades. I have put these in chronological order when they were initially released in the United States.
(FTC Disclosure and Spoiler Alert: I work for IBM. This blog post can be considered a "paid celebrity endorsement" for cognitive solutions made by IBM. While IBM may have been involved or featured in some of these movies, I have no financial interest in them. I have seen them all and highly recommend them. I am hoping that you have all seen these, or are at least familiar enough with their plot lines that I am not spoiling them for you.)
2001: A Space Odyssey
Back in 1968, Stanley Kubrick and Arthur C. Clarke made a masterpiece movie about a mysterious obelisk floating near Jupiter. To investigate, a crew of human beings takes a space ship managed by a sentient computer named [HAL-9000].
(Many people thought HAL was a subtle reference to IBM. Stanley Kubrick clarifies:
"By the way, just to show you how interpretation can sometimes be bewildering: A cryptographer went to see the film, and he said, 'Oh. I get it. Each letter of HAL's name is one letter ahead of IBM. The H is one letter in front of I, the A is one letter in front of B, and the L is one letter in front of M.'
Now this is a pure coincidence, because HAL's name is an acronym of heuristic and algorithmic, the two methods of computer programming...an almost inconceivable coincidence. It would have taken a cryptographer to have noticed that."
Source: The Making of 2001: A Space Odyssey, Eye Magazine Interview, Modern Library, pp. 249)
The problem arises when HAL-9000 refuses commands from the astronauts. The astronauts are not in control, HAL-9000 was given separate orders from ground control back on earth, and it has determined it would be more successful without the crew.
Westworld
In 1973, Michael Crichton wrote and directed this movie about an amusement park with three uniquely themed areas: Medieval World, Roman World, and Westworld. Robots are used to staff the parks to make them more realistic, interacting with the guests in character appropriate for each time period.
A malfunction spreads like a computer virus among the robots, causing them to harm or kill the park's guests. Yul Brynner played a robot called simply "the Gunslinger". Equipped with fast reflexes and infrared vision, the Gunslinger proves especially deadly!
(Michael Crichton also wrote "Jurassic Park", which had a similar story line involving dinosaurs with catastrophic results!)
Last year, HBO launched a TV series called "Westworld", based on the same themes covered in this movie. The first season of 10 episodes just finished, and the next season is scheduled for 2018.
Blade Runner
Directed by Ridley Scott, this 1982 movie stars Harrison Ford as Rick Deckard, a law enforcement officer. Rick is tasked to hunt down and "retire" four cognitive androids named "replicants" that have killed some humans and are now in search of their creator, a man named J. F. Sebastian.
(I enjoy the euphemisms used in these movies. Terms like kill, murder or assassinate apply to humans but not machines. The word "retire" in this movie refers to destruction of the robots. As we say in IBM, "retirement is not something you do, it is something done to you!")
Destroying machines does not carry the same emotional toll as killing humans, but this movie explores that empathy. A sequel called "Blade Runner 2049" will be released later this year.
WarGames
In 1983, Matthew Broderick plays David, a young high school student who hacks into the U.S. Military's War Operation Plan Response (WOPR) computer. The WOPR was designed to run various strategic games, including war game simulations, learning as it goes. David decides to initiate the game "Global Thermonuclear War", and the military responds as if the threats were real.
Can the computer learn that the only way to win a war is not to wage it in the first place? And if a computer can learn this, can our human leaders learn this too?
The Terminator
In this series of movies, a franchise spanning from 1984 to 2009, the US Military builds a defense grid computer called [Skynet]. After cognitive learning at an alarming rate, Skynet becomes self-aware, and decides to launch missiles, starting a nuclear war that kills over 3 billion people.
Arnold Schwarzenegger plays the Terminator model T-800, a cognitive solution in human form designed by Skynet to finish the job and kill the remainder of humanity.
I, Robot
In this 2004 movie, Will Smith plays Del Spooner, a technophobic cop who investigates a crime committed by a cognitive robot.
(Many people associate the title with author Isaac Asimov. A short story called "I, Robot" written by Earl and Otto Binder was published in the January 1939 issue of 'Amazing Stories', well before the unrelated and more well-known book 'I, Robot' (1950), a collection of short stories, by Asimov.
Asimov admitted to being heavily influenced by the Binder short story. The title of Asimov's collection was changed to "I, Robot" by the publisher, against Asimov's wishes. Source: IMDB)
Del Spooner uncovers a bigger threat to humanity, not just a single malfunctioning robot, but rather the Virtual Interactive Kinesthetic Interface, or simply VIKI for short, a cognitive solution that controls all robots. VIKI interprets Asimov's three laws in a manner not originally intended.
Ex Machina
In this 2015 movie, Domhnall Gleeson plays Caleb, a 26 year old programmer at the world's largest internet company. Caleb wins a competition to spend a week at a private mountain retreat. However, when Caleb arrives he discovers that he must interact with Ava, the world's first true artificial intelligence, a beautiful robot played by Alicia Vikander.
(The title derives from the Latin phrase "Deus ex machina," meaning "a god from the machine," a phrase that originated in Greek tragedies. Source: IMDB)
Nathan, the reclusive CEO of this company, relishes this opportunity to have Caleb participate in this experiment, explaining how Artificial Intelligence (AI) will transform the world.
(The three main characters all have appropriate biblical names. Ava is a form of Eve, the first woman; Nathan was a prophet in the court of David; and Caleb was a spy sent by Moses to evaluate the Promised Land. Source: IMDB)
The premise is based in part on the famous [Turing Test], developed by Alan Turing. This is designed to test a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
Movies that depict the bad guys as a particular nationality, ethnicity or religion may be offensive to some movie audiences. Instead, having dinosaurs, monsters, aliens or robots provides a villain that all people can fear equally. This helps movie makers reach a more global audience!
Of course, if robots, androids and other forms of Artificial Intelligence did exactly what humans expect them to, we would not have the tense, thrilling action movies to watch on the big screen.
This is not a complete list of movies. Enter in the comments below your favorite movie that features Artificial Intelligence and why it is your favorite!
Last month, I had the pleasure of helping train Watson in its latest mission: answering questions from sellers. This is not just for the IBM feet on the street, but also for IBM distributors and IBM Business Partners.
"... [survey by SearchYourCloud] revealed 'workers took up to 8 searches to find the right document and information.' Here are a few other statistics that help tell the tale of information overload and wasted time spent searching for correct information -- either external or internal:
'According to a McKinsey report, employees spend 1.8 hours every day -- 9.3 hours per week, on average -- searching and gathering information. Put another way, businesses hire 5 employees but only 4 show up to work; the fifth is off searching for answers, but not contributing any value.' Source: [Time Searching for Information]
'19.8 percent of business time -- the equivalent of one day per working week -- is wasted by employees searching for information to do their job effectively,' according to Interact. Source: [A Fifth of Business Time is Wasted]
IDC data shows that 'the knowledge worker spends about 2.5 hours per day, or roughly 30 percent of the workday, searching for information ... 60 percent [of company executives] felt that time constraints and lack of understanding of how to find information were preventing their employees from finding the information they needed.' Source: [Information: The Lifeblood of the Enterprise]."
In the early days of the Internet, before search engines like Google or Bing, I competed in [Internet Scavenger Hunts]. A dozen or more contestants would be in a room, and would be given a list of 20 questions to find answers for. Each of us would then hunt down answers on the Internet. The person to find the most documented answers before time runs out wins. It was quite the challenge!
Over the years, I have honed my skills as a [Search Ninja]. With over 30 years of experience in IBM Storage, many sellers come to me for answers. Sometimes sellers are just too lazy to look for the answers themselves, too busy trying to meet client deadlines, or too green to know where to look.
A good portion of my 60-hour week is spent helping sellers find the answers they are looking for. Sometimes I dig into the [SSIC], product data sheets, or various IBM Redbooks.
Other times, I would confer with experts, engineers and architects in particular development teams. Often, I learn something new myself. In a few cases, I have turned some questions into ideas for blog posts!
It was no surprise when I was asked to help train Watson for the new "Systems SmartSeller" tool. This will be a tool that runs on smartphones or desktops to help sellers answer the questions they get in RFPs and other client queries.
The premise was simple. Treat Watson as a student at "Cognitive University" taking classes from dozens of IBM professors, in a series of semesters, or "phases".
Phase I involved building the "Corpus", the set of documents related to z Systems, POWER systems, Storage and SDI solutions; and a "Grading Tool" that would be used as the Graphical User Interface. I was not involved in phase I.
Phase II was where I came in. Hundreds of questions were categorized by product area, and I worked on some 500 questions for storage. For each question, Watson had up to eleven different responses, typically a paragraph from the Corpus. My job as a professor was to grade each response:
★ (one star)
Irrelevant, answer not even storage-related
★★ (two stars)
Relevant, at least it is storage-related, but does not answer the question, or answers it poorly
★★★ (three stars)
Relevant, adequately answers the question
★★★★ (four stars)
Relevant, answers the question well
Most of the answers were either 1-star (not storage related) or 2-star (mentioned storage, but poor response). I would search through the existing Corpus looking for a better answer, and at best found only 3-star responses, which I would add to the list and grade as a 3-star response.
I then searched the Internet for better answers. Once I found a good match, I would type up a 4-star response, add it to the list, and point it to the appropriate resources on the Web.
Other professors, who were also looking at these questions, would then get to grade my suggested responses as well. Watson would learn based on the consensus of how appropriate and accurate each response was graded.
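I do not know exactly how the grading tool combines our grades, so take the following as a minimal sketch of one possible consensus scheme, assuming a simple average of the star ratings. The response identifiers and grades are made up for illustration.

```python
from statistics import mean


def consensus(grades):
    """grades maps a response id to the stars given by each professor.

    Returns (response id, average stars) pairs, best response first.
    """
    scored = {response: mean(stars) for response, stars in grades.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


grades = {
    "corpus-17": [1, 2, 1],       # mostly judged irrelevant
    "corpus-42": [3, 3, 2],       # adequate answer found in the Corpus
    "new-suggestion": [4, 4, 3],  # a professor's hand-written response
}
ranking = consensus(grades)
best_response = ranking[0][0]
print(best_response)  # new-suggestion
```

A single harsh or generous grader moves an average less as more professors grade the same response, which is presumably why several of us were asked to review the same questions.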
I don't know where the Cognitive University team got some of the questions, but they were quite representative of the ones I get every week. In some cases, the seller didn't understand the question he heard from the client, making it difficult for me to figure out what they were actually asking for.
It reminds me of that parlor game ["Telephone" or "Chinese Whispers"], in which one person whispers a message to the ear of the next person through a line of people until the last player announces the message to the entire group. I have actually played this at an IBM event in China!
Watson needs to parse the question into nouns and verbs, and use that Natural Language Processing (NLP) to then search the Corpus for an appropriate answer. I identified three challenges for Watson in this case:
The questions are not always fully formed sentences. For example, "Object storage?" Is this asking what is object storage in general, or rather what does IBM offer in this area?
The questions often do not spell the names of products correctly, or use informal abbreviations. "Can Store-wise V7 do RtC?" is a typical example, short for "Can the IBM Storwize V7000 storage controller perform Real-time Compression?"
The questions ask what is planned in the future. "When will IBM offer feature x in product y?" I am sorry, but Watson is not [Zoltar, the fortune teller]!
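The second challenge, misspelled product names and informal abbreviations, is commonly tackled with an alias table applied before the search. Here is a minimal sketch of that idea, using the example question above. The `ALIASES` table and the `normalize_question` helper are hypothetical, and nothing here describes how Watson or SmartSeller is actually implemented.

```python
import re

# Hypothetical alias table: informal spelling (regex) -> official name.
# A real table would be far larger and maintained by product experts.
ALIASES = {
    r"store-?wise\s+v7\b": "IBM Storwize V7000",
    r"\brtc\b": "Real-time Compression",
}

def normalize_question(question: str) -> str:
    """Replace known informal names and abbreviations with official terms."""
    result = question
    for pattern, official in ALIASES.items():
        result = re.sub(pattern, official, result, flags=re.IGNORECASE)
    return result

print(normalize_question("Can Store-wise V7 do RtC?"))
```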
I managed to grade the responses in the two weeks we were given. Part of my frustration was that the grading tool itself was a bit buggy, and I spent some time trying to track down some of its flaws.
The next phase is in late January and February. This will give the Cognitive University team a chance to update the Corpus, improve the grading interface, and find more professors and a different set of questions. I volunteered the most recent four years' worth of my blog posts to be added to the Corpus.
Maybe this tool will help me turn my 60-hour week back to the 40-hour week it should be!
Well, it's Tuesday again, and you know what that means? IBM Announcements!
(OK, yes, today is Friday, but I was busy getting married on Tuesday, so IBM pushed the announcements out one day to Wednesday, and technically I am writing this blog post during my honeymoon vacation, so the IBM marketing team and my new wife both cut me some slack. Work/Life balance is all about compromises, right?)
IBM DS8880 Storage System
The IBM DS8880 comes in three models, the DS8884 entry level, the DS8886 enterprise level, and the DS8888 all-flash array. IBM offers 1, 2, 3 and 4 year warranties.
The new High Performance Flash Enclosure (HPFE) Gen2 delivers more capacity than Gen1. The 2U flash enclosures are configured in pairs with each enclosure supporting up to twenty-four 2.5-inch flash cards in capacities 400 GB, 800 GB, 1.6 TB and 3.2 TB.
The HPFE Gen2 enclosures are currently available for both the DS8884 and DS8886 models. The maximum flash capacity for the DS8886 increases from 96 TB to 614.4 TB, and the new flash enclosure reduces storage costs through a lower cost per IOPS. IBM has made a statement of direction to offer HPFE Gen2 on the DS8888 as well.
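As a sanity check on the 614.4 TB figure, the arithmetic can be sketched as below. This assumes the figure is raw capacity with every slot holding the largest 3.2 TB card, which the announcement text here does not state explicitly.

```python
# Figures taken from the announcement text above.
CARDS_PER_ENCLOSURE = 24
LARGEST_CARD_TB = 3.2
MAX_FLASH_TB_DS8886 = 614.4

def enclosure_capacity_tb(cards: int, card_tb: float) -> float:
    """Raw capacity of one fully populated flash enclosure."""
    return cards * card_tb

per_enclosure = enclosure_capacity_tb(CARDS_PER_ENCLOSURE, LARGEST_CARD_TB)
enclosures = MAX_FLASH_TB_DS8886 / per_enclosure

print(f"{per_enclosure:.1f} TB per enclosure")
print(f"{enclosures:.0f} enclosures, or {enclosures / 2:.0f} pairs")
```

Under that assumption, the maximum works out to a whole number of enclosure pairs, which is consistent with the enclosures being configured in pairs.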
To improve security, IBM DS8880 now supports customer-defined digital certificates for authentication, and configurable Hardware Management Console (HMC) firewall support.
For IBM's mainframe clients, IBM now offers "Extents-level" space release support for z/OS®, DSCLI (Command Line Interface) support for z/OS environment, and FICON® Information Unit (IU) pacing improvements.
IBM Spectrum Virtualize™ V7.8 delivers support for the latest SAN Volume Controller, FlashSystem V9000 and Storwize® product family, and adds new software functionality and improvements.
In conjunction with [IBM Spectrum Copy Data Management], Spectrum Virtualize v7.8 offers flexible data protection with transparent cloud tiering to leverage the cloud as FlashCopy targets and restore these snapshots from the cloud on select platforms.
However, the encryption keys are kept on USB thumb drives, which are either left in the USB ports on the back of the hardware, or locked away in a safe, only to be retrieved as needed when rebooting the systems or upgrading the firmware.
Now, IBM Spectrum Virtualize v7.8 supports the IBM Security Key Lifecycle Manager (SKLM) to manage encryption keys. IBM continues to support USB thumb drives if you prefer, but SKLM is used to manage keys for most of the rest of IBM products, and provides centralized management.
The SVC and Storwize models can directly attach via 12Gb SAS to expansion drawers. Until now, IBM supported 2U-high 12-bay drawers that hold Large Form Factor (LFF) 3.5-inch Nearline (7200 rpm) drives, and 2U-high 24-bay drawers that hold Small Form Factor (SFF) 2.5-inch drives (SSD, 15K, 10K and 7200 rpm).
With Spectrum Virtualize v7.8, IBM now offers a third option, the 5U-high 92-bay that supports both LFF and SFF drives. This new expansion can be attached to Storwize V5000 Gen2, Storwize V7000 (models 524/Gen2 and 624/Gen2+), and SVC (models DH8 and SV1).
For the 12-bay and 92-bay, IBM now supports 10TB capacity 3.5-inch Nearline drives. For the 24-bay and 92-bay, IBM now supports 7.68 TB and 15.36 TB capacity Solid State Drives (SSD).
For those concerned about the phrase "lower endurance" in the press release, let me explain. SSDs include a bit of extra capacity. If you write the full capacity of the drive every day for a year, you will "burn up" about one percent of the capacity.
To handle ten "Full Drive Writes per Day" (10 FDWP) over the course of five years, IBM adds 50 percent extra spare capacity above the 400 GB, 800 GB, 1.6 TB and 3.2 TB capacities. So, a 400GB full-endurance drive is really 600 GB inside. These were sometimes referred to as "Enterprise" SSD.
For the larger device sizes, the IT industry has determined that 1 FDWP is sufficient, so instead of 50 percent spare capacity, IBM adds only 5 percent extra. The 7.68 TB is really 8.06 TB inside. These were earlier referred to as "Read-Intensive" SSD. These come in 1.92 TB, 3.84 TB, 7.68 TB and 15.36 TB capacities.
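The spare-capacity arithmetic in the last three paragraphs can be sketched as follows. It uses only the rule of thumb stated above (one full drive write per day for a year consumes about one percent of capacity); the function names are made up for illustration, and real drive vendors size their spare area using more detailed wear models.

```python
def spare_fraction(drive_writes_per_day: float, years: float,
                   wear_per_write_year: float = 0.01) -> float:
    """Spare capacity needed, as a fraction of the usable capacity.

    wear_per_write_year is the rule of thumb from the text: one full
    drive write per day for one year burns up about 1 percent.
    """
    return drive_writes_per_day * years * wear_per_write_year

def raw_capacity(usable: float, drive_writes_per_day: float,
                 years: float = 5) -> float:
    """Usable capacity plus the spare capacity hidden inside the drive."""
    return usable * (1 + spare_fraction(drive_writes_per_day, years))

# 400 GB drive rated for ten full drive writes per day over five years
print(round(raw_capacity(400, 10), 2), "GB")

# 7.68 TB drive rated for one full drive write per day over five years
print(round(raw_capacity(7.68, 1), 3), "TB")
```

The first case gives the 50 percent spare capacity of the full-endurance drives, and the second gives the 5 percent of the larger drives.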
IBM is also offering non-disruptive model conversions. Storwize V5010 can now be converted to V5020, and V5020 can be converted to V5030. The Storwize V7000 Model 524 (Gen2) can be converted to model 624 (Gen2+).
The DeepFlash 150 is the perfect JBOF addition to the ESS family. The current ESS models had either 2U-high 24-drive bays, or 4U-high 60-drive bays. This new model is 3U-high with 64 high-capacity (8 TB) Board Solid State Drives (BSSD).
The ESS includes all the features of IBM Spectrum Scale, including both 8+2 and 8+3 Erasure Coding data protection. This provides file and object access to data, including POSIX compliance for Windows, Linux and AIX operating systems, as well as HDFS-compliant access for big data analytics.
SAP HANA is an in-memory, relational database management system supported on Linux for x86 and POWER servers. The "HANA" acronym is short for "High-Performance Analytic Appliance" software. By keeping the data in memory, analytics and queries can be performed much faster than from traditional disk repositories.
Server memory, however, is volatile storage, so the data needs to be stored on persistent storage such as flash or disk drives. SAP has certified several configurations, some of which involve IBM Spectrum Scale solutions. I will use the following graphic to explain the three configurations.
Linux on x86-64 with Spectrum Scale FPO
With SAP HANA on Lenovo x86-64 servers, SAP has certified internal flash or disk drives running IBM Spectrum Scale in "File Placement Optimization" (FPO) mode. FPO provides a shared-nothing architecture that matches the SAP HANA architecture. IBM Spectrum Protect can back up this configuration, providing data protection and disaster recovery support.
Linux on POWER with Elastic Storage Server
With SAP HANA on POWER servers, SAP has certified external Elastic Storage Server (ESS). Not only is POWER a better platform than x86-64 for running SAP HANA, but Elastic Storage Server also offers erasure coding that provides excellent rebuild times and storage efficiency.
The ESS is a pre-built system that combines IBM Spectrum Scale software with server and storage hardware. IBM Spectrum Protect can also back up this configuration, providing data protection and disaster recovery support.
Block-level Storage over Storage Area Network (SAN)
Various IBM block-level devices are supported for SAP HANA on both Linux on x86-64 and Linux on POWER. Unfortunately, SAP has only certified (to date) the use of the XFS file system. The problem many clients mention about this configuration is the lack of end-to-end backup and disaster recovery. This is solved by the Spectrum Scale configurations in the previous two examples.
Other combinations, such as SAP HANA on POWER with Spectrum Scale FPO, or on x86-64 servers with Elastic Storage Server, are either not SAP-certified, or not directly supported by SAP without their approval.
IBM and SAP have worked closely together for many years, and I am glad to see SAP HANA and IBM Spectrum Scale based solutions continue this tradition.
Well, it's Tuesday again, and you know what that means? IBM Announcements!
Last week, IBM announced a variety of tape system enhancements.
IBM TS7760 Virtual Tape System
The IBM TS7760 combines the benefits of the previous TS7720 and TS7740 offerings. Those with IBM z System mainframes will recognize both. The TS7740 has a small amount of disk that pretends to be a tape library, with enough capacity to hold a few hours' to a few days' worth of data. After that, the data is moved to physical tape. The TS7720 is an all-disk solution, with up to 1 PB of disk to hold weeks' or months' worth of data, but it did not have tape attachment. Previously, IBM announced the TS7720T, a high-capacity offering with tape attachment. The new TS7760 is now the replacement for all three of these, powered by the latest POWER8 processor.
In addition to all the features available in the former models, the new TS7760 uses 4TB drives instead of 3TB drives, resulting in a maximum of 1.3PB of disk capacity before compression. The disks are encrypted and protected by distributed RAID-6, referred to as "Dynamic Disk Pooling". While tape attachment is still optional, it supports both IBM TS3500 and TS4500 tape libraries.
This week, I am attending the [InterConnect Conference] in Las Vegas, Feb 21-25, 2016. This is IBM's premier Cloud & Mobile conference for the year.
Monday afternoon, I attended various break-out sessions.
1441A Data Resiliency: Data-Driven Analytics and Beyond
Ramani Routray and B.J. Klingenberg, both from IBM, co-presented. Aggressive and differentiated Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) create data protection silos. Resiliency for an enterprise data center is often achieved via redundant components, periodic backup, continuous replication and/or highly available architectures. With the emergence of cloud delivery models, Backup-as-a-Service and DR-as-a-Service have gained wide acceptance. This uniquely challenges service providers to quickly analyze all the metadata from these environments to enable problem determination, fault isolation, capacity management, SLA violation detection, etc. The session covered a big data analytics framework that analyzes millions of resiliency metadata tuples in near real-time to generate actionable insights.
1267A Prudential and IBM: Integrating Application and Storage Management to Drive Cloud Service Levels
This was a 50/50 presentation, with the first half covered by client OJ Dua, supported by his boss, Scott Singerline, both from Prudential Financial.
Prudential explored their successful approach for optimizing storage and improving service. The experts from Prudential Financial described their experiences integrating IBM Spectrum Control v5.2 (formerly IBM Tivoli Storage Productivity Center) inventory, availability, and performance data with Tivoli Application Dependency Discovery Manager (TADDM) and Netcool OMNIbus to improve services for core business applications.
(Over 10 years ago, I was the chief architect for IBM TotalStorage Productivity Center v1. The clients from Prudential could not emphasize enough how much better Spectrum Control v5.2 was compared to their experiences with the prior versions. It has come a long way, baby!)
The second half was covered by Brian Sherman, IBM Distinguished Engineer. He described how related IBM Spectrum Storage solutions are transforming storage. IBM Spectrum Storage solutions deliver reliable, flexible service levels at a significantly lower cost than traditional storage.
6523A VersaStack: Because Time and Cost are of the Essence for Cloud Service Providers
This was more of a 25/75 presentation. Ian Shave, IBM Business Line Executive for Spectrum Virtualize and VersaStack, kicked off the session with a quick overview of VersaStack, which combines Cisco UCS x86 blade servers and Cisco network switches with IBM Spectrum Virtualize storage solutions. This is often referred to as "Integrated Infrastructure" or "Converged Systems". While Integrated Infrastructure adoption is growing 15 percent, storage within Integrated Infrastructure solutions is growing faster, at 44 percent.
VersaStack can be implemented as follows:
Cisco UCS Mini with Storwize V5000, either iSCSI or FCP
Cisco UCS with Storwize V7000 (block-only) or V7000 Unified (file and block access)
Cisco UCS with FlashSystem V9000, for high-speed, low-latency application requirements
John Buskermolen and Dan Simunic, both from i-Virtualize, covered their experiences with VersaStack. Founded in 2009, i-Virtualize is a Managed Services Provider (MSP), Cloud Service Provider (CSP) and value-added reseller, for clients in both USA and Canada, growing 41 percent year over year.
They reduced the time to market from weeks to days, cut new environment provisioning time from days to minutes, and simplified management when they implemented VersaStack, an integrated infrastructure solution that combines Cisco UCS Integrated Infrastructure with IBM storage solutions built with IBM Spectrum Virtualize to deliver extraordinary levels of performance and efficiency.
Why did i-Virtualize choose VersaStack?
79 percent reduced provisioning time
60 percent lower costs
10x performance acceleration
Higher flexibility, with clustered systems that scale up and out
Lets i-Virtualize administrators and management sleep at night
47 percent capacity savings with Real-time Compression
IBM Spectrum Virtualize HyperSwap for high availability
Storage-based replication across multiple datacenters
Cisco UCS director provides single-pane-of-glass management
Their latest project is called VIXO, a Cloud Managed Services Console which stacks Cloud Foundry, Docker, OpenStack, VMware and other 3rd party services on top of their VersaStack. This is a collaboration with Oxbury Group.
VersaStack is an ideal solution for Cloud Service Providers (CSP) or for any client interested in "cloud-in-a-box."
3690A Meet the Experts on IBM Cloud Storage Services
Ann Corrao and Mike Fork, both from IBM, presented IBM's various storage capabilities on SoftLayer and Cloud Managed Services (CMS). Of IBM's 43 Cloud datacenters, 28 are SoftLayer, and the other 15 are CMS.
For block-based volume storage, SoftLayer offers "Endurance" and "Performance". These are backed by multi-pathed iSCSI volumes.
With the "Endurance" option, you purchase a fixed I/O density, either 0.5 IOPS/GB, 1 IOPS/GB or 4 IOPS/GB. If you choose a 100 GB volume at the 4 IOPS/GB level, you are guaranteed 400 IOPS. Typical business applications like database or email consume about 0.7 IOPS/GB.
With the "Performance" option, you pick the IOPS for your volume, up to 6,000 IOPS, and then pick the size to match your needs, say 100 GB. This is best suited for clients who know their application well enough to specify this.
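The difference between the two options comes down to simple arithmetic on I/O density. Here is a small sketch using only the numbers quoted in the session; the function names are made up for illustration and do not correspond to any SoftLayer API.

```python
# "Endurance" I/O density levels quoted in the session, in IOPS per GB.
ENDURANCE_TIERS = (0.5, 1, 4)

def endurance_iops(size_gb: float, iops_per_gb: float) -> float:
    """IOPS you get when you buy a fixed I/O density ("Endurance")."""
    return size_gb * iops_per_gb

def smallest_tier_for(required_iops_per_gb: float):
    """Lowest Endurance level that meets the need, or None if none does."""
    for tier in sorted(ENDURANCE_TIERS):
        if tier >= required_iops_per_gb:
            return tier
    return None

def performance_density(iops: float, size_gb: float) -> float:
    """I/O density that results when you pick IOPS and size ("Performance")."""
    return iops / size_gb

print(endurance_iops(100, 4))          # 100 GB volume at 4 IOPS/GB
print(smallest_tier_for(0.7))          # typical database or email workload
print(performance_density(6000, 100))  # 6,000 IOPS on a 100 GB volume
```

In other words, a typical 0.7 IOPS/GB workload would need the 1 IOPS/GB level, while the "Performance" option lets you reach densities far above any Endurance level.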
IBM Bluemix also has a block service, based on OpenStack Cinder drivers. These are backed by internal disk on storage-rich servers. IBM SoftLayer can pack 4 drives into a 1U server, 12 drives into a 2U server and 36 drives into a 3U server.
For object store, IBM SoftLayer supports OpenStack Swift. They support content expiration, versioning and metadata search.
(When asked if this was Cleversafe or something else, Mike was quick to point out that IBM SoftLayer focuses on the "Service Level Agreement (SLA), the client experience, and the APIs" so however they chose to back this storage is internally determined. The client should not have to specify product xyz in their contract.)
An extra feature for object store is "Content Delivery Network" (CDN) which uses EdgeCast to cache content at the edges of the network to improve performance delivery. You designate which object containers you want to accelerate performance, and you pay for the amount of bandwidth consumed.
For file space, IBM SoftLayer supports NFS and SFTP only. Supporting CIFS, or rather its replacement SMB, is a known requirement. In the meantime, there are a variety of 3rd party "Cloud Gateway" solutions, like NetApp AltaVault, Panzura global namespace, or CTERA.
For file sync-and-share, IBM has partnered with Box to provide Enterprise-class service.
How do clients ingest data into their IBM SoftLayer account? One option is to use Aspera, a recent IBM acquisition that is 3x faster than traditional SCP. Another option is to ship disk or tape cartridges to an IBM SoftLayer facility.
Well, it's Tuesday again, and you know what that means? IBM Announcements!
This week, IBM announces the second generation of Storwize V5000 flash and disk storage systems. There are the V5000F All-flash configurations, as well as the V5000 that can support a variety of flash and spinning disk drives.
There are three models:
The V5010 has dual 2-core/2-thread processors and 16GB of cache. It supports thin provisioning, FlashCopy, Easy Tier, and remote mirroring. The base unit includes 1 GbE ports for iSCSI host connectivity, with options to add 16Gb Fibre Channel, 12Gb SAS, and 10GbE iSCSI/FCoE as well.
The 2U controllers and expansion enclosures can hold either 24 small 2.5-inch drives, or 12 larger 3.5-inch drives. A single control enclosure has two active/active IBM Spectrum Virtualize nodes, and can attach up to 10 expansion enclosures for a maximum of 264 drives.
The V5020 unit has dual 2-core/4-thread processors and up to 32GB of cache. It supports everything the V5010 does, plus encryption. The encryption is done via the Intel AES-NI instruction set to eliminate the need for special "self-encrypting drives" (SED) that other storage devices may require.
The V5030 has dual 6-core/4-thread processors and up to 64GB of cache. It supports everything the V5010 and V5020 do, plus Real-time Compression and external virtualization. The Real-time Compression can achieve up to 80 percent space savings, representing a 5:1 compression ratio.
Each control enclosure can attach to 20 expansion enclosures, which can support 504 internal drives per controller, and up to 1,008 with two controllers (four Spectrum Virtualize nodes) clustered together. This is in addition to the drives in external storage systems virtualized.
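The drive counts and the compression ratio quoted for these models follow from simple arithmetic, sketched below. It assumes every enclosure, including the control enclosure, is the 24-drive 2.5-inch variety; the function names are made up for illustration.

```python
DRIVES_PER_SFF_ENCLOSURE = 24  # 2.5-inch drives per 2U enclosure

def max_drives(expansion_enclosures: int, control_enclosures: int = 1) -> int:
    """Drive slots across control enclosures and their expansion enclosures."""
    enclosures_per_controller = 1 + expansion_enclosures
    return control_enclosures * enclosures_per_controller * DRIVES_PER_SFF_ENCLOSURE

def compression_ratio(savings_percent: float) -> float:
    """Convert percent space savings into an N:1 compression ratio."""
    return 100 / (100 - savings_percent)

print(max_drives(10))        # V5010: one controller, 10 expansions
print(max_drives(20))        # V5030: one controller, 20 expansions
print(max_drives(20, 2))     # V5030: two controllers clustered
print(compression_ratio(80))
```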
Well, it's Tuesday again, and you know what that means? IBM Announcements!
(FCC Disclosure: This official launch also includes October 6 announcements. In any case, the usual disclaimer applies: I currently work for IBM, and this blog post can be considered a "paid celebrity endorsement" of the IBM products mentioned below.)
IBM announced various updates to its Spectrum Storage product line. Here is a quick recap.
IBM Spectrum Virtualize 7.6
Spectrum Virtualize is the new name of the "storage hypervisor" code that resides in IBM SAN Volume Controller (SVC) and Storwize family products. When you buy an SVC, you will license Spectrum Virtualize software on it. It is NOT available separately as software-only that you can install on any other hardware. There are three major improvements:
Software-based Data-at-Rest Encryption
Earlier this year, IBM delivered data-at-rest encryption for the Storwize V7000 and V7000 Unified. This week, IBM extends this support to other storage hypervisors.
Since this feature is based on the Intel processor that supports the Advanced Encryption Standard New Instructions (AES-NI), it applies only to the newer hardware: SAN Volume Controller 2145-DH8, the Storwize V7000 Gen2, FlashSystem V9000, and VersaStack converged systems that contain these. You can run Spectrum Virtualize v7.6 on older hardware models, but the encryption feature will be disabled.
Basically, by taking advantage of AES-NI commands, IBM can now offer data-at-rest encryption on any virtualized flash or disk arrays, eliminating the need for special "Self-Encrypting Drives", or SED.
The encryption keys are kept on USB memory sticks, that you can either leave in the machine, or stash away in some vault or safe somewhere.
The other improvement is distributed RAID. Distributed RAID has been hugely popular on IBM XIV products, and has since found its way into the DCS3700, DCS3860 and Elastic Storage Server models.
With this new enhancement, storage admins can select "Distributed RAID-5" or "Distributed RAID-6" as alternate choices to traditional RAID ranks.
Why use it? All the drives are now active, eliminating idle spare drives that do nothing collecting dust and cobwebs waiting for an opportunity to spin up, and when they finally are used for a rebuild become a terrible bottleneck. Since all drives are reading and writing, the rebuild rate is an order of magnitude (5 to 10x) faster!
For those clients nervous about large 8TB drives and the number of days it would take to perform a traditional RAID rebuild, this should calm all of your fears.
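To put rough numbers on this, here is a back-of-the-envelope sketch. The 50 MB/sec sustained rebuild rate is purely an assumption chosen for illustration (real rates depend on the drive, the array and the production workload); only the 8TB drive size and the 5 to 10x speedup come from the text above.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_sec: float) -> float:
    """Hours to rewrite one failed drive's worth of data at a sustained rate."""
    drive_mb = drive_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return drive_mb / rebuild_mb_per_sec / 3600

# Hypothetical rate at which a single spare drive absorbs rebuild writes
# while the array is also serving production I/O.
ASSUMED_SPARE_DRIVE_RATE = 50  # MB/sec

traditional = rebuild_hours(8, ASSUMED_SPARE_DRIVE_RATE)
print(f"traditional: {traditional:.1f} hours")
for speedup in (5, 10):
    print(f"distributed at {speedup}x: {traditional / speedup:.1f} hours")
```

Whatever rate you plug in, the point is the same: a rebuild that takes the better part of two days on a single spare drive drops to a matter of hours when all the drives share the work.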
This is one of those line-items that we have told clients was "just around the corner" and "coming soon, watch this space", and finally it is available. For clients using Stretched Cluster or HyperSwap across two buildings, best practice suggests keeping the quorum disk in a third building. This often meant having to dedicate a single 2U disk system in a closet somewhere, with expensive Fibre Channel cables connecting to the other two buildings.
To address this, IBM now allows the quorum disk to be based on Internet Protocol (the IP portion of TCP/IP), hosted on any bare-metal or virtual machine that is LAN or WAN attached. The "quorum disk" is just a little Java program. This can run on any cloud service provider as well, such as IBM SoftLayer, provided both buildings have connectivity to it.
A minor improvement worth mentioning is that the IBM "Comprestimator" tool that estimates the capacity savings of Real-time Compression is now integrated into Spectrum Virtualize v7.6 command line interface (CLI), allowing you to run the tool on demand, as needed, on any virtual volume.
IBM Spectrum Scale v4.2
IBM plans to offer all of its solutions in any of three flavors: software-only that you can deploy on your own server hardware, pre-built system appliances, and cloud services on IBM SoftLayer, IBM Cloud Managed Services or third-party cloud providers. Spectrum Scale is the software-only flavor, and Elastic Storage Server and Storwize V7000 Unified are pre-built systems based on that software.
File and Object access
IBM published a "Redbook" on how to implement OpenStack Swift and Amazon S3 interfaces to an existing Spectrum Scale deployment. IBM supported it, but it was basically a Do-it-Yourself (DIY) implementation. This has now been resolved, with full integration of OpenStack Swift and Amazon S3 object-protocol interfaces.
(For those unfamiliar with "Object storage", think of it like valet parking for your data. Before working for IBM, I was previously employed as a valet attendant, so I feel qualified to make this analogy.
If you park your car in a 10-story high parking structure, you have to remember where you parked to go find the car again. With valet parking, you hand over the keys to the valet attendant, the car gets parked, and you get a claim stub that you then use to get your car back. In the meantime, you don't know where your car is parked, and you don't care either!
Storing files in volume-level or file-level storage is like that 10-story high parking structure. You have to remember where you put it, which LUN or which sub-directory. With object storage, the system provides a "claim stub" in the form of a Uniform Resource Identifier, or URI, and simple HTTP commands like GET and POST can be used to download and upload the content.)
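For the programmers in the audience, the valet parking analogy can be sketched in a few lines. This is a toy, in-memory stand-in, not the OpenStack Swift or Amazon S3 API: the `TinyObjectStore` class and the `objects.example.com` address are made up for illustration.

```python
import uuid

class TinyObjectStore:
    """In-memory stand-in for an object store (illustrative only)."""

    def __init__(self, base_uri: str = "http://objects.example.com/v1"):
        self._base_uri = base_uri
        self._objects = {}  # URI -> content; the caller never sees placement

    def put(self, content: bytes) -> str:
        """Hand over the content and get back a URI, the "claim stub"."""
        uri = f"{self._base_uri}/{uuid.uuid4()}"
        self._objects[uri] = content
        return uri

    def get(self, uri: str) -> bytes:
        """Present the claim stub to get the content back."""
        return self._objects[uri]

store = TinyObjectStore()
stub = store.put(b"quarterly report")
print(stub)
print(store.get(stub))
```

Notice that the caller never chooses a LUN or a sub-directory. The store decides where the content lives, and the URI is all you need to keep.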
Policy-driven Compression and Quality of Service (QoS)
If you want to differentiate the levels of service provided for files and objects stored in your infrastructure, look no further. A simple SQL-like language is used to set up policies that are invoked when needed.
Hadoop Connector for File and Objects
The IBM Hadoop Connector allows Hadoop and Spark analytics applications to treat Spectrum Scale as a 100 percent compatible alternative to the Hadoop Distributed File System (HDFS). Previously, this was only available for files, but now it has been extended to include objects as well.
Advanced Graphical User Interface (GUI)
Based on the award-winning GUI that has been used for IBM XIV, SVC, Storwize and various other members of the IBM System Storage family, IBM announces an HTML5-based web-browser GUI for configuring and managing Spectrum Scale and Elastic Storage Server (ESS).
Storwize V7000 Unified
The "file modules" that run IBM Spectrum Scale will get updated to R1.6 level, which supports SMB 3.0 and NFS 4.0 protocols. SMB support will now include both internal and externally-virtualized storage. You will also be able to use Active File Management to migrate to other Spectrum Scale implementations.
IBM Spectrum Control
As the former chief architect of IBM Tivoli Storage Productivity Center v1, I have been a big fan of the advancements and evolution of Spectrum Control. IBM offers three levels. The first level is "Basic Edition", entitled at no additional charge for IBM storage hardware clients. The second level is "Standard Edition" which offers configuration, provisioning and performance monitoring. The third level is "Advanced Edition", which includes advanced storage analytics, file-level reporting, storage tiering and data placement optimization.
You can imagine my skepticism when I was told that Spectrum Control was going to be enhanced to support Spectrum Scale. What could it offer? IBM Spectrum Scale already has built-in storage tiering and data placement optimization!
It turns out that effective "management tools" were the #1 requirement clients stated they needed to implement and deploy Spectrum Scale. Since 1998, back when it was called General Parallel File System, or GPFS, the target market was High Performance Computing (HPC) users familiar with Command Line Interfaces (CLI).
But IBM wants to broaden the reach of IBM Spectrum Scale, to financial services, health care and life sciences, government and education, and a variety of other industries. They won't tolerate being limited to CLI interfaces.
For clients with multiple Spectrum Scale clusters, Spectrum Control can offer the following:
Visibility across the capacity utilization (file systems, pools, file sets, quotas) and cluster health across all Spectrum Scale clusters in the data center
Ability to specify alerts which are applied across all Spectrum Scale clusters, for things like relative or absolute free space in a file system, or inodes used, nodes going down, etc.
Understand the cross-cluster relationships established by remote cluster mounts, and seamlessly navigate between them
If external SAN storage is used, Spectrum Control shows the correlation between Spectrum Scale Network Shared Disks (NSD) and their corresponding SAN volumes, again with the ability to navigate between them; also it can provide performance monitoring for the volumes backing the NSD
Ability to monitor file capacity usage in the context of applications, by adding Spectrum Scale "file set containers" to application groups defined in Spectrum Control
Compare file system activity across Spectrum Scale clusters, with the ability to drill into file system and node performance charts
Support for object storage on Spectrum Scale, determine which object-enabled clusters are closest to running out of free space
While the basic built-in GUI is great for smaller deployments, if you have a dozen or more Spectrum Scale clusters, or have Spectrum Scale clusters intermixed with traditional block-level and NAS storage devices, then Spectrum Control is for you!
It used to take weeks to deploy the original versions of Tivoli Storage Productivity Center, but Spectrum Control is now offered in the cloud, and you can deploy it in as little as 30 minutes.
Want to check it out? You can explore Spectrum Control Storage Insights cloud service as a [Live Demo], or [Start your free trial]! The reporting capabilities of Spectrum Scale are identical between the on-premises version of Spectrum Control and this cloud service offering.
Here's a great quote from a leading IT industry analyst:
"In multi-petabyte, multivendor installations, overall storage costs of ownership for use of IBM Spectrum Storage solutions averaged 73 percent less than EMC, and 61 percent less than Hitachi equivalents" -- Brian Jeffery, Managing Director, International Technology Group, Naples, FL
As IBM continues its transition from a hardware-oriented company founded over a century ago, manufacturing meat scales and cheese slicers, to one more focused on higher value-add software and services, the Spectrum Storage software family will play a critical role in this transformation!
It's Tuesday, and you know what that means? IBM Announcements! This week I am in beautiful Orlando, Florida for the [IBM Systems Technical University] conference.
This week, IBM announced its latest tape offerings for the seventh generation of Linear Tape Open (LTO-7), providing huge gains in performance and capacity.
For capacity, the new LTO-7 cartridges can hold up to 6TB native capacity, or 15TB effective capacity with the 2.5x compression that is typical for most data. That is 2.4x larger than the 2.5TB cartridges available with LTO-6. Performance is also nearly doubled, with a native throughput of 315 MB/sec, or an effective 780 MB/sec with 2.5x compression. The LTO consortium, of which IBM is a founding member, has published the roadmap for LTO generations to LTO-8, LTO-9 and LTO-10.
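The capacity figures above are easy to verify. A quick sketch, with function names made up for illustration:

```python
LTO7_NATIVE_TB = 6.0
LTO6_NATIVE_TB = 2.5
TYPICAL_COMPRESSION = 2.5  # the 2.5x ratio quoted for LTO-7

def effective_capacity_tb(native_tb: float,
                          compression: float = TYPICAL_COMPRESSION) -> float:
    """Capacity after compression, given the native cartridge capacity."""
    return native_tb * compression

def generation_gain(new_native_tb: float, old_native_tb: float) -> float:
    """How many times larger the new cartridge is than the old one."""
    return new_native_tb / old_native_tb

print(effective_capacity_tb(LTO7_NATIVE_TB), "TB effective")
print(generation_gain(LTO7_NATIVE_TB, LTO6_NATIVE_TB), "x larger than LTO-6")
```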
IBM will offer both half-height and full-height LTO-7 tape drives. All the features you love from LTO-6 like WORM, partitioning and Encryption carry forward. These drives will be supported on a variety of distributed operating systems, including Linux on z System mainframes, and the IBM i platform on POWER Systems.
The Linear Tape File System (LTFS) can be used to treat LTO-7 cartridges in much the same way as Compact Discs or USB memory sticks, allowing one person to create content on an LTO-7 tape cartridge, and pass that cartridge to the next employee, or to another company. LTFS is also the basis for IBM Spectrum Archive that allows tape data to be part of a global namespace with IBM Spectrum Scale.
LTO-7 will be supported on the TS2900 auto-loader, as well as all of IBM's tape libraries: TS3100, TS3200, TS3310, TS3500 and TS4500. You can connect up to 15 TS3500 tape libraries together with shuttle connectors, for a maximum of 2,700 drives serving 300,000 cartridges, and a maximum capacity of 1.8 Exabytes of data in a single system environment.
In addition to LTO-7 support, the IBM TS4500 tape library was also enhanced. You can now grow it up to 18 frames, and have up to 128 drives serving 23,170 cartridges, for a maximum capacity of 139 PB of data. You can now also intermix LTO and 3592 frames in the same TS4500 tape library.
For compatibility, LTO-7 drives can read existing LTO-5 and LTO-6 tape cartridges, and can write to LTO-6 media, to help clients with the transition.
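The library maximums quoted above line up with the 6TB native capacity of an LTO-7 cartridge. Here is a sketch of the arithmetic, assuming native (uncompressed) capacity and decimal units (1 PB = 1,000 TB, 1 EB = 1,000 PB); the function name is made up for illustration.

```python
LTO7_NATIVE_TB = 6  # native capacity per LTO-7 cartridge

def library_capacity_tb(cartridges: int,
                        cartridge_tb: float = LTO7_NATIVE_TB) -> float:
    """Total native capacity of a library full of identical cartridges."""
    return cartridges * cartridge_tb

ts3500_complex_tb = library_capacity_tb(300_000)  # 15 connected TS3500 libraries
ts4500_tb = library_capacity_tb(23_170)           # one 18-frame TS4500

print(ts3500_complex_tb / 1_000_000, "EB across 15 TS3500 libraries")
print(ts4500_tb / 1_000, "PB in one TS4500")
print(2_700 // 15, "drives and", 300_000 // 15, "cartridges per TS3500")
```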
Officially, the VVol concept was still just a "technology preview" in 2012, to be fleshed out over the next few years through extensive collaboration between VMware and all the major players: IBM, HP, Dell, NetApp and EMC.
In 2013 and 2014, IBM attended VMworld with live demonstrations of VVol support. VMware vSphere v6 was not yet available, but when it was, we assured them, IBM would be one of the first vendors with support!
To understand why VVol is such a game-changer, you have to understand a major problem with VMware version 4 and version 5, namely their Virtual Machine File System, or [VMFS].
Here is a picture to help illustrate:
On the left, we see that a VMFS datastore is a set of LUNs from the storage admin perspective, and a set of VMDK and related files from the vCenter admin perspective.
If there was a storage-related problem, such as bandwidth performance or latency, how would the two admins communicate to perform troubleshooting? For many disk systems, it is not obvious which VMDK file sits on which LUN.
There are also a variety of hardware capabilities that work at the LUN level, such as snapshots, clones or remote distance mirroring, and this would apply to all the VMDK files in the data store across the set of LUNs, which may not be what you want.
There are two ways to address this in vSphere v4 and v5:
The first method is to have fewer VMDK files per datastore. By defining smaller datastores with just a few VMs associated with each, you can then have a closer mapping of VMDK files to datastore LUNs. Unfortunately, VMware ESXi has a limit of 256 on the number of different datastores that can be attached, so this method has its own limitations.
The other method around this is "Raw Device Mapping" (RDM), which allows Virtual Machines to be attached to specific LUNs. Some of the earlier restrictions and limitations for RDMs have been relaxed over the releases, but your disk system still needs to expose the SCSI identifiers of each LUN to make this work, and additional setup is required if you plan to cluster two or more systems together, such as for a Microsoft Cluster Server (MSCS).
On the right side of the picture, using VMware v6, vCenter admins can now allocate VVols, which are mapped to specific "VVol Storage Containers" on specific storage systems. The storage admin knows exactly which VVol is in which container, so they can now communicate and collaborate on troubleshooting!
The vSphere ESXi host communicates to storage arrays via a new "virtual LUN id" called a "Protocol Endpoint". This is to allow FCP, iSCSI and FCoE traffic to flow correctly through SAN or LAN switches. For NFS, the Protocol Endpoint represents a "virtual mount point", so that traffic can be routed through LAN switches correctly.
Storage Policies can help determine which attributes or characteristics you want for your VVol. For example, you may want your VVol to be on a storage container that supports snapshots at the hardware level. The vCenter server learns which storage arrays, and which storage containers in those arrays, offer those capabilities through the VMware API for Storage Awareness, or VASA.
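Conceptually, policy-based placement is a filter: keep only the storage containers whose advertised capabilities cover everything the policy asks for. Here is a minimal sketch of that idea. To be clear, this is not the VASA API or anything vCenter exposes. The `StorageContainer` class, the capability names and the inventory are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageContainer:
    """Hypothetical record of what one storage container can do."""
    array: str
    name: str
    capabilities: frozenset


def containers_for_policy(containers, required_capabilities):
    """Return the containers that offer every capability the policy needs."""
    required = frozenset(required_capabilities)
    return [c for c in containers if required <= c.capabilities]


# Made-up inventory, standing in for what a VASA provider might report.
inventory = [
    StorageContainer("array-A", "gold", frozenset({"snapshot", "mirroring", "encryption"})),
    StorageContainer("array-A", "bronze", frozenset({"thin-provisioning"})),
    StorageContainer("array-B", "silver", frozenset({"snapshot", "thin-provisioning"})),
]

if __name__ == "__main__":
    snapshot_capable = containers_for_policy(inventory, {"snapshot"})
    print([(c.array, c.name) for c in snapshot_capable])
    # [('array-A', 'gold'), ('array-B', 'silver')]
```

Asking for both "snapshot" and "encryption" would narrow the result to the single "gold" container, which is the kind of answer a storage policy is meant to give the vCenter admin without a phone call to the storage admin.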
Different storage manufacturers can implement their VASA provider in different ways. IBM has opted to have a single VASA provider for all of its supported devices, so as to provide a consistent client experience. When you purchase any VVol-supported storage system from IBM, you are entitled to download the IBM VASA provider at no additional charge!
Initially, the IBM VASA provider will focus on IBM XIV Storage System, an ideal platform for your VVol needs. The XIV is a grid-based storage system, utilizing unique algorithms that give optimal data placement for every LUN or VVol created, and virtually guarantee there will be no hot spots. The XIV provides an impressive selection of Enterprise-class features, including snapshot, mirroring, thin provisioning, real-time compression, data-at-rest encryption, performance monitoring, multi-tenancy and data migration capabilities.
Let me give some real world examples from Paul Braren, an IBM XIV and FlashSystem Storage Technical Advisor from Connecticut, who has been working directly with clients over the past five years:
"Many of my customers have clearly said they really want the ability to have a granular snapshot that grabs a moment in time of just one VM, rather than all the VMs that happen to be on the same LUN. They also want to delete VMs, and have the storage array automatically present that newly available space. Even better, with VVol, these SAN related tasks appear to be executed nearly instantly, leaving behind those legacy shared VMFS datastore limitations and overhead.
The same benefits of VVol are evident when cloning or deploying VMs. Imagine being able to create a Windows Server VM with a 400GB thick-provisioned drive in under 20 seconds. Well, you don't have to imagine it! I recorded video of this actually happening over at IBM's European Storage Competence Center, featured in this 8-minute video: [IBM XIV Storage System and VMware vSphere Virtual Volumes (VVol). An ideal combination!]"
-- Paul Braren
In addition to XIV, all of IBM's Spectrum Virtualize products also support VVols, including SAN Volume Controller, Storwize including the Storwize in VersaStack, and FlashSystem V9000.
I am not in San Francisco this week for VMworld, but lots of my IBM colleagues are, so please, stop by the IBM booth and tell them I sent you!
Every year, March 31 marks "World Backup Day". Sadly, many people forget the importance of backing up their critical information. This is not just true for businesses, non-profit organizations and government agencies, but also for all of your personal information that you keep on computer devices.
My friends over at Cloudwards have developed an awesome infographic related to World Backup Day. Here it is.
(FTC Disclosure: I work for IBM, which has no business relationship with Cloudwards. Cloudwards does not itself provide backup services, but rather reviews services provided by others. This post should not be considered an endorsement of Cloudwards or their reviews.)
Courtesy of: Cloudwards.net
I hope you find this information helpful and informative!
Well it's Tuesday again, and you know what that means? IBM Announcements!
Today, IBM announced an exciting new addition to the IBM System Storage™ product line, [IBM Spectrum Storage™], a family of software defined storage offerings.
To understand its significance, I need to explain a few things first. Software defined storage is part of a larger concept of software defined environment.
How is software defined environment different than what you have now? In every data center, you need to map business requirements of an application workload with an appropriate set of IT infrastructure, including server, network and storage resources.
The traditional approach involves an application owner or database administrator reviewing the business requirements documented for the application, calling the server, network and storage administrators, who match those requirements to appropriate IT hardware and notify the folks in facilities to rack and stack the gear accordingly.
In a software defined environment, Application Programming Interfaces (API), Service Level Agreements (SLA) and Orchestration workflows can automate the request for the appropriate resources. This is referred to as the "Control Plane".
Responding to these requests, the software can provision the appropriate server, network and storage resources required. Server, network and storage virtualization, standard interfaces and deployment technologies exist to make this practical. This is referred to as the "Data Plane".
Any time a new way of doing things is introduced into the world, there could be some resistance. Let's tackle the three most frequently stated objections:
"IT infrastructure resources are rare and expensive! Administrators need to control or approve how resources are doled out!" An objection to self-service automation is the fear that employees would take too much.
If you have a bank account, Automated Teller Machines (ATM) can restrict the amount of cash you can take out, based on what is appropriate per request, or per day, with an upper limit of what you have in your personal checking or savings account. You enter your debit card and PIN into the "Control Plane" keypad and out comes a stack of 20-dollar bills from the "Data Plane" slot. In a software defined environment, you can limit requests through quotas and resource pools.
"Some application workloads are more important than others!" Another objection is that every workload will be treated in the same standard way, so that mission critical workloads and dev/test would be treated alike.
At the gas station, you can select different levels of octane gasoline. You enter your credit card and zip code into the "Control Plane" keypad and the selected octane comes out of the "Data Plane" hose. In a software defined environment, resources can be provisioned with different Quality of Service (QoS) levels.
"Different applications require different combinations of resources!" Another objection is the fear that fixed combinations of server, storage and network resources will be stifling to innovation and productivity.
At the vending machine, you can choose which candy bar and which chips to have with whatever soft drink you choose for lunch. You enter your bills and coins into the "Control Plane" slot, select the row letter and column number for your snack of choice, and then fetch your purchases from the "Data Plane" flap. In a software defined environment, a Service Catalog can offer a virtual menu of different server, network and storage resources to be combined together as needed.
These concerns are addressed well enough in software defined environments, in general, and with IBM Spectrum Storage family of products, in particular.
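To make the three answers concrete, here is a minimal sketch of a self-service request passing through a quota check, a QoS choice and a service catalog. It is purely illustrative: the `ResourcePool` class, the `SERVICE_CATALOG` entries and the `QuotaExceeded` exception are hypothetical names, not part of IBM Spectrum Storage or any other product.

```python
from dataclasses import dataclass

# Hypothetical service catalog: each class bundles a QoS level and media type.
SERVICE_CATALOG = {
    "mission-critical": {"qos": "high", "media": "flash"},
    "dev-test": {"qos": "standard", "media": "disk"},
}


class QuotaExceeded(Exception):
    """Raised when a request would take a pool past its quota."""


@dataclass
class ResourcePool:
    quota_gb: int
    used_gb: int = 0

    def provision(self, size_gb, service_class):
        """Control plane: validate the request, then hand out capacity."""
        if service_class not in SERVICE_CATALOG:
            raise KeyError(f"unknown service class: {service_class}")
        if self.used_gb + size_gb > self.quota_gb:
            raise QuotaExceeded(
                f"requested {size_gb} GB, only {self.quota_gb - self.used_gb} GB left"
            )
        self.used_gb += size_gb
        # Data plane: a real system would allocate the volume here.
        return {"size_gb": size_gb, **SERVICE_CATALOG[service_class]}


if __name__ == "__main__":
    pool = ResourcePool(quota_gb=100)
    print(pool.provision(60, "dev-test"))
    # {'size_gb': 60, 'qos': 'standard', 'media': 'disk'}
```

A second request for 50 GB against the same pool would raise `QuotaExceeded`, much like the ATM declining a withdrawal that exceeds your balance.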
(Nostalgia: I remember the days before self-service automation. At the bank, I had to stand in line until I could talk to a human bank teller to get cash from my savings account. At the gas station, human gas attendants would come out and pump the gas for me, check my oil and wash my windshield. And at a restaurant, I felt like I waited an eternity from the time I ordered my meal to the time the human short-order cook had it ready and human wait staff delivered it to my table. These all seem silly today, don't they?)
Jamie Thomas, IBM General Manager of Storage and Software Defined Environments
Jamie announced [IBM Elastic Storage], a new offering that is available as a software defined storage solution, based on IBM's General Parallel File System (GPFS) technology already deployed at 45,000 installations.
IBM Elastic Storage provides a global name view across data center locations. It can manage up to a Yottabyte of information, combining Flash, disk and tape resources. It supports OpenStack interfaces, Hadoop and standard POSIX file system conventions.
IBM Elastic Storage provides automated tiering to move data between different storage media types. Infrequently accessed files can be migrated to tape and automatically recalled back to disk when required. Unlike traditional storage, it allows you to smoothly grow or shrink your storage infrastructure without application disruption or outages.
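The tiering idea can be pictured as a rule that looks at how long a file has sat idle and decides where it belongs. The sketch below is a toy version under stated assumptions: the 90-day threshold, the two tier names and the function names are invented here, and this is not the policy language the product actually uses.

```python
# Hypothetical threshold; a real policy would be set by the storage admin.
TAPE_AFTER_IDLE_DAYS = 90


def target_tier(idle_days):
    """Pick a tier from the number of days since a file was last accessed."""
    return "tape" if idle_days > TAPE_AFTER_IDLE_DAYS else "disk"


def plan_migrations(files):
    """Given {name: (current_tier, idle_days)}, list the moves the rule implies."""
    moves = []
    for name, (current, idle_days) in sorted(files.items()):
        wanted = target_tier(idle_days)
        if wanted != current:
            moves.append((name, current, wanted))
    return moves


catalog = {
    "2009-archive.tar": ("disk", 400),  # cold: migrate to tape
    "q3-report.pdf": ("tape", 0),       # just accessed: recall to disk
    "build.log": ("disk", 30),          # already where it belongs
}

if __name__ == "__main__":
    print(plan_migrations(catalog))
    # [('2009-archive.tar', 'disk', 'tape'), ('q3-report.pdf', 'tape', 'disk')]
```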
IBM Elastic Storage software can run on a cluster of x86 and/or POWER-based servers, and can be used with internal disk, commodity storage, or advanced storage systems from IBM or other vendors.
IBM partnered with various clients in different industries in a special beta program. Jamie led a client panel to discuss their experiences with IBM Elastic Storage:
Alan Malek, Director of IT, Cypress Semiconductor.
"Total cycle time is key". Over the past 31 years, they bought whatever file storage was available. Now, with IBM Elastic Storage, the performance was very consistent for their engineering workloads with full load balancing.
Russell Schneider, Principal Storage Consultant, Jeskell.
Russell's company works with a lot of federal agencies, "Big Data has become Bigger Data". For example, research on Global Warming and Climate Change requires a large amount of storage across agencies.
In another example, when the tsunami hit Japan a few years ago, an agency here in the USA realized they had 14PB of data stored as a single copy in a data center at sea level less than a mile from the coast. They realized they needed to have a secondary copy, and an option to cache to a third location depending on regional disasters.
Matthew Richards, Products, OwnCloud.
For those not familiar with OwnCloud, it provides a Dropbox-like file sharing service, but in the Enterprise, with on-premise storage. It has been fully tested and certified with IBM Elastic Storage to provide a secure file sharing platform.
With IBM Elastic Storage, they were able to scale linearly up to 20,000 users, and are now testing 100,000 users. The need to have intelligent access to files at scale is what Matthew likes about IBM Elastic Storage.
Dr. Michael Factor, IBM Distinguished Engineer at IBM Research
Michael started out explaining there are three areas for storage: block, file and object. The fastest growing type of data is unstructured fixed content with associated metadata. This is ideal for object storage. Michael has been working with OpenStack Swift, an open source interface defined for object storage. He defined "storlets" as follows:
Storlets extend an object store by moving computation to the data -- filtering, transforming, analyzing -- instead of bringing data to the computation.
Storlets have been deployed on a variety of European Union research projects. For example, in partnership with Philips, a pathology storlet can count the number of cancer cells in an image. By bringing the computation to the data, it eliminates having to transfer large amounts of data over the network.
Storlets can run on-premise and on IBM's SoftLayer IaaS cloud offering.
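A toy model shows why moving the computation to the data saves network traffic. This sketch is not the OpenStack Swift or storlet API. `ToyObjectStore` is an in-memory stand-in invented for illustration, the "analysis" is just counting bytes with a given value, and `bytes_sent` is a rough tally of what would cross the network.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store; not the OpenStack Swift API."""

    def __init__(self):
        self._objects = {}
        self.bytes_sent = 0  # bytes that crossed the "network" to the client

    def put(self, name, data):
        self._objects[name] = bytes(data)

    def get(self, name):
        """Traditional path: ship the whole object to the client."""
        data = self._objects[name]
        self.bytes_sent += len(data)
        return data

    def run_storlet(self, name, storlet):
        """Storlet path: run the function next to the data, ship only the result."""
        result = storlet(self._objects[name])
        self.bytes_sent += len(repr(result))  # rough size of the answer
        return result


def count_marked_pixels(image_bytes, marker=255):
    """Stand-in for an analysis storlet: count bytes equal to `marker`."""
    return image_bytes.count(marker)


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("slide-001", bytes([0, 255, 17, 255, 255] * 200_000))  # 1 MB "image"
    print(store.run_storlet("slide-001", count_marked_pixels))  # 600000
    print(store.bytes_sent)                                     # 6
```

Fetching the same object with `get` and counting on the client gives the same answer, but moves a million bytes instead of six.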
Bruce Hillsberg, IBM Director of Storage Systems at IBM Research
Bruce led another panel discussion, this time of IBM storage experts:
Vincent Hsu, IBM Fellow and CTO of Storage.
The problem is the isolation of data into "storage silos". Isolation causes problems in managing large amounts of data at scale, and costs more as storage is not fully utilized. IBM Elastic Storage brings everything together, eliminating storage silos.
Michael explained how IBM works with clients all over the world to ensure that storage solutions meet client requirements. For example, storlets can use rich metadata to manage photographs, and display them based on GPS satellite location, or other content that makes it easier to manage these images.
IBM Elastic Storage will support OpenStack Cinder and Swift interfaces. IBM is a platinum sponsor of the OpenStack Foundation, and is now its second most prolific contributor, with hundreds of full-time employees working on this.
Tom Clark, IBM Distinguished Engineer, Chief Architect, Storage Software, Cloud & Smarter Infrastructure.
Storage Management is a critical piece of Software Defined Storage. This is done in three ways:
The use of analytics to optimize the deployment of storage, based on workload requirements. Storage admins set policies, and then IBM Elastic Storage analytics gather metrics and then optimize data placement and movement based on these policies. IBM Elastic Storage has 70 percent lower TCO than competitive offerings.
The focus on backup services. Backups are not just for data protection, but rather can be used to duplicate or replicate data for testing, for training, and for other purposes. IBM Elastic Storage is fully supported by IBM Tivoli Storage Manager.
Being able to support Hybrid Cloud environments, where some data can be on-premise, and other data off-premise. Storage Management challenges will need to deal with this possibility. IBM Elastic Storage is well positioned for this.
Carl Kraenzel, IBM Distinguished Engineer, Director of Watson Cloud Technology and Support.
Watson is ground-breaking technology, and IBM Elastic Storage technology was at the heart of the Watson that was first introduced in 2011.
Considering IBM Elastic Storage only for its lower cost and higher scalability misses the full picture. Rather, this is an important platform for Cognitive Computing, where we have only seen the tip of the iceberg. IT systems need to be aware of the context of what we are doing.
While the Grand Challenge demonstration on Jeopardy! was exciting, it is time we stop playing games and apply IBM Elastic Storage to business, to help with health care and medical research, and other problems in society. IBM has already deployed this at MD Anderson Cancer Center and Memorial Sloan Kettering Cancer Center, for example.
Tom Rosamilia provided closing remarks. IBM Elastic Storage is not just for new workloads in Cloud, Analytics, Mobile and Social (CAMS) but also traditional workloads as well. IBM Elastic Storage provides "data democracy" and allows for "better rested storage administrators" that make fewer mistakes.
Tom opened the floor for questions from the audience:
Q1. Data integrity, not just security but also quality? IBM Elastic Storage has end-to-end data integrity checking built-in.
Q2. How does IT transition from full control to auto-pilot? IBM allows you to tap into existing storage. This is not rip-and-replace. With storage virtualization, IBM hides the complexity that normally requires full control over specific assets.
Q3. Storage admins would rather have a root canal without Novocaine than move their data. What is IBM doing to offer automation to help storage admins move to this new infrastructure? IBM storage virtualization breaks that hard link between applications and specific storage devices. IBM Elastic Storage eliminates application downtime previously associated with data movement.
Tom Rosamilia assured the audience that IBM is fully committed to its storage portfolio. IBM Elastic Storage is not just about the profoundness of what IBM announced today, but also where IBM is investing in the future of storage.
Well it's Tuesday again, and you know what that means? IBM announcements! Many of the announcements were made by IBM Executives at the [IBM Pulse 2014 conference].
IBM BlueMix is the newest cloud offering from IBM, a Platform-as-a-Service (PaaS) based on the Cloud Foundry open source project that promises to deliver enterprise-level features and services that are easy to integrate into cloud applications.
This week, my fifth-line manager Tom Rosamilia, IBM Senior Vice President IBM Systems & Technology Group and Integrated Supply Chain, made two announcements at Pulse. First, in addition to x86-based servers, SoftLayer will also offer POWER-based servers to run AIX, IBM i and [Linux on POWER] applications.
Second, SoftLayer will support PureApplication Patterns of Expertise. What is a pattern of expertise? It can range from something as simple as a virtual machine encapsulated in [Open Virtual Format] to more dynamic architectures, packaged with required platform services, that are deployed and managed by the system according to a set of policies.
Patterns simplify and automate tasks across the lifecycle of the application. Customers and partners alike are [seeing significant reductions in cost and time] across the application lifecycle with the deployment of a PureApplication System.
Also, this week at Pulse, Robert LeBlanc, IBM Senior Vice President of Software and Cloud Solutions, announced [IBM plans to Acquire Cloudant] which offers an open, cloud Database-as-a-Service (DBaaS) that helps organizations simplify mobile, web app and big data development efforts.
When I introduced [SmartCloud Virtual Storage Center] back in October 2012, I mentioned that it was a great solution for large enterprises that have all of their disk behind SAN Volume Controller (SVC).
To reach smaller accounts, IBM has announced two new offerings:
IBM SmartCloud Virtual Storage Entry for customers that have less than 250TB of disk behind two or four SVC nodes. It is priced per terabyte, by the amount of capacity that is virtualized.
IBM SmartCloud Virtual Storage for Storwize Family for customers that have other Storwize family products (Storwize V7000 or V5000, for example). It is priced per the number of storage enclosures that are managed by the Storwize family hardware.
From the photo, the marketing people staggered the various components to give it a stylized [Dagwood Sandwich] effect. I can assure you that these are just standard 19-inch rack components that fit into 6U of space in standard IT racks.
Starting top to bottom, we have the first FlashSystem V840 Control Enclosure, its 1U-high UPS, a second FlashSystem V840 Control Enclosure and its UPS, and finally a 2U-high FlashSystem V840 Storage Enclosure.
You can have up to a dozen Flash modules, either 2TB or 4TB size, for a maximum of 40TB usable RAID-protected capacity. These can be protected with AES 256-bit encryption. The FlashSystem modules are front-loaded, and slide in and out for easy maintenance.
The system is fully redundant and hot-swappable with concurrent code load to ensure high availability.
(Update: In the comments, readers thought that this was nothing more than just a two-node SVC with FlashSystem 840. There are differences, so I have added the following table.)
SVC with FlashSystem 840
Cabling from controllers to storage
Through SAN fabric ports
Direct attach from V840 Controllers to V840 Storage Enclosures
Call Home Support
GUI screen branding
The system is fully VMware-certified, supporting VAAI interfaces, and an SRA for VMware's Site Recovery Manager (SRM). With Real-time Compression, you can get up to 80 percent capacity savings for workloads like Virtual Desktop Infrastructure (VDI). That in effect gives you up to 5x (200TB) of virtual capacity in 6U of rack space!
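The jump from "80 percent savings" to "5x" is worth spelling out: if compressed data occupies only 20 percent of its original size, each physical terabyte holds five logical ones. Here is a short sketch of that arithmetic, using the 40TB maximum usable capacity quoted earlier for the V840. The function and constant names are illustrative, and 80 percent is the best-case "up to" figure, not a guarantee.

```python
def capacity_multiplier(savings_percent):
    """Turn a compression savings percentage into an effective-capacity multiplier."""
    if not 0 <= savings_percent < 100:
        raise ValueError("savings_percent must be in [0, 100)")
    return 100 / (100 - savings_percent)


USABLE_TB = 40          # maximum usable capacity quoted for the V840
BEST_CASE_SAVINGS = 80  # percent, the "up to" figure for workloads like VDI

multiplier = capacity_multiplier(BEST_CASE_SAVINGS)

if __name__ == "__main__":
    print(multiplier)              # 5.0
    print(USABLE_TB * multiplier)  # 200.0
```

At a more modest 50 percent savings the multiplier drops to 2x, or 80TB of virtual capacity in the same 6U.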
You can either keep it as an All-Flash array, or you can virtualize external IBM and non-IBM disk systems, and use the Flash capacity in the Storage Enclosure for IBM's Easy Tier automated sub-volume tiering and data migration. With or without external storage, the FlashSystem V840 can provide local and remote mirroring and point-in-time copies.
However, I was speaking to various clients in Winnipeg, Canada Tuesday and Wednesday this week, so marketing moved the announcement date to today to accommodate my schedule. Sometimes, being the #1 most influential IBM employee in storage comes in handy!)
Here, then, is a quick review of the storage portion of today's announcements.
IBM FlashSystem 840
The [IBM FlashSystem 840] offers twice the capacity of its predecessors, the 810 and 820, with up to 48TB in a dense 2U package.
(Quick recap of previous models: Both the FlashSystem 810 and 820 supported ECC-protected memory and Variable-striped RAID (VSR). The [FlashSystem 810] supported RAID-0 striped across the modules, and the [FlashSystem 820] supported two-dimensional 2D-RAID across modules for higher availability. Fellow blogger Jim Kelley (IBM) on his Storage Buddhist blog has a great post on this: [IBM FlashSystem: Feeding the Hogs].)
The new FlashSystem 840 in effect replaces both, so you can choose RAID-0 striping or 2D-RAID, along with your ECC-protected memory and Variable-striped RAID. It offers hot-swappable Flash modules, redundant components, and non-disruptive concurrent code load (CCL).
The FlashSystem 840 also introduces military-grade AES-XTS 256 bit encryption to provide added protection to your data.
For host attachment, you have some great choices: 16Gb/8Gb/4Gb auto-negotiated Fibre Channel (FCP), 40Gb InfiniBand QDR, and 10Gb FCoE. Whatever you decide, you get 90 microsecond writes, and 135 microsecond reads.
Since its introduction just over a year ago, IBM has sold FlashSystem to over 1,000 clients! For more on how this compares to other all-flash arrays, read my previous post about [IBM FlashSystem].
Adding SAN Volume Controller provides some key advantages, including Real-time compression, Thin provisioning, FlashCopy point-in-time copies, Stretched Cluster support, Easy Tier sub-LUN automated tiering, and remote copy services like Metro Mirror (synchronous) and Global Mirror (asynchronous).
Adding the SVC also changes the host attachment options: 8Gb/4Gb/2Gb Fibre Channel (FCP), 1Gb and 10Gb iSCSI, and 10Gb FCoE. Depending on the options and features you choose, the SVC layer adds a modest 60 to 100 microseconds to each read and write.
Each SVC node dedicates four of its six cores, and 2GB of its 24GB cache, to use with compression. Those interested in beefing up compression performance, either with FlashSystems or with any other disk, can choose the "Compression Hardware Upgrade Boosts Base I/O Efficiency" (affectionately known as the CHUBBIE) RPQ 8S1296 for SVC systems with software version 220.127.116.11 or higher. Basically, this RPQ adds another 6-core CPU and another 24GB of cache, so that each node can dedicate 8 cores for compression, and 26GB of cache for compression processing. Initial test results show this can increase performance 3x!
IBM Network Advisor
The [IBM Network Advisor v12.1] management software provides comprehensive management for data, storage and converged networks. This single application can deliver end-to-end visibility and insight across different network types--it supports Fibre Channel SANs (including Gen 5 Fibre Channel platform), IBM FICON and IBM b-type SAN FCoE networks--and provides new features to manage your Brocade and IBM b-type SAN switches.
Cisco MDS 9710 Multilayer Director
The [Cisco MDS 9710 Multilayer Director] is mainframe-ready, with full support for System z FICON and Fibre Channel protocol (FCP) environments. This director supports eight module slots for a maximum of 384 ports.
Well it's Tuesday again, and you know what that means? IBM Announcements!
You might be thinking, didn't IBM just have a [huge storage announcement October 8, 2013]? You would be right! IBM's $1B additional investment in Storage has been like a shot of adrenaline in getting new features and functions out sooner to our clients.
DS8870 Disk System Release 7.2
New IBM POWER7+ controllers. The previous models of DS8870 were based on the POWER7 controllers, and these new models have POWER7+ processors. This change enhances the performance across the board, from mainframe to distributed systems, from sequential to random. Customers with existing POWER7-based models will be able to do an MES upgrade to the new POWER7+ next year.
For comparison with older DS8000 models, here are some internal IBM measurements we took for Database workloads on both z/OS (mainframe) and Distributed systems with typical 70% read, 30% write and 50% cache hit:
IBM Internal Measurements (thousands of IOPS)
DB Distributed systems
New 1.2TB (10K RPM) and 4TB (7200 RPM) self-encrypting enterprise drives (SED). This is a 33% capacity boost over the 900GB and 3TB drives previously available. As with all the other drives in the DS8870, these new drives include the encryption chip right on the drive itself, offering encryption with scalability.
Improved security. Release 7.2 will support the U.S. National Institute of Standards and Technology [NIST.gov] 800-131A specification, raising the 96-bit encryption to the required 112 bits on the customer IP network. This involves updates to the security firmware, management software and digital signatures on code loads.
Metro Mirror enhancement for System z. By avoiding serial conflicts of updated blocks, this enhancement can boost performance up to 100 percent when using Metro Mirror with z/OS applications on System z mainframes.
Easy Tier™ reporting and graphs to determine optimal mix. Now you can see for yourself how sub-LUN automated tiering is helping your applications.
Easy Tier Workload Categorization
New workload visuals help clients and IBM technical specialists compare activity across tiers within and across pools to help determine optimal drive mix for current workloads
Easy Tier Data Movement Daily Report
New Easy Tier summary report every 24 hours illustrating data migration activity (5-minute intervals) can help visualize migration types and patterns for current workloads
Easy Tier Workload Skew Curve
Shows skew of all workloads across the system in a graph to help clients visualize and accurately tier configurations when adding capacity or a new system. Clients can import data into Disk Magic.
All-Flash Optimization. Yesterday, in my post [IBM FlashSystem versus EMC XtremeIO], I mentioned that any hybrid systems like the IBM Storwize V7000 that can support a mix of SSD and HDD can obviously be configured as SSD-only. Apparently, that was not obvious to many readers, so I apologize. For the DS8870, you can configure an all-Flash (SSD only) configuration, and Release 7.2 added some optimization when configured with SSD only.
1,056 drives 15K @146GB in RAID-10
224 drives SSD @400GB in RAID-5
Same - Usable 72 TB
70 percent faster
33 percent less floor space required
62 percent less energy consumed
(Note: Performance results based on measurements and projections using IBM benchmarks in a controlled environment.)
OpenStack™ support. DS8870 now offers the [OpenStack Cinder] interface for block LUN allocations in OpenStack environments. IBM is a Platinum sponsor of OpenStack, and OpenStack is the strategic platform for IBM private and hybrid clouds.
XIV Storage System
Following on the heels of the [XIV enhancements announced], IBM has now added 800GB Solid State Drives (SSD) as Read cache for its 4TB drive-based models.
DCS3860 Disk System
The DCS3860 is the next generation of the DCS3700 disk system. Designed with Linux-x86 servers in mind, the system offers direct SAS host attachment, 24GB of cache, and 60 drives in a compact 4U drawer. Like its predecessor, the drives are stored on five pull-out trays, with twelve hot-swappable drives per tray. You can add up to five more expansion units, with 60 drives each, for a total of 360 drives in 24U rack space.
These new models will help our clients deploy new workloads and consolidate existing workloads.
Each resident presented at least six proposals for blog post ideas. A proposal included a title and short description of what it would entail. Titles had to be less than 70 characters, and the short descriptions were typically just a few sentences.
These were presented to the entire team, and we picked them apart, suggested better wording for the titles, or different ways to approach the topic.
"I treat others respectfully, attacking ideas and not people. I also welcome respectful disagreement with my own ideas.
I believe in intellectual property rights, providing links, citing sources, and crediting inspiration where appropriate.
I disclose my material relationships, policies and business practices. My readers will know the difference between editorial, advertorial, and advertising, should I choose to have it. If I do sponsored or paid posts, they are clearly marked.
When collaborating with marketers and PR professionals, I handle myself professionally and abide by basic journalistic standards.
I always present my honest opinions to the best of my ability.
I own my words. Even if I occasionally have to eat them."
Words to live by.
The residents spent most of the day working on our blogs from the proposals that were approved. The target was around 400 to 600 words in length, with one or two stock photos.
IBM is the #1 vendor for Social Business tools, so it makes sense for us to use our own stuff to facilitate the submission process. The residents submit their blog posts to IBM Connections as an activity in the Cloud Social Media Residency community. The resources we used, and the presentations we saw, are all here in the community.
As an incentive, prizes were given out to those who submitted the most posts by end of the day.
We were given certificates for completing the class, and a "Redbooks Thought Leader" emblem to put on our blog.
Ryan Boyles took a group photo! If it seems that the photo is slightly askew, it is to make me look taller. Yes, I could have used GIMP to fix the orientation, but why bother? I look tall! Woo hoo! I will have to remember this technique for future group photos.
ITSO Cloud Social Media Residency, Oct 2013. Photo by Ryan Boyles.
Lastly, I would like to thank Vasfi, Tamikia, Hillary, Caroline, Ric, Jane, LeeAnne, Tina, Karen, Michael, Shelbee, Farzad, Stewart, Arun, Eric, Chris, Hans, Odilon, Mohsin, Wolfgang and the rest of the ITSO team for a wonderful job organizing this week!
The gondolier propelled the boat with an oar, and stopped rowing a few times to belt out beautiful Italian songs.
Truly impressed, I asked the gondolier how long was the training for this job. "Six weeks!" he answered. Wow! Where can I learn to sing like that in six weeks?
He clarified. No, the Venetian hotel hires competent singers, and then spends six weeks to teach them to row the gondola. Duh!
I asked Vasfi Gucer, our ITSO project leader for this residency, why there were so many Cloud topics on the agenda for this social media training. He explained it was just as important to emphasize "why" people need to be passionate about Cloud, in addition to the "what" and "how" of blogging.
This reminded me of this quote from fellow author Hugh MacLeod. I highly recommend his series of books.
"Blogging requires passion and authority. Which leaves out most people."
--- Hugh MacLeod.
Vasfi had invited Cloud experts who already have the authority to blog, and the point of this residency is for the residents to become passionate in sharing their expertise.
Here are some of the people that spoke on Cloud:
Ric Telford, IBM VP of Cloud Services
Ric Telford shared with us IBM's point of view of where the Cloud industry is going. He has been in this job position since 2009, and shared with us the history of how the IBM Cloud business has evolved in the past four years.
Jane Munn, IBM VP Business Line Executive for Cloud hardware
As the Center of Competency on Cloud for all 12 IBM Executive Briefing Centers in my group, I had to report to Jane Munn on a frequent basis. I was pretty candid on those calls on what we should change, and I am glad to see that many of my suggestions have been implemented, or are being considered for 2014.
Michael Fork, IBM Lead Architect for Hosted Private Cloud
Michael Fork gave two great presentations, one on [IBM SoftLayer] Cloud services, and the second on IBM's support of open standards, such as [OpenStack] and Cloud Foundry.
Hans Zai, IBM Cloud Service Line Leader; and Odilon Magroski Goulart Junior, IBM Technical Solution Architect
All the residents had to present in front of the class on their expertise. Hans and Odilon presented their work on [IBM SmartCloud for SAP Applications]. Hans is from Sweden, and Odilon from Brazil, so their perspectives on this were quite interesting.
When IBM renamed LotusLive to [SmartCloud for Social Business], I thought this would be the naming convention for all of our Software-as-a-Service (SaaS) offerings.
But SmartCloud for SAP Applications is a Platform-as-a-Service, providing the SAP environment as a platform, which allows clients to then deploy their customized SAP applications on this platform.
What did I present on for my "Share your expertise" session? IBM System Storage, of course! Storage is a critical part of Cloud!
So, my gentle readers, what topics do you want me to write about that combines Storage and Cloud? Enter your suggestions in the comments below.
"SmartCloud Enterprise Object Storage is switching from 3rd-party Nirvanix to its internal IBM Softlayer. This one involves more in-depth explanation which I will save for another post."
It's time to make good on that promise! Here is a quick diagram to help visualize the agreement (with sincere apologies to [Jessica Hagy]!) but not to scale, of course!
Last month, Nirvanix announced it was shutting down October 15. Here was the exact wording from their website:
For the past seven years, we have worked to deliver cloud storage solutions. We have concluded that we must begin a wind-down of our business and we need your active participation to achieve the best outcome.
We are dedicating the resources we can to assisting our customers in either returning their data or transitioning their data to alternative providers who provide similar services including IBM SoftLayer, Amazon S3, Google Storage or Microsoft Azure.
We have an agreement with IBM, and a team from IBM is ready to help you. In addition, we have established a higher speed connection with some companies to increase the rate of data transfer from Nirvanix to their servers.
We are working hard to have resources available through October 15 to assist you with the transition process, and have set up a rapid response team that can be reached at (619) 764-5650 [press 2 for customer support during normal business hours] or (888) 791-0365 after business hours, or contact email@example.com.
Please check back to this web page periodically for status updates.
We thank you for your support and patience.
The Nirvanix team
UPDATE ON NIRVANIX
On October 1, 2013, Nirvanix voluntarily sought Chapter 11 bankruptcy protection in order to pursue all alternatives to maximize value for its creditors while continuing its efforts to provide the best possible transition for customers."
In response, IBM put out this press release:
"In light of reports that Nirvanix has decided to soon cease operations, IBM is moving quickly to help clients of our Nirvanix-based Object Storage offering to move their data to other solutions such as the robust and highly scalable IBM SoftLayer Object Storage or IBM's persistent storage solution."
To understand why this is a big deal, consider the difference between Cloud Computing and Cloud Storage. Cloud Computing is like buying gasoline at your favorite gas station. If the station is closed, you can just drive a few blocks to another gas station. The ease with which customers can switch from one Cloud Compute provider to another is part of the appeal, forcing Cloud Compute providers to be extremely efficient at what they do to offer the lowest price.
Cloud Storage is completely different, more like a safety-deposit box at the bank, or a storage unit to hold all of your boxes of tax receipts. Now if you have a small amount stored away in a safety-deposit box, a closure is probably just a minor inconvenience. You can take out the contents and store them at home, or find another bank and open a new safe deposit account.
However, if you have a lot stored in a storage unit, it may be more difficult.
For example, I am in the process of remodeling my home, so I have moved a lot of my stuff to a 400 cubic-foot storage unit during the process. There were a variety of storage units within miles of my home. Some were fully air-conditioned, some offered 24x7 access, while others were not air-conditioned, or only allowed access during business hours. It has taken me several weekends to box up my things and move them to the storage unit. My car only holds 12-14 boxes at a time, so many trips were involved.
If the Storage Unit company told me that they were closing down, and that I would have to move all of these boxes to another facility, I would have to hire moving professionals to do all the work. This is in effect what companies need to do with their data. They must take the data off Nirvanix systems, and either store it in-house, or find another cloud storage provider.
IBM offers three options:
IBM [SoftLayer Object Storage] offering which is an OpenStack Swift-based Object Storage solution. IBM's SoftLayer object based storage solution provides a robust, highly scalable solution, with the ability to retrieve and leverage data the way you want to, and grow when you need. You can choose to store your objects in Dallas, Texas (USA), Amsterdam (Europe), and/or Singapore (Asia).
SCE persistent storage solution where you will be able to manage storage resources by attaching an instance during the instance creation process.
An alternate storage solution of your choice. Yes, IBM will help you move your data to Amazon, Google, Microsoft, etc. While technically competitors, IBM also has strategic partnerships in place with each to facilitate the movement.
These options are not just for IBM's SmartCloud Enterprise Object Storage clients. Nirvanix has named IBM the savior for all of its other non-IBM customers as well. Why IBM? Well, IBM is one of the most recognized names in the IT industry. Not just one of the biggest Cloud Service providers, IBM also has an army of professionals in its Global Services division to help.
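To make the first option a little more concrete: OpenStack Swift exposes objects over plain HTTP, where an object is stored by sending a PUT to a URL of the form storage-URL/container/object with an X-Auth-Token header. The sketch below only builds such a request and does not send it. The storage URL, token, container and object names are all hypothetical, and nothing here is specific to how SoftLayer configures its service.

```python
from urllib.parse import quote
from urllib.request import Request


def swift_put_request(storage_url, token, container, name, data):
    """Build (but do not send) a Swift-style object upload request."""
    # Swift addresses an object as <storage URL>/<container>/<object>.
    url = f"{storage_url.rstrip('/')}/{quote(container)}/{quote(name)}"
    return Request(url, data=data, method="PUT",
                   headers={"X-Auth-Token": token})


# All values below are placeholders, not real endpoints or credentials.
req = swift_put_request(
    "https://swift.example.com/v1/AUTH_demo",
    "hypothetical-token",
    "tax-receipts",
    "2013/box 01.tar",
    b"...",
)
print(req.get_method(), req.full_url)
```

Moving a large archive is then a matter of repeating this for every object, which is why the available bandwidth matters as much as the API itself.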
Well, it's Tuesday again, and you know what that means? Announcements!
Today, IBM's announcements are designed to change the economics of big data analytics, cloud, mobile and social media.
[Software Defined Environments] require [Software Defined Storage], combining storage virtualization with open, extensible, industry-led interfaces. The IBM SmartCloud Virtual Storage Center (VSC) and IBM Storwize Family are the market leaders in storage virtualization. SmartCloud VSC, Storwize Family, and XIV support the industry-led OpenStack interfaces.
Here are some of the announcements today:
IBM Storwize® Family
The [SAN Volume Controller] was first introduced 10 years ago, in 2003. Today, clients enjoy these storage virtualization capabilities across a variety of offerings, known collectively as the [IBM Storwize Family].
IBM adds a new member to the Storwize Family. In addition to SAN Volume Controller, Storwize V7000, Storwize V7000 Unified, Flex System V7000, Storwize V3700, and Storwize V3500, IBM is announcing the [IBM Storwize V5000]. Here's a quick side-by-side comparison:
Scalability: Maximum configuration
Storwize V7000: Four control enclosures clustered together, 36 expansion enclosures, 960 drives, 64GB cache
Storwize V5000: Two control enclosures clustered together, 12 expansion enclosures, 336 drives, 32GB cache
Storwize V3700: One control enclosure, 4 expansion enclosures, 120 drives, 8GB cache upgradeable to 16GB, optional Turbo performance
Host attachment
Storwize V7000: 8Gbps FCP and 1GbE iSCSI standard; optional 10GbE iSCSI/FCoE. Can upgrade to Storwize V7000 Unified by adding NAS File Modules to add support for CIFS, NFS, HTTPS, SCP and FTP protocols
Storwize V5000: 1GbE iSCSI, 6Gbps SAS, 8Gbps FCP and 10GbE iSCSI/FCoE standard
Storwize V3700: 1GbE iSCSI, 6Gbps SAS standard; optional 8Gbps FCP and 10GbE iSCSI/FCoE
Storage virtualization/Data Migration
Storwize V7000: Internal virtualization, Data Migration standard; optional external virtualization
Storwize V5000: Internal virtualization, Data Migration standard; optional external virtualization
Storwize V3700: Internal virtualization, Data Migration (external devices can be attached to ingest data only) standard
Remote replication
Storwize V7000: optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes
Storwize V5000: optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes
Storwize V3700: optional Metro Mirror, Global Mirror, Global Mirror with Change Volumes
Sub-LUN Automated Tiering
Storwize V7000: Easy Tier standard
Storwize V5000: optional Easy Tier
Storwize V3700: optional Easy Tier
Integration
Storwize V7000: VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
Storwize V5000: VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
Storwize V3700: VMware VAAI, VASA, vCenter plug-in, and OpenStack Cinder APIs standard
Storwize V7000, V5000 and V3700 now support larger 800GB SSD drives. Previously, they only supported SSD drives up to 400GB.
VMware 5.5 and VASA support. VMware ships every release with built-in support for all members of the IBM Storwize Family, but it bears repeating here just in case you were interested. IBM is a leading reseller of VMware, so it makes sense for IBM's storage devices to support everything that VMware customers could possibly want in terms of VMware integration. IBM SmartCloud VSC, Storwize Family, and XIV Storage System are no exception!
New IP-based replication driving lower costs for replication. Previously, Metro Mirror, Global Mirror and Global Mirror with Change Volumes were FCP-based, and many clients bought extra equipment to run FCP packets over long-distance IP (known as FCIP). Now, clients can replicate across distance natively over IP-based connections, without FCIP routers.
In my blog posts covering [Edge 2013 - Day 3 Solution Center], I mentioned that IBM has certified Bridgeworks' SANSlide 150SVCV7K unit that provides a Riverbed-like WAN Optimization for long-distance replication. Now, IBM has fully integrated Bridgeworks' SANSlide network optimization technology directly into Storwize Family!
All members of the Storwize Family will support 1GbE remote disk replication, and this will be extended to 10GbE support at a later date.
The [Storwize V3700] is now offered in 48-volt Direct Current (DC) models, with [NEBS/ETSI compliance] for Telecommunications companies that require this, and now supports 4TB drives.
When we introduced [IBM SmartCloud Storage Access] in February, it was to offer self-service, automated policy-based provisioning for file storage on the SONAS and Storwize V7000 Unified. Today, we add self-service, automated policy-based provisioning for block storage. The first products to be supported are SmartCloud VSC, the entire Storwize Family, and XIV Storage Systems. In addition to the web portal, the Storage Cloud Integration API enables 3rd party ISV applications to support SmartCloud Storage Access.
Storage admins will no longer need to be bothered with tedious provisioning requests, freeing up more time for them to work on more strategic, transformational projects.
[IBM SmartCloud Virtual Storage Center] was introduced last year, combining SAN Volume Controller, Tivoli Storage Productivity Center, Tivoli FlashCopy Manager and the Storage Analytics Engine into a single license. The initial offering provided the cross-platform "Tiered Storage Optimization" that recommended which LUNs should be moved from one disk array to another to manage performance vs. cost. Today, IBM is first to market with an automated version, moving LUNs automatically from one disk array to another.
[SmartCloud Enterprise Object Storage] is switching from 3rd-party Nirvanix to its internal IBM Softlayer. This one involves more in-depth explanation which I will save for another post.
IBM XIV Storage System
As part of the [due diligence] team for IBM to acquire the XIV company back in 2007, I am glad to see how this system has evolved since then. I have certainly [blogged quite a bit on XIV] over the years.
Earlier this year, IBM introduced Hyper-Scale Mobility which allows the storage admin to move LUNs non-disruptively from one XIV frame to another. Today, Hyper-Scale Cross-system Consistency Groups allows you to have snapshots of collections of volumes across multiple XIV frames, up to 3PB of capacity snapped at the same instant in time.
The current supported releases of OpenStack are Folsom and Grizzly, and the newest release is Havana. XIV now offers OpenStack Cinder interfaces at the Havana level.
XIV now offers a RESTful API for monitoring and provisioning. [REST] is a de-facto standard in web services and cloud implementations. XIV's RESTful API is a programmatic management interface that follows REST principles:
Resources are identified by global identifiers (URIs)
Data is sent as JSON/XML over HTTP
Manipulations of resources are done by HTTP methods (GET, PUT, POST, DELETE)
The interface is Stateless and Hypertext driven
The interface is universally supported, programming language and platform agnostic. For monitoring, the following GET example could show the list of volumes on a particular XIV storage system:
For provisioning, the following PUT example could create "vol1" on that XIV storage system.
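As a rough illustration of the REST principles listed above, here is a minimal sketch of what a GET to list volumes and a PUT to create "vol1" might look like from Python. It is not the actual XIV API: the base URL, the resource paths and the JSON field are all hypothetical placeholders. The requests are only built, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://xiv.example.com/api"  # hypothetical management endpoint


def rest_request(method, path, payload=None):
    """Build (but do not send) a JSON-over-HTTP request for a resource URI."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Accept": "application/json"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return Request(url, data=body, headers=headers, method=method)


# Monitoring: the resource is identified by its URI, read with GET.
list_volumes = rest_request("GET", "/volumes")

# Provisioning: PUT to the URI of the new resource; "size_gb" is a made-up field.
create_vol1 = rest_request("PUT", "/volumes/vol1", {"size_gb": 100})

for req in (list_volumes, create_vol1):
    print(req.get_method(), req.full_url)
```

Because each request carries everything the server needs, no session state is kept between the two calls, which is the "stateless" part of the list above.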
IBM SmartCloud Storage Access to allow self-service provisioning (see the SmartCloud section above).
Data-at-Rest encryption, using Self-Encrypting Drives (SED). XIV will encrypt the data, and IBM's Security Key Lifecycle Manager (SKLM) or Tivoli Key Lifecycle Manager (TKLM) will manage the encryption keys. If you have an XIV already, you may already have SED drives ready to use! The XIV will also encrypt the data on the SSD drives used for persistent read-cache.
Other new and enhanced offerings
For our mainframe clients, the Virtualization Engine TS7700 now supports 60 percent more capacity, and can now support 8Gbps FICON attachment.
N series N3000, N6000 and N7000 support new disk drive types and sizes, as well as Data OnTap 8.2 Cluster mode. You can now lash up to 16 N series together into a SONAS-like single system image.
Cisco MDS 9710 Multilayer Director for IBM® System Networking is a new 16 Gbps SAN director with robust security to support multi-tenancy cloud configurations.
Whew! That is a lot of things to discuss in one post. Since they were all related, I did not want to split it up into parts.
Wrapping up my coverage of the [IBM Edge 2013] conference, I have some photos of people I ran into at the Solutions Center.
Leslie Hattig and Lisa Stone, both account managers for [MarkIII Systems], an IBM Business Partner located in Houston, TX. These ladies are inseparable BFFs, I have never seen one without the other! I first met them at the [Storage Symposium in Chicago] back in 2009.
Stacy Tabor was our Community Manager for the [Storage Community]. This community covers IT Storage challenges, hot topics, architecture and solutions. You'll find industry news, videos, blog discussion threads on timely topics, exclusive analyst white papers and expert opinions. I am a frequent contributor, myself, and thank Stacy for her past service. She helped run a "Social Media Hour" at Edge for all the bloggers like me to get to meet each other.
I could not resist getting a picture with this Las Vegas [Cirque du Soleil] dancer. This was an invitation-only event, sponsored by IBM Business Partners, that I was invited to during the Social Media Hour. (See, it pays to be social!) I think the visual effects of the flag she was waving turned out really well in the picture! And yes, in case you are wondering, that is my favorite grape-flavored beverage (GFB) in my left hand. Posing for this picture was quite the balancing act, but then I am also a certified yoga instructor, so I was able to manage!
Tanaz Sowdagar is an IBM Storage Rep for our Business Development Team. This includes finding other companies to OEM our technology and re-brand it under their own names. I have worked with Tanaz for many years, helping answer questions that potential OEM partners have about our products and technologies for this purpose.
This was Michelle, my Conference Room Monitor. Each room had one, scanning the bar-codes on each badge for all the attendees, keeping count of the number of people for each session, supporting anything the speaker needs, like getting the A/V guy to come help set up the laptop projector.
Since this was Friday, the last day of the conference, I decided to dress casually, consistent with many companies' [Casual Friday] dress code policies. I am wearing the "IBM Edge Rocks" tee-shirt given out at the concert and Solutions Center the first few nights.
Getting this shot right took several takes, as the man I handed my camera to had apparently never seen a digital camera before, did not know how to focus, and some
Finally, leaving Las Vegas, I sat next to Mrs. Joey Clark, wife of "Bulldog" Clark of the Utah band [Blammity Blam]. She also sometimes plays violin with the band. She is a newly-wed, and not sure if Joey is her name, or her husband's name. (Joey, if you are out there, and want me to correctly identify you, please write a comment in the section below.)
What I have learned however, is that if a beautiful girl is sitting next to me on the plane, she will either talk to me the entire flight, implying that she is single, or mention within the first 30 seconds of conversation that she is married. Sadly for me, it was the latter.
(We were both flying on to Dallas, TX, whereupon she was going to visit her parents in Florida, and I was on my way to Sao Paulo, Brazil to get stuck there amongst the protesters in what is now called the [V for Vinegar movement], but I will save that for another blog post!)
Well, that wraps up my coverage of Edge 2013. I am sorry it took so many months to cover all the material, but I did not want to have it go uncovered much longer.
Next year's [Edge 2014] is expected to be bigger and better. It will be in Las Vegas again, but this time at the Venetian Hotel, May 19-23, 2014. I plan to be there!
Monday marked the first official day of [IBM Edge 2013] conference. This is actually three conferences in one: Executive Edge for the high-level executives, Winning Edge for the Business Partners, and Technical Edge for storage administrators and IT manager/directors. I attended the latter.
The General Session was kicked off by an awesome drumbeat-heavy song performed by a band from North Carolina called [Delta Rae]. Their use of drums reminded me of Adam Ant.
Deon Newman, IBM VP of Marketing, Systems and Technology Group, North America, served as today's master of ceremonies. He was pleased to announce there were more than 4,700 attendees at this event -- representing more than 60 countries -- a huge increase over the attendance we had last year. Here are my notes of the opening General Session:
Stephen Leonard, IBM General Manager, Sales, Systems & Technology Group
Consumers expect an always-on technology experience. We, as consumers, are leaving a trail of data that is getting wider and wider every day. Data is the new "natural resource", but plentiful and never ending.
In 1996, about 29 percent of IT spend was for administration and management; today it has grown to 68 percent. Some 34 percent of IT projects deploy late.
Stephen emphasized the themes of Smarter Computing: (a) systems that are designed for the data, (b) software-defined environments, that are (c) open and collaborative.
Stephen cited a customer example from [Jaguar Land Rover], a manufacturer of sporty automobiles and rugged 4x4 vehicles. IBM developed a ["Virtual Dealership"] for them. Rather than trying to maintain additional physical bricks-and-mortar facilities, which can be expensive to staff and fill with vehicles across their wide portfolio, the virtual dealership allows prospective customers to try out vehicles through simulation. This virtual dealership could be taken to where prospective clients are, such as a sporting event or shopping mall.
Ed Walsh, IBM VP of Marketing, System Storage and Networking
Ed presented the "data economics" of all-Flash arrays. IBM recently acquired Texas Memory Systems, renamed the RamSan products to IBM FlashSystem, and committed to invest an additional $1 billion (US) in flash technologies.
On a $-per-IOPS basis, IBM FlashSystems can be 30 percent lower in total cost of ownership (TCO) than disk-based alternatives. The cost of Flash is offset by 17 percent fewer servers from having higher CPU utilization rates, resulting in 38 percent lower software license fees. Flash is also more efficient, with 74 percent lower environmental costs, and 35 percent lower operational support costs. For many situations, Flash is the solution for poorly written software applications.
Ed also mentioned IBM's strong support for open source and open standards. Over the past 15 years, IBM has been a major contributor for open source efforts like Linux, Eclipse and Apache. IBM continues that tradition, with contributions to OpenStack and Hadoop.
Without going into any details, Ed also hinted that IBM announced 65 new or refreshed products in Storage, Networking and PureSystems. The details of each announcement would be explained during the break-out sessions during the week.
Charles Long, Founder and CEO of Centerline Digital
[Centerline Digital] does computer-generated animations in support of corporate marketing efforts.
(FTC disclosure: I work for IBM, and have worked closely with Centerline Digital marketing agency when I was the chief marketing strategist for System Storage back in 2006-2007. I was not paid or provided any products or services to mention any of the clients mentioned in this post.)
Charles indicates that internet technologies have converted "Analog dollars to digital pennies". Using IBM PureFlex with Storwize V7000 storage, real-time compression, and Tivoli Endpoint Manager, Centerline was able to drastically improve their business. He feels the old joke of "Better, Faster, Cheaper - Choose Any Two!" no longer applies with IBM solutions!
Ambuj Goyal, IBM General Manager, System Storage and Networking
Formerly my fifth-line manager in charge of Software and Systems, Ambuj switched to be the General Manager of System Storage and Networking group earlier this year.
In his former roles, Ambuj managed software and hardware product lines, but he feels storage is a completely different animal. In the past, clients focused on choosing the best servers, then chose their storage as an afterthought. Today, Ambuj feels that processors are now a commodity, and that storage is becoming the forethought.
Ambuj also highlighted the evolution of IBM's Software-Defined Environment:
In 2003, IBM introduced the SAN Volume Controller, a storage hypervisor. Now, over 10,000 clients enjoy the benefits of a Software-Defined Environment using SAN Volume Controller.
SmartCloud Virtual Storage Center represents the "third generation" for policy-driven management, combining SAN Volume Controller, Tivoli Storage Productivity Center, FlashCopy Manager and the Storage Analytics Engine.
IBM is trying to help people keep their business critical apps running securely, to be able to start quickly, add value and functions at scale, and to leverage all of these data-intensive solutions to help drive new business and gain customer insight.
Joseph Balsamo, VP of Platform Engineering at Prudential Insurance
While the IT department of [Prudential Insurance] is focused on the three V's -- Volume, Velocity and Variety -- Joe is more focused on solutions, status and cost. His mission was to strengthen the role of IT as a partner through business aligned services. Prudential has deployed XIV, N series, SAN Volume Controller (SVC) and Storwize V7000 disk systems, with the following results:
Reduced their $-per-IOPS by 75 percent
No additional storage administrators
85 percent utilization through thick-to-thin migrations
Reduced their $-per-MB by 50 percent
Reduced their 72-hour RPO to 15 seconds
These benefits were achieved over the past 24 months of deployment.
Paulo Carvao, IBM Vice President, North America Systems & Technology Group
Paulo is Deon Newman's boss. He presented BlueInsight, IBM's internal "Business Analytics" cloud accessible by over 200,000 users, with over 1 PB of content.
Inside IBM, the deployment of a Smarter Infrastructure has allowed for 25 percent capacity growth at flat IT budget, with 30,000 fewer Megawatts and 103,000 square feet.
Why is this significant? Today's disk writes each bit of information across 1200 atoms, and the smallest number of atoms that can retain a bit of information is 12, so sometime in the next 7 to 10 years, the improvements in magnetic bit density for disk will stop.
For silicon chips, the smallest practical feature is 7 nanometers, about 35 atoms wide. We are quickly approaching that limit also.
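To put those limits in perspective, here is a back-of-the-envelope sketch. It reads the disk figures as 1200 atoms per bit today against a floor of 12 atoms per bit, and it assumes, purely for illustration, that density keeps improving by steady doublings until the floor is reached. That steady-doubling model is my assumption, not something stated above.

```python
import math

ATOMS_PER_BIT_TODAY = 1200   # figure quoted above
ATOMS_PER_BIT_FLOOR = 12     # smallest count quoted above

# Remaining headroom in bit density before the floor is reached.
headroom = ATOMS_PER_BIT_TODAY / ATOMS_PER_BIT_FLOOR
doublings = math.log2(headroom)
print(f"headroom: {headroom:.0f}x, or about {doublings:.2f} doublings")

# Implied pace if the headroom is used up in 7 to 10 years.
for years in (7, 10):
    print(f"{years} years -> one doubling every {years / doublings:.2f} years")

# Silicon: 7 nanometers across about 35 atoms.
nm_per_atom = 7 / 35
print(f"about {nm_per_atom:.1f} nm per atom")
```

In other words, disk has roughly a hundredfold of density improvement left, and at the pace of a doubling every year or so, that runs out within the decade.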
I can already tell that it's going to be a busy week! Follow me on twitter (@az990tony) and tag your posts and tweets with #IBMedge hashtag.
Continuing this week's theme about the future, fellow blogger, published author, and futurist David Houle is coming out with a new book this month titled [Entering the Shift Age]. This is a follow-on to his book, [The Shift Age].
Since this book cites IBM studies explicitly, his PR department asked me to review it. If you are an aspiring author that has a book you want me to review, and it relates to the topics my blog covers like Cloud, Big Data, storage, and the explosion of information, feel free to send me a copy!
(FTC Disclosure: I work for IBM. I was not paid by anyone to mention this book on my blog. I was provided an "Uncorrected Advanced Copy" of this book at no cost to me for this review. I do not know David Houle personally, have not read any of his prior works, nor have I ever seen him speak at public events. This post is neither a paid nor celebrity endorsement of this author, his book, nor any other books by this author.)
First, let's get a few details out of the way:
Title: Entering the Shift Age, 284 pages
Author: David Houle, futurist
Genre: Non-fiction, trends and predictions
Publisher: Sourcebooks, Inc.
Publish date: January 2013
As I mentioned in my post [Historians vs. Futurists], there is only one past, but there are many potential futures. There seems to be as many futurists out there as there are potential futures. I suspect not everyone will agree with all that David has written. However, this reminds me of one of my favorite quotes:
"When two futurists always agree, one is no longer necessary." -- old Italian adage
In his book, David asks a series of thought-provoking questions, then answers them with his views and opinions on how the future will roll out:
Is humanity now entering a new age that is different than the Information age?
If so, what should we call it?
Which forces are driving this new age?
How will this impact various aspects and institutions of society?
David feels humanity is indeed entering a new age, which he calls the Shift Age. This is driven by three forces: the shift to globalization of culture and politics, the flow of power and influence to individuals, and the acceleration of electronic connectedness.
In a sense, David is like a hunter-gatherer from the Stone age, hunting down trends and gathering ideas from others. In much the same way my compost brings renewed purpose to the rinds and pits of my fruits and vegetables, David's book does a good job paraphrasing the works of many of today's leading futurists.
David predicts the decade we are now in, the 2010's, will mark the end of the Information age, a transition period to this new era, that will lead to transformations in government, education, health, technology, and energy.
Over the past two weeks, I had time to enjoy a variety of movies. I had seen several whose stories wrapped around key moments of transition.
"Gone with the Wind", as well as the new offering "Lincoln" from Steven Spielberg. Both are set in the 1860's, the time of the [American Civil War], pitting the Industrial-age forces of the North, against the Agricultural-age economy of the South. This time saw the transition from slavery to freedom.
"Doctor Zhivago", set in the time of World War I, on the German-Russian front, as well as the Russian Revolution of 1917, and the resulting Civil War between the Red Guard and the White Army. This saw the transition from a Russian government ruled by Czars, to one ruled by the people through Communism.
"Lawrence of Arabia", also set in the time of World War I, but south in Arabia. T. E. Lawrence was able to bring several warring Arab tribes together to defeat the Turks, and was a key figure in the transition to an Arab National Council.
Some might call these completely unexpected [Black Swan] events, while others might feel they are merely fortunate (or misfortunate) sequences of events that led to inevitable social change. Has something happened, or will something happen later this decade, that will drive us to leave the Information Age?
David's previous book, The Shift Age, was published back in 2007, and a lot has happened in the past six years: a global financial melt-down recession; the Arab Spring uprisings in the Middle East; Barack Obama was elected and re-elected; man-made climate change in the form of hurricanes, tsunamis and superstorms hit various parts of the world; brush fires lit up Australia, and BP's Deepwater Horizon oil rig exploded off the Gulf coast, just to name a few.
David's new book reflects the impact of these recent events, from discussions on his [Evolutionshift] blog, to Q&A sessions he has after his public speaking presentations. For those who are not interested in the wide array of topics he covers in this one book, David also offers [a dozen different mini-eBooks] that cover specific topics like [Technology, Energy and Health].
My Rating: Moist and Flaky
Who should read this book: If you are a time-traveler from 1975 who came to this decade to learn all about what your future has in store, but can only select one book to read before you zoom back to your own time period, this would be the book I recommend.
I do not want to imply this is a quick read, or one that you can't put down once you start reading it. Just like you should not gulp down a full bottle of cheap Vodka in one sitting, this book should be read over a series of days, as I did, so that you can mull over in your mind the different points and thoughts he is trying to convey.
If you store your VMware bits on external SAN or NAS-based disk storage systems, this post is for you. The subject of the post, VM Volumes, is a potential storage management game changer!
Fellow blogger Stephen Foskett mentioned VM Volumes in his [Introducing VMware vSphere Storage Features] presentation at IBM Edge 2012 conference. His session on VMware's storage features included VMware APIs for Array Integration (VAAI), VMware Array Storage Awareness (VASA), vCenter plug-ins, and a new concept he called "vVol", now more formally known as VM Volumes. This post provides a follow-up to this, describing the VM Volumes concepts, architecture, and value proposition.
"VM Volumes" is a future architecture that VMware is developing in collaboration with IBM and other major storage system vendors. So far, very little information about VM Volumes has been released. At VMworld 2012 Barcelona, VMware highlights VM Volumes for the first time and IBM demonstrates VM Volumes with the IBM XIV Storage System (more about this demo below). VM Volumes is worth your attention -- when it becomes generally available, everyone using storage arrays will have to reconsider their storage management practices in a VMware environment -- no exaggeration!
But enough drama. What is this all about?
(Note: for the sake of clarity, this post refers to block storage only. However, the VM Volumes feature applies to NAS systems as well. Special thanks to Yossi Siles and the XIV development team for their help on this post!)
The VM Volumes concept is simple: VM disks are mapped directly to special volumes on a storage array system, as opposed to storing VMDK files on a vSphere datastore.
The following images illustrate the differences between the two storage management paradigms.
You may still be asking yourself: bottom line, how will I benefit from VM Volumes?
Well, take a VM snapshot for example. With VM Volumes, vSphere can simply offload the operation by invoking a hardware snapshot of the hardware volume. This has significant implications:
VM-Granularity: Only the right VMs are copied (with datastores, backing up or cloning individual-VM portions of hardware snapshot of a datastore would require more complex configuration, tools and work)
Hardware Offload: No ESXi server resources are consumed
XIV advantage: With XIV, snapshots consume no space upfront and are completed instantly.
Here's the first takeaway: With VM Volumes, advanced storage services (which cost a lot when you buy a storage array), will become available at an individual VM level. In a cloud world, this means that applications can be provisioned easily with advanced storage services, such as snapshots and mirroring.
Now, let's take a closer look at another relevant scenario where VM Volumes will make a lot of difference - provisioning an application with special mirroring requirements:
VM Volumes case: The application is ordered via the private cloud portal. The requestor checks a box requesting an asynchronous mirror. He changes the default RPO for his needs. When the request is submitted, the process wraps up automatically: Volumes are created on one of the storage arrays, configured with a mirror and RPO exactly as specified. A few minutes later, the requestor receives an automatic mail pointing to the application virtual machine.
Datastores case #1: As may be expected, a datastore that is mirrored with the special RPO does not exist. As a result, the automated workflow sets a pending status on the request, creates an urgent ticket to a VMware administrator and aborts. When the VMware admin handles that ticket, she re-assigns the ticket to the storage administrator, asking for a new volume which is mirrored with the special RPO, and mapped to the right ESXi cluster. The next day, the volume is created, and the ticket is re-assigned back to the VMware admin, pointing to the new LUN. The VMware administrator follows up and creates the datastore on top of it. Since the automated workflow was aborted, the admin then re-assigns the ticket to the cloud administrator, who sometime later completes the application provisioning manually.
Datastores case #2: Luckily for the requestor, a datastore that is mirrored with the special RPO does exist. However, that particular datastore is consuming space from a high performance XIV Gen3 system with SSD caching, while the application does not require that level of performance, so the workflow requires a storage administrator approval. The approval is given to save time, but the storage administrator opens a ticket for himself to create a new volume on another array, as well as a follow-up ticket for the VMware admin to create a new datastore using the new volume and migrate the application to the other datastore. In this case, provisioning was relatively rapid, but required manual follow up, involving the two administrators.
Here's the second takeaway: With VM Volumes, management is simplified, and end-to-end automation is much more applicable. The reason is that there are no datastores. Datastores physically group VMs that may otherwise be totally unrelated, and require close coordination between storage and VMware administrators.
Now, the above mainly focuses on the VMware or cloud administrator perspective. How does VM Volumes impact storage management?
VMs are the new hosts: Today, storage administrators have visibility of physical hosts in their management environment. In a non-virtualized environment, this visibility is very helpful. The storage administrator knows exactly which applications in a data center are storage-provisioned or affected by storage management operations because the applications are running on well-known hosts. However, in virtualized environments the association of an application to a physical host is temporary. To keep at least the same level of visibility as in physical environments, VMs should become part of the storage management environment, like hosts. Hosts are still interesting, for example to manage physical storage mapping, but without VM visibility, storage administrators will know less about their operation than they are used to, or need to. VM Volumes enables such visibility, because volumes are provided to individual VMs. The XIV VM Volumes demonstration at VMworld Barcelona, although experimental, shows a view of VM volumes in XIV's management GUI.
Here's a screenshot:
That's not all!
Storage Profiles and Storage Containers: A Storage Profile is a vSphere specification of a set of storage services. A storage profile can include properties like thin or thick provisioning, mirroring definition, snapshot policy, minimum IOPS, etc.
Storage administrators define a portfolio of supported storage services, maintained as a set of storage profiles, and published (via VASA integration) to vSphere.
VMware or cloud administrators define the required storage profiles for specific applications.
VMware and storage administrators need to coordinate the typical storage requirements and the automatically-available storage services. When a request to provision an application is made, the associated storage profiles are matched against the published set of available storage profiles. The matching published profiles will be used to create volumes, which will be bound to the application VMs. All that will happen automatically.
Note that when a VM is created today, a datastore must be specified. With VM Volumes, a new management entity called Storage Container (also known as Capacity Pool) replaces the use of datastore as a management object. Each Storage Container exposes a subset of the available storage profiles, as appropriate. The storage container also has a capacity quota.
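The matching step described above can be sketched in a few lines of Python. This is only an illustration of the concept, not VMware's or IBM's implementation: the class names, the profile properties, the matching rule and the `provision` function are all hypothetical, and a real environment would go through the storage vendor's VASA provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageProfile:
    """A named set of storage services (property names are hypothetical)."""
    name: str
    thin_provisioned: bool
    mirrored: bool
    min_iops: int


@dataclass
class StorageContainer:
    """Stands in for the datastore as the management object; has a quota."""
    name: str
    quota_gb: int
    profiles: list  # subset of the published profiles this container exposes
    used_gb: int = 0

    def free_gb(self) -> int:
        return self.quota_gb - self.used_gb


def satisfies(published: StorageProfile, required: StorageProfile) -> bool:
    """Assumed rule: a published profile must offer at least what is required."""
    return (published.thin_provisioned == required.thin_provisioned
            and (published.mirrored or not required.mirrored)
            and published.min_iops >= required.min_iops)


def provision(container: StorageContainer, required: StorageProfile, size_gb: int) -> str:
    """Pick the first matching published profile and charge the quota."""
    if size_gb > container.free_gb():
        raise ValueError("request exceeds the container's capacity quota")
    for published in container.profiles:
        if satisfies(published, required):
            container.used_gb += size_gb
            return published.name
    raise LookupError("no published profile matches the request")


# Hypothetical portfolio published by the storage administrator.
gold = StorageProfile("gold", thin_provisioned=True, mirrored=True, min_iops=5000)
bronze = StorageProfile("bronze", thin_provisioned=True, mirrored=False, min_iops=500)
pool = StorageContainer("pool-a", quota_gb=1000, profiles=[bronze, gold])

# Hypothetical application requirement defined by the VMware or cloud administrator.
need = StorageProfile("app-db", thin_provisioned=True, mirrored=True, min_iops=2000)
chosen = provision(pool, need, size_gb=200)
print(chosen, pool.free_gb())
```

The point of the sketch is that nothing in the request names a datastore: the requester states what the application needs, and the container's published profiles and remaining quota decide the rest.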
Here are some more takeaways:
New way to interface vSphere and storage management: Storage administrators structure and publish storage services to vSphere via storage profiles and storage containers.
Automated provisioning, out of the box: The provisioning process automatically matches application-required storage profiles against storage profiles available from the specified storage containers. There is no need to build custom scripts and custom processes to automate storage provisioning to applications.
The XIV advantage:
XIV services are very simple to define and publish. The typical number of available storage profiles would be low. It would also be easy to define application storage profiles.
XIV provides consistent high performance, up to very high capacity utilization levels, without any maintenance. As a result, automated provisioning (which inherently implies less human attention) will not create an elevated risk of reduced performance.
Note: A storage vendor VASA provider is required to support VM Volumes, storage profiles, storage containers and automated provisioning. The IBM Storage VASA provider runs as a standalone service that needs to be deployed on a server.
To summarize the VM Volumes value proposition:
Streamline cloud operation by providing storage services at VM and application level, enabling end-to-end provisioning automation, and unifying VMware and storage administration around volumes and VMs.
Increase storage array ROI, improve vSphere scalability and response time, and reduce cloud provisioning lag, by offloading VM-level provisioning, failover, backup, storage migration, storage space recycling, monitoring, and more, to the storage array, using advanced storage operations such as mirroring and snapshots.
Simplify the adoption of VM Volumes using XIV, with smaller and simpler sets of storage profiles. Apply XIV's supreme fast cloning to individual VMs, and keep automation risks at bay with XIV's consistent high performance.
Until you can get your hands on a VM Volumes-capable environment, the VMware and IBM developer groups will be collaborating and working hard to realize this game-changing feature. The above information is definitely expected to trigger your questions or comments, and our development teams are eager to learn from them and respond. Enter your comments below, and I will try to answer them, and help shape the next post on this subject. There's much more to be told.
This month, I am pleased to announce the new [IBM STG Executive Briefing Center] website, representing a huge improvement over the previous website we had been using over the past two years. STG refers to IBM's Systems and Technology Group, the division that focuses on servers, storage, switches and the system software that makes them run. This new website is for the dozen STG EBCs that span the globe. The new website reminds me of this famous quote:
"Perfection is achieved, not when there is nothing left to add, but when there is nothing left to take away"
-- Antoine de Saint-Exupery
Let's take a quick look at what makes it so much better.
The previous website required registration. At every briefing, those of us who work in the EBCs had to pass around a sign-up sheet for email addresses from each attendee so that we could send them an invitation to register for the site. We would have a hard time reading people's handwriting, resulting in some emails coming back rejected.
Inspired by self-service gas stations, automated teller machines, and the many self-service portals of Cloud Computing, the new website has everything up-front, without registration. IBM Business Partners and sales representatives can easily request a briefing at any of the dozen briefing centers represented!
IBM-managed and IBM-hosted
We had a difficult time explaining to our attendees why our previous website was hosted on a lone machine and maintained by a third party. Think about it, IBM manages the data centers of over 400 clients. IBM has provided web hosting to the most mission critical workloads, with high levels of availability and reliability, and is recognized as one of the "Big 5" Cloud companies. I have done web design myself in my career, and we were terribly disappointed with the third party chosen to create and maintain our previous website, constantly having to point out errors in their HTML and CSS.
For the new website, IBM took back control. Staff from each EBC, myself included, came up with a simple page to bring the essence of each location to life. Special thanks to my colleague Hal Jennings, from the Austin EBC, for bringing this all together!
Despite two years of manually registering attendees to use the previous website, Google Analytics showed that few people visited, and the few that did spent little time exploring the vast repository of content.
The new website is vastly simpler. The front page points to all twelve EBCs, and a single mouse click gets you to the location you are interested in, with all the details you need to make a decision to book a briefing, and the contact information to make it happen.
Elimination of Wasted and Duplicate Effort
In the previous website, we spent as much as 15 hours just to create, voice over, edit and produce a single 15-minute recorded presentation. Less than six percent of the previous website visitors watched more than five minutes of these videos, making us feel that most of our effort was wasted.
The EBC staff kept wasting their time, month after month, thanks to all-stick, no-carrot tactics that mandated minimums for contributions for more and more content that nobody was ever looking at. Even more disappointing was that much of our work duplicated the formal responsibilities of our IBM Marketing team. They weren't happy about this either, causing confusion between the roles of our two teams.
Finally, we said enough was enough! The new STG EBC website is a marvel in minimalism. If you want to see presentations, videos, expert profiles, or partake in on-going conversations, I welcome you to visit the [IBM Expert Network], the [IBM Storage YouTube Channel], and the [Storage Community] where they belong.
Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) has customized it for its own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extensions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
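A small example of this divergence is asking for only the first few rows of a result. The sketch below builds the same query text for three systems: MySQL uses `LIMIT`, Microsoft SQL Server uses `TOP`, and DB2 accepts the standard-style `FETCH FIRST n ROWS ONLY`. The helper function, table name and column name are made up for illustration, and the code only formats text rather than talking to any database.

```python
def first_rows_query(dialect: str, table: str, order_by: str, n: int) -> str:
    """Build a 'first n rows' SELECT for a few dialects (illustrative only).

    Identifiers are pasted into the text, so this is not safe for untrusted input.
    """
    if dialect == "mysql":
        return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {n} * FROM {table} ORDER BY {order_by}"
    if dialect == "db2":
        return f"SELECT * FROM {table} ORDER BY {order_by} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unknown dialect: {dialect}")


for dialect in ("mysql", "sqlserver", "db2"):
    print(dialect, "->", first_rows_query(dialect, "orders", "order_date", 5))
```

Three systems, three spellings for one simple request, which is exactly why porting code between them is rarely a copy-and-paste job.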
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) sticking to the SQL standard, and expecting the reader to customize the examples to their particular RDBMS; or (b) sticking to a single RDBMS implementation, and offering examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various RDBMS differ from standard SQL syntax, with actual working examples for different RDBMS.
You might be thinking to yourself that only an idiot would work in a place that required knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via [Shadow IT].
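For readers who have never touched SQL, here is roughly what those basics look like. This is my own minimal sketch, not an example from the book: it uses the SQLite engine bundled with Python (not one of the RDBMS discussed here), and the table names and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Create two tables and define their columns.
cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")

# Insert rows; the '?' placeholders keep values out of the SQL text.
cur.execute("INSERT INTO authors (id, name) VALUES (?, ?)", (1, "Author A"))
cur.executemany(
    "INSERT INTO books (id, author_id, title) VALUES (?, ?, ?)",
    [(1, 1, "First Draft"), (2, 1, "Scratch Notes")],
)

# Update a field, then delete a row.
cur.execute("UPDATE books SET title = ? WHERE id = ?", ("First Edition", 1))
cur.execute("DELETE FROM books WHERE id = ?", (2,))

# Query with a join across the two tables.
rows = cur.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
).fetchall()
print(rows)
conn.close()
```

Statements this simple are among the most portable, but as soon as you move past them, the per-vendor differences start to show.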
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2 Express-C] that runs on Windows, Linux, Mac OS, and Solaris.)
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question struck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza is not in the IBM System Storage product line; it is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
This week, I am in beautiful Sao Paulo, Brazil, teaching a Top Gun class to IBM Business Partners and sales reps. Traditionally, we have "Tape Thursday" where we focus on our tape systems, from tape drives, to physical and virtual tape libraries. IBM is the #1 tape vendor, and has been for the past eight years.
(The alliteration doesn't translate well here in Brazil. The Portuguese word for tape is "fita", and Thursday here is "quinta-feira", but "fita-quinta-feira" just doesn't have the same ring to it.)
In the class, we discussed how to handle common misperceptions and myths about tape. Here are a few examples:
Myth 1: Tape processing is manually intensive
In my July 2007 blog post [Times a Million], I coined the phrase "Laptop Mentality" to describe the problem most people have dealing with data center decisions. Many folks linearly extend their experiences with PCs, workstations or laptops to the data center, unable to comprehend large numbers or solutions that take advantage of the economies of scale.
For many, the only experience dealing with tape was manual. In the 1980s, we made "mix tapes" on little cassettes, and in the 1990s we recorded our favorite television shows on VHS tapes in the VCR. Today, we have playlists on flash or disk-based music players, and record TV shows on disk-based video recorders like Tivo. The conclusion people draw is that tapes are manual, and disks are not.
Manual processing of tapes ended in 1987, with the introduction of a silo-like tape library from StorageTek. IBM quickly responded with its own IBM 3495 Tape Library Data Server in 1992. Today, clients have many tape automation choices, from the smallest IBM TS2900 Tape Autoloader that has one drive and nine cartridges, all the way to the largest IBM TS3500 multiple-library shuttle complex that can hold exabytes of data. These tape automation systems eliminate most of the manual handling of cartridges in day-to-day operations.
Myth 2: Tape media is less reliable than disk media
A storage medium is unreliable when it returns information that differs from what was originally stored. There are only two ways for this to happen: you write a "zero" but read back a "one", or you write a "one" and read back a "zero". This is called a bit error. Every storage medium has a "bit error rate", the average number of bit errors expected for some large amount of data written.
According to the latest [LTO Bit Error rates, 2012 March], today's tape expects only 1 bit error per 10E17 bits written (about 100 Petabytes). This is 10 times more reliable than Enterprise SAS disk (1 bit per 10E16), and 100 times more reliable than Enterprise-class SATA disk (1 bit per 10E15).
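To put those rates in more familiar units, the sketch below converts "one bit error per N bits" into petabytes written between expected errors. It is back-of-the-envelope arithmetic of my own, assuming decimal petabytes (10^15 bytes). Note that "10E17" can be read two ways: as 10^17 bits, which works out to 12.5 PB, or literally as 10 x 10^17 = 10^18 bits, which works out to 125 PB and is closer to the "about 100 Petabytes" figure. The 10x and 100x comparisons hold under either reading.

```python
BITS_PER_BYTE = 8
BYTES_PER_PB = 10**15  # decimal petabyte (assumption)


def petabytes_per_error(bits_per_error: int) -> float:
    """Amount of data, in petabytes, written per one expected bit error."""
    return bits_per_error / BITS_PER_BYTE / BYTES_PER_PB


# Exponents as quoted in the text, read here as plain powers of ten.
media = {"LTO tape": 17, "Enterprise SAS disk": 16, "Enterprise SATA disk": 15}

tape_bits = 10 ** media["LTO tape"]
for name, exponent in media.items():
    bits = 10 ** exponent
    print(f"{name}: {petabytes_per_error(bits):.2f} PB per expected bit error, "
          f"tape is {tape_bits // bits}x better")

# The literal reading of "10E17" as 10 x 10^17 bits:
print(f"10^18 bits = {petabytes_per_error(10**18):.0f} PB")
```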
Tape is the media used in "black boxes" for airplanes. When an airplane crashes, the black box is retrieved and used to investigate the causes of the crash. In 1986, the Space Shuttle Challenger exploded 73 seconds after take-off. The tapes in the black box sat on the ocean floor for six weeks before being recovered. Amazingly, IBM was able to successfully restore [90 percent of the block data, and 100 percent of voice data].
Analysts are quite upset when they are quoted out of context, but in this case, Gartner never said anything remotely similar to this. Nor did the other analysts that Curtis investigated for similar claims. What Gartner did say was that disk provides an attractive alternative storage media for backup which can increase the performance of the recovery process.
Back in the 1990s, Savur Rao and I developed a patent to help backup DB2 for z/OS by using the FlashCopy feature of IBM's high-end disk system. The software method to coordinate the FlashCopy snapshots with the database application and maintain multiple versions was implemented in the DFSMShsm component of DFSMS. A few years later, this was part of a set of patents IBM cross-licensed to Microsoft for them to implement a similar software for Windows called Data Protection Manager (DPM). IBM has since introduced its own version for distributed systems called IBM Tivoli FlashCopy Manager that runs not just on Windows, but also AIX, Linux, HP-UX and Solaris operating systems.
Curtis suspects the "71 percent" citation may have been propagated by an ambitious product manager of Microsoft's Data Protection Manager, back in 2006, perhaps to help drive up business to their new disk-based backup product. Certainly, Microsoft was not the only vendor to disparage tape in this manner.
A few years ago, an [EMC failure brought down the State of Virginia]. It began with a component failure in its production disk system, and was then made worse by a failure to recover from the disk-based remote mirror copy. Fortunately, the data was able to be restored from tape over the next four days. If you wonder why nobody at EMC says "Tape is Dead" anymore, perhaps it is because tape saved their butts that week.
(FTC Disclosure: I work for IBM and this post can be considered a paid, celebrity endorsement for all of the IBM tape and software products mentioned on this post. I own shares of stock in both IBM and Google, and use Google's Gmail for my personal email, as well as many other Google services. While IBM, Google and Microsoft can be considered competitors to each other in some areas, IBM has working relationships with both companies on various projects. References in this post to other companies like EMC are merely to provide illustrative examples only, based on publicly available information. IBM is part of the Linear Tape Open (LTO) consortium.)
Myth 4: Vendors and Manufacturers are no longer investing in tape technology
IBM and others are still investing Research and Development (R&D) dollars to improve tape technology. What people don't realize is that much of the R&D spent on magnetic media can be applied across both disk and tape, such as IBM's development of the Giant Magnetoresistance read/write head, or [GMR] for short.
Most recently, IBM made another major advancement in tape with the introduction of the Linear Tape File System (LTFS). This allows greater portability to share data between users, and between companies, by treating tape cartridges much like USB memory sticks or pen drives. You can read more in my post [IBM and Fox win an Emmy for LTFS technology]!
Next month, IBM celebrates the 60th anniversary of tape. It is good to see that tape continues to be a vibrant part of the IT industry, and of IBM's storage business!
Well, it's Tuesday again, and you know what that means!
This Thursday is the Thanksgiving holiday here in the United States, so instead of announcing IBM products, I wanted to announce the general availability of my latest book, [Inside System Storage: Volume III].
This book includes blog posts from May 2008 to March 2009, along with the ever popular behind-the-scenes commentary on what was going on during IBM's launch of the Information Infrastructure initiative.
Do you know someone who celebrates Chanukah, Christmas, Kwanzaa, or the Winter Solstice, and have a hard time finding the right gift?
Do you know a client or IBM Business Partner that would appreciate a nominally-priced gift to thank them for their business?
Do you know someone newly hired into IBM or another IT company that could benefit from behind-the-scenes insight and commentary?
As with the other two volumes, Inside System Storage: Volume III is available in your choice of paperback, hardcover, and eBook (Adobe PDF) format.
In the spirit of Thanksgiving, I would like to thank my editor, Susan Pollard, who put in the extra effort, working evenings and weekends, to get this book done in time for the upcoming holiday season. For those outside the United States, there is an American tradition to shop in brick-and-mortar stores on Black Friday (the day after Thanksgiving) and to shop on-line for books like mine on Cyber Monday (the Monday after Thanksgiving).
I would also like to thank my publisher, Lulu.com, for upgrading me to "Spotlight" level, so now I have a spotlight page titled [Books Written by Tony Pearson], making it easy for you to order any of my books in various formats.
And last, but not least, I would like to thank all my friends and family that were supportive these past few difficult months while I was putting this book together.
Next month, I will be in Las Vegas, Dec 4-8, speaking at Gartner's [Data Center Conference]. If you order a book today, and bring it with you to the IBM booth at the Solution Expo, I can sign it for you!
This week, IBM made over a dozen announcements related to IBM storage products. Here is part 2 of my overview:
IBM System Storage® DS8000 series microcode
One of the advantages of acquiring XIV as IBM's other high-end disk system, is that it allows the DS8000 team to focus on the IBM i and z/OS operating systems. As a result, IBM DS8000 has over half the mainframe-attach market share.
For both the DS8700 and DS8800 models, IBM Easy Tier now supports sub-LUN automated tiering across three storage tiers: Solid-State Drives, high-performance spinning disk drives (15K and 10K RPM), and high-capacity disk drives (7200 RPM).
For System z customers, the latest DS8000 microcode has synergy with z/OS and GDPS, now supporting 4x larger EAV volumes, faster high-performance FICON (zHPF), and Workload Manager (WLM) integration with the I/O Priority Manager. IBM has a world record SAP performance of 59 million account postings per hour. DB2 v10 for z/OS queries were measured at 11x faster using the new zHPF feature.
IBM System Storage® DS8800 systems
On the hardware side, the DS8800 now supports a fourth frame to hold a total of over 1,500 disk drives. Yes, we have customers for whom three frames weren't enough, and they wanted more.
IBM is now also offering new drive options. Small Form Factor (2.5 inch) drives now include 300GB 15K RPM drives and 900GB 10K RPM drives. But wait! There's more! The DS8800 is no longer an SFF-only box; it now allows for mixing in Large Form Factor (3.5 inch) drives, starting with the 3TB NL-SAS 7200 RPM drive.
IBM XIV® Storage System Gen3
We announced the XIV Gen3 already, but we have two enhancements.
First, we now offer a model based entirely on 3TB NL-SAS drives. If you are thinking, "What, IBM is going to put 3TB drives into everything?" Yup. Once we go through all the pain and suffering of qualifying a drive, we make sure we get our money's worth!
Secondly, we now have an iPad application to manage the XIV. This has nothing to do with Apple CEO Steve Jobs passing away last week; it was merely coincidence.
IBM Real-time Compression Appliances™ STN6500 and STN6800 V3.8
The latest software for RtCA now supports Microsoft SMB v2, and enhanced reporting so that storage admins know exactly the benefits of the compression ratios of different file extensions.
IBM System Storage EXP2500 Express®
The EXP2500 is for direct-attach situations, like the IBM BladeCenter. IBM adds LFF 3.5-inch 3TB NL-SAS drives, SFF 2.5-inch 300GB 15K RPM SAS drives, and 900GB 7200 RPM NL-SAS drives.
My colleague Curtis Neal refers to these as "B.F.D" announcements, which of course stands for Bigger, Faster, Denser!
Last week, fellow IBMer Ron Riffe started his three-part series on the Storage Hypervisor. I discussed Part I already in my previous post [Storage Hypervisor Integration with VMware]. We wrapped up the week with a Live Chat with over 30 IT managers, industry analysts, independent bloggers, and IBM storage experts.
"The idea of shopping from a catalog isn’t new and the cost efficiency it offers to the supplier isn’t new either. Public storage cloud service providers seized on the catalog idea quickly as both a means of providing a clear description of available services to their clients, and of controlling costs. Here’s the idea… I can go to a public cloud storage provider like Amazon S3, Nirvanix, Google Storage for Developers, or any of a host of other providers, give them my credit card, and get some storage capacity. Now, the “kind” of storage capacity I get depends on the service level I choose from their catalog.
Most of today’s private IT environments represent the complete other end of the pendulum swing – total customization. Every application owner, every business unit, every department wants to have complete flexibility to customize their storage services in any way they want. This expectation is one of the reasons so many private IT environments have such a heavy mix of tier-1 storage. Since there is no structure around the kind of requests that are coming in, the only way to be prepared is to have a disk array that could service anything that shows up. Not very efficient… There has to be a middle ground.
Private storage clouds are a little different. Administrators we talk to aren’t generally ready to let all their application owners and departments have the freedom to provision new storage on their own without any control. In most cases, new capacity requests still need to stop off at the IT administration group. But once the request gets there, life for the IT administrator is sweet!
Here comes the request from an application owner for 500GB of new “Database” capacity (one of the options available in the storage service catalog) to be attached to some server. After appropriate approvals, the administrator can simply enter the three important pieces of information (type of storage = “Database”, quantity = 500GB, name of the system authorized to access the storage) and click the “Go” button (in TPC SE it’s actually a “Run now” button) to automatically provision and attach the storage. No more complicated checklists or time consuming manual procedures.
A storage hypervisor increases the utilization of storage resources, and optimizes what is most scarce in your environment. For Linux, UNIX and Windows servers, you typically see utilization rates of 20 to 35 percent, and this can be raised to 55 to 80 percent with a storage hypervisor. But what is most scarce in your environment? Time! In a competitive world, it is not big animals eating smaller ones as much as fast ones eating the slow.
Want faster time-to-market? A storage hypervisor can help reduce the time it takes to provision storage, from weeks down to minutes. If your business needs to react quickly to changes in the marketplace, you certainly don't want your IT infrastructure to slow you down like a boat anchor.
Want more time with your friends and family? A storage hypervisor can migrate the data non-disruptively, during the week, during the day, during normal operating hours, instead of scheduling down-time on evenings and weekends. As companies adopt a 24-by-7 approach to operations, there are fewer and fewer opportunities in the year for scheduled outages. Some companies get stuck paying maintenance after their warranty expires, because they were not able to move the data off in time.
Want to take advantage of the new Solid-State Drives? Most admins don't have time to figure out which applications, workloads or indexes would best benefit from this new technology. Let your storage hypervisor's automated tiering do this for you! In fact, a storage hypervisor can gather enough performance and usage statistics to determine the characteristics of your workload in advance, so that you can predict whether solid-state drives are right for you, and how much benefit you would get from them.
Want more time spent on strategic projects? A storage hypervisor allows any server to connect to any storage. This eliminates the time wasted determining the when and how, and lets you focus on the what and why of your more strategic transformational projects.
If this sounds all too familiar, that is because it is similar to the benefits that one gets from a server hypervisor -- better utilization of CPU resources, less time spent on management and administration, and the agility and flexibility to deploy new technologies and decommission older ones.
"Server virtualization is a fairly easy concept to understand: Add a layer of software that allows processing capability to work across multiple operating environments. It drives both efficiency and performance because it puts to good use resources that would otherwise sit idle.
Storage virtualization is a different animal. It doesn't free up capacity that you didn't know you had. Rather, it allows existing storage resources to be combined and reconfigured to more closely match shifting data requirements. It's a subtle distinction, but one that makes a lot of difference between what many enterprises expect to gain from the technology and what it actually delivers."
Jon Toigo on his DrunkenData blog brings back the sanity with his post [Once More Into the Fray]. Here is an excerpt:
"What enables me to turn off certain value-add functionality is that it is smarter and more efficient to do these functions at a storage hypervisor layer, where services can be deployed and made available to all disk, not to just one stand bearing a vendor’s three letter acronym on its bezel. Doesn’t that make sense?
I think of an abstraction layer. We abstract away software components from commodity hardware components so that we can be more flexible in the delivery of services provided by software rather than isolating their functionality on specific hardware boxes. The latter creates islands of functionality, increasing the number of widgets that must be managed and requiring the constant inflation of the labor force required to manage an ever expanding kit. This is true for servers, for networks and for storage.
Can we please get past the BS discussion of what qualifies as a hypervisor in some guy’s opinion and instead focus on how we are going to deal with the reality of cutting budgets by 20% while increasing service levels by 10%. That, my friends, is the real challenge of our times."
Did you miss out on last Friday's Live Chat? We are doing it again this Friday, covering parts I and II of Ron's posts, so please join the conversation! The virtual dialogue on this topic will continue in another [Live Chat] on September 30, 2011 from 12 noon to 1pm Eastern Time.
Can you believe it has been five years since I started blogging?
(If you absolutely abhor the navel-gazing associated with blogging-about-blogging posts, then by all means stop reading now!)
Back in July 2005, IBM decided to merge two brands, IBM eServer and IBM TotalStorage, into a single all-encompassing "IBM Systems" brand. Thus, the TotalStorage brand became the "IBM System Storage" product line of the "IBM Systems" brand. The next six months were spent renaming some (not all) of the products. The following January, I was named the Marketing Strategist for this new product line, with the mission to help promote the new naming convention.
We looked at possibly doing a regularly-scheduled podcast, but nobody back then, including myself, was familiar with audio editing tools. Instead, we chose a blog. Most blogs at IBM are internal, safely hidden behind the firewall, accessible only to IBM employees. I wanted mine to be different, to be accessible to the public, clients, prospects, IBM Business Partners, and yes, even those working for IBM's various competitors. One thing I like about blogs is that if you have a typo, or make a mistake, you can go back and correct it after it has posted.
Marketing through social media is quite different from traditional marketing techniques. Management was supportive, but legal wanted to review and approve everything I wrote before I posted it onto my blog. Official IBM Press Releases, for example, go through a dozen reviews before they are finally made public. I refused. This kind of review and approval would ruin the blogging process.
Fortunately, this blog was not my first attempt at technical writing. Our legal counsel reviewed my past trip reports from various conferences, and decided to let me blog without review. Occasionally, someone will review a post once it is already published, and ask me to make some corrections. It reminds me of my favorite saying used heavily within IBM:
Despite these delays, we managed to launch this blog in September 2006, just in time to celebrate the 50th anniversary of disk systems. IBM introduced the industry's first commercial disk system on September 13, 1956.
Over the years, this blog has helped sales reps and IBM Business Partners close deals, and address the FUD their prospects heard from competition. I have helped my readers get in touch with the right people within IBM. And, I have "sent the elevator back down", helping other IBMers launch their own blogs, including [Barry Whyte], [Elisabeth Stahl], and [Anthony Vandewerdt].
Today, bloggers have a profound impact on the world. Not everyone has a positive view on this. Bloggers and other users of social media have been seen as whistle-blowers for fraudulent corporations, as activists against corrupt governments and dictators, and as subject matter experts and fact checkers referenced during television and radio newscasts. In a recent movie, one of the major characters was a trouble-making blogger, and another character describes his blogging as nothing more than "graffiti with punctuation."
I want to thank all of my readers for making this the #1 most influential blog on IBM developerWorks in 2011! This blog has been [published in a series of books], Inside System Storage Volume I and Volume II. And yes, before you all ask in the comments below, I am actively working on Volume III.
For a bit of nostalgia, I invite you to read my first 21 blog posts that I posted back in [September 2006].