This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems for IBM Systems Technical University] events. With over 30 years with IBM Systems, Tony is frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles at IBM during his 19 plus years at IBM. Lloyd most recently has been leading efforts across the Communication/CSI Market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years Lloyd supported the industry accounts as a Storage Solution architect and prior to that as a Storage Software Solutions specialist during his time in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role in the Washington Systems Center team. His current focus is with IBM Cloud Private and he will be delivering and supporting sessions at Think2019, and Storage Technical University on the Value of IBM storage in this high value IBM solution a part of the IBM Cloud strategy. Lloyd maintains a Subject Matter Expert status across the IBM Spectrum Storage Software solutions. You can follow Lloyd on Twitter @ldean0558 and LinkedIn Lloyd Dean.
Tony Pearson's books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
The developerWorks Connections Platform is now in read-only mode and content is only available for viewing. No new wiki pages, posts, or messages may be added. Please see our FAQ for more information. The developerWorks Connections platform will officially shut down on March 31, 2020 and content will no longer be available. More details available on our FAQ. (Read in Japanese.)
Full VMware Vstorage API for Array Integration (VAAI). Back in 2008, VMware announced new vStorage APIs for its vSphere ESX hypervisor: vStorage API for Site Recovery Manager, vStorage API for Data Potection, vStorage API for Multipathing. Last July, VMware added a new API called vStorage API for Array Integration [VAAI] which offers three primitives:
Hardware-assisted Blocks zeroing. Sometimes referred to as "Write Same", this SCSI command will zero out a large section of blocks, presumably as part of a VMDK file. This can then be used to reclaim space on the XIV on thin-provisioned LUNs.
Hardware-assisted Copy. Make an XIV snapshot of data without any I/O on the server hardware.
Hardware-assisted locking. On mainframes, this is call Parallel Access Volumes (PAV). Instead of locking an entire LUN using standard SCSI reserve commands, this primitive allows an ESX host to lock just an individual block so as not to interfere with other hosts accessing other blocks on that same LUN.
Quality of Service (QoS) Performance Classes.
When XIV was first released, it treated all hosts and all data the same, even when deployed for a variety of different applications. This worked for some clients, such as [Medicare y Mucho Más]. They migrated their databases, file servers and email system from EMC CLARiiON to an IBM XIV Storage System. In conjunction with VMware, the XIV provides a highly flexible and scalable virtualized architecture, which enhances the company's business agility.
However, other clients were skeptical, and felt they needed additional "nobs" to prioritize different workloads. The new 10.2.4 microcode allows you to define four different "performance classes". This is like the door of a nightclub. All the regular people are waiting in a long line, but when a celebrity in a limo arrives, the bouncer unclips the cord, and lets the celebrity in. For each class, you provide IOPS and/or MB/sec targets, and the XIV manages to those goals. Performance classes are assigned to each host based on their value to the business.
Offline Initialization for Asynchronous Mirror.
Internally, we called this Truck Mode. Normally, when a customer decides to start using Asynchronous Mirror, they already have a lot of data at the primary location, and so there is a lot of data to send over to the new XIV box at the secondary location. This new feature allows the data to be dumped to tape at the primary location. Those tapes are shipped to the secondary location and restored on the empty XIV. The two XIV boxes are then connected for Asynchronous Mirroring, and checksums of each 64KB block are compared to determine what has changed at the primary during this "tape delivery time". This greatly reduces the time it takes for the two boxes to get past the initial synchronization phase.
IP-based Replication. When IBM first launched the Storwize V7000 last October, people commented that the one feature they felt missing was IP-based replication. Sure, we offered FCP-based replication as most other Enterprise-class disk systems offer today, but many midrange systems also offer IP-based repliation to reduce the need for expensive FCIP routers. [IBM Tivoli Storage FastBack for Storwize V7000] provides IP-based replication for Storwize V7000 systems.
Network Attached Storage
IBM announced two new models of the IBM System Storage N series. The midrange N6240 supports up to 600 drives, replacing the N6040 system. The entry-level N6210 supports up to 240 drives, and replaces the N3600 system. Details for both are available on the latest [data sheet].
IBM Real-Time Compression appliances work with all N series models to provide additional storage efficiency. Last October, I provided the [Product Name Decoder Ring] for the STN6500 and STN6800 models. The STN6500 supports 1 GbE ports, and the STN6800 supports 10GbE ports (or a mix of 10GbE and 1GbE, if you prefer). The IBM versions of these models were announced last December, but some people were on vacation and might have missed it. For more details of this, read the [Resources page], the [landing page], or [watch this video].
IBM System Storage DS3000 series
IBM System Storage [DS3524 Express DC and EXP3524 Express DC] models are powered with direct current (DC) rather than alternating current (AC). The DS3524 packs dual controllers and two dozen small-form factor (2.5 inch) drives in a compact 2U-high rack-optimized module. The EXP3524 provides addition disk capacity that can be attached to the DS3524 for expansion.
Large data centers, especially those in the Telecommunications Industry, receive AC from their power company, then store it in a large battery called an Uninterruptible Power Supply (UPS). For DC-powered equipment, they can run directly off this battery source, but for AC-powered equipment, the DC has to be converted back to AC, and some energy is lost in the conversion. Thus, having DC-powered equipment is more energy efficient, or "green", for the IT data center.
Whether you get the DC-powered or AC-powered models, both are NEBS-compliant and ETSI-compliant.
New Tape Drive Options for Autoloaders and Libraries
IBM System Storage [TS2900 Autoloader] is a compact 1U-high tape system that supports one LTO drive and up to 9 tape cartridges. The TS2900 can support either an LTO-3, LTO-4 or LTO-5 half-height drive.
IBM System Storage [TS3100 and TS3200 Tape Libraries] were also enhanced. The TS3100 can accomodate one full-height LTO drive, or two half-height drives, and hold up to 24 cartridges. The TS3200 offers twice as many drives and space for cartridges.
The IBM Challenge was a big success. One of the contestants, Ken Jennings, [welcomes our new computer overlords]. Congratulations are in order to the IBM Research team who pulled off this Herculean effort!
Some folks have poked fun at some of the odd responses and wager amounts from the IBM Watson computer during the three-day tournament. Others were surprised as I was that the impressive feat was done with less than 1TB of stored data. Here is what John Webster wrote in CNET yesterday, in hist article [What IBM's Watson says to storage systems developers]:
"All well and good. But here's what I find most interesting as a result of what IBM has done in response to the Grand Challenge that motivated Watson's creators. We know, from Tony Pearson's blog, that the foundation of Watson's data storage system is a modified IBM SONAS cluster with a total of 21.6TB of raw capacity. But Pearson also reveals another very significant, and to me, surprising data point: "When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, the actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1 Terabyte."
What Pearson just said is that the data set Watson actually uses to reach his push-the-button decision would fit on a 1TB drive. So much for big data?"
To better appreciate how difficult the challenge was, and how a small amount of data can answer a billion different questions, I thought I would cover Business Intelligence, Data Retrieval and Text Mining concepts.
"In this paper, business is a collection of activities carried
on for whatever purpose, be it science, technology,
commerce, industry, law, government, defense, et cetera.
The communication facility serving the conduct of a business
(in the broad sense) may be referred to as an intelligence
system. The notion of intelligence is also defined
here, in a more general sense, as the ability to apprehend
the interrelationships of presented facts in such a way as
to guide action towards a desired goal."
Ideally, when you need "Business Intelligence" to help you make a better decision, you perform data retrieval from a structured database for the specific information you are looking for. In other cases, you might be looking for insight, patterns or trends. In that case, you go "data mining" against your structured databases.
Here's a simple example. John runs a fruit stand. One day, he kept track of how many apples and oranges were bought by men and women. How many questions can we ask against this small set of data? Let's count them:
How many apples were sold to men?
How many apples were sold to women?
How many oranges were sold to men?
How many oranges were sold to women?
But wait! For each row and column, we can combine them into totals.
How many apples were sold in total?
How many oranges were sold in total?
How many fruit in total were sold to men?
How many fruit in total were sold to women?
How many fruit in total were sold?
But wait, there's more! Each row and column can be evaluated for relative percentages, as well as percentages of each cell compared to the total. You could make five relevant pie-charts from this data. This results in 16 more questions, such as:
Of the fruit purchased by men, what percentage for apples?
Of all the apples purchased, what percentage by women?
And that's not including more ethereal questions, such as:
Are there gender-specific preferences for different types of fruit?
What type of fruit do men prefer?
This is just for a small set, two market segments (by gender) and two products (apples and oranges). However, if you have many market segments (perhaps by age group, zip code, etc.) and many products, the number of queries that can be supported is huge. For small sets of data, you can easily do this with a spreadsheet program like IBM Lotus Symphony or Microsoft Excel.
But why limit yourself to two dimensions? The above example was just for one day's worth of activity, if John captures this data for every day for historical and seasonal trending, it can be represented as a three-dimensional cube. The number of queries becomes astronomical. This is the basis for Online Analytical Processing (OLAP), and three-dimensional tables are often referred to as [OLAP cubes].
Back in 1970, IBM invented the Structured Query Language [SQL], and today, nearly all modern relational databases support this, including IBM DB2, Informix, Microsoft SQL Server, and Oracle DB. SQL poses two challenges. First, you had to structure the data in advance to the way you expect to perform your ad-hoc queries. Deciding the groups and categories in advance can limit the way information is recorded and captured.
Second, you had to be skilled at SQL to phrase your queries correctly to retrieve the data you are after. What ended up happening was that skilled SQL programmers would develop "canned reports" with fixed SQL parameters, so that less-skilled business decision makers could base their decisions from these reports.
IBM has fully integrated stacks to help process structured data, combining servers, storage, and advanced analytics software into a complete appliance. IBM offers the [Smart Analytics System] for robust, customized deployments, and recently acquired [Netezza] for pre-configured, and more rapid deployments.
However, the bigger problem is that more than 80 percent of information is not structured!
Semi-structured data like email provides some searchable fields like From and Subject. The rest of the information is unstructured, such as text files, photographs, video and audio. To look for specific information in unstructured sources can be like looking for a needle in a haystack, and trying to get insight, patterns or trends involves text mining.
This, in effect, is what IBM Watson was able to perform so well this week. Finding the needle in the haystacks of unstructured data from 200 million pages of text stored in its system, combined with the ability to apprehend the interrelationships of meaning and subtle nuance, resulted in an impressive technology demonstration. Certainly, this new technology will be powerful for a variety of use cases across a broad set of industries!
For the longest time, people thought that humans could not run a mile in less than four minutes. Then, in 1954, [Sir Roger Bannister] beat that perception, and shortly thereafter, once he showed it was possible, many other runners were able to achieve this also. The same is being said now about the IBM Watson computer which appeared this week against two human contestants on Jeopardy!
(2014 Update: A lot has happened since I originally wrote this blog post! I intended this as a fun project for college students to work on during their summer break. However, IBM is concerned that some businesses might be led to believe they could simply stand up their own systems based entirely on open source and internally developed code for business use. IBM recommends instead the [IBM InfoSphere BigInsights] which packages much of the software described below. IBM has also launched a new "Watson Group" that has [Watson-as-a-Service] capabilities in the Cloud. To raise awareness to these developments, IBM has asked me to rename this post from IBM Watson - How to build your own "Watson Jr." in your basement to the new title IBM Watson -- How to replicate Watson hardware and systems design for your own use in your basement. I also took this opportunity to improve the formatting layout.)
Often, when a company demonstrates new techology, these are prototypes not yet ready for commercial deployment until several years later. IBM Watson, however, was made mostly from commercially available hardware, software and information resources. As several have noted, the 1TB of data used to search for answers could fit on a single USB drive that you buy at your local computer store.
Take a look at the [IBM Research Team] to determine how the project was organized. Let's decide what we need, and what we don't in our version for personal use:
Do we need it for personal use?
Yes, That's you. Assuming this is a one-person project, you will act as Team Lead.
Yes, I hope you know computer programming!
No, since this version for personal use won't be appearing on Jeopardy, we won't need strategy on wager amounts for the Daily Double, or what clues to pick next. Let's focus merely on a computer that can accept a question in text, and provide an answer back, in text.
Yes, this team focused on how to wire all the hardware together. We need to do that, although this version for personal use will have fewer components.
Optional. For now, let's have this version for personal use just return its answer in plain text. Consider this Extra Credit after you get the rest of the system working. Consider using [eSpeak], [FreeTTS], or the Modular Architecture for Research on speech sYnthesis [MARY] Text-to-Speech synthesizers.
Yes, I will explain what this is, and why you need it.
Yes, we will need to get information for personal use to process
Yes, this team developed a system for parsing the question being asked, and to attach meaning to the different words involved.
No, this team focused on making IBM Watson optimized to answer in 3 seconds or less. We can accept a slower response, so we can skip this.
(Disclaimer: As with any Do-It-Yourself (DIY) project, I am not responsible if you are not happy with your version for personal use I am basing the approach on what I read from publicly available sources, and my work in Linux, supercomputers, XIV, and SONAS. For our purposes, this version for personal use is based entirely on commodity hardware, open source software, and publicly available sources of information. Your implementation will certainly not be as fast or as clever as the IBM Watson you saw on television.)
Step 1: Buy the Hardware
Supercomputers are built as a cluster of identical compute servers lashed together by a network. You will be installing Linux on them, so if you can avoid paying extra for Microsoft Windows, that would save you some money. Here is your shopping list:
Three x86 hosts, with the following:
64-bit quad-core processor, either Intel-VT or AMD-V capable,
8GB of DRAM, or larger
300GB of hard disk, or larger
CD or DVD Read/Write drive
Computer Monitor, mouse and keyboard
Ethernet 1GbE 4-port hub, and appropriate RJ45 cables
Surge protector and Power strip
Local Console Monitor (LCM) 4-port switch (formerly known as a KVM switch) and appropriate cables. This is optional, but will make it easier during the development. Once your implementation is operational, you will only need the monitor and keyboard attached to one machine. The other two machines can remain "headless" servers.
Step 2: Establish Networking
IBM Watson used Juniper switches running at 10Gbps Ethernet (10GbE) speeds, but was not connected to the Internet while playing Jeopardy! Instead, these Ethernet links were for the POWER7 servers to talk to each other, and to access files over the Network File System (NFS) protocol to the internal customized SONAS storage I/O nodes.
The implementation will be able to run "disconnected from the Internet" as well. However, you will need Internet access to download the code and information sources. For our purposes, 1GbE should be sufficient. Connect your Ethernet hub to your DSL or Cable modem. Connect all three hosts to the Ethernet switch. Connect your keyboard, video monitor and mouse to the LCM, and connect the LCM to the three hosts.
Step 3: Install Linux and Middleware
To say I use Linux on a daily basis is an understatement. Linux runs on my Android-based cell phone, my laptop at work, my personal computers at home, most of our IBM storage devices from SAN Volume Controller to XIV to SONAS, and even on my Tivo at home which recorded my televised episodes of Jeopardy!
For this project, you can use any modern Linux distribution that supports KVM. IBM Watson used Novel SUSE Linux Enterprise Server [SLES 11]. Alternatively, I can also recommend either Red Hat Enterprise Linux [RHEL 6] or Canonical [Ubuntu v10]. Each distribution of Linux comes in different orientations. Download the the 64-bit "ISO" files for each version, and burn them to CDs.
Graphical User Interface (GUI) oriented, often referred to as "Desktop" or "HPC-Head"
Command Line Interface (CLI) oriented, often referred to as "Server" or "HPC-Compute"
Guest OS oriented, to run in a Hypervisor such as KVM, Xen, or VMware. Novell calls theirs "Just Enough Operating System" [JeOS].
For this version for personal use, I have chosen a [multitier architecture], sometimes referred to as an "n-tier" or "client/server" architecture.
Host 1 - Presentation Server
For the Human-Computer Interface [HCI], the IBM Watson received categories and clues as text files via TCP/IP, had a [beautiful avatar] representing a planet with 42 circles streaking across in orbit, and text-to-speech synthesizer to respond in a computerized voice. Your implementation will not be this sophisticated. Instead, we will have a simple text-based Query Panel web interface accessible from a browser like Mozilla Firefox.
Host 1 will be your Presentation Server, the connection to your keyboard, video monitor and mouse. Install the "Desktop" or "HPC Head Node" version of Linux. Install [Apache Web Server and Tomcat] to run the Query Panel. Host 1 will also be your "programming" host. Install the [Java SDK] and the [Eclipse IDE for Java Developers]. If you always wanted to learn Java, now is your chance. There are plenty of books on Java if that is not the language you normally write code.
While three little systems doesn't constitute an "Extreme Cloud" environment, you might like to try out the "Extreme Cloud Administration Tool", called [xCat], which was used to manage the many servers in IBM Watson.
Host 2 - Business Logic Server
Host 2 will be driving most of the "thinking". Install the "Server" or "HPC Compute Node" version of Linux. This will be running a server virtualization Hypervisor. I recommend KVM, but you can probably run Xen or VMware instead if you like.
Host 3 - File and Database Server
Host 3 will hold your information sources, indices, and databases. Install the "Server" or "HPC Compute Node" version of Linux. This will be your NFS server, which might come up as a question during the installation process.
Technically, you could run different Linux distributions on different machines. For example, you could run "Ubuntu Desktop" for host 1, "RHEL 6 Server" for host 2, and "SLES 11" for host 3. In general, Red Hat tries to be the best "Server" platform, and Novell tries to make SLES be the best "Guest OS".
My advice is to pick a single distribution and use it for everything, Desktop, Server, and Guest OS. If you are new to Linux, choose Ubuntu. There are plenty of books on Linux in general, and Ubuntu in particular, and Ubuntu has a helpful community of volunteers to answer your questions.
Step 4: Download Information Sources
You will need some documents for your implementation to process.
IBM Watson used a modified SONAS to provide a highly-available clustered NFS server. For this version, we won't need that level of sophistication. Configure Host 3 as the NFS server, and Hosts 1 and 2 as NFS clients. See the [Linux-NFS-HOWTO] for details. To optimize performance, host 3 will be the "official master copy", but we will use a Linux utility called rsync to copy the information sources over to the hosts 1 and 2. This allows the task engines on those hosts to access local disk resources during question-answer processing.
We will also need a relational database. You won't need a high-powered IBM DB2. Your implementation can do fine with something like [Apache Derby] which is the open source version of IBM CloudScape from its Informix acquisition. Set up Host 3 as the Derby Network Server, and Hosts 1 and 2 as Derby Network Clients. For more about structured content in relational databases, see my post [IBM Watson - Business Intelligence, Data Retrieval and Text Mining].
Linux includes a utility called wget which allows you to download content from the Internet to your system. What documents you decide to download is up to you, based on what types of questions you want answered. For example, if you like Literature, check out the vast resources at [FullBooks.com]. You can automate the download by writing a shell script or program to invoke wget to all the places you want to fetch data from. Rename the downloaded files to something unique, as often they are just "index.html". For more on wget utility, see [IBM Developerworks].
Step 5: The Query Panel - Parsing the Question
Next, we need to parse the question and have some sense of what is being asked for. For this we will use [OpenNLP] for Natural Language Processing, and [OpenCyc] for the conceptual logic reasoning. See Doug Lenat presenting this 75-minute video [Computers versus Common Sense]. To learn more, see the [CYC 101 Tutorial].
Unlike Jeopardy! where Alex Trebek provides the answer and contestants must respond with the correct question, we will do normal Question-and-Answer processing. To keep things simple, we will limit questions to the following formats:
Who is ...?
Where is ...?
When did ... happen?
What is ...?
Host 1 will have a simple Query Panel web interface. At the top, a place to enter your question, and a "submit" button, and a place at the bottom for the answer to be shown. When "submit" is pressed, this will pass the question to "main.jsp", the Java servlet program that will start the Question-answering analysis. Limiting the types of questions that can be posed will simplify hypothesis generation, reduce the candidate set and evidence evaluation, allowing the analytics processing to continue in reasonable time.
Step 6: Unstructured Information Management Architecture
The "heart and soul" of IBM Watson is Unstructured Information Management Architecture [UIMA]. IBM developed this, then made it available to the world as open source. It is maintained by the [Apache Software Foundation], and overseen by the Organization for the Advancement of Structured Information Standards [OASIS].
Basically, UIMA lets you scan unstructured documents, gleam the important points, and put that into a database for later retrieval. In the graph above, DBs means 'databases' and KBs means 'knowledge bases'. See the 4-minute YouTube video of [IBM Content Analytics], the commercial version of UIMA.
Starting from the left, the Collection Reader selects each document to process, and creates an empty Common Analysis Structure (CAS) which serves as a standardized container for information. This CAS is passed to Analysis Engines , composed of one or more Annotators which analyze the text and fill the CAS with the information found. The CAS are passed to CAS Consumers which do something with the information found, such as enter an entry into a database, update an index, or update a vote count.
(Note: This point requires, what we in the industry call a small matter of programming, or [SMOP]. If you've always wanted to learn Java programming, XML, and JDBC, you will get to do plenty here. )
If you are not familiar with UIMA, consider this [UIMA Tutorial].
Step 7: Parallel Processing
People have asked me why IBM Watson is so big. Did we really need 2,880 cores of processing power? As a supercomputer, the 80 TeraFLOPs of IBM Watson would place it only in 94th place on the [Top 500 Supercomputers]. While IBM Watson may be the [Smartest Machine on Earth], the most powerful supercomputer at this time is the Tianhe-1A with more than 186,000 cores, capable of 2,566 TeraFLOPs.
To determine how big IBM Watson needed to be, the IBM Research team ran the DeepQA algorithm on a single core. It took 2 hours to answer a single Jeopardy question! Let's look at the performance data:
Number of cores
Time to answer one Jeopardy question
Single IBM Power750 server
< 4 minutes
Single rack (10 servers)
< 30 seconds
IBM Watson (90 servers)
< 3 seconds
The old adage applies, [many hands make for light work]. The idea is to divide-and-conquer. For example, if you wanted to find a particular street address in the Manhattan phone book, you could dispatch fifty pages to each friend and they could all scan those pages at the same time. This is known as "Parallel Processing" and is how supercomputers are able to work so well. However, not all algorithms lend well to parallel processing, and the phrase [nine women can't have a baby in one month] is often used to remind us of this.
Fortuantely, UIMA is designed for parallel processing. You need to install UIMA-AS for Asynchronous Scale-out processing, an add-on to the base UIMA Java framework, supporting a very flexible scale-out capability based on JMS (Java Messaging Services) and ActiveMQ. We will also need Apache Hadoop, an open source implementation used by Yahoo Search engine. Hadoop has a "MapReduce" engine that allows you to divide the work, dispatch pieces to different "task engines", and the combine the results afterwards.
Host 2 will run Hadoop and drive the MapReduce process. Plan to have three KVM guests on Host 1, four on Host 2, and three on Host 3. That means you have 10 task engines to work with. These task engines can be deployed for Content Readers, Analysis Engines, and CAS Consumers. When all processing is done, the resulting votes will be tabulated and the top answer displayed on the Query Panel on Host 1.
Step 8: Testing
To simplify testing, use a batch processing approach. Rather than entering questions by hand in the Query Panel, generate a long list of questions in a file, and submit for processing. This will allow you to fine-tune the environment, optimize for performance, and validate the answers returned.
There you have it. By the time you get your implementation fully operational, you will have learned a lot of useful skills, including Linux administration, Ethernet networking, NFS file system configuration, Java programming, UIMA text mining analysis, and MapReduce parallel processing. Hopefully, you will also gain an appreciation for how difficult it was for the IBM Research team to accomplish what they had for the Grand Challenge on Jeopardy! Not surprisingly, IBM Watson is making IBM [as sexy to work for as Apple, Google or Facebook], all of which started their business in a garage or a basement with a system as small as this version for personal use.
It's Tuesday again, and that means one thing.... IBM Announcements! On the heels of [last week's announcements], IBM announced some additional products of interest to storage administrators.
IBM Information Archive
Back in 2008, IBM [unveiled the Information Archive]. This storage solution provides automated policy-based tiering between disk and tape, with non-erasable non-rewriteable enforcement to protect against unethical tampering of data. The initial release supported [both files and object storage], with support for different collections, each with its own set of policies for management. However, it only supported NFS initially for the file protocol. Today, IBM announces the addition of CIFS protocol support, which will be especially helpful in healthcare and life sciences, as much of the medical equipment is designed for CIFS protocol storage.
Also, Information Archive will now provide a full index and search feature capability to help with e-Discovery. Searches and retrievals can be done in the background without disrupting applications or the archiving operations.
IBM Tivoli Storage Manager for Virtual Environments V6.2 extends capabilities that currently exist in IBM Tivoli Storage Manager. TSM backup/archive clients run fine on guest operating systems, but now this new extension improves backup for VMware environments. TSM provides incremental block-level backups utilizing VMware's vStorage APIs for Data Protection and Changed Block Tracking features.
To minimize impact to the VMware host, TSM for VE make use of non-disruptive snapshots and offload the backup processing to a vStorage backup server. This supports file-level recovery, volume-level recovery, and full VM recovery. Of course, since it is based on TSM v6, you get advanced storage efficiency features such as compression and deduplication to minimize consumption of disk storage pools.
IBM Tivoli Monitor has been extended to support virtual servers, including VMware, Linux KVM, and Citrix XenServer. This can help with capacity planning, performance monitoring, and availability. Tivoli Monitor will help you understand the relationships between physical and virtual resources to help isolate problems to the correct resource, reducing the time it takes for debug issues between servers and storage. See the
Next week is [IBM Pulse2011 Conference] in Las Vegas, February 27 to March 2. Sorry, I don't plan to be there this year. It is looking to be a great conference, with fellow inventor Dean Kamen as the keynote speaker. For a blast from the past, read my blog posts from Pulse2008 [Main Tent sessions] and [Breakout sessions].
My series last week on IBM Watson (which you can read [here], [here], [here], and [here]) brought attention to IBM's Scale-Out Network Attached Storage [SONAS]. IBM Watson used a customized version of SONAS technology for its internal storage, and like most of the components of IBM Watson, IBM SONAS is commercially available as a stand-alone product.
Like many IBM products, SONAS has gone through various name changes. First introduced by Linda Sanford at an IBM SHARE conference in 2000 under the IBM Research codename Storage Tank, it was then delivered as a software-only offering SAN File System, then as a services offering Scale-out File Services (SoFS), and now as an integrated system appliance, SONAS, in IBM's Cloud Services and Systems portfolio.
If you are not familiar with SONAS, here are a few of my previous posts that go into more detail:
This week, IBM announces that SONAS has set a world record benchmark for performance, [a whopping 403,326 IOPS for a single file system]. The results are based on comparisons of publicly available information from Standard Performance Evaluation Corporation [SPEC], a prominent performance standardization organization with more than 60 member companies. SPEC publishes hundreds of different performance results each quarter covering a wide range of system performance disciplines (CPU, memory, power, and many more). SPECsfs2008_nfs.v3 is the industry-standard benchmark for NAS systems using the NFS protocol.
(Disclaimer: Your mileage may vary. As with any performance benchmark, the SPECsfs benchmark does not replicate any single workload or particular application. Rather, it encapsulates scores of typical activities on a NAS storage system. SPECsfs is based on a compilation of workload data submitted to the SPEC organization, aggregated from tens of thousands of fileservers, using a wide variety of environments and applications. As a result, it is comprised of typical workloads and with typical proportions of data and metadata use as seen in real production environments.)
The configuration tested involves SONAS Release 1.2 on 10 Interface Nodes and 8 Storage Pods, resulting a single file system over 900TB usable capacity.
10 Interface Nodes; each with:
Maximum 144 GB of memory
One active 10GbE port
8 Storage Pods; each with:
2 Storage nodes and 240 drives
Drive type: 15K RPM SAS hard drives
Data Protection using RAID-5 (8+P) ranks
Six spare drives per Storage Pod
IBM wanted a realistic "no compromises" configuration to be tested, by choosing:
Regular 15K RPM SAS drives, rather than a silly configuration full of super-expensive Solid State Drives (SSD) to plump up the results.
Moderate size, typical of what clients are asking for today. The Goldilocks rule applies. This SONAS is not a small configuration under 100TB, and nowhere close to the maximum supported configuration of 7,200 disks across 30 Interface Nodes and 30 Storage Pods.
Single file system, often referred to as a global name space, rather than using an aggregate of smaller file systems added together that would be more complicated to manage. Having multiple file systems often requires changes to applications to take advantage of the aggregate peformance. It is also more difficult to load-balance your performance and capacity across multiple file systems. Of course, SONAS can support up to 256 separate file systems if you have a business need for this complexity.
The results are stunning. IBM SONAS handled three times more workload for a single file system than the next leading contender. All of the major players are there as well, including NetApp, EMC and HP.
Guest Post: The following post was written by Tom Rauchut, IBM Infrastructure Architect and Advanced Technical Sales Specialist for Tivoli Automation. Tom is at IBM Pulse 2011 for Las Vegas this week, and has offered to send his observations.
The expo opened last night. There are so many fantastic demos and product experts. Las Vegas has a Tivoli buzz on right now.
This week was the IBM Pulse 2011 converence in Las Vegas, Nevada, with over 7,000 attendees. I wasn't there, and my on-the-scene correspondent was too busy running the hands-on lab to get out and attend sessions. Fortunately, I was able to watch some of the [IBM Software live stream], and here are my thoughts and observations.
Fellow inventor [Dean Kamen] was the keynote speaker. His inventions help people, making the world a better place. Here are three examples I found interesting during his talk:
Helping third world countries
Dean started out with his favorite quote:
"A problem well defined is a problem half-solved." - John Dewey
Dean mentioned that we are fortunate, having both potable drinking water and a reliable supply of electricity, but 2 to 4 billion people on the planet do not. Sponsored by Coca-Cola, Dean and his team of innovators were able to come up with small units that can be placed in a village or town. One unit takes in wet liquid and produces potable drinking water. The other unit takes combustible materials, like cow dung, and products electricity. Each unit is roughly the size of half a standard server rack. What does Coca-Cola get out of this? New "vending machines"! By combining drinking water with flavored syrups, they can create soft drinks on demand.
Dean's opinion was that if you want something done, you need to work with large corporations, as governments are mired in beauracracy and rules. I agree. When I first joined IBM, I was introduced to [TRIZ] which was a systematic method for solving problems. IBM's best and brightest are working to solve some of the toughest computer science challenges. For more on TRIZ, see this blog post about [TRIZ in BusinessWeek].
Helping injured veterans
Dean Kamen is well known for inventing the two-wheeled [Segway Personal Transporter], but his company, [DEKA], makes all kinds of things, mostly medical equipment. To help wounded soldiers returning from Iraq or Afghanistan without one or both arms, Dean and his team developed a robotic arm that has enough motor dexterity to pick up a raisin or grape off the table without dropping or squashing it. Dean has appeared several times on the Colbert Report, and here is a video of the robotic arm:
I have myself enjoyed riding a Segway. A local place in Tucson uses them to lead tourists through downtown Tucson and the University of Arizona campus.
Helping young students to learn science and technology
Dean wrapped up his talking by talking about his passion about "For Inspiration and Recognition of Science and Technology" or [FIRST]. Modeled after sports competitions, FIRST encourages teams of kids to build robots that perform specific tasks. Every year, companies and universities sponsor teams by purchasing robot kits from FIRST. Teams compete in regional competitions, and then the best of those go on to compete in a stadium in Atlanta, Georgia, hosting 76,000 people cheering for their teams.
Unlike other school sports (Football, Basketball, Baseball, etc.) where a student is more likely to win the lottery than get a successful career as a professional athlete, every student involved in FIRST competitions can "go pro". A study of FIRST success tracked students who participated in competitions, and found a substantial improvement in percentage of those students attending college and working as science and engineering professionals.
I am a big fan of encouraging kids of all ages to learn more about science, technology, engineering and math [STEM]. Back in 2009, I blogged about my involvement with [One Laptop Per Child] and [Junior FIRST Lego League]. I've gotten a great reaction to my latest challenge, to build a Watson Jr. in your own basement, based on my [step-by-step] instructions.
If you attended IBM Pulse this week, please comment on your thoughts and observations!
Wrapping up my week's coverage of the IBM Pulse 2011 conference, I have had several people ask me to explain IBM's latest initiative, Smarter Computing, which IBM launched this week at this conference. Having led the IT industry through the Centralized Computing era and the Distributed Computing era, IBM is now well-positioned to help companies, governments and non-profit organizations to enter the new Smarter Computing era, focused on insight and discovery.
Thousands of IT professionals
Effiicent, but only the largest companies and governments had them
Millions of office workers
Personal computers (PC)
Innovative, extending the reach to small and medium-sized businesses, but resulted in server sprawl and increased TCO
Billions of people
Smart phones and other handheld devices
Efficient and Innovative, combining the best of centralized and distributed computing
1952 to 1980
1981 to 2010
2011 and beyond
To help clients with this transition, IBM's Smarter Computing initiative has three main components. This is a corporate-wide strategy, with systems, software and services all working together to realize results.
The first component is Big Data. This combines three different sources of data:
Traditional structured data in OLTP databases and OLAP data warehouses, using data management solutions like DB2 and IBM Netezza.
Unstructured data, including text documents, images, audio, and video, processed with massive parallelism using IBM BigInsights and Apache Hadoop.
Real-Time Analytics Processing (RTAP) of incoming data, including video surveillance, social media, RFID chips, smart meters, and traffic control systems, processed with IBM InfoSphere Streams
Of course, Big Data will bring new opportunities on the storage front, which I will save for a future post!
Rather than general purpose IT equipment, we have now the scale and scope to specialize with systems optimized for particular workloads, the second component of the Smarter Computing initiative. Of course, IBM has been delivering integrated stacks of systems, software and services for decades now, but it is important to remind people of this, as IBM now has a spate of competitors all trying to follow IBM's lead in this arena.
As with Big Data, the focus on Optimized Systems has impacted IBM's strategy on storage as well. I'll save that discussion for a future post as well!
I am glad that nearly all of the storage vendors have standardized to a common definition for Cloud, the third component of Smarter Computing, which shows that this concept has matured:
Cloud computing is a pay-per-use model for enabling network access to a pool of computing resources that can be provisioned and released rapidly with minimal management effort or service provider interaction. -- U.S. National Institute of Standards and Technology [nist.gov]
Of course, Cloud is just an evolution of IBM's Service Bureau business of the 1960s and 1970s, renting out time-sharing on mainframe systems, Grid Computing of the 1980s, and Application Service Providers that popped up in the 1990s. While the [butchers, bakers and candlestick makers] that IBM competes against might focus their efforts on just private cloud or just public cloud, IBM recognizes the reality is that different clients will need different solutions. Rather than rip-and-replace, IBM will help clients transition to cloud via inclusive solutions that adopt a hybrid approach:
Traditional enterprise with private cloud deployments, using solutions like IBM CloudBurst, SONAS and Information Archive
Traditional enterprise with public cloud services to handle seasonable peaks, providing offsite resiliency, and solutions for a mobile workforce
Hybrid clouds that blend private and public cloud services, to handle seasonal peak workloads, remote and branch offices
IBM's emphasis on IT Infrastructure Library (ITIL), Tivoli and Maximo products will play well in this space to provide integrated service management across traditional and cloud deployments. This is why IBM decided to launch Smarter Computing initiative at Pulse 2011 conference, the industry's premiere conference on intergrated service management.
The IBM Watson that competed on Jeopardy! is an excellent example of all three components of Smarter Computing at work.
IBM Watson was able to respond to Jeopardy! clues within three seconds, processing a combination of database searches with DB2 and text-mining analytics of unstructured data with IBM BigInsights.
IBM Watson combined servers, software and storage into an integrated supercomputer that was optimized for one particular workload: playing Jeopardy!
IBM Watson used many technologies prevalent in private and public cloud computing systems, storing its data on a modified version of SONAS for storage, using xCat administration tools, networking across 10GbE Ethernet, and massive parallel processing through lots of PowerVM guest images.
Webcast: How to Diagnose and Cure What Ails Your Storage Infrastructure
Wednesday, March 23, 2011 at 11:00 AM PDT / 11:00 AM Arizona MST / 2:00 PM EDT
Storage is the most poorly utilized infrastructure element -- and the most costly part of hardware budgets -- in most IT shops today. And it’s getting worse. Storage management typically involves nightmarish mash-up of tools for capacity management, performance management and data protection management unique to each array deployed in heterogeneous fabrics. Server and desktop virtualization seem to have made management issues worse, and coming on the heels of changing workloads and data proliferation is the requirement to add data management to the set of responsibilities shouldered by fewer and fewer storage professionals. Forecast for Storage in 2012: more pain as long delayed storage infrastructure refresh becomes mandatory.
In this webcast, fellow blogger Jon Toigo, CEO of Toigo Partners International, of [DrunkenData] fame, and I will take turns assessing the challenges and suggesting real-world solutions to the many issues that confound storage efficiency in contemporary IT. Integrating real world case studies and technology insights, our storage experts will deliver a must see webcast that sets down a strategy for fixing storage...before it fixes you.
Don't miss this event, unless you like the stress of knowing that your next disaster may be a data disaster.
IBM and the Austin Chamber of Commerce is inviting registered SXSW Interactive attendees to the networking reception being hosted by the IBM Innovation Center and the IBM Venture Capital Group. Power Systems and Watson will have a significant feature at this SXSW event to be held on March 14, 2011.
While I won't be there personally at the SXSW conference, I strongly recommend you to attend this event.
Innovators and Entrepreneurs Networking Reception
Four Seasons Hotel
March 14, 2011
Hosted by IBM Venture Capital Group, Austin Chamber of Commerce, and the IBM Innovation Center.
This reception will provide a rare opportunity to network and collaborate with your professional community of industry leaders, entrepreneurs, developers, academics, venture capitalists, members of the Austin Chamber of Commerce.
When I turned on the television last weekend, I saw large waves of water knock down rows of small houses. I thought I had caught the end of a bad Godzilla movie, but sadly it was not movie special effects. Mother Nature can be quite destructive. Over the past four days, Japan has been hit hard by a series of earthquakes and resulting tsunami.
(Note: Disasters can happen anywhere and at any time. Last month, New Zealand had an earthquake as well. It is best to always be prepared. If you haven't done so lately, check out the latest recommendations from the US Government [Ready.Gov] website.)
Several have asked me how this tragedy in Japan might affect IBM and its clients. Here is what I have gathered from various sources. All IBM Japan employees have survived, are safe and reporting no major injuries. IBM has four major facilities, near central part of the country around Tokyo, far from Sendai, the epicenter. All IBM buildings are still standing and operational. A few sections of Tokyo are affected by scheduled brown-outs in an effort to save electricity. Employees are asked to telecommute (a.k.a. work from home) to minimize traffic congestion.
Hakozaki - Headquarters and executive briefing center
Makuhari - Technical Center, where we often hold conferences and other events
Yamato - Research Facility, where R&D is done for IBM tape storage products
Toyosu - Service Delivery Center
I have been to Japan many times throughout my career. Back in the summer of 1995, IBM sent me to Osaka to help out clients in the aftermath of the Great Hanshin eartquake near Kobe. I remember it well, sending an email back to my team saying "It is 1995, and here in Japan it is 95 degrees and 95 percent humidiy." It was seven months after the earthquake, but people were still living in cardboard boxes and make-shift tents.
Many people asked if I will be going back to Japan to help out. I speak Japanese, can make sense of the Japanese Katakana characters on computer monitors, and am an expert in Disaster Recovery. However, the IBM Japan team is doing an awesome job helping our clients restore their data and recovery their business operations. Of course, if IBM needs me in Japan, I will gladly go, but so far, it doesn't seem that I am needed there.
IBM announced that it will offer [three free months of IBM Smart Business Cloud] computing and storage services to government agencies, charitable non-profit organizations, and other organizations involved with reconstruction resulting from the earthquakes and tsunami in Japan and the northern Pacific region.
With traditional communications down, and many data centers incapacitated, Cloud Computing can be a great way to resume operations. According to the announcement, organizations can submit their requests now until April 30, and the program will run until July 31, 2011. Options include:
Virtual machine images running 32-bit and 64-bit versions of Linux or Windows.
60GB to 2 TB of disk storage per instance.
Options for various IBM middleware (DB2, Informix, Lotus, and WebSphere)
Rational Application Lifecycle Management and Tivoli Monitoring software
The offer also includes [LotusLive] Software-as-a-Service (SaaS) for email and online collaboration. For more about LotusLive, see this [Red Paper].
A reader from New Zealand expressed concern some corporate bloggers were [using the earthquake for marketing]. He lost someone close to him in Christchurch, and is unable to reach a friend living in Japan, so I am sorry for his loss. I plan to be in Australia and New Zealand to teach a Top Gun class May 15-27, so hopefully I will be able to meet him in person when I am down there.
"Earmarking funds is a really good way of hobbling relief organizations and ensuring that they have to leave large piles of money unspent in one place while facing urgent needs in other places. ... Meanwhile, the smaller and less visible emergencies where NGOs can do the most good are left unfunded.
In the specific case of Japan, there's all the more reason not to donate money. Japan is a wealthy country which is responding to the disaster, among other things, by printing hundreds of billions of dollars' worth of new money."
Another reader mentioned that the last surviving American WW-II vet died the same week. WTF? IBM and Japan have been allies for quite a while now, and there is no reason to bring up past wars except to compare the scope and magnitude of the cleanup effort. (Update: Frank Buckles was the last surviving WW-I vet, but also served in WW-II).
Many readers felt that charity begins at home, and there are plenty of worthy causes right here in the USA to donate to instead. Inspired by last year's movie [Waiting for Superman], my girlfriend started a project called [Centers for My Super Stars] for her first grade class on DonorsChoose.org. For those not familiar with this website, DonorsChoose.org uses the cloud to connect school teachers in need of supplies with rich people to donate funds towards these projects. If you want to contribute to her project, [donate here].
"And speaking of class, there just happens to be a baseball team in Sendai, Japan. The Golden Eagles. Their stadium was severely damaged from the earthquake. Wouldn't you think some of them lug nuts who run American baseball would bring the Golden Eagles and their opponents over to the United States when the Japanese season starts -- play some games over here and raise money to help the Japanese? Wouldn't you think they could just once stop that national pastime stuff and help the international pastime?"
As you can see, different readers have different opinions on this. We are all on this world together, and both our economy and our ecology are more interconnected than you might think. Let's build a smarter planet.
Before we started, we asked the first survey question: "How is storage planning conducted in your shop?" Of the various responses, nearly four out of ten responded "Part of an overall IT infrastructure strategy".
Jon Toigo went first, and spent 20 minutes or so laying out the problem as he sees it. Jon travels all over visiting customers struggling with their storage infrastructures, so he gets to hear a lot of this first hand.
I then spent 20 minutes or so presenting IBM's vision, strategy and offerings to help solve these problems. I could speak for hours on this topic, but we kept it short for this one-hour webcast. To learn more, request a visit to the Tucson Executive Briefing Center.
At the end of my talk, we put out the second survey, asking the audience "What is your number one priority with respect to storage operations today?" Over one fourth of the attendees were focused on reducing storage infrastructure cost of ownership by any means possible.
I am glad we saved the last 15 minutes for Q&A, as there were a lot of questions.
The replay is now available. If you attended the event and want to hear it again, or want to share it with your colleagues, or you missed it and want to hear it, then [Register for the Replay].
Last night, I presented an E-Talk to the Engineering Student Council (ESC) of the University of Arizona (UofA).
The ESC is the student governing body of The University of Arizona’s College of Engineering. The organization works with scholastic honorary societies, professional organizations, and project clubs to aid and encourage the professional and social development of students. This year, ESC launched a new program, Engineering Talks (E-Talks), consisting of workshops and lectures, which will focus on teaching students what it takes to work within a company, before they enter the workforce. To make this program successful, career advice from professionals working at established companies is essential.
The audience was a mix of undergraduate and graduate engineering students from a variety of disciplines, such as Petroleum, Hydrology, Mining, Biomedical, Electrical and Computer Engineering. Only a few were graduating this May. There were roughly an equal number of boys and girls, which was encouraging. When I was an engineering student at the UofA, women engineers were very rare.
A little about myself, my academic and professional career over the past 30 years, and some background of IBM as a company, how it is organized, and its 100 year Centennial celebration.
An overview of IBM's corporate strategy for Smarter Computing, explaining how IBM is solving the world's toughest challenges for analyzing Big Data, developing Optimized Systems for particular workloads, and new delivery and deployment models for Cloud Computing.
Some career advice, based on my decades of work experience at IBM and elsewhere
After the Q&A, several students stayed around afterwards to ask questions. This seems to happen every time I give a presentation to a mixed audience. I handed out plenty of business cards, and offered to make the charts available to all the students via the IBM Expert Network on Slideshare.net website.
Next week, April 6, IBM will host the [Smarter Computing Virtual Event] to cover IBM's Smarter Computing initiative, with key themes of Smarter Computing - Big Data, Optimized Systems, and Cloud. Smarter Computing is a new and innovative approach to computing based on the evolving role of IT in your business and an intrinsic understanding of the economics of IT.
(I found it amusing that EMC has chosen two of IBM's themes, "Big Data" and "Cloud", for their upcoming EMC World 2011 conference. I was tempted to include their graphic, but people might have accused me of using Photoshop or GIMP to make EMC look bad. Instead, you can look at the graphic on this blog post titled [When Cloud Meets Big Data: Information Logistics Revisited] by fellow blogger Chuck Hollis from EMC. IBM has been a leader in IT for decades, so we are used to having other companies follow in our footsteps. As an [IBM wannabee], EMC is no different.)
For many on tight travel budgets, this event REQUIRES NO TRAVEL! This is a virtual event, You can participate from your desk. You will hear from key IBM executives, all of which I have heard speak myself, so I can vouch that this should be a good event.
Steve Mills - IBM Senior Vice President and Group Executive, Software and Systems (my seventh-line manager)
Tom Rosamilia - IBM General Manager, Power and Mainframe Systems, IBM Systems and Technology Group.
Robert LaBlanc - IBM Senior Vice President, Middleware Software
Helene Armitage - IBM General Manager, Systems Software, Systems and Technology Group.
This event is targeted to CIOs, IT Directors and Managers, Business Analysts, Systems and Storage Administrators, and DBAs. However, we don't check what your actual title is, so feel free to attend even if you have different job responsibilities.
I am giving you one week's notice for this event. If this is the first time you have heard of this event, then I hope that is enough time to plan for this event in your busy schedule. If you had heard of it already, perhaps this serves as a useful reminder to [Register Now!] Is a week ahead the right amount of time? For virtual events, do we need more or less advance notice? What about for events that involve travel? Feel free to enter your thoughts on this in the comments section below.
Normally, when EMC fails, it is worth a giggle. Companies are run by humans, and nobody is perfect. However, their latest one, failing to defend their RSA SecurID two-factor website, is no laughing matter. Breaches like this undermine the trust needed for business and commerce to be done with Information Technology, so it affects the entire IT industry.
(FTC Disclosure: I do not work or have any financial investments in either EMC nor ENC Security Systems. Neither EMC nor ENC Security Systems paid me to mention them on this blog. Their mention in this blog is not an endorsement of either company or their products. Information about EMC was based solely on publicly available information made available by EMC and others. My friends at ENC Security Systems provided me an evaluation license for their latest software release so that I could confirm the use cases posed in this post.)
Of course, EMC did the right thing by making this breach public in an [Open Letter to RSA Customers]. While this may affect their revenues, as clients question whether they should do business with EMC, or affect their stock price, as investors question whether they should invest in EMC, they were very clear and public that the breach occurred. As far as I know, none of the executives of the RSA security division have stepped down. The disclosure of the breach was the right thing to do, and required by law from the [US Securities Exchange Commission]. This law was created to prevent companies from trying to hide breaches that expose external client information.
The breach does not affect RSA public/private key pairs used by IBM and most every other large company. Rather, this breach was targeted to RSA SecurID two-factor authentication. I explained two-factor authentication in my blog post [Day 5 Grid, SOA and Cloud Computing - System x KVM solutions], but basically it is an added level of security, requiring something you know (your password) with something you have (such as a magnetic card or key fob). Both are required to gain access to the system.
Breaches happen. Recently, [Hackers found vulnerabilities in the McAfee.com website]. Last month, fellow blogger Chuck Hollis from EMC had a blog post on [Understanding Advanced Persistent Threats (APT)] in the week leading up to their RSA Conference. It was precisely an APT that hit RSA, so the irony of this breach was not lost on the blogosphere. Perhaps Chuck's blog post gave hackers the idea to do this, like saying "I hope terrorists don't bomb this building that hold all of our chemical weapons..." or "I hope bank robbers don't rob this repository where we keep all the cash..."
(The sinister counter-theory, that EMC staged this breach as a marketing stunt to undermine trust in hybrid or public cloud offerings, such as those offered by IBM, Amazon or Salesforce.com, offers an interesting twist. While computer breaches in general are fodder for [Luddites] to argue we should not use computers at all, this particular breach could be used by EMC salesmen to encourage their customers to choose private cloud over hybrid cloud or public cloud deployments. Given all the extra work that RSA SecurID customers have to now do to harden their environments, that would be in bad taste.)
Today, March 31, is World Backup Day. This is because many viruses are triggered to operate on April 1. Just like checking the batteries in your smoke alarms every year, you should ensure that your backup methodology remains valid.
Back in 2008, I was a volunteer for the One Laptop Per Child (OLPC) initiative, and built an XS server to be used for Uruguay. I shipped [this baby off to school] to be the central server that all the student and teacher laptops connected to. It was the gateway to the Internet, as well as the [repository for the blogs of each student]. The blogs were accessible to the public, so that parents could read what their students were writing.
Unfortunately, this public access resulted in my little XS server being attacked by hackers, with IP addresses in Russia and China. Why anyone from either of those two countries wanted to ruin the hopes and dreams of small school children in Uruguay was beyond me. Fortunately, I had planned for remote administration. Backups were taken by me weekly to a second drive that was only mounted when I was dialed in to take the backup. The rest of the time, it was offline, so as not to be written to by hackers.
I also shipped along with the server a bootable DVD that contained a modified version of [System Rescue CD], scripts to start up SSHD daemon, and pre-populated for use with public/private RSA keys for me and eight other administrators located in various countries. To effect repairs, the local operator would reboot to the DVD, and then I could login via "ssh" and restore the operating system, programs and data. Sadly, this meant that the students might have lost some of their most recent blog posts since the last backup.
Please consider reviewing your own backup strategies. If your security were compromised, data was corrupted or lost, would you be able to recover from your backups?
Use Encryption where Appropriate
If you plan to travel this Summer, you may want to consider encryption to protect yourself. ENC Security Systems has just released their latest [Encrypt Stick] which is a USB memory stick pre-loaded with software that provides three features:
Encryption for your files
A secure web browser for accessing sensitive websites
Secure password manager
Many hotels now offer computers for use by the guests. These are typically running some flavor of Windows operating system. Encrypt Stick comes with an EXE file that you can run to browse the web securely, and have access to your encrypted files and passwords, leaving no trace on the hotel lobby computer.
Friends and Family
What if you are visiting friends and family, and they have a Mac instead? No problem, as Encrypt Stick has a DMG file to use on Mac OS X operating system. While you may not be worried about your siblings hacking into your bank account, you may not want them necessarily seeing what sites you visited.
I have been to several airport lounges now that use Linux for their public computers. Makes sense to me, as there are fewer viruses for Linux, and updating Linux is relatively straightforward. However, Encrypt Stick does not support Linux. For my Linux-knowledgeable readers, you can build your own with [Unetbootin] bootable USB memory stick to launch your favorite Linux browser in memory on whatever system you are using. The [Gparted Magic] utility rescue tool includes [TrueCrypt] to encrypt your files. Lastly, you can use [MyPasswordSafe] to hold all of your passwords securely.
Several clients have asked if any of the IBM data-at-rest encrypted disks or tapes are affected by this breach. IBM uses AES encryption for the actual disk and tape media, but we do use RSA keys to encrypt the generated keys used on the TS1120 and TS1130 drives. However, these were not affected by the RSA SecurID breach, and your tapes are safely protected.
Advanced Persistent Threats, viruses and other malware are no laughing matter. If you are concerned about security, contact IBM to help you assess your current environment and help you plan a robust protection strategy.
I started attending the Arizona International Film Festival eight years ago. I took a week off work to see the films, and came back to tell people how enjoyable it was to just sit and watch thirty movies. "Dirty movies?" they would ask. "No, not dirty, thirty!" To avoid further confusion, I quickly switched to saying "I spent the week watching 25 to 35 independent films."
A few weeks ago, the Arizona International Film Festival notified me that my 2007 System Storage video has been recognized for a technical award under the category of "Innovative use of Technology for Animated Short Film." I will receive the award in person this evening, April 1, at the opening ceremony, which starts at 6:00pm, at the [Fox Theater] in downtown Tucson, Arizona.
As is the case with the Oscars and Grammies, technical awards are handed out in smaller ceremonies in advance of the primary award ceremony that recognizes the best actors, directors and films, which will be held April 9.
In this virtual world, avatars representing IBM executives and marketing managers would present our latest products to avatars of the IBM Business Partners, Industry Analysts and the Press. A short "highlights" video that stitched together bits and pieces of the 90-minute event was used by executives at conferences and road shows. I submitted this shortened version to the Airzona International Film Festival back in 2008, so I am glad the judges had finally gotten around to review it. Here it is uploaded as a [YouTube video]:
During the event, I captured the real-time video from my laptop screen using a tool called [FRAPS]. I also had some of my colleagues capture video from different angles in case we needed these in post-production. The technique of capturing computer-generated 3D video from a computer screen is known as Machinima.
I was in Bogota Columbia that week teaching a Top Gun class. I got to the IBM building only to discover the firewall would not let me get through to the Second Life website, so I took a taxi back to the hotel and ran the event from their business center. Then the unthinkable happened, and I got to experience [Columbia's worst power outage in 22 years], in which 98 percent of the country lost power. Luckily, I had enough battery charge on my laptop and was still connected to the Internet to continue with the rest of the event.
Voice-over-IP but it did not have that feature back then. The other $91 was for virtual items in Second Life. I learned how to make virtual objects and use the GNU Image Manipulation Program [GIMP] to create avatar clothing, giveaway items, and demo equipment.
Instead of hiring voice actors, I had IBMers Andy Monshaw, Eric Buckley, Funda Eceral, David Tareen, and Kristie Bell all provide their voice talents directly.
I asked the coordinator of the film festival if there was going to be a &quot;practice session&quot; for the technical award ceremony. She laughed, and said basically that I would just be walking across the stage, receive the award in my left hand as I shook hands with my right, and then turn slightly clockwise to pose while my picture is taken. If I had ever gotten a diploma from high school or college, she said, then I already knew what to expect. "Don't worry," she assured me, "you won't have to give a speech!"
(In lieu of a speech, I would like to thank Christine Heinisch, my video editor, and Katrina Smith, my cinematographer. I could not have won this technical award without their assistance.)
After the ceremony tonight, the film festival (celebrating its 20th anniversary this year) will kick off with the first of 110 films called "Journey from Zanskar", a 90-minute documentary directed by Frederick Marx and narrated by Richard Gere, starting at 8:00pm. The Arizona International Film Festival will continue through April 20, with films being shown on the evenings and weekends so that I won't have to take time off from work.
If you are in the Tucson area, come out and join me tonight at the Fox Theater!
(Update: Yes, this was an April Fools joke! I did not win any awards for this video. I apologize to my friends and family who showed up to see me receive the award that I didn't get.)
The IBM Storwize V7000 was introduced last October, and has proven to be wildly successful. I saw two awesome reviews recently of the IBM Storwize V7000 disk system that I thought I would bring to your attention.
The first review is [IBM Storwize V7000] from Roger Howorth of ZDNet UK. Here are some quotes:
"Under the hood, the Storwize V7000 is built from technologies originally developed for IBM's enterprise-class storage systems, so the V7000 benefits from a comprehensive set of high-end features that have been scaled down for mid-range buyers."
"Initial configuration couldn't be simpler."
"We really liked the layout and functionality of the GUI."
"Storwize V7000 is virtual storage that offers efficiency and flexibility through built-in SSD optimization and "thin provisioning" technologies while enabling users to virtualize and re-use existing disk systems..."
"Storwize V7000 advanced functionality also enables non-disruptive migration of data from existing storage, simplifying implementation and minimizing disruption to users."
"The Storwize V7000 graphical user interface is a browser-based, easy to navigate intuitive GUI."
"ESG Lab found that getting started with the Storwize V7000 disk system was intuitive and straightforward."
"Easy Tier increases the efficiency and simplicity of deploying SSD drives."