Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
For the longest time, people thought that humans could not run a mile in less than four minutes. Then, in 1954, [Sir Roger Bannister] beat that perception, and shortly thereafter, once he showed it was possible, many other runners were able to achieve this also. The same is being said now about the IBM Watson computer which appeared this week against two human contestants on Jeopardy!
(2014 Update: A lot has happened since I originally wrote this blog post! I intended this as a fun project for college students to work on during their summer break. However, IBM is concerned that some businesses might be led to believe they could simply stand up their own systems based entirely on open source and internally developed code for business use. IBM recommends instead the [IBM InfoSphere BigInsights] which packages much of the software described below. IBM has also launched a new "Watson Group" that has [Watson-as-a-Service] capabilities in the Cloud. To raise awareness to these developments, IBM has asked me to rename this post from IBM Watson - How to build your own "Watson Jr." in your basement to the new title IBM Watson -- How to replicate Watson hardware and systems design for your own use in your basement. I also took this opportunity to improve the formatting layout.)
Often, when a company demonstrates new techology, these are prototypes not yet ready for commercial deployment until several years later. IBM Watson, however, was made mostly from commercially available hardware, software and information resources. As several have noted, the 1TB of data used to search for answers could fit on a single USB drive that you buy at your local computer store.
Take a look at the [IBM Research Team] to determine how the project was organized. Let's decide what we need, and what we don't in our version for personal use:
Do we need it for personal use?
Yes, That's you. Assuming this is a one-person project, you will act as Team Lead.
Yes, I hope you know computer programming!
No, since this version for personal use won't be appearing on Jeopardy, we won't need strategy on wager amounts for the Daily Double, or what clues to pick next. Let's focus merely on a computer that can accept a question in text, and provide an answer back, in text.
Yes, this team focused on how to wire all the hardware together. We need to do that, although this version for personal use will have fewer components.
Optional. For now, let's have this version for personal use just return its answer in plain text. Consider this Extra Credit after you get the rest of the system working. Consider using [eSpeak], [FreeTTS], or the Modular Architecture for Research on speech sYnthesis [MARY] Text-to-Speech synthesizers.
Yes, I will explain what this is, and why you need it.
Yes, we will need to get information for personal use to process
Yes, this team developed a system for parsing the question being asked, and to attach meaning to the different words involved.
No, this team focused on making IBM Watson optimized to answer in 3 seconds or less. We can accept a slower response, so we can skip this.
(Disclaimer: As with any Do-It-Yourself (DIY) project, I am not responsible if you are not happy with your version for personal use I am basing the approach on what I read from publicly available sources, and my work in Linux, supercomputers, XIV, and SONAS. For our purposes, this version for personal use is based entirely on commodity hardware, open source software, and publicly available sources of information. Your implementation will certainly not be as fast or as clever as the IBM Watson you saw on television.)
Step 1: Buy the Hardware
Supercomputers are built as a cluster of identical compute servers lashed together by a network. You will be installing Linux on them, so if you can avoid paying extra for Microsoft Windows, that would save you some money. Here is your shopping list:
Three x86 hosts, with the following:
64-bit quad-core processor, either Intel-VT or AMD-V capable,
8GB of DRAM, or larger
300GB of hard disk, or larger
CD or DVD Read/Write drive
Computer Monitor, mouse and keyboard
Ethernet 1GbE 4-port hub, and appropriate RJ45 cables
Surge protector and Power strip
Local Console Monitor (LCM) 4-port switch (formerly known as a KVM switch) and appropriate cables. This is optional, but will make it easier during the development. Once your implementation is operational, you will only need the monitor and keyboard attached to one machine. The other two machines can remain "headless" servers.
Step 2: Establish Networking
IBM Watson used Juniper switches running at 10Gbps Ethernet (10GbE) speeds, but was not connected to the Internet while playing Jeopardy! Instead, these Ethernet links were for the POWER7 servers to talk to each other, and to access files over the Network File System (NFS) protocol to the internal customized SONAS storage I/O nodes.
The implementation will be able to run "disconnected from the Internet" as well. However, you will need Internet access to download the code and information sources. For our purposes, 1GbE should be sufficient. Connect your Ethernet hub to your DSL or Cable modem. Connect all three hosts to the Ethernet switch. Connect your keyboard, video monitor and mouse to the LCM, and connect the LCM to the three hosts.
Step 3: Install Linux and Middleware
To say I use Linux on a daily basis is an understatement. Linux runs on my Android-based cell phone, my laptop at work, my personal computers at home, most of our IBM storage devices from SAN Volume Controller to XIV to SONAS, and even on my Tivo at home which recorded my televised episodes of Jeopardy!
For this project, you can use any modern Linux distribution that supports KVM. IBM Watson used Novel SUSE Linux Enterprise Server [SLES 11]. Alternatively, I can also recommend either Red Hat Enterprise Linux [RHEL 6] or Canonical [Ubuntu v10]. Each distribution of Linux comes in different orientations. Download the the 64-bit "ISO" files for each version, and burn them to CDs.
Graphical User Interface (GUI) oriented, often referred to as "Desktop" or "HPC-Head"
Command Line Interface (CLI) oriented, often referred to as "Server" or "HPC-Compute"
Guest OS oriented, to run in a Hypervisor such as KVM, Xen, or VMware. Novell calls theirs "Just Enough Operating System" [JeOS].
For this version for personal use, I have chosen a [multitier architecture], sometimes referred to as an "n-tier" or "client/server" architecture.
Host 1 - Presentation Server
For the Human-Computer Interface [HCI], the IBM Watson received categories and clues as text files via TCP/IP, had a [beautiful avatar] representing a planet with 42 circles streaking across in orbit, and text-to-speech synthesizer to respond in a computerized voice. Your implementation will not be this sophisticated. Instead, we will have a simple text-based Query Panel web interface accessible from a browser like Mozilla Firefox.
Host 1 will be your Presentation Server, the connection to your keyboard, video monitor and mouse. Install the "Desktop" or "HPC Head Node" version of Linux. Install [Apache Web Server and Tomcat] to run the Query Panel. Host 1 will also be your "programming" host. Install the [Java SDK] and the [Eclipse IDE for Java Developers]. If you always wanted to learn Java, now is your chance. There are plenty of books on Java if that is not the language you normally write code.
While three little systems doesn't constitute an "Extreme Cloud" environment, you might like to try out the "Extreme Cloud Administration Tool", called [xCat], which was used to manage the many servers in IBM Watson.
Host 2 - Business Logic Server
Host 2 will be driving most of the "thinking". Install the "Server" or "HPC Compute Node" version of Linux. This will be running a server virtualization Hypervisor. I recommend KVM, but you can probably run Xen or VMware instead if you like.
Host 3 - File and Database Server
Host 3 will hold your information sources, indices, and databases. Install the "Server" or "HPC Compute Node" version of Linux. This will be your NFS server, which might come up as a question during the installation process.
Technically, you could run different Linux distributions on different machines. For example, you could run "Ubuntu Desktop" for host 1, "RHEL 6 Server" for host 2, and "SLES 11" for host 3. In general, Red Hat tries to be the best "Server" platform, and Novell tries to make SLES be the best "Guest OS".
My advice is to pick a single distribution and use it for everything, Desktop, Server, and Guest OS. If you are new to Linux, choose Ubuntu. There are plenty of books on Linux in general, and Ubuntu in particular, and Ubuntu has a helpful community of volunteers to answer your questions.
Step 4: Download Information Sources
You will need some documents for your implementation to process.
IBM Watson used a modified SONAS to provide a highly-available clustered NFS server. For this version, we won't need that level of sophistication. Configure Host 3 as the NFS server, and Hosts 1 and 2 as NFS clients. See the [Linux-NFS-HOWTO] for details. To optimize performance, host 3 will be the "official master copy", but we will use a Linux utility called rsync to copy the information sources over to the hosts 1 and 2. This allows the task engines on those hosts to access local disk resources during question-answer processing.
We will also need a relational database. You won't need a high-powered IBM DB2. Your implementation can do fine with something like [Apache Derby] which is the open source version of IBM CloudScape from its Informix acquisition. Set up Host 3 as the Derby Network Server, and Hosts 1 and 2 as Derby Network Clients. For more about structured content in relational databases, see my post [IBM Watson - Business Intelligence, Data Retrieval and Text Mining].
Linux includes a utility called wget which allows you to download content from the Internet to your system. What documents you decide to download is up to you, based on what types of questions you want answered. For example, if you like Literature, check out the vast resources at [FullBooks.com]. You can automate the download by writing a shell script or program to invoke wget to all the places you want to fetch data from. Rename the downloaded files to something unique, as often they are just "index.html". For more on wget utility, see [IBM Developerworks].
Step 5: The Query Panel - Parsing the Question
Next, we need to parse the question and have some sense of what is being asked for. For this we will use [OpenNLP] for Natural Language Processing, and [OpenCyc] for the conceptual logic reasoning. See Doug Lenat presenting this 75-minute video [Computers versus Common Sense]. To learn more, see the [CYC 101 Tutorial].
Unlike Jeopardy! where Alex Trebek provides the answer and contestants must respond with the correct question, we will do normal Question-and-Answer processing. To keep things simple, we will limit questions to the following formats:
Who is ...?
Where is ...?
When did ... happen?
What is ...?
Host 1 will have a simple Query Panel web interface. At the top, a place to enter your question, and a "submit" button, and a place at the bottom for the answer to be shown. When "submit" is pressed, this will pass the question to "main.jsp", the Java servlet program that will start the Question-answering analysis. Limiting the types of questions that can be posed will simplify hypothesis generation, reduce the candidate set and evidence evaluation, allowing the analytics processing to continue in reasonable time.
Step 6: Unstructured Information Management Architecture
The "heart and soul" of IBM Watson is Unstructured Information Management Architecture [UIMA]. IBM developed this, then made it available to the world as open source. It is maintained by the [Apache Software Foundation], and overseen by the Organization for the Advancement of Structured Information Standards [OASIS].
Basically, UIMA lets you scan unstructured documents, gleam the important points, and put that into a database for later retrieval. In the graph above, DBs means 'databases' and KBs means 'knowledge bases'. See the 4-minute YouTube video of [IBM Content Analytics], the commercial version of UIMA.
Starting from the left, the Collection Reader selects each document to process, and creates an empty Common Analysis Structure (CAS) which serves as a standardized container for information. This CAS is passed to Analysis Engines , composed of one or more Annotators which analyze the text and fill the CAS with the information found. The CAS are passed to CAS Consumers which do something with the information found, such as enter an entry into a database, update an index, or update a vote count.
(Note: This point requires, what we in the industry call a small matter of programming, or [SMOP]. If you've always wanted to learn Java programming, XML, and JDBC, you will get to do plenty here. )
If you are not familiar with UIMA, consider this [UIMA Tutorial].
Step 7: Parallel Processing
People have asked me why IBM Watson is so big. Did we really need 2,880 cores of processing power? As a supercomputer, the 80 TeraFLOPs of IBM Watson would place it only in 94th place on the [Top 500 Supercomputers]. While IBM Watson may be the [Smartest Machine on Earth], the most powerful supercomputer at this time is the Tianhe-1A with more than 186,000 cores, capable of 2,566 TeraFLOPs.
To determine how big IBM Watson needed to be, the IBM Research team ran the DeepQA algorithm on a single core. It took 2 hours to answer a single Jeopardy question! Let's look at the performance data:
Number of cores
Time to answer one Jeopardy question
Single IBM Power750 server
< 4 minutes
Single rack (10 servers)
< 30 seconds
IBM Watson (90 servers)
< 3 seconds
The old adage applies, [many hands make for light work]. The idea is to divide-and-conquer. For example, if you wanted to find a particular street address in the Manhattan phone book, you could dispatch fifty pages to each friend and they could all scan those pages at the same time. This is known as "Parallel Processing" and is how supercomputers are able to work so well. However, not all algorithms lend well to parallel processing, and the phrase [nine women can't have a baby in one month] is often used to remind us of this.
Fortuantely, UIMA is designed for parallel processing. You need to install UIMA-AS for Asynchronous Scale-out processing, an add-on to the base UIMA Java framework, supporting a very flexible scale-out capability based on JMS (Java Messaging Services) and ActiveMQ. We will also need Apache Hadoop, an open source implementation used by Yahoo Search engine. Hadoop has a "MapReduce" engine that allows you to divide the work, dispatch pieces to different "task engines", and the combine the results afterwards.
Host 2 will run Hadoop and drive the MapReduce process. Plan to have three KVM guests on Host 1, four on Host 2, and three on Host 3. That means you have 10 task engines to work with. These task engines can be deployed for Content Readers, Analysis Engines, and CAS Consumers. When all processing is done, the resulting votes will be tabulated and the top answer displayed on the Query Panel on Host 1.
Step 8: Testing
To simplify testing, use a batch processing approach. Rather than entering questions by hand in the Query Panel, generate a long list of questions in a file, and submit for processing. This will allow you to fine-tune the environment, optimize for performance, and validate the answers returned.
There you have it. By the time you get your implementation fully operational, you will have learned a lot of useful skills, including Linux administration, Ethernet networking, NFS file system configuration, Java programming, UIMA text mining analysis, and MapReduce parallel processing. Hopefully, you will also gain an appreciation for how difficult it was for the IBM Research team to accomplish what they had for the Grand Challenge on Jeopardy! Not surprisingly, IBM Watson is making IBM [as sexy to work for as Apple, Google or Facebook], all of which started their business in a garage or a basement with a system as small as this version for personal use.
Are you tired of hearing about Cloud Computing without having any hands-on experience? Here's your chance. IBM has recently launched its IBM Development and Test Cloud beta. This gives you a "sandbox" to play in. Here's a few steps to get started:
Generate a "key pair". There are two keys. A "public" key that will reside in the cloud, and a "private" key that you download to your personal computer. Don't lose this key.
Request an IP address. This step is optional, but I went ahead and got a static IP, so I don't have to type in long hostnames like "vm353.developer.ihost.com".
Request storage space. Again, this step is optional, but you can request a 50GB, 100GB and 200GB LUN. I picked a 200GB LUN. Note that each instance comes with some 10 to 30GB storage already. The advantage to a storage LUN is that it is persistent, and you can mount it to different instances.
Start an "instance". An "instance" is a virtual machine, pre-installed with whatever software you chose from the "asset catalog". These are Linux images running under Red Hat Enterprise Virtualization (RHEV) which is based on Linux's kernel virtual machine (KVM). When you start an instance, you get to decide its size (small, medium, or large), whether to use your static IP address, and where to mount your storage LUN. On the examples below, I had each instance with a static IP and mounted the storage LUN to /media/storage subdirectory. The process takes a few minutes.
So, now that you are ready to go, what instance should you pick from the catalog? Here are three examples to get you started:
IBM WebSphere sMASH Application Builder
Base OS server to run LAMP stack
Next, I decided to try out one of the base OS images. There are a lot of books on Linux, Apache, MySQL and PHP (LAMP) which represents nearly 70 percent of the web sites on the internet. This instance let's you install all the software from scratch. Between Red Hat and Novell SUSE distributions of Linux, Red Hat is focused on being the Hypervisor of choice, and SUSE is focusing on being the Guest OS of choice. Most of the images on the "asset catalog" are based on SLES 10 SP2. However, there was a base OS image of Red Hat Enterprise Linux (RHEL) 5.4, so I chose that.
To install software, you either have to find the appropriate RPM package, or download a tarball and compile from source. To try both methods out, I downloaded tarballs of Apache Web Server and PHP, and got the RPM packages for MySQL. If you just want to learn SQL, there are instances on the asset catalog with DB2 and DB2 Express-C already pre-installed. However, if you are already an expert in MySQL, or are following a tutorial or examples based on MySQL from a classroom textbook, or just want a development and test environment that matches what your company uses in production, then by all means install MySQL.
This is where my SSH client comes in handy. I am able to login to my instance and use "wget" to fetch the appropriate files. An alternative is to use "SCP" (also part of PuTTY) to do a secure copy from your personal computer up to the instance. You will need to do everything via command line interface, including editing files, so I found this [VI cheat sheet] useful. I copied all of the tarballs and RPMs on my storage LUN ( /media/storage ) so as not to have to download them again.
Compiling and configuring them is a different matter. By default, you login as an end user, "idcuser" (which stands for IBM Developer Cloud user). However, sometimes you need "root" level access. Use "sudo bash" to get into root level mode, and this allows you to put the files where they need to be. If you haven't done a configure/make/make install in awhile, here's your chance to relive those "glory days".
In the end, I was able to confirm that Apache, MySQL and PHP were all running correctly. I wrote a simple index.php that invoked phpinfo() to show all the settings were set correctly. I rebooted the instance to ensure that all of the services started at boot time.
Rational Application Developer over VDI
This last example, I started an instance pre-installed with Rational Application Developer (RAD), which is a full Integrated Development Environment (IDE) for Java and J2EE applications. I used the "NX Client" to launch a virtual desktop image (VDI) which in this case was Gnome on SLES 10 SP2. You might want to increase the screen resolution on your personal computer so that the VDI does not take up the entire screen.
From this VDI, you can launch any of the programs, just as if it were your own personal computer. Launch RAD, and you get the familiar environment. I created a short Java program and launched it on the internal WebSphere Application Server test image to confirm it was working correctly.
If you are thinking, "This is too good to be true!" there is a small catch. The instances are only up and running for 7 days. After that, they go away, and you have to start up another one. This includes any files you had on the local disk drive. You have a few options to save your work:
Copy the files you want to save to your storage LUN. This storage LUN appears persistent, and continues to exist after the instance goes away.
Take an "image" of your "instance", a function provided in the IBM Developer and Test Cloud. If you start a project Monday morning, work on it all week, then on Friday afternoon, take an "image". This will shutdown your instance, and backup all of the files to your own personal "asset catalog" so that the next time you request an instance, you can chose that "image" as the starting point.
Another option is to request an "extension" which gives you another 7 days for that instance. You can request up to five unique instances running at the same time, so if you wanted to develop and test a multi-host application, perhaps one host that acts as the front-end web server, another host that does some kind of processing, and a third host that manages the database, this is all possible. As far as I can tell, you can do all the above from either a Windows, Mac or Linux personal computer.
Getting hands-on access to Cloud Computing really helps to understand this technology!
Well, it feels like Tuesday and you know what that means... "IBM Announcement Day!" Actually, today is Wednesday, but since Monday was Memorial Day holiday here in the USA, my week is day-shifted. Yesterday, IBM announced its latest IBM FlashCopy Manager v2.2 release. Fellow blogger, Del Hoobler (IBM) has also posted something on this out atthe [Tivoli Storage Blog].
IBM FlashCopy Manager replaces two previous products. One was called Tivoli Storage Manager for Copy Services, the other was called Tivoli Storage Manager for Advanced Copy Services. To say people were confused between these two was an understatement, the first was for Windows, and the second was for UNIX and Linux operating systems. The solution? A new product that replaces both of these former products to support Windows, UNIX and Linux! Thus, IBM FlashCopy Manager was born. I introduced this product back in 2009 in my post [New DS8700 and other announcements].
IBM Tivoli Storage FlashCopy Manager provides what most people with "N series SnapManager envy" are looking for: application-aware point-in-time copies. This product takes advantage of the underlying point-in-time interfaces available on various disk storage systems:
FlashCopy on the DS8000 and SAN Volume Controller (SVC)
Snapshot on the XIV storage system
Volume Shadow Copy Services (VSS) interface on the DS3000, DS4000, DS5000 and non-IBM gear that supports this Microsoft Windows protocol
For Windows, IBM FlashCopy Manager can coordinate the backup of Microsoft Exchange and SQL Server. The new version 2.2 adds support for Exchange 2010 and SQL Server 2008 R2. This includes the ability to recover an individual mailbox or mail item from an Exchange backup. The data can be recovered directly to an Exchange server, or to a PST file.
For UNIX and Linux, IBM FlashCopy Manager can coordinate the backup of DB2, SAP and Oracle databases. Version 2.2 adds support specific Linux and Solaris operating systems, and provides a new capability for database cloning. Basically, database cloning restores a database under a new name with all the appropriate changes to allow its use for other purposes, like development, test or education training. A new "fcmcli" command line interface allows IBM FlashCopy Manager to be used for custom applications or file systems.
A common misperception is that IBM FlashCopy Manager requires IBM Tivoli Storage Manager backup software to function. That is not true. You have two options:
In Stand-alone mode, it's just you, the application, IBM FlashCopy Manager and your disk system. IBM FlashCopy Manager coordinates the point-in-time copies, maintains the correct number of versions, and allows you to backup and restore directly disk-to-disk.
Unified Recovery Management with Tivoli Storage Manager
Of course, the risk with relying only on point-in-time copies is that in most cases, they are on the same disk system as the original data. The exception being virtual disks from the SAN Volume Controller. IBM FlashCopy Manager can be combined with IBM Tivoli Storage Manager so that the point-in-time copies can be copied off to a local or remote TSM server, so that if the disk system that contains both the source and the point-in-time copies fails, you have a backup copy from TSM. In this approach, you can still restore from the point-in-time copies, but you can also restore from the TSM backups as well.
IBM FlashCopy Manager is an excellent platform to connect application-aware fucntionality with hardware-based copy services.
Last week, I presented "An Introduction to Cloud Computing" for two hours to the local Institute of Management Accountants [IMA] for their Continuing Professional Education [CPE]. Since I present IBM's leadership in Cloud Storage offerings, I have had to become an expert in Cloud Computing overall. The audience was a mix of bookkeepers, accountants, auditors, comptrollers, CPAs, and accounting teachers.
Here is a sample of the questions I took during and after my presentation:
If I need to shut down host machine, I lose all my virtual machines as well?
No, it is possible to seemlessly move virtual machines from one host to another. If you need to shut down a host machine, move all the VMs to other hosts, then you can shut down the empty host without impacting business.
Does the SaaS provider have to build their own app, can they not buy an app and then rent it out?
Yes, but they won't have competitive differentiation, and the software development they buy from will want a big cut of the action. SaaS developers that build their own applications can keep more of the profits for themselves.
How do backups work in cloud computing? Do I have to contact someone at the cloud computing company to find the backup tape?
Large datacenters often keep the most recent backups on disk, and older versions on tape in automated tape libraries that can fetch your backup in less than 2 minutes. Because of this, there is no need to talk to anyone, you can schedule or invoke your own backups, and often perform the recovery yourself using self-service tools.
Last month, my sister tried to rent a car during the week the Tucson Gem Show, but they were out of cars she wanted to drive. Could this happen with Cloud Computing?
Not likely. With rental cars, the cars have to be physically in Tucson to rent them. Rental companies could have brought cars down from Phoenix to satisfy demand. With Cloud Computing, it is all accessible over the global network, you are not limited to the cloud providers nearest you.
Is there a reason why Amazon Web Services (AWS) charges more for a Windows image than a Linux image?
Yes, Amazon and Microsoft have a patent cross-licensing agreement where Amazon pays Microsoft for the priveledge of offering Windows-based images on their EC2 cloud infrastructure. It just makes business sense to pass those costs onto the consumer. Linux is a free open source operating system, and is often the better choice.
So if we rent a machine from Amazon, they send it to my accounting office? What exactly am I getting for 12 cents per hour?
No. The computer remains in their datacenter. You get a virtual machine that runs 1.2Ghz Intel processor, with 1700MB of RAM, and 160GB of hard disk space, with Windows operating system running on it, comparable to a machine you can get at the local BestBuy, but instead of it running in the next room, it is running in a datacenter somewhere else in the United States with electricity and air conditioning.
You access it remotely from your desktop or laptop PC.
Why would I ever rent more than one computer?
It depends on your workload. For example, Derek Gottfrid at the New York Times needed to convert 11 million articles from TIFF format to PDF format so that he could put them up on the web. This would have taken him months using a single computer, so he rented 100 computers and got the entire stack converted in 24 hours, for a cost of about $240. See the articles [Self-Service, Prorated, Super Computing] and [TimesMachine] for details.
What about throughput? Won't I need to run cables from my accounting office to this cloud computing data center?
You will need connectivity, most likely from connections provided by your local telephone or cable company, or through the Internet. Certainly, there can be cases where direct privately-owned fiber optic cables, known as "dark fiber", can directly connect consumers to local Cloud service providers, for added security.
What about medical records? Will Cloud Computing help the Healthcare industry?
Yes, hospitals are finding that digitizing their records greatly reduces costs. IBM offers the Grid Medical Archive Solution [GMAS] as a private cloud storage solution to store X-ray images and other electronic medical records on disk and tape, and these records can be accessed from multiple hospitals and clinics, wherever the doctor or patient happens to be.
The advantage of personal computers was individualization, I could put on my own choices of software, and customize my own settings, won't we lose this with Cloud Computing?
Yes, customized software and settings cost companies millions of dollars with help desk calls. Cloud Computing attempts to provide some standardization, reducing the amount of effort to support IT operations.
Won't putting all the computers into a big datacenter make them more vulnerable to hackers?
Security is a well-known concern, but this is being addressed with encryption, access control lists, multi-tenancy isolation, and VPN connections.
My daughter has a BlackBerry or iPod or something, and when we mentioned that someone in Phoenix wore a monkey suit to avoid photo-radar speed cameras, she was able to pull up a picture on her little hand-held thing, is this the future?
Yes, mobile phones and other hand-held devices now have internet access to take advantage of Cloud Computing services. People will be able to access the information they need from wherever they happen to be. (You can see the picture here: [Man Dons Mask for Speed-Camera Photos])
IBM offers a variety of Cloud Computing services, as well as customized solutions and integrated systems that can be deployed on-premises behind your corporate firewall. To learn more, go to [ibm.com/cloud].
The second speaker was local celebrity Dan Ryan presenting the financials for the upcoming [Rosemont Copper] mining operations. Copper is needed for emerging markets, such as hybrid vehicles and wind turbines. Copper is a major industry in Arizona.
It's Tuesday, and that means more IBM announcements!
I haven't even finished blogging about all the other stuff that got announced last week, and here we are with more announcements. Since IBM's big [Pulse 2010 Conference] is next week, I thought I would cover this week's announcement on Tivoli Storage Manager (TSM) v6.2 release. Here are the highlights:
Client-Side Data Deduplication
This is sometimes referred to as "source-side" deduplication, as storage admins can get confused on which servers are clients in a TSM client-server deployment. The idea is to identify duplicates at the TSM client node, before sending to the TSM server. This is done at the block level, so even files that are similar but not identical, such as slight variations from a master copy, can benefit. The dedupe process is based on a shared index across all clients, and the TSM server, so if you have a file that is similar to a file on a different node, the duplicate blocks that are identical in both would be deduplicated.
This feature is available for both backup and archive data, and can also be useful for archives using the IBM System Storage Archive Manager (SSAM) v6.2 interface.
Simplified management of Server virtualization
TSM 6.2 improves its support of VMware guests by adding auto-discovery. Now, when you spontaneously create a new virtual machine OS guest image, you won't have to tell TSM, it will discover this automatically! TSM's legendary support of VMware Consolidated Backup (VCB) now eliminates the manual process of keeping track of guest images. TSM also added support of the Vstorage API for file level backup and recovery.
While IBM is the #1 reseller of VMware, we also support other forms of server virtualization. In this release, IBM adds support for Microsoft Hyper-V, including support using Microsoft's Volume Shadow Copy Services (VSS).
Automated Client Deployment
Do you have clients at all different levels of TSM backup-archive client code deployed all over the place? TSM v6.2 can upgrade these clients up to the latest client level automatically, using push technology, from any client running v5.4 and above. This can be scheduled so that only certain clients are upgraded at a time.
Simultaneous Background Tasks
The TSM server has many background administrative tasks:
Migration of data from one storage pool to another, based on policies, such as moving backups and archives on a disk pool over to a tape pools to make room for new incoming data.
Storage pool backup, typically data on a disk pool is copied to a tape pool to be kept off-site.
Copy active data. In TSM terminology, if you have multiple backup versions, the most recent version is called the active version, and the older versions are called inactive. TSM can copy just the active versions to a separate, smaller disk pool.
In previous releases, these were done one at a time, so it could make for a long service window. With TSM v6.2, these three tasks are now run simultaneously, in parallel, so that they all get done in less time, greatly reducing the server maintenance window, and freeing up tape drives for incoming backup and archive data. Often, the same file on a disk pool is going to be processed by two or more of these scheduled tasks, so it makes sense to read it once and do all the copies and migrations at one time while the data is in buffer memory.
Enhanced Security during Data Transmission
Previous releases of TSM offered secure in-flight transmission of data for Windows and AIX clients. This security uses Secure Socket Layer (SSL) with 256-bit AES encryption. With TSM v6.2, this feature is expanded to support Linux, HP-UX and Solaris.
Improved support for Enterprise Resource Planning (ERP) applications
I remember back when we used to call these TDPs (Tivoli Data Protectors). TSM for ERP allows backup of ERP applications, seemlessly integrating with database-specific tools like IBM DB2, Oracle RMAN, and SAP BR*Tools. This allows one-to-many and many-to-one configurations between SAP servers and TSM servers. In other words, you can have one SAP server backup to several TSM servers, or several SAP servers backup to a single TSM server. This is done by splitting up data bases into "sub-database objects", and then process each object separately. This can be extremely helpful if you have databases over 1TB in size. In the event that backing up an object fails and has to be re-started, it does not impact the backup of the other objects.
However, I have to assume his real question is ... "what is the quick and easy way for me to build a lightweight database app like Microsoft Access that I can distribute as a standalone executable?"
To which I would say "Lotus has a program called Approach, which is part of Lotus SmartSuite, which some people still use. However, a lot of the focus in IBM now centers around the lightweight Cloudscape database which IBM acquired from Informix, which is now known as the [open source project called Derby]. Many IBM and Lotus products, such as Lotus Expeditor use the JDBC connection to Derby, which allows you to use Windows, Linux, Flash, etc. ... with no vendor lock in".
I am familiar with Cloudscape, and I evaluated it as a potential database for IBM TotalStorage Productivity Center, when I was the lead architect defining the version 1 release. It runs entirely on Java, which is both a plus and minus. Plus in that it runs anywhere Java runs, but a minus in that it is not optimized for high performance or large scalability. Because of this, we decided instead on using the full commercial DB2 database instead for Productivity Center.
Not to be undone, my colleagues over at DB2 offered a different alternative, [DB2 Express-C], which runs on a variety of Windows, Linux-x86, and Linux on POWER platforms. It is "free" as in beer, not free as in speech, which means you can download and use it today at no charge, and even ship products with it included, but you are not allowed to modify and distribute altered versions of it, as you can with "free as in speech" open source code, as in the case of Derby above (see [Apache License 2.0"] for details).
As I see it, DB2 Express-C has two key advantages. First, if you like the free version, you can purchase a "support contract" for those that need extra hand-holding, or are using this as part of a commercial business venture. Second,for those who do prefer vendor lock-in, it is easyto upgrade Express-C to the full IBM DB2 database product, so if you are developing a product intended for use with DB2, you can develop it first with DB2 Express-C, and migrate up to full DB2 commercial version when you are ready.
This is perhaps more information than you probably expected for such a simple question. Meanwhile, I am stilltrying to figure out MySQL as part of my [OLPC volunteer project].
Continuing my series on a [Laptop for Grandma], I thought I would pursue some of the "low-RAM" operating system choices. Grandma's Thinkpad R31 has only 384MB of RAM.
All of the ones below are based on Linux. For those who aren't familiar with installing or running the Linux operating system, here are some helpful tips:
Most Linux distributors allow you to download an ISO file for free. These can be either (a) burned to a CD, (b) burned to a DVD, or (c) written to a USB memory stick.
The ISO can be either a "LiveCD/LiveDVD" version, an installation program, or a combination of the two. The "Live" version allows you to boot up and try out the operating system without modifying the contents of your hard drive. Windows and Mac OS users can try out Linux without impact to their existing environment. Some Linux distributions offer both a full LiveCD+Installer version, as well as an alternate text-based Installer-only version. The latter often requires less RAM to use.
When installing, it is best to have the laptop plugged in to an electrical outlet, and hard-wired to the internet in case it needs to download the latest drivers for your particular hardware.
A CD can hold only 700MB. Many of the newer Linux distributions exceed that, requiring a DVD or USB stick instead. If your laptop has an older optical drive, it may not be able to read DVD media. Some older optical drives can only read CD's, not burn them. In my case, I burned the CDs on another machine, and then used them on grandma's Thinkpad R31.
To avoid burning "a set of coasters" when trying out multiple choices, consider using rewriteable optical media, or the USB option. If you don't like it, you can re-use for something else.
The program [Unetbootin] can take most ISO files and write them to a bootable USB stick. On my Red Hat Enterprise Linux 6 laptop, I had to also install p7zip and p7zip-plugins first.
The BIOS on some older machines, like my grandma's Thinkpad R31, cannot boot from USB. The [PLoP Boot Manager] allows you to first boot from floppy or CD-ROM, and then allows you to boot from the USB. This worked great on my grandma's system. The PLoP Boot Manager is also available on the [Ultimate Boot CD].
While I am a big fan of SUSE, Red Hat, and Ubuntu, these all require more RAM than available on grandma's laptop. Here are some Low-RAM alternatives I tried:
Damn Small Linux 4.11 RC2
The Damn Small Linux [DSL] project was dormant since 2008, but has a fresh new release for 2012. This baby can run in as little 16MB or RAM! If you have 128MB of RAM or more, the OS can run entirely from RAM, providing much faster performance.
Of course, there are always trade-offs, and in this case, apps were chosen for their size and memory footprint, not necessarily for their user-friendliness and eye candy. For example, the xMMS plays MP3 music, but I did not find it as friendly as iTunes or Rhythmbox.
Boot time is fast. From hitting the power-on button to playing the first note of MP3 music was about 1 minute.
Installing DSL Linux on the hard drive converts it into a Debian distribution, which then allows more options for applications.
Next up was [MacPup]. The latest version is 529, based on Pupply Linux 5.2.60 Precise, compatible with Ubuntu 12.04 Precise Pangolin. While traditional Puppy Linux clutters the screen with apps, the MacPup tries to have the look-and-feel of the MacOS by having a launcher tray at the bottom center of the screen.
Both MacPup and Puppy Linux can run in very small amounts of RAM and disk space. Like DSL above, you can opt to run MacPup entirely in 128MB of RAM. Unfortunately, the trade-off is a lack of application choices.
Installation to the hard drive was quite involved, certainly not for the beginner. First, you have to use Gparted to partition the disk. I created a 19GB (sda1) for my files, and 700MB (sda5) for swap. I had troubles with "ext4" file system, so re-formatted to "ext3". Second, you have to copy the files over from the LiveCD using the "Puppy Universal Installer". Third, you have to set up the Bootloader. Grub didn't work, so I installed Grub4Dos instead.
The music app is called "Alsa Player", and I was able to drag the icon into the startup tray. time-to-first-note was just over 1 minute. Fast, but not as "simple-to-use" as I would like.
SliTaz 4.0 claims to be able to run in as little as 48MB of RAM and 100MB of disk space. Time-to-first-note was similar to MacPup, but I didn't care for the TazPanel for setup, and the TazPkg for installing a limited set of software packages. I could not get Wi-Fi working at all on SliTaz, and just gave up trying.
All three of these ran on grandma's Thinkpad R31, and all three could play MP3 music. However, I was concerned that they were not as simple to use as grandma would like, and I would be concerned the amount of time and effort I might have to spend if things go wrong.
In my last blog post [Full Disk Encryption for Your Laptop] explained my decisions relating to Full-Disk Encryption (FDE) for my laptop. Wrapping up my week's theme of Full-Disk Encryption, I thought I would explain the steps involved to make it happen.
Last April, I switched from running Windows and Linux dual-boot, to one with Linux running as the primary operating system, and Windows running as a Linux KVM guest. I have Full Disk Encryption (FDE) implemented using Linux Unified Key Setup (LUKS).
Here were the steps involved for encrypting my Thinkpad T410:
Step 0: Backup my System
Long-time readers know how I feel about taking backups. In my blog post [Separating Programs from Data], I emphasized this by calling it "Step 0". I backed up my system three ways:
Backed up all of my documents and home user directory with IBM Tivoli Storage Manager.
Backed up all of my files, including programs, bookmarks and operating settings, to an external disk drive (I used rsync for this). If you have a lot of bookmarks on your browser, there are ways to dump these out to a file to load them back in the later step.
Backed up the entire hard drive using [Clonezilla].
Clonezilla allows me to do a "Bare Machine Recovery" of my laptop back to its original dual-boot state in less than an hour, in case I need to start all over again.
Step 1: Re-Partition the Drive
"Full Disk Encryption" is a slight misnomer. For external drives, like the Maxtor BlackArmor from Seagate (Thank you Allen!), there is a small unencrypted portion that contains the encryption/decryption software to access the rest of the drive. Internal boot drives for laptops work the same way. I created two partitions:
A small unencrypted partition (2 GB) to hold the Master Boot Record [MBR], Grand Unified Bootlloader [GRUB], and the /boot directory. Even though there is no sensitive information on this partition, it is still protected the "old way" with the hard-drive password in the BIOS.
The rest of the drive (318GB) will be one big encrypted Logical Volume Manager [LVM] container, often referred to as a "Physical Volume" in LVM terminology.
Having one big encrypted partition means I only have to enter my ridiculously-long encryption password once during boot-up.
Step 2: Create Logical Volumes in the LVM container
I create three logical volumes on the encrypted physical container: swap, slash (/) directory, and home (/home). Some might question the logic behind putting swap space on an encrypted container. In theory, swap could contain sensitive information after a system [hybernation]. I separated /home from slash(/) so that in the event I completely fill up my home directory, I can still boot up my system.
Step 3: Install Linux
Ideally, I would have lifted my Linux partition "as is" for the primary OS, and a Physical-to-Virtual [P2V] conversion of my Windows image for the guest VM. Ha! To get the encryption, it was a lot simpler to just install Linux from scratch, so I did that.
Step 4: Install Windows guest KVM image
The folks in our "Open Client for Linux" team made this step super-easy. Select Windows XP or Windows 7, and press the "Install" button. This is a fresh install of the Windows operating system onto a 30GB "raw" image file.
(Note: Since my Thinkpad T410 is Intel-based, I had to turn on the 'Intel (R) Virtualization Technology' option in the BIOS!)
There are only a few programs that I need to run on Windows, so I installed them here in this step.
Step 5: Set up File Sharing between Linux and Windows
In my dual-boot set up, I had a separate "D:" drive that I could access from either Windows or Linux, so that I would only have to store each file once. For this new configuration, all of my files will be in my home directory on Linux, and then shared to the Windows guest via CIFS protocol using [samba].
In theory, I can share any of my Linux directories using this approach, but I decide to only share my home directory. This way, any Windows viruses will not be able to touch my Linux operating system kernels, programs or settings. This makes for a more secure platform.
Step 6: Transfer all of my files back
Here I used the external drive from "Step 0" to bring my data back to my home directory. This was a good time to re-organize my directory folders and do some [Spring cleaning].
Step 7: Re-establish my backup routine
Previously in my dual-boot configuration, I was using the TSM backup/archive client on the Windows partition to backup my C: and D: drives. Occasionally I would tar a few of my Linux directories and storage the tarball on D: so that it got included in the backup process. With my new Linux-based system, I switched over to the Linux version of TSM client. I had to re-work the include/exclude list, as the files are different on Linux than Windows.
One of my problems with the dual-boot configuration was that I had to manually boot up in Windows to do the TSM backup, which was disruptive if I was using Linux. With this new scheme, I am always running Linux, and so can run the TSM client any time, 24x7. I made this even better by automatically scheduling the backup every Monday and Thursday at lunch time.
There is no Linux support for my Maxtor BlackArmor external USB drive, but it is simple enough to LUKS-encrypt any regular external USB drive, and rsync files over. In fact, I have a fully running (and encrypted) version of my Linux system that I can boot directly from a 32GB USB memory stick. It has everyting I need except Windows (the "raw" image file didn't fit.)
I can still use Clonezilla to make a "Bare Machine Recovery" version to restore from. However, with the LVM container encrypted, this renders the compression capability worthless, and so takes a lot longer and consumes over 300GB of space on my external disk drive.
Backing up my Windows guest VM is just a matter of copying the "raw" image file to another file for safe keeping. I do this monthly, and keep two previous generations in case I get hit with viruses or "Patch Tuesday" destroys my working Windows image. Each is 30GB in size, so it was a trade-off between the number of versions and the amount of space on my hard drive. TSM backup puts these onto a system far away, for added protection.
Step 8: Protect your Encryption setup
In addition to backing up your data, there are a few extra things to do for added protection:
Add a second passphrase. The first one is the ridiculously-long one you memorize faithfully to boot the system every morning. The second one is a ridiculously-longer one that you give to your boss or admin assistant in case you get hit by a bus. In the event that your boss or admin assistant leaves the company, you can easily disable this second passprhase without affecting your original.
Backup the crypt-header. This is the small section in front that contains your passphrases, so if it gets corrupted, you would not be able to access the rest of your data. Create a backup image file and store it on an encrypted USB memory stick or external drive.
If you are one of the lucky 70,000 IBM employees switching from Windows to Linux this year, Welcome!
Earlier this year, IBM mandated that every employee provided a laptop had to implement Full-Disk Encryption for their primary hard drive, and any other drive, internal or external, that contained sensitive information. An exception was granted to anyone who NEVER took their laptop out of the IBM building. At IBM Tucson, we have five buildings, so if you are in the habit of taking your laptop from one building to another, then encryption is required!
The need to secure the information on your laptop has existed ever since laptops were given to employees. In my blog post [Biggest Mistakes of 2006], I wrote the following:
"Laptops made the news this year in a variety of ways. #1 was exploding batteries, and #6 were the stolen laptops that exposed private personal information. Someone I know was listed in one of these stolen databases, so this last one hits close to home. Security is becoming a bigger issue now, and IBM was the first to deliver device-based encryption with the TS1120 enterprise tape drive."
Not surprisingly, IBM laptops are tracked and monitored. In my blog post [Using ILM to Save Trees], I wrote the following:
"Some assets might be declared a 'necessary evil' like laptops, but are tracked to the n'th degree to ensure they are not lost, stolen or taken out of the building. Other assets are declared "strategically important" but are readily discarded, or at least allowed to [walk out the door each evening]."
Unfortunately, dual-boot environments won't cut it for Full-Disk Encryption. For Windows users, IBM has chosen Pretty Good Privacy [PGP]. For Linux users, IBM has chosen Linux Unified Key Setup [LUKS]. PGP doesn't work with Linux, and LUKS doesn't work with Windows.
For those of us who may need access to both Operating Systems, we have to choose. Select one as the primary OS, and run the other as a guest virtual machine. I opted for Red Hat Enterprise Linux 6 as my primary, with LUKS encryption, and Linux KVM to run Windows as the guest.
I am not alone. While I chose the Linux method voluntarily, IBM has decided that 70,000 employees must also set up their systems this way, switching them from Windows to Linux by year end, but allowing them to run Windows as a KVM guest image if needed.
Let's take a look at the pros and cons:
LUKS allows for up to 8 passphrases, so you can give one to your boss, one to your admin assistant, and in the event they leave the company, you can disable their passphrase without impacting anyone else or having to memorize a new one. PGP on Windows supports only a single passphrase.
Linux is a rock-solid operating system. I found that Windows as a KVM guest runs better than running it natively in a dual-boot configuration.
Linux is more secure against viruses. Most viruses run only on Windows operating systems. The Windows guest is well isolated from the Linux operating system files. Recovering from an infected or corrupted Windows guest is merely re-cloning a new "raw" image file.
Linux has a vibrant community of support. I am very impressed that anytime I need help, I can find answers or assistance quickly from other Linux users. Linux is also supported by our help desk, although in my experience, not as well as the community offers.
Employees that work with multiple clients can have a separate Windows guest for each one, preventing any cross-contamination between systems.
Linux is different from Windows, and some learning curve may be required. Not everyone is happy with this change.
(I often joke that the only people who are comfortable with change are babies with soiled diapers and prisoners on death row!)
Implementation is a full re-install of Linux, followed by a fresh install of Windows.
Not all software required for our jobs at IBM runs on Linux, so a Windows guest VM is a necessity. If you thought Windows ran slowly on a fully-encrypted disk, imagine how much slower it runs as a VM guest with limited memory resources.
In theory, I could have tried the Windows/PGP method for a few weeks, then gone through the entire process to switch over to Linux/LUKS, and then draw my comparisons that way. Instead, I just chose the Linux/LUKS method, and am happy with my decision.
Well, it's Wednesday, and you know what that means... IBM Announcements!
(Actually most IBM announcements are on Tuesdays, but IBM gave me extra time to recover from my trip to Europe!)
Today, IBM announced [IBM PureSystems], a new family of expert-integrated systems that combine storage, servers, networking, and software, based on IBM's decades of experience in the IT industry. You can register for the [Launch Event] today (April 11) at 2pm EDT, and download the companion "Integrated Expertise" event app for Apple, Android or Blackberry smartphones.
(If you are thinking, "Hey, wait a minute, hasn't this been done before?" you are not alone. Yes, IBM introduced the System/360 back in 1964, and the AS/400 back in 1988, so today's announcement is on scheduled for this 24-year cycle. Based on IBM's past success in this area, others have followed, most recently, Oracle, HP and Cisco.)
Initially, there are two offerings:
IBM PureFlex™ System
IBM PureFlex is like IaaS-in-a-box, allowing you to manage the system as a pool of virtual resources. It can be used for private cloud deployments, hybrid cloud deployments, or by service providers to offer public cloud solutions. IBM drinks its own champagne, and will have no problem integrating these into its [IBM SmartCloud] offerings.
To simplify ordering, the IBM PureFlex comes in three tee-shirt sizes: Express, Standard and Enterprise.
IBM PureFlex is based on a 10U-high, 19-inch wide, standard rack-mountable chassis that holds 14 bays, organized in a 7 by 2 matrix. Unlike BladeCenter where blades are inserted vertically, the IBM PureFlex nodes are horizontal. Some of the nodes take up a single bay (half-wide), but a few are full-wide, take up two bays, the full 19-inch width of the chassis. Compute and storage snap in the front, while power supplies, fans, and networking snap in the back. You can fit up to four chassis in a standard 42U rack.
Unlike competitive offerings, IBM does not limit you to x86 architectures. Both x86 and POWER-based compute nodes can be mixed into a single chassis. Out of the box, the IBM PureFlex supports four operating systems (AIX, IBM i, Linux and Windows), four server hypervisors (Hyper-V, Linux KVM, PowerVM, and VMware), and two storage hypervisors (SAN Volume Controller and Storwize V7000).
There are a variety of storage options for this. IBM will offer SSD and HDD inside the compute nodes themselves, direct-attached storage nodes, and an integrated version of the Storwize V7000 disk system. Of course, every IBM System Storage product is supported as external storage. Since Storwize V7000 and SAN Volume Controller support external virtualization, many non-IBM devices will be supported automatically as well.
Networking is also optimized, with options for 10Gb and 40Gb Ethernet/FCoE, 40Gb and 56Gb Infiniband, 8Gbps and 16Gbps Fibre Channel. Much of the networking traffic can be handled within the chassis, to minimize traffic on external switches and directors.
For management, IBM offers the Flex System Manager, that allows you to manage all the resources from a single pane of glass. The goal is to greatly simplify the IT lifecycle experience of procurement, installation, deployment and maintenance.
IBM PureApplication™ System
IBM PureApplication is like PaaS-in-a-box. Based on the IBM PureFlex infrastructure, the IBM PureApplication adds additional software layers focused on transactional web, business logic, and database workloads. Initially, it will offer two platforms: Linux platform based on x86 processors, Linux KVM and Red Hat Enterprise Linux (RHEL); and a UNIX platform based on POWER7 processors, PowerVM and AIX operating system. It will be offered in four tee-shirt sizes (small, medium, large and extra large).
In addition to having IBM's middleware like DB2 and WebSphere optimized for this platform, over 600 companies will announce this week that they will support and participate in the IBM PureSystems ecosystem as well. Already, there are 150 "Patterns of Expertise" ready to deploy from IBM PureSystem Centre, a kind of a "data center app store", borrowing an idea used today with smartphones.
By packaging applications in this manner, workloads can easily shift between private, hybrid and public clouds.
If you are unhappy with the inflexibility of your VCE Vblock, HP Integrity, or Oracle ExaLogic, talk to your local IBM Business Partner or Sales Representative. We might be able to buy your boat anchor off your hands, as part of an IBM PureSystems sale, with an attractive IBM Global Financing plan.
Continuing my coverage of the 30th annual [Data Center Conference]. here is a recap of Wednesday breakout sessions.
Aging Data: The Challenges of Long-Term Data Retention
The analyst defined "aging data" to be any data that is older than 90 days. A quick poll of the audience showed the what type of data was the biggest challenge:
In addition to aging data, the analyst used the term "vintage" to refer to aging data that you might actually need in the future, and "digital waste" being data you have no use for. She also defined "orphaned" data as data that has been archived but not actively owned or managed by anyone.
You need policies for retention, deletion, legal hold, and access. Most people forget to include access policies. How are people dealing with data and retention policies? Here were the poll results:
The analyst predicts that half of all applications running today will be retired by 2020. Tools like "IBM InfoSphere Optim" can help with application retirement by preserving both the data and metadata needed to make sense of the information after the application is no longer available. App retirement has a strong ROI.
Another problem is that there is data growth in unstructured data, but nobody is given the responsibility of "archivist" for this data, so it goes un-managed and becomes a "dumping ground". Long-term retention involves hardware, software and process working together. The reason that purpose-built archive hardware (such as IBM's Information Archive or EMC's Centera) was that companies failed to get the appropriate software and process to complete the solution.
Cloud computing will help. The analyst estimates that 40 percent of new email deployments will be done in the cloud, such as IBM LotusLive, Google Apps, and Microsoft Online365. This offloads the archive requirement to the public cloud provider.
A case study is University of Minnesota Supercomputing Institute that has three tiers for their storage: 136TB of fast storage for scratch space, 600TB of slower disk for project space, and 640 TB of tape for long-term retention.
What are people using today to hold their long-term retention data? Here were the poll results:
Bottom line is that retention of aging data is a business problem, techology problem, economic problem and 100-year problem.
A Case Study for Deploying a Unified 10G Ethernet Network
Brian Johnson from Intel presented the latest developments on 10Gb Ethernet. Case studies from Yahoo and NASA, both members of the [Open Data Center Alliance] found that upgrading from 1Gb to 10Gb Ethernet was more than just an improvement in speed. Other benefits include:
45 percent reduction in energy costs for Ethernet switching gear
80 percent fewer cables
15 percent lower costs
doubled bandwidth per server
Ruiping Sun, from Yahoo, found that 10Gb FCoE achieved 920 MB/sec, which was 15 percent faster than the 8Gb FCP they were using before.
IBM, Dell and other Intel-based servers support Single Root I/O Virtualization, or SR-IOV for short. NASA found that cloud-based HPC is feasible with SR-IOV. Using IBM General Parallel File System (GPFS) and 10Gb Ethernet were able to replace a previous environment based on 20 Gbps DDR Infiniband.
While some companies are still arguing over whether to implement a private cloud, an archive retention policy, or 10Gb Ethernet, other companies have shown great success moving forward!
Since the [IBM System Storage Technical University 2011] runs concurrently with the System x Technical University, attendees are allowed to mix-and-match. I attended several presentations regarding server virtualization and hypervisors.
Matt Archibald is an IT Management Consultant in IBM's Systems Agenda Delivery team. He started with a history of hypervisors, from IBM's early CP/CMS in 1967, through the latest VMware Vsphere 5 just announced.
He explained that there are three types of Hypervisor architectures today:
Type 1 - often referred to as "Bare Metal" runs directly on the server host hardware, and allows different operating system virtual machines to run as guests. IBM's System z [PR/SM] and [PowerVM] as well as the popular VMware ESXi are examples of this type.
Type 2 - often referred to as "Hosted" runs above an existing operating system, and allows different operating system virtual machines to run as guests. The popular [Oracle/Sun VirtualBox] is an example of this type.
OS Containers - runs above an existing operating system base, and allows multiple "guests" that all run the same operating system as the base. This affords some isolation between applications. [Parallels Virtuozzo Containers] is an example of this type.
The dominant architecture is Type 1. For x86, IBM is the number one reseller of VMware. VMware recently announced [Vsphere 5], which changes its licensing model from CPU-based to memory-based. For example, a virtual machine with 32 virtual CPUs and 1TB of virtual RAM (VRAM) would cost over $73,000 per year to license the VMware "Enterprise Plus" software. The only plus-side to this new licensing is that the "memory" entitlement transfers during Disaster Recovery to the remote location.
"Xen is dead." was the way Matt introduced the section discussing Hybrid Type-1 hypervisors like Xen and Hyper-V. These run bare-metal, but require networking and storage I/O to be processed by a single bottleneck partition referred to as "Dom 0". As such, this hybrid approach does not scale well on larger multi-sock host servers. So, his Xen-is-dead message was referring to all Hybrid-based Hypervisors including Hyper-V, not just those based on Xen itself.
The new up-and-comer is "Linux KVM". Last year, in my blog post about [System x KVM solutions], I mentioned the confusion over KVM acronym used with two different meanings. Many people use KVM to refer to Keyboard-Video-Mouse switches that allow access to multiple machines. IBM has renamed these switches to Local Console Managers (LCM) and Global Console Manager (GCM). This year, the System x team have adopted the use of "Linux KVM" to refer to the second meaning, the [Kernel-based Virtual Machine] hypervisor.
Linux KVM is not a product, but an open-source project. As such, it is built into every Linux kernel. Red Hat has created two specific deliverables under the name Red Hat Enterprise Virtualization (RHEV):
RHEV-H, a tiny ESXi-like bare-metal hypervisor that fits in 78MB, making it small enough to be on a USB stick, CD-rom or memory chip.
RHEV-M, a vCenter-like management software to manage multiple virtual machines across multiple hosts.
Personally, I run RHEL 6.1 with KVM on my IBM laptop as my primary operating system, with a Windows XP guest image to run a few Windows-specific applications.
A complaint of the current RHEV 2.2 release from Linux fanboys is that RHEV-M requires a Windows server, and uses Windows Powershell for scripting. The next release of RHEV is likely to provide a Linux-based option for management server.
Of the various hypervisors evaluated, KVM appears to be poised to offer the best scalability for multi-socket host machines. The next release is expected to support up to 4096 threads, 64TB of RAM, and over 2000 virtual machines. Compare that to VMware Vsphere 5 that supports only 160 threads, 2TB of RAM and up to 512 virtual machines.
Linux KVM Overview
Matt also presented a session focused on Linux KVM. While IBM is the leading reseller of VMware for the x86 server platform, it has chosen Linux KVM to run all of its internal x86 Cloud Computing facilities, as it can offer 40 to 80 percent savings, based on Total Cost of Ownership (TCO).
Linux KVM can run unmodified Windows and Linux guest operating systems as guest images with less than 5 percent overhead. Since KVM is built into the Linux kernel, any certification testing automatically benefits KVM as well. KVM takes advantage of modern CPU extensions like Intel's VT and AMD's AMD-V.
For high availability, in the event that a host fails, KVM can restart the guest images on other KVM hosts. RHEV offers "prioritized restart order" which allows mision-critical images to be started before less important ones.
RHEV also provides "Virtual Desktop Infrastructure", known as VDI. This allows a lightweight client with a browser to access an OS image running on a KVM host. Matt was able to demonstrate this with Firefox browser running on his Android-based Nexus One smartphone.
RHEV also adds features that make it ideal for cloud deployments, including hot-pluggable CPU, network and storage; service Level Agreement monitoring for CPU, memory and I/O resources; storage live migrations to move the raw image files while guests are running; and a self-service user portal.
IBM has been doing server virtualization for decades. When I first started at IBM in 1986, I was doing z/OS development and testing on z/VM guest images. Later, around 1999, I started working with the "Linux on z" team, running multiple Linux images under PR/SM and z/VM. While the server virtualization solutions most people are familiar with (VMware, Hyper-V, Xen) have only been around the last five years or so, IBM has a much deeper and robust understanding and long heritage. This helps to set IBM apart from the competition when helping clients.
(FTC Disclosure: I do not work or have any financial investments in ENC Security Systems. ENC Security Systems did not paid me to mention them on this blog. Their mention in this blog is not an endorsement of either their company or any of their products. Information about EncryptStick was based solely on publicly available information and my own personal experiences. My friends at ENC Security Systems provided me a full-version pre-loaded stick for this review.)
The EncryptStick software comes in two flavors, a free/trial version, and the full/paid version. The free trial version has [limits on capacity and time] but provides enough glimpse of the product to decide before you buy the full version. You can download the software yourself and put in on your own USB device, or purchase the pre-loaded stick that comes with the full-version license.
Whichever you choose, the EncryptStick offers three nice protection features:
Encryption for data organized in "storage vaults", which can be either on the stick itself, or on any other machine the stick is connected to. That is a nice feature, because you are not limited to the capacity of the USB stick.
Encrypted password list for all your websites and programs.
A secure browser, that prevents any key-logging or malware that might be on the host Windows machine.
I have tried out all three functions and everything works as advertised. However, there is always room for improvement, so here are my suggestions.
The first problem is that the pre-loaded stick looks like it is worth a million dollars. It is in a shiny bronze color with "EncryptStick" emblazoned on it. This is NOT subtle advertising! This 8GB capacity stick looks like it would be worth stealing solely on being a nice piece of jewelry, and then the added bonus that there might be "valuable secrets" just makes that possibility even more likely.
If you want to keep your information secure, it would help to have "plausible deniability" that there is nothing of value on a stick. Either have some corporate logo on it, of have the stick look like a cute animal, like these pig or chicken USB sticks.
It reminds me how the first Apple iPod's were in bright [Mug-me White]. I use black headphones with my black iPod to avoid this problem.
Of course, you can always install the downloadable version of EncryptStick software onto a less conspicuous stick if you are concerned about theft. The full/paid version of EncryptStick offers an option for "lost key recovery" which would allow you to backup the contents of the stick and be able to retrieve them on a newly purchased stick in the event your first one is lost or stolen.
Imagine how "unlucky" I felt when I notice that I had lost my "rabbits feet" on this cute animal-themed USB stick.
I sense trouble for losing the cap on my EncryptStick as well. This might seem trivial, but is a pet-peeve of mine that USB sticks should plan for this. Not only is there nothing to keep the cap on (it slides on and off quite smoothly), but there is no loop to attach the cap to anything if you wanted to.
Since then, I got smart and try to look for ways to keep the cap connected. Some designs, like this IBM-logoed stick shown above, just rotate around an axle, giving you access when you need it, and protection when it is folded closed.
Alternatively, get a little chain that allows you to attach the cap to the main stick. In the case of the pig and chicken, the memory section had a hole pre-drilled and a chain to put through it. I drilled an extra hole in the cap section of each USB stick, and connected the chain through both pieces.
(Warning: Kids, be sure to ask for assistance from your parents before using any power tools on small plastic objects.)
The EncryptStick can run on either Microsoft Windows or Mac OS. The instructions indicate that you can install both versions of download software onto a single stick, so why not do that for the pre-loaded full version? The stick I have had only the Windows version pre-loaded. I don't know if the Windows and Mac OS versions can unlock the same "storage vaults" on the stick.
Certainly, I have been to many companies where either everyone runs Windows or everyone runs Mac OS. If the primary target audience is to use this stick at work in one of those places, then no changes are required. However, at IBM, we have employees using Windows, Mac OS and Linux. In my case, I have all three! Ideally, I would like a version of EncryptStick that I could take on trips with me that would allow me to use it regardless of the Operating System I encountered.
Since there isn't a Linux-version of EncryptStick software, I decided to modify my stick to support booting Linux. I am finding more and more Linux kiosks when I travel, especially at airports and high-traffic locations, so having a stick that works both in Windows or Linux would be useful. Here are some suggestions if you want to try this at home:
Use fdisk to change the FAT32 partition type from "b" to "c". Apparently, Grub2 requires type "c", but the pre-loaded EncryptStick was set to "b". The Windows version of EncryptStick> seems to work fine in either mode, so this is a harmless change.
Install Grub2 with "grub-install" from a working Linux system.
Once Grub2 is installed, you can boot ISO images of various Linux Rescue CDs, like [PartedMagic] which includes the open-source [TrueCrypt] encryption software that you could use for Linux purposes.
This USB stick could also be used to help repair a damaged or compromised Windows system. Consider installing [Ophcrack] or [Avira].
Certainly, 8GB is big enough to run a full Linux distribution. The latest 32-bit version of [Ubuntu] could run on any 32-bit or 64-bit Intel or AMD x86 machine, and have enough room to store an [encrypted home directory].
Since the stick is formatted FAT32, you should be able to run your original Windows or Mac OS version of EncryptStick with these changes.
Depending on where you are, you may not have the luxury to reboot a system from the USB memory stick. Certainly, this may require changes to the boot sequence in the BIOS and/or hitting the right keys at the right time during the boot sequence. I have been to some "Internet Cafes" that frown on this, or have blocked this altogether, forcing you to boot only from the hard drive.
Well, those are my suggestions. Whether you go on a trip with or without your laptop, it can't hurt to take this EncryptStick along. If you get a virus on your laptop, or have your laptop stolen, then it could be handy to have around. If you don't bring your laptop, you can use this at Internet cafes, hotel business centers, libraries, or other places where public computers are available.
This week, IBM celebrates its Centennial, 100 years since its incorporation on June 16, 1911.
A few months ago, the Tucson Executive Briefing Center ordered its latest IBM System Storage [DS8800] to be on display for demos. This was manufactured in Vác, Hungary (about an hour north of Budapest), and was going to be shipped over to the United States.
However, Sam Palmisano, IBM Chairman and CEO, was in Hannover, Germany for the [CeBIT conference] and wanted this DS8800 to be re-directed to Germany first for this event. He was kind enough to sign it for us. Brian Truskowski, IBM General Manager for Storage, and Rod Adkins, IBM Senior Vice President for IBM Systems Technolgoy Group (and my fifth-line manager), also signed this as well!
I am pleased to say this "signed" DS8000 has arrived to Tucson. This is the latest model in a family of market-leading high-end enterprise-class disk systems designed to attach to all computers, including System z mainframes, POWER systems running AIX and IBM i, as well as servers running HP-UX, Solaris, Linux or Windows.
For more on IBM's other innovations over the past 100 years, check out the [Icons of Progress], which includes these storage innovations:
IBM announced that it will offer [three free months of IBM Smart Business Cloud] computing and storage services to government agencies, charitable non-profit organizations, and other organizations involved with reconstruction resulting from the earthquakes and tsunami in Japan and the northern Pacific region.
With traditional communications down, and many data centers incapacitated, Cloud Computing can be a great way to resume operations. According to the announcement, organizations can submit their requests now until April 30, and the program will run until July 31, 2011. Options include:
Virtual machine images running 32-bit and 64-bit versions of Linux or Windows.
60GB to 2 TB of disk storage per instance.
Options for various IBM middleware (DB2, Informix, Lotus, and WebSphere)
Rational Application Lifecycle Management and Tivoli Monitoring software
The offer also includes [LotusLive] Software-as-a-Service (SaaS) for email and online collaboration. For more about LotusLive, see this [Red Paper].