Modified by TonyPearson
For the longest time, people thought that humans could not run a mile in less than four minutes. Then, in 1954, [Sir Roger Bannister] beat that perception, and shortly thereafter, once he showed it was possible, many other runners were able to achieve this also. The same is being said now about the IBM Watson computer which appeared this week against two human contestants on Jeopardy!
(2014 Update: A lot has happened since I originally wrote this blog post! I intended this as a fun project for college students to work on during their summer break. However, IBM is concerned that some businesses might be led to believe they could simply stand up their own systems based entirely on open source and internally developed code for business use. IBM recommends instead the [IBM InfoSphere BigInsights] which packages much of the software described below. IBM has also launched a new "Watson Group" that has [Watson-as-a-Service] capabilities in the Cloud. To raise awareness to these developments, IBM has asked me to rename this post from IBM Watson - How to build your own "Watson Jr." in your basement to the new title IBM Watson -- How to replicate Watson hardware and systems design for your own use in your basement. I also took this opportunity to improve the formatting layout.)
Often, when a company demonstrates new techology, these are prototypes not yet ready for commercial deployment until several years later. IBM Watson, however, was made mostly from commercially available hardware, software and information resources. As several have noted, the 1TB of data used to search for answers could fit on a single USB drive that you buy at your local computer store.
But could you fit an entire Watson in your basement? The IBM Power 750 servers used in IBM Watson earned the [EPA Energy Star] rating, and is substantially [more energy-efficient than comparable 4-socket x86, Itanium, or SPARC servers]. However, having ninety of them in your basement would drive up your energy bill.
That got me thinking, would it be possible to build your own question-answering system, something less fancy, less sophisticated, scaled-down for personal use? John Pultorak explained [how to build your own Apollo Guidance Computer (AGC) in your basement]. Jay Shafer explains [how to build your own house for $20K]. And a 17-year-old George Hotz figured out a [hack to unlock your Apple iPhone] over the summer in his basement.
It turns out that much of the inner workings of IBM Watson were written in a series of articles in [IBM Systems Journal, Vol. 43, No. 3]. You can also read the [Wikipedia article]. Eric Brown from IBM Research will be presenting "Jeopardy: Under the Hood of IBM Watson Supercomputer" at next month's [The Linux Foundation End User Summit].
Take a look at the [IBM Research Team] to determine how the project was organized. Let's decide what we need, and what we don't in our version for personal use:
Do we need it for personal use?
Yes, That's you. Assuming this is a one-person project, you will act as Team Lead.
Yes, I hope you know computer programming!
No, since this version for personal use won't be appearing on Jeopardy, we won't need strategy on wager amounts for the Daily Double, or what clues to pick next. Let's focus merely on a computer that can accept a question in text, and provide an answer back, in text.
Yes, this team focused on how to wire all the hardware together. We need to do that, although this version for personal use will have fewer components.
Optional. For now, let's have this version for personal use just return its answer in plain text. Consider this Extra Credit after you get the rest of the system working. Consider using [eSpeak], [FreeTTS], or the Modular Architecture for Research on speech sYnthesis [MARY] Text-to-Speech synthesizers.
Yes, I will explain what this is, and why you need it.
Yes, we will need to get information for personal use to process
Yes, this team developed a system for parsing the question being asked, and to attach meaning to the different words involved.
No, this team focused on making IBM Watson optimized to answer in 3 seconds or less. We can accept a slower response, so we can skip this.
Yes, even for a one-person project, having a little "project management" never hurt anyone. I highly recommend the book [Getting Things Done: The Art of Stress-Free Productivity] by David Allen].
(Disclaimer: As with any Do-It-Yourself (DIY) project, I am not responsible if you are not happy with your version for personal use I am basing the approach on what I read from publicly available sources, and my work in Linux, supercomputers, XIV, and SONAS. For our purposes, this version for personal use is based entirely on commodity hardware, open source software, and publicly available sources of information. Your implementation will certainly not be as fast or as clever as the IBM Watson you saw on television.)
Step 1: Buy the Hardware
Supercomputers are built as a cluster of identical compute servers lashed together by a network. You will be installing Linux on them, so if you can avoid paying extra for Microsoft Windows, that would save you some money. Here is your shopping list:
Three x86 hosts, with the following:
64-bit quad-core processor, either Intel-VT or AMD-V capable,
8GB of DRAM, or larger
300GB of hard disk, or larger
CD or DVD Read/Write drive
Computer Monitor, mouse and keyboard
Ethernet 1GbE 4-port hub, and appropriate RJ45 cables
Surge protector and Power strip
Local Console Monitor (LCM) 4-port switch (formerly known as a KVM switch) and appropriate cables. This is optional, but will make it easier during the development. Once your implementation is operational, you will only need the monitor and keyboard attached to one machine. The other two machines can remain "headless" servers.
Step 2: Establish Networking
IBM Watson used Juniper switches running at 10Gbps Ethernet (10GbE) speeds, but was not connected to the Internet while playing Jeopardy! Instead, these Ethernet links were for the POWER7 servers to talk to each other, and to access files over the Network File System (NFS) protocol to the internal customized SONAS storage I/O nodes.
The implementation will be able to run "disconnected from the Internet" as well. However, you will need Internet access to download the code and information sources. For our purposes, 1GbE should be sufficient. Connect your Ethernet hub to your DSL or Cable modem. Connect all three hosts to the Ethernet switch. Connect your keyboard, video monitor and mouse to the LCM, and connect the LCM to the three hosts.
Step 3: Install Linux and Middleware
To say I use Linux on a daily basis is an understatement. Linux runs on my Android-based cell phone, my laptop at work, my personal computers at home, most of our IBM storage devices from SAN Volume Controller to XIV to SONAS, and even on my Tivo at home which recorded my televised episodes of Jeopardy!
For this project, you can use any modern Linux distribution that supports KVM. IBM Watson used Novel SUSE Linux Enterprise Server [SLES 11]. Alternatively, I can also recommend either Red Hat Enterprise Linux [RHEL 6] or Canonical [Ubuntu v10]. Each distribution of Linux comes in different orientations. Download the the 64-bit "ISO" files for each version, and burn them to CDs.
Graphical User Interface (GUI) oriented, often referred to as "Desktop" or "HPC-Head"
Command Line Interface (CLI) oriented, often referred to as "Server" or "HPC-Compute"
Guest OS oriented, to run in a Hypervisor such as KVM, Xen, or VMware. Novell calls theirs "Just Enough Operating System" [JeOS].
For this version for personal use, I have chosen a [multitier architecture], sometimes referred to as an "n-tier" or "client/server" architecture.
Host 1 - Presentation Server
For the Human-Computer Interface [HCI], the IBM Watson received categories and clues as text files via TCP/IP, had a [beautiful avatar] representing a planet with 42 circles streaking across in orbit, and text-to-speech synthesizer to respond in a computerized voice. Your implementation will not be this sophisticated. Instead, we will have a simple text-based Query Panel web interface accessible from a browser like Mozilla Firefox.
Host 1 will be your Presentation Server, the connection to your keyboard, video monitor and mouse. Install the "Desktop" or "HPC Head Node" version of Linux. Install [Apache Web Server and Tomcat] to run the Query Panel. Host 1 will also be your "programming" host. Install the [Java SDK] and the [Eclipse IDE for Java Developers]. If you always wanted to learn Java, now is your chance. There are plenty of books on Java if that is not the language you normally write code.
While three little systems doesn't constitute an "Extreme Cloud" environment, you might like to try out the "Extreme Cloud Administration Tool", called [xCat], which was used to manage the many servers in IBM Watson.
Host 2 - Business Logic Server
Host 2 will be driving most of the "thinking". Install the "Server" or "HPC Compute Node" version of Linux. This will be running a server virtualization Hypervisor. I recommend KVM, but you can probably run Xen or VMware instead if you like.
Host 3 - File and Database Server
Host 3 will hold your information sources, indices, and databases. Install the "Server" or "HPC Compute Node" version of Linux. This will be your NFS server, which might come up as a question during the installation process.
Technically, you could run different Linux distributions on different machines. For example, you could run "Ubuntu Desktop" for host 1, "RHEL 6 Server" for host 2, and "SLES 11" for host 3. In general, Red Hat tries to be the best "Server" platform, and Novell tries to make SLES be the best "Guest OS".
My advice is to pick a single distribution and use it for everything, Desktop, Server, and Guest OS. If you are new to Linux, choose Ubuntu. There are plenty of books on Linux in general, and Ubuntu in particular, and Ubuntu has a helpful community of volunteers to answer your questions.
Step 4: Download Information Sources
You will need some documents for your implementation to process.
IBM Watson used a modified SONAS to provide a highly-available clustered NFS server. For this version, we won't need that level of sophistication. Configure Host 3 as the NFS server, and Hosts 1 and 2 as NFS clients. See the [Linux-NFS-HOWTO] for details. To optimize performance, host 3 will be the "official master copy", but we will use a Linux utility called rsync to copy the information sources over to the hosts 1 and 2. This allows the task engines on those hosts to access local disk resources during question-answer processing.
We will also need a relational database. You won't need a high-powered IBM DB2. Your implementation can do fine with something like [Apache Derby] which is the open source version of IBM CloudScape from its Informix acquisition. Set up Host 3 as the Derby Network Server, and Hosts 1 and 2 as Derby Network Clients. For more about structured content in relational databases, see my post [IBM Watson - Business Intelligence, Data Retrieval and Text Mining].
Linux includes a utility called wget which allows you to download content from the Internet to your system. What documents you decide to download is up to you, based on what types of questions you want answered. For example, if you like Literature, check out the vast resources at [FullBooks.com]. You can automate the download by writing a shell script or program to invoke wget to all the places you want to fetch data from. Rename the downloaded files to something unique, as often they are just "index.html". For more on wget utility, see [IBM Developerworks].
Step 5: The Query Panel - Parsing the Question
Next, we need to parse the question and have some sense of what is being asked for. For this we will use [OpenNLP] for Natural Language Processing, and [OpenCyc] for the conceptual logic reasoning. See Doug Lenat presenting this 75-minute video [Computers versus Common Sense]. To learn more, see the [CYC 101 Tutorial].
Unlike Jeopardy! where Alex Trebek provides the answer and contestants must respond with the correct question, we will do normal Question-and-Answer processing. To keep things simple, we will limit questions to the following formats:
Who is ...?
Where is ...?
When did ... happen?
What is ...?
Host 1 will have a simple Query Panel web interface. At the top, a place to enter your question, and a "submit" button, and a place at the bottom for the answer to be shown. When "submit" is pressed, this will pass the question to "main.jsp", the Java servlet program that will start the Question-answering analysis. Limiting the types of questions that can be posed will simplify hypothesis generation, reduce the candidate set and evidence evaluation, allowing the analytics processing to continue in reasonable time.
Step 6: Unstructured Information Management Architecture
The "heart and soul" of IBM Watson is Unstructured Information Management Architecture [UIMA]. IBM developed this, then made it available to the world as open source. It is maintained by the [Apache Software Foundation], and overseen by the Organization for the Advancement of Structured Information Standards [OASIS].
Basically, UIMA lets you scan unstructured documents, gleam the important points, and put that into a database for later retrieval. In the graph above, DBs means 'databases' and KBs means 'knowledge bases'. See the 4-minute YouTube video of [IBM Content Analytics], the commercial version of UIMA.
Starting from the left, the Collection Reader selects each document to process, and creates an empty Common Analysis Structure (CAS) which serves as a standardized container for information. This CAS is passed to Analysis Engines , composed of one or more Annotators which analyze the text and fill the CAS with the information found. The CAS are passed to CAS Consumers which do something with the information found, such as enter an entry into a database, update an index, or update a vote count.
(Note: This point requires, what we in the industry call a small matter of programming, or [SMOP]. If you've always wanted to learn Java programming, XML, and JDBC, you will get to do plenty here. )
If you are not familiar with UIMA, consider this [UIMA Tutorial].
Step 7: Parallel Processing
People have asked me why IBM Watson is so big. Did we really need 2,880 cores of processing power? As a supercomputer, the 80 TeraFLOPs of IBM Watson would place it only in 94th place on the [Top 500 Supercomputers]. While IBM Watson may be the [Smartest Machine on Earth], the most powerful supercomputer at this time is the Tianhe-1A with more than 186,000 cores, capable of 2,566 TeraFLOPs.
To determine how big IBM Watson needed to be, the IBM Research team ran the DeepQA algorithm on a single core. It took 2 hours to answer a single Jeopardy question! Let's look at the performance data:
Number of cores
Time to answer one Jeopardy question
Single IBM Power750 server
< 4 minutes
Single rack (10 servers)
< 30 seconds
IBM Watson (90 servers)
< 3 seconds
The old adage applies, [many hands make for light work]. The idea is to divide-and-conquer. For example, if you wanted to find a particular street address in the Manhattan phone book, you could dispatch fifty pages to each friend and they could all scan those pages at the same time. This is known as "Parallel Processing" and is how supercomputers are able to work so well. However, not all algorithms lend well to parallel processing, and the phrase [nine women can't have a baby in one month] is often used to remind us of this.
Fortuantely, UIMA is designed for parallel processing. You need to install UIMA-AS for Asynchronous Scale-out processing, an add-on to the base UIMA Java framework, supporting a very flexible scale-out capability based on JMS (Java Messaging Services) and ActiveMQ. We will also need Apache Hadoop, an open source implementation used by Yahoo Search engine. Hadoop has a "MapReduce" engine that allows you to divide the work, dispatch pieces to different "task engines", and the combine the results afterwards.
Host 2 will run Hadoop and drive the MapReduce process. Plan to have three KVM guests on Host 1, four on Host 2, and three on Host 3. That means you have 10 task engines to work with. These task engines can be deployed for Content Readers, Analysis Engines, and CAS Consumers. When all processing is done, the resulting votes will be tabulated and the top answer displayed on the Query Panel on Host 1.
Step 8: Testing
To simplify testing, use a batch processing approach. Rather than entering questions by hand in the Query Panel, generate a long list of questions in a file, and submit for processing. This will allow you to fine-tune the environment, optimize for performance, and validate the answers returned.
There you have it. By the time you get your implementation fully operational, you will have learned a lot of useful skills, including Linux administration, Ethernet networking, NFS file system configuration, Java programming, UIMA text mining analysis, and MapReduce parallel processing. Hopefully, you will also gain an appreciation for how difficult it was for the IBM Research team to accomplish what they had for the Grand Challenge on Jeopardy! Not surprisingly, IBM Watson is making IBM [as sexy to work for as Apple, Google or Facebook], all of which started their business in a garage or a basement with a system as small as this version for personal use.
technorati tags: IBM, Watson, Jeopardy, Challenge, POWER7, EPA, Energy Star, RHEL, SLES, Ubuntu, Linux, UIMA, Hadoop, MapReduce, KVM, DeepQA, Roger Bannister, John Pultorak, Jay Shafer, George Hotz
, Eric Brown
|Tony Pearson holding his new XO laptop|
My XO laptop arrived Friday, December 21, this was from the [Give 1 Get 1 (G1G1)] program fromthe One Laptop Per Child (OLPC) foundation. The program continuesto the end of this month (December 31).
Here are my first impressions.
- Setup was Easy
Open the box, put in battery, and plug in the adapter. Enter your name and choose your favorite color for your stick figurine. No passwords, no parameters. Software is pre-installed and ready to use.
The four pages of instructions included how to open the unit (not intuitive), where the various connection ports are located, what the home screen and neighborhood screen look like, safety warnings, and a nice letter from Nicholas Negroponte with an 800 phone number and website in case more help is needed.
Connecting to the internet was the first thing I did. The neighborhood screen shows all the Wi-Fi access points. It recognized mineand three others. I clicked on mine, entered my WEP key, and was connected.
- Main Screen
This is a Linux operating system running the Sugar user interface.There are four screens:
- Neighborhood - shows all Wi-Fi access points
- Friends - shows all other XO laptops nearby, in my case I am all alone
- Home - your stick figurine with all the applications you can choose from are represented as icons at the bottom, just like OS X on my Mac Mini, or the launchpad on my Windows XP. Left panel for clipboard items.
- Application - Applications run in full-screen mode
Four buttons across the top allow you to jump to any screen instantly.Everything else is single left-click. No double-clicks or right-clicks.
A circle on the home screen designates which applications are running, and how much of the available 256MB RAM they are consuming. This makes it easy to seeif you can run more applications or need to shut something down. Youcan jump to any application, or shut it down, from this view.
Shutting down the XO is done by clicking your stick figurine,and choosing shutdown.
- Pre-installed Applications
I fired up the browser. The default 'home page' offers some help offline, as well as links to online resources and a google search bar. The full-color 1200x900 is very easy to read. You can hit ctrl+plus to make the fonts bigger. In bright sunlight, the screen turns automatically to greyscale.The built-in browser is easy enough to use, with standard back, forward, re-load, and bookmark buttons. The URL entry field also shows the pages title. It doesn't have tabs to see multiple pages at the same time, but I was able to fire up a second instance of the browser, so thatI could alt-tab back and forth between the two web sites.
There are so many applications that they don't all fit on the bottom of the screen.Left and right tab buttons will display the next set. I don't know if it is possible to re-order the icons, but I can certainly see some applications appealing to different ages, and perhaps re-ordering them into age-specific groups might be helpful.
Basic applications include the Abiword word processor, a PDF viewer, a simple paint program, calculator, chat, and news RSS feed reader; TamTam music to play and edit compositions; and some learn-to-program-a-computer software including Pippy, Etoys, and TurtleArt.
The 'record' program lets you take 640x480 pictures with the built-in camera, up to 45 seconds of video and audio recording. The picture abovewas taken with my XO, and edited online using [snipshot.com]. Another program can be usedto make video calls to another computer, similar to Skype or IBM Lotus Sametime.
- Connection ports
The XO has built-in microphone and speakers, but also microphone and speaker ports, as well as three USB ports, and a slot for an SD memory card.
The QWERTY keyboard is designed for small children hands, I found myself using my two index fingers in a hunt-and-peck style. People who use Blackberry's or other hand-held devices might be able to use their two thumbs instead. Also, I am not used to a touchpad as the pointing device. My other laptops have a red knob between the G/H/B keys that acts like a joystick. So, I decided to attach my Apple keyboard/mouse to one USB port, which allows me faster typing and better resolution with my mouse.
I also inserted a 1GB SD card into the slot. Getting to the SD slot was challenging--you have to rotate the screen 90 degrees so that the lower right corner is over the laptop handle. It appears I need to purchase some tweasers to get my SD card back out, so until then, it will remain there as permanent addition to my XO.
A terminal application provides a command line interface into Linux.
[olpc@xo-10-CC-6F ~] $ df -hFilesystem Size Used Avail Use% Mounted on mtd0 1.0G 365M 660M 36% /tmpfs 35M 0M 35M 0% /dev/shm/dev/mmcblk0p1 983M 7.9M 975M 1% /media/CANON_DCThe 'vi' editor is installed, in case I need to make changes to fstab or anythingelse in my /etc directory.
There is no S-video or VGA port. However, a teacher could probably fold thislaptop up in e-book mode and lay it flat on an [overhead projector] since the screen can handle bright sunlight in black-and-white mode.
- The Journal and the Clipboard
There are no folders or subdirectories here. The journal acts as your desktop, holding all the files you have referenced, sorted in chronological order with the most recent on top. The journal application is started automatically when you boot up.My SD card is shown as a separate entry at the bottom right corner, but I have access only to files on my top-level directory on the card. The journal allows you to drag and drop between the system and the SD flash card.The list can be filtered by file type and application, so finding things is easy.You can also copy anything in the journal to the clipboard, appearing on the leftpanel of the home screen. You can then launch or paste this into other applications.
Pressing Alt-1 takes a 1200x900 snapshot of the current screen, and puts it into the journal.On websites that allow you to upload a file, including GMAIL, snipshot.com, etc. the browse button brings up the journal. So, for example, you could take a snapshot of the current webpage or paint creation, and send it as an attachment to someone via GMAIL. Google has an XO-enabled version of GMAIL that you can download from the OLPC activities page.
This entire post, including the picture above, was done with the XO laptop itself. I am impressed with the thought that went into this design, and I see great potential here. The interface adequately hides the Linux operating system for those who just want to use the computer, but makes it readily accessible for those who want to learn more about the Linux operating system and computer programming.
technorati tags: OLPC, G1G1, XO
Fellow blogger Chuck Hollis from EMC has a post titled[Whither Frankenstorage
] causing quite a stir in the [Stor-o-Sphere
]. He is not the firstEMC blogger to use this phrase, I credit [BarryB
] for coining the term back in September 2008.Frankenstein serves as the ideal icon for EMC's FUD machine. In the novel, Dr. Frankenstein wasattempting to do something nobody else had ever attempted, to create human life from variousdead body parts, a process full of uncertainty and doubt, with frightful results.
Perhaps it was a coincidence that I discussed IBM's storage strategy in my post[Foundations and Flavorings] on January 28, shortly followed by NetApp's announcing V-series gateway [support of Texas Memory Systems' RamSan-500] on February 3. These two events mighthave been the trigger that pushed ChuckH over the edge to put
pen to paper, .. finger to keyboard.
Flinging FUD in all directions was ChuckH's not-so-subtle way to remind the world that EMC is the only major storage vendor to not offer a successful storage virtualization product. Withoutfirst-hand experience with well-designed storage virtualization, ChuckH conjectures that a configuration matching intelligent front-ends to reliable back-ends might be more expensive, might be more difficult to manage, or might be harder to support.
(Note: Rest assured, IBM can demonstrate that a modular approach, combining intelligent front-ends to reliableback-ends can help reduce costs, be easier to manage, and be fully supported. Contact yourlocal IBM Business Partner or storage sales rep for details.)
The reaction was notas much a blogfight and more of a [dog pile]. Defending NetApp were[Alex McDonald],[Kostadis Roussos], and [Stephen Foskett, Pack Rat]. On the HDS front, we have [Tony Asaro]. My fellow blogger from IBM took his swing with [How Quickly We Forget].And finally, pointing out EMC's hypocrisy, overall, was [James Or] from Storage Monkeys.
My favorite was from Nigel Poulton's post on[Ruptured Monkey]. Here's an excerpt:
In fact, I'm fairly certain that EMC don't back away from customers who run HP or IBM servers and say "sorry we cant help you here, an end to end HP or IBM solution would be much better for you when it comes to troubleshooting……. putting our storage in would only add extra layers of complexity and make things messy….."
On most other days, ChuckH has well-written, insightful blog posts that show that EMC brings some value to the industry. I could have made a snarky reference to[Dr Jekyll and Mr Hyde], or indicate this post proves that nobody at EMC is editing or reviewingChuck's thoughts before they get posted. But it's too late, Chuck already got the message, and added the following to bring the discussion back to civility:
When considering the broad range of storage media service levels available today (flash, FC, SATA, spin-down, etc.) what's the best way to offer these media choices in an array? Is the answer (a) combine smaller arrays from different vendors together behind a virtualization head, or (b) invest the time and effort to build arrays that can directly support all of these media types?
Would anyone like to try a cogent response to the question posed, please?
|To address ChuckH's question, Nigel's post gave me the idea to use today's 200th year celebration of [Charles Darwin].|
Over millions of years, Charles Darwin argued, evolution results in change in the inherited traits of a population of organisms from one generation to the next.A key component of this is a biological process called [mitosis] that allows a single cell to split and become two cells. In some cases, these individual daughter cells can then specialize to specific functions, such as nerve cells, muscle cells or bone cells. Over time, adaptations that work well carry forward, and thosethat don't get left behind.
I find it interesting that before [On the Origin of Species] was published in 1859, works of fiction like Mary Shelley's[Frankenstein] had monsters being"created", and afterward, monsters were the result of mutation or selective adaptation.
Nigel compares EMC's monolithic approach to placing an intelligent front-end with a reliable back-end as "One man band, where one guy is trying playing all the instruments himself" versus the "Philharmonic Orchestra". I would take it one step further, comparing single-cell organisms to multi-cell life forms.
Innovative companies like Google and Amazon can't wait for a completely integrated solution from a major IT vendor to meet their needs. Why should they? There are open standards, and ways to interconnect the best intelligence into a [dynamic infrastructure®.].You don't need to wait another million years to see which way the IT marketplace considers the better approach. Just look at the last 60 years. Back then, computer systems were all integrated, server, storage, and the wires that connected them were all inside a huge container. Then, mitosis happened, and IBM created external tape storage in 1952, and external disk storage in 1956. Open standards for interfaces allowed third party manufacturers like HDS, StorageTek and EMC to offer plug-compatible storage devices.
On the server side, it didn't take long for functionality in mainframes to split off. Mitosis happened again, with front-end UNIX systems processing incoming data, and mainframes handling the back-end data bases and printing. The client-server era replaced dumb terminals with more intelligent desktops and workstations, and these could handle the front-end processing to display information, with the back-end storage and number-crunching being handled by the UNIX and mainframe systems they connected to.Connections between desktops and servers, and from servers to storage, have also evolved. From thousands of direct-attach cables to networks of switches and directors.
Charles Darwin was particularly interested in cases where evolution happened faster or slower than in other cases. While IBM and Microsoft encouraged third-party innovations on the PC side, Apple resisted mitosis, trying to keep its machines pure single-cell, integrated solutions.For the same reasons that you can't fight the laws of nature, Apple ended up having to support I/O ports to external devices. Thanks to open standards like USB and Firewire, you can connect third-party storage to Apple computers. My little Mac Mini at home has more devices hanging off it than any of my Windows or Linux boxes! And Apple's iPod is successful because its iTunes software runs on both Windows and Mac OS operating systems.
Every time mitosis happens in the IT industry, it opens up opportunities to specialize, to innovate, to adapt to a dynamically changing world. When mitosis is suppressed, you get limiting products and frustratedengineers leaving to form their own start-up companies.But when mitosis is encouraged, you get successful products, solutions and partnerships positioned for a smarter planet.
Happy Valentines Day, Chuck!
technorati tags: EMC, Chuck Hollis, frankenstorage, Frankenstein, FUD, IBM, NetApp, TMS, V-series, RamSan-500, storage virtualization, FC, SATA, Charles Darwin, HDS, StorageTek, Microsoft, Apple, UNIX, Linux, Windows, iPod, iTunes, mitosis, Invista, EDL, NX4, Centera, Valentines Day, dynamic infrastructure, smarter planet
Wrapping up my coverage of the annual [2010 System Storage Technical University], I attended what might be perhaps the best session of the conference. Jim Nolting, IBM Semiconductor Manufacturing Engineer, presented the new IBM zEnterprise mainframe, "A New Dimension in Computing", under the Federal track.
The zEnterprises debunks the "one processor fits all" myth. For some I/O-intensive workloads, the mainframe continues to be the most cost-effective platform. However, there are other workloads where a memory-rich Intel or AMD x86 instance might be the best fit, and yet other workloads where the high number of parallel threads of reduced instruction set computing [RISC] such as IBM's POWER7 processor is more cost-effective. The IBM zEnterprise combines all three processor types into a single system, so that you can now run each workload on the processor that is optimized for that workload.
- IBM zEnterprise z196 Central Processing Complex (CPC)
Let's start with the new mainframe z196 central processing complex (CPC). Many thought this would be called the z11, but that didn't happen. Basically, the z196 machine has a maximum 96 cores versus z10's 64 core maximum, and each core runs 5.2GHz instead of z10's cores running at 4.7GHz. It is available in air-cooled and water-cooled models. The primary operating system that runs on this is called "z/OS", which when used with its integrated UNIX System Services subsystem, is fully UNIX-certified. The z196 server can also run z/VM, z/VSE, z/TPF and Linux on z, which is just Linux recompiled for the z/Architecture chip set. In my June 2008 post [Yes, Jon, there is a mainframe that can help replace 1500 servers], I mentioned the z10 mainframe had a top speed of nearly 30,000 MIPS (Million Instructions per Second). The new z196 machine can do 50,000 MIPS, a 60 percent increase!
(Update: Back in 2007, IBM and Sun mutually supported [OpenSolaris on an IBM System z mainframe]. Unfortunately, after Oracle acquired Sun, the OpenSolaris Governing Board has [grown uneasy over Oracle's silence] about the future of OpenSolaris on any platform. The OpenSolaris [download site] identifies 2009.06 as the latest release, but only for x86 and SPARC chip sets. Apparently, the 2010.03 release expected five months ago in March has slipped. Now it looks official that [OpenSolaris is Dead].)
The z196 runs a hypervisor called PR/SM that allows the box to be divided into dozens of logical partitions (LPAR), and the z/VM operating system can also act as a hypervisor running hundreds or thousands of guest OS images. Each core can be assigned a specialty engine "personality": GP for general processor, IFL for z/VM and Linux, zAAP for Java and XML processing, and zIIP for database, communications and remote disk mirroring. Like the z9 and z10, the z196 can attach to external disk and tape storage via ESCON, FICON or FCP protocols, and through NFS via 1GbE and 10GbE Ethernet.
- IBM zEnterprise BladeCenter Extension (zBX)
There is a new frame called the zBX that basically holds two IBM BladeCenter chassis, each capable of 14 blades, so total of 28 blades per zBX frame. For now, only select blade servers are supported inside, but IBM plans to expand this to include more as testing continues. The POWER-based blades can run native AIX, IBM's other UNIX operating system, and the x86-based blades can run Linux-x86 workloads, for example. Each of these blade servers can run a single OS natively, or run a hypervisor to have multiple guest OS images. IBM plans to look into running other POWER and x86-based operating systems in the future.
If you are already familiar with IBM's BladeCenter, then you can skip this paragraph. Basically, you have a chassis that holds 14 blades connected to a "mid-plane". On the back of the chassis, you have hot-swappable modules that snap into the other side of the mid-plane. There are modules for FCP, FCoE and Ethernet connectivity, which allows blades to talk to each other, as well as external storage. BladeCenter Management modules serve as both the service processor as well as the keyboard, video and mouse Local Console Manager (LCM). All of the IBM storage options available to IBM BladeCenter apply to zBX as well.
Besides general purpose blades, IBM will offer "accelerator" blades that will offload work from the z196. For example, let's say an OLAP-style query is issued via SQL to DB2 on z/OS. In the process of parsing the complicated query, it creates a Materialized Query Table (MQT) to temporarily hold some data. This MQT contains just the columnar data required, which can then be transferred to a set of blade servers known as the Smart Analytics Optimizer (SAO), then processes the request and sends the results back. The Smart Analytics Optimizer comes in various sizes, from small (7 blades) to extra large (56 blades, 28 in each of two zBX frames). A 14-blade configuration can hold about 1TB of compressed DB2 data in memory for processing.
- IBM zEnterprise Unified Resource Manager
You can have up to eight z196 machines and up to four zBX frames connected together into a monstrously large system. There are two internal networks. The Inter-ensemble data network (IEDN) is a 10GbE that connects all the OS images together, and can be further subdivided into separate virtual LANs (VLAN). The Inter-node management network (INMN) is a 1000 Mbps Base-T Ethernet that connects all the host servers together to be managed under a single pane of glass known as the Unified Resource Manager. It is based on IBM Systems Director.
By integrating service management, the Unified Resource Manager can handle Operations, Energy Management, Hypervisor Management, Virtual Server Lifecycle Management, Platform Performance Management, and Network Management, all from one place.
- IBM Rational Developer for System z Unit Test (RDz)
But what about developers and testers, such as those Independent Software Vendors (ISV) that produce mainframe software. How can IBM make their lives easier?
Phil Smith on z/Journal provides a history of [IBM Mainframe Emulation]. Back in 2007, three emulation options were in use in various shops:
- Open Mainframe, from Platform Solutions, Inc. (PSI)
- FLEX-ES, from Fundamental Software, Inc.
- Hercules, which is an open source package
None of these are viable options today. Nobody wanted to pay IBM for its Intellectual Property on the z/Architecture or license the use of the z/OS operating system. To fill the void, IBM put out an officially-supported emulation environment called IBM System z Professional Development Tool (zPDT) available to IBM employees, IBM Business Partners and ISVs that register through IBM Partnerworld. To help out developers and testers who work at clients that run mainframes, IBM now offers IBM Rational Developer for System z Unit Test, which is a modified version of zPDT that can run on a x86-based laptop or shared IBM System x server. Based on the open source [Eclipse IDE], the RDz emulates GP, IFL, zAAP and zIIP engines on a Linux-x86 base. A four-core x86 server can emulate a 3-engine mainframe.
With RDz, a developer can write code, compile and unit test all without consuming any mainframe MIPS. The interface is similar to Rational Application Developer (RAD), and so similar skills, tools and interfaces used to write Java, C/C++ and Fortran code can also be used for JCL, CICS, IMS, COBOL and PL/I on the mainframe. An IBM study ["Benchmarking IDE Efficiency"] found that developers using RDz were 30 percent more productive than using native z/OS ISPF. (I mention the use of RAD in my post [Three Things to do on the IBM Cloud]).
What does this all mean for the IT industry? First, the zEnterprise is perfectly positioned for [three-tier architecture] applications. A typical example could be a client-facing web-server on x86, talking to business logic running on POWER7, which in turn talks to database on z/OS in the z196 mainframe. Second, the zEnterprise is well-positioned for government agencies looking to modernize their operations and significantly reduce costs, corporations looking to consolidate data centers, and service providers looking to deploy public cloud offerings. Third, IBM storage is a great fit for the zEnterprise, with the IBM DS8000 series, XIV, SONAS and Information Archive accessible from both z196 and zBX servers.
To learn more, see the [12-page brochure] or review the collection of [IBM Redbooks]. Check out the [IBM Conferences schedule] for an event near you. Next year, the IBM Storage University will be held July 18-22, 2011 in Orlando, Flordia.
technorati tags: IBM, Technical University, zEnterprise, x86, POWER7, RISC, z/OS, Linux, AIX, OpenSolaris, Oracle, FICON, NFS, z196, zBX, DB2, SAO, IEDN, INMN, RDz, ISV, Eclipse, Cloud Computing
Well, it's Tuesday again, and that means more IBM announcements!
Today, IBM announced the enhanced IBM System Storage DS3200 disk system.It is in our DS3000 series, the DS3200 is SAS-attach, DS3300 is iSCSI-attach, and DS3400 is FC-attach. All of them support up to 48 drives, which can be a mix of SAS and SATA drives.
The DS3200 supports the following operating environments (see IBM's [Interop Matrix] for details):
- Microsoft Windows
- Linux (both Linux-x86 and Linux on POWER)
- Sun Solaris
- Novell NetWare
With today's announcements, the DS3200 can be used to boot from, as well as contain data. This is ideal to combine with IBM BladeCenter. With the IBM BladeCenter you can have 14 blades, either x86 or POWER based processors, attached to a DS3200 via SAS switch modules in the back of the chassis.
Let's take an example of how this can be used for a Scale-Out File Services[SoFS] deployment.
First, we start with servers. We can have either three [IBM System x3650] servers, but this would use up all six of the direct-attach ports. Instead, we'll choose the [BladeCenter H chassis], with three HS21 blades for SoFS, and that leaves us with eleven empty blade slots we could put in a management node, or other blades to run applications.
- SAS connectivity modules
The IBM BladeCenter [SAS Connectivity Module] allows the blade servers to connect to a DS3200. Two of them fit right in the back of the BladeCenter chassis, providing full redundancy without consuming additional rack space.
- DS3200 and EXP3000 expansion drawers
We'll have one DS3200 controller with twelve internal drives, and three expansion EXP3000 drawers with twelve drives each, for a total of 48 drives. Using 1TB SATA, this would be 48 TB raw capacity.
The end result? You get a 48TB NAS scalable storage solution, supporting up to 7500 concurrent CIFS and NFS users, with up to 700 MB/sec with large block transfers. By using BladeCenter, you can expand performance by adding more blades to the Chassis, or have some blades running SAP or Oracle RAC have direct read/write access to the SoFS data.
Just another example on how IBM can bring together all the components of a solution to provide customer value!
technorati tags: IBM, DS3200, BladeCenter, Linux, AIX, Windows, Solaris, VMware, NetWare, POWER, SAS, EXP3000, SATA, CIFS, NFS, SoFS
Are you tired of hearing about Cloud Computing without having any hands-on experience? Here's your chance. IBM has recently launched its IBM Development and Test Cloud beta. This gives you a "sandbox" to play in. Here's a few steps to get started:
- Go to the [IBM Developer & Test in the IBM Cloud beta] dashboard and register for an account. This can be your email address and your own password. You can watch the helpful videos.
- Generate a "key pair". There are two keys. A "public" key that will reside in the cloud, and a "private" key that you download to your personal computer. Don't lose this key.
- Request an IP address. This step is optional, but I went ahead and got a static IP, so I don't have to type in long hostnames like "vm353.developer.ihost.com".
- Request storage space. Again, this step is optional, but you can request a 50GB, 100GB and 200GB LUN. I picked a 200GB LUN. Note that each instance comes with some 10 to 30GB storage already. The advantage to a storage LUN is that it is persistent, and you can mount it to different instances.
- Start an "instance". An "instance" is a virtual machine, pre-installed with whatever software you chose from the "asset catalog". These are Linux images running under Red Hat Enterprise Virtualization (RHEV) which is based on Linux's kernel virtual machine (KVM). When you start an instance, you get to decide its size (small, medium, or large), whether to use your static IP address, and where to mount your storage LUN. On the examples below, I had each instance with a static IP and mounted the storage LUN to /media/storage subdirectory. The process takes a few minutes.
- Download some programs on your personal computer. I downloaded an SSH client called [PuTTY] and an [NX client from NoMachine].
So, now that you are ready to go, what instance should you pick from the catalog? Here are three examples to get you started:
- IBM WebSphere sMASH Application Builder
- Base OS server to run LAMP stack
Next, I decided to try out one of the base OS images. There are a lot of books on Linux, Apache, MySQL and PHP (LAMP) which represents nearly 70 percent of the web sites on the internet. This instance let's you install all the software from scratch. Between Red Hat and Novell SUSE distributions of Linux, Red Hat is focused on being the Hypervisor of choice, and SUSE is focusing on being the Guest OS of choice. Most of the images on the "asset catalog" are based on SLES 10 SP2. However, there was a base OS image of Red Hat Enterprise Linux (RHEL) 5.4, so I chose that.
To install software, you either have to find the appropriate RPM package, or download a tarball and compile from source. To try both methods out, I downloaded tarballs of Apache Web Server and PHP, and got the RPM packages for MySQL. If you just want to learn SQL, there are instances on the asset catalog with DB2 and DB2 Express-C already pre-installed. However, if you are already an expert in MySQL, or are following a tutorial or examples based on MySQL from a classroom textbook, or just want a development and test environment that matches what your company uses in production, then by all means install MySQL.
This is where my SSH client comes in handy. I am able to login to my instance and use "wget" to fetch the appropriate files. An alternative is to use "SCP" (also part of PuTTY) to do a secure copy from your personal computer up to the instance. You will need to do everything via command line interface, including editing files, so I found this [VI cheat sheet] useful. I copied all of the tarballs and RPMs on my storage LUN ( /media/storage ) so as not to have to download them again.
Compiling and configuring them is a different matter. By default, you login as an end user, "idcuser" (which stands for IBM Developer Cloud user). However, sometimes you need "root" level access. Use "sudo bash" to get into root level mode, and this allows you to put the files where they need to be. If you haven't done a configure/make/make install in awhile, here's your chance to relive those "glory days".
In the end, I was able to confirm that Apache, MySQL and PHP were all running correctly. I wrote a simple index.php that invoked phpinfo() to show all the settings were set correctly. I rebooted the instance to ensure that all of the services started at boot time.
- Rational Application Developer over VDI
This last example, I started an instance pre-installed with Rational Application Developer (RAD), which is a full Integrated Development Environment (IDE) for Java and J2EE applications. I used the "NX Client" to launch a virtual desktop image (VDI) which in this case was Gnome on SLES 10 SP2. You might want to increase the screen resolution on your personal computer so that the VDI does not take up the entire screen.
From this VDI, you can launch any of the programs, just as if it were your own personal computer. Launch RAD, and you get the familiar environment. I created a short Java program and launched it on the internal WebSphere Application Server test image to confirm it was working correctly.
If you are thinking, "This is too good to be true!" there is a small catch. The instances are only up and running for 7 days. After that, they go away, and you have to start up another one. This includes any files you had on the local disk drive. You have a few options to save your work:
- Copy the files you want to save to your storage LUN. This storage LUN appears persistent, and continues to exist after the instance goes away.
- Take an "image" of your "instance", a function provided in the IBM Developer and Test Cloud. If you start a project Monday morning, work on it all week, then on Friday afternoon, take an "image". This will shutdown your instance, and backup all of the files to your own personal "asset catalog" so that the next time you request an instance, you can chose that "image" as the starting point.
Another option is to request an "extension" which gives you another 7 days for that instance. You can request up to five unique instances running at the same time, so if you wanted to develop and test a multi-host application, perhaps one host that acts as the front-end web server, another host that does some kind of processing, and a third host that manages the database, this is all possible. As far as I can tell, you can do all the above from either a Windows, Mac or Linux personal computer.
Getting hands-on access to Cloud Computing really helps to understand this technology!
technorati tags: , IBM, Development, Test, Cloud, WebSphere, sMASH, WaveMaker, LAMP, Linux, Apache, MySQL, PHP, Rational, RAD, VDI, , RHEL, RHEV, KVM, SUSE, SLES, Novell, NX, PuTTY, SSH
Wrapping up this week's theme on the XO laptop, I decided to take on thechallenge of printing. I managed to print from my XO laptop to my laserjet printer.I checked the One Laptop Per Child [OLPC
] website,and found there is no built-in support for printers, but there have been several peopleasking how to print from the XO, so here are the steps I did to make it happen.
(Note: I did all of these steps successfully on my Qemu-emulated system first, and then performed them on my XO laptop)
- Step 1: Determine if you have an acceptable printer
The XO laptop can only connect to a printer via USB cable or over the network.Check your printer to see if it supports either of these two options. In my case, my printer is connected to my Linksys hub that offers Wi-Fi in my home.
The XO runs a modified version of Red Hat's Fedora 7, so we need to also determineif the printer is supported on Linux.Check the [Open Printing Database]for the level of support. This database has come up with the following ranking system.Printers are categorized according to how well they work under Linux and Unix. The ratings do not pertain to whether or not the printer will be auto-recognized or auto-configured, but merely to the highest level of functionality achieved.
- Perfectly - everything the printer can do is working also under Linux
- Mostly - work almost perfectly - funny enhanced resolution modes may be missing, or the color is a bit off, but nothing that would make the printouts not useful
- Partially - mostly don't work; you may be able to print only in black and white on a color printer, or the printouts look horrible
- Paperweight - These printers don't work at all. They may work in the future, but don't count on it
If your printer only supports a parallel cable connection, or does not have a high enough ranking above, go buy another printer. The [Linux Foundation] websiteoffers a list of suggested printers and tutorials.
In my case, I have a Brother HL5250-DN black-and-white laserjet printer connected over a network to Windows XP, OS X and my other Linux systems. It is rated as supporting Linux perfectly, so I decided to use this for my XO laptop.
- Step 2: Install Common UNIX Printing System (CUPS)
Technically, Linux is not UNIX, but for our purposes, close enough. Start the Terminalactivity, use "su" to change to root, and then use "yum" to install CUPS. Yum will automatically determine what other packages are needed, in this case paps and tmpwatch. Once installed, use "/usr/sbin/cupsd" to get the CUPS daemon started, and add this to the end ofrc.local so that it gets started every time you reboot.
Click graphic on the left to see larger view
[olpc@xo-10-CC-6F ~]$ subash-3.2# yum install cups...Total download size = 3.0 MIs this OK [y/N]? y
bash-3.2# /usr/sbin/cupsdbash-3.2# echo "/usr/sbin/cupsd" >> /etc/rc.d/rc.localbash-3.2# exit[olpc@xo-10-CC-6F ~]$
- Step 3: Install Opera or Firefox browser
To download the appropriate drivers, you may need a browser that can handle file downloads. I have triedto do this with the built-in Browse activity (aka Gecko) but encountered problems. I have both Opera and Firefox installed, but I will focus on Opera for this effort.I also installed the older220.127.116.11 version of the Flash player (worked better than the latest 18.104.22.168 version) and Java JRE.Follow the OLPC Wiki instructions for [Opera, Adobe Flash,and Sun Java] installation, thenverify with the following [Java and Flash] testers.
- Step 4: Download drivers and packages unique for your printer
In my case, I used Opera to get to the [Brother Linux Driver Homepage], and downloaded the RPM's for LPR and CUPS wrapper. These are the ones listed under "Drivers for Red Hat, Mandrake (Mandriva), SuSE". I saved these under "/home/olpc" directory.
[olpc@xo-10-CC-6F ~]$ subash-3.2# cd /home/olpcbash-3.2# rpm -vi brhl5250dnlpr-2.0.1-1.i386.rpmbash-3.2# rpm -vi cupswrapperHL5250DN-2.0.1-1.i386.rpmbash-3.2# exit[olpc@xo-10-CC-6F ~]$
- Step 5: Create a "root" password
By default, the root user has no password. However, you will need it to be something for later steps,so here is the process to create a root password. I set mine to "tony" which normallywould be considered too simple a password, but ignore those messages and continue.We will remove it in step 8 (below) to put things back to normal.
[olpc@xo-10-CC-6F ~]$ subash-3.2# passwdChanging password for user root.New UNIX password: tonyBAD PASSWORD: it is too shortRetype new UNIX password: tonypasswd: all authentication tokens updated successfullybash-3.2# exit[olpc@xo-10-CC-6F ~]$
- Step 6: Launch CUPS administration
Here I followed the instructions in Robert Spotswood's [Printing In Linux with CUPS] tutorial.Launch the Opera browser, and enter "http://localhost:631/admin" as the URL. The localhostrefers to the laptop itself, and 631 is the special port that CUPS listens to from browsers. You can alsouse 127.0.0.1 as a shortcut for "localhost", and can be used interchangeably.
In my case, it detected both of my networked printers, so I selected the HL5250DN, entered thelocation of my PPD file "/usr/share/cups/model/HL5250DN.ppd" that was created in Step 4. I set the URI to "lpd://192.168.0.75/binary_p1" per the instructions [Network Setting in CUPS based Linux system] in the Brother FAQ page. I chage the page size from "A4" to "Letter".I set this printer as the default printer. When it asks for userid and password, that is whereyou would enter "root" for the user, and "tony" or whatever you decided to set your root password to.
Select "Print a Test Page" to verify that everything is working.
- Step 7: Printing actual files
Sadly, I don't know Opera well enough to know how to print from there. So, I went over to my trustedFirefox browser. Select File->Page Setup to specify the settings, File->Print Preview tosee what it will look like, and then File->Print to send it to the printer.
To print the file "out.txt" that is in your /home/olpc directory, for example, enter"file:///home/olpc/out.txt" as the URL of the firefox browser. This will show the file,which you can then print to your printer. I had to specify 200% scaling otherwise the fontswere too small to read.
- Step 8: Remove the "root" password
If you want to remove the root password, here are the steps.
[olpc@xo-10-CC-6F ~]$ suPassword: tonybash-3.2# passwd -d rootRemoving password for user root.passwd: Successbash-3.2# exit[olpc@xo-10-CC-6F ~]$
Now the problem is that there is no way to print stuff from any of the Sugar activities. The best place toput in print support would be the Journal
activity. Along the bottom where the mounted USB keys arelocated could be an icon for a printer, and dragging a file down to the printer ojbect could cause it tobe send to the printer.
The alternative is to write some scripts invocable from the Terminal activity to determine what isin the journal, and send them to LPR with the appropriate parameters.
I did not have time to do either of these, but perhaps someone out there can take on that as a project.
technorati tags: OLPC, XO, printing, printer, linux, Opera, Firefox, Java, Flash
Since so many personal and corporate users are still on [Windows XP], Microsoft announced that it would provide [Extended Support until 2014]. A ComputerWorld article back in 2007 offered tips on [How to make Windows XP last for the next seven years]. From May 2009 to April 2014, all support is fee-based and non-security hotfixes are produced only for corporate customers.
If we have learned anything from last decade's Y2K crisis, is that we should not wait for the last minute to take action. Now is the time to start thinking about weaning ourselves off Windows XP. IBM has 400,000 employees, so this is not a trivial matter.
Already, IBM has taken some bold steps:
Last July, IBM announced that it was switching from Internet Explorer (IE6) to [Mozilla Firefox as its standard browser]. IBM has been contributing to this open source project for years, including support for open standards, and to make it [more accessible to handicapped employees with visual and motor impairments]. I use Firefox already on Windows, Mac and Linux, so there was no learning curve for me. Before this announcement, if some web-based application did not work on Firefox, our Helpdesk told us to switch back to Internet Explorer. Those days are over. Now, if a web-based application doesn't work on Firefox, we either stop using it, or it gets fixed.
IBM also announced the latest [IBM Lotus Symphony 3] software, which replaces Microsoft Office for Powerpoint, Excel and Word applications. Symphony also works across Mac, Windows and Linux. It is based on the OpenOffice open source project, and handles open-standard document formats (ODF). Support for Microsoft Office 2003 will also run out in the year 2014, so moving off proprietary formats to open standards makes sense.
I am not going to wait for IBM to decide how to proceed next, so I am starting my own migrations. In my case, I need to do it twice, on my IBM-provided laptop as well as my personal PC at home.
- IBM-provided laptop
Last summer, IBM sent me a new laptop, we get a new one every 3-4 years. It was pre-installed with Windows XP, but powerful enough to run a 64-bit operating system in the future. Here are my series of blog posts on that:
I decided to try out Red Hat Enterprise Linux 6.1 with its KVM-based Red Hat Enterprise Virtualization to run Windows XP as a guest OS. I will try to run as much as I can on native Linux, but will have Windows XP guest as a next option, and if that still doesn't work, reboot the system in native Windows XP mode.
Here's is how I have configured my laptop:
|/dev/sda1||35GB||NTFS||C:||Windows XP SP3 operating system and programs|
|/dev/sda2||15GB||ext4||/(root)||Ubuntu Desktop 10.10, SystemRescueCD, Clonezilla, Parted Magic|
|/dev/sda3||55GB||ext4||/(root)||RHEL 6.1 with KVM to run Windows XP as guest OS|
|/dev/sda6||130GB||NTFS||D:||My Documents, Lotus Notes and other data|
|/dev/sda7||70GB||NTFS||E:||Extras and Archives|
Basically, this is a multi-boot system. I use Ubuntu to hold all my Linux utilities, including [SystemRescueCD], [Clonezilla], and [Parted Magic]. The new [Grub2 loader] makes this easy.
Here is what my initial boot screen looks like:
So far, I am pleased that I can do nearly everything my job requires natively in Red Hat Linux, including accessing my Lotus Notes for email and databases, edit and present documents with Lotus Symphony, and so on. I have made RHEL 6.1 my default when I boot up. Setting up Windows XP under KVM was relatively simple, involving an 8-line shell script and 54-line XML file. Here is what I have encountered:
- We use a wonderful tool called "iSpring Pro" which merges Powerpoint slides with voice recordings for each page into a Shockwave Flash video. I have not yet found a Linux equivalent for this yet.
- To avoid having to duplicate files between systems, I use instead symbolic links. For example, my Lotus Notes local email repository sits on D: drive, but I can access it directly with a link from /home/tpearson/notes/data.
- While my native Ubuntu and RHEL Linux can access my C:, D: and E: drives in native NTFS file system format, the irony is that my Windows XP guest OS under KVM cannot. This means moving something from NTFS over to Ext4, just so that I can access it from the Windows XP guest application.
- For whatever reason, "Password Safe" did not run on the Windows XP guest. I launch it, but it takes forever to load and never brings up the GUI. Fortunately, there is a Linux version [MyPasswordSafe] that seems to work just fine to keep track of all my passwords.
- Personal home PC
My Windows XP system at home gave up the ghost last month, so I bought a new system with Windows 7 Professional, quad-core Intel processor and 6GB of memory. There are [various editions of Windows 7], but I chose Windows 7 Professional to support running Windows XP as a guest image.
Here's is how I have configured my personal computer:
|/dev/sda1||104MB||NTFS||C:||Windows 7 Loader|
|/dev/sda2||10GB||ext4||/(root)||Ubuntu Desktop 10.10, SystemRescueCD, Clonezilla, Parted Magic|
|/dev/sda6||60GB||NTFS||C:||Windows 7 OS and programs|
|/dev/sda7||230GB||NTFS||D:||My Documents, Lotus Notes and other data|
|/dev/sda8||250GB||NTFS||E:||Extras and Archives|
I actually found it more time-consuming to implement the "Virtual PC" feature of Windows 7 to get Windows XP mode working than KVM on Red Hat Linux. I am amazed how many of my Windows XP programs DO NOT RUN AT ALL natively on Windows 7. I now have native 64-bit versions of Lotus Notes and Symphony 3, which will do well enough for me for now.
I went ahead and put Red Hat Linux on my home system as well, but since I have Windows XP running as a guest under Windows 7, no need to duplicate KVM setup there. At least if I have problems with Windows 7, I can reboot in RHEL6 Linux at home and use that for Linux-native applications.
Hopefully, this will position me well in case IBM decides to either go with Windows 7 or Linux as the replacement OS for Windows XP.
technorati tags: IBM, Windows, Windows XP, Windows 7, Linux, Ubuntu, RedHat, RHEL, RHEL6, Lotus, Lotus Notes, Lotus Symphony
After my response to Jon Toigo on Drunken Data
withmy post [Elevenanswers about Deduplication
], Jon follows up with a request to validate the numbersquoted in the February 26 Press Release[IBM launches New “System z10” Mainframe
], particularly the estimate that a single mainframecan handle to the workload of 1500 x86 servers, in his post[A Bit More Blegging
]. The timing is perfect in that IBM launched[the next wave of Project Big Green
] today.To avoid sounding like an [editorial from the New York Sun
],I checked the facts, and spoke to the person in IBM who did all the calculations. Jon, as always, you have my permissionto publish this on your site if you want.
( I cannot take credit for coining the new term "bleg". I saw this term firstused over on the [FreakonomicsBlog]. If you have not yet read the book "Freakonomics", I highly recommend it! The authors' blog is excellent as well.)
For this comparison, it is important to figure out how much workload a mainframe can support, how much an x86 cansupport, and then divide one from the other. Sounds simple enough, right? And what workload should you choose?IBM chose a business-oriented "data-intensive" workload using Oracle database. (If you wanted instead a scientific"compute-intensive" workload, consider an [IBM supercomputer] instead, the most recent of which clocked in over 1 quadrillion floating point operations per second, or PetaFLOP.) IBM compares the following two systems:
- Sun Fire X2100 M2, model 1220 server (2-way)
IBM did not pick a wimpy machine to compare against. The model 1220 is the fastest in the series, with a 2.8Ghz x86-64 dual-core AMD Opteron processor, capable of running various levels of Solaris, Linux or Windows.In our case, we will use Oracle workloads running on Red Hat Enterprise Linux.All of the technical specifications are available at the[Sun Microsystems Sun Fire X1200] Web site.I am sure that there are comparable models from HP, Dell or even IBM that could have been used for this comparison.
- IBM z10 Enterprise Class mainframe model E64 (64-way)
This machine can run a variety of operating systems also, including Red Hat Enterprise Linux (RHEL). The E64 has four "multiple processor modules" called"processor books" for a total of 77 processing units: 64 central processors, 11 system assist processors (SAP) and 2 spares. That's right, spare processors, in case any others gobad, IBM has got your back. You can designate a central processor in a variety of flavors. For running z/VM and Linux operating systems, the central processors can be put into "Integrated Facility for Linux" (IFL) mode.On IT Jungle, Timothy Patrick Morgan explains the z10 EC in his article[IBM Launches 64-Way z10 Enterprise Class Mainframe Behemoth]. For more information on the z10 EC, see the 110-page [Technical Introduction], orread the specifications on the[IBM z10 EC] Web site.
Moving Oracle workloads from x86 over to mainframe is quite commonsince [IBM and Cisco joined forces to meet Linux On Mainframe demand]. For more information on consolidating x86 servers running Oracle over to a mainframe, read the [Quick Reference], IBMRedbook titled ["Using Oracle Solutionson Linux for System z"], or this [presentation by Jim Elliott, IBM System z Specialist].
In a shop full of x86 servers, there are production servers, test and development servers, quality assuranceservers, standby idle servers for high availability, and so on. On average, these are only 10 percent utilized.For example, consider the following mix of servers:
- 125 Production machines running 70 percent busy
- 125 Backup machines running idle ready for active failover in case a production machine fails
- 1250 machines for test, development and quality assurance, running at 5 percent average utilization
While [some might question, dispute or challenge thisten percent] estimate, it matches the logic used to justify VMware, XEN, Virtual Iron or other virtualization technologies. Running 10 to 20 "virtual servers" on a single physical x86 machine assumes a similar 5-10 percent utilization rate.
Note: The following paragraphs have been revised per comments received.
|Now the math. Jon, I want to make it clear I was not involved in writing the press release nor assisted with thesemath calculations. Please, don't shoot the messenger! Remember this cartoon where two scientists in white lab coats are writing mathcalculations on a chalkboard, and in the middle there is "and then a miracle happens..." to continue the rest ofthe calculations?|
In this case, the miracle is the number that compares one server hardware platform to another. I am not going to bore people with details like the number of concurrent processor threads or the differencesbetween L1 and L3 cache. IBM used sophisticated tools and third party involvement that I am not allowed to talk about, and I have discussed this post with lawyers representing
four (now five) different organizations already,so for the purposes of illustration and explanation only, I have reverse-engineered a new z10-to-Opteron conversion factor as 6.866 z10 EC MIPS per GHz of dual-core AMD Opteron for I/O-intensive workloads running only 10 percent average CPU utilization. Business applications that perform a lot of I/O don't use their CPU as much as other workloads.For compute-intensive or memory-intensive workloads, the conversion factor may be quite different, like 200 MIPS per GHz, as Jeff Savit from Sun Microsystems points out in the comments below.
Keep in mind that each processor is different, and we now have Intel, AMD, SPARC, PA-RISC and POWER (and others); 32-bit versus 64-bit; dual-core and quad-core; and different co-processor chip sets to worry about. AMD Opteron processors come in different speeds, but we are comparing against the 2.8GHz, so 1500 times 6.866 times 2.8 is 28,337. Since these would be running as Linux guestsunder z/VM, we add an additional 7 percent overhead or 2,019 MIPS. We then subtract 15 percent for "smoothing", whichis what happens when you consolidate workloads that have different peaks and valleys in workload, or 4,326 MIPS.The end is that we need a machine to do 26,530 MIPS. Thanks to advances in "Hypervisor" technological synergy between the z/VM operating system and the underlying z10 EC hardware, the mainframe can easily run 90 percent utilized when aggregating multiple workloads, so a 29,477 MIPS machine running at 90 percent utilization can handle these 26,530 MIPS.
N-way machines, from a little 2-way Sun Fire X2100 to the might 64-way z10 EC mainframe, are called "Symmetric Multiprocessors". All of the processors or cores are in play, but sometimes they have to taketurns, wait for exclusive access on a shared resource, such as cache or the bus. When your car is stopped at a red light, you are waiting for your turn to use the shared "intersection". As a result, you don't get linear improvement, but rather you get diminishing returns. This is known generically as the "SMP effect", and in IBM documentsthis as [Large System Performance Reference].While a 1-way z10 EC can handle 920 MIPS, the 64-way can only handle30,657 MIPS. The 29,477 MIPS needed for the Sun x2100 workload can be handled by a 61-way, giving you three extraprocessors to handle unexpected peaks in workload.
But are 1500 Linux guest images architecturally possible? A long time ago, David Boyes of[Sine Nomine Associates] ran 41,400 Linux guest images on a single mainframe using his [Test Plan Charlie], and IBM internallywas able to get 98,000 images, and in both cases these were on machines less powerful than the z10 EC. Neitherof these were tests ran I/O intensive workloads, but extreme limits are always worth testing. The 1500-to-1 reduction in IBM's press release is edge-of-the-envelope as well, so in production environments, several hundred guest images are probably more realistic, and still offer significant TCO savings.
The z10 EC can handle up to 60 LPARs, and each LPAR can run z/VM which acts much like VMware in allowing multipleLinux guests per z/VM instance. For 1500 Linux guests, you could have 25 guests each on 60 z/VM LPARs, or 250 guests on each of six z/VM LPARs, or 750 guests on two LPARs. with z/VM 5.3, each LPAR can support up to 256GB of memory and 32 processors, so you need at least two LPAR to use all 64 engines. Also, there are good reasons to have different guests under different z/VM LPARs, such as separating development/test from production workloads. If you had to re-IPLa specific z/VM LPAR, it could be done without impacting the workloads on other LPARs.
To access storage, IBM offers N-port ID Virtualization (NPIV). Without NPIV, two Linux guest images could not accessthe same LUN through the same FCP port because this would confuse the Host Bus Adapter (HBA), which IBM calls "FICON Express" cards. For example, Linux guest 1 asks to read LUN 587 block 32 and this is sent out a specific port, to a switch, to a disk system. Meanwhile, Linux guest 2 asks to read LUN 587 block 49. The data comes back to the z10 EC with the data, gives it to the correct z/VM LPAR, but then what? How does z/VM know which of the many Linux guests to give the data to? Both touched the same LUN, so it is unclear which made the request. To solve this, NPIV assigns a virtual "World Wide Port Name" (WWPN), up to 256 of them per physical port, so you can have up to 256 Linux guests sharing the same physical HBA port to access the same LUN.If you had 250 guests on each of six z/VM LPARs, and each LPAR had its own set of HBA ports, then all 1500 guestscould access the same LUN.
Yes, the z10 EC machines support Sysplex. The concept is confusing, but "Sysplex" in IBM terminology just means that you can have LPARs either on the same machine or on separate mainframes, all sharing the same time source, whether this be a "Sysplex Timer" or by using the "Server Time Protocol" (STP). The z10 EC can have STP over 6 Gbps Infiniband over distance. If you wantedto have all 1500 Linux guests time stamp data identically, all six z/VM LPARs need access to the shared time source. This can help in a re-do or roll-back situation for Oracle databases to complete or back-out "Units of Work" transactions. This time stamp is also used to form consistency groups in "z/OS Global Mirror", formerly called "XRC" for Extended Remote Distance Copy. Currently, the "timestamp" on I/O applies only to z/OS and Linux and not other operating systems. (The time stamp is done through the CDK driver on Linux, and contributed back to theopen source community so that it is available from both Novell SUSE and Red Hat distributions.)To have XRC have consistency between z/OS and Linux, the Linux guests would need to access native CKD volumes,rather than VM Minidisks or FCP-oriented LUNs.
Note: this is different than "Parallel Sysplex" which refers to having up to 32 z/OS images sharing a common "Coupling Facility" which acts as shared memory for applications. z/VM and Linux do not participate in"Parallel Sysplex".
As for the price, mainframes list for as little as "six figures" to as much as several million dollars, but I have no idea how much this particular model would cost. And, of course, this is just the hardware cost. I could not find the math for the $667 per server replacement you mentioned, so don't have details on that.You would need to purchase z/VM licenses, and possibly support contracts for Linux on System z to be fully comparable to all of the software license and support costs of the VMware, Solaris, Linux and/or Windows licenses you run on the x86 machines.
This is where a lot of the savings come from, as a lot of software is licensed "per processor" or "per core", and so software on 64 mainframe processors can be substantially less expensive than 1500 processors or 3000 cores.IBM does "eat its own cooking" in this case. IBM is consolidating 3900 one-application-each rack-mounted serversonto 30 mainframes, for a ratio of 130-to-1 and getting amazingly reduced TCO. The savings are in the followingareas:
- Hardware infrastructure. It's not just servers, but racks, PDUs, etc. It turns out to be less expensive to incrementally add more CPU and storage to an existing mainframe than to add or replace older rack-em-and-stack-emwith newer models of the same.
- Cables. Virtual servers can talk to each other in the same machine virtually, such as HiperSockets, eliminatingmany cables. NPIV allows many guests to share expensive cables to external devices.
- Networking ports. Both LAN and SAN networking gear can be greatly reduced because fewer ports are needed.
- Administration. We have Universities that can offer a guest image for every student without having a majorimpact to the sys-admins, as the students can do much of their administration remotely, without having physicalaccess to the machinery. Companies uses mainframe to host hundreds of virtual guests find reductions too!
- Connectivity. Consolidating distributed servers in many locations to a mainframe in one location allows youto reduce connections to the outside world. Instead of sixteen OC3 lines for sixteen different data centers, you could have one big OC48 line instead to a single data center.
- Software licenses. Licenses based on servers, cores or CPUs are reduced when you consolidate to the mainframe.
- Floorspace. Generally, floorspace is not in short supply in the USA, but in other areas it can be an issue.
- Power and Cooling. IBM has experienced significant reduction in power consumption and cooling requirementsin its own consolidation efforts.
All of the components of DFSMS (including DFP, DFHSM, DFDSS and DFRMM) were merged into a single product "DFSMS for z/OS" and is now an included element in the base z/OS operating system. As a result of these, customers typically have 80 to 90 percent utilization on their mainframe disk. For the 1500 Linux guests, however, most of the DFSMS features of z/OS do not apply. These functions were not "ported over" to z/VM nor Linux on any platform.
Note: DFSMS can backup or dump Linux on System z partitions or volumes. See this [Appendix C. HOWTO backup Linux data through z/OS] for details.
Instead, the DFSMS concepts have been re-implemented into a new product called "Scale-Out File Services" (SOFS) which would provide NAS interfaces to a blendeddisk-and-tape environment. The SOFS disk can be kept at 90 percent utilization because policies can place data, movedata and even expire files, just like DFSMS does for z/OS data sets. SOFS supports standard NAS protocols such as CIFS,NFS, FTP and HTTP, and these could be access from the 1500 Linux guests over an Ethernet Network Interface Card (NIC), which IBM calls "OSA Express" cards.
Lastly, IBM z10 EC is not emulating x86 or x86-64 interfaces for any of these workloads. No doubt IBM and AMD could collaborate together to come up with an AMD Opteron emulator for the S/390 chipset, and load Windows 2003 right on top of it, but that would just result in all kinds of emulation overhead.Instead, Linux on System z guests can run comparable workloads. There are many Linux applications that are functionally equivalent or the same as their Windows counterparts. If you run Oracle on Windows, you could runOracle on Linux. If you run MS Exchange on Windows, you could run Bynari on Linux and let all of your Outlook Expressusers not even know their Exchange server had been moved! Linux guest images can be application servers, web servers, database servers, network infrastructure servers, file servers, firewall, DNS, and so on. For nearly any business workload you can assign to an x86 server in a datacenter, there is likely an option for Linux on System z.
Hope this answers all of your questions, Jon. These were estimates based on basic assumptions. This is not to imply that IBM z10 EC and VMware are the only technologies that help in this area, you can certainly find virtualization on other systems and through other software.I have asked IBM to make public the "TCO framework" that sheds more light on this.As they say, "Your mileage may vary."
For more on this series, check out the following posts:
If in your travels, Jon, you run into someone interested to see how IBM could help consolidate rack-mounted servers over to a z10 EC mainframe, have them ask IBM for a "Scorpion study". That is the name of the assessment that evaluates a specific clientsituation, and can then recommend a more accurate estimate configuration.
technorati tags: Jon Toigo, DrunkenData, bleg, IBM, z10, EC, E64, mainframe, x86, AMD, Opteron, Sun, Fire, X2100, petaFLOP, Freakonomics, Red Hat, RHEL, IFL, VMware, Jim Elliott, Xen, Virtual Iron, Solaris, Linux, Windows, Project Big Green, Infiniband, STP, Sysplex, Scorpion study, MS Exchange, Bynari, Oracle
Continuing my rant from Monday's post [Time for a New Laptop], I got my new laptop Wednesday afternoon. I was hoping the transition would be quick, but that was not the case. Here were my initial steps prior to connecting my two laptops together for the big file transfer:
- Document what my old workstation has
Back in 2007, I wrote a blog post on how to [Separate Programs from Data]. I have since added a Linux partition for dual-boot on my ThinkPad T60.
|/dev/sda1||26GB||NTFS||C:||Windows XP SP3 operating system and programs|
|/dev/sda2||12GB||ext3||/(root)||Red Hat Enterprise Linux 5.4|
|/dev/sda6||80GB||NTFS||D:||My Documents and other data|
I also created a spreadsheet of all my tools, utilities and applications. I combined and deduplicated the list from the following sources:
- Control Panel -> Add/Remove programs
- C:\Program Files
- Start -> Programs panels
- Program taskbar at bottom of screen
The last one was critical. Over the years, I have gotten in the habit of saving those ZIP or EXE files that self-install programs into a separate directory, D:/Install-Files, so that if I had to unintsall an application, due to conflicts or compatability issues, I could re-install it without having to download them again.
So, I have a total of 134 applications, which I have put into the following rough categories:
- AV - editing and manipulating audio, video or graphics
- Files - backup, copy or manipulate disks, files and file systems
- Browser - Internet Explorer, Firefox, Opera and Google Chrome
- Communications - Lotus Notes and Lotus Sametime
- Connect - programs to connect to different Web and Wi-Fi services
- Demo - programs I demonstrate to clients at briefings
- Drivers - attach or sync to external devices, cell phones, PDAs
- Games - not much here, the basic solitaire, mindsweeper and pinball
- Help Desk - programs to diagnose, test and gather system information
- Projects - special projects like Second Life or Lego Mindstorms
- Lookup - programs to lookup information, like American Airlines TravelDesk
- Meeting - I have FIVE different webinar conferencing tools
- Office - presentations, spreadsheets and documents
- Platform - Java, Adobe Air and other application runtime environments
- Player - do I really need SIXTEEN different audio/video players?
- Printer - print drivers and printer management software
- Scanners - programs that scan for viruses, malware and adware
- Tools - calculators, configurators, sizing tools, and estimators
- Uploaders - programs to upload photos or files to various Web services
- Backup my new workstation
My new ThinkPad T410 has a dual-core i5 64-bit Intel processor, so I burned a 64-bit version of [Clonezilla LiveCD] and booted the new system with that. The new system has the following configuration:
|/dev/sda1||320GB||NTFS||C:||Windows XP SP3 operating system, programs and data|
There were only 14.4GB of data, it took 10 minutes to backup to an external USB disk. I ran it twice: first, using the option to dump the entire disk, and the second to dump the selected partition. The results were roughly the same.
- Run Workstation Setup Wizard
The Workstation Setup Wizard asks for all the pertinent location information, time zone, userid/password, needed to complete the installation.
- Re-Partition Disk Drive
I burned a 64-bit version of [System Rescue CD] and ran [Gparted] to re-partition this disk into the following:
|/dev/sda1||40GB||NTFS||C:||Windows XP SP3 operating system and programs|
|/dev/sda2||15GB||ext3||/(root)||Ubuntu Desktop 10.04 LTS|
|/dev/sda6||245GB||NTFS||D:||My Documents and other data|
- Redefine Windows directory structure
I made two small changes to connect C: to D: drive.
- Changed "My Documents" to point to D:\Documents which will move the files over from C: to D: to accomodate its new target location. See [Microsoft procedure] for details.
- Edited C:\notes\notes.ini to point to D:\notes\data to store all the local replicas of my email and databases.
- Install Ubuntu Desktop 10.04 LTS
My plan is to run Windows and Linux guests through virtualization. I decided to try out Ubuntu Desktop 10.04 LTS, affectionately known as Lucid Lynx, which can support a variety of different virtualization tools, including KVM, VirtualBox-OSE and Xen. I have two identical 15GB partitions (sda2 and sda3) that I can use to hold two different systems, or one can be a subdirectory of the other. For now, I'll leave sda3 empty.
- Take another backup of my new workstation
I took a fresh new backup of paritions (sda1, sda2, sda6) with Clonezilla.
The next step involved a cross-over Ethernet cable, which I don't have. So that will have to wait until Thursday morning.
technorati tags: IBM, Lenovo, ThinkPad, T60, T410, Intel, Clonezilla, SysRescCD, Gparted, Windows, Ubuntu, Linux, Lucid, LTS