Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Last week, I got the following comment from Bob Swann:
I am looking for the IBM VM Poster or a picture of the IBM VM "Catch the Wave"
Do you know where I might find it?
Well, Bob, I made some phone calls. The company that published these posters no longer exists, butI found a coworker at the Poughkeepsie Briefing Center who still had the poster on his wall, and he was kind enough to take a picture of it for you.
VM: The Wave of the Future (click thumbnail at left to see larger image)
Some may recognize this as a [mash-up] using as a base the famous Japanese 10-inch by 15-inch block print[The Great Wave off Kanagawa] byartist [Katsushika Hokusai]. I had this as my laptop'swallpaper screen image until last year when I was presenting in Kuala Lumpur, Malaysia. I was told that it reminded people about the horrible tsunami caused by the [Indian Ocean earthquake] back in 2004.I was actually scheduled to fly the last week of December 2004 to Jakarta, Indonesia, but at the last minute ourclient team changed plans. I would have been on route over the Pacific ocean when the tsunami hit, and probably stranded over there for weeks or months until the airports re-opened.
The Wave theme was in part to honor the IBM users group called World Alliance VSE VM and Linux (WAVV) which is havingtheir next meeting [April 18-22, 2008] in Chattanooga, Tennessee. I presentedat this conference back in 1996 in Green Bay, Wisconsin, as part of the IBM Linux for S/390 team. It started onthe Sunday that Wisconsin switched their clocks for [DaylightSaving Time], and the few of us from Arizona or other places that don't both with this, all showed up forbreakfast an hour early.
When I was in Australia last year, I was told the wave that sports fans do, by raising their hands in coordinatedsequence, was called the [Mexican Wave]in most other countries. When I was there, Melbourne was trying to outlaw this practice at their cricket matches.
The "wave" represents a powerful metaphor, from z/VM operating system on System z mainframes to VMware and Xenon Intel-based processor machines, as the direction of virtualization that we are heading for future data centers.The Mexican wave represents a glimpse of what humans can accomplish with collaboration on a globalscale. It can also represent the tidal wave of data arising from nearly 60 percent annual growth instorage capacity. (I had to mention storage eventually, to avoid being completely off-topic on this post!)
I hope this is the graphic you were looking for Bob. If anyone else has wave-themed posters they would like to contribute, please post a comment below.
A long time ago, perhaps in the early 1990s, I was an architect on the component known today as DFSMShsm on z/OS mainframe operationg system. One of my job responsibilities was to attend the biannual [SHARE conference to listen to the requirements of the attendees on what they would like added or changed to the DFSMS, and ask enough questions so that I can accurately present the reasoning to the rest of the architects and software designers on my team. One person requested that the DFSMShsm RELEASE HARDCOPY should release "all" the hardcopy. This command sends all the activity logs to the designated SYSOUT printer. I asked what he meant by "all", and the entire audience of 120 some attendees nearly fell on the floor laughing. He complained that some clever programmer wrote code to test if the activity log contained only "Starting" and "Ending" message, but no error messages, and skip those from being sent to SYSOUT. I explained that this was done to save paper, good for the environment, and so on. Again, howls of laughter. Most customers reroute the SYSOUT from DFSMS from a physical printer to a logical one that saves the logs as data sets, with date and time stamps, so having any "skipped" leaves gaps in the sequence. The client wanted a complete set of data sets for his records. Fair enough.
When I returned to Tucson, I presented the list of requests, and the immediate reaction when I presented the one above was, "What did he mean by ALL? Doesn't it release ALL of the logs already?" I then had to recap our entire dialogue, and then it all made sense to the rest of the team. At the following SHARE conference six months later, I was presented with my own official "All" tee-shirt that listed, and I am not kidding, some 33 definitions for the word "all", in small font covering the front of the shirt.
I am reminded of this story because of the challenges explaining complicated IT concepts using the English language which is so full of overloaded words that have multiple meanings. Take for example the word "protect". What does it mean when a client asks for a solution or system to "protect my data" or "protect my information". Let's take a look at three different meanings:
The first meaning is to protect the integrity of the data from within, especially from executives or accountants that might want to "fudge the numbers" to make quarterly results look better than they are, or to "change the terms of the contract" after agreements have been signed. Clients need to make sure that the people authorized to read/write data can be trusted to do so, and to store data in Non-Erasable, Non-Rewriteable (NENR) protected storage for added confidence. NENR storage includes Write-Once, Read-Many (WORM) tape and optical media, disk and disk-and-tape blended solutions such as the IBM Grid Medical Archive Solution (GMAS) and IBM Information Archive integrated system.
The second meaning is to protect access from without, especially hackers or other criminals that might want to gather personally-identifiably information (PII) such as social security numbers, health records, or credit card numbers and use these for identity theft. This is why it is so important to encrypt your data. As I mentioned in my post [Eliminating Technology Trade-Offs], IBM supports hardware-based encryption FDE drives in its IBM System Storage DS8000 and DS5000 series. These FDE drives have an AES-128 bit encryption built-in to perform the encryption in real-time. Neither HDS or EMC support these drives (yet). Fellow blogger Hu Yoshida (HDS) indicates that their USP-V has implemented data-at-rest in their array differently, using backend directors instead. I am told EMC relies on the consumption of CPU-cycles on the host servers to perform software-based encryption, either as MIPS consumed on the mainframe, or using their Powerpath multi-pathing driver on distributed systems.
There is also concern about internal employees have the right "need-to-know" of various research projects or upcoming acquisitions. On SANs, this is normally handled with zoning, and on NAS with appropriate group/owner bits and access control lists. That's fine for LUNs and files, but what about databases? IBM's DB2 offers Label-Based Access Control [LBAC] that provides a finer level of granularity, down to the row or column level. For example, if a hospital database contained patient information, the doctors and nurses would not see the columns containing credit card details, the accountants would not see the columnts containing healthcare details, and the individual patients, if they had any access at all, would only be able to access the rows related to their own records, and possibly the records of their children or other family members.
The third meaning is to protect against the unexpected. There are lots of ways to lose data: physical failure, theft or even incorrect application logic. Whatever the way, you can protect against this by having multiple copies of the data. You can either have multiple copies of the data in its entirety, or use RAID or similar encoding scheme to store parts of the data in multiple separate locations. For example, with RAID-5 rank containing 6+P+S configuration, you would have six parts of data and one part parity code scattered across seven drives. If you lost one of the disk drives, the data can be rebuilt from the remaining portions and written to the spare disk set aside for this purpose.
But what if the drive is stolen? Someone can walk up to a disk system, snap out the hot-swappable drive, and walk off with it. Since it contains only part of the data, the thief would not have the entire copy of the data, so no reason to encrypt it, right? Wrong! Even with part of the data, people can get enough information to cause your company or customers harm, lose business, or otherwise get you in hot water. Encryption of the data at rest can help protect against unauthorized access to the data, even in the case when the data is scattered in this manner across multiple drives.
To protect against site-wide loss, such as from a natural disaster, fire, flood, earthquake and so on, you might consider having data replicated to remote locations. For example, IBM's DS8000 offers two-site and three-site mirroring. Two-site options include Metro Mirror (synchronous) and Global Mirror (asynchronous). The three-site is cascaded Metro/Global Mirror with the second site nearby (within 300km) and the third site far away. For example, you can have two copies of your data at site 1, a third copy at nearby site 2, and two more copies at site 3. Five copies of data in three locations. IBM DS8000 can send this data over from one box to another with only a single round trip (sending the data out, and getting an acknowledgment back). By comparison, EMC SRDF/S (synchronous) takes one or two trips depending on blocksize, for example blocks larger than 32KB require two trips, and EMC SRDF/A (asynchronous) always takes two trips. This is important because for many companies, disk is cheap but long-distance bandwidth is quite expensive. Having five copies in three locations could be less expensive than four copies in four locations.
Fellow blogger BarryB (EMC Storage Anarchist) felt I was unfair pointing out that their EMC Atmos GeoProtect feature only protects against "unexpected loss" and does not eliminate the need for encryption or appropriate access control lists to protect against "unauthorized access" or "unethical tampering".
(It appears I stepped too far on to ChuckH's lawn, as his Rottweiler BarryB came out barking, both in the [comments on my own blog post], as well as his latest titled [IBM dumbs down IBM marketing (again)]. Before I get another rash of comments, I want to emphasize this is a metaphor only, and that I am not accusing BarryB of having any canine DNA running through his veins, nor that Chuck Hollis has a lawn.)
As far as I know, the EMC Atmos does not support FDE disks that do this encryption for you, so you might need to find another way to encrypt the data and set up the appropriate access control lists. I agree with BarryB that "erasure codes" have been around for a while and that there is nothing unsafe about using them in this manner. All forms of RAID-5, RAID-6 and even RAID-X on the IBM XIV storage system can be considered a form of such encoding as well. As for the amount of long-distance bandwidth that Atmos GeoProtect would consume to provide this protection against loss, you might question any cost savings from this space-efficient solution. As always, you should consider both space and bandwidth costs in your total cost of ownership calculations.
Of course, if saving money is your main concern, you should consider tape, which can be ten to twenty times cheaper than disk, affording you to keep a dozen or more copies, in as many time zones, at substantially lower cost. These can be encrypted and written to WORM media for even more thorough protection.
I am still wiping the coffee off my computer screen, inadvertently sprayed when I took a sip while reading HDS' uber-blogger Hu Yoshida's post on storage virtualization and vendor lock-in.
HDS is a major vendor for disk storage virtualization, and Hu Yoshida has been around for a while, so I felt it was fair to disagree with some of the generalizations he made to set the record straight. He's been more careful ever since.
However, his latest post [The Greening of IT: Oxymoron or Journey to a New Reality] mentions an expert panel at SNW that includedMark O’Gara Vice President of Infrastructure Management at Highmark. I was not at the SNW conference last week in Orlando, so I will just give the excerpt from Hu's account of what happened:
"Later I had the opportunity to have lunch with Mark O’Gara. Mark is a West Point graduate so he takes a very disciplined approach to addressing the greening of IT. He emphasized the need for measurements and setting targets. When he started out he did an analysis of power consumption based on vendor specifications and came up with a number of 513 KW for his data center infrastructure....
The physical measurements showed that the biggest consumers of power were in order: Business Intelligence Servers, SAN Storage, Robotic tape Library, and Virtual tape servers....
Another surprise may be that tape libraries are such large consumers of power. Since tape is not spinning most of the time they should consume much less power than spinning disk - right? Apparently not if they are sitting in a robotic tape library with a lot of mechanical moving parts and tape drives that have to accelerate and decelerate at tremendous speeds. A Virtual Tape Library with de-duplication factor of 25:1 and large capacity disks may draw significantly less power than a robotic tape library for a given amount of capacity.
Obviously, I know better than to sip coffee whenever reading Hu's blog. I am down here in South America this week, the coffee is very hot and very delicious, so I am glad I didn't waste any on my laptop screen this time, especially reading that last sentence!
In that report, a 5-year comparison found that a repository based on SATA disk was 23 times more expensive overall, and consumed 290 times more energy, than a tape library based on LTO-4 tape technology. The analysts even considered a disk-based Virtual Tape Library (VTL). Focusing just on backups, at a 20:1 deduplication ratio, the VTL solution was still 5 times per expensive than the tape library. If you use the 25:1 ratio that Hu Yoshida mentions in his post above, that would still be 4 times more than a tape library.
I am not disputing Mark O'Gara's disciplined approach. It is possible that Highmark is using a poorly written backup program, taking full backups every day, to an older non-IBM tape library, in a manner that causes no end of activity to the poor tape robotics inside. But rather than changing over to a VTL, perhaps Mark might be better off investigating the use of IBM Tivoli Storage Manager, using progressive backup techniques, appropriate policies, parameters and settings, to a more energy-efficient IBM tape library.In well tuned backup workloads, the robotics are not very busy. The robot mounts the tape, and then the backup runs for a long time filling up that tape, all the meanwhile the robot is idle waiting for another request.
(Update: My apologies to Mark and his colleagues at Highmark. The above paragraph implied that Mark was using badproducts or configured them incorrectly, and was inappropriate. Mark, my full apology [here])
If you do decide to go with a Virtual Tape Library, for reasons other than energy consumption, doesn't it make sense to buy it from a vendor that understands tape systems, rather than buying it from one that focuses on disk systems? Tape system vendors like IBM, HP or Sun understand tape workloads as well as related backup and archive software, and can provide better guidance and recommendations based on years of experience. Asking advice abouttape systems, including Virtual Tape Libraries, from a disk vendor is like asking for advice on different types of bread from your butcher, or advice about various cuts of meat at the bakery.
The butchers and bakers might give you answers, but it may not be the best advice.
Are you tired of hearing about Cloud Computing without having any hands-on experience? Here's your chance. IBM has recently launched its IBM Development and Test Cloud beta. This gives you a "sandbox" to play in. Here's a few steps to get started:
Generate a "key pair". There are two keys. A "public" key that will reside in the cloud, and a "private" key that you download to your personal computer. Don't lose this key.
Request an IP address. This step is optional, but I went ahead and got a static IP, so I don't have to type in long hostnames like "vm353.developer.ihost.com".
Request storage space. Again, this step is optional, but you can request a 50GB, 100GB and 200GB LUN. I picked a 200GB LUN. Note that each instance comes with some 10 to 30GB storage already. The advantage to a storage LUN is that it is persistent, and you can mount it to different instances.
Start an "instance". An "instance" is a virtual machine, pre-installed with whatever software you chose from the "asset catalog". These are Linux images running under Red Hat Enterprise Virtualization (RHEV) which is based on Linux's kernel virtual machine (KVM). When you start an instance, you get to decide its size (small, medium, or large), whether to use your static IP address, and where to mount your storage LUN. On the examples below, I had each instance with a static IP and mounted the storage LUN to /media/storage subdirectory. The process takes a few minutes.
So, now that you are ready to go, what instance should you pick from the catalog? Here are three examples to get you started:
IBM WebSphere sMASH Application Builder
Base OS server to run LAMP stack
Next, I decided to try out one of the base OS images. There are a lot of books on Linux, Apache, MySQL and PHP (LAMP) which represents nearly 70 percent of the web sites on the internet. This instance let's you install all the software from scratch. Between Red Hat and Novell SUSE distributions of Linux, Red Hat is focused on being the Hypervisor of choice, and SUSE is focusing on being the Guest OS of choice. Most of the images on the "asset catalog" are based on SLES 10 SP2. However, there was a base OS image of Red Hat Enterprise Linux (RHEL) 5.4, so I chose that.
To install software, you either have to find the appropriate RPM package, or download a tarball and compile from source. To try both methods out, I downloaded tarballs of Apache Web Server and PHP, and got the RPM packages for MySQL. If you just want to learn SQL, there are instances on the asset catalog with DB2 and DB2 Express-C already pre-installed. However, if you are already an expert in MySQL, or are following a tutorial or examples based on MySQL from a classroom textbook, or just want a development and test environment that matches what your company uses in production, then by all means install MySQL.
This is where my SSH client comes in handy. I am able to login to my instance and use "wget" to fetch the appropriate files. An alternative is to use "SCP" (also part of PuTTY) to do a secure copy from your personal computer up to the instance. You will need to do everything via command line interface, including editing files, so I found this [VI cheat sheet] useful. I copied all of the tarballs and RPMs on my storage LUN ( /media/storage ) so as not to have to download them again.
Compiling and configuring them is a different matter. By default, you login as an end user, "idcuser" (which stands for IBM Developer Cloud user). However, sometimes you need "root" level access. Use "sudo bash" to get into root level mode, and this allows you to put the files where they need to be. If you haven't done a configure/make/make install in awhile, here's your chance to relive those "glory days".
In the end, I was able to confirm that Apache, MySQL and PHP were all running correctly. I wrote a simple index.php that invoked phpinfo() to show all the settings were set correctly. I rebooted the instance to ensure that all of the services started at boot time.
Rational Application Developer over VDI
This last example, I started an instance pre-installed with Rational Application Developer (RAD), which is a full Integrated Development Environment (IDE) for Java and J2EE applications. I used the "NX Client" to launch a virtual desktop image (VDI) which in this case was Gnome on SLES 10 SP2. You might want to increase the screen resolution on your personal computer so that the VDI does not take up the entire screen.
From this VDI, you can launch any of the programs, just as if it were your own personal computer. Launch RAD, and you get the familiar environment. I created a short Java program and launched it on the internal WebSphere Application Server test image to confirm it was working correctly.
If you are thinking, "This is too good to be true!" there is a small catch. The instances are only up and running for 7 days. After that, they go away, and you have to start up another one. This includes any files you had on the local disk drive. You have a few options to save your work:
Copy the files you want to save to your storage LUN. This storage LUN appears persistent, and continues to exist after the instance goes away.
Take an "image" of your "instance", a function provided in the IBM Developer and Test Cloud. If you start a project Monday morning, work on it all week, then on Friday afternoon, take an "image". This will shutdown your instance, and backup all of the files to your own personal "asset catalog" so that the next time you request an instance, you can chose that "image" as the starting point.
Another option is to request an "extension" which gives you another 7 days for that instance. You can request up to five unique instances running at the same time, so if you wanted to develop and test a multi-host application, perhaps one host that acts as the front-end web server, another host that does some kind of processing, and a third host that manages the database, this is all possible. As far as I can tell, you can do all the above from either a Windows, Mac or Linux personal computer.
Getting hands-on access to Cloud Computing really helps to understand this technology!
I've gotten suggestions to upgrade the memory and disk storage, and how to fine-tune the Microsoft Windows XP operating system. Others suggested replacing the OS with Linux, and to use the Cloud to avoid some of the storage space limitations.
But first, I have to mention the latest in our series of "Enterprise Systems" videos. The first was being [Data Ready]. The second was being [Security Ready]. The now the third in the series: the 3-minute
[Cloud Ready] video.
So I decided to try different Cloud-oriented Operating Systems, to see if any would be a good fit. Here is what I found:
(FTC Disclosure: I work for IBM and own IBM stock. This blog post is not meant to endorse one OS over another. I have financial interests in, and/or have friends and family who work at some of the various companies mentioned in this post. Some of these companies also have business relationships with IBM.)
Jolicloud and Joli OS 1.2
I gave this OS a try. This is based on Linux, but with an interesting approach. First, you have to be on-line all the time, and this OS is designed for 15-25 year-olds who are on social media websites like Facebook. By having a Jolicloud account, you can access this from any browser on any system, or run the Joli OS operating system, or buy the already pre-installed Jolibook netbook computer.
The Joli OS 1.2 LiveCD ran fine on my T410 with 4GB or RAM, giving me a chance to check it out, but sadly did not run on grandma's Thinkpad R31 with 384MB of RAM. According to the [Jolicloud specifications], Joli OS should run in as little as 384MB of RAM and 2GB of disk storage space, but it didn't for me.
Google Chrome and Chromium OS Vanilla
Like the Jolibook, Google has come out with a $249 Chromebook laptop that runs their "Chrome OS". This is only available via OEM install on desginated hardware, but the open source version is available called Chromium OS. These are also based on Linux.
Rather than compiling from source, Hexxeh has made nightly builds available. You can download [Chromium OS Vanilla] zip file, unzip the image file, and copy it to a 4GB USB memory stick. The compressed image is about 300MB, but uncompressed about 2.5GB, so too big to fit on a CD. The image on the USB stick is actually two partitions, and cannot be run from DVD either.
If you don't have a 4GB USB stick handy, and want to see what all the fuss is about, just install the Google Chrome browser on your Windows or Linux system, and then maximize the browser window. That's it. That is basically what Chromium OS is all about.
Files can be stored locally, or out on your Google Drive. Documents can be edited using "Google Docs" in the Cloud. You can run in "off-line" mode, for example, read your Gmail notes when not connected to the Internet. Music and video files can be played using the "Files" app.
If you really need to get out of the browser, you can hit the right combination of keys to get to the "crosh" command line shell.
Like Joli OS, I was able to run this from my Thinkpad T410 with 4GB of RAM, but not on grandma's Thinkpad R31. It appears that Chromium requires at least 1GB of RAM to run properly.
Android for x86
While researching the Chromium OS, I found that there is an open source community porting [Android to the x86] platform. Android is based on Linux, and would allow your laptop or netbook to run very much like a smartphone or tablet. Most of the apps available to Android should work here as well.
Unfortunately, the project has focused only on selected hardware:
ASUS Eee PCs/Laptops
Viewsonic Viewpad 10
Dell Inspiron Mini Duo
Lenovo ThinkPad x61 Tablet
I tried running the Thinkpad x61 version on both my Thinkpad T410 and grandma's Thinkpad R31, but with no success.
Peppermint OS Three
Next up was Peppermint OS, which claims to be a blend of Linux Mint, Lubuntu, and Xfce, but with a "twist" of aspiring to be a Cloud-oriented OS.
Rather than traditional apps to write documents or maintain a calendar, this OS offers a "Single-Site Browser" (SSB) experience, where you can configure "apps" by pointing to their respective URL. For documents, launch GWoffice, the client for Google Docs. For calendar, launch Google Calendar.
Most Linux distros have both a number and a project name associated with them. For example, Ubuntu 10.04 LTS is known as "Lucid Lynx". The Peppermint OS team avoided this by just calling their latest version "Three" which serves as both its number and its name.
The browser is Chromium, similar to Google Chrome OS above, and uses the "DuckDuckGo" search engine. This is how the Peppermint OS folks make their money to defray the costs of this effort.
Peppermint OS claims to run in systems as little as 192MB or RAM, and only 4GB of disk space. The LiveCD ran well on both my Thinkpad T410, as well as grandma's Thinkpad R31. More importantly, when I installed on the hard drive, it ran well.
The music app "Guayadeque" that came pre-installed was awful. It couldn't play MP3 music out-of-the-box. I had to install the Codec plugins from various "ubuntu-restricted-extras" libraries. I also installed the music app "Rhythmbox", and that worked great. Time from power-on to first-note was less than 2 minutes! However, the problems with the Guayadeque gave me the impression this OS might not be ready for primetime.
I contacted grandma to ask if she has Wi-Fi in her home, and sure enough, she doesn't. Her PC upstairs is direct attached to the cable modem. So, while the Cloud suggestion was worthy of investigation, I will continue to pursue other options that do not require being connected. I certainly do not want to spend any time and effort getting Wi-Fi installed there.