Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th year anniversary with IBM Storage. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Last week, I got the following comment from Bob Swann:
I am looking for the IBM VM Poster or a picture of the IBM VM "Catch the Wave"
Do you know where I might find it?
Well, Bob, I made some phone calls. The company that published these posters no longer exists, butI found a coworker at the Poughkeepsie Briefing Center who still had the poster on his wall, and he was kind enough to take a picture of it for you.
VM: The Wave of the Future (click thumbnail at left to see larger image)
Some may recognize this as a [mash-up] using as a base the famous Japanese 10-inch by 15-inch block print[The Great Wave off Kanagawa] byartist [Katsushika Hokusai]. I had this as my laptop'swallpaper screen image until last year when I was presenting in Kuala Lumpur, Malaysia. I was told that it reminded people about the horrible tsunami caused by the [Indian Ocean earthquake] back in 2004.I was actually scheduled to fly the last week of December 2004 to Jakarta, Indonesia, but at the last minute ourclient team changed plans. I would have been on route over the Pacific ocean when the tsunami hit, and probably stranded over there for weeks or months until the airports re-opened.
The Wave theme was in part to honor the IBM users group called World Alliance VSE VM and Linux (WAVV) which is havingtheir next meeting [April 18-22, 2008] in Chattanooga, Tennessee. I presentedat this conference back in 1996 in Green Bay, Wisconsin, as part of the IBM Linux for S/390 team. It started onthe Sunday that Wisconsin switched their clocks for [DaylightSaving Time], and the few of us from Arizona or other places that don't both with this, all showed up forbreakfast an hour early.
When I was in Australia last year, I was told the wave that sports fans do, by raising their hands in coordinatedsequence, was called the [Mexican Wave]in most other countries. When I was there, Melbourne was trying to outlaw this practice at their cricket matches.
The "wave" represents a powerful metaphor, from z/VM operating system on System z mainframes to VMware and Xenon Intel-based processor machines, as the direction of virtualization that we are heading for future data centers.The Mexican wave represents a glimpse of what humans can accomplish with collaboration on a globalscale. It can also represent the tidal wave of data arising from nearly 60 percent annual growth instorage capacity. (I had to mention storage eventually, to avoid being completely off-topic on this post!)
I hope this is the graphic you were looking for Bob. If anyone else has wave-themed posters they would like to contribute, please post a comment below.
Wrapping up my week's theme of storage optimization, I thought I would help clarify the confusion between data reduction and storage efficiency. I have seen many articles and blog posts that either use these two terms interchangeably, as if they were synonyms for each other, or as if one is merely a subset of the other.
Data Reduction is LOSSY
By "Lossy", I mean that reducing data is an irreversible process. Details are lost, but insight is gained. In his paper, [Data Reduction Techniques", Rajana Agarwal defines this simply:
"Data reduction techniques are applied where the goal is to aggregate or amalgamate the information contained in large data sets into manageable (smaller) information nuggets."
Data reduction has been around since the 18th century.
Take for example this histogram from [SearchSoftwareQuality.com]. We have reduced ninety individual student scores, and reduced them down to just five numbers, the counts in each range. This can provide for easier comprehension and comparison with other distributions.
The process is lossy. I cannot determine or re-create an individual student's score from these five histogram values.
This next example, complements of [Michael Hardy], represents another form of data reduction known as ["linear regression analysis"]. The idea is to take a large set of data points between two variables, the x axis along the horizontal and the y axis along the vertical, and find the best line that fits. Thus the data is reduced from many points to just two, slope(a) and intercept(b), resulting in an equation of y=ax+b.
The process is lossy. I cannot determine or re-create any original data point from this slope and intercept equation.
In this last example, from [Yahoo Finance], reduces millions of stock trades to a single point per day, typically closing price, to show the overall growth trend over the course of the past year.
The process is lossy. Even if I knew the low, high and closing price of a particular stock on a particular day, I would not be able to determine or re-create the actual price paid for individual trades that occurred.
Storage Efficiency is LOSSLESS
By contrast, there are many IT methods that can be used to store data in ways that are more efficient, without losing any of the fine detail. Here are some examples:
Thin Provisioning: Instead of storing 30GB of data on 100GB of disk capacity, you store it on 30GB of capacity. All of the data is still there, just none of the wasteful empty space.
Space-efficient Copy: Instead of copying every block of data from source to destination, you copy over only those blocks that have changed since the copy began. The blocks not copied are still available on the source volume, so there is no need to duplicate this data.
Archiving and Space Management: Data can be moved out of production databases and stored elsewhere on disk or tape. Enough XML metadata is carried along so that there is no loss in the fine detail of what each row and column represent.
Data Deduplication: The idea is simple. Find large chunks of data that contain the same exact information as an existing chunk already stored, and merely set a pointer to avoid storing the duplicate copy. This can be done in-line as data is written, or as a post-process task when things are otherwise slow and idle.
When data deduplication first came out, some lawyers were concerned that this was a "lossy" approach, that somehow documents were coming back without some of their original contents. How else can you explain storing 25PB of data on only 1PB of disk?
(In some countries, companies must retain data in their original file formats, as there is concern that converting business documents to PDF or HTML would lose some critical "metadata" information such as modificatoin dates, authorship information, underlying formulae, and so on.)
Well, the concern applies only to those data deduplication methods that calculate a hash code or fingerprint, such as EMC Centera or EMC Data Domain. If the hash code of new incoming data matches the hash code of existing data, then the new data is discarded and assumed to be identical. This is rare, and I have only read of a few occurrences of unique data being discarded in the past five years. To ensure full integrity, IBM ProtecTIER data deduplication solution and IBM N series disk systems chose instead to do full byte-for-byte comparisons.
Compression: There are both lossy and lossless compression techniques. The lossless Lempel-Ziv algorithm is the basis for LTO-DC algorithm used in IBM's Linear Tape Open [LTO] tape drives, the Streaming Lossless Data Compression (SLDC) algorithm used in IBM's [Enterprise-class TS1130] tape drives, and the Adaptive Lossless Data Compression (ALDC) used by the IBM Information Archive for its disk pool collections.
Last month, IBM announced that it was [acquiring Storwize. It's Random Access Compression Engine (RACE) is also a lossless compression algorithm based on Lempel-Ziv. As servers write files, Storwize compresses those files and passes them on to the destination NAS device. When files are read back, Storwize retrieves and decompresses the data back to its original form.
As with tape, the savings from compression can vary, typically from 20 to 80 percent. In other words, 10TB of primary data could take up from 2TB to 8TB of physical space. To estimate what savings you might achieve for your mix of data types, try out the free [Storwize Predictive Modeling Tool].
So why am I making a distinction on terminology here?
Data reduction is already a well-known concept among specific industries, like High-Performance Computing (HPC) and Business Analytics. IBM has the largest marketshare in supercomputers that do data reduction for all kinds of use cases, for scientific research, weather prediction, financial projections, and decision support systems. IBM has also recently acquired a lot of companies related to Business Analytics, such as Cognos, SPSS, CoreMetrics and Unica Corp. These use data reduction on large amounts of business and marketing data to help drive new sources of revenues, provide insight for new products and services, create more focused advertising campaigns, and help understand the marketplace better.
There are certainly enough methods of reducing the quantity of storage capacity consumed, like thin provisioning, data deduplication and compression, to warrant an "umbrella term" that refers to all of them generically. I would prefer we do not "overload" the existing phrase "data reduction" but rather come up with a new phrase, such as "storage efficiency" or "capacity optimization" to refer to this category of features.
IBM is certainly quite involved in both data reduction as well as storage efficiency. If any of my readers can suggest a better phrase, please comment below.
(FTC Disclosure: I do not work or have any financial investments in ENC Security Systems. ENC Security Systems did not paid me to mention them on this blog. Their mention in this blog is not an endorsement of either their company or any of their products. Information about EncryptStick was based solely on publicly available information and my own personal experiences. My friends at ENC Security Systems provided me a full-version pre-loaded stick for this review.)
The EncryptStick software comes in two flavors, a free/trial version, and the full/paid version. The free trial version has [limits on capacity and time] but provides enough glimpse of the product to decide before you buy the full version. You can download the software yourself and put in on your own USB device, or purchase the pre-loaded stick that comes with the full-version license.
Whichever you choose, the EncryptStick offers three nice protection features:
Encryption for data organized in "storage vaults", which can be either on the stick itself, or on any other machine the stick is connected to. That is a nice feature, because you are not limited to the capacity of the USB stick.
Encrypted password list for all your websites and programs.
A secure browser, that prevents any key-logging or malware that might be on the host Windows machine.
I have tried out all three functions and everything works as advertised. However, there is always room for improvement, so here are my suggestions.
The first problem is that the pre-loaded stick looks like it is worth a million dollars. It is in a shiny bronze color with "EncryptStick" emblazoned on it. This is NOT subtle advertising! This 8GB capacity stick looks like it would be worth stealing solely on being a nice piece of jewelry, and then the added bonus that there might be "valuable secrets" just makes that possibility even more likely.
If you want to keep your information secure, it would help to have "plausible deniability" that there is nothing of value on a stick. Either have some corporate logo on it, of have the stick look like a cute animal, like these pig or chicken USB sticks.
It reminds me how the first Apple iPod's were in bright [Mug-me White]. I use black headphones with my black iPod to avoid this problem.
Of course, you can always install the downloadable version of EncryptStick software onto a less conspicuous stick if you are concerned about theft. The full/paid version of EncryptStick offers an option for "lost key recovery" which would allow you to backup the contents of the stick and be able to retrieve them on a newly purchased stick in the event your first one is lost or stolen.
Imagine how "unlucky" I felt when I notice that I had lost my "rabbits feet" on this cute animal-themed USB stick.
I sense trouble for losing the cap on my EncryptStick as well. This might seem trivial, but is a pet-peeve of mine that USB sticks should plan for this. Not only is there nothing to keep the cap on (it slides on and off quite smoothly), but there is no loop to attach the cap to anything if you wanted to.
Since then, I got smart and try to look for ways to keep the cap connected. Some designs, like this IBM-logoed stick shown above, just rotate around an axle, giving you access when you need it, and protection when it is folded closed.
Alternatively, get a little chain that allows you to attach the cap to the main stick. In the case of the pig and chicken, the memory section had a hole pre-drilled and a chain to put through it. I drilled an extra hole in the cap section of each USB stick, and connected the chain through both pieces.
(Warning: Kids, be sure to ask for assistance from your parents before using any power tools on small plastic objects.)
The EncryptStick can run on either Microsoft Windows or Mac OS. The instructions indicate that you can install both versions of download software onto a single stick, so why not do that for the pre-loaded full version? The stick I have had only the Windows version pre-loaded. I don't know if the Windows and Mac OS versions can unlock the same "storage vaults" on the stick.
Certainly, I have been to many companies where either everyone runs Windows or everyone runs Mac OS. If the primary target audience is to use this stick at work in one of those places, then no changes are required. However, at IBM, we have employees using Windows, Mac OS and Linux. In my case, I have all three! Ideally, I would like a version of EncryptStick that I could take on trips with me that would allow me to use it regardless of the Operating System I encountered.
Since there isn't a Linux-version of EncryptStick software, I decided to modify my stick to support booting Linux. I am finding more and more Linux kiosks when I travel, especially at airports and high-traffic locations, so having a stick that works both in Windows or Linux would be useful. Here are some suggestions if you want to try this at home:
Use fdisk to change the FAT32 partition type from "b" to "c". Apparently, Grub2 requires type "c", but the pre-loaded EncryptStick was set to "b". The Windows version of EncryptStick> seems to work fine in either mode, so this is a harmless change.
Install Grub2 with "grub-install" from a working Linux system.
Once Grub2 is installed, you can boot ISO images of various Linux Rescue CDs, like [PartedMagic] which includes the open-source [TrueCrypt] encryption software that you could use for Linux purposes.
This USB stick could also be used to help repair a damaged or compromised Windows system. Consider installing [Ophcrack] or [Avira].
Certainly, 8GB is big enough to run a full Linux distribution. The latest 32-bit version of [Ubuntu] could run on any 32-bit or 64-bit Intel or AMD x86 machine, and have enough room to store an [encrypted home directory].
Since the stick is formatted FAT32, you should be able to run your original Windows or Mac OS version of EncryptStick with these changes.
Depending on where you are, you may not have the luxury to reboot a system from the USB memory stick. Certainly, this may require changes to the boot sequence in the BIOS and/or hitting the right keys at the right time during the boot sequence. I have been to some "Internet Cafes" that frown on this, or have blocked this altogether, forcing you to boot only from the hard drive.
Well, those are my suggestions. Whether you go on a trip with or without your laptop, it can't hurt to take this EncryptStick along. If you get a virus on your laptop, or have your laptop stolen, then it could be handy to have around. If you don't bring your laptop, you can use this at Internet cafes, hotel business centers, libraries, or other places where public computers are available.
Are you tired of hearing about Cloud Computing without having any hands-on experience? Here's your chance. IBM has recently launched its IBM Development and Test Cloud beta. This gives you a "sandbox" to play in. Here's a few steps to get started:
Generate a "key pair". There are two keys. A "public" key that will reside in the cloud, and a "private" key that you download to your personal computer. Don't lose this key.
Request an IP address. This step is optional, but I went ahead and got a static IP, so I don't have to type in long hostnames like "vm353.developer.ihost.com".
Request storage space. Again, this step is optional, but you can request a 50GB, 100GB and 200GB LUN. I picked a 200GB LUN. Note that each instance comes with some 10 to 30GB storage already. The advantage to a storage LUN is that it is persistent, and you can mount it to different instances.
Start an "instance". An "instance" is a virtual machine, pre-installed with whatever software you chose from the "asset catalog". These are Linux images running under Red Hat Enterprise Virtualization (RHEV) which is based on Linux's kernel virtual machine (KVM). When you start an instance, you get to decide its size (small, medium, or large), whether to use your static IP address, and where to mount your storage LUN. On the examples below, I had each instance with a static IP and mounted the storage LUN to /media/storage subdirectory. The process takes a few minutes.
So, now that you are ready to go, what instance should you pick from the catalog? Here are three examples to get you started:
IBM WebSphere sMASH Application Builder
Base OS server to run LAMP stack
Next, I decided to try out one of the base OS images. There are a lot of books on Linux, Apache, MySQL and PHP (LAMP) which represents nearly 70 percent of the web sites on the internet. This instance let's you install all the software from scratch. Between Red Hat and Novell SUSE distributions of Linux, Red Hat is focused on being the Hypervisor of choice, and SUSE is focusing on being the Guest OS of choice. Most of the images on the "asset catalog" are based on SLES 10 SP2. However, there was a base OS image of Red Hat Enterprise Linux (RHEL) 5.4, so I chose that.
To install software, you either have to find the appropriate RPM package, or download a tarball and compile from source. To try both methods out, I downloaded tarballs of Apache Web Server and PHP, and got the RPM packages for MySQL. If you just want to learn SQL, there are instances on the asset catalog with DB2 and DB2 Express-C already pre-installed. However, if you are already an expert in MySQL, or are following a tutorial or examples based on MySQL from a classroom textbook, or just want a development and test environment that matches what your company uses in production, then by all means install MySQL.
This is where my SSH client comes in handy. I am able to login to my instance and use "wget" to fetch the appropriate files. An alternative is to use "SCP" (also part of PuTTY) to do a secure copy from your personal computer up to the instance. You will need to do everything via command line interface, including editing files, so I found this [VI cheat sheet] useful. I copied all of the tarballs and RPMs on my storage LUN ( /media/storage ) so as not to have to download them again.
Compiling and configuring them is a different matter. By default, you login as an end user, "idcuser" (which stands for IBM Developer Cloud user). However, sometimes you need "root" level access. Use "sudo bash" to get into root level mode, and this allows you to put the files where they need to be. If you haven't done a configure/make/make install in awhile, here's your chance to relive those "glory days".
In the end, I was able to confirm that Apache, MySQL and PHP were all running correctly. I wrote a simple index.php that invoked phpinfo() to show all the settings were set correctly. I rebooted the instance to ensure that all of the services started at boot time.
Rational Application Developer over VDI
This last example, I started an instance pre-installed with Rational Application Developer (RAD), which is a full Integrated Development Environment (IDE) for Java and J2EE applications. I used the "NX Client" to launch a virtual desktop image (VDI) which in this case was Gnome on SLES 10 SP2. You might want to increase the screen resolution on your personal computer so that the VDI does not take up the entire screen.
From this VDI, you can launch any of the programs, just as if it were your own personal computer. Launch RAD, and you get the familiar environment. I created a short Java program and launched it on the internal WebSphere Application Server test image to confirm it was working correctly.
If you are thinking, "This is too good to be true!" there is a small catch. The instances are only up and running for 7 days. After that, they go away, and you have to start up another one. This includes any files you had on the local disk drive. You have a few options to save your work:
Copy the files you want to save to your storage LUN. This storage LUN appears persistent, and continues to exist after the instance goes away.
Take an "image" of your "instance", a function provided in the IBM Developer and Test Cloud. If you start a project Monday morning, work on it all week, then on Friday afternoon, take an "image". This will shutdown your instance, and backup all of the files to your own personal "asset catalog" so that the next time you request an instance, you can chose that "image" as the starting point.
Another option is to request an "extension" which gives you another 7 days for that instance. You can request up to five unique instances running at the same time, so if you wanted to develop and test a multi-host application, perhaps one host that acts as the front-end web server, another host that does some kind of processing, and a third host that manages the database, this is all possible. As far as I can tell, you can do all the above from either a Windows, Mac or Linux personal computer.
Getting hands-on access to Cloud Computing really helps to understand this technology!
I've gotten suggestions to upgrade the memory and disk storage, and how to fine-tune the Microsoft Windows XP operating system. Others suggested replacing the OS with Linux, and to use the Cloud to avoid some of the storage space limitations.
But first, I have to mention the latest in our series of "Enterprise Systems" videos. The first was being [Data Ready]. The second was being [Security Ready]. The now the third in the series: the 3-minute
[Cloud Ready] video.
So I decided to try different Cloud-oriented Operating Systems, to see if any would be a good fit. Here is what I found:
(FTC Disclosure: I work for IBM and own IBM stock. This blog post is not meant to endorse one OS over another. I have financial interests in, and/or have friends and family who work at some of the various companies mentioned in this post. Some of these companies also have business relationships with IBM.)
Jolicloud and Joli OS 1.2
I gave this OS a try. This is based on Linux, but with an interesting approach. First, you have to be on-line all the time, and this OS is designed for 15-25 year-olds who are on social media websites like Facebook. By having a Jolicloud account, you can access this from any browser on any system, or run the Joli OS operating system, or buy the already pre-installed Jolibook netbook computer.
The Joli OS 1.2 LiveCD ran fine on my T410 with 4GB or RAM, giving me a chance to check it out, but sadly did not run on grandma's Thinkpad R31 with 384MB of RAM. According to the [Jolicloud specifications], Joli OS should run in as little as 384MB of RAM and 2GB of disk storage space, but it didn't for me.
Google Chrome and Chromium OS Vanilla
Like the Jolibook, Google has come out with a $249 Chromebook laptop that runs their "Chrome OS". This is only available via OEM install on desginated hardware, but the open source version is available called Chromium OS. These are also based on Linux.
Rather than compiling from source, Hexxeh has made nightly builds available. You can download [Chromium OS Vanilla] zip file, unzip the image file, and copy it to a 4GB USB memory stick. The compressed image is about 300MB, but uncompressed about 2.5GB, so too big to fit on a CD. The image on the USB stick is actually two partitions, and cannot be run from DVD either.
If you don't have a 4GB USB stick handy, and want to see what all the fuss is about, just install the Google Chrome browser on your Windows or Linux system, and then maximize the browser window. That's it. That is basically what Chromium OS is all about.
Files can be stored locally, or out on your Google Drive. Documents can be edited using "Google Docs" in the Cloud. You can run in "off-line" mode, for example, read your Gmail notes when not connected to the Internet. Music and video files can be played using the "Files" app.
If you really need to get out of the browser, you can hit the right combination of keys to get to the "crosh" command line shell.
Like Joli OS, I was able to run this from my Thinkpad T410 with 4GB of RAM, but not on grandma's Thinkpad R31. It appears that Chromium requires at least 1GB of RAM to run properly.
Android for x86
While researching the Chromium OS, I found that there is an open source community porting [Android to the x86] platform. Android is based on Linux, and would allow your laptop or netbook to run very much like a smartphone or tablet. Most of the apps available to Android should work here as well.
Unfortunately, the project has focused only on selected hardware:
ASUS Eee PCs/Laptops
Viewsonic Viewpad 10
Dell Inspiron Mini Duo
Lenovo ThinkPad x61 Tablet
I tried running the Thinkpad x61 version on both my Thinkpad T410 and grandma's Thinkpad R31, but with no success.
Peppermint OS Three
Next up was Peppermint OS, which claims to be a blend of Linux Mint, Lubuntu, and Xfce, but with a "twist" of aspiring to be a Cloud-oriented OS.
Rather than traditional apps to write documents or maintain a calendar, this OS offers a "Single-Site Browser" (SSB) experience, where you can configure "apps" by pointing to their respective URL. For documents, launch GWoffice, the client for Google Docs. For calendar, launch Google Calendar.
Most Linux distros have both a number and a project name associated with them. For example, Ubuntu 10.04 LTS is known as "Lucid Lynx". The Peppermint OS team avoided this by just calling their latest version "Three" which serves as both its number and its name.
The browser is Chromium, similar to Google Chrome OS above, and uses the "DuckDuckGo" search engine. This is how the Peppermint OS folks make their money to defray the costs of this effort.
Peppermint OS claims to run in systems as little as 192MB or RAM, and only 4GB of disk space. The LiveCD ran well on both my Thinkpad T410, as well as grandma's Thinkpad R31. More importantly, when I installed on the hard drive, it ran well.
The music app "Guayadeque" that came pre-installed was awful. It couldn't play MP3 music out-of-the-box. I had to install the Codec plugins from various "ubuntu-restricted-extras" libraries. I also installed the music app "Rhythmbox", and that worked great. Time from power-on to first-note was less than 2 minutes! However, the problems with the Guayadeque gave me the impression this OS might not be ready for primetime.
I contacted grandma to ask if she has Wi-Fi in her home, and sure enough, she doesn't. Her PC upstairs is direct attached to the cable modem. So, while the Cloud suggestion was worthy of investigation, I will continue to pursue other options that do not require being connected. I certainly do not want to spend any time and effort getting Wi-Fi installed there.