Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is a Master Inventor and Senior IT Specialist for the IBM System Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2011, Tony celebrated his 25th year anniversary with IBM Storage on the same day as the IBM's Centennial. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson
In last week's System Storage Portfolio Top Gun class in Dallas, some of the students were not familiarwith Really Simple Syndication (RSS). For the uninitiated, this can be intimidating.I thought a quick overview of what I've done might help:
Chose a "feed reader". I chose Bloglines but there are many others.
Use Technorati to search other blogs for keywords or phrases I am looking for.
When I find a blog that I like to continue tracking, I "add" it to my subscription list on bloglines. Just hit "add" and copy the URL of the blog you want to track. Bloglines will figure out the RSS keywords required.I track eight blogs at the momemnt, but some people with lots of time on their hands track 20 or more. It is easy to unsubscribe, so don't be afraid to try some out for a few days.
Since I was actually going to run a blog of my own, I read a few books on the topic. One I recommend is "Naked Conversations" by Robert Scoble and Shel Israel, both experienced bloggers.
Finally, I am not big on spell checking, but most places have the option to preview your post or comment before it actually gets posted, which is not a bad idea if you use any HTML tags.
For a quick taste of blogging, consider using Data Storage Blogger Feed Reader. This has a lot of blogs on the topic of storage, already added and categorized for your convenience, ready for your perusal.
I am sure there are many other ways to enjoy the Blogosphere, but this works for me.[Read More]
I get a lot of suggestions for what to put on my blog.I realize that tweets are limited to 140 characters, so pointing to a video URL without muchexplanation or warning can be dangerous. An email can at least add appropriate warnings,such NSFW (Not Safe For Work) or "sorry if this offends you". The only warning I got fora video posted to YouTube by "StorageNetworkDud" was this short email:
"Sorry about the language they have used in some translations, but not sure who put this. It was on twitter."
Fortunately, I have my browser set up not to automatically play YouTube videos. The titlehelped warn me of the content, which turned out to be a [fan-subbed] scene from a World War II movie with brown-shirted tyrannical leader of an evil empire talking to his top generals. He dismisses all but threewith "Hollis, Burke, and Twomey stay in here" followed by a lengthy recap of EMC's recent troublesin the marketplace. At least in the video, the fuhrer correctly follows Tim Sander's advice:"if you have to tell someone bad news, say it in person."
While I understand that many people don't like EMC, the #3 storagevendor in the world, this type of "geek humor" hits a new low. The video was posted over amonth ago, but in light of the recent [shooting in Washington DC], I felt it was just notappropriate to post it here.
Readers, I appreciate all the suggestions, but give me some better warning next time!
Continuing this week's theme on Cloud Computing, Dynamic Infrastructure and Data Center Networking, IBM unveiled details of an advanced computing system that will be able to compete with humans on Jeopardy!, America’s favorite quiz television show. Additionally, officials from Jeopardy! announced plans to produce a human vs. machine competition on the renowned show.
For nearly two years, IBM scientists have been working on a highly advanced Question Answering (QA) system, codenamed "Watson" after IBM's first president, [Thomas J. Watson]. The scientists believe that the computing system will be able to understand complex questions and answer with enough precision and speed to compete on Jeopardy!Produced by Sony Pictures Television, the trivia questions on Jeopardy! cover a broad range of topics, such as history, literature, politics, film, pop culture, and science. It poses a grand challenge for a computing system due to the variety of subject matter, the speed at which contestants must provide accurate responses, and because the clues given to contestants involve analyzing subtle meaning, irony, riddles, and other complexities at which humans excel and computers traditionally do not. Watson will incorporate massively parallel analytical capabilities and, just like human competitors, Watson will not be connected to the Internet or have any other outside assistance.
If this all sounds familiar, you might remember some of the events that have led up to this:
In 1984, the movie ["The Terminator"] introduced the concept of [Skynet], a fictional computer system developed by the militarythat becomes self-aware from its advanced artificial intelligence.
In 1997, an IBM computer called Deep Blue defeated World Chess Champion [Garry Kasparov] in a famous battle of human versus machine. To compete at chess, IBM built an extremely fast computer that could calculate 200 million chess moves per second based on a fixed problem. IBM’s Watson system, on the other hand, is seeking to solve an open-ended problem that requires an entirely new approach – mainly through dynamic, intelligent software – to even come close to competing with the human mind. Despite their massive computational capabilities, today’s computers cannot consistently analyze and comprehend sentences, much less understand cryptic clues and find answers in the same way the human brain can.
In 2005, Ray Kurzweil wrote [The Singularity is Near] referring to the wonders that artificial intelligence will bring to humanity.
The research underlying Watson is expected to elevate computer intelligence and human-to-computer communication to unprecedented levels. IBM intends to apply the unique technological capabilities being developed for Watson to help clients across a wide variety of industries answer business questions quickly and accurately.
In the post [Flowing Workflow], the folks over at Eightbar point to the latest 3D work being done with IBM Lotus Sametime.
IBM Sametime is IBM's instant messaging facility, which has been extended to include Voice over IP (VOIP) capability similar to Skype, and now is being developed as a launch point for 3D impromptu meetings "in-world", similar to [Second Life].
With many companies facing hard times and considering travel restrictions for face-to-faceinternal meetings, an information infrastructure that adopts this technology might be a reasonable alternative.
In Monday's post, [IBM Information Infrastructure launches today], I explained how this strategic initiative fit into IBM's New EnterpriseData Center vision. The launch was presented at the IBM Storage and Storage Networking Symposium to over 400 attendeesin Montpelier, France, with corresponding standing-room-only crowds in New York and Tokyo.
This post will focus on Information Retention, the third of the four-part series this week.
Here's another short 2-minute video, on Information Retention
Let's start with some interesting statistics.Fellow blogger Robin Harris on his StorageMojo blog has an interesting post:[Our changing file workloads],which discusses the findings of study titled"Measurement and Analysis of Large-Scale Network File System Workloads"[14-page PDF]. This paper was a collaborationbetween researchers from University of California Santa Cruz and our friends at NetApp.Here's an excerpt from the study:
Compared to Previous Studies:
Both of our workloads are more write-oriented. Read to write byte ratios have significantly decreased.
Read-write access patterns have increased 30-fold relative to read-only and write-only access patterns.
Most bytes are transferred in longer sequential runs. These runs are an order of magnitude larger.
Most bytes transferred are from larger files. File sizes are up to an order of magnitude larger.
Files live an order of magnitude longer. Fewer than 50 percent are deleted within a day of creation.
Files are rarely re-opened. Over 66 percent are re-opened once and 95% fewer than five times.
Files re-opens are temporally related. Over 60 percent of re-opens occur within a minute of the first.
A small fraction of clients account for a large fraction of file activity. Fewer than 1 percent of clients account for50 percent of file requests.
Files are infrequently shared by more than one client. Over 76 percent of files are never opened by more than one client.
File sharing is rarely concurrent and sharing is usually read-only. Only 5 percent of files opened by multiple clients are concurrent and 90 percent of sharing is read-only.
Most file types do not have a common access pattern.
Why are files being kept ten times longer than before? Because the information still has value:
Provide historical context
Gain insight to specific situations, market segment demographics, or trends in the greater marketplace
Help innovate new ideas for products and services
Make better, smarter decisions
National Public Radio (NPR) had an interesting piece the other day. By analyzing old photos, a researcher for Cold War Analysis was able to identify an interesting [pattern for Russian presidents]. (Be sure to listen to the 3-minute audio to hear a hilarious song about the results!)
Which brings me to my own collection of "old photos". I bought my first digital camera in the year 2000,and have taken over 15,000 pictures since then. Before that,I used 35mm film camera, getting the negatives developed and prints made. Some of these date back to my years in High School and College. I have a mix of sizes, from 3x5, 4x6 and 5x7 inches,and sometimes I got double prints.Only a small portion are organized intoscrapbooks. The rest are in envelopes, prints and negatives, in boxes taking up half of my linen closet in my house.Following the success of the [Library of Congress using flickr],I decided the best way to organize these was to have them digitized first. There are several ways to do this.
This method is just too time consuming. Lift the lid place 1 or a few prints face down on the glass, close the lid,press the button, and then repeat. I estimate 70 percent of my photos are in [landscape orientation], and 30 percent in [portrait mode]. I can either spend extra time toorient each photo correctly on the glass, or rotate the digital image later.
I was pleased to learn that my Fujitsu ScanSnap S510 sheet-feed scanner can take in a short stack (dozen or so) photos, and generate JPEG format files for each. I can select 150, 300 or 600dpi, and five levels of JPEG compression.All the photos feed in portrait mode, which I can then rotate later on the computer once digitized.A command line tool called [ImageMagick] can help automate the rotations.While I highly recommend the ScanSnap scanner, this is still a time-consuming process for thousands of photos.
"The best way to save your valuable photos may be by eliminating the paper altogether. Consider making digital images of all your photos."
Here's how it works:You ship your prints (or slides, or negatives) totheir facility in Irvine, California. They have a huge machine that scans them all at 300dpi, no compression, andthey send back your photos and a DVD containing digitized versions in JPEG format, all for only 50 US dollars plusshipping and handling, per thousand photos. I don't think I could even hire someone locally to run my scanner for that!
The deal got better when I contacted them. For people like me with accounts on Facebook, flickr, MySpace or Blogger,they will [scan your first 1000 photos for free] (plus shipping and handling). I selected a thousand 4x6" photos from my vast collection, organized them into eight stacks with rubber bands,and sent them off in a shoe box. The photos get scanned in landscape mode, so I had spent about four hours in preparing what I sent them, making sure they were all face up, with the top of the picture oriented either to the top or left edge.For the envelopes that had double prints, I "deduplicated" them so that only one set got scanned.
The box weighed seven pounds, and cost about 10 US dollars to send from Tucson to Irvinevia UPS on Tuesday. They came back the following Monday, all my photos plus the DVD, for 20 US dollars shipping and handling. Each digital image is about 1.5MB in size, roughly 1800x1200 pixels in size, so easily fit on a single DVD. The quality is the sameas if I scanned them at 300dpi on my own scanner, and comparable to a 2-megapixel camera on most cell phones.Certainly not the high-res photos I take with my Canon PowerShot, but suitable enough for email or Web sites. So, for about 30 US dollars, I got my first batch of 1000 photos scanned.
ScanMyPhotos.com offers a variety of extra priced options, like rotating each file to the correct landscape or portrait orientation, color correction, exact sequence order, hosting them on their Web site online for 30 days to share with friends and family, and extra copies of the DVD.All of these represent a trade-off between having them do it for me for an additional fee, or me spending time doing it myself--either before in the preparation, or afterwards managing the digital files--so I can appreciate that.
Perhaps the weirdest option was to have your original box returned for an extra $9.95? If you don't have a hugecollection of empty shoe boxes in your garage, you can buy a similarly sized cardboard box for only $3.49 at the local office supply store, so I don't understand this one. The box they return all your photos in can easily be used for the next batch.
I opted not to get any of these extras. The one option I think they should add would be to have them just discardthe prints, and send back only the DVD itself. Or better yet, discard the prints, and email me an ISO file of the DVD that I can burn myself on my own computer.Why pay extra shipping to send back to me the entire box of prints, just so that I can dump the prints in the trash myself? I will keep the negatives, in case I ever need to re-print with high resolution.
Overall, I am thoroughlydelighted with the service, and will now pursue sending the rest of my photos in for processing, and reclaim my linen closet for more important things. Now that I know that a thousand 4x6 prints weighs 7 pounds, I can now estimate how many photos I have left to do, and decide on which discount bulk option to choose from.
With my photos digitized, I will be able to do all the things that IBM talks about with Information Retention:
Place them on an appropriate storage tier. I can keep them on disk, tape or optical media.
Easily move them from one storage tier to another. Copying digital files in bulk is straightforward, and as new techhologies develop, I can refresh the bits onto new media, to avoid the "obsolescence of CDs and DVDs" as discussed in this article in[PC World].
Share them with friends and family, either through email, on my Tivo (yes, my Tivo is networked to my Mac and PC and has the option to do this!), or upload themto a photo-oriented service like [Kodak Gallery or flickr].
Keep multiple copies in separate locations. I could easily burn another copy of the DVD myself and store in my safe deposit box or my desk at work.With all of the regional disasters like hurricanes, an alternative might be to backup all your files, including your digitized photos, with an online backup service like [IBM Information Protection Services] from last year's acquisition of Arsenal Digital.
If the prospect of preserving my high school and college memories for the next few decades seems extreme,consider the [Long Now Foundation] is focused on retaining information for centuries.They areeven suggesting that we start representing years with five digits, e.g., 02008, to handle the deca-millennium bug which will come into effect 8,000 years from now. IBM researchers are also working on [long-term preservation technologies and open standards] to help in this area.
For those who only read the first and last paragraphs of each post, here is my recap:Information Retention is about managing [information throughout its lifecycle], using policy-based automation to help with the placement, movement and expiration. An "active archive" of information serves to helpgain insight, innovate, and make better decisions. Disk, tape, and blended disk-and-tape solutions can all play a part in a tiered information infrastructure for long-term retention of information.
Based on this success, and perhaps because I am also fluent in Spanish, I was asked to help with Proyecto Ceibal, the team for OLPC Uruguay. Normally theXS school server resides at the school location itself, so that even if the internet connection is disrupted or limited, the school kids can continue to access each other and the web cache content until internet connection is resumed.However, with a diverse developmentteam with people in United States, Uruguay, and India, we first looked to Linux hosting providers that wouldagree to provide free or low-cost monthly access. We spent (make that "wasted") the month of May investigating.Most that I talked to were not interested in having a customized Linux kernel on non-standard hardware on their shop floor, and wanted instead to offer their own standard Linux build on existing standard servers, managed by theirown system administrators, or were not interested in providing it for free. Since the XS-163 kernel is customizedfor the x86 architecture, it is one of those exceptions where we could not host it on an IBM POWER or mainframe as a virtual guest.
This got picked up as an [idea] for the Google's[Summer of Code] and we are mentoring Tarun, a 19-year-old student to actas lead software developer. However, summer was fast approaching, and we wanted this ready for the next semester. In June, our project leader, Greg, came up with a new plan. Build a machine and have it connected at an internet service provider that would cover the cost of bandwidth, and be willing to accept this with remote administration. We found a volunteer organization to cover this -- Thank you Glen and Vicki!
We found a location, so the request to me sounded simple enough: put together a PC from commodity parts that meet the requirements of the customizedLinux kernel, the latest release being called [XS-163]. The server would have two disk drives, three Ethernet ports, and 2GB of memory; and be installed with the customized XS-163 software, SSHD for remote administration, Apache web server, PostgreSQL database and PHP programming language.Of course, the team wanted this for as little cost as possible, and for me to document the process, so that it could be repeated elsewhere. Some stretch goals included having a dual-boot with Debian 4.0 Etch Linux for development/test purposes, an alternative database such as MySQL for testing, a backup procedure, and a Recover-DVD in case something goes wrong.
Some interesting things happened:
The XS-163 is shipped as an ISO file representing a LiveCD bootable Linux that will wipe your system cleanand lay down the exact customized software for a one-drive, three-Ethernet-port server. Since it is based on Red Hat's Fedora 7 Linux base, I found it helpful to install that instead, and experiment moving sections of code over.This is similar to geneticists extracting the DNA from the cell of a pit bull and putting it into the cell for a poodle. I would not recommend this for anyone not familiar with Linux.
I also experimented with modifying the pre-built XS-163 CD image by cracking open the squashfs, hacking thecontents, and then putting it back together and burning a new CD. This provided some interesting insight, but in the end was able to do it all from the standard XS-163 image.
Once I figured out the appropriate "scaffolding" required, I managed to proceed quickly, with running versionsof XS-163, plain vanilla Fedora 7, and Debian 4, in a multi-boot configuration.
The BIOS "raid" capability was really more like BIOS-assisted RAID for Windows operating system drivers. This"fake raid" wasn't supported by Linux, so I used Linux's built-in "software raid" instead, which allowed somepartitions to be raid-mirrored, and other partitions to be un-mirrored. Why not mirror everything? With two160GB SATA drives, you have three choices:
No RAID, for a total space of 320GB
RAID everything, for a total space of 160GB
Tiered information infrastructure, use RAID for some partitions, but not all.
The last approach made sense, as a lot of of the data is cache web page images, and is easily retrievable fromthe internet. This also allowed to have some "scratch space" for downloading large files and so on. For example,90GB mirrored that contained the OS images, settings and critical applications, and 70GB on each drive for scratchand web cache, results in a total of 230GB of disk space, which is 43 percent improvement over an all-RAID solution.
While [Linux LVM2] provides software-based "storage virtualization" similar to the hardware-based IBM System Storage SAN Volume Controller (SVC), it was a bad idea putting different "root" directories of my many OS images on there. With Linux, as with mostoperating systems, it expects things to be in the same place where it last shutdown, but in a multi-boot environment, you might boot the first OS, move things around, and then when you try to boot second OS, it doesn'twork anymore, or corrupts what it does find, or hangs with a "kernel panic". In the end, I decided to use RAIDnon-LVM partitions for the root directories, and only use LVM2 for data that is not needed at boot time.
While they are both Linux, Debian and Fedora were different enough to cause me headaches. Settings weredifferent, parameters were different, file directories were different. Not quite as religious as MacOS-versus-Windows,but you get the picture.
During this time, the facility was out getting a domain name, IP address, subnet mask and so on, so I testedwith my internal 192.168.x.y and figured I would change this to whatever it should be the day I shipped the unit.(I'll find out next week if that was the right approach!)
Afraid that something might go wrong while I am in Tokyo, Japan next week (July 7-11), or Mumbai, India the following week (July 14-18), I added a Secure Shell [SSH] daemon that runs automaticallyat boot time. This involves putting the public key on the server, and each remote admin has their own private key on their own client machine.I know all about public/private key pairs, as IBM is a leader in encryption technology, and was the first todeliver built-in encryption with the IBM System Storage TS1120 tape drive.
To have users have access to all their files from any OS image required that I either (a) have identical copieseverywhere, or (b) have a shared partition. The latter turned out to be the best choice, with an LVM2 logical volumefor "/home" directory that is shared among all of the OS images. As we develop the application, we might findother directories that make sense to share as well.
For developing across platforms, I wanted the Ethernet devices (eth0, eth1, and so on) match the actual ports they aresupposed to be connected to in a static IP configuration. Most people use DHCP so it doesn't matter, but the XSsoftware requires this, so it did. For example, "eth0" as the 1 Gbps port to the WAN, and "eth1/eth2" as the two 10/100 Mbps PCI NIC cards to other servers.Naming the internet interfaces to specific hardware ports wasdifferent on Fedora and Debian, but I got it working.
While it was a stretch goal to develop a backup method, one that could perform Bare Machine Recovery frommedia burned by the DVD, it turned out I needed to do this anyways just to prevent me from losing my work in case thingswent wrong. I used an external USB drive to develop the process, and got everything to fit onto a single 4GB DVD. Using IBM Tivoli Storage Manager (TSM) for this seemed overkill, and [Mondo Rescue] didn't handle LVM2+RAID as well as I wanted, so I chose [partimage] instead, which backs up each primary partition, mirrored partition, or LVM2 logical volume, keeping all the time stamps, ownerships, and symbolic links in tact. It has the ability to chop up the output into fixed sized pieces, which is helpful if you are goingto burn them on 700MB CDs or 4.7GB DVDs. In my case, my FAT32-formatted external USB disk drive can't handle files bigger than 2GB, so this feature was helpful for that as well. I standardized to 660 GiB [about 692GB] per piece, sincethat met all criteria.
The folks at [SysRescCD] saved the day. The standard "SysRescueCD" assigned eth0, eth1, and eth2 differently than the three base OS images, but the nice folks in France that write SysRescCD created a customized[kernel parameter that allowed the assignments to be fixed per MAC address ] in support of this project. With this in place, I was able to make a live Boot-CD that brings up SSH, with all the users, passwords,and Ethernet devices to match the hardware. Install this LiveCD as the "Rescue Image" on the hard disk itself, and also made a Recovery-DVD that boots up just like the Boot-CD, but contains the 4GB of backup files.
For testing, I used Linux's built-in Kernel-based Virtual Machine [KVM]which works like VMware, but is open source and included into the 2.6.20 kernels that I am using. IBM is the leadingreseller of Vmware and has been doing server virtualization for the past 40 years, so I am comfortable with thetechnology. The XS-163 platform with Apache and PostgreSQL servers as a platform for [Moodle], an open source class management system, and the combination is memory-intensive enough that I did not want to incur the overheads running production this manner, but it wasgreat for testing!
With all this in place, it is designed to not need a Linux system admin or XS-163/Moodle expert at the facility. Instead, all we need is someone to insert the Boot-CD or Recover-DVD and reboot the system if needed.
Just before packing up the unit for shipment, I changed the IP addresses to the values they need at the destination facility, updated the [GRUB boot loader] default, and made a final backup which burned the Recover-DVD. Hopefully, it works by just turning on the unit,[headless], without any keyboard, monitor or configuration required. Fingers crossed!
So, thanks to the rest of my team: Greg, Glen, Vicki, Tarun, Marcel, Pablo and Said. I am very excited to bepart of this, and look forward to seeing this become something remarkable!
There is a difference between improving "energy efficiency" versus reducing "power consumption".
Let's consider the average 100 watt light bulb, of which 5 watts generate the desired feature (light), and 95 percent generated as undesired waste (heat). In this case, it would be 5 percent efficient. If you delivered a new light bulb that generated 3 watts of light for only 30 watts of energy, then you would have an offering that was more energy efficient (10 percent instead of 5 percent) and use 70 percent less power (30 watts instead of 100 watts). This new "dim bulb" would not be as bright as the original, but has other desirable energy qualities.
Nearly all of the output of data center equipment results in heat.In The Raised Floor blog [It's Too Darn Hot!], Will Runyon explains how IBM researcher Bruno Michel in Zurich has developed new ways to cool chips with water shot through thousands of nozzles, much like capillaries in the human body. This is just one of many developments that are part of IBM's [Project Big Green]
But what if the desired feature is heat, and the undesired feature is light?In the case of Hasbro's toy[Easy-Bake Oven],a 100W incadescent light bulb is used to bake small cakes. This is generating 95W of desired heat, and onlywasting 5 percent as light (unused inside the oven). That makes this little toy 95 percent energy efficient, butconsumes as much energy as any other 100W light bulb lamp or fixture in your house. With manufacturing switchingfrom incadescent to compact flourescent bulbs, this toy oven may not be around much longer.
While we all joke that it is just a matter of time before our employers make us ride stationary bicycles attached to generators to power our monstrous data centers, 23-year old student Daniel Sheridan designeda see-saw for kids in Africa to play on that generates electricity for nearby schools. [Dan won the "mostinnovative product" at the Enterprise Festival].
Another approach is to improve efficiency by converting previously undesirable outcomes to desirable. Brian Bergstein has a piece in Forbes titled["Heat From Data Center to Warm a Pool"].Here's an excerpt:
"In a few cases, the heat produced by the computers is used to warm nearby offices. In what appears to be a first, the town pool in Uitikon, Switzerland, outside Zurich, will be the beneficiary of the waste heat from a data center recently built by IBM Corp. (nyse: IBM) for GIB-Services AG.
As in all data centers, air conditioners will blast the computers with chilly air - to keep the machines from exceeding their optimum temperature of around 70 degrees - and pump hot air out.
Usually, the hot air is vented outdoors and wasted. In the Uitikon center, it will flow through heat exchangers to warm water that will be pumped into the nearby pool. The town covered the cost of some of the connecting equipment but will get to use the heat for free."
I see a business opportunity here. Next to every data center lamenting about their power and cooling, build a state-of-the-art fitness center for the employees and nearby townspeople. Exercise on a stationary bicyclegenerating electricity, while your kids play on the see-saw generating electricity, and then afterwards thewhole family can take a dip in the heated swimming pool. And if the company subscribes to the notion of a Results-Oriented Work Environment [ROWE],it could encourage its employees to take "fitness" breaks throughout the day, rather than having everyone there in the early morning or late evening hours, leveling out the energy generated.
While many are just becoming familiar with the end-user interfaces of Web 2.0, from blogs and wikis to FaceBook and FlickR, fewer may be familiar with the "information infrastructure" of servers and storagebehind the scenes.
Last year, I bought an XO laptop under the One Laptop Per Child [OLPC] foundation's Give-1-Get-1 program and posted my impressions on this blog. One in particular, my post[Printingon XO laptop with CUPS and LPR] showed how to print from the XO laptop over to a network-attached printer.This caught the attention of the OLPC development team, who asked me tohelp them with another project as a volunteer. Before accepting, I had to learn what skills they were really looking for, especially since I do notconsider myself an expert in neither printing nor networking.
(Unlike a regular 9-to-5 job where most people just try to look busy for eight hours a day, doingvolunteer work means being ready to ["roll up your sleeves"] and actuallyaccomplish something. This applies to any kind of volunteer work, from hammering nails for [Habitat for Humanity] to sorting cans at the [Community Food Bank].Best Buy uses the phrase "Results Oriented Work Environment" [ROWE] to describetheir latest program, modeled in part after the mobile workforce policies of Web2.0-enlightened companiesIBM and Sun, but that is perhaps a topic for another blog post!)
Apparently, to support a school full of students with XO laptops, it would be nice to have a few serversthat provide support to manage the class lesson plans, make reading materials and other content available,and keep track of results. What they need is an "information infrastructure"! They decided on two specific servers:
School Server -- this would run a popular class management system called [Moodle]
Library Server -- a server for a digital library collection, based on Fedora Commons[16-minute video]
In keeping with OLPC philosophy to use free and open source software[FOSS], both servers are based on the [LAMP] platform. LAMP is an acronym for thecombined software bundle of Linux, Apache, MySQL and a Programming language like PHP. The "XS" team working onthe school server wanted me to build a LAMP server and install Moodle to help test the configuration, determinewhat other software is required, and perhaps develop a backup/recovery scenario. Basically, they needed someone with Linux skills to put some hardware and software together.
(I am no stranger to Linux. Back in the 1990s, I was part of the Linux for S/390 team, led the effort to createthe infamous "compatible disk layout" (CDL) that allows z/OS to access ESCON and FICON-attached Linux volumes,took my LPI certification exam, and led a team to validate FCP drivers for our disk and tape storage systems. For an IBMer to volunteer foran Open Source community project, you have to take an "open source" class and get management approval to reviewfor any possible "conflicts of interest". I got this all taken care of, and accepted to help the XS team.)
Building a test environment is similar to baking a cake. You have a recipe, utensils, and ingredients. Here'sa bit of description of each of the ingredients:
Like Windows, the Linux operating system comes in different flavors to run on handhelds, desktops and servers. For servers, IBM tends to focus on Red Hat Enterprise Linux (RHEL) and SUSE Linux Eneterprise Server (SLES). However, the XS team decidedinstead to use [Fedora 7], a community-supported version from Red Hat. Earlier versions of Fedora were known as "Fedora Core", but apparently with version 7, the word "Core" has been dropped. Fedora 7 can be used in either desktop or server mode.
[Apache] is web server software, and half of all web servers on the internet use it. It competes head-on against Micorosofts Internet Information Services (IIS) serverprovided in Windows 2003. The Apache name is partly from thefact that its origins were "a patchy" variant of the NCSA HTTPd 1.3 codebase. Thepopular [IBM HTTP Server] is poweredby Apache, with added support to the rest of the IBM WebSphere software portfolio. The XS team chose Apache v2as the web server platform.
[MySQL] is a relational database management system (RDBMS) software, similar to commercial products like IBM DB2 Universal Database, Oracle DB, or Microsoft SQL Server. The SQL stands for Structured Query Language, developed by IBM in the early 1970s as a standard languageto update and query database tables. MySQL comes in two flavors, MySQL Enterprise for commercial use, and MySQLCommunity, which is community-supported. There are over 10 million instances of MySQL running websites on the internet, which helps explain why Sun Microsystems agreed to acquire MySQL AB company last month.The XS team decided on MySQL 5.0 as the database platform.
To make HTML pages dynamic, including the possibility to add or query database contents, requires programming.A variety of web scripting languages were developed, all starting with the letter "P" to claim to be the programming part of the LAMP platform, including [PHP], Perl, and Python. Later, new programming language frameworks have been developed that do not start with the letter "P", like [Ruby on Rails]. PHP is short for PHP: Hypertext Preprocessor which explains that it pre-processes HTML during web serving,looking for special tags indicating PHP code, allowing programming logic to insert HTML content, such as information extracted from a database.While Python is the language that runs the Sugar interface on the XO laptops, the XS team decided onPHP v5 as the programming language for the server.
As for utensils, you only need a few utilities
A simple text editor: I go old-school and use the classic "vi" (to learn this editor, see the["Cheat Sheet" method] on IBM Developerworks)
secure socket shell (SSH): this allows you to access one server from another
browser access to the internet: when you encounter problems, get error messages, or whatever, it pays to know how to search for things with Google
As for a recipe, the Moodle website spells out some unique details and parameters. For the base LAMP platform,I chose to follow the book [Fedora 7 Unleashed] that has specific chapters on setting up SSH, Apache, MySQL, PHP, Squid and so on. The resultingconfiguration looks like this:
Here were the sequence of events:
I took an old PC that I wasn't using anymore, backed up the Windows system, and installed Linux on top. Thebook above had a Fedora 7 DVD on the back jacket, but I used the [OLPC LiveCD] that had some values pre-configured.
Set the IP address static. I set mine to 192.168.0.77 which nobody sees except my other systems.
My school server is "headless" which means it does not have its own keyboard, video or mouse. It also runs only to Linux run level 3, command line interface only, no graphics.I was able toshare using a KVM switch], but this meant having to remember something on one screen while I was switching over to the other. My Windows XP system has mybrowser connection to the internet to follow instructions or read error messages, so I need that up all thetime. To get around this, on my Windows XP system,I generated SSH public and private keys, copied the public key over to my new Linux system, and used [OpenSSH for Windows] to connect over. Now, on one screen,I have my Windows XP Firefox browser, and a separate command line window that is accessing my Linux schoolserver.
With SSH up and running, I can now use "vi" to edit files, and issue commands to install or activatethe remaining software. First up, Apache. I got this working, and from Windows XP, verified that going to"http://192.168.0.77" showed the Apache test screen.
I installed PHP, and tested it with a simple short index.php file.
I installed MySQL, setup the base "installation databases", and created a test database. Here is whereyou might want to set a password for the MySQL root user, but I chose to do that later for now.
I installed Moodle. It was smart enough to check that Apache, PHP, and MySQL were operational, andapparently I missed a few special "PHP" modules that had to be linked in. I was able to find them, downloadthem, and get them installed.
I brought up Moodle, created a "class category" of SCIENCE and a new class "Chemistry 101", and it allworked.
I also activated Squid, which is a web proxy cache server that stores web pages for faster access.
Another idea was to activate Samba, to provide CIFS file and print sharing, but I decided to put this off.
I got all of this done last Saturday, start to finish. Now the fun begins. We are going to run throughsome tests, document the procedures, and try to get a system up and running in a remote school in Nepal. Fornow, I have only one XO laptop to simulate what the student sees, and one laptop that can represent eithera teacher's Windows-based laptop, or run QEMU and emulate a second XO laptop.For tuning, I might go through the procedures mentioned on IBM Developerworks "Tuning LAMP"[Part 1, Part 2,Part 3].
For those in the server or storage industry that need to understand Web 2.0 information infrastructure better,building a LAMP server like this can be quite helpful.
Wrapping up my week's theme on IBM's acquisition XIV, we have gotten hundreds of positive articles and reviews in the press, but has caused quite a stir with the[Not-Invented-Here] folks at EMC.We've heard already from EMC bloggers [Chuck Hollis] and [Mark Twomey].The latest is fellow EMC blogger BarryB's missive [Obligatory "IBM buys XIV" Post], which piles on the "Fear, Uncertainty and Doubt" [FUD], including this excerpt here:
In a block storage device, only the host file system or database engine "knows" what's actually stored in there. So in the Nextra case that Tony has described, if even only 7,500-15,000 of the 750,000 total 1MB blobs stored on a single 750GB drive (that's "only" 1 to 2%) suddenly become inaccessible because the drive that held the backup copy also failed, the impact on a file system could be devastating. That 1MB might be in the middle of a 13MB photograph (rendering the entire photo unusable). Or it might contain dozens of little files, now vanished without a trace. Or worst yet, it could actually contain the file system metadata, which describes the names and locations of all the rest of the files in the file system. Each 1MB lost to a double drive failure could mean the loss of an enormous percentage of the files in a file system.
And in fact, with Nextra, the impact will be across not just one, but more likely several dozens or even hundreds of file systems.
Worse still, the Nextra can't do anything to help recover the lost files.
Nothing could be further from the truth. If any disk drive module failed, the system would know exactly whichone it was, what blobs (binary large objects) were on it, and where the replicated copies of those blobs are located. In the event of a rare double-drive failure, the system would know exactly which unfortunate blobs were lost, and couldidentify them by host LUN and block address numbers, so that appropriate repair actions could be taken from remote mirrored copies or tape file backups.
Second, nobody is suggesting we are going to put a delicateFAT32-like Circa-1980 file system that breaks with the loss of a single block and requires tools like "fsck" to piece back together. Today's modern file systems--including Windows NTFS, Linux ext3, and AIX JFS2--are journaled and have sophisticated algorithms tohandle the loss of individual structure inode blocks. IBM has its own General Parallel File System [GPFS] and corresponding Scale out File Services[SOFS], and thus brings a lotof expertise to the table.Advanced distributed clustered file systems, like [Google File System] and Yahoo's [Hadoop project] take this one step further, recognizing that individual node and drive failures at the Petabyte-scale are inevitable.
In other words, XIV Nextra architecture is designed to eliminate or reduce recovery actions after disk failures, not make them worse. Back in 2003, when IBM introduced the new and innovative SAN Volume Controller (SVC), EMCclaimed this in-band architecture would slow down applications and "brain-damage" their EMC Symmetrix hardware.Reality has proved the opposite, SVC can improve application performance and help reduce wear-and-tear on the manageddevices. Since then, EMC acquired Kashya to offer its own in-band architecture in a product called EMC RecoverPoint, that offers some of the features that SVC offers.
If you thought fear mongering like this was unique to the IT industry, consider that 105years ago, [Edison electrocuted an elephant]. To understand this horrific event, you have to understand what was going on at the time.Thomas Edison, inventor of the light bulb, wanted to power the entire city of New York with Direct Current(DC). Nikolas Tesla proposed a different, but more appropriate architecture,called Alternating Current(AC), that had lower losses over distances required for a city as large and spread out as New York. But Thomas Edison was heavily invested in DC technology, and would lose out on royalties if ACwas adopted.In an effort to show that AC was too dangerous to have in homes and businesses, Thomas Edison held a pressconference in front of 1500 witnesses, electrocuting an elephant named Topsy with 6600 volts, and filmed the event so that it could be shown later to other audiences (Edison invented the movie camera also).
Today's nationwide electric grid would not exist without Alternating Current.We enjoy both AC for what it is best used for, and DC for what it is best used for. Both are dangerous at high voltage levels if not handled properly. The same is the case for storage architectures. Traditional high-performance disk arrays, like the IBM System Storage DS8000, will continue to be used for large mainframe applications, online transaction processing and databases. New architectures,like IBM XIV Nextra, will be used for new Web 2.0 applications, where scalability, self-tuning, self-repair,and management simplicity are the key requirements.
(Update: Dear readers, this was meant as a metaphor only, relating the concerns expressed above thatthe use of new innovative technology may result in the loss or corruption of "several dozen or even hundreds of file systems" and thus too dangerous to use, with an analogy on the use of AC electricity was too dangerous to use in homes. To clarify, EMC did not re-enact Thomas Edison's event, no animalswere hurt by EMC, and I was not trying to make political commentary about the current controversy of electrocution as amethod of capital punishment. The opinions of individual bloggers do not necessarily reflect the official positions of EMC, and I am not implying that anyone at EMC enjoys torturing animals of any size, or their positions on capital punishment in general. This is not an attack on any of the above-mentioned EMC bloggers, but rather to point out faulty logic. Children should not put foil gum wrappers in electrical sockets. BarryB and I have apologized to each other over these posts for any feelings hurt, and discussion should focus instead on the technologies and architectures.)
While EMC might try to tell people today that nobody needs unique storage architectures for Web 2.0 applications, digital media and archive data, because their existing products support SATA disk and can be used instead for these workloads, they are probably working hard behind the scenes on their own "me, too" version.And with a bit of irony, Edison's film of the elephant is available on YouTube, one of the many Web 2.0 websites we are talking about. (Out of a sense of decency, I decided not to link to it here, so don't ask)
Continuing this week's theme of "Innovation that Matters", today I'll discuss cell phones, and their rolein "cloud computing". Some people call these "cellular phones", "mobile phones" or "hand phones".I have posted about these topics before. Last January, I discussed the[Convergence]represented by Apple's iPhone, and in August, I talked about[Accessing Data in the Clouds], but some recent announcements bring this back up as a fresh topic.
This is a major game-changer, forcing companies to rethink many of their strategies. For example,John Windsor, on The YouBlog asks the CBS Interactive division[What Business Are You In?]The answer is that CBS is shifting from a content focus, to an audience focus, looking to provide CBS television contentto an audience of cell phone users.ThinkBeta [Me, My Cell Phone and I] presents some interesting statistics. Google CEO Eric Schmidt estimates there are over 2.5 billion cell phones in use today, with 288 million units shipped alone in 3Q07.
That's quite a trend. As a leader in IT innovation, IBM tries to stay one step ahead of the industry, selling off mature technologies to other manufacturers, like typewriters, printers, and most recently laptops and desktop PCs, so that it can focus on newer technologies and market trends. For example, while many people might be aware that IBM designs and fabricates processor chips for all of the major game consoles (Microsoft's Xbox 360, Nitentendo's Wii, and Sony'sPlay Station 3), they might not know that IBM also makes chips for many cell phone manufacturers. IBM[POWER Architecture] blog writes about the IBM CMOS 7RF SOI semiconductor:
IBM has managed to integrate seven Radio Frequency (RF) front-end functions onto this single CMOS chip using silicon-on-insulator (SOI) technology. And this means? For cell phones, according to IBM foundry product director Ken Torino, "Our solution minimizes insertion loss and maximizes isolation which will prevent dropped calls even on the most inexpensive handsets." Currently, cell phone RF front-end functions are handled by five to seven chips and at least two of those are using expensive gallium arsenide (GaA) technologies. The CMOS 7RF SOI should not only reduce costs by eliminating the need for so many chips, but also trim the fat from materials expenditures since GaA tech is somewhat expensive. IBM predicts that manufacturers will first use the chip to reduce on-phone processors to two or three before making the leap to a single chip.
With all this demand, the world will need engineers to develop softwareapplications that work in this new environment. This plays into IBM's strength in the area of grid and supercomputing.IBM and Google announced they have jointly established an Internet-scale computing initiative to promote new software development methods that can help students and researchers address the challenges of Internet-scale applications. From[IBM Internet-scale computing] webpage:
Internet use and content has grown dramatically, fueled by global reach, mobile device access, and user-generated Web content, including large audio and video files. More of the world population is looking to the mobile Web to fulfill basic economic needs. To meet this challenge, Web developers need to adopt new methods to address significant applications such as search, social networking, collaborative innovation, virtual worlds and mobile commerce.
The University of Washington is the first to join the initiative. A small number of universities will also pilot the program, including Carnegie-Mellon University, Massachusetts Institute of Technology, Stanford University, the University of California at Berkeley, and the University of Maryland. In the future, the program will be expanded to include additional researchers, educators and scientists.
The heart of the project is a large cluster of several hundred computers (a combination of Google and IBM systems) that is planned to grow to more than 1,600 processors. Students will access the cluster through the Internet to test their parallel programming projects. The cluster is powered with open source software, including:
"EqualLogic didn’t get 2,000 customers because people were dying to use iSCSI. It got them because it built systems that scale dynamically and because a system the size of Montana can be managed by someone as clueless as my ex-wife."
As with any acquisition, people might be asking if this is a "match made in heaven" that makes strong business sense,or another HP-Compaq debacle. Back in September, I posted [Supermarkets and Specialty Shops] to explain how the storage marketplace has two market segments. Internally, IBM distinguishesbetween "clients" and "customers". Clients are those that buy services and complete solutions from a one-stop systems vendor, such as IBM, HP, Sun, or Dell, or systems integrator like IBM, CSC or EDS. Customers are those that buy products and components, from the systems vendors I just mentioned, as well as from individual specialty shops, like EMC, HDS, or NetApp.
To reach the growing "supermarket" segment, specialty shops are dependent on systems vendors to OEM or resell their kit: EMC disk through Dell, HDS disk through Sun and HP, NetApp through IBM. Until now, EqualLogichad to make their living as a "specialty" shop, but iSCSI appeals more to SMB than large enterprises, andSMB tend to be in the "supermarket" segment, so they partnered with Sun. Here is the timeline of this likely awkwardand strained relationship:
I am not surprised that I haven't seen anything in the blogosphere yet from HP, Dell or Sun. I suspect this news meansthat Sun won't be reselling Dell's EqualLogic boxes anymore, and perhaps there is nothing more for Sun bloggers Randy Chalfant or Nigel Dessau to add to that. HP and Dell are practically non-existent in the storage blogosphere, so I didn't expect much from them either.
I did, however, expect EMC to put in their spin, given that Dell resells EMC disk, and accounts for perhaps 15% of their revenues.Now that Dell has multiple offerings, they will be instructing their channel reps when to lead with EqualLogic versus when to sell EMC, for now, until 2011, at which point may simplify their storage sales model to just EqualLogic. I don't know if Dell would do that in 2011. Depending on how quick the decline happens, EMC may have to increase the pricesof their gear, or cut into their development budgets, to make up for this loss.
I started this post because of a comment from EMC blogger Chuck Hollis, who speculates how this will impact[Dell, EqualLogic and EMC].In that post, he expresses his opinion (which I will put into a different color):
"Speculation is pretty evenly split. Neither HP nor IBM have a good, entry-level iSCSI product."
If he had left out the word "good", then that would just be a false statement, but by adding the word "good" reduces this to merely an opinion of IBM products that I disagree with. (I have no experience with whateverHP sells in this category, nor talked to any customers about their experiences, so will neither agree nordisagree with Chuck's opinion of the HP half of his statement). As for the term "Entry-level", this is fairly well defined by analysts as a storage system under $50,000 US Dollars. Actually, IBM has three good offerings.
Our basic, lowest-price model is the IBM System Storage DS3300, which does iSCSI only, like the EqualLogic offerings. This supports both SAS and SATA disks, and can attach to our System x and System p server product lines.
Our smallest model of our fancier IBM System Storage N series not only supports iSCSI, but also CIFS, NFS,HTTP, FTP, and FCP protocols, what we call "Unified Storage". The iSCSI feature is included at no additional charge, and small customers can start with this, then scale up to larger N3600, N5000 or N7000 models, andadd more protocols and software features, as their business grows.
Our next larger model, but still entry-level, is the N3600. Since the N series supports a unified multi-protocolplatform, with features like SnapLock for regulatory compliance and SnapMirror for remote disk mirroring. The IBM System Storage N series easily replaces any mix of EMC "C-boxes": Centera, Celerra, and CLARiiON.
Both the DS3300 and the N series support the various Business Applications I have discussed this week, Microsoft Exchange, Lotus Domino, SAP, Oracle, Siebel, JD Edwards and PeopleSoft. N series offers SnapManager for variousapplications to make the business value even that much better.
Chuck speculates that Dell did this to compete better against rival HP, but that doesn't make sense, sincehe feels HP didn't have much to offer in this space. Perhaps Dell did this to competebetter against IBM, the number one vendor in storage hardware, according to IDC. Looking at what IBM andNetApp have to offer, Dell may have realized that they didn't have competitive disk systems from their resellingrelationship with EMC, looked elsewhere and found EqualLogic. Meanwhile, EqualLogic probably felt that Sun wasgoing out of business, or not yet fully supportive of IP SAN environments, and decided to ["switch horses midstream"].
Continuing this week's theme on Enterprise Applications, I thought that since I mentioned Lotus Notes in my discussion ofSAP yesterday, that I would cover Microsoft Exchange today.
IBM and Microsoft is the ultimate example of "Coopetition". Both companies develop popular operating systems. Microsoft's "Xbox 360" gaming console uses IBM processors. Microsoft Exchange and IBM Lotus Domino are the Coke-and-Pepsi dominant players in the email marketplace, with Microsoft slightly in the lead, as seen on this graph[Lotus Notes/Domino marketshare growing] from fellow IBM Lotus blogger Alan Lepofsky.And now, Microsoft is getting serious about participating in the storage software business, with its strong support for iSCSI and its SharePoint product. For this post, I will focus just on email.
For those not familiar with both Microsoft and IBM products, I offer the simple cheat-sheet below:
Microsoft Outlook (client)::IBM Lotus Notes (client) Microsoft Exchange (server)::IBM Lotus Domino (server)
Email has become the primary collaboration tool for most businesses, raising it to the level of "mission-critical".Microsoft has introduced its new Exchange 2007 to replace the existing Exchange 2003. Here are the key differences:
Windows 2000 or 2003
Runs on 32-bit x86
Requires 64-bit EM64T or AMD64, but Itanium IA64 not supported
Two(2) server roles
Five(5) server roles
Edge Server Role for combating SPAM
Unified Messaging services to combine voicemail, email, fax
5 storage groups
50 storage groups per server on Enterprise edition
50 databases per server on Enterprise edition (max 5 per storage group)
NAS or NTFS-formatted block disk
NTFS-formatted block disk recommended
Obviously, Exchange only runs on Windows operating system. The change from 32-bit to 64-bit means that many Exchange 2003 customers have not yet migrated over, and perhapsnow is a good time to point out alternative email servers on more reliable operating system platforms.For example, in addition to Windows 2003, Lotus Domino runs on IBM AIX, Linux on x86, Linux on System z, Sun Solaris, i5/OS on System i, and z/OS.
Another Linux alternative to Microsoft Exchange is Bynari InsightServer, which allows you to use your existing Windows-based Microsoft Outlook clients, swapping out only the server. This approach can be used when consolidating Windows servers to Linux virtual images on System z mainframe.Linux desktops can run [Ximian Evolution] to attach to either Bynari server, or Windows-based Microsoft Exchange server.Linux Journal offers a few articles on this:[Understanding and Replacing Microsoft Exchange, andExchange Functionality for Linux].
As with [Exchange 2003 editions], the new Exchange 2007 comes in both ["Standard" and "Enterprise" editions]. With all the newroles supported, you now can limit your "Mailbox Storage Server" role as Enterprise, and have the other roles, likeEdge and Hub, as simply "Standard" instead. Enterprise is about 5x more expensive than Standard, so that can makea difference.With Exchange 2003, the big difference was that "Standard" supported only 16GB, versus 16TB with "Enterprise",making "Standard" impractical for all but the smallest company. In the new Exchange 2007, both Standard and Enterprise support 16TB.
Exchange 2007 is also less IOPS-intensive. Thanks to 64-bit addressing, it generates about 75 percent fewer IOPS than Exchange 2003 for comparable configurations. This is good becauseaccording to a 2006 Radicati Group survey, the average corporate employee gets 84 emails per day, averaging 10MBdaily ingestion, and this is expected to grow to 15.8MB daily ingestion by 2008. The number of mailboxes worldwideis growing at a rate of 16 percent per year.
IBM System Storage is a Microsoft Gold certified partner, and participates in Microsoft's Exchange Solution Reviewed Program [ESRP].Both IBM DS8000 and DS4000 series are certified under this program, using a testbed called Jetstress.Those considering IBM System Storage N series can use Exchange 2007 with NTFS-formatted LUNs via FCP or iSCSIattachment.
Backup and Business Continuity
Back in 2003, the Meta Group found that 80 percent of organizations surveyed felt access to email was more importantthan telephone service, and that 74 percent believed being without email would present a greater hardship thanlosing telephone service. These percentages are probably higher today, with websiteslike ["Crackberry.com"] to cater to those addicted to theirRIM Blackberry hand-held devices.
IBM Tivoli Storage Manager can provide backup and recovery support for Microsoft Exchange.TSM for Mail supports both Microsoft Exchange and Lotus Domino. TSM for Copy Services can use MicrosoftVolume Shadow Copy Services (VSS) interfaces. I blogged about this before, back in June[Exchange 2003 VSS Snapshot Backup Whitepaper], and now there TSM has support for Exchange 2007 as well.
Interestingly, Exchange 2007 has some built-in"Business Continuity" features. Of the ones below, Standard edition has LCR only, Enterprise edition gives you the full set.
Local Continuous Replication (LCR):In this approach, a single server ships update logs from the active storage group on one disk system over to a passivecopy on a secondary disk system, presumably within 10km FCP distance. These logs can then be forward-applied to thepassive copy. This is sometimes called "database shadowing".
Cluster Continuous Replication (CCR):This is based on two servers in an active/passive MSCS cluster. First server is attached to the primary disk system,and ships logs to the passive copy attached to the second server.
Standby Continuous Replication (SCR):For the MSCS cluster-averse customer, SCR is based on two independent servers that are in two locations. In the event of failure on thefirst, scripts can be run to switch over to the second server. Each server has its own disk system.
Single Copy Clusters (SCC):This is for customers who have existing systems, but not recommended for new customers. An MSCS cluster, where both active andpassive servers are connected to the same single disk system. The disk array can be a single point of failure (SPOF) in this environment.You could mitigate risks by using IBM's disk mirroring in this situation, but then you are left coordinating those copies with new servers at the remote location.
It is estimated that as much as 75 percent of a company's intellectual property (IP) can be found somewhere in their email repository. Email is often requested in lawsuits and regulatory investigations. According to the Workplaceemail IM & blogging 2006 survey by AMA and the ePolicy Institute, 24 percent of organizations have be subpoenaed by courts and regulators, and another 15 percent have gone to court in lawsuits triggered by employee emails.
New regulations now mandate that emails are archived, protected against tampering and unauthorized access, and kept for a specific amount of time, or until certain conditions are met. According to a 2004 CSI and FBI Computer Crime and Security survey, 78 percent of organizations were hit by viruses (the rest must have been running Linux, AIX, i5/OS or z/OS!)and 37 percent reported unauthorized access to confidential information.
According to Gartner, over 60 million people will be doing some form of telecommuting, so access Microsoft hasbeen working on extending the reach of email beyond Outlook client. There is now "Outlook Web Access" thatprovides browser-based access, "Outlook Mobile" to provide text access from cellular phones, and even "Outlook Voice Access" which allows you to listen to your emails from any phone. These are all part of the new Unified MessagingServices feature.
Chris Evans over at Storage Architect posts aboutHardware Replacement Lifecycle Update, on how storage virtualization can helpwith storage hardware replacemement. He makes two points that I would like to comment on.
... indeed products such as USP, SVC and Invista can help in this regard. However at some stage even the virtualisation tools need replacing and the problem remains, although in a different place.
Knowing that replacement of technologies at all levels are inevitable, IBM System Storage SAN Volume Controlleris actually designed to allow cluster non-disruptive upgrade, which we announcedMay 2006.
The process is quite elegant. The SVC consists of one or more node-pairs, and can be upgraded while the systemis up and running by replacing nodes one at a time in a sequence of suspend and resume. All of the mapping tablesare loaded onto the new nodes from the rest of the still active nodes.
I was hoping as part of the USP-V announcement HDS would indicate how they intend to help customers migrate from an existing USP which is virtualising storage, but alas it didn't happen.
Unlike the SVC, once cannot just upgrade the USP in place and make it into a USP-V. While it might be possible tounplug external disk from the old USP, and re-plug into the new USP-V, what do you do about the internal disk data?I doubt you can just move drawers and trays of disk from the old to the new. The data has to be moved some other way.
Some have asked why not just put an SVC in front of both the old USP and the new USP-V and transfer the data that way.While SVC does support virtualizing the old USP device, IBM is still testing the new USP-V as a managed device, and so this solution is not yet available, and would only apply to the LUNs in the USP-V, not the volumes specifically formatted for System i or System z.
An alternative is to take advantage of IBM's Data Mobility Services, the result of our recentacquisition of SofTek. IBM can help you both mainframe and distributed systems data from any device, to any device.
In a typical four year lifecycle of storage arrays, it might take six months or so to fill up the box, and might takeas much as a year at the end to move the data out to other equipment. SVC can greatly reduce both of these, so that you can take immediate advantage of new equipment as soon as possible, and keep using it for close to the full four years,migrating weeks or days before your lease expires.
Yesterday, IBM announced a variety of new storage offerings. Our theme this time around was "Policies and Performance". Here's a quick recap.
IBM offers new appliance and gateway models of its popular "unified storage" IBM System Storage N series disk systems.The N5300 appliance has two models. A10 for the single-controller, and A20 for the dual-controller model. The N5600 gateway also has two models. G10 for the single-controller, and G20 for the dual-controller model.A new EXN4000 disk expansion drawer is 3U high, and can hold up to 14 disks. It can support 1Gbps, 2Gpbs and 4Gpbs speeds.In addition to all this new "performance", we offer a new "policy" called the Advanced Single Instance Storage feature for the N5000 and N7000 series, which provides de-duplication at the block level. This can be particularly useful if you are using your N series for e-mail, document publishing, databases, backups or archives.
SAN Volume Controller
A technology refresh with the new 8G4 model. Like its predecessor, the 8F4, this new model has 8GB of cache per node, and is fitted with 4Gpbs SAN attachment ports. The difference is that the 8G4 is based on our successfulIBM System x3550 server.This baby screams, so I look forward to seeing the updated SPC-1 and SPC-2 performance benchmark ratings.The new SVC 4.2 software provides additional authentication policies for more granular administration support, andmulti-destination FlashCopy (one source copied to up to 16 destination copies at the same time).
The DS8000 series now supports having third and fourth expansion frames. This was actually already available via RPQ, but now it can be directly ordered.This means that you can now hold up to half a Petabyte in a single disk system.
IBM TotalStorage Productivity Center v3.3 offers policy and performance-based guidance in configuring disk system volumes, specification of paths between hosts and disk systems during storage provisioning, policy-based specification of zone membership, configuration analysis capabilities, configuration change management, extended tape management, and both content-sensitive and scalable enterprise-wide reports. There is also a version specifically designedto manage disk replication on System z platforms.
Deep Computing Storage
The IBM System Storage DCS9550 Storage System comes in a 4U controller and 3U disk expansion drawers. It is designed for High Performance Computing (HPC) such as genome medical research, government research and rich media applications.
Our clients tell us they need performance to meet their dynamic business demands, and policies to help them manage the ever growing size of their storage infrastructure. We listened!
Yesterday morning, the entire country of Colombia suffered their worst black-out (power outage) in 22 years. 98% of the country was out for 4 1/2 hours.This is just 5 months after an outage that hit 25% of the country, December 7, 2006.Ironically, this one happened the week I am here explaining the need for Business Continuity plans to IBM Business Partners from Argentina, Peru, Velenzuela, Ecuador and Colombia. As is oftenthe case, people often need a real example to recognize the need for planning is important.
It reminded me of the Northeast Black-out of 2003 that impacted USA and Canada. I was speaking to a crowd of 800 people at the SHARE conference in Washington D.C. when it happened, and hundreds of pagers and cell-phones went off all at the same time. Although we were outside the effected area and had plenty of lighting, we ended up canceling therest of my talk, and many people left immediately to help execute their business continuity plans.Of course, terrorism was immediately assumed, but a final report showed that it was initiated in Ohiodue to overgrown trees, and then propagated due to a software bug to hundreds of other plants.
According to this morning's Bogota newspaper, "El Tiempo", nobody knows the root cause of yesterday's outage. Immediately, the country's leftist rebels were blamed, but now the leading theory is that it was initiated byoperator error (a technician touching something he shouldn't have), and then propagated by a faulty distribution system.
Another example of the need for a robust and resilient infrastructure, and appropropriate business continuity plans.