Tony Pearson is a Master Inventor and Senior IT Architect for the IBM Storage product line at the
IBM Systems Client Experience Center in Tucson, Arizona, and a featured contributor
to IBM's developerWorks. In 2016, Tony celebrates his 30th anniversary with IBM Storage. He is
the author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
My books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
The new [IBM System Storage Tape Controller 3592 Model C07] is an upgrade to the previous C06 controller. Like the C06, the new 3592-C07 can have up to four FICON (4Gbps) ports and four FC ports, and can connect up to 16 drives. The difference is that the C07 supports 8Gbps FC ports, and can support the [new TS1140 tape drives that were announced on May 9]. A cool feature of the C07 is its built-in library manager function for the mainframe; on previous models, you needed a separate library manager server.
Crossroads ReadVerify Appliance (3222-RV1)
IBM has entered an agreement to resell the [Crossroads ReadVerify Appliance], or "RV1" for short. The RV1 is a 1U-high server with software that gathers information on the utilization, performance and health of a physical tape environment, such as an IBM TS3500 Tape Library. The RV1 also offers a feature called "ArchiveVerify" which validates long-term retention archive tapes, providing an audit trail on the readability of tape media. This can be useful for tape libraries attached behind the IBM Information Archive compliance storage solution, or behind IBM Scale-Out Network Attached Storage (SONAS).
As an added bonus, Crossroads has great videos! Here's one, titled [Tape Sticks].
Linear Tape File System (LTFS) Library Edition Version 2.1
While all of the hardware has been refreshed, the overall "scale-out" architecture is unchanged. Kudos to the XIV development team for designing a system based entirely on commodity hardware, which allows new hardware generations to be introduced with minimal changes to the vast number of field-proven software features like thin provisioning, space-efficient read-only and writeable snapshots, synchronous and asynchronous mirroring, and Quality of Service (QoS) performance classes.
The new XIV Gen3 features an InfiniBand interconnect, faster 8Gbps FC ports, more iSCSI ports, a faster motherboard and processors, 2TB NL-SAS drives, and 24GB of cache memory per XIV module, all in a single-frame IBM rack that supports the IBM Rear Door Heat Exchanger. The result is a 2x to 4x boost in performance for various workloads. Here are some example performance comparisons:
Disclaimer: Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. Your mileage may vary.
In a Statement of Direction, IBM has also indicated that it designed the Gen3 modules to be "SSD-ready", which means that you will be able to insert up to 500GB of solid-state drive capacity per XIV module, up to 7.5TB in a fully configured 15-module frame. This SSD would act as an extension of DRAM cache, similar to how Performance Accelerator Modules (PAM) work on IBM N series.
IBM will continue to sell XIV Gen2 systems for the next 12-18 months, as some clients like the smaller 1TB disk drives; the new Gen3 comes only with 2TB drives. Some clients love the XIV so much that they also use it for less stringent Tier 2 workloads. If you don't need the blazing speed of the new Gen3, perhaps the lower-cost XIV Gen2 might be a great fit!
As if I haven't said this enough times already, the IBM XIV is a Tier-1, high-end, enterprise-class disk storage system, optimized for use with mission-critical workloads on Linux, UNIX and Windows operating systems, and is the ideal cost-effective replacement for EMC Symmetrix VMAX, HDS USP-V and VSP, and HP P9000 series disk systems. Like the XIV Gen2, the XIV Gen3 can be used with IBM System i using VIOS, and with IBM System z mainframes running Linux, z/VM or z/VSE. If you run z/OS or z/TPF with Count-Key-Data (CKD) volumes and FICON attachment, go with the IBM System Storage DS8000 instead, IBM's other high-end disk system.
Friday - We landed in Paris, France. I have been to Paris many times, but this was a first for Mo. A croissant cost only 2 euros, but the young woman behind the counter gave me a look of disgust when I asked for a knife and butter to put on it. If you ever get the chance to have a real French croissant, you will realize you don't need any more butter. If you do attempt to put anything on the croissant, it will disintegrate into a million tiny pieces!
2. Visit Ronda
Saturday - We rented a car and drove to the mountain village of [Ronda, Spain], in the heart of the Spanish region of Andalucia. Why Ronda? This was where Mo's uncle was stationed during the war. The town is built on two mountains, connected by a set of bridges. The tallest is "Puente Nuevo", built in the 1700s, which is nearly 400 feet tall. Ronda is also home to Spain's oldest bullfighting ring. Bars and restaurants built along the cliff offer some spectacular views. Mo and I shared a "Paella Mixta" for lunch, consisting of yellow rice with bits of chicken and seafood.
3. Soak in European Mineral Waters
Sunday - Most things in Europe are closed on Sunday, so we decided to have a "Spa Day" at the [Gran Hotel Benahavis], in Benahavis, Spain. This lovely hotel is built over a natural mineral hot spring, and its underground spa allowed us to relax in the warmth. The spa also had a dry sauna, a steam sauna, and an ice-cold water bath to complete the experience.
4. Climb to the Top of the Rock of Gibraltar
Monday - Technically, Gibraltar is a British Overseas Territory rather than part of Spain, and they use British money (Pound Sterling). To get to the top of the rock, we drove across their airport runway, saw the mosque at Europa Point, parked in a large parking lot, and took the cable car to the top. From there, we climbed a few more steps to see the grand views of Spain and North Africa, while keeping our distance from the infamous monkeys. These [Barbary Macaques] are cute, but can bite or scratch you if you get too close. Afterwards, we had lunch in a pub called the Angry Friar.
5. See Snake Charmers in Morocco
Tuesday - We took a guided tour over to the Kingdom of Morocco. This included a ferry ride from Tarifa, Spain to Tangier, Morocco. A bus then took us to the "Kasbah" (the fort), where we got to see snake charmers perform their act. We had an interesting lunch, followed by the obligatory "shopping opportunities" for rugs and spices. Back on the bus, we went to ride camels, see the King's palace, and visit the Grotto of Hercules. The last stop was Cap Spartel, on the northwestern tip of Morocco, where we sat back and relaxed with a nice cup of hot mint tea.
6. Hang Out at a Mediterranean Beach
Wednesday - On our last full day in Spain, we decided to have lunch on the beach. This region is known as the Costa del Sol. We opted for "Playa de la Rada" in Estepona, Spain. The beach was a bit rocky, the sand was hot and uncomfortable to walk on, and the heat and humidity were just slightly less than in the steam sauna at the Gran Hotel Benahavis. We stayed in the shade of our beach-side restaurant and had a lunch of grilled sardines and the local Cruzcampo beer.
7. Visit the World of Coca-Cola
Thursday - We drove to Malaga, Spain, and flew back to the United States. Malaga is famous for its connections to celebrities like Ernest Hemingway and Pablo Picasso. We could not get all the way back to Tucson, so we stayed overnight in Atlanta.
Friday - This gave us an opportunity to visit the [World of Coca-Cola], where Mo's cousin had done some recent marketing work in celebration of the company's 125th anniversary. This museum has a live bottling operation on display, a 4D movie, viewing areas to see commercials from around the world, and free tasting of some of the 105 different soft drink flavors the company manufactures. I recommend the Tawney Ginger from Tanzania and the Simba Guarana from Brazil. I did not care for the Apple-and-Carrot soda from Japan.
8. See a Manta Ray Up Close
Our discount combo tickets included a visit to the [Georgia Aquarium] next door. Mo can't scuba-dive, but she got stung by a ray when she was a kid, and wanted to show me a big Manta Ray up close. The aquarium was quite good, divided into separate exhibits, including interactive touch-the-fish areas for the kids, beluga whales, jellyfish, seahorses, and a moving sidewalk that takes you underneath the sea life.
I would like to thank Delta Air Lines for letting Mo and me take this trip using frequent flyer miles, Hertz Rental Cars for offering a sweet deal on a tiny Hyundai i20 car, the Gran Hotel Benahavis for their hospitality, and the incredibly warm and helpful people of Atlanta. I am glad that my language skills in French, Spanish and Arabic came in quite handy!
(FTC Disclosure: I do not work for or have any financial investments in ENC Security Systems. ENC Security Systems did not pay me to mention them on this blog. Their mention in this blog is not an endorsement of either their company or any of their products. Information about EncryptStick was based solely on publicly available information and my own personal experiences. My friends at ENC Security Systems provided me a full-version pre-loaded stick for this review.)
The EncryptStick software comes in two flavors, a free/trial version and the full/paid version. The free trial version has [limits on capacity and time] but provides enough of a glimpse of the product for you to decide before you buy the full version. You can download the software yourself and put it on your own USB device, or purchase the pre-loaded stick that comes with the full-version license.
Whichever you choose, the EncryptStick offers three nice protection features:
Encryption for data organized in "storage vaults", which can be either on the stick itself, or on any other machine the stick is connected to. That is a nice feature, because you are not limited to the capacity of the USB stick.
Encrypted password list for all your websites and programs.
A secure browser that protects against any key-logging or malware that might be on the host Windows machine.
I have tried out all three functions and everything works as advertised. However, there is always room for improvement, so here are my suggestions.
The first problem is that the pre-loaded stick looks like it is worth a million dollars. It is a shiny bronze color with "EncryptStick" emblazoned on it. This is NOT subtle advertising! This 8GB stick looks like it would be worth stealing simply as a nice piece of jewelry, and the added bonus that there might be "valuable secrets" on it makes theft even more likely.
If you want to keep your information secure, it would help to have "plausible deniability" that there is nothing of value on a stick. Either put some corporate logo on it, or make the stick look like a cute animal, like these pig or chicken USB sticks.
It reminds me of how the first Apple iPods were in bright [Mug-me White]. I use black headphones with my black iPod to avoid this problem.
Of course, you can always install the downloadable version of EncryptStick software onto a less conspicuous stick if you are concerned about theft. The full/paid version of EncryptStick offers an option for "lost key recovery" which would allow you to backup the contents of the stick and be able to retrieve them on a newly purchased stick in the event your first one is lost or stolen.
Imagine how "unlucky" I felt when I noticed that I had lost the "rabbit's feet" on this cute animal-themed USB stick.
I sense similar trouble ahead with losing the cap on my EncryptStick. This might seem trivial, but it is a pet peeve of mine: USB sticks should plan for this. Not only is there nothing to keep the cap on (it slides on and off quite smoothly), but there is no loop to attach the cap to anything if you wanted to.
Since then, I have gotten smart and look for ways to keep the cap connected. Some designs, like this IBM-logoed stick shown above, have a cover that rotates around an axle, giving you access when you need it, and protection when it is folded closed.
Alternatively, get a little chain that allows you to attach the cap to the main stick. In the case of the pig and chicken, the memory section had a hole pre-drilled and a chain to put through it. I drilled an extra hole in the cap section of each USB stick, and connected the chain through both pieces.
(Warning: Kids, be sure to ask for assistance from your parents before using any power tools on small plastic objects.)
The EncryptStick can run on either Microsoft Windows or Mac OS. The instructions indicate that you can install both versions of the downloadable software onto a single stick, so why not do that for the pre-loaded full version? The stick I have had only the Windows version pre-loaded. I don't know whether the Windows and Mac OS versions can unlock the same "storage vaults" on the stick.
Certainly, I have been to many companies where either everyone runs Windows or everyone runs Mac OS. If the primary target audience uses this stick at work in one of those places, then no changes are required. However, at IBM, we have employees using Windows, Mac OS and Linux. In my case, I use all three! Ideally, I would like a version of EncryptStick that I could take on trips and use regardless of the operating system I encountered.
Since there isn't a Linux version of the EncryptStick software, I decided to modify my stick to support booting Linux. I am finding more and more Linux kiosks when I travel, especially at airports and other high-traffic locations, so a stick that works in both Windows and Linux would be useful. Here are some suggestions if you want to try this at home:
Use fdisk to change the FAT32 partition type from "b" (W95 FAT32) to "c" (W95 FAT32 with LBA addressing). Apparently, Grub2 requires type "c", but the pre-loaded EncryptStick was set to "b". The Windows version of EncryptStick seems to work fine in either mode, so this is a harmless change.
Install Grub2 with "grub-install" from a working Linux system.
Once Grub2 is installed, you can boot ISO images of various Linux rescue CDs, like [PartedMagic], which includes the open-source [TrueCrypt] encryption software that you could use for Linux purposes.
This USB stick could also be used to help repair a damaged or compromised Windows system. Consider installing [Ophcrack] or [Avira].
Certainly, 8GB is big enough to run a full Linux distribution. The latest 32-bit version of [Ubuntu] could run on any 32-bit or 64-bit Intel or AMD x86 machine, and still leave enough room to store an [encrypted home directory].
Since the stick is still formatted FAT32, you should be able to run your original Windows or Mac OS version of EncryptStick even after these changes.
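To see what the fdisk step actually changes, here is a minimal Python sketch, assuming a standard MBR-partitioned stick: the partition type is a single byte in each 16-byte entry of the partition table at offset 446, and 0x0B versus 0x0C is the entire difference between fdisk's "b" and "c". The helper names `partition_types` and `set_fat32_lba` are hypothetical, and the sketch works on an in-memory copy of the first 512 bytes (for example, read from a disk image you made yourself). It illustrates the change; it is not a replacement for fdisk on a live device.

```python
# Standard MBR layout: 446 bytes of boot code, four 16-byte partition
# entries, then the two-byte 0x55AA boot signature at offset 510.
MBR_SIZE = 512
PARTITION_TABLE_OFFSET = 446
ENTRY_SIZE = 16
TYPE_BYTE = 4          # the partition type ("system ID") is byte 4 of each entry

FAT32_CHS = 0x0B       # fdisk type "b": W95 FAT32
FAT32_LBA = 0x0C       # fdisk type "c": W95 FAT32 (LBA)


def _type_offset(index: int) -> int:
    """Byte offset of the type field for primary partition entry 0..3."""
    return PARTITION_TABLE_OFFSET + index * ENTRY_SIZE + TYPE_BYTE


def partition_types(mbr: bytes) -> list:
    """Return the type byte of each of the four primary partition entries."""
    if len(mbr) < MBR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR (missing 0x55AA signature)")
    return [mbr[_type_offset(i)] for i in range(4)]


def set_fat32_lba(mbr: bytes) -> bytes:
    """Return a copy of the MBR with every type-0x0B entry relabeled 0x0C.

    Only the type label changes; the FAT32 file system data is untouched.
    """
    patched = bytearray(mbr)
    for i, ptype in enumerate(partition_types(mbr)):  # also validates the MBR
        if ptype == FAT32_CHS:
            patched[_type_offset(i)] = FAT32_LBA
    return bytes(patched)


# Usage: a synthetic MBR with one type-0x0B partition in the first entry.
demo = bytearray(MBR_SIZE)
demo[_type_offset(0)] = FAT32_CHS
demo[510:512] = b"\x55\xaa"

print([hex(t) for t in partition_types(bytes(demo))])                 # ['0xb', '0x0', '0x0', '0x0']
print([hex(t) for t in partition_types(set_fat32_lba(bytes(demo)))])  # ['0xc', '0x0', '0x0', '0x0']
```

Because only one byte of the partition table changes, the files already on the stick stay where they are, which is consistent with the change being harmless for the Windows version.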
Depending on where you are, you may not have the luxury of rebooting a system from the USB memory stick. This may require changes to the boot sequence in the BIOS and/or hitting the right keys at the right time during startup. I have been to some "Internet Cafes" that frown on this, or have blocked it altogether, forcing you to boot only from the hard drive.
Well, those are my suggestions. Whether you go on a trip with or without your laptop, it can't hurt to take this EncryptStick along. If you get a virus on your laptop, or have your laptop stolen, then it could be handy to have around. If you don't bring your laptop, you can use this at Internet cafes, hotel business centers, libraries, or other places where public computers are available.
In less than a month, I will be presenting at the annual IBM Storage Technical University, July 18-22, at the Hilton in Orlando, Florida. This is one of my favorite conferences! You can sign up for this at their [Online Registration Page].
I will be covering a variety of topics:
IBM Storage Strategy in the Era of Smarter Computing - IBM led the IT industry through the "Centralized Computing" era, and later the "Distributed Computing" era, and we are now entering the third era, that of Smarter Computing. Come learn IBM's storage strategy to address today's big challenges, including Big Data, integrated workload-optimized systems, and Cloud service delivery models.
IBM Information Archive for Email, Files and eDiscovery - This session will cover the latest announcement for our non-erasable, non-rewriteable compliance storage, the Information Archive (IA), how this can be used to protect your emails and files, and provide indexed search to assist with eDiscovery.
IBM Tivoli Storage Productivity Center Overview and Update - I was one of the original lead architects for Productivity Center. Come learn what this software is all about, and how the latest features and functions can help you manage your IT environment.
IBM SONAS and the Smart Business Storage Cloud - Confused about Cloud Computing and Cloud Storage? I will explain everything you need to know, including how the integrated SONAS appliance operates, IBM's customized solutions for private cloud deployments, and IBM's public cloud offerings.
BOF on Social Media - BOF stands for "Birds of a Feather", and is normally an after-hours discussion on a single theme. This BOF will be a Q&A panel of four experts: myself, John Sing, Rich Swain and Ian Wright. We will discuss how we got started in Social Media, and how it has boosted our careers and our ability to get work done.
Last Thursday, on IBM's 100-year anniversary, we had a huge turn-out for the celebration here at the IBM Development Lab site in Tucson, AZ. Employees brought in memorabilia that reminded them of the past 100 years.
Everyone got a black tee-shirt with the original IBM logo. There was plenty of music, food and drink, as well as a few speeches by former and current IBM executives.
Now the fun begins with IBM's next century. What will the 21st century have in store for the world? We live in interesting times!
Kevin's perspective focused on the evolution of "information science" over the past 100 years, in six chapters: sensing, memory, processing, logic, connecting, and architecture. He covers the technology from IBM punched cards and core memory to the latest optical chips and the DeepQA technology in IBM Watson.
Steve's perspective was on IBM as a corporation, and how IBM and other corporations have evolved over the past century. In the late 19th and early 20th centuries, "Internationals" had their headquarters in the United States, and regional sales and distribution offices elsewhere. The mid-20th century gave rise to "Multinationals" that invested more heavily in regional headquarters scattered across the globe. Today, in the 21st century, IBM and its clients are [Globally Integrated Enterprises] that move work to wherever they find the lowest costs, best skills, and most attractive business climates.
Jeffrey M. O'Brien
Jeffrey M. O'Brien has been a senior editor at [Fortune] and [Wired] magazines, and his work has appeared in The Best of Technology Writing, The Best American Science and Nature Writing, and The Best American Science Writing.
Jeffrey's perspective is on the impact technology has on humanity, organized into five steps towards progress: Seeing, Mapping, Understanding, Believing, and Acting. These steps existed long before IBM, and Jeffrey draws parallels to such efforts as Lewis & Clark mapping out the Louisiana Purchase, advancements in genetically modified foods, and the thousands of IBMers required to land a man on the moon.
This afternoon, everyone at the IBM Tucson site will be getting together to celebrate IBM's Centennial!
This week, IBM celebrates its Centennial, 100 years since its incorporation on June 16, 1911.
A few months ago, the Tucson Executive Briefing Center ordered its latest IBM System Storage [DS8800] to be on display for demos. This was manufactured in Vác, Hungary (about an hour north of Budapest), and was going to be shipped over to the United States.
However, Sam Palmisano, IBM Chairman and CEO, was in Hannover, Germany for the [CeBIT conference] and wanted this DS8800 to be redirected to Germany first for that event. He was kind enough to sign it for us. Brian Truskowski, IBM General Manager for Storage, and Rod Adkins, IBM Senior Vice President for the IBM Systems and Technology Group (and my fifth-line manager), signed it as well!
I am pleased to say this "signed" DS8800 has arrived in Tucson. This is the latest model in a family of market-leading high-end enterprise-class disk systems designed to attach to all computers, including System z mainframes, POWER systems running AIX and IBM i, as well as servers running HP-UX, Solaris, Linux or Windows.
For more on IBM's other innovations over the past 100 years, check out the [Icons of Progress], which includes these storage innovations:
This Thursday, June 16, 2011, marks IBM's Centennial, its 100th anniversary. It also happens to be my 25th anniversary with IBM Storage. To avoid conflicting celebrations, we decided to celebrate my induction into the "Quarter Century Club" (QCC) last Friday instead.
My colleague Harley Puckett was master of ceremonies. Here he is presenting me with a commemorative plaque and keychain. Harley mentioned a few facts about 1986, the year I started working for IBM. Ronald Reagan was the US President, gasoline cost only 93 cents per gallon, and the US National Debt was only 2 trillion US dollars!
Here are my colleagues from DFSMShsm. From left to right: Ninh Le, Henry Valenzuela, Shannon Gallaher, and Stan Kissinger. I started in 1986 as a software developer on DFHSM, and slowly worked my way up to be a lead architect of DFSMS.
Here are my colleagues from Tivoli Storage Manager (TSM). From left to right: Matt Anglin, Ken Hannigan and Mark Haye. I first met them when they worked in DFDSS, having moved from San Jose, CA down to Tucson. While I never worked on the TSM code itself, I did co-author some of the patents used in the product and in other products, like the 3494 Virtual Tape Server, that make use of TSM internally. I also traveled extensively to promote TSM, often with a TSM developer tagging along to learn the ropes of traveling and making presentations.
Here are my colleagues from the disk team. From left to right: Joe Bacco, Carlos Pratt, Gary Albert, and Siebo Friesenborg. I worked on the SMI-S interface for the ESS 800 and DS8000 disk systems needed for the Tivoli Storage Productivity Center. Joe leads the "Disk Magic" tools team. Carlos and I worked on qualifying the various disk products to run with Linux on System z host attachment. Gary Albert is the Business Line Executive (BLE) of Enterprise Disk. Siebo Friesenborg was a disk expert on performance and disaster recovery, but is now enjoying his retirement.
Here are my colleagues from the support team. From left to right: Max Smith, Dave Reed, and Greg McBride. I used to work in Level 2 Support for DFSMS with Max and Dave, carrying a pager and managing the queue on RETAIN. We had enough people that each Level 2 team member only had to carry the pager two weeks per year. On Monday afternoons, the person with the pager would hand it to the next person in the rotation. On Monday, September 10, 2001, I got the pager, and the following morning it went off as we worked to help the many clients affected by the September 11 tragedy.
I worked with Greg McBride when he was in DFSMS System Data Mover (SDM), and then again in Tivoli Storage Productivity Center for Replication (TPC-R), and now he is supporting IBM Scale-Out Network Attached Storage (SONAS).
Standing in the light blue striped shirt is Greg Van Hise, my first office-mate and mentor when I joined IBM. He went on to be part of the elite "DFHSM 2.4.0" prima donna team, then moved on to become an architect for Tivoli Storage Manager (TSM).
I wasn't limited to inviting just coworkers; I was also able to invite friends and family. Here are Monica, Richard, and my mother. Normally, my parents head south for the summer, but they postponed their flights so that they could participate in my QCC celebration.
From left to right: my father, Greg Tevis, and myself. It was pure coincidence that my father wore a loud, darkly patterned shirt like mine. Honestly, we did not plan this in advance. Greg Tevis and I were lead architects for the Tivoli Storage Productivity Center, and Greg is now the Technology Strategist for the Tivoli Storage product line.
Here is Jack Arnold, fellow subject matter expert who works with me here at the Tucson Executive Briefing Center, sampling the food. We had quite the spread, including egg rolls, meatballs, luncheon meats, chicken strips, and fresh vegetables.
More colleagues from the Tucson Executive Briefing Center, from left to right: Joe Hayward, Lee Olguin, and Shelly Jost. Joe was a subject matter expert on Tape when I first joined the EBC in 2007, but he has since moved back to the Tape development/test team. Lee is our master "Gunny" sergeant who manages all of our briefing schedules. Shelly is our Client Support Manager, and was the one who organized all the food and preparations for this event!
Lastly, here are Brad Johns, myself, and Harley Puckett. Brad was my mentor during my years in Marketing, and has since retired from IBM to work on his golf game. I would like to thank all of the Tucson EBC staff for pulling off such a great event, and all my coworkers, friends and family for coming out to celebrate this milestone in my career!
In addition to the plaque and keychain, Harley presented me with a book of congratulatory letters. If you would like to send a letter, it's not too late, contact Mysti Wood (firstname.lastname@example.org).
Well, it's Tuesday, in the United States at least, and you know what that means... IBM Announcements! I am actually down under in Sydney, Australia, and it is Wednesday already as I write this. I feel like a time traveler.
IBM announces its latest disk system, the [IBM System Storage DCS3700], designed for high-performance computing (HPC), business analytics, video broadcasting, and other sequential workloads. The "DCS" stands for Deep Computing Storage. IBM already has the DCS9900 for large enterprise deployments, so this smaller DCS3700 is targeted at midrange deployments.
In a compact 4U package, the DCS3700 packs dual active-active controllers and up to 60 disk drives. The controller drawer can support two additional expansion drawers of 60 drives each, also 4U, for a maximum of 180 drives in 12U of rack space. Packed with "green" energy-efficient 7200RPM 2TB drives, a system can have up to 360TB of raw capacity. The system supports RAID levels 0, 1, 3, 5, 6, and 10.
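The capacity figures above follow from simple multiplication, and the same arithmetic shows how the supported RAID levels trade raw capacity for protection. Here is a back-of-the-envelope Python sketch, assuming every drive is a 2TB drive and that all drives are grouped into equal-sized arrays. The 10-drive array size in the example is a hypothetical choice, and the sketch ignores hot spares and formatting overhead, so real usable capacity will be lower.

```python
DRIVES_PER_DRAWER = 60
DRAWER_HEIGHT_U = 4
MAX_DRAWERS = 3        # the controller drawer plus two expansion drawers
DRIVE_TB = 2           # 2TB 7200RPM drives


def total_drives(drawers: int = MAX_DRAWERS) -> int:
    return drawers * DRIVES_PER_DRAWER


def rack_units(drawers: int = MAX_DRAWERS) -> int:
    return drawers * DRAWER_HEIGHT_U


def raw_capacity_tb(drawers: int = MAX_DRAWERS) -> int:
    return total_drives(drawers) * DRIVE_TB


def data_drives(level: str, n: int) -> int:
    """Drives' worth of data per n-drive array (the rest hold parity or mirrors)."""
    if level == "0":
        return n               # striping only, no protection
    if level in ("1", "10"):
        if n % 2:
            raise ValueError("mirrored arrays need an even number of drives")
        return n // 2          # every block is written twice
    if level in ("3", "5"):
        return n - 1           # one drive's worth of parity
    if level == "6":
        return n - 2           # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


def usable_capacity_tb(level: str, array_size: int, drawers: int = MAX_DRAWERS) -> int:
    """Capacity left for data when all drives are split into equal arrays."""
    arrays = total_drives(drawers) // array_size   # leftover drives are ignored
    return arrays * data_drives(level, array_size) * DRIVE_TB


print(f"{total_drives()} drives in {rack_units()}U, {raw_capacity_tb()}TB raw")
for level in ("0", "5", "6", "10"):
    print(f"RAID {level} with 10-drive arrays: {usable_capacity_tb(level, 10)}TB")
# 180 drives in 12U, 360TB raw
# RAID 0: 360TB, RAID 5: 324TB, RAID 6: 288TB, RAID 10: 180TB
```

RAID 6 costs two drives per array instead of one, which is why it lands below RAID 5 here; with large 2TB drives, many shops accept that cost for protection against a second failure during a long rebuild.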
The system comes with the latest 6Gbps SAS connections for host attachment, but you can choose 8Gbps Fibre Channel Protocol (FCP) instead, allowing the DCS3700 to be managed by SVC or Storwize V7000.
Dan Galvan, IBM VP of Marketing for Storage, was the next speaker. With 300 billion emails being sent per day, 4.6 billion cell phones in the world, and 26 million MRIs per year, there is going to be a huge demand for file-based storage. In fact, a recent study found that file-based storage will grow at 60 percent per year, compared to 15 percent growth for block-based storage.
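To put the 60 percent versus 15 percent growth rates in perspective, here is a small Python calculation, assuming both rates hold steady and compound annually (an assumption made for illustration; the study figures are only as quoted above). The doubling time at annual rate r is ln 2 / ln(1 + r).

```python
import math


def growth_multiple(rate: float, years: int) -> float:
    """How many times larger capacity becomes after compounding for `years`."""
    return (1 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years for capacity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


for label, rate in (("file-based", 0.60), ("block-based", 0.15)):
    print(f"{label}: {growth_multiple(rate, 5):.2f}x in 5 years, "
          f"doubling every {doubling_time(rate):.1f} years")
# file-based: 10.49x in 5 years, doubling every 1.5 years
# block-based: 2.01x in 5 years, doubling every 5.0 years
```

Under these assumptions, file-based storage grows roughly tenfold in the time block-based storage merely doubles.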
Dan positioned IBM's Scale-out Network Attached Storage (SONAS) as the big "C:" drive for a company. SONAS offers a global namespace, a single point of management, with the ability to scale capacity and performance tailored for each environment.
The benefits of SONAS are great. We can consolidate dozens of smaller NAS filers, we can virtualize files across different storage pools, and increase overall efficiency.
Powering advanced genomic research to cure cancer
The next speaker was supposed to be Bill Pappas, Senior Enterprise Network Storage Architect, Research Informatics at [St. Jude Children’s Research Hospital]. Unfortunately, St. Jude, in Memphis, is near the flooding Mississippi River, and he had to stay put. An IBM team was able to capture his thoughts on a video that was shown on the big screen.
Thanks to the Human Genome Project, St. Jude is better able to cure patients. They see 5,700 patients per year, and have an impressive 70 percent cure rate. Mapping the first genome took 10 years; now the technology allows a genome to be mapped in about a week. Having this genomic information is driving vast strides in healthcare. It is the difference between fishing in a river and casting a wide net to catch all the fish in the Atlantic Ocean at once.
Recently, St. Jude migrated 250 TB of files from other NAS systems to an IBM SONAS solution. SONAS can handle a mixed set of workloads, and allows internal movement of data from fast disk, to slower high-capacity disk, and then to tape. SONAS is one of the few storage systems that supports a blended disk-and-tape approach, which is ideal for the type of data captured by St. Jude.
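The disk-to-disk-to-tape movement described above is a form of age-based tiering. The Python sketch below illustrates the idea only: the tier names, the 30-day and 180-day thresholds, the file names, and the `choose_tier` and `plan_placement` functions are all hypothetical, and SONAS itself drives such placement and migration through its own policy-based mechanisms rather than through code like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_idle_days: Optional[int]   # None means no limit (the last tier)


# Hypothetical tiers, ordered from fastest to slowest.
TIERS = (
    Tier("fast-disk", 30),
    Tier("capacity-disk", 180),
    Tier("tape", None),
)


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Place a file on the fastest tier whose idle-time limit it still meets."""
    idle_days = (now - last_access).days
    for tier in TIERS:
        if tier.max_idle_days is None or idle_days <= tier.max_idle_days:
            return tier.name
    raise RuntimeError("the last tier must have no limit")


def plan_placement(files: Dict[str, datetime], now: datetime) -> Dict[str, str]:
    """Map each file name to the tier it should live on."""
    return {name: choose_tier(ts, now) for name, ts in files.items()}


now = datetime(2011, 6, 1)
files = {
    "scan-recent.bam": now - timedelta(days=3),
    "scan-q1.bam": now - timedelta(days=90),
    "scan-2009.bam": now - timedelta(days=700),
}
print(plan_placement(files, now))
# {'scan-recent.bam': 'fast-disk', 'scan-q1.bam': 'capacity-disk', 'scan-2009.bam': 'tape'}
```

Keeping the tiers in a single ordered table means adding or retuning a tier is a data change rather than a code change, which mirrors how policy-driven systems are usually administered.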
IBM's own IT transformation
Pat Toole, IBM's CIO, presented the internal transformation of IBM's IT operations. He started in 2002, in the midst of IBM's effort to restructure its processes and procedures. They identified four major data sources: employee data, client data, product data, and financial data. They focused on understanding outcomes and setting priorities.
The result? A 3-to-1 payback on CIO investments. This allowed IBM to go from server sprawl to consolidated pooling of resources with the right levels of integration. In 1997, IBM had 15,000 different applications running across 155 separate datacenters. Today, they have reduced this down to 4,500 applications and 7 datacenters. Their goal is to reduce down to 2,225 applications by 2015. Of these, only 250 are mission critical.
Pat's priorities today: server and storage virtualization, IT service management, cloud computing, and data-centered consolidation. IBM runs its corporate business on the following amounts of data:
9 PB of block-based storage on SVC and XIV
1 PB of file-based storage on SONAS
15 PB of tape for backup and archive
Pat indicated that this environment is growing 25 percent per year, and that an additional 70-85 PB relates to other parts of the business.
By taking this focused approach, IBM was able to increase storage utilization from 50 to 90 percent, and to cut storage costs by 50 percent. This was done through thin provisioning, storage virtualization and pooling.
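Pat's figures lend themselves to a quick sanity check. This Python sketch adds up the three categories listed above (25 PB), projects that total forward at the quoted 25 percent annual growth, and shows how much raw capacity a fixed amount of stored data needs at 50 versus 90 percent utilization. Treating the 10 PB of disk as stored data rather than installed capacity is an assumption made for illustration, and the sketch models only the utilization effect, not the full 50 percent cost reduction from thin provisioning, virtualization and pooling.

```python
BLOCK_PB, FILE_PB, TAPE_PB = 9, 1, 15
ANNUAL_GROWTH = 0.25


def project(total_pb: float, rate: float, years: int) -> float:
    """Compound a capacity figure forward at a constant annual rate."""
    return total_pb * (1 + rate) ** years


def raw_needed(stored_pb: float, utilization: float) -> float:
    """Raw capacity required to hold `stored_pb` at a given utilization."""
    return stored_pb / utilization


total = BLOCK_PB + FILE_PB + TAPE_PB
print(f"Today: {total} PB; after 3 years at 25%/yr: {project(total, ANNUAL_GROWTH, 3):.1f} PB")

disk = BLOCK_PB + FILE_PB          # assumption: treat this as stored data
before, after = raw_needed(disk, 0.50), raw_needed(disk, 0.90)
print(f"Raw disk at 50%: {before:.1f} PB, at 90%: {after:.1f} PB ({1 - after / before:.0%} less)")
# Today: 25 PB; after 3 years at 25%/yr: 48.8 PB
# Raw disk at 50%: 20.0 PB, at 90%: 11.1 PB (44% less)
```

For any fixed amount of data, raw capacity scales as 1/utilization, so raising utilization from 50 to 90 percent alone cuts the disk needed by about 1 - 50/90, or roughly 44 percent.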
Looking to the future, Pat sees the following challenges: (a) 120,000 IBM employees have smartphones and want to connect them to IBM's internal systems; (b) the growth of social media; and (c) the use of business analytics.
After the last session, people gathered in the "Hall of the Universe" for the evening reception, featuring food, drinks and live music. It was a great day. I got to meet several bloggers in person, and their feedback was that this was a very blogger-friendly event. Bloggers were given the same level of access as corporate executives and industry analysts.
During the break, I talked with some of the other bloggers at this event. From left to right: Stephen Foskett ([Pack Rat] blog), Devang Panchigar ([StorageNerve]), and yours truly, Tony Pearson. (Picture courtesy of Stephen Foskett)
Meet the Experts
This next segment was a Q&A panel, with a moderator posing questions to four experts. Originally, I was scheduled to be the moderator, but Doug Balog took over that role. The experts on the panel were:
Rich Castagna, Editorial Director for Storage Media, TechTarget. TechTarget is the group that runs the [SearchStorage] website.
Stan Zaffos, Gartner VP of Research, who spoke earlier today. I have worked with Stan for years as well, and have attended the last four Gartner Data Center Conferences held every December in Las Vegas.
Steve Duplessie, Founder and Senior Analyst, Enterprise Strategy Group (ESG). Steve's blog is titled [The Bigger Truth].
Jon Toigo clarified a statement that Doug Balog had made earlier in the day, attributed to Jon's study. Doug had said that 40 percent of all data should be archived. What Jon's study actually found was that, on average, for the data on disk systems, about 30 percent is useful data, 40 percent is not active and could be eligible for archive, and the remaining 30 percent was crap.
The other experts introduced themselves. Rich felt that "Cloud" was still the biggest buzzword in the IT industry. Stan felt that CIOs should ask their storage administrators, "What are you doing to improve my agility and efficiency?" Steve felt that it was better to focus on improving processes and procedures, rather than trying to deploy the best technology.
How can you best reduce backup costs per TB?
Jon- use tape.
Rich- Clean up your environment.
Stan- Don't rehydrate your deduplicated data, adopt an archive approach, and revisit your backup schedules.
Steve- Deduplication covers up stupidity. No band-aids! Companies need to address the cause.
Does Backup as a Public Service make sense for large enterprises?
Rich- Yes, especially for those with Remote Office/Branch Office (ROBO).
Stan- It depends. You should implement client-side dedupe. Get the Cloud Provider to waive telecom bandwidth charges.
Steve- Consider recovery scenarios, and try to maintain control.
Jon- "Clouds" are bulls@#$ marketing. WAN latency will pile up.
What are the top issues IT leaders should be discussing with the Storage Managers?
Stan- Design to meet, but not exceed, SLAs; automate; and evaluate SAN/NAS ratios.
Steve- Server virtualization is putting the spotlight on storage. Failure to implement storage virtualization is becoming the gate that slows down server virtualization adoption.
Jon- Insist on management features from all storage vendors, try to separate feature/function from the underlying hardware layer. See IBM's [Project Zero].
Rich- Efficiency, Archiving, Thin Provisioning, Compression, Data Protection & Retention, Backup Redesign to protect endpoints like laptops and cell phones.
When does Archive eliminate Backup?
The need for protection never goes away. There are two kinds of data: "originals" and "derivatives", and two kinds of disk: "failed" and "not yet failed".
Given SATA and SAS drives, what is the future of 10K/15K RPM drives?
There is no future for these faster drives, they are going away.
What is the biggest challenge for adopting archive?
It is easy to move data out of production systems, but difficult to make these archives accessible for eDiscovery and Search. There is also concern about changing data formats. Adobe has changed the format of PDF a whopping 33 times.
This was by far the most entertaining session of the day! Hand-held devices allowed the audience to vote for the answers they liked best.
Doug Balog, IBM VP and Business Level Executive for Storage, presented Smart Archiving. Citing research by Jon Toigo, Doug indicated that 40 percent of data on disk should be archived. Sadly, a vast majority of companies continue to use their backups as archives. There is a better way to do archives, to address the needs of four use cases:
The IBM Information Archive for email, files and eDiscovery offers full text indexing. A well-deployed archive strategy can save up to 60 percent in backup costs, and reduce backup times by 80 percent. IBM offers advanced analytics and visualization for archive data.
An analysis of a global insurance company found that they kept, on average, 120 copies of every email sent: an average of 12 copies of each email, multiplied by 10 backups of the email repository.
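The 120-copy figure is simply the product of the two averages quoted for the insurance company. A quick check in Python, using only those two numbers:

```python
# Averages quoted in the insurance-company analysis.
copies_per_email = 12        # copies of each email across mailboxes and folders
backups_of_repository = 10   # backups retained of the email repository

# Every copy in the repository is duplicated again by every retained backup.
total_copies = copies_per_email * backups_of_repository
print(f"Copies stored per email sent: {total_copies}")
```

This multiplication is why archiving email out of the backup stream pays off so quickly: each copy removed from the repository also disappears from every future backup.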
Banjercito, a bank in Mexico, has a 10-year retention requirement from government regulations.
The new LTFS Library Edition allows Library-based access to files stored on tape cartridges. The new TS3500 Library Connector means that a single system of connected tape libraries can hold up to 2.7 Exabytes (EB) of data.
Archive Industry Perspectives
Steve Duplessie from Enterprise Strategy Group [ESG] gave his views on the challenges of volume, access and cost. His definition for archive: the long term retention of information on a separate environment for compliance, eDiscovery and business reference purposes. Steve advocates a purpose-built solution for archive. There are three major challenges for implementing an archive solution:
Getting Participation -- Steve feels that key stakeholders have inappropriate expectations of what archive is, or can be.
Define Tasks -- Steve argues that archive is very much a process-oriented approach, and tasks must fit business processes and procedures.
Prepare for Future Content Types -- the frequent change of standard and proprietary data types poses a real challenge for long-term retention of data.
For example, the Financial Industry Regulatory Authority [FINRA] oversees 4,000 brokerage firms and 600,000 broker/dealers. It has mandated the storing of digital data related to stock trades, which can include text messages, voice messages, and emails. FINRA continues to expand this definition, so soon it could include tweets on Twitter, for example.
Steve feels there are four key requirements for archive:
Support for email, such as an email application plug-in
Off-line access to archived data
Support for mobile devices, such as smartphones
Basic search capabilities
Companies are starting to take archive seriously. About 35 percent of firms surveyed have adopted archive, and another 36 percent plan to in the next 12-24 months. Enterprise archive grew over 200 percent from 2007 to 2009. Steve agrees that not everything needs to be stored on disk. Retention periods longer than six years dictate the need for tape.
Current systems may not meet today's requirements, and data loss and downtime costs have skyrocketed. Data Protection and Retention projects can represent a gold mine of savings: new capabilities can greatly lower costs, allowing companies to shift resources to revenue generation.
Big Data, New Physics and Geospatial Super-Food
I would vote this the best session of the day! Jeff Jonas is an IBM Distinguished Engineer and the Chief Scientist of Entity Analytics, and for all those confused about what the heck "Big Data" means, he has the best explanation. He had just finished his 17th marathon on Saturday, and his fingers were bandaged.
Jeff founded Systems Research and Design (SR&D), known for creating NORA (Non-Obvious Relationship Awareness), used by Las Vegas casinos to identify fraud. IBM acquired SR&D in 2005. Jeff is focused on sensemaking of data streams. He feels many companies are suffering from "Enterprise Amnesia".
"The data must find the data .. and the relevance must find the user."
-- Jeff Jonas
Jeff's metaphor for Big Data is a jigsaw puzzle without the picture on the box. To demonstrate his point, he presented a pile of jigsaw puzzle pieces and asked four teenagers to put the puzzle together without the advantage of the picture on the box. What he had not told them was that he had mixed four different puzzles together, removing 10 to 20 percent of the pieces from each. He also added some duplicate pieces from a second copy of one of the puzzles and, just to make things fun, included a dozen pieces from a sixth puzzle to mess with their heads. Within a few hours, the kids had figured out that there were four puzzles, that there were duplicate pieces, and that some pieces did not fit any of the four puzzles.
"You can't squeeze knowledge from a pixel."
-- Jeff Jonas
This approach favors false negatives. New observations can reverse earlier conclusions, and as the picture emerges, attention shifts to the new information. More data can provide better predictions. "Bad" data, including misspelled words and mis-coded categories, was traditionally discarded or corrected on the basis of "Garbage In, Garbage Out", but can now be useful from a Big Data perspective.
Take, for example, the 600 billion location-data records captured from cell phones every day. Triangulation between cell phone towers can pinpoint you within 60 meters; adding GPS improves this to within 20 meters, and adding Wi-Fi improves it further to 10 meters. While this data is "de-identified" so as not to identify individual users, re-identification is relatively trivial. Jeff's system is able to predict where a person will be next Thursday at 5:35pm with 87 percent accuracy.
Thus, Big Data represents an asset: an accumulation of context. Real-time analytics can be a competitive advantage. These streams of data will need persistent storage and massive I/O capabilities. In one example, Jeff processed 4,200 separate sources of information and was able to identify "dead votes": votes cast in the names of people who had died in prior years, indicating voter fraud.
Jeff's latest project, codenamed G2, will tackle not just people, but everything from proteins to asteroids.
Normally, the worst time slot is the hour after lunch, but these presentations kept people's attention.
Down the street, in Times Square, IBM made it on the big board.
Continuous Data Availability
Jeanine Cotter, IBM VP for Data Center Services, started out with a video about Sabre. IBM developed this revolutionary airline reservation system to handle the huge volume of transactions. Today, 18 percent of organizations consider downtime unacceptable for their tier-1 applications, and 53 percent would be seriously impacted by an outage lasting an hour or more.
Eventually, companies cross the "Continuous Availability" threshold, the point where they discover that the possibility of downtime is too costly to ignore. IBM has clients using 3-site Metro/Global Mirror that can fail-over an entire data center in just five mouse clicks.
Jeanine also mentioned Euronics, which uses the SAN Volume Controller Stretched Cluster capability to easily vMotion virtual guest images from one data center to another. SVC has had this capability for a while, but now, with a full vCenter plug-in and VAAI support, it is fully integrated with VMware.
A final example was a mid-sized university that uses IBM Storwize V7000 with Metro Mirror. The primary location's Storwize V7000 manages solid-state drives with Easy Tier. The secondary location's Storwize V7000 has high-capacity SATA drives and FlashCopy.
Customer Testimonial - University of Rochester Medical Center
Rick Haverty, Director of IT Infrastructure at University of Rochester Medical Center [URMC], provided the next client testimonial. The mission of URMC is to use science, education and technology to improve health. URMC receives over $400 million USD in NIH grants, which ranks it around 23rd among University-based academic medical centers in the country. It has over 900 doctors, both general practitioners and specialists.
URMC has an IBM BlueGene supercomputer, a Cisco network with over 45,000 ports, and over 7.5 million square feet of Wi-Fi wireless internet coverage. They have three datacenters: the first is 7,500 square feet, the second is 6,000 square feet, and the third is just 800 square feet, used to hold their "off-site tapes".
URMC has digitized all of its records, including the Electronic Medical Records (EMR) system, medication dosage history, imaging "priors", calibration of infusion pumps, and RFID monitoring, and even provides IT support while the patient is on the operating table. RFID monitoring ensures all of the refrigerators are keeping medications at the right temperature; a single failed refrigerator can lose $20,000 worth of medication.
When is a good time for downtime? URMC handles 90,000 Emergency Room visits per year, so the answer is never. When is the ER busiest? Monday morning (not what I expected!).
URMC's EMR software (Epic) runs on clustered POWER7 servers, with DS8700 disk systems using Metro Mirror to secondary location. They also keep a third "shadow" POWER7 for read-only purposes, and a separate system that provides web-based read-only access. Finally, they have 90 stand-alone Personal Computers (PCs) that contain information for all the patients that have reservations this week, just in case all the other systems fail.
The exploding volume of data comes from medical imaging. In radiology (X-rays), each image, called a "study", takes 20-30 MB, and URMC performs 650,000 studies per year. This represents about 16TB of storage per year, with a 3-second response time for access. These studies must be kept for 7 years after they were last viewed, or until the patient reaches the age of 18, whichever is later.
But radiology is just one discipline; healthcare has a whole bunch of "ologies". Another is pathology, which examines cells between glass slides under a microscope. Each study consumes 10-20GB, and URMC does about 100,000 pathology studies per year, representing 150TB per year.
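The radiology retention rule ("7 years after last view, or until the patient turns 18, whichever is later") is easy to get wrong, so here is a minimal Python sketch of it, together with a check of the yearly capacity range. The function names and the sample dates are hypothetical, and the sketch encodes only the simplified rule as stated in the talk.

```python
from datetime import date

def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def radiology_retention_until(last_viewed: date, birth_date: date) -> date:
    """Keep a study 7 years past its last view or until the patient turns 18,
    whichever date is later (hypothetical helper for the rule described above)."""
    return max(add_years(last_viewed, 7), add_years(birth_date, 18))

# Yearly capacity: 650,000 studies at 20-30 MB each, in decimal units (1 TB = 1,000,000 MB).
STUDIES_PER_YEAR = 650_000
low_tb = STUDIES_PER_YEAR * 20 / 1_000_000
high_tb = STUDIES_PER_YEAR * 30 / 1_000_000

# Hypothetical pediatric patient born in 2005, study last viewed in May 2011.
print(radiology_retention_until(date(2011, 5, 9), date(2005, 3, 1)))
print(f"{low_tb:.1f}-{high_tb:.1f} TB per year")
```

For a child, the age-18 date usually wins; for an adult, the 7-year clock does. A 25 MB midpoint gives about 16 TB per year, matching the figure quoted.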
URMC has identified that they have 42 mission-critical applications. The data for these are stored on DS8000, XIV, Storwize V7000 and DS5000, all managed behind SAN Volume Controller.
During lunch, people were able to take a look at our solutions. Here are Dan Thompson and Brett Cooper striking a pose.
Hyper-Efficient Backup and Recovery
The afternoon was kicked off by Dr. Daniel Sabbah, IBM General Manager of Tivoli software. He started with some shocking statistics: 42 percent of small companies have experienced data loss, and 32 percent have lost data forever. IBM has a solution that offers "Unified Recovery Management", which combines periodic backups, frequent snapshots, and remote mirroring.
IBM Tivoli Storage Manager (TSM) was introduced in 1993, and was the first backup software solution to support backup to disk storage pools. Today, TSM is also part of Cloud Computing services, including IBM Information Protection Services. IBM announced today a new bundle called IBM Storwize Rapid Application Backup, which combines IBM Storwize V7000 midrange disk system, Tivoli FlashCopy Manager, implementation services, with a full three-year hardware and software warranty. This could be used, for example, to protect a Microsoft Exchange email system with 9000 mailboxes.
IBM also announced that its TS7600 ProtecTIER data deduplication solutions have been enhanced to support many-to-many bi-directional remote mirroring. Last year, University of Pittsburgh Medical Center (UPMC) reported that they were averaging a 24x data deduplication factor in their environment using IBM ProtecTIER.
"You are out of your mind if you think you can live without tape!"
-- Dick Crosby, Director of System Administration, Estes
The new IBM TS1140 enterprise-class tape drive processes 2.3 TB per hour, and provides a density of 1.2 PB per square foot. The new 3599 tape media can hold 4TB of data uncompressed, or up to 10TB at a 2.5x compression ratio.
The United States Golf Association [USGA] uses IBM's backup cloud, which manages over 100PB of data from 750 locations across five continents.
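The capacity and throughput figures for the TS1140 can be sanity-checked with a little arithmetic. This sketch uses only the numbers quoted above, decimal units, and assumes the 2.5x ratio as an average; real compression varies with the data.

```python
# Figures quoted for the TS1140 drive and its media.
native_tb = 4.0               # uncompressed cartridge capacity
compression_ratio = 2.5       # assumed average compression, as in the text
throughput_tb_per_hour = 2.3

effective_tb = native_tb * compression_ratio
# Convert TB/hour to MB/second using decimal units (1 TB = 1,000,000 MB).
throughput_mb_per_sec = throughput_tb_per_hour * 1_000_000 / 3600

print(f"Effective capacity: {effective_tb:.0f} TB")
print(f"Throughput: {throughput_mb_per_sec:.0f} MB/s")
```

Expressing the drive rate in MB/s makes it easier to compare with the bandwidth of the FC or FICON links feeding the drive.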
Customer Testimonial - Graybar
Randy Miller, Manager of Technical System Administration at Graybar, provided the next client testimonial. Graybar is an employee-owned company focused on supply-chain management, serving as a distributor of electrical, lighting, security, power and cooling equipment.
Their problem was that they had 240 different locations, and expecting local staff to handle tape backups was not working out well. So they centralized their backups at their main data center. If a system fails at one of their many remote locations, they rebuild a new machine at the main data center over the high-speed LAN, and then ship it overnight to the remote location. The result: the remote location has a system up and running by 10:30am, faster than local staff could have recovered from tape. In effect, Graybar had implemented a "private cloud" for backup in the 1990s, long before the concept was "cool" or "popular".
In 2001, they had an 18TB SAP ERP application data repository. To back this up, they took it down for 1 minute per day, six days a week, and 15 minutes down on Sundays. The result was less than 99.8 percent availability. To fix this, they switched to XIV, and use Snapshots that are non-disruptive and do not impact application performance.
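The "less than 99.8 percent" availability figure follows directly from the backup outage schedule. A minimal Python check, which counts only the planned backup outages described above and ignores any unplanned downtime:

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes

def weekly_availability(outage_minutes: float) -> float:
    """Fraction of a week an application is up, given its weekly outage minutes."""
    return 1 - outage_minutes / MINUTES_PER_WEEK

# Backup outages described above: 1 minute on each of six days, plus 15 minutes on Sunday.
outage_minutes = 6 * 1 + 15
print(f"Weekly outage: {outage_minutes} minutes")
print(f"Availability: {weekly_availability(outage_minutes):.3%}")
```

Twenty-one minutes a week works out to roughly 99.79 percent, so even short daily backup windows keep an application below "three nines".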
Over 85 percent of the servers at Graybar are virtualized.
Their next challenge is Disaster Recovery. Currently, they have two datacenters, one in St. Louis and the other in Kansas City. However, in the aftermath of Japan's earthquakes, they realized there is a nuclear power plant between their two locations, so a single incident could impact both data centers. They are working with IBM, their trusted advisor, to investigate a three-site solution.
This week, May 15-22, I am in Auckland, New Zealand teaching IBM Storage Top Gun sales class. Next week, I will be in Sydney, Australia.
Optimizing Storage Infrastructure for Growth and Innovation
This session started off with my former boss, Brian Truskowski, IBM General Manager of System Storage and Networking.
We've come a long way in storage. In 1973, the "Winchester" disk drive was named after the famous Winchester 30-30 rifle. The drive was originally planned to have two 30MB platters, hence the name. When it finally launched, it had two 35MB platters, for a total raw capacity of 70MB.
Today, IBM announced version 6.2 of SAN Volume Controller, with support for 10GbE iSCSI. Since 2003, IBM has sold over 30,000 SAN Volume Controllers. An SVC cluster can now manage up to 32PB of disk storage.
IBM also announced new 4TB tape drive (TS1140), LTFS Library Edition, the TS3500 Library Connector, improved TS7600 and TS7700 virtual tape libraries, enhanced Information Archive for email, files and eDiscovery, new Storwize V7000 hardware, new Storwize Rapid Application bundles, new firmware for SONAS and DS8000 disk systems, and Real-Time Compression support for EMC disk systems. I plan to cover each of these in follow-on posts, but if you can't wait, here are [links to all the announcements].
Customer Testimonial - CenterPoint Energy
"CenterPoint is transforming its business from being an energy distribution company that uses technology, to a technology company that distributes energy."
-- Dr. Steve Pratt, CTO of CenterPoint Energy
The next speaker was Dr. Steve Pratt, CTO of [CenterPoint Energy]. CenterPoint is a 110-year-old energy company (older than IBM!) involved in electricity, gas distribution, and natural gas pipelines. CenterPoint serves Houston, Texas (the fourth largest city in the USA) and the surrounding area.
CenterPoint is transforming to a Smart Grid involving smart meters, and this requires the best IT infrastructure you can buy, including IBM DS8000, XIV and SAN Volume Controller disk systems, IBM Smart Analytics System, Stream Analytics, IBM Virtual Tape Library, IBM Tivoli Storage Manager, and IBM Tivoli Storage Productivity Center.
Dr. Pratt has seen the transition of information over the years:
Data Structure, deciding how to code data to record it in a structured manner
Information Reporting, reporting to upper management what happened
Intelligence Aggregation, finding patterns and insight from the data
Predictive Analytics, monitoring real-time data to take pro-active steps
Autonomics, where automation and predictive analysis allows the system to manage itself
What does the transition to a Smart Grid mean for their storage environment? They will go from 80,000 to 230,400,000 meter reads per day. Ingestion will go from MB per day to GB per second, and reporting will transition to real-time analytics.
Dr. Pratt prefers to avoid trade-offs: don't lose something to get something else. He also feels that the language the IT department uses can help. For example, he expresses results as a factor, such as 25x, rather than as a percent reduction (96 percent). He feels this communicates the actual results more effectively.
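The jump in meter reads is easier to appreciate as a factor and as a sustained ingest rate. This sketch uses the two daily figures quoted above; the 15-minute reporting interval is a hypothetical assumption (not stated in the talk) used only to show how many meters such a volume might imply.

```python
reads_per_day_before = 80_000
reads_per_day_after = 230_400_000
SECONDS_PER_DAY = 24 * 60 * 60

growth_factor = reads_per_day_after / reads_per_day_before
avg_reads_per_second = reads_per_day_after / SECONDS_PER_DAY

# Hypothetical: if every smart meter reported once every 15 minutes,
# each meter would produce 96 reads per day.
READS_PER_METER_PER_DAY = 24 * 4
implied_meters = reads_per_day_after // READS_PER_METER_PER_DAY

print(f"Growth factor: {growth_factor:,.0f}x")
print(f"Average ingest: {avg_reads_per_second:,.0f} reads per second")
print(f"Implied meters at 15-minute intervals: {implied_meters:,}")
```

The per-second figure is an average; real ingest would peak higher whenever many meters report at once.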
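The link between a reduction factor and a percent reduction is simple: a factor of N leaves 1/N of the original. A small Python sketch of both conversions (the function names are illustrative):

```python
def factor_to_percent_reduction(factor: float) -> float:
    """A reduction factor of N (for example 25x) leaves 1/N of the original."""
    return (1 - 1 / factor) * 100

def percent_reduction_to_factor(percent: float) -> float:
    """Inverse conversion: a P percent reduction leaves (100 - P) percent."""
    return 100 / (100 - percent)

for factor in (10, 24, 25, 100):
    print(f"{factor:>4}x  ->  {factor_to_percent_reduction(factor):.1f} percent reduction")
```

Going from 10x to 100x only moves the percentage from 90 to 99 percent, even though the remaining data shrinks tenfold (from 10 percent to 1 percent). That compression of the scale near 100 percent is why a factor can communicate results more clearly.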
Today's smarter consumers are driving the need for smarter technologies. Individual consumers and small businesses can make use of intelligent meters to help reduce their energy costs. Everything from smart cars to smart grids will need real-time analytics to deal with the millions of events that occur every day.
IBM's Data Protection and Retention Story
Brian Truskowski came back to provide the latest IBM messaging for Data Protection and Retention (DP&R). The key themes were:
Stop storing so much
Store more with what's on the floor
Move data to the right place
IBM announced today that the IBM Real-Time Compression Appliances now support EMC gear, such as EMC Celerra. While some EMC equipment has built-in compression features, these often come at the cost of performance degradation. In contrast, IBM Real-Time Compression can offer improved performance as well as a 3x to 5x reduction in storage capacity.
Over 70 percent of data on disk has not been accessed in the last 90 days. IBM Easy Tier on the DS8700 and DS8800 now supports FC-to-SATA automated tiering.
IBM is projecting that backup and archive storage will grow at over 50 percent per year. To help address this, IBM is launching a new "Storage Infrastructure Optimization" assessment. All attendees at today's summit are eligible for a free assessment.
Analytics are increasing the value of information, and making it more accessible to the average knowledge worker. The cost of losing data, as well as the effort spent searching for information, has skyrocketed. Users have grown to expect 100 percent uptime availability.
An analysis of IT environments found that only 55 percent of spending went to revenue-producing workloads. The remaining 45 percent was spent on Data Protection and Retention. That means that for every IT dollar spent on projects to generate revenue, you are spending roughly another 82 cents to protect it. Imagine spending more than 80 percent of your house payments on homeowners' insurance, or more than 80 percent of your car's purchase price on car insurance.
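Dividing the two spending shares gives the protection spend per revenue dollar. A one-line check in Python, using only the 55 and 45 percent figures from the analysis:

```python
# Shares of IT spending quoted in the analysis.
revenue_share = 0.55  # revenue-producing workloads
dpr_share = 0.45      # Data Protection and Retention

# Protection spend per dollar of revenue-producing spend, in cents.
cents_per_revenue_dollar = dpr_share / revenue_share * 100
print(f"{cents_per_revenue_dollar:.1f} cents per revenue dollar")
```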
IBM has organized its solutions into three categories:
Hyper-Efficient Backup and Recovery
Continuous Data Availability
Smart Archiving
What would it mean to your business if you could shift some of the money spent on DP&R over to revenue-producing projects instead? That was the teaser question posed at the end of these morning sessions for us to discuss during lunch.
Normally, IBM has its announcements on Tuesdays, but this week it was on Monday!
I am here in New York City, at the Kaufmann Theater of the American Museum of Natural History, for the [IBM Storage Innovation Executive Summit]. We have about 250 clients here, as well as many bloggers and storage analysts.
My day started out being interviewed by Lynda from Stratecast, a division of [Frost & Sullivan]. This interview will be part of a video series that Stratecast is doing about the storage industry.
(About the venue: the American Museum of Natural History was founded in 1869. It was featured in the film "Night at the Museum". In keeping with IBM's focus on scalability and preservation, the museum boasts skeletons of the largest dinosaurs. The five-story building takes up several city blocks, and the Kaufmann Theater is buried deep in the bottom level, well shielded from cell phone and Wi-Fi signals, allowing me to focus on taking notes the traditional way, with pen and paper.)
Deon Newman, IBM VP of Marketing for North America, was our Master of Ceremonies. The day would be filled with market insight, best practices, thought leadership, and testimonials of powerful results.
This is my first in a series of blog posts on this event.
Information Explosion on a Smarter Planet
Bridget van Kralingen, IBM General Manager for North America, indicated that storage is finally having its day in the sun, moving from the "back office" to the "front office". According to Google's Eric Schmidt, we now create, capture and replicate more data in two days than all of the information recorded from the dawn of time until 2003.
1928: IBM's innovative 80-column punch card stored nearly twice as much as its 45-column predecessor.
1947: Bing Crosby decided to record his radio show on magnetic tape at his convenience, rather than doing it live. This was the motivation for IBM researchers to investigate tape media, delivering the first commercial tape drive in 1952. One tape reel could hold the equivalent of 30,000 punch cards.
1956: the IBM RAMAC mainframe was the first computer to access data randomly, using an externally-attached disk system, the "350 Disk Unit", which stored 5 million 7-bit characters (about 5MB) and weighed over 500 pounds. Compare that to today's cell phones, which can store several GB of data in a handheld device.
1978: IBM invented Redundant Array of Independent Disks (RAID) through a collaboration with the University of California, Berkeley.
1993: IBM introduces the [IBM 9337 Disk Storage Array], the first external disk storage system for distributed operating systems. This was based on the Serial Storage Architecture [SSA] protocol.
1995: IBM launches products that support Storage Area Networks (SAN), based on the Fibre Channel Protocol. IBM's internal codenames for disk products were all names of sharks, and so our internal mantra was that a healthy storage diet was comprised of "Plenty of Fish and Fibre".
2010: IBM ships Easy Tier, the world's easiest-to-use sub-LUN automated tiering capability, for the IBM System Storage DS8700 disk system.
Storage is growing (in capacity) at 40 percent per year, but IT budgets are only growing (in dollars) by a measly 1 to 5 percent. She cited the success at [Sprint], presented at the October 2010 launch. By combining IBM SAN Volume Controller with a three-tier storage architecture, Sprint lowered their raw capacity from 10PB to 8.4PB, increasing utilization from 35 to 78 percent. This involved shrinking from six storage vendors to three, and reducing total number of disk arrays from 166 down to 96. The resulting system has only 38 percent of their data on their most expensive Tier-1 storage, the rest is now living on less expensive Tier-2 and Tier-3 storage.
Companies are entering the era of Big Data with an insatiable appetite for collecting and analyzing data for marketplace insights. IBM [InfoSphere BigInsights], based on Apache Hadoop, has helped customers make sense of it all. Innovative technology, expertise and marketplace insight will provide the competitive path forward in the coming decade.
Storage Challenges and Opportunities in 2011 and Beyond
I always enjoy hearing Stan Zaffos, Gartner Research VP, present at the annual [Data Center Conference] in Las Vegas every December. His analysis and research focuses on storage systems and emerging storage technologies.
Stan provided his perspective on the storage industry. He suggested a top-down approach, based on the market trends that Gartner is closely monitoring. He suggests focusing heavily on managing data growth, using SLAs to improve efficiency, and following Gartner's recommended actions. His statement, "If something is not sustainable, then it is unsustainable," resonated well with the audience. His three key points:
Design to meet but not exceed Service Level Agreements (SLAs)
Re-evaluate your ratio of SAN versus NAS based on growth of unstructured data content
Explore the variety of Cloud options available
Those of us who have been in this business a long time recognize that the problems haven't changed, just the dimensions. When in the past three decades were IT budgets generous and plentiful? When was there more than enough IT staff to handle all the requests in a timely manner? When hasn't there been a period of information growth? Gartner's analysis external control block (RAID protected disk systems) is growing revenue at 8.7 percent. Raw TBs of disk capacity is growing at 55 percent, and expected to be 100 Exabytes by 2015.
SAN has four times more revenue than NAS today, but NAS is growing faster. NAS was only 9 percent marketshare in 2010, but is projected to grow to 32 percent by 2015. SAN can offer higher price/performance for traditional OLTP and database workloads, but NAS is better suited for unstructured data, backups and archives, assisted by storage efficiency features like real-time compression and data deduplication. Which industries create the most unstructured data? The ones involved in filling out forms! This includes government, insurance agencies, manufacturing, mining and pharmaceuticals.
The phrase "good enough" should no longer be considered an insult. Too often IT departments design solutions that far exceed negotiated Service Level Agreements (SLAs), and they should instead focus on just meeting them instead. Modular storage systems are often sufficient for most workloads. Slower 7200RPM SATA disks can be one third the price of faster 15K RPM Fibre Channel drives, and often sufficient performance for the tasks required. Unified storage, such as IBM N series, can help simplify capacity planning, as storage can be re-purposed if different workloads grow at different rates. The key is to focus on meeting SLAs based on the price-vs-risk factor. Take a minimalist approach with fewer SLAs, fewer management classes, and fewer storage vendors.
Stan suggests a two-pronged approach: Capacity management through content analytics and classification, and Efficient Utilization through Thin Provisioning, storage virtualization, Quality of Service (QoS), compression and deduplication capabilities. This features will be ubiquitous by 2013. If you are worried that these technologies mean more information packed onto fewer devices, Stan's response was "If it's not there, it can't break." Storing data on fewer disks or tape cartridges means less chance something will fail.
Stan feels IT shops using Thin Provisioning should continue to charge their end-users on what they ask for (the full allocation request) rather than what the thin-provisioned amount actually is on the storage devices themselves. For example, if someone asks for 100GB LUN to be allocated to their system, but this only takes up 30GB of actual data space, chargeback the full 100GB!
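Stan's chargeback advice can be sketched as a tiny billing function: the requester pays for the allocation, while the storage team keeps the physical savings from thin provisioning. The `ThinLun` record, the function names, and the $0.50 per GB rate are all hypothetical, chosen only to illustrate the policy.

```python
from dataclasses import dataclass

@dataclass
class ThinLun:
    """A thin-provisioned LUN (hypothetical record for illustration)."""
    name: str
    allocated_gb: float  # capacity the requester asked for
    used_gb: float       # capacity actually consumed in the storage pool

def chargeback(lun: ThinLun, rate_per_gb: float) -> float:
    """Bill on the full allocation, as Stan recommends, not on used_gb."""
    return lun.allocated_gb * rate_per_gb

def pool_savings_gb(lun: ThinLun) -> float:
    """Physical capacity not consumed in the pool thanks to thin provisioning."""
    return lun.allocated_gb - lun.used_gb

lun = ThinLun(name="app01", allocated_gb=100, used_gb=30)
print(chargeback(lun, rate_per_gb=0.50))   # billed on the 100 GB allocation
print(pool_savings_gb(lun))
```

Billing on `used_gb` instead would pass the thin-provisioning savings to the requester and remove any incentive to right-size allocation requests.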
It can take five years for new technology to get 50 percent adopted. The Romans took eight years to build the [Colosseum]. His research on "network convergence" found that 42 percent planned to use iSCSI, 32 percent Fibre Channel over Ethernet (FCoE) or other Top-of-Rack(TOR) converged switches, and 16 percent looking for full convergence of servers, switches and storage. Features like IBM Easy Tier automatic sub-LUN tiering were introduced later, and so have not been adopted as widely as other features like Thin Provisioning that have been around since the 1990's IBM RAMAC Virtual Array.
Stan felt that public and private clouds were two different approaches. Public clouds offer reservation-less provisioning. Private clouds offer improved agility, but can be more complex to set up, and carry the risk of idle capacity similar to traditional IT datacenter deployments. Storage and file virtualization should be considered a prerequisite for adopting cloud technologies.
Storage IT teams need to adopt more than just technical skills. They need to learn about legal and government regulatory compliance issues, financial considerations, and would even benefit from doing some "marketing". Why marketing? Because often IT departments need end-users to change their attitudes and behaviours, and this can be accomplished through internal marketing campaigns.
IBM introduced the Linear Tape File System last year, which I explained in my post [IBM Celebrates 10 Year Anniversary for LTO tape], and released it as open source to the rest of the Linear Tape Open [LTO] Consortium so that the entire planet can benefit from IBM's innovation. IBM presented a technology demonstration of its Linear Tape File System - Library Edition at the NAB conference, showing how this new IBM library offering of the file system can put mass archives of rich media video content at the users' fingertips with the ease of library automation.
From left to right, here is Atsushi Nagaishi (Toshiba) and Shinobu Fujihara (IBM). Fujihara-san is from IBM's Yamato lab in Japan where some of the LTFS development was done. The Yamato Lab was not damaged by the [Earthquakes in Japan].
With the capabilities of LTFS, IBM has introduced an entirely new role for tape, as an attractive high capacity, easy to use, low cost and shareable storage media. LTFS lets tape be used much like a removable external disk, a giant alternative to floppy diskettes, DVD-RW and USB memory sticks, with directory tree access and file-level drag-and-drop capability. LTFS also makes it easy to pass information from one system or employee to another. And as for video storage capacity, a 1.5TB LTO-5 cartridge can hold about 50 hours of XDCAM HD video!
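How many hours fit on a cartridge comes down to simple arithmetic on the recording bitrate. The sketch below is my own back-of-the-envelope helper, not an LTFS tool. It assumes the 1.5TB native LTO-5 capacity in decimal units and a constant total bitrate (video plus audio plus container overhead). Working backwards, "about 50 hours" implies an average of roughly 67 Mbit/s. The actual figure will vary with the XDCAM HD profile and the number of audio tracks.

```python
LTO5_NATIVE_BYTES = 1.5e12  # 1.5TB native (uncompressed), decimal units


def hours_of_video(capacity_bytes: float, bitrate_bps: float) -> float:
    """Hours of video that fit at a constant total bitrate."""
    return capacity_bytes * 8 / bitrate_bps / 3600


def implied_bitrate_mbps(capacity_bytes: float, hours: float) -> float:
    """Average total bitrate (Mbit/s) implied by fitting `hours` of video."""
    return capacity_bytes * 8 / (hours * 3600) / 1e6


print(round(implied_bitrate_mbps(LTO5_NATIVE_BYTES, 50), 1))  # 66.7
print(round(hours_of_video(LTO5_NATIVE_BYTES, 50e6), 1))      # 66.7 hours at 50 Mbit/s
```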
A group photo of the global IBM LTFS team, from left to right, David Pease from IBM Almaden Research Center, Ed Childers from IBM Tucson, Shinobu Fujihara and Hironobu Nagura from IBM Japan.
IBM was once again the #1 leader in tape worldwide for the year 2010. With this exciting new win, tape is not just for backup and archive anymore!
"All work and no play makes Jack a dull boy!"
Often I get feedback from my readers that I focus too much on storage products in this blog, and have been asked to break out of the work world for a change. Fair enough! The first Sunday of May is designated "World Laughter Day". I am proud to be one of the founding members of the [Tucson Laughter Club] back in 2004, and we held our first World Laughter Day event on May 1, 2005 at Udall Park.
Over the past seven years, we have had four other clubs "spin off" from the main group to form their own club. However, for World Laughter Day, the five sister clubs (or five warring tribes, as some call them) set aside their differences and got together for a day of fun. In keeping with the tradition of holding these events outdoors, we were granted permission to hold our event on the University of Arizona mall.
While the Tucson Laughter Club is recognized as one of the oldest laughter clubs in the United States, there are actually over 6000 clubs worldwide in over 60 countries. Laughter clubs started in India, when Dr. Madan Kataria (a medical doctor) and his wife Madhuri (a yoga instructor) gathered people in a park to try laughter as a form of healing and exercise. Today, [Laughter Yoga] is practiced outside in parks or indoors.
Up until now, all of the World Laughter Day events in Tucson have been organized by the original Tucson Laughter Club, but this time we decided to pass the baton over to Gita Fendelman of [Curve's Laughter YogHA club] to take the lead. Here she is standing next to a large yellow sign with facts and figures about the history of World Laughter Day.
We had about 45 people join us in a large circle, and proceeded with a series of breathing, stretching and laughter exercises. Many of the laughter exercises involve moving around to look at each of the other participants eye-to-eye, and with 45 people, this can be quite challenging.
The weather could not have been nicer. Clear blue skies, a slight breeze, and an unusually cool 75 degrees F. Last week temperatures were in the nineties as we approach summer, so we were delighted the weather cooled down for this event.
As a certified Laughter Yoga instructor myself, I offered to help lead the events for World Laughter Day. We had plenty of other certified laughter leaders on hand, and so both Gita and Emily (from Laughter Yoga with Emily) served as co-chairs.
I brought my [CRATE battery-powered amp] and microphones so that we could project our voices loud enough to the entire group. There was no electricity anywhere near our location, so battery-powered amps are the way to go for these situations.
After two hours of laughing, we all lay down for some relaxing meditation. Some people use this time to pray for World Peace. In a delightful coincidence, later that day, US President Barack Obama announced that the [world was a better place] having eliminated one of the world's most dangerous terrorists.
I would like to thank Jeff from our local NBC News Channel 4 affiliate [KVOA] who came to interview Gita and video us while we did our laughter exercises.
Back in February, my blog post [A Box Full of Floppies] mentioned that I uncovered some diskettes compressed with OS/2 Stacker. Jokingly, I suggested that I may have to stand up an OS/2 machine just to check out what is actually on those floppies. Each floppy contains only three files: README.STC, STACKER.EXE and a hidden STACKVOL.DSK file. The README.STC explains that the disk is compressed by Stacker, a program developed by [Stac Electronics, Inc.]. The STACKER.EXE would not run on Windows XP, Vista or Windows 7. The STACKVOL.DSK is just a huge binary file, like a ZIP file, compressed with the [Lempel-Ziv-Stac] algorithm, which combines Lempel-Ziv with Huffman coding.
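For the curious, here is a minimal Python sketch of the Lempel-Ziv half of that algorithm: a plain LZ77-style tokenizer and decoder. The data is turned into literal bytes and (offset, length) back-references into a sliding history window. The 2KB window matches LZS, but everything else is simplified. Real LZS packs these tokens into a bit stream with fixed variable-length codes, and Stacker wraps the result in its own STACKVOL.DSK container. Neither of those is modeled here, so this sketch cannot read an actual Stacker volume.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a back-reference (offset, length).
Token = Union[int, Tuple[int, int]]

WINDOW = 2048  # LZS keeps a 2KB history window; reused here as an assumption


def compress(data: bytes, window: int = WINDOW, min_len: int = 2) -> List[Token]:
    """Greedy LZ77-style tokenizer (brute-force search, no LZS bit packing)."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlap), exactly as in LZ77.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def decompress(tokens: List[Token]) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            offset, length = tok
            for _ in range(length):  # byte-by-byte so overlapping copies work
                out.append(out[-offset])
    return bytes(out)


print(compress(b"abcabcabcabc"))  # [97, 98, 99, (3, 9)]
```

Twelve bytes collapse to three literals and one back-reference. Text files with repeated words and phrases compress well this way, which is why my flat text documents shrank so nicely on those floppies.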
In my follow-up post [Like Sands in an Hourglass], I explained how there are many ways I could have tackled this project. I could either use the Emulation approach and try to build an OS/2 guest image under a hypervisor like VMware, KVM or VirtualBox, or just take the Museum approach and try taking one of my half dozen old machines, wipe it clean and stand up OS/2 on it bare metal. This turned out to be more challenging than I expected. The systems I have that are modern and powerful enough to run hypervisors don't have floppy drives, so I opted for the Museum approach.
(A quick [history of OS/2] might be helpful. IBM and Microsoft jointly developed OS/2 back in 1985. By 1990, Microsoft decided its own Windows operating system was more popular with the ladies, and decided to break off with IBM. In 1992, IBM released OS/2 version 2.0, touted as "a better DOS than DOS and a better Windows than Windows!" Both parties retained ownership rights, and Microsoft renamed its version of OS/2 to Windows NT. The "NT" stood for New Technology, which became the basis for all of the enterprise-class Windows servers used today. IBM named its OS/2 versions 3 and 4 "WARP", with the last version, 4.52, released in 2001. In its heyday, OS/2 ran the majority of Automated Teller Machines (ATMs), was used for hardware management consoles (HMC), and was used worldwide to run various railway systems. After 2001, IBM encouraged people to transition from Windows or OS/2 over to Java and Linux. For those that can't or won't leave OS/2, IBM partnered with Serenity Systems to continue OS/2 under the brand [eComStation].)
Working with an IBM [ThinkCentre 8195-E2U Pentium 4 machine] with 640MB RAM, an 80GB hard disk, a CD-ROM and one 3.5-inch floppy drive, I first discovered that OS/2 can only handle very small hard disks. There are limits on [file systems and partition sizes] as well as the infamous [1024-cylinder limit] for bootable operating systems. Leaving the drive completely empty didn't work, as the disk was too big. Carving one big partition out of it also failed, as that exceeded the various limits. Each time, OS/2 decided the partition table was corrupted because the values were so huge. Even modern disk partitioning tools ([SysRescueCD] or [PartedMagic]) didn't work, as these create partitions that OS/2 does not recognize.
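Where does the 1024-cylinder limit come from? Older BIOS disk calls address a sector by cylinder/head/sector (CHS), with only 10 bits for the cylinder number. The sketch below multiplies out the largest commonly used geometry (1024 cylinders x 255 heads x 63 sectors x 512 bytes). It assumes that translated geometry, which may differ from what a given BIOS reports. The result is the roughly 8.4GB ceiling, which is likely the same 8.4GB volume limit the updated installation diskettes address. The second helper shows that the end of an 80GB drive lands far beyond cylinder 1023.

```python
MAX_CYLINDERS = 1024  # 10-bit cylinder field: the "1024-cylinder limit"
MAX_HEADS = 255       # largest head count used by translating BIOSes (assumed)
MAX_SECTORS = 63      # sectors are numbered 1..63 (6-bit field)
SECTOR_BYTES = 512


def chs_limit_bytes(cylinders: int = MAX_CYLINDERS, heads: int = MAX_HEADS,
                    sectors: int = MAX_SECTORS,
                    sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest disk size reachable through CHS addressing."""
    return cylinders * heads * sectors * sector_bytes


def cylinder_of(lba: int, heads: int = MAX_HEADS, sectors: int = MAX_SECTORS) -> int:
    """Cylinder number that holds a given logical block address."""
    return lba // (heads * sectors)


print(chs_limit_bytes())         # 8422686720 bytes, about 8.4GB
print(cylinder_of(156_250_000))  # 9726: the end of an ~80GB disk
```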
The next obstacle I knew I would encounter was device drivers. OS/2 comes as a set of three floppy diskettes and a CD-ROM. The bootable installation disk was referred to affectionately as "Disk 0", followed by Disk 1 and Disk 2. Once all drivers have been loaded into memory, the installer can read the CD-ROM and continue with the installation. In searching for updated drivers, I came across [Updated OS/2 Warp 4 Installation Diskettes] that address problems with newer display monitors. They also address the 8.4GB volume limit.
The updates were in the form of EXE files that only execute in a running DOS or OS/2 environment, and expand onto a floppy diskette. It seemed like a [Catch-22]: I needed a working DOS or OS/2 system to run the update programs to create the diskettes, but needed the diskettes to build a working system.
To get around this, I decided to take a "scaffolding" approach. Using a DOS 6 bootable floppy, I re-partitioned the drive with FDISK into two small 1.9GB partitions. Since I have the full five-floppy IBM DOS 6 set, I hid the first partition for OS/2, and installed the DOS 6 GUI on the second partition. I also added a few new subdirectories: BOOT to hold Grub2, PERSONAL to hold the data I decompress from the floppies, and UTILS to hold additional utilities. This little DOS system worked, and I then had new OS/2 "Disk 1" and "Disk 2" diskettes for the installation process.
(If you don't have a full set of DOS installation diskettes, you can make do with "FORMAT C: /S" from a [DOS boot disk], and then just copy over all the files from the boot disk to your C: drive. You won't have a nice DOS GUI, but the command line prompt will be enough to proceed.)
Like DOS, OS/2 expects to be installed on the C: drive. I hid the second partition (DOS), and marked the first partition installable and bootable. The OS/2 installation involves a lot of reboots, and the hard drive is not natively bootable in the intermediate stages. This means having to boot from Disk 0, then putting in Disk 1, then Disk 2, before continuing with the next phase of the installation. I tried to keep the installation as "Plain Vanilla" as possible.
I had to figure out what to include, and what to exclude, and this involved a lot of trial and error. For example, one of the choices was for "external diskette support". Since I had an "internal diskette drive", I didn't think I needed it. But after a full install, I discovered that it would not read or write floppy diskettes, so it appears that I do indeed need this support.
OS/2 supports two different file systems, FAT16 and the High Performance File System (HPFS). Since my partition was only 1.9GB in size, I chose just to use FAT16. HPFS supports larger disk partitions, longer file names, and faster performance, none of which I needed for these purposes.
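The 1.9GB partition size sits just under the FAT16 ceiling. A FAT16 volume can have at most 65,524 clusters, and DOS-era systems cap a cluster at 32KB, so the largest volume is about 2GB. The sketch below works through that arithmetic. It assumes those two limits and ignores the small overhead of the FAT tables and root directory. It also picks the smallest cluster size that keeps a partition legal.

```python
FAT16_MAX_CLUSTERS = 65_524    # 65,525 or more clusters would make it FAT32
MAX_CLUSTER_BYTES = 32 * 1024  # largest cluster size DOS-era systems accept


def fat16_max_volume_bytes(cluster_bytes: int = MAX_CLUSTER_BYTES) -> int:
    """Largest FAT16 data area for a given cluster size."""
    return FAT16_MAX_CLUSTERS * cluster_bytes


def min_cluster_bytes(partition_bytes: int) -> int:
    """Smallest power-of-two cluster size (512B to 32KB) that fits FAT16."""
    size = 512
    while size <= MAX_CLUSTER_BYTES:
        if partition_bytes // size <= FAT16_MAX_CLUSTERS:
            return size
        size *= 2
    raise ValueError("partition too large for FAT16")


print(fat16_max_volume_bytes())          # 2147090432, just under 2GB
print(min_cluster_bytes(1_900_000_000))  # 32768: 1.9GB needs 32KB clusters
```

The trade-off is that 32KB clusters waste space on small files, since even a one-line README occupies a full cluster.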
I thought it would be nice to get TCP/IP networking to work with my Ethernet card. However, after many attempts, I decided against this. I needed to focus on my mission, which was to decompress floppy diskettes. It was amusing to see that OS/2 supported all kinds of networking, including Token Ring, System Management, Remote Access, Mobile Access Services, File and Print.
Once all the options are chosen, the OS/2 installation proceeds to unpack and copy all the programs to the C: drive. During this process, IBM displayed informational splash screens. Here's one that caught my eye, titled "IBM Means Three Things", which listed three reasons to partner with IBM:
Providing global solutions for a small planet
Creating and Applying advanced technologies to improve with which customers run their businesses
Constantly improving customer service with the products and services we provide
You might wonder how these OS/2 splash screens, written over 10 years ago, can appear almost identical to IBM's current [Smarter Planet] campaign. Actually, it is not that odd. IBM has been keeping to these same core principles since 1911, only the words to describe and promote these core values have changed.
To boot either OS/2 or DOS, I installed the Grand Unified Bootloader [Grub2] on the DOS partition under the C:/BOOT/GRUB directory. However, when I boot OS/2, I cannot see the DOS partition. And when I boot DOS, I cannot see the OS/2 partition. Each operating system thinks its C: drive is the only partition on the system.
Now that I had OS/2 running, I was then able to install Stacker from two floppy diskettes. With this installed, I can compress and decompress data on either the hard disk or floppy diskettes. Most of the files were flat text documents and digital photos. After copying the data off the compressed disks onto my hard drive, I can now copy them off to a safe place.
To finish this project, I installed Ubuntu Linux on the remaining 76GB of disk space. Linux can natively access the FAT16 file systems on both the OS/2 and DOS partitions, which allows me to copy files from OS/2 to DOS or vice versa.
Now that I know what data types are on the diskettes, I determined that I could have decompressed the data in just a few steps:
Set up a DOS partition on C: drive
Insert one of the compressed diskettes into the floppy drive
Copy the STACKER.EXE program from the floppy to the C: drive
Run "STACKER A:" to decompress the floppy diskette
However, now that I have a working DOS and OS/2 system, I can possibly review the rest of my floppy diskettes, some of which may require running programs natively on OS/2 or DOS. This brings me to an important lesson. If you are going to keep archive data for long-term retention, you need to choose file formats that can be read by current operating systems and programs. Installing older operating systems and programs to access proprietary formats can be quite time-consuming, and may not always be possible or desirable.
Yes, it's Tuesday, and that means more IBM Announcements! A lot was announced today, so I have selected an eclectic mix for your enjoyment.
Microsoft Windows support on IBM Mainframes
Last year's announcement of the new IBM zEnterprise included the zEnterprise BladeCenter Extension (zBX), which could run operating systems on POWER7 and x86 blades, all managed by the mainframe's overall Unified Resource Manager. Initially, this was intended for AIX and Linux-x86, but today, IBM [announced a statement of general direction to support Microsoft Windows] on the zBX extension by the end of this year. Of course, the standard disclaimer applies: All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Any reliance on these statements of general direction is at the relying party's sole risk and will not create liability or obligation for IBM.
New 15K RPM drives for IBM Storwize V7000
Last October, when IBM introduced the Storwize V7000, it offered both large (3.5 inch) and small form factor (2.5 inch) drives. Unfortunately, a few people were upset that there were no 15K RPM drives for the small form factor models. There were SSD and 10K RPM drives, but nothing in between. Today, IBM [announced new 15K RPM drives of 146GB capacity] that have been qualified for both the controller and expansion drawers.
New RVU licensing for IBM Tivoli products
IBM [announced it is changing over to this new RVU licensing model] from the previous Processor Value Unit (PVU) licensing model. What is an RVU? A Resource Value Unit (RVU) is a unit of measure by which the program can be licensed. RVU Proofs of Entitlement (PoE) are based on the number of units of a specific resource used or managed by the program. This makes sense: resource management software should be charged by the amount of resources you manage, not the size of the server the software runs on. This change also makes it easier to run the software on virtualized servers and to live-migrate VM guest images from one type of host machine to another.
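Here is a simplified comparison of the two models. This is not IBM's actual price list. The per-core PVU ratings, the "one RVU per managed resource" rate and the 500 managed volumes are all hypothetical numbers chosen for illustration. The point is that under PVU the entitlement follows the host, so live-migrating a management VM to a differently rated server changes the count. Under RVU the count follows what is being managed, so it stays put.

```python
from typing import Dict, Tuple


def pvu_entitlement(cores: int, pvu_per_core: int) -> int:
    """PVU model: licence follows the processors the software runs on."""
    return cores * pvu_per_core


def rvu_entitlement(managed_resources: int, rvu_per_resource: int = 1) -> int:
    """RVU model: licence follows the resources the software manages."""
    return managed_resources * rvu_per_resource


# Hypothetical scenario: a management VM live-migrates between two hosts.
hosts: Dict[str, Tuple[int, int]] = {"host_a": (4, 70), "host_b": (4, 120)}
managed_volumes = 500
for host, (cores, rating) in hosts.items():
    print(host, pvu_entitlement(cores, rating), rvu_entitlement(managed_volumes))
# host_a 280 500
# host_b 480 500
```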
If you are contemplating a visit to an IBM [Executive Briefing Center], then April and May are a great time to come to Tucson. The weather is ideal here. The cold snap appears to be over, and spring is in the air!
The IBM Storwize V7000 was introduced last October, and has proven to be wildly successful. I saw two awesome reviews recently of the IBM Storwize V7000 disk system that I thought I would bring to your attention.
The first review is [IBM Storwize V7000] from Roger Howorth of ZDNet UK. Here are some quotes:
"Under the hood, the Storwize V7000 is built from technologies originally developed for IBM's enterprise-class storage systems, so the V7000 benefits from a comprehensive set of high-end features that have been scaled down for mid-range buyers."
"Initial configuration couldn't be simpler."
"We really liked the layout and functionality of the GUI."
"Storwize V7000 is virtual storage that offers efficiency and flexibility through built-in SSD optimization and "thin provisioning" technologies while enabling users to virtualize and re-use existing disk systems..."
"Storwize V7000 advanced functionality also enables non-disruptive migration of data from existing storage, simplifying implementation and minimizing disruption to users."
"The Storwize V7000 graphical user interface is a browser-based, easy to navigate intuitive GUI."
"ESG Lab found that getting started with the Storwize V7000 disk system was intuitive and straightforward."
"Easy Tier increases the efficiency and simplicity of deploying SSD drives."
I started attending the Arizona International Film Festival eight years ago. I took a week off work to see the films, and came back to tell people how enjoyable it was to just sit and watch thirty movies. "Dirty movies?" they would ask. "No, not dirty, thirty!" To avoid further confusion, I quickly switched to saying "I spent the week watching 25 to 35 independent films."
A few weeks ago, the Arizona International Film Festival notified me that my 2007 System Storage video has been recognized for a technical award under the category of "Innovative use of Technology for Animated Short Film." I will receive the award in person this evening, April 1, at the opening ceremony, which starts at 6:00pm, at the [Fox Theater] in downtown Tucson, Arizona.
As is the case with the Oscars and Grammys, technical awards are handed out in smaller ceremonies in advance of the primary award ceremony that recognizes the best actors, directors and films, which will be held April 9.
In this virtual world, avatars representing IBM executives and marketing managers would present our latest products to avatars of the IBM Business Partners, Industry Analysts and the Press. A short "highlights" video that stitched together bits and pieces of the 90-minute event was used by executives at conferences and road shows. I submitted this shortened version to the Arizona International Film Festival back in 2008, so I am glad the judges have finally gotten around to reviewing it. Here it is uploaded as a [YouTube video]:
During the event, I captured the real-time video from my laptop screen using a tool called [FRAPS]. I also had some of my colleagues capture video from different angles in case we needed these in post-production. The technique of capturing computer-generated 3D video from a computer screen is known as Machinima.
I was in Bogota, Colombia that week teaching a Top Gun class. I got to the IBM building only to discover the firewall would not let me get through to the Second Life website, so I took a taxi back to the hotel and ran the event from their business center. Then the unthinkable happened, and I got to experience [Colombia's worst power outage in 22 years], in which 98 percent of the country lost power. Luckily, I had enough battery charge on my laptop and was still connected to the Internet to continue with the rest of the event.
Voice-over-IP but it did not have that feature back then. The other $91 was for virtual items in Second Life. I learned how to make virtual objects and use the GNU Image Manipulation Program [GIMP] to create avatar clothing, giveaway items, and demo equipment.
Instead of hiring voice actors, I had IBMers Andy Monshaw, Eric Buckley, Funda Eceral, David Tareen, and Kristie Bell all provide their voice talents directly.
I asked the coordinator of the film festival if there was going to be a "practice session" for the technical award ceremony. She laughed, and said basically that I would just be walking across the stage, receiving the award in my left hand as I shook hands with my right, and then turning slightly clockwise to pose while my picture was taken. If I had ever gotten a diploma from high school or college, she said, then I already knew what to expect. "Don't worry," she assured me, "you won't have to give a speech!"
(In lieu of a speech, I would like to thank Christine Heinisch, my video editor, and Katrina Smith, my cinematographer. I could not have won this technical award without their assistance.)
After the ceremony tonight, the film festival (celebrating its 20th anniversary this year) will kick off with the first of 110 films called "Journey from Zanskar", a 90-minute documentary directed by Frederick Marx and narrated by Richard Gere, starting at 8:00pm. The Arizona International Film Festival will continue through April 20, with films being shown on the evenings and weekends so that I won't have to take time off from work.
If you are in the Tucson area, come out and join me tonight at the Fox Theater!
(Update: Yes, this was an April Fools joke! I did not win any awards for this video. I apologize to my friends and family who showed up to see me receive the award that I didn't get.)
Normally, when EMC fails, it is worth a giggle. Companies are run by humans, and nobody is perfect. However, their latest one, failing to defend their RSA SecurID two-factor website, is no laughing matter. Breaches like this undermine the trust needed for business and commerce to be done with Information Technology, so it affects the entire IT industry.
(FTC Disclosure: I do not work for, or have any financial investments in, either EMC or ENC Security Systems. Neither EMC nor ENC Security Systems paid me to mention them on this blog. Their mention in this blog is not an endorsement of either company or their products. Information about EMC was based solely on publicly available information made available by EMC and others. My friends at ENC Security Systems provided me an evaluation license for their latest software release so that I could confirm the use cases posed in this post.)
Of course, EMC did the right thing by making this breach public in an [Open Letter to RSA Customers]. While this may affect their revenues, as clients question whether they should do business with EMC, or affect their stock price, as investors question whether they should invest in EMC, they were very clear and public that the breach occurred. As far as I know, none of the executives of the RSA security division have stepped down. The disclosure of the breach was the right thing to do, and is required by law under the rules of the [US Securities and Exchange Commission]. This law was created to prevent companies from trying to hide breaches that expose external client information.
The breach does not affect RSA public/private key pairs used by IBM and most every other large company. Rather, this breach was targeted to RSA SecurID two-factor authentication. I explained two-factor authentication in my blog post [Day 5 Grid, SOA and Cloud Computing - System x KVM solutions], but basically it is an added level of security, requiring something you know (your password) combined with something you have (such as a magnetic card or key fob). Both are required to gain access to the system.
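To illustrate the "know" plus "have" idea, here is a minimal Python sketch. It does not use the SecurID algorithm, which is proprietary. Instead it substitutes the open time-based one-time password standard (TOTP, RFC 6238, built on HOTP, RFC 4226). The key fob and the server share a secret seed, and every 30 seconds both derive the same short code from it. The login check itself (`verify_login`, with a salted PBKDF2 password hash) is a hypothetical illustration, not any vendor's API. The sketch also makes one point clear: anyone who obtains the seeds can compute the codes, which is why the confidentiality of those seeds matters so much.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time window."""
    return hotp(secret, unix_time // step, digits)


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def verify_login(password: str, salt: bytes, stored_hash: bytes,
                 code: str, token_secret: bytes, unix_time: int) -> bool:
    # Factor 1: something you know (checked against a salted hash).
    knows = hmac.compare_digest(hash_password(password, salt), stored_hash)
    # Factor 2: something you have (the fob holding token_secret).
    has = hmac.compare_digest(code, totp(token_secret, unix_time))
    return knows and has


seed = b"12345678901234567890"      # RFC test-vector seed, not a real token
print(totp(seed, 59, digits=8))     # 94287082 (RFC 6238 test vector)
```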
Breaches happen. Recently, [Hackers found vulnerabilities in the McAfee.com website]. Last month, fellow blogger Chuck Hollis from EMC had a blog post on [Understanding Advanced Persistent Threats (APT)] in the week leading up to their RSA Conference. It was precisely an APT that hit RSA, so the irony of this breach was not lost on the blogosphere. Perhaps Chuck's blog post gave hackers the idea to do this, like saying "I hope terrorists don't bomb this building that holds all of our chemical weapons..." or "I hope bank robbers don't rob this repository where we keep all the cash..."
(The sinister counter-theory, that EMC staged this breach as a marketing stunt to undermine trust in hybrid or public cloud offerings, such as those offered by IBM, Amazon or Salesforce.com, offers an interesting twist. While computer breaches in general are fodder for [Luddites] to argue we should not use computers at all, this particular breach could be used by EMC salesmen to encourage their customers to choose private cloud over hybrid cloud or public cloud deployments. Given all the extra work that RSA SecurID customers have to now do to harden their environments, that would be in bad taste.)
Today, March 31, is World Backup Day. This is because many viruses are triggered to operate on April 1. Just like checking the batteries in your smoke alarms every year, you should ensure that your backup methodology remains valid.
Back in 2008, I was a volunteer for the One Laptop Per Child (OLPC) initiative, and built an XS server to be used in Uruguay. I shipped [this baby off to school] to be the central server that all the student and teacher laptops connected to. It was the gateway to the Internet, as well as the [repository for the blogs of each student]. The blogs were accessible to the public, so that parents could read what their students were writing.
Unfortunately, this public access resulted in my little XS server being attacked by hackers, with IP addresses in Russia and China. Why anyone from either of those two countries wanted to ruin the hopes and dreams of small school children in Uruguay was beyond me. Fortunately, I had planned for remote administration. Backups were taken by me weekly to a second drive that was only mounted when I was dialed in to take the backup. The rest of the time, it was offline, so as not to be written to by hackers.
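The key idea is that the backup drive is only attached for the few minutes the backup runs. The Python sketch below captures that pattern. The `mount` and `unmount` callables are hypothetical placeholders for whatever actually attaches and detaches the drive on your system (on the XS server that was done while I was dialed in). Each run copies the data into a dated folder. The `finally` clause makes sure the drive is detached even if the copy fails, so it does not stay exposed.

```python
import datetime
import shutil
from pathlib import Path
from typing import Callable


def offline_backup(source: Path, backup_root: Path,
                   mount: Callable[[], None], unmount: Callable[[], None],
                   today: datetime.date) -> Path:
    """Copy `source` to a dated folder on a drive mounted only for the backup."""
    mount()
    try:
        target = backup_root / today.isoformat()  # e.g. .../2008-06-01
        shutil.copytree(source, target)
        return target
    finally:
        # Always detach, even on failure, so the copy stays out of reach.
        unmount()
```

Keeping one dated folder per run also means a compromised week can be skipped and an earlier, clean copy restored instead.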
I also shipped along with the server a bootable DVD that contained a modified version of [System Rescue CD], scripts to start up the SSHD daemon, and the pre-populated public RSA keys for me and eight other administrators located in various countries. To effect repairs, the local operator would reboot from the DVD, and then I could log in via "ssh" and restore the operating system, programs and data. Sadly, this meant that the students might have lost some of their most recent blog posts since the last backup.
Please consider reviewing your own backup strategies. If your security were compromised, data was corrupted or lost, would you be able to recover from your backups?
Use Encryption where Appropriate
If you plan to travel this summer, you may want to consider encryption to protect yourself. ENC Security Systems has just released their latest [Encrypt Stick], which is a USB memory stick pre-loaded with software that provides three features:
Encryption for your files
A secure web browser for accessing sensitive websites
Secure password manager
Many hotels now offer computers for use by the guests. These are typically running some flavor of Windows operating system. Encrypt Stick comes with an EXE file that you can run to browse the web securely, and have access to your encrypted files and passwords, leaving no trace on the hotel lobby computer.
Friends and Family
What if you are visiting friends and family, and they have a Mac instead? No problem, as Encrypt Stick has a DMG file to use on the Mac OS X operating system. While you may not be worried about your siblings hacking into your bank account, you may not necessarily want them seeing what sites you visited.
I have been to several airport lounges now that use Linux for their public computers. Makes sense to me, as there are fewer viruses for Linux, and updating Linux is relatively straightforward. However, Encrypt Stick does not support Linux. My Linux-knowledgeable readers can use [Unetbootin] to build their own bootable USB memory stick that launches their favorite Linux browser in memory on whatever system they are using. The [Gparted Magic] utility rescue tool includes [TrueCrypt] to encrypt your files. Lastly, you can use [MyPasswordSafe] to hold all of your passwords securely.
Several clients have asked if any of the IBM data-at-rest encrypted disks or tapes are affected by this breach. IBM uses AES encryption for the actual disk and tape media, but we do use RSA keys to encrypt the generated keys used on the TS1120 and TS1130 drives. However, these were not affected by the RSA SecurID breach, and your tapes are safely protected.
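The pattern described here is often called envelope encryption. A fresh symmetric data key encrypts the media, and an asymmetric public key "wraps" that data key so only the holder of the private key can unwrap it. The Python sketch below shows only that flow and is deliberately a toy. Python's standard library has no AES, so a SHA-256 counter-mode keystream stands in for it. The RSA key pair is built from two well-known Mersenne primes, which is far too small and far too public to be secure. This is not how the TS1120/TS1130 drives or IBM's key manager are implemented; it just shows why a SecurID breach does not touch tape keys. The tape is protected by the data key and the RSA private key, neither of which involves SecurID.

```python
import hashlib
import secrets

# TOY RSA key pair for illustration only: never use these parameters.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in for AES: XOR with a SHA-256 counter-mode keystream (toy)."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[block:block + 32], pad))
    return bytes(out)


def wrap_key(data_key: bytes) -> int:
    """Encrypt the data key with the RSA public key (N, E)."""
    return pow(int.from_bytes(data_key, "big"), E, N)


def unwrap_key(wrapped: int, length: int = 16) -> bytes:
    """Recover the data key with the RSA private exponent D."""
    return pow(wrapped, D, N).to_bytes(length, "big")


def write_cartridge(plaintext: bytes):
    data_key = secrets.token_bytes(16)  # fresh 128-bit key per cartridge
    return wrap_key(data_key), keystream_xor(data_key, plaintext)


def read_cartridge(wrapped: int, ciphertext: bytes) -> bytes:
    return keystream_xor(unwrap_key(wrapped), ciphertext)
```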
Advanced Persistent Threats, viruses and other malware are no laughing matter. If you are concerned about security, contact IBM to help you assess your current environment and help you plan a robust protection strategy.
Next week, April 6, IBM will host the [Smarter Computing Virtual Event] to cover IBM's Smarter Computing initiative, with key themes of Smarter Computing - Big Data, Optimized Systems, and Cloud. Smarter Computing is a new and innovative approach to computing based on the evolving role of IT in your business and an intrinsic understanding of the economics of IT.
(I found it amusing that EMC has chosen two of IBM's themes, "Big Data" and "Cloud", for their upcoming EMC World 2011 conference. I was tempted to include their graphic, but people might have accused me of using Photoshop or GIMP to make EMC look bad. Instead, you can look at the graphic on this blog post titled [When Cloud Meets Big Data: Information Logistics Revisited] by fellow blogger Chuck Hollis from EMC. IBM has been a leader in IT for decades, so we are used to having other companies follow in our footsteps. As an [IBM wannabee], EMC is no different.)
For many on tight travel budgets, this event REQUIRES NO TRAVEL! This is a virtual event; you can participate from your desk. You will hear from key IBM executives, all of whom I have heard speak myself, so I can vouch that this should be a good event.
Steve Mills - IBM Senior Vice President and Group Executive, Software and Systems (my seventh-line manager)
Tom Rosamilia - IBM General Manager, Power and Mainframe Systems, IBM Systems and Technology Group.
Robert LaBlanc - IBM Senior Vice President, Middleware Software
Helene Armitage - IBM General Manager, Systems Software, Systems and Technology Group.
This event is targeted to CIOs, IT Directors and Managers, Business Analysts, Systems and Storage Administrators, and DBAs. However, we don't check what your actual title is, so feel free to attend even if you have different job responsibilities.
I am giving you one week's notice for this event. If this is the first time you have heard of this event, then I hope that is enough time to plan for this event in your busy schedule. If you had heard of it already, perhaps this serves as a useful reminder to [Register Now!] Is a week ahead the right amount of time? For virtual events, do we need more or less advance notice? What about for events that involve travel? Feel free to enter your thoughts on this in the comments section below.
Last night, I presented an E-Talk to the Engineering Student Council (ESC) of the University of Arizona (UofA).
The ESC is the student governing body of The University of Arizona’s College of Engineering. The organization works with scholastic honorary societies, professional organizations, and project clubs to aid and encourage the professional and social development of students. This year, ESC launched a new program, Engineering Talks (E-Talks), consisting of workshops and lectures, which will focus on teaching students what it takes to work within a company, before they enter the workforce. To make this program successful, career advice from professionals working at established companies is essential.
The audience was a mix of undergraduate and graduate engineering students from a variety of disciplines, such as Petroleum, Hydrology, Mining, Biomedical, Electrical and Computer Engineering. Only a few were graduating this May. There were roughly an equal number of boys and girls, which was encouraging. When I was an engineering student at the UofA, women engineers were very rare.
A little about myself, my academic and professional career over the past 30 years, and some background of IBM as a company, how it is organized, and its 100 year Centennial celebration.
An overview of IBM's corporate strategy for Smarter Computing, explaining how IBM is solving the world's toughest challenges for analyzing Big Data, developing Optimized Systems for particular workloads, and new delivery and deployment models for Cloud Computing.
Some career advice, based on my decades of work experience at IBM and elsewhere
After the Q&A, several students stayed behind to ask more questions. This seems to happen every time I give a presentation to a mixed audience. I handed out plenty of business cards, and offered to make the charts available to all the students via the IBM Expert Network on the Slideshare.net website.
Before we started, we asked the first survey question: "How is storage planning conducted in your shop?" Of the various responses, nearly four out of ten responded "Part of an overall IT infrastructure strategy".
Jon Toigo went first, and spent 20 minutes or so laying out the problem as he sees it. Jon travels all over visiting customers struggling with their storage infrastructures, so he gets to hear a lot of this first hand.
I then spent 20 minutes or so presenting IBM's vision, strategy and offerings to help solve these problems. I could speak for hours on this topic, but we kept it short for this one-hour webcast. To learn more, request a visit to the Tucson Executive Briefing Center.
At the end of my talk, we put out the second survey, asking the audience "What is your number one priority with respect to storage operations today?" Over one fourth of the attendees were focused on reducing storage infrastructure cost of ownership by any means possible.
I am glad we saved the last 15 minutes for Q&A, as there were a lot of questions.
The replay is now available. If you attended the event and want to hear it again, or want to share it with your colleagues, or you missed it and want to hear it, then [Register for the Replay].
A reader from New Zealand expressed concern that some corporate bloggers were [using the earthquake for marketing]. He lost someone close to him in Christchurch and has been unable to reach a friend living in Japan; I am sorry for his loss. I plan to be in Australia and New Zealand to teach a Top Gun class May 15-27, so hopefully I will be able to meet him in person when I am down there.
"Earmarking funds is a really good way of hobbling relief organizations and ensuring that they have to leave large piles of money unspent in one place while facing urgent needs in other places. ... Meanwhile, the smaller and less visible emergencies where NGOs can do the most good are left unfunded.
In the specific case of Japan, there's all the more reason not to donate money. Japan is a wealthy country which is responding to the disaster, among other things, by printing hundreds of billions of dollars' worth of new money."
Another reader mentioned that the last surviving American WW-II vet died the same week. WTF? IBM and Japan have been allies for quite a while now, and there is no reason to bring up past wars except to compare the scope and magnitude of the cleanup effort. (Update: Frank Buckles was the last surviving WW-I vet, but also served in WW-II).
Many readers felt that charity begins at home, and there are plenty of worthy causes right here in the USA to donate to instead. Inspired by last year's movie [Waiting for Superman], my girlfriend started a project called [Centers for My Super Stars] for her first grade class on DonorsChoose.org. For those not familiar with this website, DonorsChoose.org uses the cloud to connect school teachers in need of supplies with rich people willing to donate funds towards these projects. If you want to contribute to her project, [donate here].
"And speaking of class, there just happens to be a baseball team in Sendai, Japan. The Golden Eagles. Their stadium was severely damaged from the earthquake. Wouldn't you think some of them lug nuts who run American baseball would bring the Golden Eagles and their opponents over to the United States when the Japanese season starts -- play some games over here and raise money to help the Japanese? Wouldn't you think they could just once stop that national pastime stuff and help the international pastime?"
As you can see, different readers have different opinions on this. We are all in this world together, and both our economy and our ecology are more interconnected than you might think. Let's build a smarter planet.
IBM announced that it will offer [three free months of IBM Smart Business Cloud] computing and storage services to government agencies, charitable non-profit organizations, and other organizations involved with reconstruction resulting from the earthquakes and tsunami in Japan and the northern Pacific region.
With traditional communications down, and many data centers incapacitated, Cloud Computing can be a great way to resume operations. According to the announcement, organizations can submit their requests now until April 30, and the program will run until July 31, 2011. Options include:
Virtual machine images running 32-bit and 64-bit versions of Linux or Windows.
60GB to 2 TB of disk storage per instance.
Options for various IBM middleware (DB2, Informix, Lotus, and WebSphere)
Rational Application Lifecycle Management and Tivoli Monitoring software
The offer also includes [LotusLive] Software-as-a-Service (SaaS) for email and online collaboration. For more about LotusLive, see this [Red Paper].
When I turned on the television last weekend, I saw large waves of water knock down rows of small houses. I thought I had caught the end of a bad Godzilla movie, but sadly it was not movie special effects. Mother Nature can be quite destructive. Over the past four days, Japan has been hit hard by a series of earthquakes and resulting tsunami.
(Note: Disasters can happen anywhere and at any time. Last month, New Zealand had an earthquake as well. It is best to always be prepared. If you haven't done so lately, check out the latest recommendations from the US Government [Ready.Gov] website.)
Several have asked me how this tragedy in Japan might affect IBM and its clients. Here is what I have gathered from various sources. All IBM Japan employees have survived, are safe, and are reporting no major injuries. IBM has four major facilities, in the central part of the country around Tokyo, far from Sendai, the epicenter. All IBM buildings are still standing and operational. A few sections of Tokyo are affected by scheduled brown-outs in an effort to save electricity. Employees are asked to telecommute (a.k.a. work from home) to minimize traffic congestion.
Hakozaki - Headquarters and executive briefing center
Makuhari - Technical Center, where we often hold conferences and other events
Yamato - Research Facility, where R&D is done for IBM tape storage products
Toyosu - Service Delivery Center
I have been to Japan many times throughout my career. Back in the summer of 1995, IBM sent me to Osaka to help out clients in the aftermath of the Great Hanshin earthquake near Kobe. I remember it well, sending an email back to my team saying "It is 1995, and here in Japan it is 95 degrees and 95 percent humidity." It was seven months after the earthquake, but people were still living in cardboard boxes and makeshift tents.
Many people asked if I will be going back to Japan to help out. I speak Japanese, can make sense of the Japanese Katakana characters on computer monitors, and am an expert in Disaster Recovery. However, the IBM Japan team is doing an awesome job helping our clients restore their data and recover their business operations. Of course, if IBM needs me in Japan, I will gladly go, but so far, it doesn't seem that I am needed there.
IBM and the Austin Chamber of Commerce are inviting registered SXSW Interactive attendees to a networking reception hosted by the IBM Innovation Center and the IBM Venture Capital Group. Power Systems and Watson will be featured prominently at this SXSW event, to be held on March 14, 2011.
While I won't be at the SXSW conference personally, I strongly recommend you attend this event.
Innovators and Entrepreneurs Networking Reception
Four Seasons Hotel
March 14, 2011
Hosted by IBM Venture Capital Group, Austin Chamber of Commerce, and the IBM Innovation Center.
This reception will provide a rare opportunity to network and collaborate with your professional community of industry leaders, entrepreneurs, developers, academics, venture capitalists, and members of the Austin Chamber of Commerce.
Webcast: How to Diagnose and Cure What Ails Your Storage Infrastructure
Wednesday, March 23, 2011 at 11:00 AM PDT / 11:00 AM Arizona MST / 2:00 PM EDT
Storage is the most poorly utilized infrastructure element -- and the most costly part of hardware budgets -- in most IT shops today. And it’s getting worse. Storage management typically involves a nightmarish mash-up of tools for capacity management, performance management and data protection management unique to each array deployed in heterogeneous fabrics. Server and desktop virtualization seem to have made management issues worse, and coming on the heels of changing workloads and data proliferation is the requirement to add data management to the set of responsibilities shouldered by fewer and fewer storage professionals. Forecast for Storage in 2012: more pain as long-delayed storage infrastructure refreshes become mandatory.
In this webcast, fellow blogger Jon Toigo, CEO of Toigo Partners International, of [DrunkenData] fame, and I will take turns assessing the challenges and suggesting real-world solutions to the many issues that confound storage efficiency in contemporary IT. Integrating real-world case studies and technology insights, our storage experts will deliver a must-see webcast that sets down a strategy for fixing storage...before it fixes you.
Don't miss this event, unless you like the stress of knowing that your next disaster may be a data disaster.
Wrapping up my week's coverage of the IBM Pulse 2011 conference, I have had several people ask me to explain IBM's latest initiative, Smarter Computing, which IBM launched this week at this conference. Having led the IT industry through the Centralized Computing era and the Distributed Computing era, IBM is now well-positioned to help companies, governments and non-profit organizations to enter the new Smarter Computing era, focused on insight and discovery.
Centralized Computing era (1952 to 1980): Thousands of IT professionals using mainframes. Efficient, but only the largest companies and governments had them.
Distributed Computing era (1981 to 2010): Millions of office workers using personal computers (PC). Innovative, extending the reach to small and medium-sized businesses, but resulted in server sprawl and increased TCO.
Smarter Computing era (2011 and beyond): Billions of people using smart phones and other handheld devices. Efficient and Innovative, combining the best of centralized and distributed computing.
To help clients with this transition, IBM's Smarter Computing initiative has three main components. This is a corporate-wide strategy, with systems, software and services all working together to realize results.
The first component is Big Data. This combines three different sources of data:
Traditional structured data in OLTP databases and OLAP data warehouses, using data management solutions like DB2 and IBM Netezza.
Unstructured data, including text documents, images, audio, and video, processed with massive parallelism using IBM BigInsights and Apache Hadoop.
Real-Time Analytics Processing (RTAP) of incoming data, including video surveillance, social media, RFID chips, smart meters, and traffic control systems, processed with IBM InfoSphere Streams
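For readers new to the map/reduce style that Hadoop uses to spread text processing across many nodes, here is a minimal single-process Python sketch of a word count. It is not Hadoop or BigInsights code; the function names are my own, and the parallelism is only implied, since each document is mapped independently and could in principle run on a separate node.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for each word; every document is mapped independently,
    # which is what lets a framework spread this step across many nodes.
    for word in re.findall(r"[a-z0-9']+", doc.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(docs: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for doc in docs for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Big Data needs big storage", "Storage for big data"]
    print(word_count(docs))
```

The shuffle step is where a real cluster moves data between nodes, which is one reason Big Data workloads put so much pressure on storage and networking.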
Of course, Big Data will bring new opportunities on the storage front, which I will save for a future post!
Rather than general purpose IT equipment, we now have the scale and scope to specialize with systems optimized for particular workloads, the second component of the Smarter Computing initiative. Of course, IBM has been delivering integrated stacks of systems, software and services for decades now, but it is important to remind people of this, as IBM now has a spate of competitors all trying to follow IBM's lead in this arena.
As with Big Data, the focus on Optimized Systems has impacted IBM's strategy on storage as well. I'll save that discussion for a future post as well!
I am glad that nearly all of the storage vendors have standardized to a common definition for Cloud, the third component of Smarter Computing, which shows that this concept has matured:
Cloud computing is a pay-per-use model for enabling network access to a pool of computing resources that can be provisioned and released rapidly with minimal management effort or service provider interaction. -- U.S. National Institute of Standards and Technology [nist.gov]
Of course, Cloud is just an evolution of IBM's Service Bureau business of the 1960s and 1970s, renting out time-sharing on mainframe systems, Grid Computing of the 1980s, and Application Service Providers that popped up in the 1990s. While the [butchers, bakers and candlestick makers] that IBM competes against might focus their efforts on just private cloud or just public cloud, IBM recognizes the reality is that different clients will need different solutions. Rather than rip-and-replace, IBM will help clients transition to cloud via inclusive solutions that adopt a hybrid approach:
Traditional enterprise with private cloud deployments, using solutions like IBM CloudBurst, SONAS and Information Archive
Traditional enterprise with public cloud services to handle seasonal peaks, providing offsite resiliency, and solutions for a mobile workforce
Hybrid clouds that blend private and public cloud services, to handle seasonal peak workloads, remote and branch offices
IBM's emphasis on IT Infrastructure Library (ITIL), Tivoli and Maximo products will play well in this space to provide integrated service management across traditional and cloud deployments. This is why IBM decided to launch the Smarter Computing initiative at the Pulse 2011 conference, the industry's premier conference on integrated service management.
The IBM Watson that competed on Jeopardy! is an excellent example of all three components of Smarter Computing at work.
IBM Watson was able to respond to Jeopardy! clues within three seconds, processing a combination of database searches with DB2 and text-mining analytics of unstructured data with IBM BigInsights.
IBM Watson combined servers, software and storage into an integrated supercomputer that was optimized for one particular workload: playing Jeopardy!
IBM Watson used many technologies prevalent in private and public cloud computing systems, storing its data on a modified version of SONAS, using xCat administration tools, networking across 10GbE, and performing massive parallel processing through lots of PowerVM guest images.
This week was the IBM Pulse 2011 conference in Las Vegas, Nevada, with over 7,000 attendees. I wasn't there, and my on-the-scene correspondent was too busy running the hands-on lab to get out and attend sessions. Fortunately, I was able to watch some of the [IBM Software live stream], and here are my thoughts and observations.
Fellow inventor [Dean Kamen] was the keynote speaker. His inventions help people, making the world a better place. Here are three examples I found interesting during his talk:
Helping third world countries
Dean started out with his favorite quote:
"A problem well defined is a problem half-solved." - John Dewey
Dean mentioned that we are fortunate, having both potable drinking water and a reliable supply of electricity, but 2 to 4 billion people on the planet do not. Sponsored by Coca-Cola, Dean and his team of innovators were able to come up with small units that can be placed in a village or town. One unit takes in untreated water and produces potable drinking water. The other unit takes combustible materials, like cow dung, and produces electricity. Each unit is roughly the size of half a standard server rack. What does Coca-Cola get out of this? New "vending machines"! By combining drinking water with flavored syrups, they can create soft drinks on demand.
Dean's opinion was that if you want something done, you need to work with large corporations, as governments are mired in bureaucracy and rules. I agree. When I first joined IBM, I was introduced to [TRIZ], a systematic method for solving problems. IBM's best and brightest are working to solve some of the toughest computer science challenges. For more on TRIZ, see this blog post about [TRIZ in BusinessWeek].
Helping injured veterans
Dean Kamen is well known for inventing the two-wheeled [Segway Personal Transporter], but his company, [DEKA], makes all kinds of things, mostly medical equipment. To help wounded soldiers returning from Iraq or Afghanistan without one or both arms, Dean and his team developed a robotic arm that has enough motor dexterity to pick up a raisin or grape off the table without dropping or squashing it. Dean has appeared several times on the Colbert Report, and here is a video of the robotic arm:
I have enjoyed riding a Segway myself. A local place in Tucson uses them to lead tourists through downtown Tucson and the University of Arizona campus.
Helping young students to learn science and technology
Dean wrapped up his talk by discussing his passion for "For Inspiration and Recognition of Science and Technology" or [FIRST]. Modeled after sports competitions, FIRST encourages teams of kids to build robots that perform specific tasks. Every year, companies and universities sponsor teams by purchasing robot kits from FIRST. Teams compete in regional competitions, and then the best of those go on to compete in a stadium in Atlanta, Georgia, hosting 76,000 people cheering for their teams.
Unlike other school sports (Football, Basketball, Baseball, etc.) where a student is more likely to win the lottery than get a successful career as a professional athlete, every student involved in FIRST competitions can "go pro". A study of FIRST success tracked students who participated in competitions, and found a substantial improvement in percentage of those students attending college and working as science and engineering professionals.
I am a big fan of encouraging kids of all ages to learn more about science, technology, engineering and math [STEM]. Back in 2009, I blogged about my involvement with [One Laptop Per Child] and [Junior FIRST Lego League]. I've gotten a great reaction to my latest challenge, to build a Watson Jr. in your own basement, based on my [step-by-step] instructions.
If you attended IBM Pulse this week, please comment on your thoughts and observations!
Guest Post: The following post was written by Tom Rauchut, IBM Infrastructure Architect and Advanced Technical Sales Specialist for Tivoli Automation. Tom is at IBM Pulse 2011 in Las Vegas this week, and has offered to send his observations.
The expo opened last night. There are so many fantastic demos and product experts. Las Vegas has a Tivoli buzz on right now.
My series last week on IBM Watson (which you can read [here], [here], [here], and [here]) brought attention to IBM's Scale-Out Network Attached Storage [SONAS]. IBM Watson used a customized version of SONAS technology for its internal storage, and like most of the components of IBM Watson, IBM SONAS is commercially available as a stand-alone product.
Like many IBM products, SONAS has gone through various name changes. First introduced by Linda Sanford at an IBM SHARE conference in 2000 under the IBM Research codename Storage Tank, it was then delivered as a software-only offering SAN File System, then as a services offering Scale-out File Services (SoFS), and now as an integrated system appliance, SONAS, in IBM's Cloud Services and Systems portfolio.
If you are not familiar with SONAS, here are a few of my previous posts that go into more detail:
This week, IBM announces that SONAS has set a world record benchmark for performance, [a whopping 403,326 IOPS for a single file system]. The results are based on comparisons of publicly available information from Standard Performance Evaluation Corporation [SPEC], a prominent performance standardization organization with more than 60 member companies. SPEC publishes hundreds of different performance results each quarter covering a wide range of system performance disciplines (CPU, memory, power, and many more). SPECsfs2008_nfs.v3 is the industry-standard benchmark for NAS systems using the NFS protocol.
(Disclaimer: Your mileage may vary. As with any performance benchmark, the SPECsfs benchmark does not replicate any single workload or particular application. Rather, it encapsulates scores of typical activities on a NAS storage system. SPECsfs is based on a compilation of workload data submitted to the SPEC organization, aggregated from tens of thousands of fileservers, using a wide variety of environments and applications. As a result, it reflects typical workloads, with typical proportions of data and metadata use as seen in real production environments.)
The configuration tested involves SONAS Release 1.2 on 10 Interface Nodes and 8 Storage Pods, resulting in a single file system with over 900TB usable capacity.
10 Interface Nodes; each with:
Maximum 144 GB of memory
One active 10GbE port
8 Storage Pods; each with:
2 Storage nodes and 240 drives
Drive type: 15K RPM SAS hard drives
Data Protection using RAID-5 (8+P) ranks
Six spare drives per Storage Pod
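Here is a quick Python sketch of the drive arithmetic behind this configuration, using only the numbers listed above: 240 drives per pod, six spares, and RAID-5 (8+P) ranks. The post does not say which drive capacity was used, so the 600 GB figure below is a hypothetical value for illustration only, and the result ignores file system and formatting overhead.

```python
from dataclasses import dataclass


@dataclass
class PodLayout:
    drives_per_pod: int = 240  # 2 Storage nodes and 240 drives per pod
    spares_per_pod: int = 6    # six spare drives per Storage Pod
    raid_data: int = 8         # RAID-5 (8+P): 8 data drives...
    raid_parity: int = 1       # ...plus 1 parity drive per rank

    def ranks_per_pod(self) -> int:
        # Drives left after removing spares, grouped into full RAID ranks.
        usable = self.drives_per_pod - self.spares_per_pod
        return usable // (self.raid_data + self.raid_parity)

    def data_drives_per_pod(self) -> int:
        # Only the data drives in each rank hold user data.
        return self.ranks_per_pod() * self.raid_data


def data_capacity_tb(pods: int, layout: PodLayout, drive_gb: float) -> float:
    # Capacity of the data drives in decimal TB, before file system overhead.
    return pods * layout.data_drives_per_pod() * drive_gb / 1000


if __name__ == "__main__":
    layout = PodLayout()
    pods = 8
    print("total drives:", pods * layout.drives_per_pod)        # 1920
    print("ranks per pod:", layout.ranks_per_pod())             # 26
    print("data drives:", pods * layout.data_drives_per_pod())  # 1664
    # HYPOTHETICAL drive size; the benchmark post does not state it.
    print("data drive TB:", data_capacity_tb(pods, layout, drive_gb=600))
```

With those inputs, each pod holds 26 full ranks (208 data drives), or 1,664 data drives across the 8 pods. The same layout multiplied out to 30 Storage Pods gives the 7,200-disk maximum mentioned below.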
IBM wanted a realistic "no compromises" configuration to be tested, by choosing:
Regular 15K RPM SAS drives, rather than a silly configuration full of super-expensive Solid State Drives (SSD) to pump up the results.
Moderate size, typical of what clients are asking for today. The Goldilocks rule applies. This SONAS is not a small configuration under 100TB, and nowhere close to the maximum supported configuration of 7,200 disks across 30 Interface Nodes and 30 Storage Pods.
Single file system, often referred to as a global name space, rather than an aggregate of smaller file systems added together that would be more complicated to manage. Having multiple file systems often requires changes to applications to take advantage of the aggregate performance. It is also more difficult to load-balance your performance and capacity across multiple file systems. Of course, SONAS can support up to 256 separate file systems if you have a business need for this complexity.
The results are stunning. IBM SONAS handled three times more workload for a single file system than the next leading contender. All of the major players are there as well, including NetApp, EMC and HP.
It's Tuesday again, and that means one thing.... IBM Announcements! On the heels of [last week's announcements], IBM announced some additional products of interest to storage administrators.
IBM Information Archive
Back in 2008, IBM [unveiled the Information Archive]. This storage solution provides automated policy-based tiering between disk and tape, with non-erasable non-rewriteable enforcement to protect against unethical tampering of data. The initial release supported [both files and object storage], with support for different collections, each with its own set of policies for management. However, it only supported NFS initially for the file protocol. Today, IBM announces the addition of CIFS protocol support, which will be especially helpful in healthcare and life sciences, as much of the medical equipment is designed for CIFS protocol storage.
Also, Information Archive will now provide full indexing and search capability to help with e-Discovery. Searches and retrievals can be done in the background without disrupting applications or the archiving operations.
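To make the idea of per-collection policies concrete, here is a small Python sketch of retention and disk-to-tape tiering rules. This is not the Information Archive interface: the Policy and Collection classes, their fields, and the day counts are hypothetical, and real non-erasable enforcement happens inside the storage system, not in application code like this.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict


@dataclass(frozen=True)
class Policy:
    retention_days: int      # hypothetical: objects cannot be deleted before this age
    migrate_after_days: int  # hypothetical: age at which objects move from disk to tape


@dataclass
class Collection:
    name: str
    policy: Policy
    archived_on: Dict[str, date] = field(default_factory=dict)  # object id -> archive date

    def store(self, obj_id: str, when: date) -> None:
        # Non-rewriteable: once archived, an object can never be replaced.
        if obj_id in self.archived_on:
            raise PermissionError(f"{obj_id} is already archived; rewrite refused")
        self.archived_on[obj_id] = when

    def tier(self, obj_id: str, today: date) -> str:
        # Policy-based tiering: younger objects stay on disk, older ones go to tape.
        age_days = (today - self.archived_on[obj_id]).days
        return "tape" if age_days >= self.policy.migrate_after_days else "disk"

    def delete(self, obj_id: str, today: date) -> None:
        # Non-erasable until the collection's retention period has elapsed.
        expires = self.archived_on[obj_id] + timedelta(days=self.policy.retention_days)
        if today < expires:
            raise PermissionError(f"{obj_id} is under retention until {expires}")
        del self.archived_on[obj_id]


if __name__ == "__main__":
    images = Collection("medical-images", Policy(retention_days=365, migrate_after_days=30))
    images.store("scan-001", date(2011, 1, 1))
    print(images.tier("scan-001", date(2011, 1, 15)))  # disk
    print(images.tier("scan-001", date(2011, 2, 15)))  # tape
    try:
        images.delete("scan-001", date(2011, 6, 1))
    except PermissionError as err:
        print("refused:", err)
```

Each collection carries its own Policy, which mirrors the idea of different collections being managed under different rules.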
IBM Tivoli Storage Manager for Virtual Environments V6.2 extends capabilities that currently exist in IBM Tivoli Storage Manager. TSM backup/archive clients run fine on guest operating systems, but now this new extension improves backup for VMware environments. TSM provides incremental block-level backups utilizing VMware's vStorage APIs for Data Protection and Changed Block Tracking features.
To minimize impact to the VMware host, TSM for VE makes use of non-disruptive snapshots and offloads the backup processing to a vStorage backup server. This supports file-level recovery, volume-level recovery, and full VM recovery. Of course, since it is based on TSM v6, you get advanced storage efficiency features such as compression and deduplication to minimize consumption of disk storage pools.
IBM Tivoli Monitor has been extended to support virtual servers, including VMware, Linux KVM, and Citrix XenServer. This can help with capacity planning, performance monitoring, and availability. Tivoli Monitor will help you understand the relationships between physical and virtual resources to help isolate problems to the correct resource, reducing the time it takes to debug issues between servers and storage. See the
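The general technique behind incremental block-level backup with deduplication can be sketched in a few lines of Python. This is a conceptual toy, not the VMware vStorage APIs or TSM code: the VirtualDisk and BackupStore classes and the 4 KB block size are hypothetical. The disk records which blocks were written since the last backup (the idea behind Changed Block Tracking), and the store keeps each unique block once, keyed by its SHA-256 digest and compressed with zlib.

```python
import hashlib
import zlib
from typing import Dict, List, Set

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


class VirtualDisk:
    """Toy disk that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks: int):
        self.blocks: List[bytes] = [bytes(BLOCK_SIZE)] * num_blocks
        self.changed: Set[int] = set(range(num_blocks))  # first backup is a full one

    def write(self, index: int, data: bytes) -> None:
        # Pad or trim to one block, then mark the block as changed.
        self.blocks[index] = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]
        self.changed.add(index)

    def take_changed(self) -> Set[int]:
        # Hand the changed-block list to the backup and reset tracking.
        changed, self.changed = self.changed, set()
        return changed


class BackupStore:
    """Content-addressed store: identical blocks are kept once, compressed."""

    def __init__(self):
        self.chunks: Dict[str, bytes] = {}        # SHA-256 digest -> compressed block
        self.versions: List[Dict[int, str]] = []  # per backup: block index -> digest

    def backup(self, disk: VirtualDisk) -> int:
        # Start from the previous version's map so unchanged blocks are
        # referenced rather than copied again; return how many blocks were read.
        block_map = dict(self.versions[-1]) if self.versions else {}
        sent = 0
        for i in sorted(disk.take_changed()):
            data = disk.blocks[i]
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.chunks:  # deduplication: store new content only
                self.chunks[digest] = zlib.compress(data)
            block_map[i] = digest
            sent += 1
        self.versions.append(block_map)
        return sent

    def restore(self, version: int) -> List[bytes]:
        # Rebuild the full disk image for a given backup version.
        block_map = self.versions[version]
        return [zlib.decompress(self.chunks[block_map[i]]) for i in sorted(block_map)]


if __name__ == "__main__":
    disk, store = VirtualDisk(8), BackupStore()
    print("full backup blocks:", store.backup(disk))
    disk.write(3, b"hello")
    print("incremental backup blocks:", store.backup(disk))
    print("unique chunks stored:", len(store.chunks))
```

The first backup reads all eight blocks but stores only one chunk, because every empty block has the same digest; the second backup reads just the single changed block.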
Next week is [IBM Pulse2011 Conference] in Las Vegas, February 27 to March 2. Sorry, I don't plan to be there this year. It is looking to be a great conference, with fellow inventor Dean Kamen as the keynote speaker. For a blast from the past, read my blog posts from Pulse2008 [Main Tent sessions] and [Breakout sessions].
For the longest time, people thought that humans could not run a mile in less than four minutes. Then, in 1954, [Sir Roger Bannister] beat that perception, and shortly thereafter, once he showed it was possible, many other runners were able to achieve this also. The same is being said now about the IBM Watson computer which appeared this week against two human contestants on Jeopardy!
(2014 Update: A lot has happened since I originally wrote this blog post! I intended this as a fun project for college students to work on during their summer break. However, IBM is concerned that some businesses might be led to believe they could simply stand up their own systems based entirely on open source and internally developed code for business use. IBM instead recommends [IBM InfoSphere BigInsights], which packages much of the software described below. IBM has also launched a new "Watson Group" that has [Watson-as-a-Service] capabilities in the Cloud. To raise awareness of these developments, IBM has asked me to rename this post from IBM Watson - How to build your own "Watson Jr." in your basement to the new title IBM Watson -- How to replicate Watson hardware and systems design for your own use in your basement. I also took this opportunity to improve the formatting layout.)
Often, when a company demonstrates new technology, it is a prototype that will not be ready for commercial deployment until several years later. IBM Watson, however, was made mostly from commercially available hardware, software and information resources. As several have noted, the 1TB of data used to search for answers could fit on a single USB drive that you buy at your local computer store.
Take a look at the [IBM Research Team] to determine how the project was organized. Let's decide what we need, and what we don't in our version for personal use:
Do we need it for personal use?
Yes, that's you. Assuming this is a one-person project, you will act as Team Lead.
Yes, I hope you know computer programming!
No, since this version for personal use won't be appearing on Jeopardy, we won't need strategy on wager amounts for the Daily Double, or what clues to pick next. Let's focus merely on a computer that can accept a question in text, and provide an answer back, in text.
Yes, this team focused on how to wire all the hardware together. We need to do that, although this version for personal use will have fewer components.
Optional. For now, let's have this version for personal use just return its answer in plain text. Consider this Extra Credit after you get the rest of the system working. Consider using [eSpeak], [FreeTTS], or the Modular Architecture for Research on speech sYnthesis [MARY] Text-to-Speech synthesizers.
Yes, I will explain what this is, and why you need it.
Yes, we will need to get information for personal use to process
Yes, this team developed a system for parsing the question being asked, and to attach meaning to the different words involved.
No, this team focused on making IBM Watson optimized to answer in 3 seconds or less. We can accept a slower response, so we can skip this.
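To give a flavor of what question analysis involves, here is a deliberately simple, rule-based Python sketch that guesses the expected answer type from the leading question word and keeps the remaining content words as search keywords. This is my own stand-in, far simpler than the natural language processing in IBM Watson; the answer-type labels and stop-word list are hypothetical choices.

```python
import re
from typing import Dict, List, Union

# Hypothetical mapping from leading question words to an expected answer type.
ANSWER_TYPES = {
    "how many": "NUMBER",
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "what": "THING",
    "which": "THING",
}

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "by",
              "with", "and", "is", "are", "was", "were", "do", "does", "did"}

# Individual words that make up the question phrases above.
QUESTION_WORDS = {word for phrase in ANSWER_TYPES for word in phrase.split()}


def analyze_question(question: str) -> Dict[str, Union[str, List[str]]]:
    text = question.lower().strip()
    answer_type = "UNKNOWN"
    for phrase, kind in ANSWER_TYPES.items():
        # Match the question phrase only at the start, as whole words.
        if re.match(re.escape(phrase) + r"\b", text):
            answer_type = kind
            break
    tokens = re.findall(r"[a-z0-9]+", text)
    keywords = [t for t in tokens if t not in STOP_WORDS and t not in QUESTION_WORDS]
    return {"answer_type": answer_type, "keywords": keywords}


if __name__ == "__main__":
    for q in ["Who invented the Segway?", "How many drives are in a Storage Pod?"]:
        print(q, analyze_question(q))
```

The keywords are what you would hand to the search step, and the answer type is what you would use to filter candidate answers.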
(Disclaimer: As with any Do-It-Yourself (DIY) project, I am not responsible if you are not happy with your version for personal use. I am basing the approach on what I read from publicly available sources, and my work in Linux, supercomputers, XIV, and SONAS. For our purposes, this version for personal use is based entirely on commodity hardware, open source software, and publicly available sources of information. Your implementation will certainly not be as fast or as clever as the IBM Watson you saw on television.)
Step 1: Buy the Hardware
Supercomputers are built as a cluster of identical compute servers lashed together by a network. You will be installing Linux on them, so if you can avoid paying extra for Microsoft Windows, that would save you some money. Here is your shopping list:
Three x86 hosts, with the following:
64-bit quad-core processor, either Intel-VT or AMD-V capable
8GB of DRAM, or larger
300GB of hard disk, or larger
CD or DVD Read/Write drive
Computer Monitor, mouse and keyboard
Ethernet 1GbE 4-port switch, and appropriate RJ45 cables
Surge protector and Power strip
Local Console Monitor (LCM) 4-port switch (formerly known as a KVM switch) and appropriate cables. This is optional, but will make it easier during the development. Once your implementation is operational, you will only need the monitor and keyboard attached to one machine. The other two machines can remain "headless" servers.
Step 2: Establish Networking
IBM Watson used Juniper switches running at 10Gbps Ethernet (10GbE) speeds, but was not connected to the Internet while playing Jeopardy! Instead, these Ethernet links were for the POWER7 servers to talk to each other, and to access files over the Network File System (NFS) protocol to the internal customized SONAS storage I/O nodes.
Your implementation will be able to run "disconnected from the Internet" as well. However, you will need Internet access to download the code and information sources. For our purposes, 1GbE should be sufficient. Connect your Ethernet switch to your DSL or Cable modem. Connect all three hosts to the Ethernet switch. Connect your keyboard, video monitor and mouse to the LCM, and connect the LCM to the three hosts.
Step 3: Install Linux and Middleware
To say I use Linux on a daily basis is an understatement. Linux runs on my Android-based cell phone, my laptop at work, my personal computers at home, most of our IBM storage devices from SAN Volume Controller to XIV to SONAS, and even on my Tivo at home which recorded my televised episodes of Jeopardy!
For this project, you can use any modern Linux distribution that supports KVM. IBM Watson used Novell SUSE Linux Enterprise Server [SLES 11]. Alternatively, I can also recommend either Red Hat Enterprise Linux [RHEL 6] or Canonical [Ubuntu v10]. Each distribution of Linux comes in different orientations. Download the 64-bit "ISO" files for each version, and burn them to CDs or DVDs.
Graphical User Interface (GUI) oriented, often referred to as "Desktop" or "HPC-Head"
Command Line Interface (CLI) oriented, often referred to as "Server" or "HPC-Compute"
Guest OS oriented, to run in a Hypervisor such as KVM, Xen, or VMware. Novell calls theirs "Just Enough Operating System" [JeOS].
For this version for personal use, I have chosen a [multitier architecture], sometimes referred to as an "n-tier" or "client/server" architecture.
Host 1 - Presentation Server
For the Human-Computer Interface [HCI], IBM Watson received categories and clues as text files via TCP/IP, had a [beautiful avatar] representing a planet with 42 circles streaking across in orbit, and used a text-to-speech synthesizer to respond in a computerized voice. Your implementation will not be this sophisticated. Instead, it will have a simple text-based Query Panel web interface accessible from a browser like Mozilla Firefox.
Host 1 will be your Presentation Server, the connection to your keyboard, video monitor and mouse. Install the "Desktop" or "HPC Head Node" version of Linux. Install [Apache Web Server and Tomcat] to run the Query Panel. Host 1 will also be your "programming" host. Install the [Java SDK] and the [Eclipse IDE for Java Developers]. If you always wanted to learn Java, now is your chance. There are plenty of books on Java if that is not the language you normally write code in.
While three little systems don't constitute an "Extreme Cloud" environment, you might like to try out the "Extreme Cloud Administration Toolkit", called [xCAT], which was used to manage the many servers in IBM Watson.
Host 2 - Business Logic Server
Host 2 will be driving most of the "thinking". Install the "Server" or "HPC Compute Node" version of Linux. This host will run a server virtualization hypervisor. I recommend KVM, but you can probably run Xen or VMware instead if you like.
Host 3 - File and Database Server
Host 3 will hold your information sources, indices, and databases. Install the "Server" or "HPC Compute Node" version of Linux. This will be your NFS server, an option the installer might ask you about during installation.
Technically, you could run different Linux distributions on different machines. For example, you could run "Ubuntu Desktop" for host 1, "RHEL 6 Server" for host 2, and "SLES 11" for host 3. In general, Red Hat tries to be the best "Server" platform, and Novell tries to make SLES the best "Guest OS".
My advice is to pick a single distribution and use it for everything: Desktop, Server, and Guest OS. If you are new to Linux, choose Ubuntu. There are plenty of books on Linux in general, and Ubuntu in particular, and Ubuntu has a helpful community of volunteers to answer your questions.
Step 4: Download Information Sources
You will need some documents for your implementation to process.
IBM Watson used a modified SONAS to provide a highly available clustered NFS server. For this version, we won't need that level of sophistication. Configure Host 3 as the NFS server, and Hosts 1 and 2 as NFS clients. See the [Linux-NFS-HOWTO] for details. To optimize performance, Host 3 will hold the "official master copy", but we will use a Linux utility called rsync to copy the information sources over to Hosts 1 and 2. This allows the task engines on those hosts to access local disk resources during question-answer processing.
We will also need a relational database. You won't need a high-powered IBM DB2; your implementation can do fine with something like [Apache Derby], the open source version of IBM Cloudscape, which IBM gained through its Informix acquisition. Set up Host 3 as the Derby Network Server, and Hosts 1 and 2 as Derby Network Clients. For more about structured content in relational databases, see my post [IBM Watson - Business Intelligence, Data Retrieval and Text Mining].
Linux includes a utility called wget, which allows you to download content from the Internet to your system. Which documents you download is up to you, based on the types of questions you want answered. For example, if you like literature, check out the vast resources at [FullBooks.com]. You can automate the download by writing a shell script or program that invokes wget for each place you want to fetch data from. Rename the downloaded files to something unique, as they are often just "index.html". For more on the wget utility, see [IBM developerWorks].
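For those who prefer a program to a shell script, here is a minimal Python sketch of that download step (Python for brevity; the rest of this project uses Java). It assumes wget is installed and on your PATH; the URL list and the "sources" directory are hypothetical placeholders. Because so many pages arrive as "index.html", the sketch builds a unique name from each URL's host and path plus a short hash, and passes it to wget's -O option.

```python
import hashlib
import subprocess
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical list of pages to fetch; replace with your own information sources.
URLS = [
    "http://www.example.com/",
    "http://www.example.com/books/index.html",
]


def unique_name(url: str) -> str:
    """Build a local filename that stays unique even when many URLs end in index.html."""
    parsed = urlparse(url)
    path = parsed.path.strip("/") or "index.html"
    readable = f"{parsed.netloc}_{path}".replace("/", "_")
    # A short hash of the full URL (including any query string) avoids collisions.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{digest}_{readable}"


def download_all(urls, dest="sources"):
    """Fetch each URL with wget into dest/, under its unique name."""
    out_dir = Path(dest)
    out_dir.mkdir(exist_ok=True)
    for url in urls:
        target = out_dir / unique_name(url)
        # -q: quiet output; -O: save the page to this file instead of its remote name
        result = subprocess.run(["wget", "-q", "-O", str(target), url])
        if result.returncode != 0:
            print(f"failed: {url}")
```

Calling download_all(URLS) on Host 3 fills the sources directory, which rsync can then copy to Hosts 1 and 2.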
Step 5: The Query Panel - Parsing the Question
Next, we need to parse the question and get some sense of what is being asked. For this we will use [OpenNLP] for Natural Language Processing, and [OpenCyc] for the conceptual logic reasoning. See Doug Lenat's 75-minute video presentation [Computers versus Common Sense]. To learn more, see the [CYC 101 Tutorial].
Unlike Jeopardy!, where Alex Trebek provides the answer and contestants must respond with the correct question, we will do normal question-and-answer processing. To keep things simple, we will limit questions to the following formats:
Who is ...?
Where is ...?
When did ... happen?
What is ...?
Host 1 will have a simple Query Panel web interface: a place at the top to enter your question, a "submit" button, and a place at the bottom for the answer to be shown. When "submit" is pressed, the question is passed to "main.jsp", the JavaServer Page (JSP) that starts the question-answering analysis. Limiting the types of questions that can be posed will simplify hypothesis generation and shrink the candidate set and the evidence to evaluate, allowing the analytics processing to complete in a reasonable time.
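Restricting the formats makes the first pass of parsing almost mechanical. As an illustration only, here is a small Python sketch (the project itself would do this in Java, with OpenNLP doing the real linguistic work) that accepts the four formats and maps each to an expected answer type. The type labels person, place, date and thing are hypothetical names of my own.

```python
import re

# The four supported question formats, each mapped to a hypothetical answer-type label.
PATTERNS = [
    ("person", re.compile(r"^who is (?P<topic>.+?)\?*$", re.IGNORECASE)),
    ("place", re.compile(r"^where is (?P<topic>.+?)\?*$", re.IGNORECASE)),
    ("date", re.compile(r"^when did (?P<topic>.+?) happen\?*$", re.IGNORECASE)),
    ("thing", re.compile(r"^what is (?P<topic>.+?)\?*$", re.IGNORECASE)),
]


def classify(question: str):
    """Return (answer_type, topic), or None if the question is not a supported format."""
    text = question.strip()
    for answer_type, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return answer_type, match.group("topic").strip()
    return None
```

For example, classify("Where is the Library of Congress?") yields ("place", "the Library of Congress"), while an unsupported question returns None and can be rejected before any analytics run.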
Step 6: Unstructured Information Management Architecture
The "heart and soul" of IBM Watson is Unstructured Information Management Architecture [UIMA]. IBM developed this, then made it available to the world as open source. It is maintained by the [Apache Software Foundation], and overseen by the Organization for the Advancement of Structured Information Standards [OASIS].
Basically, UIMA lets you scan unstructured documents, glean the important points, and put them into a database for later retrieval. In the graph above, DBs stands for 'databases' and KBs for 'knowledge bases'. See the 4-minute YouTube video of [IBM Content Analytics], the commercial version of UIMA.
Starting from the left, the Collection Reader selects each document to process, and creates an empty Common Analysis Structure (CAS), which serves as a standardized container for information. This CAS is passed to Analysis Engines, composed of one or more Annotators, which analyze the text and fill the CAS with the information found. The CAS is then passed to CAS Consumers, which do something with the information found, such as add an entry to a database, update an index, or update a vote count.
(Note: This step requires what we in the industry call a small matter of programming, or [SMOP]. If you've always wanted to learn Java programming, XML, and JDBC, you will get to do plenty here.)
If you are not familiar with UIMA, consider this [UIMA Tutorial].
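To make the flow concrete, here is a toy Python sketch of the same Collection Reader, Annotator and CAS Consumer roles. It is not the UIMA API (real UIMA components are Java classes configured by XML descriptors); the CAS class, the two regex-based annotators and the in-memory index are hypothetical stand-ins chosen only to show how a CAS moves through the pipeline.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CAS:
    """Toy stand-in for UIMA's Common Analysis Structure: the document text plus annotations."""
    text: str
    annotations: list = field(default_factory=list)


def collection_reader(documents):
    """Select each document and wrap it in an empty CAS."""
    for doc in documents:
        yield CAS(text=doc)


def year_annotator(cas):
    """Toy annotator: record every four-digit year between 1000 and 2099."""
    for m in re.finditer(r"\b(?:1\d{3}|20\d{2})\b", cas.text):
        cas.annotations.append(("Year", m.group(), m.start(), m.end()))


def name_annotator(cas):
    """Toy annotator: record capitalized words as candidate names."""
    for m in re.finditer(r"\b[A-Z][a-z]+\b", cas.text):
        cas.annotations.append(("Name", m.group(), m.start(), m.end()))


class IndexConsumer:
    """Toy CAS consumer: updates an in-memory index in place of a real database."""

    def __init__(self):
        self.index = {}

    def process(self, cas):
        for kind, value, _start, _end in cas.annotations:
            self.index.setdefault((kind, value), []).append(cas.text)


def run_pipeline(documents, annotators, consumer):
    for cas in collection_reader(documents):
        # The "analysis engine": each annotator adds what it finds to the CAS.
        for annotate in annotators:
            annotate(cas)
        consumer.process(cas)
    return consumer
```

Swapping annotators in and out of the list is the toy equivalent of composing an aggregate Analysis Engine from several Annotators.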
Step 7: Parallel Processing
People have asked me why IBM Watson is so big. Did we really need 2,880 cores of processing power? As a supercomputer, IBM Watson's 80 TeraFLOPs would rank only 94th on the [Top 500 Supercomputers] list. While IBM Watson may be the [Smartest Machine on Earth], the most powerful supercomputer at this time is the Tianhe-1A, with more than 186,000 cores, capable of 2,566 TeraFLOPs.
To determine how big IBM Watson needed to be, the IBM Research team ran the DeepQA algorithm on a single core. It took 2 hours to answer a single Jeopardy question! Let's look at the performance data:
Number of cores | Time to answer one Jeopardy question
1 core | about 2 hours
32 cores (single IBM Power 750 server) | under 4 minutes
320 cores (single rack of 10 servers) | under 30 seconds
2,880 cores (IBM Watson, 90 servers) | under 3 seconds
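A quick calculation shows how close to linear this scaling is. The sketch below assumes the 2-hour single-core figure and 32 cores per Power 750 server (2,880 cores divided by 90 servers). Because the measured times are upper bounds, the speedups and efficiencies it computes are lower bounds.

```python
SINGLE_CORE_SECONDS = 2 * 60 * 60  # about 2 hours on one core

# (configuration, cores, upper bound on seconds per question)
CONFIGS = [
    ("Single IBM Power 750 server", 32, 4 * 60),
    ("Single rack (10 servers)", 320, 30),
    ("IBM Watson (90 servers)", 2880, 3),
]


def scaling(cores, seconds, baseline=SINGLE_CORE_SECONDS):
    """Return (speedup, parallel efficiency); efficiency 1.0 means perfect linear scaling."""
    speedup = baseline / seconds
    return speedup, speedup / cores


for name, cores, seconds in CONFIGS:
    speedup, efficiency = scaling(cores, seconds)
    print(f"{name}: speedup at least {speedup:,.0f}x, efficiency at least {efficiency:.0%}")
```

By these figures, the full 2,880-core system answers at least 2,400 times faster than a single core, a parallel efficiency of roughly 83 percent.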
The old adage applies: [many hands make light work]. The idea is to divide and conquer. For example, if you wanted to find a particular street address in the Manhattan phone book, you could give fifty pages to each friend, and they could all scan their pages at the same time. This is known as "Parallel Processing", and it is how supercomputers are able to work so well. However, not all algorithms lend themselves to parallel processing, and the phrase [nine women can't have a baby in one month] is often used to remind us of this.
Fortunately, UIMA is designed for parallel processing. You need to install UIMA-AS for Asynchronous Scale-out processing, an add-on to the base UIMA Java framework that supports a very flexible scale-out capability based on JMS (Java Message Service) and ActiveMQ. We will also need Apache Hadoop, an open source implementation of MapReduce used by the Yahoo search engine. Hadoop's "MapReduce" engine allows you to divide the work, dispatch the pieces to different "task engines", and then combine the results afterwards.
Host 2 will run Hadoop and drive the MapReduce process. Plan to have three KVM guests on Host 1, four on Host 2, and three on Host 3, which gives you 10 task engines to work with. These task engines can be deployed as Collection Readers, Analysis Engines, and CAS Consumers. When all processing is done, the resulting votes will be tabulated and the top answer displayed on the Query Panel on Host 1.
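Here is a small Python sketch of the phone-book analogy, using the standard concurrent.futures module: the book is split into 50-page chunks, each chunk is scanned by a separate worker, and the partial results are combined. The phone-book data structure (a list of pages, each a list of entry strings) is a hypothetical stand-in. Note that CPython threads do not execute Python code on multiple cores at once, so for genuinely CPU-bound scanning you would swap in ProcessPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_pages(pages, address):
    """One "friend" scans a chunk of (page_number, entries) pairs and reports matches."""
    return [(page_number, entry)
            for page_number, entries in pages
            for entry in entries
            if address in entry]


def parallel_lookup(phone_book, address, pages_per_worker=50):
    """Divide the book into chunks, scan the chunks concurrently, then combine the results."""
    numbered = list(enumerate(phone_book, start=1))
    chunks = [numbered[i:i + pages_per_worker]
              for i in range(0, len(numbered), pages_per_worker)]
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(scan_pages, chunks, [address] * len(chunks)))
    # Combine step: merge every worker's hits, preserving page order.
    return [hit for hits in partial_results for hit in hits]
```

The divide, dispatch and combine steps here are the same shape as the MapReduce work described next, just on one machine.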
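The map, shuffle and reduce steps can be illustrated in a few lines of plain Python. This is only a sketch of the pattern, not Hadoop's API (with Hadoop you would write Mapper and Reducer classes in Java), and the voting rule, one vote per evidence passage that mentions a candidate, is a deliberately simple hypothetical stand-in for real evidence scoring.

```python
from collections import defaultdict


def map_votes(passages, candidates):
    """Map step (one task engine): emit (candidate, 1) for each passage that mentions a candidate."""
    for passage in passages:
        lowered = passage.lower()
        for candidate in candidates:
            if candidate.lower() in lowered:
                yield candidate, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_votes(grouped):
    """Reduce step: total the votes for each candidate."""
    return {candidate: sum(values) for candidate, values in grouped.items()}


def top_answer(passage_chunks, candidates):
    """Each chunk stands in for the work dispatched to one task engine."""
    mapped = (pair for chunk in passage_chunks for pair in map_votes(chunk, candidates))
    totals = reduce_votes(shuffle(mapped))
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])
```

In the real system each chunk would run on a different KVM guest, and only the small (candidate, count) pairs would travel over the network to be reduced.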
Step 8: Testing
To simplify testing, use a batch processing approach. Rather than entering questions by hand in the Query Panel, generate a long list of questions in a file and submit them for processing. This will allow you to fine-tune the environment, optimize for performance, and validate the answers returned.
There you have it. By the time you get your implementation fully operational, you will have learned a lot of useful skills, including Linux administration, Ethernet networking, NFS file system configuration, Java programming, UIMA text mining analysis, and MapReduce parallel processing. Hopefully, you will also gain an appreciation for how difficult it was for the IBM Research team to accomplish what they did for the Grand Challenge on Jeopardy! Not surprisingly, IBM Watson is making IBM [as sexy to work for as Apple, Google or Facebook], all of which started their business in a garage or a basement with a system as small as this version for personal use.
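A batch run needs little more than a loop. This Python sketch assumes a hypothetical CSV file with one question and expected answer per line, and a hypothetical answer_fn that wraps your question-answering pipeline; it records each answer, whether it matched, and how long it took, so you can track both accuracy and performance as you tune.

```python
import csv
import time


def load_questions(path):
    """Read (question, expected_answer) pairs from a two-column CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row[0], row[1]) for row in csv.reader(f) if len(row) >= 2]


def run_batch(pairs, answer_fn):
    """Submit each question to answer_fn; return per-question results and overall accuracy."""
    results = []
    for question, expected in pairs:
        start = time.perf_counter()
        answer = answer_fn(question)
        elapsed = time.perf_counter() - start
        # Case-insensitive exact match; a missing answer (None) counts as wrong.
        correct = (answer or "").strip().lower() == expected.strip().lower()
        results.append((question, expected, answer, correct, elapsed))
    accuracy = sum(r[3] for r in results) / len(results) if results else 0.0
    return results, accuracy
```

Exact matching is deliberately strict; you may want to loosen it (for example, accepting aliases) once the pipeline starts returning reasonable answers.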
The IBM Challenge was a big success. One of the contestants, Ken Jennings, [welcomes our new computer overlords]. Congratulations are in order to the IBM Research team who pulled off this Herculean effort!
Some folks have poked fun at some of the odd responses and wager amounts from the IBM Watson computer during the three-day tournament. Others were as surprised as I was that the impressive feat was done with less than 1TB of stored data. Here is what John Webster wrote in CNET yesterday, in his article [What IBM's Watson says to storage systems developers]:
"All well and good. But here's what I find most interesting as a result of what IBM has done in response to the Grand Challenge that motivated Watson's creators. We know, from Tony Pearson's blog, that the foundation of Watson's data storage system is a modified IBM SONAS cluster with a total of 21.6TB of raw capacity. But Pearson also reveals another very significant, and to me, surprising data point: "When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, the actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1 Terabyte."
What Pearson just said is that the data set Watson actually uses to reach his push-the-button decision would fit on a 1TB drive. So much for big data?"
To better appreciate how difficult the challenge was, and how a small amount of data can answer a billion different questions, I thought I would cover Business Intelligence, Data Retrieval and Text Mining concepts.
"In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."
Ideally, when you need "Business Intelligence" to help you make a better decision, you perform data retrieval from a structured database for the specific information you are looking for. In other cases, you might be looking for insight, patterns or trends. In that case, you go "data mining" against your structured databases.
Here's a simple example. John runs a fruit stand. One day, he kept track of how many apples and oranges were bought by men and women. How many questions can we ask against this small set of data? Let's count them:
How many apples were sold to men?
How many apples were sold to women?
How many oranges were sold to men?
How many oranges were sold to women?
But wait! For each row and column, we can combine them into totals.
How many apples were sold in total?
How many oranges were sold in total?
How many fruit in total were sold to men?
How many fruit in total were sold to women?
How many fruit in total were sold?
But wait, there's more! Each row and column can be evaluated for relative percentages, as well as percentages of each cell compared to the total. You could make five relevant pie charts from this data. This results in 16 more questions, such as:
Of the fruit purchased by men, what percentage for apples?
Of all the apples purchased, what percentage by women?
And that's not including more ethereal questions, such as:
Are there gender-specific preferences for different types of fruit?
What type of fruit do men prefer?
This is just for a small set: two market segments (by gender) and two products (apples and oranges). However, if you have many market segments (perhaps by age group, zip code, etc.) and many products, the number of queries that can be supported is huge. For small sets of data, you can easily do this with a spreadsheet program like IBM Lotus Symphony or Microsoft Excel.
But why limit yourself to two dimensions? The above example was just for one day's worth of activity. If John captures this data every day for historical and seasonal trending, it can be represented as a three-dimensional cube, and the number of queries becomes astronomical. This is the basis for Online Analytical Processing (OLAP), and such multi-dimensional tables are often referred to as [OLAP cubes].
Back in the 1970s, IBM invented the Structured Query Language [SQL], and today nearly all modern relational databases support it, including IBM DB2, Informix, Microsoft SQL Server, and Oracle DB. SQL poses two challenges. First, you had to structure the data in advance according to the way you expected to perform your ad-hoc queries. Deciding the groups and categories in advance can limit the way information is recorded and captured.
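To see how quickly the questions multiply, here is a short Python sketch. The original counts are not reproduced here, so the four sales figures below are hypothetical. The total() helper answers the cell and total questions, and the loops generate all 16 percentage questions described above.

```python
# Hypothetical counts for one day at John's fruit stand.
SALES = {
    ("men", "apples"): 20,
    ("men", "oranges"): 10,
    ("women", "apples"): 15,
    ("women", "oranges"): 25,
}
GENDERS = ["men", "women"]
FRUITS = ["apples", "oranges"]


def total(gender=None, fruit=None):
    """Sum the cells matching gender and/or fruit; None means 'any'."""
    return sum(count for (g, f), count in SALES.items()
               if gender in (None, g) and fruit in (None, f))


# The 16 percentage questions: 12 per-cell ratios plus 4 total-to-grand-total ratios.
questions = []
for g in GENDERS:
    for f in FRUITS:
        questions.append((f"Of the fruit bought by {g}, what share was {f}?", total(g, f) / total(gender=g)))
        questions.append((f"Of all the {f}, what share was bought by {g}?", total(g, f) / total(fruit=f)))
        questions.append((f"What share of all fruit was {f} bought by {g}?", total(g, f) / total()))
for g in GENDERS:
    questions.append((f"What share of all fruit was bought by {g}?", total(gender=g) / total()))
for f in FRUITS:
    questions.append((f"What share of all fruit was {f}?", total(fruit=f) / total()))

for text, share in questions:
    print(f"{text} {share:.1%}")
```

Add a third segment or product and the same loops generate many more questions from only a few more numbers.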
Second, you had to be skilled at SQL to phrase your queries correctly to retrieve the data you were after. What ended up happening was that skilled SQL programmers would develop "canned reports" with fixed SQL parameters, so that less-skilled business decision makers could base their decisions on these reports.
IBM has fully integrated stacks to help process structured data, combining servers, storage, and advanced analytics software into a complete appliance. IBM offers the [Smart Analytics System] for robust, customized deployments, and recently acquired [Netezza] for pre-configured, more rapid deployments.
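Here is a small example of both points, using Python's built-in sqlite3 module. The sales table, its columns and its rows are hypothetical. Note that the table's structure (day, gender, fruit) has to be decided before any data is loaded, and that the GROUP BY queries are exactly the kind of SQL a programmer might package as canned reports, with the day as the only parameter a business user supplies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The structure must be chosen up front: these columns are the only ways to slice the data later.
conn.execute("CREATE TABLE sales (day TEXT, gender TEXT, fruit TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2011-02-14", "men", "apples", 20),
        ("2011-02-14", "men", "oranges", 10),
        ("2011-02-14", "women", "apples", 15),
        ("2011-02-14", "women", "oranges", 25),
        ("2011-02-15", "men", "apples", 12),
        ("2011-02-15", "women", "oranges", 18),
    ],
)

# A "canned report": fruit sold by gender across all days.
BY_GENDER_AND_FRUIT = """
    SELECT gender, fruit, SUM(quantity) AS total
    FROM sales
    GROUP BY gender, fruit
    ORDER BY gender, fruit
"""

# A parameterized report: total sold per fruit on one chosen day.
FRUIT_ON_DAY = """
    SELECT fruit, SUM(quantity) FROM sales WHERE day = ? GROUP BY fruit ORDER BY fruit
"""

for row in conn.execute(BY_GENDER_AND_FRUIT):
    print(row)
print(conn.execute(FRUIT_ON_DAY, ("2011-02-15",)).fetchall())
```

The day column is what turns John's two-dimensional table into the cube described earlier; a question that needs a dimension nobody recorded, such as customer age, simply cannot be asked.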
However, the bigger problem is that more than 80 percent of information is not structured!
Semi-structured data like email provides some searchable fields like From and Subject. The rest of the information is unstructured, such as text files, photographs, video and audio. Looking for specific information in unstructured sources can be like looking for a needle in a haystack, and gaining insight into patterns or trends requires text mining.
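A common first step toward finding the needle is an inverted index, which maps each word to the documents containing it. The Python sketch below, with three hypothetical documents, shows only simple keyword retrieval; text mining of the kind UIMA supports goes much further, extracting meaning and relationships rather than matching words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(documents):
    """Inverted index: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Three hypothetical documents standing in for a haystack of email and notes.
DOCS = {
    "email-1": "Subject: storage budget meeting moved to Friday",
    "note-7": "Watson played Jeopardy! against Ken and Brad",
    "memo-3": "The storage budget was approved",
}
INDEX = build_index(DOCS)
print(sorted(search(INDEX, "storage budget")))
```

An index answers "which documents mention these words?" quickly, but deciding what a document actually means is where the harder text mining begins.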
This, in effect, is what IBM Watson was able to perform so well this week. Finding the needle in the haystacks of unstructured data from 200 million pages of text stored in its system, combined with the ability to apprehend the interrelationships of meaning and subtle nuance, resulted in an impressive technology demonstration. Certainly, this new technology will be powerful for a variety of use cases across a broad set of industries!
Full VMware vStorage API for Array Integration (VAAI). Back in 2008, VMware announced new vStorage APIs for its vSphere ESX hypervisor: vStorage API for Site Recovery Manager, vStorage API for Data Protection, and vStorage API for Multipathing. Last July, VMware added a new API called vStorage API for Array Integration [VAAI], which offers three primitives:
Hardware-assisted block zeroing. Sometimes referred to as "Write Same", this SCSI command zeroes out a large section of blocks, presumably as part of a VMDK file. This can then be used to reclaim space on the XIV on thin-provisioned LUNs.
Hardware-assisted Copy. Make an XIV snapshot of data without any I/O on the server hardware.
Hardware-assisted locking. On mainframes, this is called Parallel Access Volumes (PAV). Instead of locking an entire LUN using standard SCSI reserve commands, this primitive allows an ESX host to lock just an individual block so as not to interfere with other hosts accessing other blocks on that same LUN.
Quality of Service (QoS) Performance Classes.
When XIV was first released, it treated all hosts and all data the same, even when deployed for a variety of different applications. This worked for some clients, such as [Medicare y Mucho Más]. They migrated their databases, file servers and email system from EMC CLARiiON to an IBM XIV Storage System. In conjunction with VMware, the XIV provides a highly flexible and scalable virtualized architecture, which enhances the company's business agility.
However, other clients were skeptical, and felt they needed additional "knobs" to prioritize different workloads. The new 10.2.4 microcode allows you to define four different "performance classes". This is like the door of a nightclub. All the regular people are waiting in a long line, but when a celebrity in a limo arrives, the bouncer unclips the cord, and lets the celebrity in. For each class, you provide IOPS and/or MB/sec targets, and the XIV manages to those goals. Performance classes are assigned to each host based on its value to the business.
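The XIV's internal algorithm is not described here, but one common way to enforce per-class IOPS limits is a token bucket. The Python sketch below is that generic technique, not XIV code: each performance class gets a bucket that refills at its IOPS rate, and a host's I/O is admitted only while its class's bucket has tokens. The class names, limits and host assignments are hypothetical.

```python
import time


class TokenBucket:
    """Generic token bucket: admits up to `rate` operations per second, bursting up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill for the time elapsed since the last call, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical performance classes (IOPS limits) and host assignments.
CLASS_IOPS = {"gold": 20000, "silver": 10000, "bronze": 2000}
HOST_CLASS = {"db-server": "gold", "file-server": "silver", "test-vm": "bronze"}
BUCKETS = {name: TokenBucket(rate=iops, burst=iops) for name, iops in CLASS_IOPS.items()}


def admit(host):
    """Admit one I/O from host if its performance class still has capacity right now."""
    return BUCKETS[HOST_CLASS[host]].allow()
```

Capping the lower classes is what leaves headroom for the "celebrity" workloads; the injectable clock simply makes the bucket easy to test.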
Offline Initialization for Asynchronous Mirror.
Internally, we called this Truck Mode. Normally, when a customer decides to start using Asynchronous Mirror, they already have a lot of data at the primary location, and so there is a lot of data to send over to the new XIV box at the secondary location. This new feature allows the data to be dumped to tape at the primary location. Those tapes are shipped to the secondary location and restored on the empty XIV. The two XIV boxes are then connected for Asynchronous Mirroring, and checksums of each 64KB block are compared to determine what has changed at the primary during this "tape delivery time". This greatly reduces the time it takes for the two boxes to get past the initial synchronization phase.
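The comparison step can be sketched in a few lines of Python. This illustrates the idea only: SHA-256 is my own choice because the XIV's actual checksum algorithm is not stated here, both volume images sit in memory, and a real system would exchange checksums between sites rather than compare them in one process.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64KB blocks, as described above


def block_checksums(data: bytes, block_size=BLOCK_SIZE):
    """Checksum each fixed-size block of a volume image."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(primary: bytes, secondary: bytes, block_size=BLOCK_SIZE):
    """Indexes of blocks that differ, i.e. the only blocks that must cross the link."""
    p = block_checksums(primary, block_size)
    s = block_checksums(secondary, block_size)
    common = min(len(p), len(s))
    changed = [i for i in range(common) if p[i] != s[i]]
    # Blocks present on only one side (for example, the primary grew) also differ.
    changed.extend(range(common, max(len(p), len(s))))
    return changed
```

Only a 32-byte checksum per 64KB block has to be compared, which is why this is so much cheaper than sending the whole volume over the wire.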
IP-based Replication. When IBM first launched the Storwize V7000 last October, people commented that the one feature they felt was missing was IP-based replication. Sure, we offered FCP-based replication, as most other enterprise-class disk systems do today, but many midrange systems also offer IP-based replication to reduce the need for expensive FCIP routers. [IBM Tivoli Storage FastBack for Storwize V7000] provides IP-based replication for Storwize V7000 systems.
Network Attached Storage
IBM announced two new models of the IBM System Storage N series. The midrange N6240 supports up to 600 drives, replacing the N6040 system. The entry-level N6210 supports up to 240 drives, and replaces the N3600 system. Details for both are available on the latest [data sheet].
IBM Real-Time Compression appliances work with all N series models to provide additional storage efficiency. Last October, I provided the [Product Name Decoder Ring] for the STN6500 and STN6800 models. The STN6500 supports 1GbE ports, and the STN6800 supports 10GbE ports (or a mix of 10GbE and 1GbE, if you prefer). The IBM versions of these models were announced last December, but some people were on vacation and might have missed it. For more details, read the [Resources page], the [landing page], or [watch this video].
IBM System Storage DS3000 series
IBM System Storage [DS3524 Express DC and EXP3524 Express DC] models are powered with direct current (DC) rather than alternating current (AC). The DS3524 packs dual controllers and two dozen small-form factor (2.5 inch) drives in a compact 2U-high rack-optimized module. The EXP3524 provides additional disk capacity that can be attached to the DS3524 for expansion.
Large data centers, especially those in the telecommunications industry, receive AC from their power company, convert it to DC, and store it in large batteries as part of an Uninterruptible Power Supply (UPS). DC-powered equipment can run directly off this battery source, but for AC-powered equipment, the DC has to be converted back to AC, and some energy is lost in the conversion. Thus, DC-powered equipment is more energy efficient, or "green", for the IT data center.
Whether you get the DC-powered or AC-powered models, both are NEBS-compliant and ETSI-compliant.
New Tape Drive Options for Autoloaders and Libraries
IBM System Storage [TS2900 Autoloader] is a compact 1U-high tape system that supports one LTO drive and up to 9 tape cartridges. The TS2900 can support either an LTO-3, LTO-4 or LTO-5 half-height drive.
IBM System Storage [TS3100 and TS3200 Tape Libraries] were also enhanced. The TS3100 can accommodate one full-height LTO drive, or two half-height drives, and hold up to 24 cartridges. The TS3200 offers twice as many drives and space for cartridges.
The Tucson Executive Briefing Center hosted 20 dignitaries from local companies and academia.
This is a historic competition, an exhibition match pitting a computer against the top two celebrated Jeopardy champions:
Brad Rutter, who won $3.2 million USD on Jeopardy!, winning 5 days on the show and then three later tournaments.
Ken Jennings, who won $2.5 million during a 74-day winning streak on Jeopardy!
One of the members of the audience had never seen an episode of Jeopardy! in his life.
(Note: there are NO SPOILERS in this blog post. If you have not yet watched the show, you are safe to continue reading the rest of this post. I will not disclose the correct responses to any of the clues or how well each contestant scored.)
Calline Sanchez, IBM Director, Systems Storage Development for Data Protection and Retention, kicked off today's ceremonies.
The IBM Watson computer, named after IBM founder Thomas J. Watson, has been developed over the past 4 years by a team of IBM scientists who set out to accomplish a grand challenge - build a computing system that rivals a human's ability to answer questions posed in natural language with speed, accuracy and confidence. IBM Research labs in the United States, Japan, China and Israel [collaborated with Artificial Intelligence (AI) experts at eight universities], including Massachusetts Institute of Technology (MIT), University of Texas (UT) at Austin, University of Southern California (USC), Rensselaer Polytechnic Institute (RPI), University at Albany (UAlbany), University of Trento (Italy), University of Massachusetts Amherst, and Carnegie Mellon University.
(Disclaimer: I attended the University of Texas at Austin. My father attended Carnegie Mellon University.)
Last week, NOVA on PBS had a special episode on the making of IBM Watson; you can [watch it online] on their website. Delaney Turner, IBM Social Media Communications Manager for Business Analytics Software, has posted [his observations of Nova].
Since IBM Watson is the size of 10 refrigerators and weighs over 14,000 pounds, it was easier to design the Jeopardy! set at the TJ Watson Research lab in Yorktown Heights, NY, than to ship it over to California where the show is normally recorded. Two of the visual designers that worked on this set, as well as on the visual appearance of Watson, live in Tucson and were part of our audience today.
The IBM Challenge consists of a two-game tournament, where the scores of both games will be added to determine winner rankings. The producers of Jeopardy! will give $1 million USD to first place, $300,000 to second place, and $200,000 to third place. Regardless of outcome, [IBM will donate all of its winnings to charity]. The two human contestants plan to donate half of their earnings to their favorite charities as well.
Jeopardy! The IBM Challenge
Alex Trebek introduces IBM Watson, explaining that it can neither hear nor see. It will receive all information electronically. Categories and clues will be sent as text files via TCP/IP over Ethernet at the same time the two human contestants see them so that all have the same time to think about the right answer.
Watson has two rows of five racks, back to back. This was done so that cold air could rise up from holes in the tile floors around the unit, and all the hot air would be forced into the center and up to the ceiling return. This technique is known as "hot aisle/cold aisle" design. Alex Trebek opens one of the rack doors to show a series of 4U-high IBM Power 750 servers.
The avatar is a representation of Watson, as the machine itself is too big to fit behind the podium. The avatar is IBM's "Smarter Planet" logo with orbiting streaks and circles. It turns green when it has high confidence, and orange when it gets an answer wrong. When busy thinking, the streaks and circles speed up, the closest we will see to "watching a computer sweat."
During the show, an "Answer panel" shows Watson's top three candidate responses, with confidence level compared to its current "buzz threshold".
Watson knows what it knows, and knows what it doesn't know. Here is an [Interactive Watson Game] on the New York Times website to give you an idea of how the answer panel works. I was impressed with how close all three candidate answers were. In a question about Olympic swimmers, all three candidates were Olympic swimmers. In a question about the novel "Les Miserables", all three candidates were characters in that novel.
Well, IBM Watson did well, but answered some questions incorrectly. This [parody Slate video] pokes fun at this. Here are some of the points we discussed after the show ended:
Watson did not do well in categories that required [abductive reasoning]. For example, it is difficult to identify two or three things that happened in different years, and then postulate that what they all have in common is a specific decade (such as the 1950s).
Watson does not hear the wrong answers from the two human contestants. For one question, Ken buzzes in first, guesses wrong, then Watson buzzes in with the exact same response. Alex Trebek rebukes Watson with "No, Ken just said that!" Brad learns from their mistakes and responds correctly for the score.
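The logic of the answer panel can be sketched in a few lines of Python: rank the candidates by confidence, show the top three, and buzz only if the best one clears the threshold. This is an illustration of the idea, not Watson's actual decision code; the candidate scores and threshold values are made up.

```python
def decide(candidates, buzz_threshold):
    """Rank candidate responses by confidence; buzz only if the best clears the threshold.

    candidates: list of (response, confidence) pairs, with confidence between 0 and 1.
    Returns (top_three, response_or_None).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    top_three = ranked[:3]
    if top_three and top_three[0][1] >= buzz_threshold:
        return top_three, top_three[0][0]
    return top_three, None
```

"Knowing what it doesn't know" amounts to the None branch: when even the best candidate falls below the threshold, the machine stays silent.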
Watson is provided the correct answer after a contestant guesses it correctly, or if nobody does, when Alex provides the correct response. This is sent as a text message to Watson immediately, so that it can use this information to adjust its algorithms and machine-learning for future clues in that same category. This was evident in the "Answer panel" on the fourth and fifth attempts on the category of "Decades".
With this demonstration, IBM Research has advanced science by leaps and bounds for the Artificial Intelligence community. IBM is a leader in Business Analytics, and this technology will find uses in a variety of industries. The average knowledge worker spends 30 percent of her time looking for information on corporate data repositories. By demonstrating a computer that can provide answers quickly, employees will be more productive, make stronger business decisions, and have greater insight.
Day 1 was only able to cover the first round of Game 1. This allowed more time to talk about the history and technology of IBM Watson. Tomorrow, the contestants will finish Game 1 and head into Game 2.
"When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, the actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1 Terabyte (TB). For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained, Watson is NOT going to the internet searching for answers."
I had several readers ask me to explain the significance of the "Terabyte". I'll work my way up.
A bit is simply a zero (0) or one (1). This could answer a Yes/No or True/False question.
Most computers have standardized a byte as a collection of 8 bits. There are 256 unique combinations of ones and zeros possible, so a byte could be used to store a 2-digit integer, or a single upper or lower case character in the English alphabet. In practical terms, a byte could store your age in years, or your middle initial.
The Kilobyte is a thousand bytes, enough to hold a few paragraphs of text. A typical written page could be held in 4 KB, for example.
The IBM Challenge to play on Jeopardy! is being compared to the historic 1969 moon landing. To land on the moon, Apollo 11 had the "Apollo Guidance Computer" (AGC) which had 74KB of fixed read-only memory, and 2KB of re-writeable memory. Over [3500 IBM employees were involved] to get the astronauts to the moon and safely back to earth again.
The importance of this computer was highlighted in a [lecture by astronaut David Scott] who said: "If you have a basketball and a baseball 14 feet apart, where the baseball represents the moon and the basketball represents the Earth, and you take a piece of paper sideways, the thinness of the paper would be the corridor you have to hit when you come back."
The Megabyte is a thousand KB, or a million bytes. The 3.5-inch floppy diskette, mentioned in my post [A Boxfull of Floppies] could hold 1.44MB, or about 360 pages of text.
In the article [Wikipedia as a printed book], the printing of a select 400 articles resulted in a book 29 inches thick. Those 5,000 pages would consume about 20 MB of space.
One of my favorite resources I use to search is the Internet Movie Data Base [IMDB]. Leaving out the photos and videos, the [text-only portion of the IMDB database is just over 600 MB], representing nearly all of the actors, awards, nominations, television shows and movies. A standard CD-ROM can hold 700MB, so the text portion of the IMDB could easily fit on a single CD.
The Gigabyte is a thousand MB, or a billion bytes. My ThinkPad T410 laptop has 4GB of RAM and 320GB of hard disk space. My laptop comes with a DVD burner, and each DVD can hold up to 4.7GB of information.
The popular Wikipedia now has some 17 million articles, of which 3.5 million are in English language. It would only take [14GB of space to hold the entire English portion] of Wikipedia. That is small enough to fit on twenty CDs, three DVDs, an Apple iPad or my cellphone (a Samsung Galaxy S Vibrant).
Perhaps you are thinking, "Someone should offer Wikipedia pre-installed on a small handheld!" Too late. [The Humane Reader] is able to offer 5,000 books and Wikipedia in a small device that connects to your television. This would be great for people who do not have access to the internet, or for parents who want their kids to do their homework, but not be online while they are doing it.
In the latest 2009 report of [How Much Information?] from the University of California, San Diego, the average American consumes 34 GB of information per day. This includes all the information from radio, television, newspapers, magazines, books and the internet that a person might look at or listen to throughout the day. This project is sponsored by IBM and others to help people understand the nature of our information-consumption habits.
Back in 1992, I visited a client in Germany. Their 90 GB of disk storage attached to their mainframe was the size of three refrigerators, and took five full-time storage administrators to manage.
The Terabyte is a thousand GB, or a trillion bytes. It is now possible to buy an external USB drive for your laptop or personal computer that holds 1TB or more. However, at the 40MB/sec speed that USB 2.0 is capable of, it would take about seven hours to do a bulk transfer of 1TB in or out of the device.
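The seven-hour figure is just size divided by rate. A quick Python sketch, assuming a constant sustained 40MB/sec with no protocol or file-system overhead (real transfers rarely hold that rate the whole way):

```python
def transfer_hours(size_bytes: int, rate_bytes_per_sec: int) -> float:
    """Time for a bulk copy at a constant sustained rate, in hours."""
    return size_bytes / rate_bytes_per_sec / 3600

ONE_TB = 10**12              # decimal terabyte
USB2_RATE = 40 * 10**6       # 40MB/sec, the rate quoted above

hours = transfer_hours(ONE_TB, USB2_RATE)
print(f"{hours:.1f} hours")  # 25,000 seconds, about 6.9 hours
```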
IBM offers 1TB and 2TB disk drives in many of our disk systems. In 2008, IBM was preparing to announce the first 1TB tape drive. However, Sun Microsystems announced their own 1TB drive the day before our big announcement, so IBM had to rephrase the TS1130 announcement to [The World's Fastest 1TB tape drive!]
A typical academic research library will hold about 2TB of information. The [US Library of Congress] print collection is considered to be about 10TB, and their web capture team has collected 160TB of digital data. If you are ever in Washington DC, I strongly recommend a visit to the Library of Congress. It is truly stunning!
Full-length computer animated movies, like [Happy Feet], consume about 100TB of disk storage during production. IBM offers disk systems that can hold this much data. For example, the IBM XIV can hold up to 151 TB of usable disk space in a footprint the size of one refrigerator.
A Key Performance Indicator (KPI) for some larger companies is the number of TB that can be managed by a full-time employee, referred to as TB/FTE. Discussions about TB/FTE are available from IT analysts including [Forrester Research] and [The Info Pro].
The website [Ancestry.com] claims to have over 540 million names in its genealogical database, taking up 600TB of storage, including [US census data from 1790 to 1930]. The US government took nine years to process the 1880 census, so for the 1890 census, it rented equipment from Herman Hollerith's Tabulating Machine Company. This company would later merge with two others in 1911 to form what is now called IBM.
A Petabyte is a thousand TB, or a quadrillion bytes. It is estimated that all printed materials on Earth would represent approximately 200 PB of information.
IBM's largest disk system, the Scale-Out Network Attached Storage (SONAS), comprises up to 7,200 disk drives and can hold over 11 PB of information. A smaller 10-frame model, the same size as IBM Watson, with six interface nodes and 19 storage pods, could hold over 7 PB of information.
For those of us in the IT industry, 1TB is small potatoes. I, for one, was expecting it to be much bigger. But for everyone else, the equivalent of 200 million pages of text that IBM Watson has loaded inside is an incredibly large repository of information. I suspect IBM Watson probably contains the complete works of Shakespeare as well as other fiction writers, the IMDB database, all 3.5 million articles of Wikipedia, religious texts like the Bible and the Quran, famous documents like the Magna Carta and the US Constitution, and reference books like a Dictionary, a Thesaurus, and "Gray's Anatomy". And, of course, lots and lots of lists.
For those on Twitter, follow [@ibmwatson] these next three days during the challenge.
We are only days away from the big IBM Challenge of Watson computer against two human contestants on the show Jeopardy!
I watched two episodes of Jeopardy! on my TiVo, pausing it to follow the [homework assignment] I suggested in my last post. Here are my own results and observations.
Episode  involved a web programmer, a customer service representative, and a bank teller.
Of the first six categories in Round 1, I correctly guessed the theme for four. For the category "Diamonds are Forever", I wrote down "All answers are some kind of gem or mineral", but in reality all the answers were physical characteristics of diamonds specifically. For the category "...Fame is not", I wrote down "All answers are TV or Movie celebrities". I was close, but the answers actually covered celebrities, rock bands and pop culture of the 1980s. (The movie "Fame" came out in 1980.)
In this round, 27 of the 30 answers were revealed before time ran out. Of these, I was able to get 24 correct by searching the Internet, about 89 percent. Here are the ones that eluded me:
One answer related to a "multi-chambered mollusk". I could not find anything definitive on the Internet, so I abstained from wagering. The correct question was "What is Nautilus?".
Another answer was the Irish variant of "Kathryne". I found Kathleen as a variant, but did not investigate whether it had Irish origins. The correct question was "What is Caitlin?"
Another answer was this Norse name for "ruler", whether you had red hair or not. I found "Roy" and "Rory", so I guessed "What is Rory?" The correct question was "What is Eric?"
In the second round, I correctly guessed the themes for three of the six categories. For the category "Musical Titles Letter Drop", I wrote down "All the answers are titles of musical songs", but it actually meant "Musicals", as in Broadway shows. For the category "Place called Carson", I wrote down "All the answers are places" and was way off, with answers that were people, places and names of corporations. And for "State University Alums", I wrote down "All the answers are college graduates", but instead the responses were all "State Universities", such as the University of Arizona.
In this second round, only 26 answers were posed. I got about 80 percent correct with Internet searching. I missed three in "Musical Titles Letter Drop", one in "Pope-pourri" and one in "State University Alums" (sorry, SMU). The "Musical Titles Letter Drop" category was especially difficult, because for each musical's title, you had to remove a single letter to form the correct response.
For the answer "Good luck when you ask the singers "What I Did For Love"; they never tell the truth", you would need to think of the musical "Chorus Line", in which the song "What I Did for Love" appears, and ask "What is Chorus Lie?" Dropping the letter "n" turns "line" into "lie".
For the answer "Embrace the atoms as Simba and company lose and gain electrons en masse in this production", you would need to recognize that Simba is the main character of "The Lion King" and drop the "L" to ask "What is The Ion King?"
I think these plays on words are the kind of clues that would stump the IBM Watson computer.
In the final round, the category was "Ancient Quotes". I thought the responses would be famous adages or quotations, but they were instead the famous people who uttered those phrases. The answer was "He said, to leave this stream uncrossed will breed manifold distress for me; to cross it, for all mankind". I was able to determine the correct response readily by searching the Internet: the river was the Rubicon, the border of the Gaul region governed by an ambitious general. The correct response was "Who was Julius Caesar?"
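Generating the candidate responses for this category is actually the easy part. A hypothetical Python helper that lists every single-letter deletion of a title might look like the sketch below; the hard part, which this sketch does not attempt, is recognizing which candidate fits the pun in the clue.

```python
def letter_drops(title: str) -> set[str]:
    """Return every string formed by deleting exactly one letter from title.

    Case is folded to lowercase, so "The Lion King" minus "L" matches
    "the ion king". Spaces and punctuation are never dropped.
    """
    title = title.lower()
    return {title[:i] + title[i + 1:] for i, ch in enumerate(title) if ch.isalpha()}

print("chorus lie" in letter_drops("Chorus Line"))      # True
print("the ion king" in letter_drops("The Lion King"))  # True
```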
Total time for the entire exercise: 87 minutes.
The following night, episode  brought back Paul Wampler, the returning champion web programmer, against two new contestants: an actor and a high school principal.
Of the first six categories in Round 1, I correctly guessed the themes for five. For the category "Nonce Words", I wrote that all the answers would be nonsense words. I was close: the clues contained words invented for a particular occasion, but the correct responses did not.
I was able to get 29 of 30 correct by searching the Internet, about 97 percent. The one I missed was in the category "Nonce Words", and the answer was "In an arithmocracy, this portion of the population rules, not trigonometry teachers." My response was "What is Math?" but the correct response was "What are the majority?" It did not occur to me to even look up [Arithmocracy] as a legitimate word, but it is real.
In the second round, I correctly guessed the themes for five of the six categories. In the category "Hawk" Eyes, the word "Hawk" was in quotation marks, so I wrote "All answers will start with the word hawk or end with the word eyes". I was close: the correct theme was that the word "hawk" would appear at the front, middle or end of the correct response.
In this second round, I got 28 of 30 correct, or 93 percent, with Internet searching. Ironically, it was the category "German Foods" that caught me off guard.
One answer was "Pichelsteiner Fleisch, a favorite of Otto von Bismarck, is this one-pot concoction, made with beef & pork". I knew that "Fleisch" is the German word for meat, so I guessed "What is sausage?" but the correct response was "What is stew?" I should have paid more attention to the "one-pot concoction" part of the answer.
Another answer was "Mimi Sheraton says German stuffed hard-boiled eggs are always made with a great deal of this creamy product". I didn't realize that German "stuffed eggs" are what Americans call "deviled eggs". Instead, I found Mimi Sheraton's "The German Cookbook" on Google Books and jumped to the page for "Stuffed Eggs". The ingredients I read included whipped cream, cognac, and Worcestershire sauce. Taking the "creamiest" of these, I wrote down "What is whipped cream?" However, it turned out I was actually reading the ingredients for "Crabmeat Cocktail", continuing from the previous page. I thought putting whipped cream with eggs was gross, and should have known better. The correct response was "What is mayonnaise?"
In the final round, the category was "Political Parties". This could refer either to political organizations like the Republicans and Democrats, or to festivities like the White House Correspondents' Dinner. The answer was "Only one U.S. president represented this party, and he said, I dread...a division of the republic into two great parties." So the answer clearly refers to political organizations, and both the Democratic and Republican parties are ruled out because each has had multiple presidents. Looking at a [List of Political Parties of each US President], I found that there were four presidents in the Whig party and four in the Democratic-Republican party, but only one in the Federalist party (John Adams) and one in the National Union party (Andrew Johnson). Checking [famous quotes from John Adams] first, I found the quote, it matched, and so I wrote down "What is the Federalist party?". I got it right, as did two of the three contestants. Ironically, the one contestant who got it wrong, the returning champion web programmer, had wagered a small amount, so he still had the most money after the round and won the game overall.
Total time for the entire exercise: 75 minutes. I was able to finish faster because I skipped searching the Internet for the responses I was confident about.
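The elimination step in that reasoning, ruling out any party with more than one president, can be sketched in a few lines of Python. The mapping below is hand-entered and simplified to match the counts described in this post; the Democratic and Republican lists are deliberately partial, since a few names are enough to rule those parties out.

```python
# Simplified, hand-entered mapping of party -> presidents. The two major
# parties are listed only partially; a few names show they had several.
presidents_by_party = {
    "Federalist": ["John Adams"],
    "Democratic-Republican": ["Thomas Jefferson", "James Madison",
                              "James Monroe", "John Quincy Adams"],
    "Whig": ["William Henry Harrison", "John Tyler",
             "Zachary Taylor", "Millard Fillmore"],
    "National Union": ["Andrew Johnson"],
    "Democratic": ["Andrew Jackson", "Martin Van Buren", "James K. Polk"],
    "Republican": ["Abraham Lincoln", "Ulysses S. Grant", "Rutherford B. Hayes"],
}

# Keep only the parties that had exactly one president.
candidates = sorted(party for party, names in presidents_by_party.items()
                    if len(names) == 1)
print(candidates)  # ['Federalist', 'National Union']
```

From there, checking each remaining candidate's famous quotes narrows it down to a single response.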
To find out when Jeopardy is playing in your town, consult the [Interactive Map].
With all the excitement of the [IBM Challenge], where the [IBM Watson computer] will compete against humans on [Jeopardy!], I thought it would be good to provide the following homework exercise to help you appreciate how challenging the game is and the strategies required.
Overview of the game of Jeopardy!
If you are familiar with the show, you can safely skip this section.
Known as "America's Favorite Quiz Show", Jeopardy! pits three contestants against each other. The board is divided into six columns and five rows of answers. Each column is headed by the category for its answers. The rows are ranked from easiest to most difficult, with more difficult answers worth more money.
The contestants take turns. The returning champion selects the first spot on the board by naming the category (column) and dollar amount (row), such as "I will take Animals for 800 dollars!" Contestants must then press a button to "buzz in", be recognized by the host, and respond correctly. If the contestant responds incorrectly, the other two contestants have the opportunity to respond. The contestant with the correct response gets to choose the next answer.
For each turn, the host, Alex Trebek, shows the answer on the board and spends three seconds reading it aloud, giving everyone a chance to come up with a corresponding question. This is perhaps what Jeopardy! is most famous for. In a traditional quiz show, the host asks questions and the contestants answer them. On Jeopardy!, however, the host poses "answers", and the contestants respond in the form of a "question" that best fits the category and answer clues. For example, if the category were "Large Corporations" and the answer were "Sam Palmisano", the contestant would respond "Who is the CEO of IBM Corporation?" Both the categories and the answers are filled with puns, slang and humor to make it more challenging. Often the answer by itself is not a sufficient clue; you have to factor in the category as well to have the complete set of information.
The game is played in three rounds:
In the first round, there are six categories, and the rows are worth $200, $400, $600, $800 and $1000. If you respond correctly to all five answers in a category column, you would win $3000. If you respond correctly to all thirty answers, you would earn $18,000.
In the second round, there are six different categories, and the rows are worth twice as much.
The final round has a single category and a single answer. Each player can decide to wager up to the full amount of their score in this game. This wager is made after they see the category, but before they see the answer.
After the host finishes reading the answer aloud, the buzzers are lit so that the contestants can buzz in. If a contestant responds correctly, he earns the money for that row. If the contestant responds incorrectly, the money is subtracted from his score. If the first contestant fails, the buzzers are re-lit so the other two contestants can buzz in with their responses, learning from previous failed attempts.
To provide added challenge, some of the answers are hidden "Daily Doubles". Instead of the row's dollar amount, the contestant who finds one can wager any amount up to their current score in that game or the largest dollar amount for that round, whichever is higher, based on their confidence in that category. There is one "Daily Double" in the first round, and two in the second round.
In the final round, each contestant wagers an amount up to their total score, based on their confidence in the final category. A common strategy for the leading contestant is to wager a low amount, so that a wrong response still leaves him with a large dollar amount. For example, if the leader has $2000 and second place has $900, the leader can wager only $100, while second place might wager the full $900. If the leader responds incorrectly, he still has $1900, which beats the most second place can reach ($1800), regardless of how well second place does.
Whoever has the most money at the end of all three rounds wins that amount in cash, and gets to return to the show for another game the next day to continue his winning streak. The other two contestants are given consolation prizes and a nominal appearance fee for being on the show, and are never seen again.
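The maximum base winnings per round are simple sums. A minimal Python sketch, ignoring Daily Doubles and assuming every answer on the board is revealed and answered correctly:

```python
ROUND1_VALUES = [200, 400, 600, 800, 1000]
ROUND2_VALUES = [2 * v for v in ROUND1_VALUES]   # second round doubles every row
CATEGORIES = 6

def round_maximum(row_values: list[int], categories: int = CATEGORIES) -> tuple[int, int]:
    """Return (best score for one category column, best score for the whole round)."""
    column_total = sum(row_values)
    return column_total, column_total * categories

print(round_maximum(ROUND1_VALUES))  # (3000, 18000)
print(round_maximum(ROUND2_VALUES))  # (6000, 36000)
```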
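The Daily Double cap reduces to a single `max()`. A sketch, assuming the top row is worth $1000 in the first round and $2000 in the second; any minimum wager the show may enforce is not modeled here.

```python
def max_daily_double_wager(current_score: int, top_row_value: int) -> int:
    """Largest allowed Daily Double wager: the current score or the
    round's top row value, whichever is higher."""
    return max(current_score, top_row_value)

print(max_daily_double_wager(600, 1000))   # 1000: first round, small score
print(max_daily_double_wager(5000, 2000))  # 5000: second round, big score
```

Note that a contestant with a negative score can still wager up to the top row value.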
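The leader's "safe" wager can be worked out from the fact that second place can at most double their score. A simplified Python sketch: it only considers the second-place contestant (third place can reach even less), treats a tie as not a win, and assumes whole-dollar wagers.

```python
from typing import Optional

def max_safe_final_wager(leader: int, second: int) -> Optional[int]:
    """Largest final-round wager that still guarantees the leader finishes ahead.

    The leader must stay strictly ahead even after losing the wager:
        leader - wager > 2 * second  =>  wager <= leader - 2 * second - 1
    Returns None when no wager can guarantee a win.
    """
    if leader <= 2 * second:
        return None
    return leader - 2 * second - 1

print(max_safe_final_wager(2000, 900))   # 199, so a $100 wager is safe
print(max_safe_final_wager(2000, 1200))  # None: second place could pass the leader
```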
The show is only 30 minutes long, so the folks at Sony Pictures who produce it can film a full week's worth of shows in just two real-life days, Tuesday and Wednesday, leaving host Alex Trebek and his "Clue Crew" time to research new categories and answers.
So, here is your homework assignment. Record a full episode of Jeopardy on your VCR or Digital Video Recorder (DVR) and have your thumb ready to press the pause button. For each round, listen to each category, pause, and try to guess what all the answers in that column will have in common. For each category, write down a statement like "All the responses in this category are ...".
The answers could be people, places or things. Suppose the category is "Chicks Dig Me". In English, "chicks" can be slang for women, or refer to young chickens. The term "dig" can be slang for admire or adore, so the category could be "Male Celebrities" that women find attractive, objects of desire that women fancy (diamonds, puppies, etc.), or places that women like to go to. As it turns out, the "dig" referred to archaeology, and the responses were all famous female archaeologists.
Once you have all your statements written down, press the play button again.
Next, as each answer is shown, you have three seconds to hit pause again, so that the answer is on the screen but no contestant has responded yet. Go to your favorite search engine, like Google or Bing, and try to determine the correct response based on the category and answer. Consider these [tips for being an Internet Search ninja]. Once you think you have figured out your response, write it down along with the dollar amount you wager, or, if you are not sure about your findings, decide not to respond to that answer.
Even if you think you already know the correct response, you may want to gain more confidence in it by finding confirming or supporting evidence on the Internet.
Press play. Either one of the contestants will get it right, or the host will provide the question that was expected as the correct response.
How well did you do? Were you able to find the correct response online, or at least confirm that what you knew was correct? If you got it right, add the dollar amount to your score. If you got it wrong, subtract the amount.
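Keeping score for this exercise is a simple running tally. A minimal Python sketch, assuming you record each answer as a dollar amount plus an outcome; the function name and the sample data are hypothetical.

```python
from typing import Optional

def tally(results: list[tuple[int, Optional[bool]]]) -> int:
    """Running score for the homework exercise.

    Each entry is (dollar amount, outcome): True adds the amount, False
    subtracts it, and None means you chose not to respond, which leaves
    the score unchanged.
    """
    score = 0
    for amount, correct in results:
        if correct is None:
            continue
        score += amount if correct else -amount
    return score

round_one = [(200, True), (400, False), (600, None), (800, True)]
print(tally(round_one))  # 200 - 400 + 800 = 600
```

Recording abstentions as `None`, rather than leaving them out, also lets you count afterward how many answers you chose to skip.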
At the end of each round, look back at your statements for each category. Did you guess correctly the common theme for each category column of answers? Did you misinterpret the slang, pun or humor intended?
At the end of the game, you might have done better than the contestant who won the game. However, check how much added time you took to do those Internet searches. The average winner responds to only half of the answers, and gets only 80 percent of those correct.
If you are really brave, take the [Jeopardy Online Test]. If you do this homework assignment, feel free to post your insights in the comments below.
Tonight PBS plans to air Season 38, Episode 6 of NOVA, titled [Smartest Machine On Earth]. Here is an excerpt from the station listing:
"What's so special about human intelligence and will scientists ever build a computer that rivals the flexibility and power of a human brain? In "Artificial Intelligence," NOVA takes viewers inside an IBM lab where a crack team has been working for nearly three years to perfect a machine that can answer any question. The scientists hope their machine will be able to beat expert contestants in one of the USA's most challenging TV quiz shows -- Jeopardy, which has entertained viewers for over four decades. "Artificial Intelligence" presents the exclusive inside story of how the IBM team developed the world's smartest computer from scratch. Now they're racing to finish it for a special Jeopardy airdate in February 2011. They've built an exact replica of the studio at its research lab near New York and invited past champions to compete against the machine, a big black box code -- named Watson after IBM's founder, Thomas J. Watson. But will Watson be able to beat out its human competition?"
Like most supercomputers, Watson runs the Linux operating system. The system has 2,880 cores (90 IBM Power 750 servers, four sockets each, eight cores per socket) to achieve 80 [TeraFlops]. TeraFlops is a unit of measure for supercomputers, representing a trillion floating point operations per second. By comparison, Hans Moravec, principal research scientist at the Robotics Institute of Carnegie Mellon University (CMU), estimates that the [human brain is about 100 TeraFlops]. So, in the three seconds that Watson gets to calculate its response, it could process 240 trillion operations.
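The figures in that paragraph check out with simple arithmetic. A Python sketch, assuming Watson sustains its 80 TeraFlops peak for the full three seconds (real workloads rarely run at peak):

```python
SERVERS = 90
SOCKETS_PER_SERVER = 4
CORES_PER_SOCKET = 8
WATSON_TERAFLOPS = 80   # trillion floating point operations per second
BRAIN_TERAFLOPS = 100   # Moravec's estimate, as cited above
RESPONSE_SECONDS = 3

cores = SERVERS * SOCKETS_PER_SERVER * CORES_PER_SOCKET
trillion_ops = WATSON_TERAFLOPS * RESPONSE_SECONDS
print(cores)                               # 2880
print(trillion_ops)                        # 240 (trillion operations)
print(WATSON_TERAFLOPS / BRAIN_TERAFLOPS)  # 0.8
```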
Several readers of my blog have asked for details on the storage aspects of Watson. Basically, it is a modified version of IBM Scale-Out NAS [SONAS] that IBM offers commercially, but running Linux on POWER instead of Linux-x86. System p expansion drawers of SAS 15K RPM 450GB drives, 12 drives each, are dual-connected to two storage nodes, for a total of 21.6TB of raw disk capacity. The storage nodes use IBM's General Parallel File System (GPFS) to provide clustered NFS access to the rest of the system. Each Power 750 has minimal internal storage mostly to hold the Linux operating system and programs.
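The post does not state how many drawers Watson uses, but working backward from the raw capacity gives a plausible count. A Python sketch, assuming decimal units and that every drawer is fully populated:

```python
DRIVE_GB = 450
DRIVES_PER_DRAWER = 12
RAW_CAPACITY_GB = 21_600   # 21.6TB raw, as stated above

drives = RAW_CAPACITY_GB // DRIVE_GB      # 48 drives
drawers = drives // DRIVES_PER_DRAWER     # 4 drawers, if all are full
print(drives, drawers)
```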
When Watson is booted up, its 15TB of total RAM is loaded, and thereafter the DeepQA processing is all done from memory. According to IBM Research, "The actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1TB." For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained: Watson is NOT going out to the Internet to search for answers.
As time progresses, things change, sometimes for the better in the right direction, sometimes a step backwards, and sometimes just different enough to be annoying. I wrote my blog post about [A Box Full of Floppies] a week ago, and posted it on Monday. Let's take a look at how time and change impacted that one post.
The weather has warmed up here in Tucson so I started my Spring Cleaning early this year...
If there is ever a good time to brag about how beautiful the weather is here in Tucson, it would be when everyone else in the country is digging themselves out of piles of snow. When my friends on Twitter were complaining about how cold it was in Scotland, Ireland, Canada, or the East Coast of the United States, I would remind them that I was wearing a tee-shirt and shorts. I played golf for a week last December!
Sadly, a few days after my post, Tucson had the coldest days of February, breaking records set back in 1899. Water pipes froze, outdoor plants suffered, and over 14,000 homes and businesses were cut off from natural gas. The 1,400-plus employees at the IBM Tucson facility have been asked to telecommute until restroom facilities can be restored to working order.
While we should all pay more attention to [climate change], this latest chill is probably just a seasonal fluctuation thanks to [La Niña], which happens every 10-15 years.
Here is a YouTube video of an astronaut ejecting a floppy disk...
Back in 2009, YouTube decided to [stop supporting Internet Explorer 6 (IE6)] to view its videos. However, that is what most IBMers were using, and this posed a problem when I embedded a video on my blog. To get around that, my friends at Microsoft provided special "conditional HTML tags" that allow me to suppress YouTube videos when viewed from Internet Explorer. The video shows up for those using Chrome, Opera, Firefox or other browsers, but is suppressed for IE users, which allowed IBM employees to at least read the text.
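For the curious, one common way to hide markup from Internet Explorer is a "downlevel-revealed" conditional comment. The Python helper below is a hypothetical sketch that wraps embed code in that pattern; it is not necessarily the exact markup Microsoft provided to me, and the `video.swf` embed is a placeholder. Internet Explorer 5 through 9 evaluates the `[if !IE]` condition as false and skips the content, while other browsers see both wrapper lines as ordinary HTML comments and render the video.

```python
def hide_from_ie(embed_html: str) -> str:
    """Wrap markup so legacy Internet Explorer (versions 5-9) skips it.

    Other browsers parse each wrapper line as a complete HTML comment,
    so the embed markup between them is rendered normally.
    """
    return "<!--[if !IE]><!-->\n" + embed_html + "\n<!--<![endif]-->"

# Placeholder embed code, for illustration only.
print(hide_from_ie('<object data="video.swf"></object>'))
```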
Fortunately, last July, IBM decided to switch from IE6 over to Mozilla Firefox as the standard browser, so I thought this would no longer be an issue.
Unfortunately, my friends at YouTube have done it again. They changed the generated embed code from "object" tags to "iframe" tags, which messes up blogs written in various blogging systems, including Lotus Connections that I use here on developerWorks, as well as WordPress. The new method is intended either to promote the new HTML5 standard, or to piss off [iPhone users]. In any case, several readers found they could not read my entire post about floppies because the "iframe" prevented the rest of the post from being shown. I have since reverted to the old "object" tags and re-posted for everyone's benefit.
I may have to stand up an OS/2 machine just to check out what is actually on those floppies...
For any data that you keep for long term retention, it is important that you be able to access the data in a meaningful way when you need it. IBM has identified five ways that this can be done:
Museum approach -- keep old servers, storage and applications around. In my case, I have computers that can handle 3.5-inch floppy diskettes, but no hardware to read my Zip cartridges or 5.25-inch floppies.
Emulation approach -- emulating old systems with new systems. I remember the first CD players had "tape cassette" attachments so they could be used with car stereos.
Migration approach -- migrating data and applications to new technology. This is what most businesses do. For example, if you keep archives through IBM Tivoli Storage Manager or DFSMShsm, the software will migrate data from old tapes to new tapes as part of its tape reclamation process.
Descriptive approach -- including sufficiently descriptive metadata, such as with HTML or XML tags, that would enable future rendering.
Encapsulation approach -- encapsulate the data, metadata and related application logic for future processing. While the "descriptive" approach might help display the contents of proprietary formats, the encapsulation approach would include application logic, perhaps written in Java, that could be used to actually operate built-in macros, pivot tables, or other active features of a document or database.
IBM Research is working closely with industry standards groups, like the Organization for the Advancement of Structured Information Standards [OASIS], to help promote the use of open standards for long-term retention.
For my readers who follow American Football, enjoy the [Super Bowl]!
The weather has warmed up here in Tucson so I started my Spring Cleaning early this year and unearthed from my garage a [Bankers Box] full of floppy diskettes.
IBM invented the floppy disk back in 1971, and continued to make improvements and enhancements through the 1980s and 1990s. It will be one of the many inventions celebrated as part of IBM's Centennial (100-year) anniversary. Here is an example [T-shirt].
IBM needed a way to send out small updates and patches for the microcode of devices at client locations. IBM had drives that could write information, and sent out "read-only" drives to the customer locations to receive these updates. The media were flexible plastic circles with a magnetic coating, placed inside a square paper sleeve. Imagine a floppy disk the size of a standard sheet of paper. The 8-inch floppy fit conveniently in a manila envelope, could be sent by standard mail, and could hold nearly 80KB of data.
I've been using floppies for the past thirty years. Here are some of my fondest memories:
While still in high school, my friend Franz Kurath and I formed "Pearson Kurath Systems", a software development firm. We wrote computer programs to run on UNIX and Personal Computers for small businesses here in Tucson. Whenever we developed a clever piece of code, a subroutine or procedure, we would save it on a floppy disk and re-use it for our next project. We wrote in the BASIC language, and our databases were simple Comma-Separated Values (CSV) flat files.
The 5.25-inch floppies we used could hold 360KB, and were flexible like the 8-inch models. Later versions of these 5.25-inch floppies would be able to hold as much as 1.2MB of data. We would convert single-sided floppies into double-sided ones by cutting out a notch in the outer sleeve. Covering up the notches would mark them as read-only.
The 3.5-inch floppies were introduced with a hard plastic shell, with the selling point that you could slap on a mailing label and postage and send one "as is" without the need for a separate envelope. These 3.5-inch floppies carried "DD" for double density, holding 720KB, while "HD" high-density versions could hold 1.44MB of data. The term "diskette" was used to associate these new floppies with [hard-shelled tape cassettes]. Sliding a plastic tab would allow floppies to be marked "read-only". IBM has the patent on this clever invention.
Continuing our computer programming business in college, Franz and I took out a bank loan to buy our first Personal Computer, for over $5000 USD. Until then, we had to use equipment belonging to each client. The banks we went to didn't understand why we needed a computer, and suggested we just track our expenses on traditional green-and-white ledger paper. Back then, personal computers were for balancing your checkbook, playing games and organizing your collection of cooking recipes. But for us, it was a production machine. A computer with both 5.25-inch and 3.5-inch drives could copy files from one format to another as needed. The boost in productivity paid for itself within months.
Apple launched its Macintosh computer in 1984, with a built-in 3.5-inch disk drive as standard equipment. Here is a YouTube video of an [astronaut ejecting a floppy disk] from an Apple computer in space.
In my senior year at the University of Arizona, my roommate Dave had borrowed my backpack to hold his lunch for a bike ride. He thought he had taken everything out, but forgot to remove my 3.5-inch floppy diskette containing files for my senior project. By the time he got back, the diskette was covered in banana pulp. I was able to rescue my data by cracking open the plastic outer shell, cleaning the flexible magnetic media in soapy water, placing it into the plastic shell of a second diskette, and then copying the data off to a third diskette.
After graduating from college, Franz and I went our separate ways. I went to work for IBM, and Franz went to work for [Chiat/Day], the advertising agency famous for the 1984 Macintosh commercial. We still keep in touch through Facebook.
At IBM, I was given a 3270 terminal to do my job, and would not be assigned a personal computer until years later. Once I had a personal computer at home and at work, the floppy diskette became my "briefcase". I could download a file or document at work, take it home, work on it til the wee hours of the morning, and then come back the next morning with the updated effort.
To help prepare me for client visits and public speaking at conferences, IBM loaned me out to local schools to teach. This included teaching Computer Science 101 at Pima Community College. When a student asked whether to use "disc" or "disk", I wrote a big letter "C" on the left side of the chalkboard, and a big letter "K" on the right side. "If it is round, like a CD-ROM or DVD, use 'disc'," I told the students while pointing at the letter "C". "If it has corners, like a floppy diskette or hard disk drive, use 'disk'," I said, pointing to the corners of the letter "K".
On one of my business trips to visit a client, we discovered the client had experienced a problem that we had just recently fixed. Normally, this would have meant cutting a Program Temporary Fix (PTF) to a 3480 tape cartridge at an IBM facility and sending it to the client by mail. Unwilling to wait, I offered to download the PTF onto a floppy diskette on my laptop, upload it from a PC connected to their systems, and apply it there. This involved a bit of REXX programming to deal with the differences between the ASCII and EBCDIC character sets, but it worked, and a few hours later they were able to confirm the fix worked.
In 1998, Apple would signal the beginning of the end of the floppy disk era, announcing their latest "iMac" would not come with an internal built-in floppy drive. David Adams has a great article on this titled [The iMac and the Floppy Drive: A Conspiracy Theory]. You can get external floppy drives that connect via USB, so not having an internal drive is no longer a big deal.
While teaching a Top Gun class to a mix of software and hardware sales reps, one of the students asked what a "U" was. He had noticed "2U" and "3U" next to various products and wondered what that was referring to. The "U" represents the [standard unit of measure for height of IT equipment in standard racks]. To help them visualize, I explained that a 5.25-inch floppy disk was "3U" in size, and a 3.5-inch floppy diskette was "2U". Thus, a "U" is 1.75 inches, just a bit more than the 1.5-inch thinnest dimension of a two-by-four piece of lumber. Servers that were only 1U tall would be referred to as "pizza boxes" for having similar dimensions.
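My fix at the time was written in REXX; as a rough modern illustration of the same problem, Python's standard codecs include EBCDIC code pages. This sketch assumes code page 037 (US/Canada EBCDIC); the code page a given mainframe actually uses may differ, and the function names are hypothetical.

```python
def ascii_to_ebcdic(text: str) -> bytes:
    """Encode text using EBCDIC code page 037 (US/Canada)."""
    return text.encode("cp037")

def ebcdic_to_ascii(data: bytes) -> str:
    """Decode EBCDIC code page 037 bytes back into a Python string."""
    return data.decode("cp037")

print("PTF".encode("ascii").hex())  # 505446: the ASCII byte values
encoded = ascii_to_ebcdic("PTF")
print(encoded.hex())                # d7e3c6: same letters, different bytes
print(ebcdic_to_ascii(encoded))     # PTF
```

The same three letters have completely different byte values in the two character sets, which is why text moved between a PC and a mainframe needs translating.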
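The floppy comparison works because both sizes are exact multiples of 1.75 inches. A trivial Python sketch of the conversion, with a hypothetical function name:

```python
RACK_UNIT_INCHES = 1.75   # one "U" in a standard equipment rack

def rack_units(height_inches: float) -> float:
    """Convert a height in inches to rack units."""
    return height_inches / RACK_UNIT_INCHES

print(rack_units(5.25))  # 3.0, the 5.25-inch floppy
print(rack_units(3.5))   # 2.0, the 3.5-inch floppy diskette
print(rack_units(1.75))  # 1.0, a "pizza box" server
```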
Every year, right around November or so, my friends and family bring me their old computers to wipe clean. I either re-load them with the latest Ubuntu Linux so that their kids can use them for homework, or donate them to charity. Last November, I got a computer that could not boot from a CD-ROM, forcing me to build a bootable floppy. This gave me a chance to check out the various 1-disk and 2-disk versions of Linux and other rescue disks. I also have a 3-disk set of floppies for booting OS/2 in command-line mode.
So while this unexpected box of nostalgia derailed my efforts to clean out my garage this weekend, it did inspire me to try to get some of the old files off the floppies and onto my PC hard drive. I have already retrieved some low-res photographs, some emails I sent out, and trip reports I wrote. Floppy diskettes were notorious for being unreliable, and this box has endured the heat and cold of many Arizona summers and winters, yet so far I have been able to read the data off most of them, all the way back to data written in 1989. While the data is readable, in most cases I can't render it into useful information. This teaches a few valuable lessons:
Backups are not Archives
Some of the files are in proprietary formats, such as my backups for TurboTax software. I would need a PC running the correct level of the Windows operating system, and that particular release of the software, just to restore the data. TurboTax shipped new software every year, and I don't know how forward- or backward-compatible each new release was.
Another set of floppies is labeled as being in "FDBACK" format. I have no idea what these are. Each floppy has just two files, such as "backup.001" and "control.001".
Backups are intended solely to protect against unexpected loss from broken hardware or corrupted data. If you plan to keep data as archives for long-term retention, use archive formats that will last a long time, so that you can make sense of them later.
Operating System Compatibility
Windows 7 and all of my favorite flavors of Linux can recognize the standard "FAT" file system in which nearly all of my floppies are formatted. Sadly, I have some files that were compressed under the OS/2 operating system using software called "Stacker". I may have to stand up an OS/2 machine just to check out what is actually on those floppies.
You can't judge a book by its cover
Floppies were a convenient form of data interchange. Sometimes, I reused commercially-labeled floppies to hold personal files. So, just because a floppy says "America On-Line (AOL) version 2.5 Installation", I can't just toss it away. It might actually contain something else entirely. This means I need to mount each floppy to check on its actual contents.
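Mounting each floppy is the straightforward route, but a raw image of a diskette can also be inspected directly. The Python sketch below assumes a standard FAT12 image (for example, a 1.44 MB diskette copied to a file with `dd if=/dev/fd0 of=floppy.img` on Linux) and lists the volume label and root-directory entries, so a relabeled AOL disk gives away its real contents. It is deliberately minimal and the function names are hypothetical: it does not follow subdirectories, reassemble long file names, or understand Stacker-compressed volumes.

```python
import struct
from typing import Dict, List, Optional, Tuple


def read_bpb(image: bytes) -> Dict[str, int]:
    """Parse the BIOS Parameter Block (BPB) of a FAT12/FAT16 boot sector."""
    if image[510:512] != b"\x55\xaa":
        # Some very early DOS diskettes lack this signature; unsupported here.
        raise ValueError("no boot sector signature (0x55AA) found")
    fields = struct.unpack_from("<HBHBHHBH", image, 11)
    names = ("bytes_per_sector", "sectors_per_cluster", "reserved_sectors",
             "num_fats", "root_entries", "total_sectors", "media", "sectors_per_fat")
    return dict(zip(names, fields))


def list_root_directory(image: bytes) -> Tuple[Optional[str], List[Tuple[str, int, bool]]]:
    """Return (volume_label, [(name, size_in_bytes, is_directory), ...])."""
    bpb = read_bpb(image)
    # The root directory follows the reserved sectors and every copy of the FAT.
    start = (bpb["reserved_sectors"]
             + bpb["num_fats"] * bpb["sectors_per_fat"]) * bpb["bytes_per_sector"]
    label: Optional[str] = None
    files: List[Tuple[str, int, bool]] = []
    for i in range(bpb["root_entries"]):
        entry = image[start + 32 * i: start + 32 * (i + 1)]
        if len(entry) < 32 or entry[0] == 0x00:
            break                        # end of directory (or truncated image)
        if entry[0] == 0xE5:
            continue                     # deleted file
        attr = entry[11]
        if attr == 0x0F:
            continue                     # long-file-name fragment, skipped here
        if attr & 0x08:
            label = entry[0:11].decode("cp437").rstrip()   # volume label entry
            continue
        name = entry[0:8].decode("cp437").rstrip()
        ext = entry[8:11].decode("cp437").rstrip()
        size = struct.unpack_from("<I", entry, 28)[0]
        files.append((f"{name}.{ext}" if ext else name, size, bool(attr & 0x10)))
    return label, files
```

Usage would be along the lines of `list_root_directory(open("floppy.img", "rb").read())`, where `floppy.img` is a hypothetical image file.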
So what will I do with the floppies I can't read, can't write, and can't format? I think I will convert them into a [retro set of coasters], to protect my new living room furniture from hot and cold beverages.
In keeping with the spirit of a kinder, gentler 2011, I decided last week to refrain from raining on someone else's parade, as so often happens immediately before, during or after a competitor's announcement or annual conference, and let EMC have their few moments in the spotlight. This of course gives me more time to learn about the announcements and reflect on marketplace reactions. Here's a quick look at the [EMC Press Release]:
A new VNXe disk system
Of the 41 new storage technologies and products EMC announced last week, the VNXe is EMC's "me-too" product to compete against other low-end disk systems like the IBM System Storage DS3524 and N3000 series. It looks truly new, developed organically from the ground up, with a new architecture and a new OS. It comes in either the 2U-high VNXe3100 or the 3U-high VNXe3300. These employ 3.5-inch SAS drives to provide Ethernet-based NFS, CIFS and iSCSI host attachment. The $10K USD price tag appears to be for the hardware only. As is typical for EMC, they charge for software features in bundles or "suites", so the actual TCO will be much higher. I have not seen any announcements on whether Dell plans to resell either the VNXe or the VNX models, now that Dell has acquired Compellent.
A new VNX disk system
Despite having a name similar to the VNXe, the VNX appears to be a re-hash of the Celerra/CLARiiON mess that EMC has been selling already, based on the old FLARE and DART operating systems of these older disk systems. This scales from 75 to 1000 SAS drives. While EMC calls the VNX "unified", it is currently available only in block-only and file-only models, with a promise from EMC that they will offer a combined block-and-file version sometime in the future. EMC claims that the VNX will be faster than its predecessors, so hopefully that means EMC has joined the rest of the planet and will publish SPC-1 and SPC-2 benchmarks to back up that claim. They can compare against the SPC-1 benchmarks that our friends at NetApp ran against EMC CLARiiON.
New software for the VMAX
A long time ago, EMC announced they would provide non-disruptive automated tiering. Their first delivery, "FAST V1", handled entire LUNs at a time. EMC now finally has "FAST VP" (which we expected would be called "FAST V2"), which provides sub-LUN automated tiering between solid-state and spinning disk drives. Meanwhile, IBM has been delivering "Easy Tier" on the IBM System Storage DS8000 series, SAN Volume Controller, and Storwize V7000 disk systems.
Data Domain Archiver
Competing against IBM, HP and Oracle in the tape arena, EMC's latest addition to the Data Domain family is designed for the long-term retention of backups? Archives of backups? Backups are short-lived, protecting against the unexpected loss from hardware failure or data corruption. Keeping backups as "archives" is generally a bad mistake, as it makes it hard to e-Discover the data you need when you need it, and you may not have the appropriate hardware to restore these old backups when you do find them.
I will have to dig deeper into all of these different technologies in separate posts in the future.
If we have learned anything from the Y2K crisis of the last decade, it is that we should not wait until the last minute to take action. Now is the time to start thinking about weaning ourselves off Windows XP. IBM has 400,000 employees, so this is not a trivial matter.
Already, IBM has taken some bold steps:
Last July, IBM announced that it was switching from Internet Explorer (IE6) to [Mozilla Firefox as its standard browser]. IBM has been contributing to this open source project for years, including work on support for open standards and on making it [more accessible to handicapped employees with visual and motor impairments]. I already use Firefox on Windows, Mac and Linux, so there was no learning curve for me. Before this announcement, if some web-based application did not work on Firefox, our Helpdesk told us to switch back to Internet Explorer. Those days are over. Now, if a web-based application doesn't work on Firefox, we either stop using it, or it gets fixed.
IBM also announced the latest [IBM Lotus Symphony 3] software, which replaces the Microsoft Office PowerPoint, Excel and Word applications. Symphony also works across Mac, Windows and Linux. It is based on the OpenOffice open source project, and handles the open-standard OpenDocument Format (ODF). Support for Microsoft Office 2003 will also run out in 2014, so moving off proprietary formats to open standards makes sense.
I am not going to wait for IBM to decide how to proceed next, so I am starting my own migrations. In my case, I need to do it twice, on my IBM-provided laptop as well as my personal PC at home.
Last summer, IBM sent me a new laptop; we get a new one every 3-4 years. It was pre-installed with Windows XP, but is powerful enough to run a 64-bit operating system in the future. Here is my series of blog posts on that:
I decided to try out Red Hat Enterprise Linux 6.1 with its KVM-based Red Hat Enterprise Virtualization to run Windows XP as a guest OS. I will try to run as much as I can on native Linux, but will have a Windows XP guest as the next option, and if that still doesn't work, reboot the system in native Windows XP mode.
So far, I am pleased that I can do nearly everything my job requires natively in Red Hat Linux, including accessing my Lotus Notes for email and databases, edit and present documents with Lotus Symphony, and so on. I have made RHEL 6.1 my default when I boot up. Setting up Windows XP under KVM was relatively simple, involving an 8-line shell script and 54-line XML file. Here is what I have encountered:
We use a wonderful tool called "iSpring Pro" which merges PowerPoint slides with voice recordings for each page into a Shockwave Flash video. I have not found a Linux equivalent for this yet.
To avoid having to duplicate files between systems, I use symbolic links instead. For example, my Lotus Notes local email repository sits on the D: drive, but I can access it directly with a link from /home/tpearson/notes/data.
While my native Ubuntu and RHEL Linux can access my C:, D: and E: drives in native NTFS file system format, the irony is that my Windows XP guest OS under KVM cannot. This means moving files from NTFS over to Ext4, just so that I can access them from the Windows XP guest.
For whatever reason, "Password Safe" did not run on the Windows XP guest. I launch it, but it takes forever to load and never brings up the GUI. Fortunately, there is a Linux version [MyPasswordSafe] that seems to work just fine to keep track of all my passwords.
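The symbolic-link approach from the list above can be scripted so the link is created once and then left alone on later runs. This is a minimal Python sketch, not the setup actually used, and the helper name is hypothetical. The target path in the usage line (`/media/D/notes/data`) is also a hypothetical example, since the original does not say where the NTFS partition is mounted.

```python
import os
from pathlib import Path


def link_shared_folder(target: Path, link: Path) -> None:
    """Create a symbolic link `link` -> `target`, or confirm it already exists."""
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if os.readlink(link) == str(target):
            return                      # already set up; nothing to do
        raise FileExistsError(f"{link} already points somewhere else")
    if link.exists():
        raise FileExistsError(f"{link} exists and is not a symbolic link")
    link.symlink_to(target, target_is_directory=True)
```

Usage would look like `link_shared_folder(Path("/media/D/notes/data"), Path("/home/tpearson/notes/data"))`. Because the link is only a pointer, the mail files stay in one place on the NTFS drive and both operating systems see the same copy.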
Personal home PC
My Windows XP system at home gave up the ghost last month, so I bought a new system with Windows 7 Professional, a quad-core Intel processor and 6GB of memory. There are [various editions of Windows 7], but I chose Windows 7 Professional to support running Windows XP as a guest image.
Here is how I have configured my personal computer:
I actually found it more time-consuming to set up the "Virtual PC" feature of Windows 7 to get Windows XP Mode working than to set up KVM on Red Hat Linux. I am amazed how many of my Windows XP programs DO NOT RUN AT ALL natively on Windows 7. I now have native 64-bit versions of Lotus Notes and Symphony 3, which will do well enough for me for now.
I went ahead and put Red Hat Linux on my home system as well, but since I have Windows XP running as a guest under Windows 7, there was no need to duplicate the KVM setup there. At least if I have problems with Windows 7, I can reboot into RHEL 6 Linux at home and use that for Linux-native applications.
Hopefully, this will position me well whether IBM decides to go with Windows 7 or Linux as the replacement OS for Windows XP.
Actually, if the title confuses you, it is because it has a double meaning.
Meaning 1: IBM earned almost 100 Billion dollars (USD)
IBM's 2010 [earnings report is now available], for the full year 2010 and the fourth quarter. IBM had $99.9 billion (USD) in revenue, almost the $100 billion it had set out as a vision in the 1980s. IBM Storage contributed 8 percent growth, not bad for a year Dave Barry considers [one of the worst years ever.].
IBM President and CEO Sam Palmisano granted me a chunk of IBM stock in appreciation of my efforts towards the 2010 success! Actually, he gave stock to a whole bunch of IBMers, not just me, and they all deserve it also. Woo hoo!
Meaning 2: IBM is almost 100 years old
That's right, this upcoming June 16, 2011, IBM turns 100 years old. This Centennial date also happens to be my 25th year anniversary working in IBM Storage, which IBM calls joining the Quarter Century Club, or QCC for short. So, I am looking forward to plenty of cake and fireworks on that day!
I am looking forward to a year-long celebration on both counts!
Every January, we look back into the past as well as look into the future for trends to watch for the upcoming year. Ray Lucchesi of Silverton Consulting has a great post looking back at the [Top 10 storage technologies over the last decade]. I am glad to see that IBM has been involved with and instrumental in all ten technologies.
Looking into the future, Mark Cox of eChannel has an article [Storage Trends to Watch in 2011], based on his interviews with two fellow IBM executives: Steve Wojtowecz, VP of storage software development, and Clod Barrera, distinguished engineer and CTO for storage. Let's review the four key trends:
Cloud Storage and Cloud Computing
No question: Cloud Computing will be the battleground of the IT industry this decade. I am amused by the latest spate of Microsoft commercials where problems are solved with someone saying "...to the cloud". Riding on the coattails of this is "Cloud Storage", the ability to store data across an Ethernet or Internet Protocol (IP) network, such as 10GbE, in support of Cloud Computing applications. Cloud Storage protocols in the running include NFS, CIFS, iSCSI and FCoE.
Mark writes "..vendors who aren't investing in cloud storage solutions will fall behind the curve."
Economic Downturn forces Innovation
The old British adage applies: "Necessity is the mother of invention." The status quo won't do. In these difficult economic times, IT departments are running on constrained budgets and staff. This forces people to evaluate innovative technologies for storage efficiency, like real-time compression and data deduplication, to make better use of what they currently have. It is also forcing people to take a "good enough" attitude, instead of paying premium prices for best-of-breed products they don't really need and can't really afford.
IT Service Management
Companies are getting away from managing individual pieces of IT kit, and are focusing instead on the delivery of information, from the magnetic surface of disk and tape media to the eyes and ears of the end users. The deployment mix of private, hybrid and public clouds makes it even more important to measure and manage IT as a set of services delivered to the business. IT Service Management software can be the glue, helping companies implement ITIL v3 best practices and management disciplines.
Smarter Data Placement
A recent survey by "The Info Pro" analysts indicates that "managing storage growth" is considered more critical than "managing storage costs" or "managing storage complexity".
This tells me that companies are willing to spend a bit extra to deploy a tiered information infrastructure if it will help them manage storage growth, which typically ranges from 40 to 60 percent per year. While I have discussed the concept of "Information Lifecycle Management" (ILM) for the past four years on this blog, I am glad to see it has gone mainstream, helped in part by automated storage tiering features like the IBM System Storage Easy Tier feature on the IBM DS8000, SAN Volume Controller and Storwize V7000 disk systems. Not all data is created equal, so the smart placement of data, based on the business value of the information it contains, makes a lot of sense.
These trends are influencing which solutions vendors will offer, and what companies will purchase and deploy.
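To see why growth weighs so heavily, it helps to run the compound-growth arithmetic. This minimal Python sketch assumes a constant annual growth rate (real growth is rarely that steady) and a hypothetical starting capacity of 100 TB. At 50 percent per year, capacity more than triples in three years and doubles roughly every 1.7 years.

```python
import math


def projected_capacity(current_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth at `annual_growth` (0.5 means 50%)."""
    return current_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years for capacity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Hypothetical 100 TB starting point, across the 40-60 percent range.
    for rate in (0.40, 0.50, 0.60):
        print(f"{rate:.0%}: 100 TB -> {projected_capacity(100, rate, 3):.1f} TB "
              f"in 3 years, doubling every {doubling_time_years(rate):.2f} years")
```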
The "Basic" offering includes a single IBM Storwize V7000 controller enclosure, and a three-year warranty package that includes software licenses for IBM Tivoli Storage FlashCopy Manager (FCM) and IBM Tivoli Storage Productivity Center for Disk - Midrange Edition (MRE). Planning, configuration and testing services for the software are included and can be performed by either IBM or an IBM Business Partner.
The "Standard" offering allows for multiple IBM Storwize V7000 enclosures, provides a three-year warranty package for the FCM and MRE software, and includes implementation services for both the hardware and the software components. These services can be performed by IBM or an IBM Business Partner.
Why bundle? Here are the key advantages for these offerings:
Increased storage utilization! First introduced in 2003, IBM SAN Volume Controller is able to improve storage utilization by 30 percent through virtualization and thin provisioning. IBM Storwize V7000 carries on this tradition. Space-efficient FlashCopy is included in this bundle at no additional charge and can reduce the amount of storage normally required for snapshots by 75 percent or more. IBM Tivoli Storage FlashCopy Manager can manage these FlashCopy targets easily.
Improved storage administrator productivity! The new IBM Storwize V7000 Graphical User Interface can help improve administrator productivity up to 2 times compared to other midrange disk solutions. The IBM Tivoli Storage Productivity Center for Disk - Midrange Edition provides real-time performance monitoring for faster analysis time.
Increased application performance! This bundle includes the "Easy Tier" feature at no additional charge. Easy Tier is IBM's implementation of sub-LUN automated tiering between Solid-State Drives (SSD) and spinning disk. Easy Tier can help improve application throughput up to 3 times, and improve response time up to 60 percent. Easy Tier can help meet or exceed application performance levels with its internal "hot spot" analytics.
Increased application availability! IBM Tivoli Storage FlashCopy Manager provides easy integration with existing applications like SAP, Microsoft Exchange, IBM DB2, Oracle, and Microsoft SQL Server. Reduce application downtime to just seconds with backups and restores using FlashCopy. The built-in online migration feature, included at no additional charge, allows you to seamlessly migrate data from your old disk to the new IBM Storwize V7000.
Significantly reduced implementation time! This bundle will help you cut implementation time in half, with little or no impact to storage administrator staff. This will help you realize your return on investment (ROI) much sooner.
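Easy Tier's own "hot spot" analytics are IBM's implementation and are not reproduced here. To illustrate the general idea of sub-LUN tiering described in the application-performance item above, here is a generic Python sketch: it splits a volume into fixed-size extents, counts I/Os per extent from a hypothetical trace of block addresses, keeps the hottest extents on a limited number of SSD slots, and works out which extents to promote or demote. The extent size, trace format, SSD capacity and function names are all assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def hottest_extents(block_trace: Iterable[int], extent_size: int, ssd_slots: int) -> Set[int]:
    """Return the extent numbers with the most I/Os, up to `ssd_slots` of them."""
    heat = Counter(block // extent_size for block in block_trace)
    return {extent for extent, _hits in heat.most_common(ssd_slots)}


def migration_plan(on_ssd: Set[int], wanted_on_ssd: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Extents to promote to SSD, and extents to demote back to spinning disk."""
    promote = wanted_on_ssd - on_ssd
    demote = on_ssd - wanted_on_ssd
    return promote, demote


if __name__ == "__main__":
    # Hypothetical trace: block addresses touched during one measurement window.
    trace = [10, 20, 30, 1_500, 1_600, 5_200]
    hot = hottest_extents(trace, extent_size=1_000, ssd_slots=2)
    print(hot)                        # {0, 1}
    print(migration_plan({5}, hot))   # ({0, 1}, {5})
```

Working at extent granularity rather than whole LUNs is the point of "sub-LUN": only the busy slices of a volume consume expensive SSD capacity, while the rest stays on spinning disk.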
Regardless of what you do, it is important to keep your finger on the pulse of what is going on around you. Let me recap the different jobs I have had within IBM:
I started as a Software Engineer on DFHSM, which was later renamed DFSMShsm, and worked my way up to lead architect for the entire DFSMS product. I attended user group conferences like SHARE and GUIDE to formally present the latest releases of the product, and to collect requirements for improvements and additions desired by the CIOs, IT directors and Storage Admins that attended. Each requirement was proposed to the group, who then voted on a scale from -3 to +3, with zero counting as an abstention. Six months later, I would come back to present which requirements were implemented, which ones were in consideration for future releases, and which ones were rejected because they were not strategic. Not everyone was happy with these decisions, and I took a lot of abuse on this. However, the process of gathering requirements was important, and the products are better for it.
I switched over to Marketing, starting out as a Marketing Manager for various products, and working my way up to lead Marketing Strategist for the IBM System Storage product line. I continued to attend conferences to understand the client requirements, but I also attended meetings with IBM sales reps and Business Partners. For those who lump "Marketing and Sales" into a single category, there is a difference. Marketing is the transfer of awareness and enthusiasm, whereas Sales is the transfer of ownership. When Marketing does their job well, prospects are lining up to buy your product. When they don't, the Sales team has to pick up the slack, and provide the awareness and enthusiasm that Marketing failed to deliver. I traveled all over the world to present our Marketing Strategy. Not everyone was happy with some of our decisions, and I took a lot of abuse on this. However, the process of "socializing" the marketing message and hearing feedback from those who faced clients every day was important, and the marketing strategy was better for it.
Three years ago, I switched again, this time to be a Storage Consultant at the Tucson Executive Briefing Center. While I still travel to clients and conferences, in most cases the clients come to me, here in Tucson, Arizona. I get to present our strategy, solutions and products. Not everyone is happy with some of our decisions, and I take a lot of abuse on this. However, the process of helping customers make tough business and IT purchase decisions is important, and both IBM and our clients are better for it.
It was in this same spirit that US Representative Gabrielle ("Gabby") Giffords launched a series of "Congress on your Corner" meetings. These were open-air town hall meetings that allowed her to present her priorities and plans for the future, and to get feedback from her constituents. Last Saturday, at one such event here in Tucson, she was shot in the head. The shooter then fired another 20 rounds at others before being tackled to the ground by two volunteers. He had another 70 bullets left, so it could have been much worse.
Congresswoman Giffords survived, but six died, including a US Federal Judge, a Pastor at a local church, and a 9-year-old girl, who ironically was born on September 11, 2001, the date of another US tragedy. The girl had just been elected to her student council, and came out to learn what government was all about. Another dozen people were wounded.
The last time I saw Gabby in person was in October 2010, at a charity auction to benefit the local Boys and Girls Club of America. She was shaking hands with everyone. I wished her good luck on her re-election campaign, which she won a few weeks later by a slim margin of some 4,000 votes.
(People have asked me if I knew her in high school. Gabby and I both attended University High in Tucson, rated one of the top 25 high schools in the USA. She would have started her freshman year months after I graduated, so I don't remember ever crossing paths.)
Having spent much of my childhood in Central and South America, I have witnessed my fair share of gun violence, military coups, and government take-overs. Of course, in a democratic government, there is a more peaceful way to resolve your differences. In my younger days, I was a lobbyist for local and state government here in Arizona for various causes and issues. I have met and dealt with many politicians. While many people are still in shock and awe over Saturday's tragedy, consider the following:
Tucson is part of the Wild, Wild West. We are not far from the infamous town of Tombstone, where a famous shoot-out happened at the O.K. Corral. A popular activity here is to shoot rounds at a shooting range, where you can either rent a gun or bring your own. Gun ownership is high, and hunting is a popular sport. Tucson hosts "Gun Shows" that allow people to buy guns without the mandatory 5-day waiting period. Every year, Tucson celebrates "Dillinger Days" to commemorate the capture of gangster John Dillinger at the Hotel Congress in downtown Tucson.
Tucson is close to Mexico. Authorities have reported that as many as 30,000 people have been killed on the other side of the US-Mexico border in the past five years by rival drug cartels. An estimated 30 percent of the Tucson economy comes from human and drug trafficking. Those killed in Mexico include government officials, law enforcement and journalists. Last year, US President Barack Obama [ordered 1200 troops to protect the US-Mexico border], half of which were deployed here in Arizona. The district I live in, which Congresswoman Giffords represents, borders Mexico.
Tucson has high schools, colleges and Universities. We have had our share of shootings by frustrated students.
While people were quick to blame this tragedy on everyone from [Sarah Palin] to Mexican drug lords, it appears the shooter was merely a frustrated college student, acting alone, who is now in custody awaiting trial. He was attending Pima Community College and had run-ins with the college police there as well. He had applied to join the US Army, but his application was rejected.
In the early 1990s, to help me prepare to become a public speaker, IBM loaned me out to teach at the local schools. I did four semesters of high school, and then taught a year of Computer Science 101 at Pima Community College. (Yes, I have all the teaching credentials to do this.) I found this experience to be great training for me to practice my speaking skills. However, I took a lot of abuse. I had disruptive students, angry students, frustrated students, and students that would threaten me if they did not pass the class. One by one, they would drop out of my class, leaving me with only nine students finishing my class with a passing grade.
Sadly, community colleges across the country carry a stigma that they are not as good as a full four-year university. The students I met at Pima Community College were here because they could not find decent employment with just a high school diploma, weren't smart enough or rich enough to attend the University of Arizona, and just didn't know what to do with their lives. Some who graduate manage to get jobs as technicians and medical assistants, while others use this as a stepping stone to transfer over to the University of Arizona or other specialized training programs.
I am sure there is much more to learn about this incident. Politicians can expect to take some abuse for the decisions they make and for their action or inaction on various issues, but nobody deserves to be shot. Congresswoman Giffords was just trying to put her finger on the pulse of her district, to understand the concerns of her constituents so that she could represent us properly in her third term in office. Instead, we have doctors at the University Medical Center keeping their finger on her pulse. So far, things look hopeful: she is able to respond to commands such as "wiggle your toes" or "hold up two fingers".
The latest update to the IBM Storage channel on YouTube is fellow IBMer Bob Dalton presenting IBM Scale-Out Network Attached Storage (SONAS) at the NAB 2010 conference. Here is the quick [2-minute YouTube video].
Last year, I took a different approach. I decided to NOT publicize my resolutions to see if that allowed me to stick to them better. Here is what I had resolved for 2010:
The recession took quite a hit on my investments and retirement plans in 2009, so in 2010 I decided to increase my savings rate, and diversify my portfolio. I consider this one a success, with special thanks to the financial planners at Fidelity Investments for their assistance in this area. This was not a matter of sticking to a strict budget as much as not wasting money on so many expensive, frivolous things.
Publish "Inside System Storage: Volume II"
Yes, I finally got my latest book published last October, a follow-on to my 2007 hit "Inside System Storage: Volume I". I have already begun working on Volume III, so I consider this one a success.
Quit Exercising at the Gym
From 2003 to 2009, I worked out consistently at my gym three times a week for an hour, under the supervision of a personal trainer, with 20 minutes of cardio on a treadmill followed by 40 minutes of weight lifting with kettle-bells, free weights and resistance machines. During that time, I neither gained muscle mass nor lost body fat. Rather than admit their failure, my personal trainers indicated this was merely a "plateau". Armed with the Time article [Why Exercise Won't Make You Thin], I decided to save thousands of dollars in 2010 by discontinuing my gym membership and letting my contract with my personal trainer expire. End result? I was two pounds lighter after 12 months, so I consider this one a success.
Re-Decorate my Living Room
With all the money I saved, I resolved to re-decorate my living room. I hired a professional interior decorator, bought new furniture, and had the entire room re-painted to new colors. This also means keeping the room uncluttered, which I have managed to do so far. So, this one is also a success.
Learn Cloud Computing
This last one was work-related. Every year, IBM asks its employees to document their "Personal Business Commitments", or PBCs, which then form the basis of their year-end appraisal. IBM is what the industry calls a "Results-Oriented Work Environment" [ROWE]. These PBCs are an opportunity to identify areas to stretch and grow, broaden your skills and strengthen your expertise. I was able to get access to IBM's cloud computing offerings to get hands-on experience, as well as research this topic on various fronts so that I could provide advice to clients and make presentations at various briefings and events. While there is still much more to learn, I consider this a success.
So, while I seem to have been more successful keeping my resolutions by not making them public up front, I think the more important pattern is that when I made many resolutions, I had only a 60 to 80 percent success rate, but when I had fewer, I was more likely to keep them all and be less stressed about it. This could also be psychological, in that feeling that you have completed 60 to 80 percent allows you to forgive yourself for not keeping some of the more difficult resolutions. Therefore, this year, I have decided to focus on a single resolution, to reduce my body fat percentage.
Rather than make you wait 12 months for my results, I plan to provide periodic updates on my progress in this blog. Over the vacation break, I bought and read Tim Ferriss' book [Four Hour Body]. Mo and I are in this together, and we started Tim's [Slow-Carb diet] last Sunday. My doctor has advised me on which vitamins and supplements to take. Rather than go back to the gym, I will just focus on walking at least 20,000 steps per week, which works out roughly to 10 kilometers.
Wrapping up my post-week coverage of the [Data Center 2010 conference], I stuck it out to the end to get my money's worth. As the morning went on, it became obvious that many people had booked flights or started their weekends before the official 3:15pm end of the last day.
Strategies for Data Life Cycle Management
I prefer the term "Information Lifecycle Management", but the two analysts presenting decided to use DLM instead. Let's start with the biggest challenge faced by the audience.
The problem is not meeting Service Level Agreements (SLA) but Service Level Expectations. When looking at the real business value of IT, you should link IT strategy to business outcomes and directives, align with your CIO's pet initiatives, and position storage as a technology supporting IT Directors' goals. Here were the top five goals:
Curtailing Storage Sprawl
Compliance and e-Discovery
Improving Service Levels for Data Availability and Protection
Moving to Cloud Computing
The analysts reviewed both a "Tops Down" and a "Bottoms Up" approach. They recommended what they call an "Enterprise Information Archive" (what IBM calls Smart Archive, by the way) that provides a better understanding of all data.
No greater lie has been told than "Storage is Cheap". Currently, only 10 percent of companies have a formal "deletion policy", but the analysts predict this will rise to 50 percent by 2013.
The "Bottoms Up" approach is focused on modernizing the data center at the storage technology level. There has been a resurgence in interest in ILM solutions, implementing storage tiers, and storage efficiency features like thin provisioning, data deduplication and real-time compression. Cloud Computing can help off-load this effort to someone else.
ILM provides real business value, such as reducing costs, improving quality of service, and mitigating risks. The analysts felt that if you are not partnering with a storage vendor that offers five essential technologies, you should probably change vendors. What are those five essential technologies? I am glad you asked. Watch this [YouTube video] to find out.
Getting the Most From Your Storage Vendor Relationships
The analyst mentioned there are two kinds of storage vendors: Suppliers that sell you solutions, and Partners that work with you to develop unique functionality. He offered some advice:
Allow vendors to analyze and profile your workloads, such as IOPS, MB/sec bandwidth, average blocksize, and so on.
Review your Service Level Agreements (SLAs), procedures, and asset management strategies
Identify upgrade risks, conversion costs, and unintended consequences
Take advantage of vendor engineers and technical staff for skills transfer, best practices, industry trends, and competitive comparisons
Explore different solutions and approaches
Avoid big pitfalls by negotiating and locking in upgrade and maintenance costs, scheduling conversions, and getting any guarantees in writing.
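The three numbers in the first item are related: bandwidth is simply IOPS multiplied by average block size. If you want to sanity-check a vendor's profile of your workload, here is a minimal Python sketch, assuming you already have the size of every I/O captured over a known measurement window. The function and its input format are my own hypothetical illustration, not any vendor's profiling tool.

```python
def profile(io_sizes_bytes: list[int], window_s: float) -> dict[str, float]:
    """Summarize I/Os captured over window_s seconds (hypothetical trace format)."""
    if window_s <= 0:
        raise ValueError("measurement window must be positive")
    count = len(io_sizes_bytes)
    total_bytes = sum(io_sizes_bytes)
    return {
        "iops": count / window_s,
        "mb_per_s": total_bytes / window_s / 1_000_000,  # decimal MB
        "avg_block_kb": total_bytes / count / 1024 if count else 0.0,
    }

# 1,000 I/Os of 8 KiB each, captured over 2 seconds
stats = profile([8192] * 1000, window_s=2.0)
print(stats)  # {'iops': 500.0, 'mb_per_s': 4.096, 'avg_block_kb': 8.0}
```

Multiplying the first and last figures, 500 IOPS at 8 KiB each is 4,096,000 bytes per second, which matches the MB/sec figure.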
An interactive poll asked the audience how they currently interact with their storage vendors.
The analyst's "Do's and Don'ts" were good advice for nearly any kind of business negotiation. Do:
Keep language simple and enforceable
Limit diagnostic time
Be reasonable with rolling time-lines
Design remedies that keep you whole and are implementable in your environment
Don't:
Make remedies punitive
Use qualitative measures
Rely on vendor's metrics only
Set terms that expire during life of system
Let the vendor provide best practices after installation, set reasonable expectations, schedule regular reviews, and insist on cross-vendor cooperation, with zero tolerance for finger-pointing between vendors. Depreciate storage equipment quickly.
This was the last session of the conference, a workshop to deal with irrational behavior during unexpected events that could disrupt or impact business operations. In the exercise, each table was a fictitious company, and the 7-8 people sitting at each table represented different department heads who had to make recommendations to upper management on how to deal with each disastrous situation presented to us. Decisions had to be made with limited and incomplete information. Each table had to come to a consensus on each action, and a single spokesperson from each table would present the recommendations. Winners of each round got prizes.
Plenty of coffee, not enough juice. Power and Cooling were top of mind. The rooms were cold, designed for people wearing suits I imagine. I enjoyed plenty of hot coffee throughout the event. Everyone complained that their smartphones and iPads were running out of electricity. The conference had "recharge" stations with plugs for all kinds of different phones, but the Micro-USB plugs that I needed for my Samsung Vibrant, and the Apple connectors needed by everyone else's iPhones and iPads, were always taken. I remember when you could charge your cell phone once a week, because you hardly used it to make calls. Now that phones are used to follow Twitter feeds, surf websites, and more between sessions, power runs out quickly.
Information Overload. I was one of those following tweets on the HootSuite app on my Android-based smartphone. I was able to meet some of the people with whom I have exchanged blog comments and tweets. One told me that his tweets were his way of taking notes, so that his trip report would be done when he got back to the office. I used to write trip reports too, before blogging and tweeting.
The mood was positive. The rival competitors all got along well. I had friendly chats with people from Oracle, HP, Cisco, EMC, VCE, and others. Overall, people are optimistic that the IT industry is set for economic growth in 2011.
The only people who look forward to change are babies in soiled diapers. My impression is that people who were threatened by Cloud Computing now have a better understanding of what they need to do going forward. Yes, this means learning new skills, re-evaluating your backup/recovery procedures, reviewing your BC/DR contingency plans, and a variety of other changes. Those who don't like frequent change should consider getting out of the IT industry. Just sayin'
I suspect this will be my last post of 2010. I will be taking a much-needed break, celebrating the Winter Solstice. To all my readers, I wish you good times over the next few weeks, and a Happy New Year!
Continuing my post-week coverage of the [Data Center 2010 conference], Thursday morning had some interesting sessions for those that did not leave town last night.
Interactive Session Results
In addition to the [Profile of Data Center 2010] that identifies the demographics of this year's registrants, the morning started with highlights of the interactive polls during the week.
External or Heterogeneous Storage Virtualization
The analyst presented his views on the overall External/Heterogeneous Storage Virtualization marketplace. He started with the key selling points.
Avoid vendor lock-in. Unlike the IBM SAN Volume Controller, many of the other storage virtualization products result in vendor lock-in.
Leverage existing back-end capacity. Only for the back-end storage devices that are supported.
Simplify and unify management of storage. Yes, mostly.
Lower storage costs. Unlike with the IBM SAN Volume Controller, many customers using other storage virtualization products discover an increase in total storage costs.
Migration tools. Yes, as advertised.
Consolidation/Transition. Yes, over time.
Better functionality. Potentially.
Shortly after several vendors started selling external/heterogeneous storage virtualization solutions, either as software or pre-installed appliances, major storage vendors that were caught with their pants down immediately started calling everything inside their own arrays "storage virtualization" as well, to buy some time and increase confusion.
While the analyst agreed that storage virtualization simplifies the view of storage from the host server side, it can complicate the management of storage on the storage end. This often comes up at the Tucson Briefing Center. I explain this as the difference between manual and automatic transmission cars. My father was a car mechanic, and since he is the sole driver and sole mechanic of his cars, he prefers manual transmissions, which are easier to work on. However, rental car companies, such as Hertz or Avis, prefer automatic transmission cars. This might require more skills on the part of their mechanics, but greatly simplifies the experience for those driving.
The analyst offered his views on specific use cases:
Data Migration. The analyst feels that external virtualization serves as one of the best tools for data migration. But what about tech refresh of the storage virtualization devices themselves? Unlike IBM SAN Volume Controller, which allows non-disruptive upgrades of the nodes themselves, some of the other solutions might make such upgrades difficult.
Consolidation/Transition. External virtualization can also be helpful here, depending on how aggressive the consolidation/transition schedule is.
Improved Functionality/Usability. This can be an unexpected benefit, and IBM SAN Volume Controller is a good example: features like thin provisioning, automated storage tiering, and so on, can be added to existing storage equipment.
The analyst mentioned that there were different types of solutions. The first category includes those that support both internal storage and external storage virtualization, like the HDS USP-V or IBM Storwize V7000. He indicated that roughly 40 percent of HDS USP-V units are licensed for virtualization. The second category includes those that support external virtualization only, such as IBM SAN Volume Controller, HP LeftHand and SVSP, and so on. The third category includes software-only Virtual Guest images that can provide storage virtualization capabilities.
The analyst mentioned EMC's failed product Invista, which sold fewer than 500 units over the past five years. The low penetration of external virtualization, estimated at 2-5 percent, could be explained by the bad taste Invista left with everyone considering their options. However, the analyst predicts that by 2015, external virtualization will reach double-digit marketshare.
Having a feel for the demographics of the registrants, and specific interactive polling in each meeting, provides a great view on who is interested in what topic, and some insight into their fears and motivations.
Continuing my post-week coverage of the [Data Center 2010 conference], Wednesday evening we had six hospitality suites. These are fun informal get-togethers sponsored by various companies. I present them in the order that I attended them.
Intel - The Silver Lining
Intel called their suite "The Silver Lining". Magician Joel Bauer wowed the crowds with amazing tricks.
Intel handed out branded "Snuggies". I had to explain to this guy that he was wearing his backwards.
i/o - Wrestling with your Data Center?
Newcomer "i/o" named their suite "Wrestling with your Data Center?" They invited attendees frustrated with their data centers to don inflated Sumo Wrestling suits.
APC by Schneider Electric - Margaritaville
This will be the last year for Margaritaville, a theme that APC has now used for several years at this conference.
Cisco - Fire and Ice
Cisco had "Fire and Ice" with half the room decorated in Red for fire, and White for ice.
This is Ivana, welcoming people to the "Ice" side.
This is Peter, on the "Fire" side. Cisco tried to have opposites on both sides, savory food on one side, sweets on the other.
CA Technologies - Can you Change the Game?
CA Technologies offered various "sports games", with a DJ named "Coach".
Compellent - Get "Refreshed" at the Fluid Data Hospitality Suite
Compellent chose a low-key, "lights out" format with a live guitarist. They had hourly raffles for prizes, but it was too dark to read the raffle ticket numbers.
Of the six, my favorite was Intel. The food was awesome, the Snuggies were hilarious, and the magician was incredibly good. I would like to thank Intel for providing me super-secret inside access to their Cloud Computing training resources and for the Snuggie!
Continuing my post-week coverage of the [Data Center 2010 conference], Wednesday afternoon included a mix of sessions that covered storage and servers.
Enabling 5x Storage Efficiency
Steve Kenniston, who joined IBM through its recent acquisition of Storwize Inc., presented IBM's new Real-Time Compression appliance. There are two appliances: one handles 1GbE networks, and the other supports mixed 1GbE/10GbE connectivity. Files are compressed in real time with no impact to performance, and in some cases performance can improve because less data is written to back-end NAS devices. The appliance is not limited to IBM's N series and NetApp, but is vendor-agnostic. IBM is qualifying the solution with other NAS devices in the market. The appliance can reduce stored data by up to 80 percent, providing up to 5x storage efficiency.
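The "80 percent" and "5x" figures are two ways of saying the same thing: if compression removes 80 percent of the bytes, the remaining 20 percent holds all the data, and 1 / 0.2 = 5. A small Python sketch of the conversion (the function names are my own, not part of any IBM tool):

```python
def efficiency_factor(reduction_pct: float) -> float:
    """Turn a percent reduction in stored bytes into an N-times efficiency factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be at least 0 and below 100 percent")
    return 100 / (100 - reduction_pct)

def stored_tb(original_tb: float, reduction_pct: float) -> float:
    """Back-end capacity consumed after compression."""
    return original_tb * (100 - reduction_pct) / 100

print(efficiency_factor(80))  # 5.0
print(stored_tb(10, 80))      # 2.0 TB on disk for 10 TB of files
print(efficiency_factor(50))  # 2.0
```

The relationship is non-linear: 50 percent reduction gives only 2x, while 90 percent gives 10x, so the last few points of compression matter most.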
Townhall - Storage
The townhall was a Q&A session to ask the analysts their thoughts on Storage. Here I will present the answer from the analyst, and then my own commentary.
Are there any gotchas deploying Automated Storage Tiering?
Analyst: you need to fully understand your workload before investing any money into expensive Solid-State Drives (SSD).
Commentary: IBM offers Easy Tier for the IBM DS8000, SAN Volume Controller, and Storwize V7000 disk systems. Before you buy any SSD, these systems can measure workload activity, and IBM's Storage Tier Advisory Tool (STAT) can help identify how much each workload would benefit from SSD. If you don't have these specific storage devices, IBM Tivoli Storage Productivity Center for Disk can help measure disk performance to determine whether SSD is cost-justified.
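What these tools are really answering is how skewed the workload is: if a small fraction of capacity receives most of the I/O, a small amount of SSD goes a long way. The Python sketch below is a hypothetical illustration of that skew calculation only, not how Easy Tier or STAT actually work; it assumes you have per-extent I/O counts and that all extents are the same size.

```python
def ssd_hit_fraction(extent_io_counts: list[int], ssd_fraction: float) -> float:
    """Share of all I/O that would be served from SSD if the hottest
    `ssd_fraction` of equal-sized extents were moved there."""
    if not 0.0 <= ssd_fraction <= 1.0:
        raise ValueError("ssd_fraction must be between 0 and 1")
    total = sum(extent_io_counts)
    if total == 0:
        return 0.0
    hottest_first = sorted(extent_io_counts, reverse=True)
    n_on_ssd = int(len(hottest_first) * ssd_fraction)  # extents that fit on SSD
    return sum(hottest_first[:n_on_ssd]) / total

# Hypothetical per-extent I/O counts for a very skewed workload
counts = [900, 50, 20, 10, 10, 5, 3, 1, 1, 0]
print(ssd_hit_fraction(counts, 0.1))  # 0.9: 10% of capacity serves 90% of I/O
print(ssd_hit_fraction(counts, 0.2))  # 0.95
```

A flat workload, where every extent gets roughly the same I/O, would show SSD capturing only about as much I/O as capacity, which is exactly the case where the analyst's warning applies.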
Wouldn't it be simpler to just have separate storage arrays for different performance levels?
Analyst: No, because that would complicate BC/DR planning, as many storage devices do not coordinate consistency group processing from one array to another.
Commentary: IBM DS8000, SAN Volume Controller and Storwize V7000 disk systems support consistency groups across storage arrays, for those customers that want to take advantage of lower cost disk tiers on separate lower cost storage devices.
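Why does coordination across arrays matter? Databases rely on dependent writes: the log record is written (perhaps on one array) before the data page it describes (perhaps on another). A point-in-time copy is only usable if it never holds a later write without the earlier write it depends on. The hypothetical Python simulation below checks that rule for copies taken at a separate instant on each array versus one coordinated instant; the array names, times, and check are illustrative assumptions, not any product's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    name: str
    array: str
    time: float  # when the write completed

def captured(w: Write, snap_times: dict[str, float]) -> bool:
    """A write is in the copy if it completed before its array's snapshot."""
    return w.time < snap_times[w.array]

def is_consistent(dependencies: list[tuple[Write, Write]],
                  snap_times: dict[str, float]) -> bool:
    """For every (earlier, later) dependent pair, the copy must not hold
    the later write without the earlier one."""
    return all(captured(first, snap_times) or not captured(second, snap_times)
               for first, second in dependencies)

log = Write("log record", array="A", time=1.0)
data = Write("data page", array="B", time=2.0)
deps = [(log, data)]

# Uncoordinated: array A copied at t=0.5, array B copied at t=2.5
print(is_consistent(deps, {"A": 0.5, "B": 2.5}))  # False: data page without its log
# Coordinated consistency group: both arrays copied at the same instant
print(is_consistent(deps, {"A": 1.5, "B": 1.5}))  # True
```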
Can storage virtualization play a role in private cloud deployments?
Analyst: Yes, by definition, but today's storage virtualization products don't work with public cloud storage providers. None of the major public cloud providers use storage virtualization.
Commentary: IBM uses storage virtualization for its public cloud offerings, but the question was about private cloud deployments. IBM CloudBurst integrated private cloud stack supports the IBM SAN Volume Controller which makes it easy for storage to be provisioned in the self-service catalog.
Can you suggest one thing we can do Monday when we get back to the office?
Analyst: Create a team to develop a storage strategy and plan, based on input from your end-users.
Commentary: Put IBM on your short list for your next disk, tape or storage software purchase decision. Visit [ibm.com/storage] to re-discover all of IBM's storage offerings.
What is the future of Fibre Channel?
Analyst 1: Fibre Channel is still growing and will go from 8Gbps to 16Gbps. The transition to Ethernet is slow, so FC will remain the dominant protocol through year 2014.
Analyst 2: Fibre Channel will still be around, but NAS, iSCSI and FCoE are all growing at a faster pace. Fibre Channel will only be dominant in the largest of data centers.
Commentary: Ask a vague question, get a vague answer. Fibre Channel will still be around for the next five years.
However, SAN administrators might want to investigate Ethernet-based approaches like NAS, iSCSI and FCoE where appropriate, and start beefing up their Ethernet skills.
Will Linux become the Next UNIX?
Linux in your datacenter is inevitable. In the past, Linux was limited to x86 architectures, and UNIX operating systems ran on specialized CPU architectures: IBM AIX on POWER7, Solaris on SPARC, HP-UX on PA-RISC and Itanium, and IBM z/OS on System z Architecture, to name a few. Today, however, Linux runs on many of these other CPU chipsets as well.
Two common workloads, Web/App serving and DBMS, are shifting from UNIX to Linux. Linux Reliability, Availability and Serviceability (RAS) is approaching the levels of UNIX. Linux has been a mixed blessing for UNIX vendors: x86 server margins are shrinking, and the high-margin UNIX market has shrunk 25 percent in the past three years.
UNIX vendors must make the "mainframe argument" that their flavor of UNIX is more resilient than any OS that runs on Intel or AMD x86 chipsets. In 2008, Sun Solaris was the #1 UNIX, but today, it is IBM AIX with 40 percent marketshare. Meanwhile HP has focused on extending its Windows/x86 lead with a partnership with Microsoft.
The analyst asks "Are the three UNIX vendors in it for the long haul, or are they planning graceful exits?" The four options for each vendor are:
Milk it as it declines
Accelerate the decline by focusing elsewhere
Impede the market to protect margins
Re-energize UNIX base through added value
Here is the analyst's view on each UNIX vendor.
IBM AIX now owns 40 percent marketshare of the UNIX market. While the POWER7 chipset supports multiple operating systems, IBM has not been able to get an ecosystem to adopt Linux-on-POWER. The "Other" category includes z/OS, IBM i, and x86-based operating systems.
HP has multi-OS Itanium from Intel, but is moving to Multi-OS blades instead. Their "x86 plus HP-UX" strategy is a two-pronged attack against IBM AIX and z/OS. Intel Nehalem chipset is approaching the RAS of Itanium, making the "mainframe argument" more difficult for HP-UX.
Before Oracle acquired Sun Microsystems, Oracle was focused on Linux as a UNIX replacement. After the acquisition, they now claim to support Linux and Solaris equally. They are now focused on trying to protect their rapidly declining install base by keeping IBM and HP out. They will work hard to differentiate Solaris as having "secret sauce" that is not in Linux. They will continue to compete head-on against Red Hat Linux.
An interactive poll of the audience indicated that the most strategic Linux/UNIX platform over the next five years was Red Hat Linux. This beat out AIX, Solaris and HP-UX, as well as all of the other distributions of Linux.
The rooms emptied quickly after the last session, as everyone wanted to get to the "Hospitality Suites".
Continuing my post-week coverage of the [Data Center 2010 conference], Wednesday morning started with another keynote session, followed by some break-out sessions.
Realities of IT Investment
Tighter budgets mean more business decisions. Future investments will come from cost savings. The analysts report that 77 percent of IT decisions are made by CFOs. Most organizations are spending less now than back in 2008 before the recession.
How we innovate through IT is changing. In bad times, risk trumps return, but only 21 percent of the audience have a formal "risk calculation" as part of their purchase plans.
Divestment matters as much as investment. Reductions in complexity have the greatest long-term cost savings. Try to retire at least 20 percent of your applications next year. With the advent of Cloud Computing, companies might simply retire some applications and go entirely with public cloud offerings instead. Note that in this graph the years differ from the ones above, grouped in half-decade increments.
It is important to identify functional dependencies and link your IT risks to business outcomes. Focus on making costs visible, and re-think how you communicate IT performance measurements and their impact to business. Try to change the culture and mind-set so that projects are not referred to as "IT projects" focused on technology, but rather they are "business projects" focused on business results.
Moving to the Cloud
Richard Whitehead from Novell presented challenges in moving to Cloud Computing. There are risks and challenges managing multiple OS environments. Users should have full access to all IT resources they need to do their jobs. Computing should be secure, compliant, and portable. Here is the shift he sees from physical servers to virtual and cloud deployments, years 2010 to 2015:
Richard considers a "workload" to be the combination of the operating system, middleware, and application. He then defines a "Business Service" as an appropriate combination of these workloads. For example, a business service that provides a particular report might involve a front-end application, which talks to a business-logic workload server, which in turn talks to a back-end database workload server.
To address this challenge, Novell introduces "Intelligent Workload Management", called WorkloadIQ. This manages the lifecycle to build, secure, deploy, manage and measure each workload. Their motto was to take the mix of physical, virtual and cloud workloads and "make it work as one". IBM is a business partner with Novell, and I am a big fan of Novell's open-source solutions, including SUSE Linux.
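To make Richard's two definitions concrete, here is a small Python data model of the reporting example. It is only my illustration of the vocabulary: the field names and sample values are hypothetical and do not reflect how WorkloadIQ represents workloads.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Workload:
    """Operating system + middleware + application, per Richard's definition."""
    name: str
    operating_system: str
    middleware: str
    application: str

@dataclass
class BusinessService:
    """Workloads chained together to deliver one business result."""
    name: str
    tiers: list[Workload] = field(default_factory=list)

    def call_path(self) -> str:
        """Show the request flow from the first tier to the last."""
        return " -> ".join(w.name for w in self.tiers)

# Hypothetical components for the reporting example
front_end = Workload("front-end", "Linux", "web server", "report UI")
logic = Workload("business logic", "Linux", "application server", "report engine")
database = Workload("database", "Linux", "DBMS", "report data")

report = BusinessService("monthly report", [front_end, logic, database])
print(report.call_path())  # front-end -> business logic -> database
```

Keeping the workload as its own unit means the same database workload could be reused by several business services, which is the point of managing at the workload level.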
A Funny Thing Happened on the Way to the Cloud....
Bud Albers, CTO of Disney, shared their success in deploying their hybrid cloud infrastructure. Everyone recognizes the Disney brand for movies and theme parks, but may not be aware that they also own ABC News and ESPN television, travel cruises, virtual worlds, mobile sites, and deploy applications like Fantasy Football and Fantasy Fishing.
Two years ago, each Line of Business (LOB) owned its own servers. They were continually out of space, and power and HVAC issues forced tactical build-outs of their datacenters. But in 2008, the answer to all questions was Cloud Computing; it slices and dices like something invented by [Ron Popeil], with no investment or IT staff required. However, continuing to ask the CFO for CAPEX to purchase assets that were only 1/7th used was not working out either. That's right, over 75 percent of their servers were running at less than 15 percent CPU utilization.
The compromise was named "D*Cloud". Internal IT infrastructure would be positioned for Cloud Computing, by adopting server virtualization, implementing REST/SOAP interfaces, and replicating the success across their various Content Distribution Networks (CDN). Disney is no stranger to Open Source software, using Linux and PHP. Their [Open Source] web page shows tools available from Disney Animation studios.
At the half-way point, they had half their applications running virtualized on just 4 percent of their servers. Today, they run over 20 VMs per host and have 65 percent of their apps virtualized. Their target is 80 percent of their apps virtualized by 2014.
Bud used the analogy that public clouds will be the "gas stations" of the IT industry. People will choose the cheapest gas among nearby gas stations. By focusing on "Application management" rather than "VM instance management", Disney is able to seamlessly move applications as needed from private to public cloud platforms.
Their results? Disney is now averaging 40 percent CPU utilization across all servers. Bud feels they have achieved better scalability, better quality of service, and increased speed, all while saving money. Disney is spending less on IT now than in 2008.
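The utilization numbers explain where the savings come from. Assets that are only 1/7th used are running at about 14 percent; moving the same total CPU demand onto virtualized hosts averaging 40 percent needs far fewer machines. A back-of-the-envelope Python sketch, assuming identical servers, perfectly divisible workloads and a hypothetical starting count of 100 servers (real consolidation planning must also account for peaks, memory and I/O):

```python
def hosts_needed(servers: int, avg_util_pct: int, target_util_pct: int) -> int:
    """Identical hosts needed to carry the same total CPU demand at a higher
    average utilization. Integer ceiling division avoids float rounding."""
    if not 0 < target_util_pct <= 100:
        raise ValueError("target utilization must be between 1 and 100 percent")
    demand = servers * avg_util_pct       # total demand, in percent of one server
    return -(-demand // target_util_pct)  # ceil(demand / target)

print(round(100 / 7, 1))          # 14.3 percent used when only 1/7th is used
print(hosts_needed(100, 15, 40))  # 38 hosts instead of 100
```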
UPMC Maximizes Storage Efficiency with IBM
Kevin Muha, UPMC Enterprise Architect & Technology Manager for Storage and Data Protection Services, was unable to present this in person, so Norm Protsman (IBM) presented Kevin's charts on the success at the University of Pittsburgh Medical Center [UPMC]. UPMC is Western Pennsylvania's largest employer, with roughly 50,000 employees across 20 hospitals, 400 doctors' offices and outpatient sites. They have frequently been rated one of the best hospitals in the US.
Their challenge was storage growth. Their storage environment had grown 328 percent over the past three years, to 1.6PB of disk and nearly 7 PB of physical tape. To address this, UPMC deployed four IBM TS7650G ProtecTIER gateways (2 clusters) and three XIV storage systems for their existing IBM Tivoli Storage Manager (TSM) environment. Since they were already using TSM over a Fibre Channel SAN, the implementation took only three days.
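A 328 percent increase over three years is steeper than it sounds. Assuming "grown 328 percent" means the environment ended at 4.28 times its starting size, the compound annual growth rate works out to roughly 62 percent a year. A quick Python check (the function name is mine):

```python
def cagr(growth_pct: float, years: float) -> float:
    """Compound annual growth rate for a total increase of growth_pct over `years`."""
    multiple = 1 + growth_pct / 100  # 328 percent growth -> 4.28x
    return multiple ** (1 / years) - 1

print(f"{cagr(328, 3):.1%}")  # 62.4%
```

If instead the figure meant the environment grew to 328 percent of its original size (3.28x), the same function with `cagr(228, 3)` gives a lower annual rate.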
UPMC was backing up nearly 60TB per day, in a 15-hour backup window. Their primary data is roughly 60 percent Oracle, with the rest being a mix of Microsoft Exchange, SQL Server, and unstructured data such as files and images.
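For a sense of scale, 60TB in 15 hours means the backup infrastructure must sustain about 4TB per hour, or roughly 1.1GB/s for the whole window. The arithmetic in Python, assuming decimal units (the function name is mine):

```python
def required_gb_per_s(tb_per_day: float, window_hours: float) -> float:
    """Sustained rate, in decimal GB/s, needed to move tb_per_day inside the window."""
    return tb_per_day * 1000 / (window_hours * 3600)

print(60 / 15, "TB per hour")                   # 4.0 TB per hour
print(f"{required_gb_per_s(60, 15):.2f} GB/s")  # 1.11 GB/s
```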
Their results? TSM reclamation is 30 percent faster. Hardware footprint reduced from 9 tiles to 5. Over 50 percent reduction in recovery time for Oracle DB, and 20 percent reduction in recovery of SQL Server, Microsoft Exchange, and Epic Cache. They average 24:1 deduplication overall, which can be broken down by data category as follows:
29:1 Cerner Oracle
18:1 EPIC Cache
10:1 Microsoft SQL Server
8:1 Unstructured files
6:1 Microsoft Exchange
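Two points help when reading ratios like these. An N:1 ratio saves 1 - 1/N of the capacity, so 24:1 saves about 95.8 percent, and going from 10:1 to 29:1 saves only about 6.6 more percentage points. And the overall ratio is not the average of the category ratios: it is total logical data divided by total physical data, so it depends on how much of each category is backed up. UPMC's per-category volumes were not given, so the Python sketch below takes them as inputs; the volumes in the example are hypothetical.

```python
def space_savings(ratio: float) -> float:
    """Fraction of capacity saved by an N:1 deduplication ratio."""
    return 1 - 1 / ratio

def overall_ratio(categories: dict[str, tuple[float, float]]) -> float:
    """Blended ratio = total logical TB / total physical TB.
    Each value is (logical_tb, dedup_ratio) for one data category."""
    logical = sum(tb for tb, _ in categories.values())
    physical = sum(tb / ratio for tb, ratio in categories.values())
    return logical / physical

print(f"{space_savings(24):.1%}")  # 95.8%

# Hypothetical mix: equal logical volumes at 29:1 and 6:1
print(round(overall_ratio({"oracle": (100, 29), "exchange": (100, 6)}), 1))
# 9.9, not the 17.5 you get by averaging 29 and 6
```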
UPMC still has lots of LTO-4 tapes onsite and offsite from before the change-over, so the next phase planned is to implement "IP-based remote replication" between ProtecTIER gateways to a third data center at extended distance. The plan is to only replicate the backups of production data, and not replicate the backups of test/dev data.
Continuing my post-week coverage of the [Data Center 2010 conference], we had receptions on the Show floor. This started at the Monday evening reception and went on through a dessert reception Wednesday after lunch. I worked the IBM booth, and also walked around to make friends at other booths.
Here are my colleagues at the IBM booth. David Ayd, on the left, focuses on servers, everything from IBM System z mainframes, to POWER Systems that run IBM's AIX version of UNIX, and of course the System x servers for the x86 crowd. Greg Hintermeister, on the right, focuses on software, including IBM Systems Director and IBM Tivoli software. I covered all things storage, from disk to tape. For attendees that stopped by the booth expressing interest in IBM offerings, we gave out Starbucks gift cards for coffee, laptop bags, 4GB USB memory sticks and copies of my latest book: "Inside System Storage: Volume II".
Across the aisle were our cohorts from IBM Facilities and Data Center services. They had the big blue Portable Modular Data Center (PMDC). Last year, three vendors offered these: IBM, SGI, and HP. Apparently IBM won the smack-down: IBM returned victorious, while SGI only had the cooling portion of their "Ice Cube" and HP had no container whatsoever.
IBM's PMDC is fully insulated so that you can use it in cold weather below 50 degrees F like Alaska, to the hot climates up to 150 degrees F like Iraq or Afghanistan, and everything in between. They come in three lengths, 20, 40 and 53 feet, and can be combined and stacked as needed into bigger configurations. The systems include their own power generators, cooling, water chillers, fans, closed circuit surveillance, and fire suppression. Unlike the HP approach, IBM allows all the equipment to be serviced from inside the container.
This is Mary, one of the 200 employees seconded to the new VCE. Michael Capellas, the CEO of VCE, offered to give a hundred dollars to the [Boys and Girls Club of America], a charity we both support, if I agreed to take this picture. The Boys and Girls Club inspires and enables young people to realize their full potential as productive, responsible, and caring citizens, so it was for a good cause.
The show floor offers attendees a chance to see not just the major players in each space, but also all the new up-and-coming start-ups.
Mastering the art of stretching out a week-long event into two weeks' worth of blog posts, I continue my coverage of the [Data Center 2010 conference]. Tuesday afternoon I attended several sessions that focused on technologies for Cloud Computing.
(Note: It appears I need to repeat this. The analyst company that runs this event has kindly asked me not to mention their name on this blog, display any of their logos, mention the names of any of their employees, include photos of any of their analysts, include slides from their presentations, or quote verbatim any of their speech at this conference. This is all done to protect and respect their intellectual property that their members pay for. The pie charts included on this series of posts were rendered by Google Charting tool.)
Converging Storage and Network Fabrics
The analysts presented a set of alternative approaches to consolidating your SAN and LAN fabrics. Here were the choices discussed:
Fibre Channel over Ethernet (FCoE) - This requires 10GbE with Data Center Bridging (DCB) standards, what IBM refers to as Converged Enhanced Ethernet (CEE). Converged Network Adapters (CNAs) support FC, iSCSI, NFS and CIFS protocols on a single wire.
Internet SCSI (iSCSI) - This works on any flavor of Ethernet, is fully routable, and was developed in the 1990s by IBM and Cisco. Most 1GbE and all 10GbE Network Interface Cards (NIC) support TCP Offload Engine (TOE) and "boot from SAN" capability. Native support for iSCSI is widely available in most hypervisors and operating systems, including VMware and Windows. DCB Ethernet is not required for iSCSI, but can be helpful. Many customers keep their iSCSI traffic in a separate network (often referred to as an IP SAN) from the rest of their traditional LAN traffic.
Network Attached Storage (NAS) - NFS and CIFS have been around for a long time and work with any flavor of Ethernet. Like iSCSI, DCB is not required but can be helpful. NAS went from being used for files only, to email and databases, and is now viewed as the easiest deployment for VMware. VMotion is able to move VM guests from one host to another within the same LAN subnet.
Infiniband or PCI extenders - this approach allows many servers to share a smaller number of NICs and HBAs. While Infiniband was limited in distance with copper cables, recent advances now allow fiber optic cables for 150-meter distances.
Interactive poll of the audience offered some insight on plans to switch from FC/FICON to Ethernet-based storage:
Interactive poll of the audience offered some insight on what portion of storage is FCP/FICON-attached:
Interactive poll of the audience offered some insight on what portion of storage is Ethernet-attached:
Interactive poll of the audience offered some insight on what portion of servers are already using some Ethernet-attached storage:
Each vendor has its own style. HP provides homogeneous solutions, having acquired 3COM and broken off relations with Cisco. Cisco offers tight alliances over closed proprietary solutions, publicly partnering with both EMC and NetApp for storage. IBM offers loose alliances, with IBM-branded solutions from Brocade and BNT, as well as reselling arrangements with Cisco and Juniper. Oracle has focused on Infiniband instead for its appliances.
The analysts predict that IBM will be the first to deliver 40 GbE, from their BNT acquisition. They predict by 2014 that Ethernet approaches (NAS, iSCSI, FCoE) will be the core technology for all but the largest SANs, and that iSCSI and NAS will be more widespread than FCoE. As for cabling, the analysts recommend copper within the rack, but fiber optic between racks. Consider SAN management software, such as IBM Tivoli Storage Productivity Center.
The analysts felt that the biggest inhibitor to merging SANs and LANs will be organizational issues. SAN administrators consider LAN administrators "Cowboys", undisciplined and unwilling to focus on 24x7 operational availability, redundancy or business continuity. LAN administrators consider SAN administrators "Luddites", afraid of or unwilling to accept FCoE, iSCSI or NAS approaches.
Driving Innovation through Innovation
Mr. Shannon Poulin from Intel presented their advancements in Cloud Computing. Let's start with some facts and predictions:
There are over 2.5 billion photos on Facebook, which runs on 30,000 servers
30 billion videos viewed every month
Nearly all Internet-connected devices are either computers or phones
An additional billion people on the Internet
Cars, televisions, and households will also be connected to the Internet
The world will need 8x more network bandwidth, 12x more storage, and 20x more compute power
To avoid confusion between on-premise and off-premise deployments, Intel defines "private cloud" as "single tenant" and "public cloud" as "multi-tenant". Clouds should be automated, efficient, simple, secure, and interoperable enough to allow federation of resources across providers. He also felt that Clouds should be "client-aware", so that they know what devices they are talking to and optimize the results accordingly. For example, if you are watching video on a small 320x240 smartphone screen, it makes no sense for the Cloud server to push out 1080p. All devices are going through a connected/disconnected dichotomy: they can do some things while disconnected, but other things only while connected to the Internet or Cloud provider.
An internal Intel task force investigated what it would take to beat MIPS and IBM POWER processors and found that Intel's own chips lacked key functionality. Intel plans to address some of these shortcomings with a new chip called "Sandy Bridge" sometime next year. They also plan a series of specialized chips that support graphics processing (GPU), network processing (NPU) and so on. He also mentioned that Intel released "Tukwila", the latest version of the Itanium chip, earlier this year. HP is the last major company still using Itanium for its servers.
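A "client-aware" cloud can be as simple as the server choosing what to send based on what the device reports about itself. A minimal Python sketch, assuming the client reports its screen size and the server keeps several pre-encoded versions of each video; the catalog and function are hypothetical, not an Intel API.

```python
# Hypothetical catalog of pre-encoded versions of one video: (label, width, height)
RENDITIONS = [
    ("240p", 426, 240),
    ("360p", 640, 360),
    ("720p", 1280, 720),
    ("1080p", 1920, 1080),
]

def pick_rendition(screen_w: int, screen_h: int) -> str:
    """Smallest rendition that covers the reported screen; the largest
    available one if the screen is bigger than all of them."""
    by_size = sorted(RENDITIONS, key=lambda r: r[1] * r[2])
    for label, width, height in by_size:
        if width >= screen_w and height >= screen_h:
            return label
    return by_size[-1][0]

print(pick_rendition(320, 240))    # 240p, not 1080p
print(pick_rendition(1920, 1080))  # 1080p
```

Picking the smallest version that still fills the screen saves bandwidth on both ends without any visible loss to the viewer.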
Shannon wrapped up the talk with a discussion of two Cloud Computing initiatives. The first is [Intel® Cloud Builders], a cross-industry effort to build Cloud infrastructures based on the Intel Xeon chipset. The second is the [Open Data Center Alliance], comprised of leading global IT managers who are working together to define and promote data center requirements for the cloud and beyond.
The analysts feel that we need to switch from thinking about "boxes" (servers, storage, networks) to thinking about "resources". To this end, they envision a future datacenter where compute, memory, storage, and networking resources are treated as commodities and connected through an any-to-any fabric. They feel the current trend towards integrated system stacks is just a marketing ploy by vendors to fatten their wallets. (Ouch!)
A new concept, "disaggregation", caught my attention. When you make cookies, you disaggregate a cup of sugar from the sugar bag, a teaspoon of baking soda from the box, and so on. When you carve a LUN from a disk array, you are disaggregating the storage resources you need for a project. The analysts feel we should be able to do this with server and network resources as well, so that when you want to deploy a new workload, you just disaggregate the bits and pieces in the amounts you actually plan to use and combine them accordingly. IBM calls these combinations "ensembles" of Cloud computing.
Very few workloads require "best-of-breed" technologies, and this new fabric-based infrastructure recognizes that reality. One thing that IT Data Center operations can learn from Cloud Service Providers is their focus on "good enough" deployment.
This means, however, that IT professionals will need new skill sets. IT administrators will need to learn a bit of application development, systems integration, and runbook automation. Network admins need to enter 12-step programs to stop using Command Line Interfaces (CLI). Server admins need to put down their screwdrivers and focus instead on policy templates.
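As a rough illustration of disaggregation, the following Python sketch models commodity resource pools, carves out just the amounts a workload requests, and combines them into one allocation. `ResourcePool` and `compose_ensemble` are hypothetical names invented for this example; they do not correspond to any IBM ensemble product interface.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """A shared pool of one commodity resource (cores, GB of memory, GB of disk)."""
    name: str
    capacity: int
    allocated: int = 0

    def free(self) -> int:
        return self.capacity - self.allocated


def compose_ensemble(pools: dict, request: dict) -> dict:
    """Carve the requested amount from each pool and combine them into one allocation."""
    # Check every pool first, so a shortage in one pool leaves all pools untouched.
    for kind, amount in request.items():
        if amount > pools[kind].free():
            raise ValueError(f"not enough {kind}: requested {amount}, free {pools[kind].free()}")
    for kind, amount in request.items():
        pools[kind].allocated += amount
    return dict(request)


pools = {
    "cores": ResourcePool("cores", 64),
    "memory_gb": ResourcePool("memory_gb", 512),
    "storage_gb": ResourcePool("storage_gb", 10_000),
}
web_tier = compose_ensemble(pools, {"cores": 4, "memory_gb": 16, "storage_gb": 200})
print(web_tier)               # {'cores': 4, 'memory_gb': 16, 'storage_gb': 200}
print(pools["cores"].free())  # 60
```

Checking all pools before allocating anything keeps the sketch from leaving a half-built ensemble behind when one resource runs short.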
Whether you deploy private, public or hybrid cloud computing, the benefits are real and worth the changes needed in skill sets and organizational structure.
Continuing my coverage of the [Data Center 2010 conference], Tuesday afternoon I presented "Choosing the Right Storage for your Server Virtualization". In 2008 and 2009, I attended this conference as a blogger only, but this time I was also a presenter.
The conference asked vendors to condense their presentations down to 20 minutes. I am sure this was inspired by the popular 18-minute lectures from the [TED conference] or perhaps the [Pecha Kucha] night gatherings in Japan, where each presenter speaks while showing 20 slides for 20 seconds each. This forces the presenters to focus on their key points and not fill the time slot with unnecessary marketing fluff. It also gives more vendors a chance to pitch their point of view.
Continuing my coverage of the Data Center 2010 conference, Tuesday morning I attended several sessions. The first was a serious IT discussion with Mazen Rawashdeh, Technology Executive from eBay; the second was a lighthearted review of the benefits of Cloud Computing from humorist Dave Barry; and the third focused on re-architecting backup strategies.
eBay – How One Fast Growing Company is Solving its Infrastructure and Data Center Challenges
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change." -- Charles Darwin
So far, this has been the best session I have attended. eBay operates in 32 countries in seven languages, helping 90 million users to buy or sell 245 million items in 50,000 categories. Let's start with some statistics of the volume of traffic that eBay handles:
$2,000 traded every second
A cell phone sold every six seconds
A pair of shoes sold every nine seconds
A major appliance sold every minute
93 billion database actions every day
50 TB of data ingested daily
Code changes to the eBay application are rolled in every day
In 2007, eBay discovered a disturbing trend: infrastructure costs were growing linearly with business listing volume, which was an unsustainable model. Mazen Rawashdeh, eBay Marketplace Technology Operations, presented their strategy to break free from this problem. They want to double the number of listings without doubling their costs. They are two years into their four-year plan:
Switched from expensive 12U-high servers consuming 3 kilowatts to open-source software running on commodity 1U-2U server hardware. Mazen owns all the costs from the cement floor up to the web server.
Replaced team-optimized key performance indicators (KPIs) with a common KPI. The server team focused on transactions per minute, the storage team on utilization, and the network team on MB/sec bandwidth. The problem is that a change that optimizes one team's metric can hurt the others. The new KPI, "Watts per listing", allowed all teams to focus on a common goal.
Focused on changing the corporate culture to communicate clear, measurable goals, so that everyone understands the why and how of this new KPI. You have to spend money to save money in the long run, so consider costs at least 36 months out.
Changed from purchasing servers and depreciating them over three years to a lease model with a tech refresh that replaces servers every 18 months. It is a bad idea to keep IT equipment after full depreciation, as the energy savings alone on new equipment easily justify an 18-month replacement cycle.
Adopted storage tiers. Storage is purchased, not leased, because it is more difficult to swap out disk arrays. They have 10-40 PB of disk. They do not use traditional backup, but instead use disk replication across distant locations. They are quick to delete or archive data that does not belong on their production systems.
Their results so far? They have reduced the Watts per listing by 70 percent over the past two years. They were able to double their volume with a relatively flat IT budget.
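The value of a shared KPI is easiest to see with numbers. This Python sketch uses made-up wattages, not eBay's figures. It evaluates a hypothetical change that saves storage power but adds more server power, and shows that the common "Watts per listing" metric exposes it as a net loss, while doubling listings on flat power halves the metric.

```python
def watts_per_listing(team_watts: dict, listings: int) -> float:
    """Common KPI: total power drawn by all teams' equipment divided by active listings."""
    return sum(team_watts.values()) / listings


listings = 100_000_000
# Hypothetical power draw in watts, for illustration only.
baseline = {"servers": 2_000_000, "storage": 800_000, "network": 200_000}
# A change that saves 100 kW of storage power but costs 150 kW of extra server power.
proposal = {"servers": 2_150_000, "storage": 700_000, "network": 200_000}

print(watts_per_listing(baseline, listings))      # 0.03
print(watts_per_listing(proposal, listings))      # 0.0305, worse overall
print(watts_per_listing(baseline, 2 * listings))  # 0.015, double the listings on flat power
```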
The Wit and Wisdom of Dave Barry, Humorist and Author
Dave Barry is a humorist and author. For 25 years he was a syndicated columnist whose work appeared in more than 500 newspapers in the United States and abroad, including the [Funny Times] that I subscribe to. In 1988 he won the Pulitzer Prize for Commentary. Dave has also written a total of 30 books, two of which were used as the basis for the CBS TV sitcom "Dave's World," in which Harry Anderson played a much taller version of Dave.
I first met Dave about ten years ago at a SHARE conference in Minneapolis, MN. It was good to see him again.
Backup and Beyond
The analyst covered the "Three C's" of backup: cost, capability and complexity. There are many ways to implement backup, and he predicts that by 2014, 30 percent of all companies will re-evaluate and re-architect their backup strategy, or at least change their backup software, to address these three issues. Another survey indicates that 43 percent of companies cite backup as the primary reason they are investigating public cloud service providers.
The top three primary backup software vendors for the audience were Symantec, IBM, and Commvault. An interactive poll of the audience offered some insight:
There appears to be a shift away from using disk to emulate tape (Virtual Tape Library) toward using direct disk interfaces instead.
Some of the recommended actions were:
Exploit backup software features. On average, people keep 11 versions of each backup; try cutting this down to four versions. IBM Tivoli Storage Manager allows this to be done via management class policies.
Implement a separate archive. Moving data that has been archived and backed up off the production systems reduces their backup load. Any chance to back up semi-static data less frequently will help.
Switch to capacity-based pricing, which allows more flexibility in the server options used to run backup software.
Implement data deduplication and compression, such as the IBM ProtecTIER data deduplication solution.
Consider a tiered recovery approach, where less critical applications have less backup protection. Many keep 1-2 years of backups, but 90 percent of all recoveries are for backups from the most recent 27 days. Reduce backup retention to 90 days.
Consider adopting a "Unified Recovery Management" strategy that protects laptops and desktops, remote and branch offices, and mission-critical applications, and that provides for business continuity and disaster recovery.
Regularly test your recovery to validate your procedures and your assumptions about recoverability.
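The version and retention recommendations above can be combined into one simple pruning rule. This Python sketch is a simplified model, not Tivoli Storage Manager's policy engine: it keeps at most four of the newest backup versions and drops any older than 90 days, but always keeps the newest backup so that something remains restorable. The function name and defaults are assumptions for illustration.

```python
from datetime import date, timedelta


def versions_to_keep(backup_dates, max_versions=4, retain_days=90, today=None):
    """Return the backup dates to keep, newest first.

    Keeps up to `max_versions` of the newest backups and drops those older
    than `retain_days`, except the single newest backup, which is always kept.
    """
    if today is None:
        today = date.today()
    cutoff = today - timedelta(days=retain_days)
    newest_first = sorted(backup_dates, reverse=True)
    keep = newest_first[:1]  # always keep the most recent version
    keep += [d for d in newest_first[1:max_versions] if d >= cutoff]
    return keep


today = date(2010, 12, 7)
# Eleven versions, taken every 10 days.
eleven = [today - timedelta(days=n) for n in range(0, 110, 10)]
print(len(versions_to_keep(eleven, today=today)))  # 4
```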
While the conference is divided into seven major tracks, it quickly becomes obvious that many of these IT datacenter issues overlap, and that approaches and decisions in one area can easily impact other areas.
Continuing my coverage of the Data Center 2010 conference, Monday afternoon included presentations from IBM executives.
Blueprint for a Smart data center
Steve Sams, IBM Vice President, Global Site and Facilities Services, is well known at this conference. In charge of designing and building data center facilities for IBM and its clients, he has lots of experience in various datacenter configurations.
The presentation was an update of last year's [Data Center Cost Saving Actions Your CFO Will Love]. Seventy cents of every IT dollar is spent just keeping the existing systems running, leaving only 30 cents to handle growth and business transformation. Over 70 percent of datacenters are more than seven years old and may not be designed to handle the density of today's IT equipment.
Many companies wanting to virtualize are stalled. IBM's Server Virtualization Analytics services can help cut this transformation time in half, with an ROI of only 6-18 months for complex Wintel environments. This is just one of the 17 end-to-end datacenter analytics tools IBM offers. The results have been 220 percent more VM instances per admin FTE than traditional deployments. IBM drinks its own champagne, having saved over $4 Billion USD in its own datacenter consolidation and virtualization projects.
Want to Cut the Cost of Storage in Half? Here’s How
The speaker of this session started out with a startling prediction: on a PB basis, the amount of storage purchased in the five years 2010-2014 will be 25x what was purchased in 2009. Most attempts to stem this capacity growth have failed. Therefore, efforts to cut storage costs need to focus elsewhere.
The first concern is poor utilization. Utilization averages 10 percent on DAS and 40-50 percent on SANs. Thin provisioning can raise this to 60-75 percent. Thin provisioning was first introduced for mainframe storage in the 1990s by StorageTek, which IBM resold as the IBM RAMAC Virtual Array (RVA), but many credit 3PAR with porting it over to distributed operating systems in 2002. Other options include data deduplication and compression to reduce the cost of storing data on disk.
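Here is a toy Python model of thin provisioning, assuming 1 GB extents and hypothetical `Pool` and `ThinVolume` classes. Each volume advertises its full virtual size to the host, but physical capacity is taken from the shared pool only when an extent is first written, which is why utilization of the physical pool rises.

```python
class Pool:
    """Shared physical capacity, allocated in 1 GB extents."""

    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.used_gb = 0

    def allocate_extent(self) -> None:
        if self.used_gb >= self.physical_gb:
            raise RuntimeError("physical pool exhausted")
        self.used_gb += 1


class ThinVolume:
    """Advertises `virtual_gb` to the host but maps extents only on first write."""

    def __init__(self, pool: Pool, virtual_gb: int):
        self.pool = pool
        self.virtual_gb = virtual_gb
        self.mapped = set()  # virtual extent numbers backed by physical space

    def write(self, extent: int) -> None:
        if not 0 <= extent < self.virtual_gb:
            raise IndexError("extent beyond virtual size")
        if extent not in self.mapped:  # rewrites reuse the existing mapping
            self.pool.allocate_extent()
            self.mapped.add(extent)


pool = Pool(physical_gb=100)
volumes = [ThinVolume(pool, virtual_gb=100) for _ in range(3)]  # 300 GB promised
for vol in volumes:
    for extent in range(30):  # each host actually writes 30 GB
        vol.write(extent)

print(f"{pool.used_gb / pool.physical_gb:.0%} of physical capacity used")  # 90%
```

Fully provisioning the same three volumes would have required 300 GB of disk to hold the same 90 GB of data, or 30 percent utilization. The trade-off is that an over-committed pool can run out of space if the hosts fill their volumes, so it needs monitoring.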
The second approach is storage tiering. In this case, the speaker felt SATA was 3x cheaper ($/GB) but could also deliver 3x lower performance. Moving data from faster FC/SAS 10K and 15K RPM drives to slower 7200 RPM drives can offer some cost reductions.
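A quick back-of-the-envelope calculation shows why moving colder data down a tier saves money. This Python sketch assumes, as the speaker did, that SATA is about 3x cheaper per GB; the $3/GB figure for fast drives and the 60 percent split are made up for illustration.

```python
def blended_cost_per_gb(fast_cost_per_gb: float, fraction_on_sata: float,
                        sata_price_ratio: float = 3.0) -> float:
    """Average $/GB when `fraction_on_sata` of the data sits on drives
    that are `sata_price_ratio` times cheaper than the fast tier."""
    sata_cost_per_gb = fast_cost_per_gb / sata_price_ratio
    return (1 - fraction_on_sata) * fast_cost_per_gb + fraction_on_sata * sata_cost_per_gb


all_fast = blended_cost_per_gb(3.00, 0.0)
tiered = blended_cost_per_gb(3.00, 0.6)
# prints: $3.00/GB vs $1.80/GB (40% saving)
print(f"${all_fast:.2f}/GB vs ${tiered:.2f}/GB ({1 - tiered / all_fast:.0%} saving)")
```

The speaker's caution still applies: that saving only holds if the data moved to SATA can live with roughly 3x lower performance.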
Implementing "quotas" in email, file systems or other applications is one of the worst financial decisions an IT department can make, as it merely shifts the storage management from experts (IT staff) to non-experts (end users).
The speaker recommended using archive instead. Keeping backup tapes long-term is not archiving; backups should be no more than eight weeks old.
Interactive polls of the audience gave some interesting insight:
When asked about expected storage capacity "compound annual growth rate" (CAGR) for the next few years, 26 percent estimated 35-50 percent CAGR, 30 percent estimated 50-75 percent CAGR, and 15 percent estimated greater than 75 percent CAGR.
For thin provisioning, 43 percent of the audience are already using it, and 33 percent plan to next year.
Similarly, 41 percent of the audience are using data deduplication for their primary data, and 30 percent plan to next year.
For automated tiering that moves portions of data automatically between fast and slow tiers of storage to optimize performance, like IBM's Easy Tier, 20 percent are already using it, and 44 percent plan to next year.
41 percent already have some archiving for file systems, and 17 percent plan to next year.
Only 6 percent have an all-disk backup/replication environment, but 20 percent plan to adopt this next year.
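Compound growth adds up quickly. This Python sketch projects capacity five years out for the CAGR ranges from the audience poll, starting from a hypothetical 100 TB today.

```python
def projected_capacity(current_tb: float, cagr: float, years: int) -> float:
    """Capacity after `years` of compound annual growth at rate `cagr` (0.5 = 50%)."""
    return current_tb * (1 + cagr) ** years


for cagr in (0.35, 0.50, 0.75):
    print(f"{cagr:.0%} CAGR: {projected_capacity(100, cagr, 5):,.0f} TB after 5 years")
# 35% CAGR: 448 TB after 5 years
# 50% CAGR: 759 TB after 5 years
# 75% CAGR: 1,641 TB after 5 years
```

Even the middle estimate means capacity grows more than 7x in five years.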
The downside of trying to squeeze out costs with these approaches and technologies is that they can have a negative impact on performance. The speaker suggested a balanced approach of adding lower-cost storage to existing fast storage to meet both capacity and performance requirements.
Smarter Infrastructures Deliver Better Economics
Elaine Lennox, IBM Vice President and Business Line Executive for System Software, presented the "3 D's" of a Smarter Infrastructure: design, data and delivery.
Design: new technologies and approaches are forcing people to reconsider the design of their applications, their infrastructure and their facilities.
Data: on average, companies store 17 copies of the same piece of production data. Data needs to be managed better in the future.
Delivery: new types of cloud computing are changing the way IT services can be delivered, and how they are consumed by end users.
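To see how storing many copies of the same data can be avoided, here is a minimal Python sketch of generic hash-based deduplication: data is split into fixed-size chunks, each chunk is identified by its SHA-256 digest, and identical chunks are stored only once. This is a textbook technique chosen for illustration, not a description of how IBM ProtecTIER or any other product works, and `DedupStore` is a hypothetical name.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical fixed-size chunks are kept once."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only the first copy
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
production = bytes((i * 7 + i // 256) % 256 for i in range(16_384))  # 16 KB sample
for n in range(17):  # 17 copies of the same production data
    store.put(f"copy{n}", production)

print(17 * len(production), "logical bytes,", store.stored_bytes(), "stored")
# 278528 logical bytes, 16384 stored
```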
Roadmap to Enterprise Cloud Computing
This was a combined vendor/customer presentation. Rex Wang from Oracle presented an overview of Oracle's service and product offerings, and then Jonathan Levine, COO of LinkShare, presented his experiences deploying Oracle Exadata.
Rex presented Oracle's "Cloud maturity model" that has its customers go through the following steps:
Silo: each application on its own stack of software, server and storage.
Grid: virtualization for shared infrastructure and platforms (internal IaaS and PaaS).
Private cloud: self-service, policy-based management, metered chargeback and capacity planning.
Hybrid Cloud: workloads portable between private and public clouds, offering federation, cloud bursting, and interoperability.
Rex felt the standard "Buy vs. Rent" argument in the business world applies to IT as well, and that a long-term TCO analysis can reveal break-even points that favor one over the other. He cited internal research showing that 28 percent of Oracle customers have an internal or private cloud, and 14 percent use public cloud. 25 percent use Application PaaS, 21 percent database PaaS, 5 percent Identity management PaaS, 10 percent Compute IaaS, 18 percent storage IaaS, and 15 percent Test/Dev IaaS.
Rex felt that in all the hype around taking a single host and dividing it into multiple VMs, people have forgotten that the opposite approach, combining multiple instances into clusters, is also important. He also felt you have to look at the entire "Application Lifecycle", which goes as follows:
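The "Buy vs Rent" break-even Rex described can be found with a simple cumulative-cost comparison. This Python sketch uses invented dollar figures and ignores factors a real TCO analysis would include (depreciation, cost of capital, staffing changes); it only shows the shape of the calculation.

```python
def break_even_month(purchase_cost: int, monthly_own_cost: int,
                     monthly_rent: int, horizon_months: int = 60):
    """First month in which owning has cost no more in total than renting,
    or None if that never happens within the horizon."""
    for month in range(1, horizon_months + 1):
        own_total = purchase_cost + monthly_own_cost * month
        rent_total = monthly_rent * month
        if own_total <= rent_total:
            return month
    return None


# Hypothetical figures in dollars.
print(break_even_month(100_000, 2_000, 6_000))  # 25: owning wins after about two years
print(break_even_month(100_000, 2_000, 2_500))  # None: renting stays cheaper for 5 years
```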
IT sets up the equipment as an internal PaaS or IaaS
Developers write the application
End users are trained and use the application
Application owners manage and monitor the application
IT meters the usage and does chargeback to each application owner
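The last step of that lifecycle, metering and chargeback, boils down to multiplying metered usage by internal rates and totaling by owner. This Python sketch assumes hypothetical resource names and rates, expressed in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict

# Hypothetical internal rates, in cents per unit.
RATES_CENTS = {"cpu_hours": 12, "storage_gb_month": 10}


def chargeback(usage_records):
    """Total charges in cents per application owner.

    `usage_records` is an iterable of (owner, resource, quantity) tuples.
    """
    bills = defaultdict(int)
    for owner, resource, quantity in usage_records:
        bills[owner] += RATES_CENTS[resource] * quantity
    return dict(bills)


usage = [
    ("payroll", "cpu_hours", 1_000),
    ("payroll", "storage_gb_month", 500),
    ("web_store", "cpu_hours", 250),
]
print(chargeback(usage))  # {'payroll': 17000, 'web_store': 3000}
```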
Oracle's Exadata and Exalogic compete directly against IBM's Smart Analytics System, IBM CloudBurst, and IBM Smart Business Storage Cloud.
Next up was Jonathan Levine, COO of [LinkShare], a subsidiary of Rakuten in Japan. This is an [Affiliate Marketing] company. Instead of pay-per-view or pay-per-click web advertising, this company only gets paid when the "end user" actually buys something after clicking on a web advertisement.
The business runs on an 8 TB data warehouse and a 1 TB OLTP database, ingesting 50 GB daily and processing 400 million transactions per day at 8.5 GB/sec throughput.
They discovered that Oracle Exadata did not work right out of the box. In fact, it took them about a year to get it working, roughly the same amount of time as their last Oracle 10 to Oracle 11 conversion.
Part of their business allows advertisers and web content publishers to generate reports on activity. Jonathan indicated that if a response takes longer than 5 seconds, it might as well take an hour. He called this the "Excel" rule: results need to come back as fast as a Microsoft Excel pivot table on a local PC.
With the new Exadata, they met this requirement. Over 84 percent of their transactions complete in under 2 seconds, 9 percent take 2-4 seconds, and another 4 percent fall in the 4-8 second range. As the winter holiday season approaches, they hope to handle 2-3x more traffic without negatively impacting this response time.
Continuing my coverage of the Data Center 2010 conference, on Monday I attended four keynote sessions.
The first keynote speaker started out with an [English proverb]: Turbulent waters make for skillful mariners.
He covered the state of the global economy and how CIOs should address the challenge. We are on the flat end of an "L-shaped" recovery in the United States. GDP growth is expected to be 4.7 percent in Latin America, but only 2.3 percent in North America and 1.5 percent in Europe. Top growth areas include India at 8.0 percent and China at 8.6 percent, with average growth of 4.7 percent for the entire Asia Pacific region.
On the technical side, the top technologies that CIOs are pursuing for 2011 are Cloud Computing, Virtualization, Mobility, and Business Intelligence/Analytics. He asked the audience if the "Stack Wars" for integrated systems are hurting or helping innovation in these areas.
Move over "conflict diamonds", companies now need to worry about [conflict minerals].
He proposed an alternative approach called Fabric-Based Infrastructure. In this new model, a shared pool of servers is connected to a shared pool of storage over an any-to-any network. In this approach, IT staff spend all of their time just stocking up the vending machine, allowing end-users to get the resources they need.
Crucial Trends You Need to Watch
The second speaker covered ten trends to watch, but these were not limited to just technology trends.
Virtualization is just beginning - even though IBM has had server virtualization since 1967 and storage virtualization since 1974, the speaker felt that adoption of virtualization is still in its infancy. Ten years ago, average CPU utilization for x86 servers was only 5-7 percent. Thanks to server virtualization like VMware and Hyper-V, companies have increased this to 25 percent, but many virtualization projects have stalled.
Big Data is the elephant in the room - storage capacity is expected to grow 800 percent over the next five years.
Green IT - Datacenters consume 40 to 100 times more energy than the offices they support. Six months ago, Energy Star announced [standards for datacenters] and energy efficiency initiatives.
Unified Communications - Voice over IP (VoIP) technologies, collaboration through email and instant messages, and a focus on mobile smartphones and other devices combine many overlapping areas of communication.
Staff retention and retraining - According to US Labor statistics, the average worker will have 10 to 14 different jobs by the time they reach 38 years of age. People need to broaden their scope and not be so vertically focused on specific areas.
Social Networks and Web 2.0 - the keynote speaker feels this is happening, and companies that try to restrict usage at work are fighting an uphill battle. Better to get ready for it and adopt appropriate policies.
Legacy Migrations - companies are stuck on old technology like Microsoft Windows XP, Internet Explorer 6, and older levels of Office applications. Time is running out, but migrating to later releases, or to alternatives like Red Hat Linux with the Firefox browser, is not a trivial task.
Compute Density - Moore's Law, which says compute capability will double every 18 months, is still going strong. We are now getting more cores per socket, forcing applications to be rewritten for parallel processing or to use virtualization technologies.
Cloud Computing - every session this week will mention Cloud Computing.
Converged Fabrics - some new approaches are taking shape for datacenter design. Fabric-based infrastructure would benefit from converging SAN and LAN fabrics to allow pools of servers to communicate freely to pools of storage.
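The 18-month doubling rule quoted in the Compute Density trend compounds quickly. This small Python sketch converts elapsed time into the implied growth factor; the 18-month period is the figure from the talk, and real-world doubling rates vary.

```python
def growth_factor(months: float, doubling_period_months: float = 18) -> float:
    """Implied multiple of compute capability after `months`, if it doubles every period."""
    return 2 ** (months / doubling_period_months)


for years in (3, 6, 15):
    print(f"{years} years: {growth_factor(years * 12):,.0f}x")
# 3 years: 4x
# 6 years: 16x
# 15 years: 1,024x
```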
He sprinkled fun factoids about our world to keep things entertaining.
50 percent of today's 21-year-olds have produced content for the web. 70 percent of four-year-olds have used a computer. The average teenager writes 2,282 text messages on their cell phone per month.
This year, Google averaged 31 billion searches per month, compared with 2.6 billion searches per month in 2007.
More video has been uploaded to YouTube in the last two months than the three major US networks (ABC, NBC, CBS) have aired since 1948.
Wikipedia averages 4300 new articles per day, and now has over 13 million articles.
This year, Facebook reached 500 million users. If it were a country, it would be ranked third. Twitter would be ranked 7th, with 69% of their growth being from people 32-50 years old.
In 1997, a GB of flash memory cost nearly $8000 to manufacture; today it costs only $1.25.
The computer in today's cell phone is a million times cheaper, and a thousand times more powerful, than a single computer installed at MIT back in 1965. In 25 years, the compute capacity of today's cell phones could fit inside a blood cell.
See [interview of Ray Kurzweil] on the Singularity for more details.
The Virtualization Scenario: 2010 to 2015
The third keynote covered virtualization. While server virtualization has helped reduce server costs, as well as power and cooling energy consumption, it has had a negative effect on other areas. Companies that have adopted server virtualization have discovered increased costs for storage, software and test/development efforts.
The result is a gap between expectations and reality. Many virtualization projects have stalled because of a lack of long-term planning. The analysts recommend deploying virtualization in stages: tackle the first third, the so-called "low-hanging fruit", then proceed with the next third, and then wait and evaluate the results before completing the last third, the most difficult applications.
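The staged approach can be sketched as a simple planning step: rank applications by how hard they are to virtualize and split the list into three waves. This Python sketch assumes each application already has a hypothetical difficulty score; in practice, producing that score is the hard part.

```python
def plan_waves(apps, waves: int = 3):
    """Split (name, difficulty) pairs into `waves` groups, easiest first.

    The caller is expected to pause and evaluate results after the second
    wave, before starting the last and hardest group.
    """
    if not apps:
        return []
    ordered = sorted(apps, key=lambda app: app[1])
    size = -(-len(ordered) // waves)  # ceiling division
    return [[name for name, _ in ordered[i:i + size]]
            for i in range(0, len(ordered), size)]


apps = [("crm", 5), ("file", 1), ("erp", 9), ("web", 3), ("db", 7), ("print", 2)]
print(plan_waves(apps))  # [['file', 'print'], ['web', 'crm'], ['db', 'erp']]
```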
Virtualization of storage and desktop clients are completely different projects than server virtualization and should be handled accordingly.
Cloud Computing: Riding the Storm Out
The fourth keynote focused on the pros and cons of Cloud Computing. It started by defining the five key attributes of Cloud: self-service, scalable elasticity, a shared pool of resources, metered pay-per-use, and delivery over open-standard networking technologies.
In addition to IaaS, PaaS and SaaS classifications, the keynote speaker mentioned a fourth one: Business Process as a Service (BPaaS), such as processing Payroll or printing invoices.
While the debate rages over the relative benefits of private and public cloud approaches, the keynote speaker brought up the opportunities for hybrid and community clouds. In fact, he felt there is a business model for a "cloud broker" that acts as a go-between for companies and cloud service providers.
A poll of the audience found the top concerns inhibiting cloud adoption were security, privacy, regulatory compliance and immaturity. Some 66 percent indicated they plan to spend more on private cloud in 2011, and 20 percent plan to spend more on public cloud options. He suggested six focus areas:
Test and Development
Prototyping / Proof-of-Concept efforts
Web Application serving
SaaS like email and business analytics
Select workloads that lend themselves to parallelization
The session wrapped up with some stunning results reported by companies. Server provisioning accomplished in 3-5 minutes instead of 7-12 weeks. Reduced cost of email by 70 percent. Four-hour batch jobs now completed in 20 minutes. 50 percent increase in compute capacity with flat IT budget. With these kinds of results, the speaker suggested that CIOs should at least start experimenting with cloud technologies and start to profile their workloads and IT services to develop a strategy.
That was just Monday morning; this is going to be an interesting week!
This week I am in beautiful Las Vegas for the Data Center 2010 Conference. While the conference officially starts Monday, I arrived on Sunday to help set up the IBM Booth (Booth "Z").
(Note: This is my third year attending this conference. IBM is a platinum sponsor for this event. The analyst company that runs this event has kindly asked me not to mention their name on this blog, display any of their logos, mention the names of any of their employees, include photos of any of their analysts, include slides from their presentations, or quote verbatim any of their speech at this conference. This is all done to protect and respect their intellectual property that their members pay for. This is all documented in a lengthy document in case I forget. So, if the picture of the conference backpack appears lopped off at the top, this was done intentionally to comply with their request. The list of sponsors at this event represents a "who's who" of the IT industry.)
The pre-conference orientation is for people who are first-timers, or for those who have not attended this conference in a while. The conference includes 7 keynote presentations and 68 sessions organized into seven "tracks" plus one "virtual track" which crosses the other seven:
Servers and Operating Systems
Cost Optimization "Virtual Track"
Each session is further classified as foundational versus advanced, business versus technical, and practical versus strategic.
The speaker also presented some unique methodologies that will be used this week, including "Magic Quadrant", "MarketScope", "Hype Cycle" and "IT Market Clock" which provide graphical representation to help attendees better understand the conference materials.
The Welcome Reception was sponsored by VCE, formerly known as Acadia, the coalition comprised of VMware, Intel, Cisco and EMC. I joked that this should be "VICE" so that Intel does not feel left out.
While we enjoyed drinks and snacks, we listened to live music from the all-violin band [Phat Strad].
The CEO of VCE, Michael Capellas, recognized me from across the room and came over to ask me how IBM was doing. We had a nice friendly chat about the IT industry and the economy.
Next week, once again, I will be blogging from beautiful Caesars Palace hotel in Las Vegas, Nevada to report on what I see and hear at the 29th annual Data Center Conference. Here are my posts from 12 months ago when I attended this conference in 2009:
Again we will have a Solutions Showcase with a Portable Modular Data Center (PMDC) and various exhibits. I will be staffing the booths, so stop on by. Plus, on Tuesday, I will be speaking! My topic will be "Choosing the right storage for your server virtualization environment."
Those of you on twitter can follow me at [@az990tony] and hash tag #LSC29. I will be available for one-on-one consultation sessions. I am arriving Sunday morning, December 5, and staying through Thursday afternoon, December 9.
Just in time for [Cyber Monday], Volume II of my "Inside System Storage" book series is now available. As I mentioned in my post on the [October 7th Launch announcement], I finally got past all the internal restrictions that prevented this volume from being published earlier.
My first book covered my initial 12 months of blogging experience, from September 2006 to August 2007. This book covers the history of my career transition from software engineer developer to marketing strategist.
My second book covers the next 8 months, from September 2007 to April 2008, spanning the acquisitions of XIV and Diligent companies that were part of an overall strategic re-alignment of storage within the broader "Systems and Technology Group" of IBM.
The books come in a variety of formats, including hardcover with dust jacket, paperback, and online eBook (PDF). My publisher, Lulu, now supports ePub format, so I am investigating the time and effort required to build this format from the source files.
The client that bought these dozen IBM System Storage DS8800 disk systems also bought three DS8700 systems.
Governor's Celebration of Innovation [GCOI] is an annual awards gala, with attendees who include technologists, corporate executives, entrepreneurs, investors, and policymakers. Last week, IBM was awarded "Innovator of the Year" in the Large Company category for the Easy Tier feature of the IBM DS8700, which allows optimal use of Solid-State Drives through automatic sub-LUN movement of data. IBM's Linear Tape File System (LTFS) was also a finalist under consideration.
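The idea behind sub-LUN automatic movement can be illustrated with a heat-based placement rule: track I/O per extent and keep only the hottest extents on solid-state drives. This Python sketch is a deliberate simplification, not the actual Easy Tier algorithm, and the extent counts are invented.

```python
def place_on_ssd(extent_io_counts: dict, ssd_extents: int) -> set:
    """Return IDs of the `ssd_extents` extents with the most I/O.

    Everything else stays on (or moves back to) spinning disk.
    """
    hottest = sorted(extent_io_counts, key=extent_io_counts.get, reverse=True)
    return set(hottest[:ssd_extents])


# Hypothetical I/O counts per extent of one LUN over a monitoring interval.
io_counts = {0: 50, 1: 9_000, 2: 120, 3: 7_000, 4: 5}
print(sorted(place_on_ssd(io_counts, ssd_extents=2)))  # [1, 3]
```

In this toy example only the two busy extents move to SSD rather than the whole LUN, so a small amount of SSD capacity serves most of the I/O.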
The award was presented to Cindy Grossman, IBM VP and Senior Location Executive for Tucson. Joining her were Dr. Krishna Nathan, Ed Childers, Glen Jaquette, Vincent Hsu, Rick Krebsbach, Gene Leo, Denise Lopez, Hironobu Nagura, Calline Sanchez, Johnny Smith, and Dr. Cheng-Chung Song.
This week, IBM launched the new [IBM Expert Network] that provides presentation materials from subject matter experts. I am honored to be one of the 20-plus experts selected for PRO accounts on SlideShare.Net to help seed this with initial materials.
I have a bit of behind-the-scenes history to share on this. Back in 2008, I first discovered SlideShare.net as an excellent resource to get ideas for presentations. Much like YouTube is for videos and Flickr is for photos, SlideShare.Net is for presentations. In my June 2008 post, [Summer Jobs and the Singularity], I embedded someone's presentation from SlideShare.
This latter one got me in a bit of trouble internally. Neither presentation had anything secret or controversial, so I didn't see the issue. Several other bloggers had asked how I got "permission" to use an external Software-as-a-Service (SaaS) like SlideShare.net for my blog. I never asked for permission! I explained that since IBM's internal Lotus Connections software we use for blogging did not have a feature to embed PowerPoint (PPT) or Open Document Format (ODP) presentations, I chose an external service instead. Yes, I guess I could have converted each page to a JPG or PNG graphic instead, or I could have put the PDF on an FTP download area of the "Files" feature of Lotus Connections, but I chose SlideShare.net instead.
The result? IBM communications decided to make an official list; actually, three lists: a "white list" of services that we are allowed to use, a "grey list" of services under evaluation or negotiation, and a "black list" of services we are not allowed to use. Sadly, SlideShare.Net was on the black list. I protested, arguing that unless IBM offered something to replace it, it should re-evaluate this external service. I got it back on the "grey list", and now, this week, it is officially on the "white list".
Of course, this probably involved negotiation on EULA terms and conditions, but I am not a lawyer and have no idea what went on behind closed doors to make this happen. I am just glad it did.
On Wikibon, David Floyer has an article titled [SAS Drives Tier 1 to New Levels of Green] that focuses on the energy efficiency benefits of newer Serial Attached SCSI (SAS) drives over older Fibre Channel (FC) drives. This makes sense, as R&D budgets have been spent on making newer technologies more "green".
Of course, people might consider this an [apples-to-oranges] comparison. Not only are we changing from FC to SAS technology, we are also changing from 3.5-inch drives to small form factor (SFF) 2.5-inch drives. It seems odd to specify 2000 drives when only two of the five systems scale up to that level. Few systems in production, from any vendor, have more than 1000 drives, so a 1000-drive configuration would have seemed a fairer comparison.
However, Hu's conclusion that the combination of SAS and SFF provides better performance and energy efficiency for both IBM DS8800 and HDS VSP than FC-based alternatives from any vendor seems reasonably supported by the data.
Meanwhile, fellow blogger David Merrill (HDS) pokes fun at IBM DS8800 in Figure 2 in his post [Winner o’ the green]. This second comparison was for 4PB of raw capacity, which 4 of the 5 can handle easily using 2TB SATA drives, but the DS8800 is based on SAS technology and does not support 2TB SATA drives. A performance-oriented configuration with four distinct DS8800 boxes employing 600GB SAS drives is used instead, causing the data for the DS8800 to stick out like a sore thumb, or perhaps more intentionally as a middle finger.
The main take-away here is that IBM offers both the DS8700 for capacity-optimized workloads, and the DS8800 for performance-optimized workloads. Some competitors may have been spreading FUD that the DS8700 was withdrawn last month; it wasn't. As you can see from the data presented, there are times when a DS8700 might be preferable to a DS8800, depending on the type of workloads you plan to deploy. IBM offers both, and will continue to support existing DS8700 and DS8800 units in the field for many years to come.
This year marks the 10-year anniversary of IBM's introduction of LTO tape technology. IBM is a member of the Linear Tape-Open consortium, which consists of IBM, HP and Quantum, referred to as "Technology Provider Companies" or TPCs. In an earlier job role, I was the "portfolio manager" for both LTO and Enterprise tape product lines.
Today, we held a celebration in Tucson, with cake and refreshments.
IBM executives Doug Balog, IBM VP of Storage Platform, and Sanjay Tripathi, the new IBM Director and Business Line Executive for Tape, VTL and Archive systems, presented the successes of LTO tape over the past 10 years.
To date, over 3.5 million LTO tape drives and over 150 million LTO tape media cartridges have been shipped, a testament to the remarkable marketplace acceptance of the technology.
In honor of this event, I decided to interview Bruce Master, IBM Senior Program Manager for Data Protection Systems, about this 10-year anniversary.
10 years of LTO technology is a great milestone. How is this especially significant to IBM and its clients?
According to IDC data, IBM has held the #1 position in market share for total worldwide branded tape revenue for over 7 years, and IBM is still #1 in branded midrange tape revenue, which includes the LTO tape technologies. IBM was the first drive manufacturer to deliver LTO-1 drives, back in September 2000, and the first to deliver tape drive encryption to the marketplace, on LTO-4 drives, and it is now shipping LTO generation 5 drives and libraries. IBM is the author of the new Linear Tape File System (LTFS) specification that has been adopted by the TPCs. This file system revolutionizes how tape can be used, as if it were a giant 1.5 terabyte removable USB memory stick that can be accessed with directory tree structures and drag-and-drop functionality. With LTO's built-in real-time compression, a single tape cartridge can hold up to 3TB of data, assuming a 2:1 compression ratio.
The Linear Tape File System has been getting a lot of attention. Where can we learn more about it?
Why is tape still a critical part of a storage infrastructure?
Tape is low-cost and provides critical off-line portable storage to help protect data from attacks that can occur with on-line data. For instance, on-line data is at risk of attack from a virus, hacker, system error, disgruntled employee, and more. Because tape is off-line and not accessible by the system, it protects against these forms of corruption. LTO technology also provides write-once read-many (WORM) tape media to help address compliance regulations that specify non-erasable, non-rewriteable (NENR) storage, hardware encryption to secure data, and low-cost, long-term archive media. When data cools off, or becomes infrequently accessed, why keep it on spinning disk? Move it to tape, where it is much greener and lower cost. A tape in a slot on a shelf consumes minimal energy.
So tape is not dead?
Ha! Far from it. Seems like disk-only "specialty shop" storage vendors that don’t have tape in their sales portfolio are the ones that propagate that myth. In reality, storage managers are tasked with meeting complex objectives for performance, compliance, security, data protection, archive and total cost of ownership. Optimally, a blend of disk and tape in a tiered infrastructure can best address these objectives. You can’t build a house with just a hammer. IBM has a rich tool kit of storage offerings including disk, tape, software, services and deduplication technologies to help clients address their needs.
Do you have an example of a client who was saved by tape?
Yes indeed. Estes Express, a large trucking firm, was hit by a hurricane that flooded their data center and destroyed all systems. Fortunately, the night before, they had backed up all data onto IBM tape and moved the cartridges offsite! The company survived and has since implemented a best-practices data protection strategy that combines disk-to-disk-to-tape (D2D2T) using LTO tape at the primary site with a remote global mirrored site that is also backed up to LTO tape.
So tape saved the day. What is the outlook for tape innovation in the future?
The future is bright for tape. Earlier this year, IBM and Fujifilm were able to [demonstrate a tape density achievement] that could enable a native 35TB tape cartridge capacity! This shows a long roadmap ahead for tape and a continued good night’s sleep for storage managers knowing that their precious data will be safe.
Of course, LTO tape is just one of the many reasons IBM is a successful and profitable leader in the IT storage industry. Doug Balog talked about his experiences in London for the [October 7th launch] of IBM DS8800, Storwize V7000 and SAN Volume Controller 6.1. Sanjay Tripathi showed recent successes with IBM's ProtecTIER Data Deduplication Solution and Information Archive products.
I would like to thank Bruce Master for his time in completing this interview. To learn more about IBM tape and storage offerings, visit [ibm.com/storage].
Each quarter since 2006, the [IBM Migration Factory] team has tallied the number of clients who have moved to IBM servers and storage systems from competitive hardware. Well, I've just seen the latest numbers, for the third quarter of 2010, and it looks like we set a new quarterly record with nearly 400 total migrations to IBM from Oracle/Sun and HP.
It's clear that companies and governments worldwide are seeing greater value in IBM systems, while Oracle and HP watch their customer bases erode. In just this past 3Q 2010, nearly 400 clients have moved over to IBM -- almost all of them from Oracle/Sun and HP. Of these, 286 clients migrated to IBM Power Systems, running AIX, Linux and IBM i operating systems, from competitors alone -- nearly 175 from Oracle/Sun and nearly 100 from HP. The number of migrations to IBM Power Systems through the first three quarters of 2010 is nearly 800, already exceeding the total for all of last year by more than 200.
Let's do the math. Since IBM established its Migration Factory program in 2006, more than 4,500 clients have switched to IBM. More than 1,000 from Oracle/Sun and HP joined the exodus this year alone. In less than five years, almost 3,000 of these clients -- including more than 1,500 from Oracle/Sun and more than 1,000 from HP -- have chosen to run their businesses on IBM's Power Systems. That's more than a client per day making the move to IBM!
And as the servers go, so goes the storage. Clients are re-discovering IBM as a server and storage powerhouse with a strong portfolio of servers, disk and tape systems, and they are discovering how synergies between servers and storage can provide them real business benefits.
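To sanity-check the "more than a client per day" claim, here is a quick back-of-the-envelope calculation in Python. It uses the rounded figures above (more than 4,500 total migrations, almost 3,000 to Power Systems) and assumes the program started on January 1, 2006, since the post only says "in 2006"; the end date is the close of 3Q 2010. Any later start date would shorten the window and only raise the per-day rate, so this is the conservative choice.

```python
from datetime import date


def clients_per_day(clients: int, start: date, end: date) -> float:
    """Average client migrations per calendar day between start and end."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    return clients / days


# Assumption: earliest possible start date for a program begun "in 2006".
PROGRAM_START = date(2006, 1, 1)
END_OF_3Q_2010 = date(2010, 9, 30)

total_rate = clients_per_day(4500, PROGRAM_START, END_OF_3Q_2010)
power_rate = clients_per_day(3000, PROGRAM_START, END_OF_3Q_2010)
print(f"All migrations: {total_rate:.2f} clients/day")  # 2.60
print(f"Power Systems:  {power_rate:.2f} clients/day")  # 1.73
```

Both rates come out above one per day, roughly 2.6 for all migrations and 1.7 for Power Systems alone, so the claim holds even with the most conservative start date.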
Adding it all up, it's clear that IBM's multi-billion dollar investment in helping to build a smarter planet with workload-optimized systems is paying off -- and that, more and more, clients are selecting IBM over the competition to help them meet their business needs.
This week I am in New York City to meet with clients, IBM Business Partners, Independent Software Vendors (ISV) and Industry Solution Resellers (ISR). I'll be at IBM's [Wall Street Center of Excellence]. IBM has over 120 client centers worldwide.
"How can I participate in IBM's Smarter Planet, specifically Smarter Cities?"
With a lot of college students graduating next month, I thought this would be a good question to answer.
Apply for a Job at IBM
The best way to participate in IBM Smarter Cities is to get a job within IBM, and then get assigned to one of the many IBM Smarter Cities projects. Visit IBM's [Employment Page] to learn why IBM is recognized as one of the top 50 most attractive employers in the world. Mention "Smarter Cities" on your Resume so it can be routed to the appropriate manager.
Join the Conversation
Another way to participate in Smarter Cities is to "join the conversation". Each of IBM's 25 different programs has folks that are focused on that area, with blogs, forums and case studies. Here is the conversations page for [Smarter Cities]. Watch the videos at [ibm.com/theSmarterCity]. Play [City One], IBM's Smarter Planet game for Smarter Cities. Provide IBM feedback on any ideas you might have to help make cities smarter.
You can also join in one of the many upcoming [IBM Jam events]. Jams are not restricted to generating business ideas. Their methods, tools and technology can also be applied to social issues. In 2005, over three days, the Government of Canada, UN-HABITAT and IBM hosted Habitat Jam. Tens of thousands of participants, from urban specialists to government leaders to residents of cities around the world, discussed issues of urban sustainability. Their ideas shaped the agenda for the UN World Urban Forum, held in June 2006. People from 158 countries registered for the jam and shared their ideas for action to improve the environment, health, safety and quality of life in the world's burgeoning cities.
Buy Products and/or Services from IBM
IBM has the resources to help the planet in so many ways that NGOs and non-profit agencies can only dream of. With IBM's advocacy for causes like global public education, universal healthcare, and improved infrastructures, people often forget that IBM is not itself a non-profit organization. IBM learned early on that creating value for the world can also be good business. The more people buy from IBM, the more skills and resources IBM will have to solve the world's toughest challenges.
It's Tuesday, and you know what that means... IBM Announcements!
IBM System Storage ProtecTIER
Today, IBM refreshed its IBM System Storage ProtecTIER data deduplication family with new hardware and software. On the hardware side, the [TS7650G gateway] now has 32 cores and 64GB of RAM, the [TS7650 Appliance] now has 24 cores and 64GB of RAM, and the [TS7610 Appliance Express] has 4 cores and up to 16GB of RAM.
On the software side, all of these now support Symantec's proprietary OpenStorage (OST) API. This applies across the board: the [Enterprise Edition], the [Appliance Edition], and the [Entry Edition]. For those using Symantec NetBackup as their backup software, the OST API can provide advantages over the standard VTL interface.
IBM Systems Director Storage Control
The second announcement has an interesting twist. I could file this in my "I Told You So" folder. Officially, it's called the [Cassandra Complex]: accurately predicting how something will turn out, but being unable to convince anyone else of what the future holds.
About ten years ago, I was asked to be lead architect of a new product to be called IBM TotalStorage Productivity Center, which was later renamed to IBM Tivoli Storage Productivity Center. This would combine three projects:
Tivoli Storage Resource Manager (TSRM)
Tivoli SAN Manager (TSANM)
Multiple Device Manager (MDM)
The first two were based on Tivoli's internal GUI platform, and the MDM was a plug-in for IBM Systems Director. I argued that administrators would want everything on a single pane of glass, and that we should bring all the components under a common GUI platform, such as IBM Systems Director. Unfortunately, management did not agree with me on that, and preferred instead to leave each interface alone to minimize development effort. The only "unification" was to give them all similar-sounding names, with four components packaged as a single product:
Productivity Center for Data (formerly TSRM)
Productivity Center for Fabric (formerly TSANM)
Productivity Center for Disk (formerly MDM)
Productivity Center for Replication (formerly MDM)
While this management decision certainly allowed version 1 to hit the market sooner, this was not a good "first impression" of the product for many of our clients.
In 2002, IBM acquired Trellisoft, Inc., whose technology replaced the internally-developed TSRM with a much better interface, but again, this was a different GUI than the other components. For Version 2, a "launcher" was created to start the various disparate interfaces for each component. At this point, we had development teams scattered across five locations, with the first two components being developed by the Tivoli software team, and the other two components being developed by the System Storage hardware team.
Often times, when a technical lead architect and management do not agree, things do not end well. The lead architect has to leave the product, and management is forced to take alternative actions to keep the product going. In my case, management considered the idea of a common GUI as an expensive "nice-to-have" luxury we could not afford, but I considered this a "must-have". I moved on to a new job within IBM, and management, unable to continue without my leadership, gave up and handed the entire project over to the Tivoli Software team.
The Tivoli Software team took a whiff at the pile of code and agreed that it stunk. Dusting off my original design documents, they pretty much discarded most of the code and re-wrote much from scratch, with a common database, common app server, and common GUI platform. Unfortunately, Productivity Center for Replication was held up waiting for some hardware prerequisites, but the other three components would be packaged together as "Productivity Center v3 - Standard Edition" and was a big improvement over the prior versions.
In Version 4, TotalStorage Productivity Center was renamed to Tivoli Storage Productivity Center, and the Replication component was brought into the mix. A scaled-down version packaged as Productivity Center "Basic Edition" was made available as a hardware appliance named "System Storage Productivity Center" or SSPC. The idea was to provide a pre-installed 1U-high hardware console that had the basic functions of Productivity Center, with the option to upgrade to the full Tivoli Storage Productivity Center with just license keys.
So now, years later, management recognizes that a common GUI platform is more than just a "nice-to-have". IBM now supports three very specific use cases:
1. Administration for a single product
For small clients who might have only a single IBM product, IBM is now focused on making the GUI browser-based, specifically to work with the Mozilla Firefox browser, but any similar browser should work as well. The new IBM Storwize V7000 GUI is a good example of this. In this case, the browser serves as the common GUI platform.
2. Administration for both servers and storage devices
For mid-sized companies that have administrators managing both servers and storage, IBM announced this month the new [IBM Systems Director Storage Control v4.2.1] plug-in, which provides Tivoli Storage Productivity Center "Basic Edition" support. This allows admins already familiar with IBM Systems Director for managing their servers to also manage basic storage functions. This is the "I Told You So" moment: connecting server and storage administration under the IBM Systems Director management platform makes a lot of sense, and it did when I came up with the idea 10 years ago! Hmmmm?
3. Administration for just the storage environment
For larger companies big enough to have separate server and storage admin teams, IBM continues to offer the full Tivoli Storage Productivity Center product for the storage admins. The most recent release enhanced the support for IBM DS8000, SVC, Storwize V7000 and XIV storage systems.
Today, analysts consider IBM's [Tivoli Storage Productivity Center] one of the leading products in its category. I am glad my original vision has finally come to life, even though it took a while longer than I expected.
To learn more about IBM storage hardware, software or services, see the updated [IBM System Storage] landing page.
To make true advances in any industry or field requires forward thinking, as well as industry insight and experience. It can't be done just by packaging a bag of piece parts and putting a new label on it. But forward thinkers are putting smarter, more powerful technology to uses that were once unimaginable, either in scale or in progress.
The graphics developed for the IBM Smarter Planet vision are interesting. This one for Infrastructure includes images relating to public utilities, like gas, water and electricity, clouds representing cloud computing, green forests representing the need for energy efficiency and reducing carbon footprint to fight global warming, roads, representing the intricate transportation and traffic systems, highways and city streets that connect us all together, and a printed circuit board, representing the Information Technology that makes all of this possible.
Ironically, I didn't even know I made the final cut until I got three, yes three, separate requests for interviews about it. I already reached the "million hits" milestone. Other people track these things for me, so it will be interesting to see how much additional traffic my latest [15 minutes of fame] will generate.
Infrastructure is just one of the 25 different areas that IBM's vision for a Smarter Planet is trying to address, including the need for smarter buildings, smarter cities, smarter transportation systems, smarter energy grids, smarter healthcare and public safety, and smarter governments.
In his blog post, [The Lure of Kit-Cars], fellow blogger Chuck Hollis (EMC) uses an excellent analogy delineating the differences between kit-cars you build from parts, versus fully-integrated systems that you can drive off the car dealership showroom lot. The analogy holds relatively well, as IT departments can also build their infrastructure from parts, or get fully-integrated systems from a variety of vendors.
Is this what your data center looks like?
Certainly, this debate is not new. In my now infamous 2007 post [Supermarkets and Specialty Shops], I explained that there were clients that preferred to get their infrastructure from a single IT supermarket, like IBM or HP, while others were lured into thinking that buying separate parts from butchers, bakers and candlestick makers and other specialty shops was somehow a better idea.
Chuck correctly explains that in the early years of the automobile industry, before major car manufacturers had mass-production assembly lines, putting a car together from parts was the only way cars were made. Today, only the most avid enthusiasts build cars this way. The majority get cars from a single seller and drive away. In my post [Resolving the Identity Crisis], I postulated that EMC appeared to be trying to shed its "disk-only specialty shop" image and become more like IBM. Not quite a full IT Supermarket, but perhaps more like a premium-priced retailer such as [Trader Joe's].
(If you find that EMC's focus on integrated systems appears to be a 180-degree about-face from their historical focus on selling individual best-of-breed products, see my previous discussion of Chuck's contradictions in my blog post: [Is Storage the Next Confusopoly].)
While companies like EMC might be making this transition, there is a lot of resistance and inertia from the customer marketplace. I agree with Chuck, companies should not be building kit-cars or IT infrastructures from parts, certainly not from parts sold from different vendors. In my post [Talking about Solutions not Products], I explained how difficult it was to change behavior. CIOs, IT directors and managers need to think differently about their infrastructure. Let's take a quick look at some choices:
Following Chuck's argument, it makes no sense to build a "kit-car" combining Oracle/Sun servers with EMC storage. Oracle would argue it makes more sense to run on integrated systems, business logic on their "Exalogic" system, and database processing on their "Exadata". Benchmark after benchmark, however, IBM is able to demonstrate that Oracle applications and databases run faster on IBM systems. Customers that want to run Oracle applications can run either on a full Oracle stack, or a full IBM stack, and both do better than a kit-car including EMC parts.
HP has been working hard to keep up with IBM in this area. With their partnership with Microsoft, and acquisitions of EDS, 3Com and 3PAR, they can certainly make a case for getting a full HP stack rather than a kit-car mixing HP servers with EMC disk storage. The problem is that HP is focused on a converged infrastructure for private cloud computing, but Microsoft is focused on Azure and public cloud computing. It will be interesting to see how these two big companies sort this out. Definitely watch this space.
If you squint your eyes and focus on the part of the world that only has x86 machines, then Dell can be seen as an IT supermarket. In my post about [Entry-Level iSCSI Offerings], I discuss how Dell's acquisition of EqualLogic was a signal that it was trying to get away from selling EMC specialty shop products, and building up its own set of offerings internally.
Cisco is new on the server scene, but has already made quite a splash. Here, I have to agree with Chuck's logic: the only time it makes sense to buy EMC disk storage at all is when it is part of an integrated "V-block". This is not really an IT supermarket situation, instead you park your car at the "Acadia Mini-Mall" and get what you need from Trader Joe's, Cisco UCS, and VMware stores.
But wait, if what you want is running VMware on Cisco servers, you might be better off with IBM System Storage N series or NetApp storage. In his blog post about [Enhanced Secure Multi-Tenancy], fellow blogger Val Bercovici (NetApp) provides a convincing argument of why Cisco and VMware run better on an "N-block" rather than a "V-block". IBM N series provides A-SIS deduplication, and IBM Real-time Compression can provide additional capacity and performance improvements. That might be true, but whether you get your storage from EMC, NetApp or IBM, to me, you are still working with three different vendors in any case.
Of course, following Chuck's logic, it makes more sense for people with IBM servers, whether they be mainframes, POWER systems or x86 machines, to integrate these with IBM storage, IBM software and IBM services. IBM is the leading reseller of VMware, but also has a lot of business with Microsoft Hyper-V, Citrix Xen, Linux KVM, PowerVM, PR/SM and z/VM. While IBM has market-leading servers, disk and tape systems to compete for those RFP bids that ask for just one component or another, it prefers to sell fully-integrated systems, which IBM has been doing successfully since the 1950s.
Back in 2007, I mentioned how IBM's fully-integrated InfoSphere Balanced Warehouse [Trounced HP and Sun]. For business analytics, IBM offers the fully-integrated [IBM Smart Analytics Systems]. Today, IBM expanded its line of fully-integrated private cloud service delivery platforms with the announcement of the [IBM CloudBurst for Power Systems], which does for POWER7 what the IBM CloudBurst for System x, Oracle Exalogic, or Acadia's V-block do for x86.
IBM estimates that private clouds built on Power Systems can be up to 70 percent less expensive than standalone x86 servers.
Before he earned his PhD in Mechanical Engineering, my father was a car mechanic. I spent much of my teenage years covered in grease, helping my father assemble cars, lift engines, and rebuild carburetors. Certainly this was good father-son time, and I did learn something in the process. Like the automobile industry, the IT industry has matured, and it makes no financial sense to build your own IT infrastructure from parts from different vendors.
For a test drive of the industry's leading integrated IT systems, see your IBM sales rep or IBM Business Partner.
IBM Storwize V7000 disk system: intelligent block-level disk array that virtualizes both internal and external disk storage (8 Gbps FCP and 1GbE iSCSI)
IBM Real-time Compression STN-6800 appliance: real-time compression appliance for files, from Storwize, now an IBM company (10GbE/1GbE CIFS and NFS)
IBM Real-time Compression STN-6500 appliance: real-time compression appliance for files, from Storwize, now an IBM company (1GbE CIFS and NFS)
If you think this is the first time a company like IBM has pulled shenanigans with product names like this, think again. Here are a few posts that might refresh your memory:
In my September 2006 post, [A brand by any other name...] I explain that I started blogging specifically to promote the new "IBM System Storage" product line name, part of the "IBM Systems" brand resulting from merging the "eServer" and "TotalStorage" brands.
In my January 2007 post, [When Names Change], I explain our naming convention for our disk products, including our DS family, SAN Volume Controller and N series.
In my February 2008 post, [Getting Off the Island], I cover how the x/p/i/z designations came about for our various IBM server product lines.
But what about acquisitions? When [IBM acquired Lotus Development Corporation], it kept the "Lotus" brand. New products that fit the "collaboration" function were put under the Lotus brand. I think most people can accept this approach.
But have we ever seen an existing product renamed to an acquired name?
In my January 2009 post [Congratulations to Ken on your QCC Milestone], I mentioned that my colleague Ken Hannigan worked on an internal project initially called "Workstation Data Save Facility" (WDSF), which was changed to "Data Facility Distributed Storage Manager" (DFDSM), then renamed to "ADSTAR Distributed Storage Manager" (ADSM), and finally renamed to the name it has today: IBM Tivoli Storage Manager (TSM).
Readers reminded me that [IBM acquired Tivoli Systems, Inc.] in 1996, so TSM could not have been an internally developed product. Ha! Wrong! Let's take a quick history lesson on how this came about:
In the late 1980s, IBM Almaden research had developed a project to backup personal computers and workstations, which they called "Workstation Data Save Facility" or WDSF.
This was turned over to our development team, which immediately discarded the code and wrote its replacement from scratch, called Data Facility Distributed Storage Manager (DFDSM), named similarly to the Data Facility products on the mainframe (DFP, DFHSM, DFDSS). As a member of the Data Facility family, DFDSM didn't really fit. The rest processed mainframe data sets, but DFDSM processed Windows and UNIX files. The only connection was that a version of the DFDSM server was available to run on the mainframe.
Then, in the early 1990s, there were discussions of possibly splitting IBM into a bunch of smaller "Baby Blues", similar to how [AT&T was split into "Baby Bells"], and how Forbes and Goldman Sachs now want to split Microsoft into [Baby Bills]. IBM considered naming the storage spin-off as ADSTAR, which stood for "Advanced Storage and Retrieval."
Pre-emptively, IBM renamed DFDSM to "ADSTAR Distributed Storage Manager" or ADSM.
Fortunately, in 1993, IBM brought a new sheriff to town, Lou Gerstner, who quickly squashed any plans to split up IBM. He recognized that IBM's core strength was building integrated stacks, combining systems, software and services to solve business problems.
In 1996, IBM acquired Tivoli Systems, Inc. to expand its "Systems Management" portfolio, and renamed ADSM over to IBM Tivoli Storage Manager, since "storage management" is an essential part of "systems management". Later, IBM TotalStorage Productivity Center would be renamed to "IBM Tivoli Storage Productivity Center."
I participated in five months of painful meetings to figure out what to name our new internally-developed midrange disk system. Since it ran SAN Volume Controller software, I pushed for keeping the SVC designation somehow. We considered the DS naming convention, but there was no room in our numbering scheme between the existing DS5000 and DS6000 for the new midrange product. A marketing agency we hired came up with nonsensical names, in the spirit of product names like Celerra, Centera and CLARiiON, using name generators like [Wordoid]. Luckily, in the nick of time, IBM acquired Storwize for its compression technology, and decided that Storwize was a far better fit as a name than any of the names we had come up with.
However, the new IBM Storwize V7000 midrange product had nothing in common with the appliances acquired from Storwize, the company, so to avoid confusion, the latter products were renamed to [IBM Real-time Compression]. Fellow blogger Steven Kenniston, the Storage Alchemist of Storwize fame, who joined IBM through the acquisition, gives his perspective on this in his post [Storwize – What is in a Name, Really?]. While I am often critical of the names and terms IBM uses, I have to say this last set of naming decisions makes a lot of sense to me and I support it wholeheartedly.
From New York, Rolf went to London, Paris, Madrid, Morocco, Cairo, South Africa, Bangkok, Thailand, Malaysia, Singapore, New Zealand, Australia, and then back to the United States. I was hoping to run into him while I was in Australia and New Zealand last month, but our schedules did not line up.
Traveling without baggage is more than just a convenience; it is a metaphor for the philosophy that we should keep only what we need, and leave behind what we don't. This was the approach taken by IBM in the design of the IBM Storwize V7000 midrange disk system.
The IBM Storwize V7000 disk system consists of 2U enclosures. Controller enclosures have dual controllers and drives. Expansion enclosures have just drives. Enclosures can have either 24 smaller form factor (SFF) 2.5-inch drives, or twelve larger 3.5-inch drives. A controller enclosure can be connected to up to nine expansion enclosures.
The drives are all connected via 6 Gbps SAS, and come in a variety of speeds and sizes: 300GB Solid-State Drive (SSD); 300GB/450GB/600GB high-speed 10K RPM; and 2TB low-speed 7200 RPM drives. The 12-bay enclosures can be intermixed with 24-bay enclosures on the same system, and within an enclosure different speeds and sizes can be intermixed. A half-rack system (20U) could hold as much as 480TB of raw disk capacity.
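The enclosure limits above lend themselves to a quick sizing calculation. Here is a minimal Python sketch, assuming fully populated enclosures with one drive size per enclosure; `system_summary` is a hypothetical helper, not an IBM tool, and it ignores hot spares, RAID overhead and formatting, so the result is raw capacity only.

```python
# Limits taken from the description above: 2U enclosures with 24 SFF or
# 12 LFF bays, one controller enclosure plus up to nine expansions.
ENCLOSURE_HEIGHT_U = 2
MAX_EXPANSIONS = 9
BAYS = {"SFF": 24, "LFF": 12}  # 2.5-inch vs 3.5-inch enclosures


def system_summary(enclosures: list[tuple[str, int]]) -> tuple[int, int, int]:
    """Return (rack units, drive count, raw GB).

    Each enclosure is (form_factor, drive_size_gb); the first entry is the
    controller enclosure. Assumes every bay is populated with that drive size.
    """
    if not 1 <= len(enclosures) <= 1 + MAX_EXPANSIONS:
        raise ValueError("need one controller enclosure plus 0 to 9 expansions")
    height_u = ENCLOSURE_HEIGHT_U * len(enclosures)
    drives = sum(BAYS[ff] for ff, _ in enclosures)
    raw_gb = sum(BAYS[ff] * size_gb for ff, size_gb in enclosures)
    return height_u, drives, raw_gb


# A full half-rack of SFF enclosures, each populated with 600GB drives.
print(system_summary([("SFF", 600)] * 10))  # (20, 240, 144000)
```

A ten-enclosure system fills exactly the 20U half rack mentioned above. Mixing 12-bay and 24-bay enclosures just means passing a mixed list; intermixing drive sizes within an enclosure, which the system also allows, is not modeled here.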
This new system, freshly designed entirely within IBM, competes directly against systems that carry a lot of baggage, including the HDS AMS, HP EVA, and EMC CLARiiON CX4 systems. Instead, we decided to keep what we wanted from our other successful IBM products.
Inspired by our successful XIV storage system, IBM has developed a web-based GUI that focuses on ease-of-use. This GUI uses the latest HTML5 and dojo widgets to provide an incredible user experience.
Borrowed from our IBM DS8000 high-end disk systems, state-of-the-art device adapters provide 6 Gbps SAS connectivity with a variety of RAID levels: 0, 1, 5, 6, and 10.
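To show what those RAID levels mean for usable capacity, here is the textbook arithmetic as a short Python sketch. It is a simplified illustration, not the DS8000 or Storwize V7000 implementation: the `RAID_RULES` table and `usable_gb` function are hypothetical, and they ignore hot spares, metadata, formatting overhead, and whatever array-width rules the actual products enforce.

```python
# RAID level -> (minimum drives, function giving data drives for n drives)
RAID_RULES = {
    "0": (1, lambda n: n),        # striping, no redundancy
    "1": (2, lambda n: n // 2),   # mirrored pairs
    "5": (3, lambda n: n - 1),    # one drive's worth of parity
    "6": (4, lambda n: n - 2),    # two drives' worth of parity
    "10": (4, lambda n: n // 2),  # striped mirrored pairs
}


def usable_gb(level: str, drives: int, drive_gb: int) -> int:
    """Usable capacity in GB of one array of identical drives (simplified)."""
    if level not in RAID_RULES:
        raise ValueError(f"unsupported RAID level: {level}")
    min_drives, data_drives = RAID_RULES[level]
    if drives < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    if level in ("1", "10") and drives % 2:
        raise ValueError(f"RAID {level} needs an even number of drives")
    return data_drives(drives) * drive_gb


for level in ("0", "5", "6", "10"):
    print(f"RAID {level}: {usable_gb(level, 8, 600)} GB usable from 8 x 600GB")
```

For eight 600GB drives this prints 4800, 4200, 3600 and 2400 GB, which makes the trade-off plain: RAID 6 gives up a second drive's worth of capacity to survive two drive failures, while RAID 10 gives up half the capacity for mirroring.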
From our SAN Volume Controller, the embedded [SVC 6.1 firmware] provides all of the features and functions normally associated with enterprise-class systems, including Easy Tier sub-LUN automated tiering between Solid-State Drives and spinning disk, thin provisioning, external disk virtualization, point-in-time FlashCopy, disk mirroring, built-in migration capability, and long-distance synchronous and asynchronous replication.
Finally, the various internal NDAs that kept me from publishing this sooner have expired, so now I have the long-awaited [Inside System Storage: Volume II], documenting IBM's transformation in its storage strategy, including behind-the-scenes commentary about IBM's acquisitions of XIV and Diligent. It is available initially in paperback form. I am still working on the hard cover and eBook editions.
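Thin provisioning is the item in that list that most often puzzles people, so here is a toy model in Python. A thin volume advertises its full virtual size to the host but only draws a physical extent from a shared pool on the first write to each extent. The `StoragePool` and `ThinVolume` classes are hypothetical illustrations of the general technique, not SVC's internal design; real implementations add extent sizing, capacity warning thresholds, and space reclamation.

```python
class StoragePool:
    """Shared pool of physical extents backing one or more thin volumes."""

    def __init__(self, physical_extents: int):
        self.free = physical_extents

    def allocate(self) -> None:
        if self.free == 0:
            raise RuntimeError("pool exhausted: physical capacity over-committed")
        self.free -= 1


class ThinVolume:
    """Advertises virtual_extents to the host but maps physical extents lazily."""

    def __init__(self, pool: StoragePool, virtual_extents: int):
        self.pool = pool
        self.virtual_extents = virtual_extents
        self.mapped: set[int] = set()

    def write(self, extent: int) -> None:
        if not 0 <= extent < self.virtual_extents:
            raise IndexError("extent outside the volume's virtual size")
        if extent not in self.mapped:  # first write to this extent
            self.pool.allocate()
            self.mapped.add(extent)

    @property
    def allocated(self) -> int:
        return len(self.mapped)


pool = StoragePool(physical_extents=100)
vol_a = ThinVolume(pool, virtual_extents=80)
vol_b = ThinVolume(pool, virtual_extents=80)  # 160 virtual vs 100 physical

for extent in range(10):
    vol_a.write(extent)
vol_a.write(0)   # rewriting a mapped extent allocates nothing new
vol_b.write(79)

print(vol_a.allocated, vol_b.allocated, pool.free)  # 10 1 89
```

The two volumes together promise 160 extents against only 100 physical ones; that over-commitment is the whole point of thin provisioning, and it is also why the pool's free space needs monitoring.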
For those who have not yet read my first book, Inside System Storage: Volume I, it is still available from my publisher Lulu, in [hard cover], [paperback] and [eBook] editions.
IBM System Storage DS8800
A lesson IBM learned long ago was not to make radical changes to high-end disk systems, as clients who run mission-critical applications are more concerned about reliability, availability and serviceability than they are performance or functionality. Shipping any product before it was ready meant painfully having to fix the problems in the field instead.
(EMC apparently is learning this same lesson now with their VMAX disk system. Their Enginuity code from Symmetrix DMX4 was ported over to new CLARiiON-based hardware. With several hundred boxes in the field, they have already racked up over 150 severity 1 problems, roughly half of which resulted in data loss or unavailability issues. For the sake of our mutual clients that have both IBM servers and EMC disk, I hope they get their act together soon.)
To avoid this, IBM made incremental changes to the successful design and architecture of its predecessors. The new DS8800 shares 85 percent of the stable microcode from the DS8700 system. Functions like Metro Mirror, Global Mirror, and Metro/Global Mirror are compatible with all of the previous models of the DS8000 series, as well as previous models of the IBM Enterprise Storage Server (ESS) line.
The previous models of the DS8000 series were designed to take in cold air from both front and back, and route the hot air out the top, known as a chimney design. However, many companies are rearranging their data centers into separate cold aisles and hot aisles. The new DS8800 has front-to-back cooling to help accommodate this design.
My colleague Curtis Neal would call the rest of this a "BFD" announcement, which of course stands for "Bigger, Faster and Denser". The new DS8800 scales up to more drives than its DS8700 predecessor, and can scale out from a single-frame 2-way system to a multi-frame 4-way system. IBM has upgraded to faster 5GHz POWER6+ processors, with dual-core 8 Gbps FC and FICON host adapters, 8 Gbps device adapters, and 6 Gbps SAS connectivity to small form factor (SFF) 2.5-inch SAS drives. IBM Easy Tier will provide sub-LUN automated tiering between Solid-State Drives and spinning disk. The denser packaging with SFF drives means that we can pack over 1000 drives in only three frames, compared to five frames required for the DS8700.
The [IBM System Storage SAN Volume Controller] software release v6.1 brings Easy Tier sub-LUN automated tiering to the rest of the world. IBM Easy Tier moves the hottest, most active extents up to Solid-State Drives (SSD) and moves the coldest, least active extents down to spinning disk. This works whether the SSD is inside the SVC 2145-CF8 nodes, or in the managed disk pool.
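To make the idea of sub-LUN tiering concrete, here is a deliberately simplified sketch of one planning pass: rank extents by how busy they were, keep the hottest ones on SSD up to its capacity, and work out which extents to promote and demote. This is not IBM's Easy Tier algorithm, which builds heat maps over time and paces its migrations; the function `plan_migrations`, the per-extent I/O counts, and the extent IDs are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def plan_migrations(
    extent_heat: Dict[str, int], on_ssd: Set[str], ssd_slots: int
) -> Tuple[List[str], List[str]]:
    """Return (promote, demote) lists for a simplified heat-based tiering pass.

    extent_heat: hypothetical I/O count per extent over some measurement window.
    on_ssd: extents currently placed on SSD.
    ssd_slots: how many extents fit on the SSD tier.
    """
    # Rank hottest first; break ties by extent ID so the result is deterministic.
    ranked = sorted(extent_heat, key=lambda e: (-extent_heat[e], e))
    target_ssd = set(ranked[:ssd_slots])
    promote = sorted(target_ssd - on_ssd)  # hot extents not yet on SSD
    demote = sorted(on_ssd - target_ssd)   # cooler extents currently occupying SSD
    return promote, demote


if __name__ == "__main__":
    heat = {"e1": 500, "e2": 10, "e3": 300, "e4": 5}
    print(plan_migrations(heat, on_ssd={"e2", "e3"}, ssd_slots=2))
```

With room for two extents on SSD, `e1` (hot, on disk) gets promoted and `e2` (cold, on SSD) gets demoted, while `e3` stays put. The point is that only the busy pieces of a volume move, not the whole LUN.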
Tired of waiting for EMC to finally deliver FAST v2 for your VMAX? It has been 18 months since they first announced that someday they would have sub-LUN automatic tiering. What is taking them so long? Why not virtualize your VMAX with SVC, and you can have it sooner!
SVC 6.1 also upgrades to a sexy new web-based GUI, which like the one for the IBM Storwize V7000, is based on the latest HTML5 and dojo widget standards. Inspired by the popular GUI from the IBM XIV Storage System, this GUI has greatly improved ease-of-use.
A client asked me to explain "Nearline storage" to them. This was easy, I thought, as I started my IBM career on DFHSM, now known as DFSMShsm for z/OS, which was created in 1977 to support the IBM 3850 Mass Storage System (MSS), a virtual storage system that blended disk drives and tape cartridges with robotic automation. Here is a quick recap:
Online storage is immediately available for I/O. This includes DRAM memory, solid-state drives (SSD), and always-on spinning disk, regardless of rotational speed.
Nearline storage is not immediately available, but can be made online quickly without human intervention. This includes optical jukeboxes, automated tape libraries, as well as spin-down massive array of idle disk (MAID) technologies.
Offline storage is not immediately available, and requires some human intervention to bring online. This can include USB memory sticks, CD/DVD optical media, shelf-resident tape cartridges, or other removable media.
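Those three definitions boil down to two yes/no questions, which a few lines of code can capture. This is a sketch for illustration only; the `Tier` enum, the `classify` function, and the example media list are my own hypothetical names, not from any IBM product. Note that an always-on SATA disk lands in "online" no matter how slowly it spins.

```python
from enum import Enum


class Tier(Enum):
    ONLINE = "online"
    NEARLINE = "nearline"
    OFFLINE = "offline"


def classify(immediately_available: bool, needs_human: bool) -> Tier:
    """Classify storage media using the two questions in the definitions above."""
    if immediately_available:
        # Ready for I/O right now, regardless of how fast or slow the device is.
        return Tier.ONLINE
    if needs_human:
        # Someone has to fetch, load or plug in the media first.
        return Tier.OFFLINE
    # Not ready yet, but automation (robotics, disk spin-up) brings it online.
    return Tier.NEARLINE


# Hypothetical example media, described as (immediately_available, needs_human).
MEDIA = {
    "always-on 7200 RPM SATA disk": (True, False),
    "automated tape library cartridge": (False, False),
    "spun-down MAID disk": (False, False),
    "shelf-resident tape cartridge": (False, True),
}

if __name__ == "__main__":
    for name, (available, human) in MEDIA.items():
        print(f"{name}: {classify(available, human).value}")
```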
Sadly, it appears a few storage manufacturers and vendors have been misusing the term "Nearline" to refer to "slower online" spinning disk drives. I found this [June 2005 technology paper from Seagate], and this [2002 NetApp Press Release], the latter of which included this contradiction for their "NearStore" disk array. Here is the excerpt:
"Providing online access to reference information—NetApp nearline storage solutions quickly retrieve and replicate reference and archive information maintained on cost-effective storage—medical images, financial models, energy exploration charts and graphs, and other data-intensive records can be stored economically and accessed in multiple locations more quickly than ever"
Which is it, "online access" or "nearline storage"?
If a client asks why slower drives consume less energy or generate less heat, I can explain that, but if they ask why slower drives must have SATA connections, that is a different discussion. The speed of a drive and its connection technology are for the most part independent. A 10K RPM drive can be made with FC, SAS or SATA connection.
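One way to see that independence is to model a drive with rotational speed and interface as separate attributes. In this sketch (the `Drive` class is hypothetical, for illustration), average rotational latency, the time the head waits on average for half a revolution, depends only on RPM, so a 10K RPM drive comes out the same whether it connects via FC, SAS or SATA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    rpm: int
    interface: str  # connection type, e.g. "FC", "SAS" or "SATA"

    def avg_rotational_latency_ms(self) -> float:
        # On average the head waits half a revolution for the target sector:
        # 0.5 * (60,000 ms per minute / rpm). The interface plays no part here.
        return 0.5 * 60_000 / self.rpm


if __name__ == "__main__":
    for iface in ("FC", "SAS", "SATA"):
        d = Drive(rpm=10_000, interface=iface)
        print(f"{d.rpm} RPM {d.interface}: {d.avg_rotational_latency_ms():.1f} ms")
```

Each of the three 10K RPM drives works out to 3.0 ms of average rotational latency; change the RPM and the number changes, change the connection and it does not.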