This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems Technical University] events. With over 30 years at IBM, Tony is a frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles during his 19-plus years at IBM. Most recently, he has been leading efforts across the Communications/CSI market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years, Lloyd supported industry accounts as a Storage Solution Architect, and before that as a Storage Software Solutions specialist in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role on the Washington Systems Center team. His current focus is IBM Cloud Private, and he will be delivering and supporting sessions at Think 2019 and Storage Technical University on the value of IBM storage in this high-value solution, part of the IBM Cloud strategy. Lloyd maintains Subject Matter Expert status across the IBM Spectrum Storage software solutions. You can follow Lloyd on Twitter @ldean0558 and on LinkedIn as Lloyd Dean.
Continuing my drawn-out coverage of IBM's big storage launch of February 9, today I'll cover the IBM System Storage TS7680 ProtecTIER data deduplication gateway for System z.
On the host side, TS7680 connects to mainframe systems running z/OS or z/VM over FICON attachment, emulating an automated tape library with 3592-J1A devices. The TS7680 includes two controllers that emulate the 3592 C06 model, with 4 FICON ports each. Each controller emulates up to 128 virtual 3592 tape drives, for a total of 256 virtual drives per TS7680 system. The mainframe sees up to 1 million virtual tape cartridges, up to 100GB raw capacity each, before compression. For z/OS, the automated library has full SMS Tape and Integrated Library Management capability that you would expect.
Inside, the two control units are both connected to a redundant clustered pair of ProtecTIER engines running the HyperFactor deduplication algorithm, which deduplicates inline, as data is ingested, rather than post-process as other deduplication solutions do. These engines are similar to the TS7650 gateway machines for distributed systems.
On the back end, these ProtecTIER deduplication engines are then connected to external disk, up to 1PB. If you get a 25x data deduplication ratio on your data, that means up to 25PB of mainframe data stored on only 1PB of physical disk. The disk can be any disk supported by ProtecTIER over FCP protocol: not just the IBM System Storage DS8000, but also the IBM DS4000, DS5000 or IBM XIV storage system, various models of EMC and HDS, and of course the IBM SAN Volume Controller (SVC) with all of its supported disk systems.
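As a quick sanity check of that arithmetic (using 1 PB = 1024 TB), the effective capacity at a given deduplication ratio is just physical capacity times ratio:

```shell
# Back-of-envelope effective capacity; the values match the example above
physical_tb=1024     # 1 PB of physical disk behind ProtecTIER
dedup_ratio=25       # a 25x deduplication ratio
echo "Effective capacity: $((physical_tb * dedup_ratio)) TB"
```

Actual ratios vary widely with the data; backup workloads with many repeated full backups tend to deduplicate best.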
HDP brings the performance benefits of automated wide striping, and HDT automatically keeps the hot pages of data on the highest-performance tier of storage for mainframes, just as it does for open systems. There are differences between the open systems and mainframe implementations due to mainframe CKD and CCHHR formats; for instance, the page size is optimized for mainframe storage formats, and storage reclamation must be host-initiated. For more information check out our website: http://www.hds.com/assets/pdf/how-to-apply-latest-advances-in-hitachi-mainframe-storage.pdf
There are also additional performance efficiencies specific for mainframes.
Mainframe HDP is the foundation for Extended Addressable Volumes, which increases the size of 3390 volumes from 65,520 cylinders to 262,668 cylinders. This, along with HyperPAV--which facilitates multiple accesses to a volume, addressing the problem of queuing on a very large volume with a single UCB--enhances throughput with many more concurrent I/O operations.
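To see why EAV matters, here is a rough sketch of the capacity jump, assuming the standard 3390 geometry of 56,664 bytes per track and 15 tracks per cylinder:

```shell
# Approximate 3390 volume sizes (standard geometry assumed)
bytes_per_cyl=$((56664 * 15))   # 849,960 bytes per cylinder
echo "Mod-9 volume (65,520 cylinders):  $((65520 * bytes_per_cyl / 1000000000)) GB"
echo "EAV volume (262,668 cylinders):   $((262668 * bytes_per_cyl / 1000000000)) GB"
```

That is roughly a four-fold increase in addressable capacity per volume, which is exactly why HyperPAV is needed to avoid UCB queuing on such large volumes.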
The thin provisioning of HDP also increases the performance of mainframe functions that move, copy, or replicate these thin volumes like Concurrent Copy, FlashCopy V02, and HUR, since the actual volumes are smaller.
If you have mainframes, check out the capacity and performance efficiency of VSP with HDP and HDT.
At this point, you might be wondering: "If Hu Yoshida deleted his blog post, how did Tony get a copy of it? Did Tony save a copy of the HTML source before Hu deleted it?" No. I should have, in retrospect, in case lawyers got involved. It turns out that deleting a blog post does not clear the various copies in various RSS Feed Reader caches. I was able to dig out the previous version from the vast Google repository. (Many thanks to my friends at Google!!!).
(Lesson to all bloggers: If you write a blog post, and later decide to remove it for whatever legal, ethical, moral reasons, it is better to edit the post to remove offending content, and add a comment that the post was edited, and why. Shrinking a 700-word article down to 'Sorry Folks - I decided to remove this blog post because...' would do the trick. This new edited version will then slowly propagate across to all of the RSS Feed Reader caches, eliminating most traces to the original. Of course, the original may have been saved by any number of your readers, but at least if you have an edited version, it can serve as the official or canonical version.)
Perhaps there was a reason why HDS did not want to make public the FUD its sales team uses in private meetings with IBM mainframe clients. Whatever it was, this appears to be another case where the cover-up is worse than the original crime!
Continuing my saga regarding my [New Laptop], I managed on
[Wednesday afternoon] to prepare my machine with separate partitions for programs and data. I was hoping to wrap things up on day 2 (Thursday), but nothing went smoothly.
Just before leaving late Wednesday evening, I thought I would try running the "Migration Assistant" overnight by connecting the two laptops with a REGULAR Ethernet cable. The instructions indicated that in "most" cases, two laptops can be connected using a regular "patch cord" cable. These are the kind everyone has, the kind that connects their laptop to the wall socket for a wired connection to the corporate intranet, or their personal computers to their LAN hubs at home. Unfortunately, the connection was not recognized, so I suspected that this was one of the exceptions not covered.
(There are two types of Ethernet cables. The ["patch cord"] connects computers to switches. The ["crossover" cable] connects like devices, such as computers to computers, or switches to switches. Four years ago, I used a crossover cable to transfer my files over, and assumed that I would need one this time as well.)
Thursday morning, I borrowed a crossover cable from a coworker. It was bright pink and only about 18 inches long, just enough to have the two laptops side by side. If the pink crossover cable were any shorter, the two laptops would be back to back. I kept the old workstation in the docking station, which allowed it to remain connected to my big flat screen, mouse, and keyboard, and to use the docking station's RJ45 to connect to the corporate intranet. That left the RJ45 on the left side of the old system to connect via crossover cable to the new system. But that didn't work, of course, because the docking station overrides the side port, so we had to completely "undock" and go native laptop to laptop.
Restarting the Migration Assistant, I unplugged the corporate intranet cable from the old laptop and put one end of the pink cable into the Ethernet port of each laptop. On the new system, Migration Assistant asked me to set up a password and provided an IP address like 169.254.aa.bb with a netmask of 255.255.0.0; I was supposed to type this IP address on the old system for it to reach out and connect. It still didn't connect.
We tried a different pink crossover cable; no luck. My colleague Harley brought over his favorite "red" crossover cable, which he has used successfully many times, but it still didn't work. The helpful diagnostic advice was to disable all firewall programs on one or both systems.
I disabled Symantec Client Firewall on both systems. Still not working. I even tried booting both systems up in "safe" mode, using MSCONFIG to set the reboot mode to "safe with networking". Still not working. At this point, I was afraid I would have to use the alternate approach: connecting both systems to our corporate 100 Mbps network, which would be painfully slow. I only have one active LAN cable in my office, so the second computer would have to sit outside in the lobby.
Looking at the IP address on the old system, it was 9.11.xx.yy, assigned by our corporate DHCP, so not even in the same subnet of the new computer. So, I created profiles on ThinkVantage Access Connections on both systems, with 192.168.0.yy netmask 255.255.255.0 on the old system, and 192.168.0.bb on the new system. This worked, and a connection between the two systems was finally recognized.
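For anyone trying this themselves, the same static addressing can also be done from a Windows command prompt rather than ThinkVantage Access Connections; this is a sketch with an assumed adapter name and made-up host addresses, not the exact profiles I used:

```shell
REM On the old laptop (run as administrator; "Local Area Connection"
REM is an assumed adapter name -- check yours first)
netsh interface ip set address name="Local Area Connection" static 192.168.0.1 255.255.255.0

REM On the new laptop, pick a different host address in the same subnet
netsh interface ip set address name="Local Area Connection" static 192.168.0.2 255.255.255.0

REM Verify the two laptops can reach each other
ping 192.168.0.1
```

The 169.254.aa.bb address that Migration Assistant suggested is a link-local address, which only works when both sides auto-configure in that range; putting both machines explicitly in the same subnet avoids that dependency.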
Since I had 23GB of system files and programs on my old C: drive, and 80GB of data on my old D: drive, I didn't think I would run out of space on my new 40GB C: drive and 245GB D: drive, but I did! The Migration Assistant wanted to put my D:\Documents on my new C: drive and refused to continue. I had to remove D:\Documents from the list so that it could continue, processing only the programs and system settings on the C: drive. It took 61 minutes to scan 23GB on my C: drive and identify 12,900 files to move, representing 794MB of data. Seriously? Less than 1GB of data moved!
It then scanned all of the programs I had on my old system, and decided that there were none that needed to be moved or installed on the new system. The closing instructions explained there might be a few programs that need to be manually installed, and some data that needed to be transferred manually.
Given the performance of Migration Assistant, I decided to just set up a direct network mapping of the new D: drive as Y: on my old system, and drag and drop my entire folder over. Even at 1000 Mbps, this still took the rest of the day. I also backed up C:\Program Files using [System Rescue CD] to my external USB drive, and restored it as D:\prog-files, just in case. In retrospect, I realize it would have been faster just to dump my D: drive to my USB drive and restore it on the new system.
I'll leave the process of re-installing missing programs for Friday.
Here are some upcoming events related to IBM Storage!
If you sell IBM and/or Oracle solutions, please join me for IBM Oracle Virtual University 2013!
A few weeks ago, I recorded a session on IBM Storage: Overview, Positioning and How to Sell that will be available on demand starting tomorrow, February 26th, at the IBM Oracle Virtual University 2013.
It's one of 65 new sessions that will help IBM to surround Oracle applications with IBM infrastructure, services and industry solutions. Oracle software, after all, runs best on IBM hardware. Other highlights of Oracle Virtual University include a live executive State of the Alliance session with Q&A, Oracle keynote, updates by Oracle product managers, sessions on PureSystems, Selling IBM into an Oracle environment, Cloud, and much more.
There will be live technical teams on hand throughout launch day to answer your questions in real time, so I hope you can carve out 30 minutes or more on February 26th to take advantage of these available resources.
After helping launch the first Pulse back in 2008, I have sadly not been back since. Last year, I was invited to attend as a last-minute replacement for another speaker, but I was busy [having emergency surgery].
This year's [Pulse 2013] conference looks amazing. It will be held in Las Vegas, Nevada. Guest speakers Peyton Manning, 4-time NFL MVP, and Carrie Underwood, 6-time Grammy award winner, join IBM's Software Group executives and experts on how IBM Tivoli can help optimize your IT infrastructure.
Sadly, once again, I will not be there at Pulse. This time, I will be on the East Coast visiting clients instead, but my on-premise correspondent, Tom Rauchut, has informed me that he will be there. Hopefully, he will provide me something to write about.
Later in March, I will be in Brussels, Belgium for the Storage Expo. This is held March 20-21, at the Brussels-Expo venue. I will be presenting several topics each day, as well as visiting clients in the area. This event comes on behalf of IBM Belgium in association with IBM Business Partner IRIS-ICT.
If you plan to participate in any of these events, let me know!
Recently, a client asked how to back up their IBM PureData System for Analytics devices. IBM [acquired Netezza in November 2010], and later renamed its TwinFin devices the IBM PureData System for Analytics, powered by Netezza.
The [IBM PureData System for Analytics] is incredibly fast for performing deep, ad-hoc analytics. However, the people who use them are "data scientists", not backup experts.
Likewise, there are backup administrators who may not be familiar with the unique characteristics of this expert-integrated system to know what backup options are available.
As with the rest of the IBM PureSystems line, the IBM PureData System for Analytics (or, PDA for short) has a combination of servers, storage and switches inside.
In a full-frame PDA, there are two servers in Active/Passive mode; these coordinate activity to FPGA-based blade servers, which have parallel access to hundreds of disk drives, storing nearly 200 TB of compressed database data. A system can span up to four frames.
But what do you back up? And why? You don't need to worry about backing up the Linux operating system or NPS server code; that is considered firmware, and if it ever got corrupted, IBM would help restore it for you. System-wide metadata, such as the host catalog and global users, groups, and permissions, should be backed up periodically to protect against data corruption.
There are a number of reasons to back up your user databases:
As part of firmware upgrade/downgrade
To transfer data to another system
Protect against hardware failure / disaster
Protect against data corruption
The PDA has three backup formats. You can back up the entire user database in compressed format, back up individual tables in compressed format, or export to a text-format file.
Compressed format is faster, but can only be restored to the same PDA, or a PDA that has the same or higher level of NPS firmware. The text-format is slower, but can be used to restore to lower levels of NPS firmware, or to other database systems.
There are two methods to back up your PDA. The first is the "Filesystem" method: you attach an external storage device to the NPS server, and use the built-in command line interface (CLI) to store the backups onto its file system.
On NPS version 6, the nzhostbackup will backup the /nz/data directory which stores the system tables, database catalogs, configuration files, query plans, and cached executable code for the SPU blade servers.
(I have heard that the nzhostbackup will get deprecated in NPS version 7, but I only have access to version 6. As always, [RTFM] for your specific NPS code level.)
The nzbackup with the users parameter will backup the global users, groups and permissions. This is included in the /nz/data backup contents from the nzhostbackup command, but you may want to backup and restore these separately.
The nzbackup with the db parameter will backup a user database in compressed format. To backup individual tables, use the CREATE EXTERNAL TABLE command, which can create compressed or text-format exports.
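Putting those commands together, a filesystem-method backup session might look something like this sketch; the paths, database and table names are made up, and options vary by NPS release, so check the manual for your code level:

```shell
# Back up the host catalog and configuration (NPS 6)
nzhostbackup /backups/hostdata.tar.gz

# Back up global users, groups, and permissions separately
nzbackup -users -dir /backups

# Back up an entire user database in compressed format
nzbackup -db SALESDB -dir /backups

# Export a single table to a text-format file (run from within nzsql):
#   CREATE EXTERNAL TABLE '/backups/orders.csv'
#     USING (DELIMITER ',') AS SELECT * FROM orders;
```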
You may find that your databases are so large, they will exceed the limits of the filesystem on the external storage device. For SAN or NAS deployments, I recommend the IBM Storwize V7000 Unified with IBM General Parallel File System (GPFS). However, if you are using something else, you may need to use the "nz_backup" scripts provided which split up the backup images into smaller pieces that most other filesystems can handle.
The PDA comes with 10GbE Ethernet ports to which you can attach a NAS storage device over a Local Area Network (LAN), or you can add Fibre Channel Protocol (FCP) ports and connect over a Storage Area Network (SAN). To keep things simple, I will refer to whichever network you choose as the "Backup Network" in the drawings.
The second method for backup is called the "External Backup Software" method. As you have probably guessed, it involves sending the backups to a supported software product like IBM Tivoli Storage Manager (or, TSM for short).
In this case, the PDA acts as a client node, similar to a laptop, desktop, or application server with internal disk. Backup data is sent over the LAN to the designated TSM server, and the TSM server in turn writes over the SAN to its storage hierarchy of disk, virtual tape and/or physical tape resources.
Backups can be done by command "on demand", or automated on a schedule. For the /nz/data directory, direct the nzhostbackup command to send the backup copy to local disk, then use TSM's dsmc archive command to transfer this backup copy to the TSM server.
For nzbackup with the users or db parameters, you can send the data directly to the appropriate TSM server by specifying the connector and connectorArgs parameters.
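As a sketch, the TSM variant of those two backups might look like this; the file names and server name are made up, and the exact connectorArgs keywords depend on your NPS and TSM levels, so treat this as illustrative only:

```shell
# 1. Host catalog: back up to local disk, then archive the file to TSM
nzhostbackup /tmp/hostdata.tar.gz
dsmc archive /tmp/hostdata.tar.gz -description="NPS host backup"

# 2. User database: stream directly to the TSM server via the connector
#    (check your manual for the connectorArgs syntax at your code level)
nzbackup -db SALESDB -connector tsm -connectorArgs "SERVERNAME:mytsmserver"
```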
To reduce traffic on the TSM Server, an intermediary "TSM Proxy Node" can be put in between. In this case, the PDA sends the backup to the Proxy Node, the Proxy Node uses a "LAN Free Storage Agent" to send the backups directly to the virtual tape and/or physical tape, and then notifies the TSM Server to update its system catalog to record which tape holds these new backups.
Another configuration involves installing the TSM LAN Free storage agent directly on the PDA. While this will require FCP ports to be added and consume more CPU resources on the NPS server, it eliminates most of the LAN traffic, allowing the PDA to send its backups directly to virtual or physical tape.
Continuing my coverage of the 30th annual [Data Center Conference], here is a recap of Wednesday's breakout sessions.
Aging Data: The Challenges of Long-Term Data Retention
The analyst defined "aging data" to be any data that is older than 90 days. A quick poll of the audience showed what type of data was the biggest challenge:
In addition to aging data, the analyst used the term "vintage" to refer to aging data that you might actually need in the future, and "digital waste" for data you have no use for. She also defined "orphaned" data as data that has been archived but is not actively owned or managed by anyone.
You need policies for retention, deletion, legal hold, and access. Most people forget to include access policies. How are people dealing with data and retention policies? Here were the poll results:
The analyst predicts that half of all applications running today will be retired by 2020. Tools like "IBM InfoSphere Optim" can help with application retirement by preserving both the data and metadata needed to make sense of the information after the application is no longer available. App retirement has a strong ROI.
Another problem is that there is data growth in unstructured data, but nobody is given the responsibility of "archivist" for this data, so it goes un-managed and becomes a "dumping ground". Long-term retention involves hardware, software and process working together. The reason that purpose-built archive hardware (such as IBM's Information Archive or EMC's Centera) fell short was that companies failed to put the appropriate software and process in place to complete the solution.
Cloud computing will help. The analyst estimates that 40 percent of new email deployments will be done in the cloud, such as IBM LotusLive, Google Apps, and Microsoft Office 365. This offloads the archive requirement to the public cloud provider.
A case study is the University of Minnesota Supercomputing Institute, which has three tiers for its storage: 136TB of fast storage for scratch space, 600TB of slower disk for project space, and 640TB of tape for long-term retention.
What are people using today to hold their long-term retention data? Here were the poll results:
The bottom line is that retention of aging data is a business problem, a technology problem, an economic problem, and a 100-year problem.
A Case Study for Deploying a Unified 10G Ethernet Network
Brian Johnson from Intel presented the latest developments on 10Gb Ethernet. Case studies from Yahoo and NASA, both members of the [Open Data Center Alliance], found that upgrading from 1Gb to 10Gb Ethernet was more than just an improvement in speed. Other benefits include:
45 percent reduction in energy costs for Ethernet switching gear
80 percent fewer cables
15 percent lower costs
doubled bandwidth per server
Ruiping Sun, from Yahoo, found that 10Gb FCoE achieved 920 MB/sec, which was 15 percent faster than the 8Gb FCP they were using before.
IBM, Dell and other Intel-based servers support Single Root I/O Virtualization, or SR-IOV for short. NASA found that cloud-based HPC is feasible with SR-IOV. Using IBM General Parallel File System (GPFS) and 10Gb Ethernet, they were able to replace a previous environment based on 20 Gbps DDR InfiniBand.
While some companies are still arguing over whether to implement a private cloud, an archive retention policy, or 10Gb Ethernet, other companies have shown great success moving forward!
Well, it's Tuesday again, and you know what that means! IBM Announcements!
Today also happens to be [Election Day] in the United States, and some have questioned IBM's logic of making major storage announcements on Election Day. During the campaigns, a major theme was to help Small and Medium size businesses, because these are the engines of economic growth and improved employment.
Hopefully, you all saw today's Launch Webcast on these announcements, but in case you missed it because you were waiting in line at the polling station to cast your vote, or were caught without electricity or Internet access from [Superstorm Sandy], it is now available [On-Demand].
The 2U control enclosure can have up to four additional 2U expansion enclosures, for a maximum of 120 drives, or 180TB of raw disk capacity. Like the Storwize V7000, the Storwize V3700 supports a [large number of servers and operating systems.]
Many of the features you already know from the Storwize V7000 are carried forward. Here is how the two systems compare, Storwize V7000 first and Storwize V3700 second:
Host interfaces: 1GbE iSCSI + 8Gb FC, versus 8Gb FC, 10GbE iSCSI/FCoE, with a Statement of Direction for 6Gb SAS
Cache: 8GB per canister, versus 4GB per canister, upgradeable to 8GB
Scalability: up to 4 control enclosures in a clustered system, each with up to 9 expansion enclosures, versus up to 4 expansion enclosures
Maximum number of drives/TB: up to 120 drives/180TB on the Storwize V3700
RAID levels supported
Management interfaces: GUI, CLI, SMI-S API on both
Storage virtualization: internal (included) and external (optional), versus internal only (included)
Non-disruptive data migration: one-directional (migrate to Storwize V3700, included); Statement of Direction
FlashCopy: up to 256 targets (included), versus up to 64 targets (included) with a Statement of Direction for optional 2,040 targets
Remote mirror: Metro Mirror and Global Mirror (optional), versus a Statement of Direction (optional)
The IBM Storwize V3700 is offered with attractive leasing options through IBM Global Financing.
IBM LTO-6 drives and midrange tape libraries
Last month, IBM's [Tape and Storage Hypervisor Announcements] included LTO-6 for the enterprise-class TS3500 tape library. Today, the LTO-6 support is complete with support for midrange tape drives and libraries.
There are two tape drive models. The TS2260 is based on the half-height drive, intended for occasional 9-to-5 usage. The TS2360 is based on the full-height drive, intended for 24x7 access. These drives can read LTO-4 and LTO-5 tape cartridge media, and can write LTO-5 cartridge media. The new LTO-6 tape cartridge media is expected to be available next month.
In addition to the IBM TS3500 Enterprise Tape Library, LTO-6 is now supported on all of the midrange tape libraries: TS2900, TS3100, TS3200 and TS3310.
IBM Linear Tape File System Library Edition V2.1.2
There are two levels of [Linear Tape File System], or LTFS for short. The first is the Single Drive Edition (LTFS-SDE), which allows you to attach an LTO-5, LTO-6 or TS1140 tape drive to a single workstation, and mount tape cartridges as easily as mounting USB memory sticks. This presents a full file system view that allows you to read, edit, create, and even drag-and-drop files to other file systems. The LTFS-SDE driver is available for Windows, Linux, and Mac OS.
The second is the Library Edition (LTFS-LE), which allows you to mount the entire tape library as a file system. Each tape cartridge in the library is presented as a subdirectory folder, that you can access like any file system on disk. This was only available for Linux systems, which could then export the files through NFS, FTP or HTTP protocols to other clients. Now, with release v2.1.2, LTFS-LE supports Windows servers, so that you can share the files with other clients through CIFS as well.
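For the single-drive case, mounting a cartridge on Linux is nearly a one-liner; this is a sketch with an assumed device name, so check your own /dev/sg* entries and the LTFS documentation for your drive first:

```shell
# Mount an LTO tape cartridge with the LTFS-SDE driver (device name assumed)
mkdir -p /mnt/ltfs
ltfs -o devname=/dev/sg3 /mnt/ltfs

# The cartridge now behaves like any other file system
cp report.pdf /mnt/ltfs/
ls /mnt/ltfs

# Unmount when done, so the index is written back to the tape
umount /mnt/ltfs
```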
Continuing my series on a [Laptop for Grandma], I thought I would pursue some of the "low-RAM" operating system choices. Grandma's Thinkpad R31 has only 384MB of RAM.
All of the ones below are based on Linux. For those who aren't familiar with installing or running the Linux operating system, here are some helpful tips:
Most Linux distributors allow you to download an ISO file for free. These can be either (a) burned to a CD, (b) burned to a DVD, or (c) written to a USB memory stick.
The ISO can be either a "LiveCD/LiveDVD" version, an installation program, or a combination of the two. The "Live" version allows you to boot up and try out the operating system without modifying the contents of your hard drive. Windows and Mac OS users can try out Linux without impact to their existing environment. Some Linux distributions offer both a full LiveCD+Installer version, as well as an alternate text-based Installer-only version. The latter often requires less RAM to use.
When installing, it is best to have the laptop plugged in to an electrical outlet, and hard-wired to the internet in case it needs to download the latest drivers for your particular hardware.
A CD can hold only 700MB. Many of the newer Linux distributions exceed that, requiring a DVD or USB stick instead. If your laptop has an older optical drive, it may not be able to read DVD media, and some older optical drives can only read CDs, not burn them. In my case, I burned the CDs on another machine, and then used them on grandma's Thinkpad R31.
To avoid burning "a set of coasters" when trying out multiple choices, consider using rewriteable optical media, or the USB option. If you don't like a distribution, you can re-use the media for something else.
The program [Unetbootin] can take most ISO files and write them to a bootable USB stick. On my Red Hat Enterprise Linux 6 laptop, I had to also install p7zip and p7zip-plugins first.
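If you would rather skip Unetbootin, most modern "hybrid" ISO images can be written straight to the stick with dd; this sketch assumes a hybrid ISO and uses a placeholder device name, and it will erase the stick, so double-check the device with lsblk before running anything:

```shell
# Write the ISO image directly to the USB stick (DESTROYS its contents;
# /dev/sdX is a placeholder -- confirm the real device name first)
dd if=distro.iso of=/dev/sdX bs=4M conv=fsync
sync
```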
The BIOS on some older machines, like my grandma's Thinkpad R31, cannot boot from USB. The [PLoP Boot Manager] allows you to first boot from floppy or CD-ROM, and then allows you to boot from the USB. This worked great on my grandma's system. The PLoP Boot Manager is also available on the [Ultimate Boot CD].
While I am a big fan of SUSE, Red Hat, and Ubuntu, these all require more RAM than available on grandma's laptop. Here are some Low-RAM alternatives I tried:
Damn Small Linux 4.11 RC2
The Damn Small Linux [DSL] project had been dormant since 2008, but has a fresh new release for 2012. This baby can run in as little as 16MB of RAM! If you have 128MB of RAM or more, the OS can run entirely from RAM, providing much faster performance.
Of course, there are always trade-offs, and in this case, apps were chosen for their size and memory footprint, not necessarily for their user-friendliness and eye candy. For example, xMMS plays MP3 music, but I did not find it as friendly as iTunes or Rhythmbox.
Boot time is fast. From hitting the power-on button to the first note of MP3 music took about 1 minute.
Installing DSL Linux on the hard drive converts it into a Debian distribution, which then allows more options for applications.
Next up was [MacPup]. The latest version is 529, based on Puppy Linux 5.2.60 Precise, compatible with Ubuntu 12.04 Precise Pangolin. While traditional Puppy Linux clutters the screen with apps, MacPup tries to have the look-and-feel of Mac OS by having a launcher tray at the bottom center of the screen.
Both MacPup and Puppy Linux can run in very small amounts of RAM and disk space. Like DSL above, you can opt to run MacPup entirely in 128MB of RAM. Unfortunately, the trade-off is a lack of application choices.
Installation to the hard drive was quite involved, certainly not for the beginner. First, you have to use Gparted to partition the disk. I created a 19GB partition (sda1) for my files, and 700MB (sda5) for swap. I had trouble with the "ext4" file system, so re-formatted to "ext3". Second, you have to copy the files over from the LiveCD using the "Puppy Universal Installer". Third, you have to set up the bootloader. Grub didn't work, so I installed Grub4Dos instead.
The music app is called "Alsa Player", and I was able to drag its icon into the startup tray. Time-to-first-note was just over 1 minute. Fast, but not as "simple-to-use" as I would like.
SliTaz 4.0 claims to be able to run in as little as 48MB of RAM and 100MB of disk space. Time-to-first-note was similar to MacPup, but I didn't care for the TazPanel for setup, and the TazPkg for installing a limited set of software packages. I could not get Wi-Fi working at all on SliTaz, and just gave up trying.
All three of these ran on grandma's Thinkpad R31, and all three could play MP3 music. However, I was concerned that they were not as simple to use as grandma would like, and about the amount of time and effort I might have to spend if things go wrong.
Over on the Tivoli Storage Blog, there is an exchange over the concept of a "Storage Hypervisor". This started with fellow IBMer Ron Riffe's blog post [Enabling Private IT for Storage Cloud -- Part I], with a promise to provide parts 2 and 3 in the next few weeks. Here's an excerpt:
"Storage resources are virtualized. Do you remember back when applications ran on machines that really were physical servers (all that “physical” stuff that kept everything in one place and slowed all your processes down)? Most folks are rapidly putting those days behind them.
In August, Gartner published a paper [Use Heterogeneous Storage Virtualization as a Bridge to the Cloud] that observed “Heterogeneous storage virtualization devices can consolidate a diverse storage infrastructure around a common access, management and provisioning point, and offer a bridge from traditional storage infrastructures to a private cloud storage environment” (there’s that “cloud” language). So, if I’m going to use a storage hypervisor as a first step toward cloud enabling my private storage environment, what differences should I expect? (good question, we get that one all the time!)
The basic idea behind hypervisors (server or storage) is that they allow you to gather up physical resources into a pool, and then consume virtual slices of that pool until it’s all gone (this is how you get the really high utilization). The kicker comes from being able to non-disruptively move those slices around. In the case of a storage hypervisor, you can move a slice (or virtual volume) from tier to tier, from vendor to vendor, and now, from site to site all while the applications are online and accessing the data. This opens up all kinds of use cases that have been described as “cloud”. One of the coolest is inter-site application migration.
A good storage hypervisor helps you be smart.
Application owners come to you for storage capacity because you’re responsible for the storage at your company. In the old days, if they requested 500GB of capacity, you allocated 500GB off of some tier-1 physical array – and there it sat. But then you discovered storage hypervisors! Now you tell that application owner he has 500GB of capacity… What he really has is a 500GB virtual volume that is thin provisioned, compressed, and backed by lower-tier disks. When he has a few data blocks that get really hot, the storage hypervisor dynamically moves just those blocks to higher tier storage like SSD’s. His virtual disk can be accessed anywhere across vendors, tiers and even datacenters. And in the background you have changed the vendor storage he is actually sitting on twice because you found a better supplier. But he doesn’t know any of this because he only sees the 500GB virtual volume you gave him. It’s 'in the cloud'."
"Let’s start with a quick walk down memory lane. Do you remember what your data protection environment looked like before virtualization? There was a server with an operating system and an application… and that thing had a backup agent on it to capture backup copies and send them someplace (most likely over an IP network) for safe keeping. It worked, but it took a lot of time to deploy and maintain all the agents, a lot of bandwidth to transmit the data, and a lot of disk or tapes to store it all. The topic of data protection has modernized quite a bit since then.
Fast forward to today. Modernization has come from three different sources – the server hypervisor, the storage hypervisor and the unified recovery manager. The end result is a data protection environment that captures all the data it needs in one coordinated snapshot action, efficiently stores those snapshots, and provides for recovery of just about any slice of data you could want. It’s quite the beautiful thing."
At this point, you might scratch your head and ask "Does this Storage Hypervisor exist, or is this just a theoretical exercise?" The answer, of course, is "Yes, it does exist!" Just as VMware offers vSphere and vCenter, IBM offers block-level disk virtualization through the SAN Volume Controller (SVC) and Storwize V7000 products, with full management support from Tivoli Storage Productivity Center Standard Edition.
SVC has supported every release of VMware since version 2.5. IBM is the leading reseller of VMware, so it makes sense for IBM and VMware development to collaborate and make sure all the products run smoothly together. SVC presents volumes that can be formatted with the VMFS file system to hold your VMDK files, accessible via the FCP protocol. IBM and VMware have some key synergies:
Management integration with Tivoli Storage Productivity Center and VMware vCenter plug-in
VAAI support: Hardware-assisted locking, hardware-assisted zeroing, and hardware-assisted copying. Some of the competitors, like EMC VPLEX, don't have this!
Space-efficient FlashCopy. Let's say you need 250 VM images, all running a particular level of Windows. Boot volumes of 20GB each would consume 5,000GB (5TB) of capacity. Instead, create a Golden Master volume, then take 249 copies with space-efficient FlashCopy, which only consumes space for the modified portions of the new volumes. For each copy, make the necessary changes, like a unique hostname and IP address, changing only a few blocks of data each. The end result? 250 unique VM boot volumes in less than 25GB of space, a 200:1 reduction!
Support for VMware's Site Recovery Manager using SVC's Metro Mirror or Global Mirror features for remote-distance replication.
Data center federation. SVC allows you to seamlessly do vMotion from one datacenter to another using its "stretched cluster" capability. Basically, SVC makes a single image of the volume available to both locations, and stores two physical copies, one in each location. You can lose either datacenter and still have uninterrupted access to your data. VMware's HA or Fault Tolerance features can kick in, same as usual.
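The space-efficient FlashCopy arithmetic above can be sketched out in a few lines. The per-clone change size below is an assumption (roughly 20MB of modified blocks per clone, enough for a hostname and IP address) chosen to match the "less than 25GB" figure; real change rates will vary with the workload:

```python
# Illustrative arithmetic for the space-efficient FlashCopy example above.
# CHANGED_GB_PER_CLONE is an assumed figure, not a product specification.
NUM_VMS = 250
BOOT_VOL_GB = 20
CHANGED_GB_PER_CLONE = 0.02  # assumed ~20MB of unique blocks per clone

full_copies_gb = NUM_VMS * BOOT_VOL_GB                             # traditional provisioning
flashcopy_gb = BOOT_VOL_GB + (NUM_VMS - 1) * CHANGED_GB_PER_CLONE  # golden master + deltas

print(full_copies_gb)                        # 5000
print(round(flashcopy_gb, 2))                # 24.98
print(round(full_copies_gb / flashcopy_gb))  # 200 (the 200:1 reduction)
```

The point of the sketch is that total space grows with the *changed* data, not the number of clones.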
But unlike tools that work only with VMware, IBM's storage hypervisor works with a variety of server virtualization technologies, including Microsoft Hyper-V, Xen, OracleVM, Linux KVM, PowerVM, z/VM and PR/SM. This is important, as a recent poll on the Hot Aisle blog indicates that [44 percent run 2 or more server hypervisors]!
Join the conversation! The virtual dialogue on this topic will continue in a [live group chat] this Friday, September 23, 2011 from 12 noon to 1pm EDT. Join me and about 20 other top storage bloggers, key industry analysts and IBM Storage subject matter experts to discuss storage hypervisors and get questions answered about improving your private storage environment.
I took over a hundred pictures at this event. Here are a few of my favorites from Monday and Tuesday.
The IBM Booth #1111 Moscone South
I spent most of my time at the booth in the exhibition area. It was a huge booth, covering various software offerings in the front, and servers and storage systems in the back. Here I am next to the "IBM Watson" simulator, allowing people to play the Jeopardy! game against Watson.
In the front was "EoS" which stands for "Exchanging Opinions for Solutions" -- an interactive screen developed by Somnio that allows people to enter questions and opinions and get crowd-sourced answers from people following the Twitter stream. The EoS was connected to the [IBM Mobile App] so people could follow the conversation.
IBM Customer Appreciation Events
On Monday evening we had some customer appreciation events. First was for IBM customers of "JD Edwards", which runs on "IBM i" operating system on POWER servers. This was an elegant affair at the [Weinstein Gallery] surrounded by works of art by Pablo Picasso and Marc Chagall. One customer expressed concern that Oracle would functionally stabilize JD Edwards "World" software and force everyone to move over to "Enterprise One". I told him that I had seen the roadmap for "World" and there are three healthy releases planned for its future. He should have nothing to worry about. IBM and Oracle will work together to make sure our mutual customers get the solutions they need.
Later, we went to the "Infusion" bar for another "IBM appreciation" event with a live band. Here's a Polaroid photo taken of me in the crowd.
Titan Gala Award Reception
On Tuesday night, Oracle gave out awards in 29 categories. IBM won three this year. I took a photo with the ladies from Beach Blanket Babylon, and a mermaid! Joining me to celebrate the awards were IBMers Carolann Kohler, Boyd Fenton, Sue Haad, and Susan Adomovich.
This was my first time attending Oracle OpenWorld, so I naively asked why there were only 29 categories and not an even 30. The IBMers joked that the 30th might as well have been "Best Server/Storage Platform for Integer Math", which Larry Ellison conceded IBM's POWER 795 server would win over Oracle's new SPARC T4 Supercluster. As Larry said during his keynote, "We still have some work to do to beat IBM!"
The event was held at San Francisco City Hall, with lavish food and drink, and I got to walk on the red carpet. I was even given a hand-rolled cigar! Thank you, Oracle! We are proud to be your "Diamond Partner", helping our mutual customers get the most out of our solutions.
The "Booth Babes" Controversy
At the EMC booth, these three lovely ladies, Jennifer, Tamara and Manuela, were just a few of the dozen so-called booth babes EMC hired from a local agency. Attendees with technical questions were directed to the EMC guys in the back of the booth, behind the wall.
IBM stopped using "booth babes" a long while ago. At IBM Booth #1111, we had a healthy balance of real men and women executives, technical experts, and support staff at the IBM booth.
A guy from EMC came over to our booth later to explain that EMC is at two other events this same week, and their technical staff is spread thin. EMC is a small company, and skilled technical people are in short supply. We get it. Not every IT vendor has an army of experts in every category like IBM.
I want to thank the IBM-Oracle Alliance team, especially Nancy Spurry and Carolann Kohler for having me involved in these events.
Last week, in Computer Technology Review's article [Tiering: Scale Up? Scale Out? Do Both], Mark Ferelli interviews fellow blogger Hu Yoshida, CTO of Hitachi Data Systems (HDS). Here's an excerpt:
"MF/CTR: A global cache should be required to implement that common pool that you’re talking about going across all tiers.
Hu/HDS: Right. So that is needed to get to all the resources. Now with our system, we can also attach external storage behind it for capacity so that as the storage ages out or becomes less active we can move it to the external storage. They would certainly have less performance capability, but you don’t need it for the stale data that we’re aging down. Right now we’re the only vendor that can provide this type of tiering.
If you look at other people who do virtualization like IBM’s SVC, the SVC has no storage within it because it’s sitting so if you attach any storage behind it, there is some performance degradation because you have this appliance sitting in front. That appliance is also very limited in cache and very limited in the number of storage boards on it. It cannot really provide you additional performance than what is attached behind it. And in fact, it will always degrade what is attached behind it because it’s not storage, where as our USP is storage and it has a global cache and it has thousands of port connections, load balancing and all that. So our front end can enhance existing storage that sits behind it."
This is not the first time I have had to correct misperceptions about IBM's SAN Volume Controller (SVC) from Hu and others. This month marks my four-year "blogoversary", and I seem to spend a large portion of my blogging time setting the record straight. Here are just a few of my favorite posts about SVC from back in 2007:
Since day one, the SAN Volume Controller has focused primarily on virtualizing external storage. Initially, the early models had just battery-protected DRAM cache memory, but the most recent model of the SVC, the 2145-CF8, adds support for internal SLC NAND flash solid-state drives. To fully appreciate how SVC can help improve the performance of the disks it manages, I need to use some visual aids.
In this first chart, we look at a 70/30/50 workload. This indicates that 70 percent of the IOPS are reads, 30 percent writes, and 50 percent can be satisfied as cache hits directly from the SVC. For the reads, this means that 50 percent are read-hits satisfied from SVC DRAM cache, and 50 percent are read-miss that have to get the data from the managed disk, either from the managed disk's own cache, or from the actual spinning drives inside that managed disk array.
For writes, all writes are cache-hits, but some of them will be destaged to the managed disk. Typically, we find that a third of writes are over-written before this happens, so only two-thirds are written down to managed disk.
In this example, the SVC reduced the burden of the managed disk from 100,000 IOPS down to 55,000, which is 35,000 reads and 20,000 writes. Some have argued against putting one level of cache (SVC) in front of another level of cache (managed disk arrays). However, CPU processor designers have long recognized the value of hierarchical cache with L1, L2, L3 and sometimes even L4 caches. The cache-hits on SVC are faster than most disk system's cache-hits.
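The 70/30/50 arithmetic above can be checked in a few lines. The two-thirds destage fraction follows from the "a third of writes are over-written before destage" observation in the text:

```python
# Back-end offload arithmetic for the 70/30/50 workload described above:
# 70% reads, 30% writes, 50% of reads satisfied from SVC DRAM cache,
# and roughly one-third of writes overwritten in cache before destage.
TOTAL_IOPS = 100_000
READ_FRACTION = 0.70
READ_HIT_RATIO = 0.50

reads = TOTAL_IOPS * READ_FRACTION   # 70,000 read IOPS
writes = TOTAL_IOPS - reads          # 30,000 write IOPS

read_misses = reads * (1 - READ_HIT_RATIO)  # 35,000 read-misses go to managed disk
destaged_writes = writes * 2 / 3            # 20,000 writes destaged to managed disk

backend_iops = read_misses + destaged_writes
print(int(backend_iops))  # 55000 IOPS seen by the managed disk arrays
```

So the managed disk arrays see only 55,000 of the original 100,000 IOPS, matching the figures in the paragraph above.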
This is a Ponder curve, mapping millisecond response (MSR) times for different levels of I/O per second, named after the IBM scientist John Ponder who created them. Most disk array vendors publish similar curves for each of their products. In this case, we see that 100,000 IOPS would cause a 25 millisecond response (MSR) time, but when the load is reduced to 55,000 IOPS, the average response time drops to only 7 msec.
To be fair, the SVC does introduce 0.06 msec of additional latency on read-misses, so let's call this 7.06 msec. This tiny amount of latency could be what Hu Yoshida was referring to when he said there was "some performance degradation". There are other storage virtualization products in the market that do not provide caching to boost performance, but rather just map incoming requests to outgoing requests, and these can indeed slow down every I/O they process. Perhaps Hu was thinking of those instead of IBM's SVC when he made his comments.
Of course, not all workloads are 70/30/50, and not every disk array is driven to its maximum capability, so your mileage may vary. As we slide down to the left of the curve where things are flatter, the performance improvement diminishes.
[Chart: IOPS and millisecond response (MSR), before and after SVC]
Hitachi's offerings, including the HDS USP-V, USP-VM and their recently announced Virtual Storage Platform (VSP), also sold by HP under the name P9500, have a similar architecture to the SVC and can offer similar benefits, but oddly the Hitachi engineers have decided to treat externally attached storage as second-class citizens. Hu mentions data that "ages out or becomes less active we can move it to the external storage." IBM has chosen not to impose this "caste" system on the design of the SAN Volume Controller.
The SVC has been around since 2003, before the USP-V came to market, and IBM has sold over 20,000 SVC nodes over the past seven years. The SVC can indeed improve the performance of managed disk systems, in some cases by a substantial amount. The 0.06 msec latency on read-miss requests represents less than 1 percent of total response time in production workloads. SVC nearly always improves performance, and in the worst case provides the same performance with added functionality and flexibility. The performance boost often comes as a delightful surprise to people who start using the SVC.
To learn more about IBM's upcoming products and how IBM will lead in storage this decade, register for next week's webcast "Taming the Information Explosion with IBM Storage" featuring Dan Galvan, IBM Vice President, and Steve Duplessie, Senior Analyst and Founder of Enterprise Storage Group (ESG).
Miles per Gallon measures an efficiency ratio (amount of work done with a fixed amount of energy), not a speed ratio (distance traveled in a unit of time).
Given that IOPs and MB/s are the unit of "work" a storage array does, wouldn't the MPG equivalent for storage be more like IOPs per Watt or MB/s per Watt? Or maybe just simply Megabytes Stored per Watt (a typical "green" measurement)?
You appear to be intentionally avoiding the comparison of I/Os per Second and Megabytes per Second to Miles Per Hour?
May I ask why?
This is a fair question, Barry, so I will try to address it here.
It was not a typo, I did mean MPG (miles per gallon) and not MPH (miles per hour). It is always challenging to find an analogy that everyone can relate to explain concepts in Information Technology that might be harder to grasp. I chose MPG because it was closely related to IOPS and MB/s in four ways:
MPG applies to all instances of a particular make and model. Before Henry Ford and the assembly line, cars were made one at a time by a small team of craftsmen, so there could be variety from one instance to another. Today, vehicles and storage systems are mass-produced in a manner that provides consistent quality. You can test one vehicle and safely assume that all instances of the same make and model will have similar mileage. The same is true for disk systems: test one disk system and you can assume that all others of the same make and model will have similar performance.
MPG has a standardized measurement benchmark that is publicly available. The US Environmental Protection Agency (EPA) is an easy analogy for the Storage Performance Council (SPC): both publish results for the various offerings to choose from.
MPG has usage-specific benchmarks to reflect real-world conditions. The EPA offers City MPG for the type of driving you do to get to work, and Highway MPG to reflect the type of driving on a cross-country trip. These serve as a direct analogy to the SPC having SPC-1 for online transaction processing (OLTP) and SPC-2 for large file transfers, database queries and video streaming.
MPG can be used for cost/benefit analysis. For example, one could estimate the amount of business value (miles travelled) for the amount of dollar investment (cost to purchase gallons of gasoline, at an assumed gas price). The EPA does this as part of their analysis. This is similar to the way IOPS and MB/s can be divided by the cost of the storage system being tested in SPC benchmark results. The business value of IOPS or MB/s depends on the application, but could relate to the number of transactions processed per hour, the number of music downloads per hour, or the number of customer queries handled per hour, all of which can be assigned a specific dollar amount for analysis.
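The cost/benefit point above can be made concrete with a small calculation. Every number below is made up for illustration; neither the vehicle figures nor the SPC-1 figures are real results:

```python
# Made-up numbers illustrating the MPG-style cost/benefit analogy above.

# Vehicle: business value (miles) delivered per dollar of fuel
mpg = 30.0
gas_price_per_gallon = 3.00
miles_per_fuel_dollar = mpg / gas_price_per_gallon  # miles per fuel dollar

# Storage: dollars of tested system cost per SPC-1 IOPS delivered
spc1_iops = 100_000
system_cost_usd = 500_000
usd_per_iops = system_cost_usd / spc1_iops          # $/IOPS, as SPC results report

print(miles_per_fuel_dollar)  # 10.0
print(usd_per_iops)           # 5.0
```

In both cases the ratio lets you compare offerings on value delivered per dollar, rather than on raw speed.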
It seemed that if I was going to explain why standardized benchmarks were relevant, I should find an analogy that has similar features to compare to. I thought about MPH, since it is based on time units like IOPS and MB/s, but decided against it based on an earlier comment you made, Barry, about NASCAR:
Let's imagine that a Dodge Charger wins the overwhelming majority of NASCAR races. Would that prove that a stock Charger is the best car for driving to work, or for a cross-country trip?
Your comparison, Barry, to car-racing brings up three reasons why I felt MPH is a bad metric to use for an analogy:
Increasing MPH, and driving anywhere near the maximum rated MPH for a vehicle, can be reckless and dangerous, risking loss of human life and property damage. Even professional race car drivers will agree there are dangers involved. By contrast, processing I/O requests at maximum speed poses no additional risk to the data, nor possible damage to any of the IT equipment involved.
While most vehicles have top speeds in excess of 100 miles per hour, most Federal, State and Local speed limits prevent anyone from taking advantage of those maximums. Race-car drivers in NASCAR may be able to take advantage of a vehicle's maximum MPH, but the rest of us can't. The government limits the speed of vehicles precisely because of the dangers mentioned in the previous bullet. In contrast, processing I/O requests at faster speeds poses no such dangers, so the government imposes no limits.
Neither IOPS nor MB/s match MPH exactly. Earlier this week, I related IOPS to "Questions handled per hour" at the local public library, and MB/s to "Spoken words per minute" in those replies. If I tried to find a metric based on unit type to match the "per second" in IOPS and MB/s, then I would need to find a unit that equated to "I/O requests" or "MB transferred" rather than something related to "distance travelled".
In terms of time-based units, the closest I could come up with for IOPS was the acceleration rate of zero-to-sixty MPH in a certain number of seconds. Speeding up to 60MPH, then slamming the brakes, and then back up to 60MPH, start-stop, start-stop, and so on, would reflect what IOPS is doing on a request-by-request basis, but nobody drives like this (except maybe the taxi cab drivers here in Malaysia!)
Since vehicles are limited to speed limits in normal road conditions, the closest I could come up with for MB/s would be "passenger-miles per hour", such that high-occupancy vehicles like school buses could deliver more passengers than low-occupancy vehicles with only a few passengers.
Neither start-stops nor passenger-miles per hour have standardized benchmarks, so they don't work well for comparison between vehicles. If you or anyone can come up with a metric that will help explain the relevance of standardized benchmarks better than the MPG that I already used, I would be interested in it.
You also mention, Barry, the term "efficiency", but mileage is about "fuel economy". Wikipedia is quick to point out that while the fuel efficiency of petroleum engines has improved markedly in recent decades, this does not necessarily translate into better fuel economy for cars. The same can be said for disk systems: faster internal bandwidth on the backplane between controllers and faster HDDs does not necessarily translate to better external performance of the disk system as a whole. You correctly point this out in your blog about the DMX-4:
Complementing the 4Gb FC and FICON front-end support added to the DMX-3 at the end of 2006, the new 4Gb back-end allows the DMX-4 to support the latest in 4Gb FC disk drives.
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s.
Oh, and yes, it's true - the DMX-4 is not the first high-end storage array to ship a 4Gb/s FC back-end. The USP-V, announced way back in May, has that honor (but only if it meets the promised first shipments in July 2007). DMX-4 will be in August '07, so I guess that leaves the DS8000 a distant 3rd.
This also explains why the IBM DS8000, with its clever "Adaptive Replacement Cache" algorithm, has such high SPC-1 benchmarks despite the fact that it still uses 2Gbps drives inside. Given that it doesn't matter between 2Gbps and 4Gbps on the back-end, why would it matter which vendor came first, second or third, and why call it a "distant 3rd" for IBM? How soon would IBM need to announce similar back-end support for it to be a "close 3rd" in your mind?
I'll wrap up with your excellent comment that Watts per GB is a typical "green" metric. I strongly support the whole "green initiative", and I used "Watts per GB" last month to explain how tape is less energy-consumptive than paper. I see on your blog that you have used it yourself here:
The DMX-3 requires less Watts/GB in an apples-to-apples comparison of capacity and ports against both the USP and the DS8000, using the same exact disk drives
It is not clear if "requires less" means "slightly less" or "substantially less" in this context, and I have no facts from my own folks within IBM to confirm or deny it. Given that tape is orders of magnitude less energy-consumptive than anything EMC manufactures today, the point is probably moot.
I find it refreshing, nonetheless, to have agreed-upon "energy consumption" metrics to make such apples-to-apples comparisons between products from different storage vendors. This is exactly what customers want to do with performance as well, without necessarily having to run their own benchmarks or work with specific storage vendors. Of course, Watts/GB consumption varies by workload, so to make such comparisons truly apples-to-apples, you would need to run the same workload against both systems. Why not use the SPC-1 or SPC-2 benchmarks to measure the Watts/GB consumption? That way, EMC can publish the DMX performance numbers at the same time as the energy consumption numbers, and then HDS can follow suit for its USP-V.
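The apples-to-apples comparison suggested above is simple arithmetic once both systems are measured under the same workload. The two systems and all wattage and capacity figures below are invented for illustration:

```python
# Two hypothetical systems measured under the SAME workload, as the post
# suggests pairing SPC benchmark runs with energy measurements.
system_a = {"watts_under_load": 6_000, "usable_gb": 50_000}
system_b = {"watts_under_load": 9_000, "usable_gb": 60_000}

def watts_per_gb(system: dict) -> float:
    """Energy-consumption metric: Watts drawn per GB of usable capacity."""
    return system["watts_under_load"] / system["usable_gb"]

print(watts_per_gb(system_a))  # 0.12 Watts/GB
print(watts_per_gb(system_b))  # 0.15 Watts/GB
```

With a common workload behind both numbers, the lower Watts/GB figure is directly comparable across vendors.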
I'm on my way back to the USA soon, but wanted to post this now so I can relax on the plane.
Continuing my catch-up on past posts, Jon Toigo, on his DrunkenData blog, posted a ["bleg"] for information about deduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM's view. (Jon, as always, you have my permission to post this on your blog!)
Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTier software. Larry Freeman from NetApp already explains A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
Data agnostic. Diligent does not require content-awareness, format-awareness nor identification of backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which has the best opportunity for deduplication benefits. The two main benefits are:
Keeping more backup data available online for fast recovery.
Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that theirs is best because it collapses two functions — de-dupe then ingest — into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
Will this meet my current business requirements?
Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine the necessary ingest rate: how many Terabytes per Hour (TB/h) must be received, processed, and stored from the backup software during the backup window? IBM intends to document its performance test results for specific software/hardware combinations to provide guidance for clients' purchase and planning decisions.
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what is already stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from 2KB to 16MB in size.
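A toy sketch can illustrate the extent-pointer idea. This is a simplification, not the actual Diligent algorithm: real extents are variable-sized (2KB to 16MB) and matching uses an in-memory dictionary, whereas this toy uses tiny fixed chunks and a naive byte search:

```python
# Toy extent-based dedupe sketch (NOT the actual Diligent implementation):
# incoming data is diff-compared against the repository; ranges already
# present become (offset, length) pointers, new ranges are appended.
CHUNK = 4  # toy extent size; real extents vary from 2KB to 16MB

repo = bytearray()  # the back-end "disk"

def store(data: bytes):
    """Return the file as a list of extent pointers into the repository."""
    extents = []
    for i in range(0, len(data), CHUNK):
        piece = data[i:i + CHUNK]
        off = repo.find(piece)       # byte-for-byte diff-compare
        if off == -1:                # unique data: append to the repository
            off = len(repo)
            repo.extend(piece)
        extents.append((off, len(piece)))  # duplicate or new, just a pointer
    return extents

def read(extents):
    """Reconstitute the original bytes from extent pointers."""
    return b"".join(bytes(repo[o:o + n]) for o, n in extents)

f1 = store(b"ABCDABCDEFGH")
f2 = store(b"EFGHABCD")  # entirely duplicates of data already stored
assert read(f1) == b"ABCDABCDEFGH"
assert read(f2) == b"EFGHABCD"
print(len(repo))  # 8 bytes stored for 20 bytes written
```

The key property, as in the answer above, is that each file reconstitutes bit-perfectly from its extent pointers even when most of its bytes were never stored twice.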
De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of [non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return a "bit-perfect" copy of the data that was written. There are laws against changing the format: for example, when an original document in Microsoft Word format is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate a bit-perfect copy; certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code calculation as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance for collisions might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, that does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing block stored.
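The verify-after-hash-match behavior described above for A-SIS can be sketched in a few lines. The function name and the single-block-per-digest store are illustrative assumptions, not NetApp's implementation:

```python
import hashlib

store = {}  # digest -> stored block (toy single-entry-per-digest store)

def dedupe_store(block: bytes) -> bool:
    """Return True if block was a verified duplicate, False if newly stored.

    A hash match alone is treated only as a HINT; the byte-for-byte
    comparison is what guarantees no data is silently discarded on a
    hash collision, mirroring the A-SIS verify step described above.
    """
    digest = hashlib.sha256(block).digest()
    existing = store.get(digest)
    if existing is not None and existing == block:  # full verify, not just hash
        return True
    store[digest] = block
    return False

assert dedupe_store(b"customer record 001") is False  # new data, stored
assert dedupe_store(b"customer record 001") is True   # verified duplicate
```

Without the `existing == block` comparison, two different blocks with colliding digests would be silently merged, which is exactly the failure mode the paragraph above warns about.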
Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read the data off the back-end disk, using a variety of forensic tools, and piece together strings of personal information, such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working with open industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems. Until then, clients concerned about encryption should consider OS-based or application-based encryption from the backup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBM Diligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
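To see why per-client encryption keys hurt deduplication, here is a toy sketch. The `toy_encrypt` function is a stand-in (an XOR keystream, NOT real cryptography), purely to show that identical plaintext encrypted under different keys looks like unrelated data to a deduplicator:

```python
import hashlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Stand-in cipher for illustration only: XOR with a key-derived stream.
    # Applying it twice with the same key recovers the plaintext.
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

chunk = b"identical chunk of backup data!!"
ct_a = toy_encrypt(chunk, b"key-for-server-A")
ct_b = toy_encrypt(chunk, b"key-for-server-B")

# Same plaintext, but the deduplicator sees two unrelated byte strings:
assert ct_a != ct_b
# With a single shared key, duplicates remain detectable:
assert toy_encrypt(chunk, b"shared-key") == toy_encrypt(chunk, b"shared-key")
```

This is why encrypting in the backup software with many different keys can sharply reduce the duplicates a downstream appliance can find.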
Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are intact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge from this set were unreadable or misplaced, it might impact access to many TBs of data, or render the entire system unusable.
In the case of 1 PB of back-end disk for IBM Diligent, you would have to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives working in parallel, the complete process could take several days. This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
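A quick back-of-the-envelope sketch, using assumed LTO-4-class figures (800 GB native per cartridge, 120 MB/s native per drive), shows why the recovery takes days even with many drives:

```python
# Back-of-the-envelope restore math for re-inflating 1 PB from tape.
# The cartridge and drive figures are illustrative assumptions only.
TOTAL_GB = 1_000_000          # 1 PB of back-end disk, expressed in GB
CART_GB = 800                 # assumed native capacity per cartridge
RATE_MBS = 120                # assumed native throughput per drive

cartridges = TOTAL_GB / CART_GB                  # over a thousand tapes
seconds_per_cart = CART_GB * 1000 / RATE_MBS     # time to read one tape
for drives in (1, 12, 24):
    days = cartridges * seconds_per_cart / drives / 86400
    print(f"{drives:2d} drive(s): {days:6.1f} days")
</antml>```

Under these assumptions, even two dozen drives running in parallel still need roughly four days just to stage the tapes back to disk, before the backup software can restore a single file.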
Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supported by analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larry suggests, MS Exchange and SharePoint could be good use cases that represent the possible savings for squeezing out duplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with "update-in-place" applications, that is, workloads with high levels of random writes for non-appending updates. OLTP and database workloads would not benefit from deduplication.
Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced in software first, and then, as implementations mature, move into hardware to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performs LZH compression in software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get an improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
I'm down here in Australia, where the government has been stalled for the past two weeks, a situation known formally as being managed by a [Caretaker government]. Apparently, there is a gap between the outgoing administration and the incoming administration, and the caretaker government is doing as little as possible until the new regime takes over. They are still counting votes, including in some cases dummy ballots known as "donkey votes", the Australian version of the hanging chad. Three independents are also trying to decide which major party they will support to finalize the process.
While we are on the topic of a government stalled, I feel bad for the state of Virginia in the United States. Apparently, one of their supposedly high-end enterprise class EMC Symmetrix DMX storage systems, supporting 26 different state agencies in Virginia, crashed on August 25th and now more than a week later, many of those agencies are still down, including the Department of Motor Vehicles and the Department of Taxation and Revenue.
Many of the articles in the press on this event have focused on what this means for the reputation of EMC. Not surprisingly, EMC says that this failure is unprecedented, but really this is just one in a long series of failures from EMC. It reminds me of the last public failure EMC had, when a dual-controller CLARiiON halted another company's operations a few months ago. There is nothing unique about the physical equipment itself; all IT gear can break or be taken down by some outside force, such as a natural disaster. The real question, though, is why EMC and the State Government haven't been able to restore operations many days after the hardware was fixed.
In the Boston Globe, Zeus Kerravala, a data storage analyst at Yankee Group in Boston, is quoted as saying that such a high-profile breakdown could undermine EMC’s credibility with large businesses and government agencies. “I think it’s extremely important for them,’’ said Kerravala. “When you see a failure of this magnitude, and their inability to get a customer like the state of Virginia up and running almost immediately, all companies ought to look at that and raise their eyebrows.’’
Was the backup and disaster recovery solution capable of the scale and service level requirements needed by vital state agencies? Had they tested their backups to ensure they were running correctly, and had they tested their recovery plans? Were they monitoring the success of recent backup operations?
Eventually, the systems will be back up and running, fines and penalties will be paid, and perhaps the guy who chose to go with EMC might feel bad enough to give back that new set of golf clubs, or whatever ridiculously expensive gift EMC reps might offer to government officials these days to influence the purchase decision making process.
(Note: I am not accusing any government employee in particular working at the state of Virginia of any wrongdoing, and mention this only as a possibility of what might have happened. I am sure the media will dig into that possibility soon enough during their investigations, so no sense in me discussing that process any further.)
So what lessons can we learn from this?
Lesson 1: You don't just buy technology, you also are choosing to work with a particular vendor
IBM stands behind its products. Choosing a product strictly on its speeds and feeds misses the point. A study IBM and Mercer Consulting Group conducted back in 2007 found that only 20 percent of the purchase decision for storage was based on technical capabilities. The other 80 percent came from what the study called "wrapper attributes", such as who the vendor was, their reputation, and the service, support and warranty options.
Lesson 2: Losing a single disk system is a disaster, so disaster recovery plans should apply
IBM has a strong Business Continuity and Recovery Services (BCRS) group to help companies and government agencies develop their BC/DR plans. In the planning process, various possible incidents are identified, recovery point objectives (RPO) and recovery time objectives (RTO) are set, and then appropriate action plans are documented on how to deal with them. For example, if the state of Virginia had an RPO of 48 hours and an RTO of 5 days, then when the failure occurred on August 25, they could have recovered up to August 23 level data (48 hours prior to the incident) and been up and running by August 30 (five days after the incident). I don't personally know what RPO and RTO they planned for, but certainly it seems they have missed it by now.
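The RPO/RTO arithmetic in that example can be sketched as (using the hypothetical 48-hour RPO and 5-day RTO figures above):

```python
from datetime import datetime, timedelta

def recovery_window(incident, rpo_hours, rto_days):
    """Return (oldest guaranteed-recoverable data point, latest
    acceptable return-to-service time) for a planned RPO and RTO."""
    return (incident - timedelta(hours=rpo_hours),
            incident + timedelta(days=rto_days))

incident = datetime(2010, 8, 25)
data_as_of, back_up_by = recovery_window(incident, rpo_hours=48, rto_days=5)
print(data_as_of.date(), back_up_by.date())   # 2010-08-23 2010-08-30
```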
Lesson 3: BC/DR Plans only work if you practice them often enough
Sadly, many companies and government agencies make plans but never practice them, so they have no idea if the plans will work as expected, or if they are fundamentally flawed. Just as fire drills force everyone to stop what they are doing and vacate the office building, anyone with an IT department needs to practice BC/DR plans often enough to ensure the plan itself is solid, and so that the people involved know what to do and their respective roles in the recovery process.
Lesson 4: This can serve as a wake-up call to consider Cloud Computing as an alternative option
Are you still doing IT in your own organization? Do you feel all of the IT staff have been adequately trained for the job? If your biggest disk system completely failed, not just a minor single or double drive failure, but a huge EMC-like failure, would your IT department know how to recover in less than five days? Perhaps this will serve as a wake-up call to consider alternative IT delivery options. The advantage of big Cloud Service Providers (Microsoft, Google, Yahoo, Amazon, SalesForce.com and of course, IBM) is that they are big enough to have worked out all the BC/DR procedures, and have enough resources to switch over to in case any individual disk system fails.
IBM Storage Strategy for the Smarter Computing Era
I presented this session on Thursday morning. It is a session I give frequently at the IBM Tucson Executive Briefing Center (EBC). IBM launched the [Smarter Computing initiative at IBM Pulse conference]. My presentation covered the role of storage in Business Analytics, Workload Optimized Systems, and Cloud Computing.
Layer 8: Cloud Computing and the new IT Delivery Model
Ed Batewell, IBM Field Technical Support Specialist, presented this overview on Cloud Computing. The "Layer 8" is a subtle reference to the [7-layer OSI Model] for networking protocols. Ed cites insights from the [2011 IBM Global CIO Survey]. Of the 3000 companies surveyed, 60 percent plan to use or deploy clouds. In the USA, 70 percent of CIOs have significant plans for cloud within the next 3-5 years. These numbers are double the statistics gleaned from the 2009 Global CIO survey. Clouds are one of IBM's big four initiatives, expected to generate $7 billion USD in annual revenue by 2015.
IBM is recognized in the industry as one of the "Big 5" vendors (Google, Yahoo, Microsoft, and Amazon round out the rest). As such, IBM has contributed to the industry a set of best practices known as the [Cloud Computing Reference Architecture (36-page document)]. As is typical for IBM, this architecture is complete end-to-end, covering the three main participants in successful cloud deployments:
Consumers: the people and systems that use cloud computing services
Providers: the people, infrastructure and business operations needed to deliver IT services to consumers
Developers: the people and their development tools that create apps and platforms for cloud computing
IBM is working hard to eliminate all barriers to adoption for Cloud Computing. [Mirage image management] can patch VM images offline to address "Day 0" viruses. [Hybrid Cloud Integrator] can help integrate new Cloud technologies with legacy applications. [IBM Systems Director VMcontrol] can manage VM images from z/VM on the mainframe, to PowerVM on UNIX servers, to VMware, Microsoft, Xen and KVM for x86 servers. IBM's [Cloud Service Provider Platform (CSP2)] is designed for Telecoms to offer Cloud Computing services. IBM CloudBurst is a "Cloud-in-a-Can" optimized stack of servers, storage and switches that can be installed in five days and comes in various "tee-shirt sizes" (Small, Medium, Large and Extra Large), depending on how many VMs you want to run.
Ed mentioned that companies trying to build their own traditional IT applications and environments, in an effort to compete against the cost-effective Clouds, reminded him of Thomas Thwaites' project of building a toaster from scratch. You can watch the [TED video, 11 minutes]:
An interesting project is [Reservoir], in which IBM is working with other industry leaders to develop a way to seamlessly migrate VMs from one location to another, globally, without requiring shared storage, SAN zones or Ethernet subnets. This is similar to how energy companies buy and sell electricity to each other as needed, or the way telecommunications companies allow roaming across each other's networks.
IBM System Networking - Convergence
Jeff Currier, IBM Executive Consultant for the new IBM System Networking group, presented this session on Network Convergence. Storage is expected to grow 44x, from 0.8 [Zettabytes] in 2009 to 35 Zettabytes by the year 2020. The role of the network is growing in importance. IBM refers to this converged lossless Ethernet network as "Convergence Enhanced Ethernet" (CEE), while Cisco uses the term "Data Center Ethernet" (DCE), and the rest of the industry uses "Data Center Bridging" (DCB).
To make this happen, we need to replace the Spanning Tree Protocol [STP], which prevents loops in a multi-hop network configuration, with a new Layer 2 Multipathing (L2MP) protocol. The two contenders for the title are Shortest Path Bridging (IEEE 802.1aq) and Transparent Interconnection of Lots of Links (IETF TRILL).
All roads lead to Ethernet. While FCoE has not caught on as fast as everyone hoped, iSCSI has benefited from all the enhancements to the Ethernet standard. iSCSI works in both lossy and lossless versions of Ethernet, and seems to be the preferred choice for new greenfield deployments for Small and Medium sized Businesses (SMB). Larger enterprises continue to use Fibre Channel (FCP and FICON), but might use single-hop FCoE from the servers to top-of-rack switches. Both iSCSI and FCoE scale well, but FCoE is considered more efficient.
IBM has a strategy, and is investing heavily in these standards, technologies, and core competencies.
My last blog post, [Full Disk Encryption for Your Laptop], explained my decisions relating to Full-Disk Encryption (FDE) for my laptop. Wrapping up my week's theme of Full-Disk Encryption, I thought I would explain the steps involved to make it happen.
Last April, I switched from running Windows and Linux dual-boot, to one with Linux running as the primary operating system, and Windows running as a Linux KVM guest. I have Full Disk Encryption (FDE) implemented using Linux Unified Key Setup (LUKS).
Here were the steps involved for encrypting my Thinkpad T410:
Step 0: Backup my System
Long-time readers know how I feel about taking backups. In my blog post [Separating Programs from Data], I emphasized this by calling it "Step 0". I backed up my system three ways:
Backed up all of my documents and home user directory with IBM Tivoli Storage Manager.
Backed up all of my files, including programs, bookmarks and operating settings, to an external disk drive (I used rsync for this). If you have a lot of bookmarks on your browser, there are ways to dump these out to a file to load them back in the later step.
Backed up the entire hard drive using [Clonezilla].
Clonezilla allows me to do a "Bare Machine Recovery" of my laptop back to its original dual-boot state in less than an hour, in case I need to start all over again.
Step 1: Re-Partition the Drive
"Full Disk Encryption" is a slight misnomer. For external drives, like the Maxtor BlackArmor from Seagate (Thank you Allen!), there is a small unencrypted portion that contains the encryption/decryption software to access the rest of the drive. Internal boot drives for laptops work the same way. I created two partitions:
A small unencrypted partition (2 GB) to hold the Master Boot Record [MBR], Grand Unified Bootloader [GRUB], and the /boot directory. Even though there is no sensitive information on this partition, it is still protected the "old way" with the hard-drive password in the BIOS.
The rest of the drive (318GB) will be one big encrypted Logical Volume Manager [LVM] container, often referred to as a "Physical Volume" in LVM terminology.
Having one big encrypted partition means I only have to enter my ridiculously-long encryption password once during boot-up.
Step 2: Create Logical Volumes in the LVM container
I created three logical volumes on the encrypted physical container: swap, slash (/), and home (/home). Some might question the logic behind putting swap space on an encrypted container, but in theory, swap could contain sensitive information after a system [hibernation]. I separated /home from slash (/) so that in the event I completely fill up my home directory, I can still boot up my system.
Step 3: Install Linux
Ideally, I would have lifted my Linux partition "as is" for the primary OS, and a Physical-to-Virtual [P2V] conversion of my Windows image for the guest VM. Ha! To get the encryption, it was a lot simpler to just install Linux from scratch, so I did that.
Step 4: Install Windows guest KVM image
The folks in our "Open Client for Linux" team made this step super-easy. Select Windows XP or Windows 7, and press the "Install" button. This is a fresh install of the Windows operating system onto a 30GB "raw" image file.
(Note: Since my Thinkpad T410 is Intel-based, I had to turn on the 'Intel (R) Virtualization Technology' option in the BIOS!)
There are only a few programs that I need to run on Windows, so I installed them here in this step.
Step 5: Set up File Sharing between Linux and Windows
In my dual-boot set up, I had a separate "D:" drive that I could access from either Windows or Linux, so that I would only have to store each file once. For this new configuration, all of my files will be in my home directory on Linux, and then shared to the Windows guest via CIFS protocol using [samba].
In theory, I can share any of my Linux directories using this approach, but I decided to share only my home directory. This way, any Windows viruses will not be able to touch my Linux operating system kernels, programs or settings. This makes for a more secure platform.
Step 6: Transfer all of my files back
Here I used the external drive from "Step 0" to bring my data back to my home directory. This was a good time to re-organize my directory folders and do some [Spring cleaning].
Step 7: Re-establish my backup routine
Previously, in my dual-boot configuration, I was using the TSM backup/archive client on the Windows partition to back up my C: and D: drives. Occasionally I would tar a few of my Linux directories and store the tarball on D: so that it got included in the backup process. With my new Linux-based system, I switched over to the Linux version of the TSM client. I had to re-work the include/exclude list, as the files are different on Linux than on Windows.
One of my problems with the dual-boot configuration was that I had to manually boot up in Windows to do the TSM backup, which was disruptive if I was using Linux. With this new scheme, I am always running Linux, and so can run the TSM client any time, 24x7. I made this even better by automatically scheduling the backup every Monday and Thursday at lunch time.
There is no Linux support for my Maxtor BlackArmor external USB drive, but it is simple enough to LUKS-encrypt any regular external USB drive and rsync files over. In fact, I have a fully running (and encrypted) version of my Linux system that I can boot directly from a 32GB USB memory stick. It has everything I need except Windows (the "raw" image file didn't fit).
I can still use Clonezilla to make a "Bare Machine Recovery" image to restore from. However, encrypted data is essentially incompressible, which renders Clonezilla's compression worthless, so the process takes a lot longer and consumes over 300GB of space on my external disk drive.
Backing up my Windows guest VM is just a matter of copying the "raw" image file to another file for safe keeping. I do this monthly, and keep two previous generations in case I get hit with viruses or "Patch Tuesday" destroys my working Windows image. Each is 30GB in size, so it was a trade-off between the number of versions and the amount of space on my hard drive. TSM backup puts these onto a system far away, for added protection.
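A simple generation-rotation script along these lines can automate the copy step. This is a sketch of my keep-two-generations policy; the `windows.raw` filename and the helper itself are illustrative, not part of any IBM tooling:

```python
import os
import shutil

def rotate_backups(image="windows.raw", keep=2):
    """Keep `keep` previous generations of a VM image file:
    image.1 is the newest copy, image.2 the one before it;
    anything older falls off the end."""
    oldest = f"{image}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)                      # discard the oldest generation
    for gen in range(keep - 1, 0, -1):
        src = f"{image}.{gen}"
        if os.path.exists(src):
            os.rename(src, f"{image}.{gen + 1}")  # shift generations down
    shutil.copy2(image, f"{image}.1")          # snapshot the current image
```

Run `rotate_backups()` each month before applying "Patch Tuesday" updates, and the two prior known-good images are always on hand.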
Step 8: Protect your Encryption setup
In addition to backing up your data, there are a few extra things to do for added protection:
Add a second passphrase. The first one is the ridiculously-long one you memorize faithfully to boot the system every morning. The second one is a ridiculously-longer one that you give to your boss or admin assistant in case you get hit by a bus. In the event that your boss or admin assistant leaves the company, you can easily disable this second passphrase without affecting your original.
Backup the crypt-header. This is the small section at the front of the drive that holds the key material for your passphrases; if it gets corrupted, you would not be able to access the rest of your data. Create a backup image file and store it on an encrypted USB memory stick or external drive.
If you are one of the lucky 70,000 IBM employees switching from Windows to Linux this year, Welcome!
Now that the US Recession has been declared over, companies are looking to invest in IT again. To help you plan your upcoming investments, here are some upcoming events in April.
SNW Spring 2010, April 12-15
IBM is a Platinum Plus sponsor at this [Storage Networking World event], to be held April 12-15 at the Rosen Shingle Creek Resort in Orlando, Florida. If you are planning to go, here's what you can go look for:
IBM booth at the Solution Center featuring the DS8700 and XIV disk systems, SONAS and the Smart Business Storage Cloud (SBSC), and various Tivoli storage software
IBM kiosk at the Platinum Galleria focusing on storage solutions for SAP and Microsoft environments
IBM Senior Engineer Mark Fleming presenting "Understanding High Availability in the SAN"
IBM sponsored "Expo Lunch" on Tuesday, April 13, featuring Neville Yates, CTO of IBM ProtecTIER, presenting "Data Deduplication -- It's not Magic - It's Math!"
IBM CTO Vincent Hsu presenting "Intelligent Storage: High Performance and Hot Spot Elimination"
IBM Senior Technical Staff Member (STSM) Gordon Arnold presenting "Cloud Storage Security"
One-on-One meetings with IBM executives
I have personally worked with Mark, Neville, Vincent and Gordon, so I am sure they will do a great job in their presentations. Sadly, I won't be there myself, but fellow blogger [Rich Swain from IBM] will be at the event to blog about all the activities there.
Jim Stallings - General Manager, Global Markets, IBM Systems and Technology Group
Scott Handy - Vice President, WW Marketing, Power Systems, IBM Systems and Technology Group
Dan Galvan - Vice President, Marketing & Strategy, Storage and Networking Systems, IBM Systems and Technology Group
Inna Kuznetsova - Vice President, Marketing and Sales Enablement, Systems Software, IBM Systems and Technology Group
Jeanine Cotter - Vice President, Systems Services, IBM Global Technology Services
The webinar will include client testimonials from various companies as well.
Dynamic Infrastructure Executive Summit, April 27-29
I will be there, at this 2-and-a-half-day [Executive Summit] in Scottsdale, Arizona, to talk to company executives. Discover how IBM can help you manage your ever-increasing amount of information with an end-to-end, innovative approach to building a dynamic infrastructure. You will learn about all of our innovative solutions and find out how you can effectively transform your enterprise for a smarter planet.
Well it's Tuesday again, and you know what that means! IBM Announcements!
For nearly 50 years, IBM has been leading the IT industry with its mainframe servers. Today, IBM announced its 12th generation mainframe in its [System z product family], the IBM zEnterprise EC12, or zEC12 for short. I joined IBM in 1986, and my first job was to work on DFHSM for the MVS operating system. The product is now known as DFSMShsm as part of the Data Facility Storage Management System, and the operating systems went through several name changes: MVS/ESA, OS/390, and lately z/OS. I was the lead architect for DFSMS up until 2001. I then switched to be part of the team that brought Linux to the mainframe. Both of these experiences come in handy as I deal with mainframe storage clients at the Tucson Executive Briefing Center.
Let's take a look at some recent developments over the past few years.
In the 9th and 10th generations (IBM System z9 and z10, respectively), IBM introduced the concept of a large "Enterprise Class", and a small "Business Class" to offer customer choice. These were referred to as the EC and BC models.
For the 12th generation, IBM kept the name "zEnterprise", but went back to "EC" to refer to Enterprise Class. Rather than offering a separate "small" Business Class version, the zEC12 comes in 60 different sub-capacity levels. Many software vendors charge per core, or per [MIPS], so offering sub-capacity means that some portion of the processors are turned off and the software license costs less. The top rating for the zEC12 is 78,000 MIPS. (I would have thought we would have switched over to BIPS by now!)
If you currently have a z10 or z196, then it can be upgraded to zEC12. The zEC12 can attach to up to four zBX model 003 frames that can run AIX, Microsoft Windows and Linux-x86. If you currently have zBX model 002 frames, these can be upgraded to model 003.
The key enhancements reflect the three key initiatives:
Operational Analytics - Most analytics are done after the fact, but IBM zEnterprise can enable operational analytics in real time, such as fraud detection while the person is using the credit card at a retail outlet, or online websites providing real-time suggestions for related products while the person is still adding items to their shopping cart. Operational analytics provides not just the insight, but delivers it in a timely manner that makes it actionable. There is even work in place to [certify Hadoop on the mainframe].
Security and Resiliency - IBM is famous for having the most secure solutions. With industry-leading EAL5+ security rating, it beats out competitive offerings that are typically only EAL4 or lower. IBM has a Crypto Express4S card to provide tamper-proof co-processing for the system. IBM introduces the new "zAware" feature, which is like "Operational Analytics" pointed inward, evaluating all of the internal processes, error logs and traces, to determine if something needs to be fixed or optimized.
Cloud Agile - When people hear the phrase "Cloud Agile" they immediately think of IBM System Storage, but servers can be Cloud Agile also, and the mainframe can run Linux and Java better, faster, and at a lower cost than many competitive alternatives.
Continuing my coverage of the 30th annual [Data Center Conference]. here is a recap of Wednesday morning sessions.
A Data Center Perspective on MegaVendors
The morning started with a keynote session. The analyst felt that the most strategic or disruptive companies of the past few decades were IBM, HP, Cisco, SAP, Oracle, Apple and Google. Of these, he focused on the first three, which he termed the "Megavendors", presented in alphabetical order.
Cisco enjoys high-margins and a loyal customer base with Ethernet switch gear. Their new strategy to sell UP and ACROSS the stack moves them into lower-margin business like servers. Their strong agenda with NetApp is not in sync with their partnership with EMC. They recently had senior management turn-over.
HP enjoys a large customer base and is recognized for good design and manufacturing capabilities. Their challenges are mostly organizational, distracted by changes at the top and an untested and ever-changing vision, shifting gears and messages too often. Concerns over the Itanium have not helped them lately.
IBM defies simple description. One can easily recognize Cisco as an "Ethernet Switch" company, HP as a "Printer" company, and Oracle as a "Database" company, but you can't say that IBM is an "XYZ" company, as it has re-invented itself successfully over its past 100 years, with a strong focus on client relationships. IBM enjoys high margins, a sustainable cost structure, huge resources, a proficient sales team, and is recognized for its innovation with a strong IBM Research division. Their "Smarter Planet" vision has been effective in supporting their individual brands and unlocking new opportunities. IBM's focus on growth markets takes advantage of their global reach.
His final advice was to look for "good enough" solutions that are "built for change" rather than "built to last".
Chris works in the Data Center Management and Optimization Services team. IBM owns and/or manages over 425 data centers, representing over 8 million square feet of floorspace. This includes managing 13 million desktops, and 325,000 x86 and UNIX server images, and 1,235 mainframes. IBM is able to pool resources and segment the complexity for flexible resource balancing.
Chris gave an example of a company that selected a Cloud Compute service provider on the East coast and a Cloud Storage provider on the West coast, both for their low rates, but was disappointed in the latency between the two.
Chris asked "How did 5 percent utilization on x86 servers ever become acceptable?" When IBM is brought in to manage a data center, it takes a "No Server Left Behind" approach to reduce risk and allow for a strong focus on end-user transition. Each server is evaluated for its current utilization:
0 percent: Amazingly, many servers are unused. These are recycled properly.
1 to 19 percent: Workload is virtualized and moved to a new server.
20 to 39 percent: Use IBM's Active Energy Manager to monitor the server.
40 to 59 percent: Add more VMs to this virtualized server.
Over 60 percent: Manage the workload balance on this server.
This approach allows IBM to achieve a 60 to 70 percent utilization average on x86 machines, with an ROI payback period of 6 to 18 months, and 2x-3x increase of servers-managed-per-FTE.
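The consolidation math behind those numbers can be sketched as follows (an illustrative calculation that assumes workloads pack cleanly with no virtualization overhead):

```python
# Rough consolidation arithmetic behind the "No Server Left Behind" results:
# how many 5%-utilized x86 servers can one virtualized host absorb at a
# 65% target utilization?
old_util = 0.05        # typical stand-alone x86 utilization
target_util = 0.65     # midpoint of the 60-70 percent managed average
ratio = target_util / old_util
print(f"Consolidation ratio: {ratio:.0f} : 1")
```

A roughly 13:1 consolidation ratio is what makes the 6-to-18-month ROI payback and the 2x-3x servers-per-FTE improvement plausible.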
Storage is classified using Information Lifecycle Management (ILM) best practices, using automation with pre-defined data placement and movement policies. This allows only 5 percent of data to be on Tier-1, 15 percent on Tier-2, 15 percent on Tier-3, and 65 percent on Tier-4 storage.
Chris recommends adopting IT Service Management, and to shift away from one-off builds, stand-alone apps, and siloed cost management structures, and over to standardization and shared resources.
You may have heard of "Follow-the-sun" but have you heard of "Follow-the-moon"? Global companies often establish "follow-the-sun" for customer service, re-directing phone calls to be handled by people in countries during their respective daytime hours. In the same manner, server and storage virtualization allows workloads to be moved to data centers during night-time hours, following the moon, to take advantage of "free cooling" using outside air instead of computer room air conditioning (CRAC).
Since 2007, IBM has been able to double computer processing capability without increasing energy consumption or carbon gas emissions.
It's Wednesday, Day 3, and I can tell already that the attendees are suffering from "information overload".
Friday - We landed in Paris, France. I have been to Paris many times, but this was a first for Mo. A croissant cost only 2 Euro, but the young woman behind the counter gave me a look of disgust when I asked for a knife and butter to put on the croissant. If you ever get the chance to have a real French croissant, you will realize you don't need any more butter. If you do attempt to put anything on the croissant, it will disintegrate into a million tiny pieces!
2. Visit Ronda
Saturday - We rented a car and drove to the mountain village of [Ronda, Spain], which is in the heart of the region of Spain called Andalucia. Why Ronda? This was where Mo's uncle was stationed during the war. The town is built on two mountains, connected by a set of bridges. The tallest is "Puente Nuevo", built in the 1700s, which is nearly 400 feet tall. Ronda is also home of Spain's oldest Bull Fighting ring. Bars and restaurants built along the cliff offer some spectacular views. Mo and I shared a "Paella Mixta" for lunch, consisting of yellow rice with bits of chicken and seafood.
3. Soak in European Mineral Waters
Sunday - Most things in Europe are closed on Sunday, so we decided to have a "Spa Day" at the [Gran Hotel Benahavis], in Benahavis, Spain. This lovely hotel is built over a natural mineral waters hot spring, and an underground spa allowed us to relax in the warmth. The spa also had a dry sauna, steam sauna, and ice cold water bath to complete the experience.
4. Climb to the Top of the Rock of Gibraltar
Monday - Technically, Gibraltar is a separate country, but they use British money (Pound Sterling). To get to the top of the rock, we drove across their airport runway, saw the mosque at Point Europa, parked in a large parking lot, and took the cable car to the top. From there, we climbed a few more steps to see the grand views of Spain and North Africa, while keeping our distance from the infamous monkeys. These [Barbary Macaque] are cute, but can bite or scratch you if you get too close. Afterwards, we had lunch in a pub called the Angry Friar.
5. See Snake Charmers in Morocco
Tuesday - We took a guided tour over to the Kingdom of Morocco. This included a ferry boat ride from Tarifa, Spain to Tangier, Morocco. A bus then took us to the "Kasbah" (the fort), where we got to see snake charmers perform their act. We had an interesting lunch, followed by obligatory "shopping opportunities" for rugs and spices. Back on the bus, we went to a place to ride camels, see the King's palace, and visit the Grotto of Hercules. The last stop was to sit back and relax with a nice cup of hot mint tea at Cap Spartel, the northernmost point of Morocco.
6. Hang Out at a Mediterranean Beach
Wednesday - Our last full day in Spain, we decided to have lunch on the beach. This region is referred to as Costa Del Sol. We opted for "Playa de la Rada" in Estepona, Spain. The beach was a bit rocky, the sand was hot and uncomfortable to walk on, and the heat and humidity was just slightly less than the steam sauna at the Gran Hotel Benahavis. We stayed in the shade of our beach-side restaurant and had a lunch of grilled sardines and the local Cruzcampo beer.
7. Visit the World of Coca-Cola
Thursday - We drove to Malaga, Spain, and flew back to the United States. Malaga is famous for celebrities like Ernest Hemingway and Pablo Picasso. We could not get all the way back to Tucson, so we stayed overnight in Atlanta.
Friday - This gave us an opportunity to visit the [World of Coca-Cola], where Mo's cousin had done some recent marketing work in celebration of their 125th anniversary. This is a museum with a live bottling operation on display, a 4D movie, viewing areas to see commercials from around the world, and free tastings, sampling some of the 105 different soft drink flavors manufactured. I recommend the Tawney Ginger from Tanzania, and the Simba Guarana from Brazil. I did not care for the Apple-and-Carrot soda from Japan.
8. See a Manta Ray Up Close
Our discount combo tickets included a visit to the [Georgia Aquarium] next door. Mo can't scuba-dive; having been stung by a ray as a kid, she wanted to show me a big Manta Ray up close. The aquarium was quite good, divided up into separate exhibits, including interactive touch-the-fish areas for the kids, Beluga whales, Jellyfish, Seahorses, and a moving sidewalk that takes you underneath the sea life.
I would like to thank Delta Air Lines for letting Mo and me take this trip using frequent flyer miles, Hertz Rental Cars for offering a sweet deal on a tiny Hyundai i20 car, the Gran Hotel Benahavis for their hospitality, and the incredibly warm and helpful people of Atlanta. I am glad that my language skills in French, Spanish and Arabic came in quite handy!