This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems Technical University] events. With over 30 years with IBM Systems, Tony is a frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles at IBM during his 19 plus years at IBM. Lloyd most recently has been leading efforts across the Communication/CSI Market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years Lloyd supported the industry accounts as a Storage Solution architect and prior to that as a Storage Software Solutions specialist during his time in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role in the Washington Systems Center team. His current focus is IBM Cloud Private, and he will be delivering and supporting sessions at Think2019 and Storage Technical University on the value of IBM storage in this high-value IBM solution, which is part of the IBM Cloud strategy. Lloyd maintains Subject Matter Expert status across the IBM Spectrum Storage Software solutions. You can follow Lloyd on Twitter @ldean0558 and LinkedIn Lloyd Dean.
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
Continuing this week's theme of doing important things without leaving town, I present our results for an exciting project I started earlier this year.
For seven weeks, my coworker Mark Haye and I voluntarily led a class of students here in Tucson, Arizona in an after-school pilot project to teach the ["C" programming language] using [LEGO® Mindstorms® NXT robots]. The ten students, boys and girls ages 9 to 14 years old, were already part of the FIRST [For Inspiration and Recognition of Science and Technology] program, and participated in FIRST Lego League [FLL] robot competitions. The students were already familiar with building robots, and with programming them using a simple graphical system of connecting blocks that perform actions. However, to compete in the next level of robot competitions, FIRST Tech Challenge [FTC], we needed to leave this simple graphical programming behind, and upgrade to more precise "C" programming.
Mark is a software engineer for IBM Tivoli Storage Manager and has participated in FLL competitions over the past nine years. This week, he celebrates his 25th anniversary at IBM, and I celebrate my 23rd. The teacher, Ms. Ackerman, and the students referred to us as "Coach Mark" and "Coach Tony".
This was the first time I had worked with LEGO NXT robots. For those not familiar with these robots, you can purchase a kit at your local toy store. In addition to regular LEGO bricks, beams, and plates, there are motors, wheels, and sensors. A programmable NXT brick has three outputs (marked A, B, and C) to control three motors, and four inputs (marked 1, 2, 3, and 4) to receive values from sensors. Programs are written and compiled on laptops and then downloaded to the NXT programmable brick through a USB cable, or wirelessly via Bluetooth.
In the picture shown, an image of the Mars planetary surface is divided into a grid with thick black lines. A light sensor between the front two wheels of the robot is over the black line.
We used the [RobotC programming firmware] and integrated development environment (IDE) from [Carnegie Mellon University]. The idea of this pilot was to see how well the students could learn "C". With only a few hours after class on each Wednesday, could we teach young students "C" programming in just seven weeks?
My contribution? I have taught both high school and college classes, and spent over 15 years programming for IBM, so Mark asked me to help. We started with a basic lesson plan:
A brief history of the "C" language
Understanding statements and syntax
Setting motor speed and direction
Compiling and downloading your first program
Understanding the "while" loop
Retrieving input sensor values
Understanding the "if-then-else" statement
Defining variables with different data types
Manipulating string variables
Writing a program for the robot to track along a black line on a white background.
Understanding local versus global scope variables
Writing a program for a robot to count black lines as it crosses them.
Performing left turns and right turns, and crossing a specific number of lines on a grid pattern to move the robot to a specific location.
Weeks 6 and 7
Mission Impossible: come up with a challenge to make the robot do something that would be difficult to accomplish using the previous NXT visual programming language.
At the completion of these seven weeks, I sat down to interview "Coach Mark" on his thoughts on this pilot project.
This is a practical programming skill. The "C" language is used throughout the world to program everything from embedded systems to operating systems, and even storage software. This would allow the robots to handle more precise movements, more accurate turns, and more complicated missions.
Can kids learn "C" in only seven weeks?
Part of the pilot project was to see how well the students could understand the material. They were already familiar with building the robots, and understood the basics of programming sensors and motors, so we were hoping this was a good foundation to work from. Some kids managed very well, others struggled.
Did everything go according to plan?
The first two weeks went well; turning on motors and having robots move forward and backward were easy enough. We seemed to lose a few students in week 3, and things got worse from there. However, several of the students truly surprised us and managed to implement very complicated missions. We were quite pleased with the results.
What kind of problems did the kids encounter?
The touch sensor required loops that wait for it to be pressed. Motors did not necessarily turn as expected until more advanced methods were used. Making 90 degree left and right turns accurately was more difficult than expected.
Any funny surprises?
Yes, we had a Challenge Map representing the Mars planetary surface from a previous FLL competition that was dark red and divided into squares with thick black lines. An active light sensor returns a value of "0" (complete darkness) to "100" (bright white). However, the Mars surface had craters that were dark enough to be misinterpreted as a black line, causing some unusual results. This required some enhanced programming techniques to resolve.
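The post does not say which techniques the students used, so the following is only a minimal sketch of one common approach: counting lines with two thresholds (hysteresis) instead of one. It is written in Python rather than RobotC so it can run anywhere, and the sample readings and the threshold values 20 and 40 are assumptions chosen for illustration, not measurements from the Mars map.

```python
def count_black_lines(readings, enter_below=20, exit_above=40):
    """Count black lines in a sequence of light-sensor readings (0-100).

    A line starts only when a reading drops below enter_below, and ends
    only when a reading rises above exit_above. Values in between
    (a dark crater, for example) change nothing.
    """
    count = 0
    on_line = False
    for value in readings:
        if not on_line and value < enter_below:
            on_line = True
            count += 1
        elif on_line and value > exit_above:
            on_line = False
    return count


# Hypothetical readings: dark red surface (~45), black lines (~10),
# and one crater (~30) between the two real lines.
readings = [45, 44, 10, 8, 12, 43, 30, 28, 46, 9, 45]

with_hysteresis = count_black_lines(readings)
single_threshold = count_black_lines(readings, enter_below=35, exit_above=35)
print(with_hysteresis, single_threshold)
```

With a single cut-off of 35, the crater is counted as an extra line; with the two-threshold version it is ignored, because the reading never gets dark enough to start a new line.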
Did robots help or hurt the teaching process?
I think they helped. Rather than writing programs that just display "Hello World!" on a computer screen, the students can actually see robots move, and either do what they expect, or not!
And when the robots didn't do what they were expected to?
The students got into "debug" mode. They were already used to doing this from previous FLL competitions, but with RobotC, you can leave the USB cable connected (or use wireless Bluetooth) and actually gather debugging information while the robot is running, to see the value of sensors and other variables and help determine why things are not working properly.
Any applicability to the real world of storage?
We have robots in the IBM System Storage TS3500 tape library. These robots scan bar code labels, pull tapes out of shelves and mount them into drives. The programming skills are the same needed for storage software, such as IBM Tivoli Storage Manager or IBM Tivoli Storage Productivity Center.
The world is becoming smarter, instrumented with sensors, interconnected over a common network, and intelligent enough to react and respond correctly. The lessons of reading sensor values and moving motors can be considered the first step in solutions that help to make a smarter planet.
Just as light bulbs burn out eventually after repeatedly being turned on and off, Flash does not last forever either.
A set of transistors can represent a single bit of information (Single-level cell, or SLC for short), or multiple bits (Multi-level Cell, MLC). MLC typically refers to two bits, with a new "Triple-level cell" or TLC technology able to store three bits per set of transistors.
SLC is faster and can endure more "Program-erase" write cycles, but MLC is less expensive to manufacture and therefore used in most consumer products, like digital cameras, smart phones, music players and USB memory sticks. To learn more on this, see this 6-page IBM whitepaper on [Comparison of NAND Flash Technologies Used in Solid-State Storage].
In between, "Enterprise MLC" (or eMLC for short) refers specifically to a different grade of chips IBM gets from the flash manufacturer. eMLC chips use a similar MLC bit arrangement, but are typically selected from higher bins, and most importantly have much longer program-erase cycle times which yield greater chip endurance, at the expense of long data retention when power is off (but seriously, when is anything off for very long in a data center?)
As a result, eMLC has 10x the endurance of regular MLC, approaching parity with SLC at half the cost!
In the IBM FlashSystem, writes are first buffered in DRAM cache, then written out to the Flash. This helps to further improve the endurance.
For enterprise reliability, each Flash chip on the IBM FlashSystem has Error Correcting Codes (ECC), and then each set of 10 chips is placed in a 9+P RAID-5 configuration.
The chips are sub-divided into 16 planes. In the event a cell fails, the data for that plane can be reconstructed from parity, and written to spare space on the other planes of that same chip set. That plane is then reformatted as an 8+P RAID-5, bypassing the failed plane.
In this manner, a cell failure only results in losing a small portion of one chip. If the same plane suffers another failure on another chip, it will drop down to 7+P, and with further failures to 6+P, 5+P, and finally 4+P. This is known as "Variable Stripe RAID" or VSR for short.
IBM FlashSystem can survive over 1,000 such cell failures without an outage. By comparison, a single cell failure on an SSD often marks the entire drive as a failure.
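To make the parity idea concrete, here is a minimal sketch of single-parity (RAID-5 style) reconstruction and of a stripe narrowing from 9+P to 8+P. It is not IBM's implementation: the block sizes, function names and the simple XOR parity layout are all assumptions chosen for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block (N+P)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recover the block at failed_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


def data_fraction(data_members):
    """Fraction of an N+P stripe that holds data rather than parity."""
    return data_members / (data_members + 1)


# Hypothetical 9+P stripe: nine 4-byte data blocks plus one parity block.
data = [bytes([n] * 4) for n in range(1, 10)]
stripe = make_stripe(data)

# Pretend member 3 has failed, and rebuild its contents from the others.
recovered = rebuild(stripe, 3)

# A narrower stripe over the remaining members: 8 data blocks plus parity.
narrower = make_stripe(data[:8])

for n in range(9, 3, -1):
    print(f"{n}+P keeps {data_fraction(n):.1%} of the stripe for data")
```

Each step down in stripe width gives up a little usable capacity (from 90 percent data at 9+P to 80 percent at 4+P in this simple model) in exchange for continuing to run around the failed plane.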
But wait, there's more. Why stop at just RAID-5 across 10 chips? The chips are organized into modules, and IBM FlashSystem can perform RAID-5 across modules, in a 10+P+S RAID-5 configuration. This is referred to as "Two dimensional RAID" or 2D-RAID for short.
Even if you lost an entire module, the system will automatically rebuild on the spare module, and you can replace the bad one non-disruptively.
Many use cases for all-Flash arrays do not require such high levels of Enterprise reliability. Several of the all-Flash competitors have adopted a "design-for-failure" approach common among Cloud Service Providers like Amazon Web Services.
The idea is to assume that the data stored on them is just a copy from some other storage media. In the event of a Flash failure, it can easily be restored from a mirrored copy or backup.
For the IBM FlashSystem, the newer 800 series are based on eMLC, ideal for the majority of business applications, databases and virtual machine images placed on all-Flash arrays. The older 700 series are based on more expensive SLC, designed specifically for sustained write-intensive workloads.
Within each series, the "tens" models (710, 810) offer RAID-0 striping across ECC and VSR protected modules. For higher levels of availability, the "twenties" models (720, 820) offer ECC, VSR and 2D-RAID protection.
Based on this success, and perhaps because I am also fluent in Spanish, I was asked to help with Proyecto Ceibal, the team for OLPC Uruguay. Normally the XS school server resides at the school location itself, so that even if the internet connection is disrupted or limited, the school kids can continue to access each other and the web cache content until internet connection is resumed. However, with a diverse development team with people in United States, Uruguay, and India, we first looked to Linux hosting providers that would agree to provide free or low-cost monthly access. We spent (make that "wasted") the month of May investigating. Most that I talked to were not interested in having a customized Linux kernel on non-standard hardware on their shop floor, and wanted instead to offer their own standard Linux build on existing standard servers, managed by their own system administrators, or were not interested in providing it for free. Since the XS-163 kernel is customized for the x86 architecture, it is one of those exceptions where we could not host it on an IBM POWER or mainframe as a virtual guest.
This got picked up as an [idea] for Google's [Summer of Code] and we are mentoring Tarun, a 19-year-old student, to act as lead software developer. However, summer was fast approaching, and we wanted this ready for the next semester. In June, our project leader, Greg, came up with a new plan. Build a machine and have it connected at an internet service provider that would cover the cost of bandwidth, and be willing to accept this with remote administration. We found a volunteer organization to cover this -- Thank you Glen and Vicki!
We found a location, so the request to me sounded simple enough: put together a PC from commodity parts that meet the requirements of the customized Linux kernel, the latest release being called [XS-163]. The server would have two disk drives, three Ethernet ports, and 2GB of memory; and be installed with the customized XS-163 software, SSHD for remote administration, Apache web server, PostgreSQL database and PHP programming language. Of course, the team wanted this for as little cost as possible, and for me to document the process, so that it could be repeated elsewhere. Some stretch goals included having a dual-boot with Debian 4.0 Etch Linux for development/test purposes, an alternative database such as MySQL for testing, a backup procedure, and a Recover-DVD in case something goes wrong.
Some interesting things happened:
The XS-163 is shipped as an ISO file representing a LiveCD bootable Linux that will wipe your system clean and lay down the exact customized software for a one-drive, three-Ethernet-port server. Since it is based on Red Hat's Fedora 7 Linux base, I found it helpful to install that instead, and experiment moving sections of code over. This is similar to geneticists extracting the DNA from the cell of a pit bull and putting it into the cell for a poodle. I would not recommend this for anyone not familiar with Linux.
I also experimented with modifying the pre-built XS-163 CD image by cracking open the squashfs, hacking the contents, and then putting it back together and burning a new CD. This provided some interesting insight, but in the end I was able to do it all from the standard XS-163 image.
Once I figured out the appropriate "scaffolding" required, I managed to proceed quickly, with running versions of XS-163, plain vanilla Fedora 7, and Debian 4, in a multi-boot configuration.
The BIOS "raid" capability was really more like BIOS-assisted RAID for Windows operating system drivers. This "fake raid" wasn't supported by Linux, so I used Linux's built-in "software raid" instead, which allowed some partitions to be raid-mirrored, and other partitions to be un-mirrored. Why not mirror everything? With two 160GB SATA drives, you have three choices:
No RAID, for a total space of 320GB
RAID everything, for a total space of 160GB
Tiered information infrastructure, use RAID for some partitions, but not all.
The last approach made sense, as a lot of the data is cached web page images, and is easily retrievable from the internet. This also allowed me to have some "scratch space" for downloading large files and so on. For example, 90GB mirrored that contains the OS images, settings and critical applications, and 70GB on each drive for scratch and web cache, results in a total of 230GB of disk space, which is a 43 percent improvement over an all-RAID solution.
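The arithmetic behind the three choices can be checked with a few lines. This is just a worked example of the numbers above; the function name is made up for illustration, and it assumes a simple two-way mirror in which each mirrored gigabyte consumes one gigabyte on both drives.

```python
def usable_capacity_gb(drive_gb, mirrored_gb, drives=2):
    """Usable space when mirrored_gb of each drive is mirrored
    and the rest of each drive is left as independent scratch space."""
    scratch_per_drive = drive_gb - mirrored_gb
    return mirrored_gb + scratch_per_drive * drives


no_raid = usable_capacity_gb(160, 0)       # nothing mirrored
all_raid = usable_capacity_gb(160, 160)    # everything mirrored
tiered = usable_capacity_gb(160, 90)       # 90GB mirrored, 70GB scratch per drive

improvement_pct = (tiered - all_raid) / all_raid * 100
print(no_raid, all_raid, tiered, improvement_pct)
```

The tiered layout works out to 230GB, which is 43.75 percent more than the 160GB of the all-RAID layout, in line with the roughly 43 percent quoted above.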
While [Linux LVM2] provides software-based "storage virtualization" similar to the hardware-based IBM System Storage SAN Volume Controller (SVC), it was a bad idea putting different "root" directories of my many OS images on there. Linux, as with most operating systems, expects things to be in the same place where it last shut down, but in a multi-boot environment, you might boot the first OS, move things around, and then when you try to boot the second OS, it doesn't work anymore, or corrupts what it does find, or hangs with a "kernel panic". In the end, I decided to use RAID non-LVM partitions for the root directories, and only use LVM2 for data that is not needed at boot time.
While they are both Linux, Debian and Fedora were different enough to cause me headaches. Settings were different, parameters were different, file directories were different. Not quite as religious as MacOS-versus-Windows, but you get the picture.
During this time, the facility was out getting a domain name, IP address, subnet mask and so on, so I tested with my internal 192.168.x.y and figured I would change this to whatever it should be the day I shipped the unit. (I'll find out next week if that was the right approach!)
Afraid that something might go wrong while I am in Tokyo, Japan next week (July 7-11), or Mumbai, India the following week (July 14-18), I added a Secure Shell [SSH] daemon that runs automatically at boot time. This involves putting the public key on the server, and each remote admin has their own private key on their own client machine. I know all about public/private key pairs, as IBM is a leader in encryption technology, and was the first to deliver built-in encryption with the IBM System Storage TS1120 tape drive.
Giving users access to all their files from any OS image required that I either (a) have identical copies everywhere, or (b) have a shared partition. The latter turned out to be the best choice, with an LVM2 logical volume for the "/home" directory that is shared among all of the OS images. As we develop the application, we might find other directories that make sense to share as well.
For developing across platforms, I wanted the Ethernet devices (eth0, eth1, and so on) to match the actual ports they are supposed to be connected to in a static IP configuration. Most people use DHCP so it doesn't matter, but the XS software requires this, so it did. For example, "eth0" as the 1 Gbps port to the WAN, and "eth1/eth2" as the two 10/100 Mbps PCI NIC cards to other servers. Mapping the interface names to specific hardware ports was different on Fedora and Debian, but I got it working.
While it was a stretch goal to develop a backup method, one that could perform Bare Machine Recovery from burned DVD media, it turned out I needed to do this anyway just to prevent me from losing my work in case things went wrong. I used an external USB drive to develop the process, and got everything to fit onto a single 4GB DVD. Using IBM Tivoli Storage Manager (TSM) for this seemed overkill, and [Mondo Rescue] didn't handle LVM2+RAID as well as I wanted, so I chose [partimage] instead, which backs up each primary partition, mirrored partition, or LVM2 logical volume, keeping all the time stamps, ownerships, and symbolic links intact. It has the ability to chop up the output into fixed-size pieces, which is helpful if you are going to burn them on 700MB CDs or 4.7GB DVDs. In my case, my FAT32-formatted external USB disk drive can't handle files bigger than 2GB, so this feature was helpful for that as well. I standardized on 660 MiB [about 692MB] per piece, since that met all criteria.
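A quick check of the piece-size arithmetic, reading the figure as 660 MiB per piece (which is what matches "about 692" in megabytes). The 4GB backup is taken here as 4 x 10^9 bytes, the 2GB FAT32 file limit as 2 GiB, and the disc capacities as decimal sizes; those readings, and the helper name, are assumptions made for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

PIECE_BYTES = 660 * MIB            # 692,060,160 bytes, about 692 MB
FAT32_FILE_LIMIT = 2 * GIB         # limit mentioned for the USB drive
CD_BYTES = 700 * 10 ** 6           # assumed decimal capacity of a 700MB CD
DVD_BYTES = 4_700_000_000          # 4.7GB DVD, decimal


def pieces_needed(total_bytes, piece_bytes):
    """Number of fixed-size pieces needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // piece_bytes)


backup_bytes = 4 * 10 ** 9         # assumed size of the "4GB" backup set
pieces = pieces_needed(backup_bytes, PIECE_BYTES)
pieces_per_dvd = DVD_BYTES // PIECE_BYTES

print(PIECE_BYTES, pieces, pieces_per_dvd)
```

Under these assumptions one piece fits on a CD, stays well under the FAT32 file limit, and six pieces (enough for the whole backup) fit on a single DVD.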
The folks at [SysRescCD] saved the day. The standard "SysRescueCD" assigned eth0, eth1, and eth2 differently than the three base OS images, but the nice folks in France that write SysRescCD created a customized [kernel parameter that allowed the assignments to be fixed per MAC address] in support of this project. With this in place, I was able to make a live Boot-CD that brings up SSH, with all the users, passwords, and Ethernet devices to match the hardware. I installed this LiveCD as the "Rescue Image" on the hard disk itself, and also made a Recovery-DVD that boots up just like the Boot-CD, but contains the 4GB of backup files.
For testing, I used Linux's built-in Kernel-based Virtual Machine [KVM], which works like VMware, but is open source and included in the 2.6.20 kernels that I am using. IBM is the leading reseller of VMware and has been doing server virtualization for the past 40 years, so I am comfortable with the technology. The XS-163 platform, with Apache and PostgreSQL servers, serves as a platform for [Moodle], an open source class management system, and the combination is memory-intensive enough that I did not want to incur the overheads of running production in this manner, but it was great for testing!
With all this in place, it is designed to not need a Linux system admin or XS-163/Moodle expert at the facility. Instead, all we need is someone to insert the Boot-CD or Recover-DVD and reboot the system if needed.
Just before packing up the unit for shipment, I changed the IP addresses to the values they need at the destination facility, updated the [GRUB boot loader] default, and made a final backup, which I burned to the Recover-DVD. Hopefully, it works by just turning on the unit, [headless], without any keyboard, monitor or configuration required. Fingers crossed!
So, thanks to the rest of my team: Greg, Glen, Vicki, Tarun, Marcel, Pablo and Said. I am very excited to be part of this, and look forward to seeing this become something remarkable!
(What does this have to do with Storage? When IBM got back into networking in a big way, they had to decide whether to combine it with one of the existing groups, or form its own group. IBM decided to merge networking with storage, which makes sense since the primary purpose of most networks is to access or transmit information stored somewhere else.)
Last April, the Wharton School and the Institute for the Future convened a one-day [After Broadband] workshop in San Francisco, California, that brought together a group of leading technologists, entrepreneurs, academics and policymakers to explore the future of broadband over the next decade.
During the break, I talked with some of the other bloggers at this event. From left to right: Stephen Foskett [Pack Rat] blog, Devang Panchigar [StorageNerve], and yours truly, Tony Pearson. (Picture courtesy of Stephen Foskett)
Meet the Experts
This next segment was a Q&A panel, with a moderator posing questions to four experts. Originally, I was scheduled to be the moderator, but this was changed to Doug Balog. The experts on the panel were:
Rich Castagna, Editorial Director for Storage Media, TechTarget. TechTarget is the group that runs the [SearchStorage] website.
Stan Zaffos, Gartner VP of Research, who spoke earlier today. I have worked with Stan for years as well, and have attended the last four Gartner Data Center Conferences held every December in Las Vegas.
Steve Duplessie, Founder and Senior Analyst, Enterprise Strategy Group (ESG). Steve's blog is titled [The Bigger Truth].
Jon clarified a statement Doug Balog said earlier in the day attributed to his study. Doug had said that 40 percent of all data should be archived. The study that Jon Toigo had done found that, on average, for the data on disk systems, about 30 percent is useful data, 40 percent is not active and could be eligible for archive, and the remaining 30 percent was crap.
The other experts introduced themselves. Rich felt that "Cloud" was still the biggest buzzword in the IT industry. Stan felt that CIOs should ask their storage administrators "What are you doing to improve my agility and efficiency". Steve felt that it was better to focus on improving process and procedures, rather than trying to deploy the best technology.
How can you best reduce backup costs per TB?
Jon- use tape.
Rich- Clean up your environment.
Stan- Don't rehydrate your deduplicated data, adopt archive approach, and revisit your backup schedules.
Steve- Deduplication covers up stupidity. No band-aids! Companies need to address the cause.
Does Backup as a Public Service for large enterprises make sense?
Rich- Yes, especially for those with Remote Office/Branch Office (ROBO).
Stan- It depends. You should implement client-side dedupe. Get the Cloud Provider to waive telecom bandwidth charges.
Steve- Consider recovery scenarios, and try to maintain control.
Jon- "Clouds" are bulls@#$ marketing. WAN latency will pile up.
What are the top issues IT leaders should be discussing with the Storage Managers?
Stan- To ensure SLAs meet but not exceed design, to automate, and to evaluate SAN/NAS ratios.
Steve- Server virtualization is putting the spotlight on storage. Failure to implement storage virtualization is becoming the gate that slows down server virtualization adoption.
Jon- Insist on management features from all storage vendors, try to separate feature/function from the underlying hardware layer. See IBM's [Project Zero].
Rich- Efficiency, Archiving, Thin Provisioning, Compression, Data Protection & Retention, Backup Redesign to protect endpoints like laptops and cell phones.
When does Archive eliminate Backup?
The need for protection never goes away. There are two kinds of data: "originals" and "derivatives", and two kinds of disk: "failed" and "not yet failed".
Given SATA and SAS drives, what is the future of 10K/15K RPM drives?
There is no future for these faster drives, they are going away.
What is the biggest challenge for adopting archive?
It is easy to move data out of production systems, but difficult to make these archives accessible for eDiscovery and Search. There is also concern about changing data formats. Adobe has changed the format of PDF a whopping 33 times.
This was by far the most entertaining section of the day! Hand-held devices allowed the audience to vote which answers they liked best.
Continuing my coverage of the IT Security and Storage Expo in Brussels, Belgium, we had some great storage solutions on display at the IBM and I.R.I.S-ICT booth.
Here my IBM colleague Tom Provost is showing the front of the "Smarter Office" solution. The second photo gives the view from behind. While I always explained the solution from the front of the box, many of the more technical attendees at this conference wanted to inspect the ports in the back.
This sound-isolated 11U solution combines the following:
The [IBM Storwize V3700] with 300GB small-form-factor (SFF) drives provides shared storage for the servers.
Two [IBM System x3550 M4 servers] that can run VMware, Hyper-V or Linux KVM server hypervisor software for your Windows and/or Linux applications. These are two socket servers that can have up to 16 x86 cores each.
A Juniper EX2200 switch to network the servers and storage together.
A Local Console Manager (LCM) with rackable keyboard, video, and mouse.
In this next example, the IBM team combined a BladeCenter S chassis that can hold six blade servers, with a Storwize V7000 Unified which offers FCP, iSCSI, FCoE, NFS, CIFS, HTTPS, SCP and FTP block and file protocols.
If those configurations are too small for your needs, consider the Flex System chassis or full PureFlex system frame. The rack-mountable 10U chassis can hold the Flex System V7000 and 10 compute nodes. The PureFlex frame can hold up to four of these chassis.
IBM and I.R.I.S-ICT also had an IBM XIV Gen3 and a TS3500 Tape library on display.
We have some exciting webcasts in the upcoming weeks!
Smarter Enterprises Need Smarter Storage
In this [InformationWeek webcast], my IBM colleague Allen Marin will present a brief overview of IBM Smarter Storage for the enterprise with a focus on new high-end disk and Virtual Tape solutions.
Allen will take you through the recent enhancements [announced earlier this month], highlighting how the new capabilities can address the requirements of your mission-critical applications, as well as your evolving business analytics, and cloud initiatives.
Date: Wednesday, October 24, 2012 Time: 10:00 AM PDT / 10:00AM Arizona / 1:00 PM EDT Duration: 60 Minutes
[Register now!] All registrants will get the independent Clipper Group Report - "When Infrastructure Really Matters - A Focus on High-End Storage" - free!
Smarter Storage for Midsize Businesses
Businesses of all sizes are getting buried in the avalanche of data. Data is coming in at faster rates and in greater volumes. The value of data is increasing. Old processes and technologies aren't working. Midsize businesses have the same issues managing the rapid growth of data as large enterprises, but they don't have the same size budget or staff. They need advanced capabilities at an affordable price that are easy to implement.
Speakers for this webcast include Brian Truskowski, General Manager, IBM System Storage and Networking; Ed Walsh, Vice President of Market and Strategy, IBM System Storage; and Tommy Rickard, IBM Director, UK Storage Development.
Date: Tuesday, November 6, 2012 Time: 8:00 AM PST / 9:00AM Arizona / 11:00 AM EST Duration: 60 Minutes
[Register now!] Learn how new IBM Smarter Storage solutions can help midsize businesses tame the explosion of information and their IT budgets.
I hope you can find time in your busy schedule to participate in one or both of these webcasts.
Over on his Backup Blog, fellow blogger Scott Waterhouse from EMC has a post titled [Backup Sucks: Reason #38]. Here is an excerpt:
Unfortunately, we have not been able to successfully leverage economies of scale in the world of backup and recovery. If it costs you $5 to backup a given amount of data, it probably costs you $50 to back up 10 times that amount of data, and $500 to back up 100 times that amount of data.
If anybody can figure out how to get costs down to $40 for 10 times the amount of data, and $300 for 100 times the amount of data, they will have an irrefutable advantage over anybody that has not been able to leverage economies of scale.
I suspect that where Scott mentions "we" in the above excerpt, he is referring to EMC in general, with products like Legato. Fortunately, IBM has scalable backup solutions, using either a hardware approach, or one purely with software.
The hardware approach involves using deduplication hardware technology as the storage pool for IBM Tivoli Storage Manager (TSM). Using this approach, IBM Tivoli Storage Manager would receive data from dozens, hundreds or even thousands of client nodes, and the backup copies would be sent to an IBM TS7650 ProtecTIER data deduplication appliance, IBM TS7650G gateway, or IBM N series with A-SIS. In most cases, companies have standardized on the operating systems and applications used on these nodes, and multiple copies of data reside across employee laptops. As a result, as you have more nodes backing up, you are able to achieve benefits of scale.
Perhaps your budget isn't big enough to handle new hardware purchases at this time, in this economy. Have no fear,
IBM also offers deduplication built right into the IBM Tivoli Storage Manager v6 software itself. You can use a sequential-access disk storage pool for this. TSM scans the backup copies, as well as archive and HSM data, identifies duplicate chunks of data, and reclaims the space when duplicates are found.
If your company is using a backup software product that doesn't scale well, perhaps now is a good time to switch over to IBM Tivoli Storage Manager. TSM is perhaps the most scalable backup software product in the marketplace, giving IBM an "irrefutable advantage" over the competition.
(This series started with my post [Kindergarten desktop - The Challenge]. I have a system with 512MB of RAM and a 40GB disk drive, on which I will install Linux and educational software for a class full of kindergarten children.)
First, I re-partitioned the 40GB hard drive as follows. On the extended partition, sda5 will hold my system utilities, like Clonezilla and SystemRescue, and sda6 is my swap space. This gives me three primary partitions to install three flavors of Linux to try out.
The first was [LinuxKidX], which actually started out as a Portuguese-language effort in Brazil. It was then translated to the English language to extend its reach. It is based on the KDE desktop familiar to users of OpenSUSE Linux.
Much of the educational software was similar to, or the same as, that from Edubuntu, which I mentioned in my last post. However, not everything was translated, and unless you are able to read Portuguese, you may not want this one.
Next, I wanted to look at [Qimo for Kids], but first I had to look for the distribution, as the mirrors listed seemed to be unavailable. I was able to find a qimo-2.0-desktop.iso on CNET.com.
Unlike Edubuntu, Qimo fits on a CD-ROM for older PCs that may not have DVD drives. Based on the lightweight XFCE desktop, the LiveCD runs comfortably in 512MB, with a kid-friendly app launcher at the bottom of the screen. However, Qimo 2.0 is based on Ubuntu 10.04 (Lucid Lynx) LTS, with long-term support expiring in May 2013. Its Firefox 3.6.3 was too old to run Gmail.
Why hasn't Qimo been enhanced since 2010? It looks like you can just install the packages qimo-session and qimo-wallpaper on newer levels of Ubuntu.
Third, I tried the Foresight Linux for Kids 1.0 release. The most recent Foresight is 2.5.3, but Linux for Kids is still at the 1.0 level. The "installer" was very outdated, so the website suggested following the [power-user install HOWTO].
The HOWTO can be a bit intimidating, but I was able to install just fine in 512MB of RAM. Foresight detected I had pre-configured a swap space, and used that to help finish the install process.
Like the others, it had much of the same educational software as before. A key difference is the [Conary package management]. Most systems use either Debian (DEB) or Red Hat Package Manager (RPM) packages, but this one is different, and the use of Conary may reduce the number of software applications available.
So what have I learned from these?
All of them seemed to have the same set of educational software: gCompris, eToys, Tux for math and typing.
I want a Linux that uses traditional package management, either DEB or RPM.
The 512MB RAM does not seem to be a difficult limitation. While installation may have been more complicated, they all ran well in 512MB.
If you have had any experience with any of these three distros, please comment below.
Normally, IBM only makes announcements on Tuesdays, but today, Friday, IBM announces that it acquired Diligent Technologies. What? I got a lot of questions about this, so I thought I would start with this...
When I posted in January that [IBM Acquires XIV], fellow blogger Mark Twomey from EMC, of StorageZilla fame, sent me a comment:
"Ah now Tony I wasn't poking fun. Indeed I find it fascinating that Moshe who's been sitting out on the fringes for years having been banished for being an obstructionist to EMC entering the mid-market is now back.
Which reminds me what happens with Diligent? There his as well aren't they or has he packed his stake in that in?"
As you might have guessed, I am privy to a lot of stuff going on behind the scenes at IBM that I can't talk about in this blog, and all these rumors in the blogosphere about IBM's acquisition of Diligent were a topic I couldn't officially recognize, defend or deny until official IBM announcements were made.
In his latest post, Mark wonders about [the last Tape and Mainframe sales person on earth]. He recounts my interaction with fellow blogger Hu Yoshida of HDS about the energy benefits of Virtual Tape Libraries. Knowing that we were going to announce IBM's acquisition of Diligent soon, I thought this would be a worthy exchange, driving up the sales of Diligent boxes (whether you buy them from IBM or HDS). Diligent already had reselling arrangements with HDS, and IBM plans to continue those arrangements going forward with HDS. As I have explained before in my post [Supermarkets and Specialty Shops], IBM and HDS cater to different customers, so if a customer wants the best technology from a specialty shop, they can buy IBM Diligent products from HDS, but if they want one-stop shopping, they can buy IBM Diligent directly from IBM or its other IBM Business Partners.
(Perhaps a trickier situation is that Diligent also had an arrangement with Sun Microsystems, which competes directly against IBM as another IT supermarket vendor, but I have not heard how IBM has decided to handle this going forward.)
For more on this intricate mess of interconnected companies, alliances and partnerships, read Dave Raffo's article [Data dedupe dance card filling up] over at Storage Soup.
So, let's tackle the first question:
Q1. What will happen to IBM's real tape library business?
Come on! IBM is number one in tape, we've had virtual tape libraries since 1997 (the first in the industry) and continue to do well in both virtual and real tape libraries. Both provide value to the customer, and both have their place as part of the overall "information infrastructure". This acquisition provides yet another choice for clients on our "supermarket" shelf.
(For those following the ["which is greener"] discussion, the robot of the IBM TS3500 real tape library consumes 185W per frame (when moving) and each tape drive consumes 50W (when actively working on a tape). Compared to 13W per SATA disk drive, each 6-drive frame of a TS3500 consumes as much electricity as 37 SATA disk drives. If you are not running backups 24x7, the total kWh per day for your tape library is actually considerably less, but as several people have pointed out, there are customers that do run backups 80-90 percent of the time. LTO-4 tapes can hold 800GB uncompressed, and SATA disks are now available in 1TB (1000 GB) size, so you can have fun with your own comparisons.)
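If you want to run your own comparison, here is a minimal sketch of the arithmetic behind the 37-drive figure. The wattages are the ones quoted above; the duty-cycle model is my own simplification and assumes the frame draws nothing when it is not running backups, which understates real idle power.

```python
# Wattages are from the text above; the duty-cycle model is an assumption.
ROBOT_WATTS_PER_FRAME = 185  # TS3500 robot, while moving
TAPE_DRIVE_WATTS = 50        # per tape drive, while working on a tape
SATA_DISK_WATTS = 13         # per SATA disk drive
DRIVES_PER_FRAME = 6

def frame_watts(drives: int = DRIVES_PER_FRAME) -> int:
    """Power drawn by one busy frame: the robot plus its tape drives."""
    return ROBOT_WATTS_PER_FRAME + drives * TAPE_DRIVE_WATTS

def equivalent_sata_disks(watts: float) -> float:
    """How many SATA disk drives draw the same power."""
    return watts / SATA_DISK_WATTS

def daily_kwh(watts: float, duty_cycle: float) -> float:
    """Energy per day if the load runs for the given fraction of the day."""
    return watts * 24 * duty_cycle / 1000

busy = frame_watts()                       # 185 + 6 * 50 = 485 W
print(busy, equivalent_sata_disks(busy))   # a little over 37 disks
print(daily_kwh(busy, 0.85))               # backups running 85 percent of the time
```

Changing `duty_cycle` shows how quickly the daily energy drops for shops that only back up a few hours per night.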
Meanwhile, Scott Waterhouse, one of the few people at EMC who understand tape workloads like backup and archive, takes me to task in his Backup Blog with his post [I want a Red Ferrari]. For those who are surprised that anyone at EMC might understand backup workloads, EMC did acquire a company called Legato, and perhaps Scott came from that acquisition. I've never met Scott in person, but based solely on his writings, he seems to know his stuff and makes strong arguments for using IBM Tivoli Storage Manager (TSM) with deduplication and virtual tape libraries.
While TSM does a good job of "deduplicating" at the client first, backing up only changed data, Scott feels database and email repositories must be backed up entirely each time, which is what happens in many other backup software products. Some clients might have 80 percent database/email and only 20 percent files, while others might have less than 20 percent database/email and 80 percent files, so this might influence whether deduplication will have a small or big benefit. If TSM has to back up the entire database, even though little has changed since the last backup, that is where deduplication on a virtual tape library can come in handy. For IBM DB2 and Oracle databases, the application-aware Tivoli Data Protection module interface for IBM TSM backs up only changed data, not the entire file. Thanks to IBM's FilesX acquisition (also coincidentally from Israel), IBM can now extend this support to SQL Server databases as well. However, to be fair, Scott is partly correct: TSM does back up some database and email repositories in their entirety, which is why it is a good idea to have BOTH an IBM virtual tape library with deduplication and Tivoli Storage Manager to handle all cases. This brings us to the next question:
Q2. What will happen to IBM's patented "progressive backup" technology?
IBM will continue to use TSM's progressive backup technology. TSM already works great with Diligent virtual tape libraries. One example is LAN-free backup. In this configuration, the TSM client writes its backups directly to a virtual or real tape library, over the SAN, and then sends the list of files backed up to the TSM server over the LAN to record in its database. This can greatly reduce IP traffic on your LAN during peak backup periods. For more about this, see the IBM Redbook titled ["Get More Out of Your SAN with IBM Tivoli Storage Manager"].
Jon Toigo from DrunkenData asks [Did IBM Do Due Diligence Before Making Diligent Acquisition a Done Deal?] which is probably always a valid question. Unlike XIV, I wasn't part of the Diligent acquisition team, so I can't provide a first-hand account of the process. I am told that the IBM team did all the right things to make sure everything is going to turn out right. Sadly, many companies that make acquisitions in the IT industry fail to make them work. Fortunately, IBM is one of the few companies that has a great success record, with over 60 acquisitions in the past six years. In the Xconomy forum, Wade Roush writes [IBM and the Art of Acquisitions] and gives some insight into why IBM is different. Jon did not understand why Cindy Grossman, IBM VP of tape and archive solutions, ran the analyst conference call for this announcement, which brings me to the next question:
Q3. What is Diligent virtual tape library going to be categorized as, a disk system or a tape system?
IBM organizes its storage systems based on the host application workloads. Products to address disk workloads (SVC, DS8000 series, DS6000 series, DS4000 series, DS3000 series, N series, XIV Nextra) are in our disk systems group. Storage that appears to host applications like a tape system to address workloads like backup and archive (tape drives, libraries and tape virtualization) is in our tape and archive group. IBM Diligent has two products, one for big workloads and one for medium workloads. Both look like tape systems, so our tape and archive team, who understand tape workloads like backup and archive the best, are obviously the best choice to support IBM Diligent in the mix.
IBM will offer both N series and Diligent deduplication capabilities. For disk workloads, IBM N series offers a post-process deduplication feature at no additional charge. For tape workloads, IBM will now offer an in-line deduplication feature with Diligent Technologies. Different workloads, different offerings.
As with any acquisition, there will be some changes. The 100 folks from Diligent will get to learn the IBM way of doing things. This brings me to our fifth and final question:
Q5. What is the correct spelling: deduplication or de-duplication?
It appears that Diligent has a corporate-wide standard to hyphenate this term (de-duplication), but the "word police" at IBM that control and standardize all "proper spellings, trademarks, and capitalization" sent me corporate instructions a few days ago that IBM does not hyphenate this term (deduplication). So, going forward, it will be "deduplication", or "dedupe" for short. I suspect one of the first tasks that our new IBMers from Diligent will be doing is removing all those hyphens from the [Diligent Technologies website]!
That's all for now, I'm off to Chicago, Illinois tomorrow!
Today, I'll cover the announcements related to our IBM System Storage N series disk systems, which tie in nicely with the Valentine's Day theme. The phrase we use for "unified storage" is that N series allows you to "share the closet, not necessarily the clothes". Couples recognize the value of a shared closet over having one closet for just the man's clothes, and a separate closet for just the woman's clothes. (For some couples, the man's closet would be terribly underutilized!) By analogy, the N series allows you to share one solution for LUNs that can be accessed via FCP or iSCSI protocols, and NAS file systems that can be accessed via NFS and CIFS protocols. In most data centers, Windows and UNIX applications are about as likely to share files as men and women are to wear each other's clothes, so the analogy is intact.
Let's take a look at what got announced:
N7700 and N7900
There are actually [eight new high-end N series] models. The N7900 has 4 processors and 32GB of cache. The N7700 has 2 processors and 16GB of cache. Each has two appliance models (A11 single node and A21 dual node) and two gateway models (G11 single node and G21 dual node).
The appliance models support both FC and SATA disk. The N7900 A models support a maximum of 1176 drives; the N7700 A models support 840 drives. The gateway models provide FCP, iSCSI and NAS host access through external disk attachment. The N7900 gateway models support 1176 LUNs on external disk systems; the N7700 gateway models support 840 external LUNs.
N series now supports 1 TB SATA disk
The [EXN1000 expansion drawer] can now have up to fourteen 1TB SATA drives. This is in addition to previous announcements supporting 500GB and 750GB drive capacities. These drawers support the entire N series line.
With 1 TB drives, the N7900 now supports up to 1176 TB of raw capacity, which is over 1PB of usable data in 12+2P RAID-DP mode. This is greater than the internal disk capacity limits of current IBM DS8000, EMC DMX and HDS USP-V models.
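For those who like to check the math, here is a rough sketch of how 1176 TB raw becomes just over 1PB usable. It assumes every drive sits in a full 12+2P RAID group and ignores spares, drive right-sizing and file system overhead, so treat it as back-of-the-envelope arithmetic rather than a sizing tool. The function name is mine.

```python
def usable_tb(drives: int, tb_per_drive: float,
              data_per_group: int = 12, parity_per_group: int = 2) -> float:
    """Usable capacity when drives are arranged in full data+parity groups.

    Assumption: no spares and no file system overhead are subtracted.
    """
    group_size = data_per_group + parity_per_group
    full_groups = drives // group_size
    return full_groups * data_per_group * tb_per_drive

print(usable_tb(1176, 1))  # N7900: 84 groups of 12 data drives
print(usable_tb(840, 1))   # N7700: 60 groups of 12 data drives
```

With 1176 drives there are exactly 84 groups of 14, leaving 1008 data drives, which is where the "over 1PB" comes from.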
At the low end, both the N3300 and N3600 now support 500GB, 750GB and 1TB SATA drives in addition to the SAS drives they supported.
SnapManager for Microsoft SharePoint
There is a new SnapManager in town. This one is for Microsoft SharePoint data. See the announcement for the [N3300 and N3600] for details.
On Jan 24, IBM signed agreements with [Ingram Micro, Tech Data, and Synnex], to distribute the N Series products and work with IBM to recruit new solution providers to the line. These three are all well-respected world-class distribution providers, so we are glad to have increased our partnership with them on this.
Well, I'm back in Tucson, and thought I would close out my coverage of this year's Data Center Conference 2009 with some pictures. These first few are from the Solution Showcase.
There were four stations at the IBM booth. I had the "Information Infrastructure" station, you can see here I had my blook (blog-based book) on display "Inside System Storage: Volume I", a solid-state drive (in clear plexiglas to show all the chips inside), and the GUI panel for XIV.
What really stole the show was the IBM Portable Mobile Data Center (PMDC), which is a shipping crate with a fully running data center inside. In the one shown here, we had iDataPlex servers connected to an IBM XIV Storage System. Here is David Bricker striking a pose.
Inside, Monica Martinez shows off the iDataPlex servers. These are 1U servers that are only half as deep as regular servers, so you can pack 84 servers in the floorspace of 42 traditional 1U servers.
Two of these fit into a 2U chassis to share a common power supply and fan set. The trouble with traditional 1U servers is that fans do not have enough radius, so putting wider 2U fans for two servers gives you much better airflow.
Monica Martinez, Ruth Weinheimer, and Tamara Rice.
Jamie Thomas, IBM General Manager of Storage and Software Defined Environments
Jamie announced [IBM Elastic Storage], a new offering that is available as a software defined storage solution, based on IBM's General Parallel File System (GPFS) technology already deployed at 45,000 installations.
IBM Elastic Storage provides a global name view across data center locations. It can manage up to a yottabyte of information, combining Flash, disk and tape resources. It supports OpenStack interfaces, Hadoop and standard POSIX file system conventions.
IBM Elastic Storage provides automated tiering to move data between different storage media types. Infrequently accessed files can be migrated to tape and automatically recalled back to disk when required. Unlike traditional storage, it allows you to smoothly grow or shrink your storage infrastructure without application disruption or outages.
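To picture what automated tiering means, here is a toy sketch of the migrate-and-recall idea. It is not the product's policy language or API; the class, the function names and the 90-day threshold are all hypothetical, chosen only to illustrate the behavior described above.

```python
from dataclasses import dataclass

@dataclass
class FileRecord:
    name: str
    days_since_access: int
    tier: str = "disk"

def apply_policy(files, migrate_after_days: int = 90):
    """Move files not touched within the (hypothetical) threshold to tape."""
    for f in files:
        if f.tier == "disk" and f.days_since_access > migrate_after_days:
            f.tier = "tape"
    return files

def read_file(f: FileRecord) -> FileRecord:
    """Reading a migrated file recalls it to disk transparently."""
    if f.tier == "tape":
        f.tier = "disk"
    f.days_since_access = 0
    return f

files = apply_policy([FileRecord("q1-report.pdf", 400),
                      FileRecord("today.log", 0)])
print([(f.name, f.tier) for f in files])
print(read_file(files[0]).tier)
```

The point is that the application only ever asks for the file by name; where it physically lives is a policy decision made behind the scenes.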
IBM Elastic Storage software can run on a cluster of x86 and/or POWER-based servers, and can be used with internal disk, commodity storage, or advanced storage systems from IBM or other vendors.
IBM partnered with various clients in different industries in a special beta program. Jamie led a client panel to discuss their experiences with IBM Elastic Storage:
Alan Malek, Director of IT, Cypress Semiconductor.
"Total cycle time is key". Over the past 31 years, they bought whatever file storage was available. Now, with IBM Elastic Storage, the performance was very consistent for their engineering workloads with full load balancing.
Russell Schneider, Principal Storage Consultant, Jeskell.
Russell's company works with a lot of federal agencies, "Big Data has become Bigger Data". For example, research on Global Warming and Climate Change requires a large amount of storage across agencies.
In another example, when the tsunami hit Japan a few years ago, an agency here in the USA realized they had 14PB of data stored as a single copy in a data center at sea level less than a mile from the coast. They realized they needed to have a secondary copy, and an option to cache to a third location depending on regional disasters.
Matthew Richards, Products, OwnCloud.
For those not familiar with OwnCloud, it provides a Dropbox-like file sharing service, but in the Enterprise, with on-premise storage. It has been fully tested and certified with IBM Elastic Storage to provide a secure file sharing platform.
With IBM Elastic Storage, they were able to scale linearly up to 20,000 users, and are now testing 100,000 users. The need to have intelligent access to files at scale is what Matthew likes about IBM Elastic Storage.
Dr. Michael Factor, IBM Distinguished Engineer at IBM Research
Michael started out explaining there are three areas for storage: block, file and object. The fastest growing type of data is unstructured fixed content with associated metadata. This is ideal for object storage. Michael has been working with OpenStack Swift, an open source interface defined for object storage. He defined "storlets" as follows:
Storlets extend an object store by moving computation to the data -- filtering, transforming, analyzing -- instead of bringing data to the computation.
Storlets have been deployed on a variety of European Union research projects. For example, in partnership with Philips, a pathology storlet can count the number of cancer cells in an image. By bringing the computation to the data, it eliminates having to transfer large amounts of data over the network.
Storlets can run on-premise and on IBM's SoftLayer IaaS cloud offering.
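Here is a minimal sketch of the storlet idea, moving the computation to the data. This is a toy in-memory object store, not the OpenStack Swift API or IBM's storlet implementation; every class and function name here is hypothetical, and counting a marker byte merely stands in for real image analysis.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def get(self, name: str) -> bytes:
        # Traditional approach: ship the whole object to the caller.
        return self._objects[name]

    def run_storlet(self, name: str, storlet):
        # Storlet approach: run the function where the data lives
        # and return only its (small) result.
        return storlet(self._objects[name])

def count_marked_cells(image: bytes) -> int:
    """Stand-in for image analysis: count occurrences of a marker byte."""
    return image.count(b"X")

store = ToyObjectStore()
store.put("slide-001", b"..X..X" * 1000)

moved = len(store.get("slide-001"))                        # bytes pulled to the client
result = store.run_storlet("slide-001", count_marked_cells)
print(moved, result)
```

With `get`, the entire object crosses the network before any counting can start; with `run_storlet`, only a single integer comes back.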
Bruce Hillsberg, IBM Director of Storage Systems at IBM Research
Bruce led another panel discussion, this time of IBM storage experts:
Vincent Hsu, IBM Fellow and CTO of Storage.
The problem is the isolation of data into "storage silos". Isolation causes problems in managing large amounts of data at scale, and costs more as storage is not fully utilized. IBM Elastic Storage brings everything together, eliminating storage silos.
Michael explained how IBM works with clients all over the world to ensure that storage solutions meet client requirements. For example, storlets can use rich metadata to manage photographs, and display them based on GPS location or other content that makes it easier to manage these images.
IBM Elastic Storage will support OpenStack Cinder and Swift interfaces. IBM is a platinum sponsor of the OpenStack Foundation, and is now its second most prolific contributor, with hundreds of full-time employees working on this.
Tom Clark, IBM Distinguished Engineer, Chief Architect, Storage Software, Cloud & Smarter Infrastructure.
Storage Management is a critical piece of Software Defined Storage. This is done in three ways:
The use of analytics to optimize the deployment of storage, based on workload requirements. Storage admins set policies, and then IBM Elastic Storage analytics gather metrics and optimize data placement and movement based on these policies. IBM Elastic Storage has 70 percent lower TCO than competitive offerings.
The focus on backup services. Backups are not just for data protection, but rather can be used to duplicate or replicate data for testing, for training, and for other purposes. IBM Elastic Storage is fully supported by IBM Tivoli Storage Manager.
Being able to support Hybrid Cloud environments, where some data can be on-premise, and other data off-premise. Storage Management challenges will need to deal with this possibility. IBM Elastic Storage is well positioned for this.
Carl Kraenzel, IBM Distinguished Engineer, Director of Watson Cloud Technology and Support.
Watson is ground-breaking technology, and IBM Elastic Storage technology was at the heart of the Watson that was first introduced in 2011.
To consider IBM Elastic Storage based on lower-cost and higher-scalability is not the full picture. Rather, this is an important platform for Cognitive Computing, which we are just at the tip of the iceberg in exploring. IT systems need to be aware of the context of what we are doing.
While the Grand Challenge demonstration on Jeopardy! was exciting, it is time we stop playing games and apply IBM Elastic Storage to business, to help with health care and medical research, and other problems in society. IBM has already deployed this at MD Anderson Cancer Center and Memorial Sloan Kettering Cancer Center, for example.
Tom Rosamilia provided closing remarks. IBM Elastic Storage is not just for new workloads in Cloud, Analytics, Mobile and Social (CAMS) but also traditional workloads as well. IBM Elastic Storage provides "data democracy" and allows for "better rested storage administrators" that make fewer mistakes.
Tom opened the floor for questions from the audience:
Q1. Data integrity, not just security but also quality? IBM Elastic Storage has end-to-end data integrity checking built-in.
Q2. How does IT transition from full control to auto-pilot? IBM allows you to tap into existing storage. This is not rip-and-replace. With storage virtualization, IBM hides the complexity that normally requires full control over specific assets.
Q3. Storage admins would rather have a root canal without Novocaine than move their data. What is IBM doing to offer automation to help storage admins move to this new infrastructure? IBM storage virtualization breaks that hard link between applications and specific storage devices. IBM Elastic Storage eliminates application downtime previously associated with data movement.
Tom Rosamilia assured the audience that IBM is fully committed to its storage portfolio. IBM Elastic Storage is not just about the profoundness of what IBM announced today, but also where IBM is investing in the future of storage.
Mark your calendars! If you work in IT and have an interest in storage, then there are two upcoming conferences you might be interested in attending!
Join a network of your peers at
[IBM Pulse2012] who are fundamentally and cost-effectively changing the economics of IT and speeding the delivery of innovative products and services. With four days of top-notch education, Pulse 2012 will help you react with agility in changing competitive landscapes, reduce vulnerability throughout the service lifecycle, and continuously improve the business impact of the technology.
I presented at the very first IBM Pulse back in May 2008, which was a combination event to cover Tivoli Storage, Maximo and Netcool. For a bit of nostalgia, read my 2008 blog posts:
The IBM Pulse conference has certainly evolved over the past few years! The agenda is not yet finalized, so I don't know if I will be there again this year.
The second event has a new name. [IBM Edge2012] is the premier storage event that brings together innovative IBM technologies, world class training, leading industry experts, and compelling client success stories and best practices. Edge2012 is dedicated to helping you design, build and implement efficient storage infrastructure solutions.
We started doing these back in the mid-90s, entitled the "IBM Storage Symposium", then later the "IBM System Storage and Storage Networking Symposium". In 2007, I was there in Las Vegas presenting on a variety of topics. See my blog post [Storage Symposium 2007 recap].
In 2008, we had a version of the Storage Symposium down in Cuernavaca, Mexico. Not only did I present, but it was also a "book signing" event for my first book [Inside System Storage: Volume I]. Here were my blog posts: [Introduction], and [Conclusion]. We also had an event in the United States, as well as Montpellier, France, but since I already went to the one in Mexico, I let my colleagues go to these other ones instead.
In 2009, IBM experimented with combining two conferences under one roof in Chicago, IL. The IBM System Storage and Storage Networking Conference was combined with the IBM System x and BladeCenter Technical Conference. The idea was that server people would probably also be interested in storage, and storage admins might also be interested in x86-based servers. See my blog post
[Storage Symposium 2009 recap].
In 2010, System Storage and System x were once again combined, held in Washington DC, but the conferences were renamed to IBM System Storage Technical University and the IBM System x Technical University to give them a common look and feel. See my blog post [Storage University 2010 review].
In 2011, not satisfied that two data points were conclusive, IBM continued the experiment, hosting both System Storage and System x conferences in Orlando, Florida. Here were my blog posts:
The results are now in. While I think the idea is admirable, since running multiple conferences at the same time in the same place can help reduce costs and consolidate administration, it can also have its drawbacks. In the case of System Storage and System x, we learned a few things:
Having System x and Storage in the same conference gave the appearance that the conference was not focused on either. At smaller companies, there might be people who manage both x86 servers and storage, but at larger companies, servers and storage are managed by separate people, often in separate departments with different travel budgets.
Nearly all of IBM's storage attaches to IBM System x servers. However there are some clients that run AIX, IBM i or System z mainframes that might not have considered attending this conference, thinking that it was focused on storage for System x servers.
Both conferences were considered technical education, and might not have appealed to upper IT executives and directors as something to help make purchase decisions from a business perspective, or to network with peers who are also decision makers.
The solution - IBM Edge. This conference is focused 100 percent on storage. There will be "Executive Edge" for decision makers to network with their peers, and "Technical Edge" for the storage admins to get the technical education they are looking for on IBM System Storage and Networking products and solutions. Please note that this conference was held in July or August in previous years, but will be held in June this year.
I am very excited about this new direction, and plan to be there June 4-8 for this event!
On Tuesday, I covered much of the Feb 26 announcements, but left the IBM System Storage DS8000 for today so that it can have its own special focus.
Many of the enhancements relate to z/OS Global Mirror, which we formerly called eXtended Remote Copy or "XRC", not to be confused with our "regular" Global Mirror that applies to all data. For those not familiar with z/OS Global Mirror, here is how it works. The production mainframe writes updates to the DS8000, and the DS8000 keeps track of these in cache until a "reader" can pull them over to the secondary location. The "reader" is called System Data Mover (SDM), which runs in its own address space under the z/OS operating system. Thanks to some work my team did several years ago, z/OS Global Mirror was able to extend beyond z/OS volumes and include Linux on System z data. Linux on System z can use a "Compatible Disk Layout" (CDL) format (now the default) that meets all the requirements to be included in the copy session.
IBM has over 300 deployments of z/OS Global Mirror, mostly banks, brokerages and insurance companies. The feature can keep tens of thousands of volumes in one big "consistency group" and asynchronously mirror them to any distance on the planet, with the secondary copy recovery point objective (RPO) only a few seconds behind the primary.
Extended Distance FICON
Extended Distance FICON is an enhancement to the industry-standard FICON architecture (FC-SB-3) that can help avoid degradation of performance at extended distances by implementing a new protocol for "persistent" Information Unit (IU) pacing. This deals with the number of packets in flight between servers and storage separated by long distances, and can keep a link fully utilized at 4Gbps FICON up to 50 kilometers. This is particularly important for the z/OS Global Mirror "reader", the System Data Mover (SDM). By having many "reads" in flight, this enhancement can help reduce the need for spoofing or channel-extender equipment, or allow you to choose lower-cost channel extenders based on "frame-forwarding" technology. All of this helps reduce your total cost of ownership (TCO) for a complete end-to-end solution.
This feature will be available in March as a no-charge update to the DS8000 microcode. For more details, see the [IBM Press Release].
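To see why the number of packets in flight matters at distance, here is a rough sketch of how much data must be outstanding to keep a link busy while waiting for responses. The 4Gbps and 50 kilometer figures are from the text above; the propagation speed of about 200,000 km per second in fiber is my assumption, and the calculation ignores encoding overhead and switch latency.

```python
# Assumption: light travels through fiber at roughly 200,000 km per second.
FIBER_KM_PER_SECOND = 200_000

def round_trip_seconds(distance_km: float) -> float:
    """Time for a request to reach the far end and the reply to return."""
    return 2 * distance_km / FIBER_KM_PER_SECOND

def bytes_in_flight(link_gbps: float, distance_km: float) -> float:
    """Data that must be outstanding to keep the link full for one round trip."""
    bits = link_gbps * 1e9 * round_trip_seconds(distance_km)
    return bits / 8

print(round_trip_seconds(50))   # seconds per round trip at 50 km
print(bytes_in_flight(4, 50))   # bytes needed in flight on a 4Gbps link
```

If the protocol allows less than this amount to be outstanding, the link sits idle waiting for acknowledgements, which is the degradation that persistent IU pacing is meant to avoid.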
z/OS Global Mirror process offload to zIIP processors
To understand this one, you need to understand the different "specialty engines" available on the System z.
On distributed systems where you run a single application on a single piece of server hardware, you might pay "per server", "per processor" or lately "per core" for dual-core and quad-core processors. Software vendors were looking for a way to charge smaller companies less, and larger companies more. However, you might end up paying the same whether you use a 1GHz Intel or a 4GHz Intel processor, even though the latter can do four times more work per unit time.
The mainframe has a few processors for hundreds or thousands of business applications. In the beginning, all engines on a mainframe were general-purpose "Central Processor" or CP engines. Based on their cycle rate, IBM was able to publish the number of Million Instructions per Second (MIPS) that a machine with a given number of CP engines can do. With the introduction of side co-processors, this was changed to "Millions of Service Units" or MSU. Software licensing can charge per MSU, and this allows applications running in as little as one percent of a processor to get appropriately charged.
One of the first specialty engines was the IFL, the "Integrated Facility for Linux". This was a CP designated to only run z/VM and Linux on the mainframe. You could "buy" an IFL on your mainframe much cheaper than a CP, and none of your z/OS application software would count it in the MSU calculations because z/OS can't run on the IFL. This made it very practical to run new Linux workloads.
In 2004, IBM introduced "z Application Assist Processor" (zAAP) engines to run Java, and in 2006, the "z Integrated Information Processor" (zIIP) engines to run database and background data movement activities. By not having these counted in the MSU number for business applications, it greatly reduced the cost for mainframe software.
Tuesday's announcement is that the SDM "reader" will now run in a zIIP engine, reducing the costs for applications that run on that machine. Note that the CP, IFL, zAAP and zIIP engines are all identical cores. The z10 EC has up to 64 of these (16 quad-core) and you can designate any core as any of these engine types.
Faster z/OS Global Mirror Incremental Resync
One way to set up a 3-site disaster recovery protection is to have your production synchronously mirrored to a second site nearby, and at the same time asynchronously mirrored to a remote location. On the System z, you can have site "A" using synchronous IBM System Storage Metro Mirror over to nearby site "B", and also have site "A" sending data over to site "C" using z/OS Global Mirror. This is called "Metro z/OS Global Mirror" or "MzGM" for short.
In the past, if the disk in site A failed, you would switch over to site B, and then send all the data all over again. This is because site B was not tracking what the SDM reader had or had not yet processed. With Tuesday's announcement, IBM has developed an "incremental resync" where site B figures out what the incremental delta is to connect to the z/OS Global Mirror at site "C", and this is 95% faster than sending all the data over.
IBM Basic HyperSwap for z/OS
What if you are sending all of your data from one location to another, and one disk system fails? Do you declare a disaster and switch over entirely? With HyperSwap, you only switch over the disk systems, but leave the rest of the servers alone. In the past, this involved hiring IBM Global Technology Services to implement a Geographically Dispersed Parallel Sysplex (GDPS) with software that monitors the situation and updates the z/OS operating system when a HyperSwap had occurred. All application I/Os that were writing to the primary location are automatically re-routed to the disks at the secondary location. HyperSwap can do this for all the disk systems involved, allowing applications at the primary location to continue running uninterrupted.
HyperSwap is a very popular feature, but not everyone has implemented the advanced GDPS capabilities. To address this, IBM now offers "Basic HyperSwap", which is actually going to be shipped as IBM TotalStorage Productivity Center for Replication Basic Edition for System z. This will run in a z/OS address space, and use either the DB2 RDBMS you already have, or provide you an Apache Derby database for those few out there who don't have DB2 on their mainframe already.
Update: There has been some confusion on this last point, so let me explain the key differences between the different levels of service:
Basic HyperSwap: single-site high availability for the disk systems only
GDPS/PPRC HyperSwap Manager: single- or multi-site high availability for the disk systems, plus some entry-level disaster recovery capability
GDPS/PPRC: highly automated end-to-end disaster recovery solution for servers, storage and networks
I apologize to all my colleagues who thought I implied that Basic HyperSwap was a full replacement for the more full-function GDPS service offerings.
Extended Address Volumes (EAV)
Up until now, the largest volume you could have was only 54 GB in size, and many customers still are using 3 GB and 9 GB volume sizes. Now, IBM will introduce 223 GB volumes. You can have any kind of data set on these volumes, but only VSAM data sets can reside on cylinders beyond the first 65,280. That is because many applications still think that 65,280 is the largest cylinder number you can have.
This is important because a mainframe, or a set of mainframes clustered together, can only have about 60,000 disk volumes total. The 60,000 is actually the Unit Control Block (UCB) limit, and besides disk volumes, you can have "virtual" PAVs that serve as an alias to existing volumes to provide concurrent access.
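To see why bigger volumes matter under a fixed device limit, here is a small arithmetic sketch in Python. It uses the approximate 60,000 figure and the volume sizes quoted above. The `alias_fraction` parameter is a hypothetical knob I added to show the effect of spending some of those device addresses on PAV aliases. Decimal units (1 PB = 1,000,000 GB) are assumed.

```python
UCB_LIMIT = 60_000  # approximate device limit quoted above


def max_capacity_pb(volume_gb: float, alias_fraction: float = 0.0) -> float:
    """Upper bound on addressable disk capacity, in PB (decimal units).

    alias_fraction is a hypothetical share of device addresses used for
    PAV aliases rather than for base volumes.
    """
    base_volumes = int(UCB_LIMIT * (1 - alias_fraction))
    return base_volumes * volume_gb / 1_000_000


if __name__ == "__main__":
    for size_gb in (3, 9, 54, 223):
        print(f"{size_gb:>4} GB volumes: up to {max_capacity_pb(size_gb):.2f} PB")
```

With the same number of device addresses, moving from 54 GB to 223 GB volumes raises the ceiling by roughly a factor of four.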
Aside from the first item, the Extended Distance FICON, the other enhancements are "preview announcements" which means that IBM has not yet worked out the final details of price, packaging or delivery date. In many cases, the work is done, has been tested in our labs, or running beta in select client locations, but for completeness I am required to make the following disclaimer:
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Availability, prices, ordering information, and terms and conditions will be provided when the product is announced for general availability.
Normally, IBM has its announcements on Tuesdays, but this week it was on Monday!
I am here in New York City, at the Kaufmann Theater of the American Museum of Natural History, for the
[IBM Storage Innovation Executive Summit]. We have about 250 clients here, as well as many bloggers and storage analysts.
My day started out being interviewed by Lynda from Stratecast, a division of [Frost & Sullivan]. This interview will be part of a video series that Stratecast is doing about the storage industry.
(About the venue: American Museum of Natural History was built in 1869. It was featured in the film "Night at the Museum". In keeping with IBM's focus on scalability and preservation, the museum here boasts skeletons of the largest dinosaurs. The five-story building takes up several city blocks, and the Kaufmann theater is buried deep in the bottom level, well shielded from cell phone or Wi-Fi signals allowing me to focus on taking notes the traditional way, with pen and paper.)
Deon Newman, IBM VP of Marketing for North America, was our Master of Ceremonies. Today would be filled with market insight, best practices, thought leadership, and testimonials of powerful results.
This is my first in a series of blog posts on this event.
Information Explosion on a Smarter Planet
Bridget van Kralingen, IBM General Manager for North America, indicated that storage is finally having its day in the sun, moving from the "back office" to the "front office". According to Google's Eric Schmidt, we now create, capture and replicate more data in two days than all of the information recorded from the dawn of time to the year 2003.
1928: IBM's innovative 80-column punch card stored nearly twice as much as its 50-column predecessor.
1947: Bing Crosby decided to do his radio show by recording it at his convenience on magnetic tape, rather than doing it live. This was the motivation for IBM researchers to investigate tape media, delivering the first commercial tape drive in 1952. One tape reel could hold the equivalent of 30,000 punch cards.
1956: the IBM RAMAC mainframe was the first computer to access data randomly with an externally-attached disk system, the "350 Disk Unit", which stored 5 million 7-bit characters (about 5MB) and weighed over 500 pounds. Compare that to today's cell phone that can store several GB of data in a handheld device.
1978: IBM invented Redundant Array of Independent Disks (RAID) through a collaboration with the University of California, Berkeley.
1993: IBM introduces the [IBM 9337 Disk Storage Array], the first external disk storage system for distributed operating systems. This was based on the Serial Storage Architecture [SSA] protocol.
1995: IBM launches products that support Storage Area Networks (SAN), based on the Fibre Channel Protocol. IBM's internal codenames for disk products were all names of sharks, and so our internal mantra was that a healthy storage diet was comprised of "Plenty of Fish and Fibre".
2010: IBM ships Easy Tier, the world's easiest-to-use sub-LUN automated tiering capability, for the IBM System Storage DS8700 disk system.
Storage is growing (in capacity) at 40 percent per year, but IT budgets are only growing (in dollars) by a measly 1 to 5 percent. She cited the success at [Sprint], presented at the October 2010 launch. By combining IBM SAN Volume Controller with a three-tier storage architecture, Sprint lowered their raw capacity from 10PB to 8.4PB, increasing utilization from 35 to 78 percent. This involved shrinking from six storage vendors to three, and reducing total number of disk arrays from 166 down to 96. The resulting system has only 38 percent of their data on their most expensive Tier-1 storage, the rest is now living on less expensive Tier-2 and Tier-3 storage.
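The gap between capacity growth and budget growth can be put into numbers with a quick sketch in Python. It assumes, as a simplification of my own, that storage spending is proportional to capacity times cost per TB, and asks how fast cost per TB must fall each year for spending to stay in line with the budget. The function name is illustrative only.

```python
def required_cost_decline(capacity_growth: float, budget_growth: float) -> float:
    """Annual drop in cost per TB needed for spending to track the budget.

    Assumes spending = capacity x cost per TB, a deliberate simplification.
    Growth rates are fractions, for example 0.40 for 40 percent.
    """
    return 1 - (1 + budget_growth) / (1 + capacity_growth)


if __name__ == "__main__":
    for budget in (0.01, 0.05):
        decline = required_cost_decline(0.40, budget)
        print(f"Budget growth {budget:.0%}: cost per TB must fall {decline:.1%} per year")
```

Whatever part of that decline is not delivered by cheaper hardware has to come from better utilization, tiering and consolidation, which is the point of the Sprint example.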
Companies are entering the era of Big Data with an insatiable appetite for collecting and analyzing data for marketplace insights. IBM [InfoSphere BigInsights], based on the Apache Hadoop, has helped customers make sense of it all. Innovative technology, expertise and marketplace insight will provide the competitive path forward in the coming decade.
Storage Challenges and Opportunities in 2011 and Beyond
I always enjoy hearing Stan Zaffos, Gartner Research VP, present at the annual [Data Center Conference] in Las Vegas every December. His analysis and research focuses on storage systems and emerging storage technologies.
Stan provided his perspective on the storage industry. He suggested a top-down approach, based on the market trends that Gartner is closely monitoring. He suggests focusing heavily on managing data growth, using SLAs to improve efficiency, and following Gartner's recommended actions. His statement, "If something is not sustainable, then it is unsustainable." resonated well with the audience. His three key points:
Design to meet but not exceed Service Level Agreements (SLAs)
Re-evaluate your ratio of SAN versus NAS based on growth of unstructured data content
Explore the variety of Cloud options available
Those of us who have been in this business a long time recognize that the problems haven't changed, just the dimensions. When in the past three decades were IT budgets generous and plentiful? When was there more than enough IT staff to handle all the requests in a timely manner? When hasn't there been a period of information growth? Gartner's analysis shows that external controller-based (RAID-protected) disk systems are growing revenue at 8.7 percent. Raw TBs of disk capacity are growing at 55 percent, and are expected to reach 100 Exabytes by 2015.
SAN has four times more revenue than NAS today, but NAS is growing faster. NAS was only 9 percent marketshare in 2010, but is projected to grow to 32 percent by 2015. SAN can offer higher price/performance for traditional OLTP and database workloads, but NAS is better suited for unstructured data, backups and archives, assisted by storage efficiency features like real-time compression and data deduplication. Which industries create the most unstructured data? The ones involved in filling out forms! This includes government, insurance agencies, manufacturing, mining and pharmaceuticals.
The phrase "good enough" should no longer be considered an insult. Too often IT departments design solutions that far exceed negotiated Service Level Agreements (SLAs), when they should focus on just meeting them instead. Modular storage systems are often sufficient for most workloads. Slower 7200RPM SATA disks can be one third the price of faster 15K RPM Fibre Channel drives, and often offer sufficient performance for the tasks required. Unified storage, such as IBM N series, can help simplify capacity planning, as storage can be re-purposed if different workloads grow at different rates. The key is to focus on meeting SLAs based on the price-vs-risk factor. Take a minimalist approach with fewer SLAs, fewer management classes, and fewer storage vendors.
Stan suggests a two-pronged approach: Capacity management through content analytics and classification, and Efficient Utilization through Thin Provisioning, storage virtualization, Quality of Service (QoS), compression and deduplication capabilities. These features will be ubiquitous by 2013. If you are worried that these technologies mean more information packed onto fewer devices, Stan's response was "If it's not there, it can't break." Storing data on fewer disks or tape cartridges means less chance something will fail.
Stan feels IT shops using Thin Provisioning should continue to charge their end-users on what they ask for (the full allocation request) rather than what the thin-provisioned amount actually is on the storage devices themselves. For example, if someone asks for 100GB LUN to be allocated to their system, but this only takes up 30GB of actual data space, chargeback the full 100GB!
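Stan's chargeback rule is simple enough to sketch in a few lines of Python. The `Lun` class, the owner names and the rate per GB are all hypothetical, made up for illustration. The only point is that the bill is computed from the allocation request, while the physically consumed space is tracked but not billed.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    owner: str
    allocated_gb: float  # what the end-user asked for
    used_gb: float       # what thin provisioning actually consumes on disk


def chargeback(luns, rate_per_gb):
    """Bill each owner for the full allocation request, not physical usage."""
    bills = {}
    for lun in luns:
        bills[lun.owner] = bills.get(lun.owner, 0.0) + lun.allocated_gb * rate_per_gb
    return bills


if __name__ == "__main__":
    luns = [Lun("payroll", 100, 30), Lun("payroll", 50, 10), Lun("web", 200, 120)]
    for owner, amount in chargeback(luns, rate_per_gb=2.0).items():
        print(f"{owner}: ${amount:,.2f}")
```

Charging for what was requested gives end-users a reason to ask only for what they need, even though the storage team only has to buy disk for what is actually written.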
It can take five years for new technology to get 50 percent adopted. The Romans took eight years to build the [Colosseum]. His research on "network convergence" found that 42 percent planned to use iSCSI, 32 percent Fibre Channel over Ethernet (FCoE) or other Top-of-Rack(TOR) converged switches, and 16 percent looking for full convergence of servers, switches and storage. Features like IBM Easy Tier automatic sub-LUN tiering were introduced later, and so have not been adopted as widely as other features like Thin Provisioning that have been around since the 1990's IBM RAMAC Virtual Array.
Stan felt that Public and Private clouds were two different approaches. Public clouds offer reservation-less provisioning. Private clouds offer improved agility, but can be more complex to set up, and carry the risk of idle capacity similar to traditional IT datacenter deployments. Storage and File virtualization should be considered a pre-req for adopting Cloud technologies.
Storage IT teams need to adopt more than just technical skills. They need to learn about legal and government regulatory compliance issues, financial considerations, and would even benefit doing some "marketing". Why marketing? Because often IT departments need end-users to change their attitudes and behaviours, and this can be accomplished through internal marketing campaigns.
I've blogged about some of these videos already, but since there are probably a few out there buying the brand new Apple iPhone looking for YouTube videos to play on them, these links might provide some example entertainment on your new handheld device.
Next week has the "Fourth of July" Independence Day holiday in the USA smack in the middle of the week, so I expect the blogosphere to quiet down a bit. So whether you are working next week or not, in the USA or elsewhere, take some time to enjoy your friends and family.
Lakota Industries made news with the introduction of its [Sarah-Cuda Hunting Bow], named after moose-hunting U.S. Vice President nominee and Governor of Alaska [Sarah Palin]. This has all the same features as their other high-end hunting bows, but is lighter, smaller and available in Pink Camo. This "pink-it-and-shrink-it" move was designed to broaden the market share of hunting bows by reaching out to the needs of women hunters.
Not to be outdone, today, at the Storage Networking World Conference, IBM announced the new IBM System Storage SAN Volume Controller Entry Edition [SVC EE].
The new SVC Entry Edition, available in Flamingo Pink* or traditional Raven Black.
* RPQ required. Default color is Raven Black.
You might be thinking: "Wait! IBM SVC is already the leading storage virtualization product among SMB clients today, why introduce a less expensive model?" With the global economy in the tank, IBM thought it would be nice to help out our smaller SMB clients with this new option.
This new offering is actually a combination of new software (SVC 4.3.1) and new hardware (2145-8A4). Here are the key differences:
Licensing: SVC Classic is licensed by usable capacity managed, up to 8 PB; SVC EE is licensed by number of disk drives, up to 60 drives
Hardware: SVC Classic runs on the 2145-4F2, 8F2, 8F4, 8G4 and 8A4 models; SVC EE runs on the new 2145-8A4
Node-pairs: SVC Classic uses 1, 2, 3 or 4 node-pairs, depending on performance requirements; SVC EE needs only one node-pair
Copy services: SVC Classic offers FlashCopy, Metro Mirror and Global Mirror, licensed by subset of capacity used; SVC EE offers FlashCopy, Metro Mirror and Global Mirror, but with simplified licensing
The SVC EE is not a "dumbed-down" version of the SVC Classic. It has all the features and functions of the SVC Classic, including thin provisioning with "Space-efficient volumes", Quality of Service (QoS) performance prioritization for more important applications, point-in-time FlashCopy, and both synchronous and asynchronous disk mirroring (Metro and Global Mirror).
While IBM has not yet published SPC-1 benchmarks, IBM is positioning the SVC EE at roughly 60 percent of the performance, at 60 percent of the list price, of a comparable SVC Classic 2145-8G4 configuration. The SVC Classic is already one of the fastest disk systems in the industry. By comparison, the SVC EE is twice as fast as the original SVC 2145-4F2 introduced five years ago. If you outgrow the SVC EE, no problem! The 2145-8A4 can be used in traditional SVC Classic mode, and the SVC EE software can be converted into the SVC Classic software license for upgrade purposes, protecting your original investment!
For those considering an HP EVA 4400 or EMC CX-4 disk system, you might want to look at combining an SVC EE with [IBM System Storage DS3400] disk. The combination offers more features and capabilities, and helps reduce your IT costs at the same time.
And if you are worried you can't afford it right now, IBM Global Financing is offering a ["Why Wait?" world-wide deferral of interest and payments] for 90 days, so you don't have to make your first payment until 2009, applicable to all IBM System Storage products, including the SVC EE, SVC Classic and DS3400 disk systems.
Continuing this week's theme of New Year's Resolutions for the data center, today we'll talk about one that many people make for their own personal lives: staying on a budget.
Often, when faced with a tightening budget, we try to make more use of what we already have. Tell someone they are only using 10 percent of their brain, and they immediately believe you; but tell them they are only using 30 percent of their storage, and they ask for a whitepaper, magazine article, or clarification on how that percentage is calculated. I actually visited a customer that was only using 6 percent of the storage attached to their Windows servers!
So, to help those of you making data center resolutions to stay on budget, the terms to remember are "Reduce", "Reuse" and "Recycle".
When people come to request storage, are they being reasonable about what they need today, or are they asking for what they might need over the next three years? They might need 50GB, but they ask for 100GB, in case they grow, and a year later, you find they have only 15GB of data on it. On the flipside, the person asks for what they need but some storage admins give out more, just so they don't have to be bothered so often when growth happens. Finally, I have seen this formalized into fixed-size LUNs: all the disk is carved into huge 100GB pieces, so if you need 20GB, here's one big enough with plenty of room to grow.
If you are going to keep on a budget, remember that storage next year will cost about 30 percent less than storage today. That is the average annual drop for both disk and tape on a dollar-per-MB basis. If there is any way to postpone giving out storage until it is actually needed, you can save a bundle of money. Timing is everything! In the event of a disaster, getting immediate replacement for disk can be very expensive, but if you can wait just two weeks, you can negotiate a better deal. I thought of this while going to the movie theatre yesterday. A "hot dog" and a bottle of water was $8.00, but if you are able to wait two hours and eat after the movie, you can get a much better meal for less.
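Here is a small Python sketch of what postponing a purchase is worth, taking the roughly 30 percent average annual price drop mentioned above as a constant rate. The smooth compounding between year boundaries and the $100,000 starting price are my own assumptions for illustration; real price drops arrive in steps with new product generations.

```python
def projected_price(price_today: float, years: float, annual_drop: float = 0.30) -> float:
    """Price of the same capacity after `years`, assuming a steady annual drop."""
    return price_today * (1 - annual_drop) ** years


if __name__ == "__main__":
    price_today = 100_000.0  # hypothetical cost of the capacity if bought today
    for months in (0, 3, 6, 12, 24):
        later = projected_price(price_today, months / 12)
        print(f"Buy in {months:>2} months: ${later:>10,.0f} (save ${price_today - later:,.0f})")
```

Note that a 30 percent drop seen from today is the same as saying that today's price is about 43 percent higher than next year's (1 / 0.70 is roughly 1.43).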
A lot of companies buy new storage because their existing storage isn't fast enough, or doesn't have the latest copy services. This can easily be solved with an IBM SAN Volume Controller (SVC). The SVC can virtualize slower, functionless storage, and present to your application hosts virtual disks that are faster, and with all the latest disk-to-disk copy services like FlashCopy, Metro Mirror, and Global Mirror.
Chances are, you have unused disk capacity spread across all your storage today, but perhaps they are formatted into small LUNs. The SVC can combine the capacity, and let you carve up big LUNs at the sizes you need. This is like taking all those tiny pieces of soap in your shower and forming a new bar of soap, or taking all the crumbs at the bottom of your bread box, and making a new slice of bread. And, the virtual LUNs are dynamically expandable, so give out only the amount they need today, as it is simple to expand them to larger sizes later.
Of my 13 patents, the first will always be my favorite, on a function called "RECYCLE" for the Data Facility Storage Management Subsystem Hierarchical Storage Manager (DFSMShsm) product, which is now a component of the IBM z/OS operating system. Basically, tapes could contain hundreds or thousands of files, such as backup versions or archive copies, and these expired on different dates. As a result, a tape would be written 100 percent full, and then over time, decrease in valid data to 80, 60, 40, 20 until it hit 0 percent. In some cases, a single file could hold an entire tape hostage. RECYCLE was able to read the valid data off tapes that were perhaps less than 20 percent full, and consolidate them onto fewer tapes. As a result, a whole bunch of tapes could be returned to the scratch pool, and reused immediately for other workloads. This also helps in moving to newer, higher capacity cartridges, such as the new 700GB cartridge that IBM co-developed with FujiFilm. (This RECYCLE function exists in our IBM Tivoli Storage Manager software, as well as our Virtual Tape Server, but is called "reclamation" instead, to avoid confusion on searches.)
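The idea behind RECYCLE can be illustrated with a toy model in Python. This is not how DFSMShsm implements it; it is a simplified sketch that assumes all cartridges have the same capacity, ignores individual file boundaries, and uses a 20 percent threshold as in the example above. The tape labels and the function name are made up.

```python
def recycle(tapes, threshold_pct=20):
    """Simplified model of tape reclamation.

    tapes: dict mapping tape label -> percent of the tape still holding
           valid (unexpired) data, as an integer from 0 to 100.
    Returns a dict with the tapes left alone, the tapes returned to
    scratch, and how many output tapes the consolidated data needs.
    """
    kept = {t: pct for t, pct in tapes.items() if pct >= threshold_pct}
    recycled = {t: pct for t, pct in tapes.items() if pct < threshold_pct}

    valid_pct_total = sum(recycled.values())
    # Ceiling division: 30 percent of valid data still needs one whole tape.
    output_tapes = (valid_pct_total + 99) // 100

    return {
        "kept": sorted(kept),
        "scratch": sorted(recycled),
        "output_tapes": output_tapes,
        "net_tapes_freed": len(recycled) - output_tapes,
    }


if __name__ == "__main__":
    inventory = {"A": 100, "B": 15, "C": 5, "D": 10, "E": 60, "F": 0}
    print(recycle(inventory))
```

In this toy inventory, four mostly-empty tapes are emptied onto a single output tape, so three tapes net go back to the scratch pool. A tape that has already dropped to 0 percent needs no data movement at all; it simply returns to scratch.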
When evaluating your use of tape, determine if you are making best use of the tapes you have now, and perhaps a RECYCLE (or reclamation) scheme may be in order. Fewer tapes can save money in many ways, such as reduced storage costs, and reduced courier costs to send the tapes offsite. Tape media can still be 10-20 times less expensive than disk, based on full capacity.