Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is a Master Inventor and Senior IT Specialist for the IBM System Storage product line at the
IBM Executive Briefing Center in Tucson Arizona, and featured contributor
to IBM's developerWorks. In 2011, Tony celebrated his 25th year anniversary with IBM Storage on the same day as the IBM's Centennial. He is
author of the Inside System Storage series of books. This blog is for the open exchange of ideas relating to storage and storage networking hardware, software and services. You can also follow him on Twitter @az990tony.
(Short URL for this blog: ibm.co/Pearson
Continuing my catch-up on past posts, Jon Toigo on his DrunkenData blog, posted a ["bleg"] for information aboutdeduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM'sview. (Jon, as always, you have my permission to post this on your blog!)
Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTier software. Larry Freeman from NetApp already explains A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
Data agnostic. Diligent does not require content-awareness, format-awareness nor identification of backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which has the best opportunity for deduplication benefits. The two main benefits are:
Keeping more backup data available online for fast recovery.
Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that their’s is best because it collapses two functions — de-dupe then ingest — into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
Will this meet my current business requirements?
Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine thenecessary ingest rate. How many "Terabytes per Hour" (TB/h) must be received, processed and stored from the backup software during the backup window. IBM intends to document its performance test results of specific software/hardwarecombinations to provide guidance to clients' purchase and planning decisions.
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what already is stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from2KB to 16MB in size.
De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of[non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return back a "bit-perfect" copy of the data that was written. Thereare laws against changing the format. For example, an original document was in Microsoft Word format, but is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate the bit-perfect copy. Certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code calculation as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance for collisions might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, that does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing block stored.
Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read thedata off the back-end disk, using a variety of forensic tools, and piece together strings of personal information,such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working withopen industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems.Until then, clients concerned about encryption should consider OS-based or application-based encryption from thebackup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBMDiligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are in-tact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge fromthis set was unreadable or misplaced, it might impact the access to many TBs of data, or render the entire systemunusable.
In the case of a 1PB of back-end disk for IBM Diligent, you would be having to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives in parallel, could take you several days for the complete process.This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supportedby Analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larrysuggests, MS Exchange and SharePoint could be good use cases that represent the possible savings for squeezing outduplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with “update in place” applications, that is high levels of random writes for non-appending updates. OLTP and Database workloads would not benefit from deduplication.
Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced on software first, and then as implementations mature, get hardware-based to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performsLZH compression through software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) has decided to customize it for their own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extentions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) stick to the SQL standard, and expect the reader to customize the examples to their particular DBMS; or (b) stick to a single RDBMS implemenation, and offer examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various DBMS are different than standard SQL syntax, with actual working examples for different RDBMS.
You might be thinking to yourself that only an idiot would work in a place that had to require knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via the [Shadow IT].
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2-C Express] that runs on Windows, Linux, Mac OS, and Solaris.)
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question stuck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza was not in the IBM System Storage product line, it is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
In this case, it is not chess pieces, but FUD being slung around like mud between vendors. EMC blogger Chuck Hollis' post [Products vs. Features] correctly pointsout that IBM has invented most nearly everything useful in IT, and sadly a few things we wish we hadn't.Gene Amdahl, who left IBM to start his own company, is credited for coining the phrase describing IBM'sinnovative sales techniques. Wikipedia has a nice write up on the history of[Fear, Uncertainty and Doubt(FUD)].
Nowadays, when you hear "FUD" most storage administrators immediately think of EMC, who have taken this method to anew level of art-form. Take for example two EMC entries from fellow blogger BarryB, on his Storage Anarchist blog:[Not Dead Yet, andPushing Daisies].The first is a reference to a funny scene from a Monty Python movie, and the second one is referring to a terriblenew television program called "Pushing Daisies". (In this show, the main character can bring a dead personback to life for sixty seconds, just long enough to ask a few questions on behalf of his detective friend. He must touch the person again within 60 seconds, or someone else randomly dies instead. I amnot a fan of this concept, and found it a bit morbid and creepy. But I digress.)
It is true I was on vacation the past two weeks, but this was group travel I booked over six months ago before we had the exact dates lined up for our various announcements, and not a last-minute celebration of my recent new job assignment. I got all my assignments for this announcement turned in before leaving for my trip. I never thought of checking with fellow IBM blogger BarryW to make sure that we don't have overlapping vacation schedules, leaving the "blogosphere" unmanned, so to speak, but it is not a bad idea. Fortunately, our IBM PR team was able to make their rebuttal through other means. You can read the recap on Techworld [Marketing Wars by Proxy].
Several astute readers on my blog, however, requested that I add my two cents. Let's take a look at some of BarryB's comments:
...most DS8300's are to this day most frequently bundled as "free" storage with IBM mainframe and server sales.
We just shipped our 15,000th box, so for this absurd statement to be true, more than half would have to be given away as part of a server-and-storage deal?Actually, about a third of our DS8000 sales are sold with servers in the same bundle, and while we do provide discounts from the official list price, that is not the same as "free". The other two thirds are sold into accounts to be used with the existing servers already deployed. So BarryB, your math doesn't work out. (Perhaps you've been taking Hitachi math lessons???)
It is interesting however, that when we do a 4-year TCO comparison, between a normally-discounted DS8000 versus free EMC DMX4 hardware, IBM still has the lower cost, given that most of the price-gouging from EMC happens after the initial sale, through software features, annual Powerpath renewals and MES upgrades. If you are an EMC customer, and you are planning to add more capacity to your DMX, ask EMC to charge you no more than what you originally paid on a dollar-per-GB basis for the initial capacity. That's only fair, right?
...No thin provisioning, or even a commitment to thin provisioning. Just crickets. (Celerra support since Jan 2006...
EMC DMX does not have thin provisioning available today either, so BarryB brings up Celerra, their NAS box? IBM System Storage N series NAS box also has thin provisioning, so if you want thin provisioning you can buy a NAS box from EMC or IBM. Thin provisioning makes sense using NAS protocols, as there are actual commands to "delete a file" that can then free up the related blocks in a thin-provisioned environment. The only way to do this with block-oriented protocols is to get the OS to notify the storage device that blocks can be freed up. As it turns out, IBM's z/OS has such support, which we developed specifically for our thin-provisioning support in our IBM RAMAC Virtual Array disk systems back in the 1990s.For block-oriented devices on most other operating systems, thin provisioning may not be all that it is cracked up to be.
No SATA drives (only DMX-4 supports native SATA-II drives, since Aug’07)
A few people are confused on this. IBM DS8000 has supported FATA for quite some time now, same slower speeds and higher capacities as SATA, but are technically NOT the same as SATA. FATA are designed to provide better protection against vibrational shock, to improve reliability of the drives. IBM felt that if the data was important enough to put on a high-end system, it should get better-than-SATA treatment. If you really want SATA, try our IBM System Storage N series, DS4000 or DS3000 models.
No RAID 6 (DMX-3 has supported multi-dimensional RAID since Q1’07, DMX-4 since Aug'07, ...
IBM N series supports RAID6, but we called it RAID-DP and that confused some people. Same thing, DP stands for Dual Parity, protecting against a double-disk failure. We also just announced RAID6 on our DS4000 series, by the way.
No 4Gb back-end (USP-V since May '07, DMX-4 since Aug’07)
I found this one odd, since BarryB himself in an earlier post explained why 4Gbps back-end made no difference to DMX4 performance in this post [DMX-4 and Oh So Much More], which I will put into a different color so you can tell it is from a different post:
You may have noticed that there weren't any specific performance claims attributed to the new 4Gb FC back-end. This wasn't an oversight, it is in fact intentional. The reality is that when it comes to massive-cache storage architectures, there really isn't that much of a difference between 2Gb/s transfer speeds and 4Gb/s. Transmit times are really only a tiny portion of I/O overhead, and just don't make that much difference when a massively-cached system is pre-fetching reads, buffering/delaying writes and reordering I/O requests to minimize seek times. Not that 4Gb/s won't help some applications, but most people just won't see any noticeable difference.
In this case, BarryB is right. The IBM DS8000's 2Gbps back-end is not a performance bottleneck. The DS8000 with a 2Gbps back-end is faster than DMX4 with a 4Gbps back-end for business application workloads. EMC doesn't publish SPC benchmarks to deny this, so you will just have to take our word on this.
Still only 1024 maximum disk drives (DMX-3 & 4 support up to 2400 drives, USP-V supports 1152)
I would be curious to see how many customers have more than 1024 drives on any high-end disk array.As we learned back in [Day 2 Storage Symposium], the average DS8100 has 17.4 TB, and DS8300 has 41.5 TB capacity. Using 500GB drives,that's only 83 spindles. Even with 73GB drives, that's 568 spindles. Plenty of room for growth, so I am notconvinced that higher theoretical upper architectural limits are worth discussing here.
Still only two HARD LPARs (partitions) ..., and even IBM’s mid-tier products support more than 2 storage partitions (in this same announcement)
IBM's two LPARs are TWICE what EMC DMX offers. I don't even know why anyone from EMC would bring this up? While EMC is enjoying their success with VMware, the lack the experience to carry this over to their storage lines. Until EMC offers MORE THAN TWO of any kind of partitions on their high-end offerings, there just is no credibility here. As for our "storage partitions" on our DS4000 line, that is an unfortunate mis-understanding of the press release. On the DS4000, the term "storage partition" is really "LUN masking", dividing up only which disks can be accessed by which hosts, and not dividing up any processor or cache capacity. So this is not the same as any LPAR concept on any other system. For example, a DS4000 with 64 partitions can be attached to 64 hosts, or 64 host-clusters like a Windows MSCS environment or AIX HACMP.
No native Ethernet replication or iSCSI support (Symmetrix has had since 2002)
Again, I found this one odd. On another EMC post, [Vigorous Debates],Chad Sakac mentions that only 2% of Symmetrix are sold with IP ports, not sure if this is for Ethernet replication, iSCSI attachment, or both (Again, I will use a different color):
On the Symm business (a huge part of EMC’s business – the IP ports are included on 2% of deals. That’s a fact.
Just because engineer can put a feature or function on a box, doesn't mean there is business sense to do so. I would hate for IBM to invest millions of dollars on native iSCSI support, only to have 2% of our DS8000 boxes sold with that feature. Customers who have DS8000 on FC SANs already deployed can easily add iSCSI support either through their SAN switches, or by fronting the DS8000 with an N series gateway. Most customers looking for native iSCSI are the smaller no-SAN-deployed SMB customers, and for them, we have both the DS3300 and the various N series models to choose from.
Well that's my two cents. The DS8000 series remains a strategic part of the IBM System Storage offering matrix, with continued investment in the development, as well as on-going research that we can leverage throughout the IBM company. I would like to read your thoughts on this, post me a comment below.
In his last post in this series, he mentions that the amazingly successful IBM SAN Volume Controller was part of a set of projects:
"IBM was looking for "new horizon" projects to fund at the time, and three such projects were proposed and created the "Storage Software Group". Those three projects became know externally as TPC, (TotalStorage Productivity Center), SanFS (SAN File System - oh how this was just 5 years too early) and SVC (SAN Volume Controller). The fact that two out of the three of them still exist today is actually pretty good. All of these products came out of research, and its a sad state of affairs when research teams are measured against the percentage of the projects they work on, versus those that turn into revenue generating streams."
But this raises the question: Was SAN File System just five years too early?
IBM classifies products into three "horizons"; Horizon-1 for well-established mature products, Horizon-2 was for recently launched products, and Horizon-3 was for emerging business opportunities (EBO). Since I had some involvement with these other projects, I thought I would help fill out some of this history from my perspective.
Back in 2000, IBM executive [Linda Sanford] was in charge of IBM storage business and presented that IBM Research was working on the concept of "Storage Tank" which would hold Petabytes of data accessible to mainframes and distributed servers.
In 2001, I was the lead architect of DFSMS for the IBM z/OS operating system for mainframes, and was asked to be lead architect for the new "Horizon 3" project to be called IBM TotalStorage Productivity Center (TPC), which has since been renamed to IBM Tivoli Storage Productivity Center.
In 2002, I was asked to lead a team to port the "SANfs client" for SAN File System from Linux-x86 over to Linux on System z. How easy or difficult to port any code depends on how well it was written with the intent to be ported, and porting the "proof-of-concept" level code proved a bit too challenging for my team of relative new-hires. Once code written by research scientists is sufficiently complete to demonstrate proof of concept, it should be entirely discarded and written from scratch by professional software engineers that follow proper development and documentation procedures. We reminded management of this, and they decided not to make the necessary investment to add Linux on System z as a supported operating system for SAN file system.
In 2003, IBM launched Productivity Center, SAN File System and SAN Volume Controller. These would be lumped together with Horizon-1 product IBM Tivoli Storage Manager and the four products were promoted together as the inappropriately-named [TotalStorage Open Software Family]. We actually had long meetings debating whether SAN Volume Controller was hardware or software. While it is true that most of the features and functions of SAN Volume Controller is driven by its software, it was never packaged as a software-only offering.
The SAN File System was the productized version of the "Storage Tank" research project. While the SAN Volume Controller used industry standard Fibre Channel Protocol (FCP) to allow support of a variety of operating system clients, the SAN File System required an installed "client" that was only available initially on AIX and Linux-x86. In keeping with the "open" concept, an "open source reference client" was made available so that the folks at Hewlett-Packard, Sun Microsystems and Microsoft could port this over to their respective HP-UX, Solaris and Windows operating systems. Not surprisingly, none were willing to voluntarily add yet another file system to their testing efforts.
Barry argues that SANfs was five years ahead of its time. SAN File System tried to bring policy-based management for information, which has been part of DFSMS for z/OS since the 1980s, over to distributed operating systems. The problem is that mainframe people who understand and appreciate the benefits of policy-based management already had it, and non-mainframe couldn't understand the benefits of something they have managed to survive without.
(Every time I see VMware presented as a new or clever idea, I have to remind people that this x86-based hypervisor basically implements the mainframe concept of server virtualization introduced by IBM in the 1970s. IBM is the leading reseller of VMware, and supports other server virtualization solutions including Linux KVM, Xen, Hyper-V and PowerVM.)
To address the various concerns about SAN File System, the proof-of-concept code from IBM Research was withdrawn from marketing, and new fresh code implementing these concepts were integrated into IBM's existing General Parallel File System (GPFS). This software would then be packaged with a server hardware cluster, exporting global file spaces with broad operating system reach. Initially offered as IBM Scale-out File Services (SoFS) service offering, this was later re-packaged as an appliance, the IBM Scale-Out Network Attached Storage (SONAS) product, and as IBM Smart Business Storage Cloud (SBSC) cloud storage offering. These now offer clustered NAS storage using the industry standard NFS and CIFS clients that nearly all operating systems already have.
Today, these former Horizon-1 products are now Horizon-2 and Horizon-3. They have evolved. Tivoli Storage Productivity Center, GPFS and SAN Volume Controller are all market leaders in their respective areas.
Well, it's Tuesday again, and you know what that means! IBM announcements!
Today, I am in New York visiting clients. The weather is a lot nicer than I expected. Here is a picture of the Hudson River through some trees with leaves turning color. Something we don't see in Tucson! Our cactus and pine trees stay green year-round!
The announcements today center around the IBM PureSystems family of expert integrated systems. The PureFlex is based on Flex System components. The Flex System chassis is 10U high that hold 14 bays, consisting of 7 rows by 2 columns. Computer and Storage nodes fit in the front, and switches, fans and power supplies in the back. Here is a quick recap:
IBM Flex System Compute Nodes
The x220 Compute Node is a single-bay low-power 2-socket x86 server. The x440 Compute Node is a powerful double-bay (1 row, 2 columns). The p260 Compute Node is a single-bay server based on the latest POWER7+ CPU processor.
IBM Flex System Expansion Nodes
Do you remember those old movies where a motorcycle would have a sidecar that could hold another passenger, or extra cargo? IBM introduces "Expansion Nodes" for the x200 series single-bay Compute nodes. The idea here is that in a single column, you have one bay for the Compute node, and then on the side in the next bay (same column) you have an Expanions node. There are two choices:
Storage Expansion Node allows you to have eight additional drives
PCIe Expansion Node allows to to have four PCIe cards, which could include the SSD-based PCIe cards from IBM's recent acquisition, Texas Memory Systems.
There are times where one or two internal drives are just not enough storage for a single server, and these expanion nodes could just be the perfect solution for some use cases.
IBM Flex System V7000 Storage Node
I saved the best for last! The Flex System V7000 Storage Node is basically the IBM Storwize V7000 repackaged to fit into the Flex System chassis. This means that in the front of the chassis, the Flex System V7000 takes up four bays (2 rows by 2 columns). In the back of the chassis are the power supplies, fans and switches.
The new Flex System V7000 supports everything the Storwize V7000 does except the upgrade to "Unified" through file modules. For those who want to have Storwize V7000 Unified in their PureFlex systems, IBM will continue to offer the outside-the-chassis original Storwize V7000 that can have two file modules added for NFS, CIFS, HTTPS, FTP and SCP protocol support.
IBM Flex System Converged Network Switch
The Converged Network Switch provide Fibre Channel over Ethernet (FCoE) directly from the chassis. This eliminates the need for a separate "Top-of-Rack" switch, and allows the new Flex System V7000 Storage Node to externally virtualize FCoE-based disk arrays.
Patterns of Expertise for Infrastructure
The original patterns of expertise focused on the PureApplication Systems. Now IBM has added some for the Infrastructure on PureFlex systems.
IBM has sold over 1,000 Flex System and PureFlex systems, across 40 different countries around the world, since their introduction a few months ago in April! These latest enhancements will help solidify IBM's industry leadership,
Did you miss IBM Pulse 2013 this week? I wasn't there either, having scheduled visits with clients in Washington DC this week, only to have those meetings cancelled due to the [U.S. sequestration cuts].
Fortunately, there are plenty of videos and materials to review from the event. Here's a [12-minute video] interview between Laura DuBois, Program VP of Storage for industry analyst firm [IDC], and fellow IBM executive Steve "Woj" Wojtowecz, VP of Tivoli Storage and Networking Software.
(Update: Apparently, IBM had not secured re-distribution rights from IDC to post this video prior to my blog post. IBM now has full permission to distribute. My apologies for any inconvenience last week.)
The two discuss client opportunities and requirements for storage clouds and compute clouds. Client cloud storage requirements include backup and archive clouds, file storage clouds, and storage that supports compute cloud environments.
While most of the post is accurate and well-stated, two opinions particular caught my eye. I'll be nice and call them opinions, since these are blogs, and always subject to interpretation. I'll put quotes around them so that people will correctly relate these to Hu, and not me.
"Storage virtualization can only be done in a storage controller. Currently Hitachi is the only vendor to provide this." -- Hu Yoshida
Hu, I enjoy all of your blog entries, but you should know better. HDS is fairly new-comer to the storage virtualization arena, so since IBM has been doing this for decades, I will bring you and the rest of the readers up to speed. I am not starting a blog-fight, just want to provide some additional information for clients to consider when making choices in the marketplace.
First, let's clarify the terminology. I will use 'storage' in the broad sense, including anything that can hold 1's and 0's, including memory, spinning disk media, and plastic tape media. These all have different mechanisms and access methods, based on their physical geometry and characteristics. The concept of 'virtualization' is any technology that makes one set of resources look like another set of resources with more preferable characteristics, and this applies to storage as well as servers and networks. Finally, 'storage controller' is any device with the intelligence to talk to a server and handle its read and write requests.
Second, let's take a look at all the different flavors of storage virtualization that IBM has developed over the past 30 years.
IBM introduces the S/370 with the OS/VS1 operating system. "VS" here refers to virtual storage, and in this case internal server memory was swapped out to physical disk. Using a table mapping, disk was made to look like an extension of main memory.
IBM introduces the IBM 3850 Mass Storage System (MSS). Until this time, programs that ran on mainframes had to be acutely aware of the device types being written, as each device type had different block, track and cylinder sizes, so a program written for one device type would have to be modified to work with a different device type. The MSS was able to take four 3350 disks, and a lot of tapes, and make them look like older 3330 disks, since most programs were still written for the 3330 format. The MSS was a way to deliver new 3350 disk to a 3330-oriented ecosystem, and greatly reduce the cost by handling tape on the back end. The table mapping was one virtual 3330 disk (100 MB) to two physical tapes (50 MB each). Back then, all of the mainframe disk systems had separate controllers. The 3850 used a 3831 controller that talked to the servers.
IBM invents Redundant Array of Independent Disk (RAID) technology. The table mapping is one or more virtual "Logical Units" (or "LUNs") to two or more physical disks. Data is striped, mirrored and paritied across the physical drives, making the LUNs look and feel like disks, but with faster performance and higher reliability than the physical drives they were mapped to. RAID could be implemented in the server as software, on top or embedded into the operating system, in the host bus adapter, or on the controller itself. The vendor that provided the RAID software or HBA did not have to be the same as the vendor that provided the disk, so in a sense, this avoided "vendor lock-in".Today, RAID is almost always done in the external storage controller.
IBM introduces the Personal Computer. One of the features of DOS is the ability to make a "RAM drive". This is technology that runs in the operating system to make internal memory look and feel like an external drive letter. Applications that already knew how to read and write to drive letters could work unmodified with these new RAM drives. This had the advantage that the files would be erased when the system was turned off, so it was perfect for temporary files. Of course, other operating systems today have this feature, UNIX has a /tmp directory in memory, and z/OS uses VIO storage pools.
This is important, as memory would be made to look like disk externally, as "cache", in the 1990s.
IBM AIX v3 introduces Logical Volume Manager (LVM). LVM maps the LUNs from external RAID controllers into virtual disks inside the UNIX server. The mapping can combine the capacity of multiple physical LUNs into a large internal volume. This was all done by software within the server, completely independent of the storage vendor, so again no lock-in.
IBM introduces the Virtual Tape Server (VTS). This was a disk array that emulated a tape library. A mapping of virtual tapes to physical tapes was done to allow full utilization of larger and larger tape cartridges. While many people today mistakenly equate "storage virtualization" with "disk virtualization", in reality it can be implemented on other forms of storage. The disk array was referred to as the "Tape Volume Cache". By using disk, the VTS could mount an empty "scratch" tape instantaneously, since no physical tape had to be mounted for this purpose.
Contradicting its "tape is dead" mantra, EMC later developed its CLARiiON disk library that emulates a virtual tape library (VTL).
IBM introduces the SAN Volume Controller. It involves mapping virtual disks to manage disks that could be from different frames from different vendors. Like other controllers, the SVC has multiple processors and cache memory, with the intelligence to talk to servers, and is similar in functionality to the controller components you might find inside monolithic "controller+disk" configurations like the IBM DS8300, EMC Symmetrix, or HDS TagmaStore USP. SVC can map the virtual disk to physical disk one-for-one in "image mode", as HDS does, or can also map virtual disks across physical managed disks, using a similar mapping table, to provide advantages like performance improvement through striping. You can take any virtual disk out of the SVC system simply by migrating it back to "image mode" and disconnecting the LUN from management. Again, no vendor lock-in.
The HDS USP and NSC can run as regular disk systems without virtualization, or the virtualization can be enabled to allow external disks from other vendors. HDS usually counts all USP and NSC sold, but never mention what percentage these have external disks attached in virtualization mode. Either they don't track this, or too embarrassed to publish the number. (My guess: single digit percentage).
Few people remember that IBM also introduced virtualization in both controller+disk and SAN switch form factors. The controller+disk version was called "SAN Integration Server", but people didn't like the "vendor lock-in" having to buy the internal disk from IBM. They preferred having it all external disk, with plenty of vendor choices. This is perhaps why Hitachi now offers a disk-less version of the NSC 55, in an attempt to be more like IBM's SVC.
IBM also had introduced the IBM SVC for Cisco 9000 blade. Our clients didn't want to upgrade their SAN switch networking gear just to get the benefits of disk virtualization. Perhaps this is the same reason EMC has done so poorly with its "Invista" offering.
So, bottom line, storage virtualization can, and has, been delivered in the operating system software, in the server's host bus adapter, inside SAN switches, and in storage controllers. It can be delivered anywhere in the path between application and physical media. Today, the two major vendors that provide disk virtualization "in the storage controller" are IBM and HDS, and the three major vendors that provide tape virtualization "in the storage controller" are IBM, Sun/STK, and EMC. All of these involve a mapping of logical to physical resources. Hitachi uses a one-for-one mapping, whereas IBM additionally offers more sophisticated mappings as well.
Continuing my coverage of the annual [2010 System Storage Technical University], I participated in the storage free-for-all, which is a long-time tradition, started at SHARE User Group conference, and carried forward to other IT conferences. The free-for-all is a Q&A Panel of experts to allow anyone to ask any question. These are sometimes called "Birds of a Feather" (BOF). Last year, they were called "Meet the Experts", one for mainframe storage, and the other for storage attached to distributed systems. This year, we had two: one focused on Tivoli Storage software, and the second to cover storage hardware. This post provides a recap of the Storage Hardware free-for-all.
The emcee for the event was Scott Drummond. The other experts on the panel included Dan Thompson, Carlos Pratt, Jack Arnold, Jim Blue, Scott Schroder, Ed Baker, Mike Wood, Steve Branch, Randy Arseneau, Tony Abete, Jim Fisher, Scott Wein, Rob Wilson, Jason Auvenshine, Dave Canan, Al Watson, and myself, yours truly, Tony Pearson.
What can I do to improve performance on my DS8100 disk system? It is running a mix of sequential batch processing and my medical application (EPIC). I have 16GB of cache and everything is formatted as RAID-5.
We are familiar with EPIC. It does not "play well with others", so IBM recommends you consider dedicating resources for just the EPIC data. Also consider RAID-10 instead for the EPIC data.
How do I evaluate IBM storage solutions in regards to [PCI-DSS] requirements.
Well, we are not lawyers, and some aspects of the PCI-DSS requirements are outside the storage realm. In March 2010, IBM was named ["Best Security Company"] by SC Magazine, and we have secure storage solutions for both disk and tape systems. IBM DS8000 and DS5000 series offer Full Disk Encryption (FDE) disk drives. IBM LTO-4/LTO-5 and TS1120/TS1130 tape drives meet FIPS requirements for encryption. We will provide you contact information on an encryption expert to address the other parts of your PCI-DSS specific concerns.
My telco will only offer FCIP routing for long-distance disk replication, but my CIO wants to use Fibre Channel routing over CWDM, what do I do?
IBM XIV, DS8000 and DS5000 all support FC-based long distance replication across CWDM. However, if you don't have dark fiber, and your telco won't provide this option, you may need to re-negotiate your options.
My DS4800 sometimes reboots repeatedly, what should I do.
This was a known problem with microcode level 760.28, it was detecting a failed drive. You need to replace the drive, and upgrade to the latest microcode.
Should I use VMware snapshots or DS5000 FlashCopy?
VMware snapshots are not free, you need to upgrade to the appropriate level of VMware to get this function, and it would be limited to your VMware data only. The advantage of DS5000 FlashCopy is that it applies to all of your operating systems and hypervisors in use, and eliminates the consumption of VMware overhead. It provides crash-consistent copies of your data. If your DS5000 disk system is dedicated to VMware, then you may want to compare costs versus trade-offs.
Any truth to the rumor that Fibre Channel protocol will be replaced by SAS?
SAS has some definite cost advantages, but is limited to 8 meters in length. Therefore, you will see more and more usage of SAS within storage devices, but outside the box, there will continue to be Fibre Channel, including FCP, FICON and FCoE. The Fibre Channel Industry Alliance [FCIA] has a healthy roadmap for 16 Gbps support and 20 Gbps interswitch link (ISL) connections.
What about Fibre Channel drives, are these going away?
We need to differentiate the connector from the drive itself. Manufacturers are able to produce 10K and 15K RPM drives with SAS instead of FC connectors. While many have suggested that a "Flash-and-Stash" approach of SSD+SATA would eliminate the need for high-speed drives, IBM predicts that there just won't be enough SSD produced to meet the performance needs of our clients over the next five years, so 15K RPM drives, more likely with SAS instead of FC connectors, will continue to be deployed for the next five years.
We'd like more advanced hands-on labs, and to have the certification exams be more product-specific rather than exams for midrange disk or enterprise disk that are too wide-ranging.
Ok, we will take that feedback to the conference organizers.
IBM Tivoli Storage Manager is focused on disaster recovery from tape, how do I incorporate remote disk replication.
This is IBM's Unified Recovery Management, based on the seven tiers of disaster recovery established in 1983 at GUIDE conference. You can combine local recovery with FastBack, data center server recovery with TSM and FlashCopy manager, and combine that with IBM Tivoli Storage Productivity Center for Replication (TPC-R), GDOC and GDPS to manage disk replication across business continuity/disaster recovery (BC/DR) locations.
IBM Tivoli Storage Productivity Center for Replication only manages the LUNs, what about server failover and mapping the new servers to the replicated LUNs?
There are seven tiers of disaster recovery. The sixth tier is to manage the storage replication only, as TPC-R does. The seventh tier adds full server and network failover. For that you need something like IBM GDPS or GDOC that adds this capability.
All of my other vendor kit has bold advertising, prominent lettering, neon lights, bright colors, but our IBM kit is just black, often not even identifying the specific make or model, just "IBM" or "IBM System Storage".
IBM has opted for simplified packaging and our sleek, signature "raven black" color, and pass these savings on to you.
Bring back the SHARK fins!
We will bring that feedback to our development team. ("Shark" was the codename for IBM's ESS 800 disk model. Fiberglass "fins" were made as promotional items and placed on top of ESS 800 disk systems to help "identify them" on the data center floor. Unfortunately, professional golfer [<a href="http://www.shark.com/">Greg Norman</a>] complained, so IBM discontinued the use of the codename back in 2005.)
Where is Infiniband?
Like SAS, Infiniband had limited distance, about 10 to 15 meters, which proved unusable for server-to-storage network connections across data center floorspace. However, there are now 150 meter optical cables available, and you will find Infiniband used in server-to-server communications and inside storage systems. IBM SONAS uses Infiniband today internally. IBM DCS9900 offers Infiniband host-attachment for HPC customers.
We need midrange storage for our mainframe please?
In addition to the IBM System Storage DS8000 series, the IBM SAN Volume Controller and IBM XIV are able to connect to Linux on System z mainframes.
We need "Do's and Don'ts" on which software to run with which hardware.
IBM [Redbooks] are a good source for that, and we prioritize our efforts based on all those cards and letters you send the IBM Redbooks team.
The new TPC v4 reporting tool requires a bit of a learning curve.
The new reporting tool, based on Eclipse's Business Intelligence Reporting Tool [BIRT], is now standardized across the most of the Tivoli portfolio. Check out the [Tivoli Common Reporting] community page for assistance.
An unfortunate side-effect of using server virtualization like VMware is that it worsens management and backup issues. We now have many guests on each blade server.
IBM is the leading reseller of VMware, and understands that VMware adds an added layer of complexity. Thankfully, IBM Tivoli Storage Manager backups uses a lightweight agent. IBM [System Director VMcontrol] can help you manage a variety of hypervisor environments.
This was a great interactive session. I am glad everyone stayed late Thursday evening to participate in this discussion.
The keynote was led by Phil Tasker, IBM Business Unit Executive (BUE) for STG Education Programs in Growth Markets, then Joe Screnci, head of IBM Storage Sales for Australia. IBM is in the Top 10 Training Hall of Fame, and conducts over 40,000 classes worldwide, resulting in over 1.3 million student days of instructions. IBM Systems Lab and Training technical hosts over three dozen conferences like this one every year.
Next was Clod Barrera, Distinguished Engineer and Chief Technical Strategist for the IBM System Storage product line. He covered future trends in storage as they relate to IBM's Smarter COmputing initiative.
Storage for the Clouds
Clod Barrera presented this break-out session on Cloud Storage. He covered why clouds matter, the various types and purposes of cloud, technology and architectures, and where IBM is headed to support this trend.
Storage for Cloud computing was $1 Billion USD business in 2010, and is expected to grow 32 percent CAGR through, compared to 3.8 percent for non-cloud storage. Clod estimates that 10 to 15 percent of all storage will be in cloud deployments by 2015. Of this storage, analysts expect 50 percent in private clouds, and the other 50 percent in public clouds. For private clouds, clients are looking to "Cloudify" their existing IT infrastructures. For public clouds, the projects are mostly green field.
IBM is also looking to the "arms dealer" of choice for Telcos and other companies looking to launch their own Cloud Services. IBM has a Cloud Services Provider Platform (CSP2) specifically to provide all the tools and technologies needed to make this possible.
Last month, IBM launched several new solutions for Cloud. The IBM Starter Kit for Cloud will help existing IT environments adopt cloud technologies. The IBM Service Agility Accelerator for Cloud is available for more advanced deployments. IBM Service Delivery Manager (ISDM) integrates a collection of software to provide complete integrated service management. IBM CloudBurst provides an integrated hardware-and-software stack for both x86 and POWER chipsets.
Multi-tenancy is also a big issue, and this varies depending on deployment model: IaaS, PaaS, or SaaS. Multi-tenancy is needed to help divide up management tasks, and to ensure that shared resources are paid for and meet SLA requirements accordingly.
Clod feels there are good reasons to use high performance, transactional SAN storage for VMware environments, versus NAS which many people consider simpler to deploy. IBM is also active in open standards, including SNIA's Cloud Data Management Interface [CDMI].
Journey to the Private Cloud
Gary Luke from Brocade provided this session on IBM's SAN384B-2 and SAN768B-2 SAN directors. Brocade is one of IBM's suppliers for SAN switches, and thanks to TRILL being adopted last August by IETF, supports multi-hop FCoE configurations! However, Gary did not talk about FCoE, but rather native FCP and FICON support in these new directors.
According to VMware, only 30 percent of x86 workloads are virtualized by any hypervisor. Gary feels that server virtualization and the use of Solid-State Drives (SSD) in disk arrays are driving existing 8 Gbps SAN to upgrade to 16 Gbps. Gary feels that Fibre-Channel based SANs are best positioned to handle unpredictable peaks in a 24-by-7 world.
The SAN384B-2 can house up to 256 ports (8 Gbps) or 192 ports (16 Gbps) in four slots, 9U chassis. The SAN768B-2 can handle twice these, in a 12U chassis. The nice thing about the 16Gbps ports is that they can auto-negotiate down to 10, 8, 4 and 2 Gbps. This is far better than typical N-2 support, often referred to as the speeds supported, such as 4/2/1 and 8/4/2. An upcoming FOS release will allow people with previous generation SAN384B-1/SAN768B-1 directors to move their 8Gbps blades over to the new SAN384B-2/SAN768B-2 generation models.
Since most CWDM and DWDM only support maximum 10 Gbps FC and 10GbE, Brocade's 16Gbps can automatically drop down to 10 Gbps for direct attachment to CWDM/DWDM, rather than having a step-down box normally required.
A major advancement is the change from copper to optical "Inter-Chassis Links" (ICL). Unlike Inter-switch links (ISL) that use up SAN ports on each box, the ICL is faster, more efficient and does not consume ports. Normally, clients would connect two directors together, but now you can connect up to six chassis together! For example, you can have four SAN368B-2 connected to your host servers, ICL attached to two SAN768B-2, that are then connected to your disk and tape storage devices. The fiber optic ICL allow for up to 50 meters distance. Combining six chassis together would allow the complex to support over 3,000 ports (8 Gbps) or 2,300 ports (16 Gbps).
The SAN384B-2 and SAN768B-2 supports "virtual SAN" logical switches, traffic isoliation (TI) zones, fabric-assigned WWNNs, and fabric-based QoS.
Lastly, Brocade offers a free utility called [SANhealth] that will gather data from your b-type, m-type and even Cisco MDS-based SAN. The data can then be sent to Brocade for analysis, and Brocade will then email back some nice Visio graphs, spreadsheets and other analysis results on the health of your SAN.
Every year, I teach hundreds of sellers how to sell IBM storage products. I have been doing this since the late 1990s, and it is one task that has carried forward from one job to another as I transitioned through various roles from development, to marketing, to consulting.
This week, I am in the city of Taipei [Taipei] to teach Top Gun sales class, part of IBM's [Sales Training] curriculum. This is only my second time here on the island of Taiwan.
As you can see from this photo, Taipei is a large city with just row after row of buildings. The metropolitan area has about seven million people, and I saw lots of construction for more on my ride in from the airport.
The student body consists of IBM Business Partners and field sales reps eager to learn how to become better sellers. Typically, some of the students might have just been hired on, just finished IBM Sales School, a few might have transferred from selling other product lines, while others are established storage sellers looking for a refresher on the latest solutions and technologies.
I am part of the teach team comprised of seven instructors from different countries. Here is what the week entails for me:
Monday - I will present "Selling Scale-Out NAS Solutions" that covers the IBM SONAS appliance and gateway configurations, and be part of a panel discussion on Disk with several other experts.
Tuesday - I have two topics, "Selling Disk Virtualization Solutions" and "Selling Unified Storage Solutions", which cover the IBM SAN Volume Controller (SVC), Storwize V7000 and Storwize V7000 Unified products.
Wednesday - I will explain how to position and sell IBM products against the competition.
Thursday - I will present "Selling Infrastructure Management Solutions" and "Selling Unified Recovery Management Solutions", which focus on the IBM Tivoli Storage portfolio, including Tivoli Storage Productivity Center, Tivoli Storage Manager (TSM), and Tivoli Storage FlashCopy Manager (FCM). The day ends with the dreaded "Final Exam".
Friday - The students will present their "Team Value Workshop" presentations, and the class concludes with a formal graduation ceremony for the subset of students who pass. A few outstanding students will be honored with "Top Gun" status.
These are the solution areas I present most often as a consultant at the IBM Executive Briefing Center in Tucson, so I can provide real-life stories of different client situations to help illustrate my examples.
The weather here in Taipei calls for rain every day! I was able to take this photo on Sunday morning while it was still nice and clear, but later in the afternoon, we had quite the downpour. I am glad I brought my raincoat!
Continuing my drawn out coverage of IBM's big storage launch of February 9, today I'll cover the IBM System Storage TS7680 ProtecTIER data deduplication gateway for System z.
On the host side, TS7680 connects to mainframe systems running z/OS or z/VM over FICON attachment, emulating an automated tape library with 3592-J1A devices. The TS7680 includes two controllers that emulate the 3592 C06 model, with 4 FICON ports each. Each controller emulates up to 128 virtual 3592 tape drives, for a total of 256 virtual drives per TS7680 system. The mainframe sees up to 1 million virtual tape cartridges, up to 100GB raw capacity each, before compression. For z/OS, the automated library has full SMS Tape and Integrated Library Management capability that you would expect.
Inside, the two control units are both connected to a redundant pair cluster of ProtecTIER engines running the HyperFactor deduplication algorithm that is able to process the deduplication inline, as data is ingested, rather than post-process that other deduplication solutions use. These engines are similar to the TS7650 gateway machines for distributed systems.
On the back end, these ProtecTIER deduplication engines are then connected to external disk, up to 1PB. If you get 25x data deduplication ratio on your data, that would be 25PB of mainframe data stored on only 1PB of physical disk. The disk can be any disk supported by ProtecTIER over FCP protocol, not just the IBM System Storage DS8000, but also the IBM DS4000, DS5000 or IBM XIV storage system, various models of EMC and HDS, and of course the IBM SAN Volume Controller (SVC) with all of its supported disk systems.
My how time flies. This week marks my 24th anniversary working here at IBM. This would have escaped me completely, had I not gotten an email reminding me that it was time to get a new laptop. IBM manages these on a four-year depreciation schedule, and I received my current laptop back in June 2006, on my 20th anniversary.
When I first started at IBM, I was a developer on DFHSM for the MVS operating system, now called DFSMShsm on the z/OS operating system. We all had 3270 [dumb terminals], large cathode ray tubes affectionately known as "green screens", and all of our files were stored centrally on the mainframe. When Personal Computers (PC) were first deployed, I was assigned the job of deciding who got them when. We were getting 120 machines, in five batches of 24 systems each, spaced out over the next two years. I was assigned the job of recommending who should get a PC during the first batch, the second batch, and so on. I was concerned that everyone would want to be part of the first batch, so I put out a survey, asking questions on how familiar they were with personal computers, whether they owned one at home, were familiar with DOS or OS/2, and so on.
It was actually my last question that helped make the decision process easy:
How soon do you want a Personal Computer to replace your existing 3270 terminal?
As late as possible
I had five options, and roughly 24 respondents checked each one, making my job extremely easy. Ironically, once the early adopters of the first batch discovered that these PC could be used for more than just 3270 terminal emulation, many of the others wanted theirs sooner.
Back then, IBM employees resented any form of change. Many took their new PC, configured it to be a full-screen 3270 emulation screen, and continued to work much as they had before. My mentor, Jerry Pence, would print out his mails, and file the printed emails into hanging file folders in his desk credenza. He did not trust saving them on the mainframe, so he was certainly not going to trust storing them on his new PC. One employee used his PC as a door stop, claiming he will continue to use his 3270 terminal until they take it away from him.
Moving forward to 2006, I was one of the first in my building to get a ThinkPad T60. It was so new that many of the accessories were not yet available. It had Windows XP on a single-core 32-bit processor, 1GB RAM, and a huge 80GB disk drive. The built-in 1GbE Ethernet went unused for a while, as we had 16 Mbps Token Ring network.
I was the marketing strategist for IBM System Storage back then, and needed all this excess power and capacity to handle all my graphic-intense applications, like GIMP and Second Life.
Over the past four years, I made a few slight improvements. I partitioned the hard drive to dual-boot between Windows and Linux, and created a separate partition for my data that could be accessed from either OS. I increased the memory to 2GB and replaced the disk with a drive holding 120GB capacity.
A few years ago, IBM surprised us by deciding to support Windows, Linux and Mac OS computers. But actually it made a lot of sense. IBM's world-renown global services manages the help-desk support of over 500 other companies in addition to the 400,000 employees within IBM, so they already had to know how to handle these other operating systems. Now we can choose whichever we feel makes us more productive. Happy employees are more productive, of course. IBM's vision is that almost everything you need to do would be supported on all three OS platforms:
Access your email, calendar, to-do list and corporate databases via Lotus Notes on either Windows, Linux or Mac OS. Corporate databases store our confidential data centrally, so we don't have to have them on our local systems. We can make local replicas of specific databases for offline access, and these are encrypted on our local hard drive for added protection. Emails can link directly to specific entries in a database, so we don't have huge attachments slowing down email traffic. IBM also offers LotusLive, a public cloud offering for companies to get out of managing their own email Lotus Domino repositories.
Create presentations, documents and spreadsheets on either Windows, Linux or Mac OS. Lotus Symphony is based on open source OpenOffice and is compatible with Microsoft Office. This allows us to open and update directly in Microsoft's PPT, DOC and XLS formats.
Many of the corporate applications have now been converted to be browser-accessible. The Firefox browser is available on Windows, Linux and Mac OS. This is a huge step forward, in my opinion, as we often had to download applications just to do the simplest things like submit our time-sheet or travel expense reimbursement. I manage my blog, Facebook and Twitter all from online web-based applications.
The irony here is that the world is switching back to thin clients, with data stored centrally. The popularity of Web 2.0 helped this along. People are using Google Docs or Microsoft OfficeOnline to eliminate having to store anything locally on their machines. This vision positions IBM employees well for emerging cloud-based offerings.
Sadly, we are not quite completely off Windows. Some of our Lotus Notes databases use Windows-only APIs to access our Siebel databases. I have encountered PowerPoint presentations and Excel spreadsheets that just don't render correctly in Lotus Symphony. And finally, some of our web-based applications work only in Internet Explorer! We use the outdated IE6 corporate-wide, which is enough reason to switch over to Firefox, Chrome or Opera browsers. I have to put special tags on my blog posts to suppress YouTube and other embedded objects that aren't supported on IE6.
So, this leaves me with two options: Get a Mac and run Windows on the side as a guest operating system, or get a ThinkPad to run Windows or Windows/Linux. I've opted for the latter, and put in my order for a ThinkPad 410 with a dual-core 64-bit i5 Intel processor, VT-capable to provide hardware-assistance for virtualization, 4GB of RAM, and a huge 320GB drive. It will come installed with Windows XP as one big C: drive, so it will be up to me to re-partition it into a Windows/Linux dual-boot and/or Windows and Linux running as guest OS machine.
(Full disclosure to make the FTC happy: This is not an endorsement for Microsoft or against Apple products. I have an Apple Mac Mini at home, as well as Windows and Linux machines. IBM and Apple have a business relationship, and IBM manufactures technology inside some of Apple's products. I own shares of Apple stock, I have friends and family that work for Microsoft that occasionally send me Microsoft-logo items, and I work for IBM.)
I have until the end of June to receive my new laptop, re-partition, re-install all my programs, reconfigure all my settings, and transfer over my data so that I can send my old ThinkPad T60 back. IBM will probably refurbish it and send it off to a deserving child in Africa.
If you have an old PC or laptop, please consider donating it to a child, school or charity in your area. To help out a deserving child in Africa or elsewhere, consider contributing to the [One Laptop Per Child] organization.
IBM introduced the Linear Tape File System last year, which I explained in my post [IBM Celebrates 10 Year Anniversary for LTO tape], and released it as open source to the rest of the Linear Tape Open [LTO] Consortium so that the entire planet can benefit from IBM's innovation. IBM presented a technology demonstration of its Linear Tape File System - Library Edition at the NAB conference, showing how this new IBM library offering of the file system can put mass archives of rich media video content at the users fingertips with the ease of library automation.
From left to right, here is Atsushi Nagaishi (Toshiba) and Shinobu Fujihara (IBM). Fujihara-san is from IBM's Yamato lab in Japan where some of the LTFS development was done. The Yamato Lab was not damaged by the [Earthquakes in Japan].
With the capabilities of LTFS, IBM has introduced an entirely new role for tape, as an attractive high capacity, easy to use, low cost and shareable storage media. LTFS can make tape usable in a fashion like removable external disk, a giant alternative to floppy diskettes, DVD-RW and USB memory sticks with directory tree access and file-level drag-and-drop capability. LTFS can allow the for passing of information around from one system or employee to another. And as for high video storage capacity, a 1.5TB LTO-5 cartridge can hold about 50 hours of XDCAM HD video!
A group photo of the global IBM LTFS team, from left to right, David Pease from IBM Almaden Research Center, Ed Childers from IBM Tucson, Shinobu Fujihara and Hironobu Nagura from IBM Japan.
IBM was once again #1 leader in Tape worldwide for the year 2010. With this exciting new win, tape is not just for backup and archive anymore!
Guest Post: The following post was written by Tom Rauchut, IBM Infrastructure Architect and Advanced Technical Sales Specialist for Tivoli Automation. Tom is at IBM Pulse 2011 for Las Vegas this week, and has offered to send his observations.
The expo opened last night. There are so many fantastic demos and product experts. Las Vegas has a Tivoli buzz on right now.
Continuing my coverage of the [IBM System x and System Storage Technical Symposium], I thought I would start with some photos. I took these with cell phone, and without realizing how much it would cost, uploaded them to Flickr at international data roaming rates. Oops!
Here are some of the banners used at the conference. Each break-out session room was outfitted with a "Presentation Briefcase" that had everything a speaker might need, including power plug adapters and dry-erase markers for the whiteboard. What a clever idea!
Here is a recap of the last and final day 3:
Understanding IBM's Storage Encryption Options
Special thanks to Jack Arnold for providing me his deck for this presentation. I presented IBM's leadership in encryption standards, including the [OASIS Key Management Interoperability Protocol] that allows many software and hardware vendors to interoperate. IBM offers the IBM Tivoli Key Lifecycle Manager (TKLM v2) for Windows, Linux, AIX and Solaris operating systems, and the IBM Security Key Lifecycle Manager (v1.1) for z/OS.
Encrypting data at rest can be done several ways, by the application at the host server, in a SAN-based switch, or at the storage system itself. I presented how IBM Tivoli Storage Manager, the IBM SAN32B-E4 SAN switch, and various disk and tape devices accomplish this level of protection.
NAS @ IBM
Rich Swain, IBM Field Technical Sales Specialist for NAS solutions, provided an overview of IBM's NAS strategy and the three products: Scale-Out Network Attached Storage (SONAS), Storwize V7000 Unified, and N series.
IBM System Networking Convergence CEE/DCB/FCoE
Mike Easterly, IBM Global Field Marketing Manager for IBM System Networking, presented on Network convergence. He wants to emphasize that "Convergence is not just FCoE!" rather it is bringing together FCoE with iSCSI, CIFS, NFS and other Ethernet-based protocols. In his view, "All roads lead to Ethernet!"
There are a lot new standards that didn't exist a few years ago, such as PCI-SIG's Single Root I/O Virtualization [SR-IOV], Virtual Ethernet Port Aggregator [VEPA], and [VN-Tag], Data Center Bridging [DCB], Layer-2 Multipath [L2MP], and my favorite: Transparent Interconnect of Lots of Links [TRILL].
Last year, IBM acquired Blade Network Technologies (BNT), which was the company that made IBM BladeCenter's Advanced Management Module (AMM) and BladeCenter Open Fabric Manager (BOFM). BNT also makes Ethernet switches, so it has been merged with IBM's System Storage team, forming the IBM System Storage and Networking team. Most of today's 10GbE is either fiber optic, Direct Attach Copper (DAC) that supports up to 8.5 meter length cables, or 10GBASE-T which provides longer distances of twisted pair. IBM's DS3500 uses 10GBASE-T for its 10GbE iSCSI support.
Last month, IBM announced 40GbE! I missed that one. The IT industry also expects to deliver 100GbE by 2013. For now, these will be used as up-links between other switches, as most servers don't have the capacity to pump this much data through their buses. With 40GbE and 100GbE, it would be hard to ignore Ethernet as the common network standard to drive convergence.
Fibre Channel, such as FCP and FICON, are still the dominant storage networking technology, but this is expected to peak around 2013 and start declining thereafter in favor of iSCSI, NAS and FCoE technologies. Already the enhancements like "Priority-based Flow Control" made to Ethernet to support FCoE have also helped out iSCSI and NAS deployments as well.
The iSCSI protocol is being used with Microsoft Exchange, PXE Boot, Server virtualization hypervisors like VMware and Hyper-V, as well as large Database and OLTP. IBM's SVC, Storwize V7000, XIV, DS5000, DS3500 and N series all support iSCSI.
IBM's [RackSwitch] family of products can help offload traffic at $500 per port, compared to traditional $2000 per port for IBM SAN32B or Cisco Nexus5000 converged top-of-rack switches.
IBM's System Networking strategy has two parts. For Ethernet, offer its own IBM System Networking product line as well as continue its partnership with Juniper Networks. For Fibre Channel and FCoE, continue strategic partnerships with Brocade and Cisco. IBM will lead the industry, help drive open standards to adopt Converged Enhanced Ethernet (CEE), provide flexibility and validate data center networking solutions that work end-to-end.
Earlier this week, EMC announced its Symmetrix V-Max, following two trends in the industry:
Using Roman numerals. The "V" here is for FIVE, as this is the successor to the DMX-3 and DMX-4. EMC might have gotten the idea from IBM's success with the XIV (which does refer to the number 14, specifically the 14th class of a Talpiot program in Israel that the founders of XIV graduated from).
Adding "-Max", "-Monkey" or "2.0" at the end of things to make them sound more cool and to appeal to a younger, hipper audience. EMC might have gotten this idea from Pepsi-Max (... a taste of storage for the next generation?)
I took a cue from President Obama and waited a few days to collect my thoughts and do my homework before responding.Special thanks to fellow blogger ChuckH in giving me a [handy list of reactions] for me to pick and choose from. It appears that EMC marketing machine feels it is acceptable for their own folks to claim that EMC is doing something first, or that others are catching up to EMC, but when other vendors do likewise, then that is just pathetic or incoherent. Here are a few reactions already from fellow bloggers:
This was a major announcement for EMC, addressing many of the problems, flaws and weaknesses of the earlier DMX-3 and DMX-4 deliverables. Here's my read on this:
Now you can have as many FCP ports (128) as an IBM System Storage DS8300, although the maximum number of FICON ports is still short, and no mention of ESCON support. The Ethernet ports appear to be 1Gb, not the new 10GbE you might expect.
Support for System z mainframe
V-Max adds some new support to catch up with the DS8000, like Extended Address Volumes (EAV). EMC is still not quite there yet. IBM DS8000 continues to be the best, most feature-rich storage option if you have System z mainframe servers.
Both the IBM DS8000 and HDS USP-V beat the DMX-4 in performance, and in some cases the DMX-4 even lost to the IBM XIV, so EMC had to do something about it. EMC chooses not to participate in industry-standard performance benchmarks like those from the [Storage Performance Council], which limits them to vague comparisons against older EMC gear. I'll give EMC engineers the benefit of the doubt and say that now V-Max is now "comparably as fast as HDS and IBM offerings".
Getting "V" in the name
The "V" appears to be for the roman number five, not to be confused with external heterogeneous storage virtualization that HDS USP-V and IBM SVC provide. There is no mention of synergy with EMC's failed "Invista" product, and I see no support for attaching other vendors disk to the back of this thing.
Switch to Intel processor
Apple switched its computers from PowerPC to Intel-based, and now EMC follows in the same path. There are some custom ASICs still in V-Max, so it is not as pure as IBM's offerings.
Modular, XIV-like Scale-out Architecture
Actually, the packaging appears to follow the familiar system bays and storage bays of the DMX-4 and DMX-4 950 models, but architecturally offers XIV-like attachment across a common switch network between "engines", EMC's term for interface modules.
Non-disruptive data migration
IBM's SoFS, DR550 and GMAS have this already, as does as anything connected behind an IBM SAN Volume Controller.
A long time ago, IBM used to have midrange disk storage systems called "FAStT" which stood for Fibre Array Storage Technology, so this might have given EMC the idea for their "Fully Automated Storage Tiering" acronym. The concept appears similar to what IBM introduced back in 2007 for the Scale-Out-File Services [SofS] which not only provides policy-based placement, movement and expiration on different disk tiers, includes tape tiers as well for a complete solution. I don't see anything in the V-Max announcement that it will support tape anytime soon.
And what ever happend to EMC's Atmos? Wasn't that supposed to be EMC's new direction in storage?
Zero-data loss Three-site replication
IBM already calls this Metro/Global Mirror for its IBM DS8000 series, but EMC chose to call it SRDF/EDP for Extended Distance Protection.
Ease of Use
The most significant part of the announcement is that EMC is finally focusing on ease-of-use.In addition to reducing the requirement for "Bin File" modifications, this box has a redesigned user interface to focus on usability issues. For past DMX models, EMC customers had to either hire EMC to do tasks for them that were just to difficult otherwise, or buy expensive software like their EMC Control Center to manage. EMC willcontinue to sell DMX-4 boxes for a while, as they are probably supply-constrained on the V-Max side, but I doubt they will retro-fit these new features back to DMX-3 and DMX-4.
When IBM announced its acquisition of XIV over a year ago now, customers were knocking down our doors to get one. This caught two particular groups looking like a [deer in headlights]:
EMC Symmetrix sales force: Some of the smarter ones left EMC to go sell IBM XIV, leaving EMC short-handed and having to announce they [were hiring during their layoffs]. Obviously, a few of the smart ones stayed behind, to convince their management to build something like the V-Max.
IBM DS8000 sales force: If clients are not happy with their existing EMC Symmetrix, why don't they just buy an IBM DS8000 instead? What does XIV have that DS8000 doesn't?
Let me contrast this with the situation Microsoft Windows is currently facing.
I am often asked by friends to help them pick out laptops and personal computers. I use Linux, Windows and MacOS, so have personal experience with all three operating systems.
Linux is cheaper, offers the power-user the most options for supporting older, less-powerfulequipment, but I wouldn't have my Mom use it. While distributions like Ubuntu are makinggreat strides, it is just too difficult for some people.
MacOS is nice, I like it, it works out of the box with little or no customization and an intuitive interface. However, some of my friends don't make IBM-level salaries, and have to watch their budget.
In their "I'm a PC" campaign, Microsoft is fighting both fronts. Let's examine two commercials:
In the first commercial, a young eight-year-old puts together a video from pictures oftoy animals and some background music.The message: "Windows is easier to use than Linux!" If they really wanted to send this message, they should have shown senior citizens instead.
In the second commercial, a young college student is asked to find a laptop with 17 inchscreen, and a variety of other qualifications, for under $1000 US dollars. The only modelat the Apple store below this price had a 13 inch screen, but she finds a Windows-based system that had this size screen and met all the other qualifications. The message: "Windows-based hardware from a variety of competitors are less expensive than hardware from Apple!"
Both Microsoft and Apple charge a premium for ease-of-use.In the storage world, things are completely opposite. Vendors don't charge a premium forease-of-use. In fact, some of the easiest to use are also the least expensive.
If you just have Windows and Linux, you can get some entry level system likethe IBM DS3000 series, only a few features, and can be set up in six simple steps.
Next, if you have a more interesting mix of operating systems, Linux, Windows and some flavorsof UNIX like IBM AIX, HP-UX or Sun Solaris, then you might want the features and functionsof more pricier midrange offerings. More options means that configuration and deploymentis more difficult, however.
Finally, if you are serious Fortune 500 company, running your mission critical applications on System z or System i centralized systems in a big data center, that you might be willing to pay top dollar for the most feature-rich offerings of an Enterprise-class machine.Thankfully you have an army of highly-trained staff to handle the highest levels of complexity.
IBM's DS8000, HDS USP-V and EMC's Symmetrix are the key players in the Enterprise-classspace. They tried to be ["all things to all people"], er.. perhaps all things to allplatforms. All of the features and functions came at a price, not just in dollars, butin complexity and difficulty. You needed highly skilled storage admins using expensive storage management software, or be willing to hirethe storage vendor's premium services to get the job done.
IBM recognized this trend early. IBM's SVC, N series and now XIV all offer ease-of-use withenterprise-class features and functions, at lower total cost of ownership than traditional enterprise-class systems. IBM is not the only one, of course, as smaller storage start-ups like 3PAR,Pillar Data Systems, Compellent, and to some extent Dell's EqualLogic all recognized thisand developed clever offerings as well.
While IBM's XIV may not have been the first to introduce a modular, scale-out architectureusing commodity parts managed by sophisticated ease-of-use interfaces, its success might have been the kick-in-the-butt EMC needed to follow the rest of the industry in this direction.
I have arrived safely to San Francisco, and was able to check-in at the hotel, pick up my registration badge for Oracle OpenWorld 2011, and attend the first keynote session. This is the largest Oracle OpenWorld event to-date, with over 45,000 attendees from 117 different countries. There are 520,000 square feet of exhibition floor, and over 2,400 educational sessions. The conference is spread across the different buildings of the Moscone center, as well as nearby hotels. On average, attendees will walk seven miles during the week.
Larry Ellison was the keynote speaker for this first kick-off session. He focused almost exclusively on server and storage hardware. He feels that business is all about moving data, not doing integer math.
At the beginning of 2011, Oracle had only sold about 1,000 Exadata, but they have a sales target to sell an additional 3,000 Exadata boxes by year end.
The Exadata offers up to 10x columnar compression, and has 10x faster bandwidth (40Gbps Infiniband versus 4Gbps FCP). If you have a 100TB database, it would take up only 10TB of disk with this approach. He claims that the 90TB of disk you don't have to buy can then be used to buy more DRAM and/or Flash SSD.
(Realistically, since SSD is 15x more expensive than spinning disk, you can only purchase about 6TB of Flash for the 90TB you save on disk!)
Larry claims the design point for Exadata and Exalogic was to offer a system that was more powerful than IBM's fastest P795 computer, but cheaper than commodity x86 hardware. His secret is to "Parallel everything" for faster performance, and no single points of failure (SPOF). Exadata offers up to 10-50x faster query, and 4-10x faster OLTP. To keep costs low, Exadata uses all commodity hardware except the Infiniband. He cited various customer examples:
A company replaced 36 Teradata with 3 Exadata and result was application was 8x faster.
Banco Chile 9x faster than previous system
Deutsche Post 60x faster
Sogetti gets 60x faster backups.
French bank BNP Paribas 17x faster and no change to applications.
Proctor & Gamble 18x faster
Merck 5x faster
Turkcell 250TB compressed to 25TB, 10x faster
The problem was that in each example, he said what it was compared against was the old previous system, which varies and could have been an older Sun system, or an old system from HP, IBM or Dell. Perhaps it was a freudian slip, but Larry mistakenly said "Paralyze" your applications, when he probably meant to "Parallelize".
Of all their 380,000 Oracle customers, 70 percent have SPARC/Solaris and/or Linux. Last week, Oracle announced the new SPARC-T4, which Larry claimed was 5x faster than the previous SPARC-T3. Larry feels that for the first time ever, a non-IBM CPU can challenge the long-standing rein of the IBM POWER series processor. Larry admitted that the IBM POWER7 chip actually did some tasks faster than the SPARC-T4, so his work is not yet done, but they plan to offer a new SPARC-T5 next year that will be 2x better than the SPARC-T4.
Larry compared the I/O bandwidth of serv ers based on SPARC-T4, compared to POWER7, and found that the SPARC-T4 has double the I/O bandwidth, for a cost that was only about 1/4 the cost of a mainframe. IBM offers both. POWER7-based servers for CPU-intensive workloads, and System z (S/390)-based systems for I/O-intensive workloads. Larry feels that even though POWER7 is superior than SPARC-T4 for mathematical calculations, all business applications are focused on I/O-bandwidth to move data, not computations.
Larry claims the new SPARC-T4 can do 1.2 million IOPS. He uses 40 Gbps Infiniband instead of traditional SAN-attached FCP solutions.
A new "box" called Exalytics, combines their commodity hardware platform with a hueristic adaptive in-memory cache, their latest "me-too" solution that compares with what IBM already offers in [IBM SolidDB]. In fact, their me-too is not even internally developed, but rather the result of an acquisition of a company called "Times Ten". I thought it was interesting that the only piece of Oracle software mentioned during Larry's 90-minute speach, was this piece of acquired technology. The new Exalytics product run on a small rack and grow, analyzing relational data, non-relational OLAP, as well as unstructured documents. The result is what Larry called "the Speed of Light".
He also mentioned that Bob Shimp would kick-off the Cloud later in the week. Given that Larry himself thought that Cloud was a stupid, over-marketed term that nobody has deployed over the past few years, to a complete believer, claiming that over 20 live demos will be given this year on Cloud.
Perhaps the funniest quote was his motivation to use Infiniband as the interconnect
"Ethernet was invented by Xerox when I was a child."
-- Larry Ellison
Here are some sessions that IBM is featuring on Monday. Note the first two are Solution Spotlight sessions at the IBM Booth #1111 where I will be most of the time.
IBM Cloud Computing Solutions for Oracle
10/03/11, 10:30 a.m. – 11:00 a.m., Solution Spotlight, Booth #1111 Moscone South
Presenter: Chuck Calio,Technical Strategist, IBM Systems & Technology Group
IBM is recognized in the IT industry as one of the "Big 6" cloud providers, along with Amazon, Google, Microsoft, Salesforce and Yahoo. This session will highlight how IBM Cloud offerings apply to Oracle applications.
Lowering Cost and increasing efficiency in your long term support of Oracle EPM and BI
10/03/11, 3:00 p.m. -- 3:30 p.m., Solution Spotlight, Booth #1111 Moscone South
Presenter: Matthew Angelstad, IBM Global Business Solutions - Oracle EPM (Hyperion) Practice Lead
In 2007, Oracle acquired Hyperion, a leading provider of performance management software. This session will show how IBM helps Oracle clients unify Enterprise Performance Management (EPM) and Business Intelligence (BI) in a cost-effective manner, supporting a broad range of strategic, financial and operational management processes.
Application Strategy: Charting the Course for Maximum Business Value
10/03/11, 3:30 p.m. – 4:30 p.m., OpenWorld session #39061
Presenter: Mike Marchildon, IBM
The industry is undergoing a shift from single Enteprise Resource Planning (ERP) application to second-generation platforms containing diverse yet interdependent systems. This shift presents opportunities and challenges for both IT and the business.
Continuing my week in Washington DC for the annual [2010 System Storage Technical University], I presented a session on Storage for the Green Data Center, and attended a System x session on Greening the Data Center. Since they were related, I thought I would cover both in this post.
Storage for the Green Data Center
I presented this topic in four general categories:
Drivers and Metrics - I explained the three key drivers for consuming less energy, and the two key metrics: Power Usage Effectiveness (PUE) and Data Center Infrastructure Efficiency (DCiE).
Storage Technologies - I compared the four key storage media types: Solid State Drives (SSD), high-speed (15K RPM) FC and SAS hard disk, slower (7200 RPM) SATA disk, and tape. I had comparison slides that showed how IBM disk was more energy efficient than competition, for example DS8700 consumes less energy than EMC Symmetrix when compared with the exact same number and type of physical drives. Likewise, IBM LTO-5 and TS1130 tape drives consume less energy than comparable HP or Oracle/Sun tape drives.
Integrated Systems - IBM combines multiple storage tiers in a set of integrated systems managed by smart software. For example, the IBM DS8700 offers [Easy Tier] to offer smart data placement and movement across Solid-State drives and spinning disk. I also covered several blended disk-and-tape solutions, such as the Information Archive and SONAS.
Actions and Next Steps - I wrapped up the talk with actions that data center managers can take to help them be more energy efficient, from deploying the IBM Rear Door Heat Exchanger, or improving the management of their data.
Greening of the Data Center
Janet Beaver, IBM Senior Manager of Americas Group facilities for Infrastructure and Facilities, presented on IBM's success in becoming more energy efficient. The price of electricity has gone up 10 percent per year, and in some locations, 30 percent. For every 1 Watt used by IT equipment, there are an additional 27 Watts for power, cooling and other uses to keep the IT equipment comfortable. At IBM, data centers represent only 6 percent of total floor space, but 45 percent of all energy consumption. Janet covered two specific data centers, Boulder and Raleigh.
At Boulder, IBM keeps 48 hours reserve of gasoline (to generate electricity in case of outage from the power company) and 48 hours of chilled water. Many power outages are less than 10 minutes, which can easily be handled by the UPS systems. At least 25 percent of the Computer Room Air Conditioners (CRAC) are also on UPS as well, so that there is some cooling during those minutes, within the ASHRAE guidelines of 72-80 degrees Fahrenheit. Since gasoline gets stale, IBM runs the generators once a month, which serves as a monthly test of the system, and clears out the lines to make room for fresh fuel.
The IBM Boulder data center is the largest in the company: 300,000 square feet (the equivalent of five football fields)! Because of its location in Colorado, IBM enjoys "free cooling" using outside air temperature 63 percent of the year, resulting in a PUE of 1.3 rating. Electricity is only 4.5 US cents per kWh. The center also uses 1 Million KwH per year of wind energy.
The Raleigh data center is only 100,000 Square feet, with a PUE 1.4 rating. The Raleigh area enjoys 44 percent "free cooling" and electricity costs at 5.7 US cents per kWh. The Leadership in Energy and Environmental Design [LEED] has been updated to certify data centers. The IBM Boulder data center has achieved LEED Silver certification, and IBM Raleigh data center has LEED Gold certification.
Free cooling, electricity costs, and disaster susceptibility are just three of the 25 criteria IBM uses to locate its data centers. In addition to the 7 data centers it manages for its own operations, and 5 data centers for web hosting, IBM manages over 400 data centers of other clients.
It seems that Green IT initiatives are more important to the storage-oriented attendees than the x86-oriented folks. I suspect that is because many System x servers are deployed in small and medium businesses that do not have data centers, per se.
Each quarter since 2006, the [IBM Migration Factory] team has tallied the number of clients who have moved to IBM severs and storage systems from competitive hardware. We'll I've just seen the latest numbers, for the third quarter of 2010, and it looks like we set a new quarterly record with nearly 400 total migrations to IBM from Oracle/Sun and HP.
It's clear that companies and governments worldwide are seeing greater value in IBM systems, while Oracle and HP watch their customer bases erode. In just this past 3Q 2010, nearly 400 clients have moved over to IBM -- almost all of them from Oracle/Sun and HP. Of these, 286 clients migrated to IBM Power Systems, running AIX, Linux and IBM i operating systems, from competitors alone -- nearly 175 from Oracle/Sun and nearly 100 from HP. The number of migrations to IBM Power Systems through the first three quarters of 2010 is nearly 800, already exceeding the total for all of last year by more than 200.
Let's do the math.... Since IBM established its Migration Factory program in 2006, more than 4,500 clients have switched to IBM. More than 1,000 from Oracle/Sun and HP joined the exodus this year alone. In less than five years, almost 3,000 of these clients -- including more than 1,500 from Oracle/Sun and more than 1,000 from HP -- have chosen to run their businesses on IBM's Power Systems. That's more than a client per day making the move to IBM!
And as the servers go, so goes the storage. Clients are re-discovering IBM as a server and storage powerhouse, offering a strong portfolio in servers, disk and tape systems, and how synergies between servers and storage can provide them real business benefits.
Adding it all up, it's clear that IBM's multi-billion dollar investment in helping to build a smarter planet with workload-optimized systems is paying off -- and that, more and more, clients are selecting IBM over the competition to help them meet their business needs.
(FTC Disclosure: I do not work or have any financial investments in ENC Security Systems. ENC Security Systems did not paid me to mention them on this blog. Their mention in this blog is not an endorsement of either their company or any of their products. Information about EncryptStick was based solely on publicly available information and my own personal experiences. My friends at ENC Security Systems provided me a full-version pre-loaded stick for this review.)
The EncryptStick software comes in two flavors, a free/trial version, and the full/paid version. The free trial version has [limits on capacity and time] but provides enough glimpse of the product to decide before you buy the full version. You can download the software yourself and put in on your own USB device, or purchase the pre-loaded stick that comes with the full-version license.
Whichever you choose, the EncryptStick offers three nice protection features:
Encryption for data organized in "storage vaults", which can be either on the stick itself, or on any other machine the stick is connected to. That is a nice feature, because you are not limited to the capacity of the USB stick.
Encrypted password list for all your websites and programs.
A secure browser, that prevents any key-logging or malware that might be on the host Windows machine.
I have tried out all three functions and everything works as advertised. However, there is always room for improvement, so here are my suggestions.
The first problem is that the pre-loaded stick looks like it is worth a million dollars. It is in a shiny bronze color with "EncryptStick" emblazoned on it. This is NOT subtle advertising! This 8GB capacity stick looks like it would be worth stealing solely on being a nice piece of jewelry, and then the added bonus that there might be "valuable secrets" just makes that possibility even more likely.
If you want to keep your information secure, it would help to have "plausible deniability" that there is nothing of value on a stick. Either have some corporate logo on it, of have the stick look like a cute animal, like these pig or chicken USB sticks.
It reminds me how the first Apple iPod's were in bright [Mug-me White]. I use black headphones with my black iPod to avoid this problem.
Of course, you can always install the downloadable version of EncryptStick software onto a less conspicuous stick if you are concerned about theft. The full/paid version of EncryptStick offers an option for "lost key recovery" which would allow you to backup the contents of the stick and be able to retrieve them on a newly purchased stick in the event your first one is lost or stolen.
Imagine how "unlucky" I felt when I notice that I had lost my "rabbits feet" on this cute animal-themed USB stick.
I sense trouble for losing the cap on my EncryptStick as well. This might seem trivial, but is a pet-peeve of mine that USB sticks should plan for this. Not only is there nothing to keep the cap on (it slides on and off quite smoothly), but there is no loop to attach the cap to anything if you wanted to.
Since then, I got smart and try to look for ways to keep the cap connected. Some designs, like this IBM-logoed stick shown above, just rotate around an axle, giving you access when you need it, and protection when it is folded closed.
Alternatively, get a little chain that allows you to attach the cap to the main stick. In the case of the pig and chicken, the memory section had a hole pre-drilled and a chain to put through it. I drilled an extra hole in the cap section of each USB stick, and connected the chain through both pieces.
(Warning: Kids, be sure to ask for assistance from your parents before using any power tools on small plastic objects.)
The EncryptStick can run on either Microsoft Windows or Mac OS. The instructions indicate that you can install both versions of download software onto a single stick, so why not do that for the pre-loaded full version? The stick I have had only the Windows version pre-loaded. I don't know if the Windows and Mac OS versions can unlock the same "storage vaults" on the stick.
Certainly, I have been to many companies where either everyone runs Windows or everyone runs Mac OS. If the primary target audience is to use this stick at work in one of those places, then no changes are required. However, at IBM, we have employees using Windows, Mac OS and Linux. In my case, I have all three! Ideally, I would like a version of EncryptStick that I could take on trips with me that would allow me to use it regardless of the Operating System I encountered.
Since there isn't a Linux-version of EncryptStick software, I decided to modify my stick to support booting Linux. I am finding more and more Linux kiosks when I travel, especially at airports and high-traffic locations, so having a stick that works both in Windows or Linux would be useful. Here are some suggestions if you want to try this at home:
Use fdisk to change the FAT32 partition type from "b" to "c". Apparently, Grub2 requires type "c", but the pre-loaded EncryptStick was set to "b". The Windows version of EncryptStick> seems to work fine in either mode, so this is a harmless change.
Install Grub2 with "grub-install" from a working Linux system.
Once Grub2 is installed, you can boot ISO images of various Linux Rescue CDs, like [PartedMagic] which includes the open-source [TrueCrypt] encryption software that you could use for Linux purposes.
This USB stick could also be used to help repair a damaged or compromised Windows system. Consider installing [Ophcrack] or [Avira].
Certainly, 8GB is big enough to run a full Linux distribution. The latest 32-bit version of [Ubuntu] could run on any 32-bit or 64-bit Intel or AMD x86 machine, and have enough room to store an [encrypted home directory].
Since the stick is formatted FAT32, you should be able to run your original Windows or Mac OS version of EncryptStick with these changes.
Depending on where you are, you may not have the luxury to reboot a system from the USB memory stick. Certainly, this may require changes to the boot sequence in the BIOS and/or hitting the right keys at the right time during the boot sequence. I have been to some "Internet Cafes" that frown on this, or have blocked this altogether, forcing you to boot only from the hard drive.
Well, those are my suggestions. Whether you go on a trip with or without your laptop, it can't hurt to take this EncryptStick along. If you get a virus on your laptop, or have your laptop stolen, then it could be handy to have around. If you don't bring your laptop, you can use this at Internet cafes, hotel business centers, libraries, or other places where public computers are available.