Last week, I presented IBM's strategic initiative, the IBM Information Infrastructure, which is part of IBM's New Enterprise Data Center vision. This week, I will try to get around to talking about some of theproducts that support those solutions.
There has been a lot of attention on XIV in the past few weeks, so I will start with that. Steve Duplessie, anIT industry analyst from Enterprise Strategy Group (ESG) had a post [Adaptec buys Aristos, Tom Cruise, XIV, and Logical Assumptions] with some interesting observations and some sage advice.Val Bercovici on his NetApp Exposed blog, has a post [Has Storage Swift-Blogging Finally Jumped the Shark?] which blasts EMC for their negativity.
(For those not in the USA, swift-blogging is a reference tofalse accusations and negative remarks made during the U.S. 2004 presidential election by the[Swift Boat Veterans], and ["jumping the shark"] is a reference to [a TV show that ran out of interesting and relevant topics].For movie sequels, the comparable phrase is ["nuke the fridge"] in reference to the most recent Indiana Jones' movie.)
I was going to set the record straight on a variety of misunderstandings, rumors or speculations, but I think most have been taken care of already. IBM blogger BarryW covered the fact that SVC now supports XIV storage systems, in his post[SVC and XIV],and addressed some of the FUD already. Here was my list:
- Now that IBM has an IBM-branded model of XIV, IBM will discontinue (insert another product here)
I had seen speculation that XIV meant the demise of the N series, the DS8000 or IBM's partnership with LSI.However, the launch reminded people that IBM announced a new release of DS8000 features, new models of N series N6000,and the new DS5000 disk, so that squashes those rumors.
- IBM XIV is a (insert tier level here) product
While there seems to be no industry-standard or agreement for what a tier-1, tier-2 or tier-3 disk system is, there seemed to be a lot of argument over what pigeon-hole category to put IBM XIV in. No question many people want tier-1 performance and functionality at tier-2 prices, and perhaps IBM XIV is a good step at giving them this. In some circles, tier-1 means support for System z mainframes. The XIV does not have traditional z/OS CKD volume support, but Linux on System z partitions or guests can attach to XIV via SAN Volume Controller (SVC), or through NFS protocol as part of the Scale-Out File Services (SoFS) implementation.
Whenever any radicalgame-changing technology comes along, competitors with last century's products and architectures want to frame the discussion that it is just yet another storage system. IBM plans to update its Disk Magic and otherplanning/modeling tools to help people determine which workloads would be a good fit with XIV.
- IBM XIV lacks (insert missing feature here) in the current release
I am glad to see that the accusations that XIV had unprotected, unmirrored cache were retracted. XIV mirrors all writes in the cache of two separate modules, with ECC protection. XIV allows concurrent code loadfor bug fixes to the software. XIV offers many of the features that people enjoy in other disksystems, such as thin provisioning, writeable snapshots, remote disk mirroring, and so on.IBM XIV can be part of a bigger solution, either through SVC, SoFS or GMAS that provide thebusiness value customers are looking for.
- IBM XIV uses (insert block mirroring here) and is not as efficient for capacity utilization
It is interesting that this came from a competitor that still recommends RAID-1 or RAID-10 for itsCLARiiON and DMX products.On the IBM XIV, each 1MB chunk is written on two different disks in different modules. When disks wereexpensive, how much usable space for a given set of HDD was worthy of argument. Today, we sell you abig black box, with 79TB usable, for (insert dollar figure here). For those who feel 79TB istoo big to swallow all at once, IBM offers "capacity on demand" pricing, where you can pay initially for as littleas 22TB, but get all the performance, usability, functionality and advanced availability of the full box.
- IBM XIV consumes (insert number of Watts here) of energy
For every disk system, a portion of the energy is consumed by the number of hard disk drives (HDD) andthe remainder to UPS, power conversion, processors and cache memory consumption. Again, the XIV is a bigblack box, and you can compare the 8.4 KW of this high-performance, low-cost storage one-frame system with thewattage consumed by competitive two-frame (sometimes called two-bay) systems, if you are willing to take some trade-offs. To getcomparable performance and hot-spot avoidance, competitors may need to over-provision or use faster, energy-consuming FC drives, and offer additional software to monitor and re-balance workloads across RAID ranks.To get comparable availability, competitors may need to drop from RAID-5 down to either RAID-1 or RAID-6.To get comparable usability, competitors may need more storage infrastructure management software to hide theinherent complexity of their multi-RAID design.
Of course, if energy consumption is a major concern for you, XIV can be part of IBM's many blended disk-and-tapesolutions. When it comes to being green, you can't get any greener storage than tape! Blended disk-and-tapesolutions help get the best of both worlds.
Well, I am glad I could help set the record straight. Let me know what other products people you would like me to focus on next.
technorati tags: IBM, XIV, disk, storage, system, Steve Duplessie, ESG, Val Bercovici, NetApp, BarryW, SVC, DS8000, N6000, DS5000, mainframe, z/OS, CKD, SoFS, NFS, ECC, HDD, RAID, UPS, availability, reliability, performance, usability, blended disk-and-tape, green
IBM once again delivers storage innovation!
(Note: The following paragraphs have been updated to clarify the performance tests involved.)
This time, IBM breaks the 1 million IOPS barrier, achieved by running a test workload consisting of a 70/30 mix of random 4K requests. That is 70 percent reads, 30 percent writes, with 4KB blocks. The throughput achieved was 3.5x times that obtained by running the identical workload on the fastest IBM storage system today (IBM System Storage SAN Volume Controller 4.3),
and an estimated EIGHT* times the performance of EMC DMX. With an average response time under 1 millisecond, this solution would be ideal for online transaction processing (OLTP) such as financial recordings or airline reservations.
(*)Note: EMC has not yet published ANY benchmarks of their EMC DMX box with SSD enterprise flash drives (EFD). However, I believe that the performance bottleneck is in their controller and not the back-end SSD or FC HDD media, so I have givenEMC the benefit of the doubt and estimated that their latest EMC DMX4 is as fast as an[IBMDS8300 Turbo] with Fibre Channel drives. If or when EMC publishes benchmarks, the marketplace can make more accurate comparisons. Your mileage may vary.
IBM used 4 TB of Solid State Disk (SSD) behind its IBM SAN Volume Controller (SVC) technology to achieve this amazing result. Not only does this represent a significantly smaller footprint, but it uses only 55 percent of the power and cooling.
The SSD drives are made by [Fusion IO] and are different than those used by EMC made by STEC.
The SVC addresses the one key problem clients face today with competitive disk systems that support SSD enterprise flash drives: choosing what data to park on those expensive drives? How do you decide which LUNs, which databases, or which files should be permanently resident on SSD? With SVC's industry-leading storage virtualization capability, you are not forced to decide. You can move data into SSD and back out again non-disruptively, as needed to meet performance requirements. This could be handy for quarter-end or year-end processing, for example.
For more on this, see the [IBM Press Release] or thearticles in [Network World] by Jon Brodkin, and [Cnet News] by Brooke Crothers.
Our clients have often told us at IBM that performance is one of their top purchase criteria. IBM once again has shown that it listens to the marketplace!
technorati tags: IBM, SVC, million, IOPS, EMC, DMX, Network World, Cnet, Jon Brodkin, Brooke Crothers, benchmark, leading, performance, SSD, EFD, FC, HDD, disk, systems, media
Continuing my catch-up on past posts, Jon Toigo on his DrunkenData
blog, posted a ["bleg"
] for information aboutdeduplication. The responses come from the "who's who" of the storage industry, so I will provide IBM'sview. (Jon, as always, you have my permission to post this on your blog!)
- Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
IBM offers two different forms of deduplication. The first is IBM System Storage N series disk system with Advanced Single Instance Storage (A-SIS), and the second is IBM Diligent ProtecTier software. Larry Freeman from NetApp already explains A-SIS in the [comments on Jon's post], so I will focus on the Diligent offering in this post. The key differentiators for Diligent are:
- Data agnostic. Diligent does not require content-awareness, format-awareness nor identification of backup software used to send the data. No special client or agent software is required on servers sending data to an IBM Diligent deployment.
- Inline processing. Diligent does not require temporarily storing data on back-end disk to post-process later.
- Scalability. Up to 1PB of back-end disk managed with an in-memory dictionary.
- Data Integrity. All data is diff-compared for full 100 percent integrity. No data is accidentally discarded based on assumptions about the rarity of hash collisions.
- InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
Diligent is focused on backup workloads, which has the best opportunity for deduplication benefits. The two main benefits are:
- Keeping more backup data available online for fast recovery.
- Mirroring the backup data to another remote location for added protection. With inline processing, only the deduplicated data is sent to the back-end disk, and this greatly reduces the amount of data sent over the wire to the remote location.
- Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that their’s is best because it collapses two functions — de-dupe then ingest — into one inline function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
As with any storage offering, the three gating factors are typically:
- Will this meet my current business requirements?
- Will this meet my future requirements for the next 3-5 years that I plan to use this solution?
- What is the Total Cost of Ownership (TCO) for the next 3-5 years?
Assuming you already have backup software operational in your existing environment, it is possible to determine thenecessary ingest rate. How many "Terabytes per Hour" (TB/h) must be received, processed and stored from the backup software during the backup window. IBM intends to document its performance test results of specific software/hardwarecombinations to provide guidance to clients' purchase and planning decisions.
For post-process deployments, such as the IBM N series A-SIS feature, the "ingest rate" during the backup only has to receive and store the data, and the rest of the 24-hour period can be spent doing the post-processing to find duplicates. This might be fine now, but as your data grows, you might find your backup window growing, and that leaves less time for post-processing to catch up. IBM Diligent does the processing inline, so is unaffected by an expansion of the backup window.
IBM Diligent can scale up to 1PB of back-end data, and the ingest rate does not suffer as more data is managed.
As for TCO, post-process solutions must have additional back-end storage to temporarily hold the data until the duplicates can be found. With IBM Diligent's inline methodology, only deduplicated data is stored, so less disk space is required for the same workloads.
- Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
IBM Diligent emulates a tape library, so the incoming data appears as files to be written sequentially to tape. A file is a string of bytes. Unlike block-level algorithms that divide files up into fixed chunks, IBM Diligent performs diff-compares of incoming data with existing data, and identifies ranges of bytes that duplicate what already is stored on the back-end disk. The file is then a sequence of "extents" representing either unique data or existing data. The file is represented as a sequence of pointers to these extents. An extent can vary from2KB to 16MB in size.
- De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to backup your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
For IBM Diligent, all of the data needed to reconstitute the data is stored on back-end disks. Assuming that all of your back-end disks are available after the disaster, either the original or mirrored copy, then you only need the IBM Diligent software to make sense of the bytes written to reconstitute the data. If the data was written by backup software, you would also need compatible backup software to recover the original data.
- De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the non-repudiation requirements of certain laws?
I am not a lawyer, and certainly there are aspects of[non-repudiation] that may or may not apply to specific cases.
What I can say is that storage is expected to return back a "bit-perfect" copy of the data that was written. Thereare laws against changing the format. For example, an original document was in Microsoft Word format, but is converted and saved instead as an Adobe PDF file. In many conversions, it would be difficult to recreate the bit-perfect copy. Certainly, it would be difficult to recreate the bit-perfect MS Word format from a PDF file. Laws in France and Germany specifically require that the original bit-perfect format be kept.
Based on that, IBM Diligent is able to return a bit-perfect copy of what was written, same as if it were written to regular disk or tape storage, because all data is diff-compared byte-for-byte with existing data.
In contrast, other solutions based on hash codes have collisions that result in presenting a completely different set of data on retrieval. If the data you are trying to store happens to have the same hash code calculation as completely different data already stored on a solution, then it might just discard the new data as "duplicate". The chance for collisions might be rare, but could be enough to put doubt in the minds of a jury. For this reason, IBM N series A-SIS, that does perform hash code calculations, will do a full byte-for-byte comparison of data to ensure that data is indeed a duplicate of an existing block stored.
- Some say that de-dupe obviates the need for encryption. What do you think?
I disagree. I've been to enough [Black Hat] conferences to know that it would be possible to read thedata off the back-end disk, using a variety of forensic tools, and piece together strings of personal information,such as names, social security numbers, or bank account codes.
Currently, IBM provides encryption on real tape (both TS1120 and LTO-4 generation drives), and is working withopen industry standards bodies and disk drive module suppliers to bring similar technology to disk-based storage systems.Until then, clients concerned about encryption should consider OS-based or application-based encryption from thebackup software. IBM Tivoli Storage Manager (TSM), for example, can encrypt the data before sending it to the IBMDiligent offering, but this might reduce the number of duplicates found if different encryption keys are used.
- Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated prior to write to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
Re-constituting the data back to the original format on tape allows the original backup software to interpret the tape data directly to recover individual files. For example, IBM TSM software can write its primary backup copies to an IBM Diligent offering onsite, and have a "copy pool" on physical tape stored at a remote location. The physical tapes can be used for recovery without any IBM Diligent software in the event of a disaster. If the IBM Diligent back-end disk images are lost, corrupted, or destroyed, IBM TSM software can point to the "copy pool" and be fully operational. Individual files or servers could be restored from just a few of these tapes.
An NDMP-like tape backup of a deduplicated back-end disk would require that all the tapes are in-tact, available, and fully restored to new back-end disk before the deduplication software could do anything. If a single cartridge fromthis set was unreadable or misplaced, it might impact the access to many TBs of data, or render the entire systemunusable.
In the case of a 1PB of back-end disk for IBM Diligent, you would be having to recover over a thousand tapes back to disk before you could recover any individual data from your backup software. Even with dozens of tape drives in parallel, could take you several days for the complete process.This represents a longer "Recovery Time Objective" (RTO) than most people are willing to accept.
- Some vendors are claiming de-dupe is “green” — do you see it as such?
Certainly, "deduplicated disk" is greener than "non-deduplicated" disk, but I have argued in past posts, supportedby Analyst reports, that it is not as green as storing the same data on "non-deduplicated" physical tape.
- De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data on line in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
Deduplication can be applied to primary data, as in the case of the IBM System Storage N series A-SIS. As Larrysuggests, MS Exchange and SharePoint could be good use cases that represent the possible savings for squeezing outduplicates. On the mainframe, many master-in/master-out tape applications could also benefit from deduplication.
I do not believe that deduplication products will run efficiently with “update in place” applications, that is high levels of random writes for non-appending updates. OLTP and Database workloads would not benefit from deduplication.
- Just suggested by a reader: What do you see as the advantages/disadvantages of software based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
In general, new technologies are introduced on software first, and then as implementations mature, get hardware-based to improve performance. The same was true for RAID, compression, encryption, etc. The Hifn card does "hash code" calculations that do not benefit the current IBM Diligent implementation. Currently, IBM Diligent performsLZH compression through software, but certainly IBM could provide hardware-based compression with an integrated hardware/software offering in the future. Since IBM Diligent's inline process is so efficient, the bottleneck in performance is often the speed of the back-end disk. IBM Diligent can get improved "ingest rate" using FC instead of SATA disk.
Sorry, Jon, that it took so long to get back to you on this, but since IBM had just acquired Diligent when you posted, it took me a while to investigate and research all the answers.
technorati tags: IBM, Diligent, Jon Toigo, DrunkenData, bleg, deduplication, A-SIS, NetApp, ProtecTier, inline, post-process, back-end, disk, data integrity, hash, collision, ingest rate, VTL, non-repudiation, extent, bit-perfect, Microsoft Word, Adobe PDF, diff, Black Hat, encryption, compression, Hifn, FC, SATA
Yesterday, I started this week's topic discussing the various areas of exploration to helpunderstand our recent press release of the IBM System Storage SAN Volume Controller and itsimpressive SPC-1 and SPC-2 benchmark results that ranks it the fastest disk system in the industry.
Some have suggested that since the SVC has a unique design, it should be placed in its own category,and not compared to other disk systems. To address this, I would like to define what IBM meansby "disk system" and how it is comparable to other disk systems.
When I say "disk system", I am going to focus specifically on block-oriented direct-access storage systems, which I will define as:
One or more IT components, connected together, that function as a whole, to serve as a target forread and write requests for specific blocks of data.
Clarification: One could argue, and several do in various comments below, that there are other typesof storage systems that contain disks, some that emulate sequential access tape libraries, some that emulate file-systems through CIFS or NFS protocols, and some that support thestorage of archive objects and other fixed content. At the risk of looking like I may be including or excluding such to fit my purposes, I wanted to avoid apples-to-orangescomparisons between very different access methods. I will limit this exploration to block-oriented, direct-access devices. We can explore these other types of storage systems in later posts.
People who have been working a long time in the storage industry might be satisfied by this definition, thinkingof all the disk systems that would be included by this definition, and recognize that other types of storage liketape systems that are appropriately excluded.
Others might be scratching their heads, thinking to themselves "Huh?" So, I will provide some background, history, and additional explanation. Let's break up the definition into different phrases, and handle each separately.
- read and write requests
Let's start with "read and write requests", which we often lump together generically as input/output request, or just I/O request. Typically an I/O request is initiated by a host, over a cable or network, to a target. The target responds with acknowledgment, data, or failure indication. A host can be a server, workstation, personal computer, laptop or other IT device that is capable of initiating such requests, and a target is a device or system designed to receive and respond to such requests.
(An analogy might help. A woman calls the local public library. She picks up the phone, and dials the phone number of the one down the street. A man working at the library hears the phone ring, answers it with "Welcome to the Public Library! How can I help you?" She asks "What is the capital city of Ethiopia?" and replies "Addis Ababa." and hangs up. Satisfied with this response, she hangs up. In this example, the query for information was the I/O request, initiated by the lady, to the public library target)
Today, there are three popular ways I/O requests are made:
- CCW commands over OEMI, ESCON or FICON cables
- SCSI commands over SCSI, Fibre Channel or SAS cables
- SCSI commands over Ethernet cables, wireless or other IP communication methods
- specific blocks of data
In 1956, IBM was the first to deliver a disk system. It was different from tape because it was a "direct access storage device" (the acronym DASD is still used today by some mainframe programmers). Tape was a sequential media, so it could handle commands like "read the next block" or "write the next block", it could not directly read without having to read past other blocks to get to it, nor could it write over an existing block without risking overwriting the contents of blocks past it.
The nature of a "block" of data varies. It is represented by a sequence of bytes of specific length. The length is determined in a variety of ways.
- CCW commands assume a Count-Key-Data (CKD) format for disk, meaning that tracks are fixed in size, but that a track can consist of one or more blocks, and can be fixed or variable in length. Some blocks can span off the end of one track, and over to another track. Typical block sizes in this case are 8000 to 22000 bytes.
- SCSI commands assume a Fixed-Block-Architecture (FBA) format for disk, where all blocks are the same size, almost always a power of two, such as 512 or 4096 bytes. A few operating systems, however, such as i5/OS on IBM System i machines, use a block size that doesn't follow this power-of-two rule.
- one or more IT components
You may find one or more of the following IT components in a disk system:
- customized or general-purpose processing chips
- memory, such as RAM, Flash, or similar
- batteries and/or other power supply
- Host attachment cards or ports
- motorized platter(s) covered in magnetic coating with a read/write head to move over its surface. These are often referred to as Hard Disk Drive (HDD) or Disk Drive Modules (DDM), and are manufacturedby companies like Seagate or Hitachi Global Storage Technologies.
A set of HDD can be accessed individually, affectionately known as JBOD for Just-a-bunch-of-disk, or collectively in a RAID configuration.
Memory can act as the high-speed cache in front of slower storage, or as the storage itself. For example, the solid state disk that IBM announced last week is entirely memory storage, using Flash technology.
Lately, there are two popular packaging methods for disk systems:
- Monolithic -- all the components you need connected together inside a big refrigerator-sized unit, with options to attach additional frames. The IBM System Storage DS8000, EMC Symmetrix DMX-4 and HDS TagmaStore USP-V all fit this category.
- Modular -- components that fit into standard 19-inch racks, often the size of the vegetable drawer inside a refrigerator, that can be connected externally with other components, if necessary, to make a complete disk system. The IBM System Storage DS6000, DS4000, and DS3000 series, as well as our SVC and N series, fall into this category.
Regardless of packaging, the general design is that a "controller" receives a request from its host attachment port, and uses its processors and cache storage to either satisfy the request, or pass the request to the appropriate HDD,and the results are sent back through the host attachment port.
In all of the monolithic systems, as well as some of the modular ones, the controller and HDD storage are contained in the same unit. On other modular systems, the controller is one system, and the HDD storage is in a separate system, and they are cabled together.
- serve as a target
The last part is that a disk system must be able to satisfy some or all requests that come to it.
(Using the same analogy used above, when the lady asked her question, the guy at the public library knew the answer from memory, and replied immediately. However, for other questions, he might need to look up the answer in a book, do a search on the internet, or call another library on her behalf.)
Some disk systems are cache-only controllers. For these, either the I/O request is satisfied as a read-hit or write-hit in cache, or it is not, and has to go to the HDD. The IBM DS4800 and N series gateways are examples of this type of controller.
Other systems may have controller and disk, but support additional disk attachment. In this case, either the I/O request is handled by the cache or internal disk, or it has to go out to external HDD to satisfy the request. IBM DS3000 series, DS4100, DS4700, and our N series appliance models, all fall into this category.
So, the SAN Volume Controller is a disk system comprising of one to four node-pairs. Each node is a piece of IT equipment that have processors and cache. These node-pairs are connected to a pair of UPS power supplies to protect the cache memory holding writes that have not yet been de-staged. The combination of node-pairs and UPS acting as a whole, is able to serve as a target to SCSI commands sent over Fibre Channel cables on a Storage Area Network (SAN). To read some blocks of data, it uses its internal cache storage to satisfy the request, and for others, it goes out to external disk systems that contain the data required. All writes are satisfied immediately in cache on the SVC, and later de-staged to external disk when appropriate.
As of end of 2Q07, having reached our four-year anniversary for this product, IBM has sold over 9000 SVC nodes, which are part of more than 3100 SVC disk systems. These things are flying off the shelves, clocking in a 100% YTY growth over the amount we sold twelve months ago. Congratulations go to the SVC development team for their impressive feat of engineering that is starting to catch the attention of many customers and return astounding results!
So, now that I have explained why the SVC is considered a disk system, tomorrow I'll discuss metrics to measure performance.
technorati tags: IBM, SVC, HDD, DDM, DS4800, SVC, SAN Volume Controller, EMC, HDS, HP, DS4100, DS4700, Flash, RAM, solid-state, disk, system, controller, array, RAID, I/O, IO, request, read, write,
Fellow blogger Chuck Hollis from EMC has a post titled[Whither Frankenstorage
] causing quite a stir in the [Stor-o-Sphere
]. He is not the firstEMC blogger to use this phrase, I credit [BarryB
] for coining the term back in September 2008.Frankenstein serves as the ideal icon for EMC's FUD machine. In the novel, Dr. Frankenstein wasattempting to do something nobody else had ever attempted, to create human life from variousdead body parts, a process full of uncertainty and doubt, with frightful results.
Perhaps it was a coincidence that I discussed IBM's storage strategy in my post[Foundations and Flavorings] on January 28, shortly followed by NetApp's announcing V-series gateway [support of Texas Memory Systems' RamSan-500] on February 3. These two events mighthave been the trigger that pushed ChuckH over the edge to put
pen to paper, .. finger to keyboard.
Flinging FUD in all directions was ChuckH's not-so-subtle way to remind the world that EMC is the only major storage vendor to not offer a successful storage virtualization product. Withoutfirst-hand experience with well-designed storage virtualization, ChuckH conjectures that a configuration matching intelligent front-ends to reliable back-ends might be more expensive, might be more difficult to manage, or might be harder to support.
(Note: Rest assured, IBM can demonstrate that a modular approach, combining intelligent front-ends to reliableback-ends can help reduce costs, be easier to manage, and be fully supported. Contact yourlocal IBM Business Partner or storage sales rep for details.)
The reaction was notas much a blogfight and more of a [dog pile]. Defending NetApp were[Alex McDonald],[Kostadis Roussos], and [Stephen Foskett, Pack Rat]. On the HDS front, we have [Tony Asaro]. My fellow blogger from IBM took his swing with [How Quickly We Forget].And finally, pointing out EMC's hypocrisy, overall, was [James Or] from Storage Monkeys.
My favorite was from Nigel Poulton's post on[Ruptured Monkey]. Here's an excerpt:
In fact, I'm fairly certain that EMC don't back away from customers who run HP or IBM servers and say "sorry we cant help you here, an end to end HP or IBM solution would be much better for you when it comes to troubleshooting……. putting our storage in would only add extra layers of complexity and make things messy….."
On most other days, ChuckH has well-written, insightful blog posts that show that EMC brings some value to the industry. I could have made a snarky reference to[Dr Jekyll and Mr Hyde], or indicate this post proves that nobody at EMC is editing or reviewingChuck's thoughts before they get posted. But it's too late, Chuck already got the message, and added the following to bring the discussion back to civility:
When considering the broad range of storage media service levels available today (flash, FC, SATA, spin-down, etc.) what's the best way to offer these media choices in an array? Is the answer (a) combine smaller arrays from different vendors together behind a virtualization head, or (b) invest the time and effort to build arrays that can directly support all of these media types?
Would anyone like to try a cogent response to the question posed, please?
|To address ChuckH's question, Nigel's post gave me the idea to use today's 200th year celebration of [Charles Darwin].|
Over millions of years, Charles Darwin argued, evolution results in change in the inherited traits of a population of organisms from one generation to the next.A key component of this is a biological process called [mitosis] that allows a single cell to split and become two cells. In some cases, these individual daughter cells can then specialize to specific functions, such as nerve cells, muscle cells or bone cells. Over time, adaptations that work well carry forward, and thosethat don't get left behind.
I find it interesting that before [On the Origin of Species] was published in 1859, works of fiction like Mary Shelley's[Frankenstein] had monsters being"created", and afterward, monsters were the result of mutation or selective adaptation.
Nigel compares EMC's monolithic approach to placing an intelligent front-end with a reliable back-end as "One man band, where one guy is trying playing all the instruments himself" versus the "Philharmonic Orchestra". I would take it one step further, comparing single-cell organisms to multi-cell life forms.
Innovative companies like Google and Amazon can't wait for a completely integrated solution from a major IT vendor to meet their needs. Why should they? There are open standards, and ways to interconnect the best intelligence into a [dynamic infrastructure®.].You don't need to wait another million years to see which way the IT marketplace considers the better approach. Just look at the last 60 years. Back then, computer systems were all integrated, server, storage, and the wires that connected them were all inside a huge container. Then, mitosis happened, and IBM created external tape storage in 1952, and external disk storage in 1956. Open standards for interfaces allowed third party manufacturers like HDS, StorageTek and EMC to offer plug-compatible storage devices.
On the server side, it didn't take long for functionality in mainframes to split off. Mitosis happened again, with front-end UNIX systems processing incoming data, and mainframes handling the back-end data bases and printing. The client-server era replaced dumb terminals with more intelligent desktops and workstations, and these could handle the front-end processing to display information, with the back-end storage and number-crunching being handled by the UNIX and mainframe systems they connected to.Connections between desktops and servers, and from servers to storage, have also evolved. From thousands of direct-attach cables to networks of switches and directors.
Charles Darwin was particularly interested in cases where evolution happened faster or slower than in other cases. While IBM and Microsoft encouraged third-party innovations on the PC side, Apple resisted mitosis, trying to keep its machines pure single-cell, integrated solutions.For the same reasons that you can't fight the laws of nature, Apple ended up having to support I/O ports to external devices. Thanks to open standards like USB and Firewire, you can connect third-party storage to Apple computers. My little Mac Mini at home has more devices hanging off it than any of my Windows or Linux boxes! And Apple's iPod is successful because its iTunes software runs on both Windows and Mac OS operating systems.
Every time mitosis happens in the IT industry, it opens up opportunities to specialize, to innovate, to adapt to a dynamically changing world. When mitosis is suppressed, you get limiting products and frustratedengineers leaving to form their own start-up companies.But when mitosis is encouraged, you get successful products, solutions and partnerships positioned for a smarter planet.
Happy Valentines Day, Chuck!
technorati tags: EMC, Chuck Hollis, frankenstorage, Frankenstein, FUD, IBM, NetApp, TMS, V-series, RamSan-500, storage virtualization, FC, SATA, Charles Darwin, HDS, StorageTek, Microsoft, Apple, UNIX, Linux, Windows, iPod, iTunes, mitosis, Invista, EDL, NX4, Centera, Valentines Day, dynamic infrastructure, smarter planet