This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson )
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems for IBM Systems Technical University] events. With over 30 years with IBM Systems, Tony is frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles at IBM during his 19 plus years at IBM. Lloyd most recently has been leading efforts across the Communication/CSI Market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years Lloyd supported the industry accounts as a Storage Solution architect and prior to that as a Storage Software Solutions specialist during his time in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role in the Washington Systems Center team. His current focus is with IBM Cloud Private and he will be delivering and supporting sessions at Think2019, and Storage Technical University on the Value of IBM storage in this high value IBM solution a part of the IBM Cloud strategy. Lloyd maintains a Subject Matter Expert status across the IBM Spectrum Storage Software solutions. You can follow Lloyd on Twitter @ldean0558 and LinkedIn Lloyd Dean.
Tony Pearson's books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is a an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
The developerWorks Connections platform will be sunset on December 31, 2019. On January 1, 2020, this blog will no longer be available. More details available on our FAQ.
Rather than a target weight, I chose a target waist measurement, but did not quite make this one. I did keep up with my weekly exercise regime, but we recently installed an "ice cream freezer" here at work, and I have failed to resist temptation.
Reduce, Reuse and Recycle
In my post [Stayingon Budget], I resolved to "reduce, reuse and recycle". I have taken measures to de-clutter and simplify mylife, and already things are paying off. So I am happy about this one.
Learn to Better use Lotus Notes and Office 2007 software
In my post [Honeyour Tools and Skills], I resolved to learn how to better use Lotus Notes and Office 2007. We never got Office 2007.In a surprise move, IBM put out Lotus Symphony, an Office 2007 replacement. Lotus Symphony works on IBM's three approved recognized desktop platforms (Windows XP, Linux and Mac OS X). Here's a collection of [IBM Press Releases about Lotus Symphony].
I did learn how to better use Lotus Notes,thanks to Alan Lepofsky's blog [IBM Lotus Notes Hints, Tips, and Tricks].Ironically, the best help for dealing with Lotus Notes was not the software itself, but the skills in handling emailin general. This includes:
Resist the urge to copy the world, and better use "bcc" to be kind to upper management on "reply all" respondents.
Avoid attaching large documents, but use URL's to NAS file shares, websites, or [YouSendIt.com] instead. Obviously, the recipient has to have access to whatever you point to, but it greatly reduces total email volume and improves transmission over wireless.
Delegate. A lot of times I was the "middleman" between someone asking a question, and someone else Iknew had the answer. Now, I just introduce them together and step out of the way.
Checking email only a few times a day. I use to check my email every 5-10 minutes, now only 2-4 times per day.
In my post, [Lighten Up], I resolved to laugh more, stretch more, get enough sleep, and listen to music more. I participated in monthly[Tucson Laughter Club]events, incorporated stretching in my weekly exercise program, have gotten more sleep, and rediscovered some of my older music that I hadn't listened to in a while. Overall, I feel happy I met this one.
My New Year's Resolutions for 2008:
Improve my writing skills
Going back through my past blog postings, some of my sentences and paragraphs were frightful. I resolve toimprove my sentence and paragraph structure, and make better use of HTML tags to improve the layout andformatting.
Improve my HTML and Web design skills
Contribute to the OLPC Foundation
Last year, as a "Day 1 Donor", I had donated to this important charitable organization to help educate the childrenof third world nations. This year, I plan to learn Python and other programming languages used on the XO laptop,and see how I can contribute my skills and expertise on the OLPC forums.
Eat Healthier and Drink more
I think my downfall with last year's resolution was that it was merely a goal, 35 inch waist, rather thana "call for action". This year, I plan to eat more fish, salads, whole grains and other heart-healthy foods.
While many people resolve to "Quit Drinking", I need to drink more. My doctor, my personaltrainer, and even my interpreter teams, have asked me to do so. We live in Tucson, Arizona, during a centuryof global warming, and dehydration can cause stress on the body.
Attend more movies and film-making events
Last year, I joined the Tucson Film Society, and produced[my first film], part of which was filmedfrom Bogota, Colombia. I got invited to see a lot of independent films, premieres, and film-maker events, but did not attend many. I resolve to attend more in 2008.
Get better Organized
Moving offices from one building to another brought to light that I wasn't well organized. While I havemade some efforts to de-clutter my home, I need to step this up to my work as well.
I decided to start with something very non-tech, a [Hipster PDA]. I have nowmet or heard several people who use this approach successfully, and have decided to give it a try.
Hopefully, this list might inspire you to come up with your own resolutions. Not surprisingly, writing them in a public forum helped me keep most of them, and stick to my resolutions throughout the year.
Whew! I am glad that is over. The BarryB circus has left town, he has decided to [move on to other topics], and I am now to clean up the ["circus gold"] leftbehind. I would like to remind everyone that all of these discussions have been about the architecture,not the product. IBM will come out withits own version of a product based on Nextra later in 2008, which may be different than the product that XIV currentlysells to its customers.
RAID-X does not protect against double-drive failures as well as RAID-6, but it's very close
BarryB calls this the "Elephant in the room", that RAID-6 protects better against double-drive failures. I don't dispute that. He also credits me with the term "RAID-X", but I got this directly from the XIV guys. It turns out this was already a term used among academic research circles for [distributed RAID environments]. Meanwhile, Jon Toigo feels the term RAID-X sounds like a brand of bug spray in his post[XIV Architecture: What’s Not to Like?]Perhaps IBM can change this to RAID-5.99 instead.
If you measure risk of a second drive failing during the rebuild or re-replication process ofa first drive failure, you can measure the exposure by multiplying the amount of GB at risk by thenumber of hours that the second failure could occur, resulting in a unit of "GB-hours". Here Ilist best-case rebuild times, your mileage may vary depending on whether other workloads existon the system competing for resources. Notice that 8-disk configurations of RAID-10 and RAID-5for smaller FC disk are in the triple digits, and larger SATA disk in five digits, but that with RAID-X it is only single digits. That is orders of magnitude closer to the ideal.
For each RAID type, the risk is proportional to the square of the individual drive size.Double the drive size causes the risk to be four times greater.This is not the first time this has been discussed. In [Is RAID-5 Getting Old?], Ramskovquotes NetApp's response in Robin Harris' [NetApp Weighs In On Disks]:
...protecting online data only via RAID 5 today verges on professional malpractice.
As disks get older, RAID-6 will not be able to protect against 3-drive failures. A similar chartabove could show the risk to data after the second drive fails and both rebuilds are going on,compared to the risk of a third drive failure during this time. The RAID-X scheme protects muchbetter against 3-drive failures than RAID-6.
Nothing in the Nextra architecture prevents a RAID-6, Triple-copy, or other blob-level scheme
In much the same way that EMC Centera is RAID-5 based for its blobs, there is nothing in the Nextra architecturethat prevents taking additional steps to provide even better protection, using a RAID-6 scheme, making three copiesof the data instead of two copies, or something even more advanced. The current two-copy scheme for RAID-X is betterthan all the RAID-5 and RAID-10 systems out in the marketplace today.
Mirrored Cache won't protect against Cosmic rays, but ECC detection/correction does
BarryB incorrectly states that since some implementations of cache are non-mirrored, that this implies they are unprotected against Cosmic rays. Mirroring does not protect against bit-flips unless both copies arecompared for differences. Unfortunately, even if you compared them, the best you can do is detect theyare different, there is no way of knowing which version is correct.Mirroring cache is normally done to protect uncommitted writes. Reads in cacheare expendable copies of data already written to disk, so ECC detection/correction schemes are adequateprotection. ECC is like RAID for DRAM memory. A single bit-flip can be corrected, multiple bit-flipscan be detected. In the case of detection, the cache copy is discarded and read fresh again from disk.IBM DS8000, XIV and probably most other major vendor offerings use ECC of some kind. BarryB is correctthat some cheaper entry-level and midrange offerings from other vendors might cut corners in this area.I don't doubt BarryB's assertion that the ECC method used in the EMC products may be differently implemented than theECC in the IBM DS8000, but that doesn't mean the IBM DS8000's ECC implementation is flawed.
ECC protection is important for all RAID systems that perform rebuild, and even more importantthe larger the GB-hours listed in the table above.
XIV is designed for high-utilization, not less than 50 percent
I mentioned that the typical Linux, UNIX or Windows LUN is only 30-50 percent full, and perhaps BarryBthought I was referring to the typical "XIV customer". This average is for all disk storage systems connectedto these operating systems, based on IBM market research and analyst reports. The XIV is expected to run at much higher utilization rates, and offers features like "thin provisioning" and "differential snapshot" to make this simple to implement in practice.
Most often, disks don't fail without warning. Usually, they give out temporary errors first, and then fail permanently.The XIV architecture allows for pre-emptive self-repair, initiating the re-replication process after detecting temporary errors, rather than waiting for a complete drive failure.
I had mentioned that this process used "spare capacity, not spare drives" but I was notified that there are three spare drives per system to ensure that there is enough spare capacity, so I stand corrected.
New drives don't have to match the same speed/capacity as the new drives, so three to five years from now, whenit might be hard to find a matching 500GB SATA drive anymore, you won't have to.
No RAID scheme eliminates backups or Business Continuity Planning
The XIV supports both synchronous and asynchronous disk mirroring to remote locations. Backup software willbe able to backup data from the XIV to tape. A double drive failure would require a "recovery action", eitherfrom the disk mirror, or from tape, for the few GB of data that need to be recovered.
A third alternative is to allow end-users to receive backups of their own user-generated content. For example, I have over 15,000 photos uploaded over the past six years to Kodak Photo Gallery, which I use to share with my friends and family. For about $180 US dollars, they will cut DVDs containing all of my uploaded files and send them to me, so that I do not have to worry about Kodak losing my photos.In many cases, if a company or product fails to deliver on its promises, the most you will get is your money back, but for "free services" like HotMail, FreeDrive, FlickR and others, you didn't pay anything in the first place, andthey may point this limitation of liability in the "terms of service".
XIV can be used for databases and other online transaction processing
The XIV will have FCP and iSCSI interfaces, and systems can use these to store any kind of data you want. I mentionedthat the design was intended for large volumes of unstructured digital content, but there is nothing to prevent the use of other workloads. In today's Wall Street Journal article[To Get Back Into the Storage Game, IBM Calls In an Old Foe]:
Today, XIV's Nextra system is used by Bank Leumi, a large Israeli bank, and a few other customers for traditional data-storage tasks such as recording hundreds of transactions a minute.
BarryB, thanks for calling the truce. I look forward to talking about other topics myself. These past two weeks have been exhausting!
Wrapping up my week's theme on IBM's acquisition XIV, we have gotten hundreds of positive articles and reviews in the press, but has caused quite a stir with the[Not-Invented-Here] folks at EMC.We've heard already from EMC bloggers [Chuck Hollis] and [Mark Twomey].The latest is fellow EMC blogger BarryB's missive [Obligatory "IBM buys XIV" Post], which piles on the "Fear, Uncertainty and Doubt" [FUD], including this excerpt here:
In a block storage device, only the host file system or database engine "knows" what's actually stored in there. So in the Nextra case that Tony has described, if even only 7,500-15,000 of the 750,000 total 1MB blobs stored on a single 750GB drive (that's "only" 1 to 2%) suddenly become inaccessible because the drive that held the backup copy also failed, the impact on a file system could be devastating. That 1MB might be in the middle of a 13MB photograph (rendering the entire photo unusable). Or it might contain dozens of little files, now vanished without a trace. Or worst yet, it could actually contain the file system metadata, which describes the names and locations of all the rest of the files in the file system. Each 1MB lost to a double drive failure could mean the loss of an enormous percentage of the files in a file system.
And in fact, with Nextra, the impact will be across not just one, but more likely several dozens or even hundreds of file systems.
Worse still, the Nextra can't do anything to help recover the lost files.
Nothing could be further from the truth. If any disk drive module failed, the system would know exactly whichone it was, what blobs (binary large objects) were on it, and where the replicated copies of those blobs are located. In the event of a rare double-drive failure, the system would know exactly which unfortunate blobs were lost, and couldidentify them by host LUN and block address numbers, so that appropriate repair actions could be taken from remote mirrored copies or tape file backups.
Second, nobody is suggesting we are going to put a delicateFAT32-like Circa-1980 file system that breaks with the loss of a single block and requires tools like "fsck" to piece back together. Today's modern file systems--including Windows NTFS, Linux ext3, and AIX JFS2--are journaled and have sophisticated algorithms tohandle the loss of individual structure inode blocks. IBM has its own General Parallel File System [GPFS] and corresponding Scale out File Services[SOFS], and thus brings a lotof expertise to the table.Advanced distributed clustered file systems, like [Google File System] and Yahoo's [Hadoop project] take this one step further, recognizing that individual node and drive failures at the Petabyte-scale are inevitable.
In other words, XIV Nextra architecture is designed to eliminate or reduce recovery actions after disk failures, not make them worse. Back in 2003, when IBM introduced the new and innovative SAN Volume Controller (SVC), EMCclaimed this in-band architecture would slow down applications and "brain-damage" their EMC Symmetrix hardware.Reality has proved the opposite, SVC can improve application performance and help reduce wear-and-tear on the manageddevices. Since then, EMC acquired Kashya to offer its own in-band architecture in a product called EMC RecoverPoint, that offers some of the features that SVC offers.
If you thought fear mongering like this was unique to the IT industry, consider that 105years ago, [Edison electrocuted an elephant]. To understand this horrific event, you have to understand what was going on at the time.Thomas Edison, inventor of the light bulb, wanted to power the entire city of New York with Direct Current(DC). Nikolas Tesla proposed a different, but more appropriate architecture,called Alternating Current(AC), that had lower losses over distances required for a city as large and spread out as New York. But Thomas Edison was heavily invested in DC technology, and would lose out on royalties if ACwas adopted.In an effort to show that AC was too dangerous to have in homes and businesses, Thomas Edison held a pressconference in front of 1500 witnesses, electrocuting an elephant named Topsy with 6600 volts, and filmed the event so that it could be shown later to other audiences (Edison invented the movie camera also).
Today's nationwide electric grid would not exist without Alternating Current.We enjoy both AC for what it is best used for, and DC for what it is best used for. Both are dangerous at high voltage levels if not handled properly. The same is the case for storage architectures. Traditional high-performance disk arrays, like the IBM System Storage DS8000, will continue to be used for large mainframe applications, online transaction processing and databases. New architectures,like IBM XIV Nextra, will be used for new Web 2.0 applications, where scalability, self-tuning, self-repair,and management simplicity are the key requirements.
(Update: Dear readers, this was meant as a metaphor only, relating the concerns expressed above thatthe use of new innovative technology may result in the loss or corruption of "several dozen or even hundreds of file systems" and thus too dangerous to use, with an analogy on the use of AC electricity was too dangerous to use in homes. To clarify, EMC did not re-enact Thomas Edison's event, no animalswere hurt by EMC, and I was not trying to make political commentary about the current controversy of electrocution as amethod of capital punishment. The opinions of individual bloggers do not necessarily reflect the official positions of EMC, and I am not implying that anyone at EMC enjoys torturing animals of any size, or their positions on capital punishment in general. This is not an attack on any of the above-mentioned EMC bloggers, but rather to point out faulty logic. Children should not put foil gum wrappers in electrical sockets. BarryB and I have apologized to each other over these posts for any feelings hurt, and discussion should focus instead on the technologies and architectures.)
While EMC might try to tell people today that nobody needs unique storage architectures for Web 2.0 applications, digital media and archive data, because their existing products support SATA disk and can be used instead for these workloads, they are probably working hard behind the scenes on their own "me, too" version.And with a bit of irony, Edison's film of the elephant is available on YouTube, one of the many Web 2.0 websites we are talking about. (Out of a sense of decency, I decided not to link to it here, so don't ask)
This is a reasonable question. Since Invista 2.0 came out months ago in August, and Invista 2.1 is rumored to be out by end of this month, why put out a press release now, rather than just wait a few weeks? Thesignificant part of this announcement was that EMC finally has their first customer reference.To be fair, getting a customer to agree to be a reference is difficult for any vendor. Some non-profitsand government agencies have rules against it, and some corporations just don't want to be bothered byjournalists, or take phone calls from other prospective customers. I suspect EMC wanted to put the good folks from Purdue University in front of the cameras and microphones before they:
In Moore's terminology, Purdue University would be a "technology enthusiast", interested in exploring the technologyof the EMC Invista. Universities by their very nature often see themselves as early adopters, willing to take big risks in hopes to reap big rewards. The chasm happens later, when there are a lot of early adopters, all willing to be reference accounts. The mainstream market--shown here as pragmatists, conservatives, and skeptics-- are unwillingto accept reference claims from early adopters, searching instead for moderate gains from minimal risks. They prefer references from customers that are similar in size and industry. Whether a vendor can get a product to cross this chasm is the focus of the book.
Why "SAN" virtualization?
Technically, Invista is "storage" virtualization, not "SAN" virtualization. Virtualizationis any technology that makes one set of resources look and feel like a different setof resources, preferably with more desirable characteristics. You can virtualizeservers, SANs, and storage resources.
Virtual SAN (VSAN) technology, supported bythe Cisco MDS 9500 Series Multilayer Director Switch, partitions a single physical SAN into multipleVSANs, allowing different business functions and requirements to share a common physical infrastructure.
How does Invista advance Cisco's VSAN functionality? It doesn't, but that doesn't makethe title a falsehood, or the press release by association full of lies.If you read the entire press release, EMCcorrectly states that Invista is "storage" virtualization. Some storagevirtualization products, like EMC Invista and IBM System Storage SAN Volume Controller (SVC), require a SAN as a platform for which to perform their magic.Marketing people might use the term "SAN" torefer not just the network gear that provides the plumbing, but also to include the storage devices that are attached to the SAN. In that light, theuse of "SAN virtualization" can be understood in the title.
More importantly, it appears that EMC no longer requires that you purchase new SAN equipment from themwith Invista. When the Invista first came out, it cost over a quarter-million US dollars to cover thecost of the intelligent switches, but with the price drop to $100K, I imagine this means theyassume everyone has an appropriately-supported intelligent switch already deployed.
Why this architecture?
In his post [Storage Virtualization and Invista 2.0], EMC blogger ChuckH does a fair job explaining why EMC went in this direction for Invista, and how it is different thanother storage virtualization products.
Most storage virtualization products are cache-based. The world's first disk storagevirtualization product, the IBM 3850 Mass Storage System, introduced in 1974, and thefirst tape virtualization product, the IBM 3494 Virtual tape Server, introduced in 1997, bothused disk cache in front of tape storage. Later virtualization products, like IBM SVC and HDS USP-V, use DRAM memory cache in front of disk storage, but the concept is the same.People are comfortable with cache-based solutions, because the technology is matureand well proven in the marketplace, and excited and delighted that these can offer the following features in a mixed heterogeneous disk environment:
instantaneous point-in-time copy
None of these features are provided by Invista, as there is no cache in the switch. Instead,Invista is a "packet cracker"; it cracks open each FCP packet, inspects and modifies the contents, then passes theFCP packet along to the appropriate storage device. This process slows down each read andwrite by some amount, perhaps 20 microseconds. The disadvantage of slowing down every readand write is offset by having other benefits, like non-disruptive data migration.
To compensate for Invista's inability to provide these features,EMC offers a second solution called EMC RecoverPoint, which is an in-band cache-based appliancesimilar in design to SVC, but maps all virtual disks one-to-one to physical disks. It offersremote distance asynchronous mirroring between heterogeneous devices.EMC supports RecoverPoint in front of Invista, but if you are considering buying bothto get the combined set of features, you might as well buy an IBM SVC or HDS USP-V instead,in one system, rather than two, which is much less complicated. IBM SVC and HDS USP-Vhave both "crossed the chasm" having sold thousands of units to every type and size of customer.
Hopefully, this answers the questions you might have about EMC Invista.