Well, this week I am in Maryland, just outside of Washington DC. It's a bit cold here.
Robin Harris over at StorageMojo put out this Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun about the results of two academic papers, one from Google, and another from Carnegie Mellon University (CMU). The papers imply that the disk drive module (DDM) manufacturers have perhaps misrepresented their reliability estimates, and the letter asks major vendors to respond. So far, NetApp and EMC have responded.
I will not bother to repeat what others have said already, but will make just a few points. Robin, you are free to consider this "my" official response if you would like to post it on your blog, or point to mine, whichever is easier for you. Given that IBM no longer manufactures the DDMs we use inside our disk systems, there may not be any reason for a more formal response.
- Coke and Pepsi buy sugar, Nutrasweet and Splenda from the same sources
Somehow, this doesn't surprise anyone. Coke and Pepsi don't own their own sugar cane fields, and even their bottlers are separate companies. Their job is to assemble the components using super-secret recipes to make something that tastes good.
IBM, EMC and NetApp don't make the DDMs that are mentioned in either academic study. Different IBM storage systems use one or more of the following DDM suppliers:
- Seagate (including Maxtor, which they acquired)
- Hitachi Global Storage Technologies, HGST (former IBM division sold off to Hitachi)
In the past, corporations like IBM were very "vertically-integrated", making every component of every system delivered. IBM was the first to bring disk systems to market, and led the major enhancements that exist in nearly all disk drives manufactured today. Today, however, our value-add is to take standard components, and use our super-secret recipe to make something that provides unique value to the marketplace. Not surprisingly, EMC, HP, Sun and NetApp also don't make their own DDMs. Hitachi is perhaps the last major disk systems vendor that also has a DDM manufacturing division.
So, my point is that disk systems are the next layer up. Everyone knows that individual components fail. Unlike CPUs or Memory, disks actually have moving parts, so you would expect them to fail more often compared to just "chips".
If you don't feel the MTBF or AFR estimates posted by these suppliers are valid, go after them, not the disk systems vendors that use their supplies. While IBM does qualify DDM suppliers for each purpose, we are basically purchasing them from the same major vendors as all of our competitors. I suspect you won't get much more than the responses you posted from Seagate and HGST.
- American car owners replace their cars every 59 months
According to a frequently cited auto market research firm, the average time before the original owner transfers their vehicle -- purchased or leased -- is currently 59 months. Both studies mention that customers have a different "definition" of failure than manufacturers, and often replace the drives before they are completely kaput. The same is true for cars. Americans give various reasons why they trade in their less-than-five-year-old cars for newer models. Disk technologies advance at a faster pace, so it makes sense to change drives for other business reasons, for speed and capacity improvements, lower power consumption, and so on.
The CMU study indicated that 43 percent of drives were replaced before they were completely dead. So, if General Motors estimated their cars lasted 9 years, and Toyota estimated 11 years, people still replace them sooner, for other reasons.
At IBM, we remind people that "data outlives the media". True for disk, and true for tape. Neither is "permanent storage", but rather a temporary resting point until the data is transferred to the next media. For this reason, IBM is focused on solutions and disk systems that plan for this inevitable migration process. IBM System Storage SAN Volume Controller is able to move active data from one disk system to another; IBM Tivoli Storage Manager is able to move backup copies from one tape to another; and IBM System Storage DR550 is able to move archive copies from disk and tape to newer disk and tape.
If you had only one car, then having that one and only vehicle die could be quite disruptive. However, companies that have fleet cars, like Hertz Car Rentals, don't wait for their cars to completely stop running either; they replace them well before that happens. For a large company with a large fleet of cars, regularly scheduled replacement is just part of doing business.
This brings us to the subject of RAID. No question that RAID 5 provides better reliability than having just a bunch of disks (JBOD). Certainly, three copies of data across separate disks, a variation of RAID 1, will provide even more protection, but for a price.
Robin mentions the "Auto-correlation" effect. Disk failures bunch up, so one recent failure might mean another DDM, somewhere in the environment, will probably fail soon also. For it to make a difference, it would (a) have to be a DDM in the same RAID 5 rank, and (b) have to occur during the time the first drive is being rebuilt to a spare volume.
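To put rough numbers on conditions (a) and (b), here is a minimal sketch of the chance that a second DDM in the same rank fails while the first is still rebuilding. It assumes independent drives with a constant failure rate, which is exactly what the auto-correlation effect calls into question, so treat the result as a lower bound. The 3 percent AFR, the 8-drive rank and the rebuild times are hypothetical values, not figures from either paper.

```python
import math

HOURS_PER_YEAR = 8760

def second_failure_probability(afr, rank_size, rebuild_hours):
    """Chance that at least one surviving drive in the rank fails during the rebuild.

    Assumes independent drives with a constant hazard rate (an assumption;
    correlated failures would push the real number higher).
    """
    # Convert the annualized failure rate into a per-hour hazard rate.
    hourly_rate = -math.log(1.0 - afr) / HOURS_PER_YEAR
    survivors = rank_size - 1
    return 1.0 - math.exp(-hourly_rate * survivors * rebuild_hours)

# Hypothetical inputs: 3% AFR, 8 drives per RAID 5 rank.
for hours in (4, 12, 24):
    p = second_failure_probability(afr=0.03, rank_size=8, rebuild_hours=hours)
    print(f"{hours:>2} h rebuild: {p:.4%} chance of a second failure in the rank")
```

The probability grows with both the rank size and the rebuild window, which is why shorter rebuilds and smaller ranks reduce the exposure.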
- The human body replaces skin cells every day
So there are individual DDMs, manufactured by the suppliers above; disk systems, manufactured by IBM and others, and then your entire IT infrastructure. Beyond the disk system, you probably have redundant fabrics, clustered servers and multiple data paths, because eventually hardware fails.
People might realize that the human body replaces skin cells every day. Other cells are replaced frequently, within seven days, and others less frequently, taking a year or so to be replaced. I'm over 40 years old, but most of my cells are less than 9 years old. This is possible because information, data in the form of DNA, is moved from old cells to new cells, keeping the infrastructure (my body) alive.
Our clients should approach this in a more holistic view. You will replace disks in less than 3-5 years. While tape cartridges can retain their data for 20 years, most people change their tape drives every 7-9 years, and so tape data needs to be moved from old to new cartridges. Focus on your information, not individual DDMs.
What does this mean for DDM failures? When one happens, the disk system re-routes requests to a spare disk, rebuilding the data from RAID 5 parity, giving storage admins time to replace the failed unit. During the few hours this process takes, you are either taking a backup, or crossing your fingers. Note: for RAID 5 the time to rebuild is proportional to the number of disks in the rank, so smaller ranks can be rebuilt faster than larger ranks. To make matters worse, the slower RPM speeds and higher capacities of ATA disks mean that the rebuild process could take longer than on smaller capacity, higher speed FC/SCSI disks.
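As a back-of-the-envelope illustration of the capacity and speed effect, the sketch below estimates how long it takes to rewrite one drive's worth of data at a sustained rebuild rate. The capacities and rates are hypothetical, and the model deliberately leaves out the rank-size effect and any competing host I/O.

```python
def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Hours to rewrite one drive's full capacity at a sustained rebuild rate."""
    total_mb = capacity_gb * 1000  # decimal gigabytes, as drive capacities are quoted
    return total_mb / rebuild_mb_per_s / 3600

# Hypothetical drives: a smaller, faster FC drive and a larger, slower ATA drive.
fc_hours = rebuild_hours(capacity_gb=146, rebuild_mb_per_s=50)
ata_hours = rebuild_hours(capacity_gb=500, rebuild_mb_per_s=30)

print(f"FC  146 GB at 50 MB/s: {fc_hours:.1f} hours")
print(f"ATA 500 GB at 30 MB/s: {ata_hours:.1f} hours")
```

Under these made-up numbers the larger, slower drive is exposed for several times as long, which is the window during which a second failure in the rank matters.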
According to the Google study, a large portion of the DDM replacements had no SMART errors to warn that it was going to happen. To protect your infrastructure, you need to make sure you have current backups of all your data. IBM TotalStorage Productivity Center can help identify all the data that is "at risk", those files that have no backup, no copy, and no current backup since the file was most recently changed. A well-run shop keeps their "at risk" files below 3 percent.
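The "at risk" idea can be sketched in a few lines. This is only an illustration of the definition above (no backup at all, or no backup since the last change), not how IBM TotalStorage Productivity Center works internally; the inventory records and the `at_risk_fraction` helper are hypothetical.

```python
from datetime import datetime

def at_risk_fraction(files):
    """files: iterable of (last_modified, last_backup) pairs; last_backup may be None."""
    files = list(files)
    if not files:
        return 0.0
    at_risk = sum(
        1 for modified, backup in files
        # At risk: never backed up, or changed after the most recent backup.
        if backup is None or backup < modified
    )
    return at_risk / len(files)

# Hypothetical inventory of four files.
inventory = [
    (datetime(2007, 3, 1), datetime(2007, 3, 2)),   # backed up after last change
    (datetime(2007, 3, 5), datetime(2007, 3, 2)),   # changed since last backup
    (datetime(2007, 3, 4), None),                   # never backed up
    (datetime(2007, 2, 20), datetime(2007, 3, 2)),  # backed up after last change
]

fraction = at_risk_fraction(inventory)
print(f"at risk: {fraction:.0%} (target: below 3%)")
```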
So, where does that leave us?
- ATA drives are probably as reliable as FC/SCSI disk. Customers should choose which to use based on performance and workload characteristics. FC/SCSI drives are more expensive because they are designed to run at faster speeds, required by some enterprises for some workloads. IBM offers both, and has tools to help estimate which products are the best match to your requirements.
- RAID 5 is just one of the many choices of trade-offs between cost and protection of data. For some data, JBOD might be enough. For other data that is more mission critical, you might choose keeping two or three copies. Data protection is more than just using RAID, you need to also consider point-in-time copies, synchronous or asynchronous disk mirroring, continuous data protection (CDP), and backup to tape media. IBM can help show you how.
- Disk systems, and IT environments in general, are higher-level concepts to transcend the failures of individual components. DDM components will fail. Cache memory will fail. CPUs will fail. Choose a disk systems vendor that combines technologies in unique and innovative ways that take these possibilities into account, designed for no single point of failure, and no single point of repair.
So, Robin, from IBM's perspective, our hands are clean. Thank you for bringing this to our attention and for giving me the opportunity to highlight IBM's superiority at the systems level.
technorati tags: IBM, Seagate, Hitachi, HGST, EMC, NetApp, HP, HDS, Sun, Google, CMU, DDM, Fujitsu, MTBF, MTTF, AFR, ARR, JBOD, RAID, Tivoli, SVC, DR550, CDP, FC, SCSI, disk, tape, SAN,
Tuesday is always good for announcements. Today, Gartner, Inc. announced that IBM has overtaken HP in its climb to the top. I'll quote directly from today's press release:
STAMFORD, Conn., March 6, 2007 -- Worldwide external controller-based (ECB) disk storage revenue totaled $15.2 billion in 2006, a 4.1 percent increase over 2005 revenue of $14.6 billion, according to Gartner, Inc. IBM overtook Hewlett-Packard for the No. 2 position in 2006 (see Table 1). IBM’s worldwide ECB market share increased to 15.8 percent, while HP’s market share dropped to 13.1 percent.
IBM beat HP both in 4Q06 and for the 2006 full year. You can read more about it in the Gartner Dataquest report "Market Share: Disk Array Storage, All Regions, All Countries, 1Q05-4Q06" on their website. (Note: non-IBMers might need an account with Gartner to access this, not sure)
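For those who like to check the arithmetic, this short sketch recomputes the growth rate from the two revenue totals and derives approximate revenue figures from the quoted market shares. The per-vendor revenue numbers are my own derivation from the percentages, not figures taken from the Gartner report.

```python
revenue_2005 = 14.6  # billions of US dollars, from the press release
revenue_2006 = 15.2

# Year-over-year growth of the whole ECB disk market.
growth = (revenue_2006 - revenue_2005) / revenue_2005

ibm_share, hp_share = 0.158, 0.131

# Approximate 2006 revenue implied by each share (derived, not quoted).
ibm_revenue = revenue_2006 * ibm_share
hp_revenue = revenue_2006 * hp_share
gap_points = (ibm_share - hp_share) * 100

print(f"market growth: {growth:.1%}")
print(f"IBM: about ${ibm_revenue:.2f} billion, HP: about ${hp_revenue:.2f} billion")
print(f"gap: {gap_points:.1f} share points")
```

The growth works out to the quoted 4.1 percent, and the 2.7-point share gap corresponds to roughly $0.4 billion in revenue.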
The focus was on external controller-based disk, not external controller-less SCSI/SAS disk, not disk arrays posing as virtual tape libraries, nor any disk sold inside HP, Sun, IBM or Dell servers. This is to compare with disk-only vendors such as EMC and HDS. The revenues reflect hardware only, including hardware-related parts of financial leases and managed services. Revenues from optional priced software features such as multi-pathing drivers, management software, or advanced copy services were excluded. I discussed these types of analyst reports in a blog post last September: Space Race Heats Up.
These market share numbers are based on revenues, not units or terabytes. When a box gets sold, the revenue is counted toward the vendor that sold it, not the manufacturer that built it. In this last report:
- When Dell sells an EMC box, it gets counted as Dell. When Fujitsu Siemens sells an EMC box, it gets counted as "Other".
- When HP sells an HDS box, it gets counted as HP. When Sun sells the HDS box, it gets counted as Sun.
- When IBM sells its System Storage N series (from the OEM agreement with NetApp), it gets counted as IBM. Both IBM and NetApp experienced growth in the NAS/unified storage arena.
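The counting rule in the list above can be sketched as a small tally: revenue goes to the seller, and sellers that are not reported by name roll up into "Other". The set of named vendors and the revenue amounts here are hypothetical placeholders, not data from the report.

```python
from collections import defaultdict

# Hypothetical set of vendors reported by name; everyone else becomes "Other".
NAMED_VENDORS = {"IBM", "HP", "EMC", "Dell", "Sun", "HDS", "NetApp"}

def tally_revenue(sales):
    """sales: iterable of (seller, manufacturer, revenue) tuples."""
    totals = defaultdict(float)
    for seller, _manufacturer, revenue in sales:
        # The manufacturer is ignored: credit goes to whoever sold the box.
        bucket = seller if seller in NAMED_VENDORS else "Other"
        totals[bucket] += revenue
    return dict(totals)

# Made-up revenue amounts, one sale per example in the list above.
sales = [
    ("Dell", "EMC", 10.0),
    ("Fujitsu Siemens", "EMC", 5.0),
    ("HP", "HDS", 8.0),
    ("Sun", "HDS", 4.0),
    ("IBM", "NetApp", 6.0),
]

totals = tally_revenue(sales)
for vendor, revenue in sorted(totals.items()):
    print(f"{vendor}: {revenue}")
```

Note that EMC, HDS and NetApp get no credit in this sample even though they built every box, because none of the sales were made under their own name.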
It's still cold here in the Washington DC area, but at least good news like this helps warm me up!
technorati tags: IBM, disk, external controller-based, ECB, Gartner, 4Q06, 2006, revenue, marketshare, HP, EMC, Sun, Dell, NetApp, HDS, NAS
For those interested in performance, my IBM colleague Elisabeth Stahl has started up her own blog on the subject, called Benchmarking and Systems Performance. Check it out!
technorati tags: IBM, systems, performance, TPC-C, benchmarking, blog, Elisabeth Stahl
The blogosphere has quieted down a bit over the two papers on MTBF estimates for Disk Drive Modules (DDM). One article on SearchStorage.com by Arun Taneja asks: Is RAID passé?
Disk capacity is growing at a faster rate than DDM reliability. During the hours to rebuild a DDM, companies are at risk of additional failures that could require recovery from a copy, or result in data loss, depending on how well your Business Continuity (BC) plan is written and followed.
I'll discuss two comments in particular.
Joerg Hallbauer felt I did not address all the issues raised:
... The problem with that is that it's the DISK ARRAY that determines when a drive has failed an starts the rebuild process. That IS under the control of IBM, specifically the controller. But more importantly, it effects my risk of data loss.
As I see it, my risk of data loss with RAID-5 is influenced by two main factors. 1 - The drive replacement rate and 2 - The rebuild time (which to a great extent is a function of the drive size) both of which IBM has some control over.
So, I think that the question in my mind is, what's the tipping point? Where does the risk of using RAID-5 protection exceed what I'm willing to accept, and I need to move to some other protection mechanism like RAID-6? Is it when the rebuild times exceed 12 hours? 24 hours? 48 hours?
Also, I wonder why IBM isn't publishing some information to help me make these kinds of decisions?
Bill Todd felt I was not technical enough:
Oh, dear - while Tony doesn’t seem to be parrying vigorously (as Seagate, Hitachi, and Chunk were doing), his contribution sounds more like IBM marketing than the kind of detailed, technical response one might have hoped for
... well, he *is* a manager, and a marketing one at that, so perhaps we shouldn’t expect more).
Both are fair comments. Disk arrays do run microcode to assist or perform the RAID function, detect failures and start the rebuild process, and so clever designs to support spare disks, process the rebuild quickly, and so on, can differentiate one vendor's offering from another.
On the issue of what IBM provides to help its clients make the right decisions for their environments, Jon William Toigo at DrunkenData points his readers to IBM's Business Continuity Self-Assessment tool. In normal data center conditions, DDMs will fail, and a Business Continuity plan should be written and developed to handle this fact. Using 2-site and 3-site mirroring, complemented with versions of tape backups, can help address some of these concerns and mitigate some of the risks involved with using disk systems.
For those who want a more technical answer, IBM has just published a series of IBM Redbooks.
Each client's situation is different, so no simple answer is possible. However, IBM does have a lot of experience in this area, and would be glad to help you write or update your existing Business Continuity plan.
technorati tags: IBM, disk, MTBF, estimates, papers, Arun Taneja, Jon William Toigo, StorageMojo, Business Continuity, Redbooks
Yesterday, most of the USA moved its clocks forward an hour. Arizona and Hawaii don't bother, as there is plenty of daylight in both states. While it may seem that Arizonans are not "affected" by Daylight Saving Time (DST), we are, because we have to deal with the time zone offsets with those we talk to in other states. (Note: it is SAVING not SAVINGS; many people mistakenly say "Daylight Savings Time", which is incorrect).
Year round, Arizona is on Mountain Standard Time (MST), which is GMT-7. What time it is in Arizona can be remembered with a simple mnemonic:
- In the winter time, Utah, Colorado, New Mexico, and Arizona are all on MST, so the best American ski resorts are all on the same time zone. People who hop from one ski resort to another by helicopter don't have to reset their watches as they move into or out of Arizona.
- In the summer time, Arizonans head to San Diego, Los Angeles or other parts of California, where it is not so hot. California is on PDT, which is the same as MST. People who hop from Arizona wineries and vineyards to those in California and Oregon can easily cross the Arizona-California border without having to reset their watches.
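The mnemonic can be checked with fixed GMT offsets. This is a minimal sketch using hard-coded offsets rather than a time zone database; the region table and the `same_clock` helper are illustrative names of my own, and only three regions are included.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets from GMT, in hours.
OFFSETS = {"MST": -7, "MDT": -6, "PST": -8, "PDT": -7}

# Which zone each region observes in each season (Arizona never switches).
REGION_ZONE = {
    "Arizona": {"winter": "MST", "summer": "MST"},
    "Colorado": {"winter": "MST", "summer": "MDT"},
    "California": {"winter": "PST", "summer": "PDT"},
}

def local_time(utc_moment, region, season):
    """Convert a GMT/UTC datetime to the region's local clock for that season."""
    zone = REGION_ZONE[region][season]
    return utc_moment.astimezone(timezone(timedelta(hours=OFFSETS[zone]), zone))

def same_clock(region_a, region_b, season):
    """True when both regions show the same wall-clock time in that season."""
    return OFFSETS[REGION_ZONE[region_a][season]] == OFFSETS[REGION_ZONE[region_b][season]]

print("Winter, Arizona vs Colorado:  ", same_clock("Arizona", "Colorado", "winter"))
print("Summer, Arizona vs California:", same_clock("Arizona", "California", "summer"))
print("Summer, Arizona vs Colorado:  ", same_clock("Arizona", "Colorado", "summer"))
```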
Those in Second Life may have noticed that "Second Life time" (SL time) shifted from PST to PDT. That is because their servers reside in San Francisco, California.
technorati tags: IBM, Daylight Saving Time, DST, Second Life, Arizona, Hawaii, PDT, MST, GMT
On the news today, they mentioned it was "Happy Pi Day". Today is the 14th day of the 3rd month, and "pi" is about 3.14159, the ratio of the circumference of a circle to its diameter. So, in Tucson it is celebrated on 3/14, at 1:59pm MST.
The ratio has a lot to do with storage.
Tape is wrapped around a hub. Tape is thin, but not infinitely thin, so wrapping hundreds of meters of tape results in a change in diameter of the spool. This impacts the rotational velocity needed to keep the linear meters-per-second of the tape media consistent as the diameter changes when you wind down from a full spool toward the hub. IBM has variable speed motors and other clever technologies to handle this adjustment.
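Here is where pi comes in. As a small sketch, the reel speed needed for a constant linear tape speed is the linear speed divided by the circumference (2 x pi x radius) at the current winding radius. The 5 meters-per-second tape speed and the radii below are hypothetical values, not specifications of any IBM tape drive.

```python
import math

def reel_rpm(linear_m_per_s, radius_m):
    """RPM needed so tape at the given winding radius moves at the target linear speed."""
    circumference = 2 * math.pi * radius_m
    return linear_m_per_s / circumference * 60

# Hypothetical radii, from a nearly empty reel (near the hub) to a full one.
for radius_mm in (20, 35, 50):
    rpm = reel_rpm(linear_m_per_s=5.0, radius_m=radius_mm / 1000)
    print(f"radius {radius_mm} mm: {rpm:,.0f} RPM")
```

The closer the tape gets to the hub, the faster the reel has to spin to keep the tape moving past the head at the same speed.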
Disks spin at consistent speeds, but tracks on the outside edge travel faster across the head than the inside tracks. Currently, the top speed for disks is 15,000 revolutions per minute (RPM). As faster rotational speeds are investigated, researchers find they need to make the diameters smaller to compensate.
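For disks the relationship runs the other way: the RPM is fixed, so the speed of the track under the head is 2 x pi x radius x revolutions per second. The sketch below uses hypothetical inner and outer track radii, since actual platter geometry varies by drive.

```python
import math

def track_speed_m_per_s(rpm, radius_m):
    """Linear speed of a track passing under the head at a fixed rotational speed."""
    return 2 * math.pi * radius_m * rpm / 60

RPM = 15000
inner_radius_m = 0.015  # hypothetical innermost track
outer_radius_m = 0.040  # hypothetical outermost track

inner = track_speed_m_per_s(RPM, inner_radius_m)
outer = track_speed_m_per_s(RPM, outer_radius_m)

print(f"inner track: {inner:.1f} m/s")
print(f"outer track: {outer:.1f} m/s")
print(f"outer/inner: {outer / inner:.2f}x")
```

Because the edge speed scales with radius times RPM, shrinking the platter is one way to raise the RPM without raising the speed at the outer edge.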
The diameters of disks were based on "U", the unit height of standard 19" racks. A "U" is 1.75 inches, and standard floppy diskettes were 5.25 inch (3U) and 3.5 inch (2U). For those who have a difficult time remembering how many inches a "U" is, it is roughly the height of a standard two-by-four (2x4) piece of lumber.
The value of "pi" has been calculated to over a billion significant digits. Here is a cute applet to use if you ever need the value to any level of accuracy.
technorati tags: IBM, disk, tape, pi, Pi Day, U, RPM
Robert Von Oech on CreativeThink remembers Ernest Gallo, who died last week at 97 years old.
"Do you know what I do?" Mr. Mondavi recalls Mr. Gallo asked him when they first met.
"Yes, you run the largest winery in the country," recalls Mr. Mondavi, then in his mid-20s.
"No," Ernest corrected him. "I go out and visit customers in stores."
Robert Smith (aka Radio Voom) reports on National Public Radio that Second Life is now being used for campaigning for political candidates. It used to be that political candidates took trains and buses across the country, meeting people, discussing their issues, and getting a feel for what is going on in the hearts and minds of their potential voters. With the development of TV and Radio, candidates traveled less, hoping to get their word out to people who would listen to them. Using Second Life and other social networking tools brings candidates back to having conversations with the people they hope to represent.
Of course, many of these candidates are old, and are learning internet social networking skills for the first time. John McCain, my senator from Arizona, is running for President at 70 years old! It's true that old dogs CAN learn new tricks.
IBM is investing heavily into Second Life, as are many other forward-thinking companies, to explore the age-old human need for connectedness, community and dialog. I've asked my team to all get their avatars up and running in Second Life. Granted there is a bit of a learning curve, but everybody handles change in different ways, some better than others.
John Windsor on YouBlog, Marina Krakovsky in Stanford Magazine, and Guy Kawasaki all discuss the "Effort Effect" and Carol Dweck's latest book "Mindset: The New Psychology of Success". I haven't read the book yet myself, but the reviews are interesting. The IT industry is evolving fast, and embracing new technologies, new concepts, and new ideas is necessary for success.
Seth Godin takes this one step further, arguing there are two kinds of people in this world: Thrill Seekers and Fear Avoiders. Forbes just published its latest list of billionaires. The front quote on Forbes' website says it all...
"Knowledge is the antidote to fear."
-Ralph Waldo Emerson
Why are most of these guys (and girls) with over a billion US dollars in net worth still working? Perhaps because they embrace new ideas, and are on the thrill seeking side of humanity. I guess I am too. I'll be thrill-seeking in Chicago this weekend, celebrating St. Patrick's day.
technorati tags: Robert Von Oech, CreativeThink, Ernest Gallo, Mondavi, Robert Smith, National Public Radio, NPR, John+McCain, Arizona, IBM, Secondlife, John Windsor, YouBlog, Marina Krakovsky, Stanford, Guy Kawasaki, Effort Effect, mindset, success, Seth Godin, thrill seekers, fear avoiders, Forbes, billionaires, working, Chicago, Wicked, St Patricks Day
Last Friday, the "Greater IBM Connection" team held a "red carpet" event, showcasing the winners of the Second Life "machinima". It is best explained on the Linden Lab website:
Machinima is the art of making real movies in virtual worlds.
Movies made in Second Life use the world's building, scripting, and avatar customization tools, working in real-time collaboration with people around the globe. You can use Second Life as your own virtual back lot, soundstage, choreography studio, costume and prop repository, and special effects house.
The seven videos were shown in Second Life, and are now available on YouTube for those who missed them.
technorati tags: IBM, Greater IBM Connection, Second Life, machinima, red carpet
The movie industry is slowly making the conversion to digital.
For about 25 years, movies were silent, actors acted, text was shown on the screen, and an organ or piano player added the musical score. My mother was a concert pianist, so I grew up listening to all kinds of piano music. Last weekend, while I was in Chicago for St. Patrick's Day, we watched and listened to the dueling pianos at a bar called "Howl at the Moon". Those not familiar with this art form can watch this 1-minute video of Star Wars re-imagined as a Silent Movie.
About 80 years ago, "talkies" appeared. The sound was converted to a series of colors that were recorded as a separate strip on the film media itself, hence the name "soundtrack". When the movie ran, the colors would then be converted back to voice and music. While the live piano players were out of jobs, the move to sound created a whole new industry for foley artists, orchestras and composers. InformationWeek's Mitch Wagner explains in Something Will Be Lost that great artists like Charlie Chaplin and Mary Pickford never completely made the transition to talkies.
Now the movie industry is changing again, this time from film to digital format. Thanks to digital, we can now see videos on the internet, such as this Impressive Palindromes parody of a Bob Dylan song.
While movies are digital when you rent them from the DVD store, download them on iTunes, or play them on YouTube, they are still mostly in analog format on 35mm or 70mm film stock when you see them on the big screen.
My first "digital projection" experience was the movie "Ice Age" shown in Denver, Colorado. The theatre owner came out to show us what film stock looks like, and then how small the DVD was that held the digital version. The theatre also showed previews of other movies first on film, then in digital, so that we could see the difference in quality. My second experience was "Star Wars: Attack of the Clones (episode II)", which I saw opening night at the Ziegfeld theatre in New York City. This was a huge theatre, and we had front row seats in the upper balcony.
Of course, the transition of film stock to digital projection is just one of the many trends resulting in the fast growth of computer IT storage. Documents transitioned from paper, to being scanned into digital format, to being created digitally using word processing software. Likewise, photographs went from film, to being scanned, to being captured with digital cameras.
As with talkies, history repeats itself; the transition to digital projection is not going smoothly. NPR's Laura Sydell reports that Digital Projection in Theaters Slowed by Dispute. The dispute is between movie production companies and theatre owners. Currently, it is quite expensive to send out film stock to all the theatres, so the transition to digital will save the movie production companies lots of money. On the other hand, installing digital projection equipment will be costly for theatre owners. How the two groups will share the burdensome costs to convert this infrastructure is still under negotiation.
As a fan of going to the movies, I hope they resolve this dispute soon.
technorati tags: IBM, silent movie, Chicago, Star Wars, piano, talkies, foley artists, Charlie Chaplin, Mary Pickford, DVD, iTunes, NPR, digital projection, theatre, Mitch Wagner, Laura Sydell
The amount of information stored and available today is astounding. Consider the following:
...a weekday edition of The New York Times contains more information than the average person was likely to come across in a lifetime in seventeenth-century England.
Richard Wurman, Information Anxiety. 1989.
Shawn Callahan mentions this in his great presentation on how work really gets done.
Mark Nelson covers this in more detail in We Have the Information You Want, But Getting It Will Cost You: Being Held Hostage by Information Overload.
To help address this challenge of organizing and finding the right information at the right time, Web 2.0 technologies have emerged. You can read the 16-page paper What Is Web 2.0? -- Design Patterns and Business Models for the Next Generation by O'Reilly.
Or better yet, watch the quick 4-minute video Web 2.0 ... The Machine is Us/Ing Us.
technorati tags: Richard Wurman, Information Anxiety, Shawn Callahan, Mark Nelson, Web 2.0