In his blog post on preparation, Seth Godin mentioned an appropriate Swedish saying:
There is no bad weather, just bad clothing.
Appropriate because it snowed here in Tucson, Arizona on Sunday evening, leaving many of us here figuring out how to drive through the stuff on Monday. In my entire lifetime, I have only witnessed snow down in the Tucson valley a handful of times. It got me thinking about coats, and the wonderful schemes for coat check rooms, as an analogy for data access. A lot of people ask me to compare and contrast one technology with another, say block-level virtualization with content-addressable storage, and so on, and I always try to find a good analogy to help explain things.
Let's start with the setting. It is snowing outside and people are wearing coats. When they come inside, they check their coats at a coat check room, a large room with rows and rows of racks with hangers. A coat check attendant takes your coat and puts it on a hanger, and gives you a ticket or other identifier that will allow you to retrieve your coat later. The ticket must have sufficient information to retrieve the coat quickly, rather than searching rows and rows of hangers for it.
- Block-based disk storage
You walk to the coat-check desk, tell the attendant to hang your coat on a specific hanger, say hanger number 387. When you come back, you ask for the coat on hanger 387. The coat-check attendant knows exactly where hanger 387 is, and is able to retrieve it quickly. Most disk systems use this approach, including IBM SAN Volume Controller and DS family of disk systems.
- Name-based disk storage
You walk to the coat-check desk, tell the person the name that you want to call your coat. An empty hanger is located, and a list of coat names, with their associated hanger number, is then kept. Upon return, you ask for your coat by name, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used by the IBM System Storage DR550, N series for NAS storage, and the IBM Healthcare and Life Sciences Grid Medical Archive Solution (GMAS).
- Content-addressable storage (CAS)
You walk to the coat-check desk and hand them your coat. The attendant weighs your coat, checks the brand, the size, the number of buttons and zippers, types it all in, and the computer spits out a "hash code" from 1 to 99999. An empty hanger is found, and the hash code is associated with the hanger number. Upon return, you provide the hash code you were given, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used for some non-erasable, non-rewriteable storage, such as the EMC Centera.
IBM invented hash codes in 1953 as a way to speed up searches. For example, if you want to look up a word in the dictionary, knowing the first letter of the word makes it much quicker, because you can thumb directly to that section. A hash code was intended to give a more even distribution, so that if a million words are stored in a "hash code dictionary" then you would calculate the hash code, then look up only that section of words associated with that specific hash code number.
A problem arises when you generate "hash codes" for storage. It is possible for two different pieces of data to resolve to the same hash code. When an application tries to write a piece of data, and it resolves to a hash code that already exists, that is called a collision. One response is to compare the incoming data to the data that is already stored and confirm they are identical, but that can be time consuming. The other response is to just assume they are identical, and reject the secondary copy, a process often referred to as "de-duplication".
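To make the two responses concrete, here is a minimal Python sketch of a toy coat check that assigns hangers by hash code. Everything in it (the names `coat_hash`, `ToyContentStore`, `find_collision`, and the use of SHA-256 squeezed down to the analogy's 1-to-99999 range) is an assumption for illustration, not how the EMC Centera or any real product works; real systems use far larger hash spaces.

```python
import hashlib

HASH_SPACE = 99999  # the analogy's "1 to 99999" range, deliberately tiny


def coat_hash(data: bytes) -> int:
    """Map content to a hash code from 1 to HASH_SPACE (toy-sized on purpose)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") % HASH_SPACE + 1


class ToyContentStore:
    """Hypothetical content-addressable store keyed by coat_hash()."""

    def __init__(self, verify: bool):
        self.verify = verify   # True: compare contents when a hash code already exists
        self.objects = {}      # hash code -> stored bytes

    def put(self, data: bytes) -> int:
        code = coat_hash(data)
        existing = self.objects.get(code)
        if existing is None:
            self.objects[code] = data
        elif self.verify and existing != data:
            # Response 1: compare, and notice the data is really different.
            raise ValueError(f"collision on hash code {code}")
        # Response 2 (verify=False): assume identical, drop the incoming copy.
        return code

    def get(self, code: int) -> bytes:
        return self.objects[code]


def find_collision():
    """Brute-force two different inputs that share a hash code."""
    seen = {}
    n = 0
    while True:
        data = f"coat-{n}".encode()
        code = coat_hash(data)
        if code in seen:
            return seen[code], data
        seen[code] = data
        n += 1
```

With `verify=False`, storing both values returned by `find_collision()` leaves only the first one retrievable; the second is silently treated as a duplicate. With `verify=True`, the second `put` raises instead, at the cost of comparing the data on every match.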
What's the chance of getting a collision for data that is really different? Let's take for example the famous Birthday paradox. Suppose the coat check room assigned the hanger based on your birthday (month and day). How many coats before you run the risk of having two people turn in coats with the same birthday? After only 23 people, the likelihood is 50%. At 60 people, it goes up to 99%.
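The birthday figures can be checked with a few lines of Python. This is a sketch under the usual simplifying assumptions (365 equally likely birthdays, no leap day); the function name `collision_probability` is mine.

```python
def collision_probability(people: int, slots: int = 365) -> float:
    """Chance that at least two of `people` land on the same one of `slots` hangers."""
    p_all_different = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i hangers already taken.
        p_all_different *= (slots - i) / slots
    return 1.0 - p_all_different


for n in (10, 23, 60):
    print(n, round(collision_probability(n), 3))
```

The same function works for any hash space: pass the number of possible hash codes as `slots` to see how quickly the odds of a collision climb as more objects are stored.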
For this reason, IBM does not offer content-addressable storage. For non-erasable, non-rewriteable storage, the IBM System Storage DR550 requires the application to give each object a name, and that name is then used to store the data, eliminating the possibility that data might accidentally be thrown away.
It's safer that way.
technorati tags: Seth Godin, Swedish, saying, bad, weather, clothing, snow, Tucson, coat, check, room, IBM, block-based, disk, storage, DR550, N series, NAS, healthcare, life sciences, grid, medical, archive, solution, GMAS, content-addressable, CAS, EMC, Centera, hash code, collision, de-duplication, birthday, paradox
Continuing my theme on naming conventions, this time I will talk about finding information.
Industry analysts estimate that information workers spend as much as 30 percent of their working day just looking for information they need, as mentioned in an interview with Jeff Teper, Microsoft. A second study of middle managers found similar results, and is discussed in Information Architecture and DM Review.
Take for example looking for information on a specific person. If you were searching for "Barack Obama" or "Lulit Shiferaw", then perhaps you will find exactly the person you were looking for.
On the other hand, some names are more common. I've met my share of people named John Smith, Jennifer Jones, or Mark Johnson. While PEARSON does not even make the top 100 list of family names here in the US, it is still fairly common.
People looking for me may realize that I am not Head of Australian Economics at ANZ Bank, author of the book "Don't Read This!", winner of the Killam Teaching Prize, or owner of fisheries in Essex. While I do work out at the gym on a regular basis, I look nothing like this famous body builder. There are several others, but I picked just a few to make my point.
At this time, there is only one "Tony Pearson" working at IBM. A while back, there was a second, a woman who spelled her name "Toni" with an "i" at the end. Her job involved deploying storage on AIX platforms, and her job description was similar enough to mine that we would get each other's mail, and sometimes not realize it was meant for the other.
Dan Santow of Word Wise talks about the difficulties of proper names with accent marks. This would make searching worse, but many search tools can handle this, stripping off all accent marks to make the comparisons.
But what if you wanted to leave those accent marks in, for an exact match?
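Both behaviors are easy to sketch with Python's standard `unicodedata` module. The helper names (`strip_accents`, `loose_match`, `exact_match`) are mine, and this is only one plausible way a search tool might do it: the loose comparison strips combining marks after decomposition, while the exact comparison keeps the accents but normalizes first, so that two different encodings of the same accented letter still match.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters, then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_match(a: str, b: str) -> bool:
    """Accent-insensitive, case-insensitive comparison."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


def exact_match(a: str, b: str) -> bool:
    """Accent-sensitive comparison that tolerates different Unicode encodings."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

For example, `loose_match("José", "jose")` is true while `exact_match("José", "Jose")` is false. Letters that have no decomposition, such as "ø", pass through `strip_accents` unchanged, so a real search tool would need extra mapping rules for those.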
technorati tags: Jeff Teper, Accenture, search, names, IBM, Barack Obama, Lulit Shiferaw, Dan Santow, Word Wise, productivity, Essex, bodybuilder, author, fisheries,Microsoft, Australian, Economics, ANZ Bank,Killam Teaching Prize
Stephen Colbert, of The Colbert Report, explains the name changes in recent mergers of the Telecommunications industry. A discussion on "changing names" and how that impacts storage seems like a good way to wrap up the week's theme on naming conventions.
Name changes are sometimes painful, but are often done for a purpose, such as to promote a family. In the US, when a man and woman marry, the woman often changes her family name to match her husband's, and the kids all adopt the father's family name. I say "often" because there are times when the woman keeps her name, or adds to it in a hyphenated way. ABC News reported that a Man Fights to Take Wife's Name in Marriage. KipEsquire, a lawyer, writes about it in his blog A stitch in haste.
The IT industry sometimes changes the names of products that people knew as something else. Other times, it re-uses an existing name, when really the product is or should be different from the original. Last year, I took on the job of helping transition from our brand "TotalStorage" to the "System Storage" product line under the new "IBM Systems" brand. I help decide what stays the same name or what changes, when it should change, and how to announce that change.
On the disk side, IBM renamed Fibre Array Storage Technology, or FAStT, which was pronounced exactly like "fast", to DS4000 series. This was a big improvement, as people couldn't seem to spell it properly, with variations like "FastT". Nor could people pronounce it properly, saying "fast-tee" instead. The advantage of "DS" is that it is both easy to spell, and easy to pronounce. The DS4000 series continues to be "fast", providing excellent performance for its midrange price category.
IBM's Enterprise Storage Server (ESS) line went from model E10, to F20, to 750 and 800. When IBM came out with its replacement, the IBM TotalStorage DS8000, some people asked why it wasn't named the ESS 900, for example. The DS8000 is quite different internally, new hardware design and implementation, but is highly compatible with the ESS line, and shares much of the same functionality from microcode. Last year, it was replaced by the IBM System Storage DS8000 Turbo. Again, newer hardware, so it was easy to justify the new name change from "TotalStorage" to "System Storage".
Renaming a product risks losing its certifications and awards. For example, IBM spent a lot of time and money getting the OS/390 operating system certified as a "UNIX" platform. When it was renamed to z/OS, IBM had to do it all over again. Learning from this experience, IBM decided not to rename the SAN Volume Controller to a new designation like "DS5750", as it enjoys the "number one" spot on both the SPC-1 and SPC-2 performance benchmarks, and is recognized as the leader in the disk storage virtualization marketplace. Renaming this product would mean losing that collateral.
IBM's "other disk systems", the N series, posed another set of challenges. The current DS line already has entry-level (DS3000), midrange (DS4000) and enterprise-class (DS6000 and DS8000) products. The OEM agreement that IBM has with Network Appliance (NetApp) resulted in a new set of entry-level, midrange, and enterprise-class products. But these didn't fit nicely into the DS3000-to-DS8000 continuum. Instead, IBM decided to go with N series, using N3000 for entry-level, N5000 for midrange, and N7000 for enterprise-class. These are different from the numbers used by NetApp for their comparable, but not identical, offerings.
On the tape side, IBM decided to put tape drives in the TS1000 and TS2000 ranges, tape libraries and automation in the TS3000 range, and tape virtualization in the TS7000 range. A lot of tape products already had 3000 numbering that had to change to fit this new scheme. This is why IBM's popular 3592 tape drive was renamed to the TS1120. The replacement for the 3494 Virtual Tape Server was named TS7700 Virtualization Engine.
Obviously, you can't change the names of products that are currently in the field, but what about existing software with minor updates? IBM decided to leave "TotalStorage Productivity Center" under the "TotalStorage" brand until it has a significant version upgrade. Many people say "TPC" as a convenient acronym when referring to this product, but TPC is a registered trademark of the Professional Golfers Association (PGA) to refer to its "Tournament Players Club".
How can anyone confuse "managing storage" with "playing golf"? One activity is full of frustration that takes years or decades to master, involving the need to understand a variety of equipment and techniques to use each properly to accomplish your goals; and the other is an enjoyable activity, immediately productive in front of a single pane of glass managing all of your DAS, SAN and NAS storage, from reporting on your files and databases to managing storage networks and tape libraries.
Enjoy the weekend!
technorati tags: Stephen Colbert, Colbert Report, Telecommunications industry, KipEsquire, IBM, FAStT, DS4000, DS3000, DS8000, OS/390, UNIX, z/OS, SAN Volume Controller, N series, TS1120, TS7700, TotalStorage Productivity Center, TPC, PGA, Golf
This week I am in Japan, so my week's theme will center around travel, speaking at conferences, and Japan itself. I first travelled to Japan in the late 1980s, to visit a college friend who was working for Ford Motor Company, on assignment in Japan as liaison to Mazda Corp.
Back then, the only Japanese phrase I knew was "Wakarimashita" which means "I know" or "I understand". If you only know one phrase in a foreign language, this could possibly be the worst one to know.
My second trip, I was better prepared. I learned three "survival phrases":
sumimasen - "I'm sorry/excuse me"
hanashimasen - "I don't speak"
wakarimasen - "I don't know / I don't understand"
These are great phrases to know individually, but even more powerful strung all together, to emphasize that you will begin speaking English, but at least with good reason (and perhaps a bit of irony.)
I've been to Japan many times since, and have picked up more of the language. When travelling to Japan, or anywhere for that matter, it is important to "pack light". I'll be gone for two weeks, but all I bring is a laptop bag and one carry-on piece of luggage.
I went on a trip to Prague (Czech Republic) with a female co-worker who brought FOUR pieces of luggage. One was just for shoes. Another piece was just for hair styling gel, make-up, face creams and finger nail polish. Today, the rules are different, and the TSA allows only a single quart-size plastic bag containing little jars of 3 ounces or less of liquids or gels. I didn't have any "quart-size" bags, so I used a smaller sandwich-size bag.
What does all this have to do with storage? I've helped many clients move data centers, and this involves moving their servers, their networks, and their storage. Servers and Networks are easy to move, but storage presents some challenges. In many cases, the entire company is shut down, the storage is moved, and then the company is operational again. Needless to say, it is best to do this over a weekend.
I tell clients to "pack light" and figure out what data they really need in the move. What do you really need to operate your business? Bring just that, the rest can arrive later.
This same concept applies to Business Continuity and Disaster Recovery planning. What do you really need after a disaster occurs? Can you run your business for a few weeks on that data, until the rest of the data is restored? If you can't run your entire business on that data, can you run the most important parts of your business?
If you run a bank, perhaps keeping your ATM cash machines running is more important than making out new loans. In Japan, if a bank has any outages that impact their ATM machines, they put out a full page advertisement in the local papers to apologize for the inconvenience.
Business Continuity is one of the nine "Infrastructure Solutions" that IBM can help clients with. If you are interested in learning more on how IBM can help you with your Business Continuity, click here.
technorati tags: IBM, Japan, Prague, TSA, Business Continuity, Disaster Recovery, ATM, Infrastructure Solutions, travel, Japanese, language, survival, phrases,
Modified by TonyPearson
Continuing my week's theme on travel, conferences, and Japan, I saw two items in the news that seem to follow a common theme.
- According to "The Daily Yomiuri", a local Japanese paper, "double happy weddings" are becoming more and more popular in Japan. These would be called "shotgun" weddings in the US, but in Japan, couples pay extra to have a wedding between the fifth and seventh month of pregnancy. As Dave Barry would say, I am not making this up. 27% of couples in Japan got married during or after pregnancy. The logic is that they can celebrate both events with one ceremony. Many couples believe that the primary purpose of marriage is to have children, and some that fail to have children suffer terrible anguish or divorce. Waiting until being pregnant helps ensure the couple will be "successful" in this regard.
- IBM acquires Softek, a software company that develops a product called Transparent Data Migration Facility (TDMF) to move mainframe data from one disk system to another, while applications are running. This can be used, for example, to move data from outdated disk systems to IBM disk systems. This is not to be confused with IBM's archive and retention software partner, Princeton Softech.
Softek is the software spin-off of Fujitsu (a Japanese computer hardware manufacturer). For a while, Fujitsu made IBM-compatible mainframe servers, but was not successful at developing its own system software, relying heavily on IBM for this. Unable to compete against IBM, it stopped making mainframe servers, but continues making other kinds of hardware equipment.
With TDMF, the process of moving data is simple. The software runs on z/OS and intercepts all writes intended for source volumes on the old array, and re-directs a copy to destination volumes on the new device. Systems can run with old and new equipment side by side for a few weeks, with the new device staying in-sync with the old. When the client is ready to cross over, the systems are pointed to the new disk, and the old disk systems are detached and removed from the sysplex.
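The general idea of mirroring writes while a background copy catches up can be sketched in a few lines of Python. This is a toy model only, with volumes represented as dictionaries of block number to data; the class `MigratingVolume` and its methods are hypothetical and say nothing about how TDMF itself is implemented.

```python
class MigratingVolume:
    """Toy model of a live migration: each volume is a dict of block -> data."""

    def __init__(self, source: dict):
        self.source = source    # volume on the old array
        self.target = {}        # volume on the new device
        self.cut_over = False

    def copy_existing(self) -> None:
        """Background copy of data that was already on the old array."""
        for block, data in self.source.items():
            self.target[block] = data

    def write(self, block: int, data: bytes) -> None:
        """Intercepted write: applied to the old volume and mirrored to the new."""
        if not self.cut_over:
            self.source[block] = data
        self.target[block] = data

    def read(self, block: int) -> bytes:
        active = self.target if self.cut_over else self.source
        return active[block]

    def in_sync(self) -> bool:
        return self.source == self.target

    def switch_to_target(self) -> None:
        """Cross over only once the new device holds everything the old one does."""
        if not self.in_sync():
            raise RuntimeError("new device is not yet in sync with the old")
        self.cut_over = True
```

Applications keep calling `read` and `write` throughout; the only moment that matters to them is `switch_to_target`, after which the old volume can be detached.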
Afraid that installing TDMF will mess with your applications? IBM Global Technology Services (GTS) is able to roll in a separate mainframe, move the data, then disconnect it along with the old storage.
(For customers running Linux, UNIX or Windows on other platforms, IBM offers SAN Volume Controller (SVC). While SVC is not marketed as a "data migration device", per se, it does have this capability. Many clients were able to cost-justify purchase of an SVC to move data from old storage to new in similar fashion to how TDMF works on the mainframe.)
What do these stories have to do with one another, other than both relating to Japan? IBM has been using TDMF for years as part of a service offering to move data from one disk system to another. Since Sam Palmisano took over in 2002, IBM has acquired 51 companies, 31 of them software companies. Often, these have been "successful", turning profitable quickly, because IBM was already well familiar with the companies it acquired, in much the same way that husbands are well familiar with their brides-to-be at a "double happy wedding".
So, welcome Softek! It looks like it's time to celebrate again!
technorati tags: IBM, Japan, Daily Yomiuri, double happy, wedding, shotgun, Dave Barry, Softek, TDMF, z/OS, Fujitsu
Well, I have left Japan, and while everyone else is enjoying the Super Bowl, I am now in Australia, at another conference. Today I had the pleasure to hear filmmakers talk about their successes, and how IBM helps the movie industry.
- Khoa Do
At one extreme was Khoa Do, independent filmmaker. After acting in movies alongside Michael Caine and Billy Zane, he decided to become his own director. He started a project to help seven disadvantaged youths from a poor drug-ridden section of Sydney, by having them act in his first full-length film. Armed with only an IBM laptop and a small budget, he made the film called "The Finished People", which earned critical acclaim.
The film was a success, and many of the disadvantaged youths have gone on to act in other movies. In 2005, Khoa Do was named "Young Australian of the Year".
Thanks to IBM technology, filmmaking is now accessible to a wider number of aspiring wanna-be directors. It is no longer necessary to be part of a large film studio with a multi-million dollar budget to tell your story.
- Xavier Desdoigts
At the other extreme was Xavier Desdoigts, director of technical operations at Animal Logic, the Computer Graphics (CG) arthouse that produced special effects for movies like "The Matrix", "House of Flying Daggers" and "World Trade Center". They started with producing digital effects for TV commercials, like this one for Carlton Draught Beer.
With the support of a large film studio and multi-million dollar budget, Animal Logic now boasts the 86th most powerful "Supercomputer" based on IBM BladeCenter technology, with over 4000 servers connected into a cluster, for making the movie "Happy Feet". The movie took four years to make, with over 500 people, of 27 different nationalities. It was the first CG movie made in Australia, and has been well-received by audiences worldwide.
Mr. Desdoigts gave out some interesting facts and figures about the movie:
- While visually stunning on the big screen, each frame is only 1.4 Megapixel, about the same resolution as most camera phones.
- In one scene, there are 427,086 penguins all appearing on frame.
- Mumble, the lovable lead character, is made up of over 6 million feathers.
- As many as 17 dancers were "motion-captured" to choreograph the tap-dancing and character interaction segments.
- Only one system admin was needed to manage this entire server farm. (IBM Systems Director technology makes this possible)
- The movie consumed 103 TB of disk space, backed up to 595 LTO tape cartridges.
- An estimated 17 million CPU-hours were needed for all the processing and rendering.
Rather than talking about technology for technology's sake, these filmmakers showed how technology could be put to use, in a practical sense, to provide the world something of value.
technorati tags: IBM, filmmaker, film industry, Khoa Do, Michael Caine, Billy Zane, Xavier Desdoigts, Animal Logic, Happy Feet, Systems Director, LTO, BladeCenter
In Storage Technology News, Marc Staimer makes his Seven network storage predictions for 2007. Let's take a closer look at each one.
- Federal Rules for Civil Procedures (FRCP) will increase adoption of unstructured data classification, email archive systems and CAS.
CAS continues to flounder, but the rest I can agree with. Regulations are being adopted worldwide. Japan has its own Sarbanes-Oxley (SOX) style legislation going into effect in 2008. IBM TotalStorage Productivity Center for Data is a great tool to help classify unstructured file systems. IBM CommonStore for email supports both Microsoft Exchange and Lotus Domino, and can be connected to IBM System Storage DR550 for compliance storage.
- Unified storage systems (combined file and block storage target systems) will become increasingly attractive in 2007, because of their ease of use and simplicity.
I agree with this one also. Our sales of IBM N series in 2006 were great, and we look to continue that strong growth in 2007. The IBM N series brings together FCP, iSCSI and NAS protocols into one disk system. With the SnapLock(tm) feature, N series can store both re-writable data, as well as non-erasable, non-rewriteable data, on the same box. Combine the N series gateway on the front-end with SAN Volume Controller on the back-end, and you have an even more powerful combination.
- Distributed ROBO backup to disk will emerge as the fastest growing data protection solution in 2007.
IDC had a similar prediction for 2006. ROBO refers to "Remote Office/Branch Office", and so ROBO backup deals with how to back up data that is out in the various remote locations. Do you back it up locally, or send it to a central location? Fortunately, IBM Tivoli Storage Manager (TSM) supports both ways, and IBM has introduced small disk and tape drives and auto-loaders that can be used in smaller environments like this. I don't know whether "backup to disk" will be the fastest growing, but I certainly agree that a variety of ROBO-related issues will be of interest this year.
- 2007 will be remembered as the year iSCSI SAN took off because of the much reduced pricing for 10 Gbit iSCSI and the continued deployment of 10 Gbit iSCSI targets.
While I agree that iSCSI is important, I can't say 2007 will be remembered for anything. We have terrible memory for these things. Ask someone what year Personal Computers (PC) took off, and they will tell you about Apple's famous 1984 commercial. Ask someone when the Internet took off, cell phones took off, etc, and I suspect most will provide widely different answers, most likely based on their own experience.
For the longest time, I resisted getting a cell phone. I had a roll of quarters in my car, and when I needed to make a call, I stopped at the nearby pay-phone, and made the call. In 1998, pay phones disappeared. You can't find them anymore. That was the year cell phones took off, at least for me.
Back to iSCSI, now that you can intermix iSCSI and SAN on the same infrastructure, either through intelligent multi-protocol switches available from your local IBM rep, or through an N series gateway, you can bring iSCSI technology in slowly and gradually. Low-cost copper wiring for 10 Gbps Ethernet makes all this very practical.
Another up-and-coming technology is AoE, or ATA-over-Ethernet. Same idea as iSCSI, but taken down to the ATA level.
- CDP will emerge as an important feature on comprehensive data protection products instead of a separate managed product.
Here, CDP stands for Continuous Data Protection. Normal backups work like a point-and-shoot camera, taking a picture of the data once every midnight, for example. CDP can record all the little changes like a video camera, with the option to rewind or fast-forward to a specific point in the day. IBM Tivoli CDP for Files, for example, is an excellent complement to IBM Tivoli Storage Manager.
The technology is not really new, as it has been implemented as "logs" or "journals" on databases like DB2 and Oracle, as well as business applications like SAP R/3.
The prediction here, however, relates to packaging. Will vendors "package" CDP into existing backup products, possibly as a separately priced feature, or will they leave it as a separate product that perhaps, like in IBM's case, already is well integrated?
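The camera-versus-video distinction comes down to keeping a journal of every change rather than a single picture. Here is a minimal Python sketch of that idea; the `ChangeJournal` class and its method names are hypothetical and are not the interface of IBM Tivoli CDP for Files or of any database log.

```python
from bisect import bisect_right


class ChangeJournal:
    """Toy continuous-data-protection journal: every write is kept, in time order."""

    def __init__(self):
        self.times = []     # timestamps, non-decreasing
        self.entries = []   # (path, content) recorded at the matching timestamp

    def record(self, timestamp: float, path: str, content: str) -> None:
        if self.times and timestamp < self.times[-1]:
            raise ValueError("journal entries must arrive in time order")
        self.times.append(timestamp)
        self.entries.append((path, content))

    def state_at(self, timestamp: float) -> dict:
        """Rewind or fast-forward: replay every change up to and including timestamp."""
        upto = bisect_right(self.times, timestamp)
        state = {}
        for path, content in self.entries[:upto]:
            state[path] = content
        return state
```

A nightly backup is equivalent to calling `state_at` once, at midnight, and throwing the journal away; anything written and overwritten between two midnights is lost. Keeping the journal lets you ask for the state at 2:15 in the afternoon instead.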
- The VTL market growth will continue at a much reduced rate as backup products provide equivalent features directly to disk. Deduplication will extend the VTL market temporarily in 2007.
VTL here refers to Virtual Tape Library, such as IBM TS7700 or TS7510 Virtualization Engine. IBM introduced the first one in 1997, the IBM 3494 Virtual Tape Server, and we have remained number one in marketshare for virtual tape ever since. I find it amusing that people are now just looking at VTL technology to help with their Disk-to-Disk-to-Tape (D2D2T) efforts, when IBM Tivoli Storage Manager has already had the capability to backup to disk, then move to tape, since 1993.
As for deduplication, if you need the end-target box to deduplicate your backups, then perhaps you should investigate why you are doing this in the first place. People take full-volume backups, and keep too many copies of them, when more sophisticated backup software like Tivoli Storage Manager can implement backup policies to avoid this with a progressive backup scheme. Or maybe you need to investigate why you store multiple copies of the same data on disk; perhaps NAS or a clustered file system like IBM General Parallel File System (GPFS) could provide you a single copy accessible to many servers instead.
The reason you don't see deduplication on the mainframe is that DFSMS for z/OS already allows multiple servers to share a single instance of data, and has been doing so since the early 1980s. I often joke with clients at the Tucson Executive Briefing Center that you can run a business with a million data sets on the mainframe, and that there are probably a million files on just the laptops in the room, but few would attempt to run their business that way.
- Optical storage that looks, feels and acts like NAS and puts archive data online, will make dramatic inroads in 2007.
Marc says he's going out on a limb here, and it's good to make at least one risky prediction. IBM used to have an optical library that emulated disk, called the IBM 3995. Lack of interest and advancement in technology encouraged IBM to withdraw it. A small backlash ensued, so IBM now offers the IBM 3996 for the System p and System i clients that really, really want optical.
As for optical making data available "online", it takes about 20 seconds to load an optical cartridge, so I would consider this more "nearline" than online. Tape is still in the 40-60 second range to load and position to data, so optical is still at an advantage.
Optical eliminates the "hassles of tape"? Tape data is good for 20 years, and optical for 100 years, but nobody keeps drives around that long anyways. In general, our clients change drives every 6-8 years, and migrate the data from old to new. This is only a hassle if you didn't plan for this inevitable movement. IBM Tivoli Storage Manager, IBM System Storage Archive Manager, and the IBM System Storage DR550 all make this migration very simple and easy, and can do it with either optical or tape.
The Blu-ray vs. HD DVD debate will continue through 2007 in the consumer world. I don't see this being a major player in more conservative data centers where a big investment in the wrong choice could be costly, even if the price-per-TB is temporarily in-line with current tape technologies. IBM and others are investing a lot of Research and Development funding to continue the downward price curve for tape, and I'm not sure that optical can keep up that pace.
Well, that's my take. It is a sunny day here in China, and I have more meetings to attend.
technorati tags: IBM, FRCP, SOX, TotalStorage, Productivity Center, Microsoft, Exchange, Lotus, Domino, DR550, SnapLock, unified storage, NAS, iSCSI, FCP, ROBO, Tivoli, Storage Manager, TSM, Ethernet, AoE, CDP, DB2, Oracle, SAP, VTL, TS7700, TS7510, GPFS, DFSMS, Optical, 3995, 3996, Blue-Ray, D2D2T,DVD
I am back from China, and now glad to be back in the old USA. Last week, someone asked me what would it take to add a specific feature to the IBM System Storage DS8300. The what-would-it-take question is well-known among development circles informally as a "sizing" effort, or more formally as "Development Expense" estimate.
For software engineering projects, the process was simply that an architect would estimate the number of "Lines of Code" (LOC) typically represented in thousands of lines of code (KLOC). This single number would convert to another single number, "person-months", which would then translate to another single number "dollars". Once you had KLOC, the rest followed directly from a formula, average or rule-of-thumb.
More amazing is that this single number could then determine a variety of other numbers, the number of total months for the schedule, the number of developers, testers, publication writers and quality assurance team members needed, and so on. Again, these were developed using a formula, developed and based on past experience of similar projects.
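The chain from KLOC to person-months to dollars is just a pair of multiplications once the rules-of-thumb are fixed. The Python sketch below shows the shape of that calculation; the two rates in `SizingRules` are made-up placeholders chosen for round numbers, not figures from IBM, Fred Brooks, or any real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizingRules:
    # Both rates are hypothetical placeholders, not real averages.
    loc_per_person_month: float = 300.0
    dollars_per_person_month: float = 15000.0


def size_project(kloc: float, rules: SizingRules = SizingRules()):
    """Turn one number (KLOC) into person-months, then into dollars."""
    person_months = kloc * 1000 / rules.loc_per_person_month
    dollars = person_months * rules.dollars_per_person_month
    return person_months, dollars


person_months, dollars = size_project(30)
print(f"30 KLOC -> {person_months:.0f} person-months -> ${dollars:,.0f}")
```

The fragility is plain to see: every downstream number inherits whatever error is in the KLOC estimate and in the assumed rates.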
Earlier in my career, I was the lead architect for DFSMS for the z/OS operating system, and later for IBM TotalStorage Productivity Center, performing these sizing efforts. A famous IBM architect, Frederick P. Brooks, wrote a now-classic book that was required reading when I started at IBM, and which was re-released as The Mythical Man-Month: Essays on Software Engineering, 20th Anniversary Edition. In addition to sound advice, he also offered a formula or two that help with these estimating tasks.
Hardware design introduces a different set of challenges. When I was getting my Masters Degree in Electrical Engineering, it took me and four other grad students a full semester just to design a six-layer, 900 transistor silicon chip, which could only perform a single function: multiply two numbers together. At IBM, another book that I was given to read was The Soul of a New Machine, documenting six hardware engineers, and six software engineers, working long hours on a tight schedule to produce a new computer for Data General.
So why do I bring this up now? IBM architects William Goddard and John Lynott are being inducted posthumously this year into the prestigious National Inventors Hall of Fame for their disk system innovation.
Under the leadership of Reynold Johnson, the team developed an air-bearing head to “float” above the disk without crashing into the disk. Imagine a fighter airplane flying full speed across the country-side at 50 feet off the ground. If you every heard the term "my disk crashed", it was originally referring to the read/write head touching the disk surface, causing terrible damage.
A uniformly flat disk surface was created by spinning the coating onto the rapidly rotating disk, leaving many of those wearing lab coats covered with disk liquid at waist level. Developing disk-to-disk and track-to-track access mechanisms proved more challenging, and nearly halted the project. The team, however, was adamant that this problem could be solved, and customers were increasingly asking for random access technology. The result was the "350 Disk Storage Unit" designed for the "305 RAMAC computer", which I talked about a lot last year as part of our "50 years of disk systems innovation" celebration.
Neither Goddard nor Lynott had computing experience prior to joining IBM. Goddard was a former science teacher who briefly worked in aerospace. Lynott had been a mechanic in the Navy and later a mechanical engineer. They didn't have a nice formula based on past experience, they didn't have the benefit of Fred Brooks' advice, or the rules-of-thumb or averages now used to estimate the size of projects. They had to break new ground.
Now that's innovation!
technorati tags: IBM, DS8300, disk, KLOC, sizing, estimate, DFSMS, z/OS, TotalStorage Productivity Center, Frederick Brooks, William Goddard, John Lynott, Mythical Man-Month, Reynold Johnson, RAMAC, 305, 350,
In case you haven't noticed, IBM System Storage makes most of their announcements on Tuesdays. IBM announced a lot today, so here is a quick run-down.
- Cisco storage networking products
IBM continues to resell Cisco switches and directors, but now can offer these with a 1-year IBM warranty.
The entry-level Cisco 9124 offers 8 to 24 ports. For IBM BladeCenter, IBM now offers the Cisco 10-port and 20-port modules that slide into the back of the chassis, and are functionally equivalent to the 9124. The original BladeCenter came with a 16-port module with 14 internal ports but only 2 external ports, which severely hampered bandwidth to external storage. These new modules provide more external ports to relieve that constraint.
The midrange Cisco 9200 switches have two models, both with 16 fixed ports, with the option for a blade that can provide 12, 24 or 48 additional ports. The 9216A has 16 FCP ports, and the 9216i has 14 FCP ports and 2 GbE ports to act as a router, such as to connect to a remote location for business continuity using Metro Mirror or Global Mirror.
The enterprise-class Cisco 9500 directors can support up to 528 ports.
- TS3400 Tape Library
The new TS3400 library is a small, entry-level library supporting the enterprise-class TS1120 drive, providing interoperability with the larger tape libraries, with all the support for tape encryption.
In addition to Linux, Unix, and Windows, the TS1120 can now be connected to System i servers. In the past, the only IBM tape drives available to System i were the LTO models. There are a lot of businesses that need to comply with government regulations that are looking for tape encryption, and now IBM has made it accessible to more clients.
- 300GB drives at 15K RPM
The DS8000 can now support new drives with 300GB capacity at 15,000 RPM (15K). These can be up to 30 percent faster than the 10,000 RPM drives for typical workloads.
IBM continues its market leadership with this new set of features and offerings!
technorati tags: IBM, SAN, Cisco, 9124, BladeCenter, warranty, 9200, 9216i, 9216A, 9500, TS3400, TS1120, LTO, DS8000, disk,tape,15K, RPM
I am still wiping the coffee off my computer screen, inadvertently sprayed when I took a sip while reading HDS' uber-blogger Hu Yoshida's post on storage virtualization and vendor lock-in. This blog post appears to be the text version of their funny video.
While most of the post is accurate and well-stated, two opinions in particular caught my eye. I'll be nice and call them opinions, since these are blogs, and always subject to interpretation. I'll put quotes around them so that people will correctly relate these to Hu, and not me.
"Storage virtualization can only be done in a storage controller. Currently Hitachi is the only vendor to provide this."
-- Hu Yoshida
Hu, I enjoy all of your blog entries, but you should know better. HDS is a fairly recent newcomer to the storage virtualization arena, and since IBM has been doing this for decades, I will bring you and the rest of the readers up to speed. I am not starting a blog-fight; I just want to provide some additional information for clients to consider when making choices in the marketplace.
First, let's clarify the terminology. I will use 'storage' in the broad sense, including anything that can hold 1's and 0's, including memory, spinning disk media, and plastic tape media. These all have different mechanisms and access methods, based on their physical geometry and characteristics. The concept of 'virtualization' is any technology that makes one set of resources look like another set of resources with more preferable characteristics, and this applies to storage as well as servers and networks. Finally, 'storage controller' is any device with the intelligence to talk to a server and handle its read and write requests.
Second, let's take a look at all the different flavors of storage virtualization that IBM has developed over the past 30 years.
IBM introduces the S/370 with the OS/VS1 operating system. "VS" here refers to virtual storage, and in this case internal server memory was swapped out to physical disk. Using a table mapping, disk was made to look like an extension of main memory.
IBM introduces the IBM 3850 Mass Storage System (MSS). Until this time, programs that ran on mainframes had to be acutely aware of the device types being written, as each device type had different block, track and cylinder sizes, so a program written for one device type would have to be modified to work with a different device type. The MSS was able to take four 3350 disks, and a lot of tapes, and make them look like older 3330 disks, since most programs were still written for the 3330 format. The MSS was a way to deliver new 3350 disk to a 3330-oriented ecosystem, and greatly reduce the cost by handling tape on the back end. The table mapping was one virtual 3330 disk (100 MB) to two physical tapes (50 MB each). Back then, all of the mainframe disk systems had separate controllers. The 3850 used a 3831 controller that talked to the servers.
IBM invents Redundant Array of Independent Disks (RAID) technology. The table mapping is one or more virtual "Logical Units" (or "LUNs") to two or more physical disks. Data is striped, mirrored and paritied across the physical drives, making the LUNs look and feel like disks, but with faster performance and higher reliability than the physical drives they were mapped to. RAID could be implemented in the server as software, on top of or embedded into the operating system, in the host bus adapter, or on the controller itself. The vendor that provided the RAID software or HBA did not have to be the same as the vendor that provided the disk, so in a sense, this avoided "vendor lock-in". Today, RAID is almost always done in the external storage controller.
IBM introduces the Personal Computer. One of the features of DOS is the ability to make a "RAM drive". This is technology that runs in the operating system to make internal memory look and feel like an external drive letter. Applications that already knew how to read and write to drive letters could work unmodified with these new RAM drives. This had the advantage that the files would be erased when the system was turned off, so it was perfect for temporary files. Of course, other operating systems today have this feature, UNIX has a /tmp directory in memory, and z/OS uses VIO storage pools.
This is important, as memory would be made to look like disk externally, as "cache", in the 1990s.
IBM AIX v3 introduces Logical Volume Manager (LVM). LVM maps the LUNs from external RAID controllers into virtual disks inside the UNIX server. The mapping can combine the capacity of multiple physical LUNs into a large internal volume. This was all done by software within the server, completely independent of the storage vendor, so again no lock-in.
IBM introduces the Virtual Tape Server (VTS). This was a disk array that emulated a tape library. A mapping of virtual tapes to physical tapes was done to allow full utilization of larger and larger tape cartridges. While many people today mistakenly equate "storage virtualization" with "disk virtualization", in reality it can be implemented on other forms of storage. The disk array was referred to as the "Tape Volume Cache". By using disk, the VTS could mount an empty "scratch" tape instantaneously, since no physical tape had to be mounted for this purpose.
Contradicting its "tape is dead" mantra, EMC later developed its CLARiiON disk library that emulates a virtual tape library (VTL).
IBM introduces the SAN Volume Controller. It involves mapping virtual disks to managed disks that could be from different frames from different vendors. Like other controllers, the SVC has multiple processors and cache memory, with the intelligence to talk to servers, and is similar in functionality to the controller components you might find inside monolithic "controller+disk" configurations like the IBM DS8300, EMC Symmetrix, or HDS TagmaStore USP. SVC can map the virtual disk to physical disk one-for-one in "image mode", as HDS does, or can also map virtual disks across physical managed disks, using a similar mapping table, to provide advantages like performance improvement through striping. You can take any virtual disk out of the SVC system simply by migrating it back to "image mode" and disconnecting the LUN from management. Again, no vendor lock-in.
The HDS USP and NSC can run as regular disk systems without virtualization, or the virtualization can be enabled to allow external disks from other vendors. HDS usually counts all USP and NSC units sold, but never mentions what percentage of these have external disks attached in virtualization mode. Either they don't track this, or they are too embarrassed to publish the number. (My guess: a single-digit percentage.)
Few people remember that IBM also introduced virtualization in both controller+disk and SAN switch form factors. The controller+disk version was called "SAN Integration Server", but people didn't like the "vendor lock-in" having to buy the internal disk from IBM. They preferred having it all external disk, with plenty of vendor choices. This is perhaps why Hitachi now offers a disk-less version of the NSC 55, in an attempt to be more like IBM's SVC.
IBM also had introduced the IBM SVC for Cisco 9000 blade. Our clients didn't want to upgrade their SAN switch networking gear just to get the benefits of disk virtualization. Perhaps this is the same reason EMC has done so poorly with its "Invista" offering.
So, bottom line, storage virtualization can be, and has been, delivered in the operating system software, in the server's host bus adapter, inside SAN switches, and in storage controllers. It can be delivered anywhere in the path between application and physical media. Today, the two major vendors that provide disk virtualization "in the storage controller" are IBM and HDS, and the three major vendors that provide tape virtualization "in the storage controller" are IBM, Sun/STK, and EMC. All of these involve a mapping of logical to physical resources. Hitachi uses a one-for-one mapping, whereas IBM additionally offers more sophisticated mappings as well.
technorati tags: IBM, disk, tape, storage, virtualization, Hu Yoshida, HDS, Hitachi, TagmaStore, USP, NSC, disk-less, SAN Volume Controller, LVM, AIX, RAID, SAN, blade, Sun, STK, Cisco, EMC, Invista,
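The difference between a one-for-one mapping and a striped mapping can be sketched in a few lines of Python. This is only an illustration of the idea of a logical-to-physical mapping; the function names, the round-robin extent layout, and the extent size are my own assumptions, not the actual algorithm used by the SVC or by any HDS product.

```python
def image_mode(virtual_block, managed_disk):
    """One-for-one mapping: virtual block N lives at block N of one disk."""
    return (managed_disk, virtual_block)


def striped(virtual_block, managed_disks, extent_blocks):
    """Spread fixed-size extents round-robin across several managed disks.

    Hypothetical layout for illustration only.
    """
    extent = virtual_block // extent_blocks      # which extent of the virtual disk
    offset = virtual_block % extent_blocks       # position inside that extent
    disk = managed_disks[extent % len(managed_disks)]
    physical_extent = extent // len(managed_disks)
    return (disk, physical_extent * extent_blocks + offset)


disks = ["vendor_a_lun", "vendor_b_lun", "vendor_c_lun"]  # made-up names
for block in (0, 5, 13):
    print(block, image_mode(block, disks[0]), striped(block, disks, 4))
```

In image mode the server sees exactly what is on the one underlying LUN, which is why a virtual disk can be handed back to plain attachment later. In the striped case consecutive extents land on different disks, which is where the performance benefit comes from.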
Today, I went looking for reading-glasses. Unfamiliar with my surroundings, I asked several people where I might be able to find and purchase these, and was sent in various directions. My first stop was a bookstore. It would make sense that since many people need reading glasses to read the books, that they would sell them there, but no. The staff didn't know where I could go, but pointed me in the direction of a mall. At the mall, I found a pharmacy. Many pharmacies sell reading glasses, so I stopped in, but no, not this one. The pharmacists suggested the super-store nearby. I walked in to the super-store, and asked the first employee where they keep their reading glasses, and they said the other corner. The other corner was the electronics department. It made sense that they sold CDs and DVDs in the same section as the equipment that plays them, but reading glasses? Skeptical, I went to the pharmacy department, and the young and beautiful lady (everyone is young, thin and beautiful here) had me follow her, and she led me back to the electronics department, whereupon she pointed to a rack of sunglasses. I indicated that I need reading glasses, not sunglasses. She pulled one out, and it was indeed reading glasses, 1.25, just what I was looking for. Others were tinted, so you can read the newspaper out in the sunlight. The pair I chose cost only $97 in the local currency.
After reading the last sentence, you might be thinking I am describing my "avatar" in Second Life, but no, I am talking about my search for reading glasses on the streets of Mexico. I am here this week in meetings with IBM Business Partners and sales reps to discuss IBM's latest System Storage products and offerings.
We used to tell people they should "clothe" servers with storage. IBM offers both, so yes it makes sense to offer both as part of a complete solution. However, when you look up a dictionary definition of "to clothe", you learn it is to dress, wrap or cover with clothing, an implication that it is external, and perhaps temporary, easily changed, like switching from sunglasses to reading glasses. In Second Life, objects can be "worn", simply by attaching or detaching them to your "avatar". Sometimes clothing serves a purpose, like reading glasses, or provides protection, like raincoats, and other times it is more decorative, like "icing on the cake" or "gold plating".
This concept was fine 50 years ago, when we were in a server-centric world, and dumb storage devices were attached to very intelligent servers. Back then, we used the derogatory term "subsystems" to emphasize that storage was just part of the server, not a system of its own.
Today, we live in an information-centric world. The information outlives the media, and the media outlives the servers that access it. It is not unreasonable to attach dozens or hundreds of servers to a single storage system, or collection of storage systems. Over 20 percent of IBM System Storage DS8000 series, for example, are attached to Windows rack-optimized or blade servers. Imagine a refrigerator surrounded by dozens or hundreds of pizza boxes. Storage is no longer a subsystem, but a system in its own right, dressed, wrapped or covered by servers that deliver the right information, to the right people, at the right time.
So perhaps we should reverse it, telling people they should "clothe" their storage with servers!
technorati tags: IBM, disk, storage, servers, Second Life, clothing, DS8000, Turbo, subsystem, system
Well, I'm back from Mexico.
The flight back was uneventful, except for the leg from Houston to Tucson. The lady in the window seat had "overallocated storage" and required a "distance extension" on her safety belt. To accommodate her, her husband and I flipped up the "logical partitions" between the seats, and "compressed" to take up less space. Luckily, it was only for two hours.
On the flight to Houston, I was asked what kind of drink I wanted, in Spanish, as the crew were all from Mexico. Here's a quick Spanish lesson:
- this stands for drink in general, and can include liquor and soft drinks
- this stands generically for soft drink. They will often use "Coke" to refer to any cola beverage, regardless of brand.
It is interesting that the Spanish language is slightly different in each country. The Mexicans I met with and spoke Spanish to immediately recognized I was from South America, and not from Central America. Likewise, folks in Puerto Rico knew I was from somewhere in South America, and not from Mexico or Central America. In Colombia, Argentina, and even Brazil, my speech is more recognizable as being from Bolivia.
Before IBM got into an OEM agreement with Network Appliance, I used to indicate that EMC and NetApp were the "Coke and Pepsi" of the NAS marketplace. IBM had a presence, but it was in the single digits, whereas these two major players had roughly equal marketshare, just as Coke and Pepsi dominate equally the US marketplace. That analogy doesn't work in other countries, as in some cases the country might be more heavily in favor of one or the other.
On my flight over from Houston to Tucson, however, I was asked what kind of "pop" I wanted. I always say "soda" to refer generically to soft drinks, but realize that others say "pop" instead. Not only can Americans detect what part of the country people are from by accent, but also by the words they use.
Now I see a blog that explores in great detail the issue of Pop vs Soda vs Coke.
So, it looks like I'll need to "retire" my Coke vs. Pepsi analogy, not because their marketshare has changed, but because IBM's partnering with NetApp greatly skews the advantage over EMC.
technorati tags: IBM, Mexico, NAS, OEM, NetApp, EMC, Coke, Pepsi, Bolivia, Pop, Soda
Tonight I had dinner with Henry Daboub (an SVC expert from Houston, TX) and some clients, who asked what I would blog about tonight, and I figured it made sense to blog about the SVC.
Hu Yoshida clarifies his position about storage virtualization, including the statement: "As a result they can not provide the availability, scalability, and performance of a DS8300. If they could, there would be no need for a DS8300."
Of course, if humans descended from apes, why are there still apes? Now that we have cars, why are there still trains? But perhaps a better question is: now that there are supercomputers, why are there still mainframe servers?
The issue is the difference between scale-up versus scale-out. Scale-up is making a single box as big and beefy as possible. When the SVC was introduced, the major vendors all had scale-up designs: IBM ESS 800, HDS Lightning, EMC Symmetrix. Like the mainframe, they were for customers that wanted everything in a single monolithic container.
SAN Volume Controller was the result of IBM Research asking the question, if you could put anyone's software (feature and functionality) on anyone's hardware (monolithic scale-up design), what combination would you choose? What if the brains inside today's monolithic systems could be snapped into another vendor's frame? What if you could run SRDF on an HDS box, or ShadowImage on an IBM box? The surprising response was that most customers would want a single software for consistency, but wanted the option to choose from different vendors' hardware, to negotiate the best price of the commodity iron. Based on this feedback, the SVC was born.
The idea was simple: put all the brains in a separate appliance. The appliance would do the non-disruptive migrations, the caching, the striping, and all the copy services. This then lets the customer choose the hardware they want, any mix of FC and ATA disk, from any vendor.
The SVC design was based on IBM's long history in supercomputers. Using the same "scale-out" technology, the power comes not from having it all in one monolithic box, but rather in a design that combines small nodes together. While the cache is not globally shared, the data is shared between node-pairs, and the logical-to-physical mapping is routed around to all nodes in a cluster. Each SVC node talks to each other SVC node through the FCP ports, eliminating the need for additional wiring. For the most part, each node does its own separate work, but when it needs to, they can communicate across, just like nodes in a supercomputer.
Both the SVC and the DS8300 Turbo have better than 99.999 percent availability, based on redundant components designed for no single point of failure (SPOF). IBM has sold thousands of each, and they have been in the field enough time that we can make that claim. There is nothing between scale-up versus scale-out that makes one inherently more available than the other.
Both the SVC and the DS8300 Turbo can scale from as little as a few TB of disk, to hundreds of TB of disk. We have yet to meet a customer that is too big for the SVC. The DS8300 Turbo is able to scale by adding up to four extension frames, but is still considered a single box from a scale-up perspective. From a processor perspective, an 8-node SVC cluster has 16 Intel Xeon processors, and the DS8300 has 8 POWER5+ processors (dual 4-way). The key advantage of scale-out is that you can add capacity to the SVC in smaller increments. Jumping from a DS8100 (dual 2-way) to a DS8300 (dual 4-way) is a big jump.
SVC remains the fastest disk system in the industry, based on both the SPC-1 and SPC-2 benchmarks. The latest model now supports 8GB per node, for a total of 64GB for an 8-node cluster. This can be used for both read and write non-volatile storage. By comparison, DS8300 Turbo has 32GB write non-volatile storage, and up to 256 GB of read-only cache. The SVC is able to do 155,519 IOPS, faster than the 123,030 IOPS for the DS8300, and of course faster than anything from EMC, HDS, HP or Texas Memory Systems. Of course, workloads vary, and there might be some workloads where the 256GB of read-only cache of the monolithic DS8300 is the better choice.
- copy services
Both SVC and DS8300 Turbo offer FlashCopy (point-in-time copy), Metro Mirror (synchronous) and Global Mirror (asynchronous). SVC provides the additional benefit that it can perform a FlashCopy from one frame to another, and the ability to migrate data seamlessly from one box to another.
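For readers who like to check the arithmetic behind the availability and performance figures quoted above, here is a small Python sketch. It uses only the numbers stated in this post (99.999 percent, 155,519 and 123,030 IOPS, 8 nodes at 8GB each); the 365.25-day year is my own assumption.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average 365.25-day year


def downtime_minutes_per_year(availability_percent):
    """Maximum downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = downtime_minutes_per_year(99.999)

# IOPS figures as quoted in the post.
svc_iops = 155_519
ds8300_iops = 123_030
iops_advantage_percent = (svc_iops / ds8300_iops - 1) * 100

# 8 nodes with 8GB of cache each.
svc_cluster_cache_gb = 8 * 8

print(f"five nines allows about {five_nines:.2f} minutes of downtime per year")
print(f"SVC result is about {iops_advantage_percent:.1f} percent higher")
print(f"8-node SVC cluster cache: {svc_cluster_cache_gb} GB")
```

So "five nines" works out to a little over five minutes of downtime per year, and the quoted SVC result is roughly a quarter higher than the DS8300 figure.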
Interestingly, IBM has seen a resurgence in both mainframe sales, as well as interest in supercomputers. Both have their place, based on the workload characteristics, and so IBM will continue to offer both modular scale-out designs, as well as monolithic scale-up designs, to meet the different needs of the marketplace.
technorati tags: IBM, disk, SAN, Volume, Controller, DS8300, Turbo, Hu Yoshida, FlashCopy, Metro Mirror, Global Mirror, SPC, benchmarks, HDS, HP, EMC, mainframe, supercomputer
Well, this week I am in Maryland, just outside of Washington DC. It's a bit cold here.
Robin Harris over at StorageMojo put out this Open Letter to Seagate, Hitachi GST, EMC, HP, NetApp, IBM and Sun about the results of two academic papers, one from Google, and another from Carnegie Mellon University (CMU). The papers imply that the disk drive module (DDM) manufacturers have perhaps misrepresented their reliability estimates, and the letter asks major vendors to respond. So far, NetApp and EMC have responded.
I will not bother to repeat what others have said already, but make just a few points. Robin, you are free to consider this "my" official response if you like to post it on your blog, or point to mine, whatever is easier for you. Given that IBM no longer manufactures the DDMs we use inside our disk systems, there may not be any reason for a more formal response.
- Coke and Pepsi buy sugar, Nutrasweet and Splenda from the same sources
Somehow, this doesn't surprise anyone. Coke and Pepsi don't own their own sugar cane fields, and even their bottlers are separate companies. Their job is to assemble the components using super-secret recipes to make something that tastes good.
IBM, EMC and NetApp don't make the DDMs that are mentioned in either academic study. Different IBM storage systems use one or more of the following DDM suppliers:
- Seagate (including Maxtor, which they acquired)
- Hitachi Global Storage Technologies, HGST (former IBM division sold off to Hitachi)
In the past, corporations like IBM were very "vertically-integrated", making every component of every system delivered. IBM was the first to bring disk systems to market, and led the major enhancements that exist in nearly all disk drives manufactured today. Today, however, our value-add is to take standard components, and use our super-secret recipe to make something that provides unique value to the marketplace. Not surprisingly, EMC, HP, Sun and NetApp also don't make their own DDMs. Hitachi is perhaps the last major disk systems vendor that also has a DDM manufacturing division.
So, my point is that disk systems are the next layer up. Everyone knows that individual components fail. Unlike CPUs or Memory, disks actually have moving parts, so you would expect them to fail more often compared to just "chips".
If you don't feel the MTBF or AFR estimates posted by these suppliers are valid, go after them, not the disk systems vendors that use their supplies. While IBM does qualify DDM suppliers for each purpose, we are basically purchasing them from the same major vendors as all of our competitors. I suspect you won't get much more than the responses you posted from Seagate and HGST.
- American car owners replace their cars every 59 months
According to a frequently cited auto market research firm, the average time before the original owner transfers their vehicle -- purchased or leased -- is currently 59 months. Both studies mention that customers have a different "definition" of failure than manufacturers, and often replace the drives before they are completely kaput. The same is true for cars. Americans give various reasons why they trade in their less-than-five-year-old cars for newer models. Disk technologies advance at a faster pace, so it makes sense to change drives for other business reasons, for speed and capacity improvements, lower power consumption, and so on.
The CMU study indicated that 43 percent of drives were replaced before they were completely dead. So, if General Motors estimated their cars lasted 9 years, and Toyota estimated 11 years, people still replace them sooner, for other reasons.
At IBM, we remind people that "data outlives the media". True for disk, and true for tape. Neither is "permanent storage", but rather a temporary resting point until the data is transferred to the next media. For this reason, IBM is focused on solutions and disk systems that plan for this inevitable migration process. IBM System Storage SAN Volume Controller is able to move active data from one disk system to another; IBM Tivoli Storage Manager is able to move backup copies from one tape to another; and IBM System Storage DR550 is able to move archive copies from disk and tape to newer disk and tape.
If you had only one car, then having that one and only vehicle die could be quite disruptive. However, companies that have fleet cars, like Hertz Car Rentals, don't wait for their cars to completely stop running either; they replace them well before that happens. For a large company with a large fleet of cars, regularly scheduled replacement is just part of doing business.
This brings us to the subject of RAID. No question that RAID 5 provides better reliability than having just a bunch of disks (JBOD). Certainly, three copies of data across separate disks, a variation of RAID 1, will provide even more protection, but for a price.
Robin mentions the "Auto-correlation" effect. Disk failures bunch up, so one recent failure might mean another DDM, somewhere in the environment, will probably fail soon also. For it to make a difference, it would (a) have to be a DDM in the same RAID 5 rank, and (b) have to occur during the time the first drive is being rebuilt to a spare volume.
- The human body replaces skin cells every day
So there are individual DDMs, manufactured by the suppliers above; disk systems, manufactured by IBM and others, and then your entire IT infrastructure. Beyond the disk system, you probably have redundant fabrics, clustered servers and multiple data paths, because eventually hardware fails.
People might realize that the human body replaces skin cells every day. Other cells are replaced frequently, within seven days, and others less frequently, taking a year or so to be replaced. I'm over 40 years old, but most of my cells are less than 9 years old. This is possible because information, data in the form of DNA, is moved from old cells to new cells, keeping the infrastructure (my body) alive.
Our clients should approach this in a more holistic view. You will replace disks in less than 3-5 years. While tape cartridges can retain their data for 20 years, most people change their tape drives every 7-9 years, and so tape data needs to be moved from old to new cartridges. Focus on your information, not individual DDMs.
What does this mean for DDM failures? When one happens, the disk system re-routes requests to a spare disk, rebuilding the data from RAID 5 parity, giving storage admins time to replace the failed unit. During the few hours this process takes, you are either taking a backup, or crossing your fingers. Note: for RAID 5 the time to rebuild is proportional to the number of disks in the rank, so smaller ranks can be rebuilt faster than larger ranks. To make matters worse, the slower RPM speeds and higher capacities of ATA disks mean that the rebuild process could take longer than for smaller capacity, higher speed FC/SCSI disks.
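How a failed drive gets rebuilt from parity is easier to see in code. The following Python sketch is deliberately simplified: it treats one stripe as three small data blocks plus one XOR parity block, and it ignores the way real RAID 5 rotates parity across all the drives in the rank. The block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks of equal length (made-up contents).
data = [b"disk0 data", b"disk1 data", b"disk2 data"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails. Every surviving block in the
# stripe, plus the parity block, must be read to rebuild it onto the spare.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Because the rebuild has to read every surviving drive in the rank, a wider rank means more data to read, which is why smaller ranks rebuild faster. It also shows why a second failure in the same rank during the rebuild is the case to worry about: with two blocks missing, the XOR can no longer recover either one.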
According to the Google study, a large portion of the DDM replacements had no SMART errors to warn that it was going to happen. To protect your infrastructure, you need to make sure you have current backups of all your data. IBM TotalStorage Productivity Center can help identify all the data that is "at risk", those files that have no backup, no copy, and no current backup since the file was most recently changed. A well-run shop keeps their "at risk" files below 3 percent.
So, where does that leave us?
- ATA drives are probably as reliable as FC/SCSI disk. Customers should choose which to use based on performance and workload characteristics. FC/SCSI drives are more expensive because they are designed to run at faster speeds, required by some enterprises for some workloads. IBM offers both, and has tools to help estimate which products are the best match to your requirements.
- RAID 5 is just one of the many choices of trade-offs between cost and protection of data. For some data, JBOD might be enough. For other data that is more mission critical, you might choose keeping two or three copies. Data protection is more than just using RAID, you need to also consider point-in-time copies, synchronous or asynchronous disk mirroring, continuous data protection (CDP), and backup to tape media. IBM can help show you how.
- Disk systems, and IT environments in general, are higher-level concepts to transcend the failures of individual components. DDM components will fail. Cache memory will fail. CPUs will fail. Choose a disk systems vendor that combines technologies in unique and innovative ways that take these possibilities into account, designed for no single point of failure, and no single point of repair.
So, Robin, from IBM's perspective, our hands are clean. Thank you for bringing this to our attention and for giving me the opportunity to highlight IBM's superiority at the systems level.
technorati tags: IBM, Seagate, Hitachi, HGST, EMC, NetApp, HP, HDS, Sun, Google, CMU, DDM, Fujitsu, MTBF, MTTF, AFR, ARR, JBOD, RAID, Tivoli, SVC, DR550, CDP, FC, SCSI, disk, tape, SAN,
Tuesday is always good for announcements. Today, Gartner, Inc. announced that IBM has overtaken HP in its climb to the top. I'll quote directly from today's press release:
STAMFORD, Conn., March 6, 2007 -- Worldwide external controller-based (ECB) disk storage revenue totaled $15.2 billion in 2006, a 4.1 percent increase over 2005 revenue of $14.6 billion, according to Gartner, Inc. IBM overtook Hewlett-Packard for the No. 2 position in 2006 (see Table 1). IBM’s worldwide ECB market share increased to 15.8 percent, while HP’s market share dropped to 13.1 percent.
IBM beat HP both in 4Q06 and for the full year 2006. You can read more about it in the Gartner Dataquest report "Market Share: Disk Array Storage, All Regions, All Countries, 1Q05-4Q06" on their website. (Note: non-IBMers might need an account with Gartner to access this; I'm not sure.)
The focus was on external controller-based disk, not external controller-less SCSI/SAS disk, not disk arrays posing as virtual tape libraries, nor any disk sold inside HP, Sun, IBM or Dell servers. This is to compare with disk-only vendors such as EMC and HDS. The revenues reflect hardware only, including hardware-related parts of financial leases and managed services. Revenues from optional priced software features such as multi-pathing drivers, management software, or advanced copy services were excluded. I discussed these types of analyst reports in a blog post last September: Space Race Heats Up.
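The percentages in the press release are easy to sanity-check. This Python sketch uses only the figures quoted above; the per-vendor revenue it derives is my own back-of-the-envelope calculation from the rounded market shares, not a number published by Gartner.

```python
# Worldwide ECB disk revenue in billions of dollars, as quoted above.
revenue_2006 = 15.2
revenue_2005 = 14.6

growth_percent = (revenue_2006 / revenue_2005 - 1) * 100

# Implied 2006 revenue from the rounded market shares (derived, not quoted).
ibm_revenue = revenue_2006 * 15.8 / 100
hp_revenue = revenue_2006 * 13.1 / 100

print(f"year-over-year growth: {growth_percent:.1f} percent")
print(f"implied IBM revenue: about ${ibm_revenue:.2f} billion")
print(f"implied HP revenue: about ${hp_revenue:.2f} billion")
```

The growth figure agrees with the 4.1 percent in the release, and the gap between the two shares works out to roughly $0.4 billion in hardware revenue.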
These marketshare numbers are based on revenues, not units or terabytes. When a box gets sold, the revenue was counted toward the vendor that sold it, not the manufacturer that built it. In this last report:
- When Dell sells an EMC box, it gets counted as Dell. When Fujitsu Siemens sells an EMC box, it gets counted as "Other".
- When HP sells an HDS box, it gets counted as HP. When Sun sells the HDS box, it gets counted as Sun.
- When IBM sells its System Storage N series (from the OEM agreement with NetApp), it gets counted as IBM. Both IBM and NetApp experienced growth in the NAS/unified storage arena.
It's still cold here in the Washington DC area, but at least good news like this helps warm me up!
technorati tags: IBM, disk, external controller-based, ECB, Gartner, 4Q06, 2006, revenue, marketshare, HP, EMC, Sun, Dell, NetApp, HDS, NAS