It's official! IBM System Storage TS1120 tape drive takes home the gold award, the product of the year, announced by Storage magazine.
I spent 18 hours traveling from Australia to China yesterday, and we were partially delayed due to weather, but I felt that it was necessary to discuss the innovative use of encryption on this drive.
While most consider the TS1120 an "Enterprise-class" tape technology for the mainframe, it is also attachable to the smallest distributed systems running Windows, Linux, or various flavors of UNIX. Rather than limit users with an Encryption Key Manager that only ran on z/OS, IBM instead chose to implement it in Java, so it can run on anything from z/OS to Linux, UNIX and Windows platforms, giving clients choice and flexibility in their deployment.
The design is quite clever and elegant. In the encryption world, there are two ways to encrypt.
- Symmetric Key
This is very fast, because it uses a single key for both encryption and decryption, and can be incorporated on a chip. The problem is that anyone with the key can read the sensitive data.
- Asymmetric Key
This is slower, but more secure, using two separate keys. The public "encryption" key takes clear data and encrypts it. Anyone can be freely given this key, as they cannot use it to decrypt any data. The private "decryption" key is able to decrypt the data, so that one is kept secret. If two businesses plan to exchange lots of tapes, they can exchange their public "encryption" keys with each other.
So, let's say that Green, Inc. wants to send a tape to Blue, Co. Blue has already provided its public "encryption" key to Green, so Green does the following:
- Generate a unique data key, which we'll call the "red key"; there is one for each tape. It is a standard AES 256-bit symmetric key that can be processed with less than one percent overhead on the tape drive. All the data is encrypted with this key.
- Store the red key on the tape. How does Green give Blue the red key? Green encrypts it with Blue's RSA 2048-bit public "encryption" key. This is stored in three places on the tape cartridge, one in memory, and the other two on the media itself.
- Send the tape over to Blue Co.
When it arrives on the dock at Blue Co., they do the following:
- Mount the tape and decrypt the "red key" using Blue's super-secret private decryption key.
- Pass the "red key" to the tape drive, and have it read, append or re-write the tape.
If the super-secret private key is ever compromised, all you have to do is mount the tape, unlock the red key with the old private key, and re-lock the red key with a new public key. Since the red key doesn't change, the rest of the data can be left intact. The whole process takes less than 5 minutes, compared to Sun Microsystems' method, which could take 1-2 hours per cartridge, having to decrypt and re-encrypt the entire data stream.
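For readers who like to see the moving parts, here is a minimal Python sketch of the Green-to-Blue exchange and the re-keying step. It is only an illustration under loud assumptions: the function names and the `tape` dictionary are hypothetical, the SHA-256 keystream XOR is a toy stand-in for AES-256, and the textbook RSA built from well-known Mersenne primes is a toy stand-in for RSA 2048-bit. None of this is secure, and none of it is how the drive or the Encryption Key Manager is actually implemented.

```python
import hashlib
import secrets

def toy_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair from two primes. Toy only: no padding, not secure."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)
    return (n, e), (n, d)              # (public key, private key)

def wrap(data_key: bytes, public) -> int:
    """Lock the 32-byte data key with a public "encryption" key."""
    n, e = public
    return pow(int.from_bytes(data_key, "big"), e, n)

def unwrap(wrapped: int, private) -> bytes:
    """Unlock the data key with the matching private "decryption" key."""
    n, d = private
    return pow(wrapped, d, n).to_bytes(32, "big")

def stream_cipher(data_key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream.
    The same call encrypts and decrypts, like any symmetric key."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(data_key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], pad))
    return bytes(out)

# Blue publishes a public key and keeps the private key secret.
blue_public, blue_private = toy_keypair(2**127 - 1, 2**521 - 1)

# Green: one fresh "red key" per tape, data encrypted with it,
# and the red key stored on the tape locked with Blue's public key.
message = b"quarterly results, customer list, payroll"
red_key = secrets.token_bytes(32)
tape = {
    "data": stream_cipher(red_key, message),
    "wrapped_key": wrap(red_key, blue_public),
}

# Blue: unlock the red key, then read the tape.
recovered_key = unwrap(tape["wrapped_key"], blue_private)
plaintext = stream_cipher(recovered_key, tape["data"])

# Re-keying after a compromise: only the small wrapped key is rewritten.
data_before = tape["data"]
new_public, new_private = toy_keypair(2**89 - 1, 2**607 - 1)
tape["wrapped_key"] = wrap(unwrap(tape["wrapped_key"], blue_private), new_public)
```

The last line is the whole point: re-keying touches a few dozen bytes of wrapped key, while `tape["data"]` is never decrypted or rewritten.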
technorati tags: IBM, tape, TS1120, encryption, gold, award, storage, magazine, Sun, AES,RSA
Well, I have left Japan, and while everyone else is enjoying the Super Bowl, I am now in Australia, at another conference. Today I had the pleasure of hearing filmmakers talk about their successes, and how IBM helps the movie industry.
- Khoa Do
At one extreme was Khoa Do, independent filmmaker. After acting in movies alongside Michael Caine and Billy Zane, he decided to become his own director. He started a project to help seven disadvantaged youths from a poor drug-ridden section of Sydney, by having them act in his first full-length film. Armed with only an IBM laptop and a small budget, he made the film called "The Finished People", which won critical acclaim.
The film was a success, and many of the disadvantaged youths have gone on to act in other movies. In 2005, Khoa Do was named "Young Australian of the Year".
Thanks to IBM technology, filmmaking is now accessible to a wider number of aspiring wanna-be directors. It is no longer necessary to be part of a large film studio with a multi-million dollar budget to tell your story.
- Xavier Desdoigts
At the other extreme was Xavier Desdoigts, director of technical operations at Animal Logic, the Computer Graphics (CG) arthouse that produced special effects for movies like "The Matrix", "House of Flying Daggers" and "World Trade Center". They started with producing digital effects for TV commercials, like this one for Carlton Draught Beer.
With the support of a large film studio and multi-million dollar budget, Animal Logic now boasts the 86th most powerful "Supercomputer" based on IBM BladeCenter technology, with over 4000 servers connected into a cluster, for making the movie "Happy Feet". The movie took four years to make, with over 500 people, of 27 different nationalities. It was the first CG movie made in Australia, and has been well-received by audiences worldwide.
Mr. Desdoigts gave out some interesting facts and figures about the movie:
- While visually stunning on the big screen, each frame is only 1.4 Megapixel, about the same resolution as most camera phones.
- In one scene, there are 427,086 penguins all appearing in frame.
- Mumble, the lovable lead character, is made up of over 6 million feathers.
- As many as 17 dancers were "motion-captured" to choreograph the tap-dancing and character interaction segments.
- Only one system admin was needed to manage this entire server farm. (IBM Systems Director technology makes this possible)
- The movie consumed 103 TB of disk space, backed up to 595 LTO tape cartridges.
- An estimated 17 million CPU-hours were needed for all the processing and rendering.
Rather than talking about technology for technology's sake, these filmmakers showed how technology could be put to use, in a practical sense, to provide the world something of value.
technorati tags: IBM, filmmaker, film industry, Khoa Do, Michael Caine, Billy Zane, Xavier Desdoigts, Animal Logic, Happy Feet, Systems Director, LTO, BladeCenter
I will wrap up this week's theme on travel, conferences and Japan by discussing Groundhog Day, celebrated today (Feb. 2) in the US.
I thought of this because there was a 2003 movie called "Lost in Translation", the title of yesterday's post. This movie is about an American actor, played by Bill Murray, coming to Tokyo to film a whisky commercial. I first saw it with my sister and father, and we must have been the only three who had actually been to Japan, as we were laughing hysterically, while the rest of the audience was utterly confused. If you have never been to Japan, see the movie before you go, then see it again after you get back home.
Ten years earlier, Bill Murray also played the lead role in another movie called "Groundhog Day". In the movie, Bill Murray's character is TV newsman "Phil Connors" who travels to a small town where they bring out a small groundhog. If the groundhog can see his shadow, it predicts at least six more weeks of winter. If it does not, winter will end sooner. The next day, Phil wakes up to realize that he is re-living the same day, over and over, like a modern-day Sisyphus or Prometheus. How he handles himself in this situation is what makes the movie so memorable.
When I explain what I do for IBM, to people I meet at home and abroad, I get asked the same set of questions.
- Don't you get bored presenting the same presentations?
The fact is, I never give the same presentation twice. Since I focus mostly on visual information and what I say, versus the words of text on the page, I am able to customize my presentation to each unique audience, in much the same way that Bill Murray's character managed to do something fun and different each day in the movie, despite his situation.
I do pity those presenters who focus entirely on text, turning their back to the audience, and then reading verbatim what is on each page. They should read Seth Godin's Really Bad PowerPoint, with advice like "Bullets are for the NRA".
Another problem is presenters who apologize because they did not develop the materials they are presenting. Sorry, bub, you present it, you own it. The only person held accountable for a bad presentation at a conference is the speaker. When I make charts for others, I expect them to adjust them to their own speaking styles.
As a speaker, if you inherit materials from someone else, have the courage to change them, or accept the parts you can't change, and have the wisdom to know the difference.
- Don't you get tired of traveling?
At first I thought this was odd. It's like asking "Don't you get tired of doing different things and eating different foods with different people in a different country every week?" How can anyone grow tired of variety?
As with any question, you have to go inside the mind of the person asking the question. For most people, travel is an ordeal, outside their comfort zone. They are traveling to attend a funeral, family reunion, or a theme park with spouse and kids in tow. If that is the only kind of traveling a person knows, then it is understandable why they might ask this question.
- Don't you get annoyed answering the same questions at conferences?
As if this only happens at conferences!
Seriously, it might be the 17th time I've heard the question asked, but it might be only the first time the person is asking it, and my response may be the crucial "first impression" that sets the stage for later engagements.
In this case, I focus on continuous improvement. What is the best way to answer this question? How could I have answered that better? How could I have phrased the answer so it will be well-remembered? Again, like Bill Murray's character in "Groundhog Day", have fun with it, and take advantage of the opportunity for improvement.
Enjoy the weekend!
technorati tags: IBM, Japan, travel, conferences, really, bad, PowerPoint, Lost In Translation, Groundhog Day, Bill Murray, Seth Godin
Continuing my week's theme on travel, conferences, and Japan, I will discuss translation and interpretation.
By now, you realize that I speak some Japanese, but not enough to give a full presentation. In addition to English, I can present in Spanish and Brazilian Portuguese, but am not yet comfortable doing a full hour talk in Japanese, especially when technical terminology is required.
This brings us to the differences between translation and interpretation. The former is more literal, but the latter is needed to get the spirit or essence of what is being communicated. Sometimes, the differences in languages and culture need to be taken into account to get the right meaning across.
- One phrase, different interpretation
The conference attire was listed as "Business Casual", using the foreign words, as it is a very foreign concept to the Japanese. In the US, Business Casual could be a polo shirt and khaki pants, perhaps. In Japan, where everyone wears a dark suit, white shirt and conservative tie, "business casual" means your shirt can be blue, or have stripes. Few dressed down for the occasion; I saw mostly white shirts underneath those dark suit coats.
- One interpretation, different connotations
Working with my interpreter team, I went page by page to explain what I would say. On one page, I mentioned having "free space" to run applications. They asked if "free space" was good or bad. I was caught off-guard by this question. Americans enjoy wide open spaces, and the comforts afforded by having enough "leg room", "head room" or "elbow room". The Japanese word for this is "yoyu", which roughly translates to "leeway". However, "yoyu" is also used in the negative sense. Tailored-to-fit clothing, for example, is preferred over loose-fitting off-the-rack clothing, because it has no "yoyu". Having too much "free space" can be just as bad as not enough, much like an hour presentation that ends 20 minutes too early is just as bad as one that goes 20 minutes over.
- One word, two different interpretations
In explaining the word "archive" we came up with two separate Japanese words. One was "katazukeru", and the other was "shimau". Both words mean "to put away", but the motivation that drives the activity changes the word usage. If you are clearing the dinner plates from the table after your meal, for example, it could be done for two reasons. The first reason, katazukeru, is because the table is important: you need the table to be empty or less cluttered to use it for something else, perhaps to play a card game, work on arts and crafts, or pay your bills. The second reason, shimau, is because the plates are important: perhaps they are your best tableware, used only for holidays or special occasions, and you don't want to risk having them broken. As it turns out, IBM supports both senses of the word archive. We offer "space management" when the space on the table (or disk or database) is more important, so older low-access data can be moved off to less expensive disk or tape. We also offer "data retention" where the data itself is valuable, and must be kept on WORM or non-erasable, non-rewriteable storage to meet business or government regulatory compliance.
- Same words, different order
On many of my charts, we show on the left the entry-level models, in the center the midrange offerings, and on the right the enterprise class high-end devices. In English, I would say "Small, Medium, and Large". However, in Japan, they read from right to left, and their words "Dai, Chu, Sho" represent "Large, Medium, Small". So, the chart had the offerings on the page correctly sequenced, I just had to start on the right, and work my way to the left, from largest to smallest.
Understanding the differences in both language and culture greatly helps in communications.
technorati tags: IBM, Japanese, Business Casual, free space, yoyu, archive, katazukeru, shimau, WORM, non-erasable, non-rewriteable, entry-level, midrange, enterprise-class
Continuing my week's theme on travel, conferences, and Japan, I saw two items in the news that seem to follow a common theme.
- According to "The Daily Yomiuri", a local Japanese paper, "double happy weddings" are becoming more and more popular in Japan. These would be called "shotgun" weddings in the US, but in Japan, couples pay extra to have a wedding between the fifth and seventh month of pregnancy. As Dave Barry would say, I am not making this up. 27% of couples in Japan got married during or after a pregnancy. The logic is that they can celebrate both events with one ceremony. Many couples believe that the primary purpose of marriage is to have children, and some that fail to have children suffer terrible anguish or divorce. Waiting until pregnancy helps ensure the couple will be "successful" in this regard.
- IBM acquires Softek, a software company that develops a product called Transparent Data Migration Facility (TDMF) to move mainframe data from one disk system to another, while applications are running. This can be used, for example, to move data from outdated disk systems to IBM disk systems. This is not to be confused with IBM's archive and retention software partner, Princeton Softech.
Softek is the software spin-off of Fujitsu (a Japanese computer hardware manufacturer). For a while, Fujitsu made IBM-compatible mainframe servers, but was not successful at developing its own system software, relying heavily on IBM for this. Unable to compete against IBM, it stopped making mainframe servers, but continues making other kinds of hardware equipment.
With TDMF, the process of moving data is simple. The software runs on z/OS and intercepts all writes intended for source volumes on the old array, and re-directs a copy to destination volumes on the new device. Systems can run with old and new equipment side by side for a few weeks, with the new device staying in-sync with the old. When the client is ready to cross over, the systems are pointed to the new disk, and the old disk systems are detached and removed from the sysplex.
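The intercept-and-mirror idea described above can be sketched in a few lines of Python. To be clear, this is a conceptual toy and not TDMF's interface: the `MigrationShim` class, its method names, and the use of dictionaries as "volumes" are all hypothetical, chosen only to show writes being mirrored, older data being copied in the background, and the final cutover.

```python
class MigrationShim:
    """Hypothetical write-intercept layer; dictionaries stand in for volumes."""

    def __init__(self, source: dict, destination: dict):
        self.source = source            # volume on the old array
        self.destination = destination  # volume on the new device
        self.active = source            # what applications currently read from

    def write(self, block: int, data: bytes) -> None:
        # Before cutover, every write lands on the old volume AND the new one.
        if self.source is not None:
            self.source[block] = data
        self.destination[block] = data

    def read(self, block: int) -> bytes:
        return self.active[block]

    def background_copy(self) -> None:
        # Copy blocks that were written before the shim was installed.
        # Blocks already mirrored by write() are newer, so leave them alone.
        for block, data in self.source.items():
            self.destination.setdefault(block, data)

    def in_sync(self) -> bool:
        return self.source == self.destination

    def cut_over(self) -> None:
        # Point applications at the new disk, then detach the old one.
        if not self.in_sync():
            raise RuntimeError("new device is not yet in sync with the old")
        self.active = self.destination
        self.source = None


old_volume = {0: b"payroll", 1: b"ledger"}
new_volume = {}
shim = MigrationShim(old_volume, new_volume)

shim.write(1, b"ledger v2")   # application keeps running during the move
shim.background_copy()
shim.cut_over()
shim.write(2, b"orders")      # after cutover, only the new device is written
```

The applications only ever call `read` and `write`, which is why they never notice that the disk underneath them changed.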
Afraid that installing TDMF will mess with your applications? IBM Global Technology Services (GTS) is able to roll in a separate mainframe, move the data, then disconnect it along with the old storage.
(For customers running Linux, UNIX or Windows on other platforms, IBM offers SAN Volume Controller (SVC). While SVC is not marketed as a "data migration device", per se, it does have this capability. Many clients were able to cost-justify purchase of an SVC to move data from old storage to new in similar fashion to how TDMF works on the mainframe.)
What do these stories have to do with one another, other than both relating to Japan? IBM has been using TDMF for years as part of a service offering to move data from one disk system to another. Since Sam Palmisano took over in 2002, IBM has acquired 51 companies, 31 of them software companies. Often, these have been "successful", quickly turning profitable, because IBM was already well familiar with the companies it acquired, in much the same way that husbands are well familiar with their brides-to-be at a "double happy wedding".
So, welcome Softek! It looks like it's time to celebrate again!
technorati tags: IBM, Japan, Daily Yomiuri, double happy, wedding, shotgun, Dave Barry, Softek, TDMF, z/OS, Fujitsu
Continuing my week's theme on travel, conferences, and Japan, I provide three more "survival words" in the Japanese language. These might seem like an odd trio, but they come in very handy.
- Domo -- "very much"
- This is short for "Domo Arigato" which is "Thank you very much". If you just say "Domo" they assume you mean the full phrase, and it gets the same point across. Some of my older readers may remember a song by Styx, "Domo Arigato Mr. Roboto", which is perhaps the only song I can think of that mentions both IBM and Japan.
You're wondering who I am-machine or mannequin
With parts made in Japan, I am the modern man
I've got a secret I've been hiding under my skin
My heart is human, my blood is boiling, my brain I.B.M.
- Doko -- "where?"
- This word is great for trying to find places or things, such as "Where is the restroom?" Sometimes making an attempt to speak a country's official language will garner some respect and appreciation from the locals. If you're lost, and looking for help finding something, this is a great time to be appreciated.
Upon arrival at Narita, I was planning to take the N'EX high-speed train into town, but there was a train accident on the JR line, so I ended up taking the Keisei train over to Nippori station, then transferring to a different line. Not what I had planned, but an adventure nonetheless.
- Dozo -- "after you" or "go ahead"
- Japan is all about showing respect to your elders, upper management, and others in great esteem or authority. When arriving at a dinner table, or leaving the elevator, this phrase lets them know you are respectful by letting them go first. Americans always seem rushed, so it is good to show I can be patient also.
technorati tags: IBM, Japan, Japanese, Styx, survival, phrases, respect
This week I am in Japan, so my week's theme will center around travel, speaking at conferences, and Japan itself. I first travelled to Japan in the late 1980s, to visit a college friend who was working for Ford Motor Company, on assignment in Japan as liaison to Mazda Corp.
Back then, the only Japanese phrase I knew was "Wakarimashta" which means "I know" or "I understand". If you only know one phrase in a foreign language, this possibly could be the worst to know.
My second trip, I was better prepared. I learned three "survival phrases":
sumimasen - "I'm sorry/excuse me"
hanashimasen - "I don't speak"
wakarimasen - "I don't know / I don't understand"
These are great phrases to know individually, but even more powerful strung all together, to emphasize that you will begin speaking English, but at least with good reason (and perhaps a bit of irony.)
I've been to Japan many times since, and have picked up more of the language. When travelling to Japan, or anywhere for that matter, it is important to "pack light". I'll be gone for two weeks, but all I bring is a laptop bag and one carry-on piece of luggage.
I went on a trip to Prague (Czech Republic) with a female co-worker who brought FOUR pieces of luggage. One was just for shoes. Another piece was just for hair styling gel, make-up, face creams and finger nail polish. Today, the rules are different, and the TSA allows only a single quart-size plastic bag containing little jars of 3 ounces or less of liquids or gels. I didn't have any "quart-size" bags, so I used a smaller sandwich-size bag.
What does all this have to do with storage? I've helped many clients move data centers, and this involves moving their servers, their networks, and their storage. Servers and Networks are easy to move, but storage presents some challenges. In many cases, the entire company is shut down, the storage is moved, and then the company is operational again. Needless to say, it is best to do this over a weekend.
I tell clients to "pack light" and figure out what data they really need in the move. What do you really need to operate your business? Bring just that, the rest can arrive later.
This same concept applies for Business Continuity and Disaster Recovery planning. What do you really need after a disaster occurs? Can you run your business for a few weeks on that data, until the rest of the data is restored? If you can't run your entire business on that data, can you run your most important parts of your business?
If you run a bank, perhaps keeping your ATM cash machines running is more important than making out new loans. In Japan, if a bank has any outages that impact their ATM machines, they put out a full page advertisement in the local papers to apologize for the inconvenience.
Business Continuity is one of the nine "Infrastructure Solutions" that IBM can help clients with. If you are interested in learning more on how IBM can help you with your Business Continuity, click here.
technorati tags: IBM, Japan, Prague, TSA, Business Continuity, Disaster Recovery, ATM, Infrastructure Solutions, travel, Japanese, language, survival, phrases,
Stephen Colbert, of The Colbert Report, explains the name changes in recent mergers of the Telecommunications industry. A discussion on "changing names" and how that impacts storage seems like a good way to wrap up the week's theme on naming conventions.
Name changes are sometimes painful, but are often done for a purpose, such as to promote a family. In the US, when a man and woman marry, the woman often changes her family name to match her husband's, and the kids all adopt the father's family name. I say "often" because there are times where the woman keeps her name, or adds to it in a hyphenated way. ABC News reported that a Man Fights to Take Wife's Name in Marriage. KipEsquire, a lawyer, writes about it in his blog A stitch in haste.
The IT industry sometimes changes the names of products that people knew as something else. Other times, it re-uses an existing name, when really the product is or should be different from the original. Last year, I took on the job of helping transition from our brand "TotalStorage" to the "System Storage" product line under the new "IBM Systems" brand. I help decide what stays the same name or what changes, when it should change, and how to announce that change.
On the disk side, IBM renamed Fibre Array Storage Technology, or FAStT, which was pronounced exactly like "fast", to DS4000 series. This was a big improvement, as people couldn't seem to spell it properly, with variations like "FastT". Nor could people pronounce it properly, saying "fast-tee" instead. The advantage of "DS" is that it is both easy to spell, and easy to pronounce. The DS4000 series continues to be "fast", providing excellent performance for its midrange price category.
IBM's Enterprise Storage Server (ESS) line went from model E10, to F20, to 750 and 800. When IBM came out with its replacement, the IBM TotalStorage DS8000, some people asked why it wasn't named the ESS 900, for example. The DS8000 is quite different internally, new hardware design and implementation, but is highly compatible with the ESS line, and shares much of the same functionality from microcode. Last year, it was replaced by the IBM System Storage DS8000 Turbo. Again, newer hardware, so it was easy to justify the new name change from "TotalStorage" to "System Storage".
Renaming a product risks losing its certifications and awards. For example, IBM spent a lot of time and money getting the OS/390 operating system certified as a "UNIX" platform. When it was renamed to z/OS, IBM had to do it all over again. Learning from this experience, IBM decided not to rename the SAN Volume Controller to a new designation like "DS5750", as it enjoys the "number one" spot on both the SPC-1 and SPC-2 performance benchmarks, and is recognized as the leader in the disk storage virtualization marketplace. Renaming this product would mean losing that collateral.
IBM's "other disk systems" the N series posed another set of challenges. The current DS line already has entry-level (DS3000), midrange (DS4000) and enterprise-class (DS6000 and DS8000) products. The OEM agreement that IBM has with Network Appliance (NetApp) resulted in a new set of entry-level, midrange, and enterprise-class products. But these didn't fit nicely into the DS3000-to-DS8000 continuum. Instead, IBM decided to go with N series, using N3000 for entry-level, N5000 for midrange, and N7000 for enterprise-class. These are different than the numbers used by NetApp for their comparable, but not identical, offerings.
On the tape side, IBM decided to put tape drives in the TS1000 and TS2000 range, tape libraries and automation in the TS3000 range, and tape virtualization in the TS7000 range. A lot of tape products already had 3000 numbering that had to change to fit this new scheme. This is why IBM's popular 3592 tape drive was renamed to the TS1120. The replacement to the 3494 Virtual Tape Server was named TS7700 Virtualization Engine.
Obviously, you can't change the names of products that are currently in the field, but what about existing software with minor updates? IBM decided to leave "TotalStorage Productivity Center" under the "TotalStorage" brand until it has a significant version upgrade. Many people say "TPC" as a convenient acronym when referring to this product, but TPC is a registered trademark of the Professional Golfers Association (PGA) to refer to its "Tournament Players Club".
How can anyone confuse "managing storage" with "playing golf"? One activity is full of frustration that takes years or decades to master, involving the need to understand a variety of equipment and techniques to use each properly to accomplish your goals; and the other is an enjoyable activity, immediately productive in front of a single pane of glass managing all of your DAS, SAN and NAS storage, from reporting on your files and databases to managing storage networks and tape libraries.
Enjoy the weekend!
technorati tags: Stephen Colbert, Colbert Report, Telecommunications industry, KipEsquire, IBM, FAStT, DS4000, DS3000, DS8000, OS/390, UNIX, z/OS, SAN Volume Controller, N series, TS1120, TS7700, TotalStorage Productivity Center, TPC, PGA, Golf
Continuing my theme on naming conventions, this time I will talk about finding information.
Industry analysts estimate that information workers spend as much as 30 percent of their working day just looking for information they need, as mentioned in an interview with Jeff Teper, Microsoft. A second study of middle managers found similar results, and is discussed in Information Architecture and DM Review.
Take for example looking for information on a specific person. If you were searching for "Barack Obama" or "Lulit Shiferaw", then perhaps you will find exactly the person you were looking for.
On the other hand, some names are more common. I've met my share of people named John Smith, Jennifer Jones, or Mark Johnson. While PEARSON does not even make the top 100 list of family names here in the US, it is still fairly common.
People looking for me may realize that I am not the Head of Australian Economics at ANZ Bank, the author of the book "Don't Read This!", a winner of the Killam Teaching Prize, or the owner of fisheries in Essex. While I do work out at the gym on a regular basis, I look nothing like this famous body builder. There are several others, but I picked just a few to make my point.
At this time, there is only one "Tony Pearson" working at IBM. A while back, there was a second,a woman who spelled her name "Toni" with an "i" at the end. Her job involved deploying storage on AIX platforms,and was similar enough job description to mine that we would get each other's mail, and sometimes not realize it was meant for the other.
Dan Santow of Word Wise talks about the difficulties of proper names with accent marks. This would make searching worse, but many search tools can handle this, stripping off all accent marks to make the comparisons.
But what if you wanted to leave those accent marks in, for an exact match?
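Both kinds of comparison are easy to sketch with Python's standard `unicodedata` module. This is just one possible approach, not how any particular search tool does it, and the function names `strip_accents`, `loose_match` and `exact_match` are my own for illustration.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose each character, then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def loose_match(a: str, b: str) -> bool:
    """Accent-insensitive and case-insensitive comparison."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()

def exact_match(a: str, b: str) -> bool:
    """Keep the accents, but normalize so that the same visible name
    typed two different ways (precomposed vs. combining) still matches."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(loose_match("José Peña", "jose pena"))   # accents ignored
print(exact_match("José Peña", "Jose Pena"))   # accents must agree
```

The normalization step in `exact_match` matters because an accented letter can be stored either as one character or as a plain letter followed by a combining mark, and a naive byte-for-byte comparison would treat those as different names.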
technorati tags: Jeff Teper, Accenture, search, names, IBM, Barack Obama, Lulit Shiferaw, Dan Santow, Word Wise, productivity, Essex, bodybuilder, author, fisheries,Microsoft, Australian, Economics, ANZ Bank,Killam Teaching Prize
In his blog post on preparation, Seth Godin mentioned an appropriate Swedish saying:
There is no bad weather, just bad clothing.
Appropriate because it snowed here in Tucson, Arizona on Sunday evening, leaving many of us here figuring out how to drive through the stuff on Monday. In my entire lifetime, I have only witnessed snow down in the Tucson valley a handful of times. It got me thinking about coats, and the wonderful schemes for coat check rooms, as an analogy for data access. A lot of people ask me to compare and contrast one technology with another, say block-level virtualization with content-addressable storage, and so on, and I always try to find a good analogy to help explain things.
Let's start with the setting. It is snowing outside and people are wearing coats. When they come inside, they check their coats at a coat check room, a large room with rows and rows of racks with hangers. A coat check attendant takes your coat and puts it on a hanger, and gives you a ticket or other identifier that will allow you to retrieve your coat later. The ticket must have sufficient information to retrieve the coat quickly, rather than searching rows and rows of hangers for it.
- Block-based disk storage
You walk to the coat-check desk, tell the attendant to hang your coat on a specific hanger, say hanger number 387. When you come back, you ask for the coat on hanger 387. The coat-check attendant knows exactly where hanger 387 is, and is able to retrieve it quickly. Most disk systems use this approach, including IBM SAN Volume Controller and DS family of disk systems.
- Name-based disk storage
You walk to the coat-check desk, tell the person the name that you want to call your coat. An empty hanger is located, and a list of coat names, with their associated hanger number, is then kept. Upon return, you ask for your coat by name, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used by the IBM System Storage DR550, N series for NAS storage, and the IBM Healthcare and Life Sciences Grid Medical Archive Solution (GMAS).
- Content-addressable storage (CAS)
You walk to the coat-check desk and hand them your coat. The attendant weighs your coat, checks the brand, the size, the number of buttons and zippers, types it all in, and the computer spits out a "hash code" from 1 to 99999. An empty hanger is found, and the hash code is associated to the hanger number. Upon return, you provide the hash code you were given, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used for some non-erasable, non-rewriteable storage, such as the EMC Centera.
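The three coat-check schemes above can be put side by side in a small Python sketch. The class names are hypothetical and the stores are plain in-memory lists and dictionaries, so this shows only how the "ticket" differs in each scheme, not how any of the named products work internally. I also assume SHA-256 for the content hash, purely for illustration, rather than the 1-to-99999 code in the analogy.

```python
import hashlib

class BlockStore:
    """Block-based: the customer names the hanger number."""
    def __init__(self, hangers: int):
        self.hangers = [None] * hangers

    def put(self, hanger: int, coat: bytes) -> None:
        self.hangers[hanger] = coat

    def get(self, hanger: int) -> bytes:
        return self.hangers[hanger]

class NameStore:
    """Name-based: the customer picks a name; the store keeps name -> hanger."""
    def __init__(self):
        self.hangers = []
        self.index = {}

    def put(self, name: str, coat: bytes) -> None:
        self.index[name] = len(self.hangers)  # next empty hanger
        self.hangers.append(coat)

    def get(self, name: str) -> bytes:
        return self.hangers[self.index[name]]

class ContentStore:
    """Content-addressable: the ticket is a hash computed from the coat itself."""
    def __init__(self):
        self.hangers = []
        self.index = {}

    def put(self, coat: bytes) -> str:
        code = hashlib.sha256(coat).hexdigest()
        if code not in self.index:
            self.index[code] = len(self.hangers)
            self.hangers.append(coat)
        return code  # handed back to the customer as the ticket

    def get(self, code: str) -> bytes:
        return self.hangers[self.index[code]]

coat = b"grey wool overcoat, size 42"

blocks = BlockStore(hangers=500)
blocks.put(387, coat)

names = NameStore()
names.put("tonys-coat", coat)

cas = ContentStore()
ticket = cas.put(coat)
```

Notice who chooses the identifier: the customer picks the hanger in the first scheme, the customer picks the name in the second, and in the third the store derives the ticket from the coat's contents.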
IBM invented hash codes in 1953 as a way to speed up searches. For example, if you want to look up a word in the dictionary, knowing the first letter of the word makes it much quicker, because you can thumb directly to that section. A hash code was intended to give a more even distribution, so that if a million words are stored in a "hash code dictionary" then you would calculate the hash code, then look up only that section of words associated with that specific hash code number.
A problem arises when you generate "hash codes" for storage. It is possible for two different pieces of data to resolve to the same hash code. When an application tries to write a piece of data, and it resolves to a hash code that already exists, that is called a collision. One response is to compare the incoming data to the data that is already stored and confirm they are identical, but that can be time consuming. The other response is to just assume they are identical, and reject the secondary copy, a process often referred to as "de-duplication".
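To make the two responses concrete, here is a small Python sketch. The `tiny_hash` function is deliberately weak (it just sums the bytes into the 1-to-99999 range from the coat check analogy) so that a collision is easy to produce on purpose; real products use far stronger hashes. The `DedupStore` class and its `verify` flag are hypothetical names for illustration.

```python
def tiny_hash(data: bytes) -> int:
    """Deliberately weak hash code in the range 1..99999 (sum of the bytes)."""
    return sum(data) % 99999 + 1

class DedupStore:
    def __init__(self, verify: bool):
        self.verify = verify   # True: compare the data; False: trust the hash
        self.objects = {}      # hash code -> list of distinct data items

    def write(self, data: bytes) -> str:
        bucket = self.objects.setdefault(tiny_hash(data), [])
        if not bucket:
            bucket.append(data)
            return "stored"
        if not self.verify:
            # Assume same hash code means same data, and drop the new copy.
            return "rejected as duplicate"
        if data in bucket:
            # Byte-for-byte comparison confirms it really is a duplicate.
            return "rejected as duplicate"
        bucket.append(data)
        return "stored (collision)"

# b"ab" and b"ba" are different data with the same byte sum, so they collide.
trusting = DedupStore(verify=False)
careful = DedupStore(verify=True)

for store in (trusting, careful):
    store.write(b"ab")

trusting_result = trusting.write(b"ba")   # different data is silently dropped
careful_result = careful.write(b"ba")     # different data is detected and kept
```

The trusting store is faster because it never reads the stored data back, but in this example it throws away `b"ba"` even though nothing identical was ever stored.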
What's the chance of getting a collision for data that is really different? Let's take for example the famous Birthday paradox. Suppose the coat check room assigned the hanger based on your birthday (month and day). How many coats before you run the risk of having two people turn in coats with the same birthday? After only 23 people, the likelihood is 50%. At 60 people, it goes up to 99%.
For this reason, IBM does not offer content-addressable storage. For non-erasable, non-rewriteable storage, the IBM System Storage DR550 requires the application to give each object a name, and that name is then used to store the data, eliminating the possibility that data might accidentally be thrown away.
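The birthday figures are easy to check for yourself. This short Python sketch assumes 365 equally likely birthdays (no leap years) and multiplies out the chance that everyone's birthday is different; the function name is my own.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday."""
    all_different = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        all_different *= (days - i) / days
    return 1.0 - all_different

for n in (23, 60):
    print(n, f"{shared_birthday_probability(n):.1%}")
# 23 people gives about 50.7%, and 60 people gives about 99.4%
```

The same function works for any number of "hangers": pass a larger `days` value to see how quickly collisions become likely as the number of stored items grows relative to the number of possible hash codes.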
It's safer that way.
technorati tags: Seth Godin, Swedish, saying, bad, weather, clothing, snow, Tucson, coat, check, room, IBM, block-based, disk, storage, DR550, N series, NAS, healthcare, life sciences, grid, medical, archive, solution, GMAS, content-addressable, CAS, EMC, Centera, hash code, collision, de-duplication, birthday, paradox