Shakespeare wrote "What's in a name? That which we call a rose by any other word would smell as sweet." This week my theme will be names, naming conventions, and how we access information on storage.
Take for example these two sentences:
The Bears beat New Orleans.
Chicago clobbered the Saints.
Though they appear very different, football fans who might have watched either or both of the two conference title games yesterday would quickly recognize that they refer to the same two teams and the same end-result.
I'll be traveling to Asia next week. While most people call me "Tony", my legal given name is "Anthony" which is what appears on my passport and other legal documents. Most English-speaking countries handle this fine, but it can be confusing in Japan or China, where "A. Pearson" doesn't match "T. Pearson".
In the US, our given and family names are referred to as our "first name" and our "last name", reflecting their position. In Asia, the family name comes first, followed by the given name. To help avoid confusion, we have started adopting the practice of putting the family name in ALL CAPITAL LETTERS, so I would be "Tony PEARSON" while my colleague may be "WONG Francis".
In Japanese, "Mr. JONES" would be "Jones-san". However, "Pearson-san" is such a tongue-twister that most just say "Tony-san", which is fine with me. I have been called "Mr. Tony" in a variety of countries, which is perfectly acceptable.
You can call me anything you like, just don't call me late for dinner.
In his blog post on preparation, Seth Godin mentioned an appropriate Swedish saying:
There is no bad weather, just bad clothing.
Appropriate because it snowed here in Tucson, Arizona on Sunday evening, leaving many of us figuring out how to drive through the stuff on Monday. In my entire lifetime, I have only witnessed snow down in the Tucson valley a handful of times. It got me thinking about coats, and the wonderful schemes for coat check rooms, as an analogy for data access. A lot of people ask me to compare and contrast one technology with another, say block-level virtualization with content-addressable storage, and I always try to find a good analogy to help explain things.
Let's start with the setting. It is snowing outside and people are wearing coats. When they come inside, they check their coats at a coat check room, a large room with rows and rows of racks with hangers. A coat check attendant takes your coat and puts it on a hanger, and gives you a ticket or other identifier that will allow you to retrieve your coat later. The ticket must have sufficient information to retrieve the coat quickly, rather than searching rows and rows of hangers for it.
- Block-based disk storage
You walk to the coat-check desk, tell the attendant to hang your coat on a specific hanger, say hanger number 387. When you come back, you ask for the coat on hanger 387. The coat-check attendant knows exactly where hanger 387 is, and is able to retrieve it quickly. Most disk systems use this approach, including IBM SAN Volume Controller and DS family of disk systems.
- Name-based disk storage
You walk to the coat-check desk, tell the person the name that you want to call your coat. An empty hanger is located, and a list of coat names, with their associated hanger number, is then kept. Upon return, you ask for your coat by name, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used by the IBM System Storage DR550, N series for NAS storage, and the IBM Healthcare and Life Sciences Grid Medical Archive Solution (GMAS).
- Content-addressable storage (CAS)
You walk to the coat-check desk and hand them your coat. The attendant weighs your coat, checks the brand, the size, the number of buttons and zippers, types it all in, and the computer spits out a "hash code" from 1 to 99999. An empty hanger is found, and the hash code is associated with the hanger number. Upon return, you provide the hash code you were given, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used for some non-erasable, non-rewriteable storage, such as the EMC Centera.
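The three schemes can be sketched in a few lines of Python. This is only a toy model of the analogy, not how any of the products named above work internally: the `CoatCheck` class, its method names, the use of strings as "coats", and the choice of SHA-256 to produce the hash code are all my own assumptions for illustration.

```python
import hashlib


class CoatCheck:
    """Toy coat check room: hangers are numbered slots holding one coat each."""

    def __init__(self, hanger_count):
        self.hangers = [None] * hanger_count  # hanger number -> coat
        self.by_name = {}                     # name -> hanger number
        self.by_hash = {}                     # hash code -> hanger number

    def _first_empty_hanger(self):
        return self.hangers.index(None)       # raises ValueError if the room is full

    # Block-based: the customer chooses the hanger number.
    def put_block(self, hanger, coat):
        self.hangers[hanger] = coat

    def get_block(self, hanger):
        return self.hangers[hanger]

    # Name-based: the customer chooses a name, the attendant chooses the hanger.
    def put_named(self, name, coat):
        hanger = self._first_empty_hanger()
        self.hangers[hanger] = coat
        self.by_name[name] = hanger

    def get_named(self, name):
        return self.hangers[self.by_name[name]]

    # Content-addressed: the ticket is computed from the coat itself.
    def put_content(self, coat):
        digest = hashlib.sha256(coat.encode("utf-8")).hexdigest()
        hash_code = int(digest, 16) % 99999 + 1  # 1..99999, as in the analogy
        if hash_code not in self.by_hash:
            hanger = self._first_empty_hanger()
            self.hangers[hanger] = coat
            self.by_hash[hash_code] = hanger
        # A repeated hash code reuses the existing hanger.
        return hash_code

    def get_content(self, hash_code):
        return self.hangers[self.by_hash[hash_code]]


room = CoatCheck(1000)
room.put_block(387, "red parka")             # "put it on hanger 387"
room.put_named("Tony's coat", "grey overcoat")
ticket = room.put_content("blue raincoat")   # the attendant hands back a hash code
```

In all three cases the coat ends up on a numbered hanger; the only thing that changes is who decides the identifier and what the customer has to remember.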
IBM invented hash codes in 1953 as a way to speed up searches. For example, if you want to look up a word in the dictionary, knowing the first letter of the word makes it much quicker, because you can thumb directly to that section. A hash code was intended to give a more even distribution, so that if a million words are stored in a "hash code dictionary" then you would calculate the hash code, then look up only that section of words associated with that specific hash code number.
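Here is a minimal sketch of that "hash code dictionary" idea. The number of sections (`BUCKET_COUNT`), the function names, and the use of SHA-256 as the hash function are assumptions chosen for illustration, not a description of the original 1953 technique.

```python
import hashlib

BUCKET_COUNT = 1000  # assumed number of sections in the "hash code dictionary"


def hash_code(word):
    """Map a word to a section number in the range 0..BUCKET_COUNT-1."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % BUCKET_COUNT


def build_dictionary(words):
    """Place every word in the section chosen by its hash code."""
    sections = [[] for _ in range(BUCKET_COUNT)]
    for word in words:
        sections[hash_code(word)].append(word)
    return sections


def contains(sections, word):
    """Search only the one section selected by the word's hash code."""
    return word in sections[hash_code(word)]


sections = build_dictionary(["rose", "name", "coat", "hanger"])
print(contains(sections, "rose"))
```

With a million words and a thousand sections, each lookup scans roughly a thousand words instead of a million, provided the hash function spreads the words evenly.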
A problem arises when you generate "hash codes" for storage. It is possible for two different pieces of data to resolve to the same hash code. When an application tries to write a piece of data, and it resolves to a hash code that already exists, that is called a collision. One response is to compare the incoming data to the data that is already stored and confirm they are identical, but that can be time consuming. The other response is to just assume they are identical, and reject the second copy, a process often referred to as "de-duplication".
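The two responses can be compared side by side in a small sketch. The `Store` class, the `verify` flag, and the deliberately tiny 16-value hash (so that collisions are easy to trigger) are hypothetical and exist only to illustrate the trade-off.

```python
import hashlib


def tiny_hash(data, buckets=16):
    """Deliberately small hash range so that collisions are easy to trigger."""
    return hashlib.sha256(data).digest()[0] % buckets


class Store:
    def __init__(self, verify):
        self.verify = verify  # True: compare bytes on a match; False: assume identical
        self.objects = {}     # hash code -> stored data

    def write(self, data):
        code = tiny_hash(data)
        existing = self.objects.get(code)
        if existing is None:
            self.objects[code] = data
            return "stored"
        if not self.verify:
            return "deduplicated"  # assumed identical, incoming copy is dropped
        if existing == data:
            return "deduplicated"  # confirmed identical
        return "collision"         # different data, same hash code


careful = Store(verify=True)
trusting = Store(verify=False)
for i in range(17):  # 17 distinct objects cannot fit in 16 hash codes
    payload = str(i).encode("utf-8")
    print(i, careful.write(payload), trusting.write(payload))
```

The careful store reports at least one collision, while the trusting store silently drops a different object as if it were a duplicate. Real systems use far larger hash ranges, which makes this much less likely but does not change the logic.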
What's the chance of getting a collision for data that is really different? Let's take for example the famous Birthday paradox. Suppose the coat check room assigned the hanger based on your birthday (month and day). How many coats before you run the risk of having two people turn in coats with the same birthday? After only 23 people, the likelihood is about 50%. At 60 people, it goes up to over 99%.
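These figures are easy to check. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name and the `slots` parameter are my own.

```python
def collision_probability(people, slots=365):
    """Chance that at least two of `people` land on the same one of `slots` values."""
    p_all_different = 1.0
    for i in range(people):
        # Person i+1 must avoid the i values already taken.
        p_all_different *= (slots - i) / slots
    return 1.0 - p_all_different


for n in (10, 23, 60):
    print(n, round(collision_probability(n), 4))
```

Passing a larger `slots` value, such as the 99999 hash codes from the coat check example, shows how the same effect applies to hash codes: collisions become likely long before the number of stored items approaches the number of possible codes.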
For this reason, IBM does not offer content-addressable storage. For non-erasable, non-rewriteable storage, the IBM System Storage DR550 requires the application to give each object a name, and that name is then used to store the data, eliminating the possibility that data might accidentally be thrown away.
It's safer that way.
technorati tags: Seth Godin, Swedish, saying, bad, weather, clothing, snow, Tucson, coat, check, room, IBM, block-based, disk, storage, DR550, N series, NAS, healthcare, life sciences, grid, medical, archive, solution, GMAS, content-addressable, CAS, EMC, Centera, hash code, collision, de-duplication, birthday, paradox
Happy New Year!
This year I resolve to be more consistent in my blogging, and my goal is to give you one to five entries per week, every week, based on the advice from Glenn Wolsey, Jennette Banks, and others. On some weeks, I will have a running theme, so rather than writing super-long entries to cover everything I can think of on a topic, I can keep each entry short and readable. This week is a good time to review last year's "New Year's Resolutions" and to make new ones for 2007. I will discuss actions that companies can adopt for their data centers.
A common resolution is to lose weight, as in this Dilbert comic. Last year, I resolved to lose weight in 2006, and am delighted with myself that I lost eight pounds. When people ask for the secret of my success, I whisper in their ear "Eat less, exercise more." In general, people (and companies) know what to do, but just don't do it, which Pfeffer and Sutton document in their book The Knowing-Doing Gap. In my case, it involved lifestyle change: I exercised at a gym three times per week in Tucson, with a personal trainer, and revamped my diet.
Not everyone subscribes to the "eat less, exercise more" philosophy. For example, Ric Watson argues in his blog that you can eat fewer calories, but eat more in actual volume, by choosing the right foods. This brings up the issue of "metrics" that most data centers are familiar with. Last year, I read the book "You: On a Diet", which explains that it is better to focus on "waist reduction", as measured in inches around your mid-section at the belly button, than on "weight reduction", as measured in pounds. This year, I resolve to get down to 35 inches by the end of 2007.
The problem with measuring "weight" is that you are weighing bones, muscle and fat. A person can gain ten pounds of muscle, lose ten pounds of fat, and the scale would indicate no progress. The same problem occurs in data centers. How many TB of data do you have? Storage admins can easily tell you, but can they tell how much of this is bone (data needed for operating infrastructure), muscle (data used in daily operations that generates revenue) or fat (obsolete or orphaned data)?
We at IBM often state that "Information Lifecycle Management (ILM)" is more a lifestyle change than a "fad diet". Figuring out what data you should capture in the first place, where to place it, when to move it, and when to get rid of it, is more important than just buying different tiers of storage hardware. So, for those looking to make new data center resolutions, I suggest the following actions:
- Re-evaluate the metrics you now use, and determine if they are helpful in making decisions and taking action.
- Come up with new ones that are more focused to solve the issues you face.
- Consider storage infrastructure software, such as IBM TotalStorage Productivity Center, to help you gather the information about your SAN, disk and tape systems, calculate the metrics, and automate the appropriate actions.
If you don't know where to start with ILM, certainly IBM can point you to the right solutions, best practices, techniques, and whitepapers.
technorati tags: Glenn Wolsey, Jennette Banks, New Years resolutions, weight reduction, Diet, ILM, Information Lifecycle Management, IBM, TotalStorage Productivity Center
Continuing this week's theme of New Year's Resolutions for the data center, today we'll talk about one that people don't always think about on a personal level, that is to hone your tools and skills.
A long time ago, I used to be a regular speaker at the SHARE user group conference. One of the most attended sessions was Sam Golob presenting the latest CBT Tape set of tools. Over time, this large collection of "mainframe shareware" was handed out on 3480 tape cartridges, then on CDs, and finally made downloadable off the web. Sam's main point, which I remember to this day, was that everyone who has a job should figure out what tools they use, keep those tools functioning properly, and learn to use them well.
Later, I took some cooking classes at a culinary school. Among other things, we learned:
- A sharp knife is safer and easier to use than a dull one, resulting in fewer accidents
- Knowing what you are doing is the difference between food that is "simply awful" and food that is "awfully simple" to prepare.
- A well trained chef can prepare most meals with just a sharp knife and wooden spoon.
This last point hits close to home, as many people like me have too many tools that they do not use often enough to know how to use them well. Do I really need my strawberry corer, garlic press, or a tray designed for the storage and delivery of deviled eggs?
The same could be said about software tools. What tools do you use in your job? Do you feel you know how to take full advantage of their power and capabilities? If you develop software, do you know all the features of your debugging tools? If you develop advertising or marketing materials, do you know all the features of your photo or video editing software? If you manage storage in a data center, do you know all the tools for managing your storage area network (SAN), disk systems, and tape libraries, and the reporting tools to identify all of your files and databases across your entire IT environment? I would not be surprised if you could replace a whole mess of tools with just one, such as the IBM TotalStorage Productivity Center.
For me, I resolve to learn how to better use the Lotus Notes e-mail client, and perhaps the new Office 2007.
technorati tags: Sam Golob, SHARE, CBT, tape, disk, SAN, mainframe, cooking, tools, IBM, TotalStorage Productivity Center, Lotus Notes, Office 2007
Wrapping up this week's theme of New Year's Resolutions for the data center, the New York Times argues we should go easy on the resolutions, so I'll conclude with reducing stress. Lighten up! Relax, and try not to take your job so seriously.
(I know you're probably thinking, "That's easy for you to say, Mr. paid-to-play-golf-with-clients big-shot executive, but what about the rest of us?" or perhaps "I can't do that! My job is so important that if I didn't take it so seriously, my company would go bankrupt, my industry would falter, and global economies would collapse!" I can understand. Over 70 percent of all of the world's business transactions travel through, or sit on, IBM equipment, so you can imagine how stressful my past 20 years have been. Bear with me, read on, and hopefully you might benefit from my past experiences.)
- Laugh out loud
- Every time someone laughs out loud at the office here in Tucson, everyone else within earshot stops what they are doing and rushes over to see what is so funny. Likewise, every time it rains during normal working hours, people stop what they are doing, and run to the windows to see what water coming out of the sky looks like. Do you work in a "dry" climate, where laughter, like rain, is a rare occurrence?
Recognizing the benefits of laughter on reducing stress and improving health, my friends and I started the Tucson Laughter Club back in 2004. There are hundreds of laughter clubs across the United States and the rest of the world, sometimes referred to as "Laughter Yoga" groups. Those of you readers in Tucson are welcome to join us; our next meeting is January 13.
Look to see if there is one near you, or start your own. Until then, laugh while you watch this funny storage-related video from the folks over at Sun StorageTek, or this one from Kodak.
- Every laughter club meeting starts out with some breathing exercises. If you feel stressed, try this simple 2-minute relaxation technique.
- I'm not just talking about stretching our muscles here.
- The next time you tell the story, stretch the truth a little, aim for 70 percent truth, 30 percent embellishment.
- The next time you are making plans for lunch or dinner, stretch yourself out of your comfort zone and try something new and different: a new restaurant, or a different type of cuisine.
- The next time you get called by a phone solicitor or telemarketer, stretch out the conversation, have fun with it, ask them what they are wearing, and share with them all the successes and problems you have at work, until THEY hang up first.
- The next time you are throwing a party, stretch your invite list to include someone you normally wouldn't invite, perhaps a neighbor or co-worker.
- The next time someone insists you make a list, stretch the process out, to give you more time to drink their expensive wine.
- When you hear the phrase Work/Life Balance, do you think to yourself, "I would settle for Work/Sleep Balance!" You're not alone. The National Sleep Foundation reports that most Americans don't get enough sleep, which can cause various health problems.
So get plenty of sleep at home, before you get fired for sleeping on the job.
- Get along with others
- Peace on earth starts one relationship at a time. I found this amusing article on Wired, discussing the Top Blogfights of 2006. Can't we all get along? I stopped two blog-fights in 2006, just by pointing to the facts, and setting the record straight.
If there is someone you are not getting along with at work, fix it. Sooner, rather than later. Here are tips to be more assertive.
- Listen to Music
- Can you listen to music where you work? Listening to music has been shown to help reduce stress. I've got my ThinkPad T60 laptop connected to my wireless Bluetooth headset, so I can listen to relaxing music without disturbing anyone else, and not be "tethered" to my system with traditional headphone wires.
In her book, "Lifehacker: 88 Tech Tricks to Turbocharge Your Day", Gina Trapani suggests pink noise. I prefer "The Quiet Earth", an internet radio station on Live365.
So, relax and enjoy your weekend. And remember, when you get back to the office on Monday, it's only ones and zeros.
technorati tags: New Years, resolutions, reducing stress, laughter, Tucson Laughter Club, Laughter Yoga, Sun, StorageTek, Kodak, Work/Life Balance, sleep, blogfights, assertive, music, LifeHacker, Live365, Pink Noise