Shakespeare wrote, "What's in a name? That which we call a rose by any other name would smell as sweet." This week my theme will be names, naming conventions, and how we access information on storage.
Take for example these two sentences:
The Bears beat New Orleans.
Chicago clobbered the Saints.
Though the two sentences appear very different, football fans who watched either or both of yesterday's conference title games would quickly recognize that they refer to the same two teams and the same outcome.
I'll be traveling to Asia next week. While most people call me "Tony", my legal given name is "Anthony", which is what appears on my passport and other legal documents. Most English-speaking countries handle this fine, but it can be confusing in Japan or China, where "A. Pearson" doesn't match "T. Pearson".
In the US, our given and family names are referred to as our "first name" and "last name", reflecting their positional sequence. In much of Asia, the family name comes first and the given name last. To help avoid confusion, we have started adopting the practice of putting the family name in ALL CAPITAL LETTERS, so I would be "Tony PEARSON" while my colleague might be "WONG Francis".
In Japanese, "Mr. JONES" would be "Jones-san". However, Pearson-san is such a tongue-twister that most people just say "Tony-san", which is fine with me. I have been called "Mr. Tony" in a variety of countries, which is perfectly acceptable.
You can call me anything you like, just don't call me late for dinner.
In his blog post on preparation, Seth Godin mentioned an appropriate Swedish saying:
There is no bad weather, just bad clothing.
Appropriate, because it snowed here in Tucson, Arizona on Sunday evening, leaving many of us figuring out how to drive through the stuff on Monday. In my entire lifetime, I have witnessed snow down in the Tucson valley only a handful of times. It got me thinking about coats, and the wonderful schemes for coat check rooms, as an analogy for data access. A lot of people ask me to compare and contrast one technology with another, say block-level virtualization with content-addressable storage, and I always try to find a good analogy to help explain things.
Let's start with the setting. It is snowing outside and people are wearing coats. When they come inside, they check their coats at a coat check room, a large room with rows and rows of racks with hangers. A coat check attendant takes your coat and puts it on a hanger, and gives you a ticket or other identifier that will allow you to retrieve your coat later. The ticket must have sufficient information to retrieve the coat quickly, rather than searching rows and rows of hangers for it.
- Block-based disk storage
You walk to the coat-check desk and tell the attendant to hang your coat on a specific hanger, say hanger number 387. When you come back, you ask for the coat on hanger 387. The coat-check attendant knows exactly where hanger 387 is and is able to retrieve it quickly. Most disk systems use this approach, including the IBM SAN Volume Controller and the IBM DS family of disk systems.
- Name-based disk storage
You walk to the coat-check desk and tell the attendant the name you want to give your coat. An empty hanger is located, and a list of coat names, with their associated hanger numbers, is kept. Upon return, you ask for your coat by name, and the coat-check attendant looks up the matching hanger number and retrieves your coat. This is the scheme used by the IBM System Storage DR550, N series for NAS storage, and the IBM Healthcare and Life Sciences Grid Medical Archive Solution (GMAS).
- Content-addressable storage (CAS)
You walk to the coat-check desk and hand them your coat. The attendant weighs your coat, checks the brand, the size, and the number of buttons and zippers, types it all in, and the computer spits out a "hash code" from 1 to 99999. An empty hanger is found, and the hash code is associated with the hanger number. Upon return, you provide the hash code you were given, and the coat-check attendant looks up the matching hanger number and retrieves your coat. This is the scheme used for some non-erasable, non-rewriteable storage, such as the EMC Centera.
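To make the three coat-check schemes concrete, here is a minimal Python sketch that models each one as a tiny in-memory store. The class names, the 1 to 99999 ticket range, and the use of SHA-256 reduced modulo 99999 as the "hash code" are illustrative assumptions, not a description of how any of the products named above work internally.

```python
import hashlib


class BlockStore:
    """Block-based: the caller picks the hanger (block address) and asks for it by number."""

    def __init__(self, hangers: int):
        self.hangers = [None] * hangers

    def write(self, address: int, coat) -> None:
        self.hangers[address] = coat

    def read(self, address: int):
        return self.hangers[address]


class NameStore:
    """Name-based: the caller picks a name; the attendant picks a free hanger and keeps a directory."""

    def __init__(self, hangers: int):
        self.hangers = [None] * hangers
        self.directory = {}  # coat name -> hanger number

    def write(self, name: str, coat) -> None:
        if name in self.directory:
            raise KeyError(f"name already in use: {name}")
        slot = self.hangers.index(None)  # first empty hanger; ValueError if the room is full
        self.hangers[slot] = coat
        self.directory[name] = slot

    def read(self, name: str):
        return self.hangers[self.directory[name]]


class ContentStore:
    """Content-addressable: the ticket is computed from the coat itself."""

    def __init__(self, hangers: int, ticket_range: int = 99999):
        self.hangers = [None] * hangers
        self.ticket_range = ticket_range
        self.directory = {}  # ticket (hash code) -> hanger number

    def ticket_for(self, coat: bytes) -> int:
        digest = hashlib.sha256(coat).digest()
        return int.from_bytes(digest, "big") % self.ticket_range + 1  # 1..ticket_range

    def write(self, coat: bytes) -> int:
        ticket = self.ticket_for(coat)
        if ticket not in self.directory:  # an existing ticket is assumed to be the same coat
            slot = self.hangers.index(None)
            self.hangers[slot] = coat
            self.directory[ticket] = slot
        return ticket

    def read(self, ticket: int) -> bytes:
        return self.hangers[self.directory[ticket]]
```

The key difference is who chooses the identifier: the caller in the first two schemes, the store in the third. Note that `ContentStore.write` simply assumes an already-issued ticket belongs to the same coat, which is the shortcut questioned in the paragraphs that follow.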
IBM invented hash codes in 1953 as a way to speed up searches. For example, if you want to look up a word in the dictionary, knowing the first letter of the word makes it much quicker, because you can thumb directly to that section. A hash code was intended to give a more even distribution than first letters do: if a million words are stored in a "hash code dictionary", you calculate the hash code of the word you want, then look only through the section of words associated with that specific hash code number.
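The dictionary analogy can be sketched in a few lines of Python. Indexing by first letter piles many words into popular letters; indexing by a hash code aims to spread them more evenly, and either way a lookup scans only one bucket. The word list, the bucket count of 16, and the choice of the standard library's CRC-32 as the hash function are assumptions for illustration.

```python
import zlib
from collections import defaultdict

WORDS = ["apple", "apricot", "avocado", "banana", "blueberry", "cherry", "grape", "lemon"]


def first_letter(word: str) -> str:
    return word[0].lower()


def crc_bucket(word: str, buckets: int = 16) -> int:
    # CRC-32 is deterministic across runs, unlike Python's salted built-in hash() for strings.
    return zlib.crc32(word.encode("utf-8")) % buckets


def build_index(words, key):
    """Group words into buckets according to the key function."""
    index = defaultdict(list)
    for w in words:
        index[key(w)].append(w)
    return index


def contains(index, key, word: str) -> bool:
    # Only the single bucket that key(word) points at is scanned.
    return word in index.get(key(word), [])


by_letter = build_index(WORDS, first_letter)
by_hash = build_index(WORDS, crc_bucket)
print("largest first-letter bucket:", max(len(v) for v in by_letter.values()))
print("largest hash bucket:", max(len(v) for v in by_hash.values()))
```

With a real dictionary, the first-letter buckets are badly lopsided (far more words start with "s" than with "x"), which is the unevenness a hash code is meant to smooth out.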
A problem arises when you generate hash codes for storage: it is possible for two different pieces of data to resolve to the same hash code, which is called a collision. When an application tries to write a piece of data whose hash code already exists, the system has two choices. One is to compare the incoming data with the data already stored and confirm they are identical, but that can be time-consuming. The other is to just assume they are identical and reject the second copy, a process often referred to as "de-duplication".
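The two responses can be contrasted in a short Python sketch. To make a collision easy to trigger, it uses a deliberately weak hash (the sum of the bytes modulo 100), under which `b"ab"` and `b"ba"` share a code; real content-addressable systems use far stronger hashes, which makes collisions rarer but not impossible. The class and function names are hypothetical.

```python
def toy_hash(data: bytes, buckets: int = 100) -> int:
    """Deliberately weak hash (byte sum) so that a collision is easy to demonstrate."""
    return sum(data) % buckets


class DedupStore:
    def __init__(self, verify: bool):
        self.verify = verify  # True: compare bytes on a hash match; False: trust the hash
        self.objects = {}     # hash code -> stored bytes

    def write(self, data: bytes) -> str:
        code = toy_hash(data)
        existing = self.objects.get(code)
        if existing is None:
            self.objects[code] = data
            return "stored"
        if not self.verify:
            # Assume identical and drop the incoming copy, without looking at the bytes.
            return "deduplicated"
        if existing == data:
            return "deduplicated"  # confirmed true duplicate
        return "collision"         # different data, same code: flag it instead of discarding it


trusting = DedupStore(verify=False)
careful = DedupStore(verify=True)
for store in (trusting, careful):
    store.write(b"ab")
print(trusting.write(b"ba"))  # prints deduplicated: b"ba" is silently thrown away
print(careful.write(b"ba"))   # prints collision: the mismatch is caught
```

The verifying store pays for an extra comparison on every hash match; the trusting store is faster but can lose data without any error.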
What's the chance of getting a collision for data that is really different? Take, for example, the famous birthday paradox. Suppose the coat check room assigned the hanger based on your birthday (month and day). How many coats before you run the risk of two people turning in coats with the same birthday? After only 23 people, the likelihood is just over 50%. At 60 people, it rises above 99%.
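The birthday figures follow from multiplying the chances that each new arrival avoids every slot already taken. A minimal Python sketch, assuming every slot (day, or hash code) is equally likely, which is the best case for a hash function:

```python
def collision_probability(n: int, slots: int = 365) -> float:
    """Chance that at least two of n items share a slot, with all slots equally likely."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (slots - i) / slots  # item i must avoid the i slots already taken
    return 1.0 - p_all_distinct


def items_for_even_odds(slots: int) -> int:
    """Smallest n whose collision probability reaches 50 percent."""
    n = 1
    while collision_probability(n, slots) < 0.5:
        n += 1
    return n


for people in (22, 23, 60):
    print(people, round(collision_probability(people), 3))
print("coats for even odds with 99,999 tickets:", items_for_even_odds(99_999))
```

Applied to the 1 to 99999 hash code range from the content-addressable example, the same arithmetic gives even odds of two different coats sharing a ticket after fewer than 400 coats.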
For this reason, IBM does not offer content-addressable storage. For non-erasable, non-rewriteable storage, the IBM System Storage DR550 requires the application to give each object a name, and that name is then used to store the data, eliminating the possibility that data might accidentally be thrown away.
It's safer that way.
technorati tags: Seth Godin, Swedish, saying, bad, weather, clothing, snow, Tucson, coat, check, room, IBM, block-based, disk, storage, DR550, N series, NAS, healthcare, life sciences, grid, medical, archive, solution, GMAS, content-addressable, CAS, EMC, Centera, hash code, collision, de-duplication, birthday, paradox
Happy New Year!
This year I resolve to be more consistent in my blogging, and my goal is to give you one to five entries per week, every week, based on advice from Glenn Wolsey, Jennette Banks, and others. Some weeks will have a running theme, so rather than writing super-long entries that cover everything I can think of on a topic, I will keep the entries short and readable. This week is a good time to review last year's "New Year's Resolutions" and to make new ones for 2007. I will discuss actions that companies can adopt for their data centers.
A common resolution is to lose weight, as in this Dilbert comic. Last year, I resolved to lose weight in 2006, and I am delighted to report that I lost eight pounds. When people ask for the secret of my success, I whisper in their ear, "Eat less, exercise more." In general, people (and companies) know what to do but just don't do it, which Pfeffer and Sutton document in their book The Knowing-Doing Gap. In my case, it involved a lifestyle change: I exercised at a gym in Tucson three times per week with a personal trainer, and revamped my diet.
Not everyone subscribes to the "eat less, exercise more" philosophy. For example, Ric Watson argues in his blog that you can eat fewer calories but more actual volume by choosing the right foods. This brings up the issue of "metrics" that most data centers are familiar with. Last year, I read the book "You: On a Diet", which explains that it is better to focus on "waist reduction", as measured in inches around your mid-section at the belly button, than on "weight reduction", as measured in pounds. This year, I resolve to get down to 35 inches by the end of 2007.
The problem with measuring "weight" is that you are weighing bones, muscle and fat together. A person can gain ten pounds of muscle, lose ten pounds of fat, and the scale would indicate no progress. The same problem occurs in data centers. How many TB of data do you have? Storage admins can easily tell you, but can they tell you how much of it is bone (data needed for operating infrastructure), muscle (data used in daily operations that generates revenue) or fat (obsolete or orphaned data)?
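As a sketch of a "composition" metric rather than a raw-capacity one, the Python below totals capacity by category. The bone, muscle and fat labels come from the analogy above; the data set names, the sizes, and the assumption that an administrator tags each data set with a category are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_tb: float
    category: str  # "bone", "muscle", or "fat" (hypothetical tags assigned by an admin)


def composition(datasets):
    """Percent of total capacity in each category, rounded to 0.1."""
    totals = defaultdict(float)
    for d in datasets:
        totals[d.category] += d.size_tb
    grand_total = sum(totals.values())
    return {cat: round(100 * tb / grand_total, 1) for cat, tb in totals.items()}


inventory = [
    DataSet("os_images", 2.0, "bone"),
    DataSet("orders_db", 5.0, "muscle"),
    DataSet("old_projects", 3.0, "fat"),
]
print(composition(inventory))  # prints {'bone': 20.0, 'muscle': 50.0, 'fat': 30.0}
```

Two snapshots with the same total TB can show very different compositions, which is the same point as the weight-versus-waist comparison.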
We at IBM often state that "Information Lifecycle Management (ILM)" is more a lifestyle change than a "fad diet". Figuring out what data you should capture in the first place, where to place it, when to move it, and when to get rid of it is more important than just buying different tiers of storage hardware. So, for those looking to make new data center resolutions, I suggest the following actions:
- Re-evaluate the metrics you now use, and determine if they are helpful in making decisions and taking action.
- Come up with new ones that are more focused on solving the issues you face.
- Consider storage infrastructure software, such as IBM TotalStorage Productivity Center, to help you gather the information about your SAN, disk and tape systems, calculate the metrics, and automate the appropriate actions.
If you don't know where to start with ILM, IBM can certainly point you to the right solutions, best practices, techniques, and whitepapers.
technorati tags: Glenn Wolsey, Jennette Banks, New Years resolutions, weight reduction, Diet, ILM, Information Lifecycle Management, IBM, TotalStorage Productivity Center
Continuing this week's theme of New Year's Resolutions for the data center, today we'll talk about one that people don't always think about on a personal level: honing your tools and skills.
A long time ago, I used to be a regular speaker at the SHARE user group conference. One of the most attended sessions was Sam Golob presenting the latest CBT Tape set of tools. Over time, this large collection of "mainframe shareware" was distributed on 3480 tape cartridges, then on CDs, and finally made downloadable from the web. Sam's main point, which I remember to this day, was that everyone who has a job should figure out what tools they use, keep those tools functioning properly, and learn to use them well.
Later, I took some cooking classes at a culinary school. Among other things, we learned:
- A sharp knife is safer and easier to use than a dull one, resulting in fewer accidents
- Knowing what you are doing is the difference between food that is "simply awful" and food that is "awfully simple" to prepare.
- A well trained chef can prepare most meals with just a sharp knife and wooden spoon.
This last point hits close to home, as many people like me have too many tools that they do not use often enough to know how to use them well. Do I really need my strawberry corer, garlic press, or a tray designed for the storage and delivery of deviled eggs?
The same could be said about software tools. What tools do you use in your job? Do you feel you know how to take full advantage of their power and capabilities? If you develop software, do you know all the features of your debugging tools? If you develop advertising or marketing materials, do you know all the features of your photo or video editing software? If you manage storage in a data center, do you know all the tools for managing your storage area network (SAN), disk systems, and tape libraries, and the reporting tools that identify all of your files and databases across your entire IT environment? I would not be surprised if you could replace a whole mess of tools with just one, such as IBM TotalStorage Productivity Center.
For my part, I resolve to learn how to better use the Lotus Notes e-mail client, and perhaps the new Office 2007.
technorati tags: Sam Golob, SHARE, CBT, tape, disk, SAN, mainframe, cooking, tools, IBM, TotalStorage Productivity Center, Lotus Notes, Office 2007
Wrapping up this week's theme of New Year's Resolutions for the data center: the New York Times argues we should go easy on the resolutions, so I'll conclude with reducing stress. Lighten up! Relax, and try not to take your job so seriously.
(I know you're probably thinking, "That's easy for you to say, Mr. paid-to-play-golf-with-clients big-shot executive, but what about the rest of us?" or perhaps "I can't do that! My job is so important that if I didn't take it so seriously, my company would go bankrupt, my industry would falter, and global economies would collapse!" I understand. Over 70 percent of the world's business transactions travel through, or sit on, IBM equipment, so you can imagine how stressful my past 20 years have been. Bear with me, read on, and hopefully you might benefit from my past experiences.)
- Laugh out loud
- Every time someone laughs out loud at the office here in Tucson, everyone else within earshot stops what they are doing and rushes over to see what is so funny. Likewise, every time it rains during normal working hours, people stop what they are doing and run to the windows to see what water coming out of the sky looks like. Do you work in a "dry" climate, where laughter, like rain, is a rare occurrence?
Recognizing the benefits of laughter for reducing stress and improving health, my friends and I started the Tucson Laughter Club back in 2004. There are hundreds of laughter clubs across the United States and the rest of the world, sometimes referred to as "Laughter Yoga" groups. Those of you readers in Tucson are welcome to join us; our next meeting is January 13.
Look to see if there is one near you, or start your own. Until then, laugh while you watch this funny storage-related video from the folks over at Sun StorageTek, or this one from Kodak.
- Breathe
- Every laughter club meeting starts out with some breathing exercises. If you feel stressed, try this simple 2-minute relaxation technique.
- Stretch
- I'm not just talking about stretching our muscles here.
- The next time you tell a story, stretch the truth a little: aim for 70 percent truth, 30 percent embellishment.
- The next time you are making plans for lunch or dinner, stretch yourself out of your comfort zone and try something new and different: a new restaurant, or a different type of cuisine.
- The next time you get called by a phone solicitor or telemarketer, stretch out the conversation and have fun with it: ask them what they are wearing, and share with them all the successes and problems you have at work, until THEY hang up first.
- The next time you are throwing a party, stretch your invite list to include someone you normally wouldn't invite, perhaps a neighbor or co-worker.
- The next time someone insists you make a list, stretch the process out, to give you more time to drink their expensive wine.
- Sleep
- When you hear the phrase Work/Life Balance, do you think to yourself, "I would settle for Work/Sleep Balance!" You're not alone. The National Sleep Foundation reports that most Americans don't get enough sleep, which can cause various health problems.
So get plenty of sleep at home, before you get fired for sleeping on the job.
- Get along with others
- Peace on earth starts one relationship at a time. I found this amusing article on Wired discussing the Top Blogfights of 2006. Can't we all get along? I stopped two blog-fights in 2006, just by pointing to the facts and setting the record straight.
If there is someone you are not getting along with at work, fix it, sooner rather than later. Here are some tips on being more assertive.
- Listen to Music
- Can you listen to music where you work? Listening to music has been shown to help reduce stress. I've got my ThinkPad T60 laptop connected to my wireless Bluetooth headset, so I can listen to relaxing music without disturbing anyone else, and without being "tethered" to my system by traditional headphone wires.
In her book "Lifehacker: 88 Tech Tricks to Turbocharge Your Day", Gina Trapani suggests pink noise. I prefer "The Quiet Earth", an internet radio station on Live365.
So, relax and enjoy your weekend. And remember, when you get back to the office on Monday, it's only ones and zeros.
technorati tags: New Years, resolutions, reducing stress, laughter, Tucson Laughter Club, Laughter Yoga, Sun, StorageTek, Kodak, Work/Life Balance, sleep, blogfights, assertive, music, LifeHacker, Live365, Pink Noise
Continuing this week's theme of New Year's Resolutions for the data center, today we'll talk about one that many people make for their own personal lives: staying on a budget.
Often, when faced with a tightening budget, we try to make more use of what we already have. Tell someone they are only using 10 percent of their brain, and they immediately believe you; but tell them they are only using 30 percent of their storage, and they ask for a whitepaper, magazine article, or clarification on how that percentage is calculated. I actually visited a customer that was only using 6 percent of the storage attached to their Windows servers!
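Part of the argument over a utilization percentage is which capacity figure goes in the denominator. A small Python sketch, with hypothetical capacity numbers, shows how the same amount of data can be reported as 6 percent or 10 percent:

```python
def utilization(used_gb: float, capacity_gb: float) -> float:
    """Percent of a capacity figure that actually holds data."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used_gb / capacity_gb


# Hypothetical figures for a group of Windows servers.
raw_gb = 10_000       # disk capacity attached to the servers
allocated_gb = 6_000  # capacity formatted and handed out as volumes
used_gb = 600         # data actually written

print(f"used vs. raw:       {utilization(used_gb, raw_gb):.0f}%")        # 6%
print(f"used vs. allocated: {utilization(used_gb, allocated_gb):.0f}%")  # 10%
```

By the same formula, the 100GB request that ends up holding only 15GB of data, described below, scores 15 percent.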
So, to help those of you making data center resolutions to stay on budget, the terms to remember are "Reduce", "Reuse" and "Recycle".
- Reduce
- When people come to request storage, are they being reasonable about what they need today, or are they asking for what they might need over the next three years? They might need 50GB but ask for 100GB in case they grow, and a year later you find they have only 15GB of data on it. On the flip side, some people ask for what they need, but storage admins give out more, just so they aren't bothered so often when growth happens. Finally, I have seen this formalized into fixed-size LUNs: all the disk is carved into big 100GB pieces, so if you need 20GB, here's one big enough with plenty of room to grow.
If you are going to stay on a budget, remember that storage bought today costs more than the same storage bought next year: prices for both disk and tape drop about 30 percent per year on average, on a dollar-per-MB basis. If there is any way to postpone giving out storage until it is actually needed, you can save a bundle of money. Timing is everything! In the event of a disaster, getting immediate replacement disk can be very expensive, but if you can wait just two weeks, you can negotiate a better deal. I thought of this while going to the movie theatre yesterday. A hot dog and a bottle of water cost $8.00, but if you are able to wait two hours and eat after the movie, you can get a much better meal for less.
- Reuse
- A lot of companies buy new storage because their existing storage isn't fast enough, or doesn't have the latest copy services. This can easily be solved with an IBM SAN Volume Controller (SVC). The SVC can virtualize slower storage that lacks advanced functions and present to your application hosts virtual disks that are faster and have all the latest disk-to-disk copy services, like FlashCopy, Metro Mirror, and Global Mirror.
Chances are, you have unused disk capacity spread across all your storage today, but perhaps it is formatted into small LUNs. The SVC can combine that capacity and let you carve up big LUNs at the sizes you need. This is like taking all those tiny pieces of soap in your shower and forming a new bar of soap, or taking all the crumbs at the bottom of your bread box and making a new slice of bread. And the virtual LUNs are dynamically expandable, so give out only the amount people need today, since it is simple to expand them later.
- Recycle
- Of my 13 patents, the first will always be my favorite: a function called "RECYCLE" for the Data Facility Storage Management Subsystem Hierarchical Storage Manager (DFSMShsm) product, which is now a component of the IBM z/OS operating system. Tapes could contain hundreds or thousands of files, such as backup versions or archive copies, and these expired on different dates. As a result, a tape would be written 100 percent full and then, over time, decrease in valid data to 80, 60, 40, and 20 percent until it hit 0 percent. In some cases, a single file could hold an entire tape hostage. RECYCLE was able to read the valid data off tapes that were perhaps less than 20 percent full and consolidate it onto fewer tapes. As a result, a whole bunch of tapes could be returned to the scratch pool and reused immediately for other workloads. This also helps in moving to newer, higher-capacity cartridges, such as the new 700GB cartridge that IBM co-developed with FujiFilm. (This RECYCLE function also exists in our IBM Tivoli Storage Manager software and our Virtual Tape Server, but is called "reclamation" there, to avoid confusion on searches.)
When evaluating your use of tape, determine if you are making best use of the tapes you have now, and perhaps a RECYCLE (or reclamation) scheme may be in order. Fewer tapes can save money in many ways, such as reduced storage costs, and reduced courier costs to send the tapes offsite. Tape media can still be 10-20 times less expensive than disk, based on full capacity.
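The consolidation idea behind RECYCLE (or reclamation) can be sketched in Python. This is a hypothetical illustration of the general technique, not the DFSMShsm or Tivoli Storage Manager code: tapes below a valid-data threshold have their remaining files copied onto new output tapes, and the emptied cartridges go back to the scratch pool. The `Tape` class, the volser naming, and the assumption that every file fits on one cartridge are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Tape:
    volser: str
    capacity_gb: float
    files: dict = field(default_factory=dict)  # file name -> size in GB, valid (unexpired) data only

    @property
    def used_gb(self) -> float:
        return sum(self.files.values())

    @property
    def percent_valid(self) -> float:
        return 100.0 * self.used_gb / self.capacity_gb


def recycle(tapes, threshold_pct=20.0, new_capacity_gb=700.0):
    """Consolidate valid files from sparsely filled tapes onto new output tapes.

    Returns (tapes still holding data, volsers returned to the scratch pool).
    Assumes every file fits on one output cartridge.
    """
    keep = [t for t in tapes if t.percent_valid >= threshold_pct]
    sources = [t for t in tapes if t.percent_valid < threshold_pct]
    outputs, scratch = [], []
    current = None
    for src in sources:
        for name, size in src.files.items():
            if current is None or current.used_gb + size > current.capacity_gb:
                current = Tape(f"NEW{len(outputs) + 1:03d}", new_capacity_gb)
                outputs.append(current)
            current.files[name] = size
        src.files.clear()           # nothing valid is left on the source cartridge
        scratch.append(src.volser)  # so it can be reused for other workloads
    return keep + outputs, scratch
```

The 20 percent threshold mirrors the example in the text, and the 700 GB default echoes the new cartridge mentioned above; both are parameters here, not product settings.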
technorati tags: IBM, storage utilization, RECYCLE, Tivoli Storage Manager, SAN Volume Controller, SVC, tape, disk, FujiFilm, DFSMS, HSM, DFSMShsm, Virtual Tape Server, brains
Continuing this week's theme on travel, conferences, and Japan, I provide three more "survival words" in Japanese. These might seem like an odd trio, but they come in very handy.
- Domo -- "very much"
- This is short for "Domo Arigato", which means "Thank you very much". If you just say "Domo", people assume you mean the full phrase, and it gets the same point across. Some of my older readers may remember the Styx song "Domo Arigato Mr. Roboto", which is perhaps the only song I can think of that mentions both IBM and Japan.
You're wondering who I am-machine or mannequin
With parts made in Japan, I am the modern man
I've got a secret I've been hiding under my skin
My heart is human, my blood is boiling, my brain I.B.M.
- Doko -- "where?"
- This word is great for trying to find places or things, such as "Where is the restroom?" Sometimes making an attempt to speak a country's official language will garner some respect and appreciation from the locals. If you're lost and looking for help finding something, this is a great time to be appreciated.
Upon arrival at Narita, I was planning to take the N'EX high-speed train into town, but there was a train accident on the JR line, so I ended up taking the Keisei train over to Nippori station, then transferring to a different line. Not what I had planned, but an adventure nonetheless.
- Dozo -- "after you" or "go ahead"
- Japan is all about showing respect to your elders, upper management, and others held in high esteem or authority. When arriving at a dinner table or leaving an elevator, this phrase lets them know you are respectful by letting them go first. Americans always seem rushed, so it is good to show I can be patient also.
technorati tags: IBM, Japan, Japanese, Styx, survival, phrases, respect
Continuing this week's theme on travel, conferences, and Japan, I saw two items in the news that seem to follow a common theme.
- According to "The Daily Yomiuri", a local Japanese paper, "double happy weddings" are becoming more and more popular in Japan. These would be called "shotgun" weddings in the US, but in Japan, couples pay extra to have a wedding between the fifth and seventh month of pregnancy. As Dave Barry would say, I am not making this up. 27% of couples in Japan got married during or after pregnancy. The logic is that they can celebrate both events with one ceremony. Many couples believe that the primary purpose of marriage is to have children, and some that fail to have children suffer terrible anguish or divorce. Waiting until pregnancy helps ensure the couple will be "successful" in this regard.
- IBM acquires Softek, a software company that develops a product called Transparent Data Migration Facility (TDMF) to move mainframe data from one disk system to another while applications are running. This can be used, for example, to move data from outdated disk systems to IBM disk systems. This is not to be confused with IBM's archive and retention software partner, Princeton Softech.
Softek is the software spin-off of Fujitsu (a Japanese computer hardware manufacturer). For a while, Fujitsu made IBM-compatible mainframe servers, but was not successful at developing its own system software, relying heavily on IBM for this. Unable to compete against IBM, it stopped making mainframe servers, but continues making other kinds of hardware equipment.
With TDMF, the process of moving data is simple. The software runs on z/OS, copies existing data from source volumes on the old array to destination volumes on the new device, and intercepts all writes intended for the source volumes so that a copy also goes to the destination. Systems can run with old and new equipment side by side for a few weeks, with the new device staying in sync with the old. When the client is ready to cross over, the systems are pointed to the new disk, and the old disk systems are detached and removed from the sysplex.
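The mirror-then-cut-over pattern described above can be modeled in a few lines of Python. This is a toy, single-threaded sketch with dictionaries standing in for volumes; it is not how TDMF is implemented, and the class and method names are hypothetical.

```python
class MirroringMigrator:
    """Toy model of a non-disruptive volume move: mirror new writes, copy old data, then cut over."""

    def __init__(self, source: dict, target: dict):
        self.source = source      # old volume: block number -> data
        self.target = target      # new volume
        self.active = source      # where reads and primary writes go
        self.mirroring = True

    def write(self, block: int, data) -> None:
        self.active[block] = data
        if self.mirroring:
            self.target[block] = data  # intercepted write also sent to the new device

    def read(self, block: int):
        return self.active[block]

    def background_copy(self) -> None:
        # Copy blocks that existed before mirroring began. In this single-threaded toy,
        # the source always holds the latest data, so a plain copy is safe.
        for block, data in self.source.items():
            self.target[block] = data

    def in_sync(self) -> bool:
        return self.source == self.target

    def cut_over(self) -> None:
        if not self.in_sync():
            raise RuntimeError("new volume is not yet in sync")
        self.active = self.target  # hosts now use the new device
        self.mirroring = False     # the old volume can be detached
```

A real tool also has to handle writes that race with the background copy and must survive failures mid-migration; the sketch leaves both out to show only the ordering of the steps.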
Afraid that installing TDMF will mess with your applications? IBM Global Technology Services (GTS) is able to roll in a separate mainframe, move the data, then disconnect it along with the old storage.
(For customers running Linux, UNIX or Windows on other platforms, IBM offers SAN Volume Controller (SVC). While SVC is not marketed as a "data migration device" per se, it does have this capability. Many clients were able to cost-justify the purchase of an SVC to move data from old storage to new, in similar fashion to how TDMF works on the mainframe.)
What do these stories have to do with one another, other than both relating to Japan? IBM has been using TDMF for years as part of a service offering to move data from one disk system to another. Since Sam Palmisano took over in 2002, IBM has acquired 51 companies, 31 of them software companies. Often, these acquisitions have been "successful", turning profitable quickly, because IBM was already familiar with the companies it acquired, in much the same way that husbands are already familiar with their brides-to-be at a "double happy wedding".
So, welcome Softek! It looks like it's time to celebrate again!
technorati tags: IBM, Japan, Daily Yomiuri, double happy, wedding, shotgun, Dave Barry, Softek, TDMF, z/OS, Fujitsu
Today, January 16, IBM launches its latest disk system, the DS3000 series.
There are actually three products in the DS3000 series:
The DS3200 is a 2U-high, 12-drive system that attaches to servers via a 3Gbps Serial Attached SCSI (SAS) interface. You can expand this to 48 drives by adding EXP3000 expansion units. Here are the DS3200 specifications.
The DS3400 is a 2U-high, 12-drive system that attaches to servers via a 4Gbps Fibre Channel (FC) interface. You can expand this to 48 drives by adding EXP3000 expansion units. Here are the DS3400 specifications.
The EXP3000 is a 2U-high, 12-drive expansion drawer. It was announced back in August 2006, but is part of the overall DS3000 series. It can be used directly with servers, but is also designed to be attached to the back of the DS3200 or DS3400 to increase capacity. Here are the EXP3000 specifications.
With this announcement, IBM provides entry-level storage at the "less-than-$5000" price point, with support for intermixing 10K and 15K RPM drives, and scalability up to 14.4 TB capacity. This would be ideal storage for HP, Dell, IBM System x and BladeCenter servers.
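A quick back-of-the-envelope check of the figures above, assuming the 14.4 TB maximum is reached with all 48 drive bays filled with identical drives (the per-drive size below is inferred from that assumption, not stated in the announcement):

```python
DRIVES_PER_ENCLOSURE = 12  # DS3200, DS3400 and EXP3000 are each 12-drive units
MAX_DRIVES = 48
MAX_CAPACITY_GB = 14_400   # 14.4 TB, decimal units

enclosures = MAX_DRIVES // DRIVES_PER_ENCLOSURE  # controller enclosure plus expansion drawers
expansion_units = enclosures - 1                 # EXP3000s behind one DS3200 or DS3400
drive_size_gb = MAX_CAPACITY_GB / MAX_DRIVES     # implied size of each drive

print(f"{expansion_units} EXP3000 units, {drive_size_gb:.0f} GB per drive")
# prints "3 EXP3000 units, 300 GB per drive"
```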
technorati tags: IBM, disk, DS3000, DS3200, DS3400, EXP3000, HP, Dell, SAS, SATA, FC
Continuing my theme on naming conventions, this time I will talk about finding information.
Industry analysts estimate that information workers spend as much as 30 percent of their working day just looking for the information they need, as mentioned in an interview with Jeff Teper of Microsoft. A second study, of middle managers, found similar results and is discussed in Information Architecture and DM Review.
Take, for example, looking for information on a specific person. If you were searching for "Barack Obama" or "Lulit Shiferaw", then perhaps you will find exactly the person you were looking for.
On the other hand, some names are more common. I've met my share of people named John Smith, Jennifer Jones, or Mark Johnson. While PEARSON does not even make the top 100 list of family names here in the US, it is still fairly common.
People looking for me may realize that I am not the Head of Australian Economics at ANZ Bank, the author of the book "Don't Read This!", the winner of the Killam Teaching Prize, or the owner of fisheries in Essex. While I do work out at the gym on a regular basis, I look nothing like this famous bodybuilder. There are several others, but I picked just a few to make my point.
At this time, there is only one "Tony Pearson" working at IBM. A while back, there was a second, a woman who spelled her name "Toni" with an "i" at the end. Her job involved deploying storage on AIX platforms, similar enough to mine that we would get each other's mail, and sometimes not realize it was meant for the other.
Dan Santow of Word Wise talks about the difficulties of proper names with accent marks. Accents can make searching harder, but many search tools handle this by stripping off all accent marks before making comparisons.
But what if you wanted to leave those accent marks in, for an exact match?
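Both approaches, stripping accents for a loose match and keeping them for an exact one, can be sketched with Python's standard `unicodedata` module. The function names and sample names are hypothetical; the technique is Unicode decomposition (NFD) followed by dropping combining marks, plus NFC normalization for the exact case.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def names_match(query: str, name: str, exact: bool = False) -> bool:
    if exact:
        # Keep accents, but normalize to NFC so precomposed and decomposed forms still compare equal.
        return unicodedata.normalize("NFC", query) == unicodedata.normalize("NFC", name)
    # Loose match: ignore both accents and letter case.
    return strip_accents(query).casefold() == strip_accents(name).casefold()


print(names_match("jose muller", "José Müller"))              # True
print(names_match("Jose Muller", "José Müller", exact=True))  # False
```

Note that letters that are distinct characters rather than accented forms, such as "ø" or "ß", do not decompose, so this sketch leaves them unchanged.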
technorati tags: Jeff Teper, Accenture, search, names, IBM, Barack Obama, Lulit Shiferaw, Dan Santow, Word Wise, productivity, Essex, bodybuilder, author, fisheries, Microsoft, Australian, Economics, ANZ Bank, Killam Teaching Prize