Continuing my theme on naming conventions, this time I will talk about finding information.
Industry analysts estimate that information workers spend as much as 30 percent of their working day just looking for the information they need, as mentioned in an interview with Jeff Teper, Microsoft. A second study of middle managers found similar results, and is discussed in Information Architecture and DM Review.
Take, for example, looking for information on a specific person. If you search for "Barack Obama" or "Lulit Shiferaw", you will probably find exactly the person you are looking for.
On the other hand, some names are more common. I've met my share of people named John Smith, Jennifer Jones, or Mark Johnson. While PEARSON does not even make the top 100 list of family names here in the US, it is still fairly common.
People looking for me may realize that I am not Head of Australian Economics at ANZ Bank, author of the book "Don't Read This!", winner of the Killam Teaching Prize, or owner of fisheries in Essex. While I do work out at the gym on a regular basis, I look nothing like this famous body builder. There are several others, but I picked just a few to make my point.
At this time, there is only one "Tony Pearson" working at IBM. A while back, there was a second, a woman who spelled her name "Toni" with an "i" at the end. Her job involved deploying storage on AIX platforms, and her job description was similar enough to mine that we would get each other's mail, and sometimes not realize it was meant for the other.
Dan Santow of Word Wise talks about the difficulties of proper names with accent marks. This would make searching worse, but many search tools can handle this, stripping off all accent marks to make the comparisons.
But what if you wanted to leave those accent marks in, for an exact match?
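One way to get both behaviors is Unicode normalization. Here is a minimal Python sketch, not a description of how any particular search tool does it. The function names `strip_accents`, `loose_match` and `exact_match` are my own, chosen for illustration. Loose matching decomposes each character and drops the combining accent marks. Exact matching keeps the accents, but normalizes both names first so that two different encodings of the same accented letter still compare equal.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining accent marks.

    Note: letters with no decomposition (for example the Danish o-slash)
    pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_match(a: str, b: str) -> bool:
    """Accent-insensitive, case-insensitive comparison."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


def exact_match(a: str, b: str) -> bool:
    """Accent-sensitive comparison.

    Both strings are normalized to NFC first, so a precomposed letter and
    the same letter written as base + combining mark compare equal.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

With these, `loose_match("José", "Jose")` is true, while `exact_match("José", "Jose")` is false.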
technorati tags: Jeff Teper, Accenture, search, names, IBM, Barack Obama, Lulit Shiferaw, Dan Santow, Word Wise, productivity, Essex, bodybuilder, author, fish
In his blog post on preparation, Seth Godin mentioned an appropriate Swedish saying:
There is no bad weather, just bad clothing.
Appropriate because it snowed here in Tucson, Arizona on Sunday evening, leaving many of us here figuring out how to drive through the stuff on Monday. In my entire lifetime, I have only witnessed snow down in the Tucson valley a handful of times. It got me thinking about coats, and the wonderful schemes for coat check rooms, as an analogy for data access. A lot of people ask me to compare and contrast one technology with another, say block-level virtualization with content-addressable storage, and so on, and I always try to find a good analogy to help explain things.
Let's start with the setting. It is snowing outside and people are wearing coats. When they come inside, they check their coats at a coat check room, a large room with rows and rows of racks with hangers. A coat check attendant takes your coat and puts it on a hanger, and gives you a ticket or other identifier that will allow you to retrieve your coat later. The ticket must have sufficient information to retrieve the coat quickly, rather than searching rows and rows of hangers for it.
A problem arises when you generate "hash codes" for storage. It is possible for two different pieces of data to resolve to the same hash code. When an application tries to write a piece of data, and it resolves to a hash code that already exists, that is called a collision. One response is to compare the incoming data to the data that is already stored and confirm they are identical, but that can be time consuming. The other response is to just assume they are identical, and reject the second copy, a process often referred to as "de-duplication".
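To make the two responses concrete, here is a small Python sketch of a content-addressed store. It is an illustration under my own assumptions, not how any vendor's product works: `ContentStore`, `HashCollision` and `weak_hash` are hypothetical names, and `weak_hash` is deliberately terrible so that a collision is easy to demonstrate. A real system would use a strong hash such as SHA-256, where collisions are far less likely.

```python
import hashlib
from typing import Callable, Dict


def sha256_hex(data: bytes) -> str:
    """A strong hash: the kind a real system would use."""
    return hashlib.sha256(data).hexdigest()


def weak_hash(data: bytes) -> str:
    """Deliberately poor hash (sum of bytes mod 16), for demonstration only."""
    return format(sum(data) % 16, "x")


class HashCollision(Exception):
    """Raised when different data resolves to an existing hash code."""


class ContentStore:
    """Hypothetical store that files each object under its hash code."""

    def __init__(self, hash_fn: Callable[[bytes], str] = sha256_hex,
                 verify: bool = True) -> None:
        self.hash_fn = hash_fn
        self.verify = verify
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = self.hash_fn(data)
        if key not in self.objects:
            # First time this hash code is seen: store the data.
            self.objects[key] = data
        elif self.verify and self.objects[key] != data:
            # Response 1: compare byte for byte, and report a true collision.
            raise HashCollision(f"different data already stored under {key}")
        # Response 2 (verify=False): assume identical, drop the incoming copy.
        return key
```

With `verify=False` and the weak hash, storing `b"ab"` and then `b"ba"` returns the same ticket both times, and the second object is silently discarded. With `verify=True`, the second `put` raises `HashCollision` instead, at the cost of a full comparison.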
What's the chance of getting a collision for data that is really different? Let's take for example the famous Birthday paradox. Suppose the coat check room assigned the hanger based on your birthday (month and day). How many coats before you run the risk of having two people turn in coats with the same birthday? After only 23 people, the likelihood is just over 50%. At 60 people, it goes up to over 99%.
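For anyone who wants to check the arithmetic, the birthday numbers are easy to reproduce. This short Python sketch (the function name `collision_probability` is mine) multiplies together the chances that each new coat lands on a still-unused hanger, assuming 365 equally likely birthdays and ignoring leap years.

```python
def collision_probability(items: int, slots: int = 365) -> float:
    """Chance that at least two of `items` share one of `slots` equally likely values."""
    if items > slots:
        return 1.0  # more coats than hangers: a shared hanger is certain
    p_all_unique = 1.0
    for already_taken in range(items):
        # The next coat must land on one of the hangers still unused.
        p_all_unique *= (slots - already_taken) / slots
    return 1.0 - p_all_unique
```

`collision_probability(23)` comes out just over 0.50, and `collision_probability(60)` just over 0.99. The same function works for any number of slots, so a wider hash code simply means a much larger `slots` value.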
For this reason, IBM does not offer content-addressable storage. For non-erasable, non-rewriteable storage, the IBM System Storage DR550 requires the application to give each object a name, and that name is then used to store the data, eliminating the possibility that data might accidentally be thrown away.
It's safer that way.
technorati tags: Seth Godin, Swedish, saying, bad, weather, clothing, snow, Tucson, coat, check, room, IBM, block-based, disk, storage, DR550, N series, NAS, healthcare, life sciences, grid, medical, archive, solution, GMAS, cont
Wrapping up my week's theme of "diversity", with posts on a diverse set of topics, today I will suggest ways to spend your time while you are walking 10,000 steps per day, as recommended by the authors of the book "You: On a Diet".
(If you thought this was about the 10,000 steps it might take to implement a storage solution, you should switch over to IBM as your storage vendor. For example, the DS3200 and DS3400 can be implemented in as little as SIX steps. That's pretty cool.)
Blogs like Lifehacker are an excellent resource for neat little tips and tricks to help you throughout your day, like how to use your iPod, cell phone or computer better, for example. These suggestions are based on the idea that you can walk your 10,000 steps with access to an iPod and cell phone.
Well, that's three suggestions. The next time you complain that there is no time to walk, you now have no excuse.
Today, January 16, IBM launches its latest disk system, the DS3000 series.
There are actually three products in the DS3000 series:
With this announcement, IBM provides entry-level storage at the "less-than-$5000" price point, with support for intermix of 10K and 15K RPM drives, and scalable up to 14.4 TB capacity. This would be ideal storage for HP, Dell, IBM System x and BladeCenter servers.
Last week, Paul Weinberg of eChannelLine.com asked "Is this the year of the SAN (again)?" So, I thought this week I would cover my thoughts and opinions on storage networking. We often focus on servers or storage devices, and forget that the network in between is an entire world unto itself.
I believe Mr. Weinberg is basing this on the idea that in 2007, over 50 percent of disk will be attached over SAN, edging out the alternative: Direct Attached Storage (DAS). But perhaps 50 percent is the wrong number to focus on. In 2007, the United Nations estimates that cities will surpass rural areas, with just over 50 percent of the world's population. Does that make this the "Year of the City"? Of course not.
Instead, I prefer to use the methodology that Malcolm Gladwell uses in his book, The Tipping Point. (I have read this book and highly recommend it!) Gladwell indicates that the tipping point happens at the start of the epidemic, not when it is half over. Isn't it better to celebrate the sweet 16 debutante ball when young ladies have completed their years of training and preparation, and are ready to be introduced to the rest of the world, rather than after they are thirty-something, married with children?
Let's explore some of the history. Stuart Kendric has a nice 7-page summary on the History & Plumbing of SANs.
IBM announced the first SAN technology, called Enterprise Systems Connection (ESCON), way back in September 1990. This allowed multiple mainframe servers to connect to multiple storage systems over equipment called "ESCON Directors" that directed traffic from point A to point B. Before this, mainframes sent "Channel Command Words", or CCWs, across parallel "bus and tag" copper cables. ESCON was serial over fiber optic wiring. SANs solved two problems: first, they reduced the "rat's nest" of cables between many servers and many storage systems, and second, they extended the distance between server and storage device.
For distributed systems running UNIX or Windows, the CCW-equivalent over parallel cables was called Small Computer System Interface (SCSI). The SCSI command set had over 1000 command words, so for its Advanced Technology (AT) personal computers (PC AT), IBM introduced a subset of SCSI commands called ATA (Advanced Technology Attachment). ATA drives supported fewer commands, ran at slower speeds, and were manufactured with a less rigorous process. Today ATA drives are about 55 percent of the cost per MB of comparable SCSI drives.
Anyone who has ever opened their PC and found flat ribbon cable with eight or sixteen wires in parallel can understand that the same issues applied externally. Parallel technologies are limited in distance and speed, as all the bits have to arrive at the end of the wire at approximately the same time. Direct attach schemes, in which every server attaches directly to every storage device, were also problematic. Imagine 100 servers connected to 100 storage devices: that would be 10,000 wires!
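The cabling arithmetic is simple enough to sketch in Python. The function names are mine, and the switched case assumes the simplest possible layout: one link from each server and each storage device into a single switch, with no redundant paths or inter-switch links.

```python
def direct_attach_cables(servers: int, storage_devices: int) -> int:
    """Every server is wired directly to every storage device."""
    return servers * storage_devices


def switched_cables(servers: int, storage_devices: int) -> int:
    """Assumption: each server and each device has one link to a single switch."""
    return servers + storage_devices
```

For 100 servers and 100 storage devices, that is 10,000 cables with direct attach, versus 200 through a switch under that assumption.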
So, a new technology standard was developed, called Fibre Channel, ratified in 1994. The spelling of "Fibre" was intentionally made different from "Fiber". "Fibre" is a protocol that can travel over copper or glass wires. "Fiber" represents the glass wiring itself.
Fibre Channel is amazingly versatile. For today's Linux, UNIX and Windows servers, it can carry SCSI commands, and the combination of SCSI over FC is called Fibre Channel Protocol (FCP). For the mainframe servers, it can carry CCW commands. Running CCW over Fibre Channel is called FICON. This convergence allows mainframes and distributed systems to share a common Fibre Channel network, using the same set of switches and directors.
We saw the use of SANs explode in the marketplace over the past 10 years, and then cool down with a series of mergers and acquisitions. Last year, Brocade announced it was acquiring rival McData, so we will be down to two major players, Cisco and Brocade.
So, IMHO, I think we are well past the "Year of the SAN".
Continuing this week's theme of New Year's Resolutions for the data center, today we'll talk about one that people don't always think about on a personal level, that is to hone your tools and skills.
A long time ago, I used to be a regular speaker at the SHARE user group conference. One of the most attended sessions was Sam Golob presenting the latest CBT Tape set of tools. Over time, this large collection of "mainframe shareware" was handed out on 3480 tape cartridges, then on CDs, and finally made downloadable off the web. Sam's main point, which I remember to this day, was that everyone who has a job should figure out what tools they use, keep those tools functioning properly, and learn to use them well.
Later, I took some cooking classes at a culinary school. Among other things, we learned:
This last point hits close to home, as many people like me have too many tools that they do not use often enough to know how to use them well. Do I really need my strawberry corer, garlic press, or a tray designed for the storage and delivery of deviled eggs?
The same could be said about software tools. What tools do you use in your job? Do you feel you know how to take full advantage of their power and capabilities? If you develop software, do you know all the features of your debugging tools? If you develop advertising or marketing materials, do you know all the features of your photo or video editing software? If you manage storage in a data center, do you know all the tools for managing your storage area network (SAN), disk systems, tape libraries, and reporting tools to identify all of your files and databases across your entire IT environment? I would not be surprised if you could replace a whole mess of tools with just one, such as the IBM TotalStorage Productivity Center.
Continuing this week's theme of New Year's Resolutions for the data center, today we'll talk about one that many people make for their own personal lives: staying on a budget.
Often, when faced with tightening budgets, we try to make more use of what we already have. Tell someone they are only using 10 percent of their brain, and they immediately believe you; but tell them they are only using 30 percent of their storage, and they ask for a whitepaper, magazine article, or clarification on how that percentage is calculated. I actually visited a customer that was only using 6 percent of the storage attached to their Windows servers!
So, to help those of you making data center resolutions to stay on budget, the terms to remember are "Reduce", "Reuse" and "Recycle".
If you are going to keep on a budget, remember that storage next year will cost about 30 percent less than storage today. That is the average annual drop in both disk and tape on a dollar-per-MB basis. If there is any way to postpone giving out storage until it is actually needed, you can save a bundle of money. Timing is everything! In the event of a disaster, getting immediate replacement for disk can be very expensive, but if you can wait just two weeks, you can negotiate a better deal. I thought of this while going to the movie theatre yesterday. A "hot dog" and a bottle of water was $8.00, but if you are able to wait two hours and eat after the movie, you can get a much better meal for less.
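To put numbers on that, here is a small Python sketch. It assumes the price per MB falls by a steady 30 percent each year, which is the average quoted above and not a guarantee. The function names and the $10,000 figure are made up for illustration.

```python
def projected_cost(cost_today: float, years: float,
                   annual_drop: float = 0.30) -> float:
    """Cost of buying the same capacity `years` from now.

    Assumption: the price declines by `annual_drop` every year, compounded.
    """
    return cost_today * (1.0 - annual_drop) ** years


def savings_from_waiting(cost_today: float, years: float,
                         annual_drop: float = 0.30) -> float:
    """Money saved by postponing the purchase for `years`."""
    return cost_today - projected_cost(cost_today, years, annual_drop)
```

Under that assumption, capacity that costs $10,000 today would cost about $7,000 a year from now, so postponing the purchase for a year saves about $3,000.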
Chances are, you have unused disk capacity spread across all your storage today, but perhaps it is formatted into small LUNs. The IBM SAN Volume Controller (SVC) can combine the capacity, and let you carve up big LUNs at the sizes you need. This is like taking all those tiny pieces of soap in your shower and forming a new bar of soap, or taking all the crumbs at the bottom of your bread box, and making a new slice of bread. And the virtual LUNs are dynamically expandable, so give out only the amount needed today, as it is simple to expand them to larger sizes later.
When evaluating your use of tape, determine if you are making best use of the tapes you have now, and perhaps a RECYCLE (or reclamation) scheme may be in order. Fewer tapes can save money in many ways, such as reduced storage costs, and reduced courier costs to send the tapes offsite. Tape media can still be 10-20 times less expensive than disk, based on full capacity.
Happy New Year!
This year I resolve to be more consistent in my blogging, and my goal is to give you one to five entries per week, every week, based on the advice from Glenn Wolsey, Jennette Banks, and others. On some weeks, I will have a running theme, so rather than writing super-long entries to cover everything I can think of on a topic, I will keep the entries short and readable. This week is a good time to review last year's "New Year's Resolutions" and to make new ones for 2007. I will discuss actions that companies can adopt for their data centers.
A common resolution is to lose weight, as in this Dilbert comic. Last year, I resolved to lose weight in 2006, and am delighted with myself that I lost eight pounds. When people ask for the secret of my success, I whisper in their ear "Eat less, exercise more." In general, people (and companies) know what to do, but just don't do it, which Pfeffer and Sutton document in their book The Knowing-Doing Gap. In my case, it involved lifestyle change: I exercised at a gym three times per week in Tucson, with a personal trainer, and revamped my diet.
Not everyone subscribes to the "eat less exercise more" philosophy. For example, Ric Watson argues in his blog that you can eat fewer calories, but eat more in actual volume, by choosing the right foods. This brings up the issue of "metrics", which most data centers are familiar with. Last year, I read the book "You: On a Diet", which explains that it is better to focus on "waist reduction", as measured in inches around your mid-section at the belly button, than on "weight reduction", as measured in pounds. This year, I resolve to get down to 35 inches by the end of 2007.
The problem with measuring "weight" is that you are weighing bones, muscle and fat. A person can gain ten pounds of muscle, lose ten pounds of fat, and the scale would indicate no progress. The same problem occurs in data centers. How many TB of data do you have? Storage admins can easily tell you, but can they tell how much of this is bone (data needed for operating infrastructure), muscle (data used in daily operations that generates revenue) or fat (obsolete or orphaned data)?
We at IBM often state that "Information Lifecycle Management (ILM)" is more of a lifestyle change than a "fad diet". Figuring out what data you should capture in the first place, where to place it, when to move it, and when to get rid of it, is more important than just buying different tiers of storage hardware. So, for those looking to make new data center resolutions, I suggest the following actions:
Continuing this week's theme of recap for 2006, I thought it would be good to look back at the various videos made available on the internet.
For those of us in the northern hemisphere, yesterday was this year's Winter Solstice, representing the shortest amount of daylight between sunrise and sunset. So today, I thought I would blog on my thoughts on managing scarcity.
Earlier in my career, I had the pleasure to serve as "administrative assistant" to Nora Denzel for the week at a storage conference. My job was to make her look good at the conference, which if you know Nora, doesn't take much. Later, she left IBM to work at HP, and I got to hear her speak at a conference, and the one thing that I remember most was her statement that the whole point of "management" was to manage scarcity, as in not enough money in the budget, not enough people to implement change, or not enough resources to accomplish a task. (Nora, I have no idea where you are today, so if you are reading this, send me a note.)
Of course, the flip-side to this is that resources that are in abundance are generally taken for granted. Priorities are focused on what is most scarce. Let's examine some of the resources involved in an IT storage environment:
This last point brings me back to the concept of food, and I am not talking about doughnuts in the conference room, or pizza while making year-end storage upgrades. I'm talking about the food you work so hard to provide for yourself and your family. The folks at Oxfam came up with a simple analogy. If 20 people sit down at your table, representing the world’s population:
Happy Winter Solstice!
technorati tags: IBM, Northern, Hemisphere, Winter, Solstice, Nora+Denzel, Oxfam, scarcity, Linux, UNIX, Windows, TSM, Tivo