On his blog post on preparation
, Seth Godin mentioned an appropriate Swedish saying:
There is no bad weather, just bad clothing.
Appropriate because it snowed here in Tucson, Arizona on Sunday evening, leaving many of us here figuring out how to drive through the stuff on Monday. In my entire lifetime, I have only witness snow down in the Tucson valley a handful of times. It got me thinking about coats, and the wonderful schemes for coat check rooms, as an analogy for data access. A lot of people ask me to compare and contrast one technology from another, say block-level virtualization from content-addressable storage, and so on, and I always try to find a good analogy to help explain things.
Let's start with the setting. It is snowing outside and people are wearing coats. When they come inside, they check their coats at a coat check room, a large room with rows and rows of racks with hangers. A coat check attendant takes your coat and puts it on a hanger, and gives you a ticket or other identifier that will allow you to retrieve your coat later. The ticket must have sufficient information to retrieve the coat quickly, rather than searching rows and rows of hangers for it.
- Block-based disk storage
You walk to the coat-check desk, tell the attendant to hang your coat on a specific hanger, say hanger number 387. When you come back, you ask for the coat on hanger 387. The coat-check attendant knows exactly where hanger 387 is, and is able to retrieve it quickly. Most disk systems use this approach, including IBM SAN Volume Controller and DS family of disk systems.
- Name-based disk storage
You walk to the coat-check desk, tell the person the name that you want to call your coat. An empty hanger is located, and a list of coat names, with their associated hanger number, is then kept. Upon return, you ask for your coat by name, and the coat-check attendant looks up the hanger number to match, and retrieves your coat. This is the scheme used by the IBM System Storage DR550, N series for NAS storage, and the IBM Healthcare and Life Sciences Grid Medical Archive Solution (GMAS).
- Content-addressable storage (CAS)
You walk to the coat-check desk and hand them your coat. The attendant weighs your coat, checks the brand, the size, the number of buttons and zippers, types it all in, and the computer spits out a "hash code" from 1 to 99999. An empty hanger is found, and the hash code is associated to the hanger number. Upon return, you provide the hash code you were given, and the coat-check attendant looks up the hanger number to match, and retrieves your coat.This is the scheme used for some non-erasable, non-rewriteable storage, such as the EMC Centera.
IBM invented hash codes in 1953 as a way to speed up searches. For example, if you want to look up a word in the dictionary, knowing the first letter of the word makes it much quicker, because you can thumb directly to that section. A hash code was intended to give a more even distribution, so that if a million words are stored in a "hash code dictionary" then you would calculate the hash code, then look up only that section of words associated with that specific hash code number.
A problem arises when you generate "hash codes" for storage. It is possible for two different pieces of data to resolve to the same hash code. When an application tries to write a piece of data, and it resolves to a hash code that already exists, that is called a collision. One response is to either compare the incoming data to the data that is already stored, confirm they are identical, but that can be time consuming. The other response is to just assume they are identical, and reject the secondary copy, a process often referred to as "de-duplication".
What's the chance of getting a collision for data that is really different? Let's take for example the famousBirthday paradox. Suppose the coat check room assigned the hanger based on your birthday (month and day). How may coats before you run the risk of having two people turn in coats with the same birthday? After only 23 people, the likelihood is 50%. At 60 people, it goes up to 99%.
For this reason, IBM does not offer content-addressable storage. For non-erasable, non-rewriteable storage, the IBM System Storage DR550 requires the application to give each object a name, and that name is then used to storage the data, eliminating the possibility that data might accidently be thrown away.
It's safer that way.
technorati tags: Seth Godin, Swedish, saying, bad, weather, clothing, snow, Tucson, coat, check, room, IBM, block-based, disk, storage, DR550, N series, NAS, healthcare, life sciences, grid, medical, archive, solution, GMAS, content-addressable, CAS, EMC, Centera, hash code, collision, de-duplication, birthday, paradox
Modified by TonyPearson
Continuing my week's theme on travel, conferences, and Japan, I saw two items in the newsthat seem to follow a common theme.
- According to the "The Daily Yomiuri", a local Japanese paper, "double happy weddings" arebecoming more and more popular in Japan. These would be called "stotgun" weddings in the US, butin Japan, couples pay extra to have a wedding between the fifth and seventh month ofpregnancy. As Dave Barry would say, I am not making this up. 27% of couples in Japan got married while or after pregnant. The logic is that they can celebrate both events with one ceremony. Many couples believe that the primary purpose of marriage is to have children, and somethat fail to have children suffer terrible anguish or divorce. Waiting untilbeing pregnant helps ensure the couple will be "successful" in this regard.
- IBM acquires Softek, a software company that develops a product called Transparent Data MoverFacility (TDMF) to move mainframe data from one disk system to another, while applicationsare running. This can be used, for example, to move data from outdated disk systems to IBMdisk systems. This is not to be confused with IBM's archive and retention software partner,Princeton Softech.
Softek is the software spin-off of Fujitsu (a Japanese computer hardware manufacturer). Fora while, Fujitsu made IBM-compatible mainframe servers, but was not successful at developingits own system software, relying heavily on IBM for this. Unable to compete against IBM, it stoppedmaking mainframe servers, but continues making other kinds of hardware equipment.
With TDMF, the process of moving data is simple. The software runs on z/OS and intercepts all writes intendedfor a source volumes on the old array, and re-directs a copy to destination volumes on the new device.Systems can run with old and new equipment side by side for a few weeks, with the new devicestaying in-sync with the old. When the client is ready to cross over, the systems arepointed to the new disk, and the old disk systems are detached and removed from the sysplex.
Afraid that installing TDMF will mess with your applications? IBM Global Technology Services (GTS)is able to roll-in a separate mainframe, move the data, than disconnect it along with the old storage.
(For customers running Linux, UNIX or Windows on other platforms, IBM offers SAN Volume Controller (SVC).While SVC is not marketed as a "data migration device", per se, it does have this capability.Many clients were able to cost-justify purchase of an SVCto move data from old storage to new in similar fashion to how TDMF works on the mainframe.)
What do these stories have to do with one another, other than both relating to Japan? IBM has beenusing TDMF for years as part of a service offering to move data from one disk system to another.Since Sam Palmiasano took over in 2002, IBM has acquired 51 companies, 31 of them software companies.Often, these have been "successful" turning quickly profitable because IBM was already well familiar with the companies they acquire, in much the same way that husbandsare well familiar with their brides-to-be at a "double happy wedding".
So, welcome Softek! It looks like its time to celebrate again!
technorati tags: IBM, Japan, Daily Yomiuri, double happy, wedding, shotgun, Dave Barry, Softek, TDMF, z/OS, Fujitsu
Modified by Jeff Antley
Wrapping up my week's theme of "diversity", with posts on a diverse set of topics,today I will suggest ways to spendyour time while you are walking 10,000 steps per day, as recommended by the authorsof the book "You: On a Diet"
(If you thought this was about the 10,000 steps it might take to implement a storage solution, you should switch over to IBM as your storage vendor. For example, the DS3200 and DS3400 can beimplemented in as little as SIX steps. That's pretty cool.)
Blogs like Lifehacker are an excellent resource for neat littletips and tricks to help you throughout your day, like how to use your iPod, cell phone or computer better, for example. These suggestions are based on the idea that you can walk your 10,000 steps with access to an iPod and cell phone.
- Learning a language
- ... or refreshing yourself on a language you might not have spoken in a while. In addition to formal audio-based lessons from Pimsleur, there are podcasts you can get for various languages. In preparation for my upcoming trip to Japan and China, I have been listening to JapanesePod101.com and ChinesePod.com which have quick lessons that complement the formal training.This Lifehacker postindicates there are similar ones for French, Spanish, Italian, and Brazillian Portuguese.
- Practicing your presentation
- Walking while practicing your 30-60 minute presentation would be good exercise.MicroPersuasion explains how to turn your iPod into the ultimate PowerPoint accessory, and this article in PlayListmag.com providesthe steps to get a PowerPoint presentation onto your iPod. I did this, and the slides are found underPhotos->Photo Library. The images are small, but heck, they are your charts and you should recognize themwell enough to remind yourself what to say on each slide.Also, I am able to record my practice sessions using MP3 Recorder and listen as I page through each slide. (In theory, you can use your iPod to present your slides to your audience, plugging the iPod directly into the laptop projector, instead of a laptop, using cables available at your local Apple store, and use the iPod controls as your forward/backward remote.)
- Working your To-Do list
- You can download your to-do list to your iPod. I use BackPackIt from 37 Signals. You can sign up for a free account, or upgrade to a paid account, and have anamazingly simple browser-based tool to develop your to-do lists, one for each project or aspect of your life. Oncedone, the list can be emailed to you as plain text. Enable your iPod as an "external disk drive" and copy this text file to your NOTES directory on the iPod drive. Voila! You can now read your to-do list! (I could also send it to my cell phone, using email@example.com, but I find the iPod easier to read and navigate)
Think of something to add? Send an email from your cell phone. With BackPackit, I can send an email that will directly add my text as a note or todo list item. On my phone, this is simply sending a text message to "500" with text like:
"firstname.lastname@example.org todo # buy bread".The hash mark (#) separates the subject line from the body of the email, and this is how Backpackit knows its a todo item or a note. If you pre-program the huge email address in advance on your phone, then it isn't as bad as it looks. It will be on your packpackit page the next time you log in.
Well, that's three suggestions. The next time you complain that there is no time to walk, you now have no excuse.
technorati tags: walking, LifeHacker, MicroPersuasion, IBM, DS3200, DS3400, PowerPoint, iPod, Backpackit, tmomail
Today, January 16, IBM launches its latest disk system, the DS3000 series.
There are actually three products in the DS3000 series:
The DS3200 is a 2U high, 12 drive system that attaches to servers via 3Gbps Serial Attach (SAS) interface.You can expand this to 48 drives by added EXP3000 expansion units. Here are theDS3200 specifications.
The DS3400 is a 2U high, 12 drive system that attaches to servers via 4Gbps Fibre Channel (FC) interface.You can expand this to 48 drives by added EXP3000 expansion units. Here are the DS3400 specifications.
The EXP3000 is a 2U high, 12 drive expansion drawer. It was announced back in August 2006, but is part of theoverall DS3000 series. It can be used directly with servers, but is also designed to be attached to the back of the DS3200 or DS3400 to increase capacity.Here are the EXP3000 specifications.
With this announcement, IBM provides entry-level storage at the "less-than-$5000" price point, withsupport for intermix of 10K and 15K RPM drives, and scalable up to 14.4 TB capacity.This would be ideal storage for HP, Dell, IBM System x and BladeCenter servers.
technorati tags: IBM, disk, DS3000, DS3200, DS3400, EXP3000, HP, Dell, SAS, SATA, FC
Continuing this week's theme of New Year's Resolutions
for the data center, today we'll talk about one that many people make for their own personal lives: staying on a budget.
Often, when faced with a tightening budgets, we try to make more use of what we already have. Tell someone they are only using 10 percent of their brain, and they immediatelybelieve you; but tell them they are only using 30 percent of their storage, and they ask for a whitepaper,magazine article, or clarification on how that percentage is calculated. I actually visiteda customer that was only using6 percent of the storage attached to their Windows servers!
So, to help those of you making data center resolutions to stay on budget, the terms to remember are "Reduce", "Reuse" and "Recycle".
- When people come to request storage, are they being reasonable about what they need today, or are they asking for what they might need over the next three years? They might need 50GB, but they ask for 100GB, in case they grow, and a year later, you find they have only 15GB of data on it. On the flipside, the person asks for what they need but some storage admins give out more, just so they don't have to be bothered so often when growth happens. Finally, I have seen this formalized into fixed size LUNs, all the disk is carved into big huge 100GB pieces, so if you need 20GB, here's one big enough with plenty of room to grow.
If you are going to keep on a budget, remember that storage today is 30% more expensive than storage next year. That is the average drop in both disk and tape on a dollar-per-MB basis. If there is any way to postpone giving out storage until it is actually needed, you can save a bundle of money. Timing is everything! In the event of a disaster, getting immediate replacement for disk can be very expensive, but if you can wait just two weeks, you can negotiate a better deal. I thought of this while going to the movie theatre yesterday. A "hot dog" and a bottle of water was $8.00, but if you are able to wait two hours and eat after the movie, you can get a much better meal for less.
- A lot of companies buy new storage because their existing storage isn't fast enough, or doesn't have the latest copy services. This can easily be solved with an IBM SAN Volume Controller (SVC). The SVC can virtualize slower, functionless storage, and present to your application hosts virtual disks that are faster, and with all the latest disk-to-disk copy services like FlashCopy, Metro Mirror, and Global Mirror.
Chances are, you have unused disk capacity spread across all your storage today, but perhaps they are formatted into small LUNs. The SVC can combine the capacity, and let you carve up big LUNs at the sizes you need.This is like taking all those tiny pieces of soap in your shower and forming a new bar of soap, or taking all the crumbs at the bottom of your bread box, and making a new slice of bread. And, the virtual LUNs are dynamically expandable,so give out only the amount they need today, as it is simple to expand them to larger sizes later.
- Of my 13 patents, the first will always be my favorite, on a function called "RECYCLE" for the Data Facility Storage Management Subsystem Hierarchical Storage Manager (DFSMShsm) product, which is now a component of the IBM z/OS operating system. Basically, tapes could contain hundreds or thousands of files, such as backup versions or archive copies, and these expired on different dates. As a result, a tape would be written100 percent full, and then over time, decrease in valid data to 80, 60, 40, 20 until it hit 0 percent. In some cases, a single filecould hold an entire tape hostage. RECYCLE was able to read the valid data off tapes that were perhaps less than 20 percent full, and consolidate them onto fewer tapes. As a result, a whole bunch of tapes could be returned to the scratch pool, and reused immediately for other workloads. This also helps in moving to newer, higher capacity cartridges, such as the new 700GB cartridge that IBM co-developed with FujiFilm.(This RECYCLE function exists in our IBM Tivoli Storage Manager software, as well as our Virtual Tape Server, but is called "reclamation" instead, to avoid confusion on searches.)
When evaluating your use of tape, determine if you are making best use of the tapes you have now, and perhaps a RECYCLE (or reclamation) scheme may be in order. Fewer tapes can save money in many ways, such as reduced storage costs, and reduced courier costs to send the tapes offsite. Tape media can still be 10-20 times less expensive than disk, based on full capacity.
technorati tags: IBM, storage utilization,RECYCLE, Tivoli Storage Manager, SAN Volume Controller, SVC, tape, disk, FujiFilm, DFSMS, HSM, DFSMShsm, Virtual Tape Server, brains