Information versus Data Lifecycle Management
The terms "information" and "data" are often used interchangeably in regular usage, but for the storageindustry, there are significant differences between the two, as different as "fact" from "meaning".
For example, if you are walking down the street, and see a pole with red and white stripes, the data of red and white stripes may not have much meaning, unless you recognize the information is that you are in front of a barber shop.I thought of this when someone pointed me to theStrip Generator Tool website, which can helpyou generate various stripes for use on the tiled background of web pages. (Or if you aredesigning neckties for your Second Life avatar).
Many national flags are based on simple stripes of different colors.For example, look at the national flags of France, Russia, and the Netherlands. These consist of a red, white, and blue stripe, justin different sequence and orientation.Again, the data of these colors, the width of their lines, and the way they are placed on the flag are all data, but the information they convey is significantly more than that.One person might walk right by the flag, not knowing which country it belongs to, while anotherperson might get emotional memories of their homeland.
For those of us in the storage industry, data is just binary 1's and 0's on disk and tape media, and canbe treated like packages at the post office in brown wrapping paper. Just as post office employees don't have to know the contents to ship them to the final destination, servers and storage devices don't need to knowthe informational content of the data that they process and store.
Converting information to data is easy. Let's take an example of taking a digital photo. The photo could be a picture of you and your spouseon your last vacation trip, but you would never know that from just looking at a series of 1's and 0's. For this reason, you create photo albums, you write captions below indicating where and when the photowas taken. This additional "context" is often called "metadata" or just simply "indexing".
Both the information captured (the photo in this case) and its metadata (the caption), can be storedas 1's and 0's on storage media. These bits can be compressed, encrypted, or represented in a variety of formats.
Information is copied from one data file to another. In the traditional sense, one piece of informationcould exist in the primary production copy, as well as multiple archive or backup copies. One piece ofinformation, stored on multiple copies of data. In a sense, this is similar to genetic information storedon each human being (data copy). Richard Dawkins, author of The Selfish Gene, reminds us that genes outlive individual humans. In storage, we remind people that data outlivesthe media it is initally written to, and the information outlives the initial data copy stored.
Converting data back to information is not always as simple.Not all sequences of 1's and 0's are obvious what they represent. To display a digital photo, you need to know the format the photo is in, and have an appropriate application that can display it back to something a human person can recognize. If the bits were compressed, the application needs to handlethat, or you need to de-compress the data before handing it to the application. For encrypted data,you need to have the decryption key. The process of converting a single file of data back to information is called "rendering".
One of the big problems with keeping information for long periods of time, isthat you may not have the equipment, decryption key, or applications needed to render the data back to usable information. You've kept the data, but you can't make any sense of it, as if it went through an episode of Will it Blend?
A good example is how the current version of Microsoft Office application is unable to interpret andrender data documents that were stored in WORD 1.0 format. IBM and others have developed "rendering tools" that can help decipher the bits, and bring back the information. To help address this challenge, the new Microsoft Office 2007 haschosen the OOXML format, but will continue to support some of the older legacy formats. IBM and the rest of the world are focused instead on Open Document Format (ODF) open standard. Those of usstill using older versions of Microsoft Office might need the Office 2007 Compatibility Pack.
Another way to get information from data is "data mining", an important part of "business intelligence". Here you are gleaning information notfrom individual details, but from patterns in the data, averages, statistics, totals, that havebroader meaning than individual transactions or events.
For many applications, DLM is just fine. Let's consider e-mail, for example. For most employees,deleting e-mails larger than 1 MB, after 90 days, regardless of content, is probably a reasonable DLM policy. All data is treated the same, based purely on the size and date markings on the outer brown wrapper.
For more sensitive content, DLM is not enough. The e-mails that are to or from the president of thecompany, or between top executives, or that contain certain pieces of information relevant for lawsuitsor other investigations, may not be treatedthe same as other e-mails. In this case, you need ILM technologies, managing based on the informational content of the data, and not just the size and date last referenced.
Of course, IBM supports both, and can help you decide the right solution for each workload.
technorati tags: IBM, barber pole, stripe generator, International space station, France, Russia, Netherlands, digital photography, Richard Dawkins, blender, rendering tools, metadata, encryption, OOXML, ODF, Open Document Format, Microsoft, Office, Word, ILM, information, lifecycle, management, data, DLM, e-mail, archive, context, Hu+Yoshida