My sister is a mechanical engineer, so when she needs to configure a part or component, she can design it on the computer and then use a "Rapid Prototyping Machine" that acts like a 3D printer to generate a plastic part that matches the specifications. Some machines do this by taking a hunk of plastic and cutting it down to the appropriate shape, and others use glue and powder to assemble the piece.
But not everything is that simple. Harry Beckwith deals with the issue of selling services and software features in his book "Selling the Invisible". How do you sell a service before it is performed? How do you sell a software feature based on new technology that the customer is not familiar with?
Our good friends over at NetApp, our technology partners for the IBM System Storage N series, developed a "storage savings estimator" tool that can provide good insight into the benefits of the Advanced Single Instance Storage (A-SIS) deduplication feature.
I decided to run the tool to analyze my own IBM Thinkpad C: drive (Windows operating system and programs) and D: drive ("My Documents" folder containing all my data files) to see how much storage savings the tool would estimate. Here are my results:
WINXP-C-07G (C: drive)
Total Number of Directories: 1272
Total Number of Files: 56265
Total Number of Symbolic Links: 0
Total Number of Hard Links: 41996
Total Number of 4k Blocks: 2395884
Total Number of 512b Blocks: 18944730
Total Number of Blocks: 2395884
Total Number of Hole Blocks: 290258
Total Number of Unique Blocks: 1611792
Percentage of Space Savings: 20.61
Scan Start Time: Wed Sep 5 14:37:06 2007
Scan End Time: Wed Sep 5 14:53:51 2007

WINXP-D-07H (D: drive)
Total Number of Directories: 507
Total Number of Files: 7242
Total Number of Symbolic Links: 0
Total Number of Hard Links: 11744
Total Number of 4k Blocks: 3954712
Total Number of 512b Blocks: 31610595
Total Number of Blocks: 3954712
Total Number of Hole Blocks: 3204
Total Number of Unique Blocks: 3524605
Percentage of Space Savings: 10.79
Scan Start Time: Wed Sep 5 14:21:16 2007
Scan End Time: Wed Sep 5 14:34:30 2007
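As a quick sanity check on those figures, here is a small Python sketch of how the savings percentage appears to be derived. The formula is my own inference from the numbers, not something the tool documents: blocks that are neither holes nor unique are treated as duplicates, and the percentage is taken against the total block count. The function name `savings_percent` is just for illustration.

```python
def savings_percent(total_blocks, hole_blocks, unique_blocks):
    """Percentage of all 4k blocks that are duplicates.

    Assumption (inferred from the tool's output, not documented):
    duplicates = total - holes - unique, measured against the total.
    """
    duplicate_blocks = total_blocks - hole_blocks - unique_blocks
    return round(100.0 * duplicate_blocks / total_blocks, 2)


# Block counts copied from the two scan reports above.
c_drive = savings_percent(2395884, 290258, 1611792)
d_drive = savings_percent(3954712, 3204, 3524605)

print(f"C: drive estimated savings: {c_drive}%")
print(f"D: drive estimated savings: {d_drive}%")
```

Under that assumption, both results line up with the "Percentage of Space Savings" lines reported by the tool.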
I am impressed with the results, and have a better understanding of the way A-SIS works. A-SIS looks at every 4kB block of data and creates a "fingerprint", a type of hash code of the contents. If two blocks have different fingerprints, then the contents are known to be different. If two blocks have the same fingerprint, it is still mathematically possible for them to differ in content (a hash collision), so A-SIS schedules a byte-for-byte comparison to be sure they are indeed the same. This comparison might happen hours after the block is initially written to disk, but it is a much safer implementation, and it does not slow down the applications writing data.
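To make the fingerprint-then-verify idea concrete, here is a minimal Python sketch of a deduplication pass over blocks that have already been written. It is only an illustration of the general technique, not NetApp's implementation: SHA-256 is a stand-in because the actual A-SIS fingerprint function is not described here, and the names `deduplicate`, `default_fingerprint`, and `BLOCK_SIZE` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # 4kB blocks, as described above


def default_fingerprint(block: bytes) -> bytes:
    # Stand-in hash for illustration; the real fingerprint function is unknown.
    return hashlib.sha256(block).digest()


def deduplicate(blocks, fingerprint=default_fingerprint):
    """Return (stored, block_map).

    stored is the list of distinct blocks kept; block_map[i] is the index
    in stored that input block i now points to.
    """
    stored = []
    by_fingerprint = {}  # fingerprint -> indices of stored blocks with that fingerprint
    block_map = []
    for block in blocks:
        candidates = by_fingerprint.setdefault(fingerprint(block), [])
        for idx in candidates:
            # Same fingerprint is not proof: confirm byte for byte.
            if stored[idx] == block:
                block_map.append(idx)
                break
        else:
            # New fingerprint, or a collision with different contents: keep it.
            stored.append(block)
            candidates.append(len(stored) - 1)
            block_map.append(len(stored) - 1)
    return stored, block_map


blocks = [b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE, b"a" * BLOCK_SIZE]
stored, block_map = deduplicate(blocks)
print(len(stored), block_map)
```

Passing a deliberately weak `fingerprint` (for example, one that returns the same value for every block) forces collisions, and the byte-for-byte check still keeps different blocks apart.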
(In an effort to provide support in "real time" as data was being written, earlier versions of deduplication had to either assume that a hash collision was a match, or take time to perform the required byte-for-byte comparison during the write process. Doing this byte-for-byte comparison when the device is at its busiest handling write activity puts excessive, undesirable load on the CPU.)
The estimator tool runs on any x86-based laptop, personal computer, or server, and can scan direct-attached, SAN-attached, or NAS-attached file systems. If you are a customer shopping around for deduplication, ask your IBM pre-sales technical support, storage sales rep, or IBM Business Partner to analyze your data. Tools like this can help you make a simple cost-benefit analysis: the cost of licensing the A-SIS software feature versus the amount of storage saved.