When new technologies are introduced to the marketplace, it is normal for customers to be skeptical.
My sister is a mechanical engineer, so when she needs a part or component, she can design it on the computer and then use a "Rapid Prototyping Machine", which acts like a 3D printer, to generate a plastic part that matches the specifications. Some machines do this by taking a hunk of plastic and cutting it down to the appropriate shape; others use glue and powder to build up the piece.
But not everything is that simple. Author Harry Beckwith tackles the issue of selling services and software features in his book "Selling the Invisible". How do you sell a service before it is performed? How do you sell a software feature based on new technology that the customer is not familiar with?
Our good friends over at NetApp, our technology partners for the IBM System Storage N series, developed a "storage savings estimator" tool that gives good insight into the benefits of the Advanced Single Instance Storage (A-SIS) deduplication feature.
I decided to run the tool against my own IBM ThinkPad C: drive (Windows operating system and programs) and D: drive ("My Documents" folder containing all my data files) to see how much storage savings it would estimate. Here are my results:
WINXP-C-07G (C: drive)
Total Number of Directories: 1272
Total Number of Files: 56265
Total Number of Symbolic Links: 0
Total Number of Hard Links: 41996
Total Number of 4k Blocks: 2395884
Total Number of 512b Blocks: 18944730
Total Number of Blocks: 2395884
Total Number of Hole Blocks: 290258
Total Number of Unique Blocks: 1611792
Percentage of Space Savings: 20.61
Scan Start Time: Wed Sep 5 14:37:06 2007
Scan End Time: Wed Sep 5 14:53:51 2007

WINXP-D-07H (D: drive)
Total Number of Directories: 507
Total Number of Files: 7242
Total Number of Symbolic Links: 0
Total Number of Hard Links: 11744
Total Number of 4k Blocks: 3954712
Total Number of 512b Blocks: 31610595
Total Number of Blocks: 3954712
Total Number of Hole Blocks: 3204
Total Number of Unique Blocks: 3524605
Percentage of Space Savings: 10.79
Scan Start Time: Wed Sep 5 14:21:16 2007
Scan End Time: Wed Sep 5 14:34:30 2007
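The reports do not say how "Percentage of Space Savings" is computed, but both figures are reproduced if the duplicate blocks are taken to be those that are neither hole blocks nor unique blocks, divided by the total block count. The sketch below assumes that formula (an inference from these two reports, not NetApp documentation); the function name is hypothetical, and it also converts the duplicate count to GiB at 4 KB per block.

```python
BLOCK_BYTES = 4096  # the reports count 4k blocks


def estimated_savings_pct(total_blocks: int, hole_blocks: int, unique_blocks: int) -> float:
    # Assumed formula, inferred from the reports: blocks that are neither
    # holes nor unique are duplicates that deduplication could reclaim.
    duplicate_blocks = total_blocks - hole_blocks - unique_blocks
    return 100.0 * duplicate_blocks / total_blocks


c_drive = dict(total_blocks=2395884, hole_blocks=290258, unique_blocks=1611792)
d_drive = dict(total_blocks=3954712, hole_blocks=3204, unique_blocks=3524605)

for name, counts in [("C:", c_drive), ("D:", d_drive)]:
    pct = estimated_savings_pct(**counts)
    dup = counts["total_blocks"] - counts["hole_blocks"] - counts["unique_blocks"]
    gib = dup * BLOCK_BYTES / 2**30
    print(f"{name} {pct:.2f}% savings, {dup} duplicate 4k blocks ({gib:.2f} GiB)")
```

For the two reports above this prints 20.61% and 10.79%, matching the tool's output.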
I am impressed with the results, and now have a better understanding of how A-SIS works. A-SIS looks at every 4 KB block of data and creates a "fingerprint", a type of hash code of its contents. If two blocks have different fingerprints, their contents are known to be different. If two blocks have the same fingerprint, it is still mathematically possible for their contents to differ (a hash collision), so A-SIS schedules a byte-for-byte comparison to confirm that they are indeed identical. This comparison might happen hours after the block is initially written to disk, but it is a much safer implementation, and it does not slow down the applications writing the data.
(In an effort to support deduplication in "real time" as data was being written, earlier deduplication implementations had to either assume that matching hash codes meant matching data, or take time to perform the byte-for-byte comparison during the write process. Doing this comparison when the device is busiest with write activity places an excessive, undesirable load on the CPU.)
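The two approaches can be contrasted with a small sketch. It is not NetApp's code: the function names are hypothetical, and CRC-32 (`zlib.crc32`) stands in for the fingerprint purely for illustration, since the actual A-SIS fingerprint function is not described here. `deferred_dedup` only records fingerprints on the write path and verifies candidates byte-for-byte in a later pass; `inline_trusting_dedup` merges blocks as soon as fingerprints match. Running the same byte compare inside the write loop would be the other inline option, the one that loads the CPU when the device is busiest.

```python
import zlib
from collections import defaultdict
from typing import Callable

BLOCK_SIZE = 4096  # A-SIS works on 4 KB blocks
Fingerprint = Callable[[bytes], int]


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def crc_fingerprint(block: bytes) -> int:
    # Illustrative stand-in: CRC-32 is cheap but collision-prone, so an
    # equal fingerprint marks only a candidate duplicate, not a proven one.
    return zlib.crc32(block)


def deferred_dedup(blocks: list[bytes], fp: Fingerprint = crc_fingerprint) -> dict[int, int]:
    """Fingerprint on write, verify byte-for-byte later.
    Returns block number -> block number whose storage it shares."""
    index: dict[int, list[int]] = defaultdict(list)
    for n, block in enumerate(blocks):  # write path: fingerprint only
        index[fp(block)].append(n)
    mapping: dict[int, int] = {}
    for candidates in index.values():  # later pass, off the write path
        kept: list[int] = []  # verified-distinct blocks sharing this fingerprint
        for n in candidates:
            match = next((k for k in kept if blocks[k] == blocks[n]), None)
            if match is None:  # no byte-identical block yet: keep this one
                kept.append(n)
                match = n
            mapping[n] = match
    return mapping


def inline_trusting_dedup(blocks: list[bytes], fp: Fingerprint = crc_fingerprint) -> dict[int, int]:
    """Inline style that assumes equal fingerprints mean equal data."""
    first_seen: dict[int, int] = {}
    mapping: dict[int, int] = {}
    for n, block in enumerate(blocks):
        # No byte compare: a fingerprint collision silently merges different data.
        mapping[n] = first_seen.setdefault(fp(block), n)
    return mapping


blocks = split_blocks(b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE)


def always_collide(block: bytes) -> int:
    return 0  # worst-case fingerprint, used to force collisions


print(deferred_dedup(blocks, always_collide))         # {0: 0, 1: 0, 2: 0, 3: 3}
print(inline_trusting_dedup(blocks, always_collide))  # {0: 0, 1: 0, 2: 0, 3: 0}
```

With a deliberately terrible fingerprint, the deferred approach still keeps block 3 separate, while the trusting inline approach maps the "B" block onto the "A" block, which is exactly the data-integrity risk described above.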
The estimator tool runs on any x86-based laptop, personal computer, or server, and can scan direct-attached, SAN-attached, or NAS-attached file systems. If you are a customer shopping around for deduplication, ask your IBM pre-sales technical support, storage sales rep, or IBM Business Partner to analyze your data. Tools like this support a simple cost-benefit analysis: the cost of licensing the A-SIS software feature versus the amount of storage it saves.
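That cost-benefit comparison reduces to a little arithmetic. The sketch below is a simplification under stated assumptions: the function names are hypothetical, capacity is valued at a flat cost per GB, ongoing costs such as power and maintenance are ignored, and the inputs are made-up numbers for illustration, not IBM or NetApp pricing.

```python
def net_benefit(used_gb: float, savings_pct: float,
                cost_per_gb: float, license_cost: float) -> float:
    """Value of reclaimed capacity minus the license cost (positive favors buying)."""
    reclaimed_gb = used_gb * savings_pct / 100.0
    return reclaimed_gb * cost_per_gb - license_cost


def break_even_savings_pct(used_gb: float, cost_per_gb: float,
                           license_cost: float) -> float:
    """Savings percentage at which the license exactly pays for itself."""
    return 100.0 * license_cost / (used_gb * cost_per_gb)


# Hypothetical inputs for illustration only.
print(net_benefit(used_gb=1000, savings_pct=20, cost_per_gb=10, license_cost=1500))  # 500.0
print(break_even_savings_pct(used_gb=1000, cost_per_gb=10, license_cost=1500))      # 15.0
```

Plugging the estimator's savings percentage into `net_benefit`, or comparing it with `break_even_savings_pct`, turns the tool's output into a yes/no answer for a given configuration.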