Comments (10) Visits (16743)
The technology industry is full of trade-offs. Take for example solar cells that convert sunlight to electricity. Every hour, more energy hits the Earth in the form of sunlight than the entire planet consumes in an entire year. The general trade-off is between energy conversion efficiency versus abundance of materials:
IBM has eliminated this trade-off with a record-setting breakthrough last week, demonstrating 9.6 percent efficiency [thin film solar cells using earth-abundant materials].
A second trade-off is exemplified by EMC's recent GeoProtect announcement. This appears similar to the geographic dispersal method introduced by a company called [CleverSafe]. The trade-off is between the amount of space to store one or more copies of data and the protection of data in the event of disaster. Here's an excerpt from fellow blogger Chuck Hollis (EMC) titled ["Cloud Storage Evolves"]:
Seized by the government? falling into the wrong hands? Is EMC positioning ATMOS as "Storage for Terrorists"? I can certainly appreciate the value of being able to protect 6PB of data with only 9PB of storage capacity, instead of keeping two copies of 6PB each, the trade-off means that you will be accessing the majority of your data across your intranet, which could impact performance. But, if you are in an illicit or illegal business that could have a third of your facilities "seized by the government", then perhaps you shouldn't house your data centers there in the first place. Having two copies of 6PB each, in two "friendly nations", might make more sense.
(In reality, companies often keep way more than just two copies of data. It is not unheard of for companies to keep three to five copies scattered across two or three locations. Facebook keeps SIX copies of photographs you upload to their website.)
ChuckH argues that the governments that seize the three nodes won't have a complete copy of the data. However, merely having pieces of data is enough for governments to capture terrorists. Even if the striping is done at the smallest 512-byte block level, those 512 bytes of data might contain names, phone numbers, email addresses, credit cards or social security numbers. Hackers and computer forensics professionals take advantage of this.
You might ask yourself, "Why not just encrypt the data instead?" That brings me to the third trade-off, protection versus application performance. Over the past 30 years, companies had a choice, they could encrypt and decrypt the data as needed, using server CPU cycles, but this would slow down application processing. Every time you wanted to read or update a database record, more cycles would be consumed. This forced companies to be very selective on what data they encrypted, which columns or fields within a database, which email attachments, and other documents or spreadsheets.
An initial attempt to address this was to introduce an outboard appliance between the server and the storage device. For example, the server would write to the appliance with data in the clear, the appliance would encrypt the data, and pass it along to the tape drive. When retrieving data, the appliance would read the encrypted data from tape, decrypt it, and pass the data in the clear back to the server. However, this had the unintended consequences of using 2x to 3x more tape cartridges. Why? Because the encrypted data does not compress well, so tape drives with built-in compression capabilities would not be able to shrink down the data onto fewer tapes.
(I covered the importance of compressing data before encryption in my previous blog post [Sock Sock Shoe Shoe].)
Like the trade-off between energy efficiency and abundant materials, IBM eliminated the trade-off by offering compression and encryption on the tape drive itself. This is standard 256-bit AES encryption implemented on a chip, able to process the data as it arrives at near line speed. So now, instead of having to choose between protecting your data or running your applications with acceptable performance, you can now do both, encrypt all of your data without having to be selective. This approach has been extended over to disk drives, so that disk systems like the IBM System Storage DS8000 and DS5000 can support full
Certainly, something to think about!
technorati tags: , sunlight, solar cells, electricity, indium, gallium, cadmium, copper, tin, zinc, sulfur, selenium, thin+film, efficiency, EMC, Chuck Hollis, GeoProtect, Cleversafe, governement, seizure, Facebook, terrorists, encryption, forensics, hackers, protection, performance, disk, tape
Comments (4) Visits (13262)
A long time ago, perhaps in the early 1990s, I was an architect on the component known today as DFSMShsm on z/OS mainframe operationg system. One of my job responsibilities was to attend the biannual [SHARE conference to listen to the requirements of the attendees on what they would like added or changed to the DFSMS, and ask enough questions so that I can accurately present the reasoning to the rest of the architects and software designers on my team. One person requested that the DFSMShsm RELEASE HARDCOPY should release "all" the hardcopy. This command sends all the activity logs to the designated SYSOUT printer. I asked what he meant by "all", and the entire audience of 120 some attendees nearly fell on the floor laughing. He complained that some clever programmer wrote code to test if the activity log contained only "Starting" and "Ending" message, but no error messages, and skip those from being sent to SYSOUT. I explained that this was done to save paper, good for the environment, and so on. Again, howls of laughter. Most customers reroute the SYSOUT from DFSMS from a physical printer to a logical one that saves the logs as data sets, with date and time stamps, so having any "skipped" leaves gaps in the sequence. The client wanted a complete set of data sets for his records. Fair enough.
When I returned to Tucson, I presented the list of requests, and the immediate reaction when I presented the one above was, "What did he mean by ALL? Doesn't it release ALL of the logs already?" I then had to recap our entire dialogue, and then it all made sense to the rest of the team. At the following SHARE conference six months later, I was presented with my own official "All" tee-shirt that listed, and I am not kidding, some 33 definitions for the word "all", in small font covering the front of the shirt.
I am reminded of this story because of the challenges explaining complicated IT concepts using the English language which is so full of overloaded words that have multiple meanings. Take for example the word "protect". What does it mean when a client asks for a solution or system to "protect my data" or "protect my information". Let's take a look at three different meanings:
If these three methods of protection sound familiar, I mentioned them in my post about [Pulse conference, Data Protection Strategies] back in May 2008.