A new report on data breaches from Verizon was just released, showing that the number of incidents and compromised records is back up: "The analysis covered 855 data security “incidents” at enterprises, consisting of 174 million compromised records, which was the second-highest total since Verizon started annual reviews in 2004." The authors point out that these were mostly crimes of opportunity, i.e. sloppy administration leaving big holes for hackers to jump into. Full article here: http://www.information-management.com/news/data-breaches-back-up-as-enterprises-remain-targets-of-opportunity-10023445-1.html?ET=informationmgmt:e5650:2136004a:&st=email .
This trend is only going to accelerate as these same hackers realize that Big Data contains a wonderland of largely unprotected data. This was a big topic of discussion at Strata Conference/Hadoop World this past week, as well as at the IBM Information on Demand Conference, which I attended. Here's a thoughtfully written article that provides an initial set of recommendations for securing Hadoop: http://cloudcomputing.sys-con.com/node/2416407 . One important recommendation the authors seem to have forgotten is masking non-production data. Considering that the list came from a group of Hadoop developers, I'm not surprised. One of the biggest challenges for Big Data Governance is that many of the individuals who work in this environment may not be familiar with privacy regulations. For example, PCI requires that ANY data going offshore or into a contractor's hands be obscured so as not to show the original values.
Unlike some of our vendor colleagues (no names!!), we are not just postulating and posturing about it here at IBM. You know those vendors, the ones who do nothing more than put 'Big Data' in front of their product names ;-) We are working quickly and furiously to deliver key privacy and security capabilities for Big Data TODAY. InfoSphere Optim now provides Masking on Demand for Hadoop with an embeddable API, so you can easily mask your non-production data with realistic yet fictitious names, addresses, SSNs, email addresses, etc.: http://informationmanagementbps.tumblr.com/post/32813500881/infosphere-optim-9dot1 . IBM also just announced Hadoop support for InfoSphere Guardium real-time activity monitoring, along with a host of other security enhancements: http://www-03.ibm.com/press/us/en/pressrelease/39136.wss . And... we've had Encryption for Hadoop for quite some time with our Guardium Encryption Expert offering.
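If you haven't seen masking in action, here's a rough idea of what "realistic yet fictitious" means in practice. This is a minimal Python sketch of the general technique only. It is NOT the InfoSphere Optim API, and every name in it (`mask_record`, `mask_ssn`, `SECRET_KEY`, the lookup tables, the field names) is hypothetical and made up for illustration. It assumes a keyed hash is an acceptable way to get repeatable substitutions.

```python
import hashlib
import hmac

# Hypothetical lookup tables of fictitious values; a real tool would use far larger ones.
FIRST_NAMES = ["Alex", "Jordan", "Casey", "Morgan", "Riley", "Taylor"]
LAST_NAMES = ["Rivers", "Stone", "Harper", "Lane", "Brooks", "Reed"]

# Assumption: the key is stored outside the non-production environment.
SECRET_KEY = b"replace-me"


def _digest(value: str, field: str) -> int:
    """Map an original value to a repeatable integer using a keyed hash."""
    mac = hmac.new(SECRET_KEY, f"{field}:{value}".encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def mask_name(name: str) -> str:
    """Replace a name with a fictitious first/last name pair."""
    n = _digest(name, "name")
    first = FIRST_NAMES[n % len(FIRST_NAMES)]
    last = LAST_NAMES[(n // len(FIRST_NAMES)) % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_ssn(ssn: str) -> str:
    """Replace an SSN with a same-format value that starts with 9."""
    n = _digest(ssn, "ssn")
    area = 900 + n % 100              # 900-999 is outside the range issued as SSNs
    group = 1 + (n // 100) % 99       # 01-99
    serial = 1 + (n // 9900) % 9999   # 0001-9999
    return f"{area:03d}-{group:02d}-{serial:04d}"


def mask_email(email: str) -> str:
    """Replace an email address with one at the reserved example.com domain."""
    n = _digest(email, "email")
    return f"user{n % 10**8:08d}@example.com"


# Hypothetical mapping from field name to masking function.
MASKERS = {"name": mask_name, "ssn": mask_ssn, "email": mask_email}


def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields masked, others untouched."""
    return {k: MASKERS[k](v) if k in MASKERS else v for k, v in record.items()}
```

Because the substitution is keyed and repeatable, the same original value always maps to the same fictitious one, so joins between masked tables still line up in a test environment while the original values never show.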
Product is one thing, but People and Process are another. It's time to start thinking about how to apply and implement security and privacy policies across the Big Data divide.