After spending the last six months getting an in-depth education on Data Privacy, culminating with achieving Certified Information Privacy Manager certification (CIPM) via the IAPP, I have come to the conclusion that I am a Data Privacy Pragmatist.
Quite simply, that means recommending that you stop all your grandiose plans to identify ALL your sensitive data before actually doing something to protect it.
It's really quite easy. 1-2-3:
1. Which data?
Start with your customer data. Do you have a web site? Collect consumer information? Then you have Personally Identifiable Information (PII). Your legal counsel, compliance team, or Chief Privacy Officer will be able to guide you as to which information is considered PII.
Next, where’s your HR information? Health information? Benefits? That’s another treasure trove. Again, ask your compliance friends for a list of data elements. Use technology such as InfoSphere Discovery and Guardium to automate the process of identifying specific elements.
2. Who has access?
Minimize! Minimize data access to a ‘need to know’ basis. Use role security, authentication and monitoring to ensure that only those who need data, get data. For test environments, there is no good reason to use real data. Optim Masking and Guardium Redaction do a good job of obfuscating critical information in test environments. In Big Data and analytic environments, consider anonymization techniques such as Optim Semantic Masking to hide individual identities while preserving the utility of the data.
3. Think Big- Start Small- Deliver Fast- Speak One Language- Use One Framework- Start Now
Pick a first target and just get started! Use a tool such as InfoSphere Business Glossary to share a set of terms and privacy policies that are easily understood and implemented in a cohesive fashion. Select a Privacy Framework. Assess your Maturity Level (well you’re probably immature if you are reading this). Perform a Privacy Impact Assessment. Conduct a Data Inventory. Decide on Mitigating Measures. Conduct a Pilot. Design a set of scalable processes that are easily adaptable. Make some mistakes and learn from them. Rinse and repeat.
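To make the masking advice in step 2 a little more concrete, here is a minimal sketch of deterministic masking for test data. It is not how Optim or Guardium work internally; the key, the name pools, and the function names are all my own assumptions for illustration. The idea is simply that the same real value always maps to the same fictitious value, so joins across test tables still line up.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a secrets manager.
MASKING_KEY = b"replace-with-a-real-secret"

# Small illustrative pools of fictitious values (assumptions, not real data).
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Casey", "Taylor"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak", "Haddad", "Larsen"]


def _digest(value: str) -> int:
    """Map a value to a stable integer using a keyed hash (HMAC-SHA256)."""
    mac = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big")


def mask_name(name: str) -> str:
    """Replace a real name with a fictitious one, consistently per input."""
    n = _digest(name)
    first = FIRST_NAMES[n % len(FIRST_NAMES)]
    last = LAST_NAMES[(n // len(FIRST_NAMES)) % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_ssn(ssn: str) -> str:
    """Return an SSN-shaped value with area number 000, which is never issued."""
    n = _digest(ssn)
    group = n % 100
    serial = (n // 100) % 10000
    return f"000-{group:02d}-{serial:04d}"


def mask_record(record: dict) -> dict:
    """Return a copy of the record with name and SSN masked; other fields untouched."""
    masked = dict(record)
    masked["name"] = mask_name(record["name"])
    masked["ssn"] = mask_ssn(record["ssn"])
    return masked
```

Calling `mask_record({"name": "Jane Doe", "ssn": "123-45-6789", "state": "NY"})` leaves `state` alone and swaps the other two fields. With pools this small, different people will collide on the same fake name, so treat this as a sketch of the technique rather than something to point at production data.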
It seems like the world wakes up right around this time and realizes another year is upon us... time to gear up and get going! A resolution is only a resolution if you do something about it-- making a habit takes time and repetition.
May I suggest a resolution to learn something new each week, or if you are really motivated, each day? Here's something new for today, out of the realm of technology...I hope you enjoy it.
The famous 19th century Spanish artist, Francisco Goya, made this sketch at the age of 82- likely a self-portrait. Aside from the impactful image of an old man in his waning days, what is most compelling to me is the title, "Aun aprendo". The English translation is "I am still learning". Since I studied Goya back in my undergraduate school days, this saying has stuck with me. No matter what your age, it's never too late to learn.
Here's a free learning opportunity that I think worthwhile... the 2013 Big Data, Integration, and Governance Forum series. We'll be hosting these sessions around the U.S. in the coming months, starting in February. The good news- it's FREE.
Read on for details...
No data is an island, especially Big Data. Hear how organizations are leveraging Big Data to open the door to a world of possibilities. Hear from worldwide leaders on how analytics, information integration and governance can help you discover insights and optimize decisions fast enough to impact your business in real-time. For more details and to register, visit: http://events.techtarget.com/bigdata_infogov .
I'm especially looking forward to Forrester's Keynote: "Drive Business Innovation: Finding Treasure in the Data Attic"
A description of the session:
"Governing data and managing data just got a whole lot more complex when we talk about big data. Rather than retrench, embrace the new reality. The old adage of "business first" still holds true. Data management fundamentals are still core to your strategy. What needs to change is how you think about and apply data governance and build an architecture that is flexible and elastic to meet an ever changing and evolving business. See what it takes to go from a data strategy focused on data collection, to one that drives business innovation. Gain valuable insight on:
• Turning data from an artifact, to an asset, into currency
• Identifying what it takes to pursue a big data strategy
• Using data governance as an accelerator, not a brake
• Understanding why context is much more important "
I'm sure there will be a lot to learn.
What are you learning?
Recently, I have noticed a growing trend in how organizations secure and manage information- that is, they seem to fall into 1 of 3 categories:
1. Security of sensitive data is mandated by IT, without regard to classification- they just do it. Kind of like brushing your teeth, it's good data hygiene. A good approach, but not necessarily driven by risk and compliance and/or business policy and likely not cost effective.
2. Stick your head in the sand... Let's do what we have to, and hope for the best... Translation... let's do as little as possible and if we get audited or have a breach, then the fines/penalties are likely less than the cost of implementation. Ouch... yes, I've heard this more than once and frankly, it's not true. I have also seen organizations that continually put off encrypting key financial and employee information due to the outage required to do so, despite being out of compliance for years. When a major breach or audit hits, then quickly, compliance moves ahead. Or at least, one would hope. In the case of one firm, it took two breaches several years apart, from two disgruntled ex-employees who decided to reveal executive salaries, before the company finally took action.
3. Information Security as policy tied to business requirements, policies, and compliance mandates, regardless of data source or type. Hadoop or not, it really does not matter. You've probably guessed by now this is the place to be. Policy comes from the top down, and needs to be driven by the business, for the business. And now, there is more on the line than just avoiding fines. The SEC has issued firm guidelines about disclosure of cyber-security risks.
Organizations that ignore the recommendation could be hit with securities fraud or other lawsuits against directors and officers alleging failure to properly disclose and manage risks and/or breach of fiduciary duty. I expect that moving forward, this guidance may become more stringent and move toward enforced mandate. A very well written article on the topic by attorney Richard J. Bortnick is here, for you folks who would like to further understand the legal implications or perhaps share with your colleagues: http://www.dandodiary.com/2012/09/articles/d-o-insurance/guest-post-cyber-security-and-data-breaches-why-directors-and-officers-should-be-concerned/
(Note that the sponsor of the web site sells liability insurance for corporate officers- I do not necessarily endorse the firm- I am providing this as useful background information.)
Recommendation- If we all treated sensitive data as if it were our very own personal data, it would help us to align in this direction.
Take a moment to ponder where your data ends up and how it is treated. If we treat your corporate data as if it were our own, I think a large percentage of us would consider doing things a little differently!
Here's a comical yet serious example of what not to do....
This year, we had yet another lovely and amazing annual Macy's Thanksgiving Day parade. Having grown up in NY, I've always been excited to watch the parade. The balloons marching down Broadway are a sight to behold, with much pomp and circumstance, and many marching bands, Broadway performances, and good old-fashioned floats. Now, no parade would be complete without confetti, though some may argue that it's wasteful and expensive, since it requires resources to clean up. Imagine watching the parade from your perch on Central Park West, enjoying the festivities, only to look down and realize the confetti at your feet is chock full of your personal information. Name, address, Social Security number, Driver's License, bank routing number, etc. Unfortunately this is exactly what happened. Apparently Nassau County, NY confidential police data, including references to Mitt Romney's motorcade during last month's presidential debate at Hofstra University, ended up as confetti, all over the upper parade route. More info is here: http://www.wpix.com/news/wpix-confidential-confetti-at-thanksgiving-parade,0,4718007.story
Apparently it's not the first time. At this year's Super Bowl victory parade, entire sheets of paper that included sensitive data, such as Social Security numbers and personal medical information, were thrown out of office buildings. What is it with NY and parades that makes people want to give up their personal data? Read more here... http://gothamist.com/2012/02/08/good_job_social_security_numbers_th.php
I hope your personal data does not end up in the same place.
Until next time...
It's been a busy month, as the year winds to a close. We've made many exciting announcements around governance for Big Data, especially in the area of security. Just like with 'little data', there is no need to boil the entire ocean and lock everything down... well... depending on who you are. It does, however, require you to look at what information you keep, and then create and implement policies to protect that sensitive data. Where to start?
On 12/12/12, you can find out more and also celebrate the number 12... Who says Big Data security isn't quite cooked yet? Join us for a free Webcast on 12/12/12 with Forrester and IBM experts: "Why Big Data Doesn't Have to Mean Big Security Challenges". Our keynote IBM speaker is not just some marketing talking head (and I mean that tongue in cheek to all my marketing colleagues); he is a seasoned security expert who has done hands-on implementations and knows of what he speaks.
Many of you, especially some of my retail clients, have already locked down your IT infrastructures for the year. Now is a great time to take a step back and think about your own IT security policies and how you plan to extend and implement them for Big Data. If you don't think your organization is working with Big Data yet, a good place to look is your marketing organization. Chances are they are and have been for some time, right under your nose. What kind of customer data are they keeping? Are they creating customer profiles? Enriching them with information gleaned from social media? Keeping sensitive financial information? You may, in fact, have a new area of risk quietly developing. Something to think about... I hope you can join the webcast; it promises to be a very interesting discussion.
It's only going to accelerate dramatically as these same hackers realize that Big Data contains a wonderland of largely unprotected data. This was a big topic of discussion at Strata Conference/Hadoop World this past week, as well as the IBM Information on Demand Conference, which I attended. Here's a thoughtfully written article that provides an initial set of recommendations for securing Hadoop: http://cloudcomputing.sys-con.com/node/2416407
One important recommendation the authors seem to have forgotten is masking non-production data. Considering that the list came from a group of Hadoop developers, I'm not surprised. One of the biggest challenges for Big Data Governance is that many of the individuals who work in this environment may not be familiar with privacy regulations. For example- PCI requires that <any> data going offshore or into a contractor's hands be obscured so as not to show the original values.
Unlike some of our vendor colleagues (no names!!), we are not just postulating and posturing about it here at IBM. You know those vendors, the ones who do nothing more than put 'Big Data' in front of their product names ;-) .. We are working quickly and furiously to deliver key privacy and security capabilities for Big Data TODAY. InfoSphere Optim now provides Masking on Demand for Hadoop with an embeddable API so that you can easily mask your non-production data with realistic yet fictitious names, addresses, SSN's, email addresses, etc: http://informationmanagementbps.tumblr.com/post/32813500881/infosphere-optim-9dot1
InfoSphere Guardium real-time activity monitoring just announced support for Hadoop as well as a host of other security enhancements: http://www-03.ibm.com/press/us/en/pressrelease/39136.wss
And... we've had Encryption for Hadoop for quite some time with our Guardium Encryption Expert offering.
Product is one thing, but People and Process are another. It's time to start thinking about how to apply and implement security and privacy policies across the Big Data divide.
Welcome to my Blog- Big Data Governance Meets Reality
Here, you'll find a wide variety of opinions, information, and resources devoted to Big Data Information Governance. While there are many who think that BDIG is just an extension of Information Governance, there are some important differences:
1. Size and Scale: lots of data means lots of processing. It's not just about increased volume/variety/velocity as many industry pundits have stated. Traditional architectures have not been designed to govern data on Hadoop file systems, nor to work in a Map/Reduce processing framework. That being said, it is a major challenge for products that didn't scale very well in their original environments to process this much data at the speed required, especially when much of the need is real time, such as data streaming from devices. Solutions need to be either adapted or re-invented to accommodate massive scale.
2. Different architecture, few 'real' products- See #1 above. Just putting 'Big Data' in front of a product name does not mean it was designed for this purpose. Caveat emptor!
3. Processing methods- force 'classification in reverse'. In other words, we may not even know that the data is sensitive until it is actually processed, so classification is only possible in real-time or after the fact. Again, new solutions, new methods are required for governing privacy. How will Metadata repositories evolve?
4. Data 'surprises' with re-identification- when public data is combined with corporate data. Increased legal risk and exposure, as evidenced by recent legal actions against companies like Spokeo and Skout.
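To show what a re-identification 'surprise' looks like in practice, here is a small sketch. The field names, the choice of quasi-identifiers, and every record are made up for illustration. It joins an 'anonymized' corporate data set to a public one on ZIP code, birth year, and gender, and reports the rows that match exactly one public record.

```python
from collections import defaultdict

# Hypothetical quasi-identifiers: harmless alone, identifying in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def reidentify(anonymized_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (anonymized_row, name) pairs where the quasi-identifiers
    match exactly one record in the public data set."""
    index = defaultdict(list)
    for row in public_rows:
        index[tuple(row[k] for k in keys)].append(row)

    matches = []
    for row in anonymized_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the person is exposed
            matches.append((row, candidates[0]["name"]))
    return matches


# Fabricated sample data: no names in the corporate set, names in the public set.
corporate = [
    {"zip": "10024", "birth_year": 1961, "gender": "F", "purchase": "blood pressure monitor"},
    {"zip": "11550", "birth_year": 1985, "gender": "M", "purchase": "running shoes"},
]
public = [
    {"name": "Pat Example", "zip": "10024", "birth_year": 1961, "gender": "F"},
    {"name": "Sam Sample", "zip": "11550", "birth_year": 1985, "gender": "M"},
    {"name": "Lee Sample", "zip": "11550", "birth_year": 1985, "gender": "M"},
]

exposed = reidentify(corporate, public)
```

Here the first corporate row matches a single public record and so picks up a name, while the second stays ambiguous because two people share the same combination. The more attributes you combine, the fewer people hide in each group.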
Some interesting links follow:
Big Data Governance: A Framework to Assess Maturity: http://ibmdatamag.com/2012/04/big-data-governance-a-framework-to-assess-maturity/ . This article discusses how the original Information Governance Framework can be applied to Big Data. In it, the authors suggest a list of questions and considerations for those going down the Big Data path.
Do you trust your Big Data? How do you assess its accuracy, especially if you are using publicly available data in conjunction with your organization's data to drive critical business decisions?
There is a web site that shall remain nameless; it advertises itself as 'Not your grandma's phone book'. It's a good thing, since it's a great example of misuse of Big Data and of public information in general. Said web site consolidates public information on most everyone and then posts it without regard to its accuracy. It includes information such as your name, address, phone number, real estate value, age, marital status, and much more. As an example, it lists my dad as in his 90's and living in Florida when in fact, he died back in 1979. While my mom does now live in Florida, he never did. Meanwhile, she has since remarried and was subsequently widowed. Relying on an information consolidator such as this to augment customer information could indeed be misleading and costly.
Jeff Jonas has some interesting ideas around Privacy By Design. http://www.e-comlaw.com/data-protection-law-and-policy/hottopics_template.asp?id=Jonas
Metadata, Classification and Discovery:
How do you automate classification especially for machine-generated data where volumes are huge and tried and true methods may not scale? Take a look at the work we're doing with the infogov community and join in... http://www.infogovcommunity.com . High level summary- define the high level ontology and then use crawlers/automation to classify and tag the masses.
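Here is a rough sketch of the 'define the ontology, then let automation tag the masses' idea. The three categories and their regular expressions are my own simplified assumptions, not anything published by the infogov community, and real patterns would need far more care around false positives.

```python
import re

# Hypothetical, deliberately simple ontology: tag name -> detection pattern.
ONTOLOGY = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify(text):
    """Return the sorted list of ontology tags whose pattern appears in text."""
    return sorted(tag for tag, pattern in ONTOLOGY.items() if pattern.search(text))


def tag_records(records):
    """Crawl an iterable of (record_id, text) pairs and yield (record_id, tags)."""
    for record_id, text in records:
        yield record_id, classify(text)
```

Because `tag_records` is a generator, it can be pointed at a stream of machine-generated records without loading them all into memory. That fits the 'classification in reverse' situation, where you only learn that something is sensitive as it flows past.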
Do you have to cleanse everything? Well... um... no! Read here and give your opinion. I've shared mine....
I decided to make a separate blog entry for this one since it strikes me as something so critical, it needs to be called out.
Recently, Forbes published an e-article entitled, "The Future of Big Data: Crawling Over Broken Glass": http://www.forbes.com/sites/gilpress/2012/08/30/the-future-of-big-data-crawling-over-broken-glass/
I must have read this article three times before I decided to comment. In essence, it points out how Big Data is another technology deja vu, with business going off and doing Big Data projects on its own, only to later 'crawl back' to IT for help, over broken glass no less. Yikes! I pointed out that this repeating pattern is really a failure of IT to partner with the business. Here's my comment in its entirety:
Are We Innovation-Challenged?
I agree, it is déjà vu all over again. Yes, we’re seeing another re-visit of the Client/Server computing adoption, as well as internet, as well as just about every technology trend you can think of, including the recent mobile computing trend (iPads anyone?). Having the business at the forefront of adoption, only to come back to IT for help, is indicative of a greater failure- that is the failure of Information Technology organizations to partner with and understand the needs of the business.
This is nothing new- I recall back in 1986 as an MBA student at NYU, many case studies and discussions around the topic. Traditionally, IT has functioned as an infrastructure organization- providing the basic plumbing, and ‘running the shop’. I’m not saying there aren’t world class organizations that partner more closely with the business. I’ve worked with many of them. Those are also the same organizations that treat data as an asset, and govern it accordingly. The result is higher shareholder value, and we see this again and again in studies with corporate executives.
If IT does not partner with the business, then the business will continue to take matters into its own hands. Crawling over broken glass is no picnic. Then again, innovators take risks- risks bear rewards. If it means a little glass, so be it. What’s the alternative when IT is not there to partner?
Big Data is the next chapter of that innovation journey. My take on this is that IT needs to re-think how it helps the business to innovate. When it comes to Big Data, the first step is finding an actual BUSINESS PROBLEM to solve, and to leverage those examples to show value to the business, while also learning the infrastructure implications and how it fits into your existing architecture and IT practices. Having worked with well over 1,000 customers during my career, I’ve seen way too many Proof of Concept or Pilot projects that focus solely on how the technology works, with no real applicability to the evaluator’s environment or business needs. As a result, organizations not only waste valuable resources, they are also ill equipped to support or roll out real projects.
Yes, there are clearly infrastructure implications, such as security, integration with existing applications and processes, and especially governance. The governance model doesn’t necessarily get thrown out the window. Like anything else, it will adapt, though the basic tenets will also remain- govern what has value or risk to the organization. Doesn’t mean you have to become a ‘Big Data Prison Guard’. It does mean it may be time to take a look at the value you provide to your business.
What do you think?
Being a voracious e-reader, I find myself stunned every day at the volume of Big Data journalistic and vendor trash floating out there. While there is some compellingly useful information, most of what I read is posturing or basic 'What is Big Data' regurgitated over and over. Yes, I have an opinion and yes, I tend to posture too. My goal, though, is to share what I consider useful, practical information and perhaps add relevant comments where needed. Note that at times, I may include generic Governance materials, since they do apply to data Big and small.
Following are a couple of reading suggestions for your consideration... Happy Reading!
1. Governance over Critical Data Elements: http://ibmdatamag.com/2012/08/governance-over-critical-data-elements/
Once again, Sunil Soares reminds us why boiling the ocean isn't a good strategy when we're looking for that one gold doubloon in the shipwreck that is on our radar. (OK it's "International Talk Like a Pirate Day" so I couldn't resist.) The same approach applies to Big Data-- govern what has value (or risk)-- start small and build from there.
2. Big Data Quality: Persistence vs. Disposability-- What do you cleanse? http://www.information-management.com/blogs/big-data-quality-persistence-versus-disposable-10023136-1.html
Michele Goetz of Forrester does a credible job of explaining when to cleanse Big Data and pointing out that much of what is Big Data does not need to be cleansed. I also wrote a comment pointing back to the value of the data and that it may need to be standardized to some extent when performing functions like householding.
Database security is a funny thing-- most everyone 'should' be implementing it and we assume they are, yet those who are not are highly likely NOT to admit it. I believe that calls into question any data security survey published to date in which respondents are asked whether or not they secure sensitive data. Over the past several years, I have noticed a trend in database security, which has become more like a large wave. 'Way back when' in the open systems world, it was perfectly acceptable to perform an installation with Administrator or even root privileges, use generic userid's that everyone shared, sometimes even post them with sticky notes on the side of a computer monitor, and give the DBA full access to the entire system, data included. As the number and impact of breaches have increased, so has the awareness of how important it is to implement security controls on the database. Awareness, yes; implementation... well, not so much! It's even more shocking to me, since databases are by far the largest source of compromised records. According to the latest Verizon Business Risk Team Survey for 2012, while only 7% of breaches involve a database server, they account for a whopping 96% of compromised records and 98% for larger organizations. (Full study is here: http://tinyurl.com/btrasnu)
I have also observed that organizations that rely on mainframes tend to have much cleaner security and housekeeping rules, even around things like change control. The irony of it is that those systems are also much harder to breach! How many hackers do you know who can write assembler code? The clients I work with (who shall remain nameless) vary in their approaches and level of controls used. At one end of the spectrum, I had a former client who used the same password for all DB2 instance owners across all systems. That password was also shared by a large number of DBA's and developers. Very convenient, yes... also very risky! At the other end is a company that has completely segregated the DBA work from any systems work. Even creation of the database is completely locked down. Good controls, yes, though they did slow us down for a couple of days while we waited for the database creation request. This organization is also in the process of rolling out Database Activity Monitoring across the enterprise as well as Data Privacy (Masking) for Test Data. On top of that, they have very strong support for Information Governance from the executive level on down. That company, I might add, had previously experienced a major breach, actually two, so it is now actively engaged in ensuring proper controls are implemented.
Breaches can be expensive, embarrassing, and costly in terms of corporate and brand reputation. Just like with physical security, there is also no silver bullet or single way to implement database security. Case in point- a jewelry store. A high end jewelry store will employ many methods to secure its valuable assets. First, perimeter security- making sure there are no hiding places, shrubbery, or easy access methods like open windows or roof vents. Second, one or more alarm systems or motion detectors around the premises, or even on specific items or cases. Third, security personnel at the door, and inside the store. Fourth, security practices such as having the salesperson only take out a certain number of items at a time. Lastly, security cameras to record activity. If you think of database security in this way, you can see that multiple methods and practices must be employed.
In a recent article entitled "Establishing a Strategy for Database Security Is No Longer Optional", Jeffrey Wheatman of Gartner brings database security to prime time. In this paper, he does a good job of laying out the different aspects of database security, along with some basic best practices recommendations. I especially appreciated how he defines the different categories: Administrative Controls, Preventive Controls, and Detective Controls ( though I would rename Detective to Investigative Controls). If you would like a copy of this paper, follow this link: https://www14.software.ibm.com/webapp/iwm/web/signup.do?lang=en_US&source=sw-infomgt&S_PKG=500021641&S_CMP=Guardium_Gartner_data_security_not_optional_analyst_lib
One area that Wheatman did not cover, which would seem obvious and critical, is the need for process controls, even extending to hiring practices and sharing of credentials. We all like to think that IT personnel are law-abiding citizens. However, there has been more than one case where DBA's and other personnel with criminal records went and 'borrowed' sensitive customer data, including credit cards and account numbers. The good news is there is now a certain level of maturity in the database marketplace to provide the needed multiple levels of security for organizations to lock down their valuable data assets. Most every RDBMS now supports roles and privileges, auditing, non-administrative installation, as well as column/row level security. Combined with preventive controls such as encryption and masking, investigative controls such as database activity monitoring, and process controls, organizations can now design and implement a stringent database security practice that meets the needs of business and regulatory requirements.
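As a rough illustration of how a preventive control (column-level 'need to know') and an investigative control (an audit trail) work together, here is a small sketch. The roles, column names, and log format are hypothetical; a real RDBMS would enforce this with its own grants and auditing features rather than application code.

```python
# Hypothetical mapping of roles to the columns each one needs to see.
ROLE_COLUMNS = {
    "support_agent": {"customer_id", "name", "email"},
    "billing_analyst": {"customer_id", "name", "card_last4"},
    "dba": {"customer_id"},  # administers the system without reading the data
}

# Investigative control: every read is recorded, including what was withheld.
audit_log = []


def read_row(user, role, row):
    """Return only the columns the role is allowed to see, and log the access."""
    allowed = ROLE_COLUMNS.get(role, set())  # unknown roles see nothing
    visible = {col: val for col, val in row.items() if col in allowed}
    denied = sorted(set(row) - allowed)
    audit_log.append({"user": user, "role": role, "denied": denied})
    return visible
```

With a row holding `customer_id`, `name`, `email`, and `card_last4`, a call such as `read_row("u1", "dba", row)` returns only the `customer_id`, and the audit log records that the other three columns were withheld. Like the jewelry store, neither control is enough on its own.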
Data Security: http://www.darkreading.com/
Database Privacy and Masking: http://www-01.ibm.com/software/data/optim/protect-data-privacy/
Database Monitoring, Security, Encryption: http://www-01.ibm.com/software/data/guardium/
Next, do you know where your sensitive data is located?