Recap of Tweetchat: "Getting Control of Data in Big Data Era"
svisser1 2700018UK9 Visits (4494)
Today I spent an hour taking part in the TweetChat at Big Datamgmt focused on governance to avoid a data landfill: http
Well, it went too fast for me to actually be a contributor, so I was participating as a reader / listener. This kept me busy enough since by the end we had generated a fair about of Big Data ourselves: 647 tweets, 180 users with reach of 136,229 & 1,506,585 impressions.
and facilitators / moderators:
There were 8 questions posed over the hour, but I'm only posting the first 4 here.
Q1 In this Big Data era, do traditional concepts data quality, data governance & data stewardship even apply?
A summary of the answers:
craigmullins Big Data refers to datasets whose size, type and speed of creation make it impractical to process and analyze with traditional tools. That Big Data definition comes from wikibon; see http
dvellante My belief is that ingest process & analysis of data changes with big data.
BigDataAlex Yes, I think they apply. Our clients are very concerned about these issues and it does apply.
jeffreyfkelly Absolutely, but vastly more complex.
Natasha_D_G Traditional concepts are even more critical in Big Data era especially in data governance.
craigmullins But, of course data quality, data governance and data stewardship SHOULD apply in the age of Big Data Management.
dvellante You still need clean and common policies for data taxonomies; but the unstructured and semi-structured data texture requires some new thinking and technology. Specifically ideas around function shipping, name value pairs, Hadoop, etc - applying traditional concepts to new model.
Dmattcarter In order for Big Data to be enterprise-ready, it needs to include those traditional concepts.
jeffreyfkelly The challenge is applying DQ and governance to high velocity data - hard enough with "traditional" data, ie CRM, ERP.
craigmullins Failing to apply these concepts will result in poor data quality. Analytics performed on bad quality data produces bad results.
BigDataAlex I think transparency is important too in this era of Big Data and how we govern. I would suggest Big Data Ethics manager.
BTRG_MikeMartin IG concepts apply to Big Data even more so as the issues solved by Information governance are only exaggerated.
furrier Data quality has to take on the idea that it will be moving around different systems/APIs.
craigmullins Yet there are issues and adaptations that will be required as we apply data quality, data governance and data stewardship to Big Data Management.
BigDataAlex Love the challenge on high velocity data....algorithms in streams.
jeffreyfkelly Big Data is experimenting with data sets, while governance is applying policies that sometimes restrict experimentation.
BTRG_MikeMartin You can’t make good business decisions on bad data. http
Natasha_D_G Data quality is an issue as "94% biz believe some of their customer/prospect info is inaccurate".
BTRG_MikeMartin Data governance is critical in the Big Data management era as it makes small problems bigger. You need data quality to enable Biginsights http
furrier Data as a resource for applications; ownership of data is important to individual and/or company.
BigDataAlex In health care sector, orgs are combining medical ethics with their CIOs.
Aarti_Borkar Governance is even more important with Big Data as the security and trust is a bigger business issue now.
dvellante In part this is a discussion around the balance between data being an asset an a liability - good DQ is important for both.
searchCIO Metadata practices are gaining momentum as companies tackle Big Data. http
Q2 With data at unprecedented speed/volume, how can data quality measures be applied in time for analysis?
A summary of the answers:
craigmullins With data quality, cleansing can occur as humans eyeball the data - most raw Big Data is not eyeballed. In some cases (e.g. medical devices, automated metering, etc.) only rudimentary cleansing (if any) may be needed. At least as long as the meters are calibrated and maintained properly
BigDataAlex Real-time analytics is critical. We love Streams. The right algorithm at the right time.
Natasha_D_G Trust = Word we try to avoid. @Aarti_Borkar: Governance is even more important with Big Data as security & trust bigger biz issue.
BTRG_MikeMartin To deal with Big Data, speed, and volume: be proactive by starting Big Data Management across the enterprise now & maintain http
Aarti_Borkar Data Quality for Big Data can be handled right upfront before starting Big Data analysis
BigDataAlex A next-generation of KPIs for quality vs. quantity are being implemented to separate quality from quantity in real-time.
furrier Data quality is about the context of the application & what users experience for each use case is not always the same.
jeffreyfkelly Machine learning is required to improve data quality for Big Data - velocity too high for human methods IMHO
nenshad Variety of algorithms include semantics
zacharyjeans Ask your Big Data well crafted questions. Sloppy questions lead to sloppy answers.
craigmullins Speed + volume make data quality challenging…
searchCIO Data Quality is essential to master Big Data Management http
BTRG_MikeMartin Start now on data quality because if you don’t have it in now Big Data only magnifies data issues http
Natasha_D_G Excellent question especially given social media data and its 18 minute life span
jeffreyfkelly Also with Big Data, volume of data can sometimes smooth over anomalies in data quality.
Aarti_Borkar Data quality should also be handled as the results of the analysis are merged back into the reporting marts.
BigDataAlex The right analytics at the right time against the systems of systems integration.
dvellante Perspectives from a former CIO on the importance of data quality http
nenshad It’s all about the data first
dvellante In my view you can't deal with Big Data quality unless you can automate the classification of data at the point of creation.
Kari_Agrawal How exactly do we clean the data when it has no structure?
BTRG_MikeMartin You can’t make good decisions and enable business biginsights without high data quality.
furrier Dirty data equals poor user experience. I wrote about it in 2009 re: twitter facebook & social data http
Aarti_Borkar Data quality should be handed as part of data integration as the Information Server customers do - its the same with Big Data.
Q3 How do data governance policies apply when the point of Big Data is to explore novel use cases?
A summary of the answers:
craigmullins Finding novel uses of data does not diminish the need for data governance policies.
Natasha_D_G True, but still need boundaries.
BTRG_MikeMartin Exploring Big Data still requires trusted data so you must secure and govern even more so. http
craigmullins The novel uses need to be documented as part of the data governance policies.
BigDataAlex The right policy at right time. I think you can agility with accountability.
craigmullins Keeping in mind that even under ideal circumstances data governance policies can be difficult to enact.
tmustacchio Big data isn't just for novel new business cases - it can also vastly improve value in existing ones - i.e. R&D, cust service.
craigmullins Consider non-intrusive data governance; see this article by my friend Bob Seiner http
craigmullins Seiner states: data governance refers to the administering (formalizing) of discipline (behavior) around the management of data.
craigmullins And data governance is an on-going process; it should formalize what already exists + address opportunities to improve.
jeffreyfkelly There is a need to set up boundaries but give analysts freedom to explore Big Data.
furrier Innovation will not come from regulations but creative developers to play with data -#slipperyslope
Q4 How does Big Data change data retention policies, ie, deciding what data to keep vs dispose?
A summary of the answers:
tomjkunkel Formal Data Destruction processes minimize the growing data landfill and need to be incorporated into Data Lifecycle Mgmt.
dvellante: Still must be able to defensibly delete data. you may not want WIP data hanging around - too much of a risk.
BTRG_MikeMartin Big Data is not immune to the laws of information economics: http
BigDataAlex Focus on workflow, business process, optimization. There is no set answer. Filtration - distillation
BTRG_MikeMartin Velocity of Big Data means current best data is changing rapidly, you want decisions on the best info.
BTRG_MikeMartin: It is important to have Big Data Management framework for good business outcomes inc. policy, security, ILM & quality.
craigmullins Data is retained for internal + external reasons... Internal because the org needs it for business – external because the law demands it.
tomjkunkel Isn't there also a need for Data Entrepreneurs (A business perspective with a knack for data)?
craigmullins You may choose to retain more data for Big Data Management analytics but be careful because data once retained is discoverable during court trials.
furrier Big data complicates data retention policies - we have shadow IT and now "shadow data" or what I call "dark data".
Natasha_D_G Big Data can extend data retention esp in R&D. Pharmas can leverage old research to accelerate new research.
jeffreyfkelly This is a major issue: with hadoop you can now store all data inexpensively - not possible before and new challenge.
BTRG_MikeMartin NO still too costly.
Kari_Agrawal If we see the huge amount of IP packets flying around, can we process those packets to get something meaningful?
craigmullins There are over 150 different regulations (at the local, state, national, and international levels) that impact data retention.
Aarti_Borkar Retention is about storing what the business needs later vs everything - that core concept does not change with Big Data.
BigDataAlex Do we need to store everything? Can we, should we?
craigmullins No, no, and no to that last series of questions!
Natasha_D_G Data hoards say keep all! Fear of losing critical info.
jeffreyfkelly Nothing worse than looking for data you know you had only to remember you threw it away!
Aarti_Borkar Defensible disposal of data becomes harder if multiple copies are made as part of Big Data analytics.
craigmullins MT @Aarti_Borkar: Defensible disposal of data becomes harder if... hence the need for #datagovernance policies!
furrier We all want data retention but who owns it after it's retained..will a data marketplace economy develop?
TheSocialPitt Storage is a huge challenge, especially in cases with many streaming video feeds, e.g. defense.
jeffreyfkelly Keep in mind regulations haven't caught up w the technology - industry needs to be proactive on this issue or the government will.
Aarti_Borkar Big Data allows for pattern searches and trends in retained data that was not easy to do earlier.
That is a lot of information! I hope you can follow the discussions. I tried to clean up a little bit and hope that I didn’t change any content from the participants.
To find out more about managing big data, join IBM for a free event: http