Yesterday I posted a recap of the first 4 questions from that day's TweetChat at Big Datamgmt, which focused on governance to avoid a data landfill:
"Getting Control of Data in Big Data Era". My blog wouldn't take all of the data that I collected, so here are questions 5-8.
Q5 Once you decide what data to keep, how do you make sure it goes to the right systems and people?
jeffreyfkelly From a developer perspective, Big Data app dev tools need to improve, make it easier to deliver insight to business users.
BTRG_MikeMartin You must increase control of wasteful data even with Big Data Management, archive/retire & dispose http://t.co/Ta361ASBkP http://t.co/904FjTm2yA
dvellante Again it's a matter of information liability and asset management. Which is more critical to your organization? Cutting risk or mining opportunities?
BTRG_MikeMartin Big Data doesn’t change retention. Keep the data you need, get rid of the rest. You can’t afford to keep it all. http://t.co/Ta361ASBkP
dvellante This is a metadata problem / opportunity
craigmullins Policies, procedures, automation and education are needed to ensure that Big Data makes its way to the right systems + people.
BTRG_MikeMartin Big Data approach needs to include improved business outcomes, which requires people, process & technology working in harmony.
BTRG_MikeMartin You need to instrument processes to not only govern but make the best use of valuable data.
furrier Data is code in the new paradigm of new apps & services - lots of issues so developer create & data can learn & be smart.
furrier The integration of data create new datasets - future is smart data and learning data - data is code.
jeffreyfkelly Exactly, and new data sets could be highly sensitive - need governance RT @furrier: the integration of data create new data sets
Betharonoff Data as a commodity already exists, so economy is only a few steps down the road.
furrier Meta data practices will be impacted in this data quality and data-as-code concept.
furrier One aspect of this chat is business competitiveness in integrating data as code into business lifecycle and processes.
BigDataAlex Metadata tags are aligned to role based systems - automated systems.
BTRG_MikeMartin If you don’t improve processes with Big Data management and create better business outcomes your Big Data initiative isn’t a success.
Kari_Agrawal When and how do we decide to discard the extremely old data? Or do we retain it as in Data Warehouse?
craigmullins You need policies and automated procedures based on retention requirements.
PPB13 How do practitioners overcome emerging skepticism in the marketplace? http://t.co/x0pHJCDcfN
@BTRG_MikeMartin: You need to instrument processes to not only govern but make the best use of valuable data
BigDataAlex Moving "beyond search"
TheSocialPitt ALWAYS start Big Data project by thinking+planning. More data does not fix bad process.
BTRG_MikeMartin What processes to improve: ediscovery, ECM, Data Governance, Data Security, data retention, and data quality.
IBMbigdata Who decides "best"? RT @BTRG_MikeMartin: You need to instrument processes to not only govern but make best use of valuable data
joycetompsett Data quality has to take on the idea it will be moving around different sys/APIs Big Data management > critical for security #RSAC
BTRG_MikeMartin Without the right tools, data retention with Big Data could be a nightmare.
skenniston RT @furrier: We all want data retention but who owns it after it's retained..will a data marketplace economy develop?
BTRG_MikeMartin That's where determining business value, legal and regulations come in; typically only 30% of data.
Kari_Agrawal How exactly do we begin to classify data in case of Big Data?
craigmullins MT @PPB13: How do practitioners overcome skepticism... <-- by continuing to do work that adds value to your company
dvellante @BigDataAlex yes re: search - it's sometimes used as a 'blunt instrument'
Aarti_Borkar Deciding what data to retain needs to start with business policies defined upfront - it's not an "on the fly" decision.
praxsozi RT @jeffreyfkelly: Q5 Big Data requires rethink of business processes - this is NOT a trivial exercise
Natasha_D_G Culture also plays role RT @jeffreyfkelly: Q5 Big Data requires rethink of business processes - this is NOT a trivial exercise
TheSocialPitt Data antique dealers RT @furrier: We all want data retention but who owns it after it's retained.will data marketplace develop?
tomjkunkel @BTRG_MikeMartin Integrated effort with Legal, Finance, Sales, Marketing with IT serving through best architecture.
BTRG_TomNestor The process must lead to better data which should drive better business opportunities.
BTRG_MikeMartin Big Data is not immune to the laws of information economics: http://t.co/Ta361ASBkP #CGOC
Q6 How does Big Data affect data lifecycle management? Does big data introduce new stages to the info lifecycle?
Summary of top answers:
BigDataAlex Yes, new stages - stages we haven't even imagined yet. Data needs to update itself into authoritative sources.
craigmullins One issue that arises is "How can you create realistic test data for testing Big Data systems and applications?"
jeffreyfkelly Yes, but we are just starting to understand Big Data lifecycle mgt - need to build out best practices.
BTRG_MikeMartin Big Data might not create new stages in life cycle management, but certainly with new domains we have to extend the data lifecycle to new platforms.
jeffreyfkelly I disagree - I think new stage of LCM includes emergence of new data sets created from integration of other data sets and then yet new data sets created from integrating new new data sets, and on and on and on.
Aarti_Borkar Big Data makes handling the lifecycle of data a far more complex problem than before.
Natasha_D_G Can u say more? RT @Aarti_Borkar: Big Data makes handling the lifecycle of data a far more complex problem than before
Aarti_Borkar Big Data does not create new stages - just new ways to apply the existing stages to different use cases.
Dmattcarter What are some of those new use cases?
Aarti_Borkar Test Data and Privacy for Big Data is critical - as we bring in more data potentially creating a bigger security threat.
BigDataAlex Is there a new data management paradigm emerging?
craigmullins A new paradigm may indeed be emerging.
BTRG_MikeMartin RT perhaps a refined one
TheSocialPitt One new stage = 'ephemeral'.
craigmullins Let's not burden Big Data with things little data has not yet mastered.
craigmullins Sometimes we forget that - in practice - many orgs do not follow a lifecycle, practice data governance, ensure quality, etc.
craigmullins So yes, Big Data should do these things, but it is not failing if it does not.
tomjkunkel Big Data Management requires identification & deletion of ROT-redundant, obsolete & trivial data, which reduces storage & eDiscovery.
StevenDickens3 What role does the community see for the original Big Data system of record the mainframe?
BTRG_MikeMartin Consider impacts of eDiscovery, governance, security and #ILM on Big Data stores: how do we move traditional methods to Big Data management?
BigDataAlex Many organizations can only afford to store 20 copies of the same data - they are looking for authoritative against process.
jeffreyfkelly definitely, data sprawl becomes an issue
Kari_Agrawal How do we deal with redundancy in case of Big Data?
StevenDickens3 What is the collective view of centralised data vs multiple federated copies?
TerraEchos Could some Big Data mgmt stages be the elimination of stages? Using data/ data analysis without constraint and eliminating steps.
Aarti_Borkar Masking test data is essential to Big Data development: what the enterprise considers private needs to always be privatized.
craigmullins My next two Tweets mentioned some of them. Not saying Big Data shouldn't, just that our stds should not be too high.
craigmullins @BigDataAlex Yes #littledata concepts apply to Big Data... but many orgs still struggle with managing little data
tomjkunkel @Kari_Agrawal Destroy it! I can provide insight on best practices
Dmattcarter Pretty intense data quality and Big Data conversation going on around Big Datamgmt chat!
BTRG_MikeMartin There is already a data problem with #smalldata carrying it over to Big Datamgmt. Too costly to delete all data that has no value.
BTRG_MikeMartin You must increase control of wasteful data even w Big Datamgmt, archive/retire & dispose: http://t.co/Ta361ASBkP http://t.co/904FjTm2yA
Q7 Are new tools and platforms required to manage Big Data and the new dimensions of the data lifecycle?
Summary of top answers:
craigmullins Tools for performing advanced analytics on Big Data – though not new to the industry – will be new to many organizations.
BigDataAlex Yes. We need new tools, platforms, and systems....it is happening. Calling for massive innovation - love #DataAsCode.
BTRG_MikeMartin @craigmullins It's called defensible disposal http://t.co/Ta361ASBkP
craigmullins Hadoop-based products will need to be augmented with mission-critical DBMS capabilities to become de rigueur.
craigmullins But I think DB2 (and other RDBMS products) could be extended with Big Data capabilities before that happens.
BTRG_MikeMartin Flexibility & scalability of Big Data platforms will themselves assist in helping Big Datamgmt security & controls...
BigDataAlex We need DigitalDNA - anticipating the Internet of Things - World Wired Web.
Aarti_Borkar It’s a mix of new tools and enhancing existing tools. The core solution does not change, it morphs.
StevenDickens3 All depends where data resides today and whether the current platform/tools are fit for purpose, if yes why move or retool?
tomjkunkel Legacy storage assets can't handle the high availability, low latency applications and need to be displaced.
jeffreyfkelly Yes, a major topic at #strataconf is making Big Data enterprise ready - need better mgt, data gov, DQ capabilities.
Aarti_Borkar Key innovation is required to ensure that both traditional and Big Data are uniformly governed.
BTRG_MikeMartin Infosphere Optim helps you get control of structured data to feed only the good into Big Datamgmt: http://t.co/Y5Jniunn6N
jeffreyfkelly And don't forget security - #RSAC - must keep Big Data secure
craigmullins Which brings up regulatory compliance... another big issue
Aarti_Borkar Big Data gov starts with a uniform set of data classification and policies that cover ALL data. Metadata is the magic here.
craigmullins If the Big Data contains PII then all the regulations that apply to PII still apply - doesn't matter how big the data set is.
BigDataAlex Does this spill over into machine learning? Can we reduce dimensionality of data through associative memory?
BTRG_MikeMartin Yes innovation & big ideas as well as change our paradigms.
Betharonoff Interesting query RT @BigDataAlex A7: Can we reduce dimensionality of data through associative memory?
Q8 How does Big Data impact data stewardship? Who “owns” particular data in a big data environment?
BigDataAlex Great question - ownership is beginning to blur - standard licensing models for data are being challenged.
craigmullins All data is owned by the company, whether it is Big Data or not…
jeffreyfkelly Ah, but is it? social data, market data etc.
BTRG_MikeMartin Internal ownership of Big Data, while beyond traditional areas, should still be based on business value, compliance or legal hold.
craigmullins Of course proper data governance policies need to be enacted by the corp to confer #datastewardship and ensure proper treatment
BTRG_MikeMartin Without good data stewardship & Big Datamgmt it will be difficult to unlock the value of big data: http://t.co/hGJ3QkTiJf
Aarti_Borkar Ownership of replicated data is the original biz owner- governance of that data is still their problem.
jeffreyfkelly This is a really hard one, again new biz processes informed by Big Data will impact who owns the data.
Aarti_Borkar Stewardship does not change just because a new copy of the data was created.
craigmullins True, but some Big Data is all new.
craigmullins The word "own" is always so troublesome, isn't it?
BTRG_MikeMartin Yes, it needs to be well defined.
Aarti_Borkar @craigmullins - Oh so right! .. think "Responsible for".. is better than "own"...
BigDataAlex If DataAsCode, then if DataAsCode is viral, can it be controlled? Do we want it to be controlled? What does ownership mean?
BigDataAlex How does OpenSource apply to our Data?
BTRG_MikeMartin For more Big Datamgmt resources Data Privacy and Security: http://t.co/UL0VNCiivP
That is a lot of information! I hope you can follow the discussions. I tried to clean up a little bit and hope that I didn’t change any content from the participants.