Today marks the first day of the new blog by David Birmingham called “Netezza Underground”. As you may remember, IBM purchased Netezza not that long ago. To refresh your memory, here is a description from the press release, IBM Completes Acquisition Of Netezza:
Netezza data warehouse appliances bring analytics directly into the hands of business users within every department of an organization such as sales, marketing, product development and human resources. The simplicity of deploying Netezza appliances makes the technology ideal for the needs of high-performance analytics, requiring minimal administration and IT skills, and enables clients to run complex data queries within days of deploying the solution.
David’s first blog entry introduces the need for the blog and looks at the counter-intuitive nature of the product's internals. In future entries, he’ll answer questions about the technology as a seasoned expert. If you’re interested in business analytics, data warehousing, and big data, this is definitely a blog that you want to subscribe to.
So, who is David? He’s not an employee of IBM or Netezza. David is a consultant working for Bright Light Consulting and is a huge fan of the Netezza product and architecture. A few years ago he also wrote a Netezza book:
Looking for more information about Netezza? Here are some additional resources that I’ve found:
- IBM Red Guide: The Netezza Data Appliance Architecture: A Platform for High Performance Data Warehousing and Analytics
- Netezza Community – links to David’s blog, plus other bloggers, including Jim Baum, Justin Lindsey, Phil Francisco, Brad Terrell, Shawn Dolley, Dai Clegg, and Patricia Colter.
- Enzee Universe – User Conference taking place in Boston June 20-22.
I’m looking forward to learning more about Netezza via David’s blog.
Great articles by amazing contributors. Make sure you read every article in the latest edition of the DM Magazine. I’ve read the article about Roger Sanders’ contribution to the DB2 Certification Exam Development team and the announcement of his latest certification guide, which is a supplement. I’ve bookmarked many of the other articles so I can read them over the coming week. I hope you find something interesting and useful in this edition!
Taming Big Data - The realm of huge information flows is governed by new rules. What changes in the multi-petabyte, microsecond response, multimedia world? And how will Big Data change your job?
by Lisa Stapleton
It’s not just huge volumes. It’s not just microsecond response times, it’s not just incredible variety. It’s all three. Dealing with “nice” data, stored in well-defined data warehouse structures and handled sometimes months or years after it is first collected, is an increasingly smaller part of the job of data management. Here’s how folks are learning to handle the rest of it—the unstructured data, and the data that constantly changes as customers take actions and applications process input.
Tuning SQL at the Senate: E Pluribus Unum - When the SQL flow in the United States Senate went from static to dynamic, the database team had to see many queries, but tune them as one.
by Ives Brant
“High-performance government” may sound like an oxymoron, but for the teams that monitor and tune databases for the U.S. Senate, high performance is absolutely necessary to handle the huge number of queries that hit the Senate’s financial management system. InfoSphere Optim Query Workload Tuner software running on IBM DB2 for z/OS helps the U.S. Senate DBAs and software specialists reduce query response time from 20 to 30 seconds to less than 2 seconds when they generate new statistics for a workload of SQL queries.
The Man to See About Certification - The guru of DB2 certification tests talks about how he puts them together—and how they can help your career
by Howard Baldwin
Q&A with Roger Sanders who has helped IBM develop 17 DB2 certification exams, more than any other individual. Want to know why certification is important? Roger Sanders is the man to ask. We talked to him about how the tests are put together, how they can help a DBA’s career, and—oh yes—about the certification test he failed.
Get Your Head in the Clouds - Data Pros are adopting cloud computing concepts to offer databases as a service - easing management burdens and sending users to cloud nine.
by Jin Zhang
Tactics and strategies for moving away from traditional provisioning models in which DBAs function solely in reactive mode—responding to user requests in nonstop “database, clone, database, clone” activities—towards a database-as-a-service, or DBaaS (pronounced as “D-Baa-S”), model employing cloud computing practices.
To offer DBaaS on the cloud, enterprise IT departments must undergo a process of constructing and managing a private enterprise “data cloud”—a platform consisting of storage hardware, virtual images, database schemas and more—and making that cloud available to users through a services interface.
Smarter is… Making Watson Smarter…Faster
by Howard Baldwin
A deeper look at the Jeopardy-winning system and the technology that enables it to process information in near real-time.
IBM Information Governance Council - Information Governance Community growing worldwide - IGC draws 1500 members, solicits best practices ideas
by the Magazine Team
The wisdom of crowds isn't an oxymoron. Large groups have an uncanny ability to get the right answer - just see James Surowiecki's book on the subject for proof. So it's not unreasonable to say that the Information Governance Council (IGC) is getting smarter with each passing month.
Data Architect - Securing DB2 Data - Grant privileges to a what, not a who
by Robert Catterall
These days executives are more concerned than ever about unauthorized access to data entrusted to their organization. Their fears are justified: a recent study showed that a third of those polled would quit doing business with a company they perceived to be guilty of a data security breach.
Distributed DBA - Using the DB2 Problem Determination Tool
by Roger E. Sanders
Sooner or later, every DBA encounters problems. Consequently, a skill that every DBA must possess is the ability to perform a logical, systematic search of a database system for the source of any problems that might arise. The DB2 Problem Determination Tool can help.
Programmers Only - New Order by Information: Part 2. The Impact of using the RANDOM index option on ORDER BY sort avoidance
by Bonnie Baker
In the last issue, Bonnie Baker began a series of columns concerning new aspects of ORDER BY. This column - Part 2 - covers the impact of using the CREATE / ALTER INDEX RANDOM order option on sort avoidance.
Performance tuning on Informix - Fastest Informix DBA Contest III - Performance tuning an OLTP system
by Lester Knutsen
Performance tuning is a continuous process for every DBA. Advanced DataTools Corporation conducted three fastest Informix DBA contests to highlight and learn what goes on in this process.
Imagine What You Could Do - Free your mind, and your business will follow
by Dave Beulke
Big Data, Big Time - Series data, warehouse acceleration, and 4GLs
by Stuart Litel
There are only a few shows remaining in this season’s collection of DB2Night Show episodes. Mark your calendar for these upcoming events and take a look at previous shows that you may have missed. The educational opportunity for you is huge.
#53 - Don't flip out! How to stop your query access plans from flopping! (aka DB2 HINTS!)
John Hornibrook, IBM STSM, Manager Query Optimization
Friday May 20: 11am ET, 90 minutes
Did you know you can "force" the DB2 LUW optimizer to choose a specific access strategy of your choosing? The secret is out...
Special guest John Hornibrook from the IBM Toronto Lab presented in Episode #52 where he talked about best practices for query tuning. In this show he’ll teach you how to exploit DB2 LUW Optimizer "hints", or, maybe more properly, how to tell the DB2 optimizer how to execute your queries.
Note that this show is scheduled for 90 minutes so that John can share all of his incredible presentation with you and have time for questions.
#z04 - What's new from the optimizer in DB2 10 for z/OS?
Terry Purcell, IBM SVL, SQL & Optimization
Monday May 23: 11am ET, 60 minutes
DB2 10 for z/OS is no exception to the goal of delivering incremental query optimization enhancements to the world’s most respected cost-based optimizer. With skip-release migration supported, it is expected that many more customers may adopt DB2 10 in the coming year, so understanding the performance enhancements can be critical for those customers. Terry Purcell will share the insight uncovered from beta customers and early adopters, and provide the motivation for each enhancement, including:
- “Safe” query optimization
- Improvements to complex OR and IN list processing
- RUNSTATS management and performance improvements
- And more!
#54 - DB2 9 LUW Core Engine Data Movement Utilities Overview with Oracle database comparisons
Burt Vialpando, Executive IT Specialist, IBM
Friday June 3 - 11am ET, 60 minutes
Special guest Burt Vialpando from IBM presented in Episode #22 which was the 15th most downloaded show in 2010. He talked about Comparing DB2 LUW and Oracle, Architectures and Administration. In this episode, Burt will give you an overview of each of the DB2 core engine data movement utilities: Load, Import, Export (with db2look), db2move, ADMIN_COPY_SCHEMA, ADMIN_MOVE_TABLE, db2relocatedb, restore from backup and split mirror. These will be compared to each other so that you can get a good idea of when to use and not to use each of these. To help round out the discussion, a comparison to the Oracle database core engine data movement utilities will be made to these DB2 utilities.
#z05 - DB2 V10 Migration Planning and Early User Experiences
John Campbell, IBM Distinguished Engineer
Monday June 6: 11am ET, 60 minutes
In this episode, host Klaas Brant and guest John Campbell will introduce and discuss early experiences and lessons to be learned with DB2 10 for z/OS. It will provide quick hints on preparing for and executing the migration, performance expectations and opportunities, virtual storage constraint relief, some instrumentation changes, use of 1MB real storage frame size, use of hash access, value of rebind, etc. Key topics covered will include:
- Lessons learned
- Surprises and pitfalls
- Provide hints and tips
- Address some myths
- Provide additional planning information
- Provide usage guidelines
- Provide positioning on new enhancements
#55 - DB2 LUW Multi-Temperature Data Management
Kate Kurtz and IBM Smart Analytics Best Practices Team members
Friday June 17: 11am EDT, 60 minutes
Did you know that data has temperatures? Can data run a fever? What happens if your data catches a cold? Or rather, turns cold from hot? What can and should you do? What techniques are available for optimizing performance in databases where some data is more popular (hot) than other data?
In this episode of The DB2Night Show, various IBM experts will help us answer these challenging questions! Kate Kurtz along with team members from the IBM Smart Analytics Systems Best Practices team will share with us their expertise!
#56 Season #2 Finale! - Data Warehouse Performance Tuning!
Kate Kurtz and IBM Smart Analytics Best Practices Team members
Friday June 24: 11am ET, 60 minutes
Kate Kurtz and her IBM Smart Analytics Systems Best Practices team return again to share with us IBM recommendations and suggested best practices for optimizing performance of Data Warehouse Databases. If you're trying to make DPF or multi-partition databases run queries as fast as possible, then you should sign up for this show!
Hard to believe another season is over already! Thanks Scott Hayes and Klaas Brant for Entertaining, Informing, and best of all, EDUCATING us! We look forward to seeing what amazing ideas you come up with next season.
PS Click here for the recorded shows & commentary.
What do you know about IBM InfoSphere BigInsights?
BigInsights brings the power of Apache Hadoop to enterprises. This means that by using InfoSphere BigInsights, you can manage and analyze data in ways that were previously unimaginable. You can extract deep insights that can lead to greater efficiencies, value-add services, and opportunity for transformation.
That’s not much to go on, but what I can tell you is if you’re interested in learning more, take advantage of the special one-day class on Sunday, October 23, 2011 in conjunction with the IBM Information on Demand 2011 Conference in Las Vegas, Nevada. Registration for the IOD Conference is NOT required to attend this Sunday class.
This one-day training course is for system administrators and developers responsible for managing Apache Hadoop. Not only will you learn about this new technology, but after you’ve taken the course, you’ll be prepared to take the BigInsights Technical Mastery Test, which you can take for free at the IOD conference.
- Big Data Overview
- Introduction to Hadoop and HDFS
- HDFS Administration
- Introduction to Map / Reduce
- Setup of a Hadoop Cluster
- Managing Job Execution
- Overview of JAQL
- Data Loading
- Overview of workflow engine
More about the Mastery Exam: Test M97: IBM BigInsights Technical Mastery Test v1
This proctored technical mastery test examines IBM BigInsights knowledge regarding the ability to identify, manage and close sales opportunities.
The test is applicable to sales representatives who demonstrate sales and technical knowledge of the IBM BigInsights product and targets the technical sales professional who can deliver a comprehensive business solution to customers through solution identification, product differentiation, and competitive positioning.
This technical mastery test meets one of the technical requirements for SVP (Software Value Plus) and counts as a skill towards Advanced and Premier PartnerWorld Membership levels. It is strongly advised that the candidate complete the recommended education prior to attempting this technical mastery test.
- Number of questions: 39
- Time allowed in minutes: 75
- Required passing score: 76%
- Test languages: English
- Section 1 - BigData Overview (23%)
- Section 2 - Introduction to Hadoop (18%)
- Section 3 - HDFS Administration (15%)
- Section 4 - Setup of a Hadoop Cluster (18%)
- Section 5 - Managing Job Execution (8%)
- Section 6 - Overview of JAQL (18%)
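For a quick sense of what those numbers mean in practice, here is a minimal Python sketch. It assumes that every question carries equal weight and that the passing score is applied to the raw count of correct answers; neither is stated in the exam description, and the function names are made up for illustration. The per-section question counts it produces are only estimates derived from the published percentages.

```python
# Figures copied from the exam description above.
TOTAL_QUESTIONS = 39
PASSING_PERCENT = 76
SECTION_WEIGHTS = {
    "BigData Overview": 23,
    "Introduction to Hadoop": 18,
    "HDFS Administration": 15,
    "Setup of a Hadoop Cluster": 18,
    "Managing Job Execution": 8,
    "Overview of JAQL": 18,
}


def min_correct_to_pass(total, percent):
    """Smallest whole number of correct answers that reaches `percent`.

    Assumes equal weight per question (an assumption, not a published rule).
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(total * percent) // 100)


def estimated_questions_per_section(total, weights):
    """Rough, unrounded estimate of how many questions each section contributes."""
    return {name: total * weight / 100 for name, weight in weights.items()}


if __name__ == "__main__":
    print("Section weights add up to", sum(SECTION_WEIGHTS.values()), "percent")
    print("Correct answers needed:", min_correct_to_pass(TOTAL_QUESTIONS, PASSING_PERCENT))
    for name, count in estimated_questions_per_section(TOTAL_QUESTIONS, SECTION_WEIGHTS).items():
        print(f"{name}: about {count:.1f} questions")
```

Under those assumptions, 76% of 39 questions is 29.64, so a candidate would need at least 30 correct answers.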
Other Pre-Conference Education available at IOD this year:
Other blog entries about IOD Events:
Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data
by Dirk deRoos, Chris Eaton, George Lapis, Paul
Zikopoulos, Tom Deutsch
Big Data represents a new era in data exploration and utilization, and IBM is
uniquely positioned to help clients navigate this transformation. This Flashbook
reveals how IBM is leveraging open source Big Data technology to deliver a
robust, secure, highly available, enterprise-class Big Data platform.
The three defining characteristics of Big Data—volume, variety, and
velocity—are discussed. You’ll get a primer on Hadoop and how IBM is 'hardening'
it for the enterprise, and learn when to leverage IBM InfoSphere BigInsights
(Big Data at rest) and IBM InfoSphere Streams (Big Data in motion) technologies.
Deployment and scaling strategies plus industry use cases are also included in
this practical guide.
- Learn how IBM hardens Hadoop for enterprise-class scalability and
- Gain insight into IBM's unique in-motion and at-rest Big Data analytics
- Learn tips and tricks for Big Data use cases and solutions
- Get a quick Hadoop primer
This book is about Big Data: but you already knew that. Big Data
is a Big Deal! This book’s authoring team is well seasoned in
traditional database technologies; and all recognized one thing: Big Data is an
inflection point when it comes to information management technologies. In fact,
Big Data is going to change the way you do things in the future, how you gain
insight, and make decisions (the change isn’t going to be a replacement, rather
a synergy and extension). Recognizing this inflection point, the author team
decided to write this book to help you get quickly up to speed on this
technology and to show you the unique things IBM is doing to turn the freely
available open source Big Data technology into a Big Data Platform;
there’s a major difference and the platform is comprised of leveraging the open
source technologies (and never forking it) and marrying that to enterprise
capabilities provided by a technology leader that understands the benefits a
platform can provide.
By the time you are done reading this book, you’ll have a good handle on the
Big Data opportunity that lies ahead, a better understanding of the requirements
that ensure you have the right Big Data platform (as opposed to just
technology), and have a strong foundational knowledge as to the business
opportunities that lie ahead with Big Data and some of the technologies
PART I: The Big Deal about Big Data
Chapter 1 – What is Big Data? Hint: You’re a Part of it Every Day
Chapter 2 – Why Big Data is Important
Chapter 3 – Why IBM for Big Data
PART II: Big Data: From the Technology Perspective
Chapter 4 - All About Hadoop: The Big Data Lingo
Chapter 5 – IBM InfoSphere BigInsights – Analytics for “At Rest” Big Data
Chapter 6 – IBM InfoSphere Streams – Analytics for “In Motion” Big Data
Chris Eaton, B.Sc., is a worldwide technical specialist for
IBM’s Information Management products focused on Database Technology, Big Data,
and Workload Optimization. Chris is also an international award winning
speaker, having presented at data management conferences across the globe, and
has one of the most popular DB2 blogs located on IT Toolbox at: http://it.toolbox.com/blogs/db2luw.
Dirk deRoos, B.Sc., B.A., is a member of the IBM World-Wide Technical
Sales Team, specializing in the IBM Big Data Platform. Dirk joined IBM eleven
years ago, and has a Bachelor of Computer Science and a Bachelor of Arts (Honors
English) from the University of New Brunswick.
Thomas Deutsch, B.A, M.B.A., serves as a Program Director in IBM’s Big
Data business. Tom has spent the past couple of years helping customers with Apache
Hadoop, identifying architecture fit, and managing early stage projects in 200+
George Lapis, MS CS, is a Big Data Solutions Architect at IBM's
Silicon Valley Lab. He has worked in the database software area for more than 30
years. He was a founding member of R* and Starburst research projects at IBM's
Almaden Research Center in the valley, as well as a member of the compiler
development team for several releases of DB2.
Paul C. Zikopoulos, B.A., M.B.A., is the Director of Technical
Professionals for IBM Software Group’s Information Management division and
additionally leads the World Wide Database Competitive and Big Data SWAT teams.
Paul has written more than 300 magazine articles and 14 books on DB2 and can be
reached at: email@example.com.
If you were at the IOD11 Conference last week, you may have tried to get a copy of the latest Flashbook “Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”. If you weren’t at the conference, here’s what happened: I had 3000 printed copies of this book by IBM experts Paul Zikopoulos, Chris Eaton, Tom Deutsch, George Lapis, and Dirk deRoos. We scheduled 2 time periods for the giveaways and author signings… Monday & Tuesday outside the event center after the opening session. No books remained after the first giveaway. Yes, it is true that we handed out 3000 copies on that Monday morning in what seemed like a mere 30 minutes!
I had an additional 1000 copies of this book delivered for Wednesday morning and had no trouble finding people who were eager to get a copy of the book to read. I’ve given out books at many conferences over the years and must say that this was the most exciting of all. More about the book: Flashbook: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data http://ibm.co/pVhiw2
#iod11 #bigdata #analytics
If you didn’t get a copy, I’m happy to announce that you can get the free e-version of this book on my bookstore page: ibm.com/software/data/education/bookstore. The book will be on that page later today or tomorrow… so don’t be discouraged if you don’t find it right away.
If you’re an IBMer and want a box of these books for an event, contact me via my IBM email address and I’ll give you instructions on how to get the books.
Want more? Here are other ways that you can build your knowledge and skills on this hot new technology:
1) Join Paul Zikopoulos this Friday, Nov 4 on the DB2Night Show, Episode #62 – Big Data Overview. The space for this free webinar is limited, so register now! Thanks again to award-winning IBM Champion Scott Hayes for doing such a great job of entertaining and educating the public!
2) Read Jeff Jonas’ blog entries about Big Data:
3) Dive into the best wiki in the world! BigInsights Technical Enablement Wiki: ibm.com/developerworks/wiki/biginsights/
4) Read this technical intro to IBM BigInsights, IBM’s Big Data platform, by Cynthia Saracco: Understanding InfoSphere BigInsights: An introduction for software architects and technical leaders.
5) One of the 82 Web-based Training Courses available for IM is BigInsights Essentials.
6) Want to learn Hadoop, MapReduce, HBase or other Big Data topics? Join free BigDataUniversity.com.
7) Read the excellent article by Lisa Stapleton that was published in a recent edition of the IBM Data Management Magazine: BigData - Volume, Variety, and Velocity. How do you attack something *that* big?
8) Read the article Big Data Impacts Data Management: The Big Vs of Big Data by IBM Champion Dave Beulke.
9) Follow the blogs and tweets that are happening real-time via the Big Data Daily.
10) Read the article by IBM Champion Craig Mullins Big Data and 150 Trillion Calculations Per Second at Vestas
Read, attend and learn!
I know… what happens in Vegas is supposed to STAY in Vegas. But I’m going to break that rule and give you links to a few videos that you’ll want to check out.
First, check out the interviews I did with some of the authors who were doing book signings at IOD11. I interviewed:
Sandy Carter about her latest book Get Bold: Using Social Media
Tony Giordano for his book Data Integration Blueprint and Modeling
James Taylor for his latest book Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics to Build Adaptive, Agile, Intelligent Systems
You can find all three videos on the IBM Press Book’s Channel on YouTube.
To go with the author theme, remember to check out the audio interviews I did before the conference for the IM Skills Cast Series. I interviewed Roger Sanders, Roger Johnson, Filip Draskovic, Sunil Soares, and Bob Laberge.
Here are a few interesting clips that I found on YouTube that you may want to take a look at.
I’m already looking forward to IOD12. If you are too… here is the info so you can block your calendar and start building a justification to attend.
Last Friday Dan Dubriwny from the Big Data Tiger Team, IBM USA, was the guest speaker on episode 62 of the DB2 Night Show, hosted by Klaas Brant. The topic was BIG DATA Overview - What is it? Who cares? If you missed it, be sure to catch the replay.
According to Klaas:
If you have been to IOD 2011 in Las Vegas then the words Big Data will probably still resonate in your mind. The theme of IOD was "Turn Insight into action" and IBM is the only company that can do this for all sort of big data. Most people think that big data is all about Hadoop. And although Hadoop is one of the components in IBM's solutions it is not what big data is all about. Dan Dubriwny gives in this episode an excellent introduction into big data. It is all about Variety, Velocity and Volume. IBM can handle all of this for both data streams and data at rest. Probably by now you are curious how this all works, so go ahead and enjoy Dan's presentation with numerous examples.
Scott Hayes informs me that there is more… much more! Episode #63 on 18 Nov will be a deep dive into IBM solutions around BIG DATA.
Join special guests Paul Zikopoulos and Robert Thomas, both from IBM, to get a look at why IBM is the right partner for Big Data. You'll find that many vendors offer Big Data products whereas IBM is offering a Big Data PLATFORM. Most of this session will be technical, showcasing, for example, the file system (GPFS SNC) that's used with the IBM Big Data platform as opposed to the open source de facto standard HDFS because it provides better performance, management (it's POSIX compliant), security, and availability; with GPFS SNC, IBM hardens Hadoop for enterprise deployments. Of course, no one quite understands the enterprise like IBM so you can bet we are solving other enterprise challenges such as security, integration, governance, and more.
The DB2Night Show Episode #63: BIG DATA Technologies, Solutions, and Details
Speakers: Paul Zikopoulos and Robert Thomas
Friday, November 18, 2011
11:00 AM - 12:00 PM EST
As Scott always says, join to be educated, informed, and entertained.
Want more on Big Data? Check out this entry: Do you feel the excitement of Big Data?
What do you think? Are these the two top topics of 2011? It seems to me that I’ve blogged about these two topics quite a bit this year, and now we have the best… both in one webinar!
Join Leon Katsnelson, Uri Budnik, and Rav Ahuja for the next Chat with the Lab webinar: Leveraging Clouds for small and BIG data
Many experts agree that Cloud Computing has now matured past the initial hype phase and that organizations large and small have started to derive real benefits by placing applications and data on Clouds. In this webinar we will discuss how to leverage Cloud Computing for managing data and gaining insights from big data. We will cover various Cloud options (public, private, and hybrid) available for managing data in the Cloud with IBM DB2 database server as well as for deriving value from big data using IBM BigInsights hadoop-based platform. We will also show you how to be up and running in as little as a few minutes using free editions of DB2 and BigInsights. You ought to find this session useful regardless of whether you are a beginner or expert in cloud computing or big data.
Date: Tuesday, December 6, 2011 (6.12.2011)
Time: 12:30 PM - 2:00 PM Eastern Time (ET) 11:30 AM Central / 9:30 AM Pacific / 17:30hrs London / 18:30hrs Frankfurt, Paris / India 11 PM
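If your city isn't in that list, the start time is easy to convert yourself. The snippet below is a minimal Python sketch that rechecks the listed times using fixed standard-time UTC offsets for early December. The offset table and the function name are my own assumptions for illustration (they are not part of the webinar announcement), and fixed offsets ignore daylight saving time, so they are only right for a winter date like this one.

```python
from datetime import datetime, timedelta, timezone

# Standard-time UTC offsets in early December (assumed; no daylight saving in effect).
OFFSETS = {
    "Eastern": timedelta(hours=-5),
    "Central": timedelta(hours=-6),
    "Pacific": timedelta(hours=-8),
    "London": timedelta(hours=0),
    "Frankfurt/Paris": timedelta(hours=1),
    "India": timedelta(hours=5, minutes=30),
}


def local_start_times(year, month, day, hour, minute, source="Eastern"):
    """Convert a start time given in the `source` zone to every zone in OFFSETS."""
    start = datetime(year, month, day, hour, minute, tzinfo=timezone(OFFSETS[source]))
    return {
        name: start.astimezone(timezone(offset)).strftime("%H:%M")
        for name, offset in OFFSETS.items()
    }


if __name__ == "__main__":
    # The webinar starts at 12:30 PM Eastern on December 6, 2011.
    for name, local in local_start_times(2011, 12, 6, 12, 30).items():
        print(f"{name}: {local}")
```

To add another location, put its standard-time offset into `OFFSETS`.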
Speakers: Leon Katsnelson (IBM), Uri Budnik (RightScale), Rav Ahuja (IBM)
Register Now >>> http://bit.ly/sY3Nfe
For questions/comments/suggestions: http://www.channeldb2.com/group/db2chatwiththelab
Thanks to Rav for this information.
Here are some other posts from my blog about these topics:
Yesterday I read a blog entry by Sunil Soares called Big Data Governance. This caught my attention as both Big
Data and Governance are huge topics in the industry today. You may recognize
Sunil’s name, and let me remind you where you know his name from. Sunil is an
expert in the Information Governance area and has had two books published on the
This book was published last year and is selling quite well. I wrote a
lengthy blog entry about this book that I encourage you to read: Meet Sunil Soares - Selling
Information Governance to the Business: Best Practices by Industry and Job
Function. In summary this book discusses the best practices to sell the
value of information governance. The objective of the book is to provide a
representative sample, rather than an exhaustive list, of best practices to sell
the value of information governance within an organization. You should use these
best practices as inspiration for what might work within your organization. It
is important that you read chapters from industries and functions outside of
your own because there are a number of case studies that you might find useful
for your specific situation. The book contains more than 50 case studies and 16
This is a Flashbook that was published in 2010. Printed copies of this book
exist, but in small supply. Printed copies are handed out at selected events, but
you can get a free e-version of the book: Information Management Bookstore. What’s the book about?
Briefly, according to Arvind Krishna, General Manager at IBM:
IBM has assembled a comprehensive approach to Information Governance
delivers the industry’s strongest portfolio of products, services, and best
practices to address every organization’s needs. This book provides a
set of detailed steps and sub-steps to implement an Information
program, as well as the associated automation provided by IBM
Beyond the blog entry that caught my attention, I found two articles also
written by Sunil:
Information Governance: Big Data and the Road Ahead
Smart meters and Big Data: A clear case for governance best
In Sunil’s words: I am starting to see a convergence of two major trends in
the marketplace: information governance and Big Data. We are coining the term
“Big Data Governance” to reflect this emerging trend. I define Big Data
Governance as the formulation of policy to optimize, secure, and leverage Big
Data as an enterprise asset by aligning the objectives of multiple
I’m looking forward to seeing how this convergence pans out. If you’re also
interested, I suggest you keep an eye on what Sunil comes up with. I’m sure
he’ll make this emerging trend easy for you to understand and implement.
Date: Mar 27, 2012
Time: 12:30 PM - 2:00 PM (Eastern Time)
Speaker: Paul Zikopoulos (IBM)
Topic: An Introduction to Big Data
Big Data can mean a lot of things to a lot of people; but one thing we're sure of, it's the hottest thing to hit the IT landscape. In this chat you'll start with an introduction to Big Data so we're all on common ground as to what it is, how to spot it, and what the opportunities are. (Hint, be prepared to get shocked on Volume and more). With a strong foundation on what Big Data is, you'll be introduced to the IBM Big Data platform, what it looks like and its key components. In the final part of the chat, we'll dive into IBM's non-forked, embraced and extended Hadoop distribution (called InfoSphere BigInsights) and streaming technology (InfoSphere Streams), take a tour of the Big Data platform features as they relate to these pillar components, and get introduced to the toughest Big Data use case there is: text analytics, and how this is just one area where IBM is bringing the "WOW" factor to Big Data.
Of course along the way you'll get introduced to clients that are using the platform and implicit use cases and be all set to investigate and appreciate not only what Big Data can do for your company, but how a partnership with IBM can make it happen that much faster with that much more confidence.
Register now to join IBM speaker, Paul Zikopoulos on Tuesday March 27th, 2012 at 12:30 PM Eastern. Sign-up to attend this free webinar.
Here are a few other resources you may wish to tap into if you’re interested in this topic:
1) Dan Dubriwny from the Big Data Tiger Team, IBM USA, was the guest speaker on episode 62 of the DB2 Night Show, hosted by Klaas Brant. The topic was BIG DATA Overview - What is it? Who cares? Catch the replay.
2) Paul Zikopoulos and Robert Thomas were guests on Episode #63 of the DB2Night Show and gave a deep dive into IBM solutions around BIG DATA.
3) At the IBM Information on Demand Conference that took place last October, copies of the Flashbook “Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data” were handed out. If you didn’t get a copy, you can get the free e-version of this book on my bookstore page: ibm.com/software/data/education/bookstore.
4) Read Jeff Jonas’ Business Insider’s talks and his blog entries about Big Data:
5) Dive into the best wiki in the world! BigInsights Technical Enablement Wiki: ibm.com/developerworks/wiki/biginsights/
6) Read this technical intro to IBM BigInsights, IBM’s Big Data platform, by Cynthia Saracco: Understanding InfoSphere BigInsights: An introduction for software architects and technical leaders.
7) One of the 82 Web-based Training Courses available for IM is BigInsights Essentials.
8) Want to learn Hadoop, MapReduce, HBase or other Big Data topics? Join free BigDataUniversity.com. Already 15,000 students have registered.
9) Read the excellent article by Lisa Stapleton that was published in a recent edition of the IBM Data Management Magazine: BigData - Volume, Variety, and Velocity. How do you attack something *that* big?
10) Read the article Big Data Impacts Data Management: The Big Vs of Big Data by IBM Champion Dave Beulke.
11) Follow the blogs and tweets that are happening real-time via the Big Data Daily.
12) Read the article by IBM Champion Craig Mullins Big Data and 150 Trillion Calculations Per Second at Vestas
Read, attend and learn!
Do you ever wonder what the big brains in IBM read? You know the people who
I mean! Their names are on books, they have patents and speaking engagements,
and when they talk, you’re wowed at what they know and how they convey what they
Here’s a list of what Netezza / Big Data expert Krishnan Parasuraman
has been reading and recommends for others to read:
1. Analytics: The widening divide.
IBM Institute of Business Value in collaboration with MIT Sloan
2. Moneyball: The Art of Winning an Unfair Game.
Michael Lewis, W. W. Norton & Company
3. Pacific Northwest Smart Grid Demonstration Project
4. Euronext: Federated Data Architecture with IBM Netezza
5. T-Mobile: IBM Netezza Client Success Video - Network Engineering success at scale with IBM Netezza
6. Catalina Marketing Stays ahead of the Curve with IBM
7. T-Mobile crunching 17 billion transactions a day - What does it do with all that data?
8. Large Gene Interaction analytics at University at Buffalo, SUNY. IBM Case Study
9. The Netezza Data Appliance Architecture: A
Platform for High Performance Data Warehousing and Analytics. Phil
10. IBM Netezza Analytics - The advanced analytics platform inside every IBM
Netezza appliance. IBM Data Sheet.
You haven’t heard of Krishnan Parasuraman yet? He’s one of the
authors who is working on a new Big Data book that will be handed out at the IBM
Information on Demand Conference.
One of the things that I love most about the IBM Information on Demand
Conference is the buzz on books. Every year we have a giant bookstore that
showcases all the latest books related to DB2, Analytics, Big Data, and more.
My job is to arrange to have the right set of books at the conference and to
invite authors to be available to sign copies of their books for attendees.
All books purchased at the bookstore are discounted by 20% off the retail
Here are the featured books and authors that we have planned for this
There are many other new books available to purchase at the bookstore this
year, but I was only able to arrange signings for the authors who are attending
the conference this year. Stop by and check out all the great knowledge that is
The IBM Information on Demand Conference is a great place to promote books
that are published throughout the year. In fact, quite often publishers time
the release of their books to coincide with this conference. As a result, we
are typically able to set up book signings and giveaways to help promote the
This year is no different. We have quite a few books that have published
already that we’ll promote as well as books launching just for the conference
and we’ll have 4 (possibly 6) new Flashbooks that we’ll be giving away!
“Flashbooks” are small-sized books that have fewer than 150 pages, and have
easy-to-read messages about products, solutions, or technology. Flashbooks are
published to coincide with the IBM Information on Demand Conference. The huge
quantity of messages and content delivered at the conference makes it difficult
for attendees to retain all the key messages. Flashbooks are designed to contain
the key messages IBM experts want you to take away with you.
Here is the list of Flashbooks. More details for each will be available in
my blog shortly.
Warp Speed, Time Travel, Big Data, and more: DB2 10 for Linux, UNIX,
and Windows New Features
by Paul Zikopoulos, Walid Rjaibi, George Baklarz,
Matt Huras, Matthias Nicola, Dale McInnis, and Leon
Information about the book: http://ibm.co/db210flashbook
Signing and Giveaway: Monday October 22, 12:30 – 1:30 at bookstore.
5 Steps to Business Analytics Program Success
by Brian Green, BlueCross BlueShield of Tennessee, Kay Van De
Vanter, The Boeing Company, Tracy Harris, IBM, Bill Frank,
Johnson & Johnson, John Boyer, RCG Global
Signing and Giveaway: Monday October 22, 5:00 p.m.–7:00 p.m. EXPO reception
Business Value of DB2 Optimizer and Analytics Accelerator - DB2 for z/OS
by Surekha Parekh, Terry Purcell, John Campbell,
Signing and Giveaway: Mon 12:30 – 1:30 at bookstore.
Harness the Power of Big Data - The IBM Big Data Platform
by Paul Zikopoulos, Dirk deRoos, David Corrigan, Tom
Deutsch, Krishnan Parasuraman, James Giles
Signing and Giveaway:
- Monday October 22, 9:45 – 10:45 at bookstore
- Tuesday October 23, 9:30 – 10:30 at bookstore
An excerpt of this book is available and can be found: The Big Data Hub
Note: After the conference, all of these books will be available in
electronic format. Follow my blog to find out how to get the softcopy!
Lots of people love IBM Redbooks. As you know, they are FREE and available
for you to download in PDF or EPUB versions at any time from the IBM Redbooks website.
One of the most popular giveaways at IBM Information on Demand Conference is
printed IBM Redbooks. This year we have 10 different titles to give away.
Please note that the PRINTED copies are in limited quantity.
Make sure you attend one of the sessions planned at the bookstore to make sure
you get the title that you want. While you're getting your free copy, meet the authors who wrote the book and have them sign it for you!
I included links to the e-version of the book if it already exists. Notice
that some are not yet published? They’ll be launched specifically for IOD:
Tuesday, October 23, 12:00 – 1:00 at the IOD bookstore
Customizing and Extending IBM Content Navigator
Wednesday, October 24, 12:00 – 1:00 at the IOD bookstore
Thursday, October 25, 12:00 – 1:00 at the IOD
As you can see, we’ve made it easy for you. Come by the bookstore Tuesday,
Wednesday and Thursday at lunchtime to get a printed book!
While you’re at the bookstore, check out the technical books that are available for you to purchase at a
20% discount as well as pick up one of the many free Flashbooks that are scheduled to
be handed out at the bookstore on Monday, October 22.
Harness the Power of Big Data - The IBM Big Data Platform
by Paul Zikopoulos, Dirk deRoos, David Corrigan, Tom
Deutsch, Krishnan Parasuraman, James Giles
Information about book:
Big Data represents a new era of computing – an inflection point of
opportunity where data in any format may be explored and utilized for
breakthrough insights - whether that data is in-place, in-motion, or at-rest.
IBM is uniquely positioned to help clients navigate this transformation. This
book reveals how IBM is leveraging open source Big Data technologies, infused
with deep IBM innovation from over 6 billion dollars in analytics acquisitions,
that manifest in a platform capable of 'changing the game'.
The four defining characteristics of Big Data – volume, variety,
velocity, and veracity – are discussed. You’ll understand how IBM is fully
committed to Hadoop and integrating it into the enterprise. Hear about how
organizations are taking inventories of their existing Big Data assets, with
search capabilities that help organizations put their hands around what they
already know, and extending their reach into new data territories for
unprecedented model accuracy and discovery.
In this book you will also learn not just about the technologies that make up
the IBM Big Data platform, but when to leverage its purpose built engines for
analytics on data in-motion and data at-rest. And you’ll gain an understanding
of how and when to govern big data, and how IBM’s industry-leading InfoSphere
integration and governance portfolio helps you understand, govern, and
effectively utilize big data. Industry use cases are also included in this
An excerpt of this book is available and can be found: The
Big Data Hub. Also see the Big Data Hub for blog entries, videos, and other
resources that will help you on your journey in the Big Data world.
If you are attending the IBM
Information on Demand Conference, get a free printed copy of this book at the Information Desk at the big data booth #622 in the Expo. Bring your book to get signed by the authors at the IOD bookstore Tuesday October 23, 9:30 – 10:30.
After the conference is over, a free e-version of the book will be available
for you to download. Follow my blog to ensure that you are one of the first to
know where you can download the book.
Other sessions by these authors:
3818A Top Enterprise Big Data Use Cases - South Pacific B; Mon, Oct 22,
2012; 2:15 PM - 3:15 PM
3920B Ask the Experts: How Can A Big Data Platform Help Me? - Tradewinds A;
Tue, Oct 23, 2012; 1:45 PM - 2:45 PM
3920C Ask the Experts: How Can A Big Data Platform Help Me? - Tradewinds A;
Wed, Oct 24, 2012; 3:45 PM - 4:45 PM
2017B How to Get Started with Your First Big Data Project - South Pacific F;
Mon, Oct 22, 2012; 3:45 PM - 5:00 PM
3618A Big Data Reference Architectures - South Pacific A; Tue, Oct 23, 2012;
10:00 AM - 11:00 AM
3802A Customer Insight and the Customer Experience: New Capabilities with
Big Data and Analytics - Palm A; Tue, Oct 23, 2012; 11:15 AM - 12:15 PM
1660B Big Data Governance: An Emerging Imperative - South Pacific H; Tue,
Oct 23, 2012; 4:30 PM - 5:45 PM
3822A How to Build an Exploratory Big Data Analytics Capability for the
Enterprise - South Pacific B; Wed, Oct 24, 2012; 5:00 PM - 6:00 PM
2017C How to Get Started with Your First Big Data Project - South Pacific B;
Thu, Oct 25, 2012; 8:15 AM - 9:30 AM
4000A Trusted Information - The Foundation for Efficient Operations and
Smarter Analytics - Jasmine F; Sun, Oct 21, 2012; 2:15 PM - 3:15 PM
2309A InfoSphere Information Integration and Governance Track Keynote:
Future Directions for InfoSphere - South Pacific J; Mon, Oct 22, 2012; 10:15 AM
- 11:15 AM
3524A Real-Time Analytics on Extreme Data: Optimization of a Bayesian
Algorithm Within InfoSphere Streams - South Pacific B; Tue, Oct 23, 2012; 1:45
PM - 2:45 PM
2092A Using Big Data Technology to Revolutionize Cyber Threat Detection -
South Pacific F; Wed, Oct 24, 2012; 5:00 PM - 6:00 PM
Just in time for IBM’s Information on Demand Conference: a new book by Sunil Soares: No
doubt the biggest topic in tech these days is Big Data. Sunil uses
this guide to focus on Big Data plus one of the most important tech
topics: Governance. This
guide focuses on the convergence of two major trends in information
management—big data and information governance—by taking a strategic
approach oriented around business cases and industry imperatives. With
the advent of new technologies, enterprises are expanding and handling
very large volumes of data; this book, nontechnical in nature and geared
toward business audiences, encourages the practice of establishing
appropriate governance over big data initiatives and addresses how to
manage and govern big data, highlighting the relevant processes,
procedures, and policies. It teaches readers to understand how big data
fits within an overall information governance program; quantify the
business value of big data; apply information governance concepts such
as stewardship, metadata, and organization structures to big data;
appreciate the wide-ranging business benefits for various industries and
job functions; sell the value of big data governance to businesses; and
establish step-by-step processes to implement big data governance. Sunil
is a leading expert in the field and is the founder of Information
Asset LLC, a consulting firm focused on helping clients build
information governance programs, and a former director of information
governance at IBM. He is the author of two other highly rated books: The IBM Data Governance Unified Process and Selling Information Governance to the Business.
Susan
in this event to experience IBM’s enterprise-class big data platform
that allows you to address the full spectrum of big data business
challenges. The morning will include interactive discussions and live
demonstrations of big data for social media and log analytics, then get
hands on with Hadoop scripting and text analytics with guidance from
development experts. This is a unique opportunity to develop skills and
learn about exciting technologies.
8:00 a.m. - 9:00 a.m. Registration & Complimentary breakfast
9:00 a.m. - 12:00 p.m. Overview of Big Data and Demonstrations
12:00 p.m. - 12:45 p.m. Complimentary Lunch
12:45 p.m. - 6:00 p.m. Hands-on Lab
There are many dates and locations planned for this event. Two of the events fall into the month of October:
October 10, 2012 - Boston, MA
October 22, 2012 - New York, NY
November and December events are as follows:
November 8, 2012 - Washington, D.C.
November 15, 2012 - Austin, TX
November 15, 2012 - Toronto, Canada
December 6, 2012 - Pittsburgh, PA
Space is very limited so register today!
Flashbook: Big Data Analytics
by Dr. Arvind Sathi
About the Book:
Data Analytics is a popular topic. While everyone has heard stories
of new Silicon Valley valuation bubbles and a critical shortage of data
scientists, there are an equal number of concerns – Will it take away my
current organization or investment? How do I integrate my Data Warehouse
and Business Intelligence with Big Data? How do I get started, so I can
show some results? What are the skills required? What happens to data
governance? Unlike many other Big Data Analytics blogs and books, this
book presents a practitioner’s viewpoint. It identifies the demand for
Big Data Analytics, its engineering components and what happens on the
production floor. In doing so, it respects the large investments in
Data Warehouse and Business Intelligence and shows both evolutionary and
revolutionary ways of moving forward to the new brave world of Big
book discusses three perspectives on Big Data Analytics. First, why is
Big Data Analytics becoming so important and what can we do with it.
It presents major trends behind the rise of Big Data and shows typical
use cases tackled by Big Data Analytics – where leading organizations
are already seeing major benefits in using Big Data Analytics. Second,
it lists major components of Big Data Analytics - Unstructured Data
Analytics, Massively Parallel Processing, Adaptive Real-time and
Predictive Modeling, Data Privacy Management, Data Visualization, and
Ontologies. It shows how these components work together to provide an
integrated engine that can combine Big Data with traditional Data
Warehouse and Business Intelligence to provide an overall solution.
Third, it provides a glimpse at implementation concerns and how they
must be tackled. How do we combine various components, which are running
at different velocities and volumes? How do we get structured
information out of unstructured data and combine with other structured
data? How do we provide governance across this data, when the
originating data may have varying quality or privacy constraints? How
do we embark on an implementation road map in a systematic way to show
results as we go and build skill level and momentum for Big Data
Analytics in our organization?
If you are attending the IBM Information on Demand Conference, get a free printed copy of this book and meet the author at the Author Signing and Giveaway that is scheduled:
- Monday October 22, 4:00 p.m.–5:00 p.m. IOD Bookstore, Bayside Foyer; Session # 4243A in Smart Site.
After the conference is over, a free e-version of the book will be available
for you to download. Follow my blog to ensure that you are one of the
first to know where you can download the book.
Also attend this session by the author of this book:
Session # 3645A - Smarter Operations at Verizon - A Case Study - Tue, Oct 23, 2012; 3:00 PM - 4:00 PM - South Seas J
Session # 1813A - Extreme Targeted Marketing: Micro Segmentation for Utilities and Cross-Industry Applications; Tue, Oct 23, 2012; 4:30 PM - 5:45 PM - South Pacific A
the ads say “I LOVE NY”. I visit often and have many friends in NY.
If you’re looking for an excuse to visit the Big Apple, consider some
of these events that are taking place during Data Week.
When: October 22 - October 26. I’ll be in Las Vegas for the IBM Information on Demand Conference, but some of my colleagues will be in NYC at this event.
Where: Various awesome Manhattan locations.
Price: Most NYC Data Week events are free to attend, and anyone can attend.
What is Data Week: According to their website, NYC Data Week is co-produced by the City of New York's Department of Information Technology & Telecommunications (DoITT) and O'Reilly Media's Strata + Hadoop World Conference.
It celebrates and explores the people, industries, and organizations using data to fuel innovation in New York City. The Data Innovation in Finance Panel on October 24 and Data Innovation Across the City Panel
on October 25 showcase New York City business and government leaders
using data to implement change, and talking frankly about what it takes
to succeed with data initiatives.
Data Week events include:
- A Startup Showcase with Fred Wilson and Tim O'Reilly.
- Ignite NYC @Strata, a hackathon, numerous meetups, and more.
- IBM Big Data Developer Day • Oct 22 • 8:00am–6:00pm • IBM Client Center, 590 Madison Avenue, New York, NY
IBM’s enterprise-class big data platform at IBM's Big Data Developer
Day hosted by the IBM Big Data Development team. The morning will
include interactive discussions and live demonstrations of big data for
social media and log analytics, then get hands on with Hadoop scripting
and text analytics with guidance from development experts. Seating is
limited and you must register to be guaranteed a seat. Register today!
- If you can't make this one, see the list of other Big Data Developer Days.
- DataKind DataSprint • Oct 23 • 9:00am–5:00pm • Sheraton New York, Empire Ballroom, 811 7th Avenue 53rd Street, New York
hackathon focused on a critical New York City data project. DataKind is
incredibly excited to announce that we will be setting up shop all day
at the Strata NY Conference on October 23rd with a bunch of great data
problems for you to stop by and work on! We will be serving non-profits
and charities, using data to solve some of their toughest problems,
so bring your data skills and get ready to make the world a better
place. If you're a socially conscious data hacker who wants to make the
world a better place, RSVP now! Entrance to our DataSprint is completely
- The Future of Security • Oct 24 • 9:00am–3:30pm • Theresa Lang Community and Student Center; The New School; 55 West 13th Street, 2nd Floor
Future of Security: Ethical Hacking, Big Data and the Crowd conference
will convene a daylong series of discussions to highlight the emerging,
disruptive forces changing the landscape of the global community. Key
panels include the following topic areas: Ethical Hacking / Hacktivism;
Big Data and Networks; and The Crowd and Crowdsourced Science. Organized
by The Parsons Institute for Information Mapping (PIIM), The Center
for Transformative Media (CTM) of Parsons The New School for Design,
and The Richard Lounsbery Foundation
Be sure to see the agenda as there are many choices that may appeal to you. Wish I was going to be there!
I get asked this question all the time: What are we doing to prepare the
next generation in terms of up-and-coming technology? You’ll be happy
to know that plenty is going on. Even at IOD. Key
faculty who are collaborating with IBM on advanced Big Data and
Analytics curriculum and programs are gathering for a two day special
Big Data and Analytics Academic track at IOD. They will attend several
sessions together to share their current programs as well as brainstorm
about their next steps and their challenges. They will learn about IBM
academic programs and hear from IBM experts in two panels. All of this
will strengthen the community amongst these leading academic
institutions. The Big Data panel with Big Data leaders, includes:
- Richard Hale - IBM Big Data Evangelist
The Analytics panel with Analytics leaders includes:
- Sunil Soares - Founder & Managing Partner at Information Asset, LLC
- Paraic Sweeney - IBM VP Product Management, InfoSphere
- Dave Wilkinson - IBM VP, InfoSphere Development
- Tracy Harris - Senior Manager, Business Analytics
- Richard Rodts - Manager, SPSS Solutions Specialist team
- Bob Bry - IBM Academic Initiative Analytics lead
I’m glad to say that we have faculty from some top schools involved with this initiative, including:
- Beijing Jiaotong University, China
- Boston University, USA
- Dalhousie University, Canada
- ITAM, Mexico
- ITESM, Mexico
- Radford University, USA
- Ryerson University, Canada
- Simon Fraser University, Canada
- Southern Methodist University, USA
- State University of New York at Buffalo, USA
- Syracuse University, USA
- University of Connecticut, USA
- University of Ottawa, Canada
- University of South Carolina, USA
If you think you should be on this list, let me know and I’ll pass your information to the Academics Team.
Beyond these sessions, the faculty will still have time to experience and learn from the wealth of other IOD events and sessions. The faculty will also have the opportunity to attend a special, invitation-only roundtable session:
"Watson Heads to the Class" - Preparing a New Generation of Thinkers for Next Generation Jobs
Recognizing that a practical understanding of analytics, cognitive computing and big data is key for 21st century jobs and economic competitiveness, this round table will bring together a panel of leaders that can discuss the new ways IBM is working with academia to develop new degree programs, case competitions, case study programs and internships. A key focus of this panel will be how IBM is providing academic organizations and students access to the Watson technology and Watson 'thinkers' in order to foster greater interest and knowledge in cognitive computing, analytics, and natural language processing technologies, as well as explore new ways these technologies can be used to address major societal challenges.
Participants
IBM Executives: Manoj Saxena, General Manager, IBM Watson Solutions
University partners: Professor Girish Punj, University of Connecticut; Dr. Thomas Fomby, Southern Methodist University; Rajiv Dewan, Senior Associate Dean for Faculty and Research, University of Rochester
SMART program participant: Matthew Canon, Business Planning and Analysis Manager, Andrews Distributing Company
====================================================================
Beyond IOD, there are many initiatives underway to educate the future, including:
Big Data University
- Easy and Affordable
- Learning Hadoop and other Big Data technologies has never been more affordable! Many courses are FREE!
- Latest industry trends
- Acquire valuable skills and get updated about industry's latest trends right here. Today!
- Learn from the Experts! Big Data University offers education about Hadoop and other technologies by the industry's best!
- Learn at your Own Pace! Find everything right here when you need it and from wherever you are.
IBM DB2 Academic Associate Program
IBM DB2 program for students and the academic community
The IBM DB2 Academic Associate program is exclusively for the university community. The preparation course teaches relational database concepts while also providing students with critical, hands-on database skills on IBM’s DB2 and Data Studio software. Smarter Planet solutions rely on an ever growing amount of data. Learn how to be smart about managing that data with this no-charge academic course and IBM exam.
IBM Student Portal
Students can find out about jobs, competitions, and discounts on certification exams.
Susan
Bigger Big Data, Big Thoughts, and Big Ideas
Last week, Leon Katsnelson was the guest on The DB2Night Show with IBM Champion Scott Hayes. I missed it! Did you as well? If so, we’re both in luck! We can watch the replay and learn. From Scott:
97% of our studio audience learned something! Leon gave us a fantastic presentation on BIG DATA. Not only did he explain what it is and why it is important, but finally, at last, someone explained DB2's relationship with BIG DATA! Leon also talked about HADOOP and other important technologies.
Leon is the Program Director, Information Management Cloud Computing Center of Competence and Evangelism at IBM.
Be sure to browse through the large and growing library of replays of The DB2Night Show that are available for you to watch: REPLAYS.
Upcoming shows include:
Nov 16: DB2 XXL - How to handle large volumes of data? with IBM Champion Klaas Brant.
Nov 30: IBM DB2 LUW Data Warehouse & DPF Performance with Glenn Sheffield from IBM.
Mark your calendar so you don’t miss a single episode of these award-winning webinars.
Susan
I find it interesting to see what people buy at a bookstore. Personally I
love bookstores and online book sites. I can spend hours researching
bookstore at the IBM Information on Demand conference was very busy
this year, and I’ve put together this blog to tell you what others were
buying. The biggest seller was Nate Silver’s book The Signal and the Noise: Why So Many Predictions Fail-but Some Don't.
The best selling book at a conference tends to be one from a special
guest. Nate was quite the guest to have given his predictions in the US
election that happened shortly after IOD ended. I listened to the
audible version of the book and quite enjoyed it. The book has a
chapter for each area where predictions are normally made: weather,
earthquakes, stock market, baseball, poker, and elections. Nate fully
describes the reason why predictions either work or don’t work for the
area. You’ll instantly be connected to the story and will learn how
predictions can be made more accurate in this case and how you can
improve predictions that you need to make yourself.
The next biggest seller was Sunil Soares’ latest book Big Data Governance: An Emerging Imperative.
The store sold out of Sunil’s book, so clearly Sunil did a great job
putting this book together. I haven’t read it yet, but would like to. Read a sample chapter from the book.
Governance and Big Data are both huge and important topics, so take a
look. Sunil’s first book also made the top 10 list: Selling Information
Governance.
Next was the book by Jeff Ma, The House Advantage: Playing the Odds to Win Big In Business.
If you watched the movie “21” with Kevin Spacey, you’ll remember Jeff
Ma as the lead MIT student who counted cards. If you attended Jeff’s
talk at IOD, you learned that Jeff played a dealer in the movie. I
think I need to rewatch the movie!
Cognos books sold very well this year! Four different Cognos titles made it into the top 10 list:
And since the conference, a few more Cognos books were published:
And the top selling IM books this year were:
Note, if you’re in Europe, I’ve written a blog to help you figure out what online sites give you the best prices: I’m in Europe: How can I get a copy of that book?
Susan
What does “big data” mean to you? It is a term that is widely used and can
convey all sorts of concepts, including: huge quantities of data, social
media analytics, next generation data management capabilities,
real-time data, and much more. Once you’ve figured out what “big data”
actually means to you, you must then figure out how to manage and store
it. Are your current systems able to handle this type of data? How
can you be sure?
Several recent articles and blog entries discuss this theme.
The Commoditization of Commercial Database Management Systems?
by Craig Mullins
are so popular these days that they are being taken for granted. This
article argues that databases are intricate, multi-faceted and useful
tools that are necessary to get the most out of the data that is being
collected and should not be considered a commodity as no two offerings
The Continuing Role of the Database in the New Era of Big Data
by Bernie Spang
amounts of complex big data must be stored in databases in order for it
to be analyzed and made of use to businesses. What features in a
database are necessary to take on this challenge?
Big Data: IBM’s Mainframe Customers Base Grows
by Dave Beulke
Big Data solutions are being embraced by companies worldwide. This
discussion shows that IBM Information Management solutions are well
positioned to fill the needs to handle all your big data.
The Maturing of Big Data: From Herding Cats to Taming Tigers
by James Kobielus
all the recent innovation in the big data market, has big data matured?
Have we figured out how to make big data tigers jump through hoops, or
are we still just herding cats?
Infographic: Taming Big Data
Lots of detail here. Take a look, share the graphic and tell us what you think of it.
Taming Big Data: 12 Best Practices for Analysts
by Maria Deutscher
Are you ready to “sink your teeth” into Big Data Technology? Learn how to
start using IBM’s technology correctly in order to get the most out of
your efforts by learning these 12 best practices.
In addition to these articles, I highly suggest that you read the report Analytics: The Real-World Use of Big Data.
It is based on the Big Data @ Work Survey conducted by IBM in mid-2012
with 1144 professionals from 95 countries across 26 industries.
“Across industries and geographies, our study found that
organizations are taking a pragmatic approach to big data.
The most effective big data solutions identify business
requirements first, and then tailor the infrastructure, data
sources and analytics to support the business opportunity.
These organizations extract new insights from existing and
newly available internal sources of information, define a big
data technology strategy and then incrementally upgrade their
infrastructures accordingly over time.”
Join this free SSWUG Webcast: The Big Deal about Big Data,
starring expert Paul Zikopoulos. The webinar is free only during the
live broadcast that takes place Wednesday, February 20, 2013, 1:00 PM
Eastern. What you’ll learn: Big
Data can mean a lot of things to a lot of people; but one thing we're
sure of, it's the hottest thing to hit the IT landscape. In this chat
you'll get a comprehensive introduction to Big Data. You'll learn how to
spot Big Data, its characteristics, and what the opportunities are.
(Hint, be prepared to get shocked on Volume and more). You'll get a
taste of Hadoop, but realize Big Data is so much more. Paul
will also share top things to consider in the Big Data world that are
often overlooked (governance, integration, search, and more). Consider
this: if you Google search 'What is Big Data', you will get almost 1
billion hits!!! If you attend this session, you'll never have to Google
search this phrase again.
To register and for additional information see: Webcast Structure and Cost
Note that listening to the live broadcast is free, but ordering the replay has a charge.
IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance
by Sunil Soares
ISBN 978-1583473825, MC Press, February 2013
Governance has taken a backseat to the analytics and technologies associated with big data. However, as big data projects become mainstream, we anticipate that privacy, stewardship, data quality, metadata, and information lifecycle management will coalesce into an emerging imperative for big data governance.
Foreword from David Corrigan, Director, Product Marketing, InfoSphere
The importance and the role of a governance strategy are still not well understood. Information Governance is a business strategy that has a series of IT deliverables. Sunil has been one of the pioneers in this area, defining the Unified Information Governance Process several years ago. He defined several key steps, such as identifying a business problem and executive sponsor, setting up cross-functional governance boards, and measuring and communicating success. He has applied this process at hundreds of clients and has helped them achieve successful implementations. His approach can also be applied to governing big data. It has helped many organizations get the business involved in governance and establish trusted information for a key enterprise application.
In short, this process helps you move beyond an IT project toward a true business strategy. It helps by getting business executives and owners involved in the process of governing data. It helps ensure successful outcomes. Sunil, thank you for continuing to contribute to the discipline of Information Governance and move it into the new era of computing—the era of big data.
And to the readers of this book, remember that the competitive advantage you seek from insights garnered from big data has two components: big data analytics and trusted information. Information Governance creates trusted information from very uncertain sources, enabling you to trust and act upon the insights from analytics. I wish you well in your big data strategy.
Sunil’s other books:
The IBM Data Governance Unified Process (MC Press, 2010)
Details the 14 steps and almost 100 sub-steps to implement an information governance program. The book has been used by several organizations as the blueprint for their information governance programs and has been translated into Chinese.
Selling Information Governance to the Business: Best Practices by Industry and Job Function (MC Press, 2011)
Reviews the best practices to approach information governance by industry and function.
Big Data Governance: An Emerging Imperative (MC Press, 2012)
Discusses the governance of different types of big data.
Congratulations to Sunil on this latest book!
Today I spent an hour taking part in the #bigdatamgmt TweetChat, which focused on governance to avoid a data landfill: "Getting Control of Data in Big Data Era" (http://t.co/j2wojSb9Hf).
It went too fast for me to actually be a contributor, so I was
participating as a reader / listener. This kept me busy enough, since by
the end we had generated a fair amount of Big Data ourselves: 647 tweets, 180 users with a reach of 136,229 & 1,506,585 impressions.
Who were the experts?
and facilitators / moderators:
There were 8 questions posed over the hour, but I'm only posting the first 4 here.
Q1 In this Big Data era, do traditional concepts data quality, data governance & data stewardship even apply?
A summary of the answers:
Big Data refers to datasets whose size, type and speed of creation make
it impractical to process and analyze with traditional tools. That Big
Data definition comes from wikibon; see http://t.co/awsPyuqXjZ. So given that, definitionally then, traditional concepts are at the very least “impractical”… no?
dvellante My belief is that ingest process & analysis of data changes with big data.
BigDataAlex Yes, I think they apply. Our clients are very concerned about these issues and it does apply.
jeffreyfkelly Absolutely, but vastly more complex.
Natasha_D_G Traditional concepts are even more critical in Big Data era especially in data governance.
craigmullins But, of course data quality, data governance and data stewardship SHOULD apply in the age of Big Data Management.
You still need clean and common policies for data taxonomies; but the
unstructured and semi-structured data texture requires some new thinking
and technology. Specifically ideas around function shipping, name value
pairs, Hadoop, etc - applying traditional concepts to new model.
Dmattcarter In order for Big Data to be enterprise-ready, it needs to include those traditional concepts.
jeffreyfkelly The challenge is applying DQ and governance to high velocity data - hard enough with "traditional" data, ie CRM, ERP.
craigmullins Failing to apply these concepts will result in poor data quality. Analytics performed on bad quality data produces bad results.
BigDataAlex I think transparency is important too in this era of Big Data and how we govern. I would suggest Big Data Ethics manager.
BTRG_MikeMartin IG concepts apply to Big Data even more so as the issues solved by Information governance are only exaggerated.
furrier Data quality has to take on the idea that it will be moving around different systems/APIs.
Yet there are issues and adaptations that will be required as we apply
data quality, data governance and data stewardship to Big Data
BigDataAlex Love the challenge on high velocity data....algorithms in streams.
jeffreyfkelly Big Data is experimenting with data sets, while governance is applying policies that sometimes restrict experimentation.
BTRG_MikeMartin You can’t make good business decisions on bad data. http://t.co/8J1pQPy6eW
Natasha_D_G Data quality is an issue as "94% biz believe some of their customer/prospect info is inaccurate".
Data governance is critical in the Big Data management era as it makes
small problems bigger. You need data quality to enable Biginsights http://t.co/yVTA9NpXIB
furrier Data as a resource for applications; ownership of data is important to individual and/or company.
BigDataAlex In health care sector, orgs are combining medical ethics with their CIOs.
Aarti_Borkar Governance is even more important with Big Data as the security and trust is a bigger business issue now.
dvellante In part this is a discussion around the balance between data being an asset and a liability - good DQ is important for both.
searchCIO Metadata practices are gaining momentum as companies tackle Big Data. http://t.co/DSkdH4Yk6S
Q2 With data at unprecedented speed/volume, how can data quality measures be applied in time for analysis?
A summary of the answers:
With data quality, cleansing can occur as humans eyeball the data -
most raw Big Data is not eyeballed. In some cases (e.g. medical
devices, automated metering, etc.) only rudimentary cleansing (if any)
may be needed. At least as long as the meters are calibrated and
BigDataAlex Real-time analytics is critical. We love Streams. The right algorithm at the right time.
Natasha_D_G Trust = Word we try to avoid. @Aarti_Borkar: Governance is even more important with Big Data as security & trust bigger biz issue.
To deal with Big Data, speed, and volume: be proactive by starting
Big Data Management across the enterprise now & maintain http://t.co/hGJ3QkTiJf
Aarti_Borkar Data Quality for Big Data can be handled right upfront before starting Big Data analysis
BigDataAlex A next-generation of KPIs for quality vs. quantity are being implemented to separate quality from quantity in real-time.
furrier Data quality is about the context of the application & what users experience for each use case is not always the same.
jeffreyfkelly Machine learning is required to improve data quality for Big Data - velocity too high for human methods IMHO
nenshad Variety of algorithms include semantics
zacharyjeans Ask your Big Data well crafted questions. Sloppy questions lead to sloppy answers.
craigmullins Speed + volume make data quality challenging…
searchCIO Data Quality is essential to master Big Data Management http://t.co/pxZ49Xgimm
BTRG_MikeMartin Start now on data quality because if you don’t have it in now Big Data only magnifies data issues http://t.co/hGJ3QkTiJf
Natasha_D_G Excellent question especially given social media data and its 18 minute life span
jeffreyfkelly Also with Big Data, volume of data can sometimes smooth over anomalies in data quality.
Aarti_Borkar Data quality should also be handled as the results of the analysis are merged back into the reporting marts.
BigDataAlex The right analytics at the right time against the systems of systems integration.
dvellante Perspectives from a former CIO on the importance of data quality http://t.co/mYPfqNCCjm
nenshad It’s all about the data first
dvellante In my view you can't deal with Big Data quality unless you can automate the classification of data at the point of creation.
Kari_Agrawal How exactly do we clean the data when it has no structure?
BTRG_MikeMartin You can’t make good decisions and enable business biginsights without high data quality.
furrier Dirty data equals poor user experience. I wrote about it in 2009 re: twitter facebook & social data http://t.co/vpkfB0xS3h
Aarti_Borkar Data quality should be handled as part of data integration as the Information Server customers do - it's the same with Big Data.
Q3 How do data governance policies apply when the point of Big Data is to explore novel use cases?
A summary of the answers:
craigmullins Finding novel uses of data does not diminish the need for data governance policies.
Natasha_D_G True, but still need boundaries.
BTRG_MikeMartin Exploring Big Data still requires trusted data so you must secure and govern even more so. http://t.co/UL0VNCiivP
craigmullins The novel uses need to be documented as part of the data governance policies.
BigDataAlex The right policy at the right time. I think you can have agility with accountability.
craigmullins Keeping in mind that even under ideal circumstances data governance policies can be difficult to enact.
Big data isn't just for novel new business cases - it can also vastly
improve value in existing ones - i.e. R&D, cust service.
craigmullins Consider non-intrusive data governance; see this article by my friend Bob Seiner http://t.co/GogojXCcoV
Seiner states: data governance refers to the administering
(formalizing) of discipline (behavior) around the management of data.
craigmullins And data governance is an on-going process; it should formalize what already exists + address opportunities to improve.
jeffreyfkelly There is a need to set up boundaries but give analysts freedom to explore Big Data.
furrier Innovation will not come from regulations but creative developers to play with data -#slipperyslope
Q4 How does Big Data change data retention policies, ie, deciding what data to keep vs dispose?
A summary of the answers:
tomjkunkel Formal Data Destruction processes minimize the growing data landfill and need to be incorporated into Data Lifecycle Mgmt.
dvellante: Still must be able to defensibly delete data. you may not want WIP data hanging around - too much of a risk.
BTRG_MikeMartin Big Data is not immune to the laws of information economics: http://t.co/Ta361ASBkP
BigDataAlex Focus on workflow, business process, optimization. There is no set answer. Filtration - distillation
BTRG_MikeMartin Velocity of Big Data means current best data is changing rapidly, you want decisions on the best info.
BTRG_MikeMartin: It is important to have Big Data Management framework for good business outcomes inc. policy, security, ILM & quality.
Data is retained for internal + external reasons... Internal because
the org needs it for business – external because the law demands it.
tomjkunkel Isn't there also a need for Data Entrepreneurs (A business perspective with a knack for data)?
You may choose to retain more data for Big Data Management analytics
but be careful because data once retained is discoverable during court
furrier Big data complicates data retention policies - we have shadow IT and now "shadow data" or what I call "dark data".
Natasha_D_G Big Data can extend data retention esp in R&D. Pharmas can leverage old research to accelerate new research.
jeffreyfkelly This is a major issue: with hadoop you can now store all data inexpensively - not possible before and new challenge.
BTRG_MikeMartin NO still too costly.
Kari_Agrawal If we see the huge amount of IP packets flying around, can we process those packets to get something meaningful?
craigmullins There are over 150 different regulations (at the local, state, national, and international levels) that impact data retention.
Aarti_Borkar Retention is about storing what the business needs later vs everything - that core concept does not change with Big Data.
BigDataAlex Do we need to store everything? Can we, should we?
craigmullins No, no, and no to that last series of questions!
Natasha_D_G Data hoards say keep all! Fear of losing critical info.
jeffreyfkelly Nothing worse than looking for data you know you had only to remember you threw it away!
Aarti_Borkar Defensible disposal of data becomes harder if multiple copies are made as part of Big Data analytics.
craigmullins MT @Aarti_Borkar: Defensible disposal of data becomes harder if... hence the need for #datagovernance policies!
furrier We all want data retention but who owns it after it's retained..will a data marketplace economy develop?
TheSocialPitt Storage is a huge challenge, especially in cases with many streaming video feeds, e.g. defense.
Keep in mind regulations haven't caught up w the technology - industry
needs to be proactive on this issue or the government will.
Aarti_Borkar Big Data allows for pattern searches and trends in retained data that was not easy to do earlier.
That is a lot of information! I hope you can follow the discussions. I
tried to clean up a little bit and hope that I didn’t change any content
from the participants.
To find out more about managing big data, join IBM for a free event: http://ibm.co/BigDataEvent
Q5 Once you decide what data to keep, how do you make sure it goes to the right systems and people?
jeffreyfkelly From a developer perspective, Big Data app dev tools need to improve, make it easier to deliver insight to business users.
BTRG_MikeMartin You must increase control of wasteful data even with Big Data Management, archive/retire & dispose http://t.co/Ta361ASBkP http://t.co/904FjTm2yA
Again it's a matter of information liability and asset management.
Which is more critical to your organization? Cutting risk or mining
BTRG_MikeMartin Big Data doesn’t change retention. Keep the data you need, get rid of the rest . You can’t afford to keep it all. http://t.co/Ta361ASBkP
dvellante This is a metadata problem / opportunity
craigmullins Policies, procedures, automation and education are needed to ensure that Big Data makes its way to the right systems + people?
Big Data approach needs to include improved business outcomes which
requires people process & technology working in harmony.
BTRG_MikeMartin You need to instrument processes to not only govern but make the best use of valuable data.
Data is code in the new paradigm of new apps & services - lots of
issues so developer create & data can learn & be smart.
furrier The integration of data create new datasets - future is smart data and learning data - data is code.
jeffreyfkelly Exactly, and new data sets could be highly sensitive - need governance RT @furrier: the integration of data create new data sets
Betharonoff Data as a commodity already exists, so economy is only a few steps down the road.
furrier Meta data practices will be impacted in this data quality and data-as-code concept.
furrier One aspect of this chat is business competitiveness in integrating data as code into business lifecycle and processes.
BigDataAlex Metadata tags are aligned to role based systems - automated systems.
If you don’t improve processes with Big Data management and create
better business outcomes your Big Data initiative isn’t a success.
Kari_Agrawal When and how do we decide to discard the extremely old data? Or do we retain it as in Data Warehouse?
craigmullins You need policies and automated procedures based on retention requirements.
PPB13 How do practitioners overcome emerging skepticism in the marketplace? http://t.co/x0pHJCDcfN
@BTRG_MikeMartin: You need to instrument processes to not only govern but make the best use of valuable data
BigDataAlex Moving "beyond search"
TheSocialPitt ALWAYS start Big Data project by thinking+planning. More data does not fix bad process.
BTRG_MikeMartin What processes to improve: ediscovery, ECM, Data Governance, Data Security, data retention, and data quality.
IBMbigdata Who decides "best"? RT @BTRG_MikeMartin: You need to instrument processes to not only govern but make best use of valuable data
joycetompsett Data quality has to take on the idea it will be moving around different sys/APIs Big Data management > critical for security #RSAC
BTRG_MikeMartin - Without the right tools data retention with Big Data could be a nightmare
skenniston RT @furrier: We all want data retention but who owns it after it's retained..will a data marketplace economy develop?
BTRG_MikeMartin That's where determining business value, legal and regulations come in typically only 30% of data.
Kari_Agrawal How exactly do we begin to classify data in case of Big Data?
craigmullins MT @PPB13: How do practitioners overcome skepticism... <-- by continuing to do work that adds value to your company
dvellante @BigDataAlex yes re: search - it's sometimes used as a 'blunt instrument'
Aarti_Borkar Deciding what data to retain needs to start with business policies defined upfront - its not an "on the fly" decision.
praxsozi RT @jeffreyfkelly: Q5 Big Data requires rethink of business processes - this is NOT a trivial exercise
Natasha_D_G Culture also plays role RT @jeffreyfkelly: Q5 Big Data requires rethink of business processes - this is NOT a trivial exercise
TheSocialPitt Data antique dealers RT @furrier: We all want data retention but who owns it after it's retained.will data marketplace develop?
tomjkunkel @BTRG_MikeMartin Integrated effort with Legal, Finance, Sales, Marketing with IT serving through best architecture.
BTRG_TomNestor The process must lead to better data which should drive better business opportunities.
BTRG_MikeMartin Big Data is not immune to the laws of information economics: http://t.co/Ta361ASBkP #CGOC
Q6 How does Big Data affect data lifecycle management? Does big data introduce new stages to the info lifecycle?
Summary of top answers:
BigDataAlex Yes, new stages - stages we haven't even imagined yet. Data needs to update itself into authoritative sources.
craigmullins One issue that arises is "How can you create realistic test data for testing Big Data systems and applications?"
jeffreyfkelly Yes, but we are just starting to understand Big Data lifecycle mgt - need to build out best practices.
BTRG_MikeMartin Big Data might not create new stages in life cycle management, but certainly with new domains we have to extend the data lifecycle to new platforms.
I disagree - I think new stage of LCM includes emergence of new data
sets created from integration of other data sets and then yet new data
sets created from integrating new new data sets, and on and on and on.
Aarti_Borkar Big Data makes handling the lifecycle of data a far more complex problem than before.
Natasha_D_G Can u say more? RT @Aarti_Borkar: Big Data makes handling the lifecycle of data a far more complex problem than before
Aarti_Borkar Big Data does not create new stages - just new ways to apply the existing stages to different use cases.
Dmattcarter What are some of those new use cases?
Aarti_Borkar Test Data and Privacy for Big Data is critical - as we bring in more data potentially creating a bigger security threat.
BigDataAlex Is there a new data management paradigm emerging?
craigmullins A new paradigm may indeed be emerging.
BTRG_MikeMartin RT perhaps a refined one
TheSocialPitt One new stage = 'ephemeral'.
craigmullins Let's not burden Big Data with things little data has not yet mastered.
craigmullins Sometimes we forget that - in practice - many orgs do not follow a lifecycle, practice data governance, ensure quality, etc.
craigmullins So yes, Big Data should do these things, but it is not failing if it does not.
Big Data Management requires identification & deletion of
ROT-redundant, obsolete & trivial data, which reduces storage &
StevenDickens3 What role does the community see for the original Big Data system of record the mainframe?
BTRG_MikeMartin Consider impacts of eDiscovery, governance, security and #ILM on Big Data stores how do we move traditional methods to Big Data management.
BigDataAlex Many organizations can only afford to store 20 copies of the same data - they are looking for authoritative against process.
jeffreyfkelly definitely, data sprawl becomes an issue
Kari_Agrawal How do we deal with redundancy in case of Big Data?
StevenDickens3 What is the collective view of centralised data vs multiple federated copies ?
Could some Big Data mgmt stages be the elimination of stages? Using
data/ data analysis without constraint and eliminating steps.
Aarti_Borkar Masking test data is essential to Big Data development: what the enterprise considers private needs to always be privatized.
craigmullins My next two Tweets mentioned some of them. Not saying Big Data shouldn't just that our stds should not be too high
craigmullins @BigDataAlex Yes #littledata concepts apply to Big Data... but many orgs still struggle with managing little data
tomjkunkel @Kari_Agrawal Destroy it! I can provide insight on best practices
Dmattcarter Pretty intense data quality and Big Data conversation going on around Big Datamgmt chat!
is already a data problem with #smalldata carrying it over to Big
Datamgmt. Too costly to delete all data that has no value.
BTRG_MikeMartin You must increase control of wasteful data even w Big Datamgmt, archive/retire & dispose: http://t.co/Ta361ASBkP http://t.co/904FjTm2yA
Q7 Are new tools and platforms required to manage Big Data and the new dimensions of the data lifecycle?
Summary of top answers.
craigmullins Tools for performing advanced analytics on Big Data – though not new to the industry – will be new to many organizations.
BigDataAlex Yes. We need new tools, platforms, and systems....it is happening. Calling for massive innovation - love #DataAsCode.
BTRG_MikeMartin @craigmullins It's called defensible disposal http://t.co/Ta361ASBkP
craigmullins Hadoop-based products will need to be augmented with mission-critical DBMS capabilities to become de rigueur.
craigmullins But I think DB2 (and other RDBMS products) could be extended with Big Data capabilities before that happens.
BTRG_MikeMartin Flexibility & scalability of Big Data platforms will themselves assist in helping Big Datamgmt security & controls...
BigDataAlex We need DigitalDNA - anticipating the Internet of Things - World Wired Web.
Aarti_Borkar It’s a mix of new tools and enhancing existing tools. The core solution does not change it morphs
StevenDickens3 All depends where data resides today and whether the current platform/tools are fit for purpose, if yes why move or retool?
tomjkunkel Legacy storage assets can't handle the high availability,low latency applications and need to be displaced.
jeffreyfkelly Yes, a major topic at #strataconf is making Big Data enterprise ready -need better mgt, data gov, DQ capabilities.
Aarti_Borkar Key innovation is required to ensure that both traditional and big data are uniformly governed.
BTRG_MikeMartin Infosphere Optim helps you get control of structured data to feed only the good into Big Datamgmt: http://t.co/Y5Jniunn6N
jeffreyfkelly And don't forget security - #RSAC - must keep Big Data secure
craigmullins Which brings up regulatory compliance... another big issue
Aarti_Borkar Big Data gov starts with a uniform set of data classification and policies that cover ALL data. Metadata is the magic here.
craigmullins If the Big Data contains PII then all the regulations that apply to PII still apply - doesn't matter how big the data set is.
BigDataAlex Does this spill over into machine learning? Can we reduce dimensionality of data through associative memory?
BTRG_MikeMartin Yes innovation & big ideas as well as change our paradigms.
Betharonoff Interesting query RT @BigDataAlex A7: Can we reduce dimensionality of data through associative memory?
Q8 How does Big Data impact data stewardship? Who “owns” particular data in a big data environment?
BigDataAlex Great question - ownership is beginning to blur - standard licensing models for data are being challenged.
craigmullins All data is owned by the company, whether it is Big Data or not…
jeffreyfkelly Ah, but is it? social data, market data etc.
Internal ownership of Big Data while beyond traditional areas should
still be based on business value, compliance or legal hold.
craigmullins Of course proper data governance policies need to be enacted by the corp to confer #datastewardship and ensure proper treatment
BTRG_MikeMartin Without good data stewardship & Big Datamgmt it will be difficult to unlock the value of big data: http://t.co/hGJ3QkTiJf
Aarti_Borkar Ownership of replicated data is the original biz owner- governance of that data is still their problem.
jeffreyfkelly This is a really hard one, again new biz processes informed by Big Data will impact who owns the data.
Aarti_Borkar Stewardship does not change just because a new copy of the data was created.
craigmullins True, but some Big Data is all new.
craigmullins The word "own" is always so troublesome, isn't it?
BTRG_MikeMartin Yes it needs to be well defined .
Aarti_Borkar @craigmullins - Oh so right! .. think "Responsible for".. is better than "own"...
BigDataAlex If DataAsCode, then if DataAsCode is viral, can it be controlled? Do we want it to be controlled? What does ownership mean?
BigDataAlex How does OpenSource apply to our Data?
BTRG_MikeMartin For more Big Datamgmt resources Data Privacy and Security: http://t.co/UL0VNCiivP
That is a lot of information! I hope you can follow the discussions. I
tried to clean up a little bit and hope that I didn’t change any content
from the participants.
What are you sacrificing for the promise of big data?
Many of today's companies are making trade-offs in areas of security and
governance to leverage the promise of big data. Can you have both?
There have been several responses to this question. First off, I’d like to
point you to the tweetchat that took place on Wednesday, February 27.
Several experts, including Jeff Kelly of Wikibon discussed the topic:
Garbage In/Garbage Out and data governance’s role in the big data era.
This was a very exciting event and all who participated said they had
fun and are looking forward to the next tweetchat. I had to split the
recap into two blog entries:
I plan to write a third blog entry from the information gathered by
pulling out the list of related articles that were mentioned during the
Articles and blog entries related to this theme.
Managing Database Change
by Craig Mullins
Craig looked at the many aspects of managing change, including 9 requirements
for successful change. He also shared his DBA perspective as the
“custodian of database changes.”
3 Big Data Issues: Security, Governance and Archiving
by Dave Beulke
Dave notes, “While the new kid database Hadoop may be getting a lot of
press, it needs more capabilities to keep up with the mature platforms
and databases for data security, governance, archiving and temporal
Productionizing your Big Data: A Checklist of Key Considerations
by James Kobielus
Is your big data investment production ready? James shares five key
considerations to help ensure your big data investment can function as a
reliable business asset.
Must You Sacrifice Privacy for Big Data?
by Larry Dubov
As organizations move towards big data, privacy concerns become very real.
Larry traces the history of privacy and explains why good governance
can help reduce the need to compromise on privacy.
IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance
by Sunil Soares
This new flashbook provides an in-depth look at several aspects of governing
your big data, with several case studies to offer guidance and best
practices. Download your free copy.
Controlled Explosion: Keeping Big Data Contained with Security, Governance and Information Lifecycle Management
by James Kobielus
Can you control the big data explosion? James argues that you can, with careful attention to three vital types of controls.
Your Information is a Product
by Steven Adler
Steve is the Chairman of the IBM Data Governance Council. His view is that
companies who still treat their data as a raw resource rather than an
essential product that they produce, should prepare to be obsolete.
Steve has an ongoing blog related to the topic. You can find his blog
behind a registration page. Here is an entry to look for: The Data Governance Report Card.
Information Governance and Big Data
by Mike Martin
Mike wrote this blog entry a year ago, but it is still as accurate today as
it was then: “I believe we should include Big Data in Information
Governance, after all one of the major issues we try to solve with
Information Governance is data volume and I consider Big Data solutions
to be yet another tool in the arsenal of Information Governance.”
Data Scientist: Exploration in the Age of the Unstructured
by James Kobielus
“When unstructured data (however defined) enters the enterprise big-data
picture, data management professionals begin to quake in their boots.
Governance of structured data is an established body of practices and
tools, focusing on enforcing controls on schemas and contents of data
deemed to constitute an official system of records in some subject area,
such as customers and finances.”
Infographic: Valentines: 4 database features we'd love to have in big data era.
Availability, Analytics, Governance, Security, Speed, and Compression. How do these values transfer to the Big Data era?
See also Database technology: Remember me? Big data will remind you!
A summary of the conversations focused on how and why big data challenges
are posing a bit of a renaissance for databases. Some people see
databases as a commodity, something that gets taken for granted. But as
big data becomes a reality, databases are again taking center stage.
Follow us on twitter using the tag #bigdatamgmt and join in on the conversation.
During this week’s #bigdatamgmt TweetChat, which focused on "Getting Control of Data in Big Data Era",
I noticed that many of the experts recommended articles to emphasize
their points. I’ve gathered all of these recommendations to create this
reading list. I hope you find this valuable.
You must increase control of wasteful data even with Big Data
Management, archive/retire & dispose. Big Data doesn’t change
retention. Keep the data you need, get rid of the rest . You can’t
afford to keep it all. Big
Data is not immune to the laws of information economics. You must
increase control of wasteful data even w Big Datamgmt, archive/retire
& dispose. Infosphere Optim helps you get control of structured
data to feed only the good into Big Datamgmt.
PPB13 How do practitioners overcome emerging skepticism in the marketplace?
Without good data stewardship & Big Datamgmt it will be difficult to
unlock the value of big data. To deal with Big Data, speed, and volume:
be proactive by starting Big Data Management across the enterprise
now & maintain. Start now on data quality because if you don’t have it in now Big Data only magnifies data issues.
For more Big Datamgmt resources Data Privacy and Security. Exploring
Big Data still requires trusted data so you must secure and govern even
Big Data refers to datasets whose size, type and speed of creation make
it impractical to process and analyze with traditional tools. That Big
Data definition comes from wikibon; see Big Data Vendor Revenue and Market Forecast 2012-2017. So given that, definitionally then, traditional concepts are at the very least “impractical”… no?
BTRG_MikeMartin You can’t make good business decisions on bad data.
Data governance is critical in the Big Data management era as it makes
small problems bigger. You need data quality to enable Biginsights
searchCIO Metadata practices are gaining momentum as companies tackle Big Data.
searchCIO Data Quality is essential to master Big Data Management
dvellante Perspectives from a former CIO on the importance of data quality
furrier Dirty data equals poor user experience. I wrote about it in 2009 re: twitter facebook & social data
craigmullins Consider non-intrusive data governance; see this article by my friend Bob Seiner
These are great resources! For our next tweetchat, we’d like to build a reading
list that is sent out prior to the chat. Wednesday, March 15: Topic: Mobile Challenges for the Enterprise. Stay tuned for more information about this event.
To find out more about managing big data, join IBM for a free event on April 30: Big Data at the Speed of Business
What are you sacrificing for the promise of big data?
Recap of Tweetchat: "Getting Control of Data in Big Data Era"
Part 2: Recap of Tweetchat: "Getting Control of Data in Big Data Era"
What is a Tweetchat and why should you join us?
Big Data & Database Technology