Today marks the launch of a new blog by David Birmingham called “Netezza Underground”. As you may remember, IBM purchased Netezza not long ago. To refresh your memory, here is a description from the press release “IBM Completes Acquisition of Netezza”:
Netezza data warehouse appliances bring analytics directly into the hands of business users within every department of an organization such as sales, marketing, product development and human resources. The simplicity of deploying Netezza appliances makes the technology ideal for the needs of high-performance analytics, requiring minimal administration and IT skills, and enables clients to run complex data queries within days of deploying the solution.
David’s first blog entry introduces the need for the blog and looks at the counter-intuitive nature of the product's internals. In future entries, he’ll answer questions about the technology as a seasoned expert. If you’re interested in business analytics, data warehousing, and big data, this is definitely a blog that you want to subscribe to.
So, who is David? He’s not an employee of IBM or Netezza. David is a consultant working for Bright Light Consulting and is a huge fan of the Netezza product and architecture. A few years ago he also wrote a Netezza book:
Looking for more information about Netezza? Here are some additional resources that I’ve found:
- IBM Redguide: The Netezza Data Appliance Architecture: A Platform for High Performance Data Warehousing and Analytics
- Netezza Community – links to David’s blog, plus other bloggers, including Jim Baum, Justin Lindsey, Phil Francisco, Brad Terrell, Shawn Dolley, Dai Clegg, and Patricia Colter.
- Enzee Universe – User Conference taking place in Boston June 20-22.
I’m looking forward to learning more about Netezza via David’s blog.
Great articles by amazing contributors! Make sure you read every article in the latest edition of the DM Magazine. I’ve read the article about Roger Sanders’ contribution to the DB2 Certification Exam Development team and the announcement of his latest certification guide, which is a supplement. I’ve bookmarked many of the other articles so I can read them over the coming week. I hope you find something interesting and useful in this edition!
Taming Big Data - The realm of huge information flows is governed by new rules. What changes in the multi-petabyte, microsecond response, multimedia world? And how will Big Data change your job?
by Lisa Stapleton
It’s not just huge volumes. It’s not just microsecond response times, it’s not just incredible variety. It’s all three. Dealing with “nice” data, stored in well-defined data warehouse structures and handled sometimes months or years after it is first collected, is an increasingly smaller part of the job of data management. Here’s how folks are learning to handle the rest of it—the unstructured data, and the data that constantly changes as customers take actions and applications process input.
Tuning SQL at the Senate: E Pluribus Unum - When the SQL flow in the United States Senate went from static to dynamic, the database team had to see many queries, but tune them as one.
by Ives Brant
“High-performance government” may sound like an oxymoron, but for the teams that monitor and tune databases for the U.S. Senate, high performance is absolutely necessary to handle the huge number of queries that hit the Senate’s financial management system. InfoSphere Optim Query Workload Tuner software running on IBM DB2 for z/OS helps the U.S. Senate DBAs and software specialists cut query response times from 20–30 seconds to less than 2 seconds when they generate new statistics for a workload of SQL queries.
The Man to See About Certification - The guru of DB2 certification tests talks about how he puts them together—and how they can help your career
by Howard Baldwin
Q&A with Roger Sanders, who has helped IBM develop 17 DB2 certification exams, more than any other individual. Want to know why certification is important? Roger Sanders is the man to ask. We talked to him about how the tests are put together, how they can help a DBA’s career, and (oh yes) about the certification test he failed.
Get Your Head in the Clouds - Data Pros are adopting cloud computing concepts to offer databases as a service - easing management burdens and sending users to cloud nine.
by Jin Zhang
Tactics and strategies for moving away from traditional provisioning models in which DBAs function solely in reactive mode—responding to user requests in nonstop “database, clone, database, clone” activities—towards a database-as-a-service, or DBaaS (pronounced as “D-Baa-S”), model employing cloud computing practices.
To offer DBaaS on the cloud, enterprise IT departments must undergo a process of constructing and managing a private enterprise “data cloud”—a platform consisting of storage hardware, virtual images, database schemas and more—and making that cloud available to users through a services interface.
Smarter is… Making Watson Smarter…Faster
by Howard Baldwin
A deeper look at the Jeopardy-winning system and the technology that enables it to process information in near real-time.
IBM Information Governance Council - Information Governance Community growing worldwide - IGC draws 1500 members, solicits best practices ideas
by the Magazine Team
The wisdom of crowds isn't an oxymoron. Large groups have an uncanny ability to get the right answer - just see James Surowiecki's book on the subject for proof. So it's not unreasonable to say that the Information Governance Council (IGC) is getting smarter with each passing month.
Data Architect - Securing DB2 Data - Grant privileges to a what, not a who
by Robert Catterall
These days executives are more concerned than ever about unauthorized access to data entrusted to their organization. Their fears are justified: a recent study showed that a third of those polled would quit doing business with a company they perceived to be guilty of a data security breach.
Distributed DBA - Using the DB2 Problem Determination Tool
by Roger E. Sanders
Sooner or later, every DBA encounters problems. Consequently, a skill that every DBA must possess is the ability to perform a logical, systematic search of a database system for the source of any problems that might arise. The DB2 Problem Determination Tool can help.
Programmers Only - New Order by Information: Part 2. The Impact of using the RANDOM index option on ORDER BY sort avoidance
by Bonnie Baker
In the last issue, Bonnie Baker began a series of columns concerning new aspects of ORDER BY. This column - Part 2 - covers the impact of using the CREATE / ALTER INDEX RANDOM order option on sort avoidance.
Performance tuning on Informix - Fastest Informix DBA Contest III - Performance tuning an OLTP system
by Lester Knutsen
Performance tuning is a continuous process for every DBA. Advanced DataTools Corporation has conducted three Fastest Informix DBA contests to highlight and learn what goes on in this process.
Imagine What You Could Do - Free your mind, and your business will follow
by Dave Beulke
Big Data, Big Time - Series data, warehouse acceleration, and 4GLs
by Stuart Litel
Only a few episodes remain in this season of The DB2Night Show. Mark your calendar for these upcoming events, and take a look at previous shows that you may have missed. It’s a huge educational opportunity.
#53 - Don't flip out! How to stop your query access plans from flopping! (aka DB2 HINTS!)
John Hornibrook, IBM STSM, Manager Query Optimization
Friday May 20: 11am ET, 90 minutes
Did you know you can "force" the DB2 LUW optimizer to choose a specific access strategy of your choosing? The secret is out...
Special guest John Hornibrook from the IBM Toronto Lab presented in Episode #52, where he talked about best practices for query tuning. In this show he’ll teach you how to exploit DB2 LUW optimizer "hints" or, more properly, how to tell the DB2 optimizer how to execute your queries.
Note that this show is scheduled for 90 minutes so that John has time to share his entire presentation with you and answer questions.
#z04 - What's new from the optimizer in DB2 10 for z/OS?
Terry Purcell, IBM SVL, SQL & Optimization
Monday May 23: 11am ET, 60 minutes
DB2 10 for z/OS continues the tradition of delivering incremental query optimization enhancements to the world’s most respected cost-based optimizer. Because skip-release migration is supported, many more customers are expected to adopt DB2 10 in the coming year, so understanding its performance enhancements can be critical for them. Terry Purcell will share insights gathered from beta customers and early adopters, and explain the motivation for each enhancement, including:
- “Safe” query optimization
- Improvements to complex OR and IN list processing
- RUNSTATS management and performance improvements
- And more!
#54 - DB2 9 LUW Core Engine Data Movement Utilities Overview with Oracle database comparisons
Burt Vialpando, Executive IT Specialist, IBM
Friday June 3 - 11am ET, 60 minutes
Special guest Burt Vialpando from IBM presented in Episode #22, which was the 15th most downloaded show in 2010. He talked about Comparing DB2 LUW and Oracle, Architectures and Administration. In this episode, Burt will give you an overview of each of the DB2 core engine data movement utilities: Load, Import, Export (with db2look), db2move, ADMIN_COPY_SCHEMA, ADMIN_MOVE_TABLE, db2relocatedb, restore from backup, and split mirror. He’ll compare them with each other so that you get a good idea of when to use each one, and when not to. To round out the discussion, he’ll also compare these DB2 utilities with the Oracle database core engine data movement utilities.
#z05 - DB2 V10 Migration Planning and Early User Experiences
John Campbell, IBM Distinguished Engineer
Monday June 6: 11am ET, 60 minutes
In this episode, host Klaas Brant and guest John Campbell will introduce and discuss early experiences with DB2 10 for z/OS and the lessons to be learned from them. They will provide quick hints on preparing for and executing the migration, performance expectations and opportunities, virtual storage constraint relief, some instrumentation changes, use of the 1MB real storage frame size, use of hash access, the value of rebind, and more. Key topics include:
- Lessons learned
- Surprises and pitfalls
- Provide hints and tips
- Address some myths
- Provide additional planning information
- Provide usage guidelines
- Provide positioning on new enhancements
#55 - DB2 LUW Multi-Temperature Data Management
Kate Kurtz and IBM Smart Analytics Best Practices Team members
Friday June 17: 11am EDT, 60 minutes
Did you know that data has temperatures? Can data run a fever? What happens if your data catches a cold? Or rather, turns cold from hot? What can and should you do? What techniques are available for optimizing performance in databases where some data is more popular (hot) than other data?
In this episode of The DB2Night Show, various IBM experts will help us answer these challenging questions! Kate Kurtz along with team members from the IBM Smart Analytics Systems Best Practices team will share with us their expertise!
#56 Season #2 Finale! - Data Warehouse Performance Tuning!
Kate Kurtz and IBM Smart Analytics Best Practices Team members
Friday June 24: 11am ET, 60 minutes
Kate Kurtz and her IBM Smart Analytics Systems Best Practices team return to share with us IBM recommendations and suggested best practices for optimizing the performance of data warehouse databases. If you're trying to make DPF or multi-partition databases run queries as fast as possible, then you should sign up for this show!
Hard to believe another season is over already! Thanks, Scott Hayes and Klaas Brant, for entertaining, informing, and best of all, EDUCATING us! We look forward to seeing what amazing ideas you come up with next season.
PS Click here for the recorded shows & commentary.
What do you know about IBM InfoSphere BigInsights?
BigInsights brings the power of Apache Hadoop to the enterprise. This means that by using InfoSphere BigInsights, you can manage and analyze data in ways that were previously unimaginable. You can extract deep insights that can lead to greater efficiencies, value-add services, and opportunities for transformation.
That’s not much to go on, but if you’re interested in learning more, take advantage of the special one-day class on Sunday, October 23, 2011, held in conjunction with the IBM Information on Demand 2011 Conference in Las Vegas, Nevada. Registration for the IOD Conference is NOT required to attend this Sunday class.
This one-day training course is for system administrators and developers responsible for managing Apache Hadoop. Not only will you learn about this new technology, but after you’ve taken the course, you’ll be prepared to take the BigInsights Technical Mastery Test, which you can take for free at the IOD conference. The course covers:
- Big Data Overview
- Introduction to Hadoop and HDFS
- HDFS Administration
- Introduction to Map / Reduce
- Setup of a Hadoop Cluster
- Managing Job Execution
- Overview of JAQL
- Data Loading
- Overview of workflow engine
More about the Mastery Exam: Test M97: IBM BigInsights Technical Mastery Test v1
This proctored technical mastery test examines IBM BigInsights knowledge regarding the ability to identify, manage and close sales opportunities.
The test is applicable to sales representatives who demonstrate sales and technical knowledge of the IBM BigInsights product and targets the technical sales professional who can deliver a comprehensive business solution to customers through solution identification, product differentiation, and competitive positioning.
This technical mastery test meets one of the technical requirements for SVP (Software Value Plus) and counts as a skill towards Advanced and Premier PartnerWorld Membership levels. It is strongly advised that the candidate complete the recommended education prior to attempting this technical mastery test.
- Number of questions: 39
- Time allowed in minutes: 75
- Required passing score: 76%
- Test languages: English
- Section 1 - BigData Overview (23%)
- Section 2 - Introduction to Hadoop (18%)
- Section 3 - HDFS Administration (15%)
- Section 4 - Setup of a Hadoop Cluster (18%)
- Section 5 - Managing Job Execution (8%)
- Section 6 - Overview of JAQL (18%)
Other Pre-Conference Education available at IOD this year:
Other blog entries about IOD Events:
Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data
by Dirk deRoos, Chris Eaton, George Lapis, Paul Zikopoulos, Tom Deutsch
Big Data represents a new era in data exploration and utilization, and IBM is uniquely positioned to help clients navigate this transformation. This Flashbook reveals how IBM is leveraging open source Big Data technology to deliver a robust, secure, highly available, enterprise-class Big Data platform.
The three defining characteristics of Big Data (volume, variety, and velocity) are discussed. You’ll get a primer on Hadoop and how IBM is 'hardening' it for the enterprise, and learn when to leverage IBM InfoSphere BigInsights (Big Data at rest) and IBM InfoSphere Streams (Big Data in motion) technologies. Deployment and scaling strategies plus industry use cases are also included in this practical guide.
- Learn how IBM hardens Hadoop for enterprise-class scalability and
- Gain insight into IBM's unique in-motion and at-rest Big Data analytics
- Learn tips and tricks for Big Data use cases and solutions
- Get a quick Hadoop primer
This book is about Big Data, but you already knew that. Big Data is a Big Deal! This book’s authoring team is well seasoned in traditional database technologies, and all recognized one thing: Big Data is an inflection point when it comes to information management technologies. In fact, Big Data is going to change the way you do things in the future, how you gain insight, and how you make decisions (the change isn’t going to be a replacement, but rather a synergy and extension). Recognizing this inflection point, the author team decided to write this book to help you get quickly up to speed on this technology and to show you the unique things IBM is doing to turn the freely available open source Big Data technology into a Big Data Platform. There’s a major difference: the platform leverages the open source technologies (without ever forking them) and marries them to enterprise capabilities provided by a technology leader that understands the benefits a platform can provide.
By the time you are done reading this book, you’ll have a good handle on the Big Data opportunity that lies ahead, a better understanding of the requirements that ensure you have the right Big Data platform (as opposed to just technology), and strong foundational knowledge of the business opportunities that lie ahead with Big Data and some of the technologies
PART I: The Big Deal about Big Data
Chapter 1 – What is Big Data? Hint: You’re a Part of it Every Day
Chapter 2 – Why Big Data is Important
Chapter 3 – Why IBM for Big Data
PART II: Big Data: From the Technology Perspective
Chapter 4 - All About Hadoop: The Big Data Lingo
Chapter 5 – IBM InfoSphere BigInsights – Analytics for “At Rest” Big Data
Chapter 6 – IBM InfoSphere Streams – Analytics for “In Motion” Big Data
Chris Eaton, B.Sc., is a worldwide technical specialist for IBM’s Information Management products focused on Database Technology, Big Data, and Workload Optimization. Chris is also an award-winning international speaker, having presented at data management conferences across the globe, and has one of the most popular DB2 blogs, located on IT Toolbox at: http://it.toolbox.com/blogs/db2luw.
Dirk deRoos, B.Sc., B.A., is a member of the IBM World-Wide Technical Sales Team, specializing in the IBM Big Data Platform. Dirk joined IBM eleven years ago, and has a Bachelor of Computer Science and a Bachelor of Arts (Honors English) from the University of New Brunswick.
Thomas Deutsch, B.A., M.B.A., serves as a Program Director in IBM’s Big Data business. Tom has spent the past couple of years helping customers with Apache Hadoop, identifying architecture fit, and managing early stage projects in 200+
George Lapis, MS CS, is a Big Data Solutions Architect at IBM's Silicon Valley Lab. He has worked in the database software area for more than 30 years. He was a founding member of the R* and Starburst research projects at IBM's Almaden Research Center in the valley, as well as a member of the compiler development team for several releases of DB2.
Paul C. Zikopoulos, B.A., M.B.A., is the Director of Technical Professionals for IBM Software Group’s Information Management division and additionally leads the World Wide Database Competitive and Big Data SWAT teams. Paul has written more than 300 magazine articles and 14 books on DB2.
If you were at the IOD11 Conference last week, you may have tried to get a copy of the latest Flashbook “Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”. If you weren’t at the conference, here’s what happened: I had 3000 printed copies of this book by IBM experts Paul Zikopoulos, Chris Eaton, Tom Deutsch, George Lapis, and Dirk deRoos. We scheduled two giveaways with author signings, on Monday and Tuesday, outside the event center after the opening session. No books remained after the first giveaway. Yes, it is true that we handed out 3000 copies on that Monday morning in what seemed like a mere 30 minutes!
I had an additional 1000 copies of this book delivered for Wednesday morning and had no trouble finding people who were eager to get a copy of the book to read. I’ve given out books at many conferences over the years and must say that this was the most exciting of all. More about the book: Flashbook: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data http://ibm.co/pVhiw2
#iod11 #bigdata #analytics
If you didn’t get a copy, I’m happy to announce that you can get the free e-version of this book on my bookstore page: ibm.com/software/data/education/bookstore. The book will be on that page later today or tomorrow… so don’t be discouraged if you don’t find it right away.
If you’re an IBMer and want a box of these books for an event, contact me via my IBM email address and I’ll give you instructions on how to get the books.
Want more? Here are other ways that you can build your knowledge and skills on this hot new technology:
1) Join Paul Zikopoulos this Friday, Nov 4, on the DB2Night Show, Episode #62 – Big Data Overview. Space for this free webinar is limited, so register now! Thanks again to award-winning IBM Champion Scott Hayes for doing such a great job of entertaining and educating the public!
2) Read Jeff Jonas’ blog entries about Big Data:
3) Dive into the best wiki in the world! BigInsights Technical Enablement Wiki: ibm.com/developerworks/wiki/biginsights/
4) Read this technical intro to IBM BigInsights, IBM’s Big Data platform, by Cynthia Saracco: Understanding InfoSphere BigInsights: An introduction for software architects and technical leaders.
5) One of the 82 Web-based Training Courses available for IM is BigInsights Essentials.
6) Want to learn Hadoop, MapReduce, HBase or other Big Data topics? Join BigDataUniversity.com for free.
7) Read the excellent article by Lisa Stapleton that was published in a recent edition of IBM Data Management Magazine: BigData - Volume, Variety, and Velocity. How do you attack something *that* big?
8) Read the article Big Data Impacts Data Management: The Big Vs of Big Data by IBM Champion Dave Beulke.
9) Follow the blogs and tweets that are happening real-time via the Big Data Daily.
10) Read the article by IBM Champion Craig Mullins: Big Data and 150 Trillion Calculations Per Second at Vestas.
Read, attend and learn!
I know… what happens in Vegas is supposed to STAY in Vegas. But I’m going to break that rule and give you links to a few videos that you’ll want to check out.
First, check out the interviews I did with some of the authors who were doing book signings at IOD11. I interviewed:
Sandy Carter about her latest book Get Bold: Using Social Media
Tony Giordano for his book Data Integration Blueprint and Modeling
James Taylor for his latest book Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics to Build Adaptive, Agile, Intelligent Systems
You can find all three videos on the IBM Press Books channel on YouTube.
To go along with the author videos, remember to check out the audio interviews I did before the conference for the IM Skills Cast Series. I interviewed Roger Sanders, Roger Johnson, Filip Draskovic, Sunil Soares, and Bob Laberge.
Here are a few interesting clips that I found on YouTube that you may want to take a look at.
I’m already looking forward to IOD12. If you are too… here is the info so you can block your calendar and start building a justification to attend.
Last Friday Dan Dubriwny from the Big Data Tiger Team, IBM USA, was the guest speaker on episode 62 of the DB2 Night Show, hosted by Klaas Brant. The topic was BIG DATA Overview - What is it? Who cares? If you missed it, be sure to catch the replay.
According to Klaas:
If you have been to IOD 2011 in Las Vegas, then the words Big Data will probably still resonate in your mind. The theme of IOD was "Turn Insight into action" and IBM is the only company that can do this for all sorts of big data. Most people think that big data is all about Hadoop. And although Hadoop is one of the components in IBM's solutions, it is not what big data is all about. In this episode, Dan Dubriwny gives an excellent introduction to big data. It is all about Variety, Velocity and Volume. IBM can handle all of this for both data streams and data at rest. Probably by now you are curious how this all works, so go ahead and enjoy Dan's presentation with its numerous examples.
Scott Hayes informs me that there is more… much more! Episode #63 on 18 Nov will be a deep dive into IBM solutions around BIG DATA.
Join special guests Paul Zikopoulos and Robert Thomas, both from IBM, to get a look at why IBM is the right partner for Big Data. You'll find that many vendors offer Big Data products, whereas IBM is offering a Big Data PLATFORM. Most of this session will be technical, showcasing, for example, the file system (GPFS SNC) that's used with the IBM Big Data platform instead of the open source de facto standard HDFS, because it provides better performance, management (it's POSIX compliant), security, and availability; with GPFS SNC, IBM hardens Hadoop for enterprise deployments. Of course, no one quite understands the enterprise like IBM, so you can bet we are solving other enterprise challenges such as security, integration, governance, and more.
The DB2Night Show Episode #63: BIG DATA Technologies, Solutions, and Details
Speakers: Paul Zikopoulos and Robert Thomas
Friday, November 18, 2011
11:00 AM - 12:00 PM EST
As Scott always says, join to be educated, informed, and entertained.
Want more on Big Data? Check out this entry: Do you feel the excitement of Big Data?
What do you think? Are these the two top topics of 2011? It seems to me that I’ve blogged about these two topics quite a bit this year, and now we have the best… both in one webinar!
Join Leon Katsnelson, Uri Budnik, and Rav Ahuja for the next Chat with the Lab webinar: Leveraging Clouds for small and BIG data
Many experts agree that Cloud Computing has now matured past the initial hype phase and that organizations large and small have started to derive real benefits by placing applications and data on Clouds. In this webinar we will discuss how to leverage Cloud Computing for managing data and gaining insights from big data. We will cover various Cloud options (public, private, and hybrid) available for managing data in the Cloud with the IBM DB2 database server, as well as for deriving value from big data using the IBM BigInsights Hadoop-based platform. We will also show you how to be up and running in as little as a few minutes using free editions of DB2 and BigInsights. You ought to find this session useful regardless of whether you are a beginner or an expert in cloud computing or big data.
Date: Tuesday, December 6, 2011 (6.12.2011)
Time: 12:30 PM - 2:00 PM Eastern Time (ET) 11:30 AM Central / 9:30 AM Pacific / 17:30hrs London / 18:30hrs Frankfurt, Paris / India 11 PM
Speakers: Leon Katsnelson (IBM), Uri Budnik (RightScale), Rav Ahuja (IBM)
Register Now >>> http://bit.ly/sY3Nfe
For questions/comments/suggestions: http://www.channeldb2.com/group/db2chatwiththelab
Thanks to Rav for this information.
Here are some other posts from my blog about these topics:
Yesterday I read a blog entry by Sunil Soares called Big Data Governance. This caught my attention, as both Big Data and Governance are huge topics in the industry today. You may recognize Sunil’s name; let me remind you where you know it from. Sunil is an expert in the Information Governance area and has had two books published on the topic:
This book was published last year and is selling quite well. I wrote a lengthy blog entry about this book that I encourage you to read: Meet Sunil Soares - Selling Information Governance to the Business: Best Practices by Industry and Job Function. In summary, this book discusses the best practices to sell the value of information governance. The objective of the book is to provide a representative sample, rather than an exhaustive list, of best practices to sell the value of information governance within an organization. You should use these best practices as inspiration for what might work within your organization. It is important that you read chapters from industries and functions outside of your own because there are a number of case studies that you might find useful for your specific situation. The book contains more than 50 case studies and 16
This is a Flashbook that was published in 2010. Printed copies of this book exist, but in small supply. Printed copies are handed out at selected events, but you can get a free e-version of the book from the Information Management Bookstore. What’s the book about?
Briefly, according to Arvind Krishna, General Manager at IBM:
IBM has assembled a comprehensive approach to Information Governance delivers the industry’s strongest portfolio of products, services, and best practices to address every organization’s needs. This book provides a set of detailed steps and sub-steps to implement an Information program, as well as the associated automation provided by IBM
Beyond the blog entry that caught my attention, I found two articles also written by Sunil:
Information Governance: Big Data and the Road Ahead
Smart meters and Big Data: A clear case for governance best
In Sunil’s words: I am starting to see a convergence of two major trends in the marketplace: information governance and Big Data. We are coining the term “Big Data Governance” to reflect this emerging trend. I define Big Data Governance as the formulation of policy to optimize, secure, and leverage Big Data as an enterprise asset by aligning the objectives of multiple
I’m looking forward to seeing how this convergence pans out. If you’re also interested, I suggest you keep an eye on what Sunil comes up with. I’m sure he’ll make this emerging trend easy for you to understand and implement.
Date: Mar 27, 2012
Time: 12:30 PM - 2:00 PM (Eastern Time)
Speaker: Paul Zikopoulos (IBM)
Topic: An Introduction to Big Data
Big Data can mean a lot of things to a lot of people, but one thing we're sure of: it's the hottest thing to hit the IT landscape. In this chat you'll start with an introduction to Big Data so we're all on common ground as to what it is, how to spot it, and what the opportunities are. (Hint: be prepared to be shocked by Volume and more.) With a strong foundation on what Big Data is, you'll be introduced to the IBM Big Data platform, what it looks like, and its key components. In the final part of the chat, we'll dive into IBM's non-forked, embraced and extended Hadoop distribution (called InfoSphere BigInsights) and streaming technology (InfoSphere Streams), take a tour of the Big Data platform features as they relate to these pillar components, and get introduced to the toughest Big Data use case there is: text analytics, which is just one area where IBM is bringing the "WOW" factor to Big Data.
Of course, along the way you'll be introduced to clients that are using the platform and their use cases, and be all set to investigate and appreciate not only what Big Data can do for your company, but how a partnership with IBM can make it happen that much faster and with that much more confidence.
Register now to join IBM speaker, Paul Zikopoulos on Tuesday March 27th, 2012 at 12:30 PM Eastern. Sign-up to attend this free webinar.
Here are a few other resources you may wish to tap into if you’re interested in this topic:
1) Dan Dubriwny from the Big Data Tiger Team, IBM USA, was the guest speaker on episode 62 of the DB2 Night Show, hosted by Klaas Brant. The topic was BIG DATA Overview - What is it? Who cares? Catch the replay.
2) Paul Zikopoulos and Robert Thomas were guests on Episode #63 of the DB2Night Show and gave a deep dive into IBM solutions around BIG DATA.
3) At the IBM Information on Demand Conference that took place last October, copies of the Flashbook “Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data” were handed out. If you didn’t get a copy, you can get the free e-version of this book on my bookstore page: ibm.com/software/data/education/bookstore.
4) Watch Jeff Jonas’ Business Insider talks and read his blog entries about Big Data:
5) Dive into the best wiki in the world! BigInsights Technical Enablement Wiki: ibm.com/developerworks/wiki/biginsights/
6) Read this technical intro to IBM BigInsights, IBM’s Big Data platform, by Cynthia Saracco: Understanding InfoSphere BigInsights: An introduction for software architects and technical leaders.
7) One of the 82 Web-based Training Courses available for IM is BigInsights Essentials.
8) Want to learn Hadoop, MapReduce, HBase or other Big Data topics? Join BigDataUniversity.com for free. 15,000 students have already registered.
9) Read the excellent article by Lisa Stapleton, published in a recent edition of the IBM Data Management Magazine: BigData - Volume, Variety, and Velocity. How do you attack something *that* big?
10) Read the article Big Data Impacts Data Management: The Big Vs of Big Data by IBM Champion Dave Beulke.
11) Follow the blogs and tweets happening in real time via the Big Data Daily.
12) Read the article by IBM Champion Craig Mullins: Big Data and 150 Trillion Calculations Per Second at Vestas.
Read, attend and learn!
Do you ever wonder what the big brains in IBM read? You know the people
I mean! Their names are on books, they have patents and speaking engagements,
and when they talk, you’re wowed by what they know and how they convey it.
Here’s a list of what Netezza / Big Data expert Krishnan Parasuraman
has been reading and recommends for others to read:
1. Analytics: The widening divide.
IBM Institute of Business Value in collaboration with MIT Sloan
2. Moneyball: The Art of Winning an Unfair Game.
Michael Lewis, W. W. Norton & Company
3. Pacific Northwest Smart Grid Demonstration Project
4. Euronext: Federated Data Architecture with IBM Netezza
5. T-Mobile: IBM Netezza Client Success Video - Network Engineering success at scale with IBM Netezza
6. Catalina Marketing Stays ahead of the Curve with IBM
7. T-Mobile crunching 17 billion transactions a day - What does it do with all that data?
8. Large Gene Interaction analytics at University at Buffalo, SUNY. IBM Case Study
9. The Netezza Data Appliance Architecture: A
Platform for High Performance Data Warehousing and Analytics. Phil
10. IBM Netezza Analytics - The advanced analytics platform inside every IBM
Netezza appliance. IBM Data Sheet.
You haven’t heard of Krishnan Parasuraman yet? He’s one of the
authors who is working on a new Big Data book that will be handed out at the IBM
Information on Demand Conference.
One of the things that I love most about the IBM Information on Demand
Conference is the buzz on books. Every year we have a giant bookstore that
showcases all the latest books related to DB2, Analytics, Big Data, and more.
My job is to arrange to have the right set of books at the conference and to
invite authors to be available to sign copies of their books for attendees.
All books purchased at the bookstore are discounted 20% off the retail price.
Here are the featured books and authors that we have planned for this year.
There are many other new books available to purchase at the bookstore this
year, but I was only able to arrange signings for the authors who are attending
the conference. Stop by and check out all the great knowledge that is available!
The IBM Information on Demand Conference is a great place to promote books
that are published throughout the year. In fact, publishers often time
the release of their books to coincide with this conference. As a result, we
are typically able to set up book signings and giveaways to help promote the
books.
This year is no different. We have quite a few books that have been published
already that we’ll promote as well as books launching just for the conference
and we’ll have 4 (possibly 6) new Flashbooks that we’ll be giving away!
“Flashbooks” are small-sized books that have fewer than 150 pages, and have
easy-to-read messages about products, solutions, or technology. Flashbooks are
published to coincide with the IBM Information on Demand Conference. The huge
quantity of messages and content delivered at the conference makes it difficult
for attendees to retain all the key messages. Flashbooks are designed to contain
the key messages IBM experts want you to take away with you.
Here is the list of Flashbooks. More details for each will be available in
my blog shortly.
Warp Speed, Time Travel, Big Data, and more: DB2 10 for Linux, Unix
and Windows New Features
by Paul Zikopoulos, Walid Rjaibi, George Baklarz,
Matt Huras, Matthias Nicola, Dale McInnis, and Leon
Information about the book: http://ibm.co/db210flashbook
Signing and Giveaway: Monday October 22, 12:30 – 1:30 at bookstore.
5 Steps to Business Analytics Program Success
by Brian Green, BlueCross BlueShield of Tennessee, Kay Van De
Vanter, The Boeing Company, Tracy Harris, IBM, Bill Frank,
Johnson & Johnson, John Boyer, RCG Global
Signing and Giveaway: Monday October 22, 5:00 p.m.–7:00 p.m. EXPO reception
Business Value of DB2 Optimizer and Analytics Accelerator - DB2 for z/OS
by Surekha Parekh, Terry Purcell, John Campbell,
Signing and Giveaway: Monday October 22, 12:30 – 1:30 at bookstore.
Harness the Power of Big Data - The IBM Big Data Platform
by Paul Zikopoulos, Dirk deRoos, David Corrigan, Tom
Deutsch, Krishnan Parasuraman, James Giles
Signing and Giveaway:
- Monday October 22, 9:45 – 10:45 at bookstore
- Tuesday October 23, 9:30 – 10:30 at bookstore
An excerpt of this book is available on The Big Data Hub.
Note: After the conference, all of these books will be available in
electronic format. Follow my blog to find out how to get the softcopy!
Lots of people love IBM Redbooks. As you know, they are FREE and available
for you to download in PDF or EPUB format at any time from the IBM Redbooks website.
One of the most popular giveaways at IBM Information on Demand Conference is
printed IBM Redbooks. This year we have 10 different titles to give away.
Please note that the PRINTED copies are in limited quantity.
Attend one of the sessions planned at the bookstore to make sure
you get the title you want. While you're getting your free copy, meet the authors who wrote the book and have them sign it for you!
I've included links to the e-version of each book where one already exists. Notice
that some are not yet published? They’ll be launched specifically for IOD:
Tuesday, October 23, 12:00 – 1:00 at the IOD bookstore
Customizing and Extending IBM Content Navigator
Wednesday, October 24, 12:00 – 1:00 at the IOD bookstore
Thursday, October 25, 12:00 – 1:00 at the IOD bookstore
As you can see, we’ve made it easy for you. Come by the bookstore Tuesday,
Wednesday and Thursday at lunchtime to get a printed book!
While you’re at the bookstore, check out the technical books that are available for you to purchase at a
20% discount, and pick up one of the many free Flashbooks that are scheduled to
be handed out at the bookstore on Monday, October 22.
Harness the Power of Big Data - The IBM Big Data Platform
by Paul Zikopoulos, Dirk deRoos, David Corrigan, Tom
Deutsch, Krishnan Parasuraman, James Giles
Information about the book:
Big Data represents a new era of computing – an inflection point of
opportunity where data in any format may be explored and utilized for
breakthrough insights - whether that data is in-place, in-motion, or at-rest.
IBM is uniquely positioned to help clients navigate this transformation. This
book reveals how IBM is leveraging open source Big Data technologies, infused
with deep IBM innovation from over 6 billion dollars in analytics acquisitions,
that manifest in a platform capable of 'changing the game'.
The four defining characteristics of Big Data – volume, variety,
velocity, and veracity – are discussed. You’ll understand how IBM is fully
committed to Hadoop and integrating it into the enterprise. Hear about how
organizations are taking inventories of their existing Big Data assets, with
search capabilities that help organizations put their hands around what they
already know, and extending their reach into new data territories for
unprecedented model accuracy and discovery.
In this book you will learn not just about the technologies that make up
the IBM Big Data platform, but also when to leverage its purpose-built engines for
analytics on data in-motion and data at-rest. And you’ll gain an understanding
of how and when to govern big data, and how IBM’s industry-leading InfoSphere
integration and governance portfolio helps you understand, govern, and
effectively utilize big data. Industry use cases are also included in this book.
An excerpt of this book is available on The Big Data Hub.
Also see the Big Data Hub for blog entries, videos, and other
resources that will help you on your journey in the Big Data world.
If you are attending the IBM
Information on Demand Conference, get a free printed copy of this book at the Information Desk at the big data booth #622 in the Expo. Bring your book to the IOD bookstore on Tuesday October 23, 9:30 – 10:30 to have it signed by the authors.
After the conference is over, a free e-version of the book will be available
for you to download. Follow my blog to ensure that you are one of the first to
know where you can download the book.
Other sessions by these authors:
3818A Top Enterprise Big Data Use Cases - South Pacific B; Mon, Oct 22, 2012; 2:15 PM - 3:15 PM
3920B Ask the Experts: How Can A Big Data Platform Help Me? - Tradewinds A; Tue, Oct 23, 2012; 1:45 PM - 2:45 PM
3920C Ask the Experts: How Can A Big Data Platform Help Me? - Tradewinds A; Wed, Oct 24, 2012; 3:45 PM - 4:45 PM
2017B How to Get Started with Your First Big Data Project - South Pacific F; Mon, Oct 22, 2012; 3:45 PM - 5:00 PM
3618A Big Data Reference Architectures - South Pacific A; Tue, Oct 23, 2012; 10:00 AM - 11:00 AM
3802A Customer Insight and the Customer Experience: New Capabilities with Big Data and Analytics - Palm A; Tue, Oct 23, 2012; 11:15 AM - 12:15 PM
1660B Big Data Governance: An Emerging Imperative - South Pacific H; Tue, Oct 23, 2012; 4:30 PM - 5:45 PM
3822A How to Build an Exploratory Big Data Analytics Capability for the Enterprise - South Pacific B; Wed, Oct 24, 2012; 5:00 PM - 6:00 PM
2017C How to Get Started with Your First Big Data Project - South Pacific B; Thu, Oct 25, 2012; 8:15 AM - 9:30 AM
4000A Trusted Information - The Foundation for Efficient Operations and Smarter Analytics - Jasmine F; Sun, Oct 21, 2012; 2:15 PM - 3:15 PM
2309A InfoSphere Information Integration and Governance Track Keynote: Future Directions for InfoSphere - South Pacific J; Mon, Oct 22, 2012; 10:15 AM - 11:15 AM
3524A Real-Time Analytics on Extreme Data: Optimization of a Bayesian Algorithm Within InfoSphere Streams - South Pacific B; Tue, Oct 23, 2012; 1:45 PM - 2:45 PM
2092A Using Big Data Technology to Revolutionize Cyber Threat Detection - South Pacific F; Wed, Oct 24, 2012; 5:00 PM - 6:00 PM
Just in time for IBM’s Information on Demand Conference comes a new book by Sunil Soares. No
doubt the biggest topic in tech these days is Big Data. Sunil uses
this guide to focus on Big Data plus one of the most important tech
topics: Governance. This
guide focuses on the convergence of two major trends in information
management—big data and information governance—by taking a strategic
approach oriented around business cases and industry imperatives. With
the advent of new technologies, enterprises are expanding and handling
very large volumes of data; this book, nontechnical in nature and geared
toward business audiences, encourages the practice of establishing
appropriate governance over big data initiatives and addresses how to
manage and govern big data, highlighting the relevant processes,
procedures, and policies. It teaches readers to understand how big data
fits within an overall information governance program; quantify the
business value of big data; apply information governance concepts such
as stewardship, metadata, and organization structures to big data;
appreciate the wide-ranging business benefits for various industries and
job functions; sell the value of big data governance to businesses; and
establish step-by-step processes to implement big data governance. Sunil
is a leading expert in the field and is the founder of Information
Asset LLC, a consulting firm focused on helping clients build
information governance programs, and a former director of information
governance at IBM. He is the author of two other highly rated books: The IBM Governance Unified Process and Selling Information Governance to the Business.
Join us at this event to experience IBM’s enterprise-class big data platform
that allows you to address the full spectrum of big data business
challenges. The morning will include interactive discussions and live
demonstrations of big data for social media and log analytics; then you'll get
hands-on with Hadoop scripting and text analytics with guidance from
development experts. This is a unique opportunity to develop skills and
learn about exciting technologies.
8:00 a.m. - 9:00 a.m. Registration & Complimentary Breakfast
9:00 a.m. - 12:00 p.m. Overview of Big Data and Demonstrations
12:00 p.m. - 12:45 p.m. Complimentary Lunch
12:45 p.m. - 6:00 p.m. Hands-on Lab
There are many dates and locations planned for this event. Two of the events fall into the month of October:
October 10, 2012 - Boston, MA
October 22, 2012 - New York, NY
November and December events are as follows:
November 8, 2012 - Washington, D.C.
November 15, 2012 - Austin, TX
November 15, 2012 - Toronto, Canada
December 6, 2012 - Pittsburgh, PA
Space is very limited so register today!
Flashbook: Big Data Analytics by Dr. Arvind Sathi
About the Book:
Data Analytics is a popular topic. While everyone has heard stories
of new Silicon Valley valuation bubbles and a critical shortage of data
scientists, there are an equal number of concerns: Will it take away my
current organization or investment? How do I integrate my Data Warehouse
and Business Intelligence with Big Data? How do I get started, so I can
show some results? What are the skills required? What happens to data
governance? Unlike many other Big Data Analytics blogs and books, this
book presents a practitioner’s viewpoint. It identifies the demand for
Big Data Analytics, its engineering components and what happens on the
production floor. In doing so, it respects the large investments in
Data Warehouse and Business Intelligence and shows both evolutionary and
revolutionary ways of moving forward to the brave new world of Big
Data. The book discusses three perspectives on Big Data Analytics. First, why
Big Data Analytics is becoming so important and what we can do with it.
It presents major trends behind the rise of Big Data and shows typical
use cases tackled by Big Data Analytics – where leading organizations
are already seeing major benefits in using Big Data Analytics. Second,
it lists major components of Big Data Analytics - Unstructured Data
Analytics, Massively Parallel Processing, Adaptive Real-time and
Predictive Modeling, Data Privacy Management, Data Visualization, and
Ontologies. It shows how these components work together to provide an
integrated engine that can combine Big Data with traditional Data
Warehouse and Business Intelligence to provide an overall solution.
Third, it provides a glimpse at implementation concerns and how they
must be tackled. How do we combine various components, which are running
at different velocities and volumes? How do we get structured
information out of unstructured data and combine it with other structured
data? How do we provide governance across this data, when the
originating data may have varying quality or privacy constraints? How
do we embark on an implementation road map in a systematic way to show
results as we go and build skill level and momentum for Big Data
Analytics in our organization?
If you are attending the IBM Information on Demand Conference, get a free printed copy of this book and meet the author at the Author Signing and Giveaway that is scheduled:
- Monday October 22, 4:00 p.m.–5:00 p.m. IOD Bookstore, Bayside Foyer; Session # 4243A in Smart Site.
After the conference is over, a free e-version of the book will be available
for you to download. Follow my blog to ensure that you are one of the
first to know where you can download the book.
Also attend these sessions by the author of this book:
Session # 3645A - Smarter Operations at Verizon - A Case Study - Tue, Oct 23, 2012; 3:00 PM - 4:00 PM - South Seas J
Session # 1813A - Extreme Targeted Marketing: Micro Segmentation for Utilities and Cross-Industry Applications; Tue, Oct 23, 2012; 4:30 PM - 5:45 PM - South Pacific A
As the ads say, “I LOVE NY”. I’ve visited often and have many friends in NY.
If you’re looking for an excuse to visit the big apple, consider some
of these events that are taking place during Data Week.
When: October 22 - October 26. I’ll be in Las Vegas for the IBM Information on Demand Conference, but some of my colleagues will be in NYC at this event.
Where: Various awesome Manhattan locations.
Price: Most NYC Data Week events are free and open to anyone.
What is Data Week: According to their website, NYC Data Week is co-produced by the City of New York's Department of Information Technology & Telecommunications (DoITT) and O'Reilly Media's Strata + Hadoop World Conference.
It celebrates and explores the people, industries, and organizations using data to fuel innovation in New York City. The Data Innovation in Finance Panel on October 24 and the Data Innovation Across the City Panel
on October 25 showcase New York City business and government leaders
using data to implement change, and talking frankly about what it takes
to succeed with data initiatives.
Data Week events include:
- A Startup Showcase with Fred Wilson and Tim O'Reilly.
- Ignite NYC @Strata, a hackathon, numerous meetups, and more.
- IBM Big Data Developer Day • Oct 22 • 8:00am–6:00pm • IBM Client Center, 590 Madison Avenue, New York, NY
Experience IBM’s enterprise-class big data platform at IBM's Big Data Developer
Day, hosted by the IBM Big Data Development team. The morning will
include interactive discussions and live demonstrations of big data for
social media and log analytics; then you'll get hands-on with Hadoop scripting
and text analytics with guidance from development experts. Seating is
limited and you must register to be guaranteed a seat. Register today!
- If you can't make this one, see the list of other Big Data Developer Days.
- DataKind DataSprint • Oct 23 • 9:00am–5:00pm • Sheraton New York, Empire Ballroom, 811 7th Avenue 53rd Street, New York
A hackathon focused on a critical New York City data project. DataKind is
incredibly excited to announce that we will be setting up shop all day
at the Strata NY Conference on October 23rd with a bunch of great data
problems for you to stop by and work on! We will be serving non-profits
and charities, using data to solve some of their toughest problems,
so bring your data skills and get ready to make the world a better
place. If you're a socially conscious data hacker who wants to make the
world a better place, RSVP now! Entrance to our DataSprint is completely free!
- The Future of Security • Oct 24 • 9:00am–3:30pm • Theresa Lang Community and Student Center; The New School; 55 West 13th Street, 2nd Floor
The Future of Security: Ethical Hacking, Big Data and the Crowd conference
will convene a daylong series of discussions to highlight the emerging,
disruptive forces changing the landscape of the global community. Key
panels include the following topic areas: Ethical Hacking / Hacktivism;
Big Data and Networks; and The Crowd and Crowdsourced Science. Organized
by The Parsons Institute for Information Mapping (PIIM), The Center
for Transformative Media (CTM) of Parsons The New School for Design,
and The Richard Lounsbery Foundation.
Be sure to see the agenda as there are many choices that may appeal to you. Wish I was going to be there!