August 9, 2013 | Written by: Paul DiMarzio
I don’t know—that would be a rather silly joke! But since I have your attention, how about we explore the question I really had in mind: how many words beginning with V does it take to describe the characteristics of data?
Recently I wrote a blog post that discussed big data and its role in enterprise analytics. I used IBM’s definition of big data as a class of data that exhibits particular characteristics in the areas of volume, velocity, variety and veracity. I went on to propose that we should also be examining a fifth dimension, value, to help us completely understand how to most effectively use that data to drive business insights.
Well, it turns out I had the right idea but was only scratching at the surface of the issue!
Shortly after I submitted my blog post for publication, I tuned in to a joint IBM-Gartner webcast on the topic of big data and the enterprise. I found it to be very well done and informative; you can watch the recording here.
In this webcast, Gartner analyst Donald Feinberg shared his view of what big data means for organizations, the impact it can have on businesses and the role of the mainframe in realizing this potential. As I did in my post, he also made it clear that you have to look beyond the characteristics that make data “big” to determine how best to mine that data for insights. Whereas I expanded the characteristics of data to include a fifth dimension, Feinberg noted that fully describing data requires no fewer than twelve dimensions spread across four categories! Here is my sketch of what he presented:
[Figure: Feinberg’s twelve dimensions of data, grouped into four categories]
(Note that Feinberg does not treat veracity as a dimension of big data, but rather lists validity—a similar term—as a dimension of governance. A minor quibble!)
In Feinberg’s view there are four main categories to consider. Bigness is just one, and not necessarily the most important.
The value dimension that I discussed in my blog post is one aspect of what Feinberg categorizes as benefit, which also includes visibility and viscosity. Feinberg’s category of relevance (volatility, versatility, vocation) is, to my mind, very closely related to benefit. In my blog post I advised that the first task of any analytics project should be to clearly articulate the questions to be answered, and then to locate the data that is most valuable in answering those questions. Having listened to Feinberg’s talk, I would broaden that advice to include all the characteristics of relevance and benefit, not just value.
For organizations that use the mainframe today, I believe that the data that is most relevant and beneficial to answering their most burning questions will reside primarily on the mainframe. When this is the case, analytics need to be moved to the data—not the other way around!
Feinberg’s fourth category, governance, may in fact be the most critical of all. If data is not held securely (vulnerability), is not accurate (validity—or veracity) or does not give a complete view of the truth (vacancy), then it really doesn’t matter if the data is big, relevant or beneficial; you cannot rely on the insights gleaned from data that is not properly governed. Data governance is one of the hallmarks and strengths of the mainframe and should definitely be closely considered as part of any overall enterprise analytics strategy.
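For readers who like to see a taxonomy written down, the four categories discussed above can be sketched as a simple lookup table. This is only an illustrative sketch, not anything Feinberg or IBM published: the names `DATA_DIMENSIONS`, `category_of` and `total_dimensions` are hypothetical, and the three entries under bigness are an assumption inferred from this post (IBM's four Vs minus veracity), since the post never lists them explicitly.

```python
# Illustrative sketch of the taxonomy described in this post.
# All identifiers here are hypothetical names chosen for the example.
DATA_DIMENSIONS = {
    # Assumption: inferred as IBM's four Vs minus veracity; not listed explicitly in the post.
    "bigness": ["volume", "velocity", "variety"],
    # The remaining three categories are named in the post.
    "benefit": ["value", "visibility", "viscosity"],
    "relevance": ["volatility", "versatility", "vocation"],
    "governance": ["vulnerability", "validity", "vacancy"],
}


def category_of(dimension):
    """Return the category a V-word belongs to, or None if it is not in the table."""
    needle = dimension.lower()
    for category, dimensions in DATA_DIMENSIONS.items():
        if needle in dimensions:
            return category
    return None


def total_dimensions():
    """Count every dimension across all categories."""
    return sum(len(dimensions) for dimensions in DATA_DIMENSIONS.values())


if __name__ == "__main__":
    print(total_dimensions())
    print(category_of("validity"))
    print(category_of("veracity"))
```

Looking up veracity returns nothing in this sketch, which mirrors the quibble noted earlier: in Feinberg's scheme its closest relative, validity, sits under governance.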
Although I may not know how many words beginning with V it takes to change a light bulb, I now know that it takes (at least) twelve of them to adequately describe the characteristics of data!
Do you agree? Are there any more V-words that we need to use to describe data? Share your thoughts below or connect with me on Twitter.