I am a database guy (a DB2 guy, to be more precise). I do my work at the basement level of the IBM Information Management software stack. If the people who work with analytics, master data management, data integration, and other high-level software technologies are up on the bridge of the Ship of Data, peering through binoculars and wearing white jackets with gold epaulets, I'm down in the engine room. When the calls come in from the bridge, asking for more and more data to be beamed up with ever-greater speed, I call back with, "I canna' change the laws o' physics, Captain!" … and then I find some way to get the job done. You folks up top keep talking about strategies and paradigms. I'll keep talking bits and bytes, speeds and feeds. Welcome to my world. Down here, we like to keep it real.
The latest in a long line of squishy-sounding concepts to come my way is "data governance." Excuse me, chief, but that sounds like a solution looking for a problem. I'll let you in on a little secret: data is inert, which means it doesn't need to be "governed." It needs to be administered, and my friends and I have that situation well in hand, thank you. Go find something else to "govern."
What's that? You are, in fact, planning on governing something else when you talk about governing data? OK, I'm curious now. Tell me a little more. I'm listening.
The focus is on people
Who am I listening to? None other than Steve Adler, director of Information Governance Solutions at IBM and chair of the IBM Information Governance Council. He's on the other end of the phone, telling me that the oxymoronic nature of the term "data governance" actually serves a useful purpose: it prompts people to ask questions. "What in the world do you mean by 'data governance'?" Boom. Conversation started, and a chance for Adler to say that the real aim is governing behavior.
People interact with databases: they direct data flows; they interpret and service data requests. People also make mistakes—often the result of shortcomings in the processes they are directed to follow and the application systems upon which they rely—that introduce errors in the information pipeline, creating downstream data quality problems. Data governance, says Adler, is largely about systematically identifying and addressing error entry points. The goal is to build data that decision makers can trust—and also provide them with evidence that the data can be trusted.
Now that's a pretty good pitch. But before joining your cause, I'd like to see Big Blue eat some of its own dog food. Adler asks if I know about IBM's product catalog. Sure I do—I only worked for the company for 17 years. Lots of information there—to the tune of 120 million records. Apparently, a high percentage of those records contained incorrect information: data that was wrong, missing, or published before it should have been. Out of 255 IBM product announcements chosen at random, only five were completely error-free. That's a motivating statistic.
To find out where the data quality problems originated, a team set "traps" at various points in the data flows that led to the product database. The traps helped the team discover several errors that could be expected to occur given a certain set of circumstances. With the causes identified, the team could then design and implement process- and technology-based solutions to eliminate the sources of information inaccuracy.
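For the engine-room crowd, here is one way the "trap" idea could look in code. This is a minimal sketch, not the IBM team's actual tooling: the validation rule, the stage names, and the sample records are all hypothetical. The idea is to check records when they enter the flow and again after every stage, so a new failure can be pinned on the stage that introduced it.

```python
from collections import Counter

def is_valid(record):
    """Hypothetical quality rule: non-blank name and a non-negative numeric price."""
    name = record.get("name", "")
    price = record.get("price")
    return bool(name.strip()) and isinstance(price, (int, float)) and price >= 0

def run_with_traps(records, stages):
    """Push records through each stage; count bad records at entry ("source")
    and records that turn invalid inside each stage."""
    failures = Counter()
    failures["source"] = sum(1 for rec in records if not is_valid(rec))
    for stage_name, stage_fn in stages:
        next_records = []
        for rec in records:
            was_valid = is_valid(rec)
            out = stage_fn(dict(rec))  # copy so the stage cannot alter its input
            if was_valid and not is_valid(out):
                failures[stage_name] += 1  # the trap fires for this stage
            next_records.append(out)
        records = next_records
    return records, failures

# Hypothetical stages in a product-data flow.
def normalize_name(rec):
    rec["name"] = rec["name"].strip()
    return rec

def apply_discount(rec):
    # Deliberately flawed: a flat discount can push cheap items below zero.
    rec["price"] = rec["price"] - 10
    return rec

stages = [("normalize_name", normalize_name), ("apply_discount", apply_discount)]
records = [
    {"name": " Widget ", "price": 25},
    {"name": "Gadget", "price": 4},
    {"name": "   ", "price": 5},  # already bad on arrival
]
_, failures = run_with_traps(records, stages)
print(dict(failures))  # {'source': 1, 'apply_discount': 1}
```

The counts separate errors that arrived with the data from errors a stage created, which is the distinction you need before deciding whether to fix a process or a program.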
Not my problem (or is it?)
I'm all for eliminating circumstances that lead to data quality problems, so you governance types go to it. I wish you success. No need to snoop around the databases that the DBAs and I are looking after—they're solid. How solid? At least 99.9 percent accurate. How many data records am I talking about? Well, a production database would probably hold north of a billion rows. Yeah, OK, even a very small error rate in a database of that size gives you a pretty big number. Point taken. Putting some of those traps in and around the database might be a good idea, after all.
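The arithmetic behind that "pretty big number" is worth spelling out. A quick sketch, using the one-billion-row figure from above; the function name and the two higher accuracy levels are only illustrative:

```python
from fractions import Fraction

def expected_bad_rows(total_rows, accuracy):
    """Rows expected to hold an error; accuracy is given as a string
    such as "0.999" so the arithmetic stays exact."""
    return int(total_rows * (1 - Fraction(accuracy)))

rows = 1_000_000_000
for accuracy in ("0.999", "0.9999", "0.99999"):
    print(f"{accuracy}: {expected_bad_rows(rows, accuracy):,} bad rows")
# 0.999: 1,000,000 bad rows
# 0.9999: 100,000 bad rows
# 0.99999: 10,000 bad rows
```

At 99.9 percent accuracy, a billion rows still leaves about a million of them wrong.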
But don't stop there, Adler says, because data quality problems are not just a matter of factual inaccuracies in data records. Sometimes, the pressing issue has to do with data categorization. Another story: An organization, partly due to acquisitions, ended up serving some of its larger corporate clients through multiple lines of business. One such client company complained that the same question, communicated to the service-providing organization through representatives of different lines of business, received different answers. It turned out that people employed in those different lines of business ascribed different meanings to the same terms. That's a data definition problem—the kind of problem that data governance aims to eliminate through effective master data management (MDM).
How do you get from A to B?
I'll admit that I'm starting to see some value in this data governance stuff. It's not quite as squishy as I'd first thought. Still, seeing potential value and getting real value are two different things. How do you get some traction with a data governance effort? Where do you start? How do you keep things moving forward?
Adler told me that his favored approach involves six steps, to wit:
- Identify your goals. Some goals will be of the sustainable variety (these will be ongoing), and others will be situational in nature (for example, dealing with a data quality problem). Right—you need to know where you're going before you start on your way.
- Identify what you're going to measure. If you want to boost data quality, how will you know if you've made progress? Perhaps you'll examine some percentage of documents in a repository and record instances of factually incorrect or missing information. Deciding on a process and criteria for measurement is important for assessing the baseline situation (where's the pain?) as well as for tracking progress. I get this. Talk is cheap: if you want to convince me that you're doing well, show me numbers.
- Understand your organization's decision-making model. Is it an autocratic model? Representative? Democratic? Whatever it is, is that the right model for your company? Does the data governance policy you're developing support the decision-making model? What would it mean if decision making were "better"? Would a larger volume of decisions be made? Or perhaps decisions would be made more quickly? I suppose that as long as you're working on improving data quality, you might as well take the time to assess how data drives decisions in your environment. Maybe that'll be seen as something that needs improving, too.
- Communicate the data governance policy effectively. How will information about the policy be communicated to stakeholders and other interested parties? Via e-mail? Through newsletters? Probably wouldn't do to just rely on the ol' office grapevine to get the word out.
- Measure results. What has your data governance policy actually achieved? If you came up with a plan for measuring progress (the second step in this list), cranking out the actual numbers shouldn't be too difficult. Interpreting the numbers could be interesting.
- Audit the whole thing. Are processes being followed? Is technology being appropriately employed? Are controls effective? The word "audit" kind of sets my teeth on edge, but a policy—whether focused on data governance or something else—isn't worth spit if it's ignored. I don't like being checked up on any more than the next guy, but I understand the need to do this sometimes.
These are pretty high-level checkpoints, and a detailed project plan will have far more than six steps. But when faced with a complex task, framing the challenge in the right way can really help to focus people's efforts. To me, Adler's approach looks to be a good framing of the problem space. It's something you could build on.
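Since I asked for numbers, here is a rough sketch of the measurement step: pull a random sample of documents, record how many errors a reviewer finds in each, and report the share that came back clean. The function names and document IDs are hypothetical. The baseline reuses the product-announcement figure quoted earlier (5 clean out of 255).

```python
import random

def sample_documents(doc_ids, sample_size, seed=0):
    """Pick documents to audit; a fixed seed keeps the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, sample_size)

def error_free_rate(audit_results):
    """audit_results maps a document ID to the number of errors found in it."""
    clean = sum(1 for errors in audit_results.values() if errors == 0)
    return clean / len(audit_results)

# Choosing what to audit from a hypothetical repository of 1,000 documents.
to_audit = sample_documents(list(range(1000)), 25)

# Baseline shaped like the earlier example: 255 audited, only 5 error-free.
baseline = {f"doc{i}": (0 if i < 5 else 1) for i in range(255)}
print(f"{error_free_rate(baseline):.1%}")  # 2.0%
```

Run the same audit with the same criteria after the fixes go in, and the change in that percentage is your progress report.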
OK, I'm in
A resident of one town once derisively said of another city, "There's no 'there' there." That's the way I used to think about data governance: interesting concept, but come on—where's the substance? Steve Adler makes a good case in arguing the business value of data governance. I do think now that there is some "there" there.
So if one of those high-up information management types visits you in the database engine room to talk data governance, give him or her a listen. Better yet, see what you can do to get involved in some way. When a grand plan is leavened with insights from people who have a rubber-meets-the-road perspective, the result is usually positive.
Gotta go. Maybe I'll see you around a coffee machine sometime. Who knows? We might even have a non-snarky talk about data governance.