Holly Hayes on model-driven data governance

Scott Laningham (scottla@us.ibm.com), Podcast Editor, IBM developerWorks
Scott Laningham
Scott Laningham, host of developerWorks podcasts, was previously editor of developerWorks newsletters. Prior to IBM, he was an award-winning reporter and director for news programming featured on Public Radio International, a freelance writer for the American Communications Foundation and CBS Radio, and a songwriter/musician.

Summary:  Holly Hayes, data studio program manager for IBM®, joins Scott Laningham to talk about model-driven data governance.

Date:  09 Dec 2008
Level:  Introductory


developerWorks: This is a developerWorks podcast. I'm Scott Laningham here with Holly Hayes, data studio program manager for IBM. She's here to talk about model-driven data governance. How are things, Holly?

Hayes: Pretty nice.

developerWorks: Wonderful. You're in northern California; is that right?

Hayes: Bay Area.

developerWorks: Oh, lovely area. Everybody wants to visit there.

Hayes: Yes.

developerWorks: Or live there, but it's usually just visit, for the rest of us. [LAUGHTER]

Now, we've talked to others on this podcast before about governance, but it applies to many aspects of software. So what are we talking about here as governance applies to data?

Hayes: Well, I think people think about different things when they talk about data governance. They may think just about managing the security of data, they may think about the quality of data. And you see different definitions for governance. But ultimately, I think data governance is really about managing data as an asset such that you can minimize risks that are associated with it and you can maximize its value to the business.

We have an IBM Data Governance Council. They have of course a definition of data governance. They say that it's the process by which business goals and IT capabilities are transformed into requirements, policy, and action to manage data risk, enhance revenue, and reduce costs. Kind of a mouthful, I guess. [LAUGHTER]

developerWorks: That works. That explains it. With that said, what about big concerns that companies have today? And maybe these are concerns that they've had for years now, like brand value and how that can get diluted in this arena, or loss of revenue, compliance, things like that. Do you want to kind of take some of those and talk about them from the context of data governance?

Hayes: Well, absolutely. And you've hit upon pretty much the high-level concerns that companies have and really what's a driving movement. I mean, you mentioned brand value. One of the things is that if you have a data breach, well, No. 1, most states, for example, in the United States now have regulations in place that say if there is a data breach, you do have to inform people of that data breach. So they're highly visible, is sort of one issue.

developerWorks: Right.

Hayes: Right? And you know, it's pretty interesting how widespread data breaches are. I was looking at a Web site actually that talks about the number of data breaches. It's privacyrights.org, if anybody's interested in looking it up. They've been tracking this in the U.S. since January 2005. And they've estimated that over 245 million customer and employee records have been leaked.

developerWorks: Holy cow!

Hayes: I mean, you know, that's an astounding number, for one thing. And then, you think about, OK, well, what's the problem? Well, the problem, No. 1, is that if you're leaking somebody's data, then we're going to lose trust. You know, and so you of course hear about brand value, reduction .... I was reading another report, and they were saying that on average the customer churn rate goes up by about 2 1/2 percent after a breach. So this is a huge concern, you know, for companies. You mentioned revenue loss. There was a Ponemon Institute study that was done, oh, it was about a year ago actually, and they were saying that it costs about $197 per customer record leaked, and that the average cost was $6.3 million per breach.

developerWorks: Oh my gosh.

Hayes: So, you know, this is not chump change. We're talking about very serious issues for companies from a brand-value perspective, from a customer-retention perspective, and from a revenue-loss perspective.
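As a rough back-of-the-envelope check on the figures quoted in this exchange, the per-record and per-breach costs together imply an average breach of roughly 32,000 records. The sketch below only performs that division; the function and constant names are illustrative, and dividing one average by another gives an approximation, not a number taken from the study itself.

```python
# Figures as quoted in the interview (Ponemon Institute study cited by Hayes).
COST_PER_RECORD_USD = 197
AVERAGE_COST_PER_BREACH_USD = 6_300_000


def implied_records_per_breach(cost_per_breach, cost_per_record):
    """Approximate breach size implied by the two quoted averages."""
    return cost_per_breach / cost_per_record


def estimated_breach_cost(records_leaked, cost_per_record=COST_PER_RECORD_USD):
    """Rough cost estimate for a breach of a given size (illustrative only)."""
    return records_leaked * cost_per_record


if __name__ == "__main__":
    size = implied_records_per_breach(AVERAGE_COST_PER_BREACH_USD, COST_PER_RECORD_USD)
    print(f"Implied records per breach: about {round(size):,}")
    print(f"Estimated cost of a 10,000-record breach: ${estimated_breach_cost(10_000):,}")
```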

developerWorks: Absolutely.

Hayes: And of course, this is why, you know, these very highly visible and concerning breaches have, of course, generated a lot of focus on compliance, with both industry and governmental regulations. So, therein lies, of course, another driver, you know, as if the others were not enough, of why customers are needing to look at data governance practices much more closely. Whether it's governmentally driven or industry-driven, they need to be able to ensure that they can demonstrate compliance. No. 1, perhaps to avoid fines (that would sort of be the least onerous of them), but even to be able to continue to participate in the industry.

For example, there's the Payment Card Industry Data Security Standard (PCI DSS), and it was developed actually by the credit card industry, by the major credit card companies. And it talks about how you have to handle data that relates to, for example, credit card numbers. And if you are noncompliant, then basically you don't get to participate, right? So as a retail company, I wouldn't be able to continue, you know, offering to accept Visa or MasterCard, for example, at my store, if I'm noncompliant with these regulations.

developerWorks: In terms of perspective around these numbers you're talking about, you know, it can sound like we're just completely failing in this space. And, how serious is it? I mean, is this just the result of a lot of people just not taking the very obvious steps in front of them to have better data governance and security?

Hayes: Well, no. I mean, I think companies certainly are in some cases. You know, what we see overall, I would say in general is a somewhat spotty approach to data governance. It's not something that most companies I've spoken with are really looking at in a holistic fashion. There's a lot of focus on data breach and so data security and, therefore, encryption technologies, for example. So you see some of that, people utilizing encryption technology so that if I have an asset that's lost .... There was a fairly highly publicized case not that long ago about an Accenture tape that was left in the back of somebody's car and then got stolen. So, if you encrypt that data then at least, you know, if you have those kinds of thefts, then you don't have the exposure of those records actually being leaked. So you see some of that.

We see some customers focusing on retention because, of course, there are regulatory requirements about how I'm going to have to retain data. We see some people looking at data quality. I mean, certainly if you're thinking about Sarbanes-Oxley or Basel II compliance, you need to ensure that the data that you're using, for example, to assess credit risk, is reliable. So, data quality is certainly an issue.

But back to, I guess, my original point: what we see is that customers are mostly taking a somewhat spotty approach. There are folks that are looking at it from a security perspective. There are folks that are looking at it from a quality perspective. There are folks that are looking at it from a life-cycle management perspective or from an audit perspective, let's say. But we're not seeing as much evidence of customers or organizations treating data governance as a really holistic practice. And I think that's really what we would like to see.

developerWorks: So is that what model-driven governance is all about?

Hayes: Well, model-driven governance is not a total answer to this data governance issue, but it is something that we think is going to facilitate data governance overall. When we talk about model-driven governance, what we're hoping to do is actually create a framework in which organizations can define governance practices and policies sort of early in the life cycle of the data when they're defining the data and then, we can automate more of the compliance to those policies throughout the life cycle of the data. So we call it model-driven governance because we're starting to utilize the data model itself as the vehicle through which those policies can be communicated.

developerWorks: On some level, it sounds like it's more proactive than reactive, to be real simple, then, right?

Hayes: Absolutely. The idea is to sort of define these practices up front and then automate them throughout the life cycle. And just to give you sort of a little example. I mean, today, in data modeling, you're defining your data elements and the attributes, and you can define a number of different kinds of standards for your organization, like naming standards: "What words do I use? Or what acronyms? What naming patterns are valid?" You can define the meaning of information through business glossaries. You can define the values which are valid for data items, things like this.
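The kinds of standards described here (naming patterns, approved acronyms, glossary meanings, valid values) can be pictured as checks run against each attribute in a data model. The following is a minimal sketch under stated assumptions: the naming pattern, the abbreviation list, and the `Attribute` class are all hypothetical and do not represent any particular modeling tool.

```python
import re
from dataclasses import dataclass, field

# Hypothetical organization-wide standards, invented for illustration.
NAME_PATTERN = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")  # e.g. CUST_STATUS
APPROVED_ABBREVIATIONS = {"CUST", "ACCT", "NUM", "ADDR", "STATUS"}


@dataclass
class Attribute:
    name: str
    glossary_term: str                                # business meaning of the item
    valid_values: set = field(default_factory=set)    # empty set means unrestricted


def naming_violations(attr):
    """Return a list of naming-standard problems for one attribute."""
    problems = []
    if not NAME_PATTERN.match(attr.name):
        problems.append(f"{attr.name}: does not match the naming pattern")
    for word in attr.name.split("_"):
        if word and word.upper() not in APPROVED_ABBREVIATIONS:
            problems.append(f"{attr.name}: '{word}' is not an approved word or acronym")
    return problems


def is_valid_value(attr, value):
    """Check a data value against the attribute's list of valid values."""
    return not attr.valid_values or value in attr.valid_values


if __name__ == "__main__":
    status = Attribute("CUST_STATUS", "Customer account status", {"ACTIVE", "CLOSED"})
    print(naming_violations(status))
    print(is_valid_value(status, "ACTIVE"), is_valid_value(status, "UNKNOWN"))
```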

But what we want to do is we want to expand the kind of attributes that you can associate with a data model. So, defining privileges: Who can access the data? Defining privacy attributes — if I'm talking about a credit card number, for example, I know that this is very sensitive data and that it is subject to payment card industry-compliance requirements.

So I want to be able to expand what I contain in the data model, and then once I can do that, I can define these things in the data model up front. And then let's say then during the development cycle, I can see that "Ah, this particular field is a credit card field. It's subject to PCI compliance; therefore, I cannot copy production data into a test environment unless I've masked that data." Or, as I roll that database into production, I have to ensure that encryption is in place to prevent data breach. So if somebody takes, you know, the copy of the data that I have sitting in the backseat, they can't do anything with it.

developerWorks: Right.

Hayes: Or, I ensure that auditing is turned on so that I have appropriate logging set up to ensure that I can track and monitor all of the changes that are associated with that data schema, and who it is that actually made those changes. Those kinds of things.
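The idea in this exchange, that privacy attributes recorded in the data model can drive masking, encryption, and auditing requirements later in the life cycle, can be sketched as a small policy check. Everything below is an assumption made for illustration: the `Sensitivity` levels, the `Column` and `Environment` classes, and the three rules are hypothetical and are not taken from any IBM product.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    PII = "pii"   # personally identifiable information
    PCI = "pci"   # subject to payment card industry requirements


@dataclass(frozen=True)
class Column:
    name: str
    sensitivity: Sensitivity = Sensitivity.PUBLIC


@dataclass(frozen=True)
class Environment:
    name: str                # "test" or "production" in this sketch
    masked: bool = False     # sensitive values masked before loading
    encrypted: bool = False  # encryption in place for stored data
    auditing: bool = False   # change logging turned on


def policy_violations(columns, env):
    """List the controls the model requires that the environment lacks."""
    sensitive = [c for c in columns if c.sensitivity is not Sensitivity.PUBLIC]
    if not sensitive:
        return []
    names = ", ".join(c.name for c in sensitive)
    violations = []
    if env.name == "test" and not env.masked:
        violations.append(f"mask before copying to test: {names}")
    if env.name == "production":
        if not env.encrypted:
            violations.append(f"encryption required for: {names}")
        if not env.auditing:
            violations.append(f"auditing required for: {names}")
    return violations


if __name__ == "__main__":
    model = [Column("ORDER_TOTAL"), Column("CARD_NUM", Sensitivity.PCI)]
    print(policy_violations(model, Environment("test")))
    print(policy_violations(model, Environment("production", encrypted=True)))
```

Keeping the rules next to the model, rather than in each deployment script, is the point of the sketch: the same column definition answers the test-copy question and the production-rollout question.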

developerWorks: So, what is the reality around all of this right now vs. what is more futuristic thinking?

Hayes: Well, much of it actually is futures, although a lot of the capabilities exist in our modeling tools today. So when I talk about naming standardization and the ability of understanding the meanings or business glossaries and standardization of values, those are in data modeling tools today. More and more we see data-modeling tools implementing privilege modeling so I can define in the data model the users or groups or the roles that should have certain authorizations as it relates to the data, and we can automate the delivery of that as we roll that into production.
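Privilege modeling of the kind mentioned here can be pictured as generating authorization statements from entries in the model. This is a minimal sketch, assuming a hypothetical `Privilege` record and a generic `GRANT ... ON TABLE ... TO ROLE ...` form; exact GRANT syntax varies by database, and the sketch does not validate or quote the table and role names.

```python
from dataclasses import dataclass

# Actions this sketch knows how to emit; a real model would likely allow more.
ALLOWED_ACTIONS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


@dataclass(frozen=True)
class Privilege:
    role: str        # hypothetical role name defined in the data model
    table: str       # schema-qualified table name
    actions: tuple   # e.g. ("SELECT", "UPDATE")


def grant_statements(privileges):
    """Turn modeled privileges into GRANT statements for deployment."""
    statements = []
    for p in privileges:
        unknown = set(p.actions) - ALLOWED_ACTIONS
        if unknown:
            raise ValueError(f"unsupported actions for {p.role}: {sorted(unknown)}")
        actions = ", ".join(p.actions)
        statements.append(f"GRANT {actions} ON TABLE {p.table} TO ROLE {p.role};")
    return statements


if __name__ == "__main__":
    model = [
        Privilege("TELLER", "BANK.ACCOUNT", ("SELECT", "UPDATE")),
        Privilege("AUDITOR", "BANK.ACCOUNT", ("SELECT",)),
    ]
    for statement in grant_statements(model):
        print(statement)
```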

One of the things that's pretty new, though, is this notion of privacy modeling. And in particular, this is something actually that we just introduced into our Rational® Data Architect product in September of this year. And in this context, you can define a privacy model, which indicates that a particular item has a certain sensitivity. Maybe it's related to personally identifiable information, or it's related to the PCI standard. And, if I take that data and I want to copy that production data into a test environment, then I have to have a special way of treating the data. I may need to mask that data such that it would be usable in a test context.

So we've provided the capability of modeling privacy in the product, and then we've also automated the capability of pushing that information out into our [Optim] solution that handles test data management and data privacy. So you can automate the generation of test data from a production environment, but in a way that is compliant from an overall privacy perspective, so I'm not exposing information that I shouldn't be in a test context.
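To make the masking step concrete, here is a minimal sketch of generating a test row from a production row, driven by a privacy model. It is not how Rational Data Architect or Optim work internally; the `PRIVACY_MODEL` mapping, the column names, and both masking functions are hypothetical, and the masking rules (keep the last four card digits, replace other sensitive values with a short hash) are simplifications that would not by themselves satisfy a real compliance review.

```python
import hashlib

# Hypothetical privacy model: column name -> sensitivity label.
PRIVACY_MODEL = {"CARD_NUM": "pci", "EMAIL": "pii", "ORDER_TOTAL": "none"}


def mask_card_number(value):
    """Hide all but the last four digits of a card number."""
    digits = [ch for ch in value if ch.isdigit()]
    hidden = max(len(digits) - 4, 0)
    return "*" * hidden + "".join(digits[-4:])


def mask_pii(value):
    """Replace a value with a short, repeatable token (simplistic)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


MASKERS = {"pci": mask_card_number, "pii": mask_pii}


def to_test_row(production_row, privacy_model=PRIVACY_MODEL):
    """Copy one row for test use, masking columns the model marks sensitive."""
    test_row = {}
    for column, value in production_row.items():
        # A column missing from the model raises KeyError, so unclassified
        # data is never copied by accident.
        sensitivity = privacy_model[column]
        masker = MASKERS.get(sensitivity)
        test_row[column] = masker(value) if masker else value
    return test_row


if __name__ == "__main__":
    row = {"CARD_NUM": "4111 1111 1111 1111", "EMAIL": "pat@example.com", "ORDER_TOTAL": 42.5}
    print(to_test_row(row))
```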

developerWorks: What other suggestions or tips do you want to provide for tackling these issues? I mean, things that people can think about doing immediately to make changes in their space.

Hayes: Well, with respect to this model-driven governance that I just spoke about, there's a great article that just came out in IBM Database Magazine in the October issue, and it's written by Paul Sakopolos. The title is "Integrated Data Management, Data Parenting in the Digital Age," and I'd certainly recommend that. For people who are looking at the broader data-governance topic, I would really encourage them to visit the IBM Data Governance site. The easiest way to get there I find is you just Google "IBM Data Governance." There's lots of resources on that page, many of which are the result of work within the IBM Data Governance Council. And the governance council is an international leadership group of about 50 C-level executives who have been working together for some time to design comprehensive data-governance solutions.

They've developed a data-governance maturity model, and there's an assessment workshop you can take, which can help you understand sort of where you are on your data governance journey. And Steven Adler, who's the leader of the Data Governance Council and a recognized authority on data governance, his blog is also listed there. So, those would be things that I would recommend people look at.

developerWorks: Holly Hayes, data studio program manager for IBM, was our guest. Thank you so much, Holly.

Hayes: Thank you very much. It's been fun.

developerWorks: Find out more on this topic in the Information Management zone at ibm.com/developerworks, IBM's premier technical resource for software developers with tools, code and education on IBM products, and open-standards technology. I'm Scott Laningham. Talk to you next time.


