Overcoming data design challenges

Strategies for overcoming problems with both data and data design

Scott W. Ambler, Practice Leader, Agile Development, Rational Methods Group, IBM Software Group
Scott W. Ambler is a Practice Leader for Agile Development within the IBM Methods group. He develops process materials, speaks at conferences, and works with IBM clients worldwide to help improve their software processes. Scott is the author of several books, listed on his Web site at www.ambysoft.com. Scott is also a recognized Rational Thought Leader.

Summary:  As we've seen in the previous two tips, working with legacy data means that your source data is rarely clean, and the data schemas you are forced to work with are often little better. Here are some techniques and tools that can minimize the pain inflicted by problematic legacy data and data designs.

Date:  01 Aug 2001
Level:  Introductory


Once you understand the potential legacy data and legacy database design problems that you may face (see Resources), the next step is to determine how you will address the problems. Table 1 compares and contrasts several strategies you have at your disposal. An important point to remember is that your project team isn't the only team facing these sorts of challenges. Almost every organization has such problems. As a result, a very large market exists for tools to help deal with legacy databases, a sampling of which I've listed in Table 2. Their basic features are:

  • Extraction of legacy data
  • Transformation of the legacy data to "cleanse" it
  • Loading of that data into a new data schema that is more robust

Products that support all of these features are referred to as ETL (extract, transform, load) tools.
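To make the three features concrete, here is a minimal sketch of an extract-transform-load pass written in Python against an in-memory SQLite database. It is an illustration only, not how any of the commercial tools work. The table names (legacy_customer, customer), the column names, and the cleansing rules are all assumptions invented for this example.

```python
import sqlite3


def extract(conn):
    """Extract: read the raw rows from the (hypothetical) legacy table."""
    return conn.execute(
        "SELECT cust_id, cust_name, phone FROM legacy_customer"
    ).fetchall()


def transform(rows):
    """Transform: cleanse each row and drop rows that cannot be repaired."""
    cleaned = []
    for cust_id, name, phone in rows:
        name = (name or "").strip().title()
        digits = "".join(ch for ch in (phone or "") if ch.isdigit())
        if not name:
            continue  # no usable name: skip the row rather than guess
        cleaned.append((cust_id, name, digits or None))
    return cleaned


def load(conn, rows):
    """Load: write the cleansed rows into a stricter (hypothetical) schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (id, name, phone) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def demo_etl():
    """Build a tiny legacy table with dirty data, then run all three steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE legacy_customer "
        "(cust_id INTEGER, cust_name TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO legacy_customer VALUES (?, ?, ?)",
        [
            (1, "  alice SMITH ", "(416) 555-0100"),
            (2, None, "555-0101"),
            (3, "bob jones", ""),
        ],
    )
    load(conn, transform(extract(conn)))
    return conn.execute(
        "SELECT id, name, phone FROM customer ORDER BY id"
    ).fetchall()
```

Calling demo_etl() returns the two rows that survive cleansing; the row with no name is dropped. A real migration would more likely log or quarantine rejected rows than silently discard them.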

Although instruction on how to implement these strategies is clearly beyond the scope of this tip, I can provide several words of advice:

  • Do not underestimate the effort required. Data migration/improvement efforts, if they can be done at all, very often prove to be projects (or series of projects) unto themselves that often span several years. This is easily on the order of magnitude of your organization's Year 2000 (Y2K) efforts.
  • Think small. A series of small changes, or refactorings, is often preferable to a single big-bang approach where you need to rerelease all of your organization's applications at once. Martin Fowler's book Refactoring (see Resources) describes principles and practices that should provide some insight into how to make incremental changes to your legacy data design. (Many of his refactorings are geared to changing object-oriented designs, but the fundamentals still apply.)
  • Don't underestimate the effort required to address the problem.
  • Consider the problem from an integration point of view.
  • Did I mention not to underestimate the effort required?
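As an illustration of the "think small" advice, the following sketch splits a combined name column in two small steps while leaving the original column in place, so that legacy applications keep working until they are migrated. This is one possible reading of an incremental change, not a refactoring taken from Fowler's book. The person table, its columns, and the "Last, First" format are assumptions made up for the example.

```python
import sqlite3


def step1_add_columns(conn):
    """Small change 1: add the new columns; nothing existing is touched."""
    conn.execute("ALTER TABLE person ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE person ADD COLUMN last_name TEXT")


def step2_backfill(conn):
    """Small change 2: populate the new columns from the legacy column.

    Assumes full_name holds "Last, First". A value with no comma ends up
    entirely in last_name, with first_name left as an empty string.
    """
    rows = conn.execute("SELECT id, full_name FROM person").fetchall()
    for person_id, full_name in rows:
        last, _, first = full_name.partition(",")
        conn.execute(
            "UPDATE person SET first_name = ?, last_name = ? WHERE id = ?",
            (first.strip(), last.strip(), person_id),
        )
    conn.commit()


def demo_refactoring():
    """Run both steps on a one-row (hypothetical) legacy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT)"
    )
    conn.execute("INSERT INTO person (full_name) VALUES ('Ambler, Scott')")
    step1_add_columns(conn)
    step2_backfill(conn)
    return conn.execute(
        "SELECT full_name, first_name, last_name FROM person"
    ).fetchall()
```

The legacy full_name column is deliberately kept. Removing it would be a later, separate step, taken only once every application reads the new columns.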

Table 1. Strategies for mitigating legacy data problems

Strategy: Create your own private database for new attributes
Advantages:
  • You have complete control over your database.
  • You may be able to avoid conforming to legacy procedures within your organization, speeding up development.
Disadvantages:
  • Replication of common data is likely.
  • You are unable to easily take advantage of the existing corporate legacy data.
  • You may still be required to integrate with the legacy corporate database(s) via triggers, programmed batch jobs, or ETL tools.
  • Your team must have database expertise.
  • Your project risks significant political problems because you may be perceived as not being team players.

Strategy: Refactor your data schema
Advantages:
  • You have a clean database design to work with.
  • Your database schema can be redesigned to reflect the needs of modern, object-oriented and component-based technologies such as Enterprise JavaBeans (EJB).
Disadvantages:
  • This is very difficult to achieve.
  • Legacy applications will need to be updated to reflect the new data schema.
  • You need to identify and then fix all of your data-related problems, which requires significant effort.
  • You need to develop, and then follow, procedures to ensure that your database design remains clean; otherwise you will end up in the same position again several years from now.

Strategy: Encapsulate database access with stored procedures, views, data classes/objects, or an API
Advantages:
  • A clean, encapsulated access approach can be presented to application developers.
Disadvantages:
  • Legacy applications should be rewritten to use the new access approach to ensure integrity within the database.
  • Significant effort may need to be made to implement your encapsulation strategy.
  • Your encapsulation approach may become an architectural bottleneck.
  • Depending on the range of technologies within your organization, you may not be able to find one strategy that works for all applications.

Strategy: Design your objects to work with the existing design as is
Advantages:
  • Your objects work with the legacy database(s).
Disadvantages:
  • Significant redesign and coding are likely required for this to work.
  • The actual problem, a poor database design, is not addressed and will continue to affect future projects.
  • This may not be feasible, depending on the extent of the mismatch between the legacy database design and the requirements for your application.
  • Performance is likely to be significantly impacted because of the resulting overhead of mapping your objects to the database and the transformations required to support those mappings.
  • Common approaches to persistence, such as EJB's container-managed persistence (CMP) and the use of a persistence layer/framework, are likely not an option if the mismatch is too great.
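The encapsulation strategy in Table 1 can be sketched as a small data class plus a single access object that hides the legacy schema from application code. This is a minimal illustration under stated assumptions: the CUST_MSTR table, its cryptic column names, and the rule that a status code of "A" means active are all hypothetical, as are the Customer and CustomerGateway names.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """The clean view of a customer that application developers work with."""
    id: int
    name: str
    active: bool


class CustomerGateway:
    """Single access point; callers never see the legacy column names."""

    def __init__(self, conn):
        self._conn = conn

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT CUSTNO, CNAME, STAT_CD FROM CUST_MSTR WHERE CUSTNO = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            return None
        custno, cname, stat_cd = row
        # Translate legacy conventions (padded names, status codes) here,
        # in one place, instead of in every application.
        return Customer(id=custno, name=cname.strip(), active=(stat_cd == "A"))


def demo_encapsulation():
    """Look up one existing and one missing customer through the gateway."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CUST_MSTR (CUSTNO INTEGER, CNAME TEXT, STAT_CD TEXT)"
    )
    conn.execute("INSERT INTO CUST_MSTR VALUES (7, 'ACME LTD   ', 'A')")
    gateway = CustomerGateway(conn)
    return gateway.find(7), gateway.find(99)
```

Because every read goes through the gateway, a later change to the underlying schema only requires changing the query inside find(). The same funnel is also why, as the table warns, such a layer can become an architectural bottleneck.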

Table 2. Sample legacy data integration tools

Tool | Company
Informatica PowerCenter | Informatica
ETI*Extract | Evolutionary Technologies International
Information Logistics Network | D2K Inc.
DataStage XE | Ascential Software
INTEGRITY Data Re-Engineering Environment | Vality Technology Inc.
Trillium Control Center | Trillium Software

Note: This tip was adapted from Mastering Enterprise JavaBeans, 2/e, to be published in autumn 2001.


Resources

  • The first two tips in this three-tip series on the challenges of working with legacy data discuss common data-related and data design problems.

  • For an introduction to refactoring, read Martin Fowler's Refactoring: Improving the Design of Existing Code (Addison Wesley Longman, 1999).

  • Visit the developerWorks data management theme for an exploration of many of the issues relating to accessing and managing legacy data.

  • In this tip, Scott W. Ambler presents four strategies for integrating your Java, J2EE, and EJB-based applications with existing legacy systems.

  • Read about the object-data divide and how to overcome it in your EJB projects in two of Scott Ambler's tips from April 2001.

