Once you understand the potential legacy data and legacy database design problems that you may face (see Resources), the next step is to determine how you will address the problems. Table 1 compares and contrasts several strategies you have at your disposal. An important point to remember is that your project team isn't the only team facing these sorts of challenges. Almost every organization has such problems. As a result, a very large market exists for tools to help deal with legacy databases, a sampling of which I've listed in Table 2. Their basic features are:
- Extraction of legacy data
- Transformation of the legacy data to "cleanse" it
- Loading of that data into a new data schema that is more robust
Products that support all of these features are referred to as ETL (extract, transform, load) tools.
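To make the three features above concrete, here is a minimal Java sketch of an extract, transform, load pass. It is only an illustration under stated assumptions, not how any of the commercial tools work: the pipe-delimited row format, the `Customer` class, and the in-memory lists standing in for the legacy and new databases are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: in-memory lists stand in for the legacy and new databases.
class LegacyEtlSketch {

    /** Hypothetical cleansed row for the new, more robust schema. */
    public static final class Customer {
        public final String name;
        public final String phone; // digits only, or null when the legacy value was unusable

        public Customer(String name, String phone) {
            this.name = name;
            this.phone = phone;
        }
    }

    /** Extract: stand-in for reading legacy rows; each row is assumed to be "NAME|PHONE". */
    public static List<String> extract() {
        return Arrays.asList(
                "  SMITH, JOHN |(416) 555-0100",
                "DOE, JANE|N/A",
                "|555-0199");
    }

    /** Transform: trim names, reduce phones to digits, and skip rows without a name. */
    public static List<Customer> transform(List<String> legacyRows) {
        List<Customer> cleansed = new ArrayList<>();
        for (String row : legacyRows) {
            String[] fields = row.split("\\|", -1); // -1 keeps trailing empty fields
            if (fields.length != 2) {
                continue; // malformed row
            }
            String name = fields[0].trim();
            if (name.isEmpty()) {
                continue; // a customer without a name is treated as unusable here
            }
            String digits = fields[1].replaceAll("[^0-9]", "");
            cleansed.add(new Customer(name, digits.isEmpty() ? null : digits));
        }
        return cleansed;
    }

    /** Load: stand-in for inserting the cleansed rows into the new schema. */
    public static void load(List<Customer> rows, List<Customer> target) {
        target.addAll(rows);
    }

    public static void main(String[] args) {
        List<Customer> newTable = new ArrayList<>();
        load(transform(extract()), newTable);
        for (Customer c : newTable) {
            System.out.println(c.name + " -> " + c.phone);
        }
    }
}
```

Keeping the three steps as separate methods mirrors how the work is usually divided: the cleansing rules in `transform` can change without touching how data is read or written.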
Although instruction on how to implement these strategies is clearly beyond the scope of this tip, I can provide several words of advice:
- Do not underestimate the effort required. Data migration and improvement efforts, if they can be done at all, often prove to be projects (or series of projects) unto themselves that span several years. The effort is easily of the same order of magnitude as your organization's Year 2000 (Y2K) efforts.
- Think small. A series of small changes, or refactorings, is often preferable to a single big-bang approach where you need to rerelease all of your organization's applications at once. Martin Fowler's book Refactoring (see Resources) describes principles and practices that should provide some insight into how to make incremental changes to your legacy data design. (Many of his refactorings are geared to changing object-oriented designs, but the fundamentals still apply.)
- Don't underestimate the effort required to address the problem.
- Consider the problem from an integration point of view.
- Did I mention not to underestimate the effort required?
Table 1. Strategies for mitigating legacy data problems
| Strategy | Advantages | Disadvantages |
| Create your own private database for new attributes | | |
| Refactor your data schema | | |
| Encapsulate database access with stored procedures, views, data classes/objects, or an API | Encapsulation, a clean access approach, can be presented to application developers. | |
| Design your objects to work with the existing design as is | Your objects work with the legacy database(s). | |
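The encapsulation strategy in Table 1 can be sketched with a small data class and an access API, as below. This is a hedged illustration rather than a prescribed design: the `CustomerStore` interface, the legacy column names `CUST_NM` and `STAT_CD`, the status code `"A"`, and the map standing in for a legacy table are all assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: a map stands in for a legacy table keyed by customer number.
class LegacyAccessSketch {

    /** Clean data class that application developers code against (hypothetical). */
    public static final class Customer {
        public final int id;
        public final String fullName;
        public final boolean active;

        public Customer(int id, String fullName, boolean active) {
            this.id = id;
            this.fullName = fullName;
            this.active = active;
        }
    }

    /** The access API; nothing specific to the legacy design leaks through it. */
    public interface CustomerStore {
        Optional<Customer> findById(int id);
    }

    /** Adapter that hides the assumed legacy column names and status encoding. */
    public static final class LegacyCustomerStore implements CustomerStore {
        private final Map<Integer, Map<String, String>> legacyRows;

        public LegacyCustomerStore(Map<Integer, Map<String, String>> legacyRows) {
            this.legacyRows = legacyRows;
        }

        @Override
        public Optional<Customer> findById(int id) {
            Map<String, String> row = legacyRows.get(id);
            if (row == null) {
                return Optional.empty();
            }
            // Assumed legacy quirks: padded names and a one-letter status code.
            String name = row.getOrDefault("CUST_NM", "").trim();
            boolean active = "A".equals(row.get("STAT_CD"));
            return Optional.of(new Customer(id, name, active));
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("CUST_NM", "JANE DOE   ");
        row.put("STAT_CD", "A");
        Map<Integer, Map<String, String>> table = new HashMap<>();
        table.put(42, row);

        CustomerStore store = new LegacyCustomerStore(table);
        store.findById(42).ifPresent(
                c -> System.out.println(c.fullName + " active=" + c.active));
    }
}
```

Because applications depend only on `CustomerStore`, the legacy schema can later be refactored behind the adapter in small steps without rereleasing every application at once.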
Table 2. Sample legacy data integration tools
| Tool | Company |
| Informatica PowerCenter | Informatica |
| ETI*Extract | Evolutionary Technologies International |
| Information Logistics Network | D2K Inc. |
| DataStage XE | Ascential Software |
| INTEGRITY Data Re-Engineering Environment | Vality Technology Inc. |
| Trillium Control Center | Trillium Software |
Note: This tip was modified from Mastering Enterprise JavaBeans, 2/e, to be published in autumn 2001.
Resources
- The first two tips in this three-tip series on the challenges of working with legacy data discuss common data-related and data design problems.
- For an introduction to refactoring, read Martin Fowler's Refactoring: Improving the Design of Existing Code (Addison Wesley Longman, 1999).
- Visit the developerWorks data management theme for an exploration of many of the issues relating to accessing and managing legacy data.
- In this tip, Scott W. Ambler presents four strategies for integrating your Java, J2EE, and EJB-based applications with existing legacy systems.
- Read about the object-data divide and how to overcome it in your EJB projects in two of Scott Ambler's tips from April 2001.
Scott W. Ambler is a Practice Leader for Agile Development within the IBM Methods group. He develops process materials, speaks at conferences, and works with IBM clients worldwide to help improve their software processes. Scott is the author of several books, listed on his Web site at www.ambysoft.com. Scott is also a recognized Rational Thought Leader, whose homepage may be viewed here.