IBM InfoSphere Optim Data Growth: Setting up your first Archive

This article introduces IBM® InfoSphere® Optim™ Solutions, which help you manage enterprise data in every phase of the information lifecycle. Learn the fundamentals of InfoSphere Optim Data Growth Solution. A step-by-step example shows how to set up your first Archive request and quickly get up to speed. The essentials of an effective archiving strategy are also covered.

Share:

Satvinder Kaur (satvinder.kaur@in.ibm.com), InfoSphere Optim Specialist, IBM

Photo of Satvinder KaurSatvinder has close to 14 years of experience in database administration, consulting, and DB2 replication and implementation. Currently, she is an InfoSphere Optim architect (data growth/data masking and TDM solutions) for a major healthcare client. Satvinder is an InfoSphere Optim Specialist, Database Consultant (DB2), Replication Architect, and IBM Certified IT Specialist.



21 July 2014 (First published 17 July 2014)

Introduction to InfoSphere Optim Solutions

Organizations are increasingly challenged with successfully managing data growth. They have large volumes of data, which is stored in various data repositories, which are likely to grow exponentially in coming years. This growth is leading to operational and storage issues that create havoc with performance and maintainability of applications.

Try out InfoSphere products before you buy them.

IBM InfoSphere Optim Solutions manage data from requirements to retirement:

  • Optim Data Growth Solution
  • Optim Test Data Management (DM) Solution
  • Optim Data Privacy Solution (data masking)

Optim Data Growth, a market leader in archiving space, provides excellent enterprise data management solutions that help solve various business problems that are related to data growth, test data management, and data masking requirements. Optim is a single, scalable solution that supports almost all applications, databases, and environments in any enterprise.

With data that pours in from diverse sources, application-independent access methods are critical in fast-changing scenarios. With Optim, users can:

  • View, query, and generate reports from data in archives by using industry standard methods (SQL, ODBC/JDBC, XML) and reporting tools (such as Crystal, Cognos, Business Objects, PS Query, SQR, Discoverer).
  • Browse the archive files directly with Optim's data browser.
  • Export archived information to a CSV file or Excel spreadsheet.
  • Provide access from the originating application itself.

Optim Data Growth Solution

Optim Data Growth Solution resolves the adverse impact of rapid data growth by archiving historical data safely to a secure archive. It enables enterprises to achieve universal access to archives for retention compliance, legal requirements, and long-term usage. Archiving historical data out of production lightens the database volume, streamlines processing workloads, improves application performance, and lowers overall storage costs. Archiving also enables faster application upgrades as it reduces the amount of data to be migrated, which reduces downtime and helps achieve timely project completion. To control the proliferation of applications and their maintenance across an organization, Optim allows companies to safely retire legacy or unsupported applications (decommissioning) while still allowing access to underlying data records.

Optim Test DM Solution

It's common practice to clone an entire production database for use in application development and testing. When data is duplicated, storage and maintenance costs increase proportionally and affect the company’s total data burden. Optim Test DM helps by extracting a precise subset of application data records, which are complete in business context. It creates targeted right-sized test databases that are faster and can be repeated more easily than cloning entire production copies. Companies can speed up the deployment of new or upgraded applications with Optim Test DM. With less data to process, iterative testing cycles are completed faster. This data subset also facilitates identifying and fixing bugs in the earliest stages of development.

Data masking

Optim data masking is a solution to protect confidential data by de-identifying (masking) sensitive or personal identifiable data. To mask the data, the solution replaces it with valid but fictitious data, which renders the data unusable and without value even if it’s stolen. This masking protects the client's business from financial loss of information, and provides a simple-to-use solution to its IT department. Optim helps to create a protected test database so IT staff generate accurate test results while still adhering to privacy policies applicable to HIPAA, PHI, and other forms of confidential data in an organization.


Optim Archive project phases

By implementing good archiving strategy, organizations can effectively control growth of their enterprise data. Standard phases include:

Planning
Investigate the project goals. What is the driving force, or expected benefit, of the archiving project? Requirements are largely driven by one or many factors among data growth management, compliance, cost reduction, improving performance. Choose an Optim project team with the appropriate skills based on the drivers, complexity, and size of the project.
Analysis
Review what data needs to be archived, and the archival and retention rules that apply to the data. This information is provided to the Optim team in a "Technical Requirements" document from the client.
Design
Design of the solution differs depending on different requirements or goals. For example, a decommissioning archiving solution design that runs one time differs from a design that requires continuous archiving in the production system to regularly move older data out of production database. (This article explores a design for a general archive to illustrate how the archive process is designed with Optim tools.)
Testing
After the archive design is built in a development environment, test it in different scenarios to ensure that the desired outcome is achieved.
Deployment
During deployment, Optim helps to promote archive design scripts from development to the production environment. You can use different utilities, such as export/import, based on the deployment models with which the Optim project is designed.

Working with InfoSphere Optim Data Growth

Instructions for installing InfoSphere Optim Data Growth and configuring your first workstation are outside the scope of this article. See the Installation chapter of InfoSphere Optim Data Growth Solution for information.

You can design various types of archive architectures that depend on the performance and scalability requirements. For this article, you use a simple configuration where Optim operates on a single workstation and all processing takes place on the same machine.

It is assumed that the following steps are completed:

  1. At least one database is installed on a local Windows® machine. For this example, the workstation was installed with the SAMPLE DB2 database where Optim directory tables and test tables that participate in the Archive example are located.
  2. Install Optim on the same machine (first workstation).
  3. Configure the first workstation to create an Optim directory, which is stored in the SAMPLE database, and any DB Alias to access your databases (SAMPLE). Enable the Server feature if the first workstation is to be used as an Optim Server (which it is in this case). Also, review product configuration and personal product options; most of them are set as the default.

Common elements

Optim processes and tracks progress of its utilities and functions by storing various kinds of objects in an Optim directory. For example, a directory can store the DB alias, access definitions, table maps, column maps, action requests, and more. New and modified primary keys and relationships can also be created in Optim, in addition to what is defined in the source database depending on the project archive requirements.

Figure 1 shows an overview of the Archive process. The rest of this section discusses a few elements that are used in building your first Archive.

Figure 1. Archive process overview
Diagram of archive process overview

Optim directory

The directory can be considered the "brain" behind how Optim performs and does data processing. Optim creates a set of tables in which it stores objects that are needed for processing enterprise data management tasks and tracking processing status. These tables get created while you configure the first workstation before you use Optim.

Best practice

Create as many Optim directories as needed to satisfy site requirements (for example, one for each kind of environment: Development, Test, and Production).

Generally, in a small architecture setup, one site typically shares one Optim directory regardless of the number of database instances to be accessed or the number of workstations that use Optim. In the example, as shown in Figure 2, I created the directory with the default name OPTIMDIR when I configured the first workstation. Some important objects that are stored in an Optim directory include DB Alias, access definitions, table maps, column maps, primary keys, relationships, and action requests.

Figure 2. Optim Directory that is configured in Optim
Screen capture of Optim Directory that is configured in Optim

DB Alias

DB Alias, as shown in Figure 3, is one of the mandatory Optim directory objects. It provides parameters that are needed to access specific target or source databases. Each DB Alias name must be unique, and each database can have only one DB Alias that is defined in Optim. DB Alias provides a naming convention that is used to specify high-level qualifiers for object and table names to access specific databases and to perform requested functions. To denote this convention, it serves as a prefix in the fully qualified names of primary keys, database tables, and relationships. The format is: Dbalias.creatorid.tablename

Figure 3. DB Alias that is configured in Optim
Screen capture of DB Alias that is configured in Optim

Action requests

For data growth management, Optim has four significant processes:

Archive
Processes user-defined data from databases according to data management policies set in the organization, yet retains access to the archived data for future access, compliance, or restoration purposes. Figure 4 shows an example.
Delete
Deletes archived data from selected or all tables. The Delete process can be defined as part of the Archive process or in a separate step after it archives data. The latter option is used when you want to validate the archive data or have a data retention policy that is defined for data deletion in an enterprise.
Restore
Selectively restores archive data back to a production database or different database. This process brings back data from archive files to a relational database and generally is used when an application needs to access or modify the archived data. Moreover, methods available in Optim can restore archived data into a database even if the original data model changes.
Browse
Enables user access to view or browse archived data that is extracted in archive/extract files without having to restore data.
Figure 4. Actions
Screen capture of Action requests with Archive selected

Access Definition

With the Access Definition, one of the most important elements, Optim designers can define objects, or a set of related data, for Optim to archive or restore. It identifies database tables that contain a desired set of data to be archived and how relationships are to be traversed between different pairs of tables involved. Use the Access Definition to define selection criteria to limit a subset of data to be accessed. While you can create local access definition, it's also possible in Optim to create a named Access Definition with the intent of reusing it by a different user or an Optim workstation.

Archive File

Optim lets you store your data in either extract or Archive Files. An Archive File contains data, object definitions, and many other database components that together can re-create a database if required. You can store an Archive File on a local, remote, or shared file server. View the contents of an Archive File by using the Browse utility that is provided by Optim. Similar to how indexes help with search and retrieval of data from database tables, you can create an Archive index so you can quickly Browse or Restore data.


Designing your first Archive

In the following example scenario, the IT department of company X received a directive to archive all records for a customer in a country where the company stopped its sales over two years ago due to a legal issue. The data needs to be archived because the company:

  • Wants to store the data for the future due to a compliance requirement.
  • Wants to delete data from the production database to control the current data growth.

This exercise will refer to sample tables created when you configure the first workstation, which is stored in SAMPLE database.

The solution involves:

  • Identifying the data to archive.
  • Designing an Archive by using the defined selection criteria.
  • Creating the Archive with the Delete option.
  • Defining the data to delete.
  • Running the Archive.

Source information

For this scenario:

  • Optim Directory: OptimDir
  • DB alias: SAMPLE
  • Table Description and relationships, which are shown in Figure 5, are:
    • OPTIM_SALES contains salesman information.
    • OPTIM_CUSTOMERS contains customer personal information, address, sales, and the associated salesman.
    • OPTIM_ORDERS stores order date, customer who made the order, salesman who sold the order, ship date, and related information.
    • OPTIM_DETAILS stores order details such as item numbers, quantity, and unit prices.
    • OPTIM_ITEMS contains master details of items such as description, category, rating, or unit prices.
Figure 5. Table relationships
Diagram of Table Relationships

Designing the Archive

Use the following steps to design the example Archive.

  1. Open the Optim tool.
  2. Click File -> New -> Archive, which opens the Archive Request Editor.
  3. Enter a description for your Archive Request that reflects the purpose of this archive, as in Figure 6. Your Description is Archive AU country data and ArchiveAU.AF is the name of the output Archive File.

    Tip

    You can add Optim variables to dynamically generate a regular archive file name from the same archive request. For example, adding the suffix ArchiveAU<$YYYY — $MM — $DD — $SEQ4> generates an archive file name with today's date and sequence supplied: ArchiveAU2014-05-21-0001.AF.

  4. Click or tab to the Archive Index File field. The Optim tool automatically inserts the Archive Index File name as the Archive file given earlier along with the .AFX extension.
  5. Clear the check box for Defer Delete after Archive. Within this same archive request, you will run a Delete request after you archive the data.

    (You can optionally add this archive to Group, which acts as a tag so it can be associated with related Archive files for future search criteria purposes. Also, experiment with various other Optim options that you can specify in this dialog.)

    Figure 6. Archive Request Editor
    Screen capture of Archive Request Editor
  6. Click the Edit access definition icon on the toolbar, as shown in Figure 7, to open the Access Definition Editor dialog and define the data for archiving.
    Figure 7. Archive Request Editor toolbar that shows the Edit Access Definition button
    Screen capture of the Edit Access Definition button on the Archive Request Editor toolbar
  7. Enter a description for your Access Definition: AD for Archiving AU country data. In the Default Qualifier field, click Select, to the right of the Default Qualifier box to open the Select Default Qualifier dialog.

    In the Select Default Qualifier dialog, double-click the DB Alias, SAMPLE, and the Creator ID SKAUR (the sample table was created with SKAUR user) to select the default qualifier as SAMPLE.SKAUR.

  8. Data fetch begins from OPTIM_CUSTOMERS tables and then find related records to archive. In the Start Table field, click Select, on the right of the Start Table box, to open the Select Access Definition Start Table.

    In the Select Access Definition Start Table dialog for the DB Alias, SAMPLE, double-click OPTIM_CUSTOMERS to select it as the start table. This pattern forms the table as three parts: SAMPLE.SKAUR.OPTIM_CUSTOMERS. Once selected, the Start Table field now contains this table name.

    This action also populates the table name as one of the first entries in the Table/View column.

  9. Right click OPTIM_CUSTOMERS and select Add Tables to add the remaining related tables.

    Another window opens with the current DB Alias SAMPLE selected.

  10. Select the box to Find Tables Related to Table, as in Figure 8. Verify that you have Both (the default) selected to find both parent and child tables that are related to the specified table. Click Display to see what tables would be selected. There are four tables (OPTIM_SALES, OPTIM_ORDERS, OPTIM_DETAILS, and OPTIM_ITEMS) in the list.
    Figure 8. Selecting related tables for Archive
    Screeen capture of selecting Related Tables for Archive

    Tip

    It's often a requirement to archive data based on data age (data older than three years, for example). Optim supplies a date function for selection criteria: BEFORE (nD, nW, nM, nY), where D, W, M, Y indicate number of days, weeks, months, or years before current day at run time. If you want to archive data older than two years, use BEFORE (2Y).

  11. Click Select All Matching, which returns to the previous window (Access Definition Editor). Now you see all five tables: OPTIM_CUSTOMERS, OPTIM_SALES, OPTIM_ORDERS, OPTIM_DETAILS, and OPTIM_ITEMS.
  12. You need to define the selection criteria, which is COUNTRY_CODE=’AU’ for your example. In the Access Definition Editor dialog, right click OPTIM_CUSTOMERS and select Table Specifications -> Selection Criteria.
  13. In the Table Specifications, enter =‘AU’ in the COUNTRY_CODE column, as in Figure 9.
    Figure 9. Selection Criteria
    Screen capture of Selection Criteria tab
  14. To close the Table Specifications dialog, click the X in the upper right corner.

    The selection criteria icon is added in the Table Specification column of the Access Definitions Editor, as in Figure 10.

    Figure 10. Access Definitions main screen that shows selection icon
    Screen capture of the AD main screen showing selection icon
  15. Click the Relationships tab to review the traversal path of parent to child (default option 1) or child to parent (option 2). For this example, leave it option 1.

    Note: Option (1) determines whether Archive traverses a relationship from child to parent and archives a parent row for each selected child row. By default, Option (1) is selected. As a result, this setting maintains the relational integrity of the data. Option (2) determines whether more child rows are archived when a parent row is archived because of Option (1). In other words, if Archive traversed a relationship from child to parent and archived a parent row, all child rows of that parent are archived when Option (2) is selected. By default, Option (2) is cleared.

  16. Click Show Steps from the toolbar to show the traversal path that Optim uses internally for this particular Archive. Listing 1 shows an example.
    Listing 1. Show steps screen
    Step  1:  Extract Rows from Start Table SAMPLE.SKAUR.OPTIM_CUSTOMERS. 
    Selection Criteria and/or Statistical Controls are used, these determine rows selected.
    
    Step  2: Extract Rows from SAMPLE.SKAUR.OPTIM_SALES which are Parents of Rows 
    Previously Extracted from SAMPLE.SKAUR.OPTIM_CUSTOMERS in Step  1 to satisfy an RI rule
    using Relationship RSC. 
    
    Step  3: Extract Rows from SAMPLE.SKAUR.OPTIM_ORDERS which are Children of Rows 
    Previously Extracted from SAMPLE.SKAUR.OPTIM_CUSTOMERS in Step  1 using Relationship RCO. 
    
    Step  4: Extract Rows from SAMPLE.SKAUR.OPTIM_DETAILS which are Children of Rows 
    Previously Extracted from SAMPLE.SKAUR.OPTIM_ORDERS in Step  3 using Relationship ROD. 
    
    Step  5: Extract Rows from SAMPLE.SKAUR.OPTIM_ITEMS which are Parents of Rows 
    Previously Extracted from SAMPLE.SKAUR.OPTIM_DETAILS in Step  4 to satisfy an RI rule 
    using Relationship RID.
    You want to delete the transactional data present for selected country customers in the OPTIM_ORDERS, OPTIM_DETAILS, and OPTIM_CUSTOMERS tables from the database. But, you like to keep the master data OPTIM_SALES and OPTIM_ITEMS tables in production because this data will be relevant to future operations. Follow these steps.
  17. In the Access Definition window, select the boxes in the Delete Rows After Archive column for tables that are selected for the delete operation, as in Figure 11. With Delete as part of the same archive request, Optim will delete data from selected tables after a successful archive.
    Figure 11. Access Definitions, delete rows from table option
    Screen capture of AD main screen that shows Delete rows from table option

    Tip

    You can DEFER the DELETE to a later date if you want to involve the testing team to validate data that is archived.

  18. Close the Access Definition Editor and select Yes when prompted with saving Access Definition. The Archive Request Editor window opens.
  19. Close the Archive Request Editor window and save as a two-part name when prompted (for example, ARCH.COUNTRYAU).

    You're now ready to run your Archive request.

  20. You have two options for running the ARCHIVE request: from the GUI or from the command line.
    • From the GUI, click File -> Open, which opens Object Explorer. Expand the archive tree and select the Archive request that you created.

      Click the running man icon from the toolbar, as shown in Figure 12.

      Figure 12. Toolbar: Running Archive
      Screen capture of the Toolbar: Running Archive button

      This button opens the Archive Processing Status window and, after processing finishes, it automatically shows a detailed archive process report. Listing 2 shows an example.

      Listing 2. Important details from Archive process report
      Row Details:
      
      Archived       Failed          Extract Options   Table Name
      434	       0	                         SAMPLE.SKAUR.OPTIM_CUSTOMERS
      108	       0	                         SAMPLE.SKAUR.OPTIM_SALES
      1117	       0	                         SAMPLE.SKAUR.OPTIM_ORDERS
      2540	       0	                         SAMPLE.SKAUR.OPTIM_DETAILS
      506	       0	                         SAMPLE.SKAUR.OPTIM_ITEMS
      
      Row Details:
      
      Deleted	       Not Found       Failed   Delete Strategy   Table Name
      434	       0                0       Key Lookup        SAMPLE.SKAUR.OPTIM_CUSTOMERS
      1117           0                0       Key Lookup        SAMPLE.SKAUR.OPTIM_ORDERS
      2540           0                0       Key Lookup        SAMPLE.SKAUR.OPTIM_DETAILS
      
      Delete Impact Analysis:
      
      Status	      Estimated Storage Savings	    Resource Name
    • From the command line, the Optim command-line processing executable is installed in the ${OPTIMHOME}\RT\BIN\ directory. The name of the executable is PR0CMND:
      PR0CMND /R TYPE=type REQUEST=identifier.name

      type can be any of the { ARCHIVE | DELETE | REPORT | RESTORE } values.

      identifier.name is the name of the action request that is pre-created by using the GUI on the Optim workstation and saved in the Optim Directory.

      You can specify several other parameters to alter or override the behavior of the stored Optim request. A few frequently used parameters are: STOP, QUIET, OUTPUT, MONITOR, SERVER, and PST. You can enter parameters with the command on the command line or enter them into a text file (called a parameter file).

      Run an Archive request from the command line (for the Archive that is designed in this article):

      pr0cmnd /R type=ARCHIVE request=ARCH.COUNTRYAU QUIET+ PST=OPTIMDIR
                                  output=”C:\Data\OPTIM\DeveloperWorks\ARCHCOUNTRY_cmd.rtf"

Tip

Optim has a facility to join tables in the Browse window that shows related records at various levels with a simple click of the Join button.

Now that you've finished this exercise, it would be beneficial to view Archived data. Optim provides a feature for browsing data: Click the InfoSphere Optim tool menu, select Utility -> Browse, and then select the Archive file. You will see a list of tables along with row count archived for each table. You can double-click any table to view its contents and also join related contents with the Join button.


Conclusion

Today, organizations face tremendous challenges as they deal with data explosion and information governance for their enterprise data. You cannot ignore the need for a strong solution that provides comprehensive capabilities for managing enterprise application data across applications, databases, operating systems, and hardware platforms. IBM InfoSphere Optim Data Growth Solution is widely used and a market leader in this space.

This article provided an overview of InfoSphere Optim Data Growth for systems development professionals, data architects, data administrators, or anyone who needs to understand archiving of products. You learned the fundamentals of the product, and common elements such as Optim directory, DB alias, and action requests. This introductory article should help you get started with your first Archive process.

Resources

Learn

Get products and technologies

  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, or use a product in a cloud environment.

Discuss

  • Get involved in the developerWorks community. Connect with other developerWorks users while you explore the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=978027
ArticleTitle=IBM InfoSphere Optim Data Growth: Setting up your first Archive
publish-date=07212014