Archiving and recovery solutions for IBM Business Process Manager using InfoSphere Optim

Using simplified examples, this article shows how you can use IBM® InfoSphere® Optim™ to control data growth and, at the same time, maintain data privacy with IBM BPM.

Sharad Chandra (schandr1@in.ibm.com), Senior WebSphere Consultant, IBM

Sharad Chandra is a Senior WebSphere Consultant in IBM Software Group. He has 8 years of experience in BPM and SOA technologies and a versatile background in handling complex customer solutions related to BPM and middleware technologies.



Gaurav Rohila (garohila@in.ibm.com), Senior QA Analyst, IBM

Gaurav Rohila is a Senior QA Analyst and has nine years of experience in software testing. He has worked as a software test specialist on the Optim QA team at the India Software Labs in Bangalore. His previous Optim articles on developerWorks include Mask archived and extracted data directly into CSV, XML and ECM formats using InfoSphere Optim Data Masking Solution. In the past, Gaurav has participated in testing financial and insurance domain applications, such as PeopleSoft AP and GL modules.



30 April 2014

Introduction

IBM Business Process Manager (BPM) is a comprehensive BPM platform that provides visibility into and insight on the management of business processes. It scales smoothly and easily from an initial project to a full enterprise-wide program, harnessing complexity in a simple environment to better meet customer needs. This article presents a realistic use case and shows how effective database archiving techniques can improve the performance of your infrastructure when many queries are run against the database tables.

In this article, we concentrate on one activity of the whole process, which fetches data from multiple tables for display to the end user. Production workflows are initiated 24x7, accumulating a huge amount of data in tables and indexes, and most of that data lies redundant after the workflow activities are completed. If this data is not archived in time, tables and indexes grow exponentially. Query scan times also increase, so response times gradually degrade from a few seconds to minutes and even hours. Memory timeouts may also be reported at the indexes, leading to an index crash. To handle such issues, tooling support such as IBM InfoSphere® Optim™ is needed to control data growth in the databases.

IBM Optim is the market leader in enterprise data management. Optim is a true enterprise solution that works across applications, databases, and platforms and supports all of the following databases:

  • Oracle®
  • IBM DB2®
  • Microsoft® SQL Server®
  • Sybase®
  • Informix®
  • Teradata®
  • Netezza®

This article discusses the following capabilities of Optim:

  • Optim Data Growth Solutions (archiving):
    • Improve performance
    • Control data growth, save storage
    • Support retention compliance
    • Enable application retirement
    • Reduce maintenance burden
    • Streamline upgrades, safely retire legacy applications
  • Optim Test Data Management Solutions:
    • Create targeted, right-sized test environments
    • Improve application quality
    • Speed iterative testing processes
  • Optim Data Privacy Solutions:
    • Mask confidential data
    • Comply with privacy policies
    • Create a safe sandbox for testing

Since Optim supports multiple platforms, this article applies to any such infrastructure involving WebSphere®-based servers with multi-vendor databases on multiple platforms.


IBM Business Process Manager infrastructure use case

Consider a production environment involving an HTTP server cluster, a WebSphere Portal Server cluster, and WebSphere Process Server or IBM BPM, along with a multi-vendor database, as shown in Figure 1. Let us assume this infrastructure is used by a government-sector customer, such as a passport office, a natural gas fuel agency, or a nationalized bank, where the site is used by customers 24x7. The overall processing involves application data originating from J2EE and portlet applications, and workflow data originating from WebSphere Process Server or IBM BPM long-running processes.

Figure 1. End-to-end infrastructure architecture
End-to-end infrastructure architecture


Because long-running processes may take days, weeks, or months to complete, any record generated by these processes may remain in the database for a long period of time, alongside the records of completed processes. This, together with the application data generated by the other applications deployed on Portal Server and Process Server, can increase table sizes dramatically every day. Such uncontrolled data growth may severely degrade the overall response time for end users. WebSphere Process Server's Business Process Choreographer queries, and other application queries, may take longer to scan the indexes to produce their results. This affects end users and back-office users who are working on tasks, activities, and responses. Let's look at a simple example of a database interaction through IBM BPM coaches.

Figure 2 shows a database service that fetches data from the database table with a query and sends it to the SalesDataReview coach for display.

Figure 2. Activity that displays data to end users
Activity that displays data to end users

Figure 3 shows the SELECT query that is run against multiple related tables and fetches multiple rows to be displayed to end users. The database reference is obtained through the data source JNDI name configured in IBM BPM's underlying WebSphere Application Server, as shown in Figure 4. SQL queries may vary and can select specific data based on the primary key to display the results. In such cases, a complete table scan may be performed in the absence of indexes, and the query response time may increase considerably as the data grows.

Figure 3. Select Query to fetch data from database and to display in coach
Select Query to fetch data from database and to display in coach
Figure 4. Data source reference to fetch queries
Data source reference to fetch queries

As seen in Figure 4, "jdbc/testDS" is a data source reference defined in the underlying WebSphere Application Server admin console, and the data returned by the SQL query is mapped to the "SalesDataBO" type, which is a BPM data structure that stores the returned data (Figure 5). The number of rows to fetch can also be set with the "maxRows" parameter. For data to be assigned to the business object on an "as-is" basis, you need to create fields with the same names as the columns in the table.

Figure 5. SalesDataBO Business Object structure
SalesDataBO Business Object structure
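
To make the mapping concrete, the following sketch shows how column aliases can line up the SQL result-set column names with the business object field names, so that rows map to SalesDataBO on an as-is basis. This is not the exact query from Figure 3; the table and column names come from Listing 1, and the alias names are illustrative:

-- A minimal sketch: aliases make result-set column names match BO field names.
-- Table and column names are taken from Listing 1; alias names are illustrative.
SELECT x.SALESMAN_ID   AS SalesmanID,
       x.EMAIL_ADDRESS AS Email,
       y.YTD_SALES     AS Sales
FROM   OPTIM_SALES x, OPTIM_CUSTOMERS y
WHERE  x.SALESMAN_ID = y.SALESMAN_ID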

Figure 6 shows the coach implementation that displays the data fetched from multiple tables. Each column of the coach table is mapped to the corresponding field in the business object, and the IBM BPM engine populates the table through these BO mappings.

Figure 6. Structure of table in the coach
Structure of table in the coach

Once the data is loaded into the coach, it is displayed as shown in Figure 7. You can control how much data is displayed in the table control through the custom settings available in the table properties in Process Designer. For example, you can set a per-page limit on the rows to be displayed. The forward and backward paging controls are provided automatically, and no additional programming is needed.

Figure 7. Results displayed in a table in the coach
Results displayed in a table in the coach

The above query is used for reporting purposes. However, there may be situations where specific data needs to be fetched from the database using a WHERE clause, as shown in Listing 1.

Listing 1. Where condition clause
SELECT x.SALESMAN_ID AS SalesmanID, x.NATIONALITY, x.TERRITORY,
       x.EMAIL_ADDRESS AS Email, y.CUSTNAME, y.YTD_SALES AS Sales,
       z.ORDER_POSTED_DATE AS OrderPostDate, z.ORDER_DATE AS OrderDate,
       p.ITEM_QUANTITY AS ItemQuantity
FROM   OPTIM_SALES x, OPTIM_CUSTOMERS y, OPTIM_ORDERS z, OPTIM_DETAILS p
WHERE  x.SALESMAN_ID = y.SALESMAN_ID
  AND  y.CUST_ID = z.CUST_ID
  AND  z.ORDER_ID = '254'

Assuming that indexes have been created on the tables against which the above query is run, performance may still degrade if the index size grows beyond a certain limit, for example, in the absence of any backup and archival strategy. In such situations, the only option is to take a backup of the business-critical data and reduce the index and table sizes. Regular backups can help achieve the goals mentioned in the Introduction: improve application performance, control data growth and save storage, and reduce the maintenance burden.
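
For reference, the join and filter columns used by the query in Listing 1 are the natural candidates for such indexes. The following is a minimal sketch, assuming DB2 syntax and that no equivalent indexes already exist; the index names are illustrative:

-- Illustrative index definitions covering the predicates in Listing 1
-- (x.SALESMAN_ID = y.SALESMAN_ID, y.CUST_ID = z.CUST_ID, z.ORDER_ID = '254').
CREATE INDEX IDX_SALES_SALESMAN ON OPTIM_SALES (SALESMAN_ID);
CREATE INDEX IDX_CUST_SALESMAN  ON OPTIM_CUSTOMERS (SALESMAN_ID, CUST_ID);
CREATE INDEX IDX_ORDERS_CUST    ON OPTIM_ORDERS (CUST_ID, ORDER_ID);

Even with such indexes in place, the indexes themselves keep growing with the data, which is why the archiving strategy described next still matters.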


Managing data growth

In the above example, if you want to remove data from the database, where the OPTIM_CUSTOMERS, OPTIM_SALES, OPTIM_ITEMS, OPTIM_ORDERS, and OPTIM_DETAILS tables need to be archived and purged, you can use Optim's Archive and Delete data growth solution. The data is archived and can also be deleted from the production database.
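
A simple way to quantify the effect of an archive-and-delete run is to compare row counts before and after it. The following check is illustrative only and is not part of Optim; the table names are those used in this example:

-- Row counts for the tables in this example; run before and after the
-- archive-and-delete request to see how much data left the production database.
SELECT 'OPTIM_CUSTOMERS' AS TABLE_NAME, COUNT(*) AS ROW_COUNT FROM OPTIM_CUSTOMERS
UNION ALL
SELECT 'OPTIM_SALES', COUNT(*) FROM OPTIM_SALES
UNION ALL
SELECT 'OPTIM_ITEMS', COUNT(*) FROM OPTIM_ITEMS
UNION ALL
SELECT 'OPTIM_ORDERS', COUNT(*) FROM OPTIM_ORDERS
UNION ALL
SELECT 'OPTIM_DETAILS', COUNT(*) FROM OPTIM_DETAILS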

As shown in Figure 8, the "Archive Request Editor" window displays all the data that is to be archived and deleted. After the process ends, the "customerdata.af" archive file is created, containing all the tables. Clicking the Objects tab in the "Archive Request Editor" window displays all the objects related to the archived tables; these objects are also archived, as shown in Figure 9.

Figure 8. Archive Request Editor window displaying the archive file names
Archive Request Editor window displaying the archive file names
Figure 9. Archive Request Editor window showing all objects related to the tables
Archive Request Editor window showing all objects related to the tables

If you navigate to Tools > Edit Access Definition in the "Archive Request Editor", a window displays all the tables that will be archived; the data will also be deleted from these tables, as displayed in Figure 10. When you click the Relationships tab in the "Access Definition Editor", a window displays the relationships, if any, between the archived tables, as displayed in Figure 11.

Also in Figure 8, the "Archive Request Editor" window shows three important points marked in red. These are as follows:

  • Defer delete after archive: If you select this check box, the selected data (tables) is not deleted from the database after the archive finishes processing; only the archive file containing the required data is created. If you clear this check box, the same data is also deleted from the database after the archive finishes.
  • Compression Options: If you select this check box, the archive file that is created is compressed.
  • Storage Profile: This option moves the archived data from the local drive to a storage device. In Figure 8, the "customerdata.af" file is created after the data is archived. If you want to remove this archive file from the local drive and move it to a storage device such as IBM Tivoli®, you can do so by setting a "Storage Profile Definition" (see Figure 12). A "Storage Profile Definition" called "Test" is created, and this name is entered in the "Storage Profile" field of the "Archive Request Editor" window, as shown in Figure 8. When the archive process is initiated, it uses the "Test" storage profile definition. In this storage profile definition, the "Copy to backup device" field is set to "Tivoli", which means that after the archive process finishes, the archive file is copied to the Tivoli storage device.

Also, under this storage profile definition (Test), you can specify how long (in hours or days) the archive file is retained on the local drive before it is deleted, using the "File retention and retention policy" settings.

Figure 10. Access Definition Editor window displaying tables that will be archived from the database
Access Definition Editor window displaying tables that will be archived from the database
Figure 11. Access Definition Editor window displaying all relationships between all tables
Access Definition Editor window displaying all relationships between all tables

When you select Tools > Edit Storage Profile in the "Archive Request Editor" window, the "Storage Profile Definition" window opens, where you can set the storage definition (see Figure 12).

Figure 12. Storage profile definition to store backup of data
Storage profile definition to store backup of data

Test data management and data privacy

Test data management involves providing a test environment to external testing teams, where the data provided is a subset of the production data. As shown in Figure 13, the "Extract Request Editor" window runs an extract request, which extracts the desired tables from the database. Figure 13 shows that after the extract process is completed, the "customers-data.xf" file is created. This file contains the data extracted from all of the selected tables.

Figure 13. Extract Request Editor window displaying extract file name
Extract Request Editor window displaying extract file name
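
To make the idea of a production subset concrete, the sketch below expresses, in plain SQL, the kind of related slice an extract request assembles: customers belonging to one territory, selected through their parent sales rows so that relationships stay intact. In practice Optim builds this subset through the access definition rather than hand-written SQL; the territory value is hypothetical, and the table and column names come from Listing 1:

-- Illustrative only: a referentially related subset expressed as plain SQL.
SELECT y.*
FROM   OPTIM_CUSTOMERS y
JOIN   OPTIM_SALES x ON x.SALESMAN_ID = y.SALESMAN_ID
WHERE  x.TERRITORY = 'EAST'   -- hypothetical territory value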

After clicking the Objects tab in the "Extract Request Editor" window, you see all the objects related to the extracted tables, which are also extracted (Figure 14).

Figure 14. Extract Request Editor window showing all the objects related to all the tables extracted
Extract Request Editor window showing all the objects related to all the tables extracted

When you select Tools > Edit Access Definition in the "Extract Request Editor" window, you see all the tables that will be extracted (Figure 15). You can also specify how many rows are to be extracted from each table using the "Row Limit" option (see the sketch that follows Figure 15).

Figure 15. Access Definition Editor window displaying tables that will be extracted from the database
Access Definition Editor window displaying tables that will be extracted from the database
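
In plain SQL terms, a Row Limit is roughly equivalent to capping the rows taken from a table, as in the following sketch (DB2 syntax; the limit value is illustrative). Optim applies the limit per table through the access definition rather than through SQL you write yourself:

-- Roughly what a Row Limit of 1000 means for a single table.
SELECT * FROM OPTIM_CUSTOMERS
FETCH FIRST 1000 ROWS ONLY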

One of the main concerns related to the extraction of data discussed in the above example is data privacy: how to hide or mask the sensitive columns of the tables being extracted. The next figures show this scenario, which is achieved with Optim's "Convert" request. If you want to mask some of the columns of the tables extracted into the "customers-data.xf" file above, you can run a "Convert" request, as shown in Figure 16. In the "Convert Request Editor" window, the source is the extracted file "customers-data.xf", in which some of the columns of the tables are to be masked. After the "Convert" request is executed, the "customers-data-converted.xf" file is created, containing the masked data.

Figure 16. Convert Request Editor window
Convert Request Editor window

Navigate to Tools > Edit Table Map in the "Convert Request Editor" window to display the "Table Map Editor" window, as shown in Figure 17. This displays all the tables in the "customers-data.xf" extract file. To mask or hide the NATIONAL_ID and EMAIL_ADDRESS columns of the "Optim_Customers" table, apply the data privacy functions to these columns.

Figure 17. Table map Editor window displaying all tables that will be converted
Table map Editor window displaying all tables that will be converted

Figure 18 shows the "Column Map Editor" window, which lists all the available columns of the "Optim_Customers" table. The "Column Map Editor" window can be opened by right-clicking any table name in the "Table Map Editor" window.

Figure 18. Column map Editor window displaying all columns under the specified table
Column map Editor window displaying all columns under the specified table

Figure 19 shows the "Column Map Editor" window with the masking functions applied. To mask the EMAIL_ADDRESS and NATIONAL_ID columns of the "Optim_Customers" table, the TRANS EML and TRANS SSN functions are applied to these columns, as marked in red in Figure 19. After the "Convert" request finishes processing, these columns of the "Optim_Customers" table are masked, and data privacy is achieved in the converted extract file, "customers-data-converted.xf".

Figure 19. Column map Editor window which displays all columns under the specified table
Column map Editor window which displays all columns under the specified table
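
For readers unfamiliar with masking, the sketch below shows the intended effect in plain SQL: every row keeps a unique but fictitious email address and national ID. This is illustrative only; it is not how the TRANS EML and TRANS SSN functions are implemented, and in the Optim flow the masking happens inside the convert request against the extract file, never against the production tables. The working-copy table name is hypothetical:

-- Illustrative only: the effect of masking, applied to a hypothetical working copy.
UPDATE OPTIM_CUSTOMERS_COPY
SET    EMAIL_ADDRESS = 'customer' || TRIM(CHAR(CUST_ID)) || '@example.com',
       NATIONAL_ID   = RIGHT('000000000' || TRIM(CHAR(CUST_ID)), 9)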

Note: The "Convert" request mentioned above has the capability of converting source extract files into a destination extract file, which is masked. In addition, this "Convert" request can convert the source extract files into the destination CSV and XML files, where those files can also be masked.


Conclusion

Through simplified examples, this article illustrated how you can use InfoSphere Optim to control data growth, and at the same time, maintain data privacy with IBM BPM.
