This blog promotes knowledge sharing through experience and collaboration. For more product information, visit our WebSphere Commerce CSE page. For easier navigation, utilize the Categories to find posts that match your interest.
DB Must-Do's: Data Maintenance for WebSphere Commerce Database
Every database requires a maintenance strategy. As part of the strategy, it is crucial to look at the data being stored and how the database grows. This article will aim at the process of removing stale and unneeded data from the WebSphere Commerce database.
In WebSphere Commerce, specifically, lack of database data maintenance may introduce problems in relatively short periods of time.
So it's a given that DBA's need to implement a data maintenance plan. This must be developed working closely with the application and business teams to answer the following questions:
Identify the data
The WebSphere Commerce DBClean utility provides out-of-the-box delete statements for database cleanup purposes. You can review the utility to get familiarized with the predefined tables and delete statements. A list of out-of-the-box delete configurations is provided at Database Cleanup utility objects.
Some of WebSphere Commerce fastest growing tables are:
The following three tables are also worth mentioning as they have fast growth rates and have scheduled jobs for maintenance.
CTXMGMT and CTXDATA are business context tables that store session information. The ActivityCleanUp scheduler command which runs every 24 hours to cleanup these tables. However, during high traffic the cleanup periods may need to occur more often.
Next is the SCHSTATUS table, which as the previously mentioned tables, has a scheduler command to maintain the table (CleanJob). Alternatively, you can use DBClean to cleanup the table.
For more information, refer to Maintaining the Business Context tables using DBClean and Maintaining the WebSphere Commerce Scheduler tables.
In addition to running DBClean, collect growth statistics from the WebSphere Commerce database. This can be done in a weekly or monthly basis; and will help identifying how much space tablespaces are using along with the cardinality of the tables. Most important, it will help identity the following:
The process of collecting the statistics shouldn't be intrusive. DB2 and Oracle store the cardinality of the tables already. Using a script you can extract this information and store it in a separate table. Then, you can calculate the growth rate based on the cardinality values store over time.
Details on how to implement the growth script can be found at Implementing a data maintenance strategy.
As the database is in continues growth, data cannot be kept forever. Policies need to be defined to determine the length of time that the data needs to be kept before being moved to a Decision Support System (DSS) or cleaned up. These must be in accordance with the business needs and practices.
The following are sample questions to define policies:
The DBClean utility can be used to clean up the database based on the policies you've defined. It contains predefined list of delete statements to delete common WebSphere Commerce objects such as users, orders and catalog information. It can also be customized to add your own statements or update the existing ones.
Is is not required to performing the data cleanup using DBClean. The application team and the DBA may decide to use the queries and run them directly from DB2.
For more information on DBClean, refer to Database Cleanup utility command script.
As mentioned before, DBClean contains predefined delete statements. However, your delete implementation may differ from what's already defined, and these differences may impact performance significantly.
For instance, let's take the following predefined query to remove stale guest orders.
DELETE FROM orders
WHERE ( status = 'P' or status = 'I'
or status = 'W' or status = 'N') and (sysdate - ?) >= lastupdate
and orders.member_id in
(select distinct users_id
from users where registertype = 'G')
and (orders_id not in
(select distinct orders_id
where inventorystatus != 'NALC' and inventorystatus is not null))
and orders_id not in
(select orders_id from orderitems where orderitems_id in
(select orderitems_id from rmaitem where orderitems_id is not null))
This query has several sub-queries with conditions against other tables.
The first condition is to check the member_id in the ORDERS table exists in the USERS table and the registertype is 'G'. Basically, we're checking to see if the order belongs to a guest user. Checking for this condition already adds a cost to the overall query as we need to check against the USERS table.
Now, if you business defined policy states that you can delete pending orders for both registered and guest users after the same amount of days, we can remove the subquery pertaining to the USERS' table lookup (as we no longer care if the order belongs to a guest user).
The next sub-query checks if the order contains any items that there inventory status is different than 'NALC' (non-allocated). ALC and NALC are possible inventory status options for ATP inventory model. If your store uses non-ATP, you can remove this condition from the query.
Removing any condition not applicable to the business defined policies may result in significant improvement in the query's performance.
Note: if you customize a default DBClean configuration, the recommendation is to add a new configuration to the utility instead of modifying the existing one. This will ensure that the changes are not lost during a fix pack installation.
DBClean also allows modifications to the commit frequency making transactions manageable by the database and reducing the contention by freezing locks more often. The performance may vary depending the statements and other factors such as I/O and table-scanning cost.
Let's first cover functional testing to make sure the deletions performed are correct.
Next, perform non-functional tests to determine if the system resources are sufficient to complete the DBClean runs. You must perform this test in non-live environment where you can simulate the data and the load applied to the database.
Some of the most common aspect to look at are:
The data maintenance scripts need to run, preferably, during low-traffic maintenance windows. Usually the data is cleaned up in a weekly basis; however, there may be some data that would required daily maintenance.
The data maintenance scripts should take place before database maintenance as statistics need to be updated to reflect the changes made by the cleanup scripts.
In certain scenarios, live environments have small down-time window for maintenance and DBClean (or it's equivalent queries run directly at the database) may require to run longer. This is usually because of the amount of data that needs to be cleaned and the cascading table hierarchies that need to be checked. In other cases, you may just need to do some "catching-up" deleting large number of records. To solve these problems WebSphere Commerce Fix Pack 9 introduces Offline Mode for DBClean.
The Offline Mode for DBClean removes the use of cascade deletes by disabling the foreign key relationships; and uses explicit deletes for each table.
For more information on DBClean's Offline Mode, refer to Database Cleanup Utility.