Today I discuss the performance aspects of the stagingprop utility. Without some basic tuning, the time needed to complete a large propagation can exceed the allocated window by hours or days, and the propagation can impact storefront performance and the shopping experience. This is the first post of a series I will use to show you key tuning configurations, best practices and troubleshooting techniques to help you run stagingprop optimally.
Disclaimer: Although stagingprop works with both Oracle and DB2, this post was written with DB2 in mind and some of the points are only applicable to DB2.
The very very basics
Understanding the basic functioning of stagingprop is a prerequisite to tuning and troubleshooting the propagation.
Once you become familiar with the stagingprop utility, you realize the logic is not that complicated: triggers track all data changes in the staging database and record them in the STAGLOG table. The stagingprop utility, which propagates the data changes to production, starts by querying the STAGLOG table and performs a consolidation process, which speeds up the propagation by reducing the number of changes that are issued to the production database. The utility then joins the STAGLOG table with the table containing the actual data to be propagated, and uses the result to create INSERT, UPDATE and DELETE statements that are executed through JDBC on the production database. This is done in transactions, whose size determines how much data is fetched at once and committed on the production database. Yes, sure, this is an oversimplification, but it serves the purpose!
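If you want to see this tracking in action, you can peek at the most recent STAGLOG entries on the staging database. Here is a quick illustrative query, using the standard STAGLOG columns (STGOP records the type of operation, and STGPROCESSED flags whether the change was already propagated):

SELECT STGRFNBR, STGSTMP, STGTABLE, STGOP, STGPROCESSED FROM STAGLOG ORDER BY STGRFNBR DESC FETCH FIRST 10 ROWS ONLY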
Before starting the propagation, be sure to follow the best practices and recommendations discussed in the upcoming articles in this series, such as using non-default values for transaction and batch size, enabling the currently committed feature, pruning the STAGLOG table with DBClean, and more.
While the propagation runs, you will need to collect data from both databases to validate performance:
- Stagingprop log
The stagingprop utility creates a log that helps you identify the most time-consuming operations.
- db2collect for both staging and production databases
While the propagation is active, use the db2collect.sh script to collect configuration and performance data on each database. The data collected can later be analyzed with the WebSphere Commerce DB2 Report tool. The reports will show you, for example, how fast queries run on the staging side, or whether stagingprop is running into lock contention or is overloading the production database.
See this blog post to get started with the tool: db2report Makes it Easy to Troubleshoot your Database.
- Access to query the STAGLOG table
At any time, you can query the STAGLOG table to find the number of records pending propagation.
SELECT STGTABLE, STGOP, COUNT(*) FROM STAGLOG WHERE STGPROCESSED = 0 GROUP BY STGTABLE, STGOP ORDER BY 3 DESC
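While a propagation is active, the same table also works as a rough progress indicator: as each transaction commits, the corresponding STAGLOG records are marked as processed (more on this below), so a simple count of unprocessed records, run at intervals, shows how fast the backlog is shrinking:

SELECT COUNT(*) FROM STAGLOG WHERE STGPROCESSED = 0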
Key tuning parameters: Transaction and batch size
We'll start with transaction and batch size, which are key parameters for tuning the propagation. If they are not set correctly, you might experience long propagation times, errors and increased impact on the live site.
The -transaction parameter controls how often changes are committed on the production database. If the parameter is not specified, stagingprop defaults to "one", which means a single commit is done at the end of the job. Although this has the advantage of an "all-or-nothing" approach, where all changes are rolled back in case of error, if the propagation is large, a single very large transaction puts a lot of pressure on the database. A large transaction can exhaust the transaction log space and lead to errors, it might require a very long rollback in case of failure, and locks are held for an extended period of time, which can impact the live store.
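If you are worried about log space, DB2 lets you watch transaction log consumption on the production database while the propagation runs. As an example, assuming the SYSIBMADM administrative views are enabled on your database, the LOG_UTILIZATION view reports the current usage:

SELECT LOG_UTILIZATION_PERCENT, TOTAL_LOG_USED_KB, TOTAL_LOG_AVAILABLE_KB FROM SYSIBMADM.LOG_UTILIZATION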
A common approach is to commit after every n rows. Good starting points are values between 25K and 50K.
If you are thinking from a pure database standpoint, a commit operation is quick, so using a smaller number shouldn't hurt. The reality with stagingprop is a bit different: the transaction size not only determines when commits are done on the production database, but also how frequently queries are executed against the staging database to fetch more data, and when the records in the STAGLOG table are marked as processed. If the query to fetch data from staging takes several seconds, executing it more frequently can increase the total execution time.
For example, suppose you want to propagate 5 million rows and the query to fetch data from the staging database takes 3 seconds. With a transaction size of 1,000, the query needs to execute 5,000 times, accounting for more than 4 hours (5,000,000 / 1,000 = 5,000 executions × 3 seconds). If you change the transaction size to 50,000, the query needs to run only 100 times, accounting for just 5 minutes.
So why not use a very large transaction size? It's a matter of balance. As you use a larger transaction size, you can run into the same problems I discussed for the -transaction one setting.
The -batchsize parameter is sometimes confused with the transaction size, but it does not control transactions. It determines how many statements are grouped together before they are sent to the database server.
Take an INSERT operation, for example. If the batch size is 1, stagingprop goes to the database for every single statement. With batching, stagingprop can group a number of statements and send them all at once. Reducing this back-and-forth communication saves time, especially if there is a slow network connection between the stagingprop utility and the databases.
The transaction size needs to be a multiple of the batch size. For example, if the -transaction parameter is set to 50K, a -batchsize of 5K is a fair starting point.
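Putting the two parameters together, a typical invocation could look like the following sketch. The database names, user IDs and passwords are placeholders you would replace with your own values:

stagingprop -scope _all_ -sourcedb STAGDB -sourcedb_user stguser -sourcedb_passwd stgpwd -destdb PRODDB -destdb_user produser -destdb_passwd prodpwd -transaction 50000 -batchsize 5000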
In the next blog of this series, I will discuss how to speed up stagingprop on the staging side, and how to reduce the impact of propagations on the live site.