Common issues during Peak Part 1
In my previous blog I talked about the high-level steps you should take to make sure your WebSphere Commerce environment is prepared for peak season. In this blog I will highlight some of the issues we see every year, and the steps you can take to help prevent them.
Misconfigured WebContainer thread pool
The size of the WebContainer thread pool determines the number of web container threads that can run concurrently in the WebSphere Commerce JVM. Too many threads and you run the risk of overwhelming the JVM; too few and requests can pile up waiting for a free thread. The minimum and maximum sizes should be set equal, to avoid the overhead of creating and destroying threads. Generally speaking, 25 web container threads is a good starting point. You should also ensure that the isGrowable parameter is set to false, to prevent native memory issues.
For systems with multiple processors, start with 5 threads per CPU. For example, on a 4 CPU system, start with 20 web container threads.
Connection Pool setting:
Use the following equation to determine the DataSource connection pool size:
DataSource Connection Pool Size >= (# Of WebContainer Threads)
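The two rules of thumb above can be expressed as a tiny sizing helper. This is only an illustration of the arithmetic (the class and method names are mine, not part of WebSphere); the real values should always be confirmed under load testing.

```java
// Illustrative sizing helper for the rules of thumb above: 5 threads
// per CPU as a starting point, and a data source pool at least as
// large as the thread pool so no thread waits on a free connection.
public class PoolSizing {

    static final int THREADS_PER_CPU = 5;

    // Starting point for the WebContainer thread pool (set min == max).
    static int recommendedWebContainerThreads(int cpus) {
        return cpus * THREADS_PER_CPU;
    }

    // The DataSource pool should be >= the WebContainer thread count.
    static int minimumDataSourcePoolSize(int webContainerThreads) {
        return webContainerThreads;
    }

    public static void main(String[] args) {
        int threads = recommendedWebContainerThreads(4);
        System.out.println("WebContainer threads: " + threads);
        System.out.println("DataSource pool size >= "
                + minimumDataSourcePoolSize(threads));
    }
}
```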
Incorrect Cache Sizes
Tuning caching is a difficult process that requires constant review, especially in the run-up to peak. If a cache instance is full, Dynacache must off-load entries before it can add a new cache entry. If you see a large number of cache entries being off-loaded, the benefit of caching is reduced, because requests that should be served from the cache are forced to re-execute and re-cache.
The solution is not always as easy as increasing the size of the cache. Larger caches require more memory, and this has implications with garbage collection and Java heap size tuning. A large cache can also increase maintenance times.
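The effect of an undersized cache can be sketched with a toy LRU cache. This is purely an illustration of the eviction behaviour described above, not Dynacache's actual implementation (Dynacache's eviction and disk off-load policies are more sophisticated):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache: once it is full, adding a new entry forces an older
// one out, and the evicted request must be re-executed and re-cached
// on its next hit. Illustration only; not Dynacache's real policy.
public class TinyLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;
    private int evictions = 0;

    public TinyLruCache(int maxEntries) {
        super(16, 0.75f, true); // access-order gives LRU behaviour
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        boolean evict = size() > maxEntries;
        if (evict) evictions++;
        return evict;
    }

    public int evictions() { return evictions; }

    public static void main(String[] args) {
        TinyLruCache<String, String> cache = new TinyLruCache<>(2);
        cache.put("page:home", "cached html");
        cache.put("page:cart", "cached html");
        cache.put("page:search", "cached html"); // full: oldest entry evicted
        System.out.println("evictions: " + cache.evictions()); // 1
    }
}
```

A monitoring view that shows evictions climbing steadily is the signal described above: either the cache is too small, or it is caching more than it should.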
To review your cache setup, please follow these blogs:
Out of Memory (OOM) errors
Debugging an OOM is not an easy task: there can be several reasons why you hit an OOM issue, and checking your system prior to peak is your best defense. For example, if you notice that your heap is nearly full in the run-up to peak, you should investigate whether the heap is filling because of a leak, the Large Object Area (LOA), or caching too many objects. If it is a leak, you do not want to simply increase the heap size, as it will eventually fill up and cause an OOM again; you need to find where the issue stems from. If you find that cached objects are taking up a lot of your heap, that is not necessarily a bad thing, but it is important to check whether the cache is holding so much space that none is left for other objects.
To help prevent, or be prepared for, an OOM, I would suggest installing the WebSphere Commerce health center tools mentioned in my previous blog and using them to check the health of your system and highlight areas of interest.
You should also have your environment ready in case of an OOM, with verbose GC enabled and the system set up to gather javacores and heapdumps/system cores during an issue, to help speed up resolution.
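As a sketch, the generic JVM arguments below enable verbose GC on an IBM J9-based WebSphere JVM. Treat the exact option syntax as an assumption to verify against your WebSphere and Java versions before applying it in production:

```
# Verbose GC written to a rotating set of 5 log files,
# each holding 10000 GC cycles (IBM J9 syntax).
-verbose:gc -Xverbosegclog:verbosegc.log,5,10000

# IBM JDKs trigger javacore/heapdump/system dump agents on
# OutOfMemoryError by default; -Xdump:what prints the active
# dump agents so you can confirm nothing has disabled them.
-Xdump:what
```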
To understand more about OOM issues and how they can be debugged, check out the following blogs:
Stagingprop tuning
During peak season, most customers will need to propagate new data to the production site. We have seen in the past that, without some basic tuning, the time needed to complete a large propagation can exceed the allocated window by hours or days, and the propagation can impact storefront performance and the shopping experience. The main parameters to focus on when tuning your stagingprop job are the transaction and batch sizes. If they are not set correctly, you might experience long propagation times, errors, and increased impact on the live site.
The -transaction parameter controls how often changes are committed on the production database. If the parameter is not specified, stagingprop defaults to "one", which means that a single commit is done at the end of the job.
This is not an ideal approach during peak: if the job fails, the whole propagation is rolled back.
The batch size determines how statements are grouped before they are sent to the database server. Take an INSERT operation as an example: if batchSize is 1, stagingprop makes a trip to the database for every statement. With batching, stagingprop can instead collect a number of statements and send them all at once. Reducing this back-and-forth communication can save time, especially if there is a slow network connection between the stagingprop utility and the databases.
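The arithmetic behind the two parameters can be sketched as follows. These helpers only model the counting described above (they are my illustration, not stagingprop's internals):

```java
// Back-of-the-envelope arithmetic for the two stagingprop parameters
// discussed above. Illustration only; not stagingprop's implementation.
public class StagingPropMath {

    // With batching, N statements need ceil(N / batchSize) round trips
    // to the database instead of N individual ones.
    static long roundTrips(long statements, int batchSize) {
        return (statements + batchSize - 1) / batchSize;
    }

    // Committing every `transaction` units of work means a failure
    // rolls back at most one commit interval, not the whole job.
    static long commits(long records, int transaction) {
        return (records + transaction - 1) / transaction;
    }

    public static void main(String[] args) {
        // 100,000 changed rows with batch size 100: 1,000 round trips
        System.out.println(roundTrips(100_000, 100)); // 1000
        // Committing every 1,000 records: 100 commits instead of 1
        System.out.println(commits(100_000, 1_000)); // 100
    }
}
```

The trade-off is visible in the numbers: a larger batch size cuts network round trips, while a smaller transaction setting limits how much work is lost (and re-run) if the job fails part-way through.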
To tune your stagingprop and ensure it is ready for peak, please review the following blog series:
Check out Part 2 for more common issues.