Troubleshooting invalidations can be a daunting task. After it is created, an invalidation ID travels across multiple components, products and servers to ensure that all the cache entries associated with it are removed. When an invalidation fails, the challenge is to work out exactly which component is breaking, which can be very time consuming and can require relatively expensive logging and tracing.
Next I'll walk you through the basic invalidation flows and show you how, by understanding the different components and asking the right questions, you can narrow down the investigation and simplify the troubleshooting.
The invalidation process
Think of it as a relay race that starts with WebSphere Commerce. In the most common scenario, data updates on the production database by a staging propagation cause triggers to capture the changes and record them in the form of invalidation IDs in the CACHEIVL table. When the DynaCacheInvalidation scheduler job runs, it looks for new entries in CACHEIVL and issues the invalidations to Dynacache. If the cache instance is configured with WebSphere eXtreme Scale (WXS), the WXS libraries handle the invalidation and pass it over to the WXS containers. Instead, if the cache instance is configured with Dynacache, the entries are invalidated from memory and disk. If the cache instance is configured to replicate invalidations, the DRS component creates invalidation messages and relies on the High Availability manager to distribute them to the other servers in the cluster.
The process in the Search servers is a bit different. There is no scheduler and therefore no DynaCacheInvalidation job. As requests are received, the Search servers look in the CACHEIVL table for updates and process them. Because each Search server processes its own invalidations, there is no need for cache replication (DRS). See this other post for details: The Data Cache Moves to the Search Server.
Synchronization of JSP invalidations and Search
Invalidations recorded into the CACHEIVL table are issued as soon as the DynaCacheInvalidation job runs. If the invalidation is fired and the JSP re-executed (and re-cached) before the Indexprop process is finished, the JSPs can be re-cached using old Search data. To avoid this scenario, indexprop inserts a special restart entry in CACHEIVL that instructs the DynaCacheInvalidation job to go back in time and re-issue invalidations. The restart entry has the following format:
INSERT INTO cacheivl ( template, dataid, inserttime ) VALUES ( 'restart:', 'restartTime=<time_to_restart_from>', CURRENT TIMESTAMP )
The following are common reasons why stale content might remain on the site after invalidations are issued. We'll discuss troubleshooting later in the post.
- Problems with the invalidation IDs
- Invalidations are not created
The process that updates the data does not trigger an invalidation, either directly or through the CACHEIVL table. This can happen, for example, if you enable the Data Cache but do not install the invalidation triggers.
- Invalidation IDs do not match
The fragment to be invalidated is not associated with the invalidation ID that is issued. For example, the data ID inserted in the CACHEIVL table (dataid column) does not match the dependency ID defined in cachespec.xml for that cache entry.
- Invalidations are not complete
You issue an invalidation for a fragment without invalidating its top-level cache. Then, even though the lower level cache is refreshed, changes are not reflected in the store front until the top level caches are refreshed as well.
- Invalidations are not in the right order
Here, although all invalidations are issued, they are not issued in the right order. If a top-level cache entry is refreshed before a lower-level cache entry, the top-level entry might be re-created using old data.
- Problems issuing or distributing the invalidations
Under this scenario, the correct invalidations are created (e.g. inserted in CACHEIVL), but either they are not processed by the DynaCacheInvalidation job, or Dynacache/WXS fails to process and replicate them as expected.
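To make the "not complete" and "not in the right order" cases concrete, here is a small Python toy model. It is only a sketch: ToyStore, scenario and the cache names are hypothetical and have nothing to do with the real Dynacache APIs. It simply mimics a top-level page entry that embeds a cached lower-level fragment.

```python
class ToyStore:
    """Two-level toy cache: a page entry embeds a cached price fragment."""

    def __init__(self, price):
        self.price = price        # stands in for the database value
        self.fragment_cache = {}  # lower-level cache (hypothetical name)
        self.page_cache = {}      # top-level cache (hypothetical name)

    def render_fragment(self):
        if "price" not in self.fragment_cache:
            self.fragment_cache["price"] = f"price={self.price}"
        return self.fragment_cache["price"]

    def render_page(self):
        if "page" not in self.page_cache:
            self.page_cache["page"] = f"<page>{self.render_fragment()}</page>"
        return self.page_cache["page"]


def scenario(steps):
    """Warm the caches, update the data, replay the steps, return the page served."""
    store = ToyStore(price=10)
    store.render_page()   # both caches now hold the old price
    store.price = 12      # data update, e.g. after a propagation
    for step in steps:
        if step == "request":
            store.render_page()          # a shopper hits the page
        else:
            getattr(store, step).clear() # invalidate the named cache
    return store.render_page()


# Only the fragment is invalidated: the page entry still holds old data.
incomplete = scenario(["fragment_cache"])
# The page is invalidated first and re-cached before the fragment is refreshed.
wrong_order = scenario(["page_cache", "request", "fragment_cache"])
# Lower level first, then the top level: the page shows the new price.
correct = scenario(["fragment_cache", "page_cache"])
```

In this model only the last sequence serves the updated price, which is the behaviour the two bullets above warn about.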
Troubleshooting store pages not refreshed
Next I list a number of actions and concepts that will help you see why stale pages remain in the store after invalidating. Caching can happen at multiple levels. The following points assume a problem with the caches within the Commerce servers. Before using these steps, ensure that the stale content is not being returned from browser cache or an edge cache service such as Akamai.
Validate and test your configuration
Before getting into troubleshooting, it's a best practice to validate your caching configuration. See these documents:
- Enabling the dynamic cache service and servlet caching
- Enabling WebSphere Commerce data cache
For non-WXS caches, ensure the DRS replication domain is configured. This applies not only to the base cache and the data cache (cacheinstances.properties), but also to all of the distributed maps that are defined out-of-the-box.
Test your configuration by emptying each cache instance (clear cache) or specific cache entries, and ensuring the change is reflected on all of the other JVMs. For DRS-enabled local caches, this can be done using the WebSphere Extended Cache Monitor.
Using the WebSphere Cache Monitor for WXS caches is not recommended and can lead to out-of-memory (OOM) problems. You can clear a WXS cache using WXS's own cache monitor. To test invalidations on the WXS caches from a WCS JVM, see the sample code in the section below titled "Cache invalidation sample code".
For more info about DRS, see this post: Invalidating Cache: DRS.
Notes about cache clear
Whether you clear the cache for testing or as part of your standard daily procedure, keep the following in mind:
- Using the WebSphere Cache Monitor: The "Clear Cache" button in the WebSphere Cache Monitor only empties the currently selected cache instance. To completely clear the cache, you need to select and clear each cache instance. WXS caches need to be cleared using WXS's own cache monitor.
- Using clearall with the DynaCacheInvalidation job clears the base cache, as well as all of the data cache object caches. Custom caches need to be explicitly added to CACHEIVL using dmap:services/cache/MyCustomMapCach as the template. Other out-of-the-box caches that are not part of the data cache, such as DM_Cache, IVCache, Price and PriceRule are not cleared with clearall. See the documentation of the DynaCacheInvalidation job for more details.
- If the server is receiving traffic as you test, chances are that by the time you open another JVM's cache monitor, that server has already cached multiple entries and the cache won't be empty. This can be confusing, especially with small caches that can fill up in a matter of seconds or minutes.
- The DynaCacheInvalidation job is scheduled out-of-the-box to run every 10 minutes. You could therefore see a delay from the time the invalidations are added to CACHEIVL to the time they are effectively issued.
Narrowing down the scenario
Because the invalidation process is complex, it is important to narrow down the scenario before you troubleshoot; this helps you focus the testing and reduces the troubleshooting time. These are questions you can try to answer before having to enable tracing:
- What is the content that is not refreshed? Does it affect all URLs or only certain data or fragments? For example, you could find that the complete product page is stale, or only certain aspects of it such as price.
- Are all the caches affected, or only certain ones? Compare the base cache with the data cache, and local caches with eXtreme Scale caches (if applicable).
- Are all servers affected? In a cluster environment, servers might have different configurations. Dynacache could also be failing to issue remote invalidations to certain servers.
- Do the invalidations work on the server where they are originally issued? If invalidations always appear to work on the server where they are originally issued (e.g. on the server where the DynaCacheInvalidation scheduler job runs), and the cache is a local cache configured with DRS (to replicate the invalidations), this can help you narrow down the problem to cache replication.
Constant and Intermittent problems
Problems that can be reproduced every time typically relate to configuration issues. Following the steps in section "Validate and test your configuration" will often help uncover which caches are misconfigured.
Intermittent problems can be a lot more complex to troubleshoot. The problem could sometimes be related to load. See this post for details on how to control the number of invalidations that are issued by the DynaCacheInvalidation job: Avoiding invalidation overload after propagations.
Keep in mind that problems that only affect certain servers can at first appear to be intermittent, because they depend on which server your session is associated with. To rule this out, check which server you are connected to when the problem happens. See Identifying the WCS server you are using.
Verifying the invalidation IDs
Cache invalidations can be issued by cache ID, dependency ID, or cache clears.
Cache IDs are unique for each object and they are used to determine if an object is in the cache. The ID is formed from the components of each cache entry as defined in the cachespec.xml file.
To simplify invalidations, cache entries can be associated with dependency IDs. A single cache entry can be associated with multiple dependency IDs, and a dependency ID can be associated with multiple cache entries.
When using the CACHEIVL table or the Cache Monitors to invalidate, the invalidation ID needs to match at least one of the dependency IDs associated with the cache entry.
Verify the following:
- Ensure the invalidation ID exists in the CACHEIVL table. The CACHEIVL table is typically populated with triggers that fire as data is updated (e.g. from stagingprop). Data cache triggers are provided out-of-the-box (see invalidation triggers). Triggers to invalidate JSP pages are typically implemented and installed as part of the customization.
For example, list the ProductDisplay:productId dependencies inserted during the past hour:
SELECT * FROM WCS.CACHEIVL WHERE INSERTTIME > CURRENT TIMESTAMP - 1 HOUR AND DATAID LIKE 'ProductDisplay:productId:%'
If the invalidation ID does not exist in CACHEIVL, or it is not issued by other means, then the invalidation will not happen.
- Ensure the cache entries are associated with a dependency ID that matches the invalidation ID. The following is an example of how the dependency ID is created in cachespec.xml. Notice that some components are constant, and others depend on parameter values (productId).
<component id="" ignore-value="true" type="pathinfo">
<component id="productId" type="parameter">
To see the actual dependency IDs associated to each cache entry, you can use the Cache Monitors.
- Caching is often present at multiple levels. A single request could rely on full page cache, fragment cache and data cache. Invalidations of lower-level caches, such as price, won't be reflected in the store if the data is included in a higher-level cache entry such as a JSP fragment or a full page.
Once you identify the piece of data that is not being refreshed, use the cachespec.xml file and the Cache Monitor to understand the different cache entries that include that data, and ensure there are corresponding invalidation IDs for those cache entries as well.
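The matching rule above can be sketched in a few lines of Python. This is a simplified model, not Dynacache's implementation: build_dependency_id and find_unmatched are hypothetical helpers, and I am assuming the dependency ID is the constant base followed by the parameter values, joined with colons, which is consistent with the 'ProductDisplay:productId:%' pattern used in the query earlier.

```python
def build_dependency_id(base, param_names, params):
    """Join the constant base with the values of the named request parameters.

    Returns None when a parameter is missing, in which case this model
    produces no dependency ID for the entry.
    """
    parts = [base]
    for name in param_names:
        if name not in params:
            return None
        parts.append(str(params[name]))
    return ":".join(parts)


def find_unmatched(dataids, entry_dependency_ids):
    """Return the CACHEIVL data IDs that match none of the entry's dependency IDs."""
    known = set(entry_dependency_ids)
    return [dataid for dataid in dataids if dataid not in known]


# Hypothetical request parameters for a product page.
request_params = {"productId": "10001", "storeId": "10051"}
dep_id = build_dependency_id("ProductDisplay:productId", ["productId"], request_params)

# One data ID matches exactly; the other is missing the constant prefix.
unmatched = find_unmatched(
    ["ProductDisplay:productId:10001", "productId:10001"],
    [dep_id],
)
```

The point of the sketch is that the match is an exact string comparison: a data ID that differs in prefix, case or separator from the dependency ID shown in the Cache Monitor will not invalidate the entry.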
Opening a JSP cache in the WebSphere Cache Monitor will show its contents.
Entries like this show JSP fragments that were consumed into the current cache entry (merged with):
[CONSUMED include: /Aurora/Widgets/ESpot/ContentRecommendation/ContentRecommendation.jsp?storeId=10051&catalogId=10101&emsName=FooterRight_Content&fromPage=footer]
Entries without CONSUMED mean that the fragment is linked to and included at execution time.
Verifying the DynaCacheInvalidation Job
The DynaCacheInvalidation Job reads invalidation IDs from the CACHEIVL table and issues them to Dynacache. Verify the job is running as follows:
- Ensure the Job is correctly scheduled and running:
Use the Commerce Administrative Console, or a SQL query such as this one, to show the past jobs:
SELECT c.sccjobrefnum, s.scsactlstart, s.scsend, s.scsstate, s.scsresult, s.scsqueue
FROM schconfig c, schstatus s
WHERE c.sccjobrefnum = s.scsjobnbr
AND c.sccpathinfo = 'DynaCacheInvalidation'
ORDER BY 2 DESC
- This query will show the next time and server on which the DynaCacheInvalidation job will run:
SELECT c.sccjobrefnum, c.sccquery, c.scchost, c.sccinterval, a.scsprefstart
FROM schconfig c, schactive a
WHERE c.sccjobrefnum = a.scsjobnbr
AND c.sccpathinfo = 'DynaCacheInvalidation'
- To avoid re-executing all the invalidations in the CACHEIVL table, the job uses the values of the startTime and startTimeNanos parameters, which are stored in SCHCONFIG.SCCQUERY.
Only entries with INSERTTIME newer than the time specified with startTime and startTimeNanos will be processed the next time the job runs. The value of startTime is a Java date expressed in milliseconds since January 1, 1970 UTC. There are several websites that will convert the milliseconds into a date for you.
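If you prefer not to paste values into a website, a few lines of Python do the conversion. This is a sketch under one assumption: that the SCCQUERY value is in query-string form (name=value pairs separated by &). The sample value below is made up for illustration, and job_start_time is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def job_start_time(sccquery):
    """Return the startTime parameter (epoch milliseconds) as a UTC datetime."""
    params = parse_qs(sccquery)
    millis = int(params["startTime"][0])
    # timedelta keeps the arithmetic exact, with no float rounding
    return EPOCH + timedelta(milliseconds=millis)


# Hypothetical SCCQUERY value, not taken from a real system.
sample = "startTime=1435708800000&startTimeNanos=0"
start = job_start_time(sample)
```

The result is in UTC; compare it against INSERTTIME with the database server's time zone in mind.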
Note: If the job finds a 'restart' entry inserted after the startTime, the process will go back and re-issue invalidations.
When troubleshooting invalidations, it's typically easier to start by bypassing the DynaCacheInvalidation job and issuing the invalidations directly with the Cache Monitor. If the invalidation from the Cache Monitor works, then the problem is likely in the DynaCacheInvalidation job and you should review it further. Otherwise, the problem is likely not in the job and you need to troubleshoot the Dynacache or WXS configurations further. To issue invalidations against a WXS cache, see the section below titled "Cache invalidation sample code".
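To picture the restart behaviour, here is a toy Python model of how a poller like the DynaCacheInvalidation job could select entries. It is not the job's actual implementation: select_invalidations is a hypothetical name, the timestamps are plain integers, and I am assuming that restartTime carries a value directly comparable to INSERTTIME.

```python
def select_invalidations(entries, start_time):
    """Pick the data IDs a run would issue in this toy model.

    entries: list of (inserttime, template, dataid) tuples.
    start_time: entries inserted at or before this time are skipped,
    unless a newer 'restart:' entry moves the window back.
    """
    new = [e for e in entries if e[0] > start_time]
    restarts = [int(e[2].split("=", 1)[1]) for e in new if e[1] == "restart:"]
    if restarts:
        # Go back in time: widen the window to the earliest restart time.
        effective = min([start_time] + restarts)
        new = [e for e in entries if e[0] > effective]
    return [e[2] for e in new if e[1] != "restart:"]


# Hypothetical CACHEIVL rows, with integer times for readability.
rows = [
    (100, "", "ProductDisplay:productId:1"),
    (200, "", "ProductDisplay:productId:2"),
    (300, "restart:", "restartTime=50"),
]

without_restart = select_invalidations(rows[:2], start_time=150)
with_restart = select_invalidations(rows, start_time=150)
```

Without the restart row, only the entry inserted after startTime is issued. With it, the earlier entry is issued again, which is what lets JSPs that were re-cached against old Search data be invalidated a second time.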
Verifying Dynacache and DRS
The WebSphere Dynacache APIs receive the invalidation. If the cache instance is configured with a WXS back-end, the local WXS libraries forward the invalidation to the WXS container. If the cache instance is using local caches, the invalidation is processed and the associated entries are removed from the caches in memory and on disk. If the cache is associated with a replication domain, DRS is used to replicate the invalidation across the other servers in the cluster.
To verify Dynacache and DRS:
Validate if the cache entries are removed on the server where the DynaCacheInvalidation job ran.
You can find the server where the last instance of the DynaCacheInvalidation job ran by looking into the SCSQUEUE column in the SCHSTATUS table. You can also fix the job to a specific server by following these steps: Configuring the scheduler to run a job on an instance or cluster member.
If the invalidation works on the local server but is not propagated to the other servers, then there is likely an issue with the configuration of the replication domain or the replication settings. As a next step you might need to enable tracing on both the JVM where the DynaCacheInvalidation job ran and the JVM that was supposed to receive the invalidation (see "When everything else fails, enable tracing" below).
If the invalidation fails on the local server, validate the DynaCacheInvalidation job and use the Cache Monitor to issue the invalidations. If required, enable tracing on that JVM (see "When everything else fails, enable tracing" below).
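The checks in this section and the previous one boil down to a small decision table. The Python sketch below only restates that reasoning; next_step and its three yes/no inputs are hypothetical names of my own, and the returned strings are hints on where to look next, not diagnoses.

```python
def next_step(works_via_cache_monitor, works_on_local_server, works_on_remote_servers):
    """Suggest where to look next, based on three observations.

    works_via_cache_monitor: an invalidation issued from the Cache Monitor removes the entry.
    works_on_local_server: entries are removed on the server where the job ran.
    works_on_remote_servers: entries are also removed on the other cluster members.
    """
    if works_on_local_server and not works_on_remote_servers:
        return "review the replication domain and replication settings (DRS)"
    if not works_on_local_server:
        if works_via_cache_monitor:
            return "review the DynaCacheInvalidation job"
        return "review the Dynacache or WXS configuration"
    return "recheck the invalidation IDs and the higher-level caches"


hint = next_step(
    works_via_cache_monitor=True,
    works_on_local_server=True,
    works_on_remote_servers=False,
)
```

Answering these three questions first usually tells you which JVMs are worth tracing.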
Cache invalidation sample code
The WebSphere Cache Monitor can be used to issue invalidations and cache clears on DynaCache managed caches. If you are using a WXS cache, cache invalidations need to be issued from the eXtreme Scale monitor instead.
For troubleshooting purposes, it can be useful to issue invalidations to WXS caches from a WebSphere Commerce JVM. This allows you to bypass the CACHEIVL/DynaCacheInvalidation job and simplify the scenario.
Using the sample code:
- Download the sample code
- Extract the JSPs into a WAR that is not available externally, such as CacheMonitor.war
- Open depid_select.jsp in a browser
- Select the cache instance on which the invalidation will be issued, and either enter the dependency ID or select "Issue Full Cache Clear?"
When everything else fails, enable tracing
If you are still unable to figure out why the invalidations are failing, the next step is to open a PMR and contact support.
The relevant MustGather documents are:
- MustGather: WebSphere eXtreme Scale
- MustGather: Dynamic Cache Issues in WebSphere Commerce V7.0
Remember these tips:
- Always trace the simplest scenario: For example, if you are able to reproduce the problem between two machines using the Cache Monitor, there is probably no need to collect trace on all the JVMs, or to use the CACHEIVL table.
- If you can reproduce the problem with WXS and local caches, choose one. Using a non-WXS cache is probably simpler.
Reproducing the problem:
- Level 2 (L2) support needs sample invalidation IDs to track through the system. Select a few samples. For DRS caches you can also test with cache clears.
- Collect a timestamped screenshot of the cache monitor (either WebSphere or WXS) showing the entries to be invalidated. For WXS, ensure the screenshot includes the key and partition ID.
- If using CACHEIVL, export the table or the sample IDs. Remember that if you can reproduce the problem using the Cache Monitors, this simplifies the scenario.
- Enable Traces on the WCS JVMs as per this technote:
MustGather: Dynamic Cache Issues in WebSphere Commerce V7.0
- If using WXS, enable trace in the containers:
- Reproduce the problem. Take note of the time
- Get new timestamped screenshots showing the cache entries that were not invalidated
- Disable the traces
When opening the PMR, upload the traces, screenshots, descriptions and all configuration logs mentioned in the MustGather documents.
I know this is a lengthy article, but the time spent reading it is a good investment considering how time consuming troubleshooting invalidation problems can be. I also recommend you check out the other invalidation posts to learn, for example, how the data cache invalidation IDs are created, or how marketing invalidations work.