Hats off to the business rules support team!
This morning the email below caught my eye, as it typifies the high-quality diagnostic work of the support team. Jun is helping troubleshoot an out-of-memory (OOM) issue with Rule Team Server (RTS). A key component of the success of JRules is our ability to support customers and engineers in the field, and to file high-quality (actionable) bug reports with R&D when necessary.
In my previous email I gave some suggestions based on the information we received. What kind of diagnostic steps have been taken on the customer's side? Do you have any additional findings?
The server logs alone are not sufficient for us to draw any conclusions.
In order to effectively diagnose the problem, we must first obtain as much background information as possible.
What we need to determine is whether the OOM is caused by a memory leak or a lack of resources given the current load.
We can start by addressing the following:
- What is the current heap size?
- Does increasing the heap size help avoid the problem?
- What is the size of the rule project? e.g. number of rule artifacts, decision tables, variables, dependencies, etc.
- As I pointed out earlier, the server logs show a NullPointerException (NPE) and a ConcurrentModificationException (CME) preceding the OOM. Does the OOM always follow these particular errors?
- What type of activity leads to the NPE, CME or OOM? Do you have steps to reproduce these errors? Can you provide sample projects along with the steps that would allow us to replicate the problem?
- Do you see any evidence of a possible memory leak? If so, do you have any profiling data that you can share with us? A typical indication of a leak is memory utilization that keeps increasing over time, is not reclaimed by the garbage collector (GC), and is out of proportion to the load.
- Have any customizations been made to RTS?
- Do you observe any memory fragmentation in the heap dump?
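To make the first and the leak-related questions concrete, here is a minimal sketch of the measurements involved, using only the standard `java.lang.management` API. It assumes you can run code in the same JVM as RTS (for example from a small diagnostic page); in practice the same MXBeans are usually read remotely over JMX, or the data comes from the JVM's verbose GC log. The class name `HeapTrend`, the sample count and the interval are illustrative choices, not part of RTS or JRules. It reports the configured heap size and then tracks heap usage measured right after garbage collection, since a baseline that keeps climbing under steady load points towards a leak, while a flat baseline that only overflows at peak load points towards an undersized heap.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapTrend {

    /** Heap used right after the most recent GC, summed over all heap pools. */
    static long usedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            // getCollectionUsage() returns null if the pool does not track post-GC usage
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) total += afterGc.getUsed();
        }
        return total;
    }

    /** Least-squares slope of the samples, in bytes per sampling interval. */
    static double slope(long[] samples) {
        int n = samples.length;
        if (n < 2) return 0.0;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long s : samples) meanY += s;
        meanY /= n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    public static void main(String[] args) throws InterruptedException {
        // Current heap size: max may be -1 if no limit is defined
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("max heap: %d MB, committed: %d MB, used: %d MB%n",
                heap.getMax() >> 20, heap.getCommitted() >> 20, heap.getUsed() >> 20);

        // Illustrative defaults; a real investigation would sample over hours of load
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        long intervalMs = args.length > 1 ? Long.parseLong(args[1]) : 1000;
        long[] samples = new long[count];
        for (int i = 0; i < count; i++) {
            samples[i] = usedAfterLastGc();
            if (i < count - 1) Thread.sleep(intervalMs);
        }
        System.out.printf("post-GC baseline trend: %.0f bytes per sample%n", slope(samples));
    }
}
```

The slope is only a heuristic: a positive trend over a few samples can simply reflect a warming cache, so it is most meaningful across many GC cycles under a steady workload, and a heap dump is still needed to identify what is being retained.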
The following link introduces a troubleshooting tool for OOM, provided by IBM. This could provide a good starting point for diagnostics on your customer's side.
Besides the questions above, please share with us any additional findings or observations you have made. As soon as we have this information, we will be able to move forward with the investigation and provide possible leads.