This document applies to Rational DOORS Next Generation 6.0.6 iFix003 and later.
It is possible to construct a query across a large data set which runs for a long period of time. Such a long-running query may consume resources and cause the system to become unstable.
A new Rogue Query Monitor has been introduced to capture these queries.
It is now possible to abort a long-running query before it impacts system stability.
Queries which exceed a specified timeout produce a warning message. The warning is displayed to the end user in the web UI and is also written to rm.log on the server. Since 6.0.6 iFix003 it is possible to automatically abort queries that exceed a defined timeout.
Note: Some RM internal queries are excluded from monitoring because they are expected to be long-running. These include ETL jobs for Reporting.
This document details how to use the timeout values and how to disable this feature, if required.
The following steps apply to the new Rogue Query Monitor functionality introduced as a stability fix in Rational DOORS Next Generation 6.0.6 iFix003.
The RM server uses the following advanced properties for the Rogue Query Monitor:
1) Rogue Query Monitor run interval
- Name - Rogue Query Monitor run interval (in seconds)
- Description - Run interval in seconds for the Rogue Query Monitor. A value of 0 will disable the Rogue Query Monitor
- Default value - 30 seconds
- Changing the value requires a server restart to become active
- A value of 0 seconds means that no query monitor runs. This restores the pre-iFix003 behaviour and should only be used under the direction of IBM Support.
2) Rogue Query timeout
- Name - Rogue SPARQL Query abort timeout (in ms)
- Description - Enables the RM Rogue Query Monitor to abort exceedingly long running SPARQL queries. If the execution of a query exceeds the amount of time in milliseconds specified here, the query execution will abort to avoid locking up the server
- Default value - 60000 (1 minute)
- Can change value on running server
- No minimum setting. The value is added to the client timeout value (or deducted from it, for negative values), so settings such as 0 or -1 (ms) are accepted
3) Web UI Query timeout (exists already)
- Name - query.client.timeout
- Description - Time after which the web UI expects a query to time out
- Default value - 30000 milliseconds (30 seconds)
Allowed runtime calculation
Logic used by the RM Rogue Query Monitor to calculate query abort:
Query start time + (Web UI Query timeout + Rogue Query timeout) < current time : abort query in Jena using current Thread Id
Using this calculation:
- The minimum query runtime with the default settings is (30 seconds client timeout + 60 seconds rogue timeout) + 1 second for the rogue query monitor interval = 91 seconds
- The maximum query runtime with the default settings is (30 seconds client timeout + 60 seconds rogue timeout) + 30 seconds for the rogue query monitor to kick in = 120 seconds (plus a minimal delay while the query monitor iterates through each of the queries running at that moment)
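The calculation above can be sketched in Python as follows. This is an illustration only, not the RM server's implementation: the names RogueQuerySettings, should_abort and runtime_bounds_s are hypothetical, and the sketch assumes a query is aborted once its start time plus the combined timeouts lies in the past. The "+ 1 second" lower bound simply mirrors the figure quoted in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RogueQuerySettings:
    """Hypothetical container for the three properties described above."""
    client_timeout_ms: int = 30_000  # query.client.timeout (default)
    rogue_timeout_ms: int = 60_000   # Rogue SPARQL Query abort timeout (default)
    run_interval_s: int = 30         # Rogue Query Monitor run interval (default)

    @property
    def allowed_runtime_ms(self) -> int:
        # A negative rogue timeout is deducted from the client timeout.
        return self.client_timeout_ms + self.rogue_timeout_ms


def should_abort(start_ms: int, now_ms: int, settings: RogueQuerySettings) -> bool:
    """Return True when a monitor pass at now_ms would abort the query."""
    if settings.run_interval_s == 0:
        # A run interval of 0 disables the monitor entirely.
        return False
    return start_ms + settings.allowed_runtime_ms < now_ms


def runtime_bounds_s(settings: RogueQuerySettings) -> tuple[int, int]:
    """Approximate (minimum, maximum) runtime in seconds before an abort."""
    allowed_s = settings.allowed_runtime_ms // 1000
    # Minimum: the monitor runs just after the allowed runtime has elapsed.
    # Maximum: the monitor ran just before, so a full interval passes first.
    return allowed_s + 1, allowed_s + settings.run_interval_s


if __name__ == "__main__":
    defaults = RogueQuerySettings()
    print(runtime_bounds_s(defaults))
    print(should_abort(start_ms=0, now_ms=97_000, settings=defaults))
```

With the default values this reproduces the 91 second and 120 second bounds given above. Raising the rogue timeout (for example to let a long export finish) shifts both bounds by the same amount.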
The RM admin debug page for running SPARQL queries now has a checkbox to allow for long-running queries.
Additional logging accompanies this functionality and is enabled by default. Advanced logging can be enabled via a log4j property.
Informational logging that will be available in rm.log:
At server startup:
CRRRS8752I The RM query monitor task started. The run interval for the task is set to 30 seconds. The maximum query run time is set to 1 minute 30 seconds.
CRRRS8753I The RM query monitor task is disabled. The run interval for the task is set to 0 seconds.
When a rogue query is detected:
CRRRS8754W The RM query monitor detected and will cancel a query that started running at 8/23/18 3:50 PM and has been running for 1 minute 37 seconds. The query ID is 33f9b07d-d771-437a-854b-3a3437a2b0ed with thread 574.
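When reviewing rm.log after a report of aborted queries, the CRRRS8754W entries can be extracted with a small script. The following Python sketch assumes the message text matches the example above. The pattern and the function name find_rogue_queries are hypothetical, and real log lines may carry a timestamp or logger prefix, which the search tolerates.

```python
import re

# Pattern derived from the CRRRS8754W example message shown above (an assumption,
# not an official message specification).
ROGUE_QUERY_PATTERN = re.compile(
    r"CRRRS8754W .*?started running at (?P<started>.+?) "
    r"and has been running for (?P<duration>.+?)\. "
    r"The query ID is (?P<query_id>[0-9a-fA-F-]+) with thread (?P<thread>\d+)"
)


def find_rogue_queries(lines):
    """Return one dict per CRRRS8754W entry found in the given log lines."""
    hits = []
    for line in lines:
        match = ROGUE_QUERY_PATTERN.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    # "rm.log" is a placeholder path; point it at the server's actual log file.
    with open("rm.log", encoding="utf-8", errors="replace") as log_file:
        for hit in find_rogue_queries(log_file):
            print(hit["started"], hit["duration"], hit["query_id"], hit["thread"])
```

The query ID and thread number from each hit can then be passed to IBM Support when a timeout adjustment is being considered.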
Debug logging can be invoked in order to fully understand what is occurring when operations and queries are timing out, via:
If you are asked to run a specific tool by IBM Support to troubleshoot a situation, or run a corrective procedure, you may need to adjust these settings. If it appears that nothing is happening in the GUI after 2 minutes, then please refer to the rm.log.
An example is ReqIF export. Jazz.net defect: https://jazz.net/jazz03/web/projects/Requirements%20Management#action=com.ibm.team.workitem.viewWorkItem&id=126574/ APAR PH05142 is an example of a user action impacted by this. The advice is to increase the rogue query timeout to a value which allows your exports to complete.
IBM Support will advise whether to turn off this feature temporarily, or whether an adjustment to the time-out will suffice.
RDNG; Rational DOORS Next Generation
13 November 2018