IBM Support

How to amend Rational DOORS Next Generation timeouts to manage long-running queries

How To


Summary

This document applies to Rational DOORS Next Generation 6.0.6 iFix 003 and later.

It is possible to construct a query across a large data set which runs for a long period of time. Such a long-running query may consume resources and cause the system to become unstable.

Objective

A new Rogue Query Monitor has been introduced to capture these queries.
It is now possible to abort a long-running query before it impacts system stability.

Queries which exceed a specified timeout produce a warning message. This is displayed to the end user in the web UI and is also written to rm.log on the server. Since 6.0.6 iFix 003 it is possible to automatically abort queries that exceed a defined timeout.

Note: Some RM internal queries are excluded from monitoring because we know these are expected to be long-running. These include ETL jobs for Reporting.

This document details how to use the timeout values and how to disable this feature, if required.

Steps

The following steps apply to the new Rogue Query Monitor functionality introduced as a stability fix within Rational DOORS Next Generation 6.0.6 iFix 003.

Advanced properties

The RM server uses the following advanced properties for the Rogue Query Monitor:

1) Rogue Query Monitor run interval

  • Name - Rogue Query Monitor run interval (in seconds)
  • Description - Run interval in seconds for the Rogue Query Monitor. A value of 0 will disable the Rogue Query Monitor
  • Default value - 30 seconds
  • Changing the value requires a server restart to become active
  • A value of 0 seconds will result in no query monitor running. This restores the pre-iFix 003 behaviour and should only be used under the direction of IBM Support.

 

2) Rogue Query timeout

  • Name - Rogue SPARQL Query abort timeout (in ms)
  • Description - Enables the RM Rogue Query Monitor to abort exceedingly long running SPARQL queries. If the execution of a query exceeds the amount of time in milliseconds specified here, the query execution will abort to avoid locking up the server
  • Default value - 60000 (1 minute)
  • Can change value on running server
  • No minimum setting. The value is always added to the client timeout value (or deducted from it for negative values), including when it is set to 0 or -1 (ms)

 

3) Web UI Query timeout (exists already)

  • Name - query.client.timeout
  • Description - Value for which the web UI expects a query to timeout
  • Default value - 30000 milliseconds (30 seconds)

 

Allowed runtime calculation

Logic used by the RM Rogue Query Monitor to calculate query abort:

Query start time + (Web UI Query timeout + Rogue Query timeout) < current time  :  abort query in Jena using current Thread Id

Using this calculation:

The minimum query runtime with the default settings is (30 seconds client timeout + 60 seconds rogue timeout) + 1 second for the rogue query monitor interval = 91 seconds.

The maximum query runtime with the default settings is (30 seconds client timeout + 60 seconds rogue timeout) + 30 seconds for the rogue query monitor to kick in = 120 seconds (plus a minimal delay while the query monitor iterates through each of the queries running at that moment).
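The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in this document, not the RM server implementation. The function and parameter names are hypothetical. The defaults are the documented default values.

```python
# Illustrative sketch only: names are hypothetical, not RM server APIs.
# Defaults mirror the documented defaults (30 s client timeout,
# 60 s rogue query timeout, 30 s monitor run interval).

def allowed_runtime_ms(client_timeout_ms=30_000, rogue_timeout_ms=60_000):
    """Allowed runtime = Web UI Query timeout + Rogue Query timeout."""
    return client_timeout_ms + rogue_timeout_ms


def should_abort(start_ms, now_ms, client_timeout_ms=30_000, rogue_timeout_ms=60_000):
    """A query is aborted once the current time has passed start + allowed runtime."""
    return start_ms + allowed_runtime_ms(client_timeout_ms, rogue_timeout_ms) < now_ms


def runtime_bounds_s(client_timeout_ms=30_000, rogue_timeout_ms=60_000, run_interval_s=30):
    """Minimum and maximum runtime in seconds before the monitor aborts a query.

    Minimum: allowed runtime + 1 second (monitor runs just after the limit).
    Maximum: allowed runtime + one full monitor run interval.
    """
    allowed_s = allowed_runtime_ms(client_timeout_ms, rogue_timeout_ms) // 1000
    return allowed_s + 1, allowed_s + run_interval_s


if __name__ == "__main__":
    print(runtime_bounds_s())  # defaults give (91, 120)
```

Raising the rogue query timeout to 300000 ms, for example, would move the bounds to 331 and 360 seconds with the other defaults unchanged.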

The RM admin debug page for running SPARQL queries now has a checkbox to allow for long-running queries.

Additional logging accompanies this functionality and is enabled by default. More detailed debug logging can be enabled via a log4j property.

 

 

Additional Information

Logging

Informational logging that will be available in rm.log

At server startup:

CRRRS8752I The RM query monitor task started. The run interval for the task is set to 30 seconds. The maximum query run time is set to 1 minute 30 seconds.
or
CRRRS8753I The RM query monitor task is disabled. The run interval for the task is set to 0 seconds.

When a rogue query is detected:

CRRRS8754W The RM query monitor detected and will cancel a query that started running at 8/23/18 3:50 PM and has been running for 1 minute 37 seconds. The query ID is 33f9b07d-d771-437a-854b-3a3437a2b0ed with thread 574.
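To find out which queries were cancelled, rm.log can be scanned for CRRRS8754W messages. The following Python sketch assumes the message text has exactly the shape shown above. Any prefix that the log writes before the message ID (timestamp, thread, logger name) is not shown in this document, so the pattern searches within each line instead of matching from the start. The function name and the sample constant are hypothetical.

```python
import re

# Pattern built from the CRRRS8754W example in this document (assumed format).
ROGUE_QUERY_PATTERN = re.compile(
    r"CRRRS8754W .*?started running at (?P<started>.+?) "
    r"and has been running for (?P<duration>.+?)\. "
    r"The query ID is (?P<query_id>[0-9a-f-]+) with thread (?P<thread>\d+)\."
)

SAMPLE_LINE = (
    "CRRRS8754W The RM query monitor detected and will cancel a query that "
    "started running at 8/23/18 3:50 PM and has been running for "
    "1 minute 37 seconds. The query ID is "
    "33f9b07d-d771-437a-854b-3a3437a2b0ed with thread 574."
)


def find_cancelled_queries(lines):
    """Return one dict per CRRRS8754W message found in the given lines."""
    results = []
    for line in lines:
        match = ROGUE_QUERY_PATTERN.search(line)
        if match:
            results.append(match.groupdict())
    return results


if __name__ == "__main__":
    # For a real log, pass an open file instead, e.g. open("rm.log")
    # (the path depends on your installation).
    for hit in find_cancelled_queries([SAMPLE_LINE]):
        print(hit["query_id"], hit["thread"], hit["duration"])
```

The extracted query ID and thread number can then be matched against other entries in rm.log for the same query.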

Debug logging can be invoked in order to fully understand what is occurring when operations and queries are timing out, via:

log4j.logger.com.ibm.rdm.fronting.server.core.query.monitor.RogueQueryMonitorTask=DEBUG


Additional Notes:

If you are asked by IBM Support to run a specific tool to troubleshoot a situation, or to run a corrective procedure, you may need to adjust these settings. If it appears that nothing is happening in the GUI after 2 minutes, refer to the rm.log.

An example of a user action impacted by this is ReqIF export (Jazz.net defect: https://jazz.net/jazz03/web/projects/Requirements%20Management#action=com.ibm.team.workitem.viewWorkItem&id=126574 / APAR PH05142). The advice is to amend the rogue query timeout to a value which allows your exports to complete.

IBM Support will advise whether to turn off this feature temporarily, or whether an adjustment to the time-out will suffice.

Product Synonym

RDNG; Rational DOORS Next Generation

Document Information

Modified date:
13 November 2018

UID

ibm10732958