IBM Support

How to amend Rational DOORS Next Generation timeouts to manage long running queries

How To


Summary

This document applies to Rational DOORS Next Generation 6.0.6 iFix003 and later.

It is possible to construct a query across a large data set that runs for a long time. Such a long-running query might consume resources and cause the system to become unstable.

Objective

A new Rogue Query Monitor is introduced to capture these queries.
It is now possible to abort a long-running query before it impacts system stability.

Queries that exceed a specified timeout produce a warning message. The message is displayed to the user in the web UI and is also written to rm.log on the server. Since 6.0.6 iFix003, it is possible to automatically abort queries that exceed a defined timeout.

Note: Some RM internal queries are excluded from monitoring because they are expected to be long running. These include ETL jobs for Reporting.

This document details how to use the timeout values and how to disable this feature, if required.

Steps

The following steps apply to the new Rogue Query Monitor functionality introduced as a stability fix in Rational DOORS Next Generation 6.0.6 iFix003.

Advanced properties

The RM server uses the following advanced properties for the Rogue Query Monitor:

1) Rogue Query Monitor run interval

  • Name - Rogue Query Monitor run interval (in seconds)
  • Description - Run interval in seconds for the Rogue Query Monitor. A value of 0 disables the Rogue Query Monitor
  • Default value - 30 seconds
  • Changing the value requires a server restart to become active
  • A value of 0 seconds results in no query monitor running. This reverts to the pre-iFix003 behavior and must be used only under the direction of IBM Support.

2) Rogue Query timeout

  • Name - Rogue SPARQL Query abort timeout (in ms)
  • Description - Enables the RM Rogue Query Monitor to abort exceedingly long running SPARQL queries. If the execution of a query exceeds the amount of time in milliseconds specified here, the query execution aborts to avoid locking up the server
  • Default value - 60000 (1 minute)
  • The value can be changed on a running server
  • There is no minimum setting. The value is added to the client timeout value (or deducted from it, for negative values), so a setting of 0 or -1 (ms) is accepted

3) Web UI Query timeout (exists already)

  • Name - query.client.timeout
  • Description - Value for which the web UI expects a query to time out
  • Default value - 30000 milliseconds (30 seconds)

Allowed runtime calculation

Logic used by the RM Rogue Query Monitor to calculate query abort:

Query start time + (Web UI Query timeout + Rogue Query timeout) < current time: abort query in Jena by using the current Thread ID
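
The check can be sketched in Python as follows. This is only an illustration of the comparison, not the RM server's actual code: a query is treated as rogue once the current time has passed its start time plus both timeouts. The function and parameter names are hypothetical; the defaults are the documented default values.

```python
def should_abort(query_start_ms, now_ms,
                 client_timeout_ms=30_000, rogue_timeout_ms=60_000):
    """Return True once a query has outlived its allowed runtime.

    Hypothetical helper: names are illustrative, not RM property names.
    """
    # Allowed runtime = Web UI Query timeout + Rogue Query timeout.
    deadline_ms = query_start_ms + client_timeout_ms + rogue_timeout_ms
    return now_ms > deadline_ms


# A query that started at t=0 is still allowed at 90 s, but not after that.
print(should_abort(0, 90_000))   # False
print(should_abort(0, 90_001))   # True
```

Because the rogue timeout is simply added to the client timeout, a negative value such as -1 shortens the allowed runtime to just under the client timeout.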

Using this calculation

With the default settings, the minimum query runtime before an abort is (30 seconds client timeout + 60 seconds rogue timeout) + 1 second for the rogue query monitor interval = 91 seconds.

With the default settings, the maximum query runtime is (30 seconds client timeout + 60 seconds rogue timeout) + 30 seconds for the rogue query monitor to take effect = 120 seconds (plus a minimal delay while the query monitor iterates through each of the queries running at that moment).
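
The same arithmetic can be reproduced for other settings with a small helper. This is a minimal sketch that mirrors the two calculations in this section; the function name and parameters are hypothetical, and the result ignores the minimal delay while the monitor iterates over the running queries.

```python
def abort_window_seconds(client_timeout_s=30, rogue_timeout_s=60,
                         run_interval_s=30):
    """Return (earliest, latest) query runtime in seconds before an abort."""
    allowed = client_timeout_s + rogue_timeout_s
    # Earliest: the monitor runs 1 second after the allowed runtime is reached.
    earliest = allowed + 1
    # Latest: the monitor ran just before the limit, so a full interval passes.
    latest = allowed + run_interval_s
    return earliest, latest


print(abort_window_seconds())  # (91, 120)
```

For example, raising the rogue timeout to 300 seconds while keeping the other defaults gives a window of 331 to 360 seconds.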

The RM admin debug page for running SPARQL queries now has a checkbox to allow for long running queries.

Additional logging accompanies this functionality and is enabled by default. Advanced logging can be enabled with a log4j property.

Additional Information

Logging

Informational logging that is available in rm.log

At server startup:

CRRRS8752I The RM query monitor task started. The run interval for the task is set to 30 seconds. The maximum query run time is set to 1 minute 30 seconds.
Or
CRRRS8753I The RM query monitor task is disabled. The run interval for the task is set to 0 seconds.

When a rogue query is detected:

CRRRS8754W The RM query monitor detected and will cancel a query that started running at 8/23/18 3:50 PM and has been running for 1 minute 37 seconds. The query ID is 33f9b07d-d771-437a-854b-3a3437a2b0ed with thread 574.
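
To review which queries were cancelled, rm.log can be scanned for this message ID. The following Python sketch assumes the message text has exactly the shape of the example above; the real wording might differ by version or locale, and the pattern and function names are hypothetical.

```python
import re

# Pattern built from the example CRRRS8754W message shown above (an assumption).
ROGUE_QUERY_PATTERN = re.compile(
    r"CRRRS8754W .*?started running at (?P<started>.+?) "
    r"and has been running for (?P<elapsed>.+?)\. "
    r"The query ID is (?P<query_id>[0-9a-f-]+) with thread (?P<thread>\d+)"
)


def find_rogue_queries(lines):
    """Return one dict per CRRRS8754W message found in the given log lines."""
    hits = []
    for line in lines:
        match = ROGUE_QUERY_PATTERN.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


SAMPLE = (
    "CRRRS8754W The RM query monitor detected and will cancel a query that "
    "started running at 8/23/18 3:50 PM and has been running for "
    "1 minute 37 seconds. The query ID is "
    "33f9b07d-d771-437a-854b-3a3437a2b0ed with thread 574."
)

for hit in find_rogue_queries([SAMPLE]):
    print(hit["query_id"], hit["thread"], hit["elapsed"])
```

To scan a real log, pass an open file object instead of the sample list. The pattern uses search rather than a full-line match, so any timestamp or logger prefix on the line is ignored.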

To fully understand what is occurring when operations and queries time out, enable debug logging with:

log4j.logger.com.ibm.rdm.fronting.server.core.query.monitor.RogueQueryMonitorTask=DEBUG


Extra Notes:

If IBM Support asks you to run a specific tool to troubleshoot a situation, or to run a corrective procedure, you might need to adjust these settings. If nothing appears to be happening in the GUI after 2 minutes, refer to rm.log.

An example is ReqIF export. Jazz.net defect https://jazz.net/jazz03/resource/itemName/com.ibm.team.workitem.WorkItem/126574 (APAR PH05142) describes a user action that is impacted by this. The advice is to set the rogue query timeout to a value that allows your exports to complete.
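
As a rough guide for choosing such a value, the allowed runtime (client timeout + rogue timeout) must be at least as long as the operation takes. The sketch below is an illustration under that assumption only; the function name, the observed duration, and the safety margin are hypothetical and are not IBM recommendations.

```python
def rogue_timeout_for(expected_runtime_ms, client_timeout_ms=30_000,
                      margin_ms=0):
    """Estimate a Rogue Query timeout (ms) that lets an operation finish.

    Based on: allowed runtime = client timeout + rogue timeout.
    margin_ms is an optional, hypothetical safety buffer.
    """
    return expected_runtime_ms - client_timeout_ms + margin_ms


# Example: an export observed to take about 10 minutes, plus 1 minute margin.
print(rogue_timeout_for(600_000, margin_ms=60_000))  # 630000
```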

IBM Support can advise whether to turn off this feature temporarily, or whether to adjust the timeout.

Product Synonym

RDNG; Rational DOORS Next Generation

Document Information

Modified date:
15 April 2022

UID

ibm10732958