APAR status
Closed as user error.
Error description
In an R7.0.1 cluster environment, users receive multiple emails for a reservation pending approval when the Domino server starts. One of our users was the owner of a room, and there was a reservation pending their approval. On Sunday we shut the server down for maintenance. When the server was restarted, the room owner received about 20 messages letting them know there was a reservation pending approval. These were followed by the error message "RnRMgr: Error identifying calendar profile/document ...". There are 16 servers in the cluster; 15 of them are R7 and run the RnRMgr task, since it is required on R7 servers. There is only one replica of rooms.nsf from this server in the cluster. Other servers in the region have their own rooms.nsf databases because they have their own rooms and resources; however, none of the rooms.nsf databases are clustered.
Local fix
No workaround is available, but the possible causes of this problem are as follows.
1: The cause of the "RnRMgr: Error identifying calendar profile/document" messages is being investigated. We need a database that can reproduce the issue in order to isolate the cause and fix it. The error message is generated by RnRMgr when it scans an R&R database for reservations to process or for new rooms/resources to include in busytime. The code cannot determine from the document contents which kind of document it opened, so it posts that warning.
2: The cause of the multiple notices to the room owner is understood. RnRMgr attempts cluster failover of R&R databases for any clustermates that have been unresponsive for an extended period of time. Each time a clustermate goes offline, RnRMgr rescans all local databases for any that it may share with the now-offline clustermate and attempts to pick up processing of any currently queued reservation requests. In the process it finds databases it already controls, but it neglects to filter them out. As a result, the R&R database that is already being monitored is rescanned, and the Waiting For Approval (WFA) documents trigger notices to room owners again. If you do not have a clustered R&R server, you will not see this. A fix is being worked on.
Working on: Looking at how to change the RnRMgr filters to handle the legacy data without risking a gap that would allow requests to be lost or missed. The messages are non-critical: they occur because legacy requests that have already been processed are not filtered out, and they do not adversely affect R&R performance. Still looking at the issue of multiple notices when cluster failover is attempted.
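The rescan behavior described in item 2 can be illustrated with a small sketch. This is not Domino code: `RnRDatabase`, `rescan_on_clustermate_offline`, and the `skip_controlled` flag are hypothetical names invented for the illustration, and the real RnRMgr logic is not shown in this APAR. The sketch only assumes what the text states, namely that each offline event triggers a rescan of local databases and that databases already under control are not filtered out.

```python
from dataclasses import dataclass, field


@dataclass
class RnRDatabase:
    """Hypothetical stand-in for an R&R database replica on this server."""
    path: str
    replica_servers: set                            # servers holding a replica
    wfa_docs: list = field(default_factory=list)    # Waiting For Approval requests


def rescan_on_clustermate_offline(local_dbs, controlled, offline_server,
                                  skip_controlled):
    """Return the (path, request) notices one rescan would send.

    `controlled` is the set of database paths this server already monitors.
    With skip_controlled=False the rescan mirrors the described behavior:
    databases already under control are rescanned and re-announced.
    """
    notices = []
    for db in local_dbs:
        if skip_controlled and db.path in controlled:
            continue  # already monitored here; do not announce again
        if db.path not in controlled and offline_server not in db.replica_servers:
            continue  # not shared with the offline clustermate; nothing to take over
        controlled.add(db.path)
        notices.extend((db.path, req) for req in db.wfa_docs)
    return notices


if __name__ == "__main__":
    rooms = RnRDatabase("rooms.nsf", {"ServerA"}, ["pending reservation"])
    controlled = {"rooms.nsf"}
    sent = []
    for n in range(3):  # three clustermates drop offline one after another
        sent += rescan_on_clustermate_offline(
            [rooms], controlled, f"Mate{n}", skip_controlled=False)
    print(len(sent), "notices without the filter")
```

In this toy model, every offline event produces one more notice per WFA document unless `skip_controlled` is set, which is the kind of filter the text says is missing. How the actual fix avoids losing requests is not described here and is not modeled.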
Problem summary
Problem conclusion
Temporary fix
Comments
This APAR is associated with SPR# TBAE6VJTFC.
Per development: unsupported configuration. From what I have seen in the provided information and logs, the problem is their configuration (clustering over a WAN). WAN problems caused a "split cluster", which resulted in RnRMgr on both servers processing requests. This is to be expected, since both sides think the other is down and we have no way to detect otherwise.
The problem should self-correct by restarting the RnRMgr task on the server that originally had control (since it will then detect that the clustermate has control and stop processing that database), OR by restarting RnRMgr on both servers if you do not want to figure out which RnRMgr to restart _once_ the cluster is restored.
Multiple emails will still be sent on RnRMgr task restart, since in 8.5.1 the task has no idea whether it already sent WFA notices. There will also be multiple emails on cluster failover, because RnRMgr on Server B has no way to know whether the pending request already triggered a WFA notice on Server A before it went down. However, this is different from the SPR's case of the same server sending multiple emails when it already controls the database. These other cases occur in 8.5.1 when database control is changing.
In any case, the configuration is leading to the problem (along with some user expectations, such as "fail back" of control, which we do not do).
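The split-cluster sequence above can be walked through with a toy model. Everything in it is an assumption made for illustration: `RnRMgrSim` and `check_control` are hypothetical names, and the real task's control protocol is not documented in this APAR. The model encodes only two statements from the comments: a server takes control when it cannot see its clustermate holding control, and the task keeps no record of WFA notices already sent.

```python
class RnRMgrSim:
    """Toy model of one server's RnRMgr task (hypothetical, not Domino code)."""

    def __init__(self, name):
        self.name = name
        self.controls_db = False
        self.sent = []  # notices sent by this task instance only

    def check_control(self, peer, peer_reachable, pending):
        """Take control unless the clustermate is visibly in control."""
        if peer_reachable and peer.controls_db:
            self.controls_db = False  # defer to the clustermate
            return
        if not self.controls_db:
            self.controls_db = True
            # No record of earlier notices exists, so every pending
            # request is announced again when control is taken.
            self.sent.extend(pending)


if __name__ == "__main__":
    pending = ["reservation awaiting approval"]
    a, b = RnRMgrSim("A"), RnRMgrSim("B")

    a.check_control(b, peer_reachable=True, pending=pending)   # A takes control
    b.check_control(a, peer_reachable=False, pending=pending)  # WAN drops: B thinks A is down
    print("split cluster:", a.controls_db and b.controls_db)
    print("notices so far:", len(a.sent) + len(b.sent))

    a = RnRMgrSim("A")                                         # WAN restored, RnRMgr restarted on A
    a.check_control(b, peer_reachable=True, pending=pending)
    print("A controls after restart:", a.controls_db)
```

In the model, the room owner has already received one notice from each side by the time the link is restored, and the restarted task on Server A defers to Server B rather than taking control back, which matches the statement that control does not fail back.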
APAR Information
APAR number
LO52002
Reported component name
DOMINO SERVER
Reported component ID
5724E6200
Reported release
851
Status
CLOSED USE
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2010-05-25
Closed date
2010-09-15
Last modified date
2010-09-15
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
Business Unit: Cognitive Applications (BU055); Product: Lotus Domino (SSKTMJ); Platform: Platform Independent (PF025); Version: 8.5.1
Document Information
Modified date:
15 September 2010