APAR status
Closed as program error.
Error description
The customer shuts down the Domino server. The Domino server tasks shut down, but the nRouter task does not shut down completely. Because the server does not finish shutting down within the time specified in the Server document, the result is an NSD run with -nomemcheck. In this case the server shutdown timeout was set to 5 minutes (300 seconds). The NSD command line confirms that this is a -nomemcheck NSD:

    "C:\Notes\nsd.exe" -dumpandkill -termstatus 1 -nomemcheck -shutdownhang -crashpid 1660 -crashtid 2740 -runtime 300

1) When analyzing an NSD, I first check the server name, the date and time, the OS version, and the Notes version. Because this is a shutdown hang, the NSD command line in the header contains -nomemcheck:

    Host Name : ABCD1234
    User Name : SYSTEM
    Date : Tue Aug 23 21:51:17 2011
    Windows Dir : C:\WINNT
    "C:\Notes\nsd.exe" -dumpandkill -termstatus 1 -nomemcheck -shutdownhang -crashpid 1660 -crashtid 2740 -runtime 300
    NSD Version : 8.5.15.0214 (Release 8.5.1FP5)
    OS Version : Windows/2003 5.2 [64-bit] (Build 3790), PlatID=2, Service Pack 2 (4 Processors)
    Build time : Thu Sep 30 03:03:05 2010
    Latest file mod : Tue Aug 03 10:57:08 2010
    Domino Version : Release 8.5.1FP5 HF231 (64-bit server)

2) Once the build is verified, I search for "OS Process" to see what is running on the server, when the server processes were started, and when the NSD was launched (the nsd.exe entry, started 08/23 21:51:16). This crash was generated because the server did not shut down within the 5 minutes set in the Server document. The OS Process Table shows that the server is waiting for the nRouter process to exit before the server as a whole can finish shutting down.
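The step 1 check on the NSD command line can be scripted when many NSDs need to be sorted. The following is a minimal Python sketch, assuming the command line has been copied out of the NSD header as one string and that no argument contains spaces; the helpers `parse_nsd_args`, `is_shutdown_hang`, and `to_nsd_hex` are hypothetical and not part of any Domino tool. It also converts the decimal -crashpid and -crashtid values to the four-digit hexadecimal form that the OS Process Table and the thread headers use.

```python
"""Sketch: classify an NSD from its command line (hypothetical helpers)."""


def parse_nsd_args(cmdline: str) -> dict:
    """Split an nsd.exe command line into {flag: value} pairs.

    Assumes no argument contains embedded spaces, so str.split() is enough.
    A flag followed by another flag (or by nothing) is stored as True.
    """
    tokens = cmdline.split()
    args = {}
    i = 1  # token 0 is the quoted path to nsd.exe
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and not nxt.startswith("-"):
                args[tok[1:]] = nxt  # flag with a value, e.g. -runtime 300
                i += 2
                continue
            args[tok[1:]] = True  # boolean flag, e.g. -nomemcheck
        i += 1
    return args


def is_shutdown_hang(args: dict) -> bool:
    """True when both -shutdownhang and -nomemcheck are present."""
    return args.get("shutdownhang") is True and args.get("nomemcheck") is True


def to_nsd_hex(decimal_id: str) -> str:
    """Convert a decimal PID/TID to the zero-padded lowercase hex seen in NSD output."""
    return format(int(decimal_id), "04x")


# Raw string so that "\N" and "\n" in the Windows path stay literal backslashes.
CMDLINE = (
    r'"C:\Notes\nsd.exe" -dumpandkill -termstatus 1 -nomemcheck '
    r"-shutdownhang -crashpid 1660 -crashtid 2740 -runtime 300"
)

if __name__ == "__main__":
    a = parse_nsd_args(CMDLINE)
    print(is_shutdown_hang(a))        # True
    print(to_nsd_hex(a["crashpid"]))  # 067c, the nSERVER PID in the OS Process Table
    print(to_nsd_hex(a["crashtid"]))  # 0ab4, the nSERVER thread running ShutdownMonitorTask
    print(int(a["runtime"]))          # 300, matching the 300-second shutdown timeout
```

Requiring both -shutdownhang and -nomemcheck is a choice made for this sketch, based only on the arguments seen in this APAR; other NSD invocations may use different flag combinations.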
    <@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@>
    Section: System Data -> OS Process Table (Time 21:51:43)
    <@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@>
    <@@ ------ System Data -> Processes (Time 21:51:43) ------ @@>
    INFO PID PPID UID STIME COMMAND
    ["C:\Notes\nservice.exe" "=C:\Notes\notes.ini": 057c]
    -> 067c 057c 0 08/02 00:01:11 ["C:\Notes\nSERVER.EXE" =C:\Notes\notes.ini: 067c]
    -> 0440 067c 0 08/02 00:17:52 [C:\Notes\nRouter.EXE : 0440]
    04d0 067c 0 08/23 21:51:16 ["C:\Notes\nsd.exe" -dumpandkill -termstatus 1 -nomemcheck -shutdownhang -crashpid 1660 -crashtid 2740 -runtime 300: 04d0]

3) The following thread also confirms that this is a Shutdown Monitor NSD: nSERVER thread 0ab4 is running nserverl.ShutdownMonitorTask. The -crashpid 1660 and -crashtid 2740 values on the NSD command line are decimal; in hexadecimal they are 067c (the nSERVER PID) and 0ab4 (this thread).

    ############################################################
    ### thread 2/3: [ nSERVER: 067c: 0ab4]
    ### FP=0x6eb4ac78, PC=0x77ef02ea, SP=0x6eb4ac78
    ### stkbase=0x6eb50000, total stksize=4194304, used stksize=21384
    ############################################################
    [ 1] 0x77ef02ea ntdll.ZwWaitForSingleObject+10 (12c,6eb4d9f0,6eb4dfb0,6eb4dfd0)
    [ 2] 0x77d704ff kernel32.WaitForSingleObjectEx+223 (4e8,6eb4dfc8,0,0)
    @[ 3] 0x004adae7 nnotes.OSRunExternalScript+4151 (4,0,24982a4,3)
    @[ 4] 0x004a8091 nnotes.FRTerminateWindowsResources+2277 (6E006F004D0020,72006F00740069,53006F0051007C,73006100540020)
    @[ 5] 0x004b0e16 nnotes.OSFaultCleanupExt+622 (0,467a26,0,0)
    @[ 6] 0x004b1829 nnotes.OSFaultCleanup+29 (ad180004,0,EE00000000,f0102a95)
    @[ 7] 0x10006aaa nserverl.ShutdownMonitorTask+898 (c6e4009c,0,0,1)
    @[ 8] 0x10001b21 nserverl.Scheduler+969 (0,0,0,0)
    @[ 9] 0x0044ff1e nnotes.ThreadWrapper+330 (0,0,0,6eb4ffa8)
    [10] 0x77d6b71a kernel32.BaseThreadStart+58 (0,0,0,0)

4) The Shutdown Monitor is described in the following technote:
    Lotus Software Knowledge Base
    Document Title: How does the Shutdown Monitor work in Domino?
    Doc #: 1236058
    URL: http://www.ibm.com/support/docview.wss?uid=swg21236058

5) The long shutdown, however, was caused by the nRouter process. The stack below points to a known issue, SPR# CMAS7VQHK4, which is addressed in the 8.5.2 codestream:

    ############################################################
    ### thread 1/2: [ nRouter: 0440: 0c60]
    ### FP=0x00128528, PC=0x77ef030a, SP=0x00128528
    ### stkbase=0x00130000, total stksize=81920, used stksize=31448
    ############################################################
    [ 1] 0x77ef030a ntdll.ZwReadFile+10 (6A300062109,2d8000ab,ffffffff,10052e60)
    [ 2] 0x77d6e4a6 kernel32.ReadFile+182 (128608,129394,42,128608)
    @[ 3] 0x1004a7ff nnotes.OSFDFileRead+47 (0,7b7ec1a8,42,1004a7ff)
    @[ 4] 0x116bf59b nnotes.FileReadOSFD+439 (4,0,128a7e48,128a7d80)
    @[ 5] 0x116c12dd nnotes.NSFFileRead+37 (0,ffffffff,0,12c850)
    @[ 6] 0x119817ff nnotes.ReadBDB+775 (538d5020,54009,7b7ec1a8,12c850)
    @[ 7] 0x1197fde1 nnotes.DbBDBRead+1057 (12b950,54009,6,7b7ec1a8)
    @[ 8] 0x11919cbc nnotes.DbLoad+1976 (6f809,2cbc009,12cb04,12cb04)
    @[ 9] 0x116df1f7 nnotes.NSFDbOpenExtended4+24463 (2CBC00900000000,10052e60,5f640618,0)
    @[10] 0x116d82ab nnotes.NSFDbOpenExtended+127 (f0247c96,1004dcfc,12fd1020,41c0018)
    @[11] 0x1171f81a nnotes.NSFDbOpen+34 (0,11219348,0,fcb0038)
    @[12] 0x11d89970 nnotes.MailDeleteDeliveryContext+52 (0,455146,12d9ec,12de48)
    @[13] 0x004398e5 nRouter.RouterDbCacheTerm+117 (e6a00003,29,3330,158)
    @[14] 0x0040214c nRouter.AddInMain+4428 (1,10125a1d,0,0)
    @[15] 0x0047616e nRouter.NotesMain+74 (40fac80,0,0,40b0000)
    @[16] 0x00476275 nRouter.main+245 (2,0,0,0)
    @[17] 0x00484a18 nRouter.mainCRTStartup+568 (0,0,0,0)
    [18] 0x77d596ac kernel32.BaseProcessStart+44 (4847e0,0,0,0)

In this stack, nRouter.RouterDbCacheTerm calls nnotes.MailDeleteDeliveryContext, which reopens a mail file through NSFDbOpen and is waiting on a file read (nnotes.ReadBDB down to ntdll.ZwReadFile). In the end, the router is de-registering the NSF monitors that it registered for the mail files it has delivered mail to. These monitors are the basis of the user mail rules that exist in each mail file.
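The router stack above can be checked mechanically when several NSDs need to be triaged for this SPR. A minimal Python sketch, assuming each frame sits on its own line in the form shown in this APAR (an optional leading @, a bracketed frame number, an address, and module.function+offset); `parse_frames` and `calls_in_order` are hypothetical helpers, not part of any Domino tool. It tests whether RouterDbCacheTerm reaches ReadBDB through MailDeleteDeliveryContext and NSFDbOpen.

```python
"""Sketch: extract the call chain from an NSD thread stack (hypothetical helpers)."""
import re

# Matches e.g. "@[13] 0x004398e5 nRouter.RouterDbCacheTerm+117 (...)".
# In the excerpts in this APAR, frames in Domino modules carry a leading "@";
# the pattern simply treats it as optional.
FRAME_RE = re.compile(r"^@?\[\s*(\d+)\]\s+0x[0-9a-fA-F]+\s+(\w+)\.(\w+)\+(\d+)")


def parse_frames(stack_text):
    """Return (frame_no, module, function) tuples, innermost frame first."""
    frames = []
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3)))
    return frames


def calls_in_order(frames, chain):
    """True if the functions in `chain` appear from caller to callee, in order.

    The functions need not be adjacent. NSD lists the innermost frame first,
    so the list is reversed to walk from the outermost caller inward.
    """
    pos = 0
    for _, _, func in reversed(frames):
        if pos < len(chain) and func == chain[pos]:
            pos += 1
    return pos == len(chain)


ROUTER_STACK = """\
[ 1] 0x77ef030a ntdll.ZwReadFile+10 (6A300062109,2d8000ab,ffffffff,10052e60)
[ 2] 0x77d6e4a6 kernel32.ReadFile+182 (128608,129394,42,128608)
@[ 3] 0x1004a7ff nnotes.OSFDFileRead+47 (0,7b7ec1a8,42,1004a7ff)
@[ 4] 0x116bf59b nnotes.FileReadOSFD+439 (4,0,128a7e48,128a7d80)
@[ 5] 0x116c12dd nnotes.NSFFileRead+37 (0,ffffffff,0,12c850)
@[ 6] 0x119817ff nnotes.ReadBDB+775 (538d5020,54009,7b7ec1a8,12c850)
@[ 7] 0x1197fde1 nnotes.DbBDBRead+1057 (12b950,54009,6,7b7ec1a8)
@[ 8] 0x11919cbc nnotes.DbLoad+1976 (6f809,2cbc009,12cb04,12cb04)
@[ 9] 0x116df1f7 nnotes.NSFDbOpenExtended4+24463 (2CBC00900000000,10052e60,5f640618,0)
@[10] 0x116d82ab nnotes.NSFDbOpenExtended+127 (f0247c96,1004dcfc,12fd1020,41c0018)
@[11] 0x1171f81a nnotes.NSFDbOpen+34 (0,11219348,0,fcb0038)
@[12] 0x11d89970 nnotes.MailDeleteDeliveryContext+52 (0,455146,12d9ec,12de48)
@[13] 0x004398e5 nRouter.RouterDbCacheTerm+117 (e6a00003,29,3330,158)
@[14] 0x0040214c nRouter.AddInMain+4428 (1,10125a1d,0,0)
@[15] 0x0047616e nRouter.NotesMain+74 (40fac80,0,0,40b0000)
@[16] 0x00476275 nRouter.main+245 (2,0,0,0)
@[17] 0x00484a18 nRouter.mainCRTStartup+568 (0,0,0,0)
[18] 0x77d596ac kernel32.BaseProcessStart+44 (4847e0,0,0,0)
"""

if __name__ == "__main__":
    frames = parse_frames(ROUTER_STACK)
    print(frames[0])  # (1, 'ntdll', 'ZwReadFile')
    chain = ["RouterDbCacheTerm", "MailDeleteDeliveryContext", "NSFDbOpen", "ReadBDB"]
    print(calls_in_order(frames, chain))  # True
```

Matching an ordered, non-adjacent chain rather than exact offsets keeps the check usable across builds, since offsets such as +117 change between releases while the function names usually do not.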
This behavior has not changed since mail rules were first supported, but we have found that de-registration can take a long time when many databases contain rules, because each database must be reopened and closed to perform the de-registration. This is why the server takes so long to shut down, and the crash occurs because the nRouter process has not completely shut down within the shutdown timeout.
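The timing argument above can be made concrete with a back-of-the-envelope model. A minimal Python sketch, assuming de-registration is serial and that each mail file with rules costs a roughly fixed reopen-plus-close time; `ShutdownEstimate` is hypothetical, and the database count and per-database cost below are illustration values, not measurements from this case. Only the 300-second limit comes from this APAR.

```python
"""Sketch: serial cost model for router monitor de-registration at shutdown."""
import math
from dataclasses import dataclass


@dataclass
class ShutdownEstimate:
    """Hypothetical model; all fields are assumptions supplied by the caller."""

    databases_with_rules: int      # mail files whose rule monitors must be de-registered
    seconds_per_reopen: float      # assumed cost to reopen and close one database
    other_shutdown_seconds: float  # assumed time for the rest of the server shutdown

    def total_seconds(self) -> float:
        # Serial assumption: databases are reopened and closed one after another.
        return self.other_shutdown_seconds + self.databases_with_rules * self.seconds_per_reopen

    def exceeds(self, timeout_seconds: float) -> bool:
        # An NSD is expected when shutdown takes longer than the configured timeout.
        return self.total_seconds() > timeout_seconds

    def max_databases_within(self, timeout_seconds: float) -> int:
        """Largest database count that still finishes inside the timeout under this model."""
        budget = timeout_seconds - self.other_shutdown_seconds
        if budget < 0:
            return 0
        return math.floor(budget / self.seconds_per_reopen)


SHUTDOWN_TIMEOUT = 300  # seconds, the Server document setting in this APAR

if __name__ == "__main__":
    # Illustration values only, chosen to be exact in binary floating point.
    est = ShutdownEstimate(databases_with_rules=1500, seconds_per_reopen=0.25, other_shutdown_seconds=15)
    print(est.total_seconds())                         # 390.0
    print(est.exceeds(SHUTDOWN_TIMEOUT))               # True
    print(est.max_databases_within(SHUTDOWN_TIMEOUT))  # 1140
```

The point of the model is the linear term: under these assumptions, the shutdown time grows with the number of mail files that have rules, so a server that shuts down cleanly today can start hitting the timeout as more users create rules.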
Local fix
NA
Problem summary
A programming error was found and will be corrected in a future release.
Problem conclusion
A programming error was found and will be corrected in a future release.
Temporary fix
Comments
This APAR is associated with SPR# CMAS7VQHK4.
APAR Information
APAR number
LO63255
Reported component name
DOMINO SERVER
Reported component ID
5724E6200
Reported release
851
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2011-08-24
Closed date
2011-09-02
Last modified date
2011-09-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
DOMINO SERVER
Fixed component ID
5724E6200
Applicable component levels
R851 PSN
UP
Business Unit
Cognitive Applications
Product
Lotus Domino
Platform
Platform Independent
Version
8.5.1
Document Information
Modified date:
02 September 2011