APAR status
Closed as unreproducible.
Error description
Server becomes unresponsive, with thousands of apparently bogus connections in ESTABLISHED state and many in CLOSE_WAIT.

MakePWEntry thread:
############################################################
### thread 12/97: [ nSERVER: 1198: 1dcc]
### FP=0x0a90f008, PC=0x7c82860c, SP=0x0a90efa0
### stkbase=0x0a910000, total stksize=262144, used stksize=4192
############################################################
[ 1] 0x7c82860c ntdll.KiFastSystemCallRet+0 (3e8,0,a90f028,600997ca)
[ 2] 0x77e424fd kernel32.Sleep+15 (3e8,38ae76d8,a90f370,600807e4)
@[ 3] 0x600997ca nnotes.OSDelayThread@4+42 (3e8)
@[ 4] 0x600807e4 nnotes.NIFUpdateCollectionNext@8+1732 (38ae8208,37e1c9f8)
@[ 5] 0x60047572 nnotes.NIFUpdateCollection@4+466 (a90112f)
@[ 6] 0x60ad6492 nnotes.NIFGetCollectionUpdated@12+402 (38ae76d8,0,a90f570)
@[ 7] 0x60ad7c56 nnotes.NIFOpenCollectionExtended4@60+3414 (1136,1136,2d2,20,0,a90f5b0,f10f10,ffffffff,0,0,0,0,0,0,0)
@[ 8] 0x60059702 nnotes.NIFOpenCollectionExtended3@56+66 (5c8,5c8,2d2,20,0,a90f5f4,f10f10,ffffffff,0,0,0,0,0,0)
@[ 9] 0x600596bc nnotes.NIFOpenCollectionExtended2@48+60 (5c8,5c8,2d2,20,0,a90f634,f10f10,ffffffff,0,0,0,0)
@[10] 0x600653a4 nnotes.NIFOpenCollection@40+52 (5c8,5c8,2d2,20,0,a90f66c,f10f10,ffffffff,0,0)
@[11] 0x6035a966 nnotes.AdminpFindProxyDbEntry@28+102 (5c8,60fa63ec,a90f808,a90f78c,0,a90f760,f10f10)
@[12] 0x603e879a nnotes.FindProxyEntry@40+410 (5c8,60fa63ec,a90fa74,60fa66d8,0,a90f874,f10f10,ffffffff,0,0)
@[13] 0x603e934f nnotes.MakePWEntry@32+63 (60fa63ec,a90fa5c,a90fa20,2be52f26,0,a90f9d8,f10f10,ffffffff)
@[14] 0x603eb9a7 nnotes.SECMakeProxyEntry@40+423 (8,0,0,a90fa74,0,a90f9f4,f10f10,ffffffff,0,0)
@[15] 0x60b897e7 nnotes.MakeNewPWEntry@4+535 (a90fb80)
@[16] 0x60b89a5a nnotes.Parse_PWNewHashSig@4+26 (a90fb80)
@[17] 0x60b75a2e nnotes.AuthServerDialog@12+4350 (a90fb80,1,38f60000)
@[18] 0x600ca176 nnotes.AuthStateMachine@4+342 (a90fb80)
@[19] 0x60b52057 nnotes.AUTHProcessNetbfr@16+199 (da1c0010,10025de0,a90fe58,a90fc10)
@[20] 0x100217ca nserverl.DbServer@8+1226 (968401ab,22d8001a)
@[21] 0x100371f5 nserverl.WorkThreadTask@8+1621 (6035884,0)
@[22] 0x10001a2e nserverl.Scheduler@4+750 (0)
@[23] 0x6010cd0f nnotes.ThreadWrapper@4+175 (0)
[24] 0x77e6482f kernel32.GetModuleHandleA+223 (0,0,0,0)

- Clients connect to the server and open sessions normally; everything works for a certain period of time (about 30 minutes).
- During this interval the number of connections from a particular user keeps rising. These connections remain in ESTABLISHED state, and over time a growing number of them move to CLOSE_WAIT.
- The client side looks different: the client has only 1 or 2 open sessions to the server, each, of course, using a different source port.
- Domino (or the network driver) appears unable to close the connections. Strangely, even an administrator using tools has no way to close them.
- The network connection is OK: the network cards were swapped and another switch (Cisco) was used, no switchport security is enabled, all links auto-negotiate to 1 Gb, the network card self-test passes, and ping and trace run fine.

When the system fails to respond to clients:
- trace does not connect to the server
- telnet to the server on port 1352 opens and stays stable
- the number of sessions originating from the same client (on the same source port) increases (so Domino is not closing connections, but is trying to re-establish an old connection that was previously good)
- no new connections are established on the client side
- Domino puts many connections into CLOSE_WAIT state, and some into FIN_WAIT_2, which is very bad
- the connections cannot be closed manually
- disconnecting the network card slowly closes the connections, and after re-enabling the network card Domino responds again
- the same happens if the Domino service is restarted
- the Server_Session_Timeout=10 parameter does not close the idle connections, which means those connections are not considered idle; this is strange because they show no bytes transferred IN/OUT
- when the client initiates the connection to the server, only one socket pair is shown, which is normal, and bytes are exchanged on it
- besides that active connection, the number of connections starts to multiply, but there is no activity on the newly created sockets (and even the original one stops exchanging bytes)

Bogus sessions seen in NSD:
<@@ ------ Notes Data -> Server Data -> Server Task Vars (Time 13:45:42) ------ @@>
Indx TaskId        VarBlock     SessionID    Ver Proto|ST TrId Fnc TS|#Dbs #DocR #DocW|Trans NetW|Session Duration|UserName
---- ------------- ------------ ------------ --- -----|-- ---- --- --|---- ----- -----|----- ----|----------------|--------
   1 [2515: 38458] [175: 46786] [2437: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
   2 [2514: 38458] [175: 42154] [2438: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
   3 [2516: 38458] [175: 37522] [2440: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
   4 [2517: 38458] [175: 32890] [2439: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
   5 [2518: 38458] [175: 28258] [2441: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
   6 [2520: 38458] [175: 23626] [2442: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
   7 [2519: 38458] [175: 18994] [2443: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
   8 [2521: 38458] [175: 14362] [2445: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
   9 [2522: 38458] [175: 9730]  [2444: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|
  10 [2523: 38458] [175: 5098]  [2446: 1706]   0  0: 0| 2    0 142  0|   0     0     0|    0    0| 474558h:40m:38s|

Data review also suggests that the problem may be caused by an issue similar to SPR JRED7SNU25 (ID Vault: Multiple "Change HTTP Password in Domino Directory" requests in the admin4 database) and SPR FLII8BYBFY (Server hang because all server working threads are doing MakeHttpPWChange), although the hung thread here is MakePWEntry rather than an HTTP password thread. The workaround for those issues is to disable "Update Internet Password when Notes Client Password Changes" in the Security Settings document, which is not suitable in this case.
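To see the connection pattern described above from the server side, one could tally the server's TCP connections on port 1352 by client address and state. The following is a minimal Python sketch, not an IBM tool: it assumes Windows-style "netstat -an" output with exactly four columns (Proto, Local Address, Foreign Address, State), and the function name tally_nrpc_states is hypothetical. Linux netstat output adds Recv-Q/Send-Q columns and would need a different column split.

```python
from collections import Counter

# Domino NRPC port, as used in the telnet test above.
NRPC_PORT = "1352"


def tally_nrpc_states(netstat_text: str) -> Counter:
    """Count TCP connections on the local NRPC port by (client IP, state).

    Assumption: Windows "netstat -an" layout with four columns:
    Proto, Local Address, Foreign Address, State.
    """
    counts: Counter = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Skip headers, blank lines, UDP rows (no State column) and non-TCP rows.
        if len(parts) != 4 or parts[0].upper() != "TCP":
            continue
        local, foreign, state = parts[1], parts[2], parts[3]
        # Only connections whose local end is the NRPC port.
        if local.rsplit(":", 1)[-1] != NRPC_PORT:
            continue
        # The listening socket is not a client connection.
        if state == "LISTENING":
            continue
        client_ip = foreign.rsplit(":", 1)[0]
        counts[(client_ip, state)] += 1
    return counts
```

Feeding it netstat output captured while the server is failing and printing counts.most_common() would show whether one client address dominates the ESTABLISHED and CLOSE_WAIT counts, as reported above.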
Local fix
Workaround #1 (as per document 1385788): set WSKDMN_DEBUG_DONTLINGER=1 to work around the issue; the Domino server must be restarted for the change to take effect. The setting disables the TCP SO_LINGER option so that bogus sessions cannot cause the Domino server to hang. No side effects, such as a performance impact, have been observed at customer sites where it was implemented (confirmed to solve the problem).
Workaround #2: Rebuild admin4.nsf:
1. Issue "Tell AdminP quit" on the Domino server console.
2. Issue the command "dbcache flush Admin4.NSF" on the Domino server console. (This may need to be done multiple times to release the server's handle on the database.)
3. Then, quickly rename the current Admin4.NSF to Admin4.OLD.
4. Replicate a new copy of Admin4.NSF from a server known to have a good copy.
5. Finally, issue the command "load AdminP" on the Domino server console.
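The first workaround turns off the SO_LINGER socket option. The short Python sketch below only illustrates what toggling that option means at the BSD-socket level; it is not Domino code, and the APAR does not say which linger values Domino uses or how WSKDMN_DEBUG_DONTLINGER is implemented internally. The helper names set_linger and get_linger are hypothetical, and the "ii" layout of struct linger is an assumption that holds on Linux and BSD (Winsock declares both fields as u_short).

```python
import socket
import struct

# struct linger as laid out on Linux/BSD: two C ints (l_onoff, l_linger).
# Assumption: not the Winsock layout, where both fields are u_short.
LINGER_FMT = "ii"


def set_linger(sock: socket.socket, enabled: bool, seconds: int = 0) -> None:
    """Enable or disable SO_LINGER on a TCP socket."""
    value = struct.pack(LINGER_FMT, 1 if enabled else 0, seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)


def get_linger(sock: socket.socket) -> tuple:
    """Return (l_onoff, l_linger) as reported by the OS.

    l_onoff is nonzero when lingering is enabled; some systems report
    a flag value rather than exactly 1.
    """
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FMT))
    return struct.unpack(LINGER_FMT, raw)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Lingering close: on a blocking socket, close() may wait up to
        # 5 seconds for unsent data to be delivered.
        set_linger(s, True, 5)
        print("linger on:", get_linger(s))
        # Linger disabled (the socket default): close() returns at once
        # and the TCP stack completes the shutdown in the background.
        set_linger(s, False)
        print("linger off:", get_linger(s))
    finally:
        s.close()
```

Only l_onoff should be relied on after disabling: Linux, for example, keeps reporting the previously set l_linger value even though lingering is off.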
Problem summary
This APAR is closed as FIN. We have deferred the fix to a future release.
Problem conclusion
Temporary fix
Comments
This APAR is associated with SPR# JFRA8D7EZB. The problem will be fixed in the next release of the product.
APAR Information
APAR number
LO57703
Reported component name
DOMINO SERVER
Reported component ID
5724E6200
Reported release
851
Status
CLOSED UR5
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2011-01-17
Closed date
2012-09-13
Last modified date
2012-09-13
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
R851 PSN
UP
Product
Lotus Domino
Business unit
Cognitive Applications
Platform
Platform Independent
Version
8.5.1
Document Information
Modified date:
13 September 2012