This post is part 2 of a two-part series on exceptions that commonly accompany the status cache problem. To read part 1 first, see Common status cache problems in the WebSphere Administrative Console - Part 1.
With this problem, the WebSphere Administrative Console shows the status of an application server, node agent, or application as 'red' or 'unknown' even though the actual process is up and running.
In part 2, we look at cases where this issue is caused by a problem in one of the following areas:
- File permission
- Networking issues
- Security error
Some of these exceptions can be seen in the SystemOut.log, SystemErr.log, or FFDC log files.
The first example is due to incorrect file permissions.
AbstractStatu 3 Failed to loadcluster context while determining application status:
at com.ibm.ws.management.status.DeploymentManagerStatusCache._sendReport(DeploymentManagerStatusCache.java:446)
at com.ibm.ws.management.status.DeploymentManagerStatusCache.placeReport(DeploymentManagerStatusCache.java:394)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
The previous exception can indicate that some files under the WAS_home directory do not have the same ownership or permissions as the user ID that installed WebSphere Application Server. To resolve it, check the following conditions:
- Review the file listing of the WAS_home directory and make sure that the owner and group of every file are consistent with the user ID that was used during the WebSphere Application Server installation process. The owner needs read and write permission on all files (for example, mode 755, which gives the owner read, write, and execute permission and gives everyone else read and execute permission).
- Verify that the file descriptor limit reported by ulimit -n is 10,000 or higher and that the user has the required umask of 022. Refer to this link for more detail: http://www.ibm.com/support/knowledgecenter/SSAW57_8.0.0/com.ibm.websphere.installation.nd.doc/info/ae/ae/tins_prepare.html
- Make sure that the javasharedresources directory has the correct file permissions and owner.
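The ownership, umask, and file descriptor checks above can also be scripted. The following is a minimal Python sketch for a UNIX-like system, not an IBM-provided tool. The function names are illustrative, and the directory and numeric user ID that you pass in are assumptions about your own installation.

```python
import os
import resource


def find_ownership_mismatches(root, expected_uid):
    """Yield (path, uid) for every entry below root not owned by expected_uid.

    The root directory itself is not checked, only its contents.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: report symlinks themselves
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if st.st_uid != expected_uid:
                yield path, st.st_uid


def current_umask():
    """Return the umask of this process without leaving it changed."""
    mask = os.umask(0)  # os.umask sets a new mask and returns the old one
    os.umask(mask)      # restore the original mask immediately
    return mask


def open_file_limit():
    """Return the soft open-file limit, the value that `ulimit -n` reports.

    The value can be resource.RLIM_INFINITY when no limit is set.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Run the sketch as the user that starts WebSphere Application Server, so that the umask and limit reflect that user's environment. Pass the WAS_home path and the numeric ID of the installation user (as printed by id -u) to find_ownership_mismatches, then compare current_umask() with 0o022 and open_file_limit() with 10000.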
The following examples show networking problems that can cause the status cache problem:
00000043 SOAPConnector < invokeTemplate - failed Exit queryName [SOAPException: faultCode=SOAP-ENV:Client; msg=Error opening socket: java.io.IOException: Exception during sslSocket.startHandshake: Remote host closed connection during handshake; targetException=java.lang.IllegalArgumentException: Error opening socket: java.io.IOException: Exception during sslSocket.startHandshake: Remote host closed connection during handshake]
[12/06/14 18:52:15:454 EST] 0000006b SOAPConnector < invokeTemplate -failed Exit isAlive [SOAPException: faultCode=SOAP-ENV:Protocol; msg=Unsupported response content type "text/html", must be: "text/xml". Response was: <HTML><TITLE>408 – Request Timeout</TITLE><BODY><h1>408 Connection timed out while reading request</h1></BODY></HTML>
FFDC Exception:java.net.BindException SourceId:com.ibm.ws.management.discovery.Endpoint.initialize ProbeId:101
Reporter:com.ibm.ws.management.discovery.Endpoint@79947994
java.net.BindException: Address already in use
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:402)
at java.net.ServerSocket.<init>(ServerSocket.java:194)
at java.net.ServerSocket.<init>(ServerSocket.java:106)
A few checkpoints for the previous exceptions:
- Review a network trace (snoop -o /tmp/snoop.bin) and an OS system call trace (truss -d -f) with your network administrator to make sure that the node agent and the deployment manager can communicate.
- Run netstat -an to determine whether the port in question is in the LISTEN state and to make sure that there is no port conflict.
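As a complement to netstat -an, the two port questions above (is something listening, and is the port already taken) can be probed from a short script. This is a rough Python sketch using only the standard socket module. The function names are illustrative, and the host and port values are whatever your own node agent and deployment manager use.

```python
import socket


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


def port_is_free(port, host=""):
    """Return True if this process can bind the port on the given address.

    False usually means another process holds the port, which is the
    situation behind "java.net.BindException: Address already in use".
    It can also mean a permission problem, for example a port below
    1024 when not running as root.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False
```

Run port_is_listening from the node agent host against the deployment manager's SOAP port to confirm basic reachability. Run port_is_free on the host that reported the BindException, while the affected server is stopped, to see whether something else is holding its port. Neither check replaces a network trace when the TCP connection succeeds but the SSL handshake fails.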
Another common cause of the status cache problem is a security certificate problem, as shown in the following examples:
[10/17/14 0:38:42:865 EDT] 00000013 WSX509TrustMa E CWPKI0022E: SSLHANDSHAKE FAILURE: A signer with SubjectDN "CN=test3,OU=test3Cell, OU=test3Manager, O=IBM, C=US" was sent from targethost:port "unknown:0". The signer may need to be added to local trust store "/usr/IBM/WebSphereV7/AppServer/profiles/AppSrv01/config/cells/test3Cell/trust.p12" located in SSL configuration alias "NodeDefaultSSLSettings" loaded from SSL configuration file "security.xml". The extended error message from the SSL handshake exception is: "PKIX path validation failed: java.security.cert.CertPathValidatorException: The certificate expired at Tue Aug 07 11:40:56 EDT 2012; internal cause is: java.security.cert.CertificateExpiredException: NotAfter: Tue Aug 07 11:40:56 EDT 2012".
[3/4/14 15:38:30:364 EST] 0000000f WSX509TrustMa E CWPKI0022E: SSLHANDSHAKE FAILURE: A signer with SubjectDN "CN=was2_lab886, OU=DSWEB, O=FDC, ST=NC/RTP" was sent from target host:port "localhost:9634". The signer may need to be added to local trust store "/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/config/cells/lab886Cell01/trust.p12" located in SSL configuration alias "NodeDefaultSSLSettings" loaded from SSL configuration file "security.xml". The extended error message from the SSL handshake exception is: "PKIX path building failed: java.security.cert.CertPathBuilderException: unable to find valid certification path to requested target".
Based on the previous errors, check the following information:
- Consider temporarily disabling global security to narrow down the problem.
- Verify that the key store and trust store files are in place with the correct permissions.
- You might need to renew the certificate, engage a WebSphere Application Server security expert, or both.
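The two CWPKI0022E samples above have different root causes: one certificate has expired, and the other signer is not in the trust store. When a log contains many such messages, a small parser can sort them. The following Python sketch is based only on the wording of the two sample messages above. The regular expression and the cause labels are assumptions, so other variants of the message fall into the "other" group.

```python
import re

# Fields taken from the wording of the CWPKI0022E samples in this post.
_CWPKI0022E = re.compile(
    r'CWPKI0022E:.*?SubjectDN "(?P<subject>[^"]*)"'
    r'.*?trust store "(?P<trust_store>[^"]*)"'
    r'.*?handshake exception is: "(?P<detail>.*)"',
    re.DOTALL,
)


def classify_handshake_failure(message):
    """Return subject, trust store, and a rough cause for a CWPKI0022E message.

    Returns None when the text is not a CWPKI0022E message in the
    expected shape.
    """
    match = _CWPKI0022E.search(message)
    if match is None:
        return None
    detail = match.group("detail")
    if "CertificateExpiredException" in detail:
        cause = "expired"
    elif "unable to find valid certification path" in detail:
        cause = "untrusted signer"
    else:
        cause = "other"  # wording not covered by the two samples
    return {
        "subject": match.group("subject"),
        "trust_store": match.group("trust_store"),
        "cause": cause,
    }
```

An "expired" result points toward renewing the certificate. An "untrusted signer" result points toward adding the signer to the trust store that the message names.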
Here are some manual steps to renew the certificates:
- Stop servers and the node agent.
- Rename the key.p12 and trust.p12 files, which are located under the WAS_home/profiles/<profileName>/etc/ directory, to key.p12_bkup and trust.p12_bkup.
- Copy the key.p12 and trust.p12 files, which are located under the WAS_home/profiles/<profileName>/<Dmgr01>/config/cells/<cellName> directory, to the WAS_home/profiles/<profileName>/etc/ directory.
- Run the following command from the WAS_home/profiles/<profileName>/bin directory:
syncNode.sh dmgr_hostname dmgr_SOAP_port -username <username> -password <password> -trace
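The rename and copy steps above are easy to get wrong by hand, so here is one way to script them. This is a hedged Python sketch, not an IBM-provided procedure. The function name and the _bkup default are illustrative, and the two directory arguments are the etc and cell configuration directories from the steps above, which you supply yourself. The sketch does not stop any servers and does not run the synchronization command, so those steps remain manual.

```python
import shutil
from pathlib import Path

KEYSTORES = ("key.p12", "trust.p12")


def refresh_node_keystores(profile_etc, cell_config, suffix="_bkup"):
    """Back up the keystores in profile_etc, then copy fresh ones from cell_config.

    Returns the list of files written to profile_etc.
    """
    profile_etc = Path(profile_etc)
    cell_config = Path(cell_config)

    # Fail before touching anything if a source file is missing.
    missing = [name for name in KEYSTORES if not (cell_config / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {cell_config}: {', '.join(missing)}")

    copied = []
    for name in KEYSTORES:
        target = profile_etc / name
        if target.exists():
            # Note: an older backup with the same name is overwritten.
            target.replace(target.with_name(name + suffix))
        shutil.copy2(cell_config / name, target)  # copy2 keeps timestamps
        copied.append(target)
    return copied
```

Run it as the WebSphere Application Server installation user so that the copied files keep the expected ownership, and only after the servers and the node agent are stopped.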
Alternatively, you can renew the certificates on the console. See the following information: