Troubleshooting runtime issues
You might encounter an issue during normal WebSphere Automation operation, such as a failed health investigation. Learn how to fix the most common runtime issues.
The following issues might cause a health investigation to fail. When an investigation fails, download the archive file for the investigation by using the WebSphere Automation UI or the REST API. Open the archive and examine the analysis.log file for errors.
- FIPS-enabled server fails to register with WebSphere Automation after installing a JDK fixpack with a SecureRandom SHA2DRBG for provider not available error message
- No contact with server, or contact lost with registered server
- Investigation not created for an Instana alert
- Out-of-memory error for the memory analysis runner job
- Failed to identify server on host example.com
- The MyCustomRole role includes an invalid permission: can_view_websphere_inventory
- Fix installation fails with a Connection error: read operation timed out error message
- Fix installation fails on WebSphere Application Server Liberty with a Red Hat® Enterprise Linux® non-root user and with Installation Manager in group mode
- Errors in running status or synchronization of node agent after a fix pack is installed
- Installation of the fix cannot proceed error message
- Problem with request to install fix error message
- Installation of a fix on a Windows server stalls
- Install-fix pod is stalled with ContainerStatusUnknown with installing fixes
- iFix installation on a target server with an AIX operating system fails with error chmod: A flag or octal number is not correct
- Failure to invoke a webhook after a memory leak is detected
- Bulletin import job fails in an air gap installation
- The timeout was exceeded accessing the Operate > Application runtimes page
- In a FIPS-enabled environment, installations of fixes or memory leak investigation do not progress
- Email notifications are not sent, Can't verify identity of server error message in the wsa-secure-vulnerability-notifier pod log
- Runbook logs contain file access permission errors
- No iFix option for fixing vulnerabilities in a Java SDK runtime
- FIPS-enabled server fails to register with WebSphere Automation after installing a JDK fixpack with a SecureRandom SHA2DRBG for provider not available error message
- Installing a Java SDK runtime fixpack on a registered WebSphere Application Server or WebSphere Application Server Liberty server that is configured to be compliant with Federal Information Processing Standards (FIPS) might fail with the following error.
Stack Dump = java.lang.RuntimeException: SecureRandom SHA2DRBG for provider <provider_name> not available
To resolve this issue, configure the registered server according to the instructions in the following articles.
- No contact with server, or contact lost with registered server
-
WebSphere Automation uses the usage metering feature of WebSphere Application Server and WebSphere Application Server Liberty to register servers and maintain regular contact with them. If WebSphere Automation is unable to contact a server, or if contact with a registered server is lost for more than six hours, WebSphere Automation displays visual indicators in the UI to make you aware of the situation. If the loss of contact is not due to a known and acceptable reason, try the following steps to diagnose and resolve the issue.
- Ensure usage metering is correctly configured on the target server.
For more information, see Setting up security monitoring.
- Restart the server.
If the server was removed from WebSphere Automation, or if the usage metering feature was disabled, enable the feature and restart the target server. This action leads to the server being registered with WebSphere Automation again. For more information, see Unregistering servers.
- Check network connectivity.
If the usage metering feature is properly configured but contact is not established, check the logs on the target server for indications of network connectivity issues that might block communication between the server and WebSphere Automation.
- Investigation not created for an Instana alert
- This problem can be caused by the host not having any registered servers. If the investigation manager finds no registered servers for the host, the following error message is written in the investigation-manager log file:
Investigation cannot be started because no assets are registered with the example.com host.
- Out-of-memory error for the memory analysis runner job
(In version 1.3 or later) java.lang.OutOfMemoryError: Java heap space
(In version 1.2) JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError"
By default, the memory request and memory limit of the memory analysis runner job are set to 4 GB. These settings are sufficient for the runner job to analyze most heap dumps. If you see this error message, the analyzer did not have enough memory to analyze the heap dump. You can allocate more memory to the memoryAnalysisRunner setting in WebSphereHealth custom resources. For more information, see WebSphereHealth custom resource. Alternately, you can edit the WebSphereHealth instance with the following command:
oc edit WebSphereHealth <instance-name> -n <namespace>
The default instance name is wsa-health. The default namespace is wasautomation.
Note: The memory defaults to 4Gi with a limit of 4Gi. You can increase the memory to a bigger value, such as 20Gi, as in the following example. Set the memory request and memory limit to the same value. The Java VM uses the amount of memory that is specified by the limit to calculate the maximum heap size. Kubernetes only guarantees that the process can obtain the amount of memory that is specified by the request.
spec:
  analysisManager:
    Image: …
  memoryAnalysisRunner:
    resources:
      limits:
        cpu: '1'
        memory: 20Gi
      requests:
        cpu: 500m
        memory: 20Gi
Note: When you allocate more resources to memoryAnalysisRunner, make sure that the worker nodes can handle the requests.
- Failed to identify server on host example.com
-
Failed to identify the server on host example.com
This error can be caused by several problems. To resolve the issue, try the following steps:
- Ensure that all of the prerequisites for managed servers are met. For more information, see Managed server requirements.
- Test the connection between WebSphere Automation and the managed server.
- Troubleshoot setup issues.
- The MyCustomRole role includes an invalid permission: can_view_websphere_inventory
- If you included the can_view_websphere_inventory permission in a custom role in version 1.1, this permission was removed in version 1.2. To fix your custom roles, you must use the API:
- Get the API key from the cpd UI.
From cpd console, click , then click the API key button.
- Get a bearer token to use for API calls:
curl -k -X POST -H 'Content-Type: application/json' -d '{"username":"<user_name>","api_key":"<api_key>"}' https://$(oc get route -n wasautomation -o jsonpath='{.items[?(@.spec.to.name=="ibm-nginx-svc")].spec.host}')/icp4d-api/v1/authorize
- Get a list of the roles. This list is needed to get the extension name and JSON metadata, which is used in a subsequent step to modify the broken custom role:
curl -X GET -k -v -H "Authorization: Bearer <bearer_token>" --header "Content-Type: application/json" --header "Accept: application/json" https://$(oc get route -n wasautomation -o jsonpath='{.items[?(@.spec.to.name=="ibm-nginx-svc")].spec.host}')/api/v1/usermgmt/v1/roles
Example:
curl -X GET -k -v -H "Authorization: Bearer eyJhbGciOiJSUz..." --header "Content-Type: application/json" --header "Accept: application/json" -d '{"role_name":"mycustomrole","description":"","permissions":[]}' https://$(oc get route -n wasautomation -o jsonpath='{.items[?(@.spec.to.name=="ibm-nginx-svc")].spec.host}')/api/v1/usermgmt/v1/roles
Response (truncated):
{"rows":[{"id":"f60b72c3-ae7e-4860-8f98-649e316af6d2","key":"f60b72c3-ae7e-4860-8f98-649e316af6d2","doc":{"_id":"f60b72c3-ae7e-4860-8f98-649e316af6d2","extension_id":"_ce_703424172539772929","extension_name":"f60b72c3-ae7e-4860-8f98-649e316af6d2","role_name":"mycustomrole","description":"","permissions":["can_view_websphere_inventory"]...],"messageCode":"success","message":"success"}
- For each custom role that contains the can_view_websphere_inventory permission, remove that permission and replace it with the can_view_application_runtime_security permission. The permission name must be enclosed in double quotation marks so that the request body is valid JSON.
curl -X PUT -k -v -H "Authorization: Bearer <bearer_token>" --header "Content-Type: application/json" --header "Accept: application/json" -d '{"role_name":"<role_name>","description":"","permissions":["can_view_application_runtime_security"]}' https://$(oc get route -n wasautomation -o jsonpath='{.items[?(@.spec.to.name=="ibm-nginx-svc")].spec.host}')/api/v1/usermgmt/v1/role/<extension_name>
Example:
curl -X PUT -k -v -H "Authorization: Bearer eyJhbGciOiJSUz..." --header "Content-Type: application/json" --header "Accept: application/json" -d '{"role_name":"mycustomrole","description":"","permissions":["can_view_application_runtime_security"]}' https://$(oc get route -n wasautomation -o jsonpath='{.items[?(@.spec.to.name=="ibm-nginx-svc")].spec.host}')/api/v1/usermgmt/v1/role/f60b72c3-ae7e-4860-8f98-649e316af6d2
Response (truncated):
{"id":"f60b72c3-ae7e-4860-8f98-649e316af6d2","messageCode":"success","message":"success"}
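When a custom role has several permissions, it can help to derive the PUT request body from the role document that the roles list returned instead of typing it by hand. The following is a minimal sketch under stated assumptions: swap_permission is a hypothetical helper, not part of WebSphere Automation or the cpd API, and it only performs a text substitution on JSON that is passed on standard input.

```shell
#!/bin/sh
# swap_permission (hypothetical helper): reads a role JSON document on stdin
# and replaces the removed permission name with its replacement.
# It is a plain text substitution and does not validate the JSON.
swap_permission() {
  sed 's/"can_view_websphere_inventory"/"can_view_application_runtime_security"/g'
}

# Example (assumes the role body was copied from the roles list response):
# printf '%s' '{"role_name":"mycustomrole","description":"","permissions":["can_view_websphere_inventory"]}' | swap_permission
```

If a role already contains both permissions, the substitution leaves a duplicate entry, so review the resulting body before you send it.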
- Fix installation fails with a Connection error: read operation timed out error message
-
If the installation of a fix fails with this error in the runbook.log file, click Install fix in the UI at a later time to restart the installation of the fix.
- Fix installation fails on WebSphere Application Server Liberty for a non-root user with Installation Manager in group mode on Linux or UNIX
-
This error occurs because the InstallationManager.dat file that WebSphere Automation accesses is not located in the non-root user's home directory as expected. To work around this problem, create a symbolic link in the non-root user's home directory that points to the actual location of the InstallationManager.dat file. Refer to the following example.
ln -s /<my_group_name>/InstallationManager_AppData/etc/.ibm/registry/InstallationManager.dat \
  /home/<non-root-username>/etc/.ibm/registry/InstallationManager.dat
- Errors in running status or synchronization of node agent after a fix pack is installed
-
After you use WebSphere Automation to install a fix pack on a node in WebSphere Application Server Network Deployment, you might see one of the following problems:
- An incorrect running status for the node in the admin console
- An incorrect synchronization for the node in the admin console
- An error similar to the following one in the SystemOut.log file:
ADMD0026W: The version of the deployment manager (9.0.5.11) is earlier than that of this node (node1, 9.0.5.12).
These problems occur because the fix pack version of the node is higher than the version of the deployment manager host. To resolve the problem, manually update the deployment manager host to a version equal to or higher than the fix pack version.
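The version relationship can be checked before a fix pack is applied to a node. The following is a minimal sketch, not a WebSphere Automation feature: version_lt is a hypothetical function name, and the comparison assumes a sort implementation that supports the -V (version sort) option, such as GNU coreutils.

```shell
#!/bin/sh
# version_lt A B (hypothetical helper): succeeds when dotted version A is
# strictly lower than dotted version B.
# Assumes `sort -V` is available (GNU coreutils and recent BSD sort).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example with the versions from the ADMD0026W message:
# version_lt 9.0.5.11 9.0.5.12 && echo "Update the deployment manager first"
```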
- Installation of the fix cannot proceed error message
-
If the Installation of the fix cannot proceed error message appears, it might be caused by one of the following problems:
- A communication problem might exist between WebSphere Automation and IBM Fix Central.
- A configuration problem might exist that prevents WebSphere Automation from authenticating with IBM Fix Central.
- A user privilege problem might exist that prevents WebSphere Automation from installing the fix on the managed server.
Check the configurations to ensure that they are correct. If the configurations are correct and a communication problem is suspected, try initiating the fix again after approximately one hour.
- Problem with request to install fix error message
-
If the Problem with request to install fix error message appears, it is because more than one fix installation was initiated on a particular host. Only one fix can be installed on a particular host at a time. Wait until the current fix installation process is complete before you attempt to install another fix on that host.
- Installation of a fix on a Windows server stalls
-
If the process of installing a fix on a Windows server stalls for an unreasonable amount of time, restart the Windows server and then restart the installation of the fix.
- Install-fix pod is stalled with ContainerStatusUnknown with installing fixes
- It is possible for a fix installation to stall and show Installing fix in the WebSphere Automation UI, and for the install-fix pod to remain in the ContainerStatusUnknown state. While in this condition, subsequent installation attempts on the same host do not proceed and result in the following error message.
WIORM0806E: Other fixes are being installed on the host 'myhost.com'. Try again later.
To check your pod status, run the oc get pod command. Look for the ContainerStatusUnknown state.
oc get pod | grep install-fix
install-fix-f6054b58-f20d-4351-8c44-7c1efd93f2d5-9m89j 0/1 ContainerStatusUnknown 1 48m
To work around this issue, you must delete the stalled installation that never proceeds past the in-progress status and then delete the related install-fix job.
- iFix installation on a target server with an AIX operating system fails with error chmod: A flag or octal number is not correct
- This error is related to the use of Ansible on the AIX operating system when both the connection user and the become_user are unprivileged. To prevent this problem from recurring, do the following steps:
- Add --from-literal=ansible_pipelining=true to your secret.
- Disable requiretty in your /etc/sudoers file for all managed hosts.
You can do this by commenting out the Defaults requiretty line, as shown in the following example.
#Defaults requiretty
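To confirm that requiretty is no longer active on a managed host, you can search the sudoers file for an uncommented entry. The following is a minimal sketch: has_active_requiretty is a hypothetical helper, it takes the file path as a parameter so that it can be tried on a copy, and it only detects the global Defaults requiretty form, not per-user or per-host variants.

```shell
#!/bin/sh
# has_active_requiretty FILE (hypothetical helper): succeeds if FILE contains
# an uncommented global "Defaults requiretty" line.
has_active_requiretty() {
  grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty([[:space:]]|$)' "$1"
}

# Example (reading /etc/sudoers usually requires root privileges):
# has_active_requiretty /etc/sudoers && echo "requiretty is still enabled"
```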
- Failure to invoke a webhook after a memory leak is detected
- Unexpected updates to the Instana alerts schema can cause this problem. WebSphere Automation uses a JSON schema to validate the JSON that is sent from Instana. The schema that WebSphere Automation uses is set in the wsa-schema-instana-alerts config map. Ensure that the environment variable $WSA_INSTANCE_NAMESPACE is set to your WebSphere Automation instance namespace.
- Retrieve the default Instana Alerts schema from the default ConfigMap as a local file named instana-alerts-custom.json.
oc get cm wsa-schema-instana-alerts -n $WSA_INSTANCE_NAMESPACE -o "jsonpath={.data['instanaAlerts\.json']}" > instana-alerts-custom.json
- Make the necessary changes to the instana-alerts-custom.json JSON file.
- Create the custom ConfigMap.
oc create cm wsa-schema-instana-alerts-custom -n $WSA_INSTANCE_NAMESPACE --from-file=instanaAlerts.json=instana-alerts-custom.json
- Bulletin import job fails in an air gap installation
-
In an air gap installation, the wsa-secure-bulletins-import pods might fail to complete. For example, if you run the following command:
oc get pods | grep import
You might see output with errors:
wsa-secure-bulletins-import-1.6.0-8526l 0/1 Error 0 2d15h
wsa-secure-bulletins-import-1.6.0-b7jld 0/1 Error 0 2d15h
wsa-secure-bulletins-import-1.6.0-c4cxf 0/1 Error 0 2d15h
wsa-secure-bulletins-import-1.6.0-dsmdg 0/1 Error 0 2d15h
wsa-secure-bulletins-import-1.6.0-fgj7p 0/1 Error 0 2d21h
wsa-secure-bulletins-import-1.6.0-kw9qm 0/1 Error 0 2d15h
wsa-secure-bulletins-import-1.6.0-t5sgl 0/1 Error 0 2d15h
If so, delete the bulletins import job.
oc delete job wsa-secure-bulletins-import-1.6.0
A new bulletins import job is created.
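The job name to delete depends on the installed version, so it can be derived from the failing pod names rather than copied from the example. The following is a minimal sketch with hypothetical helper names; it assumes the usual Kubernetes convention, consistent with the sample output, that a job's pod name is the job name followed by a hyphen and a generated suffix.

```shell
#!/bin/sh
# errored_pods (hypothetical helper): reads `oc get pods` output on stdin and
# prints the names of pods whose STATUS column (third field) is Error.
errored_pods() {
  awk '$3 == "Error" { print $1 }'
}

# job_name_from_pod POD (hypothetical helper): strips the trailing
# "-<suffix>" from a pod name to recover the owning job name.
job_name_from_pod() {
  printf '%s\n' "${1%-*}"
}

# Example (not executed here):
# oc get pods | errored_pods | while read -r pod; do job_name_from_pod "$pod"; done | sort -u
```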
- The timeout was exceeded accessing the Operate > Application runtimes page
- This problem occurs when the WebSphereSecure UI cannot contact the Platform UI instance while it uses the Red Hat OpenShift provisioned certificates. When the application fails to communicate, the browser experiences a timeout and displays the following error:
timeout of 20000ms exceeded
To solve this problem, delete the WebSphereSecure UI deployment (<instance-name>-secure-ui) to restart the application.
For example, with a WebSphere Automation instance name wsa, delete the wsa-secure-ui deployment.
Delete the related deployment with the oc delete deployment deployment_name command.
oc delete deployment <instance-name>-secure-ui -n <namespace>
- In a FIPS-enabled environment, installations of fixes or memory leak investigation do not progress
- If you use an SSH key pair that was generated on a non-FIPS system in a FIPS-enabled installation of WebSphere Automation, application of fixes or memory leak investigations might not progress. The pod log might stop at the following lines:
[07/28/23 18:27:00:516 UTC] 1 com.ibm.ws.automation.core.runbook.runner.RunbookRunnerCLI INFO start Request received to execute runbook: install-fix against server: server1.example.com (correlationId: 65301e97-5754-4001-afbe-0c669d6774ff)
[07/28/23 18:27:00:607 UTC] 1 com.ibm.ws.automation.core.runbook.runner.AnsibleRunner INFO runRunbook Here is the standard output of the command:
[07/28/23 18:27:00:613 UTC] 1 com.ibm.ws.automation.core.runbook.runner.AnsibleRunner INFO runRunbook was:
[07/28/23 18:27:00:613 UTC] 1 com.ibm.ws.automation.core.runbook.runner.AnsibleRunner INFO runRunbook hosts:
[07/28/23 18:27:00:613 UTC] 1 com.ibm.ws.automation.core.runbook.runner.AnsibleRunner INFO runRunbook server1.example.com:
[07/28/23 18:27:00:613 UTC] 1 com.ibm.ws.automation.core.runbook.runner.AnsibleRunner INFO runRunbook ansible_user: root
[07/28/23 18:27:00:625 UTC] 1 com.ibm.ws.automation.core.runbook.runner.AnsibleRunner INFO runRunbook Agent pid 41
This problem occurs because the ssh-keygen command on a non-FIPS system uses the MD5 digest algorithm to generate keys. On a FIPS-enabled system, the MD5 digest algorithm is disabled. SSH key pairs with no passphrase are not affected.
When running WebSphere Automation on a FIPS-enabled cluster, choose one of the following options to use a passphrase-protected SSH key pair on a FIPS-enabled system.
- Generate a new passphrase-protected SSH key pair on a FIPS-enabled system.
- Convert the existing private key to a FIPS-compatible format:
$ openssl pkcs8 -topk8 -v2 aes128 -in <INPUT FILENAME> -out <OUTPUT FILENAME>
Enter pass phrase for id_rsa: <PASSPHRASE OF EXISTING KEY>
Enter Encryption Password: <PASSPHRASE FOR NEW KEY>
Verifying - Enter Encryption Password: <PASSPHRASE FOR NEW KEY>
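After the conversion, a quick sanity check is to look at the PEM header of the output file. The following is a minimal sketch: is_pkcs8_encrypted is a hypothetical helper, and it only checks that the first line is the header that openssl writes for a passphrase-protected PKCS#8 key. It does not prove that the key works with WebSphere Automation.

```shell
#!/bin/sh
# is_pkcs8_encrypted FILE (hypothetical helper): succeeds if the first line of
# FILE is the PEM header of an encrypted PKCS#8 private key.
is_pkcs8_encrypted() {
  [ "$(head -n 1 "$1")" = "-----BEGIN ENCRYPTED PRIVATE KEY-----" ]
}

# Example (the file name is a placeholder for your converted key):
# is_pkcs8_encrypted id_rsa_pkcs8 || echo "Key is not in encrypted PKCS#8 format"
```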
- Email notifications are not sent, Can't verify identity of server error message in the wsa-secure-vulnerability-notifier pod log
-
Starting in WebSphere Automation 1.7, a more secure JavaMail service is used than in previous versions. This service verifies the server identity when it establishes the TLS connection. If the certificate does not match the hostname of the mail server, a secure connection cannot be established and no emails are sent.
To disable hostname verification when sending emails, you can set a mail.smtps.ssl.checkserveridentity property to false in the Notifications page.
- Log in to WebSphere Automation as a user with Manage notifications privileges.
- Click and then open the Notifications page.
- On the Notifications page, under Email server, click Configure server.
- Click the Add button under Add additional fields.
- Add a mail.smtps.ssl.checkserveridentity parameter, and set its value to false.
- Click Save.
- Runbook logs contain file access permission errors
-
WebSphere Automation playbooks require that the user profile that is defined for the ansible_user connection variable has read access to the following WebSphere Application Server traditional files and their parent directories.
app_server_root/properties/profileRegistry.xml
app_server_root/properties/version/installed.xml
If the profile that is defined for the ansible_user connection variable cannot read the files, you might see errors similar to the following in the runbook logs:
ValueError: File not found or no permissions to access app_server_root/properties/version/installed.xml
or
Permission denied: 'app_server_root/properties/profileRegistry.xml'
To grant read access permissions to the files, use the operating system tools to change the file permissions. For example:
chmod +r /opt/IBM/WebSphere/AppServer/properties/profileRegistry.xml
chmod +r /opt/IBM/WebSphere/AppServer/properties
chmod +r /opt/IBM/WebSphere/AppServer/properties/version/installed.xml
chmod +r /opt/IBM/WebSphere/AppServer/properties/version
Additional information that might be helpful can be found in Granting write permission for profile-related tasks in the WebSphere Application Server documentation. See step 8 in particular and keep in mind that the use case in the linked documentation is different; do not follow those instructions explicitly.
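Before you rerun a runbook, you can check whether the ansible_user account can read the two files. The following is a minimal sketch: check_was_files is a hypothetical helper that takes the app_server_root path as its argument and must be run as the account that is defined for ansible_user.

```shell
#!/bin/sh
# check_was_files ROOT (hypothetical helper): prints each required file under
# ROOT that the current user cannot read, and returns non-zero if any is found.
check_was_files() {
  root=$1
  status=0
  for f in properties/profileRegistry.xml properties/version/installed.xml; do
    if [ ! -r "$root/$f" ]; then
      printf 'not readable: %s\n' "$root/$f"
      status=1
    fi
  done
  return "$status"
}

# Example (the path matches the chmod example; adjust it to your installation):
# check_was_files /opt/IBM/WebSphere/AppServer
```

A file is also reported when it does not exist or when a parent directory cannot be traversed, which matches the "File not found or no permissions" wording of the runbook error.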
- No iFix option for fixing vulnerabilities in a Java SDK runtime
- In the Prepare fix dialog, no interim fixes (iFixes) are shown as options to fix vulnerabilities in a Java SDK runtime. You must select Fix Pack as the Fix type on the Choose global options page, and then choose a fix pack to install on the Choose fixes page.