Troubleshooting integration servers and integration runtimes in the App Connect Dashboard

Review this information to help resolve issues when you deploy a Toolkit or Designer integration to an integration server or integration runtime in the App Connect Dashboard, or when the integration runs.

About this task

Tip: If you need to run any oc commands (or kubectl commands as an alternative), you must be logged in to your cluster, as described in Logging in to your cluster from the command-line interface.

For advice about specific problems that can occur when you deploy or run the integration, see the following sections.

Resolving a BAR file analysis error while creating a 13.0.1.0-r1 or later integration runtime

When you deploy a BAR file to create an integration runtime, you might see the following error if the types of flows in the BAR file cannot be identified when an attempt is made to analyze the BAR file:

Error analysing bar file, review analyse output for further information:….

Procedure

To resolve this error, complete the following steps:

  1. Update the custom resource settings of the integration runtime to indicate which types of flows the BAR file contains. Set each value to true or false to indicate whether the flows are built in the IBM App Connect Enterprise Toolkit (toolkitFlow), or whether they are API flows (designerAPIFlow) or event-driven flows (designerEventFlow) that are authored in App Connect Designer. (A filled-in example follows these steps.)
    spec:
      flowType:
        toolkitFlow: value
        designerAPIFlow: value
        designerEventFlow: value

    For more information, see Integration runtime: Updating the custom resource settings for an instance and Integration runtime: Custom resource values.

  2. Repeat the steps that you completed earlier to create the integration runtime.
  3. Optional: Raise a support ticket (or case) and provide a copy of your Operator log and a copy of the BAR file if you want IBM to investigate the root cause of the problem.
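
For example, the following settings (with illustrative values, assuming a BAR file that contains only Toolkit flows) indicate that the BAR file contains Toolkit flows and no Designer flows:

    spec:
      flowType:
        toolkitFlow: true
        designerAPIFlow: false
        designerEventFlow: false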

Resolving resource usage issues on startup of a pod

About this task

If a deployed BAR file for a Designer or Toolkit integration contains a definition that is complex or resource-intensive, on startup of the integration server or integration runtime, you might notice that its pod continually restarts.

To resolve this issue, check the CPU and memory usage of the pod to establish whether the default minimum limits need to be increased. For example, to check the resource usage from the Red Hat® OpenShift® web console, you can view the integration server or integration runtime pod details from the Details tab under Workloads > Pods, and then click the usage graph to view the metrics.

Example of the Details tab for an integration server pod
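
If you prefer to use the CLI, and the cluster metrics service is available, you can get a point-in-time view of the pod's CPU and memory usage with a command like the following sketch (podName is a placeholder for your pod name):

    oc adm top pods podName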

If the pod is close to or exceeding its resource limits, increase the limits to stop the restarts or to stop the pod from crashing.

Procedure

From the App Connect Dashboard, edit the integration server or integration runtime definition to adjust the resource limits for the pod's runtime container.

  1. Complete the relevant steps:
    • Integration server:
      1. From the Servers page, open the options menu on the integration server tile, and then click Edit.
      2. From the Edit Instance page, either use the Common settings view (with Advanced settings enabled) or the YAML editor view to increase the limits for the runtime container. You can update the following parameters or fields:
        • spec.pod.containers.runtime.resources.limits.cpu
        • spec.pod.containers.runtime.resources.limits.memory

        For information about these parameters, see Integration Server reference: Custom resource values.

        Resource limit fields for the runtime container
    • Integration runtime:
      1. From the Runtimes page, open the options menu on the integration runtime tile, and then click Edit.
      2. From the Edit Instance page, either use the Common settings view (with Advanced settings enabled) or the YAML editor view to increase the limits for the runtime container. You can update the following parameters or fields:
        • spec.template.spec.containers[].name (set to runtime)
        • spec.template.spec.containers[].resources.limits.cpu
        • spec.template.spec.containers[].resources.limits.memory

        For information about these parameters, see Integration Runtime reference: Custom resource values.

        Resource limit fields for the runtime container
  2. Click Update.
    Tip: You can also edit the resource limits from the Red Hat OpenShift web console or CLI. If you are using the CLI, you can use commands such as oc patch integrationserver or oc edit integrationserver for integration servers, or oc patch integrationruntime or oc edit integrationruntime for integration runtimes.
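
    For example, a merge patch such as the following sketch increases the limits for an integration server (the instance name my-int-server and the limit values are illustrative):

    oc patch integrationserver my-int-server --type merge -p '{"spec":{"pod":{"containers":{"runtime":{"resources":{"limits":{"cpu":"2","memory":"1Gi"}}}}}}}'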

Checking the deployment status of an integration server or integration runtime

Procedure

To check the deployment status of an integration server or integration runtime, complete the following steps.

  1. After you deploy an integration server or integration runtime to the App Connect Dashboard, you can check whether the deployment was successful by running the following command in the CLI.
    oc get pods
    The command returns a list of pods in the namespace. Integration server pods are named in the format integrationServerName-is-generated_characters, and integration runtime pods are named in the format integrationRuntimeName-ir-generated_characters.
  2. Check the STATUS field for issues. A pod that is working correctly moves from a status of Pending to ContainerCreating to Running.
    Output of the 'oc get pods' command

    A failed deployment might be due to an issue with the configuration that you specified, or because you ran out of cluster resources. Checking recent events in the namespace can help to confirm the cause, as shown in the example after these steps.

  3. To get more detailed information about that pod, run the following command:
    oc describe pod podName
    Output of the 'oc describe pod' command
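
If a pod fails to schedule or start, reviewing recent events in the namespace can also help to identify the cause (for example, insufficient cluster resources). This command is standard Kubernetes tooling rather than anything specific to App Connect:

    oc get events --sort-by=.lastTimestamp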

Retrieving logs

You can obtain user logs for an integration server or integration runtime from your cluster by running oc commands from the command line. You can also obtain logs for the IBM App Connect Operator and its custom resource instances in the same way.


If you need to send operational logs to IBM Support to aid with troubleshooting, see Gathering diagnostic information.

Procedure

To retrieve user logs for an integration server or integration runtime, complete the following steps:

  1. Retrieve the list of pods in the namespace by running the following command:
    oc get pods
  2. Identify the pod name for the integration server or integration runtime whose logs you want to retrieve and then run the following command. (The pod name is in the format integrationServerName-is-generated_characters or integrationRuntimeName-ir-generated_characters.)
    oc describe pod podName
  3. Locate the container name in the output and then run the following command to obtain the App Connect user logs for the pod:
    oc logs podName -c containerName

    The logs are written to a standard output (stdout) stream in the terminal window as shown in the following example.

    Warning:

    If you are running on Red Hat OpenShift, do not stream logs with large volumes of log messages to standard output. Doing so can cause pod logging to hang. The pod might appear not to be running, but the underlying runtimes remain fully operational.

    In exceptional circumstances, you can log longer lines by setting the ACE_LOGGING_BUFFER environment variable to an integer value that specifies the log buffer size in KB. Set this value to twice the size of the largest message that you want to emit; for example, for a 2 MB message, set the ACE_LOGGING_BUFFER environment variable to 4096. This workaround is a temporary solution and must not be permanently configured on your system. For debugging purposes, use other techniques such as trace nodes instead. Also ensure that any logging application in your cluster (for example, the ELK Stack) is appropriately configured to allow larger messages to be consumed.

    The following example shows how to set the log buffer in an integration server custom resource.

    spec:
      env:
      - name: ACE_LOGGING_BUFFER
        value: '4096'

    The following example shows how to set the log buffer in an integration runtime custom resource.

    spec:
      template:
        spec:
          containers:
            - name: runtime
              env:
              - name: ACE_LOGGING_BUFFER
                value: '4096'

    C:\WINDOWS\system32>oc logs https-basic-auth-is-5d646b9844-c6h85 -c https-basic-auth
    2020-07-06T10:16:01.450Z Image created: 2020-06-12T03:04:42+00:00
    2020-07-06T10:16:01.450Z Image revision: Not specified
    2020-07-06T10:16:01.450Z Image source: Not specified
    2020-07-06T10:16:01.539Z ACE version: 11009
    2020-07-06T10:16:01.539Z ACE level: S000-L200527.16701
    2020-07-06T10:16:01.539Z ACE build type: Production, 64 bit, amd64_linux_2
    2020-07-06T10:16:01.539Z Checking for valid working directory
    2020-07-06T10:16:01.539Z Checking if work dir is already initialized
    2020-07-06T10:16:01.539Z Checking for contents in the work dir
    2020-07-06T10:16:01.539Z Work dir initialization complete
    2020-07-06T10:16:01.539Z SetupConfigurationsFiles
    2020-07-06T10:16:01.539Z SetupConfigurationsFiles - configuration names: https-keystore-1.p12,https-policyproject-1,https-serverconf-1,https-setdbparms-1
    2020-07-06T10:16:01.600Z ConstructConfigurationsOnFileSystem - configuration name: https-keystore-1.p12 type: keystore
    2020-07-06T10:16:01.600Z constructKeyStoreOnFileSystem - Keystore name: https-keystore-1.p12
    2020-07-06T10:16:01.600Z ConstructConfigurationsOnFileSystem - configuration name: https-policyproject-1 type: policyproject
    2020-07-06T10:16:01.600Z ConstructPolicyProjectOnFileSystem
    2020-07-06T10:16:01.600Z ConstructConfigurationsOnFileSystem - configuration name: https-serverconf-1 type: serverconf
    2020-07-06T10:16:01.600Z constructServerConfYamlOnFileSystem
    2020-07-06T10:16:01.600Z ConstructConfigurationsOnFileSystem - configuration name: https-setdbparms-1 type: setdbparms
    2020-07-06T10:16:01.600Z ExecuteSetDbParms
    2020-07-06T10:16:01.600Z ExecuteSetDbParms - execute line 0 with number of args: 3
    2020-07-06T10:16:01.685Z ExecuteSetDbParms - execute line 1 with number of args: 3
    2020-07-06T10:16:01.768Z Performing initial configuration of integration server
    2020-07-06T10:16:01.768Z Getting configuration from content server
    2020-07-06T10:16:01.768Z Using ca file /home/aceuser/ssl/cacert.pem
    2020-07-06T10:16:01.798Z Configuration pulled from content server successfully
    2020-07-06T10:16:01.798Z Processing configuration in folder bars
    BIP8071I: Successful command completion.
    2020-07-06T10:16:01.892Z Processing configuration in folder webusers
    2020-07-06T10:16:01.906Z+00:00 Handling webusers configuration
    2020-07-06T10:16:02.074Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:02.269Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:02.421Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:02.580Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:02.733Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:02.888Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:02.895Z+00:00 Creating admin user ibm-ace-dashboard-admin
    2020-07-06T10:16:03.514Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:03.521Z+00:00 Creating operator user ibm-ace-dashboard-operator
    2020-07-06T10:16:04.289Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:04.296Z+00:00 Creating editor user ibm-ace-dashboard-editor
    2020-07-06T10:16:04.914Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:04.920Z+00:00 Creating audit user ibm-ace-dashboard-audit
    2020-07-06T10:16:05.678Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:05.685Z+00:00 Creating viewer user ibm-ace-dashboard-viewer
    2020-07-06T10:16:06.303Z+00:00 BIP8071I: Successful command completion.
    2020-07-06T10:16:06.303Z Enabling metrics in server.conf.yaml
    2020-07-06T10:16:06.304Z Metrics enabled in server.conf.yaml
    2020-07-06T10:16:06.304Z Initial configuration of integration server complete
    2020-07-06T10:16:06.304Z Discovering override ports
    2020-07-06T10:16:06.519Z Successfully discovered override ports
    2020-07-06T10:16:06.519Z Starting integration server
    2020-07-06T10:16:06.519Z Waiting for integration server to be ready
    2020-07-06T10:16:06.522Z Integration server not ready yet
    .....2020-07-06 10:16:06.703125: .2020-07-06 10:16:06.703346: Integration server 'https-basic-auth' starting initialization; version '11.0.0.9' (64-bit)
    ........................................2020-07-06 10:16:09.020610: About to 'Initialize' the deployed resource 'CustomerDatabaseV1' of type 'RestAPI'.
    2020-07-06T10:16:11.526Z Integration server not ready yet
    2020-07-06T10:16:16.576Z Integration server not ready yet
    2020-07-06 10:16:18.227306: About to 'Start' the deployed resource 'CustomerDatabaseV1' of type 'RestAPI'.
    An https endpoint was registered on port '7843', path '/customerdb/v1*'.
    2020-07-06 10:16:18.253094: The HTTP Listener has started listening on port '7843' for 'https' connections.
    2020-07-06 10:16:18.253180: Listening on HTTP URL '/customerdb/v1*'.
    Started native listener for HTTPS input node on port 7843 for URL /customerdb/v1*
    2020-07-06 10:16:18.253436: Deployed resource 'gen.CustomerDatabaseV1' (uuid='gen.CustomerDatabaseV1',type='MessageFlow') started successfully.
    ..2020-07-06 10:16:18.928608: IBM App Connect Enterprise administration security is authentication, authorization file. 
    2020-07-06 10:16:18.939272: The HTTP Listener has started listening on port '7600' for 'RestAdmin http' connections.
    
    2020-07-06 10:16:18.942780: Integration server has finished initialization.
    2020-07-06T10:16:21.585Z Integration server is ready
    2020-07-06T10:16:21.585Z Gathering Metrics...
    2020-07-06T10:16:21.585Z Starting metrics gathering
    2020-07-06T10:16:21.585Z Processing metrics...
    2020-07-06T10:16:21.585Z ACE_ADMIN_SERVER_SECURITY is false
    2020-07-06T10:16:21.585Z Connecting to ws://localhost:7600/ for statistics
    2020-07-06T10:16:21.893Z Connecting to ws://localhost:7600/ using session cookie
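
    To stream new log entries as they are written (subject to the warning above about large log volumes), you can also follow the log. The placeholder names are the same as in the previous command:

    oc logs -f podName -c containerName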
    

Enabling and downloading trace

To aid with problem determination and troubleshooting, you can enable and then download user or service trace on a deployed integration server or integration runtime. Enabling trace is useful if you cannot get enough information about a particular problem from the entries that are available in the log.

Procedure

You can enable and manage trace as follows:

Resolving liveness probe failures for long running flows

About this task

For long-running flows that take more than five minutes to complete, you might observe liveness probe failures for some of the containers in the integration server or integration runtime pods, because of the amount of time that the event loop requires to process the request.

To resolve this issue, adjust the following liveness probe values for the pod containers (except the runtime container) to values that will stop the probe from failing. The failureThreshold and timeoutSeconds settings are particularly relevant.

  • Integration server:

    In the listed parameters, * represents a container name such as connectors; for example, spec.pod.containers.connectors.livenessProbe.failureThreshold.

    • spec.pod.containers.*.livenessProbe.failureThreshold
    • spec.pod.containers.*.livenessProbe.initialDelaySeconds
    • spec.pod.containers.*.livenessProbe.periodSeconds
    • spec.pod.containers.*.livenessProbe.timeoutSeconds
  • Integration runtime:
    • spec.template.spec.containers[].name
    • spec.template.spec.containers[].livenessProbe.failureThreshold
    • spec.template.spec.containers[].livenessProbe.initialDelaySeconds
    • spec.template.spec.containers[].livenessProbe.periodSeconds
    • spec.template.spec.containers[].livenessProbe.timeoutSeconds

For more information about these parameters, see Integration Server reference: Custom resource values or Integration Runtime reference: Custom resource values.
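
For example, a minimal sketch of the liveness probe settings for one container in an integration runtime custom resource (the container name connectors and the values shown are illustrative; choose values that suit your flows):

    spec:
      template:
        spec:
          containers:
            - name: connectors
              livenessProbe:
                failureThreshold: 5
                initialDelaySeconds: 360
                periodSeconds: 10
                timeoutSeconds: 120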

Resolving ValidatingAdmissionWebhook errors for BAR files

When you deploy one or more BAR files to an integration server or integration runtime, a 13-second time limit is applied for downloading the BAR files (with any applicable credentials) to ensure that they are available.

Depending on the number or size of these BAR files, or the speed of your network connection (if the files are hosted remotely), the validation checks might exceed the 13-second time limit, which Red Hat OpenShift enforces for running webhooks. When this timeout occurs, you see the following error:

"admission plugin "ValidatingAdmissionWebhook" failed to complete validation in 13s" for field "undefined"

Procedure

To skip the validation checks and prevent the ValidatingAdmissionWebhook error, complete the following steps:

  1. Edit the integration server or integration runtime custom resource (CR) settings to add the following annotation to the metadata block while in YAML view:
    metadata:
      annotations:
        appconnect.ibm.com/webhook-barcheck: NONE

    You can do so from the Red Hat OpenShift web console or CLI, as described in Integration Server reference: Updating the custom resource settings for an instance and Integration Runtime reference: Updating the custom resource settings for an instance.

    Note:
    • This annotation setting is applicable only for integration servers or integration runtimes with a spec.version value that resolves to 12.0.9.0-r3 or later.
    • When this annotation is applied, the specified BAR files (in spec.barURL) are not validated to check whether they can be reached. If any BAR file is unreachable (for example, the content server returns a 404 status code), the integration server or integration runtime fails to start.
  2. Save your changes to the CR.
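
Alternatively, you can apply the same annotation directly from the CLI. The following command is a sketch; the instance name my-runtime is illustrative, and for an integration server you would target the integrationserver resource type instead:

    oc annotate integrationruntime my-runtime appconnect.ibm.com/webhook-barcheck=NONE --overwrite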

Resolving MountVolume.SetUp failed for volume "contentservertls" errors for integration server or integration runtime deployments

If you are deploying an integration server or integration runtime, the deployment will fail to complete if the BAR file is hosted on a content server that is incompatible with the Kind value in the custom resource (CR) that you are trying to deploy.

A typical error that you might see is as follows:

MountVolume.SetUp failed for volume "contentservertls" : secret "dashboardName-dash.namespaceName" not found.

This error generally occurs in either of these circumstances:
  • You are deploying an integration server and manually specify a spec.barURL value for a BAR file in the content server of an App Connect Dashboard instance with an IntegrationRuntimes display mode.
  • You are deploying an integration runtime and manually specify a spec.barURL value for a BAR file in the content server of a Dashboard instance with an IntegrationServers display mode.

BAR files that you upload to a Dashboard's content server can be used only to deploy integrations that match the Dashboard's display mode value. Note also that the BAR URL format differs slightly depending on the Dashboard's display mode.

  • Format of a BAR URL in the content server of a Dashboard with a display mode of IntegrationServers:
    https://dashboardName-dash:3443/v1/directories/barFileStem?uniqueID
    Example:
    https://db-fd-nokeyclk-acelic-is-dash:3443/v1/directories/Customer_API?123456ef-f2eb-4680-9e2e-6a3de15f04e8
  • Format of a BAR URL in the content server of a Dashboard with a display mode of IntegrationRuntimes:
    https://dashboardName-dash.namespaceName:3443/v1/directories/barFileStem?uniqueID
    Example:
    https://db-fd-nokeyclk-acelic-ir-dash.ace-test:3443/v1/directories/Customer_API?8abcdef7-bf93-4a95-aac6-e1f40709fca8
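
If you see this error, check the spec.barURL value in your custom resource against the appropriate format above. For example, for an integration server the value might look like the following sketch, which reuses the IntegrationServers-format example URL:

    spec:
      barURL: https://db-fd-nokeyclk-acelic-is-dash:3443/v1/directories/Customer_API?123456ef-f2eb-4680-9e2e-6a3de15f04e8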