Debugging and troubleshooting

Collect cluster information and debugging logs to troubleshoot issues with Standard Edition.

Attention:

Self-Hosted Standard Edition 1.10.3 and earlier versions:

For online (non–air‑gapped) installations, most stanctl lifecycle commands, such as stanctl up, fail to run.
For air‑gapped installations, the stanctl commands continue to work.

Required action: Upgrade stanctl to 1.10.4 or later versions before you perform a lifecycle operation.

Example: On online deployments that run stanctl 1.10.3 or earlier, a workflow that stops services before a backup (for example, by running stanctl down) cannot complete because the subsequent stanctl up command fails. Upgrade stanctl to 1.10.4 or later before you start such a workflow.

Collect information

Create an archive file with information about your cluster. You can use the information in the file to troubleshoot issues, or share the file with the support team.

The archive file collects the following information:

Container logs
Resource manifests (in YAML format)
stanctl logs
System information that includes memory, CPU, and CPU usage
Disk mounts and their usage
Open files (allocated, free, and maximum)
Backend logs

Use the following command to create the archive file:

stanctl debug

After you run the command, the following messages are displayed. The archive file is ready when Done! appears in the output.

./stanctl debug
⠼ Streaming container logs  [26s] ✓
⠸ Gathering resource manifests  [27s] ✓
⠋ Gathering stanctl config files  [0s] ✓
⠋ Gathering system information  [0s] ✓
⠹ Creating tar file  [0s] ✓

----------------------
Done!
Debug package -> debug_20231027111737
Compressed debug package -> debug_20231027111737.tar.gz
----------------------
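
Before you share the archive, it can help to check what it contains. The following is a minimal sketch that assumes a POSIX shell and a standard tar. The function name list_debug_archive is illustrative and is not part of stanctl.

```shell
# list_debug_archive ARCHIVE
# Print the paths stored in a compressed debug package without extracting it.
# (Hypothetical helper; not a stanctl command.)
list_debug_archive() {
  archive=$1
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # -t lists entries, -z reads gzip data, -f names the archive file.
  tar -tzf "$archive"
}
```

For example, list_debug_archive debug_20231027111737.tar.gz | head shows the first entries of the package from the sample output.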

Adjust log level for Instana components

To adjust the log level for Instana components, complete the following steps:

Edit the Core Config file, for example, $HOME/.stanctl/values/instana-core/custom-values.yaml.

Configure a component’s log level in the Core or Unit CR. In the following example, the log level is changed to DEBUG for the butler component:

componentConfigs:
  - name: butler
    env:
      - name: COMPONENT_LOGLEVEL
        # Possible values are DEBUG, INFO, WARN, ERROR (not case-sensitive)
        value: DEBUG

Apply the custom values by running the following command:
```
stanctl backend apply
```
View the logs by running the following command:
```
kubectl logs <component name> -n instana-core
```
<component name> is the component name that you want to troubleshoot.

Secure Sockets Layer (SSL) certificates

Understand supported configurations, limitations, and troubleshooting steps related to SSL certificates.

Wildcard SSL certificates

You can use wildcard SSL certificates with Instana Standard Edition. A few wildcard configurations are unsupported, and specific deployment patterns require more consideration.

Example Scenario

Assume the following DNS structure:

Wildcard certificate: *.company.com
Instana backend: instana.company.com
Agent acceptor endpoint: agent-acceptor.instana.company.com
Instana UI: unit-tenant.instana.company.com

Limitations of single-level wildcards

You cannot use a single-level wildcard certificate, such as *.company.com, in this scenario.

Reason: By design, an asterisk (*) can replace only one label in a DNS name. It cannot span multiple subdomain levels.

Certificate matching rules

Certificate *.company.com matches:

www.company.com
api.company.com
mail.company.com

Certificate *.company.com does not match:

a.b.company.com
dev.api.company.com
company.com

Note: Each dot (.) separates DNS labels. The wildcard replaces exactly one label only.
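
The matching rule can be sketched as a small shell function. This is an illustration under stated assumptions, not the validation logic of any TLS library: it handles only a leading "*." wildcard, compares names case-sensitively, and uses the hypothetical function name wildcard_matches. The loop checks the hostnames from the example scenario.

```shell
# wildcard_matches PATTERN HOSTNAME
# Return 0 if HOSTNAME is covered by PATTERN, where a leading "*." stands
# for exactly one DNS label; return 1 otherwise. (Hypothetical helper.)
wildcard_matches() {
  pattern=$1
  host=$2
  case $pattern in
    '*.'*)
      suffix=${pattern#\*}          # for example, ".company.com"
      case $host in
        *"$suffix")
          label=${host%"$suffix"}   # the part that the asterisk must cover
          # The covered part must be one non-empty label, so it has no dots.
          case $label in
            '' | *.*) return 1 ;;
            *) return 0 ;;
          esac
          ;;
        *) return 1 ;;
      esac
      ;;
    *)
      # No wildcard: require an exact match.
      [ "$pattern" = "$host" ]
      ;;
  esac
}

for host in instana.company.com agent-acceptor.instana.company.com unit-tenant.instana.company.com; do
  if wildcard_matches '*.company.com' "$host"; then
    echo "$host: covered"
  else
    echo "$host: not covered"
  fi
done
```

With *.company.com, only instana.company.com is reported as covered. The agent acceptor and UI hostnames contain an extra label, which is why the scenario fails.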

To support the Instana deployment, use one of the following options:

Create a wildcard certificate for the full Instana base domain:
```
*.instana.company.com
```

Use a SAN certificate that lists all required hostnames.

DNS: api.example.com
DNS: dev.api.example.com
DNS: prod.api.example.com

Combine multiple wildcard entries within a SAN certificate:
```
DNS: *.example.com
DNS: *.api.example.com
```

Restore the self-signed certificate

You can restore a previously removed self-signed TLS certificate.

Delete the existing TLS secret:

kubectl delete secret instana-tls -n instana-core

Generate a new self-signed certificate:
```
stanctl be apply --core-tls-generate-cert
```
Result:
- A new self-signed SSL certificate is generated and applied.
- TLS encryption is enabled for Instana endpoints.
- Modern browsers (such as Chrome or Firefox) display security warnings because the certificate is not issued by a trusted certificate authority.
Important: Although browsers mark the connection as untrusted, all communication remains encrypted.

Summary

Single-level wildcard certificates (for example, *.company.com) do not support multi-level subdomains.
Instana Standard Edition commonly requires SAN certificates or base-domain wildcards.
You can safely regenerate self-signed certificates when needed, with expected browser warnings.

Troubleshoot

Resolve these issues.

Instana agent is not displayed in the UI

After you delete the Instana agent that was configured for remote monitoring and install the Instana agent for self monitoring, the agent might not be displayed on the Instana UI.

The agent might be trying to connect to the remote Instana backend instead of the local Instana backend.

To resolve this issue, install the agent and specify the backend endpoint host and an agent key:

stanctl agent apply --agent-cluster-name <cluster-name> --agent-endpoint-host acceptor.instana-core --agent-endpoint-port 8600 --agent-zone-name <zone-name> --agent-key <agent-key-of-local-backend>

Instana backend becomes non‑functional when the Elasticsearch data disk exceeds 85% usage

Elasticsearch automatically switches its data store to read‑only mode when the disk that it uses exceeds 85% usage. As a result, the Instana backend stops functioning.

To resolve this issue, take one of the following actions:

Free up space on the Elasticsearch data disk
Increase the disk size allocated to Elasticsearch

When sufficient space is available, Elasticsearch resumes normal write operations, and the backend becomes functional again.

Note: Other Instana disks do not enter read‑only mode at similar usage levels (even above 95%), which might lead to confusion during troubleshooting.
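
To see how close a disk is to the 85% mark, you can compare its usage against the threshold. The following is a minimal sketch that assumes a POSIX df and awk. ES_DATA_PATH is a placeholder (it defaults to / here only so that the sketch runs), and the function names are illustrative. Set ES_DATA_PATH to the mount point that holds the Elasticsearch data in your installation.

```shell
# disk_usage_percent PATH
# Print the used percentage (integer, without the % sign) of the
# filesystem that holds PATH.
disk_usage_percent() {
  # df -P prints one POSIX-format line per filesystem; field 5 is the capacity.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# exceeds_threshold USED THRESHOLD
# Return 0 if USED is strictly greater than THRESHOLD.
exceeds_threshold() {
  [ "$1" -gt "$2" ]
}

# ES_DATA_PATH is an assumption: replace it with the real Elasticsearch data mount.
ES_DATA_PATH=${ES_DATA_PATH:-/}
used=$(disk_usage_percent "$ES_DATA_PATH")
if exceeds_threshold "$used" 85; then
  echo "WARNING: $ES_DATA_PATH is at ${used}% (above 85%)"
else
  echo "OK: $ES_DATA_PATH is at ${used}%"
fi
```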

Instana backend upgrade fails due to corrupt Helm chart installation

The Instana backend upgrade fails after you run the stanctl backend apply command. You might see the following error:

Error: another operation (install/upgrade/rollback) is in progress

In the console.log file, you might see information similar to the following entries:

ts=2025-05-26T12:26:09Z level=INFO msg="upgrading Helm chart" name=instana-core release=instana-core version=1.8.1 namespace=instana-core
ts=2025-05-26T12:26:09Z level=DEBUG msg="preparing upgrade for instana-core"

This issue indicates a corrupt Helm chart installation of the current core chart. To reset the installation, complete the following steps:

Delete the old Helm chart secret from the instana-core namespace.
```
kubectl delete secret -n instana-core -l owner=helm
```
Upgrade the backend.

stanctl up

Host agent cannot connect to the Instana backend on SLES hosts

After you install the host agent on the local host on SUSE Linux Enterprise Server (SLES) 15 SP5 hosts for self monitoring, the agent does not automatically connect to the Instana backend.

You must use the agent external URL to connect to the backend as a remote host.

Use the following command:

stanctl agent apply --agent-endpoint-host agent-acceptor.<base_domain> --agent-endpoint-port 8443

Kafka pods show CrashLoopBackOff status

Kafka pods do not restart after a shutdown of the Instana backend host. You might see a CrashLoopBackOff status of the Kafka pods.

To resolve the issue, restart the Instana backend.

Shut down the backend.
```
stanctl down
```
Start the backend.
```
stanctl up
```

After the backend is restarted, check the status of Kafka pods.

kubectl get pods --all-namespaces | grep kafka

The Kafka pod status should show as Running.
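
To list only the Kafka pods that are not yet Running, the output of the command above can be filtered. This is a sketch that assumes the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The function name not_running_kafka_pods is illustrative.

```shell
# not_running_kafka_pods
# Read `kubectl get pods --all-namespaces` output on stdin and print the
# namespace, name, and status of Kafka pods whose STATUS is not Running.
# (Hypothetical helper; it assumes the default kubectl column order.)
not_running_kafka_pods() {
  awk '$1 != "NAMESPACE" && $2 ~ /kafka/ && $4 != "Running" { print $1, $2, $4 }'
}
```

Run it as kubectl get pods --all-namespaces | not_running_kafka_pods. Empty output means that every Kafka pod reports the Running status.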

Scheduled Synthetic tests are not running after Instana backup and restore

After Instana backend and agent data are restored, the scheduled Synthetic tests are not running.

To resolve this issue, restart the synthetic-pop-controller pod on the cluster where it is installed.

Standard Edition installation on RHEL 9.3 fails

Red Hat® Enterprise Linux® 9.3 uses iptables 1.8.8.

If you are installing Standard Edition on RHEL 9.3, the installation might fail due to iptables 1.8.8.

To work around the issue, upgrade your host to RHEL 9.4, which also upgrades iptables to version 1.8.10.
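
To check which iptables version a host runs, you can compare it with 1.8.10. The following is a sketch that assumes iptables --version prints a line that starts with "iptables v" followed by the version. The function name version_lt is illustrative. The sketch only reports whether the installed version is older than 1.8.10; it does not determine whether a particular intermediate version is affected.

```shell
# version_lt A B
# Return 0 if dotted version A is lower than B. Assumes that both versions
# have the same number of numeric components (up to three), such as
# 1.8.8 and 1.8.10. (Hypothetical helper.)
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
}

# Skip the check when iptables is not installed.
if command -v iptables >/dev/null 2>&1; then
  # Extract "1.8.8" from a line such as "iptables v1.8.8 (nf_tables)".
  current=$(iptables --version | sed -n 's/^iptables v\([0-9.]*\).*/\1/p')
  if [ -n "$current" ] && version_lt "$current" 1.8.10; then
    echo "iptables $current is older than 1.8.10"
  else
    echo "iptables version: $current"
  fi
fi
```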

Upgrade fails on Standard Edition 1.9.x

When you upgrade Standard Edition 1.9.x to a later version, you might encounter the following error:

Error: installation failed for prerequisite app coredns: Unable to continue with install: ConfigMap "coredns" in namespace "kube-system" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "coredns"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "kube-system"

To resolve this issue, run the stanctl up command again.

Upgrade fails with "Insufficient CPU" or "Insufficient memory" errors

You might experience any of the following issues during upgrades on single-node K3s deployments with limited resources, or in environments where adding temporary capacity is not feasible:

Errors such as "Insufficient CPU" or "Insufficient memory"
Pods that remain in a Pending state

The default RollingUpdate strategy requires temporary additional capacity to run both old and new pods simultaneously during the upgrade. On systems with limited resources, this requirement can exceed the available CPU or memory, even when overall utilization appears low.

To resolve this issue, use the Recreate update strategy, which does not require additional capacity during upgrades:

stanctl up --core-update-strategy=Recreate

For more information about update strategies, see Configuring update strategy for upgrades.

Note:

The Recreate strategy causes brief downtime (several minutes) during the upgrade process. This tradeoff is typically acceptable for nonproduction environments or systems where adding hardware capacity is not feasible.

Systemd does not set a default working directory

When stanctl is started by a systemd service, systemd does not set a working directory on its own. If you do not provide a working directory, systemd runs the service from /. As a result, stanctl can create files, such as cluster data, .stanctl, or Kubernetes configs, in the wrong place (often / or /root/), even if the service uses a non‑root user.

To mitigate the issue, add a WorkingDirectory= line to the systemd service so that files are created in the home directory of the intended user. For example, WorkingDirectory=/home/instana.
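
To confirm that a unit file sets the directive, you can search for it. This is a minimal sketch. The function name has_working_directory is illustrative, the unit file path depends on how you created the service, and the check reads only the file that you pass to it, so a value that is set in a drop-in file is not detected.

```shell
# has_working_directory UNIT_FILE
# Return 0 if the unit file sets a non-empty WorkingDirectory= value.
# (Hypothetical helper; it inspects only the given file.)
has_working_directory() {
  grep -Eq '^[[:space:]]*WorkingDirectory=[^[:space:]]+' "$1"
}
```

For example, has_working_directory /etc/systemd/system/<service-name>.service && echo "set", where <service-name> is the name of your own service.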

Unable to update the license

When you run the stanctl license update command, the command might fail with the following error message:

...
no dependency found: 'instana-core'
...

Run the following commands to update the license:

stanctl license download --sales-key=<your-key>
stanctl backend apply

Instana backend upgrade fails due to node disk pressure

An Instana backend installation or upgrade might fail when the node experiences disk pressure.

Symptoms

You might observe one or more of the following symptoms:

The backend installation or upgrade fails.
Pods remain pending.
Some workloads show ContainerStatusUnknown.

Cause

During installation, upgrade, or air‑gapped package import, disk usage increases temporarily as container images and artifacts are processed. If the node runs out of disk space, Kubernetes sets a DiskPressure condition and prevents new pods from starting.

Verification

Run the following command to check the node condition:
```
kubectl describe node <node-name>
```
In the Conditions section, check the DiskPressure condition.
Run the following command to check disk usage:
```
df -h
```
Verify whether disk usage is close to or at capacity.

Solution

Complete one of the following actions to resolve the issue:

Remove unused container images or unnecessary files to free disk space.
Increase the storage capacity of the node.

Recovery

If workloads remain in the ContainerStatusUnknown state after you recover disk space, reboot the node. After the reboot completes, retry the installation or upgrade.

License is invalid or missing

If the license is invalid or missing, the backend prevents agents from connecting.

When this occurs

The imported license is invalid.
The Instana Operator cannot apply the license to the Groundskeeper backend.

How to troubleshoot

Verify that the Sales Key in the core secret matches the license strings in the unit secret. If they differ, re-download the license using the correct Sales Key.

Check the Instana Operator logs for license import errors:

kubectl logs -n instana-operator deployment/instana-operator --tail=100

Check the Groundskeeper backend component, pod status, and logs:
```
kubectl get pods -n instana-core | grep groundskeeper
```
If the license still shows an invalid state, contact IBM Support.

Installation fails with “Fatal glibc error: CPU does not support x86‑64‑v2”

Symptom

The installation fails early when you run stanctl install, with an error similar to:

Fatal glibc error: CPU does not support x86-64-v2

This failure typically occurs before all components are installed and can prevent services, such as Cassandra and Kafka, from starting.

Cause

This error indicates that the operating system cannot detect the x86‑64‑v2 instruction set. The most common cause is a virtual machine CPU configuration that does not expose the required CPU flags, often due to legacy compatibility settings.

Resolution

In VMware and vSphere environments, this issue is usually caused by selecting an outdated virtual CPU architecture or enabling CPU compatibility modes that restrict instruction sets. To resolve the issue:

Avoid legacy CPU compatibility profiles when you configure the virtual machine.
Make sure that CPU masking is not enabled.
If Enhanced vMotion Compatibility (EVC) is enabled, verify that it is set to a level that supports x86‑64‑v2 instructions.
Power off the virtual machine and update the CPU configuration.
If necessary, recreate the virtual machine with updated CPU settings.

After you apply the changes, confirm that the required CPU flags are visible in the guest operating system before you rerun the installation.
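
One way to confirm the flags on a Linux guest is to read /proc/cpuinfo. The following sketch assumes that the x86‑64‑v2 level corresponds to the cx16, lahf_lm, popcnt, pni, sse4_1, sse4_2, and ssse3 flag names that the Linux kernel reports. The function name missing_v2_flags is illustrative.

```shell
# missing_v2_flags "FLAGS"
# Print each x86-64-v2 flag (as named in /proc/cpuinfo) that is absent from
# the space-separated FLAGS string. Prints nothing when all are present.
# (Hypothetical helper; the flag list is an assumption stated above.)
missing_v2_flags() {
  for flag in cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3; do
    case " $1 " in
      *" $flag "*) ;;          # flag is present as a whole word
      *) echo "$flag" ;;       # flag is missing
    esac
  done
}

# Use the flags of the first CPU that the kernel lists.
if [ -r /proc/cpuinfo ]; then
  flags=$(sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1)
  missing=$(missing_v2_flags "$flags")
  if [ -z "$missing" ]; then
    echo "x86-64-v2 flags are visible"
  else
    echo "missing x86-64-v2 flags:" $missing
  fi
fi
```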

Contact support

If you are unable to resolve the issue, contact IBM Support. Provide the archive file that you created to the support team.