Troubleshooting and known issues

Troubleshoot your Content Runtime infrastructure and read the known issues.

Failed to create the Content Runtime

If the Content Runtime cannot be created, the console log displays an error message similar to the following. Correct the cause of the error, and then configure the Content Runtime again.

null_resource.singlenode: Still creating... (3m40s elapsed)
null_resource.singlenode: Still creating... (3m50s elapsed)
null_resource.singlenode: Still creating... (4m0s elapsed)
null_resource.singlenode: Still creating... (4m10s elapsed)
null_resource.singlenode: Still creating... (4m20s elapsed)
null_resource.singlenode: Still creating... (4m30s elapsed)
null_resource.singlenode: Still creating... (4m40s elapsed)
null_resource.singlenode: Still creating... (4m50s elapsed)
null_resource.singlenode: Still creating... (5m0s elapsed)
Error applying plan:

1 error(s) occurred:

* null_resource.singlenode: 1 error(s) occurred:

* ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain

Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed. Please address the error above and apply again to incrementally change your infrastructure.
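The ssh handshake error means that the configuration process could not log in to the virtual machine with the credentials it was given. Before you apply the plan again, you can check outside Terraform that key-based login works. The following is a minimal sketch that assumes key-based authentication; the function name check_ssh_key_auth is hypothetical.

```bash
# Hypothetical helper: confirm that key-based SSH login works without prompts.
# Usage: check_ssh_key_auth <user> <host> <private-key-file>
check_ssh_key_auth() {
  local user="$1" host="$2" key="$3"
  # BatchMode=yes turns off password and passphrase prompts, so ssh fails
  # at once instead of waiting for input if the key is not accepted.
  # IdentitiesOnly=yes makes ssh offer only the key given with -i.
  # -n reads stdin from /dev/null so the check never consumes caller input.
  if ssh -n -o BatchMode=yes -o IdentitiesOnly=yes -o ConnectTimeout=10 \
       -i "$key" "$user@$host" true; then
    echo "Key-based login to $user@$host succeeded"
  else
    echo "Key-based login to $user@$host failed" >&2
    return 1
  fi
}
```

For example, run check_ssh_key_auth <user> <content-runtime> <private-key-file>. Because BatchMode also suppresses the host key confirmation prompt, connect once interactively first if the host is not yet in your known_hosts file.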

If you cannot determine the cause of the failure from the logs, log in to the Content Runtime virtual machine and run the configuration process manually with the following command, which is located in the home directory of the OS user:

advanced-content-runtime/launch-docker-compose.sh

To capture additional debug information from the ./advanced-content-runtime/launch-docker-compose.sh script, edit the ./advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file, add the line --debug to the end of the file, and then rerun the ./advanced-content-runtime/launch-docker-compose.sh command. If incorrect parameter values caused the failure, you can also correct them in the ./advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file before you rerun the command.
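The edit described above can be scripted. This sketch keeps a backup of the parameter file, appends --debug as a new last line exactly as described, and does nothing if that line is already present. The function name enable_launch_debug and the .bak backup suffix are assumptions made for this example.

```bash
# Hypothetical helper: append --debug to the hidden launch parameter file.
# Usage: enable_launch_debug [parameter-file]
enable_launch_debug() {
  local cfg="${1:-$HOME/advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh}"
  [ -f "$cfg" ] || { echo "Parameter file not found: $cfg" >&2; return 1; }
  # Do nothing if --debug is already the last line.
  if [ "$(tail -n 1 "$cfg")" = "--debug" ]; then
    return 0
  fi
  # Keep a copy of the original parameters so they can be restored later.
  cp -p "$cfg" "$cfg.bak" || return 1
  # Command substitution drops a trailing newline, so a non-empty result
  # means the file does not end with one; add it before the new line.
  if [ -n "$(tail -c 1 "$cfg")" ]; then
    printf '\n' >> "$cfg"
  fi
  printf '%s\n' '--debug' >> "$cfg"
}
```

After running it, rerun ~/advanced-content-runtime/launch-docker-compose.sh. To remove the debug option afterward, restore the .bak copy.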

If you successfully create the Content Runtime from the command line, complete one of the following steps to add it to the Content Runtime creation panel:

  1. Go back to the Content Runtime creation panel and recreate the Content Runtime from the beginning.
  2. Leave the virtual machine intact, go back to the Content Runtime panel, and create a Content Runtime by using the Other template. In the create panel, enter all of the parameters exactly as they appear in the ~/advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file so that the existing virtual machine is reused. Managed services then stores the Content Runtime information so that it can be used later in pattern deployment.

Common issues when deploying a Content Runtime

The following input issues can cause a Content Runtime deployment to fail. Each one can be identified by its error message in the deployment log file; correct the input that the message describes and deploy again.

All cloud providers

The following errors can occur independently of the selected cloud provider.

ssh: unable to authenticate, attempted methods [none password], no supported methods remain
[ERROR] Platform <distribution> not supported
[ERROR] This OS version (<version>) is not supported
[ERROR] This script requires root permissions with the NOPASSWD option enabled for executing
[ERROR] This script requires <package manager> permissions for executing
[ERROR] This script requires at least 5GB of available disk space
[ERROR] The provided encoded private key contains a password. Pattern Manager requires the use of a passwordless private key.
[ERROR] The provided SSH public and private keys for Pattern Manager do not match, please provide a matching pair of keys.
[ERROR] The server's hostname can not contain uppercase letters
[ERROR] Docker CE for Red Hat Enterprise is not supported, please provide a valid Docker EE repository URL
[ERROR] There was an error validating the certificate validation when obtaining Docker Compose from the provided URL.
[ERROR] There is an error with the default permissions (umask) for new users and folders. The recommended value is 022, found <mask>
[ERROR] Failed while installing <package>
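Several of these errors can be checked on the target virtual machine before you deploy. The following functions are illustrative sketches, not the checks that the Content Runtime scripts run; they cover the host name, umask, passwordless sudo, and Pattern Manager key pair conditions named in the messages above. The function names are hypothetical, and the key checks assume that ssh-keygen is installed.

```bash
# Hypothetical pre-flight checks; each returns non-zero and prints a reason
# on standard error when its condition is not met.

# The server host name must not contain uppercase letters.
check_hostname() {
  local name="${1:-$(hostname)}"
  case "$name" in
    *[[:upper:]]*)
      echo "Host name '$name' contains uppercase letters" >&2
      return 1 ;;
  esac
}

# The default permissions mask for new files and folders should be 022.
check_umask() {
  local mask
  mask=$(printf '%s' "${1:-$(umask)}" | sed 's/^0*//')
  [ "$mask" = "22" ] || { echo "umask is ${1:-$(umask)}, expected 022" >&2; return 1; }
}

# The OS user needs root permissions through sudo without a password prompt.
check_passwordless_sudo() {
  sudo -n true 2>/dev/null || { echo "sudo requires a password (NOPASSWD not set)" >&2; return 1; }
}

# The Pattern Manager private key must not be protected by a passphrase:
# reading it with an empty passphrase (-P '') fails for an encrypted key.
check_key_has_no_passphrase() {
  ssh-keygen -y -P '' -f "$1" >/dev/null 2>&1 </dev/null ||
    { echo "Cannot read $1 without a passphrase" >&2; return 1; }
}

# The public key derived from the private key must match the provided one
# (key type and base64 body; the trailing comment is ignored).
check_key_pair_matches() {
  local derived provided
  derived=$(ssh-keygen -y -P '' -f "$1" 2>/dev/null </dev/null | awk '{print $1, $2}')
  provided=$(awk '{print $1, $2}' "$2")
  [ -n "$derived" ] && [ "$derived" = "$provided" ] ||
    { echo "$1 and $2 are not a matching key pair" >&2; return 1; }
}
```

For example, check_key_pair_matches <private-key-file> <public-key-file> compares the public key derived from the private key with the one you plan to provide.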

VMware vSphere

Troubleshooting Docker EE installation

RHEL

The installation of Docker EE requires access to the container-selinux package, which is located by default in the rhel-7-server-extras-rpms repository defined in /etc/yum.repos.d/redhat.repo. If the provided virtual machine image contains customized RHEL repositories that do not give access to the rhel-7-server-extras-rpms repository, the following error is displayed when Docker EE is installed:

vsphere_virtual_machine.singlenode (remote-exec): Loaded plugins: product-id, search-
vsphere_virtual_machine.singlenode (remote-exec):               : disabled-repos,
vsphere_virtual_machine.singlenode (remote-exec):               : subscription-manager
vsphere_virtual_machine.singlenode (remote-exec): Error loading certificate
vsphere_virtual_machine.singlenode (remote-exec): Error getting repository data for rhel-7-server-extras-rpms, repository not found
vsphere_virtual_machine.singlenode (remote-exec): [ERROR] There was an error installing Docker EE from the provided repository
vsphere_virtual_machine.singlenode (remote-exec): [ERROR] Repo: https://storebits.docker.com/ee/rhel/sub-b5

Error applying plan:

1 error(s) occurred:

* vsphere_virtual_machine.singlenode: 1 error(s) occurred:

* Script exited with non-zero exit status: 1

Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed. Please address the error above and apply again to incrementally change your infrastructure.
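To confirm whether the extras repository is available on the image before you retry, you can inspect the output of yum repolist all. This sketch assumes the usual repolist layout, where the first column is the repository ID (optionally prefixed with ! or * and suffixed with an architecture such as /x86_64) and the status column reads enabled or disabled. The function name extras_repo_status is hypothetical.

```bash
# Hypothetical helper: report the status of the RHEL 7 extras repository.
# Reads "yum repolist all" output on standard input and prints
# "enabled", "disabled", or "missing".
extras_repo_status() {
  awk '
    {
      id = $1
      sub(/^!?\*?/, "", id)   # drop the ! or * markers that yum may print
      sub(/\/.*/, "", id)     # drop an architecture suffix such as /x86_64
    }
    id == "rhel-7-server-extras-rpms" {
      found = 1
      status = ($0 ~ /enabled/) ? "enabled" : "disabled"
    }
    END { print (found ? status : "missing") }
  '
}
```

For example, run yum repolist all | extras_repo_status as root. On a system registered with Red Hat Subscription Management, a disabled repository can be enabled with subscription-manager repos --enable=rhel-7-server-extras-rpms; if the repository is missing, the customized repositories in the image must provide the container-selinux package some other way.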

Remote installation verification

To verify the installation, run the following command:

ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh

The output helps you diagnose the issue.

Known issues

Health check

If the Content Runtime has a problem with an unknown cause, the commands in the following sections can help you isolate it.

A script on the Content Runtime virtual machine displays the current status of the running services.

To run the script, enter the following command:

ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh

where <user> is the user that was used to create the Content Runtime. The script path is relative to that user's home directory; adjust the path if the script is in a different location.

<content-runtime> is the IP address or host name of the Content Runtime virtual machine. The output is similar to the following:

[SUCCESS] Chef server nodes are running correctly
[SUCCESS] Docker is installed
[SUCCESS] Docker is currently running
[SUCCESS] The Pattern Manager image is running correctly
[SUCCESS] The Software Repository image is running correctly
[SUCCESS] Connection to the Software Repo image was established correctly
[SUCCESS] Connection to the Pattern Manager image was established correctly
Connection to 192.168.122.89 443 port [tcp/https] succeeded!
[SUCCESS] Connection from Pattern Manager to host has been established successfully
[SUCCESS] /opt/ibm/docker/software-repo/var/swRepo/private is setup as expected
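When the output is long, you can filter it to show only the checks that did not pass. This sketch assumes that passing checks start with [SUCCESS] and treats any other line that starts with a bracketed tag, such as [ERROR], as needing attention; informational lines such as the port connection message are ignored. The function name report_failed_checks is hypothetical.

```bash
# Hypothetical helper: list verify-installation.sh checks that did not pass.
# Reads the script output on standard input; exits 1 if any bracketed line
# other than [SUCCESS] is found, 0 otherwise.
report_failed_checks() {
  awk '
    /^\[SUCCESS\]/ { ok++; next }   # count passing checks
    /^\[/          { bad++; print } # show every other tagged line
    END {
      printf "%d passed, %d need attention\n", ok, bad
      exit (bad > 0)
    }
  '
}
```

For example: ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh | report_failed_checks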

Connection to the Content Runtime virtual machine

You can connect to the virtual machine that runs the Content Runtime by using ssh with the private key that corresponds to the User's Public Key parameter.
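OpenSSH ignores a private key file that group or other users can access and prints an UNPROTECTED PRIVATE KEY FILE warning, which often happens when a key is copied from another system. This sketch tightens the permissions to owner read and write (mode 600); the function name secure_private_key is hypothetical.

```bash
# Hypothetical helper: make a private key usable by ssh.
# Usage: secure_private_key <private-key-file>
secure_private_key() {
  local key="$1"
  [ -f "$key" ] || { echo "Private key not found: $key" >&2; return 1; }
  # Remove all group and other permissions; keep read/write for the owner.
  chmod 600 "$key"
}
```

For example: secure_private_key <private-key-file> && ssh -i <private-key-file> <user>@<content-runtime>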

Docker

Ensure that the Docker service is running:

sudo systemctl status docker

or

sudo service docker status

To start the service, use the start option with either command, for example sudo systemctl start docker or sudo service docker start.

If you installed Docker, make sure that the service is enabled to start on reboot. To check whether the service is enabled, run:

sudo systemctl is-enabled docker

On platforms that manage services with the service command, Docker should be configured to start by default.
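On systemd-based platforms, these checks can be combined into one step that starts Docker if it is stopped and enables it at boot if it is not. This is a minimal sketch that assumes systemctl and sudo are available; the function name ensure_docker_service is hypothetical.

```bash
# Hypothetical helper for systemd-based platforms: start Docker if it is
# stopped and enable it so that it starts again after a reboot.
ensure_docker_service() {
  if ! systemctl is-active --quiet docker; then
    echo "Docker is not running; starting it"
    sudo systemctl start docker || return 1
  fi
  if ! systemctl is-enabled --quiet docker; then
    echo "Docker is not enabled at boot; enabling it"
    sudo systemctl enable docker || return 1
  fi
  echo "Docker is running and enabled"
}
```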

Docker containers

Ensure that the Docker containers are running:

sudo docker ps

The output should list two running containers, for example:

CONTAINER ID    IMAGE                                 COMMAND                  CREATED        STATUS       PORTS                                             NAMES
93e4607e889a    ibmcom/camc-sw-repo:latest            "/bin/bash /tmp/in..."   3 days ago     Up 3 days    0.0.0.0:8888->8888/tcp, 0.0.0.0:9999->9999/tcp    camc-sw-repo
24866ee2426d    ibmcom/camc-pattern-manager:latest    "bin/bash /opt/ibm..."   3 days ago     Up 3 days    0.0.0.0:5443->443/tcp                             camc-pattern-manager

If the containers are not running, run the advanced-content-runtime/launch-docker-compose.sh command from your home directory (~) to restart them.

Stopping and starting the containers is controlled by the docker-compose command.
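To check for the containers by name instead of reading the table, you can compare the running container names with the two names shown in the example output, camc-sw-repo and camc-pattern-manager. This sketch assumes those names; the function name missing_containers is hypothetical.

```bash
# Hypothetical helper: print each expected Content Runtime container that is
# not running. Reads running container names, one per line, on standard input.
missing_containers() {
  local names expected
  names=$(cat)
  for expected in camc-sw-repo camc-pattern-manager; do
    # -x requires the whole line to match, so partial names do not count.
    printf '%s\n' "$names" | grep -qx "$expected" || echo "$expected"
  done
}
```

For example, sudo docker ps --format '{{.Names}}' | missing_containers prints each expected container that is not running. If it prints anything, run advanced-content-runtime/launch-docker-compose.sh from your home directory.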

Chef server status

Check the status of the Chef server by using the /usr/bin/chef-server-ctl status command.

If any processes are reported as down, take the appropriate action to recover them, for example, restart them with the chef-server-ctl restart command.
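chef-server-ctl status reports each service on its own line in the runit style, beginning with run: for a running service and down: for a stopped one. Assuming that format, this sketch lists the services that are down so that you can restart them individually. The function name down_chef_services is hypothetical.

```bash
# Hypothetical helper: print the names of Chef server services reported as down.
# Reads "chef-server-ctl status" output on standard input; exits 1 if any
# service is down, 0 otherwise.
down_chef_services() {
  awk -F': ' '
    # Lines look like "down: nginx: 3s, normally up; run: log: ...".
    $1 == "down" { print $2; n++ }
    END { exit (n > 0) }
  '
}
```

For example: /usr/bin/chef-server-ctl status | down_chef_services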

Ubuntu apt-get fails

The configuration of the Content Runtime virtual machine can fail with an apt-get lock error. This occurs when a background process, such as a timed upgrade, holds the lock on the /var/lib/dpkg/lock file and does not release it within the time that the configuration process waits for it.

null_resource.singlenode (remote-exec): Checing lock file /var/lib/dpkg/lock
null_resource.singlenode: Still creating... (18m30s elapsed)
null_resource.singlenode (remote-exec): E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
null_resource.singlenode (remote-exec): E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
null_resource.singlenode (remote-exec): [ERROR] This script requires apt-get permissions for executing

Ubuntu 16.04 can be configured to run unattended upgrades. The lock held by that process can delay the Content Runtime configuration process or cause it to fail.
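Before you rerun the configuration, you can wait for the background process to release the lock. This sketch polls the lock file with fuser, which exits with status 0 while any process has the file open, and gives up after a timeout. The function name, the default 600-second timeout, and the 10-second polling interval are assumptions for this example.

```bash
# Hypothetical helper: wait until no process holds the dpkg lock file.
# Run as root: fuser (from the psmisc package) can only see other users'
# processes with sufficient privileges.
# Usage: wait_for_dpkg_lock [lock-file] [timeout-seconds] [interval-seconds]
wait_for_dpkg_lock() {
  local lock="${1:-/var/lib/dpkg/lock}" timeout="${2:-600}" interval="${3:-10}"
  local waited=0
  command -v fuser >/dev/null 2>&1 || { echo "fuser is not installed" >&2; return 2; }
  # fuser exits 0 while at least one process has the file open.
  while fuser "$lock" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "$lock is still locked after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$lock is free"
}
```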

Disk allocation

The configuration code uses the largest available unformatted disk for the Software Repository. If no such disk is available, the primary partition is used, on the assumption that it has enough space to hold the Software Repository.

If a second disk is available, it is used for Docker. Using a second disk is recommended for production environments.

The configuration code does not support a separate disk for Docker unless the Software Repository also has its own disk.
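The disk selection rule described above can be approximated from lsblk output to see which disk would be picked. The following is a hypothetical re-creation, not the configuration code itself; it treats a disk as unformatted when it has no file system signature and no child devices such as partitions, and it relies on the PKNAME column, which older lsblk versions may not provide.

```bash
# Hypothetical re-creation of the selection rule: choose the largest whole
# disk that has no file system and no partitions or other child devices.
# Reads "lsblk -b -n -P -o NAME,SIZE,TYPE,FSTYPE,PKNAME" output on stdin
# and prints the device path, or nothing if no disk qualifies.
largest_unformatted_disk() {
  awk '
    # Return the value of KEY="value" from a line of lsblk -P output.
    # A leading space is added so that NAME does not match inside PKNAME.
    function field(line, key,    s) {
      if (match(" " line, " " key "=\"[^\"]*\"")) {
        s = substr(" " line, RSTART + length(key) + 3, RLENGTH - length(key) - 4)
        return s
      }
      return ""
    }
    {
      name = field($0, "NAME"); parent = field($0, "PKNAME")
      if (parent != "") has_children[parent] = 1
      if (field($0, "TYPE") == "disk" && field($0, "FSTYPE") == "")
        size[name] = field($0, "SIZE") + 0
    }
    END {
      best = ""
      for (d in size)
        if (!(d in has_children) && (best == "" || size[d] > size[best]))
          best = d
      if (best != "") print "/dev/" best
    }
  '
}
```

For example: lsblk -b -n -P -o NAME,SIZE,TYPE,FSTYPE,PKNAME | largest_unformatted_disk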

Disk space

If the Software Repository does not have enough space, allocate new space and move the Software Repository content to it by using the procedure for your operating system. Then run the advanced-content-runtime/verify-installation.sh script to check that the base file structure is correct.
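To see how much space remains before and after moving the repository, you can query the file system that holds it. This sketch runs df -Pk on the Software Repository path and compares the result with a minimum; the function name and the 5 GB default, borrowed from the installation requirement in the error list above, are assumptions.

```bash
# Hypothetical helper: report free space for the Software Repository and
# fail if it is below a threshold.
# Usage: check_repo_space [path] [minimum-GB]
check_repo_space() {
  local path="${1:-/opt/ibm/docker/software-repo/var/swRepo}" min_gb="${2:-5}"
  local avail_kb avail_gb
  # df -P prints each file system on one line in the POSIX layout;
  # with -k, field 4 is the available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$path" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || { echo "Cannot read free space for $path" >&2; return 1; }
  avail_gb=$((avail_kb / 1024 / 1024))
  echo "$path: ${avail_gb} GB available"
  [ "$avail_gb" -ge "$min_gb" ]
}
```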

File system

A file directory structure is created for the Software Repository on the /opt/ibm/docker/software-repo/var/swRepo/ file system.

If you delete or rename any of these directories, restart the Docker container so that it picks up the changes by running the following command:

advanced-content-runtime/launch-docker-compose.sh