Troubleshooting and known issues

Troubleshoot your Content Runtime infrastructure and read the known issues.

Failed to create the Content Runtime

When creation fails, the console log displays an error message that identifies the cause, as in the following example. Correct the error and configure the Content Runtime again.

null_resource.singlenode: Still creating... (3m40s elapsed)
null_resource.singlenode: Still creating... (3m50s elapsed)
null_resource.singlenode: Still creating... (4m0s elapsed)
null_resource.singlenode: Still creating... (4m10s elapsed)
null_resource.singlenode: Still creating... (4m20s elapsed)
null_resource.singlenode: Still creating... (4m30s elapsed)
null_resource.singlenode: Still creating... (4m40s elapsed)
null_resource.singlenode: Still creating... (4m50s elapsed)
null_resource.singlenode: Still creating... (5m0s elapsed)
Error applying plan:

1 error(s) occurred:

* null_resource.singlenode: 1 error(s) occurred:

* ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain

Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed. Please address the error above and apply again to incrementally change your infrastructure.

If you cannot determine the cause of the failure from the logs, you can log in to the Content Runtime virtual machine and run the configuration process manually. Use the following command, which is located in the home directory of the OS user:

advanced-content-runtime/launch-docker-compose.sh

To capture additional debug information from the ./advanced-content-runtime/launch-docker-compose.sh script, edit the ./advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file and add the line --debug to the end of the file. Then rerun the ./advanced-content-runtime/launch-docker-compose.sh command. If incorrect parameter values caused the failure, you can correct them in the same ./advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file.
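The edit described above can be scripted. The following is a minimal sketch, assuming that the file is plain text and that --debug belongs on its own line at the end, as the text states. The function name enable_debug is hypothetical and is not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: append a "--debug" line to the given file unless
# the file already contains that exact line.
enable_debug() {
    config_file="$1"
    # -x matches whole lines, -F matches literally, and -e keeps grep
    # from reading "--debug" as one of its own options.
    if ! grep -qxF -e '--debug' "$config_file"; then
        # If the last byte is not a newline, add one first so that
        # "--debug" does not get glued to the previous line.
        if [ -n "$(tail -c 1 "$config_file")" ]; then
            printf '\n' >> "$config_file"
        fi
        printf '%s\n' '--debug' >> "$config_file"
    fi
}
```

For example, run enable_debug ~/advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh and then rerun the launch script. Running the helper twice leaves a single --debug line in the file.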

If you successfully create a Content Runtime from the command line, complete one of the following options to add it to the Content Runtime creation panel:

  1. Go back to the Content Runtime creation panel and recreate the Content Runtime from the beginning.
  2. Leave the virtual machine intact, go back to the Content Runtime panel, and select the Other template for creating a Content Runtime. In the create panel, enter all the parameters exactly as they appear in the ~/advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file so that the existing virtual machine is reused. Managed services then stores the information so that it can be used later in pattern deployment.

Common issues when deploying a Content Runtime

The following input issues cause the deployment of a Content Runtime to fail. Each entry describes how to identify the issue in the provided log file and how to resolve it.

All cloud providers

The following errors can occur independently of the selected cloud provider.

    ssh: unable to authenticate, attempted methods [none password], no supported methods remain
    
  • This error is displayed when the credentials (user name and password, or private key) provided for the SSH connection to the virtual machine are incorrect.

    [ERROR] Platform <distribution> not supported
    
  • This error is displayed when a Content Runtime is deployed to an unsupported operating system.

    [ERROR] This OS version (<version>) is not supported
    
  • This error occurs when a Content Runtime is deployed to a supported operating system but an unsupported version of it.

    [ERROR] This script requires root permissions with the NOPASSWD option enabled for executing
    
  • This error is displayed when the user account provided for connecting to the Content Runtime has no sudo permissions, or when sudo prompts that account for a password.

  • To remove the password prompt, change the /etc/sudoers file to include user_name ALL=(ALL:ALL) NOPASSWD: ALL

    [ERROR] This script requires <package manager> permissions for executing
    
  • This error occurs when the user account provided for connecting to the Content Runtime has no permissions to use the package manager of the operating system. To solve this issue, provide a different user account with higher permissions.

    [ERROR] This script requires at least 5GB of available disk space
    
  • This error is displayed when the disk used for installing the necessary software and downloading additional packages has less than 5GB of free space. To solve this issue, use an image with a larger disk or free up more space.

    [ERROR] The provided encoded private key contains a password. Pattern Manager requires the use of a passwordless private key.
    
  • This error is displayed when the provided private key was encrypted with a password when it was generated. To solve this issue, generate another private key without providing a password.

    [ERROR] The provided SSH public and private keys for Pattern Manager do not match, please provide a matching pair of keys.
    
  • This error message indicates that the private and public keys provided for the Pattern Manager installation do not match. To fix it, create a new Content Runtime deployment and make sure that the keys were generated together. If you are unsure, generate a new pair of keys.

    [ERROR] The server's hostname can not contain uppercase letters
    
  • This error occurs when the provided host name for the virtual machine contains an uppercase letter and the setup prerequisite checker was set to strict mode. When set to lenient, the script automatically converts the current host name to lower case. This issue originates from limitations in Chef.

    [ERROR] Docker CE for Red Hat Enterprise is not supported, please provide a valid Docker EE repository URL
    
  • This error is displayed when a RHEL template image is provided without a valid Docker EE repository URL. Docker Community Edition is installed by default on all the other distributions but it does not support RHEL officially.

    [ERROR] There was an error validating the certificate validation when obtaining Docker Compose from the provided URL.
    
  • The Docker Compose installer is downloaded with the curl command. This error is shown when curl cannot validate the certificate for the download, usually because the system requires an update. To fix this issue, update the system and reboot, or skip the certificate verification in curl by adding the word insecure to ~/.curlrc. For security reasons, updating the system is the recommended solution.

    [ERROR] There is an error with the default permissions (umask) for new users and folders. The recommended value is 022, found <mask>
    
  • The OS image in use has a UMASK that limits the permissions that the required images need when reading or writing files. The default UMASK value can usually be found and changed in the /etc/login.defs file.

    [ERROR] Failed while installing <package>
    
  • During the prerequisite checking phase of the process, a failure when installing a software package displays this error. If encountered, reviewing the logs can provide more insight on the reason.

  • This issue can sometimes be caused by internet connectivity problems or by a custom repository setup that does not include the packages being installed.
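Several of the prerequisite errors above can be checked on the target virtual machine before a deployment is attempted. The following is a minimal sketch, not the checker that the product runs. All function names are hypothetical, and the key comparison assumes OpenSSH-format public keys.

```shell
#!/bin/sh
# Pre-flight sketch for some of the prerequisite errors listed above.
# All function names are hypothetical; they are not part of the product.

# Succeeds when sudo works without a password prompt (-n never prompts).
has_passwordless_sudo() {
    sudo -n true 2>/dev/null
}

# Succeeds when the file system holding path $1 has at least $2 GB available.
has_free_space_gb() {
    # df -P prints one line per file system; field 4 is available KB.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Succeeds when the host name $1 contains no uppercase letters.
is_lowercase_hostname() {
    case "$1" in
        *[[:upper:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeeds when the umask value $1 equals 022 (leading zeros are ignored).
is_recommended_umask() {
    [ "$(printf '%s' "$1" | sed 's/^0*//')" = "22" ]
}

# Succeeds when two OpenSSH public key strings have the same key type
# and key data; the trailing comment field is ignored.
same_public_key() {
    a=$(printf '%s\n' "$1" | awk '{ print $1, $2 }')
    b=$(printf '%s\n' "$2" | awk '{ print $1, $2 }')
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Possible usage, with key file names chosen only for illustration: has_free_space_gb / 5, is_lowercase_hostname "$(hostname)", is_recommended_umask "$(umask)", and same_public_key "$(ssh-keygen -y -f private_key)" "$(cat public_key.pub)". The ssh-keygen -y option derives the public key from a private key file, so a mismatch in the last check corresponds to the key pair error above.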

VMware vSphere

  • The following error is displayed when a virtual machine with the same name already exists in the provided VMware vSphere cloud. By default, this name is pre-populated to ibm-content-runtime. To solve this issue, provide a new unique name for the Content Runtime instance.

    Error applying plan:
    1 error(s) occurred:
    * vsphere_virtual_machine.singlenode: 1 error(s) occurred:
    * vsphere_virtual_machine.singlenode: The attempted operation cannot be performed in the current state (Powered on).
    
  • The following error varies according to the setup of the vSphere server in use. Some of the fields in the Optional Cloud Provider Settings might be required to achieve a successful deployment.

    Error applying plan:
    1 error(s) occurred:
    * vsphere_virtual_machine.singlenode: 1 error(s) occurred:
    * vsphere_virtual_machine.singlenode: default resource pool resolves to multiple instances, please specify
    

Troubleshooting Docker EE installation

RHEL

The installation of Docker EE requires access to the container-selinux package, located by default in the rhel-7-server-extras-rpms repository in /etc/yum.repos.d/redhat.repo. If the provided Virtual Machine image contains customized RHEL repositories and there is no access to the mentioned extras-rpms, the following error is displayed when attempting to install Docker EE.

vsphere_virtual_machine.singlenode (remote-exec): Loaded plugins: product-id, search-
vsphere_virtual_machine.singlenode (remote-exec):               : disabled-repos,
vsphere_virtual_machine.singlenode (remote-exec):               : subscription-manager
vsphere_virtual_machine.singlenode (remote-exec): Error loading certificate

vsphere_virtual_machine.singlenode (remote-exec): Error getting repository data for rhel-7-server-extras-rpms, repository not found
vsphere_virtual_machine.singlenode (remote-exec): [ERROR] There was an error installing Docker EE from the provided repository
vsphere_virtual_machine.singlenode (remote-exec): [ERROR] Repo: https://storebits.docker.com/ee/rhel/sub-b5

Error applying plan:

1 error(s) occurred:

* vsphere_virtual_machine.singlenode: 1 error(s) occurred:

* Script exited with non-zero exit status: 1

Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed. Please address the error above and apply again to incrementally change your infrastructure. 

Remote installation verification

A verification of the installation can be done by executing the following command:

ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh

The information returned to the screen should help to diagnose the issue.

Known issues

  • CentOS and RHEL

    The default firewall on CentOS and RHEL must be modified to allow the docker0 network connection to communicate back to the Docker host. Add the docker0 connection to the firewall configuration and enable port 443 for communication.

    firewall-cmd --permanent --zone=public --change-interface=docker0
    firewall-cmd --permanent --zone=public --add-port=443/tcp
    firewall-cmd --zone=public --list-all
    firewall-cmd --reload
    systemctl status docker
    systemctl restart docker
    docker exec -it camc-pattern-manager /bin/bash -c "nc -v -z <hostip> 443"
    docker exec -it camc-pattern-manager /bin/bash -c "ping google.com"
    

    Disabling the firewall causes an issue with the default docker DNS failing to resolve any hosts.

  • localhost.localdomain host name

    localhost is not supported as a host name for the Content Runtime virtual machine. You can use nmtui to set the host name on RHEL and CentOS.

  • Mixed case host name

    The host name of the Content Runtime virtual machine is restricted to lower case. With the Other template, the host name is converted to lower case as part of the Content Runtime configuration process.

  • curl command fails on download of chef-server with SSL certificate error

    curl: (60) SSL certificate problem: certificate is not yet valid
    More details here: http://curl.haxx.se/docs/sslcerts.html
    

    The time on the Content Runtime virtual machine is too far out of sync with the current time. Set the time of the virtual machine to the current time.

  • Failed connection to external sites

    The Content Runtime configuration process relies on external sites to download install content for the virtual machine. If a connection to the internet is not available or if one of the sites is down, the Content Runtime installation and configuration process fails. These sites include:

Health check

If the Content Runtime experiences an unknown issue, the commands in the following sections can help to isolate the problem.

A script on the Content Runtime virtual machine displays the current status of the running services. Run the script with the following command:

ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh

where user is the user that was used to create the Content Runtime, and content-runtime is the IP address or host name of the Content Runtime virtual machine. Adjust the path to the script if it is located in a different user's home directory.

Output for a healthy Content Runtime resembles the following example:

[SUCCESS] Chef server nodes are running correctly
[SUCCESS] Docker is installed
[SUCCESS] Docker is currently running
[SUCCESS] The Pattern Manager image is running correctly
[SUCCESS] The Software Repository image is running correctly
[SUCCESS] Connection to the Software Repo image was established correctly
[SUCCESS] Connection to the Pattern Manager image was established correctly
Connection to 192.168.122.89 443 port [tcp/https] succeeded!
[SUCCESS] Connection from Pattern Manager to host has been established successfully
[SUCCESS] /opt/ibm/docker/software-repo/var/swRepo/private is setup as expected
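When the script is run from automation, its output can be filtered for failed checks. The following is a minimal sketch that assumes failed checks are reported on lines that begin with [ERROR], in the same way that passing checks begin with [SUCCESS]. That line format for failures is an assumption, and the function name report_failures is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read verification output on standard input, print
# every line tagged [ERROR], and return 1 if at least one was found.
# Assumption: failed checks are reported on lines that start with [ERROR].
report_failures() {
    awk '/^\[ERROR\]/ { print; found = 1 } END { exit (found ? 1 : 0) }'
}
```

Possible usage: ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh | report_failures. Untagged lines, such as the connection message in the example output, are ignored.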

Connection to the Content Runtime virtual machine

You can connect to the virtual machine that runs the Content Runtime by using ssh and the private key associated with the User's Public Key parameter.

Docker

Ensure that the Docker service is running:

sudo systemctl status docker

or

sudo service docker status

To start the service, use the start option in place of status in the previous commands.

If you have installed Docker, you must enable the service to start on reboot. To check whether the service is enabled, run:

sudo systemctl is-enabled docker

On platforms that use the service command, the service should start by default.

Docker containers

Ensure that the Docker containers are running:

sudo docker ps

The command output should display two active containers, for example:

CONTAINER ID    IMAGE                                 COMMAND                  CREATED        STATUS       PORTS                                             NAMES
93e4607e889a    ibmcom/camc-sw-repo:latest            "/bin/bash /tmp/in..."   3 days ago     Up 3 days    0.0.0.0:8888->8888/tcp, 0.0.0.0:9999->9999/tcp    camc-sw-repo
24866ee2426d    ibmcom/camc-pattern-manager:latest    "bin/bash /opt/ibm..."   3 days ago     Up 3 days    0.0.0.0:5443->443/tcp                             camc-pattern-manager

If the containers are not running, run the advanced-content-runtime/launch-docker-compose.sh command from the home directory (~) to restart the containers.

Note that the docker-compose command controls stopping and starting the container.
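The container check can also be done by name instead of by reading the table. The following is a minimal sketch that uses the two container names from the example output above. The function name missing_containers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read running container names on standard input
# (one per line) and print each expected container that is missing.
missing_containers() {
    running=$(cat)
    for name in camc-sw-repo camc-pattern-manager; do
        # -x matches whole lines and -F matches the name literally.
        printf '%s\n' "$running" | grep -qxF -e "$name" || printf '%s\n' "$name"
    done
}
```

Possible usage: sudo docker ps --format '{{.Names}}' | missing_containers. Empty output means that both containers are running. Any name that is printed is a container to restart with the launch script.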

Chef server status

Check the status of the Chef server by using the /usr/bin/chef-server-ctl status command.

If there are failed processes, take the appropriate action to recover.

Ubuntu apt-get fails

The configuration of the Content Runtime virtual machine can fail with an apt-get lock error. The error occurs when a background process holds the lock on the /var/lib/dpkg/lock file and the timed upgrade does not complete within the allowed time.

null_resource.singlenode (remote-exec): Checking lock file /var/lib/dpkg/lock
null_resource.singlenode: Still creating... (18m30s elapsed)
null_resource.singlenode (remote-exec): E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
null_resource.singlenode (remote-exec): E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
null_resource.singlenode (remote-exec): [ERROR] This script requires apt-get permissions for executing

Ubuntu 16.04 may be configured to run unattended updates. The lock which is held by that process might cause delays to occur with the Content Runtime configuration process.
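Before the configuration is retried, it can help to wait until the background process releases the lock. The following is a minimal sketch of a polling loop. The function names are hypothetical, and the check relies on the fuser command (from the psmisc package on Ubuntu), which must run as root to see processes that belong to other users.

```shell
#!/bin/sh
# Hypothetical helper: run a check command every $2 seconds until it
# succeeds or about $1 seconds have passed.
# Returns 0 on success and 1 on timeout.
wait_until() {
    timeout_s="$1"; interval_s="$2"; shift 2
    waited=0
    until "$@"; do
        [ "$waited" -ge "$timeout_s" ] && return 1
        sleep "$interval_s"
        waited=$(( waited + interval_s ))
    done
    return 0
}

# Succeeds when fuser reports no process holding the dpkg lock file open.
# Note: this also succeeds if fuser is not installed.
dpkg_lock_is_free() {
    ! fuser /var/lib/dpkg/lock >/dev/null 2>&1
}
```

Possible usage as root: wait_until 600 10 dpkg_lock_is_free, which polls every 10 seconds for up to about 10 minutes. The timeout values are illustrative only.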

Disk allocation

The logic within the configuration code takes the largest available non-formatted disk for the Software Repository. If no disk is available, the primary partition is used, with the assumption that it has enough space to hold the Software Repository.

If a second disk is available, that disk is used as the Docker disk. Using a second disk is recommended for production environments.

The configuration code does not support using a separate disk for Docker without also using a separate disk for the Software Repository.
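To preview which disk the selection rule for the Software Repository would favor, the candidate disks can be listed and ranked. The following is a simplified sketch of "largest non-formatted disk", not the product's actual logic. It treats a disk as non-formatted when lsblk reports no file system type for it, and it does not look at partitions on the disk. The function name largest_unformatted_disk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME SIZE [FSTYPE]" lines on standard input
# and print the name of the largest disk that has no file system type.
# Lines with a third field (a file system type) are skipped.
largest_unformatted_disk() {
    awk 'NF == 2 && $2 + 0 > max { max = $2 + 0; name = $1 }
         END { if (name != "") print name }'
}
```

Possible usage: lsblk -dn -b -o NAME,SIZE,FSTYPE | largest_unformatted_disk. Here -d lists whole disks only, -n omits the header, and -b prints sizes in bytes. No output means that no candidate disk was found, which corresponds to the case where the primary partition is used.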

Disk space

If there is not enough space for the Software Repository, you can allocate new space, and move the content of the Software Repository to that space. Complete this action according to the operating system you selected. When completed, you can run the advanced-content-runtime/verify-installation.sh script to determine if the base file structure is correct.

File system

A directory structure is created for the Software Repository under the /opt/ibm/docker/software-repo/var/swRepo/ directory.

If you delete or rename any of the directories, restart the Docker container to pick up the changes by running the following command:

advanced-content-runtime/launch-docker-compose.sh