Troubleshooting and known issues
Troubleshoot your Content Runtime infrastructure and read the known issues.
Failed to create the Content Runtime
The console log displays an error message similar to the following example, which identifies the cause of the failure. Correct the error and configure the Content Runtime again.
null_resource.singlenode: Still creating... (3m40s elapsed)
null_resource.singlenode: Still creating... (3m50s elapsed)
null_resource.singlenode: Still creating... (4m0s elapsed)
null_resource.singlenode: Still creating... (4m10s elapsed)
null_resource.singlenode: Still creating... (4m20s elapsed)
null_resource.singlenode: Still creating... (4m30s elapsed)
null_resource.singlenode: Still creating... (4m40s elapsed)
null_resource.singlenode: Still creating... (4m50s elapsed)
null_resource.singlenode: Still creating... (5m0s elapsed)
Error applying plan:
1 error(s) occurred:
* null_resource.singlenode: 1 error(s) occurred:
* ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain
Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed. Please address the error above and apply again to incrementally change your infrastructure.
If you cannot determine the cause of the failure from the logs, log in to the Content Runtime virtual machine and run the configuration process manually with the following command, which is located in the home directory of the OS user:
advanced-content-runtime/launch-docker-compose.sh
Additional debug information can be captured from the ./advanced-content-runtime/launch-docker-compose.sh script by editing the ./advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file and adding the line --debug to the end of the file. Rerun the ./advanced-content-runtime/launch-docker-compose.sh command to capture the additional information. Parameters may be updated in the ./advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file if the parameter values were the cause of the failure.
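The edit described above can be sketched as a small shell function. This is only an illustration: the function name enable_debug is hypothetical, and it assumes that a line containing only --debug at the end of the file is what the launch script expects, as the text states.

```shell
# Hypothetical helper: add "--debug" as the last line of the given file,
# unless a line consisting solely of "--debug" is already present.
enable_debug() {
  config_file="$1"
  # Nothing to do when the flag was added earlier.
  if grep -qx -- '--debug' "$config_file"; then
    return 0
  fi
  # Make sure the current last line is terminated before appending.
  if [ -n "$(tail -c 1 "$config_file")" ]; then
    printf '\n' >> "$config_file"
  fi
  printf '%s\n' '--debug' >> "$config_file"
}

# Example (run from the home directory on the Content Runtime virtual machine):
#   enable_debug ./advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh
#   ./advanced-content-runtime/launch-docker-compose.sh
```

Because the function checks for an existing --debug line first, calling it more than once leaves the file unchanged.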
If you successfully create a Content Runtime from the command line, complete one of the following actions to add it to the Content Runtime creation panel:
- Go back to the Content Runtime creation panel and recreate the Content Runtime from the beginning.
- Leave the virtual machine intact, go back to the Content Runtime panel, and select the Other template when you create the Content Runtime. You must enter all the parameters in the create panel as they appear in the ~/advanced-content-runtime/.advanced-runtime-config/.launch-docker-compose.sh file so that the existing virtual machine is reused. Managed services then stores its information so that it can be used later in pattern deployment.
Common issues when deploying a Content Runtime
The following input issues can cause the deployment of a Content Runtime to fail. Each entry describes how to identify the issue in the provided Log File and the steps to solve it.
All cloud providers
The following errors can occur independently of the selected cloud provider.
ssh: unable to authenticate, attempted methods [none password], no supported methods remain
- This error is displayed when the credentials (user name and password or private key) provided for the SSH connection to the virtual machine were incorrect.
[ERROR] Platform <distribution> not supported
- This error is displayed when a Content Runtime is deployed to an unsupported operating system. For more information, see Supported Openshift versions and platforms.
[ERROR] This OS version (<version>) is not supported
- This error is displayed when a Content Runtime is deployed to a supported operating system but an unsupported version of it. For more information, see Supported Openshift versions and platforms.
[ERROR] This script requires root permissions with the NOPASSWD option enabled for executing
- This error is displayed when the user account provided for connecting to the Content Runtime has no sudo permissions or is prompted for a password when executing commands with sudo.
- To remove the password prompt, the /etc/sudoers file must be changed to include the following line:
user_name ALL=(ALL:ALL) NOPASSWD: ALL
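As an illustration of the check and the fix, the following sketch tests whether the current user can use sudo without a password and prints the sudoers entry quoted above. The function names are hypothetical, and the sketch makes no claim about how the Content Runtime script itself performs its check.

```shell
# Print the sudoers entry from this section for the given user name.
sudoers_entry() {
  printf '%s ALL=(ALL:ALL) NOPASSWD: ALL\n' "$1"
}

# Return 0 when the current user can run sudo without a password prompt.
# "sudo -n" fails instead of prompting when a password would be required.
# Note: credentials cached by a recent sudo call can also make this succeed.
has_passwordless_sudo() {
  sudo -n true 2>/dev/null
}

# Example:
#   has_passwordless_sudo || sudoers_entry "$(id -un)"
# Add the printed line with "sudo visudo", which checks the syntax of
# /etc/sudoers before saving.
```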
[ERROR] This script requires <package manager> permissions for executing
- This error occurs when the user account provided for connecting to the Content Runtime has no permissions to use the OS's package manager. A different user account with higher permissions can be provided to solve it.
[ERROR] This script requires at least 5GB of available disk space
- This error is displayed when the hard drive that is used for installing the necessary software and downloading additional packages does not have at least 5GB of free space. Using an image with a larger hard drive or freeing up space should solve this issue.
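The available space can be checked before the deployment is retried. The following is a minimal sketch that assumes 5GB means 5 x 1024 x 1024 KB and that the root file system is the one being checked; neither detail is confirmed by this document, and the function names are hypothetical.

```shell
# Assumed threshold: 5 GB expressed in 1 KB blocks.
REQUIRED_KB=$((5 * 1024 * 1024))

# Print the available space, in 1 KB blocks, of the file system holding a path.
# "df -Pk" uses the portable POSIX output format, where column 4 is "Available".
available_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return 0 when the given number of free KB meets the assumed 5 GB requirement.
meets_requirement() {
  [ "$1" -ge "$REQUIRED_KB" ]
}

# Example:
#   meets_requirement "$(available_kb /)" || echo "Less than 5GB available on /"
```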
[ERROR] The provided encoded private key contains a password. Pattern Manager requires the use of a passwordless private key.
- This error is displayed when the provided private key was password-encrypted when generated. In order to solve this issue, generate another private key without providing a password.
[ERROR] The provided SSH public and private keys for Pattern Manager do not match, please provide a matching pair of keys.
- This error message indicates that the private and public keys provided for the Pattern Manager installation do not match. To fix it, create a new Content Runtime deployment and make sure that the keys were generated together or, if unsure, generate a new pair of keys.
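Both key problems described above can be checked locally with ssh-keygen before the keys are entered. The following sketch assumes OpenSSH is installed and that the key files have the restrictive permissions that ssh-keygen sets by default; the function names and the example file name are hypothetical.

```shell
# Return 0 when the private key file can be read with an empty passphrase.
# "ssh-keygen -y" derives the public key; -P "" supplies an empty passphrase
# so that an encrypted key fails instead of prompting.
is_passwordless_key() {
  ssh-keygen -y -P "" -f "$1" >/dev/null 2>&1
}

# Return 0 when the public key derived from the private key file ($1) has the
# same type and key data as the public key file ($2). The comment field is ignored.
keys_match() {
  derived=$(ssh-keygen -y -P "" -f "$1" 2>/dev/null | awk '{ print $1, $2 }')
  given=$(awk '{ print $1, $2 }' "$2")
  [ -n "$derived" ] && [ "$derived" = "$given" ]
}

# Example: generate a new passwordless pair, then verify it.
#   ssh-keygen -t rsa -N "" -f ./content_runtime_key
#   is_passwordless_key ./content_runtime_key && keys_match ./content_runtime_key ./content_runtime_key.pub
```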
[ERROR] The server's hostname can not contain uppercase letters
- This error occurs when the provided host name for the virtual machine contains an uppercase letter and the setup prerequisite checker was set to strict mode. When set to lenient, the script automatically converts the current host name to lower case. This issue originates from limitations in Chef.
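The host name can be checked and converted by hand before the deployment is retried in strict mode. This is a rough sketch of that conversion, not the code that the prerequisite checker uses; the function names are hypothetical, and the hostnamectl example assumes a systemd-based distribution.

```shell
# Print the host name converted to lower case.
to_lowercase() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Return 0 when the given host name contains at least one uppercase letter,
# that is, when it differs from its lower-case form.
has_uppercase() {
  [ "$1" != "$(to_lowercase "$1")" ]
}

# Example:
#   name=$(hostname)
#   has_uppercase "$name" && sudo hostnamectl set-hostname "$(to_lowercase "$name")"
```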
[ERROR] Docker CE for Red Hat Enterprise is not supported, please provide a valid Docker EE repository URL
- This error is displayed when a RHEL template image is provided without a valid Docker EE repository URL. Docker Community Edition is installed by default on all the other distributions but it does not support RHEL officially.
[ERROR] There was an error validating the certificate validation when obtaining Docker Compose from the provided URL.
- The curl command is used to download the Docker Compose installer. This error is displayed when the curl command was not able to validate the certificate for downloading the file, usually because the system requires an update. To fix this issue, update the system and reboot, or skip the certificate verification in curl by adding the word insecure to ~/.curlrc. For security reasons, updating the system is the recommended solution.
[ERROR] There is an error with the default permissions (umask) for new users and folders. The recommended value is 022, found <mask>
- The OS image being used contains a UMASK that limits the required images' permissions when reading or writing files. The default UMASK value can usually be found and changed in the /etc/login.defs file.
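To see which value is in effect, compare the mask of the current shell with the configured default. The following sketch assumes that the default is set by a UMASK entry in a login.defs-style file; the function name is hypothetical.

```shell
# Print the UMASK value configured in a login.defs-style file, if any.
# Commented-out entries are skipped; the last active entry wins.
configured_umask() {
  awk '$1 == "UMASK" { value = $2 } END { if (value != "") print value }' "$1"
}

# Example:
#   umask                              # effective mask of the current shell
#   configured_umask /etc/login.defs   # configured default; 022 is recommended
```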
[ERROR] Failed while installing <package>
- During the prerequisite checking phase of the process, a failure when installing a software package displays this error. If encountered, reviewing the logs can provide more insight on the reason.
- This issue can sometimes be caused by internet connectivity problems or by a custom repository setup that does not include the packages being installed.
VMware vSphere
- The following error is displayed when a virtual machine with the same name already exists in the provided VMware vSphere cloud. By default, this name is pre-populated to ibm-content-runtime. To solve this issue, provide a new unique name for the Content Runtime instance.
Error applying plan:
1 error(s) occurred:
* vsphere_virtual_machine.singlenode: 1 error(s) occurred:
* vsphere_virtual_machine.singlenode: The attempted operation cannot be performed in the current state (Powered on).
- The following error varies according to the setup of the vSphere server in use. Some of the optional fields in the Optional Cloud Provider Settings might be required in order to achieve a successful deployment.
Error applying plan:
1 error(s) occurred:
* vsphere_virtual_machine.singlenode: 1 error(s) occurred:
* vsphere_virtual_machine.singlenode: default resource pool resolves to multiple instances, please specify
Troubleshooting Docker EE installation
RHEL
The installation of Docker EE requires access to the container-selinux package, located by default in the rhel-7-server-extras-rpms repository in /etc/yum.repos.d/redhat.repo. If the provided virtual machine image contains customized RHEL repositories and there is no access to the mentioned extras-rpms repository, the following error is displayed when attempting to install Docker EE.
vsphere_virtual_machine.singlenode (remote-exec): Loaded plugins: product-id, search-
vsphere_virtual_machine.singlenode (remote-exec): : disabled-repos,
vsphere_virtual_machine.singlenode (remote-exec): : subscription-manager
vsphere_virtual_machine.singlenode (remote-exec): Error loading certificate
vsphere_virtual_machine.singlenode (remote-exec): Error getting repository data for rhel-7-server-extras-rpms, repository not found
vsphere_virtual_machine.singlenode (remote-exec): [ERROR] There was an error installing Docker EE from the provided repository
vsphere_virtual_machine.singlenode (remote-exec): [ERROR] Repo: https://storebits.docker.com/ee/rhel/sub-b5
Error applying plan:
1 error(s) occurred:
* vsphere_virtual_machine.singlenode: 1 error(s) occurred:
* Script exited with non-zero exit status: 1
Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed. Please address the error above and apply again to incrementally change your infrastructure.
Remote installation verification
A verification of the installation can be done by executing the following command:
ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh
The information returned to the screen should help to diagnose the issue.
Known issues
- CentOS and RHEL
The default firewall on CentOS must be modified to allow the docker0 network connection to communicate back to the Docker host. The docker0 connection must be added to the firewall configuration, and port 443 must be enabled for communication.
firewall-cmd --permanent --zone=public --change-interface=docker0
firewall-cmd --permanent --zone=public --add-port=443/tcp
firewall-cmd --zone=public --list-all
firewall-cmd --reload
systemctl status docker
systemctl restart docker
docker exec -it camc-pattern-manager /bin/bash -c "nc -v -z <hostip> 443"
docker exec -it camc-pattern-manager /bin/bash -c "ping google.com"
Disabling the firewall causes the default Docker DNS to fail to resolve any hosts.
- localhost.localdomain host name
localhost is not supported as a host name for the Content Runtime virtual machine. nmtui can be used to set the host name on RHEL and CentOS.
- Mixed case host name
The host name of the Content Runtime virtual machine is restricted to lower case. In the case of the Other template, the name is converted to lower case as part of the Content Runtime configuration process.
- curl command fails on download of chef-server with SSL certificate error
curl: (60) SSL certificate problem: certificate is not yet valid
More details here: http://curl.haxx.se/docs/sslcerts.html
The time on the Content Runtime virtual machine is too far out of sync with the current time. Set the time of the virtual machine to the current time.
- Failed connection to external sites
The Content Runtime configuration process relies on external sites to download install content for the virtual machine. If a connection to the internet is not available or if one of the sites is down, the Content Runtime installation and configuration process fails. These sites include:
- Chef
- Docker Hub
- GitHub
- OS update service for the base operating system
Health check
If the Content Runtime experiences an unknown issue, the commands in the following sections might help to isolate the problem.
A script on the Content Runtime virtual machine displays the current status of the running services.
Run the script with the following command:
ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh
where user is the user that was used to create the Content Runtime, and content-runtime is the IP address or host name of the Content Runtime virtual machine. The path to the script can be adjusted based on the user to find the correct location of the script. The output is similar to the following example:
[SUCCESS] Chef server nodes are running correctly
[SUCCESS] Docker is installed
[SUCCESS] Docker is currently running
[SUCCESS] The Pattern Manager image is running correctly
[SUCCESS] The Software Repository image is running correctly
[SUCCESS] Connection to the Software Repo image was established correctly
[SUCCESS] Connection to the Pattern Manager image was established correctly
Connection to 192.168.122.89 443 port [tcp/https] succeeded!
[SUCCESS] Connection from Pattern Manager to host has been established successfully
[SUCCESS] /opt/ibm/docker/software-repo/var/swRepo/private is setup as expected
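When the output is long, it can help to filter it down to the checks that did not pass. The following sketch assumes that failed checks are reported with an [ERROR] prefix, as the other scripts in this document do; the sample output above shows only [SUCCESS] lines, so the prefix is an assumption, and the function name is hypothetical.

```shell
# Read verification output on stdin and print only the lines that start
# with "[ERROR]". Returns 0 when at least one such line is found, 1 otherwise.
failed_checks() {
  grep '^\[ERROR\]'
}

# Example:
#   ssh <user>@<content-runtime> advanced-content-runtime/verify-installation.sh | failed_checks
```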
Connection to the Content Runtime virtual machine
You can connect to the virtual machine that runs the Content Runtime by using ssh and the private key associated with the User's Public Key parameter.
Docker
Ensure that the Docker service is running:
sudo systemctl status docker
or
sudo service docker status
The service can be started by using the start option in place of status in the previous commands.
If you have installed Docker, you must enable the service to start on reboot. You can check the enablement of the service by running:
sudo systemctl is-enabled docker
On platforms that use the service command, the service should be set to start by default.
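The checks in this section can be combined into one step. The following is a minimal sketch that assumes a systemd-based platform; the function name ensure_docker_enabled is hypothetical.

```shell
# Enable the Docker service at boot and start it, but only when needed.
# "--quiet" suppresses output so that only the exit status is used.
ensure_docker_enabled() {
  if ! systemctl is-enabled --quiet docker; then
    sudo systemctl enable docker
  fi
  if ! systemctl is-active --quiet docker; then
    sudo systemctl start docker
  fi
}

# Example:
#   ensure_docker_enabled && sudo systemctl status docker
```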
Docker containers
Ensure that the Docker containers are running:
sudo docker ps
The command output should display two active containers, for example:
CONTAINER ID   IMAGE                                COMMAND                  CREATED      STATUS      PORTS                                            NAMES
93e4607e889a   ibmcom/camc-sw-repo:latest           "/bin/bash /tmp/in..."   3 days ago   Up 3 days   0.0.0.0:8888->8888/tcp, 0.0.0.0:9999->9999/tcp   camc-sw-repo
24866ee2426d   ibmcom/camc-pattern-manager:latest   "bin/bash /opt/ibm..."   3 days ago   Up 3 days   0.0.0.0:5443->443/tcp                            camc-pattern-manager
If the containers are not running, run the advanced-content-runtime/launch-docker-compose.sh command from the ~ directory to restart the containers.
Note that the docker-compose command controls stopping and starting the containers.
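The container check can also be scripted. The following sketch takes the two container names from the sample output above and reports any that are not running; the function name missing_containers is hypothetical.

```shell
# Names of the two containers shown in the sample "docker ps" output.
EXPECTED_CONTAINERS="camc-sw-repo camc-pattern-manager"

# Read running container names on stdin (one per line) and print each
# expected container that is missing. Returns 0 when none are missing.
missing_containers() {
  running=$(cat)
  missing=0
  for name in $EXPECTED_CONTAINERS; do
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      printf '%s\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   sudo docker ps --format '{{.Names}}' | missing_containers \
#     || advanced-content-runtime/launch-docker-compose.sh
```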
Chef server status
Check the status of the Chef server by using the /usr/bin/chef-server-ctl status command.
If there are failed processes, take the appropriate action to recover.
Ubuntu apt-get fails
The configuration of the Content Runtime virtual machine can fail with an apt-get lock error. The issue is that a background process holds the lock on the /var/lib/dpkg/lock file, and the timed upgrade does not complete within the allowed amount of time.
null_resource.singlenode (remote-exec): Checking lock file /var/lib/dpkg/lock
null_resource.singlenode: Still creating... (18m30s elapsed)
null_resource.singlenode (remote-exec): E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
null_resource.singlenode (remote-exec): E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
null_resource.singlenode (remote-exec): [ERROR] This script requires apt-get permissions for executing
Ubuntu 16.04 may be configured to run unattended updates. The lock which is held by that process might cause delays to occur with the Content Runtime configuration process.
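One way to avoid the failure is to wait until the background process releases the lock before the configuration is retried. The following sketch polls the lock file with fuser; it assumes that fuser (psmisc package) is installed, and the function names are hypothetical. It does not describe how the Content Runtime scripts check the lock.

```shell
# Retry a command until it succeeds or the timeout (in seconds) expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  timeout_s="$1"
  interval_s="$2"
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}

# Succeeds when no process holds the dpkg lock file open.
# Caveat: if fuser is not installed, the lock is reported as free.
dpkg_lock_is_free() {
  ! sudo fuser /var/lib/dpkg/lock >/dev/null 2>&1
}

# Example: wait up to 10 minutes, checking every 10 seconds.
#   wait_until 600 10 dpkg_lock_is_free && echo "dpkg lock released"
```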
Disk allocation
The logic within the configuration code takes the largest available non-formatted disk for the Software Repository. If no disk is available, the primary partition is used, with the assumption that it has enough space to hold the Software Repository.
If there is a second disk available, the disk is used as the Docker disk. Usage of the second disk is recommended for production environments.
The configuration code does not support using a separate disk for Docker unless a separate disk is also used for the Software Repository.
Disk space
If there is not enough space for the Software Repository, you can allocate new space, and move the content of the Software Repository to that space. Complete this action according to the operating system you selected. When completed, you can run the advanced-content-runtime/verify-installation.sh script to determine if the base file structure is correct.
File system
A directory structure is created for the Software Repository on the /opt/ibm/docker/software-repo/var/swRepo/ file system.
If you delete or rename any of the file directories, restart the docker container to pick up the changes by running the following command:
advanced-content-runtime/launch-docker-compose.sh