When using the HMC GUI, you see the message "Unable to connect to the Database Error occurred" (with scenarios)

Troubleshooting

Problem

When you use the HMC graphical user interface (GUI) to manage virtual networks or virtual storage on a managed system, you might see a message that includes "Unable to connect to the Database Error occurred."

Cause

The error reflects a message that the HMC receives from one or more Virtual I/O Server (VIOS) LPARs when it tries to access the VIOS configuration. You cannot get past this error in the HMC GUI. To fix the problem, you must take action on each VIOS LPAR that reports the issue, because the underlying cause is that the VIOS cannot parse data out of its Configuration Management Database (CMDB).

Environment

IBM Power Systems with VIOS 3.1.x and higher managed by HMC V8 or V9.

Diagnosing The Problem

Scenario #1: Hostname resolution on the VIOS

If DNS is set up (the /etc/resolv.conf file exists), make sure that /etc/resolv.conf contains the correct name server entries and domain search entries.

In the /etc/hosts file, make sure that you have both the loopback entry and the VIOS hostname entry. The VIOS hostname entry should use the format: IP FQDN alias

EXAMPLE: vios2 has IP (VIOS_IP) and the domain is dfw.ibm.com

127.0.0.1               loopback localhost      # loopback (lo0) name/address
(VIOS_IP)    (VIOS_Fully_qualified_domain_name)   (VIOS_alias)

Note: If you are using IPv4 only, you need to add the IPv4 loopback and remove or comment out the IPv6 loopback.

127.0.0.1 loopback localhost # loopback (lo0) name/address <---- IPv4 loopback
::1 loopback localhost # loopback (lo0) name/address <---- IPv6 loopback
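The two /etc/hosts checks above can be scripted. The following is a minimal POSIX shell sketch, assuming you run it from the root shell that oem_setup_env opens; the function name check_hosts_entries is hypothetical (not an IBM tool), and it only tests for the IPv4 loopback line and for an IP FQDN alias line in that order.

```shell
# Hypothetical helper: check an /etc/hosts-style file for the loopback entry
# and the VIOS entry in the order IP FQDN alias.
# Usage: check_hosts_entries /etc/hosts <VIOS_IP> <VIOS_FQDN> <VIOS_alias>
# Prints one line per problem; returns 0 only if both entries are present.
check_hosts_entries() {
    file=$1 ip=$2 fqdn=$3 short=$4
    rc=0
    # Strip comments, then require a 127.0.0.1 line naming both
    # "loopback" and "localhost".
    if ! awk '{ sub(/#.*/, "") }
              $1 == "127.0.0.1" {
                  for (i = 2; i <= NF; i++) {
                      if ($i == "loopback") lb = 1
                      if ($i == "localhost") lh = 1
                  }
              }
              END { exit !(lb && lh) }' "$file"; then
        echo "missing IPv4 loopback entry: 127.0.0.1 loopback localhost"
        rc=1
    fi
    # Require the VIOS line with the FQDN before the short alias.
    if ! awk -v ip="$ip" -v fqdn="$fqdn" -v short="$short" '
              { sub(/#.*/, "") }
              $1 == ip && $2 == fqdn && $3 == short { ok = 1 }
              END { exit !ok }' "$file"; then
        echo "missing or misordered VIOS entry, expected: $ip $fqdn $short"
        rc=1
    fi
    return "$rc"
}
```

A line such as `(VIOS_IP) vios2 vios2.dfw.ibm.com` (alias before FQDN) is reported as misordered.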

Scenario #2: Missing the name resolution ordering in netsvc.conf file

In the /etc/netsvc.conf file, make sure that the name resolution ordering is specified. If you are using IPv4, add the following line:

hosts=local,bind4
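A minimal sketch for checking this setting: the hypothetical function show_hosts_order prints the value of the first active hosts= line in a netsvc.conf-style file, ignoring comments and spaces, so that you can compare it with local,bind4.

```shell
# Hypothetical helper: print the value of the first active "hosts=" line in
# a netsvc.conf-style file, with comments and blanks removed.
# Returns 1 if the file has no active hosts= line.
# Usage: show_hosts_order /etc/netsvc.conf    (expect: local,bind4)
show_hosts_order() {
    awk '{ sub(/#.*/, ""); gsub(/[ \t]/, "") }
         !found && /^hosts=/ { val = substr($0, 7); found = 1 }
         END { if (!found) exit 1; print val }' "$1"
}
```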

Scenario #3: Updating the VIOS from 3.1.0.x to higher

You can encounter this error after updating your VIOS from 3.1.0.x to a higher version.

The /home/ios/logs/viosvc.log.err log file on the VIOS reports the following error:

Could not load module /usr/ios/db/postgres13/lib/psqlodbcw.so.
    Dependent module /usr/lib/libpq.a(libpq.so.5) could not be loaded.
    The module has an invalid magic number.
Could not load module /usr/ios/db/postgres13/lib/psqlodbcw.

Running the following commands from the oem_setup_env shell produces output similar to the following:

$ oem_setup_env

# ldd /usr/ios/db/postgres13/lib/psqlodbcw.so
/usr/ios/db/postgres13/lib/psqlodbcw.so needs:
         /usr/lib/libc.a(shr.o)
         /usr/lib/libiodbcinst.a(libiodbcinst.so.2)
         /usr/lib/libpthreads.a(shr_xpg5.o)
         /usr/ios/db/postgres13/lib/libpq.a(libpq.so.5)
         /unix
         /usr/lib/libcrypt.a(shr.o)
         /usr/lib/libdl.a(shr.o)
         /usr/lib/libpthreads.a(shr_comm.o)

# ar -tv /usr/ios/db/lib/libpq.a
rw-r----- 300/300 418709 May 22 11:33 2018 libpq.32so.5

# ar -tv /usr/lib/libpq.a
rw-r----- 300/300 418709 May 22 11:33 2018 libpq.32so.5
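The output above suggests that /usr/lib/libpq.a is an older archive (its only member is libpq.32so.5, dated 2018) that the loader picks up instead of the PostgreSQL 13 library that psqlodbcw.so expects, which would explain the invalid magic number error. A minimal sketch for spotting this signature follows; check_libpq_conflict is a hypothetical helper, and the optional second argument exists only so it can be tried against a copy.

```shell
# Hypothetical helper: report the Scenario #3 signature, that is, the loader
# errors in viosvc.log.err plus a libpq.a archive present in /usr/lib.
# Usage: check_libpq_conflict /home/ios/logs/viosvc.log.err
# Returns 0 when the signature is found, 1 otherwise.
check_libpq_conflict() {
    log=$1
    lib=${2:-/usr/lib/libpq.a}
    if grep -qF 'libpq.a(libpq.so.5) could not be loaded' "$log" 2>/dev/null &&
       grep -qF 'invalid magic number' "$log" 2>/dev/null &&
       [ -f "$lib" ]; then
        echo "Scenario #3 signature found; $lib is present"
        return 0
    fi
    return 1
}
```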

Scenario #4: Filesystem size is full on VIOS

When a filesystem on the VIOS is full, it can prevent the vio_daemon from starting and from updating the Configuration Management Database (CMDB), which can lead to database corruption. In particular, if the /home filesystem is full, the vio_daemon cannot access the CMDB to update it. Check filesystem usage by using the following command:

$ df -g
Filesystem       GB blocks   Free  %Used   Iused  %Iused  Mounted on
/dev/hd4              0.25   0.10    62%    3277     13%  /
/dev/hd2              4.44   1.63    64%   59622     14%  /usr
/dev/hd9var           0.75   0.70     8%     672      1%  /var
/dev/hd3              4.69   4.69     1%      35      1%  /tmp
/dev/hd1             10.00   9.89     2%    1556      1%  /home
/dev/hd11admin        0.12   0.12     1%       5      1%  /admin
/proc                    -      -      -       -       -  /proc
/dev/hd10opt          0.81   0.77     5%     597      1%  /opt
/dev/livedump         0.25   0.25     1%       4      1%  /var/adm/ras/livedump
/ahafs                   -      -      -      37      1%  /aha

If the /home, /var, /, or /tmp filesystem reports 100% used, clean out that filesystem to leave enough free space for the vio_daemon and other processes to work correctly.
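As a sketch, assuming the AIX df -g column layout shown above (%Used in the fourth column, mount point last), the hypothetical function check_full_fs reads df -g output and reports only the four filesystems named in this scenario. The threshold argument is an addition of this sketch so that you can warn before a filesystem actually reaches 100%.

```shell
# Hypothetical helper: read "df -g" output on stdin and print any of /, /home,
# /var, or /tmp whose %Used is at or above LIMIT (default 100).
# Returns 0 when none of them reaches the limit, 1 otherwise.
# Usage (oem_setup_env shell): df -g | check_full_fs 95
check_full_fs() {
    awk -v limit="${1:-100}" '
        NR == 1 { next }                    # skip the header line
        {
            mnt = $NF                       # "Mounted on" is the last column
            if (mnt != "/" && mnt != "/home" && mnt != "/var" && mnt != "/tmp") next
            pct = $4                        # %Used, for example "62%"
            sub(/%$/, "", pct)
            if (pct != "-" && pct + 0 >= limit + 0) { print mnt, $4; hit = 1 }
        }
        END { exit hit }'
}
```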

Scenario #5: Permissions issue in /tmp

Check the attributes of the vpgadmin user by using the following command:

$ oem_setup_env
# lsuser vpgadmin
(Output)

Open the /home/ios/logs/vdba.log file in any editor (for example, the vi editor that is part of the VIOS) and look for the following error:

could not remove old lock file "/tmp/.s.PGSQL.6080.lock": The file access permissions do not allow the specified action.

Alternatively, you can run the following grep command to filter the /home/ios/logs/vdba.log file and output the error if it exists:

$ grep -i '/tmp/.s.PGSQL.6080.lock' /home/ios/logs/vdba.log

Then, when you try to switch to the vpgadmin user by running su vpgadmin, you receive the following error:

$ oem_setup_env
# su vpgadmin
ksh: /tmp/sh29032950.13: 0403-005 Cannot create the specified file.
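The checks above do not show the /tmp permissions themselves. A minimal sketch (the function name check_tmp_mode is hypothetical) that compares the mode string from ls -ld with drwxrwxrwt, the 1777 mode expected in the resolution for this scenario:

```shell
# Hypothetical helper: compare the mode string of a directory (default /tmp)
# with drwxrwxrwt, the mode that "chmod 1777" sets.
# Returns 0 if it matches, 1 otherwise.
check_tmp_mode() {
    dir=${1:-/tmp}
    # Keep only the first 10 characters in case ls appends an ACL marker.
    mode=$(ls -ld "$dir" | awk '{ print substr($1, 1, 10) }')
    if [ "$mode" = "drwxrwxrwt" ]; then
        echo "$dir OK: $mode"
        return 0
    fi
    echo "$dir is $mode, expected drwxrwxrwt (chmod 1777 $dir)"
    return 1
}
```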

Scenario #6: Upgrading from 3.1.x to 4.1.x

After you upgrade the VIOS by running the viosupgrade command with both the -F and -g flags for system-specific files such as /etc/group, the HMC GUI reports that it cannot communicate with the VIOS database, because the CMDB is not created on the VIOS when the upgrade process completes.

Check the attributes of the vpgadmin user by using the following command:

$ oem_setup_env
# lsuser vpgadmin
(Output)

Check whether "db_users" is missing from the groups attribute.

Check the vdba.log file for the error "Cannot set process credentials":

$ oem_setup_env
# grep -i 'Cannot set process credentials' /home/ios/logs/vdba.log
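A sketch for the groups check, assuming the default lsuser output format of a user name followed by attribute=value pairs (an illustrative line would be vpgadmin groups=staff,db_users). has_db_users is a hypothetical helper that reads that output on standard input and matches db_users as a whole group name.

```shell
# Hypothetical helper: read "lsuser -a groups vpgadmin" output on stdin and
# return 0 only if db_users is one of the listed groups.
# Usage (oem_setup_env shell): lsuser -a groups vpgadmin | has_db_users
has_db_users() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^groups=/) {
                # Split the comma-separated list after "groups=".
                n = split(substr($i, 8), g, ",")
                for (j = 1; j <= n; j++) if (g[j] == "db_users") found = 1
            }
    } END { exit !found }'
}
```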

Resolving The Problem

Scenario #1: Hostname resolution on the VIOS

In the /etc/hosts file, make sure that you have both the loopback entry and the VIOS hostname entry. The VIOS hostname entry should use the format: IP FQDN alias

EXAMPLE: (VIOS_alias) has IP (VIOS_IP) and the domain is dfw.ibm.com

127.0.0.1               loopback localhost      # loopback (lo0) name/address
(VIOS_IP)    (VIOS_Fully_qualified_domain_name)   (VIOS_alias)

After editing the /etc/hosts file, verify that name resolution is correct by running the following commands:

$ oem_setup_env
# nslookup 1.2.3.4
# nslookup myhostname.mydomain
# host 1.2.3.4
# host myhostname.mydomain

EXAMPLE: (VIOS_alias) has IP (VIOS_IP) and the domain is dfw.ibm.com

-- Run a reverse name lookup (query for the IP in the DNS database)
# nslookup (VIOS_IP)
Server:         (server_IP)
Address:        (server_IP)#53

-- Run a hostname lookup by using the FQDN
# nslookup (VIOS_Fully_qualified_domain_name)
Server:         (server_IP)
Address:        (server_IP)#53
Address: (VIOS_IP)

NOTE: Both the hostname lookup and the reverse name lookup return the same information. This is what you need to see if DNS is configured (the /etc/resolv.conf file exists and is configured properly).

-- Also check local name resolution by using the host command
# host (VIOS_IP)
(VIOS_Fully_qualified_domain_name) is (VIOS_IP)
# host (VIOS_Fully_qualified_domain_name)
(VIOS_Fully_qualified_domain_name) is (VIOS_IP)

NOTE: Both queries return the same information.

Correct the /etc/hosts file and, if needed, get the DNS entries fixed until all checks return the same information when you look up the VIOS IP address and hostname.
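The forward and reverse host checks can be compared automatically. This is a sketch that assumes the AIX host output format shown in the example ("<FQDN> is <IP>", possibly followed by a comma and an Aliases list); other platforms print a different format. host_pair and check_host_pair are hypothetical helpers.

```shell
# Hypothetical helper: normalize "NAME is IP[, Aliases: ...]" to "NAME IP".
host_pair() {
    host "$1" 2>/dev/null |
        awk '$2 == "is" { addr = $3; sub(/,$/, "", addr); print $1, addr; exit }'
}

# Hypothetical helper: both lookups must return the same FQDN/IP pair.
# Usage: check_host_pair <VIOS_IP> <VIOS_FQDN>
check_host_pair() {
    ip=$1 fqdn=$2
    fwd=$(host_pair "$fqdn")
    rev=$(host_pair "$ip")
    if [ "$fwd" = "$fqdn $ip" ] && [ "$rev" = "$fqdn $ip" ]; then
        echo "OK: $fwd"
        return 0
    fi
    echo "MISMATCH: forward='$fwd' reverse='$rev' expected='$fqdn $ip'"
    return 1
}
```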

After completing your edits in the /etc/hosts file, run the following commands in order:

$ oem_setup_env
# /usr/bin/stopsrc -s vio_daemon
Wait 300 seconds or until vio_daemon has stopped.
# /usr/sbin/slibclean
# rm -rf /home/ios/CM
# /usr/bin/startsrc -s vio_daemon -a '-d 4'    # starts vio_daemon, vio_chgmgt, and the database
# ps -eaf | grep vio_chgmgt                    # note the process ID of vio_chgmgt
# kill -1 PID_of_vio_chgmgt

Then, wait 5 to 10 minutes for the CMDB to be repopulated, and then try the HMC GUI query again.

If you are running a Shared Storage Pool (SSP) environment, you must first stop the cluster before running the commands above, stopping the MFS node last:

$ clstartstop -stop -n clustername -m nodeA
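The ps -eaf | grep vio_chgmgt step also matches the grep process itself, which makes it easy to copy the wrong PID. A sketch that avoids this, assuming ps -e -o pid,args (standard POSIX ps output options) instead of ps -eaf; chgmgt_pid is a hypothetical helper, and its pattern accepts vio_chgmgt with or without a directory prefix because the exact path is not given here.

```shell
# Hypothetical helper: print the PID of vio_chgmgt from "ps -e -o pid,args"
# output on stdin. Matching on the command name (second column) instead of
# the whole line skips the grep or awk process that is searching for it.
chgmgt_pid() {
    awk 'NR > 1 && $2 ~ /(^|\/)vio_chgmgt$/ { print $1 }'
}

# Usage (oem_setup_env shell), after startsrc has restarted vio_daemon:
#   pid=$(ps -e -o pid,args | chgmgt_pid)
#   [ -n "$pid" ] && kill -1 $pid
```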

Scenario #2: Missing the name resolution ordering in netsvc.conf file

In the /etc/netsvc.conf file, make sure that the name resolution ordering is specified. If you are using IPv4, add the following line:

hosts=local,bind4
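If you prefer not to edit the file by hand, the following sketch (hypothetical function add_hosts_order) appends hosts=local,bind4 only when no hosts= line exists, saving a .bak copy first. It deliberately leaves an existing hosts= line for manual review rather than rewriting it.

```shell
# Hypothetical helper: append "hosts=local,bind4" to a netsvc.conf-style file
# (default /etc/netsvc.conf) only if it has no active hosts= line.
# A copy is saved as FILE.bak first. Returns 1 and shows the existing line
# when one is already present, so that it can be reviewed by hand.
add_hosts_order() {
    file=${1:-/etc/netsvc.conf}
    if grep -Eq '^[[:space:]]*hosts[[:space:]]*=' "$file"; then
        echo "$file already has a hosts= line; review it by hand:"
        grep -E '^[[:space:]]*hosts[[:space:]]*=' "$file"
        return 1
    fi
    cp -p "$file" "$file.bak" || return 1
    # Add a newline first if the file does not already end with one.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    echo 'hosts=local,bind4' >> "$file"
}
```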

After completing your edits in the /etc/netsvc.conf file, run the following commands in order:

$ oem_setup_env
# /usr/bin/stopsrc -s vio_daemon
Wait 300 seconds or until vio_daemon has stopped.
# /usr/sbin/slibclean
# rm -rf /home/ios/CM
# /usr/bin/startsrc -s vio_daemon -a '-d 4'    # starts vio_daemon, vio_chgmgt, and the database
# ps -eaf | grep vio_chgmgt                    # note the process ID of vio_chgmgt
# kill -1 PID_of_vio_chgmgt

Then, wait 5 to 10 minutes for the CMDB to be repopulated, and then try the HMC GUI query again.

If you are running a Shared Storage Pool (SSP) environment, you must first stop the cluster before running the commands above, stopping the MFS node last:

$ clstartstop -stop -n clustername -m nodeA
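The "Wait 300 seconds or until vio_daemon has stopped" step can be automated by polling the SRC status. A sketch, assuming lssrc -s vio_daemon reports the status inoperative once the subsystem has stopped (the standard SRC status text); the function name wait_vio_daemon_stopped and the 10-second default polling interval are arbitrary choices of this sketch.

```shell
# Hypothetical helper: poll "lssrc -s vio_daemon" until it reports
# "inoperative" or TIMEOUT seconds (default 300) have passed.
# Usage (oem_setup_env shell), after stopsrc -s vio_daemon:
#   wait_vio_daemon_stopped 300 10
wait_vio_daemon_stopped() {
    timeout=${1:-300}
    interval=${2:-10}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if lssrc -s vio_daemon 2>/dev/null | grep -q inoperative; then
            echo "vio_daemon stopped after ${waited}s"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "vio_daemon still running after ${timeout}s" >&2
    return 1
}
```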

Scenario #3: Updating the VIOS from 3.1.0.x to higher

To resolve this issue, run the following commands in order:

$ oem_setup_env
# stopsrc -s vio_daemon
# rm /usr/lib/libpq.a
# startsrc -s vio_daemon
# lssrc -a | grep -i vio_daemon                # note the PID of vio_daemon
# kill -1 PID_of_vio_daemon

Then, wait 5 to 10 minutes for the CMDB to be repopulated, and then try the HMC GUI query again.

If you are running a Shared Storage Pool (SSP) environment, you must first stop the cluster before running the commands above, stopping the MFS node last:

$ clstartstop -stop -n clustername -m nodeA
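To take the PID from the lssrc output without reading it by eye, here is a minimal sketch (hypothetical helper vio_daemon_pid). It picks the first all-digit field after the subsystem name rather than a fixed column, because the Group column might be blank.

```shell
# Hypothetical helper: print the PID of vio_daemon from lssrc output on stdin.
# Prints nothing when the subsystem is inoperative (no PID column).
# Usage (oem_setup_env shell):
#   pid=$(lssrc -s vio_daemon | vio_daemon_pid); [ -n "$pid" ] && kill -1 "$pid"
vio_daemon_pid() {
    awk '$1 == "vio_daemon" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) { print $i; exit }
    }'
}
```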

Scenario #4: Filesystem size is full on VIOS

To resolve this issue, clean out the full filesystem, following the steps for diagnosing Full File Systems in PowerVM VIOS.
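Before cleaning out a filesystem, it helps to see what is using the space. A sketch (hypothetical helper largest_entries) that lists the largest entries directly under a mount point by using du -sk and a numeric sort; it skips hidden files, and it only reports, it does not delete anything.

```shell
# Hypothetical helper: list the COUNT (default 10) largest entries directly
# under DIR (default /home), largest first, with sizes in KB.
# Usage (oem_setup_env shell): largest_entries /home 10
largest_entries() {
    dir=${1:-/home}
    count=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```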

Then run the following commands in order:

$ oem_setup_env
# /usr/bin/stopsrc -s vio_daemon
Wait 300 seconds or until vio_daemon has stopped.
# /usr/sbin/slibclean
# rm -rf /home/ios/CM
# /usr/bin/startsrc -s vio_daemon -a '-d 4'    # starts vio_daemon, vio_chgmgt, and the database
# ps -eaf | grep vio_chgmgt                    # note the process ID of vio_chgmgt
# kill -1 PID_of_vio_chgmgt

Then, wait 5 to 10 minutes for the CMDB to be repopulated, and then try the HMC GUI query again.

If you are running a Shared Storage Pool (SSP) environment, you must first stop the cluster before running the commands above, stopping the MFS node last:

$ clstartstop -stop -n clustername -m nodeA

Scenario #5: Permissions issue in /tmp

This is an issue with /tmp permissions. Make sure that /tmp has the following permissions by running this command:

$ ls -ld /tmp
drwxrwxrwt bin bin tmp

If permissions do not match, run the following command to fix this:

$ oem_setup_env
# chmod 1777 /tmp

Then run the following commands in order:

# /usr/bin/stopsrc -s vio_daemon
Wait 300 seconds or until vio_daemon has stopped.
# /usr/sbin/slibclean
# rm -rf /home/ios/CM
# /usr/bin/startsrc -s vio_daemon -a '-d 4'    # starts vio_daemon, vio_chgmgt, and the database
# ps -eaf | grep vio_chgmgt                    # note the process ID of vio_chgmgt
# kill -1 PID_of_vio_chgmgt

Then, wait 5 to 10 minutes for the CMDB to be repopulated, and then try the HMC GUI query again.

If you are running a Shared Storage Pool (SSP) environment, you must first stop the cluster before running the commands above, stopping the MFS node last:

$ clstartstop -stop -n clustername -m nodeA

Scenario #6: Upgrading from 3.1.x to 4.1.x

APAR IJ49629 (SSP OR CM DB DOES NOT START AFTER VIOSUPGRADE) explains that using the viosupgrade -F and -g flags together for system-specific files such as /etc/group is not recommended. Instead, use only the -g flag, and manually merge the copy of /etc/group from backup_files after the migration.

As a workaround, run the following commands on the affected VIOS:

$ oem_setup_env
# stopsrc -s vio_daemon
# mkgroup -A id=202 users=vpgadmin,padmin db_users
# startsrc -s vio_daemon -a "-d 4"
# lssrc -a | grep -i vio_daemon                # note the PID of vio_daemon
# kill -1 PID_of_vio_daemon

Then, wait 5 to 10 minutes for the CMDB to be repopulated, and then try the HMC GUI query again.

If you are running a Shared Storage Pool (SSP) environment, you must first stop the cluster before running the commands above, stopping the MFS node last:

$ clstartstop -stop -n clustername -m nodeA
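Before running mkgroup, you can check whether the db_users group already exists and whether GID 202 is already taken. On the VIOS, lsgroup db_users answers the first question directly; the sketch below reads an /etc/group-style file instead so that it can be tried on a copy. check_db_users_group is hypothetical, and its return codes (0 exists, 1 missing, 2 GID clash) are an arbitrary convention of this sketch.

```shell
# Hypothetical helper: inspect an /etc/group-style file (default /etc/group).
# Returns 0 if db_users exists, 1 if it is missing (mkgroup is needed),
# or 2 if another group already uses GID 202.
check_db_users_group() {
    file=${1:-/etc/group}
    awk -F: '
        $1 == "db_users" { print "db_users exists: gid=" $3 " members=" $4; found = 1 }
        $3 == "202" && $1 != "db_users" { print "GID 202 already used by " $1; clash = 1 }
        END {
            if (!found) print "db_users group not found"
            exit (clash ? 2 : (found ? 0 : 1))
        }' "$file"
}
```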

Note: You can use any editor to read or edit the files mentioned in this document. For example, you can use the vi editor that is part of the VIOS to view and change the contents of these files.

Author:

Aly Aboulgheit

Related Information

resolv.conf File Format for TCP/IP

Document Location

Worldwide
