Problem reporting for IBM PowerKVM host through IBM Electronic Service Agent for IBM PowerLinux
Detecting hardware problems proactively to achieve higher availability
Some background on Electronic Service Agent
IBM Electronic Service Agent (ESA) is a no-charge software tool available on IBM Power Systems™ that automatically and continuously monitors, collects, and submits hardware problem information to IBM Electronic Support. IBM ESA can also routinely collect and submit hardware, software, performance, and system configuration information, which can help the IBM support team in diagnosing problems. ESA runs on multiple platforms. There are versions for the IBM AIX® and IBM i operating systems, and versions embedded in the IBM Power® Hardware Management Console (HMC), IBM Systems Director, IBM Flex System Manager®, and the IBM installation toolkit for PowerLinux. ESA 3.0 and later versions for PowerLinux support problem reporting for IBM PowerKVM hosts. ESA can monitor multiple PowerKVM systems and automatically report hardware problems that occur on those hosts, which results in higher availability of PowerKVM servers. This article covers only the hardware problem reporting functionality for PowerKVM hosts.
Problem reporting for PowerKVM host
To enable automatic problem detection and reporting for PowerKVM, the following prerequisites must be met on the Power servers:
- Electronic Service Agent 3.0 or later installed and activated on any PowerLinux system (stand-alone, logical partition (LPAR), or kernel-based virtual machine (KVM) guest)
- IBM Serviceable Event Provider RPM installed on the PowerKVM host
Note: PowerKVM 2.1.1 includes Serviceable Event Provider by default. PowerKVM 2.1.0 does not, so you must download and install Serviceable Event Provider on PowerKVM 2.1.0 hosts.
- PowerKVM host discovered through ESA
Discovery of PowerKVM hosts
For discovery to succeed, the Serviceable Event Provider RPM must be installed on the KVM host. IBM Serviceable Event Provider detects hardware problems that occur on the KVM host and sends SNMP alerts to subscribed listeners.
Root credentials are needed for the initial discovery. As part of the discovery process, ESA creates a dedicated user, esaadmin, which has the privileges needed to run all commands required by ESA. ESA also generates public-private key pairs that are stored on the ESA system and on the KVM hosts, so that the ESA system can log in without a password. Hence, the root credentials of a KVM host are never stored or saved in this process.
Discovery also starts the Serviceable Event Provider and allows the system to subscribe to it.
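The password-less login described above can be illustrated with a short sketch. This is not ESA's actual implementation, which this article does not describe. It assumes that the login is SSH-based, and the key path and the `build_key_login_command` helper are hypothetical names used only for illustration. The sketch builds an `ssh` command that logs in as esaadmin with a private key and refuses to fall back to a password prompt.

```python
import shlex


def build_key_login_command(host, user="esaadmin",
                            key_path="/path/to/esa_private_key"):
    """Build an ssh command line for a key-only login check.

    Assumption: the key pair created during discovery is an SSH key pair.
    key_path is a placeholder, not a real ESA file location.
    """
    return [
        "ssh",
        "-i", key_path,             # private key held on the ESA system
        "-o", "BatchMode=yes",      # never prompt for a password
        "-o", "ConnectTimeout=10",  # give up quickly if the host is down
        f"{user}@{host}",
        "true",                     # harmless remote command
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only example address.
    print(shlex.join(build_key_login_command("192.0.2.10")))
```

Because `BatchMode=yes` disables password prompts, running such a command would succeed only if key-based authentication works, which mirrors the idea that root credentials are not needed after discovery.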
Perform the following steps for discovery:
- Open the ESA web console, click Discovery in the left navigation pane, and enter the PowerKVM host IP address and credentials (root credentials are required).
Figure 1. Opening the Discovery panel
- Verify the connection between ESA and the KVM host by clicking Verify Connectivity.
Figure 2. Verifying connectivity
- Click Add System to add the PowerKVM system to ESA.
Figure 3. Adding the system
- Click Refresh Log to get the latest status of discovery.
Figure 4. Refreshing a discovery log
- Click All Systems under the Main tab and refresh the screen.
Figure 5. Navigating to All Systems
Similarly, multiple hosts may be discovered.
- Click Refresh Log to get the latest status of discovery.
Figure 6. Discovering multiple systems
Note: ESA runs a background job daily that purges discovery log entries older than 24 hours.
- Next, click All Systems and click Refresh to check that the discovered hosts are visible.
The All Systems pane lists all the PowerKVM systems that were discovered, along with the system where ESA is installed. For each system, it displays both the system health and the ESA status.
Figure 7. All Systems pane with multiple KVM hosts
System health represents the health status of each discovered PowerKVM host.
- A tick mark indicates that ESA has detected no hardware problems for that specific system.
- A cross mark indicates that ESA has detected hardware problems for that specific system.
ESA runs a background job every 24 hours that updates the system health based on service request status.
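The system health rule can be expressed as a small sketch. The `Problem` class, its `status` values, and the `system_health` function are hypothetical names chosen for illustration and are not part of ESA. The only assumption taken from this article is the rule itself: a host shows a tick mark when it has no open problems and a cross mark otherwise.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """A reported hardware problem (illustrative model, not an ESA type)."""
    description: str
    status: str  # assumed values: "open" or "closed"


def system_health(problems):
    """Return "tick" when no problem is open, "cross" otherwise."""
    if any(p.status != "closed" for p in problems):
        return "cross"
    return "tick"


if __name__ == "__main__":
    reported = [Problem("Test problem", "open")]
    print(system_health(reported))   # one open problem
    reported[0].status = "closed"
    print(system_health(reported))   # all problems closed
```

A host with no reported problems at all also maps to a tick mark, because there is nothing open to report.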
ESA status indicates whether ESA is able to connect to the discovered system.
- A tick mark represents that ESA is able to connect to the discovered system.
- A cross mark represents that ESA is unable to connect to the discovered system.
ESA invokes a background job every 24 hours, and updates the ESA status for all discovered hosts.
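A connectivity check of this kind can be approximated with a simple TCP probe. The sketch below is an assumption-laden illustration, not how ESA determines its status: it assumes that reachability of the SSH port (22) on the host is a reasonable proxy, and `esa_status` is a hypothetical function name.

```python
import socket


def esa_status(host, port=22, timeout=5.0):
    """Return "tick" if a TCP connection to host:port succeeds, else "cross".

    Assumption: an open SSH port stands in for "ESA can connect".
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "tick"
    except OSError:
        return "cross"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only example address.
    print(esa_status("192.0.2.10", timeout=2.0))
```

A real check would also need to confirm that the esaadmin login itself works; a reachable port shows only that the host is up and listening.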
After selecting a host, users can view its system information, view its problem information, or delete the PowerKVM host. Alternatively, users can click the host name on the All Systems page to see the system information, or click the system health symbol to see any problems reported on that specific system.
The All Problems pane shows the list of all problems reported by all KVM hosts discovered by ESA.
Note: By default, the System Info, View Problems, and Delete System buttons are disabled until a host is selected.
Now, select a host and click System Info.
Figure 8. Selecting the discovered system
Figure 9. Viewing system information
Select any one of the discovered hosts and click View Problems.
Figure 10. Viewing problems
The problem details are available here (for example, the description, SRC, and service request number). You can also create a test problem for that specific host by clicking Send Test Problem.
Sending a test problem to the IBM Electronic Support team verifies both that the problem reporting function is working correctly and that connectivity to IBM support is working. After we click Send Test Problem, ESA detects the test problem and creates a service request.
Click Send Test Problem.
Figure 11. Sending a test problem
Figure 12. Listing problems
Next, click All Systems and refresh the screen. We can see that the system's health status now shows a cross mark.
Figure 13. System's health with an open problem
System health changes back to a tick mark (indicating that the system is healthy) when all the problems reported on that host are closed.
Figure 14 and Figure 15 show that change.
Figure 14. Checking the problem status
Figure 15. System health with closed or no problem
Highly available systems are expected everywhere today. To keep a system healthy, problems must be identified and resolved as quickly as they occur. IBM Electronic Service Agent identifies hardware problems at an early stage, and collects and transmits the extended error data to IBM. The IBM support team can then proactively assist customers in resolving the issues, which leads to higher availability and greater customer satisfaction.
Hardware problem detection for IBM POWER Hypervisor™ (PHYP) based Power servers is built into the Hardware Management Console (HMC) management software. However, the HMC cannot manage KVM-based Power servers. For those servers, IBM Electronic Service Agent provides the problem detection capability and is a key step toward making them more highly available.
- To learn more about ESA, visit the ESA Overview page.
- To learn more about IBM support, visit the IBM support portal.