Monitoring IBM Z HMC
The IBM Z HMC sensor is automatically deployed and installed after you install the Instana agent.
To monitor IBM Z HMC in a Kubernetes or Red Hat OpenShift cluster, do not install Instana host agents on each node of the cluster. Install host agents on dedicated host machines.
This feature is optional and is disabled by default in the Instana backend. To enable it, see the page for your Instana deployment: SaaS, Self-Hosted Custom Edition (Kubernetes or Red Hat OpenShift Container Platform), or Self-Hosted Classic Edition (Docker).
- Supported versions
- Configuration
- Metrics collection
- Manage events from CPC (Server) and HMC console
- Troubleshoot
Supported versions
The IBM Z HMC sensor is supported as a platform. Support for metrics and configuration data is confirmed for Z HMC API versions 2.x, 3.x, and 4.x.
Configuration
Required permissions
For the IBM Z HMC sensor to connect to zHMC and collect metrics, the HMC user must have the following permissions:
- Access to Web Service APIs on the HMC. To enable access, go to HMC Management - Customize API Settings.
- Permission to use the Audit and Log Management, Hardware Messages, and View Security Logs tasks. To grant permission, use the User Management task to create a user role that includes these tasks, or find an existing user role that includes them. Then, assign that user role to the user.
- Permission to use the HMC Web Services APIs. To grant access, on the User Details section of User Management for the user, select the Allow access to Web Services management interfaces field on the Customize API Settings task or the User Management task.
- Object access is required to monitor objects. The bare minimum objects required are `Defined CPC`, `LPAR Image`, and `Central Processor (CP)`. To monitor adapters, add adapter objects when you create a role on zHMC. Use the User Management task to create a user role that contains specific objects or object types to monitor, or find an existing user role with the appropriate objects. Then assign that user role to the user.
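Before you configure the sensor, it can help to confirm that the user can log on to the Web Services API at all. The following Python sketch only builds the logon request and does not send it. It assumes the `POST /api/sessions` logon operation and the API port 6794 as described in the HMC Web Services API documentation. The helper name `build_logon_request` and the sample host and credentials are hypothetical, so verify the endpoint and port against the documentation for your HMC version.

```python
import json
import urllib.request


def build_logon_request(host, user, password, port=6794):
    """Build (but do not send) an HMC Web Services API logon request.

    Assumption: logon is POST /api/sessions with a JSON body that carries
    "userid" and "password"; 6794 is assumed to be the API port.
    """
    body = json.dumps({"userid": user, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}:{port}/api/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_logon_request("192.0.2.10", "instana-monitor", "secret")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` needs a TLS context that trusts the HMC certificate. A 403 response at this step would point to the credential or permission problems that are described in the Troubleshoot section.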
Sensor configuration
To connect to the zHMC server, configure the following fields in the agent configuration file <agent_install_dir>/etc/instana/configuration.yaml.
Note: Only remote monitoring is supported. You can configure multiple HMCs by adding one entry for each HMC under remote, as follows:
com.instana.plugin.zhmc:
  remote:
    - host: '' # IP address of the HMC
      port: '' # HMC port
      user: '' # User ID on the HMC that is used to log on
      password: '' # Password for the user ID
      poll_rate: 15 # Metrics poll rate in seconds. The poll rate cannot be less than 15 seconds.
      eventsPollRate: 60 # Event poll rate in seconds (optional). Comment out this line to stop event collection.
      connectionTimeout: 50 # Timeout until a connection with the server is established. Default is 50 seconds.
      connectionRequestTimeout: 50 # Timeout for fetching a connection from the connection pool. Default is 50 seconds.
      socketTimeout: 50 # Socket read timeout. Default is 50 seconds.
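As a quick sanity check before you restart the agent, the constraints above can be expressed in a few lines of Python. This is a minimal sketch, not part of the Instana agent: the function `check_hmc_entry` is hypothetical, it treats `host`, `port`, `user`, and `password` as required (an assumption), and it works on plain dictionaries that mirror the entries under `remote` rather than parsing the YAML file.

```python
MIN_POLL_RATE_SECONDS = 15  # the metrics poll rate cannot be lower than this


def check_hmc_entry(entry):
    """Return a list of problems found in one `remote` entry (empty if none)."""
    problems = []
    # Assumption: these four fields are needed to log on to the HMC.
    for field in ("host", "port", "user", "password"):
        if not entry.get(field):
            problems.append(f"missing value for '{field}'")
    poll_rate = entry.get("poll_rate")
    if poll_rate is not None and poll_rate < MIN_POLL_RATE_SECONDS:
        problems.append(
            f"poll_rate {poll_rate} is below {MIN_POLL_RATE_SECONDS} seconds"
        )
    return problems


# Two hypothetical HMCs, mirroring two list entries under `remote`.
remote = [
    {"host": "192.0.2.10", "port": "6794", "user": "monitor",
     "password": "secret", "poll_rate": 15, "eventsPollRate": 60},
    {"host": "192.0.2.11", "port": "6794", "user": "monitor",
     "password": "secret", "poll_rate": 5},
]

for entry in remote:
    print(entry["host"], check_hmc_entry(entry) or "ok")
```

The second entry is reported because its poll rate is under the 15-second minimum. The first entry passes.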
Metrics collection
To view the metrics, select Platforms in the sidebar of the Instana UI, and then click zHMC in the list of platforms. The dashboard shows a list of IBM Z HMC servers in the zHMCs tab and a list of Central Processor Complexes (CPCs) in the Systems tab.
Currently, the sensor supports the following 11 metric groups across the Classic and DPM operational modes.
No. | Metric Group Name | Mode |
---|---|---|
1 | cpc-usage-overview | C |
2 | logical-partition-usage | C |
3 | channel-usage | C |
4 | dpm-system-usage-overview | D |
5 | partition-usage | D |
6 | zcpc-environmentals-and-power | C+D |
7 | zcpc-processor-usage | C+D |
8 | crypto-usage | C |
9 | flash-memory-usage | C |
10 | adapter-usage | D |
11 | network-physical-adapter-port | D |
C = Classic mode, D = DPM mode.
Performance metrics
CPC overview (C)
This metric group reports the aggregated processor usage and channel usage, the ambient temperature, and total system power consumption for each system. The cpc-processor-usage is the average of the percentages of processing capacity for all the physical processors in the CPC. The channel-usage is the average of the percentages of I/O capacity for all the channels and adapters in the CPC.
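As a small worked example of the averaging described above, the following Python sketch computes an aggregated usage value from per-processor percentages. The function name `average_usage` and the sample values are hypothetical. The HMC computes the real value itself, so this only illustrates the arithmetic.

```python
def average_usage(percentages):
    """Average a list of per-processor (or per-channel) usage percentages."""
    if not percentages:
        raise ValueError("at least one usage value is required")
    return sum(percentages) / len(percentages)


# Hypothetical per-processor usage percentages for a CPC with four processors.
processor_usage = [80, 60, 40, 20]
print(average_usage(processor_usage))  # 50.0
```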
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
CPC Processor Usage | The processor percent usage for Central Processor Complex processors. | 15 seconds |
Channel Usage | The channel percent usage. | 15 seconds |
Power Consumption Watts | The total system power consumption in watts. | 15 seconds |
Temperature Celsius | The ambient temperature in degrees Celsius. | 15 seconds |
CP Shared Processor Usage | The processor percent usage for shared Central Processors. | 15 seconds |
CP Dedicated Processor Usage | The processor percent usage for dedicated Central Processors. | 15 seconds |
IFL Shared Processor Usage | The processor percent usage for shared Integrated Facility for Linux processors. | 15 seconds |
IFL Dedicated Processor Usage | The processor percent usage for dedicated Integrated Facility for Linux processors. | 15 seconds |
ICF Shared Processor Usage | The processor percent usage for shared Internal Coupling Facility processors. | 15 seconds |
ICF Dedicated Processor Usage | The processor percent usage for dedicated Internal Coupling Facility processors. | 15 seconds |
IIP Shared Processor Usage | The processor percent usage for shared Integrated Information Processors. | 15 seconds |
IIP Dedicated Processor Usage | The processor percent usage for dedicated Integrated Information Processors. | 15 seconds |
AAP Shared Processor Usage | The processor percent usage for shared Application Assist Processors. | 15 seconds |
AAP Dedicated Processor Usage | The processor percent usage for dedicated Application Assist Processors. | 15 seconds |
ALL Shared Processor Usage | The processor percent usage for all the shared processors, combined together. | 15 seconds |
ALL Dedicated Processor Usage | The processor percent usage for all the dedicated processors, combined together. | 15 seconds |
CP ALL Processor Usage | The processor percent usage for all the Central Processors, combined together. | 15 seconds |
IFL ALL Processor Usage | The processor percent usage for all the Integrated Facility for Linux processors, combined together. | 15 seconds |
ICF ALL Processor Usage | The processor percent usage for all the Internal Coupling Facility processors, combined together. | 15 seconds |
IIP ALL Processor Usage | The processor percent usage for all the Integrated Information Processors, combined together. | 15 seconds |
CBP Shared Processor Usage | The processor percent usage for shared Container Based Processors. | 15 seconds |
CBP Dedicated Processor Usage | The processor percent usage for dedicated Container Based Processors. | 15 seconds |
CBP ALL Processor Usage | The processor percent usage for all the Container Based Processors. | 15 seconds |
Logical partitions (C)
This metric group reports the processor usage for each active logical partition (Image, LPAR Image, Zone, PR/SM virtual server) on the system.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Processor Usage | The processor percent usage of the Logical Partition. | 15 seconds |
CP Processor Usage | The processor percent usage for Central Processors. | 15 seconds |
IFL Processor Usage | The processor percent usage for Integrated Facility for Linux processors. | 15 seconds |
ICF Processor Usage | The processor percent usage for Internal Coupling Facility processors. | 15 seconds |
IIP Processor Usage | The processor percent usage for Integrated Information Processors. | 15 seconds |
CBP Processor Usage | The processor percent usage for Container Based Processors. | 15 seconds |
Channel usage (C)
This metric group reports the channel usage for each channel on the system. An instance of this metric group is created for each channel of a CPC.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Channel Name | The name of the channel in the form channel subsystem path id. | 15 seconds |
Shared Channel | True if the channel is shared among logical partitions, and false if it is not. | 15 seconds |
Logical Partition Name | The name of the owning logical partition or the value "shared" if the channel is shared. | 15 seconds |
Channel Usage | The channel percent usage (0 – 100%). | 15 seconds |
DPM system overview (D)
This metric group reports the aggregated processor usage, network usage, storage usage, accelerator usage, crypto usage, power consumption and temperature for each DPM enabled system.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Processor usage | The processor percent usage. | 15 seconds |
Network usage | The network percent usage. | 15 seconds |
Storage usage | The storage percent usage. | 15 seconds |
Accelerator usage | The accelerator percent usage. | 15 seconds |
Crypto usage | The crypto percent usage. | 15 seconds |
Power consumption watts | The power consumption in watts. | 15 seconds |
Temperature celsius | The ambient temperature. | 15 seconds |
CP shared processor usage | The processor percent usage for all CP shared processors. | 15 seconds |
CP all processor usage | The processor percent usage for all CP processors. | 15 seconds |
IFL shared processor usage | The processor percent usage for all IFL shared processors. | 15 seconds |
IFL all processor usage | The processor percent usage for all IFL processors. | 15 seconds |
All shared processor usage | The processor percent usage for all shared processors. | 15 seconds |
Partitions (D)
This metric group reports the processor usage, network usage, storage usage, accelerator usage, and crypto usage for each active partition on a DPM enabled system.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Processor usage | The processor percent usage. | 15 seconds |
Network usage | The network percent usage. | 15 seconds |
Storage usage | The storage percent usage. | 15 seconds |
Accelerator usage | The accelerator percent usage. | 15 seconds |
Crypto usage | The crypto percent usage. | 15 seconds |
zCPC environmentals and power (C+D)
This metric group reports environmental data and power consumption for the zCPC.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Temperature celsius | The ambient temperature | 15 seconds |
Humidity | The relative humidity | 15 seconds |
Dew point celsius | The dew point | 15 seconds |
Power consumption watts | The power consumption in watts | 15 seconds |
Heat load | The total heat load of the system (heat load forced-air + heat load water) | 15 seconds |
Heat load forced air | The heat load covered by forced-air | 15 seconds |
Heat load water | The heat load covered by water | 15 seconds |
Exhaust temperature celsius | The exhaust temperature | 15 seconds |
zCPC processors (C+D)
This metric group reports the processor usage for each physical zCPC processor on the system. This includes the System Assist Processors (SAPs). An instance of this metric group is created for each processor of a CPC.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Processor name | The name of the zCPC processor in the form processor-type + processor ID. | 15 seconds |
Processor type | The type of zCPC processor. | 15 seconds |
Processor usage | The processor percent usage. | 15 seconds |
SMT usage | The percentage of time the processor is running in simultaneous multithreading (SMT) mode. | 15 seconds |
Thread 0 usage | The percent usage of thread 0 when the processor is running in simultaneous multithreading (SMT) mode | 15 seconds |
Thread 1 usage | The percent usage of thread 1 when the processor is running in simultaneous multithreading (SMT) mode | 15 seconds |
Cryptos (C)
This metric group reports the adapter usage for each crypto on the system. An instance of this metric group is created for each crypto adapter. This metric group is not used for a DPM system. For DPM, crypto adapters are reported in the Adapters metric group.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Channel id | The physical channel identifier of the crypto | 15 seconds |
Crypto id | The crypto identifier of the crypto, decimal 0-15 | 15 seconds |
Adapter usage | The adapter percent usage (0-100%) | 15 seconds |
Adapters (D)
This metric group reports the adapter usage for each adapter on the DPM enabled system. An instance of this metric group is created for each adapter.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Adapter usage | The adapter percent usage (0-100%) | 15 seconds |
Flash memory adapters (C)
This metric group reports the adapter usage for each Flash memory (Flash Express) adapter on the system. An instance of this metric group is created for each Flash memory adapter of the CPC. If a CPC has no flash memory adapters, then no data will appear in this metric group for that CPC.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
Channel id | The physical channel identifier of the Flash memory adapter | 15 seconds |
Adapter usage | The adapter percent usage (0-100%) | 15 seconds |
Network adapter port metric group (D)
OSA and RoCE network adapters have up to two physical ports that connect to the network. On a DPM enabled system, metrics are collected from these ports. Each entry of this metric group contains the metrics for one physical port.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
network-port-id | Numerical value corresponding to the network adapter's physical port. | 15 seconds |
bytes-sent | Number of bytes this physical port sent out to the attached network. | 15 seconds |
bytes-received | Number of bytes this physical port received from the attached network. | 15 seconds |
packets-sent | Number of unicast packets this physical port sent out to the attached network. | 15 seconds |
packets-received | Number of unicast packets this physical port received from the attached network. | 15 seconds |
packets-sent-dropped | Number of packets that were dropped when this physical port was sending them out to the attached network. | 15 seconds |
packets-received-dropped | Number of packets that were dropped when this physical port was receiving them from the attached network. | 15 seconds |
packets-sent-discarded | Number of packets that were discarded when this physical port was sending them out to the attached network. | 15 seconds |
packets-received-discarded | Number of packets that were discarded when this physical port was receiving them from the attached network. | 15 seconds |
multicast-packets-sent | Number of multicast packets this physical port sent out to the attached network. | 15 seconds |
multicast-packets-received | Number of multicast packets this physical port received from the attached network. | 15 seconds |
broadcast-packets-sent | Number of broadcast packets this physical port sent out to the attached network. | 15 seconds |
broadcast-packets-received | Number of broadcast packets this physical port received from the attached network. | 15 seconds |
interval-bytes-sent | Number of bytes sent by this physical port over the collection interval. | 15 seconds |
interval-bytes-received | Number of bytes received by this physical port over the collection interval. | 15 seconds |
bytes-per-second-sent | Number of bytes sent per second by this physical port over the collection interval. | 15 seconds |
bytes-per-second-received | Number of bytes per second received by this physical port over the collection interval. | 15 seconds |
utilization | Link utilization expressed as usage percentage of overall link bandwidth. | 15 seconds |
mac-address | The MAC address of this uplink, if known. | 15 seconds |
flags | Flags indicating the types of metrics that are supported by this interface. | 15 seconds |
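To show how the interval, rate, and utilization metrics in this table relate to each other, here is a minimal Python sketch. It assumes that the per-second rate is the interval byte count divided by the interval length, and that utilization is the bit rate of one direction relative to the link bandwidth. The exact formula that the HMC uses is not stated here, and the function names, sample values, and the 10 Gbit/s link speed are hypothetical.

```python
def bytes_per_second(interval_bytes, interval_seconds):
    """Average byte rate over one collection interval."""
    return interval_bytes / interval_seconds


def utilization_percent(rate_bytes_per_second, link_bits_per_second):
    """Usage of one direction of the link as a percentage of its bandwidth."""
    return rate_bytes_per_second * 8 * 100 / link_bits_per_second


# Hypothetical sample: 1,875,000,000 bytes sent in a 15-second interval
# on a port with an assumed 10 Gbit/s link speed.
rate = bytes_per_second(1_875_000_000, 15)
print(rate)                                       # 125000000.0
print(utilization_percent(rate, 10_000_000_000))  # 10.0
```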
Network interface metric group (D)
This metric group reports metrics for network interface cards (NICs) on a DPM enabled system. NICs are network resources that are associated with DPM partitions. Only activated NICs report metric data. Each entry of this metric group contains the metrics for one NIC. Metrics are collected at a regular interval, and each value is a cumulative total, not a delta.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
partition-id | The unique identifier for the partition that owns the NIC whose metric is contained within this metric group. | 30 seconds |
bytes-per-second-sent | Number of bytes sent per second by this network adapter over the collection interval. | 30 seconds |
bytes-per-second-received | Number of bytes per second received by this network adapter over the collection interval. | 30 seconds |
packets-sent | Number of unicast packets this network adapter sent out to the attached network. | 30 seconds |
packets-received | Number of unicast packets this network adapter received from the attached network. | 30 seconds |
packets-sent-dropped | Number of packets that were dropped when this network adapter was sending them out to the attached network. | 30 seconds |
packets-received-dropped | Number of packets that were dropped when this network adapter was receiving them from the attached network. | 30 seconds |
packets-sent-discarded | Number of packets that were discarded when this network adapter was sending them out to the attached network. | 30 seconds |
packets-received-discarded | Number of packets that were discarded when this network adapter was receiving them from the attached network. | 30 seconds |
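Because the packet counters in this group are cumulative totals, a per-interval view has to be derived by subtracting consecutive samples. The following Python sketch shows one way to do that. The function name `counter_delta` and the sample values are hypothetical, and the 30-second interval is taken from the granularity column above.

```python
def counter_delta(previous, current):
    """Return the increase of each cumulative counter between two samples."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous
    }


# Two hypothetical consecutive samples for one NIC, 30 seconds apart.
earlier = {"packets-sent": 1000, "packets-received": 4000, "packets-sent-dropped": 2}
later = {"packets-sent": 1600, "packets-received": 4900, "packets-sent-dropped": 2}

delta = counter_delta(earlier, later)
rates = {name: value / 30 for name, value in delta.items()}

print(delta)  # {'packets-sent': 600, 'packets-received': 900, 'packets-sent-dropped': 0}
print(rates)  # {'packets-sent': 20.0, 'packets-received': 30.0, 'packets-sent-dropped': 0.0}
```

This sketch does not handle a counter that restarts from zero. In that case the delta is negative and the sample should be treated as a reset.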
RoCE adapters (C)
This metric group reports the adapter usage for each RoCE (10GbE RoCE) adapter on the system. An instance of this metric group is created for each RoCE adapter of the CPC.
The following metrics are provided in each entry of this metric group:
Metric | Description | Granularity |
---|---|---|
channel-id | The physical channel identifier of the RoCE adapter. | 15 seconds |
adapter-usage | The adapter percent usage (0-100%). | 15 seconds |
Manage events from CPC (Server) and HMC console
Critical events are triggered from these servers because of failures or incidents. The following types of events are sent to Instana and displayed on the events page:
- Problematic Hardware Messages from CPC
- Problematic Hardware Messages from Console
- Critical Console Audit Events
- Critical Console Security Events
Troubleshoot
- Import self-signed certificate used by Z HMC server: If the Z HMC server uses a self-signed certificate, make sure that the certificate is imported into the JVM's cacerts truststore. The following exception in the log also indicates that the Z HMC server uses a self-signed certificate that needs to be imported into the JVM's cacerts truststore:
  sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target. PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.
  Solution: Import the self-signed certificate by following the instructions in Self-Signed Certificate.
- Self-signed certificate with an invalid SAN: The server certificate is invalid if its Subject Alternative Name (SAN) does not contain the expected IP address.
  Solution: Correct the server certificate and import it again.
- 403 Forbidden: The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it.
  Solution: Verify the credentials in the configuration.yaml file and confirm that the user has all the required permissions.
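The two certificate checks above can be sketched in Python. This is an illustration under assumptions, not the documented procedure. The helper `san_has_ip` expects a certificate in the dictionary layout that Python's `ssl.SSLSocket.getpeercert()` returns. The helper `keytool_import_command` only builds a standard `keytool -importcert` command line and does not run it. The function names, the alias, and the file paths are hypothetical, and the linked Self-Signed Certificate instructions remain the procedure to follow.

```python
import ipaddress


def san_has_ip(cert, expected_ip):
    """Check whether a certificate's SAN lists the expected IP address.

    `cert` uses the dict layout returned by ssl.SSLSocket.getpeercert().
    """
    expected = ipaddress.ip_address(expected_ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue
        try:
            if ipaddress.ip_address(value.strip()) == expected:
                return True
        except ValueError:
            continue  # skip SAN values that are not parseable IP addresses
    return False


def keytool_import_command(cert_file, keystore, alias="zhmc"):
    """Build a keytool command line that imports a certificate into a truststore."""
    return ["keytool", "-importcert", "-alias", alias,
            "-file", cert_file, "-keystore", keystore]


# Hypothetical certificate data and paths for illustration only.
cert = {"subjectAltName": (("DNS", "hmc.example.com"), ("IP Address", "192.0.2.10"))}
print(san_has_ip(cert, "192.0.2.10"))  # True
print(san_has_ip(cert, "192.0.2.11"))  # False
print(" ".join(keytool_import_command("hmc.pem", "/path/to/cacerts")))
```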