Monitoring IBM i instances
Supported versions
Supported IBM i versions are 7.2, 7.3, and 7.4.
Note: Currently, we only support remote monitoring of IBM i instances.
Configuration
To start monitoring IBM i instances, you need to configure the following fields in the agent configuration file <agent_install_dir>/etc/instana/configuration.yaml:
com.instana.plugin.ibmiseries:
  enabled: true
  remote: # multiple configurations supported
    - host: 'remote.host-1.com'
      user: 'username'
      password: 'password'
      availabilityZone: 'IBM i Remote Monitoring'
      poll_rate: 15 # seconds
    - host: 'remote.host-2.com'
      user: 'username'
      password: 'password'
      availabilityZone: 'IBM i Remote Monitoring'
      poll_rate: 15 # seconds
      user_specification: # For user inputs (Optional)
        jobs: 'comma separated list of job names' # example - '100345/QUSER/QZDASOINIT,QLWISVR/ADMIN2'
        messageQueue:
          filter: # User defined filter for Message Queue table
            library/queueName: 'Lib-1/queueName1,Lib-2/queueName2' ## Provide values in comma(,) separated way. (Default Value : 'QSYS/QSYSOPR')
            timeFrame: '10 HOURS' ## Format is : {value} MINUTES/HOURS/DAYS (Default Value : '10 MINUTES')
          event: # User defined filter for Message Queue table
            library/queueName: 'Lib-1/queueName1,Lib-2/queueName2' ## Provide values in comma(,) separated way.
            messageIds: 'messageId-1,messageId-2' ## Provide values in comma(,) separated way.
            timeFrame: '10 HOURS' ## Format is : {value} MINUTES/HOURS/DAYS (Default Value : '10 MINUTES')
        historyLog:
          filter:
            timeFrame: '1 DAYS' ## Format is : {value} MINUTES / HOURS / DAYS (Default Value : '10 MINUTES')
The configured remote IBM i instance will then be shown as a separate box in the specified availabilityZone.
Note: Currently, the user specified in the user configuration parameter should have QSECOFR authority.
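The timeFrame and library/queueName values in the configuration above are plain strings that follow a small format of their own. The following Python sketch shows one way to sanity-check such values before writing them to configuration.yaml. The helpers parse_time_frame and parse_queue_list are hypothetical names written for this illustration; they are not part of the Instana agent, and the agent's own parsing may be stricter or more lenient.

```python
import re
from datetime import timedelta
from typing import List, Tuple

# Units named in the configuration comments: MINUTES, HOURS, DAYS.
_UNIT_TO_KWARG = {"MINUTES": "minutes", "HOURS": "hours", "DAYS": "days"}


def parse_time_frame(value: str) -> timedelta:
    """Parse '{value} MINUTES/HOURS/DAYS', for example '10 HOURS'."""
    match = re.fullmatch(r"\s*(\d+)\s+(MINUTES|HOURS|DAYS)\s*", value)
    if match is None:
        raise ValueError(f"not a valid timeFrame: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNIT_TO_KWARG[unit]: amount})


def parse_queue_list(value: str) -> List[Tuple[str, str]]:
    """Split 'Lib-1/queueName1,Lib-2/queueName2' into (library, queue) pairs."""
    pairs = []
    for item in value.split(","):
        library, sep, queue = item.strip().partition("/")
        if not sep or not library or not queue:
            raise ValueError(f"expected library/queueName, got {item!r}")
        pairs.append((library, queue))
    return pairs
```

For example, parse_time_frame('10 HOURS') yields a ten-hour timedelta, and a value such as '10 hrs' is rejected with a ValueError instead of being passed on unnoticed.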
Metrics collection
Configuration data
- Host name
- OS Version
- Total CPU
- Total Memory
- Configured CPU
- Configured Memory
- Partition ID
- Number of partitions
- Restricted state
Performance metrics
System Metrics
Metric | Description | Granularity |
---|---|---|
CPU Rate | The average CPU rate expressed as a percentage where 100% indicates the processor is running at its nominal frequency. A value above or below 100% indicates how much the processor has been slowed down (throttled) or sped up (turbo) relative to the nominal frequency for the processor model. For instance, a value of 120% indicates the processor is running 20% faster than its nominal speed. | 15 seconds |
Average CPU Utilization | The average CPU utilization for all the active processors. | 15 seconds |
Min CPU Utilization | The CPU utilization of the processor that reported the minimum amount of CPU utilization. | 15 seconds |
Max CPU Utilization | The CPU utilization of the processor that reported the maximum amount of CPU utilization. | 15 seconds |
Active Jobs | The number of jobs active in the system (jobs that have been started, but have not yet ended), including both user and system jobs. | 15 seconds |
Interactive Jobs | The percentage of interactive performance assigned to this logical partition. This value is a percentage of the total interactive performance available to the entire physical system. | 15 seconds |
Total Jobs | The total number of user and system jobs that are currently in the system. The total includes: all jobs on job queues waiting to be processed, all jobs currently active (being processed), all jobs that have completed running but still have output on output queues to be produced. | 15 seconds |
Max Jobs | The maximum number of jobs that are allowed on the system. When the number of jobs reaches this maximum, you can no longer submit or start more jobs on the system. The total includes: all jobs on job queues waiting to be processed, all jobs currently active (being processed), all jobs that have completed running but still have output on output queues to be produced. | 15 seconds |
Used Auxiliary Storage Pool | The percentage of the system storage pool (ASP number 1) currently in use. | 15 seconds |
Capacity of Auxiliary Storage Pool | The storage capacity of the system auxiliary storage pool (ASP number 1) in millions of bytes. This value represents the amount of space available for storage of both permanent and temporary objects. | 15 seconds |
Current Temporary Storage | The current amount of storage, in millions of bytes, in use for temporary objects. | 15 seconds |
Maximum Temporary Storage Used | The largest amount of storage, in millions of bytes, used for temporary objects at any one time since the last IPL. | 15 seconds |
Active Threads | The number of initial and secondary threads in the system (threads that have been started, but have not yet ended), including both user and system threads. | 15 seconds |
Total Spool Space | The total spool space consumed by the output queue in bytes. | 15 seconds |
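The Total Jobs and Max Jobs metrics above are most useful when read together, because no further jobs can be submitted once the total reaches the maximum. As a minimal sketch, assuming both values have already been read from the collected metrics, the usage of the job table can be expressed as a percentage. The function name job_table_usage is a hypothetical helper for illustration, not an Instana API.

```python
def job_table_usage(total_jobs: int, max_jobs: int) -> float:
    """Return Total Jobs as a percentage of Max Jobs."""
    if max_jobs <= 0:
        raise ValueError("max_jobs must be positive")
    return 100.0 * total_jobs / max_jobs
```

A result that approaches 100 indicates that the system is close to the point where new jobs can no longer be started.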
Active Memory Pool Metrics
Metric | Description | Granularity |
---|---|---|
Storage Used | The amount of main storage, in megabytes, in the pool. | 15 seconds |
Storage Reserved | The amount of storage, in megabytes, in the pool reserved for system use (for example, for save/restore operations). | 15 seconds |
Storage Defined | The size of the pool, in megabytes, as defined in the shared pool, subsystem description, or system value QMCHPOOL. Contains the null value for a pool without a defined size. | 15 seconds |
Active Threads | The number of threads currently using the pool. | 15 seconds |
Ineligible Threads | The number of ineligible threads in the pool. | 15 seconds |
Max Threads | The maximum number of threads that can be active in the pool at any one time. | 15 seconds |
Output Queue Metrics
Metric | Description | Granularity |
---|---|---|
Queue Name | The name of the output queue. | 15 seconds |
Library Name | The name of the library that contains the output queue. | 15 seconds |
Status | The status of the output queue. | 15 seconds |
Files in Queue | The total number of spooled files currently on this output queue. | 15 seconds |
Writer Job Name | The qualified job name of the writer job. If more than one writer is started, this is the name of the first writer. Contains the null value if a writer job is not started for this queue. | 15 seconds |
Writer Job Status | The status of the writer job. If more than one writer is started, this is the status of the first writer. | 15 seconds |
Top Spool Space Consumption
Top 20 users consuming the spool space
Metric | Description | Granularity |
---|---|---|
User | The name of the user profile that produced the Spool files. | 15 seconds |
Spool Space | The size of the user's spooled files, in bytes. | 15 seconds |
Top Active Jobs
Top 20 active jobs currently running in the system and job names matching the values specified in user_specification:jobs
Metric | Description | Granularity |
---|---|---|
Job Name | The qualified job name. | 15 seconds |
User Name | The user profile under which the initial thread is running at this time. For jobs that swap user profiles, this user profile name and the user profile that initiated the job can be different. | 15 seconds |
Elapsed CPU Percentage | The percent of processing unit time attributed to this job during the measurement time interval. | 15 seconds |
Temporary Storage | The amount of temporary storage currently allocated to the job, in kilobytes. | 15 seconds |
Job Status | The status of the initial thread of the job. | 15 seconds |
Job Type | Type of active job. | 15 seconds |
Thread Count | The number of active threads in the job. | 15 seconds |
Auxiliary Storage Pools
Information about auxiliary storage pools (ASPs).
Metric | Description | Granularity |
---|---|---|
ASP Number | A unique identifier for an ASP. Possible values are 1 through 255. | 15 seconds |
Device Description Name | The name of the device description that brought the independent ASP (IASP) to varyon/active state. | 15 seconds |
ASP Type | The use that is assigned to the ASP. | 15 seconds |
ASP State | The device configuration status of an ASP. | 15 seconds |
Number Of Disk Units | The total number of disk units in the ASP. If mirroring is active for disk units within the ASP, the mirrored pair of units is counted as one. | 15 seconds |
Total Capacity | The total number of used and unused megabytes in the ASP. A special value of -2 is returned if the size of this field is exceeded. | 15 seconds |
Total Capacity Utilization | Utilization Percentage of the Total Capacity in the ASP. | 15 seconds |
Protected Capacity | The total number of used and unused megabytes in the ASP that are protected by mirroring or device parity. A special value of -2 is returned if the value was too big to return. Contains the null value if the capacity cannot be determined. | 15 seconds |
Protected Capacity Utilization | Utilization Percentage of the Protected Capacity in the ASP. | 15 seconds |
Unprotected Capacity | The total number of used and unused megabytes in the ASP that are not protected by mirroring or device parity. A special value of -2 is returned if the value was too big to return. Contains the null value if the capacity cannot be determined. | 15 seconds |
Unprotected Capacity Utilization | Utilization Percentage of the Unprotected Capacity in the ASP. | 15 seconds |
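The capacity columns above can carry two non-numeric signals: the special value -2 when the number was too large to return, and the null value when the capacity cannot be determined. The following Python sketch shows one way to derive the used megabytes from a capacity and its utilization percentage while respecting both cases. The function used_megabytes is a hypothetical helper for illustration, and mapping null to Python's None is an assumption of this sketch.

```python
from typing import Optional

CAPACITY_OVERFLOW = -2  # special value described in the table above


def used_megabytes(capacity_mb: Optional[int],
                   utilization_pct: Optional[float]) -> Optional[float]:
    """Return the used part of an ASP capacity in megabytes, or None if unknown."""
    if capacity_mb is None or utilization_pct is None:
        return None  # capacity or utilization could not be determined
    if capacity_mb == CAPACITY_OVERFLOW:
        return None  # the real capacity was too large to be returned
    return capacity_mb * utilization_pct / 100.0
```

Treating -2 as "unknown" rather than as a number avoids reporting a negative amount of used storage.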
Active Subsystems
Information about Active Subsystems
Metric | Description | Granularity |
---|---|---|
Name | The name of the subsystem about which information is being returned. | 15 seconds |
Library Name | The name of the library in which the subsystem description resides. | 15 seconds |
Active Jobs | The number of jobs currently active in the subsystem. This number includes held jobs but excludes jobs that are disconnected or suspended because of a transfer secondary job or a transfer group job. If STATUS is INACTIVE, returns 0. | 15 seconds |
Max Active Jobs | The maximum number of jobs that can run or use resources in the subsystem at one time. Contains the null value if the subsystem description specifies *NOMAX, indicating that there is no maximum. | 15 seconds |
Description | The text description of the subsystem description. | 15 seconds |
Job Queue
Information about job queues.
Metric | Description | Granularity |
---|---|---|
Job Queue Name | The name of the job queue. | 15 seconds |
Job Queue Library | The name of the library that contains the job queue. | 15 seconds |
Subsystem Name | The name of the subsystem that can receive jobs from this job queue. Contains the null value if this job queue is not associated with an active subsystem. | 15 seconds |
Subsystem Library Name | The library in which the subsystem description resides. Contains the null value if this job queue is not associated with an active subsystem. | 15 seconds |
Number Of Jobs | The number of jobs in the queue. | 15 seconds |
Active Jobs | The current number of jobs that are active that came through this job queue entry. Contains the null value if this job queue is not associated with an active subsystem. | 15 seconds |
Maximum Active Jobs | The maximum number of jobs that can be active at the same time through this job queue entry. A value of -1 indicates *NOMAX, no maximum number of jobs is defined. Contains the null value if this job queue is not associated with an active subsystem. | 15 seconds |
Job Queue Status | The status of the job queue. HELD : The queue is held. RELEASED : The queue is released. | 15 seconds |
Text Description | Text that describes the job queue. Contains the null value if there is no text description for the job queue. | 15 seconds |
Held Jobs | The current number of jobs that are in *HELD status. This is the sum of the 10 HELD_JOBS_PRIORITY_n columns. | 15 seconds |
Released Jobs | The current number of jobs that are in *RELEASED status. This is the sum of the 10 RELEASED_JOBS_PRIORITY_n columns. | 15 seconds |
Scheduled Jobs | The current number of jobs that are in *SCHEDULED status. This is the sum of the 10 SCHEDULED_JOBS_PRIORITY_n columns. | 15 seconds |
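The Maximum Active Jobs column above uses -1 for *NOMAX and the null value for a job queue that is not associated with an active subsystem, so it should not be subtracted from blindly. A minimal Python sketch of the remaining capacity of a job queue entry, assuming null is represented as None, could look as follows. The function free_job_slots is a hypothetical helper, not part of the Instana agent.

```python
from typing import Optional

NOMAX = -1  # value the table above uses for *NOMAX


def free_job_slots(active_jobs: Optional[int],
                   max_active_jobs: Optional[int]) -> Optional[int]:
    """Return how many more jobs may become active, or None if unbounded or unknown."""
    if active_jobs is None or max_active_jobs is None:
        return None  # queue is not associated with an active subsystem
    if max_active_jobs == NOMAX:
        return None  # *NOMAX: no upper limit is defined
    return max(max_active_jobs - active_jobs, 0)
```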
Network interfaces
Information about IPv4 and IPv6 interfaces
Metric | Description | Granularity |
---|---|---|
Internet Address | The internet address of the interface. | 15 seconds |
Subnet Mask | The subnet mask for the network, subnet, and host address fields of the internet address that defines the subnetwork for an interface. Contains null if this is an IPv6 connection. | 15 seconds |
Connection Type | The type of connection (IPV4,IPV6). | 15 seconds |
Interface Line Type | The type of line used by the interface. | 15 seconds |
Line Description | The name of the communications line description that identifies the physical network associated with an interface. | 15 seconds |
VLAN ID | The virtual LAN to which this interface belongs. | 15 seconds |
Status | The current status of the logical interface. | 15 seconds |
Status value Mapping
Metric Value | Status |
---|---|
0 | ENDING |
1 | ACTIVE |
2 | FAILED |
3 | FAILED_TCP |
4 | INACTIVE |
5 | RCYCNL |
6 | RCYPND |
7 | STARTING |
8 | ACQUIRING |
9 | ACQUIRING |
10 | ACQUIRING |
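The mapping above can be kept as a small lookup table when the numeric status metric is processed outside of Instana. The Python sketch below copies the values from the table as listed; the name interface_status and the "UNKNOWN" fallback for unlisted values are assumptions of this illustration.

```python
# Numeric metric value -> interface status, as listed in the table above.
INTERFACE_STATUS = {
    0: "ENDING",
    1: "ACTIVE",
    2: "FAILED",
    3: "FAILED_TCP",
    4: "INACTIVE",
    5: "RCYCNL",
    6: "RCYPND",
    7: "STARTING",
    8: "ACQUIRING",
    9: "ACQUIRING",
    10: "ACQUIRING",
}


def interface_status(metric_value: int) -> str:
    """Translate the numeric status metric into its status name."""
    return INTERFACE_STATUS.get(metric_value, "UNKNOWN")
```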
Network connections (Top Receivers)
Netstat Info For Bytes Received Locally
Metric | Description | Granularity |
---|---|---|
Remote Port & Address | This column is a combination of the remote port and the remote address. Remote Port : The remote host port number. A value of 0 means that the connection is a listening or UDP socket, so this field does not apply. Remote Address : The internet address of the remote host. For IPv4: The address is in IPv4 address format. A value of 0.0.0.0 indicates that either the system is waiting for a connection to open or that a UDP socket is being used. A value of 0 means that the connection is a listening or UDP socket so this field does not apply. For IPv6: The address is in IPv6 address format. A value of :: means that the connection is a listening socket so this field does not apply. | 15 seconds |
Bind User | The user profile of the job on the local system which first performed a sockets API bind() of the socket. | 15 seconds |
Local Port & Address | This column is a combination of the local port and the local address. Local Port : The local system port number. Local Address : The local address of this connection on this system. For IPv4: The address is in IPv4 address format. A value of 0.0.0.0 indicates that either the system is waiting for a connection to open or that a UDP socket is being used. For IPv6: The address is in IPv6 address format. A value of :: means the local application specified that any local internet address can be used. | 15 seconds |
Remote Port Name | The remote host well-known port name or the name from the service table entry. Contains null if there is no well-known port name. | 15 seconds |
Local Port Name | The local system well-known port name or the name from the service table entry. Contains null if there is no well-known port name. | 15 seconds |
Bytes Sent Remotely | The number of bytes sent to the remote host. | 15 seconds |
Bytes Received Locally | The number of bytes received from the remote host. | 15 seconds |
Protocol | Identifies the type of connection protocol. TCP : A Transmission Control Protocol (TCP) connection or socket. UDP : A User Datagram Protocol (UDP) socket. | 15 seconds |
TcpState | The state of the connection. CLOSED : This connection has ended. CLOSE-WAIT : Waiting for an end connection request from the local user. CLOSING : Waiting for an end connection request acknowledgment from the remote host. ESTABLISHED : The normal state in which data is transferred. FIN-WAIT-1 : Waiting for the remote host to acknowledge the local system request to end the connection. FIN-WAIT-2 : Waiting for the remote host request to end the connection. LAST-ACK : Waiting for the remote host to acknowledge an end connection request. LISTEN : Waiting for a connection request from any remote host. SYN-RECEIVED : Waiting for a confirming connection request acknowledgment. SYN-SENT : Waiting for a matching connection request after having sent a connection request. TIME-WAIT : Waiting to allow the remote host enough time to receive the local system's acknowledgment to end the connection. Contains null if PROTOCOL is UDP. | 15 seconds |
Network connections (Top Senders)
Netstat Info For Bytes Sent Remotely
Metric | Description | Granularity |
---|---|---|
Remote Port & Address | This column is a combination of the remote port and the remote address. Remote Port : The remote host port number. A value of 0 means that the connection is a listening or UDP socket, so this field does not apply. Remote Address : The internet address of the remote host. For IPv4: The address is in IPv4 address format. A value of 0.0.0.0 indicates that either the system is waiting for a connection to open or that a UDP socket is being used. A value of 0 means that the connection is a listening or UDP socket so this field does not apply. For IPv6: The address is in IPv6 address format. A value of :: means that the connection is a listening socket so this field does not apply. | 15 seconds |
Bind User | The user profile of the job on the local system which first performed a sockets API bind() of the socket. | 15 seconds |
Local Port & Address | This column is a combination of the local port and the local address. Local Port : The local system port number. Local Address : The local address of this connection on this system. For IPv4: The address is in IPv4 address format. A value of 0.0.0.0 indicates that either the system is waiting for a connection to open or that a UDP socket is being used. For IPv6: The address is in IPv6 address format. A value of :: means the local application specified that any local internet address can be used. | 15 seconds |
Remote Port Name | The remote host well-known port name or the name from the service table entry. Contains null if there is no well-known port name. | 15 seconds |
Local Port Name | The local system well-known port name or the name from the service table entry. Contains null if there is no well-known port name. | 15 seconds |
Bytes Sent Remotely | The number of bytes sent to the remote host. | 15 seconds |
Bytes Received Locally | The number of bytes received from the remote host. | 15 seconds |
Protocol | Identifies the type of connection protocol. TCP : A Transmission Control Protocol (TCP) connection or socket. UDP : A User Datagram Protocol (UDP) socket. | 15 seconds |
TcpState | The state of the connection. CLOSED : This connection has ended. CLOSE-WAIT : Waiting for an end connection request from the local user. CLOSING : Waiting for an end connection request acknowledgment from the remote host. ESTABLISHED : The normal state in which data is transferred. FIN-WAIT-1 : Waiting for the remote host to acknowledge the local system request to end the connection. FIN-WAIT-2 : Waiting for the remote host request to end the connection. LAST-ACK : Waiting for the remote host to acknowledge an end connection request. LISTEN : Waiting for a connection request from any remote host. SYN-RECEIVED : Waiting for a confirming connection request acknowledgment. SYN-SENT : Waiting for a matching connection request after having sent a connection request. TIME-WAIT : Waiting to allow the remote host enough time to receive the local system's acknowledgment to end the connection. Contains null if PROTOCOL is UDP. | 15 seconds |
Message Queue
Information about each message in a message queue. An Instana event is created whenever a message in a message queue matches the specifications (queue library, queue name, message ID) provided by the user in the configuration.yaml file.
Metric | Description | Granularity |
---|---|---|
Message Id | The message ID for this message. Contains the null value if this is an impromptu message or MESSAGE_TYPE is REPLY. | 15 seconds |
Message Type | Type of message. Values are: COMPLETION, DIAGNOSTIC, ESCAPE, INFORMATIONAL, INQUIRY, NOTIFY, REPLY, REQUEST, SENDER. | 15 seconds |
Severity | The severity assigned to the message. | 15 seconds |
Message Queue Library | The name of the library containing the message queue. | 15 seconds |
Message Queue Name | The name of the message queue containing the message. | 15 seconds |
Message Timestamp | The timestamp when the message was sent. | 15 seconds |
Message Text | The first level text of the message including tokens, or the impromptu message text. Contains the null value if MESSAGE_TYPE is REPLY or if the message file could not be accessed. | 15 seconds |
Message Second Level Text | The second level text of the message including tokens. Contains the null value if MESSAGE_ID is null or if the message has no second level text or if the message file could not be accessed. | 15 seconds |
Message Key | The key that is assigned to the message. The key is assigned by the command or API that sends the message. For details, see Message Types and Message Keys in the QMHRCVM API. | 15 seconds |
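The event rule described at the start of this section (a message matches on queue library, queue name, and message ID) can be sketched in Python as shown below. This is only an illustration of the matching idea, not the agent's implementation: the QueueMessage type and matches_event_spec function are hypothetical, and requiring all three fields to match exactly and case-sensitively is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class QueueMessage:
    library: str
    queue: str
    message_id: Optional[str]  # null for impromptu or REPLY messages


def matches_event_spec(message: QueueMessage,
                       queues: Set[Tuple[str, str]],
                       message_ids: Set[str]) -> bool:
    """Return True if the message comes from a listed queue and has a listed ID."""
    if message.message_id is None:
        return False  # nothing to compare against the configured messageIds
    return ((message.library, message.queue) in queues
            and message.message_id in message_ids)
```

The queues and message_ids arguments correspond to the library/queueName and messageIds values under messageQueue: event in configuration.yaml, after the comma-separated strings have been split.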
History Logs
Information about each message in the history log.
Metric | Description | Granularity |
---|---|---|
Message Id | The message ID for this message. Contains the null value if this is an impromptu message or MESSAGE_TYPE is REPLY. | 15 seconds |
Message Type | Type of message. Values are COMPLETION, DIAGNOSTIC, ESCAPE, INFORMATIONAL, INQUIRY, NOTIFY, REPLY, REQUEST, or SENDER. | 15 seconds |
Severity | The severity that is assigned to the message. | 15 seconds |
User | The current user of the job when the message was sent. | 15 seconds |
Job | The qualified job name when the message was sent. | 15 seconds |
Program | The program that sent the message. | 15 seconds |
Message Timestamp | The timestamp when the message was sent. | 15 seconds |
Message Text | The first level text of the message including tokens, or the impromptu message text. Contains the null value if MESSAGE_ID is null or if the message file could not be accessed. | 15 seconds |
Message Second Level Text | The second level text of the message including tokens. Contains the null value if MESSAGE_ID is null or if the message has no second level text or if the message file could not be accessed. | 15 seconds |