Processing monitor data

6.10 z/VM guest

The mon_procd daemon writes process data to the z/VM® monitor stream.

The data includes system summary information and information about each process, for up to 100 processes that are currently managed by an instance of Linux® on z/VM.

At each sample interval, one sample monitor record is written for system summary data. Then, one sample monitor record is written for each process, for up to 100 processes that are currently managed by the Linux instance. If more than 100 processes exist in a Linux instance at a given time, the processes are sorted by the sum of their CPU and memory usage percentage values, and only the data for the top 100 processes is written to the z/VM monitor stream.
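The selection rule described above can be sketched as follows. This is an illustration only, not the daemon's code: the `ProcSample` type and the `select_top` helper are hypothetical names, and the handling of ties is an assumption because the text does not specify it.

```python
from typing import NamedTuple

MAX_PROCESSES = 100  # limit stated in the text


class ProcSample(NamedTuple):
    """Hypothetical container for the two values used for ranking."""
    pid: int
    pcpu: int  # 100 * CPU usage percentage
    pmem: int  # 100 * memory usage percentage


def select_top(samples, limit=MAX_PROCESSES):
    """Return at most `limit` samples, highest pcpu + pmem first."""
    ranked = sorted(samples, key=lambda s: s.pcpu + s.pmem, reverse=True)
    return ranked[:limit]


# 150 made-up processes; only 100 of them are kept.
samples = [ProcSample(pid, pcpu=(pid % 7) * 100, pmem=(pid % 5) * 50)
           for pid in range(1, 151)]
top = select_top(samples)
```

With 100 or fewer processes, the same function returns all of them, which matches the "up to 100" wording.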

The monitor data in each record begins with a header (a time stamp, the length of the data, and the offset). The data after the header depends on the field "record number" of the 16-bit product ID and can be summary data or process data. See Reading the monitor data for details.
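As a rough sketch, the header can be split from the data that follows it like this. The field order comes from the tables in this section. The byte layout is an assumption: the sketch treats the header as packed (no padding) and big-endian, as is native on IBM Z. The `split_record` helper is a hypothetical name.

```python
import struct

# time_stamp (__u64), data_len (__u16), data_offset (__u16);
# ">" means big-endian with no alignment padding (assumed layout).
HEADER = struct.Struct(">QHH")


def split_record(record: bytes):
    """Return (time_stamp, payload) for one monitor record."""
    time_stamp, data_len, data_offset = HEADER.unpack_from(record, 0)
    # data_offset counts from the start of the header.
    payload = record[data_offset:data_offset + data_len]
    return time_stamp, payload


# Made-up record: a header followed by four payload bytes.
example = HEADER.pack(1700000000, 4, HEADER.size) + b"\x00\x00\x00\x2a"
time_stamp, payload = split_record(example)
```

Using `data_offset` rather than a fixed header size keeps the sketch valid if the payload does not start directly after the header.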

Table 1. System summary data format
Type Name Description
__u64 time_stamp Time at which the process data was sampled.
__u16 data_len Length of data that follows the header.
__u16 data_offset Offset from start of the header to the start of the process data.
__u64 uptime Uptime of the Linux instance.
__u32 users Number of users on the Linux instance.
char[6] loadavg_1 Load average over the last 1 minute.
char[6] loadavg_5 Load average over the last 5 minutes.
char[6] loadavg_15 Load average over the last 15 minutes.
__u32 task_total Total number of tasks on the Linux instance.
__u32 task_running Number of running tasks.
__u32 task_sleeping Number of sleeping tasks.
__u32 task_stopped Number of stopped tasks.
__u32 task_zombie Number of zombie tasks.
__u32 num_cpus Number of CPUs.
__u16 puser A number that represents (100 * percentage of total CPU time used for normal processes executing in user mode).
__u16 pnice A number that represents (100 * percentage of total CPU time used for niced processes executing in user mode).
__u16 psystem A number that represents (100 * percentage of total CPU time used for processes executing in kernel mode).
__u16 pidle A number that represents (100 * percentage of total CPU idle time).
__u16 piowait A number that represents (100 * percentage of total CPU time used for I/O wait).
__u16 pirq A number that represents (100 * percentage of total CPU time used for interrupts).
__u16 psoftirq A number that represents (100 * percentage of total CPU time used for softirqs).
__u16 psteal A number that represents (100 * percentage of total CPU time spent in stealing).
__u64 mem_total Total memory in KB.
__u64 mem_used Used memory in KB.
__u64 mem_free Free memory in KB.
__u64 mem_buffers Memory in buffer cache in KB.
__u64 mem_pgpgin Data read from disk in KB.
__u64 mem_pgpgout Data written to disk in KB.
__u64 swap_total Total swap memory in KB.
__u64 swap_used Used swap memory in KB.
__u64 swap_free Free swap memory in KB.
__u64 swap_cached Cached swap memory in KB.
__u64 swap_pswpin Pages that are swapped in.
__u64 swap_pswpout Pages that are swapped out.
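The following sketch decodes the fields of Table 1 that follow the header. It assumes a packed, big-endian layout and assumes that the load averages are NUL-padded ASCII text; neither detail is stated in this section. The names `SUMMARY`, `NAMES`, and `decode_summary` are hypothetical.

```python
import struct

# uptime, users, 3 load averages, 6 task/CPU counters,
# 8 CPU percentages, 12 memory/swap counters (assumed packed layout).
SUMMARY = struct.Struct(">QI6s6s6s6I8H12Q")

NAMES = (
    "uptime users loadavg_1 loadavg_5 loadavg_15 "
    "task_total task_running task_sleeping task_stopped task_zombie num_cpus "
    "puser pnice psystem pidle piowait pirq psoftirq psteal "
    "mem_total mem_used mem_free mem_buffers mem_pgpgin mem_pgpgout "
    "swap_total swap_used swap_free swap_cached swap_pswpin swap_pswpout"
).split()

PERCENT_FIELDS = set(NAMES[11:19])  # puser .. psteal


def decode_summary(payload: bytes) -> dict:
    """Map the summary payload (data after the header) to named values."""
    fields = dict(zip(NAMES, SUMMARY.unpack_from(payload)))
    for name in NAMES:
        if name.startswith("loadavg_"):
            # Assumption: text such as b"0.10", padded with NUL bytes.
            fields[name] = fields[name].rstrip(b"\x00").decode("ascii")
        elif name in PERCENT_FIELDS:
            # Stored as 100 * percentage, so 1250 means 12.5 percent.
            fields[name] = fields[name] / 100.0
    return fields


# Made-up sample values, packed and decoded again.
sample = SUMMARY.pack(
    3600, 2, b"0.10", b"0.20", b"0.30",
    120, 1, 118, 0, 1, 4,
    1250, 0, 250, 8400, 100, 0, 0, 0,
    4096000, 1024000, 3072000, 65536, 0, 0,
    2048000, 0, 2048000, 0, 0, 0,
)
summary = decode_summary(sample)
```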

The following table shows the format of the process data that is passed to the z/VM monitor stream.

Table 2. Process data format
Type Name Description
__u64 time_stamp Time at which the process data was sampled.
__u16 data_len Length of data that follows the header.
__u16 data_offset Offset from start of the header to the start of the process data.
__u32 pid ID of the process.
__u32 ppid ID of the process parent.
__u32 euid Effective user ID of the process owner.
__u16 tty Device number of the controlling terminal or 0.
__s16 priority Priority of the process.
__s16 nice Nice value of the process.
__u32 processor Last used processor.
__u16 pcpu A number that represents (100 * percentage of the elapsed CPU time that is used by the process since the last sampling).
__u16 pmem A number that represents (100 * percentage of physical memory that is used by the process).
__u64 total_time Total CPU time that the process used.
__u64 ctotal_time Total CPU time that the process and its dead child processes used.
__u64 size Total virtual memory that is used by the task in KB.
__u64 swap Swapped out portion of the virtual memory in KB.
__u64 resident Non-swapped physical memory that is used by the task in KB.
__u64 trs Physical memory that is devoted to executable code in KB.
__u64 drs Physical memory that is devoted to other than executable code in KB.
__u64 share Shared memory that is used by the task in KB.
__u64 dt Dirty page count.
__u64 maj_flt Number of major page faults that occurred for the process.
char state Status of the process.
__u32 flags Current scheduling flags of the process.
__u16 ruser_len Length of the real user name of the process owner. The length should not be larger than 64.
char[ruser_len] ruser Real user name of the process owner. If the name is longer than 64, it is truncated to length 64.
__u16 euser_len Length of the effective user name of the process owner. The length should not be larger than 64.
char[euser_len] euser Effective user name of the process owner. If the name is longer than 64, it is truncated to length 64.
__u16 egroup_len Length of the effective group name of the process owner. The length should not be larger than 64.
char[egroup_len] egroup Effective group name of the process owner. If the name is longer than 64, it is truncated to length 64.
__u16 wchan_len Length of the name of the function in which the process is sleeping. The length should not be larger than 64.
char[wchan_len] wchan_name Name of the function in which the process is sleeping, or '-'. If the name is longer than 64, it is truncated to length 64.
__u16 cmd_len Length of the command name or program name that is used to start the process. The length should not be larger than 64.
char[cmd_len] cmd Command or program name that is used to start the process. If the name is longer than 64, it is truncated to length 64.
__u16 cmd_line_len Length of the command line that is used to start the process. The length should not be larger than 1024.
char[cmd_line_len] cmd_line Command line that is used to start the process. If the command line is longer than 1024, it is truncated to length 1024.
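The six variable-length fields at the end of Table 2 each consist of a 16-bit length followed by that many characters. A minimal sketch of reading them, assuming big-endian lengths with no padding between fields (an assumption, not stated in this section), might look like this. The `read_strings` and `pack_string` helpers are hypothetical names.

```python
import struct

# Order of the length-prefixed fields as listed in Table 2.
STRING_FIELDS = ("ruser", "euser", "egroup", "wchan_name", "cmd", "cmd_line")


def read_strings(buf: bytes, offset: int = 0):
    """Read the length-prefixed strings that start at `offset`.

    Returns the decoded strings and the offset just past the last one.
    """
    out = {}
    for name in STRING_FIELDS:
        (length,) = struct.unpack_from(">H", buf, offset)
        offset += 2
        out[name] = buf[offset:offset + length].decode("utf-8", errors="replace")
        offset += length
    return out, offset


def pack_string(text: str) -> bytes:
    """Build one length-prefixed field, for the example only."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


# Made-up values for one process.
sample = b"".join(pack_string(s) for s in
                  ("root", "root", "root", "-", "sshd", "/usr/sbin/sshd -D"))
strings, end = read_strings(sample)
```

In a full parser, `offset` would be the position just after the fixed-length `flags` field.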

Use the time_stamp field to correlate all process information that was sampled in a given interval.
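One possible way to do this correlation is to group the records by their time_stamp value. The sketch below assumes that the records are already available as (time_stamp, payload) pairs; the `group_by_interval` helper is a hypothetical name.

```python
from collections import defaultdict


def group_by_interval(records):
    """Group (time_stamp, payload) pairs by their time_stamp value."""
    intervals = defaultdict(list)
    for time_stamp, payload in records:
        intervals[time_stamp].append(payload)
    return dict(intervals)


# Made-up records from two sample intervals.
records = [
    (100, "summary"), (100, "pid 1"), (100, "pid 2"),
    (160, "summary"), (160, "pid 1"),
]
grouped = group_by_interval(records)
```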