The Linux tick-based CPU time accounting is heuristic: when
a timer interrupt occurs (every 1/100 second on a zSeries
machine), Linux checks which context has been interrupted
and accounts the complete time slice to that context. For example,
if the timer interrupt occurs in a kernel context, the
complete time slice is accounted as system time.
Example (chart not reproduced): depending on when the timer
interrupt occurs, the resulting CPU time accounting may look
different. For simplicity, only user and kernel time are shown.
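The dependence on tick timing can be reproduced with a small simulation. This is a simplified model, not kernel code: it assumes a hypothetical 40 ms timeline in which every 10 ms slice consists of 4 ms of user time followed by 6 ms of kernel time, with one tick every 10 ms (HZ=100). The names `make_timeline`, `tick_accounting` and `exact_accounting` are illustrative only.

```python
TICK_MS = 10  # one tick every 1/100 second (HZ=100)


def make_timeline(slices=4):
    """Hypothetical workload: each 10 ms slice is 4 ms user, then 6 ms kernel."""
    timeline = []
    for base in range(0, slices * TICK_MS, TICK_MS):
        timeline.append((base, base + 4, "user"))
        timeline.append((base + 4, base + TICK_MS, "system"))
    return timeline


def context_at(timeline, t):
    """Return the context running at time t (ms)."""
    for start, end, ctx in timeline:
        if start <= t < end:
            return ctx
    return "idle"


def tick_accounting(timeline, duration_ms, phase_ms=0):
    """Charge one full tick to whatever context each tick interrupts."""
    totals = {}
    for t in range(phase_ms, duration_ms, TICK_MS):
        ctx = context_at(timeline, t)
        totals[ctx] = totals.get(ctx, 0) + TICK_MS
    return totals


def exact_accounting(timeline):
    """Reference values: how long each context really ran."""
    totals = {}
    for start, end, ctx in timeline:
        totals[ctx] = totals.get(ctx, 0) + (end - start)
    return totals


if __name__ == "__main__":
    tl = make_timeline()
    print("exact:     ", exact_accounting(tl))
    print("ticks at 0:", tick_accounting(tl, 40, phase_ms=0))
    print("ticks at 5:", tick_accounting(tl, 40, phase_ms=5))
```

In this model the real split is 16 ms user and 24 ms system, yet ticks at 0, 10, 20, 30 ms charge all 40 ms to user time, while ticks shifted by 5 ms charge all 40 ms to system time.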
For Linux images running on non-virtualized systems,
tick-based CPU time accounting is usually sufficient. It is
not precise, but over time the inaccuracies tend to balance out.
On systems with virtual CPUs (such as z/VM or Xen), tick-based
CPU time accounting is not precise enough when the virtual
platform runs more virtual CPUs than there are real CPUs.
In this case, a real CPU might spend part of its time
servicing another virtual processor, while the time slice
is still accounted to a process that could not actually use
it. As a result, the CPU times reported by Linux can deviate
considerably from the load numbers reported by the hypervisor
of the virtual platform.
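A minimal sketch of this discrepancy, under simplifying assumptions: one CPU-bound guest process stays in user mode for 40 ms of wall-clock time, its virtual CPU shares a real CPU with another virtual CPU and is dispatched only for the first 5 ms of every 10 ms, and the guest still sees one tick every 10 ms. The schedule and the function names `backed_ms` and `tick_vs_real` are hypothetical.

```python
TICK_MS = 10  # 1/100 second per tick (HZ=100)


def backed_ms(backed, start, end):
    """Milliseconds within [start, end) during which the virtual CPU was
    actually dispatched on a real CPU. `backed` is a list of (start, end)."""
    total = 0
    for b_start, b_end in backed:
        total += max(0, min(end, b_end) - max(start, b_start))
    return total


def tick_vs_real(backed, duration_ms):
    """Return (user time charged by ticks, real CPU time used, difference).

    Simplification: the guest process is runnable in user mode for the
    whole interval, so every tick charges a full slice to user time."""
    ticks = duration_ms // TICK_MS
    tick_user = ticks * TICK_MS
    real_user = backed_ms(backed, 0, duration_ms)
    return tick_user, real_user, tick_user - real_user


# Hypothetical schedule: this virtual CPU holds the real CPU for the first
# 5 ms of every 10 ms; another virtual CPU gets the remaining time.
BACKED = [(t, t + 5) for t in range(0, 40, 10)]

if __name__ == "__main__":
    charged, used, not_backed = tick_vs_real(BACKED, 40)
    print(f"charged={charged} ms, real={used} ms, not backed={not_backed} ms")
```

Here the ticks charge 40 ms of user time, but the real CPU served this virtual CPU for only 20 ms; the other 20 ms went to the other virtual CPU, which is exactly the kind of deviation the hypervisor's numbers would reveal.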
This example shows involuntary wait states (called "steal
time", shown in white in the chart) for Linux images running
on virtual systems.
Precise CPU time accounting for virtual systems must provide
the following features:
- Distinction between real and virtual CPU time
- A way to show wait states caused by involuntary waits
  (steal time)
For Linux on System z, a new CPU time accounting scheme, called
"virtual CPU time accounting", has been implemented as of Linux
kernel 2.6.11. It is based on the System z virtual CPU timer
instead of the wall-clock time used by tick-based CPU
accounting:
Each CPU has its own CPU timer. The stepping rate of the
CPU timer is synchronized with the system TOD (Time-of-Day)
clock, but the timer only steps while a virtual CPU is backed
by a physical CPU. The stpt (Store CPU Timer) instruction
has been added to the Linux system call path. By storing
the CPU timer, Linux can calculate the CPU time that a Linux
image has actually used.
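The following toy model illustrates why storing the CPU timer yields the CPU time actually used. It is a sketch under stated assumptions, not the kernel implementation: `SimulatedCpu`, `stpt()`, `tod_now()` and `measure()` are hypothetical stand-ins for the hardware CPU timer, the stpt instruction and the TOD clock. The CPU timer is modeled as a counter that counts down, as the z/Architecture CPU timer does, and it only steps while the virtual CPU is backed; all times are in microseconds.

```python
class SimulatedCpu:
    """Toy model of one virtual CPU (hypothetical, not a kernel API)."""

    def __init__(self, cpu_timer_start=10**9):
        self.tod = 0                      # TOD clock: always advances
        self.cpu_timer = cpu_timer_start  # CPU timer: counts down

    def run(self, us, backed):
        """Let `us` microseconds of wall-clock time pass."""
        self.tod += us
        if backed:                        # only steps while backed
            self.cpu_timer -= us

    def stpt(self):
        """Stand-in for the stpt (Store CPU Timer) instruction."""
        return self.cpu_timer

    def tod_now(self):
        """Stand-in for reading the TOD clock."""
        return self.tod


def measure(cpu, schedule):
    """Run `schedule` [(us, backed), ...] and return (wall_us, used_us)."""
    t0, c0 = cpu.tod_now(), cpu.stpt()
    for us, backed in schedule:
        cpu.run(us, backed)
    t1, c1 = cpu.tod_now(), cpu.stpt()
    return t1 - t0, c0 - c1               # timer decreased by the used time


if __name__ == "__main__":
    wall, used = measure(SimulatedCpu(), [(3000, True), (2000, False), (5000, True)])
    print(f"wall={wall} us, used={used} us, not backed={wall - used} us")
```

Of 10 ms of wall-clock time, only the 8 ms during which the virtual CPU was backed show up as a change in the CPU timer; the 2 ms gap is time the virtual CPU could not run.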
Virtual CPU time accounting accounts CPU time whenever
the execution context changes (see stpt in the
chart below). This is much more precise than the tick-based
accounting scheme. The two features, distinction between
real and virtual CPU time and explicit exposure of steal time,
ensure correct CPU time accounting for Linux images running
on both virtualized and non-virtualized platforms.
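The accounting at context changes can be sketched as follows for a user process that makes one system call. This is a simplified model, not the s390 kernel code: at every context change the CPU timer and TOD clock are read, the CPU timer delta is charged to the context being left, and the part of the wall-clock delta during which the CPU timer did not step is charged as steal time. The sketch assumes the virtual CPU never waits voluntarily (no idle time); `VtimeAccountant` and `account` are hypothetical names.

```python
class VtimeAccountant:
    """Charges elapsed CPU timer time to the context being left
    (hypothetical model, not kernel code)."""

    def __init__(self, tod, cpu_timer, context):
        self.last_tod = tod
        self.last_timer = cpu_timer
        self.context = context
        self.totals = {"user": 0, "system": 0, "steal": 0}

    def switch(self, tod, cpu_timer, new_context):
        """Called at every context change with fresh TOD/CPU timer values."""
        used = self.last_timer - cpu_timer   # CPU timer counts down
        wall = tod - self.last_tod
        self.totals[self.context] += used
        # Assumes no voluntary idle: time not backed by a real CPU is steal.
        self.totals["steal"] += wall - used
        self.last_tod, self.last_timer = tod, cpu_timer
        self.context = new_context


def account(segments, cpu_timer_start=10**9):
    """segments: [(context, wall_us, backed_us), ...] in execution order."""
    tod, timer = 0, cpu_timer_start
    acct = VtimeAccountant(tod, timer, segments[0][0])
    for i, (ctx, wall_us, backed_us) in enumerate(segments):
        tod += wall_us
        timer -= backed_us                   # steps only while backed
        nxt = segments[i + 1][0] if i + 1 < len(segments) else ctx
        acct.switch(tod, timer, nxt)         # snapshot at the context change
    return acct.totals


if __name__ == "__main__":
    # user code, then a system call during which the virtual CPU loses
    # the real CPU for 2 ms, then back to user code
    trace = [("user", 4000, 4000), ("system", 3000, 1000), ("user", 3000, 3000)]
    print(account(trace))
```

For this trace the model reports 7000 µs user, 1000 µs system and 2000 µs steal time, which together add up to the 10 ms of wall-clock time. Unlike the tick-based model, the result does not depend on when a tick arrives, and the time the virtual CPU was not backed appears explicitly as steal time instead of being charged to the process.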