Let's look at how to get started with the Linux CPUfreq subsystem by detailing its settings and the interfaces for changing them. We'll start with three general interfaces:
- The /sys interface
- The cpuspeed settings file
- cpufreq-utils
The /sys filesystem provides a user interface for CPUfreq, starting at /sys/devices/system/cpu/. Some of these files are writable (by root) and others are read-only.
First, take a look at /sys/devices/system/cpu/. Here you will find a
directory for each logical CPU and the
sched_mc_power_savings tunable and, if
available on your system, the
sched_smt_power_savings tunable, which I will
discuss later.
Listing 1. Checking the contents of /sys/devices/system/cpu/
[root@systemx ~]# cd /sys/devices/system/cpu/
Inside each processor's directory is a cpufreq directory, which contains the CPUfreq interface:
Listing 2. Checking the cpufreq directory
[root@systemx cpu]# cd cpu0/cpufreq/
[root@systemx cpufreq]# ls -l
total 0
-r--r--r-- 1 root root 4096 Oct 31 14:53 affected_cpus
-r-------- 1 root root 4096 Oct 31 14:53 cpuinfo_cur_freq
-r--r--r-- 1 root root 4096 Oct 31 14:53 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Oct 31 14:53 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 Oct 31 14:53 scaling_available_frequencies
-r--r--r-- 1 root root 4096 Oct 31 14:53 scaling_available_governors
-r--r--r-- 1 root root 4096 Oct 31 14:53 scaling_cur_freq
-r--r--r-- 1 root root 4096 Oct 31 14:53 scaling_driver
-rw-r--r-- 1 root root 0 Nov 5 11:44 scaling_governor
-rw-r--r-- 1 root root 4096 Oct 31 14:53 scaling_max_freq
-rw-r--r-- 1 root root 4096 Oct 31 14:53 scaling_min_freq
If the governor is set to conservative or ondemand, you will also see a directory of the governor's name here. We will discuss how to change the governor later.
These files are available for every governor. We'll talk about what each setting means and how to change some of them; then we will discuss some governor-specific settings beyond this interface. Note that the settings under the cpufreq directory can be different for each processor, so to get a uniform policy across processors, you must change the setting's value for each processor as described in the following sections.
First, affected_cpus shows which processors are
affected by a frequency change. The reason is that some processors are
frequency-dependent on each other due to a coordination of hardware or
software or both, and must change frequencies at the same time. For
example, you might see this type of setup:
Listing 3. Checking which processors are affected by frequency change
[root@systemx ~]# cd /sys/devices/system/cpu
[root@systemx cpu]# grep . cpu*/cpufreq/affected_cpus
cpu0/cpufreq/affected_cpus:0 1
cpu1/cpufreq/affected_cpus:0 1
cpu2/cpufreq/affected_cpus:2 3
cpu3/cpufreq/affected_cpus:2 3
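To see the frequency-dependent groups at a glance, the affected_cpus files can be de-duplicated. The following is a minimal sketch; the function name freq_domains and its optional directory argument are hypothetical conveniences, not part of CPUfreq.

```shell
#!/bin/sh
# List the distinct groups of CPUs that must change frequency together.
# Usage: freq_domains [cpu_base_dir]
# The directory argument is a hypothetical convenience (useful for testing);
# it defaults to the real sysfs location.
freq_domains() {
    base=${1:-/sys/devices/system/cpu}
    # Each affected_cpus file holds one line such as "0 1". CPUs in the same
    # group report identical lines, so de-duplicating yields the groups.
    cat "$base"/cpu[0-9]*/cpufreq/affected_cpus 2>/dev/null | sort -u
}
```

On the four-processor system in the listing above, this would print two lines: one for processors 0 and 1 and one for processors 2 and 3.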
Next, cpuinfo_cur_freq shows the processor's
current operating frequency. The file
scaling_cur_freq lists the current scaling
frequency that the governors are using.
Listing 4. Checking frequencies
[root@systemx cpufreq]# cat cpuinfo_cur_freq
2997000
[root@systemx cpufreq]# cat scaling_cur_freq
2997000
All frequencies listed in this interface are in kHz.
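Because the raw values are easy to misread, a small conversion helper can be handy. This is only a sketch; khz_to_mhz is a hypothetical name, and integer division is assumed to be precise enough for display.

```shell
#!/bin/sh
# Convert a CPUfreq value (kHz) to whole MHz using integer arithmetic.
# khz_to_mhz is a hypothetical helper name, not a CPUfreq tool.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}
```

For example, `khz_to_mhz 2997000` prints 2997, that is, roughly 3 GHz.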
Next are some files that provide information about the available processor
frequencies. The files cpuinfo_max_freq and
cpuinfo_min_freq hold the max and min
frequencies available to the system;
scaling_available_frequencies shows all
available frequencies.
Listing 5. Checking max, min, and available frequencies
[root@systemx cpufreq]# cat cpuinfo_max_freq
2997000
[root@systemx cpufreq]# cat cpuinfo_min_freq
1998000
[root@systemx cpufreq]# cat scaling_available_frequencies
2997000 2664000 2331000 1998000
The scaling_available_governors file lists all
available governors. If you do not see all five of the governors, check to
make sure all the governors are enabled in your config file and that you
have the governor's module loaded as I described in Part 1.
Listing 6. Checking available governors
[root@systemx cpufreq]# cat scaling_available_governors
ondemand powersave conservative userspace performance
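A script that switches governors may want to confirm that the governor is actually offered first. The sketch below shows one way to do that; governor_available and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) if the named governor appears in
# scaling_available_governors, fail (status 1) otherwise.
# Usage: governor_available NAME [cpufreq_dir]
# The function name and the directory argument are hypothetical.
governor_available() {
    want=$1
    dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq}
    # The file is a single space-separated line; compare word by word so
    # that a partial name such as "demand" does not match "ondemand".
    for g in $(cat "$dir/scaling_available_governors" 2>/dev/null); do
        if [ "$g" = "$want" ]; then
            return 0
        fi
    done
    return 1
}
```

A typical use would be `governor_available conservative || echo "load the conservative module first"`.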
The file scaling_driver will tell you what
cpufreq driver your system is running. Some typical drivers include
acpi-cpufreq, speedstep-smi,
speedstep-centrino,
powernow-k8,
powernow-k7,
longhaul, etc. If you wish to change the
driver, you will need to unload the driver in use before loading another
driver. Also, be sure to check that the driver will work with your
processor before trying to use it.
Listing 7. Checking which cpufreq driver your system is running
[root@systemx cpufreq]# cat scaling_driver
centrino
The rest of the files in this directory are writable by root and give the user the ability to change some cpufreq settings. These files are the only settings the user is allowed to change for the powersave and performance governors. The other governors have additional settings available which we discuss in the next section.
First, the file scaling_governor shows the
current governor enabled. To change the governor, simply
echo the new governor's name into this file.
Note that you must do this for every processor to obtain a uniform policy.
For example:
Listing 8. Checking which governor is enabled and changing governors
[root@systemx ~]# cd /sys/devices/system/cpu/
[root@systemx cpu]# ls
cpu0 cpu1 cpu2 cpu3 cpu4 cpu5 cpu6 cpu7 sched_mc_power_savings
[root@systemx cpu]# cat cpu0/cpufreq/scaling_governor
performance
[root@systemx cpu]# echo conservative > cpu0/cpufreq/scaling_governor
[root@systemx cpu]# cat cpu0/cpufreq/scaling_governor
conservative
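Since the governor must be set on every processor to get a uniform policy, a loop saves typing. The following is a minimal sketch that must be run as root on a real system; set_governor_all and its optional base-directory argument are hypothetical names.

```shell
#!/bin/sh
# Write the same governor name into scaling_governor for every logical CPU.
# Usage: set_governor_all GOVERNOR [cpu_base_dir]
# The function name and the directory argument are hypothetical; the default
# directory is the real sysfs location.
set_governor_all() {
    gov=$1
    base=${2:-/sys/devices/system/cpu}
    rc=0
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        if [ ! -e "$f" ]; then
            continue
        fi
        # Remember any failed write (for example, an unknown governor name)
        # but keep going so the remaining CPUs are still attempted.
        echo "$gov" > "$f" || rc=1
    done
    return $rc
}
```

For example, `set_governor_all conservative` would apply the change from the listing above to all eight processors rather than to cpu0 alone.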
The scaling_max_freq and
scaling_min_freq files show the max and min
frequencies available to the governor. The user can change the range of
frequencies available to the governor by
echoing an available frequency into these
files. Note that the frequency must be one of the frequencies listed in
scaling_available_frequencies since these are
all of the processor frequencies available to the system. Again, note that
you must do this for every processor. For example:
Listing 9. Changing frequencies available to governor
[root@systemx ~]# cd /sys/devices/system/cpu/
[root@systemx cpu]# cat cpu0/cpufreq/scaling_available_frequencies
2997000 2664000 2331000 1998000
[root@systemx cpu]# cat cpu0/cpufreq/scaling_max_freq
2997000
[root@systemx cpu]# cat cpu0/cpufreq/scaling_min_freq
1998000
[root@systemx cpu]# echo 2331000 > cpu0/cpufreq/scaling_min_freq
[root@systemx cpu]# cat cpu0/cpufreq/scaling_min_freq
2331000
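The two rules above (use only a listed frequency, and repeat the change for every processor) can be combined in one helper. This is a sketch under those assumptions, not a definitive tool; set_min_freq_all and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Set scaling_min_freq on every CPU, but only if the requested value is
# listed in that CPU's scaling_available_frequencies.
# Usage: set_min_freq_all FREQ_KHZ [cpu_base_dir]
# The function name and the directory argument are hypothetical.
set_min_freq_all() {
    freq=$1
    base=${2:-/sys/devices/system/cpu}
    for d in "$base"/cpu[0-9]*/cpufreq; do
        if [ ! -d "$d" ]; then
            continue
        fi
        ok=no
        for f in $(cat "$d/scaling_available_frequencies"); do
            if [ "$f" = "$freq" ]; then
                ok=yes
            fi
        done
        if [ "$ok" != yes ]; then
            echo "$d: $freq kHz is not an available frequency" >&2
            return 1
        fi
        echo "$freq" > "$d/scaling_min_freq" || return 1
    done
    return 0
}
```

The same pattern would apply to scaling_max_freq by changing the file name.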
Using the cpuspeed settings file
In addition to directly echoing in the values
for the settings as mentioned previously, a user can also use the cpuspeed
settings file to change the driver, governor, min and max speeds,
utilization thresholds, and the
ignore_nice_load setting. RHEL 5.2 comes with
cpuspeed included, but other distributions of Linux may not contain this
package. If cpuspeed is not included in your distribution, you can
download the carlthompson.net version;
directions for installation are provided in the README. To use the RHEL
5.2 version of cpuspeed, simply edit the /etc/sysconfig/cpuspeed file,
assign a value to any of the setting variables in the file, and issue the
following command:
/etc/init.d/cpuspeed restart
This command will put the new settings into effect. Remember, you must have the corresponding governor module loaded to start using that governor unless it was already built in.
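As a rough illustration, an edited /etc/sysconfig/cpuspeed might contain assignments like the fragment below. The variable names are an assumption based on the settings listed above and on the RHEL 5 era file, and the values are examples only; check the comments in your own copy of the file before relying on them.

```shell
# /etc/sysconfig/cpuspeed (fragment)
# Names and values are assumptions; verify them against the comments in
# your distribution's copy of the file.
GOVERNOR=conservative   # governor to enable when the service starts
MIN_SPEED=1998000       # lowest frequency to allow, in kHz
MAX_SPEED=2997000       # highest frequency to allow, in kHz
UP_THRESHOLD=80         # utilization percentage that triggers an increase
DOWN_THRESHOLD=20       # utilization percentage that triggers a decrease
IGNORE_NICE=0           # 1 = do not count niced processes as load
```

After saving the file, restarting the service as described above applies the values.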
RHEL 5.2 comes with the cpufreq-utils package, which provides another user interface to the CPUfreq subsystem; most other distributions should include this package as well. When you install the cpufreq-utils rpm, you will have two utilities called cpufreq-info and cpufreq-set.
The cpufreq-info utility will list information about the processors and
their CPUfreq settings such as the current frequency, the frequency
limits, the CPUfreq driver, the current policy, the current governor, and
the affected-cpus list.
The cpufreq-set utility allows the user to change each processor's range of available frequencies, the operating governor, and the current running frequency when the userspace governor is enabled. For more information, see the cpufreq-info and cpufreq-set man pages.
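Like the /sys files, cpufreq-set operates on one processor at a time, selected with its -c option; -g selects the governor. The sketch below only prints the commands it would run so they can be reviewed first; the function name cpufreq_set_cmds is hypothetical, and the CPU count is passed in by the caller.

```shell
#!/bin/sh
# Print (do not execute) one cpufreq-set command per logical CPU.
# Usage: cpufreq_set_cmds GOVERNOR NUMBER_OF_CPUS
# cpufreq_set_cmds is a hypothetical helper; -c (CPU) and -g (governor)
# are the cpufreq-set options described in its man page.
cpufreq_set_cmds() {
    gov=$1
    ncpus=$2
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        echo "cpufreq-set -c $cpu -g $gov"
        cpu=$(( cpu + 1 ))
    done
}
```

Piping the output to `sh` as root would run the commands once they look right.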
Now let's discuss settings the user can change in the in-kernel governors.
The powersave and performance governors
These governors statically set the processor frequency to the lowest and highest frequencies, respectively. The only settings the user can change are the settings I discussed in the previous section.
Now we start the discussion on governor-specific settings. If you enable
the userspace governor, you will also see a file called
scaling_setspeed that is writable by root in
the cpufreq directory. This governor allows the user or a program in
userspace to interactively change the processor frequency. The user has
the ability to echo in the desired frequency to
this file or allow some userspace daemon to set this value. As we did with
the files we discussed earlier, you must change the
scaling_setspeed file for each processor.
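Because scaling_setspeed is only meaningful under the userspace governor, a script can check the governor before writing. This is a minimal sketch for a single processor's cpufreq directory; set_speed and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Write a frequency to scaling_setspeed, but only when the userspace
# governor is active for that CPU.
# Usage: set_speed FREQ_KHZ [cpufreq_dir]
# The function name and the directory argument are hypothetical.
set_speed() {
    freq=$1
    dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq}
    gov=$(cat "$dir/scaling_governor") || return 1
    if [ "$gov" != "userspace" ]; then
        echo "governor is '$gov', not userspace; leaving scaling_setspeed alone" >&2
        return 1
    fi
    echo "$freq" > "$dir/scaling_setspeed"
}
```

To cover every processor, call it once per cpuN/cpufreq directory.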
Numerous daemons work with the userspace governor to adjust the processor frequency; here are a few examples:
- cpudyn (CPU dynamic frequency control): This daemon bases frequency changes on the processor load and can also put disks in standby when there is no activity to save even more power.
- cpufreqd: This daemon can be configured to react to battery level, AC status, temperature, running programs, processor usage, and more.
- cpuspeed: This daemon can change frequencies based on processor demand, power supply changes, temperature, and more.
- powernowd: This daemon bases frequency changes on the processor load and has four different behavioral modes that users can choose.
If you load the ondemand governor, you will see a directory called
ondemand in the cpufreq directory. Inside this directory are many tunable
settings. All of the writable (by root) files can be changed by
echoing in the new value as shown previously.
Note that any changes to the ondemand settings will be applied systemwide
so you will not need to change the setting for each processor.
Listing 10. Checking tunable settings for ondemand
[root@systemx ~]# cd /sys/devices/system/cpu/cpu0/cpufreq/ondemand/
[root@systemx ondemand]# ls -l
total 0
-rw-r--r-- 1 root root 4096 Nov 19 10:30 ignore_nice_load
-rw-r--r-- 1 root root 4096 Nov 19 10:30 powersave_bias
-rw-r--r-- 1 root root 4096 Nov 19 10:30 sampling_rate
-r--r--r-- 1 root root 4096 Nov 19 10:30 sampling_rate_max
-r--r--r-- 1 root root 4096 Nov 19 10:30 sampling_rate_min
-rw-r--r-- 1 root root 4096 Nov 19 10:30 up_threshold
The ignore_nice_load file can be set to 0 or 1
(with 0 being the default). When this parameter is set to 1, any processes
with a "nice" value will not be counted toward the overall processor
utilization. When it is set to 0, all processes are counted toward the
utilization. This setting is useful when you are running something that
requires a lot of processor time but you don't care about the runtime. If
you apply the "nice" setting to the process, you can prevent it from
influencing the frequency decisions.
Next, the powersave_bias file modifies the behavior of the ondemand
governor for users who place less emphasis on performance: it saves more
power by reducing the governor's target frequency by a specified amount.
The value is in units of 0.1 percent, so a setting between 1 and 1000
results in a 0.1 percent to 100 percent reduction in frequency.
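The arithmetic behind powersave_bias can be sketched as follows. This shows only the nominal reduced target in kHz; how the governor then maps that target onto the frequencies the hardware offers is not modeled here. The function name biased_target_khz is hypothetical.

```shell
#!/bin/sh
# Nominal target frequency after applying powersave_bias.
# Usage: biased_target_khz FREQ_KHZ BIAS
# BIAS is in units of 0.1 percent (1..1000). biased_target_khz is a
# hypothetical helper name; it models the arithmetic only.
biased_target_khz() {
    freq=$1
    bias=$2
    echo $(( freq - freq * bias / 1000 ))
}
```

For example, `biased_target_khz 2997000 100` (a 10 percent bias) prints 2697300.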
The sampling_rate, measured in microseconds,
determines how often the governor will look at the processor utilization
so that it can determine which frequency to set. This setting must be set
to a value in between the values of
sampling_rate_min and
sampling_rate_max.
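A script can enforce those bounds before writing the new value. The sketch below assumes the ondemand directory layout shown in Listing 10; set_sampling_rate and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Write sampling_rate only if it lies between sampling_rate_min and
# sampling_rate_max (all values in microseconds).
# Usage: set_sampling_rate RATE_US [governor_dir]
# The function name and the directory argument are hypothetical.
set_sampling_rate() {
    rate=$1
    dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq/ondemand}
    rate_min=$(cat "$dir/sampling_rate_min") || return 1
    rate_max=$(cat "$dir/sampling_rate_max") || return 1
    if [ "$rate" -lt "$rate_min" ] || [ "$rate" -gt "$rate_max" ]; then
        echo "sampling_rate $rate is outside $rate_min..$rate_max" >&2
        return 1
    fi
    echo "$rate" > "$dir/sampling_rate"
}
```

Passing the conservative directory as the second argument would apply the same check to that governor.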
Lastly, the up_threshold setting allows the
user to change the max processor utilization threshold that triggers a
change in processor frequencies. By default the
up_threshold value is 80. This means that every
sampling_rate, the kernel will check the
processor utilization and if it is above 80 percent utilized, the governor
will increase the frequency to the maximum frequency available.
If you load the conservative governor, you will see a directory called
conservative in the cpufreq directory. Inside this directory are many
tunable settings. All of the writable (by root) files can be changed by
echoing in the new value as shown previously.
Note that any changes to the conservative settings will be applied
systemwide so you will not need to change the setting for each processor.
Listing 11. Checking tunable settings for conservative
[root@systemx ~]# cd /sys/devices/system/cpu/cpu0/cpufreq/conservative/
[root@systemx conservative]# ls -l
total 0
-rw-r--r-- 1 root root 4096 Nov 19 11:31 down_threshold
-rw-r--r-- 1 root root 4096 Nov 19 11:31 freq_step
-rw-r--r-- 1 root root 4096 Nov 19 11:31 ignore_nice_load
-rw-r--r-- 1 root root 4096 Nov 19 11:31 sampling_down_factor
-rw-r--r-- 1 root root 4096 Nov 19 11:31 sampling_rate
-r--r--r-- 1 root root 4096 Nov 19 11:31 sampling_rate_max
-r--r--r-- 1 root root 4096 Nov 19 11:31 sampling_rate_min
-rw-r--r-- 1 root root 4096 Nov 19 11:31 up_threshold
The ignore_nice_load,
sampling_rate,
sampling_rate_max,
sampling_rate_min, and
up_threshold settings are the same settings
described earlier with the ondemand governor.
The conservative governor also allows the user to set the
down_threshold. For example, by default the
down_threshold is set to 20. This means that
every sampling_rate, the kernel will check the
processor utilization and if it is below 20 percent utilized, the governor
will decrease the frequency.
The freq_step setting changes the size of the
frequency step the governor uses to change CPU frequency in either
direction. By default this value is set to 5, which means the governor will
change the frequency in steps of 5 percent of the maximum frequency each
time it makes a decision to change frequencies. If you set this value to
100, the governor will act exactly like the ondemand governor.
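The interplay of up_threshold, down_threshold, and freq_step can be sketched as a single decision step. This is a simplified model, not the kernel's code: it assumes the step is a percentage of the maximum frequency, uses the default values mentioned in the text (80, 20, and 5), and ignores the final mapping of the result onto a frequency the driver supports. The function name conservative_next is hypothetical.

```shell
#!/bin/sh
# One simplified decision step of a conservative-style governor.
# Usage: conservative_next CUR MIN MAX UTIL [UP] [DOWN] [STEP]
# Frequencies are in kHz, UTIL/UP/DOWN/STEP are percentages.
# conservative_next is a hypothetical name; this is a model, not kernel code.
conservative_next() {
    cur=$1
    min=$2
    max=$3
    util=$4
    up=${5:-80}
    down=${6:-20}
    step=${7:-5}
    # Step size: STEP percent of the maximum frequency (an assumption).
    delta=$(( max * step / 100 ))
    next=$cur
    if [ "$util" -gt "$up" ]; then
        next=$(( cur + delta ))
        if [ "$next" -gt "$max" ]; then next=$max; fi
    elif [ "$util" -lt "$down" ]; then
        next=$(( cur - delta ))
        if [ "$next" -lt "$min" ]; then next=$min; fi
    fi
    echo "$next"
}
```

With the frequencies from the earlier listings, `conservative_next 1998000 1998000 2997000 90` prints 2147850, one 5 percent step above the minimum. Passing 100 as the step makes a single busy sample jump straight to the maximum.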
Lastly, the sampling_down_factor works as a
multiplier with the sampling_rate to lessen how
often the processor utilization is evaluated for a decrease in frequency.
For example, if the sampling_rate is set to 10,000 and the
sampling_down_factor is set to 2, the kernel
makes its decision about lowering the frequency every 20,000 microseconds.
Now we'll examine the two scheduler tunables:
- sched_mc_power_savings for scheduling processes on cores.
- sched_smt_power_savings for scheduling processes on hyperthreads on a core.
The sched_mc_power_savings is a scheduler
tunable located in the /sys/devices/system/cpu/ directory. Don't forget to
set the CONFIG_SCHED_MC config file option to
y as discussed in the setup section (from
Part 1)
if you want to use this tunable.
Listing 12. Checking location of sched_mc_power_savings
[root@systemx ~]# cd /sys/devices/system/cpu/
[root@systemx cpu]# ls -l
total 0
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu0
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu1
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu2
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu3
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu4
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu5
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu6
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu7
-rwxrwxr-x 1 root root 4096 Nov 19 09:54 sched_mc_power_savings
The sched_mc_power_savings file can be set to 0
or 1; 0 is the default. When it is set to 1, the scheduler tries to
schedule processes on as few cores as possible so that the others can go
idle. In other words, if all the processors are a little busy,
sched_mc_power_savings tries to consolidate the
work onto the fewest number of processors possible. This in turn allows
some processors to be idle for longer which saves power, especially if the
processor supports some sort of deep-sleep state like C states where it
draws very little power when idle. The actual power savings can vary
depending on many factors, including how many processors are available and
which CPUfreq governor is running. When
sched_mc_power_savings is set to 0, no special
scheduling is done.
The sched_smt_power_savings tunable is another
scheduler tunable also located in the /sys/devices/system/cpu/ directory;
however, this tuning option is available only on systems that support
hyperthreading. Don't forget to set the
CONFIG_SCHED_SMT config file option to
y as discussed in the setup section (in Part 1)
if you want to use this tunable.
Listing 13. Checking location of sched_smt_power_savings
[root@systemx ~]# cd /sys/devices/system/cpu/
[root@systemx cpu]# ls -l
total 0
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu0
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu1
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu2
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu3
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu4
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu5
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu6
drwxr-xr-x 5 root root 0 Nov 12 17:45 cpu7
-rwxrwxr-x 1 root root 4096 Nov 19 09:54 sched_mc_power_savings
-rwxrwxr-x 1 root root 4096 Nov 19 09:54 sched_smt_power_savings
Similar to the sched_mc_power_savings setting,
the sched_smt_power_savings file can be set to
0 or 1; 0 is the default. When it is set to 1, the scheduler tries to
schedule processes to as few hyperthreads on a core as possible so that
the others can go idle and in turn save power through idle C states.
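Both tunables are changed by echoing 0 or 1 into their files, and sched_smt_power_savings may be absent on systems without hyperthreading. The sketch below writes the value only to the files that exist; set_sched_power_savings and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Write 0 or 1 into whichever scheduler power-savings tunables exist.
# Usage: set_sched_power_savings VALUE [cpu_base_dir]
# The function name and the directory argument are hypothetical.
set_sched_power_savings() {
    val=$1
    base=${2:-/sys/devices/system/cpu}
    for t in sched_mc_power_savings sched_smt_power_savings; do
        # Skip a tunable the kernel does not expose on this system.
        if [ -e "$base/$t" ]; then
            echo "$val" > "$base/$t" || return 1
        fi
    done
    return 0
}
```

Running `set_sched_power_savings 1` as root would turn on consolidation at both the core and hyperthread level where available.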
In Part 3, I will discuss the effects each of the governors can have on different workloads using two popular configuration workloads.