Take charge of processor affinity

Why (three reasons) and how to use hard (versus soft) CPU affinity

Knowing a little bit about how the Linux® 2.6 scheduler treats CPU affinity can help you design better userspace applications. Soft affinity means that processes do not frequently migrate between processors, whereas hard affinity means that processes run on processors you specify. This article describes current affinity mechanisms, explains why and how to use hard affinity, and provides sample code showing you how to use the available functionality.


Eli Dow, Software Engineer, IBM Linux Test and Integration Center

Eli Dow is a Software Engineer in the IBM Linux Test and Integration Center in Poughkeepsie, NY. He holds a B.S. degree in Computer Science and Psychology and a Masters of Computer Science from Clarkson University. His interests include the GNOME desktop, human computer interaction, and Linux systems programming. You can contact Eli at emdow@us.ibm.com.



29 September 2005


Simply stated, CPU affinity is the tendency for a process to run on a given CPU as long as possible without being moved to some other processor. The Linux kernel process scheduler inherently enforces what is commonly referred to as soft CPU affinity, which means that processes generally do not frequently migrate between processors. This state is desirable because processes that seldom migrate often incur less overhead.

The 2.6 Linux kernel also contains a mechanism that allows developers to programmatically enforce hard CPU affinity. This means your applications can explicitly specify which processor (or set of processors) a given process may run on.

What is Linux kernel hard affinity?

In the Linux kernel, every process has an associated data structure called the task_struct. This structure is important for a number of reasons; the most pertinent here is its cpus_allowed bitmask. This bitmask consists of a series of n bits, one for each of the n logical processors in the system. A system with four physical CPUs would have four bits. If those CPUs were hyperthread-enabled, the system would have an eight-bit bitmask.

If a given bit is set for a given process, that process may run on the associated CPU. Therefore, if a process is allowed to run on any CPU and allowed to migrate across processors as needed, the bitmask would be entirely 1s. This is, in fact, the default state for processes under Linux.

The Linux kernel API includes two calls that let users alter the bitmask or view the current bitmask:

  • sched_setaffinity() (for altering the bitmask)
  • sched_getaffinity() (for viewing the current bitmask)

Note that the affinity mask is passed on to child threads, so place your calls to sched_setaffinity() with that in mind.


Why should you use hard affinity?

Normally the Linux kernel does a good job of scheduling processes to run where they should (that is, running on available processors and obtaining good overall performance). The kernel includes algorithms for detecting skewed workloads across CPUs, enabling process migration to less busy processors.

As a rule of thumb, you should simply use the default scheduler behaviors in your applications. However, you might want to alter these default behaviors to optimize performance. Let's look at three reasons for using hard affinity.

Reason 1. You have a hunch

Hunch-based scenarios come up so often in scientific and academic computing that they are no doubt applicable to public-sector computing as well. A common indicator is when you know intuitively that your application will need to consume a lot of computational time on multiprocessor machines.

Reason 2. You are testing complex applications

Testing complex software is another reason to be interested in the kernel affinity technology. Consider an application that requires linear scalability testing. Some products claim to perform better with a throw-more-hardware-at-it mantra.

Rather than just purchasing multiple machines (a machine for each processor configuration), you can:

  • Purchase a single multiprocessor machine
  • Incrementally allocate processors
  • Measure your transactions per second
  • Plot the resulting scalability

If your application truly does scale linearly with CPU additions, a plot of transactions-per-second versus number of CPUs should yield a linear relationship (such as a straight diagonal graph -- see the next paragraph). Modeling behavior this way can indicate whether your application can use the underlying hardware efficiently.

Amdahl's Law

Amdahl's Law governs the speedup of using parallel processors on a problem versus using only one serial processor. Speedup is the time it takes a program to execute in serial (with one processor) divided by the time it takes to execute in parallel (with many processors):
$$S = \frac{T(1)}{T(j)}$$

Where T(j) is the time it takes to execute the program when using j processors.

Amdahl's Law states that ideal linear speedup (S = j) probably won't happen in reality, but the closer you get, the better. In the general case, every program has some sequential component. As problem sets get larger, the sequential component eventually places an upper limit on the optimal solution time.

Amdahl's Law is especially important when you want to keep the CPU cache hit rate high. If a given process gets migrated, it loses the benefits of the CPU cache. In fact, if the CPU you are using needs to cache some specific piece of data for itself, all other CPUs invalidate any entry (for that data) from their own cache.

So, if multiple threads need the same data, it might make sense to bind them to a particular CPU to ensure they all have access to the cached data (or at least improve the odds of a cache hit). Otherwise, the threads might execute on different CPUs and constantly invalidate each other's cache entries.

Reason 3. You are running time-sensitive, deterministic processes

A final reason to be interested in CPU affinity is for real-time (time-sensitive) processes. For example, you might wish to use hard affinity to specify one processor on an eight-way machine, while allowing the other seven processors to handle all the normal scheduling needs of the system. This action ensures that your long-running, time-sensitive application gets run, and also allows other applications to share the remaining computing resources.

The following sample application shows how this all works.


How to code hard affinity

Let's devise a program to make a Linux system very busy. You can construct this program using the system calls mentioned previously along with some other APIs that indicate how many processors are on the system. In essence, the goal is to write a program that can make each processor in a system busy for a few seconds. Download the sample application from the "Download" section below.

Listing 1. Keeping the processors busy
/* This method will create threads, then bind each to its own cpu. */
bool do_cpu_stress(int numthreads)
{
   int ret = TRUE;
   int created_thread = 0;

   /* We need a thread for each cpu we have... */
   while ( created_thread < numthreads - 1 )
   {
      int mypid = fork();

      if (mypid == 0) /* Child process */
      {
          printf("\tCreating Child Thread: #%i\n", created_thread);
          break;
      }
      else /* Only parent executes this */
      {
          /* Continue looping until we spawned enough threads! */
          created_thread++;
      }
   }

   /* NOTE: All threads execute code from here down! */

As you can see, the code simply creates a number of threads by forking (strictly speaking, each one is a separate child process). Each executes the remaining code in the method. Now let's have each thread set affinity to its own CPU.

Listing 2. Setting CPU affinity for each thread
   cpu_set_t mask;

   /* CPU_ZERO initializes all the bits in the mask to zero. */
   CPU_ZERO( &mask );

   /* CPU_SET sets only the bit corresponding to cpu. */
   CPU_SET( created_thread, &mask );

   /* sched_setaffinity returns 0 on success and -1 on error */
   if( sched_setaffinity( 0, sizeof(mask), &mask ) == -1 )
   {
      printf("WARNING: Could not set CPU Affinity, continuing...\n");
   }

If the program has executed this far, each thread now has its own individual affinity. The call to sched_setaffinity sets the CPU affinity mask of the process identified by its first argument, pid. If pid is zero, then the current process is used.

The affinity mask is represented by the bitmask stored in mask. The least significant bit corresponds to the first logical processor number on the system, while the most significant bit corresponds to the last logical processor number on the system.

Each set bit corresponds to a CPU on which the process may be scheduled, while an unset bit corresponds to a CPU on which it may not. In other words, a process is bound to and will run only on processors whose corresponding bit is set. Usually, all bits in the mask are set. The CPU affinity of each of these threads is passed on to any children forked from them.

Note that you should not alter the bitmask directly. You should use the following macros instead. Though not all were used in our example, they are listed here in case you need them in your own program.

Listing 3. Macros to indirectly alter the bitmask
void CPU_ZERO (cpu_set_t *set)
This macro initializes the CPU set set to be the empty set.

void CPU_SET (int cpu, cpu_set_t *set)
This macro adds cpu to the CPU set set.

void CPU_CLR (int cpu, cpu_set_t *set)
This macro removes cpu from the CPU set set.

int CPU_ISSET (int cpu, const cpu_set_t *set)
This macro returns a nonzero value (true) if cpu is a member of the CPU set set, and zero (false) otherwise.

For our purposes, the sample code will go on to have each thread execute some computationally expensive operation.

Listing 4. Each thread executes a compute-intensive operation
    /* Now we have a single thread bound to each cpu on the system */
    int computation_res = do_cpu_expensive_op(41);
    cpu_set_t mycpuid;
    sched_getaffinity(0, sizeof(mycpuid), &mycpuid);

    /* A cpu_set_t cannot be passed to printf directly, so the messages
       report the CPU index this thread was bound to instead. */
    if ( check_cpu_expensive_op(computation_res) )
    {
      printf("SUCCESS: Thread on cpu %i completed, and PASSED integrity check!\n",
         created_thread);
      ret = TRUE;
    }
    else
    {
      printf("FAILURE: Thread on cpu %i failed integrity check!\n",
         created_thread);
      ret = FALSE;
    }

   return ret;
}

There you have the basics of setting CPU affinity for 2.6 Linux kernels. Let's wrap this method call with a fancy main program that takes a user-specified parameter for how many CPUs to make busy. We can even use another method to determine the number of processors in the system:

int NUM_PROCS = sysconf(_SC_NPROCESSORS_CONF);

This method lets the program make sensible decisions about how many processors to keep busy, such as making all of them busy by default and accepting a user-specified count only within the range of processors actually available on the system.


Running the sample application

When you run the sample application described above, you can use a variety of tools to see that the CPUs are busy. For simple testing, use the Linux command top. Press the "1" key while running top to see a per-CPU breakdown of executing processes.


Conclusion

The sample application, although trivial, shows you the basics of hard affinity as implemented in the Linux kernel. (Any application using this code sample will no doubt do something much more interesting.) At any rate, with a basic understanding of the CPU affinity kernel API, you are in a position to squeeze every last drop of performance out of complicated applications.


Download

Description                                 Name          Size
Sample app using CPU affinity kernel API    thrasher.zip  3 KB

