How do I boot the operating system with no "holes" in the processor numbering?
With holes in the processor numbering, I am having trouble assigning OMP threads to processors. Are there any tricks for this?
Also, while watching "top", I found that numactl might not handle this properly.
Re: How do I handle the "holes" in the CPU numbering on POWER7 ?
2010-05-11T22:13:17Z
On Power systems, there are holes in CPU numbering when some HW threads are not used, e.g., the ST mode (CPUs 0, 4, 8, ...) or the SMT2 mode (CPUs 0, 1, 4, 5, ...) with POWER7 processors. There are no holes in the SMT4 mode (CPUs 0, 1, 2, 3, 4, 5, 6, 7, ...) since every HW thread is used.
We can run "ppc64_cpu --smt=0", "ppc64_cpu --smt=2" and "ppc64_cpu --smt=4" to set different modes dynamically.
Even with holes on POWER7 systems, we can use the environment variable "XLSMPOPTS" to bind OpenMP threads to specified HW threads, assuming that the IBM XL compiler is being used.
Depending on the SMT mode, XLSMPOPTS is set differently.
For the ST mode (CPUs 0, 4, 8, 12, ...), set XLSMPOPTS=STARTPROC=0:STRIDE=4. This means the first OpenMP thread is bound to CPU0, the second to CPU4 (0+4), the third to CPU8 (4+4), and so on.
For the SMT4 mode (CPUs 0, 1, 2, 3, 4, ...), set XLSMPOPTS=STARTPROC=0:STRIDE=1. This means the first OpenMP thread is bound to CPU0, the second to CPU1 (0+1), the third to CPU2 (1+1), and so on.
However, we cannot use STRIDE=2 for the SMT2 mode (CPUs 0, 1, 4, 5, 8, 9, ...), since the second thread cannot be bound to CPU2 (0+2) which is not enabled. The IBM XL beta compiler has a new option in XLSMPOPTS to handle it.
We can set XLSMPOPTS=proc='0,1,4,5,...'. The first OpenMP thread is bound to CPU0, the second to CPU1, the third to CPU4, the fourth to CPU5, and so on, just following the explicit list of CPU numbers.
Of course, we can also use OMP_NUM_THREADS=N to create N OpenMP threads for each involved process.
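To make the STARTPROC/STRIDE arithmetic concrete, here is a minimal shell sketch. The helper name stride_cpus is made up for illustration and is not part of the XL runtime; it only prints the CPU numbers the first N threads would land on, assuming thread i is bound to STARTPROC + i * STRIDE as described above.

```shell
# Hypothetical helper (not an XL tool): list the CPU numbers that
# STARTPROC/STRIDE would select for the first COUNT threads,
# assuming thread i is bound to START + i * STRIDE.
stride_cpus() {
    start=$1; stride=$2; count=$3
    i=0; out=""
    while [ "$i" -lt "$count" ]; do
        out="${out:+$out }$((start + i * stride))"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

stride_cpus 0 4 3   # ST mode:   0 4 8
stride_cpus 0 1 3   # SMT4 mode: 0 1 2
stride_cpus 0 2 3   # SMT2 attempt: 0 2 4
```

With the SMT2 numbering (0, 1, 4, 5, ...), the last call shows the problem: STRIDE=2 asks for CPU2, which is not enabled, and it skips CPU1.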
Re: How do I handle the "holes" in the CPU numbering on POWER7 ?
2010-05-12T13:48:11Z
- wpeter
- 2010-05-11T22:13:17Z
Keep in mind that there are other ways of binding processes and threads to CPUs, which may leverage different mechanisms available in Linux. The example provided by Peter is an OpenMP method that takes advantage of the IBM XL compilers.
Re: How do I handle the "holes" in the CPU numbering on POWER7 ?
2010-05-12T14:53:47Z
- wpeter
- 2010-05-11T22:13:17Z
I am getting an error message with
XLSMPOPTS=PROC.....
The error message is
1587-103 ..... The option strings for the SMP run-time ....... "PROC" contains unexpected or invalid text instead of an option name. All SMP run time .. has been set to default .)
Re: How do I handle the "holes" in the CPU numbering on POWER7 ?
2010-05-12T14:58:06Z
- null
- 2010-05-12T14:53:47Z
Try PROCS=
Re: How do I handle the "holes" in the CPU numbering on POWER7 ?
2010-05-12T15:18:19Z
- SystemAdmin
- 2010-05-12T14:58:06Z
Try PROCS=
I missed an 's'. It should be 'procs=...'.
Re: How do I handle the "holes" in the CPU numbering on POWER7 ?
2010-05-12T16:07:47Z
So let's be sure everything is correct.
Confirm POWER7 mode
# cat /proc/cpuinfo | grep -m 1 cpu
cpu : POWER7 (architected), altivec supported
Confirm correct ppc64_cpu
# ppc64_cpu | grep X
ppc64_cpu --smt=X # Set SMT state to X
Confirm correct SMT mode
# ppc64_cpu --smt
SMT=2
I have a two core system in SMT=2 mode. I confirm the CPU numbering:
# cat /proc/cpuinfo | grep processor
processor : 0
processor : 1
processor : 4
processor : 5
I have stream installed. I have the beta IBM compilers installed for POWER7.
# export PATH=$PATH:/opt/ibmcmp/vac/11.1/bin
# xlc -O5 -qsmp=omp -qthreaded stream.c -o stream
# XLSMPOPTS=PROCS=0,1,4,5 ./stream
I had modified stream to repeat itself quite a bit, so I could watch "top" from another window:
top - 16:01:35 up 22:35, 2 users, load average: 0.06, 0.06, 0.07
Tasks: 97 total, 1 running, 96 sleeping, 0 stopped, 0 zombie
Cpu0 : 46.4%us, 0.4%sy, 0.0%ni, 2.8%id, 0.0%wa, 0.0%hi, 0.0%si, 50.4%st
Cpu1 : 46.5%us, 0.4%sy, 0.0%ni, 3.1%id, 0.0%wa, 0.1%hi, 0.0%si, 49.9%st
Cpu4 : 45.5%us, 0.5%sy, 0.0%ni, 4.5%id, 0.1%wa, 0.1%hi, 0.0%si, 49.3%st
Cpu5 : 45.6%us, 0.5%sy, 0.0%ni, 4.5%id, 0.0%wa, 0.0%hi, 0.0%si, 49.5%st
Mem: 10170M total, 3493M used, 6676M free, 325M buffers
I confirmed similar steps with XLF (Fortran).
Please be sure all of the setup steps are correct.
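The explicit CPU list used above does not have to be typed by hand. As a rough sketch, it can be derived from the "processor" lines in /proc/cpuinfo shown earlier. The function name online_cpu_list is hypothetical, and the sketch assumes that every processor listed in the file is online and usable.

```shell
# Hypothetical helper: print the CPU numbers found in a cpuinfo-style
# file as a comma-separated list. Defaults to /proc/cpuinfo.
online_cpu_list() {
    awk -F: '/^processor/ {
        gsub(/[ \t]/, "", $2)     # strip whitespace around the number
        list = list sep $2        # append with a comma after the first entry
        sep = ","
    }
    END { print list }' "${1:-/proc/cpuinfo}"
}
```

It could then be used as XLSMPOPTS=PROCS=$(online_cpu_list) ./stream, which on the two-core SMT=2 system above should expand to PROCS=0,1,4,5.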
Re: How do I handle the "holes" in the CPU numbering on POWER7 ?
2010-05-12T18:25:54Z
- SystemAdmin
- 2010-05-12T16:07:47Z
# ppc64_cpu --smt=off
# ppc64_cpu --smt
SMT is off
Checking CPU numbering
# cat /proc/cpuinfo | grep proc
processor : 0
processor : 4
Running stream again
# XLSMPOPTS=PROCS=0,4 ./stream
Watching from "top"
top - 18:23:16 up 1 day, 57 min, 2 users, load average: 0.16, 0.05, 0.01
Tasks: 77 total, 1 running, 76 sleeping, 0 stopped, 0 zombie
Cpu0 : 59.9%us, 0.5%sy, 0.0%ni, 3.1%id, 0.0%wa, 0.1%hi, 0.0%si, 36.4%st
Cpu4 : 59.1%us, 0.6%sy, 0.0%ni, 4.2%id, 0.0%wa, 0.2%hi, 0.0%si, 35.9%st
Alternatively, in this mode, you can specify stride=4 to run on the CPUs that remain enabled with SMT off.
# XLSMPOPTS=startproc=0:stride=4 ./stream
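Both forms seen in this thread (startproc/stride and an explicit procs list) can be chosen mechanically from the CPU numbering. The following is only a sketch: xlsmpopts_for is a made-up helper name, and it assumes the option spellings used in the posts above. It prints a startproc/stride setting when the CPU numbers are evenly spaced and falls back to an explicit list otherwise.

```shell
# Hypothetical helper: given a comma-separated CPU list, print a
# startproc/stride setting if the numbers are evenly spaced,
# otherwise print an explicit procs list.
xlsmpopts_for() {
    printf '%s\n' "$1" | awk -F, '{
        stride = (NF > 1) ? $2 - $1 : 1
        uniform = 1
        for (i = 2; i <= NF; i++) {
            if ($i - $(i - 1) != stride) { uniform = 0 }
        }
        if (uniform) {
            printf "startproc=%d:stride=%d\n", $1, stride
        } else {
            printf "procs=%s\n", $0
        }
    }'
}

xlsmpopts_for 0,4        # SMT off on the two-core system
xlsmpopts_for 0,1,4,5    # SMT=2 on the two-core system
```

For the two cases tested in this thread, the first call prints startproc=0:stride=4 and the second prints procs=0,1,4,5, matching the settings used with ./stream above.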