I've been running this patch on my QS22 cluster (BladeCenter H with 12 QS22 blades, each with 2 PowerXCell 8i processors).
Problem: only half of the SPEs are running the xhpl process. I can see this in /spu, in spu-top, and by other means. I've confirmed that other programs (such as the 'mandelbrot' demo) run on all 16 SPEs. But xhpl only runs on the first 8, regardless of the number of MPI processes, numactl settings, and the other things I tried.
In the hpl/accel/lib directory, there is a variable in hpl_accel_global.h that sounds like it should help: HPL_ACCEL_CMD_ENTRIES. I set this to 16 (from 8), and did the same for HPL_ACCEL_SPES in hpl_accel_spu.h. This increases the number of threads (tasks?) on the SPEs, but they still run only on the 8 SPEs of the first Cell BE, and on none of the second Cell's SPEs.
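For anyone trying to reproduce this, here is a minimal sketch of the kind of per-rank numactl binding I mean. It assumes two MPI ranks per blade and that NUMA nodes 0 and 1 correspond to the two PowerXCell 8i processors (an assumption; 'numactl --hardware' shows the real layout). The helper name bound_command and the ./xhpl path are hypothetical, and this is not necessarily the invocation the README prescribes.

```python
def bound_command(local_rank, program="./xhpl", numa_nodes=2):
    """Build a numactl command line that pins one MPI rank to one NUMA node.

    Assumption: NUMA node i is Cell processor i on the blade, so alternating
    ranks across nodes should give each Cell (and its 8 SPEs) one rank.
    """
    node = local_rank % numa_nodes
    return [
        "numactl",
        f"--cpunodebind={node}",  # run the rank's PPE threads on this node
        f"--membind={node}",      # allocate its memory from the same node
        program,                  # hypothetical path to the patched binary
    ]


# Print the command lines for the two ranks on one blade.
for rank in range(2):
    print(" ".join(bound_command(rank)))
```

How the local rank reaches the wrapper depends on the MPI implementation, so that part is left out here.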
I've followed all the directions in the README supplied by the patch (referenced at the netlib site), recompiled, etc. I have the 3.0.5 SDK, not yet 3.1.
When I ran with just 8 tasks per node, HPL never finished (I let it run for over 16 hours before killing it and experimenting with other options). Maybe this is a symptom, or maybe it is unrelated.
Has anyone else encountered this? There are relatively few QS22 systems out there, and relatively few people trying to run Linpack, and maybe not everyone looks at whether all SPEs are busy. But I did, and am perplexed!
I did try setting the environment variable BLAS_NUMSPES=16, but this seemed to have no impact. Again, with other applications (not the patched HPL), I'm able to run all 16 SPEs successfully.
TIA. gbn
PS: here's what spu-top looks like. There are 16 tasks, but they are all running on only 8 SPEs:
spu-top: SPU View
Cpu(s) load avg: 0.22, 0.05, 0.06
Spu(s) load avg: 16.07, 16.67, 16.85
Cpu(s): 24.6%us, 1.0%sys, 0.0%wait, 0.0%nice, 74.4%idle
Spu(s): 49.7%us, 0.1%sys, 0.0%wait, 50.2%idle
SPE  %SPU %USR %SYS %WAI S   SLB HFLT mFLT MFLT IRQ2 PPE_LIB
0   100.0 99.8  0.2  0.0 U 54010  580  578    0 2812       0
1   100.0 99.8  0.2  0.0 U 55130  586  583    0 3072       0
2   100.0 99.8  0.2  0.0 U 53983  477  475    0 2802       0
3   100.0 99.8  0.2  0.0 U 56609  671  669    0 3275       0
4   100.0 99.8  0.2  0.0 U 54455  714  709    0 2972       0
5   100.0 99.8  0.2  0.0 U 58496  872  865    0 3652       0
6   100.0 99.8  0.2  0.0 U 54710  820  815    0 2980       0
7   100.0 99.8  0.2  0.0 U 63905 2179 2175    0 4938       0
8     0.0  0.0  0.0  0.0 I     0    0    0    0    0       0
9     0.0  0.0  0.0  0.0 I     0    0    0    0    0       0
10    0.0  0.0  0.0  0.0 I     0    0    0    0    0       0
11    0.0  0.0  0.0  0.0 I     0    0    0    0    0       0
12    0.0  0.0  0.0  0.0 I     0    0    0    0    0       0
13    0.0  0.0  0.0  0.0 I     0    0    0    0    0       0
14    0.0  0.0  0.0  0.0 I     0    0    0    0    0       0
15    0.0  0.0  0.0  0.0 I     0    0    0    0    0       0
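
To make the imbalance easy to check on other blades, here is a small sketch that parses the per-SPE rows of the spu-top output above and counts busy SPEs per Cell. It assumes SPEs 0-7 belong to the first Cell and 8-15 to the second (my reading of the numbering, not something spu-top states). The function name and the SPES_PER_CELL constant are made up for illustration.

```python
# Per-SPE rows copied from the spu-top output above.
# Columns: SPE %SPU %USR %SYS %WAI S SLB HFLT mFLT MFLT IRQ2 PPE_LIB
SPU_TOP_ROWS = """\
0 100.0 99.8 0.2 0.0 U 54010 580 578 0 2812 0
1 100.0 99.8 0.2 0.0 U 55130 586 583 0 3072 0
2 100.0 99.8 0.2 0.0 U 53983 477 475 0 2802 0
3 100.0 99.8 0.2 0.0 U 56609 671 669 0 3275 0
4 100.0 99.8 0.2 0.0 U 54455 714 709 0 2972 0
5 100.0 99.8 0.2 0.0 U 58496 872 865 0 3652 0
6 100.0 99.8 0.2 0.0 U 54710 820 815 0 2980 0
7 100.0 99.8 0.2 0.0 U 63905 2179 2175 0 4938 0
8 0.0 0.0 0.0 0.0 I 0 0 0 0 0 0
9 0.0 0.0 0.0 0.0 I 0 0 0 0 0 0
10 0.0 0.0 0.0 0.0 I 0 0 0 0 0 0
11 0.0 0.0 0.0 0.0 I 0 0 0 0 0 0
12 0.0 0.0 0.0 0.0 I 0 0 0 0 0 0
13 0.0 0.0 0.0 0.0 I 0 0 0 0 0 0
14 0.0 0.0 0.0 0.0 I 0 0 0 0 0 0
15 0.0 0.0 0.0 0.0 I 0 0 0 0 0 0
"""

SPES_PER_CELL = 8  # assumption: SPE index // 8 identifies the Cell processor


def busy_spes_per_cell(text, spes_per_cell=SPES_PER_CELL):
    """Return {cell_index: [SPE numbers whose %SPU is above zero]}."""
    busy = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, summary lines, and blanks: data rows start with an integer.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        spe, pct_spu = int(fields[0]), float(fields[1])
        cell = spe // spes_per_cell
        busy.setdefault(cell, [])
        if pct_spu > 0.0:
            busy[cell].append(spe)
    return busy


result = busy_spes_per_cell(SPU_TOP_ROWS)
for cell, spes in sorted(result.items()):
    print(f"Cell {cell}: {len(spes)} busy SPEs {spes}")
```

On the rows above this reports 8 busy SPEs on Cell 0 and none on Cell 1. 8 busy out of 16 is 50%, which agrees with the 49.7%us in the Spu(s) summary line.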