None.
Inserts prefetch instructions automatically where there are opportunities to improve code performance.
When -qprefetch is in effect, the compiler may insert prefetch instructions in compiled code. When -qnoprefetch is in effect, prefetch instructions are not inserted in compiled code.
.-:-------------------------------. .-prefetch---. V | >>- -q--+-noprefetch-+----+-----------------------------+-+---->< +-=--assistthread--=--+-SMT-+-+ | '-CMP-' | +-=--noassistthread-----------+ | .-noaggressive-. | '-=--+-aggressive---+---------'
The -qnoprefetch option does not prevent built-in functions such as __prefetch_by_stream from generating prefetch instructions.
When you run -qprefetch=assistthread, the compiler uses the delinquent load information to perform analysis and generates prefetching assist threads. The delinquent load information can either be provided through the built-in __mem_delay function (const void *delinquent_load_address, const unsigned int delay_cycles), or gathered from dynamic profiling using -qpdf1=level=2.
Here is how you generate code using assist threads with __MEM_DELAY:
int y[64], x[1089], w[1024];
void foo(void){
int i, j;
for (i = 0; i &l; 64; i++) {
for (j = 0; j < 1024; j++) {
/* what to prefetch? y[i]; inserted by the user */
__mem_delay(&y[i], 10);
y[i] = y[i] + x[i + j] * w[j];
x[i + j + 1] = y[i] * 2;
}
}
}
void foo@clone(unsigned thread_id, unsigned version)
{ if (!1) goto lab_1;
/* version control to synchronize assist and main thread */
if (version == @2version0) goto lab_5;
goto lab_1;
lab_5:
@CIV1 = 0;
do { /* id=1 guarded */ /* ~2 */
if (!1) goto lab_3;
@CIV0 = 0;
do { /* id=2 guarded */ /* ~4 */
/* region = 0 */
/* __dcbt call generated to prefetch y[i] access */
__dcbt(((char *)&y + (4)*(@CIV1)))
@CIV0 = @CIV0 + 1;
} while ((unsigned) @CIV0 < 1024u); /* ~4 */
lab_3:
@CIV1 = @CIV1 + 1;
} while ((unsigned) @CIV1 < 64u); /* ~2 */
lab_1:
return;
}