dcbt (Data Cache Block Touch) instruction

Purpose

Allows a program to request a cache block fetch before it is actually needed by the program.

Note: The dcbt instruction is supported on POWER5™ and later architectures.

Syntax

Bits    Value
0-5     31
6-10    TH
11-15   RA
16-20   RB
21-30   278
31      /

POWER5™
dcbt    RA, RB, TH
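
The field layout in the bit table can be sketched in C as a small packing routine. This is a minimal illustration, not part of the assembler: the function name dcbt_encode is hypothetical, and it assumes the usual PowerPC convention that bit 0 is the most significant bit of the 32-bit instruction word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the dcbt fields into one 32-bit word.
 * Bit 0 is the most significant bit, so a field that ends at bit N
 * is shifted left by (31 - N). */
static uint32_t dcbt_encode(unsigned th, unsigned ra, unsigned rb)
{
    assert(th < 32 && ra < 32 && rb < 32);  /* each field is 5 bits wide */
    return (UINT32_C(31) << 26)             /* bits 0-5:   primary opcode 31 */
         | ((uint32_t)th << 21)             /* bits 6-10:  TH */
         | ((uint32_t)ra << 16)             /* bits 11-15: RA */
         | ((uint32_t)rb << 11)             /* bits 16-20: RB */
         | (UINT32_C(278) << 1);            /* bits 21-30: extended opcode 278;
                                               bit 31 is left as 0 */
}
```

Under these assumptions, dcbt_encode(0, 0, 4) corresponds to the two-operand form "dcbt 0,4" and yields 0x7C00222C.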

Description

The dcbt instruction may improve performance by anticipating a load from the addressed byte. The block containing the byte addressed by the effective address (EA) is fetched into the data cache before the block is needed by the program. The program can later perform loads from the block and may not experience the added delay caused by fetching the block into the cache. Executing the dcbt instruction does not invoke the system error handler.

If general-purpose register (GPR) RA is not 0, the effective address (EA) is the sum of the content of GPR RA and the content of GPR RB. Otherwise, the EA is the content of GPR RB.
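
The EA rule above can be expressed as a short C sketch. The function name and the gpr array are hypothetical stand-ins for the register file; the only point illustrated is that an RA field of 0 contributes the value 0 rather than the content of GPR 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the EA computation for dcbt.
 * gpr[] stands in for the 32 general-purpose registers. */
static uint64_t dcbt_effective_address(const uint64_t gpr[32],
                                       unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    /* RA = 0 means "no base register", not "content of GPR 0". */
    uint64_t base = (ra != 0) ? gpr[ra] : 0;
    return base + gpr[rb];
}
```

For example, with RA = 0 and RB = 4 the result is simply the content of GPR 4, whatever GPR 0 holds.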

Consider the following when using the dcbt instruction:

  • If the EA specifies a direct store segment address, the instruction is treated as a no-op.
  • The access is treated as a load from the addressed cache block with respect to protection. If protection does not permit access to the addressed byte, the dcbt instruction performs no operations.
    Note: If a program needs to store to the data cache block, use the dcbtst (Data Cache Block Touch for Store) instruction.

The Touch Hint (TH) field is used to provide a hint that the program will probably load soon from the storage locations specified by the EA and the TH field. The hint is ignored for locations that are caching-inhibited or guarded. The encodings of the TH field depend on the target architecture selected with the -m flag or the .machine pseudo-op. The encodings of the TH field on POWER5™ and subsequent architectures are as follows:
TH Values Description
0000 The program will probably soon load from the byte addressed by EA.
0001 The program will probably soon load from the data stream consisting of the block containing the byte addressed by EA and an unlimited number of sequentially following blocks. The sequentially following blocks are the bytes addressed by EA + n * block_size, where n = 0, 1, 2, and so on.
0011 The program will probably soon load from the data stream consisting of the block containing the byte addressed by EA and an unlimited number of sequentially preceding blocks. The sequentially preceding blocks are the bytes addressed by EA - n * block_size where n = 0, 1, 2, and so on.
1000 The dcbt instruction provides a hint that describes certain attributes of a data stream, and optionally indicates that the program will probably soon load from the stream. The EA is interpreted as described in Table 1.
1010 The dcbt instruction provides a hint that describes certain attributes of a data stream, or indicates that the program will probably soon load from data streams that have been described using dcbt instructions in which TH[0] = 1 or probably no longer load from such data streams. The EA is interpreted as described in Table 2.

The dcbt instruction serves as both a basic and an extended mnemonic. The dcbt mnemonic with three operands is the basic form, and the dcbt mnemonic with two operands is the extended form. In the extended form, the TH field is omitted and assumed to be 0b0000.

Table 1. EA Encoding when TH=0b1000
Bit(s)  Name      Description
0-56    EA_TRUNC  High-order 57 bits of the effective address of the first unit of the data stream.
57      D         Direction
                  0  Subsequent units are the sequentially following units.
                  1  Subsequent units are the sequentially preceding units.
58      UG        0  No information is provided by the UG field.
                  1  The number of units in the data stream is unlimited, the program's need for each block of the stream is not likely to be transient, and the program will probably soon load from the stream.
59      Reserved  Reserved
60-63   ID        Stream ID to use for this stream.
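
As a sketch of how a program might build the EA value described in Table 1, the following C routine places each field at its bit position. The function name is hypothetical, and the code assumes a 64-bit EA in which bit 0 is the most significant bit, so bit N sits at shift (63 - N).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds the EA for dcbt with TH = 0b1000.
 * first_unit  address of the first unit of the stream (low 7 bits dropped)
 * descending  nonzero sets D = 1 (sequentially preceding units)
 * ug          nonzero sets UG = 1
 * id          stream ID, 0 to 15 */
static uint64_t dcbt_stream_ea(uint64_t first_unit, int descending,
                               int ug, unsigned id)
{
    assert(id < 16);
    return (first_unit & ~UINT64_C(0x7F))        /* bits 0-56:  EA_TRUNC */
         | ((uint64_t)(descending != 0) << 6)    /* bit 57:     D */
         | ((uint64_t)(ug != 0) << 5)            /* bit 58:     UG */
         | (uint64_t)id;                         /* bits 60-63: ID;
                                                    bit 59 is left as 0 */
}
```

The resulting value would be placed in the registers that form the EA before issuing dcbt with TH set to 0b1000.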

Table 2. EA Encoding when TH=0b1010
Bit(s)  Name      Description
0-31    Reserved  Reserved
32      GO        0  No information is provided by the GO field.
                  1  The program will probably soon load from all nascent data streams that have been completely described, and will probably no longer load from all other data streams.
33-34   S         Stop
                  00  No information is provided by the S field.
                  01  Reserved
                  10  The program will probably no longer load from the stream associated with the Stream ID (all other fields of the EA are ignored except for the ID field).
                  11  The program will probably no longer load from the data streams associated with all stream IDs (all other fields of the EA are ignored).
35-46   Reserved  Reserved
47-56   UNIT_CNT  Number of units in the data stream.
57      T         0  No information is provided by the T field.
                  1  The program's need for each block of the data stream is likely to be transient (that is, the time interval during which the program accesses the block is likely to be short).
58      U         0  No information is provided by the U field.
                  1  The number of units in the data stream is unlimited (and the UNIT_CNT field is ignored).
59      Reserved  Reserved
60-63   ID        Stream ID to use for this stream.
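
The Table 2 layout can be sketched the same way. The function and enum names below are hypothetical, and the code again assumes a 64-bit EA with bit 0 as the most significant bit; reserved fields are left as 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the defined S (Stop) encodings. */
enum { DCBT_S_NONE = 0, DCBT_S_STOP_ONE = 2, DCBT_S_STOP_ALL = 3 };

/* Hypothetical helper: builds the EA for dcbt with TH = 0b1010. */
static uint64_t dcbt_control_ea(int go, unsigned stop, unsigned unit_cnt,
                                int transient, int unlimited, unsigned id)
{
    assert(stop < 4 && stop != 1);   /* S = 0b01 is reserved */
    assert(unit_cnt < 1024);         /* UNIT_CNT is 10 bits wide */
    assert(id < 16);                 /* ID is 4 bits wide */
    return ((uint64_t)(go != 0) << 31)         /* bit 32:     GO */
         | ((uint64_t)stop << 29)              /* bits 33-34: S */
         | ((uint64_t)unit_cnt << 7)           /* bits 47-56: UNIT_CNT */
         | ((uint64_t)(transient != 0) << 6)   /* bit 57:     T */
         | ((uint64_t)(unlimited != 0) << 5)   /* bit 58:     U */
         | (uint64_t)id;                       /* bits 60-63: ID */
}
```

For instance, dcbt_control_ea(0, DCBT_S_STOP_ONE, 0, 0, 0, 3) produces an EA that asks the processor to stop the stream with ID 3, and dcbt_control_ea(1, DCBT_S_NONE, 0, 0, 0, 0) sets only the GO bit.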

The dcbt instruction has one syntax form and does not affect the Condition Register field 0 or the Fixed-Point Exception register.

Parameters

Item Description
RA Specifies source general-purpose register for EA computation.
RB Specifies source general-purpose register for EA computation.
TH Indicates when a sequence of data cache blocks might be needed.

Examples

The following code sums the content of a one-dimensional vector:


# Assume that GPR 4 contains the address of the first element
# to be summed.
# Assume 49 elements are to be summed.
# Assume the data cache block size is 32 bytes.
# Assume the elements are word aligned and their addresses
# are multiples of 4.
        dcbt    0,4              # Issue hint to fetch first
                                 # cache block.
        addi    5,4,32           # Compute address of second
                                 # cache block.
        addi    8,0,6            # Set outer loop count.
        addi    7,0,8            # Set inner loop counter.
        dcbt    0,5              # Issue hint to fetch second
                                 # cache block.
        lwz     3,0(4)           # Set sum = element number 1.
bigloop:
        addic.  8,8,-1           # Decrement outer loop count
                                 # and set CR field 0.
        mtspr   CTR,7            # Set counter (CTR) for
                                 # inner loop.
        addi    5,5,32           # Compute address for next
                                 # touch.
lttlloop:
        lwzu    6,4(4)           # Fetch element.
        add     3,3,6            # Add to sum.
        bc      16,0,lttlloop    # Decrement CTR and branch
                                 # if result is not equal to 0.
        dcbt    0,5              # Issue hint to fetch next
                                 # cache block.
        bc      4,2,bigloop      # Branch if outer loop count is
                                 # not equal to 0.
        end                      # Summation complete.
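
For comparison, the same touch-ahead pattern can be sketched in C. This is an illustration only: it relies on the GCC and Clang builtin __builtin_prefetch, and whether the compiler actually emits dcbt for it depends on the toolchain and target, which is an assumption here. The function name sum_with_touch and the BLOCK_WORDS constant are hypothetical; BLOCK_WORDS reflects the 32-byte block and 4-byte word assumed in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS 8   /* assumed: 32-byte cache block / 4-byte word */

/* Hypothetical helper: sums n words, hinting one block ahead each time
 * the loop enters a new group of BLOCK_WORDS elements. */
static int32_t sum_with_touch(const int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Hint only for addresses that still lie inside the array. */
        if (i % BLOCK_WORDS == 0 && i + BLOCK_WORDS < n)
            __builtin_prefetch(&v[i + BLOCK_WORDS], 0 /* read */);
        sum += v[i];
    }
    return sum;
}
```

As with dcbt itself, the prefetch is only a hint: the function returns the same sum whether or not the hint has any effect.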