fdpr Command
Item | Description |
---|---|
-analyse_asm_csects | Analyze csects written in assembly (when used, must be specified at both the -1 and -3 phases). |
-extra_safe_analysis | Do not attempt to analyze unconventional csects containing hand-written assembly code (when used, must be specified at both the -1 and -3 phases). |
-ignore_info | Ignore .info sections produced by the -qfdpr compiler option at compile time (when used, must be specified at both the -1 and -3 phases). |
-align bytes | Align frequently executed code to the given number of bytes to improve the code prefetch buffer ratio. If this option is omitted, the fdpr command aligns the code using a variable default number of bytes. |
-lr_opt | Eliminate stores and restores of the link register in frequently executed procedures. |
-bt_csect_anchor_removal | Eliminate load instructions related to the usage of branch tables in the code. |
-dead_code_removal | Remove unreachable code. |
-selective_inline | Perform selective inlining for functions that are frequently called from a single dominant call site. |
-sid_fac percent | Set the dominant factor percentage for selective inline optimization. The allowed range is 50 to 100 (applicable only with the -selective_inline flag). |
-inline_small_funcs size | Inline all functions whose size is smaller than or equal to the given size in bytes. |
-inline_hot_funcs percent | Inline all functions with an execution frequency equal to or greater than the given percentage. The allowed range is 0 to 100. |
-inline | Equivalent to -inline_small_funcs 12 combined with -selective_inline. |
-hco_resched | Relocate instructions from frequently executed code to rarely executed code areas, when possible. |
-dcbt_opt | Insert dcbt instructions to improve data-cache performance. |
-killed_regs | Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls. |
-tb | Force the restructuring of traceback tables in reordered code. If the -tb option is omitted, traceback tables are automatically restored for C++ applications that use the try/catch mechanism. |
-pc | Preserve csects' boundaries in reordered code. |
-pp | Preserve functions' boundaries in reordered code. |
-RD | Perform static data reordering. |
-dpnf factor | Set the Data Placement Normalization Factor, between 0 and 1, where 0 causes static variables to be reordered regardless of their size, and 1 places only small variables first (applicable only with the -RD flag). |
-dpht threshold | Set the Data Placement Hotness Threshold, between 0 and 1, where 0 reorders static variables in large groups based on the control flow, and 1 reorders variables in very small groups based on their access frequency (applicable only with the -RD flag). |
-build_dcg | Build DCG (Data Connectivity Graph) for enhanced data reordering (applicable only with the -RD flag). |
-tocload | Perform tocload optimization. |
-reduce_toc removal_factor | Remove TOC entries according to the removal factor, between 0 and 1, where 0 removes only non-accessed TOC entries and 1 removes all non-exported TOC entries. |
-strip | Strip the output file (if any is produced). |
-ptrgl_opt | Optimize indirect call instructions made through registers by replacing them with direct jumps. |
-no_ptrgl_r11 | Do not remove the R11 load instruction in the _ptrgl csect (this -ptrgl_r11 optimization is applied by default). |
-O | Perform code reordering with branch prediction bit setting, branch folding, and NOOP instruction removal. The -O flag is applied by default. |
-O2 | Switch on all less aggressive optimization flags. |
-O3 | Switch on all aggressive optimization flags. |
-O4 | Switch on all aggressive optimization flags. |
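Several flags in the table above take a numeric value with a stated range. Since a full fdpr run includes a profiling workload, it can be worth checking those values first. The following POSIX sh sketch is hypothetical: in_range and check_fdpr_flags are illustrative helpers, not part of fdpr, and they check only the ranges stated above (-sid_fac 50 to 100, -inline_hot_funcs 0 to 100, and -dpnf, -dpht, -reduce_toc 0 to 1). They do not check flag dependencies such as -RD.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the numeric fdpr optimization flags.
# Ranges are copied from the flag table; fdpr itself is never invoked.

# in_range VALUE LOW HIGH: succeed if VALUE is a plain non-negative
# decimal number with LOW <= VALUE <= HIGH.
in_range() {
    case $1 in
        '' | . | *.*.* | *[!0-9.]*) return 1 ;;   # not a plain decimal
    esac
    awk -v v="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# check_fdpr_flags ARG...: report the first out-of-range value, if any.
check_fdpr_flags() {
    while [ $# -gt 0 ]; do
        case $1 in
            -sid_fac)                    lo=50; hi=100 ;;
            -inline_hot_funcs)           lo=0;  hi=100 ;;
            -dpnf | -dpht | -reduce_toc) lo=0;  hi=1 ;;
            -x) return 0 ;;              # the rest is the workload command
            *)  shift; continue ;;       # flags without a checked value
        esac
        if [ $# -lt 2 ] || ! in_range "$2" "$lo" "$hi"; then
            echo "check_fdpr_flags: $1 expects a value from $lo to $hi, got '${2-}'" >&2
            return 1
        fi
        shift 2
    done
    return 0
}
```

For example, check_fdpr_flags -RD -dpnf 0.5 -p test1 -x test2 succeeds, while the same call with -dpnf 1.5 prints a message and returns a non-zero status.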
Purpose
A performance tuning utility for improving execution time and real memory utilization of user-level post-link application programs.
Syntax
Most Common Usage:
fdpr -p ProgramFile -x WorkloadCommand
Detailed Usage:
fdpr -p ProgramFile [ -M SegNum ] [ -fd Fdesc ] [ -o OutputFile ] [ -armember ArchiveMemberList ] [ OptimizationFlags ] [ -map ] [ -disasm ] [ -disasm_data ] [ -disasm_bss ] [ -profcount ] [ -quiet ] [ -v ] [ -1 | -2 | -3 | -12 | -23 | -123 ] [ -x WorkloadCommand ]
Optimization Flags
[ -tb ] [ -pc ] [ -pp ] [ -O ] [ -O2 ] [ -O3 ] [ -O4 ] [ -selective_inline ] [ -sid_fac percent ] [ -inline_small_funcs size ] [ -inline_hot_funcs percent ] [ -hco_resched ] [ -killed_regs ] [ -lr_opt ] [ -align bytes ] [ -RD ] [ -dpnf factor ] [ -dpht threshold ] [ -build_dcg ] [ -tocload ] [ -ptrgl_opt ] [ -no_ptrgl_r11 ] [ -dcbt_opt ] [ -ignore_info ] [ -dead_code_removal ] [ -bt_csect_anchor_removal ] [ -strip ] [ -analyse_asm_csects ] [ -extra_safe_analysis ] [ -inline ] [ -reduce_toc removal_factor ]
Description
The fdpr command (Feedback Directed Program Restructuring) is a performance-tuning utility that may help improve the execution time and real memory utilization of user-level application programs. The fdpr program optimizes the executable image of a program by collecting information on the program's behavior while it runs a typical workload, and then creating a new version of the program that is optimized for that workload. The new program generated by fdpr typically runs faster and uses less real memory.
The fdpr command builds an optimized executable program in three distinct phases:
- Phase 1 (-1 flag): Creates an instrumented executable program and an empty template profile file.
- Phase 2 (-2 flag): Runs the instrumented program and updates the profile data.
- Phase 3 (-3 flag): Generates the optimized executable program file.
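The three phases can be run one at a time, as in the phase-by-phase invocation under Examples. The following POSIX sh sketch wraps that sequence and stops at the first failing phase. The function name fdpr_three_phase and the FDPR override variable are hypothetical conveniences, not part of fdpr; setting FDPR=echo prints the command lines instead of running them.

```shell
#!/bin/sh
# Hypothetical wrapper: run fdpr's three phases one at a time.
# FDPR names the command to run; it defaults to fdpr, and setting
# FDPR=echo prints the command lines instead of running them.
: "${FDPR:=fdpr}"

# fdpr_three_phase PROGRAM WORKLOAD [WORKLOAD_ARGS...]
fdpr_three_phase() {
    prog=$1
    shift
    # Phase 1: create PROGRAM.instr and the empty profile PROGRAM.nprof.
    $FDPR -1 -p "$prog" || return
    # Phase 2: run the workload against the instrumented program.
    # -x goes last: every argument after it belongs to the workload command.
    $FDPR -2 -p "$prog" -x "$@" || return
    # Phase 3: generate the optimized executable (PROGRAM.fdpr by default).
    $FDPR -3 -p "$prog"
}
```

With the program and script from the Examples section, fdpr_three_phase test1 test2 issues the same three commands as running fdpr -1, fdpr -2 and fdpr -3 by hand.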
Flags
Item | Description |
---|---|
-1, -2, -3 | Specifies the phase to run. The default is all three phases (-123). The -s flag must be used when running separate phases so that the succeeding phases can access the required intermediate files. The phases must be run in order (for example, -1, then -2, then -3, or -1, then -23). The -2 flag must be used together with the -x invocation flag. |
-M SegNum | Specifies where to map shared memory for profiling. The default is 0x30000000. Specify an alternate shared-memory address if the program to be optimized, or any of the workload command strings invoked with the -x flag, uses conflicting shared-memory addresses. Typical alternative values are 0x40000000, 0x50000000, and so on, up to 0xC0000000. |
-fd Fdesc | Specifies the file descriptor number to be used for the profile file that is mapped to the shared memory area specified by -M. The default is 1999. |
-o OutputFile | Specifies the name of the output file from the optimizer. The default is program.fdpr. |
-p ProgramFile | Specifies the name of the executable program file, shared object file, or shared library (containing shared objects or executables) to optimize. The program must be unstripped. |
-armember ArchiveMemberList | Specifies the list of archive members to optimize within the shared archive file specified by the -p flag. If -armember is not specified, all members of the archive file are optimized. |
-map | Prints a map of basic blocks and static variables, with their respective old -> new addresses, into a file with the .mapper suffix. |
-disasm | Prints the disassembled text section of the instrumented or optimized output program into a file with the .dis_text suffix. |
-disasm_data | Prints the disassembled data section of the instrumented or optimized output program into a file with the .dis_data suffix. |
-disasm_bss | Prints the disassembled bss section of the instrumented or optimized output program into a file with the .dis_bss suffix. |
-profcount | Prints the profiling counters into a suffixed .ncounts file. |
-quiet | Quiet output mode. |
-v | Verbose output. |
-x WorkloadCommand | Specifies the command used for invoking the instrumented program. All the arguments after the -x flag are used for the invocation. Therefore, the -x flag must appear last in the command line. The -x flag is required when the -2 flag is used. |
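The alternative -M addresses listed above advance in fixed steps of 0x10000000, so they can be generated rather than typed, for example when trying addresses in turn until one does not conflict with the program or workload. A minimal POSIX sh sketch; fdpr_seg_candidates is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the alternative -M shared-memory addresses,
# 0x40000000 through 0xC0000000 in steps of 0x10000000 (the default,
# 0x30000000, is not included).
fdpr_seg_candidates() {
    addr=$((0x40000000))
    while [ "$addr" -le $((0xC0000000)) ]; do
        printf '0x%08X\n' "$addr"
        addr=$((addr + 0x10000000))
    done
}
```

A chosen address is then passed on the command line, for example fdpr -M 0x40000000 -p test1 -x test2.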
Optimization Flags
Optimization
By default, the fdpr command performs the highest possible level of code reordering, together with branch prediction bit setting, branch folding, code alignment, and removal of redundant NOOP instructions. The -pc flag reorders the entire code while preserving csect boundaries and, therefore, may yield less performance improvement than the default code reordering. Similarly, the -pp flag reorders the entire code while preserving procedure boundaries.
Additional optimizations on the entire executable program file are available through the optimization flags listed above.
Executables built with the -qfdpr IBM® XL compiler flag contain information that assists fdpr in producing reordered programs. Modules that are not compiled with the -qfdpr option are reordered based on the compiler signatures in the symbol table.
Additional performance enhancements may be realized by using static linking when building the program to be reordered. Since the fdpr program only reorders the instructions within the executable program specified, any dynamically linked shared library routines called by the program are not optimized. Statically linking these library routines to the executable allows for optimizing both the instructions in the program and all library routines used by the program. There are other advantages as well as disadvantages to building a statically linked program.
Output Files
All files created by the fdpr command are stored in the current directory, except for any files created by the workload command specified with the -x flag. During the optimization process, the original program is saved by renaming it, and it is restored to its original name only upon successful completion of the final phase.
The profile file created by the fdpr command explicitly uses the full name of the current directory since scripts used to run the program may change the working directory before executing the program.
The files created and/or used by the fdpr command are:
Item | Description |
---|---|
program | Name of the unstripped executable to be optimized. |
program.save | Saved version of the original executable program. |
program.nprof | Name of the profile file. |
program.instr | Name of the instrumented version of program. |
program.fdpr | Default name of optimized executable output file. |
program.instr.dis_text | Default disassembly file in ASCII format produced by -disasm flag after instrumentation phase. |
program.fdpr.dis_text | Default disassembly file in ASCII format produced by -disasm flag after optimization phase. |
program.instr.dis_data | Default disassembly file in ASCII format produced by -disasm_data flag after instrumentation phase. |
program.fdpr.dis_data | Default disassembly file in ASCII format produced by -disasm_data flag after optimization phase. |
program.instr.dis_bss | Default disassembly file in ASCII format produced by -disasm_bss flag after instrumentation phase. |
program.fdpr.dis_bss | Default disassembly file in ASCII format produced by -disasm_bss flag after optimization phase. |
program.instr.mapper | Default mapping file in ASCII format produced by -map flag after instrumentation phase. |
program.fdpr.mapper | Default mapping file in ASCII format produced by -map flag after optimization phase. |
program.ncounts | Default profile counters file in ASCII format produced by -profcount flag. |
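The file names in the table above follow a regular pattern built from the program name. This POSIX sh sketch, using the hypothetical helper fdpr_files, lists the names the table associates with a program and with the given reporting flags. It assumes the default output name (no -o flag) and does not check which files actually exist.

```shell
#!/bin/sh
# Hypothetical helper: print the file names associated with PROGRAM
# (default output name assumed) and with the given reporting flags.
# fdpr_files PROGRAM [FLAG...]
fdpr_files() {
    prog=$1
    shift
    # Original, saved copy, profile, instrumented and optimized programs.
    printf '%s\n' "$prog" "$prog.save" "$prog.nprof" "$prog.instr" "$prog.fdpr"
    for flag in "$@"; do
        case $flag in
            -disasm)      sfx=dis_text ;;
            -disasm_data) sfx=dis_data ;;
            -disasm_bss)  sfx=dis_bss ;;
            -map)         sfx=mapper ;;
            -profcount)   printf '%s\n' "$prog.ncounts"; continue ;;
            *)            continue ;;   # flag produces no extra file
        esac
        # One file after the instrumentation phase, one after optimization.
        printf '%s\n' "$prog.instr.$sfx" "$prog.fdpr.$sfx"
    done
}
```

For example, fdpr_files test1 -map adds test1.instr.mapper and test1.fdpr.mapper to the five base names.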
Enhanced Debugging Capabilities
To enable a certain degree of debugging capability for optimized programs, FDPR updates the Symbol Table to reflect the changes made in the .text section.
Symbol Table entry fields that specify addresses of symbols relocated during FDPR reordering are modified to point to the symbols' new addresses in the .text section.
In addition, in the case where functions or files are split during reordering, FDPR creates new entries in the Symbol Table for each new part of the split function/file. These new parts of the same function are given new symbol names in the Symbol Table according to the following naming convention:
<original function name>__fdpr_<function's part number>
After code reordering, all the new entries are suffixed with the __fdpr_ string. For example, consider the following Symbol Table entry for main:
[Index] m Value Scn Aux Sclass Type Name
[456] m 0x00000230 2 1 0x02 0x0000 .main
If main was split into 3 parts, then it would have 3 entries in the Symbol Table, one for each part, as follows:
[Index] m Value Scn Aux Sclass Type Name
[456] m 0x00000304 2 1 0x02 0x0000 .main
[1447] m 0x00003328 2 1 0x02 0x0000 .main__fdpr_1
[1453] m 0x000033b4 2 1 0x02 0x0000 .main__fdpr_2
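Because every extra part of a split function carries the __fdpr_<part number> suffix, the parts can be mapped back to their original function mechanically. The following POSIX sh sketch assumes symbol-table lines in the format shown above, with the symbol name as the last field; fdpr_base_symbol and fdpr_split_parts are hypothetical helper names.

```shell
#!/bin/sh
# fdpr_base_symbol NAME: strip the __fdpr_<n> suffix given to the extra
# parts of a split function (.main__fdpr_2 -> .main).
fdpr_base_symbol() {
    printf '%s\n' "$1" | sed 's/__fdpr_[0-9][0-9]*$//'
}

# fdpr_split_parts: read symbol-table lines on stdin (symbol name in the
# last field) and print "<original name> <number of parts>" for every
# function that has at least one __fdpr_ part. Output order is unspecified.
fdpr_split_parts() {
    awk '{
        name = $NF
        base = name
        sub(/__fdpr_[0-9]+$/, "", base)
        count[base]++
        if (name != base) is_split[base] = 1
    }
    END { for (b in is_split) print b, count[b] }'
}
```

Fed the three entries for main shown above, fdpr_split_parts prints .main 3.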
Examples
The following are typical usage examples of the fdpr command.
- This example runs all three phases. In this example, test1 is the unstripped executable and test2 is a shell script that invokes test1. The current working directory is /tmp/fdpr.
test2 script file:
# code to exercise test1
test1 -expand 100 -root $PATH file.jpg -quit
# the end of test2
Execute the fdpr command (using the default optimization):
fdpr -p test1 -x test2
This results in the new reordered executable test1.fdpr.
- To run one phase at a time, execute phase one of fdpr:
fdpr -1 -p test1
This command string creates an instrumented version with the name test1.instr and the empty template profile file test1.nprof.
To execute phase two:
fdpr -2 -p test1 -x test2
This command string executes the script file test2, which runs the instrumented version of test1 to collect the profile data.
To execute phase three:
fdpr -3 -p test1
Again, this results in the new reordered executable test1.fdpr.
- To run the first two phases followed by phase three, execute phases one and two:
fdpr -12 -p test1 -x test2
Then execute phase three using optimization level three:
fdpr -3 -O3 -p test1
- If an error occurs while running an fdpr-optimized program, the dbx command can be used to determine which procedure the error occurred in, as follows:
dbx program.fdpr
which produces output similar to the following:
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
[using memory image in core]
Segmentation fault in proc_d at 0x10000634
0x10000634 (???) 98640000 stb r3,0x0(r4)
(dbx)
A stack traceback, which is used to determine how the program arrived at the current location, is produced as follows:
(dbx) where
which produces the following output:
proc_d(0x0) at 0x10000634
proc_c(0x0) at 0x10000604
proc_b(0x0) at 0x100005d0
proc_a(0x0) at 0x1000059c
main(0x2, 0x2ff7fba4) at 0x1000055c
(dbx)
- The dbx subcommand stepi may also be used to single-step through the instructions of a reordered executable program, as follows:
(dbx) stepi
which produces the following output:
stopped in proc_d at 0x1000061c
0x1000061c (???) 9421ffc0 stwu r1,-64(r1)
(dbx)
In this example, dbx indicates that the program stopped in routine proc_d at address 0x1000061c in the reordered text section.
Implementation Specifics
Software Product/Option: AIX® Performance Aide/ Local Performance Analysis & Control Commands.
Standards Compliance: None.
Files
Item | Description |
---|---|
/usr/bin/fdpr | Contains the fdpr command. |
program | Name of the unstripped executable to be optimized. |
program.save | Saved version of the original executable program. |
program.nprof | Name of the profile file. |
program.instr | Name of the instrumented version of program. |
program.fdpr | Default name of optimized executable output file. |
program.instr.dis_text | Default disassembly file in ASCII format produced by -disasm flag after instrumentation phase. |
program.fdpr.dis_text | Default disassembly file in ASCII format produced by -disasm flag after optimization phase. |
program.instr.dis_data | Default disassembly file in ASCII format produced by -disasm_data flag after instrumentation phase. |
program.fdpr.dis_data | Default disassembly file in ASCII format produced by -disasm_data flag after optimization phase. |
program.instr.dis_bss | Default disassembly file in ASCII format produced by -disasm_bss flag after instrumentation phase. |
program.fdpr.dis_bss | Default disassembly file in ASCII format produced by -disasm_bss flag after optimization phase. |
program.instr.mapper | Default mapping file in ASCII format produced by -map flag after instrumentation phase. |
program.fdpr.mapper | Default mapping file in ASCII format produced by -map flag after optimization phase. |
program.ncounts | Default profile counters file in ASCII format produced by -profcount flag. |