Operating systems: SUSE Linux Enterprise 10 and Red Hat Enterprise Linux Advanced Platform 5
]]>echo 200 > /proc/sys/vm/nr_hugepages
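The kernel may grant fewer huge pages than requested if not enough contiguous memory is free, so it can help to read the value back after writing it. A minimal sketch follows; the helper name `set_hugepages` and its optional second argument are hypothetical, and writing to the real procfs file requires root.

```shell
# Hypothetical helper: request N huge pages and report how many were granted.
# $1 = number of pages to request
# $2 = control file (defaults to the real procfs entry; override for a dry run)
set_hugepages() {
    want=$1
    ctl=${2:-/proc/sys/vm/nr_hugepages}
    echo "$want" > "$ctl"
    got=$(cat "$ctl")
    echo "requested=$want granted=$got"
}

# Example (as root): set_hugepages 200
```

If the granted count is lower than the requested one, the pool is usually easier to reserve shortly after boot, before memory becomes fragmented.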
Usage: chsyscfg -r lpar | prof | sys | sysprof | frame
                -m <managed system> | -e <managed frame>
                -f <configuration file> | -i "<configuration data>"
                [--help]

Changes partitions, partition profiles, system profiles, or the attributes of a managed system or a managed frame.

    -r                          - the type of resource(s) to be changed:
                                    lpar    - partition
                                    prof    - partition profile
                                    sys     - managed system
                                    sysprof - system profile
                                    frame   - managed frame
    -m <managed system>         - the managed system's name
    -e <managed frame>          - the managed frame's name
    -f <configuration file>     - the name of the file containing the configuration data for this command. The format is:
                                    attr_name1=value,attr_name2=value,...
                                  or
                                    "attr_name1=value1,value2,...",...
    -i "<configuration data>"   - the configuration data for this command. The format is:
                                    "attr_name1=value,attr_name2=value,..."
                                  or
                                    ""attr_name1=value1,value2,...",..."
    --help                      - prints this help

The valid attribute names for this command are:
    -r prof     required: name, lpar_id | lpar_name
                optional: ... lpar_proc_compat_mode (default | POWER6_enhanced)
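As an illustration of the `-i` format, the sketch below only prints a `chsyscfg` command that would set `lpar_proc_compat_mode` in a partition profile; it does not run anything on the HMC. The helper name `compat_mode_cmd` and the system, partition, and profile names are hypothetical placeholders.

```shell
# Hypothetical helper: print (do not execute) a chsyscfg command that sets
# the processor compatibility mode in a partition profile.
# $1 = managed system, $2 = partition name, $3 = profile name, $4 = mode
compat_mode_cmd() {
    echo "chsyscfg -r prof -m $1 -i \"name=$3,lpar_name=$2,lpar_proc_compat_mode=$4\""
}

# Placeholder names, for illustration only.
compat_mode_cmd my-system my-lpar my-profile POWER6_enhanced
```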
submit = numactl --membind=\$SPECCOPYNUM --physcpubind=\$SPECCOPYNUM $command
--membind=nodes
    Only allocate memory from nodes. Allocation will fail when there is not enough memory available on these nodes.
--physcpubind=cpus
    Only execute process on cpus. This accepts physical cpu numbers as shown in the processor fields of /proc/cpuinfo.
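To make the submit line concrete, the sketch below prints the command it would expand to for one copy. It assumes that SPECCOPYNUM is replaced by the copy's number and that the same number is valid both as a memory node and as a physical CPU; the helper name `expand_submit` and the executable name are hypothetical.

```shell
# Hypothetical helper: print the command line the submit entry would yield
# for a single copy. $1 = copy number, remaining arguments = the command.
expand_submit() {
    copy=$1
    shift
    echo "numactl --membind=$copy --physcpubind=$copy $*"
}

# Copy 3 of a placeholder benchmark executable.
expand_submit 3 ./benchmark_exe
```

Binding both the memory and the CPU of a copy to the same number keeps each copy's allocations local to the processor it runs on.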
HUGETLB_VERBOSE=0
    Turns off all debugging messages from libhugetlbfs.
HUGETLB_MORECORE=yes
    Instructs libhugetlbfs to override libc's normal morecore() function with a hugepage version and use it for malloc().
HUGETLB_MORECORE_HEAPBASE=0x50000000
    Specifies that the hugepage heap starts at address 0x50000000.
XLFRTEOPTS=intrinthrds=1
    Causes the Fortran runtime to use only a single thread.
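A minimal sketch of how these settings could be placed in the environment of a shell before a benchmark is launched. How the variables actually reach the benchmark processes in this configuration is not stated here, so treat the `export` approach as an assumption.

```shell
# Values taken from the descriptions above.
export HUGETLB_VERBOSE=0
export HUGETLB_MORECORE=yes
export HUGETLB_MORECORE_HEAPBASE=0x50000000
export XLFRTEOPTS=intrinthrds=1

# List what a child process started from this shell would inherit.
env | grep -E '^(HUGETLB_|XLFRTEOPTS=)'
```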
- First, we copied the original executable (baseexe) to baseexe.orig.
- Then, the executable is instrumented and its initial profile generated, as follows:
    $ fdprpro -a instr baseexe
  The output will be generated (by default) in baseexe.instr and its profile in baseexe.nprof.
- Next, run baseexe.instr using the training data. This will fill the profile file with information that characterizes the training workload.
- Finally, re-run FDPR-Pro with the profile file provided, as follows:
    $ fdprpro -a opt -f baseexe.nprof [optimization options] baseexe
- We use the following optimization options: -q -O4 -A 32 -shci 90 -sdp 9

Optimization Options Descriptions:

-A alignment, --align-code alignment
    Align program code so that hot code will be aligned on alignment-byte addresses.
-abb factor, --align-basic-blocks factor
    Align basic blocks that are hotter than the average by given (float) factor. This is a lower-level machine-specific alignment compared to --align-code. Value of -1 (the default) disables this option.
-bf, --branch-folding
    Eliminate branch to branch instructions.
-bp, --branch-prediction
    Set branch prediction bit for conditional branches.
-dce, --dead-code-elimination
    Eliminate instructions related to unused local variables within frequently executed functions (useful mainly after applying function inlining optimization).
-dp, --data-prefetch
    Insert dcbt instructions to improve data-cache performance.
-ece, --epilog-code-eliminate
    Reduce code size by grouping common instructions in functions' epilogs, into a single unified code.
-hr, --hco-reschedule
    Relocate instructions from frequently executed code to rarely executed code areas, when possible.
-hrf factor, --hco-resched-factor factor
    Set the aggressiveness of the -hr optimization option according to a factor value between (0,1), where 0 is the least aggressive factor (applicable only with the -hr option).
-i, --inline
    Same as --selective-inline with --inline-small-funcs 12.
-ihf pct, --inline-hot-functions pct
    Inline all function call sites to functions that have a frequency count greater than the given pct frequency percentage.
-isf size, --inline-small-funcs size
    Inline all functions that are smaller or equal to the given size in bytes.
-kr, --killed-registers
    Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls.
-lap, --load-address-propagation
    Eliminate load instructions of variables' addresses by re-using pre-loaded addresses of adjacent variables.
-las, --load-after-store
    Add NOP instructions to place each load instruction further apart following a store instruction that references the same memory address.
-lro, --link-register-optimization
    Eliminate saves and restores of the link register in frequently-executed functions.
-lu aggressiveness_factor, --loop-unroll aggressiveness_factor
    Unroll short loops consisting of one to several basic blocks according to an aggressiveness factor between (1,9), where 1 is the least aggressive unrolling option for very hot and short loops.
-lun unrolling_number, --loop-unrolling-number unrolling_number
    Set the number of unrolled iterations in each unrolled loop. The allowed range is between (2,50). Default is set to 2. (applicable only with the -lu flag).
-nop, --nop-removal
    Remove NOP instructions from reordered code.
-O
    Switch on basic optimizations only. Same as -RC -nop -bp -bf.
-O2
    Switch on less aggressive optimization flags. Same as -O -hr -pto -isf 8 -tlo -kr.
-O3
    Switch on aggressive optimization flags. Same as -O2 -RD -isf 12 -si -dp -lro -las -vro -btcar -lu 9 -rt 0 -pbsi.
-O4
    Switch on aggressive optimization flags together with aggressive function inlining. Same as -O3 -sidf 50 -ihf 20 -sdp 9 -shci 90 and -bldcg (for XCOFF files).
-O5
    Switch on aggressive optimization flags together with HLR optimization. Same as -O4 -sa -gcpyp -gcnstp -dce.
-pbsi, --path-based-selective-inline
    Perform selective inlining of dominant hot function calls based on control flow paths leading to hot functions.
-pca, --propagate-constant-area
    Relocate the constant variables area to the top of the code section when possible.
-[no]pr, --[no]ptrgl-r11
    Perform removal of R11 load instruction in _ptrgl csect.
-pto, --ptrgl-optimization
    Perform optimization of indirect call instructions via registers by replacing them with conditional direct jumps.
-ptosl limit_size, --ptrgl-optimization-size-limit limit_size
    Set the limit of the number of conditional statements generated by -pto optimization. Allowed values are between 1..100. Default value set to 3. (applicable only with the -pto flag).
-ptoht heatness_threshold, --ptrgl-optimization-heatness-threshold heatness_threshold
    Set the frequency threshold for indirect calls that are to be optimized by -pto optimization. Allowed range between 0..1. Default is set to 0.8. (applicable only with -pto flag).
-RC, --reorder-code
    Perform code reordering.
-rcaf aggressiveness_factor, --reorder-code-aggressivenes-factor aggressiveness_factor
    Set the aggressiveness of code reordering optimization. Allowed values are [0 | 1 | 2], where 0 preserves original code order and 2 is the most aggressive. Default is set to 1. (applicable only with the -RC flag).
-rcctf termination_factor, --reorder-code-chain-termination-factor termination_factor
    Set the threshold fraction which determines when to terminate each chain of basic blocks during code reordering. Allowed input range is between 0.0 and 1.0, where 0.0 generates long chains and 1.0 creates single basic block chains. Default is set to 0.05. (applicable only with the -RC flag).
-rccrf reversal_factor, --reorder-code-condition-reversal-factor reversal_factor
    Set the threshold fraction which determines when to enable condition reversal for each conditional branch during code reordering.
    Allowed input range is between 0.0 and 1.0, where 0.0 tries to preserve original condition direction and 1.0 ignores it. Default is set to 0.8 (applicable only with the -RC flag).
-RD, --reorder-data
    Perform static data reordering.
-rmte, --remove-multiple-toc-entries
    Remove multiple TOC entries pointing to the same location in the input program file.
-rt removal_factor, --reduce-toc removal_factor
    Perform removal of TOC entries according to a removal factor between (0,1), where 0 removes non-accessed TOC entries only, and 1 removes all possible TOC entries.
-sdp aggressiveness_factor, --stride-data-prefetch aggressiveness_factor
    Perform data prefetching within frequently executed loops based on stride analysis, according to an aggressiveness factor between (1,9), where 1 is least aggressive.
-sdpla iterations_number, --stride-data-prefetch-look-ahead iterations_number
    Set the number of iterations for which data is prefetched into the cache ahead of time. Default value is set to 4 iterations. (applicable only with the -sdp flag).
-sdpms stride_min_size, --stride-data-prefetch-min-size stride_min_size
    Set the minimal stride size in bytes, for which data will be considered as a candidate for prefetching. Default value is set to 128 bytes. (applicable only with the -sdp flag).
-shci pct, --selective-hot-code-inline pct
    Perform selective inlining of functions in order to decrease the total number of execution counts, so that only functions whose hotness is above the given percentage are inlined.
-si, --selective-inline
    Perform selective inlining of dominant hot function calls.
-sll Lib1:Prof1,...,LibN:ProfN, --static-link-libraries Lib1:Prof1,...,LibN:ProfN
    Statically link hot code from specified dynamically linked libraries to the input program. The parameter consists of a comma-separated list of libraries and their profiles. IMPORTANT: licensing rights of specified libraries should be observed when applying this copying optimization.
-sllht hotness_threshold, --static-link-libraries-hotness-threshold hotness_threshold
    Set hotness threshold for the --static-link-libraries optimization. The allowed input range is between 0 (least aggressive) to 1, or -1, which does not require profile and selects all code that might be called by the input program from the given libraries. Default is 0.5.
-sidf percentage_factor, --selective-inline-dominant-factor percentage_factor
    Set a dominant factor percentage for selective inline optimization. The allowed range is between (0,100). Default is set to 80 (applicable only with the -si and -pbsi flags).
-siht frequency_factor, --selective-inline-hotness-threshold frequency_factor
    Set a hotness threshold factor percentage for selective inline optimization to inline all dominant function calls that have a frequency count greater than the given frequency percentage. Default is set to 100 (applicable only with the -si and -pbsi flags).
-so, --stack-optimization
    Reduce the stack frame size of functions which are called with a small number of arguments.
-tb, --preserve-traceback-tables
    Force the restructuring of traceback tables in reordered code. If -tb option is omitted, traceback tables are automatically included only for C++ applications which use the Try & Catch mechanism.
-rtb, --remove-traceback-tables
    Remove traceback tables in reordered code.
-tlo, --tocload-optimization
    Replace each load instruction that references the TOC with a corresponding add-immediate instruction via the TOC anchor register, when possible.
-vro, --volatile-registers-optimization
    Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers.]]>
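The four FDPR-Pro steps described earlier (copy, instrument, train, optimize) can be sketched as one shell function. This is an illustration under stated assumptions, not the script used for the result: the names `run` and `fdpr_workflow` and the `DRY_RUN` switch are hypothetical, and in dry-run mode (the default here) each command is only printed.

```shell
# With DRY_RUN=1 (default) commands are printed rather than executed,
# so the sketch can be read without FDPR-Pro installed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = executable name (e.g. baseexe)
# remaining arguments = arguments for the training run (hypothetical)
fdpr_workflow() {
    exe=$1
    shift
    run cp "$exe" "$exe.orig"               # keep the original executable
    run fdprpro -a instr "$exe"             # writes $exe.instr and $exe.nprof
    run "./$exe.instr" "$@"                 # training run fills the profile
    run fdprpro -a opt -f "$exe.nprof" -q -O4 -A 32 -shci 90 -sdp 9 "$exe"
}

fdpr_workflow baseexe
```

The options on the last command are the ones listed above (-q -O4 -A 32 -shci 90 -sdp 9); the training-run arguments depend on the benchmark and are left to the caller.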
-O5 is equivalent to the following flags
-O4 is equivalent to the following flags
-O3 is equivalent to the following flags
Supported values for this flag are:
level=0 Does only minimal interprocedural analysis and optimization.
level=1 Turns on inlining, limited alias analysis, and limited call-site tailoring.
level=2 Turns on full interprocedural data flow and alias analysis.
]]>