AIX 5L with IBM XL Compilers SPEC CPU Flags
Compilers: IBM XL C/C++ Enterprise Edition Version 8.0 for AIX
Compilers: IBM XL Fortran Enterprise Edition Version 10.1 for AIX
Compilers: IBM XL C/C++ Enterprise Edition Version 9.0 for AIX
Compilers: IBM XL Fortran Enterprise Edition Version 11.1 for AIX
OS: IBM AIX 5L V5.3
Last updated: 18-May-2007
]]>
smtctl -m on|off -w now|boot Controls the enabling and disabling of processor simultaneous multi-threading mode
vmo -r -o lgpg_regions=n -o lgpg_size=16777216 Sets the large-page size to 16 MB and the number of large pages to n. With -r, the change takes effect at the next IPL.
bosboot -q Regenerates the IPL boot to set the options specified with smtctl and vmo
ulimit Controls the resources that processes are allowed to use. All resources are set to unlimited; the "stack" and "data/heap" settings are of primary importance for SPEC CPU2006.
chsyscfg -m system -r prof -i name=profile,lpar_name=partition,lpar_proc_compat_mode=POWER6_enhanced The Hardware Management Console (HMC) command that enables the optional PowerPC architecture instructions supported on POWER6.
bindprocessor $$ n Binds the current shell ($$) to processor n, so that the next program it launches is bound to the specified processor.
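As a rough illustration of the vmo arithmetic above, the following Python sketch computes how many 16 MB regions would cover a given amount of memory and formats the matching command line. The helper names and the 4 GiB figure are hypothetical, chosen only for illustration; the vmo options themselves are the ones listed above.

```python
LGPG_SIZE = 16777216  # 16 MB large-page size, as given in the vmo line above


def lgpg_regions_for(nbytes: int) -> int:
    """Number of 16 MB regions needed to cover nbytes, rounded up."""
    return -(-nbytes // LGPG_SIZE)  # ceiling division


def lgpg_command(nbytes: int) -> str:
    """Format the vmo invocation for a pool covering nbytes (hypothetical helper)."""
    regions = lgpg_regions_for(nbytes)
    return f"vmo -r -o lgpg_regions={regions} -o lgpg_size={LGPG_SIZE}"


# Example: a 4 GiB large-page pool (illustrative figure, not from this document).
print(lgpg_command(4 * 1024**3))
```

Because -r defers the change to the next IPL, the printed command would be followed by bosboot and a reboot, as described above.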
Environment variables set before the run:
MEMORY_AFFINITY=MCM Causes the OS to allocate memory "closest" to the chip that first requests it.
XLFRTEOPTS=intrinthrds=1 Causes the Fortran runtime to only use a single thread.
MALLOPTIONS=POOL Selects the OS malloc option that allocates/frees small objects very quickly.
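One way to apply these settings to a single run, without changing the calling shell, is to layer them over the inherited environment when the process is started. The sketch below assumes a Python launcher; the function name run_with_env and the stand-in child process are hypothetical, and only the variable names and values come from the list above.

```python
import os
import subprocess
import sys

# Variables and values exactly as listed above.
RUN_ENV = {
    "MEMORY_AFFINITY": "MCM",
    "XLFRTEOPTS": "intrinthrds=1",
    "MALLOPTIONS": "POOL",
}


def run_with_env(argv):
    """Run argv with RUN_ENV layered over the current environment."""
    env = {**os.environ, **RUN_ENV}
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in child process (an assumption): it only echoes one variable back.
child = [sys.executable, "-c", "import os; print(os.environ['MALLOPTIONS'])"]
result = run_with_env(child)
print(result.stdout.strip())
```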
fdpr -q -O4 -A 32 -bldcg -shci 90 -sdp 9
The fdpr command (Feedback Directed Program Restructuring) is a performance-tuning utility that may help
improve the execution time and the real memory utilization of user-level application programs. The fdpr program
optimizes the executable image of a program by collecting information on the behavior of the program while the
program is used for some typical workload, and then creating a new version of the program that is optimized for
that workload. The new program generated by fdpr typically runs faster and uses less real memory.
-q, --quiet Set quiet output mode, suppressing informational
messages
-O Switch on basic optimizations only.
-O2 Switch on less aggressive optimization flags.
-O3 Switch on aggressive optimization flags.
-O4 Switch on aggressive optimization flags together with
aggressive function inlining.
-A , --align-code
Align program code according to given
-bldcg, --build-dcg Build a DCG (data connectivity graph) for enhanced data
reordering (applicable only with the -RD flag)
-shci , --selective-hot-code-inline
Perform selective inlining of functions in order to
decrease the total execution counts
-sdp , --stride-data-prefetch
Perform data prefetching within frequently executed
loops based on stride analysis, according to an
aggressiveness factor between (1,9), where 1 is least
aggressive
]]>
Invoke the IBM XL C compiler. 32-bit binaries are produced by default.
]]>
Invoke the IBM XL C++ compiler. 32-bit binaries are produced by default.
]]>
Invoke the IBM XL Fortran compiler. 32-bit binaries are produced by default.
]]>
-O5
Perform optimizations for maximum performance. This includes maximum
interprocedural analysis on all of the objects presented on the "link"
step. This level of optimization will increase the compiler's memory
usage and compile time requirements. -O5 provides all of the functionality
of the -O4 option, but also provides the functionality of the
-qipa=level=2 option.
-O5 is equivalent to the following flags
- -O3
- -qipa=level=2
- -qarch=auto
- -qtune=auto
]]>
-O4
Perform optimizations for maximum performance. This includes
interprocedural analysis on all of the objects presented on the "link"
step.
-O4 is equivalent to the following flags
- -O3
- -qipa=level=1
- -qarch=auto
- -qtune=auto
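The -O5 and -O4 equivalences listed above can be treated as simple macro expansions. The following Python sketch, with a hypothetical expand helper, shows that the two levels differ only in the -qipa level. It is an illustration of the lists above, not a description of how the compiler driver processes options.

```python
# Expansions as stated in the -O5 and -O4 descriptions above.
EQUIVALENTS = {
    "-O5": ["-O3", "-qipa=level=2", "-qarch=auto", "-qtune=auto"],
    "-O4": ["-O3", "-qipa=level=1", "-qarch=auto", "-qtune=auto"],
}


def expand(flags):
    """Replace -O4/-O5 with their listed equivalents; pass other flags through."""
    out = []
    for flag in flags:
        out.extend(EQUIVALENTS.get(flag, [flag]))
    return out


print(expand(["-O5", "-q64"]))

# Show which component flags differ between the two levels.
diff = [(a, b) for a, b in zip(expand(["-O4"]), expand(["-O5"])) if a != b]
print(diff)
```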
]]>
-O3
-O3 Performs additional optimizations that are memory intensive, compile-time
intensive, and may change the semantics of the program slightly, unless
-qstrict is specified. We recommend these optimizations when the desire for
run-time speed improvements outweighs the concern for limiting compile-time
resources.
-O2 is equivalent to the following flags
]]>
-O2
-O2 Performs a set of optimizations that are intended to offer improved
performance without an unreasonable increase in time or storage that is
required for compilation.
]]>
-O
-O enables the level of optimization that represents the best tradeoff
between compilation speed and run-time performance.
If you need a specific level of optimization, specify the appropriate
numeric value.
Currently, -O is equivalent to -O2.
]]>
-qarch
Produces object code containing instructions that will run on the
specified processors. "auto" selects the processor the compile
is being done on. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
- auto
Use the processor on which the program is compiled.
- pwr6e
The POWER6 processor in "Enhanced" mode based systems.
- pwr6
The POWER6 processor based systems.
- pwr5x
The POWER5+ processor based systems.
- pwr5
The POWER5 processor based systems.
- pwr4
The POWER4 processor based systems.
- ppc970
The PPC970 processor based systems.
]]>
-qtune
Specifies the processor architecture for which the executable program
is optimized. This includes instruction scheduling and cache settings.
The supported values for suboption are:
- auto
Use the processor on which the program is compiled.
- pwr6e
The POWER6 processor in "Enhanced" mode based systems.
- pwr6
The POWER6 processor based systems.
- pwr5x
The POWER5+ processor based systems.
- pwr5
The POWER5 processor based systems.
- pwr4
The POWER4 processor based systems.
- ppc970
The PPC970 processor based systems.
]]>
This option inlines glue code that optimizes external
function calls when compiling.
-qhot
Performs high-order transformations on loops during optimization.
]]>
-qipa=level
Enhances optimization by doing detailed analysis across procedures
(interprocedural analysis or IPA).
The level determines the amount of interprocedural analysis
and optimization that is performed.
level=0 does only minimal interprocedural analysis and optimization
level=1 turns on inlining, limited alias analysis, and limited
call-site tailoring
level=2 turns on full interprocedural data flow and alias analysis
]]>
The option used in the first pass of a profile directed feedback compile
that causes PDF information to be generated.
The profile directed feedback optimization gathers data on both execution
path and data values. It does not use hardware counters, nor gather any
data other than path and data values for PDF specific optimizations.
The option used in the second pass of a profile directed feedback compile
that causes PDF information to be utilized during optimization.
-qxlf90=nosignedzero
-qxlf90=
Determines whether the compiler provides the
Fortran 90 or the Fortran 95 level of support for
certain aspects of the language. The suboption can be
one of the following:
signedzero | nosignedzero
Determines how the SIGN(A,B) function handles
signed real 0.0. In addition, determines
whether negative internal values will be
prefixed with a minus when formatted output
would produce a negative sign zero.
autodealloc | noautodealloc
Determines whether the compiler deallocates
allocatable arrays that are declared locally
without either the SAVE or the STATIC
attribute and have a status of currently
allocated when the subprogram terminates.
oldpad | nooldpad
When the PAD=specifier is present in the
INQUIRE statement, specifying -qxlf90=nooldpad
returns UNDEFINED when there is no connection,
or when the connection is for unformatted I/O.
This behavior conforms with the Fortran 95
standard and above. Specifying -qxlf90=oldpad
preserves the Fortran 90 behavior.
Default:
o signedzero, autodealloc and nooldpad for the
xlf95, xlf95_r, xlf95_r7 and f95 invocation
commands.
o nosignedzero, noautodealloc and oldpad for
all other invocation commands.
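The effect of signedzero on SIGN(A,B) can be modeled in a few lines. The Python sketch below is an illustration under the assumption that signedzero makes SIGN honor the sign bit of B (so B = -0.0 counts as negative), while nosignedzero compares B by value (so B = -0.0 counts as non-negative). The function names are hypothetical and this is not XL Fortran runtime code.

```python
import math


def sign_signedzero(a: float, b: float) -> float:
    """SIGN(A,B) honoring the sign bit of B, so B = -0.0 counts as negative."""
    return math.copysign(abs(a), b)


def sign_nosignedzero(a: float, b: float) -> float:
    """SIGN(A,B) comparing B by value, so B = -0.0 counts as non-negative."""
    return abs(a) if b >= 0.0 else -abs(a)


# The two models agree everywhere except when B is negative zero.
print(sign_signedzero(3.0, -0.0), sign_nosignedzero(3.0, -0.0))
print(sign_signedzero(3.0, -2.0), sign_nosignedzero(3.0, -2.0))
```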
]]>
-q64
Generates 64-bit ABI binaries. The default is to generate 32-bit ABI binaries.
-bmaxdata:0x80000000
Causes the system loader to put the heap in its own segment of the size specified.
This is only required for 32-bit applications, as their segments are 256M. If the
last digit of the value is "C", then it also turns off the malloc pool option for that
executable.
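The size passed to -bmaxdata can be read as a count of 256M segments. The Python sketch below works through that arithmetic for the value shown above; the helper name maxdata_segments is hypothetical.

```python
SEGMENT_SIZE = 256 * 1024 * 1024  # 256M segment size, as stated above


def maxdata_segments(value: str) -> int:
    """Number of whole 256M segments covered by a -bmaxdata hex value."""
    nbytes = int(value, 16)  # base 16 accepts the leading "0x"
    return nbytes // SEGMENT_SIZE


# 0x80000000 bytes is 2 GiB of heap.
print(maxdata_segments("0x80000000"))
```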
Sets the bit in the file's XCOFF header indicating that this
executable will request the use of large pages when they are
available on the system and when the user has an appropriate
privilege
Indicates that a program, designed to execute in a
large page memory environment, can take advantage
of large 16 MB pages provided on POWER4 and higher
based systems.
Indicates that the compiler understands how to do alloca().
Causes the Fortran compiler to allocate dynamic arrays on the heap instead
of the stack
Enables the generation of vector instructions for processors
that support them.
Specifies whether to use volatile or non-volatile vector
registers. Volatile vector registers are registers whose
value is not preserved across function calls so the
compiler will not depend on values in them across function
calls.
The __IBM_FAST_VECTOR macro defines a different iterator for the std::vector
template class. This iterator results in faster code, but is not compatible
with code using the default iterator for a std::vector template class.
All uses of std::vector for a data type must use the same iterator.
Add -D__IBM_FAST_VECTOR to the compile line, or "#define __IBM_FAST_VECTOR 1"
to your source code to use the faster iterator for std::vector template class.
You must compile all sources with this macro.
Causes AIX to define "ischar()" (and friends) as macros and not subroutines.
Causes the C++ compiler to generate Run-Time Type Identification code.
Causes the compiler to treat "char" variables as signed instead of the
default of unsigned.
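The practical difference between signed and unsigned "char" shows up when a byte has its high bit set. The Python sketch below reinterprets the same byte both ways using the standard struct module. It illustrates the two interpretations only and does not invoke the compiler.

```python
import struct

raw = bytes([0xFF])  # a single byte with the high bit set

as_signed = struct.unpack("b", raw)[0]    # interpreted as a signed char
as_unsigned = struct.unpack("B", raw)[0]  # interpreted as an unsigned char

print(as_signed, as_unsigned)
# Code that tests "c < 0" behaves differently under the two interpretations.
print(as_signed < 0, as_unsigned < 0)
```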
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
-qalias=noansi
qalias=ansi | noansi
If ansi is specified, type-based aliasing is
used during optimization, which restricts the
lvalues that can be safely used to access a
data object. The default is ansi for the xlc,
xlC, and c89 commands. This option has no
effect unless you also specify the -O option.
qalias=std | nostd
Indicates whether the compilation units contain
any non-standard aliasing (see Compiler Reference
for more information). If so, specify nostd.
]]>
Specifies what aggregate alignment rules the
compiler uses for file compilation, where the
alignment options are:
bit_packed
The compiler uses the bit_packed alignment
rules.
full
The compiler uses the RISC System/6000
alignment rules. This is the same as power.
mac68k
The compiler uses the Macintosh alignment
rules. This suboption is valid only for 32-
bit compilations.
natural
The compiler maps structure members to their
natural boundaries.
packed
The compiler uses the packed alignment rules.
power
The compiler uses the RISC System/6000
alignment rules.
twobyte
The compiler uses the Macintosh alignment
rules. This suboption is valid only for 32-
bit compilations. The mac68k option is the
same as twobyte.
The default is -qalign=full.
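To give a feel for what alignment rules change, the Python sketch below compares the size of a record holding one char followed by one 4-byte int under the host's native alignment and with no alignment at all. This is only an analogy for the natural and packed suboptions: it uses the alignment rules of the machine running Python, not the XL compiler's rules, and the record layout is a hypothetical example.

```python
import struct

# Layout under test: one char followed by one 4-byte int.
natural = struct.calcsize("@ci")  # native alignment: padding inserted before the int
packed = struct.calcsize("=ci")   # no alignment: members laid out back to back

print("natural:", natural, "bytes")
print("packed: ", packed, "bytes")
print("padding:", natural - packed, "bytes")
```

Padding trades memory for aligned access, which is why the choice of alignment rule can affect both the size of aggregates and their layout compatibility across compilation units.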
]]>
-qstrict
Turns off aggressive optimizations which have the
potential to alter the semantics of your program.
-qstrict sets -qfloat=nofltint:norsqrt. -qnostrict
sets -qfloat=rsqrt. This option is only valid with
-O2 or higher optimization levels.
Default:
o -qnostrict at -O3 or higher.
o -qstrict otherwise.
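One familiar way an optimizer can alter program semantics is by reordering floating-point operations, since floating-point addition is not associative. The Python sketch below shows two groupings of the same sum giving different results. Using reassociation as the example is an assumption made for illustration; this document names only the -qfloat settings that -qstrict affects.

```python
# The same three values summed with two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

print(repr(left_to_right))
print(repr(regrouped))
print("identical:", left_to_right == regrouped)
```

The difference is in the last bit of the result, which is the kind of change that can matter when benchmark output is validated against a tolerance.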
]]>
Allows almost any C dialect.
-qipa=noobject
Specifies whether to include standard object code in the object files.
The noobject suboption can substantially reduce overall
compilation time, by not generating object code during the first IPA phase.
]]>
-qipa=threads
The threads suboption allows the IPA optimizer to run portions
of the optimization process in parallel threads, which can speed up the
compilation process on multi-processor systems. All the available
threads, or the number specified by N, are used. N must be a positive
integer. Specifying nothreads does not run any parallel threads;
this is equivalent to running one serial thread.
]]>
Causes the compiler to output a traceback if it abends.
-qsuppress=msg1:msg2
-qsuppress=1500-036,
-qsuppress=cmpmsg
Suppresses the message with the message number specified.