Fujitsu.PQ580A.ipf.linux.flags
Fujitsu Limited PQ580A SPEC CPU Flags Description
Compilers: Intel Compilers for C++ and Fortran, Version 10.1 for IPF Linux64
Operating system: Red Hat Enterprise Linux 5.1 (for Intel Itanium)
]]>
Processes are bound to CPUs using numactl and taskset.
- taskset -c cpulist command {arguments ...}
Launches command with the CPU affinity specified by cpulist.
- numactl --membind nodes command {arguments ...}
Launches command with a memory placement policy that allocates memory only from the specified nodes.
limit stacksize unlimited
Set the stack size to unlimited prior to the run, using 'limit stacksize
unlimited' (csh) or the equivalent 'ulimit -s unlimited' (sh/bash).
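What taskset does for the launched command can be sketched with the underlying Linux API, sched_setaffinity(2); the choice of CPU 0 in the usage below is only an illustration, not part of the SPEC configuration:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Bind the calling process to a single CPU, as 'taskset -c <cpu>' does
   for the command it launches.  Returns 0 on success, -1 on failure. */
static int bind_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling process" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

numactl's --membind policy has a similar programmatic counterpart in libnuma (numa_set_membind()).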
Memory system is in "Non Mirror Mode".
The PRIMEQUEST 580A/540A/520A memory system supports DSSA (Dual Sync System Architecture)
and works in one of the following two modes.
- Mirror Mode
Address buses and data buses are duplicated, as is most of the
chipset's internal operation. In this mode, system memory throughput
is halved, but the duplication of the memory system provides higher
reliability.
- Non Mirror Mode
Address buses and data buses are not duplicated, and neither is the
chipset's internal operation. In this mode, the full memory bandwidth
is available, but the higher reliability provided by duplication is not.
The following two environment variables were set:
- MALLOC_MMAP_MAX_=0
- MALLOC_TRIM_THRESHOLD_=-1
These settings cause glibc's malloc to obtain memory from the system
with sbrk() calls instead of mmap() calls.
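The same effect can be requested programmatically with glibc's mallopt(3); this is a sketch assuming glibc, not part of the SPEC configuration itself:

```c
#include <malloc.h>

/* Programmatic equivalent of MALLOC_MMAP_MAX_=0 and
   MALLOC_TRIM_THRESHOLD_=-1: glibc then serves large requests from the
   sbrk()-grown heap and never trims the heap back to the system.
   Returns 0 on success, -1 on failure. */
static int disable_mmap_allocation(void) {
    if (mallopt(M_MMAP_MAX, 0) == 0)        /* mallopt returns 1 on success */
        return -1;
    if (mallopt(M_TRIM_THRESHOLD, -1) == 0)
        return -1;
    return 0;
}
```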
]]>
Specifies that the main program is not written in Fortran, and prevents the compiler
from linking for_main.o into applications.
]]>
Maximizes speed across the entire program.
Sets the following options:
]]>
Enables O2 optimizations plus more aggressive optimizations, such as
prefetching, scalar replacement, and loop and memory access
transformations. Enables optimizations for maximum speed, such as:
- Loop unrolling, including instruction scheduling
- Code replication to eliminate branches
- Padding the size of certain power-of-two arrays to allow
more efficient cache use.
On Intel Itanium processors, the O3 option enables optimizations
for technical computing applications (loop-intensive code):
loop optimizations and data prefetch.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.
]]>
Enables optimizations for speed. This is the generally recommended
optimization level.
This option enables optimizations for speed, including global code scheduling,
software pipelining, predication, and speculation.
This option also enables:
- Inlining of intrinsics
- Intra-file interprocedural optimizations, which include:
- inlining
- constant propagation
- forward substitution
- routine attribute propagation
- variable address-taken analysis
- dead static function elimination
- removal of unreferenced variables
- The following capabilities for performance gain:
- constant propagation
- copy propagation
- dead-code elimination
- global register allocation
- global instruction scheduling and control speculation
- loop unrolling
- optimized code selection
- partial redundancy elimination
- strength reduction/induction variable simplification
- variable renaming
- exception handling optimizations
- tail recursions
- peephole optimizations
- structure assignment lowering and optimizations
- dead store elimination
]]>
Enables optimizations for speed and disables some optimizations that
increase code size and affect speed.
To limit code size, this option:
- Enables global optimization; this includes data-flow analysis,
code motion, strength reduction and test replacement, split-lifetime
analysis, and instruction scheduling.
- Disables intrinsic recognition and intrinsics inlining.
- On Itanium-based systems, it disables software pipelining, loop unrolling,
and global code scheduling.
On Intel Itanium processors, this option also enables optimizations for server
applications (straight-line and branch-like code with a flat profile).
The O1 option may improve performance for applications with very large
code size, many branches, and execution time not dominated by code within loops.
This option sets the following options: -unroll0, -fbuiltin, -mno-ieee-fp, -fomit-frame-pointer (same as -fp), and -ffunction-sections.
]]>
Tells the compiler the maximum number of times to unroll loops.
]]>
Enables inline expansion of all intrinsic functions.
]]>
Disables conformance to the ANSI C and IEEE 754 standards for
floating-point arithmetic.
]]>
Enables EBP to be used as a general-purpose register.
]]>
Places each function in its own COMDAT section.
]]>
Enables multifile interprocedural optimizations between files.
]]>
Prevents linking with shared libraries.
]]>
Instruments a program for profiling to get the execution count of
each basic block. It also creates a new static profile information file.
]]>
Enables use of profiling information (including function splitting and
function grouping) during optimization. It enables option -fnsplit.
]]>
Enables function splitting. This option is enabled automatically if you
specify -prof-use.
]]>
Enables use of faster but slightly less accurate code sequences for
math functions, such as divide and sqrt. When compared to strict IEEE*
precision, this option slightly reduces the accuracy of floating-point
calculations performed by these functions, usually limited to the least
significant digit.
This option also enables the performance of more aggressive
floating-point transformations, which may affect accuracy.
]]>
Disables the prefetch insertion optimization.
]]>
Specifies that aliasing should not be assumed in the program.
]]>
Do not assume arguments may be aliased.
]]>
Tells the compiler to assume that the program adheres to ISO C standard
aliasability rules.
If your program adheres to these rules, then this option allows the compiler
to optimize more aggressively. If it doesn't adhere to these rules, then it
can cause the compiler to generate incorrect code.
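As an illustration (not taken from the SPEC configuration), this is the kind of access the ISO C aliasing rules forbid, alongside a compliant alternative:

```c
#include <string.h>
#include <stdint.h>

/* Violates ISO C aliasing rules: a float object is read through an
   incompatible uint32_t lvalue.  Under -ansi-alias the compiler may
   reorder or cache accesses on the assumption this never happens. */
static uint32_t bits_unsafe(float f) {
    return *(uint32_t *)&f;
}

/* Compliant alternative: memcpy copies the object representation
   without creating an aliasing lvalue; compilers typically optimize
   the call away entirely. */
static uint32_t bits_safe(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```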
]]>
Enables language compatibility with the gcc option -ansi and provides the
same level of ANSI standard conformance as that option.
This option sets option -fmath-errno.
]]>
Tells the compiler to assume that the program tests errno after calls to
math library functions. This restricts optimization because it causes the
compiler to treat most math functions as having side effects.
]]>
The -Wl option directs the compiler to pass a list of arguments
to the linker. In this case, "-z muldefs" is passed to the
linker. For the Gnu linker (ld), the "-z keyword" option accepts
several recognized keywords. Keyword "muldefs" allows multiple
definitions. The muldefs keyword will enable, for example,
linking with third party libraries like SmartHeap from
Microquill.
]]>
MicroQuill SmartHeap Library available from http://www.microquill.com/
To link SmartHeap with C applications, you must link with libsmartheap64.a
To link SmartHeap with C++ applications, you must link with libsmartheap64.a and libsmartheapC64.a
]]>
MicroQuill SmartHeap Library available from http://www.microquill.com/
To link SmartHeap with C applications, you must link with libsmartheap64.a
To link SmartHeap with C++ applications, you must link with libsmartheap64.a and libsmartheapC64.a
]]>
-mtune=cpu
Performs optimizations for a specified CPU. On Itanium(R)-based Linux
systems, you can specify one of the following values.
- itanium: Optimizes for Intel(R) Itanium(R) processors.
- itanium2: Optimizes for Intel(R) Itanium(R) 2 processors.
- itanium2-p9000: Optimizes for Dual-Core Intel(R) Itanium(R) 2
Processor 9000 Sequence processors.
]]>
Instructs the compiler to analyze and transform the program so that
64-bit pointers are shrunk to 32-bit pointers, and 64-bit longs
(on Linux) are shrunk into 32-bit longs wherever it is legal and safe
to do so. In order for this option to be effective the compiler must be
able to optimize using the -ipo option and must be able to analyze all
library or external calls the program makes.
]]>
-opt-mem-bandwidthn
Enables or disables performance tuning and heuristics that control
memory bandwidth use among processors. It allows the compiler to be
less aggressive with optimizations that might consume more bandwidth,
so that the bandwidth can be well-shared among multiple processors
for a parallel program. For values of n greater than 0, the option tells
the compiler to enable a set of performance tuning and heuristics in
compiler optimizations such as prefetching, privatization, aggressive code
motion, and so forth, for reducing memory bandwidth pressure and
balancing memory bandwidth traffic among threads. The n value is the
level of optimizing for memory bandwidth usage. You can specify one of
the following values for n:
- 0 -- Disables a set of performance tuning and heuristics in compiler
optimizations for parallel code. This is the default for serial code.
- 1 -- Enables a set of performance tuning and heuristics in compiler
optimizations for multithreaded code generated by the compiler. This is
the default if compiler option -parallel or -openmp is specified, or Cluster
OpenMP option -cluster-openmp is specified (see the Cluster OpenMP
documentation).
- 2 -- Enables a set of performance tuning and heuristics in compiler
optimizations for parallel code such as Windows Threads, pthreads, and
MPI code, besides multithreaded code generated by the compiler.
]]>
-inline-factor=n
Specifies the percentage multiplier that should be applied to all inlining
options that define upper limits: -inline-max-size, -inline-max-total-size,
-inline-max-per-routine, and -inline-max-per-compile.
This option takes the default value for each of the above options and
multiplies it by n divided by 100. For example, if 200 is specified, all inlining
options that define upper limits are multiplied by a factor of 2. This option
is useful if you do not want to increase each option limit individually.
n is a positive integer specifying the percentage value. The default value
is 100 (a factor of 1).
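The scaling is simple arithmetic; a sketch follows (the default limit of 230 used in the comment and tests is purely hypothetical):

```c
/* -inline-factor=n multiplies every inlining upper limit by n/100,
   e.g. a (hypothetical) default limit of 230 becomes 460 with n=200. */
static long scale_limit(long default_limit, int factor_percent) {
    return default_limit * (long)factor_percent / 100;
}
```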
]]>
-inline-max-size=n
Specifies the lower limit for the size of what the inliner considers to be a
large routine. It specifies the boundary between what the inliner considers
to be medium and large-size routines.
The inliner prefers to inline small routines and disfavors inlining
large ones, so any large routine is highly unlikely to be inlined.
n is a positive integer that specifies the minimum size of a large routine.
]]>
-inline-max-per-routine=n
Specifies the maximum number of times the inliner may inline into a
particular routine. It limits the number of times that inlining can be applied
to any routine.
n is a positive integer that specifies the maximum number.
]]>
-inline-max-total-size=n
Specifies how much larger a routine can normally grow when inline expansion
is performed. It limits the potential size of the routine. For example, if 2000 is
specified for n, the size of any routine will normally not increase by more
than 2000.
n is a positive integer that specifies the permitted increase in the size of the
routine.
]]>
-inline-min-size=n
Specifies the upper limit for the size of what the inliner considers to be a
small routine. It specifies the boundary between what the inliner considers to
be small and medium-size routines. n is a positive integer that specifies the
maximum size of a small routine.
The inliner has a preference to inline small routines. So, when a routine is
smaller than or equal to the specified size, it is very likely to be inlined.
]]>
This option turns on versioning of modulo operations for
certain types of operands (e.g. x%y where y is dynamically
determined to be a power of 2). The default is modulo
versioning off. This option may improve performance:
versioning commonly yields large speedups for x%y when y is
a power of 2, though it can slightly hurt performance when
y is not a power of 2.
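The versioned code the compiler generates can be sketched by hand: when the divisor turns out to be a power of two at run time, the modulo reduces to a single AND.

```c
#include <stdint.h>

/* Sketch of modulo versioning: take the cheap masked path when the
   divisor is a power of two, fall back to the generic '%' otherwise. */
static uint64_t mod_versioned(uint64_t x, uint64_t y) {
    if (y != 0 && (y & (y - 1)) == 0)   /* y is a power of two */
        return x & (y - 1);             /* fast path: single AND */
    return x % y;                       /* generic path */
}
```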

This option tells the compiler to use more aggressive unrolling
for certain loops. The default is -no-unroll-aggressive
(the compiler uses less aggressive default heuristics when
unrolling loops). This option may improve performance.
On the Itanium architecture, this option enables additional
complete unrolling for loops that have multiple exits or outer
loops that have a small constant trip count.

This option controls the prefetches that are issued for a
memory access in the next iteration, typically done in a
pointer-chasing loop. This option should improve performance.
The default is -no-opt-prefetch-next-iteration (next iteration
prefetch off).

This option controls the prefetches that are issued before
the loop is entered. These prefetches target the initial
iterations of the loop. The default is -opt-prefetch-initial-values
(prefetch for initial iterations on) at -O1 and higher optimization
levels.

This option controls the loadpair optimization. The loadpair
optimization is enabled by default when -O3 is used for
Itanium. -no-opt-loadpair turns the loadpair optimization off.

Enables or disables use of the "exclusive hint" when generating
prefetch instructions (IA-64 architecture only; default: off).
The Itanium architecture provides mechanisms, such as instruction
templates, branch hints, and cache hints, to enable the compiler
to communicate compile-time information to the processor. The
"exclusive hint" is one of these cache hints; it tells the processor
to bring the prefetched cache line into the cache in exclusive state.
]]>
Invoke the Intel C++ compiler for IPF Linux64 to compile C applications
]]>
Invoke the Intel C++ compiler for IPF Linux64 to compile C++ applications
]]>
Invoke the Intel Fortran compiler for IPF Linux64
]]>