The text for many of the descriptions below was taken
from the documentation of the Intel Compilers.
This documentation is copyright © 2007 Intel Corporation. All Rights Reserved.
The original documentation is distributed with the Intel compilers.
One or more of the following settings may have been applied. If so, the corresponding notes section of the report says so, and the descriptions below explain what these settings mean.
OMP_NUM_THREADS
Sets the maximum number of threads to use for OpenMP* parallel regions
if no other value is specified in the application.
This environment variable applies to both -openmp and -parallel
(Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows).
Example syntax on a Windows system with 8 cores:
set OMP_NUM_THREADS=8
Default is the number of cores visible to the OS.
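As a rough illustration of the default rule above, the following Python sketch resolves a thread count the way the text describes: an explicit OMP_NUM_THREADS value wins, otherwise the number of cores visible to the OS is used. The function name resolve_num_threads is hypothetical and is not part of any OpenMP runtime. The sketch only handles a single positive integer, and a real runtime may treat other values differently.

```python
import os

def resolve_num_threads(environ=None):
    """Return the thread count implied by OMP_NUM_THREADS (illustrative only)."""
    env = os.environ if environ is None else environ
    value = env.get("OMP_NUM_THREADS", "").strip()
    if value.isdigit() and int(value) > 0:
        # An explicit positive setting takes precedence.
        return int(value)
    # Fallback described in the text: the number of cores visible to the OS.
    # os.cpu_count() can return None on unusual platforms; assume 1 in that case.
    return os.cpu_count() or 1
```

Passing a dictionary such as {"OMP_NUM_THREADS": "8"} instead of the real environment makes the rule easy to try without changing the shell settings.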
KMP_AFFINITY
KMP_AFFINITY =
Hardware Prefetch:
This BIOS option allows the enabling/disabling of a processor mechanism to prefetch data into the cache according to a pattern-recognition algorithm.
In some cases, setting this option to Disabled may improve performance. Users should only disable this option after performing application benchmarking to verify improved performance in their environment.
Adjacent Sector Prefetch:
This BIOS option allows the enabling/disabling of a processor mechanism to fetch the adjacent cache line within a 128-byte sector that contains the data needed due to a cache line miss.
In some cases, setting this option to Disabled may improve performance. Users should only disable this option after performing application benchmarking to verify improved performance in their environment.
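To make the idea of an "adjacent cache line within a 128-byte sector" concrete, here is a small Python sketch of the address arithmetic. It assumes 64-byte cache lines, so that each sector holds exactly two lines. The text above states only the 128-byte sector size, so the line size and the function name adjacent_line_address are assumptions made for illustration.

```python
SECTOR_BYTES = 128  # sector size stated in the description above
LINE_BYTES = 64     # assumption: two cache lines per sector

def adjacent_line_address(addr):
    """Return the base address of the other cache line in addr's sector."""
    # Align down to the start of the cache line that contains addr.
    line_base = addr & ~(LINE_BYTES - 1)
    # Flip the line-select bit to reach the sibling line in the same sector.
    return line_base ^ LINE_BYTES
```

For example, a miss at address 0x1047 falls in the line starting at 0x1040, and the sibling line in the same sector starts at 0x1000.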
Snoop Filter Enabled/Disabled:
The Snoop Filter is designed to reduce system bus utilization
coming from cache misses. On the Intel 5000X and 5400 chipsets,
it is built as a cache structure able to minimize unnecessary
snoop traffic.
When enabled, it can lead to significant memory performance
improvements for several workstation applications
on suitable memory configurations.
Profile Guided Optimization (PGO) consists of 3 phases:
Phase 1: Compile and generate instrumented code in preparation
to gather profiling information (compiler flag -Qprof_gen).
Phase 2: Execute the instrumented code and gather profiling information.
Phase 3: Recompile the code and use the profiling information
for improved optimization (compiler flag -Qprof_use).
The option -Qprof_gen instruments a program for profiling to get the execution count of each basic block. It also creates a new static profile information file (.spi). This flag is used in phase 1 of the Profile Guided Optimizer (PGO) to instruct the compiler to produce code in your object files in preparation for instrumented execution.
The instrumented code
The option -Qprof_use instructs the compiler to use the profiling information from phase 2 of PGO in order to produce a profile-optimized executable (phase 3 of PGO).
It also enables function splitting (option -Qfnsplit) and function grouping during optimization.
Note that there is no way to turn off function grouping if you enable it using this option.
The recompilation with -Qprof_use
On Windows, the -fast option sets the following options:
-O3 -Qipo -Qprec-div- -QxT
Note that programs compiled with the -QxT option
will detect non-compatible processors and generate
an error message during execution.
The -QxT option that is set by the -fast option
cannot be overridden by other command line options.
If you specify -fast and a different processor-specific option,
such as -QxN, the compiler will issue a warning that explains
the -QxT option cannot be overridden.
The optimizations include:
With some optimizations, such as -QxN and -QxB, the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, use this option to disable the floating-point division-to-multiplication optimization. The result is more accurate, with some loss of performance.
If you specify -Qprec-div-, it enables optimizations that give slightly less precise results than full IEEE division.
Default is -Qprec-div
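The accuracy difference between A/B and A * (1/B) can be observed with ordinary IEEE double-precision arithmetic. The Python sketch below is only an illustration of the arithmetic and does not involve the compiler or the -Qprec-div option. The function names are hypothetical, and the operands 49.0 and 49.0 are chosen because the reciprocal of 49 is not exactly representable.

```python
def divide_ieee(a, b):
    """Full IEEE division: the quotient is rounded once."""
    return a / b

def divide_via_reciprocal(a, b):
    """Division rewritten as multiplication by the reciprocal: rounded twice."""
    return a * (1.0 / b)

exact = divide_ieee(49.0, 49.0)             # exactly 1.0
approx = divide_via_reciprocal(49.0, 49.0)  # slightly below 1.0
print(exact, approx, exact == approx)
```

The two results differ in the last bit, which is the kind of discrepancy the description above refers to.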
processor Is the processor
for which you want to target your program.
Here: T Code is optimized by
generating SSSE3, SSE3, SSE2, and SSE instructions for Intel processors.
Code can be optimized for the Intel Core 2 Duo processor family.
The resulting code may contain unconditional use of features
that are not supported on other processors.
This option also enables new optimizations in addition to Intel
processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.
Programs compiled with -QxT will display a fatal run-time error if they are executed on unsupported processors.
This option enables interprocedural optimizations between files. This is also called multifile interprocedural optimization (multifile IPO) or Whole Program Optimization (WPO).
When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.
You cannot specify the names for the object files that are created.
n Is an optional integer that specifies
the number of object files the compiler should create.
The integer must be greater than or equal to 0.
If you do not specify n, the default is 0.
If n is 0, the compiler decides whether to create one or more object files based on an estimate of the size of the application. It generates one object file for small applications, and two or more object files for large applications.
If n is greater than 0, the compiler generates n object files, unless n exceeds the number of source files (m), in which case the compiler generates only m object files.
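The rule for n can be restated as a short Python sketch. It only encodes the description above. The function name ipo_object_file_count is hypothetical, and for n = 0 it returns None because the text says the compiler makes that choice itself from a size estimate that is not specified here.

```python
def ipo_object_file_count(n, source_files):
    """Number of object files described for a given n and m source files."""
    if n < 0:
        raise ValueError("n must be greater than or equal to 0")
    if n == 0:
        # The compiler decides based on its estimate of application size.
        return None
    # n object files, but never more than the number of source files (m).
    return min(n, source_files)
```

For instance, n = 4 with three source files yields three object files, while n = 2 with ten source files yields two.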
This option defaults to ON.
This option also enables:
To use this option, you must also specify -O2 or -O3.
This option enables function splitting if -Qprof-use is also specified. Otherwise, this option has no effect.
It is enabled automatically if you specify -Qprof-use. If you do not specify one of those options, the default is -Qfnsplit-, which disables function splitting but leaves function grouping enabled.
To disable function splitting when you use -Qprof-use, specify -Qfnsplit-.
Default enabled
Default enabled
Default enabled
n = 1
Enables inlining of functions declared with the __inline keyword.
Also enables inlining according to the C++ language.
n = 2
Enables inlining of any function.
However, the compiler decides which functions are inlined.
This option enables interprocedural optimizations and has the same
effect as specifying option -Qip.
Default enabled with n = 2
Default enabled with n = 4096.
Default disabled
Overrides -Os
Default enabled
If your program satisfies the above conditions, setting the -Qansi_alias flag will help the compiler better optimize the program. However, if your program does not satisfy one of the above conditions, the -Qansi_alias flag may lead the compiler to generate incorrect code.
for Fortran
Enables (default) or disables the compiler's assumption that the program adheres to the ANSI Fortran type aliasability rules.
For example, an object of type real cannot be accessed as an integer.
You should see the ANSI Standard for the complete set of rules.
This option only has an effect when the main program is being compiled. It sets the ftz mode for the process.
Default is -Qprefetch- which disables this kind of optimization.
It does not affect variables that have the SAVE attribute or ALLOCATABLE attribute, or variables that appear in an EQUIVALENCE statement or in a common block.
This option may provide a performance gain for your program, but if your program depends on variables having the same value as the last time the routine was invoked, your program may not function properly.
If you want to cause variables to be placed in static memory, specify /Qsave (Windows).
Default is 16.
Default enabled
This option has the same effect as specifying -GX -GR.
-GX Enables C++ exception handling.
-GR Enables C++ Run Time Type Information (RTTI).
-F<n>
<n> is the stack reserve amount.
It can be specified as a decimal integer or by using a C-style convention
for constants (for example, -F0x1000).
Default: The stack size default is chosen by the operating system.
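The two accepted spellings of the stack reserve amount, a decimal integer or a C-style constant, can be illustrated with a short Python sketch. The helper name parse_stack_reserve is hypothetical and this is not how the compiler driver parses its arguments. It covers decimal and 0x-prefixed hexadecimal values only, because Python's int() with base 0 rejects C-style octal constants such as 010.

```python
def parse_stack_reserve(flag):
    """Return the stack reserve amount from a flag such as -F0x1000."""
    if not flag.startswith("-F"):
        raise ValueError("expected a flag of the form -F<n>")
    # Base 0 lets int() accept plain decimal as well as 0x-prefixed hex.
    return int(flag[2:], 0)
```

With this sketch, -F0x1000 and -F4096 both yield the value 4096.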
string Is the name of the tool.
Here: cpp indicates the C++ preprocessor.
options Are one or more comma-separated,
valid options for the designated tool.
Here: --no_wchar_t_keyword is passed to the C++ preprocessor to tell it
that there is no wchar_t keyword.
This flag must be used with Microsoft Visual Studio 2005.
It avoids syntax errors coming from the use of wchar_t in 483.xalancbmk.