Flag disclosure for the AMD CPU2000 submissions

*********************************************************************

/[no]optimize

Syntax: /optimize[:level], /nooptimize, /Od, /Ox, or /Oxp

The /optimize option controls the level of optimization performed by the compiler. To provide efficient run-time performance, Visual Fortran increases compile time in favor of decreasing run time. If an operation can be performed, eliminated, or simplified at compile time, the compiler does so rather than have it done at run time. Also, the size of the object file usually increases when certain optimizations occur (such as with more loop unrolling and more inlined procedures).

In the visual development environment, specify the Optimization Level in the General or Optimizations Compiler Option Category.

The /optimize options are:

    /optimize:0 or /Od
    /optimize:1
    /optimize:2
    /optimize:3
    /optimize:4, /Ox, and /Oxp
    /optimize:5

/optimize:0 or /Od
Disables nearly all optimizations. This is the default if you specify /debug (with no keyword). Specifying this option causes certain /warn options to be ignored. Specifying /Od sets the /optimize:0 and /math_library:check options.

/optimize:1
Enables local optimizations within the source program unit, recognition of common subexpressions, and expansion of integer multiplication and division (using shifts).

/optimize:2
Enables global optimization. This includes data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling. Specifying /optimize:2 includes the optimizations performed by /optimize:1.

/optimize:3
Enables additional global optimizations that improve speed (at the cost of extra code size).
These optimizations include: Loop unrolling, including instruction scheduling Code replication to eliminate branches Padding the size of certain power-of-two arrays to allow more efficient cache use (see Use Arrays Efficiently) Specifying /optimize:3 includes the optimizations performed by /optimize:1 and /optimize:2. /optimize:4, /Ox, and /Oxp Enables interprocedure analysis and automatic inlining of small procedures (with heuristics limiting the amount of extra code). Specifying /optimize:4 includes the optimizations performed by /optimize:1 /optimize:2, and /optimize:3. For the DF command, /optimize:4 is the default unless you specify /debug (with no keyword). Specifying /Ox sets: /optimize:4, /math_library:fast, and /assume:nodummy_aliases. Specifying /Oxp sets: /optimize:4, /math_library:check, /assume:nodummy_aliases, and /fpconsistency (x86 systems). /optimize:5 On x86 systems, activates the loop transformation optimizations (also set by /transform_loops). The loop transformation optimizations are a group of optimizations that apply to array references within loops. These optimizations can improve the performance of the memory system and can apply to multiple nested loops. Loop transformation optimizations include loop blocking, loop distribution, loop fusion, loop interchange, loop scalar replacement, and outer loop unrolling. You can specify loop transformation optimizations without software pipelining (see /[no]transform_loops). On x86 systems, specifying /optimize:5 activates /transform_loops. To determine whether using /optimize:5 benefits your particular program, you should compare program execution timings for the same program (or subprogram) compiled at levels /optimize:4 and /optimize:5. Specifying /optimize:5 includes the optimizations performed by /optimize:1, /optimize:2, /optimize:3, and /optimize:4. For detailed information on these optimizations, see Optimization Levels: the /optimize Option. 
For information about timing your program, see Analyze Program Performance. To compile your application for efficient run-time performance, see Compile With Appropriate Options and Multiple Source Files.

/fast

Syntax: /fast

The /fast option sets several options that generate optimized code for fast run-time performance. Specifying this option is equivalent to specifying:

    /alignment:(dcommons, records, sequence)
    /architecture:host
    /assume:noaccuracy_sensitive
    /math_library:fast (which changes the default of /check:[no]power)
    /tune:host

In the visual development environment, specify the Generate Most Optimized Code in the Code Generation Compiler Option Category. If you omit /fast, these performance-related options will not be set.

/[no]alignment

Syntax: /alignment[:keyword...], /noalignment, or /Zpn

The /alignment option specifies the alignment of data items in common blocks, record structures, and derived-type structures. The /Zpn option specifies the alignment of data items in derived-type or record structures. The /alignment options are:

/align:[no]commons
The /align:commons option aligns the data items of all COMMON data blocks on natural boundaries up to four bytes. The default is /align:nocommons (unless /fast is specified), which does not align data blocks on natural boundaries. In the visual development environment, specify the Common Element Alignment as 4 in the Fortran Data Compiler Option Category.

/align:dcommons
The /align:dcommons option aligns the data items of all COMMON data blocks on natural boundaries up to eight bytes. The default is /align:nocommons (unless /fast is specified), which does not align data blocks on natural boundaries. Specifying /fast sets /align:dcommons. In the visual development environment, specify the Common Element Alignment as 8 in the Fortran Data Compiler Option Category.
/align:[no]records The /align:records option (the default) requests that components of derived types and fields of records be aligned on natural boundaries up to 8 bytes (for derived types with the SEQUENCE statement, see /align:[no]sequence below). The /align:norecords option requests that components and fields be aligned on arbitrary byte boundaries, instead of on natural boundaries up to 8 bytes. In the visual development environment, specify the Structure Element Alignment in the Fortran Data Compiler Option Category. /align:[no]sequence The /align:sequence option requests that components of derived types with the SEQUENCE statement will obey whatever alignment rules are currently in use (default alignment rules will align unsequenced components on natural boundaries). The default value (unless /fast is specified) is /align:nosequence, which means that components of derived types with the SEQUENCE property will be packed, regardless of whatever alignment rules are currently in use. Specifying /fast sets /align:sequence. In the visual development environment, specify Allow SEQUENCE Types to be Padded for Alignment in the Fortran Data Compiler Option Category. /align:recNbyte or /Zpn The /align:recNbyte or /Zpn options request that fields of records and components of derived types be aligned on the smaller of: The size byte boundary (N) specified. The boundary that will naturally align them. Specifying /align:recNbyte, /Zpn, or /align:[no]records does not affect whether common block fields are naturally aligned or packed. In the visual development environment, specify the Structure Element Alignment in the Fortran Data Compiler Option Category. 
    Specifying        Is the Same as Specifying

    /Zp               /alignment:records or /align:rec8byte
    /Zp1              /alignment:norecords or /align:rec1byte
    /Zp2              /align:rec2byte
    /Zp4              /align:rec4byte
    /alignment        /Zp8 with /align:dcommons, /alignment:all, or /alignment:(dcommons, records)
    /noalignment      /Zp1, /alignment:none, or /alignment:(nocommons,nodcommons,norecords)
    /align:rec1byte   /align:norecords
    /align:rec8byte   /align:records

When you omit the /alignment option, records and components of derived types are naturally aligned, but fields in common blocks are packed. This default is equivalent to:

    /alignment=(nocommons,nodcommons,records,nosequence)

You can also control the alignment of components in records and derived types and data items in common blocks by Using the cDEC$ OPTIONS Directive.

/architecture

Syntax: /architecture:keyword

The /architecture (/arch) option controls the types of processor-specific instructions generated for this program unit. The /arch:keyword option uses the same keywords as the /tune:keyword option.

All processors of a certain architecture type (Alpha or x86) implement a core set of instructions. Certain (more recent) processor versions include additional instruction extensions. Whereas the /tune:keyword option is primarily used by certain higher level optimizations for instruction scheduling purposes, the /arch:keyword option determines the type of machine-code instructions generated for the program unit being compiled.

In the visual development environment, specify the Generate Code For in the Code Generation Compiler Option Category.

For x86 (Intel and AMD) systems, the supported /arch keywords are:

/arch:generic
Generates code (sometimes called blended code) that is appropriate for processor generations for the architecture type in use. This is the default. Programs compiled on an x86 system with the generic keyword will run on all x86 (486 and Pentium series) systems.
/arch:host
Generates code for the processor generation in use on the system being used for compilation. Depending on the host system used on x86 systems, the program may or may not run on other x86 systems:

    Programs compiled on a 486 system with the host keyword will run on all x86 systems.
    Programs compiled on a Pentium (586) system with the host keyword should not be run on 486 systems.
    Programs compiled on a Pentium Pro, Pentium II, or AMD K6 system with the host keyword should not be run on 486 or Pentium systems.
    Programs compiled on a Pentium III system with the host keyword should not be run on 486, Pentium, Pentium Pro, Pentium II, or AMD K6 systems.
    Programs compiled on an AMD K6_2 or AMD K6_III system with the host keyword should not be run on 486, Pentium, Pentium Pro, Pentium II, AMD K6, or Pentium III systems.
    Programs compiled on an AMD Athlon system with the host keyword should not be run on 486, Pentium, Pentium Pro, Pentium II, AMD K6, Pentium III, AMD K6_2, or AMD K6_III systems.

/arch:p5
Generates code for the Pentium processor systems. Programs compiled with the p5 keyword will run correctly on Pentium, Pentium Pro, Pentium II, AMD K6, and higher processors, but should not be run on 486 processors.

/arch:p6
Generates code for the Pentium Pro, Pentium II, and AMD K6 processor systems only. Programs compiled with the p6 or k6 keyword will run correctly on Pentium II, AMD K6, Pentium III, and higher processors, but should not be run on 486 or Pentium processors.

/arch:k6
Generates code for the AMD K6 (same as Pentium II systems) processor systems only. Programs compiled with the k6 or p6 keyword will run correctly on Pentium II, AMD K6, Pentium III, and higher processors, but should not be run on 486 or Pentium processors.

/arch:p6p
Generates code for the Pentium III, AMD K6_2, and AMD K6_III processor systems only.
Programs compiled with the p6p keyword will run correctly on Pentium III, AMD K6_2, AMD K6_III, and higher processors, but should not be run on 486, Pentium, Pentium Pro, or Pentium II (same as AMD K6) processors.

/arch:k6_2
Generates code for the AMD K6_2 and AMD K6_III processor systems. Programs compiled with the k6_2 keyword will run correctly on AMD K6_2, AMD K6_III, and AMD Athlon processors, but should not be run on 486, Pentium, Pentium Pro, Pentium II (same as AMD K6), or Pentium III processors.

/arch:k7
Generates code for the AMD Athlon processor systems only. Programs compiled with the k7 keyword will run correctly on AMD Athlon processors, but should not be run on 486, Pentium, Pentium Pro, Pentium II (same as AMD K6), Pentium III, AMD K6_2, or AMD K6_III processors.

Other processors (not listed) that have instruction-level compatibility with the processors listed above will have results similar to those processors.

Specifying /fast sets /arch:host. For information about timing program execution, see Analyze Program Performance.

/assume

Syntax: /assume:keyword

The /assume option specifies assumptions made by the Fortran syntax analyzer, optimizer, and code generator. These option keywords are:

    /assume:[no]accuracy_sensitive
    /assume:[no]buffered_io
    /assume:[no]byterecl
    /assume:[no]dummy_aliases
    /assume:[no]minus0
    /assume:[no]source_include
    /assume:[no]underscore

The /assume options are:

/assume:[no]accuracy_sensitive
Specifying /assume:noaccuracy_sensitive allows the compiler to reorder code based on algebraic identities (inverses, associativity, and distribution) to improve performance. In the visual development environment, specify Allow Reordering of Floating-Point Operations in the Optimizations Compiler Option Category. The numeric results can be slightly different from the default (/assume:accuracy_sensitive) because of the way intermediate results are rounded.
Numeric results with /assume:noaccuracy_sensitive are not categorically less accurate. They can produce more accurate results for certain floating-point calculations, such as dot product summations. For example, the following expressions are mathematically equivalent but may not compute the same value using finite precision arithmetic. X = (A + B) - C X = A + (B - C) If you omit /assume:noaccuracy_sensitive and omit /fast, the compiler uses a limited number of rules for calculations, which might prevent some optimizations. If you specify /assume:noaccuracy_sensitive, or if you specify /fast and omit /assume:accuracy_sensitive, the compiler can reorder code based on algebraic identities to improve performance. For more information on /assume:noaccuracy_sensitive, see Arithmetic Reordering Optimizations. /assume:[no]buffered_io The /assume:buffered_io option controls whether records are written (flushed) to disk as each record is written (default) or accumulated in the buffer. For disk devices, /assume:buffered_io (or the equivalent OPEN statement BUFFERED='YES' specifier) requests that the internal buffer will be filled, possibly by many record output statements (WRITE), before it is written to disk by the Fortran run-time system. If a file is opened for direct access, I/O buffering will be ignored. Using buffered writes usually makes disk I/O more efficient by writing larger blocks of data to the disk less often. However, if you specified /assume:buffered_io or BUFFERED='YES', records not yet written to disk may be lost in the event of a system failure. The default is BUFFERED='NO' and /assume:nobuffered_io for all I/O, in which case, the Fortran run-time system empties its internal buffer for each WRITE (or similar record output statement). The OPEN statement BUFFERED specifier takes precedence over the /assume:[no]buffered_io option. In the visual development environment, specify the Enable I/O Buffering in the Optimizations Compiler Option Category. 
For more information on /assume:buffered_io, see Efficient Use of Record Buffers and Disk I/O. /assume:[no]byterecl The /assume:byterecl option applies only to unformatted files. In the visual development environment, specify the Use Bytes as Unit for Unformatted Files in the Fortran Data Compiler Option Category. Specifying the /assume:byterecl option: Indicates that the units for an explicit OPEN statement RECL specifier value are in bytes. Forces the record length value returned by an INQUIRE by output list to be in byte units. Specifying /assume:nobyterecl indicates that the units for RECL values with unformatted files are in four-byte (longword) units. This is the default. /assume:[no]dummy_aliases Specifying the /assume:dummy_aliases option requires that the compiler assume that dummy (formal) arguments to procedures share memory locations with other dummy arguments or with variables shared through use association, host association, or common block use. The default is /assume:nodummy_aliases. In the visual development environment, specify Enable Dummy Argument Aliasing in the Fortran Data (or Optimizations) Compiler Option Category. These program semantics do not strictly obey the Fortran 90 Standard and they slow performance. If you omit /assume:dummy_aliases, the compiler does not need to make these assumptions, which results in better run-time performance. However, omitting /assume:dummy_aliases can cause some programs that depend on such aliases to fail or produce wrong answers. You only need to compile the called subprogram with /assume:dummy_aliases. If you compile a program that uses dummy aliasing with /assume:nodummy_aliases in effect, the run-time behavior of the program will be unpredictable. In such programs, the results will depend on the exact optimizations that are performed. In some cases, normal results will occur; however, in other cases, results will differ because the values used in computations involving the offending aliases will differ. 
For more information, see Dummy Aliasing Assumption. /assume:[no]minus0 This option controls whether the compiler uses Fortran 95 standard semantics for the IEEE floating-point value of -0.0 (minus zero) in the SIGN intrinsic, if the processor is capable of distinguishing the difference between -0.0 and +0.0. The default is /assume:nominus0, which uses Fortran 90 and FORTRAN 77 semantics where the value -0.0 or +0.0 in the SIGN function is treated as 0.0. To request Fortran 95 semantics to allow use of the IEEE value -0.0 in the SIGN intrinsic, specify /assume:minus0. In the visual development environment, specify Enable IEEE Minus Zero Support in the Floating Point Compiler Option Category. /assume:[no]source_include This option controls the directory searched for module files specified by a USE statement or source files specified by an INCLUDE statement: Specifying /assume:source_include requests a search for module or include files in the directory where the source file being compiled resides. This is the default. Specifying /assume:nosource_include requests a search for module or include files in the current (default) directory. In the visual development environment, specify the Default INCLUDE and USE Paths in the Preprocessor Compiler Option Category. /assume:[no]underscore Specifying /assume:underscore option controls the appending of an underscore character to external user-defined names: the main program name, named COMMON, BLOCK DATA, and names implicitly or explicitly declared EXTERNAL. The name of blank COMMON remains _BLNK__, and Fortran intrinsic names are not affected. In the visual development environment, specify Append Underscore to External Names in the External Procedures (or Fortran Data) Compiler Option Category. Specifying /assume:nounderscore option does not append an underscore character to external user-defined names. This is the default. 
For example, the following command requests the noaccuracy_sensitive and nosource_include keywords and accepts the defaults for the other /assume keywords:

    df /assume:(noaccuracy_sensitive,nosource_include) testfile.f90

/math_library

Syntax: /math_library:keyword

The /math_library option specifies whether argument checking of math routines is done on x86 systems and the type of math library routines used on Alpha systems. In the visual development environment, specify the Math Library in the Optimizations (or Code Generation) Compiler Option Category. The /math_library options are /math_library:fast and /math_library:check:

/math_library:fast
On x86 systems, /math_library:fast improves performance by not checking the arguments to the math routines. Using /math_library:fast makes it difficult to trace the cause of unexpected exceptional values. On x86 systems, /math_library:fast does not change the accuracy of calculated floating-point numbers.

/math_library:check
On x86 systems, /math_library:check validates the arguments to and results from calls to the Fortran math routines. This provides slower run-time performance than /math_library:fast on x86 systems, but with earlier detection of exceptional values. This is the default on x86 systems.

/tune

Syntax: /tune:keyword

The /tune option specifies the type of processor-specific machine code instruction tuning for implementations of the processor architecture in use (either x86 or Alpha).

Tuning for a specific implementation can improve run-time performance; it is also possible that code tuned for a specific processor may run slower on another processor. Regardless of the /tune:keyword option you use, the generated code runs correctly on all implementations of the processor architecture.

If you omit /tune:keyword, /tune:generic is used. In the visual development environment, specify the Optimize For in the Optimizations Compiler Option Category.
The /tune keywords have meanings specific to x86 systems or Alpha systems. For x86 (Intel and AMD) systems, the /tune keywords are:

/tune:generic
Generates and schedules code (sometimes called blended code) that will execute well for all x86 systems. This provides generally efficient code for those applications where all x86 processor generations are likely to be used. This is the default.

/tune:host
Generates and schedules code optimized for the processor type in use on the x86 system being used for compilation.

/tune:p5 (x86 only)
Generates and schedules code optimized for the Pentium (586) processor systems.

/tune:p6 (x86 only)
Generates and schedules code optimized for Pentium Pro, Pentium II, and AMD K6 processor systems.

/tune:k6 (x86 only)
Generates and schedules code optimized for AMD K6, Pentium Pro, and Pentium II processor systems (/tune:p6 and /tune:k6 are the same).

/tune:p6p (x86 only)
Generates and schedules code optimized for Pentium III, AMD K6_2, and AMD K6_III processor systems.

/tune:k6_2 (x86 only)
Generates and schedules code optimized for AMD K6_2 and AMD K6_III processor systems.

/tune:k7 (x86 only)
Generates and schedules code optimized for AMD Athlon processor systems.

Specifying /fast sets /tune:host. For more information about this option, see Requesting Optimized Code for a Specific Processor Generation. For information about timing program execution, see Analyze Program Performance. To control the processor-specific type of machine-code instructions being generated, see the /architecture:keyword option.

*********************************************************************

Here are the relevant sections of the Intel User's Guide.

Processor-Specific Instruction Support (-Qx[i|M|K] and -Qax[i|M|K])

The -Qx[i|M|K] and -Qax[i|M|K] options provide support to generate code that is specific to processor-instruction extensions. The processors and features provided by each extension are listed in the Processor Extension Options table.
Processor Extension Options

    Extension   Processors and Features Provided by the Extension

    i           Pentium Pro and Pentium II processors, which use the CMOV and FCMOV instructions
    M           Pentium brand processors with MMX technology instructions
    K           Pentium III processor with the Streaming SIMD Extensions, which implies the i and M instruction sets as well

Exclusive Specialized Code with -Qx[i|M|K]

Use -Qx[i|M|K] to specify the minimum extensions required to exist on a processor to enable execution of your application. The following example compiles the program myprog.cpp, using the i extension:

    prompt> icl -O2 -G6 -Qxi myprog.cpp

The resulting program, myprog.exe, might not execute on a Pentium processor, but will execute on Pentium Pro, Pentium II, and Pentium III processors. If a program compiled with -Qx[i|M|K] is executed on a processor that lacks the specified extensions, it can fail with an illegal instruction exception or display other unexpected behavior. See the following table, Processor Instructions Optimization Matrix for -Qx[i|M|K], for suggested processor optimization combinations.

Processor Instructions Optimization Matrix for -Qx[i|M|K]

    To use instructions    Optimized to the architecture of these processors:
    available on:          Pentium   Pentium      Pentium Pro   Pentium II    Pentium III
                                     with MMX

    Pentium                -G5       -G5          -G6           -G6           -G6
    Pentium with MMX       N/A       -G5, -QxM    -G6           -G6, -QxM     -G6, -QxM
    Pentium Pro            N/A       N/A          -G6, -Qxi     -G6, -Qxi     -G6, -Qxi
    Pentium II             N/A       N/A          N/A           -G6, -QxiM    -G6, -QxiM
    Pentium III            N/A       N/A          N/A           N/A           -G6, -QxK

Automated Processor Support with -Qax[i|M|K]

Use -Qax[i|M|K] to enable automatic code specialization for specific processor extensions. The compiler generates code to detect, at run time, the extensions supported by the processor. Specialized code is executed if the applicable extensions are supported by the current processor.
The compiler also generates an equivalent, but likely slower, version of generic code. Rounding Control Option (-Qrcd) The Intel compiler uses the -Qrcd option to improve the performance of code that requires floating-point-to-integer conversions. The optimization is obtained by controlling the change of the rounding mode. The system default floating point rounding mode is round-to-nearest. This means that values are rounded during floating point calculations. However, the C language requires floating point values to be truncated when a conversion to an integer is involved. To do this, the compiler must change the rounding mode to truncation before each floating point to integer conversion and change it back afterwards. The -Qrcd option disables the change to truncation of the rounding mode for all floating point calculations, including floating point to integer conversions. Turning on this option can improve performance, but floating point conversions to integer will not conform to C semantics. When to Use -Op The -Op option restricts optimization to maintain declared precision and to ensure that floating-point arithmetic conforms more closely to the ANSI and IEEE standards. For most programs, specifying this option adversely affects performance. If you are not sure whether your application needs this option, try compiling and running your program both with and without it to evaluate the effects on performance versus precision. Specifying this option has the following effects on program compilation: User variables declared as floating-point types are not assigned to registers. Whenever an expression is spilled, it is spilled as 80 bits (extended precision), not 64 bits (double precision). Floating-point arithmetic comparisons conform to IEEE 754 except for NaN behavior. The exact operations specified in the code are performed. For example, division is never changed to multiplication by the reciprocal. 
The compiler performs floating-point operations in the order specified without reassociation. The compiler does not perform the constant-folding optimization on floating-point values. Constant folding also eliminates any multiplication by 1, division by 1, and addition or subtraction of 0. For example, code that adds 0.0 to a number is executed exactly as written. Compile-time floating-point arithmetic is not performed, so that floating-point exceptions are also maintained. Floating-point operations conform to ANSI C. When assignments to type float and double are made, the precision is rounded from 80 bits (extended) down to 32 bits (float) or 64 bits (double). When you do not specify -Op, the extra bits of precision are not always rounded before the variable is reused. The -Oi- option, which disables inline function expansion, is used. The -Oi- and -Op options are active by default when you choose the -Za (strict ANSI C conformance) option.

/Qipo_wp

Enables a higher level of interprocedural (ip) optimization that first verifies that the whole-program optimizations listed below are possible. These optimizations only happen at link time, when it is known that an executable is generated. If the conditions for the listed optimizations are not met, the link-time compilation will just do the equivalent of -Qipo.

Whole-program optimizations done at link time:

- Data alignment within common blocks
- Data layout within common blocks
- Elimination of external function not called
- Interprocedural constant propagation
- No alternate entry needed for stack aligned external function entries

/Ow

Assumes no aliasing within the program but assumes aliasing across function calls. This switch tells the compiler that no aliasing occurs within function bodies but might occur across function calls. After each function call, pointer variables must be reloaded from memory.