Compaq Tru64 UNIX Switch Disclosure
SPEC CPU95
Compaq Computer Corporation
Revised 3 August 1999

This SPEC CPU95 switch disclosure is for Tru64 UNIX (formerly known as Digital UNIX) - other disclosures apply to Alpha/NT, Intel/NT, and NonStop-UX. This document was originally written in October 1997, and it has been updated to add switches used in submissions of August, 1999. An attempt is made to be cumulative, so some switches listed here might not be used in current submissions.

Switches are given in alphabetical order rather than by product or benchmark. It is hoped that this order will be convenient for the reader of the NOTES section of a SPEC CPU95 disclosure who wants to look up a specific command or switch. The collating sequence ignores upper/lower case, hyphens, and the presence of "no" for negation. That is, if you are looking for "-nomumble", try looking under "-mumble".

Note: some switches in this disclosure statement are not used directly, but are generated by other switches (e.g. "-fast").

-ag=a (KAP Fortran) Perform aggressive optimization, pad common blocks and subroutine-local memory to avoid cache line collisions.

-ag=b (KAP Fortran) Redefine the array indices of arrays when doing so would cause cache utilization benefits.

-ag=f (KAP Fortran) Pad arrays as in -ag=b while relaxing the restriction that arrays cannot be passed as actual parameters to other procedures.

-ag=g (KAP Fortran) Extends padding for -ag=abf to pad one dimensional arrays as well as the leading dimension of multi-dimensional arrays.

-ansi_alias (DEC C) Directs the compiler to assume the ANSI C aliasing rules.

-ansi_args (DEC C) Tells the compiler that the source code follows all ANSI rules about arguments; that is, whether the type of an argument matches the type of the parameter in the called function, or whether a function prototype is present so the compiler can automatically perform the expected type conversion.
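
As an illustration of the assumption that -ansi_alias permits, the following C sketch shows code that stays within the ANSI C aliasing rules. The function names scale_and_count and float_bits are hypothetical and do not come from any benchmark or from the compiler documentation; the sketch only shows the kind of source for which the assumption is safe.

```c
#include <string.h>

/* Hypothetical helper: scales *n floats in place and returns *n.
   Under the ANSI C aliasing rules an int object may not be modified
   through a float lvalue, so the stores to f[i] cannot change *n and
   a compiler may load *n once instead of on every iteration. */
int scale_and_count(float *f, const int *n, float factor)
{
    int i;
    for (i = 0; i < *n; i++)
        f[i] *= factor;
    return *n;
}

/* Hypothetical helper: a conforming way to look at the bytes of a
   float. The bytes are copied with memcpy instead of reading the
   float through a pointer of an unrelated type. */
unsigned int float_bits(float x)
{
    unsigned int u = 0;
    memcpy(&u, &x, sizeof u < sizeof x ? sizeof u : sizeof x);
    return u;
}
```

Code that instead reads a float object through an unsigned int pointer falls outside these rules, and its behavior may change when the compiler is told to assume them.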
-arch (DEC C, Digital Fortran) Generate code that may include instructions which are newly introduced with the specified cpu model. For example, "ev56" adds byte/word load and store, and "ev6" adds sqrt. See also -tune, below.

-noarclimit (KAP Fortran) Specifies unlimited data dependence analysis.

-assume trusted_short_alignment (DEC C) Specifies that any short accessed through a pointer is naturally aligned.

-assume whole_program (DEC C) Specifies that there are no occurrences of the address-of operator (&) being applied outside the current compilation unit to extern variables that are declared inside the current compilation unit. This flag is often suitable for use with the -ifo flag, which presents a group of source files to the compiler as a single compilation unit.

-automatic Place local variables on the run-time stack, rather than in static storage.

cc (compiler) If the Developer's Toolkit is NOT installed, this command invokes the system C compiler. If the toolkit has been installed, then it invokes the compiler in /usr/lib/cmplrs/cc.dtk.

cc.alt (compiler) This command invokes the alternative DEC C compiler, which is intended to deliver faster runtime performance.

-conc (KAP Fortran) Restructure the source code for parallel processing.

-D_INTRINSICS (DEC C) Declares certain functions to be intrinsic. When a function is intrinsic, the compiler is free to generate faster code that provides the same function behavior (but may not actually call the function).

-D_INLINE_INTRINSICS (DEC C) Directs the compiler to inline some of the intrinsic functions, avoiding the overhead of a function call.

-D_FASTMATH (DEC C) Redefines the names of certain common math routines so that faster but slightly less accurate functions are used.
-fast (DEC C) Provides a single method for turning on a collection of optimizations for increased performance, namely: -ansi_alias -ansi_args -assume trusted_short_alignment -D_INTRINSICS -D_INLINE_INTRINSICS -D_FASTMATH -float -fp_reorder -ifo -O3 -readonly_strings

-float (DEC C) Tells the compiler that it is not necessary to promote expressions of type float to type double.

-fp_reorder (DEC C) Allows floating-point operations to be reordered during optimization.

-fkapargs='...' (KAP Fortran) Pass the switches between apostrophes to the KAP optimizer.

-fuse (KAP Fortran) Perform loop fusion.

-fuselevel=1 or 2 (KAP Fortran) Perform more aggressive fusion (0=default).

-gn (C, Fortran) Debugging information. g0 deletes symbolic information, g1 retains line numbers, g2 retains all symbols and reduces optimization, and g3 does the same as g2 without inhibiting optimization. g2 produces more accurate debug information than g3.

-gen_feedback (DEC C) Generate accurate profile information for compiler feedback.

-granularity byte (Digital Fortran) Ensures that data of byte size can be accessed from different threads sharing data in memory.

-heaplimit=500 (KAP Fortran) KAP may require large amounts of memory in order to process your source code. The heaplimit switch specifies the maximum size in megabytes that the KAP heap can grow.

-ifo (DEC C) Performs inter-file optimizations.

-inline none (DEC C, Digital Fortran) Do not do inline expansion of function calls.

-inline speed (DEC C, Digital Fortran) Provides inline expansion of function calls even when doing so may significantly increase the size of the program.

-interleave (KAP Fortran) Allow array references in unrolled loops to be interchanged so that references to the same array are adjacent to each other.

-ipa (KAP Fortran) Do interprocedural analysis.
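
To show why reordering floating-point operations, as -fp_reorder allows, can change a result, here is a minimal C sketch. The function names sum_left and sum_right are hypothetical; the two functions add the same three values in different orders, which is the kind of regrouping an optimizer might perform. The sketch assumes IEEE double-precision arithmetic.

```c
/* Hypothetical helper: computes (a + b) + c.
   The volatile intermediate forces the partial sum to be rounded to
   double before the second addition. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Hypothetical helper: computes a + (b + c), the same three values
   grouped differently. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left cancels the two large values first and keeps the 1.0, while sum_right first adds 1.0 to -1e20, where it is lost to rounding, and the final result is 0.0.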
-ipa_optimize=2 (KAP Fortran) Enables a group of interprocedural analysis options useful for optimizing large codes, including -ipa, -ipa_loop_level=3, -ipa_depth=10, -heaplimit=500 and -noarclimit.

-ipa_depth=10 (KAP Fortran) The ipa_depth switch sets the maximum level of subprogram nesting which kapf will attempt to analyze.

-ipa_loop_level=3 (KAP Fortran) The ipa_looplevel switch enables the user to limit inlining to just functions which are referenced in nested loops where the effects of reduced function call overhead or enhanced optimizations will be multiplied.

kf77 (compiler) This command invokes the KAP Fortran 77 high-level optimizer and then invokes the Fortran 77 compiler.

kf90 (compiler) This command invokes the KAP Fortran 90 high-level optimizer and then invokes the Fortran 90 compiler.

ld (linker) This command invokes the linker.

-lexc (ld) Include the exception handling library.

-lsys5 (ld) Links with the system 5 version of system services.

-mc=1500 (KAP Fortran) The minconcurrent switch sets the level of work in a loop above which KAP executes the loop in parallel.

-merge (prof -pixie) Sums the .Counts files and writes the result into a new file.

-non_shared (ld) Directs the linker to produce a static executable. The output object created by the linker will not use any shared objects during execution.

-O0 through -O5 (Digital Fortran) Fortran's general optimization level.
    O0  disable all optimizations
    O1  local optimizations and common subexpressions
    O2  global optimizations such as code motion, strength reduction, lifetime analysis, and code scheduling
    O3  additional global optimizations that may cost more space, such as loop unrolling and code replication
    O4  inline expansion
    O5  software pipelining and loop transformation

-O0 through -O4 (DEC C) DEC C's general optimization level.
    O0  disable all optimizations
    O1  local optimizations and common subexpressions
        global optimizations such as code motion, strength reduction, lifetime analysis, and code scheduling
    O2  additional global optimizations that may cost more space, such as loop unrolling and code replication
    O3  inline expansion
    O4  software pipelining

-o=n (KAP Fortran) KAP's general optimization level.
    0  No optimization
    1  Induction variables recognized, loop interchanging
    2  Lifetime analysis
    3  Additional loop interchanging, wraparound variables
    4  Loop interchanging around reductions
    5  Array expansion

om (postlink optimizer) [location: /usr/lib/cmplrs/cc.alt/om]
-om (Digital Fortran, DEC C) This command (or switch if added to a Fortran or C compile command) invokes the om post-link time optimizer which does optimizations such as "nop" removal, .lita removal, and global pointer repositioning.

-om_ireorg_feedback (om) Uses the pixie-produced information in file.Counts and file.Addrs to reorganize the instructions to reduce cache thrashing.

-om_split_procedures (om) Allows om to break procedures into more than one piece.

-pids (DEC C, pixie) Enables the addition of the process-id to the filename of the basic block counts file (.Counts). This facilitates collecting information from multiple invocations of the pixie output file.

-pipeline (DEC C, Digital Fortran) Enables software pipelining, that is, "wrap around" of loop iterations to reduce latency.

pixie (profiling) Add profiling code to a program.

prof -pixie (profiling) Analyze profile data.

-prof_dir /tmp/prof (DEC C) Specifies a location to which the profiling data files (.Counts and .Addrs) are written.

-prof_gen (DEC C) Generates an executable image that has profiling code added to it.

-prof_gen_noopt (DEC C) Generates an executable image that has profiling code added to it and which is not optimized (this may improve the profile accuracy).

-prof_use_feedback (DEC C) Uses profiling feedback to improve runtime performance.
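
The profiling entries above refer to basic block counts. As a rough illustration of what such counts record, the following C sketch adds counters by hand to a small function. This is not how pixie instruments a program (pixie adds its profiling code to the program itself), and every name here (bb_counts, classify, block_count) is hypothetical.

```c
/* Hypothetical counters, one per basic block of classify(). */
enum { BB_ENTRY, BB_NEGATIVE, BB_NONNEGATIVE, BB_COUNT };

static unsigned long bb_counts[BB_COUNT];

/* Hypothetical function with hand-inserted counter increments at the
   start of each basic block. */
int classify(int x)
{
    bb_counts[BB_ENTRY]++;
    if (x < 0) {
        bb_counts[BB_NEGATIVE]++;
        return -1;
    }
    bb_counts[BB_NONNEGATIVE]++;
    return 1;
}

/* Returns how many times the given block has executed so far. */
unsigned long block_count(int block)
{
    return bb_counts[block];
}
```

Counts like these tell a feedback-directed optimizer which paths run most often, which is the kind of information the -prof_use_feedback and -om_ireorg_feedback entries describe consuming.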
-prof_use_om_feedback (DEC C) Uses profiling feedback to rearrange the resulting image to reduce cache conflicts of the program text. This flag uses the -om postlink optimizer.

protect_headers_setup.sh (DEC C installation option) Ensures that the compiler's assumptions about pointer sizes and data alignments are not in conflict with the default values that were in effect when the system libraries were created.

-r (ld) Retain relocation entries in the output file.

-r=n (KAP Fortran) Roundoff option to specify change in serial roundoff error that is tolerable. The range of values allowed is 0 through 3, from least to most differences allowed.

-readonly_strings (DEC C) Makes string literals read-only for improved performance.

setenv DECFORT_CC cc.alt (Fortran Driver) The Fortran driver does not invoke the linker or om directly; instead it calls the C driver which does these tasks. Setting the environment variable DECFORT_CC tells Fortran where to find the C driver, in this case /usr/lib/cmplrs/cc.alt.

-so=0 (KAP Fortran) Scalar optimizations. Setting this switch to 0 disables optimizations such as code floating out of loops, loop unrolling, and loop peeling.

-speculate all (Digital Fortran, DEC C) Enables speculative code scheduling for all routines in the program. Speculation occurs when a conditionally executed instruction is moved to a position before a test instruction so that the moved instruction is then executed unconditionally.

-std1 (DEC C) Strictly enforces the ANSI C standard and all its prohibitions.

-taso (ld) Directs the linker to load the executable file in the lower 31-bit addressable virtual address range.

tcp_ttl (Digital Unix) Sets the time to live for TCP/IP sockets; default is 60 seconds.

-transform_loops (Digital Fortran) Enables a group of loop transformation optimizations that apply to array references within loops, including loop blocking, distribution, fusion, and interchange.
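
Loop interchange, one of the transformations named under -transform_loops, can be sketched by hand. The example below is written in C rather than Fortran, so the favorable order is row-by-row (C arrays are row-major, Fortran arrays are column-major). The names ROWS, COLS, sum_column_order and sum_row_order are hypothetical; the sketch shows the idea, not what the compiler actually emits.

```c
#define ROWS 4
#define COLS 3

/* Before interchange: the inner loop walks down a column, so
   consecutive iterations touch elements COLS ints apart. */
long sum_column_order(int m[ROWS][COLS])
{
    long total = 0;
    int i, j;
    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            total += m[i][j];
    return total;
}

/* After interchange: the inner loop walks along a row, so consecutive
   iterations touch adjacent memory. The result is unchanged. */
long sum_row_order(int m[ROWS][COLS])
{
    long total = 0;
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            total += m[i][j];
    return total;
}
```

Both functions return the same sum; only the order in which memory is visited differs, which is what makes the transformation attractive for cache behavior on large arrays.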
-tune (DEC C, Digital Fortran) Generate code that is optimized for a particular cpu model. This switch preferentially tunes for the specified model, but assumes that the code may be run on any processor that implements the instruction set called for in -arch. For example, the combination "-tune ev6 -arch ev56" specifies that the code should be scheduled for ev6 class machines while still preserving the ability to run quickly on machines that lack the sqrt instruction. See also -arch, above.

-unroll n (DEC C, DIGITAL Fortran) Specify the depth of loop unrolling.

-unsigned (DEC C) Causes all char declarations to be unsigned char declarations.

-ur=n (KAP Fortran) [alternate spelling: -unroll] The maximum number of iterations to unroll inner loops.

-ur2=n (KAP Fortran) [alternate spelling: -unroll2] The maximum work allowed in an unrolled loop. Work is estimated by counting operands and operators in a loop.

-ur3=n (KAP Fortran) [alternate spelling: -unroll3] The lower limit for unrolling. If there are fewer than n units of work in the loop, the loop will not be unrolled.

-xtaso_short (DEC C) Directs the compiler to allocate 32-bit pointers by default. You can still use 64-bit pointers, but only by the use of pragmas.
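
For readers unfamiliar with the loop unrolling that the -unroll and -ur switches control, the following C sketch unrolls a summation loop by hand to a depth of four. The function names sum_rolled and sum_unrolled4 are hypothetical, and the depth of four is chosen only for illustration; a compiler or KAP chooses its own depth within the limits the switches set.

```c
#include <stddef.h>

/* Original loop: one addition and one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by four: one loop test per four elements, followed by a
   cleanup loop for the zero to three elements left over. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

The unrolled body contains more work per iteration, which is the quantity the -ur2 and -ur3 limits are described as measuring.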