Compaq Tru64 UNIX Switch Disclosure
SPEC CPU2000
Compaq Computer Corporation
Revised 21 December 1999

This SPEC CPU2000 switch disclosure is for Tru64 UNIX (formerly known as Digital UNIX). This document was originally written in November 1999 and will be updated to add new switches used in later submissions. An attempt is made to be cumulative, so some switches listed for earlier submissions might not be used in later submissions.

Switches are given in alphabetical order rather than by product or benchmark. It is hoped that this order will be convenient for readers of the NOTES section of a SPEC CPU2000 disclosure who want to look up a specific command or switch. The collating sequence ignores upper/lower case, hyphens, and the presence of "no" for negation. That is, if you are looking for "-nomumble", try looking under "-mumble".

Note: some switches in this disclosure statement are not used directly, but are generated by other switches (e.g. "-fast").

-align dcommons (Compaq Fortran)
    Aligns COMMON block entities on natural boundaries of up to 8 bytes.

-align sequence (Compaq Fortran)
    Specifies that components of a SEQUENCEd derived type are to be aligned according to the alignment rules set by the user (which by default cause components to be aligned on natural boundaries).

-ansi_alias (Compaq C)
    Directs the compiler to assume the ANSI C aliasing rules.

-ansi_args (Compaq C)
    Tells the compiler that the source code follows all ANSI rules about arguments; that is, either the type of each argument matches the type of the corresponding parameter in the called function, or a function prototype is present so the compiler can automatically perform the expected type conversion.

-arch (Compaq C, Compaq Fortran)
    Generate code that may include instructions newly introduced with the processor generation given as the argument. For example, "ev56" adds byte/word load and store, and "ev6" adds sqrt. See also -tune, below.

-arl=n (KAP C)
    Informs KAP what level of data aliasing may be present in the program:
      0  kapc makes no assumptions about data aliasing.
      1  A pointer will not contain its own address.
      2  No objects represented by function parameters overlap in memory.
      3  Globals, function parameters, and locals form distinct groups.
      4  No aliases for objects.

-assume bigarrays (Compaq Fortran)
    Suppresses run-time checking for distributed small array dimensions, for increased performance if using the -wsf option.

-assume noaccuracy_sensitive (Compaq Fortran)
    Same as -fp_reorder.

-assume nomath_errno (Compaq C)
    Allows the compiler to reorder or combine computations to improve the performance of those math functions that it recognizes as intrinsic functions.

-assume trusted_short_alignment (Compaq C)
    Specifies that this is a strictly conforming ANSI C program with respect to the dereferencing of pointer-to-short variables. This allows the compiler to assume that any short accessed through a pointer is naturally aligned (as the C language requires).

-assume nozsize (Compaq Fortran)
    Omits run-time checking for zero-sized array sections, for increased performance if using the -wsf option.

cc (compiler)
    If the Developer's Toolkit is NOT installed, this command invokes the system C compiler. If the toolkit has been installed, it invokes the compiler in /usr/lib/cmplrs/cc.dtk.

-ckapargs='' (KAP C)
    Pass the switches between the apostrophes to the KAP optimizer.

-Dalloca=__builtin_alloca (Compaq C)
    Portability switch, used for gcc; specifies that the builtin version of alloca be used.

cxx (compiler)
    Invokes the C++ compiler.

-D_INTRINSICS (Compaq C)
    Declares certain functions to be intrinsic. When a function is intrinsic, the compiler is free to generate faster code that provides the same function behavior (but may not actually call the function).

-D_INLINE_INTRINSICS (Compaq C)
    Directs the compiler to inline some of the intrinsic functions, avoiding the overhead of a function call.

-D_FASTMATH (Compaq C)
    Redefines the names of certain common math routines so that faster but slightly less accurate functions are used.

-DALPHA (crafty)
    Portability switch for crafty. Specifies that longs are 64 bits, that we do not need to say "long long" to get 64-bit quantities, and that the architecture is little-endian.

-DSPEC_CPU2000_DUNIX (perlbmk)
    Portability switch for perlbmk; see the source code in benchspec/CINT2000/253.perlbmk/src/spec_config.h for its exact effect. It sets items such as the number of bytes in a long, little-endian byte order, and how to invoke the C preprocessor, and it indicates that "fcntl" is available.

-DSPEC_CPU2000_LP64 (gap, vortex)
    Specifies that longs and pointers are 64 bits.

-DSYS_HAS_CALLOC_PROTO (gap)
    Specifies that the system already provides a prototype for the function calloc.

-DSYS_HAS_IOCTL_PROTO (gap)
    Specifies that the system already provides a prototype for ioctl.

-DSYS_IS_BSD (gap)
    Specifies that the system is compatible with BSD Unix, using conventions such as "/" for directory separation, Unix signals, string concatenation, etc.

f77 (compiler)
    EITHER invokes the f90 compiler with some flags set that increase compatibility with the older f77 standard, OR invokes the older compiler, if the link in /bin/f77 has been set as specified in the release notes. Initial CPU2000 submissions set the link for the older compiler.

f90 (compiler)
    Invokes the f90 compiler.

-fast (Compaq C)
    Provides a single method for turning on a collection of optimizations for increased performance, namely:
      -ansi_alias
      -ansi_args
      -assume nomath_errno
      -assume trusted_short_alignment
      -D_INTRINSICS
      -D_INLINE_INTRINSICS
      -D_FASTMATH
      -float
      -fp_reorder
      -ifo
      -intrinsics
      -O3
      -readonly_strings

-fast (Compaq Fortran)
    Provides a single method for turning on a collection of optimizations for increased performance, namely:
      -align dcommons
      -arch host
      -assume noaccuracy_sensitive
      -math_library fast
      -O4 (the default)
      -tune host
    For f90 and f95, -fast also sets -align sequence, -assume bigarrays, and -assume nozsize.

fdo_pre0 = mkdir /tmp/pb; rm -f /tmp/pb/${baseexe}* (CPU2000 config file)
    Causes the SPEC tools to clean the directories where feedback is accumulated, removing data from any previous compiles.

-fixed (Compaq Fortran)
    Portability switch, used by galgel; indicates that the source code is in fixed (72-column) format.

-fkapargs='...' (KAP Fortran)
    Pass the switches between the apostrophes to the KAP optimizer.

-float (Compaq C)
    Tells the compiler that it is not necessary to promote expressions of type float to type double.

-fp_reorder (Compaq C, Compaq Fortran)
    Allows floating-point operations to be reordered during optimization based on algebraic identities.

+GEMFB (Compaq C)
    Use GEM (i.e. compiler) feedback. This is an abbreviation used to make the notes section simpler, not an actual switch. Look elsewhere in the notes section for the details.

-g3 (Compaq C, Compaq C++, Compaq Fortran)
    Includes symbol information in optimized code.

-ifo (Compaq C)
    Performs inter-file optimizations.

-inline speed (Compaq C, Compaq Fortran)
    Provides inline expansion of function calls even when doing so may significantly increase the size of the program.

kcc (compiler)
    This command invokes the KAP C high-level optimizer and then invokes the Compaq C compiler.

kf77 (compiler)
    This command invokes the KAP Fortran 77 high-level optimizer and then invokes the Fortran 77 compiler. When the f77 compiler is invoked, KAP adds the following switches:
      -fast
      -non_shared (single CPU only)
      -tune host

kf90 (compiler)
    This command invokes the KAP Fortran 90 high-level optimizer and then invokes the Fortran 90 compiler. When the f90 compiler is invoked, KAP adds the following switches:
      -fast
      -non_shared (single CPU only)
      -tune host

-ldxml (library)
    Specifies that the program should be linked with the Compaq extended math library, which includes optimized BLAS functions.

-math_library fast (Compaq Fortran)
    Select math library routines that provide faster performance.
    For certain ranges of input values, the faster routines may not provide a result that is as accurate as that provided by the default routines.

-non_shared (ld)
    Directs the linker to produce a static executable. The output object created by the linker will not use any shared objects during execution.

-O0 through -O5 (Compaq Fortran)
    Fortran's general optimization level:
      O0  disable all optimizations
      O1  local optimizations and common subexpressions
      O2  global optimizations such as code motion, strength reduction, lifetime analysis, and code scheduling
      O3  additional global optimizations that may cost more space, such as loop unrolling and code replication
      O4  inline expansion
      O5  software pipelining and loop transformation

-O0 through -O4 (Compaq C)
    Compaq C's general optimization level:
      O0  disable all optimizations
      O1  local optimizations and common subexpressions;
          global optimizations such as code motion, strength reduction, lifetime analysis, and code scheduling
      O2  additional global optimizations that may cost more space, such as loop unrolling and code replication
      O3  inline expansion
      O4  software pipelining
    NOTE: when kcc is used, optimization levels are effectively one less than stated on the command line, for historical reasons. That is, "kcc -O4" eventually invokes the compiler back end with the same optimization level as would be used by "cc -O3".

-O0 through -O4 (Compaq C++)
    C++ general optimization level:
      O0  no optimization
      O1  optimize for space
      O2  optimize for time
      O3  same as O2
      O4  additional speed optimizations at the expense of space

-o=n (KAP Fortran)
    KAP's general optimization level:
      0  no optimization
      1  induction variables recognized, loop interchanging
      2  lifetime analysis
      3  additional loop interchanging, wraparound variables
      4  loop interchanging around reductions
      5  array expansion

-o=n (KAP C)
    KAP C's general optimization level:
      0  No loop optimization performed.
         Only simple analysis performed.
      1  Simple loop optimization performed.
         Loops distributed to optimize only part of a loop.
      2  Loops in a loop nest optimized.
         Lifetime analysis performed.
         More powerful data dependence tests performed.
      3  Special techniques used to break data dependence cycles.
         Triangular loops recognized.
         Loop interchanging performed to improve memory referencing.
         Special-case data dependence tests used.
      4  Two versions of a loop generated to break a data dependence arc when necessary.
         Exact data dependence tests used.
         Wraparound variables recognized.
      5  Array expansion and loop fusion enabled.

ONESTEP (SPEC CPU2000 config file)
    Setting ONESTEP=YES tells the SPEC tools to build from all sources in one step. For more information, search for "ONESTEP" in the run rules.

-pipeline (Compaq C, Compaq Fortran)
    Enables software pipelining, that is, "wrap around" of loop iterations to reduce latency.

-prof_dir (Compaq C)
    Specifies a location to which the profiling data files (.Counts and .Addrs) are written.

-prof_gen_noopt (Compaq C)
    Generates an executable image that has profiling code added to it and that is not optimized (this may improve profile accuracy).

-prof_use_feedback (Compaq C)
    Uses profiling feedback to improve run-time performance.

-readonly_strings (Compaq C)
    Makes string literals read-only for improved performance.

RM_SOURCES= (SPEC CPU2000 config file)
    Tells the SPEC tools not to use a certain source file, normally because it will be replaced by a routine from the dxml library.

-transform_loops (Compaq Fortran)
    Enables a group of loop transformation optimizations that apply to array references within loops, including loop blocking, distribution, fusion, and interchange.

-tune (Compaq C, Compaq Fortran)
    Generate code that is optimized for a particular CPU model. This switch preferentially tunes for the specified model, but assumes that the code may be run on any processor that implements the instruction set called for in -arch.
    For example, the combination "-tune ev6 -arch ev56" specifies that the code should be scheduled for ev6-class machines while still preserving the ability to run quickly on machines that lack the sqrt instruction. See also -arch, above.

-unroll n (Compaq C, Compaq Fortran)
    Specify the depth of loop unrolling.

-ur= (KAP Fortran) [alternate spelling: -unroll]
    The maximum number of iterations to unroll inner loops.

-v (all compilers)
    Turn on verbose mode, so the compiler driver will print its steps as it goes. Has no effect on the generated executable.

-xtaso_short (Compaq C)
    Directs the compiler to allocate 32-bit pointers by default. You can still use 64-bit pointers, but only through pragmas. Using this switch can cause conflicts between the compiler's assumptions about pointer sizes and the assumptions in the system libraries. Diagnostic messages will be generated at compile time unless the installation option "protect_headers_setup(8)" has been used. (It is, in fact, used in the CPU2000 submissions, as requested by the man page.)