Sun Microsystems SPEC CPU Flags + Sun Blade 6000 Platform Notes

Compilers

Sun Studio 12

  • "cc": Sun Studio C
  • "CC": Sun Studio C++
  • "f90": Sun Studio Fortran

GCC for SPARC Systems V4.2.0 (gccfss).

  • "gcc": invoke compiler for C programs
  • "g++": invoke compiler for C++ programs

Note: these compilers are described together because gccfss uses the same optimizing code generator as Studio 12.

Operating systems: Solaris 10
Copyright:

The text for many of the descriptions below was excerpted from the Sun Studio Compiler Documentation, which is copyright © 2005 Sun Microsystems, Inc. The original documentation can be found at docs.sun.com.

Some material below is quoted from the gccfss website, http://cooltools.sunsource.net/gcc/. Additional information about GCC options may be found at the GNU C documentation website.

Last updated: 13-Mar-2008 jh

Sections

This document contains the following sections:

  • Optimization Flags
  • Portability Flags
  • Compiler Flags
  • Other Flags
  • Forbidden Flags
  • System and Other Tuning Information

Platform Settings

These platform notes are in two sections: a generic section about system tuning and a section which is specific to the Sun Blade 6000 test of March, 2008.

System and process tuning

One or more of the following settings may have been applied to the testbed. If so, the "Platform Notes" section of the report will say so, and the descriptions below explain what these settings mean.

autoup=<n> (Unix /etc/system)
When the file system flush daemon fsflush runs, it writes to disk all modified file buffers that are more than n seconds old.

bufhwm=<n> (Unix /etc/system)
Sets the upper limit of the file system buffer cache. The value of bufhwm is expressed in kilobytes. Alternatively, the limit can be expressed as a percent of total memory by setting bufhwm_pct.

cpu_bringup_set=<n> (Unix /etc/system)
Specifies which processors are enabled at boot time. <n> represents a bitmap of the processors to be brought online.
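
As an illustration of the bitmap, the following Python sketch builds a value from a list of processor IDs. It assumes that bit i of the bitmap corresponds to processor ID i; the function name cpu_bringup_bitmap and the example IDs are hypothetical and are not taken from the tested configuration.

```python
def cpu_bringup_bitmap(cpu_ids):
    """Return an integer with bit i set for every processor ID i in cpu_ids.

    Assumption: bit i of the bitmap stands for processor ID i.
    """
    bitmap = 0
    for cpu_id in cpu_ids:
        if cpu_id < 0:
            raise ValueError("processor IDs must be non-negative")
        bitmap |= 1 << cpu_id
    return bitmap


# Hypothetical example: bring up only processors 0 through 3.
print(hex(cpu_bringup_bitmap([0, 1, 2, 3])))  # prints 0xf
```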

disablecomponent (System Management Services)
This command can be used prior to booting the system for a 1-cpu test. The tester uses disablecomponent to add all other CPUs to the "blacklist", which is a list of components that cannot be used at boot time.

LD_LIBRARY_PATH=<directories> (linker)
LD_LIBRARY_PATH controls the search order for both the compile-time and run-time linkers. The default is usually sufficient, but testers may sometimes set it explicitly (as documented in the notes in the submission) to ensure that the correct versions of libraries are picked up.

LD_PRELOAD=<shared object> (Unix environment variable)
Adds the named shared object to the runtime environment.

MADV=access_lwp and LD_PRELOAD=madv.so.1 (Unix environment variables)
When the madv.so.1 shared object is present in the LD_PRELOAD list, it is possible to provide advice to the system about how memory is likely to be accessed. The advice given in MADV applies to all processes and their descendants. A commonly used value is access_lwp, which means that when memory is allocated, the next process to touch it will be the primary user. Other possible values include sequential, for memory that is used only once and then no longer needed, and access_many, for when many processes will be sharing data.

MPSSHEAP=<size>, MPSSSTACK=<size>, and LD_PRELOAD=mpss.so.1 (Unix environment variables)
When these variables are set, the mpss.so.1 shared object will set the preferred page size for new processes, and their descendants, to the requested sizes for the heap and stack.
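
The MADV and MPSS settings above are both delivered through environment variables plus an LD_PRELOAD entry. The following Python sketch shows one way such an environment might be assembled before launching a program. The helper name with_preload is hypothetical, the page sizes 4M and 64K are illustrative values rather than the ones used in any test, and the sketch assumes that LD_PRELOAD entries are separated by spaces.

```python
import os


def with_preload(env, shared_object, **settings):
    """Return a copy of env with shared_object added to LD_PRELOAD.

    Extra keyword arguments become environment variables (for example
    MADV or MPSSHEAP). Assumption: LD_PRELOAD entries are space-separated.
    """
    new_env = dict(env)
    preload = new_env.get("LD_PRELOAD", "").split()
    if shared_object not in preload:
        preload.append(shared_object)
    new_env["LD_PRELOAD"] = " ".join(preload)
    new_env.update(settings)
    return new_env


# Hypothetical values: the page sizes below are illustrative only.
env = with_preload(os.environ, "madv.so.1", MADV="access_lwp")
env = with_preload(env, "mpss.so.1", MPSSHEAP="4M", MPSSSTACK="64K")
print(env["LD_PRELOAD"])
```

The resulting dictionary could then be passed as the env argument of subprocess.run so that the settings apply to the launched program and its descendants.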

PARALLEL=<n> (Unix environment variable)
If programs have been compiled with -xautopar, this environment variable can be set to the number of processors that programs should use.

segmap_percent=<n> (Unix /etc/system)
This value controls the size of the segmap cache as a percent of total memory. Set this value to help keep the file system cache from consuming memory unnecessarily.

STACKSIZE=<n> (Unix environment variable)
Sets the size of the stack (temporary storage area) for each slave thread of a multithreaded program.

svcadm disable webconsole (Unix, superuser commands)
Turns off the Sun Web Console, a browser-based interface that performs systems management. If it is enabled, system administrators can manage systems, devices and services from remote systems.

tsb_rss_factor=<1> (Unix /etc/system)
Suggests that the size of the TSB (Translation Storage Buffer) may be increased if it is more than 25% (128/512) full. Doing so may reduce TSB traps, at the cost of additional kernel memory.
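
The percentage quoted above can be reproduced with a one-line calculation. This Python sketch assumes, based on the "128/512" in the description, that the setting acts as a numerator over 512; the function name tsb_resize_threshold is hypothetical.

```python
def tsb_resize_threshold(tsb_rss_factor):
    """Fraction of the TSB that must be full before it may be grown.

    Assumption: the setting is a numerator over 512, as in 128/512 = 25%.
    """
    return tsb_rss_factor / 512


print(f"{tsb_resize_threshold(128):.0%}")  # prints 25%
```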

tune_t_fsflushr=<n> (Unix /etc/system)
Controls the number of seconds between runs of the file system flush daemon, fsflush.

ulimit -s <n> (Unix shell)
Sets the stack size to n kbytes, or "unlimited" to allow the stack size to grow without limit.
Note that the "heap" and the "stack" share space; if your application allocates large amounts of memory on the heap, then you may find that the stack limit should not be set to "unlimited". A commonly used setting for SPEC CPU2006 purposes is a stack size of 128MB (131072K).
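
The unit conversion behind the 128MB figure, and a way to inspect the limit currently in effect, can be sketched in Python as follows. The function names are hypothetical; the sketch uses the standard resource module, which reports the stack limit in bytes, and only reads the limit without changing it.

```python
import resource


def mb_to_kb(megabytes):
    """Convert megabytes to the kilobyte units that ulimit -s expects."""
    return megabytes * 1024


def current_stack_limit_kb():
    """Return the soft stack limit in kilobytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // 1024


print(mb_to_kb(128))  # prints 131072
print(current_stack_limit_kb())
```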

Submit on the Sun Blade 6000

For the testing of the Sun Blade 6000 System, jobs are submitted to processors using:

submit = $[top]/config/blade-submit.pl $SPECCOPYNUM "$command" 

In this line, the SPEC tools invoke a Perl procedure which does arithmetic to derive processor numbers from the SPEC copy number. The procedure receives as input the copy number and the command that actually runs the benchmark, and produces as output a file that binds the job to the correct processor; it then runs that file on the selected node with ssh. Here is the complete text of the procedure:

#!/bin/perl

use strict;
use Cwd;

# Particular testbed used today:
my @nodes = qw ( sys115 sys114 sys113
                 sys112 sys111 sys110 
                 sys109 sys108 sys107
                 sys106 );

# Processor description: 
# UltraSPARC T2 has 8 cores, each with 2 integer units, each with 4 threads
my @cores                = qw ( 7 6 5 4 3 2 1 0);  # When assigning,
my @int_units            = qw ( 1 0 );             # ...fill resources from top
my @threads              = qw ( 3 2 1 0);          # ...to bottom.
my $threads_per_int_unit = $#threads + 1;
my $threads_per_core     = $threads_per_int_unit * ($#int_units + 1);
# end of processor description section

my $rundir        = getcwd;
my $copynum       = shift @ARGV;

my $node          =     $copynum % ($#nodes+1);     
my $copy_on_node  = int($copynum / ($#nodes+1)); 

my $core          =     $copy_on_node % ($#cores+1);         
my $copy_on_core  = int($copy_on_node / ($#cores+1));         

my $int_unit         =     $copy_on_core % ($#int_units+1);    
my $copy_on_int_unit = int($copy_on_core / ($#int_units+1));    

my $processor_num = ($cores[$core] * $threads_per_core) + 
                    ($int_units[$int_unit] * $threads_per_int_unit) +
                    $threads[$copy_on_int_unit];

open DOBMK, "> dobmk" or die "Eh?";
print DOBMK "cd $rundir\n";
print DOBMK '/usr/ucb/echo -n "`hostname` " >> pbind.out' , "\n";
print DOBMK "/usr/sbin/pbind -b $processor_num \$\$ >> pbind.out\n"; 
print DOBMK 'sh -c "' . join(' ', @ARGV) . '"' . "\n";
close DOBMK;
system '/usr/bin/ssh', '-n', $nodes[$node], 'sh', "$rundir/dobmk";

The effect of the above procedure is to use ssh (Secure Shell) to submit jobs to the nodes listed at the top, binding a copy of the benchmark to a virtual processor on that node with pbind. (Note: the above arithmetic could have been accomplished more efficiently using more traditional 'awk' and 'expr' methods, but the tester felt that a slight loss of efficiency was balanced by the potential improvement in clarity from the above procedure.)
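
To make the placement arithmetic easier to follow, here is a minimal Python sketch that mirrors the calculation in the Perl procedure above and returns the node and virtual processor number for a given copy number. It is an illustration only, not part of the tested configuration: the function name place_copy and the range check are additions, and the sketch does not write the dobmk file or invoke ssh or pbind.

```python
# Same tables as the Perl procedure: resources are filled from top to bottom.
NODES = ["sys115", "sys114", "sys113", "sys112", "sys111",
         "sys110", "sys109", "sys108", "sys107", "sys106"]
CORES = [7, 6, 5, 4, 3, 2, 1, 0]
INT_UNITS = [1, 0]
THREADS = [3, 2, 1, 0]


def place_copy(copynum):
    """Return (node, processor_num) for a SPEC copy number."""
    threads_per_int_unit = len(THREADS)
    threads_per_core = threads_per_int_unit * len(INT_UNITS)
    capacity = len(NODES) * len(CORES) * threads_per_core
    # Added check (not in the Perl original): reject copies beyond capacity.
    if not 0 <= copynum < capacity:
        raise ValueError(f"copy number must be in 0..{capacity - 1}")

    # Spread copies across nodes first, then cores, then integer units,
    # and finally threads within an integer unit.
    copy_on_node, node = divmod(copynum, len(NODES))
    copy_on_core, core = divmod(copy_on_node, len(CORES))
    copy_on_int_unit, int_unit = divmod(copy_on_core, len(INT_UNITS))

    processor_num = (CORES[core] * threads_per_core
                     + INT_UNITS[int_unit] * threads_per_int_unit
                     + THREADS[copy_on_int_unit])
    return NODES[node], processor_num


print(place_copy(0))    # prints ('sys115', 63)
print(place_copy(10))   # prints ('sys115', 55)
print(place_copy(80))   # prints ('sys115', 59)
print(place_copy(639))  # prints ('sys106', 0)
```

With these tables, the first ten copies each land on a different node, and a node receives a second copy only after every node has one. Within a node, each core receives one copy before any core receives a second.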