SPEC® MPIM2007 Result

Copyright 2006-2010 Standard Performance Evaluation Corporation

AMD, QLogic Corporation, Rackable Systems, IWILL

AMD Emerald Cluster: AMD Opteron CPUs,
QLogic InfiniPath/SilverStorm Interconnect

MPI2007 license: 0018
Test sponsor: QLogic Corporation
Tested by: QLogic Performance Engineering
Test date: May-2007
Hardware Availability: Nov-2006
Software Availability: Jul-2007
Benchmark results graph

Results Table

Benchmark           Base                                                 Peak
            Ranks   Seconds Ratio  Seconds Ratio  Seconds Ratio  Ranks   Seconds Ratio  Seconds Ratio  Seconds Ratio
Results appear in the order in which they were run. Bold underlined text indicates a median measurement.
104.milc 32 625 2.51 746 2.10 733 2.13 32 625 2.51 746 2.10 733 2.13
107.leslie3d 32 2124 2.46 2355 2.22 2030 2.57 32 2004 2.61 2075 2.52 2044 2.55
113.GemsFDTD 32 1317 4.79 1559 4.05 1659 3.80 32 1317 4.79 1559 4.05 1659 3.80
115.fds4 32 836 2.33 891 2.19 843 2.31 32 836 2.33 891 2.19 843 2.31
121.pop2 32 1059 3.90 1112 3.71 1112 3.71 32 1059 3.90 1112 3.71 1112 3.71
122.tachyon 32 1445 1.94 1464 1.91 1330 2.10 32 1445 1.94 1464 1.91 1330 2.10
126.lammps 32 1005 2.90 1032 2.82 1033 2.82 32 1005 2.90 1032 2.82 1033 2.82
127.wrf2 32 1547 5.04 1552 5.02 1557 5.01 32 1547 5.04 1552 5.02 1557 5.01
128.GAPgeofem 32 570 3.62 560 3.69 547 3.77 32 570 3.62 560 3.69 547 3.77
129.tera_tf 32 946 2.93 945 2.93 945 2.93 32 942 2.94 942 2.94 939 2.95
130.socorro 32 867 4.40 846 4.51 865 4.41 32 535 7.13 490 7.79 509 7.50
132.zeusmp2 32 1186 2.62 1171 2.65 1155 2.69 32 1186 2.62 1171 2.65 1155 2.69
137.lu 32 2641 1.39 2619 1.40 2565 1.43 32 2641 1.39 2619 1.40 2565 1.43
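The table above lists three timed runs per benchmark in both base and peak. Under standard SPEC scoring rules (an assumption stated here for illustration, not taken from this report), the median of the three ratios is the reported result for each benchmark, and the suite-level metric is the geometric mean of the per-benchmark median ratios. A minimal sketch using a few rows from the table:

```python
# Sketch, assuming standard SPEC scoring: median of three runs per
# benchmark, geometric mean across benchmarks for the suite metric.
import math
import statistics

# Base ratios for the three runs of a few benchmarks from the table above.
base_ratios = {
    "104.milc":     [2.51, 2.10, 2.13],
    "107.leslie3d": [2.46, 2.22, 2.57],
    "113.GemsFDTD": [4.79, 4.05, 3.80],
}

# The median ratio is the reported ("bold underlined") measurement.
medians = {bm: statistics.median(runs) for bm, runs in base_ratios.items()}

# The suite metric is the geometric mean of all median ratios
# (all 13 benchmarks in a real result; three shown here for illustration).
geomean = math.exp(sum(math.log(r) for r in medians.values()) / len(medians))
print(medians["104.milc"])  # 2.13
```

Note that for 104.milc the median run is the third one (733 seconds, ratio 2.13), which is why the middle value of the three ratios, not the middle column, is selected.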
Hardware Summary
Type of System: Homogeneous
Compute Node: Rackable, IWILL, AMD
Interconnects: QLogic InfiniBand HCAs and switches
Broadcom NICs, Force10 switches
File Server Node: Headnode NFS filesystem
Head Node: Rackable, IWILL, AMD
Other Node: Headnode NFS filesystem
Total Compute Nodes: 8
Total Chips: 16
Total Cores: 32
Total Threads: 32
Total Memory: 64 GB
Base Ranks Run: 32
Minimum Peak Ranks: 32
Maximum Peak Ranks: 32
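The cluster-wide totals above follow from the per-node figures in the compute-node description below (8 nodes, 2 chips per node, 2 cores per chip, 1 thread per core, 8 GB per node). A quick cross-check of that arithmetic:

```python
# Sketch: cross-checking the hardware summary totals against the
# per-node figures from the node description in this report.
nodes = 8
chips_per_node = 2
cores_per_chip = 2
threads_per_core = 1
mem_gb_per_node = 8

total_chips = nodes * chips_per_node            # 16
total_cores = total_chips * cores_per_chip      # 32
total_threads = total_cores * threads_per_core  # 32
total_mem_gb = nodes * mem_gb_per_node          # 64

print(total_chips, total_cores, total_threads, total_mem_gb)  # 16 32 32 64
```

One MPI rank per core (32 ranks on 32 cores) is consistent with the base and peak rank counts reported for every benchmark.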
Software Summary
C Compiler: QLogic PathScale C Compiler 3.0
C++ Compiler: QLogic PathScale C++ Compiler 3.0
Fortran Compiler: QLogic PathScale Fortran Compiler 3.0
Base Pointers: 64-bit
Peak Pointers: 64-bit
MPI Library: QLogic InfiniPath MPI 2.1
Other MPI Info: None
Pre-processors: No
Other Software: None

Node Description: Rackable, IWILL, AMD

Hardware
Number of nodes: 8
Uses of the node: compute, head
Vendor: Rackable Systems, IWILL, AMD
Model: Rackable Systems C1000 chassis, IWILL DK8-HTX motherboard
CPU Name: AMD Opteron 290
CPU(s) orderable: 1-2 chips
Chips enabled: 2
Cores enabled: 4
Cores per chip: 2
Threads per core: 1
CPU Characteristics: --
CPU MHz: 2800
Primary Cache: 64 KB I + 64 KB D on chip per core
Secondary Cache: 1 MB I+D on chip per core
L3 Cache: None
Other Cache: None
Memory: 8 GB (8 x 1 GB DDR400)
Disk Subsystem: 250 GB, SATA
Other Hardware: Nodes custom-built by Rackable Systems. The
Rackable C1000 chassis is half-depth with a 450 W,
48 VDC power supply. Integrated Gigabit Ethernet
is used for admin/filesystem traffic.
Adapter: Intel 82541PI Gigabit Ethernet controller
Number of Adapters: 1
Slot Type: integrated on motherboard
Data Rate: 1 Gbps Ethernet
Ports Used: 1
Interconnect Type: Ethernet
Adapter: QLogic InfiniPath QHT7140
Number of Adapters: 1
Slot Type: HTX
Data Rate: InfiniBand 4x SDR
Ports Used: 1
Interconnect Type: InfiniBand
Software
Adapter: Intel 82541PI Gigabit Ethernet controller
Adapter Driver: Part of Linux kernel modules
Adapter Firmware: None
Adapter: QLogic InfiniPath QHT7140
Adapter Driver: InfiniPath 2.1
Adapter Firmware: None
Operating System: ClusterCorp Rocks 4.2.1
(Based on Red Hat Enterprise Linux 4.0 Update 4)
Local File System: Linux ext3
Shared File System: NFS
System State: Multi-User
Other Software: Sun Grid Engine 6.0

Node Description: Headnode NFS filesystem

Hardware
Number of nodes: 1
Uses of the node: file server, other
Vendor: Tyan
Model: Thunder K8QSD Pro (S4882) motherboard
CPU Name: AMD Opteron 885
CPU(s) orderable: 1-4 chips
Chips enabled: 4
Cores enabled: 8
Cores per chip: 2
Threads per core: 1
CPU Characteristics: --
CPU MHz: 2600
Primary Cache: 64 KB I + 64 KB D on chip per core
Secondary Cache: 1 MB I+D on chip per core
L3 Cache: None
Other Cache: None
Memory: 16 GB (16 x 1 GB DDR400 DIMMs)
Disk Subsystem: 250 GB, SATA, 7200 RPM
Other Hardware: None
Adapter: Broadcom BCM5704C
Number of Adapters: 2
Slot Type: integrated on motherboard
Data Rate: 1 Gbps Ethernet
Ports Used: 2
Interconnect Type: Ethernet
Software
Adapter: Broadcom BCM5704C
Adapter Driver: Part of Linux kernel modules
Adapter Firmware: None
Operating System: ClusterCorp Rocks 4.2.1
(Based on Red Hat Enterprise Linux 4.0 Update 4)
Local File System: Linux ext3
Shared File System: NFS
System State: Multi-User
Other Software: Sun Grid Engine 6.0

General Notes

"Other" uses of this node: login, compile, job submission,
and queuing.
This node is assembled in a 2U chassis with a 700 W ATX 12V power supply.

Interconnect Description: QLogic InfiniBand HCAs and switches

Hardware
Vendor: QLogic
Model: InfiniPath and SilverStorm
Switch Model: QLogic SilverStorm 9120 Fabric Director
Number of Switches: 1
Number of Ports: 144
Data Rate: InfiniBand 4x SDR and InfiniBand 4x DDR
Firmware: 3.4.0.5.2
Topology: Single switch (star)
Primary Use: MPI traffic

General Notes

The data rate between InfiniPath HCAs and SilverStorm switches
is SDR; however, DDR is used for inter-switch links.
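The practical difference between the SDR host links and the DDR inter-switch links can be seen from standard InfiniBand signaling rates (an assumption of 2.5 Gbit/s per lane at SDR with 8b/10b encoding, not a figure from this report):

```python
# Sketch, assuming standard InfiniBand rates: 2.5 Gbit/s per-lane
# signaling at SDR, doubled at DDR, with 8b/10b line encoding.
LANE_SDR_GBPS = 2.5           # per-lane signaling rate, single data rate
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b encoding overhead

def effective_gbps(lanes, rate_multiplier):
    """Usable data rate of an InfiniBand link in Gbit/s."""
    return lanes * LANE_SDR_GBPS * rate_multiplier * ENCODING_EFFICIENCY

print(effective_gbps(4, 1))  # 4x SDR (HCA-to-switch links): 8.0
print(effective_gbps(4, 2))  # 4x DDR (inter-switch links): 16.0
```

So each HCA-to-switch link carries about 8 Gbit/s of payload, while inter-switch links carry twice that, which helps keep the single-switch star topology from bottlenecking on aggregated MPI traffic.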

Interconnect Description: Broadcom NICs, Force10 switches

Hardware
Vendor: Force10
Model: E300
Switch Model: Force10 E300 Gig-E switch
Number of Switches: 1
Number of Ports: 288
Data Rate: 1 Gbps Ethernet
Firmware: N/A
Topology: Single switch (star)
Primary Use: file system traffic

Compiler Invocation

C benchmarks:

 /usr/bin/mpicc -cc=pathcc 

C++ benchmarks:

126.lammps:  /usr/bin/mpicxx -CC=pathCC 

Fortran benchmarks:

 /usr/bin/mpif90 -f90=pathf90 

Benchmarks using both Fortran and C:

 /usr/bin/mpicc -cc=pathcc   /usr/bin/mpif90 -f90=pathf90 

Portability Flags

104.milc:  -DSPEC_MPI_LP64 
115.fds4:  -DSPEC_MPI_LC_TRAILING_DOUBLE_UNDERSCORE   -DSPEC_MPI_LP64 
121.pop2:  -DSPEC_MPI_DOUBLE_UNDERSCORE   -DSPEC_MPI_LP64 
122.tachyon:  -DSPEC_MPI_LP64 
127.wrf2:  -DF2CSTYLE   -DSPEC_MPI_DOUBLE_UNDERSCORE   -DSPEC_MPI_LINUX   -DSPEC_MPI_LP64 
128.GAPgeofem:  -DSPEC_MPI_LP64 
130.socorro:  -fno-second-underscore   -DSPEC_MPI_LP64 
132.zeusmp2:  -DSPEC_MPI_LP64 

Base Optimization Flags

C benchmarks:

 -march=opteron   -Ofast   -OPT:malloc_alg=1 

C++ benchmarks:

126.lammps:  -march=opteron   -O3   -OPT:Ofast   -CG:local_fwd_sched=on 

Fortran benchmarks:

 -march=opteron   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 

Benchmarks using both Fortran and C:

 -march=opteron   -Ofast   -OPT:malloc_alg=1   -O3   -OPT:Ofast   -LANG:copyinout=off 

Peak Optimization Flags

C benchmarks:

104.milc:  basepeak = yes 
122.tachyon:  basepeak = yes 

C++ benchmarks:

126.lammps:  basepeak = yes 

Fortran benchmarks:

107.leslie3d:  -march=opteron   -Ofast   -OPT:unroll_size=256 
113.GemsFDTD:  basepeak = yes 
129.tera_tf:  -march=opteron   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -OPT:unroll_size=256 
137.lu:  basepeak = yes 

Benchmarks using both Fortran and C:

115.fds4:  basepeak = yes 
121.pop2:  basepeak = yes 
127.wrf2:  basepeak = yes 
128.GAPgeofem:  basepeak = yes 
130.socorro:  -march=opteron   -Ofast   -OPT:malloc_alg=1   -O3   -OPT:Ofast   -LANG:copyinout=off   -L/net/files/tools/acml/x86_64/acml3.5.0/pathscale64/lib -lacml 
132.zeusmp2:  basepeak = yes 

Other Flags

C benchmarks:

 -IPA:max_jobs=4 

C++ benchmarks:

126.lammps:  -IPA:max_jobs=4 

Fortran benchmarks:

 -IPA:max_jobs=4 

Benchmarks using both Fortran and C:

 -IPA:max_jobs=4 

The flags file that was used to format this result can be browsed at
http://www.spec.org/mpi2007/flags/MPI2007_flags.20070717.01.html.

You can also download the XML flags source by saving the following link:
http://www.spec.org/mpi2007/flags/MPI2007_flags.20070717.01.xml.