SPEC® MPIM2007 Result

Copyright 2006-2010 Standard Performance Evaluation Corporation

Dell, QLogic, ClusterVision,

U. of Cambridge HPC Cluster Darwin,
QLogic InfiniBand Interconnect

SPECmpiM_peak2007 = Not Run

MPI2007 license: 0018
Test date: May-2007
Test sponsor: QLogic Corporation
Hardware Availability: Jul-2006
Tested by: QLogic Performance Engineering
Software Availability: Feb-2007
Benchmark results graph

Results Table

Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

Base results (Peak was not run, so the Peak columns are empty):

Benchmark      Ranks  Seconds  Ratio  Seconds  Ratio  Seconds  Ratio
104.milc         512     40.3   38.8     40.0   39.1     40.1   39.0
107.leslie3d     512      144   36.1      139   37.5      137   38.0
113.GemsFDTD     512      592   10.7      601   10.5      596   10.6
115.fds4         512     51.8   37.7     50.1   38.9     70.9   27.5
121.pop2         512      198   20.8      195   21.2      196   21.0
122.tachyon      512     65.9   42.4     66.2   42.2     66.8   41.9
126.lammps       512      221   13.2      221   13.2      221   13.2
127.wrf2         512      125   62.4      125   62.3      140   55.7
128.GAPgeofem    512     28.9   71.5     28.8   71.8     29.4   70.2
129.tera_tf      512      117   23.6      116   23.8      125   22.2
130.socorro      512      133   28.7      135   28.4      132   29.0
132.zeusmp2      512     77.3   40.1     71.2   43.6     70.4   44.1
137.lu           512     53.5   68.7     52.8   69.6     67.9   54.1
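
As a rough cross-check of the table above, the median ratio per benchmark can be recomputed from the three runs. The sketch below also takes the geometric mean of those medians, on the assumption that the overall base metric is formed that way (the usual SPEC convention). The names `BASE_RATIOS`, `median_ratios` and `geometric_mean` are illustrative only, and because the inputs are the rounded ratios printed here, the result is an approximation and not the official score.

```python
from math import exp, log
from statistics import median

# Base ratios copied from the Results Table, three runs per benchmark.
BASE_RATIOS = {
    "104.milc": (38.8, 39.1, 39.0),
    "107.leslie3d": (36.1, 37.5, 38.0),
    "113.GemsFDTD": (10.7, 10.5, 10.6),
    "115.fds4": (37.7, 38.9, 27.5),
    "121.pop2": (20.8, 21.2, 21.0),
    "122.tachyon": (42.4, 42.2, 41.9),
    "126.lammps": (13.2, 13.2, 13.2),
    "127.wrf2": (62.4, 62.3, 55.7),
    "128.GAPgeofem": (71.5, 71.8, 70.2),
    "129.tera_tf": (23.6, 23.8, 22.2),
    "130.socorro": (28.7, 28.4, 29.0),
    "132.zeusmp2": (40.1, 43.6, 44.1),
    "137.lu": (68.7, 69.6, 54.1),
}


def median_ratios(ratios):
    """Return the median of the three run ratios for each benchmark."""
    return {name: median(runs) for name, runs in ratios.items()}


def geometric_mean(values):
    """Geometric mean computed in log space to avoid overflow."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


medians = median_ratios(BASE_RATIOS)
# Assumption: the overall metric is the geometric mean of the medians.
approx_overall = geometric_mean(medians.values())

if __name__ == "__main__":
    for name, value in medians.items():
        print(f"{name:<14} median ratio {value}")
    print(f"approximate geometric mean: {approx_overall:.1f}")
```

Using the median means a single slow run, such as the third run of 115.fds4 or 137.lu, does not affect the per-benchmark figure.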
Hardware Summary
Type of System: Homogeneous
Compute Node: Dell PowerEdge 1950
Interconnects: QLogic InfiniBand HCAs and switches
Ethernet Network for File Server Access
File Server Node: Dell PowerVault MD1000
Head Node: Dell PowerEdge 1950
Total Compute Nodes: 128
Total Chips: 256
Total Cores: 512
Total Threads: 512
Total Memory: 1 TB
Base Ranks Run: 512
Minimum Peak Ranks: --
Maximum Peak Ranks: --
Software Summary
C Compiler: QLogic PathScale C Compiler 3.0
C++ Compiler: QLogic PathScale C++ Compiler 3.0
Fortran Compiler: QLogic PathScale Fortran Compiler 3.0
Base Pointers: 64-bit
Peak Pointers: 64-bit
MPI Library: QLogic InfiniPath MPI 2.0
Other MPI Info: None
Pre-processors: No
Other Software: None

Node Description: Dell PowerEdge 1950

Hardware
Number of nodes: 128
Uses of the node: compute, head
Vendor: Dell
Model: Dell PowerEdge 1950
CPU Name: Intel Xeon 5160
CPU(s) orderable: 1-2 chips
Chips enabled: 2
Cores enabled: 4
Cores per chip: 2
Threads per core: 1
CPU Characteristics: 1333 MHz system bus
CPU MHz: 3000
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 4 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 8 GB (8 x 1 GB PC2-5300F)
Disk Subsystem: SAS, 73 GB, 15000 RPM
Other Hardware: None
Adapter: QLogic InfiniPath QLE7140
Number of Adapters: 1
Slot Type: PCIe x8
Data Rate: InfiniBand 4x SDR
Ports Used: 1
Interconnect Type: InfiniBand
Software
Adapter: QLogic InfiniPath QLE7140
Adapter Driver: InfiniPath 2.0
Adapter Firmware: None
Operating System: ClusterVisionOS 2.1
Based on Scientific Linux SL release 4.3
(Beryllium)
Local File System: Linux/ext3
Shared File System: NFS
System State: Multi-User
Other Software: Torque 2.1.2
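
The per-node figures above can be checked against the totals in the Hardware Summary. The following is a minimal sketch of that arithmetic; the function name `cluster_totals` is hypothetical, and the conversion assumes 1 TB = 1024 GB.

```python
# Per-node figures taken from the compute node description.
NODES = 128
CHIPS_PER_NODE = 2
CORES_PER_CHIP = 2
THREADS_PER_CORE = 1
MEMORY_GB_PER_NODE = 8


def cluster_totals(nodes, chips_per_node, cores_per_chip,
                   threads_per_core, memory_gb_per_node):
    """Multiply per-node figures out to cluster-wide totals."""
    chips = nodes * chips_per_node
    cores = chips * cores_per_chip
    threads = cores * threads_per_core
    # Assumption: binary units, 1 TB = 1024 GB.
    memory_tb = nodes * memory_gb_per_node / 1024
    return {"chips": chips, "cores": cores,
            "threads": threads, "memory_tb": memory_tb}


totals = cluster_totals(NODES, CHIPS_PER_NODE, CORES_PER_CHIP,
                        THREADS_PER_CORE, MEMORY_GB_PER_NODE)

if __name__ == "__main__":
    print(totals)
```

With 512 base ranks on 512 cores, the run places one MPI rank on each core.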

Node Description: Dell PowerVault MD1000

Hardware
Number of nodes: 1
Uses of the node: file server
Vendor: Dell
Model: Dell PowerEdge 1950
CPU Name: Intel Xeon 5160
CPU(s) orderable: 1-2 chips
Chips enabled: 2
Cores enabled: 4
Cores per chip: 2
Threads per core: 1
CPU Characteristics: 1333 MHz system bus
CPU MHz: 3000
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 4 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 4 GB (4 x 1 GB PC2-5300F)
Disk Subsystem: 13.5 TB: 3 x 15 x 300 GB, SAS, 10000 RPM
3 Dell PowerVault MD1000 disk arrays, each with
15 disks.
Other Hardware: None
Adapter: Chelsio T310 10GBASE-SR RNIC (rev 3)
Number of Adapters: 1
Slot Type: PCIe x8 MSI-X
Data Rate: 10 Gbps Ethernet
Ports Used: 1
Interconnect Type: Ethernet
Software
Adapter: Chelsio T310 10GBASE-SR RNIC (rev 3)
Adapter Driver: cxgb3 1.0.078
Adapter Firmware: T 3.3.0
Operating System: ClusterVisionOS 2.1
Based on Scientific Linux SL release 4.3
(Beryllium)
Local File System: Linux/ext3
Shared File System: NFS
System State: Multi-User
Other Software: None
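
The 13.5 TB figure for the file server's disk subsystem follows from the array layout. A minimal sketch of the arithmetic, assuming decimal units (1 TB = 1000 GB) and raw capacity before any RAID or file system overhead; `raw_capacity_tb` is an illustrative name:

```python
# Figures from the file server's Disk Subsystem entry.
ARRAYS = 3
DISKS_PER_ARRAY = 15
DISK_GB = 300


def raw_capacity_tb(arrays, disks_per_array, disk_gb):
    """Raw capacity in TB, assuming 1 TB = 1000 GB."""
    return arrays * disks_per_array * disk_gb / 1000


capacity_tb = raw_capacity_tb(ARRAYS, DISKS_PER_ARRAY, DISK_GB)

if __name__ == "__main__":
    print(f"{capacity_tb} TB raw")
```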

General Notes

A separate node handling login and resource management
is not listed because it is not performance-related.

Interconnect Description: QLogic InfiniBand HCAs and switches

Hardware
Vendor: QLogic
Model: InfiniPath adapters and SilverStorm switches
Switch Model: QLogic SilverStorm 9080 Fabric Director
(InfiniBand switch)
Number of Switches: 2
Number of Ports: 96
Data Rate: InfiniBand 4x SDR and InfiniBand 4x DDR
Firmware: 3.4.0.1.3
Switch Model: QLogic SilverStorm 9240 InfiniBand switch
Number of Switches: 1
Number of Ports: 288
Data Rate: InfiniBand 4x SDR and InfiniBand 4x DDR
Firmware: 3.4.0.1.3
Topology: Constant Bisectional Bandwidth, Fat-Tree,
Max 5 switch-chip hops.
Primary Use: MPI traffic

General Notes

Two CUs (Computational Units, 65 nodes each) were involved, so
two SilverStorm 9080 switches and the 9240 core switch were
used in this run.
The data rate between InfiniPath HCAs and SilverStorm switches
is SDR. However, DDR is used for inter-switch links.
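
To put the SDR and DDR link rates in numbers, the sketch below derives the signaling and data rates of a 4x link. It relies on the standard InfiniBand figures of 2.5 Gbit/s per lane for SDR, 5 Gbit/s per lane for DDR, and 8b/10b line encoding; these values and the helper name `link_rates_gbps` are not stated in this report.

```python
# Per-lane signaling rates in Gbit/s (standard InfiniBand values,
# not taken from this report).
LANE_SIGNALING_GBPS = {"SDR": 2.5, "DDR": 5.0}
LANES_4X = 4


def link_rates_gbps(speed, lanes=LANES_4X):
    """Return (signaling, data) rates in Gbit/s for a link.

    The data rate accounts for 8b/10b encoding, in which 8 of every
    10 transmitted bits carry payload.
    """
    signaling = LANE_SIGNALING_GBPS[speed] * lanes
    data = signaling * 8 / 10
    return signaling, data


if __name__ == "__main__":
    for speed in ("SDR", "DDR"):
        signaling, data = link_rates_gbps(speed)
        print(f"4x {speed}: {signaling} Gbit/s signaling, "
              f"{data} Gbit/s data")
```

Under these assumptions each inter-switch DDR link carries twice the data of a host-to-switch SDR link.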

Interconnect Description: Ethernet Network for File Server Access

Hardware
Vendor: Chelsio, Nortel
Model: Chelsio T310 adapters and Nortel 5530, 5510 and 8610
switches
Switch Model: Nortel Ethernet Routing Switch 5510-24T
Number of Switches: 1
Number of Ports: 24
Data Rate: 1 Gbps Ethernet
Firmware: 1.0.0.16
Switch Model: Nortel Ethernet Routing Switch 5510-48T
Number of Switches: 3
Number of Ports: 48
Data Rate: 1 Gbps Ethernet
Firmware: 1.0.0.16
Switch Model: Nortel Ethernet Routing Switch 5530-24TFD
Number of Switches: 2
Number of Ports: 26
Data Rate: 1 Gbps Ethernet (24 ports) and 10 Gbps Ethernet
(2 ports)
Firmware: 4.2.0.12
Switch Model: Nortel Passport 8610 switch 4.1.0.0
Number of Switches: 1
Number of Ports: 24
Data Rate: 10 Gbps Ethernet
Firmware: Optivity Switch Manager version 4.1
Topology: Three CUs are connected by six Ethernet Routing
Switches (5530-24TFD, 5510-24T and 5510-48T) arranged in a
ring. Each of the two 5530-24TFD switches is connected
to the Nortel Passport 8610 switch through two
10 Gbps ports. See Slide 10 of
NortelEthernetSwitchDiagram.pdf
for a network diagram.
Primary Use: file system traffic

Base Compiler Invocation

C benchmarks:

 /usr/bin/mpicc -cc=pathcc 

C++ benchmarks:

126.lammps:  /usr/bin/mpicxx -CC=pathCC 

Fortran benchmarks:

107.leslie3d:  /usr/bin/mpif90 -f90=pathf90 
113.GemsFDTD:  /usr/bin/mpif90 -f90=pathf90 
115.fds4:  /usr/bin/mpif90 -f90=pathf90 
129.tera_tf:  /usr/bin/mpif90 -f90=pathf90 
132.zeusmp2:  /usr/bin/mpif90 -f90=pathf90 
137.lu:  /usr/bin/mpif90 -f90=pathf90 

Benchmarks using both Fortran and C (except as noted below):

 /usr/bin/mpicc -cc=pathcc   /usr/bin/mpif90 -f90=pathf90 

Base Portability Flags

104.milc:  -DSPEC_MPI_LP64 
121.pop2:  -DSPEC_MPI_DOUBLE_UNDERSCORE   -DSPEC_MPI_LP64 
122.tachyon:  -DSPEC_MPI_LP64 
127.wrf2:  -DF2CSTYLE   -DSPEC_MPI_DOUBLE_UNDERSCORE   -DSPEC_MPI_LINUX   -DSPEC_MPI_LP64 
128.GAPgeofem:  -DSPEC_MPI_LP64 
130.socorro:  -fno-second-underscore   -DSPEC_MPI_LP64 

Base Optimization Flags

C benchmarks:

 -march=core   -Ofast 

C++ benchmarks:

126.lammps:  -march=core   -O3   -OPT:Ofast   -CG:local_fwd_sched=on 

Fortran benchmarks:

107.leslie3d:  -march=core   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
113.GemsFDTD:  -march=core   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
115.fds4:  -march=core   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
129.tera_tf:  -march=core   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
132.zeusmp2:  -march=core   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
137.lu:  -march=core   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 

Benchmarks using both Fortran and C:

121.pop2:  -march=core   -Ofast   -O3   -OPT:Ofast   -OPT:malloc_alg=1   -LANG:copyinout=off 
127.wrf2:  Same as 121.pop2 
128.GAPgeofem:  Same as 121.pop2 
130.socorro:  Same as 121.pop2 

Base Other Flags

C benchmarks:

 -IPA:max_jobs=4 

C++ benchmarks:

126.lammps:  -IPA:max_jobs=4 

Fortran benchmarks:

107.leslie3d:  -IPA:max_jobs=4 
113.GemsFDTD:  -IPA:max_jobs=4 
115.fds4:  -IPA:max_jobs=4 
129.tera_tf:  -IPA:max_jobs=4 
132.zeusmp2:  -IPA:max_jobs=4 
137.lu:  -IPA:max_jobs=4 

Benchmarks using both Fortran and C (except as noted below):

 -IPA:max_jobs=4 
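
The compiler invocation, portability, optimization and other flags listed above combine into one command line per benchmark. The following sketch shows one way to assemble them for single-language benchmarks. The helper `base_command` is hypothetical, and the flag ordering is an assumption; the order the SPEC tools actually used is not shown in this report.

```python
import shlex

# Compiler invocations and flags copied from the sections above.
COMPILERS = {
    "C": "/usr/bin/mpicc -cc=pathcc",
    "C++": "/usr/bin/mpicxx -CC=pathCC",
    "Fortran": "/usr/bin/mpif90 -f90=pathf90",
}
OPTIMIZATION = {
    "C": "-march=core -Ofast",
    "C++": "-march=core -O3 -OPT:Ofast -CG:local_fwd_sched=on",
    "Fortran": ("-march=core -O3 -OPT:Ofast -OPT:malloc_alg=1 "
                "-LANG:copyinout=off"),
}
OTHER = "-IPA:max_jobs=4"
# Portability flags for the single-language benchmarks that have any.
PORTABILITY = {
    "104.milc": "-DSPEC_MPI_LP64",
    "122.tachyon": "-DSPEC_MPI_LP64",
}


def base_command(benchmark, language):
    """Return the assembled base compile command as a token list.

    Assumed order: compiler, portability, optimization, other flags.
    """
    parts = shlex.split(COMPILERS[language])
    parts += shlex.split(PORTABILITY.get(benchmark, ""))
    parts += shlex.split(OPTIMIZATION[language])
    parts += shlex.split(OTHER)
    return parts


if __name__ == "__main__":
    print(" ".join(base_command("104.milc", "C")))
    print(" ".join(base_command("137.lu", "Fortran")))
```

Mixed Fortran and C benchmarks such as 121.pop2 use both compilers and are left out of this sketch for brevity.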

The flags file that was used to format this result can be browsed at
http://www.spec.org/mpi2007/flags/MPI2007_flags.20070717.00.html.

You can also download the XML flags source by saving the following link:
http://www.spec.org/mpi2007/flags/MPI2007_flags.20070717.00.xml.