Lenovo Global Technology ThinkSystem SR670 V2 (Intel Xeon Platinum 8380, Nvidia A100-PCIE-40G)

SPEChpc 2021_tny_base = 24.8
SPEChpc 2021_tny_peak = Not Run

| hpc2021 License: | 28 | Test Date: | Aug-2021 |
|---|---|---|---|
| Test Sponsor: | Lenovo Global Technology | Hardware Availability: | Aug-2021 |
| Tested by: | Lenovo Global Technology | Software Availability: | Aug-2021 |
Benchmark result graphs are available in the PDF report.
Base results. Peak was not run, so no peak columns are listed. Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

| Benchmark | Model | Ranks | Thrds/Rnk | Run 1 Seconds | Run 1 Ratio | Run 2 Seconds | Run 2 Ratio | Run 3 Seconds | Run 3 Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 505.lbm_t | ACC | 3 | 1 | 33.5 | 67.1 | 33.9 | 66.4 | 34.0 | 66.1 |
| 513.soma_t | ACC | 3 | 1 | 66.9 | 55.3 | 64.8 | 57.1 | 64.8 | 57.1 |
| 518.tealeaf_t | ACC | 3 | 1 | 203 | 8.14 | 203 | 8.14 | 203 | 8.14 |
| 519.clvleaf_t | ACC | 3 | 1 | 45.0 | 36.7 | 45.1 | 36.6 | 45.1 | 36.6 |
| 521.miniswp_t | ACC | 3 | 1 | 166 | 9.66 | 165 | 9.68 | 165 | 9.71 |
| 528.pot3d_t | ACC | 3 | 1 | 68.6 | 31.0 | 68.4 | 31.1 | 68.5 | 31.0 |
| 532.sph_exa_t | ACC | 3 | 1 | 169 | 11.6 | 168 | 11.6 | 168 | 11.6 |
| 534.hpgmgfv_t | ACC | 3 | 1 | 101 | 11.6 | 101 | 11.6 | 101 | 11.7 |
| 535.weather_t | ACC | 3 | 1 | 41.6 | 77.6 | 41.6 | 77.5 | 41.5 | 77.7 |
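
As a cross-check on the reported base score, the sketch below takes the median ratio of the three runs for each benchmark and combines the nine medians with a geometric mean. It assumes that this is how the overall SPEChpc metric is formed, and it works from the rounded ratios printed in the table rather than the unrounded values the harness uses, so it is an approximation and not the official calculation. The function name `geometric_mean_of_medians` is illustrative.

```python
from math import prod
from statistics import median

# Base ratios for the three runs of each benchmark, copied from the results table.
ratios = {
    "505.lbm_t": (67.1, 66.4, 66.1),
    "513.soma_t": (55.3, 57.1, 57.1),
    "518.tealeaf_t": (8.14, 8.14, 8.14),
    "519.clvleaf_t": (36.7, 36.6, 36.6),
    "521.miniswp_t": (9.66, 9.68, 9.71),
    "528.pot3d_t": (31.0, 31.1, 31.0),
    "532.sph_exa_t": (11.6, 11.6, 11.6),
    "534.hpgmgfv_t": (11.6, 11.6, 11.7),
    "535.weather_t": (77.6, 77.5, 77.7),
}


def geometric_mean_of_medians(runs):
    """Median ratio per benchmark, then the geometric mean across benchmarks."""
    medians = [median(r) for r in runs.values()]
    return prod(medians) ** (1.0 / len(medians))


score = geometric_mean_of_medians(ratios)
print(f"{score:.1f}")  # prints 24.8
```

The result agrees with the reported SPEChpc 2021_tny_base value of 24.8 to the precision shown.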
| Hardware Summary | |
|---|---|
| Type of System: | Homogeneous |
| Compute Node: | ThinkSystem SR670 V2 |
| Interconnect: | None |
| File Server Node: | ThinkSystem SR670 V2 |
| Compute Nodes Used: | 1 |
| Total Chips: | 2 |
| Total Cores: | 80 |
| Total Threads: | 80 |
| Total Memory: | 512 GB |
| Software Summary | |
|---|---|
| Compiler: | Nvidia HPC SDK 21.5 |
| MPI Library: | Open MPI 4.0.5 |
| Other MPI Info: | None |
| Base Parallel Model: | ACC |
| Base Ranks Run: | 3 |
| Base Threads Run: | 1 |
| Peak Parallel Models: | Not Run |
| Hardware | |
|---|---|
| Number of nodes: | 1 |
| Uses of the node: | compute |
| Vendor: | Lenovo Global Technology |
| Model: | ThinkSystem SR670 V2 |
| CPU Name: | Intel Xeon Platinum 8380 |
| CPU(s) orderable: | 2 chips |
| Chips enabled: | 2 |
| Cores enabled: | 80 |
| Cores per chip: | 40 |
| Threads per core: | 1 |
| CPU Characteristics: | Intel Turbo Boost Technology up to 3.4 GHz |
| CPU MHz: | 2300 |
| Primary Cache: | 32 KB I + 48 KB D on chip per core |
| Secondary Cache: | 1280 KB I+D on chip per core |
| L3 Cache: | 60 MB I+D on chip per chip |
| Other Cache: | None |
| Memory: | 512 GB (16 x 32 GB 2Rx8 PC4-3200A-R) |
| Disk Subsystem: | 1 x 4 TB NVMe SSD |
| Other Hardware: | None |
| Accel Count: | 8 |
| Accel Model: | Tesla A100 PCIe 40GB |
| Accel Vendor: | Nvidia Corporation |
| Accel Type: | GPU |
| Accel Connection: | PCIe Gen4 x16 |
| Accel ECC enabled: | Yes |
| Accel Description: | Nvidia Tesla A100 PCIe 40GB |
| Adapter: | Mellanox ConnectX-6 HDR |
| Number of Adapters: | 1 |
| Slot Type: | PCI-Express 4.0 x16 |
| Data Rate: | 200 Gb/s |
| Ports Used: | 1 |
| Interconnect Type: | Nvidia Mellanox ConnectX-6 HDR |
| Software | |
|---|---|
| Accelerator Driver: | 470.42.01 |
| Adapter: | Mellanox ConnectX-6 HDR |
| Adapter Driver: | 5.2-1.0.4 |
| Adapter Firmware: | 20.28.1002 |
| Operating System: | Red Hat Enterprise Linux Server release 8.3, Kernel 4.18.0-193.el8.x86_64 |
| Local File System: | xfs |
| Shared File System: | XFS |
| System State: | Multi-user, run level 3 |
| Other Software: | None |
| Hardware | |
|---|---|
| Number of nodes: | 1 |
| Uses of the node: | Fileserver |
| Vendor: | Lenovo Global Technology |
| Model: | ThinkSystem SR670 V2 |
| CPU Name: | Intel Xeon Platinum 8380 |
| CPU(s) orderable: | 2 chips |
| Chips enabled: | 2 |
| Cores enabled: | 80 |
| Cores per chip: | 40 |
| Threads per core: | 1 |
| CPU Characteristics: | Turbo up to 3.4 GHz |
| CPU MHz: | 2300 |
| Primary Cache: | 32 KB I + 48 KB D on chip per core |
| Secondary Cache: | 1280 KB I+D on chip per core |
| L3 Cache: | 60 MB I+D on chip per chip |
| Other Cache: | None |
| Memory: | 512 GB (16 x 32 GB 2Rx8 PC4-3200A-R) |
| Disk Subsystem: | 1 x 4 TB NVMe SSD |
| Other Hardware: | None |
| Accel Count: | 8 |
| Accel Model: | Tesla A100 PCIe 40GB |
| Accel Vendor: | Nvidia |
| Accel Type: | GPU |
| Accel Connection: | PCIe Gen4 x16 |
| Accel ECC enabled: | Yes |
| Accel Description: | Nvidia Tesla A100 PCIe 40GB |
| Adapter: | Mellanox ConnectX-6 HDR |
| Number of Adapters: | 1 |
| Slot Type: | PCI-Express 4.0 x16 |
| Data Rate: | 200 Gb/s |
| Ports Used: | 1 |
| Interconnect Type: | Nvidia Mellanox ConnectX-6 HDR |
| Software | |
|---|---|
| Accelerator Driver: | None |
| Adapter: | Mellanox ConnectX-6 HDR |
| Adapter Driver: | 5.2-1.0.4 |
| Adapter Firmware: | 20.28.1002 |
| Operating System: | Red Hat Enterprise Linux Server release 8.3 |
| Local File System: | xfs |
| Shared File System: | None |
| System State: | Multi-User, run level 3 |
| Other Software: | None |
| Hardware | |
|---|---|
| Vendor: | None |
| Model: | None |
| Switch Model: | None |
| Number of Switches: | 0 |
| Number of Ports: | 0 |
| Data Rate: | None |
| Firmware: | None |
| Topology: | None |
| Primary Use: | None |
| Software |
|---|
Individual ranks were bound to the CPU cores on the same NUMA node as
the GPU using 'taskset' within the following "bind.pl" Perl script:
---- Start bind.pl ------
my %bind;
$bind{0} = "1-3";
$bind{1} = "4-7";
$bind{2} = "8-10";
$bind{3} = "11-14";
$bind{4} = "41-43";
$bind{5} = "44-47";
$bind{6} = "61-63";
$bind{7} = "64-67";
my $rank = $ENV{OMPI_COMM_WORLD_LOCAL_RANK};
my $cmd = "taskset -c $bind{$rank} ";
while (my $arg = shift) {
    $cmd .= "$arg ";
}
my $rc = system($cmd);
exit($rc);
---- End bind.pl ------
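
To make the binding logic easier to follow, here is a minimal Python sketch of what bind.pl does: look up the core range for the local MPI rank and prefix the benchmark command with `taskset -c`. It is an illustrative port, not the script used for the run. The core ranges and the `OMPI_COMM_WORLD_LOCAL_RANK` variable come from bind.pl; the names `BIND`, `build_command`, and `main`, and the example command `./benchmark`, are hypothetical.

```python
import os
import subprocess
import sys

# Core ranges per local MPI rank, copied from bind.pl.
BIND = {
    0: "1-3", 1: "4-7", 2: "8-10", 3: "11-14",
    4: "41-43", 5: "44-47", 6: "61-63", 7: "64-67",
}


def build_command(local_rank, argv):
    """Return the taskset-wrapped command for one local rank."""
    return ["taskset", "-c", BIND[local_rank], *argv]


def main():
    """Launch the wrapped command and return its exit code."""
    rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
    # An argument list avoids re-parsing the benchmark arguments in a shell.
    return subprocess.call(build_command(rank, sys.argv[1:]))


if __name__ == "__main__":
    if len(sys.argv) > 1 and "OMPI_COMM_WORLD_LOCAL_RANK" in os.environ:
        # Launched under mpirun with a command to wrap.
        sys.exit(main())
    # Otherwise just show the command that rank 0 would run.
    print(build_command(0, ["./benchmark"]))
```

With three ranks, as in this result, only the entries for ranks 0 to 2 are used, which pins the ranks to cores 1-3, 4-7, and 8-10.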
The config file option 'submit' was used.
submit = mpirun --allow-run-as-root -x UCX_MEMTYPE_CACHE=n -host localhost:8 -np $ranks perl $[top]/bind.pl $command
Environment variables set by runhpc before the start of the run:
UCX_MEMTYPE_CACHE = "n"
UCX_TLS = "self,shm,cuda_copy"
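
The `submit` line above contains three placeholders: `$ranks`, `$[top]`, and `$command`. The sketch below shows roughly what the launch line could look like once they are filled in for the 3-rank base run. It is plain string substitution for illustration only; the benchmark harness performs its own expansion, and the values `/opt/hpc2021` and `./lbm_t` are hypothetical stand-ins for the installation directory and the benchmark command.

```python
# Submit template copied from the config notes, joined onto one line.
SUBMIT = (
    "mpirun --allow-run-as-root -x UCX_MEMTYPE_CACHE=n "
    "-host localhost:8 -np $ranks perl $[top]/bind.pl $command"
)


def expand_submit(template, ranks, top, command):
    """Substitute the three placeholders used in the submit line."""
    return (
        template.replace("$ranks", str(ranks))
        .replace("$[top]", top)
        .replace("$command", command)
    )


# Hypothetical install path and benchmark command; 3 ranks as in the base run.
line = expand_submit(SUBMIT, ranks=3, top="/opt/hpc2021", command="./lbm_t")
print(line)
```

Each of the three processes started this way runs bind.pl first, which then starts the benchmark command under `taskset`.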
==============================================================================
CC 505.lbm_t(base) 513.soma_t(base) 518.tealeaf_t(base) 521.miniswp_t(base)
534.hpgmgfv_t(base)
------------------------------------------------------------------------------
nvc 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
------------------------------------------------------------------------------
==============================================================================
CXXC 532.sph_exa_t(base)
------------------------------------------------------------------------------
nvc++ 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
------------------------------------------------------------------------------
==============================================================================
FC 519.clvleaf_t(base) 528.pot3d_t(base) 535.weather_t(base)
------------------------------------------------------------------------------
nvfortran 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
------------------------------------------------------------------------------
| 521.miniswp_t: | -DUSE_KBA -DUSE_ACCELDIR |
| 532.sph_exa_t: | -DSPEC_USE_LT_IN_KERNELS --c++17 |
| -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu -DSPEC_ACCEL_AWARE_MPI |
| -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu -DSPEC_ACCEL_AWARE_MPI |
| -DSPEC_ACCEL_AWARE_MPI -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu |