IBM (Test Sponsor: Oak Ridge National Laboratory) Summit: IBM Power System AC922 (IBM Power9, Tesla V100-SXM2-16GB) |
SPEChpc 2021_med_base = 41.3 |
SPEChpc 2021_med_peak = Not Run |
hpc2021 License: | 056A | Test Date: | Sep-2021 |
---|---|---|---|
Test Sponsor: | Oak Ridge National Laboratory | Hardware Availability: | Nov-2018 |
Tested by: | Oak Ridge National Laboratory | Software Availability: | Jul-2021 |
Benchmark result graphs are available in the PDF report.
Results appear in the order in which they were run. Bold underlined text indicates a median measurement.
Benchmark | Model | Ranks | Thrds/Rnk | Base Run 1 Seconds | Base Run 1 Ratio | Base Run 2 Seconds | Base Run 2 Ratio |
---|---|---|---|---|---|---|---|
705.lbm_m | ACC | 4200 | 1 | 17.3 | 71.0 | 10.1 | 122 |
718.tealeaf_m | ACC | 4200 | 1 | 48.4 | 27.9 | 49.3 | 27.4 |
719.clvleaf_m | ACC | 4200 | 1 | 20.3 | 91.3 | 20.3 | 91.1 |
728.pot3d_m | ACC | 4200 | 1 | 74.2 | 24.9 | 73.2 | 25.3 |
734.hpgmgfv_m | ACC | 4200 | 1 | 102 | 9.78 | 105 | 9.51 |
735.weather_m | ACC | 4200 | 1 | 20.3 | 118 | 20.3 | 118 |
SPEChpc 2021_med_base | 41.3 |
SPEChpc 2021_med_peak | Not Run |
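The overall base metric can be cross-checked from the per-benchmark ratios in the results table above. The sketch below takes the geometric mean of one selected ratio per benchmark. It assumes that, with two runs per benchmark, the lower ratio is the one selected; the bold underlined median markers did not survive in this copy of the report, so that selection rule is an assumption rather than something read off the table. The names `base_ratios` and `overall_score` are illustrative only.

```python
import math

# Base ratios copied from the results table, two runs per benchmark.
base_ratios = {
    "705.lbm_m": (71.0, 122),
    "718.tealeaf_m": (27.9, 27.4),
    "719.clvleaf_m": (91.3, 91.1),
    "728.pot3d_m": (24.9, 25.3),
    "734.hpgmgfv_m": (9.78, 9.51),
    "735.weather_m": (118, 118),
}


def overall_score(ratios_by_benchmark):
    """Geometric mean of the selected ratio of each benchmark.

    Assumption: with two runs, the lower ratio is the selected one.
    """
    selected = [min(runs) for runs in ratios_by_benchmark.values()]
    log_mean = sum(math.log(r) for r in selected) / len(selected)
    return math.exp(log_mean)


score = overall_score(base_ratios)
print(f"{score:.1f}")  # prints 41.3
```

Under that assumption the result rounds to the reported SPEChpc 2021_med_base value of 41.3.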
Hardware Summary | |
---|---|
Type of System: | Homogeneous Cluster |
Compute Node: | IBM Power System AC922 |
Interconnect: | Mellanox InfiniBand |
Compute Nodes Used: | 700 |
Total Chips: | 1400 |
Total Cores: | 15400 |
Total Threads: | 61600 |
Total Memory: | 350 TB |
Software Summary | |
---|---|
Compiler: | C/C++/Fortran: Version 21.7 of NVHPC Toolkit |
MPI Library: | Spectrum MPI Version 10.4.0.3 |
Other MPI Info: | None |
Other Software: | None |
Base Parallel Model: | ACC |
Base Ranks Run: | 4200 |
Base Threads Run: | 1 |
Peak Parallel Models: | Not Run |
Hardware | |
---|---|
Number of nodes: | 700 |
Uses of the node: | compute |
Vendor: | IBM |
Model: | IBM Power System AC922 |
CPU Name: | IBM POWER9 2.1 (pvr 004e 1201) |
CPU(s) orderable: | 2 chips |
Chips enabled: | 2 |
Cores enabled: | 22 |
Cores per chip: | 44 |
Threads per core: | 4 |
CPU Characteristics: | Up to 3.8 GHz |
CPU MHz: | 2300 |
Primary Cache: | 32 KB I + 32 KB D on chip per core |
Secondary Cache: | 512 KB I+D on chip per core |
L3 Cache: | 110 MB I+D on chip per chip |
Other Cache: | None |
Memory: | 512 GB (16 x 32 GB RDIMM-DDR4-2666) |
Disk Subsystem: | 2 x 800 GB (Samsung Electronics Co Ltd NVMe SSD Controller 172Xa/172Xb) |
Other Hardware: | None |
Accel Count: | 6 |
Accel Model: | Tesla V100-SXM2-16GB |
Accel Vendor: | NVIDIA Corporation |
Accel Type: | GPU |
Accel Connection: | NVLink 2.0 |
Accel ECC enabled: | Yes |
Accel Description: | See Notes |
Adapter: | Mellanox ConnectX-5 |
Number of Adapters: | 2 |
Slot Type: | None |
Data Rate: | 100 Gb/s (4X EDR) |
Ports Used: | 2 |
Interconnect Type: | EDR InfiniBand |
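The totals in the Hardware Summary follow from the per-node figures in the node description. A minimal sketch of that arithmetic is given here, assuming that "Cores enabled" is a per-node count and that the 350 TB total uses 1 TB = 1024 GB; both are assumptions inferred from the numbers, not stated in the report. The variable names are illustrative.

```python
# Per-node values taken from the node hardware table.
nodes = 700
chips_per_node = 2
cores_enabled_per_node = 22  # assumption: "Cores enabled" is counted per node
threads_per_core = 4
memory_per_node_gb = 512

# Cluster-wide totals as listed in the Hardware Summary.
total_chips = nodes * chips_per_node                  # 1400
total_cores = nodes * cores_enabled_per_node          # 15400
total_threads = total_cores * threads_per_core        # 61600
total_memory_tb = nodes * memory_per_node_gb / 1024   # 350.0 (assumes 1 TB = 1024 GB)

print(total_chips, total_cores, total_threads, total_memory_tb)
```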
Software | |
---|---|
Accelerator Driver: | NVIDIA CUDA 450.80.02 |
Adapter: | Mellanox ConnectX-5 |
Adapter Driver: | 4.9-2.2.4.1 |
Adapter Firmware: | 16.29.1016 |
Operating System: | Red Hat Enterprise Linux 8.2 |
Local File System: | xfs |
Shared File System: | 250 PB IBM Spectrum Scale parallel filesystem over 4X EDR InfiniBand |
System State: | Multi-user, run level 3 |
Other Software: | None |
Hardware | |
---|---|
Vendor: | Mellanox |
Model: | Mellanox Switch IB-2 |
Switch Model: | Mellanox IB EDR Switch IB-2 |
Number of Switches: | 1 |
Number of Ports: | 36 |
Data Rate: | 100 Gb/s |
Topology: | Non-blocking Fat-tree |
Primary Use: | MPI Traffic and GPFS access |
The config file option 'submit' was used.
MPI startup command: jsrun command was used to launch job using 1 GPU/rank.
Detailed information from nvaccelinfo | |
---|---|
CUDA Driver Version: | 11000 |
NVRM version: | NVIDIA UNIX ppc64le Kernel Module 450.80.02 Wed Sep 23 00:55:04 UTC 2020 |
Device Number: | 0 |
Device Name: | Tesla V100-SXM2-16GB |
Device Revision Number: | 7.0 |
Global Memory Size: | 16911433728 |
Number of Multiprocessors: | 80 |
Concurrent Copy and Execution: | Yes |
Total Constant Memory: | 65536 |
Total Shared Memory per Block: | 49152 |
Registers per Block: | 65536 |
Warp Size: | 32 |
Maximum Threads per Block: | 1024 |
Maximum Block Dimensions: | 1024, 1024, 64 |
Maximum Grid Dimensions: | 2147483647 x 65535 x 65535 |
Maximum Memory Pitch: | 2147483647B |
Texture Alignment: | 512B |
Clock Rate: | 1530 MHz |
Execution Timeout: | No |
Integrated Device: | No |
Can Map Host Memory: | Yes |
Compute Mode: | exclusive-process |
Concurrent Kernels: | Yes |
ECC Enabled: | Yes |
Memory Clock Rate: | 877 MHz |
Memory Bus Width: | 4096 bits |
L2 Cache Size: | 6291456 bytes |
Max Threads Per SMP: | 2048 |
Async Engines: | 4 |
Unified Addressing: | Yes |
Managed Memory: | Yes |
Concurrent Managed Memory: | Yes |
Preemption Supported: | Yes |
Cooperative Launch: | Yes |
Multi-Device: | Yes |
Default Target: | cc70 |
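Several of the raw nvaccelinfo values above are easier to read after unit conversion. The sketch below converts the global memory and L2 cache sizes to binary units and estimates a theoretical peak memory bandwidth from the memory clock and bus width. The factor of 2 for double data rate is an assumption about the HBM2 memory on this device, not a value reported by nvaccelinfo, and the variable names are illustrative.

```python
# Raw values copied from the nvaccelinfo output.
global_memory_bytes = 16911433728
l2_cache_bytes = 6291456
memory_clock_mhz = 877
memory_bus_width_bits = 4096

# Sizes in binary units.
global_memory_gib = global_memory_bytes / 2**30   # 15.75 GiB
l2_cache_mib = l2_cache_bytes / 2**20             # 6.0 MiB

# Theoretical peak bandwidth in GB/s.
# Assumption: the memory transfers data twice per clock cycle (double data rate).
transfers_per_clock = 2
bytes_per_transfer = memory_bus_width_bits / 8
peak_bandwidth_gb_s = (
    memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9
)

print(global_memory_gib, l2_cache_mib, round(peak_bandwidth_gb_s, 3))
```

Under the double-data-rate assumption the estimate comes to roughly 898 GB/s per GPU.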
CC 705.lbm_m(base) 718.tealeaf_m(base) 734.hpgmgfv_m(base)
/usr/lib64/crt1.o:(.rodata+0x8): undefined reference to `main'
pgacclnk: child process exit status 1: /usr/bin/ld
nvc 21.7-0 linuxpower target on Linuxpower
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
FC 719.clvleaf_m(base) 728.pot3d_m(base) 735.weather_m(base)
nvfortran 21.7-0 linuxpower target on Linuxpower
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.