AMD Scores First Top 10 Zen Supercomputer… at NVIDIA
by Dr. Ian Cutress on June 22, 2020 6:30 PM EST
One of the milestones we've been waiting for since AMD launched its Zen architecture is its return to the top 10 of the supercomputer list. The previous best AMD system, built on Opteron CPUs, was Titan, which held the #1 spot in 2012 but had slowly dropped out of the top 10 by June 2019. Now, in June 2020, AMD scores a big win for its Zen 2 microarchitecture by getting to #7. But there's a twist in this tale.
Success on the TOP500 list is measured not so much in revenue as in prestige. The list includes systems built over a decade ago, so placing a new machine on it with the latest and greatest hardware, at a fraction of the size and power, is a big promotional opportunity for the company whose hardware is involved (as well as for the site where the system ends up being based). Naturally, ever since AMD started introducing its new Zen-based processors, marking a return to the high end of performance after several years away, we've been wondering how long it would take for a large-scale AMD deployment to appear.
AMD has had HPC success in the past, most notably with the Titan supercomputer, built on Opteron 6274 CPUs paired with NVIDIA K20x accelerator cards. The machine hit #1 in 2012, and still sits at #12 today. This was a sizeable deployment, coming in at 17.6 PetaFLOPs for 8.2 MegaWatts.
Anand even went for a look around back in the day:
Inside the Titan Supercomputer: 299K AMD x86 Cores and 18.6K NVIDIA GPUs
When it comes to AMD's Zen designs, the two main CPUs to look for are Naples (1st Gen EPYC) and Rome (2nd Gen EPYC). The latter has been getting a lot of attention for offering up to 64 high-performance cores, plenty of memory bandwidth, and heaps of connectivity for storage and add-in cards.
However, the first Zen system on the top 500 was technically neither of those.
The Hygon joint venture actually provided the first Zen-based supercomputer to join the list, in November 2018 at #38. The system was built at Sugon, the company distributing the Hygon systems, to showcase the hardware, and used 5,120 of the Hygon 32-core CPUs. We've reviewed and done a deep dive into the Hygon hardware. The joint venture has since dissolved, but the supercomputer built on its hardware is still running, now at #58.
It wasn't until late 2019 that systems based on AMD EPYC showed up. In November's list we saw two AMD Naples and two AMD Rome systems push AMD's total up to six (five based on EPYC, one on older Opterons). For the June 2020 announcement this week, another seven AMD Rome systems join the list, making Rome the 10th most popular processor family for supercomputers. But it's Selene at #7 that's making the headlines.
Selene is the name of the new supercomputer sitting at #7. For host processors, it uses AMD's Rome-based EPYC 7742 parts, the highest performing commercial parts available outside of specialized markets, with a list price of $6950 each. What makes Selene a bit odd for an AMD win is that it is part of a supercomputer built with NVIDIA A100 accelerators. And it's also built for NVIDIA, to use at NVIDIA.
When NVIDIA announced its new A100 Ampere accelerator card for compute, it also announced the concept of a DGX A100 'SuperPOD', connecting 140 DGX A100 nodes and 1120 A100 GPUs to supply up to 700 PetaOPs of AI-focused performance. It turns out that this SuperPOD concept also just happens to hit #7 in the TOP500 supercomputer list, which uses the more traditional LINPACK FP64 FLOPs, straight off the bat. Each of the DGX A100 nodes contains two AMD EPYC CPUs and eight A100 accelerators.
Selene scores a performance of 27.6 PetaFLOPs of FP64 throughput, for 1.3 MegaWatts of power. Compared to the previous Titan supercomputer, which had Opterons and K20x accelerators, that’s 57% more performance for only 16% of the power, making it almost 10x more efficient. Selene uses NVIDIA’s Mellanox HDR Infiniband for connectivity, and has 560 TiB of memory installed.
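Those ratios are easy to sanity-check. Here's a minimal back-of-the-envelope sketch in Python, using nothing but the Rmax and power figures quoted above:

```python
# Back-of-the-envelope comparison of Selene vs. Titan,
# using the Rmax and power figures quoted in the article.
titan_pflops, titan_mw = 17.6, 8.2    # Titan: Opteron 6274 + K20x
selene_pflops, selene_mw = 27.6, 1.3  # Selene: EPYC 7742 + A100

perf_gain = selene_pflops / titan_pflops        # ~1.57x, i.e. ~57% more performance
power_ratio = selene_mw / titan_mw              # ~0.16, i.e. ~16% of the power
efficiency_gain = (selene_pflops / selene_mw) / (titan_pflops / titan_mw)  # ~9.9x

print(f"Performance: {perf_gain:.2f}x, Power: {power_ratio:.0%}, "
      f"Efficiency: {efficiency_gain:.1f}x")
```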
At launch, NVIDIA said that a DGX A100 node would cost $199k. This makes the hardware deployment for Selene (minus switches, install cost, and cabling) somewhere around $28 million. It's worth noting that this is technically only 280 EPYC CPUs paired with 1120 A100 GPUs, combining for 277,760 'cores'. It seems odd to suggest that 'this is all that is needed' to reach #7.
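As a rough sanity check on those numbers (list prices only, and assuming the published 140-node SuperPOD configuration), the breakdown works out as follows:

```python
# Rough Selene hardware breakdown, from the figures quoted above:
# 140 DGX A100 nodes, each with 2 EPYC 7742 CPUs and 8 A100 GPUs,
# at NVIDIA's launch list price of $199k per node.
nodes = 140
cpus_per_node, gpus_per_node = 2, 8
node_list_price = 199_000  # USD

total_cpus = nodes * cpus_per_node        # 280 EPYC 7742s
total_gpus = nodes * gpus_per_node        # 1120 A100s
hardware_cost = nodes * node_list_price   # ~$27.9M, before switches, cabling, install

print(total_cpus, total_gpus, f"${hardware_cost / 1e6:.1f}M")
```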
The wins for AMD on Zen are now (with Rmax):
- #7, Selene, an EPYC 7742 + A100 system for NVIDIA (27.6 PF)
- #30, Belenos, an EPYC 7742 system for Meteo France (7.7 PF)
- #34, Joliot-Curie Rome, an EPYC 7H12 system for CEA in France (7.0 PF)
- #48, Mahti, an EPYC 7H12 system for CSC in Finland (5.4 PF)
- #56, Betzy, an EPYC 7742 system for Sigma2 AS in Norway (4.44 PF)
- #58, PreE, a Hygon C86 system for Sugon, China (4.32 PF)
- #124, Freeman, an EPYC 7542 system for ERDC DSRC (2.5 PF)
- #172, Betty, an EPYC 7542 system for the US Army Research Laboratory (2.1 PF)
- #268, Cara, an EPYC 7601 system for the German Aerospace Center (1.75 PF)
- #292, an EPYC 7501 + Vega 20 system for the Pukou Advanced Computing Center, China (1.66 PF)
- #483, Spartan, an EPYC 7H12 system for Atos, France (1.26 PF)
All of these are new within the past year, except for the Hygon system at #58.
The two main upcoming supercomputers for AMD are both part of the US Exascale project.
Frontier is set to have 1.5 ExaFLOPs of EPYC and Radeon Instinct in a 30 MegaWatt design at Oak Ridge, built by Cray (HPE), for 2021.
El Capitan is set to offer 2.0 ExaFLOPs of EPYC and Radeon Instinct in a 30 MegaWatt design at Lawrence Livermore National Laboratory, built by Cray (HPE), for early 2023.
The other US Exascale project is Aurora, with 1.0 ExaFLOPs of Xeon and Xe, for Argonne National Laboratory, due in late 2021.
| US Department of Energy Exascale Supercomputers | El Capitan | Frontier | Aurora |
|---|---|---|---|
| CPU Architecture | AMD EPYC "Genoa" | AMD EPYC | Intel Xeon Scalable |
| GPU Architecture | Radeon Instinct | Radeon Instinct | Intel Xe |
| Performance (Rpeak) | 2.0 EFLOPs | 1.5 EFLOPs | 1 EFLOPs |
| Laboratory | Lawrence Livermore | Oak Ridge | Argonne |
AMD remains firm on its goal of hitting 10% market share for EPYC by the middle of the year. Given that 'the middle of the year' usually lands somewhere in Q2/Q3, and we're about to enter Q3, we should be hearing more about that target soon, and about how COVID-19 may have adjusted those expectations.
- El Capitan Supercomputer Detailed: AMD CPUs & GPUs To Drive 2 Exaflops of Compute
- US Dept. of Energy Announces Frontier Supercomputer: Cray and AMD to Build 1.5 Exaflop Machine
- AMD Confirms Zen 4 EPYC Codename, and Elaborates on Frontier Supercomputer CPU
- New #1 Supercomputer: Fujitsu’s Fugaku and A64FX take Arm to the Top with 415 PetaFLOPs
- An Interview with AMD’s CTO Mark Papermaster: ‘There’s More Room At The Top’
- An Interview with AMD’s Forrest Norrod: Naples, Rome, Milan, & Genoa
- Intel’s 2021 Exascale Vision in Aurora: Two Sapphire Rapids CPUs with Six Ponte Vecchio GPUs
inighthawki - Monday, June 22, 2020
This really doesn't seem all that odd. I would imagine if you want a supercomputer with the most processing power, you'd want it to have CPUs with the highest core count, and AMD wins that by a landslide. I don't really think the fact that AMD competes with them in a different department would make them pass on the best choice for what they're doing.
jeremyshaw - Monday, June 22, 2020
Well, if it was purely core count, there are other CPUs. We have to include performance (on top of I/O). With 2 CPUs, Rome gives 128 PCIe 4.0 lanes (and 128 cores), and though 128 lanes aren't enough for 8 GPUs + (several) NICs + NVMe, it is a lot better than anything short of 256 PCIe 3.0 lanes. Why is it a lot better? I'd hate to route 256 PCIe 3.0 lanes, never mind the switch chips, and the accelerators only have 16 lanes max - PCIe 3.0 loses half of the bandwidth. This is especially important for the 200Gbps+ NICs.
IBM POWER9 may have the I/O, but it doesn't have the cores nor much remaining acceptance in the community. IBM tried to bid for two of the exaflop projects, with and without Nvidia, and lost both (I don't know if Nvidia had any serious bids on their own). Nvidia has their own arm CPUs with PCIe4.0 and NVLink, but those are scaled for embedded systems, not HPC. Ampere and Cavium/Marvell, like Nvidia, didn't have the full arm infrastructure at the time of the contracts, anyways (they seem to have it now).
We also have Fujitsu with their A64FX arm-SVE CPU, but historically, Japanese CPUs have failed to gain traction outside of Japan. They seem to be making a serious push for exports this time around, however. But either way, it would have been too late for the exaflop contracts.
In the end, I agree AMD won this part of the supercomputer by a landslide, just for different reasons than core count alone. :D
inighthawki - Monday, June 22, 2020
Good points!
Wafflefries128 - Tuesday, June 23, 2020
Two 7742 CPUs equate to 256 PCIe lanes.
With 8 GPUs at x16, that's 128 PCIe lanes for the GPUs, leaving an additional 128 lanes for storage, networking, and other peripheral hardware.
Wafflefries128 - Tuesday, June 23, 2020
Whoops, forgot about Infinity Fabric... never mind, you're right!
schujj07 - Tuesday, June 23, 2020
In a dual socket system, Gen 2 Epyc can be configured to have up to 160 PCIe 4.0 lanes. This is done by reducing the IF lanes from 128 to 96 and adding the extra lanes to IO. Even in the 96-lane configuration, Gen 2 Epyc has 50% more CPU-to-CPU bandwidth than it did in Gen 1.
Deicidium369 - Tuesday, June 23, 2020
No. Each Epyc CPU has 128 lanes. 64 of those lanes are used to connect to a 2nd CPU - leaving only 128 lanes. Epyc tops out at 2 sockets.
Intel has QPI/UPI to facilitate multi-socket systems - and does not use PCIe lanes.
Deicidium369 - Tuesday, June 23, 2020
Epyc has 128 PCIe 4.0 lanes - 64 of which are used to connect to a 2nd CPU - there are no 4 or 8 socket Epyc systems.
Deicidium369 - Friday, June 26, 2020
NVSwitch connects the 8 to 16 GPUs together into a single GPU - the AMD system is just a traffic cop and handles I/O. There are no direct CPU to individual GPU connections - the AMD connects to the 8/16 GPU cluster. The AMD system does not provide any compute performance to the mix - much as with the previous DGX systems. So the cores in the A100 GPUs are the overwhelming majority of the compute power here.
808Hilo - Sunday, July 12, 2020
Japanese chips? I'd hate to read their user manual.
Att. Ryan Smith
eat less. You look sick and obesity hurts the brain.