AMD Platform vs GIGABYTE: IO Power Overhead Gone

Starting off with the big change for today’s review: the new production-grade GIGABYTE Milan-compatible test platform.

In our original review of Milan, we discovered that AMD’s newest-generation chips had one large glass jaw: the platform’s extremely high idle package power, which exceeded 100W. This was a notable regression compared to what we saw on Rome, and we deemed it a core cause of why Milan was seeing performance regressions in certain workloads compared to its predecessor Rome SKUs.

We had communicated our findings and worries to AMD prior to the review publishing, but we never root-caused the issue and were never able to confirm whether this was the intended behaviour of the new Milan chips. We theorized that it was a side-effect of the new sIOD, whose Infinity Fabric runs at a higher frequency this generation, in 1:1 mode with the memory controller clocks.

Package Idle Power

To our surprise, when setting up the new GIGABYTE system, we found that the new test platform did not exhibit this extremely high idle power.

Indeed, instead of the 100W idle figures we had measured on the Daytona system, we’re now seeing figures that are pretty much in line with AMD’s Rome system, at around 65-72W. The biggest discrepancy was found in the 75F3 part, which now idles 39W lower than on the Daytona system.

Milan Power Efficiency
SKU: EPYC 7763 (Milan), TDP Setting: 280W

                    Daytona                   GIGABYTE
Workload            Perf  PKG (W)  Core (W)   Perf  PKG (W)  Core (W)
500.perlbench_r     281   274      166        317   282      195
502.gcc_r           262   262      131        271   265      150
505.mcf_r           155   252      115        158   252      132
520.omnetpp_r       142   249      120        144   244      133
523.xalancbmk_r     181   261      131        195   266      152
525.x264_r          602   279      172        641   283      196
531.deepsjeng_r     262   267      161        296   283      196
541.leela_r         267   249      148        303   274      199
548.exchange2_r     487   274      176        543   262      202
557.xz_r            190   260      141        206   272      171
SPECint2017         255   260      141        275   265      164
kJ Total                  2029                      1932
Score / W                 0.980                     1.037

503.bwaves_r        354   226      90         362   218      99
507.cactuBSSN_r     222   278      150        229   285      174
508.namd_r          282   279      176        280   260      193
510.parest_r        153   256      119        162   259      138
511.povray_r        348   275      176        387   255      193
519.lbm_r           39    219      84         40    210      92
526.blender_r       372   276      165        396   282      188
527.cam4_r          399   278      147        417   285      170
538.imagick_r       446   278      178        471   268      200
544.nab_r           259   278      175        275   282      198
549.fotonik3d_r     110   220      86         113   215      95
554.roms_r          88    243      106        89    241      119
SPECfp2017          211   240      110        220   235      123
kJ Total                  4980                      4716
Score / W                 0.879                     0.9361

A more detailed power analysis of the EPYC 7763 during our SPEC2017 runs confirms the change in power behaviour. The total average package power hasn’t changed much between the systems: it is now 5W higher in the integer suite (265W vs 260W) and 5W lower in the FP suite (235W vs 240W). What changes more significantly is the core power allocation, which is now much higher on the GIGABYTE system.
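As a rough cross-check of the suite-level figures, the sketch below recomputes the efficiency metric and the share of package power going to the cores. It assumes that "Score / W" is simply the suite score divided by the average package power. The table does not state this, but the rounded inputs reproduce the published values to within about 0.001. The function names and the data layout are illustrative only.

```python
# Suite-level averages copied from the table: (score, package W, core W).
RESULTS = {
    "SPECint2017": {"Daytona": (255, 260, 141), "GIGABYTE": (275, 265, 164)},
    "SPECfp2017": {"Daytona": (211, 240, 110), "GIGABYTE": (220, 235, 123)},
}


def score_per_watt(score, pkg_w):
    """Efficiency, assumed here to be suite score per watt of package power."""
    return score / pkg_w


def core_share(pkg_w, core_w):
    """Fraction of the average package power that is spent in the cores."""
    return core_w / pkg_w


def summarize(results):
    """Flatten the nested dict into (suite, platform, score/W, core share) rows."""
    rows = []
    for suite, platforms in results.items():
        for platform, (score, pkg_w, core_w) in platforms.items():
            rows.append(
                (suite, platform, score_per_watt(score, pkg_w), core_share(pkg_w, core_w))
            )
    return rows


if __name__ == "__main__":
    for suite, platform, eff, share in summarize(RESULTS):
        print(f"{suite:12s} {platform:9s} score/W={eff:.3f} core share={share:.1%}")
```

With these inputs, the share of package power going to the cores rises on the GIGABYTE platform in both suites even though the package total barely moves, which is the shift the paragraph above describes.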

In core-bound workloads with little memory pressure, such as 541.leela_r, the core power of the EPYC 7763 went up from 148W to 199W, a 51W (+34%) increase. Naturally, this core power increase brings a correspondingly large performance increase of +13.3%.

The behaviour change doesn’t apply to every workload: memory-heavy workloads such as 519.lbm_r don’t see much of a change in power behaviour and showcase only a small performance boost.
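The contrast between a core-bound and a memory-bound workload can be sketched from the table values directly. The helper below is a minimal illustration (the name `delta` and the dictionary layout are not from the review) and it uses the rounded scores from the table, so the performance uplift it computes for 541.leela_r comes out at roughly +13.5% rather than the +13.3% quoted in the text, presumably because the quoted figure was derived from unrounded scores.

```python
def delta(before, after):
    """Return (absolute change, percent change) going from before to after."""
    diff = after - before
    return diff, 100.0 * diff / before


# (Daytona, GIGABYTE) values copied from the table for two contrasting workloads.
WORKLOADS = {
    "541.leela_r": {"perf": (267, 303), "core_w": (148, 199)},  # core-bound
    "519.lbm_r": {"perf": (39, 40), "core_w": (84, 92)},        # memory-bound
}

if __name__ == "__main__":
    for name, metrics in WORKLOADS.items():
        perf_abs, perf_pct = delta(*metrics["perf"])
        core_abs, core_pct = delta(*metrics["core_w"])
        print(
            f"{name:12s} perf {perf_pct:+.1f}%  "
            f"core power {core_abs:+d}W ({core_pct:+.1f}%)"
        )
```

The core-bound test gains far more core power and performance than the memory-bound one, where the extra power headroom has little to act on.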

Comparing the original Daytona figures against the new GIGABYTE motherboard test runs, we’re seeing significant performance boosts across the board, with many 10-13% increases in compute-bound and core-power-bound workloads.

These figures are significant enough that they change the overall verdict on those SKUs, and they also change the tone of our final review verdict on Milan: evidently, the one weakness the new generation had was not a design mishap but an issue with the Daytona system. It explains a lot of the more lacklustre performance increases of Milan vs Rome, and we’re happy that this was ultimately not an issue for production-grade platforms.

As a note, because we now also have the 4-chiplet EPYC 7443 and EPYC 7343 SKUs in-house, we measured the platform idle power of those units as well, which came in at 50W and 52W. This is quite a bit below the 65-75W of the 8-chiplet 7763, 75F3 and 72F3 parts, which indicates that this power behaviour isn’t solely internal to the sIOD chiplet, but is also tied to the sIOD-to-CCD interfaces, or possibly to the CCD L3 cache power.

 


58 Comments


  • Andrei Frumusanu - Friday, June 25, 2021 - link

    Those results don't contradict anything I'm saying. Given a normalised throughput performance of the socket, for example here where the 16- and 24- core Milan equals or beats the 28-core ICL-SP in many workloads, the Xeon still handily beats those Milan parts in transactional workloads. The 40-core Xeon has 77% of the jbb performance of the 64-core EPYC even though in the int suite it's only at 60%. Those particular STH results work out because the 7543P is $1000 cheaper than the 7543, but for the SKUs we had in today, Intel still is on equal footing in terms of DB performance value.
  • Cllaymenn - Friday, June 25, 2021 - link

    Whatever one says about some insignificant single anomaly in some DB test... The fact is that ANY company, from small to large, any corporation needing power, any data centre, hosting, cloud computing, research institutes, universities, will choose EPYC on ZEN3 over even the 8320, because it will allow them to compute faster, make more money per month, and less stress for administrators when there are higher network loads, clouds because AMD will "grind" / process faster the requests/needs of thousands of thousands of clients simultaneously using servers, because in addition to more compute power has more bandwidth AMD platform especially with 256 threads and 8 channel memory and fast Infinity Fabric and many of the ZEN3 optimizations... and is more flexible (harder to clog or jam Zen2/Zen3 from what I've noticed). These processors grind through anything you throw at them without any breathlessness.
  • schujj07 - Friday, June 25, 2021 - link

    While Spec is an "industry standard" benchmark, vendors spend hours optimizing for their servers to look better. Therefore as an administrator and designer of a high performance data-center I personally look at Spec results with a grain of salt. For example, Super Micro submitted data for 2 of their A+ AS-1124US-TNRP with dual 75F3 on April 26, 2021. One system has max-jOPS of 276,317 and critical-jOPS of 116,628. The other has a score of 211,179 max-jOPS & 191,813 critical-jOPS. They also have 2 X12DPG-QT6 with dual 8380's and one has scores of 272,500 for max-jOPS & 147,409 for critical-jOPS. The other has scores of 258,368 for max-jOPS & 201,334 for critical-jOPS. In these cases the 75F3 with fewer cores and threads ends up in a virtual tie with the 8380 in the transactional workload for one of the results, but the second result in the database is 22-30% lower based on comparison systems. https://www.spec.org/jbb2015/results/res2021q2/

    Depending on the results you want, the 75F3 is a much better value or of equal value to the 8380. I think now you can see why I take Spec with a grain of salt on their results. Globally saying that Milan has issues in transactional DBs based solely on Spec results isn't a good idea. While I know it is the benchmarks that you choose as they are "industry standard," I think it would be worth while to invest in creating an actual real world scenario DB benchmark that doesn't use Spec.
  • Andrei Frumusanu - Friday, June 25, 2021 - link

    > One system has max-jOPS of 276,317 and critical-jOPS of 116,628. The other has a score of 211,179 max-jOPS & 191,813 critical-jOPS.

    Which generally makes submitted scores not very useful, we're using apples-to-apples runs here, and while you can argue they're not as optimised, they're comparable to each other.

    And I also never said that Milan has *issues*, I'm simply saying that compared to other workloads where there's a massive performance lead for AMD, Intel is still competitive, a view that falls in line with many industry customers.
  • Cllaymenn - Friday, June 25, 2021 - link

    We know that Intel watches the Anandtech website, and that you are aware of this, they also send you expensive hardware for testing, and hope that the results will be more favourable to their new development (e.g. 8320) which they have been working on for a long time. I think it would be unpleasant and uncomfortable to criticise their new products harshly if I were writing a review, but I would rather gently point out which is good at what, which is leading and which still needs to catch up. Because of the awareness of the efforts of hundreds or even thousands of Intel engineers I would not have the heart to criticize their new product, or sharply, clearly say who wins everything and the rest can hide. I know that even the engineers, designers and CPU architects like to read about their new baby after work, and they go to sites like Anandtech with enthusiasm and quiet hope that they have made a better impression on the reviewer and readers, than their previous older products, that we have noticed a significant difference, jump in performance and that it has been appreciated and maybe there will be some nice, positive comments, feedback. It probably gives them a lot of happiness to see people out there enjoying the results of their hard work and another success for the company. Because the 8320 was a huge challenge for these people, it's a brand new fresh 10nm SuperFin technology and a mega monolithic 40 core big piece of silicon. And it works! It may not catch up with the 64 core competition but it's still a huge step forward for them, reaching a significant milestone. Once they mastered this SuperFin 10nm technology to create monolithic 40 core chips they now have a lot of experience and know how to do it even better, especially in a modular architecture where the silicon pieces will be smaller. 
Many of the threads stem from the creation of the Xeon 8320, so I understand the reviewer's attitude of appreciating the level of technology, sophistication, and performance of their new design. (sorry for some grammatical errors, I'm still improving)
  • bwhitty - Friday, June 25, 2021 - link

    Can't tell if you're very subtly implying Andrei is coloring the results in favor of Intel? Perhaps you're not, but anyways it doesn't seem he is. Other than that, I agree that

    Small correction: Ice Lake is on 10nm+, not Super Fin. Tiger Lake is 10SF (10++), and Sapphire Rapids will be on 10 Enhanced Super Fin, so 10nm+++.

    Tangent: I think that Ice Lake being on the non-SF process actually bodes extremely well for Sapphire Rapids because Ice Lake even in laptops is just not that good from a mfg perspective. It's basically Intel 10nm's first shippable and salvaged process. Super Fin appears far, far better in Tiger Lake versus Ice Lake, and so an improvement on top of that should perhaps finally bring Intel's mfg in line with TSMC 7nm. That gives Sapphire Rapids a good place to be in the first half of 2022 until Genoa rolls out on TSMC 5nm in late 2022 / early 2023.
  • Cllaymenn - Friday, June 25, 2021 - link

    bwhitty. I did not mean favoring Intel products, but a more subdued way of speaking about their performance in relation to ZEN3, a way other than the popular Linus on YT, which is sharply pressing Intel with each premiere of new AMD products.

    As for Super Fin, I read about it recently in one of the popular IT websites. I typed in google and found a quote

    "Intel Xeon Scalable Ice Lake-SP processors were announced some time ago, but we had to wait a while for their premiere. We finally got it - we got to know the technical details of the units, as well as their performance results. Intel Xeon Scalable units (Ice Lake-SP) use the new Sunny Cove microarchitecture, which is expected to translate into up to a 20% increase in IPC over the previous generation Skylake. The chipsets are manufactured using a new 10nm SuperFin process."

    As I checked with a few other sources, I now know that this site was wrong about the 83xx series.
  • Ian Cutress - Friday, June 25, 2021 - link

    On 10nm naming, Intel has changed it twice. There are no + or ++ any more.

    https://www.anandtech.com/show/16107/what-products...
  • bwhitty - Monday, June 28, 2021 - link

    Oh yes, Dr Cutress, I know all these Intel mfg node specifics purely from Anandtech’s breakdowns
  • outsideloop - Friday, June 25, 2021 - link

    Far, far better? Tiger Lake H still sucks power like an inebriated Cleopatra.
