Voltage Lockdown: Investigating AMD's Recent AM5 AGESA Updates on ASRock's X670E Taichi
by Gavin Bonshor on May 16, 2023 12:00 PM EST
It's safe to say that the last couple of weeks have been a bit chaotic for AMD and its motherboard partners. Unfortunately, it's been even more chaotic for some users with AMD's Ryzen 7000X3D processors. There have been several reports of Ryzen 7000 processors burning up in motherboards, and in some cases, burning out the chip socket itself and taking the motherboard with it.
Over the past few weeks, we've covered the issue as it's unfolded, with AMD releasing two official statements and motherboard vendors scrambling to ensure their users have been updating firmware in what feels like a grab-it-quick fire sale, pun very much intended. Not everything has been going according to plan, with AMD having released two new AGESA firmware updates through its motherboard partners to try and address the issues within a week.
The first firmware update made available to vendors, AGESA 1.0.0.6, addressed reports of SoC voltages being too high. This AGESA version put restrictions in place to limit that voltage to 1.30 V, and was quickly distributed to all of AMD's partners. More recently, motherboard vendors have pushed out even newer BIOSes which include AMD's AGESA 1.0.0.7 (BETA) update. With even more safety-related changes made under the hood, this is the firmware update AMD and its motherboard partners are pushing consumers to install to alleviate the issues and prevent new ones from occurring.
In this article, we'll be taking a look at the effects of all three sets of firmware (AGESA 1.0.0.5c through 1.0.0.7) running on our ASRock X670E Taichi motherboard. The goal is to uncover what changes there are, if any, to variables such as SoC voltage and current drawn under intensive memory-based workloads, using the AMD Ryzen 9 7950X3D.
Here is our recent coverage of the Ryzen 7000X3D/7000 'burnout' issues, including two statements from AMD and official responses from ASUS and MSI:
- MSI Addresses CPU Voltages on AM5 Motherboards for Ryzen 7000X3D Processors
- AMD Issues Official Statement on Reported Ryzen 7000 Burnout Issues
- ASUS Issues Statement on Ryzen 7000X3D Processor Issues, Possible Voltage Issues with AMD EXPO
- AMD Issues Second Statement on Ryzen 7000 Burnout Issues: Caps SoC Voltages
AMD Ryzen 7000 AGESA Firmware: From 1.0.0.5c to 1.0.0.7 Within 32 Days
The first firmware update made available to vendors, AGESA 1.0.0.6, addressed reports of SoC voltages being too high, with new restrictions put in place to limit things to 1.30 V. In the case of the board we've been using to try and dig deeper into the issues, the ASRock X670E Taichi, this was made available to the public on 4/27/23 through its 1.21 firmware update. More recently, on 5/4/23, ASRock made its latest 1.24.AS02 firmware available, which includes AMD's AGESA 1.0.0.7 (BETA) update.
The AGESA 1.0.0.7 (BETA) update is the firmware that AMD has most recently been rolling out to alleviate the issues of burnout, not just for Ryzen 7000X3D chips with 3D V-Cache, but also across the broader Ryzen 7000 and AM5 ecosystem. Counting the initial AGESA 1.0.0.5c firmware that brought AMD's Ryzen 7000X3D support to AM5 motherboards, AMD has released a total of three major AGESA versions in the space of a mere 32 days, each of which ASRock has dutifully published for the X670E Taichi. We'll be using these three versions as the basis for our analysis and look into what's going on.
On top of this, AMD is also planning to release an even more robustly updated AGESA firmware, which could be in the coming weeks. Referred to internally as AGESA 188.8.131.52, we did reach out to AMD for comment on this, but our rep couldn't comment on "unannounced or internal only software stacks." It should also be noted that the current firmware at the time of writing available to users is a BETA version, implying that a newer AGESA is undoubtedly on its way. Still, the timescale of the release is anyone's guess currently.
Looking at the variations in AMD's AGESA updates over the last month, there hasn't been any official indication of changes other than the bare minimum, at least not from ASRock's descriptions. The following is how ASRock describes each of the AGESA updates:
- AGESA 1.0.0.5c: Initial support for Ryzen 7000X3D processors with 3D V-Cache.
- AGESA 1.0.0.6 (BETA): Improved memory compatibility, optimizations for Ryzen 7000X3D, recommended update for Ryzen 7000X3D processors.
- AGESA 1.0.0.7 (BETA): Support for 48/24GB DDR5 memory modules.
The description of the changes, at least from the point of ascertaining what each AGESA is offering, is borderline pitiful. In none of the descriptions does it state what changes AMD has made to each AGESA firmware to address the current issues, which in all honesty, is a pretty big thing to omit. There are no indications whatsoever on ASRock's X670E Taichi BIOS page as to what each firmware changes, and with no public notes available to users, it's a case of "update to this firmware, it's recommended."
So what do we know about the changes? Well, we know the critical change going from AGESA 1.0.0.5c to the 1.0.0.6 and 1.0.0.7 versions is a lockdown on SoC voltage to 1.30 V. Previously, on the ASRock X670E Taichi with 1.0.0.5c, we were able to set the SoC voltage to 2.5 V, which would almost certainly result in frying our X3D chips like an egg.
Image Credit: Igor Wallossek, Igorslab.de
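The SoC voltage lockdown described above can be pictured as a simple clamp on whatever value the user requests. The following is a minimal sketch, not AMD's or ASRock's firmware logic: the function name `clamp_soc_voltage` and the constant `SOC_VOLTAGE_CAP_V` are hypothetical, and only the 1.30 V cap and the 2.5 V example come from the text.

```python
# Hypothetical illustration of a firmware-side SoC voltage cap.
# Only the 1.30 V limit comes from the article; names are assumptions.
SOC_VOLTAGE_CAP_V = 1.30


def clamp_soc_voltage(requested_v: float, cap_v: float = SOC_VOLTAGE_CAP_V) -> float:
    """Return the voltage a capped firmware would apply for a user request."""
    if requested_v <= 0:
        raise ValueError("requested voltage must be positive")
    # Requests above the cap are limited to the cap; lower requests pass through.
    return min(requested_v, cap_v)


print(clamp_soc_voltage(2.5))   # request that older firmware accepted as-is
print(clamp_soc_voltage(1.25))  # request below the cap is left unchanged
```

With a cap of this kind in place, the 2.5 V setting we could previously enter would be limited to 1.30 V, while lower manual values remain available.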
As for the other changes coming with AGESA 1.0.0.7, according to Igor Wallossek, the Editor-in-chief of Igorslab.de, AMD has also added two new PROCHOT entries that point directly to combating overheating. PROCHOT essentially means Processor Hot, and it is a control mechanism designed to protect the processor from overheating. There are two implementations here. The first is the PROCHOT Control mechanism, which is precisely what it says on the tin. When the CPU hits a defined temperature value, a PROCHOT Control signal is sent, and the CPU draws less power to try and mitigate temperatures and reduce the risk of damage.
The second mechanism is PROCHOT Deassertion Ramp Time, which dictates how long a processor can ramp up the power after the initial PROCHOT Control signal has been disabled. Essentially, PROCHOT Deassertion Ramp is the time it takes for the processor to get back up to normal parameters, and different variables, including cooling, the aggressiveness of said cooling, and general heat dissipation quality, can dictate this time. If the processor is inadequately cooled, this can result in a longer deassertion ramp time, whereas more aggressive heat dissipation methods should theoretically allow for a quicker ramp-up time.
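To make the interplay between the two PROCHOT settings more concrete, here is a toy simulation of a throttle signal followed by a deassertion ramp. It is a sketch under stated assumptions rather than a description of AMD's implementation: the function `simulate_prochot`, the linear ramp, and every number used (trip temperature, power levels, ramp length) are hypothetical values chosen purely for illustration.

```python
def simulate_prochot(temps_c, trip_c=95.0, full_w=120.0, throttled_w=60.0, ramp_steps=4):
    """Return the allowed power (W) per temperature sample in a toy PROCHOT model.

    All defaults are illustrative assumptions, not AMD-specified values.
    """
    if ramp_steps < 1:
        raise ValueError("ramp_steps must be at least 1")
    limits = []
    asserted = False
    ramp_pos = ramp_steps  # ramp_steps means "fully ramped back to normal"
    for temp in temps_c:
        if temp >= trip_c:
            # Trip point reached: assert PROCHOT and reset the ramp.
            asserted = True
            ramp_pos = 0
        elif asserted:
            # Temperature fell below the trip point: deassert, ramp starts now.
            asserted = False
        if asserted:
            limit = throttled_w
        else:
            if ramp_pos < ramp_steps:
                ramp_pos += 1
            # Linear ramp from the throttled level back up to full power.
            limit = throttled_w + (full_w - throttled_w) * ramp_pos / ramp_steps
        limits.append(limit)
    return limits


samples = [80, 96, 97, 90, 88, 86, 84, 82]  # made-up temperature trace in °C
print(simulate_prochot(samples))
```

In this model, a longer `ramp_steps` value keeps the processor below full power for longer after the signal clears, which mirrors the idea that the deassertion ramp time governs how quickly the chip returns to normal parameters.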
The Story So Far: Gamers Nexus Deep-Dive - The Ryzen 7000 CORE Fundamental Issues
Before the rollout of new firmware, Steve Burke, the Editor-in-Chief of Gamers Nexus, and his team investigated the issues in-depth, including looking at the original fried hardware from Speedrookie. This includes a faulty and bulged out Ryzen 7 7800X3D processor and his burnt ASUS ROG STRIX X670E E Gaming motherboard. Instead of RMA'ing the hardware, Steve Burke reached out to the user and offered to buy the hardware from him, minimizing the RMA lead time and allowing Speedrookie to purchase new hardware.
The 38:46 long video is a very good watch, and we certainly recommend that users watch this, especially for those more interested in the inner workings (or issues) of the Ryzen 7000X3D and 7000 series processors. To summarize Steve's findings, we took away the following points:
- AMD Ryzen 7000X3D CPUs are shutting down too late to mitigate physical damage.
- ASRock, GIGABYTE, and MSI have a 116°C thermal trip point, and ASUS has 106°C, but these sometimes didn't work as intended.
- The thermal cut-off for Ryzen 7000X3D is supposed to be 106°C and 116°C for Ryzen 7000.
- With AMD EXPO enabled, ASUS set 1.35 V on SoC voltage up until BIOS 1202 (AGESA 1.0.0.6).
- ASUS's SoC Voltage settings were/are too high.
- The AGESA firmware rollout has been nothing short of chaos at this point.
- AMD is offering RMA (paying shipping both ways) on killed CPUs, even if EXPO has been used (at least in the US)
- No word on if motherboard vendors will honor the warranty (at the time of writing)
While Steve and his team at Gamers Nexus have gone deep into uncovering the root causes of the problem, one thing remains abundantly clear: the issue is not just one that relates to SoC voltage. There has certainly been some confusion between AMD themselves and its motherboard partners in implementing the appropriate failsafe to prevent the CPU (and motherboard socket, for that matter) from burning into oblivion.
The other problem relates to ASUS, with a more aggressive implementation of its SoC voltages, which Gamers Nexus confirmed in their testing as running too high. Before the AGESA 1.0.0.6 firmware update through BIOS version 1202, ASUS was overshooting AMD's newly imposed SoC voltage limit of 1.30 V by 0.05 V.
Image Credit: Gamers Nexus
With leads soldered to the motherboard and connected to a digital multimeter, a 1.35 V SoC setting within the ASUS firmware (and with EXPO enabled) resulted in an observed 1.398 V from an SoC pad. This was typically even higher when probed at the choke, at an eye-watering 1.42 V. This points to a fundamental problem: ASUS's firmware and the SoC rails themselves aren't cohabiting well with each other. An additional 0.05 V on top of the recommended 1.30 V is a lot, to say the least, but adding roughly another 0.05 V on top of that can undoubtedly lead to dielectric degradation and possibly lead to dead CPUs and burnt motherboard sockets.
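The size of the overshoot is easier to judge with the arithmetic written out. The short sketch below uses only the figures quoted above (the 1.30 V limit, the 1.35 V firmware setting, and the 1.398 V and 1.42 V readings reported by Gamers Nexus); the helper name `excess_v` is our own and purely illustrative.

```python
LIMIT_V = 1.30    # AMD's SoC voltage limit
SETTING_V = 1.35  # value set in the ASUS firmware with EXPO enabled

# Readings reported by Gamers Nexus at two probe points.
readings_v = {"SoC pad": 1.398, "choke": 1.42}


def excess_v(measured_v: float, reference_v: float) -> float:
    """Voltage above a reference level (negative if below it)."""
    return measured_v - reference_v


for point, measured in readings_v.items():
    over_setting = excess_v(measured, SETTING_V)
    over_limit = excess_v(measured, LIMIT_V)
    pct_over_limit = over_limit / LIMIT_V * 100
    print(
        f"{point}: {over_setting:.3f} V over the setting, "
        f"{over_limit:.3f} V ({pct_over_limit:.1f}%) over the 1.30 V limit"
    )
```

Measured at the pad, the rail sits about 0.048 V above what was set and roughly 0.098 V, or 7.5%, above the 1.30 V limit; at the choke the gap over the limit grows to 0.12 V, or a little over 9%.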
Doing some preliminary testing on the effect of SoC voltage on stability with the latest AGESA 1.0.0.7 (BETA) firmware, we found that the ASRock X670E Taichi would automatically set 1.30 V on the SoC when applying the EXPO memory profile of our G.Skill DDR5-6000 kit (2 x 16 GB). Manually lowering the value, we tried 1.15 V, which was a no-go, and even 1.20 V was a no-go. We eventually settled on 1.25 V on the SoC for this kit and our Ryzen 9 7950X3D, and we found stability in memory-intensive benchmarks was solid.
Perhaps one of the biggest things to come outside of Gamers Nexus's testing was that AMD is now offering RMA support for users who have used EXPO memory profiles, something which normally voids the warranty on AMD's processors. Whether or not other regions intend to honor these RMA requests hasn't been confirmed, but it's unlikely to be an issue.
Still, it's a good gesture for users with damaged CPUs from an issue that is entirely not their fault. Motherboard vendors, on the other hand, operate within their policies and parameters, and it may be trickier getting an RMA on a damaged motherboard simply because AMD doesn't control motherboard vendors' RMA policies. We would hope in good faith that motherboard vendors will honor the warranty in instances of these burnout issues, but we cannot confirm if they will at this time.
Our Testing: Methodology, Test Setup, and Hardware
To summarize the reason for testing AMD's AGESA firmware, we aren't trying to replicate burning our Ryzen 7000X3D samples – enough processors have already been sacrificed for science. For that matter, we certainly didn't see or smell any smoke coming from our ASRock X670E Taichi during testing, so we'll take that as a good sign.
Our purpose for testing is to highlight any differences or variations in parameters and power-related elements coming from AMD's latest AGESA packages. This includes looking at rails like SoC voltage and Package Power Tracking (PPT) output from the AM5 CPU socket. As AMD has dialed down what users and motherboard vendors can apply in regards to SoC voltage to 1.30 V, it's worth noting that all of ASRock's firmware we've tested on the X670E Taichi in this piece automatically sets SoC voltage to 1.30 V. While we don't have the necessary tools and equipment to solder leads to the motherboard to observe 'physical' voltages, we are relying on HWInfo's reporting prowess, as well as looking at multiple temperatures.
We also did some in-house stability testing against the new SoC voltage limits, running a fresh batch of tests on our Ryzen 9 7950X3D paired with a G.Skill DDR5-6000 (2 x 16 GB) memory kit with its AMD EXPO memory profile enabled. We found that things weren't stable until we applied 1.25 V on the SoC voltage within the firmware. At 1.25 V on the SoC, our kit was rock solid, even in memory-intensive workloads and benchmarks.
That has been our focus, trying to push the memory as hard as we can to ensure complete stability. A lot of the attention surrounding the issue has unfairly been placed on AMD's EXPO profiles as one of the causes; they are not. We know that CPU-intensive workloads will generate more heat, but that isn't what we've been investigating. We're looking for variations in current and power between the different firmware versions to see if AMD (and ASRock) has made optimizations within its framework to reduce them. Current, or more specifically overcurrent combined with the integrated failsafes being bypassed, is one of the key concerns in the burnouts.
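The stability search above amounts to stepping the SoC voltage upward until the memory kit passes, without going beyond the 1.30 V cap. A minimal sketch of that procedure follows; the helper `find_lowest_stable` and its `is_stable` callback are hypothetical stand-ins for a manual firmware change plus a benchmark run, and the pass/fail values are simply the three outcomes we recorded.

```python
from typing import Callable, Iterable, Optional

SOC_CAP_V = 1.30  # firmware-enforced SoC voltage limit


def find_lowest_stable(
    candidates: Iterable[float],
    is_stable: Callable[[float], bool],
    cap_v: float = SOC_CAP_V,
) -> Optional[float]:
    """Try voltages in ascending order and return the first stable one."""
    for voltage in sorted(candidates):
        if voltage > cap_v:
            break  # never test beyond the cap
        if is_stable(voltage):
            return voltage
    return None  # nothing at or below the cap was stable


# Outcomes recorded in our testing: 1.15 V and 1.20 V failed, 1.25 V passed.
recorded = {1.15: False, 1.20: False, 1.25: True}
print(find_lowest_stable(recorded, lambda v: recorded[v]))
```

In practice each `is_stable` call stands for a reboot and a full pass of memory-intensive benchmarks, so the ascending order matters: it stops at the lowest voltage that works rather than defaulting to the 1.30 V preset.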
Our test bench for our AGESA (AM5) update testing is as follows:
|AMD Ryzen 7950X3D AGESA Test Platform|
|CPU||Ryzen 9 7950X3D ($699), 16 Cores, 32 Threads, 120 W TDP|
|Motherboard||ASRock X670E Taichi (BIOS 1.18, 1.21 & 1.24.AS02)|
|Memory||G.Skill Trident Z5 Neo, DDR5-5200 (JEDEC Default), DDR5-6000 CL34 (EXPO Profile)|
|Cooling||EK-AIO Elite 360 D-RGB 360 mm AIO|
|Storage||SK Hynix 2TB Platinum P41 PCIe 4.0 x4 NVMe|
|Power Supply||Corsair HX1000|
|GPUs||AMD Radeon RX 6950 XT, Driver 31.0.12019|
|Operating Systems||Windows 11 22H2|
For our choice of workloads, we're relying on the Memory Test Suite from Openbenchmarking.org via Phoronix to implement our memory-intensive workloads. Although some of these workloads aren't optimized for, or don't run on, Windows, we used the CacheBench benchmark, which covers multiple data types across read, write, read/modify/write, and combined tests. As part of the LLCbench low-level architectural characterization benchmark suite, CacheBench is designed to test memory and cache bandwidth performance and requires a C/C++ compiler toolchain to build.
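For readers unfamiliar with what a read, write, or read/modify/write test actually does, the sketch below walks a buffer with each of the three access patterns and reports a rough throughput figure. This is not CacheBench or its source code: the function names are our own, and because Python's interpreter overhead dominates the timing, the numbers it prints only illustrate the method and say nothing about real cache or memory bandwidth.

```python
import time
from array import array


def read_pass(buf):
    """Read every element once and return the sum."""
    total = 0.0
    for value in buf:
        total += value
    return total


def write_pass(buf):
    """Overwrite every element with a constant."""
    for i in range(len(buf)):
        buf[i] = 1.0


def rmw_pass(buf):
    """Read, modify, and write back every element."""
    for i in range(len(buf)):
        buf[i] += 1.0


def bandwidth_mb_s(kernel, n_elems, repeats=3):
    """Time `kernel` over a buffer of doubles; return MB of buffer traversed per second."""
    buf = array("d", [0.0]) * n_elems
    start = time.perf_counter()
    for _ in range(repeats):
        kernel(buf)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return buf.itemsize * n_elems * repeats / elapsed / 1e6


# Sweep a few buffer sizes, as cache benchmarks do, for each access pattern.
for n in (1 << 10, 1 << 13, 1 << 16):
    for kernel in (read_pass, write_pass, rmw_pass):
        print(f"{kernel.__name__:>10} {n * 8 // 1024:>5} KiB: {bandwidth_mb_s(kernel, n):8.1f} MB/s")
```

A compiled benchmark sweeps buffer sizes in the same way, so that the working set moves from the L1 cache out to main memory and the bandwidth steps down at each boundary; it is the larger, memory-resident sizes that keep the memory controller on the SoC busy.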
Read on for more analysis.