AMD 7th Gen Bristol Ridge and AM4 Analysis: Up to A12-9800, B350/A320 Chipset, OEMs First, PIBs Later
by Ian Cutress on September 23, 2016 9:00 AM EST
Over the last two weeks, AMD officially launched their 7th Generation Bristol Ridge processors as well as the new AM4 socket and related chipsets. The launch was somewhat muted, as the target for the initial launch is purely to the big system OEMs and system integrators, such as Lenovo, HP, Dell and others – for users wanting to build their own systems, ‘Product-in-Box’ units (called PIBs) for self-build systems will come at the end of the year. We held off on the announcement because the launch and briefings left a number of questions unanswered as to the potential matrix of configurations, specifications of the hardware and how it all connects together. We got a number of answers, so let’s delve in.
The seven APUs and one CPU being launched for OEM systems span from a high-frequency A12 part using the 7th Generation microarchitecture (we call it Excavator v2) down to the A6, and they all build on the Bristol Ridge notebook parts launched earlier in the year, this time focused on the desktop. AMD essentially skipped the 6th Gen, Carrizo, for desktop as the design was significantly mobile focused – we ended up with one CPU, the Athlon X4 845 (which we reviewed), with DDR3 support but no integrated graphics. Using the updated 28nm process from GlobalFoundries, AMD was able to tweak the microarchitecture and bring full-fledged APUs to the desktop using a similar design.
The full list of processors is as follows:
**AMD 7th Generation Bristol Ridge Processors**

| Processor | Modules / Threads | CPU Base / Turbo (MHz) | Graphics | GPU Base / Turbo (MHz) | TDP |
|---|---|---|---|---|---|
| A12-9800 | 2M / 4T | 3800 / 4200 | Radeon R7 | 800 / 1108 | 65W |
| A12-9800E | 2M / 4T | 3100 / 3800 | Radeon R7 | 655 / 900 | 35W |
| A10-9700 | 2M / 4T | 3500 / 3800 | Radeon R7 | 720 / 1029 | 65W |
| A10-9700E | 2M / 4T | 3000 / 3500 | Radeon R7 | 600 / 847 | 35W |
| A8-9600 | 2M / 4T | 3100 / 3400 | Radeon R7 | 655 / 900 | 65W |
| A6-9500 | 1M / 2T | 3500 / 3800 | Radeon R5 | 720 / 1029 | 65W |
| A6-9500E | 1M / 2T | 3000 / 3400 | Radeon R5 | 576 / 800 | 35W |
| Athlon X4 950 | 2M / 4T | 3500 / 3800 | - | - | 65W |
AMD’s mainstream processors will now hit a maximum of 65W in their official thermal design power (TDP), with the launch offering a number of 65W and 35W parts. There is the potential to offer CPUs with a configurable TDP, however much like the older parts that supported 65W/45W modes, it was seldom used, and chances are we will see OEMs stick with the default design power windows here. Also, the naming scheme: any 35W part now has an ‘E’ at the end of the processor name, allowing for easier identification.
As part of this review, we were able to snag a few extra configuration specifications for each of the processors, including the number of streaming processors in each, base GPU frequencies, base Northbridge frequencies (more on the NB later), and confirmation that all the APUs launched will support DDR4-2400 at JEDEC sub-timings.
**AMD 7th Generation 65W Bristol Ridge Processors**

| Processor | Modules / Threads | CPU Base / Turbo (MHz) | Streaming Processors | GPU Base / Turbo (MHz) | NB Frequency (MHz) |
|---|---|---|---|---|---|
| A12-9800 | 2M / 4T | 3800 / 4200 | 512 | 800 / 1108 | 1400 |
| A10-9700 | 2M / 4T | 3500 / 3800 | 384 | 720 / 1029 | 1400 |
| A8-9600 | 2M / 4T | 3100 / 3400 | 384 | 655 / 900 | 1300 |
| A6-9500 | 1M / 2T | 3500 / 3800 | 384 | 720 / 1029 | 1400 |
| Athlon X4 950 | 2M / 4T | 3500 / 3800 | - | - | 1400 |
**AMD 7th Generation 35W Bristol Ridge Processors**

| Processor | Modules / Threads | CPU Base / Turbo (MHz) | Streaming Processors | GPU Base / Turbo (MHz) | NB Frequency (MHz) |
|---|---|---|---|---|---|
| A12-9800E | 2M / 4T | 3100 / 3800 | 512 | 655 / 900 | 1300 |
| A10-9700E | 2M / 4T | 3000 / 3500 | 384 | 600 / 847 | 1300 |
| A6-9500E | 1M / 2T | 3000 / 3400 | 256 | 576 / 800 | 1300 |
The A12-9800 at the top of the stack is an interesting part on paper. If we do a direct comparison with the previous high-end AMD APUs, the A10-7890K, A10-7870K and A10-7860K, a lot of positives end up on the side of the A12.
**High-End AMD APU Comparison**

| | A12-9800 | A10-7890K | A10-7870K | A10-7860K | A10-9700 |
|---|---|---|---|---|---|
| Platform | Bristol Ridge | Kaveri Refresh | Kaveri Refresh | Kaveri Refresh | Bristol Ridge |
| uArch | Excavator v2 | Steamroller | Steamroller | Steamroller | Excavator v2 |
| Modules / Threads | 2M / 4T | 2M / 4T | 2M / 4T | 2M / 4T | 2M / 4T |
| CPU Base Freq (MHz) | 3800 | 4100 | 3900 | 3600 | 3500 |
| CPU Turbo Freq (MHz) | 4200 | 4300 | 4100 | 4000 | 3800 |
| GPU Turbo Freq (MHz) | 1108 | 866 | 866 | 757 | 1029 |
| L1-I Cache | 192 KB | 192 KB | 192 KB | 192 KB | 192 KB |
| L1-D Cache | 128 KB | 64 KB | 64 KB | 64 KB | 128 KB |
| L2 Cache | 2 MB | 4 MB | 4 MB | 4 MB | 2 MB |
The frequency range of the A12-9800 gives it greater dynamic range than the A10-7870K (3.8–4.2 GHz, rather than 3.9–4.1 GHz), and it pairs the newer Excavator v2 microarchitecture, an improved L1 cache, AVX 2.0 support and a much higher integrated graphics frequency (1108 MHz vs. 866 MHz) with a TDP that is 30W lower. That 30W drop is the most surprising part – we are essentially getting better than the previous A10-class performance at lower power, which is most likely why AMD promoted the best APU in the stack to the 'A12' name. The A12-9800 will be an extremely interesting APU to review given its smaller L2 cache but faster graphics and DDR4 memory.
A Wild Overclocker Appears!
Given that systems with the new APUs have technically been available for a couple of weeks, some vendors have had their internal enthusiasts play around with the platform. Bearing in mind that AMD has not announced any formal overclocking support on these new APUs, NAMEGT, a South Korean overclocker with ties to ASUS, has pushed the A12-9800 APU to 4.8 GHz by adjusting the multiplier. To do this, he used an unreleased ASUS Octopus AM4 motherboard and AMD's 125W Wraith air cooler (which will presumably be bundled with PIBs later in the product cycle).
NAMEGT ran this setup through the multithreaded Cinebench 11.5 and Cinebench 15 tests, scoring 4.77 and 380 respectively at the 4.8 GHz overclock. If we compare this to our Bench database results, we see the following:
For Cinebench 15, this overclocked score puts the A12-9800 above the Haswell Core i3-4360 and the older AMD FX-4350, but below the newer Skylake i3-6100TE. The Athlon X4 845 at stock frequencies scored 314 while running at 3.5 GHz, which would suggest that a stock A12-9800 at 3.8 GHz would fall around the 340 mark.
(Since writing this, a preview by Korean website Bodnara, using the A12-9800 in a GIGABYTE motherboard, scored 334 for a stock Cinebench 15 multithreaded test and 96 for the single threaded test. We've added this result for perspective.)
When we previously tested the Excavator architecture for desktop on the 65W Athlon X4 845, overclocking was a nightmare, with stability being a large issue. At the time, we suspected that due to the core design being focused towards 15W, moving beyond 65W was perhaps a bit of a stretch for the design at hand. This time around, as we reported before, Bristol Ridge is using an updated 28nm process over Carrizo, which may have a hand in this.
When we asked AMD about overclocking details on the new APUs, the return reply was along the lines of ‘No OEM systems at this time will be unlocked, and no official comment on the individual units. More details will be released closer to the platform launch for DIY users’.
ClockHound - Friday, September 23, 2016 - link+101
Particularly enjoyed the term: "walled garden spyware milking station" model
Ok, not really enjoyed, cringed at the accuracy, however. ;-)
msroadkill612 - Wednesday, April 26, 2017 - linkAn adage I liked "If its free, YOU are the product."
hoohoo - Friday, September 23, 2016 - linkI see what you did there! Nicely done.
patrickjp93 - Saturday, September 24, 2016 - linkNo they aren't. If Geekbench optimized for x86 the way it does for ARM, the difference in performance per clock is nearly 5x
ddriver - Saturday, September 24, 2016 - linkYou have no idea what you are talking about. Geekbench is very much optimized, there are basically three types of optimization:
optimization done by the compiler - it eliminates redundant code, vectorizes loops and all that good stuff, that happens automatically
optimization by using intrinsics - do manually what the compiler does automatically, sometimes you could do better, but in general, compiler optimizations are very mature and very good at doing what they do
"optimization" of the type "if (CPUID != INTEL) doWorse()" - harmful optimization that doesn't really optimize anything in the true sense of the word, but deliberately chooses a less efficient code path to purposely harm the performance of a competitor - such optimizations are ALWAYS in the favor of the TOP DOG - be that intel or nvidia - companies who have excess of money to spend on such idiotic things. Smaller and less profitable companies like amd or arm - they don't do that kind of shit.
Finally, performance is not magic, you can't "optimize" and suddenly get 5X the performance. Process and TDP are a limiting factor, there is only so much performance you can get out of a chip produced at a given process for a given thermal budget. And that's if it is some perfectly efficient design. A 5W 20nm x86 chip could not possibly be any faster than a 5W 20nm ARM chip, intel has always had a slight edge in process, but if you manufacture an ARM and an x86 chip on an identical process (not just the claimed node size) with the same thermal budget the ARM chip will be a tad faster, because the architecture is less bloated and more efficient.
It is a part of a dummy's belief system that arm chips are somehow fundamentally incapable of running professional software - on the contrary, hardware wise they are perfectly capable, only nobody bothers to write professional software for them.
patrickjp93 - Saturday, September 24, 2016 - linkI have a Bachelor's in computer science and specialized in high performance parallel, vectorized, and heterogeneous computing. I've disassembled Geekbench on x86 platforms, and it doesn't even use anything SSE or higher, and SSE is an ancient Pentium III instruction set.
It does not happen automatically if you don't use the right compiler flags and don't have your data aligned to allow the instructions to work.
You need intrinsics for a lot of things. Clang and GCC both have huge compiler bug forums filled with examples of where people beat the compilers significantly.
Yes you can get 5x the performance by optimizing. Geekbench only handles 1 datum at a time on Intel hardware vs. the 8 you can do with AVX and AVX2. Assuming you don't choke on bandwidth, you can get an 8x speedup.
ARM is not more efficient on merit, and x86 is not bloated by any stretch. Both use microcode now. ARM is no longer RISC by any strict definition.
Cavium has. Oracle has. Google has. Amazon has. In all cases ARM could not keep up with Avoton and Xeon D in performance/watt/$ and thus the industry stuck with Intel instead of Qualcomm or Cavium.
Toss3 - Sunday, September 25, 2016 - linkThis is a great post, and I just wanted to post an article by PC World where they discussed these things in simpler terms: http://www.pcworld.com/article/3006268/tablets/tes...
amagriva - Sunday, September 25, 2016 - linkGood post. To any interested a good paper on the subject : http://etn.se/images/expert/FD-SOI-eQuad-white-pap...
ddriver - Sunday, September 25, 2016 - linkI've been using GCC mostly, and in most of the cases after doing explicit vectorization I found no perf benefits; analyzing the assembly afterwards revealed that the compiler had done a very good job of vectorizing wherever possible.
However, I am highly skeptical towards your claims, I'll believe it when I see it. I can't find the link now, but last year I read a detailed analysis showing that the A9X's performance per watt is better than Skylake's over most of the A9X's clock range. And not in Geekbench, but in SPEC.
As for geekbench, you make it sound as if they actually disabled vectorization explicitly. Which would be an odd thing. Not entirely clear what you mean by "1 datem at a time", but if you mean they are using scalar rather than vector instructions, that would be quite odd too. Luckily, I have better things to do than rummage about in geekbench machine code, so I will take your word that it is not properly optimized.
And sure, 256bit wide SIMD will have higher throughput than 128bit SIMD, but nowhere nearly 8 or even 5 times. And that doesn't make arm chips any less capable of running devices, which are more than useless toys. Those chips are more powerful than workstations were some 10 years ago, but their usability is nowhere near that. As the benchmarks from the link Toss3 posted indicate, the A9X is only some ~40% slower than i5-4300U in the "true/real world benchmarks", and that's a 15 watt chip vs the A9X is like what, 5-ish or something like that? And ARM is definitely more efficient once you account for intel's process advantage. This will become obvious if intel ever dare to manufacture arm cores at the same process as their own products. And it is not because of the ISA bloat but because of the design bloat.
Naturally, ARM chips are a low margin product; one cannot expect a $50 chip to outperform a $300 chip, but the gap appears to be closing, especially keeping in mind the brick wall that process scaling is going to hit in the next decade. A $50 chip running equal to a $300 (and much wider design) chip from two years ago opens up a lot of possibilities, but I am not seeing any of them being realized by the industry.