Original Link: https://www.anandtech.com/show/16535/intel-core-i7-11700k-review-blasting-off-with-rocket-lake
Intel Core i7-11700K Review: Blasting Off with Rocket Lake
by Dr. Ian Cutress on March 5, 2021 4:30 PM EST - Posted in
- CPUs
- Intel
- 14nm
- Xe-LP
- Rocket Lake
- Cypress Cove
- i7-11700K
The march of performance on desktop platforms has unique challenges compared to other form factors. Peak single-thread throughput is often considered the Holy Grail, followed closely by good multi-core and all-core performance, given how desktop platforms are used with background processes and multiple concurrent applications. In order to bring its best single-core performance to the desktop market, Intel had to redesign its 10nm product on 14nm, combining the high throughput of the newer design with the high frequency of 14nm. These redesigned Cypress Cove cores form the basis of Intel’s new 11th Gen Desktop Processor Family, Rocket Lake. Today we are reviewing the Core i7-11700K, an eight-core processor with hyperthreading, able to boost up to 5.0 GHz.
Notice
The official launch date for these processors, and full reviews, is March 30th. We are currently under NDA with Intel for the information that has been provided by Intel, and will publish that information in due course. However, as noted in a number of press outlets, some units have already been sold at retail before that sales date. Units obtained by that method are not under NDA by definition, and we obtained the Core i7-11700K for this review at retail, and as such we are not under NDA for any information we have obtained through using this processor.
Before publishing this review, we gave Intel advance notice to respond to us having a full review ahead of the formal release. Our email seemingly generated some excitement inside (and to our surprise, outside) Intel, but we received a response from Intel stating that they had no comment to offer.
Update 1: This review was originally posted on March 5th using 0x2C microcode, and has been updated on March 14th with data from 0x34 microcode. The difference between the two is about +1.8% on CPU tests and +3% on gaming tests, including performance regressions in some areas. This review showcases both sets of results. Details of the update can be found here.
Rocket Lake We Know About
Core i9-11900K and Core i7-11700K
Back at the start of the year, during CES, Intel disclosed product information about its lead halo product on the Rocket Lake platform, the Core i9-11900K. This includes some microarchitecture details, as well as core count, frequency, memory, graphics, and features relating to IO and the chipset. With our review here today, we can add the 11700K to that data with what we can probe from the processor.
| AnandTech | Core i9-11900K | Core i7-11700K |
|---|---|---|
| SoC | Rocket Lake | Rocket Lake |
| Microarchitecture | Cypress Cove | Cypress Cove |
| Cores / Threads | 8 / 16 | 8 / 16 |
| TDP | 125 W | 125 W |
| Base Frequency | ? | 3600 MHz |
| Turbo 2.0 (1-2 C) | ? | 4900 MHz |
| Turbo 3.0 (1-2 C) | ? | 5000 MHz |
| Thermal Velocity Boost | 5300 MHz | - |
| All Core Turbo | 4800 MHz | 4600 MHz |
| DDR4 | 2 x DDR4-3200 | 2 x DDR4-3200 |
| GPU + EUs | Xe-LP, 32 EUs | Xe-LP, 32 EUs |
| PCIe | 4.0 x16 + 4.0 x4 | 4.0 x16 + 4.0 x4 |
| AVX-512 | Yes | Yes |
| Price | ? | We paid equivalent $469 |
The differences between the two Rocket Lake processors, based on available information, are slim. The main one is that the Core i9 has Intel’s Thermal Velocity Boost technology whereas the Core i7 does not – this means the Core i7's peak frequency is only 5.0 GHz, not 5.3 GHz. The all-core frequency is only 200 MHz different.
The new generation Rocket Lake is the combination of two different backported technologies. Intel took the Sunny Cove core from 10nm Ice Lake and rebuilt it on 14nm, now calling it Cypress Cove. Intel also took the Xe graphics from 10nm Tiger Lake and rebuilt those on 14nm, although these are still called Xe graphics.
We can see that the new design is an amalgam of new technologies, by comparing Rocket Lake to Comet Lake, Ice Lake, and Tiger Lake:
Microarchitecture Comparison

| AnandTech | Comet Lake | Rocket Lake | Ice Lake | Tiger Lake | Ryzen 5000 |
|---|---|---|---|---|---|
| Form Factor | Desktop | Desktop | Laptop | Laptop | Desktop |
| Max Cores | 10 | 8 | 4 | 4 | 16 |
| TDP | 125 W | 125 W | 28 W | 35 W | 105 W |
| uArch | Comet | Cypress | Sunny | Willow | Zen 3 |
| IGP | Gen 9 | Xe-LP | Gen 11 | Xe | - |
| IGP Cores | 24 | 32 | 64 | 96 | - |
| L1-D | 32 KB/c | 48 KB/c | 48 KB/c | 48 KB/c | 32 KB/c |
| L2 Cache | 256 KB/c | 512 KB/c | 512 KB/c | 1280 KB/c | 512 KB/c |
| L3 Cache | 20 MB | 16 MB | 8 MB | 12 MB | 64 MB |
| PCIe | 3.0 x16 | 4.0 x20 | 3.0 x8 | 4.0 x4 | 4.0 x24 |
| DDR4 | 2 x 2933 | 2 x 3200 | 2 x 3200 | 2 x 3200 | 2 x 3200 |
| LPDDR4X | - | - | 4 x 3733 | 4 x 4266 | - |
There are obviously some differences between the notebook and desktop parts, most notably that the new desktop platform at the high end has only eight cores, two fewer than Comet Lake.
This is because Intel found eight cores to be the best balance of die area, power consumption, performance, and cost. Several times I’ve seen Intel spokespeople say that eight cores was ‘the most we could fit’, although that’s categorically false. More cores could be added, but overall they would run at a lower frequency for the same power, the interconnect might not scale, or the die size/yield would raise the price too much. The phrase ‘the most we could fit’, by all technical understanding, is a steaming pile. It needs additional qualifiers, or to simply say 'the best fit given die area, yield, and cost'.
Additional improvements over Comet Lake include AVX512 units, support for 20 PCIe 4.0 lanes, and faster memory. With the new chipsets, Intel has already disclosed that the Rocket Lake platform will have native USB 3.2 Gen 2x2 (20 Gbps), and with the Z590 motherboards, a double bandwidth link from CPU to the chipset, moving from DMI x4 to DMI x8, effectively a PCIe 3.0 x8 link.
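For context on what that doubling is worth: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so a quick back-of-the-envelope calculation (ignoring protocol overhead beyond the line coding) gives the usable bandwidth either side of the change:

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b line coding, 8 bits per byte.
lane_bytes_per_s = 8e9 * (128 / 130) / 8      # ~0.985 GB/s usable per lane

print(f"DMI x4: {4 * lane_bytes_per_s / 1e9:.1f} GB/s")   # ~3.9 GB/s
print(f"DMI x8: {8 * lane_bytes_per_s / 1e9:.1f} GB/s")   # ~7.9 GB/s
```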
Rocket Lake on 14nm: The Best of a Bad Situation
The delays around the viability of Intel’s 10nm manufacturing have been well documented. To date, the company has launched several products on its 10nm process for notebooks, such as Cannon Lake, Ice Lake, Jasper Lake, Elkhart Lake, and Tiger Lake. There have been other non-consumer products, such as Agilex FPGAs and Snow Ridge 5G SoCs, and Intel has confirmed that its 10nm server products ‘Ice Lake Xeon Scalable’, are currently in volume production for a late Q1 launch.
The one product line missing from that list is the desktop and enthusiast segment that typically uses socketed processors paired with discrete graphics. Intel has always committed to launching desktop processors on its 10nm process, however we have yet to see the results of those efforts. The issues Intel is having with 10nm have never been fully elaborated on, with Intel instead opting to promote some of the improvements made, such as its new SuperFin technology, which is in Tiger Lake and the next-generation server platform beyond Ice Lake Xeon Scalable (for those keeping track, that would be Sapphire Rapids). The 10nm improvements so far have enabled Intel to launch notebook processors and server processors, both of which have lower power-per-core than a typical desktop offering.
As 10nm has not been able to meet the standards required for desktop-level performance, rather than leave a potential three-year gap in the desktop product family, Intel has been in a holding pattern, releasing slightly upgraded versions of Skylake on slightly improved variants of 14nm. The first two members of the Skylake family, Skylake and Kaby Lake, were released as expected. While waiting, we saw Intel release Coffee Lake, Coffee Lake Refresh, and Comet Lake. Each of these afforded minor updates in frequency, core count, or power, but very little in the way of fundamental microarchitectural improvement. The goal all along was to move to 10nm with the same architecture as the mobile Ice Lake processors, but that wasn’t feasible, as manufacturing limitations restricted how well the processors scaled to desktop-level power.
- Skylake, Core 6th Gen in August 2015
- Kaby Lake, Core 7th Gen in January 2017
- Coffee Lake, Core 8th Gen in October 2017
- Coffee Lake Refresh, Core 9th Gen in October 2018
- Comet Lake, Core 10th Gen in April 2020
- Rocket Lake, Core 11th Gen in March 2021
With previous generations, Intel traditionally alternated between upgrading the process node technology and updating the microarchitecture – a cadence that Intel called Tick-Tock. Originally Intel was set to perform a normal ‘Tick’ after Kaby Lake, moving Cannon Lake, with the same effective Skylake microarchitecture, to 10nm. Cannon Lake ended up only as a laptop processor, with no working graphics, in a small number of notebooks in China, as it was a hot mess (as shown in our review). As a result, Intel refocused its 10nm efforts on notebook processors, hoping that advances would also be applicable to desktop, but the company had to release minor upgrades on desktop from Coffee Lake onwards to keep the product line going.
This meant that at some level Intel knew it would have to combine both a new architecture and a new process node jump into one product cycle. At some point, however, Intel realized that the intercept point between having a new microarchitecture and moving the desktop to 10nm was very blurry and somewhat intangible, at a time when its main competitor was starting to make noise about a new product that could reach parity in single-core performance. In order to keep these important product lines going, drastic measures would have to be taken.
After many meetings with many biscuits, we presume, the decision was made that Intel would take the core microarchitecture design from 10nm Ice Lake, which couldn’t reach high enough frequencies under desktop power, and repackage that design for the more dependable 14nm node which could reach the required absolute performance numbers. This is known as a ‘backport’.
Sunny Cove becomes Cypress Cove
The new Core 11th Gen processor which we are looking at today has the codename Rocket Lake. That’s the name for the whole processor, which consists of cores, graphics, interconnect, and other accelerators and IP blocks, each of which also has its own codename, for the sake of making it easier for the engineers to understand which parts are in use. We use these codenames a lot, and the one to focus on here is the CPU core.
Intel’s 10nm Ice Lake notebook processor family uses Sunny Cove cores in the design. It is these cores that have been backported to 14nm for use in the Rocket Lake processors, and because it is on a different process node and there are some minor design changes, Intel calls them Cypress Cove cores.
The reason for this is that taking a design for one manufacturing process and rebuilding it on a second is no easy task, especially if it’s a regressive step – transistors are bigger, which means logic blocks are bigger, and all the work done with respect to signaling and data paths in the silicon has to be redone. Even with a rework, signal integrity needs to be upgraded for longer distances, or additional path delays and buffers need to be implemented. Any which way you cut it, a 10nm core is bigger when designed for 14nm, consumes more power, and has the potential to be fundamentally slower at the execution level.
Intel’s official disclosures to date on the new Cypress Cove cores and Rocket Lake stem from a general briefing back in October, as well as a more product oriented announcement at CES in January. Intel is promoting that the new Cypress Cove core offers ‘up to a +19%’ instruction per clock (IPC) generational improvement over the cores used in Comet Lake, which are higher frequency variants of Skylake from 2015. However, the underlying microarchitecture is promoted as being identical to Ice Lake for mobile processors, such as caches and execution, and overall the new Rocket Lake SoC has a number of other generational improvements new to Intel’s desktop processors.
In This Review, and Limitations
As mentioned at the outset, this review comes ahead of the official review embargo for these processors. We are able to post it outside of the NDA as we obtained the hardware at retail. There is still a lot of information that has not been disclosed, the sort of thing that normally accompanies a new processor launch, and whatever Intel has told us directly remains under NDA (the details of which are also under the same NDA). So we won’t be able to go into those just yet, but we can start to fire some benchmark data at you. In this review we’re focusing mainly on the generational 8-core offerings across a number of products and generations.
8-Core CPU Comparison

| AnandTech | Core i9-9900KS | Core i7-10700K | Core i7-11700K | Ryzen 7 5800X | Ryzen 7 4750G |
|---|---|---|---|---|---|
| uArch | Coffee Refresh | Comet Lake | Cypress Cove | Zen 3 | Zen 2 + Vega |
| Cores | 8 C / 16 T | 8 C / 16 T | 8 C / 16 T | 8 C / 16 T | 8 C / 16 T |
| Base Freq (MHz) | 4000 | 3800 | 3600 | 3800 | 3600 |
| Turbo Freq (MHz) | 5000 | 5100 | 5000 | 4800 | 4400 |
| All-Core (MHz) | 5000 | 4700 | 4600 | ~4550 | ~4150 |
| TDP | 127 W | 125 W | 125 W | 105 W | 65 W |
| IGP / EUs | Gen 9, 24 | Gen 9, 24 | Xe-LP, 32 | - | Vega, 8 |
| L3 Cache | 16 MB | 16 MB | 16 MB | 32 MB | 8 MB |
| DDR4 | 2 x 2666 | 2 x 2933 | 2 x 3200 | 2 x 3200 | 2 x 3200 |
| PCIe | 3.0 x16 | 3.0 x16 | 4.0 x20 | 4.0 x24 | 3.0 x8 |
| MSRP | $513 box | $387 box | ? | $449 SEP | ~$345 |
We paid 394€ for our processor pre-tax, which comes to $469. We suspect this is well above Intel's recommended retail price, given that this was sold before the official sales date and demand for high performance processors is very high.
Test Setup and #CPUOverload Benchmarks
As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also run at JEDEC subtimings where possible. Reasons are explained here.
Test Setup

| Platform | CPU | Motherboard | BIOS | Cooler | DRAM |
|---|---|---|---|---|---|
| Intel Rocket Lake | Core i7-11700K | MB1 / MB2 (undisclosed) | MB1: Microcode 0x2C, MB2: Microcode 0x34 | TRUE Copper + SST* | ADATA 4x32 GB DDR4-3200 |
| Intel Comet Lake | Core i7-10700K | - | - | TRUE Copper + SST* | ADATA 4x32 GB DDR4-2933 |
| Intel Coffee Refresh | Core i9-9900KS | MSI MPG Z390 Gaming Edge AC | AB0 | TRUE Copper + SST* | ADATA 4x32 GB DDR4-2666 |
| AMD AM4 | Ryzen 7 5800X, Ryzen 7 4750G | GIGABYTE X570I Aorus Pro | F31L | Noctua NH-U12S SE-AM4 | ADATA 2x32 GB DDR4-3200 |

- GPU: Sapphire RX 460 2GB (CPU tests); NVIDIA RTX 2080 Ti FE (gaming tests)
- PSU: Corsair AX860i
- SSD: Crucial MX500 2TB

*TRUE Copper used with Silverstone SST-FHP141-VF 173 CFM fans. Nice and loud.
**Other CPUs in graphs were tested in the same systems for their CPU family.
While we can't disclose the motherboard used due to NDA reasons, it has already been announced by the manufacturer. Meanwhile, the BIOS used is likely not the final variant that will ship for Rocket Lake's retail launch later this month, and further BIOSes may contain minor adjustments to performance or turbo responses.
As an addendum to this review a week after our original numbers, we obtained a second motherboard that offered a newer microcode version from Intel. The original motherboard still offered the same microcode at that time. For more details, please see this link.
We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.
Hardware Providers for CPU and Motherboard Reviews

Sapphire RX 460 Nitro, NVIDIA RTX 2080 Ti, Crucial SSDs, Corsair PSUs, G.Skill DDR4, ADATA DDR4, Silverstone Coolers, and Noctua Coolers.
A big thanks to ADATA for the AD4U3200716G22-SGN modules for this review. They're currently the backbone of our AMD testing.
Users interested in the details of our current CPU benchmark suite can refer to our #CPUOverload article which covers the topics of benchmark automation as well as what our suite runs and why. We also benchmark much more data than is shown in a typical review, all of which you can see in our benchmark database. We call it ‘Bench’, and there’s also a link on the top of the website in case you need it for processor comparison in the future.
Table Of Contents
- Intel Core i7-11700K Review: Blasting Off with Rocket Lake
- Power Consumption: Hot Hot HOT
- CPU Tests: Microbenchmarks
- CPU Tests: Office and Science
- CPU Tests: Simulation
- CPU Tests: Rendering
- CPU Tests: Encoding
- CPU Tests: Legacy and Web
- CPU Tests: SPEC
- Gaming Tests
- Conclusion: The War of Attrition
Power Consumption: Hot Hot HOT
I won’t rehash the full ongoing issue with how companies report power vs TDP in this review – we’ve covered it a number of times before. But in a quick sentence, Intel uses one published value for sustained performance, and an unpublished ‘recommended’ value for turbo performance, the latter of which is routinely ignored by motherboard manufacturers. Most high-end consumer motherboards ignore the sustained value, often 125 W, and allow the CPU to consume as much as it needs with the real limits being the full power consumption at full turbo, the thermals, or the power delivery limitations.
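To make the mechanics concrete, here is a minimal sketch of how the published and ‘recommended’ limits interact, assuming the simplified PL1/PL2/Tau moving-average scheme Intel describes; the constants are illustrative examples, not the 11700K’s shipped values.

```python
# Illustrative model of Intel's power-limit scheme. The CPU may draw up to
# PL2 while an exponentially weighted moving average (EWMA) of recent power
# stays below PL1. Constants are examples, not the 11700K's shipped values.
PL1 = 125.0   # sustained limit in watts (the advertised TDP)
PL2 = 225.0   # short-term turbo limit in watts
TAU = 56.0    # averaging time constant in seconds

def ewma_update(avg: float, instant: float, dt: float) -> float:
    """Fold an instantaneous power sample into the moving average."""
    alpha = dt / TAU
    return (1.0 - alpha) * avg + alpha * instant

def power_cap(avg: float) -> float:
    """Turbo up to PL2 is allowed only while the average sits under PL1."""
    return PL2 if avg < PL1 else PL1

# Simulate a heavy all-core load starting from near idle.
avg, dt = 10.0, 0.1
for i in range(int(120 / dt)):
    draw = min(225.0, power_cap(avg))        # the workload wants 225 W
    avg = ewma_update(avg, draw, dt)
    if i % 100 == 0:
        print(f"t={i * dt:5.1f}s  draw={draw:5.1f} W  avg={avg:5.1f} W")
```

On a board that enforces the limits, the draw collapses to PL1 once the average catches up; an ‘infinite turbo’ board effectively sets PL1 equal to PL2 (or makes Tau enormous), which is the behaviour described above.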
One of the dimensions of this we don’t often talk about is that the power consumption of a processor is always dependent on the actual instructions running through the core. A core can be ‘100%’ active while merely waiting for data from memory or doing simple addition, however a core has multiple ways to run instructions in parallel, and the most complex instructions consume the most power. This became noticeable in the desktop consumer space when Intel introduced its AVX vector extensions to its processor designs. The subsequent introductions of AVX2 and AVX-512 mean that running these wider instructions draws the most power.
AVX-512 comes with its own discussion, because even going into an ‘AVX-512’ mode causes additional issues. Intel’s introduction of AVX-512 on its server processors showed that in order to remain stable, the core had to reduce the frequency and increase the voltage, while also pausing the core to enter the special AVX-512 power mode. This made AVX-512 suitable only for strong high-performance server code. But now Intel has enabled AVX-512 across its product line, from notebook to enterprise, allowing these chips to run AI code faster and enabling new use cases. We’re also a couple of generations on from then, and AVX-512 doesn’t take quite the same hit as it did, but it still requires a lot of power.
For our power benchmarks, we’ve taken several tests that represent a real-world compute workload, a strong AVX2 workload, and a strong AVX-512 workload. Note that Intel lists the Core i7-11700K as a 125 W processor.
Motherboard 1: Microcode 0x2C
Our first test using Agisoft Photoscan 1.3 shows a peak power consumption of around 180 W, although depending on the part of the test, we have sustained periods at 155 W and 130 W. Peak temperatures flutter around 70ºC, but the CPU spends most of the time at around the 60ºC mark.
For the AVX2 workload, we enable POV-Ray. This is the workload on which we saw the previous generation 10-core processors exceed 260 W.
At idle, the CPU is consuming under 20 W while touching 30ºC. When the workload kicks in after 200 seconds or so, the power consumption rises very quickly to the 200-225 W band. This motherboard implements the ‘infinite turbo’ strategy, and so we get a sustained 200-225 W for over 10 minutes. Through this time, our CPU peaks at 81ºC, which is fairly reasonable for some of the best air cooling on the market. During this test, the CPU sustained 4.6 GHz on all cores.
Our AVX-512 workload is 3DPM. This is a custom in-house test, accelerated to AVX2 and AVX512 by an ex-Intel HPC guru several years ago (for disclosure, AMD has a copy of the code, but hasn’t suggested any changes).
The test loads for 10-15 seconds and then idles for 10 seconds, a pattern that rapidly cycles any system that doesn’t run an infinite turbo. What we see here in this power-only graph are alarming peaks of 290-292 W. Looking at our data, the all-core turbo under AVX-512 is 4.6 GHz, sometimes dipping to 4.5 GHz. Ouch. But that’s not all.
Our temperature graph looks quite drastic. Within a second of running AVX-512 code, we are in the high 90s ºC, or in some cases, at 100ºC. Our temperatures peak at 104ºC, and here’s where we get into a discussion about thermal hotspots.
There are a number of ways to report CPU temperature. We can either take the instantaneous value of a singular spot of the silicon while it’s currently going through a high-current density event, like compute, or we can consider the CPU as a whole with all of its thermal sensors. While the overall CPU might accept operating temperatures of 105ºC, individual elements of the core might actually reach 125ºC instantaneously. So what is the correct value, and what is safe?
The cooler we’re using on this test is arguably the best air cooling on the market – a 1.8 kilogram full copper ThermalRight Ultra Extreme, paired with a 170 CFM high static pressure fan from Silverstone. This cooler has been used for Intel’s 10-core and 18-core high-end desktop variants over the years, even the ones with AVX-512, and not skipped a beat. Because we’re seeing 104ºC here, are we failing in some way?
Another issue we’re coming across with new processor technology is the ability to effectively cool a processor. I’m not talking about cooling the processor as a whole, but rather those hot spots of intense current density. We are going to get to a point where we can’t remove the thermal energy fast enough, or with this design, we might be there already.
Smaller Packaging
I will point out an interesting fact along this line of thinking, which might go unnoticed by the rest of the press – Intel has reduced the total vertical height of the new Rocket Lake processors.
The z-height, or total vertical height, of the previous Comet Lake generation was 4.48-4.54 mm. This number was taken from a range of 7 CPUs I had to hand. However, this Rocket Lake processor is over 0.1 mm thinner, at 4.36 mm. The smaller height of the package plus heatspreader could be a small indicator to the required thermal performance, especially if the airgap (filled with solder) between the die and the heatspreader is smaller. If it aids cooling and doesn’t disturb how coolers fit, then great, however at some point in the future we might have to consider different, better, or more efficient ways to remove these thermal hotspots.
Motherboard 2: Microcode 0x34
As an addendum to this review a week after our original numbers, we obtained a second motherboard that offered a newer microcode version from Intel.
On this motherboard, the AVX-512 response was different enough to warrant mentioning. Rather than enable a 4.6 GHz all-core turbo for AVX-512, it initially ramped up that high, peaking at 276 W, before reducing down to 4.4 GHz all-core, down to 225 W. This is quite a substantial change in behaviour:
This means that at 4.4 GHz we are running 200 MHz slower (roughly a 3% performance decrease), but we are saving 60-70 W. This is indicative of how far away from their peak efficiency point these processors are.
There was hope that this would adjust the temperature curve a little. Unfortunately we still see peaks at 103ºC when AVX-512 is first initiated; however, during the 4.4 GHz period we are closer to 90ºC, which is far more palatable.
On AVX2 workloads with the new 0x34 microcode, the results were very similar to the 0x2C microcode. The workload ran at 4.6 GHz all-core, reached a peak power of 214 W, and the processor temperature was sustained around 82ºC.
Peak Power Comparison
For completeness, here is our peak power consumption graph. These are the peak power consumption numbers taken from a series of benchmarks on which we run our power monitoring tools.
CPU Tests: Microbenchmarks
Core-to-Core Latency
As the core count of modern CPUs is growing, we are reaching a time when the time to access each core from a different core is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes can have different latencies to access the nearest core compared to the furthest core. This rings true especially in multi-socket server environments.
But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had multiple chips on the package, each with eight cores, and the core-to-core latency depended on whether the two cores were on the same die or not. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.
If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most accurate to how quick an access between two cores can happen.
The core-to-core numbers are interesting, being worse (higher) than the previous generation across the board. Here we are seeing, mostly, 28-30 nanoseconds, compared to 18-24 nanoseconds with the 10700K. This is part of the L3 latency regression, as shown in our next tests.
One pair of threads here are very fast to access all cores, some 5 ns faster than any others, which again makes the layout more puzzling.
Update 1: With microcode 0x34, we saw no update to the core-to-core latencies.
Cache-to-DRAM Latency
This is another in-house test built by Andrei, which showcases the access latency at all the points in the cache hierarchy for a single core. We start at 2 KiB, and probe the latency all the way through to 256 MB, which for most CPUs sits inside the DRAM (before you start saying the 64-core TR has 256 MB of L3, it’s only 16 MB per CCX, so at 20 MB you are in DRAM).
Part of this test helps us understand the range of latencies for accessing a given level of cache, but also the transition between the cache levels gives insight into how different parts of the cache microarchitecture work, such as TLBs. As CPU microarchitects look at interesting and novel ways to design caches upon caches inside caches, this basic test proves to be very valuable.
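As a rough illustration of the technique (this is not Andrei’s harness), a latency test chases pointers through a random cyclic permutation so that every load depends on the one before it; the per-hop time then steps up as the working set spills out of each cache level. Pure Python adds a large constant of interpreter overhead per hop, so a real harness uses compiled code, but the cache-level transitions are still visible:

```python
import random
import time

def chase_latency(n_elements: int, hops: int = 1_000_000) -> float:
    """Average time per dependent load over a random cyclic permutation."""
    # Build a single cycle through all elements so the access pattern cannot
    # be predicted or usefully prefetched.
    order = list(range(n_elements))
    random.shuffle(order)
    chain = [0] * n_elements
    for a, b in zip(order, order[1:] + order[:1]):
        chain[a] = b
    idx = 0
    t0 = time.perf_counter_ns()
    for _ in range(hops):
        idx = chain[idx]              # each load depends on the previous one
    return (time.perf_counter_ns() - t0) / hops

for kib in (16, 256, 4096, 65536):    # roughly L1, L2, L3 and DRAM regions
    n = kib * 1024 // 8               # ~8 bytes per list slot (approximate)
    print(f"{kib:>6} KiB: {chase_latency(n):5.1f} ns/hop (incl. interpreter overhead)")
```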
Looking at the rough graph of the 11700K and the general boundaries of the cache hierarchies, we again see the changes of the microarchitecture that first debuted in Intel’s Sunny Cove cores, such as the increase of the L1D cache from 32 KB to 48 KB, as well as the doubling of the L2 cache from 256 KB to 512 KB.
The L3 cache on these parts looks to be unchanged from a capacity perspective, featuring the same 16 MB shared amongst the 8 cores of the chip.
On the DRAM side of things, we’re not seeing much change, albeit there is a small 2.1ns generational regression at the full random 128MB measurement point. We’re using identical RAM sticks at the same timings between the measurements here.
It should be noted that these slight regressions are also found across the cache hierarchy: although the new CPU is clocked slightly higher here, it shows worse absolute latency than its predecessor. It is also worth noting that AMD’s newest Zen 3 based designs showcase lower latency across the board.
With the new graph of the Core i7-11700K with microcode 0x34, the same cache structures are observed, however we are seeing better performance with L3.
The L1 cache structure is the same, and the L2 is of a similar latency. In our previous test, the L3 latency was 50.9 cycles, but with the new microcode is now at 45.1 cycles, and is now more in line with the L3 cache on Comet Lake.
Out at DRAM, our 128 MB point reduced from 82.4 nanoseconds to 72.8 nanoseconds, which is a 12% reduction, but not the +40% reduction that other media outlets are reporting as we feel our tools are more accurate. Similarly, for DRAM bandwidth, we are seeing a +12% memory bandwidth increase between 0x2C and 0x34, not the +50% bandwidth others are claiming. (BIOS 0x1B however, was significantly lower than this, resulting in a +50% bandwidth increase from 0x1B to 0x34.)
In the previous edition of our article, we questioned why the L3 latency showed a larger regression than estimated. With the updated microcode, the smaller difference is still a regression, but more in line with our expectations. We are waiting to hear back from Intel on what differences in the microcode prompted this change.
Frequency Ramping
Both AMD and Intel over the past few years have introduced features to their processors that speed up the time from when a CPU moves from idle into a high powered state. The effect of this means that users can get peak performance quicker, but the biggest knock-on effect for this is with battery life in mobile devices, especially if a system can turbo up quick and turbo down quick, ensuring that it stays in the lowest and most efficient power state for as long as possible.
Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.
One of the issues though with this technology is that sometimes the adjustments in frequency can be so fast, software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency in milliseconds (or seconds), then quick changes will be missed. Not only that, as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.
We wrote an extensive review analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.
We got around the issue by making the frequency probing the workload causing the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how well a system can get to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
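The principle is that the measurement itself is the workload: time many small, fixed chunks of work back-to-back from idle, and watch the chunk time shrink as the core ramps up. Below is a hedged sketch of that idea; our actual tool works at microsecond granularity in native code, whereas Python can only resolve the millisecond scale.

```python
import time

CHUNK = 50_000   # iterations of trivial integer work per sample

def spin(n: int) -> int:
    x = 0
    for i in range(n):
        x += i
    return x

time.sleep(3)                                     # let the core drop to idle clocks
t0 = time.perf_counter_ns()
samples = []
while time.perf_counter_ns() - t0 < 100_000_000:  # 100 ms capture window
    s = time.perf_counter_ns()
    spin(CHUNK)
    samples.append(((s - t0) / 1e6, (time.perf_counter_ns() - s) / 1e6))

# Chunk times start long (idle clocks) and shrink as the core boosts; the
# elapsed time at which they flatten approximates the ramp latency.
for elapsed_ms, chunk_ms in samples[:20]:
    print(f"t={elapsed_ms:7.2f} ms  chunk={chunk_ms:.3f} ms")
```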
Our ramp test shows a jump straight from 800 MHz up to 4900 MHz in around 17 milliseconds, or a frame at 60 Hz.
CPU Tests: Office and Science
Our previous set of ‘office’ benchmarks have often been a mix of science and synthetics, so this time we wanted to keep our office section purely on real world performance.
Agisoft Photoscan 1.3.3: link
The concept of Photoscan is about translating many 2D images into a 3D model - so the more detailed the images, and the more you have, the better the final 3D model in both spatial accuracy and texturing accuracy. The algorithm has four stages, with some parts of the stages being single-threaded and others multi-threaded, along with some cache/memory dependency in there as well. For some of the more variable threaded workload, features such as Speed Shift and XFR will be able to take advantage of CPU stalls or downtime, giving sizeable speedups on newer microarchitectures.
For the update to version 1.3.3, the Agisoft software now supports command line operation. Agisoft provided us with a set of new images for this version of the test, and a python script to run it. We’ve modified the script slightly by changing some quality settings for the sake of the benchmark suite length, as well as adjusting how the final timing data is recorded. The python script dumps the results file in the format of our choosing. For our test we obtain the time for each stage of the benchmark, as well as the overall time.
There is a small performance gain here in the real world test across the three generations of Intel processors, however it is still a step away from AMD.
Application Opening: GIMP 2.10.18
First up is a test using a monstrous multi-layered xcf file to load GIMP. While the file is only a single ‘image’, it has so many high-quality layers embedded it was taking north of 15 seconds to open and to gain control on the mid-range notebook I was using at the time.
What we test here is the first run - normally the first time a user loads the GIMP package from a fresh install, the system has to configure a few dozen files that remain optimized on subsequent openings. For our test we delete those configured optimized files in order to force a ‘fresh load’ each time the software is run. As it turns out, GIMP does these optimizations for every CPU thread in the system, which means that higher thread-count processors take longer to run this test.
We measure the time taken from calling the software to be opened, and until the software hands itself back over to the OS for user control. The test is repeated for a minimum of ten minutes or at least 15 loops, whichever comes first, with the first three results discarded.
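For the curious, the loop looks something like the sketch below. The paths and batch flags here are assumptions for illustration, and note that this simplified version times a console load to process exit, whereas our real harness detects when the window actually becomes responsive.

```python
import shutil
import subprocess
import time
from pathlib import Path

GIMP = r"C:\Program Files\GIMP 2\bin\gimp-console-2.10.exe"   # hypothetical install path
CONFIG = Path.home() / "AppData/Roaming/GIMP/2.10"            # per-user config cache
XCF = r"D:\bench\monster_layers.xcf"                          # hypothetical test file

def fresh_load() -> float:
    # Deleting the config directory forces GIMP to redo its first-run
    # optimization pass, which it performs for every CPU thread present.
    shutil.rmtree(CONFIG, ignore_errors=True)
    t0 = time.perf_counter()
    subprocess.run([GIMP, "-i", "-b", "(gimp-quit 0)", XCF], check=True)
    return time.perf_counter() - t0

times = [fresh_load() for _ in range(15)][3:]   # 15 loops, first three discarded
print(f"average fresh load: {sum(times) / len(times):.2f} s")
```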
The app initialization test here favors single core performance, and AMD wins despite the lower single thread frequency. The 9900KS has a slight advantage, being a guaranteed 5.0 GHz, but none of the improved IPC from the Cypress Cove seems to come into play here.
Science
In this version of our test suite, all the science focused tests that aren’t ‘simulation’ work are now in our science section. This includes Brownian Motion, calculating digits of Pi, molecular dynamics, and for the first time, we’re trialing an artificial intelligence benchmark, both inference and training, that works under Windows using python and TensorFlow. Where possible these benchmarks have been optimized with the latest in vector instructions, except for the AI test – we were told that while it uses Intel’s Math Kernel Libraries, they’re optimized more for Linux than for Windows, and so it gives an interesting result when unoptimized software is used.
3D Particle Movement v2.1: Non-AVX and AVX2/AVX512
This is the latest version of this benchmark designed to simulate semi-optimized scientific algorithms taken directly from my doctorate thesis. This involves randomly moving particles in a 3D space using a set of algorithms that define random movement. Version 2.1 improves over 2.0 by passing the main particle structs by reference rather than by value, and decreasing the amount of double->float->double recasts the compiler was adding in.
The initial version of v2.1 is a custom C++ binary of my own code, and flags are in place to allow for multiple loops of the code with a custom benchmark length. By default this version runs six times and outputs the average score to the console, which we capture with a redirection operator that writes to file.
For v2.1, we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software. This was done by a former Intel AVX-512 engineer who now works elsewhere. According to Jim Keller, there are only a couple dozen or so people who understand how to extract the best performance out of a CPU, and this guy is one of them. To keep things honest, AMD also has a copy of the code, but has not proposed any changes.
The 3DPM test is set to output millions of movements per second, rather than time to complete a fixed number of movements.
When AVX-512 comes into play, everyone else goes home. This is the easiest and clearest win for Intel.
y-Cruncher 0.78.9506: www.numberworld.org/y-cruncher
If you ask anyone what sort of computer holds the world record for calculating the most digits of pi, I can guarantee that a good portion of answers would point to some colossal supercomputer built into a mountain by a super-villain. Fortunately nothing could be further from the truth – the computer with the record is a quad-socket Ivy Bridge server with 300 TB of storage. The software that was run to get that record was y-cruncher.
Built by Alex Yee over the last decade and more, y-Cruncher is the software of choice for calculating billions and trillions of digits of the most popular mathematical constants. The software has held the world record for Pi since August 2010, and has broken the record a total of 7 times since. It also holds records for e, the Golden Ratio, and others. According to Alex, the program runs around 500,000 lines of code, and he has multiple binaries, each optimized for a different family of processors, such as Zen, Ice Lake, Skylake, all the way back to Nehalem, using the latest SSE/AVX2/AVX512 instructions where they fit in, and further optimized for how each core is built.
For our purposes, we’re calculating Pi, as it is more compute bound than memory bound. In ST and MT mode we calculate 250 million digits.
In ST mode, we are more dominated by the AVX-512 instructions, whereas in MT it becomes a mix of memory as well.
NAMD 2.13 (ApoA1): Molecular Dynamics
One of the popular science fields is modeling the dynamics of proteins. By looking at how the energy of active sites within a large protein structure changes over time, scientists can calculate the activation energies required for potential interactions. This becomes very important in drug discovery. Molecular dynamics also plays a large role in protein folding, and in understanding what happens when proteins misfold, and what can be done to prevent it. Two of the most popular molecular dynamics packages in use today are NAMD and GROMACS.
NAMD, or Nanoscale Molecular Dynamics, has already been used in extensive Coronavirus research on the Summit supercomputer. Typical simulations using the package are measured in how many nanoseconds per day can be calculated with the given hardware, and the ApoA1 protein (92,224 atoms) has been the standard model for molecular dynamics simulation.
Luckily the compute can home in on a typical ‘nanoseconds-per-day’ rate after only 60 seconds of simulation, however we stretch that out to 10 minutes to take a more sustained value, as by that time most turbo limits should be surpassed. The simulation itself works with 2 femtosecond timesteps. We use version 2.13 as this was the recommended version at the time of integrating this benchmark into our suite. The latest nightly builds we’re aware of have started to enable support for AVX-512, however for consistency in our benchmark suite we are staying with 2.13. Other software that we test with does have AVX-512 acceleration.
The 11700K shows some improvement over the previous generations of Intel, however it sits squarely in the middle, between the APU and the Zen 3.
AI Benchmark 0.1.2 using TensorFlow: Link
Finding an appropriate artificial intelligence benchmark for Windows has been a holy grail of mine for quite a while. The problem is that AI is such a fast moving, fast paced world that whatever I compute this quarter will no longer be relevant in the next, and one of the key metrics in this benchmarking suite is being able to keep data over a long period of time. We’ve had AI benchmarks on smartphones for a while, given that smartphones are a better target for AI workloads, but on PC almost everything is geared towards Linux.
Thankfully however, the good folks over at ETH Zurich in Switzerland have converted their smartphone AI benchmark into something that’s useable in Windows. It uses TensorFlow, and for our benchmark purposes we’ve locked our testing down to TensorFlow 2.1.0 and AI Benchmark 0.1.2, while using Python 3.7.6.
The benchmark runs through 19 different networks including MobileNet-V2, ResNet-V2, VGG-19 Super-Res, NVIDIA-SPADE, PSPNet, DeepLab, Pixel-RNN, and GNMT-Translation. All the tests probe both the inference and the training at various input sizes and batch sizes, except the translation that only does inference. It measures the time taken to do a given amount of work, and spits out a value at the end.
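For reference, the ai-benchmark package exposes a small Python API, so a run like ours can be kicked off in a few lines; a minimal sketch (version pins per our setup, result attribute details assumed):

```python
# pip install tensorflow==2.1.0 ai-benchmark
from ai_benchmark import AIBenchmark

benchmark = AIBenchmark()
results = benchmark.run()       # runs inference and training across the networks
# The package also exposes run_inference() and run_training() for half-runs;
# `results` carries the inference, training, and combined scores.
```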
There is one big caveat for all of this, however. Speaking with the folks over at ETH, they use Intel’s Math Kernel Libraries (MKL) for Windows, and they’re seeing some incredible drawbacks. I was told that MKL for Windows doesn’t play well with multiple threads, and as a result any Windows results are going to perform a lot worse than Linux results. On top of that, after a given number of threads (~16), MKL kind of gives up and performance drops off quite substantially.
So why test it at all? Firstly, because we need an AI benchmark, and a bad one is still better than not having one at all. Secondly, if MKL on Windows is the problem, then by publicizing the test, it might just put a boot somewhere for MKL to get fixed. To that end, we’ll stay with the benchmark as long as it remains feasible.
Every generation of Intel seems to regress with AI Benchmark, most likely due to MKL issues. I have previously identified the issue for Intel, however I have not heard of any progress to date.
CPU Tests: Simulation
Simulation and Science have a lot of overlap in the benchmarking world, however for this distinction we’re separating into two segments mostly based on the utility of the resulting data. The benchmarks that fall under Science have a distinct use for the data they output – in our Simulation section, these act more like synthetics but at some level are still trying to simulate a given environment.
DigiCortex v1.35: link
DigiCortex is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation, similar to a small slug.
The results on the output are given as a fraction of whether the system can simulate in real-time, so anything above a value of one is suitable for real-time work. The benchmark offers a 'no firing synapse' mode, which in essence detects DRAM and bus speed, however we take the firing mode which adds CPU work with every firing.
The software originally shipped with a benchmark that recorded the first few cycles and output a result. So while on fast multi-threaded processors the benchmark lasted less than a few seconds, slow dual-core processors could be running for almost an hour. There is also the issue of DigiCortex starting with a base neuron/synapse map in ‘off mode’, giving a high result in the first few cycles as none of the nodes are currently active. We found that the performance settles down into a steady state after a while (when the model is actively in use), so we asked the author to allow for a ‘warm-up’ phase and for the benchmark to be the average over a second sample time.
For our test, we give the benchmark 20000 cycles to warm up and then take the data over the next 10000 cycles – on a modern processor these take 30 seconds and 150 seconds respectively. This is then repeated a minimum of 10 times, with the first three results rejected. Results are shown as a multiple of real-time calculation.
AMD's single chiplet design seems to get a big win here, but DigiCortex can use AVX-512 so the 11700K gets a healthy boost over the previous generation.
Dwarf Fortress 0.44.12: Link
Another long standing request for our benchmark suite has been Dwarf Fortress, a popular management/roguelike indie video game, first launched in 2006 and still being regularly updated today, aiming for a Steam launch sometime in the future.
Emulating the ASCII interfaces of old, this title is a rather complex beast, which can generate environments subject to millennia of rule, famous faces, peasants, and key historical figures and events. The further you get into the game, depending on the size of the world, the slower it becomes as it has to simulate more famous people, more world events, and the natural way that humanoid creatures take over an environment. Like some kind of virus.
For our test we’re using DFMark. DFMark is a benchmark built by vorsgren on the Bay12Forums that gives two different modes built on DFHack: world generation and embark. These tests can be configured, but range anywhere from 3 minutes to several hours. After analyzing the test, we ended up going for three different world generation sizes:
- Small, a 65x65 world with 250 years, 10 civilizations and 4 megabeasts
- Medium, a 127x127 world with 550 years, 10 civilizations and 4 megabeasts
- Large, a 257x257 world with 550 years, 40 civilizations and 10 megabeasts
DFMark outputs the time to run any given test, so this is what we use for the output. We loop the small test as many times as possible in 10 minutes, the medium test as many times as possible in 30 minutes, and the large test as many times as possible in an hour.
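The looping itself is a simple fixed-time-budget harness; below is a sketch of the pattern (the dfmark command line is a placeholder, as the real tests are driven through DFHack):

```python
import subprocess
import time

def loop_for(cmd, budget_s):
    """Re-run `cmd` until the wall-clock budget expires; return per-run times."""
    times, start = [], time.perf_counter()
    while time.perf_counter() - start < budget_s:
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append(time.perf_counter() - t0)
    return times

# Placeholder invocations; the real tests are configured through DFHack.
small  = loop_for(["dfmark", "--worldgen", "small"],  10 * 60)
medium = loop_for(["dfmark", "--worldgen", "medium"], 30 * 60)
large  = loop_for(["dfmark", "--worldgen", "large"],  60 * 60)
```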
With the small worlds, the 11700K gets a small boost over previous Intel hardware, but this evens out as the worlds get bigger.
Dolphin v5.0 Emulation: Link
Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in seconds, where the Wii itself scores 1051 seconds.
CPU Tests: Rendering
Rendering tests, compared to others, are often a little more simple to digest and automate. All the tests put out some sort of score or time, usually in an obtainable way that makes it fairly easy to extract. These tests are some of the most strenuous on our list, due to the highly threaded nature of rendering and ray-tracing, and can draw a lot of power. If a system is not properly configured to deal with the thermal requirements of the processor, the rendering benchmarks are where it would show most easily as the frequency drops over a sustained period of time. Most benchmarks in this case are re-run several times, and the key to this is having an appropriate idle/wait time between benchmarks to allow temperatures to normalize from the last test.
Blender 2.83 LTS: Link
One of the popular tools for rendering is Blender, with it being a public open source project that anyone in the animation industry can get involved in. This extends to conferences, use in films and VR, with a dedicated Blender Institute, and everything you might expect from a professional software package (except perhaps a professional grade support package). With it being open-source, studios can customize it in as many ways as they need to get the results they require. It ends up being a big optimization target for both Intel and AMD in this regard.
For benchmarking purposes, we fell back to rendering a single frame from a detailed project. Most reviews, as we have done in the past, focus on one of the classic Blender renders, known as BMW_27. It can take anywhere from a few minutes to almost an hour on a regular system. However now that Blender has moved onto a Long Term Support (LTS) model with the latest 2.83 release, we decided to go for something different.
We use this scene, called PartyTug at 6AM by Ian Hubert, which is the official image of Blender 2.83. It is 44.3 MB in size, and uses some of the more modern compute properties of Blender. As it is more complex than the BMW scene, but uses different aspects of the compute model, time to process is roughly similar to before. We loop the scene for at least 10 minutes, taking the average time of the completions. Blender offers a command-line tool for batch commands, and we redirect the output into a text file.
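Blender's documented batch switches make this straightforward to automate; a minimal sketch of the loop (the scene filename is assumed):

```python
import subprocess
import time

# -b: run in background with no UI; -f 1: render frame 1 of the scene.
cmd = ["blender", "-b", "partytug_6am.blend", "-f", "1"]   # filename assumed

runs, start = [], time.perf_counter()
while time.perf_counter() - start < 10 * 60:               # loop for at least 10 minutes
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    runs.append(time.perf_counter() - t0)
print(f"average frame time: {sum(runs) / len(runs):.1f} s over {len(runs)} runs")
```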
A marginal win for Intel in Blender is a good result, coming in a few percentage points ahead of the 5.0 GHz 9900KS.
Corona 1.3: Link
Corona is billed as a popular high-performance photorealistic rendering engine for 3ds Max, with development for Cinema 4D support as well. In order to promote the software, the developers produced a downloadable benchmark on the 1.3 version of the software, with a ray-traced scene involving a military vehicle and a lot of foliage. The software does multiple passes, calculating the scene, geometry, preconditioning and rendering, with performance measured in the time to finish the benchmark (the official metric used on their website) or in rays per second (the metric we use to offer a more linear scale).
The standard benchmark provided by Corona is interface driven: the scene is calculated and displayed in front of the user, with the ability to upload the result to their online database. We got in contact with the developers, who provided us with a non-interface version that allowed for command-line entry and retrieval of the results very easily. We loop around the benchmark five times, waiting 60 seconds between each, and taking an overall average. The time to run this benchmark can be around 10 minutes on a Core i9, up to over an hour on a quad-core 2014 AMD processor or dual-core Pentium.
Crysis CPU-Only Gameplay
One of the most oft used memes in computer gaming is ‘Can It Run Crysis?’. The original 2007 game, built on Crytek's CryEngine, was heralded as a computationally complex title for the hardware at the time and several years after, suggesting that a user needed graphics hardware from the future in order to run it. Fast forward over a decade, and the game runs fairly easily on modern GPUs.
But can we also apply the same concept to pure CPU rendering? Can a CPU, on its own, render Crysis? Since 64 core processors entered the market, one can dream. So we built a benchmark to see whether the hardware can.
For this test, we’re running Crysis’ own GPU benchmark, but in CPU render mode. This is a 2000 frame test, with medium and low settings.
POV-Ray 3.7.1: Link
A long time benchmark staple, POV-Ray is another rendering program that is well known to load up every single thread in a system, regardless of cache and memory levels. After a long period of POV-Ray 3.7 being the latest official release, when AMD launched Ryzen the POV-Ray codebase suddenly saw a range of activity from both AMD and Intel, knowing that the software (with the built-in benchmark) would be an optimization tool for the hardware.
We had to stick a flag in the sand when it came to selecting the version that was fair to both AMD and Intel, and still relevant to end-users. Version 3.7.1 fixes a significant bug in the early 2017 code related to a write-after-read pattern advised against in both Intel and AMD optimization manuals, leading to a nice performance boost.
The benchmark can take over 20 minutes on a slow system with few cores, around a minute or two on a fast system, or seconds on a dual high-core-count EPYC. Because POV-Ray draws a large amount of power and current, it is important to make sure the cooling is sufficient here and that the system stays in its high-power state. Using a motherboard with poor power delivery and low airflow could skew the CPU positioning in a way that isn't obvious, if the power limit only causes a 100 MHz drop as it changes P-states.
V-Ray: Link
We have a couple of renderers and ray tracers in our suite already, however V-Ray’s benchmark was requested often enough for us to roll it into our suite. Built by ChaosGroup, V-Ray is a 3D rendering package compatible with a number of popular commercial imaging applications, such as 3ds Max, Maya, Unreal, Cinema 4D, and Blender.
We run the standard standalone benchmark application, but in an automated fashion to pull out the result in the form of kilosamples/second. We run the test six times and take an average of the valid results.
Cinebench R20: Link
Another common staple of a benchmark suite is Cinebench. Based on Cinema4D, Cinebench is a purpose-built benchmark that renders a scene with both single and multi-threaded options. The scene is identical in both cases. The R20 version means that it targets Cinema 4D R20, a slightly older version of the software which is currently on version R21. Cinebench R20 was launched given that the R15 version had been out a long time, and despite the difference between the benchmark and the latest version of the software on which it is based, Cinebench results are often quoted a lot in marketing materials.
Results for Cinebench R20 are not comparable to R15 or older, because the scene being used is different, and the code path has also been updated. The results are output as a score from the software, which is directly proportional to the time taken. Using the benchmark flags for single-CPU and multi-CPU workloads, we run the software from the command line, which opens the test, runs it, and dumps the result into the console, which is redirected to a text file. The test is repeated for a minimum of 10 minutes for both ST and MT, and then the runs averaged.
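The flags in question are the ones commonly documented for automating Cinebench; treat the exact flag names, path, and console output format below as assumptions rather than gospel:

```python
import re
import subprocess

EXE = r"C:\Cinebench R20\Cinebench.exe"      # hypothetical install path

def run_cb(flag: str) -> float:
    # g_CinebenchCpu1Test=true runs the single-thread pass and
    # g_CinebenchCpuXTest=true the all-thread pass; the score is printed
    # to the console as 'CB <value>' (format assumed).
    out = subprocess.run([EXE, flag], capture_output=True, text=True).stdout
    return float(re.search(r"CB (\d+(?:\.\d+)?)", out).group(1))

st = run_cb("g_CinebenchCpu1Test=true")
mt = run_cb("g_CinebenchCpuXTest=true")
print(f"1T: {st:.0f}  nT: {mt:.0f}")
```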
The improvement in Cinebench R20 is a good measure over previous generations of Intel. However mobile Tiger Lake scores 593 at 28 W, still ahead of the 11700K, and they are all behind AMD.
CPU Tests: Encoding
One of the interesting elements on modern processors is encoding performance. This covers two main areas: encryption/decryption for secure data transfer, and video transcoding from one video format to another.
In the encrypt/decrypt scenario, how data is transferred and by what mechanism is pertinent to on-the-fly encryption of sensitive data - a process that more modern devices lean on for software security.
Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, such as providing the optimum video for devices before consumption, or for game streamers who are wanting to upload the output from their video camera in real-time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.
HandBrake 1.32: Link
Video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. The first consideration is the standard in which the video is encoded, which can be lossless or lossy, trading performance for file size, quality for file size, or all of the above. Alongside Google's favorite codecs, VP9 and AV1, there are others that are prominent: H264, the older codec, is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H.265) aims to provide the same quality as H264 but at a lower file size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content. There are other codecs coming to market designed for specific use cases all the time.
Handbrake is a favored tool for transcoding, with the later versions using copious amounts of newer APIs to take advantage of co-processors, like GPUs. It is available on Windows via an interface or can be accessed through the command-line, with the latter making our testing easier, with a redirection operator for the console output.
We take a 1080p30 h264 copy of this 16-minute YouTube video about Russian CPUs and convert it into three different files: (1) 480p30 ‘Discord’, (2) 720p30 ‘YouTube’, and (3) 4K60 HEVC.
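HandBrake's command-line build takes the same work as the GUI; here is a sketch of the three conversions using HandBrakeCLI's standard flags, with placeholder filenames and quality settings rather than our exact encode parameters:

```python
import subprocess
import time

SRC = "russian_cpus_1080p30.mp4"    # placeholder name for the source video

jobs = [
    # (encoder, quality, width, height, fps, output) - settings illustrative only
    ("x264", "24", "854",  "480",  "30", "discord_480p30.mp4"),
    ("x264", "22", "1280", "720",  "30", "youtube_720p30.mp4"),
    ("x265", "20", "3840", "2160", "60", "archive_4k60_hevc.mp4"),
]
for enc, q, w, h, fps, out in jobs:
    t0 = time.perf_counter()
    subprocess.run(["HandBrakeCLI", "-i", SRC, "-o", out,
                    "-e", enc, "-q", q, "-w", w, "-l", h, "-r", fps], check=True)
    print(f"{out}: {time.perf_counter() - t0:.1f} s")
```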
Up to the final 4K60 HEVC, in CPU-only mode, the Intel CPU puts up some good gen-on-gen numbers.
7-Zip 1900: Link
The first compression benchmark tool we use is the open-source 7-zip, which typically offers good scaling across multiple cores. 7-zip is the compression tool most cited by readers as one they would rather see benchmarks on, and the program includes a built-in benchmark tool for both compression and decompression.
The tool can either be run from inside the software or through the command line. We take the latter route as it is easier to automate, obtain results, and put through our process. The command line flags available offer an option for repeated runs, and the output provides the average automatically through the console. We direct this output into a text file and regex the required values for compression, decompression, and a combined score.
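The built-in benchmark is invoked as `7z b`; a sketch of how the console output can be captured and parsed (the exact output layout varies between versions, so the regexes are assumptions):

```python
import re
import subprocess

out = subprocess.run(["7z", "b"], capture_output=True, text=True).stdout

# 7-Zip prints per-pass compression/decompression ratings followed by
# summary lines ('Avr:' and 'Tot:'); we grab those summary lines here.
for label in ("Avr:", "Tot:"):
    m = re.search(rf"{label}.*", out)
    print(m.group(0) if m else f"no '{label}' line found")
```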
An increase over the previous generation, but AMD has a 25% lead.
AES Encoding
Algorithms using AES coding have spread far and wide as a ubiquitous tool for encryption. Again, this is another CPU limited test, and modern CPUs have special AES pathways to accelerate their performance. We often see scaling in both frequency and cores with this benchmark. We use the latest version of TrueCrypt and run its benchmark mode over 1GB of in-DRAM data. Results shown are the GB/s average of encryption and decryption.
WinRAR 5.90: Link
For the 2020 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode as we did with 7-Zip, here we take a set of files representative of a generic stack:
- 33 video files, each 30 seconds, totaling 1.37 GB,
- 2834 smaller website files in 370 folders, totaling 150 MB,
- 100 Beat Saber music tracks and input files, totaling 451 MB
This is a mixture of compressible and incompressible formats. The results shown are the time taken to encode the file. Due to DRAM caching, we run the test for 20 minutes and take the average of the last five runs, when the benchmark is in a steady state.
For automation, we use AHK’s internal timing tools from initiating the workload until the window closes signifying the end. This means the results are contained within AHK, with an average of the last 5 results being easy enough to calculate.
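Our harness does this through AutoHotKey, but the idea reduces to wall-clock timing around each archiving run. A rough Python equivalent using WinRAR's command-line 'rar' tool is sketched below; the archive and folder paths are hypothetical placeholders.

import os
import statistics
import subprocess
import time

timings = []
for _ in range(20):
    if os.path.exists("test.rar"):
        os.remove("test.rar")  # start each run from scratch
    start = time.perf_counter()
    # 'a' adds files to an archive; paths here are hypothetical placeholders
    subprocess.run(["rar", "a", "test.rar", "testset"], check=True)
    timings.append(time.perf_counter() - start)

# Steady-state result: average of the last five runs, as described above
print("WinRAR encode time: %.2f s" % statistics.mean(timings[-5:]))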
CPU Tests: Legacy and Web
In order to gather data to compare with older benchmarks, we are still keeping a number of tests under our ‘legacy’ section. This includes all the former major versions of CineBench (R15, R11.5, R10) as well as x264 HD 3.0 and the first, very naïve, version of 3DPM v2.1. We won’t be transferring the data over from the old testing into Bench, as otherwise it would be populated with 200 CPUs that each have only one data point; instead this section will fill up as we test more CPUs, like the others.
The other section here is our web tests.
Web Tests: Kraken, Octane, and Speedometer
Benchmarking using web tools is always a bit difficult. Browsers change almost daily, and the way the web is used changes even quicker. While there is some scope for advanced computational based benchmarks, most users care about responsiveness, which requires a strong back-end to work quickly to provide on the front-end. The benchmarks we chose for our web tests are essentially industry standards – at least once upon a time.
It should be noted that for each test, the browser is closed and re-opened anew with a fresh cache. We use a fixed Chromium version for our tests with the update capabilities removed to ensure consistency.
Mozilla Kraken 1.1
Kraken is a 2010 benchmark from Mozilla and does a series of JavaScript tests. These tests are a little more involved than previous tests, looking at artificial intelligence, audio manipulation, image manipulation, json parsing, and cryptographic functions. The benchmark starts with an initial download of data for the audio and imaging, and then runs through 10 times giving a timed result.
We loop through the 10-run test four times (so that’s a total of 40 runs), and average the four end results. The result is given as time to complete the test, and we’re reaching a slow asymptotic limit with regard to the highest-IPC processors.
Google Octane 2.0
Our second test is also JavaScript based, but uses a lot more variation of newer JS techniques, such as object-oriented programming, kernel simulation, object creation/destruction, garbage collection, array manipulations, compiler latency and code execution.
Octane was developed after the discontinuation of other tests, with the goal of being more web-like than previous tests. It has been a popular benchmark, making it an obvious target for optimizations in the JavaScript engines. Ultimately it was retired in early 2017 due to this, although it is still widely used as a tool to determine general CPU performance in a number of web tasks.
Speedometer 2: JavaScript Frameworks
Our newest web test is Speedometer 2, which is a test over a series of JavaScript frameworks to do three simple things: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.
Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark’s internal metrics.
We repeat the benchmark for a dozen loops, taking the average of the last five.
Legacy Tests
CPU Tests: SPEC
Page by Andrei Frumusanu
SPEC2017 is a series of standardized tests used to probe the overall performance between different systems, different architectures, different microarchitectures, and setups. The code has to be compiled, and then the results can be submitted to an online database for comparison. It covers a range of integer and floating point workloads, and can be heavily optimized for each CPU, so it is important to check how the benchmarks are being compiled and run.
We run the tests in a harness built through Windows Subsystem for Linux, developed by our own Andrei Frumusanu. WSL has some odd quirks, with one test not running due to a WSL fixed stack size, but for like-for-like testing it is good enough. Because our scores aren’t official submissions, as per SPEC guidelines we have to declare them as internal estimates on our part.
For compilers, we use LLVM for both the C/C++ and Fortran tests, with Fortran handled by the Flang compiler. The rationale for using LLVM over GCC is better cross-platform comparisons to platforms that only have LLVM support, and future articles where we’ll investigate this aspect more. We’re not considering closed-source compilers such as MSVC or ICC.
clang version 10.0.0
clang version 7.0.1 (ssh://git@github.com/flang-compiler/flang-driver.git 24bd54da5c41af04838bbe7b68f830840d47fc03)
-Ofast -fomit-frame-pointer
-march=x86-64
-mtune=core-avx2
-mfma -mavx -mavx2
Our compiler flags are straightforward, with a basic -Ofast and the relevant ISA switches to allow for AVX2 instructions. We decided to build our SPEC binaries with AVX2, which makes Haswell the limit of how old we can go before the testing falls over. This also means we don’t have AVX-512 binaries, primarily because in order to get the best performance, the AVX-512 intrinsics should be packed by a proper expert, as with our AVX-512 benchmark. All of the major vendors – AMD, Intel, and Arm – support the way in which we are testing SPEC.
To note, the requirements of the SPEC licence state that any benchmark results from SPEC have to be labeled ‘estimated’ until they are verified on the SPEC website as a meaningful representation of the expected performance. This is most often done by the big companies and OEMs to showcase performance to customers, however it is quite over the top for what we do as reviewers.
For the new Cypress Cove based i7-11700K, we haven’t had quite the time to investigate the new AVX-512 instruction differences – since this is the first consumer desktop socketed CPU with the new ISA extensions it’s something we’ll revisit in the full review. Based on our testing on the server core counterparts however, it doesn’t make any noticeable differences in SPEC.
In the SPECint2017 suite, we’re seeing the new i7-11700K able to surpass its desktop predecessors across the board in terms of performance. The biggest performance leap is found in 523.xalancbmk which consists of XML processing at a large +54.4% leap versus the 10700K.
The rest of the improvements fall in the +0% to +15% range, with an average total geomean advantage of +15.5% versus the 10700K. The IPC advantage should be in the +18.5% range.
In the FP scores, there’s nothing standing out too much, with generally even improvements across the board. The total improvement here is +19.6%, with the IPC improvement in the +22% range.
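For readers wondering how a performance uplift converts into an IPC figure: IPC is performance per clock, so the performance ratio is divided by the frequency ratio. A minimal worked example is below, assuming the 10700K's 5.1 GHz peak single-thread clock against the 11700K's 5.0 GHz; real effective clocks under load differ, which is why this back-of-the-envelope number does not land exactly on the figures above.

# IPC ratio = performance ratio / frequency ratio
# Assumed peak single-thread clocks: 5.1 GHz (10700K) vs 5.0 GHz (11700K);
# measured effective frequencies under load will differ somewhat.
perf_ratio = 1.155            # +15.5% SPECint2017 geomean vs the 10700K
freq_ratio = 5.0 / 5.1        # 11700K clock divided by 10700K clock
ipc_ratio = perf_ratio / freq_ratio
print(f"Estimated IPC uplift: {ipc_ratio - 1:+.1%}")  # roughly +18%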
Although the new Cypress Cove cores in the 11700K show good generational IPC improvements, those gains are measured against quite an old predecessor, meaning that for single-thread performance the advancements aren’t quite enough to keep up with the latest Zen 3 competition from AMD, or for that matter, the Firestorm cores in Apple’s new M1.
More interesting are the multi-threaded SPEC results. Here, the new generation from Intel showcases a +5.8% (integer) and +16.2% (floating point) performance improvement over its direct predecessor. Given the power draw increases we’ve seen this generation, those are rather unimpressive results, and actually represent a perf/W regression. AMD’s current 6-core 5600X is actually very near to the new 11700K, while consuming a fraction of the power.
Gaming Tests: Deus Ex Mankind Divided
Deus Ex is a franchise with a wide level of popularity. Despite the Deus Ex: Mankind Divided (DEMD) version being released in 2016, it has often been heralded as a game that taxes the CPU. It uses the Dawn Engine to create a very complex first-person action game with science-fiction based weapons and interfaces. The game combines first-person, stealth, and role-playing elements, with the game set in Prague, dealing with themes of transhumanism, conspiracy theories, and a cyberpunk future. The game allows the player to select their own path (stealth, gun-toting maniac) and offers multiple solutions to its puzzles.
DEMD has an in-game benchmark, an on-rails look around an environment showcasing some of the game’s most stunning effects, such as lighting, texturing, and others. Even in 2020, it’s still an impressive graphical showcase when everything is jumped up to the max. For this title, we are testing the following resolutions:
- 600p Low, 1440p Low, 4K Low, 1080p Max
The benchmark runs for about 90 seconds. We do as many runs within 10 minutes per resolution/setting combination, and then take averages and percentiles.
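For all the gaming tests, the averages and 95th percentiles come from the per-frame frame time data. A minimal sketch of that reduction is below; the sample values are invented purely for illustration.

def summarize(frame_times_ms):
    # Time-weighted average FPS: total frames divided by total seconds
    avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
    # The 95th percentile frame time corresponds to the slowest 5% of frames
    p95_ms = sorted(frame_times_ms)[int(0.95 * len(frame_times_ms))]
    return avg_fps, 1000.0 / p95_ms

# Invented frame times (in ms) purely to show the reduction
avg, p95 = summarize([16.7, 16.9, 18.2, 33.1, 16.5, 17.0])
print(f"Average: {avg:.1f} FPS, 95th percentile: {p95:.1f} FPS")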
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
DEMD is often considered a CPU-limited title, and the 11700K beating the older Intel CPUs at the low resolution, low quality setting confirms that. But as we ramp up the resolution and the quality, the 11700K falls behind ever so slightly in both averages and percentiles.
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Final Fantasy XIV
Despite being one number less than Final Fantasy 15, because FF14 is a massively-multiplayer online title, there are always yearly update packages which give the opportunity for graphical updates too. In 2019, FFXIV launched its Shadowbringers expansion, and an official standalone benchmark was released at the same time for users to understand what level of performance they could expect. Much like the FF15 benchmark we’ve been using for a while, this test is a long 7-minute scene of simulated gameplay within the title. There are a number of interesting graphical features, and it certainly looks more like a 2019 title than a 2010 release, which is when FF14 first came out.
With this being a standalone benchmark, we do not have to worry about updates, and the idea behind these sorts of tests for end-users is to keep the code base consistent. For our testing suite, we are using the following settings:
- 768p Minimum, 1440p Minimum, 4K Minimum, 1080p Maximum
As with the other benchmarks, we do as many runs as we can until 10 minutes per resolution/setting combination has passed, and then take averages. Realistically, because of the length of this test, this equates to two runs per setting.
[Charts: Average FPS at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
As the resolution increases, the 11700K seemed to get a better average frame rate, but with the quality increased, it falls back down again, coming in behind the older Intel CPUs.
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Final Fantasy XV
Upon arriving on PC, Final Fantasy XV: Windows Edition was given a graphical overhaul as it was ported over from console. As a fantasy RPG with a long history, the fruits of Square-Enix’s successful partnership with NVIDIA are on display. The game uses the internal Luminous Engine, and as with other Final Fantasy games, pushes the imagination of what we can do with the hardware underneath us. To that end, FFXV was one of the first games to promote the use of ‘video game landscape photography’, due in part to the extensive detail even at long range but also with the integration of NVIDIA’s Ansel software, that allowed for super-resolution imagery and post-processing effects to be applied.
In preparation for the launch of the game, Square Enix opted to release a standalone benchmark. Using the Final Fantasy XV standalone benchmark gives us a lengthy standardized sequence to record, although it should be noted that its heavy use of NVIDIA technology means that the Maximum setting has problems - it renders items off screen. To get around this, we use the standard preset which does not have these issues. We use the following settings:
- 720p Standard, 1080p Standard, 4K Standard, 8K Standard
For automation, the title accepts command line inputs for both resolution and settings, and then auto-quits when finished. As with the other benchmarks, we do as many runs as we can until 10 minutes per resolution/setting combination has passed, and then take averages. Realistically, because of the length of this test, this equates to two runs per setting.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
In more CPU limited scenarios, the 11700K shows generational improvements over other Intel processors, but as the resolution or quality increases, we end up being GPU limited and all the CPUs even out.
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: World of Tanks
Albeit different from most of the other commonly played MMO, or massively multiplayer online, games, World of Tanks is set in the mid-20th century and allows players to take control of a range of military-based armored vehicles. World of Tanks (WoT) is developed and published by Wargaming, who are based in Belarus, with the game’s soundtrack primarily composed by Belarusian composer Sergey Khmelevsky. The game offers multiple entry points, including a free-to-play element as well as allowing players to pay a fee to open up more features. One of the most interesting things about this tank-based MMO is that it achieved esports status when it debuted at the World Cyber Games back in 2012.
World of Tanks enCore is a demo application for its new graphics engine penned by the Wargaming development team. Over time the new core engine has been implemented into the full game, upgrading the game’s visuals with key elements such as improved water, flora, shadows, and lighting, as well as other objects such as buildings. The World of Tanks enCore demo app not only offers up insight into the impending game engine changes, but allows users to check system performance to see if the new engine runs optimally on their system. There is technically a Ray Tracing version of the enCore benchmark now available, however because it can’t be deployed standalone without the installer, we decided against using it. If that gets fixed, then we can look into it.
The benchmark tool comes with a number of presets:
- 768p Minimum, 1080p Standard, 1080p Max, 4K Max (not a preset)
The odd one out is the 4K Max preset, because the benchmark doesn’t automatically have a 4K option – to get this we edit the acceptable resolutions ini file, and then we can select 4K. The benchmark outputs its own results file, with frame times, making it very easy to parse the data needed for average and percentiles.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
WoT is a fun test for seeing 700+ FPS numbers with the best CPUs. However, the differences between the CPUs end up being minor, and in absolute terms the 11700K still has issues, often sitting at the lower end of the results.
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Borderlands 3
As a big Borderlands fan, having to sit and wait six months for the Epic Store exclusivity to expire before we saw it on Steam felt like a long time to wait. The fourth title of the franchise, if you exclude the Telltale-style games, BL3 expands the universe beyond Pandora and its orbit, with the set of heroes (plus those from previous games) now cruising the galaxy looking for vaults and the treasures within. Popular characters like Tiny Tina, Claptrap, Lilith, Dr. Zed, Zer0, Tannis, and others all make appearances as the game continues its cel-shaded design but with the graphical fidelity turned up. Borderlands 1 gave me my first ever taste of proper in-game second-order PhysX, and it’s a high standard that continues to this day.
BL3 works best with online access, so it is filed under our online games section. BL3 is also one of our biggest downloads, requiring 100+ GB. As BL3 supports resolution scaling, we are using the following settings:
- 360p Very Low, 1440p Very Low, 4K Very Low, 1080p Badass
BL3 has its own in-game benchmark, which recreates a set of on-rails scenes with a variety of activity going on in each, such as shootouts, explosions, and wildlife. The benchmark outputs its own results files, including frame times, which can be parsed for our averages/percentile data.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
With the 9900K sitting at 5.0 GHz, the fact that the 11700K only does single core 5.0 GHz shouldn't matter if the IPC gains on the core help push the needle. Unfortunately, it doesn't seem to do much in Borderlands.
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: F1 2019
The F1 racing games from Codemasters have been popular benchmarks in the tech community, mostly for ease-of-use and that they seem to take advantage of any area of a machine that might be better than another. The 2019 edition of the game features all 21 circuits on the calendar for that year, and includes a range of retro models and DLC focusing on the careers of Alain Prost and Ayrton Senna. Built on the EGO Engine 3.0, the game has been criticized similarly to most annual sports games, for not offering enough season-to-season graphical fidelity updates to make investing in the latest title worth it, however the 2019 edition revamps the Career mode, with features such as in-season driver swaps coming into the mix. The quality of the graphics this time around is also superb, even at 4K low or 1080p Ultra.
For our test, we put Alex Albon in the Red Bull in position #20, for a dry two-lap race around Austin. We test at the following settings:
- 768p Ultra Low, 1440p Ultra Low, 4K Ultra Low, 1080p Ultra
In terms of automation, F1 2019 has an in-game benchmark that can be called from the command line, and the output file has frame times. We repeat each resolution setting for a minimum of 10 minutes, taking the averages and percentiles.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
The Ego engine is usually a good bet where cores, IPC, and frequency matter. Despite this, the 11700K isn't showing much of a generational improvement.
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Far Cry 5
The fifth title in Ubisoft's Far Cry series lands us right into the unwelcoming arms of an armed militant cult in Montana, one of the many middles-of-nowhere in the United States. With a charismatic and enigmatic adversary, gorgeous landscapes of the northwestern American flavor, and lots of violence, it is classic Far Cry fare. Graphically intensive in an open-world environment, the game mixes in action and exploration with a lot of configurability.
Unfortunately, the game doesn’t like us changing the resolution in the settings file when using certain monitors, resorting to 1080p but keeping the quality settings. But resolution scaling does work, so we decided to fix the resolution at 1080p and use a variety of different scaling factors to give the following:
- 720p Low, 1440p Low, 4K Low, 1440p Max.
Far Cry 5 outputs a results file, but the file is an HTML file that showcases a graph of the FPS detected. At no point does the HTML file contain the frame times for each frame, but it does show the frames per second, as one value per second in the graph. The graph in HTML form is a series of (x,y) co-ordinates scaled to the min/max of the graph, rather than the raw (second, FPS) data, and so using regex I carefully tease out the values of the graph, convert them into a (second, FPS) format, and take our values of averages and percentiles that way.
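A sketch of that rescaling step is below. The markup is Ubisoft's and we are not reproducing it exactly; the coordinate pattern, and the assumption that the graph's min/max FPS bounds and run duration are known, are both illustrative.

import re

def rescale_fc5_graph(html, fps_min, fps_max, duration_s):
    # Pull the scaled (x, y) pairs out of the HTML; this pattern is
    # illustrative - the real markup differs and needs its own regex.
    pts = [(float(x), float(y)) for x, y in
           re.findall(r"\(([\d.]+),\s*([\d.]+)\)", html)]
    x_lo, x_hi = min(p[0] for p in pts), max(p[0] for p in pts)
    y_lo, y_hi = min(p[1] for p in pts), max(p[1] for p in pts)
    # Linearly map graph space back to (second, FPS); if the graph's y axis
    # runs downward (screen co-ordinates), the FPS interpolation must be flipped
    return [(duration_s * (x - x_lo) / (x_hi - x_lo),
             fps_min + (fps_max - fps_min) * (y - y_lo) / (y_hi - y_lo))
            for x, y in pts]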
If anyone from Ubisoft wants to chat about building a benchmark platform that would help not only me, but also every other member of the tech press, build benchmark testing suites to help our readers decide what the best hardware for their games is, please reach out to ian@anandtech.com. Some of the suggestions I want to give will take less than half a day to implement, and it’s easily free advertising for the benchmark to be used over the next couple of years (or more).
As with the other gaming tests, we run each resolution/setting combination for a minimum of 10 minutes and take the relevant frame data for averages and percentiles.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Gears Tactics
Remembering the original Gears of War brings back a number of memories – some good, and some involving online gameplay. The latest iteration of the franchise was launched as I was putting this benchmark suite together, and Gears Tactics is a high-fidelity turn-based strategy game with an extensive single player mode. As with a lot of turn-based games, there is ample opportunity to crank up the visual effects, and here the developers have put a lot of effort into creating effects, a number of which seem to be CPU limited.
Gears Tactics has an in-game benchmark, roughly 2.5 minutes of AI gameplay starting from the same position but using a random seed for actions. Much like the racing games, this usually leads to some variation in the run-to-run data, so for this benchmark we take the geometric mean of the results. One of the biggest things Gears Tactics can do is resolution scaling, supporting up to 8K, and so we are testing the following settings:
- 720p Low, 4K Low, 8K Low, 1080p Ultra
For results, the game showcases a mountain of data when the benchmark is finished, such as how much the benchmark was CPU limited and where, however none of that is ever exported into a file we can use. It’s just a screenshot which we have to read manually.
If anyone from the Gears Tactics team wants to chat about building a benchmark platform that would help not only me, but also every other member of the tech press, build benchmark testing suites to help our readers decide what the best hardware for their games is, please reach out to ian@anandtech.com. Some of the suggestions I want to give will take less than half a day to implement, and it’s easily free advertising for the benchmark to be used over the next couple of years (or more).
As with the other benchmarks, we do as many runs as we can until 10 minutes per resolution/setting combination has passed. For this benchmark, we manually read each of the screenshots for each quality/setting/run combination. The benchmark also gives 95th percentiles and frame averages, so we can use both of these data points.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
Gears is the one test where, at our 1080p Maximum settings, the 11700K shines ahead of the pack. At high resolution, low quality, all five CPUs are essentially equal, though the 11700K still sits behind AMD's Ryzen APU.
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: GTA 5
The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA helping to optimize the title. At this point GTA V is super old, but still super useful as a benchmark – it is a complicated test with many features that modern titles today still struggle with. With rumors of a GTA 6 on the horizon, I hope Rockstar make that benchmark as easy to use as this one is.
GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.
We are using the following settings:
- 720p Low, 1440p Low, 4K Low, 1080p Max
The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data. The benchmark can also be called from the command line, making it very easy to use.
There is one funny caveat with GTA. If the CPU is too slow, or has too few cores, the benchmark loads, but it doesn’t have enough time to put items in the correct position. As a result, when running our single-core Sandy Bridge system for example, the jet ends up stuck in the middle of an intersection, causing a traffic jam. Unfortunately this means the benchmark never ends, but it is still amusing.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Red Dead Redemption 2
It’s great to have another Rockstar benchmark in the mix, and the launch of Red Dead Redemption 2 (RDR2) on the PC gives us a chance to do that. Building on the success of the original RDR, the second incarnation came to Steam in December 2019 having been released on consoles first. The PC version takes the open-world cowboy genre into the start of the modern age, with a wide array of impressive graphics and features that are eerily close to reality.
For RDR2, Rockstar kept the same benchmark philosophy as with Grand Theft Auto V, with the benchmark consisting of several cut scenes with different weather and lighting effects, and a final scene focusing on an on-rails environment – only this time a shop robbery leads to a shootout on horseback before riding over a bridge into the great unknown. Luckily most of the command line options from GTA V are present here, and the game also supports resolution scaling. We have the following tests:
- 384p Minimum, 1440p Minimum, 8K Minimum, 1080p Max
For that 8K setting, I originally thought I had the settings file at 4K and 1.0x scaling, but it was actually set at 2.0x giving that 8K. For the sake of it, I decided to keep the 8K settings.
For our results, we run through each resolution and setting configuration for a minimum of 10 minutes, before averaging and parsing the frame time data.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Strange Brigade
Strange Brigade is set in 1903 Egypt, and follows a story which is very similar to that of the Mummy film franchise. This particular third-person shooter is developed by Rebellion Developments, which is more widely known for games such as the Sniper Elite and Alien vs Predator series. The game follows the hunt for Seteki the Witch Queen, who has arisen once again, and the only ‘troop’ who can ultimately stop her. Gameplay is cooperative-centric with a wide variety of levels and many puzzles which need solving by the British colonial Secret Service agents sent to put an end to her reign of barbarism and brutality.
The game supports both the DirectX 12 and Vulkan APIs, and so we test on both. It houses its own built-in benchmark as an on-rails experience through the game. For quality, the game offers various options for customization, including textures, anti-aliasing, reflections, and draw distance, and even allows users to enable or disable motion blur, ambient occlusion, and tessellation, among others.
- 720p Low, 1440p Low, 4K Low, 1080p Ultra
The automation for Strange Brigade is one of the easiest in our suite – the settings and quality can be changed by pre-prepared .ini files, and the benchmark is called via the command line. The output includes all the frame time data.
[Charts: Average FPS and 95th Percentile at Low Resolution Low Quality, Medium Resolution Low Quality, High Resolution Low Quality, and Medium Resolution Max Quality]
All of our benchmark results can also be found in our benchmark engine, Bench.
Conclusion: The War of Attrition
(These numbers have been updated with the later 0x34 microcode)
Intel’s desktop product teams have had their hands in a bind for a while. The goal was always to migrate from 14nm to 10nm when the intersection of performance, cost, and power hit the sweet spot. This was originally expected to happen after 2017, with the launch after Kaby Lake, but we are here in 2021 and this still hasn’t happened. Intel’s 10nm manufacturing process is unable to scale to the right level of frequency, power, and cost that is needed for an effective desktop processor.
Now Intel is full of smart people – alongside the manufacturing team, the internal microarchitecture teams building the next generation cores would have already been 3-5 years ahead in their design cycle, waiting to deploy the best ideas when the manufacturing was ready. However, with a plug in the pipeline and no way to easily patch it, Intel had to decide what to do in a worst case scenario – what if 10nm is never ready?
The first part of that answer is in our hands today. Intel took its 10nm Sunny Cove core design (and Xe integrated graphics) and rebuilt it from the ground up for 14nm. This sounds arduous – all the solutions used to get things working on 10nm have to be rethought, and new issues with timing and signal integrity have to be solved. To signify its difference, the backported core is called Cypress Cove. These engineers are no doubt frustrated that they had core designs on the table, ready to go on 10nm, but had to re-draw them in a different style, where they are bigger and more power hungry, just to get something out of the door. That different style is Rocket Lake, and specifically the Core i7-11700K we have tested today.
Improvements for Desktop, Sort Of
Rocket Lake brings to the table a big core design with new features such as AVX-512 and PCIe 4.0. The core is so big that in order to keep die size, yield, and costs similar to the previous generation, the final design only has eight cores rather than ten. This would appear to be a 20% regression in absolute performance, however Intel is promoting a +19% average performance gain, evening it all out, while also providing the new features listed above. That +19% also should apply to single thread situations, enabling faster single user response time.
To validate Intel’s claims here, we ran our industry standard benchmarks, such as SPEC, and compared the i7-10700K to the i7-11700K. Through this testing, we can confirm that Intel is correct on that +19% claim, however that isn’t an overall performance uplift, and there’s a big asterisk next to that number.
All workloads at their core, even when browsing the web or word processing, can be split into integer (whole numbers, most workloads) and floating point (numbers with decimal places, workloads with math). In our testing, we saw the following:
- Single thread floating point: +22%
- Multi-thread floating point: +16.2%
Sounds great, right?
- Single thread integer: +18.5%
- Multi-thread integer: +5.8%
Oh. While Intel’s claim of +19% is technically correct, it only seems to apply to math-heavy workloads or single thread integer workloads. The benefits to non-math-based throughput are still better than the previous generation, but only by 5.8% when multithreaded. Very rarely do Intel’s big claims come with such an easily identifiable asterisk.
When we look at our real-world data, in almost every benchmark the 11700K either matches or beats the 10700K, and showcases the IPC gain in tests like Dolphin, Blender, POV-Ray, Agisoft, Handbrake, web tests, and obviously SPECfp. It scores a big win in our 3DPM AVX test, because it has AVX-512 and none of the other CPUs do.
A Comment on Gaming
Users looking at our gaming results will undoubtedly be disappointed. The improvements Intel has made to its processor seem to do very little in our gaming tests, and in a lot of cases, we see performance regressions rather than improvements. If Intel is promoting +19% IPC, then why is gaming not seeing similar jumps?
Normally with gaming we might look to the structural latency comparison to see where some uplifts might come.
The biggest change in the cache hierarchy is the L3, which now sits at ~45-46 cycles rather than 42-43 cycles. When we originally tested with the 0x2C microcode it was 51 cycles, which was considered a lot for a core backported from Ice Lake/Sunny Cove; Intel’s updates have since reduced it to around 45-46 cycles, improving gaming performance a little, though this is still a slight regression and more in line with what we expected. Core-to-core latency (regardless of microcode) still shows 28-30 nanoseconds on most cores, rather than the 18-24 observed on Comet Lake. Overall, there is an effective latency decrease from having larger caches, but this hasn’t translated into gaming performance along with the increase in IPC.
But Margins, Power, and Temperature
Moving into this review, users that have followed Intel’s desktop platform know that sustained power modes on the high-core count models are a lot higher than the number on the box suggests. This isn’t just limited to the overclockable processors, like in our i9-10850K review where we saw 260 W, but even the i7-10700 rated at 65 W would push 200 W, especially in motherboards that ignored recommended turbo limits (which is practically every consumer gaming motherboard).
The migration of Sunny Cove cores, already known for being power hungry, onto an older process node, and then bundling AVX-512 in the mix, has had a number of enthusiasts concerned for how Intel would approach power consumption. Based on our testing today, the simple answer is to offer a blessing to the deity of your choice for a good CPU. Our Core i7-11700K is rated at 125 W. But in practice for a mild AVX2 workload we saw 225 W of power consumption and a temperature of 81ºC, while a general workload was around 130-155 W at 60ºC.
The danger is that during our testing, the power peaked at an eye-catching 292 W on one of our test systems. This was during an all-core AVX-512 workload, automatically set at 4.6 GHz, and the CPU hit 104ºC momentarily. A second motherboard, running new firmware, only peaked at 276 W, running at 4.4-4.6 GHz, but still saw 103ºC before reducing in power to 225 W.
For the first motherboard on the 0x2C microcode, there’s no indication that the frequency reduced when hitting this temperature, and our cooler is easily sufficient for the thermal load, which means that on some level we might be hitting the thermal density limits of wide mathematics processing on Intel’s 14nm. In order to keep temperatures down, new cooling methods have to be used, regardless of motherboard or microcode.
I noted that Intel has reduced the air gap inside the CPU package, with the whole z-height reduced from 4.48 mm to 4.36 mm. It’s a small change, meaning less material for thermal energy to transfer through, improving cooling.
Users looking to overclock on these processors are going to have to implement a strong AVX-512 offset here.
A Rock(et) and A Hard Place, But The Only Option Available
Rocket Lake is the product of an idea to backport a design, and ensures that the popular market segment of consumer processors is closer to the leading edge of Intel’s design, despite the unavailability of Intel’s latest process node to desktop-class hardware.
Going forward, Intel has (in not so many words) committed to a less rigid philosophy than in the past – use the right design on the right process node, rather than tying the two together. Rocket Lake is arguably the first product coming from that philosophy, despite being a product that came about after the core was designed in the first place. Intel will measure its success as an initial yardstick for similar endeavors in the future. And it will succeed, for reasons external to Intel.
Our results clearly show that Intel’s performance, while substantial, still trails its main competitor, AMD. In a core-for-core comparison, Intel is slightly slower and a lot more inefficient. The smart money would be to get the AMD processor. However, due to high demand and prioritizing commercial and enterprise contracts, the only parts readily available on retail shelves right now are from Intel. Any user looking to buy or build a PC today has to dodge, duck, dip, dive and dodge their way to find one for sale, and also hope that it is not at a vastly inflated price. The less stressful solution would be to buy Intel, and use Intel’s latest platform in Rocket Lake.
Normally this is the point where I’d conclude with a comment on what to recommend. But the clear answer during this chip crunch is to buy the processor you can find at a reasonable price. We don’t have official pricing on Rocket Lake just yet, but if a retailer was happy to sell units before the official launch, then perhaps there will be a sufficient number out there to go around.
Official details of Rocket Lake will be posted when our NDA on that information expires. Official retail of Rocket Lake will commence on March 30th.