CPU Tests: Legacy and Web

To allow comparison with older benchmark data, we still keep a number of tests under our ‘legacy’ section. This includes all the former major versions of CineBench (R15, R11.5, R10) as well as x264 HD 3.0 and the first very naïve version of 3DPM v2.1. We won’t be transferring the data over from the old testing into Bench, otherwise it would be populated with 200 CPUs with only one data point each; instead, as with the other tests, it will fill up as we test more CPUs.

The other section here is our web tests.

Web Tests: Kraken, Octane, and Speedometer

Benchmarking using web tools is always a bit difficult. Browsers change almost daily, and the way the web is used changes even more quickly. While there is some scope for advanced, computationally heavy benchmarks, most users care about responsiveness, which requires a strong back-end working quickly to deliver on the front-end. The benchmarks we chose for our web tests are essentially industry standards – or at least they were once upon a time.

It should be noted that for each test, the browser is closed and re-opened anew with a fresh cache. We use a fixed Chromium version for our tests with the update capabilities removed to ensure consistency.
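As a rough illustration of the fresh-cache idea, the sketch below builds a launch command that points Chromium at a brand-new, empty profile directory for every run. This is only a sketch under assumptions, not our actual harness: the function names, the binary path and the URL are hypothetical, and `--user-data-dir` and `--no-first-run` are simply standard Chromium command-line switches chosen for the example.

```python
import tempfile
from typing import List


def fresh_profile_command(chromium_path: str, url: str, profile_dir: str) -> List[str]:
    """Build an argv list that starts Chromium on the given profile directory."""
    return [
        chromium_path,
        f"--user-data-dir={profile_dir}",  # empty directory, so no cache is carried over
        "--no-first-run",                  # skip first-run dialogs on a new profile
        url,
    ]


def command_for_new_run(chromium_path: str, url: str) -> List[str]:
    """Create a new empty profile directory and return the launch command for it."""
    # Hypothetical helper: one temporary profile per test run.
    profile_dir = tempfile.mkdtemp(prefix="bench-profile-")
    return fresh_profile_command(chromium_path, url, profile_dir)
```

The returned list could be handed to something like `subprocess.run`, with the browser closed again before the next test starts.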

Mozilla Kraken 1.1

Kraken is a 2010 benchmark from Mozilla that runs a series of JavaScript tests. These tests are a little more involved than previous tests, looking at artificial intelligence, audio manipulation, image manipulation, JSON parsing, and cryptographic functions. The benchmark starts with an initial download of data for the audio and imaging, and then runs through 10 times, giving a timed result.

We loop through the 10-run test four times (so that’s a total of 40 runs), and average the four end-results. The result is given as time to complete the test, and we’re slowly approaching an asymptotic limit with regard to the highest-IPC processors.
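The arithmetic is simple enough to sketch. The snippet below assumes each 10-run loop has already produced a single end-result in milliseconds; the function name and the sample timings are made up for illustration and are not measured data.

```python
from statistics import mean
from typing import Sequence


def kraken_final_time(loop_results_ms: Sequence[float]) -> float:
    """Average the end-results of the four 10-run loops (lower is better)."""
    if len(loop_results_ms) != 4:
        raise ValueError("expected the end-results of exactly four loops")
    return mean(loop_results_ms)


# Illustrative numbers only, one end-result per loop.
example_loops_ms = [800.0, 810.0, 790.0, 800.0]
final_ms = kraken_final_time(example_loops_ms)
```

Because the score is a time, a lower average means a faster processor.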

(7-1) Kraken 1.1 Web Test

Google Octane 2.0

Our second test is also JavaScript based, but covers a wider variety of newer JS techniques, such as object-oriented programming, kernel simulation, object creation/destruction, garbage collection, array manipulations, compiler latency and code execution.

Octane was developed after the discontinuation of other tests, with the goal of being more web-like than previous tests. It has been a popular benchmark, making it an obvious target for optimizations in the JavaScript engines. Ultimately it was retired in early 2017 due to this, although it is still widely used as a tool to determine general CPU performance in a number of web tasks.

(7-2) Google Octane 2.0 Web Test

Speedometer 2: JavaScript Frameworks

Our newest web test is Speedometer 2, which runs a series of JavaScript frameworks through three simple tasks: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.

Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark’s internal metrics.

We repeat the benchmark for a dozen loops, taking the average of the last five.
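A minimal sketch of that reduction follows, assuming each loop yields one score. The function name, parameter names and sample scores are hypothetical, and the snippet takes no position on why the earlier loops are left out of the average.

```python
from statistics import mean
from typing import Sequence


def speedometer_final_score(
    loop_scores: Sequence[float],
    expected_loops: int = 12,
    keep_last: int = 5,
) -> float:
    """Average only the last `keep_last` scores out of `expected_loops` loops."""
    if len(loop_scores) != expected_loops:
        raise ValueError(f"expected {expected_loops} loop scores")
    return mean(loop_scores[-keep_last:])


# Illustrative scores only (higher is better); the first seven are ignored.
example_scores = [90, 95, 98, 100, 100, 100, 100, 100, 102, 98, 101, 99]
final_score = speedometer_final_score(example_scores)
```

Only the tail of the list contributes, so an unusually low first loop has no effect on the reported figure.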

(7-3) Speedometer 2.0 Web Test

Legacy Tests

(6-5a) x264 HD 3.0 Pass 1

(6-5b) x264 HD 3.0 Pass 2

(6-4a) 3DPM v1 ST

(6-4b) 3DPM v1 MT

(6-3a) CineBench R15 ST

(6-3b) CineBench R15 MT


98 Comments


  • Spunjji - Friday, July 16, 2021 - link

    Having seen how modern processors behave with insufficient cooling, Threska's right that it won't get "fried", but you're correct to infer that it would result in unpredictably sub-optimal performance.

    Anecdotally, I had a friend with a Sandy Bridge system with a cooling issue that he only noticed when he bought a new GPU and ran 3DMark and got unexpectedly low results. The "cooling issue" was that the stock heatsink wasn't even making contact with the CPU heat-spreader; he'd been gaming with the system for 3 years by that point. 😬
  • serpretetsky - Friday, July 16, 2021 - link

    I had to do some thermal shutdown testing on some consumer intel cpu. I forgot which one. Maybe i5/i7 8000 series?

    With server CPUs this was usually pretty easy, remove fan, and wait for shutdown. With the consumer CPU it kept running. So I completely removed the heatsink, the thing simply downclocked to 800 MHz, and continued running happily with no heatsink. Booted to linux, ran everything great, and no heatsink (actually once it booted to linux I think it even started clocking back up once in a while). I had to get a hot-air soldering gun to heat it up till shutdown.
  • mode_13h - Saturday, July 17, 2021 - link

    5-10 years ago, there was a heatsink gasket where you have to get near 100 degrees C to melt the material so it fuses with the heatsink and CPU. I forget the name, but I'm wondering if it's even possible to do that any more.
  • skaurus - Wednesday, July 14, 2021 - link

    That's great analysis.
  • Threska - Wednesday, July 14, 2021 - link

    It would be nice to see how these MBs do with VFIO since that has considerations most users don't.
  • mode_13h - Wednesday, July 14, 2021 - link

    Ian, is the source code for your 3DPM benchmark published anywhere? If not, it would be nice if we could see it and compare the AVX2 path with the AVX-512 one. Also, maybe someone could add support for ARM NEON or SVE.
  • techguymaxc - Wednesday, July 14, 2021 - link

    I'm slightly confused by the concluding remarks.

    "Performance between Threadripper Pro and Threadripper came in three stages. Either (a) the results between similar processors was practically identical, (b) Threadripper beat TR Pro by a small margin due to slightly higher frequencies, or (c) TR Pro thrashed Threadripper due to memory bandwidth availability. That last point, (c), only really kicks in for the 32c and 64c processors it should be noted. Our 16c TR Pro had the same memory bandwidth results as TR, most likely due to only having two chiplets in its design."

    A and B are observable, but C only proves true in synthetic benchmarks (and Pi calculation). Is there a real-world use-case for the additional memory bandwidth, outside of calculating Pi?
  • Blastdoor - Wednesday, July 14, 2021 - link

    The advantage shows up with multi-threaded SPEC. SPEC is essentially a composite of a suite of real-world tasks. I guess you could call it 'synthetic' due to it being a composite, but the individual tasks don't strike me as 'synthetic.' For example, here's a description of namd: https://www.spec.org/cpu2017/Docs/benchmarks/508.n...
  • techguymaxc - Wednesday, July 14, 2021 - link

    Thanks for that info. It would be nice to see the breakdown of individual test results from the SPEC suite.
  • arashi - Saturday, July 17, 2021 - link

    Bench
