One question when building or upgrading a gaming system is which CPU to choose: does it matter if I have a quad core from Intel, or a quad module from AMD? Perhaps something simpler will do the trick, and I can spend the difference on the GPU. What if you are running a multi-GPU setup, does the CPU have a bigger effect? This was the question I set out to help answer.

A few things before we start:

This set of results is by no means extensive or exhaustive. For the sake of expediency I could not select 10 different gaming titles across a variety of engines and then test them in seven or more different configurations per game and per CPU, nor could I test every different CPU made. As a result, on the gaming side, I limited myself to one resolution, one set of settings, and four very regular testing titles that offer time demos: Metro 2033, DiRT 3, Civilization V and Sleeping Dogs. This is obviously not Skyrim, Battlefield 3, Crysis 3 or Far Cry 3, which may be more relevant in your set up.

The arguments for and against time demo testing, as well as the arguments for taking FRAPS values of sequences, are well documented (time demos might not be representative vs. the consistency and realism of FRAPSing a repeated run across a field). However, all of our tests can be run on home systems to get a feel for how a system performs. Below is a discussion regarding AI, one of the common uses for a CPU in a game, and how it affects the system. Out of our benchmarks, DiRT 3 plays a game, including AI in the result, and the turn-based Civilization V has no concern for direct AI except for time between turns.

All this combines with my unique position as the motherboard senior editor here at AnandTech – the position gives me access to a wide variety of motherboard chipsets, lane allocations and a fair number of CPUs. GPUs are not necessarily in large supply on my side of the reviewing area, but ASUS and ECS have provided my test beds with HD7970s and GTX580s respectively, and these have been essential parts of my test bed for 12 and 21 months. The task set before me in this review would be almost a career in itself if we were to expand to more GPUs and more multi-GPU setups. Thus testing up to 4x 7970 and up to 2x GTX 580 is a more than reasonable place to start.

Where It All Began

The most important point to note is how this set of results came to pass. Several months ago I came across a few sets of testing by other review websites that floored me – simple CPU comparison tests for gaming which were spreading like wildfire among the forums, and some results contradicted the general prevailing opinion on the topic. These results were pulling all sorts of lurking forum users out of the woodwork to have an opinion, and being the well-adjusted scientist I am, I set forth to confirm the results were, at least in part, valid.

What came next was a shock – some had no real explanation of the hardware setups. While the basic overview of hardware was supplied, there was no run down of settings used, and no attempt to justify the findings which had obviously caused quite a stir. Needless to say, I was stunned by the lack of verbose testing, as well as by both the results and a lot of the conversation that followed, particularly from avid fans of Team Blue and Team Red. I planned to right this wrong the best way I know how – with science!

The other reason for pulling together the results in this article is perhaps the one I originally started with – the need to update drivers every so often. Since the Ivy Bridge release, I have been using Catalyst 12.3 and GeForce 296.10 WHQL on my test beds. This causes problems – older drivers are not optimized, readers sometimes complain if older drivers are used, and new games cannot be added to the test bed because they might not scale correctly due to the older drivers. So while there are some reviews on the internet that update drivers between testing and keep the old numbers (leading to skewed results), taking time out to retest a number of platforms for more data points solely on the new drivers is a large undertaking.

For example, testing new drivers over six platforms (CPU/motherboard combinations) would mean: six platforms, four games, seven different GPU configurations, ~10 minutes per test plus 2+ hours to set up each platform and install a new OS/drivers/set up benchmarks. That makes 40+ hours of solid testing (if all goes without a second lost here or there), or just over a full working week – more if I also test the CPU performance for a computational benchmark update, or exponentially more if I include multiple resolutions and setting options.
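As a rough sanity check on that estimate, the arithmetic can be sketched in a few lines of Python. The variable names are mine, and the setup time is assumed to be exactly two hours per platform, which is on the low side given the "2+ hours" quoted above.

```python
# Rough time budget for a full driver retest, using the figures quoted above.
platforms = 6                  # CPU/motherboard combinations
games = 4
gpu_configs = 7
minutes_per_test = 10          # approximate, per game per GPU configuration
setup_hours_per_platform = 2   # assumption: lower bound of "2+ hours"

tests = platforms * games * gpu_configs
benchmark_hours = tests * minutes_per_test / 60
setup_hours = platforms * setup_hours_per_platform
total_hours = benchmark_hours + setup_hours

print(f"{tests} tests: {benchmark_hours:.0f} h benchmarking "
      f"+ {setup_hours} h setup = {total_hours:.0f} h")
```

Adding a second resolution or settings preset multiplies the benchmarking term, which is why the total grows so quickly once more options are included.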

If this is all that is worked on that week, it means no new content – so it happens rarely, perhaps once a year or before a big launch. This time was now, and when I started this testing, I was moving to Catalyst 13.1 and GeForce 310.90, which by the time this review goes live will have already been superseded! In reality, I have been slowly working on this data set for the best part of 10 weeks while also reviewing other hardware (but keeping those reviews with consistent driver comparisons). In total this review encapsulates 24 different CPU setups, with up to 6 different GPU configurations, meaning 430 data points, 1375 benchmark loops and over 51 hours in just GPU benchmarks alone, without considering setup time or driver issues.

What Does the CPU do in a Game?

A lot of game developers use customized versions of game engines, such as the EGO engine for driving games or the Unreal engine. The engine provides the underpinnings for a lot of the code, and the optimizations therein. The engine also decides what in the game gets offloaded onto the GPU.

Imagine the code that makes up the game as a linear sequence of events. In order to go through the game quickly, we need the fastest single core processor available. Of course, games are not like this – lots of the game can be parallelized, such as vector calculations for graphics. These were of course the first to be moved from CPU to the GPU. Over time, more parts of the code have made the move – physics and compute being the main features in recent months and years.

The GPU is good at independent, simple tasks – calculating which color is in which pixel is an example of this, along with additional processing and post-processing features (FXAA and so on). If a task is linear, it lives on the CPU, such as loading textures into memory or negotiating which data to transfer between the memory and the GPUs. The CPU also takes control of independent complex tasks, as the CPU is the one that can perform complicated logic analysis.

Very few parts of a game come under this heading of ‘independent yet complex’. Anything suitable for the GPU but not ported over will be here, and the big one usually quoted is artificial intelligence. Deciding where an NPC is going to run, shoot or fly could be considered a very complex set of calculations, ideal for fast CPUs. The counter argument is that games have had complex AI for years – the number of times I personally was destroyed by a Dark Sim on Perfect Dark on the N64 is testament to either my uselessness or the fact that complex AI can be configured with not much CPU power. AI is unlikely to be a limiting factor in frame rates due to CPU usage.

What is most likely going to be the limiting factor is how the CPU can manage data. As engines evolve, they try to move data between the CPU, memory and GPUs less often – if textures can be kept on the GPU, then they will stay there. But some engines are not as perfect as we would like them to be, resulting in the CPU as the limiting factor. As CPU performance increases, and those that write the engines in which games are made understand the ecosystem, CPU performance should be less of an issue over time. All roads point towards the PS4 of course, and its 8-core Jaguar processor. Is this all that is needed for a single GPU, albeit in an HSA environment?

Multi-GPU Testing

Another angle I wanted to test beyond most other websites is multi-GPU. There is content online dealing mostly with single GPU setups, with a few for dual GPU. Even though the number of multi-GPU users is actually quite small globally, the enthusiast markets are clearly geared for it. We get motherboards with support for four GPU cards; we have cases that will support a dual processor board as well as four double-height GPUs. Then there are GPUs being released with two sets of silicon on a PCB, wrapped in a double or triple width cooler.

More often than not on a forum, people will ask ‘what GPU for $xxx’ and some of the suggestions will be towards two GPUs at half the budget, as it commonly offers more performance than a single GPU if the game and the drivers all work smoothly (at the cost of power, heat, and bad driver scenarios). The ecosystem supports multi-GPU setups, so I felt it right to test at least one four-way setup. Although with great power comes great responsibility – there was no point testing 4-way 7970s on 1080p.

Typically in this price bracket, users will go for multi-monitor setups, along the lines of 5760x1080, or big monitor setups like 1440p, 1600p, or the mega-rich might try 4K. Ultimately the high end enthusiast, with cash to burn, is going to gravitate towards 4K, and I cannot wait until that becomes a reality. So for a median point in all of this, we are testing at 1440p and maximum settings. This will put the strain on our Core 2 Duo and Celeron G465 samples, but should be easy pickings for our multi-processor, multi-GPU beast of a machine.

A Minor Problem In Interpreting Results

Throughout testing for this review, there were clearly going to be some issues to consider. Chief among these is the question of consistency, in particular what to do if something like Metro 2033 decides to have an ‘easy’ run which reports 3% higher than normal. For that specific example we get around this by double testing, as the easy run typically appears in the first batch – so we run two or three batches of four and disregard the first batch.
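The batching approach can be sketched as follows. This is only an illustration of the idea, not the actual tooling used for the review: the function name is hypothetical and the FPS values are made up.

```python
from statistics import mean

def average_after_warmup(batches, warmup_batches=1):
    """Average FPS over all runs, ignoring the first `warmup_batches` batches."""
    kept = [fps for batch in batches[warmup_batches:] for fps in batch]
    if not kept:
        raise ValueError("no runs left after discarding warm-up batches")
    return mean(kept)

# Made-up FPS values, purely for illustration
batches = [
    [41.2, 40.1, 40.0, 39.9],  # first batch, containing an 'easy' run
    [40.0, 40.1, 39.9, 40.0],  # second batch, the one that is kept
]

print(average_after_warmup(batches))                    # first batch dropped
print(average_after_warmup(batches, warmup_batches=0))  # everything included
```

Comparing the two printed values shows how much a single easy run in the first batch can pull the reported average upwards.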

The other, perhaps bigger, issue is interpreting results. If I get 40.0 FPS on a Phenom II X4-960T, 40.1 FPS on an i5-2500K, and then 40.2 FPS on a Phenom II X2-555 BE, does that make the results invalid? The important points to recognize here are statistics and system state.

System State: We have all had times booting a PC when it feels sluggish, but this sluggish behavior disappears on reboot. The same thing can occur with testing, and usually happens as a result of bad initialization or a bad cache optimization routine at boot time. As a result, we try and spot these circumstances and re-run. With more time we would take 100 different measurements of each benchmark, with reboots, and cross out the outliers. Time constraints outside of academia unfortunately do not give us this opportunity.

Statistics: System state aside, frame rate values will often fluctuate around an average. This will mean (depending on the benchmark) that the result could be +/- a few percentage points on each run. So what happens if you have a run of four time demos, and each of them is 2% above the ‘average’ FPS? From the outside, as you will not know the true average, you cannot say if it is valid as the data set is extremely small. If we take more runs, we can find the variance (in the technical sense of the term), the standard deviation, and perhaps represent the mean, median and mode of a set of results.
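For readers who want to try this on their own runs, here is a minimal sketch using Python's standard `statistics` module. The `summarize` helper and the four FPS values are hypothetical, and the variance shown is the sample variance (dividing by n - 1), which is the usual choice when the true average is unknown.

```python
import statistics

def summarize(fps_runs):
    """Basic summary statistics for a list of per-run FPS results."""
    return {
        "n": len(fps_runs),
        "mean": statistics.mean(fps_runs),
        "median": statistics.median(fps_runs),
        "variance": statistics.variance(fps_runs),  # sample variance (n - 1)
        "stdev": statistics.stdev(fps_runs),        # sample standard deviation
    }

# Made-up results from four time demo runs
runs = [40.8, 41.0, 40.7, 40.9]
for name, value in summarize(runs).items():
    print(f"{name}: {value}")
```

With only four runs the standard deviation itself is poorly constrained, which is the point being made above about small data sets.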

As always, the main constraint in articles like these is time – the quicker to publish, the less testing, the larger the error bars and the higher likelihood that some results are going to be skewed because it just so happened to be a good/bad benchmark run. So the example given above of the X2-555 getting a better result is down to interpretation – each result might be +/- 0.5 FPS on average, and because they are all pretty similar we are actually more GPU limited. So it is more whether the GPU has a good/bad run in this circumstance.

For this example, I batched 100 runs of my common WinRAR test in motherboard testing, on an i5-2500K CPU with a Maximus V Formula. Results varied between 71 seconds and 74 seconds, with a large gravitation towards the lower end. To represent this statistically, we normally use a histogram, which separates the results into ‘bins’ (e.g. 71.00 seconds to 71.25 seconds), with the bin size setting how finely the final result is resolved. Here is an initial representation of the data (time vs. run number), and a few histograms of that data, using a bin size of 1.00 s, 0.75 s, 0.5 s, 0.33 s, 0.25 s and 0.1 s.


As we get down to the lower bin sizes, there is a pair of large groupings of results between ~71 seconds and ~72 seconds. The overall average/mean of the data is 71.88 seconds due to the outliers around 74 seconds, with the median at 72.04 seconds and a standard deviation of 0.660 seconds. What is the right value to report? Overall average? Peak? Average +/- standard deviation? With the results very skewed around two values, what happens if I do 1-3 runs and get ~71 seconds and none around ~72 seconds?
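The effect of bin size is easy to reproduce at home. The sketch below bins a handful of made-up timings (not the 100 WinRAR results above, which are not reproduced here) at several widths. The `histogram` function is a hypothetical helper written for this illustration.

```python
import math

def histogram(times, bin_width, start):
    """Count results per bin.

    Bin k covers [start + k * bin_width, start + (k + 1) * bin_width).
    """
    counts = {}
    for t in times:
        k = math.floor((t - start) / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))

# Made-up timings in seconds, loosely clustered near 71 s and 72 s
sample_times = [71.1, 71.2, 71.15, 71.3, 71.9, 72.05, 72.1, 74.0]

for width in (1.0, 0.5, 0.25):
    print(f"bin size {width} s:", histogram(sample_times, width, start=71.0))
```

Wide bins lump the two clusters together, while narrower bins separate them at the cost of fewer results per bin, so the choice of bin size changes what the data appears to say.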

Statistics is clearly a large field, and without a large sample size, most numbers can be one-off results that are not truly reflective of the data. It is important to ask yourself every time you read a review with a result – how many data points went into that final value, and what analysis was performed?

For this review, we typically take four runs of our GPU tests each, except Civilization V which is extremely consistent +/- 0.1 FPS. The result reported is the average of those four values, minus any results we feel are inconsistent. At times runs have been repeated in order to confirm the value, but this will not be noted in the results.

The Bulldozer Challenge

Another purpose of this article was to tackle the problem surrounding Bulldozer and its derivatives, such as Piledriver and thus all Trinity APUs. The architecture is such that Windows 7, by default, does not accurately assign new threads to new modules – the ‘freshly installed’ stance is to double up on threads per module before moving to the next. By installing a pair of Windows Updates (which do not show in Windows Update automatically), we get an effect called ‘core parking’, which assigns each of the first series of threads to its own module, giving each thread access to a pair of INT units and an FP unit, rather than having pairs of threads competing for the prize. This affects variable threaded loading the most, particularly from 2 to 2N-2 threads where N is the number of modules in the CPU (thus 2 to 6 threads in an FX-8150). It should come as no surprise that games fall into this category, so we want to test with and without the core parking updates in our benchmarks.
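To see where the 2 to 2N-2 range comes from, here is a deliberately simplified model of the two placement strategies. It is a toy, not a description of how the Windows scheduler actually works: it assumes threads are placed once, in order, with at most two threads per module, and the function names are hypothetical.

```python
def threads_per_module(n_threads, modules, spread):
    """Toy model: how many threads land on each module.

    spread=False doubles up on a module before moving to the next;
    spread=True gives each module one thread before doubling up.
    """
    if not 0 <= n_threads <= 2 * modules:
        raise ValueError("model allows at most two threads per module")
    counts = [0] * modules
    for i in range(n_threads):
        index = i % modules if spread else i // 2
        counts[index] += 1
    return counts

def shared_modules(counts):
    """Number of modules where two threads compete for shared resources."""
    return sum(1 for c in counts if c == 2)

MODULES = 4  # an FX-8150 has four modules
for n in range(1, 2 * MODULES + 1):
    packed = threads_per_module(n, MODULES, spread=False)
    spread = threads_per_module(n, MODULES, spread=True)
    print(n, packed, spread, shared_modules(packed) - shared_modules(spread))
```

In this model the last column is non-zero only for 2 to 6 threads, matching the 2 to 2N-2 range quoted above: with one thread, or with seven or eight, both strategies end up sharing the same number of modules.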

Hurdles with NVIDIA and 3-Way SLI on Ivy Bridge

Users who have been keeping up to date with motherboard options on Z77 will understand that there are several ways to put three PCIe slots onto a motherboard. The majority of sub-$250 motherboards will use three PCIe slots in a PCIe 3.0 x8/x8 + PCIe 2.0 x4 arrangement (meaning x8/x8 from the CPU and x4 from the chipset), allowing either two-way SLI or three-way Crossfire. Some motherboards will use a different Ivy Bridge lane allocation option such that we have a PCIe 3.0 x8/x4/x4 layout, giving three-way Crossfire but only two-way SLI. In fact in this arrangement, fitting the final x4 with a sound/RAID card disables two-way SLI entirely.

This is due to a not widely publicized requirement of SLI – it needs at least an x8 lane allocation in order to work (either PCIe 2.0 or 3.0). Anything less than this on any GPU and you will be denied in the software. So putting in that third card will cause the second lane to drop to x4, disabling two-way SLI. There are motherboards that have a switch to change to x8/x8 + x4 in this scenario, but we are still capped at two-way SLI.

The only way to go onto 3-way or 4-way SLI is via a PLX 8747 enabled motherboard, which greatly increases the cost of a motherboard build. This should be kept in mind when dealing with the final results.
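The x8 rule described above can be written as a one-line check. The sketch below models only that single requirement, as stated in the text. The function name and layout labels are hypothetical, and it ignores every other condition a real SLI setup has to meet.

```python
MIN_SLI_LANES = 8  # minimum lanes per GPU, per the requirement described above

def sli_allowed(gpu_lanes):
    """True if there are two or more GPUs and each has at least x8 lanes."""
    return len(gpu_lanes) >= 2 and all(
        lanes >= MIN_SLI_LANES for lanes in gpu_lanes
    )

# Lanes available to each installed GPU in a few of the layouts discussed
layouts = {
    "two cards at x8/x8": [8, 8],
    "three cards at x8/x4/x4": [8, 4, 4],
    "three cards at x8/x8 + x4 from the chipset": [8, 8, 4],
    "three cards at x16/x8/x8 via PLX 8747": [16, 8, 8],
}

for name, lanes in layouts.items():
    print(f"{name}: SLI across all cards allowed = {sli_allowed(lanes)}")
```

Only the two-card x8/x8 layout and the PLX-based layout pass the check across all installed cards, which is why three-way SLI on Z77 ends up requiring the more expensive boards.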

Power Usage

It has come to my attention that even if the results were to come out X > Y, some users may call out that the better processor draws more power, which at the end of the day costs more money if you add it up over a year. For the purposes of this review, we are of the opinion that if you are gaming on a budget, then high-end GPUs such as the ones used here are not going to be within your price range.

Simple fun gaming can be had on a low resolution, limited detail system for not much money – for example at a recent LAN I went to I enjoyed 3-4 hours of TF2 fun on my AMD netbook with integrated HD3210 graphics, even though I had to install the ultra-low resolution texture pack and mods to get 30+ FPS. But I had a great time, and thus the beauty of high definition graphics of the bigger systems might not be of concern as long as the frame rates are good.

But if you want the best, you will pay for the best, even if it comes at the electricity cost. Budget gaming is fine, but this review is designed to focus on 1440p with maximum settings, which is not a budget gaming scenario.

Format Of This Article

On the next couple of pages, I will be going through in detail our hardware for this review, including CPUs, motherboards, GPUs and memory. Then we will move to the actual hardware setups, with CPU speeds and memory timings (with motherboards that actually enable XMP) detailed. Also important to note are the motherboards being used – for completeness I have tested several CPUs in two different motherboards because of GPU lane allocations.

We are living in an age where PCIe switches and additional chips are used to expand GPU lane layouts, so much so that there are up to 20 different configurations for Z77 motherboards alone. Sometimes the lane allocation makes a difference, and it can make a large difference using three or more GPUs (x8/x4/x4 vs. x16/x8/x8 with PLX), even with the added latency sometimes associated with the PCIe switches. Our testing over time will include the majority of the PCIe lane allocations on modern setups, but for our first article we are looking at the major ones we are likely to come across.

The results pages will start with a basic CPU analysis, running through my regular motherboard tests on the CPU. This should give us a feel for how much power each CPU has in dealing with mathematics and real world tests, both for integer operations (important on Bulldozer/Piledriver/Radeon) and floating point operations (where Intel/NVIDIA seem to perform best).

We will then move to each of our four gaming titles in turn, in our six different GPU configurations. As mentioned above, in GPU limited scenarios it may seem odd if a sub-$100 CPU is higher than one north of $300, but we hope to explain the tide of results as we go.

I hope this will be an ongoing project here at AnandTech, and over time we can add more CPUs, 4K testing, perhaps even show four-way Titan should that be available to us. The only danger is that on a driver or game change, it takes another chunk of time to get data! Any suggestions of course are greatly appreciated – drop me an email at ian@anandtech.com. Our next port of call will most likely be Haswell, which I am very much looking forward to testing.


242 Comments


  • Spunjji - Wednesday, May 8, 2013 - link

    Crap troll is crap.
  • lorribot - Wednesday, May 8, 2013 - link

    Would love to see something like a E3-1230 tested, it is around the same price as a i5-3570K but has no graphics, bigger cache and Hyper threading, but no over clocking and 100MHz lower clock. should be similar to a i7-3770 for around 60% of the price.
  • Spunjji - Wednesday, May 8, 2013 - link

    So let me get this straight. The engineers are idiots, yet you want them to go and work for other companies, the best candidate there being Intel. Also, thanks to this apparent mental handicap, they use Intel processors... oh, I get it, you're pro AMD!
  • bikerbass77 - Wednesday, May 8, 2013 - link

    Just a note as I keep seeing posts going on about Planetside 2 being CPU limited. It is not. I am saying this from experience having played it just fine on a Core2Duo based system. The reason you are most likely having problems will either be your ping or your GPU. I was running with a GTX 460 1gb card and it was fine for that particular game. I have just upgraded to a new CPU because the main game I play (Mechwarrior Online) is far more CPU bound being based on CryEngine 3.
  • cusideabelincoln - Wednesday, May 8, 2013 - link

    I think it should be noted that online multiplayer games are a different beast. Multiplayer is typically more CPU intensive, and if you're looking to maintain completely smooth gameplay without any dips in framerate or stuttering then the CPU becomes more important than it is for single player gaming.

    Also would you consider benchmarking live, online streaming of games? Would be great to see how much of a benefit the 3930K would have over other chips, and if Piledriver can pull ahead of the i5s definitively.
  • Markus_Antonius - Wednesday, May 8, 2013 - link

    Your sample size is statistically beyond irrelevant which prevents the scientist in me from drawing any conclusions from it. In addition, claiming any sort of causal relationship between results is outright scientifically wrong even if the sample size would be statistically relevant. From an engineering standpoint the X79 systems with ample headroom in every relevant department would be the best choice to avoid any possible bottlenecks / contention issues in the largest possible number of different workloads.

    Any recent system with a recent CPU and recent midrange graphics card can play a game and can often play it well. Advising a Core i7 3770K based on a statistically irrelevant benchmark while disregarding systems architecture is something that neither the scientist, the software engineer and the hobbyist in me can get behind in any way.
  • JarredWalton - Wednesday, May 8, 2013 - link

    Hyperbole, you have a new friend: meet Markus! "Beyond irrelevant", "any conclusions", "outright scientifically wrong", "ample headroom", "every relevant department", "best choice"....

    Let me guess: you have an X79 system, and it works great for you, and thus anyone even suggesting that it might not be the best thing since sliced bread is something you can't even think about in any way. This article is focused on gaming, and if you want to do things besides gaming yes, you will need to consider other facets of the system build. At the same time, if all you're looking for is a good gaming setup, perhaps with two or three GPUs, I have trouble imagining anyone recommending something other than i7-3770K right now (unless the recommendation is to "wait for Haswell").

    Let me give you a few things to consider that, while the scientist may not necessarily agree, the software engineer and hobbyist definitely would avoid SNB-E and workstations. 1) Overall power requirements (they still matter). 2) Quick Sync (may not be perfect quality, but dang it's fast). 3) Better performance in many games with two GPUs, no matter what paper specs and system architectures might say.
  • smuff3758 - Thursday, May 9, 2013 - link

    And that, Jared, is how to shut down this arrogant, condescending self-titled expert/scientist. I guess he must think the rest of us are bozos who come here for the comic relief?
  • Markus_Antonius - Sunday, May 12, 2013 - link

    My comment was about the testing method not being scientifically sound even though the author makes it a point to refer to the "well-adjusted scientist" in himself. There's a huge number of games out there as well as a lot of different mid-range to high-end video cards. Recommending an i7 3770K on the basis of one resolution tested and only 4 games is something that you absolutely cannot call science.

    I am among other responsibilities a software engineer and I don't actively avoid Sandy Bridge E and workstations.

    My criticism of the methods used and the conclusions drawn is valid criticism, especially in the face of the article being given the appearance of being science.

    If you're going to do recommendations based on statistics and for whatever reason decide to disregard engineering and the science behind systems design you're going to need a far larger sample size than what was used here.

    You can deflect this all you want by quoting power usage and quicksync but while power usage should be a factor, this test was not about quicksync. If it had been they would not have tested X79 systems at all ;-)

    From both work and hobby I know a lot of power users and gaming remains one of the most demanding uses and one of the *most prevalent* demanding uses of a modern PC. Throwing a more powerful system aside and disregarding engineering needs to be done with a lot more care and thoroughness, all of which is missing here.

    Answering valid criticism with scorn and aggression is also very telling. Perhaps you're more insecure than you thought you were?
  • Badelhas - Wednesday, May 8, 2013 - link

    Great review, congrats!
    This comes at a perfect time for me, I just ordered a Qnix QX2710 1440p 27 inch monitor from ebay and a couple of 670´s to work along with my 2500k OCed to 4.5Ghz. It seems I will be amazed with that upgrade, lets see!
    Cheers
