Yesterday, after an all-day benchmarking session on Wednesday, we published our initial performance results for Civilization: Beyond Earth. As can often be the case with limited testing, we ran into a problem and were unable to find a solution at the time. In short, while there was plenty of talk about how developer Firaxis had put effort into improving latency using a custom Split-Frame Rendering (SFR) approach with Mantle on CrossFire configurations, we were unable to produce anything that corroborated that story. Emails were sent, but it took half a day before we finally had the answer: enabling SFR actually requires manually editing the configuration file. Oops.

We could ask why manual editing of the INI file is even necessary, and there are other user interface items that would be nice to address as well, as I noted in the conclusion of the original Benchmarked article. But that's all water under the bridge at this point, so let me issue a public apology for not having the complete information yesterday.

I've updated the text of the original article (and added a discussion of minimum frame rates, in case you missed that), but since many people have likely read the article already and are unlikely to revisit it, I wanted to post a separate Pipeline to update everyone on the true performance of CrossFire with Mantle and SFR. Before we get to that, let me also take this opportunity to pass along some additional information from Firaxis and AMD on why SFR matters. Firaxis has a couple of blog posts on the subject (including one highlighting the benefits of Mantle with multiple GPUs), and here's the direct quote from AMD's marketing folks:

With a traditional graphics API, multi-GPU (MGPU) arrays like AMD CrossFire are typically utilized with a rendering method called "alternate-frame rendering" (AFR). AFR renders odd frames on the first GPU, and even frames on the second GPU. Parallelizing a game’s workload across two GPUs working in tandem has obvious performance benefits.

As AFR requires frames to be rendered in advance, this approach can occasionally suffer from some issues:

  • Large queue depths can reduce the responsiveness of the user’s mouse input
  • The game’s design might not accommodate a queue sufficient for good MGPU scaling
  • Predicted frames in the queue may not be useful to the current state of the user’s movement or camera

Thankfully, AFR is not the only approach to multi-GPU. Mantle empowers game developers with full control of a multi-GPU array and the ability to create or implement unique MGPU solutions that fit the needs of the game engine. In Civilization: Beyond Earth, Firaxis designed a "split-frame rendering" (SFR) subsystem. SFR divides each frame of a scene into proportional sections, and assigns a rendering slice to each GPU in an AMD CrossFire configuration. The "master" GPU quickly receives the work of each GPU and composites the final scene for the user to see on his or her monitor.

If you don’t see 70-100% GPU scaling, that is working as intended, according to Firaxis. Civilization: Beyond Earth’s GPU-oriented workloads are not as demanding as other recent PC titles. However, Beyond Earth’s design generates a considerable amount of work in the producer thread. The producer thread tracks API calls from the game and lines them up, through the CPU, for the GPU’s consumer thread to do graphics work. This producer thread vs. consumer thread workload balance is what establishes Civilization as a CPU-sensitive title (vs. a GPU-sensitive one).

Because the game emphasizes CPU performance, the rendering workloads may not fully utilize the capacity of a high-end GPU. In essence, there is no work left over for the second GPU. However, in cases where the GPU workload is high and a frame might take a while to render (affecting user input latency), the decision to use SFR cuts input latency in half, because there is no long AFR queue to work through. The queue is essentially one frame, each GPU handling a half. This will keep the game smooth and responsive, emphasizing playability, vs. raw frame rates.

Let me provide an example. Let’s say a frame takes 60 milliseconds to render, and you have an AFR queue depth of two frames. That means the user will experience 120ms of lag between the time they move the map and that movement is reflected on-screen. Firaxis’ decision to use SFR halves the queue down to one frame, reducing the input latency to 60ms. And because each GPU is working on half the frame, the queue is reduced by half again to just 30ms.

In this way the game will feel very smooth and responsive, because raw frame-rate scaling was not the goal of this title. Smooth, playable performance was the goal. This is one of the unique approaches to MGPU that AMD has been extolling in the era of Mantle and other similar APIs.
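AMD's arithmetic is easy to put in concrete terms. Here's a minimal sketch in Python that simply restates the reasoning above, using the hypothetical numbers from AMD's example (a 60ms frame and a two-frame AFR queue); it isn't tied to the actual Mantle API or to anything Firaxis shipped.

    # A minimal sketch of the latency arithmetic described above.
    # The numbers are the hypothetical ones from AMD's example, not measurements.

    def afr_input_latency(frame_time_ms, queue_depth):
        # With AFR, whole frames are queued ahead of the one being displayed,
        # so worst-case input latency scales with the queue depth.
        return frame_time_ms * queue_depth

    def sfr_input_latency(frame_time_ms, gpu_count):
        # With SFR, only one frame is in flight and each GPU renders a slice,
        # so (assuming a perfectly even split) it finishes in 1/gpu_count the time.
        return frame_time_ms / gpu_count

    frame_time = 60.0  # ms per frame in AMD's example
    print(afr_input_latency(frame_time, queue_depth=2))  # 120.0 ms
    print(sfr_input_latency(frame_time, gpu_count=2))    # 30.0 ms

Of course, whether splitting a frame across two GPUs actually finishes it in half the time depends on how evenly the slices load-balance, which is exactly the hard part of SFR.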

When I first read AMD's explanation above, my initial reaction was: "This is awesome!" I've always been a bit leery of AFR and the increase in input latency that it can create, so using SFR to avoid the issue is an excellent idea. Unfortunately, it requires more work and testing to get right, so most games simply stick with AFR. Ironically, while reducing input latency is never a bad thing, it honestly doesn't matter nearly as much in a turn-based strategy game like Civilization: Beyond Earth. What we'd really love to see is techniques like SFR used to reduce input latency in genres where it's a bigger deal: first-person games like Crysis, Battlefield, and Far Cry, and third-person games like Batman, Shadow of Mordor, and Assassin's Creed, are prime examples. With that said, let's revisit the subject of Civilization: Beyond Earth and CrossFire performance, with and without Mantle:

Civilization: Beyond Earth 4K Performance

Civilization: Beyond Earth QHD Performance

Civilization: Beyond Earth 1080p Performance

Civilization: Beyond Earth 1080p High Performance

Our graphing engine doesn't allow for sorting on multiple criteria; otherwise I might try sorting by average + minimum frame rate. Regardless, you can see that across the range of options, CrossFire with Mantle SFR is now doing what we'd expect and improving frame rates. But it's not just about improving frame rates; as the above commentary notes, improving input latency is also important. We aren't really equipped to test input latency (that would require a very high speed camera, along with additional time spent filming and measuring), but the minimum frame rates definitely improve as well.
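For what it's worth, that combined sort is trivial to do by hand outside our graphing engine. Here's a quick hypothetical sketch in Python; the configurations and frame rates below are placeholders for illustration, not our measured results.

    # Hypothetical sketch: rank benchmark results by average + minimum FPS.
    # The entries below are placeholders, not actual benchmark numbers.
    results = [
        {"config": "R9 290X CF, Mantle SFR", "avg_fps": 75.0, "min_fps": 65.0},
        {"config": "R9 290X CF, D3D AFR",    "avg_fps": 80.0, "min_fps": 45.0},
        {"config": "R9 290X single GPU",     "avg_fps": 60.0, "min_fps": 50.0},
    ]

    # Sort descending by the combined score so configurations with both a high
    # average and a high minimum frame rate rise to the top of the chart.
    ranked = sorted(results, key=lambda r: r["avg_fps"] + r["min_fps"], reverse=True)
    for r in ranked:
        print("{config}: avg {avg_fps} / min {min_fps}".format(**r))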

What's interesting is that CrossFire without Mantle (which uses AFR) has higher average FPS in many cases, but the minimum frame rates are worse than with a single GPU. The two images above show why this isn't necessarily a good thing. We haven't tested SLI performance, but I have at least one source that says SLI performance is similar to CrossFire AFR: higher average FPS but lower minimum FPS. It's entirely possible that driver updates will improve the situation with D3D, but for now CrossFire with Mantle SFR definitely scores a win over Direct3D AFR as it provides for a smoother gaming experience.

Let's look at the above charts in a different format before we continue this discussion.

We can see that even with just two GPUs splitting the workload, our CPU has apparently become a bottleneck with the R9 290X. Average frame rates still show an increase going from 4K Ultra to QHD Ultra to 1080p Ultra to 1080p High, but when we look at minimum FPS we've run straight into a wall. For the R9 290X, CrossFire with Mantle effectively tops out at a minimum of roughly 65FPS, a single GPU without Mantle hits a lower minimum of around 50FPS, and regular CrossFire on the 290X (i.e. without Mantle) bottoms out at 45FPS. Again, there are likely some optimizations that could be made in both the drivers and the game to improve the situation, but it wouldn't be too surprising to find that Mantle and SFR with three or four GPUs doesn't show much of an increase over two GPUs.

I do have to wonder how applicable the above results are to other games. Last I checked, Mantle CrossFire rendering in Sniper Elite 3 was basically not working, but if other developers can use Mantle to effectively implement SFR instead of AFR, that would be nice to see. But didn't we have SFR way back in the early days of multiple GPUs? Of course we did! 3dfx initially called their solution SLI – Scan-Line Interleave – and had each GPU render every other line. That approach had problems with things like anti-aliasing, but there are many other ways to divide the workload between GPUs, and both AMD (formerly ATI) and NVIDIA have done variations on SFR in the past.

The problem is that when DirectX 9 rolled around and we started getting programmable shaders and deferred rendering, synchronization issues cropped up and developers were essentially locked out of doing creative things like SFR (or geometry processing on one GPU and rendering on another). The only thing you can do with multiple GPUs under Direct3D right now is AFR. That may change with Direct3D 12, but we're still a ways out from that release. Basically, AFR is the easiest approach to implement, but it has various drawbacks even when it does work properly.

Of course there are other potential pitfalls with alternative workload splitting like SFR. It can require more work from the CPU, and as you add GPUs the CPU becomes even more of a potential bottleneck. AMD informed us that the engine in Civilization: Beyond Earth is actually extremely scalable with CPU cores, so while we're testing with an overclocked i7-4770K, AMD said they saw a 20% improvement in performance (with Mantle) going from a six-core Ivy Bridge-E to an eight-core Haswell-E with R9 290X CrossFire. There are apparently other cases where certain hardware configurations and game settings result in an even greater improvement thanks to Mantle (e.g. the 50% increase in minimum frame rates on the R9 290X at our 1080p High settings).

The bottom line is that if you have an AMD GPU, games like Civilization: Beyond Earth can certainly benefit. Maybe Direct3D 12 will bring similar options to developers next year, but in the meantime, congratulations to both AMD and Firaxis for shining a light on the latency subject once again. NVIDIA made some waves with similar discussions when it released FCAT last year, and the topic of latency and jitter is definitely important – and don't even get me started on silliness like capping frame rates at 30FPS by default (cough, The Evil Within, cough).

Comments

  • silverblue - Saturday, October 25, 2014 - link

    The original article has NVIDIA single GPU results. The point of this one is to highlight that the Mantle implementation works, as well as to point out its benefits as regards a more stable frame rate. Having NV results within this article would've just been confusing considering it's about comparing API performance on AMD cards. Just load both articles, put them side by side, and compare... or just wait for the SLI results before you do so.
  • ZeDestructor - Saturday, October 25, 2014 - link

    Stability will remain as it is right now. Hell, stability has remained pretty much at the same level since Vista launched with a new driver model.

    What may change is the level of support, but don't expect much. Thanks to the XBone, DX11.X (that's an actual version that adds a few DX12 bits to it) will become the "legacy"/baseline platform, while DX12 moves forward as the prime platform. Older DX versions will be pretty much left as is as much as possible: no point breaking the current good and stable versions.
  • ZeDestructor - Saturday, October 25, 2014 - link

    Crap.. meant to reply to D. Lister above....
  • chizow - Saturday, October 25, 2014 - link

    So is this AMD's response to G-Sync? Use 2 GPUs to reduce multi-GPU input latency/jitter? I thought it might actually be something cool like Nvidia's SLI AA where each GPU rendered slightly offset frames and merged them in framebuffer to produce a 2x sampled image, but I guess they would go this route if the game is already CPU-limited and already having trouble producing more frames.

    Interesting to see Nvidia has no problems beating AMD without Mantle however.
  • Creig - Monday, October 27, 2014 - link

    Uh, no. AMD is working on FreeSync, an Adaptive-Sync-based solution to counter Nvidia's vendor-locked, expensive G-Sync. It sounds like you need to do some more reading before posting, as you're mixing up your technologies.
  • TiGr1982 - Tuesday, October 28, 2014 - link

    AMD's response to G-Sync is FreeSync, a DisplayPort-based technology.

    AMD's Hawaii GPUs already support FreeSync in hardware, according to AMD's website; all one needs is a FreeSync-supporting monitor (some should be in retail within a matter of months) - connect a Hawaii card to it and there you go.

    chizow, if you are going to talk about something non-nVidia, then you have to get out of your beloved nVidia sandbox and read some things first.
  • Kevin G - Saturday, October 25, 2014 - link

    A couple of things. SFR isn't new, as this article points out, but it is conceptually different from 3dfx's SLI technology. What 3dfx did was interlacing: one GPU rendered the even horizontal lines while the other rendered the odd lines. This also took care of the problem of load balancing, since neighboring lines generally took the same amount of work to render.

    SFR reappeared with DirectX 9 and the GeForce 7950GX2. nVidia implemented a load balancing scheme so that the upper half of a frame and the lower half would be rendered on two different GPUs. Any dual GPU GeForce setup at the time had the option of using SFR in the drivers, though they defaulted to AFR due to simplicity and compatibility. A dual GeForce 7950GX2 setup had the option of using pure AFR, pure SFR, or a hybrid AFR + SFR. Scaling problems abounded on a dual 7950GX2 though: DX9 could only render a maximum of three frames simultaneously in its pipeline, so pure AFR could only use 3 GPUs, and SFR had scaling problems of its own due to the load balancing algorithm, especially in the pure SFR quad GPU scenario. The AFR + SFR scenario was interesting but incurred the bugginess of both implementations with no overall performance advantage.

    Things are a bit different than they were 8 years ago when the GeForce 7950GX2 launched. This SFR implementation is being done at the application level and not at the driver level. Given that context, the results should be far better than in the past. The application can also feed predictive information into the SFR load balancing algorithm that a driver implementation would never have access to (and rightfully should not). This also leaves open the possibility of SFR + AFR in a quad GPU system. I'm really curious what frame rate controls Mantle exposes to developers at the application level, as this could help eliminate the latency issues through direct control. Being able to scale by both SFR and AFR opens the door to 6-way and greater GPU systems actually being useful for gaming (they can only really be used for compute currently).

    The big downside with Mantle in this scenario (outside of Mantle currently being an AMD-only API) is that game developers have to tackle the handling of multiple GPUs themselves. Personally I just don't see developers putting in the time, effort, and money to increase performance on these incredibly niche systems.
  • zodiacsoulmate - Sunday, October 26, 2014 - link

    no nvidia test.... also i don't really see how mantle is useful in civilization 5...
  • JarredWalton - Monday, October 27, 2014 - link

    Late in the game, you can get a map with tons of units on the screen, which can result in lots more draw calls than what you would get from a typical FPS. So if Mantle can increase the number of draw calls you can do, this will raise the minimum frame rate and average frame rate quite a bit. A great example of this is the minimum frame rates with the R9 290X. They go from 48 FPS to 56 FPS even at QHD, and at 1080p the difference is even more dramatic (49 to 68 FPS). And that's with a relatively beefy i7-4770K OC.
  • HisDivineOrder - Monday, October 27, 2014 - link

    You really shouldn't apologize. Your experience is the same one most users would have had if people like you didn't call them out for not enabling support when the setting is enabled in-game.

    Do you really think most people are going to realize something is amiss and that a text file ALSO has to be edited?

    Nope. They'll just go on in ignorance because they'll assume--as most anyone would assume--that the game will work the way it's supposedly advertised to work.

    Whoops. If anyone should be apologizing, it's the person who chose to make editing a text file necessary to make the feature work. It also begs the question, "Why?" Is the feature still in beta? Alpha? A work in progress? Is that why text files need to be edited?

    That's usually why, after all.
