Updating AnandTech’s 2013 Mobile Benchmark Suite (RFC) by Jarred Walton on January 29, 2013 9:45 PM EST
If it seems like just last year that we updated our mobile benchmark suite, that’s because it was. We’re going to be keeping some elements of the testing, but with the release of Windows 8 we’re looking to adjust other areas. This is also a request for input (RFC = Request for Comments, if you didn’t know) from our readers on benchmarks they would like us to run, or not run, specifically with regard to laptops and notebooks.
We used most of the following tests with the Acer S7 review, but we’re still early enough in the game that we can change things up if needed. We can’t promise we’ll use every requested benchmark: there’s only so much time you can spend benchmarking before you’re basically generating similar data points with different applications, and ease of benchmarking and repeatability are major factors. Still, if you have any specific recommendations or requests, we’ll definitely look at them.
General Performance Benchmarks
We’re going to be keeping most of the same general performance benchmarks as last year. PCMark 7, despite some question as to how useful the results really are, is at least a general performance suite that’s easy to run. (As a side note, SYSmark 2012 basically requires a fresh OS install to run properly, plus wiping and reinstalling the OS after running, which makes it prohibitively time-consuming for laptop testing, where every unit comes with varying degrees of OS customization that may or may not allow SYSmark to run.) We’re dropping PCMark Vantage this year, mostly because it’s redundant; if Futuremark comes out with a new version of PCMark, we’ll likely add that as well.
At least for the near term, we’re also including results for TouchXPRT from Principled Technologies; this is a “light” benchmark suite designed more for tablets than laptops (at least in our opinion), but it does provide a few other results separate from a monolithic suite like PCMark 7. We’ll also include results from WebXPRT for the time being, though again it seems more tablet-centric. We don’t really have any other good general performance benchmarking suites, so for other general performance benchmarks we’ll return once again to the ubiquitous Cinebench 11.5 and x264 HD. We’re updating to x264 HD 5.x, however, which does change the encoding somewhat, and if a version of x264 comes out with updated encoding support (e.g. for CUDA, OpenCL, and/or Quick Sync) we’ll likely switch to that when appropriate. We’re still looking for a good OpenCL benchmark or two; WinZip sort of qualifies, but unfortunately we’ve found in testing that 7-Zip tends to beat it on file size, compression time, or both, depending on the settings and files we use.
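To illustrate why an archiver comparison can flip depending on settings and files, here is a minimal Python sketch that times Deflate (the classic ZIP method) against LZMA-family compression (the approach behind 7-Zip’s .7z format) at two presets each. It uses Python’s zlib and lzma modules as stand-ins; it is not WinZip or 7-Zip, and the preset choices, sample data, and function names are our own assumptions.

```python
import lzma
import time
import zlib


def measure(name, compress, data):
    """Compress data once and return (name, compressed size in bytes, seconds)."""
    start = time.perf_counter()
    out = compress(data)
    return name, len(out), time.perf_counter() - start


def compare_codecs(data):
    # Deflate (ZIP-style) vs. LZMA-family (7-Zip-style), two presets each.
    candidates = [
        ("deflate-6", lambda d: zlib.compress(d, 6)),
        ("deflate-9", lambda d: zlib.compress(d, 9)),
        ("lzma-1", lambda d: lzma.compress(d, preset=1)),
        ("lzma-6", lambda d: lzma.compress(d, preset=6)),
    ]
    return [measure(name, fn, data) for name, fn in candidates]


if __name__ == "__main__":
    sample = b"AnandTech mobile benchmark suite " * 50_000
    for name, size, secs in compare_codecs(sample):
        print(f"{name:10s} {size:10d} bytes {secs * 1000:8.1f} ms")
```

Swapping in a real file mix (photos, documents, already-compressed video) is what exposes the size-versus-time trade-off; synthetic repetitive data like the sample above flatters every codec.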
On the graphics side of the equation, there doesn’t seem to be a need to benchmark every single laptop on our gaming suite—how many times do we need to see how an Ultrabook with the same CPU and iGPU runs (or doesn’t run) games?—so we’ll continue using 3DMark as a “rough estimate” of graphics performance. As with PCMark, we’re dropping the Vantage version, but we’ll continue to use 3DMark06 and 3DMark 11, and we’ll add the new version “when it’s done”. We’re considering the inclusion of another 3D benchmark, CatZilla (aka AllBenchmark 1.0 Beta19), at the “Cat” and “Tiger” settings, but we’d like to hear feedback on whether it makes sense or not.
Finally, we’ll continue to provide analysis of display quality, and this is something we really hope to see improve in 2013. Apple has thrown down the gauntlet with their pre-calibrated MacBook, iPhone, iPad, and iMac offerings; if anyone comes out with a laptop that charges Apple prices but can’t actually match Apple in areas like the display, touchpad, and overall quality, you can bet we’ll call them on the carpet. Either be better than Apple and charge the same, or match Apple and charge less, or charge a lot less and don’t try to compete with Apple (which is a dead-end race to the bottom, so let’s try to at least have a few laptops that eschew this path).
As detailed in the Acer S7 review, we’re now ramping up the “difficulty” of our battery life testing. The short story is that we feel anything less than our previous Internet surfing test is too light to truly represent how people use their laptops, so we’re making that our Light test. For the Medium test, we’ll be increasing the frequency of page loads on our Internet test (from one every 60 seconds to one every 12 seconds) and adding playback of MP3 files. The Heavy test is designed not as a “worst-case battery life” test but rather as a “reasonable but still strenuous” use case for battery power. It uses the same Internet test as the Medium test but adds looped playback of a 12Mbps 1080p H.264 video and a constant ~8Mbps FTP download from a local server (FileZilla Server with two simultaneous downloads, each capped at 500KBps, pulling from a list of large movie files).
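The Heavy test’s figures are easy to sanity-check. The short Python sketch below assumes the 500KBps cap applies to each of the two FileZilla downloads and treats 1KB as 1,000 bytes; under those assumptions the aggregate matches the ~8Mbps figure, the server has to supply about 3.6GB of movie files per hour of runtime, and the 12-second page-load interval works out to 300 page loads per hour versus 60 for the Light test. The function names are our own.

```python
def ftp_rate_mbps(streams: int, cap_kb_per_s: float) -> float:
    """Aggregate download rate in megabits/s (assumes 1 KB = 1,000 bytes)."""
    return streams * cap_kb_per_s * 8 / 1000


def gb_per_hour(streams: int, cap_kb_per_s: float) -> float:
    """Data the FTP server must serve per hour at the capped rate, in GB."""
    return streams * cap_kb_per_s * 3600 / 1_000_000


def page_loads_per_hour(interval_s: float) -> float:
    """Number of page loads in one hour at a fixed reload interval."""
    return 3600 / interval_s


if __name__ == "__main__":
    print(ftp_rate_mbps(2, 500))      # 8.0 (Mbps)
    print(gb_per_hour(2, 500))        # 3.6 (GB)
    print(page_loads_per_hour(60))    # 60.0 (Light)
    print(page_loads_per_hour(12))    # 300.0 (Medium/Heavy)
```

If FileZilla’s “KB” means 1,024 bytes, the rate becomes about 8.2Mbps, which is still in line with the ~8Mbps figure.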
Other aspects of our battery testing also warrant clarification. For one, we continue to disable certain “advanced” features like Intel’s Display Power Saving Technology (which can adjust contrast, brightness, color depth, and other items in order to reduce power use). The idea seems nice, but it basically sacrifices image quality for battery life, and since other graphics solutions are not using these “tricks”, we’re leaving it disabled. We also disable refresh rate switching for similar reasons: testing 40Hz on some laptops and 60Hz on others isn’t really apples-to-apples. Finally, we’re moving from 100 nits to 200 nits brightness for all battery life testing, and WiFi and audio will remain active (volume at 30% with headphones connected).
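Because a single forgotten setting can skew a battery result, it may help to treat the settings above as a checklist. The following Python sketch is purely illustrative: the dictionary keys and the deviations() helper are hypothetical names, not part of any tool we use, and brightness would in practice be set with a meter and checked within some tolerance rather than compared exactly.

```python
# Hypothetical checklist restating the battery-test settings described above.
BATTERY_TEST_SETTINGS = {
    "display_power_saving": False,    # e.g. Intel DPST left disabled
    "refresh_rate_switching": False,  # no dropping to 40Hz on battery
    "brightness_nits": 200,
    "wifi_enabled": True,
    "audio_volume_percent": 30,
    "headphones_connected": True,
}


def deviations(unit_settings: dict) -> list[str]:
    """List every setting on a test unit that differs from the standard."""
    problems = []
    for key, expected in BATTERY_TEST_SETTINGS.items():
        actual = unit_settings.get(key)  # a missing setting counts as a deviation
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems
```

An empty list means the unit matches the standard; anything else is a reason to fix the configuration before starting a multi-hour run.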
In truth, gaming is the one area where there is the most room for debate. Keep in mind that when testing notebooks, we’re not solely focused on GPU performance most of the time (even with gaming notebooks); the gaming tests are only a subset of all the benchmarks we run. We’ll try to overlap with our desktop GPU testing where possible, but we’ll continue to use 1366x768 at ~Medium detail as our Value setting, 1600x900 at ~High as our Mainstream setting, and 1920x1080 at ~Max as our Enthusiast setting. Beyond the settings, however, is the question of which games to include.
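The three tiers differ in more than detail presets. A quick Python sketch of the pixel counts shows that the Enthusiast resolution pushes roughly twice as many pixels per frame as Value before the higher detail preset is even considered. The TIERS table simply restates the settings above; the names are our own.

```python
# Laptop gaming tiers as described above: (width, height) and approximate preset.
TIERS = {
    "Value": ((1366, 768), "Medium"),
    "Mainstream": ((1600, 900), "High"),
    "Enthusiast": ((1920, 1080), "Max"),
}


def pixel_count(tier: str) -> int:
    """Pixels rendered per frame at the tier's resolution."""
    (width, height), _preset = TIERS[tier]
    return width * height


if __name__ == "__main__":
    base = pixel_count("Value")
    for name, (_, preset) in TIERS.items():
        px = pixel_count(name)
        print(f"{name:11s} {px:>9,d} px  {px / base:.2f}x Value  (~{preset})")
```

This prints 1,049,088, 1,440,000, and 2,073,600 pixels, or 1.00x, 1.37x, and 1.98x the Value load.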
Ideally, we’d like to have popular games that also tend to be strenuous on the graphics (and possibly the CPU as well). A game or benchmark that is extremely demanding of your graphics hardware but that few people actually play isn’t relevant, and likewise a game that’s extremely popular but doesn’t require much from your hardware (e.g. Minecraft) is only useful for testing low-end GPUs. We would also like to include representatives of all the major genres (first-person shooter/action, role-playing, strategy, and simulation), with the end goal of having ten or fewer titles (for laptops, eight seems like a good number). Ease of benchmarking is also a factor; we can run FRAPS on any game, but a game with a built-in benchmark is both easier to test and produces more reliable, repeatable results. Frankly, at this point we don’t have many titles that we’re really set on including, but here’s the short list.
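One way to put a number on “reliable/repeatable” is the run-to-run coefficient of variation: run the same benchmark several times and divide the sample standard deviation of the results by their mean. The Python sketch below assumes average-FPS results from repeated runs; the function name is our own, the example numbers are made up for illustration, and we are not proposing any particular pass/fail threshold.

```python
import statistics


def run_to_run_variation(fps_runs: list[float]) -> float:
    """Coefficient of variation of repeated benchmark runs, in percent."""
    if len(fps_runs) < 2:
        raise ValueError("need at least two runs to measure variation")
    return statistics.stdev(fps_runs) / statistics.mean(fps_runs) * 100


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    scripted_runs = [41.8, 42.1, 41.9]  # e.g. a built-in benchmark
    manual_runs = [38.5, 43.0, 40.2]    # e.g. a hand-played FRAPS run
    print(f"built-in: {run_to_run_variation(scripted_runs):.1f}%")
    print(f"manual:   {run_to_run_variation(manual_runs):.1f}%")
```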
Elder Scrolls: Skyrim: We’ve been using this title since it came out, and while it may not be the most demanding game out there, it is popular and it’s also more demanding (and scalable) than most other RPGs that come to mind. For example, Mass Effect 3 generally has lower quality (also DX9-only) graphics and doesn’t require as much from your hardware, and The Witcher 2 has three settings: High, Very High, and Extreme (not really, but it doesn’t scale well to lower performance hardware). Skyrim tends to hit both the CPU and GPU quite hard, and even with the high resolution texture pack it can still end up CPU limited on some mobile chips. Regardless of our concerns, however, we can’t think of a good RPG replacement, so our intention is to keep Skyrim for another year.
Far Cry 3: This is an AMD-promoted title, which basically means AMD committed some resources to helping with the game’s development and/or advertising. In theory, that means it should run better on AMD hardware, but as we’ve seen in the past, that’s not always the case. This is a first-person shooter that has received good reviews, and it’s a sequel in a popular franchise with a reputation for punishing GPUs, making it a good choice. It doesn’t have a built-in benchmark, so we’ll use FRAPS on this one.
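Since Far Cry 3 will be measured with FRAPS, the raw frame log has to be reduced to a number. This Python sketch assumes a two-column CSV of frame number and cumulative timestamp in milliseconds, which is how we understand the FRAPS frametimes log; if the format differs, only load_frametimes() needs to change. All helper names are hypothetical.

```python
import csv
from typing import Iterable


def load_frametimes(lines: Iterable[str]) -> list[float]:
    """Read cumulative frame timestamps (ms) from a frame-number,time CSV."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    return [float(row[1]) for row in reader if row]


def average_fps(timestamps_ms: list[float]) -> float:
    """Frames rendered divided by elapsed time between first and last frame."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two frames")
    elapsed_ms = timestamps_ms[-1] - timestamps_ms[0]
    return (len(timestamps_ms) - 1) * 1000 / elapsed_ms


def slowest_frame_ms(timestamps_ms: list[float]) -> float:
    """Longest single frame time, a quick check for stutter."""
    return max(b - a for a, b in zip(timestamps_ms, timestamps_ms[1:]))


if __name__ == "__main__":
    log = ["Frame, Time (ms)", "1, 0.000", "2, 16.000",
           "3, 33.000", "4, 50.000", "5, 100.000"]
    ts = load_frametimes(log)
    print(average_fps(ts), slowest_frame_ms(ts))
```

The slowest-frame figure is included because an average alone can hide a single long hitch like the 50ms frame in the sample log.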
Sleeping Dogs: This is another AMD-promoted title. This is a sandbox shooter/action game with a built-in benchmark, making it a good choice. Yes, right now that's two for AMD and none for NVIDIA, but that will likely change with the final list.
Sadly, that’s all we’re willing to commit to at this point, as all of the other games under consideration have concerns. MMORPGs tend to be a bit too variable, depending on server load and other aspects, so we’re leaving out games like Guild Wars 2, Rift, etc. For simulation/racing games, DiRT: Showdown feels like a step back from DiRT 3 and even DiRT 2; the graphics are more demanding, yes, but the game just isn’t that fun (IMO and according to most reviews). That means we’re still in search of a good racing game; Need for Speed: Most Wanted is a possibility, but we’re open to other suggestions.
Other titles we’re considering but not committed to include Assassin’s Creed III, Hitman: Absolution, and DmC: Devil May Cry; if you have any strong feelings for or against the use of those titles, let us know. Crysis 3 will hopefully make the grade, as long as there’s no funny business at launch or with the updates (Crysis 2, for example, shipped without DX11, and when DX11 was finally added the tessellation was so extreme that it heavily favored NVIDIA hardware, even though much of it was being applied to flat surfaces). Finally, we’re also looking for a viable strategy game. Civ5 and Total War: Shogun 2 could make a return, or there are games like Orcs Must Die 2 and XCOM: Enemy Unknown, but we’re not sure whether either of those meets the “popular and strenuous” criteria, so we may just hold off until StarCraft II: Heart of the Swarm comes out. (Since that game is on “Blizzard time”, it could be 2014 before it’s done, though tentatively it’s looking like March; hopefully it will be able to use more than 1.5 CPU cores this time.)
As stated at the beginning, this is a request for comments and input as much as a list of our plans for the coming year. If you have any strong feelings one way or the other on these benchmarks, now is the time to be heard. We’d love to be able to accommodate every request, but obviously there are time constraints that must be met, so tests that are widely used and relevant are going to be more important than esoteric tests that only a few select people use. We also have multiple laptop reviewers (Dustin, Jarred, and occasionally Vivek and Anand), so the easier it is to come up with a repeatable benchmark scenario the better. Remember: these tests are for laptops and notebooks, so while it would be nice to do something like a compilation benchmark, those can often take many hours just to get the right files installed on a system, which is why we’ve shied away from such tests so far. But if you can convince us of the utility of a benchmark, we’ll be happy to give it a shot.
JarredWalton - Wednesday, January 30, 2013
That's the problem: if HOTS is anything like the first SC2, the playback mode will require you to view the whole battle to get to the benchmark portion. Still, it can be done if that's the only way to do it. We actually did a variety of tests with SC2; one was single-player, but several were multiplayer with lots of units later in the game. The ones with lots of units were even more CPU limited, and I think Anand mostly used them for CPU testing.
IanCutress - Wednesday, January 30, 2013
Hey Jarred, a C++ AMP benchmark would be great. It would automatically run on the most powerful AMP device in the machine (multi-dGPU, dGPU, iGPU, or CPU) and can provide a comparison point for testing GPU/AMP simulations on the go. Have a look at the C++ AMP example site at MS and run the n-body simulation in MultiAMP mode (it will default to the best mode regardless of dGPU, iGPU, CPU, or SLI/CFX) with a fixed number of bodies. Either take FPS or GFLOPs; it only takes 5 seconds to get a result.
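If the n-body sample reports only FPS, converting to GFLOP/s requires knowing how much work each frame does. The Python sketch below assumes an all-pairs simulation (N² body-body interactions per frame), one simulation step per frame, and 20 floating-point operations per interaction, a convention we believe some n-body samples (such as NVIDIA’s CUDA one) use. We have not confirmed which constant, if any, Microsoft’s C++ AMP sample uses, so treat it as an assumption and keep it fixed when comparing machines.

```python
def nbody_gflops(num_bodies: int, fps: float, flops_per_interaction: float = 20.0) -> float:
    """Estimated GFLOP/s for an all-pairs n-body run at a given frame rate.

    Assumes one simulation step per rendered frame and num_bodies**2
    interactions per step; flops_per_interaction is an assumed constant.
    """
    interactions_per_frame = num_bodies * num_bodies
    return interactions_per_frame * flops_per_interaction * fps / 1e9
```

For example, nbody_gflops(1000, 50.0) returns 1.0. Because the body count enters squared, the fixed number of bodies matters as much as the FPS figure itself.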
JarredWalton - Wednesday, January 30, 2013
Is there a pre-compiled version available? Seems like you've used this before, so can you send me your binary?
Rick83 - Wednesday, January 30, 2013
Especially with mobile devices, it can be important to protect data from physical theft, which is where encryption comes in.
On laptops used by most businesses, encryption is mandatory.
What's important to know, then, is:
1) Does the standard installation provide an easy way to encrypt? (e.g. the built-in HDD/SSD supports encryption, BitLocker is easy to set up because a TPM is present, or pre-installed software offers encryption)
2) How does encryption affect some of the I/O- and CPU-heavy benchmarks (or how many CPU cycles are lost in a max-I/O situation)?
3) How does a standard encryption solution (dm-crypt would be one choice, with BitLocker or TrueCrypt as alternatives) perform on the device?
An additional battery life test would be nice to see (encryption on vs off) but as those take a lot of time to run, I wouldn't want to impose that upon anyone.
Point number 3, arguably the most important, is very simple to test: set up a bootable USB key (in the dm-crypt case) with a Linux system that performs the benchmarks, writes the results into the benchmark database, and then reboots. All it takes is some free disk space on the integrated storage so that it can perform write benchmarks (ideally raw, as otherwise the NTFS driver might have a minute impact on the results, though a big file loop-mounted as a block device would also work).
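The I/O question in point 2 can be approximated with a simple before/after measurement: write the same amount of incompressible data to the internal drive with encryption off and then on, and compare throughput. The Python sketch below goes through the filesystem (not raw block access as suggested above), uses random data so drives that compress internally don't inflate the result, and calls os.fsync() so the timing includes the flush to storage. The function names, default sizes, and the temp-directory default are hypothetical choices for illustration.

```python
import os
import tempfile
import time
from typing import Optional


def write_throughput_mb_s(directory: Optional[str] = None,
                          total_mb: int = 256, chunk_mb: int = 4) -> float:
    """Write about total_mb MiB of random data to a scratch file; return MiB/s."""
    directory = directory or tempfile.gettempdir()  # should sit on the drive under test
    chunk = os.urandom(chunk_mb * 1024 * 1024)       # incompressible payload
    path = os.path.join(directory, "crypt_bench.tmp")
    written_mb = 0
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            while written_mb < total_mb:
                f.write(chunk)
                written_mb += chunk_mb
            f.flush()
            os.fsync(f.fileno())  # include the flush to storage in the timing
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written_mb / elapsed


def encryption_overhead_percent(plain_mb_s: float, encrypted_mb_s: float) -> float:
    """Throughput lost with encryption enabled, as a percentage of the plain result."""
    return (plain_mb_s - encrypted_mb_s) / plain_mb_s * 100
```

Run write_throughput_mb_s() against the same drive with encryption disabled and enabled, then pass both results to encryption_overhead_percent(); a few repeated runs of each help separate real overhead from noise.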
DanNeely - Wednesday, January 30, 2013
I'd be interested in whether TPMs are available in any consumer systems too. I haven't seen them mentioned anywhere, but when they're conspicuously absent from Dell's Latitude pages, I'm not sure I can draw any conclusions from that.
This is of interest to me because TrueCrypt doesn't fully support Win8 yet, and BitLocker's usability is badly degraded without a TPM.
Death666Angel - Wednesday, January 30, 2013
I'm not sure if this is completely relevant to this benchmark thread, but I would be very interested in seeing low-wattage CPUs (35W quad-core, 17W dual-core) tested for their Turbo capabilities. This could be done using any number of torture tests or games at extremely high settings (to be more "realistic", although torture programs would point to realistic results down the road, when the laptop is a year old and dust and grime are starting to build up in the cooling system). You would run these tests and record performance metrics and clock frequencies. This way we can see how the cooling system handles the load and how the limited TDP allows for simultaneous GPU/CPU Turbo modes. :)
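The logging half of this proposal is straightforward to sketch. The Python below assumes a Linux system (it parses the “cpu MHz” lines that /proc/cpuinfo exposes) or clock samples exported from another monitoring tool, plus a max Turbo figure taken from the CPU’s spec sheet; the 90% threshold and all function names are arbitrary, hypothetical choices. It reports the mean clock under load and the fraction of samples that fell well below max Turbo, one simple way to see whether a chassis sustains its Turbo clocks.

```python
import statistics
import time


def parse_cpuinfo_mhz(text: str) -> list[float]:
    """Extract per-core 'cpu MHz' values from Linux /proc/cpuinfo text."""
    values = []
    for line in text.splitlines():
        if line.startswith("cpu MHz"):
            values.append(float(line.split(":", 1)[1]))
    return values


def sample_linux_clocks(seconds: int, interval_s: float = 1.0) -> list[float]:
    """Record the average per-core clock once per interval (Linux only)."""
    samples = []
    for _ in range(int(seconds / interval_s)):
        with open("/proc/cpuinfo") as f:
            per_core = parse_cpuinfo_mhz(f.read())
        if per_core:
            samples.append(statistics.mean(per_core))
        time.sleep(interval_s)
    return samples


def summarize_trace(samples_mhz: list[float], max_turbo_mhz: float,
                    threshold: float = 0.9) -> tuple[float, float]:
    """Mean clock and fraction of samples below threshold * max_turbo_mhz."""
    mean = statistics.mean(samples_mhz)
    below = sum(1 for s in samples_mhz if s < threshold * max_turbo_mhz)
    return mean, below / len(samples_mhz)
```

Starting sample_linux_clocks() alongside a torture test or game and feeding the result to summarize_trace() gives one number for sustained clocks and one for how often the system dipped, which can then be compared across laptops using the same CPU.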
JarredWalton - Wednesday, January 30, 2013
This of course would be more a test of the laptops using these CPUs than of the CPUs themselves, I think. With enough cooling, 17W and 35W chips should be able to run at near-max Turbo constantly, but in most Ultrabooks and smaller laptops they can't do so because the cooling is insufficient. I've still got the ASUS UX51VZ to review, and that's definitely something I'll look at.
Death666Angel - Wednesday, January 30, 2013
Yeah, I realize that the test I proposed is not strictly a component test but more of a platform/OEM test. :) Still, it's very interesting, and with many components being equal in a lot of laptops these days, very important for purchasing decisions. Looking forward to your review! :D
But wasn't there an article here on AT showing that even with sufficient cooling, Ultrabook ULVs did not reach both max GPU and max CPU Turbo because of their limited TDP? Or is it really purely a thermal limitation? I also remember reading that in AMD ULV chips (A8-4555M and A10-4655M especially), the limited TDP (19/25W) limits the Turbo modes of the GPU/CPU, so there is a trade-off between the two. That is why, especially in CPU-intensive games, an A8-4555M with a small discrete graphics card can be better, even if the dGPU is inferior to the iGPU, because the CPU part can Turbo higher with the iGPU switched off.
Wow, I hope I made myself understood. :D
JarredWalton - Wednesday, January 30, 2013
Yes, you're correct. If you try to run HD 4000 and the CPU at full load (e.g. playing almost any game), the 17W TDP comes into play. My best estimate is that the CPU cores in a ULV IVB processor can draw around 15W and the HD 4000 can draw around 10W, so something has to give. Oddly, even though the HD 4000 showed clocks around 900MHz (just 20% lower than the max), actual performance was down more like 20-40% from standard-voltage IVB mobile chips, indicating the clocks reported by HWiNFO may not be accurate.
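Those estimates make the squeeze easy to quantify: roughly 15W of CPU demand plus 10W of GPU demand is about 25W against a 17W budget. The Python sketch below assumes, purely for illustration, that the shortfall is split proportionally between the two; real Ivy Bridge power management doesn't work that simply, and power does not scale linearly with clock speed, so this only shows the size of the gap. The function name is hypothetical.

```python
def share_tdp(cpu_w: float, gpu_w: float, tdp_w: float) -> tuple[float, float, float]:
    """Return (total demand, CPU allowance, GPU allowance) in watts.

    Simplifying assumption: when demand exceeds the TDP, both consumers
    are scaled back by the same fraction.
    """
    demand = cpu_w + gpu_w
    scale = min(1.0, tdp_w / demand)
    return demand, cpu_w * scale, gpu_w * scale


if __name__ == "__main__":
    demand, cpu, gpu = share_tdp(15, 10, 17)
    print(f"demand {demand}W, CPU {cpu:.1f}W, GPU {gpu:.1f}W")
```

With the estimates above, each side would get about 68% of what it wants (roughly 10.2W for the CPU cores and 6.8W for the HD 4000), which is consistent with neither reaching its maximum Turbo at the same time.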
Death666Angel - Wednesday, January 30, 2013
Thanks for the follow-up on that! :D
Yeah, I'd be very interested in articles that could tackle those things with laptops. I'm in the market for ultra-mobile (13.3", under 1.6kg) laptops or even tablet/laptop hybrids. Nearly all reasonably priced options have ULV processors (mostly Intel, some AMD). But gaming is still an important thing for me, although not the top priority. I don't need to be playing the latest Hitman at full resolution, but getting playable frame rates in games comparable to Portal 2/CoD4 at non-lowest settings is something I would like very much (currently I have an i3-330UM and have used it for some gog.com stuff... it worked okay).
Especially with Intel pursuing faster and faster iGPUs while continually reducing the power envelope, I think it is important to hold their feet to the fire, so to speak.
Looking forward to seeing those kinds of things! Keep up the good work you are already doing! :)