In our first article, we explained how dynamic power, power leakage, the memory wall and wire delay have forced CPU designers to rethink the methods they use to build higher performance CPUs.

In Part 2, we investigate the advantages and disadvantages of the new market trend: multi-core CPUs. Will dual core enhance your gaming experience? Tim Sweeney, the leading developer behind the Unreal 3 engine, was kind enough to give concise answers to our questions about multi-threaded development. There is more: in the third part of this series, we will investigate what future multi-core and single core architectures will bring, examine whether the stories about "the new era of multi-threaded multi-core CPUs" are true, and ask whether or not this will really benefit the consumer.

Should you care?

Should you care whether or not we are moving to multi-core and multi-threaded CPUs? After all, over the past decades we were able to get consistently more performance for lower prices. However, it is far from clear whether multi-core CPUs will benefit all consumers. We will explain this statement in more detail, as it is worth finding out whether or not the shift will benefit you. The last Spring IDF was all about multi-core CPUs, but there was very little information on how they are going to benefit consumers. Let us take a critical look at this new direction that desktop CPUs have taken.

Multi-core, multi-expensive?

Dual cores are expensive to manufacture. Yields (the fraction of chips on a wafer that work) fall as die size grows, so larger dual core chips will always have lower yields than smaller single core chips on the same process technology. But that is only a small problem. A bigger and more obvious problem is that you get only half as many dies per wafer (even slightly fewer). So, dual cores cost at least twice as much to manufacture as a single core chip: about twice for a design such as Presler, which puts two single core dies in one package, and most likely more for single-die designs such as Yonah and the Pentium-D.

Dual and multi-core CPUs might not increase the thermal density (dissipated power per mm²), but they do increase the total power. Granted, from the viewpoint of a heat sink designer, a theoretical 206 mm² Pentium-D dissipating 180 W is not much harder to cool than a 112 mm² Prescott chip that dissipates roughly 90 W. However, making sure that those 180 W do not cook all the other components inside your computer is an almost impossible task for a system designer who wants to build a relatively silent PC. The result is that multi-core CPUs will run at lower clockspeeds than their single core counterparts. The Pentium-D, the dual core Prescott, is limited to 130 W and 3.2 GHz, while the current Prescott dissipates up to 115 W and runs at 3.8 GHz.

Last but not least, dual core CPUs need more memory bandwidth than a single core to make a difference, and they increase the latency that the CPU perceives. Cache coherency traffic and contention for the same memory bus both add to the total latency that each core sees, which lowers performance.
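To make the yield argument concrete, here is a minimal sketch built on two textbook approximations: the common dies-per-wafer estimate and a simple Poisson yield model. The 300 mm wafer, the defect density of 0.2 defects/cm², and a dual core die that is exactly twice the 112 mm² Prescott are all assumptions chosen for illustration, not Intel figures. Both dies are assumed to come from wafers of the same cost, so the cost per good die is just the inverse of the number of good dies per wafer.

```python
import math

# Illustrative assumptions (not Intel data): 300 mm wafers and a defect
# density of 0.2 defects per cm^2, i.e. 0.002 defects per mm^2.
WAFER_DIAMETER_MM = 300.0
DEFECTS_PER_MM2 = 0.002


def dies_per_wafer(die_area_mm2: float) -> int:
    """Common estimate: gross dies from wafer area, minus partial dies lost at the edge."""
    radius = WAFER_DIAMETER_MM / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * WAFER_DIAMETER_MM / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2: float) -> float:
    """Simple Poisson model: probability that a die contains zero defects."""
    return math.exp(-DEFECTS_PER_MM2 * die_area_mm2)


def good_dies_per_wafer(die_area_mm2: float) -> float:
    return dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2)


def relative_cost_per_good_die(small_mm2: float, large_mm2: float) -> float:
    """Cost of a good large die relative to a good small die,
    assuming both are made on wafers of the same cost."""
    return good_dies_per_wafer(small_mm2) / good_dies_per_wafer(large_mm2)


single_mm2 = 112.0          # Prescott-sized single core die (from the text)
dual_mm2 = 2 * single_mm2   # hypothetical dual core die that simply doubles it

for label, area in (("single core", single_mm2), ("dual core", dual_mm2)):
    print(f"{label}: {dies_per_wafer(area)} dies/wafer, "
          f"yield {poisson_yield(area):.0%}, "
          f"{good_dies_per_wafer(area):.0f} good dies/wafer")
print(f"cost per good dual core die: "
      f"{relative_cost_per_good_die(single_mm2, dual_mm2):.2f}x a single core die")
```

Under these assumptions the doubled die gets slightly fewer than half as many dies per wafer, and each of them is less likely to be defect-free, so the cost ratio comes out at roughly 2.6x rather than 2x. In this model, packaging two single core dies together, as Presler does, pays only the die-count penalty. The thermal density remark is simple division: 90 W / 112 mm² ≈ 0.80 W/mm² versus 180 W / 206 mm² ≈ 0.87 W/mm².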

Multi-core, multi-performance?

The advantages of multi-core and multi-threaded CPUs far outweigh the disadvantages in the server market. Because most server applications spawn many threads and processes, performance scales almost linearly as more cores are added to the die. This is in sharp contrast with the superscalar CPU, where increasingly complex designs require exponentially more transistors and more power while showing diminishing returns, especially in server applications where the IPC can drop below 1. Dual core CPUs are more expensive to manufacture, but they are far easier to design than turning a single core CPU into an even wider-issue, more complex CPU. Development costs for a new CPU design are astronomically high, so it does not surprise us at all that server CPU manufacturers have turned en masse towards multi-core designs: significant performance gains for a fraction of the time and money invested. The same can be said about a large part of the HPC market.
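The article does not name it, but Amdahl's law is the usual back-of-the-envelope model for this kind of scaling: if a fraction p of the single core run time can be spread across n cores and the rest stays serial, the speedup is 1 / ((1 - p) + p/n). The sketch below uses two hypothetical workloads, one 99% parallel (server-like) and one 50% parallel (desktop-like); both fractions are assumptions picked purely for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup on `cores` cores when only `parallel_fraction`
    of the single core run time can be spread across the cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical workload profiles, chosen only for illustration.
profiles = {
    "server-like (99% parallel)": 0.99,
    "desktop-like (50% parallel)": 0.50,
}

for name, p in profiles.items():
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (1, 2, 4, 8))
    print(f"{name}: {row}")
```

The 99% workload stays close to linear up to 8 cores (about 7.5x), while the 50% workload can never exceed 2x, however many cores are added. The model ignores the shared memory bus and cache coherency costs described earlier, so real scaling will be lower than these numbers.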

For a good example of how well server applications can scale with more CPUs, look at our DB2 tests, which showed up to a 96% performance increase going from one to two CPUs, and a boost of up to 89% when we increased the number of Opterons from two to four. Most desktop and many workstation applications, however, are single-threaded. More accurately, they might be multi-threaded to stay responsive, but only one thread really needs CPU power.

Even some workstation applications that are supposed to be prime examples of multi-threaded software are not as multi-core friendly as they appear to be. I ran a lot of Adobe Premiere benchmarks with different video formats and found that the second CPU offered a meagre 10% to 40% speed increase in video editing (rendering). 3DSMax only shows big increases with very complex scenes: with a relatively light animation scene, the second CPU adds about 20% to 50%. One of the best scenes, the architecture scene of the SPEC test, shows an 89% increase when adding a second Opteron, but two extra Opterons already show some diminishing returns: performance went up by 72%.
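One way to compare the DB2, Premiere and 3DSMax numbers is to run Amdahl's law backwards and ask what parallel fraction each measured speedup implies. This is a simple model layered on the figures quoted above (an "89% increase" becomes a speedup of 1.89); it assumes a fixed workload and treats everything that does not scale as a serial fraction.

```python
def implied_parallel_fraction(speedup: float, from_cores: int, to_cores: int) -> float:
    """Invert Amdahl's law.

    Given the measured speedup when going from `from_cores` (m) to `to_cores` (n),
    return the parallel fraction p that Amdahl's law would need to produce it.
    Solves  speedup * (1 - p + p/n) = 1 - p + p/m  for p.
    """
    m, n, r = from_cores, to_cores, speedup
    if not 1 <= m < n:
        raise ValueError("need 1 <= from_cores < to_cores")
    if not 1.0 <= r <= n / m:
        raise ValueError("Amdahl's law only covers speedups between 1x and linear")
    return (r - 1.0) / (r * (1.0 - 1.0 / n) - (1.0 - 1.0 / m))


# Speedups quoted in the text (a "96% increase" is a speedup of 1.96).
measurements = [
    ("DB2, 1 -> 2 CPUs", 1.96, 1, 2),
    ("DB2, 2 -> 4 Opterons", 1.89, 2, 4),
    ("Premiere, worst case, 1 -> 2", 1.10, 1, 2),
    ("Premiere, best case, 1 -> 2", 1.40, 1, 2),
    ("3DSMax architecture scene, 1 -> 2", 1.89, 1, 2),
    ("3DSMax architecture scene, 2 -> 4", 1.72, 2, 4),
]

for label, speedup, m, n in measurements:
    p = implied_parallel_fraction(speedup, m, n)
    print(f"{label}: speedup {speedup:.2f}x -> implied parallel fraction {p:.0%}")
```

Under this model the DB2 runs behave like code that is roughly 97% to 98% parallel, while the Premiere results correspond to only about 18% to 57%. For the 3DSMax architecture scene, the two-to-four step implies a lower fraction (about 91%) than the one-to-two step (about 94%). A fixed serial portion alone cannot explain that gap, which is consistent with extra overhead such as the shared memory bus and cache coherency traffic.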

Multitasking scenarios might be another way to use the power of dual and multi-core CPUs. However, many of the CPU-heavy tasks that desktop and workstation users like to run in the background, such as archiving and encoding, also make heavy use of the hard disk. And despite the merits of NCQ (Native Command Queuing), high rotation speeds, and lower seek times, disk-heavy tasks, especially multi-threaded ones, can bring a whole system to a crawl when there is too much hard disk activity. So, it is clear that there are big challenges ahead before multi-core CPUs bring real benefits to most consumers and employees.

Comments

  • Pjotr - Monday, March 14, 2005

    "dual-cored GPUs are stupid. given the parallel nature of graphics, it makes more sense to just add another pipeline at very little design cost."

    Unless you hit a power and/or heat output wall.

    Tell nVidia that parallel GPUs are bad; they already sell their SLI solution for dual-GPU computers.
  • silverwolf - Monday, March 14, 2005

    PPU is the way to go.
  • defter - Monday, March 14, 2005

    "Given the immense complexity involved, I expect dual cores taking a VERY VERY long time to catch on... even then it'll be a half assed job."

    Well, ALL future consoles will use multi-core CPUs. Thus if developers want to sell games, their games must take advantage of at least two cores :)
  • ceefka - Monday, March 14, 2005

    #12 That's how it looks, for now. "No one will ever need more than 8 cores." :-D

    My dumb question, I was reading:

    "Tim clearly emphasizes that only parts of the application can be economically parallelized. Increasing parallelisation, using more threads, is simply not feasible. There is a pretty hard economic limit to TLP."

    Isn't a high IPC count also a form of parallelism? If so, won't it become just as hard to take advantage of a high IPC count beyond a certain point?
  • sandorski - Monday, March 14, 2005

    Good article. Most of it was over my head, but the gist was most important. That being that Multicore is a big question mark for Gamers and other common users.

    I've always preferred Sweeney over others in the industry; he knows what he's doing without getting in everybody else's face about it. I also found it appropriate that he was interviewed on the subject, since Unreal engines have always internally managed hundreds of processes in order to work (I assume other engines do the same, but my knowledge of the Unreal engines is more thorough). If anyone can figure out how to use multi-core in gaming, my money's on Sweeney.
  • Calin - Monday, March 14, 2005

    Auto-parallelization is of limited use, and it only works on small pieces of code. You might get a couple of percent extra speed, but no more (or not much more). Managing interdependent multi-threaded code would be a nightmare.
    However, with multiple processors (or multiple cores), some extra speed can be recovered from fewer state (thread) switches. Certainly not an extra 24%, but more than a bit.
  • FDSatyr - Monday, March 14, 2005

    Good read - mainly for Tim's comments though! I really enjoy the way Tim isn't arrogant at all in the way he talks. Some fairly silly questions from A though! I still think threads are rubbish, that processes and better schedulers are the way forward. I think the next step - realistically impossible in the industry it may be - would be to create a fresh architecture, and put an x86/87 core on the same die. Ho-hum.
  • mkruer - Monday, March 14, 2005

    My magic eightball says that after 4-8 cores, any other core that will be added will be near worthless.
  • AnandThenMan - Monday, March 14, 2005

    Anand got owned by Tim Sweeney.

    AnandTech: Did you make use of auto-parallelisation compiler technology...
    Tim Sweeney: Auto-parallelization of C++ code is not a serious notion.

    Good article though.
  • overclockingoodness - Monday, March 14, 2005

    Oh my God - the Unreal 3 engine is beautiful. I get amazed every time I look at it. I can't wait for games built on Unreal 3 and similar engines. So fine...
