AMD's dual and quad platform: consistency

AMD's PR is making a lot of noise about consistency, and rightly so. The quad socket and dual socket processors are, apart from the obviously different multiprocessor capabilities, exactly the same. For virtualization, this allows you to optimize your virtual machines and hypervisors once and then clone them as often as you like. There are fewer worries when moving virtual machines around, and there is no fiddling with masking processor capabilities. This is also well illustrated when you check which mode the VMware ESX virtual machines run in. The table is pretty simple for VMs running on top of an AMD processor: virtual machines on the dual-core Opterons run with software virtualization, while those on the quad-cores almost always run in the fastest mode (hardware virtualization combined with hardware assisted paging). The same is true for Hyper-V: it won't run on the dual-core Opterons, and it runs at full speed on the quad-cores. It is remarkably simple compared to the complete mess Intel made: some of the old Pentium 4 based CPUs support VT-x, some don't. Some of the lower end Xeons launched in 2007 and 2008 don't either, and so on.
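As a rough illustration of the capability check described above, the sketch below scans Linux `/proc/cpuinfo` text for the CPU flags that usually indicate hardware virtualization (`vmx`, `svm`) and hardware assisted paging (`ept`, `npt`). This is an assumption-laden example, not how VMware ESX or Hyper-V pick their mode: the function name and the sample line are hypothetical, and which flags appear depends on the kernel version.

```python
def virtualization_flags(cpuinfo_text):
    """Return the virtualization-related CPU flags found in /proc/cpuinfo text."""
    # Flag names as commonly reported by Linux; the labels are descriptive only.
    wanted = {
        "vmx": "Intel VT-x",
        "svm": "AMD-V",
        "ept": "Intel EPT (hardware assisted paging)",
        "npt": "AMD nested paging (hardware assisted paging)",
    }
    found = set()
    for line in cpuinfo_text.splitlines():
        # Only the plain "flags" lines are parsed here.
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            found.update(value.split())
    return {flag: label for flag, label in wanted.items() if flag in found}


# Hypothetical sample line, shortened for the example.
SAMPLE = "flags\t\t: fpu vme de pse svm npt lm"
for flag, label in virtualization_flags(SAMPLE).items():
    print(flag, "->", label)
```

On a Linux host, the same function could be fed the contents of `/proc/cpuinfo`. A CPU that reports neither `vmx` nor `svm` would be limited to software virtualization.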

There is some inconsistency on HyperTransport and L3 cache speeds, but those will only cause small performance variations and no software management troubles. Of course, AMD's very consistent dual and quad socket platform is not without flaws either. The NVIDIA MCP55 Pro chipset was at times pretty quirky when installing new virtualization software. Most of the time, a patch took care of that, and the Opteron servers were running rock solid afterwards, but in the meantime a lot of valuable time was wasted. Also, the current platform has not evolved for years and is starting to show its age: we found out that the motherboards consume a bit more power than they should. In 2010, all Opteron server platforms will use AMD chipsets only.

The core part of the new hex-core Opteron is identical to that of the quad-core, but the "uncore" part has some improvements. With the exception of the 2.8GHz 2387/8387 and 2.9GHz 2389/8389, most quad-core Opterons still connect with 1GHz HyperTransport links. The hex-core Opteron runs with speeds between 2 and 2.4GHz. The hex-core Opteron always connects to the other CPUs in the server via 2.4GHz HyperTransport links. That makes little difference in a 2P server, but performance quickly becomes limited by interconnect speed in 4P. Even at 2.4GHz (9.6GB/s interconnect), probe broadcasting can limit performance, and that is why you can reserve up to 1MB of cache for a snoop filter. These improvements make the hex-core Opteron a more interesting choice than the quad-core Opterons - even at lower clock speeds - for quad socket servers.
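The 9.6GB/s figure can be reproduced with a little arithmetic. The sketch below assumes a 16-bit wide link that transfers data on both clock edges, and reports bandwidth per direction. Those two parameters are assumptions of this example and are not stated in the text above; the function name is hypothetical.

```python
def ht_bandwidth_gb_s(clock_ghz, link_width_bits=16, transfers_per_clock=2):
    """Peak bandwidth of one HyperTransport link direction, in GB/s.

    Assumes a double data rate link (two transfers per clock) that is
    link_width_bits wide; both defaults are assumptions of this sketch.
    """
    bytes_per_transfer = link_width_bits / 8
    return clock_ghz * transfers_per_clock * bytes_per_transfer


for clock in (1.0, 2.4):
    print(f"{clock}GHz link: {ht_bandwidth_gb_s(clock):.1f} GB/s per direction")
```

Under the same assumptions, a 1GHz link peaks at well under half the bandwidth of a 2.4GHz link, which is why the difference matters more once four sockets have to exchange probe traffic.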

In fact, we feel that apart from the very low power Opteron 2377 EE, the quad-core Opterons are of little use. If your application scales relatively badly, there is the Xeon X55xx series, which offers much better "per thread" performance. If your application scales well, the 2.6GHz Opteron 2435 will offer 15% better (and sometimes more) performance than the 2.9GHz Opteron 2389 at the same power consumption. Because they use relatively "old" technology such as DDR2, the hex-core Opteron based servers are very affordable, especially if you compare them with similar Xeon servers.

The Intel Dual socket platform: pricey performance and performance/watt champion

We have already tested the new dual socket "Nehalem" Xeon platform. It is the platform with the fastest interconnects, the most threads per socket (thanks to Hyper-Threading), the most bandwidth (triple-channel) and the most modern virtualization features (Intel VT-d). Even the top models are far from power hogs: at full load, the X5570 offers an excellent performance/watt ratio. The low-power L5520 at 2.26GHz was a real champion in our performance per watt tests and is available at reasonable prices.

The relatively new platform (chipset, DDR3) is still on the expensive side: a similarly configured Dell R710 (two Xeon 5550 2.66GHz, 8 x 4GB 1066MHz DDR3) costs about one third more than a Dell R805 (two Opteron 2435, 8 x 4GB 800MHz DDR2): $5047 versus $3838 (pricing at the end of September 2009). If you choose the Xeon platform, you should be aware that Intel's low end is much less interesting: the best Xeon 55xx CPUs have a clock speed between 2.26 and 2.93GHz. The low end models, the 5504 and 5506, are pretty crippled, with no Hyper-Threading, no Turbo Boost, and only half as much L3 cache (4MB). These crippled CPUs can keep up with the quad-core Opterons at about 2.5GHz, but they are the worst Xeons when you look at idle and full load power. The performance per watt of the Xeon E550x is pretty bad compared to the more expensive parts.
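The "about one third more" claim follows directly from the two quoted prices. A minimal sketch of the calculation, using only the figures above (the function and variable names are hypothetical):

```python
def price_premium(price_a, price_b):
    """Fractional premium of price_a over price_b."""
    return (price_a - price_b) / price_b


R710_PRICE = 5047  # Dell R710, two Xeon 5550, 8 x 4GB DDR3
R805_PRICE = 3838  # Dell R805, two Opteron 2435, 8 x 4GB DDR2

premium = price_premium(R710_PRICE, R805_PRICE)
print(f"R710 premium over R805: {premium:.1%}")
```

With these numbers the premium works out to roughly 31.5%, a little under one third.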

The Intel Quad socket platform

There is no quad socket version of Intel's excellent "Xeon Nehalem" platform. We will have to wait until the Nehalem-EX servers ship in the beginning of 2010. At that time, servers with the eight-core, 24MB L3 cache CPU will almost certainly end up in a higher price class than the current quad socket servers. One indication is that Intel positions the Nehalem-EX as a RISC market killer. Then again, Intel might bring out quad-core versions too. We will have to wait and see.

So there's no Hyper-Threading, Turbo Boost, EPT, NUMA, or fast interconnects for the current Xeon "Dunnington" platform, which is still based on a "multi independent FSB" topology. It has massive amounts of bandwidth in theory (up to 21GB/s), but unfortunately less than 10GB/s is really available. Snooping traffic consumes lots of bandwidth and increases the latency of cache accesses. The 16MB L3 cache should lessen the impact of the relatively slow memory subsystem, but it is only clocked at half the clock speed of the core. A painful 100 cycle latency is the result, but luckily every two cores also share a fast 3MB L2 cache.

When it was first launched, the Xeon MP defeated the AMD alternatives by a good margin in ERP and heavy database loads. It reigned supreme in TPC-C and broke a few new records. More importantly, it took back 9% of market share in the quad socket market according to the IDC Worldwide Server Tracker. But at that time, the 2.66GHz hex-core had to compete with a 2.5GHz quad-core Opteron with a paltry 2MB of shared L3, and AMD has been working hard on a comeback. The massive Intel chip (503 mm²) now has to face a competitor that has three times as much L3 cache and 50% more cores at higher clock speeds, and that is not all: the DDR2-800 DIMMs deliver up to 42GB/s, or four times as much bandwidth, to the four AMD chips. At the same time, the Xeon behemoth has to outpace the ultra modern Dual Xeon platform by a decent margin to justify its much higher price.


32 Comments


  • rbbot - Tuesday, October 6, 2009

    Surely the high price of 8GB Dimms isn't going to last very long, especially with Samsung about to launch 16GB parts soon.
  • Calin - Wednesday, October 7, 2009

    8GB DIMMs have two markets: one would be upgrade from 4GB or 2GB parts in older servers, the other would be more memory in cheaper servers. As the demand can be high, it all depends on the supply - and if the supply is low, prices are high.
    So, don't count on the price of 8GB DIMMs to decrease soon
  • Candide08 - Tuesday, October 6, 2009

    One performance factor that has not improved much over the years is the decrease in percentage of performance gains for additional cores.

    A second core adds about 60% performance to the system.
    Third, fourth, fifth and sixth cores all add lower (decreasing) percentages of real performance gains - due to multi-core overhead.

    A dual socket dual core system (4 processors) seems like the sweet spot to our organization.
  • Calin - Wednesday, October 7, 2009

    If your load is enough to fit into four processors, then this is great. However, for some, this level of performance is not enough, and more performance is needed - even if paying four times as much for twice as much performance
  • hifiaudio2 - Tuesday, October 6, 2009

    FYI the R710 can have up to 192gb of ram...

    12x16GB

    not cheap :) but possible

  • JohanAnandtech - Tuesday, October 6, 2009

    at $300 per GB, or the price of 2 times 4 GB DIMMs, I don't think 16 GB DIMMs are going to be a big success right now. :-)
  • wifiwolf - Wednesday, October 7, 2009

    for at least 5 years you mean
  • mamisano - Tuesday, October 6, 2009

    Great article, just have a question about the power supplies. Why do the quad-core servers need a 1200W PSU if the highest measured load was 512W? I know you would like to have some head-room but it looks to me that a more efficient 750 - 900W PSU may have provided better power consumption results... or am I totally wrong? :)
  • JarredWalton - Tuesday, October 6, 2009

    Maximum efficiency for most PSUs is obtained at a load of around 40-60% (give or take), so if you have a server running mostly under load you would want a PSU rated at roughly twice the load power. (Plus a bit of headroom, of course.)
  • JohanAnandtech - Wednesday, October 7, 2009

    Actually, the best server PSUs are now at maximum efficiency (+/- 3%) between 30 and 95% load.

    For example:
    http://www.supermicro.com/products/powersupply/80P...

    And the reason why our quads are using 1000W PSUs (not 1200) is indeed that you need some headroom. We do not test the server with all DIMM slots filled, and you also need to take into account that you need a lot more power when starting up.
