We were quite amazed, even slightly suspicious, when HP and Fujitsu-Siemens published their SAP numbers. These numbers showed that the new Xeon X5570 (Nehalem EP) offers an enormous performance boost over the Xeon X5470 (Harpertown). After all, an almost 100% improvement at a slightly lower clock speed (2.93 GHz vs 3.3 GHz) is nothing short of amazing. It turns out that the real clock speed is 3.2 GHz (2.93 GHz + 266 MHz turbo), but that does not alter the fact that these are truly incredible performance numbers.
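A quick back-of-the-envelope check of these clock figures, as a minimal Python sketch. It assumes the "almost 100% improvement" can be rounded to a 2.0x speedup; the constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the clock-speed figures quoted above.
# Assumption: "almost 100% improvement" is rounded to a 2.0x overall speedup.

BASE_GHZ = 2.93          # Xeon X5570 nominal clock
TURBO_STEP_GHZ = 0.266   # 266 MHz of turbo, as stated in the text
HARPERTOWN_GHZ = 3.3     # Xeon X5470 clock used in the comparison


def effective_clock(base_ghz: float, turbo_ghz: float) -> float:
    """Clock speed with the turbo step applied."""
    return base_ghz + turbo_ghz


def per_clock_speedup(speedup: float, new_ghz: float, old_ghz: float) -> float:
    """Normalize an overall speedup to equal clock speeds."""
    return speedup * old_ghz / new_ghz


nehalem_ghz = effective_clock(BASE_GHZ, TURBO_STEP_GHZ)
print(f"Effective Nehalem clock: {nehalem_ghz:.2f} GHz")
print(f"Per-clock speedup: {per_clock_speedup(2.0, nehalem_ghz, HARPERTOWN_GHZ):.2f}x")
```

Even with turbo counted, the clock-for-clock gain stays slightly above 2x, which is why the numbers looked suspicious at first.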

I can now confirm that there are no tricks behind these numbers: they paint an accurate picture of the Xeon Nehalem EP. Our conversations with SAP benchmarking specialists made it clear that few tuning tricks exist that are not known to the big OEMs. The benchmark has been analyzed and tuned so well that even the use of a different database (for example MS SQL instead of DB2) only makes a 2 to 3% difference most of the time. So you can even compare SAP numbers obtained on different databases. In short, the SAP numbers can only be really boosted by better hardware (CPU and memory).
 
Now why am I talking so much about SAP benchmarking numbers? It is not as if everyone runs this expensive ERP software.
 
Well, the SAP numbers show a dual 2.93 GHz (or 3.2 GHz) Xeon beating the only quad Opteron 8384 (Shanghai at 2.7 GHz) score we have so far: 22000. Granted, a blade server is usually a bit slower. But a quad Opteron 8384 2.7 GHz system will be in the same league as a dual Xeon X5570, which will be out very soon now.
 
Even worse for AMD, the SAP benchmark is not some exotic exception for the Xeon 55xx series. It should be no surprise that the HPC numbers will be very impressive too. So it looks like AMD is in a tough spot.
 
What happened? 
As the SAP threads share a lot of data (as is typical for this kind of database-driven application), hyperthreading cannot be the only explanation for why Nehalem simply doubles performance and annihilates the competition. SAP benchmarking specialists expect hyperthreading to account for about one third of the performance boost. We tend to believe these people, who have been running this benchmark for years. The reason SAP is not one of the "top cases" for hyperthreading on Nehalem is that this OLTP-based benchmark spends a lot of time on shared data. Our own Nehalem OLTP benchmarking (Oracle and MySQL) also points in that direction.
 
As we have pointed out before, the benchmark also:
  • responds very well to low cache and memory latency
  • does not care too much about memory bandwidth
  • and is very sensitive to "syncing latency".
Since the AMD Shanghai CPU has the same fast way to sync between cores (via the L3 cache) as Nehalem, syncing latency cannot explain why AMD falls behind. Part of the explanation is, of course, that these benchmarks were run with turbo enabled: the Nehalem CPU actually runs at 3.2 GHz, which is good for about a 5% advantage.
 
Nehalem has faster memory access than AMD's latest quad-core (70 ns vs 110 ns), which is probably the second reason why Shanghai falls behind. But AMD will probably have to redesign its integer execution pipeline significantly before it can catch up with Nehalem (think memory disambiguation, for example). Basically, AMD's better NUMA platform, with its integrated memory controllers, was hiding this disadvantage. Now that the new Intel platform no longer puts "the brakes" on the integer execution engine, the superiority of Intel's integer engine is showing.
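To put the 70 ns vs 110 ns gap in perspective, here is a minimal sketch that converts each latency into core clock cycles. It assumes Nehalem runs at its 3.2 GHz turbo clock and Shanghai at 2.7 GHz, and it ignores the overlap that out-of-order execution and prefetching can achieve, so it shows the raw penalty of a single miss, not the effective stall.

```python
# Convert memory latency from nanoseconds to core clock cycles.
# Assumptions: Nehalem at its 3.2 GHz turbo clock, Shanghai (Opteron 8384)
# at 2.7 GHz; latencies are the 70 ns and 110 ns figures from the text.


def latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Cycles a core waits for one memory access (ns times cycles per ns)."""
    return latency_ns * clock_ghz


nehalem = latency_cycles(70, 3.2)    # about 224 cycles
shanghai = latency_cycles(110, 2.7)  # about 297 cycles
print(f"Nehalem:  {nehalem:.0f} cycles")
print(f"Shanghai: {shanghai:.0f} cycles")
print(f"Shanghai waits {shanghai / nehalem:.2f}x as many cycles per miss")
```

Even though Shanghai runs at a lower clock, each miss still costs it roughly a third more cycles in which its integer core cannot do useful work.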
 
The lack of any form of multithreading is hurting AMD badly. It is well known that most of these business applications achieve very low IPC (0.2-0.6) and that modern superscalar CPUs have ample execution resources to run two threads of these applications. The result is that Simultaneous Multi-Threading (SMT) offers a typical 20 to 40% performance advantage. And that is huge, considering that you need 25 to 50% more clock speed to counter it. It is basically mission impossible for a modern CPU without SMT to outperform a similar superscalar CPU with SMT in OLTP, Java, web server, rendering and ERP workloads. AMD really dropped the ball there; SMT should have been part of the K10 architecture.
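Why does a 20 to 40% SMT gain take 25 to 50% more clock speed to match? One way to see it: part of the run time is memory stall that does not shrink when the clock goes up. The sketch below is a minimal Amdahl-style model under that assumption; the stall fraction `mem_fraction` is a hypothetical parameter, not a figure from the SAP results.

```python
# Simple Amdahl-style model: how much extra clock speed does a core without
# SMT need to match a given SMT speedup?
# Assumption (not from the article): a fraction `mem_fraction` of run time is
# memory stall time that does not shrink when the core clock goes up.


def required_clock_ratio(target_speedup: float, mem_fraction: float) -> float:
    """Clock multiplier k such that 1 / ((1 - m) / k + m) == target_speedup."""
    if not 0.0 <= mem_fraction < 1.0:
        raise ValueError("mem_fraction must be in [0, 1)")
    remaining = 1.0 / target_speedup - mem_fraction
    if remaining <= 0.0:
        # The memory-bound part alone already exceeds the target run time:
        # no clock speed can reach this speedup.
        return float("inf")
    return (1.0 - mem_fraction) / remaining


for speedup, mem in [(1.2, 1 / 6), (1.4, 1 / 7)]:
    k = required_clock_ratio(speedup, mem)
    print(f"SMT gain {speedup - 1:.0%} with {mem:.0%} stall time -> {k - 1:.0%} more clock")
```

With a stall fraction of roughly 14 to 17%, this toy model reproduces the 25 to 50% clock-speed range quoted above; the more memory-bound the workload, the more clock speed it takes.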
 
Difficult times ahead for AMD
Even if AMD is able to speed up beyond 3 GHz, chances are slim that AMD will be able to compete with the new Nehalem Xeons. Add Turbo mode, hyperthreading, a lower latency memory controller and a better integer core together and you get a performance gap the size of the "Grand Canyon".
 
So does AMD have any chance at all, apart from a new architecture in 2011? Is it over and out for AMD in 2009 and 2010? Adding two cores at the end of 2009 is a good step in the right direction. But even if AMD executes flawlessly, the 32 nm Xeon Westmere will only give the AMD hexacore "Istanbul" a window of a few months: Istanbul should appear at the end of 2009, and the Westmere Xeon is scheduled for very early 2010.
 
Westmere has few performance optimizations; it seems to be a pretty straightforward shrink with slightly higher clock speeds, about 20% lower power consumption, and yet another addition to the ridiculously long list of SSE instructions in the form of seven new instructions (six of them for AES crypto acceleration). Westmere is only an evolutionary step forward, but the "Grand Canyon" gap that Nehalem EP has created is probably large enough.

 

We will surely see lower virtualization switching times (from virtual machine to hypervisor) and some small tweaks in AMD's Istanbul CPU, but it remains unclear whether there are any significant performance boosters in the core. So if we may believe the current roadmaps, it looks like Intel will own the dual-socket space throughout 2009 and 2010.
 
As the SAP numbers indicate, even the slowest Intel Xeons will show a large performance gap with the best AMD Opterons. Is AMD doomed completely? In a large part of the market, yes. AMD's Istanbul will make the gap a bit smaller, but probably not small enough.
 
There are some unknown factors that, together with one of the few remaining weaknesses (or rather, less strong points) of Nehalem, might make it possible for AMD's Opteron to come close enough in a particular part of the market. In my next post, I will clarify the one and only opportunity that I see for AMD in the next two years. Until then, don't shoot the messenger :-).
Comments

  • drothgery - Wednesday, February 11, 2009

    You don't expect dual-socket Intel systems to be significantly cheaper than quad-socket AMD systems? I mean, our AMD fanboi might think AMD quad-sockets will be significantly cheaper than Intel dual-sockets, but I kind of expect it will be the other way around, myself. Even with the price penalty for FB-DIMMs (worse than the price penalty for DDR3), dual-socket Xeon boxes are a lot cheaper than quad-socket Opteron boxes right now (based on a cursory glance at rack server pricing at Dell.com).
  • JohanAnandtech - Wednesday, February 11, 2009

    "You don't expect dual-socket Intel systems to be significantly cheaper than quad-socket AMD systems?"

    I meant that I don't expect that a dual Xeon box is so much more expensive that it is influencing the purchasing decision in a huge way. A 10% higher price is not really that important, just like a 10% performance boost should be put into perspective.
  • alpha754293 - Wednesday, February 11, 2009

    Actually, I've already run the early numbers on price/performance.

    Normally, the expectation would be that the much newer dual Xeon system would cost more than the already existing quad-Socket F option.

    In this case, I think that Intel actually released (or targeted the release of) this platform to take aim directly at the 4S AMD market.

    On the other hand then, I think that people really really need to take into consideration that the SAP benchmark is the first of what I'm sure will be MANY to follow.

    So, just keep that in mind.
  • drothgery - Wednesday, February 11, 2009

    Hmm... it looks to me (on a cursory glance; I've never seriously priced out anything bigger than a dual-socket box) like quad-socket boxes are much more expensive than dual-socket boxes. To the point where that completely overwhelms the 'new and shiny' price penalty for cutting-edge hardware.
  • rv968 - Thursday, February 12, 2009

    Also relevant, in the server market it is common for the cost of application licensing (per CPU) to be much greater than the cost of the hardware. Oracle DB EE for example lists at over $20K per Intel core.
    http://www.oracle.com/corporate/pricing/technology...
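A rough sketch of why licensing can dominate: it multiplies a per-core price by the core counts of the two configurations discussed in the article. Assumptions: a flat $20,000 per core for both vendors (the lower bound quoted above; Oracle's actual terms apply a processor core factor that this sketch ignores), four cores per socket, and the CPU list prices quoted further down in this thread. The function name is hypothetical.

```python
# Rough per-core licensing comparison for the two configurations in the article.
LICENSE_PER_CORE = 20_000   # assumption: same flat price for Intel and AMD cores
CORES_PER_SOCKET = 4        # the Xeon X5570 and the Opteron 8384 are both quad-cores


def config_costs(sockets: int, cpu_price: int) -> tuple[int, int, int]:
    """Return (cores, license cost, CPU cost) for one server configuration."""
    cores = sockets * CORES_PER_SOCKET
    return cores, cores * LICENSE_PER_CORE, sockets * cpu_price


configs = {
    "dual Xeon X5570": (2, 1386),
    "quad Opteron 8384": (4, 2149),
}

for name, (sockets, price) in configs.items():
    cores, licenses, cpus = config_costs(sockets, price)
    print(f"{name}: {cores} cores, licenses ${licenses:,}, CPUs ${cpus:,}")
```

Under these assumptions the license bill is dozens of times the CPU bill, so a configuration that delivers the same performance with half the cores saves far more on licenses than on hardware.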
  • JohanAnandtech - Wednesday, February 11, 2009

    Dual Xeon vs Dual Opteron thus. (I have to ask John for an edit function ;-)
  • strikeback03 - Wednesday, February 11, 2009

    Wow, and here I thought fanboiism was only in the consumer space. Guess not.
  • carniver - Wednesday, February 11, 2009

    So why don't you post your price estimates for both? I have a hard time believing a dual Nehalem can cost much more than a quad Opteron system.
  • defter - Wednesday, February 11, 2009

    Pricing for Nehalem Xeons was released a long time ago: http://www.xbitlabs.com/news/cpu/display/200811180...
    The cheapest (2.26 GHz) 8-thread Xeon will cost only $373, while the 2.93 GHz model mentioned in the article costs $1386.

    You can see AMD's pricing here: http://www.amd.com/us-en/Processors/ProductInforma...

    The cheapest 1.8 GHz 4-way Opteron CPU costs $523; the 2.7 GHz model costs $2149.

    Then we need to take into account that a 4-way motherboard costs much more than a 2-way one (offsetting the higher cost of memory).

    Here are some price examples (without motherboard):
    cheapest dual Nehalem: 2*373 = $746
    cheapest quad Opteron: 4*523 = $2092

    Models mentioned in the article (2.93 GHz Nehalem, 2.7 GHz Opteron):
    dual Nehalem: 2*1386 = $2772
    quad Opteron: 4*2149 = $8596, over three times as much!

    AMD needs to cut 4-way Opteron prices really, really low in order to be even remotely competitive.
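The arithmetic above, as a minimal Python sketch (CPU list prices only; motherboard, memory and chassis are excluded, as in the comment). The function name is illustrative.

```python
# Reproduces the CPU-only price comparison above (motherboards excluded).


def platform_cpu_cost(unit_price: int, sockets: int) -> int:
    """CPU cost only; motherboard, memory and chassis are not included."""
    return unit_price * sockets


comparisons = {
    # label: ((Xeon unit price, sockets), (Opteron unit price, sockets))
    "cheapest parts": ((373, 2), (523, 4)),
    "parts from the article": ((1386, 2), (2149, 4)),
}

for label, (xeon, opteron) in comparisons.items():
    xeon_cost = platform_cpu_cost(*xeon)
    opteron_cost = platform_cpu_cost(*opteron)
    ratio = opteron_cost / xeon_cost
    print(f"{label}: dual Nehalem ${xeon_cost:,}, quad Opteron ${opteron_cost:,} ({ratio:.1f}x)")
```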
  • jmurbank - Thursday, February 12, 2009

    The cheapest Opteron (Shanghai) 2376 2.3 GHz processor is around $380. An Opteron (Shanghai) 2384 2.7 GHz is around $950. Sure, I only calculated for the 2P versions, not the 8P versions. The 8P versions are for systems that need over 60 processors. Most setups will do just fine with 2P systems, or with a cluster of 2P servers.

    If you really want to compare pricing and power consumption with Intel's expected pricing: the Intel Xeon DP E5540 may cost a little less, but the Opteron has a slight edge in power consumption. Then again, the Opteron (Shanghai) was not supposed to compete with the Xeon DP; it was supposed to compete with the Xeon MP. I would be surprised if the Opteron (Shanghai) compared well to some Xeon DP parts.

    AMD's HyperTransport connects the processors in a mesh-like network. This creates complexity and (I think) requires higher fabrication quality for the 8P versions.
