We were quite amazed, even slightly suspicious, when HP and Fujitsu-Siemens published their SAP numbers. These numbers showed that the newest Xeon X5570 (Nehalem EP) series offers an enormous performance boost over the Xeon X5470 (Harpertown). After all, an almost 100% improvement at a slightly lower clockspeed (2.93 GHz vs 3.3 GHz) is nothing short of amazing. It turns out that the real clockspeed is 3.2 GHz (2.93 GHz + 266 MHz turbo), but that does not alter the fact that these are truly incredible performance numbers.
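As a rough back-of-the-envelope check of the clockspeed figures above, the sketch below adds the turbo headroom to the nominal clock and normalizes the speedup per clock. The 2.0x speedup is an assumption standing in for "almost 100%", and the function names are purely illustrative.

```python
# Figures quoted in the text. ASSUMED_SPEEDUP is an assumption: it treats
# "almost 100% improvement" as an upper bound of exactly 2.0x.
BASE_CLOCK_GHZ = 2.93    # Xeon X5570 nominal clock
TURBO_GHZ = 0.266        # turbo headroom quoted in the text
HARPERTOWN_GHZ = 3.3     # Xeon X5470 clock as quoted in the text
ASSUMED_SPEEDUP = 2.0


def turbo_clock(base_ghz, turbo_ghz):
    """Effective clock when the full turbo headroom is available."""
    return base_ghz + turbo_ghz


def per_clock_speedup(speedup, old_ghz, new_ghz):
    """Speedup normalized for the difference in clockspeed."""
    return speedup * old_ghz / new_ghz


nehalem_ghz = turbo_clock(BASE_CLOCK_GHZ, TURBO_GHZ)
ratio = per_clock_speedup(ASSUMED_SPEEDUP, HARPERTOWN_GHZ, nehalem_ghz)
print(f"Effective Nehalem clock: {nehalem_ghz:.2f} GHz")
print(f"Per-clock speedup over Harpertown: {ratio:.2f}x")
```

Under these assumptions the per-clock gain comes out slightly above the raw speedup, since the Nehalem part still runs at a marginally lower clock even with turbo.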

I can now confirm that there are no tricks behind these numbers: they paint the right picture of the Xeon Nehalem EP. Talking to SAP benchmarking specialists, it became clear that few tuning tricks exist that are not known to the big OEMs. The benchmark has been analyzed and tuned so well that even the use of a different database (for example MS SQL instead of DB2) only makes a 2 to 3% difference most of the time. So you might even compare SAP numbers obtained on different databases. To summarize, the SAP numbers can only really be boosted by better hardware (CPU and memory).
 
Now why am I talking so much about SAP benchmarking numbers? It is not like this expensive ERP software is run by everyone.
 
Well, the SAP numbers show a dual 2.93 GHz (or 3.2 GHz) Xeon beating the only quad-socket AMD 8384 (Shanghai at 2.7 GHz) result we have so far, a score of 22000. Granted, a blade server is a bit slower most of the time. But four AMD 8384s at 2.7 GHz will only be in the same league as a dual Xeon X5570, which will be out very soon now.
 
Even worse for AMD is that the SAP benchmark is not some exotic, exceptional benchmarking case for the Xeon 55xx series. It should be no surprise that the HPC numbers will be very impressive too. So it looks like AMD is in a tough spot.
 
What happened? 
As the SAP threads share a lot of data (as is typical for these kinds of database-driven applications), hyperthreading cannot be the only explanation for why Nehalem is simply doubling performance and annihilating the competition. SAP benchmarking specialists expect hyperthreading to be good for about one third of the performance boost. We tend to believe these people, who have been running this benchmark for years now. The reason why it is not one of the "top cases" for hyperthreading on Nehalem is that this OLTP-based benchmark spends a lot of time on shared data. Our own Nehalem OLTP benchmarking (Oracle and MySQL) also points in that direction.
 
As we have pointed out before, the benchmark also
  • responds very well to low cache and memory latency
  • does not care too much about memory bandwidth
  • and is very sensitive to "syncing latency".
Since the AMD Shanghai CPU has the same fast way to sync between cores (via the L3 cache) as Nehalem, syncing latency cannot explain why AMD falls behind. Another explanation is of course that these benchmarks are run on a CPU which uses turbo, which explains about a 5% advantage as the Nehalem CPU actually runs at 3.2 GHz.
 
Nehalem has faster access to memory than AMD's latest quad-core (70 ns vs 110 ns), which is probably the second reason why Shanghai falls behind. But AMD will probably have to redesign its integer execution pipeline significantly before it can catch up with Nehalem (think memory disambiguation, for example). Basically, AMD's better platform (NUMA with an integrated memory controller) was hiding this disadvantage. Now that the new Intel platform no longer puts "the brakes" on the integer execution engine, the superiority of Intel's integer engine is showing.
 
The lack of any form of multi-threading is hurting AMD badly. It is well known that most of these business applications achieve very low IPC (0.2-0.6) and that modern superscalar CPUs have ample execution resources for running two threads in these applications. The result is that Simultaneous Multi-Threading offers a typical 20 to 40% performance advantage. And that is huge, considering that you need 25 to 50% more clockspeed to counter that. It is basically a mission impossible for a modern CPU without SMT to outperform a similar superscalar CPU with SMT in OLTP, Java, webserver, rendering and ERP workloads. AMD really dropped the ball there: SMT should have been part of the K10 architecture.
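One way to arrive at the 25 to 50% clockspeed figure is sketched below. It assumes that only about 80% of a clockspeed increase shows up as real performance. That scaling factor is an assumption chosen for illustration, not something stated above, and the function name is hypothetical.

```python
def required_clock_increase(smt_gain, clock_scaling=0.8):
    """Extra clockspeed (as a fraction) a non-SMT CPU needs to match an SMT gain.

    smt_gain:      performance advantage from SMT, e.g. 0.20 for 20%
    clock_scaling: assumed fraction of a clock increase that turns into
                   performance (0.8 is an illustrative assumption)
    """
    if not 0 < clock_scaling <= 1:
        raise ValueError("clock_scaling must be in (0, 1]")
    return smt_gain / clock_scaling


for gain in (0.20, 0.40):
    extra = required_clock_increase(gain)
    print(f"SMT gain {gain:.0%} needs roughly {extra:.0%} more clockspeed")
```

With perfect clock scaling (clock_scaling=1.0) the required increase simply equals the SMT gain, so the gap between the two ranges reflects how poorly these workloads scale with clockspeed under this assumption.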
 
Difficult times ahead for AMD
Even if AMD is able to speed up beyond 3 GHz, chances are slim that AMD will be able to compete with the new Nehalem Xeons. Add Turbo mode, hyperthreading, a lower latency memory controller and a better integer core together and you get a performance gap the size of the "Grand Canyon".
 
So does AMD have any chance at all beyond a new architecture in 2011? Is it over and out for AMD in 2009 and 2010? Adding 2 cores at the end of 2009 is a good step in the right direction. But even if AMD executes flawlessly, the 32 nm Xeon Westmere will only give a window of a few months to the AMD hexacore "Istanbul". Istanbul should appear at the end of 2009; the Westmere Xeon is scheduled for very early 2010.
 
Westmere has few performance optimizations; it seems to be a pretty straightforward shrink. Expect slightly higher clockspeeds, about 20% lower power consumption, and yet another addition to the ridiculously long list of SSE instructions in the form of seven new instructions (six of them are for crypto/AES acceleration). Westmere is only an evolutionary step forward, but the "Grand Canyon" gap that Nehalem EP has opened is probably large enough.

 

We will certainly see better (lower) switching times from virtual machine to hypervisor and some small tweaks in AMD's Istanbul CPU, but it remains unclear whether there are any significant performance boosters in the core. So it looks like Intel will own the dual socket space throughout 2009 and 2010, if we may believe the current roadmaps.
 
As the SAP numbers indicate, even the slowest Intel Xeons will show a large performance gap with the best AMD Opterons. Is AMD doomed completely? In a large part of the market, yes. AMD's Istanbul will make the gap a bit smaller, but probably not small enough.
 
There are some unknown factors that, together with one of the few remaining weaknesses (or rather, less strong points) of Nehalem, might allow AMD's Opteron to come close enough in a particular area of the market. In my next post, I will clarify the one and only opportunity that I see for AMD in the next two years. Until then, don't shoot the messenger :-).

35 Comments

  • carniver - Wednesday, February 11, 2009

    Thanks for the data. Though I wish the AMD fanboys could do their homework themselves, so that we don't always have to point out how they don't know what they're talking about.
  • balancedthinking - Thursday, February 12, 2009

    Time and time again people fall for cheap Intel tactics.

    Intel is doing the same thing they do @ desktop with i7.

    The processors are very cheap, the money is made with the expensive platform. Without information about platform pricing (and yes, DDR3 for servers is also very expensive), you can not say anything about competition. Only comparing CPU prices from AMD right now vs. Intel prices in 2 months is double bullshit. If Nehalem really is that good, prices will drop to a competitive level.

    Intel is not going to make a gift to anyone, they are also not going to devalue their existing server lineup completely. When Shanghai is losing to Nehalem, everything Intel has to offer right now is going to lose big time to Nehalem.

    Do you really believe a 2 socket Nehalem System is going to win every benchmark vs. a 4 socket Shanghai System? Seriously?

    Do you really believe Intel did not choose the sweetspot with releasing those SAP benchmarks? Beating a 4P system with a 2P system sounds great and is "simple" to understand for the average joe.

    Yeah right, Nehalem is going to be 100% faster than any Shanghai in every benchmark and also will use 50% less power doing so? What are you smoking?
  • BaronMatrix - Wednesday, February 11, 2009

    Maybe now they will stop strong arming people and compete on merits. Hopefully though when Atom is done with growth there'll be someone left to actually sell the computers and make a profit.
  • JKflipflop98 - Wednesday, February 11, 2009

    Ah, it appears the F-wads from AMDzone don't agree with the data provided. Of course, AnandTech is just a "paid Intel pumper" site. And Intel forces people to buy their products, because AMD parts are actually a lot faster and offer far more MEGATASKING PLATFORMANCE!!!11!
  • Griswold - Thursday, February 12, 2009

    Hello Numbnuts!