We were quite amazed, even slightly suspicious, when HP and Fujitsu-Siemens published their SAP numbers. These numbers showed that the newest Xeon X5570 (Nehalem EP) series offers an enormous performance boost over the Xeon X5470 (Harpertown). After all, an almost 100% improvement at a slightly lower clock speed (2.93 GHz vs 3.3 GHz) is nothing short of amazing. It turns out that the real clockspeed is 3.2 GHz (2.93 GHz + 266 MHz turbo), but that does not alter the fact that these are truly incredible performance numbers.

I can now confirm that there are no tricks behind these numbers: they paint the right picture of the Xeon Nehalem EP. Talking to SAP benchmarking specialists, it became clear that few tuning tricks exist that are not known to the big OEMs. The benchmark has been analyzed and tuned so well that even the use of a different database (for example MS SQL instead of DB2) only makes a 2 to 3% difference most of the time. So you can even compare SAP numbers obtained on different databases. To summarize, the SAP numbers can only really be boosted by better hardware (CPU and memory).
 
Now why am I talking so much about SAP benchmarking numbers? It is not as if this expensive ERP software is run by everyone.
 
Well, the SAP numbers are showing a dual 2.93 GHz (or 3.2 GHz) Xeon beating the only quad AMD 8384 (Shanghai at 2.7 GHz) score of 22000 we have so far. Granted, a blade server is most of the time a bit slower. But four AMD 8384 2.7 GHz will be in the same league as a dual Xeon X5570, which will be out very soon now.
 
Even worse for AMD is that the SAP benchmark is not some exotic, exceptional benchmarking case for the Xeon 55xx series. It should be no surprise that the HPC numbers will be very impressive too. So it looks like AMD is in a tough spot.
 
What happened? 
As the SAP threads share a lot of data (as is typical for this kind of database-driven application), hyperthreading cannot be the only explanation for why Nehalem is simply doubling performance and annihilating the competition. SAP benchmarking specialists expect hyperthreading to be good for about one third of the performance boost. We tend to believe these people, who have been running this benchmark for years now. The reason why it is not one of the "top cases" for hyperthreading on Nehalem is that this OLTP-based benchmark spends a lot of time on shared data. Our own Nehalem OLTP benchmarking (Oracle and MySQL) also points in that direction.
 
As we have pointed out before, the benchmark also
  • responds very well to low cache and memory latency
  • does not care too much about memory bandwidth
  • and is very sensitive to "syncing latency".
Since the AMD Shanghai CPU has the same fast way to sync between cores (via the L3-cache) as Nehalem, syncing latency cannot explain why AMD falls behind. Another explanation is of course that these benchmarks are run on a CPU which uses turbo, which explains about a 5% advantage as the Nehalem CPU actually runs at 3.2 GHz.
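
As a quick sanity check on the turbo figure, here is a small Python sketch of the clock arithmetic. It only uses the 2.93 GHz base clock and the 266 MHz turbo step mentioned in this post; the function name is made up for illustration. Note that it computes the raw clock uplift, which is an upper bound rather than a performance prediction: how much of it shows up in a benchmark depends on how often turbo engages and how well the workload scales with clock speed, and neither is something we can state here.

```python
# Clock figures taken from the post: 2.93 GHz base, 266 MHz of turbo headroom.
BASE_GHZ = 2.93
TURBO_STEP_GHZ = 0.266


def turbo_clock_uplift(base_ghz: float, turbo_step_ghz: float) -> float:
    """Return the fractional clock increase when turbo is fully engaged."""
    return turbo_step_ghz / base_ghz


turbo_ghz = BASE_GHZ + TURBO_STEP_GHZ
uplift = turbo_clock_uplift(BASE_GHZ, TURBO_STEP_GHZ)

print(f"Turbo clock: {turbo_ghz:.2f} GHz")
print(f"Raw clock uplift over base: {uplift:.1%}")
```

The raw uplift comes out somewhat higher than the roughly 5% performance advantage mentioned above, which is what you would expect if the benchmark does not scale perfectly with clock speed.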
 
Nehalem has faster access to the memory than AMD's latest quadcore (70 ns vs 110 ns), which is probably the second reason why Shanghai falls behind. But AMD will probably have to redesign its integer execution pipeline significantly before it will catch up with Nehalem (think memory disambiguation for example). Basically, AMD's better NUMA - integrated memory controller platform was hiding this disadvantage. Now that the new Intel platform does not put "the brakes" on the integer execution engine anymore, the superiority of Intel's integer engine is showing.
 
The lack of any form of multi-threading is hurting AMD badly. It is well known that most of these business applications achieve very low IPC (0.2-0.6) and that modern superscalar CPUs have ample execution resources for running two threads in these applications. The result is that Simultaneous Multi Threading offers a typical 20 to 40% performance advantage. And that is huge, considering that you need 25 to 50% more clockspeed to counter that. It is basically a mission impossible for a modern CPU without SMT to outperform a similar superscalar CPU with SMT in OLTP, Java, webserver, rendering and ERP workloads. AMD really dropped the ball there: SMT should have been part of the K10 architecture.
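
To make the clockspeed comparison concrete, here is a minimal Python sketch of the arithmetic. It assumes that only a fraction of any extra clock speed turns into extra throughput. The `clock_scaling` value of 0.8 is not a measured number: it is simply the value that turns a 20 to 40% SMT gain into the 25 to 50% clockspeed range quoted above. The function name is likewise made up for illustration.

```python
def clock_increase_needed(smt_gain: float, clock_scaling: float = 0.8) -> float:
    """Fractional clock increase a non-SMT CPU needs to match an SMT gain.

    smt_gain      -- throughput gain from SMT, e.g. 0.20 for 20%
    clock_scaling -- ASSUMED fraction of extra clock speed that becomes
                     extra throughput (1.0 would be perfect scaling)
    """
    if not 0.0 < clock_scaling <= 1.0:
        raise ValueError("clock_scaling must be in (0, 1]")
    return smt_gain / clock_scaling


for gain in (0.20, 0.40):
    needed = clock_increase_needed(gain)
    print(f"SMT gain of {gain:.0%} -> about {needed:.0%} more clockspeed needed")
```

With perfect scaling (`clock_scaling=1.0`) the required clock increase would simply equal the SMT gain; the lower the scaling, the steeper the clockspeed bill for a CPU without SMT.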
 
Difficult times ahead for AMD
Even if AMD is able to speed up beyond 3 GHz, chances are slim that AMD will be able to compete with the new Nehalem Xeons. Add Turbo mode, hyperthreading, a lower latency memory controller and a better integer core together and you get a performance gap the size of the "Grand Canyon".
 
So does AMD have any chance at all beyond a new architecture in 2011? Is it over and out for AMD in 2009 and 2010? Adding 2 cores at the end of 2009 is a good step in the right direction. But even if AMD executes flawlessly, the 32 nm Xeon Westmere will only give a window of a few months to the AMD hexacore "Istanbul". Istanbul should appear at the end of 2009; the Westmere Xeon is scheduled for very early 2010.
 
Westmere has few performance optimizations; it seems to be a pretty straightforward shrink. Slightly higher clockspeeds, about 20% lower power consumption, and yet another addition to the ridiculously long list of SSE-instructions in the form of seven new instructions (six instructions are for crypto/AES acceleration). Westmere is only an evolutionary step forward, but the "Grand Canyon" gap that Nehalem EP has made is probably large enough.

 

We will certainly see better (lower) switching times between virtual machine and hypervisor and some small tweaks in AMD's Istanbul CPU, but it remains unclear whether there are any significant performance boosters in the core. So it looks like Intel will own the dual socket space throughout 2009 and 2010, if we may believe the current roadmaps.
 
As the SAP numbers indicate, even the slowest Intel Xeons will show a large performance gap with the best AMD Opterons. Is AMD doomed completely? In a large part of the market, yes. AMD's Istanbul will make the gap a bit smaller, but probably not small enough.
 
There are some unknown factors that, together with one of the few remaining weaknesses (or rather less strong points) of Nehalem, might make it possible for AMD's Opteron to come close enough in a particular area of the market. In my next post, I will clarify the one and only opportunity that I see for AMD in the next two years. Until then, don't shoot the messenger :-).

35 Comments

  • BaronMatrix - Wednesday, February 11, 2009 - link

    Baaaahhhh. The shearer is right over there.
  • BSMonitor - Thursday, February 12, 2009 - link

    Who let this clown start posting again?
  • HelToupee - Wednesday, February 11, 2009 - link

    Even if you may be right, what does this have to do with high-end server CPUs, and hence with this article?
  • Zak - Thursday, February 12, 2009 - link

    This referred to the posters accusing Anandtech of not praising AMD and being an Intel fan site.
  • balancedthinking - Wednesday, February 11, 2009 - link

    I knew that Anandtech is an Intel Website but this article is just ridiculous.

    Taking the absolute sweet spot with SAP (no virtualisation or cloud computing), saying nothing about price (DDR3?), availability (April) or power consumption.

    Just pure Intel promotion and AMD bashing, quite a masterpiece.

    Trying to keep customers from buying Shanghai servers that dominate the complete Intel server lineup today, by promoting the "oh so great" Nehalem without talking about platform pricing and power consumption.

  • melgross - Wednesday, February 11, 2009 - link

    You seem to be the fanboy here.

    Every article about Nehalem vs anything AMD shows about the same thing, that it's only in a very few areas that AMD has a chance, and in even fewer where they are a bit ahead.

    Don't blame Anandtech if other major testing agencies find AMD to be wanting, it's their own fault.
  • DeepBlue1975 - Wednesday, February 11, 2009 - link

    Nonsense.

    Back when AMD introduced its first Athlon 64 server CPUs, Anand's site was among the first to tell the world that Intel would have a hard time in the server market.

    Platform pricing is something OEMs end up determining in the server arena; I think you already know that in large corporations you won't often find anyone buying bare bones to build up the systems.

    And you certainly should also know that in that kind of scenario, price is not the only determining factor, added to the fact that no company will be switching servers overnight just because something new and better and shinier has come.

  • JohanAnandtech - Wednesday, February 11, 2009 - link

    "Taking the absolute sweet spot with SAP (no virtualisation or cloud computing), saying nothing about price (DDR3?), availability (April) or power consumption."

    Like another reader already remarked, don't expect huge price differences. About virtualization, read between the lines: I have the necessary data but I can not publish it. Do you think I would write this kind of post if I had no data to back it up? Our longtime readers will know we won't take that risk, it.anandtech.com is not about producing sensational news.

    And lastly you should bring some real proof that I am bashing AMD. I have seen no proof that I am wrong in your post.
  • duploxxx - Thursday, February 12, 2009 - link

    Knowing that you can't use hyperthreading in virtualization, since it really kills your system on high-load configs, turbo mode is also out unless they found a way in their VM code to deal with the switching CPU GHz... I guess not, for absolute stability. That leaves the raw Nehalem performance. DDR3 would provide a benefit with their 3MC, as would the bigger L3 size, but then again Shanghai has larger L1-L2 and a faster L3, so that is a mixed bag. In the end I think it comes down to the real technology: AMD really has more experience when it comes to NPT, while for Intel it's the first time they will use EPT. And don't forget that the improved virtualisation switching time in Shanghai showed a really big improvement in virtualisation performance. In Barcelona's time you would require about 10-15% more GHz from an Intel 2S Harpertown to get the same performance, but now with Shanghai it's already about 20-25%. This really shows in recent VMmarks and becomes even clearer when you actually work with the systems. Istanbul is also not far away, and the die size won't increase that much by just adding the 2 logic cores.

    Then there are the 4S configs, where there is really no competition in VMware. Whoever buys a 4S Intel for a virtual platform is a marketing fool, and that will continue for another half a year. Tigerton had a short life because AMD screwed up their quad design, and Dunnington was dead from the beginning.

    Interesting future, we will see soon. Was there a silent hint from Intel in why they sold their VMware shares again and started a joint effort with Xen?
  • icrf - Wednesday, February 11, 2009 - link

    I'm curious, is the date that you can tell us more information also under embargo?