AMD back in the quad socket race
by Johan De Gelas on April 9, 2008 12:00 AM EST
Finally. Seven months after the introduction of Intel's "Tigerton" Xeon 73xx series, AMD has an answer to the quad socket, quad-core Intel platform. The importance of the quad socket market for AMD cannot be overstated. The quad socket market is only 10% of the x86 server CPU market by shipments, but it accounts for roughly 20% of the revenue! And it has been AMD's stronghold for years now: at the moment, AMD still holds about 42% of this market. The 4P product line is probably keeping AMD afloat...
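To put those two percentages in perspective, here is a quick back-of-the-envelope calculation. It uses only the shares quoted above (10% of shipments, 20% of revenue); the function name is ours, and the result is a rough ratio, not market data.

```python
def relative_revenue_per_cpu(shipment_share: float, revenue_share: float) -> float:
    """Average revenue per CPU in a segment, relative to the rest of the market."""
    # Revenue per shipped unit inside the segment (in arbitrary units).
    segment = revenue_share / shipment_share
    # Revenue per shipped unit for everything outside the segment.
    rest = (1.0 - revenue_share) / (1.0 - shipment_share)
    return segment / rest


# Shares quoted in the article: 10% of shipments, roughly 20% of revenue.
ratio = relative_revenue_per_cpu(0.10, 0.20)
print(f"{ratio:.2f}x")
```

With these shares, a CPU sold into a quad socket server brings in a bit more than twice the revenue of an average CPU in the rest of the x86 server market.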
AMD launches the B3 "no-TLB bug" Opterons today, with clock speeds of 2.3GHz (8356), 2.2GHz (8354) and 2GHz (8350). Hotheaded (125W) 2.5GHz and 2.4GHz Special Editions will follow. We are preparing a full AMD vs. Intel 16-core benchmark fest, but the boards and servers that will house our Opteron 8356 CPUs still haven't arrived.
Let us take a look at Intel's and AMD's 1K pricing:
|Server CPU Pricing|
|AMD Opteron 83xx (TDP, cache)|Intel Xeon 73xx (TDP, cache)|
|125W, 4x0.5MB L2 + 2MB L3|130W, 2x4MB L2|
|125W, 4x0.5MB L2 + 2MB L3|80W, 2x4MB L2|
|95W, 4x0.5MB L2 + 2MB L3|80W, 2x3MB L2|
|95W, 4x0.5MB L2 + 2MB L3|80W, 2x2MB L2|
|95W, 4x0.5MB L2 + 2MB L3|80W, 2x2MB L2|
The Opteron 8354 and 8350 look like the most competitive offerings; they have a small clock speed advantage over the comparable Intel CPUs and about the same amount of cache. As we have discussed in depth in our 2P Opteron 23xx versus Intel Xeon 54xx review, Intel's quad-core is the best processor in all CPU intensive tasks (rendering, chess, SPECint, financial simulations...). Meanwhile, the quad-core Opteron is best in some memory and FP intensive workloads (many HPC applications). We don't expect anything to change with the B3 Barcelona cores, but two questions remain: who will win the server (OLTP, Warehouse) benchmarks, and who will win the virtualization benchmarks? We will find out in a few weeks.
AMD also launched their B3 23xx series, but frankly, we are disappointed that AMD's fastest quad-core is still only at 2.3GHz; AMD promised 2.5GHz months ago! 2.5GHz really is necessary to be competitive with Intel, who passed the 3GHz quad-core wall back in 2007. Even worse, Intel already has 50W parts at 2.5GHz. AMD is in defensive mode in the 2P market, and its only remaining weapon is aggressive pricing.
Things are looking better in the 4P market however. AMD's platform scales better, at least until Intel's Nehalem arrives - and Xeon "Nehalem" MP CPUs won't be available until 2009. In addition, AMD's newest quad-core has to compete with Intel's 65nm CPUs that are limited to 2.4GHz at 80W TDP for now. AMD has a narrow window to make a good impression in the quad socket market, ramp up clock speeds, and prepare for Intel's Dunnington in Q3. With up to 16MB L3 cache and six cores per die, Dunnington looks massive - but perhaps also a bit expensive.
We're not the only ones who have noticed that AMD most likely has a competitive quad socket CPU (we won't be convinced until we have seen all our test results :-). HP is the most enthusiastic tier-one OEM with two quad Opteron models:
- A "classic" 4U HP ProLiant DL585 G5
- A rather amazing quad socket HP ProLiant BL685c G5 blade
The fast growing blade market seems to like the third generation Opteron. As the 5000V chipset with DDR2 support is not available for the Xeon Tigerton, the latter is a bit harder to cool in a cramped blade environment. However, HP does have a quad Tigerton blade, the HP ProLiant BL680c G5.
Eight blades in a 10U blade chassis (two HDs per blade) is not bad, but HPC specialist Supermicro does even better with 10 x 16 cores in a 7U enclosure (one HD per blade). Rackable, Appro, and Synnex also launch their newest Opteron models today.
According to AMD and HP, the HP ProLiant DL585 G5 set a new performance record for 4-socket, x86-based systems in TPC-C Price/tpmC, and the HP ProLiant BL685c G5 server set a new record for SPECfp_rate2006. HP's 8356 system scored 147 baseline, while the best Intel-based result is around 108. However, it should be noted that SPECfp_rate2006 exaggerates the importance of memory bandwidth. SPECfp2006 already runs with a rather large memory footprint, and if you run 16 instances in parallel...
Although the 83xx series will perform excellently in HPC, we don't believe that the difference will be this large, even in memory intensive applications. We definitely need some good independent benchmarking. Stay tuned and add http://it.anandtech.com to your bookmarks!
JohanAnandtech - Thursday, April 10, 2008
"You need HyperTransport 3's bandwidth to be able to scale Opterons to 4S linearly."
You base this on what? HT 1.0 at 1 GHz has 8 GB/s full duplex (16 GB/s) available, used only for syncs and for accessing remote memory between the CPUs. Many applications are now optimized for NUMA, keeping data close to the processing node and eliminating a lot of inter-CPU communication.
tshen83 - Thursday, April 10, 2008
Look, I really don't have much time to argue with you. For a system reviewer, I thought you should be more versed in the technical specifications.
The 8000 series Opterons have 3 coherent HT links, two of which are used for inter-processor communication and one for the connection to the chipset, which means that in a 4S system, any processor can talk directly to 2 other processors via an HT 1.0 link at 8GB/sec. That means there is a 1/4 chance that a processor has to hop twice to get to the last CPU. Granted, NUMA makes that less of a problem. However, in any memory intensive application, a database for example, where the entire dataset is cached in the massive memory, the one extra hop is a pain in the butt. Plus, an 8GB/sec link isn't exactly a good match for the 12.8GB/sec (DDR2-800) or 10.6GB/sec (DDR2-667) that each CPU's memory controller can do.
It is a very easy thing to do. Why don't you benchmark the system with 4S and compare it to the result with 2 of the Sockets turned off and see if the results are linear. I will tell you it isn't. You are getting a 50% scaling for the last 2 CPUs.
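The hop-count claim above is easy to check with a small sketch. It assumes the square topology described in the comment (each CPU linked directly to two others) and memory accesses spread evenly over all four nodes; the node numbering and function name are illustrative only.

```python
from collections import deque

# Assumed 4S topology: a square in which each CPU has direct
# HyperTransport links to two neighbours (node numbers are arbitrary).
LINKS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def hops_from(start, links):
    """Breadth-first search: number of HT hops from `start` to every CPU."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


dist = hops_from(0, LINKS)
# Fraction of nodes (including the local one) that are two hops away.
two_hop_share = sum(1 for d in dist.values() if d == 2) / len(dist)
# Average hop count if accesses are spread evenly over all four nodes.
average_hops = sum(dist.values()) / len(dist)
print(dist, two_hop_share, average_hops)
```

Under these assumptions, one node in four is two hops away. NUMA-aware software shifts accesses toward the local node, so the even spread used here is closer to a worst case than to a typical workload.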
MGSsancho - Saturday, April 12, 2008
The Horus chip (http://en.wikipedia.org/wiki/AMD_Horus) links HT groups to make a 32-way system. It's used in the backplane of chassis. Oh, and Cray uses its SeaStar chip to build supercomputers: http://www.cray.com/products/xt4/index.html. Last I checked, they do good business.
HP is not a small corporation. They would not invest the resources to make this product if they did not think it could sell well.
JohanAnandtech - Thursday, April 10, 2008
"You need HyperTransport 3's bandwidth to be able to scale Opterons to 4S linearly."
"Look, I really don't have much time to argue with you. For a system reviewer, I thought you should be more versed in the technical specifications. "
Your original statement is oversimplified, and the moment I challenge it, I am not versed in tech specifications? That does not make any sense.
Anyway, HT 3.0 will help, but it won't make Opterons scale linearly. How well a system scales depends on the software, as you are well aware. And the reason why I challenged your statement was that I would like to see some preview benchmarks that show in which cases scaling is so much better.
Your original statement says that the limitations of HT 1.0 are automatically the ones that keep the Opteron from scaling in 4S. That has been proven with hard numbers in the 8S space, but I recall no such numbers for the 4S space.
" However, in any memory intensive application, Database for example, where the entire dataset is cached in the massive memory, the one extra hop is a pain in the butt. Plus 8GB/sec link isn't exactly good match for the 12.8GB/sec(DDR2-800) or 10.6GB(DDR2-667) memory controller each CPU can do. "
That 12.8 GB/s is half duplex. And since the memory bus is a lot less efficient, I wouldn't be surprised if it is a good match.
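For reference, the peak memory figures quoted above follow from simple arithmetic. The sketch assumes a dual-channel DDR2 controller with 64-bit (8-byte) channels per CPU; these are theoretical peaks, not sustained bandwidth, and the function name is ours.

```python
def ddr2_peak_gb_per_s(mega_transfers_per_s: float,
                       channels: int = 2,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB)."""
    return mega_transfers_per_s * channels * bytes_per_transfer / 1000.0


# Nominal transfer rates in MT/s; channel count and width are assumptions.
for name, rate in (("DDR2-800", 800), ("DDR2-667", 667)):
    print(name, round(ddr2_peak_gb_per_s(rate), 2), "GB/s")
```

DDR2-800 comes out at 12.8 GB/s and DDR2-667 at just under 10.7 GB/s, which is usually quoted as 10.6 GB/s.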
"It is a very easy thing to do. Why don't you benchmark the system with 4S and compare it to the result with 2 of the Sockets turned off and see if the results are linear. I will tell you it isn't. You are getting a 50% scaling for the last 2 CPUs. "
With all due respect, what is that going to prove? That it is harder to scale your software with more CPUs? There are many other explanations than HT 1.0 limitations.
It is certainly not going to prove that you have a bandwidth limitation with HT 1.0 in most software.
Yes, one hop less in HT 3.0 is going to help. But it is not going to make software scale linearly. And depending on the software, switching from HT 1.0 to 3.0, results will range from "hardly measurable" to "very significant".
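The 2S versus 4S comparison under discussion boils down to a single ratio. Below is a minimal sketch; the throughput numbers are made up purely for illustration and are not measurements of any system.

```python
def incremental_scaling(score_2s: float, score_4s: float) -> float:
    """Share of the ideal gain actually delivered by the third and fourth CPU.

    Linear scaling would double the 2S score, so the ideal gain is score_2s.
    """
    return (score_4s - score_2s) / score_2s


# Hypothetical scores, e.g. transactions per second (not measured results).
score_2s, score_4s = 1000.0, 1500.0
print(f"{incremental_scaling(score_2s, score_4s):.0%}")
```

A result of 50% corresponds to the "50% scaling for the last 2 CPUs" mentioned in the thread. Note that the ratio only quantifies the shortfall; it says nothing about whether the interconnect, the memory subsystem, or the software itself is responsible.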
tshen83 - Thursday, April 10, 2008
I don't work for Intel or AMD. But I can tell you for sure that distributed system designs will overpower vertical scaling. AMD simply sent you 4 CPUs (about 6000 dollars' worth) so you can write an article on AnandTech to pump the fact that the TLB issue is solved.
Although software cannot scale linearly, you can still test linear scalability by running multiple instances of the same software and taking the average of the composite scores the software gives you. You will be surprised how bad the scaling is past 2S. In fact, the only good thing about 4S and 8S is the independent memory controller that each Opteron comes with, so you can load a ton of memory in the server for an in-memory database and get quasi log(n) scaling of the memory subsystem. As I have said, distributed memcache took that advantage away too.
Let me conclude this argument by saying that there is a reason why you write articles about systems. You should really just follow what Google's doing. They are buying 2S systems from Intel. End of argument.
DigitalFreak - Thursday, April 10, 2008
I thought you didn't have any more time to argue?
JohanAnandtech - Thursday, April 10, 2008
"Let me conclude this argument by saying that there is a reason why you write articles about systems. You should really just follow what Google's doing. They are buying 2S systems from Intel. End of argument."
Google is using SATA disks for their database apps. Does this mean that SAS disks are not worth considering for a database app?
"there is a reason why you write articles about systems."
I would love to hear that reason... You have a tendency to generalize, so I am expecting something like that again.
tshen83 - Wednesday, April 9, 2008
What I am trying to say is that if you need to buy 4S and 8S systems to scale, your software infrastructure is already behind the curve. It is going to be prohibitively expensive from this point on to scale vertically. I don't think any DB system admin or virtualization system admin should run to their boss and say, let's dump 30K on a 4S server because it is the EASIEST way to scale, instead of spending some money on a software engineer to tweak the system to scale horizontally from that point on, and getting ten $3K 2S nodes.
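The hardware side of that comparison can be put into numbers. This sketch uses only the prices mentioned in the comment ($30K for one 4S server, $3K for each of ten 2S nodes); it deliberately ignores engineering time, networking, power, and software licensing, all of which would change the picture.

```python
def cost_per_socket(total_cost: float, nodes: int, sockets_per_node: int) -> float:
    """Hardware cost divided by the total number of CPU sockets."""
    return total_cost / (nodes * sockets_per_node)


# Prices taken from the comment above; everything else is left out on purpose.
scale_up = cost_per_socket(30_000, nodes=1, sockets_per_node=4)
scale_out = cost_per_socket(10 * 3_000, nodes=10, sockets_per_node=2)
print(scale_up, scale_out)
```

For the same $30K, the scale-out option buys 20 sockets instead of 4, provided the workload can actually be distributed across nodes.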
TheJian - Wednesday, April 9, 2008
So that means Intel has a FIREHEADED 7350 at 130W then, right? Heck, you've got it in the same list and you make a comment like that about AMD? Saying Intel has already hit 3GHz in Xeons but leaving out what TDP they run at is a bit misleading. I feel like I'm being led to believe Intel's 3GHz chips are not hotheaded. I'm not saying I love either, just pointing out that those kinds of statements are sensational at best.
DigitalFreak - Thursday, April 10, 2008
Down, fanboy.