Finally. Seven months after the introduction of Intel's "Tigerton" Xeon 73xx series, AMD has an answer to Intel's quad-socket, quad-core platform. The importance of the quad-socket market for AMD cannot be overstated. The quad-socket segment is only 10% of the x86 server CPU market (shipments), but it accounts for roughly 20% of the revenue! And it has been AMD's stronghold for years now: at the moment, AMD still holds about 42% of this market. The 4P product line is probably keeping AMD afloat....

AMD launches the B3 "no-TLB bug" Opterons today, with clock speeds of 2.3GHz (8356), 2.2GHz (8354) and 2GHz (8350). Hotheaded (125W) 2.5GHz and 2.4GHz Special Editions will follow. We are preparing a full AMD vs. Intel 16-core benchmark fest, but the boards and servers that will house our Opteron 8356 CPUs still haven't arrived.


Let us take a look at Intel's and AMD's 1K pricing:

Server CPU Pricing

AMD CPU                                            | Price | Intel CPU                           | Price
Opteron 8360 SE 2.5GHz (125W, 4x0.5MB L2 + 2MB L3) | $2149 | Xeon X7350 2.93GHz (130W, 2x4MB L2) | $2301
Opteron 8358 SE 2.4GHz (125W, 4x0.5MB L2 + 2MB L3) | $1865 | Xeon X7340 2.4GHz (80W, 2x4MB L2)   | $1980
Opteron 8356 2.3GHz (95W, 4x0.5MB L2 + 2MB L3)     | $1514 | Xeon X7330 2.4GHz (80W, 2x3MB L2)   | $1391
Opteron 8354 2.2GHz (95W, 4x0.5MB L2 + 2MB L3)     | $1165 | Xeon X7310 2.13GHz (80W, 2x2MB L2)  | $1177
Opteron 8350 2.0GHz (95W, 4x0.5MB L2 + 2MB L3)     | $873  | Xeon X7310 1.6GHz (80W, 2x2MB L2)   | $856

The Opteron 8354 and 8350 look like the most competitive offerings: they have a small clock speed advantage over the comparable Intel CPUs and roughly the same amount of cache. As we discussed in depth in our 2P Opteron 23xx versus Intel Xeon 54xx review, Intel's quad-core is the best processor in all CPU-intensive tasks (rendering, chess, SPECint, financial simulations), while the quad-core Opteron is best in some memory- and FP-intensive workloads (many HPC applications). We don't expect anything to change with the B3 Barcelona cores, but there are still two question marks: who will win the server (OLTP, data warehousing) and virtualization benchmarks? We will find out in a few weeks.
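
As a crude way to eyeball the table above, the short sketch below computes the aggregate clock speed bought per $1000 of list price for each pairing. This is a rough aid of our own, not a benchmark: the clocks and prices come straight from the table, and the metric deliberately ignores IPC, cache size, and platform costs.

    # Naive price/clock comparison based on the 1K list prices in the table above.
    # "GHz per $1000" = cores * clock / (price / 1000); it ignores IPC, cache and
    # platform (chipset, memory) costs, so treat it as an eyeballing aid only.

    pairs = [
        # (AMD model, GHz, price), (Intel model, GHz, price)
        (("Opteron 8360 SE", 2.5, 2149), ("Xeon X7350", 2.93, 2301)),
        (("Opteron 8358 SE", 2.4, 1865), ("Xeon X7340", 2.40, 1980)),
        (("Opteron 8356",    2.3, 1514), ("Xeon X7330", 2.40, 1391)),
        (("Opteron 8354",    2.2, 1165), ("Xeon X7310", 2.13, 1177)),
        (("Opteron 8350",    2.0,  873), ("Xeon X7310", 1.60,  856)),
    ]

    def ghz_per_kusd(ghz, price, cores=4):
        """Aggregate clock speed (cores x GHz) bought per $1000 of list price."""
        return cores * ghz / (price / 1000.0)

    for (amd, aghz, aprice), (intel, ighz, iprice) in pairs:
        print(f"{amd:<17} {ghz_per_kusd(aghz, aprice):5.2f} GHz/$1000   "
              f"{intel:<11} {ghz_per_kusd(ighz, iprice):5.2f} GHz/$1000")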

AMD also launched its B3 23xx series, but frankly we are disappointed that AMD's fastest quad-core still runs at only 2.3GHz; AMD promised 2.5GHz months ago! 2.5GHz really is necessary to be competitive with Intel, which broke through the 3GHz quad-core barrier back in 2007. Even worse, Intel already has 50W parts at 2.5GHz. AMD is in defensive mode in the 2P market, and its only remaining weapon is aggressive pricing.

Things are looking better in the 4P market, however. AMD's platform scales better, at least until Intel's Nehalem arrives - and Xeon "Nehalem" MP CPUs won't be available until 2009. In addition, AMD's newest quad-core only has to compete with Intel's 65nm CPUs, which are limited to 2.4GHz at 80W TDP for now. AMD has a narrow window to make a good impression in the quad-socket market, ramp up clock speeds, and prepare for Intel's Dunnington in Q3. With up to 16MB of L3 cache and six cores per die, Dunnington looks massive - but perhaps also a bit expensive.

We're not the only ones who have noticed that AMD most likely has a competitive quad-socket CPU (we won't be fully convinced until we have seen all our own tests). HP is the most enthusiastic tier-one OEM, with two quad Opteron models: the ProLiant DL585 G5 rack server and the ProLiant BL685c G5 blade.

The fast-growing blade market seems to like the third-generation Opteron. As the DDR2-capable 5000V chipset is not available for the Xeon "Tigerton", the latter is a bit harder to cool in a cramped blade environment. HP does, however, also offer a quad Tigerton blade, the ProLiant BL680c G5.

Eight blades in a 10U blade chassis (two HDs per blade) is not bad, but HPC specialist Supermicro does even better with 10 x 16 cores in a 7U enclosure (one HD per blade). Rackable, Appro, and Synnex also launch their newest Opteron models today.

According to AMD and HP, the HP ProLiant DL585 G5 set a new price/performance record (price/tpmC) among 4-socket x86-based systems in TPC-C, and the HP ProLiant BL685c G5 set a new record for SPECfp_rate2006. HP's Opteron 8356 system scored a baseline of 147, while the best Intel-based result is around 108. However, it should be noted that SPECfp_rate2006 exaggerates the importance of memory bandwidth: SPECfp2006 already runs with a rather large memory footprint, and running 16 instances in parallel makes the benchmark almost entirely limited by the memory subsystem.

Although the 83xx series will perform excellently in HPC, we don't believe that the difference will be this large, even in memory intensive applications. We definitely need some good independent benchmarking. Stay tuned and add http://it.anandtech.com to your bookmarks!

Comments

  • tshen83 - Wednesday, April 9, 2008 - link

    AMD's quad-core is indeed more impressive than Intel's, simply because of the integrated memory controller and its scalability.

    However, one thing must be said: the 4S market will be dying off, because 2S systems are the most economical from a performance/watt/dollar perspective. In the 2S market, Intel's Xeons are priced at parity with its Core 2 desktop parts, which makes them very attractive. Let's face it: Google is buying 2S systems from Intel, so the decision must be right.

    Scalability is now solved in software, through clustering techniques such as Hadoop and Google's MapReduce programming paradigm.
  • Starglider - Thursday, April 10, 2008 - link

    > However, one thing must be said that 4S market will be
    > dying off because 2S systems are the most economical from
    > a price/watt/dollar perspective.

    Maybe if you have copious rackspace. For many users rackspace is at a premium; decent co-lo is expensive. We're about to put in a couple of new 16-core servers, possibly Opterons (which is why I'm looking forward to this review). The reason is that it's the maximum CPU power you can cram into 1U (without going to blades, which aren't appropriate for us ATM).
  • tshen83 - Thursday, April 10, 2008 - link

    "Maybe if you have copious rackspace. For may users rackspace is at a premium"

    So you pay a 4x premium on the CPUs to save 1U of space? That's about the best argument I have heard.

    If you are so cash-strapped that you have to put 16 cores in 1U, you obviously have no idea about data center cooling requirements; not many data centers currently give you that density. A 4S quad-core Opteron server would draw about 800-1000W (125W x 4 CPUs + 10W per memory stick x 16 sticks + 12W per hard drive x 8 drives), which is 6-8A of current per 1U. There is no data center I am aware of that will provision 300A+ per rack (40U x 8A per U); most datacenters will give you 20A for the whole rack. So you can probably put three of those systems in an entire rack, which makes the 1U form factor meaningless.
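
    Spelling out that back-of-envelope math (the per-component wattages are rough assumptions, not measured numbers, and the overhead factor for the board, fans and PSU losses is a guess):

        # Rough estimate of power draw and current for a 4S quad-core Opteron box.
        # Every figure below is an assumption, not a measurement.
        CPU_W  = 125     # worst-case TDP per socket (the 125W SE parts)
        DIMM_W = 10      # per memory stick (registered DDR2 may well draw less)
        HDD_W  = 12      # per hard drive

        sockets, dimms, drives = 4, 16, 8
        overhead = 1.25  # assumed factor for motherboard, fans and PSU losses

        server_w = (sockets * CPU_W + dimms * DIMM_W + drives * HDD_W) * overhead
        print(f"~{server_w:.0f} W per server")              # ~945 W

        for volts in (120, 208, 230):
            print(f"  ~{server_w / volts:.1f} A at {volts} V")
        # At 110-120V this lands in the 6-8A-per-1U range quoted above, which is
        # why a 20A rack budget holds only a handful of such servers.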
  • joekraska - Sunday, April 13, 2008 - link

    > Most datacenters will give you 20A for the whole rack.

    Mmmm. Older data centers might feature dual 208V 20A or dual 115V 30A feeds, but the trend is now away from this.

    Defaults are now dual 208V 30A, and that's only for "low density".

    For high density, you'll see AT LEAST four L6-30s above the rack now, and you'll start seeing two to four L21-30s (30A three-phase) or even 35A three-phase to the rack quite frequently with newer high-density deployments.

    Frankly, powering high density deployments is getting so challenging, that I'm expecting > 35A 3 phase within 24 months. How we avoid killing data center facilities workers, I haven't figgered out yet. :-)

    Anyway, whichever poster said the cost economics aren't there for 4S servers was correct - up until the present. For example, if you are buying a Dell R900 instead of two Dell 2950 IIIs to run virtual servers, you are making a grave financial error. It will, however, be interesting to see if AMD's newer system can change that equation.

    Joe.
  • DigitalFreak - Thursday, April 10, 2008 - link

    And the BS keeps on a-comin.
  • Creig - Thursday, April 10, 2008 - link

    I think you need to shut up now...
  • Starglider - Thursday, April 10, 2008 - link

    > So you pay a 4x premium on the CPUs to save 1U of space?

    The 8xxx-series (multi-socket) Opterons are between two and three times as expensive as the 2-socket parts, depending on the model.

    > A 4S quad Opteron server would draw about 800-1000W (125W*4
    > + 10W per stick of memory *16 sticks + 12W per hard drive
    > *8 drives)

    That is ridiculous. 125W is a theoretical maximum; in practice even loaded CPUs are unlikely to exceed 80W. Registered DDR2 draws about 4 watts per stick, not 10 (were you thinking of FB-DIMMs?). I doubt it's physically possible to get eight 3.5" drives into a 1U case along with a 4S motherboard; the most I've seen is four, though 12W is a reasonable loaded power draw. Including motherboard and fan power draw and power supply inefficiency, that's about 600W for a fully loaded server, which is two and a half amps (yes, I live in a country with a sane mains supply voltage). Of course, full load is only experienced for a few hours a day; most of the time the power draw would be down below 300W.

    > Most datacenters will give you 20A for the whole rack.

    I don't know what worthless provider you're using that can only manage 2.5kW per rack, but ours allows 400 watts per 1U server. We may have to ask for an upgrade to that if low-voltage chips aren't an option.
  • Justin Case - Wednesday, April 9, 2008 - link

    Assuming you need the processing power of 16 CPU cores, how is a single 16-core system less efficient per watt / dollar than two separate 8-core systems? A single "big" system lets you dynamically allocate resources (either within a single OS or through virtualization) and run much more efficiently. Not to mention it's cheaper (the only element that's more expensive is the main board, everything else costs the same, and doesn't need to be duplicated).

    As to "scalability as been solved", maybe you should tell that to all the "morons" in the HPC field that are still using (and planning, and designing, and putting together) supercomputers...?
  • JohanAnandtech - Wednesday, April 9, 2008 - link

    Google is a special case, as most of the applications they run scale almost infinitely. However, a lot of servers are also sold to consolidate smaller ones onto, and 4-socket systems can then be much more interesting than dual-socket ones: you get twice the memory, twice the CPU power, and a whole range of other "big box" advantages such as easier serviceability (adding another network or disk controller), etc.

    4-socket systems probably have a good future ahead of them thanks to virtualization and the fact that they have also gotten smaller: you can get 4-socket systems in 2U (with lots of expandability), in 1U, and even as blades.
  • tshen83 - Wednesday, April 9, 2008 - link

    Johan:

    If your argument is that Google's and Facebook's decisions to buy 2S systems are a special case, then you are seriously mistaken. 4S systems serve a very small niche market.

    That niche consists of customers who depend on non-clustered software packages that can scale vertically to 16 cores. There aren't many types of software that cannot be clustered yet can scale linearly across 16 cores; databases and virtualization are the only two that really scale vertically.

    For database apps, RAM rules, and memcached has basically taken care of distributed caching for databases, so the need for an extremely large amount of memory locally on the DB server is now mitigated. Plus, given the pricing of the 4S CPUs, it is better to buy four 2S nodes and use a DB cluster (the MySQL NDB engine) or a simple master-slave configuration to handle the load than to rely on a single box. You need redundancy anyway.

    For virtualization, I have not seen any virtualization package that can reliably virtualize multi-CPU guests, simply due to CPU cycle contention issues. What I mean is that, ideally, you want to be able to run 16 virtual machines, each with 16 virtual CPUs, on a 16-core system; in that setup every virtual machine can access up to 16 cores, so any one of them can potentially take over the entire server should its load increase. Right now, all virtualization software is only good at virtualizing a 1-CPU machine as a single thread on the host OS scheduler. Even VMware's 2-CPU support is buggy, and you will notice performance degradation when the total number of virtual CPUs is greater than the number of physical CPUs you have. That limits the potential power of the virtualized system.

    On the hardware side, you failed to mention the nonlinear scaling of the Opterons at the 4S level (Intel's too, for that matter). The only chipsets I am aware of that support 4S and 8S Opterons are from NVIDIA and, as far as I know, are HyperTransport 1.0 based. You need HyperTransport 3.0's bandwidth to scale Opterons to 4S linearly.
