The Intel Xeon D Review: Performance Per Watt Server SoC Champion?
by Johan De Gelas on June 23, 2015 8:35 AM EST
The days when Intel neglected the low end of the server market are over. The most affordable Xeon used to be the Xeon E3: a desktop CPU with a few server features enabled, and one that came with many limitations unless you could afford the E5 Xeons. The gap between the Xeon E3 and E5, in both performance and price, is huge. For example, a Xeon E5 can address up to 768 GB of memory, while the Xeon E3 tops out at 32 GB. A Xeon E5 server can contain up to 36 cores, whereas the Xeon E3 is limited to a paltry four. And the list goes on: most RAS and virtualization features are missing from the E3, and its L3 cache is much smaller. On those terms, the Xeon E3 simply does not feel very "pro".
Luckily, the customers in the ever-expanding hyperscale market (Facebook, Amazon, Google, Rackspace and so on) need Xeons at a very large scale and have been demanding a better chip than the Xeon E3. A few months ago, the wait ended: the Xeon D fills the gap between the Xeon E3 and the Xeon E5. Combining the most advanced 14 nm Broadwell cores, a dual 10 gigabit interface, a PCIe 3.0 root with 24 lanes, and USB and SATA controllers in one integrated SoC, the Xeon D has excellent specs on paper for everyone who does not need the core count of the Xeon E5 servers but simply needs 'more' than the Xeon E3.
Many news editors could not resist calling the Xeon D a response to the ARM server threat. After all, ARM has repeated more than once that its ambition is to be competitive in the scale-out server market. The term "micro server" is hard to find on the PowerPoint slides these days; the "scale-out" market is a lot cooler, larger and more profitable. But the comments of the Facebook engineers quickly bring us back to reality:
"Introducing "Yosemite": the first open source modular chassis for high-powered microservers"
"We started experimenting with SoCs about two years ago. At that time, the SoC products on the market were mostly lightweight, focusing on small cores and low power. Most of them were less than 30W. Our first approach was to pack up to 36 SoCs into a 2U enclosure, which could become up to 540 SoCs per rack. But that solution didn't work well because the single-thread performance was too low, resulting in higher latency for our web platform. Based on that experiment, we set our sights on higher-power processors while maintaining the modular SoC approach."
It is pretty simple: the whole "low power simple core" philosophy did not work very well in the real scale-out (or "high-powered micro server") market. And the reality is that the current SoCs with an ARM ISA do not deliver the necessary per-core performance: they are still micro server SoCs, at best competing with the Atom C2750. So for now, there is no ARM SoC competition in the scale-out market, and there will not be until something better becomes available to these big players.
Two questions remain: how much better is the 2 GHz Xeon D compared to the >3 GHz Xeon E3? And is it an interesting alternative for those who do not need the high-end Xeon E5?
JohanAnandtech - Wednesday, June 24, 2015
Hi Patrick, the base clock of our chip is 2 GHz, not 1.9 GHz like the pre-production version that we got from Intel. I still have to check the turbo clocks, but I do believe we have measured 2.6 GHz. I'll double-check.
pjkenned - Wednesday, June 24, 2015
Awesome! Our ES ones were 1.9GHz.
Chrisrodinis1 - Tuesday, June 23, 2015
For comparison, this server uses Xeons. It is the HP Proliant BL460c G9 blade server: https://www.youtube.com/watch?v=0s_w8JVmvf0
MrDiSante - Wednesday, June 24, 2015
Why use only -O2 when compiling the benchmarks? I would imagine that in order to squeeze out every last bit of performance, all production software is compiled with all optimizations turned up to 11. I noticed that their github uses -O2 as an example - is it that TinyMemBenchmark just doesn't play nice with -O3?
JohanAnandtech - Wednesday, June 24, 2015
The standard makefile had no optimization whatsoever. If you want to measure latency, you do not want maximum performance but rather accuracy, so I played it safe and used -O2. I am not convinced that all production software is optimized with all optimization turned on.
diediealldie - Wednesday, June 24, 2015
Intel seems disARMing them... X-Gene 2 doesn't look so promising, as they'll have to fight mighty Skylake-based Xeons, not Broadwell ones.
Thanks for great article again.
jfallen - Wednesday, June 24, 2015
Thanks Johan for the great article. I'm a tech enthusiast, and will never buy or use one of these. But it makes great reading and I appreciate the time you take to research and write the article.
JohanAnandtech - Wednesday, June 24, 2015
Happy to read this! :-)
TomWomack - Wednesday, June 24, 2015
This looks very much consistent with my experience; the disconcertingly high idle power (I looked at the board with a thermal camera; the hot chips were the gigabit PHY, the inductors for the power supply, and the AST2400 management chip), the surprisingly good memory performance, the fairly hot SoC (running sixteen threads of number-crunching I get a power draw of 83W at the plug) and the generally pretty good computation.
I'm not entirely sure it was a better buy for my use case than a significantly cheaper 6-core Haswell E - Haswell E is not that hot, electricity not that expensive, and from my supplier the X10SDV-F board and memory were £929 whilst Scan get me an i7-5820K board, CPU and memory for £702. And four-channel DDR4 probably is usefully faster than two-channel for what I do.
I quite strongly don't believe in server mystique - the outbuilding is big enough that I run out of power before I run out of space for micro-ATX cases, and I am lucky enough to be doing calculations which are self-checking to the point that ECC is a waste of money.
JohanAnandtech - Wednesday, June 24, 2015
Hi Tom, I believe we saw up to 90 Watt at the wall when running OpenFOAM (10 Gbit enabled). It is however less relevant for such a chip which is not meant to be a HPC chip as we have shown in the article. HPC really screams for an E5.