Measuring Stream Throughput

Before we start with the real-world tests, it is good to perform a few low-level benchmarks. First, we measured the memory bandwidth under Linux. The binary was compiled with the Open64 compiler 5.0 (opencc). It is a multithreaded, OpenMP-based, 64-bit binary. The following compiler switches were used:

-Ofast -mp -ipa

The results are expressed in GB/s. Note that we also tested with gcc 4.8.1 and the compiler options

-O3 -fopenmp -static

Results were consistently 20 to 30% lower with gcc, so we feel our choice of Open64 is appropriate: everybody can reproduce our results (Open64 is freely available), and because this binary reaches higher speeds, it is easier to spot bandwidth differences between DIMMs. We equipped the HP server with Sandy Bridge EP Xeons (E5-2690) and Ivy Bridge EP Xeons (E5-2680 v2). Note that the Stream benchmark is not limited by the CPUs at all. All tests were done with 32 threads (Sandy Bridge EP) or 40 threads (Ivy Bridge EP).
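To make explicit what the Triad figure measures, here is a minimal C sketch of a Stream-style Triad loop, a[i] = b[i] + q*c[i], parallelized with OpenMP. It is a simplified illustration, not McCalpin's official stream.c, and the function names (triad, triad_bytes, triad_best_gbs) are hypothetical. It assumes STREAM's byte accounting (two 8-byte reads and one 8-byte write per element, write-allocate traffic not counted) and reports the best of several runs in decimal GB/s. The thread count would be set through the OMP_NUM_THREADS environment variable, e.g. 32 or 40 as in these tests.

```c
/* Hypothetical Stream-style Triad sketch; not the official stream.c. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime() */
#endif
#include <stddef.h>
#include <time.h>

/* Bytes counted for one Triad pass over n doubles, using STREAM's
   convention: read b[i], read c[i], write a[i] = 3 * 8 bytes each.
   Write-allocate traffic for a[] is not counted. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* One Triad pass: a[i] = b[i] + q * c[i].
   Built with -fopenmp (gcc) or -mp (Open64), the loop is split across
   OMP_NUM_THREADS threads; without OpenMP the pragma is ignored. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double q, size_t n)
{
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Wall-clock time from a monotonic clock, so time spent in parallel
   threads is not summed the way clock() would sum CPU time. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs Triad `reps` times and returns the best rate in decimal GB/s
   (1 GB = 1e9 bytes): bytes counted divided by the shortest run time. */
double triad_best_gbs(double *restrict a, const double *restrict b,
                      const double *restrict c, size_t n, int reps)
{
    double best = 0.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        triad(a, b, c, 3.0, n); /* q = 3.0 is an arbitrary constant */
        double dt = now_seconds() - t0;
        if (dt > 0.0) {
            double gbs = triad_bytes(n) / dt / 1e9;
            if (gbs > best)
                best = gbs;
        }
    }
    return best;
}
```

For the result to reflect DRAM rather than cache bandwidth, the arrays must be far larger than the caches; STREAM's own guideline is at least four times the combined cache size.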

Stream Triad Bandwidth: Sandy Bridge EP vs. Ivy Bridge EP

The extra buffering inside the LR-DIMMs has very little impact on the effective bandwidth. At 1866 MHz and 1DPC (one DIMM per channel), RDIMMs deliver only 3% more bandwidth. When we run the same test on our Sandy Bridge EP Xeon, there is no bandwidth gap at all.

At 3DPC, there is no bandwidth gap at all, as both DIMM types run at the same speed. Also note that the newer Xeon outperforms the older one by 8 to 33% in this test.
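The percentages in this section use different baselines: "20 to 30% lower with gcc" divides by the Open64 result, while "3% more bandwidth" and "8 to 33%" divide by the slower result. A small, hypothetical helper makes the difference concrete; the numbers in its comment are illustrative, not measurements.

```c
/* Hypothetical helper: relative difference of `value` versus `baseline`,
   in percent. Positive means value is higher, negative means lower. */
double pct_vs(double value, double baseline)
{
    return 100.0 * (value - baseline) / baseline;
}

/* Illustrative (made-up) numbers: with A = 100 and B = 80,
   pct_vs(80, 100) == -20.0, so B is 20% lower than A;
   pct_vs(100, 80) ==  25.0, so A is 25% higher than B. */
```

The same pair of results can therefore be described as a 20% deficit or a 25% advantage, depending on which one is the baseline.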

 


27 Comments

  • subflava - Thursday, December 19, 2013

    Great article...look forward to more enterprise/IT professional based articles from Anandtech in the future. This is very timely for me as my company is just about to pull the trigger on a server upgrade. Interesting stuff.
  • JohanAnandtech - Friday, December 20, 2013

    Thanks for sharing! :-)
  • DERSS - Friday, December 27, 2013

    You guys are seriously super-cool; thanks.
  • wsaenotsock - Thursday, December 19, 2013

    costed?
  • blaktron - Thursday, December 19, 2013

    Good article, although as an enterprise architect, I can tell you the one true benefit of LRDIMMs is in 2- and 4-socket vhost builds, because the double-density RAM gives you the freedom to turn off NUMA spanning and still get near-ideal guest density.

    Almost nobody runs caching servers that big, although at almost double the performance of a 256GB build (the 100k+ concurrent user norm), it's kind of attractive to run 2 of these per DC instead of 6 smaller ones (which would actually be the real-world comparison with those kinds of deltas).
  • mexell - Thursday, December 19, 2013

    Real-world pricing, at least in the enterprise context, is quite a bit off from your numbers. In my employer's price bracket, we regularly buy servers similar to your 24×16GB config for about the same price (13k€), but including a 3-year VMware Enterprise subscription license, which is about 6 to 7k€ on its own. No one pays list price on that kind of hardware.
  • JohanAnandtech - Friday, December 20, 2013

    Are you sure that there is not a big discount on the VMware license? And smaller enterprises will pay something close to the list price. I know that the typical discount is 10-20% for smaller quantities, not more.
  • blaktron - Friday, December 20, 2013

    Depends on the country, Johan. The partner channel managers get to decide discounts on partner orders (which is what he is describing). Also, the bundling discount doesn't happen everywhere, but I could buy that server for like $15k CDN.

    The VMware license cost seems out of this world to me too, because we license our hosts for anywhere from 2,500 to 5k CDN, depending on their agreement with VMware.
  • mexell - Saturday, December 21, 2013

    I don't really know where exactly the discount is applied, as the licenses are OEM and we don't get line-item pricing. In our market segment (large enterprise with Dell, medium-to-large with HP), we usually see at least 40% off list prices, and in some cases (networking equipment) up to 75%.

    VMware, on the other hand, is especially rigid with its pricing structure. Two years ago, when we negotiated a 100-host branch office deployment, they referred us to their list pricing. In their eyes, we are not even big enough for them to speak to us directly.
  • dstarr3 - Thursday, December 19, 2013

    Wow. With 768GB of memory, I bet you could run Crysis.
