Future Visions, Cont: POWERed by NVIDIA

We have to check for ourselves of course, but IBM claims that compared to a dual K80 setup, a dual P100 gets a 2.07x speedup on the S822LC HPC. The same dual P100 on a fast Xeon with PCIe 3.0 only saw a 1.5x speedup. The benchmark used was a rather exotic one: Lattice QCD, an approach to "solve quantum chromodynamics".

However, IBM reports that NVLink removes performance bottlenecks in

  1. FFT (signal processing)
  2. STAC-A2 (risk analysis)
  3. CPMD (computational chemistry)
  4. Hash tables (used in many algorithms, security and big data)
  5. Spark

Those got our attention, as they are not exotic niche HPC applications but widespread software components and frameworks used in both the HPC and data analytics worlds.

NVIDIA also claims that thanks to NVLink and the improved capabilities of the Page Migration Engine, a new breed of GPU accelerated applications will be possible. The unified memory space (CUDA 6) introduced in Kepler was a huge step forward for CUDA programmers: they no longer had to explicitly copy data from the CPU to the GPU. The Page Migration Engine would do that for them.

But the current system (Kepler and Maxwell) also had quite a few limitations. For example, the memory space in which the CPU and GPU share data was limited to the size of the GPU memory (typically 8-16 GB). The P100 now gets 49-bit virtual addressing, which means CUDA programs can treat every available byte of RAM as part of one big virtual space. In the case of the newly launched S822LC, this means up to 1 TB of DRAM, and consequently 1 TB of memory space. Secondly, the whole virtual address space is coherent thanks to the new page fault mechanism: both the CPU and GPU can access the DRAM together. This requires OS support, and NVIDIA cooperated with the Linux community to make this happen.
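
To put the 49-bit figure in perspective, here is a quick back-of-the-envelope calculation in Python. The 16 GB and 1 TB values come from the text above; the variable names are ours, and treating GB and TB as binary units is an assumption made for simplicity.

```python
# Back-of-the-envelope: how large is a 49-bit virtual address space?
ADDRESS_BITS = 49

TIB = 2**40  # bytes in one tebibyte
GIB = 2**30  # bytes in one gibibyte

address_space_bytes = 2**ADDRESS_BITS
address_space_tib = address_space_bytes // TIB

# Figures from the article; binary units are assumed for simplicity.
gpu_memory_bytes = 16 * GIB   # upper end of "typically 8-16 GB"
system_dram_bytes = 1 * TIB   # maximum DRAM in the S822LC

print(f"49-bit address space: {address_space_tib} TiB")
print(f"Headroom over 1 TB of DRAM: {address_space_bytes // system_dram_bytes}x")
print(f"1 TB of DRAM vs 16 GB of GPU memory: {system_dram_bytes // gpu_memory_bytes}x")
```

The address space is far larger than any amount of memory the machine can hold, so the installed DRAM, not the addressing, becomes the practical limit.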

Of course, as the unified memory space gets larger, the amount of data to transfer back and forth gets larger too, and that is where NVLink and the extra memory bandwidth of the POWER8 have a large advantage. Remember that even the POWER8 with only 4 buffer chips delivered twice as much memory bandwidth as the best Xeons. The higher end POWER8 have 8 buffer chips, and as a result offer almost twice as much memory bandwidth as the 4-chip version.

NVLink, together with the beefy memory subsystem of the POWER8, ensures that CUDA applications using such a unified 1TB memory space can actually work well.

The POWER8 - all heatsinks - looks less hot headed now that it has the company of 4 Tesla P100 GPUs...

The S822LC will cost less than $50000, and it offers a lot of FLOPS per dollar if you ask us. First consider that a single Tesla P100 SXM2 costs around $9500. The S822LC integrates four of them, two 10-core POWER8s and 256 GB of RAM. More than 21 TFLOPS (FP64) connected by the latest and greatest interconnects in a 2U box: the S822LC HPC is going to turn some heads.
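
As a rough sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. The prices and GPU count come from the paragraph above; the 5.3 TFLOPS figure is our assumption, based on NVIDIA's peak FP64 rating for a single P100 SXM2, and the variable names are illustrative.

```python
# Rough sanity check of the article's FLOPS and price figures.
# Assumption: 5.3 TFLOPS is the peak FP64 rating of one Tesla P100 SXM2.
P100_FP64_TFLOPS = 5.3
P100_PRICE_USD = 9_500             # "around $9500" per the article
GPU_COUNT = 4
SYSTEM_PRICE_CEILING_USD = 50_000  # "less than $50000" per the article

total_tflops = GPU_COUNT * P100_FP64_TFLOPS
gpu_cost = GPU_COUNT * P100_PRICE_USD
gpu_share = gpu_cost / SYSTEM_PRICE_CEILING_USD

# The real system price is below the ceiling, so this is a lower bound.
gflops_per_dollar = total_tflops * 1_000 / SYSTEM_PRICE_CEILING_USD

print(f"Peak FP64 from the GPUs: {total_tflops:.1f} TFLOPS")
print(f"GPUs alone: ${gpu_cost:,} ({gpu_share:.0%} of the price ceiling)")
print(f"At least {gflops_per_dollar:.3f} GFLOPS (FP64) per dollar")
```

Bought separately, the four GPUs would already account for $38,000, which leaves comparatively little for the two CPUs, the memory and the chassis. The FLOPS figure counts the GPUs only and ignores whatever the POWER8 CPUs add on top.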

Last but not least, note that once you add two or more GPUs which consume 300W each, the biggest disadvantage of the POWER8 almost literally melts away. The fact that each POWER8 CPU may consume 45-100W more than the high performance Xeons seems all of a sudden relative and not such a deal breaker anymore. Especially in the HPC world, where performance is more important than Watts.
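
To see how relative that extra CPU power becomes, here is a small Python sketch using only the wattages quoted above. The two-CPU count matches the S822LC; the function name is ours, and real power draw will of course vary with the workload.

```python
# How large is the POWER8's extra power draw next to the GPUs' power budget?
GPU_WATTS = 300                  # per-GPU figure from the article
CPU_COUNT = 2                    # the S822LC has two POWER8 CPUs
EXTRA_WATTS_PER_CPU = (45, 100)  # extra draw vs. a fast Xeon, per the article


def extra_cpu_share(gpu_count):
    """Return (low, high) extra CPU power as a fraction of total GPU power."""
    gpu_power = gpu_count * GPU_WATTS
    return tuple(CPU_COUNT * watts / gpu_power for watts in EXTRA_WATTS_PER_CPU)


for gpus in (2, 4):
    low, high = extra_cpu_share(gpus)
    print(f"{gpus} GPUs: extra CPU power is {low:.1%} to {high:.1%} of GPU power")
```

With four GPUs installed, the additional 90-200W for the pair of CPUs is small next to the 1200W the GPUs themselves may draw.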


  • Eden-K121D - Thursday, September 15, 2016 - link

    Can't wait for Power9
  • Kevin G - Thursday, September 15, 2016 - link

    Same here. I'm really curious about the differences between the four different dies IBM will be offering. Certainly the mix of two core types and IO types should fill the assorted niches found in the server market.
  • rahvin - Thursday, September 15, 2016 - link

    I can wait, it will be a market share failure like every other power because IBM will price it out of reach of any sensible price range. Going by previous attempts it will cost anywhere from 5-10X as much as an equivalent amount of x86 processing power. Something like $10K for the processor and another $2-5K for the case, memory and motherboard and it will be equivalent to a quad x86 Xeon server that costs $5k for the same hardware.

    No one that doesn't need some special sauce it provides will buy them, particularly because you'd have to recompile all your software to use it. IBM has screwed up power so many times at this point that you'd have to be a fool to bet on it.
  • Eden-K121D - Friday, September 16, 2016 - link

    Tell that to Google
  • Brutalizer - Friday, September 16, 2016 - link

    Power9 will be 50% - 125% faster than power8, according to IBM.
    On average it will be 75% faster.

    The specjbb2013 benchmark is broken, SPEC discovered the benchmark can be vendor optimized to provide false results so they fixed it in specjbb2015. IBM have released specjbb2015 numbers for their S812LC server achieving 44.900 for max-jops and 13.000 for critical-jops. That is almost as good as the Intel Xeon E5-2699v4 result. However, what is interesting is the critical-jops, which measures critical throughput under SLAs. IBM have said they will compete with Intel, with their power9.

    (Of course, one SPARC M7 cpu achieves 120.600 max-jops and 60.300 critical-jops, that is 2.7x faster max-jops and 4.6x faster critical-jops. This is not using the built in hardware accelerators in SPARC. Next year the SPARC M8 arrives, which is 2x faster than M7. Today, Oracle have released six cpus in six years, each doubling performance (except the low cost S7, which is a crippled M7))
  • wingar - Friday, September 16, 2016 - link

    I do like how you come with a comment that's incendiary towards POWER8 and POWER9, doing what you can to make it look worse... and then start touting how magical and wonderful SPARC M7 is. Using the same old Oracle-supplied performance claims without substantiating it. Funny, that. I think it stands out a little bit...

    But that's not what matters. If you run a simple google search, "site:anandtech.com brutalizer", you'll find comments with not a lot of variety. Usually commenting on anything x86 and POWER8, and in every single one (Except this one, actually! You actually reference an IBM supplied Spec result. However, you should link to it next time.) you tout the wonder of the latest SPARC of the time. Linking to Oracle-supplied benchmarks, on Oracles own site consistently concluding that Oracle outperforms their competitors. And every time you do this the comment seems to be as close to the top of the comment list as possible, for visibility.

    Have some links.

    But I found a couple of comments you left that aren't anti-everyone-not-Oracle. Have some links.

    I'm sure there's more comments like this where you're actually adding to the conversation but those are the few I found, and they're always unrelated to CPUs and the server market. They seem to perhaps reflect your own interests? But there is one thing to point out here and that the first religiously-pro-Oracle comment you made seemed to be in 2014. What happened then? Did you buy the account? Did someone start paying you? I don't know.

    And hey, for fun I've actually posted this comment before to you, here's a link:
  • Brutalizer - Friday, September 16, 2016 - link

    I am not doing something to make power look worse, I put it in perspective and post other benchmark numbers from Intel and Oracle so people can compare. Yes, I am posting hard facts that can be independently verified, or are you rejecting the benchmarks I post? Why? Why do you think it is a bad thing I post benchmarks from other vendors than IBM? You don't want people to be able to build their own opinion about power by comparing with other vendors? Why not? Why is it dangerous when someone quotes benchmarks from other vendors? What's the problem with that?

    If you insist, here is the SPARC M7 specjbb2015 results.
  • PowerOfFacts - Friday, September 16, 2016 - link

  • Brutalizer - Friday, September 16, 2016 - link

    "...Using the same old Oracle-supplied performance claims without substantiating it..."

    Now this is the same old FUD from the IBM supporters. As I have explained, mathematicians can always prove their claims with links to benchmarks, white papers, research papers, or point to common comp sci knowledge, etc. So you are in deep sh-t now. I can always post links to the numbers I claim. You claim I can not, and I spread unsubstantiated information - now you are lying about me.

    Quote me on any number in any post - and I will post links to prove my numbers. If you ever find any post (you will not find any) where I make up numbers out of the blue to discredit IBM or Intel, you are correct that I post unsubstantiated claims. If you can not find any such posts by me, you are spreading FUD about me, and you lie about me. Now go ahead and quote me on any number where I make out things. I am waiting.

    You are not really smart to claim a mathematician to not be able to prove his figures. I am now able to prove you are a liar and FUDer.

    I think it is funny how the IBM supporters always FUD and try to discredit people, instead of countering the benchmark numbers. I post benchmark numbers, and instead of trying to discuss the numbers you always attack me. That is not the scientific way, to avoid the hard facts and instead try to discredit the opponent. You should instead try to dissect my numbers and links instead of attacking me. But always, always, the IBM crowd does that "oh, he is an Oracle supporter" - so what? You are an IBM supporter! The difference is that I post numbers, and IBM crowd attacks me instead of countering with other numbers.

    If you want to disprove my claims about Sparc, post numbers that disprove my benchmarks. Do not attack me, that does not win you any discussions.
  • SarahKerrigan - Friday, September 16, 2016 - link

    Sure, it's true that on SPECjbb2015 a T7-1 beats a low-end IBM Turismo machine, an S812LC (with an entry price under $5000 list, compared to over $30000 entry price for the T7-1), by a factor of 2.7x on max-jops. It's also true that M7 came out almost a year and a half after P8 did, and that you can get a dual-CPU P8 server with that same processor, and 256GB RAM, for well under half of the list price of a single-CPU T7-1 with 128GB.

    Starting to see why IBM has over 70% of the non-x86 server market?
