Coinciding with the publication of the Top500 supercomputer list earlier this week, the Top500’s sister list, the Green500, was published earlier this morning. The Green500 is essentially to power efficiency what the Top500 is to total performance, being composed of the same computers as the Top500 list sorted by efficiency in MFLOPS per Watt. Often, but not always, the most powerful supercomputers are also among the most power efficient, which can at times lead to surprises.

Much like the spring Top500 list, the spring Green500 list was dominated by IBM BlueGene/Q systems. With the assembly of a number of new heterogeneous supercomputers since then, however, BlueGene/Q has been dethroned not only on the Top500 list but now on the Green500 list as well. In its place on both lists are systems using co-processors from all of the big three: Intel, AMD, and NVIDIA.

For the latest Green500 list, the #1 spot goes to Beacon, a Xeon Phi based supercomputer running at the National Institute for Computational Sciences. At only 44.89 kW, Beacon (placing just #253 on the Top500) is a much smaller installation than the likes of the major BlueGene/Q supercomputers or Titan, but Xeon Phi makes a very strong showing here as Intel’s first retail MIC co-processor. Altogether Beacon hit 2499.44 MFLOPS/W, nearly 400 MFLOPS/W higher than the BlueGene/Q computers it has surpassed.

Meanwhile at #2 on the Green500 is King Abdulaziz City for Science and Technology’s SANAM supercomputer, which is the only computer on the list powered by AMD GPUs. SANAM uses 420 of AMD’s recently announced FirePro S10000 cards, which are in turn each composed of 2 of AMD’s Tahiti GPUs. While AMD has had a significant showing in the Green500 list for several years now from the CPU side of things, they have never been a contender as a co-processor vendor, so this is a significant breakthrough for AMD and their first modern GPU compute architecture, GCN. Much like the Xeon Phi powered Beacon, though, SANAM is a relatively modest supercomputer, reaching its 2351.1 MFLOPS/W efficiency at only 179 kW of total power consumption (making it #52 on the Top500).

Finally, at #3 is Titan, the recently launched Tesla K20X based supercomputer at Oak Ridge National Laboratory. Titan is the current #1 computer on the Top500 list and larger than any other computer on the Green500 list, so along with their top showing on the Top500 list, NVIDIA can now also claim to be powering one of the most efficient supercomputers in the world, a significant boost in prestige for their Tesla division. At 2142.77 MFLOPS/W, Titan can’t quite match the top Intel and AMD systems in power efficiency, but it’s enough to push past BlueGene/Q just as it did on the Top500 list.
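To make the relationship between these figures concrete, here is a minimal Python sketch that ranks the three systems by MFLOPS/W and, where a power figure was quoted above, multiplies efficiency by power to back out the implied sustained performance. The `System` class and `implied_tflops` helper are hypothetical names used only for illustration, and because the inputs are the rounded numbers quoted in this article, the implied performance is an approximation rather than an official Top500 score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class System:
    """One Green500 entry, using only figures quoted in the article."""
    name: str
    coprocessor: str
    mflops_per_watt: float
    power_kw: Optional[float] = None  # None where the article quotes no power figure


# Deliberately listed out of order so the sort below does the ranking.
systems = [
    System("Titan", "NVIDIA Tesla K20X", 2142.77),
    System("SANAM", "AMD FirePro S10000", 2351.1, 179.0),
    System("Beacon", "Intel Xeon Phi", 2499.44, 44.89),
]


def implied_tflops(system: System) -> Optional[float]:
    """Efficiency (MFLOPS/W) times power (W), converted from MFLOPS to TFLOPS."""
    if system.power_kw is None:
        return None
    watts = system.power_kw * 1000.0
    return system.mflops_per_watt * watts / 1e6


# The Green500 ordering: highest MFLOPS/W first.
ranked = sorted(systems, key=lambda s: s.mflops_per_watt, reverse=True)

for place, s in enumerate(ranked, start=1):
    tflops = implied_tflops(s)
    perf = f"{tflops:.1f} TFLOPS implied" if tflops is not None else "power not quoted"
    print(f"#{place} {s.name} ({s.coprocessor}): {s.mflops_per_watt} MFLOPS/W, {perf}")
```

The sketch also illustrates why a top Green500 placing does not require a large machine: the ranking key is a ratio, so Beacon leads despite drawing roughly a quarter of SANAM's power.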

What’s interesting from all of this data is that the top four computers on the Green500 list are all heterogeneous systems using co-processors: the three systems above from Intel, AMD, and NVIDIA, followed by NVIDIA again at #4. Just 6 months ago the CPU-only BlueGene/Q dominated the list, so for Intel, AMD, and NVIDIA to rocket to the top is a significant achievement for GPUs and GPU-like processors. The use of co-processors means that more traditional x86 CPUs are also along for the ride at the top of the list, with Intel and AMD CPUs splitting the top four systems at two each. Ultimately this isn’t the first time heterogeneous systems have had a strong showing on the Green500, but this is the first time they’ve swept the top of the list like this, and it marks a major leap in power efficiency for heterogeneous systems that finally puts them on par with (and beyond) BlueGene/Q.

Source: Green500

  • MadMan007 - Wednesday, November 14, 2012 - link

    I read this "King Abdulaziz City for Science and Technology" and couldn't help but think of the movie The Dictator.
  • gevorg - Wednesday, November 14, 2012 - link

    are you Jewish?
  • chizow - Wednesday, November 14, 2012 - link

    Seems that's where all the intent is heading after the recent announcements by AMD and Samsung to build servers using ARMv8. There's even been speculation that Nvidia's next-gen GPUs will integrate ARM cores as their command processors directly into their GPU die designs.
  • MrSpadge - Wednesday, November 14, 2012 - link

    The current ARM cores wouldn't be very good for massively parallel crunching. There are diminishing returns at higher core counts (even for such workloads), so you want as many CPUs which are as fast as you can reasonably make them. That's not ARM's current strength.
  • chizow - Wednesday, November 14, 2012 - link

    The x86 CPUs aren't doing any of the parallel computation either, even Intel's HPC solution relies on Xeon Phi which is based on Intel's failed attempt to enter the GPU market, Larrabee.

    That's the point though, the front-end CPUs just need to be fast serial processors to feed the massively parallel compute engines, the GPU-derived "co-processors". A beefed up ARM processor shouldn't have any problems handling this task, maybe even better than x86 being a RISC processor.

    If you read AT's Titan write-up, you'll see there is only 1 core on each Opteron driving each K20 GPU, so 1x12-core Opteron feeding 12xK20s per rack.

    A good analogy would be the CPU is akin to the farmer driving a massive corn harvester (GPU). Sure the farmer is driving but he's clearly not doing the bulk of the heavy lifting.
  • JKflipflop98 - Saturday, November 17, 2012 - link

    That would depend on the workload. There are some tasks where CPUs are way faster than a GPU. There are functions that a Tesla card just flat out can't do.
  • Casper42 - Wednesday, November 14, 2012 - link

    Yeah who would ever build a large cluster of small cores...

    I can tell you as an HP employee who has seen a lot of NDA material about Moonshot: while it may take more cores and definitely more "servers" to equal the computational power of the larger traditional multi-core Xeons, etc., the overall efficiency in power, cooling, and floorspace is definitely there when it's done right.
  • TheJian - Thursday, November 15, 2012 - link

    Google BSC Tegra and you'll see they think Denver/Stark will be great for it. Denver (I thought it was aimed at desktops) may be a specially made arm just for this task, or at least the version they're using might be a hacked denver so to speak.

    FYI: Denver is dev'd IN HOUSE. Not Arm ref model so to speak, much like Apple's A6. So it could be a totally different animal (possible with no gpu for this task). Boulder is also AFAIK but aimed at servers (which strangely or not) NV hid in the gpu division...LOL. Is stark boulder? No idea. YET... :)

    BSC is aiming at topping both the top500 & 500 green list for 2014/2015 and again with stark or better for 2017. This should be an easy task based on where they are now at 28nm and denver being 20nm samsung based. Nobody else AFAIK will be better until mid2014 at 14nm which may take even longer to get into one of these. So they should take the top spot for at least a bit if not longer.

    This is Tegra Feeding Tesla. Rather than the feeding being done by AMD/Intel. I'm starting to wonder if Boulder is a tegra/tesla integration of some sort. But doubtful and I have no proof.
    Was announced back in 2011 (or before, just a quick google here) and saying tegra 2012, which happened so I'm guessing they're still on target for 2014 and 2017 for stark or greater based version as noted here:
    It says tegra 3, but I find that hard to believe with it being old now and Q1 release of Tegra4 (we already have tegra3+) and this 1st machine targets 2014. The mention of stark or better says they're flexible here. I'd think they're flexible on the tegra3 version too if they can do it with stark or better on the next one. But that's an assumption on my part ;)

    The first is expected Sept2014. Ramirez makes some pretty bold claims here:
    3.5x BG/Q. If that's true it's pretty dang potent. Cutting out Intel/AMD for ~4w feeders is definitely a way to help you get there.

    Their technical coordinator of the project seems to think you're wrong. He says they're avoiding the "fast as you can" chips and heading for tons of medium chips :) But I guess you could be smarter than him :) Considering K20x just took the top spot with chips that feed it using far more watts than tegra, I'm thinking this guy knows a thing or two about his predictions and would seem to be easily accurate. It may be even easier if you think they will be on a 20nm Samsung process instead of the current 40nm also. That would make his 4w estimate kind of high I think. We are not talking going from 40 to 28nm, but rather 40 to 20nm. That's two process shrinks in one fell swoop and they popped out samples in May this year in Austin.

    I think this leaves your statement incorrect. It certainly would make sense at some point to have a feeder on a K20x die (as I think Chizow hints at here) not needing to traverse any bus outside the chip to talk. I'd think that would have to allow you to drop even more power/speed and get the same job done. Similar to integrating a memory controller etc on die. That would have to save complications and complexity in the hardware surrounding it too (ie cheaper & simpler to manage :)). It seems like a no-brainer.
  • protomech - Thursday, November 15, 2012 - link

    8 years ago I helped to stand up a large cluster based on Apple's Xserve G5 computer (dual cpu 2.0 ghz, 3.5 GB DDR).

    The cluster turned in performance around 13 MFLOPS/watt.

    You could replace the entire cluster (several dozen 42U racks) with 4 compute boards from Titan (16 compute nodes), drawing about 0.6% of the original power.

    Very impressive.
  • Gastec - Saturday, November 17, 2012 - link

    Where are the trolls, the "fan boi" (muuuu) , why have they not posted yet something about how Nvidia is the best thing that ever happened to the american buyer since and before iPhone?
    Well, while we wait for them trolls to show their ugly heads let us review again the Green500 podium :
    1st place: INTEL (summa cum laude)
    2nd place: AMD (magna cum laude)
    3rd place: Nvidia (cum)
