ARM based servers hold the promise of extremely low power and excellent performance per Watt ratios. It's theoretically possible to place an incredible number of servers into a single rack; there are already implementations with as many as 1000 ARM servers in one rack (48 server nodes in a 2U chassis). What's more, all of those nodes consume less than 5KW combined (or around 5W per quad-core ARM node). But whenever a new technology is hyped, it's important to remain objective. The media loves to rave about new trends and people like reading about "some new thing"; however, at the end of the day the system administrator has to keep his IT services working and convince his boss to invest in new technologies.

At first sight, the relatively low performance per core of ARM CPUs seems like a bad match for servers. The dominant CPU in the server market is without doubt Intel's Xeon. The success of the Xeon family is largely rooted in its excellent single-threaded (or per core) performance at moderate power levels (70-95W). Combine this exceptional single-threaded performance with a decent core count and you get good performance in almost any kind of application. Economies of scale and the resulting price levels are also very important, but the server market has been more than willing to pay a little extra if the response times are lower and the energy bills moderate.

A data point proving that single-threaded performance is still important is the evolution of the T-series of Oracle (or Sun if you prefer). The Sun T3 had 16 cores with 128 threads; the T4 however had only 8 cores with 8 threads each, and CEO Larry Ellison touted more than once that single-threaded performance was massively improved, up to five times faster. Do we really need another server with a flock of slow but energy efficient cores? Has history not taught us that a few "bulls" is better than "a flock of chickens"?

History has also shown that the amount of memory per server is very important. Many HPC and virtualization applications are limited by the amount of RAM. The current Cortex-A9 generation of ARM CPUs has a 32-bit address bus and does not support more than 4GB.

And yet, the interest in ARM-based servers is growing, and there is more to it than just hype. Yes, ARM-based CPUs still lack the number crunching power and the massive amount of DIMM slots that Xeon's memory controller can handle, but ARM CPUs score extremely well when it comes to cost and power consumption.

ARM based CPU have also made giant steps forward when it comes to performance. To give you a few data points: a dual ARM Cortex-A9 at 1.2GHz (Samsung Exynos 1.2GHz) introduced in 2011 compresses more than 10 times faster than the typical ARM 11 based cores in 2008. The SunSpider performance increased by a factor 20 according to Anand's measurements on the iPhones (though part of that is almost certainly thanks to browser and software optimizations). The latest ARM Cortex-A15 is again quite a bit more powerful, offering about 50% higher performance. The A57 will add 64-bit support and is estimated to deliver 20 to 30% higher performance. In short, the single-threaded performance is increasing quickly, and the same is true for the amount of RAM that can be addresssed. The ARM Cortex-A9 is limited to 4GB but the Cortex-A15 should be able to address 16GB while the A57 will be able to address a lot more.

It is likely just a matter of time before ARM products can start to chip away at segments of the server market. How much time? The best way to find out is to look at the most mature ARM server shipping today: the Calxeda based Boston Viridis. Just what can this server handle today, where does it have the potential to succeed, and what are its shortcomings? Let's find out.

It's a Cluster, Not a Server
Comments Locked

99 Comments

View All Comments

  • JohanAnandtech - Wednesday, March 13, 2013 - link

    Ok, good question. I'll look into it, as I am definitely considering a follow-up
  • skyroski - Wednesday, March 13, 2013 - link

    I make performance oriented web apps for a living and I was looking forward to this performance test very much. However, I was quite disappointed at how you have done the "real world" test.

    If you're serving a single site you would never put a Xeon through the performance penalties of virtualisation, so I deem your real world results flawed/unusable.

    Basically, if I was to consider buying a Calxeda server tomorrow, I want to know if I can serve a site faster/better by using the "cluster in a box" solution which ARM's partners are going for or if a single Xeon server with standardised dedicated hardware will serve me and my businesses better.

    The other thing that I would have also tested is SSL request performance because Intel has AES-NI built in and I believe ARM has something similar? I would say the majority of request today for a serious web app/site will be traffic using the SSL protocol, so that would also be one of those deciding factors I would look at.

    If I was a cloud host provider your comparison may contain some truth as their business model would be to presumably let each ARM node out as a VPS alternative, but that isn't what you were testing were you?
  • JohanAnandtech - Wednesday, March 13, 2013 - link

    1. The single site: it is not meant to be an environment of one single site. The reason why we use the same site over and over again, is that it makes it easier to interpret the results and more repeatable. Consider a hosting provider who host many similar - but not the same - LAMP sites.
    The repeatable part is the part that most people don't understand very well: we don't just hit the same URL over and over again. We perform real user interactions and randomize them in realworld patterns (like logging in first and then several real actions) and then getting a repeatable benchmark gets very complex.
    2. The SSL comment is definitely good feedback. We are currently writing the connection code for such SSL websites but also need to find one or more good examples. If your site is a good example, maybe we can use yours (even under NDA if necessary) ?
    3. Lastly, the virtualization overhead of ESXi 5 is very small.
  • Kurge - Wednesday, March 13, 2013 - link

    You know, you can host multiple different LAMP sites on bare metal ;)
  • klmccaughey - Wednesday, March 13, 2013 - link

    It won't be LAMP sites any more though - take a trawl through something like the Linode forums to get an idea of what people are building. You are talking higher concurrency and more likely nginx.

    Someone made a valid comment about database sharding - for web apps this is much more likely as people try to make sure they have failover.

    Whilst initially very disappointed, if you imaging the refresh on the ARM cores over the next 2 years (and considering the rate of change due to the phone market) you might actualy be looking at a beast of a machine in two or three iterations. Imagine if you could buy these off the shelf for under $10k: That feels to me like mission critical failover systems in a box. I can see this taking off in a couple of years.
  • klmccaughey - Wednesday, March 13, 2013 - link

    And kudos for the review - I look forward to the follow-up. This is a space that needs watching!
  • Silma - Thursday, March 14, 2013 - link

    True but do you think Intel will stop product development for the next 3 years? In addition who will have the best fabs then? My guess is Intel.
  • Krysto - Monday, March 18, 2013 - link

    I don't know how fast it actually is, but relative to the ARMv7 architecture, AES should be up to 10x faster on ARMv8.
  • kfreund - Wednesday, March 13, 2013 - link

    Nice job, Johan. Can't wait to see your next one; we will be sure to get you an A15 based system as soon as we get it out! Let the debates begin!
  • kfreund - Wednesday, March 13, 2013 - link

    Regarding Stream performance, this is a known limitation of A9; it just can't handle a lot of concurrent memory requests. A15 will nearly triple the memory bandwidth at same DDR rate.

Log in

Don't have an account? Sign up now