The Xserve Server Platform

The most surprising, even astonishing, results of the previous article were of course the MySQL and Apache server benchmarks. A powerful Windows XP-based client (see above: "Client Configuration: Dual Opteron 250") fires off a large number of read-intensive queries (SELECTs with grouping and ordering) and simulates 1 to 50 concurrent clients. All query traffic goes over a direct Gigabit Ethernet link to the server under test; in this case, a PowerMac Dual G5 2.5 GHz running Mac OS X Server (Tiger). In part I, we discovered that the performance of the Apple machine completely collapsed once there were more than 2 concurrent clients.
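The client harness itself is not shown here. The following Python sketch only illustrates the closed-loop load pattern described above: each simulated client sends a query, waits for the answer, and immediately sends the next one, and throughput is reported in queries per second for each concurrency level. `fake_query` is a hypothetical stand-in for a real MySQL round trip (it just sleeps), and the concurrency levels in the sweep are illustrative.

```python
import threading
import time


def fake_query() -> None:
    """Hypothetical stand-in for one read-only SELECT round trip.

    A real harness would send the query to MySQL over the network and wait
    for the result set; this stub simply sleeps for 2 ms.
    """
    time.sleep(0.002)


def measure_qps(clients: int, duration: float, query=fake_query) -> float:
    """Run `clients` closed-loop workers for `duration` seconds.

    Each worker sends a query, waits for it to complete, then sends the next
    one, so the offered load rises with the number of concurrent clients.
    Returns the number of completed queries per second.
    """
    completed = [0] * clients  # one counter per worker, so no lock is needed
    start = time.perf_counter()
    deadline = start + duration

    def worker(idx: int) -> None:
        while time.perf_counter() < deadline:
            query()
            completed[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return sum(completed) / elapsed


if __name__ == "__main__":
    # Sweep concurrency levels within the 1-to-50 client range.
    for n in (1, 2, 5, 10, 20, 50):
        print(f"{n:>2} clients: {measure_qps(n, duration=0.5):8.1f} queries/s")
```

With the sleeping stub, throughput simply scales with the number of clients. The curves in this article come from how the real server (and its OS) copes with the extra concurrent connections, which a stub cannot reproduce.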

The solution? Install a Linux distribution to check whether our suspicion that the OS is to blame is on the mark. We chose Yellow Dog Linux (YDL). Terra Soft, the company behind Yellow Dog, is an Apple Authorized OEM Value Added Reseller, so you could say that Apple has no objection to installing YDL on your Apple machines. There is more: Terra Soft specializes in optimizing for the G5 processor. The version that we used, Yellow Dog Linux 4.0.1, runs Linux kernel 2.6.10-1.ydl.1g5-smp.

Let us see how the Dual 2.5 GHz G5 performed in MySQL when running Yellow Dog Linux. Please note: YDL 4.0 wouldn't run on the 2.7 GHz Apple machine, so we do not have results for that platform.

The difference between the PowerMac running Linux and running Mac OS X Server is absolutely striking. Under Mac OS X Server, performance improves going from one connection to two (and thus from one thread to two) because the second CPU steps in and helps carry the load. After that, however, performance completely collapses and stabilizes at around 50 queries per second.

While the G5 is not the best integer processor out there, it is not to blame for the poor performance that we experienced in our first tests. Running Yellow Dog Linux, the Dual G5 performed similarly to a 3 GHz Xeon. Notice that more concurrent connections give better performance from 1 to 20. At 5 concurrent simulated users, YDL simply wipes the floor with Mac OS X: 411 versus 113 queries per second. It gets worse at 10 concurrent users: 443 queries per second on Linux versus 62 on Mac OS X. From about 20 connections on, performance declines only very slowly, just like on all the x86/Linux machines.
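To put the gap in numbers, this small Python calculation divides the YDL throughput by the Mac OS X throughput at the two concurrency levels quoted above. It uses only the figures from the text; `speedup` is a hypothetical helper name.

```python
# Queries per second for the Dual G5 2.5 GHz, as quoted in the text above.
MYSQL_QPS = {
    5: {"YDL": 411, "OS X": 113},   # 5 concurrent simulated users
    10: {"YDL": 443, "OS X": 62},   # 10 concurrent simulated users
}


def speedup(linux_qps: float, osx_qps: float) -> float:
    """How many times more queries per second YDL completes than Mac OS X."""
    return linux_qps / osx_qps


for users, qps in MYSQL_QPS.items():
    ratio = speedup(qps["YDL"], qps["OS X"])
    print(f"{users:>2} users: {qps['YDL']} vs {qps['OS X']} queries/s -> {ratio:.1f}x")
```

The ratios come out at about 3.6x for 5 users and about 7.1x for 10 users. The gap widens as concurrency rises because Linux throughput keeps climbing while Mac OS X throughput keeps falling.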

With the MySQL performance woes now clearly traced to OS X, let us see if Apache tells us the same story. We tested with Apachebench (ab), with "-n" being the total number of requests and "-c" the number of concurrent requests:
ab -n 100000 -c 100 http://localhost/
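To make the roles of -n and -c concrete, here is a rough Python sketch of the load shape such a run produces: n GET requests in total, with at most c in flight at any time, each on a fresh connection (ab does not use keep-alive unless -k is given). This is not Apachebench. The function names are hypothetical, and the sketch skips ab's error accounting and latency statistics.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> int:
    """Issue one GET request on a new connection and return the status code."""
    with urllib.request.urlopen(url) as resp:
        resp.read()  # drain the body so the full response is transferred
        return resp.status


def bench(url: str, n: int, c: int) -> float:
    """Send n GET requests with at most c in flight; return requests/second.

    A rough stand-in for `ab -n N -c C URL`: c worker threads pull from a
    queue of n identical requests until all of them have completed.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=c) as pool:
        statuses = list(pool.map(fetch, [url] * n))
    elapsed = time.perf_counter() - start
    failed = sum(1 for s in statuses if s != 200)
    if failed:
        print(f"{failed} of {n} requests did not return 200")
    return n / elapsed
```

For example, `bench("http://localhost/", n=1000, c=100)` approximates a scaled-down version of the command above against a local web server.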
Some people suggested that we should test with both Apache 1.3 and 2.0, so we gave Apache 2.0 a test run.

Unit: requests per second | PowerMac Dual G5 2.5 GHz, OS X | PowerMac Dual G5 2.5 GHz, YDL | Dual Xeon 3.6 GHz
Apache 1.3                | 250                            | 709                           | 1291
Apache 2.0                | 266                            | 2165                          | 3410

On OS X, Activity Monitor showed that the CPUs were not working very hard and were underutilized. This suggests that the Apache problem is somewhat different from the MySQL one, as MySQL showed a CPU load between 165% and 190% (200% is the maximum, as there are 2 CPUs in the system).

Apple told us that the problem lies in Apachebench (the client side), which stalls from time to time and thus generates too low a load on the Apache server. The weird thing is that this does not happen with a smaller number of requests (up to 10,000). When we repeated the test, Apachebench on Mac OS X got into trouble again. Apache 2.0 is slightly faster on OS X, but it still trails by a significant margin. YDL and the Xeon platform, on the other hand, are roughly 3X as fast with version 2.0.
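The "roughly 3X" figure can be checked against the Apache table. The Python snippet below divides each platform's Apache 2.0 result by its Apache 1.3 result. The numbers come straight from the table; the dictionary layout and the `gain` helper are our own.

```python
# Requests per second from the Apache table (ab -n 100000 -c 100).
APACHE_RPS = {
    "PowerMac G5 2.5 GHz, OS X": {"1.3": 250, "2.0": 266},
    "PowerMac G5 2.5 GHz, YDL": {"1.3": 709, "2.0": 2165},
    "Dual Xeon 3.6 GHz": {"1.3": 1291, "2.0": 3410},
}


def gain(row: dict) -> float:
    """Apache 2.0 throughput divided by Apache 1.3 throughput."""
    return row["2.0"] / row["1.3"]


for platform, row in APACHE_RPS.items():
    print(f"{platform}: Apache 2.0 is {gain(row):.2f}x Apache 1.3")
```

This gives about 3.05x for YDL, 2.64x for the Xeon and only 1.06x for OS X. The "roughly 3X" description fits YDL closely and the Xeon more loosely, while OS X barely benefits from the newer Apache.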

According to Apple, this is a bug in Apachebench. We can accept that explanation, as it is clear that the server is not fully loaded and can still handle many more web requests. However, the Apachebench problem is still interesting. Why exactly does the client stall? Is it really a bug, or is it running out of some resource? We didn't delve deeper, as we are developing a less synthetic benchmark, closer to real-world use, to test web servers.

Even if we ignore the Apache results, our MySQL tests, and the queries used in them, are based on the real-world usage pattern of a real-world database. The G5 is partially crippled by a chipset with high memory access latency, and it's not the fastest integer CPU; still, it performs like a 3 GHz Xeon on Linux. The problem clearly lies in Mac OS X, and it is worth further investigation.

47 Comments

  • Lori - Friday, September 2, 2005

    http://en.wikipedia.org/wiki/Microkernel

    MacOS X uses a modified microkernel (a monolithic / microkernel hybrid). The idea was to cut down IPC costs by putting servers that would be IPC heavy directly into the kernel. However, there has recently been a lot of work in the microkernel world to reduce this IPC cost and bring its speed near that of a monolithic kernel.

    L4Ka::Pistachio is an example of this:
    http://www.l4ka.org/
  • leviat - Thursday, September 1, 2005

    If the problem is indeed in the thread creation portion of the OS, it would be interesting to see how a single-threaded web server fares. I would love to see a benchmark of Lighttpd (www.lighttpd.org) comparing how it runs on Darwin vs. linux-ppc.

    Another interesting test would be to see whether MySQL can be configured to pre-create the handler threads. This might show how it handles context switching between the multiple threads and allow it to compete.

    Anyways, great article!
  • JohanAnandtech - Friday, September 2, 2005

    What exactly do you mean by single-threaded? Apache 1.3 works with processes, and is thus single-threaded per user.

    MySQL can make use of a thread cache; we played with it, but it didn't give any substantial boost. I don't see how the software would be able to pre-create all threads, as it has to close down and open connections. If you have some insight, please share :-).

    Context switching is quite fast on the G5 under OS X, within a few percent of Linux on x86 or on the G5, as we tested with lmbench.
  • Lori - Friday, September 2, 2005

    Actually, there is more than one way to handle multiple connections in a server application.

    To give you some examples...

    1. Multi process
    2. Multi thread
    3. Some hybrid of the two

    You can see combinations of these types in Apache 2's MPMs (perchild, prefork, threadpool, worker, leader, etc.).

    4. Asynchronous multiplexing.

    Your program becomes its own scheduler. You can do all your processing within a single thread. Also read up on non-blocking I/O. I am actually surprised Apache does not have an MPM to handle this type of connection multiplexing, but I also read that it's harder to get OS support for it.

    Let's see... some links:

    http://www.kegel.com/c10k.html
  • Avalon - Thursday, September 1, 2005

    Seems like once you remove the G5 from OSX, it's a very capable chip.
  • jamawass - Thursday, September 1, 2005

    Great article. In response to the previous post: Anand has posted tons of server articles on x86 systems, so Apple is fair game here. Secondly, Apple's servers on the market run OS X, and corporations want to know the real-world performance, not the desktop feel. Also, Johan's speculation on Apple's move to Intel raises some troubling questions for Apple execs.
  • karlreading - Thursday, September 1, 2005

    A lot of people are commenting on how Apple has made a wrong decision turning to Intel.
    Possibly, but IMHO, and if I'm not mistaken, didn't the Opteron dominate all the tests?
    So in my mind, whilst it's true that people doubt Apple for going Intel, x86 on the whole is still a very viable option if you go the AMD route.
    Yes, I know people will say AMD doesn't have the capacity, but AMD-powered Macs are how x86 Macs should be done.
    karlos
  • karlreading - Thursday, September 1, 2005

    Also worth noting is that they say the FP performance is as good as the fastest x86 chip. Well, 'scuse me, but isn't that a 2.7 GHz G5 part they're testing there? That's the fastest G5 currently available, isn't it? Then why not test the Opteron 254? That's the fastest x86 chip, running at 2.8 GHz, rather than the 850/250 2.4 GHz part tested. That would give the Opteron some lead over the G5, and 2.8 GHz is a lot closer than 2.4 GHz to the 2.7 GHz G5's core speed, if we're trying to be fair.
    If we were being really picky, we would be calling the dual-core Opteron the fastest x86, but I digress...
    karlos
  • JohanAnandtech - Friday, September 2, 2005

    You are right about the recently introduced 2.8 GHz Opteron. Well, to be really accurate, at the time the 2.7 GHz G5 was introduced, a 2.6 GHz Opteron was available.

    Anyway, it was not my intention to be "accurate"; it was more a general impression. Give or take a few percent, the G5 can compete FP-wise :-).
  • Pannenkoek - Thursday, September 1, 2005

    I would think the bad performance is a matter of scalability and SMP support, and not so much of how fast some system calls execute. Linux is the most used OS for superclusters these days; Mac OS remains a desktop OS. It's no wonder that it performs poorly as a serious server on a multiprocessor/multicore system. If we are testing OSes in this way, it would have been interesting to see how Windows would have fared (on x86, of course).

    However, MySQL benchmarks say little about desktop performance; AnandTech's audience consists of desktop users, and the reason people love or hate Mac OS is its desktop. Nevertheless, this is almost a great article. It would have been one if the author had resisted the temptation of too much speculation in favor of honest benchmark numbers.
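Of the connection-handling models listed in the comments, the fourth (a single thread multiplexing many non-blocking sockets) is the hardest to picture. Below is a minimal sketch of that model using Python's standard `selectors` module. It is a toy echo server of our own, not code from Apache, lighttpd or MySQL, and the `max_events` parameter is a hypothetical convenience for stopping the loop.

```python
import selectors
import socket
from typing import Optional


def serve(listener: socket.socket, max_events: Optional[int] = None) -> None:
    """Single-threaded echo server that multiplexes many non-blocking sockets.

    The selector reports which sockets are ready, and the loop only accepts or
    reads on those, so no call ever blocks and one thread acts as the program's
    own scheduler. `max_events` (hypothetical, for testing) stops the loop
    after roughly that many readiness events; None means run forever.
    """
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)
    handled = 0
    try:
        while max_events is None or handled < max_events:
            for key, _mask in sel.select(timeout=1.0):
                handled += 1
                if key.data is None:  # the listening socket: a client is waiting
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data="client")
                else:  # a client socket has data (or EOF) ready
                    conn = key.fileobj
                    chunk = conn.recv(4096)
                    if chunk:
                        # Fine for tiny payloads; a real server would queue unsent
                        # bytes and wait for EVENT_WRITE instead of calling sendall.
                        conn.sendall(chunk)
                    else:  # empty read: the peer closed the connection
                        sel.unregister(conn)
                        conn.close()
    finally:
        sel.close()
```

Here, a new connection costs a `register` call rather than a new thread or process. That is why this design is often suggested when thread creation or per-connection threads are the suspected bottleneck.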
