When it comes to server hardware failures, I've seen nearly all of them in our own infrastructure. With the exception of CPUs, virtually every component that could fail has failed at some point in the past 16 years of running AnandTech: motherboards, power supplies, memory and, of course, hard drives.

By far the most frequent failure in our infrastructure had to be mechanical drives. Within the first year after the launch of Intel's X25-M in 2008, I had transitioned all of my testbeds to solid state drives. The combination of performance and reliability was what I needed. Most of my testbeds were CPU bound, so I didn't necessarily need a ton of IO performance - but having the headroom offered by a good SSD meant that I could get more consistent CPU performance results between runs. The reliability side was simple to understand - with a good SSD, I wouldn't have to worry about my drive dying unexpectedly. Living in fear of a testbed hard drive dying over the weekend before a big launch was a thing of the past. 

When it came to rearchitecting the AnandTech server farm, the same reasons that pushed all of our testbeds (and personal systems) to SSDs applied just as well to the servers that ran AnandTech.

Our infrastructure is split up between front end application servers and back end database servers. With the exception of the boxes that serve our images, most of our front end app servers don't really stress IO all that much. The three 12-core virtualized servers at the front end would normally be fine with some hard drives; however, we decided to go with mainstream SSDs to lower the risk of a random mechanical failure. We didn't need the endurance of an enterprise drive in these machines since they weren't being written to all that frequently, but we needed reliable drives. Although they are quite old by today's standards, we settled on 160GB Intel X25-M G2s but partitioned the drives down to 120GB in order to ensure they'd have a very long lifespan.
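
The effect of partitioning the 160GB drives down to 120GB can be put in numbers. The sketch below is illustrative only: the function name is made up for this example, and it assumes the unpartitioned space is never written, so the controller is free to treat it as additional spare area. The drives' factory spare area isn't stated above, so it is left out of the calculation.

```python
def unpartitioned_fraction(advertised_gb: float, partitioned_gb: float) -> float:
    """Return the share of the advertised capacity left unpartitioned."""
    if not 0 < partitioned_gb <= advertised_gb:
        raise ValueError("partitioned size must be positive and no larger than the drive")
    return (advertised_gb - partitioned_gb) / advertised_gb


# Figures from the article: a 160GB X25-M G2 partitioned down to 120GB.
unused_gb = 160 - 120
fraction = unpartitioned_fraction(160, 120)
print(f"{unused_gb}GB unpartitioned, {fraction:.0%} of the advertised capacity")
```

Under that assumption, a quarter of each drive never holds user data, which gives the controller more room for wear leveling and garbage collection.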

Where performance matters more is in our back end database servers. We run a combination of MS SQL and MySQL, and our DB workloads are particularly IO intensive. In the old environment we had around a dozen mechanical drives in various RAID configurations powering all of the databases that ran the site. To put performance in perspective, I grabbed our old Forum Database server and took a look at the external SAS RAID array we had created. Until last year, the Forums were powered by a combination of 6 x Seagate Barracuda ES.2s and 4 x Seagate Cheetah 10K.7s. 

For the new Forums DB we moved to 6 x 64GB Intel X25-Es. Again, old by modern standards, but a huge leap above what we had before. To put the performance gains in perspective I ran some of our enterprise IO benchmarks on the old array and the new array to compare. In production we had split the DB workload across the Barracuda ES.2 array (6 drive RAID-10) and the Cheetah array (4 drive RAID-5). To keep things simple for this comparison, however, I just created a 4-drive RAID-0 using the Cheetahs, which should give us more than a good indication of peak performance of the old hardware:

AnandTech Forums DB IO Performance Comparison - 2013 vs 2007

Configuration                            | MS SQL - Update Daily Stats | MS SQL - Weekly Stats Maintenance | Oracle Swingbench
Old Forums DB Array (4 x 10K RPM RAID-0) | 146.1 MB/s                  | 162.9 MB/s                        | 2.8 MB/s
New Forums DB Array (6 x X25-E RAID-10)  | 394.4 MB/s                  | 450.5 MB/s                        | 55.8 MB/s
Performance Increase                     | 2.7x                        | 2.77x                             | 19.9x

The two SQL tests are actually from our own environment, so the performance gains are quite applicable. The advantage in these two tests is only around 2.7x to 2.8x. In reality the gains can be even greater, but we don't have good traces of our live DB load - just some of our most IO intensive tasks on the DB servers. The final benchmark, however, does give us some indication of what a more random enterprise workload can enjoy with a move to SSDs from a hard drive array. Here the performance of our new array is nearly 20x that of the old HDD array.
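
The performance increase figures are simply the new array's throughput divided by the old array's. A minimal sketch of that arithmetic, using the MB/s values from the comparison above (the variable and function names are just for illustration):

```python
# Throughput in MB/s as (old array, new array), taken from the comparison above.
results_mb_s = {
    "MS SQL - Update Daily Stats": (146.1, 394.4),
    "MS SQL - Weekly Stats Maintenance": (162.9, 450.5),
    "Oracle Swingbench": (2.8, 55.8),
}


def speedup(old_mb_s: float, new_mb_s: float) -> float:
    """Return how many times faster the new array is than the old one."""
    return new_mb_s / old_mb_s


speedups = {name: speedup(old, new) for name, (old, new) in results_mb_s.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The two MS SQL tasks land close together, while the more random Swingbench workload shows by far the largest gap.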

Note that there's another simplification that comes along with our move to SSDs: we rely completely on Intel's software RAID. There are no third party RAID controllers, no extra firmware/drivers to manage and validate, and there's no external chassis needed to get more spindles. We went from a 4U HP DL585 server with a 2U Promise Vtrak J310s chassis and 10 hard drives, down to a 2U server with 6 SSDs - and came out ahead in the performance department. Later this week I'll talk about power savings, which ended up being a much bigger deal.
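
For readers weighing a similar consolidation, the capacity cost of the RAID levels mentioned in this article is easy to work out. The sketch below is a simplified, idealized model: the helper name is hypothetical, it ignores formatting and metadata overhead, and it only covers the three levels that appear here. The old drives' capacities aren't given above, so only the new 6 x 64GB array is converted to gigabytes.

```python
def data_drive_equivalents(level: str, drives: int) -> float:
    """Return how many drives' worth of capacity an ideal array offers for data."""
    if level == "RAID-0":
        return float(drives)          # striping only, no redundancy
    if level == "RAID-5":
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return float(drives - 1)      # one drive's worth of parity
    if level == "RAID-10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID-10 needs an even number of drives, at least 4")
        return drives / 2             # every drive is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


# New Forums DB array: 6 x 64GB X25-E in RAID-10.
new_forums_usable_gb = data_drive_equivalents("RAID-10", 6) * 64
print(f"New array usable capacity: {new_forums_usable_gb:.0f}GB")

# Old arrays, in drive equivalents only (drive sizes are not stated in the article).
print(data_drive_equivalents("RAID-10", 6), data_drive_equivalents("RAID-5", 4))
```

In this idealized model the new array offers three drives' worth of space, or 192GB, for the Forums database.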

This is just the tip of the iceberg. In our specific configuration we went from old hard drives to old SSDs. With even greater demands you could easily go to truly modern enterprise SSDs or even PCIe based solutions. Using a combination of consumer and enterprise drives isn't a bad idea if you want to transition to an all-SSD architecture. Deploying reliable consumer drives in place of lightly used hard drives is a way to cut down the number of moving parts in your network, while moving to higher performing/higher endurance enterprise SSDs can deliver significant performance benefits as well.

57 Comments

  • enoxseven - Tuesday, March 12, 2013 - link

    Are your database servers virtualized as well?
  • Anand Lal Shimpi - Tuesday, March 12, 2013 - link

    Not yet. In order to simplify deployment we went with two boxes, each being a warm spare for the other. Moving forward we will likely virtualize those platforms as well.
  • Adul - Thursday, March 14, 2013 - link

    I would advise against DB virtualization, as we have seen a significant performance impact in our customer environments when going virtualized. Even when we pinned this customer to only SSD the difference in performance was significant. I am sure you will be doing some benchmarks beforehand.
  • Doby - Saturday, March 16, 2013 - link

    Virtualizing can be done so it doesn't have a performance hit. To the contrary, in many situations you can gain performance on large servers by virtualizing. Sure, it's not an absolute, but to say virtualization causes significant performance impact is the same as saying a physical server can cause significant performance impact.

    I thought we were getting past the days where people blindly blamed virtualization. That statement should be as dead as saying you should have dedicated RAID10 for DB, and dedicated RAID for logs. It completely depends on the infrastructure.
  • gamoniac - Monday, March 18, 2013 - link

    @Adul, just being curious, in your customer's SQL VM, were the data files/log files residing on the VM disks? A more scalable approach would be to have the OS and SQL software on the VM with data/log files on another physical storage. I am interested in your experience. TIA.
  • kolbryn - Thursday, March 21, 2013 - link

    High disk IO for VMs requires special customizations such as the PVSCSI storage adapter or NPIV. As always, disk RAID/type/Meta-LUN are all key to performance for any DB, physical or virtual. Also, MB/s is not the best indication of performance; enterprise DBs live or die by IOPS, queue depth and response time.
  • lwatcdr - Thursday, March 14, 2013 - link

    Great writeup. I have been wondering about SSDs for databases; it makes a lot of sense.
  • DanNeely - Tuesday, March 12, 2013 - link

    Why multiple database engines? MySQL and MS SQL are both good products, but using the two together on a single site seems odd.
  • martajd - Tuesday, March 12, 2013 - link

    The main site appears to be written in ASP.NET, which naturally plays nice with MS SQL. The Forums use vBulletin, which I believe uses MySQL by default. I imagine they are separate in app code and DB code.
  • Anand Lal Shimpi - Tuesday, March 12, 2013 - link

    Correct :)
