When it comes to server hardware failures, I've seen nearly all of them in our own infrastructure. In the past 16 years of running AnandTech, virtually every component that could fail has failed, with the exception of CPUs: motherboards, power supplies, memory and, of course, hard drives.

By far the most frequent failures in our infrastructure were mechanical drives. Within the first year after the launch of Intel's X25-M in 2008, I had transitioned all of my testbeds to solid state drives. The combination of performance and reliability was what I needed. Most of my testbeds were CPU bound, so I didn't necessarily need a ton of IO performance - but having the headroom offered by a good SSD meant that I could get more consistent CPU performance results between runs. The reliability side was simple to understand - with a good SSD, I wouldn't have to worry about my drive dying unexpectedly. Living in fear of a testbed hard drive dying over the weekend before a big launch was a thing of the past.

When it came to rearchitecting the AnandTech server farm, the same reasons that pushed us to SSDs on all of our testbeds (and personal systems) applied just as well to the servers that run the site.

Our infrastructure is split up between front end application servers and back end database servers. With the exception of the boxes that serve our images, most of our front end app servers don't really stress IO all that much. The three 12-core virtualized servers at the front end would normally be fine with some hard drives, but we decided to go with mainstream SSDs instead to lower the risk of a random mechanical failure. We didn't need the endurance of an enterprise drive in these machines since they weren't being written to all that frequently, but we needed reliable drives. Although quite old by today's standards, we settled on 160GB Intel X25-M G2s but partitioned the drives down to 120GB, leaving the remaining 40GB unpartitioned as extra spare area to help ensure they'd have a very long lifespan.
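To put the 160GB-to-120GB partitioning in numbers, here is a minimal Python sketch. It assumes the unpartitioned space is never written and so stays available to the controller as extra spare area. The function name `spare_area` is hypothetical, and the drive's factory spare area and any GB/GiB differences are ignored.

```python
def spare_area(raw_gb: float, partitioned_gb: float) -> dict:
    """Return the unpartitioned capacity and its share of the drive.

    Assumption: space left unpartitioned is never written, so the
    controller can treat it as additional spare area.
    """
    if not 0 < partitioned_gb <= raw_gb:
        raise ValueError("partitioned_gb must be in (0, raw_gb]")
    spare_gb = raw_gb - partitioned_gb
    return {
        "spare_gb": spare_gb,
        # Share of the advertised capacity that is left unused.
        "spare_pct_of_raw": 100 * spare_gb / raw_gb,
        # Spare space relative to the space actually exposed to the OS.
        "spare_pct_of_partitioned": 100 * spare_gb / partitioned_gb,
    }


# Figures from the article: 160GB X25-M G2 partitioned down to 120GB.
result = spare_area(160, 120)
print(result)
```

For these drives, that works out to 40GB left unpartitioned, or 25% of the advertised capacity.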

Where performance matters more is in our back end database servers. We run a combination of MS SQL and MySQL, and our DB workloads are particularly IO intensive. In the old environment we had around a dozen mechanical drives in various RAID configurations powering all of the databases that ran the site. To put performance in perspective, I grabbed our old Forum Database server and took a look at the external SAS RAID array we had created. Until last year, the Forums were powered by a combination of 6 x Seagate Barracuda ES.2s and 4 x Seagate Cheetah 10K.7s. 

For the new Forums DB we moved to 6 x 64GB Intel X25-Es. Again, old by modern standards, but a huge leap above what we had before. To put the performance gains in perspective, I ran some of our enterprise IO benchmarks on the old array and the new array. On the old hardware, we split the DB workload across the Barracuda ES.2 array (6-drive RAID-10) and the Cheetah array (4-drive RAID-5). To keep things simple, however, I just created a 4-drive RAID-0 using the Cheetahs, which should give us a more than good enough indication of the peak performance of the old hardware:

AnandTech Forums DB IO Performance Comparison - 2013 vs 2007

|  | MS SQL - Update Daily Stats | MS SQL - Weekly Stats Maintenance | Oracle Swingbench |
|---|---|---|---|
| Old Forums DB Array (4 x 10K RPM RAID-0) | 146.1 MB/s | 162.9 MB/s | 2.8 MB/s |
| New Forums DB Array (6 x X25-E RAID-10) | 394.4 MB/s | 450.5 MB/s | 55.8 MB/s |
| Performance Increase | 2.7x | 2.77x | 19.9x |

The two SQL tests are actually from our own environment, so the performance gains are quite applicable. The advantage here is only around 2.7x. In reality the gains can be even greater, but we don't have good traces of our live DB load - just some of our most IO intensive tasks on the DB servers. The final benchmark, Oracle Swingbench, does however give us some indication of what a more random enterprise workload can enjoy with a move to SSDs from a hard drive array. Here the performance of our new array is nearly 20x that of the old HDD array.
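The ratios in the table can be reproduced directly from the throughput figures. The short Python sketch below does just that. The dictionary layout and the `speedup` function name are illustrative only, and the numbers are copied from the table above.

```python
# Throughput in MB/s as (old 4 x 10K RPM RAID-0, new 6 x X25-E RAID-10).
results_mb_s = {
    "MS SQL - Update Daily Stats": (146.1, 394.4),
    "MS SQL - Weekly Stats Maintenance": (162.9, 450.5),
    "Oracle Swingbench": (2.8, 55.8),
}


def speedup(old_mb_s: float, new_mb_s: float) -> float:
    """Ratio of new throughput to old throughput."""
    if old_mb_s <= 0:
        raise ValueError("old throughput must be positive")
    return new_mb_s / old_mb_s


speedups = {
    name: speedup(old, new) for name, (old, new) in results_mb_s.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The two MS SQL traces land at roughly 2.7x to 2.8x, while the more random Swingbench workload comes out just under 20x.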

Note that there's another simplification that comes along with our move to SSDs: we rely completely on Intel's software RAID. There are no third party RAID controllers, no extra firmware/drivers to manage and validate, and there's no external chassis needed to get more spindles. We went from a 4U HP DL585 server with a 2U Promise Vtrak J310s chassis and 10 hard drives, down to a 2U server with 6 SSDs - and came out ahead in the performance department. Later this week I'll talk about power savings, which ended up being a much bigger deal.
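As a rough guide to the RAID levels mentioned in this article, the sketch below computes how many drives' worth of capacity each layout exposes and how many drive failures it is guaranteed to survive. It uses the textbook definitions of RAID-0, RAID-5 and RAID-10 and makes no claim about how Intel's software RAID lays out data. The function names are hypothetical.

```python
def usable_drives(level: str, drives: int) -> int:
    """Number of drives' worth of capacity available for data."""
    if level == "RAID-0":  # striping only, no redundancy
        if drives < 2:
            raise ValueError("RAID-0 needs at least 2 drives")
        return drives
    if level == "RAID-5":  # one drive's worth of distributed parity
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return drives - 1
    if level == "RAID-10":  # striped mirrors, half the drives hold copies
        if drives < 4 or drives % 2:
            raise ValueError("RAID-10 needs an even count of 4+ drives")
        return drives // 2
    raise ValueError(f"unsupported RAID level: {level}")


def guaranteed_failures(level: str) -> int:
    """Drive failures the array always survives (worst case)."""
    return {"RAID-0": 0, "RAID-5": 1, "RAID-10": 1}[level]


# Layouts named in the article (labels are for display only).
layouts = [
    ("Barracuda ES.2 array", "RAID-10", 6),
    ("Cheetah array", "RAID-5", 4),
    ("Cheetah test array", "RAID-0", 4),
    ("X25-E array", "RAID-10", 6),
]

for name, level, drives in layouts:
    print(
        f"{name}: {drives}-drive {level}, "
        f"{usable_drives(level, drives)} drives usable, "
        f"survives {guaranteed_failures(level)} failure(s)"
    )

# Usable space of the new Forums DB array: 6 x 64GB X25-E in RAID-10.
new_array_usable_gb = usable_drives("RAID-10", 6) * 64
print(f"New array usable capacity: {new_array_usable_gb}GB")
```

With six 64GB X25-Es in RAID-10, this gives 3 x 64GB = 192GB of usable space.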

This is just the tip of the iceberg. In our specific configuration we went from old hard drives to old SSDs. With even greater demands you could easily go to truly modern enterprise SSDs or even PCIe based solutions. Using a combination of consumer and enterprise drives isn't a bad idea if you want to transition to an all-SSD architecture. Deploying reliable consumer drives in place of lightly used hard drives is a way to cut down the number of moving parts in your network, while moving to higher performing/higher endurance enterprise SSDs can deliver significant performance benefits as well.

57 Comments

  • extide - Tuesday, March 12, 2013

    I would have gone with some form of software RAID vs using Intel RAID. Preferably something like ZFS or mdadm. Even for the MSSQL setup, I would say run a pair of mirrored drives for the OS and then use a ZFS array mounted from another box over iSCSI or Fibre Channel or something.
  • extide - Tuesday, March 12, 2013

    To clarify, I am kind of against ALL hardware RAID these days, besides using simple on-board RAID for a mirror.

    I generally would set up systems using basic on-board RAID with a mirror for all system drives, and then all data drives would use some form of software RAID, ZFS preferably, and then those volumes could be mounted where needed via things like iSCSI or Fibre Channel.
  • mfenn - Tuesday, March 12, 2013

    Sounds like the whole site is on Windows, unfortunately. Maybe the next upgrade will be to a real OS? ;)
  • Egg - Tuesday, March 12, 2013

    Where does it say that they use Windows?
    http://www.intel.com/support/chipsets/imsm/sb/cs-0... Intel RAID works on Linux.
  • Gigaplex - Wednesday, March 13, 2013

    It states they use MSSQL, which only runs on Windows.
  • FunBunny2 - Tuesday, March 12, 2013

    Anand: Would you consider measuring the Olde HDD machines against a New SSD Machine, with some (or all) of the tables fully normalized on the SSD, and the flatfile-ish Olde Tables on the Olde Machine? I think that's the truest measure of the value of SSD: largely sequential on HDD versus random/join/synthesize on SSD.
  • johannes - Tuesday, March 12, 2013

    Is the performance/stability/maintenance advantage of dedicated raid-cards compared to softraid so small that softraid is used even in high-traffic servers? Is Intel softraid very different from mdadm-raid in this respect?
  • erple2 - Tuesday, March 12, 2013

    I don't know that the hardware raids have any of those advantages - maybe stability, but they are not faster any more (particularly with SSDs). They are also more of a pain with maintenance - Anand even mentioned in the article that validating new Firmware on them is time consuming and "a pain". These days, I'm not really sure why anyone would go with a hardware RAID device (except MAYBE a giant SAN type operation - but even there, there's probably an underlying problem with how you're approaching the problem that can be re-tooled smarter).

    I think that the days of giant multimillion dollar RAID arrays are slowly going by the wayside, other than support of "decades" old computing platforms that are substantially incorrectly thought to be too expensive to replace.

    I recall having a long talk with someone that suggested that relying on single points of failure (like a giant SAN, for example) is ultimately the wrong direction to go (particularly given the outrageous expense of the hardware - they usually cost more than 2 years of development effort to come up with a clever-er use of smaller, cheaper, more parallel storage arrays).

    However, I'm probably wrong!
  • jmke - Wednesday, March 13, 2013

    it depends; RAID card on PCIe, PCIe bandwidth is 64GB/s
    SATA 3.0 spec: 600MB/s

    you do the math :)
  • Gigaplex - Wednesday, March 13, 2013

    So what if PCIe bandwidth is higher than SATA? You've still got the SATA limitation from the drive to the RAID card in the first place. Unless you've got something exotic like fibre channel. But then why are you comparing to a single SATA port?
