MySQL and the power of Intel SSDsby Johan De Gelas on November 9, 2008 12:00 AM EST
- Posted in
- IT Computing general
More and more of the database vendors are talking about the wonders that SSD can do for transactional (OLTP) databases. So I read Anand's latest SSD article with more than usual interest. If many of the cheaper MLC SSD's write small blocks 20 times slower than a decent harddrive, these SSD's are an absolute nightmare for OLTP databases.
In our last Dunnington review, we showed our latest virtualization test which includes 4 concurrent OLTP ("Sysbench") tests on four separate MySQL 5.1.23 databases in four ESX virtual machines. We were fairly confident that our 6 disks RAID-0 for data and 1 separate disk for logging were capable of keeping up. After all, each disk is a 300 GB Cheetah Seagate at 15000 rpm, probably one of the fastest (mechanical) disks on this planet as it can deliver up to 400 I/O per second (and 125 MB/s sequential data rate).
But it is better to be safe than to be sorry. We did extensive monitoring with IOstat (on a "native" SLES 10 SP2) and found the following numbers on the disk that performs the logging transactions:
- queue length is about 0.22 (More than 2 indicates that the harddisk can not keep up)
- typical average I/O latency is 0.23 ms (90%), with about 10% spikes of 7 to 12 ms (we measure the average over the past 2 seconds)
That reassured us that our transaction log disk was not a bottleneck. On a "normal" SLES 10 SP2 we achieved 1400 tr/s on a quad core (an anonymous CPU for now ;-). But Anand's article really got us curious and we replaced our mighty Cheetah disk with the Intel x25-M SSD (80 GB). All of a sudden we achieved 1900 tr/s! No less than 35% more transactions, just by replacing the disk that holds the log with the fastest SSD of the moment. That is pretty amazing if you consider that there is no indication whatsoever that we were bottlenecked by our log disk.
So we had to delve a little deeper. I first thought that as long as the harddisk is not the bottleneck, the number of transactions would be more or less the same with a faster disk. It turned out that I was somewhat wrong.
In MySQL each user thread can issue a write when the transaction is commited . More importantly is a completely serial, there doesn't seem to be a separate log I/O thread which would allow our user thread to "fire" a disk operation "and forget". As we want to be fully ACID compliant our database is configured with
innodb_flush_log_at_trx_commit = 1
So after each transaction is committed, there is a "pwrite" first, then followed by a flush to the disk. So the actual transactions performance is also influenced by the disk write latency even if the disk is nowhere near it's limits.
We still have to investigate this further but this seems to go a bit against the typical sizing advice that is given for OLTP databases: make sure your log disks achieve a certain numbers of I/Os or put otherwise: "make sure you have enough spindles". That doesn't seem to paint the complete picture: as each write to disk action seems to be in the "critical speed path" of your transaction, each individual access latency seems to influence performance.
We monitored the same Sysbench benchmark on our Intel X25-M disk:
- Queue length is lower: 0.153 (but 0.2 was already very low)
- typical access latency: an average 0.1 with very few spikes of 0.5 ms.
- 1900 instead of 1400 tr/s
So our conclusion so far seems to be that in case of MySQL OLTP, sizing for IO/s seems to be less important than the individual write latency. To put it more blunt: in many cases even tens of of spindles will not be able to beat one SSD as each individual disk spindle has a relatively high latency. We welcome your feedback!