I/O Benchmarking

Much of our job as systems administrators and application developers relates to the performance of systems and applications. The holy grail of all things computer-related is often removing bottlenecks and having software run as fast as possible. When it comes to data access, we try to cache as much as possible, as close to the CPU as possible. With database systems, this is often achieved by filling the RAM of the system with as much of the most frequently accessed data as it will hold.

However, given that hard disk capacities are often several orders of magnitude larger than the available RAM, we can never cache all of the data. We have to accept that, a lot of the time, we simply have to access the disk. While the advent of solid-state devices means that latencies and data throughput are massively improved, the bulk of our data still lives on disk-based media, so it is worth our while to improve its performance through careful benchmarking and tuning.

This is a guide to doing just that, using some common Linux tools and methods.

Identify Your Goals

It is all very well to want to "optimise" performance, but unless you have a baseline measurement and some idea of what specifically you want to improve, you are no better off than stumbling around in the dark. Tuning will be haphazard and unlikely to achieve what you want.

In our example case, we have a client who runs a MySQL database on a pair of High Availability systems with DRBD. Since using DRBD incurs some slight performance penalties (depending on how it is set up), they wanted to ensure the performance of certain queries was maximised. Specifically, because MySQL replication was being used, queries that result in a write (INSERT, UPDATE, etc.) needed to be as efficient as possible.

Identify a Test Case

Once you have identified your performance goal, you need to ensure you can measure when you have achieved it. In our case we constructed MySQL queries that would exercise the type of database behaviour we wanted to optimise. Script or code your test case and have it print performance data automatically: you want running your test to be as easy as possible, and to return usable results in a reasonably short time.

For this purpose we created a small Perl script that performed the queries we wanted and then printed diagnostic output after the run. It took a single parameter - the number of iterations desired. Essentially the steps performed are (a stripped-down sketch of the loop follows the list):

  • create a new table with the required columns
  • insert a row into the table
  • enter the loop
  • perform an update on the existing row of the table
  • record timing data for this iteration of the loop
  • return to the beginning of the loop until we have executed the desired number of iterations
  • calculate performance data based on times recorded
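
The following is a minimal sketch of how such a loop might be put together. It is not the full benchmark-mysql.pl; the connection details, table name and column layout are placeholders for illustration only.

#!/usr/bin/perl
# Stripped-down sketch of the benchmark loop (not the real benchmark-mysql.pl).
# Connection details, table name and columns are placeholders.
use strict;
use warnings;
use DBI;
use Time::HiRes qw(gettimeofday tv_interval);
use List::Util qw(min max sum);

my $iterations = shift @ARGV or die "Usage: $0 <iterations>\n";

my $dbh = DBI->connect('DBI:mysql:database=benchmark;host=localhost',
                       'bench', 'secret', { RaiseError => 1 });

# create a new table with the required columns and insert a single row
$dbh->do('DROP TABLE IF EXISTS bench_test');
$dbh->do('CREATE TABLE bench_test (id INT PRIMARY KEY, counter BIGINT)');
$dbh->do('INSERT INTO bench_test (id, counter) VALUES (1, 0)');

my $update = $dbh->prepare('UPDATE bench_test SET counter = counter + 1 WHERE id = 1');

# benchmark loop: time each UPDATE on the existing row
my @times;
for (1 .. $iterations) {
    my $t0 = [gettimeofday];
    $update->execute;
    push @times, tv_interval($t0);
}

# calculate performance data based on the times recorded
printf "Iterations      : %d\n", $iterations;
printf "Benchmark sum   : %f\n", sum(@times);
printf "Min/Average/Max : %f/%f/%f\n", min(@times), sum(@times) / @times, max(@times);

$dbh->disconnect;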

When run, the output of our script resembled the following:

# Start time: Fri Feb 27 12:18:38 2009
Iterations                              : 10000
Total time                              : 3.589983
        Initialisation time             : 0.058451
        Finalisation time               : 0.058451
        Benchmark loop                  : 3.530613
        Benchmark sum                   : 3.451372
                Min/Average/Max         : 0.000191/0.000345/0.240850
                Median                  : 0.000293
                Standard deviation      : 0.002766
# Finish time: Fri Feb 27 12:18:41 2009

This gives us a reasonably scientific record of current performance. Running the script again (with an appropriate number of iterations) after making deliberate performance tunings should reveal whether the tuning has been successful.

The full script can be downloaded here - benchmark-mysql.pl

Eliminate Unknowns

When we were testing this environment, we also had a staging environment in a VMWare Server VM. At one stage we ran our MySQL benchmark script on the VM, which had a local MySQL database. The results that came back were incredibly fast, and left us wondering how a VM on a moderately powered machine could outperform a purpose-built database server. The answer, of course, was VMWare's aggressive write-caching.

When you are examining your environment for I/O performance enhancements, make sure you know the state of all unknowns to begin with. If the drives support on-drive write-caching (and most drives do; in fact, they generally ship with the write cache turned ON), record the setting before establishing your baseline (although arguably write-caching on the drives should always be disabled). Examine default OS settings with respect to dirty-buffer flushing and filesystem caches. Record your database cache and buffer settings.

This way when you later need to tune settings, you are fully aware of your base operating environment.
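
As a rough illustration, the sort of baseline information worth capturing could be gathered with a small script along these lines. The device name and the particular kernel settings shown are assumptions - adjust them to your own environment.

#!/usr/bin/perl
# Record baseline cache-related settings before any tuning.
# /dev/sda and the sysctl names below are examples - adjust for your system.
use strict;
use warnings;

print "Recorded at: ", scalar localtime, "\n";

# On-drive write cache state (hdparm -W with no value only reports the setting)
print "--- drive write cache ---\n";
print scalar qx(hdparm -W /dev/sda);

# Kernel dirty-buffer flushing behaviour
print "--- vm dirty-buffer settings ---\n";
for my $knob (qw(dirty_ratio dirty_background_ratio dirty_expire_centisecs)) {
    open my $fh, '<', "/proc/sys/vm/$knob" or next;
    chomp(my $value = <$fh>);
    print "vm.$knob = $value\n";
    close $fh;
}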

Hardware Benchmarking

We used two Open Source tools for our I/O benchmarking:

  • IOZone
  • Bonnie++

Both of these tools generate a lot of output from their benchmarking, which can be quite hard to understand, so we discuss the output below.

IOZone

IOZone tends to cover a lot of ground in testing: in automatic mode it will test a range of file sizes and record lengths, running a comprehensive set of tests for each combination. Output looks like the following:

                                              random  random    bkwd  record  stride                                   
KB  reclen   write rewrite    read    reread    read   write    read rewrite    read   fwrite frewrite   fread  freread
64       4  138240  998622  2006158  2772930 1828508 1066042 1690338 1452528 2358717   395452   955947 2278628  2662899
64       8  387461 1087638  3363612  3541098 2662899 1310683 2561267 1638743 3363612   451299  1105556 2772930  3203069
64      16  418250 1124074  3363612  3541098 2772930 1255511 2662899 1638743 3203069   538156  1279447 3203069  3363612
64      32  418250 1163036  3541098  2923952 3203069 1562436 2801873 1562436 3363612   710511  1828508 3203069  2772930
64      64  409946 1188789  3541098  3363612 3541098 1363961 3057153 1336792 3541098   440921  1163036 3363612  3541098
128      4  375222  985435  2673584  2784517 2202044 1133103 2132084 1487977 2132084   383258   969421 2511022  2673584
128      8  410199 1111981  3124872  3359523 2727923 1306873 2608629 1778862 3380677   429563  1067750 3036502  3124872
128     16  430942 1162547  3359523  3445772 3053774 1471662 2905056 1705404 3359523   462100  1155043 3199360  3277486

For benchmarking database systems, large data sizes with small to moderate record lengths are ideal, as they best represent database I/O usage. Each of the output figures is in KB/s, and the meaning of each is well defined in the IOZone output. For our purposes, however, we are most interested in rewrite, random write and record rewrite, as they are good indicators for the type of operations we want to optimise. You should examine the definitions of each test and see which will be the best indicator for the areas you are trying to optimise.

IOZone also allows you to output the results in a CSV format, which is useful for importing into a spreadsheet program such as OpenOffice Spreadsheet. From there you can generate a graph of the results, which can be more useful for visualising where bottlenecks are being hit. We discovered that random writes weren't particularly good, and we hypothesised that by adding a hardware RAID card with battery-backed write cache the writes could be reordered for higher throughput and lower latency.
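
For reference, a run like the one above might be driven by a small wrapper such as the following. The size limit, test directory and report filename are assumptions for illustration; check the IOZone documentation for the flags supported by your version.

#!/usr/bin/perl
# Drive an IOZone run and keep a spreadsheet-friendly copy of the results.
# The size limit, test directory and report name are examples only.
use strict;
use warnings;

my $dir     = '/var/lib/mysql-benchmark';   # directory on the volume under test
my $maxsize = '2g';                         # ideally larger than RAM so caching cannot mask disk speed
my $report  = 'iozone-results.xls';

system('iozone',
       '-a',                     # automatic mode: sweep file sizes and record lengths
       '-g', $maxsize,           # upper limit on file size for the sweep
       '-R',                     # produce an Excel-compatible report
       '-b', $report,            # write that report to a file
       '-f', "$dir/iozone.tmp"   # temporary test file location
      ) == 0 or die "iozone failed: $?\n";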

Bonnie++

Bonnie++ does many of the same things as IOZone, but, as in any scientific scenario, it is good to take measurements with more than one instrument. The output is formatted less verbosely than IOZone's, but we can achieve much the same result. Here is what the output looks like:

name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
dbserver,8G,39720,84,36708,12,24751,7,37151,72,66737,12,305.7,1,16,286,0,+++++,+++,516,0,293,0,+++++,+++,442,20
dbserver,8G,41623,88,35162,12,25144,7,36991,71,65655,11,340.7,1,16,292,0,+++++,+++,509,0,293,0,+++++,+++,437,20
dbserver,8G,43012,91,34331,11,24836,7,36735,71,65201,12,328.8,1,16,283,0,+++++,+++,512,0,293,0,+++++,+++,447,20

Once again we are mostly interested in the results to do with random writes. Bonnie++ can also show you the CPU cost of each operation, which is useful for gauging what effect different RAID levels or other storage-design choices have on CPU utilisation.

Output shown as +++ or +++++ means the operation completed in less than 500ms; because the margin of error would be too great, Bonnie++ prefers not to display a figure.
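
For comparison, output like the above could be collected with a wrapper along these lines. The test directory, file size, label and run count are assumptions to adapt to your own system.

#!/usr/bin/perl
# Run Bonnie++ several times and collect its machine-readable (CSV) output.
# Directory, size, label and run count are examples only.
use strict;
use warnings;

my $dir  = '/var/lib/mysql-benchmark';  # directory on the volume under test
my $size = '8G';                        # should comfortably exceed RAM
my $runs = 3;

open my $out, '>', 'bonnie-results.csv' or die "cannot write results: $!\n";
for my $run (1 .. $runs) {
    # -q sends the CSV line to stdout (human-readable output goes to stderr);
    # -u is required when running as root, and that user must be able to write to $dir
    my $csv = qx(bonnie++ -q -d $dir -s $size -n 16 -m dbserver -u nobody);
    print {$out} $csv;
}
close $out;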

Make Changes, Retest

Once we had established baselines, we could add various other factors and see how they affected performance.

  • Since DRBD was being used, we compared the baseline hardware performance against that on a DRBD volume in both disconnected and connected states.
  • We used the query test to show the difference in performance with MySQL binary logging enabled and disabled (although it was essential to have it enabled for replication to work)
  • The query test was again used to test the effects of the MySQL setting sync_binlog, which forces an fsync after every binary log write (for data safety) - a sketch of toggling this between runs is shown below.
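
As an illustration of that last point, sync_binlog can be toggled at runtime between benchmark runs without restarting MySQL. This requires a user with the SUPER privilege; the connection details below are placeholders.

#!/usr/bin/perl
# Re-run the query benchmark with sync_binlog off and on.
# Requires the SUPER privilege; connection details are placeholders.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=benchmark;host=localhost',
                       'bench', 'secret', { RaiseError => 1 });

for my $setting (0, 1) {
    $dbh->do("SET GLOBAL sync_binlog = $setting");
    print "--- sync_binlog = $setting ---\n";
    system('./benchmark-mysql.pl', '10000') == 0
        or warn "benchmark run failed: $?\n";
}

$dbh->disconnect;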

Through the benchmarks we had collected, we were able to see that the biggest bottleneck to efficient database writes for the purposes of this case was the latency between writes. This was affected directly by the use of DRBD, the MySQL binary log and sync_binlog. We determined that by adding a hardware RAID card with battery-backed write cache we could eliminate most of the latency induced by waiting for writes to reach the physical disks. Tuning the network settings and DRBD had already been done, but resulted in negligible gains.

Testing hardware was ordered and used to rerun all the benchmarks to verify the planned performance increases. If one thing cannot be stressed enough, it is that you should keep on file all of your benchmarks along with the exact hardware and software settings used to generate them. That way, if you later see results that are unexpected or do not meet your requirements, you can determine the exact cause. If everything has been planned correctly, you can use your benchmarks to back up the hypothesis that changes are necessary to enhance performance.

