Much of our work as systems administrators and application developers relates to the performance of systems and applications. The holy grail of all things computer-related is often removing bottlenecks and having software run as fast as possible. Where data access is concerned, we try to cache as much as possible, as close to the CPU as possible. With database systems, this is often achieved by filling the system's RAM with the data that will be accessed most frequently.
Given that hard disk storage sizes are often several orders of magnitude larger than what is possible with RAM, however, we can never cache all of the data. We have to accept that much of the time we simply have to access the disk. While the advent of solid-state devices means that latencies and data throughput are massively improved, the bulk of our data storage is still on disk-based media, and it is worth our while to improve its performance through careful benchmarking and tuning.
This is a guide on how to do this using some common Linux tools and methods.
Identify Your Goals
It is all very well to want to "optimise" performance, but unless you have a baseline measurement and some idea of what specifically you want to improve, you are no better off than stumbling around in the dark: tuning will be haphazard and unlikely to achieve what you want.
In our example case, we have a client who runs a MySQL database on a pair of High Availability systems with DRBD. Since using DRBD incurs some slight performance penalties (depending on how it is set up), they wanted to ensure the performance of certain queries was maximised. Specifically, because MySQL replication was being used, queries that result in a write (INSERT, UPDATE, etc.) needed to be as efficient as possible.
Identify a Test Case
Once you have identified your performance goal, you need to ensure you can measure when you have achieved that goal. In our case we constructed MySQL queries which would test the type of database behaviour we wanted to optimise. Script or code your test case and have it print performance data automatically. You want to make running your test as easy as possible, and return usable results in a reasonably short amount of time.
For this purpose we created a small Perl script that performed the queries we wanted and then printed diagnostic output after the run. It took a single parameter - the number of iterations desired. Essentially the steps performed are:
- create a new table with the required columns
- insert a row into the table
- enter the loop
- perform an update on the existing row of the table
- record timing data for this iteration of the loop
- return to the beginning of the loop until we have executed the desired number of iterations
- calculate performance data based on times recorded
When run, the output of our script resembled the following:
# Start time: Fri Feb 27 12:18:38 2009
Iterations          : 10000
Total time          : 3.589983
Initialisation time : 0.058451
Finalisation time   : 0.058451
Benchmark loop      : 3.530613
Benchmark sum       : 3.451372
Min/Average/Max     : 0.000191/0.000345/0.240850
Median              : 0.000293
Standard deviation  : 0.002766
# Finish time: Fri Feb 27 12:18:41 2009
This gives us a reasonably scientific record of current performance. Running the script again (with an appropriate number of iterations) after making calculated performance tunings should reveal whether the tuning has been successful.
The full script can be downloaded here - benchmark-mysql.pl
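For illustration, the core of such a benchmark loop can be sketched in Python. This is a hypothetical re-implementation of the logic, not the Perl script itself: the timed callable stands in for the UPDATE query, and the function and key names are my own.

```python
import math
import time

def run_benchmark(iterations, operation):
    """Time `operation` once per loop iteration and summarise the timings."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()                      # stand-in for the UPDATE query
        timings.append(time.perf_counter() - start)
    return summarise(timings)

def summarise(timings):
    """Compute the same statistics the script prints: min/avg/max,
    median and (population) standard deviation."""
    n = len(timings)
    ordered = sorted(timings)
    mean = sum(timings) / n
    median = (ordered[n // 2] if n % 2
              else (ordered[n // 2 - 1] + ordered[n // 2]) / 2)
    stddev = math.sqrt(sum((t - mean) ** 2 for t in timings) / n)
    return {
        "iterations": n,
        "total": sum(timings),
        "min": ordered[0], "mean": mean, "max": ordered[-1],
        "median": median, "stddev": stddev,
    }
```

The large gap between the median (0.000293) and the maximum (0.240850) in the output above is exactly why the script records more than a simple average: occasional slow iterations reveal latency spikes that a mean alone would hide.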
When we were testing this environment, we also had a staging environment in a VMware Server VM. At one stage we ran our MySQL benchmark script on the VM, which had a local MySQL database. The results that came back were incredibly fast, and we were staggered that a VM on a moderately-powered machine could outperform a purpose-built database server. The answer was, of course, VMware's aggressive write-caching.
When you are examining your environment for I/O performance enhancements, make sure you know the state of all the unknowns before you begin. If the drives support on-drive write-caching (most do; in fact they generally ship with the write cache turned ON), record the setting before establishing your baseline (although arguably write-caching on the drives should always be disabled). Examine the default OS settings for dirty-buffer flushing and filesystem caches, and record your database cache and buffer settings.
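As a sketch, the kernel's dirty-buffer tunables can be snapshotted before benchmarking begins. The sysctl paths below are the standard Linux ones; note that on-drive write-cache state is not visible here and has to be queried separately (for example with hdparm).

```python
import os

# Kernel dirty-buffer tunables worth recording before setting a baseline.
SYSCTLS = [
    "/proc/sys/vm/dirty_ratio",
    "/proc/sys/vm/dirty_background_ratio",
    "/proc/sys/vm/dirty_expire_centisecs",
    "/proc/sys/vm/dirty_writeback_centisecs",
]

def snapshot_sysctls(paths=SYSCTLS):
    """Return {path: value} for every sysctl file that exists on this host."""
    snapshot = {}
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                snapshot[path] = f.read().strip()
    return snapshot

if __name__ == "__main__":
    for path, value in snapshot_sysctls().items():
        print(f"{path} = {value}")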
This way when you later need to tune settings, you are fully aware of your base operating environment.
We used two Open Source tools for our I/O benchmarking:
- IOZone
- Bonnie++
Both of these tools generate a lot of output from their benchmarking which can be quite hard to understand, so we discuss the output below.
IOZone tends to cover a lot of ground in testing, as in automatic mode it will test a number of data and record lengths with comprehensive tests for each iteration. Output looks like the following:
                                        random   random    bkwd   record  stride
 KB reclen   write rewrite    read  reread    read    write    read  rewrite    read  fwrite frewrite   fread freread
 64      4  138240  998622 2006158 2772930 1828508 1066042 1690338 1452528 2358717  395452  955947 2278628 2662899
 64      8  387461 1087638 3363612 3541098 2662899 1310683 2561267 1638743 3363612  451299 1105556 2772930 3203069
 64     16  418250 1124074 3363612 3541098 2772930 1255511 2662899 1638743 3203069  538156 1279447 3203069 3363612
 64     32  418250 1163036 3541098 2923952 3203069 1562436 2801873 1562436 3363612  710511 1828508 3203069 2772930
 64     64  409946 1188789 3541098 3363612 3541098 1363961 3057153 1336792 3541098  440921 1163036 3363612 3541098
128      4  375222  985435 2673584 2784517 2202044 1133103 2132084 1487977 2132084  383258  969421 2511022 2673584
128      8  410199 1111981 3124872 3359523 2727923 1306873 2608629 1778862 3380677  429563 1067750 3036502 3124872
128     16  430942 1162547 3359523 3445772 3053774 1471662 2905056 1705404 3359523  462100 1155043 3199360 3277486
For benchmarking database systems, large data sizes with small to moderate record lengths are ideal, as they best represent database I/O patterns. Each of the output figures is in KB/s, and the meaning of each is well defined in the IOZone documentation. For our purposes, however, we are most interested in rewrite, random write and record rewrite, as they are good indicators for the type of operations we want to optimise. You should examine the definitions of each test and see which will be the best indicator for the areas you are trying to optimise.
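Picking those columns out of the raw rows is mechanical. A small sketch, with column names taken from the IOZone header above (the function name and ordering are assumptions based on that header):

```python
# Column order matches the IOZone header shown above.
COLUMNS = ["KB", "reclen", "write", "rewrite", "read", "reread",
           "random_read", "random_write", "bkwd_read", "record_rewrite",
           "stride_read", "fwrite", "frewrite", "fread", "freread"]

def parse_iozone_row(line):
    """Split one IOZone data row into a {column: int} dict (values in KB/s)."""
    values = [int(v) for v in line.split()]
    return dict(zip(COLUMNS, values))

# First data row from the output above.
row = parse_iozone_row(
    "64 4 138240 998622 2006158 2772930 1828508 1066042 "
    "1690338 1452528 2358717 395452 955947 2278628 2662899")
# The three figures most relevant to database-style writes:
print(row["rewrite"], row["random_write"], row["record_rewrite"])
```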
IOZone also allows you to output the results in a CSV format, which is useful for importing into a spreadsheet program such as OpenOffice Calc. From there you can generate a graph of the results, which can be more useful for visualising where bottlenecks are being hit. We discovered that random writes weren't particularly good, and we hypothesised that by adding a hardware RAID card with battery-backed write cache, the writes could be reordered for higher throughput and better latency.
Bonnie++ does many similar things to IOZone, but as in any scientific scenario, it is good to take measurements with more than one instrument. The output is formatted in a less verbose manner than IOZone's, but we can achieve much the same result. Here is what the output looks like:
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
dbserver,8G,39720,84,36708,12,24751,7,37151,72,66737,12,305.7,1,16,286,0,+++++,+++,516,0,293,0,+++++,+++,442,20
dbserver,8G,41623,88,35162,12,25144,7,36991,71,65655,11,340.7,1,16,292,0,+++++,+++,509,0,293,0,+++++,+++,437,20
dbserver,8G,43012,91,34331,11,24836,7,36735,71,65201,12,328.8,1,16,283,0,+++++,+++,512,0,293,0,+++++,+++,447,20
Once again we are mostly interested in results to do with random writes. Bonnie++ can also show you the effect of the operations on the CPU which can be useful to gauge what effect different RAID levels or other storage-design choices have had on CPU utilisation.
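Because the output is already CSV, pairing each value with its field name is straightforward. A sketch (the field names come from the Bonnie++ header above; the function name is my own):

```python
import csv
import io

# Field names from the Bonnie++ CSV header shown above.
FIELDS = ("name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,"
          "rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,"
          "num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,"
          "seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,"
          "ran_del,ran_del_cpu").split(",")

def parse_bonnie_line(line):
    """Map one Bonnie++ CSV result line onto its field names.
    '+++'/'+++++' entries (test finished in under 500 ms) are kept as-is."""
    return dict(zip(FIELDS, next(csv.reader(io.StringIO(line)))))

# First result line from the output above.
result = parse_bonnie_line(
    "dbserver,8G,39720,84,36708,12,24751,7,37151,72,66737,12,305.7,1,"
    "16,286,0,+++++,+++,516,0,293,0,+++++,+++,442,20")
print(result["rewrite"], result["rewrite_cpu"])  # rewrite KB/s and its CPU %
```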
Output shown as +++ or +++++ indicates the test completed in less than 500ms; because such a short run cannot be measured accurately, Bonnie++ declines to display a figure.
Make Changes, Retest
Once we had established baselines, we could add in various other factors and see how it affected the performance.
- Since DRBD was being used, we compared the baseline hardware performance against that on a DRBD volume in both disconnected and connected states.
- We used the query test to show the difference in performance with MySQL binary logging enabled and disabled (although it was essential to have it enabled for replication to work).
- The query test was again used to measure the effect of the MySQL setting sync_binlog, which fsyncs after every binary log write (for data safety).
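The cost sync_binlog imposes can be approximated outside MySQL by timing buffered writes against writes followed by fsync() on a scratch file. This is a hypothetical demonstration, not our actual test harness, and absolute numbers will vary enormously with the storage underneath:

```python
import os
import tempfile
import time

def timed_writes(iterations, fsync):
    """Write a small record per iteration, optionally fsync()ing each one,
    and return the total elapsed time in seconds."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, b"binlog-event\n")
            if fsync:
                os.fsync(fd)  # what sync_binlog=1 forces after every write
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)

buffered = timed_writes(200, fsync=False)
synced = timed_writes(200, fsync=True)
print(f"buffered: {buffered:.4f}s  fsync per write: {synced:.4f}s")
```

On rotating disks without a battery-backed write cache, the fsync variant is typically slower by orders of magnitude, which is exactly the latency this case study set out to eliminate.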
Through the benchmarks we had collected, we were able to see that the biggest bottleneck to efficient database writes for the purposes of this case was latency between writes. This was affected directly by the use of DRBD, the MySQL binary log and sync_binlog. We determined that by adding a hardware RAID card with battery-backed write cache, we could eliminate most of the latency induced by waiting for writes to reach the physical disks. Tuning of the network settings and DRBD had been done already but resulted in negligible gains.
Testing hardware was ordered and used to rerun all the benchmarks to verify the planned performance increases. If one thing cannot be stressed enough, it is that you should keep on file all of your benchmarks and the exact hardware and software settings used to generate them. That way, if you later see results that are unexpected or do not meet your requirements, you can determine the exact cause. If everything has been planned correctly, you can use your benchmarks to back up the hypothesis that changes are necessary to enhance performance.