Frequent Database Backups With Xdelta
How frequently do you backup your server, database, web data? Chances are your answer is "daily" like most other people (and hopefully not "what backups?"). By and large this is sufficient for most web-based operations that don't receive much traffic or don't have frequently changing data. Obviously though this isn't sufficient for everybody.
Contents
The Problem
If your data is changing quite frequently and the cost of losing even an hour's worth of data is significant you must consider taking more frequent backups than just daily. This raises a few questions:
- How frequently should we be taking backups?
- How much downtime of the services are permitted for backing up data?
- Where do we store the backups?
- How many backups should we store?
In the scenario I will describe below I will be talking about taking MySQL database backups, but the general procedure could apply equally to just about any type of data.
Answering the questions
Unfortunately with backups, just about every problem that you'll need to deal with will require a tradeoff of some sort. Generally you will be trading off your requirements with extra hardware and cost, and in some cases allowable downtime for the services. Here are some examples.
Frequency
If you want frequent backups, the data that you must be backing up has to be in a completely consistent state during the entire backup window. If this is static website data, that is generally no problem. Given we are talking about MySQL data, you will need to take an exclusive lock over the entire database for the duration of the backup, or use some other mechanism like LVM snapshots (which is strongly recommended).
Downtime
The larger the dataset, the longer the backups will take. If you want to take backups every 15 minutes but a backup takes 16 minutes obviously this is an unmaintainable system. You will need to spend money on faster hardware, or reorganise your dataset to take less space and thus take less time to backup.
Backup Storage
Frequent backups of a large dataset take up space. If you are taking backups every 15 minutes and the resulting backup takes up 5GB, you are looking at almost 500GB of storage just for a single day's worth of backups (and that's not even including temporary space required during the creation of the backups). For SCSI storage where we are still limited to 300GB drives this presents a problem as it drives the hardware requirements up significantly.
You can see a common thread in all of these problems with taking frequent backups - you are weighing costs against costs; the cost of losing your data versus the cost of keeping it backed up. Consider though that if you would stand to lose $1000/hr from lost data, for an entire day's worth of data you have lost $24,000 - well in excess of even a very well equipped premium rackmount server with a full complement of SCSI storage. It is definitely worthwhile putting significant research and preparation into your backup strategy.
Solving the problem with Xdelta
In one instance where we were committed to taking frequent backups of a MySQL database, we decided to use Xdelta. This tool takes an efficient differential of binary files, also working automatically with gzipped files (which is convenient if you are compressing your backups).
Because we wanted to maintain uptime of the MySQL service, we opted to have a second machine replicating the databases from the master database server, and only ever take backups from the slave. In this way, we are able to lock our databases to take consistent backups without affecting the availability or speed of the main database server (aside from the very minor overheads involved in sending replication data).
In this example we have not used LVM snapshots but it would be trivial to add support for them. Using LVM snapshots means you only have to lock the databases against writes for a fraction of a second while the snapshot is created.
Backup jobs
We have a cron fragment set up which controls the running of the different backup jobs:
- A full backup taken every day
- Xdeltas generated between the last full backup and the current database backup, every 15 minutes.
This allows us to point-in-time restore to within the last 15 minutes of the day just by recreating the backup from the last full backup and the most recent xdelta.
# xdeltas every 15 minutes */15 1-23 * * * backup /usr/local/sbin/db-export-mysqldelta 15,30,45 0 * * * backup /usr/local/sbin/db-export-mysqldelta # full backup once daily 0 0 * * * backup /usr/local/sbin/db-export-mysqldelta --daily
Metadata
Since we have replication in place, if a restoration is performed on the master to a previous point in time we will most likely have to wipe the slaves and start again. However the backup is also useful for initialising a new slave. Every new MySQL slave needs a full copy of the data from the master and a binary log name and position from which to start replicating.
Each slave (including the one we are taking backups on) shows you the binary log name and position it is up to on the master in its slave status data (retrieved using the command SHOW SLAVE STATUS). The items Relay_Master_Log_File and Exec_Master_Log_Pos show you which binary log name and position the slave was up to at the time you issued a lock on the databases. We can record this as metadata with the xdelta differential backups so that we know where to start a new slave from when it begins replication.
This alleviates load on the master by allowing us to initialise a new replicating slave completely independently of the master.
Xdelta script
We found that since we need to hold the lock of the database open while the backup is being taken, it is easiest to use the Perl DBI libraries to hold the lock while we fire off system commands to take the backups. Thus we created a Perl script to run the backups.
Essentially what the following script does is:
If the --daily flag has been specified, take a lock of the database and perform a straight mysqldump into a gzipped file.
- Otherwise:
- take a database lock
- record the binary log name and position the slave is up to on the master
- dump the database
- release the database lock
- run xdelta using the last full backup as the source and the dump just taken as the destination
- record the metadata in a file with the same timestamp as the xdelta just taken
Note: this version of the script does not use LVM snapshots.
1 #!/usr/bin/perl
2 #
3 # The local database is connected to as the "backup" user with options from
4 # /home/backup/.my.cnf
5
6 use strict;
7 use warnings;
8
9 use DBI;
10 use Getopt::Long;
11 use Date::Format; # better date/time formatting than Time::Local
12 use Switch '__'; # Allow case statements. Bonus points for anime smiley.
13
14 sub usage {
15 print <<USAGE
16 Usage:
17 $0 [--daily]
18
19 --daily Signals a standard gzipped mysqldump of all databases to
20 take place. Without this flag an uncompressed backup is
21 taken and an xdelta generated between it and the latest
22 gzipped backup.
23
24 USAGE
25 }
26
27 sub debug (@) {
28 print @_ if -t STDOUT;
29 }
30
31 sub connect_db {
32 eval {
33 # Connect the local database
34 $main::localdbh = DBI->connect($main::localdsn, "", "", {'RaiseError' => 1});
35 };
36 if ($@) {
37 warn $@;
38 cleanup(1);
39 };
40 }
41
42 sub setup_local {
43 # lock tables
44 debug "FLUSH TABLES WITH READ LOCK\n";
45 eval {
46 $::localdbh->do("FLUSH TABLES WITH READ LOCK");
47 };
48 if ($@) {
49 warn $@;
50 cleanup(2);
51 };
52
53 # Grab the master binlog name and position
54 my $sth;
55 eval {
56 debug "SHOW SLAVE STATUS\n";
57 $sth = $::localdbh->prepare("SHOW SLAVE STATUS");
58 $sth->execute;
59 };
60 if ($@) {
61 warn $@;
62 cleanup(2);
63 };
64
65 # There should be one row of show slave status data, no more, no less
66 my $rows = $sth->rows;
67 if ($rows != 1) {
68 warn "Error: there appears to be no slave configuration. Exiting.\n";
69 $sth->finish;
70 cleanup(2);
71 }
72 else {
73 # Grab the data row as a hash
74 if (!(defined(my $hash_ref = $sth->fetchrow_hashref))) {
75 warn "Error retrieving slave status data: ", $DBI::errstr;
76 $sth->finish;
77 cleanup(2);
78 }
79 else {
80 $::binlogname = $$hash_ref{'Relay_Master_Log_File'};
81 $::binlogpos = $$hash_ref{'Exec_Master_Log_Pos'};
82 debug("binlogname: $::binlogname\n");
83 debug("binlogpos: $::binlogpos\n");
84 }
85 }
86 $sth->finish;
87 }
88
89 sub dump_data {
90 # perform an uncompressed mysqldump of our local data
91 eval {
92 debug "/usr/bin/mysqldump --opt --skip-lock-tables -q -c -A -Q > $::backupdir/current.sql\n";
93 system("/usr/bin/mysqldump --opt --skip-lock-tables -q -c -A -Q > $::backupdir/current.sql");
94 };
95 if ($@) {
96 warn $@;
97 cleanup(2);
98 };
99
100 # unlock tables
101 eval {
102 debug "UNLOCK TABLES\n";
103 $::localdbh->do("UNLOCK TABLES");
104 };
105 if ($@) {
106 warn "Error unlocking tables: $@";
107 };
108
109 # Run a cleanup of the directory
110 system("/usr/sbin/tmpwatch --mtime $::cleanuptime $::backupdir/");
111
112 # Determine latest full dump from master
113 my @files;
114 eval {
115 opendir DIR, $::backupdir;
116 @files = grep { /\d{4}-\d{2}-\d{2}.gz/ && -f "$::backupdir/$_" } readdir DIR;
117 @files = sort(@files);
118 closedir DIR;
119 };
120 if ($@) {
121 warn $@;
122 cleanup(2);
123 };
124
125 if ($#files < 0) {
126 warn "No dumps from the master to process xdeltas against. Will keep the current full backup.\n";
127
128 eval {
129 my $newname = time2str("%Y-%m-%d", time);
130 debug "mv $::backupdir/current.sql $::backupdir/$newname\n";
131 rename "$::backupdir/current.sql", "$::backupdir/$newname";
132 debug "/bin/gzip -9 $::backupdir/$newname\n";
133 system("/bin/gzip -9 $::backupdir/$newname");
134 };
135 if ($@) {
136 warn $@;
137 cleanup(2);
138 };
139
140 cleanup(2);
141 }
142
143 # list of gzipped files is now sorted in ascending order, the last one is most recent
144 my $backupfile = "$files[$#files]";
145
146 # save the timestamp so we have consistency between the xdelta and metadata
147 my $timestamp = time2str("%Y%m%d%H%M", time);
148
149 # xdelta against it.
150 # need to change the environment TMPDIR as xdelta uses /tmp by default which is nowhere
151 # near big enough. use the mysqlbackup dir as it should have sufficient space.
152 eval {
153 debug "/usr/bin/env TMPDIR=$::backupdir /usr/bin/xdelta3 -q -e -s $::backupdir/$backupfile $::backupdir/current.sql $::backupdir/$timestamp.xdelta\n";
154 system("/usr/bin/env TMPDIR=$::backupdir /usr/bin/xdelta3 -q -e -s $::backupdir/$backupfile $::backupdir/current.sql $::backupdir/$timestamp.xdelta");
155 };
156 if ($@) {
157 warn $@;
158 cleanup(2);
159 };
160
161 # delete uncompressed dump
162 debug "unlinking $::backupdir/current.sql\n";
163 unlink "$::backupdir/current.sql";
164
165 # Write out the metadata
166 debug "writing out file $::backupdir/$timestamp.metadata\n";
167 debug "xdelta against: $backupfile\n";
168 open META, ">$::backupdir/$timestamp.metadata";
169 print META "binlogname: $::binlogname\n";
170 print META "binlogpos: $::binlogpos\n";
171 print META "xdelta against: $backupfile\n";
172 close META;
173 }
174
175 sub conclude_local {
176 # unlock tables
177 debug "UNLOCK TABLES\n";
178 $::localdbh->do("UNLOCK TABLES");
179 }
180
181 sub disconnect_db {
182 # Disconnect the database connections
183 if (defined($::localdbh)) { $::localdbh->disconnect() or warn "Error disconnecting local database: ", $DBI::errstr; };
184 }
185
186 sub cleanup {
187 # Cleanup actions. Cascaded depending on point of failure.
188 switch (shift) {
189 case __ > 1 { conclude_local(); next; }
190 case __ > 0 { disconnect_db(); }
191 }
192 system("rm -f $::lockfile");
193 exit 1;
194 }
195
196 sub run_daily {
197 # Standard daily backup run
198 eval {
199 # Run a cleanup of the directory
200 system("/usr/sbin/tmpwatch --mtime $::cleanuptime $::backupdir/");
201
202 debug "/usr/bin/mysqldump --opt --skip-lock-tables -q -c -A -Q | /bin/gzip -9 > $::backupdir/`/bin/date -I`.gz\n";
203 system("/usr/bin/mysqldump --opt --skip-lock-tables -q -c -A -Q | /bin/gzip -9 > $::backupdir/`/bin/date -I`.gz");
204 };
205 if ($@) {
206 warn $@;
207 system("/bin/rm -f $::backupdir/`/bin/date -I`.gz");
208 unlink "$::lockfile";
209 exit(1);
210 };
211 }
212
213 # Get params
214 my $daily = '';
215 if (!GetOptions('daily' => \$daily)) {
216 usage();
217 exit(1);
218 }
219
220 our $backupdir = "/backup/";
221 our $lockfile = "/tmp/db-export-mysqlxdelta.lock";
222
223 # Only keep backups around for this many hours
224 our $cleanuptime = "44"
225
226 # Ensure we don't run concurrently with ourselves
227 debug("/usr/bin/lockfile -r 0 $lockfile\n");
228 system("/usr/bin/lockfile -r 0 $lockfile") && die "Can't lock $lockfile. Another instance must be running or a stale lockfile was left around.\n";
229
230 # Set up the DSNs to the databases. Local database connects as "backup" using default options.
231 our $localdsn = "DBI:mysql:mysql;mysql_read_default_file=/home/backup/.my.cnf;mysql_connect_timeout=5";
232 our $localdbh;
233
234 # Variables for storing the master binlog name and position
235 our $binlogname;
236 our $binlogpos;
237
238 # Signal a daily backup to be run.
239 if ($daily) {
240 connect_db();
241 setup_local();
242 run_daily();
243
244 # Write out the metadata
245 my $timestamp = time2str("%Y%m%d%H%M", time);
246 debug "writing out file $::backupdir/$timestamp.metadata\n";
247 debug "fullbackup at $timestamp\n";
248 open META, ">$::backupdir/$timestamp.metadata";
249 print META "binlogname: $::binlogname\n";
250 print META "binlogpos: $::binlogpos\n";
251 print META "fullbackup at: $timestamp\n";
252 close META;
253
254 conclude_local();
255 disconnect_db();
256
257 unlink "$lockfile";
258 exit(0);
259 }
260
261 # Make the connection
262 connect_db();
263
264 # Setup the local server
265 setup_local();
266
267 # Backup the database, perform xdeltas etc
268 dump_data();
269
270 # Disconnect from both databases
271 disconnect_db();
272
273 # Remove the lockfile and exit
274 unlink "$lockfile";
275 exit(0);
We end up with a directory of differential backups and metadata files, and one full database dump per day. You can see that the xdelta files are significantly smaller than the original database dump.
... -rw-r--r-- 1 backup backup 917591226 Feb 18 00:16 2009-02-18.gz -rw-r--r-- 1 backup backup 80 Feb 19 02:22 200902190215.metadata -rw-r--r-- 1 backup backup 686793 Feb 19 02:22 200902190215.xdelta -rw-r--r-- 1 backup backup 80 Feb 19 03:21 200902190314.metadata -rw-r--r-- 1 backup backup 896541 Feb 19 03:21 200902190314.xdelta -rw-r--r-- 1 backup backup 80 Feb 19 04:24 200902190416.metadata -rw-r--r-- 1 backup backup 1114947 Feb 19 04:24 200902190416.xdelta ...
The metadata files look like this:
binlogname: mysql-bin.000266 binlogpos: 199674912 xdelta against: 2009-02-19.gz
Conclusion
We have now drastically reduced the diskspace taken up by database backups, which is a significant factor in maintaining frequent backups of the critical data.
To improve upon this, you could implement any of the following:
- Additional disk spindles for the backup storage directory, so it is located on separate storage to the database. This will increase the speed of dump and xdelta operations.
- LVM Snapshots
- Solid State data storage which will vastly increase the speed of all data operations.
- With large amounts of RAM, you could potentially have xdelta operations running very quickly on tmpfs storage (depending on your database size).
See also:
