Frequent Database Backups With Xdelta

How frequently do you back up your server, databases and web data? Chances are your answer is "daily", like most other people (and hopefully not "what backups?"). By and large this is sufficient for most web-based operations that don't receive much traffic or don't have frequently changing data. Obviously, though, this isn't sufficient for everybody.

The Problem

If your data changes quite frequently and the cost of losing even an hour's worth of it is significant, you must consider taking backups more frequently than just daily. This raises a few questions:

  • How frequently should we be taking backups?
  • How much downtime of the services is permitted for backing up data?
  • Where do we store the backups?
  • How many backups should we store?

In the scenario described below I will be talking about taking MySQL database backups, but the general procedure could apply equally to just about any type of data.

Answering the questions

Unfortunately with backups, just about every problem you need to deal with will require a tradeoff of some sort. Generally you will be trading off your requirements against extra hardware and cost, and in some cases against allowable downtime for the services. Here are some examples.

Frequency

If you want frequent backups, the data you are backing up has to be in a completely consistent state for the entire backup window. If this is static website data, that is generally no problem. Since we are talking about MySQL data, you will need to take an exclusive lock over the entire database for the duration of the backup, or use some other mechanism such as LVM snapshots (which are strongly recommended).
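
As a minimal illustration (and not the script we ended up using, which appears later), you can let mysqldump itself take and hold a global read lock for the duration of the dump; the options and output file here are purely illustrative:

# mysqldump takes a global read lock itself and holds it for the whole dump,
# blocking writes for the duration (options and output file are illustrative)
mysqldump --all-databases --lock-all-tables | gzip -9 > all-databases.sql.gz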

Downtime

The larger the dataset, the longer the backups will take. If you want to take backups every 15 minutes but a single backup takes 16 minutes, the system obviously cannot work. You will need to spend money on faster hardware, or reorganise your dataset so that it takes up less space and therefore less time to back up.

Backup Storage

Frequent backups of a large dataset take up space. If you are taking backups every 15 minutes and each resulting backup takes up 5GB, you are looking at almost 500GB of storage for a single day's worth of backups (and that's not even including the temporary space required while the backups are being created). For SCSI storage, where we are still limited to 300GB drives, this presents a problem, as it drives the hardware requirements up significantly.

You can see a common thread in all of these problems with taking frequent backups - you are weighing costs against costs: the cost of losing your data versus the cost of keeping it backed up. Consider, though, that if you stand to lose $1000/hr from lost data, an entire day's worth of lost data costs you $24,000 - well in excess of even a very well equipped premium rackmount server with a full complement of SCSI storage. It is definitely worthwhile putting significant research and preparation into your backup strategy.

Solving the problem with Xdelta

In one instance where we were committed to taking frequent backups of a MySQL database, we decided to use Xdelta. This tool computes an efficient differential (delta) between binary files, and it automatically handles gzipped input (which is convenient if you are compressing your backups).
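
As a rough illustration of how it is invoked (the file names here are only examples), creating a delta against a gzipped source looks like this:

# Create a delta that transforms full.sql.gz into current.sql; xdelta3
# transparently decompresses the gzipped source (file names are examples)
xdelta3 -e -s full.sql.gz current.sql changes.xdelta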

Because we wanted to maintain uptime of the MySQL service, we opted to have a second machine replicating the databases from the master database server, and only ever take backups from the slave. In this way, we are able to lock our databases to take consistent backups without affecting the availability or speed of the main database server (aside from the very minor overheads involved in sending replication data).

In this example we have not used LVM snapshots but it would be trivial to add support for them. Using LVM snapshots means you only have to lock the databases against writes for a fraction of a second while the snapshot is created.
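
For reference, a rough sketch of the LVM side of that approach is below, assuming the MySQL data lives on a logical volume named /dev/vg0/mysql (names and sizes are illustrative). The FLUSH TABLES WITH READ LOCK would be held by a separate connection, as the script later in this article does, but only while lvcreate runs:

# Take a copy-on-write snapshot of the MySQL volume (names/sizes illustrative);
# the database only needs to be locked against writes while lvcreate runs
lvcreate --snapshot --size 5G --name mysql-snap /dev/vg0/mysql

# Mount the snapshot read-only and back it up at leisure
mkdir -p /mnt/mysql-snap
mount -o ro /dev/vg0/mysql-snap /mnt/mysql-snap
tar czf /backup/mysql-files.tar.gz -C /mnt/mysql-snap .

# Discard the snapshot once the backup is complete
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap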

Backup jobs

We have a cron fragment set up which controls the running of the different backup jobs:

  • A full backup, taken once every day
  • An xdelta generated every 15 minutes between the last full backup and a fresh dump of the database

This allows us to restore to within 15 minutes of any point in the day, simply by applying the appropriate xdelta to the last full backup (a restore sketch is shown after the cron fragment below).

# xdeltas every 15 minutes
*/15 1-23 * * *            backup  /usr/local/sbin/db-export-mysqldelta
15,30,45 0 * * *           backup  /usr/local/sbin/db-export-mysqldelta

# full backup once daily
0 0 * * *               backup  /usr/local/sbin/db-export-mysqldelta --daily
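
A restore is then just a matter of applying the chosen xdelta to that day's full backup and loading the result back into MySQL. A rough sketch, using file names in the style of the directory listing shown further below:

# Rebuild the dump as it stood at the chosen 15-minute mark, then load it;
# xdelta3 decompresses the gzipped full backup itself (file names illustrative)
cd /backup
xdelta3 -d -s 2009-02-19.gz 200902190416.xdelta restored.sql
mysql < restored.sql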

Metadata

Since we have replication in place, if a restoration is performed on the master to a previous point in time, we will most likely have to wipe the slaves and start again. However, the backup is also useful for initialising a new slave. Every new MySQL slave needs a full copy of the data from the master, plus a binary log name and position from which to start replicating.

Each slave (including the one we take backups on) reports the binary log name and position it has reached on the master in its slave status data (retrieved with the command SHOW SLAVE STATUS). The Relay_Master_Log_File and Exec_Master_Log_Pos items indicate which binary log name and position the slave had executed up to at the time the databases were locked. We record these as metadata alongside the xdelta differential backups so that we know where a new slave should start replicating from.

This alleviates load on the master, since a new replicating slave can be initialised completely independently of it.
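
A rough sketch of bringing up a new slave from a restored dump plus the recorded metadata is below; the binlog co-ordinates are taken from the example metadata file shown later, while the host and credentials are purely illustrative:

# Load the restored dump, then point the new slave at the recorded co-ordinates;
# host, user and password are illustrative, the log file and position come from
# the .metadata file written alongside the xdelta
mysql < restored.sql
mysql <<'EOF'
CHANGE MASTER TO
    MASTER_HOST='master.example.com',
    MASTER_USER='repl',
    MASTER_PASSWORD='replpass',
    MASTER_LOG_FILE='mysql-bin.000266',
    MASTER_LOG_POS=199674912;
START SLAVE;
EOF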

Xdelta script

We found that, since we need to hold the database lock open while the backup is being taken, it is easiest to use the Perl DBI module to hold the lock while we fire off system commands to take the backups. Thus we created a Perl script to run the backups.

Essentially what the following script does is:

  • If the --daily flag has been specified, take a lock on the database, record the replication position, and perform a straight mysqldump of all databases into a gzipped file.

  • Otherwise:
    • take a database lock
    • record the binary log name and position the slave is up to on the master
    • dump the database
    • release the database lock
    • run xdelta using the last full backup as the source and the dump just taken as the destination
    • record the metadata in a file with the same timestamp as the xdelta just taken

Note: this version of the script does not use LVM snapshots.

#!/usr/bin/perl
#
# The local database is connected to as the "backup" user with options from
# /home/backup/.my.cnf

use strict;
use warnings;

use DBI;
use Getopt::Long;
use Date::Format;       # better date/time formatting than Time::Local
use Switch '__';        # Allow case statements. Bonus points for anime smiley.

sub usage {
        print <<USAGE;
Usage:
$0 [--daily]

--daily             Signals a standard gzipped mysqldump of all databases to
                    take place. Without this flag an uncompressed backup is
                    taken and an xdelta generated between it and the latest
                    gzipped backup.

USAGE
}

sub debug (@) {
  print @_ if -t STDOUT;
}

sub connect_db {
        eval {
                # Connect the local database
                $main::localdbh = DBI->connect($main::localdsn, "", "", {'RaiseError' => 1});
        };
        if ($@) {
                warn $@;
                cleanup(1);
        };
}

sub setup_local {
        # lock tables
        debug "FLUSH TABLES WITH READ LOCK\n";
        eval {
                $::localdbh->do("FLUSH TABLES WITH READ LOCK");
        };
        if ($@) {
                warn $@;
                cleanup(2);
        };

        # Grab the master binlog name and position
        my $sth;
        eval {
                debug "SHOW SLAVE STATUS\n";
                $sth = $::localdbh->prepare("SHOW SLAVE STATUS");
                $sth->execute;
        };
        if ($@) {
                warn $@;
                cleanup(2);
        };

        # There should be one row of show slave status data, no more, no less           
        my $rows = $sth->rows;
        if ($rows != 1) {
                warn "Error: there appears to be no slave configuration. Exiting.\n";
                $sth->finish;
                cleanup(2);
        }
        else {
                # Grab the data row as a hash
                if (!(defined(my $hash_ref = $sth->fetchrow_hashref))) {
                        warn "Error retrieving slave status data: ", $DBI::errstr;
                        $sth->finish;
                        cleanup(2);
                }
                else {
                        $::binlogname = $$hash_ref{'Relay_Master_Log_File'};
                        $::binlogpos = $$hash_ref{'Exec_Master_Log_Pos'};
                        debug("binlogname: $::binlogname\n");
                        debug("binlogpos: $::binlogpos\n");
                }
        }
        $sth->finish;
}

sub dump_data {
        # perform an uncompressed mysqldump of our local data
        eval {
                debug "/usr/bin/mysqldump --opt --skip-lock-tables -q -c -A -Q > $::backupdir/current.sql\n";
                system("/usr/bin/mysqldump --opt --skip-lock-tables -q -c -A -Q > $::backupdir/current.sql");
        };
        if ($@) {
                warn $@;
                cleanup(2);
        };

        # unlock tables
        eval {
                debug "UNLOCK TABLES\n";
                $::localdbh->do("UNLOCK TABLES");
        };
        if ($@) {
                warn "Error unlocking tables: $@";
        };

        # Run a cleanup of the directory
        system("/usr/sbin/tmpwatch --mtime $::cleanuptime $::backupdir/");

        # Determine latest full dump from master
        my @files;
        eval {
                opendir DIR, $::backupdir;
                @files = grep { /^\d{4}-\d{2}-\d{2}\.gz$/ && -f "$::backupdir/$_" } readdir DIR;
                @files = sort(@files);
                closedir DIR;
        };
        if ($@) {
                warn $@;
                cleanup(2);
        };

        if ($#files < 0) {
                warn "No dumps from the master to process xdeltas against. Will keep the current full backup.\n";

                eval {
                        my $newname = time2str("%Y-%m-%d", time);
                        debug "mv $::backupdir/current.sql $::backupdir/$newname\n";
                        rename "$::backupdir/current.sql", "$::backupdir/$newname";
                        debug "/bin/gzip -9 $::backupdir/$newname\n";
                        system("/bin/gzip -9 $::backupdir/$newname");
                };
                if ($@) {
                        warn $@;
                        cleanup(2);
                };
        
                cleanup(2);
        }

        # list of gzipped files is now sorted in ascending order, the last one is most recent
        my $backupfile = "$files[$#files]";

        # save the timestamp so we have consistency between the xdelta and metadata
        my $timestamp = time2str("%Y%m%d%H%M", time);

        # xdelta against it.
        # need to change the environment TMPDIR as xdelta uses /tmp by default which is nowhere
        # near big enough. use the mysqlbackup dir as it should have sufficient space.
        eval {
                debug "/usr/bin/env TMPDIR=$::backupdir /usr/bin/xdelta3 -q -e -s $::backupdir/$backupfile $::backupdir/current.sql $::backupdir/$timestamp.xdelta\n";
                system("/usr/bin/env TMPDIR=$::backupdir /usr/bin/xdelta3 -q -e -s $::backupdir/$backupfile $::backupdir/current.sql $::backupdir/$timestamp.xdelta");
        };
        if ($@) {
                warn $@;
                cleanup(2);
        };
        
        # delete uncompressed dump
        debug "unlinking $::backupdir/current.sql\n";
        unlink "$::backupdir/current.sql";
        
        # Write out the metadata
        debug "writing out file $::backupdir/$timestamp.metadata\n";
        debug "xdelta against: $backupfile\n";
        open META, ">", "$::backupdir/$timestamp.metadata" or warn "Error opening metadata file: $!\n";
        print META "binlogname: $::binlogname\n";
        print META "binlogpos: $::binlogpos\n";
        print META "xdelta against: $backupfile\n";
        close META;
}

sub conclude_local {
        # unlock tables
        debug "UNLOCK TABLES\n";
        $::localdbh->do("UNLOCK TABLES");
}

sub disconnect_db {
        # Disconnect the database connections
        if (defined($::localdbh)) { $::localdbh->disconnect() or warn "Error disconnecting local database: ", $DBI::errstr; };
}

sub cleanup {
        # Cleanup actions. Cascaded depending on point of failure.
        switch (shift) {
                case __ > 1 { conclude_local(); next; }
                case __ > 0 { disconnect_db(); }
        }
        system("rm -f $::lockfile");
        exit 1;
}

sub run_daily {
        # Standard daily backup run
        eval {
                # Run a cleanup of the directory
                system("/usr/sbin/tmpwatch --mtime $::cleanuptime $::backupdir/");

                debug "/usr/bin/mysqldump --opt --skip-lock-tables -q -c -A -Q | /bin/gzip -9 > $::backupdir/`/bin/date -I`.gz\n";
                system("/usr/bin/mysqldump --opt --skip-lock-tables -q -c -A -Q | /bin/gzip -9 > $::backupdir/`/bin/date -I`.gz");
        };
        if ($@) {
                warn $@;
                system("/bin/rm -f $::backupdir/`/bin/date -I`.gz");
                unlink "$::lockfile";
                exit(1);
        };
}

# Get params
my $daily = '';
if (!GetOptions('daily' => \$daily)) {
        usage();
        exit(1);
}

our $backupdir = "/backup/";
our $lockfile = "/tmp/db-export-mysqlxdelta.lock";

# Only keep backups around for this many hours
our $cleanuptime = "44";

# Ensure we don't run concurrently with ourselves
debug("/usr/bin/lockfile -r 0 $lockfile\n");
system("/usr/bin/lockfile -r 0 $lockfile") && die "Can't lock $lockfile. Another instance must be running or a stale lockfile was left around.\n";

# Set up the DSNs to the databases. Local database connects as "backup" using default options.
our $localdsn = "DBI:mysql:mysql;mysql_read_default_file=/home/backup/.my.cnf;mysql_connect_timeout=5";
our $localdbh;

# Variables for storing the master binlog name and position
our $binlogname;
our $binlogpos;

# Signal a daily backup to be run.
if ($daily) {
        connect_db();
        setup_local();
        run_daily();

        # Write out the metadata
        my $timestamp = time2str("%Y%m%d%H%M", time);
        debug "writing out file $::backupdir/$timestamp.metadata\n";
        debug "fullbackup at $timestamp\n";
        open META, ">", "$::backupdir/$timestamp.metadata" or warn "Error opening metadata file: $!\n";
        print META "binlogname: $::binlogname\n";
        print META "binlogpos: $::binlogpos\n";
        print META "fullbackup at: $timestamp\n";
        close META;

        conclude_local();
        disconnect_db();

        unlink "$lockfile";
        exit(0);
}

# Make the connection
connect_db();

# Setup the local server
setup_local();

# Backup the database, perform xdeltas etc
dump_data();

# Disconnect from both databases
disconnect_db();

# Remove the lockfile and exit
unlink "$lockfile";
exit(0);

We end up with a directory of differential backups and metadata files, and one full database dump per day. You can see that the xdelta files are significantly smaller than the original database dump.

...
-rw-r--r--  1 backup backup  917591226 Feb 18 00:16 2009-02-18.gz
-rw-r--r--  1 backup backup         80 Feb 19 02:22 200902190215.metadata
-rw-r--r--  1 backup backup     686793 Feb 19 02:22 200902190215.xdelta
-rw-r--r--  1 backup backup         80 Feb 19 03:21 200902190314.metadata
-rw-r--r--  1 backup backup     896541 Feb 19 03:21 200902190314.xdelta
-rw-r--r--  1 backup backup         80 Feb 19 04:24 200902190416.metadata
-rw-r--r--  1 backup backup    1114947 Feb 19 04:24 200902190416.xdelta
...

The metadata files look like this:

binlogname: mysql-bin.000266
binlogpos: 199674912
xdelta against: 2009-02-19.gz

Conclusion

We have now drastically reduced the disk space taken up by database backups, which is a significant factor in maintaining frequent backups of critical data.

To improve upon this, you could implement any of the following:

  • Additional disk spindles for the backup storage directory, so that it sits on separate storage to the database. This will increase the speed of the dump and xdelta operations.
  • LVM snapshots, as discussed above.
  • Solid-state storage, which will vastly increase the speed of all data operations.
  • With large amounts of RAM, you could potentially run the xdelta operations very quickly on tmpfs storage (depending on your database size); a sketch of this is shown below.
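
As a rough sketch of the last idea, xdelta's temporary directory can simply be pointed at a RAM-backed filesystem (the mount point and size are illustrative and assume enough free RAM):

# Mount a RAM-backed filesystem and point xdelta's TMPDIR at it
mount -t tmpfs -o size=8G tmpfs /mnt/xdelta-tmp
TMPDIR=/mnt/xdelta-tmp xdelta3 -q -e -s /backup/2009-02-19.gz /backup/current.sql /backup/current.xdelta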

