Better Postgres backups with hot backup

If you've read our page on Postgres backups using WAL archiving you might've found it a little daunting. That's why this page exists, for doing what they refer to as a Hot Backup. It doesn't allow a Point-In-Time Recover (PITR) like WAL Archiving does, but it's much simpler, and probably more suitable for most situations.

Better compression

I've already covered this on the WAL Archiving page - in short, we're going to use LZO instead of gzip because it's much faster. If you don't have LZO available you can always fallback to gzip.

Preparing for live backups

Postgres needs to be configured to archive its Write-Ahead Logs (WAL, aka. transaction logs), which can be applied to a baseline backup for point-in-time recovery.

  1. Make a home for the WAL files

    PG_ROOT=/var/lib/postgresql/8.3
    WAL_DIR=$PG_ROOT/wal_archive
    
    mkdir $WAL_DIR
    chmod 700 WAL_DIR
    chown postgres.postgres WAL_DIR
  2. Enable archiving in postgresql.conf

    archive_mode = on
    #archive_command = 'test ! -f $PG_ROOT/backup_in_progress || cp -i %p $WAL_DIR/%f < /dev/null'
    archive_command = 'test ! -f $PG_ROOT/backup_in_progress || lzop --path=$WAL_DIR %p'
  3. Restart postgres

    invoke-rc.d postgresql-8.3 restart

Taking your backup

This is taken practically verbatim from Postgres' docs. Run it as root.

  • /usr/local/sbin/db-export-pgsql-hotbackup

    #! /bin/bash
    #
    # Backup the entire PostgreSQL cluster
    
    case `whoami` in
    root)
            if [ ! -d $PG_BACKUP ]
            then
                    mkdir -m 0700 $PG_BACKUP
            fi
            ;;
    *)
            echo "$0 must be run as root" 1>&2
            exit 1
            ;;
    esac
    
    
    if [ ! -d $PG_BACKUP ]
    then
            echo "ERROR: database backup directory '$PG_BACKUP' does not exist." 1>&2
            exit 1
    fi
    
    if [ ! -f $LZOP ]
    then
            echo "ERROR: lzop compression executable (${LZOP}) not found." 1>&2
            exit 1
    fi
    
    
    PG_ROOT=/var/lib/postgresql/8.3
    WAL_DIR=$PG_ROOT/wal_archive
    PG_BACKUP=/data/pgsqlbackup
    DATE=`date +%Y%m%d-%H%M%S`
    LZOP=/usr/bin/lzop
    
    touch $PG_ROOT/backup_in_progress
    
    sudo -u postgres psql -c "SELECT pg_start_backup('hot_backup');"
    tar -c --use-compress-program $LZOP -vf $PG_BACKUP/${DATE}_postgres_a_data.tar.lzo $PG_ROOT/main/
    sudo -u postgres psql -c "SELECT pg_stop_backup();"
    
    rm $PG_ROOT/backup_in_progress
    sleep 5 # wait for the final WAL file to be archived
    tar -c --use-compress-program $LZOP -vf $PG_BACKUP/${DATE}_postgres_b_wal.tar.lzo $WAL_DIR/
    
    
    if [ -f /etc/redhat-release ]
    then
            /usr/sbin/tmpwatch --mtime 71 $PG_BACKUP
    elif [ -f /etc/debian_version ]
    then
            /usr/sbin/tmpreaper --mtime 71h $PG_BACKUP
    fi
    
    # Run clustering script if it exists
    if [ -x /usr/local/sbin/pg-cluster ]
    then
            /usr/local/sbin/pg-cluster
    fi

Recovery from backup

Follow official docs: http://www.postgresql.org/docs/current/static/continuous-archiving.html :)

According to "24.3.3. Recovering using a Continuous Archive Backup"

  1. Stop the server, if it's running.
  2. If you have the space to do so, copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later. Note that this precaution will require that you have enough free space on your system to hold two copies of your existing database. If you do not have enough space, you need at the least to copy the contents of the pg_xlog subdirectory of the cluster data directory, as it might contain logs which were not archived before the system went down.

    PG_ROOT=/var/lib/postgresql/8.3
    WAL_DIR=$PG_ROOT/wal_archive
    PG_BACKUP=/data/pgsqlbackup
    DATE=`date +%Y%m%d-%H%M%S`
    LZOP=/usr/bin/lzop
    
    mkdir /tmp/pg_deadcopy
    rsync -avx $PG_ROOT/main/ /tmp/pg_deadcopy/
  3. Clean out all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.

    rm -rfv $PG_ROOT/main/*
  4. Restore the database files from your base backup. Be careful that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.

    # this will extract to the root of the FS, which will have the full var/lib/postgresql/8.3/main path along with it
    tar -x --use-compress-program=/usr/bin/lzop -vf 20090818-183907_postgres_a_data.tar.lzo -C /
    tar -x --use-compress-program=/usr/bin/lzop -vf 20090818-183907_postgres_b_wal.tar.lzo -C /
  5. Remove any files present in pg_xlog/; these came from the backup dump and are therefore probably obsolete rather than current. If you didn't archive pg_xlog/ at all, then recreate it, being careful to ensure that you re-establish it as a symbolic link if you had it set up that way before.

    rm -rfv $PG_ROOT/main/pg_xlog/*
  6. If you had unarchived WAL segment files that you saved in step 2, copy them into pg_xlog/. (It is best to copy them, not move them, so that you still have the unmodified files if a problem occurs and you have to start over.)

    rsync -avx /tmp/pg_deadcopy/pg_xlog/ $PG_ROOT/main/pg_xlog/
  7. Create a recovery command file recovery.conf in the cluster data directory (see Recovery Settings). You might also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery has worked.

    echo "restore_command = '/usr/bin/lzop -d $WAL_DIR/%f.lzo -o %p'"
  8. Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode in case of a crash later) and then commence normal database operations.
  9. Inspect the contents of the database to ensure you have recovered to where you want to be. If not, return to step 1. If all is well, let in your users by restoring pg_hba.conf to normal.


See also:

  • Links to Anchor public wiki articles that relate to this article.

References/External Links