Rolling out your website's codebase, easy and pain-free

We work with a lot of web developers; it's the nature of our business. We don't hear from a lot of them, but when there are problems we're often the first port of call. We're here to help!

When it comes to grief, there seem to be two overwhelmingly common causes:

  1. Not using source-control
  2. Not having a good way to update the site's codebase

I won't cover the first point here. For those of you wanting to learn, one of my fair colleagues has written an intro guide to using Subversion, a popular cross-platform tool for managing your software development. We don't care what you use, but for Turing's sake, please use something.

  • CVS (Concurrent Versions System) - old-skool predecessor to Subversion

  • Subversion - Anchor's weapon of choice; flexible, feature-rich centralised version control

  • MS SourceSafe - Microsoft's offering, integrates with Visual Studio

  • Bazaar - distributed revision control system sponsored by Canonical Ltd. of Ubuntu fame

  • Git - distributed revision control system used for Linux kernel development

  • Many, many more...

So what's this about updating your website code? Surely you just upload your files by FTP and be done with it. Not so fast!

What problems are we trying to solve here?

One of the more common requests we get is to restore a website from backups. This is usually because the customer or developer has logged in and accidentally blown away the entire site, or the vital index.php file, or the like. Less commonly, a site may have been "hacked" and defaced. Whatever the case, this takes precious time, and we'll charge you for the privilege.

This is less of a problem if the developer is conscientious and keeps a copy of the site on their machine, but it falls apart if the live site has been edited on the server. Let's face it, it happens. It's just so easy to go in and make a quick change to fix a little bug here and there...

What we're really talking about is a more formalised process. Ideally you'd make changes on a development copy of your site, test them thoroughly, commit them to source-control, then cleanly update the live website. If this is too hard, then it simply doesn't happen. This is where the rollout process comes in. We're big on automation here at Anchor because it means wasting fewer brain cells on stuff that shouldn't matter. Who cares if it's 2am and you're debugging the website through bleary eyes? If it works when you test it, and you always use the same rollout procedure, you can be confident that it'll just work on the live site too. You don't need to worry about that little typo fix you made last Friday, because it's already in source-control rather than made directly on the live site.

Technologies and techniques

The rollout process is generally independent of the source-control software you use. This is a good thing, because it leaves you free to use whatever suits you best. It also means you can do rollouts without a source-control system (something that should generally be avoided).

Manual process with FTP

In the olden days you'd test the site on your own machine, then log in by FTP and upload the files. This is pretty tedious. If your internet connection is only the diameter of a drinking straw, then you probably started editing code on the webserver and hitting Reload in your browser. Before you know it you're not using source-control and you've got no rollout procedure... :(

Scripted FTP

As a marginal improvement, you could script up an FTP solution that logs in for you, then uploads your files. This is still poor because FTP is dumb: it will upload your entire site, hi-res images and all, whether or not anything has changed.

Rsync

This is our preferred method. Rsync is a smart tool that transfers only the differences between sets of files, and it can operate over various underlying transports. Rsync has its own dedicated protocol (spoken to an rsync daemon), but more commonly it's used over SSH. Its support for the ancient rsh isn't worth mentioning.

This is a trivial example of using rsync to update my website:

yui@shirayuki:~$ rsync -av ~/dev/mysite/ yui@www7.anchor.net.au:~/public_html/
building file list ... done
util_regexes.py

sent 1371 bytes  received 15 bytes  2772.00 bytes/sec
total size is 163948  speedup is 118.29

I've only modified one file, so it only uploads that file. Brilliant! We'll work on extending this.
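To make the "only uploads what changed" idea concrete, here's a rough model in plain shell of the per-file decision rsync makes with the --checksum option we use later: a file gets sent if it's missing on the other side or its contents differ. changed_files is our own hypothetical helper, not part of rsync, and it compares two local directories. Real rsync goes further and, thanks to its delta-transfer algorithm, sends only the changed parts of each file.

```shell
#!/bin/sh
# changed_files SRC DEST: hypothetical helper, not part of rsync.
# Prints the relative path of each file under SRC that is missing from
# DEST or whose contents differ: roughly the per-file decision rsync
# makes with --checksum.
changed_files() {
    cf_src=${1%/}     # drop a trailing slash so the prefix strips cleanly
    cf_dest=${2%/}
    find "$cf_src" -type f | while IFS= read -r cf_path; do
        cf_rel=${cf_path#"$cf_src"/}
        # cmp -s prints nothing and exits non-zero if the files differ
        if [ ! -f "$cf_dest/$cf_rel" ] || ! cmp -s "$cf_path" "$cf_dest/$cf_rel"; then
            printf '%s\n' "$cf_rel"
        fi
    done
}
```

For instance, changed_files ~/dev/mysite /tmp/site-copy would list what needs sending, assuming /tmp/site-copy holds a local copy of the live site.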

What about Windows?

What indeed. According to my limited research, Windows doesn't quite have anything like rsync. You can install Cygwin, but that's pretty messy. I believe WinSCP may give you something useful, but I haven't tested this yet. Various IDEs and development tools may have FTP clients built in with some smarts. I'd love to know if there's a good solution for this!

Developing the rsync process further

The earlier example of rsync is useful, but inconvenient. Far better is to take advantage of make, the long-standing tool of choice for dependency management with code compilation. In case you've never used make before, it suffices to know that:

  • make lets you define rules governing actions to be taken

  • these rules are written in a Makefile

  • a Makefile is like a recipe, with rules having dependencies upon other rules as prerequisites

  • make uses timestamps on files to determine what has changed since the last compilation run

  • make only compiles what's necessary to achieve the final output
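The timestamp rule in the last two points boils down to something like this shell sketch, where needs_rebuild is a made-up name: a target is rebuilt if it doesn't exist, or if any of its prerequisites has a newer modification time. Real make also walks the whole dependency graph, and always runs the commands for targets declared .PHONY.

```shell
#!/bin/sh
# needs_rebuild TARGET PREREQ...: hypothetical, simplified model of make's
# core rule. Returns 0 (rebuild) if TARGET is missing or any prerequisite
# was modified more recently; returns 1 if TARGET is up to date.
needs_rebuild() {
    nr_target=$1; shift
    [ -e "$nr_target" ] || return 0        # missing target: build it
    for nr_prereq in "$@"; do
        # -nt compares modification times: "newer than"
        [ "$nr_prereq" -nt "$nr_target" ] && return 0
    done
    return 1                               # nothing newer: up to date
}
```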

I'm going to assume at this point that you're comfortable with your source-control software. I'll leave it until a little later, but we'll eventually integrate the source-control into the rollout process. You can leverage the synergy between the two. Or something.

This is the Makefile we're using for now; it goes in the root directory of your code:

ROUSER := USERNAME
ROHOST := SERVER_HOSTNAME
RODIR := /home/USERNAME/public_html

RSYNC_OPTIONS := --verbose \
     --checksum \
     --recursive \
     --links \
     --times \
     --perms \
     --cvs-exclude \
     --compress \
     --delete --delete-after --delete-excluded

.PHONY: all rollout

all:

rollout:
        rsync $(RSYNC_OPTIONS) ./ $(ROUSER)@$(ROHOST):$(RODIR)/

You'll need to fill in your username and server, as well as any customisations to the path. When run, this ensures that all your website code is pushed to the server. You'll notice that the first rule defined is "all", and it has no actions. This is deliberate: make runs the first rule when you type plain make, so you have to consciously type make rollout to perform a rollout. You're welcome to change things for simplicity if you want.

Our rsync options also have the effect of deleting any files on the server that don't exist in our local directory, so be careful! You'll want to play with this on a test system before jumping in with live code.
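Before trusting --delete with a live site, it helps to see what it would remove. rsync's own --dry-run (-n) option is the right tool for that: it reports what would be transferred and deleted without changing anything. The sketch below models just the deletion side for two local directories. would_delete is a hypothetical helper that lists files existing under DEST but not under SRC; it ignores exclude rules, so with --delete-excluded the real list can be longer.

```shell
#!/bin/sh
# would_delete SRC DEST: hypothetical helper. Prints files present under
# DEST but missing from SRC, i.e. what --delete would remove if DEST were
# the server. Exclude rules are not modelled.
would_delete() {
    wd_src=${1%/}
    wd_dest=${2%/}
    wd_tmp=$(mktemp -d) || return 1
    # Sorted lists of relative paths; LC_ALL=C keeps sort and comm agreeing
    (cd "$wd_src" && find . -type f) | LC_ALL=C sort > "$wd_tmp/src"
    (cd "$wd_dest" && find . -type f) | LC_ALL=C sort > "$wd_tmp/dest"
    # comm -13 drops lines unique to SRC and lines common to both,
    # leaving those only in DEST; sed strips find's leading "./"
    LC_ALL=C comm -13 "$wd_tmp/src" "$wd_tmp/dest" | sed 's|^\./||'
    rm -rf "$wd_tmp"
}
```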

A typical run of the Makefile rsync

For the sake of clarity, let's quickly run through the new development cycle. You have a very simple site with an index.html and a couple of images. You've been developing the site in your home directory on your workstation, under the directory imaginatively titled mysite.

First-time use to deploy a new site

You have a shiny new hosting account at Anchor with nothing in it yet. You've just copied in the Makefile from our example and are ready to deploy it.

  1. The configuration parameters in your Makefile might look like this:

    ROUSER := yui
    ROHOST := www7.anchor.net.au
    RODIR := /home/yui/public_html
  2. Let's roll it out:

    yui@shirayuki:~/mysite$ make rollout
    rsync --verbose --checksum --recursive --links --times --perms --cvs-exclude --compress --delete --delete-after --delete-excluded . yui@www7.anchor.net.au:/home/yui/public_html/
    yui@www7.anchor.net.au's password:
    building file list ... done
    ./
    Makefile
    index.html
    header.jpg
    menu.jpg
    
    sent 8890 bytes  received 435 bytes  9001.31 bytes/sec
    total size is 8890  speedup is 1.00
  3. Check out the site now, assuming you've already sorted out a domain name ahead of time.

Making changes to the site

Your rollout was flawless, exactly as planned. Let's pretend that it's a week later and the images have been updated. You used to have some placeholders, but your graphic designer has now finished their work, so they can be uploaded.

  1. Simply replace the files in your development copy of the site
  2. Run make rollout

    yui@shirayuki:~/mysite$ make rollout
    rsync --verbose --checksum --recursive --links --times --perms --cvs-exclude --compress --delete --delete-after --delete-excluded . yui@www7.anchor.net.au:/home/yui/public_html/
    yui@www7.anchor.net.au's password:
    sending incremental file list
    header.jpg
    menu.jpg
    
    sent 4445 bytes  received 313 bytes  8492.13 bytes/sec
    total size is 8890  speedup is 2.00
  3. That's it, your site is now up-to-date

Tidying things up a little

You might have noticed that when you run make rollout you're also uploading a copy of the Makefile. This is the correct behaviour from rsync, but it's probably not what you want. We'll now introduce an "exclude" file to fix this.

  1. Create a new file, NOROLLOUT, containing:

    Makefile
    NOROLLOUT

    Note that NOROLLOUT needs to exclude itself, too.

  2. Update your Makefile to use this list of exclusions by adding another option to RSYNC_OPTIONS, just before the --delete line:

         --exclude-from=NOROLLOUT \
  3. The next time we perform a rollout, we'll see that the rsync options reflect the change, and that Makefile and NOROLLOUT are deleted from the remote system (that's --delete-excluded at work)

    yui@shirayuki:~/mysite$ make rollout
    rsync --verbose --checksum --recursive --links --times --perms --cvs-exclude --compress --exclude-from=NOROLLOUT --delete --delete-after --delete-excluded . yui@www7.anchor.net.au:/home/yui/public_html/
    yui@www7.anchor.net.au's password:
    sending incremental file list
    deleting NOROLLOUT
    deleting Makefile
    
    sent 104 bytes  received 12 bytes  33.14 bytes/sec
    total size is 8890  speedup is 85.48

Integrating the rollout with source-control

There are a few small changes we can make to enhance things further. For these examples we're using Subversion. They demonstrate slightly more advanced usage of make and Subversion, so we won't spend too much time on explanations. We highly encourage you to explore and learn more yourself, but for the sake of these examples you can generally just copy and paste them into your Makefile for the desired results.

Keeping your rollouts clean

When you put your codebase under Subversion control, it'll put .svn directories all through your working copy. Like the Makefile, we don't really want to roll these out. We'll update the NOROLLOUT file to exclude these, as well as a couple of kinds of temporary file that might pop up from time to time: Subversion's commit-message files and Vim swap files.

NOROLLOUT
Makefile
.svn
svn-commit*tmp
.*.swp
.*.*.swp
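If you're unsure what a NOROLLOUT pattern will catch, here's a simplified model of how rsync treats patterns that contain no slash: each is matched against the final component of every name rsync considers, and an excluded directory is never descended into, so excluding .svn also skips everything inside it. The sketch gets the same effect by testing every component of a path. is_excluded is a hypothetical helper; it ignores anchored (/foo), directory-only (foo/) and ** patterns, which rsync handles differently.

```shell
#!/bin/sh
# is_excluded PATH PATTERN-FILE: hypothetical helper. Returns 0 if any
# component of PATH matches a shell-style pattern listed in PATTERN-FILE.
is_excluded() {
    ie_path=$1; ie_file=$2
    ie_old_ifs=$IFS; IFS=/      # split the path on slashes
    set -f                      # don't glob the split components
    for ie_part in $ie_path; do
        while IFS= read -r ie_pat; do
            [ -n "$ie_pat" ] || continue
            case $ie_part in
                $ie_pat) IFS=$ie_old_ifs; set +f; return 0 ;;
            esac
        done < "$ie_file"
    done
    IFS=$ie_old_ifs; set +f
    return 1
}
```

For example, is_excluded images/.svn/entries NOROLLOUT succeeds, while is_excluded images/header.jpg NOROLLOUT does not.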

Pretty output

Used carefully, colourful output can help make problems stand out. Similarly, it can give easy visual confirmation that everything has gone according to plan. make halts and reports an error as soon as any command fails, so reaching the banner at the end means everything before it succeeded. We use shell colour codes to great effect here, making the outcome obvious no matter how fatigued you are.

This adds a few lines to the tail end of the rollout rule. The amended rule is shown here in full:

rollout:
        rsync $(RSYNC_OPTIONS) ./ $(ROUSER)@$(ROHOST):$(RODIR)/
        @echo "\\033[1;32m"
        @echo "  ============================================"
        @echo -n "  Rollout time is "
        @date
        @echo "  ============================================"
        @echo "\\033[0;39m";

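The exact handling of colour codes depends on the shell that runs the commands, and make uses /bin/sh for this unless the Makefile sets SHELL, whatever your login shell is. Some versions of echo need the -e flag to interpret backslash escapes, while others print -e literally; printf handles \033 consistently across shells. This Works For Me with the bash shell version 3.2.48.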
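Here's the banner from the rollout rule rewritten with printf, as a small shell function you can try in a terminal; rollout_banner is our own name for it. The escape codes are the same ones used in the Makefile: 1;32 selects bold green and 0;39 restores the default foreground colour.

```shell
#!/bin/sh
# rollout_banner: hypothetical printf version of the Makefile's banner.
# printf interprets \033 (ESC) in its format string in every POSIX shell.
rollout_banner() {
    printf '\033[1;32m\n'                              # bold green on
    printf '  ============================================\n'
    printf '  Rollout time is %s\n' "$(date)"
    printf '  ============================================\n'
    printf '\033[0;39m\n'                              # back to default
}
```

To use printf inside the Makefile itself, write each line as, for example, @printf '\033[1;32m\n'. Any $ in a recipe must be doubled, so the date line would become @printf '  Rollout time is %s\n' "$$(date)".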

Warning about unversioned files

To make sure we don't run into any surprises, we'd like to know if there are any files being rolled out that aren't in source-control. We've chosen to do this by adding a few lines to the post-rsync phase, straight after the "rollout OK" banner. We could split this into a separate prerequisite for the rollout, but we're happy with it being a non-critical check.

Here's the updated rollout rule, including the pretty-printing banner from before:

rollout:
        rsync $(RSYNC_OPTIONS) ./ $(ROUSER)@$(ROHOST):$(RODIR)/
        @echo "\\033[1;32m"
        @echo "  ============================================"
        @echo -n "  Rollout time is "
        @date
        @echo "  ============================================"
        @if [ `svn st | wc -l` -gt 0 ]; then\
                echo "\\033[1;31m"; \
                echo "  ********************************************";\
                echo "  you still have modified files not checked in";\
                echo "  ********************************************";\
        fi
        @echo "\\033[0;39m";
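Note that svn st | wc -l counts every line svn status prints, so unversioned files (marked ? in the first column) and uncommitted changes (M, A, D and so on) trip the same warning. If you'd like the warning to tell the two apart, a small awk filter will do. classify_status is a hypothetical helper that reads svn status output on standard input; lines about svn:externals would also count as changes in this simple version.

```shell
#!/bin/sh
# classify_status: hypothetical helper. Reads `svn status` output on stdin
# and counts unversioned files ("?" in column 1) separately from every
# other non-blank line (modified, added, deleted, conflicted, ...).
classify_status() {
    awk '/^[?]/ { unversioned++; next }
         NF     { changed++ }
         END    { printf "unversioned=%d changed=%d\n", unversioned, changed }'
}
```

Usage: svn st | classify_status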

Ensuring all changes are checked-in before rollout

The previous check runs only after the rollout has already happened (and, despite the heading, it catches uncommitted changes as well as unversioned files). We'd now like to make it a strict prerequisite for rollout. To do this we add another rule and make it a prerequisite of the rollout rule.

committed:
        @if [ `svn st | wc -l` -gt 0 ]; then\
                echo -en "\\033[1;31m"; \
                echo "  ********************************************";\
                echo "  Whoa there cowboy! You still have modified files"; \
                echo "  not committed to subversion!";\
                echo "  ********************************************";\
                echo -en "\\033[0;39m"; \
                svn st; \
                false;\
        fi

rollout: committed
        # REST OF THE rollout RULE OMITTED FOR BREVITY

# Also add the new rule to the .PHONY list, telling make these targets aren't real files
.PHONY: all rollout committed
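One subtlety: in the committed rule, the test only sees wc's output, so if svn itself fails and prints nothing on standard output (for instance, an error that goes only to standard error), the count is 0 and the check passes. Here's a sketch of the same gate as a shell function that also treats an svn failure as "not clean". check_committed is a hypothetical name, and the messages mirror the Makefile's.

```shell
#!/bin/sh
# check_committed: hypothetical shell version of the "committed" rule.
# Succeeds only if `svn status` runs cleanly AND reports nothing.
check_committed() {
    # An assignment's exit status is that of its command substitution,
    # so an svn error is caught here instead of being silently ignored.
    if ! cc_status=$(svn status); then
        printf '%s\n' 'svn status failed; refusing to roll out' >&2
        return 1
    fi
    if [ -n "$cc_status" ]; then
        printf '%s\n' 'Whoa there cowboy! You still have modified files' \
            'not committed to subversion!' "$cc_status" >&2
        return 1
    fi
}
```

Saved in a script and called from the committed rule, a non-zero return stops make just as false does in the original.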

Ensuring we're always rolling out the latest code

If more than one person works on your codebase, then sooner or later two of you will commit code at roughly the same time. While this doesn't solve the problem of a developer committing "broken" code, we can at least make sure that our copy is up to date before rolling it out. We do this by running svn update before every rollout, with an added check that the update caused no conflicts.

Once again, we add another rule, and make it a prerequisite for the rollout.

update:
        svn up
        @if svn st | grep -q ^C ; then \
                echo -en "\\033[1;31m"; \
                echo "  ****************"; \
                echo "  conflicts exist!"; \
                echo "  ****************"; \
                echo -en "\\033[0;39m"; \
                false;\
        fi

rollout: update committed
        # REST OF THE rollout RULE OMITTED FOR BREVITY

# Update the PHONY targets list again
.PHONY: all rollout committed update

Both checks use the Subversion status command to look for local problems, and fail (via false) if anything is found. make stops at the first prerequisite that fails, so with update and committed both listed as prerequisites of rollout, it's impossible to roll out the codebase unless both checks pass.
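The grep ^C test in the update rule only catches text conflicts. svn status reports conflicted properties with a C in the second column and, from Subversion 1.6 onwards, tree conflicts with a C in the seventh column. A slightly broader check, as a shell sketch reading svn status output on standard input (has_conflicts is a hypothetical name):

```shell
#!/bin/sh
# has_conflicts: hypothetical helper. Succeeds if any `svn status` line on
# stdin shows a conflict:
#   column 1 'C' = conflicted contents
#   column 2 'C' = conflicted properties
#   column 7 'C' = tree conflict (Subversion 1.6 and later)
has_conflicts() {
    grep -q -e '^C' -e '^.C' -e '^......C'
}
```

Usage: svn st | has_conflicts && echo "conflicts exist!"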

Finished Makefile with all examples

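The following is a copy-and-paste example, pulled together from the examples shown so far. It's up to you to put in the right parameters and get it working in your particular environment. Remember that make requires every command line in a rule to start with a tab character, not spaces; otherwise it stops with a "missing separator" error.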

ROUSER := USERNAME
ROHOST := SERVER_HOSTNAME
RODIR := /home/USERNAME/public_html

RSYNC_OPTIONS := --verbose \
     --checksum \
     --recursive \
     --links \
     --times \
     --perms \
     --cvs-exclude \
     --compress \
     --exclude-from=NOROLLOUT \
     --delete --delete-after --delete-excluded

.PHONY: all rollout committed update

all:

committed:
        @if [ `svn st | wc -l` -gt 0 ]; then\
                echo -en "\\033[1;31m"; \
                echo "  ********************************************";\
                echo "  Whoa there cowboy! You still have modified files"; \
                echo "  not committed to subversion!";\
                echo "  ********************************************";\
                echo -en "\\033[0;39m"; \
                svn st; \
                false;\
        fi

update:
        svn up
        @if svn st | grep -q ^C ; then \
                echo -en "\\033[1;31m"; \
                echo "  ****************"; \
                echo "  conflicts exist!"; \
                echo "  ****************"; \
                echo -en "\\033[0;39m"; \
                false;\
        fi

rollout: update committed
        rsync $(RSYNC_OPTIONS) ./ $(ROUSER)@$(ROHOST):$(RODIR)/
        @echo "\\033[1;32m"
        @echo "  ============================================"
        @echo -n "  Rollout time is "
        @date
        @echo "  ============================================"
        @echo "\\033[0;39m";

Where to from here?

The possibilities are limited only by your imagination. Clichéd, I know, but you're a developer, aren't you? Get that brain in gear; you need to be a good thinker! Anchor isn't a web-dev company, so our requirements are fairly limited, but there are plenty of ways to make your life easier.

Use SSH keys for the rsync rollout

You'll notice in the above examples that I'm typing my password every time I perform a rollout. This gets really tedious after a few rollouts; there must be a better way. Boy, do I have a deal for you! SSH keys are the answer. Used properly, they are a secure and convenient way to do rollouts (rsync is running over SSH, after all), along with just about anything else involving SSH. We already have a fine article about leveraging SSH keys for world peace, so go and check it out.

Hosted source-control integration

In the last week alone I've handled two enquiries about using hosted services to trigger an automatic rollout of a codebase on a webserver (specifically, GitHub and CVSdude). These are externally hosted source-control repositories that let you specify an external post-commit hook; in this case you can have the repository server access a designated URL on your own webserver. The script behind that URL can do anything you want, but the requests I dealt with asked specifically for the webserver to get an updated copy of the code in the repository and roll that out to the live site.

At a technical level, this is entirely possible; you can make this a reasonably reliable process with a little scripting and sanity-checking. We have real concerns about the security implications of this, however.

First up, any URL-triggered script is going to run in the context of your webserver, indirectly or otherwise. This script is going to update the codebase being run by the webserver. You need look no further than the latest embarrassing public disclosure of a high-profile website hack to know this isn't the best idea in the world. Securing such processes is non-trivial.

Our second concern relates to human error. If you work as part of a team, the probability that someone will commit "broken" code (whatever you define broken code to be) increases as you add more team members. One day, someone will commit a change that breaks the site, and an automated rollout process makes this immediately visible to the world. Strict testing and peer-review policies can reduce the risk, but by tying the rollout to the development process you stop people working at times when no one is around to validate their changes. I personally work best between midnight and 6am. Best practice dictates that I commit my changes frequently, but now every commit means I risk breaking the live site.

If you're still hell-bent on doing this and you're on an Anchor server:

  • You will need a shell script that can run your repository-management commands (just because you're a PHP mastah doesn't mean you should use PHP, just write a shell script)

  • If you're on a shared hosting server, this will run as a standard CGI script, as your user, via suexec
  • If you're on a dedicated server then you'll need a sudo entry set up to let Apache run svn/git/whatever commands as the owning user
  • You may need a couple of Apache .htaccess rules to get the shell script to run as CGI
  • This is non-interactive, so if you're using remote repositories you'll probably need to set up SSH keys

  • Code the script defensively, and expect failure at every step. Check for errors obsessive-compulsively; we cannot stress this enough. We've seen plenty of scripts that never check return codes and the like. That's passable most of the time. This is not one of those times. We're in the business of guaranteeing your uptime, but we expect you to do the right thing too. We're happy to offer advice if you're unsure, so just ask.
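To give a feel for what "defensive" means here, below is a rough skeleton of such a CGI script for Subversion. It's a sketch, not a drop-in solution: SITE_DIR, respond and rollout_main are hypothetical names, it assumes the hook sends a POST request, and it deliberately leaves out authenticating the request, which you'd need as well. Each step checks its result and stops at the first failure, reporting it through the CGI Status header.

```shell
#!/bin/sh
# Hypothetical skeleton of a rollout CGI script triggered by a repository
# hook. SITE_DIR is an assumption: point it at your working copy.
SITE_DIR=${SITE_DIR:-"$HOME/public_html"}

respond() {   # usage: respond "STATUS LINE" "message"
    printf 'Status: %s\nContent-Type: text/plain\n\n%s\n' "$1" "$2"
}

rollout_main() {
    # Only accept POST, which is what hook services typically send
    if [ "${REQUEST_METHOD:-}" != POST ]; then
        respond '405 Method Not Allowed' 'POST only'; return 1
    fi
    if ! cd "$SITE_DIR"; then
        respond '500 Internal Server Error' "cannot cd to $SITE_DIR"; return 1
    fi
    # --non-interactive: never wait for a prompt nobody will answer
    if ! up_out=$(svn update --non-interactive 2>&1); then
        respond '500 Internal Server Error' "svn update failed: $up_out"; return 1
    fi
    # A 'C' at the start of an update line marks a conflicted file
    if printf '%s\n' "$up_out" | grep -q '^C'; then
        respond '500 Internal Server Error' "conflicts after update: $up_out"; return 1
    fi
    respond '200 OK' "$up_out"
}
```

A real script would end by calling rollout_main; that line is left out here so the functions can be tried on their own.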

Wanting a better alternative? Just roll out from your own working copy, as shown above. Unfortunately this isn't much of an option for Windows users, but if you develop on a unix-y workstation then you're in luck. Rsync only transfers differences, so a slow/unreliable connection is no obstacle. Just one more command after your usual svn commit.


Author

Barney Desmond is a Linux systems administrator at Anchor with a passion for free software and open source solutions. Anchor is a provider of Australian web hosting and dedicated servers.

Got comments? Thoughts? We'd love to hear them. Know a fantastic rollout mechanism for Windows? We eagerly await the day...
