Subversion Server

A little while ago I had cause to set up some Subversion (SVN) repositories. Because the plan was for this to be used by a number of people other than myself, I wanted to Do It Right, in a way that's reliable, secure, and easy to maintain. Unlike my colleague's guide to using Subversion for version control, this will focus on the server side, assuming you're already a competent user of SVN.

Intro

What are we doing?

I've opted to set up a dedicated server purely for the purpose of hosting SVN repositories. You can host repositories just about anywhere with little more than the subversion tools and SSH, but I'd like a little more functionality.

Who is this article for?

This article is aimed at an admin with a bit of experience and knowledge. It won't be a kopipe (a cute Japanese abbreviation for "copy and paste") tutorial, but it should be sufficiently clear at every step to understand what's going on and how it can be adapted to fit your usage requirements.

It is assumed that you're quite comfortable installing packages for your distribution, have experience configuring and securing apache, and know your way around the common svn tools. You could probably stumble through this yourself without too much difficulty, but where's the fun in that?

Who will be using this setup?

As mentioned, this will be for non-personal use. I run a few collaborative projects with friends and SVN happens to be a good way to manage the files we work on. The userbase is typically quite small, up to a dozen people. Because the users are people I know well and who are technically competent, I can afford to make things a little more difficult to use if it meets my goals. I'm happy to provide support because they're friends.

What's our motivation?

The repos we use used to be set up on a server in the US. As most of my friends are in Australia, the experience could be quite poor due to high latency and flaky connections. I have the skills and resources to host the repos locally, hence this project. I've chosen to dedicate a server to it so that it can be isolated; this makes management somewhat easier.

What exactly do we want out of the server?

The server must host SVN repos, obviously! More specifically:

  • I want all traffic to be encrypted between the clients and server
  • I want to be able to easily manage users and their access to the repos
  • It should be easy to add new repos to the system
  • I thought it would be nice to be able to browse repos from a web browser

To ensure that these goals are being met, we'll be testing the setup at various points along the way. If you're running Linux on your desktop, the command-line client tools are probably already installed. If you're on Windows I can highly recommend TortoiseSVN.

Is SVN right for what I'm doing?

In my case, yes. SVN supports multiple users working on a project in parallel, and a centrally managed repo means everyone has access to the files. We could have used an FTP server for the file sharing, but it's not as elegant or as well-suited to our workflow as SVN.

Couldn't this be simpler?

Almost certainly. It's entirely possible to run an SVN repo out of a user account with SSH access, but we need more flexibility as detailed above.

Installing the server

With that out of the way, let's get started. I won't go into too much detail, but I'll highlight a couple of decisions to consider when the server is built.

I've actually chosen to deploy a VMware virtual machine (VM) for this. I don't have a spare server on hand, and a dedicated physical machine would be pretty inefficient for something that won't be very demanding. I already have a very capable machine, yumi, set up to host VMs.

Hosting SVN repos isn't too demanding, especially for the number of users I plan to have. As such, I've chosen to create sachiko with a single virtual CPU, 384MiB of RAM and an 8GiB drive. If I run low on space I can always tack on another virtual disk. If I were deploying this on physical hardware, I'd certainly want some sort of redundant RAID configuration.

I've chosen to install CentOS 5 on the VM as it's stable, low-maintenance and well-supported. Administrative access is via SSH as the root user. This flies in the face of most common security guidelines, but I'm confident in my practices and things have been locked down tightly (access only from certain IP addresses, public key authentication, passwords are disabled). Our article on securing SSH access has a lot more detail on this. I've also uninstalled unneeded packages and disabled as many services as possible to reduce the attack surface. Finally, the firewall only allows connections on the ports needed for SSH and HTTP/S.

Apache

Ditch PHP

I'm going to assume you're comfortable installing and configuring the apache webserver. Under Redhat/CentOS the package is named httpd. One thing we don't need for Subversion is PHP; huzzah! Go ahead and remove it, it feels great.

yum erase php-cli php

Choice of worker/prefork MPM

Because you're not using PHP, you can switch Apache to using the "worker" Multi-Processing Module (MPM). mpm_worker places an emphasis on using threads instead of processes to serve requests, and was a new feature of Apache 2 when it came out. mpm_worker can theoretically provide better performance for a large number of requests, but Your Mileage May Vary. Unfortunately, PHP isn't properly thread-safe and can't easily be used with mpm_worker. PHP's ubiquitous popularity means that mpm_worker has never really had a chance to take off.

If you don't already know about mpm_worker you probably shouldn't bother changing it. If you're feeling adventurous, you can make the switch by editing /etc/sysconfig/httpd. Like it says, you'll need to stop apache before making this change.

# The default processing model (MPM) is the process-based
# 'prefork' model.  A thread-based model, 'worker', is also
# available, but does not work with some modules (such as PHP).
# The service must be stopped before changing this variable.
#
HTTPD=/usr/sbin/httpd.worker

Vhost example with SSL

As the last part of your basic Apache config, you should set up an SSL certificate. One of my criteria was for all client-server communication to be encrypted, which is easily done. This is my current vhost configuration to meet these requirements; it can be modified in many ways. I'm using an SSL certificate issued by CAcert.

<VirtualHost 3.141.59.26:443>
        ServerName sachiko
        DocumentRoot /usr/share/empty

        <IfModule mod_ssl.c>
                SSLEngine on
                SSLCertificateFile      /etc/ssl/sachiko_crt
                SSLCertificateKeyFile   /etc/ssl/sachiko_key
                SSLCertificateChainFile /etc/ssl/CAcert_class3.pem
        </IfModule>
</VirtualHost>

You should now be able to reach apache in your browser, eg. https://sachiko/. Redhat and CentOS should serve up their generic 404 page now, as /usr/share/empty isn't a terribly exciting docroot location. Check that SSL is working as expected. It's also a good idea to disable non-SSL access at this point.

Installing Subversion

If you haven't done so already, you'll need to install the subversion package and associated apache connector. The latter will pull in a couple of other packages like neon, a DAV client library that subversion uses to talk to apache.

yum install subversion mod_dav_svn

Set up the repo structure and vhost

We'll start by creating a home for our repositories. It needs to be somewhere that apache can write to, especially if you're using SELinux. By default, the structure under /var/www is suitable, so we'll use that. We'll make an empty test repo in there to start off with.

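mkdir -p /var/www/svn
cd /var/www/svn
svnadmin create testrepo
# the repo was created as root; apache must own it to accept commits
chown -R apache:apache testrepo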

Now we'll publish the repo through apache. In your vhost config, add a <Location> stanza so the mod_dav_svn module can do its job.

        <Location /testrepo>
                DAV svn
                SVNPath /var/www/svn/testrepo
        </Location>

You'll need to reload/restart apache so your changes are picked up.

Check it's working

At the moment you should be able to browse the empty repo via your browser; it's not very exciting. There's no security yet, but that's fine; you need to confirm that the repo is accessible and usable.

Check out a working copy of the repo and commit a few files. You should also be able to use your browser to see the files, just to be sure. You should already know how to do this. If not, we have a helpful guide.

You're also most welcome to add another repo or two if you feel like it. Just create a new repo in /var/www/svn and add another <Location /reponame> stanza to your vhost in apache; don't forget to reload/restart apache after making changes.

Enabling authentication

At the moment, your repo/s have no security on them. Anyone can check them out, and anyone can commit changes. This is probably not what you want. To solve this, we're going to have apache handle the authentication. While SVN supports path-based restrictions in a repo, these can incur a hit on performance, and ideally shouldn't be needed. We plan to use one repo for each project we work on, and it makes sense (to us) that everyone with access to a given repo should have access to everything in it.

Adding PostgreSQL to handle user accounts

One of the goals I mentioned earlier was that it should be easy to add and manage users. While .htpasswd files are trivial to use, they can be a bit cumbersome, especially as you start to grow and need to handle more users. For this reason, we're pushing the user accounts into a database; both MySQL and PostgreSQL (aka. pgsql) are well supported as authentication backends. At Anchor we strongly prefer pgsql for being more robust and saner to deal with. For an authentication backend it's much of a muchness and you should go with whatever you're comfortable with. We have a comparison of MySQL and PgSQL if you need arguments one way or the other.

Install and configure postgres

You'll need the postgres server and apache module to connect to it. This will pull in a handful of dependent packages.

yum install postgresql-server mod_auth_pgsql

Design the schema and set up users for access

In designing the schema for the authorisation database, we need to consider a couple of things.

  1. We're centralising the user accounts for all svn access into a single database
  2. Different repos will have different lists of users that are allowed access
  3. We can assume that we can identify each user account uniquely, eg. using an email address. This means we don't have to worry about a username clash between Jim-who-has-access-to-repoA and Jim-who-has-access-to-repoB.

With this in mind, it's best to split things into a pair of tables. One table will have usernames and hashed passwords, the other will have a list of username-to-group mappings, where the groupname matches the name of the repo. We can then tell apache to allow access to a repo if the user is a member of the appropriate group.

The included module-config file has good documentation of doing this, as well as dealing with simpler authentication scenarios. On Redhat/CentOS you can find this file in /etc/httpd/conf.d/auth_pgsql.conf. Note that while they expect you to do the authentication configuration there, we'll ignore that and do it in the vhost definition.

For the sake of security, we'll create two new database users. One, svnauth, will own the database and have full access. The other, apache, will be read-only, and is used only to authenticate users. You can create users from within the psql shell, or you can run the createuser command as the postgres user. I'll demonstrate using SQL from the psql shell.

CREATE ROLE svnauth ENCRYPTED PASSWORD 'verysecurepassword' NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;
CREATE ROLE apache  ENCRYPTED PASSWORD 'adifferentsecurepassword' NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;

Exactly how you do this is up to you. If you already have an existing user database to connect to, that's great, you can leverage it by adding a couple more tables to it and everyone can use the same login details. In my case, this is a new database and I've decided to eschew the use of a password for the apache user. Postgres lets you use "ident" authentication, which means the apache system user can login as the apache user in postgres without a password. Implementing postgres user accounts is beyond the scope of this article, but it suffices to say that it's very flexible.

Create the user tables

Finally we get to the meat of this exercise in using postgres. It's really easy, just make sure you do this as the postgres user through the psql shell.

CREATE DATABASE svnauth WITH TEMPLATE = template0 OWNER = svnauth;

\connect svnauth

CREATE TABLE users (
    username character(50) NOT NULL PRIMARY KEY,
    "password" character(32) NOT NULL
);
ALTER TABLE public.users OWNER TO svnauth;

CREATE TABLE groups (
    username character(50) NOT NULL,
    groupname character(50) NOT NULL,
    PRIMARY KEY (username, groupname),
    FOREIGN KEY (username) REFERENCES users(username) ON UPDATE CASCADE ON DELETE CASCADE
);
ALTER TABLE groups OWNER TO svnauth;

GRANT SELECT ON TABLE users  TO apache;
GRANT SELECT ON TABLE groups TO apache;

In case you're not fluent in SQL, we've added primary keys to each table to create indexes and enforce uniqueness constraints. Similarly, we've used a foreign key to specify that the username field in the groups table refers to the identically-named field in the users table; it doesn't make sense to have a username in the groups table that doesn't exist in the users table. We've also granted the apache user the right to get data out of the tables.

The choice of 50-character fields is arbitrary; it should be a safe value. The password field is exactly 32 characters because we'll be storing a hex-encoded MD5 hash of the user's password.

If you've done this right, it should be easy to run a quick test. We'll run a simple SQL query as the apache user and confirm that you can get results. Depending on your Postgres configuration, you may need to add a password, or connect via TCP, etc. If you can't get a resultset in one way or another then you'll need to diagnose that before proceeding.

[root@sachiko ~]# sudo -u apache psql -d svnauth -c 'SELECT * FROM users;'
 username | password
----------+----------
(0 rows)

Populate the tables with users, groups and password hashes

Now that we've got our tables, let's add some users. You could connect as the svnauth user now to do this, or you can be lazy like me and just use the postgres superuser. The svnauth account is useful if you want to develop some sort of management frontend, or need to delegate user management to someone else without giving them god-like permissions to your database.

In this example, testguy will only have access to firstrepo, while testgal can access firstrepo and secondrepo.

INSERT INTO users VALUES ('testguy', MD5('testguys_password'));
INSERT INTO users VALUES ('testgal', MD5('testgals_password'));
INSERT INTO groups VALUES ('testguy', 'firstrepo');
INSERT INTO groups VALUES ('testgal', 'firstrepo');
INSERT INTO groups VALUES ('testgal', 'secondrepo');
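
If you'd rather not type plaintext passwords into psql, the hash can be computed on the client side first. What follows is only a minimal sketch and not part of the original setup: md5_hex and user_sql are names made up for illustration, and it assumes md5sum from GNU coreutils is installed. It relies on Postgres' MD5() returning the lowercase hex digest of the string, which is the same thing md5sum prints.

```shell
#!/bin/sh
# Sketch: emit SQL for one user and their repo groups, hashing the password locally.
# md5_hex and user_sql are illustrative names, not part of any package.

md5_hex() {
    # printf avoids the trailing newline that echo would add to the hashed input.
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

user_sql() {
    user="$1"; pass="$2"; shift 2
    # Names are interpolated into SQL unescaped, so only allow a conservative character set.
    for name in "$user" "$@"; do
        case "$name" in
            ''|*[!A-Za-z0-9_.@-]*) echo "invalid name: $name" >&2; return 1 ;;
        esac
    done
    printf "INSERT INTO users VALUES ('%s', '%s');\n" "$user" "$(md5_hex "$pass")"
    for group in "$@"; do
        printf "INSERT INTO groups VALUES ('%s', '%s');\n" "$user" "$group"
    done
}

# Example: print the statements for testgal. To apply them, the output could be
# piped into psql instead:
#   user_sql testgal 'testgals_password' firstrepo secondrepo | sudo -u postgres psql -d svnauth
user_sql testgal 'testgals_password' firstrepo secondrepo
```

Bear in mind that a password given as a command-line argument ends up in your shell history and is visible in the process list, so for anything beyond testing you'd probably want to prompt for it instead.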

Now's a good time to check that apache will be able to see these entries. Here's one I prepared earlier.

[root@sachiko ~]# sudo -u apache psql -d svnauth -c 'SELECT * FROM users;'
                      username                      |             password
----------------------------------------------------+----------------------------------
 gordonfreeman                                      | aaa5b60c26e2d0fd3d9fa9e686906060
 alyxvance                                          | 3f7cd0593c2cbbf82524bef01f487f86
 isaackleiner                                       | 001af5fba13a75bcc757e47db8c62e2c
(3 rows)

[root@sachiko ~]# sudo -u apache psql -d svnauth -c 'SELECT * FROM groups;'
                      username                      |                     groupname
----------------------------------------------------+----------------------------------------------------
 gordonfreeman                                      | entanglement_research
 gordonfreeman                                      | xen_documentation
 gordonfreeman                                      | security_procedures
 barneycalhoun                                      | security_procedures
 barneycalhoun                                      | firearms_manuals
 isaackleiner                                       | entanglement_research
 isaackleiner                                       | headcrab_biopsy
(7 rows)

Hook apache up to postgres

Now we're ready to tell apache to use postgres for authentication. This is a minor change to the vhost config and should be hassle-free. If something doesn't work you'll need to watch your apache error log; it's most likely that apache is having trouble connecting to postgres. You should also check the postgres log; any problems should quickly become apparent there.

Our vhost configuration is now looking much healthier. If you've been following up to this point it'll look something like this; the repo name will likely differ. I've added a couple of extra directives for logging, just as a convenience to keep things logically separated. The custom SVN log is a nice addition that shows you the high-level svn operations being executed, rather than a flood of HTTP requests.

<VirtualHost 3.141.59.26:443>
        ServerName sachiko
        DocumentRoot /usr/share/empty

        CustomLog /var/log/httpd/access_logs/sachiko.log combined
        ErrorLog /var/log/httpd/error_logs/sachiko.log
        CustomLog /var/log/httpd/sachiko_svn.log "%t %u %{SVN-ACTION}e" env=SVN-ACTION

        <IfModule mod_ssl.c>
                SSLEngine on
                SSLCertificateFile      /etc/ssl/sachiko_crt
                SSLCertificateKeyFile   /etc/ssl/sachiko_key
                SSLCertificateChainFile /etc/ssl/CAcert_class3.pem
        </IfModule>

        <Location /entanglement_research>
                DAV svn
                SVNPath /var/www/svn/entanglement_research
                AuthType Basic
                AuthName "Entanglement research"

                Auth_PG_database svnauth
                Auth_PG_user apache
                Auth_PG_pwd_table users
                Auth_PG_uid_field username
                Auth_PG_pwd_field password
                Auth_PG_grp_table groups
                Auth_PG_grp_user_field username
                Auth_PG_grp_group_field groupname
                Auth_PG_hash_type MD5
                Require group entanglement_research
        </Location>
</VirtualHost>

Don't forget to reload/restart apache after you make these vhost changes.

Test that it works

If you've configured apache correctly it should now ask you to authenticate when accessing the repository. A simple test can be done through your browser. Following the example above, I'd try to access https://sachiko/entanglement_research/ and it should ask me for a username and password.

Now's also a good time to add another repo so you can test out the access control as different users and groups. Just create a repo using svnadmin and add another <Location /name_of_new_repo> stanza to the vhost config, remembering to reload/restart apache again.
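
Since every repo gets the same stanza with only the name changed, you could generate it rather than copy and edit by hand. This is a rough sketch under my own assumptions: auth_stanza and SVN_ROOT are invented names, and it assumes the group name is always identical to the repo name, as in the schema described earlier. The directives themselves are the ones from the vhost above.

```shell
#!/bin/sh
# Sketch: print an authenticated <Location> stanza for one repository.
# auth_stanza and SVN_ROOT are illustrative names; adjust the path to suit.
SVN_ROOT="${SVN_ROOT:-/var/www/svn}"

auth_stanza() {
    repo="$1"
    # Refuse names that would be awkward in a URL or a filesystem path.
    case "$repo" in
        ''|*[!A-Za-z0-9_-]*) echo "invalid repo name: $repo" >&2; return 1 ;;
    esac
    cat <<EOF
        <Location /$repo>
                DAV svn
                SVNPath $SVN_ROOT/$repo
                AuthType Basic
                AuthName "$repo"

                Auth_PG_database svnauth
                Auth_PG_user apache
                Auth_PG_pwd_table users
                Auth_PG_uid_field username
                Auth_PG_pwd_field password
                Auth_PG_grp_table groups
                Auth_PG_grp_user_field username
                Auth_PG_grp_group_field groupname
                Auth_PG_hash_type MD5
                Require group $repo
        </Location>
EOF
}

# Example: print a stanza to paste into the vhost.
auth_stanza xen_documentation
```

The output still has to be pasted inside the <VirtualHost> block by hand, and the repo itself created with svnadmin, before apache is reloaded.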

The real test now is to check out a working copy of the repo and start committing to it. If you're using TortoiseSVN this is a piece of cake. For command-line svn you'd use something like the following. If your local username doesn't match the account you've created in postgres then the latter form will be needed.

svn checkout https://sachiko/security_procedures ~/svn/security_procedures

svn checkout --username gordonfreeman https://sachiko/security_procedures ~/svn/security_procedures

Maintenance

You should now have a fully functioning repo setup, but we can't stop here! (this is bat country, y'see)

  • What happens if the authentication database gets corrupted? This is quite likely if you use mysql.
  • What if the server loses power and your filesystem is broken?
  • What if the server gets compromised by an attacker?
  • What if you experience catastrophic hardware failure?

Ignoring the consequences of your repos being unavailable, the answer to these questions is backups. Backups are really easy to set up, so there's no excuse not to do it. It's also trivial to arrange for backups to be pushed to another machine at an offsite location, which covers the loss/destruction of your hardware.

Dumping the content to be backed up

The first thing to do is to get a consistent image of the data you're backing up. SVN repos are filesystem-based, but there's the possibility of getting inconsistent files if you were to simply make a tarball from the repository files. svnadmin has a dedicated function for doing this properly, so we'll use it. Similarly, postgres keeps "hot" files open, so you can't copy them either. We'll use pg_dumpall to handle this. If you're using your postgres installation for other things, you can either selectively dump the svnauth database using pg_dump, or integrate it into your existing backup processes.

svnadmin has a facility to take incremental backups, which are a diff against the previous-to-requested revision. This can save a lot of space if you have a large repo, but for the sake of simplicity we'll just take a full dump. Full dumps are easy to use and self-contained, meaning you can restore a repo from scratch with almost no effort.
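
If you did want incremental dumps, the main bookkeeping is remembering which revision you last dumped and asking for everything after it. Here's a small sketch of that arithmetic; next_range is a hypothetical helper of my own, and the commented commands assume the standard svnlook youngest and svnadmin dump --incremental -r options.

```shell
#!/bin/sh
# Sketch: work out which revisions an incremental dump should cover.
# next_range is a hypothetical helper; "last" is the newest revision already dumped.

next_range() {
    last="$1"; youngest="$2"
    # Only print a range when there are revisions newer than the last dump.
    if [ "$youngest" -gt "$last" ]; then
        echo "$((last + 1)):$youngest"
    fi
}

# Possible use against a real repository (left commented out; paths are examples):
#   repo=/var/www/svn/testrepo
#   range=$(next_range 120 "$(svnlook youngest "$repo")")
#   [ -n "$range" ] && svnadmin dump -q --incremental -r "$range" "$repo" | gzip -9 > /data/svnbackup/incremental.gz
```

Restoring from incrementals means loading the full dump and then every incremental in order, which is part of why a plain full dump is the simpler choice here.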

I'm assuming you'll dump the repos and databases every day, then use some other backup process to keep the dumpfiles safe. We keep the last couple of days' worth of dumps lying around in case we need them. Let's create a couple of scripts that do the work we need; you can adjust the filesystem paths as appropriate.

  • /usr/local/sbin/svndump.sh

    #!/bin/sh
    # Dump every repository to a dated, compressed file
    for repo in /var/www/svn/*
    do
            svnadmin dump -q "$repo" | gzip -9 > "/data/svnbackup/$(date -I)_$(basename "$repo").gz"
    done
    
    # Prune dumps older than 48 hours
    if [ -f /etc/redhat-release ]
    then
            /usr/sbin/tmpwatch --mtime 48 /data/svnbackup
    elif [ -f /etc/debian_version ]
    then
            /usr/sbin/tmpreaper --mtime 48 /data/svnbackup
    fi
  • /usr/local/sbin/dump-pgsql.sh

    #!/bin/sh
    # Dump all databases to a dated, compressed file
    pg_dumpall | gzip -9 > "/data/pgsqlbackup/$(date -I)_pgsql.gz"
    
    if [ -f /etc/redhat-release ]
    then
            /usr/sbin/tmpwatch --mtime 48 /data/pgsqlbackup
    elif [ -f /etc/debian_version ]
    then
            /usr/sbin/tmpreaper --mtime 48 /data/pgsqlbackup
    fi
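
A dump is only useful if it can be read back, so it may be worth checking the files once they're written. The following is a minimal sketch, assuming the dumps are gzip files in the directories used above; check_dumps is a name I've invented, and it only tests gzip integrity, not whether the contents are a valid SVN or postgres dump.

```shell
#!/bin/sh
# Sketch: report any compressed dump that fails gzip's integrity test.
# check_dumps and its default directory are illustrative.

check_dumps() {
    dir="${1:-/data/svnbackup}"
    status=0
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt dump: $f" >&2
            status=1
        fi
    done
    return $status
}

# Restoring a repository from a dump would look roughly like this (paths are examples):
#   svnadmin create /var/www/svn/restored
#   gunzip -c /data/svnbackup/2010-01-01_testrepo.gz | svnadmin load -q /var/www/svn/restored
```

Calling check_dumps /data/svnbackup at the end of the dump script, and letting cron mail you the output, would be one way to find out about a bad dump before you need it.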

Scheduling dumps

There's probably no sane Linux distribution that doesn't come with the cron daemon installed and enabled, so we'll use it to perform our backup dumps. You could add these cron snippets to the list of daily jobs, but we prefer to have flexible scheduling so they can take place shortly before the nightly systemwide backups.

  • /etc/cron.d/pgdumpall

    1 0 * * * postgres /usr/local/sbin/dump-pgsql.sh
  • /etc/cron.d/svndump

    5 0 * * * root /usr/local/sbin/svndump.sh

With these, the postgres databases will be dumped at one minute past midnight, every night, using the scripts we set up above. Likewise, the SVN repos will be dumped every night, but at five minutes past midnight.

Keeping the data safe

What you do with these dumps now is up to you. If you have an existing backup regimen, great; they should be picked up along with everything else (make sure this is the case). If you don't have a systemwide backup process then you should at least copy them to another machine for safety.

It's outside the scope of this article, but it's very easy to use a tool like rsync to copy the dumpfiles to a remote machine, using password-less SSH keys. If you're the adventurous type, you can hack up a mechanism to upload the dumps to Amazon Web Services; their Simple Storage Service (S3) is great value for money when used as a filedump with minimal data transfers.

Author

Barney Desmond is a Linux systems administrator at Anchor with a passion for free software and open source solutions. Anchor is a provider of Australian web hosting and dedicated servers.
