Migrating websites with Wget

Wget is a common Unix tool that is also available for Windows. In its simplest form it is a download program, like GetRight, FlashGet or your browser. It works from the command line and has a vast array of options for getting the most out of this tiny program.

We often use wget to mirror remote sites completely: when a new customer comes over to us from another web hosting provider, we usually copy the site for them with it. To use it on our server, log in using ssh (howto coming soon). From the command prompt, run wget with the URL of the file you want to download, and the file will be downloaded directly to our server. Since our internet connection is probably many times faster than yours, this is both simpler and much faster than downloading the file to your local computer and then uploading it to ours.
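As a minimal sketch of the single-file case: the helper name fetch_file and the example.com address below are placeholders made up for illustration, so substitute the real URL of your file.

```shell
#!/bin/sh
# fetch_file is a hypothetical helper name; the URL passed to it is a placeholder.
fetch_file() {
    url="$1"
    # wget saves the file into the current directory under its remote name.
    wget "$url"
}

# Example (run this on the server, from the directory where the file should land):
# fetch_file http://www.example.com/files/archive.zip
```

Typing wget followed by the URL at the prompt does exactly the same thing; the function only makes the step easy to reuse in a script.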

Another common use, as mentioned above, is to mirror an entire site. Let's assume you are moving www.yourwebsite.com.au from web hosting company A to hosting company B. You have your new account set up, and you have logged in to B's server via ssh. To mirror your site, run wget -r www.yourwebsite.com.au and wget will recursively download your website into the new account. If your current website lives in a subdirectory or user account, like www.isp.com.au/~username/website/, you will need to add the -np switch to tell wget not to ascend to the parent directory. Keep the trailing slash on the address, because wget uses it to tell that the last part of the address is a directory rather than a file.
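The two cases above can be sketched as one small shell function. This is only an illustration: mirror_site is a made-up helper name, and the rule it uses (add -np whenever the address has a path after the host name) is an assumption about what you want, not something wget decides for you.

```shell
#!/bin/sh
# mirror_site is a hypothetical helper; the addresses in the examples are placeholders.
mirror_site() {
    site="$1"
    # Strip a leading scheme such as http:// so that only host/path remains.
    rest="${site#*://}"
    case "$rest" in
        */?*)
            # There is a path after the host name (e.g. /~username/website/),
            # so add -np to stop wget from ascending to the parent directory.
            wget -r -np "$site"
            ;;
        *)
            # A bare host name: mirror the whole site.
            wget -r "$site"
            ;;
    esac
}

# Examples:
# mirror_site www.yourwebsite.com.au
# mirror_site www.isp.com.au/~username/website/
```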

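You now have a more or less complete copy of your website. Be warned, though: wget does not read JavaScript, so files that are referenced only from scripts (the images behind those fancy rollover effects, for example) are not downloaded, and the effects will not work until you copy those files across manually.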
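One way to copy the missing files is to fetch each one by name with wget itself. The sketch below assumes you already know which files are missing and that you want them placed under the current directory in the same layout as on the old site; fetch_extras and the file names in the example are hypothetical.

```shell
#!/bin/sh
# fetch_extras is a hypothetical helper; the site and file paths are placeholders.
fetch_extras() {
    site="$1"
    shift
    for path in "$@"; do
        # -x recreates the file's directory path locally; -nH leaves out the
        # directory named after the host, so images/a.gif ends up in ./images/.
        wget -x -nH "http://$site/$path"
    done
}

# Example:
# fetch_extras www.yourwebsite.com.au images/home_over.gif images/contact_over.gif
```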

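By default wget creates a directory named after the site it is downloading. You probably want the files in the directory you are in at the moment, so add -nH to the command. This tells wget not to create the directory named after the host, while still creating the directories your website itself needs. (The similar-looking -nd switch goes further and creates no directories at all, putting every file straight into the current directory, which breaks sites that use subdirectories.)

The final command should look something like this:

wget -r -np -nH www.yourwebsite.com.au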