A dev’s guide to safely escaping and encoding URLs

A lot of the support work that we do here at Anchor involves looking at websites. You could say that we’ve seen a few websites in our time. Something we come across pretty frequently is inadequate protection when handling user-submitted form data and URLs. This might not seem like a big deal, but it has some pretty big security implications, mostly relating to cross-site scripting. These problems can enable malicious activity such as leaking private data. The short version is that user-supplied data can never be trusted, and you need to carefully escape and format it to make it safe for its intended use, such as printing it on a webpage.

A very simple example

Let’s say you run a site that accepts news tips from…
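
As a rough illustration of escaping user-supplied data for its intended use, here is a minimal Python sketch. It is not taken from the full post, which is cut off in this excerpt; the function names `render_tip` and `search_link` and the `/search?q=` path are hypothetical, chosen only for the example. It uses the standard library's `html.escape` for text printed into a page and `urllib.parse.quote` for text placed in a URL query string.

```python
from html import escape
from urllib.parse import quote


def render_tip(user_text: str) -> str:
    """Wrap user-submitted text in a paragraph, HTML-escaping it first.

    Hypothetical helper: escaping <, >, &, and quotes means the browser
    displays the text rather than interpreting it as markup.
    """
    return "<p>" + escape(user_text, quote=True) + "</p>"


def search_link(query: str) -> str:
    """Build a link to a hypothetical /search page.

    safe="" percent-encodes every reserved character, so the user's
    input cannot add extra parameters or break out of the query value.
    """
    return "/search?q=" + quote(query, safe="")


print(render_tip("<script>alert(1)</script>"))
print(search_link("a b&c=d"))
```

The two contexts need different treatment: HTML escaping does nothing to make a string safe inside a URL, and percent-encoding does nothing to make it safe inside markup, so each value should be encoded for the place it ends up.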

Grepping for binary data

I was dealing with an interesting content-encoding issue yesterday for a customer’s website. They’re adamant that the problem started a few weeks ago after a routine database restoration, but we beg to differ. In any case, the customer’s site was displaying “funny characters” here and there, classic symptoms of encoding failure. I’ve written about this before, as it relates to MySQL’s handling of character encoding, but it’s not MySQL’s problem alone. In this case, the content coming from the database and CMS was proper UTF-8, but there were dodgy characters leaking into the rendered page. I knew these would be coming from template files in the user’s account, but how to find them? I could find an instance here and there by searching for nearby strings, but I needed to…
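
The excerpt stops before describing the approach that was actually used, so the following is only one possible sketch of the general idea, written in Python rather than with grep. It assumes the "dodgy characters" are bytes that are not valid UTF-8 (for example, Latin-1 text saved into a template), and the function name `find_invalid_utf8` is hypothetical.

```python
from pathlib import Path


def find_invalid_utf8(root):
    """Return (path, line number, raw bytes) for each line under root
    that cannot be decoded as UTF-8.

    Assumption: the offending characters are invalid UTF-8 byte
    sequences. Valid but unexpected characters would not be reported.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Splitting on the newline byte is safe here: 0x0A never occurs
        # inside a multi-byte UTF-8 sequence.
        lines = path.read_bytes().split(b"\n")
        for lineno, raw in enumerate(lines, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                hits.append((str(path), lineno, raw))
    return hits
```

Calling `find_invalid_utf8` on a template directory returns a list that can be printed or inspected line by line. Reading files as bytes, rather than as text, is the important design choice: opening them in text mode would either raise on the first bad byte or silently replace it, hiding exactly what the search is trying to find.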

