How to explain Hash DoS to your parents by using cats
We came across this interesting article recently, it’s about how an attacker can perform a denial-of-service attack by feeding perverse input to a system that uses weak hashing algorithms. This is referred to as a Hash DoS, and the specific target mentioned in the article is btrfs.
btrfs is a next-gen filesystem that’s expected to replace ext3/4 in Linux. It’s still considered experimental but is quite usable and maturing fast. This article piqued our interest because we’re using btrfs “for reals” here at Anchor.
It’s well and good to say that, but the article isn’t very exciting unless you have a background in computer science. How would you explain Hash DoS to your parents, who probably don’t have a CompSci background? This is the internet, so the answer is cats.
Welcome to Purrfect Kitty Daycare =-^_^-=
Let’s pretend that you run a daycare centre for pampered pusses. Doting owners drop their kitty off on the way to work each morning, and pick them up in the afternoon. You look after the pussies fantastically, so business is growing by leaps and bounds with more moggies every week. You can’t look after all the cats, so you hire some enthusiastic helpers.
Fast forward a few months, you now have 26 cat-minders working for you while you manage the business. Each minder has their own space to work in. To divide the work, you assign them to minders based on the cat’s name: all cats whose name begins with the letter ‘A’ go to minder no. 1, the cats whose name starts with the letter ‘B’ goes to minder no. 2, and so on.
When owners arrive to drop off or pick up their furry bundle of joy, they know exactly which room to go to! It’s super simple and does the job nicely. Your offices look something like this:
What you’ve implemented is called a hash function. It’s really basic but it does the job. As long as a cat has a name, there’s a room for it, and you always know exactly where to find a cat. When you use a hash function to distribute objects like this, each object (cat) goes into a bucket (room).
Your rooms don’t fill up evenly, this is to be expected. You might have a few cats in Room A (Alice, Alison, Amanda), and only one in Room X (Xerxes). Room A has what’s called a hash collision.
Finding Xerxes in the afternoon is easy, he has the whole room to himself. When Alice’s owner comes to pick her up, the attendant at the front desk has to ask what she looks like (or remember from previous visits). No big deal, we just have to check all the cats in Room A until we find Alice. It takes a couple of seconds.
Sometimes you’ll get a lot of cats in one room, maybe a dozen, but you can still work out which one you’re looking for with a little effort. You’ve got ninety-nine problems but cats ain’t one.
A rival appears! Kitty Kare has opened up across town and is looking to put you out of business. They’ve seen how your hashing function in action and know how it works. They’re going to use it against you, because they’re evil.
First, they need cats. Lots of cats. Maybe they pick up strays off the street, or just get kittens from the internet. It doesn’t matter how, but they’ve got over nine-thousand cats.
Now they give them all names starting with “Mr” – Mr Bigglesworth, Mr Fluffles, Mr Mac, Mr Pete, Mr Lincoln, Mr MoonUnit, etc. The list is practically infinite. Each cat gets a little engraved nametag on its collar and goes on its catty way.
They bring all the cats to Purrfect Kitty Daycare. Your staff are very smart people, and manage to handle the tsunami of tabbies by moving a few walls around to make enough room. It slows them down a bit, and your customers get irate that they have to wait to drop off their cat, but they get there in the end.
Your offices now look something like this:
Real mayhem arrives in the afternoon: evil employees from Kitty Kare return and ask to pick up ALL the cats. One by one.
Placing a cat in the room takes a roughly-fixed amount of time, called constant time when dealing with algorithms, mathematically written as O(1).
Finding a particular cat means going to the room and checking all the cats there. The more cats there are, the longer it takes. This is called linear time, written as O(n).
On average, your staff have to search about 4,500 cats (half of all the cats in the room) before they happen to find the right one. Things get better as some of the cats are returned to their (evil) owners, but it’s a bad situation for a long time. Your genuine customers are quite angry and upset, and it’s well past midnight by the time you knock off and go home.
You get home that night and have dreams. Bad dreams, about being overwhelmed by cats. You’ve just been Hash DoS’d, with cats.
Fixing those felines
In short, the answer to this problem is to use a hash function that isn’t vulnerable to this sort of attack. What you need is a hash function that “spreads” well (that is, similar inputs are unlikely to produce the same output). A cryptographic hash function (such as SHA-1) is the ultimate solution to this problem, but is overkill for your kitty daycare centre. The important criteria is that an attacker can’t easily generate names that will all map to the same hash value.
The best outcome is the case where the attacker has to “brute force” the problem by making up lots of cat names and then running them through your hash function to see which ones all hash to the same value. So long as your hash function evenly distributes cats into rooms, all you need to worry about is having enough staff to look after them all.