A customer’s Django app has been giving us hell for a little while now, something we’ve recently tracked down to dodgy signal-handling in some MySQL library code. Despite only showing up a dozen times in every 600 million queries or so, we’ve nailed it! It turns out the bug has been hanging around, on and […]
We’ve had a number of people ask us recently what sort of procedures and tricks we use when hunting down problems on systems we maintain, as a lot of the work can seem magical at times. While there are no short answers to these sorts of questions (you could fill many, many pages with the topic), […]
Update from 2012-05-24: The Corosync devs have addressed this and a patch is in the pipeline. The fix works roughly as described below: it builds the linked list by appending to the tail, and it prefers an exact IP address match for bindnetaddr (which was intended all along but got lost along the way). Rejoicing all […]
Today we bring you a technical writeup for a bug that one of our sysadmins, Michael Chapman, found a little while ago. This was causing KVM hosts to mysteriously keel over and die, obviously causing an outage for all VM guests running on the system. The bug was eventually traced to the megaraid_sas driver and […]