For the last few months our internal development team has been working on integration between our various internal systems. As part of this work we’ve been developing a series of Web APIs to decouple the functionality from the frontend interfaces. We have a couple of specific goals in mind. One is enhanced automation: once you have an API, you can write robots to do work for you. The other goal is being able to expose this to customers. Anchor made its name with top-notch support and comprehensive hands-off management, but there’s a growing number of customers who want to do things for themselves. If we give them a way to do that, it makes them happy. And then they can write their own robots to do things for them. As…
One of the big buzzwords in IT at the moment is “big data” – data in such large quantities that it’s not feasible to analyse with traditional tools. Thankfully this isn’t a problem that Anchor has to deal with, but we almost wish we did. We collect a lot of data about the servers we manage, way more than most other hosting providers: on a typical server we monitor and track a couple of dozen metrics to know how healthy it is and whether that’s changing over time. This is good, but it’d be great if we could easily store lots more data. What if we didn’t limit ourselves to keeping a year of data? What if we collected data every few seconds instead of every minute? Even…
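To put a rough number on that musing, here’s a back-of-the-envelope calculation. The figures are illustrative assumptions (30 metrics per server, 16 bytes per raw sample), not our actual numbers:

```python
# Back-of-the-envelope storage estimate for denser monitoring data.
# All figures below are illustrative assumptions, not Anchor's real metrics.
METRICS_PER_SERVER = 30        # "a couple of dozen" metrics
BYTES_PER_SAMPLE = 16          # timestamp + value, uncompressed
SECONDS_PER_YEAR = 365 * 24 * 3600

def yearly_bytes(sample_interval_s):
    samples = SECONDS_PER_YEAR / sample_interval_s
    return samples * METRICS_PER_SERVER * BYTES_PER_SAMPLE

for interval in (60, 5):
    gib = yearly_bytes(interval) / 2**30
    print("sampling every %2ds -> %.1f GiB per server per year" % (interval, gib))
```

Per server the raw numbers stay modest; multiply by a fleet of servers and years of retention, and it starts to look like a genuine storage problem.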
As you’ve probably noticed, we’ve been evaluating Ceph in recent months for our petabyte-scale distributed storage needs. It’s a pretty great solution and works well, but it’s not the easiest thing to set up and administer properly. One of the bits we’ve been grappling with recently is Ceph’s CRUSH map. In certain circumstances, which aren’t clearly documented, it can fail to do the job and lead to a lack of guaranteed redundancy.

How CRUSH maps work

The CRUSH algorithm is one of the jewels in Ceph’s crown, and provides a deterministic way for clients to locate and distribute data on disks across the cluster. This avoids the need for an index server to coordinate reads and writes. Clusters with index servers, such as the MDS in Lustre, funnel…
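The full post digs into the details; as a taste of the core idea, here’s a toy sketch of deterministic placement. It uses rendezvous (highest-random-weight) hashing rather than Ceph’s actual CRUSH code, and the OSD names are made up, but it shows why every client can agree on where an object lives without consulting an index server:

```python
import hashlib

# Toy illustration of deterministic data placement, in the spirit of
# CRUSH (this is rendezvous hashing, not Ceph's real algorithm).
OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]

def place(obj_name, replicas=2):
    # Every client computes the same ranking independently from the
    # object name and the OSD list, so no index server is needed.
    def score(osd):
        digest = hashlib.md5((obj_name + ":" + osd).encode()).hexdigest()
        return int(digest, 16)
    return sorted(OSDS, key=score, reverse=True)[:replicas]

print(place("rbd_data.1234"))   # same answer on every client, every time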
We’re attending linux.conf.au 2013 this week, held down in our fair capital, Canberra. There have been some great talks so far, and we thought we’d share one of the most interesting with you. In a nutshell, Checkpoint and Restore In Userspace (CRIU) is the ability to take a point-in-time snapshot of a running process (checkpoint) and revive it later, either on the same system or another system (restore). We’ll go over the difficulties in pulling this off, and what it’s good for.

Problems – the rabbit hole goes much deeper

At first blush this sounds simple enough: dump the process’s memory and stash it away, then later restore it and fix up a few references in the kernel. Too easy! Not so fast: there are a lot of subtle problems…
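In practice the workflow is driven by the criu command-line tool. A minimal sketch of a checkpoint/restore round trip (the pid and image directory are placeholder examples, and criu needs to run as root):

```python
import os
import subprocess

# Minimal sketch of a CRIU checkpoint/restore round trip.
# pid and images directory are placeholder examples.
pid = 12345
images = "/tmp/criu-images"
os.makedirs(images, exist_ok=True)

# Checkpoint: freeze the process tree and dump its state to image files.
subprocess.check_call(["criu", "dump", "-t", str(pid),
                       "-D", images, "--shell-job"])

# ...the images directory could now be copied to another machine...

# Restore: rebuild the process from the images and set it running again.
subprocess.check_call(["criu", "restore", "-D", images, "--shell-job"])
```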
We’re currently beta-testing RADOS Gateway, an S3-compatible cloud storage solution, with a view to producing a viable product. We’ve done a good amount of smoke-testing and turned up our fair share of buggy behaviour, but what it really needs is a good shakedown. Thus, Anchor’s first hackfest was held last Friday to show off what can be done with our deployment of RADOS Gateway, named Trove, and see if it really shines. We wanted to keep things fairly low-key for a first-attempt hackfest, so we only invited a small number of staff and their geeky friends, and put together several programming teams. Hackfests are generally pretty freeform and light on restrictions, which is how we ran it. The rules: build anything you want, though…
We came across this interesting article recently; it’s about how an attacker can perform a denial-of-service attack by feeding perverse input to a system that uses a weak hashing algorithm. This is referred to as a Hash DoS, and the specific target mentioned in the article is btrfs. btrfs is a next-gen filesystem that’s expected to replace ext3/4 in Linux. It’s still considered experimental but is quite usable and maturing fast. This article piqued our interest because we’re using btrfs “for reals” here at Anchor. It’s all well and good to say that, but the article isn’t very exciting unless you have a background in computer science. How would you explain Hash DoS to your parents, who probably don’t have a CompSci background? This is the internet, so the answer is cats…
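For readers who do want a taste of the technical side before the cats: the sketch below is our own toy illustration, not code from the article or from btrfs. It uses a deliberately weak Java-style string hash, under which the blocks “Aa” and “BB” collide, so an attacker can manufacture thousands of keys that all land in one bucket and turn cheap lookups into long linear scans:

```python
import itertools
import time

def weak_hash(s):
    # Java-style string hash: h = h*31 + c. "Aa" and "BB" hash equally,
    # so any same-length concatenation of those blocks collides too.
    h = 0
    for c in s:
        h = (h * 31 + ord(c)) & 0xFFFFFFFF
    return h

def build(keys, nbuckets=4096):
    buckets = [[] for _ in range(nbuckets)]
    for k in keys:
        bucket = buckets[weak_hash(k) % nbuckets]
        if k not in bucket:     # linear scan: cheap only if well spread
            bucket.append(k)
    return buckets

n = 4096
good = ["key-%d" % i for i in range(n)]                      # spread out
evil = ["".join(p) for p in itertools.product(("Aa", "BB"), repeat=12)][:n]

for name, keys in (("random keys ", good), ("crafted keys", evil)):
    start = time.time()
    build(keys)
    print("%s: %.2f seconds" % (name, time.time() - start))
```

The crafted keys take orders of magnitude longer to insert than the random ones, and the attacker only had to send ordinary-looking strings.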
Anchor is proud to announce a new partnership with Magento as a hosting provider. This relationship recognises Anchor’s expertise and experience in providing comprehensive hosting solutions for Magento stores. Anchor is the first Australian hosting company to attain this partnership status, and only the eighth worldwide. Anchor’s ability to deliver high-performance, scalable Magento deployments was highlighted by its work with Games Paradise. Following the migration to Anchor’s servers, the Games Paradise site loaded some five times faster, and availability was rock solid, greatly improving their customers’ experience. Anchor has been doing managed hosting for over ten years, and in that time has built up strong relationships with local developers and Magento solution partners. By taking advantage of these…
Anchor held its annual Christmas party last Friday, and in a departure from the usual it wasn’t held upon a seaworthy vessel. So while enthusiastic statements to that effect were not possible, our presence at Le Pub appealed to the Redditor in us. Le Pub is a cosy little underground establishment just a couple of blocks from Anchor HQ, so we packed up for the week and moseyed on down in no time at all. Parties are always more fun when people get a bit dressed up. To make this easier we decided to do it a bit differently this year: instead of visiting the costume shop to try to pick fun and silly outfits, we got an assortment of bits and pieces to add some pizzazz. This was a…
The brouhaha surrounding ClickFrenzy has come and gone in the last few weeks, but we’re still getting questions about Booktopia and what they did to get through the ordeal without going down. While there were a few high-profile website failures during the ClickFrenzy event, the fact is that a lot of the participating retailers suffered badly under the load, and the organisers and retailers copped a lot of ire. Booktopia was not among the casualties. Booktopia’s site was designed and coded in-house, with the app written to run on JBoss, backing onto a MySQL database. It runs on a cluster to ensure high availability, but you might be surprised to learn that there isn’t actually a huge number of servers involved. When you have an experienced team that’s done this…
RADOS Gateway (henceforth referred to as radosgw) is an add-on component for Ceph, the large-scale clustered storage system now mainlined in the Linux kernel. radosgw provides an S3-compatible interface for object storage, which we’re evaluating for a future product offering. We’ve spent the last few days digging through the radosgw source trying to nail some pesky bugs. For once, the clients don’t appear to be breaking spec; it’s radosgw itself. We’re using DragonDisk as our S3-alike client – so what works? PUTting and GETting files works, obviously. Setting the Content-Type metadata returns a failure, and renaming a directory almost works – it gets duplicated to the new name, but the old copy hangs around. Wireshark to the rescue! We started pulling apart packet dumps, and it quickly became evident that setting Content-Type on…
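For the curious, the failing operation boils down to something like the following boto sketch. The endpoint, credentials and bucket name are placeholders, and boto is just one way to drive it; DragonDisk speaks the same S3 REST API under the hood:

```python
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

# Minimal sketch of the failing request against radosgw, using boto.
# Endpoint, credentials and bucket name are placeholders.
conn = S3Connection("ACCESS_KEY", "SECRET_KEY",
                    host="radosgw.example.com", is_secure=False,
                    calling_format=OrdinaryCallingFormat())

bucket = conn.get_bucket("testbucket")
key = bucket.new_key("hello.txt")

# A plain PUT works fine...
key.set_contents_from_string("hello world")

# ...but supplying Content-Type explicitly is where radosgw misbehaved
# in our testing.
key.set_contents_from_string("hello world",
                             headers={"Content-Type": "text/plain"})
```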
