UMAD: Knowledge, Now

June 27, 2014

One of the big problems we face here at Anchor is the weight of knowledge that we carry around. We’ve been in business for almost 15 years, and the amount of information we’ve acquired in that time is immense, largely because we’ve dealt with many different technologies for many unique customers. How things work, how to fix things, how stuff gets done – these things tend to get encoded into people’s brains, and Anchor is no different. Even when it’s written down, that information can be hard to find. This is a challenge because it holds you back. Your ace employees with all the knowledge and experience are often the ones least available to teach new staff, which limits your ability to scale up and grow.

Anchor has half a dozen systems that our support staff and sysadmins use constantly. That’s half a dozen sources of information to search through when you just need to Get Stuff Done. It’s a bit tedious if you’re an old hand, and thoroughly daunting for a new hire trying to remember just where a particular snippet of information lives. I’ve been working on something in recent months to try and fix that.

That something is called Unearth Me A Document, and it aims to make all those sources searchable, quickly, in one place – much like everyone’s favourite search engine. UMAD isn’t full of super-exciting pixie dust, just a collection of technologies that are good at what they do. Let’s talk about those.

At its core, UMAD is a searchable index of documents, so we needed a search engine. Riak looked promising for a short while, but its performance was less than stellar. We looked at a few other projects, but in the end we settled on ElasticSearch as the indexing engine. Its document-centric nature is a good match for what we’re trying to do, and it’s a feature-rich and mature project. Pieces of information are our “documents”, and each one has a URL. ElasticSearch finds you the document and presents an excerpt, and then you can jump out to the URL if needed.
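As a rough illustration of that document model, the sketch below builds an indexable document and an ElasticSearch search request that asks for a highlighted excerpt of the matching text. The field names (`url`, `title`, `content`) and both helper functions are hypothetical, not UMAD’s actual mapping; the `match` query and the `highlight` options are standard ElasticSearch query DSL. The client call that would send these bodies is left out, since its signature differs between client versions.

```python
import json


# Hypothetical field names; UMAD's real index mapping may differ.
def make_document(url, title, content):
    """Wrap one piece of information as an indexable document carrying its URL."""
    return {"url": url, "title": title, "content": content}


def make_search_body(query, size=20):
    """Build a query DSL body: full-text match plus a highlighted excerpt."""
    return {
        "size": size,
        "query": {"match": {"content": query}},
        # Highlighting returns short fragments of the matching text, which a
        # frontend can show as the excerpt under each result.
        "highlight": {
            "fields": {"content": {"fragment_size": 150, "number_of_fragments": 1}}
        },
    }


doc = make_document(
    "https://wiki.example.com/MysqldumpTips",
    "mysqldump tips",
    "Notes on getting consistent dumps out of mysqldump.",
)
body = make_search_body("mysqldump consistent dumps")
print(json.dumps(body, indent=2))
```

Keeping the URL inside each document is what lets the results page link straight back to the original wiki page or ticket.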

Surrounding ElasticSearch is a simple workqueue and various workers that push documents into the ElasticSearch index. The workqueue is held in Redis (simple and fast), and the feeding and processing daemons are written in Python. The web frontend is a very simple WSGI app written on top of the Bottle framework.
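A minimal sketch of a Redis-backed workqueue, assuming a Redis list used as a FIFO: feeders `LPUSH` items and workers `BRPOP` them. The key name `umad:queue` and the helper names are hypothetical. The functions accept any client with redis-py’s `lpush` and `brpop` methods, so a real `redis.Redis()` connection could be passed in; a tiny in-memory stand-in is included only so the example runs without a Redis server.

```python
import json
from collections import deque

QUEUE_KEY = "umad:queue"  # hypothetical key name


def enqueue(client, url):
    """Feeder side: push a job describing one document onto the list."""
    client.lpush(QUEUE_KEY, json.dumps({"url": url}))


def work_once(client, handle, timeout=5):
    """Worker side: wait for a job, process it, and report whether one arrived.

    redis-py's brpop returns a (key, value) pair, or None on timeout.
    """
    item = client.brpop([QUEUE_KEY], timeout=timeout)
    if item is None:
        return False
    _key, raw = item
    handle(json.loads(raw))  # json.loads accepts both str and bytes
    return True


class FakeRedis:
    """In-memory stand-in exposing only the two calls used above."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        lst = self.lists.setdefault(name, deque())
        for value in values:
            lst.appendleft(value)
        return len(lst)

    def brpop(self, keys, timeout=0):
        for key in keys:
            lst = self.lists.get(key)
            if lst:
                return (key, lst.pop())
        return None  # a real server would block for up to `timeout` seconds


client = FakeRedis()
seen = []
enqueue(client, "https://tickets.example.com/1234")
enqueue(client, "https://wiki.example.com/Deepwater")
while work_once(client, lambda job: seen.append(job["url"])):
    pass
print(seen)  # the ticket URL first, then the wiki URL (FIFO order)
```

Pairing `LPUSH` with `BRPOP` gives first-in, first-out ordering, and the blocking pop means idle workers wait on Redis instead of polling it.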

Adding more document sources is also straightforward. We started with just two, our wiki and support ticketing system, and kept adding more as we thought of information sources to mine. Starting from the template module, it’s just a matter of writing some code to fetch the document, then wrangling it into a data structure for indexing.
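The fetch-then-wrangle pattern can be sketched as a small base class: each source implements one step that retrieves the raw item and another that turns it into a dictionary for the index. This is a hypothetical shape for such a template, not UMAD’s actual module, and the `WikiSource` below reads from an in-memory dict where a real source would talk to the wiki.

```python
from abc import ABC, abstractmethod


class DocumentSource(ABC):
    """Hypothetical base class: one subclass per information source."""

    @abstractmethod
    def fetch(self, url):
        """Retrieve the raw item (page, ticket, ...) behind a URL."""

    @abstractmethod
    def to_document(self, url, raw):
        """Wrangle the raw item into a dict ready for indexing."""

    def process(self, url):
        # The shared pipeline: fetch first, then convert.
        return self.to_document(url, self.fetch(url))


class WikiSource(DocumentSource):
    """Illustrative source; a real one would make an HTTP request in fetch()."""

    def __init__(self, pages):
        self.pages = pages  # stand-in for the wiki: URL -> raw markup

    def fetch(self, url):
        return self.pages[url]

    def to_document(self, url, raw):
        # Treat the first line as a "== Title ==" heading, the rest as body.
        title, _, body = raw.partition("\n")
        return {
            "url": url,
            "title": title.strip("= "),
            "content": body.strip(),
            "source": "wiki",
        }


wiki = WikiSource(
    {"https://wiki.example.com/Deepwater": "== Deepwater ==\nNotes about the deepwater server."}
)
doc = wiki.process("https://wiki.example.com/Deepwater")
print(doc)
```

A new source only has to supply `fetch` and `to_document`; queueing and indexing stay the same for every source.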

The beauty of this solution is that it’s scalable and tolerant of faults. UMAD embraces the Unix philosophy of small units that do one thing well, coupled together loosely. Certainly it has become more complex as it’s grown, but UMAD will scale out as you add more nodes (up to a limit, of course). Index performance can be increased by adding more ElasticSearch nodes to the cluster. Ingestion throughput goes up by adding more processing nodes. You can support more queries on the web interface by scaling out the frontend nodes (which shouldn’t be needed for a while; we’re not exactly facing an avalanche with our current number of staff).
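Scaling ingestion this way works because workers are independent consumers of a shared queue. In the sketch below, threads in one process stand in for separate processing nodes and `queue.Queue` stands in for Redis; both are assumptions made for illustration. Adding capacity just means starting more workers, and no worker needs to know how many others exist.

```python
import queue
import threading


def run_workers(urls, n_workers, process):
    """Drain a shared queue with n independent workers; return all results."""
    jobs = queue.Queue()
    for url in urls:
        jobs.put(url)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            doc = process(url)
            with lock:
                results.append(doc)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


urls = [f"https://tickets.example.com/{n}" for n in range(100)]
docs = run_workers(urls, n_workers=4, process=lambda u: {"url": u})
print(len(docs))  # 100
```

Results arrive in whatever order the workers finish, which is harmless here because each document is indexed independently.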

So, UMAD is our “secret sauce” for new staff, and we’ve found it makes a huge difference just being able to find all our information. Even better, it’s seriously fast: it returns a page full of results in about half a second. Can’t remember the secret incantation to make mysqldump do the right thing? UMAD knows what it is. Trying to answer a support call about …? UMAD knows about it. Didn’t catch the customer’s name, but they said that their server named “deepwater” is down? UMAD knows everything about it, and it’s right there at the top of the page.

We are certain we’re not the only ones suffering from this problem. If you’ve ever built your own customer relationship/billing/asset tracking system then you’ll know our pain. Maybe you’ve done all of the above. Maybe you’ve done some of them more than once…

Even though this is an internal system, we’re big believers in open source, and the source for UMAD is available on GitHub. There’s a bit of work involved in deploying it, but if you currently spend more time looking for knowledge than using it, you might want to take a look.