Posts Tagged ‘documentation’

“But I’ll remember it!” — why people don’t write documentation

Monday, August 10th, 2009

As I mentioned in my last documentation-related article, having a wiki or other documentation repository is really only a small part of the system documentation battle. What really counts is actually getting it chock full of all the knowledge in your organisation, and — possibly even more importantly — keeping it up-to-date forever and ever, Amen.

Just telling people to document their work really doesn’t work. While a quick “hey everyone, write some docs!” might get a desultory page or two written, it’s not really enough. Writing things down needs to be as natural as breathing, and about as easy. In fact, you should really aim for it to be easier to write than to not write, so that the “natural” choice is to do it.

Finding and eliminating the things that prevent people from writing documentation is the key. Every barrier to writing a good, solid document that you can remove means that much more documentation can be written. While I’m sure there are a million possible reasons “why not”, here are a few that I’ve noticed (usually in myself), and how I go about dealing with them.

  • They don’t feel they have to. Sure, there aren’t many people who would answer a confident “no” to the question “Do you need to document any of your work?”, but that’s different to it feeling wrong not to document things. Encouraging a culture of documenting things, so that any lack of documentation feels wrong at a gut level, is a crucial part of getting documentation written.

    Please note that when I say “feels wrong”, I don’t mean that it feels illegal or naughty — that way lies guilt and recriminations and every other kind of organisational unpleasantness. No, what I mean is that you feel uneasy about something until it’s been documented — like leaving the milk out on the bench. You just don’t feel right until it’s back in the fridge. (Well, OK, maybe that’s just me, but you get the point).

  • “I’ll remember it”: Yeah, sure you will. Whilst there are people who have an eidetic memory, they’re rare enough that it’s unlikely any of them work on your team. What’s more, even if you have someone like this, not everyone is so gifted, and an eidetic memory still doesn’t confer telepathic ability, so communication is still required.

    People who argue against the need to document just need to be talked down from their suicide pact. Pair them with someone who likes documenting, build a documentation culture, and let them experience how much better it is when things are written down.

  • They think they suck at it. Very few people like to do things they’re not good at, and writing docs is no exception. It’s true that some people really do suck at writing, and those people need to be trained to improve their skills, but far more people can do it, they just don’t think they can. Gentle encouragement, pairing with someone who does know they’re good at writing, along with removing the other blockers (so that people start trying it and realising they’re good at it) will solve much of this problem.
  • I don’t know where to put this where people will find it. I can strongly relate to this sentiment, as it’s one of the most common reasons I give to myself for why I don’t document things sometimes. Ultimately, what I think any rapidly changing wiki needs is a librarian — someone who makes it their responsibility to keep all the information organised. I can imagine that a large enough organisation might have a literal librarian (someone trained in information collection and management), but almost nobody has that much stuff. Instead, there just needs to be someone who knows that they’re responsible for keeping an eye on the clutter, and everyone else needs to know who the librarian is, too, so they know who to talk to about their categorisation woes.

    I would say that a project manager with a strong technical background is the best person to act as librarian (non-technical project managers, whilst being a bad idea in general unless there’s a strong technical lead, are especially bad in this case because they lack the experience to know how most of the documentation will be used, and hence can’t arrange it properly). They should have the “high level” view necessary to keep classification and arrangement logically sensible, without getting bogged down in the minutiae of one little corner of the world.

    Rotating the job of librarian around the team doesn’t work, as what’s really happening when someone is arranging everything is that the content is being organised in a way that is most logical to the librarian, and then everyone else just starts to learn to think like that person when it comes to finding the information again. Changing who does the job regularly doesn’t allow any of that to happen, and everything falls apart.

  • I don’t know what to write. We’ve all been there: the horror of writer’s block. The techniques for overcoming this are many, varied, and are learnt over time as you write more. Pairing, good team support, and experience are the keys to overcoming this problem.
  • Not knowing who to write for. This comes down to knowing how to write, but it’s worth making as a separate point. Before you start writing anything, decide who should be reading this and more importantly why they’re reading it. This will help ensure the correct tone, level of detail, and document structure. Describing the architecture of the system is a whole different style of writing from giving a procedure to resuscitate the app server in a real big hurry when it fails, and it’s important to get the tone right, otherwise your document is going to be pretty un-useful.
  • Nobody will ever read it anyway. In my pre-document-everything days, this was my go-to excuse. I couldn’t count how many times I used it. It can even be true, sometimes — if someone’s come up empty every time they looked for docs, pretty soon they stop searching. The best way to avoid this feeling is to not let it grow, but if you’re already in this situation, you need to work at reversing the impression.

    The only way to get people back into the habit of looking for and reading documentation is to make them explicitly aware that it exists, and that reading it will solve their problem quicker than interrupting someone else and asking them. This means that your documentation must be easily discoverable (good searching, good indexing), the docs in there must be high quality (constant QA), and it must be very likely that the answer is actually in there (keeping documentation up to date).

    On a closely related point, if you’ve got any external systems that tell you to do things, ensure that they provide easy pointers to relevant documentation. For example, your monitoring system should provide links to wiki pages describing how to diagnose and fix each individual problem (so e-mail alerts for the “Is the primary website up?” check should link to a procedure titled “how to restart the primary website when it fails”).

    As a more concrete example, we’ve got an automated event handler that resizes disk partitions for customer backups when they start to fill up. When that handler fires, it creates a ticket describing who was resized, and provides a direct link to the procedure for updating billing records. Nobody ever has any excuse not to follow the procedure, and they’re constantly reminded of the presence of a huge swathe of useful documentation on how to do their jobs.

  • If I write stuff down, you can outsource my job / fire me! If you really are just getting documentation written so you can outsource their jobs, then you’re evil and can get stuffed. If your corporate culture is such that people are legitimately worried about this, you’ve got bigger, more toxic problems than a lack of documentation. On the other hand, if it’s just rampant paranoia, I suggest that perhaps a psychiatric assessment might be of assistance.
  • I don’t have time now, I’ll do it later. To this, I say: “Yes, you do” and “No, I’m pretty sure you won’t”. Documentation, like automated testing, is a dish best served with the main course, not for dessert. Budget time in your project plans for documentation, and make sure everyone knows it. Make reviewing the documentation part of the review, and be generous with people who slip the timeline due to documentation work.

    A closely related problem is getting dragged off to do something else as soon as the “real job” is done, without having done the documentation. To combat this, everyone needs to know that “the job’s not done ’til the wiki’s up-to-date”, staff must have the authority to say “I can’t do that other task, because I haven’t finished documenting this yet”, and management needs to learn to stop judging completion by when the service is working, but rather by when it’s documented.
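
To make the monitoring point above a little more concrete, here is a minimal sketch in Python of an alert formatter that never sends an alert without a pointer to documentation. Everything named here is an assumption made up for illustration: the check names, the wiki.example.com URLs and the `build_alert_body` function do not come from any particular monitoring system.

```python
# Hypothetical mapping from monitoring check names to wiki procedures.
# Both the check names and the URLs are invented for this example.
RUNBOOKS = {
    "primary-website-up": "https://wiki.example.com/Procedures/RestartPrimaryWebsite",
    "backup-partition-resized": "https://wiki.example.com/Procedures/UpdateBillingRecords",
}

# Hypothetical page to point at when no procedure has been written yet.
MISSING_DOC_URL = "https://wiki.example.com/Procedures/MissingProcedure"


def build_alert_body(check_name: str, detail: str) -> str:
    """Return alert text whose last line always points at documentation."""
    url = RUNBOOKS.get(check_name)
    if url is None:
        # An undocumented check is itself worth flagging to the reader.
        doc_line = f"No procedure documented yet, please write one: {MISSING_DOC_URL}"
    else:
        doc_line = f"Procedure: {url}"
    return f"Check: {check_name}\nDetail: {detail}\n{doc_line}"
```

Calling `build_alert_body("primary-website-up", "HTTP check timed out")` gives a three-line message ending in the procedure link. Falling back to a "please write one" page for unknown checks is a design choice in this sketch: it turns every undocumented alert into a visible prompt to fix the gap.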

Ultimately, the best way to get documentation written is to make the writing a natural part of the workflow, like logging into your workstation when you start work. Some of the ways I think work well to move everyone in the right direction include:

  • When new team members come on board, take the time to give them a guided tour of your existing documentation. Sure, these days most technical people already know what a wiki is and how to do the basics, but everyone structures their wiki content differently, and different wiki engines encourage different behaviour. Give the new geek a strong impression of how good your docs are, so they’ll be encouraged right from the start to keep that tradition going, and give them an idea of how the wiki is structured and how to find information so that their early experiences aren’t frustrating.
  • Answer all questions with a wiki page — either point someone to an existing page, or write a new one and point them at that. This can be slow and tedious at first if you’ve got a huge, undocumented mess, and might have to be applied selectively at first, but communicating through the wiki both ensures that documentation gets written, and that everyone gets into the habit of using the wiki to find the answers to their problems.

    As an aside, the person answering the question should be writing the page, instead of the questioner getting a verbal answer and then writing things down. Whilst this undoubtedly takes more time, it means that people can’t get away with skimping on their docs and then expecting the new guys to take up the slack, and the person asking the question is unlikely to have sufficient visibility into the problem to structure a page appropriately. Also, having someone write a page (to answer the question) and then someone else immediately read it to get the answer, provides a simple informal QA mechanism.

  • In your project plans, explicitly describe the documentation to be written for every single step. (You do have project plans, don’t you?) This isn’t just adding “Document as needed” to the end of each item description; that’s nowhere near detailed enough. Instead, write things like “Set up an internal package repository to contain all of the cluster-specific packages. Document how to build new packages, update existing packages (both existing-internal and newly-modified-from-upstream packages), and how to upload built packages to the repository and rebuild the indexes at Procedures/PackageManagement”.

    Yes, most of that description covers the docs rather than the core point, but that’s OK (and intentional) — it’s far more likely that whoever gets assigned that task will know how to set up a package repository and can judge when it’s done; far less likely is that they’re going to automatically think of all the possible ways that documentation should be written, especially when they’re excited to be getting on with the next task. It’s also ridiculously hard to QA a completed task if there’s no detailed list of what docs should have been written.

    When I’m not intrinsically motivated to complete a task, I break it down into steps so small that each one is “obviously trivial, I can do that in five minutes”, and then there’s no reason not to do it. It’s the same thing here — “document as needed” is big, scary, open-ended, and doesn’t have an obvious starting point. “Document how to do this, place it here” is a series of small, simple, completable tasks, and there’s no mental block over where to put the docs to get in the way.

  • Consciously budget time in your project plans for docs. I prefer to make it a separate column in the project plans, but whether it’s included in your other estimates or not, it definitely needs to be included in the budget or it’ll never get done.

    Separate line items for documentation allow you to do your evidence-based scheduling more accurately (which is always good, as it shows you whether your doc time budgets are accurate or not) and — most importantly — say to everyone “I have explicitly set aside time to document this, so there’s no excuse not to do it”.

    On the other hand, making it explicit that you’re spending half your time writing docs gives meddlesome senior executives yet another thing to haggle over and pressure you to cut from the schedule (“What? Three weeks of your two month project is writing documentation? Are you insane? The board wants this done by last week! Get out of my office!”). Only you can decide if your corporate culture permits you to make it explicit, but if you want docs, you can’t not budget the time somewhere to make it happen.

  • Include constant documentation review and improvement in all your operational processes. For example, our monitoring system alert handling procedures mandate that every alert goes into a ticket, and the ticket doesn’t get closed until it’s been through a set of review procedures, which include checking that it’s not a recurring problem (and fixing that), adjusting whatever needs adjusting if it was a false positive to make sure it doesn’t happen again, and — the key point in this discussion — fixing any documentation that wasn’t as good as it could be in helping to solve the problem. Don’t let any opportunity to improve your documentation go to waste.
  • Hire for technical writing skills. Make it a mandatory item in your interview process that the candidate demonstrate the ability to string a few written technical sentences together in a comprehensible way. Also, if you’ve got anyone in the team already who can’t write, encourage them to improve their writing skills — going as far as paying for them to attend a creative writing course (on company time, of course), or even running something internally if you’ve got enough people who can’t write. All of this emphasis on good written communication skills will make it obvious that you value good writing, and will encourage people to obtain and maintain those skills.
  • Be the change you wish to see in the world. Demonstrate that documentation writing is for everyone, by holding yourself to the highest possible standard. Document everything you do and know meticulously, do it as you do the work, and basically follow all the good advice you’re giving to everyone else. This doesn’t mean that you write all the documentation yourself (if you’re a manager, the chances are that you don’t know enough about everything to do it properly, anyway) but the work that you do do needs to include full and complete documentation.
  • QA docs along with the rest of the deliverables. Whenever you have a QA review of work done, make sure that it explicitly includes reviewing the accompanying documentation, both for style and substance, but also that the purpose of the documentation is met. Emergency recovery procedures look very different to project planning notes.
  • Ensure that people are kept up to date with changes, by having edits (including a diff of exactly what was changed) e-mailed out to everyone. I can’t believe that there are wiki systems that don’t do this, but I certainly will never use one of those systems (again) if I can avoid it. It’s just way too useful a feature to forgo.

    The benefits are many and varied: it keeps the presence and quality of the documentation firmly in everyone’s mind; it’s an immediate QA process; people can’t get away with shoddy edits because everything they write immediately gets seen by everyone else; and, as processes change and improve over time, everyone’s kept up to date on those changes so they’re not doing things in the old, inferior way.
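
The edit-notification idea in the last point can be sketched with nothing but Python’s standard library. This is only an illustration of the idea, not how Moin or any other wiki engine actually implements it; the function name, the sender address and the page name used below are all hypothetical, and the message is built but deliberately not sent.

```python
import difflib
from email.message import EmailMessage
from typing import List


def build_edit_notification(page: str, editor: str, old: str, new: str,
                            recipients: List[str]) -> EmailMessage:
    """Build (but do not send) an e-mail showing exactly what changed on a page."""
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{page} (before)",
        tofile=f"{page} (after)",
        lineterm="",  # inputs have no line endings, so headers should not either
    )
    msg = EmailMessage()
    msg["Subject"] = f"[wiki] {page} edited by {editor}"
    msg["From"] = "wiki@example.com"  # placeholder sender address
    msg["To"] = ", ".join(recipients)
    # unified_diff yields nothing when the two versions are identical.
    msg.set_content("\n".join(diff) or "(no textual changes)")
    return msg
```

Handing the returned message to whatever mail transport you already use is left out on purpose. The point of the sketch is that the diff, not just a “page X changed” notice, goes into the body, which is what makes the e-mail work as an instant review.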

Given that long list of “Do’s”, here’s a don’t: espousing the need for documentation without doing anything else. “Yes boss, we all know how important docs are, but the project plan is already unrealistic even without any writing, and every time we try to write anything you roll your eyes and tell us to get on with the more important things, and you put the kybosh on hiring that extra person to help out with the extra workload, and oh yeah, you never write anything down yourself so what the hell?”

Next time, I’ll talk about how we structured the wiki for Project Starbug.

The Zen of Documentation Maintenance

Thursday, August 6th, 2009

Given that you’ve been suddenly and completely convinced of the need for documentation in my previous post, the question still remains: how does one make documentation appear on a consistent and ongoing basis?

If you’re really, really lucky, you’ve been spared the painful experience of putting up a wiki somewhere (or, worse, forked out a pile of cash for a “knowledge management system”), sticking some info into it at random, and then… nothing. You planted the seeds of a documentation tree, why isn’t it growing, and flowering, and solving all of your problems for now and forever?

For Project Starbug, we’re creating a whole new infrastructure, more-or-less from scratch. This is the easiest possible environment to make work, because you’re not constrained by what is already in place (and that you can’t afford to get rid of), and the whole thing isn’t in production so there’s no need to get freaked out by the thought of taking a major site off the Internet due to making an ill-advised change — and, most relevantly to this discussion, there’s no giant mass of undocumented… stuff that needs to be picked apart and documented. There’s nothing more deadly to motivation than the idea that when you’ve got this bit documented, there’s only 350,000 other bits to go.

So, if we didn’t want to end up with a shiny, new, incomprehensible and undocumented system, we needed to start focusing on documentation right off the bat and build the documentation alongside the rest of the system. This, in turn, meant that we needed to have something easy to work with, well structured, and above all ready to go before anything else could really kick off.

What to use was a no-brainer. Wikis are straightforward to access and edit, and there’s very little downside to them. We use Moin internally for our documentation extensively, so it wasn’t a hard sell to spin up another copy of the wiki software to contain all of the documentation for this project. Most widely-used wiki engines these days are on much the same level, though, and it’s really just a matter of preference which one to use — mostly based around the language you’re most comfortable using (Python == Moin, PHP == MediaWiki, Perl == TWiki, Ruby == Instiki, Java == something useless and enterprisey), because you really want to be able to write plugins and extensions. One day I’d love to try ikiwiki, because that means I can edit wiki pages without even needing to open my web browser, which will be a particularly special kind of bliss.

Why did we use a separate wiki, though, and not an extension of our existing one? We want to communicate with the customer as well as we possibly can, and the content of the wiki is like a big, persistent communications nexus, and giving the customer (especially this customer, who really knows their stuff) direct access to be able to read all the internal procedures and technical information relating to the management of their infrastructure is a massive boon to communication. Who knows when they might see something we’ve written and say, “Hey, that’s not right!” and fix it? We’re the system administration experts, not the experts in their application, so it makes perfect sense to have them as tightly integrated as possible into the management of the whole infrastructure.

Though we may have made it over “Documentation Hurdle #1”, the race had barely even begun. Plenty of well-intentioned doc projects have gotten something started, and then withered on the vine. The key is to make sure that the documentation stays maintained, and keeps up with the growth of the infrastructure and its constant changes. The most important way to do this is to identify the reasons why people don’t keep a reasonably useable documentation repository maintained, and remove those reasons, leaving no possible excuse not to write docs. It needs to be easier to write docs than to not write them, otherwise they’ll get forgotten in the pressure of the moment, and playing catch-up is painful and annoying.

In the next article, we’ll examine why people don’t write docs as often as they know they should, and how to create a “documentation culture” in your team.

If you don’t write it down, it never happened

Wednesday, August 5th, 2009

You’re trying to reconfigure a service to do something new. Digging into the config files, you see that everything’s been modified heavily, but it doesn’t seem to make a lot of sense. Everything’s currently working, so it must be right, but why was it done like this in the first place? It looks like this can be simplified, but… you’re not sure. What if it needs to be this complicated for a reason?

Or perhaps it’s 2am, you’ve just been notified by the monitoring system that a critical system that you don’t have a lot of experience with has gone down. Logging in to the server you thought the service was on, you realise that this isn’t the right place, so you waste precious time tracing the network through the load balancers and proxies to where the service really lives. Then you realise that you don’t know where any of the config or binaries are. By now you’ve been ferreting around for 20 minutes and your SLA is just about blown, and you still have no freaking idea what’s going on with any of this…

Maybe the problem is that you’ve got something on your network that’s causing problems from excessive broadcasts on some random IP address. All you’ve got is the MAC address, but what server does that map to? You’ve got no idea.

Then there’s the frustration of being asked by the boss to set up a new instance of some minor service that nobody’s touched in two years as a test platform for a new client. Nobody else in the office remembers how it was done last time, only that it wasn’t a whole lot of fun. Of course, the boss doesn’t see why you can’t whip this one up nice and quick, given that “we’ve already got one of these over here, why should it take too long to do another one?”

By now, you probably know what this is all about (if the post title didn’t give it all away to begin with). We’re talking Documentation. Everyone’s heard a hundred reasons for why you should write documentation. Here’s number 101:

A system really only survives as long as people understand how it works and how to maintain it. The moment that information is lost, the system is basically dead — sooner or later someone’s going to come along and tear it down and replace it with something else, something that they understand. Of course, the operation of that system will sooner or later be lost and the cycle will repeat.

If you want to maximise the life of the systems you build, then, you need to ensure that the crucial details about it aren’t lost. The only way to do that is to write things down. Your memory will fade, you won’t always be around to answer questions, and trying to figure things out post-facto is a pain in the arse.

Of course, it’s all well and good to blather on about how wonderful documentation is, but the difficult bit is how do you write good documentation? Well, I can’t say I’ve got all the answers, but in the next few installments of The Adventures of Project Starbug, I’ll describe how we laid out the documentation for this new project.

Pain-free server migration

Thursday, April 9th, 2009

Being the veteran of a datacentre migration and several whole server migrations, I feel like I’m getting the process down to a reasonably fine art. I had to perform another migration last night from another datacentre to ours at Global Switch, and the process went very smoothly, so I thought I’d share some of the techniques I’ve built up over time so you might benefit if you’re in the same situation.

Preparation

This should go without saying. The more time you have to prepare for the migration, the better. You do not want to leave it until the last minute. My philosophy when approaching the migration is always to leave the least amount of work possible to do at the time of the actual migration. Clients will generally want to schedule any server downtime for late at night, when you are not going to be operating at your best (despite how many coffees or energy drinks you may have consumed). If you can log in to the machine, run a prepared script which takes care of everything and have the migration completed for you, you will end up with a happy client and be happy yourself. You will be in the datacentre for less time and get to bed earlier, both of which are good things.

Make good use of scripting

Following on from my last point, I strongly encourage you to script as much as possible. The migration I just performed entailed moving a server from one datacentre and network provider to another, which meant a change in address space. Thus, firewalls, IP address configuration files, Apache vhosts, ACLs and more had to change. Ahead of time I determined which files would need to be modified and created a script which took a backup of each of these files before overwriting them with corrected versions. Any failure would cause the script to stop and print the problem, which could then be easily diagnosed manually.
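
As a rough illustration of that kind of script, here is a minimal Python sketch of the backup-then-overwrite pattern with stop-on-first-failure behaviour. It is not the script I actually ran; the function names, the `.orig` suffix and the idea of keeping all backups in one directory are assumptions made for the example.

```python
import shutil
import sys
from pathlib import Path


def replace_with_backup(target: Path, corrected: Path, backup_dir: Path) -> Path:
    """Back up `target` into `backup_dir`, then overwrite it with `corrected`."""
    if not corrected.is_file():
        raise FileNotFoundError(f"corrected version missing: {corrected}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Build the backup name from the full path so that two files with the
    # same basename in different directories do not overwrite each other.
    backup = backup_dir / ("_".join(target.resolve().parts[1:]) + ".orig")
    shutil.copy2(target, backup)     # raises if the original cannot be read
    shutil.copy2(corrected, target)  # only reached once the backup exists
    return backup


def run_migration(replacements: dict, backup_dir: Path) -> None:
    """Apply each {target: corrected} pair in order; stop at the first failure."""
    for target, corrected in replacements.items():
        try:
            backup = replace_with_backup(Path(target), Path(corrected), backup_dir)
        except OSError as exc:
            # Print the problem and exit non-zero so a human can diagnose it.
            sys.exit(f"STOPPED at {target}: {exc}")
        print(f"replaced {target} (backup at {backup})")
```

The order of the two copies is the whole point: the live file is never touched until its backup has been written successfully, and nothing after a failed step runs.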

The more automation and failsafes you can build into your script, the better. Since you will be creating it with plenty of time up your sleeve and your brain operating at full capacity you can build up the script with your full arsenal of tricks. At 3am in a cold datacentre with noisy airconditioning you can hardly expect to have your full faculties with you, so make life easier for yourself by leaving as little actual work as possible to do at this point.

Fully acquaint yourself with the server

You will only know what needs to be changed on the server if you are familiar with it. Of course, you should have plenty of good documentation already on it but if not, log in and get the lie of the land. Have a plan for how you will find out facts about the system – make use of grep and well structured regexes for finding out configuration details, slocate (if there is a locate database present) for finding critical files, and your usual toolkit of sysadmin techniques.
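
For example, when the address space is changing, one useful survey is “which files still mention an old address?”. The sketch below does in Python roughly what a recursive grep with a careful regex would do. The 192.0.2.0/24 range is a documentation-only placeholder standing in for the old network, and the function name is made up for this example.

```python
import re
from pathlib import Path

# Assumption: the old network was 192.0.2.0/24 (a documentation-only range).
# The word boundaries stop this matching inside longer numbers like 1192.0.2.1.
OLD_ADDRESS = re.compile(r"\b192\.0\.2\.(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")


def find_old_addresses(root: Path):
    """Yield (path, line_number, line) for every line mentioning an old address."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the survey
        for number, line in enumerate(text.splitlines(), start=1):
            if OLD_ADDRESS.search(line):
                yield path, number, line.strip()
```

Pointing this at a copy of `/etc` (for instance `for hit in find_old_addresses(Path("/etc")): print(*hit)`) gives a list of files to add to the migration plan. Escaping the dots and anchoring the pattern matters: a sloppy pattern would also match addresses such as 192.0.20.1 and send you chasing files that don’t need changing.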

Document as you go

At Anchor, documentation is critically important. We have an internal wiki system in which we make detailed notes on every server and a great number of technical articles (a lot of which we have shared with you in our public wiki). Every migration plan is carefully documented from start to finish. In more complicated scenarios a full change proposal is created and officially ratified, but at the very least you should create a checklist:

  • people involved (and their contact details, if necessary)
  • time frame
  • a detailed list of items that need to be prepared or information that needs to be acquired before the migration takes place
  • actions that will be undertaken just before the migration starts
  • the list of actual migration steps, including details of what any scripts will be doing
  • post-migration actions which need to be done immediately after the migration – e.g. checking that all your monitoring is showing OK for all hosts and services
  • a list of “cleanup” items which can be completed after the migration, but not time critical, e.g. removing stale references to servers from your internal documentation

Have as many people check over your documentation as possible, preferably those who have knowledge of the systems so that they can find anything you have missed. The more eyes on your documentation, and heads thinking about it, the better the chances that you will have a plan that covers all aspects.

One of the most important things from my point of view with documentation is to forward a copy to the client, and keep them involved in the process. Not only does it give them confidence in your abilities to conduct the migration successfully, but it gives them an idea of the work that you have had to put in, gives transparency to the process and gives you another point of view on the migration. There may be other steps important to them which you have missed, for example lowering TTLs on domains that are solely client-controlled.

Keep the client “in the loop”

Following on from my previous point, as well as giving the client a copy of your migration documentation, it is important to let them know what is going on. Send them a courtesy email every day or two, a call or whatever you deem appropriate to let them know how you are going with preparations and any information you need from them.

On the day of the migration, double-check everything with them – times, contact details, the migration plan, and so on. Make sure they are still happy to go ahead and that they are happy with your plans. Give them a courtesy call or message when you are about to start the migration, when you are finishing, but most importantly whenever you have any unexpected problems. Nothing upsets clients more than having things go pear-shaped and not being informed about it. Even if you don’t know what the problem is, let them know that you are diligently working on it and will keep them up to date with developments.

Plan for when things go wrong

In a perfect world, you would prepare adequately and everything would go flawlessly (as it did for me last night, luckily). However, every slightly obsessive-compulsive systems administrator knows that things can and will go wrong every now and then despite your best efforts.

Make an escape plan for every point where things can go wrong during the migration. Given you won’t have infinite time available, prepare most for the most likely failure scenarios. Make a rollback plan which will abort the migration, and decide how many failures will cause you to take this rollback plan on the night. Confirm this with the client.

Make sure that every change you make can be reverted (which most times will necessitate backups). There is nothing worse than discovering you have irrevocably destroyed data in the process of making a critical change.
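
One way to picture a rollback plan with an agreed failure threshold is the Python sketch below. It is an illustration under stated assumptions rather than a tool we use: the `Step` class, the `migrate` function and the `max_failures` parameter are hypothetical names, and a step that fails is assumed to have left nothing behind that needs undoing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]   # makes the change
    revert: Callable[[], None]  # undoes it, e.g. by restoring a backup


def migrate(steps: List[Step], max_failures: int = 0) -> bool:
    """Run steps in order; roll back once failures exceed max_failures.

    Returns True if the migration stands, False if it was rolled back.
    """
    done: List[Step] = []
    failures = 0
    for step in steps:
        try:
            step.apply()
            done.append(step)
        except Exception as exc:
            failures += 1
            print(f"FAILED {step.name}: {exc}")
            if failures > max_failures:
                # Undo completed steps newest-first, so each revert sees the
                # system in the state its own step left it in.
                for finished in reversed(done):
                    finished.revert()
                return False
    return True
```

Requiring a `revert` for every `Step` forces the “can this be undone?” question to be answered while writing the plan, not at 3am, and `max_failures` is the number you agree with the client beforehand.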

Approach everything with an obsessive-compulsive attitude

The best plans will have considered everything and left no detail to chance. It can be tiresome to be painstakingly thorough in your plans, but ultimately it will pay off. At the same time though, you don’t have to do everything in one sitting – make notes in your migration plan on what you still need to do and follow it up later. Don’t foolishly believe you will remember everything on the migration day, or even an hour from now – WRITE IT DOWN!

Remember, even though the preparation may be slightly tiresome, you are just making life easier for yourself at migration time. Hopefully if you follow these general tips I’ve prepared, they will make your next migration a lot easier.
