Limitations of BDB under concurrent access

March 28, 2013 Technical, General

Berkeley DB (BDB) isn’t exactly the most glamorous database engine around, but it’s surprisingly widely deployed and feature-rich. It’s designed for embedded use, so you’ll tend to find it integrated into a lot of applications if you go looking.

One of our internal development efforts is evaluating various key-value databases as a secondary store to Redis. Because BDB is accessed through API calls in your own code rather than over the network protocols that everyone’s familiar with, its access semantics are quite different. BDB supports multiple simultaneous readers and writers, but the circumstances under which this is safe weren’t immediately clear.

Two independent processes? No problem.

Multiple processes

Each process initialises its own BDB handles and leaves the concurrency to the library. The processes coordinate safe access to the database through shared memory, without any explicit cooperation between them.

From the documentation:

Multiple processes can all use the database at the same time as each uses the Berkeley DB library. Low-level services like locking, transaction logging, shared buffer management, memory management, and so on are all handled transparently by the library.

Too easy!

A cat with multiple threads is fine too.

Multiple threads

Threading is generally considered to be a more complex problem. While processes are independent, threads share resources and operate in the same memory space as each other, so cooperation is paramount.

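Thankfully, BDB makes this easy for us as well. The library provides reader and writer locks that allow concurrent access without data corruption, and its deadlock detector can resolve any deadlocks that do arise. The main requirement is that handles shared between threads are opened with the DB_THREAD flag, which makes them free-threaded.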

Forked children? They’re a handful!

Forked child processes

This is where things get messy. A forked child inherits a copy of its parent’s memory space, which here means all of the parent’s BDB state and open file descriptors. Trying to do almost anything with BDB in the child will probably result in a crash, data corruption, or both.

To understand why this isn’t the same as threading, imagine two processes each trying to use the same resources and manipulate the same objects in memory, while being invisible to each other and unable to coordinate (“hey, where’d I put my car keys??”). That’s pretty much exactly what happens once parent and child are separate processes.

Resolving this would require BDB to be able to jettison all of its existing state and start afresh, something that isn’t possible as far as we can tell. It’s not all bad; it just means that we need to either use entirely separate (non-forked) processes or get threads working cleanly in the rest of our app.

This limitation isn’t actually terribly uncommon; you tend to find the same restrictions in graphical toolkit libraries. It’s fine now that we know. The real nuisance was how hard it was to find a straightforward answer to our question!

No, BDB cannot be used across fork() in a child process.