Aggregating RRD data from multiple files

February 2, 2009 | Technical

The RRD (Round-Robin Database) file format is a beautiful piece of work. It stores time-series data in a form that is efficient in both storage and CPU time, with a fixed file size, and it comes with some great support tools to retrieve, manipulate, and graph the data in various ways.

One problem you tend to hit every now and then, though, is that you want to aggregate the data from multiple separate RRD files into one monster graph. The simple method might be to put all the data into one RRD file, but that doesn’t work in the case where you can’t always collect all the data at once — RRD requires that you insert values for all your data sources at the same time.

Now, since we use Cacti for data collection at Anchor, in theory we should just be able to tell Cacti to do this. However, its interface is utter balls, and it always seems to take 10 times as long to do something as it should, so I tend to script this sort of thing instead of trying to fight Cacti. Also, if you don’t use Cacti (you lucky person, you), then you might need to know how to do this.

Recently, we needed to know the aggregate current draw from all the racks in our data centre. We’ve got APC managed power rails in every rack, and we already collect the current data from these devices, but it’s stored in a separate RRD file for each power rail. So we needed to aggregate this data into one big graph, and take some values out of it for management’s edification. Since there’s not a lot of info out there on aggregating lots of RRDs together, I thought I’d put down some notes on the subject.

A typical graph definition for rrdtool looks like this:

DEF:power=rack1.rrd:apc_current:AVERAGE
CDEF:kw=power,240,*,1000,/
VDEF:avg=power,AVERAGE
VDEF:avg_kw=kw,AVERAGE
LINE:power#ff0000
GPRINT:avg:Average current is %9.2lfA
GPRINT:avg_kw:Average nominal power is %9.2lfkW

This just takes the apc_current data source from the file rack1.rrd and stores it in the variable power. Then we convert the current into nominal power in kW, assuming 240 V (line 2), take the average of all the data points in each of the two series, draw a line for the current, and print the two averages we calculated. All pretty simple stuff, and if you work with RRD files at all, you’re probably quite familiar with this sort of thing.
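For context, those statements are the trailing arguments to `rrdtool graph`. Here is a minimal Ruby sketch of assembling that command, assuming the `rrdtool` binary is on your PATH. The method name `single_rack_graph_args`, the output file `power.png` and the one-day window are made up for illustration.

```ruby
# Build the argument list for `rrdtool graph` for a single rack.
# The method name, output file and time window are illustrative assumptions.
def single_rack_graph_args(rrd_file, output: 'power.png', start: '-1d')
  [
    'rrdtool', 'graph', output,
    '--start', start,
    "DEF:power=#{rrd_file}:apc_current:AVERAGE",
    'CDEF:kw=power,240,*,1000,/', # amps * 240 V nominal / 1000 = kW
    'VDEF:avg=power,AVERAGE',
    'VDEF:avg_kw=kw,AVERAGE',
    'LINE:power#ff0000',
    'GPRINT:avg:Average current is %9.2lfA',
    'GPRINT:avg_kw:Average nominal power is %9.2lfkW'
  ]
end

args = single_rack_graph_args('rack1.rrd')
puts args.join(' ')
# Uncomment to actually draw the graph (needs rrdtool and rack1.rrd):
# system(*args)
```

Passing the arguments to `system` as a list, rather than as one string, sidesteps shell quoting of the `#` and `*` characters.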

What’s less widely known is that there’s nothing special about the DEF statement above: you can repeat it as many times as you like, and you can point to as many different files as you need. So if you’ve got, say, ten RRD files with current values in them, you can just do:

DEF:power1=rack1.rrd:apc_current:AVERAGE
DEF:power2=rack2.rrd:apc_current:AVERAGE
DEF:power3=rack3.rrd:apc_current:AVERAGE
...
DEF:power8=rack8.rrd:apc_current:AVERAGE
DEF:power9=rack9.rrd:apc_current:AVERAGE
DEF:power10=rack10.rrd:apc_current:AVERAGE

This will define separate variables for the apc_current data source in each of the files. This also works, incidentally, if you’ve got multiple data sources in each file (like, say, incoming bytes and outgoing bytes).
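Typing those DEF lines by hand gets old quickly, so here is a small sketch of generating them while keeping track of the variable names. `def_lines_for` and its keyword arguments are hypothetical names, and it assumes the file names contain no colons (rrdtool would need those escaped).

```ruby
# Build one DEF statement per RRD file and remember the variable names used.
# `def_lines_for` and its keyword arguments are hypothetical.
def def_lines_for(files, ds: 'apc_current', cf: 'AVERAGE', prefix: 'power')
  vars = []
  defs = files.each_with_index.map do |file, i|
    var = "#{prefix}#{i + 1}" # power1, power2, ...
    vars << var
    "DEF:#{var}=#{file}:#{ds}:#{cf}"
  end
  [defs, vars]
end

defs, vars = def_lines_for(%w[rack1.rrd rack2.rrd rack3.rrd])
puts defs
```

For files with several data sources, you could call it once per data source with a different prefix, for example `ds: 'traffic_in', prefix: 'in'` (made-up data source names).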

Once you’ve got your data sources mapped, it’s a fairly simple matter of adding them all together:

CDEF:power=power1,power2,+,power3,+,...,power9,+,power10,+

The rest of the definition stays the same.
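If the ordering of that expression looks odd, it may help to watch a stack evaluate it. The following is a toy evaluator written purely for illustration. It is not how rrdtool is implemented, `eval_rpn` is a made-up name, it only understands `+`, and the per-rack currents are invented numbers.

```ruby
# Toy RPN evaluator: operands are pushed onto a stack; '+' pops the top
# two values and pushes their sum. Illustration only, handles just '+'.
def eval_rpn(expr, values)
  stack = []
  expr.split(',').each do |token|
    if token == '+'
      b = stack.pop
      a = stack.pop
      stack.push(a + b)
    else
      # A known variable name, otherwise a numeric literal.
      stack.push(values.fetch(token) { Float(token) })
    end
  end
  stack.pop
end

amps = { 'power1' => 3.5, 'power2' => 4.0, 'power3' => 2.5 } # invented readings
total = eval_rpn('power1,power2,+,power3,+', amps)
puts total # => 10.0
```

Each `+` folds the next rack into the running total, which is why every variable after the first is followed by its own operator.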

What makes for a slightly more exciting time is when you don’t know, in advance, how many files you’re going to have to merge together. This happens whenever the user gets to specify what data gets included — the script we’ve got here asks you which racks you want to aggregate the data for, and I’ve done bandwidth graphs in the past which showed all of a customer’s IP addresses in one graph. In this case, you need a bit of code, and here’s some Ruby that I use to generate the RPN expression above to add all of the values together:

# Generate an RPN (reverse Polish notation) sum of
# the strings given in list, e.g. %w[a b c] gives "a,b,+,c,+".
# A single-element list is supported, with the
# expected lack of addition operator.
def to_rpn_sum(list)
  first, *rest = list
  # Every element after the first is followed by its own '+'.
  ([first] + rest.flat_map { |name| [name, '+'] }).join(',')
end

Glue that together with the code to create your list of RRD files, something to write out all the DEF lines (and keep a record of what variable names you use) and you’re pretty much done.
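As a rough sketch of that glue, under a few assumptions: `aggregate_graph_args` is a hypothetical name, the output file `total_power.png` and the rack file names are placeholders, and `to_rpn_sum` is repeated in compact form only so the block runs on its own.

```ruby
# Compact equivalent of to_rpn_sum above, repeated so this block runs alone.
def to_rpn_sum(list)
  first, *rest = list
  ([first] + rest.flat_map { |name| [name, '+'] }).join(',')
end

# Hypothetical glue: one DEF per file, then a CDEF that sums them all.
def aggregate_graph_args(files, output: 'total_power.png')
  vars = files.each_index.map { |i| "power#{i + 1}" }
  defs = files.zip(vars).map do |file, var|
    "DEF:#{var}=#{file}:apc_current:AVERAGE"
  end
  [
    'rrdtool', 'graph', output,
    *defs,
    "CDEF:power=#{to_rpn_sum(vars)}",
    'VDEF:avg=power,AVERAGE',
    'LINE:power#ff0000',
    'GPRINT:avg:Average current is %9.2lfA'
  ]
end

racks = %w[rack1.rrd rack2.rrd rack3.rrd] # placeholder file names
args = aggregate_graph_args(racks)
puts args.join(' ')
# system(*args) # uncomment to draw; needs rrdtool and the RRD files
```

In a real script the `racks` list would come from whatever the user selected, and the rest of the graph definition carries over unchanged.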