IP Traffic Accounting
When your business relies on data services charged at a per-megabyte or per-gigabyte rate (and let's face it, unless you live in a bandwidth-saturated environment like the U.S. or Korea, that's probably all of us), you need some way of tracking your usage independently of your provider's measurements. As in any scientific arena, the more measurements you have, the better, and the more accurate those records need to be. If you are providing data services to clients and charging them by the megabyte, keeping accurate records of their usage is even more important, as you'll need it for billing.
Below we'll talk about some of the ways you can perform traffic accounting. Unlike some of our other articles, we'll only be talking about accounting on a per-IP basis - not delving into the log files of Apache or other services to find per-domain usage.
Traffic Accounting in the Dark Ages
Prior to the invention of useful tools like pmacct, in the Linux 2.2 and earlier days we used ipchains for firewalling. There was a handy package around at the time called ipac, which performed IP accounting based on packet/byte counters recorded by ipchains. The benefit was that you could get extremely detailed accounting, only limited by how detailed you wanted to be with your ipchains rules. A later version called ipac-ng also supported iptables.
For example (and we'll use iptables here), you can add a simple rule which has no target but simply maintains packet and byte counters.
root@host:~# iptables -I OUTPUT -p icmp
root@host:~# iptables -Z
root@host:~# iptables -L OUTPUT -nv
Chain OUTPUT (policy ACCEPT 4 packets, 536 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0            icmp --  *      *       0.0.0.0/0            0.0.0.0/0
root@host:~# ping -q -c 5 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.

--- 192.0.2.1 ping statistics ---
5 packets transmitted, 3 received, 40% packet loss, time 3998ms
rtt min/avg/max/mdev = 22.200/22.306/22.385/0.144 ms
root@host:~# iptables -L OUTPUT -nv
Chain OUTPUT (policy ACCEPT 1774 packets, 888K bytes)
 pkts bytes target     prot opt in     out     source               destination
    5   420            icmp --  *      *       0.0.0.0/0            0.0.0.0/0
Here we have added a simple accounting rule which logs the number of outbound ICMP packets and the corresponding byte count at the IP level. We can see from the ping output that although each ICMP payload was 56 bytes, the entire encapsulating IP packet was 84 bytes (56 bytes of payload plus 8 bytes of ICMP header and 20 bytes of IP header). 5 x 84 = 420, and that is exactly what the iptables counter has reported.
ipac runs periodically - much like an SNMP data collector - and measures the increases in the configured counters to give you incremental updates so that accurate graphs can be generated from the data usage. Data can be recorded in plain-text files or a database, and summarised as desired to give the required accuracy for accounting without blowing out the data storage.
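The core of that collection loop can be sketched as a small shell script (this is not ipac itself - the rule layout and log format are invented for illustration, and the iptables output is inlined so the sketch runs without root):

```shell
#!/bin/sh
# Parse the packet/byte counters from `iptables -L OUTPUT -nvx` style
# output and emit one "timestamp packets bytes" record per accounting
# rule. A real collector would read the live output and then run
# `iptables -Z` so each run records only the increment since the last.
iptables_output='Chain OUTPUT (policy ACCEPT 1774 packets, 888000 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       5        420            icmp --  *      *       0.0.0.0/0            0.0.0.0/0'

echo "$iptables_output" | awk -v now="$(date +%s)" '
    NR > 2 { print now, $1, $2 }'    # skip the two header lines
```

Run from cron every few minutes and appended to a file or database, this produces the incremental series that graphs and billing summaries are built from.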
ipac/ipac-ng are a simple way to do traffic accounting on the Linux platform with ipchains/iptables, but once you start to see a large amount of traffic or have to do accounting for a lot of hosts it becomes a little unwieldy. There is a limit to how many ipchains/iptables rules you can have before they become difficult to manage and start to impact processor load. There are also better tools out there, but they require a little more work, as we'll discuss below.
Sidebar: Network Layer
If you are being charged by your upstream provider for the amount of traffic you use, it is important to know how they are counting that data. Specifically, at which layer of the protocol stack they are counting the data. For the purpose of this sidebar we'll say you are getting IPv4 connectivity from your provider over an Ethernet connection.
It is fair to say that if you are getting charged for your IPv4 data, then so is your upstream provider. What they aren't being charged for is the overheads of the underlying Layer 2 protocol whether that be Ethernet, ATM or any other technology. The pieces of data that travel all over the Internet are the IPv4 packets sent and received from your equipment, not the Ethernet headers which change when they hit the next router along the way.
Some less scrupulous (or less technically capable) providers may charge you based on switch port data counters, which reflect the byte count all the way down to the Ethernet layer. When you consider that this adds 14 bytes per frame (18 bytes if tagged VLANs are being used), it can inflate the traffic accounting figure by roughly 1-20%, depending on average packet size. The difference in your bill at the end of the month (especially if you are using a lot of data) may be significant.
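To put numbers on that range, here is a quick sketch of 18 bytes of tagged-Ethernet framing as a percentage of a few representative IP packet sizes (the packet sizes are illustrative choices, not measurements):

```shell
# 18 bytes of tagged-Ethernet framing per frame, as a percentage of the
# IP packet size: 84 bytes is a standard ping, 1500 a full-size packet.
for ip_len in 84 500 1500; do
    awk -v l="$ip_len" 'BEGIN {
        printf "%4d-byte IP packet: +%.1f%% overhead\n", l, 100 * 18 / l
    }'
done
# small packets approach +21.4% overhead; 1500-byte packets only +1.2%
```

A traffic mix dominated by small packets (VoIP, gaming, interactive SSH) sits at the expensive end of that range.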
For this reason, it is always good to keep an accurate account of your own data usage. A reliable traffic accounting system is key to this goal.
Given the CPU load and administrative overhead that firewall rules incur (both of which scale linearly as you add more hosts), it is undesirable to continue with the ipac/ipac-ng approach of firewall counters. Enter pmacct, which can capture accounting data from a number of sources; we'll start with promiscuous mode, the source that gives the tool its name.
When you start to scale up your infrastructure and add reliability concerns into the mix, you don't want your traffic accounting being performed in your main network traffic path. Ideally your traffic only travels through components that actually do something useful with the data - routing, switching, firewalling or actually consuming the data. Adding more components into the pipeline decreases overall reliability by adding another point of failure. Traffic accounting can also be very I/O and CPU intensive so you do not want to combine accounting with any of the other roles such as routing or firewalling.
For this purpose it is best to have a separate machine to do the accounting. Hardware recommendations are as follows:
- medium-scale I/O capacity (both in terms of storage and throughput)
- fast CPUs (at least one core per network interface)
- one network interface per link you need to monitor (if using mirror/span ports) or two network interfaces per link if using network taps
- one network interface that serves only as a management interface for the accounting machine
- memory capacity tuned for database usage
Essentially the machine will be hosting a large database, so build the basic specifications up from that point of view.
Mirror/Span ports versus Network Taps
Since the accounting machine will not be directly in the routing/switching pipeline (where all the traffic would naturally pass through it), you need a mechanism to get a copy of the network traffic into the machine so it can be measured by pmacct. Fortunately, if you have managed switches you probably already have such a mechanism available to you.
"Span" ports (Cisco terminology) or "Mirror" ports (HP terminology) take a copy of every packet that enters on one or more designated "monitor" ports and passes it unchanged out of the designated span or mirror port. This applies to both inbound and outbound packets, so if you have a 100Mbps Full Duplex link technically you need 200Mbps of mirror port capacity. The downside to using a mirror port is obvious - you can't exceed the half-duplex capacity of the mirror port with the traffic you are monitoring. However you could easily mirror a 100Mbps Full Duplex port with a 1000Mbps mirror port - most switches that have Span/Mirror capabilities will allow you to have unmatched speeds on the mirror port and the monitor ports.
For example on an HP switch, if we have our border router upstream link connected to our border switch on port 20, and our traffic accounting machine connected to port 19 we would configure the switch as follows:
mirror-port 19
interface 20 monitor
exit
If both ports are 1000Mbps, we need to ensure that the total inbound + outbound traffic on port 20 doesn't exceed 1000Mbps, or else we will start to drop packets before they reach the accounting machine.
If you are not already aware of them, network taps are very handy devices. They perform much the same job as Mirror or Span ports as described above, but typically split the inbound/outbound channels of a Full Duplex link into two separate feeds. This ensures that you will never drop a packet due to exceeding the limit of your mirror port. On the downside, you will need twice as many network ports on your traffic accounting machine, and they will need to be the same speed as the links you are monitoring. That is to say, if the link you are monitoring is 100Mbps, the network ports will need to be 100Mbps (or at least, set to that speed), and likewise 1000Mbps links will need two 1000Mbps ports to monitor the traffic.
On the upside, since inserting a network tap into the middle of one of your upstream connections adds another single point of failure, good network taps will continue to pass traffic along the network link even when their power is disconnected - you simply lose your monitor feed.
Network taps run from around $1000 for 100Mbps up to $10000 and more for 1Gbps speeds and more complex models, so they are not an insignificant cost, but they offer significant flexibility and can be used for several different applications, such as Network Security Monitoring.
Setting up pmacct
Once we have our stream of network traffic entering the accounting machine, we need to install pmacct and set up the database. I recommend using PostgreSQL for this purpose for its reliability and speed (when correctly tuned). Since this data will be used for billing we need a backend that can be relied upon. Full configuration of pmacct is outside the scope of this article but here are the basic steps:
- Choose a table schema. Pmacct has several different schemas which provide different amounts of accounting detail. If you are only interested in source/destination IP address and number of packets/bytes then there is little to gain from recording the protocol or port as well (in fact there is a lot to lose, since your database usage will increase significantly). Only record the data you intend to use.
- Create a configuration file for each network interface that is receiving the traffic you are monitoring. Configure the aggregate parameter with only the data points you are going to use, and use a pcap_filter that you know captures the right traffic (test it with tcpdump from a shell first).
- Make use of the SQL recovery options, which allow you to save the accounting traffic to a data file if the database becomes unavailable (for example, during daily vacuuming and backups).
- Use one table per day. When you need to archive old data out of the database, it is far easier and faster to dump and drop a whole table than to select a date range and delete the matching rows.
- Don't use indexes on the tables. Most of the transactions will be simple inserts, which you want to be as fast as possible. When you do aggregate the information into a daily per-IP record, you will be doing a full table scan anyway, so indexes will not help.
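Tying those points together, here is a hypothetical per-interface pmacctd configuration sketch. The directive names follow pmacct's documented configuration keys, but the values (interface, database, paths, timings) are invented for illustration and should be checked against your pmacct version:

```
! Sketch only - verify directive names against your pmacct release.
daemonize: true
interface: eth1
promisc: true
! record only the data points you will actually bill on
aggregate: src_host,dst_host
! filter tested with tcpdump first
pcap_filter: ip
plugins: pgsql
sql_db: pmacct
! one table per day
sql_table: acct_%Y%m%d
sql_history: 1m
sql_refresh_time: 60
! spool to disk if the database is unavailable
sql_recovery_logfile: /var/spool/pmacct/eth1.dat
```

One such file per monitored interface keeps each capture stream independently tunable and restartable.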
Once you have it configured and running, you should be able to query your database for accounting data and find details similar to the following:
pmacct=# select * from acct_20090201 limit 10;
     ip_src      |     ip_dst     | packets | bytes  |   stamp_inserted    |    stamp_updated    | vlan
-----------------+----------------+---------+--------+---------------------+---------------------+------
 192.0.2.1       | 192.0.2.145    |    2168 | 122256 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 |  123
 192.0.2.5       | 192.0.2.1      |    2720 | 181275 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 | 1234
 192.0.2.7       | 192.0.2.8      |     113 |  41354 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 | 1234
 192.0.2.1       | 192.0.2.145    |    4521 | 258144 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 |  123
 192.0.2.1       | 192.0.2.167    |    2884 | 186284 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 | 1234
 192.0.2.178     | 192.0.2.57     |    7800 | 312000 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 | 1234
 192.0.2.37      | 192.0.2.1      |     721 |  89479 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 | 1234
 192.0.2.150     | 192.0.2.78     |    2092 | 116299 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 | 1234
 192.0.2.170     | 192.0.2.78     |   12666 | 765683 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 | 1234
 192.0.2.57      | 192.0.2.178    |     936 | 917322 | 2009-01-31 23:59:00 | 2009-02-01 00:00:31 | 1234
For the purposes of billing, you'll need some script or mechanism to read data from the pmacct tables, aggregate them to a single inbound/outbound figure for each IP address and insert the result into your billing database. This is outside the scope of this article and highly dependent on the deployment environment.
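As a starting point, the aggregation step can be as simple as a GROUP BY over one day's table. This query is a sketch against the example schema shown above; your table names and billing rules will differ:

```sql
-- Total traffic per source IP for one day's table (schema as in the
-- example output above; a real billing job would also sum the inbound
-- direction by grouping on ip_dst).
SELECT ip_src       AS client_ip,
       SUM(packets) AS total_packets,
       SUM(bytes)   AS total_bytes
  FROM acct_20090201
 GROUP BY ip_src
 ORDER BY total_bytes DESC;
```

Running this once per day, storing the results, and then dumping and dropping the source table keeps the working database small.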
Netflow, sFlow and friends
Promiscuous-mode accounting is easily implementable, flexible and quite reliable in most cases. However, it is ultimately limited by how fast you can count the traffic entering and exiting your network. One machine can quite comfortably perform accounting on a constant 100Mbps stream of data, but 1Gbps increases the burden on the hardware, and data rates beyond that become more burdensome still. With network taps you can split the traffic off to as many devices as you desire, so it is possible to perform parallel accounting on several machines, but the hardware requirements scale up quite rapidly with large amounts of traffic.
Since you will ultimately be aggregating this traffic accounting data for billing anyway, the solution most of the big players in the Internetworking world use is Netflow, sFlow or one of their derivatives. These protocols are built into solid-state routers, switches and firewalls and perform aggregation of the traffic inside the device. Periodically a flow packet is sent to the collector (in our case, the traffic accounting machine), where it is processed and entered into the database.
The major difference between promiscuous-mode and flow accounting is that your accounting machine no longer needs to see every single byte and packet. This means a vast reduction in the network interface and bandwidth requirements of the accounting machine, and an equally significant reduction in CPU load from interrupts and network traffic processing. The trade-off is a configurable loss of accuracy; however, when scaling up to large amounts of traffic the accuracy sacrifice becomes negligible.
For example, if you have sFlow in your switch or router set to only sample 1 out of every 10 packets, you would configure your sFlow collector to use a multiplier of 10. For low traffic rates the accuracy is poor, but once you start seeing very large packet and byte counts, the marginal accuracy losses will be more than made up for by the extra accounting capacity and efficiency of the system as a whole.
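The arithmetic can be sketched as follows, treating sampling as roughly Poisson so the relative error of the scaled-up estimate falls off as one over the square root of the sampled packet count (an illustrative approximation, not a guarantee for bursty traffic):

```shell
# Scale sampled counts back up by the sampling rate, and estimate the
# relative error, which shrinks roughly as 1/sqrt(sampled packets).
sample_rate=10
for sampled in 10 1000 100000; do
    awk -v n="$sampled" -v r="$sample_rate" 'BEGIN {
        printf "%6d sampled -> estimate %7d packets, ~%.1f%% error\n",
               n, n * r, 100 / sqrt(n)
    }'
done
```

Ten sampled packets give an estimate with roughly 31.6% error, while a hundred thousand bring it under half a percent - which is why sampling works so well at scale.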
Using Netflow/sFlow with pmacct
As Netflow and sFlow are "Enterprise"-oriented protocols, you will find scant documentation and little practical information on how to use them. The best sources of information in this case are the pmacct mailing list archives and documentation. To use Netflow you will use the nfacctd component of pmacct, and for sFlow the sfacctd component. Aside from interfacing with the Netflow and sFlow aggregators in your network devices, these components operate much the same as the pmacctd component in the backend - writing the collected traffic accounting data to the database.
Typically with sFlow you will need to have a script that enables sFlow accounting on your network device such as the sFlowenable script, available from the InMon Corporation (developers of sFlow). Once you have run the script to enable sFlow accounting on the network device, it will begin aggregating data and sending updates to your sFlow collector - the traffic accounting machine. It is actually possible to have sFlow updates sent to several different collectors from the one switch or router, so it is possible to scale out your traffic accounting infrastructure quite easily using this method.
The sFlow packets that hit the collector will look something like this (courtesy of sflowtool):
startDatagram =================================
datagramSourceIP 0.0.0.0
datagramSize 220
unixSecondsUTC 1236325078
datagramVersion 5
agentSubId 0
agent 192.0.2.1
packetSequenceNo 98
sysUpTime 1403836350
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 110
sourceId 0:3
meanSkipCount 512
samplePool 55568573
dropEvents 22
inputPort 24
outputPort multiple 4
flowBlock_tag 0:1
flowSampleType HEADER
headerProtocol 1
sampledPacketSize 466
strippedBytes 8
headerLen 128
headerBytes 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
dstMAC 000000000000
srcMAC 000000000000
IPSize 444
ip.tot_len 444
srcIP 192.0.2.55
dstIP 192.0.2.99
IPProtocol 6
IPTOS 0
IPTTL 105
TCPSrcPort 3027
TCPDstPort 443
TCPFlags 24
endSample ----------------------
endDatagram =================================
The data is summarised down to a sequence of these UDP packets, which are sent from the agent (the switch or router) to the collector (the machine running sfacctd). Netflow operates in much the same way.
Configuration of sfacctd or nfacctd is mostly identical to that of pmacctd; however, instead of listening in promiscuous mode for traffic to sample, the sampling is done on your switches and routers. Netflow and sFlow agents are specifically designed to be very lightweight in CPU utilisation, so the overhead involved in running either protocol, even on heavily loaded networks, should be minimal. The summarised packets are delivered to your traffic accounting machine and entered into the database for billing data to be generated from, just as in our promiscuous mode example - but now you are able to scale your accounting infrastructure to meet the traffic needs of your network for years to come.
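For completeness, here is a hypothetical sfacctd configuration sketch showing the delta from the pmacctd case. The directive names follow pmacct's documented configuration keys and 6343 is the standard sFlow collector port, but the addresses and database details are invented for illustration:

```
! Sketch only - verify directive names against your pmacct release.
sfacctd_ip: 192.0.2.10
sfacctd_port: 6343
! same aggregation and backend settings as the pmacctd case
aggregate: src_host,dst_host
plugins: pgsql
sql_db: pmacct
sql_table: acct_%Y%m%d
```

Everything downstream of the collector - the database schema, the daily tables and the billing aggregation - stays exactly as described earlier.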