Netflow and Kentik Network Observability

Introduction

In this article we talk about a network protocol called Netflow, which provides monitoring platforms with metadata about network traffic. We'll also look at some Python code that can generate Netflow, and see how it's possible to integrate Netflow with applications or systems that don't normally produce it.

We also take a brief look at the Kentik Network Observability Platform and talk about some of the key points we learned by using it (and making our Netflow work with it).

What is Netflow?

Netflow is a simple network protocol used by various network devices to provide usage and volumetric information to monitoring platforms. For example, it might be used by a network device to say "I have transferred 1 megabyte of data in 1000 packets from A to B".

Netflow can 'wrap' many packets into a single summary message, so you don't end up in a situation where you're sending a monitoring packet about every packet you've handled.
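To make that idea concrete, here's a tiny sketch of how an exporter might roll individual packets up into per-flow counters before reporting them. It isn't tied to any particular Netflow library, and the packet data is entirely made up:

from collections import defaultdict

# Hypothetical packets we've observed: (src_ip, dst_ip, protocol, size_in_bytes)
packets = [
    ("192.168.23.37", "192.168.24.232", 6, 1500),
    ("192.168.23.37", "192.168.24.232", 6, 1500),
    ("192.168.23.37", "192.168.24.232", 6, 742),
]

# Accumulate counters per flow instead of reporting every packet individually
flows = defaultdict(lambda: {"bytes": 0, "packets": 0})
for src, dst, proto, size in packets:
    flows[(src, dst, proto)]["bytes"] += size
    flows[(src, dst, proto)]["packets"] += 1

# One summary per flow is what ends up in a Netflow record
for (src, dst, proto), counters in flows.items():
    print(f"{src} -> {dst} (protocol {proto}): "
          f"{counters['bytes']} bytes in {counters['packets']} packets")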

Netflow was developed by Cisco as a way to monitor its networking devices, but you can use it for anything related to network traffic. It lets you give monitoring platforms much richer information than simple network interface usage counters. Imagine a device in a monitoring tool reporting that its network interface is running at 500Mbit/s, and then being able to "drill down" to see which other devices are using it the most, and even which protocols they're using to do so (when we say "protocols", we really mean "port numbers"). Netflow makes that "drill down" possible.

In technical terms, the Netflow protocol is quite flexible, although, as with all such things, conventions exist and much of that flexibility isn't used all that often. A common Netflow message might contain the number of bytes and packets transferred, the protocol in use, the source and destination IP addresses, and the IDs of the network interfaces the traffic passed through.

Fields are optional, so not all fields need to be included. Netflow V9 uses a "template" mechanism to say which fields are included in the message.

Something to note is the network interface ID field. It doesn't have to be included, but monitoring platforms use it to attribute Netflow to the "devices" they are monitoring. Where it's used, it has to be the same numeric ID that the monitoring platform is expecting. If your device has a network interface called eth1, then somewhere in the monitoring platform there'll be a device with an interface called eth1; it will have a numeric ID, and the Netflow being produced needs to carry that same number.

Monitoring platforms like to auto-configure themselves, which they typically do using some sort of 'network discovery' mechanism. SNMP is extremely common for this purpose, and the interface ID that SNMP reports is likely the ID required in the Netflow. Indeed, the template calls these fields "INPUT_SNMP" and "OUTPUT_SNMP".

For Linux users, the output of ip a gives you the ID numbers you need. For example:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether d8:3a:dd:30:5d:8b brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether d8:3a:dd:30:5d:8c brd ff:ff:ff:ff:ff:ff
    inet 192.168.23.45/24 brd 192.168.23.255 scope global dynamic noprefixroute eth1
       valid_lft 55304sec preferred_lft 55304sec

Here, the number in the left-hand column is the ID number (e.g. eth0 has ID 2 and eth1 has ID 3).
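If you'd rather look these numbers up from code than read them off the ip a output, Python's standard library can do it for you on Linux. A minimal sketch, assuming the interfaces shown above:

import socket

# socket.if_nameindex() returns (index, name) pairs; the index is the same
# number shown in the first column of `ip a`.
if_ids = {name: index for index, name in socket.if_nameindex()}
print(if_ids)   # e.g. {'lo': 1, 'eth0': 2, 'eth1': 3}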

To generate some actual Netflow messages in Python, the helpful scapy library can do all the hard work. For example:

import scapy.layers.netflow as nf
import socket

# Define the 'template', which outlines which fields we'll be sending
# when we send Netflow messages
flowset = nf.NetflowFlowsetV9(
    templates=[nf.NetflowTemplateV9(
        template_fields=[
            nf.NetflowTemplateFieldV9(fieldType="IN_BYTES", fieldLength=4),
            nf.NetflowTemplateFieldV9(fieldType="IN_PKTS", fieldLength=4),
            nf.NetflowTemplateFieldV9(fieldType="OUT_BYTES", fieldLength=4),
            nf.NetflowTemplateFieldV9(fieldType="OUT_PKTS", fieldLength=4),
            nf.NetflowTemplateFieldV9(fieldType="PROTOCOL"),
            nf.NetflowTemplateFieldV9(fieldType="IPV4_SRC_ADDR"),
            nf.NetflowTemplateFieldV9(fieldType="IPV4_DST_ADDR"),
            nf.NetflowTemplateFieldV9(fieldType="INPUT_SNMP"),
            nf.NetflowTemplateFieldV9(fieldType="OUTPUT_SNMP"),
        ],
        templateID=256,
        fieldCount=9)
    ],
    flowSetID=0
)
netflow_header = nf.NetflowHeader()/nf.NetflowHeaderV9()
recordClass = nf.GetNetflowRecordV9(flowset)

# These values would normally come from whatever you're monitoring ;-)
# As an example, we'll just set them statically for now.
in_bytes = 754
in_pkts = 5
out_bytes = 267
out_pkts = 1

# These are the "SNMP IDs" of each interface. See `ip a` for details
if_ids = {
  'lo': 1,
  'eth0': 2,
  'eth1': 3,
}

# Now construct a Netflow record. Note that the integers have to be 'big endian'
# and packed into byte fields with the same length as defined in the template.
netflow_record = recordClass(
    IN_BYTES=in_bytes.to_bytes(4, 'big'),
    IN_PKTS=in_pkts.to_bytes(4, 'big'),
    OUT_BYTES=out_bytes.to_bytes(4, 'big'),
    OUT_PKTS=out_pkts.to_bytes(4, 'big'),
    PROTOCOL=4,  # IP protocol number (4 = IP-in-IP; 6 would be TCP, 17 UDP)
    IPV4_SRC_ADDR="192.168.23.37",
    IPV4_DST_ADDR="192.168.24.232",
    INPUT_SNMP=if_ids['eth1'].to_bytes(2,'big'),
    OUTPUT_SNMP=if_ids['eth0'].to_bytes(2,'big'),
)

# Now construct an actual Netflow packet. This does not include the IP/UDP
# components
dataFS = nf.NetflowDataflowsetV9(templateID=256, records=[netflow_record])
pkt = netflow_header / flowset / dataFS

# `pkt` is a scapy packet holding the bytes we now need to send to a
# monitoring platform (usually via UDP)

Once the packet has been constructed, it can be sent to a monitoring platform using the Python socket library (the operating system adds the UDP and IP headers for us when it sends the datagram). We won't get into it too deeply here, but something like this will do it:

netflow_server, netflow_port = "192.0.2.10", 2055  # your collector's address and port (example values)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(bytes(pkt), (netflow_server, netflow_port))

Once you've got some basic Netflow running, it's not a big step to start producing Netflow for things that don't normally have it. It gets pretty cool when your webservers suddenly start producing Netflow and your Networks team can 'drill down' into what's going on with the network, and even the applications that run on it.
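As a sketch of what that might look like, here's a hypothetical helper that builds on the flowset, netflow_header, recordClass and if_ids objects from the listing above. An application such as a webserver could call it whenever it wants to report a flow; the function name, parameters and collector address are our own inventions rather than anything provided by scapy:

import socket
import scapy.layers.netflow as nf

def send_flow(sock, collector, src_ip, dst_ip, in_bytes, in_pkts,
              out_bytes, out_pkts, protocol=6, in_if="eth1", out_if="eth0"):
    """Build a single Netflow V9 record and send it to a collector.

    Relies on `flowset`, `netflow_header`, `recordClass` and `if_ids`
    from the earlier listing; `protocol` is the IP protocol number (6 = TCP).
    """
    record = recordClass(
        IN_BYTES=in_bytes.to_bytes(4, 'big'),
        IN_PKTS=in_pkts.to_bytes(4, 'big'),
        OUT_BYTES=out_bytes.to_bytes(4, 'big'),
        OUT_PKTS=out_pkts.to_bytes(4, 'big'),
        PROTOCOL=protocol,
        IPV4_SRC_ADDR=src_ip,
        IPV4_DST_ADDR=dst_ip,
        INPUT_SNMP=if_ids[in_if].to_bytes(2, 'big'),
        OUTPUT_SNMP=if_ids[out_if].to_bytes(2, 'big'),
    )
    data = nf.NetflowDataflowsetV9(templateID=256, records=[record])
    sock.sendto(bytes(netflow_header / flowset / data), collector)

# Example: report a flow this webserver has just served (example values again)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_flow(sock, ("192.0.2.10", 2055),
          src_ip="192.168.23.37", dst_ip="192.168.24.232",
          in_bytes=754, in_pkts=5, out_bytes=267, out_pkts=1)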

Kentik Network Observability

Free trials of the Kentik platform are available; we used one to get our Netflow working, and also to demonstrate a "mesh" of devices and how they talk to each other. We produced something like the picture below (although ours was a lot smaller):

[Image: a ring of devices with network links between them]

As a side-note, we had a great experience with Kentik and their lovely people. We had a fairly atypical use-case, but everyone we spoke to was keen to get us up and working, and for us to get the most out of their platform that we could.

The main thing we learned by using the Kentik platform was how closely the Netflow, the device producing it, and the platform's own view of that device need to line up: the interface IDs in the Netflow have to match the IDs Kentik has discovered for the device (typically via SNMP), otherwise the flows can't be attributed properly.

Our other main learning was just how much information platforms like Kentik are able to surface about the networks they monitor. Even our relatively simple network was very ably exposed by Kentik, so we really could see an awful lot of what was going on there, in a variety of different contexts. That seems like a great return on the relatively small amount of setup required.

We didn't use either, but we note that Kentik has a REST API and there are also Terraform providers for it. That suggests it might be possible to provision servers and whatnot in a cloud environment AND update Kentik with the same information immediately. That would reduce any manual work required and would ensure Kentik stays up to date, even as virtual infrastructure changes.

Conclusions

Netflow is a relatively simple protocol for providing network monitoring and observability tools with metadata about network traffic flowing through devices. It's pretty easy to generate some Netflow, and we've demonstrated this using the scapy Python library.

The likes of the Kentik Network Observability Platform work well with Netflow data, but there is some interconnection between the platform, the Netflow and the physical device generating it that needs to be lined up (the interface IDs in particular). Once that's resolved, Kentik is able to show a great deal of information in different forms, for a relatively small amount of setup.

For any help with Netflow, or getting yourself set up for maximum Observability (with Kentik or indeed anyone else), please contact us - we can help you figure out what you need and make it work for you.