I see a lot of people talking about eBPF and XDP (eXpress Data Path) but I never had the chance to play with it. So, I decided to write a simple XDP program as a week-end project. This program should be able to filter network packets for a given source IP address. In this article, I will show you how this program works, how to compiles and runs it.

eBPF and XDP

I don’t know (yet :D) every details about these technologies, so don’t hesitate to send me an email if you find an issue in this article.

eBPF is a Linux kernel functionality which allows to write programs that will be compiled in eBPF bytecode. This bytecode is then verified (some common errors like using a potential null value are detected by the compiler), and executed in a virtual machine which runs inside the Kernel.

eBPF can be used to write monitoring tools (by attaching the program to kernel events like syscall for example). The advantage of eBPF is its low impact on performances of the instrumented system.

eBPF also allows to interact with the network with XDP (for example, this technology can be used to write a load balancer or a firewall). Performances of XDP programs are also good, because it is executed close to the hardware.

Here are some links about eBPF and XDP:

These articles are nice, but it was very hard to me to understand what’s going on here. That’s why I decided to practice, and to write an XDP program. In future articles, I will continue to explore eBPF and XDP.

Installation

The easiest way to compile a BFP program is to compile it directly in the Linux kernel source tree.

We should start by cloning the Kernel repository with git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git.

I also had to install in my machine (Debian) some packages (maybe you will have other packages to install in your own computer): apt-get install bison clang flex libelf-dev llvm.

Next, go to the kernel source tree root, and run make headers_install then make menuconfig (for this command, I used the default configuration).

You should now be able to compile the eBPF programs included in the Kernel with make samples/bpf/ (the / at the end of the path is mandatory).

Project setup

As previously said, my goal is to write a program which will filter all packets coming from a given IP address on the localhost interface. My program name will be xdp_ip_filter.

     Makefile

First, we should add in the samples/bpf/Makefile file instructions to compile our program. You will see in this file a lot of statements that begin with hostprogs-y. You should add the line hostprogs-y += xdp_ip_filter.

In the same way, you should add the line xdp_ip_filter-objs := bpf_load.o xdp_ip_filter_user.o where statement that begin with xdp_ are, then you should add always += xdp_ip_filter_kern.o.

The Makefile is now ready.

     Project files

We will work in two files: samples/bpf/xdp_ip_filter_kern.c and samples/bpf/xdp_ip_filter_user.c. The kern file will contain the code which will be compiled in BPF bytecode. The user file will be the entrypoint of our program (to start it). I will use the kern and user names to speak about these files.

The code for these files is available in two places:

Disclaimer: I’m far from a C expert, so my code is probably ugly (but it’s not a big deal for this exercise ¯_(ツ)_/¯).
I also advise you to read this article while having the two files open in your favorite text editor.

xdp_ip_filter_kern.c

After the headers declaration, we have a first macro:

#define bpf_printk(fmt, ...)                    \
({                              \
           char ____fmt[] = fmt;                \
           bpf_trace_printk(____fmt, sizeof(____fmt),   \
                ##__VA_ARGS__);         \
})

This macro will be used as a logger. How it works is not important.

     Maps

We have then a more interesting part:

struct bpf_map_def SEC("maps") ip_map = {
	.type        = BPF_MAP_TYPE_HASH,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(__u32),
	.max_entries = 1,
};

struct bpf_map_def SEC("maps") counter_map = {
	.type        = BPF_MAP_TYPE_PERCPU_ARRAY,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(__u64),
	.max_entries = 1,
};

We define here two maps. These maps are key/value associations, and these maps will be used by the BPF program to interact with the outside world (our user file in our case). The user program will be able to read and write in these maps, same thing for the kern program. You can see these maps like shared memory between these two programs, and it’s to my knowledge the only way to communicate between these two programs.

The first map ip_map is of type BPF_MAP_TYPE_HASH (it’s a basic key/value map). The keys and values are of type u32 (indeed, an IPv4 address can be represented as an integer). This map can only contain one entry (cf max_entries).
This map will be used by the user program to pass to the kern program the IP address which will be filtered (here, we only want to filter one address, that’s why the map has only one entry).

The next map named counter_map is of type BPF_MAP_TYPE_PERCPU_ARRAY. This type indicates that we will have one instance of the map per CPU core (if you have 8 cores, you will have 8 instances of the map). These maps will be used to count how many packets are filtered per core. The ARRAY type indicates that the map key should be between 0 and max_entries -1 (so in our case, we will only have one entry). In conclusion, we will have for each core a map whose the value for the key 0 will be the number of packets filtered by this core.

maps xdm et abpf

     The code

Get the filtered IP address

Here, we have a function which takes a xdp_md struct as a parameter. This struct contains the network packet on which we will interact.

SEC("xdp_ip_filter")
int _xdp_ip_filter(struct xdp_md *ctx) {
  // key of the maps
  u32 key = 0;
  // the ip to filter
  u32 *ip;

  bpf_printk("starting xdp ip filter\n");

  // get the ip to filter from the ip_filtered map
  ip = bpf_map_lookup_elem(&ip_map, &key);
  if (!ip){
    return XDP_PASS;
  }
  bpf_printk("the ip address to filter is %u\n", ip);

The first thing to do is to retreve the IP address we want to filter in the ip_map map. To do that, we call bpf_map_lookup_elem function with the ip_map and the 0 key as parameters (remember, our map has only one element: the key 0). Like said before, the IP returned by bpf_map_lookup_elem is an u32 in little endian (for example 192.168.1.78 ⇒ 0xC0A8014E in hexadecimal ⇒ read backward ⇒ 0x4E0180C0 ⇒ 1308721344 in base 10).
You can also see how I use bpf_printk as a logger.

lookup map ebpf

Get the packet source IP

Now, we want to retrieve the source IP address of the packet.

  void *data_end = (void *)(long)ctx->data_end;
  void *data     = (void *)(long)ctx->data;
  struct ethhdr *eth = data;

  // check packet size
  if (eth + 1 > data_end) {
    return XDP_PASS;
  }

  // check if the packet is an IP packet
  if(ntohs(eth->h_proto) != ETH_P_IP) {
    return XDP_PASS;
  }

  // get the source address of the packet
  struct iphdr *iph = data + sizeof(struct ethhdr);
  if (iph + 1 > data_end) {
    return XDP_PASS;
  }
  u32 ip_src = iph->saddr;
  bpf_printk("source ip address is %u\n", ip_src);

We start by getting the data from the ctx variable with ctx→data and a pointer on the end of the packet with (void *)(long)ctx→data_end. Then, we create a new variable of type ethhdr (representing an ethernet frame) which contains the data.

We should now check if eth + 1 is not higher than data_end. This check is mandatory (without it, the program refuses to compile). If the size is higher, we do nothing (we return the XDP_PASS constant and so do not filter the packet).

We then check if the packet is a IP packet using if(ntohs(eth→h_proto) != ETH_P_IP). If the packet is not an IP packet, we are not interested by it, so we return XDP_PASS again.

Then, we create a new struct of type iphdr from the ethernet frame. We do again a check on data_end (mandatory), and then we get the packet source IP with iph→saddr.

Filter the packet

We now have the source IP of the packet. We will compare it with the IP address we read in the map at the beginning of the program:

  // drop the packet if the ip source address is equal to ip
  if (ip_src == *ip) {
    u64 *filtered_count;
    u64 *counter;
    counter = bpf_map_lookup_elem(&counter_map, &key);
    if (counter) {
      *counter += 1;
    }
    return XDP_DROP;
  }
  return XDP_PASS;
}

Here, we compare ip_src with ip. If the packet should we filtered, we increment in the counter_map map the number of filtered packet (by using the 0 key again) with the bpf_map_lookup_elem (this function returns a pointer, and we increment its value) and we filter the packet by returning XDP_DROP. Otherwise, we return XDP_PASS.

And that’s it for the kern part !

xdp_ip_filter_user.c

     The code

This file starts like the other one by the inclusion of headers files, followed by:

static int ifindex = 1; // localhost interface ifindex
static __u32 xdp_flags = 0;

// unlink the xdp program and exit
static void int_exit(int sig) {
  printf("stopping\n");
  bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
  exit(0);
}

We define here a ifindex variable which is the index of the localhost interface (I will explain this later), then a xdp_flags variable.

The int_exit function will be used to stop the kern program on a signal by calling bpf_set_link_xdp_fd

The main function, get the IP address

Here is the main function of our program, which will be executed to start our BPF program:

int main(int argc, char **argv) {
  const char *optstr = "i:";
  char *filename="xdp_ip_filter_kern.o";
  char *ip_param = "127.0.0.1";
  int opt;
  // maps key
  __u32 key = 0;

  while ((opt = getopt(argc, argv, optstr)) != -1) {
    switch(opt)
      {
      case 'i':
        ip_param=optarg;
      break;
    }
  }

  // convert the ip string to __u32
  struct sockaddr_in sa_param;
  inet_pton(AF_INET, ip_param, &(sa_param.sin_addr));
  __u32 ip = sa_param.sin_addr.s_addr;
  printf("the ip to filter is %s/%u\n", ip_param, ip);

We define some variables like the expected parameters for the main function, the name of the .o file for the kern program (xdp_ip_filter_kern.o) which will have to be loaded, and a default value for the IP to filter (127.0.0.1).

We retrieve the IP to filter (which we be passed to the program with the -i option) and we convert it in u32 (for exemple "192.168.1.78" ⇒ 0xC0A8014E ⇒ read backward ⇒ 0x4E0180C0 ⇒ 1308721344 in base 10).

Limits update

In a lot of eBPF program the system limits are increased. I did the same thing in mine:

// change limits
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
  if (setrlimit(RLIMIT_MEMLOCK, &r)) {
    perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
    return 1;
  }

Loading the eBPF program

// load the bpf kern file
  if (load_bpf_file(filename)) {
    printf("error %s", bpf_log_buf);
    return 1;
  }

  if (!prog_fd[0]) {
    printf("load_bpf_file: %s\n", strerror(errno));
    return 1;
  }

  // add sig handlers
  signal(SIGINT, int_exit);
  signal(SIGTERM, int_exit);

We load the xdp_ip_filter_kern.o file (which contains our compiled kern program), and we add the int_exit handler on the SIGINT and SIGTERM signals.

Adding the IP to filter in the map

We now have to add the IP address we want to filter in the ip_map map. Remember, we already used this map in the xdp_ip_filter_kern.c file:

  // set the first element of the first map to the ip passed as a parameter
  int result = bpf_map_update_elem(map_fd[0], &key, &ip, BPF_ANY);
  if (result != 0) {
    fprintf(stderr, "bpf_map_update_elem error %d %s \n", errno, strerror(errno));
    return 1;
  }

We update the map with the bpf_map_update_elem function. map_fd[0] returns the first map defined in the kern file, which is our ip_map map (the order of the map declarations is important !). The map now contains for the key 0 the IP address to filter (therefore, the kern program will be able to read it, as showed before).

ebpf update map

Attach the XDP program to a network interface

In the int_exit introduced before, we called bpf_set_link_xdp_fd to stop the XDP program. This function used the ifindex variable. Actually, a XDP program is attached to a network interface (and in int_exit, we detached the program from the interface).

We should attach in our main function the XDP program we loaded to a network interface. The program will filter the packets for this interface only:

// link the xdp program to the interface
  if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
    printf("link set xdp fd failed\n");
    return 1;
  }

Here, we attach our program to the localhost interface.

Gather the statistics

Now, our XDP program is started and filters packets. We want to know how many packets has been filtered, by retrieving for each CPU core the value in the counter_map. Remember, this map is updated by our kern program.

  int i, j;

  // get the number of cpus
  unsigned int nr_cpus = bpf_num_possible_cpus();
  __u64 values[nr_cpus];

  // "infinite" loop
  for (i=0; i< 1000; i++) {
    // get the values of the second map into values.
    assert(bpf_map_lookup_elem(map_fd[1], &key, values) == 0);
    printf("%d\n", i);
    for (j=0; j < nr_cpus; j++) {
      printf("cpu %d, value = %llu\n", j, values[j]);
    }
    printf("\n\n");
    sleep(2);
  }

The counter_map count the filtered packets per core (the map type is BPF_MAP_TYPE_PERCPU_ARRAY). We retrieve the number of core we have with bpf_num_possible_cpus, then we create 2 for loop:

  • One which will periodically (every 2 seconds) retrieve the values from the map. bpf_map_lookup_elem is called on the number two map (map_fd[1], which is our counter_map), and we use the key 0. The values for each core are stored in the values array.

  • One which will iterate on the values array and print the value for each core.

ebpf update map

Here, we see that bpf_map_lookup_elem retrieve for every "counter_map" map for each core the value for the key 0, and store it in an array named values, where the index of the array is the core number.

End of the program

At the end of the program, we detach the XDP program from the localhost interface.

  printf("end\n");
  // unlink the xdp program
  bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
  return 0;

The code is over, we can now compile and test our program !

Test the program

You should use make samples/bpf/ to compile your program. You can now test it. For example, let’s filter all packets coming from the IP address 192.168.1.78:

cd samples/bpf/
sudo ./xdp_ip_filter -i "192.168.1.78"

The output should be:

the ip to filter is 192.168.1.78/1308731584
0
cpu 0, value = 0
cpu 1, value = 0
cpu 2, value = 0
cpu 3, value = 0
cpu 4, value = 0
cpu 5, value = 0
cpu 6, value = 0
cpu 7, value = 0
cpu 8, value = 0
cpu 9, value = 0
cpu 10, value = 0
cpu 11, value = 0
cpu 12, value = 0
cpu 13, value = 0
cpu 14, value = 0
cpu 15, value = 0

You can verify that your XDP program is attached to the localhost interface by calling ip link list. A line that starts with prog/xdp should be added on the interface:

ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    prog/xdp id 69 tag 1ddc7360e5987edf

Moreover, you can detach the XDP program from the interface at any time with the ip link set dev lo xdp off command.

Now, let’s test if our program works. I will use scapy to craft some network packets. Install it (using pip or your package manager for example). Then, as root, open a python interpreter with the python command and send some ICMP packets to localhost with the IP 192.168.1.78 defined as a source:

from scapy.all import  *
conf.L3socket=L3RawSocket
sr1(IP(src="192.168.1.78", dst="127.0.0.1")/ICMP())

The response will never arrive, because the packet has been filtered by our program ! Let’s check the output of your program:

cpu 0, value = 0
cpu 1, value = 0
cpu 2, value = 0
cpu 3, value = 0
cpu 4, value = 0
cpu 5, value = 0
cpu 6, value = 0
cpu 7, value = 0
cpu 8, value = 0
cpu 9, value = 1
cpu 10, value = 0
cpu 11, value = 0
cpu 12, value = 0
cpu 13, value = 0
cpu 14, value = 0
cpu 15, value = 0

here, my core number 9 filtered the packet. Try again, and the counters will be updated again !

You can also check the logs of the kern program (the output of bpf_printk) by reading the /sys/kernel/debug/tracing/trace.

Conclusion

I learned a lot about eBPF and XDP by writing this program. These are very interesting technologies, but no easy to use (especially for someone like me who does not have kernel development experiences). Some projects like bcc or bpftrace seems easier to use, but writing some C code is also a good learning exercise.

This will not be my last article on this topic, so stay tuned !

Top of page