I attended the Netdev 2.1 Conference in Montreal from April 6 to 8. Netdev is a community-driven conference mainly for Linux networking developers and developers whose applications rely on code in the Linux kernel networking subsystem. It focuses very tightly on Linux kernel networking and on how packets are handled through the Linux kernel as they pass between network interfaces and applications running in user space.
In this post, I write about the three-day conference and I offer some commentary on the talks and workshops I attended. I grouped my comments in categories based on my interpretation of each talk’s primary topic. The actual order in which these topics were presented is available in the Netdev 2.1 schedule. The slides from the talks, workshops, and keynotes are posted under each session on the Netdev web site. Videos of the talks are available on the netdevconf YouTube channel.
The Netdev conference is the second part of a two-part conference. The first part was a private, invitation-only meeting called Netconf held in Toronto and the second part is a public conference called Netdev held in Montreal. I attended and presented a talk at Netdev in Montreal and wrote this report about that conference. You will find a detailed report on the Netconf conference held in Toronto at Anarcat’s blog.
Each day at the Netdev conference featured a keynote by a prominent member of the Linux networking community. Two of the keynotes covered higher-level views of Linux in the network in the enterprise, cloud, and the Internet of things. The other keynote covered details of the new eXpress Data Path (XDP) feature in the Linux kernel.
Day 1 Keynote: Linux Networking for the Enterprise
Shrijeet Mukherjee from Cumulus Networks presented a keynote about the state of Linux networking in enterprise networks. Shrijeet offered his view that Linux needs to be everywhere in the network, from the smallest host to the largest server, from the simplest switch to the most capable router. In an all-Linux environment, each network element may perform different functions but all network elements can be managed and operated using the same set of Linux tools.
Shrijeet then discussed new Linux networking projects released over the past three years that are making Linux more relevant to enterprise networks. For me, the two most interesting features were Free Range Routing (FRR) and the Network Command-Line Utility (NCLU).
In April 2017, a consortium of Linux networking companies released the Free Range Routing (FRR) project, a fork of the Quagga routing protocol suite. Shrijeet said that Quagga development has gone stale. I noticed that even the Open Source Routing foundation, previously a strong supporter of Quagga, has moved to support FRR instead. Hopefully, we’ll see some advancements in Linux routing protocol support coming from this project.
Shrijeet presented the Network Command-Line Utility (NCLU), developed by Cumulus Networks to provide a system-level command-line interface that allows all Linux networking systems to be configured from a single configuration file. It is a Python-based daemon that sits on top of all existing userspace Linux tools. I hope Cumulus will release it as open source.
Shrijeet also covered Linux networking tools used in the enterprise ecosystem: ONIE, which allows for remote booting of Linux-powered network switches; ifupdown2, a tool that manages thousands of interfaces in datacenters; ethtool, which reports detailed information about network interfaces; and SwitchDev, which provides a standardized programming interface to various hardware switches.
Shrijeet argued that developers should support the native networking functions provided by the Linux kernel and avoid implementing key network functions on other applications like Open vSwitch. Innovation in the Linux kernel benefits all users, while innovation in particular applications benefits only those who use those applications and complicates the integration of networking in a network that uses multiple networking applications.
Day 2 Keynote: XDP Mythbusters
David S. Miller, the Linux kernel networking maintainer, presented a keynote on eBPF and XDP titled XDP Mythbusters. A video of his keynote is available on the netdevconf YouTube channel. I discuss more of the technical details of eBPF and XDP in the Talks and Workshops section, below. There were many other talks about eBPF and XDP at this conference.
David’s keynote provided an overview of eBPF and XDP. He covered some of the history of the Linux kernel so we could understand why these features were developed and why they are used the way they are. XDP is used to prevent DDoS attacks from overloading the system’s CPU, to perform load balancing, to collect statistics about packets, to perform sophisticated traffic sampling, and even to perform high-frequency trading (although the folks that do that won’t tell the Linux kernel development team exactly how they use XDP).
Then, David debunked some of the myths about XDP. He emphasized that XDP is safe to use even though it allows users to run their own code in the kernel because XDP has a built-in verifier to ensure code is safe. He said that XDP will be as flexible as user-space networking implementations like DPDK — it seems to me that the Linux kernel developers feel a bit of rivalry with alternative networking stacks like DPDK — and he addressed some of the overlap issues between XDP, TC, and netfilter.
David compared XDP to Arduino, the popular open-source hardware platform. Both systems are similar in that developers build a program on some other system, compile it, and then load the resulting bytecode onto the target system — the XDP subsystem in this case — where it runs.
Day 3 Keynote: Linux and the Network
Jesse Brandeburg from Intel presented the Day 3 keynote about Linux and the network. His main theme was “Everything is on the network and the network runs Linux”. He argued that the Linux network stack is part of most networks running today. For example, Android smart-phones use the Linux kernel and, since there are billions of smart-phones, they create a large amount of traffic on the network. Linux also runs on wireless base stations in LTE networks, on edge routers, core routers, and on data center switches and servers.
Jesse pointed out that the Linux kernel is the best resource for implementing secure networks because the Linux kernel is actively supported by a community of individuals and companies who are regularly improving it and making it more secure. He argued that roll-your-own networking code, such as an alternative networking stack implemented in user space, is harder to secure unless you are the best in the world.
He next discussed how the Internet of Things (IoT) will drive innovation in networking to an extreme degree. Types of network endpoints could range from self-driving cars to sensors buried under a road, which have very different networking requirements and constraints, such as location and access to power. The Linux networking stack must innovate to support both the high-bandwidth, low-latency network required by self-driving cars and the very low-bandwidth, high-latency, low-power networking available to sensors embedded in roadways.
Jesse finished his keynote by describing the directions Intel is taking in Linux kernel networking development. Intel is actively promoting Switchdev as a standard because it provides a similar interface for real and virtual hardware, simplifying development. Intel is also interested in hardware offload to support higher performance in Linux networking.
eBPF and XDP
The Netdev 2.1 conference featured a lot of talks, and even a keynote, on eBPF and XDP. Both topics also frequently appeared in other presentations related to Linux networking performance, filtering, and traffic control. Netdev conference presenters discussed eBPF and XDP so frequently that I was thinking the Netdev conference should be renamed the “XDP conference”. XDP will be a major factor in Linux networking in the near future, especially as Linux becomes the standard for networking equipment in datacenter networks supporting services based on NFV and SDN technology.
The Extended Berkeley Packet Filter (eBPF) is a special-purpose virtual machine in the Linux kernel that was originally developed to support applications that could quickly filter packets out of a stream [1]. Over time, it has been adopted as a “universal” tool for loading and running compiled bytecode in the Linux kernel. High-performance networking users want to be able to load BPF programs to do fast packet routing, rewrite packet contents at ingress, encapsulate and decapsulate packets, reassemble large packets, etc. [2]
The eXpress Data Path (XDP) is built around low-level BPF packet processing [3]. XDP is focused primarily on improving Linux networking performance and so is the Linux kernel’s answer to the Data Plane Development Kit (DPDK), a project championed by Intel that provides high-performance networking features in userspace, outside the Linux kernel. XDP is still very new, with a lot of new development coming in the future.
As I mentioned in the Keynotes section above, David S. Miller presented a keynote on eBPF and XDP titled XDP Mythbusters. David covered the history of eBPF and XDP, explained what they are used for, and debunked some of the myths about XDP. To highlight how XDP works, he compared XDP to Arduino, the popular open-source hardware platform. Both systems are similar in that developers build a program on some other system, compile it, and then load the resulting bytecode onto the target system (XDP in this case), where it runs.
To me, the most interesting XDP presentation was a one-hour tutorial showing how to build a simple XDP program to perform simple DDoS blacklisting. In this presentation, XDP for the rest of us, Andy Gospodarek from Broadcom and Jesper Dangaard Brouer from Red Hat shared their own experiences getting started with eBPF and XDP and writing their own XDP application. Their presentation provided an excellent overview of eBPF and XDP and offered links to many resources. I recommend downloading their slides and viewing their presentation on YouTube. Also, see Julia Evans’s blog post summarizing this presentation.
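To give a feel for what a blacklisting program of this kind decides for each packet, here is a minimal user-space sketch in Python. It is not the presenters' code and it is not a real XDP program (those are written in restricted C and compiled to eBPF bytecode). It only models the decision logic under some assumptions: the function names `xdp_blacklist` and `build_frame` are hypothetical, the blacklist is a plain Python set rather than an eBPF map, and only untagged IPv4 frames are inspected.

```python
import ipaddress
import struct

# Action codes mirror the values of enum xdp_action in the Linux UAPI headers.
XDP_DROP = 1
XDP_PASS = 2

ETH_HEADER_LEN = 14        # destination MAC + source MAC + EtherType
ETH_P_IP = 0x0800          # EtherType for IPv4
IPV4_MIN_HEADER_LEN = 20


def xdp_blacklist(frame: bytes, blacklist: set) -> int:
    """Return XDP_DROP for IPv4 frames from a blacklisted source, else XDP_PASS."""
    # Check bounds before touching the packet; real XDP code must do the same
    # or the in-kernel verifier rejects the program.
    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS
    # The IPv4 source address occupies bytes 12-15 of the IP header.
    src = ipaddress.IPv4Address(frame[ETH_HEADER_LEN + 12:ETH_HEADER_LEN + 16])
    return XDP_DROP if src in blacklist else XDP_PASS


def build_frame(src_ip: str, dst_ip: str) -> bytes:
    """Build a minimal Ethernet + IPv4 frame for the demo (checksum left at zero)."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ETH_P_IP)
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, IPV4_MIN_HEADER_LEN, 0, 0, 64, 17, 0,
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
    )
    return eth + ip


if __name__ == "__main__":
    # Addresses come from the documentation ranges and are purely illustrative.
    blocked = {ipaddress.IPv4Address("198.51.100.7")}
    for source in ("198.51.100.7", "203.0.113.9"):
        verdict = xdp_blacklist(build_frame(source, "192.0.2.1"), blocked)
        print(source, "DROP" if verdict == XDP_DROP else "PASS")
```

In a real deployment the same check runs in the driver's receive path, which is why dropped packets cost so little: the decision is made before the rest of the networking stack does any work on them.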
Gilberto Bertin from Cloudflare gave a talk about how they use eBPF and XDP in their DDoS mitigation service to detect and block attacks on their customers. For those who wish to explore eBPF, Cloudflare open-sourced a suite of tools called bpftools.
A team from Facebook presented how they use eBPF and XDP to stop DDoS attacks. Huapeng Zhou, Doug Porter, Ryan Tierney, and Nikita Shirokov presented a generic framework to implement BPF policers that drop packets at line rate, at the earliest stage in the networking stack, before memory is assigned to process any packets.
At some point in these presentations, someone (I cannot remember who) showed a slide that referred to a list of Linux enhanced BPF (eBPF) tracing tools compiled by Brendan Gregg. I thought this list was very useful so I wanted to be sure to reference it in this commentary.
Also, in the Day 3 keynote, Jesse Brandeburg mentioned that a new xdp-newbies mailing list had been created for people new to XDP.
Congestion control in the Linux Kernel
When I speak to Linux networking experts, I get the impression that many of them think of the network as an end-to-end connection between two Linux nodes, with a black box in between. The black box has properties such as bit rate, delay, and bit error rate, and may contain some really annoying things like NAT that need to be accounted for in an application. How the black box works is not relevant to Linux kernel networking developers. Linux networking experts speak of the end-to-end principle and focus on end-to-end topics like TCP and UDP performance, instead of, for example, routing protocols.
Linux networking developers are primarily concerned with how to improve the end-to-end performance of applications utilizing the Linux networking stack. One important function of the Linux kernel networking subsystem that can have a major impact on an application’s performance over the network is congestion control. Linux developers create new congestion control algorithms or improve existing ones.
Also, Linux networking users are similarly interested in congestion control algorithms, but their focus is usually on qualifying the performance of available algorithms for their specific use-cases. So, they are interested in characterizing performance of the existing TCP congestion control mechanisms such as CUBIC and BBR.
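For readers who want to check which congestion control algorithms their own Linux host offers before running this kind of comparison, the following Python sketch may help. It assumes a Linux host and Python 3.6 or later, where `socket.TCP_CONGESTION` is defined. The helper names are my own and are only illustrative.

```python
import socket
from pathlib import Path

# Standard Linux procfs file listing the algorithms the running kernel offers.
AVAILABLE = Path("/proc/sys/net/ipv4/tcp_available_congestion_control")


def available_algorithms() -> list:
    """Return the congestion control algorithms the kernel reports, or [] if unknown."""
    try:
        return AVAILABLE.read_text().split()
    except OSError:
        return []


def socket_algorithm(sock: socket.socket) -> str:
    """Return the name of the congestion control algorithm attached to a TCP socket."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    return raw.split(b"\x00", 1)[0].decode()


def try_set_algorithm(sock: socket.socket, name: str) -> bool:
    """Try to select an algorithm for one socket; return False if the kernel refuses."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())
        return True
    except OSError:
        return False


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("available:", available_algorithms())
        print("default:  ", socket_algorithm(s))
        if try_set_algorithm(s, "bbr"):
            print("now using:", socket_algorithm(s))
        else:
            print("bbr is not available on this host")
```

Selecting the algorithm per socket, rather than changing the system-wide default, makes it possible to compare two algorithms side by side on the same host.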
At the Netdev 2.1 conference, Jae Won Chung from Verizon presented the results of testing his team performed to evaluate various TCP congestion control mechanisms in 4G and 5G wireless networks. They actually drove a car for 100 km along a highway from New Jersey to Massachusetts and made measurements of application performance using TCP congestion control mechanisms such as CUBIC (the default in Linux) and BBR. His presentation offered detailed results from the testing. Jae concluded that bufferbloat inside eNodeBs adds to round-trip time (RTT) and that BBR offers significantly better performance than CUBIC in that environment. Some of the discussion between attendees after Jae’s talk hinted at the challenges facing Linux networking in the mobile environment. As we move to higher-performance cellular systems, cells get smaller, increasing the number of handoffs between base stations, especially when driving on a highway. This creates networking challenges that the Linux networking community needs to address, and it may be that TCP will be completely replaced with something else for devices in mobile networks.
Hajime Tazaki from IIJ discussed a system he created, using the Linux Kernel Library (LKL), to test kernel code by running it in user space to try kernel innovations like TCP improvements without using virtual machines. He asked if this would skew measurements for performance. While testing TCP BBR, Hajime found that there was a big difference in performance when using LKL instead of using the native Linux kernel in a VM. He discussed the problems he found and was able to increase performance of BBR in the LKL. His presentation demonstrated how using the LKL can make development and testing of new kernel innovations easier.
Alexander Krizhanovsky, founder of Tempesta Technologies, presented a new tool to enable faster processing of HTTP traffic to support DDoS response and filtering. Alexander argued that, to better mitigate TLS handshake DDoS attacks, HTTPS processing should be performed in the Linux kernel. This fits in with the arguments made in some of the keynotes, which proposed that networking innovations should be implemented in the kernel, and not in user space.
Network virtualization enabled the modern data center. Virtual machines and containers need to be able to send data to each other and to users so virtual machines need virtual network interfaces.
You have probably seen virtual network interfaces when setting up VMs on your own computer. Virtual machine managers like VirtualBox, VMware, and KVM will allow you to configure different types of virtual network interface cards (NICs) on your virtual machine (such as e1000 or VIRTIO). There are also more types of virtual NICs that support VMs and containers in data centers. The different ways that these virtual NICs support complex new problems like migrating VMs between hosts during live operations is a very interesting topic that I had never thought about before. Additionally, Linux users may apply virtual networking technology to emulate complex networking scenarios.
At the Netdev 2.1 conference, a team from Intel presented the history of Network Virtualization and its future in Software and Hardware. Anjali Singhai Jain, Alexander H Duyck, Parthasarathy Sarangam, and Nrupal Jani offered a very interesting view of the different virtual interfaces (e1000, VIRTIO, etc.) available to virtual machines and containers running on a Linux host. They discussed the current state-of-the-art (SR-IOV) and future projects (VFIO mediated devices and Composable Virtual Functions) that Intel is exploring to further improve the performance and flexibility of networking between VMs and/or containers running on the same host and between VMs and/or containers running on different hosts in a data center.
Stephen Hemminger from Microsoft presented a very informative talk about network device names. The Linux kernel assigns device names when the devices are configured. Many Linux users do not know how the kernel assigns device names. For example, the init system you use (such as systemd, upstart, or init) and the system bus your host computer uses determine the interface names assigned to devices. This is easily seen when working in virtual networks with different hypervisors. For example, when I run Ubuntu 16.04 in a VirtualBox VM, the network interfaces have names like enp0s8, and if I run the same Ubuntu 16.04 image in VMware, network interfaces have names like eth0. The device names are different because these two hypervisors emulate different system buses. I recommend this talk to anyone who manages Linux systems that have more than one interface (like Linux-powered Ethernet switches or routers) and anyone who builds virtual networks using Linux virtual machines.
Alexander Aring from Pengutronix presented his 6LoWPAN mesh virtual network emulator. This was very interesting to me because it is a new network emulation platform that emulates low-powered devices connecting to each other over 6LoWPAN mesh networking. This new emulator will enable researchers to investigate the behavior of Internet of Things devices in a virtual environment. Alexander demonstrated a network emulation in which IoT nodes running RIOT-OS connect to each other via a fake PHY using the FakeLB kernel driver.
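As a small illustration of how a name like enp0s8 encodes the bus location, here is a Python sketch that decodes the two name styles mentioned above. It reflects my understanding of systemd's predictable naming scheme for PCI devices (a two-letter type prefix, then p<bus>s<slot> and an optional f<function>) and it deliberately ignores the other variants, such as onboard-index and MAC-based names. The function name `describe_interface` is hypothetical.

```python
import re

# Two-letter prefixes used by systemd's predictable interface naming.
PREFIXES = {"en": "Ethernet", "wl": "wireless LAN", "ww": "wireless WAN"}

# Matches PCI-location names such as enp0s8 or enp2s0f1.
PCI_NAME = re.compile(r"^(en|wl|ww)p(\d+)s(\d+)(?:f(\d+))?$")

# Matches classic kernel-assigned Ethernet names such as eth0.
KERNEL_NAME = re.compile(r"^eth(\d+)$")


def describe_interface(name: str) -> dict:
    """Classify an interface name as PCI-location based, kernel-assigned, or other."""
    match = PCI_NAME.match(name)
    if match:
        prefix, bus, slot, function = match.groups()
        return {
            "scheme": "pci-location",
            "type": PREFIXES[prefix],
            "bus": int(bus),
            "slot": int(slot),
            "function": int(function) if function is not None else None,
        }
    match = KERNEL_NAME.match(name)
    if match:
        # Classic kernel naming: a driver prefix plus a sequence number.
        return {"scheme": "kernel", "index": int(match.group(1))}
    return {"scheme": "other"}


if __name__ == "__main__":
    for ifname in ("enp0s8", "eth0", "lo"):
        print(ifname, describe_interface(ifname))
```

Read this way, enp0s8 is an Ethernet device on PCI bus 0, slot 8, which is why the name changes when a hypervisor presents the virtual NIC at a different bus location.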
I was given the opportunity to present a talk about investigating network behavior using Linux network emulators. It covered an overview of network emulators I have presented on my blog.
Networking performance improvement
Presenters and attendees at the Netdev 2.1 conference were very concerned with improving the performance of the Linux kernel networking subsystem. In some cases, the topics of virtual network interfaces (in the section above) and networking performance improvement are closely related.
In these talks, the presenters discussed improvements, new Linux kernel features, and new memory access technologies. Some presenters discussed improving network performance by offloading packet processing and forwarding to other hardware in the system, allowing the network to access system memory directly. Other presenters focused on methods to improve the speed at which new network connections may be created or modified. And, others presented experimental results comparing the performance of different networking functions in the Linux kernel.
Eric Dumazet from Google presented a talk about a new method for scheduling networking workloads on the system CPU, called busy polling. Eric presented a very technical deep dive into how the Linux kernel receives a packet from the network interface and sends it to an application. Performance varies according to interrupt mechanism, host scheduling, and other factors. He proposed a way to speed up throughput, reduce latency and jitter, and achieve a better balance between networking performance and overall system performance. Busy polling sockets dedicate one CPU in a multi-core, multi-threaded system to handle network I/O, which reduces interrupts to all CPUs in the system, improving network performance where low latency and low jitter are required.
Willem de Bruijn from Google presented an extension to one of the Linux kernel’s copy avoidance systems. Linux copy avoidance mechanisms improve system efficiency by not copying network packets multiple times between different locations in memory while processing the packet. He showed a performance improvement of up to 90% in some cases.
Alexander Duyck from Red Hat chaired a workshop on network performance in the Linux kernel. The workshop consisted of a series of short talks from different presenters. Alexander presented some tips for efficiently mapping memory in the Linux kernel. Jesper Brouer discussed the impact of memory bottlenecks on networking performance. John Fastabend and Bjorn Topel presented improvements to the AF_PACKET socket. A team from Mellanox presented a method to improve throughput by batching requests to network drivers.
Sowmini Varadhan and Tushar Dave from Oracle presented benchmark test results in relational database management systems (RDBMS). They discussed how improvements in the Linux kernel improve performance in a database system.
Jon Maloy from Ericsson presented a new neighbor monitoring algorithm added to the Linux kernel to support inter-process communication in Linux clusters (Google it). He showed that the new algorithm scales much better than previous methods, which is important in high-performance computing clusters consisting of hundreds of nodes.
Arthur Davis and Tom Distler from NetApp presented a new network configuration daemon for a storage network that will increase the reliability of data center networks. They said that they intend to release this software as open source in the future.
Routing and Switching
Even though much of the Netdev conference is focused on the Linux kernel, a number of talks addressed higher-level topics related to routing and switching in the network. As a non-programmer, I found these topics especially interesting. I appreciated the opportunity to learn about how Linux supports the Internet of Things (IoT), routing in low-powered wireless networks, and network testing.
Andrew Lunn, Florian Fainelli from Broadcom, and Vivien Didelot from Savoir-faire Linux presented a refreshed approach to an older Linux technology, the distributed switch architecture. This feature was added to the Linux kernel about 10 years ago and languished, underused, for years until 2014, when developers found new uses for it and actively started improving it. Now, it is supported by a variety of commercially-available hardware switches and it can be found running on a variety of network equipment running Linux, from home and office routers to switches used in the transport industry. Distributed switch architecture (DSA) allows a CPU to manage a set of hardware switches. It seems DSA is an alternative to Switchdev.
Stefan Schmidt from Samsung chaired a workshop on IoT-related routing protocols. Stefan, Alexander Aring from Pengutronix, and Michael Richardson from Sandelman Software provided an overview of the various data transfer and routing challenges faced by networking developers as they create new applications in the Internet of Things. The main focus was on establishing common standards for IoT networking to improve the current situation, where there are too many vendor-specific solutions. They discussed protocols for routing and data transfer in low-power, lossy networks, such as 6LoWPAN (IPv6 over Low-Power Wireless Personal Area Networks) and RPL (also known as “ripple”), as well as an effort to restart development of Mesh Link Establishment (MLE).
Tom Herbert from Quantonium presented an overview of the issues related to real-time networking in the Internet of Things. He got a round of applause when he started by announcing his presentation was “not about XDP”. He discussed the use-cases for real-time networking in the IoT and pointed out the solutions enabled by, and challenges caused by, this new technology. Examples included using inputs from a combination of sensors and cameras to identify a specific mobile phone user in a crowded public space, and providing real-time commands to fast-moving autonomous vehicles to avoid collisions. He also addressed the issues of security and spoofing in the IoT. This was a very interesting talk. I recommend viewing the video to get the full impact of the presentation.
Joe Stringer from VMware presented a talk about how Open vSwitch is implemented in the Linux kernel. He pointed out that Open vSwitch has a user space controller and a kernel-based flow switch. Other controllers can interact with the kernel-based switch. He gave Weaveworks and MidoNet as examples. He covered Linux commands that interact with the Open vSwitch in the kernel, such as ovs-dpctl and conntrack-tools. He also covered Open vSwitch kernel improvements such as conntrack and packet recirculation.
Lawrence Brakmo from Facebook presented a new tool for testing networks, the NEtwork TESting TOolkit (Netesto). It is a set of tools that run on hosts in a network and collect and display network performance statistics. Lawrence provided an example of using Netesto to evaluate the performance of TCP congestion control algorithms. Facebook has released the code as an open-source project. It looks like this would be an interesting application to evaluate in a network emulator.
Filtering and traffic control
Netfilter and TC have been integrated with the Linux kernel for a long time. Both offer a lot of functionality that most Linux users do not know about. The Netdev 2.1 conference offered sessions covering the technical details of filtering and traffic control. In addition, presenters discussed the new nf_tables framework, which is intended to replace the ip_tables firewall in Linux.
Jamal Hadi Salim from Mojatatu chaired a traffic control workshop covering netfilter, tc offload to hardware, performance issues, new features, and testing. Unfortunately, I had to skip this workshop to get some other business done so I can’t say much about it. The conference organizers have posted a video of the traffic control workshop. The participants were Jiri Pirko, Eran Dahan, Rony Efraim, Kiran Patil, Roman Mashak, Lior Narkis, Madalin-Cristian Bucur, and Lucas Bates.
Florian Westphal presented a discussion of tools that support the conntrack feature in netfilter.
Pablo Neira Ayuso, maintainer of the netfilter project, chaired a netfilter workshop. He presented an in-depth overview of netfilter and nf_tables. Florian Westphal provided an overview of packet steering using nf_tables.
Arkadi Sharshevsky from Mellanox presented some new debugging functions to support troubleshooting and argued for a vendor-neutral approach to hardware abstraction.
The Netdev 2.1 conference was a very positive experience for me even though I am not a developer. I was, at first, a bit intimidated by the list of very technical topics offered at the conference. But, even though the last C code I wrote was over 20 years ago, I found that almost all of the talks and workshops offered at Netdev 2.1, even the ones focused on development topics that delved deep into the Linux kernel, provided me with something useful to take away.
I realized that a lot of the work I’ve been doing on open-source network emulators and simulators is not so relevant to the kernel. Linux network emulators may, depending on how they are implemented, use different features of the kernel.
This conference inspired me to consider some next steps in my research (to be prioritized along with everything else). Some points I will consider for future investigation are:
- Evaluate whether XDP can be emulated in a VM or in a container.
- Create a network emulation using only Linux commands, without using user space programs like network emulators.
- Evaluate how each of the network emulators I write about relates to the Linux kernel networking subsystem. Highlight which ones are more appropriate for testing Linux kernel innovations like XDP or filtering, and which ones are better for user space innovations like routing protocols.
From: Prototype Kernel web site, April 2017 http://prototype-kernel.readthedocs.io/en/latest/bpf/index.html#introduction