Intelligent NFV performance with OpenContrail
The most common value for performance comparison between solutions is bandwidth, which shows how much capacity a network connection has for supporting data transfer, as measured in bits per second. In this domain, the OpenContrail vRouter can reach near line speed (about 90%, in fact). However, performance also depends on other factors, such as latency, or packets-per-second (pps), which are as important as bandwidth. Packets per second rate is a key factor for VNF (firewalls, routers, etc.) instances running on top of NFV clouds. In this article, we'll compare PPS rate for different OpenContrail setups so you can decide what will work best for your specific use case.
The simplest way to test PPS rate is to run a VM to VM test. We will provide a short overview of OpenContrail low-level techniques for NFV infrastructure, and perform a comparative analysis of different approaches using simple PPS benchmarking. To make testing fair, we will use only a 10GbE physical interface, and will limit resource consumption for data plane acceleration technologies, making the environment identical for all approaches.
OpenContrail vRouter modesFor different use cases, Mirantis supports several ways of running the OpenContrail vRouter as part of Mirantis Cloud Platform 1.0 (MCP). Let's look at each of them before we go ahead and take measurements.
Kernel vRouterOpenContrail has a module called vRouter that performs data forwarding in the kernel. The vRouter module is an alternative to Linux bridge or Open vSwitch (OVS) in the kernel, and one of its functionalities is encapsulating packets sent to the overlay network and decapsulating packets received from the overlay network. A simplified schematic of VM to VM connectivity for 2 compute nodes can be found in Figure 1:
Figure 1: A simplified schematic of VM to VM connectivity for 2 compute nodes
The problem with a kernel module is that packets-per-second is limited by various factors, such as memory copies, the number of VM exits, and the overhead of processing interrupts. Therefore vRouter can be integrated with the Intel DPDK to optimize PPS performance.
DPDK vRouterIntel DPDK is an open source set of libraries and drivers that perform fast packet processing by enabling drivers to obtain direct control of the NIC address space and map packets directly into an application. The polling model of NIC drivers helps to avoid the overhead of interrupts from the NIC. To integrate with DPDK, the vRouter can now run in a user process instead of a kernel module. This process links with the DPDK libraries and communicates with the vrouter host agent, which runs as a separate process. The schematic for a simplified overview of vRouter-DPDK based nodes is shown in Figure 2:
Figure 2: The schematic for a simplified overview of vRouter-DPDK based nodes
vRouter-DPDK uses user-space packet processing and CPU affinity to dedicate poll mode drivers being served by a particular CPU. This approach enables packets to be processed in user-space during the complete life time - from physical NIC to vhost-user port.
Netronome Agilio SolutionSoftware and hardware components distributed by Netronome provide an OpenContrail-based platform to perform high-speed packet processing. It’s a scalable, easy to operate solution that includes all server-side networking features, such as overlay networking based on MPLS over UDP/GRE and VXLAN. The Agilio SmartNIC solution supports DPDK, SR-IOV and Express Virtio (XVIO) for data plane acceleration while running the OpenContrail control plane. Wide integration with OpenStack enables you to run VMs with Virtio devices or SR-IOV Passthrough vNICs, as in Figure 3:
Figure 3: OpenContrail network schematic based on Netronome Agilio SmartNICs and software
A key feature of the Netronome Agilio solution is deep integration with OpenContrail and offloading of lookups and actions for vRouter tables.
Compute nodes based on Agilio SmartNICs and software can work in an OpenStack cluster based on OpenContrail without changes to orchestration. That means it’s scale-independent and can be plugged into existing OpenContrail environments with zero downtime.
Mirantis Cloud Platform can be used as an easy and fast delivery tool to set up Netronome Agilio-based compute nodes and provide orchestration and analysis of the cluster environment. Using Agilio and MCP, it is easily to setup a high-performance cluster with a ready-to-use NFV infrastructure.
Testing scenarioTo make the test fair and clear, we will use an OpenStack cluster with two compute nodes. Each node will have a 10GbE NIC for the tenant network.
As we mentioned before, the simplest way to test the PPS rate is to run a VM to VM test. Each VM will have 2 Virtio interfaces to receive and transmit packets, 4 vCPU cores, 4096 MB of RAM and will run Pktgen-DPDK inside to generate and receive a high rate of traffic. For each VM a single Virtio interface will be used for generation, and another interface will be used for receiving incoming traffic from the other VM.
To make an analytic comparison of all technologies, we will not use more than 2 cores for the data plane acceleration engines. The results of the RX PPS rate for all VMs will be considered as a result for the VM to VM test.
First of all, we will try to measure kernel vRouter VM to VM performance. Nodes will be connected with Intel 82599 NICs. The following results were achieved for a UDP traffic performance test:
As you can see, the kernel vRouter is not suitable for providing a high packet per second rate, mostly because the interrupt-based model can’t handle a high rate of packets per second. With 64 byte packets we can only achieve 3% of line rate.
For the DPDK-based vRouter, we achieved the following results:
Based on these results, the DPDK based solution is better at handling high-rated traffic based on small UDP packets.
Lastly, we tested the Netronome Agilio SmartNIC-based compute nodes:
With only 2 forwarder cores, we are able to achieve line-rate speed on Netronome Agilio CX 10GbE SmartNICs on all size of packets.
You can also see a demonstration of the Netronome Agilio Solution here.
Since we have achieved line-rate speed on the 10GbE interface using Netronome Agilio SmartNICs we wanted to have the maximum possible PPS rate based on 2 CPUs. To determine the maximum performance result for this deployment, we will upgrade existing nodes with Netronome Agilio CX 40GbE SmartNIC and repeat the maximum PPS scenario one more time. We will use direct wire connection between 40GbE ports and will set up 64-bytes UDP traffic. Even with hard resources limitations, we achieved:
|Rate||Packet size, Bytes|
|Netronome Agilio Agilio CX 40GbE SmartNIC||19.9 Mpps||64|
What we learnedTaking all of the results together, we can see a pattern:
Based on 64 byte UDP traffic, we can also see where each solution stands compared to 10GbE line rate:
|Rate||% of line rate|
|Netronome Agilio||14.9 Mpps||100|
|vRouter DPDK||4.0 Mpps||26|
|Kernel vRouter||0.56 Mpps||3|
- The Kernel vRouter, based on interrupt model packet processing, works, but does not satisfy the high PPS rate requirement.
- The DPDK-based vRouter significantly improves the PPS rate, but due to high resource consumption and because of defined limitations, it can’t achieve the required performance. We also can assume that using a modern DPDK library will improve performance and optimise resource consumption.
- The Netronome Agilio SmartNIC solution significantly improves OpenContrail SDN performance, focusing on saving host resources and providing a stable high-performance infrastructure.