Carrier-Grade Mirantis OpenStack (the Mirantis NFV Initiative), Part 1: Single Root I/O Virtualization (SR-IOV)

The Mirantis NFV initiative aims to create an NFV ecosystem for OpenStack, with validated hardware at the bottom, hardened OpenStack with an optimized configuration as the platform in the middle, and validated VNFs and other NFV software and application components at the top. As the pure-play OpenStack company, we know that OpenStack is the best way to create an NFV infrastructure (NFVi), but we also know that our NFV clients (both telcos and enterprises) need more than just the OpenStack platform. They need a complete NFVi solution that answers the whole stack of architectural challenges presented by NFV (compute, networking, storage, availability, scale and performance) and that reliably provides the network functions, orchestration and management functionality carriers need.

To provide this solution, Mirantis is integrating and optimizing OpenStack itself, and working with an ever-growing number of partners. In this article, we’ll talk about one important innovation that will help turn OpenStack into NFVi: Single Root I/O Virtualization, or SR-IOV.

SR-IOV is a PCI Special Interest Group (PCI-SIG) specification for virtualizing PCIe devices such as network interfaces. It represents each physical device as a fully configurable entity (a PF, or Physical Function) and creates multiple virtual interfaces (VFs, or Virtual Functions) with limited configurability on top of it. This requires support from the system BIOS and, conventionally, from the host OS or hypervisor. Among other benefits, SR-IOV makes it possible to run a very large number of network-traffic-handling VMs per compute node without increasing the number of physical NICs or ports, and it pushes the related processing down into the hardware layer, off-loading the hypervisor and significantly improving both throughput and the determinism of network performance. That’s why it’s an NFV must-have.

We first talked about SR-IOV at the OpenStack Summit in Vancouver, in a session with an unofficial title that might as well have been “Run, Forrest, run!” because the main idea of SR-IOV is to get data to VMs more quickly. Now, we’re going to look at actually using SR-IOV with Mirantis OpenStack.

SR-IOV can be complicated. For example, on Intel NICs the PF cannot support promiscuous mode when SR-IOV is enabled, so L2 bridging cannot be used on that interface. Because of this, you shouldn’t enable SR-IOV on interfaces that have standard Fuel networks assigned to them. (You may use SR-IOV on an interface that carries only the Fuel Private network if you use nova host aggregates and different flavors for normal and SR-IOV-enabled instances, as shown in the section “Using SR-IOV”.)

You should note that SR-IOV has a couple of limitations in the Kilo release of OpenStack. Most notably, migrating an instance that has SR-IOV ports attached is not supported. Also, iptables-based filtering is not usable with SR-IOV NICs, because SR-IOV bypasses the normal network stack, so security groups cannot be used with SR-IOV enabled ports (though you still can use security groups for normal ports).

So now that we know what we’re talking about, let’s look at how to enable and use SR-IOV. While you can use Fuel to deploy a Mirantis OpenStack cloud that includes all of the pieces for SR-IOV, SR-IOV itself still needs to be configured separately.

Enabling SR-IOV

To enable SR-IOV, you need to configure it on both compute and controller nodes. Let’s start with the compute nodes.

Configure SR-IOV on Compute nodes

To enable SR-IOV, perform the following steps only on Compute nodes that will be used for running instances with SR-IOV virtual NICs:

  1. Ensure that your compute nodes are capable of PCI passthrough and SR-IOV. Your hardware must provide VT-d and SR-IOV capabilities, and these extensions may need to be enabled in the BIOS. VT-d options are usually configured in the “Chipset Configuration/North Bridge/IIO configuration” section of the BIOS, while SR-IOV support is configured in “PCIe/PCI/PnP Configuration.”
    If your system supports VT-d, you should see messages related to DMAR in the dmesg output:

    # grep -i dmar /var/log/dmesg
    [    0.000000] ACPI: DMAR 0000000079d31860 000140 (v01 ALASKA   A M I  00000001 INTL 20091013)
    [    0.061993] dmar: Host address width 46
    [    0.061996] dmar: DRHD base: 0x000000fbffc000 flags: 0x0
    [    0.062004] dmar: IOMMU 0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020de
    [    0.062007] dmar: DRHD base: 0x000000c7ffc000 flags: 0x1
    [    0.062012] dmar: IOMMU 1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020de
    [    0.062014] dmar: RMRR base: 0x0000007bc94000 end: 0x0000007bca2fff

    This is just an example, of course; your output may differ.

    If your system supports SR-IOV, you should see an SR-IOV capability section for each NIC PF, and Total VFs should be non-zero:

    # lspci -vvv | grep -i "initial vf"

    Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
    Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 01
    Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
    Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 01

  2. Check that VT-d is enabled in the kernel using this command:
    # grep -i "iommu.*enabled" /var/log/dmesg

    If you don’t see a response similar to:

    [0.000000] Intel-IOMMU: enabled

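    then it’s not yet enabled. Enable it by adding intel_iommu=on to the GRUB_CMDLINE_LINUX line in /etc/default/grub. For example (the other parameters, especially the root UUID, will differ on your system):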

    GRUB_CMDLINE_LINUX=" console=ttyS0,9600 console=tty0 net.ifnames=0 biosdevname=0 rootdelay=90 nomodeset root=UUID=d2b06335-bf6d-44b8-a0a4-a54224bdc7f8 intel_iommu=on"

    Next, update grub and reboot for the change to take effect:

    # update-grub
    # reboot

    and repeat the check. For new environments, you may want to add this kernel parameter before deployment so that it is applied to all nodes of the environment. You can do that from the Fuel interface, in the “Kernel Parameters” section of the “Settings” tab.

    NOTE: If you have an AMD motherboard, check for ‘AMD-Vi’ in the dmesg output instead, and pass the options “iommu=pt iommu=1” to the kernel (we haven’t tested this yet).
  3. Enable the number of virtual functions required on the SR-IOV interface.

    NOTE: Do not set the number of VFs higher than required, since this might degrade performance. Depending on the kernel and NIC driver version, you might get more queues on each PF with fewer VFs (usually, fewer than 32).

    First, enable the interface:
    # ip link set eth1 up

    Next, from the command line, get the maximum number of virtual functions that could potentially be enabled for your NIC:

    # cat /sys/class/net/eth1/device/sriov_totalvfs

    Then finally, enable the desired number of virtual functions for your NIC:

    # echo 30 > /sys/class/net/eth1/device/sriov_numvfs

    NOTE: To change the number to some other value afterwards, you need to execute the following command first:

    # echo 0 > /sys/class/net/eth1/device/sriov_numvfs

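    NOTE: These settings aren’t saved across reboots. To make them persistent, add them to /etc/rc.local. On Ubuntu, the default /etc/rc.local ends with “exit 0”, and lines appended after it never run, so insert the commands before that line:

    # sed -i '/^exit 0/i ip link set eth1 up' /etc/rc.local
    # sed -i '/^exit 0/i echo 30 > /sys/class/net/eth1/device/sriov_numvfs' /etc/rc.local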

    NOTE: There is a bug (#1501738) in libnl3 3.2.21-1 (which ships with Ubuntu 14.04) that results in the error message ‘missing IFLA_VF_INFO in netlink response’ in the nova log when you try to create an instance with an SR-IOV port if the number of virtual functions is greater than 30.

    If you plan to use more than 30 functions, install a newer version of libnl3, like so:

    # wget https://launchpad.net/ubuntu/+archive/primary/+files/libnl-3-200_3.2.24-2_amd64.deb
    # wget https://launchpad.net/ubuntu/+archive/primary/+files/libnl-genl-3-200_3.2.24-2_amd64.deb
    # wget https://launchpad.net/ubuntu/+archive/primary/+files/libnl-route-3-200_3.2.24-2_amd64.deb
    # dpkg -i libnl-3-200_3.2.24-2_amd64.deb
    # dpkg -i libnl-genl-3-200_3.2.24-2_amd64.deb
    # dpkg -i libnl-route-3-200_3.2.24-2_amd64.deb

    Then restart libvirtd:

    # service libvirtd restart
  4. Check to make sure that SR-IOV is enabled:
    # ip link show eth1 |grep vf
         vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
         vf 1 MAC c2:cd:57:9b:6c:7d, spoof checking on, link-state auto
    ...

    If you don’t see ‘link-state auto’ in the output, your installation will require the SR-IOV agent. You can install it and start it in the background like so:

    # apt-get install neutron-plugin-sriov-agent
    # nohup neutron-sriov-nic-agent --debug --log-file /tmp/sriov_agent --config-file \
    /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini &
  5. Edit /etc/nova/nova.conf and add the following line to the [DEFAULT] section:
    pci_passthrough_whitelist={"devname": "eth1", "physical_network":"physnet2"}
  6. Edit /etc/neutron/plugins/ml2/ml2_conf_sriov.ini:
    [sriov_nic]
    physical_device_mappings = physnet2:eth1
  7. Restart the compute service:
    # restart nova-compute
  8. Get the VF’s vendor and product IDs; you’ll need them to configure SR-IOV on the controller nodes.
    NOTE: This is just an example of the output. Actual value may differ on your hardware.
    # lspci -nn|grep -e "Ethernet.*Virtual"
    06:10.1 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
    06:10.3 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
    ...

    Write down the vendor:product ID pair (the value in square brackets, 8086:10ed in this example).
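If you want to script the checks from steps 1, 2 and 4 across many compute nodes, the parsing can be sketched in Python. This is a minimal, hypothetical helper (the function names are ours, not part of Mirantis OpenStack or any OpenStack tool). It assumes you have already captured the text output of the dmesg, lspci -vvv and ip link show commands used above, and it applies the same rules this article describes, including treating a missing ‘link-state auto’ as a sign that the SR-IOV agent is needed.

```python
import re
from typing import Dict, List, Optional


def iommu_enabled(dmesg_text: str) -> bool:
    """Same test as `grep -i "iommu.*enabled"` from step 2 (Intel VT-d)."""
    return any(re.search(r"iommu.*enabled", line, re.IGNORECASE)
               for line in dmesg_text.splitlines())


def total_vfs(lspci_text: str) -> List[int]:
    """'Total VFs' for each SR-IOV capable PF, from `lspci -vvv` output (step 1)."""
    return [int(n) for n in re.findall(r"Total VFs:\s*(\d+)", lspci_text)]


def vf_link_states(ip_link_text: str) -> Dict[int, Optional[str]]:
    """VF index -> link-state value (None if absent), from `ip link show` (step 4)."""
    states: Dict[int, Optional[str]] = {}
    for line in ip_link_text.splitlines():
        vf = re.match(r"\s*vf\s+(\d+)\b", line)
        if vf is None:
            continue  # PF header, link/ether line, etc.
        link = re.search(r"link-state\s+(\w+)", line)
        states[int(vf.group(1))] = link.group(1) if link else None
    return states


def preflight(dmesg_text: str, lspci_text: str, ip_link_text: str) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if not iommu_enabled(dmesg_text):
        problems.append("IOMMU not enabled: add intel_iommu=on to the kernel command line")
    if not any(n > 0 for n in total_vfs(lspci_text)):
        problems.append("no PF reports a non-zero 'Total VFs'")
    states = vf_link_states(ip_link_text)
    if not states:
        problems.append("no VFs listed: check sriov_numvfs on the interface")
    elif any(state != "auto" for state in states.values()):
        problems.append("'link-state auto' missing: the SR-IOV agent is required")
    return problems
```

On a node you might feed it, for example, the stdout of subprocess.run(["ip", "link", "show", "eth1"], capture_output=True, text=True). Keeping the parsing separate from command execution means it can be checked against the sample outputs shown in this article before it touches real hosts.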
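The sysfs handling in step 3, including the rule that a non-zero VF count must be reset to 0 before it can be changed, can be sketched the same way. set_num_vfs() below is a hypothetical helper that assumes the standard sysfs layout used above (/sys/class/net/<iface>/device/sriov_totalvfs and sriov_numvfs) and root privileges. It does not bring the interface up or make the setting persistent, so the ip link and /etc/rc.local steps are still needed; the 30-VF warning reflects the libnl3 3.2.21 bug noted in step 3.

```python
import warnings
from pathlib import Path

# Per the libnl3 3.2.21 note in step 3, more than 30 VFs can break instance creation.
LIBNL_SAFE_MAX_VFS = 30


def set_num_vfs(iface: str, count: int, sysfs_net: str = "/sys/class/net") -> int:
    """Set the VF count for `iface` via sysfs, mirroring the echo commands in step 3."""
    dev = Path(sysfs_net) / iface / "device"
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= count <= total:
        raise ValueError(f"{iface}: requested {count} VFs, device supports 0..{total}")
    if count > LIBNL_SAFE_MAX_VFS:
        warnings.warn(f"{count} VFs may trigger the libnl3 3.2.21 "
                      "'missing IFLA_VF_INFO' error; upgrade libnl3 first")
    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current == count:
        return current  # already configured, nothing to write
    if current != 0 and count != 0:
        # The kernel refuses to change a non-zero VF count directly,
        # so reset it to 0 first (same as `echo 0 > .../sriov_numvfs`).
        numvfs.write_text("0\n")
    numvfs.write_text(f"{count}\n")
    return count
```

Reading sriov_totalvfs first mirrors the cat command in step 3 and turns an out-of-range request into a clear Python error instead of a failed write. Passing a different sysfs_net makes it possible to exercise the logic against a directory of plain files.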

Configure SR-IOV on the Controller nodes

  1. Edit /etc/neutron/plugins/ml2/ml2_conf.ini. First, change the mechanism_drivers line to:

    mechanism_drivers = openvswitch,l2population,sriovnicswitch

    Then add a new section at the end of the file, using the vendor:product ID from step 8 of the compute node configuration as the value for supported_pci_vendor_devs:

    [ml2_sriov]
    supported_pci_vendor_devs = 8086:10ed
  2. Add PciPassthroughFilter and AggregateInstanceExtraSpecsFilter to the list of scheduler filters in /etc/nova/nova.conf, keeping the whole value on a single line:
    [DEFAULT]
    scheduler_default_filters=DifferentHostFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter,AggregateInstanceExtraSpecsFilter
  3. Restart services:
    # restart neutron-server
    # restart nova-scheduler
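The controller-side edits can also be generated rather than typed by hand. The sketch below, again with hypothetical function names, pulls the vendor:product IDs out of the lspci -nn output from the compute nodes, renders the [ml2_sriov] section, and appends the two filters from step 2 to an existing scheduler_default_filters value without duplicating them. It assumes the VF lines look like the Intel 82599 example above, with the vendor:product pair as the last bracketed ID on each line.

```python
import re
from typing import Iterable, List

# The two filters this article adds to nova's scheduler on the controllers.
REQUIRED_FILTERS = ("PciPassthroughFilter", "AggregateInstanceExtraSpecsFilter")


def vf_vendor_devs(lspci_nn_text: str) -> List[str]:
    """Unique vendor:product IDs of VFs in `lspci -nn` output, first-seen order."""
    pairs: List[str] = []
    for line in lspci_nn_text.splitlines():
        if not re.search(r"Ethernet.*Virtual", line):  # same filter as the grep
            continue
        # "[0200]" (the class code) has no colon, so only vendor:product matches.
        ids = re.findall(r"\[([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\]", line)
        if ids and ids[-1].lower() not in pairs:
            pairs.append(ids[-1].lower())
    return pairs


def ml2_sriov_section(pairs: Iterable[str]) -> str:
    """Render the [ml2_sriov] section for ml2_conf.ini."""
    return "[ml2_sriov]\nsupported_pci_vendor_devs = " + ",".join(pairs)


def with_required_filters(current: str) -> str:
    """Append the SR-IOV filters to a comma-separated filter list, one line."""
    filters = [f.strip() for f in current.split(",") if f.strip()]
    filters += [f for f in REQUIRED_FILTERS if f not in filters]
    return "scheduler_default_filters=" + ",".join(filters)
```

Because with_required_filters() skips filters that are already present, running it twice against the same value yields the same line.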

Using SR-IOV

Now you’re ready to actually use SR-IOV.

  1. A recommended practice for using SR-IOV is to create a separate host aggregate for SR-IOV-enabled compute nodes:
    # nova aggregate-create sriov
    # nova aggregate-set-metadata sriov sriov=true
    # nova aggregate-create normal
    # nova aggregate-set-metadata normal sriov=false

    … and add some hosts to them:

    # nova aggregate-add-host sriov node-9.domain.tld
    # nova aggregate-add-host normal node-10.domain.tld
  2. Create a new flavor for VMs that require SR-IOV support:
    # nova flavor-create m1.small.sriov auto 2048 20 2
    # nova flavor-key m1.small.sriov set aggregate_instance_extra_specs:sriov=true

    You should also update all other flavors so that their instances start only on hosts without SR-IOV support:

    # openstack flavor list -f csv|grep -v sriov|cut -f1 -d,| tail -n +2| \
    xargs -I% -n 1 nova flavor-key % \
    set aggregate_instance_extra_specs:sriov=false
  3. To use an SR-IOV port, you need to create an instance with ports that use the “direct” vnic type. For now, you have to do this via the command line. Because the default Cirros image does not include the Intel NIC drivers, we’ll use an Ubuntu cloud image to test SR-IOV.
  4. Prepare the Ubuntu cloud image:

    # glance image-create --name trusty --disk-format qcow2 --container-format bare \
    --is-public True \
    --location https://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img

    You can only log in to this instance using an SSH public key, so let’s go ahead and create a keypair. You can do this from the Horizon interface, but we’ll do it from the command line, like so:

    # nova keypair-add key1 > key1.pem
    # chmod 600 key1.pem
  5. Create a port for the instance:
    # neutron port-create net04 --binding:vnic-type direct --device_owner nova-compute --name sriov-port1
  6. Spawn the instance:
    # port_id=`neutron port-list | grep sriov-port1 | awk '{print $2}'`
    # nova boot --flavor m1.small.sriov --image trusty --key_name key1 \
                --nic port-id=$port_id sriov-vm1
  7. Get the instance’s IP address:
    # nova list | grep sriov-vm1 | awk '{print $12}'
    net04=192.168.111.5
  8. Connect to the instance to check that everything is up and running.
    First, find the controllers hosting the DHCP namespace for net04, which has access to the instance:

    # neutron dhcp-agent-list-hosting-net -f csv -c host net04 --quote none | tail -n+2
    node-7.domain.tld
    node-9.domain.tld

    Connect to the instance (this command should be run on one of the controllers which we found in previous step):

    # ip netns exec `ip netns show|grep qdhcp-$(neutron net-list | grep 'net04 ' | awk '{print$2}')` ssh -i key1.pem ubuntu@192.168.111.5

    And that should be it!
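The flavor-update pipeline in step 2 uses cut, which breaks if any field contains a comma, and relies on xargs to strip the CSV quotes. A hedged Python equivalent parses the CSV properly. It assumes that openstack flavor list -f csv emits a header row with “ID” and “Name” columns; unlike grep -v, it excludes flavors by name only. The helper names are illustrative, and the code only builds the nova flavor-key argument lists so you can review them before running each one with subprocess.run.

```python
import csv
import io
from typing import List

# Extra spec that keeps ordinary flavors off the SR-IOV host aggregate.
EXTRA_SPEC = "aggregate_instance_extra_specs:sriov=false"


def non_sriov_flavor_ids(flavor_csv: str, marker: str = "sriov") -> List[str]:
    """IDs of flavors whose name does not contain `marker` (header row skipped)."""
    reader = csv.DictReader(io.StringIO(flavor_csv))
    return [row["ID"] for row in reader if marker not in row["Name"]]


def flavor_key_commands(flavor_ids: List[str]) -> List[List[str]]:
    """argv lists for `nova flavor-key <id> set ...`, ready for subprocess.run."""
    return [["nova", "flavor-key", fid, "set", EXTRA_SPEC] for fid in flavor_ids]
```

Building argv lists instead of shell strings avoids quoting problems with flavor IDs and makes a dry run trivial: print the lists first, then execute them.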

Troubleshooting

Sometimes something goes wrong. Here are some common problems and solutions.

  • If you see errors in /var/log/nova/nova-compute.log on the compute host:
    libvirtError: internal error: missing IFLA_VF_INFO in netlink response

    … you should install a newer version of libnl3, as shown above.

  • If you see:
    libvirtError: unsupported configuration: host doesn't support passthrough of host PCI devices

    … in /var/log/nova/nova-compute.log, it means that VT-d is not supported or not enabled.

  • If you see:
    NovaException: Unexpected vif_type=binding_failed

    You should enable the SR-IOV agent, or if you’ve already done so, check that it’s running:

    # neutron agent-list | grep sriov-nic-agent
    | dfa4edcf-63c1-4af7-a291-ec139a16f346 | NIC Switch agent | node-16.domain.tld | :-) | True | neutron-sriov-nic-agent |

    If it is running, examine the log file /tmp/sriov_agent for clues to what else might be wrong.
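When many compute nodes are involved, the three symptoms above can be matched against nova-compute.log automatically. The sketch below is a hypothetical helper that searches only for the exact error strings quoted in this section and returns the corresponding remedy; it does not cover any other failure mode.

```python
from typing import List, Tuple

# (substring seen in /var/log/nova/nova-compute.log, remedy from this article)
KNOWN_ERRORS: List[Tuple[str, str]] = [
    ("missing IFLA_VF_INFO in netlink response",
     "install a newer libnl3 (3.2.24) and restart libvirtd"),
    ("host doesn't support passthrough of host PCI devices",
     "VT-d is not supported or not enabled: check the BIOS and intel_iommu=on"),
    ("Unexpected vif_type=binding_failed",
     "start the SR-IOV NIC agent and confirm it in 'neutron agent-list'"),
]


def diagnose(log_text: str) -> List[str]:
    """Return the remedy for each known SR-IOV error found in the log text."""
    return [remedy for needle, remedy in KNOWN_ERRORS if needle in log_text]
```

For example, diagnose(Path("/var/log/nova/nova-compute.log").read_text()) on a compute host would list the remedies that apply, or an empty list if none of these errors appear.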

Conclusion

For now, configuring Mirantis OpenStack for SR-IOV is still relatively complex, which makes it potentially challenging to do on large clusters and prone to error. During the Mitaka cycle, we’ll be making improvements to current configurations, doing deeper testing, and working on automating configuration and deployment of SR-IOV via Fuel.
