
OpenStack Networking for Scalability and Multi-tenancy with VlanManager

In a previous post I explained the basic modes of network operation for OpenStack, namely FlatManager and its extension, FlatDHCPManager. In this post, I’ll talk about VlanManager. While the flat managers are designed for simple, small-scale deployments, VlanManager is a good choice for large-scale internal clouds and public clouds. As its name implies, VlanManager relies on the use of vlans (“virtual LANs”). The purpose of vlans is to partition a physical network into distinct broadcast domains (so that host groups belonging to different vlans can’t see each other). VlanManager tries to address two main flaws of the flat managers, those being:

  • lack of scalability (flat managers rely on a single L2 broadcast domain across the whole OpenStack installation)
  • lack of proper tenant isolation (single IP pool to be shared among all the tenants)

In this post I will focus on VlanManager using multi-host network mode in OpenStack. Outside the sandbox, this is considered safer than single-host mode, as multi-host does not suffer from the SPOF created by running a single nova-network daemon for an entire OpenStack cluster. However, using VlanManager in single-host mode is in fact possible. (More about multi-host vs single-host mode can be found here.)

Difference between “flat” managers and VlanManager

With flat managers, the typical administrator’s workflow for networking is as follows:

  • Create one large fixed IP network (typically with a 16-bit netmask or less) to be shared by all tenants; an example command is sketched after this list:

  • Create the tenants
  • Once tenants spawn their instances, all of them are assigned whatever is free in the shared IP pool.
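A minimal sketch of what such a command might look like with Essex-era nova-manage (the label and address range here are just examples):

    nova-manage network create --label=shared_net \
        --fixed_range_v4=10.0.0.0/16 --num_networks=1 --network_size=65536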

So typically, this is how IPs are allocated to instances in this mode: tenant_1’s and tenant_2’s instances each simply receive whatever addresses happen to be free in the shared pool.
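For illustration only (instance names and exact addresses are hypothetical), the two tenants’ instance listings might end up looking like this:

    tenant_1:   vm_1  10.0.0.2
                vm_2  10.0.0.3

    tenant_2:   vm_1  10.0.0.4
                vm_2  10.0.0.5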


We can see tenant_1 and tenant_2 instances have landed on the same IP network, 10.0.0.0.

With VlanManager, the admin workflow changes:

  • Create a new tenant and note its tenantID
  • Create a dedicated fixed ip network for the new tenant; an example command is sketched after this list:

  • Upon spawning, the tenant’s instances will automatically be assigned IPs from the tenant’s private IP pool.
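A sketch of what the network-creation command might look like with Essex-era nova-manage (the label and address range are illustrative; the --vlan and --project_id flags are the ones discussed below):

    nova-manage network create --label=tenant1_net \
        --fixed_range_v4=10.0.2.0/24 --num_networks=1 --network_size=256 \
        --bridge_interface=eth0 --vlan=102 --project_id=<tenantID>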

So, compared to FlatDHCPManager, we additionally define two things for the network:

  • Associate the network with a given tenant (--project_id=<tenantID>). This way, no one other than that tenant can take IPs from it.
  • Give this network a separate vlan (--vlan=102).

From now on, once a tenant spawns a new vm, it will automatically get an address from the tenant’s dedicated pool. It will also be put on a dedicated vlan, which OpenStack will automatically create and maintain. So if we create two different networks for two tenants, the situation will look like this:
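Again, purely for illustration (instance names are hypothetical; 10.0.2.0/24 is the range used for tenant t1 later in this post, while 10.0.3.0/24 is an assumed range for the second tenant):

    tenant_1:   vm_1  10.0.2.2
                vm_2  10.0.2.5

    tenant_2:   vm_1  10.0.3.2
                vm_2  10.0.3.3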


It can be clearly seen that tenants’ instances have landed on different IP pools. But how are vlans supported?

How VlanManager configures networking

VlanManager does three things here:

  • Creates a dedicated bridge for the tenant’s network on the compute node.
  • Creates a vlan interface on top of compute node’s physical network interface eth0.
  • Runs and configures a dnsmasq process attached to the bridge so that the tenant’s instances can boot from it.

Let’s suppose that the tenant named “t1” spawns its instance t1_vm_1. It lands on one of the compute nodes. This is how the network layout looks:

We can see that a dedicated bridge named “br102” has been created, along with a vlan interface “vlan102”. Also, a dnsmasq process has been spawned and is listening on address 10.0.2.1. Once instance t1_vm_1 boots up, it receives its address from dnsmasq based on a static lease (please see this previous post for details on how dnsmasq is managed by OpenStack).
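Roughly speaking, what nova-network sets up here corresponds to something like the following manual steps (a simplified sketch only; the real code also writes dnsmasq’s static hosts file and the iptables/ebtables rules, which are omitted):

    # 802.1q interface on top of eth0, named vlan102, plus a bridge for the tenant
    ip link add link eth0 name vlan102 type vlan id 102
    ip link set vlan102 up
    brctl addbr br102
    brctl addif br102 vlan102
    ip addr add 10.0.2.1/24 dev br102
    ip link set br102 up
    # dnsmasq listening on the bridge; nova additionally passes a static hosts
    # file (--dhcp-hostsfile) so each instance gets its pre-assigned fixed IP
    dnsmasq --bind-interfaces --interface=br102 \
        --dhcp-range=10.0.2.2,10.0.2.254,255.255.255.0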

Now, let’s assume now that tenant “t1” spawns another instance named t1_vm_2, and it happens to land on the same compute node as the instance previously created:

Both instances end up being attached to the same bridge, since they belong to the same tenant, and thus they are placed on the same dedicated tenant’s network. They also get their DHCP configuration from the same dnsmasq server.

Now let’s say that tenant “t2” spawns his first instance. It also lands on the same compute node as tenant “t1”. Also, for his network, a dedicated bridge, vlan interface and dnsmasq are configured:

So, depending on the number of tenants, it is a perfectly normal situation to have quite a large number of network bridges and dnsmasq processes, all running on a single compute node.

There’s nothing wrong with this, however - OpenStack will manage all of them automatically. Unlike the case of using flat managers, here both tenants’ instances reside on different bridges which are not connected to each other. This ensures traffic separation on L2 level. In case of tenant “t1”, the ARP broadcasts sent over br102 and then through to vlan102 are not visible on br103 and vlan103, and vice versa.

 

Support for tenant networks across multiple compute nodes

So far, we’ve talked about how this plays out on a single compute node. Most likely, you’ll probably use a lot more than one compute node. Usually we want to have as many of them as possible. Then, likely, tenant “t1″ instances will be scattered among many compute nodes. This means that his dedicated network must also be spanned across many compute nodes. Still it will need to meet two requirements:

  • t1’s instances residing on different physical compute nodes must be able to communicate
  • t1’s network, spanning multiple compute nodes, must be isolated from other tenants’ networks

Typically, compute nodes are connected to a network switch by a single cable. We want multiple tenants to share this link in a way that they don’t see one another’s traffic.

There is a technology that addresses this requirement: vlan tagging. Technically, it extends each Ethernet frame with a 12-bit field called the VID (Vlan ID), which carries the vlan number. Frames bearing an identical vlan tag belong to a single L2 broadcast domain; thus devices whose traffic is tagged with the same Vlan ID can communicate.

It should be obvious, then, that one can isolate tenants’ networks by tagging them with different Vlan IDs.
How does this work in practice? Let us look at the above diagrams.

Traffic for tenant “t1” leaves the compute node via “vlan102”. Vlan102 is a virtual interface connected to eth0. Its sole purpose is to tag frames with vlan number 102, using the 802.1q protocol.

Traffic for tenant “t2” leaves the compute node via “vlan103”, which is tagged with vlan tag 103. By bearing different vlan tags, “t1’s” traffic will in no way interfere with “t2’s” traffic.

They are unaware of each other, even though they both use the same physical interface eth0 and, afterwards, the switch ports and backplane.

Next, we need to tell the switch to pass tagged traffic over its ports. This is done by putting a given switch port into “trunk” mode (as opposed to “access” mode, which is the default). In simple terms, a trunk port allows a switch to pass VLAN-tagged frames; more information on vlan trunks can be found in this article. At this point, configuring the switch is the duty of the system administrator; OpenStack will not do this automatically. Also, not all switches support vlan trunking, so it’s something you need to check before procuring the switch you’ll use.
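Purely as an illustration, on a Cisco IOS-style switch the port facing a compute node might be configured roughly like this (the port number, description, and allowed vlan list are examples):

    interface GigabitEthernet0/1
     description trunk to compute node (eth0)
     switchport trunk encapsulation dot1q
     switchport mode trunk
     switchport trunk allowed vlan 102,103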

Also, if you happen to use devstack + VirtualBox to experiment with VlanManager in a virtual environment, make sure you choose “PCnet-FAST III” as the adapter to connect your vlan network.

Having done this, we come to this model of communication:

 

The thick black line from compute nodes to the switch is a physical link (cable). On top of the same cable, vlan traffic tagged by both 102 and 103 is carried (red and green dashed lines). There is no interference in traffic (the two lines never cross).

So how does the traffic look when tenant “t1” wants to send a ping from 10.0.2.2 to 10.0.2.5?

  • The packet goes from 10.0.2.2 to the bridge br102 and then up to vlan102, where the tag 102 is applied.
  • It passes through the switch, which carries vlan-tagged frames. Once it reaches the second compute node, its vlan tag is examined.
  • Based on this tag, the compute node decides to put it onto the vlan102 interface.
  • Vlan102 strips the Vlan ID field off the packet so that it can reach the instances (instances don’t have tagged interfaces).
  • Then it travels down through br102 to finally reach 10.0.2.5.
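One way to observe this in practice (a sketch; interface names as in the diagrams above) is to capture the tagged traffic on the physical interface and the already-untagged traffic on the tenant’s bridge:

    # tagged ICMP frames for vlan 102 leaving/entering the compute node via eth0
    tcpdump -e -n -i eth0 vlan 102 and icmp
    # the same traffic as the instances see it, untagged, on the tenant bridge
    tcpdump -n -i br102 icmp and host 10.0.2.5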

 

Configuring VlanManager

To configure VlanManager networking in OpenStack, put the following lines into your nova.conf file:
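A minimal sketch of the relevant Essex-era flags (values are illustrative, and multi_host=True corresponds to the multi-host mode discussed above):

    # use VlanManager as the network manager
    --network_manager=nova.network.manager.VlanManager
    # physical interface on top of which per-tenant vlan interfaces are created
    --vlan_interface=eth0
    # first vlan number nova may allocate automatically
    --vlan_start=100
    # umbrella range from which per-tenant fixed networks are carved out
    --fixed_range=10.0.0.0/8
    --network_size=256
    # run a nova-network instance on every compute node (multi-host mode)
    --multi_host=True

Depending on your deployment, nova.conf may use the older flag-style syntax shown here or the newer format with the same option names minus the leading dashes.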

 

Conclusion

VlanManager is by all means the most sophisticated networking model OpenStack offers today: it addresses both L2 scalability and inter-tenant isolation.

Nevertheless, it still has its limitations. For example, it ties IP pools (L3) to vlans (L2); remember, each tenant’s network is identified by a pair of IP pool plus vlan. So it is not possible for two different tenants to use the same IP addressing scheme independently in different L2 domains.

Also, the vlan tag field is only 12 bits long, which tops out at 4096 vlans. That means you can have no more than 4096 potential tenants, which is not that many at cloud scale.

These limitations are expected to be addressed by emerging technologies such as Quantum, the new network manager for OpenStack, and by software-defined networking.

 

In the next post in this continuing series, I will give an explanation of how floating ips work.


20 Responses

  1. leseb

    Hi,

    Nice article.
    Btw nova-manage network create will fail without the --label parameter.

    Do you know if the current driver supports VXLAN?

    Cheers!

    August 7, 2012 14:55
    • Piotr Siwczak

      leseb,
      Thank you for pointing this out; --label is definitely necessary ;-)

      As for your question on vxlan support:
      definitely none of the current network managers in Essex (FlatDHCP, Flat, Vlan) supports it.

      Since vxlan is Cisco’s technology, they might consider writing an appropriate plugin for the Quantum network manager, which is supposed to be the default in the Folsom release.
      (check out this post: http://blogs.cisco.com/openatcisco/integrating-vxlan-in-openstack-quantum/).
      Nevertheless, I still cannot find any vxlan-related code in Quantum yet.

      August 8, 2012 23:58
  2. Hendrik

    Great overview of the VlanManager. I have a few questions and comments regarding your comparison:

    You say VlanManager is a good choice for large scale deployments, yet you list the limitation of 4096 VLANs, which translates to the same number of tenants. For a public cloud that might not be enough. You even state this fact, contradicting your claim in the first paragraph. As far as I know that’s one of the reasons HP went with DHCPFlat for their public cloud.

    You note that DHCPFlat needs a single L2 broadcast domain across the OpenStack installation. That is true when “installation” means zone (or whatever the current name is). As VLANs are an L2 concept, it is unclear to me how using VLANs changes this requirement. Could you clarify how this would be set up?

    With DHCPFlat on the other hand you could either use Zones, or extend the implementation in a way that models Amazon’s approach, which basically is: L2 inside the rack with L3 between racks. This would need changes to the scheduler/networking so that the correct IPs would be assigned per Rack – or 1 Zone = 1 Rack. But it’s doable and should be scalable without (L2 networking) limit.

    You write that DHCPFlat lacks proper tenant isolation. While it is true that there’s only one global IP pool, the tenants are isolated using ebtables and iptables. In the end, you trust the OpenStack code to set up the isolation using ebtables/iptables with DHCPFlat. With VlanManager you trust it to set up the different networking interfaces with VLANs correctly.

    How does Quantum address all these issues? My understanding is that Quantum itself is only an API that says nothing about the underlying implementation. I could end up using Quantum+DHCPFlat or Quantum+VLANs… I haven’t looked at Quantum too much though, so I could be wrong. Looking forward to the Quantum article.

    August 7, 2012 22:19
    • Piotr Siwczak

      Hi Hendrik.

      Yes, 4096 vlans definitely is not enough when it comes to large scale deployments. My thought on scalability here was rather that it is still better to have 4096 L2 broadcast domains than the single one you have by default with FlatDHCP.
      Openstack still gives you flexibility in how to use these four thousand broadcast domains: you can map them to tenants 1 to 1, or treat them as multiple public networks just to reduce L2 broadcasts. I don’t know much about how HP implemented their networking, but my guess is that they might not have left much of the original openstack code (this statement is based on my personal expertise at Mirantis; customizing openstack for different clients is 80% of our work).

      As for zones, they will be replaced by “cells” in Folsom (check this: http://comstud.com/FolsomCells.pdf)
      Also, in openstack there are two distinct concepts which are often mixed up: “availability zone” and “zone” (both exist, but in fact mean different things).
      The “Rackspace Availability Zones White Paper” gives a fine explanation of this. The document states that in the Essex release, zone support is work in progress. As for networking vs cells: cells are meant to be independent from each other, so I guess they will be isolated at L3 (I still need to look into it to confirm). Status for now: you really seem to have no functioning “zones” in Essex.

      Your point on L3 routing between server racks is reasonable, and in fact I have already seen such a deployment. In this post, however, I talk about a standard openstack setup without code modifications.

      Agreed about tenant isolation. You trust whatever openstack sets up for you. However, I’ve seen a couple of security holes opened in iptables by careless sysadmins (either by applying or accidentally deleting some rules in the course of debugging openstack problems). The primary reason for this is that openstack does not monitor firewall rules at all; they are re-applied only once nova-network is restarted.

      Quantum:
      You are right: Quantum is an abstraction layer between openstack and different network devices. On one side Quantum listens to openstack, and on the other, it has plugins, e.g. for linux bridge, openvswitch, cisco UCS blades, openflow controllers (like Ryu), etc.
      So depending on the underlying plugins, you can have a model based on ordinary vlans/bridges, or on openflow, or “whatever you like”. If you want to get rid of the VLAN limitation, a project worth looking at may be Nicira NVP, which eliminates this problem. Nicira NVP also functions as a Quantum plugin.

      Take care, and many thanks for an inspiring comment.

      August 9, 2012 10:49
  3. Ben Grissinger

    Hello Piotr, more of a planning/design question for you.

    Our network has several private VLAN segments (172.16.9.x, 172.16.10.x …). I would like to assign specific subsets of these floating IP ranges, using tenancy (project IP assignments via Horizon), to our internal project customers for their public-facing server IPs. I was also planning to use another private range (10.x.x.x) that gets assigned to each VM on startup. I was then planning to create all the VLAN bridges per node for each 172 subnet we have, using eth0 (as per your diagrams above). Eth1 is used for the mgmt network. The eth0 switch ports are set up to support VLAN tagging and trunking so as to be able to accept and route all traffic. Do the eth0 ports per node still need to be set to promiscuous? And should I be doing this differently?

    Does this scenario

    August 14, 2012 06:50
    • Piotr Siwczak

      Ben,
      If I understand correctly, network 172.16.0.0/16 is your floating ip range (the “public” addresses), while 10.0.0.0/16 is private/fixed. In this case, you do not create bridges for the networks from 172.16.0.0, as their assignment to VMs is done by NAT-ting. Also, for now there does not seem to be any way to enforce floating ip range assignments to specific tenants. Please check out my later post on how floating ips work: http://www.mirantis.com/blog/configuring-floating-ip-addresses-networking-openstack-public-private-clouds/

      August 20, 2012 02:02
  4. Realde

    Is it possible to configure the routing so that some tenants can talk to each other and others cannot? The typical use case would be to have a virtual DMZ and a backend network that you want to be able to communicate, but with a firewall in the middle. Can the Cloud Controller work as such?

    August 15, 2012 08:10
    • Piotr Siwczak

      Hi,
      If you introduce an upstream router and point to it as the gateway for instances, then yes; but the router is independent from openstack and will be managed independently.

      The basic way to ensure instance isolation is security groups.

      August 20, 2012 02:05
  5. padnala balaji

    Hi,

    Do we need a physical switch between two compute nodes, as shown in the above diagram, so that the VMs of the same tenant can talk to each other?

    Can’t we avoid it if we use OVS+Quantum on the compute nodes?

    please share your comments

    September 6, 2012 05:12
    • Piotr Siwczak

      Padnala,

      Even if you use OVS+Quantum, you still need some wire that connects all your cluster nodes. So the switch is necessary here.

      September 6, 2012 05:19
  6. ritesh

    Hi Piotr,
    Regarding the switch connecting the two compute nodes: if I create vlan 102 on that switch and assign it to a port, would a laptop connected to that port be able to communicate with VMs on the compute nodes in that same vlan?

    October 4, 2012 06:06
  7. Shlomi

    Piotr, keep these great articles coming!

    Regarding the 802.1Q 4096 VLAN limit, I guess 802.1QinQ can save the day here:
    http://en.wikipedia.org/wiki/IEEE_802.1ad

    October 8, 2012 15:00
    • Piotr Siwczak

      Hi,
      I am happy you appreciate our blogging efforts ;-)
      Good point, but rather than relying on 802.1ad, the bias seems to be towards technologies like VXLAN, OpenFlow, or GRE tunnels.

      October 9, 2012 07:35
  8. Sam Stoelinga

    When using VlanManager and multiple compute nodes, do you also need to specify multi_node=True? According to this blog post it seems not?

    October 18, 2012 02:50
    • Piotr Siwczak

      Hi,
      I haven’t come across this option anywhere in the nova code. Certainly, the one you must set in the case of multi-host networking is:
      multi_host=True.

      October 19, 2012 00:19
  9. klepamar

    fantastic article :)
    spent a couple of hours wondering how vlan manager mode works before accessing this blog.

    January 27, 2013 11:12
  10. Ghassen

    Hi, thanks for the post
    it’s very interesting !!
    I need some more details about vlan mode configuration.
    What should the interface configuration be on the cloud controller and on the compute nodes?
    What should the nova.conf configuration be? (network and volumes sections)
    Is it possible to have an example of these files?

    February 12, 2013 03:13
    • Piotr Siwczak

      Hi,

      While I do not have the exact config file handy, I can explain the network layout.

      If you go for the multi_host networking mode (multi_host=True), then you need to equip the compute node with 2 network interfaces:

      1. Private: this is where the bridges will be attached and where fixed IP traffic will go.
      2. Public: this is where instances will reach public networks (e.g. google) and where floating IPs will reside.

      March 12, 2013 01:20
  11. ehung

    Hello Piotr:
    Thanks for sharing your concept of vlan networking.
    We have a little problem with the vlan networking configuration.
    We deployed two nodes (one control node and one compute node) with the flat DHCP networking type, and it worked well.
    Then we changed the networking type to vlan in nova.conf (--network_manager=nova.network.manager.VlanManager) and created a nova network for the vlan (nova-manage network create private --multi_host=F --fixed_range_v4=10.0.1.0/28 --vlan=1 --project_id=9310969cf76f483cad7a44ab67a58de0 --num_networks=1 --network_size=16 --bridge_interface=eth1).
    When we create a vm, we get the error message below.
    =====================================
    :
    udhcpc (v1.17.2) started
    Sending discover…
    Sending discover…
    Sending discover…
    No lease, forking to background
    starting DHCP for Ethernet interface eth0 [ OK ]
    cloud-setup: checking http://169.254.169.254/2009-04-04/meta-data/instance-id
    wget: can’t connect to remote host (169.254.169.254): Network is unreachable
    cloud-setup: failed 1/30: up 9.82. request failed
    :
    ========================================
    Could you give us some advice?

    Thanks in advance.
    ehung

    July 23, 2013 20:10

Continuing the Discussion

  1. Deploying a small OpenStack cloud cluster on CentOS 6 | ArRnorets' blog

    [...] Accordingly, on Server 1 and Server 2 the eth1 interfaces and the br100 bridges need to be configured properly. That is, in our example eth1 must be part of the br100 bridge, and the IP address for br100 must be obtained from the 10.0.1.0/24 network. In general, there is a good description of how nova-network works here: http://www.mirantis.com/blog/openstack-networking-vlanmanager/ [...]

    October 10, 2012 19:12
