Understanding VlanManager Network Flows in OpenStack Cloud: Six Scenarios

on August 14, 2012
In a couple of recent posts I covered some fundamental concepts of OpenStack networking, including VlanManager and floating IPs. This post builds on that material and shows how traffic flows in different scenarios handled by OpenStack. Reading the prior posts (VlanManager, Floating IPs) is highly recommended before going through this one. Also note that all the scenarios presented below assume multi-host networking mode. Single-host networking is omitted, as it is a single point of failure (SPOF) and should be avoided in production deployments.
Demands regarding network communication and setup are very different for each OpenStack deployment. In most cases, at least the following demands need to be met:
  • Upon creation, an instance must obtain an IP from a fixed network.
  • In general, instances need access to the Internet (to download security updates, etc.).
  • Usually instances must communicate within their fixed IP network.
  • Some instances need to be exposed to the world on a publicly routable IP.
Another thing to consider is tenant isolation. It can range from each tenant being “imprisoned” inside its fixed network to full visibility across these networks at the other extreme. The first scenario is typical for public clouds (tenants usually do not want to expose their servers to one another), while the latter can be encountered in corporate clouds, where tenants belong to the same organization (e.g., different developer groups) and their instances often need to communicate.
Consider the following network layout:

The above diagram shows two compute nodes connected by a network switch.

We have “tenant1” and “tenant2,” which have separate private networks on different vlans (100 and 102). Since we are running a multi-host networking setup, every compute node has direct access to external networks (Internet, etc.) through its eth1 interface. Based on this diagram, I’m going to show six scenarios that illustrate how different OpenStack networking cases are carried out.

Scenario 1

Instances of tenant1 boot up and have their IPs assigned:

  1. Instance VM_1 boots and sends a DHCPDISCOVER broadcast message on the local network.
  2. The message gets broadcast over br100.
  3. The dnsmasq server listens on the address of br100 (“--listen-address 10.0.0.1”). It also has a static lease configured for VM_1. It answers with a DHCPOFFER containing:
    • an instance address: 10.0.0.3 and
    • a default gateway pointing to br100: 10.0.0.1.
  4. Instance VM_4 (on the second compute node) boots and sends a DHCPDISCOVER broadcast message on its local network.
  5. The message gets broadcast over that node’s br100.
  6. The dnsmasq server there listens on the address of its br100 (“--listen-address 10.0.0.5”). It also has a static lease configured for VM_4. It answers with a DHCPOFFER containing:
    • an instance address: 10.0.0.6 and
    • a default gateway pointing to br100: 10.0.0.5.

 

Note:
Instances running on different compute hosts have different default gateways configured.
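To make this concrete, here is a rough sketch of how nova-network launches a per-bridge dnsmasq on the leftmost compute node. The exact flags, the hosts-file path, and the lease entry are illustrative assumptions, not captured from a live deployment:

dnsmasq --listen-address=10.0.0.1 \
        --dhcp-range=10.0.0.3,static \
        --dhcp-hostsfile=/var/lib/nova/networks/nova-br100.conf
# nova-br100.conf would hold one static lease per instance, e.g.:
# fa:16:3e:aa:bb:cc,VM_1,10.0.0.3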

 

Scenario 2

VM_1 wants to access the Internet (e.g., Google’s DNS: 8.8.8.8) and it has only a fixed IP assigned.

  1. VM_1 sends a ping to Google’s DNS 8.8.8.8.
  2. Since 8.8.8.8 is not on its local network, VM_1 sends the ping via its default gateway (10.0.0.1 in this case).
  3. The compute node makes a routing decision: 8.8.8.8 does not reside on any directly connected network, so the node forwards the packet along its own default route, out through eth1 (91.207.15.105).
  4. On its way out, the packet gets SNAT-ted to eth1’s IP, 91.207.15.105. A dedicated rule in nova-network’s iptables nat table handles this: nova-network-snat -s 10.0.0.0/24 -j SNAT --to-source 91.207.15.105. The nova.conf setting that controls this rule’s behavior is routing_source_ip=91.207.15.105.
  5. 8.8.8.8 sends a reply to 91.207.15.105. When the response arrives, the kernel consults its NAT table to forward the packet back to VM_1.
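You can verify the SNAT rule on the compute node by listing the nat table (the chain name is taken from the rule above; the output shown is illustrative):

iptables -t nat -S nova-network-snat
# -A nova-network-snat -s 10.0.0.0/24 -j SNAT --to-source 91.207.15.105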

Scenario 3

Let’s assume that tenant1 wants to ping from VM_1 to VM_2. We have two important things to note here:

  • Both instances belong to tenant1.
  • Both instances reside on the same compute node.

Here’s what the traffic is going to look like:

  1. VM_1 sends a packet to VM_2. VM_2 is on the same net as VM_1. VM_1 does not know the MAC of VM_2 yet, so it sends an ARP broadcast packet.
  2. The broadcast is passed through br100 to all of tenant1’s network, including VM_2, which sends a reply to VM_1.
  3. Once the VM_2 MAC is determined, IP packets are sent to it from VM_1.

Note:
Instances of the same tenant live in the same L2 broadcast domain. This broadcast domain spans both compute nodes via the vlan100 interfaces and the switch, which supports 802.1Q vlan traffic. So all ARP broadcasts sent on the tenant1 network are visible on all compute nodes that have the vlan100 interface created. In this case, even though VM_1 and VM_2 are on a single host, the ARP broadcast sent in point (1) is also visible to VM_4 and VM_5.
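An easy way to confirm this is to watch ARP on the vlan100 interface of the second compute node while VM_1 pings VM_2 (a sketch; assumes tcpdump is installed there):

tcpdump -n -e -i vlan100 arp
# the who-has broadcast from 10.0.0.3 shows up here even though
# neither VM_1 nor VM_2 runs on this node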

Scenario 4

Now let’s assume the ping goes from VM_1 to VM_5. Both of them still belong to tenant1, but are located on different physical compute nodes.

  1. VM_1 wants to send a packet to VM_5, which is located on a different compute node. It sends out an ARP broadcast to determine the MAC of VM_5.
  2. The broadcast is passed by bridge br100 to all the interfaces it has connected, including vlan100.
  3. The packet is tagged with 802.1Q vlan tag number 100 on the compute node’s interface.
  4. The tagged packet goes to the switch. The switch ports are configured in “trunk” mode. Trunking allows passage of vlan tag information between two compute nodes.
  5. The tagged packet arrives at the physical network interface of the other compute node. Since it bears tag 100, it is passed further to the vlan100 interface. The tag is stripped from the packet here.
  6. The packet goes through br100.
  7. VM_5 receives the broadcast and replies with its MAC. The reply packet takes the same path in reverse. At this point VM_1 and VM_5 can exchange traffic. (A sketch of the underlying vlan/bridge plumbing follows below.)
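For reference, the vlan100/br100 plumbing that carries this traffic is just an 802.1Q subinterface enslaved to a bridge. A minimal manual equivalent of what nova-network sets up by itself (commands are illustrative, not how nova invokes them):

ip link add link eth0 name vlan100 type vlan id 100   # 802.1Q subinterface, tag 100
ip link set vlan100 up
brctl addbr br100
brctl addif br100 vlan100                             # bridge the VMs' taps with vlan100
ip addr add 10.0.0.1/24 dev br100                     # the gateway address dnsmasq listens on
ip link set br100 up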
Now let’s take a look at the case of intertenant communication. As stated in the introduction to this post, in the case of internal/corporate clouds, “tenants” are often different developer groups or projects inside a single company. In this case, there is often a need to allow these projects to communicate, even though they may reside on different fixed IP networks.
Users allow traffic to their instances using security groups. In our scenario, tenant1 and tenant2 could simply let their instances communicate by making the following adjustments in their security groups:
tenant1: nova secgroup-add-rule default tcp 1 65535 10.1.0.0/24
tenant2: nova secgroup-add-rule default tcp 1 65535 10.0.0.0/24
tenant1 thus allows TCP traffic on ports 1-65535 from tenant2’s net (10.1.0.0/24), and vice versa.
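Note that these rules open TCP only. As comes up in the comments below, ping additionally needs an ICMP rule, for example (illustrative):

tenant1: nova secgroup-add-rule default icmp -1 -1 10.1.0.0/24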
There are two scenarios, depending on where instances are put by OpenStack:
  • Instances of both tenants reside on the same compute node.
  • Instances of both tenants reside on different compute nodes.

Scenario 5

This time the ping goes from VM_1 to VM_3. In this case, packet exchange happens between two tenants. It is critical to understand how the routing between their networks is carried out.

 

  1. VM_1 wants to reach VM_3, which belongs to tenant2 and is located on the same compute node. VM_1 knows that VM_3 is on a different network than its own (10.0.0.0/24 vs 10.1.0.0/24), so it sends the packet directly to the gateway, which is 10.0.0.1.
  2. The packet arrives at br100.
  3. The compute node routes the packet to br102 based on its routing table.
  4. The packet arrives at br102. VM_3’s MAC is determined by an ARP broadcast.
  5. VM_3 replies with its MAC. Since VM_1 is on a different network than VM_3, VM_3 sends the reply to the default gateway, which is 10.1.0.1 (the address of br102). The packet is then routed back to tenant1’s network via br100.
Note:
We see that br100 and br102 are given IP addresses from their tenants’ nets. A side effect of this is that the compute node starts to perceive these bridges as gateways to the tenants’ networks: the leftmost compute node treats br100 (10.0.0.1) as the gateway to tenant1’s net and br102 (10.1.0.1) as the gateway to tenant2’s net. This is how the routing entries for these networks might look on the compute node in our case:

10.0.0.0/24 dev br100  proto kernel  scope link

10.1.0.0/24 dev br102  proto kernel  scope link
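Routing between br100 and br102 also requires the compute node to forward IPv4 between its interfaces. nova-network takes care of this, but it corresponds to the following setting (shown here only for illustration):

sysctl -w net.ipv4.ip_forward=1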

Scenario 6

This time the ping goes from VM_1 to VM_6. They both belong to different tenants and reside on different compute nodes. Pay special attention here to observe the “asymmetric routing” (i.e., requests from VM_1 to VM_6 go one way, but the response from VM_6 to VM_1 takes a different route).

 

  1. VM_1 wants to exchange data with VM_6. VM_6 belongs to a different tenant and is located on a different compute node. Since VM_6 is not on VM_1’s local network, VM_1 sends the packet with the target address 10.1.0.4 directly to the gateway, which is 10.0.0.1 in this case.
  2. The packet from VM_1 arrives at br100.
  3. The compute node sees the target network (10.1.0.0/24, belonging to tenant2) on br102, so it routes the packet to br102.
  4. The packet is now on tenant2’s L2 network segment.
  5. It gets its 802.1Q vlan tag.
  6. It goes through the switch trunk ports.
  7. The tagged packet arrives at the physical network interface of the other compute node. Since it bears the tag 102, it is passed further to the vlan102 interface. The tag is stripped from the packet here.
  8. The packet goes through br102 and reaches VM_6.
  9. VM_6 sends a reply to VM_1 with destination address 10.0.0.3. Since VM_1 is located on a different subnet, the reply is sent to VM_6’s default gateway, which is 10.1.0.5 in this case.
  10. The compute node sees the target network (10.0.0.0/24 belonging to tenant1) on br100 so it routes the packet to br100.
  11. The packet is now on tenant1’s L2 network segment.
  12. It gets its 802.1Q vlan tag.
  13. It goes through the switch trunk ports.
  14. It arrives at the physical network interface of the left compute node. Since it bears tag 100, it is passed further to the vlan100 interface. The tag is stripped from the packet here.
  15. The packet goes through br100 and VM_1 gets a reply back.
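The asymmetry is easy to observe on the leftmost compute node: echo requests leave via vlan102, while echo replies come back via vlan100 (a sketch; interface names as in the diagram):

tcpdump -n -i vlan102 icmp   # shows only the outbound echo requests
tcpdump -n -i vlan100 icmp   # shows only the inbound echo replies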

But what about this one?

OpenStack provisions networks “on demand.” This means that if you create a new network in OpenStack, its setup (i.e., a dedicated bridge plus a vlan interface) is not propagated across all compute nodes by default. Only when an instance attached to the new network lands on a compute node is the network actually bootstrapped there. So it is common to have differences in network configuration between compute nodes. For example, one compute node could have br100, br102, and br103 set up, while another has only br102.

Such behavior may lead to a frustrating situation. Imagine we want to ping from VM_1 to VM_6 again, but this time there are no tenant2 instances running on the leftmost compute node (so no bridge br102 and no vlan102 are present). But wait a minute — since bridge br102 was acting as a gateway to network 10.1.0.0, how do we get to tenant2’s network? The answer is simple — by default we can’t!

  1. VM_1 wants to send a ping to VM_6, which is on a different network than VM_1 itself.
  2. So VM_1 sends the ping to the default gateway (br100, 10.0.0.1).
  3. On the compute node there is no direct route to network 10.1.0.0 (there is no br102 yet, which would provide one).
  4. So the compute node sends the packet along its own default route, out through eth1.
  5. Since eth1 is on a completely different wire than eth0, there is no chance of finding the way to 10.1.0.0. We get a “host unreachable” message.
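You can confirm the failing routing decision on the leftmost node (illustrative output; the upstream gateway address depends on your uplink):

ip route get 10.1.0.4
# 10.1.0.4 via <upstream-gw> dev eth1  src 91.207.15.105

With no br102 there is no connected route to 10.1.0.0/24, so the lookup falls through to the default route on eth1.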

We now can see that intertenant connectivity cannot be ensured by relying only on fixed IPs. There are a number of potential remedies for this problem, though.

The simplest way (and the best in my opinion) would be to place the intertenant communication on floating IPs rather than on fixed IPs:

Scenario 7

VM_1 wants to send a ping to VM_6. Both instances have floating IPs assigned. Given our understanding of how floating IPs work from the previous post, it is clear that these IPs will be attached as secondary IP addresses to eth1 interfaces on compute nodes. Let’s assume that the following floating IP pool was configured by the cloud administrator: 91.208.23.0/24. Both tenant1 and tenant2 allocated themselves addresses from this pool and then associated them with their instances in this manner:

  • tenant1: VM_1: 91.208.23.11
  • tenant2: VM_6: 91.208.23.16
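A sketch of the CLI calls each tenant might have used to get there (Essex-era novaclient syntax; the allocated addresses are assumed to be the ones above):

nova floating-ip-create              # allocates an address from the 91.208.23.0/24 pool
nova add-floating-ip VM_1 91.208.23.11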

So the situation is now as follows:

 

Let’s say VM_1 wants to ping VM_6. The tenants use their floating IPs to communicate instead of their fixed IPs.

    1. VM_1 wants to ping VM_6, whose floating IP is 91.208.23.16. So the ping goes from source 10.0.0.3 to destination 91.208.23.16.
    2. Since 91.208.23.16 is not on VM_1’s local network, the packet is sent to VM_1’s default gateway, 10.0.0.1 (the destination address stays 91.208.23.16).
    3. The compute node makes a routing decision to let this packet out via eth1.
    4. Source NAT-ting (SNAT-ting) of the packet is performed (rewrite of the source address: 10.0.0.3 -> 91.208.23.11). So now the packet source/dest. look like this: source 91.208.23.11 and destination 91.208.23.16.
    5. The packet arrives at the destination compute node with the source 91.208.23.11 and the destination 91.208.23.16. Destination Network Address Translation (DNAT) of the packet is performed, which changes its destination IP from 91.208.23.16 to 10.1.0.4. The packet source/dest. now look like this: source 91.208.23.11 and destination 10.1.0.4.
    6. Based on the destination IP, the compute node routes the packet via br102.
    7. The packet reaches the destination: VM_6.

The response from VM_6 to VM_1 goes the same way, but in the reverse order. There is one difference, though. Since the ICMP reply is considered to be “related” to the ICMP request previously sent, no explicit DNAT-ting is done, as the reply packet returns to the leftmost compute node via eth1. Instead, the kernel’s internal NAT table is consulted.
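Under the hood, the translations in steps 4 and 5 are per-address iptables rules that nova-network installs; roughly as follows (chain names are assumptions based on Essex-era nova-network, not taken from a live node):

# on the destination compute node: DNAT traffic arriving for the floating IP
iptables -t nat -A nova-network-PREROUTING -d 91.208.23.16 -j DNAT --to-destination 10.1.0.4
# on the source compute node: SNAT traffic leaving the fixed IP
iptables -t nat -A nova-network-float-snat -s 10.0.0.3 -j SNAT --to-source 91.208.23.11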

Other ways to ensure proper intertenant communications based on fixed IPs (as far as my knowledge of networking is concerned, and please note — these are only my personal thoughts on the problem) would require patching OpenStack code:

  • You could ensure that when a new network is added, the bridge and corresponding vlan interface would be created on all compute nodes (right now it’s done only when an instance is spawned).
  • An upstream router could be attached to the OpenStack private network (eth0) to act as a gateway between tenants’ networks. Some code, however, would need to be added to OpenStack to ensure proper 802.1Q tagging of the router traffic, based on which vlans are configured for OpenStack networks.
  • Use FlatDHCPManager if it fits. This way only one network bridge is created on each compute node, and it is there right from the start (a config sketch follows below).
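A minimal nova.conf excerpt for that last option might look like this (flag names as in Essex-era nova-network; values are illustrative):

network_manager=nova.network.manager.FlatDHCPManager
flat_network_bridge=br100
flat_interface=eth0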

Conclusion

In this post I analyzed various scenarios of instance traffic, starting with an instance’s birth and boot-up and ending with intertenant communication. The current networking model has its limitations and doesn’t always behave as users might expect. Still, OpenStack is highly flexible and extensible software. It gives users the means to either tackle these limitations with proper configuration changes or address their specific needs with custom code extensions. Also, Quantum is coming in the next release and will introduce a small revolution to the whole networking concept, which will probably put an end to these pains.

24 Responses

  1. Nirbhay Tomar

    Hello,

    In scenario 7, is it possible to add a route on the host for the network that is not present there, so that it goes to the trunked port on the switch? Is there a way of doing this?

    August 30, 2012 03:54
    • Piotr Siwczak

      Nirbhay,
      I also thought about it, and this does not seem to be the right solution (at least to my knowledge – and I am rather an ops guy than a net guy). To implement inter-tenant traffic in an upstream router, you would need a mechanism in OpenStack that would “talk” to the router and tag packets in sync with the nova database (which keeps the mapping between tenant IP pools and vlan IDs). Trunk ports only pass tagged packets back and forth. Tagging should be done either on the compute node (which requires a tagged interface per network – not there in the case of scenario 7) or on the router – and for that, the router needs the mapping between IP pools and vlan IDs coming from the OpenStack DB.

      What do you think?

      August 31, 2012 01:44
      • Piotr Siwczak

        This link seems to explain inter-vlan routing rather nicely:
        http://www.routergeek.net/general/how-to-configure-inter-vlan-routing-on-a-cisco-router/

        August 31, 2012 01:50
      • Nirbhay Tomar

        Hi,

        This is possible if we have this on the router, but how do we manage it so that it does not create the gw bridge on the host?

        September 3, 2012 03:52
        • Piotr Siwczak

          I believe that to achieve this a dnsmasq configuration code change is required to point to the upstream router as the default gateway. This can be easily done, as the dnsmasq code is rather simple.

          September 5, 2012 12:37
          • Nirbhay Tomar

            Hello,

            I tried this but i am not from that background can you help me out in doing this.

            September 10, 2012 22:33
          • Piotr Siwczak

            Nirbhay,

            By default, addresses are given out to instances by the dnsmasq server (which is controlled by the nova-network daemon). If not told otherwise, dnsmasq will take the address it is listening on (the address of the bridge) and provide it as the default gateway to all the instances that get DHCP from it.

            What you can do to change this behavior:
            put into your nova.conf the following line:
            dnsmasq_config_file=/etc/dnsmasq.conf

            edit /etc/dnsmasq.conf and set the default gateway there.
            This post explains the available dnsmasq config options nicely:
            http://www.iceflatline.com/2010/02/how-to-install-and-configure-dnsmasq/

            September 11, 2012 01:23
  2. padnala balaji

    Hi Piotr,

    I want to understand the deployment scenario of OpenStack, i.e., how the physical nodes will look: will they have two physical interfaces, eth0 and eth1, so that eth0 is connected to the OpenStack controller and eth1 is used for private network traffic between compute nodes?

    Please give us some insight on this..

    thanks in advance..

    September 20, 2012 23:00
    • Piotr Siwczak

      Padnala,

      Yes – you should separate your internal traffic from your tenants’ traffic, so it is best to put them on different circuits. You can still go with one interface and create vlans, but in that case you might sometimes suffer from performance degradation.

      -Piotr

      September 27, 2012 00:35
  3. Nirbhay Tomar

    Thanks for the help; I am working on it.
    I’ll let you know about its progress.

    // Nirbhay

    September 25, 2012 00:12
    • Nirbhay Tomar

      Hi,

      I tried to manage network gateways from an external switch, telling OpenStack through dnsmasq.conf, but was not able to implement it as you mentioned earlier. Can you please help me carry it out?

      Thanks,

      Nirbhay Tomar

      February 10, 2013 02:30
      • Piotr Siwczak

        Sorry for the late answer, but yes – I will be happy to help if I can. Please tell me:

        1. Do you run multi-host networking?
        2. What does your network topology look like (i.e., how is the fixed IP network attached to the upstream DC)?

        March 12, 2013 01:37
  4. Ramesh Maharjan

    Very good explanation. The diagrams helped me understand more, and they were interesting as well.
    Is there any other solution besides having to configure a router?
    You mentioned that using floating IPs is a solution, but is that only for pinging or for real traffic as well?

    September 27, 2012 03:11
  5. Mallikarjun

    Thanks!!! Very good document, but regarding Scenario 5:

    This time the ping goes from VM_1 to VM_3. In this case, packet exchange happens between two tenants. My question here is: do we need to allow ICMP -1 in the security groups of both tenants to get ping working? Please revert back.

    October 18, 2012 03:53
    • Piotr Siwczak

      Mallikarjun,

      Yes – tenants are isolated by secgroups, so you should explicitly enable pings.

      October 19, 2012 00:16
      • mallikarjun

        Piotr Siwczak, once again thanks a lot for the clear explanation. Do you have any flows similar to the VlanManager network flows for Quantum? If yes, please share; it will be of great help. Thanks, Mallikarjun

        November 8, 2012 10:53
        • Piotr Siwczak

          Hi,

          Thank you for a positive comment on our blog ;-)

          We have no network flows for Quantum yet. In fact, they can be totally different depending on the Quantum L2 plugin used. I will try to come up with flows based on the OVS plugin in the upcoming weeks.

          Regards,
          -Piotr

          November 8, 2012 11:31
          • Mallikarjun

            Dear Piotr

            Any update on network flows for Quantum..

            Regards
            Mallikarjun

            November 26, 2012 02:13
  6. George

    This is by far the best explanation of the VLAN model in OpenStack, and I think OpenStack should hire you to write the official documentation, as your style is clearer than anything I read there…

    December 12, 2012 06:44
  7. CliMz

    Hi,

    Thanks for this awesome post.
    I have one question: where do you configure each brXXX address?

    Thanks

    March 19, 2013 08:38
  8. Jeffrey4l

    What’s the packet flow when pinging the floating IP between VMs located on the same physical machine?

    April 9, 2013 02:58
  9. Kaya

    Thanks for this posting. It’s very helpful! I have a question: why is there an IP address assigned to the bridge? In general it will work without an IP on the bridge, but I don’t know if there is a specific reason for nova-network to do so.

    January 8, 2014 23:44

Continuing the Discussion

  1. OpenStack Community Weekly Newsletter (Aug 10-17) » The OpenStack Blog

    [...] Understanding VlanManager Network Flows in OpenStack Cloud: Six Scenarios [...]

    August 17, 2012 01:23
