Scaling OpenStack With a Shared-Nothing Architecture

When it comes to pushing the boundaries of OpenStack scaling, there are basically two supported constructs: Cells and Regions. With Nova Cells, instance database records are divided across multiple “shards” (i.e., cells).  This division ensures that we can keep scaling our compute capacity without getting bogged down by the limitations of a single relational database cluster or message queue.  This is what we mean by a shared-nothing architecture: Scaling a distributed system without limits by removing single points of contention.

However, in OpenStack, cells currently only exist for Nova.  If we want to extend this kind of paradigm to other OpenStack services such as Neutron, Ceilometer, and so on, then we have to look to OpenStack Regions.  (You may already be looking at using regions for other reasons – for example, to optimize response times with proximity to various geographic localities.)

There are many ways of implementing regions in OpenStack.  You will find online references that show the same Keystone & Horizon shared between multiple regions, with some variants throwing in Glance too, while others exclude Keystone.  These are all variations in expressing the degree to which we want to share a set of common services between multiple cloud environments, versus keeping them separate.  To depict the extremes (sharing everything, vs sharing nothing):

[Diagram: shared-everything vs. shared-nothing deployments]

Shared services offer the convenience of a central source of truth (e.g., for user, tenant, and role data in the case of Keystone), a single point of entry (e.g., Keystone for auth or Horizon for the dashboard), and can be less trouble than deploying and managing distributed services.

On the other hand, with this paradigm we can’t horizontally scale the relational database behind Keystone, Horizon’s shared session cache, or other single points of contention that are created when centralizing one of the control plane services.

Beyond scaling itself though, let’s take a look at some other points of discussion between the two:

Flexibility

The shared-nothing paradigm offers the flexibility to support different design decisions and control plane optimizations for different environments, providing a contrast to the “one size fits all” control plane philosophy.

It also permits the operation of different releases of OpenStack in different environments.  For example, we can have a “legacy cloud” running an older/stable OpenStack, at the same time as an “agile cloud” running a more recent, less stable OpenStack release.

Upgrades & Updates

OpenStack has been increasingly modularized by projects that specialize in doing one specific thing (e.g., the Ironic project was a product of the former bare metal driver in Nova).  However, despite this modularization, there remains a tight coupling between most of these components, given their need to work together to make a fully functioning platform.

This tight coupling is a hardship for upgrades, as it often requires a big-bang approach: different components have to be upgraded at the same time because they won’t work properly in an incremental upgrade scenario or with mixed versions.  Most upstream testing focuses on running the same versions of components together, not on mixing versions (especially as we see more and more projects make their way into the big tent).

When we don’t share components between clouds, we open the possibility of performing rolling upgrades that are fully isolated and independent of other environments.  This localizes any disruptions from upgrades, updates, or other changes to one specific environment at a time, and ultimately allows for a much better controlled, fully automated, and lower risk change cycle.
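As a rough illustration of what "one environment at a time" can look like, here is a minimal sketch. The `rolling_upgrade` function, its `upgrade` and `is_healthy` callbacks, and the region names are all hypothetical stand-ins for whatever deployment tooling and health checks you actually use.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    environments: Iterable[str],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade one environment at a time; stop at the first unhealthy one.

    Returns the environments that were upgraded and passed their health check.
    """
    completed: List[str] = []
    for env in environments:
        upgrade(env)              # touches this environment only
        if not is_healthy(env):   # stop here; later clouds are left untouched
            break
        completed.append(env)
    return completed


if __name__ == "__main__":
    touched: List[str] = []
    result = rolling_upgrade(
        ["region-a", "region-b", "region-c"],
        upgrade=touched.append,                    # hypothetical upgrade step
        is_healthy=lambda env: env != "region-b",  # pretend region-b fails its checks
    )
    print("completed:", result)
    print("touched:", touched)
```

Because nothing is shared, a failed health check in one cloud simply halts the rollout, and the remaining clouds keep running the previous release.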

Resiliency & Availability

When sharing components, we have to think about common modes of failure.  For example, even if we deploy Keystone for HA, if we have corruption in the database backend, or apply schema updates (e.g., for upgrades), or take the database offline for any other maintenance reasons, these will all cause outages for the service as a whole, and by extension all of your clouds that rely on this shared service.

Another example: Suppose you are using PKI tokens and you need to change the keys used to sign and verify tokens.  There is not really any graceful way of doing this transition: you have to do a hard cut-over to the new key on all Keystone nodes at the same time, purge all cached signing files stored by every other OpenStack service, and revoke all tokens issued under the old key.

Also, denial of service attacks are both easier to perform and more impactful with shared infrastructure elements.

In contrast, the shared-nothing approach removes common modes of failure and provides full isolation of failure domains.  This is especially relevant for cloud native apps that deploy to multiple regions to achieve their SLAs, where failures are taken to be independent, and where the presence of common modes of failure can invalidate the underlying assumptions of this operational model.
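To make the independence assumption concrete, here is a small back-of-the-envelope sketch. It assumes that region failures are fully independent and that a shared service (for example, a shared Keystone) is a hard dependency of every region. The function names and the availability figures are illustrative assumptions, not measurements.

```python
def any_region_available(region_availability: float, regions: int) -> float:
    """Probability that at least one of `regions` independent regions is up."""
    return 1.0 - (1.0 - region_availability) ** regions


def with_shared_service(
    shared_availability: float, region_availability: float, regions: int
) -> float:
    """Same as above, but every region also depends on one shared service."""
    return shared_availability * any_region_available(region_availability, regions)


if __name__ == "__main__":
    # Illustrative numbers only.
    print("isolated:", any_region_available(0.999, 3))
    print("shared:  ", with_shared_service(0.999, 0.999, 3))
```

Under these assumptions, the shared case can never exceed the availability of the shared service itself, no matter how many regions are added.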

Performance & Scaling

When distributing services, degraded or underperforming shards do not affect the performance or integrity of other shards.  For example, in times of high load, or denial of service attacks (whether or not malicious in nature), the impact of these events remains localized and does not spread to other environments.

Also, faster API response times may be realized (since requests can be processed locally), as well as lower utilization of WAN resources.  Even small latencies can add up (e.g., Keystone calls in particular should be kept as fast as possible to minimize the response time for the overall system).

Scaling out is a simple matter of adding more shards (regions).  As mentioned previously, this also helps get around the fact that we have components that cannot otherwise be horizontally scaled, such as the Horizon shared session cache or relational database backend.

Design Complexity

An important factor to consider with any deployment paradigm is: “How close is this to the reference upstream architecture?”  The closer we stay to that, the more we benefit from upstream testing, and the less we have to go out and develop our own testing for customizations and deviations from this standard.

Likewise from the operations side, the closer we stick to that reference architecture, the easier time we have with fault isolation, troubleshooting, and support.

If your organization is also doing some of its own OpenStack development, the same statement could also be made about your developers: In effect, the closer your environment is to something that can be easily reproduced with DevStack, the lower the barrier of entry is for your developers to onboard and contribute.  And regardless of whether you are doing any OpenStack development, your dev and staging environments will be easier to set up and maintain for the same reasons.

The elegance of the shared-nothing approach is that it allows you to use this standard, reference deployment pattern, and simply repeat it multiple times.  It remains the same regardless of whether you deploy one or many.  It aims to commoditize the control plane and make it into something to be mass produced at economies of scale.

Challenges

There are two key challenges/prerequisites to realizing a shared-nothing deployment pattern.

The first challenge is the size of the control plane: It should be virtualized, containerized, or at least miniaturized in order to reduce the footprint and minimize the overhead of having a control plane in each environment.  This additional layer may increase deployment complexity and brings its own set of challenges, but is becoming increasingly mainstream in the community (for example, see the TripleO and Kolla OpenStack projects, which are now part of the big tent).

The second challenge is the management and operational aspects of having multiple clouds.  Broadly speaking, you can classify the major areas of cloud management as follows:

  • Configuration Management (addressed by CM systems like Ansible, Puppet, etc.)
  • OpenStack resource lifecycle management.  Specifically we are interested in those resources that we need to manage as cloud providers, such as:
    • Public Images
    • Nova flavors, host aggregates, availability zones
    • Tenant quotas
    • User identities, projects, roles
    • Floating/public networks
    • Murano catalogs
    • VM resource pools for Trove or other aaS offerings

Coordinated multi-cloud resource lifecycle management is a promising possibility, because it permits us to get back some of what we sacrificed when we decentralized our deployment paradigm: the single source of truth with the master state of these resources.  But rather than centralizing the entire service itself, we centralize the management of a set of distributed services.  This is the key distinction in how we manage a set of shared-nothing deployments: we leverage the relatively backwards-compatible OpenStack APIs to do multi-cloud orchestration, instead of trying to synchronize database records whose underlying schema is constantly changing and not backwards-compatible.

What we could envision then is a resource gateway that could be used for lifecycle management of OpenStack resources across multiple clouds.  For example, if we want to push out a new public image to all of our clouds, then that request could be sent to this gateway which would then go and register that image in all our clouds (with the same image name, UUID, and metadata to each Glance API endpoint).  Or as an extension, this could be policy driven – e.g., register this image only in those clouds in certain countries, or where certain regulations don’t apply.
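The gateway idea can be sketched in a few lines. This is only a toy model under stated assumptions: `Cloud`, `register_image`, and `push_public_image` are hypothetical names, and the in-memory `images` dict stands in for a call to each cloud's Glance API. It is not how Kingbird, ORM, or any other existing project is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class Cloud:
    """One shared-nothing region as seen by the gateway (hypothetical model)."""
    name: str
    country: str
    images: Dict[str, dict] = field(default_factory=dict)  # stands in for Glance

    def register_image(self, image_id: str, name: str, metadata: dict) -> None:
        # A real gateway would call this cloud's Glance API endpoint here.
        self.images[image_id] = {"name": name, **metadata}


def push_public_image(
    clouds: Iterable[Cloud],
    image_id: str,
    name: str,
    metadata: dict,
    policy: Callable[[Cloud], bool] = lambda cloud: True,
) -> Dict[str, str]:
    """Register the same image (same UUID, name, metadata) in every cloud
    that the policy allows, and report a per-cloud outcome."""
    results: Dict[str, str] = {}
    for cloud in clouds:
        if not policy(cloud):
            results[cloud.name] = "skipped"
            continue
        try:
            cloud.register_image(image_id, name, metadata)
            results[cloud.name] = "registered"
        except Exception as exc:  # one failing cloud must not block the others
            results[cloud.name] = f"failed: {exc}"
    return results
```

A policy is then just a predicate over clouds, for example `policy=lambda cloud: cloud.country == "DE"` to register an image only in German regions.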

In terms of the CAP theorem, we are loosening up consistency in favor of availability and partition tolerance.  The resources being managed could be said to be “eventually consistent”, which is reasonable given the types of resources being managed.
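One common way to get eventual consistency is a periodic reconciliation pass that compares the master state with what each cloud currently reports. The sketch below assumes both states are plain dictionaries keyed by resource ID and contain only operator-managed resources; `plan_reconciliation` is a hypothetical helper, not part of any OpenStack project.

```python
from typing import Dict, List, Tuple


def plan_reconciliation(
    master: Dict[str, dict], actual: Dict[str, dict]
) -> List[Tuple[str, str]]:
    """Return (action, resource_id) pairs that move `actual` toward `master`."""
    plan: List[Tuple[str, str]] = []
    for resource_id, spec in master.items():
        if resource_id not in actual:
            plan.append(("create", resource_id))
        elif actual[resource_id] != spec:
            plan.append(("update", resource_id))
    for resource_id in actual:
        if resource_id not in master:
            plan.append(("delete", resource_id))
    return plan


if __name__ == "__main__":
    master_flavors = {"m1.small": {"vcpus": 1}, "m1.large": {"vcpus": 4}}
    region_flavors = {"m1.small": {"vcpus": 2}, "legacy": {"vcpus": 8}}
    for action, flavor in plan_reconciliation(master_flavors, region_flavors):
        print(action, flavor)
```

A cloud that was unreachable during one pass simply converges on a later one, which is the trade-off described above.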

Also note that here, we only centralize those resources that cloud operators need to manage (like public images), while private image management is left to the user (as it would be in a public cloud setting).  This also gives the end-user the most control over what goes where – for example, they don’t have to worry about their image being replicated to some other location which may increase their image’s exposure to security threats, or to some other country or jurisdiction where different data laws apply.

There have been a number of implementations designed to address this problem, all originating from the telco space.  Kingbird (started by OPNFV; open source) and ORM (by AT&T, with plans to open source by Q4 2016 – Q1 2017) can be classified as resource management gateways.  Tricircle (Telco working group and OPNFV; open source) is another community project which also has similar aims.

It will be very interesting to see how these projects come along this year, and to what degree we see a community standard emerge to define the way we implement shared-nothing.  It would also be great to get feedback from anyone else out there who is thinking along similar lines, or if they know of any other implementations that I missed in the list above.  Feel free to comment below!

One response to “Scaling OpenStack With a Shared-Nothing Architecture”

  1. As you mentioned Tricircle and Kingbird, I would like to provide some more information here (it was also shared on the openstack-dev and openstack-operators mailing lists).

    Multisite is a requirements project in OPNFV that identifies the gaps and requirements
    in OpenStack for making OpenStack work in NFV multi-site clouds.

    Kingbird is a sub-project of Multisite, started after the gap analysis in OPNFV.
    It aims at centralized quota management, a centralized view of
    distributed virtual resources, and synchronization of SSH keys, images, flavors,
    etc. across regions in OpenStack multi-region deployments.
    The project is currently working on key-pair sync, and the
    centralized quota management feature was implemented in the OPNFV
    C release. Kingbird is a major topic in the OPNFV Multisite weekly meeting.

    Tricircle, by contrast, is an official OpenStack big-tent project. It was accepted in
    Nov. 2016, and during the big-tent application its scope was narrowed to
    networking automation across Neutron in OpenStack multi-region deployments.
    Tricircle provides basic L2/L3 networking across OpenStack clouds. Currently local
    network and shared_VLAN based L2/L3 networking are supported, and work is under way on
    VxLAN L2 networking across Neutron, so that L2/L3 networking can also leverage the
    VxLAN L2 networking capability. You can refer to (and review) the networking guide:
    https://review.openstack.org/#/c/420316/.

    During the discussions in 2015, both Kingbird and Tricircle were candidate
    solutions for multisite clouds. Kingbird and Tricircle
    can work together or separately in an OpenStack multi-region deployment; they
    now complement each other. Kingbird has no networking
    automation features, and Tricircle has no features related to Nova/Cinder…

    Tricircle is mostly visible in the OpenStack community, while Kingbird is mostly
    visible in the OPNFV community.

    You are welcome to join the meetings:
    Tricircle: IRC meeting:
    https://webchat.freenode.net/?channels=openstack-meeting every Wednesday
    starting at 13:00 UTC
    Multisite & Kingbird: IRC:
    http://webchat.freenode.net/?channels=opnfv-meeting every Thursday 8:00-9:00
    UTC (during winter time, that is 9:00 AM CET).
