Pumphouse Deep Dive, Part 1
Welcome back to Mirantis Labs. In the introductory article for the Pumphouse series, we gave you an overview of what Pumphouse does; now we'll go into more detail about how Pumphouse actually migrates workloads between OpenStack clouds. In this Deep Dive series we'll talk specifically about version 1.0 of Pumphouse, and also look at any limitations and how we'll be addressing them in future versions. In this article we'll look specifically at the process of migrating resources from one cloud to another.
The ultimate goal of Pumphouse is to be able to migrate any type of resource, provided by any OpenStack service, from one cloud to another. In version 1.0, however, the list of resources is limited to those that are most important.
As I mentioned in part 1, the smallest unit of workload that is sensible to migrate is a virtual server, or VM. However, the server itself cannot exist in a cloud without a number of other resources, which we call server dependencies, so Pumphouse must replicate all of those resources in the Destination cloud before moving the server itself.
At its most basic level, Pumphouse requests information about the server from Nova via the Compute API. This information includes references to the server's dependencies. Usually, those references are IDs that allow Pumphouse to fetch the meta-data for those dependency resources and build a list of them.
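To make the dependency list concrete, here is a minimal sketch of pulling references out of a server record. It is not Pumphouse's code: the function name `collect_dependencies` is hypothetical, and the field names (`flavor`, `image`, `security_groups`, `addresses`, `OS-EXT-IPS:type`, `tenant_id`, `user_id`) reflect our assumption about the shape of a Compute API server record, so check them against your cloud before relying on them.

```python
def collect_dependencies(server):
    """Return the dependency references found in one server record.

    `server` is a plain dict shaped like a Compute API server record
    (an assumption; the field names are not taken from Pumphouse itself).
    """
    addresses = server.get("addresses", {})
    floating_ips = [
        entry["addr"]
        for entries in addresses.values()
        for entry in entries
        if entry.get("OS-EXT-IPS:type") == "floating"
    ]
    return {
        "flavor": server["flavor"]["id"],
        "image": server["image"]["id"],
        "security_groups": sorted(
            {group["name"] for group in server.get("security_groups", [])}
        ),
        "networks": sorted(addresses),
        "floating_ips": floating_ips,
        "tenant": server["tenant_id"],
        "user": server["user_id"],
    }


# Illustrative record; IDs are replaced with short labels for readability.
example_server = {
    "id": "S",
    "flavor": {"id": "F"},
    "image": {"id": "I"},
    "security_groups": [{"name": "default"}, {"name": "web"}],
    "addresses": {
        "private": [
            {"addr": "10.0.0.5", "OS-EXT-IPS:type": "fixed"},
            {"addr": "172.24.4.10", "OS-EXT-IPS:type": "floating"},
        ]
    },
    "tenant_id": "T",
    "user_id": "U",
}

dependencies = collect_dependencies(example_server)
```

Each value in the resulting dict is a reference that a later step can use to fetch the full meta-data of that resource from the Source cloud.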
Once the list has been prepared, Pumphouse builds an unordered flow of tasks that replicate the resources in question in the Destination cloud. Due to how Taskflow handles the unordered flow pattern, these tasks are executed in parallel.
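The idea of running independent replication tasks side by side can be illustrated without Taskflow. The sketch below deliberately swaps in Python's standard `concurrent.futures` module in place of a Taskflow unordered flow, so it shows the pattern rather than Pumphouse's implementation. The `replicate` function and the `dst-` prefix are hypothetical placeholders for real API calls.

```python
from concurrent.futures import ThreadPoolExecutor


def replicate(kind, source_id):
    """Hypothetical stand-in for a task that recreates one resource.

    A real task would call the Destination cloud API; here we only
    derive a fake Destination ID from the Source ID.
    """
    return kind, "dst-" + source_id


def replicate_all(dependencies):
    """Run one replication task per dependency, with no ordering between them."""
    workers = max(1, len(dependencies))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(replicate, kind, source_id)
            for kind, source_id in dependencies.items()
        ]
        # Collect (kind, destination_id) pairs once every task has finished.
        return dict(future.result() for future in futures)


replicated = replicate_all({"flavor": "A", "image": "B", "network": "C"})
```

Because none of these tasks consumes the output of another, they can safely run in any order, which is what makes the unordered pattern a good fit for this stage.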
In version 1.0, Pumphouse supports the following dependency resources, or meta-resources, for virtual servers:
- A flavor defines resources allocated to a server instance in the cloud.
- An image is required to boot the server if the ‘image’ migration strategy is chosen. (We'll discuss the various strategies below.)
- Security groups define permissions for network access to the server. A single server may have more than one security group assigned to it.
- Networks connect the virtual server instance to the outside world.
- Floating IPs are used to access the server from external networks.
- Identity is a combination of authentication/authorization resources that allows the original owner of the server (and other resources as well) to manage it after migration.
Most of these resources are replicated simply by passing attributes fetched from the Source cloud APIs to the Destination cloud APIs. Pumphouse maintains a mapping between the identifiers (usually UUIDs) of corresponding resources in the Source and Destination clouds within the Taskflow store. Later, when migrating the server itself, it fetches the IDs of the dependency resources created in the Destination cloud from that store and uses them in the creation of the new server.
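A small sketch may help picture that mapping. We assume here that it can be modeled as a nested dict keyed by resource type; the real layout of the Taskflow store is not described in this article, and `translate_server` is a hypothetical helper, not a Pumphouse function.

```python
# Source ID -> Destination ID, grouped by resource type (assumed layout).
id_map = {
    "flavor": {"A": "A2"},
    "image": {"B": "B2"},
    "network": {"C": "C2", "D": "D2"},
}


def translate_server(server, id_map):
    """Rewrite a server description so it refers to Destination resources."""
    return {
        "name": server["name"],
        "flavor": id_map["flavor"][server["flavor"]],
        "image": id_map["image"][server["image"]],
        "networks": [id_map["network"][net] for net in server["networks"]],
    }


source_server = {"name": "web-1", "flavor": "A", "image": "B", "networks": ["C", "D"]}
destination_spec = translate_server(source_server, id_map)
```

A missing entry raises `KeyError` in this sketch, which is a reasonable way to surface a dependency that was never replicated.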
Identity resources include tenants, roles and users from Keystone. Additionally, they include user-role assignments and user credentials such as password hashes.
The main difficulty in terms of identity migration is that Pumphouse must preserve ownership of resources in the Destination cloud. To achieve that, it has to talk to APIs on behalf of users who own resources it is trying to replicate. Version 1.0 of Pumphouse achieves this feat by generating a password for the user in the Destination cloud when migration begins, performing all operations for that user under this login, and then setting the password to the original value from the Source cloud via the database.
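The temporary-password sequence can be sketched as follows. Everything here is illustrative: `FakeDestinationIdentity` is an in-memory stand-in for the Destination identity service and its database, and `migrate_as_user` is a hypothetical helper. Restoring the hash in a `finally` block is our own design choice for the sketch, not something the article states about Pumphouse.

```python
import secrets


class FakeDestinationIdentity:
    """In-memory stand-in for the Destination identity service and database."""

    def __init__(self):
        self.password_hashes = {}

    def set_password(self, user, password):
        # A real service would store a proper hash; this is only a marker.
        self.password_hashes[user] = "temp-hash:" + password

    def set_password_hash(self, user, password_hash):
        # Models writing the original hash directly to the database.
        self.password_hashes[user] = password_hash


def migrate_as_user(identity, user, source_hash, operations):
    """Run operations under a temporary password, then restore the original hash."""
    temp_password = secrets.token_urlsafe(16)
    identity.set_password(user, temp_password)
    try:
        return [operation(user, temp_password) for operation in operations]
    finally:
        # Assumption: restore the Source hash even if an operation fails.
        identity.set_password_hash(user, source_hash)


def create_security_group(user, password):
    """Hypothetical operation performed on behalf of the user."""
    return "security group created for " + user


identity = FakeDestinationIdentity()
outcomes = migrate_as_user(identity, "alice", "SOURCE-HASH", [create_security_group])
```

After the call, the user's stored credential is the original hash again, so the owner can log in to the Destination cloud with the same password as before.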
Server migration strategies
Pumphouse supports two options for migrating virtual servers in version 1.0: image- and snapshot-based migration. Although these strategies are similar, there are important differences.
With image-based migration, Pumphouse suspends the server chosen for migration in the Source cloud, translates its meta-data based on the information about the migration of dependency resources from the Taskflow store, and issues a servers.boot request to the Destination cloud’s Compute API with the modified meta-data. The server boots from the Destination Glance image that is a copy of the image from which the server was instantiated in the Source cloud.
This is a simplified representation of the flow that implements server migration. Bold arrows represent direct links between tasks in a linear flow. Dashed arrows are indirect dependencies that result when the output of one task (arrow source) serves as input to another task (arrow target). UUIDs of resources have been replaced with capital letters for readability. Most meta-resources were excluded from this diagram.
If the boot request succeeds and the server becomes ‘active’ in the Destination cloud within the expected time, Pumphouse terminates the original instance of the virtual server in the Source cloud. This migration is fairly quick, but it is only suitable for servers that run completely stateless applications, because no data from the server’s ephemeral disk in the Source cloud will be available to the server’s instance in the Destination cloud.
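The image-based sequence (suspend, boot, wait for ‘active’, terminate) can be sketched with a toy cloud object. `FakeCloud`, `wait_for_active` and `migrate_image_based` are hypothetical names, and the timeout values are arbitrary. What Pumphouse does when the server fails to become active is not covered here, so the sketch simply raises and leaves the Source server suspended, which is an assumption on our part.

```python
import time


class FakeCloud:
    """Toy in-memory cloud; it only tracks server records and their status."""

    def __init__(self):
        self.servers = {}

    def suspend(self, server_id):
        self.servers[server_id]["status"] = "suspended"

    def boot(self, server_id, spec):
        # The toy cloud makes servers active immediately.
        self.servers[server_id] = dict(spec, status="active")
        return server_id

    def status(self, server_id):
        return self.servers[server_id]["status"]

    def delete(self, server_id):
        del self.servers[server_id]


def wait_for_active(cloud, server_id, timeout=5.0, interval=0.1):
    """Poll until the server is active or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cloud.status(server_id) == "active":
            return True
        time.sleep(interval)
    return False


def migrate_image_based(source, destination, server_id, translated_spec):
    """Suspend in Source, boot in Destination, then terminate the original."""
    source.suspend(server_id)
    new_id = destination.boot("dst-" + server_id, translated_spec)
    if not wait_for_active(destination, new_id):
        # Assumption: on failure the Source server is kept (still suspended).
        raise RuntimeError("server did not become active in time")
    source.delete(server_id)
    return new_id


source_cloud, destination_cloud = FakeCloud(), FakeCloud()
source_cloud.servers["S"] = {"image": "B", "flavor": "A", "status": "active"}
migrated_id = migrate_image_based(
    source_cloud, destination_cloud, "S", {"image": "B2", "flavor": "A2"}
)
```

Note that nothing in this flow carries disk contents across: the new server starts from the copied base image, which is exactly why the strategy suits only stateless workloads.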
The snapshot-based migration scenario includes an additional step after the suspension of the server in the Source cloud. Pumphouse creates a snapshot of the server in the Glance store of the Source cloud and copies that snapshot to the Destination cloud as an ordinary image. It then boots the virtual server in the Destination cloud from that snapshot instead of the copy of the original image.
This graph depicts a flow that migrates the server by making a snapshot of it. Pumphouse copies the snapshot to the Destination cloud and uses it to boot the migrated instance of the server. Note that the graph was simplified for readability by excluding all meta-resources.
Snapshot migration allows you to preserve all data written to the virtual server’s ephemeral disk by applications and users. However, with snapshot-based migration, the server will be unavailable for the entire time needed to take the snapshot and transfer it between the Source and Destination Glance services, which, depending on the flavor of the instance, can be significant.
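To show where the ephemeral data travels in the snapshot-based scenario, here is a toy model that uses plain dicts for both clouds. The structure of these dicts, the `snap-` naming, and the function `migrate_snapshot_based` are all hypothetical. We also assume the original server is terminated at the end, as in the image-based flow.

```python
def migrate_snapshot_based(source, destination, server_id):
    """Suspend, snapshot, copy the snapshot, boot from it, terminate the original."""
    server = source["servers"][server_id]
    server["status"] = "suspended"

    # The snapshot captures the current contents of the ephemeral disk.
    snapshot_id = "snap-" + server_id
    source["images"][snapshot_id] = {"data": server["disk"]}

    # Copy the snapshot to the Destination cloud as an ordinary image.
    destination["images"][snapshot_id] = dict(source["images"][snapshot_id])

    # Boot the new server from the copied snapshot, not the original image.
    destination["servers"][server_id] = {
        "status": "active",
        "disk": destination["images"][snapshot_id]["data"],
    }

    # Assumption: the original is terminated once the new server is up.
    del source["servers"][server_id]
    return snapshot_id


source_cloud = {
    "servers": {"S": {"status": "active", "disk": "base image + user data"}},
    "images": {},
}
destination_cloud = {"servers": {}, "images": {}}
snapshot = migrate_snapshot_based(source_cloud, destination_cloud, "S")
```

The server stays suspended from the first step until the boot completes, so the snapshot and copy steps in the middle account for the extra downtime.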
How fast is it?
This diagram shows the results of the migration benchmark. We migrated 15 'medium' virtual servers, each with 2 vCPUs, 4 GB of RAM, and 40 GB of ephemeral storage. Every server was assigned a floating IP address, which was used for ICMP and HTTP probing. All 15 servers use the same image. The time to migrate the first server includes the time to cache the image on the Compute node. In this graph, time is noted in seconds.
The Taskflow parallel engine allows Pumphouse to run multiple migrations at one time. Our benchmarks suggest that it usually takes about 30 to 60 seconds to boot a medium or large flavor virtual server. The flavor itself does not affect the time to boot the server; image size has an effect only in cases when Compute needs to cache the image.
This part of the Pumphouse Deep Dive series described the concepts behind the migration of virtual resources between OpenStack clouds. The most difficult challenge we’ve faced with virtual resources is the complex structure of dependencies between them, which we solved by leveraging Taskflow and its explicit graph-oriented structure of tasks.
In the following posts of Deep Dive we’ll see how Pumphouse manages bare metal to upgrade hypervisor hosts and reassign them to the target cloud. We’ll also cover limitations of the current version in a separate article. Stay tuned for updates from Mirantis Labs!