As part of our efforts to make Ironic faster and more stable under high load, we have been working on finding the most efficient way to provide images when OpenStack spins up a new node. In this article, we’ll talk about how we improved bare metal deployment performance by using Torrents to stream images from storage rather than pulling them directly.
Optimizing a process
There is more than one way to optimize performance. The first is the classic iterative approach:
- Test performance and scalability
- Find top impacting bottleneck
- Fix it
- Rinse and repeat
Another approach is to analyze a process, find its component elements, and improve each one. If an element lies on the critical path for a given metric, that metric improves as a result.
Our team brainstormed areas where a targeted change could improve Ironic’s overall performance and scalability. The possible vectors for improvement included:
- Multicast for image dispensing
- Group deploy for nodes enrollment
- High-availability (HA) and redundancy deployment architecture for Ironic services (conductor, API)
- System image caching
- TFTP replacement with better performing servers (e.g. Apache Web server)
In this case, we started by testing Ironic’s ability to handle different levels of load, from 100 nodes to 4000 nodes. To our excitement, Ironic services (API, conductor) are quite good at keeping many nodes up and running. The original Ironic design deals mostly with node changes, such as enrollment, maintenance, cleaning, and so on. Once the nodes are up and running, there is no extra load, as designed. That’s not to say Ironic is completely unaffected; there is, of course, some work to keep the cluster up and running, such as syncing power states, sending statistics to Ceilometer, emitting notifications, and so on. However, we have tested this mode up to 4000 pseudo-baremetal nodes without visible performance degradation, so we decided to look elsewhere for an element to optimize.
We decided on using Torrents for image dispensing to determine whether we could improve performance.
Ironic node enrollment: big picture
Before we go into changes we made, let’s take a look at how bare metal nodes are currently enrolled. (We’re going to focus on the areas where Torrents are most useful; for a complete look at enrollment, look at the Ironic documentation.)
Before enrollment, the node must be registered in Ironic, providing metadata such as CPU, RAM, deploy driver, kernel, ramdisk, and so on. It must also be in the “available” state. We can then create a separate flavor for these nodes so they can be easily identified when spinning up new nodes.
Next, the user can spin up a new node using the specific flavor. In Nova, all Ironic nodes are represented as separate hypervisors in the Nova DB, though this may change in the near future as the new Nova scheduler is more smoothly integrated with Ironic. For now, though, the Nova scheduler chooses a hypervisor from among our bare metal nodes on which to place our new VM/BM node.
This process does include a number of weak points. For example, Nova doesn’t treat bare metal nodes any differently than nodes that run on hypervisors, which brings a lot of legacy limitations that are currently being worked on. Another is that Nova picks one of the available MAC addresses at random, registers a new Neutron port on it, associates it with the pre-created Ironic port, and then shares information about the VIF (virtual interface) ID.
As part of this process, instance_info must include a reference to the bootstrap and target system images. As you may have already guessed, the bootstrap image(s) is/are used to bootstrap the bare metal node for the first time. This image must contain an agent, such as IPA (Ironic Python Agent), FA (fuel-agent aka Bareon), or ADD (Ansible-Deploy Driver). The agent (or agentless deployment driver, in the case of ADD) is responsible for downloading the target system image and writing it to the node’s disk.
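The core job these agents perform can be sketched in a few lines. The following is a schematic illustration (not IPA’s actual code, and the function name is made up for this example): stream the target image onto the node’s disk and verify its integrity.

```python
# Schematic sketch of a deploy agent's core job (illustrative, not IPA's
# actual implementation): stream the target image to the node's disk and
# verify its checksum along the way.
import hashlib

def write_image(source, target_device, expected_md5=None):
    """Copy an image file to a block device in chunks, returning its MD5."""
    md5 = hashlib.md5()
    with open(source, "rb") as src, open(target_device, "wb") as dst:
        while True:
            chunk = src.read(1024 * 1024)  # stream in 1 MiB chunks
            if not chunk:
                break
            md5.update(chunk)
            dst.write(chunk)
    digest = md5.hexdigest()
    if expected_md5 is not None and digest != expected_md5:
        raise ValueError("image checksum mismatch")
    return digest
```

The interesting question, and the subject of this article, is where that source image comes from and how fast it can be delivered.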
This is where the Torrent architecture comes in.
Not all situations will benefit from this new approach, of course. We have found that the ideal situation for obtaining the most gain from using torrents includes:
- Large image sizes (target system image of 1-5 GB)
- Multiple nodes that enroll at approximately the same time
The torrent protocol carries a bit more overhead than straight HTTP, so there is a theoretical cost to retrieving the target image this way; in our tests, however, starting from as few as 3 simultaneously booting nodes we saw an improvement in download speed and network bandwidth usage.
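To see why concurrency matters, here is a back-of-envelope model. The image size and uplink speed below are illustrative assumptions, not our measured results: a central store behind a single uplink must push out one full copy per node, so its minimum serving time grows linearly with the node count, while a torrent swarm gains upload capacity with every participating peer.

```python
# Back-of-envelope model of serving images from a central store; the image
# size and uplink speed are assumed values for illustration only.
IMAGE_GB = 3.0   # assumed target image size (treating GiB ~ GB for simplicity)
LINK_GBPS = 1.0  # assumed storage-node uplink

def central_store_floor(nodes):
    """Lower bound in seconds to push one copy per node through one uplink."""
    total_gbit = nodes * IMAGE_GB * 8
    return total_gbit / LINK_GBPS

print(central_store_floor(1))   # 24.0 -> one node needs at least ~24 s
print(central_store_floor(15))  # 360.0 -> the floor grows linearly with N
```

With a swarm, each booting node contributes its own upload bandwidth, so the aggregate serving capacity scales with the number of peers instead of staying pinned to the storage node’s uplink.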
Implementing Torrents in Ironic
Once we’d decided that this was the improvement we wanted to make, we needed to decide how we were going to implement this change. We had two different options.
The first step was to create a new service to handle torrent provisioning, but where should we put it?
One option was to add the functionality to the Ironic Conductor. The advantage here was that the operator didn’t have to do any additional work; the Conductor would handle seeding. However, since we wanted the Conductor to be doing other things (namely, booting the new nodes) we opted for the second option: embedding torrent functionality directly on the storage nodes, right on top of the SWIFT API.
As you can see in this sequence diagram, the image is downloaded directly from the object store by setting the image’s temporary URL as the webseed for the torrent.
This approach moves the load to the storage nodes instead of the conductor, but requires the generation and uploading of the torrent file into glance. Let’s look at how to do that.
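For reference, a Swift temporary URL is simply an HMAC-SHA1 signature over the HTTP method, expiry time, and object path, as defined by Swift’s tempurl middleware. Here is a sketch of the derivation; the key, host, and object path are placeholder values, not anything from a real deployment:

```python
# Deriving a Swift temporary URL (the scheme used by Swift's tempurl
# middleware); the key, host, and object path are placeholders.
import hmac
import time
from hashlib import sha1

def make_temp_url(key, method, path, ttl):
    """Sign an object path with the account's temp-url key."""
    expires = int(time.time()) + ttl
    body = "%s\n%d\n%s" % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return "%s?temp_url_sig=%s&temp_url_expires=%d" % (path, sig, expires)

# A URL signed like this is what gets embedded as the torrent's webseed.
webseed = ("https://storage.example.com"
           + make_temp_url("SECRET_TEMPURL_KEY", "GET",
                           "/v1/AUTH_demo/glance/fedora_raw", ttl=3600))
```

Because the signature expires, the webseed only works for the duration of the deployment window, which is exactly the behavior we want.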
Installation and setup of Torrent-based Ironic images
The general process for using Torrents with Ironic consists of three steps:
- Update OpenStack with the appropriate drivers and patches.
- Create the torrent file and associate it with an image
- Boot a new node with that image
Let’s look at these steps in more detail.
Prerequisites, dependencies and assumptions
This process assumes that you meet two requirements:
- You have an OpenStack installation (Mitaka release or later) with ironic and Swift-compatible object storage (such as Swift itself or RadosGW) used as the glance backend, and
- You’ve set up the desired number of torrent trackers.
The current proof-of-concept implementation uses IPA, but you can easily adapt it to other agents. For torrent provisioning to work on the IPA side, you need to regenerate the deployment images:
$ cd ironic-python-agent
$ git fetch https://git.openstack.org/openstack/ironic-python-agent refs/changes/20/404120/2 && git checkout FETCH_HEAD
$ make
$ glance image-create --name deploy_kernel --disk-format aki --container-format aki --file tinyipa.vmlinuz --visibility public
$ glance image-create --name deploy_ramdisk --disk-format ari --container-format ari --file tinyipa.gz --visibility public
The following patch needs to be applied to ironic itself to enable using torrents with IPA:
$ cd ironic
$ git fetch https://git.openstack.org/openstack/ironic refs/changes/11/409711/2 && git checkout FETCH_HEAD
# pip install -e .
Now let’s configure the system. Follow these steps:
- Update the existing nodes or create the new ones with the new deployment images:
$ ironic node-update <node> \
    add driver_info/deploy_kernel=<glance image UUID of the deploy kernel> \
    driver_info/deploy_ramdisk=<glance image UUID of the deploy ramdisk>
- Add the Torrent tracker URLs to the ironic conductor config file in the [deploy]torrent_trackers option.
- Restart the conductor.
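As an illustration of step 2, the conductor config would look something like the fragment below; the tracker URLs are placeholders, and the exact value format may differ in the proof-of-concept patch:

```ini
# /etc/ironic/ironic.conf -- tracker URLs below are placeholder values
[deploy]
torrent_trackers = http://tracker1.example.com:6969/announce,http://tracker2.example.com:6969/announce
```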
At this point the nodes will be able to download images from object storage via torrents.
Create the Torrent-based image
Next you’ll need to create a torrent file and associate it with an image. Follow these steps:
- Create the torrent file for the image:
# apt-get install -y mktorrent
$ wget https://download.fedoraproject.org/pub/fedora/linux/releases/24/CloudImages/x86_64/images/Fedora-Cloud-Base-24-1.2.x86_64.raw.xz
$ xz -d Fedora-Cloud-Base-24-1.2.x86_64.raw.xz
$ mktorrent -a none Fedora-Cloud-Base-24-1.2.x86_64.raw
- Upload the torrent file into glance. (In the future, we will consider using glare, or generating the torrent file on-the-fly in glance.)
$ glance image-create --name fedora_raw.torrent --disk-format raw --container-format bare --file Fedora-Cloud-Base-24-1.2.x86_64.raw.torrent --visibility public
- Upload the image into glance, specifying the torrent file glance ID in the “torrent_file” property:
$ glance image-create --name fedora_raw --disk-format raw --container-format bare --file Fedora-Cloud-Base-24-1.2.x86_64.raw --visibility public --progress --property cpu_arch='x86_64' --property hypervisor_type='baremetal' --property torrent_file=6cee4d5c-ceaf-4ff0-93a2-9e9505380a67
Boot the new node using the torrent-based image
Now you need to boot the new node using the image you just created. There’s no magic here; it’s a simple nova boot command:
nova boot --image IMAGE_NAME --flavor FLAVOR_NAME INSTANCE_NAME

For example:

nova boot --image fedora_raw --flavor bm.medium MyNewNode
If the node the instance is scheduled to is using the torrent-capable IPA image, it will start the torrent download with the aria2c library, using the list of trackers specified in the ironic conductor config file. The swift temporary URL of the instance image will be set as the webseed in the torrent file. That way, the node will be able to use both peers (nodes downloading the same image) and object storage itself for image download, and of course share the downloaded data with other peers.
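As a sketch of what that invocation might look like, the snippet below assembles an aria2c command line. The flag names are aria2’s real CLI options, but the wiring shown here, including the function name, tracker URLs, and paths, is illustrative rather than the actual IPA code:

```python
# Assemble an aria2c argument list for a tracker-assisted torrent download.
# Flag names are aria2's CLI options; tracker URLs and paths are placeholders.
def build_aria2c_cmd(torrent_path, trackers, out_dir):
    """Build the command-line argument list for the torrent download."""
    return [
        "aria2c",
        "--bt-tracker=" + ",".join(trackers),  # trackers from conductor config
        "--seed-time=5",    # keep seeding a few minutes so later peers benefit
        "--dir=" + out_dir,
        torrent_path,
    ]

cmd = build_aria2c_cmd(
    "/tmp/fedora_raw.torrent",
    ["http://tracker1.example.com:6969/announce"],
    "/tmp/images")
```

The webseed URL itself does not appear on the command line because aria2 picks it up from the torrent file’s metadata.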
To determine the effectiveness of this change, we did a number of tests, measuring mean time to provision servers, ratio of incoming to outgoing traffic, CPU and RAM load on the various servers, and so on.
The two main scenarios were:
- Booting 15 nodes with a 3 GiB image via standard IPA provisioning available in ironic;
- Booting 90 nodes with a 3 GiB image via IPA provisioning with torrents, using the object store as the webseed.
If there’s interest in the full results I’ll be happy to create a second post, but in essence, we discovered the following:
- PXE-booting: PXE-booting the bare metal nodes over TFTP created sharp peaks of CPU and network usage on the Ironic conductor nodes at the beginning of deployment.
- CPU: Using Torrent-based provisioning moves some of the load from storage nodes to ironic nodes booted with the same image.
- Network: The network-related results were particularly interesting. With standard image provisioning from a centralized image store over HTTP (Ceph-backed RadosGW in our case), we actually hit the network load limit of a storage node, which had a 1 Gbps connection, once 15 bare metal nodes booted simultaneously. This forces the deploy timeout up to very large values; to boot 15 nodes simultaneously in this case, it had to be set to 1 hour (which roughly corresponds to the total time for all nodes to become active).
When using torrents with webseed, the load on the storage node does not persist throughout booting of all nodes. Instead, the period of high load is short (around 5 minutes) and all 90 bare metal nodes are able to boot in approximately 15 minutes. Most of the load shifts to the nodes seeding the image themselves.
Thus we can definitely conclude that Torrent-based image distribution for concurrent provisioning of bare metal nodes allows us to offload the storage cluster and fully utilize the available network bandwidth. Torrent usage improves both individual node boot time during concurrent deployment and the total deployment time.
Where we go from here
Given these results, we plan to continue investigating torrent-based image provisioning and how we can incorporate it into the current OpenStack state of the art. We will look into offloading the torrent tracker role to a dedicated proxy/cache service, perhaps making it part of Glance or object storage, or seeding directly from the RadosGW/Ceph block device. We’d also like to know whether there’s a benefit for Nova to use torrents to do traditional distribution of images to VMs on compute nodes.