OpenStack Nova: basic disaster recovery

Alexander Sakhnov - June 6, 2011

Today, I want to take a look at some issues you may encounter while using OpenStack. The purpose of this post is to share our experience dealing with the hardware and software failures that anyone who runs OpenStack in production will eventually face.

Software issue
Let’s look at the simplest, but possibly the most frequent, issue. Suppose we need to upgrade the kernel or other software, and the upgrade requires a host reboot on one of the compute nodes. The best decision in this case is to migrate all virtual machines running on this server to other compute nodes. Unfortunately, sometimes that is impossible for several reasons, such as a lack of shared storage to perform the migration or insufficient CPU/memory resources to accommodate all of the VMs. The only remaining option is to shut down the virtual machines for the maintenance period. But how should they be started correctly after the reboot? Of course, you can set a special flag in nova.conf so that instances start automatically when the host system boots:

--start_guests_on_host_boot=true

However, you may want to disable it (in fact, setting this flag is a bad idea if you use the nova-volume service).

There are many ways to start virtual machines. Probably the simplest one is to run:

nova reboot <instance_id>

It will recreate and start the libvirt domain using the instance XML. This method works well if you don’t have a remotely attached volume; otherwise, nova reboot will fail with an error. In this case, you’ll need to start the domain manually using the virsh tool: connect the iSCSI device, create an XML file for it, and attach it to the instance, which is a nightmare if you have lots of instances with volumes.
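
The snippet below is a rough, hypothetical sketch of those manual steps, using the libvirt Python bindings and iscsiadm rather than virsh directly; the target IQN, portal address, and file paths are placeholders that you would take from your own instance directory and the nova database.

# A rough sketch of the manual recovery described above. Every value here
# (IQN, portal, paths) is a placeholder -- substitute the real ones from
# the instance directory and the nova database.
import subprocess
import libvirt

ISCSI_PORTAL = "192.168.0.10:3260"                        # placeholder volume server
ISCSI_IQN = "iqn.2010-10.org.openstack:volume-00000001"   # placeholder target

# 1. Log in to the iSCSI target so the block device appears on the host.
subprocess.check_call(
    ["iscsiadm", "-m", "node", "-T", ISCSI_IQN, "-p", ISCSI_PORTAL, "--login"])

# 2. Define and start the domain from the saved instance XML.
conn = libvirt.open("qemu:///system")
with open("/var/lib/nova/instances/instance-00000001/libvirt.xml") as f:
    domain = conn.defineXML(f.read())
domain.create()

# 3. Attach the volume to the running guest with a minimal disk definition.
disk_xml = """
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/disk/by-path/ip-%s-iscsi-%s-lun-1'/>
  <target dev='vdb' bus='virtio'/>
</disk>
""" % (ISCSI_PORTAL, ISCSI_IQN)
domain.attachDevice(disk_xml)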

Hardware issue
Imagine another situation. Assume the server running a compute node experiences a hardware issue that we can’t fix in a short time. The bad thing is that this often happens unpredictably, without any chance to transfer virtual machines to a safe place. Still, if you have shared storage, you won’t lose the instances' data; however, the way to recover them may be pretty vague. Going into technical details, the procedure can be described by the following steps (a rough sketch follows the list):

  • update the host information in the DB for the recovered instance
  • spawn the instance on the compute node
  • search the database for any attached volumes
  • look up the volume device path and connect to it over iSCSI (or another driver, if necessary)
  • attach it to the guest system

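As an illustration of the first and third steps, here is a rough sketch that manipulates the nova database directly with MySQLdb; the credentials, host names, and IDs are placeholders, and the table and column names (instances.host, volumes.instance_id, volumes.mountpoint) should be checked against the schema of your Nova release before running anything like this. Spawning the instance and attaching the volume are then done much as in the earlier libvirt sketch.

# Hypothetical sketch: re-point a recovered instance at a new compute host
# and list the volumes that were attached to it, by querying the nova
# database directly. Credentials, host names, and IDs are placeholders.
import MySQLdb

db = MySQLdb.connect(host="nova-db", user="nova", passwd="secret", db="nova")
cur = db.cursor()

instance_id = 42          # internal DB id of the instance being recovered
new_host = "compute-02"   # compute node that should own the instance now

# Step 1: update the host column so the new compute node owns the instance.
cur.execute("UPDATE instances SET host = %s WHERE id = %s",
            (new_host, instance_id))
db.commit()

# Step 3: find any volumes that were attached to this instance.
cur.execute("SELECT id, mountpoint FROM volumes WHERE instance_id = %s",
            (instance_id,))
for volume_id, mountpoint in cur.fetchall():
    print("volume %s was attached at %s" % (volume_id, mountpoint))
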
Solution
For this and the previous situation, we developed a Python script that starts a virtual machine on the host where the script is executed. You can find it in our git repository: openstack-utils. All you need to do is copy the script to the compute node where you want to recover the virtual machine and execute:

./nova-compute -i <instance_id>

You can look up the instance_id using the nova list command. The only limitation is that the virtual machine's data must be available on that host system.
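
For example, the whole recovery boils down to two commands (with <instance_id> standing in for the ID reported by nova list):

nova list
./nova-compute -i <instance_id>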

Of course, in everyday OpenStack usage you will run into plenty of problems that this script cannot solve. For example, you may have a storage configuration that mirrors data between two compute nodes, and you need to recover the virtual machine on a third node that doesn’t have it on its local hard drives. More complex issues require more sophisticated solutions, and we are working to cover most of them.
