Orchestrating containers for production?
Creating Docker containers for use with Docker Compose with host networking is easy for development and testing scenarios. However if you want to run it in production, there you want scaling, balancing, orchestration, rolling updates, etc. Today we consider Kubernetes as most suitable platform, which provides as flexibility to run containers application in production.
During prototyping we had to solve two things:
- Kubernetes architecture – how to deploy and operate Kubernetes cluster
- Service decomposition – how to separate openstack services into containers and what Kubernetes resources use for operations.
The Kubernetes code base is moving very quickly. Every minor version contains fixes, features and sometimes deprecates existing functionality. We are currently working on the first robust packaging pipeline for Kubernetes and developed a new salt formula for deployment and manifests orchestration.
Network is usually most difficult part for every cloud solution. OpenStack Neutron plugins are most controversial and crucial part of every discussion with a potential user. Kubernetes provides several plugins like Weave, Flannel, Calico, OpenContrail, OpenVSwitch, L2 bridging, etc.
We have comprehensive experience with OpenContrail as an overlay NFV & SDN most suitable for OpenStack Neutron. It is de facto standard for our enterprise customers. Given that we hoped that we could use the same plugin for OpenStack as Kubernetes, even using same instance of OpenContrail. However after testing and deeper discussion we realized that overlay is not suitable for underlay environment, which must be reliable and very simple. Running the same instance of OpenContrail for OpenStack Controllers as well as workload increase single point of failure. When we deployed extra OpenContrail cluster just for Kubernetes, it means high available cassandra, zookeeper, kafka, contrail services.
After some testing we found that Calico provides features what we need for underlay infrastructure – no overlay, simple, full L3, BGP support with ToR and no manual configuration of each node Docker0 bridge like Flannel. Another reason is that we do not require any mutli-tenancy or security on network level. OpenStack control-plane services can be visible between each other. Calico does not have any complexity in database clusters. It uses just another ETCD instance. Standard overlay SDNs use different clusters like cassandra, zookeeper or kafka.
In the end we get OpenContrail on top of Calico, but it provides exactly what we require from production deployment.
- Underlay – Kubernetes + Calico
- Overlay – OpenStack + OpenContrail
Second decision point in building Kubernetes cluster. This point is tightly connected with next section of OpenStack service decomposition and specification, because we have to define storage share and persistency requirements.
We use three types of Kubernetes volumes
- emptyDir – this type lives together with POD and container. It is not persistent. It is suitable for temporary space like logs, tmp or signing dir.
- hostPath – volume mounts a file or directory from the host node’s filesystem into your pod. For instance this is important for nova-compute /var/lib/nova/instances and libvirt cgroups, etc.
- Glusterfs – gluster has been already used in standard deployment for keystone fernet tokens or small glance image repositories. It used share storage from controllers space. These 2 volumes are mounted into Keystone and Glance pods.
Following diagram shows logical architecture for underlay Kubernetes cluster running on bare metal servers with Ubuntu 14.04 OS. On top of this Kubernetes cluster we run OpenStack including libvirt and compute. All these services are installed directly to bare metal server by salt-formula-kubernetes.
SaltMaster contains metadata definition, packages and formulas for deployment both underlay and overlay.
- Kube Master – controller for Kubernetes and Calico. To simplify picture we have just one instance. For production there must be at least 3 nodes with clustered ETCD. These nodes can be virtual machines as well.
- Kube Node Controller 1 – 3 – standard kube nodes with Calico node dedicated for OpenStack control services. GlusterFS is deployed on local disks with 2 volumes for glance images and keystone fernet tokens.
- Kube Node Compute X – standard kube nodes with Calico used for KVM hypervisors. This node contains OpenContrail vRouter with compiled kernel module. vRouter can be run in container too, but it is not suitable for production yet. This node will host nova-compute and libvirt containers, whose will start OpenStack Virtual Machines.
OpenStack Service decomposition and specification
When Kubernetes underlay architecture is done, next step is decomposition of OpenStack services. Transformation from service oriented approach to Kubernetes platform is shown on following diagram. HAProxy with Keepalived (VIP address on VRRP) is replaced by built-in Kubernetes service load balancing. Virtual Machines are replaced by pods. Load balancing is implemented by a native service (iptables). This provides simplification of architecture and decrease potential components with errors. However this cannot be used advanced balancing methods important for Mysql galera.
Sample OpenStack Nova service
This create container-base = pod = service =
For build we need to determine between two parameters
- Dynamic parameters – dynamic values depend on environment, replaced during container launching by entrypoint.sh (keystone_host, db_host, rabbit_host). Values comes from Kubernetes env or Docker Compose env values from manifest and etcd.
- Static parameters – pillar data pushed into container during its build. Pillar data in “salt” terminology are specific metadata. Hard coded in each container version inside of on-premise docker registry (cpu_overallocation_ratio, ram_allocation_ratio, token_engine, etc.).
Pillar definition snippet for this container build in reclass. Bold parameters represents dynamic parameters and rest values are static parameters.
nova: controller: version: kilo enabled: true networking: contrail cpu_allocation_ratio: 8.0 ram_allocation_ratio: 1.0 disk_allocation_ratio: 1.0 bind: public_address: 0.0.0.0 public_name: openstack.domain.com novncproxy_port: 6080 database: engine: mysql host: “$SERVICE_DATABASE_HOST” port: 3306 name: nova user: nova password: “$SERVICE_DATABASE_PASSWORD” identity: engine: keystone host: “$SERVICE_IDENTITY_HOST” port: 35357 user: nova password: “$SERVICE_IDENTITY_PASSWORD” tenant: service …
Entrypoint.sh will replace dynamic parameters at configuration files by values provided by Kubernetes shared env variables.
e.g. $SERVICE_DATABASE_HOST -> 192.168.50.50
Kubernetes manifests will define deployment resource and service endpoint. This is generated by salt-reclass too and explain in next section about a single source of truth.
Kubernetes Deployment <service_name>-<role_name>-deployment.yml
apiVersion: extensions/v1beta1 kind: Deployment metadata: name: nova-controller namespace: default labels: app: nova-controller spec: replicas: 5 template: metadata: labels: app: nova-controller spec: containers: - name: nova-api image: tcpcloud/nova-controller:kilo imagePullPolicy: IfNotPresent securityContext: privileged: True ports: - containerPort: 8773 name: ec2-api - containerPort: 8774 name: os-api - containerPort: 8775 name: metadata command: - /entrypoint.sh - api - name: nova-cert image: tcpcloud/nova-controller:liberty imagePullPolicy: IfNotPresent command: - /entrypoint.sh - cert - name: nova-conductor image: tcpcloud/nova-controller:liberty imagePullPolicy: IfNotPresent command: - /entrypoint.sh - conductor ….
Kubernetes Service <service_name>-<role_name>-svc.yml
apiVersion: v1 kind: Service metadata: labels: name: nova-controller app: nova-controller name: nova-controller spec: ports: - port: 8773 name: ec2-api - port: 8774 name: os-api - port: 8775 name: metadata type: LoadBalancer selector: app: nova-controller
OpenStack on Kubernetes
Following figure shows OpenStack compute and controller node detailed architecture for a deeper understanding of how it works.
Kube node as OpenStack Controller runs kubelet, kube-proxy, Calico node, Docker as underlay services for management. On top of that kubernetes starts Docker containers defined by Kube deployment. OpenStack runs exactly the same like any other application. Rabbitmq runs as a pod with single container, but nova-controller pod contains 6 docker containers per nova service. Nova-controller docker image is same for these 6 containers, but starts different nova services on foreground.
Kube Node as OpenStack compute is little bit more difficult to explain. It runs same components for underlay as controller except opencontrail vrouter. Vrouter can be delivered by container as well, but it uses own kernel module, which must be compiled with target kernel. Therefore we decided to install vrouter in host OS.
Kubernetes orchestrates each compute pod with libvirt and nova-compute containers in privileged mode. These containers manage Virtual Machine workloads launched by openstack. It is interesting that we are using 2 SDN solution in single host OS. The main difference is using tcp connection to libvirt instead of unix socket.
Single Source of Truth
Infrastructure as a Code is an approach, which reuse git workflow for doing changes in infrastructure. We treat infrastructure as a micro-services. Kubernetes as underlay together with Kubernetes manifest for OpenStack overlay must be contained in single repository as a “source of truth”. As we already mentioned salt-formula-kubernetes provides not only deployment itself, but Kubernetes manifests as well. This repository also contains definition for docker image build process.
The following diagram shows single git repository generated per deployment, which contains metadata for all components including bare metal infrastructure provisioning, kubernetes and calico control services, docker image building metadata and kubernetes manifests for openstack control services. It is set of yaml structured files with classes and roles.
The following figure shows lifecycle workflow of configuration change in OpenStack. This demonstrate scenario for upgrading from kilo to liberty.
- Administrator clone git repository with reclass metadata model. Then change version from kilo to liberty and push back.
- Post hook action notifies CI system about this change, which automatically starts jobs.
- It builds and upload a new version of nova-controller docker image into private registry.
- Meantime another job updates metadata model directly on salt master server.
- Next step depends on manual action of administrator. He has to call deploy action from salt master. It updates kubernetes manifests on master.
- Then apply a new version of manifest, which trigger deployment rolling update build in kubernetes.
- During rolling update kubelet automatically downloads new images from registry.
- Final report is sent back to salt master for administrator verification.
Now let's show this concept looks in live. Next section contains several videos with live demo from environment.
Upgrade OpenStack in 2 minutes?
We created following simple deployment for showcase scenario. We deployed this in OpenStack Days Budapest and Prague. We used 6 baremetal servers for this showcase:
- Salt Master – single source of truth and orchestration
- Kubernetes master – single kube controller for cluster with manifests.
- Kube node as OpenStack Controller – 3 nodes for running control services.
- Kube node as OpenStack Compute – 2 nodes for running nova-compute, libvirt and opencontrail vrouter.
In this video from OpenStack Day Prague we explain journey behind this solution and live show of complete OpenStack Kilo deployment, scaling and then upgrading to Liberty.
In next video we showed OpenStack Kilo deployment on Kubernetes cluster in 3 minutes. Then how easily can be scaled on 3 instances for High Availability. Last part contains booting of 2 instances and live upgrade of whole OpenStack without any packet loss between VMs. Everything can be done in 10 minutes, which shows how efficient is running openstack in containers.
Last video is from deep dive session Kubernetes on OpenStack, where we show the architecture and technical design.
We proved how easily can be reused everything what was built and developed in last 3 years to provide a new enterprise deployment workflow. There still must be done testing of use cases. However this is really suitable for CI/CD OpenStack pipelines. If you are more interested please join regular IRC openstack-salt meetings or send us feedback.