This guest post by Shawn Bower at Cornell University was originally published on the Docker blog.
In my role as Cloud Architect I often hear, “Docker sounds great but it won’t work for my application.” In my experience Docker can improve the state of many applications, including legacy and vendor solutions. The first production workload on Docker at Cornell, in April 2015, was the University’s wiki, which runs on Atlassian’s Confluence.
Our installation of Confluence is an interesting intersection of legacy and vendor solution. We have customized the code to work with our single sign-on solution and added a custom synchronization with LDAP for group management. When we started the project to move Confluence to the cloud, the infrastructure software was old, compiled from source, and maintained by hand.
This presented us with a number of challenges, which are listed further below.
The stack looked like this:
- Apache 2.2.10
- OpenSSL 0.9.8H
- Java 1.6 (EOL 2/13)
- Confluence 5.6.5
As we looked at the state of the application, we knew we wanted to make sure the environment was supportable going forward. We decided to move towards an approach that applied the principles of infrastructure as code. We wanted to have a repeatable process for building out Confluence environments and we wanted to be able to track changes that were made. Using Docker helped to supercharge this effort.
We were able to leverage Dockerfiles to create a reproducible infrastructure; coupled with Puppet, they let us create instance-specific images. At Cornell we have implemented a series of base images that we build on. In the case of Confluence, we build on a Tomcat image, which is built on a Java image, which is based on a Cornell-specific Ubuntu image. These base images are rebuilt daily with the latest patches. Every time we build and deploy Confluence we have a fully patched system. No longer are we running on an end-of-life version of Java!
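To make the layering concrete, here is a minimal sketch of how a patch to one base image ripples down the chain. It is only an illustration: the image names are hypothetical stand-ins for the Ubuntu, Java, Tomcat and Confluence images described above, and the helper functions are not part of Docker or of our actual build tooling.

```python
# Hypothetical image names; the parent chain mirrors the layering described above.
PARENT = {
    "cornell/ubuntu": None,               # root base image
    "cornell/java": "cornell/ubuntu",
    "cornell/tomcat": "cornell/java",
    "cornell/confluence": "cornell/tomcat",
}


def depth(image, parent=PARENT):
    """Number of layers between an image and the root base image."""
    steps = 0
    while parent[image] is not None:
        image = parent[image]
        steps += 1
    return steps


def depends_on(image, base, parent=PARENT):
    """True if `image` is `base` or is built, directly or indirectly, on it."""
    while image is not None:
        if image == base:
            return True
        image = parent[image]
    return False


def rebuild_order(changed, parent=PARENT):
    """Images that need a rebuild after `changed` is patched, parents first."""
    affected = [img for img in parent if depends_on(img, changed, parent)]
    return sorted(affected, key=lambda img: depth(img, parent))


if __name__ == "__main__":
    # A patched Java image means Tomcat and Confluence are rebuilt on top of it.
    print(rebuild_order("cornell/java"))
```

Rebuilding from the changed image downward is what lets a daily patch to a base image reach the application image without anyone touching a server by hand.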
These builds are no longer done by hand: we use Jenkins to automatically build and deploy our containers to our dev/test environments and to push a tagged copy of these images to our private Docker Trusted Registry. If we need to roll back, we can easily grab the last known working image tag. Once we have tested sufficiently in these environments, we can automatically deploy to production.
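The rollback idea can be sketched in a few lines. This is an assumption-laden illustration, not our Jenkins job: the `Build` record, the tag format and the `rollback_target` helper are all hypothetical, and in practice the history would come from the registry and the test results rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class Build:
    tag: str       # image tag pushed to the registry (hypothetical format)
    healthy: bool  # whether this build passed testing in dev/test


def rollback_target(history):
    """Return the most recent healthy tag before the newest build.

    `history` is ordered oldest to newest. Returns None when no earlier
    build is known to work.
    """
    for build in reversed(history[:-1]):
        if build.healthy:
            return build.tag
    return None


if __name__ == "__main__":
    history = [
        Build("confluence:build-41", True),
        Build("confluence:build-42", True),
        Build("confluence:build-43", False),
        Build("confluence:build-44", False),  # newest build, just deployed
    ]
    print(rollback_target(history))
```

Because every build is pushed with its own tag, rolling back is a matter of redeploying an image that already exists rather than rebuilding anything.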
The project to Dockerize Confluence and move it to the cloud took two months and was highly successful. As I write this article, I checked on Confluence and it has been running for 3 months and 2 days. The only reason it hasn’t been up longer is that since we Dockerized we have been doing quarterly upgrades. This is amazing! In the past, upgrade projects were months long, but now we do them in a couple of weeks, four times a year. For the first time at Cornell we have been able to remain on a current, patched Confluence release. We used to automatically restart Confluence every Sunday to address performance issues. In addition to the automatic restarts, we restarted Confluence multiple times a week to address intermittent errors. Docker has helped decrease the time spent firefighting issues with the environment and has enabled us to eliminate these restarts entirely.
After Dockerizing and moving Confluence to the cloud, we have been able to drastically improve both high availability (HA) and disaster recovery (DR). The on-premises deployment of Confluence used a single VM in production with a single database backend, also running on a single VM. This was partly because Confluence does not allow more than one instance to run unless you pay extra for the Confluence Data Center product. In the cloud we use an auto scaling group, which maintains one healthy server running at all times. We use a multi-AZ database, which allows us to stay up even when a single availability zone fails. For all our backups we snapshot our volumes and then migrate them to a separate region within 30 minutes, so we can have Confluence up and running in another region within 30 minutes. On-premises DR relied on tape backup and would take hours or days to complete. After all is said and done, we have dramatically increased the resilience and durability of Confluence, and it costs $2,100 less annually to run.
The challenges that the original environment presented included:
- The version of Java had stopped receiving public updates in February of 2013.
- Multiple vulnerabilities had been reported in the OpenSSL and Apache versions we were using.
- The last upgrade project for Confluence took six months.
- We had multiple environments for development, testing and production, and over time these servers had fallen out of sync.
- The engineers who had originally set up these servers and made the customizations had left the University. When we looked at how we might add high availability or weather a disaster, we found it too difficult to replicate the environment.
What is the bottom line?
We are talking about a wiki, after all; this is not a sexy application, and this is not a huge decrease in cost. At Cornell, as at most research universities, we are not in the business of running wikis. Confluence is an administrative application that is important but doesn’t directly enhance Cornell’s mission. In the six months before this project we spent 1,770 staff hours supporting Confluence; in the six months after, we spent 178 hours. This is a dramatic improvement (a 10x reduction). We have spent the time saved working with researchers and teaching them about Docker. We have been able to create solutions running on Swarm that process massive amounts of data. The bottom line is that the less time we spend supporting legacy and vendor applications, the more time we can spend helping Cornell researchers change the world.
I urge you to consider Docker when looking at your legacy and vendor applications. You will be surprised by the efficiencies you find. Also, do yourself a favor and look at Docker Enterprise. We have benefited greatly from Docker’s commercial support and the relationships we have made within Docker the company.
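For readers who want to check the support-hours arithmetic, here is a small worked sketch. The two input figures are the ones quoted above; the `support_savings` helper is just a hypothetical name for the calculation.

```python
def support_savings(hours_before, hours_after):
    """Summarize the change in support effort between two equal periods."""
    saved = hours_before - hours_after
    return {
        "hours_saved": saved,
        "reduction_factor": hours_before / hours_after,
        "percent_reduction": 100 * saved / hours_before,
    }


if __name__ == "__main__":
    # Staff hours in the six months before and after the project.
    result = support_savings(1770, 178)
    print(f"hours saved:       {result['hours_saved']}")
    print(f"reduction factor:  {result['reduction_factor']:.1f}x")
    print(f"percent reduction: {result['percent_reduction']:.1f}%")
```

The factor works out to a little under 10, which is where the rounded 10x figure comes from.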