On January 14, 2014, Dmitry Borodaenko presented on Ceph and OpenStack at the inaugural Silicon Valley Ceph User Group. Here’s an excerpt of the presentation transcript; you can find the full transcript on OpenStack:Now.
Looking for more about Ceph? Join the “How to Stop Worrying and Learn to…” webcast on January 22, 2014, at 10 am Pacific.
To understand how Ceph works as part of Mirantis OpenStack, we need to take the 20,000-foot view first. You need to know what Ceph is, what OpenStack is, and what you can do with them. Then we’ll get into the details that actually make this combination work. So, first we’ll explain how Ceph came about and what it turned out to be.
What is Ceph?
Defining Ceph is less trivial than you’d think. When I first heard of it, it was called a file system, but the file system part actually isn’t quite there yet; on the other hand, Ceph is quite a bit more than that. Ceph is a free clustered storage platform that provides unified object, block, and file storage.
Somewhere along the way it picked up an object storage backend, which turned out to be more important than its ability to store files. On top of that, Ceph has added a block storage layer, which also uses objects as a backend to provide RBD block devices, and that is the part most interesting for OpenStack.
To reiterate, Ceph consists of:
Object storage. RADOS objects support snapshotting, replication, and consistency.
Block storage. RBD block devices are thinly provisioned over RADOS objects and can be accessed by QEMU via the librbd library, as seen in the following figure.
Figure 1 Ceph block storage
File storage. CephFS metadata servers (MDS) provide a POSIX-compliant overlay over RADOS.
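To make the "thinly provisioned over RADOS objects" idea concrete, here is a toy Python sketch (invented names, not the Ceph API) of a block image striped over fixed-size objects, where only the objects actually written to are allocated:

```python
# Toy model of thin provisioning: an RBD-style image backed by a sparse
# map of object-sized chunks. Names here are illustrative, not Ceph code.

OBJECT_SIZE = 4 * 1024 * 1024  # RBD stripes images over 4 MB objects by default


class ToyRbdImage:
    """A block device backed by a sparse map of object-sized chunks."""

    def __init__(self, size_bytes):
        self.size = size_bytes
        self.objects = {}  # object index -> bytearray; absent = never written

    def write(self, offset, data):
        # Only the objects actually touched by a write get allocated,
        # which is what makes the image "thin".
        for i in range(len(data)):
            pos = offset + i
            idx, off = divmod(pos, OBJECT_SIZE)
            chunk = self.objects.setdefault(idx, bytearray(OBJECT_SIZE))
            chunk[off] = data[i]

    def used_bytes(self):
        return len(self.objects) * OBJECT_SIZE


img = ToyRbdImage(10 * 1024**3)  # a "10 GB" image
img.write(0, b"boot sector")     # touches only the first 4 MB object
print(img.used_bytes())          # 4194304 -- one object, not 10 GB
```

The real cluster does the same bookkeeping across many storage nodes, so a mostly empty volume consumes almost no raw capacity.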
Now, let’s talk about OpenStack. Figure 2 outlines a couple of core components, just the ones that are actually relevant to the discussion of what Ceph can do for OpenStack.
What is Mirantis OpenStack?
OpenStack is an open source cloud computing platform.
Figure 2 OpenStack components relevant to Ceph
Nova provisions VMs; Cinder provides block device volumes for those VMs. The VMs are based on images taken from Glance, and they can store objects in the Swift object storage system, which can also be used as a storage backend for Glance.
Mirantis OpenStack is basically a set of hardened OpenStack packages wrapped in tooling that allows them to be installed simply, out of the box, in a range of different, rather complex configurations, including a configuration where Ceph replaces all the other storage backends.
Mirantis provides the Fuel utility to simplify the deployment of OpenStack and Ceph. Fuel uses Cobbler, MCollective, and Puppet to discover nodes, provision OS, and set up OpenStack services, as shown in the following diagram.
Figure 3 Fuel in action
As you can see, we use Cobbler to provision nodes, and then we use MCollective and Puppet to discover them, distribute Puppet manifests, and set up OpenStack using those manifests.
How does Ceph fit into OpenStack?
RBD drivers for OpenStack make libvirt configure the QEMU interface to librbd.
So how does Ceph fit into OpenStack? Very simply: the RADOS Block Device, or RBD for short, a block device layer on top of Ceph’s object storage, has a driver in QEMU. And OpenStack has drivers for RBD, which make libvirt tell QEMU to use the RBD backend for all of its storage needs. This process is shown in the next figure.
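Concretely, the RBD driver hands libvirt a network-disk definition along these lines (the pool, volume, monitor address, and user names here are illustrative), and libvirt passes it to QEMU, which opens the device through librbd:

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='volumes/volume-0001'>
    <host name='192.168.0.10' port='6789'/>
  </source>
  <auth username='cinder'>
    <secret type='ceph' uuid='...'/>
  </auth>
  <target dev='vdb' bus='virtio'/>
</disk>
```

No kernel block device or filesystem mount is involved on the compute node; QEMU talks to the Ceph cluster directly over the network.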
Figure 4 How Ceph fits into OpenStack
So you get quite a lot. First of all, unlike the basic LVM-based Cinder backend, Ceph provides multi-node redundancy, so that if you lose one storage node, the volumes that were stored on that node do not disappear, because they have replicas elsewhere.
Another nice thing is that it allows you to make copy-on-write clones of images, volumes, and instances. Once you have a system image, you can move it around and start different VMs based on it without any unnecessary data copy operations, which speeds things up and makes your storage usage more efficient.
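The copy-on-write idea can be sketched in a few lines of Python (again a toy model with invented names, not the Ceph API): a clone starts with no data of its own, reads fall through to the parent, and only written chunks land in the clone.

```python
# Toy copy-on-write clone: cloning copies nothing; reads fall through
# to the parent image until a chunk is overwritten in the clone.

class ToyImage:
    def __init__(self, parent=None):
        self.parent = parent
        self.chunks = {}  # chunk index -> data written to *this* image

    def clone(self):
        return ToyImage(parent=self)  # instant: no data is copied

    def write(self, idx, data):
        self.chunks[idx] = data  # writes land in the clone only

    def read(self, idx):
        if idx in self.chunks:
            return self.chunks[idx]
        return self.parent.read(idx) if self.parent else None


base = ToyImage()
base.write(0, "glance image block")
vm_disk = base.clone()            # a new VM boots instantly off the clone
print(vm_disk.read(0))            # "glance image block" (read from parent)
vm_disk.write(0, "vm's own data") # diverges without touching the parent
print(base.read(0))               # "glance image block" -- parent untouched
```

This is why starting many VMs from one Glance image costs almost no extra storage up front.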
And another thing that comes out of all that: since Ceph provides all sorts of storage, block and object alike, and can be a backend for Cinder, Nova, and Glance, all of your OpenStack cloud’s storage needs can be served from the same storage pool. The same set of hard drives can be distributed between those needs as necessary, so you don’t have to have dedicated object storage and block storage nodes; it’s all just generic cloud storage.
And that’s one of the reasons I was having trouble calling Ceph a file system. It’s really more of a platform, because it just gives you storage, period.
Finally, one of the nicer things you can do with Ceph, thanks to the work in the recent release of Fuel, is live migration of all sorts of Ceph-backed instances.
To sum things up, Ceph benefits include:
Multi-node striping and redundancy for block storage (Cinder volumes and Nova ephemeral drives)
Copy-on-write cloning of images to volumes and instances
Unified storage pool for all types of storage (object, block, and POSIX)
Live migration of Ceph-backed instances
However, Ceph is not a panacea. It still has some issues that need to be overcome:
Clock drift. Ceph is quite sensitive to clock drift. If you want to use Ceph, you’d better make sure your time synchronization infrastructure is rock solid. If you don’t, your servers will drift out of sync and your cluster will break.
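As a rough sketch of what the monitors check: they compare clock offsets between peers against an allowed drift, by default 0.05 seconds (Ceph’s `mon clock drift allowed` option); past that threshold the cluster starts reporting health warnings. Monitor names and offsets below are invented for illustration.

```python
# Sketch of the kind of check Ceph monitors perform: flag any monitor
# whose clock offset exceeds the allowed drift. The 0.05 s threshold
# mirrors the `mon clock drift allowed` default; the data is made up.

MON_CLOCK_DRIFT_ALLOWED = 0.05  # seconds

def drifting_monitors(offsets, allowed=MON_CLOCK_DRIFT_ALLOWED):
    """offsets: mapping of monitor name -> clock offset in seconds."""
    return sorted(name for name, off in offsets.items() if abs(off) > allowed)

offsets = {"mon.a": 0.001, "mon.b": -0.020, "mon.c": 0.110}
print(drifting_monitors(offsets))  # ['mon.c'] -- this one triggers warnings
```

In practice this means running NTP on every Ceph node and monitoring that it actually stays in sync.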
Multisite support. Unlike Swift, which has had multisite support for quite some time, Ceph only gained asynchronous replication in the most recent release, called Emperor. Before that, Ceph supported synchronous replication only, and synchronous replication means that you cannot replicate over a long-distance link, which means you cannot replicate between multiple sites, and that does limit Ceph’s usability. We haven’t yet tried the asynchronous replication that was introduced in Emperor, so I cannot tell you how well it works or how suitable it is for different needs, but we are very excited about it.
Block storage density. Unlike Cinder’s default backend, which keeps just one replica of the data, so you get basically 100 percent utilization of your hard drives, Ceph replicates data, which means you keep at least two copies of all of your data. Your actual raw storage capacity therefore has to be two or three times bigger than your data set.
One way to address that, promised for the next Ceph release, called Firefly, is erasure coding: instead of fully replicating the data, you could have erasure-coded striping, which would make Ceph’s data multiplication requirements quite a bit smaller. Instead of twice or three times the data, you could store one and a half times the data, or even 1.2 times the data.
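The arithmetic behind those numbers is simple. Replication with r copies needs r bytes of raw storage per byte of data; an erasure code with k data chunks and m coding chunks per stripe needs (k + m) / k. The k and m values below are illustrative, not Ceph defaults:

```python
# Raw-capacity overhead per byte of stored data.

def replication_overhead(replicas):
    # Each byte is stored `replicas` times in full.
    return float(replicas)

def erasure_overhead(k, m):
    # Each stripe of k data chunks carries m extra coding chunks.
    return (k + m) / k

print(replication_overhead(2))  # 2.0  -> "twice the data"
print(erasure_overhead(4, 2))   # 1.5  -> "one and a half times the data"
print(erasure_overhead(10, 2))  # 1.2  -> "1.2 times the data"
```

The trade-off is that reconstructing lost chunks from an erasure-coded stripe costs more CPU and network traffic than simply reading a surviving replica.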
Swift API gap. If you do want to use Ceph as your primary object storage, some minor bits of the Swift API will not be 100 percent supported, because right now the only way to use the Swift API with Ceph is through the RADOS Gateway, which is a reimplementation of the Swift API. It’s not Swift itself, so there are bound to be some gaps.
That’s all fine and dandy, but what has Fuel got to do with it?