Trove + Cassandra = Love: NoSQL Database Solutions and the OpenStack Ecosystem

on March 14, 2014

NoSQL databases are systems for data storage and retrieval that do not primarily use the now-dominant RDBMS model: tabular data structures, organized relationally and accessed using Structured Query Language (SQL). Instead, NoSQL databases employ a host of methods, such as schema-free key-value pairs, intended to map better to a growing class of problems that may be difficult to solve with an RDBMS. For example, some problems are best approached using data structures (such as trees) that are hard to represent with relational tables, or algorithms that are difficult to express in SQL; others can only be solved efficiently by creating and accessing very large, unstructured and/or distributed databases using massive parallelism.

Apache Cassandra NoSQL Database

Apache Cassandra is a high-performance, scalable, distributed and robust NoSQL database designed to handle very large data stores on many inexpensive commodity servers, across multiple datacenters, with no single point of failure. It uses a flexible yet simple partitioned row-store data model. It was originally developed at Facebook by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik to solve Facebook's Inbox Search problem. The code was published in July 2008 as free software under the Apache 2.0 license, and development of Cassandra has since continued at a rapid pace, driven in part by contributions from IBM, Twitter and Rackspace. Since February 2010, Cassandra has been an Apache top-level project.

Cassandra forgoes the widely used master-slave setup in favor of a peer-to-peer cluster. This is a large part of why Cassandra has no single point of failure: there is no master server whose overload or outage would render all of its slaves useless. Any number of commodity servers can be grouped into a Cassandra cluster. This architecture is considerably more complex to implement behind the scenes, but as users we don't have to deal with that complexity. Because there is no distinction between master and slave nodes, you can add any number of machines to any cluster in any datacenter without worrying about which type of machine you need at the moment. Every server accepts requests from any client. Every server is equal.
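To make the peer-to-peer, partitioned design more concrete, here is a minimal Python sketch of a token ring. It assumes a simplified MD5-based partitioner (similar in spirit to Cassandra's older RandomPartitioner; Murmur3Partitioner has been the default since Cassandra 1.2) and SimpleStrategy-style placement, where a key's replicas are the owning node plus the next nodes clockwise. The `TokenRing` class, the `coordinate` helper and the node names are hypothetical; real Cassandra adds virtual nodes, snitches and gossip. The point is that any node can act as coordinator, because every node computes the same replica list locally without consulting a master.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key (or node name) to a position on the ring; MD5 here, as a simplification."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical ring in which every node owns one token and no node is special."""

    def __init__(self, nodes, replication_factor=3):
        self.requested_rf = replication_factor
        # Keep (token, node) pairs sorted so the ring can be walked clockwise.
        self.ring = sorted((token_for(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        """Adding capacity just inserts another token; there is no master to reconfigure."""
        bisect.insort(self.ring, (token_for(node), node))

    def replicas(self, partition_key: str):
        """Return the owner of the key's token (first node token >= key token,
        wrapping around), followed by the next nodes clockwise, SimpleStrategy-style."""
        tokens = [t for t, _ in self.ring]
        start = bisect.bisect_left(tokens, token_for(partition_key)) % len(self.ring)
        rf = min(self.requested_rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(rf)]


def coordinate(ring: TokenRing, contacted_node: str, partition_key: str) -> dict:
    """Any node the client contacts acts as coordinator: it computes the
    replica set locally (no master lookup) and would forward the request there."""
    return {"coordinator": contacted_node, "replicas": ring.replicas(partition_key)}


ring = TokenRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(coordinate(ring, "node-d", "user:42"))
print(coordinate(ring, "node-a", "user:42"))  # different coordinator, same replicas
ring.add_node("node-f")
print(ring.replicas("user:42"))
```

Growing the cluster is just another `add_node` call: no node is promoted or demoted, and keys are redistributed only around the new token.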

What Cassandra is Good For

Cassandra is great for situations that require:

  • Fast read or write performance

  • The ability to add more machines as you need additional capacity

  • Reliable cross-datacenter replication

… and that don’t require ACID transactions (Atomicity, Consistency, Isolation and Durability – commonly achieved by locking) in the database layer.

What does that mean at the application level?

Cassandra excels at online transactions, also known as real-time transactions: requests that must complete quickly because otherwise users perceive latency. Such queries need to execute in single-digit milliseconds, not in hundreds or thousands of milliseconds. With Cassandra's multiple caching levels, your data can be served very quickly. Writes are fast thanks to Cassandra's log-structured storage design, and every write is persisted to a commit log, which makes Cassandra an excellent choice when downtime or data loss is unacceptable.
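The log-structured write path can be illustrated with a toy store in Python. This is a simplified sketch, not Cassandra's implementation: the `ToyLSMStore` class and its `memtable_limit` parameter are hypothetical, and plain lists stand in for on-disk files. Each write is appended to a commit log and applied to an in-memory table (memtable), so no on-disk data is rewritten in place. When the memtable fills up, it is flushed as an immutable sorted table (SSTable). Reads check the memtable first, then flushed tables from newest to oldest, and the commit log allows unflushed writes to be recovered after a crash.

```python
import bisect


class ToyLSMStore:
    """Hypothetical log-structured store illustrating a Cassandra-like write path."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for an append-only, durable file
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # flushed, immutable (keys, values) tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # 1. Append to the commit log first, so the write survives a crash.
        self.commit_log.append((key, value))
        # 2. Apply to the memtable; existing on-disk data is never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable table, then start fresh.
        if self.memtable:
            items = sorted(self.memtable.items())
            self.sstables.append(([k for k, _ in items], [v for _, v in items]))
            self.memtable = {}
            # In this toy, flushed data no longer needs its commit log entries.
            self.commit_log.clear()

    def read(self, key):
        # Newest data wins: memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):
            i = bisect.bisect_left(keys, key)  # binary search in the sorted table
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def recover(self):
        """Rebuild the memtable by replaying the commit log after a simulated crash."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


store = ToyLSMStore(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)                     # memtable is full, so it is flushed
store.write("a", 3)                     # the newer value stays in the memtable
print(store.read("a"), store.read("b"))  # 3 2
```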

Cassandra also does well in the other area of data management: analytics. With the current release, MapReduce jobs can be run across your stored data. MapReduce is a programming model popularized by Google that runs analytical queries over large data sets on large numbers of servers in parallel. It is not real time (typical jobs can take minutes, if not hours), but it can process gigantic data sets to scour your data for the information you need. Because Cassandra provides both online and analytical solutions, you can use a single technology for the majority of your data needs, which benefits development, QA and operational efficiency. And because Cassandra has proven itself at scale, you can trust it to keep performing well as your needs grow.
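For readers new to MapReduce, the model can be shown with a small in-process Python sketch. A map function emits key/value pairs for each record, a shuffle groups the pairs by key, and a reduce function combines each group. The records here are hypothetical (user, page) page views standing in for rows read out of Cassandra. In a real deployment, a framework such as Hadoop would run the map and reduce phases in parallel across many servers, ideally close to the nodes that hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(row):
    """Emit (page, 1) for each page-view row; rows are hypothetical (user, page) tuples."""
    _user, page = row
    yield page, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, count views per page."""
    return key, sum(values)


def map_reduce(rows):
    # Each row is mapped independently (the parallelizable part), then shuffled and reduced.
    mapped = chain.from_iterable(map_phase(r) for r in rows)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


rows = [("alice", "/home"), ("bob", "/docs"), ("alice", "/docs"), ("carol", "/home")]
print(map_reduce(rows))  # {'/home': 2, '/docs': 2}
```

Because `map_phase` looks at one row at a time and `reduce_phase` at one key at a time, both can be spread across machines; only the shuffle needs to move data between them.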

Cassandra and OpenStack

Cassandra and OpenStack are a natural conceptual pairing: OpenStack powers and abstracts the datacenters, defines the server infrastructure Cassandra needs to run, and simplifies every phase of development, deployment and operations.

Until recently, however, managing Cassandra on OpenStack was difficult. It was possible to provision database instances using OpenStack Orchestration (Heat) templates, but normal security policies (for example, no database access from the WAN) made management by end users largely impractical. Today, the Trove OpenStack DBaaS project has arrived, offering an API that lets users work with agents running inside the database VMs and enables all the operations defined by the management interface.

Cassandra and OpenStack DBaaS

OpenStack DBaaS now supports the Apache Cassandra NoSQL database. The first iteration of this support covers:

  • Provisioning Cassandra as a single-instance database.

  • Power management (start, stop, restart, restart with a new configuration).

  • Resizing (volume size and flavor).
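These first-iteration operations map onto Trove's REST API. The Python sketch below builds request bodies for provisioning a Cassandra instance and for the restart and resize actions. It follows the Trove v1.0 API (`POST /v1.0/{tenant_id}/instances` and `POST /v1.0/{tenant_id}/instances/{instance_id}/action`), but the exact field names should be treated as assumptions and checked against the API reference for your release. The helper names, flavor ID, volume size and datastore version are placeholders. Nothing is sent over the network; a real client would also need Keystone authentication (for example, through python-troveclient).

```python
import json


def create_instance_request(tenant_id, name, flavor_ref, volume_size_gb,
                            datastore_version, datastore="cassandra"):
    """Build (method, path, body) to provision a single-instance Cassandra database.
    Field names follow the Trove v1.0 API as an assumption; verify for your release."""
    body = {
        "instance": {
            "name": name,
            "flavorRef": flavor_ref,
            "volume": {"size": volume_size_gb},
            "datastore": {"type": datastore, "version": datastore_version},
        }
    }
    return "POST", f"/v1.0/{tenant_id}/instances", body


def action_request(tenant_id, instance_id, action):
    """Build a request for an instance action such as restart or resize."""
    return "POST", f"/v1.0/{tenant_id}/instances/{instance_id}/action", action


def restart(tenant_id, instance_id):
    return action_request(tenant_id, instance_id, {"restart": {}})


def resize_flavor(tenant_id, instance_id, flavor_ref):
    return action_request(tenant_id, instance_id, {"resize": {"flavorRef": flavor_ref}})


def resize_volume(tenant_id, instance_id, new_size_gb):
    return action_request(tenant_id, instance_id,
                          {"resize": {"volume": {"size": new_size_gb}}})


# Placeholder tenant, flavor, size and version, for illustration only.
method, path, body = create_instance_request("TENANT", "cass-1", "2", 5, "2.0")
print(method, path)
print(json.dumps(body, indent=2))
print(resize_volume("TENANT", "INSTANCE_ID", 10))
```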

The next iteration, planned for OpenStack's Juno release, will cover:

  • Configuration management.

  • Backup (nodetool snapshot + custom scripts).

  • Restore (custom scripts).

  • Incremental backup (for Cassandra 2.x.x or later).
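The planned backup mechanism builds on `nodetool snapshot`, which hard-links a node's current SSTables into a named snapshot directory. A custom script around it might look like the minimal Python sketch below. The `snapshot_command` and `take_snapshot` helpers and the timestamp-based tag format are hypothetical, and the script assumes `nodetool` is on the PATH of the database VM. It is not Trove's backup implementation.

```python
import shutil
import subprocess
from datetime import datetime, timezone


def snapshot_command(keyspaces, tag=None):
    """Build a `nodetool snapshot` command line.
    The default tag format (a UTC timestamp) is a hypothetical convention."""
    if tag is None:
        tag = "backup-" + datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # -t names the snapshot; keyspaces are positional (none means all keyspaces).
    return ["nodetool", "snapshot", "-t", tag, *keyspaces]


def take_snapshot(keyspaces, tag=None, dry_run=True):
    """Run the snapshot command, or only return it when dry_run is set."""
    cmd = snapshot_command(keyspaces, tag)
    if dry_run:
        return cmd
    if shutil.which("nodetool") is None:
        raise RuntimeError("nodetool not found on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


print(take_snapshot(["my_keyspace"], tag="nightly"))
# ['nodetool', 'snapshot', '-t', 'nightly', 'my_keyspace']
```

For incremental backups, Cassandra can also hard-link each newly flushed SSTable into a `backups` directory when `incremental_backups: true` is set in `cassandra.yaml`; a restore script would then combine the last snapshot with those files.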

Conclusion

Cassandra is a highly available, Internet-scale NoSQL database with design goals that are very different from those of traditional relational databases. Each difference between Cassandra and relational databases identified in this article has pros and cons that should be weighed in the context of your problem domain. Also, using NoSQL does not exclude the use of an RDBMS: it's quite common to have a hybrid architecture in which each type of database is used in the situations that suit its strengths.

When starting their first NoSQL project, developers are likely to enter new territory and have their first encounters with related concepts such as big data and eventual consistency. Relational databases are often associated with strong consistency, whereas NoSQL systems are associated with eventual consistency (even though using a certain type of database doesn't formally imply a particular consistency model). When moving from the relational world and strong consistency to the NoSQL world, the biggest shift in mindset may be understanding eventual consistency and architecting applications around it. Data modeling is another area where developers may need to build new understanding.
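In Cassandra this trade-off is tunable per request. With replication factor N, a write acknowledged by W replicas and a read that consults R replicas must share at least one replica whenever R + W > N; for example, QUORUM reads and writes with N = 3 give R = W = 2. When replicas disagree, Cassandra keeps the value with the newest write timestamp (last write wins). The Python sketch below illustrates both ideas. The function names and replica data are hypothetical, and real reconciliation happens per column, with additional machinery such as tombstones and read repair.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since the epoch


def quorum(n: int) -> int:
    """Replicas needed for a QUORUM operation: a majority of n."""
    return n // 2 + 1


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


def last_write_wins(responses):
    """Reconcile replica responses by keeping the one with the newest timestamp."""
    return max(responses, key=lambda v: v.timestamp)


n = 3
print(quorum(n))                                        # 2
print(read_sees_latest_write(n, quorum(n), quorum(n)))  # True
print(read_sees_latest_write(n, 1, 1))                  # False: ONE/ONE is only eventually consistent

# Two replicas answer a read; one has not yet received the latest write.
replies = [Versioned("old@example.com", 100), Versioned("new@example.com", 200)]
print(last_write_wins(replies).value)                   # new@example.com
```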

Cassandra is a very interesting product with a wide range of use cases. I think it’s particularly well suited for use cases involving:

  • Very large data volumes

  • Very large user transaction volumes

  • High reliability requirements for data storage

  • A dynamic data model, where data may be relatively unstructured or its structure may change over time

  • Cross-datacenter distribution

And now Apache Cassandra is available as a datastore in Trove, the OpenStack Database service.

2 Responses

  1. Jacek

    Great news about Cassandra, but what about HBase + OpenStack? Is it going to be covered by, or combined with, the Savanna project?

    March 18, 2014 18:07
    • Denis Makogon

      Thanks for your comment. About HBase: good question, and also a good topic for a summit design session. In my view, HBase will become part of Trove someday. Also, Sahara (formerly Savanna) could be a great option for Trove as the provisioning engine for Hadoop and HBase. So, the short answer: Trove will support HBase. The other question is how soon.

      April 15, 2014 23:37
