Moving to Hierarchical Database infrastructure
Ongoing development of several OpenStack infrastructures, each with dozens of roles and only a few nodes assigned to each role, led us to seek an alternative format for describing these infrastructures to our configuration management tools. Modern configuration management software, such as the popular Salt, Puppet, and Ansible, generally targets the configuration of each host individually.
This approach works well in a homogeneous infrastructure, but in heterogeneous environments with many different behavioral roles it quickly becomes unmanageable. We use strong service decomposition, with many parameters at the service level as well as at the interface level. We ended up with configuration files of more than 1000 lines of parameters for each host in the architecture. Adding a new host became a nightmare, as many parameters across several files had to be modified.
So the goal was to replace the YAML documents that described the infrastructure with a new approach. Hierarchical databases are a fairly new way of serializing a service catalog and host inventory. There are two major implementations of hierarchical database engines suitable for configuration management: the first is reclass, written in Python, and the second is Hiera, written in Ruby. Here is a short comparison of the two tools:
- reclass merges data during a recursive descent of a class hierarchy, where data in more specific classes overwrite data in more generic classes. Parameters may reference each other, independent of their position in the hierarchy.
- Hiera, as used by Puppet, merges data during a walkthrough of a class list, where data in later classes overwrites data in more generic classes. Parameters may also reference each other.
- reclass supports class selection based only on the host ID or hostname, whereas Hiera can use the complete set of facts or grains sent from the configuration management server.
- Hiera can use pluggable backends and encrypt specific parameters with GPG or RSA keys.
We chose reclass over Hiera because it is written in Python and already comes with a Salt adapter. We'll show how reclass does its job. It helps to start off with a short glossary of reclass-specific terminology:
- node: A node, usually a computer in your infrastructure.
- class: A category, tag, feature, or role that applies to a node. Classes may be nested, that is, they may form a class hierarchy.
- application: A specific set of behaviour to apply.
- parameter: Node-specific variables, with inheritance throughout the class hierarchy.
A class consists of zero or more parent classes, zero or more applications, and any number of parameters. A node is almost equivalent to a class, except that it usually does not (but can) specify applications.
When reclass parses a node (or class) definition and encounters a parent class, it recurses to this parent class first before reading any data of the node (or class). When reclass returns from the recursive, depth first walk, it then merges all information of the current node (or class) into the information it obtained during the recursion.
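The recursive walk described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reclass's actual implementation: the `Entity` dataclass, the `resolve` function, and the `base`, `web`, and `node1` entries are hypothetical, and parameters are merged with a shallow dictionary update for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A class or node: parent classes, applications, and parameters."""
    classes: list = field(default_factory=list)
    applications: list = field(default_factory=list)
    parameters: dict = field(default_factory=dict)


def resolve(name, inventory, seen=None):
    """Depth-first walk: resolve parents first, then merge the entity itself."""
    seen = set() if seen is None else seen
    applications, parameters = [], {}
    if name in seen:  # assumption of this sketch: merge each class only once
        return applications, parameters
    seen.add(name)
    entity = inventory[name]
    for parent in entity.classes:
        parent_apps, parent_params = resolve(parent, inventory, seen)
        applications += [a for a in parent_apps if a not in applications]
        parameters.update(parent_params)
    # The entity's own data is merged last, so it wins over its parents.
    applications += [a for a in entity.applications if a not in applications]
    parameters.update(entity.parameters)
    return applications, parameters


# Hypothetical inventory, for illustration only.
inventory = {
    "base": Entity(applications=["ssh"],
                   parameters={"ntp": "pool.example.org", "port": 22}),
    "web": Entity(classes=["base"], applications=["nginx"],
                  parameters={"port": 80}),
    "node1": Entity(classes=["web"], parameters={"port": 8080}),
}

apps, params = resolve("node1", inventory)
print(apps)
print(params)
```

Here `node1` inherits the applications of `web` and `base`, while its own `port` value replaces the values set further up the hierarchy.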
Now let's look at a real-life example. We have a cluster defined with two roles and some location-based parameters: production runs in New York and development is done in Amsterdam. The following directory structure shows the root of the reclass inventory.
location/amsterdam.yml defines specific parameters for the Amsterdam data center.
location/new_york.yml defines specific parameters for the New York data center.
timezone: America/New_York
cluster/web.yml defines the web server role in our application cluster.
cluster/database.yml defines the database role in our application cluster.
database: name ..
dev1.ams.domain.com.yml is a development node in Amsterdam that uses both roles together to save capacity.
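To make the composition concrete, the following Python sketch models the same inventory as plain dictionaries. It is a rough illustration only: apart from the file names and the New York timezone mentioned above, every parameter name and value (`web_packages`, `db_packages`, `environment`, the Amsterdam timezone) is a hypothetical placeholder, not the content of the real files.

```python
# Each key stands for one class file of the inventory described above
# (location/amsterdam.yml maps to the class name "location.amsterdam").
# Parameter values are hypothetical placeholders.
classes = {
    "location.amsterdam": {"timezone": "Europe/Amsterdam"},
    "location.new_york": {"timezone": "America/New_York"},
    "cluster.web": {"web_packages": ["nginx"]},
    "cluster.database": {"db_packages": ["postgresql"]},
}

# A node definition is just a list of classes plus a few overrides.
nodes = {
    "dev1.ams.domain.com": {
        "classes": ["location.amsterdam", "cluster.web", "cluster.database"],
        "parameters": {"environment": "development"},
    },
}


def node_parameters(node_name):
    """Apply the node's classes in order, then the node's own parameters."""
    node = nodes[node_name]
    merged = {}
    for class_name in node["classes"]:
        merged.update(classes[class_name])
    merged.update(node["parameters"])
    return merged


dev1 = node_parameters("dev1.ams.domain.com")
print(dev1)
```

Adding another development node would then take one more short entry in `nodes` rather than a copy of every parameter.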
This is just a brief example of what reclass can do. With 30 services spread across tens of servers, the savings in redundant data may reach 60-80% of the original definition size. The killer features of hierarchical databases are:
- Deep property merging - merge nested data structures, lists, dictionaries.
- Property interpolation - point to another variable within the definition, whether scalar or complex.
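Both features can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not reclass's own code: the functions `deep_merge`, `lookup`, and `interpolate` and all sample data are hypothetical, the `${key:subkey}` reference style mirrors the one reclass uses, and the sketch does not detect circular references.

```python
import re


def deep_merge(base, override):
    """Return a new structure with override merged into base."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        return base + override  # lists are extended, not replaced
    return override             # scalars: the more specific value wins


REFERENCE = re.compile(r"\$\{([^}]+)\}")


def lookup(path, root):
    """Follow a colon-separated path such as 'db:host' into root."""
    value = root
    for part in path.split(":"):
        value = value[part]
    return value


def interpolate(value, root):
    """Resolve ${a:b} references anywhere inside value."""
    if isinstance(value, dict):
        return {k: interpolate(v, root) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, root) for v in value]
    if isinstance(value, str):
        whole = REFERENCE.fullmatch(value)
        if whole:  # a lone reference may point to a complex value
            return interpolate(lookup(whole.group(1), root), root)
        return REFERENCE.sub(
            lambda m: str(interpolate(lookup(m.group(1), root), root)), value)
    return value


# Hypothetical generic and specific class parameters.
generic = {"db": {"host": "localhost", "port": 5432}, "packages": ["ntp"]}
specific = {
    "db": {"host": "db1.example.org"},
    "packages": ["nginx"],
    "app": {"dsn": "postgres://${db:host}:${db:port}/app", "database": "${db}"},
}

merged = deep_merge(generic, specific)
resolved = interpolate(merged, merged)
print(resolved)
```

The nested `db` dictionary keeps its generic `port` while taking the more specific `host`, the two `packages` lists are combined, and the references in `app` resolve to a scalar string and to the whole `db` dictionary respectively.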
By using hierarchical data classification we are now able to adhere to DRY principles and keep the system's description clean and sorted. Future work is to protect sensitive data such as passwords and keys within the classification, at least on the VCS repository server, as the nature of SaltStack makes it rather difficult to maintain secrets on the managed nodes within the infrastructure.
Cloud Software Engineer