Getting started with Kubernetes, part 1: Introduction to YAML [webinar]
If you have any questions, or if there are particular topics you'd like us to do training webinars on, please put them in the comments!
Webinar Transcript:
Alright, so there are different uses for YAML. You can see YAML in things like configurations, such as Kubernetes or Swarm, or lots of things like OpenStack templates, Ansible, Maven; you see it all over the place. And we are going to cover that as we go through, so we'll talk about the different uses. We'll talk about the overall structure. There are actually data types, even though this is all, you know, text, and then we will look at some of the tools that are available for you.
Use cases for YAML
Alright, so let's look at the different uses for YAML. As I said before, configurations, templates and so on. So for example, in Kubernetes, we use YAML for defining things like pods. So a pod is a unit of workload that Kubernetes orchestrates.

    apiVersion: v1
    kind: Pod
    metadata:
      name: rss-site
      labels:
        app: web
    spec:
      containers:
        - name: front-end
          image: nginx
          ports:
            - containerPort: 80
        - name: rss-reader
          image: nickchase/rss-php-nginx:v1
          ports:
            - containerPort: 88

Swarm, as you can see here, is totally different, even though they are both talking about orchestrating containers.

    version: "3.9"

    services:
      web:
        image: 127.0.0.1:5000/stackdemo
        build: .
        ports:
          - "8000:8000"
      redis:
        image: redis:alpine

So this is OpenStack. You'll notice they're all doing different things. But what you're seeing is a format, okay, and that format basically takes care of what makes it YAML.

    heat_template_version: 2015-04-30

    parameters:
      key_name:
        type: string
        description: Name of a KeyPair

    resources:
      server:
        type: OS::Nova::Server
        properties:
          key_name: {get_param: key_name}
          flavor: m1.small
          image: ubuntu-trusty-x86_64

So, you've got your text, you've got indentations, such as in this Ansible snippet...

    tasks:
      - action: uri url=http://www.example.com return_content=yes
        register: webpage

      - fail:
          msg: 'service is not happy'
        when: "'AWESOME' not in webpage.content"

... or no indentations depending on the situation, such as this Maven snippet.

    modelVersion: 4.0.0
    groupId: io.takari.polyglot
    artifactId: yaml-project
    version: 0.0.1-SNAPSHOT
    name: 'YAML Maven Love'

    properties: {sisuInjectVersion: 0.0.0.M2a}

What makes a YAML document a YAML document?
But you'll notice there's no common vocabulary. And that's important to understand because YAML is not a "language" in that you have to learn all the keywords and everything like that. It's a type of markup language.
(Although, technically speaking, YAML stands for "YAML Ain't Markup Language", or "Yet Another Markup Language". It depends who you ask.)
So there's no specific vocabulary. It's based on the formatting and it's made to be predictable.
Overall YAML structure
Alright, so let's look at overall YAML structure to start out with.
Associative arrays in YAML
A very simple YAML document would be something like these associative arrays.

    name: nginx
    image: nginx:1.10

Okay, now you've dealt with this in all kinds of other languages. You've got a key and a value, or a name and the data that goes with it. So this is kind of the simplest YAML document that you can have. So these associative arrays, we can put all over other documents.
Now, here's the interesting thing that we have to make note of: this is our traditional YAML format. But YAML is actually a superset of JSON. So if it's valid JSON, it's valid YAML. For example, if I wanted to write this YAML document as JSON, I could do that.

    { "name": "nginx", "image": "nginx:1.10" }

And this format you're familiar with, okay, you've seen this a thousand times. It's the same thing.
Now, a couple of things to note here. See how there's a colon in the YAML version, and we have not put the key or the value in quotes. That's why you must put a space after the colon in your associative arrays. If it was JSON, it wouldn't matter, because you've got your quotes setting everything off, but that's the price you pay for simplicity.
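To make those two points concrete, here is a minimal sketch, assuming the PyYAML package is installed (pip install PyYAML). The variable names are only for illustration. It loads the same data written as YAML and as JSON, and then shows what a parser does when the space after the colon is left out.

```python
import json
import yaml  # PyYAML; assumed to be installed

yaml_text = "name: nginx\nimage: nginx:1.10\n"
json_text = '{ "name": "nginx", "image": "nginx:1.10" }'

from_yaml = yaml.safe_load(yaml_text)
from_json_via_yaml = yaml.safe_load(json_text)  # JSON handed to a YAML parser
from_json = json.loads(json_text)               # JSON handed to a JSON parser

# Without the space after the colon there is no key/value pair at all:
# the parser reads the whole thing as a single plain string.
no_space = yaml.safe_load("name:nginx")
```

All three loads produce the same dictionary, while no_space ends up as the plain string "name:nginx" rather than a mapping.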
Nested values in YAML
Okay, now, another thing that we can do is nested values. So for example, the value of the metadata in this document is another associative array, and in that case, that would be name: pod-example. So the way that we associate this array with this name is to indent it.

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-example
    spec:
      name: ubuntu
      image: ubuntu:trusty
      command: ["myscript"]
      args: ["arg1", "arg2"]

Now, like with other indent-dependent languages, you need to make sure that you are consistent. So if you're going to use two spaces, that's fine. If you're going to use four spaces, that's fine. But you must be consistent.
The other thing is make sure that you never ever, ever use tabs instead of spaces; you must use spaces to indent your YAML.
So in this case, you can see that we are indenting using two spaces, and we've got two objects that have nested values. Also, notice that there are multiple items here that are all part of spec and since they're all indented, we know what they all are.
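As a small sketch of how indentation turns into nesting, and of what happens with tabs, the following assumes PyYAML is installed. The snippet is a trimmed version of the pod example, and the behavior shown is that of PyYAML specifically.

```python
import yaml  # PyYAML; assumed to be installed

nested = """\
metadata:
  name: pod-example
spec:
  name: ubuntu
  image: ubuntu:trusty
"""
doc = yaml.safe_load(nested)
pod_name = doc["metadata"]["name"]  # the indented block became a nested dict

# The same structure indented with a tab instead of spaces is rejected.
tabbed = "metadata:\n\tname: pod-example\n"
try:
    yaml.safe_load(tabbed)
    tab_rejected = False
except yaml.YAMLError:
    tab_rejected = True
```

Here pod_name is "pod-example", and the tab-indented version raises an error instead of loading.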
Sequences in YAML
In this case, we have five items, each of which is an item of seo_metadata. But as you saw earlier, we had one spec with four parameters. These hyphens are the important part here; they're what indicate that we have a single list of items, and that list is the value of seo_metadata.

    seo_metadata:
      - One weird trick to reduce belly fat
      - You'll never believe these texts!
      - Adorable dogs cavorting in the snow
      - Which celebrities hate other celebrities
      - Kubernetes, Kubernetes, Why Kubernetes, Kubernetes Why

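The difference between hyphenated items and plain nested keys can be sketched like this, assuming PyYAML is installed. The document below is a shortened, illustrative mix of the two earlier examples.

```python
import yaml  # PyYAML; assumed to be installed

text = """\
seo_metadata:
  - One weird trick to reduce belly fat
  - Adorable dogs cavorting in the snow
spec:
  name: ubuntu
  image: ubuntu:trusty
"""
doc = yaml.safe_load(text)

items = doc["seo_metadata"]  # hyphens: one list holding two strings
spec = doc["spec"]           # no hyphens: one mapping holding two keys
```

With the hyphens you get a Python list, and without them you get a dictionary.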
Multiple documents in a single YAML file
You can put multiple objects in a single YAML document, and the way that you do that is by separating them with these three dashes. Okay, so here, I have two different objects (they happen to be the same) but there are two objects in this document.

    ---
    apiVersion: v1
    metadata:
      name: pod-example
    spec:
      name: ubuntu
      image: ubuntu:trusty
      command: ["myscript"]
    ---
    apiVersion: v1
    metadata:
      name: pod-example
    spec:
      name: ubuntu
      image: ubuntu:trusty
      command: ["myscript"]

[[ NOTE: Although for clarity's sake the webinar talked about multiple objects in a single document, the actual terms used in YAML are that there are multiple documents in a single YAML file. ]]
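A minimal sketch of reading such a file, assuming PyYAML is installed: safe_load_all yields one Python object per document in the stream. The stream below is a shortened, illustrative version of the example.

```python
import yaml  # PyYAML; assumed to be installed

stream = """\
---
apiVersion: v1
metadata:
  name: pod-example
---
apiVersion: v1
metadata:
  name: pod-example
"""
# safe_load_all returns a generator; list() collects every document.
docs = list(yaml.safe_load_all(stream))
```

Here docs holds two separate dictionaries, one for each section between the three-dash separators.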
YAML comments
Another thing for the basic format that we need to know is how you put in comments.

    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      # Unique key of the DaemonSet instance
      name: daemonset-example
    spec:
      template:
        spec:
          containers:
            # This container is run once on each Node in the cluster
            - name: daemonset-example
              image: ubuntu:trusty
              command:
                - /bin/sh
              args:
                - -c
                # This script is run through
                # `sh -c <script>`
                - >-
                  while [ true ]; do
                  echo "DaemonSet running on $(hostname)" ;
                  sleep 10 ;
                  done

This one has a lot of comments in it; they are all preceded by the hash symbol, or pound sign, depending on where you are and what you like to call it. Okay, so you can see, the comments here are in bold.
And here, we can also see some of the flexibility. Let's kind of work our way through this to see how all of this structure manifested itself. We have metadata, we have one item of metadata with a comment. The spec consists of a template, which consists of another spec, which consists of one or more containers. And those containers are set off by a hyphen (-). So if we were going to put another container in here, it would be down here at the same level with another hyphen.
The definition of this container has multiple values, including command and arguments. So here, you can see we've got an argument that's set off by the hyphen, it also has a hyphen, but it doesn't matter, because we've got that space in there. So that's fine.
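A short sketch of what a parser makes of the comments and of the "- -c" item, assuming PyYAML is installed. The snippet is a cut-down, illustrative version of the command and args part of the example, with a shorter script.

```python
import yaml  # PyYAML; assumed to be installed

text = """\
# A full-line comment
command:
  - /bin/sh   # a trailing comment
args:
  - -c
  - >-
    while [ true ]; do
    echo "running" ;
    done
"""
doc = yaml.safe_load(text)

command = doc["command"]  # comments never reach the loaded data
flag = doc["args"][0]     # "- -c": list marker, a space, then the value "-c"
script = doc["args"][1]   # the folded block is joined into one line
```

The comments are simply gone from the result, and the first argument is the string "-c", because only a hyphen followed by a space counts as a list marker.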
Hopefully, that was all clear. If you have any questions, put them into the question box, and we will get to them at the end. Next, we'll talk about data types.
Data types in YAML
So far, that's all text, which is fine. The problem and the opportunity here are that these YAML files are typically used by other software to DO things. You know, to configure your Kubernetes, or to install your application, or to define your OpenStack Heat stack, whatever. And as such, there are going to be times when the type of the data does actually matter. Is that a string? Is it an integer? Is it a hex value? And YAML actually does define those things, so that your software can get at them. There are built-in data types:
- str
- map/dictionary
- int
- float
- boolean
- base64
- binary
- set/list
- timestamp
- hex
Reading a YAML file with Python
Okay, so let's take a quick look at how we would actually read a YAML file. If I were going to install PyYAML (because I use Python, that's just sort of my go-to language; you don't have to, and there are, as we'll see later, libraries for YAML in pretty much every language), I'm just going to do

    pip install PyYAML

Then I've got my script.

    import yaml

    with open("/Users/nchase/Documents/yamlwebinar/data.yml") as f:
        # load_all yields one Python object per YAML document in the file
        docs = yaml.load_all(f, Loader=yaml.FullLoader)
        for doc in docs:
            # each document here is an associative array (a dict)
            for k, v in doc.items():
                print(k, type(v), v)

As you can see, I'm importing the YAML package itself, and then I'm going to open the file, just as you normally would. I'm going to use the YAML package to load all of the documents in there.
So remember, we said that we can separate them with those three dashes. And then we're going to run through each one. And for each one, in my case, I know that I'm going to have just associative arrays. So I want to print out the name and the value. So I'm going to grab those, I'm going to get the name, the type of the value, and then the value itself.
So if I were to grab all of these, let's take a look at them or put them in a file:

    this_is_a_list:
      - one
      - two
    ---
    this_is_a_string: one
    ---
    this_is_an_integer: 1
    ---
    this_is_a_float: 1.0
    ---
    this_is_a_dictionary:
      one: 1

So we're used to the list, we saw the sequences. If we just take the expression one, all right, we can express it as a string, an integer, a float, a dictionary. Now notice, I'm not really doing anything different there. So will YAML be able to tell the difference between an integer and a float based on just what's there?

    this_is_a_list <class 'list'> ['one', 'two']
    this_is_a_string <class 'str'> one
    this_is_an_integer <class 'int'> 1
    this_is_a_float <class 'float'> 1.0
    this_is_a_dictionary <class 'dict'> {'one': 1}

Alright, it noticed that it was a list and it knew that it was this array of two items. It recognized the string, it recognized the integer, it recognized the float because we had the decimal point and the following zero, and it recognized the dictionary. Notice that it's here in JSON format.
Okay, so let's look at our next set:

    ---
    hex: 0x4824
    ---
    octal: 010
    ---
    date: 2007-06-01
    ---
    boolean: Off
    ---
    boolean: True
    ---
    boolean: Yes # Note that this depends on YAML version
    ---
    this_is_another_string: "1.0"

So the hex value, again: is this a string, or is it a number? And octal values? Let's see what we get on these. Let's run the script.

    hex <class 'int'> 18468
    octal <class 'int'> 8
    date <class 'datetime.date'> 2007-06-01
    boolean <class 'bool'> False
    boolean <class 'bool'> True
    boolean <class 'bool'> True
    this_is_another_string <class 'str'> 1.0

Hex comes out as an integer. In this parser, the type is noted as an integer, but you'll notice that the value is not "zero x 4824" (0x4824). That little "zero x" at the beginning was the cue to the YAML parser that we're talking about a hexadecimal value here. And what it's done is it's translated that into a base 10 integer.
It's done the same with the octal value, octal being base eight. For those of you who are far enough away from math class, our normal numbers are base 10. So it'd be 0123456789 and then we go over to the next column. For octal, that would be 01234567 and then we go over to the next column. So this is actually a value of 8, and YAML knows this because we preceded the number with a zero.
Okay, so you need to make sure you understand these things, because if you just have a leading zero in front of your integers, and all of a sudden you're getting weird, crazy values, your YAML parser may think that you're dealing in base eight with octal values.
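The arithmetic behind those two values can be checked with a short sketch, assuming PyYAML is installed. The leading-zero behavior shown is that of PyYAML, which follows the YAML 1.1 rules; parsers that follow YAML 1.2 write octal with a 0o prefix and may read 010 as ten, so treat this as an illustration of one parser rather than of YAML in general. The key names are made up for the example.

```python
import yaml  # PyYAML; assumed to be installed

doc = yaml.safe_load("""\
hex: 0x4824
octal: 010
quoted: "010"
""")

# The same conversions done by hand.
hex_by_hand = 4 * 16**3 + 8 * 16**2 + 2 * 16 + 4  # digits 4, 8, 2, 4 in base 16
octal_by_hand = 1 * 8 + 0                         # digits 1, 0 in base 8
```

Quoting the value, as in the last key, is the simplest way to keep something like a zip code or an ID with a leading zero as a string.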
It does recognize dates. Now remember, I showed you timestamps; some parsers will recognize the actual timestamp value. This one does not.
You'll notice that all of these were recognized as Booleans. They're just strings, but they were recognized as Booleans. So On/Off, True/False, Yes/No were all recognized as Booleans. Now, as far as Yes/No, be careful, because this depends on the version of YAML. In YAML 1.1, Yes/No is recognized as a Boolean. However, in YAML 1.2, it is not; it's just a string.
So be careful. Ah, and of course, you know, just putting your quotes around something is going to make it into a string, even if it normally would be recognized as something else.
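Here is a minimal sketch of the Boolean behavior and of quoting as the way around it, assuming PyYAML is installed (it follows the YAML 1.1 rules described above). The key names are hypothetical.

```python
import yaml  # PyYAML; assumed to be installed

doc = yaml.safe_load("""\
switch: Off
answer: Yes
quoted_answer: "Yes"
version: "1.0"
country: NO
""")

switch = doc["switch"]                # False
answer = doc["answer"]                # True
quoted_answer = doc["quoted_answer"]  # stays the string "Yes"
country = doc["country"]              # False, not the string "NO"
```

The last key shows why this matters in practice: an unquoted NO meant as a country code comes back as False with this parser.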
Going back to our data types, there are other ways that we can deal with data types, and that would be to sort of force what we want on them.

    picture: !!binary |
      R0lGODdhDQAIAIAAAAAAANn
      Z2SwAAAAADQAIAAACF4SDGQ
      ar3xxbJ9p0qa7R0YxwzaFME
      1IAADs=
    ---
    this_is_an_integer: !!float 1
    ---
    this_is_a_float: !!int "1"

So if I were to come in here and save the file this time, and clear this. So I can go ahead and tell YAML specifically that I want to use this built-in type. So this is binary data, and I want to force this to be a float, and I can try and force it to be an integer. But let's see what happens when we do all that.

    picture <class 'bytes'> b"GIF87a\r\x00\x08\x00\x80\x00\x00\x00\x00\x00\xd9\xd9\xd9,\x00\x00\x00\x00\r\x00\x08\x00\x00\x02\x17\x84\x83\x19\x06\xab\xdf\x1c['\xdat\xa9\xae\xd1\xd1\x8cp\xcd\xa1L\x13R\x00\x00;"
    this_is_an_integer <class 'float'> 1.0
    this_is_a_float <class 'int'> 1

Okay. So if I come back here. All right. So you'll notice now that our binary data has been translated. Okay. Our float has been made into a float now. And also our integer, even though this was in quotes, was translated into an integer. But of course, if you had the word "one", that obviously is not going to fly.
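The same forcing can be sketched in a few lines, assuming PyYAML is installed. The key names are hypothetical, and the last part shows the word "one" failing to convert. Which exception a parser raises there is an implementation detail, so the sketch catches both plausible kinds.

```python
import yaml  # PyYAML; assumed to be installed

doc = yaml.safe_load("""\
as_float: !!float 1
as_int: !!int "1"
as_str: !!str 1
as_bytes: !!binary YWRtaW4=
""")

# Forcing a type only works if the text can actually be converted.
try:
    yaml.safe_load('bad: !!int "one"')
    conversion_failed = False
except (ValueError, yaml.YAMLError):
    conversion_failed = True
```

Here as_float comes back as 1.0, as_int as 1, as_str as the string "1", and as_bytes as the decoded bytes b"admin".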
One more thing on how these work. This is not so much about data types. But remember, I told you, we were talking about these formats here. So a lot of times in YAML, you're going to have multi-line data, and you're going to have to deal with it. There are two ways that YAML deals with multi-line data.

    example: >
        HTML goes into YAML
        without modification

    message: |
        <blockquote style="font: italic 1em serif">
        <p>"Three is always greater than two, even for large values of two"</p>

        <p>--Author Unknown</p>
        </blockquote>

In this case, you'll notice I'm using a greater than sign (>). I want to take all of this text, and I want it to be just one string. Now notice, I have indented it here. Because if I were to put this out here...

    example: >
    HTML goes into YAML without modification

then it would be an error because the parser wouldn't know that it was part of the example. So let's look at the difference between the greater than sign and the pipe (|).
Alright, so if I were to come back here, save this file and run it...

    example <class 'str'> HTML goes into YAML without modification
    message <class 'str'> <blockquote style="font: italic 1em serif">
    <p>"Three is always greater than two, even for large values of two"</p>

    <p>--Author Unknown</p>
    </blockquote>

... let's take a look. What I see here is that our first string has been put into a single line, even though it was on multiple lines. However, when I use the pipe, the pipe then goes ahead, and it sets it up so that the formatting is preserved. You'll notice everything is preserved, including this blank line here. So if you need to preserve your spaces, within a block, you can use the pipe to do that.
Now, notice all of this whitespace is gone. These are up against the left margin here, even though there's plenty of whitespace here, because technically speaking, as part of the data, that whitespace does not exist. That whitespace is solely to identify that that text is part of the message object.
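The difference between the two styles, and the disappearing indentation, can be seen in a small sketch, assuming PyYAML is installed. The keys and text are made up to keep the strings short.

```python
import yaml  # PyYAML; assumed to be installed

doc = yaml.safe_load("""\
folded: >
  first line
  second line
literal: |
  first line

  second line
""")

folded = doc["folded"]    # line breaks inside the block become spaces
literal = doc["literal"]  # line breaks and the blank line are kept
```

The folded value is "first line second line\n" and the literal value is "first line\n\nsecond line\n". In both cases the two spaces of indentation are not part of the data.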
So, one place that you see this come up a lot is in things like ConfigMaps for Kubernetes.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: app-config
      namespace: default
    data:
      special.how: very
      weight: 42
      picture: |
        R0lGODdhDQAIAIAAAAAAANn
        Z2SwAAAAADQAIAAACF4SDGQ
        ar3xxbJ9p0qa7R0YxwzaFME
        1IAADs=

So, you can see an example of this, where you've got different pieces of data specified by different names. And you're being very specific about what you want. Oh, quick note here: notice that you can put a dot (.) in your value names, so your associative arrays can have that for your names.
Another place that this comes up a lot is in passing secrets to Kubernetes.

    apiVersion: v1
    kind: Secret
    metadata:
      name: mysecret
    type: Opaque
    data:
      username: !!base64 YWRtaW4=
      password: !!base64 MWYyZDFlMmU2N2Rm

Again, we are using the base64 type here. My parser doesn't recognize it, but the parser that's used by Kubernetes does. So that's where, again, it's important to know your parser, know what you are working with.
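However a given parser treats the tag, the values themselves are ordinary base64 text, which can be checked with nothing but the Python standard library. This is a minimal sketch using the two strings from the example above.

```python
import base64

# The two values from the Secret example above.
username = base64.b64decode("YWRtaW4=").decode("utf-8")
password = base64.b64decode("MWYyZDFlMmU2N2Rm").decode("utf-8")

# Going the other way, to prepare a value for a manifest.
encoded_again = base64.b64encode(b"admin").decode("ascii")
```

Since anyone can reverse it this way, base64 here is only an encoding, not a form of encryption.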
YAML tools
If you've ever sat down with just a plain text editor, that's all YAML is. You can use whatever you want: EMACS, vi, you know, whichever religious order you are part of, feel free. But there are definitely tools that are on the market that are there to kind of help you use YAML more easily. For example, I love the fact that there are so many open source projects. Also, as I said earlier, I wasn't aware of a language that didn't have a YAML parser. So if you go to yaml.org, you will see tons and tons and tons and tons of options. So you can definitely go there, pick whatever you need from whatever language: there are test suites, there are different versions, and so on.
There are also tools that will allow you to easily work with your YAML. For example, this is a whole collection online at onlineyamltools.com. By the way, I don't work for any of these. I'm not being compensated. They're just fun things that I have found. So for example, if I were to take my data here, it's already color coded.
And for example, I use VSCode. VSCode already understands YAML because I have the YAML plugin. So it does that for me. But if I didn't, I could take it in here. And it would color code for me, or I could use minify.
Minify: what it does is it takes your YAML and makes it smaller. Now remember our spacing and our indents. All that's crucial to YAML; you can't really minify YAML itself. But you can turn it into its JSON version, and minify that.
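The same idea can be sketched locally, assuming PyYAML is installed: load the YAML, then write it out as compact JSON using the standard json module. The sample document is made up for the illustration, and this is not how the online tool itself is implemented.

```python
import json
import yaml  # PyYAML; assumed to be installed

source = """\
name: nginx
ports:
  - 80
  - 443
"""
data = yaml.safe_load(source)

# separators=(",", ":") drops the spaces json.dumps adds by default.
minified = json.dumps(data, separators=(",", ":"))

# The compact JSON loads back through the YAML parser to the same data.
round_trip = yaml.safe_load(minified)
```

The result is a single line with no optional whitespace, and loading it again gives back the original data.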
There are also a ton of tools for converting, you know, YAML to JSON, YAML to XML, vice versa, you can also convert them to a class. So for example, if I wanted to convert this to a class, here's my class definition in Python, and then I could then, you know, set my data, and so on. So this is something that I would then want to use when I was doing my programming.
Linting is one of my favorite things here. So a lot of times what will happen is you'll wind up with a complicated object. So for example, let me try to go back here. Let's say I have a complicated demo object, and it's not working for some reason, and I can't figure out why. What I can do is run it through a YAML linter. And what that will do is, it will find if there is an error, so in this case, there is an error and it'll clean it up to what it thinks you want. So be careful.
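A very small syntax check in the same spirit can be sketched with PyYAML, assuming it is installed. The function name find_yaml_error is hypothetical. Dedicated linters also check style and can suggest fixes; this sketch only reports where parsing fails.

```python
import yaml  # PyYAML; assumed to be installed


def find_yaml_error(text):
    """Return None if the text parses, otherwise a short error description."""
    try:
        # Consume every document so errors late in the stream are found too.
        list(yaml.safe_load_all(text))
    except yaml.YAMLError as exc:
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            # Marks are zero-based; report one-based positions.
            return f"line {mark.line + 1}, column {mark.column + 1}: {exc.problem}"
        return str(exc)
    return None


good = find_yaml_error("ports: [80, 443]\n")
bad = find_yaml_error("ports: [80, 443\n")  # closing bracket is missing
```

For the valid input the function returns None, and for the broken one it returns a message pointing at the place where the parser gave up.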
And then one last thing, StrictYAML. There is a movement to create something that is a little less loosey goosey and more predictable. The good thing about having things flexible is they're easy to use. The bad thing about having things flexible is they can be really difficult to use. So in this case, you can check out StrictYAML.
So that was part 1! Please join us for Getting Started with Kubernetes, Part 2: Creating K8s objects with YAML on March 18!