Getting started with Kubernetes, part 1: Introduction to YAML [webinar]

Nick Chase - March 03, 2021
Last week, we were pleased to present the first in a new series of webinars, Getting Started with Kubernetes, Part 1: Introduction to YAML.  During that webinar we received dozens of questions -- way more than we could answer in the short time that we had, so we're going to be putting out a separate blog that answers all of those questions in the next few days. In the meantime, though, we wanted to bring you the webinar itself, as well as the transcript (for those people, like me, who like to skim down to the parts they're looking for.)

If you have any questions, or if there are particular topics you'd like us to do training webinars on, please put them in the comments!
 

Webinar Transcript:

Alright, so there are different uses for YAML. You can see YAML in things like configurations for Kubernetes or Swarm, or in lots of other things like OpenStack templates, Ansible, and Maven; you see it all over the place. And we are going to cover that as we go through, so we'll talk about the different uses. We'll talk about the overall structure. There are actually data types, even though this is all, you know, text, and then we will look at some of the tools that are available for you.

Use cases for YAML

Alright, so let's look at the different uses for YAML. As I said before, configurations, templates and so on. So for example, in Kubernetes, we use YAML for defining things like pods. So a pod is a unit of workload that Kubernetes orchestrates. 
apiVersion: v1
kind: Pod
metadata:
  name: rss-site
  labels:
    app: web
spec:
  containers:
    - name: front-end
      image: nginx
      ports:
        - containerPort: 80
    - name: rss-reader
      image: nickchase/rss-php-nginx:v1
      ports:
        - containerPort: 88
Swarm, as you can see here, is totally different, even though they are both talking about orchestrating containers. 
version: "3.9"

services:
  web:
    image: 127.0.0.1:5000/stackdemo
    build: .
    ports:
      - "8000:8000"
  redis:
    image: redis:alpine
So this is OpenStack. You'll notice they're all doing different things. But what you're seeing is a format, okay, and that format basically takes care of what makes it YAML.
heat_template_version: 2015-04-30

parameters:
  key_name:
    type: string
    description: Name of a KeyPair

resources:
  server:
    type: OS::Nova::Server
    properties:
      key_name: {get_param: key_name}
      flavor: m1.small
      image: ubuntu-trusty-x86_64
So, you've got your text, you've got indentations, such as in this Ansible snippet...
tasks:
  - action: uri url=http://www.example.com return_content=yes
    register: webpage
  - fail:
      msg: 'service is not happy'
    when: "'AWESOME' not in webpage.content"
... or no indentations depending on the situation, such as this Maven snippet.
modelVersion: 4.0.0
groupId: io.takari.polyglot
artifactId: yaml-project
version: 0.0.1-SNAPSHOT
name: 'YAML Maven Love'

properties: {sisuInjectVersion: 0.0.0.M2a}

What makes a YAML document a YAML document?

 

But you'll notice there's no common vocabulary. And that's important to understand because YAML is not a "language" in that you have to learn all the keywords and everything like that. It's a type of markup language.
(Although, technically speaking, YAML stands for YAML Ain't Markup Language -- or Yet Another Markup Language. It depends who you ask.)
So there's no specific vocabulary. It's based on the formatting and it's made to be predictable.

Overall YAML structure

Alright, so let's look at overall YAML structure to start out with. 

Associative arrays in YAML

A very simple YAML document would be something like these associative arrays. 
name: nginx
image: nginx:1.10
Okay, now you've dealt with this in all kinds of other languages. You've got a key and a value, or a name and the data that goes with it. So this is kind of the simplest YAML document that you can have. And we can put these associative arrays all over other documents. 
Now, here's the interesting thing that we have to make note of: this is our traditional YAML format. But YAML is actually a superset of JSON. So if it's valid JSON, it's valid YAML. For example, if I wanted to write this YAML document as JSON, I could do that. 
{
"name": "nginx",
"image": "nginx:1.10"
}
And this format you're familiar with, okay, you've seen this a thousand times. It's the same thing.
Now, a couple of things to note here. See how there's a colon in the value nginx:1.10, and we have not put it in quotes. That's why you must put a space after the colon in your associative arrays; the space is what separates the key from the value. If it was JSON, it wouldn't matter because you've got your quotes setting everything off, but that's the price you pay for simplicity. 
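To see both points in action, here is a minimal sketch using PyYAML, the parser used later in this webinar (it assumes you have run pip install PyYAML). It loads the YAML and the JSON versions and compares them, then shows what happens when the space after the colon is left out. The variable names are just for illustration.

```python
import yaml

yaml_text = "name: nginx\nimage: nginx:1.10\n"
json_text = '{\n"name": "nginx",\n"image": "nginx:1.10"\n}'

from_yaml = yaml.safe_load(yaml_text)
from_json = yaml.safe_load(json_text)  # the JSON is handed to the YAML parser as-is
print(from_yaml == from_json)

# Without the space after the colon there is no key/value pair at all:
# the parser reads the whole thing as one plain string.
no_space = yaml.safe_load("name:nginx")
print(type(no_space), no_space)
```

Note that the colon inside nginx:1.10 causes no trouble, because it isn't followed by a space.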

Nested values in YAML

Okay, now, another thing that we can do is nested values. So for example, the value of the metadata in this document is another associative array, and in that case, that would be name: pod-example. So the way that we associate this array with this name is to indent it. 
apiVersion: v1
kind: Pod
metadata:
  name: pod-example
spec:
  name: ubuntu
  image: ubuntu:trusty
  command: ["myscript"]
  args: ["arg1", "arg2"]
Now, like with other indent-dependent languages, you need to make sure that you are consistent. So if you're going to use two spaces, that's fine. If you're going to use four spaces, that's fine. But you must be consistent. 
The other thing is make sure that you never ever, ever use tabs instead of spaces; you must use spaces to indent your YAML.
So in this case, you can see that we are indenting using two spaces, and we've got two objects that have nested values. Also, notice that there are multiple items here that are all part of spec and since they're all indented, we know what they all are. 
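As a rough illustration, here is how a nested value looks once it has been parsed, and what happens if a tab sneaks in. This is a sketch that assumes PyYAML; other parsers may word the error differently.

```python
import yaml

spaces = "metadata:\n  name: pod-example\n"
tabs = "metadata:\n\tname: pod-example\n"

# The indented block becomes a dictionary nested inside the outer one.
data = yaml.safe_load(spaces)
print(data["metadata"]["name"])

# The same document indented with a tab is rejected by the parser.
try:
    yaml.safe_load(tabs)
    tab_error = None
except yaml.YAMLError as err:
    tab_error = err
    print("tab-indented YAML was rejected:", type(err).__name__)
```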

Sequences in YAML

In this case, we have five items, each of which is an item of seo_metadata. But as you saw earlier, we had one spec with four parameters. These hyphens are the important part here; they're what indicate that we have a single list of items, and that list is the value of seo_metadata.
seo_metadata:
- One weird trick to reduce belly fat
- You'll never believe these texts!
- Adorable dogs cavorting in the snow
- Which celebrities hate other celebrities
- Kubernetes, Kubernetes, Why Kubernetes, Kubernetes Why
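If you load that sequence with a parser, you get a single key whose value is a list. The sketch below assumes PyYAML; it also shows that the commas in the last item do not split it, because in this block style only the hyphens start new items.

```python
import yaml

text = """\
seo_metadata:
- One weird trick to reduce belly fat
- You'll never believe these texts!
- Adorable dogs cavorting in the snow
- Which celebrities hate other celebrities
- Kubernetes, Kubernetes, Why Kubernetes, Kubernetes Why
"""

data = yaml.safe_load(text)
items = data["seo_metadata"]
print(type(items), len(items))
print(items[-1])  # still one string, commas and all
```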

Multiple documents in a single YAML file

You can put multiple objects in a single YAML document, and the way that you do that is by separating them with these three dashes. Okay, so here, I have two different objects (they happen to be the same) but there are two objects in this document.
---
apiVersion: v1
metadata:
  name: pod-example
spec:
  name: ubuntu
  image: ubuntu:trusty
  command: ["myscript"]
---
apiVersion: v1
metadata:
  name: pod-example
spec:
  name: ubuntu
  image: ubuntu:trusty
  command: ["myscript"]
[[ NOTE:  Although for clarity's sake the webinar talked about multiple objects in a single document, the actual terms used in YAML are that there are multiple documents in a single YAML file. ]]
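From the parser's point of view, each section between the three dashes comes back as its own object. Here is a small sketch, assuming PyYAML and a shortened version of the file above:

```python
import yaml

text = """\
---
apiVersion: v1
metadata:
  name: pod-example
---
apiVersion: v1
metadata:
  name: pod-example
"""

# safe_load_all yields one Python object per YAML document in the stream.
docs = list(yaml.safe_load_all(text))
print(len(docs))
print(docs[0] == docs[1])
```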

YAML comments

Another thing for the basic format that we need to know is how you put in comments. 
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  # Unique key of the DaemonSet instance
  name: daemonset-example
spec:
  template:
    spec:
      containers:
      # This container is run once on each Node in the cluster
      - name: daemonset-example
        image: ubuntu:trusty
        command:
        - /bin/sh
        args:
        - -c
        # This script is run through
        # `sh -c <script>`
        - >-
          while [ true ]; do
          echo "DaemonSet running on $(hostname)" ;
          sleep 10 ;
          done
This one has a lot of comments in it; they are all preceded by the hash symbol, or pound sign, depending on where you are and what you like to call it. Okay, so you can see, the comments here are in bold. 
And here, we can also see some of the flexibility. Let's kind of work our way through this to see how all of this structure manifested itself. We have metadata, we have one item of metadata with a comment. The spec consists of a template, which consists of another spec, which consists of one or more containers. And those containers are set off by a hyphen (-). So if we were going to put another container in here, it would be down here at the same level with another hyphen. 
The definition of this container has multiple values, including command and arguments. So here, you can see we've got an argument that's set off by the hyphen, it also has a hyphen, but it doesn't matter, because we've got that space in there. So that's fine. 
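Here is a trimmed-down sketch of those two points, assuming PyYAML. The keys are borrowed from the DaemonSet above, but the document is cut down for illustration and is not a complete Kubernetes object.

```python
import yaml

text = """\
metadata:
  # Unique key of the DaemonSet instance
  name: daemonset-example
args:
# The first hyphen marks a list item; "-c" is the item itself
- -c
"""

data = yaml.safe_load(text)
print(data)
```

The comments never reach the loaded data, and the argument comes through as the plain string -c.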
Hopefully, that was all clear. If you have any questions, put them into the question box, and we will get to them at the end. Next, we'll talk about data types. 

Data types in YAML

So far, that's all text, which is fine. The problem and the opportunity here are that these YAML files are typically used by other software to DO things. You know, to configure your Kubernetes, or to install your application, or to define your OpenStack Heat stack, whatever. And as such, there are going to be times when the type of the data does actually matter. Is that a string? Is it an integer? Is it a hex value? And YAML actually does define those things, so that your software can get at them.
There are built-in data types. 
  • str
  • map/dictionary
  • int
  • float
  • boolean
  • base64
  • binary
  • set/list
  • timestamp
  • hex
Now, here's the thing that you have to understand. The YAML specification defines all these things, which is great, but not every YAML parser is going to support all of these types. For example, I'm going to use PyYAML in this webinar, and I'll show you how we're going to build that script in a second. But PyYAML, or at least the version of PyYAML that I have, doesn't support base64 directly. So the main thing is that you have to understand what your parser supports and deal with it appropriately. For example, if I wanted to use base64, I would have to have it in there as a string, and then manipulate it through my Python script.
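That workaround might look something like the following sketch. It assumes PyYAML plus Python's standard base64 module; the username key and its value are only sample data.

```python
import base64
import yaml

# The parser sees an ordinary string...
data = yaml.safe_load("username: YWRtaW4=\n")
encoded = data["username"]
print(type(encoded), encoded)

# ...and the script does the base64 decoding itself.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```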

Reading a YAML file with Python

Okay, so let's take a quick look at how we would actually read a YAML file. If I were going to install PyYAML, (because I use Python, that's just sort of my go-to language, you don't have to; there are, as we'll see later, libraries for YAML in pretty much every language) I'm just going to do 
pip install PyYAML
Then I've got my script.
import yaml

with open("/Users/nchase/Documents/yamlwebinar/data.yml") as f:
    docs = yaml.load_all(f, Loader=yaml.FullLoader)
    for doc in docs:
        for k, v in doc.items():
            print(k, type(v), v)
As you can see, I'm importing the YAML package itself, and then I'm going to open the file, just as you normally would. I'm going to use the YAML package to load all of the documents in there. 
So remember, we said that we can separate them with those three dashes. And then we're going to run through each one. And for each one, in my case, I know that I'm going to have just associative arrays. So I want to print out the name and the value. So I'm going to grab those, I'm going to get the name, the type of the value, and then the value itself.
So if I were to grab all of these, let's take a look at them or put them in a file:
this_is_a_list:
 - one
 - two
---
this_is_a_string: one
---
this_is_an_integer: 1
---
this_is_a_float: 1.0
---
this_is_a_dictionary:
 one: 1
So we're used to the list, we saw the sequences. If we just take the expression one, all right, we can express it as a string, an integer, a float, a dictionary. Now notice, I'm not really doing anything different there. So will YAML be able to tell the difference between an integer and a float based on just what's there? 
this_is_a_list <class 'list'> ['one', 'two']
this_is_a_string <class 'str'> one
this_is_an_integer <class 'int'> 1
this_is_a_float <class 'float'> 1.0
this_is_a_dictionary <class 'dict'> {'one': 1}
Alright, it noticed that it was a list and it knew that it was this array of two items. It recognized the string, it recognized the integer, it recognized the float because we had the decimal point and the following zero, and it recognized the dictionary. Notice that it's printed here in a JSON-like format. 
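If you want to try this without creating a file, here is a self-contained variation on the script. It is a sketch that swaps in yaml.safe_load_all and an inline string in place of the FullLoader and the data.yml file used in the webinar; for these plain types the results should match.

```python
import yaml

text = """\
this_is_a_list:
 - one
 - two
---
this_is_a_string: one
---
this_is_an_integer: 1
---
this_is_a_float: 1.0
---
this_is_a_dictionary:
 one: 1
"""

# Collect every key from every document into one dictionary for easy checking.
results = {}
for doc in yaml.safe_load_all(text):
    for k, v in doc.items():
        results[k] = v
        print(k, type(v), v)
```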
Okay, so let's look at our next set: 
---
hex: 0x4824
---
octal: 010
---
date: 2007-06-01
---
boolean: Off
---
boolean: True
---
boolean: Yes # Note that this depends on YAML version
---
this_is_another_string: "1.0"
So hex value, again, this is a string, or is it a number? And octal values? Let's see what we get on these. Let's run the script. 
hex <class 'int'> 18468
octal <class 'int'> 8
date <class 'datetime.date'> 2007-06-01
boolean <class 'bool'> False
boolean <class 'bool'> True
boolean <class 'bool'> True
this_is_another_string <class 'str'> 1.0
Hex comes out as an integer. In this parser, the type is noted as an integer, but you'll notice that the value is not "zero x 4824" (0x4824). That little "zero x" at the beginning was the cue to the YAML parser that we're talking about a hexadecimal value here. And what it's done is it's translated that into a base 10 integer. 
It's done the same with the octal value, octal being base eight. For those of you who are far enough away from math class, our normal numbers are base 10. So it'd be 0123456789 and then we go over to the next column. For octal, that would be 01234567 and then we go over to the next column. So this is actually a value of 8, and YAML knows this because we preceded the number with a zero. 
Okay, so you need to make sure you understand these things, because if you just have a leading zero in front of your integers, and all of a sudden you're getting weird, crazy values, your YAML parser may think that you're dealing in base eight with octal values. 
It does recognize dates. Now remember, I showed you timestamps; some parsers will recognize the actual timestamp value. This one does not.
You'll notice that all of these were recognized as Booleans. They're just strings, but they were recognized as Booleans. So On/Off, True/False, Yes/No were all recognized as Booleans. Now, as far as Yes/No, be careful, because this depends on the version of YAML. In YAML 1.1, Yes/No is recognized as a Boolean. However, in YAML 1.2, it is not; it's just a string.
So be careful. Ah, and of course, you know, just putting your quotes around something is going to make it into a string, even if it normally would be recognized as something else. 
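Here is a quick way to check what your own parser does with these values. The sketch assumes PyYAML, which follows the YAML 1.1 rules described above; a YAML 1.2 parser may well give different answers for 010 and Yes.

```python
import datetime
import yaml

samples = ["0x4824", "010", "2007-06-01", "Off", "Yes", '"1.0"']

# Parse each snippet on its own and keep the result next to the original text.
values = {text: yaml.safe_load(text) for text in samples}
for text, value in values.items():
    print(text, type(value), value)
```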
Going back to our data types, there are other ways that we can deal with data types, and that would be to sort of force what we want on them. 
picture: !!binary |
 R0lGODdhDQAIAIAAAAAAANn
 Z2SwAAAAADQAIAAACF4SDGQ
 ar3xxbJ9p0qa7R0YxwzaFME
 1IAADs=
---
this_is_an_integer: !!float 1
---
this_is_a_float: !!int "1"
So I can go ahead and tell YAML specifically that I want to use a built-in type. So this one is binary data, this one I want to force to be a float, and this one I can try to force to be an integer. Let's save the file and see what happens when we do all that. 
picture <class 'bytes'> b"GIF87a\r\x00\x08\x00\x80\x00\x00\x00\x00\x00\xd9\xd9\xd9,\x00\x00\x00\x00\r\x00\x08\x00\x00\x02\x17\x84\x83\x19\x06\xab\xdf\x1c['\xdat\xa9\xae\xd1\xd1\x8cp\xcd\xa1L\x13R\x00\x00;"
this_is_an_integer <class 'float'> 1.0
this_is_a_float <class 'int'> 1
Okay. So you'll notice now that our binary data has been translated. Our float has been made into a float now. And also our integer, even though this was in quotes, was translated into an integer. But of course, if you had the word "one", that obviously is not going to fly.
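The same tags can be tried out in a few lines. This is a sketch assuming PyYAML; the value key and the short base64 string are made up for illustration, and the exact exception raised for the word "one" may differ between parsers and versions, so the code catches both likely kinds.

```python
import yaml

forced_float = yaml.safe_load("value: !!float 1")["value"]
forced_int = yaml.safe_load('value: !!int "1"')["value"]
raw_bytes = yaml.safe_load("value: !!binary aGVsbG8=")["value"]
print(type(forced_float), forced_float)
print(type(forced_int), forced_int)
print(type(raw_bytes), raw_bytes)

# Forcing something that is not a number into an integer does not work.
try:
    yaml.safe_load('value: !!int "one"')
    failed = False
except (ValueError, yaml.YAMLError):
    failed = True
print("forcing 'one' to an int failed:", failed)
```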
One more thing on how these work. This is not so much about data types. But remember, I told you, we were talking about these formats here. So a lot of times in YAML, you're going to have multi-line data, and you're going to have to deal with it. There are two ways that YAML deals with multi-line data. 
example: >
       HTML goes into YAML
       without modification

message: |
       <blockquote style="font: italic 1em serif">
       <p>"Three is always greater than two,
          even for large values of two"</p>
       <p>--Author Unknown</p>
       </blockquote>
In this case, you'll notice I'm using a greater than sign (>). I want to take all of this text, and I want it to be just one string. Now notice, I have indented it here. Because if I were to put this out here...
example: >
       HTML goes into YAML
without modification
then it would be an error because the parser wouldn't know that it was part of the example. So let's look at the difference between the greater than sign and the pipe (|). 
Alright, so if I were to come back here, save this file and run it...
example <class 'str'> HTML goes into YAML without modification

message <class 'str'> <blockquote style="font: italic 1em serif">
<p>"Three is always greater than two,
   even for large values of two"</p>
<p>--Author Unknown</p>
</blockquote>
... let's take a look. What I see here is that our first string has been put into a single line, even though it was on multiple lines. However, when I use the pipe, the pipe then goes ahead, and it sets it up so that the formatting is preserved. You'll notice everything is preserved, including this blank line here. So if you need to preserve your spaces, within a block, you can use the pipe to do that. 
Now, notice all of this whitespace is gone. These are up against the left margin here, even though there's plenty of whitespace here, because technically speaking, as part of the data, that whitespace does not exist. That whitespace is solely to identify that that text is part of the message object. 
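The difference is easiest to see side by side. Here is a minimal sketch, assuming PyYAML, that loads the same two lines once with the greater than sign and once with the pipe; the folded and literal key names are just labels for the example.

```python
import yaml

text = """\
folded: >
  line one
  line two
literal: |
  line one
  line two
"""

data = yaml.safe_load(text)
print(repr(data["folded"]))   # line break folded into a space
print(repr(data["literal"]))  # line break kept
```

In both cases the two spaces of indentation are not part of the resulting string.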
So, one place that you see this come up a lot is in things like ConfigMaps for Kubernetes. 
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: default
data:
  special.how: very
  weight: 42
  picture: |
    R0lGODdhDQAIAIAAAAAAANn
    Z2SwAAAAADQAIAAACF4SDGQ
    ar3xxbJ9p0qa7R0YxwzaFME
    1IAADs=
So, you can see an example of this, where you've got different pieces of data specified by different names, and you're being very specific about what you want. Oh, quick note here: notice that you can put a dot (.) in your key names, so your associative arrays can have that for their names. 
Another place that this comes up a lot is in passing secrets to Kubernetes. 
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  username: !!base64 YWRtaW4=
  password: !!base64 MWYyZDFlMmU2N2Rm
Again, we are using the base64 type here. My parser doesn't recognize it, but the parser that's used by Kubernetes does. So that's where, again, it's important to know your parser, know what you are working with. 

YAML tools

If you've ever sat down with just a plain text editor, that's all YAML is. You can use whatever you want: EMACS, vi, you know, whichever religious order you are part of, feel free. But there are definitely tools that are on the market that are there to kind of help you use YAML more easily. 
For example, I love the fact that there are so many open source projects. Also, as I said earlier, I wasn't aware of a language that didn't have a YAML parser. So if you go to yaml.org, you will see tons and tons and tons and tons of options. So you can definitely go there, pick whatever you need from whatever language: there's test suites, there's different versions, and so on. 
There are also tools that will allow you to easily work with your YAML. For example, this is a whole collection online at onlineyamltools.com. By the way, I don't work for any of these. I'm not being compensated. They're just fun things that I have found. So for example, if I were to take my data here it's already color coded. 
And for example, I use VSCode. VSCode already understands YAML because I have the YAML plugin. So it does that for me. But if I didn't, I could take it in here. And it would color code for me, or I could use minify. 
What Minify does is take your YAML and make it smaller. Now remember our spacing and our indents; all of that is crucial to YAML, so you can't really minify YAML itself. But you can turn it into its JSON version, and minify that. 
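You don't need an online tool for that particular trick. As a rough sketch, assuming PyYAML and Python's standard json module, you can load the YAML and write it back out as compact JSON:

```python
import json
import yaml

text = "name: nginx\nimage: nginx:1.10\n"

data = yaml.safe_load(text)
# separators=(",", ":") drops the optional spaces that json.dumps adds by default.
minified = json.dumps(data, separators=(",", ":"))
print(minified)

# Because valid JSON is valid YAML, the minified text loads back to the same data.
print(yaml.safe_load(minified) == data)
```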
There are also a ton of tools for converting, you know, YAML to JSON, YAML to XML, vice versa, you can also convert them to a class. So for example, if I wanted to convert this to a class, here's my class definition in Python, and then I could then, you know, set my data, and so on. So this is something that I would then want to use when I was doing my programming. 
Linting is one of my favorite things here. So a lot of times what will happen is you'll wind up with a complicated object. So for example, let me try to go back here. Let's say I have a complicated demo object, and it's not working for some reason, and I can't figure out why. What I can do is run it through a YAML linter. And what that will do is, it will find if there is an error, so in this case, there is an error and it'll clean it up to what it thinks you want. So be careful. 
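If you just want the "is there an error?" half of linting, a parser can do that for you. The lint function below is a hypothetical helper, not a real linter and not part of PyYAML; it only reports whether the text parses and does not attempt to clean anything up.

```python
import yaml

def lint(text):
    """Return None if text parses as YAML, otherwise the parser's error message."""
    try:
        # Walk through every document so that errors anywhere in the file surface.
        for _ in yaml.safe_load_all(text):
            pass
    except yaml.YAMLError as err:
        return str(err)
    return None

good = lint('command: ["myscript"]\n')
bad = lint('command: ["myscript"\n')  # missing closing bracket
print(good)
print(bad)
```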
And then one last thing, StrictYAML. There is a movement to create something that is a little less loosey goosey and more predictable. The good thing about having things flexible is they're easy to use. The bad thing about having things flexible is they can be really difficult to use. So in this case, you can check out StrictYAML.
So that was part 1!  Please join us for Getting Started with Kubernetes, Part 2: Creating K8s objects with YAML on March 18!

 
