YAML Files: Linting and Formatting

Keeping the YAML files in your project tidy is useful. Not only does it improve readability, it also helps avoid misunderstandings and bugs.

Luckily, there are many tools that help with “linting” and formatting YAML files. I will introduce two of them in this short article, and I will also cover the handling of long strings in detail. So let us start.

Linting

yamllint is an open-source command line tool written in Python by Adrien Vergé and hosted on GitHub at https://github.com/adrienverge/yamllint/.

You can install it using pip with:

pip install --user yamllint

Once installed, the tool can be invoked directly from the command line with:

yamllint some_file.yaml

The tool checks that the syntax is valid. It also highlights duplicated keys, as well as non-breaking formatting issues such as line length, trailing spaces, indentation, and spaces before comments.

Depending on the content of the file, the output looks like the following example:

yamllint some_file.yaml
./some_file.yaml
  24:81     error    line too long (152 > 80 characters)  (line-length)
  26:81     error    line too long (156 > 80 characters)  (line-length)
  39:30     warning  truthy value should be one of [false, true]  (truthy)
  52:81     error    line too long (144 > 80 characters)  (line-length) 

The tool is well maintained and well documented; the docs can be found at https://yamllint.readthedocs.io/en/stable/.

To use yamllint consistently, it is recommended to add a git hook that invokes yamllint before committing the code to the git repository.

One way to do that is to integrate yamllint with pre-commit, a tool that almost magically adds Git pre-commit hooks to any project. pre-commit (https://pre-commit.com/) is a wonderful tool that I encourage all developers to use to make sure simple mistakes and bugs are not introduced into the code by accident; it can also be used to enforce code formatting styles. Integrating yamllint with pre-commit is detailed here: https://yamllint.readthedocs.io/en/stable/integration.html.

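If you are into automation like me, you will want to know how yamllint reports results through its exit code. By default, it exits with 0 when it finds no errors (warnings alone do not change the exit code) and with 1 when it finds at least one error. With the --strict (-s) option, warnings make it exit with 2, so only a file with neither errors nor warnings exits with 0.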
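To see these exit codes without leaving Python, here is a minimal sketch. It assumes yamllint is installed and calls yamllint.cli.run(), the function behind the command line tool, which ends by calling sys.exit(); the helper name yamllint_exit_code is hypothetical. The -d option pins the built-in default rules so that a local .yamllint file cannot change the result.

```python
import os
import tempfile

from yamllint import cli  # assumes `pip install yamllint`


def yamllint_exit_code(content: str, strict: bool = False) -> int:
    """Hypothetical helper: lint `content` as a file and return yamllint's exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.yaml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(content)

        # Pin the default rule set so a project .yamllint file cannot interfere.
        argv = ["-d", "{extends: default}", path]
        if strict:
            argv.insert(0, "--strict")

        try:
            cli.run(argv)  # prints the problems, then calls sys.exit()
        except SystemExit as exit_request:
            return int(exit_request.code or 0)
    return 0  # not reached in practice: cli.run() always exits


if __name__ == "__main__":
    clean = "---\nkey: value\n"
    warning_only = "key: value\n"  # missing "---": document-start warning
    error = "---\nkey: value"      # no final newline: new-line-at-end-of-file error

    print(yamllint_exit_code(clean))                      # 0
    print(yamllint_exit_code(warning_only))               # 0
    print(yamllint_exit_code(warning_only, strict=True))  # 2
    print(yamllint_exit_code(error))                      # 1
```

This is why the pre-commit configuration below passes --strict: without it, a file with only warnings would still let the commit through.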

It is also worth mentioning that yamllint can be imported as a Python package and used programmatically to lint YAML files. In fact, I use this technique in my test suite, for example to ensure that the YAML files in a project pass the linting checks.
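A minimal sketch of that idea, assuming yamllint is installed: yamllint.linter.run() takes the YAML text and a YamlLintConfig and yields problem objects with line, column, level, and rule attributes. The lint helper and the test function are hypothetical names; in a real suite you would read each project file from disk and probably load the project's own configuration instead of the defaults.

```python
from __future__ import annotations

from yamllint import linter
from yamllint.config import YamlLintConfig

# Same rules as the command line default; replace with your project's configuration.
CONFIG = YamlLintConfig("extends: default")


def lint(content: str) -> list[tuple[int, int, str, str]]:
    """Hypothetical helper: return (line, column, level, rule) for each problem found."""
    return [(p.line, p.column, p.level, p.rule) for p in linter.run(content, CONFIG)]


def test_config_is_clean() -> None:
    # Hypothetical pytest test; a real one would iterate over the project's YAML files.
    assert lint("---\nkey: value\n") == []


if __name__ == "__main__":
    test_config_is_clean()
    # A duplicated key is reported as an error on the line of the second occurrence.
    print(lint("---\nkey: a\nkey: b\n"))  # [(3, 1, 'error', 'key-duplicates')]
```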

Formatting

The output of yamllint can be used to guide manual formatting. Let’s take the following example:

yamllint some_file.yaml
./some_file.yaml
  1:1       error    too many blank lines (1 > 0)  (empty-lines)
  2:1       warning  missing document start "---"  (document-start)
  26:14     warning  missing starting space in comment  (comments)
  26:13     warning  missing starting space in comment  (comments)
  28:14     warning  comment not indented like content  (comments-indentation)
  277:81    error    line too long (90 > 80 characters)  (line-length)
  299:27    error    no new line character at the end of file  (new-line-at-end-of-file)

The first two issues indicate that some_file.yaml does not start with the document start marker "---" but with an empty line. To fix both, replace the empty line with --- (three dashes, without the quotes). The next warnings concern comments: "missing starting space in comment" means that the comment token # and the comment text after it are not separated by a space, so the fix is to insert that space manually. Indenting the comment at 28:14 to match the indentation of the surrounding content resolves the remaining comment warning.
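The same checks can be reproduced on a tiny, made-up document. This is a sketch assuming yamllint is installed: the BEFORE text starts with an empty line and has a comment without a space after #, while the AFTER text applies the fixes described above. The rules_hit helper is a hypothetical name.

```python
from yamllint import linter
from yamllint.config import YamlLintConfig

CONFIG = YamlLintConfig("extends: default")

# Made-up document reproducing the problems: leading empty line, no "---",
# and a comment whose text touches the # token.
BEFORE = "\nkey: value  #no space after the hash\n"
# Fixed version: "---" replaces the empty line and the comment starts with "# ".
AFTER = "---\nkey: value  # space after the hash\n"


def rules_hit(content: str) -> list:
    """Hypothetical helper: sorted names of the yamllint rules that report a problem."""
    return sorted({problem.rule for problem in linter.run(content, CONFIG)})


print(rules_hit(BEFORE))  # ['comments', 'document-start', 'empty-lines']
print(rules_hit(AFTER))   # []
```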

The error at 277:81 states that the line is simply too long for the rules configured in yamllint, where the limit is 80 characters. This issue is not trivial to fix, which is why a dedicated section covers it later in this post; the general idea is to break the long string over several lines without introducing newline characters where they are not needed.

The last error can be fixed by adding a newline character at the very end of the document.

Manual formatting works, but it would be nice if yamllint fixed these issues as it detects them, like black and similar tools do. Sadly, this is not planned for yamllint.

Luckily, someone already thought of this and wrote a hook that integrates with pre-commit and formats broken YAML files. The open-source pre-commit hook is called yamlfmt and can be found here: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt. To use the hook, add it to the .pre-commit-config.yaml file in combination with yamllint, as explained in the documentation:

repos:
  - repo: https://github.com/adrienverge/yamllint.git
    rev: v1.27.1  # or higher tag
    hooks:
      - id: yamllint
        args: [--format, parsable, --strict]

  - repo: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt
    rev: 0.2.2  # or other specific tag
    hooks:
      - id: yamlfmt

The auto-formatting hook can fix most of the issues highlighted by yamllint, but not all of them. Notably, long-string errors cannot be resolved with yamlfmt, hence the need to format long strings manually.

There are other tools that focus on fixing YAML issues, such as yamlfix (https://github.com/lyz-code/yamlfix) and yamlfixer (https://github.com/opt-nc/yamlfixer). I might update or expand this post with my experience of both tools once I have finished testing them thoroughly.

Handling long strings

Now let us focus on handling long strings. What does the error actually mean? In short, a line contains more characters than the line-length rule configured in yamllint allows.

The default limit is 80 characters: any line longer than that triggers the error (by default, a line made of a single unbreakable word, such as a long URL, is exempt).
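The limit is the max option of yamllint’s line-length rule, so a project can raise it in its configuration. Here is a sketch assuming yamllint is installed, comparing the default configuration with a hypothetical project configuration that allows 120 characters; line_length_messages is a hypothetical helper name.

```python
from yamllint import linter
from yamllint.config import YamlLintConfig

DEFAULT = YamlLintConfig("extends: default")
# Hypothetical project configuration that raises the limit to 120 characters.
WIDER = YamlLintConfig(
    "extends: default\n"
    "rules:\n"
    "  line-length:\n"
    "    max: 120\n"
)

# "key: " (5) + "very " * 20 (100) + "long string" (11) = 116 characters.
DOCUMENT = "---\nkey: " + "very " * 20 + "long string\n"


def line_length_messages(content: str, conf: YamlLintConfig) -> list:
    """Hypothetical helper: messages reported by the line-length rule only."""
    return [p.desc for p in linter.run(content, conf) if p.rule == "line-length"]


print(line_length_messages(DOCUMENT, DEFAULT))  # ['line too long (116 > 80 characters)']
print(line_length_messages(DOCUMENT, WIDER))    # []
```

Raising the limit only hides the problem, though; the rest of this section is about actually splitting the string.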

Let’s say we have the following line in the yaml file:

Key: 'this is my very very very very very very long string'

How do we break it into several lines? Would something like the following work?

Key: 'this is my very very very ' +
     'long string'

The answer is no: YAML has no string concatenation operator, so this is not valid YAML.
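A quick way to confirm this is to feed both versions to a YAML parser. The sketch below assumes PyYAML is installed (yamllint itself depends on it); is_valid_yaml is a hypothetical helper name.

```python
import yaml  # PyYAML, assumed installed (yamllint depends on it)

SINGLE_LINE = "Key: 'this is my very very very very very very long string'\n"
CONCATENATED = (
    "Key: 'this is my very very very ' +\n"
    "     'long string'\n"
)


def is_valid_yaml(text: str) -> bool:
    """Hypothetical helper: True if PyYAML can parse `text`."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        return False
    return True


print(is_valid_yaml(SINGLE_LINE))   # True
print(is_valid_yaml(CONCATENATED))  # False: "+" is not an operator in YAML
```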

There is a very nice Stack Overflow Q&A that covers this topic. In this blog post I reuse parts of the top answer by Steve Bennett, mostly for my own safekeeping.
There are several ways to write a string over multiple lines; here are the ones I care about:

Folded style >:

This style replaces single newlines within the string with spaces, converts double newlines into single ones, and adds one newline (\n) at the end of the string. Characters such as \ and " can be used without escaping. Lines with extra leading indentation keep that indentation and cause extra newlines. Example:

Key: >
  this is my very very very
  long string

and the result will be:

this is my very very very long string\n
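To check these folding rules, here is a sketch assuming PyYAML is installed; load_key is a hypothetical helper name. The second and third documents illustrate the double-newline and extra-indentation cases described above.

```python
import yaml  # PyYAML, assumed installed


def load_key(document: str) -> str:
    """Hypothetical helper: parse `document` and return the value of Key."""
    return yaml.safe_load(document)["Key"]


# The example above: single newlines are folded into spaces.
print(repr(load_key("Key: >\n  this is my very very very\n  long string\n")))
# 'this is my very very very long string\n'

# An empty line (double newline) becomes a single newline.
print(repr(load_key("Key: >\n  first paragraph\n\n  second paragraph\n")))
# 'first paragraph\nsecond paragraph\n'

# A more indented line keeps its extra spaces and is not folded.
print(repr(load_key("Key: >\n  a\n    b\n  c\n")))
# 'a\n  b\nc\n'
```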

Literal style |:

This style keeps every newline within the string as a literal newline and adds one at the end. Example:

Key: |
  this is my very very very
  long string

the result will be:

this is my very very very\nlong string\n
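The same can be checked with PyYAML (assumed installed). The second document is a made-up variant showing that trailing spaces are kept inside a literal scalar, one more reason to let yamllint’s trailing-spaces rule flag them.

```python
import yaml  # PyYAML, assumed installed

LITERAL = "Key: |\n  this is my very very very\n  long string\n"
print(repr(yaml.safe_load(LITERAL)["Key"]))
# 'this is my very very very\nlong string\n'

# A trailing space after "very" becomes part of the value.
WITH_TRAILING_SPACE = "Key: |\n  this is my very very very \n  long string\n"
print(repr(yaml.safe_load(WITH_TRAILING_SPACE)["Key"]))
# 'this is my very very very \nlong string\n'
```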

More on this topic can be found in the literal style section of the YAML specification.

Folded and literal block styles with a block chomping indicator (>-, |-, >+, |+)

You can control how the final newline and any trailing blank lines (\n\n) are handled by adding a block chomping indicator character:

Use >- or |- to “strip” the final line feed and remove the trailing blank lines.

Use >+ or |+ to “keep” the final line feed and the trailing blank lines.

Without an indicator, the default “clip” behavior keeps a single final line feed and removes the trailing blank lines.

Example:

Key: >-
  this is my very very very
  long string

The result will be:

this is my very very very long string

Note the missing newline at the end.
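To compare all the variants side by side, here is a sketch assuming PyYAML is installed; chomped is a hypothetical helper name. Each made-up document ends the block scalar with two blank lines followed by another key, so the difference between clip, strip, and keep is visible.

```python
import yaml  # PyYAML, assumed installed

# Two trailing blank lines follow the scalar, then another key.
BODY = "  this is my very very very\n  long string\n\n\n"


def chomped(indicator: str) -> str:
    """Hypothetical helper: value of Key when the block uses `indicator`."""
    document = f"Key: {indicator}\n{BODY}next: 1\n"
    return yaml.safe_load(document)["Key"]


for indicator in (">", ">-", ">+", "|", "|-", "|+"):
    print(f"{indicator:<2} {chomped(indicator)!r}")

# >  'this is my very very very long string\n'
# >- 'this is my very very very long string'
# >+ 'this is my very very very long string\n\n\n'
# |  'this is my very very very\nlong string\n'
# |- 'this is my very very very\nlong string'
# |+ 'this is my very very very\nlong string\n\n\n'
```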

Flow styles (plain, "double-quoted", 'single-quoted')

Flow scalars are another way to spread a string over several lines, but they will not be covered here. Please refer to the Stack Overflow answer referenced earlier, or to the YAML specification, for details about this method.

Conclusion

In this article I discussed several tools for linting, formatting, and fixing YAML files, as well as ways to write long strings over several lines. The goal is to keep the files in the repository free of formatting issues and to avoid accidental mistakes that might waste precious development time.