What is YAML?

Published: 11 December 2023
Contributors: Tasmiha Khan, Michael Goodwin

YAML is a versatile, human-readable data serialization language commonly used for writing configuration files.

It provides a standardized format for representing structured data that is both easy for humans to read and straightforward for machines to parse. “YAML” is a recursive acronym for "YAML Ain't Markup Language"; it originally stood for "Yet Another Markup Language." The current name underscores that the language is intended for data rather than documents.

At its core, YAML is designed with simplicity and readability in mind. It uses a clean and minimalistic syntax, relying on indentation, key-value pairs, and intuitive conventions. This approach allows developers and users to express complex data structures in a format that resembles natural language and is easy to comprehend at a glance.

The emphasis on human readability makes YAML well-suited to applications such as configuration (config) files and data exchange between different systems. Its straightforward structure lets users define and organize data clearly across different domains. YAML supports Unicode, allowing the representation of characters and symbols from many languages and character sets. A valid YAML document conforms to the YAML specification and parses without syntax errors.

YAML's adaptability makes it a versatile choice across a wide spectrum of applications. From configuration management to data exchange and automation, YAML's usability spans various domains, offering an accessible and structured means to represent and manage data.

YAML syntax and attributes

There are various attributes and key elements within YAML syntax. It is vital to understand the structure, data types and conventions used in YAML files to ensure efficient data representation and readability.

Map (dictionary)

In YAML, dictionaries are represented as mappings. They are a collection of key-value pairs where each key is associated with a value. This data structure resembles the concept of dictionaries or maps found in various programming languages. 
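
As a minimal sketch, a mapping that contains a nested mapping might look like the following. The key names and values are illustrative only.

```yaml
# A mapping with three keys; "address" holds a nested mapping.
name: Ada
profession: teacher
address:
  city: Lagos
  country: Nigeria
```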

Indentation

YAML syntax relies on indentation, measured in spaces, to represent the structure of data. Spaces, not tab characters (which are forbidden for indentation in YAML), denote hierarchy and nesting. Because indentation defines structure, it must be consistent throughout the YAML document.

Newlines mark the end of a line and separate elements such as key-value pairs and list items.
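
The sketch below shows how indentation expresses nesting. The keys are hypothetical, and two spaces per level is a common convention rather than a requirement.

```yaml
# Each nesting level is indented with spaces (two per level here).
# Tab characters are not allowed for indentation.
server:
  host: example.local
  ports:
    http: 80
    https: 443
```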

Quotation marks

For most scalars in YAML, quotation marks are not needed. However, quotation marks may be necessary to avoid ambiguity, for example around a text string that contains special characters and could be mistaken for YAML syntax, or around a string such as “true” that you do not want converted to a boolean. In cases like these, single or double quotes can be used: single quotes keep the text literal, while double quotes allow escape sequences such as \n.
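
The following sketch illustrates when quoting changes how a value is read. The keys and values are made up for illustration, and exact type resolution can vary between parsers.

```yaml
# Unquoted: read as a boolean by most parsers.
enabled: true
# Quoted: kept as the string "true".
answer: "true"
# Quoted because ": " and " #" would otherwise be read as YAML syntax.
title: "Notes: draft #2"
# Single quotes keep backslashes literal.
pattern: 'C:\temp\new'
# Double quotes process escape sequences such as \n.
greeting: "Hello\nWorld"
```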

Key-value pairs

YAML represents data associations as key-value pairs, with the key and its value separated by a colon followed by a space.

For example:

profession: teacher

Sequences (arrays)

Sequences (arrays or lists in other languages) allow you to define a list of items in YAML. Each list item starts with a dash (-) followed by a space, and the sequence is conventionally indented under its parent key. All items in the sequence must be indented by the same amount.

For example:

fruits:
    - apple
    - orange
    - pear

Sequences can also be represented in a flow sequence using brackets and commas.1

fruits: [apple, orange, pear]
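
Sequence items can also be mappings. A small sketch, with invented names:

```yaml
# A sequence whose items are mappings; each dash starts a new item.
employees:
  - name: Ada
    profession: teacher
  - name: Lin
    profession: engineer
```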

Data types

YAML supports various data types such as strings, integers, floats, booleans and null values. These data types provide flexibility in representing different kinds of information.
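
As a rough illustration, the scalars below would typically be resolved to the types noted in the comments. The keys are hypothetical, and the exact resolution depends on the parser and the YAML version it implements.

```yaml
title: YAML basics      # string
pages: 42               # integer
rating: 4.5             # float
published: true         # boolean
reviewer: null          # null (can also be written as ~)
released: "2023-12-11"  # quoted, so it stays a string rather than a date
```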

Comments

YAML supports comments denoted by the # symbol. Comments aid in adding explanations, notes or context within YAML files.
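
A short sketch of comment placement, using placeholder keys and a placeholder URL:

```yaml
# A full-line comment describing the settings below.
retries: 3  # An inline comment; the "#" must be preceded by whitespace.
url: http://example.com/#section  # The "#" inside the URL is not a comment.
```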

Multi-line strings

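YAML supports multi-line strings through block scalars. The literal style, introduced with a pipe (|), preserves line breaks exactly as written, while the folded style, introduced with a greater-than sign (>), joins consecutive lines with spaces. Multi-line strings are useful for including blocks of text, such as scripts or descriptions, in YAML documents.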
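
The two most common multi-line forms can be sketched as follows. The keys and text are illustrative.

```yaml
# Literal style (|): line breaks are kept as written.
script: |
  echo "first line"
  echo "second line"
# Folded style (>): single line breaks are folded into spaces.
summary: >
  This text is written on
  several lines but loads
  as one line.
```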

YAML files

YAML files typically use the extension .yaml or .yml. Conventions in naming and structuring YAML files ensure consistency and proper interpretation of data. Parser libraries for reading YAML files are available in many programming languages, including Perl, Ruby and Python.

YAML, JSON and XML

YAML and JSON share similarities in data representation, but YAML stands out for its readability and expressiveness. As of version 1.2, YAML is a superset of JSON, meaning that valid JSON is also valid YAML, and YAML adds features such as comments, anchors and aliases.

JSON (JavaScript Object Notation) utilizes a more explicit syntax with braces {}, brackets [], and commas. While concise and widely used, JSON's syntax might become less readable, especially in larger datasets. JSON's support for data structures is comparatively limited, primarily featuring arrays, objects and scalar values.
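
To make the comparison concrete, here is a sketch of the same made-up data written twice in one YAML document: first in JSON-style syntax, then in YAML block style.

```yaml
# JSON-style (flow) syntax, which YAML 1.2 parsers also accept:
json_style: {"name": "Ada", "skills": ["python", "go"], "active": true}
# The same data in YAML block style:
block_style:
  name: Ada
  skills:
    - python
    - go
  active: true
```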

JSON is often favored for its cross-compatibility for data interchange in web applications and APIs, while YAML is more commonly used in scenarios where human readability and more complex data structures are required, like configuration files and certain types of data documentation and exchange.2

When compared to XML, YAML offers a more concise and human-friendly alternative, emphasizing simplicity and ease of comprehension in data representation and exchange. YAML and XML have fundamental differences in syntax and purpose.

XML is highly structured, relying on explicit opening and closing tags, which can make it verbose and harder to read. In contrast, YAML employs a simpler structure, focusing on readability through indentation and key-value pairs without explicit closing tags.

YAML use cases

YAML can be used with most programming languages through parser libraries and is often used for configuration files, as well as data exchange and documentation. Its human-readable format enhances documentation clarity.

YAML and DevOps

YAML plays a pivotal role in DevOps and is instrumental in automation, orchestration and configuration management. Within DevOps practices, YAML files serve as blueprints to define sequences of actions and configurations in an easily understandable format. These files are used to precisely outline the steps and procedures required for automation, allowing for clear and concise representation of complex workflows.

Infrastructure as code (IaC)

YAML is used to define infrastructure as code, which is the use of code, rather than manual processes, to define and manage IT infrastructure. IaC enables more efficient and consistent IT infrastructure configuration. YAML can be used to define the desired configuration of infrastructure like virtual machines, networks and storage, and to describe the relationship between IT infrastructure components.

Deployments

YAML is used to create deployment files for applications that specify app configurations, dependencies, resource limits and other information important to efficient application deployment and performance. YAML files help reduce deployment errors and increase application delivery speed through version-control and automation.

CI/CD pipeline configuration

YAML can play an important role in continuous integration and continuous delivery (CI/CD) pipelines, an important agile DevOps workflow. Similar to infrastructure configuration and deployments, YAML files are used to define the pipeline steps and targets, and ultimately help automate the CI/CD process.

YAML and DevOps tools

Many tools and programs used by DevOps teams leverage YAML, including:

Ansible

Ansible is an open-source automation software application that uses YAML-formatted files, known as playbooks, to define tasks and automation procedures. YAML templates allow users to program the automation of repetitive tasks without knowledge of an advanced programming language.3

Using IBM watsonx™ Code Assistant for Red Hat® Ansible® Lightspeed, users can write a task in plain English and receive YAML code recommendations for automation tasks. These code recommendations are used to create Ansible Playbooks.3
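
For orientation, a minimal playbook might look like the sketch below. The host group (webservers) and the package (nginx) are placeholders chosen for illustration, not taken from any particular environment.

```yaml
# Hypothetical playbook: the host group and package are placeholders.
- name: Install and start a web server
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present
    - name: Ensure nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
```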

Kubernetes

Kubernetes is an open-source container orchestration platform used for automating the deployment, scaling and management of containerized applications. Kubernetes works based on “states,” continuously trying to move the current state toward a declared desired state. YAML files can be used to create Kubernetes resources like pods, services and deployments, as well as to specify and communicate the desired state of Kubernetes objects.
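
A desired state can be sketched with a small Deployment manifest like the one below. The name, labels, image tag and replica count are illustrative assumptions.

```yaml
# Hypothetical Deployment: asks Kubernetes to keep three replicas running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
```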

GitHub

GitHub, a web-based platform for version control and collaboration in software development, uses YAML to define GitHub Actions workflows. YAML workflow files stored in a repository set up automated workflows for continuous integration and project management.
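
A minimal workflow file might look like this sketch. The file path, job name and the make test command are assumptions for illustration; a real project would substitute its own build and test steps.

```yaml
# Hypothetical workflow, stored at .github/workflows/ci.yml
name: CI
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: make test  # placeholder command
```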

Docker Compose

Docker Compose is a tool for defining and running multi-container Docker applications.4  YAML files are used in Docker Compose to configure an application’s services.
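
As a rough sketch, a Compose file for a two-service application could look like the following. The service names, images, port mapping and password are placeholders.

```yaml
# Hypothetical Compose file: a web server that depends on a database.
services:
  web:
    image: nginx:1.25
    ports:
      - "8080:80"  # host port 8080 mapped to container port 80
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example  # placeholder; do not use in production
```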

Cross-language data sharing

YAML is language-independent, making it well-suited to cross-language data sharing. Once a YAML file is written, it can be parsed by programs written in different languages, such as Python or Ruby.

Log files

Log files are computer-generated textual data files that contain information about the operations and patterns within applications, systems, servers and other IT resources or devices. They are used to gauge resource performance and play a crucial role in system observability. Because of its simplicity, YAML is sometimes used to produce structured, readable log files.

Advantages of YAML

YAML has become a popular data serialization language for several reasons, including its simplicity, compatibility and usefulness in creating configuration files.

Simplicity

YAML’s syntax resembles natural language structures. Its simplicity and minimalist design make it easily understandable for both developers and non-technical users, enhancing comprehension and reducing errors.

Use for configuration files

YAML is well-suited for configuration files due to its structured and readable format. It simplifies the process of defining configurations by using indentation and key-value pairs, making it manageable and adaptable for various software applications.

Compatibility

YAML's platform-independent nature ensures compatibility across different systems and programming languages, facilitating seamless data exchange and interoperability between various platforms and environments.

Tools for YAML processing

PyYAML is a prominent Python library used for parsing and working with YAML files in Python-based applications. It provides methods for loading YAML data into Python objects. PyYAML enables the conversion of YAML files into practical data structures within Python applications and vice versa.5

Parsers like PyYAML, along with linters and validators like yamllint and YAML Validator, play an important role in preserving the accuracy, validity and integrity of YAML files. They validate YAML syntax, identify errors and help ensure consistency within YAML documents.

Footnotes

1 “How to represent arrays in YAML” (link resides outside ibm.com), Tarun Telang, Educative, Inc., 2023

2 “What’s the difference between YAML and JSON” (link resides outside ibm.com), Amazon Web Services, 2023

3 “What is YAML” (link resides outside ibm.com), Redhat.com, 3 March 2023

4 “Docker Compose overview” (link resides outside ibm.com), Docker.com, 2023

5 “Python YAML | Guide to Handing YAML Files” (link resides outside ibm.com), Gabriel Ramuglia, 11 September 2023