Have you ever heard the expression “SRE is what happens when you ask a software engineer to design an operations team”? To quote the IBM Learn Hub article on SRE: “Site reliability engineering (SRE) uses software engineering to automate IT operations tasks (e.g., production system management, change management, incident response, even emergency response) that would otherwise be performed manually by systems administrators (sysadmins).”

The role of the SRE is to keep the organization focused on what matters most to users: ensuring that the platform and services are reliable. SRE bridges the traditional disciplines of development and operations. Its goal is to codify every aspect of operations in order to build resiliency into infrastructure and applications. In practice, this means that reliability deliverables flow through the same continuous integration (CI)/continuous delivery (CD) pipeline as application code: they are managed in version control and checked for issues by test frameworks.

In summary, SRE treats operations as a software delivery problem and solves it with a software engineering approach.
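To make the idea of reliability deliverables that are checked by test frameworks more concrete, here is a minimal Python sketch. It assumes a team keeps a per-service reliability policy as code alongside the application. The `ReliabilityPolicy` fields, the `validate` function, and the rules it enforces (at least two replicas, an absolute health-check path, an SLO target strictly between 0 and 1) are hypothetical illustrations, not a standard schema. The test function is written so that a runner such as pytest could pick it up in the CI pipeline.

```python
from dataclasses import dataclass


# Hypothetical reliability policy stored in version control next to the
# application code; the field names and rules are illustrative assumptions.
@dataclass(frozen=True)
class ReliabilityPolicy:
    service: str
    min_replicas: int
    health_check_path: str
    slo_target: float  # e.g. 0.999 means 99.9% of requests should succeed


def validate(policy: ReliabilityPolicy) -> list[str]:
    """Return a list of problems; an empty list means the policy passes."""
    problems = []
    if policy.min_replicas < 2:
        problems.append(f"{policy.service}: min_replicas must be >= 2 to survive one instance failure")
    if not policy.health_check_path.startswith("/"):
        problems.append(f"{policy.service}: health_check_path must be an absolute path")
    if not 0.0 < policy.slo_target < 1.0:
        problems.append(f"{policy.service}: slo_target must be strictly between 0 and 1")
    return problems


# A test runner such as pytest would run this on every change in CI,
# in the same way it runs the application's own unit tests.
def test_policies_are_valid():
    policies = [
        ReliabilityPolicy("checkout", min_replicas=3, health_check_path="/healthz", slo_target=0.999),
        ReliabilityPolicy("catalog", min_replicas=2, health_check_path="/healthz", slo_target=0.995),
    ]
    for policy in policies:
        assert validate(policy) == [], validate(policy)


if __name__ == "__main__":
    test_policies_are_valid()
    print("all reliability policies are valid")
```

A failing check blocks the change in the pipeline just as a failing application test would, which is the point of delivering reliability through the same CI/CD path.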

In an Embedded SRE model (described in the SRE model section), Development and SRE collaborate throughout the lifecycle of minimum viable product (MVP) delivery. As the MVP progresses through technical feature specification and development, the SRE works with Development and OM to ensure that cloud-native practices are enabled. For example, they identify the critical user journeys and define the associated key service level indicators (SLIs) and service level objectives (SLOs) for each component.
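The SLIs and SLOs defined for a critical user journey reduce to simple arithmetic. The following Python sketch assumes a request-based availability SLI (good requests divided by total requests over the SLO window) and derives the remaining error budget from an SLO target. The `JourneyStats` type, the function names, and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneyStats:
    """Request counts for one critical user journey over an SLO window (hypothetical)."""
    name: str
    good_requests: int
    total_requests: int


def availability_sli(stats: JourneyStats) -> float:
    """SLI: fraction of requests in the window that succeeded."""
    if stats.total_requests == 0:
        return 1.0  # no traffic: treated as meeting the objective (a policy choice)
    return stats.good_requests / stats.total_requests


def error_budget_remaining(stats: JourneyStats, slo_target: float) -> float:
    """Fraction of the error budget left; a negative value means the SLO is breached."""
    allowed_failures = (1.0 - slo_target) * stats.total_requests
    actual_failures = stats.total_requests - stats.good_requests
    if allowed_failures == 0:
        return 0.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


checkout = JourneyStats("checkout", good_requests=999_400, total_requests=1_000_000)
print(f"SLI: {availability_sli(checkout):.4f}")
print(f"error budget remaining: {error_budget_remaining(checkout, slo_target=0.999):.1%}")
```

In this example the checkout journey uses 600 of the 1,000 failures that a 99.9% SLO allows over one million requests, leaving about 40% of the error budget. A request-based SLI is only one option; time-based availability is another common choice.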

The SRE should understand the service design, including frontend, backend, business logic, and database dependencies. This understanding is needed to document every failure point and to deliver automation that restores service when one of them fails. Using this knowledge of the service design, the SRE should ensure delivery of the required automation that is described in the cloud native section.
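Documenting failure points can start from a machine-readable model of the service design. The Python sketch below assumes a small hypothetical service (a frontend, two backend APIs, a business-logic component, and two databases) described as a dependency map. For each component, it lists the restoration action that automation would take and, by walking the dependency graph breadth-first, every component affected if that component fails. Component names and restoration actions are placeholders, not prescriptions from the cloud native section.

```python
from collections import deque

# Hypothetical service design: each component maps to the components it depends on.
DEPENDS_ON = {
    "frontend": ["checkout-api", "catalog-api"],
    "checkout-api": ["payments-logic", "orders-db"],
    "catalog-api": ["catalog-db"],
    "payments-logic": ["orders-db"],
    "orders-db": [],
    "catalog-db": [],
}

# Hypothetical restoration actions that automation would run for each failure point.
RESTORE_ACTION = {
    "frontend": "roll back last release",
    "checkout-api": "restart pods",
    "catalog-api": "restart pods",
    "payments-logic": "restart pods",
    "orders-db": "fail over to replica",
    "catalog-db": "fail over to replica",
}


def impacted_by(failed: str) -> set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the graph: component -> components that depend on it.
    dependents: dict[str, list[str]] = {c: [] for c in DEPENDS_ON}
    for component, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].append(component)
    # Breadth-first walk upward from the failed component.
    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def failure_point_report() -> list[tuple[str, str, list[str]]]:
    """One row per failure point: component, restoration action, impacted components."""
    return [
        (component, RESTORE_ACTION[component], sorted(impacted_by(component)))
        for component in DEPENDS_ON
    ]


for component, action, impacted in failure_point_report():
    print(f"{component:15} {action:25} impacts: {', '.join(impacted) or 'none'}")
```

Keeping a map like this in version control means the failure-point inventory is reviewed and updated through the same pipeline as the code it describes.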

As illustrated in the following diagram, Development and SRE collaborate to deliver functionality and reliability for the MVP by using the same CI/CD delivery pipelines and release processes, while each focuses on its own success metrics: