While the 12-factor application guidelines are spot-on, another 7 factors are equally essential for a production environment
The 12-factor application provides a well-defined guideline for developing microservices and is a commonly used pattern to run, scale, and deploy applications. In the IBM Cloud Private platform, we follow the same 12-factor application guidelines for developing containerized applications. See Michael Elder's blog post "Kubernetes & 12-factor apps" to learn how we apply the 12 factors to the Kubernetes model of container orchestration.
As we reflected on the principles of developing containerized microservices running in Kubernetes, we found that while the 12-factor application guidelines are spot-on, the following 7 factors are equally essential for a production environment:
- Factor XIII: Observable
- Factor XIV: Schedulable
- Factor XV: Upgradable
- Factor XVI: Least privilege
- Factor XVII: Auditable
- Factor XVIII: Securable
- Factor XIX: Measurable
Let's discuss what each means and why it is necessary to consider the additional factors.
Factor XIII: Observable
Apps should provide visibility into their current health and metrics
Distributed systems can be a challenge to manage because multiple microservices work together to build an application. Essentially, many moving parts need to work together for a system to function. If one microservice fails, the system needs to detect the failure and recover automatically. Kubernetes comes to the rescue with capabilities such as readiness and liveness probes.
Kubernetes uses readiness probes to ensure the application is ready to accept traffic. If a readiness probe starts to fail, Kubernetes stops sending traffic to the pod until the readiness probe returns a success status.
For example, suppose you have an application composed of three microservices: a frontend, business logic, and a database. In this application, the frontend should have a readiness probe that checks whether the business logic and the database are ready before it accepts traffic.
The following animated image shows that no requests are sent to the application instance until the readiness probe returns success:
You can use an HTTP, command (exec), or TCP probe, and you can control each probe's configuration: for instance, how often it runs, what its success and failure thresholds are, and how long it waits for a response. One very important setting to configure is initialDelaySeconds, which controls how long Kubernetes waits after the container starts before running the first probe. Make sure the probe does not start until the app is ready. If the delay is too short, the probe fails while the app is still starting, and for a liveness probe those failures make Kubernetes restart the container over and over. See the following YAML snippet:
```yaml
readinessProbe:
  # an HTTP probe
  httpGet:
    path: /readiness
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 5
```
Kubernetes uses liveness probes to check whether your application is alive or dead. If your application is alive, Kubernetes leaves it alone. If it is dead, Kubernetes kills the container and starts a new one to replace it. This underlines the need for microservices to be stateless and disposable (Factors VI and IX). The following animated image shows Kubernetes restarting the pods once the liveness probe fails:
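For comparison with the readiness snippet, here is a minimal sketch of a liveness probe. It uses a TCP check to show a second probe type; the port and timing values are illustrative assumptions, not recommendations.

```yaml
livenessProbe:
  # a TCP probe: the container counts as alive if the port accepts a connection
  tcpSocket:
    port: 8080
  initialDelaySeconds: 30   # give the app time to start before the first check
  periodSeconds: 10         # check every 10 seconds
  timeoutSeconds: 2         # no answer within 2 seconds counts as a failure
  failureThreshold: 3       # restart the container after 3 consecutive failures
```

Setting initialDelaySeconds on the liveness probe higher than the app's worst-case startup time is what prevents the restart loop described earlier.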
A great benefit of using these probes is that you can deploy your applications in any order without worrying about dependencies.
However, we found that probes alone are not enough for a production environment. Applications usually have application-specific metrics that need to be monitored, and users set up thresholds and alerts on these metrics (for example, transactions per second).
IBM Cloud Private fills this gap with a secure monitoring stack made up of Prometheus and Grafana and protected by a role-based access control model.
Prometheus scrapes metrics from each target's metrics endpoint. Your application needs to expose this endpoint and declare it by using an annotation.
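The exact annotation depends on how the cluster's Prometheus scrape configuration is written. As a sketch, assuming the widely used prometheus.io/* annotation convention that example Kubernetes scrape configurations commonly honor, a Service could opt in as follows. The service name, port, and path are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend                     # hypothetical service name
  annotations:
    prometheus.io/scrape: "true"     # opt this service's endpoints in to scraping
    prometheus.io/port: "8080"       # port that serves the metrics
    prometheus.io/path: "/metrics"   # path of the metrics endpoint
spec:
  selector:
    app: frontend
  ports:
    - name: http
      port: 8080
      targetPort: 8080
```

Annotation values must be strings, which is why "true" and "8080" are quoted.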
Prometheus then discovers the endpoint automatically and scrapes metrics from it as shown in the following animated image:
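Once metrics are flowing into Prometheus, a threshold alert such as the transactions-per-second example can be written as a Prometheus alerting rule. This is a minimal sketch in the Prometheus rule-file format; the metric name app_transactions_total, the threshold, and the durations are hypothetical.

```yaml
groups:
  - name: frontend-alerts            # hypothetical rule group
    rules:
      - alert: LowTransactionRate
        # app_transactions_total is a hypothetical counter exposed by the app;
        # rate() over 5 minutes turns it into transactions per second
        expr: sum(rate(app_transactions_total[5m])) < 10
        for: 10m                     # condition must hold for 10 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "Transactions per second dropped below 10"
```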
Factor XIV: Schedulable
Applications should provide guidance on expected resource constraints
Let's say that management picks your team to experiment with a project on Kubernetes. Your team works hard setting up the environment, and you end up with an application that runs with exemplary response time and performance. Another team then follows your lead: they create their own application and host it in the same environment. When the second application goes live, the original application starts experiencing performance degradation. When you start to troubleshoot, the first place to look is the compute resources (CPU and memory) assigned to your containers. It's very likely that your containers are starved of compute resources, which raises the question of how you can guarantee compute resources for your applications.
Kubernetes lets you set requests and limits for each container. Requests are guaranteed: if a container requests a resource, Kubernetes schedules it only on a node that can give it that amount. Limits, on the other hand, cap what a container can use: a container that exceeds its memory limit is terminated, and one that reaches its CPU limit is throttled.
See the following YAML snippet for setting compute resources:
```yaml
resources:
  requests:
    memory: "64Mi"
    cpu: "150m"
  limits:
    memory: "64Mi"
    cpu: "200m"
```
Another effective capability for administrators in a production environment is setting quotas on namespaces. Once a compute resource quota is set on a namespace, Kubernetes refuses to create containers in that namespace that do not set requests and limits. In the following image, resource quotas are set for namespaces:
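A minimal sketch of such a quota follows; the namespace, name, and amounts are hypothetical. Because it constrains both requests and limits for CPU and memory, every new container in the namespace has to declare all four values (or receive defaults from a LimitRange).

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota        # hypothetical name
  namespace: team-a         # hypothetical namespace
spec:
  hard:
    requests.cpu: "2"       # total CPU all pods in the namespace may request
    requests.memory: 4Gi    # total memory all pods may request
    limits.cpu: "4"         # sum of all CPU limits in the namespace
    limits.memory: 8Gi      # sum of all memory limits in the namespace
    pods: "20"              # maximum number of pods
```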
Factor XV: Upgradable
Apps must upgrade data formats from previous generations
Security or feature patches are often needed for applications running in production, and it is important for production applications to upgrade without service disruption. Kubernetes provides rolling updates for applications to upgrade with no service outage. With rolling updates, you can update one pod at a time without taking down the entire service. See the following animated image of a second version of an application, which can be rolled out with no downtime:
See the following YAML snippet:
```yaml
minReadySeconds: 5
strategy:
  # indicate which strategy we want for the rolling update
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 1
```
Pay attention to maxUnavailable and maxSurge when enabling the rolling update strategy:
maxUnavailable is an optional field that specifies the maximum number of pods that can be unavailable during the update process. Though it's optional, you want to set the value to ensure service availability.
maxSurge is another optional (but critical) field that tells Kubernetes the maximum number of pods that can be created over the desired number of pods.
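To show where these fields sit and how they interact, here is a sketch of a complete Deployment using the same strategy values. The name, replica count, and image are hypothetical; the comments work through the pod-count arithmetic for 4 replicas.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend                      # hypothetical name
spec:
  replicas: 4
  minReadySeconds: 5                  # a new pod must stay ready 5s before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                     # at most 4 + 1 = 5 pods exist during the update
      maxUnavailable: 1               # at least 4 - 1 = 3 pods stay available
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: example.com/frontend:2.0   # hypothetical image
          ports:
            - containerPort: 8080
```

A new pod counts as available only after its readiness probe succeeds, so the probes from Factor XIII are what keep a rollout from sending traffic to pods that are not ready yet.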
Factor XVI: Least Privilege
Containers should be running with the least privilege
Not to sound pessimistic, but you should think of every permission you allow in your container as a potential attack vector, as shown in the next image. For example, if your container runs as root, anyone with access to your container can inject malicious processes into it. Kubernetes provides Pod Security Policies (PSPs) that you can use to restrict access to the filesystem, host ports, Linux capabilities, and more. IBM Cloud Private provides a set of out-of-the-box PSPs that can be associated with a namespace when provisioning containers in it.
See more details on using namespaces with Pod Security Policies.
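PSPs are enforced when a pod is admitted, but least privilege can also be declared in the pod spec itself. The following is a sketch of a pod securityContext that avoids root, blocks privilege escalation, and drops all Linux capabilities. The pod name, image, and UID are hypothetical, and an app that writes to disk would need a writable volume once its root filesystem is read-only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: least-privilege-demo        # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true              # refuse to start a container that would run as UID 0
    runAsUser: 10001                # arbitrary non-root UID
  containers:
    - name: app
      image: example.com/app:1.0    # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]             # start with no Linux capabilities at all
```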
Factor XVII: Auditable
Know what, when, who, and where for all critical operations
Auditability is critical for any action performed on a Kubernetes cluster or within an application. For example, if your application handles credit card transactions, you need to enable auditing to keep an audit trail of each transaction. IBM Cloud Private uses Cloud Auditing Data Federation (CADF), a cloud-agnostic, industry-standard format. See more details at Audit logging in IBM Cloud Private.
A CADF event captures the following information:
- initiator_id: ID of the user that performed the operation
- target_uri: CADF-specific target URI (for example, data/security/project)
- action: The action being performed, typically in the form operation:resource_type
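The CADF records above come from IBM Cloud Private's own audit service. At the cluster level, upstream Kubernetes also lets you decide which API actions are audited with an audit policy file (audit.k8s.io/v1), passed to kube-apiserver through --audit-policy-file. This is a separate mechanism from CADF, shown here only as a sketch; the choice of resources and levels is an illustrative assumption. Rules are evaluated in order, and the first match wins.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Don't generate an extra event when a request is first received
omitStages:
  - "RequestReceived"
rules:
  # Who touched Secrets and ConfigMaps, and when, without logging their contents
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Full request and response bodies for changes to Deployments
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "apps"
        resources: ["deployments"]
  # Everything else: metadata only (user, verb, resource, timestamp)
  - level: Metadata
```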
Factor XVIII: Securable (Identity, Network, Scope, Certificates)
Protect your app and resources from outsiders
This factor deserves its own article. Suffice it to say that applications need end-to-end security when running in production. IBM Cloud Private addresses the following security requirements for production environments, and more:
- Authentication: Confirm identities
- Authorization: Validate what authenticated users can access
- Certificate management: Manage digital certificates, including creation, storage, and renewal
- Data protection: Security measures for data in transit and at rest
- Network security and isolation: Prevent unauthorized users and processes from accessing the network
- Vulnerability Advisor: Identify any security vulnerabilities in the images
- Mutation Advisor: Identify any mutation in containers
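For network isolation at the Kubernetes level, NetworkPolicy resources are the standard building block. Here is a sketch for the three-tier example from Factor XIII: deny all inbound traffic in a namespace by default, then allow only the frontend to reach the business-logic pods. The namespace, labels, and port are hypothetical, and the policies take effect only if the cluster's network plugin enforces NetworkPolicy.

```yaml
# Deny all ingress traffic to every pod in the namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app                 # hypothetical namespace
spec:
  podSelector: {}                   # selects every pod in the namespace
  policyTypes:
    - Ingress
---
# Then allow only frontend pods to reach business-logic pods on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-business-logic
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: business-logic           # hypothetical label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```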
You can learn more from the IBM Cloud Private security guide.
Specifically, let's talk about Certificate Manager. The IBM Cloud Private Certificate Manager service is based on the open source Jetstack cert-manager project. Certificate Manager issues and manages certificates for services that run on IBM Cloud Private. It supports both self-signed and public certificates, and it is fully integrated with kubectl and role-based access control.
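As an illustration, here is a sketch using the upstream cert-manager API (cert-manager.io/v1): a self-signed Issuer and a Certificate that cert-manager stores in a Secret and renews before it expires. The Certificate Manager shipped with IBM Cloud Private may expose an older API group, so check the version installed in your cluster. All names, the namespace, and the DNS entry are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer          # hypothetical name
  namespace: my-app                # hypothetical namespace
spec:
  selfSigned: {}                   # issue self-signed certificates
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: frontend-tls
  namespace: my-app
spec:
  secretName: frontend-tls         # cert-manager writes the key pair to this Secret
  dnsNames:
    - frontend.my-app.svc.cluster.local
  duration: 2160h                  # 90-day certificate
  renewBefore: 360h                # renew 15 days before expiry
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
```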
Factor XIX: Measurable
Application usage should be measurable for quota or chargebacks
At the end of the day, central IT has to cover the cost, as shown in the following image. The compute resources allocated to run containers should be measurable, and the organizations using the cluster should be accountable for them. Make sure you follow Factor XIV: Schedulable. IBM Cloud Private provides metering, which collects the compute resources allocated to each container and aggregates them at the namespace level for showback and chargeback.
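Metering can only attribute what was allocated. As a sketch, assuming the metering service treats a container's requested resources as its allocation, a LimitRange gives every container in a namespace default requests and limits, so no container goes uncounted and none is rejected by a namespace quota for lacking them. The name, namespace, and values are hypothetical.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources   # hypothetical name
  namespace: team-a                   # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:                 # used when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:                        # used when a container sets no limits
        cpu: 500m
        memory: 256Mi
```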
I hope you have found this topic interesting, have checked off the factors you already use, and plan to use the others next time.
If you'd like to learn more, check out the talk that Michael Elder and I gave at KubeCon 2019, Shanghai, about the 12 + 7 factors for the Kubernetes model of container orchestration.