Kubernetes Persistent Volumes backed by IBM Cloud Object Storage Buckets

8 February 2021


An IBM Cloud Object Storage (COS) bucket can be used as the backing store for a PVC

When you imagine a file system, you are probably thinking of the block storage provided by disk drives. Object storage buckets can also be used for file system volumes on Kubernetes and might fit well into your application. For example, buckets can be managed outside of the application with a variety of tools and the IBM Cloud® Console. Getting data in and out is a breeze.

This post starts from scratch and demonstrates the creation of a cluster, buckets, volumes and applications. Off-the-shelf container images for nginx and Jekyll are used for the demonstration, so no application coding is required.

     

    Background

    IBM Cloud supports fully managed Kubernetes clusters. “Storing data on Block Storage for VPC” explains how to use high-performance block storage for Kubernetes Persistent Volume Claims (PVCs). PVCs serve as the backing read/write storage that is mounted as volumes in pods.

    An IBM Cloud Object Storage (COS) bucket can be used as the backing store for a PVC. Buckets might fit your use case better than block storage. Examples include the following:

    • Import or export data
    • Share data between pods
    • Utilize COS buckets using file system APIs

    These are a few things to think about when choosing a bucket versus block storage:

    • Price: Bucket objects are persisted for a few pennies per GB per month. Pay as you grow and start for free.
    • Multiple pods: Multiple pods can mount the same PVC bucket.
    • Operational simplicity: A COS bucket can be easily populated and read in a variety of workflows.
    • Resiliency: Location and resiliency options include cross-region and regional buckets.
    • Performance: Block storage and buckets have drastically different characteristics. Verify against application requirements.

    This post demonstrates how to create everything from scratch. The scripts and Terraform configuration are available on GitHub. Use them to create all of the resources required to see PVC buckets in action. The basic steps are as follows:

    • Create a VPC and Kubernetes cluster.
    • Create a COS instance with associated Kubernetes secrets and storage classes.
    • Create the Kubernetes resources: PVC, deployment, service and ingress.
    • Verify it works.
    • Create a blog in a second PVC using Jekyll.
    • Serve the blog with nginx.

    A security feature of the bucket is also highlighted: the bucket IP “allow list” limits bucket access to the Kubernetes cluster VPC.

    OK, let's do it!

    Prerequisites

    The provisioning steps are done from the CLI. This allows you to move these steps to a CI/CD pipeline or into IBM Cloud Schematics over time. See “Getting started with solution tutorials” for help with the following required tools:

    • Git
    • IBM Cloud CLI
    • Terraform
    • Docker
    • jq

    You should be the account owner or have enough privileges to create resources. In the terminal, make your own copy of the local.env file and make the suggested edits. The prereq script will verify the tools are installed. This step is complete when you see: >>> Prerequisites met.

    git clone https://github.com/IBM-Cloud/kubernetes-cos-pvc
    cd kubernetes-cos-pvc
    cp template.local.env local.env
    # open local.env in your editor and make the suggested edits
    source local.env
    ./000-prereq.sh
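
For readers curious about what a tool check of this kind involves, here is a minimal sketch. It is an illustration only and is not the contents of 000-prereq.sh; the function name missing_tools is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the tools listed in the prerequisites.
missing=$(missing_tools git ibmcloud terraform docker jq)
if [ -z "$missing" ]; then
  echo ">>> Prerequisites met"
else
  # Left unquoted on purpose so the names are joined on one line.
  echo "Missing tools:" $missing
fi
```

The real script may check more than presence on PATH (versions or CLI plug-ins, for example), so treat its output as the source of truth.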
    

    Container Registry Service

    The default template.local.env has IBMCR set to “false”, which makes this step optional. Set IBMCR to “true” to use the IBM Container Registry; also set it to “true” if pod access to hub.docker.com is disabled. This step copies the required Docker images into a newly created namespace in the IBM Container Registry. Image names will resemble us.icr.io/$BASENAME/:

    ./010-container-registry.sh
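
As a rough idea of what copying an image into the registry involves, the sketch below builds a target name and wraps the standard docker pull, tag and push sequence. The helper names, the nginx image and the latest tag are illustrative assumptions; the repository's script may do this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a fully qualified image name of the form
# REGISTRY/NAMESPACE/IMAGE:TAG. The tag defaults to "latest".
icr_image_name() {
  local registry=$1 namespace=$2 image=$3 tag=${4:-latest}
  echo "${registry}/${namespace}/${image}:${tag}"
}

# Sketch of copying one public image into the registry namespace.
# Assumes a prior `ibmcloud cr login` and an existing namespace.
copy_image() {
  local source=$1 target=$2
  docker pull "$source" &&
    docker tag "$source" "$target" &&
    docker push "$target"
}

# Show the name an nginx image would get; mybasename is a placeholder.
icr_image_name us.icr.io "${BASENAME:-mybasename}" nginx
```

With these helpers, copy_image nginx:latest "$(icr_image_name us.icr.io "$BASENAME" nginx)" would pull the image from Docker Hub and push it to the namespace.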
    

    Cluster

    You can use an existing VPC-based cluster or execute this step to create a cluster. Either way, the local.env CLUSTER_NAME is required. The creation is done in the cluster/ directory where you will find the Terraform configuration. The script will create a terraform.tfvars, then execute Terraform.

    Take a look at cluster/main.tf to see all of the resources created. For example:

    • Resource “ibm_is_vpc”: creates a VPC
    • Resource “ibm_container_vpc_cluster”: creates a cluster

    ./020-create-cluster.sh
    

    It can take over 30 minutes to create a cluster. Once it completes, check out the cluster in the cloud console.
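
The two steps the script performs (write terraform.tfvars, then run Terraform) could be sketched roughly as follows. The variable names written to the file are assumptions made for this example and are not necessarily the names used in cluster/main.tf.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write a terraform.tfvars file from environment
# variables. The variable names on the left are assumptions, not the
# names used by the repository's Terraform configuration.
write_tfvars() {
  local file=$1
  cat > "$file" <<EOF
basename     = "${BASENAME}"
cluster_name = "${CLUSTER_NAME}"
EOF
}

# Run Terraform in a directory that already contains terraform.tfvars.
apply_terraform() {
  (cd "$1" && terraform init && terraform apply -auto-approve)
}
```

Calling write_tfvars cluster/terraform.tfvars followed by apply_terraform cluster would mirror the flow described above. Keeping the values in a generated file means the same Terraform configuration can later be driven from a pipeline.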

    Resources

    The rest of the resources are created in the terraform/ directory. The script will create a terraform.tfvars, then execute Terraform.

    TL;DR: skip down to 025-create-resources.sh.

    IBM Cloud Object Storage (COS) bucket storage classes are installed by the resources in cos_storage_class.tf. If you previously followed “Installing the IBM Cloud Object Storage plug-in,” the comments in cos_storage_class.tf explain how to skip this step.

    The main.tf file creates the rest of the resources:

    • COS instance and secret keys that are then used to populate a couple of Kubernetes secrets
    • PVC configured to create a bucket automatically with limited access
    • Deployment for the nginx image
    • Service to expose the deployment pods
    • Ingress to expose the service to the public

    Optionally, you can open main.tf and take a closer look at a few of the resources. Otherwise, skip down and execute the shell script.

    Here is a cut-down version of the PVC:

    resource "kubernetes_persistent_volume_claim" "pvc" {
      metadata {
        name = local.pvc_nginx_claim_name
        annotations = {
          "ibm.io/auto-create-bucket" = "true" # create the bucket automatically
          "ibm.io/set-access-policy"  = "true" # restrict bucket access by IP
          # ...
        }
      }
      spec {
        storage_class_name = "ibmc-s3fs-standard-regional"
        # ...
      }
    }
    

    The basics are pretty simple. The annotations allow some configuration: for example, automatically creating the bucket and setting the access policy. Setting the access policy means setting the allow list of IPs for the bucket (demonstrated later). The full list of annotations and storage classes can be found in the documentation at “Storing data on IBM Cloud Object Storage.”
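
For readers more used to plain Kubernetes manifests, the sketch below prints a claim with the same two annotations and storage class. The claim name, secret name, access mode and size are placeholders chosen for this example, and the ibm.io/secret-name annotation is included on the assumption that the claim has to reference the Kubernetes secret holding the COS credentials; confirm the exact set of annotations in the documentation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print a PVC manifest that mirrors the Terraform
# resource. The claim name, secret name, access mode and size are
# placeholders, not values taken from the repository.
pvc_manifest() {
  local name=$1 secret=$2
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  annotations:
    ibm.io/auto-create-bucket: "true"
    ibm.io/set-access-policy: "true"
    ibm.io/secret-name: "${secret}"
spec:
  storageClassName: ibmc-s3fs-standard-regional
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
}

pvc_manifest nginx-claim cos-secret
```

The printed manifest could be piped to kubectl apply -f - on a cluster where the storage class and secret already exist.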

    Here is a cut-down version of the nginx deployment:

    resource "kubernetes_deployment" "nginx" {
      # ...
      spec {
        # ...
        template {
          spec {
            container {
              name  = "nginx"
              image = var.imagefqn_nginx
              # Write a marker file to the mounted volume, then run nginx in the foreground.
              command = ["sh", "-c", "echo '#Success' > /usr/share/nginx/html/index.html ; exec nginx -g 'daemon off;'"]
              port {
                container_port = 80
              }
              volume_mount {
                name       = "volname"
                mount_path = "/usr/share/nginx/html"
              }
            }
            volume {
              name = "volname"
              persistent_volume_claim {
                claim_name = local.pvc_nginx_claim_name
              }
            }
          }
        }
      }
    }
    

    The command echoes a string to the default site file for nginx (index.html). We can test this later to verify success. The volume_mount adds the volume to a directory within the deployment’s pod. The volume configuration ties the PVC to the deployment.

    Two more configuration files demonstrate the ability to write contents to a bucket for later reading and to share the PVC with another deployment. First, jekyllblog.tf contains:

    • Resource “kubernetes_persistent_volume_claim” “jekyllblog” — PVC and associated bucket
    • Resource “kubernetes_deployment” “jekyllblog” — Deployment that generates a blog and starts a web server. These commands do the work:

    Then, jekyllnginx.tf has a deployment that mounts the same PVC:

    • Resource “kubernetes_deployment” “jekyllnginx”: Deployment that creates a symlink to the same PVC. These are the commands:
      • cd /usr/share/nginx
      • rm -rf html
      • ln -s /blog/kubernetes-cos-pvc/example/jekyllblog/myblog/_site html
      • exec nginx -g 'daemon off;'
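
The symlink step can be tried locally without a cluster. In the sketch below, temporary directories stand in for the mounted PVC and for /usr/share/nginx; the file content is made up for the example.

```shell
#!/usr/bin/env bash
# Local simulation of the symlink step. Temporary directories stand in
# for the mounted PVC (/blog/...) and for /usr/share/nginx.
work=$(mktemp -d)
site="$work/blog/_site"        # stand-in for the generated Jekyll site
nginx_dir="$work/nginx"        # stand-in for /usr/share/nginx

mkdir -p "$site" "$nginx_dir/html"
echo "hello from the bucket" > "$site/index.html"

# Same sequence as the deployment: replace html with a link to the site.
cd "$nginx_dir" || exit 1
rm -rf html
ln -s "$site" html

# Reading through the link returns the file stored in the "bucket".
served=$(cat html/index.html)
echo "$served"   # prints: hello from the bucket
```

Because html is only a link, nginx serves whatever the Jekyll deployment last wrote to the shared volume, with no copy step in between.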

    Ingress exposes all three of these services with the subdomains nginx.<ingress domain>, jekyllblog.<ingress domain> and jekyllnginx.<ingress domain>. Here is a cut-down version:

    resource "kubernetes_ingress" "example_ingress" {
      # ...
      spec {
        tls {
          secret_name = data.ibm_container_vpc_cluster.cluster.ingress_secret
          hosts       = [data.ibm_container_vpc_cluster.cluster.ingress_hostname]
        }
        rule {
          host = "nginx.${data.ibm_container_vpc_cluster.cluster.ingress_hostname}"
          http {
            path {
              backend {
                service_name = kubernetes_service.nginx.metadata[0].name
                service_port = 80
              }
            }
          }
        }
        rule {
          host = "jekyllblog.${data.ibm_container_vpc_cluster.cluster.ingress_hostname}"
          # ...
        }
        rule {
          host = "jekyllnginx.${data.ibm_container_vpc_cluster.cluster.ingress_hostname}"
          # ...
        }
      }
    }
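
To make the host naming concrete, this small sketch prints the three hostnames for a given ingress domain. The helper name is made up, and example.com is a placeholder for the cluster's real ingress domain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the three hostnames served by the ingress,
# given the cluster's ingress domain.
ingress_hosts() {
  local domain=$1 sub
  for sub in nginx jekyllblog jekyllnginx; do
    echo "${sub}.${domain}"
  done
}

# example.com is a placeholder for the real ingress domain.
ingress_hosts example.com
```

On a running cluster, kubectl get ingress lists the hosts that were actually configured.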
    

    Write configuration values to terraform/terraform.tfvars and then execute the Terraform configuration:

    ./025-create-resources.sh
    

    Once this completes, check out the following:

    • IBM Cloud Object Storage instance (find it in the storage section of the resource list).
    • Navigate to the COS instance and then the bucket with nginx in the name. Look for the following:
      • The Objects section indicates Access denied. This is because of the access policies.
      • The Authorized IPs panel in the Access policies section has a list of IP addresses that can access this bucket. The VPC’s cloud service endpoint source addresses are listed. See “VPC behind the curtain” for more details.
    • Kubernetes clusters: click your cluster and open the Kubernetes dashboard to see the rest of the resources created, or use the kubectl command line (for example, kubectl get deployments). The names all start with $BASENAME*:
      • deployments
      • pods
      • services
      • ingresses
      • persistentvolumeclaims
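
One way to list only the resources from this post on the command line is to filter kubectl output by the name prefix. The sketch below is an assumption-laden illustration: the helper name and the sample resource names are invented, and canned input is used so it runs without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only resources whose name starts with a prefix.
# Input lines look like the output of `kubectl get ... -o name`,
# for example deployment.apps/demo-nginx (TYPE/NAME).
names_with_prefix() {
  awk -F/ -v prefix="$1" 'index($2, prefix) == 1'
}

# Canned input so the filter can be tried without a cluster.
printf '%s\n' \
  deployment.apps/demo-nginx \
  service/kubernetes \
  persistentvolumeclaim/demo-jekyllblog |
  names_with_prefix demo
```

Against a real cluster, the same filter could be fed from kubectl get deployments,pods,services,ingresses,persistentvolumeclaims -o name | names_with_prefix "$BASENAME".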

    Test

    Run the test script. It reads the simple nginx service using curl and expects the Success string that was written to index.html. Two URLs are also displayed; open each of them to verify that the blog is being served by the other two deployments. It can take a couple of minutes for the blogs to become available:

    ./030-test.sh
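
Since the sites can take a couple of minutes to come up, a check of this kind usually needs a retry loop. The sketch below shows one way to write it; it is not the contents of 030-test.sh, and both helper names are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to MAX times, sleeping DELAY
# seconds between attempts; succeed as soon as the command succeeds.
retry_until() {
  local max=$1 delay=$2 attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Succeeds when the page at URL ($1) contains the expected text ($2).
page_contains() {
  curl -fsS "$1" | grep -q "$2"
}
```

For example, retry_until 20 15 page_contains "https://nginx.<ingress domain>" Success would poll the nginx site for up to about five minutes, with the placeholder replaced by your cluster's ingress domain.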
    

    Clean up

    Clean it all up with the following command:

    ./040-cleanup.sh
    

    Conclusion

    A Kubernetes Persistent Volume Claim (PVC) backed by a bucket works well for hosting static content, and a production environment could build the static content in the CI/CD pipeline to create a new release.

    A PVC bucket may help with legacy applications that use a local volume for storage:

    • Create a container image for the application. 
    • Upload the local volume to a bucket, and mount a PVC bucket for the application.
    • Manage backup and restore through bucket operations.

    The PVC bucket can be used to export application data for archives, analysis and more.


    Author

    Powell Quiring

    Offering Manager