Stream Landing Kafka Data to Object Storage using Terraform

Learn how to archive your Event Streams Kafka data to Object Storage using SQL Query. This process, called stream landing, can be set up using the Terraform scripts provided in this post. 

You can easily archive data to IBM Cloud Object Storage for long-term storage or to gain insight by leveraging interactive queries or big data analytics. You can achieve this through the Event Streams UI, where topics can be selected and linked to Cloud Object Storage buckets, with data automatically and securely streamed using the fully-managed IBM Cloud SQL Query service. All data is stored in Parquet format, making it easy to manage and process. Check out "Streaming to Cloud Object Storage by using SQL Query" for more info. 

In this post, you will set up the Cloud Object Storage stream landing using Terraform. 

What is Terraform?

Terraform is an open-source “Infrastructure as Code” tool created by HashiCorp.

A declarative coding tool, Terraform enables developers to use a high-level configuration language called HCL (HashiCorp Configuration Language) to describe the desired “end-state” cloud or on-premises infrastructure for running an application. It then generates a plan for reaching that end-state and executes the plan to provision the infrastructure.

Figure: Streaming to Cloud Object Storage by using SQL Query.

Let's get started 

If you have Terraform set up on your machine, follow the steps below:

  1. Open a terminal or command prompt on your machine, clone the GitHub repository and move to the directory:
    git clone https://github.com/IBM-Cloud/stream-landing-terraform
    cd stream-landing-terraform
  2. Create the local.env file from the template file provided in the repo and update the environment variables accordingly. Once updated, source the file:
    cp template.local.env local.env
    source local.env
  3. You can now run the individual Terraform commands to provision the required IBM Cloud services:
    terraform init 
    terraform plan 
    terraform apply
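
The three commands above can also be split so that the apply step executes exactly the plan that was reviewed. The following is a minimal sketch and not part of the repository: the function names plan_changes and apply_plan, the TF_BIN variable, and the plan file name are illustrative assumptions, while -input=false and -out are standard Terraform CLI options.

```shell
# Illustrative helper names; adjust to taste.
TF_BIN="${TF_BIN:-terraform}"
PLAN_FILE="${PLAN_FILE:-stream-landing.tfplan}"

plan_changes() {
  # Download the providers, then write the proposed changes to a plan file.
  "$TF_BIN" init -input=false &&
    "$TF_BIN" plan -input=false -out="$PLAN_FILE"
}

apply_plan() {
  # Applying a saved plan file runs only the changes recorded in that file.
  "$TF_BIN" apply -input=false "$PLAN_FILE"
}
```

After sourcing local.env, run plan_changes from the repository directory, review the output, and then run apply_plan. Terraform does not ask for approval when it applies a saved plan file, so the review has to happen between the two calls.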

Use the IBM Cloud Schematics UI

Alternatively, you can use the IBM Cloud Schematics UI, which requires no installation on your machine:

  1. Navigate to Schematics Workspaces on IBM Cloud and click on Create workspace.
  2. Under the Specify Template section, enter https://github.com/IBM-Cloud/stream-landing-terraform as the GitHub or GitLab repository URL.
  3. Select terraform_v0.14 as the Terraform version and click Next.
  4. Provide the workspace name (stream-landing) and choose a resource group and location.
  5. Click Next and then click Create.
  6. You should see the Terraform variables section. Fill in the variables as per your requirement by clicking the action menu next to each of the variables.
  7. Scroll to the top of the page, where Generate plan runs terraform plan and Apply plan runs terraform apply.
  8. Click Apply plan and check the progress under the Log. (Generating a plan first is optional.)

To understand more about Terraform and IBM Cloud Schematics, check this blog post: "Provision Multiple Instances in a VPC Using Schematics." In short, you can run any Terraform script simply by pointing Schematics to the Git repository that contains it.

This is what the Terraform scripts do:

  1. Create a new resource group and provision resources under the group.
  2. Create a Key Protect service with a root key.
  3. Provision an Event Streams service with a topic.
  4. Provision a Cloud Object Storage service with a bucket.
  5. Provision a SQL Query service for stream landing.
  6. Set up the permissions and authorizations required for stream landing.
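
Once terraform apply has finished, the resources behind the steps above can be inspected from the Terraform state. This is a minimal sketch, assuming it is run from the directory that holds the state; summarize_state and TF_BIN are illustrative names, and the resource types that show up depend on the scripts in the repository.

```shell
# Illustrative helper; TF_BIN can point to a different terraform binary.
TF_BIN="${TF_BIN:-terraform}"

summarize_state() {
  # "terraform state list" prints one resource address per line.
  # Dropping the final ".NAME" part of each address leaves the resource
  # type (with any module or data prefix), which is then counted.
  "$TF_BIN" state list | sed 's/\.[^.]*$//' | sort | uniq -c
}
```

Calling summarize_state prints one line per resource type with the number of instances in front, which gives a quick check that the service instances, the bucket, the topic, and the authorizations were all created.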

Test stream landing

To produce messages to the Event Streams service, you can use tools like kcat (formerly Kafkacat) or the Event Streams sample producer.
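
As a rough sketch of the kcat option, the function below sends two JSON test messages to the topic. BROKERS, TOPIC, and API_KEY are placeholders that you would fill from the service credentials of your Event Streams instance, and produce_test_messages, KCAT_BIN, and the message contents are illustrative assumptions. The sketch also assumes that the topic carries JSON payloads.

```shell
# Placeholders: export BROKERS (comma-separated host:port list), TOPIC and
# API_KEY before calling the function. KCAT_BIN is an illustrative override.
KCAT_BIN="${KCAT_BIN:-kcat}"

produce_test_messages() {
  # Event Streams accepts SASL PLAIN over TLS with the literal user name
  # "token" and an API key as the password.
  # In producer mode (-P), kcat sends each line read from stdin as one message.
  printf '%s\n' '{"id": 1, "msg": "hello"}' '{"id": 2, "msg": "stream landing"}' |
    "$KCAT_BIN" -P \
      -b "$BROKERS" \
      -t "$TOPIC" \
      -X security.protocol=SASL_SSL \
      -X sasl.mechanisms=PLAIN \
      -X sasl.username=token \
      -X sasl.password="$API_KEY"
}
```

After the function returns, allow some time for the streaming job to write objects before checking the bucket.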

  1. Verify that the specified prefix in IBM Cloud Object Storage is filled with Parquet objects by navigating to the Object Storage service under your resources.
  2. Check the status of all streaming jobs in the SQL Query UI. 
  3. Alternatively, use the REST API of SQL Query to get the list and the details of running stream landing jobs. 
  4. In the Event Streams UI, you also get information about the active stream landing jobs per topic. Using Event Streams, you can view and stop the landing configuration.

If you have any queries, feel free to reach out to me on Twitter or on LinkedIn.