December 7, 2021 By Vidyasagar Machupalli 3 min read

Learn how to archive your Event Streams Kafka data to Object Storage using SQL Query. This process, called stream landing, can be set up using the Terraform scripts provided in this post. 

You can easily archive data to IBM Cloud Object Storage for long-term storage or to gain insight by leveraging interactive queries or big data analytics. You can achieve this through the Event Streams UI, where topics can be selected and linked to Cloud Object Storage buckets, with data automatically and securely streamed using the fully-managed IBM Cloud SQL Query service. All data is stored in Parquet format, making it easy to manage and process. Check out “Streaming to Cloud Object Storage by using SQL Query” for more info. 

In this post, you will set up the Cloud Object Storage stream landing using Terraform. 

What is Terraform?

Terraform is an open-source “Infrastructure as Code” tool created by HashiCorp.

A declarative coding tool, Terraform enables developers to use a high-level configuration language called HCL (HashiCorp Configuration Language) to describe the desired “end-state” cloud or on-premises infrastructure for running an application. It then generates a plan for reaching that end-state and executes the plan to provision the infrastructure.

[Figure: Streaming to Cloud Object Storage by using SQL Query]

Let’s get started 

If you have Terraform set up on your machine, follow the steps below:

  1. Open a terminal or command prompt on your machine, clone the GitHub repository and move to the directory:
    git clone https://github.com/IBM-Cloud/stream-landing-terraform
    cd stream-landing-terraform
  2. Create the local.env file from the template file provided in the repo and update the environment variables accordingly. Once updated, source the file:
    cp template.local.env local.env
    source local.env
  3. You can now run the individual Terraform commands to provision the required IBM Cloud services:
    terraform init 
    terraform plan 
    terraform apply
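
Before running the Terraform commands in step 3, it can help to confirm that sourcing local.env actually exported what you expect. The following is a minimal sketch, not part of the repository: require_vars is a hypothetical helper, and TF_VAR_ibmcloud_api_key is only an example name, so check template.local.env for the variables the scripts really use.

```shell
#!/bin/sh
# Sketch: fail fast when a variable expected from local.env is missing.
# require_vars is a hypothetical helper; it is not provided by the repo.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (the variable name is an assumption, not taken from the repo):
#   . ./local.env
#   require_vars TF_VAR_ibmcloud_api_key \
#     && terraform init \
#     && terraform plan -out=tfplan \
#     && terraform apply tfplan
```

Saving the plan with -out and applying that file means terraform apply executes exactly the plan you reviewed, rather than computing a new one.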

Use the IBM Cloud Schematics UI

Alternatively, you can use the IBM Schematics UI. You don’t need to install anything on your machine:

  1. Navigate to Schematics Workspaces on IBM Cloud and click on Create workspace.
  2. Under the Specify Template section, provide https://github.com/IBM-Cloud/stream-landing-terraform under GitHub or GitLab repository URL.
  3. Select terraform_v0.14 as the Terraform version and click Next.
  4. Provide the workspace name — stream-landing — and choose a resource group and location.
  5. Click Next and then click Create.
  6. You should see the Terraform variables section. Fill in the variables as per your requirement by clicking the action menu next to each of the variables.
  7. Scroll to the top of the page, where you can Generate plan (terraform plan) and Apply plan (terraform apply).
  8. Click Apply plan and check the progress under the Log. Generating a plan first is optional.
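
If you prefer to script the Schematics steps above, the IBM Cloud CLI Schematics plug-in can create a workspace from a JSON definition. The sketch below only writes that definition file. The location and resource group values are placeholders, write_workspace_json is a hypothetical helper, and the JSON layout reflects my understanding of the plug-in's workspace format, so verify it with ibmcloud schematics workspace new --help before relying on it.

```shell
#!/bin/sh
# Sketch: write a Schematics workspace definition mirroring the UI steps.
# "us-south" and "default" are placeholder values; replace them with your own.
write_workspace_json() {
  out="$1"
  cat > "$out" <<'EOF'
{
  "name": "stream-landing",
  "type": ["terraform_v0.14"],
  "location": "us-south",
  "resource_group": "default",
  "template_repo": {
    "url": "https://github.com/IBM-Cloud/stream-landing-terraform"
  },
  "template_data": [
    { "folder": ".", "type": "terraform_v0.14" }
  ]
}
EOF
}

# Example usage (requires the ibmcloud CLI with the Schematics plug-in;
# WORKSPACE_ID is the ID returned when the workspace is created):
#   write_workspace_json workspace.json
#   ibmcloud schematics workspace new --file workspace.json
#   ibmcloud schematics plan --id "$WORKSPACE_ID"
#   ibmcloud schematics apply --id "$WORKSPACE_ID"
```

The Terraform variables from step 6 are not included here; they would still need to be supplied, either in the UI or in the workspace definition.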

To understand more about Terraform and IBM Cloud Schematics, check this blog post: “Provision Multiple Instances in a VPC Using Schematics.” In short, you can run any Terraform script simply by pointing to the Git repository that contains the scripts.

This is what the Terraform scripts do:

  1. Create a new resource group and provision resources under the group.
  2. Create a Key Protect service with a root key.
  3. Provision an Event Streams service with a topic.
  4. Provision a Cloud Object Storage service with a bucket.
  5. Provision a SQL Query service for stream landing.
  6. Set up the permissions and authorizations required for stream landing.
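
After a local terraform apply, one way to confirm that the resources listed above were created is to summarize the output of terraform state list by resource type. This is a minimal sketch: summarize_state is a hypothetical helper, and the resource addresses in the usage comment are illustrative, not taken from the repository.

```shell
#!/bin/sh
# Sketch: count resources per type from `terraform state list` output on stdin.
summarize_state() {
  # 1) drop a trailing [index], 2) drop the resource name,
  # 3) drop any module or data prefix, leaving only the resource type.
  sed -e 's/\[[^]]*\]$//' -e 's/\.[^.]*$//' -e 's/^.*\.//' |
    awk '{ count[$0]++ } END { for (t in count) print t, count[t] }' |
    sort
}

# Example usage, run in the directory that holds the Terraform state:
#   terraform state list | summarize_state
# An address such as ibm_resource_instance.cos (illustrative only) is
# counted under the type ibm_resource_instance.
```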

Test stream landing

To produce messages to the Event Streams service, you can use tools like kcat (formerly Kafkacat) or the Event Streams sample producer. Once messages are flowing, verify stream landing as follows:
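
As a rough illustration of producing test data with kcat, the sketch below generates a few JSON messages and shows, in a comment, how they might be piped to a topic. It assumes JSON payloads are acceptable for your landing job. gen_messages is a hypothetical helper, and BROKERS, TOPIC and API_KEY are placeholders you would take from your Event Streams service credentials.

```shell
#!/bin/sh
# Sketch: emit N small JSON test messages, one per line.
# The message fields are made up for testing purposes.
gen_messages() {
  n="$1"
  i=1
  while [ "$i" -le "$n" ]; do
    printf '{"id": %d, "source": "stream-landing-test"}\n' "$i"
    i=$((i + 1))
  done
}

# Example usage (requires kcat; BROKERS, TOPIC and API_KEY are placeholders):
#   gen_messages 100 | kcat -P -b "$BROKERS" -t "$TOPIC" \
#     -X security.protocol=SASL_SSL -X sasl.mechanisms=PLAIN \
#     -X sasl.username=token -X sasl.password="$API_KEY"
```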

  1. Verify that the specified prefix in IBM Cloud Object Storage is filled with Parquet objects by navigating to the Object Storage service under your resources.
  2. Check the status of all streaming jobs in the SQL Query UI. 
  3. Alternatively, use the REST API of SQL Query to get the list and the details of running stream landing jobs. 
  4. In the Event Streams UI, you also get information about the active stream landing jobs per topic. Using Event Streams, you can view and stop the landing configuration.

If you have any queries, feel free to reach out to me on Twitter or on LinkedIn.
