IBM Cloud SQL Query now supports automating the transport and transformation of data from IBM Cloud Object Storage to Db2 on Cloud.

Building with flexible, scalable, cloud native services is fun, right? You know what isn’t fun though? Capturing, storing, filtering, and analyzing all the terabytes of data produced by your applications round the clock, such as audit trails, clickstream logs, or IoT sensor streams.

The lack of a cost-efficient architecture to extract value from your data can offset the beneficial economics of the cloud and leave game-changing insights on the table. That’s why here at IBM Cloud, we want to enable architectures that give you precise control over your storage costs and analytical capabilities. 

New capabilities for IBM Cloud SQL Query

IBM Cloud SQL Query offers users the power of serverless SQL over data stored in IBM Cloud Object Storage. With this latest release, we have added new capabilities to SQL Query that makes it easier than ever to build extensible data pipelines to stem the rising tides of data.

Data professionals can now take advantage of the power and scalability of Apache Spark-based SQL processing and native time series and location data processing in SQL Query to run highly parallelized data pipelines into their IBM Cloud data services.

Data pipelines, in general, reflect an ETL (Extract, Transform, Load) process that allows you to obtain more value from your data across multiple sources via data extraction and data transformation to downstream cloud services.

IBM Cloud SQL Query now automates pipelining of data from IBM Cloud Object Storage to IBM Cloud Databases such as IBM Db2 Warehouse on Cloud or Db2 on Cloud. Because Db2 natively has a constrained set of options to load data from object storage (supporting only CSV data format), IBM SQL Query’s latest offering opens up a broad range of data formats stored in IBM Cloud Object Storage to be seamlessly loaded into Db2, including Parquet.  

Serverless analytics and data transformation, such as those provided by SQL Query, are foundational capabilities of a cloud native data lake. While data warehousing is an established solution for analytics on big data, by connecting your Db2 Warehouse or Db2 on Cloud deployment to your IBM Cloud Object Storage-based data lake, you can gain more precise control of the data quality and lifecycle at orders-of-magnitude lower costs. For example, you can archive petabytes of data into IBM Cloud Object Storage and then only keep the most relevant subset of data in a high-performing database like Db2 Warehouse on Cloud.

Pipedreams or pipelines?

Data pipelines are the lifeline of analytics. The more versatile and automated they are, the easier you can on-ramp the necessary data to your analytic tools and services. With data pipelines, you can connect your data silos to analytics.

All too often, data acquisition and preparation are still a manual data-wrangling process as teams attempt to break down those silos. With a powerful serverless and SQL-based data pipeline, data extraction and data transformation processes can now be easily created, automated, and scaled to your daily data volumes.

Use case: Application log processing

In practice, many customers struggle with archiving and analyzing access and audit logs over the lifetime of compliance-mandated retention windows. To improve accessibility and lower costs, one might turn to a serverless data lake. Here is how it would look.

Most services on the IBM Cloud platform—like IBM Cloud Foundry Enterprise Edition, IBM Cloud Kubernetes Services, and IBM Cloudant—already automatically integrate Activity Tracker with LogDNA. This audit logging solution can automatically archive data into IBM Cloud Object Storage. Once it lands there, it can be transformed by SQL Query into a more query-friendly format like Apache Parquet and enriched with other data sources, such as your database logs from IBM Cloudant or third-party data sources. Once your audit data is prepared and enriched, it is available for performant ad-hoc analytics, but SQL Query can also move your data to a more performant warehousing service, such as Db2 Warehouse on Cloud, that can then visualized with any JDBC/ODBC driver, like Cognos Analytics.

This ETL reflects an example of effective pipeline architecture for teams building serverless security intelligence stacks with IBM Cloud SQL Query, Cloud Object Storage, and Db2 Warehouse.

Get the tutorial.

Opening the aperture: Advantage of the cloud

These new SQL-driven ETL capabilities make it easier for data professionals to acquire and prepare new data sources in the cloud (such as machine-generated or telemetry data) and to automatically load the data into Db2 to augment and extend your Db2-based analytic solutions.

This provides fully elastic and automated data on-ramp pipelines, from the durability and cost-effectiveness of Object Storage to the performance of the Db2 ecosystem. In practice, users can combine the elasticity and cost benefits of data acquisition and preparation in object storage with the quality of service of enterprise analytics.

Get started with IBM Cloud SQL Query

Learn more about Data Lakes in the IBM Cloud

More from Analytics

In preview now: IBM watsonx BI Assistant is your AI-powered business analyst and advisor

3 min read - The business intelligence (BI) software market is projected to surge to USD 27.9 billion by 2027, yet only 30% of employees use these tools for decision-making. This gap between investment and usage highlights a significant missed opportunity. The primary hurdle in adopting BI tools is their complexity. Traditional BI tools, while powerful, are often too complex and slow for effective decision-making. Business decision-makers need insights tailored to their specific business contexts, not complex dashboards that are difficult to navigate. Organizations…

IBM unveils Data Product Hub to enable organization-wide data sharing and discovery

2 min read - Today, IBM announces Data Product Hub, a data sharing solution which will be generally available in June 2024 to help accelerate enterprises’ data-driven outcomes by streamlining data sharing between internal data producers and data consumers. Often, organizations want to derive value from their data but are hindered by it being inaccessible, sprawled across different sources and tools, and hard to interpret and consume. Current approaches to managing data requests require manual data transformation and delivery, which can be time-consuming and…

A new era in BI: Overcoming low adoption to make smart decisions accessible for all

5 min read - Organizations today are both empowered and overwhelmed by data. This paradox lies at the heart of modern business strategy: while there's an unprecedented amount of data available, unlocking actionable insights requires more than access to numbers. The push to enhance productivity, use resources wisely, and boost sustainability through data-driven decision-making is stronger than ever. Yet, the low adoption rates of business intelligence (BI) tools present a significant hurdle. According to Gartner, although the number of employees that use analytics and…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters