December 16, 2022 By Daniel Pittner 3 min read

How to query your log archives directly from object storage.

Today, we are thrilled to announce the release of a new feature that will change the way you access and analyze your log data. With IBM Cloud Data Engine, you can now query your log archives directly from object storage, without the need to download and store them locally.

How to use SQL for analyzing unstructured logs

Log data is typically generated by various systems and applications, and the schema of the data can change over time as new systems and applications are added and existing ones are updated. This makes it difficult to query log data using SQL because the schema is not fixed and can change from one log entry to the next.
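
For illustration, each line in an archived log object is a small, self-contained JSON document. The exact fields depend on your logging setup; a hypothetical line matching the field paths used in the query below might look like this:

{"_source": {"_host": "web-01", "_ts": 1671192000000, "_file": "/var/log/app.log", "_line": "GET /checkout HTTP/1.1 500"}}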

To address these challenges, you can now query log archives as text. Let’s look at an example:

WITH logs AS (
  SELECT get_json_object(value, "$._source._host") as host,
         from_unixtime(get_json_object(value, "$._source._ts") / 1000, 'yyyy-MM-dd HH:mm:ss') as timestamp,
         get_json_object(value, "$._source._file") as file,
         get_json_object(value, "$._source._line") as line
  FROM cos://us-geo/logArchives STORED AS TEXT )
SELECT * FROM logs LIMIT 200

First, the query uses the WITH clause to define a derived table named logs that holds the structured data extracted from the unstructured log archives. Inside the WITH clause, a series of get_json_object calls pulls the specific fields of interest (host, timestamp, file and line) out of each raw JSON log line.

This is a powerful approach because it allows the query to work with structured data, which is much easier to query and analyze than unstructured data. By extracting the specific fields of interest and creating a structured table, the query can use standard SQL syntax to filter, sort and aggregate the data as needed.
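
For example, here is a minimal sketch that reuses the same derived table to filter and sort error entries for a single host (the host name and the 'ERROR' pattern are illustrative assumptions about your log content):

WITH logs AS (
  SELECT get_json_object(value, "$._source._host") as host,
         from_unixtime(get_json_object(value, "$._source._ts") / 1000, 'yyyy-MM-dd HH:mm:ss') as timestamp,
         get_json_object(value, "$._source._line") as line
  FROM cos://us-geo/logArchives STORED AS TEXT )
SELECT timestamp, line
FROM logs
WHERE host = 'web-01' AND line LIKE '%ERROR%' -- hypothetical host and error pattern
ORDER BY timestamp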

One of the key benefits of this new capability is parallel query processing for gzip-compressed data. Data Engine divides the compressed archives into smaller chunks and decompresses and scans those chunks simultaneously across multiple processors or cores, which can deliver orders-of-magnitude better query execution times than reading each archive sequentially.

What insights can you gain from analyzing log archives with SQL?

Here are some examples of how you can use this new capability to gain insights from your historical IBM Cloud log archives:

  • Identify and troubleshoot issues with your applications: By querying your log data, you can easily identify any errors or issues that may have occurred in the past and may be impacting the performance of your applications. This can help you troubleshoot problems and improve the reliability and stability of your applications. A sample query for this scenario follows the list.
  • Monitor system performance and usage: By analyzing your log data, you can gain insights into how your systems were being used and how they were performing in the past. This can help you identify trends and patterns and make data-driven decisions to optimize your system performance and improve the user experience.
  • Analyze user behavior and preferences: By querying your log data, you can gain insights into how your users were interacting with your applications and what features they were using in the past. This can help you understand their needs and preferences and make data-driven decisions to improve the user experience and drive engagement.
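
As a sketch of the first scenario above, the following query counts error lines per host and day, making it easy to see where and when problems clustered (the 'ERROR' substring is an assumption about your log format):

WITH logs AS (
  SELECT get_json_object(value, "$._source._host") as host,
         from_unixtime(get_json_object(value, "$._source._ts") / 1000, 'yyyy-MM-dd') as day,
         get_json_object(value, "$._source._line") as line
  FROM cos://us-geo/logArchives STORED AS TEXT )
SELECT host, day, count(*) as error_count
FROM logs
WHERE line LIKE '%ERROR%' -- adjust the pattern to match your error markers
GROUP BY host, day
ORDER BY error_count DESC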

Check out more examples in the documentation.

Get started with Data Engine and Log Analysis archiving

Overall, this new capability of IBM Cloud Data Engine provides a powerful and flexible way to query and analyze your historical log data, helping you gain valuable insights and make data-driven decisions. We are excited to see how you will use it to unlock the value of your log data and drive success for your business.

Learn more about IBM Cloud Data Engine.
