How to query your log archives directly from object storage.

Today, we are thrilled to announce the release of a new feature that will change the way you access and analyze your log data. With IBM Cloud Data Engine, you can now query your log archives directly from object storage, without the need to download and store them locally.

How to use SQL for analyzing unstructured logs

Log data is typically generated by various systems and applications, and the schema of the data can change over time as new systems and applications are added and existing ones are updated. This makes it difficult to query log data using SQL because the schema is not fixed and can change from one log entry to the next.

To address these challenges, you can now query log archives as text. Let’s look at an example:

WITH logs AS (
  SELECT get_json_object(value, "$._source._host") AS host,
         from_unixtime(get_json_object(value, "$._source._ts") / 1000, 'yyyy-MM-dd HH:mm:ss') AS timestamp,
         get_json_object(value, "$._source._file") AS file,
         get_json_object(value, "$._source._line") AS line
  FROM cos://us-geo/logArchives STORED AS TEXT
)
SELECT * FROM logs

First, the query uses a WITH clause to define a derived table named logs that holds the structured log data extracted from the unstructured archives. Because the archive is read STORED AS TEXT, each log line is exposed as a single string column named value, and each get_json_object call pulls one field of interest (host, timestamp, file, line) out of that raw JSON text.

This is a powerful approach because it allows the query to work with structured data, which is much easier to query and analyze than unstructured data. By extracting the specific fields of interest and creating a structured table, the query can use standard SQL syntax to filter, sort and aggregate the data as needed.
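For example, once the fields are extracted in the logs table, standard aggregation works as usual. The following sketch reuses the bucket path and field names from the example above (adapt them to your own archive layout) to count archived log lines per host and day:

-- Illustrative follow-up query: aggregate the extracted fields with standard SQL.
WITH logs AS (
  SELECT get_json_object(value, "$._source._host") AS host,
         from_unixtime(get_json_object(value, "$._source._ts") / 1000, 'yyyy-MM-dd') AS day
  FROM cos://us-geo/logArchives STORED AS TEXT
)
SELECT host, day, COUNT(*) AS log_lines
FROM logs
GROUP BY host, day
ORDER BY log_lines DESC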

One of the key benefits of this new capability is parallel query processing of gzip-compressed data. Data Engine divides the compressed data into smaller chunks and decompresses and scans those chunks simultaneously, using multiple processors or cores, which can improve query execution times by orders of magnitude.
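No special syntax is needed to benefit from this: a query over gzip-compressed archive objects looks the same as one over plain text. The path below is illustrative only, assuming compressed archives are laid out under a date prefix:

-- Illustrative sketch: gzip-compressed archives are queried like plain text.
SELECT COUNT(*) AS total_lines
FROM cos://us-geo/logArchives/2023/01 STORED AS TEXT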

What insights can you gain from analyzing log archives with SQL?

Here are some examples of how you can use this new capability to gain insights from your historical IBM Cloud log archives:

  • Identify and troubleshoot application issues: Querying your archived logs lets you pinpoint errors that occurred in the past and may still be affecting application performance, making it easier to troubleshoot problems and improve the reliability and stability of your applications.
  • Monitor system performance and usage: Analyzing historical logs reveals how your systems were used and how they performed over time, helping you spot trends and patterns and make data-driven decisions to optimize performance and improve the user experience.
  • Analyze user behavior and preferences: Archived logs show how users interacted with your applications and which features they used, helping you understand their needs and preferences and make data-driven decisions that improve the user experience and drive engagement.
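As a sketch of the first use case, a query like the following could surface where past errors clustered. The field paths come from the example earlier in this post; the error-matching pattern is an assumption you would adapt to your own log format:

-- Hypothetical troubleshooting query: find archived log lines mentioning errors.
WITH logs AS (
  SELECT get_json_object(value, "$._source._host") AS host,
         get_json_object(value, "$._source._file") AS file,
         get_json_object(value, "$._source._line") AS line
  FROM cos://us-geo/logArchives STORED AS TEXT
)
SELECT host, file, COUNT(*) AS error_lines
FROM logs
WHERE lower(line) LIKE '%error%'
GROUP BY host, file
ORDER BY error_lines DESC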

Check out more examples in the documentation.

Get started with Data Engine and log archive analysis

Overall, this new capability of IBM Cloud Data Engine provides a powerful and flexible way to query and analyze your historical log data, helping you gain valuable insights and make data-driven decisions. We are excited to see how you will use it to unlock the value of your log data and drive success for your business.

Learn more about IBM Cloud Data Engine.

