December 19, 2019 By Michael Haide 4 min read

Learn how to revive your archives, allowing you to work on log data from IBM Log Analysis with LogDNA using IBM Cloud SQL Query.

IBM Cloud Log Analysis with LogDNA is an easy-to-use service for analyzing logs, alerting on them, and debugging applications. Typically, logging services keep data in an actively indexed state for interactive querying and ad-hoc exploration for only a relatively short time (30 days or less). That's because logs are a constant stream of data, a firehose of bits and bytes. Unless that data ages out, you will face either operational issues or a sky-high bill.

That is why Log Analysis with LogDNA provides an archival functionality, allowing users to have highly performant and active indexing of their log data, while enabling them, over time, to offload that data to IBM Cloud Object Storage to meet compliance/regulatory requirements.

But what if archived logs weren’t just for evidence? What if they were easily accessible data streams that could help drive insights and improvements for application operations, performance, and resiliency? What if you could perform complex timeseries queries on these data streams? With IBM Cloud SQL Query, we can help you turn your archived logs into insights with simple, serverless SQL. 

Read more below on how to revive your archives, allowing you to work on log data from IBM Log Analysis with LogDNA using IBM Cloud SQL Query. 

Archive log data on IBM Cloud Object Storage

IBM Log Analysis with LogDNA keeps your log data for a maximum of 30 days. If you would like to keep your log data longer, LogDNA provides a feature to archive it on IBM Cloud Object Storage. To enable this feature, click the settings icon within your LogDNA instance interface and fill out the required settings:

Ensure that you have created an IBM Cloud Object Storage instance and added service credentials beforehand. Within the service credentials, you will find the required information, such as the API key, endpoint, and instance ID. Enter that information and the name of the bucket you created into the LogDNA archiving settings fields and click save to enable archiving.

After a maximum of two days, you should see an exported log file stored on IBM Cloud Object Storage as gzipped JSON (json.gz):
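To get a feel for the archive format before bringing SQL Query into the picture, you can download one of the exported objects and inspect it locally. The sketch below (plain Python, standard library only) writes a tiny sample record in the same shape the SQL examples later rely on (a `_source` object with a millisecond `_ts` and a `level` field) and reads it back; the file name and record contents are invented for illustration, and the one-JSON-document-per-line layout is an assumption about the export format.

```python
import gzip
import json

# Hypothetical sample record mirroring the fields used by the SQL later
# in this article (_source._ts in milliseconds, _source.level).
sample = {"_source": {"_ts": 1563346800000, "level": "ERROR", "_line": "db timeout"}}

# Write a tiny gzipped JSON-lines file, standing in for an exported archive object.
with gzip.open("sample.json.gz", "wt", encoding="utf-8") as f:
    f.write(json.dumps(sample) + "\n")

# Read it back: one JSON document per line.
with gzip.open("sample.json.gz", "rt", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

print(records[0]["_source"]["level"])
```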

Use IBM SQL Query to work on archived log data

A log file is stored every day, containing all data collected within the last 24 hours. If your LogDNA instance receives a large amount of data, hourly log data archiving can be enabled. The drawback of that option is that a small number of rows may be spread across more than one zipped file. You can find more details in the LogDNA documentation.

Use IBM SQL Query to work on this archived log data efficiently in the IBM Cloud. An easy way to start is to create an IBM Cloud SQL Query Lite instance, which limits you to one query at a time and a maximum of 30 GB scanned per day. With a Standard plan, you can run up to five queries in parallel and pay only for the data scanned, at $5 per TB.

We highly recommend first transforming the json.gz objects into partitioned JSON objects, because zipped files cannot be read in parallel and are therefore slow to process. You may wonder whether a columnar format like Parquet would be even better. From a performance point of view it would, but because the schema of log files is typically not stable, Parquet could lead to problems when reading data from different days. With JSON, the underlying engine of SQL Query (Apache Spark) always infers the schema at read time and therefore avoids such consistency problems.

If you are confident that your log schema is stable, Parquet as an output format is more efficient, both increasing performance and lowering the amount of data scanned.

Here is an example of an SQL Query statement to transform your zipped LogDNA data into partitioned JSON objects:

SELECT *,
     date_format(from_unixtime(_source._ts / 1000, 'yyyy-MM-dd HH:mm:ss'), 'yyyy') AS _year,
     dayofyear(from_unixtime(_source._ts / 1000, 'yyyy-MM-dd HH:mm:ss')) AS _dayofyear,
     date_format(from_unixtime(_source._ts / 1000, 'yyyy-MM-dd HH:mm:ss'), 'HH') AS _hour
FROM cos://us-south/sql-query-logdna-dev/a591844d24.2019-07-17.72.json.gz STORED AS JSON
INTO cos://us-south/result/logDNA STORED AS JSON PARTITIONED BY (_year, _dayofyear, _hour)

In the statement, the timestamp information (_ts) of each row is used to generate the year, day-of-year, and hour information. The result of the above query, stored in IBM Cloud Object Storage, reflects the specified partitioning as Hive-style partitioning. For example, you could see _hour=00 in the object name, which means that all log data belonging to that hour is included in that object. This conversion step can be automated using cloud functions; the details can be found in our data-lake GitHub repository.
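As a sanity check on the partitioning logic, the values the SQL derives from `_ts` can be reproduced in plain Python. This is only an illustration of the `from_unixtime`/`date_format`/`dayofyear` combination (assuming the engine evaluates timestamps in UTC), not part of the pipeline:

```python
from datetime import datetime, timezone

def partition_keys(ts_ms: int) -> tuple:
    """Derive (_year, _dayofyear, _hour) from a millisecond epoch timestamp,
    mirroring the date_format/from_unixtime/dayofyear calls in the SQL (UTC assumed)."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y"), dt.timetuple().tm_yday, dt.strftime("%H")

# 2019-07-17 07:00:00 UTC is day 198 of 2019, hour 07.
print(partition_keys(1563346800000))  # ('2019', 198, '07')
```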

After the above preparation steps, you can start to run SQL on the log data. A simple example is to find all ERROR log rows in the timeframe between 7 am and 8 am on July 17th (day 198 of the year).

Only the required partitions are scanned in the following statement:

SELECT *
FROM cos://us-south/result/logDNA/ STORED AS JSON
WHERE _dayofyear = 198 AND _hour = 07 AND _source.level = "ERROR"
INTO cos://us-south/result/errorextract STORED AS JSON
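Because the output uses Hive-style partitioning, a partition filter like the one above corresponds directly to object-name prefixes, so matching objects can also be listed client-side without scanning anything. A small sketch of building such a prefix, assuming the default `_year=YYYY/_dayofyear=D/_hour=HH` layout produced by the PARTITIONED BY clause:

```python
# Build a Hive-style partition prefix for listing matching objects directly.
# The layout (_year=YYYY/_dayofyear=D/_hour=HH) is assumed from the
# PARTITIONED BY clause earlier; adjust if your bucket layout differs.
def partition_prefix(base: str, year: str, dayofyear: int, hour: str) -> str:
    return f"{base}/_year={year}/_dayofyear={dayofyear}/_hour={hour}/"

print(partition_prefix("result/logDNA", "2019", 198, "07"))
# result/logDNA/_year=2019/_dayofyear=198/_hour=07/
```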

SQL Query also provides native timeseries constructs for performing complex timeseries functions like segmentation, temporal statistics, temporal joins, interpolation, and forecasting. A simple, relevant query in this context is to identify hourly averages. This is illustrated in the statement below, which takes as input a timeseries view on the log data; you can find further details on this in the SQL Query documentation:

SELECT node, status,
          TS_SEGMENT_BY_TIME(ts_index(ts, 1), 3600*1000, 3600*1000)
FROM da_ts_table
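To make the segmentation idea concrete without the TS_ API: conceptually, TS_SEGMENT_BY_TIME splits a series into fixed windows (here one hour, i.e., 3600*1000 ms) over which statistics such as averages can then be computed. A plain-Python analogue of that hourly bucketing (an illustration only, not SQL Query code) might look like:

```python
from collections import defaultdict

def hourly_averages(points):
    """Group (ts_ms, value) points into one-hour windows and average each window,
    a plain-Python analogue of segmenting a timeseries by 3600*1000 ms."""
    buckets = defaultdict(list)
    for ts_ms, value in points:
        buckets[ts_ms // (3600 * 1000)].append(value)
    # Key each result by the window's start timestamp in milliseconds.
    return {hour * 3600 * 1000: sum(v) / len(v) for hour, v in buckets.items()}

series = [(0, 10.0), (1_800_000, 20.0), (3_600_000, 30.0)]
print(hourly_averages(series))  # {0: 15.0, 3600000: 30.0}
```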

This statement could easily be changed to apply very different transformations. For instance, to get the average request time when a status code 500 occurs:

     TS_SEGMENT_BY_ANCHOR(ts, TS_ANCHOR_INDEX(TS_ANCHOR_GTE(500.0), 0), 1800*1000, 1800*1000)

Now you have the foundation and pipeline for performing highly scalable analytics on your log data with simple SQL. If you want to take this example to the next level, feel free to explore our Jupyter Notebook.
