As of last week, IBM Cloud SQL Query supports a wide variety of time series functions as native components of the service.
The library, developed by IBM Research, joins our Geospatial functionality as fully supported by the IBM Cloud SQL Query service and team.
SQL Query is a serverless, pay-per-query offering that can manipulate and analyze semi-structured and structured data in IBM Cloud Object Storage. Its new SQL-native time series support is industry-leading in the breadth of capability available directly inside a SQL engine. It significantly simplifies the definition and execution of time series processing by letting you express problems in a declarative language (SQL) instead of writing hundreds of lines of custom code. The result is high productivity and a short time to value.
Key features of the time series functions
- Full suite of SQL-style temporal joins on aperiodic and unaligned time series
- Multi-typed and multi-time series support for numeric and categorical (string) data
- Time Reference System (akin to Coordinate Reference Systems for geospatial) for handling timestamps at multiple granularities
- Rich support for segmentation based on time, number of records, markers, and anchors
- Built-in SQL constructs for interpolation, similarity, and forecasting
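To make the first two features more concrete, here is a minimal Python sketch of one kind of temporal join, an "as-of" left join, applied to two unaligned series, one numeric and one categorical. It only illustrates the concept: the function name `asof_join` and the sample data are hypothetical, and this is not the syntax or implementation of the SQL Query time series functions.

```python
from bisect import bisect_right

def asof_join(left, right):
    """As-of left join of two time series.

    Both inputs are lists of (timestamp, value) pairs sorted by timestamp.
    Each left observation is paired with the most recent right observation
    at or before its timestamp, or None if no such observation exists.
    """
    right_times = [t for t, _ in right]
    joined = []
    for t, left_value in left:
        # Index of the last right timestamp that is <= t (or -1 if none).
        i = bisect_right(right_times, t) - 1
        right_value = right[i][1] if i >= 0 else None
        joined.append((t, left_value, right_value))
    return joined

# Hypothetical sample data: aperiodic, unaligned timestamps.
temperature = [(1, 20.5), (4, 21.0), (9, 22.3)]           # numeric series
status = [(0, "idle"), (3, "running"), (8, "fault")]      # categorical series

joined = asof_join(temperature, status)
print(joined)
```

Neither series needs to share timestamps with the other; the join decides how observations line up, which is the work that otherwise ends up in custom code.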
Why time series support?
The growth of data volumes is dominated by machine-generated data, such as IoT sensor feeds, connected cars, or user behavior logs. All this data is time stamped by nature, which is also a key dimension for deriving valuable business insights.
Capturing and analyzing the state of a system over time allows us to make more informed predictions about future events, observe real-time changes, and capture historical anomalies. Whether it be IoT devices, autonomous driving systems, or network performance, many of the systems and products we use today are constantly emitting time series data that can be used to optimize and improve performance, safety, and robustness.
Coupled with the massive scale-out capabilities of SQL Query and IBM Cloud Object Storage, we can now provide unlimited retention and analytics on this data at petabyte scale, at an extremely low cost and barrier to entry. Does your team know SQL? Great, they can now tap into time series insights on the IBM Cloud.
Adding sophisticated time series functionality to IBM Cloud SQL Query is an essential step toward realizing our vision of building a cloud-native, serverless data lake for our clients. We aim to deliver a platform that makes data simple, allowing you to seamlessly store, manage, life-cycle, and analyze data and develop analytic solutions at scale.
What does it do?
Our time series functionality includes, but isn’t limited to, the following:
- Artifact creation
- Exploding and flattening
- Statistical insights
- Forecasting
- Filtering
- Temporal join and align
- Interpolation
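As an illustration of what an item like interpolation involves, the following Python sketch linearly interpolates an irregularly sampled series at a set of query times. It is a conceptual example only: `interpolate_linear` and the sample readings are hypothetical, and the service's own interpolation functions may behave differently (for example, in how they treat points outside the observed range).

```python
from bisect import bisect_left

def interpolate_linear(series, query_times):
    """Linearly interpolate a time series at the given query times.

    series is a list of (timestamp, value) pairs sorted by timestamp.
    Query times outside the observed range yield None (no extrapolation).
    """
    times = [t for t, _ in series]
    result = []
    for q in query_times:
        i = bisect_left(times, q)
        if i < len(times) and times[i] == q:
            # Exact hit on an observed timestamp.
            result.append((q, series[i][1]))
        elif i == 0 or i == len(times):
            # Before the first or after the last observation.
            result.append((q, None))
        else:
            (t0, v0), (t1, v1) = series[i - 1], series[i]
            result.append((q, v0 + (v1 - v0) * (q - t0) / (t1 - t0)))
    return result

# Hypothetical sensor readings at irregular timestamps.
readings = [(0, 10.0), (4, 18.0), (10, 30.0)]
filled = interpolate_linear(readings, [0, 2, 5, 10, 12])
print(filled)
```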
Check out the full cadre of features and get going with sample queries in our UI.
Benefits of using SQL Query’s time series support
Time series functions allow SQL Query to filter, cleanse, and analyze trillions of observational events per day at a cost an order of magnitude lower than that of a typical database. For example, storing 1 terabyte of Parquet data and scanning all of that data 100 times in a month would cost only:
Storage: 1,000 GB × $0.022/GB = $22/month
Scanning: (100 TB / 38) × $5/TB scanned ≈ $13/month
Total ≈ $35/month
We recommend converting your data to Parquet using IBM Cloud SQL Query to dramatically decrease your total data scanned and improve the speed of your queries. For example, we've shown that converting to Parquet allowed us to scan 38x less data than the equivalent CSV required, which means those scans are 38x less expensive!
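The cost estimate above can be reproduced with a few lines of Python, which also makes it easy to plug in your own data volume and scan count. The function and parameter names are hypothetical, and the default prices and the 38x reduction factor are simply the figures quoted in this post; check current IBM Cloud pricing before relying on them.

```python
def monthly_cost_usd(stored_gb, scans_per_month,
                     scan_reduction=38.0,        # Parquet vs. CSV factor quoted above
                     storage_usd_per_gb=0.022,   # storage price used in this post
                     scan_usd_per_tb=5.0):       # price per TB scanned used in this post
    """Return (storage, scanning, total) monthly cost in US dollars."""
    storage = stored_gb * storage_usd_per_gb
    # Volume scanned in TB, reduced by the Parquet factor as in the example.
    scanned_tb = (stored_gb / 1000.0) * scans_per_month / scan_reduction
    scanning = scanned_tb * scan_usd_per_tb
    return storage, scanning, storage + scanning

storage, scanning, total = monthly_cost_usd(stored_gb=1000, scans_per_month=100)
print(f"storage ${storage:.0f}, scanning ${scanning:.0f}, total ${total:.0f}")
```

With the inputs from the example, this rounds to the same $22, $13, and $35 per month shown above.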
Learn more about Data Lakes in the IBM Cloud
For Jupyter notebook users, we also have an in-depth tutorial of using this functionality for data science.
To get in touch with IBM Cloud about a time series use case, you can reach out to Josh Rosenkranz (jmrosenk@us.ibm.com) or me at Joshua.Mintz@ibm.com.