We are excited to announce the availability of Time Series Libraries in Watson Studio Spark Environments starting today (October 8, 2020).

This library, developed by IBM Research, includes a full set of time series functionality that is not available in any competing offering. It joins our IBM Research Assets, Geospatial functionality, Data Skipping, and Parquet Encryption libraries as a fully supported feature of Watson Studio Spark Environments.

The time series library allows users to perform various key operations on time series data, including construction of a collection of time series, imputation functions (like segmentation), transformers, reducers, joins, and machine learning functions (such as forecasting, clustering, and discriminatory sequence mining). The library supports various time series types, including numeric, categorical, and arrays.

Examples of time series data include the following:

  • Stock share prices and trading volumes
  • Clickstream data
  • Electrocardiogram (ECG) data
  • Temperature or seismographic data
  • Network performance measurements
  • Network logs
  • Electricity usage as recorded by a smart meter and reported via an Internet of Things data feed

Key features of the Time Series Libraries in Watson Studio Spark Environments

I. Data model

  • A core data model for univariate and multivariate time series  
  • Time Reference Systems for handling different timestamp representations
  • Support for aperiodic, duplicate, and out-of-order timestamps
  • Spark RDD and DataFrame extensions for time series
  • Numeric and categorical time series
  • Lossless and lossy compression
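
To make the idea of a time reference system concrete, here is a minimal sketch in plain Python. It assumes that such a system maps integer ticks to wall-clock timestamps through a start time and a fixed granularity. The function name `ticks_to_datetimes` and its parameters are hypothetical and are not the library's API.

```python
from datetime import datetime, timedelta, timezone

def ticks_to_datetimes(ticks, start, granularity):
    """Map integer ticks to datetimes: start + tick * granularity.

    Hypothetical helper for illustration; not part of the time series library.
    """
    return [start + tick * granularity for tick in ticks]

# Aperiodic ticks (0, 1, 5) on a 15-minute grid that starts on the launch date.
start = datetime(2020, 10, 8, tzinfo=timezone.utc)
stamps = ticks_to_datetimes([0, 1, 5], start, timedelta(minutes=15))
```

Storing ticks instead of full timestamps keeps the series compact, and the same series can be reinterpreted by changing only the start time or the granularity.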

II. Transformation and segmentation functions

  • Math: Mean, variance, skew, correlations, PAA, SAX, covariance matrix, Graphical Gaussian Model, etc.
  • Statistical tests: Augmented Dickey-Fuller, Ljung-Box, Granger causality
  • Distance metrics: Dynamic Time Warping, Damerau-Levenshtein, Longest Common Subsequence, Jaro-Winkler
  • Time series reconciliation: Hungarian algorithm, Earth mover's distance
  • Change point detection: CUSUM, Bayesian, Gaussian
  • Segmentation: Window, Record-based, Burst-based, Anchor, Regression
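
Two of the items above, window segmentation and Dynamic Time Warping, can be sketched in a few lines of plain Python. This is an illustration of the general techniques only. The function names are hypothetical, and the sketch makes no claim about how the library implements or exposes these operations.

```python
from math import inf

def sliding_windows(values, size, step=1):
    """Split a sequence into fixed-size windows, advancing by `step` each time."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) Dynamic Time Warping with |x - y| as local cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

series = [1.0, 2.0, 3.0, 4.0, 5.0]
windows = sliding_windows(series, size=3, step=2)

# A repeated sample does not increase the distance, because DTW can stretch time.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

Here `stretched` is zero while `shifted` is not, which is the property that makes DTW useful for comparing series recorded at slightly different speeds.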

III. Forecasting functions

  • ARIMA
  • Holt-Winters
  • BATS
  • Vector auto-regression
  • Anomaly detection
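
As a rough illustration of the exponential smoothing family that Holt-Winters belongs to, the sketch below implements Holt's linear trend method, a simpler relative that tracks level and trend but has no seasonal component. The function name and default parameters are assumptions chosen for the example. This is not the library's forecasting API.

```python
def holt_linear_forecast(values, alpha=0.5, beta=0.5, horizon=3):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    alpha smooths the level and beta smooths the trend; both lie in (0, 1].
    Illustrative only: no seasonality, unlike full Holt-Winters.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations to initialise the trend")
    level = values[0]
    trend = values[1] - values[0]
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

# A perfectly linear series is extrapolated along the same line.
forecast = holt_linear_forecast([10.0, 12.0, 14.0, 16.0])
```

On noisy data, lower values of `alpha` and `beta` give smoother but slower-reacting forecasts.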

IV. Joins

  • A complete suite of temporal joins, including inner, outer, left-outer, right-outer, left-inner, and right-inner
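
The difference between the basic join types can be shown with a minimal plain-Python sketch that aligns two series on their timestamps. It assumes each series is a list of `(timestamp, value)` pairs with unique timestamps, and the function names are hypothetical. The left-inner and right-inner variants are not sketched because their semantics are not described here.

```python
def inner_join(left, right):
    """Keep only the timestamps that appear in both series."""
    right_by_ts = dict(right)
    return [(ts, lv, right_by_ts[ts]) for ts, lv in left if ts in right_by_ts]

def left_outer_join(left, right, fill=None):
    """Keep every left timestamp; use `fill` where the right series has no value."""
    right_by_ts = dict(right)
    return [(ts, lv, right_by_ts.get(ts, fill)) for ts, lv in left]

def full_outer_join(left, right, fill=None):
    """Keep the union of timestamps from both series, in ascending order."""
    left_by_ts, right_by_ts = dict(left), dict(right)
    all_ts = sorted(set(left_by_ts) | set(right_by_ts))
    return [(ts, left_by_ts.get(ts, fill), right_by_ts.get(ts, fill)) for ts in all_ts]

temperature = [(1, 20.5), (2, 21.0), (4, 21.7)]
humidity = [(2, 40), (3, 42), (4, 45)]

both = inner_join(temperature, humidity)
left_only_gaps = left_outer_join(temperature, humidity)
everything = full_outer_join(temperature, humidity)
```

A right-outer join is the same as a left-outer join with the arguments swapped. Handling duplicate timestamps would need a different data structure than the dictionaries used in this sketch.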

V. SQL extensions

VI. Spark machine learning

  • Sequence mining
  • Time series clustering: K-means, K-shape, Motif-based, Cluster drift detection
  • Data connectors for feature engineering that provide Spark DataFrame iterators to TensorFlow and scikit-learn

For the full list of functions and how to get started, please refer to the documentation.

Learn more about data lakes in the IBM Cloud

If you would like to know more about time series use cases on IBM Cloud, please reach out to Kiran Guduguntla or Josh Rosenkranz.
