We are excited to announce the availability of Time Series Libraries in Watson Studio Spark Environments starting today (October 8, 2020).

This library, developed by IBM Research, offers a full set of time series functionality that is not available in competing offerings. It joins our IBM Research Assets, Geospatial functionality, Data Skipping, and Parquet Encryption libraries as a fully supported feature of Watson Studio Spark Environments.

The time series library allows users to perform various key operations on time series data, including construction of a collection of time series, imputation functions (like segmentation), transformers, reducers, joins, and machine learning functions (such as forecasting, clustering, and discriminatory sequence mining). The library supports various time series types, including numeric, categorical, and arrays.

Examples of time series data include the following:

  • Stock share prices and trading volumes
  • Clickstream data
  • Electrocardiogram (ECG) data
  • Temperature or seismographic data
  • Network performance measurements
  • Network logs
  • Electricity usage as recorded by a smart meter and reported via an Internet of Things data feed

Key features of the Time Series Libraries in Watson Studio Spark Environments

I. Data model

  • A core data model for univariate and multivariate time series  
  • Time Reference Systems for handling different timestamp representations
  • Support for aperiodic, duplicate, and out-of-order timestamps
  • Spark RDD and DataFrame extensions for time series
  • Numeric and categorical time series
  • Lossless and lossy compression
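
As a rough illustration of what a time reference system does, the sketch below maps wall-clock timestamps to integer ticks measured from a start time at a chosen granularity. The function names `to_ticks` and `from_ticks` are hypothetical and use only the Python standard library; they are not the library's API, which is described in the documentation.

```python
from datetime import datetime, timedelta, timezone

def to_ticks(timestamps, start, granularity):
    """Map datetimes to integer tick offsets from `start`, in units of `granularity`."""
    # timedelta // timedelta yields an int (floor division).
    return [(ts - start) // granularity for ts in timestamps]

def from_ticks(ticks, start, granularity):
    """Map integer ticks back to datetimes (the start of each tick)."""
    return [start + t * granularity for t in ticks]

# Hypothetical sample: three readings taken 0, 90, and 150 seconds after the start.
start = datetime(2020, 10, 8, tzinfo=timezone.utc)
readings = [start + timedelta(seconds=s) for s in (0, 90, 150)]

minute_ticks = to_ticks(readings, start, timedelta(minutes=1))
second_ticks = to_ticks(readings, start, timedelta(seconds=1))
```

The same readings can be viewed at minute or second granularity without changing the stored values, which is the point of separating timestamps from their representation.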

II. Transformation and segmentation functions

  • Math: Mean, variance, skew, correlations, PAA, SAX, covariance matrix, Graphical Gaussian Model, etc.
  • Statistical tests: Augmented Dickey-Fuller, Ljung-Box, Granger causality
  • Distance metrics: Dynamic Time Warping, Damerau-Levenshtein, Longest Common Subsequence, Jaro-Winkler
  • Time series reconciliation: Hungarian algorithm, Earth mover's distance
  • Change point detection: CUSUM, Bayesian, Gaussian
  • Segmentation: Window, Record-based, Burst-based, Anchor, Regression
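
To make two of the items above concrete, here is a minimal pure-Python sketch of window segmentation and of PAA (piecewise aggregate approximation, which replaces each equal-width segment with its mean). The helper names `sliding_windows` and `paa` are hypothetical and are not the library's functions; the library's own versions run on Spark and are covered in the documentation.

```python
def sliding_windows(values, size, step=1):
    """Split a series into overlapping windows of `size` values, advancing by `step`."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]

def paa(values, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    n = len(values)
    if segments <= 0 or n % segments != 0:
        # Assumption for this sketch: the series length divides evenly.
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    return [sum(values[i:i + width]) / width for i in range(0, n, width)]

windows = sliding_windows([1, 2, 3, 4, 5], size=3, step=2)
reduced = paa([1, 3, 5, 7], segments=2)
```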

III. Forecasting functions

  • ARIMA
  • Holt-Winters
  • BATS
  • Vector auto-regression
  • Anomaly detection
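
For readers new to these models, the sketch below shows simple exponential smoothing, the simplest member of the family that Holt-Winters extends with trend and seasonal terms. It is not Holt-Winters itself and not the library's implementation; the function name `ses_forecast` and the sample values are assumptions made for illustration.

```python
def ses_forecast(values, alpha):
    """One-step-ahead forecast by simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * x + (1 - alpha) * level for each later observation.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical readings; a larger alpha weights recent observations more heavily.
forecast = ses_forecast([0, 10], alpha=0.5)
```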

IV. Joins

  • A complete suite of temporal joins, including inner, outer, left-outer, right-outer, left-inner, and right-inner
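
As a simplified picture of a temporal join, the sketch below aligns two series on matching timestamps, with each series held as a list of (timestamp, value) pairs. The function names and sample data are hypothetical, and the sketch assumes timestamps are unique within the right-hand series; it does not reflect how the library implements joins on Spark.

```python
def inner_join(left, right):
    """Keep only timestamps present in both series."""
    right_by_ts = dict(right)  # assumes unique timestamps; duplicates keep the last value
    return [(ts, lv, right_by_ts[ts]) for ts, lv in left if ts in right_by_ts]

def left_outer_join(left, right, fill=None):
    """Keep every left timestamp, filling missing right values with `fill`."""
    right_by_ts = dict(right)
    return [(ts, lv, right_by_ts.get(ts, fill)) for ts, lv in left]

# Hypothetical series: integer timestamps with temperature and humidity readings.
temperature = [(1, 20.5), (2, 21.0), (3, 21.4)]
humidity = [(2, 0.41), (3, 0.43), (4, 0.40)]

both = inner_join(temperature, humidity)
all_left = left_outer_join(temperature, humidity)
```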

V. SQL extensions

VI. Spark machine learning

  • Sequence mining
  • Time series clustering: K-means, K-shape, Motif-based, Cluster drift detection
  • Data connectors for feature engineering that provide Spark DataFrame iterators to TensorFlow and scikit-learn

For the full list of functions and how to get started, please refer to the documentation.

Learn more about data lakes in the IBM Cloud

If you would like to know more about time series use cases on IBM Cloud, please reach out to Kiran Guduguntla or Josh Rosenkranz.
