Important:

IBM Cloud Pak® for Data Version 4.7 will reach end of support (EOS) on 31 July 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.7 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

Spark libraries

You can extend what your Spark applications can do by using the libraries that are available in Analytics Engine powered by Apache Spark instances.

Analytics Engine powered by Apache Spark provides these libraries:

  • The data skipping library can significantly boost the performance of SQL queries by skipping irrelevant data objects or files, based on summary metadata associated with each object. See Using the data skipping libraries.
  • The time series library allows you to perform various key operations on time series data, including segmentation, forecasting, joins, transforms, and reducers. See Time series analysis.
  • The spatio-temporal library expands your data science analysis to include location analytics by gathering, manipulating, and displaying imagery, GPS, satellite photography, and historical data. See Using the geospatio-temporal library.
  • Parquet modular encryption protects sensitive information in Parquet files. See Parquet encryption.
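
To make the idea behind data skipping concrete, the following is a minimal sketch in plain Python. It does not use the data skipping library's API. The `ObjectSummary` class, the `objects_to_scan` function, and the file names are hypothetical names introduced only for illustration, and the sketch assumes that the summary metadata is a simple minimum and maximum value per object for one column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectSummary:
    """Hypothetical summary metadata: min and max of one column per object."""
    path: str
    min_value: float
    max_value: float


def objects_to_scan(summaries, lower, upper):
    """Return the paths whose [min, max] range overlaps the query range."""
    return [
        s.path
        for s in summaries
        if s.max_value >= lower and s.min_value <= upper
    ]


# Illustrative metadata for three objects (values are made up).
summaries = [
    ObjectSummary("part-0.parquet", 0.0, 9.9),
    ObjectSummary("part-1.parquet", 10.0, 19.9),
    ObjectSummary("part-2.parquet", 20.0, 29.9),
]

# A filter such as "value BETWEEN 12 AND 15" can only match rows in part-1,
# so the other two objects never need to be read.
selected = objects_to_scan(summaries, 12.0, 15.0)
print(selected)
```

In this sketch, only the objects whose value range can overlap the filter are read. The same principle lets a query engine avoid opening objects that cannot contain matching rows.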
