Migrating to a new version of Apache Spark
IBM® Z Platform for Apache Spark (FMID HSPK130) is built on Apache Spark. Different service (PTF) levels of the product might provide different versions of Apache Spark. Perform the following steps if you are migrating from one version of Apache Spark to another.
Before you begin
The RELEASE file identifies the exact build level of the product; for example:
IBM Platform for Apache Spark - Spark, Version 3.2.4 build for Hadoop 3.3.5
Built with Java JRE 1.8.0 IBM ZOS build 8.0.8.25 - pmz6480sr8fp25-20240328_01(SR8 FP25)
Built from Git zos_Spark_3.2.4.5 (revision dd2218dd09818049c3ee64f0a67ecc6d6bba50b2)
Built via Jenkins job zAnalytics/IzODA/zSpark/zos_Spark_3.2.4.5, Build#2
Build flags: -Phive -Phive-thriftserver -Phadoop-3.2
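To check which level you are running before you plan a migration, you can read the Spark and Java lines out of your own RELEASE file. The following is a minimal sketch; the sample file content mirrors the lines shown above, and in a real installation you would point RELEASE_FILE at the RELEASE file in your Spark installation directory instead of creating one.

```shell
# Create a stand-in RELEASE file with the sample contents shown above.
# In practice, set RELEASE_FILE to the RELEASE file in your installation.
RELEASE_FILE=$(mktemp)
cat > "$RELEASE_FILE" <<'EOF'
IBM Platform for Apache Spark - Spark, Version 3.2.4 build for Hadoop 3.3.5
Built with Java JRE 1.8.0 IBM ZOS build 8.0.8.25
EOF

# Extract the Spark version and Java build lines.
grep -E 'Version|Java' "$RELEASE_FILE"

rm -f "$RELEASE_FILE"
```

The same `grep` pattern works against any level's RELEASE file, which makes it easy to compare your current level with the level you are migrating to.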
Before installing the new version of Apache Spark
Complete the following steps to understand the impact that migrating to a newer version of Apache Spark has on your applications and to update your Java™ level, if needed.
- Review the new functionality in the new version of Apache Spark and the changes to the Spark APIs to determine any changes that you might need to make to your applications before migration. Use the information at the following links to learn about the changes. Be sure to consider the changes for each intermediate Apache Spark version. For example, if you are migrating from Apache Spark 3.2.0 to 3.5.0, you must consider the changes for Apache Spark 3.3.0 and 3.4.0 as well as those for 3.5.0.
- For high-level information about new features; changes, removals, and deprecations made to the Apache Spark APIs; performance improvements; and known issues, see the following links.
- If your previous Apache Spark version is 3.2.0, start here:
- The Spark SQL and Spark ML projects have additional migration changes for each version of Apache Spark. See the following resources for details.
- The following Spark projects have no specific migration steps. However, they might document new behaviors as of a given Spark version.
- Based on your findings from the information in step 1, update your applications as needed to work with the new Spark version.
- If you are using an older Java level than the one indicated in the RELEASE file, consider updating your Java level.
- Ensure that any other open source or third-party software in your environment that interacts with Spark supports the new version of Apache Spark. For example, some versions of Scala Workbench do not work with the new versions of Apache Spark.
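As part of checking your Java level (step 3 above), you can compare the level that `java -version` reports with the level named in the RELEASE file. This is a hedged sketch: the "1.8.0" value is taken from the RELEASE contents shown earlier, and you should substitute the level that your service (PTF) level actually requires.

```shell
# Required Java level, as named in the RELEASE file (example value).
required="1.8.0"

# Extract the quoted version string from `java -version` output,
# which Java writes to stderr; empty if java is not on the PATH.
current=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')

case "$current" in
  "$required"*) echo "Java level matches the RELEASE file" ;;
  *)            echo "Consider updating Java (found: ${current:-none})" ;;
esac
```

The prefix match (`"$required"*`) accepts any service refresh of the required level, such as 1.8.0_431, while flagging a different major level such as 11.0.2.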
Installing the new version of Apache Spark
Install IBM Z Platform for Apache Spark, FMID HSPK130 and its service updates (PTFs).
For installation guidelines, see Program Directory for IBM Z Platform for Apache Spark (GI13-5806-01 or later).
After installing the new version of Apache Spark
- Recompile applications that use any of the changed Spark APIs.
- Examine any new Apache Spark configuration options and make any necessary changes to your spark-defaults.conf and spark-env.sh configuration files.
For the current list of configuration options, see http://spark.apache.org/docs/3.2.4/configuration.html or http://spark.apache.org/docs/3.5.0/configuration.html. A new Apache Spark version might introduce new configuration options as well as deprecate existing ones.
Note: For the contents of the spark-defaults.conf and spark-env.sh configuration files, you can find IBM-supplied default values in spark-defaults.conf.template and spark-env.sh.template.
- If you use the spark-submit or spark-sql command line interface, you must either invoke them from a writable directory or change your configuration files. For more information, see Updating the Apache Spark configuration files.
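One common way to apply the note above is to seed writable configuration files from the IBM-supplied templates and then edit them for the new Spark version. The sketch below creates a stand-in SPARK_HOME with empty template files purely for illustration; in practice, SPARK_HOME is your Spark installation directory, the templates already exist there, and SPARK_CONF_DIR is the writable configuration directory you maintain.

```shell
# Stand-in installation directory with empty templates (illustration only);
# in a real installation, set SPARK_HOME to your Spark install path.
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/conf"
: > "$SPARK_HOME/conf/spark-defaults.conf.template"
: > "$SPARK_HOME/conf/spark-env.sh.template"

# Writable configuration directory (substitute your own path).
SPARK_CONF_DIR=$(mktemp -d)

# Seed the editable config files from the IBM-supplied templates.
cp "$SPARK_HOME/conf/spark-defaults.conf.template" "$SPARK_CONF_DIR/spark-defaults.conf"
cp "$SPARK_HOME/conf/spark-env.sh.template" "$SPARK_CONF_DIR/spark-env.sh"

ls "$SPARK_CONF_DIR"
```

Starting from the shipped templates after a migration makes it easier to spot options that the new version adds or deprecates, rather than carrying an old file forward unchanged.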