Spark settings for instance groups

Once you have set up the basic settings for your instance group, you can configure Spark settings for the instance group. The cluster management console includes a Spark tab for working with Spark settings; it is the only instance group component tab that is shown automatically (you add the other component tabs as needed).

The Spark tab also contains sections for other instance group information related to the Spark version, such as sections for consumers, resource groups and plans, containers, and data connectors.

The latest Spark version installed on your system is the version of Spark that your instance group uses. To add a new Spark version, see Adding Spark versions.

By default, Spark runs under the built-in IBM® JRE, in the $EGO_TOP/jre directory:
  • You can use any of the other supported Spark packages with your own JDK. If you do, note the following information about the JAVA_HOME value for your JDK:
    • For Spark 3.0.1, you can manually set the JAVA_HOME value to a non-default OpenJDK 11 location.
    • For Spark 3.3.1, you can set the JAVA_HOME value to a non-default OpenJDK 11 or OpenJDK 17 location.
    • If you want to run Spark 3.0.1 or 3.3.1 instance groups under one of these later JRE versions, follow the appropriate Spark configuration procedure in the following sections.
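For example, pointing an instance group at your own JDK amounts to setting JAVA_HOME to the JDK's home path. The following is a minimal sketch; the /opt/jdk-11 path is an assumption, so substitute the actual home directory of your JDK (the directory that contains bin/java):

```shell
# Hypothetical example: point the environment at your own OpenJDK 11.
# /opt/jdk-11 is an assumed path; use your real JDK home directory.
export JAVA_HOME=/opt/jdk-11
export PATH="$JAVA_HOME/bin:$PATH"
```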

To use OpenJDK 11 (for Spark 3.0.1 or 3.3.1 instance groups)

  1. Customize the JAVA_HOME configuration for a Spark 3.0.1 or 3.3.1 instance group to point to the home path for OpenJDK 11.
  2. To address a compatibility issue, modify the Spark configuration for this Spark 3.0.1 or 3.3.1 instance group:
    1. Click Workload > Instance Groups, and select the instance group that you created.
    2. In the instance group overview, click Manage > Configure.
    3. Click the Spark tab, then the Configuration button, and search for SPARK_MASTER_OPTS.
    4. For the SPARK_MASTER_OPTS parameter, enter:
      '--add-opens java.base/jdk.internal.ref=ALL-UNNAMED'
      If your installation is in a shared location, also add -Xshare:off to the SPARK_MASTER_OPTS parameter to disable the class data sharing feature. For example:
      '--add-opens java.base/jdk.internal.ref=ALL-UNNAMED -Xshare:off'
    5. Click Close.
    6. Click the Modify Instance Group button, start the instance group, and then run your application.
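For a shared-location installation, the steps above produce a SPARK_MASTER_OPTS value that carries both options. A sketch of the combined value (the variable assignment form is illustrative; in practice you enter the quoted value in the console's Configuration dialog):

```shell
# Sketch: combined SPARK_MASTER_OPTS value for OpenJDK 11 with a
# shared-location installation, as described in the steps above.
SPARK_MASTER_OPTS='--add-opens java.base/jdk.internal.ref=ALL-UNNAMED -Xshare:off'
echo "$SPARK_MASTER_OPTS"
```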

To use OpenJDK 17 (for Spark 3.3.1 instance groups)

  1. Customize the JAVA_HOME configuration for a Spark 3.3.1 instance group to point to the home path for OpenJDK 17.
  2. To address a compatibility issue, modify the Spark configuration for this Spark 3.3.1 instance group:
    1. Click Workload > Instance Groups, and select the instance group that you created.
    2. In the instance group overview, click Manage > Configure.
    3. Click the Spark tab, then the Configuration button, and search for spark.driver.extraJavaOptions.
    4. For the spark.driver.extraJavaOptions parameter, enter this value:
      '--add-opens=java.base/java.lang=ALL-UNNAMED 
      --add-opens=java.base/java.lang.invoke=ALL-UNNAMED 
      --add-opens=java.base/java.lang.reflect=ALL-UNNAMED 
      --add-opens=java.base/java.io=ALL-UNNAMED 
      --add-opens=java.base/java.net=ALL-UNNAMED 
      --add-opens=java.base/java.nio=ALL-UNNAMED 
      --add-opens=java.base/java.util=ALL-UNNAMED 
      --add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
      --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
      --add-opens=java.base/sun.nio.ch=ALL-UNNAMED 
      --add-opens=java.base/sun.nio.cs=ALL-UNNAMED 
      --add-opens=java.base/sun.security.action=ALL-UNNAMED 
      --add-opens=java.base/sun.util.calendar=ALL-UNNAMED 
      --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'
    5. Search for spark.executor.extraJavaOptions and enter the same value as you provided for the spark.driver.extraJavaOptions parameter.
    6. Click Close.
    7. Click the Modify Instance Group button, start the instance group, and then run your application.
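The long spark.driver.extraJavaOptions value in step 4 is simply one --add-opens flag per package. As a sketch (the package list is copied from the steps above), you could assemble and check the value in a shell before pasting it into the console, which helps avoid typos in the fourteen entries:

```shell
# Build the spark.driver.extraJavaOptions value for OpenJDK 17 from the
# list of packages given in the procedure above.
PACKAGES="java.base/java.lang java.base/java.lang.invoke java.base/java.lang.reflect \
java.base/java.io java.base/java.net java.base/java.nio java.base/java.util \
java.base/java.util.concurrent java.base/java.util.concurrent.atomic \
java.base/sun.nio.ch java.base/sun.nio.cs java.base/sun.security.action \
java.base/sun.util.calendar java.security.jgss/sun.security.krb5"

OPTS=""
for p in $PACKAGES; do
  OPTS="$OPTS --add-opens=$p=ALL-UNNAMED"
done
OPTS="${OPTS# }"   # trim the leading space

echo "$OPTS"
```

The same value is then reused for spark.executor.extraJavaOptions in step 5.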