Known issues for Analytics Engine powered by Apache Spark

The following known issues and limitations apply to Analytics Engine powered by Apache Spark.

Analytics Engine powered by Apache Spark pods are failing after OCP upgrade

Analytics Engine pods fail after an OCP upgrade from version 4.14.15 to 4.14.46.

You must clean up the failed pods by deleting the corresponding Helm release:

# List the Helm releases to find the release that corresponds to the failed kernel
helm ls
# Delete the release that matches the kernel ID
helm delete <kernel-id>

Unable to view the custom configurations set in the Analytics Engine CR

Applies to: 5.1.2

When you add custom configuration to Analytics Engine powered by Apache Spark after installation, you must restart the following pods (a restart sketch follows this list):

  • spark-hb-control-plane
  • spark-hb-ui
  • spark-hb-deployer-agent
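
For example, assuming these pods are managed by deployments whose names match the pod name prefixes (verify this first in your instance namespace), a minimal restart sketch looks like the following; the namespace placeholder is yours to fill in:

# Restart the Spark control-plane components. The deployment names are
# an assumption based on the pod name prefixes; verify them first with:
# oc get deployments -n <cpd-instance-namespace>
oc rollout restart deployment/spark-hb-control-plane -n <cpd-instance-namespace>
oc rollout restart deployment/spark-hb-ui -n <cpd-instance-namespace>
oc rollout restart deployment/spark-hb-deployer-agent -n <cpd-instance-namespace>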

Spark notebooks fail to recognize CA Certificates

Applies to: 5.1.2

When you inject a custom CA certificate into the Cloud Pak for Data namespace, the certificate is recognized in Python 3.11 notebooks but not in Spark notebooks. Even when the certificate carries the correct label (cpd-platform-ca-certs=true), Spark notebooks throw an SSLCertVerificationError because the certificate is missing from the Spark runtime.
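
To check whether the certificate carries the expected label, you can list the labeled secrets. This is a diagnostic sketch; it assumes the certificate was injected as a labeled secret in the instance namespace:

# List secrets that carry the platform CA label; the namespace
# placeholder refers to your Cloud Pak for Data instance namespace
oc get secrets -l cpd-platform-ca-certs=true -n <cpd-instance-namespace>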

Spark 3.4-R43 notebook kernel does not support FIPS 140-2 compliant encryption

Applies to: 5.1

The Spark 3.4-R43 notebook kernel does not support FIPS 140-2 (Federal Information Processing Standards) compliant encryption.

Delay in starting Spark application

Applies to: 5.1

When using observability agents such as Instana or Dynatrace in the Kubernetes cluster, the agent processes running within the container might collect metrics, traces, and logs from both the application container(s) and the pod environment. This additional processing can occasionally lead to a delay in the startup time of Spark applications, particularly affecting the Spark driver process. To resolve this, allocate one additional CPU core to the Spark driver process.
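
One way to do this is to set the spark.driver.cores property when you submit the application. The sketch below is illustrative: the endpoint path and payload shape are assumptions, so check the service endpoints and API documentation for your instance; the host, token, instance ID, and application path are placeholders.

# Submit a Spark application with 2 driver cores (one more than the
# default of 1). URL, token, and paths are placeholders; the payload
# shape is an assumption to be checked against your instance's API docs.
curl -k -X POST \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  "https://<cpd-host>/v2/spark/v3/instances/<instance-id>/spark/applications" \
  -d '{
    "application_details": {
      "application": "/myapp/app.py",
      "conf": {
        "spark.driver.cores": "2"
      }
    }
  }'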

Downtime when upgrading from a previous Cloud Pak for Data version to 5.0.x

Applies to: 5.1

Upgrading Analytics Engine powered by Apache Spark from a previous Cloud Pak for Data version to 5.0.x causes downtime during the upgrade process.

Job or kernel fails with PVC not found for tethered namespaces

Applies to: 5.1

When a Spark instance is created in a non-dataplane tethered namespace, Spark jobs and kernels fail with a message similar to the following:

{
  "type": "server_error",
  "code": "cluster_creation_error",
  "message": "Could not complete the request. Reason - FailedScheduling. Detailed error - 0/9 nodes are available: 9 persistentvolumeclaim \"volumes-home-vol-pvc\" not found., From - ibm-cpd-scheduler"
}
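
To confirm the diagnosis, you can check whether the home-volume PVC named in the error exists in the namespace that the instance was created from. This is a diagnostic sketch; the namespace placeholder is yours to fill in:

# Check for the PVC named in the scheduling error
oc get pvc volumes-home-vol-pvc -n <tethered-namespace>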

Workaround: Create the service instance by using a dataplane tethered namespace.

Timeout message when submitting a Spark job

Applies to: Spark applications using V2 or V3 APIs only.

The Spark service expects you to create a SparkContext or SparkSession at the beginning of your Spark application code. When you submit the Spark job through the REST API, the API returns the Spark application ID as soon as the SparkContext is successfully created.

However, if you don't create a SparkContext or SparkSession:

  • At the beginning of the Spark application
  • At all, for example because your application is plain Python, Scala, or R

the REST API waits for your application to complete, which can lead to a REST API timeout. The reason is that the Spark service expects the Spark application to have started, which is not the case for a plain Python, Scala, or R application. The application is still listed in the Jobs UI even though the REST API timed out.
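
As a minimal sketch, a PySpark application that avoids the timeout creates the SparkSession as its very first step. The file name and path are illustrative:

# Write a minimal PySpark application that creates the SparkSession
# up front (illustrative file name; place it where your instance can read it)
cat > app.py <<'EOF'
from pyspark.sql import SparkSession

# Creating the SparkSession first lets the REST API return the
# application ID right away instead of waiting for the job to finish.
spark = SparkSession.builder.appName("quickstart").getOrCreate()

# ... application logic goes here ...

spark.stop()
EOF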

Parent topic: Service issues