Important:

IBM Cloud Pak® for Data Version 4.6 will reach end of support (EOS) on 31 July 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.6 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

Adding self-signed certificates in Analytics Engine Powered by Apache Spark

You can add your own self-signed certificates or CA certificates that are owned by your organization to the Spark truststore. You add certificates so that the Spark runtime can establish secure connections to your resources, such as web servers, IBM Cloud Object Storage, and databases.

You must be a project administrator to add self-signed certificates to the Spark truststore.

To add self-signed certificates:

  1. Fetch the internal certificates. Run the following commands to copy the internal certificates to local files:

    oc get secret internal-tls -n ${PROJECT_CPD_INSTANCE} -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
    oc get secret internal-tls -n ${PROJECT_CPD_INSTANCE} -o jsonpath='{.data.tls\.crt}' | base64 -d > tls.crt
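    Optionally, you can inspect the fetched certificate before you append anything to it. This sketch assumes that the openssl CLI is available on your workstation:

```shell
# Optional sanity check: print the subject, issuer, and validity
# period of the internal certificate that was just fetched
openssl x509 -in ca.crt -noout -subject -issuer -dates
```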
    
  2. Append the certificates that you want to use when establishing secure connections from your Spark notebooks or Spark applications to the local ca.crt file. For example, if your external endpoint certificate is in ext.crt, append it to ca.crt as follows:

    cat ext.crt >> ca.crt
    

    Ensure that the contents of the new ca.crt file look as follows:

    -----BEGIN CERTIFICATE-----
    ...
    existing cert
    ...
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    ...
    external endpoint cert
    ...
    -----END CERTIFICATE-----
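    As a quick sanity check, you can count the certificate blocks in the combined file. The following sketch uses placeholder files in a scratch directory so that your real certificates are untouched:

```shell
# Work in a scratch directory with placeholder certificate files;
# the contents below stand in for real PEM data
cd "$(mktemp -d)"
printf -- '-----BEGIN CERTIFICATE-----\nexisting cert\n-----END CERTIFICATE-----\n' > ca.crt
printf -- '-----BEGIN CERTIFICATE-----\nexternal endpoint cert\n-----END CERTIFICATE-----\n' > ext.crt

# Same append step as above
cat ext.crt >> ca.crt

# One BEGIN line per certificate: expect 2 here
grep -c -- '-----BEGIN CERTIFICATE-----' ca.crt
```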
    
  3. Create a Kubernetes secret. When the ca.crt file is ready, create a secret in the OpenShift project where the service is installed. The following example creates a secret named new-certificates-chain:

    # create secret with new certificate chain
    $ oc create secret generic new-certificates-chain --from-file=ca.crt --from-file=tls.crt -n ${PROJECT_CPD_INSTANCE}
    

    This command confirms that secret/new-certificates-chain was created.
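    If you want to confirm that the secret holds the expected bytes, you can decode it and compare it with the local file. This is an optional sketch; it assumes that you are still logged in to the cluster:

```shell
# Decode the ca.crt key of the new secret and diff it against the
# local file; no output from diff means the contents match
oc get secret new-certificates-chain -n ${PROJECT_CPD_INSTANCE} \
  -o jsonpath='{.data.ca\.crt}' | base64 -d | diff - ca.crt
```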

  4. Determine the image name that you must set in the job configuration to use for pod creation:

    # Find the image to be used for the pod creation
    $ oc get deploy spark-hb-create-trust-store -o jsonpath="{..image}" -n ${PROJECT_CPD_INSTANCE}
    

    This command returns output similar to the following:

    cp.icr.io/cp/cpd/spark-hb-truststore-util@sha256:bb1ac4bba2a201995f07de7995d1055cd571a865b60bc7fad8cbb7879f41150d
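    Because the next step expects the image reference in place of the REPLACE_WITH_IMAGE placeholder, you can capture it in a shell variable instead of copying it by hand:

```shell
# Store the image reference for use in the pod definition in the next step
REPLACE_WITH_IMAGE=$(oc get deploy spark-hb-create-trust-store \
  -o jsonpath="{..image}" -n ${PROJECT_CPD_INSTANCE})
echo "${REPLACE_WITH_IMAGE}"
```

    Note that the occurrence of $REPLACE_WITH_IMAGE inside the single-quoted --overrides string in the next step is not expanded by the shell, so you must still replace that occurrence manually.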
    
  5. Create a Kubernetes pod to update the certificates. Run the following command to deploy the pod, which updates the truststores that are used by the Analytics Engine Powered by Apache Spark service. Before you run the command, replace REPLACE_WITH_IMAGE with the image name that was returned in the previous step.

    # replace REPLACE_WITH_IMAGE with image name
    $ oc run spark-hb-update-certificates -n ${PROJECT_CPD_INSTANCE} --image $REPLACE_WITH_IMAGE --restart OnFailure --generator=run-pod/v1 --overrides '{"apiVersion":"v1","kind":"Pod","metadata":{"name":"spark-hb-update-certificates","labels":{"app":"spark-hb-update-certificates","run":"spark-hb-update-certificates"}},"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"beta.kubernetes.io/arch","operator":"In","values":["amd64"]}]}]}},"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchExpressions":[{"key":"run","operator":"In","values":["spark-hb-update-certificates"]}]},"topologyKey":"kubernetes.io/hostname"}]}},"containers":[{"args":["bash /opt/ibm/entrypoint/create-trust-store-and-secret.sh changeit spark-hb-java-trust-store spark-hb-os-trust-store /opt/hb/icp4d-certs"],"command":["/bin/sh","-c"],"image":"$REPLACE_WITH_IMAGE","imagePullPolicy":"Always","name":"spark-hb-update-certificates-container","resources":{"limits":{"cpu":"100m","memory":"128Mi"},"requests":{"cpu":"100m","memory":"128Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"privileged":false,"readOnlyRootFilesystem":false,"runAsNonRoot":true,"runAsUser":1000320999},"volumeMounts":[{"mountPath":"/opt/hb/icp4d-certs","name":"icp4d-certs","readOnly":true},{"mountPath":"/opt/ibm/entrypoint/","name":"spark-hb-create-trust-store-secret-script"}]}],"restartPolicy":"OnFailure","serviceAccount":"zen-editor-sa","serviceAccountName":"zen-editor-sa","terminationGracePeriodSeconds":30,"volumes":[{"name":"icp4d-certs","secret":{"defaultMode":420,"secretName":"new-certificates-chain"}},{"configMap":{"defaultMode":420,"items":[{"key":"create-trust-store-and-secret.sh","path":"create-trust-store-and-secret.sh"}],"name":"spark-hb-create-trust-store-secret-script"},"name":"spark-hb-create-trust-store-secret-script"}],"tolerations":[{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300}]}}'
    
  6. Monitor the pod. The task takes about a minute to complete:

    $ oc get pod spark-hb-update-certificates -n ${PROJECT_CPD_INSTANCE}
    
  7. When the status of the spark-hb-update-certificates pod is Running, check the logs:

    oc logs -f spark-hb-update-certificates -n ${PROJECT_CPD_INSTANCE}
    

    Example of the log output:

    count 3
    secret "spark-hb-java-trust-store" deleted
    exit_code : 0
    count 3
    secret "spark-hb-os-trust-store" deleted
    exit_code : 0
    count 3
    secret/spark-hb-java-trust-store created
    exit_code : 0
    count 3
    secret/spark-hb-os-trust-store created
    exit_code : 0 
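    After the logs show that both truststore secrets were recreated, you can optionally delete the completed helper pod:

```shell
# Optional cleanup: remove the helper pod once it has finished
oc delete pod spark-hb-update-certificates -n ${PROJECT_CPD_INSTANCE}
```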
    

Parent topic: Administering Analytics Engine Powered by Apache Spark