Cannot submit profiling jobs or preview assets (IBM watsonx.data intelligence)
If profiling an asset in a catalog or a project fails, you might encounter the error: FAILED TO SUBMIT JOB: Job failed to start.
In addition, if you are trying to preview an asset when data protection rules are applied, you might encounter the error: An error occurred attempting to preview this asset.
These errors might be caused by an issue with the instance of IBM Analytics Engine powered by Apache Spark that is used for profiling and previewing. To check the Analytics Engine instance and fix the issue if necessary, complete the following steps:
1. Log in to your Red Hat OpenShift cluster as a cluster administrator:

   ```
   oc login <OpenShift_URL:port>
   ```
2. Log in to the profiling pod:

   a. Identify the profiling pod by running the following command:

      ```
      oc get pod | grep wdp-profiling
      ```

      This command can return a list of pods. Select the pod where the name prefix corresponds exactly to `wdp-profiling`.

   b. Log in to this pod by using the following command:

      ```
      oc rsh <podName>
      ```

      Example:

      ```
      oc rsh wdp-profiling-d989b575b-v85wx
      ```
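   If several pods match and you want to script the selection, the following is a minimal sketch. It assumes the standard Kubernetes `<deployment>-<replicaset-hash>-<pod-hash>` pod naming, so the pattern and the `head -n 1` pick are assumptions to adapt to your environment:

   ```
   # Pick the first pod whose name matches the wdp-profiling deployment's pod naming pattern.
   POD=$(oc get pods -o name | grep -E '^pod/wdp-profiling-[a-z0-9]+-[a-z0-9]{5}$' | head -n 1)
   oc rsh "$POD"
   ```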
3. Complete these checks:

   - If vaults are enabled in your environment, go to the `/etc/.secrets` directory and make sure that the INSTANCE_API_KEY and ANALYTICS_ENGINE_INSTANCE_ID secrets files are available. To obtain the INSTANCE_API_KEY and ANALYTICS_ENGINE_INSTANCE_ID values, run the following commands:

     ```
     cat /etc/.secrets/INSTANCE_API_KEY
     cat /etc/.secrets/ANALYTICS_ENGINE_INSTANCE_ID
     ```

   - If vaults are not enabled in your environment, check the following environment variables and make sure that they have values set:

     - INSTANCE_API_KEY
     - INSTANCE_USER
     - ANALYTICS_ENGINE_INSTANCE_ID

     Use the following command:

     ```
     env | grep -i instance
     ```

     The output should look similar to this example:

     ```
     INSTANCE_API_KEY=UJQzjcf6HJcR6IAUilR3fYu3qbvPoCnVX6mc1eN6
     INSTANCE_USER=__internal_profiler__
     ANALYTICS_ENGINE_INSTANCE_ID=1654844464492037
     ```

   All of these environment variables or secrets files must be available. If any of them is missing, there is a serious problem with the installation.
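   A minimal sketch that covers both cases from inside the pod, checking each value first as a secrets file and then as an environment variable (the loop and file test are illustrative, not part of the product):

   ```
   # Report whether each required value is available as a secrets file or an environment variable.
   for name in INSTANCE_API_KEY INSTANCE_USER ANALYTICS_ENGINE_INSTANCE_ID; do
     if [ -s "/etc/.secrets/$name" ] || [ -n "$(printenv "$name")" ]; then
       echo "$name: available"
     else
       echo "$name: MISSING"
     fi
   done
   ```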
4. Generate an API token for the instance user:

   ```
   curl -k -sS -X POST -H 'Content-Type: application/json' -d '{"username":"<INSTANCE_USER>","api_key":"<INSTANCE_API_KEY>"}' https://<CPDHost>/icp4d-api/v1/authorize
   ```

   This command returns a JSON object with the token.
5. Look for the keyword `token` in the returned JSON. Copy the value of that property and export it to a variable TOKEN by using the command `export TOKEN=<token>`.
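   If the jq utility is available where you run these commands (an assumption; it might not be installed in the pod), steps 4 and 5 can be combined. The sketch below also assumes that the INSTANCE_USER and INSTANCE_API_KEY values from step 3 are set as environment variables:

   ```
   # Request a token and export it in one step; the "token" field name comes from step 5.
   export TOKEN=$(curl -k -sS -X POST -H 'Content-Type: application/json' \
     -d '{"username":"'"$INSTANCE_USER"'","api_key":"'"$INSTANCE_API_KEY"'"}' \
     https://<CPDHost>/icp4d-api/v1/authorize | jq -r .token)
   ```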
6. Get instance details and check whether the instance is in `RUNNING` state:

   ```
   curl -k -sS -X GET https://<CPDHost>/zen-data-ui/v3/service_instances/<ANALYTICS_ENGINE_INSTANCE_ID>?include_service_status=true --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN"
   ```

   The response should look similar to this example:

   ```
   {"service_instance":{"addon_type":"spark","addon_version":"4.5.0","connection_info":{"History server endpoint":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark_history_server","Spark jobs V2 endpoint":"$HOST/ae/spark/v2/f32a7728-eb57-4300-a5b4-bb0f772b9964/v2/jobs","Spark jobs V3 endpoint":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark/applications","Spark kernel endpoint":"$HOST/v2/spark/ae/f32a7728-eb57-4300-a5b4-bb0f772b9964/jkg/api/kernels","View history server":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark_history_ui/"},"created_at":"2022-06-10T07:01:04.502057Z","display_name":"ProfHbIntrnl","id":"1654844464492037","instance_identifiers":null,"metadata":{"volumeName":"wkc::ProfStgIntrnl"},"misc_data":{},"namespace":"wkc","owner_uid":"1000331001","owner_username":"__internal_profiler__","parameters":{"file_server":{"start":true},"storageClass":"managed-nfs-storage","storageSize":"5Gi","volumeName":"wkc::ProfStgIntrnl"},"provision_status":"PROVISIONED","resources":{},"roles":["Admin"],"updated_at":"2022-07-22T06:10:16.092222Z","zen_service_instance_info":{"docker_registry_prefix":"icr.io/cpopen/cpfs"}},"services_status":"RUNNING"}
   ```

   The value of the keyword `services_status` should be `RUNNING`.

   However, if the GET call returns a `service_not_found` exception, the Analytics Engine instance is no longer available. In this case, continue with the following steps to fix the issue. Otherwise, skip the rest of the steps.
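   To read the status without scanning the full response, you can filter it, again assuming jq is available (the `services_status` field name is taken from the example above):

   ```
   # Print only the service status; expect RUNNING on a healthy instance.
   curl -k -sS -X GET "https://<CPDHost>/zen-data-ui/v3/service_instances/<ANALYTICS_ENGINE_INSTANCE_ID>?include_service_status=true" \
     --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" | jq -r .services_status
   ```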
7. List all service instances. If the token obtained in step 4 is still valid, you can use this token. If the token is expired, generate a new one and export it as described in that step.

   ```
   curl -k -sS -X GET https://<CPDHost>/zen-data-ui/v3/service_instances?include_service_status=true --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN"
   ```

   A JSON object is returned that lists the details for at most two instances: the service volume instance and the Analytics Engine instance. If no service instances are available, the JSON object is empty.
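   To summarize which instances exist, a sketch that assumes the list response wraps the entries in a `service_instances` array and that each entry exposes `addon_type` and `id` as in the step 6 example (the singular GET returns `service_instance`, so verify these keys against your actual response):

   ```
   # List the type and ID of each returned instance; expect one "volumes" and one "spark" entry.
   curl -k -sS -X GET "https://<CPDHost>/zen-data-ui/v3/service_instances?include_service_status=true" \
     --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" \
     | jq -r '.service_instances[] | "\(.addon_type)\t\(.id)"'
   ```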
8. If no service volume instance is found, create it by using the following cURL command. Replace values in the sample payload as appropriate:

   ```
   curl -k -sS -X POST https://<CPDHost>/zen-data-ui/v3/service_instances \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $TOKEN" \
     --header 'Content-Type: application/json' \
     -d '{"addon_type":"volumes","display_name":"<volumeNameOfYourChoice>","namespace":"${PROJECT_CPD_INST_OPERANDS}","addon_version":"-","create_arguments":{"resources":{},"parameters":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"description":"","metadata":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"owner_username":""},"pre_existing_owner":false,"transientFields":{}}'
   ```

   This command creates a service volume instance and returns its ID in the response. Make a note of the service volume instance ID.
9. If no Analytics Engine instance is found, create it by using the following cURL command. Replace values in the sample payload as appropriate. Use the same `<volumeNameOfYourChoice>` that you used when you created the service volume instance so that the Analytics Engine instance references that volume:

   ```
   curl -k -sS -X POST https://<CPDHost>/zen-data-ui/v3/service_instances \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $TOKEN" \
     --header 'Content-Type: application/json' \
     -d '{"addon_type":"spark","display_name":"<IAE_INSTANCE_NameOfYourChoice>","namespace":"${PROJECT_CPD_INST_OPERANDS}","addon_version":"<spark_version>","create_arguments":{"resources":{},"parameters":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"description":"","metadata":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"owner_username":""},"pre_existing_owner":false,"transientFields":{}}'
   ```

   This command creates an Analytics Engine instance and returns its ID in the response. Make a note of this ID.
10. Update the `wdp-profiling` and the `dp-transform` deployments to work with the new Analytics Engine instance that you created in the previous step.

    - If vaults are enabled in your environment, complete these steps:

      a. Navigate to the `/etc/.secrets` directory.
      b. Edit the ANALYTICS_ENGINE_INSTANCE_ID secrets file and update the value of the `iae_instance_id` key.
      c. Save your changes.

      To apply the changes, restart the `wdp-profiling` and `dp-transform` pods.
    - If vaults are not enabled in your environment, complete these steps:

      a. Encode the instance ID:

         ```
         echo -n '<instance_id>' | base64
         ```
      b. Edit the `wdp-profiling-iae-secrets` secret:

         ```
         oc edit secret wdp-profiling-iae-secrets
         ```
      c. Update the value of the `iae_instance_id` entry with the base64-encoded value:

         ```
         apiVersion: v1
         data:
           iae_instance_id: <base64-encoded value>
           iae_usermgmt_apikey: <apikey>
         kind: Secret
         ```

      d. Save your changes.
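      As a non-interactive alternative to steps b through d, you can patch the secret directly. This is a sketch using `oc patch`; it assumes the secret stores the ID under the `iae_instance_id` key shown above, so verify the key name first:

      ```
      # Patch the secret in place with the base64-encoded instance ID.
      oc patch secret wdp-profiling-iae-secrets \
        -p '{"data":{"iae_instance_id":"'"$(echo -n '<instance_id>' | base64)"'"}}'
      ```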
      e. Restart the `wdp-profiling` pod by running the following commands, replacing `<n>` with the required number of replicas:

         ```
         oc scale deployment wdp-profiling --replicas=0
         oc scale deployment wdp-profiling --replicas=<n>
         ```

      f. Restart the `dp-transform` pod by running the following commands, replacing `<n>` with the required number of replicas:

         ```
         oc scale deployment dp-transform --replicas=0
         oc scale deployment dp-transform --replicas=<n>
         ```
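      If your `oc` client supports `oc rollout restart` (available in recent OpenShift versions), you can restart both deployments without looking up the replica counts:

      ```
      # Restart both deployments while keeping their current replica counts.
      oc rollout restart deployment wdp-profiling
      oc rollout restart deployment dp-transform
      ```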
    The deployments are updated. After the pods are restarted, the new Analytics Engine instance is used for profiling and previewing.
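To confirm the fix, you can repeat the status check from step 6 against the ID of the new Analytics Engine instance that you noted in step 9; the value of `services_status` should be `RUNNING`. A sketch, again assuming jq is available:

```
# Verify that the new Analytics Engine instance is running.
curl -k -sS -X GET "https://<CPDHost>/zen-data-ui/v3/service_instances/<new_instance_id>?include_service_status=true" \
  --header 'Accept: application/json' \
  --header "Authorization: Bearer $TOKEN" | jq -r .services_status
```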