Cannot submit profile jobs or preview assets (IBM Knowledge Catalog)
If profiling an asset in a catalog or a project fails, you might encounter the following error: FAILED TO SUBMIT JOB: Job failed to start.
In addition, if you try to preview an asset to which data protection rules are applied, you might encounter the following error: An error occurred attempting to preview this asset.
These errors might be caused by an issue with the instance of IBM Analytics Engine powered by Apache Spark that is used for profiling and previewing. To check the Analytics Engine instance and fix the issue if necessary, complete the following steps:
1. Log in to your Red Hat OpenShift cluster as a cluster administrator:
   oc login <OpenShift_URL:port>
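   For example, with a hypothetical API endpoint (substitute your own cluster URL and administrator credentials):
   oc login https://api.mycluster.example.com:6443 -u <cluster_admin_user> -p <password>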
2. Log in to the profiling pod:
   a. Identify the profiling pod by running the following command:
      oc get pod | grep wdp-profiling
      This command can return a list of pods. Select the pod whose name prefix corresponds exactly to wdp-profiling.
   b. Log in to this pod by using the following command:
      oc rsh <podName>
      Example:
      oc rsh wdp-profiling-d989b575b-v85wx
3. Complete these checks:
   - If vaults are enabled in your environment, go to the /etc/.secrets directory and make sure that the INSTANCE_API_KEY and ANALYTICS_ENGINE_INSTANCE_ID secrets files are available. To obtain the INSTANCE_API_KEY and ANALYTICS_ENGINE_INSTANCE_ID values, run the following commands:
     cat /etc/.secrets/INSTANCE_API_KEY
     cat /etc/.secrets/ANALYTICS_ENGINE_INSTANCE_ID
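     To quickly confirm that both files exist, you can also list them directly. A minimal sketch that uses only the file names given above:
     ls -l /etc/.secrets/INSTANCE_API_KEY /etc/.secrets/ANALYTICS_ENGINE_INSTANCE_ID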
   - If vaults are not enabled in your environment, check the following environment variables and make sure that they have values set:
     - INSTANCE_API_KEY
     - INSTANCE_USER
     - ANALYTICS_ENGINE_INSTANCE_ID
     Use the following command:
     env | grep -i instance
     The output should look similar to this example:
     INSTANCE_API_KEY=UJQzjcf6HJcR6IAUilR3fYu3qbvPoCnVX6mc1eN6
     INSTANCE_USER=__internal_profiler__
     ANALYTICS_ENGINE_INSTANCE_ID=1654844464492037
   All of these environment variables or secrets files must be available. If any of them is missing, there is a serious problem with the installation. To check all three variables in one pass, see the sketch after this step.
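   A minimal sketch for checking the three variables at once, assuming a Bourne-compatible shell and the printenv utility inside the pod:
   for v in INSTANCE_API_KEY INSTANCE_USER ANALYTICS_ENGINE_INSTANCE_ID; do
     # Report whether each variable has a non-empty value
     if [ -z "$(printenv $v)" ]; then echo "$v: MISSING"; else echo "$v: set"; fi
   done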
4. Generate an API token for the instance user:
   curl -k -sS -X POST -H 'Content-Type: application/json' -d '{"username":"<INSTANCE_USER>","api_key":"<INSTANCE_API_KEY>"}' https://<CPDHost>/icp4d-api/v1/authorize
   This command returns a JSON object that contains the token.
5. Look for the keyword token in the returned JSON. Copy the value of that property and export it to a variable named TOKEN by using the following command:
   export TOKEN=<token>
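   If the jq utility is available where you run these commands, you can combine steps 4 and 5 into a single command. This is a convenience sketch, not a required step:
   export TOKEN=$(curl -k -sS -X POST -H 'Content-Type: application/json' -d '{"username":"<INSTANCE_USER>","api_key":"<INSTANCE_API_KEY>"}' https://<CPDHost>/icp4d-api/v1/authorize | jq -r .token)
   echo $TOKEN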
6. Get the instance details and check whether the instance is in the RUNNING state:
   curl -k -sS -X GET https://<CPDHost>/zen-data-ui/v3/service_instances/<ANALYTICS_ENGINE_INSTANCE_ID>?include_service_status=true --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN"
   The response should look similar to this example:
   {"service_instance":{"addon_type":"spark","addon_version":"4.5.0","connection_info":{"History server endpoint":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark_history_server","Spark jobs V2 endpoint":"$HOST/ae/spark/v2/f32a7728-eb57-4300-a5b4-bb0f772b9964/v2/jobs","Spark jobs V3 endpoint":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark/applications","Spark kernel endpoint":"$HOST/v2/spark/ae/f32a7728-eb57-4300-a5b4-bb0f772b9964/jkg/api/kernels","View history server":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark_history_ui/"},"created_at":"2022-06-10T07:01:04.502057Z","display_name":"ProfHbIntrnl","id":"1654844464492037","instance_identifiers":null,"metadata":{"volumeName":"wkc::ProfStgIntrnl"},"misc_data":{},"namespace":"wkc","owner_uid":"1000331001","owner_username":"__internal_profiler__","parameters":{"file_server":{"start":true},"storageClass":"managed-nfs-storage","storageSize":"5Gi","volumeName":"wkc::ProfStgIntrnl"},"provision_status":"PROVISIONED","resources":{},"roles":["Admin"],"updated_at":"2022-07-22T06:10:16.092222Z","zen_service_instance_info":{"docker_registry_prefix":"icr.io/cpopen/cpfs"}},"services_status":"RUNNING"}
   The value of the keyword services_status should be RUNNING. However, if the GET call returns a service_not_found exception, the Analytics Engine instance is no longer available. In this case, continue with the following steps to fix the issue. Otherwise, skip the rest of the steps.
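   Because the response is one large JSON object, it can help to extract just the status field. A sketch, assuming jq is available where you run the cURL commands; the field path matches the example response above:
   curl -k -sS -X GET https://<CPDHost>/zen-data-ui/v3/service_instances/<ANALYTICS_ENGINE_INSTANCE_ID>?include_service_status=true --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" | jq -r .services_status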
7. List all service instances. If the token that you obtained in step 4 is still valid, you can use it. If the token is expired, generate a new one and export it as described in steps 4 and 5.
   curl -k -sS -X GET https://<CPDHost>/zen-data-ui/v3/service_instances?include_service_status=true --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN"
   A JSON object is returned that lists the details for at most two instances: the service volume instance and the Analytics Engine instance. If no service instances are available, the JSON object is empty.
   - If no service volume instance is found, create it by using the following cURL command. Replace the values in the sample payload as appropriate:
     curl -k -sS -X POST https://<CPDHost>/zen-data-ui/v3/service_instances --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" --header 'Content-Type: application/json' -d ' {"addon_type":"volumes","display_name":"<volumeNameOfYourChoice>","namespace":"${PROJECT_CPD_INST_OPERANDS}","addon_version":"-","create_arguments":{"resources":{},"parameters":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"description":"","metadata":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"owner_username":""},"pre_existing_owner":false,"transientFields":{}}'
     This command creates a service volume instance and returns its ID in the response. Make a note of the service volume instance ID.
   - If no Analytics Engine instance is found, create it by using the following cURL command. Replace the values in the sample payload as appropriate:
     curl -k -sS -X POST https://<CPDHost>/zen-data-ui/v3/service_instances --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" --header 'Content-Type: application/json' -d '{"addon_type":"spark","display_name":"<IAE_INSTANCE_NameOfYourChoice>","namespace":"${PROJECT_CPD_INST_OPERANDS}","addon_version":"<spark_version>","create_arguments":{"resources":{},"parameters":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"description":"","metadata":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"owner_username":""},"pre_existing_owner":false,"transientFields":{}}'
     This command creates an Analytics Engine instance and returns its ID in the response. Make a note of this ID.
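   To pick the ID out of either creation response without reading the whole JSON object, you can append a filter to the cURL command, for example | grep -o '"id":"[^"]*"'. This sketch assumes that the ID is returned in an "id" field, as in the instance details example in step 6.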
8. Update the wdp-profiling and the dp-transform deployments to work with the new Analytics Engine instance that you created in the previous step.
   - If vaults are enabled in your environment, complete these steps:
     - Navigate to the /etc/.secrets directory.
     - Edit the ANALYTICS_ENGINE_INSTANCE_ID secrets file and update the value of the iae_instance_id key.
     - Save your changes.
     To apply the changes, restart the wdp-profiling and dp-transform pods.
   - If vaults are not enabled in your environment, complete these steps:
     - Encode the instance ID:
       echo -n '<instance_id>' | base64
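       For example, encoding the instance ID that is shown in the examples in steps 3 and 6:
       echo -n '1654844464492037' | base64
       MTY1NDg0NDQ2NDQ5MjAzNw==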
     - Edit the wdp-profiling-iae-secrets secret:
       oc edit secret wdp-profiling-iae-secrets
     - Update the value of the iae_instance_id entry with the base64-encoded value:
       apiVersion: v1
       data:
         iae_instance_id: <base64-encoded value>
         iae_usermgmt_apikey: <apikey>
       kind: Secret
     - Save your changes.
     - Restart the wdp-profiling pod by running these commands, replacing <n> with the required number of replicas:
       oc scale deployment wdp-profiling --replicas=0
       oc scale deployment wdp-profiling --replicas=<n>
     - Restart the dp-transform pod by running these commands, replacing <n> with the required number of replicas:
       oc scale deployment dp-transform --replicas=0
       oc scale deployment dp-transform --replicas=<n>
   The deployments are updated. After the pods are refreshed, the new Analytics Engine instance is used for profiling and previewing.
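   To confirm that the pods came back up before you retry profiling or previewing, you can run a quick check. A sketch, using the deployment names from this procedure:
   oc get pods | grep -E 'wdp-profiling|dp-transform'
   All listed pods should reach the Running status. You can then repeat the instance details call from step 6 with the new instance ID to verify that services_status is RUNNING.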
Parent topic: Troubleshooting IBM Knowledge Catalog