Cannot submit profile jobs or preview assets (IBM Knowledge Catalog)

If profiling an asset in a catalog or a project fails, you might encounter the error: FAILED TO SUBMIT JOB: Job failed to start.

In addition, if you are trying to preview an asset when data protection rules are applied, you might encounter the error: An error occurred attempting to preview this asset.

These errors might be caused by an issue with the instance of IBM Analytics Engine powered by Apache Spark that is used for profiling and previewing. To check the Analytics Engine instance and, if necessary, fix the issue, complete the following steps:

  1. Log in to your Red Hat OpenShift cluster as a cluster administrator:

    oc login <OpenShift_URL:port>
    
  2. Log in to the profiling pod:

    a. Identify the profiling pod by running the following command:

    oc get pod | grep wdp-profiling
    

    This command can return a list of pods. Select the pod whose name prefix is exactly wdp-profiling.
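
    For example, if other pods in your cluster share the wdp-profiling prefix, you can anchor the match so that only the pod of the wdp-profiling deployment itself is listed. This is a convenience sketch; the two-segment suffix pattern assumes the pod is managed by a standard deployment:

    oc get pod --no-headers | awk '$1 ~ /^wdp-profiling-[a-z0-9]+-[a-z0-9]+$/ {print $1}'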

    b. Log in to this pod by using the following command:

    oc rsh <podName>
    

    Example: oc rsh wdp-profiling-d989b575b-v85wx

  3. Complete these checks:

    • If vaults are enabled in your environment, go to the /etc/.secrets directory and make sure the INSTANCE_API_KEY and ANALYTICS_ENGINE_INSTANCE_ID secrets files are available.

      To obtain the INSTANCE_API_KEY and ANALYTICS_ENGINE_INSTANCE_ID values, run the following commands:

      cat /etc/.secrets/INSTANCE_API_KEY
      
      cat /etc/.secrets/ANALYTICS_ENGINE_INSTANCE_ID
      
    • If vaults are not enabled in your environment, check the following environment variables and make sure they have values set:

      • INSTANCE_API_KEY
      • INSTANCE_USER
      • ANALYTICS_ENGINE_INSTANCE_ID

      Use the following command:

      env | grep -i instance
      

      The output should look similar to this example:

      INSTANCE_API_KEY=UJQzjcf6HJcR6IAUilR3fYu3qbvPoCnVX6mc1eN6
      INSTANCE_USER=__internal_profiler__
      ANALYTICS_ENGINE_INSTANCE_ID=1654844464492037
      

    All of these environment variables or secrets files must be available. If any of them is missing, there is a serious problem with the installation.
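
    For example, the following shell snippet, run inside the pod, checks both cases at once. This is a minimal sketch that assumes the paths and variable names shown above:

    # Vaults enabled: the secrets files must exist and be non-empty
    for f in INSTANCE_API_KEY ANALYTICS_ENGINE_INSTANCE_ID; do
      test -s /etc/.secrets/$f && echo "$f: present" || echo "$f: MISSING"
    done

    # Vaults not enabled: the environment variables must be set
    for v in INSTANCE_API_KEY INSTANCE_USER ANALYTICS_ENGINE_INSTANCE_ID; do
      printenv "$v" > /dev/null && echo "$v: set" || echo "$v: NOT SET"
    done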

  4. Generate an API token for the instance user:

    curl -k -sS -X POST -H 'Content-Type: application/json' -d '{"username":"<INSTANCE_USER>","api_key":"<INSTANCE_API_KEY>"}' https://<CPDHost>/icp4d-api/v1/authorize
    

    This command returns a JSON object with the token.

  5. Look for the keyword token in the returned JSON. Copy the value of that property and export it as the variable TOKEN by using the command export TOKEN=<token>.
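
    For example, if the INSTANCE_USER and INSTANCE_API_KEY environment variables are set (when vaults are enabled, substitute the values from the secrets files), steps 4 and 5 can be combined into a single command. This is a convenience sketch that extracts the token with sed so that it does not require jq in the pod:

    export TOKEN=$(curl -k -sS -X POST -H 'Content-Type: application/json' \
      -d '{"username":"'"$INSTANCE_USER"'","api_key":"'"$INSTANCE_API_KEY"'"}' \
      https://<CPDHost>/icp4d-api/v1/authorize | sed -n 's/.*"token" *: *"\([^"]*\)".*/\1/p')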

  6. Get instance details and check whether the instance is in RUNNING state:

    curl -k -sS -X GET https://<CPDHost>/zen-data-ui/v3/service_instances/<ANALYTICS_ENGINE_INSTANCE_ID>?include_service_status=true --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN"
    

    The response should look similar to this example:

    {"service_instance":{"addon_type":"spark","addon_version":"4.5.0","connection_info":{"History server endpoint":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark_history_server","Spark jobs V2 endpoint":"$HOST/ae/spark/v2/f32a7728-eb57-4300-a5b4-bb0f772b9964/v2/jobs","Spark jobs V3 endpoint":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark/applications","Spark kernel endpoint":"$HOST/v2/spark/ae/f32a7728-eb57-4300-a5b4-bb0f772b9964/jkg/api/kernels","View history server":"$HOST/v2/spark/v3/instances/f32a7728-eb57-4300-a5b4-bb0f772b9964/spark_history_ui/"},"created_at":"2022-06-10T07:01:04.502057Z","display_name":"ProfHbIntrnl","id":"1654844464492037","instance_identifiers":null,"metadata":{"volumeName":"wkc::ProfStgIntrnl"},"misc_data":{},"namespace":"wkc","owner_uid":"1000331001","owner_username":"__internal_profiler__","parameters":{"file_server":{"start":true},"storageClass":"managed-nfs-storage","storageSize":"5Gi","volumeName":"wkc::ProfStgIntrnl"},"provision_status":"PROVISIONED","resources":{},"roles":["Admin"],"updated_at":"2022-07-22T06:10:16.092222Z","zen_service_instance_info":{"docker_registry_prefix":"icr.io/cpopen/cpfs"}},"services_status":"RUNNING"}
    

    The value of the keyword services_status should be RUNNING.
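
    To check just the status without reading the full response, you can filter the output, for example:

    curl -k -sS -X GET "https://<CPDHost>/zen-data-ui/v3/service_instances/<ANALYTICS_ENGINE_INSTANCE_ID>?include_service_status=true" \
      --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" \
      | grep -o '"services_status":"[^"]*"'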

    However, if the GET call returns a service_not_found exception, the Analytics Engine instance is no longer available. In this case, continue with the following steps to fix the issue. Otherwise, skip the rest of the steps.

  7. List all service instances. If the token that you obtained in step 4 is still valid, you can reuse it. If the token is expired, generate and export a new one as described in steps 4 and 5.

    curl -k -sS -X GET https://<CPDHost>/zen-data-ui/v3/service_instances?include_service_status=true --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN"
    

    A JSON object is returned that lists the details for at most two instances: the service volume instance and the Analytics Engine instance. If no service instances are available, the JSON object is empty.
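
    To see at a glance which of the two instances exist, you can filter the response for the addon_type fields (volumes identifies the service volume instance, spark the Analytics Engine instance):

    curl -k -sS -X GET "https://<CPDHost>/zen-data-ui/v3/service_instances?include_service_status=true" \
      --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" \
      | grep -o '"addon_type":"[^"]*"'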

    • If no service volume instance is found, create it by using the following cURL command. Replace values in the sample payload as appropriate:

      curl -k -sS -X POST https://<CPDHost>/zen-data-ui/v3/service_instances --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" --header 'Content-Type: application/json' -d ' {"addon_type":"volumes","display_name":"<volumeNameOfYourChoice>","namespace":"${PROJECT_CPD_INST_OPERANDS}","addon_version":"-","create_arguments":{"resources":{},"parameters":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"description":"","metadata":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"owner_username":""},"pre_existing_owner":false,"transientFields":{}}'
      

      This command creates a service volume instance and returns its ID in the response. Make a note of the service volume instance ID.

    • If no Analytics Engine instance is found, create it by using the following cURL command. Replace values in the sample payload as appropriate:

      curl -k -sS -X POST https://<CPDHost>/zen-data-ui/v3/service_instances --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" --header 'Content-Type: application/json' -d '{"addon_type":"spark","display_name":"<IAE_INSTANCE_NameOfYourChoice>","namespace":"${PROJECT_CPD_INST_OPERANDS}","addon_version":"<spark_version>","create_arguments":{"resources":{},"parameters":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"description":"","metadata":{"storageClass":"managed-nfs-storage","storageSize":"50Gi","volumeName":"${PROJECT_CPD_INST_OPERANDS}::<volumeNameOfYourChoice>","file_server":{"start":true}},"owner_username":""},"pre_existing_owner":false,"transientFields":{}}'
      

      This command creates an Analytics Engine instance and returns its ID in the response. Make a note of this ID.
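
    In either case, you can verify that the newly created instance reaches RUNNING state by repeating the check from step 6 with the new instance ID. Provisioning might take some time, so the status might not be RUNNING immediately:

    curl -k -sS -X GET "https://<CPDHost>/zen-data-ui/v3/service_instances/<new_instance_ID>?include_service_status=true" \
      --header 'Accept: application/json' --header "Authorization: Bearer $TOKEN" \
      | grep -o '"services_status":"[^"]*"'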

  8. Update the wdp-profiling and the dp-transform deployments to work with the new Analytics Engine instance you created in the previous step.

    • If vaults are enabled in your environment, complete these steps:

      1. Navigate to the /etc/.secrets directory.
      2. Edit the ANALYTICS_ENGINE_INSTANCE_ID secrets file and update the value of the iae_instance_id key.
      3. Save your changes.

      To apply the changes, restart the wdp-profiling and dp-transform pods.
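
      For example, one way to restart both pods (a sketch; oc rollout restart requires a reasonably current oc client, otherwise scale the deployments down and up as shown in the steps below):

        oc rollout restart deployment wdp-profiling
        oc rollout restart deployment dp-transform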

    • If vaults are not enabled in your environment, complete these steps:

      1. Encode the instance ID:

        echo -n '<instance_id>' | base64
        
      2. Edit the wdp-profiling-iae-secrets secret:

        oc edit secret wdp-profiling-iae-secrets
        
      3. Update the value of the iae_instance_id entry with the base64-encoded value, and save your changes:

        apiVersion: v1
        data:
          iae_instance_id: <base64-encoded value>
          iae_usermgmt_apikey: <apikey>
        kind: Secret
        
      4. Restart the wdp-profiling pod by running the following commands. Replace <n> with the required number of replicas:

        oc scale deployment wdp-profiling --replicas=0
        oc scale deployment wdp-profiling --replicas=<n>
        
      5. Restart the dp-transform pod by running the following commands. Replace <n> with the required number of replicas:

        oc scale deployment dp-transform --replicas=0
        oc scale deployment dp-transform --replicas=<n>
        

      The deployment is updated. After the pods are refreshed, the new Analytics Engine instance is used for profiling and previewing.
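
    To verify the update, identify the new wdp-profiling pod as in step 2 and confirm that it carries the new instance ID. If vaults are enabled, check the /etc/.secrets/ANALYTICS_ENGINE_INSTANCE_ID secrets file instead of the environment variable:

    oc get pod | grep wdp-profiling
    oc rsh <podName> printenv ANALYTICS_ENGINE_INSTANCE_ID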

Parent topic: Troubleshooting IBM Knowledge Catalog