Temporal event training fails when run outside the pod
The temporal event training command fails when it is run from outside the pod.
Problem
When you attempt to run the temporal event training command from outside the pod, the command fails with Spark error messages. This issue is observed on smaller deployments (trial-size clusters).
Cause
This issue occurs because Spark does not perform optimally when only one pod is available for the controller or worker node.
Resolution
To resolve this issue, run the training command from within the pod by completing the following steps:
- Create the trainer pod by running the following command (a variant that avoids the expected timeout is sketched after these steps):
  kubectl delete pod trainer; kubectl run trainer -it --command=true --restart=Never --env=LICENSE=accept --env=RUNNING_IN_CONTAINER=true --image=$TOOL_VERSION -- sleep infinity
  The command times out, which is expected behavior.
- Verify that the pod is running by running the following command (a scripted readiness check is sketched after these steps):
  oc get pods | grep trainer
- Log in to the pod by running the following command:
  oc exec -it trainer -- /bin/bash
- Run the training commands from within the pod (a non-interactive way to drive the same commands is sketched after these steps):
  cd bin
  ./runTraining.sh -r <release_name> -t cfd95b7e-3bc7-4006-a4a8-a73a79c71255 -a seasonal-events -s <start_date> -e <end_date> -d true
  ./runTraining.sh -r <release_name> -t cfd95b7e-3bc7-4006-a4a8-a73a79c71255 -a related-events -s <start_date> -e <end_date> -d true
  Where <release_name> is the full text value of the release, and <start_date> and <end_date> are the start and end dates of the data that is selected for the training.
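The following is a minimal sketch of a pod creation variant for step 1 that skips the interactive attach, so the command returns immediately instead of timing out. The variant and the --ignore-not-found flag are assumptions, not part of the documented procedure; the image and environment variables are the same as in step 1.

# Sketch only: create the trainer pod without attaching to it, so no timeout occurs.
# Same image and environment variables as the documented command in step 1.
kubectl delete pod trainer --ignore-not-found
kubectl run trainer --restart=Never --env=LICENSE=accept --env=RUNNING_IN_CONTAINER=true --image=$TOOL_VERSION --command=true -- sleep infinity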
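The following readiness check is a minimal sketch for step 2, assuming that a 300-second timeout is acceptable for your deployment; it is optional and not part of the documented procedure.

# Optional sketch: block until the trainer pod reports Ready (assumed 300-second timeout).
# If the pod never becomes Ready, inspect it with: oc describe pod trainer
oc wait --for=condition=Ready pod/trainer --timeout=300s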
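As an alternative to the interactive shell in steps 3 and 4, the following sketch drives the same in-pod training commands through oc exec. It assumes that bash is available in the trainer image (as implied by step 3) and that the exec shell starts in the same working directory as the login shell; replace the <release_name>, <start_date>, and <end_date> placeholders as described in step 4.

# Sketch only: run both trainings inside the trainer pod without an interactive login.
# The runTraining.sh arguments are the same as in step 4; substitute the placeholders.
for algorithm in seasonal-events related-events; do
  oc exec trainer -- bash -c "cd bin && ./runTraining.sh -r <release_name> -t cfd95b7e-3bc7-4006-a4a8-a73a79c71255 -a $algorithm -s <start_date> -e <end_date> -d true"
done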