This document outlines a typical way to use the LSF Predictor service.
About this task
The following list contains the basic concepts associated with the Predictor service:
- Experiment
- An experiment is an LSF simulation run that uses a selected LSF configuration and workload snapshot.
- Cluster configuration
- An LSF configuration is a full set of LSF cluster configurations and workload policies.
- Workload snapshot
- A workload snapshot is a set of job submission and completion records that are imported from the LSF cluster events files (lsb.events*). A minimal sketch after this list shows one way to preview these records.
- Prediction
- A prediction is an AI model training process that includes selecting and cleaning data, starting and stopping training, viewing and publishing pipelines, testing models, optimizing workload snapshots, and more.
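To get a sense of what a workload snapshot is built from, the following minimal sketch counts the record types in a set of lsb.events files. It assumes only that each record line begins with its (possibly quoted) event type token, such as JOB_NEW or JOB_FINISH; the path is a placeholder for your own cluster's event log directory.

```python
from collections import Counter
from glob import glob

# Placeholder path: substitute your own cluster's event log directory.
EVENTS_GLOB = "/path/to/lsf/logdir/lsb.events*"

counts = Counter()
for path in glob(EVENTS_GLOB):
    with open(path, errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and header comments
            # Each record begins with its event type, for example
            # JOB_NEW, JOB_START, or JOB_FINISH.
            counts[line.split(None, 1)[0].strip('"')] += 1

for event_type, n in counts.most_common():
    print(f"{event_type}: {n}")
```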
Procedure
The following is an example outline of a typical Predictor service workflow, which uses the provided samples so that you can practice and learn.
- Workflow example 1:
- Start by optimizing the sample data and then rerunning a sample experiment. You can then compare the results of the original (baseline) and the new experiment. This example does not require IBM Cloud Pak for Data to be running.
1. Select the Workload Snapshots tab.
2. In the WL_clusterA_small row, click the Optimize action. Select the default sample_model_max_mem.model (local) prediction model. Click Optimize.
3. Select the Experiments tab.
4. In the Sample Experiment row, click the Rerun action. In the Modify and Rerun Experiment wizard, select the Workload Snapshot tab and select the newly generated workload snapshot. You can click Next to review your experiment selections. Click Rerun. You can track the progress of the Sample Experiment job; click Refresh to update the progress bar in the table. The experiment is completed when the progress bar shows that 100% of jobs are completed.
5. To view the prediction results charts, click the Sample Experiment name and select the Prediction Results tab. A short sketch after these steps shows one way to compare the baseline and rerun results offline.
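If you export the per-job results of the baseline and rerun experiments for offline study, a quick comparison might look like the following. The file names and columns here are hypothetical stand-ins; the Predictor's own Prediction Results charts already present this comparison.

```python
import pandas as pd

# Hypothetical per-job exports of the two experiments; neither the file
# names nor the columns are prescribed by the Predictor service.
baseline = pd.read_csv("baseline_experiment.csv")
rerun = pd.read_csv("rerun_experiment.csv")

# Contrast average pending time and run time across the two runs.
for label, df in (("baseline", baseline), ("rerun", rerun)):
    print(label,
          "avg pend_time:", df["pend_time"].mean(),
          "avg run_time:", df["run_time"].mean())
```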
- Workflow example 2:
- Use a small job data set to practice creating a model with the LSF Predictor. Train a model to predict the amount of memory that is required for a job.
Note: This example requires IBM Cloud Pak for Data to be running.
6. To create a prediction regarding the amount of memory that is required for a job, select the Prediction tab. Click Create +. Enter a prediction name such as Job_memory. Leave the default selections of the max-mem prediction target and the Regression prediction type. You can add a description of the prediction if you want. Click Next.
7. To select historical job data in lsb.acct files, enter a data source location and your LSF cluster name. Select the workload start and end times. Click Next.
8. Select job attributes for the model training data under the Job Features tab. Previews of the data update dynamically as you make changes to the job attributes and filters.
9. To start the model training, click Create. The model training time depends on the amount of training data. For example, training with 10,000 jobs and 6 attributes takes about 20 minutes. To display the pipeline details when the model training completes, click the prediction name, then the Pipelines tab.
10. To publish the pipeline as a model, locate the pipeline with rank 1 and click Publish.
11. Test the model interactively by clicking Test. To discover the predicted job memory size, enter job attributes and click Predict. For a rough offline analogy of this kind of regression, see the sketch after these steps.
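As a loose analogy to the kind of model that the Predictor trains here, and not the service's actual pipeline, the following sketch fits a regression model that predicts a job's maximum memory from a few job attributes. The job_history.csv file and its columns are hypothetical stand-ins for records that you might extract from lsb.acct.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical job records; the file name and columns are illustrative,
# not the Predictor's actual schema.
jobs = pd.read_csv("job_history.csv")

features = ["num_processors", "queue_id", "run_time", "user_id"]
X = pd.get_dummies(jobs[features], columns=["queue_id", "user_id"])
y = jobs["max_mem"]  # target: maximum memory the job actually used

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# MAE is the same metric the Predictor reports for Regression pipelines.
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```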
- Workflow example 3:
- Tune your model to improve prediction accuracy.
12. Click the completed prediction name, then check the MAE (for the Regression type) or Accuracy (for the Classification type) value for each pipeline of the prediction. If the MAE value is too large relative to the real job memory, or the Accuracy value is too small, tune the model.
13. To tune the model, the key task is to find the most relevant job features. The feature importance column in the pipeline table shows the importance number as determined from the last training run. Select the job features with the largest importance numbers for the next tuning run.
14. Click the Tune action to start tuning the prediction. Carefully select the important job features. The following general rules can result in better predictions:
- Find the relevant information by using a tag from a long string instead of the whole string.
- Remove the irrelevant job features.
- Use a smaller job data set to reduce the tuning time. It is advisable to select fewer than 100,000 jobs for tuning.
15. After the tuning is done, recheck the MAE or Accuracy value. If the value is still not good enough, return to step 12 and tune again.
16. If the MAE or Accuracy value is good enough, apply the feature selection to a large data set to improve the accuracy of the prediction. Create a new prediction or tune an existing prediction by selecting a large number of jobs. However, based on benchmark results, when the number of jobs is too large (for example, 10 million), model accuracy does not improve. A minimal sketch after these steps shows one way to rank features by importance offline.
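To illustrate the feature-selection idea in steps 13 and 14 outside the Predictor UI, this sketch ranks job attributes by the importance that a fitted random-forest model assigns to them. It reuses the hypothetical job_history.csv schema from the earlier sketch and is not the service's own tuning code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Same hypothetical schema as the earlier training sketch.
jobs = pd.read_csv("job_history.csv")
features = ["num_processors", "queue_id", "run_time", "user_id"]
X = pd.get_dummies(jobs[features], columns=["queue_id", "user_id"])
y = jobs["max_mem"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Rank the encoded features by importance and keep the strongest ones
# for the next tuning run, as the procedure suggests.
importance = pd.Series(model.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False).head(10))
```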