Machine learning models
The Watson Studio Local client provides tools to help you create and train machine learning models that can analyze data assets and extract value from them. Users can also deploy their models to make them available to a wider audience.
Tasks you can perform:
- Create a model with APIs
- Create a model from a file
- Create a model with the model builder
- Test a model online
- Batch score a model
- Evaluate a model
Watson Studio Local supports the following machine learning model types:
- Spark ML
- PMML with online scoring
- Custom models with batch scoring
- scikit-learn 0.19.1 (Python 2.7 and Python 3.5) - 0.19.1 (GPU-Python 3.5) with pickle or joblib format
- XGBoost 0.7.post3 (Python 2.7 and 3.5) - 0.71 (GPU-Python 3.5)
- Keras 2.1.3 (Python 2.7 and Python 3.5) - 2.1.5 (GPU-Python 3.5)
- TensorFlow 1.5.0 (Python 2.7 and Python 3.5) - 1.4.1 (GPU-Python 3.5)
- WML
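Scikit-learn models, for example, are imported in the pickle or joblib serialization formats listed above. As a minimal sketch of what that round-trip looks like, using only Python's standard pickle module and a stand-in object in place of a trained estimator:

```python
import pickle

# Stand-in for a trained scikit-learn estimator; a real model object
# (e.g. a fitted LogisticRegression) serializes the same way.
model = {"coef": [0.4, -1.2], "intercept": 0.1}

# Serialize the model to the file you would import into Watson Studio Local.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Deserialize to verify the artifact round-trips cleanly.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True when the round-trip is lossless
```

The same pattern applies with `joblib.dump` and `joblib.load`, which are often preferred for estimators that hold large NumPy arrays.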
Create a model with APIs
Watson Studio Local provides sample notebooks to help users create their own custom applications that can be powered by machine learning.
To learn more about the machine learning repository client API commands and syntax that you can use inside a notebook, see Save a model in Python and Create a model in HDP.
If you use these commands and syntax, be aware that for Watson Studio Local, the repository URL is static, and you do not need to authenticate to the repository.
For each scoring iteration that you run from a notebook, Watson Studio Local automatically increments the model version. Later, on the model details page, you can compare the accuracies of the versions that you ran.

Create a model from a file
You can import three types of models:
- PMML
- An XML file written in the Predictive Model Markup Language. The PMML is scored using the JPMML Evaluator (make sure you review its "Supported model types", "Not yet supported model types", and "Known Limitations" sections).
- Custom Batch
- A third-party vendor model in a compressed .gz format that performs batch scoring. If you are using the Carolina tool, see the Carolina for Hadoop or Carolina Standalone product page for more details. Requirement: You must package all scripts and dependent models into a single .gz file before you import it. You can use the utility script provided at /user-home/.scripts/common-helpers/batch/custom/createcustombatch.sh to zip the files.
- Custom Online
- A third-party vendor model in a .jar format that performs online scoring. Requirement: If you are using the Carolina tool, run your scripts through it to generate the .jar file. See the Carolina for Integration product page for more details. To perform the online scoring, place all third-party JAR files and the license file in the /user-home/_global_/libs/ml/mlscoring/thirdparty/carolina/ folder of Watson Studio Local.
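For the Custom Batch type, the createcustombatch.sh utility packages your files for you; the same result can be approximated with Python's standard tarfile module, as sketched below (the file names are illustrative stand-ins for your scoring scripts and dependent models):

```python
import tarfile

# Illustrative placeholder files; in practice these are your scoring
# scripts and any dependent model files.
files = ["score.py", "model.bin"]
for name in files:
    with open(name, "w") as f:
        f.write("placeholder\n")

# Package everything into a single gzip-compressed archive, which is
# the single .gz file the Custom Batch import expects.
with tarfile.open("custom-batch.tar.gz", "w:gz") as tar:
    for name in files:
        tar.add(name)

with tarfile.open("custom-batch.tar.gz") as tar:
    print(sorted(tar.getnames()))  # ['model.bin', 'score.py']
```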
To import a model into your project from a file, complete the following steps:
- In your project, go to the Models page and click add models.
- In the Create Model window, click the From File tab. Specify the name and description.
- In the Type field, select the kind of model you are importing.
- Browse to the file or drag it into the box.
- Click the Create button.
Create a model with the model builder
To create a new model in your project, complete the following steps:
- In your project, go to the Models page and click add
models.

- In the Add Model window, click the Blank tab.
Specify the name and description. Select Machine Learning. Select whether you
want to create the model automatically or manually and click Create to create
an untrained model.
If you opt to create the model manually, a Prepare window opens where you can add and configure more transformers than just the default. A transformer acts on the data, usually by appending new columns and mapping existing data to the new column.

- On the Select Data step, click your newly created model and select the data asset to run the
model on. Ensure that none of the columns use boolean data types. You can also add new data assets.
Click Next to load the data.

- On the Prepare step, if you selected Manual when you created the model,
then add and configure each transformer accordingly. Click Next.

- On the Train step, select the column value to predict and the technique to train it with
(Watson Studio Local will suggest the best one). You can add estimators to
train on the data and produce a model for each one; then you can select the best trained model to
deploy and use for predictions. You can also adjust the validation split to experiment with how much
of the data to train, test, and hold out. Click Next to train and evaluate
the model.

- On the Evaluate step, select which trained model you want to keep and click
Save to save it. Each time you save the model, its version is incremented.
Later in the model details page, you can compare the accuracies of each version you ran.

You can also select the best version to publish.
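The validation split on the Train step divides the data into train, test, and holdout partitions. A rough sketch of that kind of split using only the Python standard library (the 60/20/20 proportions are illustrative, not Watson Studio Local defaults):

```python
import random

rows = list(range(100))  # stand-in for the rows of a data asset
random.seed(42)          # fixed seed so the split is reproducible
random.shuffle(rows)

# Illustrative 60/20/20 split into train, test, and holdout partitions.
n = len(rows)
train = rows[: int(n * 0.6)]
test = rows[int(n * 0.6): int(n * 0.8)]
holdout = rows[int(n * 0.8):]

print(len(train), len(test), len(holdout))  # 60 20 20
```

Adjusting the split trades off how much data each estimator trains on against how much is reserved for testing and final validation.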
Test a model online
In the Models page of your project, click Real-time score next to the model to input data and simulate predictions on it as a pie chart or bar graph.

Batch score a model
To run batch prediction jobs that read in a data set, score the data, and output the predictions in a CSV file, complete the following steps:
- In the Models page of your project, click Batch Score next to the model.
- Select the execution type, input data set, and output data set CSV file. Restriction: WML models created in the visual model builder can only use a remote data set as the input data asset. Other types of models can use a local CSV file for the input.

- Click the Generate Batch Script button. Watson Studio Local automatically generates a Python script that you can edit
directly in the Result view. Tip: You can customize this script to pre-process your data, for example, to ensure that the case of the dataframe headers is suitable for ML models.
- Click the Run now button to immediately create and run a job for the
script. Alternatively, you can click Advanced settings to save the script as
either a .py script or a .ipynb notebook in your project;
then later from the Jobs page of your project, you can create a scheduled job
for the script or notebook you saved with Type set to Batch
scoring. Restriction: If you select a GPU worker for the job, you can only batch score Keras models.
Requirement: If you are scoring a PMML model in Python 3.5 or later, you must set the environment variable SPARK_VERSION=2.1.

From the job details page, you can click on each run to view results and logs. You can also view a batch scoring history from the model details.
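As an illustration of the header-case pre-processing suggested in the tip above, the idea can be sketched with the standard csv module (the file names and columns are made up; the generated script itself typically operates on a dataframe instead):

```python
import csv

# Write a small input file with mixed-case headers for illustration.
with open("scores_in.csv", "w", newline="") as f:
    csv.writer(f).writerows([["Age", "Income"], ["34", "52000"]])

# Pre-processing step: lower-case the header row so the column names
# match what the model was trained with.
with open("scores_in.csv", newline="") as f:
    rows = list(csv.reader(f))
rows[0] = [name.lower() for name in rows[0]]

with open("scores_clean.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

print(rows[0])  # ['age', 'income']
```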
Evaluate a model
To evaluate the performance of a model, complete the following steps:
- In the Models page of your project, click Evaluate next to the model.
- Select an input data set that contains the prediction column. For each evaluator, you can opt to
customize your own threshold metric and specify what fraction of the overall data must be relevant
for the model to be considered healthy. For Spark 2.1 model evaluations, the output data set field
is ignored.

- Click the Generate Evaluation Script button. Watson Studio Local automatically generates a Python script that you can edit
directly in the Result view. Tip: You can customize this script to pre-process your data, for example, to ensure that the case of the data frame headers is suitable for ML models.
- Click the Run now button to immediately create and run a job for the
script. Alternatively, you can click Advanced settings to save the script as
either a .py script or a .ipynb notebook in your project; then later from the
Jobs page of your project, you can create a scheduled job for the script or
notebook you saved with Type set to Model evaluation.
Requirement: If you are evaluating a PMML model in Python 3.5 or later, you must set the environment variable SPARK_VERSION=2.1.
From the job details page, you can click on each run to view results and logs. Go to the model details page to view the evaluation history.
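The evaluator's threshold check described above amounts to computing a metric over the prediction column and comparing it against a configured floor. A minimal sketch of that idea (the metric, threshold value, and data are illustrative, not the actual generated evaluation script):

```python
# Labels and the prediction column produced by a scoring run (illustrative).
labels =      [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 1, 0, 0, 1, 0, 1]

# Accuracy as the evaluation metric: fraction of matching predictions.
accuracy = sum(y == p for y, p in zip(labels, predictions)) / len(labels)

# The model is considered healthy when the metric clears the threshold.
threshold = 0.7
healthy = accuracy >= threshold

print(accuracy, healthy)  # 0.75 True
```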