Machine learning models
The Watson Studio Local client provides tools to help you create and train machine learning models that can analyze data assets and extract value from them. Users can also deploy their models to make them available to a wider audience.
Tasks you can perform:
- Create a model with APIs
- Create a model from a file
- Test a model online
- Batch score a model
- Evaluate a model
Watson Studio Local supports the following machine learning model types:
- Spark ML
- PMML with online scoring
- Custom models with batch scoring
- scikit-learn 0.19.1 (Python 2.7 and Python 3.5); 0.19.1 (GPU, Python 3.5); in pickle or joblib format
- XGBoost 0.7.post3 (Python 2.7 and Python 3.5); 0.71 (GPU, Python 3.5)
- Keras 2.1.3 (Python 2.7 and Python 3.5); 2.1.5 (GPU, Python 3.5)
- TensorFlow 1.5.0 (Python 2.7 and Python 3.5); 1.4.1 (GPU, Python 3.5)
- WML
Create a model with APIs
Watson Studio Local provides sample notebooks to help users create their own custom applications that can be powered by machine learning.
To learn more about the machine learning repository client API commands and syntax that you can use inside a notebook, see Save a model in Python and Create a model in HDP.
If you use these commands and syntax, be aware that for Watson Studio Local, the repository URL is static, and you do not need to authenticate to the repository.
For each scoring iteration that you run from a notebook, Watson Studio Local automatically increments the model version. Later, on the model details page, you can compare the accuracy of each version that you ran.
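For example, a notebook might train a scikit-learn model and then save it to the repository along the following lines. This is a minimal sketch: the dsx_ml import and the save() parameters shown here are assumptions based on the Save a model in Python topic, so verify the exact syntax there before running it.

```python
# Minimal sketch: train a scikit-learn model in a notebook, then save it to
# the Watson Studio Local model repository. The save() call below is an
# assumption -- see "Save a model in Python" for the exact signature.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

# Assumed repository call; no repository URL or authentication is needed,
# because the Watson Studio Local repository URL is static.
from dsx_ml.ml import save
save(name='iris_lr_model',
     model=model,
     x_test=X_test,
     y_test=y_test,
     algorithm_type='Classification')
```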
Create a model from a file
You can import three types of models:
- PMML
- An XML file written in the Predictive Model Markup Language. The PMML is scored with the JPMML Evaluator; make sure you review its "Supported model types", "Not yet supported model types", and "Known Limitations" documentation.
- Custom Batch
- A third-party vendor model in a compressed .gz format that performs batch scoring. If you are using the Carolina tool, see the Carolina for Hadoop or Carolina Standalone product page for more details. Requirement: You must zip all scripts and dependent models into a single .gz file before you import it. You can use the utility script provided at /user-home/.scripts/common-helpers/batch/custom/createcustombatch.sh to zip the files (see the packaging sketch after this list).
- Custom Online
- A third-party vendor model in a .jar format that performs online scoring. Requirement: If you are using the Carolina tool, run your scripts through it to generate the .jar file; see the Carolina for Integration product page for more details. To perform the online scoring, place all third-party JAR files and the license file into the /user-home/_global_/libs/ml/mlscoring/thirdparty/carolina/ folder of Watson Studio Local.
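For illustration, the Custom Batch bundle could be assembled by hand along the following lines. The file names here are hypothetical, and the createcustombatch.sh utility mentioned above remains the supported way to produce the archive; this sketch only shows the general shape of the task.

```python
# Illustrative only: collect the scoring scripts and dependent model files
# (hypothetical names) into a single gzip-compressed archive for import.
# The supported path is the createcustombatch.sh utility script.
import tarfile

files = ['score.py', 'model.bin']  # hypothetical script and model file names
with tarfile.open('custom_batch_model.tar.gz', 'w:gz') as bundle:
    for f in files:
        bundle.add(f)
```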
To import a model into your project from a file, complete the following steps:
- In your project, go to the Models page and click add models.
- In the Create Model window, click the From File tab.
- Specify the name and description.
- In the Type field, select the kind of model you are importing.
- Browse to the file or drag it into the box.
- Click the Create button.
Test a model online
In the Models page of your project, click Real-time score next to the model to enter input data and view simulated predictions as a pie chart or bar graph.
Batch score a model
To run batch prediction jobs that read in a data set, score the data, and output the predictions in a CSV file, complete the following steps:
- In the Models page of your project, click Batch Score next to the model.
- Select a local or remote Spark cluster, an input data set, and an output data set (CSV file). Restriction: WML models created in the visual model builder can only use a remote data set as the input data asset. Other types of models can use a local CSV file for the input.
- Click the Generate batch script button. Watson Studio Local automatically generates a Python script that you can edit directly in the Result view. Tip: You can customize this script to pre-process your data, for example, to ensure that the case of the data frame headers is suitable for ML models (see the sketch at the end of this section).
- Click the Run now button to immediately create and run a job for the script. Alternatively, you can click Advanced settings to save the script as either a .py script or a .ipynb notebook in your project (ensure the file name is unique); then later, from the Jobs page of your project, you can create a scheduled job for the saved script or notebook with Type set to Batch scoring. Restriction: If you select a GPU worker for the job, you can only batch score Keras models.
Requirement: If you are scoring a PMML or WML model in Python 3.5 or later, you must specify the environment variable SPARK_VERSION=2.1. If you are scoring a Spark model in Python 3.5 or later, you can specify either SPARK_VERSION=2.1 or SPARK_VERSION=2.2.
From the job details page, you can click on each run to view results and logs. You can also view a batch scoring history from the model details.
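As a rough illustration, a batch scoring script follows the general shape below. This is a minimal sketch for a scikit-learn model with hypothetical file and column names; the script that Watson Studio Local actually generates depends on your model type and will differ.

```python
# Minimal sketch of a batch scoring script: read the input data set,
# normalize the data frame headers (the kind of pre-processing the Tip above
# describes), score the data, and write the predictions to a CSV file.
# File names are hypothetical; the generated script will differ.
import pandas as pd
from sklearn.externals import joblib  # joblib's home in scikit-learn 0.19

df = pd.read_csv('input_dataset.csv')          # hypothetical input data set
df.columns = [c.lower() for c in df.columns]   # header case suitable for the model

model = joblib.load('model.pkl')               # hypothetical saved model file
df['prediction'] = model.predict(df.values)

df.to_csv('predictions.csv', index=False)      # output data set (CSV file)
```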
Evaluate a model
To evaluate the performance of a model, complete the following steps:
- In the Models page of your project, click Evaluate next to the model.
- Select a local or remote Spark cluster.
- Select an input data set that contains the prediction column. For each evaluator, you can opt to customize your own threshold metric and specify what fraction of the overall data must be relevant for the model to be considered healthy. For Spark 2.1 model evaluations, the output data set field is ignored.
- Click the Generate evaluation script button. Watson Studio Local automatically generates a Python script that you can edit directly in the Result view. Tip: You can customize this script to pre-process your data, for example, to ensure that the case of the data frame headers is suitable for ML models.
- Click the Run now button to immediately create and run a job for the script. Alternatively, you can click Advanced settings to save the script as either a .py script or a .ipynb notebook in your project (ensure the file name is unique); then later, from the Jobs page of your project, you can create a scheduled job for the saved script or notebook with Type set to Model evaluation.
Requirement: If you are evaluating a PMML or WML model in Python 3.5 or later, you must specify the environment variable SPARK_VERSION=2.1. If you are evaluating a Spark model in Python 3.5 or later, you can specify either SPARK_VERSION=2.1 or SPARK_VERSION=2.2.
From the job details page, you can click on each run to view results and logs. Go to the model details page to view the evaluation history.
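As a rough illustration, a threshold-style evaluation follows the general shape below. This is a minimal sketch with hypothetical column names and threshold value; the script that Watson Studio Local generates depends on your model type and the evaluator you chose.

```python
# Minimal sketch of a threshold-style evaluation: read a data set that
# contains both the label and the prediction column, compute accuracy, and
# compare it against a health threshold. Column names are hypothetical.
import pandas as pd
from sklearn.metrics import accuracy_score

df = pd.read_csv('scored_dataset.csv')   # must contain the prediction column
accuracy = accuracy_score(df['label'], df['prediction'])

THRESHOLD = 0.8                          # hypothetical health threshold
status = 'healthy' if accuracy >= THRESHOLD else 'unhealthy'
print('accuracy = %.3f -> %s' % (accuracy, status))
```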