Machine learning models
The Watson Studio Local client provides tools to help you create and train machine learning models that can analyze data assets and extract value from them. Users can also deploy their models to make them available to a wider audience.
Tasks you can perform:
- Create a model with APIs
- Create a model from a file
- Test a model online
- Batch score a model
- Evaluate a model
Watson Studio Local supports the following machine learning model types:
- Spark ML
- PMML with online scoring
- Custom models with batch scoring
- scikit-learn 0.19.1 (Python 2.7 and Python 3.5); 0.19.1 (GPU, Python 3.5); in pickle or joblib format
- XGBoost 0.7.post3 (Python 2.7 and Python 3.5); 0.71 (GPU, Python 3.5)
- Keras 2.1.3 (Python 2.7 and Python 3.5); 2.1.5 (GPU, Python 3.5)
- TensorFlow 1.5.0 (Python 2.7 and Python 3.5); 1.4.1 (GPU, Python 3.5)
- WML
Create a model with APIs
Watson Studio Local provides sample notebooks to help users create their own custom applications that can be powered by machine learning.
To learn more about the machine learning repository client API commands and syntax that you can use inside a notebook, see Save a model in Python and Create a model in HDP.
If you use these commands and syntax, be aware that for Watson Studio Local, the repository URL is static, and you do not need to authenticate to the repository.
For each scoring iteration that you run from a notebook, Watson Studio Local automatically increments the model version. Later, on the model details page, you can compare the accuracy of each version that you ran.
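For example, a notebook might train a scikit-learn model and then save it to the repository along the following lines. This is a minimal sketch: the dsx_ml import and the save() parameters shown here are assumptions based on the Save a model in Python topic, so verify the exact syntax there before running it.

```python
# Minimal sketch: train a scikit-learn model in a notebook, then save it to
# the Watson Studio Local model repository. The save() call below is an
# assumption -- see "Save a model in Python" for the exact signature.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

# Assumed repository call; no repository URL or authentication is needed,
# because the Watson Studio Local repository URL is static.
from dsx_ml.ml import save
save(name='iris_lr_model',
     model=model,
     x_test=X_test,
     y_test=y_test,
     algorithm_type='Classification')
```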
Create a model from a file
You can import three types of models:
- PMML
- An XML file written in the Predictive Model Markup Language. The PMML is scored with the JPMML Evaluator; make sure you review its "Supported model types", "Not yet supported model types", and "Known Limitations" documentation.
- Custom Batch
- A third-party vendor model in a compressed .gz format that performs batch scoring. If you are using the Carolina tool, see the Carolina for Hadoop or Carolina Standalone product page for more details. Requirement: You must zip all scripts and dependent models into a single .gz file before you import it. You can use the utility script provided at /user-home/.scripts/common-helpers/batch/custom/createcustombatch.sh to zip the files (see the packaging sketch after this list).
- Custom Online
- A third-party vendor model in a .jar format that performs online scoring. Requirement: If you are using the Carolina tool, run your scripts through it to generate the .jar file; see the Carolina for Integration product page for more details. To perform the online scoring, place all third-party JAR files and the license file into the /user-home/_global_/libs/ml/mlscoring/thirdparty/carolina/ folder of Watson Studio Local.
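For illustration, the Custom Batch bundle could be assembled by hand along the following lines. The file names here are hypothetical, and the createcustombatch.sh utility mentioned above remains the supported way to produce the archive; this sketch only shows the general shape of the task.

```python
# Illustrative only: collect the scoring scripts and dependent model files
# (hypothetical names) into a single gzip-compressed archive for import.
# The supported path is the createcustombatch.sh utility script.
import tarfile

files = ['score.py', 'model.bin']  # hypothetical script and model file names
with tarfile.open('custom_batch_model.tar.gz', 'w:gz') as bundle:
    for f in files:
        bundle.add(f)
```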
To import a model into your project from a file, complete the following steps:
- In your project, go to the Models page and click add models.
- In the Create Model window, click the From File tab.
- Specify the name and description.
- In the Type field, select the kind of model you are importing.
- Browse to the file or drag it into the box.
- Click the Create button.
Test a model online
In the Models page of your project, click Real-time score next to the model to enter input data and view simulated predictions as a pie chart or bar graph.
Batch score a model
To run batch prediction jobs that read in a data set, score the data, and output the predictions in a CSV file, complete the following steps:
- In the Models page of your project, click Batch Score next to the model.
- Select a local or remote Spark cluster, an input data set, and an output data set (CSV file). Restriction: WML models created in the visual model builder can only use a remote data set as the input data asset. Other types of models can use a local CSV file for the input.
- Click the Generate batch script button. Watson Studio Local automatically generates a Python script that you can edit directly in the Result view. Tip: You can customize this script to pre-process your data, for example, to ensure that the case of the data frame headers is suitable for ML models (see the sketch at the end of this section).
- Click the Run now button to immediately create and run a job for the script. Alternatively, you can click Advanced settings to save the script as either a .py script or a .ipynb notebook in your project (ensure the file name is unique); then later, from the Jobs page of your project, you can create a scheduled job for the saved script or notebook with Type set to Batch scoring. Restriction: If you select a GPU worker for the job, you can only batch score Keras models.
Requirement: If you are scoring a PMML or WML model in Python 3.5 or later, you must specify the environment variable SPARK_VERSION=2.1. If you are scoring a Spark model in Python 3.5 or later, you can specify either SPARK_VERSION=2.1 or SPARK_VERSION=2.2.
From the job details page, you can click on each run to view results and logs. You can also view a batch scoring history from the model details.
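As a rough illustration, a batch scoring script follows the general shape below. This is a minimal sketch for a scikit-learn model with hypothetical file and column names; the script that Watson Studio Local actually generates depends on your model type and will differ.

```python
# Minimal sketch of a batch scoring script: read the input data set,
# normalize the data frame headers (the kind of pre-processing the Tip above
# describes), score the data, and write the predictions to a CSV file.
# File names are hypothetical; the generated script will differ.
import pandas as pd
from sklearn.externals import joblib  # joblib's home in scikit-learn 0.19

df = pd.read_csv('input_dataset.csv')          # hypothetical input data set
df.columns = [c.lower() for c in df.columns]   # header case suitable for the model

model = joblib.load('model.pkl')               # hypothetical saved model file
df['prediction'] = model.predict(df.values)

df.to_csv('predictions.csv', index=False)      # output data set (CSV file)
```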
Evaluate a model
To evaluate the performance of a model, complete the following steps:
- In the Models page of your project, click Evaluate next to the model.
- Select a local or remote Spark cluster.
- Select an input data set that contains the prediction column. For each evaluator, you can opt to customize your own threshold metric and specify what fraction of the overall data must be relevant for the model to be considered healthy. For Spark 2.1 model evaluations, the output data set field is ignored.
- Click the Generate evaluation script button. Watson Studio Local automatically generates a Python script that you can edit directly in the Result view. Tip: You can customize this script to pre-process your data, for example, to ensure that the case of the data frame headers is suitable for ML models.
- Click the Run now button to immediately create and run a job for the script. Alternatively, you can click Advanced settings to save the script as either a .py script or a .ipynb notebook in your project (ensure the file name is unique); then later, from the Jobs page of your project, you can create a scheduled job for the saved script or notebook with Type set to Model evaluation.
Requirement: If you are evaluating a PMML or WML model in Python 3.5 or later, you must specify the environment variable SPARK_VERSION=2.1. If you are evaluating a Spark model in Python 3.5 or later, you can specify either SPARK_VERSION=2.1 or SPARK_VERSION=2.2.
From the job details page, you can click on each run to view results and logs. Go to the model details page to view the evaluation history.
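As a rough illustration, a threshold-style evaluation follows the general shape below. This is a minimal sketch with hypothetical column names and threshold value; the script that Watson Studio Local generates depends on your model type and the evaluator you chose.

```python
# Minimal sketch of a threshold-style evaluation: read a data set that
# contains both the label and the prediction column, compute accuracy, and
# compare it against a health threshold. Column names are hypothetical.
import pandas as pd
from sklearn.metrics import accuracy_score

df = pd.read_csv('scored_dataset.csv')   # must contain the prediction column
accuracy = accuracy_score(df['label'], df['prediction'])

THRESHOLD = 0.8                          # hypothetical health threshold
status = 'healthy' if accuracy >= THRESHOLD else 'unhealthy'
print('accuracy = %.3f -> %s' % (accuracy, status))
```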