Training models by using a subset of data

Prescriptive Maintenance on Cloud trains models by using 100% of historical data. However, you can train by using a subset of data.

It is common to train a model on only a subset of the available historical data and to hold out the rest. This way, you can compare multiple iterations during the model build process on data that the model has not seen, and pick the iteration with the lowest error. Prescriptive Maintenance on Cloud, however, trains on 100% of the historical data. If some of the training data is eliminated, overall model accuracy decreases.

If you prefer to train on a subset of data, upload a subset of the historical data for training, then upload the rest of the data for scoring without retraining. You can measure accuracy by comparing the predictions that are obtained for the scoring data with the outcomes that were actually recorded. After you measure accuracy in this way, retrain by using all of the data. Retraining on all of the data improves model accuracy over the value that you measured previously.
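The data preparation and accuracy measurement for this workflow can be done outside the product. The following is a minimal sketch in Python, not a product API. It assumes that each historical record has a hypothetical `timestamp` field and a hypothetical `failed` label, that the split is chronological, and that the predictions for the scoring data were exported as a list of booleans. The helper names `split_by_cutoff` and `precision_recall`, the sample records, and the prediction values are illustrative assumptions.

```python
from datetime import date


def split_by_cutoff(records, cutoff):
    """Split records into (training, scoring) sets by a cutoff date.

    Records dated before the cutoff go to training; the rest go to scoring.
    The chronological split is an assumption made for this sketch.
    """
    training = [r for r in records if r["timestamp"] < cutoff]
    scoring = [r for r in records if r["timestamp"] >= cutoff]
    return training, scoring


def precision_recall(actual, predicted):
    """Compute precision and recall for boolean failure labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # predicted failure, failed
    fp = sum(1 for a, p in pairs if not a and p)    # predicted failure, did not fail
    fn = sum(1 for a, p in pairs if a and not p)    # missed failure
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical historical data (field names are assumptions).
records = [
    {"asset": "pump-1", "timestamp": date(2023, 1, 10), "failed": False},
    {"asset": "pump-1", "timestamp": date(2023, 2, 15), "failed": True},
    {"asset": "pump-1", "timestamp": date(2023, 3, 20), "failed": False},
    {"asset": "pump-1", "timestamp": date(2023, 4, 5), "failed": True},
    {"asset": "pump-1", "timestamp": date(2023, 4, 25), "failed": False},
]

# Upload `training` for training, then upload `scoring` for scoring.
training, scoring = split_by_cutoff(records, date(2023, 4, 1))

# Hypothetical predictions returned for the scoring records, in order.
predicted = [True, True]
actual = [r["failed"] for r in scoring]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision and recall are used here rather than plain accuracy because failures are rare, so a model that never predicts a failure can still score a high plain accuracy. After you record these numbers, retrain on the full set of records.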

Failure prediction models are sensitive to the number of failure records that they are trained on because failures are rare events. If you randomly eliminate some of these rare events from the training data, both the accuracy of the model and the prediction stability of subsequent training jobs are compromised.
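Before you train on a subset, it can help to check how many failure records the subset retains. The following is a minimal sketch in Python under stated assumptions: the `failed` field, the helper names `failure_count` and `failure_retention`, and the synthetic data are hypothetical and are not part of the product.

```python
def failure_count(records):
    """Count the records that are labeled as failures."""
    return sum(1 for r in records if r["failed"])


def failure_retention(full, subset):
    """Return the fraction of failure records in `full` that `subset` keeps."""
    total = failure_count(full)
    return failure_count(subset) / total if total else 1.0


# Synthetic example: 1000 records, of which every 100th is a failure.
full = [{"id": i, "failed": i % 100 == 99} for i in range(1000)]

# Hypothetical training subset: the first 700 records.
subset = full[:700]

print(failure_count(full), failure_count(subset), failure_retention(full, subset))
```

In this synthetic example, the subset keeps 7 of the 10 failure records. With so few failures to begin with, each one that is removed takes away a noticeable share of the examples that the model learns from.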