Modeling Assumptions

As you begin to narrow down your modeling tools of choice, take notes on the decision-making process. Document any data assumptions as well as any data manipulations made to meet the model's requirements.

For example, both the Logistic Regression and Neural Net nodes require the data types to be fully instantiated (data types are known) before execution. This means you will need to add a Type node to the stream and execute it to run the data through before building and running a model. Similarly, predictive models, such as C5.0, may benefit from rebalancing the data when predicting rules for rare events. When making this type of prediction, you can often get better results by inserting a Balance node into the stream and feeding the more balanced subset into the model.

Be sure to document these types of decisions.