Building a Model

By exploring and manipulating the data, you have been able to form some hypotheses. The ratio of sodium to potassium in the blood seems to affect the choice of drug, as does blood pressure. But you cannot fully explain all of the relationships yet. This is where modeling will likely provide some answers. In this case, you will try to fit the data using a rule-building model, C5.0.

Since you are using a derived field, Na_to_K, you can filter out the original fields, Na and K, so that they are not used twice in the modeling algorithm. You can do this using a Filter node.

Figure 1. Editing the Filter node

On the Filter tab, click the arrows next to Na and K. Red Xs appear over the arrows to indicate that the fields are now filtered out.
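Outside the Modeler interface, the same idea can be sketched in plain Python: derive the ratio, then drop the two source fields so that the information is not presented to the model twice. This is only an illustration, not how the Filter node is implemented. The records, their values, and the helper name `derive_and_filter` are made up for the example; only the field names Na, K, Na_to_K, and Drug come from the tutorial.

```python
# Hypothetical records; the values are invented for illustration only.
records = [
    {"Na": 0.79, "K": 0.03, "Drug": "drugY"},
    {"Na": 0.74, "K": 0.06, "Drug": "drugC"},
]


def derive_and_filter(rows, drop=("Na", "K")):
    """Add the derived Na_to_K field, then remove the fields named in drop."""
    result = []
    for row in rows:
        # Copy the row and add the derived ratio.
        enriched = dict(row, Na_to_K=row["Na"] / row["K"])
        # Keep every field except the filtered-out originals.
        result.append({k: v for k, v in enriched.items() if k not in drop})
    return result


filtered = derive_and_filter(records)
```

After this step each record carries Na_to_K and Drug but no longer carries Na or K, which mirrors what the red Xs on the Filter tab indicate.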

Next, attach a Type node to the Filter node. The Type node allows you to indicate the types of fields that you are using and how they are used to predict the outcomes.

On the Types tab, set the role for the Drug field to Target, indicating that Drug is the field you want to predict. Leave the role for the other fields set to Input so they will be used as predictors.
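As a rough analogy for what the role setting expresses, the sketch below keeps a role per field and uses it to separate predictors from the value to be predicted. It is an assumption-laden illustration rather than the Type node's actual behavior: the `roles` mapping, the `split_by_role` helper, the field name BP (standing in for blood pressure), and the sample values are all hypothetical.

```python
# Hypothetical role assignments: one Target, everything else Input.
roles = {"BP": "Input", "Na_to_K": "Input", "Drug": "Target"}


def split_by_role(row, roles):
    """Return (inputs, target_value) for one record according to the roles."""
    targets = [name for name, role in roles.items() if role == "Target"]
    if len(targets) != 1:
        raise ValueError("this sketch expects exactly one Target field")
    inputs = {name: row[name] for name, role in roles.items() if role == "Input"}
    return inputs, row[targets[0]]


# Made-up record for illustration.
row = {"BP": "HIGH", "Na_to_K": 25.4, "Drug": "drugY"}
inputs, target = split_by_role(row, roles)
```

Here `inputs` holds the predictor fields and `target` holds the Drug value the model is asked to predict.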

Figure 2. Editing the Type node

To estimate the model, place a C5.0 node in the workspace and attach it to the end of the stream as shown. Then click the green Run toolbar button to run the stream.
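To give some intuition for how a rule-building model might choose between candidate fields, the sketch below computes the information gain of two possible splits. This is not the C5.0 implementation; it only illustrates the entropy-based split selection that decision tree learners of this family build on. The rows, the threshold of 15, the field name BP, and the function names are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, target, test):
    """Entropy reduction from splitting rows by the boolean function test."""
    parent = [r[target] for r in rows]
    left = [r[target] for r in rows if test(r)]
    right = [r[target] for r in rows if not test(r)]
    # Weight each non-empty branch by its share of the records.
    weighted = sum(
        len(part) / len(parent) * entropy(part) for part in (left, right) if part
    )
    return entropy(parent) - weighted


# Made-up rows for illustration; not taken from the tutorial's data file.
rows = [
    {"Na_to_K": 25.4, "BP": "HIGH", "Drug": "drugY"},
    {"Na_to_K": 19.1, "BP": "LOW", "Drug": "drugY"},
    {"Na_to_K": 10.1, "BP": "HIGH", "Drug": "drugA"},
    {"Na_to_K": 9.4, "BP": "LOW", "Drug": "drugC"},
]

# Compare a split on the derived ratio with a split on blood pressure.
gain_na_to_k = information_gain(rows, "Drug", lambda r: r["Na_to_K"] > 15)
gain_bp = information_gain(rows, "Drug", lambda r: r["BP"] == "HIGH")
```

On these invented rows the split on Na_to_K separates the classes more cleanly than the split on BP, so a learner that ranks splits by information gain would try the ratio first.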

Figure 3. Adding a C5.0 node
