Creating a Scatterplot of the Dependent by the Independent

  1. To produce a scatterplot of infected e-mails by time, from the menus choose:

    Graphs > Chart Builder...

    Figure 1. Chart Builder
  2. Select the Scatter/Dot gallery and choose Simple Scatter.
  3. Select Proportion of infected messages as the y variable and Hours since detection as the x variable.
  4. Click OK.

These selections produce the scatterplot.

Figure 2. Scatterplot of Proportion of infected messages by Hours since detection

The resulting scatterplot shows the proportion of infected e-mails rising, leveling out, and eventually declining over time. Given this shape, a single nonlinear equation is unlikely to both fit well and remain interpretable. Closer examination suggests that a segmented model could perform quite well here.

The initial curve in the plot has an S-shape: there is an initial bend before the rapid rise, followed by another bend as it levels off. A classic growth curve, the logistic equation, can be used to model this shape.
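
For reference, one common parameterization of the logistic equation is shown below. The symbols a, b, and c are illustrative and do not come from this example; other equivalent forms exist.

```latex
% Assumed parameterization (symbols are illustrative):
%   a = upper asymptote (the level at which the curve flattens)
%   b = growth rate, with b > 0
%   c = time of the inflection point, where the rise is steepest
y(t) = \frac{a}{1 + e^{-b\,(t - c)}}
```

With this form, the curve is close to 0 for t well below c, passes through a/2 at t = c, and approaches a for t well above c, which gives the two bends described above.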

Starting at approximately hour 20, the proportion of infected e-mails drops precipitously, and the rate of decline appears to slow with time until the virus threat is essentially eliminated. An appropriate model for this kind of pattern is the asymptotic regression model.
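
As a point of reference, an asymptotic regression model is often written in the form below. The symbols d, f, and g are illustrative and are not taken from this example.

```latex
% Assumed parameterization (symbols are illustrative):
%   d = asymptote approached as t grows
%   f = size of the drop still remaining at t = 0, with f > 0
%   g = decay rate, with g > 0
y(t) = d + f\,e^{-g\,t}
% The slope is negative and shrinks in magnitude over time:
\frac{dy}{dt} = -f\,g\,e^{-g\,t}
```

Because the slope shrinks in magnitude as t increases, the curve falls quickly at first and then flattens toward d, matching a decline whose rate decreases with time.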

A segmented model that uses a logistic equation for the first 19 hours and an asymptotic regression for the remaining hours should provide a good fit and interpretability over the entire time period.
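
The idea of a segmented model can be sketched outside the product in a few lines of Python. This is only an illustration under stated assumptions: the function names, the parameterizations, and every parameter value are hypothetical, and nothing here has been estimated from the e-mail data. Only the breakpoint at hour 19 comes from the description above.

```python
import math

BREAK_HOUR = 19  # last hour handled by the logistic piece, per the text


def logistic(t, upper, rate, midpoint):
    """S-shaped growth toward `upper`, steepest at `midpoint` (assumed form)."""
    return upper / (1.0 + math.exp(-rate * (t - midpoint)))


def asymptotic(t, floor, drop, decay):
    """Decline toward `floor` at a slowing rate, measured from BREAK_HOUR (assumed form)."""
    return floor + drop * math.exp(-decay * (t - BREAK_HOUR))


def segmented(t, params):
    """Use the logistic piece up to BREAK_HOUR and the asymptotic piece afterward."""
    if t <= BREAK_HOUR:
        return logistic(t, *params["growth"])
    return asymptotic(t, *params["decline"])


# Hypothetical parameter values, chosen only to produce a plausible shape.
params = {
    "growth": (0.7, 0.5, 10.0),   # upper, rate, midpoint
    "decline": (0.0, 0.7, 0.3),   # floor, drop, decay
}

# Predicted proportion for each hour from 0 through 48.
curve = [segmented(t, params) for t in range(0, 49)]
```

In this sketch the two pieces are not constrained to meet at the breakpoint, so the predicted curve can jump between hours 19 and 20. That is consistent with a sharp drop in the data, but a continuity constraint could be added if a smooth transition were preferred.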
