# Introduction to bootstrapping

When collecting data, you are often interested in the properties of the population from which you
took the sample. You make inferences about these population parameters with estimates computed from
the sample. For example, if the *Employee data.sav* dataset that is included with the product
is a random sample from a larger population of employees, then the sample mean of $34,419.57 for
*Current salary* is an estimate of the mean current salary for the population of employees.
Moreover, this estimate has a standard error of $784.311 for a sample of size 474, and so a 95%
confidence interval for the mean current salary in the population of employees is $32,878.40 to
$35,960.73. But how reliable are these estimators? For certain "known" populations and well-behaved
parameters, we know quite a bit about the properties of the sample estimates, and can be confident
in these results. Bootstrapping seeks to uncover more information about the properties of estimators
for "unknown" populations and ill-behaved parameters.
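The arithmetic behind the interval quoted above can be checked with a short sketch. It uses only the sample mean and standard error given in the text, and it assumes a normal critical value, which is not necessarily how the product computes the interval. The result, roughly $32,882 to $35,957, is slightly narrower than the documented interval. That difference is consistent with a *t* critical value on 473 degrees of freedom being used instead, but that is an assumption here.

```python
from statistics import NormalDist

# Values quoted in the text for Current salary (sample size 474).
mean = 34419.57
std_error = 784.311
level = 0.95

# Assumption: a normal critical value; the documented interval is
# slightly wider, as a t critical value would produce.
z = NormalDist().inv_cdf(1 - (1 - level) / 2)
half_width = z * std_error

lower = mean - half_width
upper = mean + half_width
print(f"{lower:.2f} to {upper:.2f}")
```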

## How bootstrapping works

At its simplest, for a dataset with a sample size of *N*, you take *B* "bootstrap"
samples of size *N* with replacement from the original dataset and compute the estimator for
each of these *B* bootstrap samples. These *B* bootstrap estimates are a sample of size
*B* from which you can make inferences about the estimator. For example, if you take 1,000
bootstrap samples from the *Employee data.sav* dataset, then the bootstrap estimated standard
error of $776.91 for the sample mean for *Current salary* is an alternative to the estimate of
$784.311.

Additionally, bootstrapping provides a standard error and confidence interval for the median, for which parametric estimates are unavailable.
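The resampling scheme described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation. The salary values are synthetic (they are not the *Employee data.sav* data), the helper names `bootstrap_estimates` and `percentile_ci` are hypothetical, and the simple percentile interval shown is only one of several ways to form a bootstrap confidence interval.

```python
import math
import random
import statistics

def bootstrap_estimates(data, estimator, n_boot=1000, seed=0):
    """Draw n_boot samples of size N with replacement and apply the estimator to each."""
    rng = random.Random(seed)
    n = len(data)
    return [estimator(rng.choices(data, k=n)) for _ in range(n_boot)]

def percentile_ci(estimates, level=0.95):
    """Simple percentile interval taken from the sorted bootstrap estimates."""
    ordered = sorted(estimates)
    alpha = 1 - level
    lo_idx = round((alpha / 2) * (len(ordered) - 1))
    hi_idx = round((1 - alpha / 2) * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

# Synthetic, right-skewed "salaries" (an assumption for illustration only).
data_rng = random.Random(42)
salaries = [data_rng.lognormvariate(10.3, 0.4) for _ in range(474)]

# Bootstrap standard error of the mean, next to the usual formula s / sqrt(N).
mean_estimates = bootstrap_estimates(salaries, statistics.mean)
se_mean_boot = statistics.stdev(mean_estimates)
se_mean_formula = statistics.stdev(salaries) / math.sqrt(len(salaries))

# The same machinery applied to the median.
median_estimates = bootstrap_estimates(salaries, statistics.median)
se_median_boot = statistics.stdev(median_estimates)
ci_median = percentile_ci(median_estimates)

print(f"SE(mean): bootstrap {se_mean_boot:.2f}, formula {se_mean_formula:.2f}")
print(f"SE(median): bootstrap {se_median_boot:.2f}")
print(f"95% percentile CI for median: {ci_median[0]:.2f} to {ci_median[1]:.2f}")
```

For the mean, the bootstrap standard error should land close to the formula-based value, much as $776.91 is close to $784.311 in the example above. For the median, the bootstrap estimates are the main source of a standard error and interval.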

## Support for bootstrapping in the product

In procedures that support bootstrapping, it is available as a subdialog. See the topic *Procedures that support bootstrapping* for the list of these procedures.

When bootstrapping is requested in the dialogs, a new and separate `BOOTSTRAP` command is pasted
in addition to the usual syntax generated by the dialog. The `BOOTSTRAP` command creates the
bootstrap samples according to your specifications.
Internally, the product treats these bootstrap samples like splits, even though they are not
explicitly shown in the Data Editor. This means that, internally, there are effectively
*B* × *N* cases, so the case counter in the status bar will count from 1 to
*B* × *N* when processing the data during bootstrapping. The Output Management System (OMS)
is used to collect the results of running the analysis on each "bootstrap split". These results are
pooled, and the pooled bootstrap results are displayed in the Viewer with the rest of the usual output
generated by the procedure. In certain cases, you may see a reference to "bootstrap split 0"; this
is the original dataset.
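The idea of treating bootstrap samples as splits can be pictured with a small sketch. This is a conceptual illustration only and makes no claim about the product's internals. The function names, the layout of the stacked rows, and the sample values are all hypothetical. Each row carries a split index, split 0 holds the original data, and the *B* bootstrap splits together contribute *B* × *N* rows.

```python
import random
import statistics
from collections import defaultdict

def stack_bootstrap_splits(data, n_boot, seed=0):
    """Return (split, value) rows: split 0 is the original data, 1..B are resamples."""
    rng = random.Random(seed)
    n = len(data)
    rows = [(0, value) for value in data]
    for split in range(1, n_boot + 1):
        rows.extend((split, value) for value in rng.choices(data, k=n))
    return rows

def analyze_by_split(rows, estimator):
    """Run the same analysis on every split, as a split-file analysis would."""
    groups = defaultdict(list)
    for split, value in rows:
        groups[split].append(value)
    return {split: estimator(values) for split, values in groups.items()}

# Hypothetical sample values, for illustration only.
data = [21.0, 35.5, 27.3, 40.2, 31.8, 29.9, 55.0, 24.6, 33.1, 38.4]
n_boot = 200

rows = stack_bootstrap_splits(data, n_boot, seed=1)
bootstrap_rows = [row for row in rows if row[0] != 0]  # B * N rows

per_split = analyze_by_split(rows, statistics.mean)
original_estimate = per_split[0]  # result for "bootstrap split 0"
bootstrap_estimates = [est for split, est in per_split.items() if split != 0]

# Pooling step: summarize the per-split results into one bootstrap statistic.
pooled_se = statistics.stdev(bootstrap_estimates)
print(f"original mean {original_estimate:.2f}, bootstrap SE {pooled_se:.2f}")
```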