# Explore

This feature requires the Statistics Base option.

The Explore procedure produces summary statistics and graphical displays, either for all of your cases or separately for groups of cases. There are many reasons to use the Explore procedure: data screening, outlier identification, description, assumption checking, and characterizing differences among subpopulations (groups of cases). Data screening may show that you have unusual values, extreme values, gaps in the data, or other peculiarities. Exploring the data can help you determine whether the statistical techniques that you are considering are appropriate. For example, the exploration may indicate that you need to transform the data if the technique requires a normal distribution, or that you need nonparametric tests instead.

**Example.** Look at the distribution of maze-learning times
for rats under four different reinforcement schedules. For each of
the four groups, you can see if the distribution of times is approximately
normal and whether the four variances are equal. You can also identify
the cases with the five largest and five smallest times. The boxplots
and stem-and-leaf plots graphically summarize the distribution of
learning times for each of the groups.
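As a rough illustration of the "five largest and five smallest" listing described in the example, the following Python sketch groups cases by a factor and reports the extreme cases in each group. The data, the two schedule labels, and the `extreme_cases` helper are hypothetical and are not part of the Explore procedure; ties are not treated specially here.

```python
from collections import defaultdict

# Hypothetical records: (case_id, reinforcement_schedule, learning_time).
cases = [
    (1, "A", 41.0), (2, "A", 38.5), (3, "A", 55.2), (4, "A", 47.9),
    (5, "A", 36.1), (6, "A", 62.3), (7, "A", 44.0),
    (8, "B", 29.4), (9, "B", 33.8), (10, "B", 31.0), (11, "B", 48.6),
    (12, "B", 27.5), (13, "B", 30.2), (14, "B", 35.9),
]


def extreme_cases(records, n=5):
    """Return {group: (n smallest, n largest)} as lists of (case_id, value)."""
    by_group = defaultdict(list)
    for case_id, group, value in records:
        by_group[group].append((case_id, value))
    result = {}
    for group, members in by_group.items():
        ordered = sorted(members, key=lambda m: m[1])
        # Largest values are listed from the highest downward.
        result[group] = (ordered[:n], ordered[-n:][::-1])
    return result


extremes = extreme_cases(cases)
for group, (lowest, highest) in sorted(extremes.items()):
    print(group, "lowest:", lowest)
    print(group, "highest:", highest)
```

Keeping the case identifier next to each value mirrors the role of the case label variable: an extreme value is only useful for screening if you can trace it back to a specific case.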

**Statistics and plots.** Mean, median, 5% trimmed mean, standard
error, variance, standard deviation, minimum, maximum, range, interquartile
range, skewness and kurtosis and their standard errors, confidence
interval for the mean (and specified confidence level), percentiles,
Huber's M-estimator, Andrews' wave estimator, Hampel's redescending
M-estimator, Tukey's biweight estimator, the five largest and five
smallest values, the Kolmogorov-Smirnov statistic with a Lilliefors
significance level for testing normality, and the Shapiro-Wilk statistic.
Boxplots, stem-and-leaf plots, histograms, normality plots, and spread-versus-level
plots with Levene tests and transformations.
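To make a few of the listed statistics concrete, here is a minimal Python sketch that computes several of them for a single variable. It is an illustration under stated assumptions, not the procedure's own algorithm: the `describe` function and the sample values are hypothetical, the trimmed mean simply drops whole cases from each tail (software may weight partially trimmed cases instead), and the quartiles use the default interpolation of Python's `statistics.quantiles`, which may differ from the percentile method you select in Explore.

```python
import math
import statistics


def describe(values, trim=0.05):
    """Basic descriptives for a list of numbers (needs at least 2 values)."""
    data = sorted(values)
    n = len(data)
    variance = statistics.variance(data)  # sample variance (n - 1 denominator)
    std_dev = math.sqrt(variance)
    # Simplified trimming (assumption): drop floor(trim * n) values per tail.
    k = int(trim * n)
    trimmed = data[k:n - k] if k else data
    q1, _, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
    return {
        "n": n,
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "trimmed_mean": statistics.fmean(trimmed),
        "std_error": std_dev / math.sqrt(n),
        "variance": variance,
        "std_dev": std_dev,
        "minimum": data[0],
        "maximum": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


times = list(range(1, 21))  # hypothetical values 1..20
summary = describe(times)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Robust estimators, skewness, kurtosis, and the normality tests are left out to keep the sketch short.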

## Explore Data Considerations

**Data.** The Explore procedure can be used for quantitative
variables (interval- or ratio-level measurements). A factor variable
(used to break the data into groups of cases) should have a reasonable
number of distinct values (categories). These values may be short
string or numeric. The case label variable, used to label outliers
in boxplots, can be short string, long string (first 15 bytes), or
numeric.

**Assumptions.** The distribution of your data does not have
to be symmetric or normal.

## To Explore Your Data

- From the menus choose: **Analyze > Descriptive Statistics > Explore...**
- Select one or more dependent variables.

Optionally, you can:

- Select one or more factor variables, whose values will define groups of cases.
- Select an identification variable to label cases.
- Click Statistics for robust estimators, outliers, percentiles, and frequency tables.
- Click Plots for histograms, normal probability plots and tests, and spread-versus-level plots with Levene's statistics.
- Click Options for the treatment of missing values.
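The Levene statistic offered with the spread-versus-level plots tests whether group variances are equal. The Python sketch below computes the classic mean-based form of the statistic for illustration only; the `levene_w` function and the sample groups are hypothetical, and the version reported by the procedure may be based on transformed data or a different center. Obtaining a significance level would additionally require comparing W with an F distribution on k - 1 and N - k degrees of freedom, which is omitted here.

```python
import statistics


def levene_w(groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # deviations[i][j] = |y_ij - mean of group i|
    deviations = [[abs(y - statistics.fmean(g)) for y in g] for g in groups]
    group_means = [statistics.fmean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(
        len(z) * (m - grand_mean) ** 2 for z, m in zip(deviations, group_means)
    )
    within = sum(
        (v - m) ** 2 for z, m in zip(deviations, group_means) for v in z
    )
    return (n_total - k) / (k - 1) * between / within


# Two hypothetical groups; the second has twice the spread of the first.
w = levene_w([[1, 2, 3], [2, 4, 6]])
print(w)
```

A larger W indicates that the average absolute deviation differs more between groups than it varies within them, which is evidence against equal variances.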

This procedure pastes EXAMINE command syntax.