In the first article in this series, you developed PHP-based code that did the following:
- Generated a 1-, 2- or 3-factor experimental design with a randomized order of presentation
- Assigned a Web offer version (factor-level combination) to each new and returning Web site visitor
- Logged whether a Web site visitor responded to the particular Web offer version they were assigned (along with the number of exposures and the time it took to elicit their response)
My purpose in this article is to analyze the resulting data. (You will absorb the reasoning in this article more easily if you have read or are familiar with concepts discussed in the prerequisite articles suggested in Resources.)
Categorical data analysis, or CDA, is concerned with the simulation and analysis of data measured using a categorical scale of measurement. CDA is relevant to your goals because the WebOffer data table consists of two categorical explanatory variables (an image factor and a text factor) and the main categorical response variable (joined). To develop models for how these variables might be related, you will find CDA concepts and techniques useful.
Table 1 displays the subset of the WebOffer columns that I will focus on in this article.
Table 1. WebOffer columns to be analyzed
| image | text | joined |
|---|---|---|
| person | long | NULL |
| person | short | y |
| product | short | NULL |
| ... | ... | ... |
Observe that none of the columns contains numeric data, so the numeric operations you can perform on this data given your measurement scale are limited to counting the number of times that particular factor-level combinations were present when a response occurred (such as, computing joint frequency counts).
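The counting operation described above can be sketched in a few lines of PHP. This is only an illustration: the `$rows` array and the `jointFrequencies()` helper are hypothetical stand-ins for rows fetched from the WebOffer table, not part of the packages that accompany this article.

```php
<?php
// Hypothetical rows, shaped like the WebOffer columns in Table 1.
$rows = [
    ['image' => 'person',  'text' => 'long',  'joined' => null],
    ['image' => 'person',  'text' => 'short', 'joined' => 'y'],
    ['image' => 'product', 'text' => 'short', 'joined' => null],
    ['image' => 'person',  'text' => 'short', 'joined' => 'y'],
];

/**
 * Count responses (joined = 'y') for each image/text combination.
 * Combinations that were shown but never answered get a count of 0.
 */
function jointFrequencies(array $rows): array
{
    $counts = [];
    foreach ($rows as $row) {
        $image = $row['image'];
        $text  = $row['text'];
        if (!isset($counts[$image][$text])) {
            $counts[$image][$text] = 0;
        }
        if ($row['joined'] === 'y') {
            $counts[$image][$text]++;
        }
    }
    return $counts;
}

$counts = jointFrequencies($rows);
print_r($counts);
```

In practice you would probably let the database do this work with a grouped count; the loop simply makes the joint frequency idea explicit.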
If you decided to conduct your Web offer study using a sample size of 400 visitors, you might observe the following response totals (joined=y) for the two levels of your IMAGE and TEXT factors (or, your four factorially constructed ad banner versions):
Table 2. Tabular summary of Web offer results
| IMAGE | TEXT = short | TEXT = long | sum |
|---|---|---|---|
| person | 2 | 8 | 10 |
| product | 6 | 2 | 8 |
| sum | 8 | 10 | 18 |
Table 2 is called a 2x2 contingency table and you will learn how to simulate and analyze the count data appearing in it.
To simulate contingency table data, you will have to construct a sampling model that specifies:
- The probability distribution to use
- The parameter estimates to use
- Any constraints on the form of the data to be generated (like censoring, sampling without replacement, or structural zeros)
Discrete probability distributions
In CDA, three discrete probability distributions are commonly used in sampling models. Table 3 lists probability distributions along with other basic information about them.
Table 3. Commonly used discrete probability distributions
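| Distribution | Parameters | Probability mass function | Mean | Variance |
|---|---|---|---|---|
| Binomial | $n, p$ | $P(x \mid n, p) = \frac{n!}{(n-x)!\,x!}\, p^{x} (1-p)^{n-x}$ | $np$ | $np(1-p)$ |
| Poisson | $\lambda$ | $P(x \mid \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}$ | $\lambda$ | $\lambda$ |
| Multinomial | $n, p_1, \ldots, p_k$ | $P(x_1, \ldots, x_k \mid n, p_1, \ldots, p_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$ | $np_i$ | $np_i(1-p_i)$ |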
If 18 of the last 400 visitors to your Web site responded to your offer, then the maximum likelihood estimate (MLE) of the response probability p is 18 / 400 = 0.045, and your uncertainty about p can be represented by a Beta distribution. (This is illustrated further in "Implement Bayesian inference using PHP: Part 2.") This result also agrees with common-sense intuitions about the best estimate of p to use and the shape of your uncertainty about that estimate.
Given this limited amount of information about your sample, you can construct a null effects binomial sampling model for your contingency table data by instantiating four separate binomial distributions with the same parameter values, as in Listing 1:
Listing 1. Sampling model consisting of four independent binomial distributions with same parameter values
<?php
require_once "config.php";
require_once PHPMATH . "/PDL/BinomialDistribution.php";
$trials = 100;
$p = 0.045;
// One binomial per cell, all sharing the same response probability.
$bin1 = new BinomialDistribution($trials, $p);
$bin2 = new BinomialDistribution($trials, $p);
$bin3 = new BinomialDistribution($trials, $p);
$bin4 = new BinomialDistribution($trials, $p);
?>
If you then call a binomial random number generator 100 times (as in, $bin->RNG(100)) for each instantiated binomial, you can obtain a simulated cell count by summing the returned array of 0s and 1s, as in Listing 2:
Listing 2. Simulating cell counts under null effect binomial sampling model
<?php
$cell1 = array_sum($bin1->RNG($trials));
$cell2 = array_sum($bin2->RNG($trials));
$cell3 = array_sum($bin3->RNG($trials));
$cell4 = array_sum($bin4->RNG($trials));
?>
You can use the values returned from your null effects binomial sampling model to simulate your contingency table data under the assumption that your experimental factors have no differential effect.
Conversely, a binomial effects sampling model can be implemented by instantiating your four binomial distributions with different p values to represent the differential effectiveness of your factors in eliciting a response.
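As a rough, self-contained sketch of a binomial effects sampling model, the following code draws each cell count as a sum of Bernoulli trials using PHP's built-in `mt_rand()`. It does not use the `BinomialDistribution` class from the PDL package; the `simulateBinomialCount()` helper and the four response probabilities are assumptions chosen purely for illustration.

```php
<?php
/**
 * Simulate one cell count: the number of responses in $trials
 * independent Bernoulli trials with response probability $p.
 */
function simulateBinomialCount(int $trials, float $p): int
{
    $count = 0;
    for ($i = 0; $i < $trials; $i++) {
        // mt_rand() / mt_getrandmax() is a uniform draw on [0, 1].
        if (mt_rand() / mt_getrandmax() < $p) {
            $count++;
        }
    }
    return $count;
}

mt_srand(42); // fixed seed so that a run can be repeated

// Hypothetical response probabilities, one per offer version.
$probs = [
    'person-short'  => 0.05,
    'person-long'   => 0.02,
    'product-short' => 0.02,
    'product-long'  => 0.03,
];

$cells = [];
foreach ($probs as $version => $p) {
    $cells[$version] = simulateBinomialCount(100, $p);
}
print_r($cells);
```

Setting all four probabilities to the same value turns this back into the null effects model.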
The binomial distribution converges to the Poisson distribution for large values of N and small values of p. While the binomial distribution is still conceptually applicable to simulating your results, the Poisson distribution is more often used to represent rare-event distributions and behaves more reliably than my current binomial distribution implementation when supplied with extreme parameter settings (such as large N values and small p values).
The Poisson distribution accepts only one parameter, a success-rate parameter called lambda (λ). For a null effects Poisson sampling model, you can efficiently simulate the effects of four independent Poisson variables by instantiating the Poisson distribution once with λ=0.045 and calling the RNG method four times (simulating 100 experimental trials per RNG call).
Listing 3. Simulating cell counts under the null effects poisson sampling model
<?php
require_once "config.php";
require_once PHPMATH . "/PDL/PoissonDistribution.php";
$lambda = 0.045;
$trials = 100;
$pois = new PoissonDistribution($lambda);
$cell1 = array_sum($pois->RNG($trials));
$cell2 = array_sum($pois->RNG($trials));
$cell3 = array_sum($pois->RNG($trials));
$cell4 = array_sum($pois->RNG($trials));
?>
The contingency table data appearing in Table 2 was generated by this script.
You can implement a Poisson effects sampling model by instantiating your four Poisson distributions with different λ values to represent the differential effectiveness of your factors in eliciting a response.
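If you want to experiment without the PDL package, a Poisson deviate can be drawn with Knuth's multiplication method, which works well for the small λ values used here. The sketch below is an assumption-laden stand-in for `PoissonDistribution::RNG()`, not its actual implementation; `poissonDeviate()`, `simulateCell()`, and the four λ values are hypothetical.

```php
<?php
/**
 * Draw one Poisson(lambda) deviate using Knuth's multiplication method.
 * Suitable for small lambda; it becomes slow as lambda grows.
 */
function poissonDeviate(float $lambda): int
{
    $limit = exp(-$lambda);
    $k = 0;
    $product = mt_rand() / mt_getrandmax();
    // Keep multiplying uniform draws until the product drops to e^(-lambda).
    while ($product > $limit) {
        $k++;
        $product *= mt_rand() / mt_getrandmax();
    }
    return $k;
}

/**
 * Simulate one cell count as the sum of $trials Poisson deviates.
 */
function simulateCell(float $lambda, int $trials): int
{
    $count = 0;
    for ($i = 0; $i < $trials; $i++) {
        $count += poissonDeviate($lambda);
    }
    return $count;
}

mt_srand(7); // fixed seed so that a run can be repeated

// Hypothetical lambda values, one per offer version.
$lambdas = [0.05, 0.02, 0.02, 0.03];
$cells = [];
foreach ($lambdas as $lambda) {
    $cells[] = simulateCell($lambda, 100);
}
print_r($cells);
```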
You can use the Poisson distribution to simulate the data that you might observe if you conducted your Web offer study. To select the lambda parameter values, you may find it easiest to graph candidate outcomes as bar charts, adjusting the lambda and trial parameter values until the simulation reliably reproduces the bar chart that you deem the most likely outcome of your Web experiment.
Figure 1. A likely outcome

This bar chart (in Figure 1) reflects the theory that the IMAGE factor and TEXT factor will each exert an effect. No interaction between the factors is expected. You expect the two effects associated with your factor levels to combine in an additive manner.
You will often find it useful to engage in data simulation and graphing as part of the planning phase of a Web experiment. The exercise can serve to clarify data-analysis issues that you might not have anticipated. Issues related to the power of your experiment and the sample size required to detect differences can also be tackled if you have simulation and graphing tools that you can use to explore possible outcomes.
Eliciting your prior distribution
As a Bayesian sympathizer, I also recommend pre-experimental graphing and data simulation to effectively elicit your subjective prior distribution for the lambda (λ) parameters. You can use your prior distribution for the lambda parameters to derive the expected counts to use in the chi-square model-fitting procedure. I will examine this possibility in more depth after I discuss the chi-square formula and the concept of an expected count.
Applying the chi-square test procedure to contingency table data involves computing a goodness-of-fit score by summing, over every cell in the contingency table, the squared difference between the observed and expected cell counts divided by the expected count.
Following are the formulas that compute the chi-square score for one-, two- and three-dimensional contingency tables. The r, c, and l subscripts denote the number of levels associated with the row, column, and layer factors. The second formula is the one I will use in this article.
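Figure 2a. Chi-square model-fitting formula, one factor
$$\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}$$
Figure 2b. Chi-square model-fitting formula, two factors
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Figure 2c. Chi-square model-fitting formula, three factors
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{l} \frac{(O_{ijk} - E_{ijk})^2}{E_{ijk}}$$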
Use these three model-fitting formulas to analyze the contingency table data resulting from 1-, 2-, or 3-factor Web experiments. (Implementations of the chi-square test procedure for the analysis of 1-, 2-, and 3-dimensional contingency tables are contained in the Chi1D.php, Chi2D.php, and Chi3D.php classes bundled in the downloadable CHI Package accompanying this article. See Resources.)
Generalizations of the chi-square formula to more than three dimensions are possible (for example, add a summation and subscript to the formula for each additional factor), but not used in practice. See logistic and loglinear analysis if you want to analyze experiments having three or more categorical variables. Chi3D.php is still a work in progress.
In all the formulas, the difference score for each cell (that is, O - E) is squared and rescaled by dividing by the expected value E for that cell. These normalized difference scores are then summed to arrive at an overall goodness-of-fit score.
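The summation just described is short enough to write out directly. The following is a minimal sketch, not the code in the CHI Package: `chiSquare()` is a hypothetical helper, and the equal-split expected counts (18 / 4 = 4.5 per cell) are an assumption made only to have something to compare the Table 2 counts against.

```php
<?php
/**
 * Chi-square goodness-of-fit score: the sum of (O - E)^2 / E
 * over matching cells of two arrays with the same keys.
 */
function chiSquare(array $observed, array $expected): float
{
    $score = 0.0;
    foreach ($observed as $cell => $o) {
        $e = $expected[$cell];
        $score += (($o - $e) ** 2) / $e;
    }
    return $score;
}

// Table 2 counts in the order person-short, person-long,
// product-short, product-long.
$observed = [2, 8, 6, 2];
// Assumed expected counts: the 18 responses split evenly over 4 cells.
$expected = [4.5, 4.5, 4.5, 4.5];

$score = chiSquare($observed, $expected);
echo round($score, 2), "\n";
```

Because the cells are flattened into a single array, the same function serves one-, two-, and three-dimensional tables; only the way the expected counts are derived changes.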
The fundamental question you need to ask when using the chi-square test procedure is how to compute the expected values to use in this model-fitting procedure.
One way to compute the expected cell counts is to assume that no effects are associated with any of your experimental factors. Under this null effects model, you would expect to observe similar cell counts among all four cells in your 2x2 contingency table (also known as homogeneity of proportions).
Under this null effects model, you can compute the expected number of responses for each cell with the formula
E(n) = Np
in which N is the total number of Web offers administered and p is the probability of responding. In turn, you can estimate the value to use for p using this formula
p = r/N
in which r denotes the number of visitors who responded to the Web offer in N trials. In other words, the maximum likelihood estimate (MLE) of p is the response percentage to date.
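As a worked example, take the 18 responses out of 400 visitors from Table 2 and assume (as in Listing 1) that each of the four offer versions was shown to 100 visitors. The per-cell figure of 100 is an assumption carried over from the simulation settings, not something the data table itself guarantees.

```latex
\begin{aligned}
\hat{p} &= \frac{r}{N} = \frac{18}{400} = 0.045 \\
E(n_{\text{cell}}) &= N_{\text{cell}} \, \hat{p} = 100 \times 0.045 = 4.5
\end{aligned}
```

Under the null effects model, each of the four cells would therefore be expected to contain about 4.5 responses.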
One of the first statistical analyses that you might want to conduct on your Web offer results is to apply the two-dimensional chi-square procedure to your contingency table data, with the expected counts derived under the assumption that no effects are present. This is equivalent to assuming that you are sampling from a homogeneous population. When you use the chi-square test procedure to measure the discrepancy between your observed frequencies and expected frequencies, you will discard the null effects model if the summed difference scores are too large to have plausibly been generated by a sampling model consisting of four independent but identically parameterized Poisson random deviates (where the λ parameter is estimated using r/N).
Another model that you might want to test is one that assumes that cell probabilities are the simple product of the marginal probabilities.
$p_{ij} = p_{i+} \, p_{+j}$
with row marginals computed using this formula
$p_{i+} = n_{i+} / N$
and column marginals using this formula
$p_{+j} = n_{+j} / N$
Table 4 illustrates how to use these formulas to convert a table of frequency counts (see Table 2) to a table of response probability estimates.
Table 4. Converting observed frequencies to probability estimates
| IMAGE | TEXT = short | TEXT = long | sum |
|---|---|---|---|
| person | $p_{11}$ = (10/18) * (8/18) = 0.2469 | $p_{12}$ = (10/18) * (10/18) = 0.3086 | $p_{1+}$ = 10/18 |
| product | $p_{21}$ = (8/18) * (8/18) = 0.1975 | $p_{22}$ = (8/18) * (10/18) = 0.2469 | $p_{2+}$ = 8/18 |
| sum | $p_{+1}$ = 8/18 | $p_{+2}$ = 10/18 | N = 18 |
You can use these probability estimates to derive the expected cell counts, where $E_{ij}$ is equal to $N \, p_{i+} \, p_{+j}$.
Table 5. Converting probability estimates to expected counts
| IMAGE | TEXT = short | TEXT = long | sum |
|---|---|---|---|
| person | $E_{11}$ = 18 * 0.2469 = 4.4442 | $E_{12}$ = 18 * 0.3086 = 5.5548 | 10 |
| product | $E_{21}$ = 18 * 0.1975 = 3.555 | $E_{22}$ = 18 * 0.2469 = 4.4442 | 8 |
| sum | 8 | 10 | 18 |
The product rule $p_{ij} = p_{i+} \, p_{+j}$ expresses the idea of factor independence, the idea that Factor A exerts a constant factor-level effect regardless of the level of Factor B (and vice versa).
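The conversion from observed counts to expected counts under independence can be sketched as follows. The `expectedUnderIndependence()` helper is hypothetical and is not the routine used inside Chi2D.php. It relies on the identity that N times the two marginal probabilities equals the row total times the column total divided by N.

```php
<?php
/**
 * Expected counts under the independence model:
 * E_ij = N * p_i+ * p_+j = (row total * column total) / N.
 */
function expectedUnderIndependence(array $observed): array
{
    $rowTotals = [];
    $colTotals = [];
    $n = 0;
    foreach ($observed as $r => $row) {
        foreach ($row as $c => $count) {
            $rowTotals[$r] = ($rowTotals[$r] ?? 0) + $count;
            $colTotals[$c] = ($colTotals[$c] ?? 0) + $count;
            $n += $count;
        }
    }
    $expected = [];
    foreach ($rowTotals as $r => $rowTotal) {
        foreach ($colTotals as $c => $colTotal) {
            $expected[$r][$c] = $rowTotal * $colTotal / $n;
        }
    }
    return $expected;
}

// Observed response counts from Table 2.
$observed = [
    'person'  => ['short' => 2, 'long' => 8],
    'product' => ['short' => 6, 'long' => 2],
];
$expected = expectedUnderIndependence($observed);
print_r($expected);
```

The results differ from Table 5 in the fourth decimal place (for example, 4.4444 rather than 4.4442) because the table multiplies probabilities that were already rounded to four places.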
Test this "independence model" (and the expected cell counts derived from it) using the chi-square goodness-of-fit procedure. A large summed-differences score returned by the two-dimensional chi-square test procedure tells you that your factors are not independent. Your theoretical goal might then be viewed as trying to find the simplest model to explain your results.
The most complex model, called the saturated model, requires at least one parameter to represent each cell in the table. When modeling your data, your aim might be to reduce that number (use the same parameter estimate for more than one cell) while accurately accounting for the data patterns.
If your observed chi-square score is not significant (as in a null interaction), then examine each factor separately to determine whether there were any main effects and, if so, what their size is. You can use the one-dimensional chi-square procedure to assess main effects (such as factor-level differences for one factor) once you recompute your cell totals by collapsing over (or ignoring) the levels of the other factor. You can think of one-dimensional chi-square analysis as doing main effects analyses on the row or column marginals. The Chi1D.php and Chi2D.php classes also have a showResiduals() method that reports the residual error between your expected and observed counts. Examination of residuals is a critical part of the chi-square model-fitting procedure.
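Collapsing over one factor and testing the resulting marginals might look like the sketch below. Both helpers are hypothetical and independent of Chi1D.php, and the equal-split expectation assumes that every factor level was shown to the same number of visitors.

```php
<?php
/**
 * Collapse a two-dimensional table of counts onto its
 * row marginals and column marginals.
 */
function marginals(array $table): array
{
    $rows = [];
    $cols = [];
    foreach ($table as $r => $row) {
        $rows[$r] = array_sum($row);
        foreach ($row as $c => $count) {
            $cols[$c] = ($cols[$c] ?? 0) + $count;
        }
    }
    return ['rows' => $rows, 'cols' => $cols];
}

/**
 * One-dimensional chi-square score against equal expected counts
 * (assumes equal exposure for every level of the factor).
 */
function chiSquareEqualSplit(array $counts): float
{
    $expected = array_sum($counts) / count($counts);
    $score = 0.0;
    foreach ($counts as $o) {
        $score += (($o - $expected) ** 2) / $expected;
    }
    return $score;
}

// Observed response counts from Table 2.
$table = [
    'person'  => ['short' => 2, 'long' => 8],
    'product' => ['short' => 6, 'long' => 2],
];
$m = marginals($table);
$imageScore = chiSquareEqualSplit($m['rows']); // image main effect
$textScore  = chiSquareEqualSplit($m['cols']); // text main effect
printf("image: %.3f, text: %.3f\n", $imageScore, $textScore);
```

Each score has one degree of freedom here, so it would be compared against the same 3.84 critical value that applies at alpha=0.05.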
I use the independence model as the default model in Chi2D.php to compute the expected frequencies for use in the two-dimensional chi-square analysis. This is because the two-dimensional chi-square procedure is most commonly used in experimental contexts to test for possible interactions between your categorical variables where the null model is the factor independence model.
Another way to derive the expected counts to use in the chi-square model-fitting procedure is to base your estimates of $p_{ij}$ on subjective probability estimates elicited by simulating and graphing your results during pre-experimental planning. A Bayesian might ultimately formalize prior belief in experimental outcomes as a set of elicited lambda estimates from which expected frequencies can be derived (such as $\lambda_{ij} \times N = E_{ij}$, where N is the number of experimental trials) and fitted to the observed frequencies using the chi-square procedure.
The informational value of the experiment would be proportional to the size of the chi-square score computed when the experiment is over and the observed cell counts are compared to the expected counts derived from your prior distribution for the lambda parameters. The expected counts might be denoted $E_{ij}^{prior}$ to distinguish this way of computing the expected counts from the two different methods used in the null effects model and the independence model.
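To make this concrete, suppose purely for illustration that the lambda values used later in Listing 4 (0.05, 0.02, 0.02, 0.03, with 500 trials per cell) were your elicited prior estimates, and that the counts in Table 6 (24, 12, 6, 13) were what you then observed. Treating those simulation settings as a prior is an assumption of this example.

```latex
\begin{aligned}
E^{prior}_{11} &= 0.05 \times 500 = 25, & E^{prior}_{12} &= 0.02 \times 500 = 10, \\
E^{prior}_{21} &= 0.02 \times 500 = 10, & E^{prior}_{22} &= 0.03 \times 500 = 15, \\
\chi^2 &= \frac{(24-25)^2}{25} + \frac{(12-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(13-15)^2}{15} \approx 2.31
\end{aligned}
```

A small score like this one would indicate that the observed counts are close to what your prior led you to expect.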
I will tie together these ideas on simulating and analyzing contingency tables by building a doe_explorer.php script that:
- Simulates the contingency table data that you might observe if you were to run a Web offer experiment
- Performs a chi-square test on the simulated contingency table data
Use the doe_explorer.php script during the planning stages of a Web experiment to:
- Determine the number of subjects required to detect a hypothesized effect of a given size.
- Elicit rigorous subjective estimates of the Poisson lambda parameters. You can use these estimates to compute the expected cell counts for your test of the information value of the experiment.
- Determine whether possible findings warrant the expenditure of time and resources to run the Web experiment.
Listing 4 presents the doe_explorer.php source. The output of this script is displayed in the next section, "Explorer output."
Listing 4. Source code of doe_explorer.php
<?php
/**
* @package CHI
*
* Simulates and analyzes data from a hypothetical Web offer study.
*
* @author Paul Meagher
* @license PHP v3.0
* @version 0.3
*/
require_once "config.php";
require_once PHPMATH ."/PDL/PoissonDistribution.php";
require_once PHPMATH ."/CHI/Chi2D.php";
$chi = new Chi2D;
// Step 1. Load factor names and factor levels.
$factors["image"] = array("person", "product");
$factors["text"] = array("short", "long");
$chi->setFactors($factors);
// Step 2. Simulate the outcome of a Web offer study.
$lambda1 = 0.05; $lambda2 = 0.02;
$lambda3 = 0.02; $lambda4 = 0.03;
$trials = 500;
$pois1 = new PoissonDistribution($lambda1);
$pois2 = new PoissonDistribution($lambda2);
$pois3 = new PoissonDistribution($lambda3);
$pois4 = new PoissonDistribution($lambda4);
$cell1 = array_sum($pois1->RNG($trials));
$cell2 = array_sum($pois2->RNG($trials));
$cell3 = array_sum($pois3->RNG($trials));
$cell4 = array_sum($pois4->RNG($trials));
$obs_freqs["person"]["short"] = $cell1;
$obs_freqs["person"]["long"] = $cell2;
$obs_freqs["product"]["short"] = $cell3;
$obs_freqs["product"]["long"] = $cell4;
// Step 3. Load simulated observed frequencies.
$chi->setObservedFrequencies($obs_freqs);
// Step 4. Analyze the simulated data.
$chi->analyze();
// Step 5. Show cross-tabulated contingency table.
$chi->showContingencyTable();
echo "<br />";
// Step 6. Show residuals.
$chi->showResiduals();
// Step 7. Show bar graph of results
$params["figureTitle"] = "Web Offer Analysis";
$params["plotWidth"] = 300;
$params["plotHeight"] = 200;
$params["yTitle"] = "Responses";
$params["yMin"] = 0;
$params["yMax"] = 50;
$params["yTicks"] = 5;
$params["yHideMajor"] = false;
$params["yHideMinor"] = true;
$params["xTitle"] = "Offer Variants";
$params["xLabels"] = array("PER-SH","PER-LO","PRD-SH","PRD-LO");
$params["xHideMajor"] = true;
$params["xHideMinor"] = true;
$params["yData"] = array($cell1, $cell2, $cell3, $cell4);
$chi->showBarGraph($params);
echo "<br />";
// Step 8. Show line graph of results
$params["xTitle"] = "Image Factor";
$params["xLabels"] = array("PERSON","PRODUCT");
$params["yData1"] = array($cell1, $cell3);
$params["yData2"] = array($cell2, $cell4);
$chi->showLineGraph($params);
?>
Note that I selected the lambda parameters ($lambda1=0.05, $lambda2=0.02, $lambda3=0.02, $lambda4=0.03) and the trial parameters ($trials=500) in Listing 4 to illustrate how to simulate an interaction. I used empirically reasonable estimates of the lambda parameter sizes (0.05 success rate or less for responding to an ad banner).
The trials parameter value of 500 represents a reasonable estimate of the number of trials (sample size per condition) you would need to run in order to reliably observe the hypothesized interaction given your effect sizes (specified through different lambda settings). Without the doe_explorer.php tool, it would be more difficult to obtain a rigorous and intuitive sense of how sample size and effect sizes might interact in your experiment and what parameter estimates are reasonable to use.
Tables 6 and 7 and Figures 3 and 4 are screens that were generated by pointing my browser at the doe_explorer.php script. The simulated data is meant to represent realistic response data for 2,000 site visitors (or 500 simulated visitors per factor-level combination). If you refresh your browser on this script, you will observe different outcomes; it was by using this primitive method that I estimated how many subjects might be required to detect the relatively small effects (expressed as differences in the lambda parameter) one might expect to observe in a proposed Web offer study.
You could further develop this Monte Carlo procedure to obtain more rigorous estimates of the sample size required to reliably detect differences (in 95 percent of runs) given different estimates of the effect size (lambda value differences).
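One way such a Monte Carlo procedure might be sketched is shown below. It repeats the simulation many times and reports the fraction of runs whose chi-square score under the independence model exceeds the 3.84 critical value. All function names are hypothetical, the Poisson generator is Knuth's method rather than the PDL implementation, and the number of runs is an arbitrary choice.

```php
<?php
/** Draw one Poisson(lambda) deviate (Knuth's method, small lambda). */
function poissonDeviate(float $lambda): int
{
    $limit = exp(-$lambda);
    $k = 0;
    $product = mt_rand() / mt_getrandmax();
    while ($product > $limit) {
        $k++;
        $product *= mt_rand() / mt_getrandmax();
    }
    return $k;
}

/** Simulate one cell count as the sum of $trials Poisson deviates. */
function simulateCell(float $lambda, int $trials): int
{
    $count = 0;
    for ($i = 0; $i < $trials; $i++) {
        $count += poissonDeviate($lambda);
    }
    return $count;
}

/** Chi-square score for a 2x2 table under the independence model. */
function chiSquareIndependence(array $o): float
{
    $rows = [$o[0][0] + $o[0][1], $o[1][0] + $o[1][1]];
    $cols = [$o[0][0] + $o[1][0], $o[0][1] + $o[1][1]];
    $n = $rows[0] + $rows[1];
    $score = 0.0;
    for ($i = 0; $i < 2; $i++) {
        for ($j = 0; $j < 2; $j++) {
            $e = $n > 0 ? $rows[$i] * $cols[$j] / $n : 0.0;
            if ($e == 0.0) {
                return 0.0; // degenerate table: treat as no evidence
            }
            $score += (($o[$i][$j] - $e) ** 2) / $e;
        }
    }
    return $score;
}

/** Fraction of simulated experiments that exceed the critical value. */
function estimatePower(array $lambdas, int $trials, int $runs, float $critical = 3.84): float
{
    $hits = 0;
    for ($run = 0; $run < $runs; $run++) {
        $table = [
            [simulateCell($lambdas[0], $trials), simulateCell($lambdas[1], $trials)],
            [simulateCell($lambdas[2], $trials), simulateCell($lambdas[3], $trials)],
        ];
        if (chiSquareIndependence($table) > $critical) {
            $hits++;
        }
    }
    return $hits / $runs;
}

mt_srand(2004); // fixed seed so that a run can be repeated
$power = estimatePower([0.05, 0.02, 0.02, 0.03], 500, 200);
printf("Estimated power with 500 trials per cell: %.2f\n", $power);
```

Rerunning `estimatePower()` with different `$trials` values gives a rough picture of how detection rates change with sample size; more runs would tighten the estimate at the cost of run time.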
In the doe_explorer.php script, the $chi->showContingencyTable() method is used to display simulated contingency table data:
Table 6. Output of the showContingencyTable() method
| image | text = Short | text = Long | sum |
|---|---|---|---|
| Person | 24 | 12 | 36 |
| Product | 6 | 13 | 19 |
| sum | 30 | 25 | 55 |
The $chi->showResiduals() method displays the difference between observed and expected counts (see Table 7). Note that the overall chi-square score is 6.18 and that the probability of a score at least this large under the independence model is 0.0130, which tells you that your test detected the interaction effect you specified. If you lower the $trials parameter value from 500 to 100, your ability to detect a specified interaction is diminished (which is one reason I chose a larger value of 500).
Table 7. Output of the showResiduals() method
| Cell | $O_{ij}$ | $E_{ij}$ | $O_{ij} - E_{ij}$ | $(O_{ij} - E_{ij})^2$ | $(O_{ij} - E_{ij})^2 / E_{ij}$ |
|---|---|---|---|---|---|
| person-short | 24 | 19.64 | 4.36 | 19.04 | 0.97 |
| person-long | 12 | 16.36 | -4.36 | 19.04 | 1.16 |
| product-short | 6 | 10.36 | -4.36 | 19.04 | 1.84 |
| product-long | 13 | 8.64 | 4.36 | 19.04 | 2.20 |
| Sums | 55 | 55 | 0.00 | 76.17 | 6.18 |

P(6.18, 1) = 0.0130
Critical $\chi^2$ for alpha=0.05 is 3.84
The $chi->showBarGraph() method displays your simulated contingency table data in bar graph format:
Figure 3. Output of showBarGraph() method

The $chi->showLineGraph() method displays your simulated data in line graph format:
Figure 4. Output of showLineGraph() method

Note that the line graph rendering makes it particularly easy to see that a cross-over interaction is present. You should also consider visualizing your results using grouped bar charts, a graph type that JpGraph also supports.
Computer simulation and mathematical analysis are two major approaches categorical data analysts use to understand contingency table data. In this article, you learned how to both simulate and analyze contingency table data arising from a two-factor Web experiment.
You can achieve data simulation by sampling from a theoretical probability distribution with an appropriate set of parameter estimates. You learned that data simulation can be used to:
- Assist in the planning stages of a Web experiment
- Determine the number of subjects needed to detect an effect of a given size (also known as power analysis)
- Rigorously elicit subjective estimates of the lambda parameters so you can derive the expected counts to use in a test of the information value of an experiment
- Help determine whether a proposed Web experiment is worth running
In this series, I have discussed the application of DOE and CDA techniques to the task of improving the quality of Web site offers. In particular, I looked at improving the quality of the ad banner component by manipulating image and text channel factors and examining the response counts for the different factor-level combinations.
The simplest ad banner experiment involves manipulating one factor, namely, the ad banner version (the version factor) and conducting a one-dimensional chi-square analysis on the resulting response counts. Such one-variable-at-a-time experiments can be inefficient and ineffective ways to accumulate knowledge about your Web site. Factorial Web experiments are more efficient and effective knowledge acquisition tools when planned, administered, and analyzed appropriately.
| Name | Size | Download method |
|---|---|---|
| wa-phpexp2.tar.gz | 45.4 KB | HTTP |
- Download the source code for this article.
- To get updates to the SR, CHI and PDL Packages (webexp02.tar.gz) as they are available, go to www.phpmath.com.
- Review necessary concepts for this article in these prerequisite articles:
- "Conduct Web Experiments using PHP, Part 1" sets the framework for the Web offer experiments -- a must-read to get a full appreciation of this article (developerWorks, October 2004).
- "Apply probability models to Web data using PHP" introduces the basic concepts, techniques, and PHP-based tools that define the area of probability modeling and probability distributions (developerWorks, October 2003).
- "Implement Bayesian inference using PHP: Part 2" examines how to use Bayesian inference methods to solve parameter estimation problems, parameter estimation being the process of using sample data to estimate the value of a population or a model parameter (developerWorks, April 2004).
- "Take Web data analysis to the next level with PHP" is a comprehensive overview of how to design Web data analysis that goes beyond simple raw counts (developerWorks, August 2003).
- For a list of references on CDA, take a look at Alan Agresti's site. It includes a link to his book, Categorical Data Analysis, 2nd Ed. (John Wiley and Sons, July 2002).
- For a concise, clear introduction to CDA, read Brian Everitt's The Analysis of Contingency Tables, 2nd Ed. (Chapman and Hall, February 1992).
- Check out Steven Fienberg's site for a list of resources on CDA, including the classic text he co-wrote entitled Discrete Multivariate Analysis: Theory and Practice (MIT Press, June 1977).
- Take an intriguing look into CDA techniques research with the list of publications on Leo Goodman's site.
- In the paper "CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies," discover how IBM researchers are exploiting the chi-square procedure in the sampling tool CORDS (CORelation Detection via Sampling) (PDF).
- Read up on probability and simulation concepts in "Desert Island Math" (developerWorks, January 2004).
- Learn more about the single-factor anova technique in "ANOVA Statistical Programming with PHP." You can use this quantitative analogue of the single-factor chi-square technique to analyse other aspects of your Web experimental data, such as response time.
- Get a copy of the JpGraph library (which the author used to render the graphs in this article).
- Want to learn more about the chi-square procedure? Try Chapter 8: Chi-Square Procedures for the Analysis of Categorical Frequency Data of Richard Lowry's e-textbook Concepts and Applications of Inferential Statistics.
- Check out Michael Friendly's page, which displays several graphical methods for categorical data analysis.
- Web quality control experiments may generally require a larger N than usability studies. For more on this, read Jakob Nielsen's essay "Card Sorting: How Many Users to Test" on sample size selection in usability research.





