Introduction
User modeling is often regarded as an advanced feature of software systems. Universities, research institutes, and large companies lead the development of software systems that incorporate user-modeling engines, but that should not discourage small Web-development shops that believe user-modeling technologies are beyond their reach.
In this tutorial, I will show you how to construct a user-modeling platform without esoteric technologies. PHP and MySQL are up to the task; furthermore, they may be the best technologies to use for a kind of user modeling called Web site user modeling.
Web site user modeling is a mathematical discipline, and the math involved makes many people think that incorporating this technique into their Web sites would be a major stumbling block. Therefore, a second objective of this tutorial is to show you that the relevant mathematics is not necessarily as difficult as you may expect. In this tutorial, I will discuss two mathematical concepts:
- Transition matrices
- Markov processes
These mathematical concepts provide a simple and powerful perspective on how to model Web site users.
This tutorial provides a brief review of the math used in an area of probability theory called Markov process theory (or Markov chain theory).
Basic familiarity with matrices and probability is useful, but I will provide introductory coverage of the relevant concepts and mathematics. This tutorial can be a good starting point for familiarizing yourself with Markov process theory: because it provides an interesting example application, namely Web site user modeling, it should make the mathematical concepts seem real and relevant.
Mainly, you need a willingness to learn some mathematical concepts. These concepts are presented using examples and some math notation, but primarily I focus on these concepts as reflected in their implementation as PHP classes. For example, you will map the mathematical concept of a transition matrix onto instance variables and methods supplied by a TransitionMatrix.php class. This approach invites you to learn more by downloading the code, playing with it, then extending it further as you learn more about these mathematical concepts.
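To give a flavor of what mapping a transition matrix onto instance variables and methods can look like, here is a minimal sketch. It is not the TransitionMatrix.php class that accompanies this tutorial; the class name TransitionCounts, its methods, and the sample page names are all hypothetical, chosen only for illustration. The instance variable holds a table of counts, and the methods record a click and read back a frequency or a probability.

```php
<?php
// Hypothetical sketch for illustration only; this is NOT the tutorial's
// TransitionMatrix.php class, and all names here are assumptions.
class TransitionCounts
{
    // $counts[$from][$to] = number of observed clicks from $from to $to.
    private $counts = [];

    // Record one observed transition between two pages.
    public function record(string $from, string $to): void
    {
        if (!isset($this->counts[$from][$to])) {
            $this->counts[$from][$to] = 0;
        }
        $this->counts[$from][$to]++;
    }

    // How many times was $to visited directly after $from?
    public function frequency(string $from, string $to): int
    {
        return $this->counts[$from][$to] ?? 0;
    }

    // Frequency divided by the row total: an estimate of the
    // probability of moving to $to given that the user is on $from.
    public function probability(string $from, string $to): float
    {
        $rowTotal = array_sum($this->counts[$from] ?? []);
        return $rowTotal === 0 ? 0.0 : $this->frequency($from, $to) / $rowTotal;
    }
}

// A made-up clickstream: each adjacent pair of pages is one transition.
$clicks = ['home', 'products', 'home', 'contact', 'home', 'products'];

$matrix = new TransitionCounts();
for ($i = 0; $i + 1 < count($clicks); $i++) {
    $matrix->record($clicks[$i], $clicks[$i + 1]);
}
```

In this made-up clickstream the user leaves the home page three times, twice for products and once for contact, so `$matrix->probability('home', 'products')` evaluates to 2/3. A nested array is used here only because it is the simplest in-memory representation.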
In my opinion, clickstream data is being thrown away by most Web developers, jettisoned because they are not looking at clickstream data from the point of view of its user-modeling potential. Instead, after the fact, they run report generators to figure out the number of visitors per hour, day, week, and so on.
A count of visitors is useful information, but it does not exhaust the potential of clickstream data to inform you about how sites are individually used and how they might be dynamically reconfigured based upon up-to-date user-modeling data for each individual site visitor.
The idea of Web site user modeling suggests that developers might instead use clickstream data to update dynamic models of individual users as they click through sites. Is this possible? If so, how can it be done? These are the questions that this tutorial will explore and propose answers to.
I will explore Markov process theory as a way to solve a specific user-modeling problem, namely that of predicting which Web page a user will visit next. Markov process theory has many other applications that are only lightly touched upon in this tutorial. Should you wish to pursue Markov process theory beyond this tutorial, you will find a research area that is undergoing a renaissance, with applications in many disciplines, from psychology to physics to compression algorithms to search engines to some of the hottest areas in statistics and mathematics. Hopefully, this tutorial will also provide you with a sense of what the Markov process buzz is about.
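As a rough preview of the prediction problem, the sketch below assumes you already have a table of transition probabilities in which each row sums to 1. The numbers, the page names, and the functions predictNextPage and twoStepProbability are hypothetical and are not taken from the tutorial's Markov.php class. The Markov assumption is that the next page depends only on the current page, so a one-step prediction only has to look at a single row of the table.

```php
<?php
// Hypothetical transition probabilities (invented for illustration):
// $p[$current][$next] is the probability of moving from $current to $next.
// Each row sums to 1.
$p = [
    'home'     => ['home' => 0.0, 'products' => 0.7,  'contact' => 0.3],
    'products' => ['home' => 0.5, 'products' => 0.25, 'contact' => 0.25],
    'contact'  => ['home' => 1.0, 'products' => 0.0,  'contact' => 0.0],
];

// Return the most probable next page given only the current page,
// or null if the current page has no row in the table.
function predictNextPage(array $p, string $current): ?string
{
    if (empty($p[$current])) {
        return null;
    }
    $best = null;
    $bestProb = -1.0;
    foreach ($p[$current] as $next => $prob) {
        if ($prob > $bestProb) { // on ties, the first page listed wins
            $best = $next;
            $bestProb = $prob;
        }
    }
    return $best;
}

// Probability of being on $to two clicks after $from: sum over every
// intermediate page of P(from -> mid) * P(mid -> to).
function twoStepProbability(array $p, string $from, string $to): float
{
    $total = 0.0;
    foreach ($p[$from] ?? [] as $mid => $prob) {
        $total += $prob * ($p[$mid][$to] ?? 0.0);
    }
    return $total;
}

$next = predictNextPage($p, 'home');
$backHome = twoStepProbability($p, 'home', 'home');
```

With these invented numbers, `$next` is `'products'`, and `$backHome` is 0.7 × 0.5 + 0.3 × 1.0 = 0.65, the chance that a user who starts on the home page is back there two clicks later.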
This tutorial has five sections:
- The first section, "What is a Web site user model?" discusses in detail the components of Web site user modeling and highlights recent research in this area. This research suggests a simple three-component Web site user model as a way to organize thinking and research about Web site user modeling.
- The second section, "Fast and lean transition matrices," discusses the concept of a transition matrix and how to implement it using PHP and MySQL.
- The third section, "Encoding visitor clickstreams," discusses how to use the TransitionMatrix.php class along with cookie and session variables to gather transition frequency data for each Web site user.
- The fourth section, "Modeling users as Markov processes," discusses Markov process theory, which is concerned with calculating the probability of future states based on transition matrix data. In this section, I explain what a Markov process is and how you might implement some common Markov probability calculations in PHP. A Markov.php class implementing these calculations is discussed.
- In the final section, "Wrap up," I discuss clickstream data issues, other applications of Markov process theory, and ethical issues involved in gathering Web site user-modeling data.

