Web site user modeling with PHP

Paul Meagher (paul@datavore.com), CEO, Datavore Productions
Paul Meagher is a freelance Web developer, writer, and data analyst. Paul has a graduate degree in Cognitive Science and has spent the last six years developing Web applications. His current projects and interests center around e-learning, content management, and math-enabled Web applications. Paul resides in Truro, Nova Scotia.

Summary:  Web site user modeling, a mathematical discipline, is easier than you might expect. In this tutorial, Paul Meagher shows you how to construct a user-modeling platform with PHP and MySQL -- technologies well suited for a species of user-modeling called Web site user modeling. Even small Web-development shops can use clickstream data to build Web site user models.

Date:  30 Dec 2003
Level:  Introductory


Introduction

Power to the people

User modeling is an advanced feature of software systems. Universities, research institutes, and large companies lead the development of software systems that incorporate user-modeling engines, but small Web-development shops should not assume that user-modeling technologies are beyond their reach.

In this tutorial, I will show you how to construct a user-modeling platform without esoteric technologies. PHP and MySQL are up to the task; furthermore, they may be the best technologies to use for a species of user modeling called Web site user modeling.

Web site user modeling is a mathematical discipline, and the math involved leads many people to think it would be a major stumbling block to incorporating the technique into their Web sites. Therefore, a second objective of this tutorial is to show you that the relevant mathematics is not as difficult as you may expect. In this tutorial, I will discuss two mathematical concepts:

  1. Transition matrices
  2. Markov processes

These mathematical concepts provide a simple and powerful perspective on how to model Web site users.
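As a rough preview of how the two concepts fit together, consider a hypothetical three-page site (home, products, contact). The counts below are invented for illustration and are not taken from the tutorial's data. Each entry f_ij of the frequency matrix F records how often a visitor moved from page i to page j; dividing each row by its total gives the transition matrix P, and the first-order Markov assumption says that the next page depends only on the current one.

```latex
% Illustrative counts only. Rows are the current page, columns are the next page,
% both in the order (home, products, contact).
\[
F = \begin{pmatrix} 0 & 2 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},
\qquad
p_{ij} = \frac{f_{ij}}{\sum_{k} f_{ik}}
\quad\Longrightarrow\quad
P = \begin{pmatrix} 0 & 2/3 & 1/3 \\ 1/2 & 0 & 1/2 \\ 1 & 0 & 0 \end{pmatrix}
\]

% First-order Markov assumption: only the current page matters for the next step.
\[
\Pr(X_{t+1} = j \mid X_t = i, X_{t-1}, \ldots, X_0)
  = \Pr(X_{t+1} = j \mid X_t = i) = p_{ij}
\]
```

Under these made-up counts, a visitor currently on the home page would be predicted to move to the products page next, with estimated probability 2/3.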


What you need

This tutorial provides a brief review of the math used in an area of probability theory called Markov process theory (or Markov chain theory).

Basic familiarity with matrices and probability is useful, but I will provide introductory coverage of the relevant concepts and mathematics. This tutorial can be a good starting point for familiarizing yourself with Markov process theory. Because it provides an interesting example application, namely Web site user modeling, it should make the mathematical concepts seem real and relevant.

Mainly, you need a willingness to learn some mathematical concepts. These concepts are presented using examples and some math notation, but primarily I focus on these concepts as reflected in their implementation as PHP classes. For example, you will map the mathematical concept of a transition matrix onto instance variables and methods supplied by a TransitionMatrix.php class. This approach invites you to learn more by downloading the code, playing with it, then extending it further as you learn more about these mathematical concepts.


Why should I care?

In my opinion, most Web developers throw clickstream data away because they do not look at it from the point of view of its user-modeling potential. Instead, after the fact, they run report generators to figure out the number of visitors per hour, day, week, and so on.

A count of visitors is useful information, but it does not exhaust the potential of clickstream data to inform you about how sites are individually used and how they might be dynamically reconfigured based upon up-to-date user-modeling data for each individual site visitor.

The idea of Web site user modeling suggests that developers might instead use clickstream data to update dynamic models of individual users as they click through sites. Is this possible? If so, how can it be done? This tutorial explores these questions and proposes answers to them.

I will explore Markov process theory as a way to solve a specific user-modeling problem, namely that of predicting what Web page a user will visit next. Markov process theory has many other applications that are only lightly touched upon in this tutorial. Should you wish to pursue it beyond this tutorial, you will find a research area that is undergoing a renaissance in its application to many academic disciplines, from psychology to physics to compression algorithms to search engines to some of the hottest areas in statistics and mathematics. Hopefully, this tutorial will also give you a sense of what the Markov process buzz is about.
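To make the next-page prediction problem concrete, here is a minimal sketch in PHP. It is not the tutorial's TransitionMatrix.php or Markov.php class: the function names countTransitions() and predictNextPage(), the page names, and the sample clickstream are all hypothetical, and the code assumes PHP 7.3 or later rather than the PHP version current when this tutorial was written.

```php
<?php
// Hypothetical helpers for illustration only; not the tutorial's classes.

/**
 * Count page-to-page transitions in one visitor's clickstream.
 * Returns a nested array where $counts[$from][$to] is the number of observed moves.
 */
function countTransitions(array $clickstream): array
{
    $counts = [];
    for ($i = 0; $i < count($clickstream) - 1; $i++) {
        $from = $clickstream[$i];
        $to   = $clickstream[$i + 1];
        $counts[$from][$to] = ($counts[$from][$to] ?? 0) + 1;
    }
    return $counts;
}

/**
 * Return the most frequently observed next page for $current, together with
 * its estimated probability, or null if no move away from $current was seen.
 */
function predictNextPage(array $counts, string $current): ?array
{
    if (empty($counts[$current])) {
        return null;
    }
    $row   = $counts[$current];
    $total = array_sum($row);
    arsort($row);                    // highest count first, keys preserved
    $page = array_key_first($row);   // requires PHP 7.3+
    return ['page' => $page, 'probability' => $row[$page] / $total];
}

// Made-up clickstream for a single visitor.
$clicks = ['home', 'products', 'home', 'contact', 'home', 'products', 'contact'];
$counts = countTransitions($clicks);
$guess  = predictNextPage($counts, 'home');
printf("%s (p = %.2f)\n", $guess['page'], $guess['probability']);
```

With this sample clickstream the script prints "products (p = 0.67)", because two of the three observed moves away from the home page went to the products page. A real platform would also need to persist the counts between requests, for example in MySQL or in session variables.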


Tutorial outline

This tutorial has five sections:

  • The first section, "What is a Web site user model?", discusses in detail the components of Web site user modeling and highlights recent research in this area. This research suggests a simple three-component Web site user model as a way to organize thinking and research about Web site user modeling.
  • The second section, "Fast and lean transition matrices," discusses the concept of a transition matrix and how to implement it using PHP and MySQL.
  • The third section, "Encoding visitor clickstreams," discusses how to use the TransitionMatrix.php class along with cookie and session variables to gather transition frequency data for each Web site user.
  • The fourth section, "Modeling users as Markov processes," discusses Markov process theory, which is concerned with calculating the probability of future states based on transition matrix data. In this section, I explain what a Markov process is and how you might implement some common Markov probability calculations in PHP. A Markov.php class implementing these calculations is discussed.
  • In the final section, "Wrap up," I discuss clickstream data issues, other applications of Markov process theory, and ethical issues involved in gathering Web site user-modeling data.
