Clickstream Example Streams Flow

Clickstream - What is it all about?

Clickstream is the recording of areas of the screen that a user clicks while web browsing. As the user clicks anywhere in the web page, the action is logged. The log contains information such as time, URL, the user’s machine, type of browser, type of event (for example, browsing, checking out, logging in, logging out with purchase, removing from cart, logging out without purchase), product information (for example, ID, category, and price), total purchase in basket, number of items in basket, and session duration. This information can give valuable clues about what visitors are doing on your web site, and about the visitors themselves.
Clickstream analysis is useful for web activity analysis and market research. The navigation path can indicate purchase interests and price range. You can identify browsing patterns to determine the probability that the user will place an order.

Business use cases for Clickstream events

Let’s say that your online retail store wants to find out what shoppers are doing on your web site. What pages are they visiting? Do they buy online after visiting those pages, or do they leave without purchasing anything? Do the same shoppers return for more purchases, or do they come once and never return? How many times does a visitor browse a page before making a purchase?

A data scientist can combine this clickstream data with your retail store’s ERP data to identify each shopper’s preferences and price range. The data scientist can also combine the clickstream data with social media data about the shopper to make targeted offers.

Example Clickstream data

The sample data that is used in the Clickstream streams flow contains formatted data from user actions in a web page. The data includes: customer ID, time stamp, type of click event, name of the product, category of the product, price, total price of all products in the basket, total number of all products in the basket, number of distinct items in the basket, and how long the user was on the site.
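To make the shape of one record concrete, here is a minimal Python sketch of a single event. Only click_event_type is an attribute name that this page confirms; every other field name, and every value, is a hypothetical placeholder for the attributes listed above and is not taken from the actual sample data schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class ClickEvent:
    # Only click_event_type is named in this document. The other field
    # names are hypothetical stand-ins for the attributes listed above.
    customer_id: int
    time_stamp: str
    click_event_type: str
    product_name: str
    product_category: str
    product_price: float
    total_price_of_basket: float
    total_number_of_items_in_basket: int
    total_number_of_distinct_items_in_basket: int
    session_duration: int  # assumed to be in seconds


# Made-up values, for illustration only.
event = ClickEvent(
    customer_id=10001,
    time_stamp="2018-01-15T10:32:07Z",
    click_event_type="add_to_cart",
    product_name="Running shoes",
    product_category="Footwear",
    product_price=59.99,
    total_price_of_basket=84.98,
    total_number_of_items_in_basket=2,
    total_number_of_distinct_items_in_basket=2,
    session_duration=340,
)

print(asdict(event))
```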

Goal

Our goal is to practice creating and running a flow that captures the events in which an online user adds something to the shopping cart. The data will be used for offline analysis.

Description of operators

The following screen capture shows how the clickstream example streams flow looks in the canvas:

Let’s look more closely at these three operators.

Sample Data operator

Sample Data is the source of clickstream data for the streams flow. We supply this data. The following screen capture shows the properties of the sample data and some of its schema attributes:

Properties and schema of Clickstream sample data

The streams flow ingests the sample data. The schema attributes include customer ID, time stamp, type of click event, total price of items in the user’s shopping cart, and so on. In this example flow, we’re interested in the click_event_type attribute.

Filter operator

Next, we want to pull out only the data when a user puts something in the shopping cart. We use the Filter operator to select data where the click event type is add_to_cart. All other tuples are ignored.
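The selection that the Filter operator performs can be sketched in plain Python as follows. This is an illustration of the logic only, not how the operator is configured in the canvas. The function name filter_add_to_cart, the customer_id field, and the event types other than add_to_cart are assumptions made for this example.

```python
from typing import Iterable, Iterator


def filter_add_to_cart(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only the tuples whose click event type is add_to_cart."""
    for event in events:
        if event.get("click_event_type") == "add_to_cart":
            yield event
        # Every other tuple is dropped, as the Filter operator does.


# Hypothetical input tuples; only click_event_type matters here.
sample = [
    {"customer_id": 1, "click_event_type": "browsing"},
    {"customer_id": 2, "click_event_type": "add_to_cart"},
    {"customer_id": 1, "click_event_type": "checkout"},
    {"customer_id": 3, "click_event_type": "add_to_cart"},
]

kept = list(filter_add_to_cart(sample))
print(kept)
```

Writing the filter as a generator mirrors the streaming setting: each tuple is examined once as it arrives, and nothing is buffered.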


Running the Clickstream example flow

When you click the Run icon in the canvas, the Clickstream example flow is automatically created and deployed for you. The Metrics page opens and shows the example flow as it begins to deploy. The sample data flows from the Sample Data operator, through the Filter operator, and ends at the Debug operator.

Clickstream Metrics

The Metrics page has the following graphs:

By default, the Sample Data operator in the Flow graph is selected. Click the Filter or Debug operator to see its throughput.

Note that in the Metrics page, the throughput out of the Filter operator is much lower than the throughput into it, because the filter passes only one type of clickstream action and drops all the others.
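The drop in throughput can be illustrated with a small Python sketch that counts tuples entering and leaving a filter. The mix of event types below is invented for the example, and the sketch counts tuples rather than measuring a rate over time as the Metrics page does. The function name throughput_counts is hypothetical.

```python
from collections import Counter


def throughput_counts(events, wanted="add_to_cart"):
    """Return (tuples_in, tuples_out) for a filter that keeps one event type."""
    counts = Counter(e["click_event_type"] for e in events)
    return sum(counts.values()), counts[wanted]


# Made-up distribution: 6 browsing, 2 add_to_cart, 1 checkout, 1 login.
event_types = ["browsing"] * 6 + ["add_to_cart"] * 2 + ["checkout", "login"]
events = [{"click_event_type": t} for t in event_types]

tuples_in, tuples_out = throughput_counts(events)
print(f"in={tuples_in} out={tuples_out} ratio={tuples_out / tuples_in:.0%}")
```

With this invented mix, 2 of the 10 tuples pass the filter. The real ratio depends on how often add_to_cart events occur in the sample data.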