Clickstream Example Streams Flow
Clickstream - What is it all about?
Clickstream is the recording of areas of the screen that a user clicks while web browsing. As the user clicks anywhere in the web page, the action is logged. The log contains information such as time, URL, the user’s machine, type of browser, type of event (for example, browsing, checking out, logging in, logging out with purchase, removing from cart), product information (for example, ID, category, and price), total purchase in basket, number of items in basket, and session duration. This information can give valuable clues about what visitors are doing on your web site, and about the visitors themselves.
Clickstream analysis is useful for web activity analysis and market research. The navigation path can indicate purchase interests and price range. You can identify browsing patterns to determine the probability that the user will place an order.
Business use cases for Clickstream events
Let’s say that your online retail store wants to find out what shoppers are doing on your web site. What pages are they visiting? Do they buy online after visiting those pages, or do they leave without purchasing anything? Do the same shoppers return for more purchases, or do they come once and never return? How many times does a visitor browse a page before making a purchase?
A data scientist can combine this clickstream data with your retail store’s ERP data to identify each shopper’s preferences and price range. The data scientist can also combine the clickstream data with social media data about the shopper to make targeted offers.
Example Clickstream data
The sample data that is used in the Clickstream streams flow contains formatted data from user actions in a web page. The data includes: customer ID, time stamp, type of click event, name of the product, category of the product, price, total price of all products in the basket, total number of all products in the basket, number of distinct items in the basket, and how long the user was on the site.
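As a rough illustration of the record layout described above, one event could be modeled as follows. This is only a sketch: `click_event_type` is the one attribute name confirmed in this document, and every other field name, type, and sample value below is a hypothetical stand-in for the attribute listed in the text.

```python
from dataclasses import dataclass


@dataclass
class ClickEvent:
    """One clickstream record, following the attribute list in the text.

    Only click_event_type is a name taken from the document; all other
    field names and types are assumptions made for illustration.
    """
    customer_id: str                  # customer ID
    time_stamp: str                   # time stamp of the click
    click_event_type: str             # type of click event, e.g. "add_to_cart"
    product_name: str                 # name of the product
    product_category: str             # category of the product
    product_price: float              # price of the product
    basket_total_price: float         # total price of all products in the basket
    basket_item_count: int            # total number of all products in the basket
    basket_distinct_item_count: int   # number of distinct items in the basket
    session_duration: int             # how long the user was on the site


# Invented sample values, not taken from the real sample data.
sample_event = ClickEvent(
    customer_id="c-1001",
    time_stamp="2024-01-01 10:15:00",
    click_event_type="add_to_cart",
    product_name="running shoes",
    product_category="footwear",
    product_price=59.99,
    basket_total_price=119.98,
    basket_item_count=2,
    basket_distinct_item_count=1,
    session_duration=340,
)

print(sample_event.click_event_type)
```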
Goal
Our goal is to practice creating and running a streams flow that selects the events in which an online user adds something to the shopping cart. The data will be used for offline analysis.
Description of operators
The following screen capture shows how the clickstream example streams flow looks in the canvas:
Let’s look more closely at these three operators.
Sample Data operator
Sample Data is the source of clickstream data for the streams flow. We supply this data. The following screen captures show the properties of the sample data and some of its schema attributes:
The streams flow ingests the sample data. The schema attributes include customer ID, time stamp, type of click event, total price of items in the user’s shopping cart, and so on. In this example flow, we’re interested in the click_event_type attribute.
Filter operator
Next, we want to pull out only the data when a user puts something in the shopping cart. We use the Filter operator to select data where the click event type is add_to_cart. All other tuples are ignored.
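The selection that the Filter operator performs can be sketched in plain Python. This is not how the operator is configured or implemented in the product; it only mirrors the condition described above. The function name `filter_add_to_cart`, the `customer_id` key, and the event type values other than `add_to_cart` are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator


def filter_add_to_cart(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only the tuples whose click_event_type is add_to_cart."""
    for event in events:
        if event.get("click_event_type") == "add_to_cart":
            yield event
        # Every other tuple is ignored, as in the Filter operator.


# Invented input tuples; only "add_to_cart" is a value named in the document.
events = [
    {"customer_id": "c-1001", "click_event_type": "browsing"},
    {"customer_id": "c-1001", "click_event_type": "add_to_cart"},
    {"customer_id": "c-1002", "click_event_type": "checkout"},
    {"customer_id": "c-1002", "click_event_type": "add_to_cart"},
]

kept = list(filter_add_to_cart(events))
print(len(kept), "of", len(events), "tuples passed the filter")
```

A generator is used here so that tuples are passed along one at a time, which loosely resembles how a stream is processed.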
Running the Clickstream example flow
When you click in the canvas, the Clickstream example flow is automatically created and deployed for you. The Metrics page opens and shows the example flow as it begins to deploy. Data flows from the Sample Data operator, through the Filter operator, and ends at the Debug operator.
The Metrics page has the following graphs:
- Flow shows all operators and the flow of data between them in the streams flow. Hover your mouse pointer over a data flow to show its throughput speed and event size.
- Ingest Rate shows the number of events that are submitted to the streams flow per second for each streams flow source.
- Throughput shows the throughput of input and output flows, if they exist. It also shows events that have errors.
By default, the Sample Data operator in the Flow graph is selected. Click the Filter or Debug operator to see its throughput.
Note that in the Metrics page, the throughput from the Filter operator is greatly reduced because we’re selecting only one type of clickstream action to use.
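To see why the throughput drops after the Filter operator, consider a small worked example. This is a sketch only: the event mix, the five-second window, and the helper `events_per_second` are invented for illustration and do not reflect how the Metrics page computes its graphs.

```python
def events_per_second(event_count: int, window_seconds: float) -> float:
    """Average number of events per second over a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return event_count / window_seconds


# Hypothetical mix of event types observed in a 5-second window.
# Only "add_to_cart" is a value named in the document.
event_types = ["browsing"] * 6 + ["add_to_cart"] * 2 + ["checkout"] * 2
window_seconds = 5.0

# Rate of events entering the Filter operator.
input_rate = events_per_second(len(event_types), window_seconds)

# Rate of events leaving it: only add_to_cart events pass.
passed = [t for t in event_types if t == "add_to_cart"]
output_rate = events_per_second(len(passed), window_seconds)

print(f"input: {input_rate} events/s, output: {output_rate} events/s")
```

With this invented mix, 10 events enter the filter and only 2 leave it, so the output rate is one fifth of the input rate.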