E-Retail Example--Integrating Data
A Web-Mining Scenario Using CRISP-DM
With multiple data sources, there are many different ways in which the e-retailer can integrate data:
- Adding customer and product attributes to event data. To model Web log events using attributes from other databases, the customer ID, product number, and purchase order number associated with each event must be correctly identified, and the corresponding attributes merged into the processed Web logs. Note that the merged file replicates customer and product information every time a customer or product is associated with an event.
- Adding purchase and Web log information to customer data. To model the value of a customer, that customer's purchases and session information must be extracted from the appropriate databases, totaled, and merged with the customer database. This involves creating new attributes, as discussed in the constructing data process.
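The two integration approaches above can be sketched in a few lines of Python. This is a minimal illustration only: the sample records, the field names (customer_id, product_id, session, action, price), and the helper functions enrich_events and summarize_customers are all hypothetical, and a real project would more likely do this in a database or a data-mining tool.

```python
from collections import defaultdict

# Hypothetical sample data; all field names are illustrative assumptions.
customers = {
    "C1": {"name": "Ana", "region": "West"},
    "C2": {"name": "Ben", "region": "East"},
}
products = {
    "P10": {"category": "Books", "price": 12.50},
    "P20": {"category": "Music", "price": 8.00},
}
events = [
    {"session": "S1", "customer_id": "C1", "product_id": "P10", "action": "view"},
    {"session": "S1", "customer_id": "C1", "product_id": "P10", "action": "purchase"},
    {"session": "S2", "customer_id": "C2", "product_id": "P20", "action": "view"},
    {"session": "S3", "customer_id": "C1", "product_id": "P20", "action": "purchase"},
]


def enrich_events(events, customers, products):
    """Attach customer and product attributes to each event."""
    enriched = []
    for event in events:
        row = dict(event)
        # Attributes are copied onto every matching event, so customer and
        # product details repeat across rows of the merged file.
        row.update(customers.get(event["customer_id"], {}))
        row.update(products.get(event["product_id"], {}))
        enriched.append(row)
    return enriched


def summarize_customers(events, customers, products):
    """Total purchases and sessions per customer, then merge with customer data."""
    spend = defaultdict(float)
    sessions = defaultdict(set)
    for event in events:
        cid = event["customer_id"]
        sessions[cid].add(event["session"])
        if event["action"] == "purchase":
            spend[cid] += products[event["product_id"]]["price"]
    # total_spend and session_count are newly constructed attributes.
    return {
        cid: {**attrs, "total_spend": spend[cid], "session_count": len(sessions[cid])}
        for cid, attrs in customers.items()
    }


enriched_events = enrich_events(events, customers, products)
customer_summary = summarize_customers(events, customers, products)
```

In this sketch, enriched_events keeps one row per event and repeats the customer and product attributes on each, while customer_summary keeps one row per customer with the derived totals added.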
After integrating databases, the e-retailer goes through an exploration process to make sure that the data merge was performed correctly.
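As one possible illustration of such a check, the sketch below tests two things after a merge: that the number of event rows did not change, and that every row actually gained the attributes it was supposed to. The data, the field names, and the check_merge helper are hypothetical, and these two checks are only examples of what an exploration step might cover.

```python
# Hypothetical lookup tables and events; all names are illustrative.
customers = {"C1": {"region": "West"}}
products = {"P10": {"category": "Books"}}
events = [
    {"customer_id": "C1", "product_id": "P10"},
    {"customer_id": "C9", "product_id": "P10"},  # C9 is not in customers
]

# A simple merge that silently leaves gaps when a key has no match.
merged = [
    {**e, **customers.get(e["customer_id"], {}), **products.get(e["product_id"], {})}
    for e in events
]


def check_merge(events, merged, customer_fields, product_fields):
    """Return a list of problems found in the merged events (empty if none)."""
    problems = []
    # The merge should neither drop nor duplicate events.
    if len(merged) != len(events):
        problems.append(f"row count changed from {len(events)} to {len(merged)}")
    # Every merged row should carry the attributes it was meant to gain.
    for i, row in enumerate(merged):
        missing = [f for f in customer_fields + product_fields if f not in row]
        if missing:
            problems.append(f"row {i} is missing {missing}")
    return problems


problems = check_merge(events, merged, ["region"], ["category"])
```

Here the second event refers to a customer ID with no matching record, so the check reports that row as missing its customer attribute instead of letting the gap pass unnoticed.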