E-Retail Example--Selecting Data
A Web-Mining Scenario Using CRISP-DM
Many of the e-retailer's decisions about which data to select have already been made in earlier phases of the data mining process.
Selecting items. The initial study will be limited to the approximately 30,000 customers who have registered on the site, so filters need to be set up to exclude the purchases and Web log entries of nonregistered customers. Additional filters should remove requests for image files and other non-informative entries from the Web logs.
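The two item filters can be sketched in Python as follows. This is a minimal illustration, not the e-retailer's actual setup: it assumes each record is a dict with hypothetical keys "customer_id" (None for anonymous visitors) and "url", and the list of non-informative file extensions is likewise a hypothetical choice.

```python
from urllib.parse import urlparse

# Hypothetical list of extensions treated as non-informative page assets.
NON_INFORMATIVE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js")


def is_informative(url: str) -> bool:
    """Return False for requests to image files and similar page assets."""
    path = urlparse(url).path.lower()  # ignore query strings and letter case
    return not path.endswith(NON_INFORMATIVE_EXTENSIONS)


def select_purchases(purchases, registered_ids):
    """Keep only purchases made by registered customers."""
    return [p for p in purchases if p.get("customer_id") in registered_ids]


def select_log_entries(entries, registered_ids):
    """Keep log entries from registered customers that request informative pages."""
    return [
        e for e in entries
        if e.get("customer_id") in registered_ids and is_informative(e["url"])
    ]


# Illustrative usage with made-up records.
registered = {"C001", "C002"}
log = [
    {"customer_id": "C001", "url": "/products/123"},
    {"customer_id": "C001", "url": "/images/logo.png"},   # image request: dropped
    {"customer_id": None, "url": "/products/456"},        # nonregistered: dropped
    {"customer_id": "C002", "url": "/cart?item=9"},
]
kept = select_log_entries(log, registered)
print([e["url"] for e in kept])
```

Matching on the URL path rather than the raw string keeps a request such as "/img/a.JPG?v=2" from slipping through because of a query string or uppercase extension.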
Selecting attributes. The purchase database contains sensitive information about the e-retailer's customers, so it is important to filter out attributes such as the customer name, address, phone number, and credit card numbers.
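A minimal sketch of the attribute filter, assuming purchase records are dicts and that the sensitive columns carry the hypothetical names shown below. It also assumes an internal customer ID, which is not itself personal information, is kept so purchases can still be joined to the Web logs.

```python
# Hypothetical column names for the sensitive attributes in the purchase database.
SENSITIVE_ATTRIBUTES = {"name", "address", "phone", "credit_card_number"}


def drop_sensitive(record: dict) -> dict:
    """Return a copy of the record without the sensitive attributes."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_ATTRIBUTES}


# Illustrative usage with a made-up record.
purchases = [
    {"customer_id": "C001", "name": "A. Smith", "address": "1 Main St",
     "phone": "555-0100", "credit_card_number": "0000-0000-0000-0000",
     "product_id": "P42", "amount": 19.99},
]
anonymized = [drop_sensitive(p) for p in purchases]
print(anonymized)  # [{'customer_id': 'C001', 'product_id': 'P42', 'amount': 19.99}]
```

Building a new dict instead of deleting keys in place leaves the source records unchanged, so the filter can be rerun or adjusted without reloading the data.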