Collecting Initial Data
At this point in CRISP-DM, you're ready to access data and bring it into IBM® SPSS® Modeler. Data come from a variety of sources, such as:
- Existing data. This includes a wide variety of data, such as transactional data, survey data, and Web logs. Consider whether the existing data are enough to meet your needs.
- Purchased data. Does your organization use supplemental data, such as demographics? If not, consider whether they may be needed.
- Additional data. If the above sources don't meet your needs, you may need to conduct surveys or begin additional tracking to supplement the existing data stores.
Task List
Take a look at the data in IBM SPSS Modeler and consider the following questions. Be sure to take notes on your findings. See the topic Writing a Data Collection Report for more information.
- Which attributes (columns) from the database seem most promising?
- Which attributes seem irrelevant and can be excluded?
- Is there enough data to draw generalizable conclusions or make accurate predictions?
- Are there too many attributes for your modeling method of choice?
- Are you merging various data sources? If so, are there areas, such as mismatched keys or inconsistent field formats, that might pose a problem when merging?
- Have you considered how missing values are handled in each of your data sources?
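Several of the questions above (how many attributes there are, how much of each is missing, and whether two sources will merge cleanly) can be given a rough first answer with a small script. The following is a minimal sketch in plain Python, not an IBM SPSS Modeler feature. The function names `profile_columns` and `key_overlap`, the list of tokens treated as missing, and the sample data are all assumptions made for illustration.

```python
import csv
import io

# Assumed tokens that count as "missing"; each real source may use its own.
MISSING_TOKENS = {"", "NA", "N/A", "NULL", "?"}


def profile_columns(rows, missing_tokens=MISSING_TOKENS):
    """Return the row count and, per column, missing and distinct-value counts."""
    rows = list(rows)
    profile = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                # Extra unnamed fields in a malformed row; ignored in this sketch.
                continue
            stats = profile.setdefault(column, {"missing": 0, "values": set()})
            text = (value or "").strip()
            if text in missing_tokens:
                stats["missing"] += 1
            else:
                stats["values"].add(text)
    summary = {
        column: {"missing": s["missing"], "distinct": len(s["values"])}
        for column, s in profile.items()
    }
    return len(rows), summary


def key_overlap(left_rows, right_rows, key):
    """Compare the key values of two sources before merging them."""
    left = {row[key] for row in left_rows}
    right = {row[key] for row in right_rows}
    return {
        "matched": len(left & right),
        "left_only": sorted(left - right),
        "right_only": sorted(right - left),
    }


# Made-up sample data standing in for two data sources.
customers_csv = """customer_id,age,region
1,34,North
2,,South
3,51,NA
"""
orders_csv = """customer_id,amount
1,20.50
3,12.00
4,7.25
"""

customers = list(csv.DictReader(io.StringIO(customers_csv)))
orders = list(csv.DictReader(io.StringIO(orders_csv)))

row_count, summary = profile_columns(customers)
overlap = key_overlap(customers, orders, "customer_id")

print("rows:", row_count)
for column, stats in summary.items():
    print(column, stats)
print("merge check:", overlap)
```

In this sample, the `left_only` and `right_only` lists show customers without orders and orders without a matching customer, which is the kind of merge problem worth recording in your notes. Note that each source may encode missing values differently, so the token list would need to be reviewed per source.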