Collecting Initial Data

At this point in CRISP-DM, you're ready to access data and bring it into IBM® SPSS® Modeler. Data come from a variety of sources, such as:

  • Existing data. This includes a wide variety of data, such as transactional data, survey data, and Web logs. Consider whether the existing data are enough to meet your needs.
  • Purchased data. Does your organization use supplemental data, such as demographics? If not, consider whether it may be needed.
  • Additional data. If the above sources don't meet your needs, you may need to conduct surveys or begin additional tracking to supplement the existing data stores.

Task List

Take a look at the data in IBM SPSS Modeler and consider the following questions. Be sure to take notes on your findings. See the topic Writing a Data Collection Report for more information.

  • Which attributes (columns) from the database seem most promising?
  • Which attributes seem irrelevant and can be excluded?
  • Is there enough data to draw generalizable conclusions or make accurate predictions?
  • Are there too many attributes for your modeling method of choice?
  • Are you merging various data sources? If so, are there areas that might pose a problem when merging?
  • Have you considered how missing values are handled in each of your data sources?
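
Some of the questions above can also be explored with a short script outside IBM SPSS Modeler. The following is a minimal Python sketch, not part of Modeler itself, that counts missing and distinct values per attribute and compares key values across two sources before a merge. The sample data, the customer_id key, the MISSING markers, and the helper names (read_rows, profile, key_overlap) are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical sample data; all values are invented for illustration.
CUSTOMERS_CSV = """customer_id,age,region
1,34,North
2,,South
3,51,
4,29,North
"""

ORDERS_CSV = """customer_id,amount
1,20.50
2,
5,13.00
"""

# Assumed markers for missing values; adjust to match your own sources.
MISSING = {"", "NA", "NULL"}


def read_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def profile(rows):
    """Return {column: (missing_count, distinct_non_missing_count)}."""
    summary = {}
    if not rows:
        return summary
    for column in rows[0].keys():
        values = [(row[column] or "").strip() for row in rows]
        present = [v for v in values if v not in MISSING]
        summary[column] = (len(values) - len(present), len(set(present)))
    return summary


def key_overlap(left, right, key):
    """Compare the key values of two sources before merging them."""
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}
    return {
        "shared": left_keys & right_keys,
        "only_left": left_keys - right_keys,
        "only_right": right_keys - left_keys,
    }


customers = read_rows(CUSTOMERS_CSV)
orders = read_rows(ORDERS_CSV)
customer_profile = profile(customers)
order_profile = profile(orders)
overlap = key_overlap(customers, orders, "customer_id")
```

In this sketch, an attribute with many missing values or only one distinct value is a candidate for exclusion, and keys that appear in only one source point to records that may be dropped or left incomplete when the sources are merged. Findings like these belong in your notes for the data collection report.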