Identifying Unusual Cases in a Medical Database

A data analyst hired to build predictive models for stroke treatment outcomes is concerned about data quality because such models can be sensitive to unusual observations. Some of these outlying observations represent truly unique cases and are thus unsuitable for prediction. Others are caused by data entry errors in which the values are technically "correct" (each value is individually valid) and thus cannot be caught by data validation procedures.

This information is collected in stroke_valid.sav. See the topic Sample Files for more information. Use the Identify Unusual Cases procedure to clean the data file. Syntax for reproducing these analyses can be found in detectanomaly_stroke.sps.
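The following Python sketch shows why a value can pass validation and still be suspect. It compares a simple range check with a deviation score computed against each case's peer group. This is a simplified illustration under stated assumptions, not the algorithm that the Identify Unusual Cases procedure uses. The records, the peer-group labels, the 0 to 120 age range, the function names, and the threshold of 3 are all hypothetical. The peer groups are also given in advance rather than discovered from the data.

```python
from statistics import mean, stdev

# Hypothetical records (NOT taken from stroke_valid.sav): (case_id, peer_group, age).
# Case 4 simulates a keying error: an age of 71 typed as 17.
RECORDS = [
    (1, "A", 72), (2, "A", 68), (3, "A", 75), (4, "A", 17), (5, "A", 70),
    (6, "B", 34), (7, "B", 29), (8, "B", 31), (9, "B", 36), (10, "B", 30),
]


def passes_range_check(age, low=0, high=120):
    """Hypothetical validation rule: accept any age inside a plausible range."""
    return low <= age <= high


def peer_deviation(records):
    """Leave-one-out z-score of each case's age relative to the other
    members of its (given, hypothetical) peer group."""
    scores = {}
    for case_id, group, age in records:
        peers = [a for c, g, a in records if g == group and c != case_id]
        scores[case_id] = abs(age - mean(peers)) / stdev(peers)
    return scores


def flag_unusual(records, threshold=3.0):
    """Return the IDs of cases whose peer deviation exceeds the threshold."""
    scores = peer_deviation(records)
    return sorted(c for c, s in scores.items() if s > threshold)


if __name__ == "__main__":
    print(all(passes_range_check(age) for _, _, age in RECORDS))  # True
    print(flag_unusual(RECORDS))  # [4]
```

Every age passes the range check, but case 4 lies far from the other ages in its own peer group and is the only case flagged. Relative to the whole file, 17 is not far from the ages in group B, so the peer comparison is what exposes it. Leaving the case out of its own group's mean and standard deviation keeps a single extreme value from masking itself.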