Using simulation to model treatment costs for diabetes
An insurance company is interested in modeling annual treatment costs for policy holders with diabetes to ensure that premiums are adequate to cover expected costs. To mitigate risk, they want to go beyond point estimates of the costs incurred by a particular policy holder and ask questions about the distribution of costs over the whole population of policy holders with diabetes. For example, what is the threshold below which costs fall for 99% of the population (that is, the 99th percentile of the cost distribution)?
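The 99% threshold is a percentile of the cost distribution. A minimal sketch of how it could be read off a sample of costs, assuming the nearest-rank definition of a percentile (one of several common definitions; the function name `cost_threshold` is hypothetical):

```python
import math


def cost_threshold(costs, coverage=0.99):
    """Nearest-rank percentile: the smallest observed cost c such that at
    least `coverage` of the costs are at or below c.

    Hypothetical helper for illustration only."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(costs)
    rank = math.ceil(coverage * len(ordered))  # 1-based rank of the threshold
    return ordered[rank - 1]


illustrative_costs = [float(c) for c in range(1, 101)]  # made-up costs 1..100
print(cost_threshold(illustrative_costs))  # 99.0
```

The estimate is only as good as the sample it is computed from, which is why the amount and coverage of the available data matter.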
To make statements about the distribution of costs over the whole population of policy holders with diabetes, the analysts need to be confident that enough data are available to represent that distribution adequately. Data for their existing policy holders have already been used to build a predictive model of cost on a per-patient basis that seems to perform well. The analysts believe that the range of the existing data reasonably represents the target population, but they are concerned that the data do not cover enough of the possible combinations of input values to serve, on their own, as a model of the distribution of costs. With simulation, the analysts can generate as much data as needed and apply their predictive model to the simulated data to obtain the desired cost distribution.
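The simulate-then-score approach can be sketched outside of any particular product. This is a minimal illustration under loudly stated assumptions: the two inputs (age and BMI), their distributions, and the log-link model coefficients are all hypothetical and are not taken from diabetes_costs.sav or diabetes_costs.xml. The sketch draws synthetic policy holders, applies a predictive model to each one, and reads the 99th percentile off the resulting cost distribution.

```python
import math
import random
import statistics

# HYPOTHETICAL log-link model: log(E[cost]) = B0 + B_AGE*age + B_BMI*bmi.
# Made-up coefficients for illustration; not the model in diabetes_costs.xml.
B0, B_AGE, B_BMI = 6.0, 0.02, 0.03


def predicted_cost(age, bmi):
    """Mean annual cost predicted by the hypothetical log-link model."""
    return math.exp(B0 + B_AGE * age + B_BMI * bmi)


def simulate_costs(n, seed=0):
    """Draw n synthetic policy holders and score each one with the model.

    Assumed input distributions (stand-ins for distributions fitted to
    the existing data): age uniform on [30, 80]; BMI normal(30, 5),
    clipped to the range [18, 50]."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n):
        age = rng.uniform(30, 80)
        bmi = min(max(rng.gauss(30, 5), 18), 50)  # clip to the assumed range
        costs.append(predicted_cost(age, bmi))
    return costs


costs = simulate_costs(100_000)
# statistics.quantiles with n=100 returns 99 cut points; index 98 is the 99th.
p99 = statistics.quantiles(costs, n=100)[98]
print(f"99% of simulated policy holders cost less than about {p99:,.0f}")
```

Because the sample size is a parameter, the percentile can be estimated from far more cases than the original data contain; the result is still only as credible as the fitted input distributions and the predictive model.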
This example uses the data file diabetes_costs.sav. See the topic Sample Files for more information. It also uses the model file diabetes_costs.xml, which contains the specifications of the predictive model for costs, built from the data in diabetes_costs.sav. Specifically, the predictive model is a generalized linear model for treatment cost.