Topic
  • 9 replies
  • Latest Post - ‏2015-03-31T16:19:01Z by Kameron
RichH6109
10 Posts

Pinned topic Building a scorecard with SPSS Modeler

‏2012-07-01T19:07:07Z |
Hi all,

I purchased Modeler for primary use as a data mining tool and as a piece of software to allow me to build very traditional accept/reject risk scorecards. I'm finding the latter particularly difficult and often am forced to fall back on the likes of Excel to achieve simple tasks. Can someone assist me in dealing with the following hurdles so I am performing less transformation outside of Modeler? The datasets I am using have one target variable (always 1(accept) or 0(reject)) and circa 200 input variables. The number of discrete records ranges from 5,000-100,000 depending on the task.

(1) How can I get Modeler to find the most predictive, say, 20 variables in the dataset and suggest the best segmentation of continuous variables (e.g. determine that age is a predictive input and that segmenting customers as 18-25 works better than 18-30)? Currently I am performing univariate analysis in Excel, calculating Weight of Evidence (WOE) and Information Value (IV) for each input. This is very time-consuming with the size of the datasets I am using. I have tried to replicate this using the interactive Decision List, without much success.

(2) I'm attempting to model build using the Logistic Modelling node. However, I am constantly receiving the error messages:
"Unexpected singularities in the Hessian matrix are encountered. This indicates that either some predictor variables should be excluded or some categories should be merged. The NOMREG procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain.". The node always attempts to use every selected input (is this correct?) and my $LRP-target output is largely nulls.

I suspect that what I am trying to do is eminently achievable in Modeler but I am lacking the experience with the software to make it work.

B.
Updated on 2012-11-21T08:32:04Z by SystemAdmin
  • TedFischer
    266 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-07-02T13:39:04Z  
    To find the best 20 variables for a model using univariate analysis use the Feature Selection node. You could also build a model with all your variables, turn on predictor importance, and select the top 20 variables.

    Logistic regression requires all your data to be populated. If most of your records have nulls in one or more fields, you may not be able to build a model. You can estimate what the null values should be using an imputation technique (basic ones are available in the Data Audit node, Quality tab) or use a modeling technique that is more robust to the presence of nulls (such as some decision trees).
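As a rough illustration of the simplest kind of imputation mentioned above, here is a minimal Python sketch that replaces nulls in one field with the median of the observed values. It runs outside Modeler and is not what the Data Audit node does internally; the function name `impute_median` and the sample field `age` are made up for the example.

```python
from statistics import median

def impute_median(rows, field):
    """Return copies of `rows` with None in `field` replaced by the
    median of the observed (non-None) values of that field."""
    observed = [r[field] for r in rows if r[field] is not None]
    if not observed:
        raise ValueError(f"no observed values for {field!r}")
    fill = median(observed)
    # dict(r, **{...}) copies each row so the input is left untouched.
    return [dict(r, **{field: fill if r[field] is None else r[field]})
            for r in rows]

# Hypothetical sample data: one record has a null age.
rows = [{"age": 25}, {"age": None}, {"age": 35}, {"age": 45}]
imputed = impute_median(rows, "age")
```

Median imputation is only a baseline; whether it is appropriate depends on why the values are missing.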

    Ted
  • IrinaG
    7 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-11-14T16:29:16Z  
    Hello Ted,

    Is there any way to build an incremental model using Clementine based on WOE for variable selection?

    Thank you
  • TedFischer
    266 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-11-14T17:42:54Z  
    I am not sure what you mean by "weight of evidence" as I have seen that it could refer to a Bayesian approach or an information criterion. There is a Bayesian Net model and C5.0 uses information gain for creating a model.

    Thanks.

    Ted
  • SystemAdmin
    435 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-11-15T11:57:50Z  
    1.) There are some ways of calculating WOE and things like the ROC curve and AUC in Modeler. I can share some streams with you if desired. However, these only calculate the values (and give the ROC curve); they are not used directly to create a model. Once the values are known, you can decide from them (in an automatic way) which fields to use in a model and which not. This would then look similar to the current Feature Selection node, but instead of p-values you use WoE or ROC.

    2.) By default, the Logistic node always includes all of your fields. This may not always be the best option, certainly not if you have a lot of missing data (and there is no way to impute it). You can easily change this to a stepwise algorithm in the Logistic node. This will probably also take care of the errors you are getting.
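For readers who want to see the WoE calculation from point 1 spelled out, here is a minimal Python sketch for a single categorical input and a 0/1 target. It runs outside Modeler and is not the shared stream. It assumes the common convention WoE = ln(share of 1s in the category / share of 0s in the category), with IV as the sum over categories of (share of 1s - share of 0s) * WoE; some texts use the opposite sign. The `smoothing` parameter is an assumption added here so that categories with no 1s or no 0s do not cause a division by zero.

```python
import math
from collections import Counter

def woe_iv(categories, targets, smoothing=0.5):
    """Return ({category: WoE}, IV) for one categorical input and a 0/1 target.

    `smoothing` is added to every cell count; with smoothing=0 the function
    fails if a category has no 1s or no 0s.
    """
    ones, zeros = Counter(), Counter()
    for cat, t in zip(categories, targets):
        if t == 1:
            ones[cat] += 1
        else:
            zeros[cat] += 1
    cats = sorted(set(ones) | set(zeros))
    total_ones = sum(ones.values()) + smoothing * len(cats)
    total_zeros = sum(zeros.values()) + smoothing * len(cats)
    woe, iv = {}, 0.0
    for cat in cats:
        p1 = (ones[cat] + smoothing) / total_ones    # share of 1s in this category
        p0 = (zeros[cat] + smoothing) / total_zeros  # share of 0s in this category
        woe[cat] = math.log(p1 / p0)
        iv += (p1 - p0) * woe[cat]
    return woe, iv

# Hypothetical age bands and accept (1) / reject (0) outcomes.
bands = ["18-25"] * 4 + ["26-40"] * 4
target = [1, 0, 0, 0, 1, 1, 1, 0]
woe, iv = woe_iv(bands, target)
```

Fields could then be kept or dropped by comparing their IV against a cutoff of your choosing, in the same spirit as the p-value filter in the Feature Selection node.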
  • SystemAdmin
    435 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-11-19T17:45:24Z  
    Hi everybody,
    I am building a credit scorecard model with SPSS Modeler. Could you please send me the streams?
    Thank you
  • SystemAdmin
    435 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-11-19T20:31:39Z  
    Attached you can find a stream that calculates WoE and IV for categorical variables, and ROC, AUC and Gini for all continuous variables.
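The attached stream is not reproduced here. As an independent illustration of the continuous-variable measures it mentions, the following Python sketch computes AUC from the rank-sum (Mann-Whitney) identity and derives Gini as 2 * AUC - 1. The function name `auc_gini` and the sample values are made up; the target is assumed to be coded as integers 0 and 1.

```python
def auc_gini(scores, targets):
    """Return (AUC, Gini) for a continuous input against a 0/1 target.

    AUC is computed from the sum of the ranks of the target=1 records,
    using average ranks for tied scores. Gini is 2 * AUC - 1.
    """
    pairs = sorted(zip(scores, targets))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Records i..j-1 are tied; their 1-based ranks are i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(t for _, t in pairs[i:j])
        i = j
    n_pos = sum(targets)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both target classes must be present")
    auc = (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
    return auc, 2.0 * auc - 1.0

# Hypothetical values of one continuous input and the 0/1 target.
auc, gini = auc_gini([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC below 0.5 means the input is predictive in the opposite direction, so the absolute value of Gini is usually what matters when ranking inputs.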
  • SystemAdmin
    435 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-11-20T20:14:52Z  
    Hello Rosius,
    Thank you for the file, but I can't open it in SPSS Modeler. The error message says the file is dropped. Can you try to post it again, or just describe how I can build it myself?
    Thanks again
  • SystemAdmin
    435 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-11-21T08:32:04Z  
    Hi Blaise,
    The stream was created in version 15, so it cannot be opened in earlier versions.
    Here is the same stream for version 14.
    Cheers
  • Kameron
    1 Post

    Re: Building a scorecard with SPSS Modeler

    ‏2015-03-31T16:19:01Z  
    Hi,

    I am currently using SPSS version 20. I have more than 200K rows and more than 1000 variables in the dataset.

    I am also looking for a script to calculate WoE and IV for each variable automatically.

    Do you have any scripts or an approach to do that? Can you please share your script or shed some light on how to do the calculation?

    Appreciate your help!
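
One possible approach to the request above, sketched in plain Python rather than as a Modeler script: bin each continuous variable into equal-frequency bins, compute its IV against the 0/1 target, and rank the variables. All names here (`quantile_bins`, `information_value`, `rank_by_iv`) are hypothetical, the `smoothing` term is an assumption added to avoid empty cells, and the number of bins is arbitrary. For 200K rows and 1000 variables a vectorised library would likely be faster; this only shows the logic.

```python
import math
from bisect import bisect_right
from collections import defaultdict

def quantile_bins(values, n_bins=5):
    """Assign each value a bin index using equal-frequency cut points.
    Equal values always land in the same bin, so heavy ties can
    produce fewer than n_bins bins."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = sorted({ordered[k * n // n_bins] for k in range(1, n_bins)})
    return [bisect_right(cuts, v) for v in values]

def information_value(bins, targets, smoothing=0.5):
    """IV of a binned variable against a 0/1 target."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [count of 0s, count of 1s]
    for b, t in zip(bins, targets):
        counts[b][t] += 1
    k = len(counts)
    total0 = sum(c[0] for c in counts.values()) + smoothing * k
    total1 = sum(c[1] for c in counts.values()) + smoothing * k
    iv = 0.0
    for c0, c1 in counts.values():
        p0 = (c0 + smoothing) / total0
        p1 = (c1 + smoothing) / total1
        iv += (p1 - p0) * math.log(p1 / p0)
    return iv

def rank_by_iv(columns, targets, top_n=20, n_bins=5):
    """Return up to top_n (name, IV) pairs, highest IV first."""
    scored = [(information_value(quantile_bins(vals, n_bins), targets), name)
              for name, vals in columns.items()]
    scored.sort(reverse=True)
    return [(name, iv) for iv, name in scored[:top_n]]

# Hypothetical data: "age" separates the target, "noise" mostly does not.
columns = {
    "age": [20, 22, 25, 30, 40, 50, 55, 60],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6],
}
targets = [0, 0, 0, 0, 1, 1, 1, 1]
ranking = rank_by_iv(columns, targets, top_n=20, n_bins=2)
```

Equal-frequency binning is only a starting point; it does not search for the best split points (for example 18-25 versus 18-30), which would need a supervised binning method.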