9 replies Latest Post - ‏2015-03-31T16:19:01Z by Kameron
RichH6109
10 Posts
ACCEPTED ANSWER

Pinned topic Building a scorecard with SPSS Modeler

‏2012-07-01T19:07:07Z |
Hi all,

I purchased Modeler primarily as a data mining tool and as software for building very traditional accept/reject risk scorecards. I'm finding the latter particularly difficult and am often forced to fall back on the likes of Excel for simple tasks. Can someone help me with the following hurdles so that I perform less transformation outside of Modeler? The datasets I am using have one target variable (always 1 (accept) or 0 (reject)) and circa 200 input variables. The number of discrete records ranges from 5,000 to 100,000 depending on the task.

(1) How can I get Modeler to easily find the most predictive, say, 20 variables in the dataset and suggest the best segmentation of continuous variables (e.g. determine that age is a predictive input and that segmenting customers as 18-25 works better than 18-30)? Currently I am performing univariate analysis in Excel, looking at the Weight of Evidence (WOE) and Information Value (IV) of each input. This is very time-consuming with the size of the datasets I am using. I have tried to replicate this using the interactive Decision List, without much success.

(2) I'm attempting to build a model using the Logistic Modelling node. However, I constantly receive the following error message:
"Unexpected singularities in the Hessian matrix are encountered. This indicates that either some predictor variables should be excluded or some categories should be merged. The NOMREG procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain." The node always attempts to use every selected input (is this correct?) and my $LRP-target output is largely nulls.

I suspect that what I am trying to do is eminently achievable in Modeler but I am lacking the experience with the software to make it work.

B.
Updated on 2012-11-21T08:32:04Z by SystemAdmin
  • TedFischer
    248 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-07-02T13:39:04Z  in response to RichH6109
    To find the best 20 variables for a model using univariate analysis, use the Feature Selection node. You could also build a model with all your variables, turn on predictor importance, and select the top 20 variables.

    Logistic regression requires all your data to be populated. If most of your records have nulls in one or more fields, you may not be able to build a model. You can estimate what the null values should be using an imputation technique (basic ones are available in the Data Audit node, Quality tab) or use a modeling technique that is more robust to the presence of nulls (such as some decision trees).
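
As a rough illustration of what a basic imputation does, here is a minimal Python sketch run outside Modeler. It is not the Data Audit node's implementation; the function name `impute_median`, the `_missing` flag field, and the sample records are all hypothetical. It fills nulls in one field with the median of the observed values and keeps a flag so the model can still see which records were originally empty.

```python
from statistics import median


def impute_median(rows, field):
    """Return copies of `rows` with None in `field` replaced by the median
    of the observed values, plus a 0/1 flag marking imputed records."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    result = []
    for r in rows:
        new = dict(r)  # copy so the input records are left untouched
        new[field + "_missing"] = 1 if r[field] is None else 0
        if r[field] is None:
            new[field] = fill
        result.append(new)
    return result


# Hypothetical sample records; None stands for a null value.
rows = [{"age": 22}, {"age": None}, {"age": 40}, {"age": 31}]
imputed = impute_median(rows, "age")
```

The median is used here only because it is less sensitive to outliers than the mean; any other fill rule could be swapped in.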

    Ted
    • IrinaG
      7 Posts

      Re: Building a scorecard with SPSS Modeler

      ‏2012-11-14T16:29:16Z  in response to TedFischer
      Hello Ted,

      Is there any way to build an incremental model in Clementine, using WOE for variable selection?

      Thank you
      • TedFischer
        248 Posts

        Re: Building a scorecard with SPSS Modeler

        ‏2012-11-14T17:42:54Z  in response to IrinaG
        I am not sure what you mean by "weight of evidence" as I have seen that it could refer to a Bayesian approach or an information criterion. There is a Bayesian Net model and C5.0 uses information gain for creating a model.

        Thanks.

        Ted
  • SystemAdmin
    435 Posts

    Re: Building a scorecard with SPSS Modeler

    ‏2012-11-15T11:57:50Z  in response to RichH6109
    1.) There are ways of calculating WOE and things like the ROC curve and AUC in Modeler. I can share some streams with you if desired. However, these streams only calculate the values (and produce the ROC curve); they are not directly used to create a model. Once the values are known, you can decide from them (in an automatic way) which fields to use in a model and which not. This may then look similar to the current Feature Selection node, but instead of p-values you use WoE or ROC.

    2.) By default, the Logistic node always includes all of your fields. This may not always be the best option, certainly not if you have a lot of missing data (and there is no way to impute it). You can easily change this to a stepwise algorithm in the Logistic node. This will probably also take care of the errors you are getting.
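
Before relying on stepwise selection, it can help to check whether some inputs are redundant, since constant fields and exact duplicates of other fields are common causes of a singular Hessian. The following Python sketch, run outside Modeler, is one way to screen for them. The function names and sample columns are hypothetical, and the check is pairwise only, so it will not catch a field that is a linear combination of several others.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def find_redundant(columns, tol=1e-9):
    """Return (constant field names, pairs of perfectly correlated fields)."""
    constant = [name for name, v in columns.items() if len(set(v)) <= 1]
    varying = [name for name in columns if name not in constant]
    collinear = [
        (a, b)
        for a, b in combinations(varying, 2)
        if abs(abs(pearson(columns[a], columns[b])) - 1.0) < tol
    ]
    return constant, collinear


# Hypothetical inputs: income_cents repeats income in other units,
# and country takes a single value in every record.
columns = {
    "income": [20.0, 35.0, 50.0, 42.0],
    "income_cents": [2000.0, 3500.0, 5000.0, 4200.0],
    "age": [25.0, 41.0, 33.0, 58.0],
    "country": [1.0, 1.0, 1.0, 1.0],
}
constant, collinear = find_redundant(columns)
```

Dropping one field from each reported pair, and every constant field, before building the model removes the most obvious sources of the warning.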
    • SystemAdmin
      435 Posts

      Re: Building a scorecard with SPSS Modeler

      ‏2012-11-19T17:45:24Z  in response to SystemAdmin
      Hi everybody,
      I am building a credit scorecard model with SPSS Modeler. Could you please send me the streams?
      Thank you
      • SystemAdmin
        435 Posts

        Re: Building a scorecard with SPSS Modeler

        ‏2012-11-19T20:31:39Z  in response to SystemAdmin
        Attached you can find a stream that calculates WoE and IV for categorical variables, and ROC, AUC and Gini for all continuous variables.
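
For readers who cannot open the attachment, the AUC and Gini figures for a continuous variable can be approximated outside Modeler. The Python sketch below is not the attached stream; it assumes the common rank-based (Mann-Whitney) formulation of AUC and the usual relation Gini = 2 * AUC - 1, and the function names and sample data are hypothetical.

```python
def auc(scores, targets):
    """Rank-based AUC: the probability that a random target=1 record has a
    higher score than a random target=0 record (ties count as one half)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based rank shared by the ties
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(targets)
    n_neg = len(targets) - n_pos
    rank_sum = sum(r for r, t in zip(ranks, targets) if t == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gini(scores, targets):
    """Gini coefficient derived from the AUC."""
    return 2 * auc(scores, targets) - 1


# Hypothetical data: one continuous input and the 1/0 target.
age = [22, 25, 31, 40, 47, 52]
target = [0, 0, 1, 0, 1, 1]
age_auc = auc(age, target)
age_gini = gini(age, target)
```

An AUC near 0.5 (Gini near 0) suggests the variable carries little ranking power on its own, while values well below 0.5 indicate a strong inverse relationship.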
        • SystemAdmin
          435 Posts

          Re: Building a scorecard with SPSS Modeler

          ‏2012-11-20T20:14:52Z  in response to SystemAdmin
          Hello Rosius,
          Thank you for the file, but I can't open it in SPSS Modeler; the error message says the file is dropped. Can you try to post it again, or just describe how I can build it myself?
          Thanks again
          • SystemAdmin
            435 Posts

            Re: Building a scorecard with SPSS Modeler

            ‏2012-11-21T08:32:04Z  in response to SystemAdmin
            Hi Blaise,
            It is a stream created in version 15, which cannot be opened in earlier versions.
            Here is the same stream for version 14.
            Cheers
            • Kameron
              1 Post

              Re: Building a scorecard with SPSS Modeler

              ‏2015-03-31T16:19:01Z  in response to SystemAdmin

              Hi,

              I am currently using SPSS version 20. I have more than 200K rows and more than 1000 variables in the dataset.

              I am also looking for a script to calculate WoE and IV for each variable automatically.

              Do you have any scripts or an approach to do that? Can you please share your script or shed some light on how to do the calculation?

              Appreciate your help!
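
One way to approach the calculation outside Modeler is sketched below in plain Python. It is not the stream shared earlier in this thread. It assumes the usual scorecard definitions, WoE = ln(% of accepts in the bin / % of rejects in the bin) and IV = sum over bins of (% accepts - % rejects) * WoE, with target 1 = accept and 0 = reject as in the original question. The function names, the equal-frequency binning, the 0.5 smoothing constant that avoids division by zero in empty cells, and the sample data are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter
from math import log


def quantile_bins(values, n_bins):
    """Assign each value to an equal-frequency bin; equal values share a bin."""
    ordered = sorted(values)
    cuts = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
    return [bisect_right(cuts, v) for v in values]


def woe_iv(bins, targets, smoothing=0.5):
    """Return ({bin: WoE}, IV) for one binned or categorical field."""
    good = Counter(b for b, t in zip(bins, targets) if t == 1)
    bad = Counter(b for b, t in zip(bins, targets) if t == 0)
    labels = sorted(set(bins))
    total_good = sum(good.values()) + smoothing * len(labels)
    total_bad = sum(bad.values()) + smoothing * len(labels)
    woe, iv = {}, 0.0
    for label in labels:
        pct_good = (good[label] + smoothing) / total_good
        pct_bad = (bad[label] + smoothing) / total_bad
        woe[label] = log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[label]
    return woe, iv


def rank_fields(data, targets, n_bins=2):
    """Return (field, IV) pairs for continuous fields, highest IV first."""
    ivs = [(name, woe_iv(quantile_bins(values, n_bins), targets)[1])
           for name, values in data.items()]
    return sorted(ivs, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: two continuous inputs and the 1/0 target.
data = {
    "shoe_size": [40, 44, 41, 45, 42, 43, 38, 39],
    "age": [19, 22, 24, 29, 35, 41, 48, 56],
}
targets = [0, 0, 0, 1, 0, 1, 1, 1]
ranking = rank_fields(data, targets)
```

Taking the first 20 entries of `ranking` gives a shortlist comparable to the univariate screening described at the top of the thread. Categorical fields can be passed straight to `woe_iv` without binning, and trying several `n_bins` values gives a crude way to compare segmentations.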