IBM Support

PI68241: PROBABILITIES DIFFER BETWEEN BINARY LOGISTIC REGRESSION AND PROPENSITY SCORE MATCHING ON THE LAST 2 DIGITS


APAR status

  • Closed as user error.

Error description

  • You work with IBM SPSS Statistics 24 and run a Binary Logistic
    Regression with the example syntax below:
    
    LOGISTIC REGRESSION VARIABLES premat
      /METHOD=ENTER age lwt_kg smoke ht ui race2
      /SAVE=PRED
      /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
    
    You save the predicted probabilities as a new variable in your
    data file.
    
    Next, you run a Propensity Score Matching (PSM) model with the
    same variables and settings, so you expect the saved predicted
    values to be identical.
    
    When you set the decimal-digit display for both predicted
    variables from Logistic Regression and PSM to the maximum of 16
    digits in the Data Editor and compare the values, they agree up
    to the 14th decimal place but differ slightly in the last two
    digits.
    
    For example, for one case the Logistic Regression variable shows
    a value of
    .0703200813016620
    while the PSM variable shows:
    .0703200813016621
    
    This was reported to IBM SPSS Development but is functioning as
    designed.
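
    The two saved values can be checked directly. As 64-bit doubles
    they are not bit-identical, but they agree far within any
    reasonable comparison tolerance. A minimal Python sketch using
    the two values reported above (the 1e-12 tolerance is an
    illustrative choice, not an SPSS setting):

```python
import math

# Predicted probability saved by LOGISTIC REGRESSION (value from the report)
p_logistic = 0.0703200813016620
# Predicted probability saved by Propensity Score Matching, same case
p_psm = 0.0703200813016621

# As IEEE-754 doubles the two values are distinct...
print(p_logistic == p_psm)                              # False

# ...but they agree to far better than any statistically
# meaningful tolerance (1e-12 is an illustrative choice).
print(math.isclose(p_logistic, p_psm, rel_tol=1e-12))   # True
```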
    

Local fix

  • The issue here is finite-precision error, or "fuzz". It has
    nothing to do per se with the extension or Python, other than
    that the extension uses the variable-level information and issues
    a CATEGORICAL subcommand, which causes the variables smoke, ht,
    ui, and race2 to be coded 1-0 instead of the 0-1 coding in the
    data that is used in the original run of LOGISTIC REGRESSION.
    This yields a different intercept, and the values of the four
    coefficients are subtracted where they were formerly added. If
    you insert a CATEGORICAL subcommand into the LOGISTIC REGRESSION
    command, as in:
    
    LOGISTIC REGRESSION VARIABLES premat
      /METHOD=ENTER age lwt_kg smoke ht ui race2
      /CATEGORICAL smoke ht ui race2
      /SAVE=PRED
      /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
    
    and run this, the saved predicted probabilities from LOGISTIC
    REGRESSION match those from the extension (the PSM variable)
    exactly, to full double precision.
    
    Differences of this order will be found in many computations in
    SPSS Statistics and other similar software; they are to be
    expected with finite-precision computing. They are also generally
    irrelevant, except when you make exact comparisons. In that case
    you must include fuzz checks, so that values which differ only
    because of finite-precision computation are not flagged as
    different.

    This is a typical numerical discrepancy due to rounding: it
    arises when the order of computation changes, even though the
    algorithm is the same. Mathematically the results should be
    identical, but given the finite number of significant digits
    carried by the machine, this is unavoidable.
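
    The effect of the 0-1 versus 1-0 recoding can be imitated in a
    few lines of Python. The coefficients below are made-up
    illustrative numbers, not the fitted model's values; the point is
    only that an intercept shift plus a sign flip computes the same
    quantity in a different order, which can change the last bits of
    the result:

```python
import math

# Illustrative (made-up) coefficients, not the fitted model's values.
b0 = 0.1   # intercept with smoke coded 0-1
b1 = 0.3   # coefficient of smoke

# Linear predictor for a case with smoke = 0 under the 0-1 coding:
eta_01 = b0 + b1 * 0

# Under the 1-0 recoding the intercept absorbs b1 and the sign of
# the coefficient flips, so the same case (recoded dummy = 1) is
# computed as (b0 + b1) - b1:
eta_10 = (b0 + b1) - b1 * 1

# Mathematically identical, but the doubles differ in the last bits:
print(eta_01)            # 0.1
print(eta_10)            # 0.10000000000000003
print(eta_01 == eta_10)  # False

# A fuzz check recognizes them as equal:
print(math.isclose(eta_01, eta_10, rel_tol=1e-12))  # True
```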
    

Problem summary

Problem conclusion

Temporary fix

Comments

  • This is functioning as designed.
    The explanation is identical to the "Local fix" section above.
    

APAR Information

  • APAR number

    PI68241

  • Reported component name

    SPSS STATISTICS

  • Reported component ID

    5725A54ST

  • Reported release

    O00

  • Status

    CLOSED USE

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-08-29

  • Closed date

    2016-08-30

  • Last modified date

    2016-08-30

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels


Document Information

Modified date:
30 August 2016