Examples (VALIDATEDATA command)

Default Analysis Variable Validation

VALIDATEDATA VARIABLES=ALL.
  • The only specification is the list of analysis variables; ALL selects every variable in the active dataset.
  • The procedure reports all variables with a high proportion of missing values, as well as variables that are constant or nearly so.
  • In addition, empty cases are identified.

Default Case Identifier Variable Validation

VALIDATEDATA ID=Firstname Lastname.
  • Two case identifier variables are specified.
  • By default, the procedure reports cases having incomplete or duplicate case identifiers. The combination of Firstname and Lastname values is treated as a case identifier.
  • In addition, empty cases are identified.
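
The two identifier checks can be illustrated with a short Python sketch. It is not the procedure's implementation: the helper name check_identifiers, the list-of-tuples input, the treatment of None or a blank string as a missing identifier value, and the exclusion of incomplete identifiers from the duplicate check are all assumptions made for illustration.

```python
from collections import Counter

def check_identifiers(rows):
    """Return (incomplete, duplicate) lists of 0-based case indexes.

    rows: one tuple of identifier values per case, for example
    (Firstname, Lastname). Hypothetical helper, for illustration only.
    """
    def is_missing(value):
        # Assumption: None or a blank string counts as a missing value.
        return value is None or str(value).strip() == ""

    incomplete = [i for i, row in enumerate(rows)
                  if any(is_missing(v) for v in row)]

    # Assumption: only complete identifiers take part in the duplicate check.
    complete = {i: row for i, row in enumerate(rows) if i not in incomplete}

    # The combination of values is the identifier; a combination seen more
    # than once marks every case that carries it as a duplicate.
    counts = Counter(complete.values())
    duplicate = [i for i, row in complete.items() if counts[row] > 1]
    return incomplete, duplicate

cases = [("Ann", "Lee"), ("Bob", ""), ("Ann", "Lee"), ("Cy", "Ng")]
incomplete, duplicate = check_identifiers(cases)
print(incomplete)  # [1]
print(duplicate)   # [0, 2]
```

Treating the tuple as a whole reflects the point made above: neither Firstname nor Lastname needs to be unique on its own, only their combination.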

Non-Default Analysis and Case Identifier Variable Validation

VALIDATEDATA VARIABLES=Satis1 Satis2 Satis3 
            ID=Firstname Lastname
 /VARCHECKS STATUS=ON PCTMISSING=30
 /IDCHECKS DUPLICATE 
 /SAVE EMPTYCASE.
  • Three ordinal satisfaction variables (Satis1, Satis2, Satis3) are specified as analysis variables.
  • ID specifies two case identifier variables, Firstname and Lastname.
  • VARCHECKS reports analysis variables with more than 30% missing values. By default, the procedure also reports categorical analysis variables if more than 95% of their cases fall into a single category.
  • IDCHECKS reports cases that have duplicate case identifiers and turns off the default check for incomplete case identifiers.
  • Empty cases are identified by default.
  • SAVE saves a variable to the active dataset that indicates which cases are empty.
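
As a rough illustration of the two variable checks described above, the Python sketch below flags a variable when more than 30% of its values are missing or when more than 95% of its cases fall into one category. The function name check_variable, its parameter names, the use of None for a missing value, and the choice to compute the category share over non-missing cases only are assumptions, not documented behavior of the procedure.

```python
from collections import Counter

def check_variable(values, pct_missing=30.0, pct_equal=95.0):
    """Flag one categorical variable against two percentage thresholds.

    values: list of observed values, with None standing for missing.
    pct_missing, pct_equal: hypothetical parameter names mirroring the
    30% and 95% thresholds in the example.
    """
    n = len(values)
    valid = [v for v in values if v is not None]
    missing_pct = 100.0 * (n - len(valid)) / n if n else 0.0

    # Share of the most common category.
    # Assumption: the share is computed over non-missing cases only.
    if valid:
        top_count = Counter(valid).most_common(1)[0][1]
        top_pct = 100.0 * top_count / len(valid)
    else:
        top_pct = 0.0

    return {
        "too_many_missing": missing_pct > pct_missing,
        "nearly_constant": top_pct > pct_equal,
    }

satis1 = [3, None, None, 4, None, 5, 3, None, 2, 3]  # 4 of 10 missing
result = check_variable(satis1)
print(result)  # {'too_many_missing': True, 'nearly_constant': False}
```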

Using Single-Variable Validation Rules

DATAFILE ATTRIBUTE 
 ATTRIBUTE=$VD.SRule[1] ("Label='Likert 1 to 5',"+
                         "Description='Likert 5-point scale',"+
                         "Type='Numeric',"+
                         "Domain='Range',"+
                         "FlagUserMissing='No',"+
                         "FlagSystemMissing='No',"+
                         "FlagBlank='No',"+
                         "Minimum='1',"+
                         "Maximum='5',"+
                         "FlagNoninteger='Yes'").

COMPUTE Likert1to5_Satis_=NOT(Satis LE 5 AND Satis GE 1 AND
                              Satis=TRUNC(Satis)).
VARIABLE ATTRIBUTE
  VARIABLES= Likert1to5_Satis_  
  ATTRIBUTE=$VD.RuleOutcomeVar("Yes").

VARIABLE ATTRIBUTE
  VARIABLES=Satis
  ATTRIBUTE=$VD.SRuleRef[1]("Rule='$VD.SRule[1]',"+
                            "OutcomeVar='Likert1to5_Satis_'"). 

VALIDATEDATA VARIABLES=Satis
 /CASEREPORT DISPLAY=YES CASELIMIT=NONE
 /SAVE RULEVIOLATIONS.
  • DATAFILE ATTRIBUTE defines a single-variable validation rule named $VD.SRule[1]. The rule flags any value that is not an integer within the range 1 to 5.
  • COMPUTE creates the rule outcome variable Likert1to5_Satis_, in which 1 indicates an invalid value for the variable Satis.
  • The first VARIABLE ATTRIBUTE command marks Likert1to5_Satis_ as an outcome variable in the data dictionary.
  • The second VARIABLE ATTRIBUTE command links the rule to the variable Satis and references the outcome variable Likert1to5_Satis_.
  • VALIDATEDATA specifies Satis as an analysis variable. In addition to performing default checks for constants, variables with a large proportion of missing values, and empty cases, the procedure summarizes violations of single-variable validation rules defined for Satis.
  • CASEREPORT reports all cases having a rule violation.
  • SAVE saves a variable to the active dataset that indicates the total number of validation rule violations per case.
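
The logic of the COMPUTE expression in this example can be restated in Python to show which values the rule flags. This is only a sketch: the function name likert_1_to_5_violation is hypothetical, and returning None for a missing input is an assumption chosen to match the rule's FlagSystemMissing='No' and FlagUserMissing='No' settings.

```python
import math

def likert_1_to_5_violation(value):
    """Mirror the COMPUTE expression: 1 = invalid value, 0 = valid value.

    Assumption: None stands for a missing value and yields None, so
    missing values are not flagged by this rule.
    """
    if value is None:
        return None
    # Valid means: within 1..5 and equal to its own truncation (an integer).
    valid = 1 <= value <= 5 and value == math.trunc(value)
    return 0 if valid else 1

satis = [1, 5, 3.5, 0, 6, None]
outcome = [likert_1_to_5_violation(v) for v in satis]
print(outcome)  # [0, 0, 1, 1, 1, None]
```

The values 3.5, 0, and 6 are flagged for different reasons (non-integer, below the minimum, above the maximum), but the outcome variable records only that the rule was violated.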

Using Cross-Variable Validation Rules

DATAFILE ATTRIBUTE
 ATTRIBUTE=$VD.CRule[1]("Label='Pregnant Male',"+
                        "Expression='Sex =''Male'' AND Pregnant = ''Yes''',"+
                        "OutcomeVar='PregnantMale_'").  

COMPUTE PregnantMale_= Sex ='Male' AND Pregnant = 'Yes'.   
VARIABLE ATTRIBUTE
  VARIABLES=PregnantMale_  
  ATTRIBUTE=$VD.RuleOutcomeVar("Yes").

VALIDATEDATA CROSSVARRULES=$VD.CRule[1]
 /CASECHECKS REPORTEMPTY=NO.
  • DATAFILE ATTRIBUTE defines the cross-variable rule $VD.CRule[1].
  • COMPUTE creates the outcome variable PregnantMale_ referenced by the cross-variable rule. For PregnantMale_, a value of 1 identifies a case in which a male is coded as pregnant.
  • VARIABLE ATTRIBUTE marks PregnantMale_ as an outcome variable in the data dictionary.
  • VALIDATEDATA specifies $VD.CRule[1] as a cross-variable rule to be summarized. The procedure reports the total number of cases in the active dataset in which a male is coded as pregnant.
  • CASECHECKS turns off the default check for empty cases.
  • By default, the first 500 cases that violated at least one validation rule are listed.
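
The cross-variable rule, its summary count, and the case listing limit can be pictured with the Python sketch below. The function name pregnant_male, the tuple layout of the sample cases, and the constant CASE_LIMIT are hypothetical; only the rule expression and the figure of 500 come from the example above.

```python
CASE_LIMIT = 500  # limit on listed cases, as stated in the example

def pregnant_male(sex, pregnant):
    """Return 1 when the case violates the rule, otherwise 0."""
    return 1 if sex == "Male" and pregnant == "Yes" else 0

# Each tuple is (Sex, Pregnant) for one case; sample data for illustration.
cases = [("Male", "No"), ("Female", "Yes"), ("Male", "Yes"), ("Female", "No")]

outcome = [pregnant_male(sex, pregnant) for sex, pregnant in cases]
total = sum(outcome)  # number of cases that violate the rule

# List at most CASE_LIMIT violating cases, by 0-based case index.
listed = [i for i, flag in enumerate(outcome) if flag][:CASE_LIMIT]

print(outcome)  # [0, 0, 1, 0]
print(total)    # 1
print(listed)   # [2]
```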