Examples (VALIDATEDATA command)
Default Analysis Variable Validation
VALIDATEDATA VARIABLES=ALL.
- The only specification is a list of analysis variables (all variables in the active dataset).
- The procedure reports all variables with a high proportion of missing values, as well as variables that are constant or nearly so.
- In addition, empty cases are identified.
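To illustrate what identifying an empty case could mean, here is a minimal plain-Python sketch. It is not SPSS syntax and not the procedure's implementation. It assumes a case counts as empty when every variable is missing (None) or a blank string; the function name is_empty_case and the sample records are hypothetical.

```python
def is_empty_case(case):
    """Return True when every value is None or a blank string.

    Assumption: this is the working definition of an "empty case" for
    this sketch; the procedure's exact definition is not given here.
    """
    return all(
        value is None or (isinstance(value, str) and value.strip() == "")
        for value in case.values()
    )


# Hypothetical sample records, one dict per case.
cases = [
    {"Satis1": 4, "Satis2": None, "Satis3": 2},
    {"Satis1": None, "Satis2": None, "Satis3": None},
    {"Satis1": None, "Satis2": "  ", "Satis3": None},
]

# One flag per case, comparable to an indicator variable for empty cases.
empty_flags = [is_empty_case(case) for case in cases]
```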
Default Case Identifier Variable Validation
VALIDATEDATA ID=Firstname Lastname.
- Two case identifier variables are specified.
- By default, the procedure reports cases having incomplete or duplicate case identifiers. The combination of Firstname and Lastname values is treated as a case identifier.
- In addition, empty cases are identified.
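The identifier checks can be sketched in plain Python as follows. This is an illustration only, not the procedure's implementation. It assumes that an identifier is incomplete when any identifier variable is missing or blank, and that incomplete identifiers are left out of the duplicate check; the function check_identifiers and the sample data are hypothetical.

```python
from collections import Counter


def _is_missing(value):
    """Treat None and blank strings as missing (assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def check_identifiers(cases, id_vars):
    """Return (incomplete, duplicate) lists of 0-based case indexes.

    The combination of values across id_vars is treated as the identifier.
    """
    keys = [tuple(case.get(var) for var in id_vars) for case in cases]
    incomplete = {i for i, key in enumerate(keys) if any(_is_missing(v) for v in key)}
    # Count only complete identifiers (assumption for this sketch).
    counts = Counter(key for i, key in enumerate(keys) if i not in incomplete)
    duplicate = [
        i for i, key in enumerate(keys)
        if i not in incomplete and counts[key] > 1
    ]
    return sorted(incomplete), duplicate


# Hypothetical sample data.
cases = [
    {"Firstname": "Ana", "Lastname": "Lopez"},
    {"Firstname": "Ana", "Lastname": "Lopez"},
    {"Firstname": "Ana", "Lastname": "Silva"},
    {"Firstname": "", "Lastname": "Kim"},
]

incomplete, duplicate = check_identifiers(cases, ["Firstname", "Lastname"])
```

Note that the third case shares a first name with the first two but is not a duplicate, because the identifier is the combination of both variables.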
Non-Default Analysis and Case Identifier Variable Validation
VALIDATEDATA VARIABLES=Satis1 Satis2 Satis3
  ID=Firstname Lastname
  /VARCHECKS STATUS=ON PCTMISSING=30
  /IDCHECKS DUPLICATE
  /SAVE EMPTYCASE.
- Three ordinal satisfaction variables (Satis1, Satis2, Satis3) are specified as analysis variables.
- ID specifies two case identifier variables, Firstname and Lastname.
- VARCHECKS reports analysis variables with more than 30% missing values. By default, the procedure also reports categorical analysis variables if more than 95% of their cases fall into a single category.
- IDCHECKS reports cases that have duplicate case identifiers and turns off the default check for incomplete case identifiers.
- Empty cases are identified by default.
- SAVE saves a variable to the active dataset that indicates which cases are empty.
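The two variable checks described above (more than 30% missing values, more than 95% of cases in a single category) can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure's implementation: missing values are represented as None, the single-category percentage is computed over non-missing values only, and the helper names and sample data are hypothetical.

```python
from collections import Counter


def pct_missing(values):
    """Percentage of values that are missing (None)."""
    return 100.0 * sum(v is None for v in values) / len(values)


def pct_largest_category(values):
    """Percentage of non-missing values that fall into the most common category.

    Assumption: missing values are excluded from the base.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return 0.0
    largest_count = Counter(valid).most_common(1)[0][1]
    return 100.0 * largest_count / len(valid)


# Hypothetical sample data for three ordinal variables.
data = {
    "Satis1": [3, None, None, None, 4, 5, 2, None, 1, 3],  # 40% missing
    "Satis2": [5] * 97 + [4] * 3,                          # 97% in one category
    "Satis3": [1, 2, 3, 4, 5] * 2,                         # evenly spread
}

# Thresholds taken from the example: 30% missing, 95% in a single category.
flagged_missing = [name for name, vals in data.items() if pct_missing(vals) > 30]
flagged_constant = [name for name, vals in data.items() if pct_largest_category(vals) > 95]
```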
Using Single-Variable Validation Rules
DATAFILE ATTRIBUTE
  ATTRIBUTE=$VD.SRule[1] ("Label='Likert 1 to 5',"+
    "Description='Likert 5-point scale',"+
    "Type='Numeric',"+
    "Domain='Range',"+
    "FlagUserMissing='No',"+
    "FlagSystemMissing='No',"+
    "FlagBlank='No',"+
    "Minimum='1',"+
    "Maximum='5',"+
    "FlagNoninteger='Yes'").
COMPUTE Likert1to5_Satis_=NOT(Satis LE 5 AND Satis GE 1 AND
  Satis=TRUNC(Satis)).
VARIABLE ATTRIBUTE
  VARIABLES=Likert1to5_Satis_
  ATTRIBUTE=$VD.RuleOutcomeVar("Yes").
VARIABLE ATTRIBUTE
  VARIABLES=Satis
  ATTRIBUTE=$VD.SRuleRef[1]("Rule='$VD.SRule[1]',"+
    "OutcomeVar='Likert1to5_Satis_'").
VALIDATEDATA VARIABLES=Satis
  /CASEREPORT DISPLAY=YES CASELIMIT=NONE
  /SAVE RULEVIOLATIONS.
- DATAFILE ATTRIBUTE defines a single-variable validation rule named $VD.SRule[1]. The rule flags any value that is not an integer within the range 1 to 5.
- COMPUTE creates the rule outcome variable Likert1to5_Satis_, where 1 indicates an invalid value for the variable Satis.
- The first VARIABLE ATTRIBUTE command marks Likert1to5_Satis_ as an outcome variable in the data dictionary.
- The second VARIABLE ATTRIBUTE command links the rule to the variable Satis. It also references the outcome variable Likert1to5_Satis_.
- VALIDATEDATA specifies Satis as an analysis variable. In addition to performing default checks for constants, variables with a large proportion of missing values, and empty cases, the procedure summarizes violations of single-variable validation rules defined for Satis.
- CASEREPORT reports all cases having a rule violation.
- SAVE saves a variable to the active dataset that indicates the total number of validation rule violations per case.
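The logic of the rule outcome variable can be mirrored in a short plain-Python sketch. It follows the COMPUTE expression above (flag anything that is not an integer from 1 to 5), but it is only an illustration, not SPSS behavior in every detail. In particular, it assumes missing values (None) are simply not flagged, in the spirit of FlagSystemMissing='No'; the function name and sample values are hypothetical.

```python
import math


def likert_violation(value):
    """Return 1 when value is not an integer in the range 1 to 5, else 0.

    Mirrors NOT(Satis LE 5 AND Satis GE 1 AND Satis=TRUNC(Satis)).
    Assumption: a missing value (None) is not flagged.
    """
    if value is None:
        return 0
    is_valid = 1 <= value <= 5 and value == math.trunc(value)
    return 0 if is_valid else 1


# Hypothetical values of Satis, one per case.
satis = [1, 5, 3.5, 0, 6, None, 4]

# Outcome variable: 1 marks an invalid value.
outcome = [likert_violation(v) for v in satis]

# 1-based case numbers with a violation, comparable to a case report.
violating_cases = [i + 1 for i, flag in enumerate(outcome) if flag == 1]

# With a single rule, the per-case violation count equals the outcome flag.
violations_per_case = outcome
```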
Using Cross-Variable Validation Rules
DATAFILE ATTRIBUTE
  ATTRIBUTE=$VD.CRule[1]("Label='Pregnant Male',"+
    "Expression='Sex =''Male'' AND Pregnant =''Yes''',"+
    "OutcomeVar='PregnantMale_'").
COMPUTE PregnantMale_= Sex ='Male' AND Pregnant = 'Yes'.
VARIABLE ATTRIBUTE
  VARIABLES=PregnantMale_
  ATTRIBUTE=$VD.RuleOutcomeVar("Yes").
VALIDATEDATA CROSSVARRULES=$VD.CRule[1]
  /CASECHECKS REPORTEMPTY=NO.
- DATAFILE ATTRIBUTE defines the cross-variable rule $VD.CRule[1].
- COMPUTE creates the outcome variable PregnantMale_ referenced by the cross-variable rule. For PregnantMale_, values of 1 identify cases containing males coded as being pregnant.
- VARIABLE ATTRIBUTE marks PregnantMale_ as an outcome variable in the data dictionary.
- VALIDATEDATA specifies $VD.CRule[1] as a cross-variable rule to be summarized. The procedure reports the total number of pregnant males in the active dataset.
- CASECHECKS turns off the default check for empty cases.
- By default, the first 500 cases that violate at least one validation rule are listed.
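The cross-variable rule and its summary can be sketched in plain Python as follows. This is an illustration only, not the procedure's implementation. It mirrors the COMPUTE expression for PregnantMale_ and the default listing of the first 500 violating cases; the function name, the CASE_LIMIT constant, and the sample records are hypothetical.

```python
def pregnant_male(case):
    """Return 1 when Sex is 'Male' and Pregnant is 'Yes', else 0."""
    return int(case["Sex"] == "Male" and case["Pregnant"] == "Yes")


# Hypothetical sample records.
cases = [
    {"Sex": "Female", "Pregnant": "Yes"},
    {"Sex": "Male", "Pregnant": "Yes"},
    {"Sex": "Male", "Pregnant": "No"},
    {"Sex": "Male", "Pregnant": "Yes"},
]

# Outcome variable: 1 identifies a male coded as pregnant.
outcome = [pregnant_male(case) for case in cases]

# Rule summary: total number of violations in the dataset.
total_violations = sum(outcome)

# Case listing limited to the first 500 violating cases (1-based case numbers).
CASE_LIMIT = 500
listed_cases = [i + 1 for i, flag in enumerate(outcome) if flag == 1][:CASE_LIMIT]
```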