Creating Dichotomous Valid/Missing Variables for Diagnosing Missing Data

To determine whether the pattern of missing data is random, we create, for each original variable, a diagnostic variable that indicates whether that variable is valid or missing for each case in the data set.  Each diagnostic variable is dichotomous, using the value 1 for 'Valid' and the value 0 for 'Missing'.
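
The 1/0 coding described above can be sketched in a few lines of Python.  This is only an illustration of the logic, not the SPSS procedure used in this exercise.  It assumes that a missing value is stored as None and that any user-defined missing codes are passed in explicitly; the function name valid_missing_indicator and the sample data are hypothetical.

```python
def valid_missing_indicator(values, missing_codes=()):
    """Return 1 ('Valid') or 0 ('Missing') for each case in values.

    Assumption: a missing value is stored as None, or as one of the
    user-defined codes listed in missing_codes (for example, 9).
    """
    return [0 if v is None or v in missing_codes else 1 for v in values]


# Hypothetical data: five cases, two of them with no recorded value.
grade = [3, None, 5, 4, None]
grade_ = valid_missing_indicator(grade)
print(grade_)
```

Each case keeps its position in the list, so the diagnostic values line up one-to-one with the cases of the original variable.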

 

Since we may need to refer back to the original variables in the course of the missing data analysis, I recommend a naming convention for the diagnostic variables that makes it easy to identify the original variable.  If the original variable name is fewer than eight characters long, an underscore is appended to the end of the original variable name, e.g. the diagnostic variable for "race" would be "race_".  If the original variable name is exactly eight characters long, the last character is replaced with an underscore, e.g. the diagnostic variable name for "response" would be "respons_".  If replacing the last character with an underscore would duplicate a name already assigned to the diagnostic variable for another eight-character variable, we instead drop the last two characters of the original name and append an underscore followed by a sequence letter or digit, e.g. the diagnostic variable name for "response" would be "respon_1" if the name "respons_" had already been used for another diagnostic variable.
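
The naming rules above can be expressed as a short Python function.  This is a sketch under stated assumptions rather than part of the SPSS exercise: the function name diagnostic_name and the variable "respons2" are hypothetical, the order in which sequence characters are tried (digits 1 to 9, then lowercase letters) is my own choice, and a collision for a name shorter than eight characters is reported as an error because the convention does not cover that case.

```python
import string

MAX_LEN = 8  # name-length limit that the naming convention assumes


def diagnostic_name(original, used):
    """Return a diagnostic variable name for original and record it in used.

    used is a set of diagnostic names that have already been assigned.
    """
    if len(original) > MAX_LEN:
        raise ValueError("names longer than eight characters are not covered")
    if len(original) < MAX_LEN:
        candidate = original + "_"        # "race" becomes "race_"
    else:
        candidate = original[:-1] + "_"   # "response" becomes "respons_"
    if candidate in used:
        if len(original) < MAX_LEN:
            raise ValueError("collision rule is only defined for eight-character names")
        stem = original[:-2]              # drop the last two characters
        # Assumed sequence order: digits 1-9 first, then lowercase letters.
        for suffix in string.digits[1:] + string.ascii_lowercase:
            candidate = stem + "_" + suffix
            if candidate not in used:
                break
        else:
            raise ValueError("no unused sequence character is left")
    used.add(candidate)
    return candidate


assigned = set()
for name in ["race", "respons2", "response"]:
    print(name, diagnostic_name(name, assigned))
```

In this run the hypothetical variable "respons2" takes the name "respons_" first, so "response" falls through to the sequence rule and receives "respon_1".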

 

When we assign variable labels to the diagnostic variables, we can add a keyword to the original variable label to designate it as a missing/valid diagnostic variable, e.g. the diagnostic variable for an original variable labeled "Grade Level" could be labeled "Grade Level (Valid/Missing)".

 

We will demonstrate the process of creating dichotomous Valid/Missing variables for diagnosing missing data using the variables in the HATMISS.SAV data set.  If the copy of HATMISS.SAV that you are working with does not have variable labels and value labels, do the exercise "Applying a Data Dictionary" to apply the data labels from the HATCO.SAV data set to the HATMISS.SAV data set.  A quick test for the presence of variable labels is to hold the mouse pointer over a variable name in the data editor.  If a label appears in a yellow tips box, a variable label has been assigned to that variable.