
Preparing Quantitative Data for Analysis




This part of our project is not specifically designed to address longitudinal research. Nevertheless, the preparation of data for analysis is an important part of data handling, especially when large amounts of data are involved, as is usually the case with longitudinal research.


Importance of Master Copies: The Original Data Principle

Always save a computer file copy of the original, unaltered data (Davidson, 1996, p. 61). In our case, we save two master copies of our complete data set: one kept in the lab and one back-up copy kept off-site. Our master copies are held on Iomega 100 MB 'Zip disks' because of the large amount of information these disks can hold. Although our complete data set is quite large, one Zip disk can hold all four phases of data.
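The idea of keeping a lab copy and an identical back-up copy can be illustrated in code. What follows is only a minimal sketch, not our actual procedure: the function names `sha256_of` and `make_verified_copy` and the file paths are hypothetical, and it assumes the master copy is a single file. It copies the file and then compares checksums to confirm the back-up is byte-identical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_verified_copy(master: Path, backup: Path) -> str:
    """Copy the master file to a backup location and check the two match.

    Hypothetical helper: returns the shared checksum, or raises if the
    copy differs from the original.
    """
    backup.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(master, backup)  # copy2 also preserves timestamps
    master_hash = sha256_of(master)
    if sha256_of(backup) != master_hash:
        raise IOError(f"Backup {backup} does not match master {master}")
    return master_hash
```

Recording the returned checksum alongside the back-up would make it possible to check later that neither copy has changed.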


Importance of Password Protection

It is essential that we protect the master copies with a password. The password helps preserve the integrity of the master copies by restricting access to the most knowledgeable and responsible members of the project staff. We also 'write-protect' our zip disks, so that members of the research team can copy data from them without being able to change the information.

The project staff varies in size and often includes undergraduate students, graduate students, and professionals, all with varying amounts of knowledge of and experience with the PAIR project data. Those with imperfect knowledge of the project and the data system may unintentionally contaminate the master copy. The more people use the master copy, the greater the chance that the data will be corrupted accidentally. These kinds of human errors can happen even to the most careful and responsible person, so unnecessary access to the master copies must be avoided.

For example, one of our senior researchers was working with a graduate student who was new to the project. While this new student was working with the data on the master copy, she came across what looked to be an 'error' in the database. Without careful thought (and without the master copy being password protected), the supposed 'error' might have been 'corrected,' thereby jeopardizing the integrity of the whole datafile. Fortunately, the senior researcher advised the student NOT to attempt to correct what looked to be a 'mistake' (which, on further investigation, turned out not to be an 'error').

Without password protection, someone who does not know every detail of the data system (or someone who does, but lacks the time to check properly what appears to be a mistake) may harm the database despite intending to do good.
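The write-protection mentioned above has a rough file-system analogue. The sketch below is an assumption-laden illustration, not a description of how our disks are protected: it treats the master copy as an ordinary file and clears its write-permission bits, and the helper names `write_protect` and `is_write_protected` are hypothetical. It is not a substitute for a password, since anyone with sufficient rights can restore the permissions.

```python
import stat
from pathlib import Path

# All three write-permission bits (owner, group, others).
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def write_protect(path: Path) -> None:
    """Clear every write-permission bit so the file can be read but not modified."""
    mode = stat.S_IMODE(path.stat().st_mode)
    path.chmod(mode & ~WRITE_BITS)


def is_write_protected(path: Path) -> bool:
    """Return True when no write-permission bit is set on the file."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return (mode & WRITE_BITS) == 0
```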


Preparing a Data Set for Analysis

In order to conduct specialized statistical analyses, each researcher must extract the necessary data from a master copy. Because the master copies contain a vast amount of data, it is important to think in advance about exactly what data is needed. A checklist in which researchers answer the following types of questions is helpful and saves time, because it specifies in detail what is needed and so makes it much easier to copy the proper data from a master copy. Examples of such questions are:

What level of analysis do you need? (individual or couple level)
Which Phase(s) is/are needed? (Phase 1, 2, 3, and/or 4)
Do you need diary and/or questionnaire data?
What kind of measurements do you need? Which variables (raw variables or transformed/summed variables, integrated data or individual measures)? Which scales?
Are there special requirements for the data? (Only married couples? Just parents? Only parents of boys?)

In essence, the PAIR project is a collection of databases, each at a different level of analysis. Deciding precisely what data is needed is a time-consuming activity that needs to be done up front, before a copy of the data can be made.
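The checklist questions above can be written down as a small structured record and used to select cases. This is only a sketch under stated assumptions: the record layout (keys such as `case_id`, `level`, `phase`, `source`) and the names `ExtractionChecklist` and `extract` are hypothetical and do not describe the actual PAIR data files.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class ExtractionChecklist:
    """Answers to the checklist questions for one research project."""
    level: str                 # "individual" or "couple"
    phases: List[int]          # which phases are needed, e.g. [1, 2]
    sources: List[str]         # "diary" and/or "questionnaire"
    variables: List[str]       # names of the variables to keep
    # Optional special requirement, e.g. only married couples.
    requirement: Optional[Callable[[Record], bool]] = None


def extract(records: List[Record], checklist: ExtractionChecklist) -> List[Record]:
    """Return only the rows and variables that the checklist asks for."""
    keep = ["case_id", "phase"] + checklist.variables
    selected = []
    for row in records:
        if row["level"] != checklist.level:
            continue
        if row["phase"] not in checklist.phases:
            continue
        if row["source"] not in checklist.sources:
            continue
        if checklist.requirement is not None and not checklist.requirement(row):
            continue
        selected.append({name: row[name] for name in keep})
    return selected
```

Writing the answers down in one place like this also means the same record can later be filed as a description of how the sample was selected.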

For each individual research project, the specialized data set extracted from the project's master copy becomes a 'little' master copy in its own right, which the researcher will use as basic reference data. From here, one or more additional copies of this data should be created. In essence, this repeats the Original Data Principle noted above: Always save a computer file copy of the original, unaltered data (Davidson, 1996, p. 61).


Importance of Documentation: The Reinvented Wheel Principle

The Reinvented Wheel Principle states: Keep an archive of successful procedures, routines, and programs so that you do not have to rediscover and redesign them each time they are needed. This principle implies that you set up a system to organize and catalog your data and programs. Don't needlessly re-invent the wheel (Davidson, 1996, p. 206). On our project we encourage two types of documentation: personal documentation and team documentation.

Personal documentation of analysis and results

Once the relevant data have been extracted from a master copy to a personalized data set, keep a list of how that particular data was generated. It is important to know how your specific sample was selected from the larger sample. In essence, the answers to the questions asked above will provide you with such a list. This checklist will become the first page in your personal notebook and is the first important step to documenting your data.

It is important to create a notebook for each project or study you are working on. In a prominent position on the outside, or certainly on the first page, of this notebook should be the name of your datafile. In your notebook, you should record in a timely manner all changes to the data, all analyses performed, all results, and all syntax files. The rule here is: Do your documentation well, otherwise you will never be able to reconstruct what you did. Since such reconstruction would inevitably need to be done, poor documentation results in having to re-do the analyses from scratch. Nearly everyone who has worked on the PAIR Project has learned this lesson the hard way. It's frustrating to find out that you don't remember how an important printout of significant results was generated.

Relabel your data set whenever you make changes (e.g., recoding), and document how and why you made them. Although it may seem that we are advocating a never-ending paper trail, we are not: once your perfect solution is found, throw away all documentation that was generated before it, so that you do not make the mistake of repeating incorrect analyses. See the development and description of the Sex-Typing of Leisure Activity scale for an example of personal documentation procedures.
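For those who keep their notebook electronically, the habit of relabeling and documenting each change might look something like the following. This is a hedged sketch only: the log format (one JSON object per line), the function names `log_change` and `read_log`, and the example data set labels are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def log_change(log_path: Path, old_name: str, new_name: str,
               what: str, why: str) -> Dict[str, str]:
    """Append one entry recording a relabeled data set and the reason for the change."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "old_dataset": old_name,   # label before the change
        "new_dataset": new_name,   # label after the change
        "what": what,              # how the data were changed
        "why": why,                # why the change was made
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_log(log_path: Path) -> List[Dict[str, str]]:
    """Return all logged changes in the order they were made."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

A call such as `log_change(Path("changes.jsonl"), "leisure_v1", "leisure_v2", "recoded item 7", "item was reverse-scored")` (all values hypothetical) would add one line to the log.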

Team documentation of local procedures

As part of the larger team project, you will also want to develop a team notebook of helpful procedures. Think of this notebook as an 'insider's guide,' or sourcebook, of procedures that help others get through common problems encountered in local analyses. This document should be kept at the lab, accessible to all who conduct specific analyses (e.g., aggregating the telephone diary data; common EQS problems). For an example, see the team documentation example for computing variables from the daily calls.


Updating Master Copies

If a researcher discovers data or codebook mistakes that may warrant changes to the master copy, this discovery needs to be documented. The documentation notes when and where the mistake was found, as well as the date on which the correction was completed. There needs to be a good reason for any change to the master copy, because every change requires accompanying changes in the codebook(s), the manuals, and all phase data. Even a small change may require a lot of work. Major changes to the master copy will need to be decided by a committee of senior researchers.
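The record kept for each proposed correction could be sketched as follows. The class name `CorrectionRecord`, its fields, and the rule that a correction counts as complete only once every accompanying artifact has been updated are assumptions made for illustration; they are one possible reading of the requirements above, not our documented procedure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Artifacts that must be changed together with the master copy.
AFFECTED_ARTIFACTS = ("codebook", "manuals", "phase data")


@dataclass
class CorrectionRecord:
    """Documents one mistake found in the data or codebook."""
    found_on: date                       # when the mistake was found
    location: str                        # where it was found
    description: str                     # what appears to be wrong
    corrected_on: Optional[date] = None  # date the correction was completed
    updated: List[str] = field(default_factory=list)

    def mark_updated(self, artifact: str) -> None:
        """Note that one accompanying artifact has been brought up to date."""
        if artifact not in AFFECTED_ARTIFACTS:
            raise ValueError(f"Unknown artifact: {artifact}")
        if artifact not in self.updated:
            self.updated.append(artifact)

    def complete(self, when: date) -> None:
        """Record the completion date, but only once every artifact is updated."""
        missing = [a for a in AFFECTED_ARTIFACTS if a not in self.updated]
        if missing:
            raise ValueError("Still to update: " + ", ".join(missing))
        self.corrected_on = when
```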

Periodically, one senior researcher will be responsible for updating the master copy and will follow a list of procedures on how to do so, including updating the back-up copy.


References

Davidson, F. (1996). Principles of Statistical Data Handling. Thousand Oaks, CA: Sage.


The PAIR Project at the University of Texas at Austin
Principal Investigator, Ted L. Huston
Page last modified: 16 January 2002