Documentation of Quantitative Data
Guide to constructs
The guide to constructs summarizes the procedures used to collect the data, the nature of the data collected and the constructs measured.
Guide to scales and their construction
The PAIR Project includes many scales and variables from the diary data. The "Guide to Scales and their Construction" summarizes information about each scale we have used or developed.
Codebooks are essential documents in a longitudinal research project. They provide information on the meaning and placement of variables in data files within the database. To ensure proper documentation, each file in the database has its own codebook. The PAIR Project maintains three types of codebooks:
Questionnaire Codebook. The Questionnaire Codebook involves placing variable names on a copy of the actual questionnaires used in the study. This codebook provides a quick reference for variable names associated with each questionnaire used during data collection.
Extensive Codebook. The Extensive Codebook is more involved and provides in-depth information about the placement of variables and codes used in a particular datafile.
Three versions of the Extensive Codebook exist: (1) dyadic data, (2) individual data, and (3) daily diary data. Recently we also created an "Integrated Codebook," a couple-level extended codebook that includes the most important aggregate variables of the study. Each Extensive Codebook is built from a template that holds the information common to every codebook and the essential information needed to construct a new one. The following pieces of information are provided by the Extensive Codebook:
Each Extensive Codebook contains the following variables to fully identify the data and ensure proper linking of data files: Couple number (CPL), Spouse (SPS, for individual data), and Phase (PHASE).
The PAIR Project also maintains general guidelines that keep the naming of variables and the meaning of codes consistent across phases. In the dyadic-level data, the first letter of a variable name identifies the participant answering the question (H = Husband; W = Wife). Up to six additional characters define unique variables. These characters usually provide an abbreviated description of the question. For example, when asking about leisure preferences, the extent to which a person likes to attend parties is coded with the characters PARTY. The last character designates the phase of data collection.
If a measure is used during more than one data collection phase, its variable name is identical at each phase except for the phase digit. This keeps variable names consistent from phase to phase.
The variable names in the Individual data files are the same as in the Dyadic data files; however, the H/W distinction is omitted. The husband / wife distinction is made through a variable (SPS) which indicates which spouse provided the data.
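The naming convention can be illustrated with a short Python sketch. The function names and the example names (such as HPARTY1) are hypothetical, and the sketch assumes that the phase is a single trailing digit; actual names should always be checked against the codebooks.

```python
def parse_dyadic_name(name):
    """Split a dyadic variable name into (spouse, stem, phase).

    Assumes the layout described above: H or W, then a short
    descriptive stem, then one phase digit.
    """
    spouse, stem, phase = name[0], name[1:-1], name[-1]
    if spouse not in ("H", "W"):
        raise ValueError(f"{name!r} does not start with H or W")
    if not stem:
        raise ValueError(f"{name!r} has no descriptive stem")
    if not phase.isdigit():
        raise ValueError(f"{name!r} does not end with a phase digit")
    return spouse, stem, int(phase)


def individual_name(dyadic_name):
    """Drop the H/W prefix to get the name used in the individual files."""
    parse_dyadic_name(dyadic_name)  # validate the layout first
    return dyadic_name[1:]


def name_at_phase(name, phase):
    """Return the name of the same measure at another phase."""
    return name[:-1] + str(phase)
```

For instance, under these assumptions `individual_name("WPARTY3")` gives `"PARTY3"`, with the spouse then identified by SPS rather than by the variable name.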
Each codebook has a header that identifies three key pieces of information: (1) the filename of the codebook, (2) the corresponding data files (both ASCII and SPSS) and (3) the date the codebook was last revised.
It is also important for individuals to keep track of how they have used, manipulated, or changed their own subset of data for use in their own research. The following provides an example of such personal documentation procedures with a project that involved the sex-typing of couples' leisure.
One potentially useful way of classifying leisure activities is by whether the activity is sex-typed, that is, liked more by men than by women or vice versa. The scales described below represent a person's average liking for male-preferred, female-preferred, and undifferentiated leisure activities.
Development of Scale
In principle, determining which leisure activities are liked in different degrees by men and women is a simple matter. Since some of the variance in leisure preferences can reasonably be assumed to be due to couples liking similar activities, paired t-tests between husbands' preferences and wives' preferences can be used to test for differences in preferences. In practice, however, because there were four phases of data, some leisure activities are difficult to classify. For instance, an activity may be significantly masculine at one time but show no significant sex difference at another time. Therefore, to build the scales, a somewhat arbitrary decision rule must be made. In our case we decided on a rather stringent set of criteria for considering an item to be male-preferred or female-preferred. The husband-wife difference for an item had to be significantly different from zero in the same direction in at least three of the four phases, and if the difference was not significant in one phase, the t-value for that phase still had to be in the same direction as the three significant values. For instance, the item "bar" listed below is not considered male-preferred or female-preferred because, in spite of three significant results toward men preferring the activity more than women, the fourth result leans toward women preferring the activity more than men. We labeled all items that were not male-preferred or female-preferred by our decision rules "undifferentiated."
It must be stressed that the way we have built these scales is not the only possible way to do so. If another decision rule is appropriate for a certain project, one may wish to change the scales. The results of the paired t-tests for each item are listed in Table 1 so that one can apply different criteria if one wishes to do so. The decisions about items are based on our criteria.
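For readers who wish to try other criteria, the decision rule can be written as a small Python sketch. It is not the project's own syntax. The function name, the sign convention (positive t-values mean husbands like the activity more than wives), and the .05 significance level are assumptions made for illustration, and any values passed to it should come from the actual t-test results.

```python
def classify_item(t_values, p_values, alpha=0.05, min_significant=3):
    """Classify one leisure item from its paired t-tests, one per phase.

    t_values: t statistic per phase, assumed husband minus wife
              (positive = liked more by husbands).
    p_values: two-tailed p-value per phase, in the same order.
    alpha and min_significant are adjustable so that other decision
    rules can be tried; the defaults are assumptions.
    """
    if len(t_values) != len(p_values):
        raise ValueError("need one p-value for every t-value")
    for label, sign in (("male-preferred", 1), ("female-preferred", -1)):
        # Every phase, significant or not, must lean the same way.
        same_direction = all(sign * t > 0 for t in t_values)
        # Count the phases that are significant in that direction.
        n_significant = sum(
            1 for t, p in zip(t_values, p_values)
            if sign * t > 0 and p < alpha
        )
        if same_direction and n_significant >= min_significant:
            return label
    return "undifferentiated"
```

An item with three significant positive t-values and a fourth, nonsignificant negative one (the pattern described for "bar") comes out as "undifferentiated" under this sketch, because the fourth phase leans the other way.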
It is also important for teams of researchers to keep track of data manipulations and variable creation as they proceed. The following section provides three examples of such documentation.
Computing Variables from the Daily Calls: Three Examples
Since the daily data are not set up in the traditional form (with one line of data per individual or couple), it is important to know how to manipulate them to create variables that can be used in conjunction with the Integrated Data File. The possible manipulations are endless, and the best way to learn them is to see worked examples. The three examples below were chosen because they differ considerably from one another and each is complicated in its own way. Many of the tasks within the examples can readily be adapted to create other variables that you may need.
Example One: Creating an "average amount of time in leisure per day" variable.
One common type of variable one may want from the diary data is the average amount of time per day a participant spent doing some activity. The following example shows how one can compute the average amount of time spent in leisure per day. If one is interested in a more specific leisure activity (e.g., how much time was spent watching TV), one can simply select only those activities referring to TV before following the procedures listed below.
Finding average leisure per day. First you have to get the total leisure time for each call (i.e., day).
Finding average leisure per day with spouse alone.
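The two computations in this example can be sketched in Python as follows. This is an illustration rather than the project's own syntax. It assumes one row per leisure activity, and the field names (person, call, minutes, companion) are hypothetical. It also assumes that a companion code of "1000" marks time with the spouse alone in the leisure data, which should be checked against the codebook.

```python
from collections import defaultdict


def average_minutes_per_day(activities, calls, keep=lambda row: True):
    """Average daily minutes per person.

    activities: one dict per leisure activity, with the hypothetical
                keys "person", "call", and "minutes".
    calls:      dict mapping each person to the list of calls (days)
                that person completed.
    keep:       optional filter applied to each activity row.
    """
    # Step 1: total minutes for each call (i.e., day).
    totals = defaultdict(float)
    for row in activities:
        if keep(row):
            totals[(row["person"], row["call"])] += row["minutes"]
    # Step 2: average over all completed calls, so that days with
    # no qualifying activity count as zero minutes.
    return {
        person: sum(totals.get((person, call), 0.0) for call in call_ids)
        / len(call_ids)
        for person, call_ids in calls.items()
        if call_ids
    }


def with_spouse_alone(row):
    """Assumed filter: companion code "1000" means spouse alone."""
    return row.get("companion") == "1000"
```

Dividing by the number of completed calls, rather than by the number of days on which an activity was reported, is a deliberate choice here: a day with no leisure has no activity rows, and would otherwise drop out of the average.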
Example Two: Creating Conversation Variables
If one were interested in marital closeness, one might want to know the total amount of time a person spends in conversation with his or her spouse. Although this would seem to be the same process as one would follow with the leisure data, it is a bit different because the data are organized differently. Instead of each conversation being a line of data (as each leisure activity is), each call has one line of data containing as many conversations as that person had that day. We'll go through the steps of computing both total and average time in conversations each day.
Computing total time in conversation each day.
Computing time talking with spouse alone.
The basic issue here is that only conversations coded "1000" should count toward the amount of time talking with the spouse alone. This is a bit trickier for the conversation data than for the leisure data because each call is a case rather than each conversation. Thus, one cannot simply select cases where a conversation is coded "1000" because many cases have some conversations that are coded "1000" and some that are coded something else.
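The logic can be sketched in Python. This is an illustration under stated assumptions, not the project's own syntax: each call is represented as one record holding a list of (code, minutes) pairs, and the field names "person" and "conversations" are hypothetical. Only the code "1000" comes from the text above.

```python
from collections import defaultdict

SPOUSE_ALONE = "1000"  # code for conversations with the spouse alone


def minutes_in_call(call, code=None):
    """Total conversation minutes in one call (one line of data).

    With code=None every conversation counts; otherwise only the
    conversations carrying that code are summed.
    """
    return sum(
        minutes
        for conv_code, minutes in call["conversations"]
        if code is None or conv_code == code
    )


def average_minutes_per_day(calls, code=None):
    """Average daily conversation minutes for each person."""
    per_person = defaultdict(list)
    for call in calls:
        per_person[call["person"]].append(minutes_in_call(call, code))
    return {
        person: sum(daily) / len(daily)
        for person, daily in per_person.items()
    }
```

Because the filter is applied to each conversation within a call rather than to whole cases, a call that mixes "1000" conversations with others contributes only its "1000" minutes, and a call with no such conversations contributes zero.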
Example Three: Creating a "Diversity of Leisure Activities" Score
Sometimes, one may wish to know the diversity of activities (i.e., the number of different activities) that a person engaged in rather than simply the total amount of time doing activities. The case below describes finding the number of different leisure activities the individuals did in Phase 4.
Get a count of how many times the person did each of the types of leisure.
Get a count of the number of different activities done by each person.
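The two steps above can be sketched in Python. This is an illustration only, assuming one row per leisure activity that has already been restricted to the phase of interest; the field names "person" and "activity" and the activity labels used with it are hypothetical.

```python
from collections import Counter, defaultdict


def activity_counts(activities):
    """Step 1: how many times each person did each type of leisure."""
    counts = defaultdict(Counter)
    for row in activities:
        counts[row["person"]][row["activity"]] += 1
    return counts


def diversity_scores(activities):
    """Step 2: number of different activities done by each person."""
    return {
        person: len(per_activity)
        for person, per_activity in activity_counts(activities).items()
    }
```

Note that a person with no leisure rows at all does not appear in the result; if such participants should receive a score of zero, they need to be added from the list of participants before the scores are merged with other files.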
The PAIR Project at the University of Texas at Austin
Principal Investigator, Ted L. Huston
Page last modified: 20 September 2002