# Documentation of Quantitative Data

## Guide to constructs

The "Guide to Constructs" summarizes the procedures used to collect the data, the nature of the data collected, and the constructs measured.

## Guide to scales and their construction

The PAIR Project includes many scales and variables from the diary data. The "Guide to Scales and their Construction" summarizes information about each scale we have used or developed.

## Codebooks

Codebooks are essential documents in a longitudinal research project. They provide information on the meaning and placement of variables in data files within the database. To ensure proper documentation, each file in the database has its own codebook. The PAIR Project maintains three types of codebooks:

Questionnaire Codebook. The Questionnaire Codebook consists of copies of the actual questionnaires used in the study, with the variable name written beside each item. It provides a quick reference for the variable names associated with each questionnaire used during data collection.

Extensive Codebook. The Extensive Codebook is more involved and provides in-depth information about the placement of variables and the codes used in a particular data file.

Three versions of the Extensive Codebook exist: (1) dyadic data, (2) individual data, and (3) daily diary data. Recently we created an "Integrated Codebook," a couple-level extended codebook that includes all the most important aggregate variables of the study. Each Extensive Codebook was built from a common template, which contains the information that is the same in every codebook and also provides the essential information needed to construct new codebooks. The Extensive Codebook provides the following pieces of information:

• Question: The complete question
• Variable name: Variable name assigned to each question in the data file
• Value: Acceptable coded values for each question
• Value Label: Verbal description of the values listed under Value (above)
• Missing: Indicates the number of missing cases for the variable
• Explanation: Catch-all category giving information vital to understanding the question or questionnaire (e.g., reverse scoring, omitted items)
• Type of variable: Indication of the variable type (e.g., string, continuous, categorical, scale component)
• Source: Source of the question if appropriate
• Calculation: Provides formula for computing variable if appropriate
• Row: For ASCII data, provides the row number of the variable
• Column: For ASCII data, provides the column number(s) of the variable

Each Extensive Codebook contains the following variables to fully identify the data and ensure proper linking of data files: Couple number (CPL), Spouse (SPS; individual data only), and Phase (PHASE). The PAIR Project also maintains general guidelines that ensure consistency across phases in the naming of variables and the meaning of codes. In the dyadic data, the first letter of a variable name identifies the participant answering the question (H = Husband; W = Wife). Up to six additional characters define the unique variable; these characters usually give an abbreviated description of the question. For example, for the leisure-preference item on how much a person likes to attend parties, these characters are PARTY. The last character designates the phase of data collection:

• 1 - Newlywed (1981)
• 2 - Married 1+ Years (1982)
• 3 - Married 2+ Years (1983)
• 4 - Married 13+ Years (1994)

If a measure is used during more than one data collection phase, the variable name is identical at each phase, substituting only the phase digit. This keeps variable names consistent from phase to phase.
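The naming rules are mechanical enough to express in a few lines. The sketch below uses hypothetical Python helpers (standard library only, not part of any PAIR tool) that build a dyadic variable name from the respondent letter, a stem of up to six characters, and the phase digit, and that rename a variable for another phase by substituting only the final digit.

```python
# Hypothetical helpers illustrating the PAIR dyadic naming convention:
# respondent letter (H/W) + stem of up to six characters + phase digit.
PHASES = {1: "Newlywed (1981)", 2: "Married 1+ Years (1982)",
          3: "Married 2+ Years (1983)", 4: "Married 13+ Years (1994)"}


def dyadic_name(respondent: str, stem: str, phase: int) -> str:
    """Build a dyadic variable name such as HPARTY1."""
    respondent = respondent.upper()
    if respondent not in ("H", "W"):
        raise ValueError("respondent must be 'H' (husband) or 'W' (wife)")
    if not 1 <= len(stem) <= 6:
        raise ValueError("stem must be 1 to 6 characters")
    if phase not in PHASES:
        raise ValueError("phase must be 1, 2, 3, or 4")
    return f"{respondent}{stem.upper()}{phase}"


def for_phase(name: str, phase: int) -> str:
    """Rename a variable for another phase by replacing only its last character."""
    if not name or not name[-1].isdigit() or phase not in PHASES:
        raise ValueError("expected a name ending in a phase digit and a valid phase")
    return name[:-1] + str(phase)


print(dyadic_name("H", "party", 1))  # HPARTY1
print(for_phase("HPARTY1", 4))       # HPARTY4
```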

The variable names in the Individual data files are the same as in the Dyadic data files, except that the H/W prefix is omitted. Instead, the husband/wife distinction is made through a variable (SPS) that indicates which spouse provided the data.
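Because individual files carry SPS instead of the H/W prefix, moving individual-level data to the couple level amounts to pairing each couple's two rows and prefixing each variable with H or W. The Python sketch below illustrates only that idea and is not the SPSS procedure of Appendix 1. It assumes SPS = 1 for husbands and SPS = 2 for wives (check the codebook for the actual coding), and the values are made up.

```python
# Sketch: reshape individual-level rows (one per spouse) into dyadic rows (one per couple).
# ASSUMPTION: SPS is coded 1 = husband, 2 = wife; check the codebook for the real codes.
SPOUSE_PREFIX = {1: "H", 2: "W"}
ID_VARS = ("CPL", "SPS", "PHASE")


def individual_to_dyadic(rows):
    """rows: list of dicts with CPL, SPS, PHASE and individual-level variables."""
    couples = {}
    for row in rows:
        key = (row["CPL"], row["PHASE"])
        couple = couples.setdefault(key, {"CPL": row["CPL"], "PHASE": row["PHASE"]})
        prefix = SPOUSE_PREFIX[row["SPS"]]
        for name, value in row.items():
            if name not in ID_VARS:
                couple[prefix + name] = value  # e.g. PARTY1 -> HPARTY1 or WPARTY1
    return [couples[key] for key in sorted(couples)]


rows = [  # made-up values
    {"CPL": 101, "SPS": 1, "PHASE": 1, "PARTY1": 4},
    {"CPL": 101, "SPS": 2, "PHASE": 1, "PARTY1": 2},
]
print(individual_to_dyadic(rows))
# [{'CPL': 101, 'PHASE': 1, 'HPARTY1': 4, 'WPARTY1': 2}]
```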

Each codebook has a header that identifies three key pieces of information: (1) the filename of the codebook, (2) the corresponding data files (both ASCII and SPSS) and (3) the date the codebook was last revised.

## Personal Documentation

It is also important for individuals to keep track of how they have used, manipulated, or changed their own subset of data for use in their own research. The following provides an example of such personal documentation procedures with a project that involved the sex-typing of couples' leisure.

One potentially useful way of classifying leisure activities is by whether the activity is sex-typed, that is, liked more by one sex than by the other. The scales described below represent a person's average liking for male-preferred, female-preferred, and undifferentiated leisure activities.

Development of Scale

In principle, determining which leisure activities men and women like to different degrees is a simple matter. Because some of the variance in leisure preference can reasonably be attributed to spouses liking similar activities, paired t-tests between husbands' and wives' preferences can be used to test for differences in preference. In practice, however, because there were four phases of data, some leisure activities are difficult to classify. For instance, an activity may show a significant male preference at one phase but no significant sex difference at another. Building the scales therefore requires a somewhat arbitrary decision rule. We chose a rather stringent set of criteria for considering an item male-preferred or female-preferred: the husband-wife difference had to be significantly different from zero, in the same direction, in at least three of the four phases, and if the difference was not significant in one phase, that phase's t-value still had to be in the same direction as the other three. For instance, the item "bar" in Table 1 is not considered male-preferred or female-preferred because, in spite of three significant results showing that men prefer the activity more than women, the fourth result leans toward women preferring the activity more than men. We labeled all items that were not male-preferred or female-preferred under these rules "undifferentiated."
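Stating the decision rule as code makes it precise and easy to vary. In this Python sketch, each item's four paired t-values are assumed to be computed as husband minus wife (so a positive t means men like the activity more), and significance is passed in as one flag per phase; both conventions are assumptions, and the example values are invented, not the Table 1 results.

```python
# Sketch of the "three of four phases" rule for classifying a leisure item.
# ASSUMPTION: t > 0 means husbands like the item more than wives (t = husband - wife).


def classify_item(t_values, significant, min_significant=3):
    """t_values: paired t per phase; significant: matching True/False flags."""
    if len(t_values) != len(significant):
        raise ValueError("need one significance flag per t-value")
    for direction, label in ((1, "male-preferred"), (-1, "female-preferred")):
        same_direction = all(t * direction > 0 for t in t_values)
        n_significant = sum(1 for t, sig in zip(t_values, significant)
                            if sig and t * direction > 0)
        # Every phase must lean the same way, and enough phases must be significant.
        if same_direction and n_significant >= min_significant:
            return label
    return "undifferentiated"


# Invented values with the same pattern as "bar": three significant male-preferred
# results, but the fourth phase leans toward wives.
print(classify_item([2.4, 3.1, 2.2, -0.5], [True, True, True, False]))
# undifferentiated
print(classify_item([2.4, 3.1, 2.2, 0.7], [True, True, True, False]))
# male-preferred
```

Changing `min_significant`, or the significance flags derived from Table 1, is one simple way to try a different decision rule.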

It must be stressed that our way of building these scales is not the only possible one. If another decision rule suits a particular project, one may wish to change the scales. The results of the paired t-tests for each item are listed in Table 1 so that others can apply different criteria; the item classifications reported here are based on our criteria.
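Once items are classified, each scale score is a person's average liking across the items in that category. The sketch below assumes one numeric liking rating per item and simply skips missing ratings; the item names, ratings, and classifications are hypothetical, and other missing-data rules are equally defensible.

```python
from statistics import mean

# Sketch: a person's score on each scale = mean liking rating of the items in that category.


def scale_scores(ratings, classification):
    """ratings: item -> rating (None if missing); classification: item -> category label."""
    items_by_category = {}
    for item, rating in ratings.items():
        if rating is None:
            continue  # ASSUMPTION: missing ratings are skipped, not imputed
        items_by_category.setdefault(classification[item], []).append(rating)
    return {category: mean(values) for category, values in items_by_category.items()}


# Hypothetical items and classifications, for illustration only.
classification = {"item_a": "male-preferred", "item_b": "male-preferred",
                  "item_c": "female-preferred", "item_d": "undifferentiated"}
ratings = {"item_a": 4, "item_b": 2, "item_c": 5, "item_d": None}
print(scale_scores(ratings, classification))
# {'male-preferred': 3, 'female-preferred': 5}
```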

## Team Documentation

It is also important for teams of researchers to keep track of data manipulations and variable creation as they proceed. The following section provides three examples of such documentation.

Computing Variables from the Daily Calls: Three Examples

Because the daily data are not set up in the traditional form (one line of data per individual or couple), it is important to know how to manipulate them to create variables that can be used with the Integrated Data File. Although the possible manipulations are endless, the best way to learn them is to see worked examples. The three examples below were chosen because they differ from each other and each is complicated in its own way. Many of the tasks within them can readily be applied to create other variables you may need.

Example One: Creating an "average amount of time in leisure per day" variable.

One common type of variable one may want from the diary data is the average amount of time per day a participant spent doing some activity. The following example shows how one can compute the average amount of time spent in leisure per day. If one is interested in a more specific leisure activity (e.g., how much time spent watching TV), one can simply select only those activities referring to TV before following the procedures listed below.

Finding average leisure per day. First you have to get the total leisure time for each call (i.e., day).

• Open a leisure data file (e.g., dayleis4.sav)
• Under the category Data, select Aggregate.
• Use CPL, SPS, and CALL as the Break Variables. Enter the variable you want aggregated in the Aggregate box (in this case, the variable is called "timeout4").
• Hit the Function button and select Sum (rather than the default, Mean)
• This will create a new file called "aggr.sav." (You can select other file names.) The new file will have the total time spent in leisure for each call.
• Open the new data file "aggr.sav."
• Unfortunately, getting the average leisure per day is not as simple as averaging the values for each call. If you scroll through the data, you will find missing calls. These gaps arise because, if a person reported no leisure for a call, nothing was entered for that call (i.e., no zero was entered to show that no leisure was done that day), so the aggregated file has no row for it rather than a zero. For example, if a person reported 60 minutes of leisure on three of the six (Phase Four) calls and no leisure on the other three, the aggregated file will list only the three calls with 60 minutes of leisure. Averaging across those would suggest that this person averaged 60 minutes of leisure per day when, in fact, s/he averaged 30 minutes per day. One way to solve this problem is to add back the missing calls with a value of zero. An easy way to do this is to make a "merge master"; see Appendix 2 for the details.
• After you have inserted zeros for all the "missing calls," you are ready to compute the average amount of leisure per call (i.e., per day). Choose Aggregate again, this time with the default function, Mean. Select CPL and SPS as the Break Variables and the daily leisure-time variable as the Aggregate variable.
• The new aggregate file should now have the average amount of leisure per day for each person. Before this new variable can be added to the Integrated Database, however, the data must be restructured so that the husband and the wife of each couple are listed on one line. To do this, see Appendix 1, "Moving Individual Level Data to the Integrated Database."
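The logic of these steps (sum per call, add zeros for calls with no leisure, then average per person) can be checked outside SPSS with a short Python sketch. Here the `completed_calls` dictionary stands in for the merge master described in Appendix 2; its structure, like the sample data, is an assumption for illustration.

```python
from collections import defaultdict
from statistics import mean

# Sketch of Example One: average leisure minutes per day for each person.


def average_leisure_per_day(records, completed_calls):
    """records: (cpl, sps, call, minutes) tuples, one per leisure activity.
    completed_calls: (cpl, sps) -> list of calls the person completed
    (a stand-in for the merge master)."""
    # Step 1: total leisure per call, like AGGREGATE /BREAK=CPL SPS CALL with SUM.
    per_call = defaultdict(int)
    for cpl, sps, call, minutes in records:
        per_call[(cpl, sps, call)] += minutes
    # Steps 2-3: calls with no leisure records count as zero, then average per person.
    return {
        (cpl, sps): mean(per_call.get((cpl, sps, call), 0) for call in calls)
        for (cpl, sps), calls in completed_calls.items()
    }


# Invented data: 60 minutes on three of six calls (call 3 has two activities).
records = [(101, 1, 1, 60), (101, 1, 3, 45), (101, 1, 3, 15), (101, 1, 5, 60)]
completed = {(101, 1): [1, 2, 3, 4, 5, 6]}
print(average_leisure_per_day(records, completed))  # {(101, 1): 30}
```

Passing only the calls that appear in the aggregated file (1, 3, and 5) would give 60, which reproduces the error described above.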

Finding average leisure per day with spouse alone.

• Open the data file (e.g., dayleis4.sav).
• Under Data, choose Select Cases. Click on the If box.
• To get leisure with the spouse alone, your "if" statement would be "nucout4=1 & asnout4=0 & relout4=0." The first condition specifies the husband and wife together without the kids; the other two specify that nobody else is there. This is the step where any combination of people can be selected. For example, to know how much leisure the husband and wife did together (regardless of who else was there), one would use "nucout4=1 OR nucout4=3."
• Once the cases have been selected, go through the aggregation process exactly as one did for finding the average amount of all leisure per day.
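Selecting cases only changes which activity records enter the sum; the zero-filling step still matters, because calls with no qualifying leisure drop out just as before. The sketch below, for a single person, expresses the two "if" statements as Python predicates. It assumes each record is a dict with the Phase 4 fields named in the text plus a `call` field; codes other than those quoted above (for example, nucout4 = 0) are invented for illustration.

```python
from statistics import mean

# Sketch: average daily leisure for one person, counting only selected activities.


def spouse_alone(act):
    # Same condition as the Select Cases "if": nucout4=1 & asnout4=0 & relout4=0
    return act["nucout4"] == 1 and act["asnout4"] == 0 and act["relout4"] == 0


def with_spouse(act):
    # Husband and wife together, regardless of who else was there: nucout4=1 OR nucout4=3
    return act["nucout4"] in (1, 3)


def average_selected(activities, completed_calls, keep):
    totals = {call: 0 for call in completed_calls}  # zero for every completed call
    for act in activities:
        if keep(act) and act["call"] in totals:
            totals[act["call"]] += act["timeout4"]
    return mean(totals.values())


acts = [  # invented records for one person
    {"call": 1, "timeout4": 60, "nucout4": 1, "asnout4": 0, "relout4": 0},
    {"call": 1, "timeout4": 30, "nucout4": 3, "asnout4": 0, "relout4": 0},
    {"call": 2, "timeout4": 90, "nucout4": 1, "asnout4": 1, "relout4": 0},
    {"call": 3, "timeout4": 20, "nucout4": 0, "asnout4": 0, "relout4": 0},
]
print(average_selected(acts, [1, 2, 3, 4], spouse_alone))  # 15
print(average_selected(acts, [1, 2, 3, 4], with_spouse))   # 45
```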

Example Two: Creating Conversation Variables

If one were interested in marital closeness, one may want to know how much time a person spends in conversation with his or her spouse. Although this would seem to follow the same process as for leisure, it is a bit different because the data are organized differently. Instead of each conversation being a line of data (as each leisure activity is), each call is one line of data containing as many conversations as the person had that day. We'll go through the steps of computing both the total and the average time in conversation each day.

Computing total time in conversation each day.

• Under Transform, choose Compute and create a new target variable. The numeric expression should be the sum of all the conversation times (e.g., for Phase 2, SUM(cnvatm2, cnvbtm2, ..., cnvxtm2)).
• Under Data choose Aggregate
• CPL and SPS should be your break variables, and the variable you computed in the first step should be the aggregate variable.
• Under Function choose SUM.
• Paste the syntax (i.e., don't hit the OK button, hit the PASTE button).
• Before the period at the end of the syntax, type "/ count = N." For example, if the variable you computed in the first step is called "cnvtim2," your final syntax should look something like:
```
* Total conversation time per person; count = number of calls (cases) per person.
AGGREGATE
  /OUTFILE='C:\SPSSWIN\AGGR.SAV'
  /BREAK=CPL SPS
  /cnvtim_1 = SUM(cnvtim2)
  /count = N.
```
• Run the syntax.
• Open your new file (in this case "aggr.sav"). It will have the total amount of time in conversation (in this case, a variable called "cnvtim_1") and a variable (in this case called count) that tells you how many scores were summed to reach that total.
• Because each completed call is a single case, "count" equals the number of calls completed, so "cnvtim_1" divided by "count" equals the average time per day spent in conversation.
• Compute the average (under Transform, use the Compute function. You can choose any target variable and the numeric expression should be "cnvtim_1 / count").
• You are now ready to "move individual level data to the integrated database." Please follow the directions in Appendix 1. You should probably delete the "count" variable before doing so.
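Because each call is already one case, the count N equals the number of completed calls, which is why dividing the total by count gives a per-day average without any zero-filling. The Python sketch below mirrors that computation on invented call rows; for brevity it uses only the first two Phase 2 conversation-time variables and treats an empty conversation slot as missing (None).

```python
# Sketch of Example Two: total and average daily conversation time per person.


def conversation_summary(calls, time_vars):
    """calls: one dict per completed call with CPL, SPS and conversation-time fields.
    time_vars: names of the conversation-time variables (None = no conversation)."""
    totals, counts = {}, {}
    for row in calls:
        person = (row["CPL"], row["SPS"])
        # Per-call total, like COMPUTE ... = SUM(cnvatm2, cnvbtm2, ...); blanks are ignored.
        day_total = sum(row[v] for v in time_vars if row.get(v) is not None)
        totals[person] = totals.get(person, 0) + day_total  # like SUM in AGGREGATE
        counts[person] = counts.get(person, 0) + 1          # like count = N
    return {p: (totals[p], counts[p], totals[p] / counts[p]) for p in totals}


calls = [  # invented data: three calls for one person
    {"CPL": 101, "SPS": 1, "cnvatm2": 30, "cnvbtm2": 15},
    {"CPL": 101, "SPS": 1, "cnvatm2": 10, "cnvbtm2": None},
    {"CPL": 101, "SPS": 1, "cnvatm2": None, "cnvbtm2": None},
]
total, count, average = conversation_summary(calls, ["cnvatm2", "cnvbtm2"])[(101, 1)]
print(total, count, round(average, 1))  # 55 3 18.3
```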

Computing time talking with spouse alone.

The basic issue here is that only conversations coded "1000" should count toward the time spent talking with the spouse alone. This is a bit trickier for the conversation data than for the leisure data because each case is a call rather than a conversation. Thus, one cannot simply select cases in which a conversation is coded "1000," because many cases contain some conversations coded "1000" and others coded something else.

• To solve this, I used a series of conditional recodes that set the time of a conversation to "0" if its code was not "1000." One has to do this for each conversation variable. Sample syntax I used was:
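```
* Zero the time of each conversation whose code is not 1000 (spouse alone).
DO IF (cnva3 ~= 1000).
RECODE cnvatm3 (0000 thru 9999=0).
END IF.
DO IF (cnvb3 ~= 1000).
RECODE cnvbtm3 (0000 thru 9999=0).
END IF.
* A single EXECUTE runs all pending transformations.
EXECUTE.
```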
• Continue this conditional recode for every conversation variable in the phase you are using.
• After running the syntax, check that every conversation coded "1000" still has its time value and that every conversation with any other code now has a time of zero.
• Save the file UNDER A NEW NAME.
• Your file is now set up so that only conversations with the spouse alone will be included. Go through the same steps you did to find the total and average amounts of conversation above.
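The conditional recode can be mirrored in Python to check its logic on a single call record. As in the SPSS syntax, a conversation whose code is missing is left untouched (a DO IF condition that evaluates to missing skips the block), and only times in the recoded range are set to zero. The code 2000 in the example is hypothetical.

```python
# Sketch: keep only spouse-alone conversations (code 1000) by zeroing the other times.
SPOUSE_ALONE = 1000


def keep_spouse_alone(row, pairs):
    """row: one call's data; pairs: (code variable, time variable) for each conversation.
    Returns a new dict, leaving the original untouched (like saving under a new name)."""
    out = dict(row)
    for code_var, time_var in pairs:
        code, time = out.get(code_var), out.get(time_var)
        if code is None or code == SPOUSE_ALONE:
            continue  # missing code: DO IF skips the block; code 1000: keep the time
        if time is not None and 0 <= time <= 9999:
            out[time_var] = 0  # RECODE ... (0000 thru 9999=0)
    return out


pairs = [("cnva3", "cnvatm3"), ("cnvb3", "cnvbtm3")]
call = {"CPL": 101, "SPS": 2, "cnva3": 1000, "cnvatm3": 25, "cnvb3": 2000, "cnvbtm3": 40}
print(keep_spouse_alone(call, pairs))
# {'CPL': 101, 'SPS': 2, 'cnva3': 1000, 'cnvatm3': 25, 'cnvb3': 2000, 'cnvbtm3': 0}
```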

Example Three: Creating a "Diversity of Leisure Activities" Score

Sometimes, one may wish to know the diversity of activities (i.e., the number of different activities) that a person engaged in rather than simply the total amount of time doing activities. The case below describes finding the number of different leisure activities the individuals did in Phase 4.

Get a count of how many times the person did each of the types of leisure.

• Under Data, select Aggregate.
• Use CPL, SPS, and ACTOUT4 as the break variables. (Note: in other phases, the activity-type variable is sometimes called "ACTTYPE." Use the variable name that corresponds to the data set you are using.)
• Click on the box that says "Save number of cases in break group as variable."

• You may either hit OK or paste the syntax. If you paste your syntax, it should look something like:
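```
* One row per person and activity type; N_BREAK = number of times that activity was reported.
AGGREGATE
  /OUTFILE='C:\11PAIRDA\AGGR.SAV'
  /BREAK=cpl sps actout4
  /N_BREAK=N.
```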

Note: Although N_BREAK is the default variable name, you can select another name that makes sense.

Get a count of the number of different activities done by each person.

• Open the data file you created above ("aggr.sav" if you used the syntax shown).
• Under Data, select Aggregate.
• Use CPL and SPS as the break variables.
• Click on the box that says "Save number of cases in break group as variable."
• Again, you may choose OK, or paste the syntax.
• In the new file, the break-group count (N_BREAK by default) is the number of different activities each person did.
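The two aggregations amount to counting records per person and activity type, then counting the resulting rows per person. A minimal Python sketch of the same two steps, assuming one (cpl, sps, actout4) tuple per reported activity; the activity codes in the example are invented:

```python
from collections import Counter

# Sketch of Example Three: number of different leisure activities per person.


def activity_diversity(records):
    """records: (cpl, sps, actout4) tuples, one per leisure activity reported."""
    # First aggregation: one entry per person x activity type, with N_BREAK as the count.
    n_break = Counter(records)
    # Second aggregation: count those entries per person (the diversity score).
    diversity = Counter((cpl, sps) for cpl, sps, _ in n_break)
    return n_break, diversity


records = [(101, 1, 12), (101, 1, 12), (101, 1, 30), (101, 2, 12)]
n_break, diversity = activity_diversity(records)
print(n_break[(101, 1, 12)])  # 2 (activity 12 reported twice)
print(dict(diversity))        # {(101, 1): 2, (101, 2): 1}
```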

The PAIR Project at the University of Texas at Austin
Principal Investigator, Ted L. Huston