
Analyzing survey data

Enter closed-ended survey responses into a program

Enter closed-ended survey responses into a data file of an analysis program like Excel, Access, SPSS, or SAS, which can perform statistical analyses and create tables and graphs. Often, you can import data from electronic surveys directly into the analysis software. Code the data by assigning a number for each response choice (for example, "strongly disagree" = 1, "disagree" = 2, etc.) and create a key explaining the coding for each question.
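The coding step can be sketched in a few lines of Python; the response labels, codes, and sample data here are hypothetical illustrations, not part of any particular survey:

```python
# Assign a numeric code to each response choice and keep the mapping as a key
# (codebook). Labels, codes, and sample responses below are hypothetical.
codes = {"strongly disagree": 1, "disagree": 2, "agree": 3, "strongly agree": 4}

responses = ["agree", "strongly disagree", "agree", "disagree"]
coded = [codes[r] for r in responses]
print(coded)  # [3, 1, 3, 2]
```

Keeping the `codes` dictionary alongside the data file documents the coding key for each question.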

Reverse score responses if necessary

Some survey questions may be worded so that a given response (e.g., "definitely no" = 1) represents an unfavorable rating for one question but a favorable rating for another. An example would be a pair of questions that ask, "The instructor communicated effectively" and "The instructor communicated poorly." To compare or aggregate these survey responses, the inconsistently worded question should be reverse scored.

To reverse score, swap the highest and lowest numerical values of a response, then the next highest and lowest values, and so on. Non-numerical responses, such as "other" or "not applicable," should not be included in reverse scoring.

Example

Question: "The instructor communicated poorly"

Original score:

"definitely yes" = 4

"yes" = 3

"no" = 2

"definitely no" = 1

Reverse scored:

"definitely yes" = 1

"yes" = 2

"no" = 3

"definitely no" = 4

Open-ended survey responses

For open-ended survey responses, identify general themes and noteworthy exceptions to trends. If you have a large amount of text, you may want to systematically code responses to organize content and reveal patterns. With short-answer questions, you can obtain basic information about response frequencies by categorizing responses, assigning a numerical code to each category, and then entering the codes into a statistical analysis program.

Recoding answers

You may need to recode some answers to questions that have an open-ended "other" response option. For example, one person may answer the question, "Do you consider yourself African American, Caucasian, Asian, Hispanic, or other?" by circling "other" and writing, "Chinese." To maintain consistency, code the answer as "Asian" rather than "other."

Inspect the data

Inspect the data for errors that can occur during data entry or when a respondent provides inconsistent answers. For large databases, check at least five percent of entered data for accuracy. If you find any errors, check the remainder of the data and correct the errors.

Calculate the response rate

Calculate the response rate by dividing the number of people who submitted a completed survey (80% or more of questions answered) by the number of people you contacted or attempted to contact to complete the survey. If 185 project participants were asked to complete the survey and 107 responded, the response rate was 107/185, or 58%. Consider other formulas for calculating response rates, such as counting partially completed surveys as responses. In keeping with standards developed by the American Association for Public Opinion Research (2000), you may also need to calculate and report cooperation rates (the proportion of people contacted who participated in the survey), refusal rates (the proportion of all potentially eligible cases who refused to participate or broke off an interview), and contact rates (the proportion of all cases in which someone eligible to complete the survey was reached).
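The basic calculation, using the figures from the example above:

```python
def response_rate(completed, contacted):
    """Completed surveys divided by the number of people contacted."""
    return completed / contacted

rate = response_rate(107, 185)   # figures from the example
print(f"{rate:.0%}")             # 58%
```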

Response rates below 70 percent

If the response rate is below 70 percent, determine whether the sample is representative of the target population by comparing sample and target population means for characteristics that would likely affect responses, such as race, age, or grade point average. An unrepresentative sample may produce response bias. For example, if 40% of student participants in an instructional program are from the School of Engineering, yet only 10% of survey respondents are from that school, results will not represent the concerns of this subgroup. To address the problem, you might supplement your results by surveying additional engineering students. In some cases the best solution is to weight results so that the attitudes of important subgroups are not underrepresented; for example, you could multiply the limited responses from engineering students by four. Weighting, however, is problematic if the people who responded differ in important ways from those who did not respond.
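The weight in the engineering example is the ratio of a subgroup's share of the target population to its share of the sample; a sketch, where the "other" shares are assumed complements of the figures in the text:

```python
# Weight each subgroup so its share of responses matches its share of
# participants. Shares for "other" are assumed complements (hypothetical).
population_share = {"engineering": 0.40, "other": 0.60}
sample_share = {"engineering": 0.10, "other": 0.90}

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights["engineering"])  # 4.0 -- each engineering response counts four times
```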

Calculate response frequencies and percentages for each question

Count the number of respondents who selected each response choice to obtain frequencies, then divide each frequency by the total number of responses to the question to compute percentages. For example, if 10 of 107 respondents selected "strongly disagree," that response percentage would be calculated as 10/107 = .09, or 9%.

Example

To create more meaningful categories, combine the "agree" and "strongly agree" categories to obtain the percentage of agreement (65%) and the "disagree" and "strongly disagree" categories to obtain the percentage disagreement (24%).
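A sketch of the frequency and percentage calculations; the response counts below are hypothetical but chosen to reproduce the percentages in the text (n = 107, 9% "strongly disagree," 65% agreement, 24% disagreement):

```python
from collections import Counter

# Hypothetical counts chosen to match the percentages discussed in the text.
answers = (["strongly disagree"] * 10 + ["disagree"] * 16 + ["neutral"] * 11 +
           ["agree"] * 40 + ["strongly agree"] * 30)

freq = Counter(answers)
total = len(answers)
percentages = {choice: count / total for choice, count in freq.items()}

# Combine categories to obtain overall agreement.
agreement = (freq["agree"] + freq["strongly agree"]) / total
print(f"{percentages['strongly disagree']:.0%} strongly disagree, "
      f"{agreement:.0%} agreement")  # 9% strongly disagree, 65% agreement
```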

Compute cross-tabulations

Compute cross-tabulations to see relationships between responses for two survey questions. For example, you may ask students to provide their race in the first question and rate their overall satisfaction with a project in the second question. Display response choices for the first question as column labels and choices for the second question as row labels. A cross-tabulation might reveal that the percentage of participants who were satisfied with a project depended on their race:

Example

Response      White    African-American
Satisfied     68%      46%
Unsatisfied   15%      32%
Neutral       17%      22%
Total         100%     100%

Cross-tabulations can highlight contrasts between groups of participants or findings that are consistent across groups. Your research questions and hypotheses should help you choose which cross-tabulations to compute.
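A cross-tabulation can be computed by counting (row, column) pairs and dividing by column totals; the raw data below are hypothetical, constructed with 100 respondents per group to match the table above:

```python
from collections import Counter

# Hypothetical raw data: 100 respondents per group, matching the table above.
data = ([("White", "Satisfied")] * 68 + [("White", "Unsatisfied")] * 15 +
        [("White", "Neutral")] * 17 +
        [("African-American", "Satisfied")] * 46 +
        [("African-American", "Unsatisfied")] * 32 +
        [("African-American", "Neutral")] * 22)

cell = Counter(data)                           # counts per (race, rating) cell
col_total = Counter(race for race, _ in data)  # counts per column

for rating in ("Satisfied", "Unsatisfied", "Neutral"):
    pcts = {race: cell[(race, rating)] / col_total[race]
            for race in ("White", "African-American")}
    print(rating, {race: f"{p:.0%}" for race, p in pcts.items()})
```

With pandas installed, `pd.crosstab(race, rating, normalize="columns")` produces the same table in one call.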

Chi-square statistics

Chi-square statistics, used with data that fall into mutually exclusive categories (such as gender) rather than continuous numerical data, tell you whether one categorical variable is independent of another. For example, if you randomly survey 75 male and 75 female math students at UT Austin, a chi-square test could tell you if satisfaction with math courses is independent of gender among UT Austin students. Statistical programs provide a p value that indicates the probability that group differences occurred by chance alone. A p value of .03 indicates that there is a 3% probability that gender differences in your sample are due to chance error rather than real differences between men and women at UT Austin. Results are generally considered statistically significant if they yield p values of .05 or less. Chi-square tests require a minimum expected count of about five per cell, so, as with all statistical tests, make sure you understand and meet the assumptions underlying the test.
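A sketch of the Pearson chi-square statistic for a hypothetical 2x2 satisfaction-by-gender table (the counts are invented; in practice a library such as SciPy's `chi2_contingency` would also return the p value and apply a continuity correction for 2x2 tables):

```python
# Pearson chi-square statistic for a 2x2 table (hypothetical counts):
# 75 male and 75 female students, satisfied vs. unsatisfied.
observed = [[50, 25],   # male:   satisfied, unsatisfied
            [35, 40]]   # female: satisfied, unsatisfied

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # all > 5 here
        chi2 += (obs - expected) ** 2 / expected

# The critical value for df = 1 at p = .05 is about 3.841; a larger
# statistic suggests satisfaction is not independent of gender.
print(round(chi2, 2))  # 6.11
```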

Correlation

Correlations indicate whether a statistically significant positive or negative relationship exists between two continuous variables. For example, a positive correlation between time spent watching television and time spent watching movies at theaters indicates that the more time respondents spend watching television, the more time they tend to spend watching movies. Finding significant correlations between variables does not tell you what causes those relationships.

Linear regression

Linear regression is used to determine the line that best summarizes a set of data points and enables prediction of an outcome variable from one or more predictor variables. For example, you might use course satisfaction ratings to predict course grades.

Confidence interval

A confidence interval provides a range of values likely to include an unknown population value. Sampling error is usually expressed as a confidence interval. Even if you randomly sample to ensure survey respondents are representative of the population of interest, there will still be some error because of chance variation. For example, responses of 200 randomly selected graduate students to a UT Austin library survey question asking how often they use interlibrary services during an academic year yield a mean of 4.5 uses. A 95% confidence interval, expressed as 2.8 ≤ µ ≤ 5.8, means that if you repeated the survey many times, about 95% of intervals computed this way would contain the actual mean number of uses for all graduate students. The width of the confidence interval provides an estimate of the degree of uncertainty about the unknown population value. Confidence intervals are valid only if the survey sample is randomly selected.
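With a large sample, a 95% interval for the mean is roughly the sample mean plus or minus 1.96 standard errors; a sketch for the library-survey example, where the standard deviation is assumed (the text does not report one, so the resulting interval differs from the text's):

```python
from math import sqrt
from statistics import NormalDist

# Sample size and mean from the example; the standard deviation is assumed.
n, mean, stdev = 200, 4.5, 7.2

z = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% confidence level
margin = z * stdev / sqrt(n)     # standard error times the critical value
low, high = mean - margin, mean + margin
print(f"95% CI: {low:.1f} <= mu <= {high:.1f}")  # 95% CI: 3.5 <= mu <= 5.5
```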

Additional information

American Association for Public Opinion Research (2000). Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, Second Edition [Electronic version]. Ann Arbor, MI: AAPOR.

Arsham, H. (2005). Questionnaire Design and Surveys Sampling, 9th ed. Retrieved June 21, 2006 from: http://home.ubalt.edu/ntsbarsh/stat-data/Surveys.htm

Bogdan, R. C., & Biklen, S. K. (1998). Qualitative Research for Education: An Introduction to Theory and Methods, Third Edition. Needham Heights, MA: Allyn and Bacon.

Darlington, R.B. (1997). Factor analysis. Retrieved June 21, 2006 from: http://comp9.psych.cornell.edu/Darlington/factor.htm

Helberg, C. (1995). Pitfalls of data analysis. Retrieved June 21, 2006 from: http://my.execpc.com/4A/B7/helberg/pitfalls/

Lane, D. M. (2003). Tests of linear combinations of means, independent groups. Retrieved June 21, 2006 from the Hyperstat Online textbook: http://davidmlane.com/hyperstat/confidence_intervals.html

Linear regression. (n.d.) Retrieved June 21, 2006 from the Yale University Department of Statistics Index of Courses 1997-98 Web site: http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

Lowry, R. P. (2005). Concepts and Applications of Inferential Statistics. Retrieved June 21, 2006 from: http://faculty.vassar.edu/lowry/webtext.html

Spector, P. E. (1992). Summated Rating Scale Construction: An Introduction. Newbury Park, CA: Sage.

Page last updated: Sep 21 2011
Copyright © 2007, The University of Texas at Austin