Determining the findings of your observation data involves more than simply reporting initial results. It is important to critically examine your results and check for statistical pitfalls so that you develop accurate findings from which you can draw reliable conclusions.
Critically examine results
No matter what your results, ask some critical questions:
- If observers used a form, was it clear and easy to use?
- Were observers able to record the data accurately and objectively? Observers' initial positive impressions of a person can distort their later judgments (the halo effect), or observers may give consistently favorable ratings or ratings that hover around the midpoint of a scale. If possible, use observers who do not know enough about the study to form expectations about the behaviors they will observe.
- Did observations avoid ceiling and floor effects, which occur when observers encounter upper or lower limits to a measure? For example, if observers completed a checklist recording whether students asked any questions during a lecture, almost all checklists could be marked "yes."
- If you implemented an intervention and are comparing observations between/among groups or periods of time:
  - Did anything happen other than the instructional intervention that would have resulted in improvement in later observations?
  - Were the behaviors you observed good indicators of the changes you expected to take place?
  - Were there significant differences between groups in observed behaviors before you started your intervention?
  - Were conditions for the groups roughly the same (for example, equivalent classrooms, instruction, and assistance outside of class)?
  - Was there any difference in motivation between/among groups before or during the study?
Check for statistical pitfalls
- While any conclusive finding should be statistically significant, a statistically significant result is not necessarily important or valuable; it only indicates that the difference you found is unlikely to be due to chance.
- If you used multiple observers, is the level of inter-observer reliability acceptable (e.g., .70 or higher)? Do results indicate any bias on the part of one or more of the observers? If you find poor reliability or suspect bias, your results may be unreliable, and the data should be regathered and/or reanalyzed.
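One common index of inter-observer reliability for categorical ratings is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. The sketch below, with illustrative placeholder ratings (not data from any actual study), shows the calculation for two observers:

```python
# Sketch: Cohen's kappa for two observers' categorical ratings.
# The rating lists are illustrative placeholders, not real study data.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Proportion of cases where the two observers agreed.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Agreement expected by chance if the raters were independent.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["on-task", "on-task", "off-task", "on-task", "off-task", "on-task"]
b = ["on-task", "off-task", "off-task", "on-task", "off-task", "on-task"]
kappa = cohens_kappa(a, b)
print(round(kappa, 2))  # 0.67
```

Here the observers agree on 5 of 6 cases (83%), but kappa is only .67 because some of that agreement would occur by chance; a value below the .70 guideline would prompt a closer look at observer training or the rating form.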
- If you are comparing observations between/among groups or periods of time:
  - Was regression to the mean a problem? If you assign participants to groups, such as a low-performing and a high-performing group, based on initial observations, the low-performing group may improve when re-observed simply because they had an uncharacteristically poor initial performance.
  - Could there be errors due to sample size? If you have fewer than 25 cases per group, you may lack adequate statistical power to detect differences between groups. On the other hand, if you have very large groups, almost any difference, even a trivial one, will be statistically significant, which could lead you to unwarranted conclusions. For this reason, report effect sizes, which allow readers to judge how meaningful the differences between/among groups are.
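A widely used effect size for comparing two groups is Cohen's d, the difference between group means divided by the pooled standard deviation. This sketch uses made-up scores (a hypothetical treatment and control group, not real observation data) to show the calculation:

```python
# Sketch: Cohen's d effect size for two independent groups.
# The score lists are illustrative placeholders, not real observation data.
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # variance() is the sample variance (n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(group1) +
                  (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5

treatment = [78, 85, 90, 72, 88, 81, 79, 84]
control   = [70, 75, 82, 68, 77, 74, 71, 79]
d = cohens_d(treatment, control)
print(round(d, 2))
```

A common rule of thumb treats d around 0.2 as small, 0.5 as medium, and 0.8 as large; reporting d alongside the p-value lets readers judge whether a statistically significant difference is practically meaningful.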
- Are you making multiple comparisons between variables? Each additional comparison between groups increases the chance of finding a spurious relationship. Reduce this risk by using a more stringent significance level, adjusting the significance level for the number of comparisons made, or using multiple-comparison techniques that account for this issue.
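The simplest such adjustment is the Bonferroni correction: divide the significance level by the number of comparisons. The sketch below uses invented p-values purely for illustration:

```python
# Sketch: Bonferroni correction for multiple comparisons.
# The p-values below are illustrative placeholders, not real results.
alpha = 0.05
p_values = [0.021, 0.008, 0.047, 0.12, 0.003]  # one per comparison

# With 5 comparisons, each test must meet a stricter threshold.
adjusted_alpha = alpha / len(p_values)
significant = [p for p in p_values if p < adjusted_alpha]

print(round(adjusted_alpha, 4))  # 0.01
print(significant)               # [0.008, 0.003]
```

Note that without the correction, three of the five comparisons would look significant at .05; with it, only two survive. Bonferroni is conservative, so with many comparisons a less stringent method (e.g., Holm's procedure) may be preferable.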
- Evaluate your results based on how well they answer your research questions or confirm your hypotheses.
- View qualitative observations (e.g., narrative forms) from a distance until you see a larger picture and understand how this picture relates to your research questions or hypotheses. Theory or previous research may help you make sense of repeating ideas and larger themes. Review your data repeatedly to check that your conclusions are grounded in what was observed rather than pre-existing assumptions.
- Statistically and practically significant findings, as well as important themes, should form the basis of your main conclusions. Emphasize your strongest findings.
- Consider all possible explanations for results before concluding an intervention definitely worked or did not work.
- Verify (triangulate) findings from your observations with other data sources such as interviews or surveys that can provide additional insight. Finding similar results using different methods strengthens conclusions. On the other hand, differing results call for further analysis.