Analyzing observational data
The analysis is guided by the evaluation's central questions, which are shaped by the program's objectives. For example, if the goal of a program is to improve instructors' oral presentation skills, you might use classroom observations to evaluate these skills before and after program participation.
Assess reliability among multiple observers
If more than one observer was used, record the level of agreement among them. If you are using a 5-point rating scale, for example, you might compute the percentage of time two observers made the same rating and the percentage of time ratings differed by one point. As a rule of thumb with a 5-point scale, two raters should agree exactly at least 60% of the time.
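The agreement check described above can be sketched in a few lines of Python. This is only an illustration: the function name `agreement_rates` and the two lists of ratings are hypothetical, not data from an actual evaluation.

```python
def agreement_rates(ratings_a, ratings_b):
    """Return (exact, off_by_one) agreement as percentages for two observers."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both observers must rate the same non-empty set of items.")
    pairs = list(zip(ratings_a, ratings_b))
    n = len(pairs)
    exact = sum(a == b for a, b in pairs)                # identical ratings
    off_by_one = sum(abs(a - b) == 1 for a, b in pairs)  # ratings one point apart
    return 100 * exact / n, 100 * off_by_one / n


# Hypothetical ratings from two observers on ten items (5-point scale).
observer_1 = [3, 4, 2, 5, 4, 3, 1, 4, 5, 2]
observer_2 = [3, 4, 3, 5, 4, 2, 1, 4, 3, 2]

exact_pct, off_by_one_pct = agreement_rates(observer_1, observer_2)
print(f"Exact agreement: {exact_pct:.0f}%")
print(f"Differed by one point: {off_by_one_pct:.0f}%")
```

With these made-up ratings, exact agreement is above the 60% rule of thumb mentioned above.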
With checklists, count the number of positive behaviors recorded, compute this as a percentage of total behaviors, and compare groups or times:
$$\%\ \text{positive behaviors} = \frac{\text{number of yes responses}}{\text{total number of responses}} \times 100$$
For example, consider this checklist assessing lecture organization:
Example
| The instructor | Yes | No | Comments |
|---|---|---|---|
| 1. stated the purpose of the lecture. | X | | stated clearly at start of class |
| 2. explained the relation of the class to the previous one. | X | | very clear and concise |
| 3. put class objectives on a PowerPoint slide. | X | | good, reinforced #1 |
| 4. verbally provided an outline of lecture content. | X | | with PowerPoint slide |
| 5. made transition statements between lecture segments. | | X | mostly jumped to new topic |
| 6. summarized periodically and at the end of class. | | X | some at end, but otherwise no |
| 7. connected different points or topics in summaries. | | X | would be helpful to tie together |
Four of seven (57%) possible positive behaviors were observed. During a later lecture, the number increased to seven out of seven (100%), clearly indicating improvement.
Alternatively, you can weight items differently based on relative importance when computing a total score. For example, if students consider stating the purpose of the lecture twice as important as providing a verbal outline of lecture content, a "yes" on item 1 would receive 2 points, while a "yes" on item 4 would receive 1 point.
When conducting numerous observations of different instructors, you can emphasize the lack of an important behavior by computing a percentage: "In 57% of the classes observed, instructors did not state the purpose of the lesson."
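As a rough sketch, the checklist arithmetic above could be computed as follows. The yes/no values come from the example checklist. The helper names and the choice to give every unweighted item 1 point are assumptions made for illustration.

```python
# Yes/No results for the seven items in the example checklist (True = "Yes").
observed = {1: True, 2: True, 3: True, 4: True, 5: False, 6: False, 7: False}


def percent_positive(results):
    """Percentage of checklist items marked 'Yes'."""
    return 100 * sum(results.values()) / len(results)


def weighted_score(results, weights):
    """Return (points earned, points possible).

    Items not listed in `weights` are assumed to be worth 1 point.
    """
    earned = sum(weights.get(item, 1) for item, yes in results.items() if yes)
    possible = sum(weights.get(item, 1) for item in results)
    return earned, possible


# Assumed weighting: item 1 counts double, as in the example in the text.
weights = {1: 2}

print(f"Positive behaviors: {percent_positive(observed):.0f}%")
earned, possible = weighted_score(observed, weights)
print(f"Weighted score: {earned} of {possible} points")
```

Under this assumed weighting, the first lecture earns 5 of 8 possible points.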
When using ratings, compare means of groups or times:
$$\text{Mean (Time 1)} = \frac{\text{sum of all ratings at Time 1}}{\text{number of ratings at Time 1}} \quad \text{vs.} \quad \text{Mean (Time 2)} = \frac{\text{sum of all ratings at Time 2}}{\text{number of ratings at Time 2}}$$
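A minimal sketch of this comparison in Python, using the standard library's `statistics.mean`. The two lists of ratings are hypothetical and serve only to show the calculation.

```python
from statistics import mean

# Hypothetical ratings (1-5 scale) collected at two points in time.
time_1 = [2, 3, 2, 2, 3]
time_2 = [4, 3, 4, 4, 3]

mean_1 = mean(time_1)  # sum of ratings at Time 1 / number of ratings at Time 1
mean_2 = mean(time_2)  # sum of ratings at Time 2 / number of ratings at Time 2

print(f"Time 1: {mean_1:.1f}  Time 2: {mean_2:.1f}  Change: {mean_2 - mean_1:+.1f}")
```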
For example, observers rated the clarity of an instructor's lectures from 1 = no clarity to 5 = outstanding clarity. For instructors participating in a public speaking course, mean ratings of clarity increased from 2.3 before the course to 3.7 after the course.
If you are rating the extent of behaviors, note large differences between groups or times. For example, observations comparing classrooms equipped with a computer for every student with classrooms not so equipped found that a greater percentage of instructors acted as coaches or facilitators at least occasionally in equipped classrooms (47%) than in non-equipped classrooms (34%):
Example
How often does the instructor act as a coach or facilitator during class?
| | Never | Rarely | Occasionally | Frequently | Extensively |
|---|---|---|---|---|---|
| Computer equipped | 31% | 22% | 20% | 17% | 10% |
| Not computer equipped | 41% | 25% | 16% | 10% | 8% |
Definitions:
Never = not observed in any classes for this instructor
Rarely = average of less than 5 minutes per class
Occasionally = average of between 5 and 15 minutes per class
Frequently = average of between 15 and 25 minutes per class
Extensively = average of more than 25 minutes per class
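The "at least occasionally" figures quoted for this example (47% and 34%) are the sum of the Occasionally, Frequently, and Extensively columns. A short sketch of that calculation, using the percentages from the table (the function name `percent_at_least` is illustrative):

```python
CATEGORIES = ["Never", "Rarely", "Occasionally", "Frequently", "Extensively"]

# Percentages of instructors in each category, taken from the example table.
distribution = {
    "Computer equipped": [31, 22, 20, 17, 10],
    "Not computer equipped": [41, 25, 16, 10, 8],
}


def percent_at_least(percentages, threshold):
    """Sum the percentages for `threshold` and every higher category."""
    start = CATEGORIES.index(threshold)
    return sum(percentages[start:])


at_least_occasionally = {
    group: percent_at_least(percentages, "Occasionally")
    for group, percentages in distribution.items()
}

for group, pct in at_least_occasionally.items():
    print(f"{group}: {pct}% at least occasionally")
```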
You should also analyze comments that accompany checklists or ratings, identifying themes and significant points, such as what works well and what needs improvement.
Often, it is a good idea to share preliminary results with sponsors and stakeholders before you have completed your analysis. This is particularly true when doing a formative evaluation or when the sponsor wishes to make program changes quickly. For example, you may share observation ratings and comments without providing any interpretation. Avoid drawing conclusions before you have fully analyzed the data.
