Analyzing observational data
The analysis is guided by the evaluation's central questions, which are shaped by the purpose of the instructional technology. For example, if the purpose of the technology is to improve instructors' presentation skills, you might use classroom observations to evaluate these skills before and after the technology implementation.
Assess reliability among multiple observers
If more than one observer was used, record the level of agreement among them. If you are using a 5-point rating scale, for example, you might compute the percentage of time two observers made the same rating and the percentage of time ratings differed by one point. As a rule of thumb with a 5-point scale, two raters should agree exactly at least 60% of the time.
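As a minimal sketch of the agreement check described above, the following Python computes the percentage of paired ratings that match exactly and the percentage that differ by one point. The function name `agreement_rates` and the ratings are hypothetical, made up for illustration, and the sketch assumes both observers rated the same items in the same order.

```python
def agreement_rates(ratings_a, ratings_b):
    """Return (exact %, off-by-one %) for two observers' paired ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    pairs = list(zip(ratings_a, ratings_b))
    exact = sum(a == b for a, b in pairs)                # identical ratings
    off_by_one = sum(abs(a - b) == 1 for a, b in pairs)  # ratings one point apart
    n = len(pairs)
    return 100 * exact / n, 100 * off_by_one / n


# Hypothetical 5-point ratings of the same ten sessions by two observers.
observer_1 = [3, 4, 2, 5, 3, 1, 4, 4, 2, 3]
observer_2 = [3, 4, 3, 5, 2, 1, 4, 2, 2, 3]

exact_pct, off_by_one_pct = agreement_rates(observer_1, observer_2)
print(f"Exact agreement: {exact_pct:.0f}%")
print(f"Differed by one point: {off_by_one_pct:.0f}%")
```

With these made-up ratings, exact agreement is above the 60% rule of thumb. Percentage agreement does not correct for agreement that occurs by chance, so treat it as a first check only.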
With checklists, count the number of positive behaviors recorded, compute this as a percentage of total behaviors, and compare groups or times:
% positive behaviors = (# yes responses ÷ total # of behaviors on the checklist) × 100
For example, consider this checklist assessing Turning Point, an interactive presentation program that combines PowerPoint with an audience response system:
Example
| Presentation | Yes | No | Comments |
|---|---|---|---|
| 1. Attendance was taken with the remote response system | X | | Students pressed 1 on their remote keypads |
| 2. Slides appeared similar to PowerPoint and used effectively | X | | One slide was even inserted while presenting |
| 3. Students entered their response for a multiple-choice prior knowledge question | X | | All students appeared to respond |
| 4. Presenter displayed the results easily | X | | Using a graph on screen |
| 5. Sliced responses based on student demographics | | X | Presenter had some trouble configuring this function |
| 6. Altered slides based on student response | | X | Had difficulty with this function. Went to the wrong slide. |
Four of six (67%) possible positive behaviors were observed. During a later lecture, the number increased to six of six (100%), indicating that the presenter's use of the technology had improved.
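The checklist percentage can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `percent_positive` is hypothetical, and each list encodes the six checklist items as True ("yes") or False ("no") in item order.

```python
def percent_positive(responses):
    """Percentage of checklist items marked 'yes' (True)."""
    if not responses:
        raise ValueError("checklist has no items")
    return 100 * sum(responses) / len(responses)


# Items 1-6 from the example checklist: four "yes" and two "no".
first_lecture = [True, True, True, True, False, False]
# The later lecture, in which all six behaviors were observed.
later_lecture = [True] * 6

first_pct = percent_positive(first_lecture)
later_pct = percent_positive(later_lecture)
print(f"First lecture: {first_pct:.1f}% positive")
print(f"Later lecture: {later_pct:.1f}% positive")
```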
Alternatively, you can weight items differently based on their relative importance when computing a total score. For example, if you considered it twice as important for the lecturer to alter slides based on student response as to take attendance with the response system, a "yes" on item 6 would receive 2 points, while a "yes" on item 1 would receive 1 point.
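A weighted total might be computed as in the sketch below. Only the weights for items 1 and 6 come from the example above. The weight of 1 for items 2 through 5 is an assumption made for illustration, as is the function name `weighted_score`.

```python
# Weights per checklist item. Items 1 and 6 follow the example in the text;
# the weight of 1 for items 2-5 is an assumption for illustration.
weights = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2}

# Observed responses from the example checklist (True = "yes").
observed = {1: True, 2: True, 3: True, 4: True, 5: False, 6: False}


def weighted_score(observed, weights):
    """Return (points earned, points possible) for one observation."""
    earned = sum(weights[item] for item, yes in observed.items() if yes)
    possible = sum(weights[item] for item in observed)
    return earned, possible


earned, possible = weighted_score(observed, weights)
print(f"Weighted score: {earned} of {possible} points")
```

Under these assumed weights, missing item 6 costs more than missing any other single item, which is the point of weighting.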
When conducting numerous observations of different instructors, you can emphasize the lack of an important behavior by computing a percentage: "In 66% of the classes observed, instructors had difficulty altering slides based on student response."
When using ratings, compare means of groups or times:
Mean (Time 1) = (sum of all ratings at Time 1) ÷ (# of ratings at Time 1)
vs.
Mean (Time 2) = (sum of all ratings at Time 2) ÷ (# of ratings at Time 2)
For example, observers rated the ability of an instructor to display student responses on screen using Turning Point from 1 = difficult to 4 = easy. For instructors participating in a Turning Point training course, mean ratings for this objective increased from 2.3 before the course to 3.7 after the course.
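The comparison of means can be sketched in Python with the standard library. The individual ratings below are hypothetical. They were chosen only so that the means round to the 2.3 and 3.7 reported above, and they are not the actual observation data.

```python
from statistics import mean

# Hypothetical ratings on the 1 (difficult) to 4 (easy) scale,
# one rating per observed instructor.
ratings_before = [2, 3, 2, 2, 3, 2]  # before the training course
ratings_after = [4, 4, 3, 4, 3, 4]   # after the training course

mean_before = mean(ratings_before)   # sum of ratings / number of ratings
mean_after = mean(ratings_after)

print(f"Mean before: {mean_before:.1f}")
print(f"Mean after:  {mean_after:.1f}")
print(f"Change:      {mean_after - mean_before:+.1f}")
```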
If you are rating the extent of behaviors, note large differences between groups or times. For example, observations comparing classrooms equipped with computers for every student with classrooms not so equipped found that a greater percentage of instructors in equipped classrooms acted as coaches or facilitators at least occasionally (47%) than in non-equipped classrooms (34%).
Example
How often does the instructor act as a coach or facilitator during class?
| | Never | Rarely | Occasionally | Frequently | Extensively |
|---|---|---|---|---|---|
| Computer equipped | 31% | 22% | 20% | 17% | 10% |
| Not computer equipped | 41% | 25% | 16% | 10% | 8% |
Definitions:
Never = not observed in any classes for this instructor
Rarely = less than 5 minutes per class
Occasionally = average of between 5 and 15 minutes per class
Frequently = average of between 15 and 25 minutes per class
Extensively = average of more than 25 minutes per class
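The "at least occasionally" figures quoted before the example (47% and 34%) come from adding the Occasionally, Frequently, and Extensively columns for each group. A minimal sketch of that step follows, using the percentages from the table. The variable and function names are illustrative.

```python
# Percentage of instructors in each category, taken from the table above.
distribution = {
    "Computer equipped": {
        "Never": 31, "Rarely": 22, "Occasionally": 20,
        "Frequently": 17, "Extensively": 10,
    },
    "Not computer equipped": {
        "Never": 41, "Rarely": 25, "Occasionally": 16,
        "Frequently": 10, "Extensively": 8,
    },
}

# Categories that count as "at least occasionally".
AT_LEAST_OCCASIONALLY = ("Occasionally", "Frequently", "Extensively")


def share_at_least_occasionally(row):
    """Sum the percentages for the Occasionally-or-higher categories."""
    return sum(row[category] for category in AT_LEAST_OCCASIONALLY)


summary = {
    group: share_at_least_occasionally(row)
    for group, row in distribution.items()
}
for group, pct in summary.items():
    print(f"{group}: {pct}% at least occasionally")
```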
You should also analyze comments that accompany checklists or ratings, identifying themes and significant points, such as what works well and what needs improvement.
Often, it is a good idea to share preliminary results with sponsors and stakeholders before you have completed your analysis. This is particularly true when doing a formative evaluation or when the sponsor wishes to make program changes quickly. For example, you may share observation ratings and comments without providing any interpretation. Avoid drawing conclusions before you have fully analyzed the data.

