Assess technology

Analyzing observational data

The analysis is guided by the evaluation's central questions, which are shaped by the purpose of the instructional technology. For example, if the purpose of the technology is to improve instructors' presentation skills, you might use classroom observations to evaluate these skills before and after the technology implementation.

Assess reliability among multiple observers

If more than one observer was used, record the level of agreement among them. If you are using a 5-point rating scale, for example, you might compute the percentage of time two observers made the same rating and the percentage of time ratings differed by one point. As a rule of thumb with a 5-point scale, two raters should agree exactly at least 60% of the time.
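The two agreement figures described above can be computed directly. This is a minimal sketch with illustrative ratings (not real data) from two hypothetical observers using a 5-point scale:

```python
# Illustrative ratings from two observers on a 5-point scale (not real data).
rater_a = [3, 4, 2, 5, 4, 3, 1, 4]
rater_b = [3, 4, 2, 5, 3, 3, 2, 4]

pairs = list(zip(rater_a, rater_b))
# Proportion of observations where the two raters agree exactly.
exact = sum(1 for a, b in pairs if a == b) / len(pairs)
# Proportion where the ratings differ by at most one point.
within_one = sum(1 for a, b in pairs if abs(a - b) <= 1) / len(pairs)

print(f"Exact agreement: {exact:.0%}")       # prints 75%; rule of thumb: at least 60%
print(f"Within one point: {within_one:.0%}") # prints 100%
```

In this example the raters meet the 60% rule of thumb, agreeing exactly on 6 of 8 observations.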

With checklists, count the number of positive behaviors recorded, compute this as a percentage of total behaviors, and compare groups or times:

     # yes responses
  -------------------- = % positive behaviors
  total # of responses

For example, consider this checklist assessing Turning Point, an interactive presentation program that combines PowerPoint with an audience response system:


Presentation checklist (with observer comments):

1. Attendance was taken with the remote response system
   _X_ Yes   __ No
   Comment: Students pressed 1 on their remote keypads

2. Slides appeared similar to PowerPoint and were used effectively
   _X_ Yes   __ No
   Comment: One slide was even inserted while presenting

3. Students entered their response for a multiple-choice prior knowledge question
   _X_ Yes   __ No
   Comment: All students appeared to respond

4. Presenter displayed the results easily
   _X_ Yes   __ No
   Comment: Using a graph on screen

5. Presenter sliced responses based on student demographics
   __ Yes   _X_ No
   Comment: Presenter had some trouble configuring this function

6. Presenter altered slides based on student response
   __ Yes   _X_ No
   Comment: Had difficulty with this function. Went to the wrong slide.

Four of six (67%) possible positive objectives were observed. During a later lecture, the number increased to six of six (100%), indicating improvement in the use of this technology.
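The percentage of positive behaviors can be computed with the formula above. This minimal sketch uses responses mirroring the six-item example checklist:

```python
# Responses mirroring the six-item example checklist: items 1-4 "yes", 5-6 "no".
responses = ["yes", "yes", "yes", "yes", "no", "no"]

# % positive behaviors = (# yes responses) / (total # of responses)
pct_positive = responses.count("yes") / len(responses)
print(f"{pct_positive:.0%} positive behaviors")  # prints 67% positive behaviors
```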

Alternatively, you can weight items differently based on relative importance when computing a total score. For example, if you considered it twice as important for the lecturer to alter slides based on student response than to take attendance with the response system, a "yes" on item 6 would receive 2 points, while a "yes" on item 1 would receive 1 point.
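The weighting described above can be sketched as follows; the weights and responses are illustrative, with item 6 worth 2 points and the others worth 1:

```python
# One weight per checklist item; item 6 (altering slides) counts double.
weights  = [1, 1, 1, 1, 1, 2]
# Observed responses from the example checklist: 1 = "yes", 0 = "no".
observed = [1, 1, 1, 1, 0, 0]

# Total score is the sum of weights for items marked "yes".
score = sum(w * o for w, o in zip(weights, observed))
max_score = sum(weights)
print(f"Weighted score: {score}/{max_score} ({score / max_score:.0%})")  # prints 4/7 (57%)
```

Note that under this weighting the same four "yes" responses yield a lower percentage (57%) than the unweighted 67%, because the missed item 6 carries extra weight.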

When conducting numerous observations of different instructors, you can emphasize the lack of an important behavior by computing a percentage: "In 66% of the classes observed, instructors had difficulty altering slides based on student response."

When using ratings, compare means of groups or times:

                sum of all ratings at Time 1
Mean (Time 1) = ----------------------------
                  # of ratings at Time 1

vs.

                sum of all ratings at Time 2
Mean (Time 2) = ----------------------------
                  # of ratings at Time 2

For example, observers rated the ability of an instructor to display student responses on screen using Turning Point from 1 = difficult to 4 = easy. For instructors participating in a Turning Point training course, mean ratings for this objective increased from 2.3 before the course to 3.7 after the course.
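The before-and-after comparison above can be computed as a simple mean of the ratings at each time. The ratings below are illustrative values on the 1 = difficult to 4 = easy scale, chosen to reproduce the 2.3 and 3.7 means from the example:

```python
# Illustrative ratings on a 1 (difficult) to 4 (easy) scale.
time1 = [2, 3, 2, 2, 3, 2]   # before the training course
time2 = [4, 4, 3, 4, 4, 3]   # after the training course

# Mean = sum of all ratings / # of ratings, at each time.
mean1 = sum(time1) / len(time1)
mean2 = sum(time2) / len(time2)
print(f"Mean (Time 1): {mean1:.1f}")  # prints 2.3
print(f"Mean (Time 2): {mean2:.1f}")  # prints 3.7
```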

If you are rating the extent of behaviors, note large differences between groups or times. For example, observations comparing classrooms equipped with computers for every student and those not equipped with computers found that a greater percentage of instructors in equipped classrooms at least occasionally act as coaches or facilitators (47%) than in non-equipped classrooms (34%).


How often does the instructor act as a coach or facilitator during class?

                         Never   Rarely   Occasionally   Frequently   Extensively
Computer equipped         ___     ___        ___            ___           ___
Not computer equipped     ___     ___        ___            ___           ___
Never = not observed in any classes for this instructor
Rarely = less than five minutes per class
Occasionally = average of between 5 and 15 minutes per class
Frequently = average of between 15 and 25 minutes per class
Extensively = average of more than 25 minutes per class
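The category definitions above amount to a set of time thresholds. A hypothetical helper (not part of any instrument described here) could map an instructor's average minutes per class to a rating category:

```python
def rate_extent(avg_minutes):
    """Map average minutes per class to the rating categories defined above."""
    if avg_minutes == 0:
        return "Never"          # not observed in any classes
    if avg_minutes < 5:
        return "Rarely"         # less than five minutes per class
    if avg_minutes <= 15:
        return "Occasionally"   # between 5 and 15 minutes per class
    if avg_minutes <= 25:
        return "Frequently"     # between 15 and 25 minutes per class
    return "Extensively"        # more than 25 minutes per class

print(rate_extent(12))  # prints Occasionally
```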

You should also analyze comments that accompany checklists or ratings, identifying themes and significant points, such as what works well and what needs improvement.

Often, it is a good idea to share preliminary results with sponsors and stakeholders before you have completed your analysis. This is particularly true when conducting a formative evaluation or when the sponsor wishes to make program changes quickly. For example, you may share observation ratings and comments without providing any interpretation. Avoid drawing conclusions before you have fully analyzed the data.

Page last updated: Sep 21 2011
Copyright © 2007, The University of Texas at Austin