Evaluate programs

Analyzing observational data

The analysis is guided by the evaluation's central questions, which are shaped by the program's objectives. For example, if the goal of a program is to improve instructors' oral presentation skills, you might use classroom observations to evaluate these skills before and after program participation.

Assess reliability among multiple observers

If more than one observer was used, record the level of agreement among them. If you are using a 5-point rating scale, for example, you might compute the percentage of time two observers made the same rating and the percentage of time ratings differed by one point. As a rule of thumb with a 5-point scale, two raters should agree exactly at least 60% of the time.
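The agreement check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `agreement_rates` and the two lists of ratings are hypothetical, and the sketch assumes both observers rated the same items in the same order on a 5-point scale.

```python
def agreement_rates(ratings_a, ratings_b):
    """Return (exact %, differed-by-one %) for two observers' paired ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both observers must rate the same, non-empty set of items.")
    pairs = list(zip(ratings_a, ratings_b))
    n = len(pairs)
    exact = sum(a == b for a, b in pairs)                # identical ratings
    off_by_one = sum(abs(a - b) == 1 for a, b in pairs)  # ratings one point apart
    return 100 * exact / n, 100 * off_by_one / n


# Hypothetical ratings from two observers on ten items (5-point scale).
observer_1 = [4, 3, 5, 2, 4, 3, 1, 5, 4, 2]
observer_2 = [4, 3, 4, 2, 4, 2, 1, 5, 3, 4]

exact_pct, off_by_one_pct = agreement_rates(observer_1, observer_2)
print(f"Exact agreement: {exact_pct:.0f}%")              # 60%
print(f"Differed by one point: {off_by_one_pct:.0f}%")   # 30%
```

In this made-up example the observers agree exactly 60% of the time, which just meets the rule of thumb stated above.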

With checklists, count the number of positive behaviors recorded, compute this as a percentage of total behaviors, and compare groups or times:

% positive behaviors = (# of yes responses / total # of responses) × 100

For example, consider this checklist assessing lecture organization:


The instructor:

1. stated the purpose of the lecture.
   _X_ Yes   ___ No
   Comments: stated clearly at start of class

2. explained the relation of the class to the previous one.
   _X_ Yes   ___ No
   Comments: very clear and concise

3. put class objectives on a PowerPoint slide.
   _X_ Yes   ___ No
   Comments: good, reinforced #1

4. verbally provided an outline of lecture content.
   _X_ Yes   ___ No
   Comments: with PowerPoint slide

5. made transition statements between lecture segments.
   ___ Yes   _X_ No
   Comments: mostly jumped to new topic

6. summarized periodically and at the end of class.
   ___ Yes   _X_ No
   Comments: some at end, but otherwise no

7. connected different points or topics in summaries.
   ___ Yes   _X_ No
   Comments: would be helpful to tie together

Four of seven (57%) possible positive behaviors were observed. During a later lecture, the number increased to seven out of seven (100%), clearly indicating improvement.

Alternatively, you can weight items differently based on relative importance when computing a total score. For example, if students consider it twice as important to state the purpose of the lecture as to provide a verbal outline of lecture content, a "yes" on item 1 would receive 2 points, while a "yes" on item 4 would receive 1 point.

When conducting numerous observations of different instructors, you can emphasize the lack of an important behavior by computing a percentage: "In 57% of the classes observed, instructors did not state the purpose of the lesson."
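As a rough sketch, the checklist arithmetic can be reproduced in Python. The yes/no values below come from the sample checklist; the function names and the choice to give every unweighted item 1 point are assumptions made for illustration.

```python
# Yes (True) / No (False) responses for items 1-7 of the sample checklist.
checklist = {1: True, 2: True, 3: True, 4: True, 5: False, 6: False, 7: False}


def percent_positive(responses):
    """Percentage of checklist items marked yes."""
    return 100 * sum(responses.values()) / len(responses)


def weighted_score(responses, weights):
    """Total points for yes items; items without a listed weight count 1 point (assumed)."""
    return sum(weights.get(item, 1) for item, yes in responses.items() if yes)


# Item 1 counts double, as in the weighting example; all other items count once.
weights = {1: 2}

print(f"{percent_positive(checklist):.0f}% positive behaviors")  # 57%
print(f"Weighted score: {weighted_score(checklist, weights)}")   # 5
```

With item 1 weighted double, the maximum possible score is 8 rather than 7, so weighted totals should only be compared with other totals computed using the same weights.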

When using ratings, compare means of groups or times:

MEAN (Time 1) = Sum of all ratings at Time 1 / # of ratings at Time 1

vs.

MEAN (Time 2) = Sum of all ratings at Time 2 / # of ratings at Time 2
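The comparison of means might be computed as follows. This is only a sketch: the two lists of ratings are invented for illustration, and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical ratings on a 5-point scale, collected at two points in time.
ratings_time_1 = [2, 3, 2, 2, 3, 2]
ratings_time_2 = [4, 3, 4, 4, 3, 4]

# Mean = sum of all ratings / number of ratings.
mean_time_1 = mean(ratings_time_1)
mean_time_2 = mean(ratings_time_2)

print(f"Mean (Time 1): {mean_time_1:.1f}")            # 2.3
print(f"Mean (Time 2): {mean_time_2:.1f}")            # 3.7
print(f"Change: {mean_time_2 - mean_time_1:+.1f}")    # +1.3
```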

For example, observers rated the clarity of an instructor's lectures from 1 = no clarity to 5 = outstanding clarity. For instructors participating in a public speaking course, mean ratings of clarity increased from 2.3 before the course to 3.7 after the course.

If you are rating the extent of behaviors, note large differences between groups or times. For example, observations comparing classrooms equipped with computers for every student and those not equipped with computers found that a greater percentage of instructors in equipped classrooms at least occasionally act as coaches or facilitators (47%) than in non-equipped classrooms (34%):


How often does the instructor act as a coach or facilitator during class?

[Figure: responses for computer-equipped and not-computer-equipped classrooms across the categories Never, Rarely, Occasionally, Frequently, and Extensively]

Never = not observed in any classes for this instructor
Rarely = less than 5 minutes per class
Occasionally = average of between 5 and 15 minutes per class
Frequently = average of between 15 and 25 minutes per class
Extensively = average of more than 25 minutes per class
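To compare groups on an extent scale like this one, you could compute the share of instructors rated at or above a given category. The sketch below uses invented ratings for ten instructors per group; the function name `percent_at_least` and the data are hypothetical and do not reproduce the study's figures.

```python
# Categories in order from least to most frequent.
CATEGORIES = ["Never", "Rarely", "Occasionally", "Frequently", "Extensively"]


def percent_at_least(ratings, threshold="Occasionally"):
    """Percentage of ratings at or above the threshold category."""
    cutoff = CATEGORIES.index(threshold)
    hits = sum(CATEGORIES.index(r) >= cutoff for r in ratings)
    return 100 * hits / len(ratings)


# Hypothetical ratings, one per instructor, for each type of classroom.
equipped = ["Never", "Rarely", "Rarely", "Occasionally", "Frequently",
            "Rarely", "Occasionally", "Never", "Extensively", "Rarely"]
not_equipped = ["Never", "Rarely", "Never", "Occasionally", "Rarely",
                "Rarely", "Never", "Frequently", "Rarely", "Never"]

print(f"Equipped: {percent_at_least(equipped):.0f}%")          # 40%
print(f"Not equipped: {percent_at_least(not_equipped):.0f}%")  # 20%
```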

You should also analyze comments that accompany checklists or ratings, identifying themes and significant points, such as what works well and what needs improvement.

Often, it is a good idea to share preliminary results with sponsors and stakeholders before you have completed your analysis. This is particularly true when doing a formative evaluation or when the sponsor wishes to make program changes quickly. For example, you may share observation ratings and comments without providing any interpretation. Avoid drawing conclusions before you have fully analyzed the data.

Page last updated: Sep 21 2011
Copyright © 2007, The University of Texas at Austin