Have an outside evaluator, rather than program staff, evaluate outcomes to avoid bias. Check that conclusions are grounded in what was observed rather than pre-existing assumptions.
To verify conclusions, combine observation with other assessment methods such as interviews or surveys. Finding similar results using different methods strengthens conclusions. For example, in the study comparing classrooms equipped and not equipped with computers, student surveys, instructor interviews, and classroom observations all indicated that instructors lectured less and coached more in equipped classrooms.
Results that differ depending on the method, on the other hand, call for further analysis. If surveys and interviews suggest important changes have taken place but observations do not, examine the observation methods and their reliability. Are the observations measuring the same behavior that the surveys and interviews measure? Do these behaviors occur often enough for you to detect differences between groups or over time? Do observers agree with one another? Could observers be biased by expectations of what they should see? Could program participants be acting differently because they know they are being observed? Such reactive changes in participant behavior are more likely when observations are brief and infrequent, since participants have little time to grow accustomed to the observer's presence.
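The question of whether observers agree with one another is commonly quantified with an inter-rater reliability statistic such as Cohen's kappa, which measures agreement between two observers beyond what chance alone would produce. A minimal sketch in Python, with invented observer labels purely for illustration:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance.

    Values near 1 indicate strong agreement; values near 0 indicate
    agreement no better than chance.
    """
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of sessions on which the two observers agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each observer's label frequencies
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical example: two observers each classify the same ten class
# sessions as predominantly "lecture" or predominantly "coach".
obs1 = ["lecture", "coach", "coach", "lecture", "coach",
        "coach", "lecture", "coach", "coach", "lecture"]
obs2 = ["lecture", "coach", "coach", "lecture", "lecture",
        "coach", "lecture", "coach", "coach", "coach"]
print(round(cohens_kappa(obs1, obs2), 2))  # prints 0.58
```

A kappa this low would suggest the observation protocol or observer training needs attention before differences between classrooms can be trusted.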
Alternatively, if you believe observations are accurate, it may be that results from other methods are suspect. For example, program managers may believe they have incorporated changes when they have not. If results from different methods are contradictory, you may need to gather more information to be confident in your conclusions.