February 17, 2020

Evaluating Teaching Appropriately

This content was previously published by Campus Labs, now part of Anthology. Product and/or solution names may have changed.

Have you ever thought that improving teaching should be at least as important as evaluating it? If so, you’ll find much to like about a recent policy brief published by the Division of Educational Psychology of the American Psychological Association (APA), titled Addressing Teacher Evaluation Appropriately. Alyson Lavigne and Thomas Good, authors of the brief, acknowledge that because teachers have an impact on student learning, considerable time and resources should rightly be invested in teacher evaluation. However, they argue that at least as much emphasis should be placed on improving teaching as on evaluating it.

Although the APA policy brief focuses exclusively on K-12 education, much of it confirms what Campus Labs has been recommending to college teachers for many years:

  • Eliminate high-stakes teacher evaluation, such as over-reliance on quantitative measures (e.g., student ratings of instruction [SRI], also known as course evaluations).
  • Provide opportunities for teaching improvement.
  • Encourage feedback that improves teaching.
  • Emphasize formative feedback over summative evaluation.

Typically, in formative evaluation, faculty peers, department chairs, or mentors observe a class and then provide feedback to the instructor. When done successfully, formative evaluation can strengthen instructor confidence in teaching, change attitudes about students, improve teaching, and increase student self-reported learning and course satisfaction.

In summative evaluation, a quantitative approach often defines “good” teachers as those who have higher scores on student ratings of instruction (i.e., course evaluations) than other teachers under similar circumstances.

Lavigne and Good argue that relying exclusively on either classroom observations or quantitative measures is flawed. Classroom observations often fail to account for the complexity of teaching, because observers cannot witness individual instruction and student teamwork that often occur outside the classroom. Also, observers are often inadequately prepared and do not have the time and resources to provide helpful feedback and consultation. Quantitative measures have limitations of their own. They may create a competitive atmosphere in academic departments, which discourages collegial exchange of ideas and resources. Also, they often do not provide a fair comparison among teachers, because they fail to account for disparate contexts and students’ differential prerequisite knowledge and skills. Moreover, most course evaluations lack diagnostic information about how to improve instruction.

It is worth mentioning at this point that the IDEA student ratings of instruction (SRI) system controls for many of the flaws in course evaluations. Scores are adjusted for course characteristics (difficulty, class size) and student characteristics (motivation, background preparation, work habits), which control for disparate contexts and students’ differential prerequisite knowledge and skills. In addition, the Diagnostic Feedback report provides instructor-specific information about how to enhance student learning. Each recommendation is tailored to class size and self-reported student motivation.

Given such assets, we believe quantitative measures can still have value when they are paired with classroom observations by experienced and well-trained colleagues. Although we agree with APA that teacher evaluation should not depend exclusively on either feedback source, we nonetheless see their value as two of the multiple sources of evidence worth considering. Thus, we offer the following suggestions for best practices:

Choose reliable instruments.

Reliability means that if the same student ratings instrument were administered at different times with the same group of students, the results would generally be the same. The IDEA SRI system consistently has high reliability among students within the same class who rate the same instructor as well as across courses taught by the same instructor.
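To make the definition above concrete, here is a minimal sketch of one common way to check that kind of consistency: correlating ratings from two administrations of the same instrument with the same students. The ratings, the variable names, and the pearson_r helper are all hypothetical and for illustration only; this is not the procedure the IDEA SRI system uses, which is not described here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: each student's average rating (1-5 scale) of the same
# instructor, collected at two different times with the same instrument.
first_administration = [4.2, 3.8, 4.5, 3.1, 4.0]
second_administration = [4.1, 3.9, 4.4, 3.3, 4.0]

test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 2))
```

A correlation near 1 suggests the two administrations rank students' ratings in nearly the same way, which is what the definition of reliability describes. With only five made-up values, the number itself means little; real reliability studies rely on far larger samples.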

Choose valid measures.

Validity concerns whether an evaluation tool is being used for its intended purpose. Does it really measure what it is supposed to measure? For example, in the IDEA system faculty select which of 13 learning objectives are relevant to their course. Validity is thus supported because the instrument reflects the instructor’s purpose. Also, the validity of the 13 learning objectives is demonstrated in two other ways. First, student ratings of progress on instructor-identified relevant objectives are positively correlated with course exam performance. Second, faculty ratings of relevance are multidimensional. The learning objectives represent general life skills, professional skills, cultural/creative development, and course-specific skills.

Ensure fairness with measures that adequately represent the complexity of teaching and its multiple outcomes.

The IDEA SRI methodology recognizes the complexity of teaching because students rate how frequently they observe 19 different teaching behaviors. Multiple outcomes are represented in the 13 learning objectives.

Consider the unintended consequences of measures.

Any measure leads to intended and unintended consequences. Under high-stakes conditions, an unintended consequence is that some instructors might lower their standards and expectations based on the erroneous belief that doing so will lead to higher ratings.

Plan for the classroom observation.

Prior to the classroom visit, meet with the observer to describe your course goals, to convey the intention of your teaching strategies and planned activities, and to discuss your students’ characteristics. Also, provide relevant course materials, and mention any specific behaviors or activities you would like them to focus on. Finally, decide on a method of observation (e.g., checklist, rating form, open-ended comments) agreeable to you and the observer.

Meet and consult with the observer as soon as possible.

Talk while memories are fresh. Ask specifically for recommendations about how to improve teaching and the course. Keep an open mind. Even if your opinions differ, alternative perspectives can be helpful.

Educate those who will provide feedback.

Inform students ahead of time about the value and purpose of their feedback, the meaning of individual course evaluation items, your reasons for selecting specific learning objectives as relevant to the course, and the importance of participating in the ratings system. Faculty peers and administrators should also be educated about how and what to observe and about how to evaluate course materials.

Draw upon multiple sources of evidence.

In addition to student ratings and classroom observations, other measures (e.g., instructor self-assessments, course materials, student products) should be used to increase the likelihood that the evaluation will encompass all dimensions of teaching (i.e., course design, course delivery, assessments, instructor availability, course management).

Seek feedback that leads to change.

Formative feedback is most effective when it focuses on behaviors rather than on the teacher, when it is descriptive rather than judgmental, and when it comes immediately after the observation. Feedback without recommendations on how to improve is unlikely to lead to change.

Create a growth mindset.

Far too often faculty are ranked based on quantitative measures but are not given specific feedback about how to improve. Teachers then come to dread evaluation and despise student ratings. Rather than simply comparing yourself to others, assess your current scores relative to your past performance. Have you seen growth?

In conclusion, to foster growth in teaching on college campuses, improving it must be considered at least as important as evaluating it. Educators should combine the information collected from course evaluations and classroom observations with other sources to make informed decisions that strengthen teaching and learning.


  1. Benton, S. L., Duchon, D., & Pallett, W. H. (2013). Validity of self-reported student ratings of instruction. Assessment & Evaluation in Higher Education, 38, 377-389.
  2. Benton, S. L., Li, D., Brown, R., Guo, M., & Sullivan, P. (2015). IDEA Technical Report No. 18: Revising the IDEA Student Ratings of Instruction System. Manhattan, KS: The IDEA Center.
  3. Benton, S. L., & Young, S. (2018). IDEA Paper #69: Best practices in the evaluation of teaching. Manhattan, KS: The IDEA Center.
  4. Davis, B. G. (2009). Tools for teaching (2nd Ed.). San Francisco, CA: Jossey-Bass.
  5. Lavigne, A. L., & Good, T. L. (2020). Addressing teacher evaluation appropriately. APA Division 15 Policy Brief Series, 1, 1-7.
  6. Li, D., Benton, S. L., Brown, R., Sullivan, P., & Ryalls, K. R. (2016). IDEA Technical Report No. 19: Analysis of student ratings of instruction system 2015 pilot data. Manhattan, KS: The IDEA Center.

Steve Benton, Ph.D.

Data Scientist

Steve Benton, Ph.D., is a data scientist on the Campus Labs data science team. Previously, he was Senior Research Officer at The IDEA Center where, from 2008 to 2019, he led a research team that designed and conducted reliability and validity studies for IDEA products. He is also Emeritus Professor and Chair of Special Education, Counseling, and Student Affairs at Kansas State University, where he served from 1983 to 2008. His areas of expertise include student ratings of instruction, teaching and learning, and faculty development and evaluation. Steve received his Ph.D. in Psychological and Cultural Studies from the University of Nebraska-Lincoln, from which he received the Alumni Award of Excellence in 1997. He is a Fellow of the American Psychological Association and the American Educational Research Association.