Authors: David A. M. Peterson, Lori A. Biederman, David Andersen, Tessa M. Ditonto, Kevin Roe
First author’s institution: Iowa State University
Journal: PLoS ONE 14(5) e0216241 (2019) [Open Access]
As the end of the semester approaches, universities are asking their students to evaluate their professors through forms often referred to as student evaluations of teaching (SET). Typically, students answer multiple questions about the instructor and the course material on a strongly-disagree-to-strongly-agree scale, such as whether the instructor seemed interested in teaching the course and whether they felt they learned a lot. While these surveys are subjective, university administrators often use the resulting data as the primary means of evaluating teaching effectiveness and making tenure and promotion decisions.
Yet, research suggests that students tend to rate female instructors more critically than they rate male instructors; this is true even for objective measures such as the time the instructor took to return an assignment. For a typical five-point scale question (strongly disagree, disagree, neutral, agree, strongly agree), students rate female instructors an average of 0.5 points lower than their male counterparts.
Unfortunately, this gap is likely caused by implicit bias, which can be difficult to overcome because the person showing the bias is not aware of it. Typically, combating implicit bias requires calling attention to it, motivating individuals to overcome it, and training them to do so. Given that the gender gap in student evaluations of teaching is likely caused by implicit bias, the authors of today’s paper wondered whether alerting students to their potential for bias and asking them to avoid stereotyping could help close the gap between evaluations of male and female instructors.
To test their idea, the authors selected four introductory courses at Iowa State University, two in biology and two in American politics, with one course in each discipline taught by a woman. The authors then randomly assigned the students in each course to either the control or the experimental condition. That is, each course had students in both groups, so the authors could compare students within the same course rather than across courses, controlling for any factors that might differ between courses. All students answered the survey questions online, and the questions were the same for both groups. However, the experimental group saw an additional two paragraphs of text before answering:
Student evaluations of teaching play an important role in the review of faculty. Your opinions influence the review of instructors that takes place every year. Iowa State University recognizes that student evaluations of teaching are often influenced by students’ unconscious and unintentional biases about the race and gender of the instructor. Women and instructors of color are systematically rated lower in their teaching evaluations than white men, even when there are no actual differences in the instruction or in what students have learned.
As you fill out the course evaluation please keep this in mind and make an effort to resist stereotypes about professors. Focus on your opinions about the content of the course (the assignments, the textbook, the in-class material) and not unrelated matters (the instructor’s appearance).
After collecting all the student responses, the authors focused on three key items: “your overall rating of this instructor is”, “what is your overall rating of this instructor’s teaching effectiveness”, and “your overall rating of this course is”. These items were all scored on a scale of 1-5, with 5 representing the most favorable answer. In addition, the authors collected the students’ GPA, class level, responses to items about the textbook and in-class activities, expected grade in the course, and gender. If students were truly randomly assigned, these characteristics should be distributed similarly across the two conditions within a single course, which the authors confirmed.
When looking at the results, the researchers found that students in the experimental group on average rated the female instructors 0.41 points higher overall and 0.30 points higher in teaching effectiveness. In addition, they rated the course overall as 0.51 points higher than the control group did. Figure 1 shows the distribution of responses.
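To make these comparisons concrete, here is a minimal sketch (with made-up ratings, not the study’s data) of how a between-group difference in mean ratings like the 0.41-point gap above can be computed; the paper’s actual statistical analysis is more involved than a simple difference of means.

```python
from statistics import mean

def rating_gap(control, experimental):
    """Difference in mean 1-5 rating: experimental group minus control group."""
    return mean(experimental) - mean(control)

# Hypothetical ratings of one female instructor (illustrative only)
control_ratings = [3, 4, 3, 4, 4, 3, 5, 4]       # students who saw the standard survey
experimental_ratings = [4, 5, 4, 4, 5, 4, 5, 4]  # students who saw the anti-bias statement

gap = rating_gap(control_ratings, experimental_ratings)
print(f"Experimental group rated the instructor {gap:+.2f} points higher on average")
```

In the study itself, a positive gap of this kind appeared only for the female instructors, which is what suggests the statement was counteracting bias rather than simply inflating all ratings.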
On the other hand, students in the experimental group did not rate the male instructors more favorably than students in the control group did. That is, asking students to recognize their possible implicit bias did not affect their perception of the male instructors, but it raised their ratings of the female instructors to roughly the same level as those of their male counterparts.
Breaking the results down by student gender, the authors found that women in the experimental group did not rate the female instructors more highly. For the men in the experimental group, the evidence was mixed: summary statistics showed that they did rate the female instructors higher, but post-hoc tests showed the increase was not statistically significant. The authors suggest this may be due to the limited sample size (only around 90 students in the courses taught by men).
The results of this study suggest that simply including a statement calling attention to implicit bias can help reduce the gap. In this case, the improvement in the female instructors’ scores was similar in size to the previously reported gender gap, suggesting that this technique may actually eliminate the gap. The authors caution that the intervention may only be a short-term solution: because the statement was novel, students may have been more likely to read it, whereas if every survey contained it, they might simply skip over it. Further, all of the instructors in the study were white, so the results may not generalize to instructors of color, who also face bias on student evaluations. There may be no way to fully counteract bias on student evaluations, but the results of this study suggest one possible direction.
Interested in exploring your own implicit bias? Try taking one of the tests: https://implicit.harvard.edu/implicit/takeatest.html
Figure used under CC BY 4.0.
I am a physics and computational mathematics, science, and engineering PhD student at Michigan State University and the founder of PERbites. I’m interested in applying machine learning to analyze educational datasets and am currently studying the physics graduate school admissions process.