Authors: Devyn Shafer, Maggie S. Mahmood, and Tim Stelzer
First author’s institution: University of Illinois at Urbana-Champaign
Journal: Physical Review Physics Education Research 17, 010113 (2021)
If you’ve been reading PERbites for a while, you might remember a paper by Salehi et al. that we covered back in September 2019 (if not, read it here). That paper claimed that once differences in prior preparation, as measured by ACT/SAT math scores and physics conceptual inventories, were taken into account, demographic gaps on physics 1 final exams disappeared. Today’s paper revisits that study and offers a different interpretation: the authors claim the demographic gaps only disappeared because of how Salehi et al. accounted for demographics, and that if demographics are modeled differently, the gaps persist even after taking prior preparation into account.
What this study did
Let’s compare this study to the Salehi et al. study. This study also looked at physics 1 students, using data from over 8,500 students across 8 years of classes. The authors also collected information similar to that in the Salehi et al. study, such as race and ethnicity, gender, ACT/SAT math scores, and course exam scores. Unlike the Salehi et al. paper, the authors did not have access to physics conceptual inventory scores, so they used their department’s physics placement test instead as a physics-specific measure of prior preparation.
The authors then repeated the analysis of Salehi et al., combining anyone who identified as white or Asian into a non-URM (underrepresented minority) group and anyone identifying as a different race or ethnicity into a URM group, and then used linear regression to model final exam score.
In addition, the authors repeated the analysis, but instead of classifying students as URM/non-URM, they used a binary indicator for each race and ethnicity (e.g., African American: yes or no; Asian American: yes or no; etc.). Under this model, the authors could see whether there were differences between students of different races and ethnicities instead of just between URM and non-URM students.
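To make the difference between the two model specifications concrete, here is a minimal sketch in Python using simulated data. The group labels, effect sizes, and sample sizes below are all made up for illustration; the paper's actual models, data, and covariates are more involved. The sketch fits both specifications with ordinary least squares: one with a single URM indicator, and one with a separate indicator per group (white students as the reference category).

```python
import numpy as np

# Hypothetical synthetic data -- labels and effect sizes are illustrative,
# not the paper's actual results.
rng = np.random.default_rng(0)
n = 2000
groups = rng.choice(
    ["white", "asian", "african_american", "hispanic"],
    size=n, p=[0.5, 0.2, 0.15, 0.15],
)
prep = rng.normal(0, 1, size=n)  # stand-in for ACT math / placement score
# Simulate a different exam-score gap for each group (made-up values)
gap = {"white": 0.0, "asian": -0.1, "african_american": -0.5, "hispanic": -0.3}
exam = 0.6 * prep + np.array([gap[g] for g in groups]) + rng.normal(0, 1, n)

# Model 1 (Salehi et al. style): a single binary URM indicator
urm = np.isin(groups, ["african_american", "hispanic"]).astype(float)
X1 = np.column_stack([np.ones(n), prep, urm])
beta1, *_ = np.linalg.lstsq(X1, exam, rcond=None)

# Model 2 (this paper's approach): one indicator per group, white as reference
dummies = np.column_stack(
    [(groups == g).astype(float)
     for g in ["asian", "african_american", "hispanic"]]
)
X2 = np.column_stack([np.ones(n), prep, dummies])
beta2, *_ = np.linalg.lstsq(X2, exam, rcond=None)

print("URM coefficient:", beta1[2])
print("per-group coefficients (Asian, African American, Hispanic):", beta2[2:])
```

The URM coefficient in Model 1 is roughly an average of the pooled groups' gaps, so groups with very different gaps get blurred together; Model 2 estimates each group's gap separately.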
As you might have guessed by the paper’s title, the first analysis did replicate the findings of Salehi et al. That is, while URM students did score about one-third of a standard deviation lower than their non-URM peers on the final exam, once measures of prior preparation were taken into account, the difference was no longer statistically significant (Figure 1).
When individual races and ethnicities were included in the model, though, a different story emerged. Looking at race and ethnicity alone, African American, Asian American, and Hispanic students all scored significantly lower than their white peers on the final exam. But even after taking into account students’ ACT math scores and physics placement test scores, the differences were still statistically significant, though the scoring gaps tended to decrease (Figure 2). Interestingly, Asian American students were the exception, with their scoring gap increasing once measures of prior preparation were taken into account.
The authors then tried a new model where they added the students’ scores on the first three exams in the course. Even when including these course-specific measures of prior preparation, the demographic gaps still remained!
What did we learn
On a classroom level, we learned that in contrast to Salehi et al.’s claim, demographic gaps do not appear to be preparation gaps in disguise. In fact, this study suggests there are prior preparation gaps as well as demographic gaps. For educators, this suggests that what’s happening in the course may be just as important as what students are bringing into the course. Even something as seemingly innocuous as the instructor’s mindset about student ability can inflate demographic gaps.
On the educational policy level, this study suggests researchers need to be careful about how they are analyzing their data. Here, using “URM” instead of the individual races and ethnicities hid the fact that race and ethnicity do have explanatory power for final exam grades.
To determine how best to model the data without hiding possible trends, the authors recommend running descriptive statistics and possibly fitting separate models on disaggregated race and ethnicity data before running a combined model. Doing so can act as a first check that combining groups is in fact an appropriate step.
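A sketch of that first descriptive check, using hypothetical numbers chosen only to illustrate the point: before pooling groups into a single category, compare their means directly and see whether the groups you plan to combine actually look alike.

```python
import numpy as np

# Hypothetical final exam scores per group -- made-up numbers for illustration
scores = {
    "white": np.array([78.0, 82.0, 75.0, 80.0]),
    "asian": np.array([74.0, 70.0, 72.0, 69.0]),
    "african_american": np.array([68.0, 65.0, 70.0, 66.0]),
    "hispanic": np.array([71.0, 69.0, 73.0, 70.0]),
}

# Disaggregated means: do the groups you plan to pool actually look alike?
means = {group: s.mean() for group, s in scores.items()}
for group, m in means.items():
    print(f"{group:>17}: mean = {m:.2f}")

# The pooled non-URM mean sits between the white and Asian means,
# hiding the difference between those two groups
non_urm = np.concatenate([scores["white"], scores["asian"]])
print(f"pooled non-URM mean = {non_urm.mean():.2f}")
```

If the pooled groups have clearly different means (as above), combining them into one category will average away real differences, which is exactly the pitfall the authors warn about.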
Figures used under CC BY 4.0.
I am a physics and computational mathematics, science, and engineering PhD student at Michigan State University and the founder of PERbites. I’m interested in applying machine learning to analyze educational datasets and am currently studying the physics graduate school admissions process.