Title: Percent Grade Scale Amplifies Racial/Ethnic Inequities in Introductory Physics
Authors: Cassandra A. Paul and David J. Webb
First author’s institution: San Jose State University
Journal: arXiv pre-print [2203.0262] (2022)
Almost two years ago, we covered a paper that explored the differences between grading on the percentage scale (93%-100% is A, 90%-92.99% is A-, etc.) and grading directly on the 4.0 scale used for grade point averages (4.0 is A, 3.7 is A-, etc.) (Read our article here). The paper found that more students failed the course when the instructor used a percentage grading scale than when they used a 4.0 grading scale, and that instructors gave more non-zero F grades on exam problems under the percentage grading scale than under the 4.0 grading scale. What that paper didn’t investigate, however, was how the grading scale might affect different student populations. For example, grades show inequities along the axes of first-generation status, low-income status, and racial/ethnic identity, with first-generation, low-income, and racially or ethnically minoritized students earning lower grades than their more privileged peers. Today’s paper shows that the percentage grading scale inflicts an additional penalty on racially and ethnically minoritized students.
Some necessary background
Typically, studies looking at grade gaps adopt a student deficit model, in which the gaps are assumed to be caused by some deficiency of the student, such as lacking the preparation needed to succeed in the course. For this study, however, the authors chose to view the results through a course deficit model, in which grade gaps are the result of the policies and procedures of the course itself and hence are something instructors have control over.
The authors also assume that courses should strive for roughly equal outcomes among students rather than equal gains among students. The former approach, referred to as Equity of Parity, focuses on eliminating inequities while the latter approach, referred to as Equity of Fairness, focuses on not adding additional inequities to those that already exist.
As in their previous study, the authors used a database of course and exam grades from a sequence of introductory physics courses aimed at biological science students. Some of the classes were graded on a 4.0 scale, while others used the more traditional percent scale. Otherwise, the courses were nearly identical, and some instructors had used both grading schemes at some point.
Unlike in the previous paper, the authors introduce a new measure to control for each student’s understanding of the course material, allowing them to disentangle student understanding from grading policy. Because they had access to question-level exam scores, the authors could see how many questions each student earned an “A” on. The authors found that the fraction of “A”s a student earned on test questions was largely independent of the course grading policy, both in general and for instructors who had used both grading scales at some point while teaching the course.
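To make the measure concrete, here is a minimal sketch in Python of how a fraction of “A”s could be computed from question-level grades. The function name, the letter-grade input format, and the choice of which grades count as an “A” are assumptions made for illustration, not details taken from the paper.

```python
def fraction_of_as(question_grades, a_grades=frozenset({"A"})):
    """Return the fraction of exam questions on which a student earned an "A".

    question_grades: letter grades, one per exam question (hypothetical format).
    a_grades: which letter grades count as an "A" (an assumption; the paper's
              exact definition may differ, e.g. it may or may not include "A-").
    """
    if not question_grades:
        raise ValueError("at least one question grade is required")
    a_count = sum(1 for grade in question_grades if grade in a_grades)
    return a_count / len(question_grades)


# Hypothetical student with four graded exam questions.
example_fraction = fraction_of_as(["A", "B", "A", "F"])
print(example_fraction)
```

Because the measure only counts questions answered at the “A” level, it should not depend on how partial credit is converted into a course grade, which is what makes it useful as a control.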
Experiment 1: Examining racial gaps
First, the authors wanted to determine how grades differed under each grading scale for racially minoritized students and racially majoritized students, as well as any interaction between grading scale and racial/ethnic group. To do so, they used hierarchical linear modeling, because students are nested within classes and that grouping introduces systematic class-level variation in grades.
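As a rough illustration, a two-level model of this kind could be written as follows. This summary doesn't give the authors' exact specification, so the symbols and the random-intercept structure below are assumptions.

```latex
\begin{align}
G_{ij} &= \beta_0 + \beta_1 M_{ij} + \beta_2 P_j + \beta_3 \, M_{ij} P_j + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\end{align}
```

Here G_ij is the course grade of student i in class j, M_ij is 1 for a minoritized student and 0 otherwise, P_j is 1 for a class graded on the percent scale and 0 for a class graded on the 4.0 scale, and u_j is a class-level random intercept. In this sketch, beta_1 is the grade gap in 4.0-scale classes and beta_3 is the additional gap associated with the percent scale.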
They found that minoritized students earned almost a quarter of a grade point less than their majoritized peers in the courses graded on the 4.0 scale. However, the grade gap between the majoritized and minoritized students was 30% larger in the courses that used a percentage grading scale.
To determine whether the results were influenced by the students’ understanding of the material, the authors reran the model while controlling for the fraction of “A”s each student earned on exam questions. While the grade gap decreased in the courses graded under the 4.0 scale, the gap was largely unchanged in the courses graded with the percentage scale.
Experiment 2: Determining why the gaps are present
While the previous experiment was able to determine that there were grade gaps between majoritized and minoritized students, it wasn’t able to determine which aspects of the course were driving them. For the next experiment, the authors explored various explanations for why the gaps existed.
First, the authors normalized all of the grade data so that the distributions would have a similar shape. Under the percentage grade scale, the distribution of grades is much wider because over half of the scale is devoted to failing grades. Normalizing the grades then reduces the impact of these non-zero “F”s.
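One common way to put grades from differently shaped distributions on a comparable footing is to convert them to z-scores within each class. The sketch below shows that approach; it is an assumption about what such a normalization could look like, not necessarily the procedure the authors used, and the grade lists are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(grades):
    """Rescale grades to mean 0 and standard deviation 1 within one class."""
    mu = mean(grades)
    sigma = pstdev(grades)
    if sigma == 0:
        # Every student has the same grade, so there is no spread to rescale.
        return [0.0 for _ in grades]
    return [(grade - mu) / sigma for grade in grades]


# Hypothetical classes: one graded in percent, one graded on the 4.0 scale.
percent_class = [40.0, 65.0, 90.0]
four_point_class = [2.0, 3.0, 4.0]

normalized_percent = z_scores(percent_class)
normalized_four_point = z_scores(four_point_class)
print(normalized_percent)
print(normalized_four_point)
```

After rescaling, a student's grade is expressed relative to their classmates, so the extra width of the percent scale's failing range no longer stretches the distribution.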
Once they did so, the authors found that the grade gap shrank. More importantly, however, the grade gap in the percentage graded courses disappeared, suggesting that the grade gap has the same origin as the overall grade differences between the 4.0 grading scale and the percentage grading scale.
One possible explanation is the test taking strategies of majoritized and minoritized students and how those are rewarded or punished. For example, prior work has found that majoritized students leave fewer questions blank on exams and miss fewer exams than minoritized students do. As such, we would expect the percentage grade scale to then punish minoritized students more because non-zero “F”s hold more weight in the grade calculation.
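The arithmetic behind that expectation can be sketched with a toy example. The cutoffs below extend the two given at the top of this post (93% for an A, 90% for an A-) with conventional values that are assumed here, and the scores are hypothetical. As a simplification, the 4.0-scale grade is computed by converting each question's score to grade points before averaging, which may differ from how the instructors in the study actually assigned grades.

```python
# Assumed conventional cutoffs: (minimum percent, grade points).
PERCENT_CUTOFFS = [
    (93, 4.0), (90, 3.7), (87, 3.3), (83, 3.0), (80, 2.7), (77, 2.3),
    (73, 2.0), (70, 1.7), (67, 1.3), (63, 1.0), (60, 0.7),
]


def percent_to_points(percent):
    """Convert a percent score to grade points; anything below 60% is 0.0."""
    for cutoff, points in PERCENT_CUTOFFS:
        if percent >= cutoff:
            return points
    return 0.0


def grade_on_percent_scale(question_percents):
    """Average the percent scores first, then convert the average."""
    average = sum(question_percents) / len(question_percents)
    return percent_to_points(average)


def grade_on_four_point_scale(question_percents):
    """Convert each question to grade points first, then average."""
    points = [percent_to_points(p) for p in question_percents]
    return sum(points) / len(points)


# Hypothetical student: three "A" answers and one question left blank.
scores = [95, 95, 95, 0]
print(grade_on_percent_scale(scores))     # 1.7, a C- under the assumed cutoffs
print(grade_on_four_point_scale(scores))  # 3.0, a B
```

The same four answers come out more than a full letter grade apart, because a blank on the percent scale sits 60 points below the lowest passing score while a blank on the 4.0 scale sits less than one grade point below it.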
Sure enough, once the authors controlled for the number of exam questions left blank and the number of exams missed, the grade gap disappeared in the 4.0 graded courses but not the percentage graded courses. However, once the authors controlled for the number of non-zero F grade exam questions, the percentage grade gap also disappeared.
The researchers concluded that majoritized and minoritized students earned different grades under both systems because of the grading penalties for leaving exam questions blank and missing exams, along with the increased weight of “F”s under the percentage grading scale. Because these situations are all governed by course policies, instructors have the power to change them.
Experiment 3: Longer-term impacts
In their previous work, the authors found that the percentage grading scale caused more students to fail the course compared to the 4.0 grading scale. Because these courses are often required for graduation, students who fail would need to retake the course. The authors expected that these outcomes might also show differences between majoritized and minoritized students and decided to see if that was the case.
When they repeated their analysis and looked at majoritized and minoritized students separately, they found that the percentage of minoritized students who failed a course was 11 percentage points higher under the percentage grading scale than under the 4.0 grading scale, nearly double the increase for majoritized students. Likewise, the percentage of minoritized students who repeated the course was nearly 6 percentage points higher under the percentage grading scale than under the 4.0 grading scale, an increase nearly 50% larger than that for majoritized students.
The results suggest that changing the grading scale could allow more students to pass a course. Instructors, however, might worry that students who pass under the 4.0 grading scale but would not have passed under the percentage grading scale will struggle in future classes, especially if those classes are graded under the percentage scale. Yet, when looking at students’ grades after they took the course sequence, the authors found no statistically significant difference in GPA between students from the two grading schemes. This held for students who entered the course sequence with higher GPAs (>3.0) and for students who entered with average GPAs (2.0-3.0), both for the entire class and for minoritized students alone. The authors interpreted these results to mean that, in practice, the grading scale choice was not passing “unqualified” students along.
Returning to the big picture
In this study, the authors found that minoritized students earned lower grades in the course sequence than their majoritized peers, even when controlling for understanding of the material. They found that this difference was attributable to the percentage grading scale’s increased weight of failing grades, both in terms of missing grades and non-zero “F”s. These results suggest that changing the grading scale would be a first step toward reducing inequity and that changing course policies around missing exams and blank questions could also help. The authors warned that simply telling students not to leave questions blank isn’t the solution, because it places the blame on minoritized students for not answering questions the same way their majoritized peers do rather than on the course policies that reward one behavior over the other.
In response to instructors who fear that switching to a 4.0 grading scale would do a disservice to students in the long run because most of their future courses will be graded under the percentage scale, the authors argue that instructors shouldn’t perpetuate inequities simply to prepare students to exist in an inequitable system.
I am a postdoc in education data science at the University of Michigan and the founder of PERbites. I’m interested in applying data science techniques to analyze educational datasets and improve higher education for all students.