Authors: David J. Webb, Cassandra A. Paul, Mary K. Chessey
First author’s institution: University of California-Davis
Journal: arXiv preprint 1903.06747 (2020)
Regardless of your view of grades, grades are a large part of education. Whether students earn a passing or failing grade determines whether they can continue on in their course sequence and degree program, whether they can keep their scholarships, and even whether they can graduate. Further, grade point averages are a key part of admissions to graduate and professional programs.
Ideally, grades are a meaningful representation of the student’s achievements in class and are fair. That is, students taking the same course and experiencing similar successes and failures should earn the same grade. How to measure this achievement is often left up to the instructor in terms of deciding how to award grades and what constitutes a passing grade.
Most commonly, instructors use a percentage scale where each student’s grade is converted from a numerical score to a letter. For example, a course grade of 87-89% might be a B+ while a course grade of 90-92% might be an A-. However, this method of grading is not without criticism. For example, a failing grade is often 60% or lower, meaning failing grades make up a larger portion of the scale than all passing grades combined. Because of this, students can have a hard time recovering from a single failing grade. Further, studies of instructors grading identical work have shown that grades can vary by as much as 10%, or the difference between an A and a B.
While alternative methods of grading have been proposed, such as minimum grading and standards-based grading, a popular alternative is the 4-point scale, which is already used to calculate GPA. Rather than converting from a percentage to a letter grade, all of the grading is done directly on the 4-point scale (where A is 4, B is 3, C is 2, D is 1, and F is 0). A key advantage of this scale is that each grade is given equal space on the scale, so it is easier for students to recover from a failed grade. For example, a student would need only one A to balance an F to a C (and hence a passing grade), compared to three A’s under the percentage scale. Given the importance of passing courses, this difference in grading scales may not be trivial. The goal of today’s paper is to see how student outcomes (such as passing a course) are related to the grading scale used.
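To make the "one A versus three A's" comparison concrete, here is a minimal sketch of the arithmetic. The function name and the percentage values (an F recorded as 0%, an A as 100%, and a C starting at 70%) are assumptions chosen for illustration, not numbers taken from the paper.

```python
def as_needed_to_offset_f(a_score, f_score, passing_score):
    """Smallest number of A grades that lifts one F to a passing average."""
    n = 1
    while (f_score + n * a_score) / (n + 1) < passing_score:
        n += 1
    return n

# 4-point scale: A = 4, F = 0, C = 2
four_point = as_needed_to_offset_f(4.0, 0.0, 2.0)

# Percentage scale (assumed values): A = 100%, F = 0%, C = 70%
percentage = as_needed_to_offset_f(100.0, 0.0, 70.0)

print(four_point, percentage)  # 1 3
```

With a less extreme F (say 50%), fewer A's would be needed on the percentage scale, so the gap depends on how low failing scores actually go.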
The data for this study came from two quarters of a three quarter credit course at the University of California, Davis over a 10-year period (2003-2012), including 96 classes and over 15,000 students. From 2003-2006, the course instructors graded under a system similar to the 4-point grading system. However, after 2006, some of the instructors decided to switch to a percentage scale (table 1). As the curriculum and activities in the course were largely unchanged over the 10-year period, the authors of today’s paper could compare how the choice of grading scale may have affected the percentage of students failing the course.
Percentage of Students Failing
At UC Davis, a student needs at least a C- to pass the class so the authors decided to treat any grade below a C- as a failing grade. When they did this, they found that on average, students in classes graded under the percentage scale failed the course at a rate 5 times higher than students in the classes graded under a 4-point scale (figure 1) even though the drop/withdrawal rates were similar regardless of the grading method.
To take into account any differences in the students enrolling in the class, the authors used a logistic regression model to control for the students’ grade point averages at the start of the course. When they did this, the authors found that students in the classes graded under a percentage scale still failed at a much higher rate.
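For readers unfamiliar with what "controlling for GPA" looks like in practice, the sketch below fits a small logistic regression by gradient ascent. Everything in it is an assumption for illustration: the grouped data are invented (not the paper's), and the variable names, learning rate, and fitting procedure are not the authors' actual analysis.

```python
import math

# Hypothetical grouped data, NOT from the paper:
# (incoming GPA, 1 if graded on the percentage scale else 0, students, failures)
groups = [
    (2.0, 0, 100, 8), (2.5, 0, 100, 4), (3.0, 0, 100, 2), (3.5, 0, 100, 1),
    (2.0, 1, 100, 30), (2.5, 1, 100, 18), (3.0, 1, 100, 10), (3.5, 1, 100, 5),
]
GPA_CENTER = 2.75  # center GPA so the intercept describes a mid-range student

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(groups, learning_rate=1.0, steps=5000):
    """Fit P(fail) = sigmoid(b0 + b_gpa*(gpa - center) + b_scale*scale)."""
    b0 = b_gpa = b_scale = 0.0
    total = sum(n for _, _, n, _ in groups)
    for _ in range(steps):
        g0 = g_gpa = g_scale = 0.0
        for gpa, scale, n, fails in groups:
            x = gpa - GPA_CENTER
            p = sigmoid(b0 + b_gpa * x + b_scale * scale)
            residual = fails - n * p  # observed minus expected failures
            g0 += residual
            g_gpa += residual * x
            g_scale += residual * scale
        # Step along the average gradient of the log-likelihood
        b0 += learning_rate * g0 / total
        b_gpa += learning_rate * g_gpa / total
        b_scale += learning_rate * g_scale / total
    return b0, b_gpa, b_scale

b0, b_gpa, b_scale = fit_logistic(groups)
odds_ratio = math.exp(b_scale)  # odds of failing: percentage vs. 4-point, at equal GPA
print(f"GPA coefficient: {b_gpa:.2f}, scale odds ratio: {odds_ratio:.2f}")
```

The coefficient on the grading-scale indicator is the quantity of interest: if it stays positive after GPA is in the model, the higher failure rate cannot be explained by incoming GPA alone.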
Effects of Grade Inflation
Perhaps the reason students failed the 4-point graded courses less often was because the 4-point courses were easier in the sense that the final grades were inflated or curved. Yet when looking at the actual final grades of the students, the authors found the opposite! In fact, the students in the percentage-graded courses were actually 20% more likely to earn an A than their peers in the 4-point graded courses. While the average grade in the 4-point graded classes was higher than the average in the percentage-graded classes, this was mainly due to a narrower distribution of grades in the 4-point classes (Figure 2). Thus, grade inflation doesn’t appear to account for the difference in failing rates.
Use of the Grade Space
To test the idea that the average grade is higher in the 4-point courses because the percentage-graded courses have more grades at the lower end of the grade scale, the authors then looked at how instructors graded exams in each course, since exams make up a large portion of the final grade. While instructors under both scales were more likely to give zeroes on exam questions, instructors in the percentage-graded courses were more likely to give non-zero Fs (such as 50% or “half-credit”) on exam questions. However, the number of A’s and B’s is roughly the same, suggesting that non-zero Fs are the main reason for the different average grades (Figure 3).
As another check, the authors converted all of the percentage scale grades to 4-point scale grades to see how the failure rate would change. The authors assumed a worst-case scenario where any failing grade under the percentage scale (<60%) would be treated as a 0 on the 4-point scale (even though 0-0.5 would have been possible). While 8.3% of students failed the course under the percentage scale, only 1.3% of students would have failed under the 4-point scale. That is, treating any failing grade under the percentage scale as a zero and using the 4-point scale would have cut the failure rate by a factor of about 6.
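The worst-case conversion can be sketched as follows. The whole-letter cutoffs (90/80/70/60), the C- thresholds (70% and 1.7), and the student's scores are all assumptions for illustration; the post does not spell out the exact mapping the authors used.

```python
from statistics import mean

def to_four_point(percent):
    """Convert a percentage score to grade points using assumed whole-letter cutoffs."""
    if percent >= 90:
        return 4.0
    if percent >= 80:
        return 3.0
    if percent >= 70:
        return 2.0
    if percent >= 60:
        return 1.0
    return 0.0  # worst case: every failing score is treated as a zero

# Hypothetical exam-question scores for one student (not data from the paper)
percent_scores = [85, 75, 30, 72]
four_point_scores = [to_four_point(s) for s in percent_scores]

percent_average = mean(percent_scores)        # 65.5
four_point_average = mean(four_point_scores)  # 1.75

# Assumed C- thresholds: 70% on the percentage scale, 1.7 on the 4-point scale
passes_percent = percent_average >= 70
passes_four_point = four_point_average >= 1.7
print(passes_percent, passes_four_point)  # False True

# Ratio of the failure rates reported above
print(round(8.3 / 1.3, 1))  # 6.4
```

Even with the 30% counted as a zero, this hypothetical student passes on the 4-point scale, because a single F sits only two grade points below a C rather than forty percentage points below it.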
What about the instructors?
So far, we’ve only talked about the students and their grades. Perhaps the instructors in the percentage-graded courses are grading their students more harshly. Of the 60 instructors who taught the course over the ten-year period, 7 had taught the course using both grading schemes. The authors could then compare the fail rates of these instructors under each scheme and see if the grading method made a difference. Sure enough, all seven instructors gave more failing grades when they used the percentage-grading method than when they used the 4-point scale. The size of the effect varied widely, ranging from 4 times as many failures to 12 times as many.
What about the actual grade calculation?
Typically, instructors perform some type of weighted average to calculate the final course grade. However, as the final grades are letter grades and not percentages, it might be more reasonable to take the median instead. Aside from questions of scale-of-measure, using a median wouldn’t penalize students for a single low grade.
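The point about a single low grade is easy to see with a small example. The grades below are hypothetical (four B-level scores and one zero), and the percentage equivalents are assumed values, not data from the paper.

```python
from statistics import mean, median

# Hypothetical student: four B's and one zero on each scale
four_point_grades = [3.0, 3.0, 3.0, 0.0, 3.0]
percent_grades = [85, 85, 85, 0, 85]  # assumed percentage equivalents

four_point_mean = mean(four_point_grades)      # 2.4
four_point_median = median(four_point_grades)  # 3.0
percent_mean = mean(percent_grades)            # 68
percent_median = median(percent_grades)        # 85

print(four_point_mean, four_point_median)
print(percent_mean, percent_median)
```

The median ignores the single zero on both scales, while the mean drops by 0.6 grade points on the 4-point scale and by 17 percentage points (below a typical 70% cutoff for a C) on the percentage scale.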
To see how the grades would change if the instructor had used the median instead of the mean to calculate final grades, the authors plotted each student’s final average grade versus the median grade they would have earned (figure 4).
For the percentage grading, 92% of the students would have earned a higher final grade if the instructor had used the median instead of the mean to calculate their final grade. Under the 4-point system, the share of students who would have earned a higher grade is much lower. In addition, for any specific median grade, the variation in average grades was higher for the classes graded using percentage grading.
What does this all mean???
Through this research, the authors were concerned with uncovering differences in student outcomes as a result of the grading scale. They found that more students failed the course when the instructor used a percentage scale and that this effect was not instructor-dependent. In addition, instructors using the percentage scale gave more non-zero Fs on exam questions compared to instructors using the 4-point scale. Finally, the authors found that these results were not the result of grade inflation under the 4-point scale and that grades calculated using the mean and median were more closely aligned under the 4-point scale. The authors are not calling for instructors to discontinue using the percentage scale but are instead alerting instructors to biases of the grading scale that might leave their grading philosophies misaligned with their teaching philosophies.
Figures used under CC BY-SA 4.0.
I am a physics and computational mathematics, science, and engineering PhD student at Michigan State University and the founder of PERbites. I’m interested in applying machine learning to analyze educational datasets and am currently studying the physics graduate school admissions process.