Extra time on physics exams doesn’t appear to improve performance (or close equity gaps)

Title: Extended exam time has a minimal impact on disparities in student outcomes in introductory physics

Authors: Nita A. Tarchinski, Heather Rypkema, Thomas Finzell, Yuri O. Popov, and Timothy A. McKay

First author’s institution: University of Michigan

Journal: Frontiers in Education (2022)


Disclaimer: The author has previously worked and is currently working with some of the researchers on this paper but was not involved in this project.

During my physics undergrad, long exams were the norm. While the professors claimed the exams could be completed in 2 hours, many students took 3 or even 4 hours to finish them. After all, doing well on exams is important in physics given their weight in the final grade, and it isn’t unreasonable to assume that spending more time on an exam would result in a better grade. Without sufficient time to complete an exam, students might feel time pressure, possibly making careless mistakes or skipping more difficult questions. Providing more time to all students might then result in better performance on the exams.

In addition, previous studies suggest that extra time might offer disproportionate benefits to students who are minoritized in higher education, though those results come from standardized tests rather than classroom exams. Given the known disparities in physics grades between majoritized and minoritized students, which are driven mainly by differences on high-stakes exams, extra exam time would be a relatively easy way to make physics courses more equitable if it closed those gaps.

Unfortunately, today’s study is unable to provide evidence that this is the case.

The study took place in a large, first-semester introductory physics course at the University of Michigan in Winter 2018. The course was taught in a lecture hall but used active learning strategies such as pre-class readings and assignments and question discussions during class. Unlike in previous semesters, the instructors gave all students 50% more time on the exams and final.

Because the study occurred in an actual classroom instead of a lab, there wasn’t a true control group. However, the instructors included some questions from previous years’ exams on the Winter 2018 exams and used those as a measure of baseline performance under “normal” exam time conditions.

In addition to comparing performance on the questions, the researchers were also interested in how students used the extra time. To get that data, they had students swipe their student IDs when they turned in their exams to get a time stamp.

To analyze the data, the researchers used a “Better Than Expected” (BTE) measure to account for each student’s typical academic performance. They first computed each student’s average grade in all courses taken before the introductory physics course, excluding the physics course itself. The BTE is then the difference between a student’s score on the repeated questions and that average grade, giving an idea of how much better (or worse) the student did on the exam than their previous academic record would predict.
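To make the measure concrete, here is a minimal sketch of the BTE calculation as described above, assuming the repeated-question score and the average grade in other courses are expressed on the same grade scale. The column names and example values are illustrative assumptions, not the authors’ actual code or data.

```python
# Minimal sketch of the "Better Than Expected" (BTE) measure.
# Assumption: both quantities are on the same (e.g., 4.0-style) grade scale.
import pandas as pd

# Hypothetical student records: score on the repeated exam questions and the
# average grade earned in all other courses taken before this physics course.
students = pd.DataFrame({
    "repeated_question_score": [2.7, 3.1, 2.4],
    "gpa_other_courses":       [3.4, 3.6, 3.0],
})

# BTE = repeated-question score minus the average grade in other courses.
# Negative values mean the student did worse in physics than their prior
# academic record would predict.
students["bte"] = (
    students["repeated_question_score"] - students["gpa_other_courses"]
)

print(students)
```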

Finally, the researchers split the results by binary sex, race, and first generation college status. Because the BTE accounts for prior performance to a degree, the BTE approach can look for any additional disparities as a result of the physics course itself.

When the researchers looked at the results, they found that all students had a negative BTE, meaning that they did worse in physics relative to their other courses. Because they were interested in disparities between students, the researchers compared how different groups fared on the BTE measure, finding small differences between male and female students, and between white and Asian students on the one hand and Black, Hispanic, Multiracial, and Native American students on the other (Figure 1). That is, the extra exam time benefited these groups somewhat differently. However, converting the BTE measure to a percent difference, the researchers found that the gap was only 2 percentage points, suggesting that the extra time hardly affected the score gaps on exams.

Figure 1: BTE performance difference by demographic group and exam time condition. The gaps are all relatively small and only amount to small differences on the percent scale. (Figure 3 in paper)

Yet, that doesn’t mean the extra time was worthless. All students tended to do slightly better on the repeated questions with the extra time compared to the historical data, where students had the typical exam time. The size of the effect, however, was only about 2 percentage points, so the benefit was small.

Finally, the researchers looked at which students used the extra time. To do so, they used a clustering algorithm to create groups of students based on the amount of time they took on each exam. Doing so, they found three groups of students, which they called the “early, middle, and late departures.”
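The paper doesn’t hinge on which clustering algorithm was used, so the sketch below simply illustrates the general idea with k-means (k = 3) on hypothetical completion times in minutes; the researchers’ actual method and data may differ.

```python
# Illustrative sketch: group students by how long they took on each exam,
# then label the clusters as early, middle, and late departures.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical completion times (minutes after the exam started) for each
# student on each of the course's exams.
completion_times = np.array([
    [55, 60, 50],
    [85, 90, 95],
    [118, 110, 120],
    [60, 70, 65],
    [115, 105, 112],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(completion_times)

# Order the clusters by their mean completion time so the labels read as
# early, middle, and late departures.
order = np.argsort(kmeans.cluster_centers_.mean(axis=1))
names = {cluster: name for cluster, name in zip(order, ["early", "middle", "late"])}
print([names[label] for label in labels])
```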

Perhaps unsurprisingly, they found that students who tended to earn higher grades in their other courses also tended to take advantage of the extra time and fall into the middle and late departure groups. Looking at demographics (Figure 2), the researchers found that females fell into the middle and late groups more often than males did, while more first-generation students fell into the early group than continuing-generation students did.

Figure 2: Composition of the early, middle, and late test completers by sex, race, and college generation. (Figure 6 in paper)

Unfortunately though, the researchers did not find evidence that the amount of time spent on the exam had a significant impact on the exam grade the student earned (Figure 3).

Figure 3: Exam performance split by sex, race, and college generation and the time to complete the exam. Notice that the lengths of the bars are similar for each group, regardless of how long the student took to complete the exam. (Figure 7 in paper)

While the results aren’t what the researchers hoped for, they did note some factors that might have influenced their results. First, the course has covered consistent topics for many years, so instructors have a good idea of what kinds of questions to ask students and how difficult they should be. In a course with less historical data and more variability in the difficulty of the questions asked, extra time could be more helpful. In addition, variations in instructors and teaching styles between semesters may influence what and how much students learn, which could cancel out some of the effects of extra time. For example, if students did poorly on a topic one year, the instructor might teach the material differently the next time, possibly resulting in better exam grades in the future.

However, the current evidence suggests that providing students with 50% additional time on exams does not result in large increases in exam performance or reduce the performance differences between majoritized and minoritized groups. Instead, the extra time may have slightly increased the gaps. While there were small improvements overall on the exams, they are just that: small. Given the limited benefit of the extra time and the additional time required of students and proctors, extra time on exams was not kept as a feature of the course in following semesters. Because of the specific way the study was implemented at this university, the authors recommended that researchers replicate their approach in other contexts before drawing firm conclusions. For now, however, the evidence that extra exam time benefits students in introductory physics is limited at best.

Figures used under CC BY 4.0.
