Title: Analysis of Praxis physics subject assessment examinees and performance: Who are our prospective physics teachers?
Authors: Lisa Shah, Jie Hao, Christian A. Rodriguez, Rebekah Fallin, Kimberly Linenberger-Cortes, Herman E. Ray, Gregory T. Rushton
First Author Institution: Stony Brook University
Journal: Physical Review Physics Education Research 14, 010126 (2018)
I probably don’t have to convince you that teachers have a significant impact on the success of their students and that effective teaching requires knowledge of the subject matter. Because of this, many US states require teachers to complete some type of certification before they can begin teaching, which usually includes passing a certification exam. In many states, that exam is the Praxis physics subject exam, a 100-question, multiple-choice test covering the topics of a typical introductory physics course. Who actually takes this test and how they perform on it has not been well studied. The authors of today’s paper attempt to determine the personal and professional characteristics of those who have taken the Praxis physics subject exam over the past decade and to describe how those characteristics may be correlated with performance.
The data for the study come from all examinees who took the Praxis physics subject exam between June 2006 and May 2016 (N = 9667). If someone took the exam multiple times, only their highest score is included. Since each state determines the minimum passing score for its teachers, and that cut score can vary from year to year, the researchers decided to use the median cut score across all the exams in the study as a national reference. On the exam’s 100-to-200 scaled-score range, this median cut score corresponds to a scaled score of 140, or answering roughly 54% of the questions correctly.
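To make those preprocessing choices concrete, here is a minimal pandas sketch of the two steps (keeping only each examinee’s highest score and using the median of state cut scores as a national reference). The column names and cut-score values below are purely illustrative, not taken from the paper or from ETS data:

```python
import pandas as pd

# Hypothetical attempt-level records; ids and scores are made up.
attempts = pd.DataFrame({
    "examinee_id":  [1, 1, 2, 3, 3, 3],
    "scaled_score": [132, 141, 158, 120, 128, 139],
})

# Keep only each examinee's highest score across repeated attempts.
best_scores = attempts.groupby("examinee_id")["scaled_score"].max()

# Hypothetical per-state cut scores; the national reference is their median.
state_cut_scores = pd.Series({"NJ": 126, "TN": 140, "VA": 147, "WV": 153})
national_cut = state_cut_scores.median()

# Fraction of examinees who clear the national reference cut score.
pass_rate = (best_scores >= national_cut).mean()
print(f"national cut = {national_cut}, pass rate = {pass_rate:.0%}")
```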
Since multiple types of demographic data are collected from Praxis examinees, the researchers used stepwise linear regression to determine which variables were associated with performance on the Praxis physics subject exam. Stepwise linear regression repeatedly adds or removes a single variable from the linear regression model and checks how well the resulting model explains the data; the procedure then keeps the model that best represents the data. Doing so, the researchers found that a model using undergraduate major, gender, and undergraduate GPA fit best, explaining 24% of the variation in scores.
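The paper doesn’t publish its analysis code, but the idea behind stepwise selection can be sketched in a few lines. The snippet below is a greedy forward version that adds whichever predictor most improves the model’s AIC and stops when nothing helps; the actual study may have used a different criterion or an add-and-remove variant, and the data frame and column names are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(df, outcome, candidates):
    """Greedy forward selection: repeatedly add the candidate predictor that
    most improves the model's AIC; stop when no addition improves the fit."""
    selected, remaining = [], list(candidates)
    best_aic = sm.OLS(df[outcome], np.ones(len(df))).fit().aic  # intercept-only model
    improved = True
    while improved and remaining:
        improved = False
        # Fit one trial model per remaining candidate (categoricals become dummies).
        trial_aic = {}
        for var in remaining:
            X = pd.get_dummies(df[selected + [var]], drop_first=True, dtype=float)
            trial_aic[var] = sm.OLS(df[outcome], sm.add_constant(X)).fit().aic
        best_var = min(trial_aic, key=trial_aic.get)
        if trial_aic[best_var] < best_aic:  # keep the variable only if it helps
            best_aic = trial_aic[best_var]
            selected.append(best_var)
            remaining.remove(best_var)
            improved = True
    return selected

# Hypothetical usage with made-up column names:
# forward_stepwise(praxis_df, "scaled_score",
#                  ["undergrad_major", "gender", "undergrad_gpa", "grad_major"])
```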
Finally, the researchers performed a differential item functioning (DIF) analysis on the individual test questions, which examines whether individual questions are answered differently by different populations. ETS (which writes the Praxis exam) classifies “A” items as those with no difference, “B” items as those with some difference, and “C” items as those with a large difference. In this paper, a “C” classification means that the probability of the reference group (say, males) answering correctly is at least 89% higher than the probability of another group (say, females) answering correctly. A large DIF does not necessarily indicate bias, however; it only means that the populations answered the question differently.
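ETS’s A/B/C categories are conventionally based on the Mantel–Haenszel statistic, which compares the odds of a correct answer for the two groups among examinees matched on overall score. As a rough illustration of the idea (not the paper’s own code; the thresholds noted in the comments are the commonly cited ETS conventions rather than anything quoted in this paper), here is a minimal sketch with made-up counts:

```python
import numpy as np

def mantel_haenszel_dif(right_ref, wrong_ref, right_focal, wrong_focal):
    """Mantel-Haenszel DIF for one item. Each argument is an array with one
    entry per ability stratum (e.g. total-score bin): counts of reference /
    focal group examinees who got the item right / wrong in that stratum."""
    right_ref, wrong_ref = np.asarray(right_ref, float), np.asarray(wrong_ref, float)
    right_focal, wrong_focal = np.asarray(right_focal, float), np.asarray(wrong_focal, float)
    n = right_ref + wrong_ref + right_focal + wrong_focal  # stratum sizes

    # Common odds ratio: how much the odds of answering correctly favor the
    # reference group once examinees are matched on overall score.
    alpha_mh = np.sum(right_ref * wrong_focal / n) / np.sum(wrong_ref * right_focal / n)

    # ETS reports DIF on the delta scale: MH D-DIF = -2.35 * ln(alpha_MH);
    # roughly, |MH D-DIF| >= 1.5 (and statistically significant) is a "C" item.
    return alpha_mh, -2.35 * np.log(alpha_mh)

# Toy example with three score strata (all counts are made up):
alpha, d_dif = mantel_haenszel_dif(
    right_ref=[40, 60, 80], wrong_ref=[60, 40, 20],
    right_focal=[25, 45, 70], wrong_focal=[75, 55, 30],
)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {d_dif:.2f}")
```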
So what did the researchers find? First, they found wide variation in the minimum passing score by state, ranging from 126 (~46% correct) to 153 (~64% correct). The median score for each state that uses the Praxis is shown in figure 1. Overall, about 75% of examinees pass the exam.
Second, they found that, unsurprisingly, physics and engineering majors tended to perform best on the exam, scoring 10 to 20 scaled points higher than other majors. Scores for all majors are shown in figure 2. Interestingly, physics and engineering majors made up only about a third of examinees. Graduate major was not found to be a significant predictor of performance, but this may be because only 60% of examinees provided that information.
Third, the researchers found that most examinees (78.7%) performed well in their undergraduate classes and earned a GPA of at least 3.0 (on a 4.0 scale). Examinees with GPAs above 3.5 tended to outscore the other examinees by between 5 and 10 scaled points. However, even examinees with GPAs less than 2.5 tended to pass the Praxis physics subject exam.
Fourth, the researchers found that a majority of examinees were male (~63%) and that men tended to outperform women by 10 scaled points. This difference was consistent over the ten years of exams analyzed in this study and is shown in figure 3. Women were also 20% less likely to pass the exam than men were. On a positive note, however, the fraction of questions showing a “C” DIF has decreased from an average of 16.5% to 5% and 7% on the most recent Praxis exams. Despite this, a gender gap in the scores remains.
Finally, the researchers looked at the race and ethnicity of the examinees. Even though race/ethnicity was not found to be a significant predictor, physics is still overwhelmingly white and male, so the researchers wanted to investigate the diversity (or lack thereof) of the examinees. Around 85% of examinees were white, meaning that race/ethnicity may not have emerged as a significant predictor simply because there were so few non-white examinees in the population. While white test-takers and “other” (Asian, Native American, Pacific Islander) test-takers performed about the same, both groups significantly outperformed Black examinees, by anywhere from 11 to 24 scaled points. In most years, Black examinees scored below the national median cut score (140), while all the other race/ethnicity categories scored above it in every year. Full results are shown in figure 4. Due to the low number of Hispanic test-takers, there is considerable variability in their scores, and it is therefore difficult to determine whether a performance gap is present. On average, 16.5% of questions had a “C” DIF when comparing Black and white test-takers, while 10.5% had a “C” DIF when comparing Hispanic and white test-takers.
So what can we take away from this paper? First, the prospective teaching pool suffers from the same lack of diversity found among those who earn physics degrees, and performance gaps exist between white males and other populations. Second, even among the best-performing subgroups in the data (physics and engineering majors), the average scaled score corresponds to answering roughly 66% of the mainly conceptual questions correctly. This performance suggests that information is not being retained from introductory physics courses. Since reformed, active-learning courses have been shown to improve conceptual learning, this result suggests the need to further expand their use to best prepare our future physics educators.
While this study uses a nationally representative sample, the researchers acknowledge that many of the most populous states do not use the Praxis exam and hence are not included in the data. In addition, the choice of a national median cutoff score of 140 was somewhat arbitrary; the researchers note that using the mean of the mean state cut scores (which was 2 scaled points lower) would have slightly changed the results. The researchers also emphasize that this paper did not take into account whether an examinee actually passed in their own state, only whether they scored above a value representative of the national cutoff point. Finally, the researchers acknowledge that not everyone who takes the exam is trying to become a physics teacher; some may be seeking certification in a second or third subject in order to become more competitive on the job market. Thus, it appears much more work is needed before we can arrive at a “final answer” to the question of who our prospective physics teachers are.
Figures used under Creative Commons Attribution 4.0 International License.
I am a postdoc in education data science at the University of Michigan and the founder of PERbites. I’m interested in applying data science techniques to analyze educational datasets and improve higher education for all students.