Can we explain the physics conceptual inventory gender gap?

Title: Partitioning the gender gap in physics conceptual inventories: Force Concept Inventory, Force and Motion Conceptual Evaluation, and Conceptual Survey of Electricity and Magnetism

Authors: Rachel Henderson, John Stewart, and Adrienne Traxler

First author’s institution: Michigan State University

Journal: Physical Review Physics Education Research 15 010131 (2019)

Disclaimer: I currently work with one of the authors but I was not involved in this project


If you have read PERbites for a little while, you are probably familiar with the gender gap on conceptual inventories. That is, men tend to outperform women by 12% on mechanics post-instruction conceptual inventories and 8.5% on electricity and magnetism post-instruction conceptual inventories. While many studies have found a gender gap, there have been various explanations of its cause. These include differences in academic preparation, physics and math preparation, science anxiety, mathematics anxiety, and stereotype threat. However, no study to date has tried to determine how much of the gender gap can be attributed to each of these factors. That is the goal of today’s paper.

For their study, the researchers used data from three previously published papers on gender gaps on conceptual inventories. The data came from three separate universities and included one course that used the Force Concept Inventory (FCI), two courses that used the Force and Motion Conceptual Evaluation (FMCE), and two courses that used the Conceptual Survey of Electricity and Magnetism (CSEM). From this data, the researchers used a subset of questions to form a “corrected” version of each conceptual inventory. This corrected version only included questions where men did not outperform women (or vice versa) and questions that weren’t too easy or too difficult. By comparing the gender gap on the corrected versions of the conceptual inventories to the gender gap on the originals, the researchers could examine how much of the gender gap could be attributed to the tests themselves. In addition, the researchers had access to some of the students’ in-class physics exam scores and their SAT and ACT scores. With these measures, the researchers used hierarchical linear modeling with students’ academic performance measures, their gender, and their conceptual inventory pre-test (pre-instruction) scores to predict students’ post-instruction conceptual inventory scores. In line with previous studies, students were grouped into bins based on test score. The gender gap could then be expressed as \delta G = \delta G_{pop} + \delta G_{fair} + \delta G_{prep} + \delta G_{equal}, where \delta G_{pop} is the gender gap as a result of academic performance, \delta G_{fair} is the gender gap as a result of the fairness of the instrument, \delta G_{prep} is the gender gap as a result of prior physics preparation, and \delta G_{equal} is the gender gap of equally prepared and performing students, which means it functions as a measure of everything not already accounted for in the model.
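To make the binning step concrete, here is a minimal sketch of how one might group students by pre-test score and compute the post-test gender gap within each bin. This is an illustration of the general idea, not the paper’s actual hierarchical linear model; the bin edges, function name, and gender labels are my own assumptions.

```python
import numpy as np

def binned_gender_gap(pre, post, gender, edges):
    """Post-test gender gap (men minus women) within each pre-test bin.

    pre, post: arrays of pre- and post-test scores
    gender:    array of labels ("M" or "W"; labels are illustrative)
    edges:     bin edges for the pre-test score, as used by np.digitize
    Returns a dict mapping bin index -> mean(men) - mean(women).
    """
    pre, post, gender = map(np.asarray, (pre, post, gender))
    bins = np.digitize(pre, edges)
    gaps = {}
    for b in np.unique(bins):
        in_bin = bins == b
        men = post[in_bin & (gender == "M")]
        women = post[in_bin & (gender == "W")]
        if len(men) and len(women):  # skip bins missing one group
            gaps[int(b)] = float(men.mean() - women.mean())
    return gaps

# Tiny illustrative example: two pre-test bins split at a score of 50.
gaps = binned_gender_gap(
    pre=[10, 10, 90, 90],
    post=[50, 40, 80, 70],
    gender=["M", "W", "M", "W"],
    edges=[50],
)
print(gaps)  # a 10-point gap favoring men in each bin
```

Comparing such per-bin gaps on the original versus “corrected” instruments is, in spirit, how one separates test-level effects from population-level ones.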

Initially, the results weren’t too surprising based on previous findings. First, the researchers found that the percentage of women in a pre-test bin decreased as the average score of the bin increased. That is, men outperform women on conceptual inventories even before the material is covered in the course. Even when using the corrected versions of the conceptual inventories, the same patterns emerged (see figures 1 and 2).

Figure 1: Pre-test scores on the various conceptual inventories (original on left, corrected on right) vs post-test scores on the same assessment. Numbers on the plot correspond to the number of students in that bin. (Figure 1 in paper)
Figure 2: The plots show the same information as in figure 1, except these students took the CSEM. (Figure 2 in paper)

When looking at the various factors that go into \delta G, the researchers found that \delta G_{equal} could “explain” most of the gap, followed by a student’s physics preparation (\delta G_{prep}). The amount of explanation each offered varied greatly between the different courses and conceptual assessments (figure 3).

Figure 3: The percentage of \delta G that can be explained by each of the variables in the researchers’ models. Notice that the percentages attributed to \delta G_{equal} and \delta G_{prep} vary greatly between tests and institutions, which are represented by the number after the conceptual inventory acronym. (Figure 3 in paper)

When discussing their results, the researchers suggested that psychosocial factors such as stereotype threat may account for some of the gender gap included in \delta G_{equal}, but likely not all of it. For example, they note that in the course labeled “FCI-1”, a gender gap favoring men was found on qualitative exam questions but there was no such gap on quantitative exam questions, suggesting psychosocial factors could not be the only thing contributing to \delta G_{equal}. Additionally, for the classes that used the CSEM, some of the students had also been in the courses that used the FCI and FMCE, allowing the researchers to include their post-test scores as a variable in the model. When taking into account how the students scored on those post-tests, the amount of the gender gap attributable to \delta G_{equal} decreased by 21% (figure 4).

Figure 4: The percentage of \delta G that can be explained by each of the other variables for the CSEM once the researchers included post-test FCI or FMCE scores as additional measures of prior preparation. Notice that the percentage attributed to \delta G_{equal} has been reduced compared to figure 3. (Figure 4 in paper)

Taken together, the results of this study suggest that physics preparation accounts for a large portion of the gender gap on conceptual inventories, while academic performance and the tests themselves contribute very little to the observed gap. However, due to the large value of \delta G_{equal}, much of the gender gap is still unaccounted for, possibly being explained by other measures of physics preparation and psychosocial factors.

For instructors and administrators using the exams, the researchers offer a few recommendations. First, instructors need to calibrate the conceptual inventories for their courses. That is, they should make sure the items they are using are valid for the students taking the conceptual inventory. Since item difficulty is just the fraction of students answering the question correctly, instructors can easily check whether their items are too easy or too difficult for valid conclusions to be drawn. This is especially important if the conceptual inventories are graded for accuracy. Second, the researchers note that conceptual inventories are just one way to assess learning. There are many factors beyond conceptual understanding that go into a student’s numeric score, such as the student’s skill with multiple-choice tests and the concepts the instructor emphasized during the course. Finally, addressing gender gaps goes beyond conceptual inventories. Instructors should also examine other aspects of their learning environments that may contribute to the gender gap, including their own practices. Even though the gender gap may still not be fully explained, we can take steps toward reducing its effects in the classroom.
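Since item difficulty is just the fraction of correct responses, the calibration check the researchers recommend is straightforward to script. The sketch below computes per-item difficulty from a 0/1 response matrix and flags items that fall in a usable middle range; the 0.2/0.8 cutoffs are illustrative assumptions, not thresholds taken from the paper.

```python
import numpy as np

def item_difficulty(responses):
    """Item difficulty = fraction of students answering each item correctly.

    responses: (n_students x n_items) array of 0/1 scores.
    """
    return np.asarray(responses).mean(axis=0)

def usable_items(responses, low=0.2, high=0.8):
    """Flag items that are neither too hard (< low) nor too easy (> high).

    The low/high cutoffs here are illustrative, not the paper's criteria.
    """
    p = item_difficulty(responses)
    return (p >= low) & (p <= high)

# Four students, three items: item 1 is answered correctly by everyone
# (too easy), the other two fall in the usable range.
responses = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
])
print(item_difficulty(responses))  # [1.0, 0.5, 0.25]
print(usable_items(responses))     # [False, True, True]
```

An instructor could run a check like this on their own class's responses before drawing conclusions from an inventory score.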

Figures are used under Creative Commons Attribution 4.0 License
