Title: Nothing’s plenty: The significance of null results in physics education research
Authors: Luke D. Conlin, Eric Kuo, and Nicole R. Hallinen
First author’s institution: Salem State University
Journal: arXiv preprint 1810.10071
How do journal editors decide which research should be published and which shouldn’t? Typically, publication requires a statistically significant result. A result is statistically significant when its p-value, the probability of observing an effect at least as large as the one found if there were actually no effect (the null hypothesis), is below some specified threshold, typically 5%. Because journals require a statistically significant result for a study to be published, there is a publication bias against null results, that is, results in which the hypothesized effect is not found. This bias is not restricted to educational studies: recent work estimates that 95.6% of social science journal articles reject the null hypothesis (that is, report statistically significant results) and 85.4% of medical journal articles do, and a more recent study of over a million biomedical articles found that 96% of them rejected the null hypothesis. Of course, not all null results should be published: some may stem from a poorly designed experiment or a misguided hypothesis. The authors of today’s article argue that some types of null results should be published because they can provide just as much knowledge as a statistically significant result.
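To make the p-value definition concrete, here is a minimal sketch of one common way to compute it: a two-sided permutation test on a difference in group means. It is an illustration under stated assumptions, not the method used in any study discussed here; the helper name `permutation_p_value`, the number of shuffles, and the exam scores are all hypothetical. The idea is that if there were no real effect, the group labels would be interchangeable, so we shuffle them many times and count how often a difference at least as large as the observed one appears by chance.

```python
import random
from statistics import fmean

ALPHA = 0.05  # conventional significance threshold mentioned in the text


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means (hypothetical helper).

    Estimates how often randomly relabeled data give a mean difference at least
    as large as the observed one, i.e. the p-value under the null hypothesis.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical exam scores for two instructional conditions.
new_method = [72, 85, 78, 90, 66, 81]
old_method = [70, 80, 75, 88, 69, 79]
p = permutation_p_value(new_method, old_method)
print(f"p = {p:.3f}:", "significant" if p < ALPHA else "null result")
```

A permutation test is used here only because it makes the "chance under no effect" idea explicit; studies in the literature use many other tests (t-tests, regressions, and so on).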
As an introduction to the usefulness of null results, the authors begin with one of the most famous null results of all time: the Michelson-Morley experiment. In the late 19th century, light waves were presumed to need a medium to travel through, known as the aether, which was thought to fill the universe. If this were the case, the measured speed of light should depend on where Earth was in its orbit, since the relative motion between the aether and Earth would differ at different points in the orbit.
However, Michelson and Morley detected no such difference. This failure to confirm the aether hypothesis eventually led to the idea that light can travel through a vacuum and to Einstein’s Theory of Special Relativity (yes, that one: E = mc², among other results).
While most null results are unlikely to have the impact of the Michelson-Morley experiment, the authors claim that null results can contribute in three key areas: serving as existence proofs, showing that effects fail to generalize to new contexts, and determining necessary conditions for replication.
Existence Proofs
Many studies of new instructional techniques or interventions aim to show that the new technique is better than a current one. However, showing that a new technique is just as good as a current technique can be useful as well. As an example, the authors consider a study (pg 31) comparing a new technique, the general principle strategy, to the control of variables strategy for determining how independent variables affect an outcome variable. Most students are taught control of variables: all independent variables are held constant except one, which is varied, so any variation in the outcome variable must be due to the changed variable. In contrast, the general principle strategy compares trials with the same outcome and identifies which independent variables changed between them; those variables must either have no effect or interact with another variable that also changed. This strategy is more useful when the independent variables cannot be directly manipulated, as in medical diagnosis.
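The two strategies can be sketched as simple comparisons over a table of trials. This is a toy illustration, not the materials from the study: the plant-growth variables, the trial data, and the function names are hypothetical, and it assumes categorical, noise-free outcomes. Control of variables only draws conclusions from pairs of trials that differ in exactly one variable, while the general principle strategy looks at pairs with the same outcome and flags every variable that changed between them as having no effect (or having its effect masked by another changed variable).

```python
from itertools import combinations

# A trial pairs the settings of the independent variables with an observed outcome.
# All trials are assumed to record the same set of variables.
Trial = tuple[dict[str, str], str]


def changed_vars(a: dict[str, str], b: dict[str, str]) -> set[str]:
    """Names of the variables whose settings differ between two trials."""
    return {name for name in a if a[name] != b[name]}


def control_of_variables(trials: list[Trial]) -> dict[str, bool]:
    """Use only pairs that differ in exactly one variable.

    Maps each testable variable to True if changing it alone changed the outcome.
    """
    effects: dict[str, bool] = {}
    for (set_a, out_a), (set_b, out_b) in combinations(trials, 2):
        diff = changed_vars(set_a, set_b)
        if len(diff) == 1:
            (name,) = diff
            effects[name] = effects.get(name, False) or (out_a != out_b)
    return effects


def general_principle(trials: list[Trial]) -> set[str]:
    """Use pairs with the same outcome; every variable that changed is flagged
    as having no effect (or an effect masked by another changed variable)."""
    flagged: set[str] = set()
    for (set_a, out_a), (set_b, out_b) in combinations(trials, 2):
        if out_a == out_b:
            flagged |= changed_vars(set_a, set_b)
    return flagged


# Hypothetical plant-growth data: only light actually matters here.
TRIALS: list[Trial] = [
    ({"light": "high", "water": "high", "soil": "A"}, "tall"),
    ({"light": "low", "water": "high", "soil": "A"}, "short"),
    ({"light": "high", "water": "low", "soil": "A"}, "tall"),
    ({"light": "high", "water": "high", "soil": "B"}, "tall"),
]

print(control_of_variables(TRIALS))  # {'light': True, 'water': False, 'soil': False}
print(general_principle(TRIALS))
```

Restricting the data to the last three trials mimics observational data in which no two cases differ in just one variable: control of variables then finds no usable pairs, while the general principle strategy still flags `water` and `soil`.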
The researchers originally expected that students taught the general principle strategy would outperform students taught only the control of variables strategy on questions such as the one shown in Figure 2.
However, students were equally successful at determining which independent variables influenced an outcome variable regardless of which strategy they used. Thus, even though the researchers failed to show that their new strategy was more effective, they showed that it could be just as effective as the ubiquitous control of variables approach, and the result led to a new research question: are there situations where the general principle strategy is more useful?
Effects Failing to Generalize to New Contexts
Studies in physics education research are situated in a specific context that may affect the results. For example, studying students at a large research university may lead to different results than studying students at a regional liberal arts college, and presenting a problem as a multiple-choice question may lead to different errors than presenting it as an open-response question. In today’s paper, the authors focus on a specific example: solving simple algebraic problems. Prior work found that students solved simple one- or two-step word problems more accurately than the same underlying equations presented symbolically, likely because the word problems let students use informal reasoning. The researchers called this a verbal advantage. When the problems contained more steps, students performed better on the symbolic equations, which the researchers called a symbolic advantage. The underlying hypothesis was that problem difficulty was the main factor determining which format would lead to better performance.
A second research group tried to extend these findings to a new type of problem: proportion word problems (such as “6 is to 30 as 5 is to x”). Since these questions were similar to problems used in the original study, the researchers reasoned that they should show either a verbal advantage or a symbolic advantage. However, students solved the problems with similar accuracy regardless of whether they were presented as word problems or as equations. Because these problems were similar in difficulty to the simple problems in the first study, difficulty alone would have predicted a verbal advantage; since none appeared, the researchers concluded that problem difficulty alone could not explain the results of the first study. Thus, the null result showed that the original findings did not generalize to a new type of problem and provided evidence against the initial explanation of those results.
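For reference, the symbolic version of the example proportion problem is a one-line equation, and cross-multiplying gives the answer:

```latex
\frac{6}{30} = \frac{5}{x}
\quad\Longrightarrow\quad 6x = 5 \times 30 = 150
\quad\Longrightarrow\quad x = 25
```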
Determining Necessary Conditions for Replication
A key part of science is the reproducibility of results. If a finding is not the result of chance, other groups of scientists should observe the same result when they repeat the experiment. However, many replication studies do not make it into the literature, either because they are not considered novel when they confirm a result or because they fail to support the original conclusion (and are hence null results). In physics education research (PER) especially, understanding when an effect is not observed can provide just as much information as observing the effect. The authors point to a specific case involving free-body diagrams. Prior work had shown that prompting students to draw a free-body diagram actually caused them to perform worse on a problem than not prompting them. The researchers attributed this to students tending to solve such problems with informal reasoning; prompting them to draw a free-body diagram pushed them into formal reasoning they did not understand, leading to more errors. However, a replication study failed to find this difference in performance: prompting students to draw free-body diagrams had no effect on their accuracy. The researchers who conducted the replication thought that the amount of class time spent on force problems may have been responsible. The classes in the replication study spent only two fifty-minute class periods on force problems, which may not have been enough time for students to develop informal reasoning about the topic. In this case, the null result suggests that the effect may only appear once students have developed a certain level of problem-solving skill.
Discussion and Takeaways
While null results can clearly provide useful information, not all null results should be published, as flooding journals with poor studies would drown out both the useful null results and the statistically significant ones. One approach gaining traction in psychology is reviewing pre-registered studies. In this model, researchers submit their research question, methods, and planned analysis for review. Reviewers then approve or reject the study based on the importance of the question and the quality of the design rather than on the findings, which are not part of the submission. If the study is accepted, it is published regardless of the results.
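A small simulation can show why a significance filter distorts the literature and why reviewing pre-registered studies avoids this. This is a hypothetical sketch, not an analysis from the paper: it assumes every study tests an intervention with no true effect, uses a z-test with a known standard deviation for simplicity, and the function names and parameters are illustrative. Under a significance filter, only the roughly 5% of studies that cross the threshold by chance get published, so every published result is a false positive; publishing all accepted pre-registered studies instead leaves mostly null results, which is the accurate picture.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(a, b, sigma=1.0):
    """z-test p-value for a difference in means, assuming a known standard deviation."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_literature(n_studies=2000, n_per_group=30, alpha=0.05, seed=1):
    """Hypothetical: every study tests an intervention with zero true effect."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_studies):
        # Both groups are drawn from the same distribution, so the null is true.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        p_values.append(two_sided_p(a, b))
    # Significance filter: only studies with p < alpha are published.
    published_if_significant = [p for p in p_values if p < alpha]
    # Pre-registered review: accepted studies are published regardless of results.
    published_if_registered = list(p_values)
    return published_if_significant, published_if_registered


sig_only, registered = simulate_literature()
share_null = sum(p >= 0.05 for p in registered) / len(registered)
print(f"Significance filter: {len(sig_only)} papers, all false positives")
print(f"Pre-registered review: {len(registered)} papers, {share_null:.0%} null results")
```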
So what can we take away from this paper? First, even studies that fail to find the expected effect can provide useful information: they can serve as existence proofs of new methods, show how results may fail to generalize, and help determine necessary conditions for replication. More importantly, null results can challenge existing theories and provide further insight into previous results. Finally, the authors suggest that the criterion for publication should be whether a study advances our knowledge rather than whether it finds a statistically significant result.
I am a postdoc in education data science at the University of Michigan and the founder of PERbites. I’m interested in applying data science techniques to analyze educational datasets and improve higher education for all students.