Authors: Bas Hofstra, Vivek V. Kulkarni, Sebastian Munoz-Najar Galvez, Bryan He, Dan Jurafsky, Daniel A. McFarland
First author’s institution: Stanford University
Journal: Proceedings of the National Academy of Sciences
In 2018, Donna Strickland became only the third woman awarded the Nobel Prize in Physics. She won the award for her work as a graduate student on chirped pulse amplification, which created the field of high-intensity ultrashort pulses of light beams.
More broadly, research has shown that people from groups underrepresented in their field are more likely to “connect the dots” between ideas that have not traditionally been linked. In theory, making these connections as a graduate student, as Donna Strickland did, shows that the student can do impactful and meaningful science, and hence we would expect them to go on to a successful scientific career. Yet for people underrepresented in their field, this is not the case. Despite being more innovative, they are less likely to become research faculty. This is the diversity-innovation paradox.
One possible explanation of the paradox is that work by people from underrepresented groups is discounted by their well-represented peers and hence doesn’t make the impact or receive the recognition it should. For example, the Nobel Prize in Physics has been almost entirely awarded to men, failing to recognize the major contributions of women scientists such as Lise Meitner, Chien-Shiung Wu, Vera Rubin, and Jocelyn Bell Burnell. The goal of today’s paper is to understand the diversity-innovation paradox by examining the discounting of minoritized scholars’ work on a broader scale.
To conduct their analysis, the researchers examined over 1 million PhD theses published in the U.S. between 1977 and 2015 from a variety of fields. Rather than read all of those theses, the researchers used natural language processing to extract the concepts and ideas discovered and cited in each thesis. The researchers used these citations to link newer theses to older theses, which allowed for a measure of impact. If a thesis linked two ideas that hadn’t been linked before, the thesis was considered to have conceptual novelty, with a higher conceptual novelty score meaning the thesis linked more previously unlinked ideas. For example, figure 1 is a visual depiction of Strickland’s thesis, with the dotted line showing a new connection between the ideas of “grate” and “stretch,” since her work introduced the idea of using grating-based stretchers and compressors to achieve the laser amplification.
The researchers also used the number of times other theses cited the new link as a measure of impact, assuming that more impactful ideas would be referenced more times in future theses. For Strickland’s thesis, 22 other theses in the data set reference her idea (figure 1).
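To make the two measures concrete, here is a minimal Python sketch of how novelty (new links) and impact (later reuse of those links) could be counted. This is a toy illustration, not the authors' pipeline: the function names, the four made-up theses, and the choice to treat every pair of concepts appearing in the same thesis as a "link" are all assumptions for the example.

```python
from itertools import combinations


def concept_pairs(concepts):
    """Return the unordered concept pairs that co-occur in one thesis."""
    return {tuple(sorted(pair)) for pair in combinations(set(concepts), 2)}


def novelty_and_uptake(theses):
    """Count new links per thesis and later reuse of those links.

    theses: list of (thesis_id, year, concepts) tuples (hypothetical format).
    novelty[t] = number of concept pairs first linked by thesis t.
    uptake[t]  = number of times later theses reuse a pair t introduced.
    """
    novelty = {tid: 0 for tid, _, _ in theses}
    uptake = {tid: 0 for tid, _, _ in theses}
    first_linker = {}  # concept pair -> id of the thesis that first linked it

    # Process theses oldest first; ties keep their input order (stable sort).
    for tid, _, concepts in sorted(theses, key=lambda t: t[1]):
        for pair in concept_pairs(concepts):
            if pair not in first_linker:
                first_linker[pair] = tid
                novelty[tid] += 1
            else:
                uptake[first_linker[pair]] += 1
    return novelty, uptake


# Made-up corpus, loosely inspired by the "grate"/"stretch" example.
theses = [
    ("t1", 1985, ["laser", "pulse"]),
    ("t2", 1989, ["laser", "pulse", "grate", "stretch"]),
    ("t3", 1995, ["grate", "stretch", "pulse"]),
    ("t4", 2001, ["grate", "stretch"]),
]

novelty, uptake = novelty_and_uptake(theses)
print(novelty)
print(uptake)
```

In this toy corpus, thesis t2 introduces five new pairs, and later theses reuse those pairs four times in total, so it scores high on both novelty and uptake. The paradox described in this post is the case where a thesis scores high on the first measure but low on the second.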
When looking at their results, the researchers found that people underrepresented in their field (either by gender or race) were indeed more likely to link new concepts and hence introduce innovative ideas. Conversely, the better represented a person’s gender was in their field, the more likely their work was to be taken up by others. That is, researchers from underrepresented groups are more innovative in their work (more new links in their theses), but their work has less of an impact (fewer references in future theses). This was true for every group other than white men: non-white women, white women, and non-white men all had higher rates of novelty in their theses but made less of an impact with them.
Now, why might this be the case? One possible reason that the researchers could test using their data is that innovations that link concepts from different fields make less of an impact than innovations that link concepts from the same field. For example, a thesis that linked a physics concept with an education concept would make less of an impact than a thesis that linked a physics concept to another physics concept or an education concept to another education concept. See figure 2 for examples of closely related ideas (A) and distant ideas (B).
Using word embeddings to determine how closely concepts were related, the researchers found evidence to support their claim: women were more likely than men to link ideas from different fields, and the farther apart the concepts were, the fewer times the link was referenced in future theses.
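As a rough illustration of what "distance" between concepts means here, the sketch below computes the cosine distance between word vectors. Cosine distance is one common choice for embeddings; this summary does not specify which measure the authors used, so treat it as an assumption. The three-dimensional vectors are invented for the example, whereas real embeddings are learned from text and have many more dimensions.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0 means same direction, larger means farther apart."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical embeddings, made up purely for illustration.
embeddings = {
    "laser": [0.9, 0.1, 0.0],
    "pulse": [0.8, 0.2, 0.1],
    "curriculum": [0.0, 0.2, 0.9],
}

near = cosine_distance(embeddings["laser"], embeddings["pulse"])
far = cosine_distance(embeddings["laser"], embeddings["curriculum"])

print(f"laser-pulse distance:      {near:.3f}")
print(f"laser-curriculum distance: {far:.3f}")
```

With these made-up vectors, the two physics concepts sit close together while the physics and education concepts sit far apart, which is the kind of "distant" link the researchers found was taken up less often.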
As the final part of the project, the researchers checked whether the novelty and impact of a thesis were related to its author going on to a research faculty position or another research job (industry scientist, non-tenure-track professor, etc.). Indeed, that was the case. However, there were disparities between men and women and between white and non-white researchers. Women had 5% lower odds of becoming researchers, while researchers of color had 25% lower odds.
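Because odds are not the same thing as probabilities, it can help to see what "lower odds" looks like in practice. The short sketch below converts a baseline probability to odds, scales the odds, and converts back. The 20% baseline is a hypothetical number chosen only for illustration and does not come from the paper.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Scale the odds implied by baseline_prob and return the new probability."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


baseline = 0.20  # hypothetical chance of landing a research job, not from the paper

p_5_lower = apply_odds_ratio(baseline, 0.95)   # 5% lower odds
p_25_lower = apply_odds_ratio(baseline, 0.75)  # 25% lower odds

print(f"baseline:       {baseline:.3f}")
print(f"5% lower odds:  {p_5_lower:.3f}")
print(f"25% lower odds: {p_25_lower:.3f}")
```

Under that assumed baseline, 5% lower odds works out to a probability of about 19.2% and 25% lower odds to about 15.8%. The size of the gap in probability terms depends on the baseline, which is why the result is reported as odds.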
Overall, the work suggests that people from underrepresented groups tend to be more innovative, in that they make more new connections between concepts than their well-represented peers. This means that, in addition to rectifying historical injustices, diversifying academia does appear to result in more innovative work.
The second key takeaway is that despite this work being more innovative, new ideas from underrepresented researchers make less of an impact in their field. While the researchers use their data to show that this can be a result of which ideas are linked, there is also a more general “citation gap” in which men are cited more often than women and white scholars are cited more often than scholars of color. So when writing your next paper, think about who you are citing and ensure you aren’t missing the work of underrepresented scholars. Also, when inviting scholars to speak or nominating colleagues for awards, think about whether you are perpetuating or challenging the status quo of recognizing mainly white men.
Figures used under the PNAS License.
I am a postdoc in education data science at the University of Michigan and the founder of PERbites. I’m interested in applying data science techniques to analyze educational datasets and improve higher education for all students.