Suggestions for picking pseudonyms for research participants

Nick Young

4 years ago

Title: “*Pseudonyms Are Used Throughout”: A Footnote, Unpacked

Author: Janet Heaton

Author’s institution: University of the Highlands and Islands

Journal: Qualitative Inquiry (2021) [closed access]

The first time I did a qualitative research project, I was a first-year graduate student. I didn’t know much about qualitative research at the time, but I knew that I needed to come up with some way to refer to my participants without disclosing their names. I wasn’t given much guidance on how to do that because it’s something researchers just do. Or so I thought.

While changing the names of the participants might seem like a trivial part of the research process, how best to do it remains an open question. The goal of changing the names is to make sure that participants cannot be identified, but at the same time, someone’s identity provides crucial context for their lived experiences. Thus, there is a tension between giving enough information about the participants to properly contextualize the data and not giving so much that participant confidentiality might be compromised.

Unfortunately, there isn’t a simple, general answer that will work in all situations. However, the author of today’s paper provides some suggestions and recommendations for how to handle the tradeoff of context and confidentiality.

How can I hide participants’ names?

Today’s paper addresses three main methods to do so. First, and probably one of the most common in PER, is to use pseudonyms. Here, the participant’s name is replaced by a different name, which can be chosen at random, alphabetically, by the participant, or from well-known fictional characters (e.g. Shakespearian or, in my case, science fiction).
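As a minimal sketch of the "at random" option, the snippet below draws a distinct pseudonym for each participant from a pool of names. The function name `assign_pseudonyms`, the example participants, and the science-fiction name pool are all hypothetical and are not taken from the paper; the check that drops pool names matching a real participant's name is likewise an assumption about what a careful researcher might want.

```python
import random


def assign_pseudonyms(participants, name_pool, seed=None):
    """Return a dict mapping each real name to a distinct pseudonym.

    Hypothetical helper for illustration only, not a method from the paper.
    """
    # Remove duplicate participants while keeping their original order.
    real_names = list(dict.fromkeys(participants))
    # Never hand out a pseudonym that is also a real participant's name.
    candidates = [name for name in name_pool if name not in real_names]
    if len(candidates) < len(real_names):
        raise ValueError("name pool is too small for this many participants")
    # A seeded generator makes the assignment reproducible for the research team.
    rng = random.Random(seed)
    chosen = rng.sample(candidates, k=len(real_names))
    return dict(zip(real_names, chosen))


# Made-up participants and a made-up science-fiction name pool.
mapping = assign_pseudonyms(
    ["Ana", "Ben", "Cho"],
    ["Ripley", "Deckard", "Ana", "Leia", "Spock"],
    seed=1,
)
print(mapping)
```

The mapping itself links real names to pseudonyms, so it would need to be stored as securely as the original data.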

Alternatively, researchers can use an epithet, a descriptive term for the person or place. For example, in a study of the dynamics of a research lab, a researcher could refer to the lab’s principal investigator as P.I. or professor rather than use their name.

Finally, researchers can use codenames to represent participants. Unlike pseudonyms, codenames don’t have to be names in the traditional sense, but rather can be index terms such as ID04 or subject 2. A strength of codenames is that they can be combined or defined to encode certain information easily. However, they can also get complicated quickly, making it hard for a reader to follow. For example, if the P.I. is a 43-year-old Asian woman, her codename could be PI/43/A/w.
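To make the composite codename concrete, here is a small sketch that builds one by joining selected attributes with slashes, reproducing the PI/43/A/w example. The `Participant` class, its field names, and the `make_codename` helper are hypothetical names chosen for illustration, not something the paper prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Hypothetical fields; a real study would choose its own attributes.
    role: str
    age: int
    ethnicity: str
    gender: str


def make_codename(participant, fields=("role", "age", "ethnicity", "gender")):
    """Join the selected attributes with '/' to form a codename."""
    return "/".join(str(getattr(participant, field)) for field in fields)


pi = Participant(role="PI", age=43, ethnicity="A", gender="w")
print(make_codename(pi))                            # PI/43/A/w
print(make_codename(pi, fields=("role", "gender")))  # PI/w
```

Choosing fewer fields keeps the codename easier to read and discloses less about the participant, at the cost of giving the reader less context.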

How much information about the participant’s identity should I include?

While all of these methods are useful for protecting confidentiality, they have varying degrees of utility when it comes to providing context and respecting participants’ identities. After all, a person’s name can hold deep personal, social, and symbolic meaning as well as convey information about a person’s ethnicity, age, gender, religion, etc. For example, consider what information about a participant might be assumed from the names Helga, José, Clarence, and La’Tonia.

When it comes to pseudonyms, a researcher must decide whether to pick identifiers that match on some characteristics or avoid the issue altogether and use generic names. Likewise, when it comes to epithets, researchers must decide which characteristics are important and which are not. At the same time, the researchers must be aware of how the label might reflect on the participant. Using “low-income” or “disabled” to describe a student comes with a different set of assumptions than referring to them as “female, 22”.

One possible solution would be to allow participants to pick their own method of referring to themselves. However, that doesn’t necessarily spare the researcher the issues that come with picking an identifier themselves. For example, the article notes that participants might choose ambiguous identifiers such as “Super woman,” which could be interpreted as showing a positive self-image or as having too many tasks to balance at once.

Another possible solution is to use codenames because relevant information can be explicitly mentioned. However, doing so can de-personalize the participant, reducing them to a number or code.

What about me, the researcher?

In addition to accurately capturing the identities of participants, a researcher should also consider how to capture the relationship between themselves and the participants. Data aren’t obtained in a vacuum and, in many cases, result from a series of interactions with a researcher that can influence the results.

Consider, for example, the relationship implied by using “Sam”, “Samantha”, and “Dr. Jones.” In the first case, the implied relationship is more informal and casual, while the latter two names reflect a more formal relationship that might not be accurate if the participant and the researcher had been interacting for many months during the study.

Alternatively, using an informal name might be seen as disrespectful, especially if the participant’s title is neglected or there exists an unbalanced power dynamic between the researcher and participant.

In the case of common names like “Sam,” repeated use of the name can lead the participant to be seen as a Jane Doe or John Smith (or in physics-speak, an Alice or a Bob) rather than as an individual.

Concerns such as these aren’t only limited to the participant, however. How to refer to the researcher is just as important.

In many cases, the interviewer is simply referred to as “interviewer,” without reference to who is doing the interviewing. Yet doing so ignores potentially useful information about the dynamic between the researcher and participant. Previous work has documented cases where the identity of the interviewer affected what the participant was comfortable sharing.

In addition, moving away from using an ambiguous “interviewer” can also make it clear whether there was a single interviewer or a team. In this sense, the use of the interviewers’ names (or other identifiers) humanizes the interviewers rather than depicting them as a single background observer.

When should I worry about all this?

In addition to thinking about the strategy for preserving participants’ identities while maintaining confidentiality, it is also important to think about when the change of names should happen. Should identities be changed before data collection, after data collection, after analysis, or only in passages that appear in publications?

As you might have guessed, the answer is, again, that it depends. Changing identities later in the research process means that more people are likely to know the true identity of the participant, while changing them earlier might mean that important information or opportunities for building rapport are lost. For example, if people and places are referred to by pseudonyms during the interview, the participant might not feel that they are allowed to share as much personal information, while if pseudonyms are introduced before analysis, a researcher could misinterpret a quote due to a lack of context.

What am I supposed to do then?

Rather than give a specific recommendation on how to anonymize participants, the author of today’s paper suggests that researchers follow a two-step process.

First, researchers should talk with the participants in their study about their preferences and concerns for maintaining privacy while avoiding erasing their identities.

Second, researchers should include information about the process in their publications. As the title points out, this information is often relegated to a footnote saying it happened without any details. Instead, researchers could include how they de-identified data, what choices they made, and how those choices might strengthen or weaken the conclusions of their research.

Nick Young

I am a postdoc in education data science at the University of Michigan and the founder of PERbites. I’m interested in applying data science techniques to analyze educational datasets and improve higher education for all students.