Retrieval-based practice and peer instruction in introductory Physics

Title: Comparing retrieval-based practice and peer instruction in physics learning

Authors: Tianlong Zu, Jeremy Munsell, and N. Sanjay Rebello

First Author’s Institution: Purdue University

Journal: Physical Review Physics Education Research, 15 010105 (2019)

Education researchers have studied several interventions with the aim of improving student learning. These range from activities for individual students to systems for managing students’ paths through school. In today’s paper, the authors compare two well-tested interventions, namely retrieval practice and peer instruction, in the context of introductory physics for would-be elementary school teachers. Specifically, the authors conduct a clinical experiment to determine whether students learning introductory physics via the two techniques differ in performance on different types of tasks as well as in their judgment of their own learning.

Retrieval practice and Peer Instruction

Retrieval practice (also known as the testing effect) is the practice of deliberately retrieving information from memory. There are many ways of doing this: taking a quiz after studying a chapter from a textbook, closing books and notes and trying to recall as much about a topic as possible, using mind-mapping to recall various aspects of a topic, using flash cards to recall meanings of words or mathematical theorems, and so on. One important finding is that retrieval practice outperforms repeated studying. That is, participants who use a retrieval strategy to solidify their understanding of material can recall more information accurately than those who repeatedly re-read the material. Retrieval practice has been shown to help not just with remembering previously seen information but also with some transfer tasks, where participants have to infer information that was not directly present in the study materials. That is, retrieval practice seems to support both the durability and the flexibility of memory(3).

Peer Instruction is a simple but effective collaborative learning technique that was developed by Eric Mazur and colleagues. In the classic version of peer instruction, the instructor presents some concept and then asks a question related to the concept. Students are encouraged to think for themselves and to note down their answer. Students are then asked to discuss the topic with their peers, making any necessary changes to their answer. Finally each student gives an answer, usually via a classroom response system such as clickers, and the instructor carries out a discussion in class about the student responses. This simple strategy is surprisingly effective at improving student understanding. For example, students taught using peer instruction tend to outperform students taught using traditional lecture on the Force Concept Inventory.

Goals of this paper

The authors of this paper suggest that, even though retrieval practice has a rich tradition in educational psychology, it has rarely been tested with the kind of materials that are used in physics classrooms. Similarly peer instruction has been well tested against traditional lectures, but it needs to be compared with retrieval practice. In this study, the authors want to compare retrieval practice with peer instruction in the context of introductory physics via performance on different types of tasks as well as on students’ judgment of their own learning.

Before testing students, we have to decide what to test. Do we test on the same material that the students used while learning or should we test whether students can transfer their learning to situations they haven’t seen? The transfer of learning to novel situations is a very important topic in educational psychology(1). After all, of what use is learning if we can only apply it in situations that are identical to the learning situation? Keeping this important aim of education in mind, educational psychologists classify tasks used to test learning into two broad categories: near transfer tasks and far transfer tasks. In this context, the tasks that students use to learn a topic can be called learning tasks. Speaking in broad terms, near transfer tasks can be solved using the concepts learned and are similar to the learning task, whereas far transfer tasks can be solved using the concepts learned but are very different from the learning task. The catch is that the terms near transfer and far transfer can be defined in many ways and there are no universal definitions of them(1). Each study has to carefully explain why it considers certain tasks to be near and others to be far in terms of transfer of learning. The authors of this paper have taken good care to define what exactly they mean by these terms.

The authors presented students with two concepts, the definition of speed and the concept of conservation of energy, via animated videos(4); if you are interested in seeing what the students viewed, see reference 4. The authors use motion along a horizontal straight line to illustrate the definition of speed. Figure 1 is a screenshot from the video explaining speed and distance in the context of straight line motion. They use the case of a person skating along a path similar to a roller coaster (figure 2) as well as an analogy using two buckets (figure 3) to illustrate conservation of energy and how energy gets partitioned into kinetic and potential energies.

Figure 1: Screenshot from video describing the definition of speed.
Figure 2: Screenshot from video describing conservation of energy.
Figure 3: Screenshot from video describing conservation of energy.

As explained in more detail in the next section, the student groups are divided into two groups: a retrieval practice group and a peer instruction group. After watching the videos, the student groups in the peer instruction group restudied slides from the videos as a group activity and discussed the contents of the videos among themselves. Student groups in the retrieval practice group first watched the videos and then worked, each student on their own, on two tasks that closely resemble the way the material was presented in the videos. These are illustrated in figure 4 for speed and in figure 5 for energy.

Figure 4: Learning task associated with definition of speed.
Figure 5: Learning task associated with conservation of energy.

Thus the videos and the group work represent the learning tasks for the peer instruction group, whereas the videos and the two tasks illustrated in figure 4 and figure 5 form the learning tasks for the retrieval practice group.

While figures 1-5 illustrate the conditions of learning, figure 6 and figure 7 show the near and far transfer tasks used to test students’ understanding of the definition of speed and conservation of energy, respectively. From the figures it is clear what the authors mean by near transfer and far transfer in the context of this study.

Figure 6: near and far transfer tasks associated with the definition of speed.
Figure 7: near and far transfer tasks associated with the concept of energy conservation.

Having carefully defined what they mean by learning task and by near and far transfer, the authors aim to compare how students learning via retrieval practice and students learning via peer instruction differ in performance on the initial task, the near transfer task and the far transfer task. An initial task is a test task that is very similar to the learning task.

In addition to checking the difference in performance, the authors also want to check whether the two groups differ in how well they think they have learned. A learner’s Judgment Of Learning (JOL) is their confidence in how well they have learned some material. Many studies have found that learners’ judgments of learning can be very different from their actual level of understanding. For example, learners who engage in repeated reading tend to have higher confidence that they have learned the material well when compared to those using retrieval practice. But when their knowledge is tested, students using retrieval practice tend to outperform those who engage in repeated reading(2). In this study the authors wanted to see if there is any such relationship between the judgments of learning of students learning via retrieval practice and those learning via peer instruction.

The Experiment

Sixty-eight students enrolled in an introductory physics class for future elementary school teachers participated in the study and received course credit for participating. Most of the students did not have previous physics coursework. Students were seated in groups of 3-4 and carried out tasks on individual computers at a single table. The groups were formed at the beginning of the semester and lasted the whole semester, so the students were familiar with their fellow group members.

The student groups were divided into two groups: a retrieval practice group and a peer instruction group. Note that the students are first divided into groups of 3-4, all seated at a table, and then each group engages in either peer instruction or retrieval practice. In retrieval practice the students at each table work on their own. In peer instruction the students at a table engage in group discussion. The tasks each group carried out are illustrated in figure 8 and are described below.

Figure 8: The Experiment

Both groups first watched two videos — one on the definition of speed and the other on the conservation of energy. Then the two groups filled out a judgment of learning survey that asked them to rate how well they think they know the material they just watched.

The next task is different for the two groups. The peer instruction group were given slides with information contained in the videos and asked to discuss the topics among themselves, one topic after the other. The authors do not report on whether they checked how well these discussions were being carried out. The retrieval practice group were presented with opportunities for retrieval which the students carried out individually. For each topic they were presented with a problem that was very similar to that in the video, and then asked to select the principle that applies in that situation from a list of choices. The students then had to write down the definition of the relevant principle, irrespective of their choice. Finally they had to solve the problem by applying the principle to the scenario. See figure 4 and figure 5 for illustrations of these activities. By introducing these three steps the authors give students the opportunity to practice retrieval through the means of recognition, recall and execution. In their taxonomy of transfer learning Barnett and Ceci present these as important aspects of successful transfer of learning(1).

The remaining tasks are the same for both groups. Both groups completed another judgment of learning survey and took a test (“immediate final test”). Both groups returned after 1 week and completed another judgment of learning survey followed by a test (“delayed final test”). Each test had three components: an “initial task” (IN) that was closely related to the scenario presented in the videos, a near transfer task (NT) that had a different context but the same representation, and a far transfer task (FT) that was different in both context and representation. The authors also measured the working memory capacity (WMC) of the participants as the final step in the experiment; we do not discuss this aspect of the study.

To summarize, there are 2 tests: the immediate final test and the delayed final test (1 week delay). Each test has three tasks: Initial Task (IN), Near Transfer Task (NT), and Far Transfer Task (FT). Three judgment of learning measurements were made: JOL1 after watching the instructional videos, JOL2 immediately before the “immediate final test” and JOL3 immediately before the “delayed final test”. The two main aims of this study are to see if there are any differences between the peer instruction group and the retrieval practice group in 1) the performance on the three types of tasks across the two tests and 2) the judgment of learning.
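The design summarized above can be laid out as a small Python sketch. This is only an illustration of the structure described in this post; the variable names are our own and do not come from the paper.

```python
from itertools import product

# Names below are illustrative labels chosen for this sketch.
GROUPS = ("retrieval practice", "peer instruction")
TESTS = ("immediate", "delayed")   # the delayed test is taken 1 week later
TASKS = ("IN", "NT", "FT")         # initial, near transfer, far transfer

JOL_SURVEYS = {
    "JOL1": "after watching the instructional videos",
    "JOL2": "immediately before the immediate final test",
    "JOL3": "immediately before the delayed final test",
}

# Every task type appears on every test, giving the criterion tasks.
criterion_tasks = [f"{test} {task}" for test, task in product(TESTS, TASKS)]

# Each criterion task and each JOL survey is compared between the two groups.
measures = criterion_tasks + list(JOL_SURVEYS)

print(f"{len(criterion_tasks)} criterion tasks and {len(JOL_SURVEYS)} JOL surveys,")
print(f"each compared between: {' vs. '.join(GROUPS)}")
```

Counting this way gives six criterion tasks (three per test) and three judgment of learning surveys, each of which is a point of comparison between the two groups.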

Results and discussion

The results are summarized in the figure below, which was created using data in table 1 of the paper.

Figure 9: Data from table 1 of Zu et al. 2019. The y-axis denotes the task and the test. “Delayed” against a task means the task is from the 1 week delayed test. Tasks without “Delayed” against them are from the immediate test. IN stands for initial task, NT stands for near transfer task and FT stands for far transfer task. The points come in pairs. In each pair the one on the top is for the peer instruction group and the one on the bottom is for the retrieval practice group. The line passing through each point denotes the standard deviation.

From visual inspection of figure 9 above, we can see that there is a large scatter in the performance of each group and that the group averages are quite close to each other. Are the two groups different from each other? The authors performed significance testing and found that on the Delayed IN task (initial task on the delayed final test), the mean score for the retrieval practice group is larger than that for the peer instruction group at a statistically significant level (with α=0.05 and p=0.049). The authors also report that for FT (far transfer task on the immediate test) the difference between retrieval practice and peer instruction falls just short of statistical significance, whereas for the remaining 4 criterion tasks, the differences between the mean performance of the two groups are well within the uncertainties.
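To make the idea of comparing two group means against a significance level concrete, here is a minimal sketch of one such comparison. It uses a permutation test, chosen here only because it needs nothing beyond the Python standard library; this post does not say which test the authors used, so this should not be read as their method. The scores are hypothetical numbers invented for illustration and are not data from the paper.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, estimated p-value), where the p-value is
    the fraction of random relabelings whose absolute difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point round-off.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# HYPOTHETICAL scores (fraction correct), invented for this sketch.
# They are NOT the data of Zu et al. (2019).
retrieval_scores = [0.8, 0.6, 0.9, 0.7, 0.5, 0.8, 0.6, 0.7]
peer_scores = [0.6, 0.5, 0.7, 0.4, 0.6, 0.5, 0.7, 0.3]

ALPHA = 0.05  # significance level quoted in the text
diff, p = permutation_test(retrieval_scores, peer_scores)
print(f"difference in means = {diff:.4f}, p = {p:.3f}")
print("significant" if p < ALPHA else "not significant", f"at alpha = {ALPHA}")
```

A result such as p=0.049 sits just below the α=0.05 threshold, which is why a difference can count as statistically significant even when the spread within each group is large compared with the gap between the group means.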

How does this compare to prior research?

Based on the above results, the authors make the claim that, consistent with prior research, retrieval practice performs better than restudying. Here the authors equate restudying with group discussion of the topic that the students in the peer instruction group perform within each group at a table. Since in this study retrieval practice has been compared with peer instruction, which has been shown to be effective in classrooms, and not just an individual studying on their own, the authors suggest that the effect can be considered to be stronger than in prior research.

In typical studies on the effect of retrieval practice, performance on an immediate test is better for restudying than for retrieval practice, while performance on a delayed test is better for retrieval practice than for restudying. In this study, however, performance on the immediate test (IN) is not statistically different between the two groups, whereas on the delayed test (Delayed IN) retrieval practice outperforms restudying, as in prior studies. The authors note that this difference could be because in typical experiments restudying is simply recalling information whereas in this study it is more organized.

What about Judgment of Learning?

The judgment of learning results are summarized in figure 10 below that reproduces table 3 from the paper. Consistent with prior research, delayed judgment of learning is lower than immediate judgment of learning: the average 3rd judgment of learning (from the delayed test) was significantly lower than the average of the 1st judgment of learning (just after watching video). Also consistent with prior research, restudying inflates judgment of learning compared to retrieval: the average of 1st and 2nd judgment of learning was significantly higher for the peer instruction condition. The higher confidence in their abilities by participants who restudy could be because restudying makes the material seem familiar. But in later tests this apparent familiarity doesn’t translate to better performance(2). Delayed judgment of learning is similar for both groups, perhaps because students are aware that they could have forgotten some of the material they learned during the past week.

Figure 10: Results from judgment of learning surveys.


This paper is an important contribution to physics education research since it is one of the few attempts to study retrieval practice in physics contexts, and in particular to compare retrieval practice with peer instruction. Typically, instructional techniques are compared against “traditional lecture” but not against each other. This paper will hopefully inspire more work on comparing research-validated instructional techniques with each other and not just with traditional lecture.

As the authors point out, individual studying as a control condition would be a useful comparison to make with retrieval practice and peer instruction. That is, it is important to know whether or not individual studying would produce results similar to what we see here. This study was carried out in a clinical manner and more research in real classrooms would be beneficial. The authors also make the important point that instruction combining retrieval practice and peer instruction should be compared against each of them on its own: the authors are not advocating one method over the other.

There are two broad questions we can ask of this study. The peer instruction intervention used in this study seems to be very different from the way peer instruction is usually described. The authors acknowledge this point and mention that their use is very similar to the way peer instruction has been used in two other studies. Would a peer instruction intervention more in line with the classic descriptions make any difference?

Another question we can ask is whether peer instruction itself includes elements of retrieval. Staying within the particular implementation of peer instruction used in this paper, the students presumably had to recall the definitions of the topics as they discussed them with their group members. Within classic peer instruction, opportunities for recall are abundant. Discussing a topic is itself a form of retrieval (see, for example, Retrieval Practice in the Classroom: Is Asking Questions Enough?). So is this experiment testing retrieval practice against peer instruction, or is it testing two versions of retrieval?

More extensive implementations of the study, in both clinical and real classroom settings, can shed more light on these issues.


Figures used under Creative Commons Attribution 4.0 International. Header image used under Creative Commons Attribution 2.0 Generic from Flickr user Waifer X.
