Teaching and Measuring: How Empirical Research Made Me Rethink Exam Design
- Dr. Roee Sarel

I am often asked how my teaching informs my research and vice versa. The intersections are more frequent than one might expect, but let me share one practice I have recently adopted that I find particularly valuable.
Exams as Sampling Instruments
In my course on Empirical Legal Studies, I teach students about sampling—the idea that we measure a phenomenon by drawing a random sample from a population rather than observing the entire population. The rationale is straightforward: surveying every member of a population is typically too costly or logistically impractical.
I have been thinking about what this logic implies for exam design. As professors, we write questions intended to evaluate a student's knowledge. If we think of that knowledge as the "population," then each exam question is, in effect, a sample drawn from the student's understanding—sometimes quasi-random, sometimes not. This framing raises an important question: how many questions should an exam contain? One? Two? Fifteen?
The answer, I believe, follows directly from sampling theory. Our goal should be to ask questions for which the quality of answers is not strongly correlated—that is, questions that capture independent dimensions of a student's understanding. For example, if I teach students about (i) the Coase Theorem and (ii) efficient breach, I should only ask about both topics if I believe they measure something meaningfully different. If every student who understands the Coase Theorem also understands efficient breach, there is little value in asking about both—doing so would amount to sampling the same construct twice.
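To make this concrete, here is a small simulation, with entirely made-up data rather than actual exam scores, of what redundant versus independent questions look like statistically: two questions driven by the same underlying understanding correlate strongly, while a question tapping a separate skill does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n_students = 200

# Latent "understanding" of each student (a hypothetical construct)
ability = rng.normal(0, 1, n_students)

# Two questions that both load on the same construct:
# each score is the latent ability plus a little question-specific noise.
q_coase = ability + rng.normal(0, 0.3, n_students)
q_breach = ability + rng.normal(0, 0.3, n_students)

# A third question driven mostly by an independent skill.
other_skill = rng.normal(0, 1, n_students)
q_other = other_skill + rng.normal(0, 0.3, n_students)

# Redundant pair: strongly correlated. Independent pair: near zero.
print(np.corrcoef(q_coase, q_breach)[0, 1])
print(np.corrcoef(q_coase, q_other)[0, 1])
```

In this stylized setup, asking both the Coase Theorem and efficient breach questions would add little information, whereas the third question genuinely samples a different dimension.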
Using Statistical Tools to Evaluate Exam Questions
This is where research methodology informs teaching practice. To determine whether my exam questions are capturing distinct aspects of student knowledge, I use statistical software to examine the correlations between question scores.
Consider a recent exam in a Law & Economics course, where I posed three questions. The first addressed supply and demand: students were given a scenario involving an external event (such as a tax) and asked to analyze its effects on equilibrium price, equilibrium quantity, and consumer surplus. The second tested game theory: students were asked to construct a payoff table, solve for the Nash Equilibrium, and discuss what changes when the game is transformed from a static to a dynamic setting. The third question asked students to identify a market failure and briefly discuss how a behavioral Law & Economics intervention might address it.
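As an aside, the game-theory task in the second question lends itself to a computational check. The sketch below, using hypothetical prisoner's-dilemma payoffs rather than the exam's actual numbers, finds the pure-strategy Nash equilibria of a 2x2 static game by testing mutual best responses:

```python
# Hypothetical 2x2 payoff tables (row player, column player);
# action 0 = cooperate, action 1 = defect, as in a prisoner's dilemma.
row_payoff = [[3, 0],
              [5, 1]]
col_payoff = [[3, 5],
              [0, 1]]

def pure_nash(row, col):
    """Return all pure-strategy Nash equilibria as (row_action, col_action)."""
    equilibria = []
    for r in range(2):
        for c in range(2):
            # r must be a best response to c, and c a best response to r.
            row_best = row[r][c] >= max(row[other][c] for other in range(2))
            col_best = col[r][c] >= max(col[r][other] for other in range(2))
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria

print(pure_nash(row_payoff, col_payoff))  # -> [(1, 1)]: mutual defection
```

The dynamic (sequential) variant students are asked to discuss would require backward induction instead, but the best-response logic is the same building block.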
After grading the exams according to a standardized rubric, I computed the inter-question correlation matrix (excluding one outlier who produced largely nonsensical responses and received near-zero marks overall).
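For readers curious about the mechanics, the computation is straightforward in any statistical package. A minimal Python sketch, with invented rubric scores and an illustrative outlier filter rather than the real data, might look like this:

```python
import pandas as pd

# Hypothetical rubric scores for six students on the three questions
# (column names and values are illustrative, not the actual exam data).
scores = pd.DataFrame({
    "q1_supply_demand":  [18, 12, 15, 9, 20, 1],
    "q2_game_theory":    [14, 10, 16, 8, 19, 0],
    "q3_market_failure": [13, 11, 17, 9, 18, 2],
})

# Drop outliers before computing correlations, e.g. near-zero totals.
clean = scores[scores.sum(axis=1) > 10]

corr = clean.corr()  # pairwise Pearson correlations between questions
print(corr.round(2))
```

The resulting 3x3 matrix is symmetric with ones on the diagonal; the off-diagonal entries are what tell you whether two questions are sampling the same construct.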

The results are instructive. Question 1 was not significantly correlated with Questions 2 or 3—exactly the pattern one hopes to see, as it indicates that the first question captures a distinct dimension of understanding. Questions 2 and 3, by contrast, showed a statistically significant correlation (r = 0.42, p < 0.05). This suggests some overlap, but given that the correlation is moderate rather than strong, the third question likely contributes meaningful additional explanatory power.
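The pairwise significance test behind such a statement can be reproduced with `scipy.stats.pearsonr`, which returns both the correlation coefficient and its p-value. The scores below are invented for illustration:

```python
from scipy.stats import pearsonr

# Hypothetical rubric scores on Questions 2 and 3
# (illustrative numbers only, not the actual exam data).
q2 = [14, 10, 16, 8, 19, 12, 15, 11, 17, 9, 13, 18, 7, 16, 12]
q3 = [13, 11, 17, 9, 18, 10, 14, 12, 15, 8, 14, 16, 9, 15, 11]

# Pearson correlation and two-sided p-value for the null of no correlation
r, p = pearsonr(q2, q3)
print(f"r = {r:.2f}, p = {p:.4f}")
```

A p-value below 0.05 indicates a statistically significant correlation; the magnitude of r then tells you how much overlap there is between the two questions.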
Scatter plots further illustrate these relationships (below, as an example, is the scatter of Question 2 against Question 3):

Why This Matters
I am motivated to refine this approach for at least two reasons. First, there is the practical consideration of efficiency: grading exams is time-intensive, and if a question does not generate additional information about a student's ability, the effort spent drafting and evaluating it is poorly allocated. Second, and more importantly, if all questions on an exam effectively measure the same underlying construct, I risk penalizing capable students who happen to struggle with one particular topic while performing well across others.
My hope is that, over time, applying empirical research methods to my own teaching practice will help me design more effective assessments—and ultimately become a better educator.


