Post by Michael J. Hanmer (Associate Professor & Director of Graduate Studies, Department of Government and Politics, University of Maryland) and Kerem Ozan Kalkan (Assistant Professor, Department of Government, Eastern Kentucky University) regarding: “Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models” by Michael J. Hanmer and Kerem Ozan Kalkan
Political scientists increasingly use sophisticated statistical models to test their theories about politics and policy. These models are often applied to data that were expensive to collect, because the data generally aim to represent important populations, such as voting-eligible citizens, or the world’s democracies during a particular time frame.
Translating the results of these statistical models into meaningful quantities that researchers and readers can understand is not always straightforward. The task becomes more important when the research has implications for action by policy makers, other government officials, or other stakeholders. Researchers often must choose how to present the results and then engage in an additional stage of analysis to generate the quantities that illuminate the political process in question. This is particularly true for researchers who study political outcomes that produce only a limited number of possible responses.
For example, to study the factors that influence voter turnout, researchers would generally use data on the behavior of a large number of individual citizens and, for each individual in the data set, essentially ask: Did the individual vote or not? The researcher here observes one of just two outcomes: yes, the individual voted, or no, the individual did not. By studying the process in this way, the researcher can learn about the extent to which demographic factors (e.g., education, race), political factors (e.g., partisan strength, campaign mobilization), and contextual factors (e.g., restrictions on absentee voting, voter identification requirements) influence the probability that an individual will vote. A number of important political processes produce just one of two possible outcomes and start with questions such as: Did a legislator vote yes or no on a bill? Is a country engaged in a civil war or not? Did a terrorist group strike this year or not?
Other commonly studied issues that require advanced statistical models produce outcomes that count the number of times some politically relevant event occurred. For example, researchers trying to understand the factors that influence terror attacks might collect data on the number of terror attacks each year across countries for 25 years and develop a statistical model to predict the number of terror attacks that took place in a given year. Here, for each country in a given year, the main question is: How many terror attacks took place? Other questions that call for similar statistical models include: How many people died in a war? How many bills did a legislator sponsor?
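Count outcomes like these are commonly handled with models such as Poisson regression, in which the expected count is the exponential of a linear combination of predictors. The following minimal sketch illustrates the idea; the coefficients, the "democracy" predictor, and the sample are all made up for illustration and are not drawn from any actual study.

```python
import numpy as np

# Hypothetical Poisson-regression setup: the expected count of attacks in a
# country-year is exp(X @ beta). All numbers here are invented for illustration.
rng = np.random.default_rng(1)

beta = np.array([0.5, -0.8])          # intercept, democracy dummy (hypothetical)
n = 25                                 # e.g., 25 country-years
democracy = rng.integers(0, 2, size=n) # 0/1 indicator
X = np.column_stack([np.ones(n), democracy])

# Expected number of attacks per country-year under the hypothetical model
expected_attacks = np.exp(X @ beta)
print(expected_attacks[:5])
```

Because the link function is the exponential, the expected count is always positive, which is one reason this family of models suits count data better than ordinary linear regression.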
In addition to studying politics, many political scientists study ways to improve the methods researchers use to do so, often as a direct result of trying to better understand an important political science or policy question. In “Behind the Curve,” we examine the approaches used to translate the results from an initial statistical model of some political process that produces limited outcomes (like those mentioned above) into meaningful quantities. We find that textbook advice on this issue is lacking and that the most common practice is inefficient and less effective at testing the full scope of one’s theories about politics. We advocate instead for an approach that makes more and better use of the data and thus provides direct tests of researchers’ hypotheses, allowing one to draw conclusions that can be generalized to the population of interest. The last point is crucial for studies that bear on public policy or other action.

An example will help illustrate the point. In a study of voter turnout, researchers might show the effect of changing voter identification laws on what they describe as the “average case,” a composite individual created by taking the average value of each variable used in the model. Doing this in a typical U.S. survey would have the researcher examining the effect of these laws on a 48-year-old white woman who is politically independent and has an income of $40-$45k. This is severely limiting, because the effect for this individual under the standard models would not be representative of the average effect in the population or in other important subgroups. Instead, we argue that the effect of the law should be evaluated for each individual and then appropriately summarized, perhaps by taking the mean across the sample or by comparing the mean for whites to the mean for Blacks. From a policy-making perspective, only the latter approach gives confidence regarding the likely outcomes of the policy.
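The contrast can be made concrete with a small sketch. Because a logit (or probit) model is nonlinear, the predicted probability computed at the average of the covariates is generally not the same as the average of the predicted probabilities computed for each individual. The coefficients, predictors, and simulated sample below are hypothetical stand-ins, not the authors’ actual data or model.

```python
import numpy as np

# Hypothetical logit model of turnout: intercept, years of education,
# and a strong-partisan dummy. Coefficients are invented for illustration.
rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

beta = np.array([-4.0, 0.25, 1.5])

# Simulated sample of 1,000 citizens (purely illustrative)
n = 1000
education = rng.integers(8, 21, size=n)        # years of schooling, 8..20
strong_partisan = rng.integers(0, 2, size=n)   # 0/1 indicator
X = np.column_stack([np.ones(n), education, strong_partisan])

# "Average case" approach: plug the mean of every covariate into the model
p_average_case = logistic(X.mean(axis=0) @ beta)

# "Observed values" approach: predict for every individual, then average
p_observed_values = logistic(X @ beta).mean()

print(f"average case:    {p_average_case:.3f}")
print(f"observed values: {p_observed_values:.3f}")
```

The same logic extends to subgroup comparisons: rather than one composite individual, the individual-level predictions can be averaged within each subgroup of interest (e.g., by race) before comparing them.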