Adjusting for Confounding with Text Matching

The forthcoming article "Adjusting for Confounding with Text Matching" by Margaret E. Roberts, Brandon M. Stewart, and Richard A. Nielsen is summarized by the authors below.


We propose a family of methods for conditioning on text when it confounds the relationship between a treatment and an outcome.

Does being censored once make Chinese citizens more likely to be censored online in the future?  Is there gender bias in the reception of academic scholarship?  Do jihadist preachers get more popular when they are killed in counterterrorism operations?  These are questions we have tackled in our research because we believe the answers are important to society.   

However, they are also questions that are difficult to answer.  Gold-standard experimental evidence isn’t possible on these questions because experimentation with censorship, gender bias, and terrorism is usually unethical and infeasible.  Instead, we have to use non-experimental data, but then face the challenge of confounding factors.   

To get the intuition for why non-experimental data are problematic in these settings, consider our investigation of the effects of gender on citation counts in International Relations, a subfield of Political Science. We want to know whether the author's perceived gender influences the number of citations a paper ultimately receives. Understanding this would allow us to better understand the obstacles certain groups face in academia. If we could run an experiment, we could randomly assign names to papers and then see how having a female name on a paper influenced downstream citation counts. However, we can't do this for ethical and logistical reasons. What we can do is collect academic papers in International Relations and compare the citation counts of those written by men and those written by women. But this approach is also flawed. Women tend to write about different subject matter than men, so we wouldn't be able to tell whether differences in citation counts are due to these different topics or to the author's gender, a classic case of the old adage "correlation does not equal causation."

Modern statistical methods offer principled ways to move from correlation to causation when we believe that treatment is as good as randomly assigned within groups of units with similar confounders. However, we couldn't use existing techniques because in each of these examples, the confounders were measured with text data. To make progress, we developed a new family of methods that we call "text matching."

To detect whether there is a gender citation gap, we start with the articles written by women and then use text matching to find a comparison set of articles written by men that are very similar in terms of subject matter and approach.  Comparing the downstream citation counts of these matched comparison sets gives us a plausible estimate of the causal effect of perceived gender on citations.  Consistent with previous studies, we find large negative effects of female authorship on citation count. 
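The matching step described above can be sketched in code. The following is a minimal illustration of nearest-neighbor text matching, not the TIRM procedure from the paper: each treated (female-authored) article is paired with the most textually similar control (male-authored) article by cosine similarity of bag-of-words vectors, and citation counts are compared within pairs. The documents and citation counts are invented for illustration.

```python
# Minimal nearest-neighbor text-matching sketch (illustrative only; NOT the
# authors' TIRM method). All texts and citation counts below are made up.
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match(treated, controls):
    """For each treated doc, return the index of the most similar control."""
    pairs = []
    for t_text, _ in treated:
        t_bow = Counter(t_text.split())
        sims = [cosine(t_bow, Counter(c_text.split())) for c_text, _ in controls]
        pairs.append(max(range(len(controls)), key=sims.__getitem__))
    return pairs

# Hypothetical (abstract_text, citation_count) records.
treated = [("human rights norms compliance", 12)]
controls = [("nuclear deterrence strategy", 30),
            ("human rights treaty compliance", 25)]

idx = match(treated, controls)
# Within-pair differences in citations estimate the effect of interest.
effects = [treated[i][1] - controls[j][1] for i, j in enumerate(idx)]
print(idx, effects)  # the topically similar article is matched, not the dissimilar one
```

A real analysis would match on richer text representations (the paper's TIRM combines topic proportions with a treatment propensity), but the pair-then-compare logic is the same.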

Text matching is a family of statistical tools, not a single technique. This paper describes our favorite approach, which we have given the unwieldy name of Topical Inverse Regression Matching, or TIRM for short. However, there are now other options to choose from. Since we first proposed text matching in a 2015 working paper, other research groups have suggested alternative techniques that might be better for some research questions. We urge readers of this paper to also learn about the exciting developments in the growing literature.

About the Authors: Margaret E. Roberts is an Associate Professor in the Department of Political Science at the University of California, San Diego; Brandon M. Stewart is an Assistant Professor and Arthur H. Scribner Bicentennial Preceptor in the Department of Sociology at Princeton University; and Richard A. Nielsen is an Associate Professor in the Department of Political Science at the Massachusetts Institute of Technology. Their research "Adjusting for Confounding with Text Matching" is now available in Early View and will appear in a forthcoming issue of the American Journal of Political Science.

The American Journal of Political Science (AJPS) is the flagship journal of the Midwest Political Science Association and is published by Wiley.
