The forthcoming article “Using cross-encoders to measure the similarity of short texts in political science” by Gechun Lin is summarized by the author below.
The most commonly used methods in political science struggle to identify when two texts convey the same meaning because they rely too heavily on identifying words that appear in both documents. This issue is especially salient when the underlying documents are short, an increasingly prevalent form of textual data in modern political research. To address this limitation of current methods, I introduce a state-of-the-art transformer model, the cross-encoder, which uses a pair-embedding technique that considers the context of both snippets to achieve better estimates of semantic similarity for short texts, such as news headlines and Facebook posts.
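The word-overlap problem can be made concrete with a minimal sketch. The two headline pairs below are hypothetical and are not drawn from the article's data, and the Jaccard and bag-of-words cosine functions stand in for word-based measures in general rather than reproducing any specific baseline from the article.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard(a, b):
    """Share of distinct words that the two texts have in common."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def bow_cosine(a, b):
    """Cosine similarity between raw word-count vectors."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical headlines, invented for illustration only.
paraphrase = ("Justices strike down the statute", "Court overturns law")
lookalike = ("Court upholds the law", "Court overturns the law")

for label, (a, b) in [("paraphrase", paraphrase), ("lookalike", lookalike)]:
    print(f"{label}: jaccard={jaccard(a, b):.2f}, cosine={bow_cosine(a, b):.2f}")
```

In this toy case the paraphrased pair shares no words and scores zero on both measures, while the pair with opposite meanings scores high because three of its four words coincide. A model that reads both snippets together is meant to close exactly this kind of gap.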
I illustrate this model with three examples from American politics. First, I apply an off-the-shelf pretrained cross-encoder to measure the similarity between social messages written by experimental subjects and the original Reuters article about US economic performance that they read in a “telephone game” conducted by Carlson (2019), showing that cross-encoder estimates of information distortion better capture the amount of partisan bias contained in social messages. In the second application, which studies competing media framing of the US Supreme Court (SCOTUS), I train a customized cross-encoder model on manually labeled pairs of news headlines to predict the heterogeneity of media coverage of case decisions. The cross-encoder not only outperforms a wide range of word-based and sentence-embedding approaches but also uncovers an empirical pattern that would otherwise be missed: cases with published dissents receive more diverse coverage than unanimous decisions. The last example presents a more challenging task in which I apply the cross-encoder and other models to measure the similarity of social media posts by US senators, both within and across parties, that a topic model has already identified as addressing the same policy issue. Only the cross-encoder yields conclusions consistent with established theories, which hold that elite polarization is more intense on domestic policy than on international affairs.
About the Author: Gechun Lin is a Ph.D. candidate in Political Science at Washington University in St. Louis. Their research “Using cross-encoders to measure the similarity of short texts in political science” is now available in Early View and will appear in a forthcoming issue of the American Journal of Political Science.
