Today's speakers: Shusei Eshima (Harvard) and Tomoya Sasaki (MIT), "Keyword Assisted Topic Models"
Abstract: For a long time, many social scientists have conducted a content analysis by simply counting carefully selected key words and phrases contained in documents of interest. In recent years, however, probabilistic topic models have become increasingly popular because of their ability to uncover topics and keywords based on the co-occurrence of certain words. Unfortunately, applied researchers find that these models often fail to yield topics of their interest by inadvertently creating nonsensical topics, merging unrelated topics, or splitting a single coherent topic. In this paper, we empirically demonstrate that providing topic models with a small number of keywords can dramatically improve their performance. The proposed keyword assisted topic model (keyATM) offers an important advantage that the specification of keywords requires researchers to label topics prior to fitting a model to the data. This contrasts with a widespread practice of post-hoc topic interpretation and adjustments that compromises the objectivity of empirical findings. In our applications, we find that the keyATM provides more interpretable results, has better document classification performance, and is more robust to the number of topics than the standard topic models. Finally, keyATM can also model covariate effects and time trends. An open-source software package is freely available for implementing the proposed methodology.
The Applied Statistics Workshop (Gov 3009) meets all academic year, Wednesdays, 12pm-1:30pm, in CGIS K354. This workshop is a forum for advanced graduate students, faculty, and visiting scholars to present and discuss methodological or empirical work in progress in an interdisciplinary setting. The workshop features a tour of Harvard's statistical innovations and applications with weekly stops in different fields and disciplines and includes occasional presentations by invited speakers. Free lunch is provided.
More information is available at the Gov 3009 website: https://projects.iq.harvard.edu/applied.stats.workshop-gov3009