Tracy Ke (Workshop in Applied Statistics)


Wednesday, January 26, 2022, 12:00pm to 1:30pm


Virtual via Zoom

Today's speaker: Tracy Ke (Harvard Department of Statistics), "Learning Research Areas and Author Research Interests from Bibtex and Citations"



Given the scientific publications in a field, we are interested in using BibTeX and citation data to estimate (a) the primary research areas in this field, (b) the research interests of individual authors (which may evolve over time), and (c) the citation impacts of different research topics in this field. We answer questions (a) and (b) by studying the co-citation networks of authors. We model these networks with a dynamic mixed-membership model, in which each primary area is a “community” and each author's research interests are described by time-varying “mixed-membership vectors”. We propose a spectral algorithm for estimating these membership vectors. We answer question (c) by jointly modeling citations and paper abstracts. We propose the Hofmann-Stigler model, which introduces K “topic vectors” in the text abstracts, K “export scores” that model the citation impact of these topics, and a “topic weight vector” for each paper. We propose a spectral algorithm for parameter estimation, whose output can be used to rank topics.

We applied our methods to a data set of publications in statistics, which covers over 83K papers in 36 statistics journals spanning 41 years. We discovered a “Statistics Triangle” that is connected to Bradley Efron’s Statistics Philosophy Triangle (Efron’s triangle is subjective, whereas ours is derived from data). We also found that quite a few high-profile authors have been moving toward the popular sub-area of “High-dimensional Data Analysis”. Finally, we found that the research topic “Mathematical Statistics” ranks first in citation impact.

This is joint work with Pengsheng Ji, Jiashun Jin and Wanshan Li. The talk is partially based on the paper “Co-citation and Co-authorship Networks of Statisticians” (Journal of Business & Economic Statistics, to appear).


The Applied Statistics Workshop (Gov 3009) meets throughout the academic year, Wednesdays, 12pm-1:30pm, in CGIS K354. This workshop is a forum for advanced graduate students, faculty, and visiting scholars to present and discuss methodological or empirical work in progress in an interdisciplinary setting. The workshop features a tour of Harvard's statistical innovations and applications with weekly stops in different fields and disciplines and includes occasional presentations by invited speakers.

More information is available at the Gov 3009 website: