The Applied Statistics Workshop (Gov 3009) meets all academic year, Wednesdays, 12pm-1:30pm, in CGIS K354. This workshop is a forum for advanced graduate students, faculty, and visiting scholars to present and discuss methodological or empirical work in progress in an interdisciplinary setting. The workshop features a tour of Harvard's statistical innovations and applications with weekly stops in different fields and disciplines and includes occasional presentations by invited speakers. Free lunch is provided.
Paramveer Dhillon (MIT) presents
Linear Methods for Big Data
Statistical machine learning has seen great advances in the last decade, owing to the availability of large-scale annotated datasets and significant improvements in computing hardware. Amid this measurement revolution, it has become increasingly important to develop statistical methods that are not only statistically efficient but also computationally efficient, i.e., fast.
Drawing on these developments and on recent advances in random matrix theory, I will present my work on building fast, theoretically sound methods for ordinary least squares (OLS) linear regression and canonical correlation analysis (CCA). I will also describe how these methods can be used to generate linear features that achieve state-of-the-art performance on several natural language processing tasks.
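The abstract does not say how the fast OLS methods work. The code below is only a generic illustration of one well-known random-projection idea, "sketch-and-solve" least squares, and is not presented as the speaker's algorithm. The idea is to multiply X and y by a random k×n Gaussian matrix S (k much smaller than n) and then solve the much smaller k-row problem. The function names (`ols`, `sketched_ols`, `solve_linear_system`) are hypothetical, and pure Python is used only to keep the example dependency-free. Applying a dense Gaussian S costs O(nkd), so this demo is not itself faster than exact OLS. In practice, speedups come from structured or sparse projections, such as a subsampled randomized Hadamard transform.

```python
import random


def solve_linear_system(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so row operations act on a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(x_rows, y):
    """Exact OLS via the normal equations X^T X beta = X^T y (cost O(n d^2))."""
    d = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def sketched_ols(x_rows, y, k, seed=0):
    """Sketch-and-solve OLS (hypothetical helper): compress the n rows of (X, y)
    to k rows with a Gaussian projection S whose entries are N(0, 1/k), then
    solve the small problem min ||S X beta - S y||."""
    n, d = len(x_rows), len(x_rows[0])
    if k < d:
        raise ValueError("sketch size k must be at least the number of features d")
    rng = random.Random(seed)
    sx = [[0.0] * d for _ in range(k)]
    sy = [0.0] * k
    for i in range(n):
        # Column i of S, generated on the fly so S is never stored in full.
        s_col = [rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(k)]
        for r in range(k):
            sy[r] += s_col[r] * y[i]
            for c in range(d):
                sx[r][c] += s_col[r] * x_rows[i][c]
    return ols(sx, sy)
```

For example, `sketched_ols(X, y, k=100)` with `X` a list of n feature rows returns a coefficient list close to `ols(X, y)`. When y lies exactly in the column span of X, the sketched problem is consistent and both functions recover the same coefficients up to rounding.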