The Applied Statistics Workshop (Gov 3009) meets all academic year, Wednesdays, 12pm-1:30pm, in CGIS K354. This workshop is a forum for advanced graduate students, faculty, and visiting scholars to present and discuss methodological or empirical work in progress in an interdisciplinary setting. The workshop features a tour of Harvard's statistical innovations and applications with weekly stops in different fields and disciplines and includes occasional presentations by invited speakers. Free lunch is provided.
Rebecca Betensky (Harvard - Public Health) presents "Nonidentifiability in the presence of factorization for truncated data"
Title: Nonidentifiability in the presence of factorization for truncated data
Truncation is a structured form of selection bias that arises often in cohort studies. A time to event, X, is left truncated by T if X can be observed only if T < X. This often results in oversampling of large values of X, and necessitates adjustment of estimation procedures to avoid bias. Simple adjustments can be made to standard risk-set based estimators to accommodate left truncation as long as T and X are “quasi-independent,” i.e., independent in the observable region. Through examination of the likelihood function, we derive a weaker factorization condition for the conditional distribution of T given X in the observable region that likewise permits risk-set adjustment for estimation of the distribution of X (but not T). Quasi-independence results when the analogous factorization condition for X given T also holds, in which case the distributions of both X and T are easily estimated. While we can test for factorization, if the test does not reject, we cannot identify which factorization condition holds, or whether both (i.e., quasi-independence) hold. Importantly, this means that we must ultimately make an unidentifiable assumption in order to estimate the distribution of X based on truncated data. This contrasts with the common understanding that truncation is distinct from censoring in that it does not require any unidentifiable assumptions. We illustrate these concepts through examples and a simulation study.
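To make the oversampling and the risk-set adjustment described in the abstract concrete, here is a minimal Python sketch. It is not taken from the talk. It assumes T is uniform on [0, 2], X is exponential with mean 1, T and X are drawn independently (so quasi-independence holds by construction), and there is no censoring. The function names are hypothetical. The adjusted estimator is a product-limit estimator in which a subject counts as at risk at time x only if T < x <= X.

```python
import random

def simulate_left_truncated(n, seed=0):
    """Draw independent (T, X) pairs and keep only those with T < X."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        t = rng.uniform(0.0, 2.0)   # truncation time T (assumed distribution)
        x = rng.expovariate(1.0)    # event time X (assumed distribution)
        if t < x:                   # X is observable only if T < X
            sample.append((t, x))
    return sample

def naive_conditional_survival(sample, a, b):
    """Unadjusted estimate of P(X > b | X > a): ignores truncation."""
    beyond_a = sum(1 for _, x in sample if x > a)
    beyond_b = sum(1 for _, x in sample if x > b)
    return beyond_b / beyond_a

def adjusted_conditional_survival(sample, a, b):
    """Risk-set adjusted product-limit estimate of P(X > b | X > a)."""
    surv = 1.0
    for x in sorted({xi for _, xi in sample if a < xi <= b}):
        # Adjusted risk set: subjects already under observation (T < x)
        # whose event has not yet occurred (X >= x).
        at_risk = sum(1 for t, xi in sample if t < x <= xi)
        events = sum(1 for _, xi in sample if xi == x)
        surv *= 1.0 - events / at_risk
    return surv

sample = simulate_left_truncated(1500, seed=1)
naive = naive_conditional_survival(sample, 0.5, 1.5)
adjusted = adjusted_conditional_survival(sample, 0.5, 1.5)
print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}")
```

Under these assumptions the true value of P(X > 1.5 | X > 0.5) is exp(-1), roughly 0.368. The naive estimate should come out noticeably higher, reflecting the oversampling of large X, while the adjusted estimate should land near the true value. The adjustment is valid here only because independence was built into the simulation, which is the kind of assumption the abstract argues cannot be verified from truncated data alone.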