Authors' Committee

Chair:

Matt Blackwell (Gov)

Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
Andy Eggers (Gov)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

May 6, 2008

Tuesday: Tips & Tricks

I've been programming in R for four years now, and it seems that no matter how much I learn, there are a million tiny ways that I could do it better. We all have our own programming styles and frequently used functions that may prove useful to others. I often find that a casual conversation with an office mate yields new approaches to a programming quandary. I'm speaking not of statistical insights, though those are important too, but rather of the "simple" art of data manipulation and programming implementation: those essential tricks that help to improve coding efficiency. To that end, I'm announcing the beginning of a bi-weekly "Tuesday Tips & Tricks" posting. These tips may include the description of a useful and perhaps obscure function, or solutions to common coding problems. I'm selfishly hoping that readers of this blog who know of better or alternate approaches will respond in the comment section, and I'm looking forward to reading your responses.

This week's tip: How to quickly summarize contents of an object.

Answer: summary(), str(), dput()

The primary option, of course, is the familiar summary() command. This command works well for viewing model output, and it also gives a quick sense of data frames, matrices, and factors. For example, the summary of a data frame or matrix shows the following:

> summary(dat1)
     Hello           test     citynames
 Min.   :1.00   Min.   :-3   Length:2
 1st Qu.:1.25   1st Qu.:-2   Class :character
 Median :1.50   Median :-1   Mode  :character
 Mean   :1.50   Mean   :-1
 3rd Qu.:1.75   3rd Qu.: 0
 Max.   :2.00   Max.   : 1

This is an incredibly useful function for numeric data, but it is less useful for string data. For character vectors, the summary function only reveals the length, class, and mode of the variable. In this case, to get a quick look at the data, one might want to use str(). Officially, str() "compactly displays the structure of an arbitrary R object", and in practice this is incredibly useful. So, using the same data frame as an example:

> str(dat1)
'data.frame': 2 obs. of 3 variables:
$ Hello : num 1 2
$ test : num -3 1
$ citynames: chr "Cambridge" "Rochester"

Here, the output shows a data frame with 2 observations of 3 variables. The first variable, Hello, is numeric, and its values are 1 and 2. The character vector citynames is displayed much more usefully than it was by summary(), since the actual values appear. While this is a small example, the function works just as well for much larger data frames and matrices, where it displays only the first few values of each variable.
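To make the point about larger objects concrete, here is a minimal sketch. The data frame `big` and its columns are invented purely for illustration; `vec.len` is a standard argument of str() that controls how many leading values are shown.

```r
# Hypothetical data frame, invented purely for illustration.
set.seed(1)
big <- data.frame(
  id    = 1:1000,
  score = rnorm(1000),
  city  = sample(c("Cambridge", "Rochester"), 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Default display: one line per variable, truncated after the first few values.
str(big)

# vec.len controls how many leading values are shown for each variable.
str(big, vec.len = 2)

# capture.output() stores the printed lines, which makes the size of the
# display easy to check: one header line plus one line per variable.
str_lines <- capture.output(str(big))
length(str_lines)
```

Even with 1,000 rows, the display stays at one line per variable, which is what makes str() handy for a first look at a large object.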

For smaller objects, the function dput() might also prove useful. This function shows the ASCII text representation of the R object and its characteristics. So for this same example:

> dput(dat1)
structure(list(Hello = c(1, 2), test = c(-3, 1), citynames = c("Cambridge",
"Rochester")), .Names = c("Hello", "test", "citynames"), row.names = c(NA,
-2L), class = "data.frame")
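Because dput() writes out valid R code, its output can be read back in to rebuild the object. The sketch below assumes that dat1 was created roughly as shown (the post does not give its construction) and uses a temporary file chosen only for illustration.

```r
# Assumed construction of dat1; the original post shows only its output.
dat1 <- data.frame(
  Hello     = c(1, 2),
  test      = c(-3, 1),
  citynames = c("Cambridge", "Rochester"),
  stringsAsFactors = FALSE
)

# Write the text representation to a temporary file, then read it back.
tmp <- tempfile(fileext = ".R")
dput(dat1, file = tmp)
dat2 <- dget(tmp)

# Check that the rebuilt object matches the original.
same <- isTRUE(all.equal(dat1, dat2))
same

unlink(tmp)  # remove the temporary file
```

This is one reason dput() output is convenient for sharing a small object in an email or a help request: the recipient can paste the text and get the object back.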

Posted by Eleanor Neff Powell at 4:41 PM

December 13, 2006

Applied Statistics – Harrington

This week the Applied Statistics Workshop will present a talk by David Harrington, Professor of Biostatistics at Harvard’s School of Public Health and in the Department of Biostatistical Science at the Dana-Farber Cancer Institute.

Professor Harrington received his Ph.D. from the University of Maryland and taught at the University of Virginia before coming to Harvard. He has served as Principal Investigator on numerous NIH and NSF grants researching topics including Nonparametric Tests for Censored Cancer Data, and Statistical Problems for Markov Branching Processes. His research has appeared in Journal of the American Statistical Association, Biostatistics, Genetic Epidemiology, Journal of Clinical Oncology, and Biometrics among many others.

Professor Harrington is involved in two different lines of research. The first is research in statistical methods for clinical trials and prospective cohort studies in which the time to an event is a primary outcome. He has worked in efficient nonparametric tests and regression methods for right-censored data, sequential designs for clinical trials, and nonparametric methods for estimating nonlinear covariate effects on survival. Recently, he and co-workers in the Department of Biostatistics have been studying methods for analyzing survival data when some covariates have missing observations. Missing data are common in both prospective and retrospective cohort studies, and simply ignoring cases with missing observations can lead to substantial biases in inference.

Dr. Harrington's second line of research, on which he will be presenting, is collaborative research in cancer. He is the principal investigator of the Statistical Coordinating Center for the Cancer Care Outcomes Research and Surveillance (CanCORS) Consortium. This NCI-funded study is a network of sites around the country that are conducting a population-based study of access to and outcomes from cancer care, with special focus on ethnic subgroups and subgroups defined by age.

Professor Harrington will present a talk entitled "Statistical Issues in the Cancer Care Outcomes Research and Surveillance Consortium (CanCORS)." The presentation will be at noon on Wednesday, December 13, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 9:23 AM

December 6, 2006

Applied Statistics - Imbens and Ridder

This week the Applied Statistics Workshop will present a talk by Guido Imbens, Professor of Economics at Harvard University, and Geert Ridder, Professor of Economics at the University of Southern California.

Professor Imbens has recently rejoined the Department of Economics at Harvard and is one of the faculty sponsors of the Applied Statistics Workshop, so we are delighted that he will be speaking at the Workshop. He received his Ph.D. from Brown University and served on the faculties of Harvard, UCLA, and Berkeley before returning to Harvard. He has published widely, with a particular focus on questions relating to causal inference. Professor Imbens has been the recipient of numerous National Science Foundation grants and teaching awards. His work has appeared in Econometrica, Journal of Econometrics, Journal of the Royal Statistical Society, and Biostatistics among many others.

Geert Ridder is Professor of Economics at the University of Southern California. Before coming to the United States he was Professor of Econometrics at the Rijksuniversiteit Groningen and the Vrije Universiteit in Amsterdam in The Netherlands. In the United States he was Professor of Economics at the Johns Hopkins University and visiting professor at Cornell University, the University of Iowa, and Brown University. He received his Ph.D. from the University of Amsterdam. Professor Ridder’s research area is econometrics, in particular microeconometrics, and its applications in labor economics, public finance, economic development, economic demography, transportation research, and the economics of sports. His methodological interests are the (nonparametric) identification of statistical and economic structures from observed distributions (mainly in duration data and discrete choice data), models and estimation methods for duration data and panel data, (selectively) missing data, causal inference, and errors-in-variables. His work has appeared in Econometrica, Economics of Education Review, Journal of the European Economic Association, and Journal of Econometrics, among others.

Professors Imbens and Ridder will present a talk entitled "Complementarity and Aggregate Implications of Assortative Matching: A Nonparametric Analysis." The presentation will be at noon on Wednesday, December 6, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 10:34 AM

November 29, 2006

Applied Statistics - Alan Zaslavsky

This week the Applied Statistics Workshop will present a talk by Alan Zaslavsky, Professor of Health Care Policy (Statistics) in the Department of Health Care Policy at Harvard Medical School. Dr. Zaslavsky's statistical research interests include surveys, census methodology, small area estimation, official statistics, missing data, hierarchical modeling, and Bayesian methodology. His research topics in health care policy center on measurement of the quality of care provided by health plans through consumer assessments and clinical and administrative data. Among his current major projects are (1) the Consumer Assessments of Healthcare Providers and Systems (CAHPS) survey implementation for the Medicare system, (2) methodology for surveys in psychiatric epidemiology, centered on validation of the CIDI-A (adolescent) survey in the National Comorbidity Study-Adolescent, and (3) studies on determinants of quality of care for cancer, including both the Statistical Coordinating Center and a research site for the NCI-funded CanCORS (Cancer Care Outcomes Research and Surveillance Consortium) study. Other research interests include measurement of disparities in health care, and privacy and confidentiality for health care data.

He is a member of the Committee on National Statistics (CNSTAT) of the National Academy of Sciences and has served on CNSTAT panels on census methodology, small area estimation and race/ethnicity measurement, as well as the Committee on the National Quality Report on Health Care Delivery of the Institute of Medicine.

Dr. Zaslavsky received his A.B. degree at Harvard College, his M.S. at Northeastern University, and his Ph.D. at the Massachusetts Institute of Technology. He is a Fellow of the American Statistical Association.

Professor Zaslavsky will present a talk entitled "Modeling the covariance structure of random coefficients to characterize the quality variation in health plans." The presentation will be at noon on Wednesday, November 29th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 7:59 AM

November 13, 2006

Applied Statistics – Joshua Angrist

This week the Applied Statistics Workshop will present a talk by Joshua Angrist, Professor of Economics at the Massachusetts Institute of Technology.

Professor Angrist received his Ph.D. in Economics at Princeton University, after which he joined the Economics Departments at Harvard University and Hebrew University before coming to MIT. He is a Fellow of the American Academy of Arts and Sciences and of the Econometric Society, and has served as Co-editor of the Journal of Labor Economics. His publications have appeared in Econometrica, The American Economic Review, The Economic Journal, and The Quarterly Journal of Economics, among others. His research interests include the effects of school inputs and organization on student achievement, the impact of education and social programs on the labor market, immigration, labor market regulation and institutions, and econometric methods for program and policy evaluation. Prof. Angrist also has a long-standing interest in public policy. In addition to his academic work, he has worked as a consultant to the U.S. Social Security Administration, the Manpower Demonstration Research Corporation, and the Israeli government after the Oslo peace negotiations in 1994.

Professor Angrist will present a talk entitled "Lead them to Water and Pay them to Drink: An Experiment in Services and Incentives for College Achievement." The presentation will be at noon on Wednesday, November 15th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 1:08 PM

October 30, 2006

Applied Statistics - Nan Laird & Christoph Lange

This week the Applied Statistics Workshop will present a talk by Nan Laird, Professor of Biostatistics in the Harvard School of Public Health, and Christoph Lange, Assistant Professor of Biostatistics in the Harvard School of Public Health.

Before joining the Department of Biostatistics, Professor Laird received her Ph.D. in Statistics from Harvard and was an Assistant Prof. of Statistics at Harvard. She has published extensively in Statistics in Medicine, Biostatistics, American Journal of Human Genetics and the American Journal of Epidemiology among others. Her research interest is the development of statistical methodology in four primary areas: statistical genetics, longitudinal studies, missing or incomplete data, and analysis of multiple informant data.

Professor Lange earned his Ph.D. in Applied Statistics from the University of Reading, and has been a member of the Department of Biostatistics since then. His publications have appeared in Biostatistics, the American Journal of Human Genetics, Genetic Epidemiology, and Genetics. Prof. Lange's current research interests fall into the broad areas of statistical genetics and generalized linear models. Recent topics in statistical genetics include family-based association tests, meta-analysis of linkage studies, GEE methods in linkage analysis, and marker-assisted selection.

Prof. Laird and Prof. Lange will present a talk entitled “Statistical Challenges and Innovations for Gene Discovery”. An abstract for the talk and associated background papers are available from the course website. The presentation will be at noon on Wednesday, November 1st, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 9:05 AM

October 16, 2006

Applied Stats - Loeffler

This week the Applied Statistics Workshop will present a talk by Charles E. Loeffler, Ph.D. Candidate in Sociology at Harvard University.

Charles graduated magna cum laude from Harvard with a degree in Social Studies, before going on to receive his M.Phil. in Criminology from Cambridge University. He has recently completed the National Consortium on Violence Research Pre-Dissertation Fellowship under the mentorship of Prof. Steven Levitt of the University of Chicago. His work has appeared in The New Republic Online, Federal Sentencing Reporter, and Ars Aequi: A Biographical History of Legal Science. Charles's research interests include Criminology, Quasi-Experimental Methods, and Decisionmaking.

Charles will present a talk entitled "Is justice blind? A natural experiment in the use of judicial discretion in criminal trials". The working paper for the talk is available from the course website. The presentation will be at noon on Wednesday, October 18th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 5:00 AM

October 9, 2006

Applied Stats - Harding

This week the Applied Statistics Workshop will present a talk by Matthew C. Harding, Ph.D. Candidate in Economics at the Massachusetts Institute of Technology.

Before coming to MIT, he received his M. Phil. in Economics at Oxford University. His research interests include Econometrics, American Politics, Political Economy, Macro-Finance, Economic Theory, Industrial Organization and Behavioral Economics. His publications appear in the International Economic Review and Macroeconomics: Imperfections, Institutions, and Policies.

Harding will present a talk entitled "Stochastic Eigen-analysis for Economics, Finance and Political Science". An abstract and accompanying working papers for the talk are available from the course website. The presentation will be at noon on Wednesday, October 11th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 5:08 PM

October 2, 2006

Applied Statistics – Subharup Guha & Louise Ryan

This week the Applied Statistics Workshop will present a talk by Subharup Guha, Post-Doctoral Research Fellow in the Harvard School of Public Health Department of Biostatistics, and Louise Ryan, Henry Pickering Walcott Professor of Biostatistics in the Harvard School of Public Health and Department of Biostatistical Science at the Dana-Farber Cancer Institute.

Before coming to Harvard, Dr. Guha received his Ph.D. in Statistics at Ohio State University. Dr. Guha’s publications appear in Environmental and Ecological Statistics, Journal of the American Statistical Association, Journal of Computational and Graphical Statistics and the Journal of the Royal Statistical Society. His research interests include Bayesian modeling, computational biology, MCMC simulation, Semiparametric Bayesian methods, Spatio-temporal models and survival analysis.

Professor Ryan earned her Ph.D. in Statistics from Harvard University, and has been a member of the Department of Biostatistics since then. She has received numerous honors and distinctions during that time, including the Spiegelman Award from the American Public Health Association, and was named Mosteller Statistician of the Year. She has published extensively in Biometrics, Journal of the American Statistical Association, Journal of Clinical Oncology, and the New England Journal of Medicine. Her research interests focus on statistical methods related to environmental risk assessment for cancer, developmental and reproductive toxicity, and other non-cancer endpoints such as respiratory disease, with a special interest in the analysis of multiple outcomes as they occur in these applied settings.

Dr. Guha and Professor Ryan will present a talk entitled "Gauss-Seidel Estimation of Generalized Linear Mixed Models with Application to Poisson Modeling of Spatially Varying Disease Rates." The paper that accompanies the talk is available from the course website. The presentation will be at noon on Wednesday, October 4th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 12:02 PM

September 26, 2006

Applied Statistics – Ben Hansen

This week the Applied Statistics Workshop will present a talk by Ben Hansen, Assistant Professor of Statistics at the University of Michigan. Professor Hansen graduated from Harvard College, magna cum laude, with a degree in Mathematics and Philosophy. He went on to win a Fulbright Fellowship to study philosophy at the University of Oslo, Norway, after which he earned his Ph.D. in Logic and Methodology of Science at the University of California, Berkeley.

Professor Hansen’s primary research interests involve causal inference in comparative studies, particularly observational studies in the social sciences. His publications appear in the Journal of Computational and Graphical Statistics, Bernoulli, Journal of the American Statistical Association, and Statistics and Probability Letters. He is currently working on providing methods for statistical adjustment that enable researchers to mount focused, specific analogies of their observational studies to randomized experiments.

Professor Hansen will present a talk entitled "Covariate balance in simple, stratified and clustered comparative studies." The working paper that accompanies the talk is available from the course website. The presentation will be at noon on Wednesday, September 27, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

If you missed the workshop’s first meeting, you should check out the abstract of Jake Bowers’ talk, “Fixing Broken Experiments: A Proposal to Bolster the Case for Ignorability Using Subclassification and Full Matching”.

Posted by Eleanor Neff Powell at 4:34 PM