Tidying Data with Python and OpenRefine


Thursday, April 16, 2020, 1:00pm to 3:30pm




In his paper "Tidy Data," Hadley Wickham riffs on Tolstoy: "Like families, tidy datasets are all alike but every messy dataset is messy in its own way." When 75% of our "analysis" time goes to cleaning and preprocessing data, it makes sense to focus on strategies for standardizing that data. In this workshop, we will focus on correcting common errors in collected data and (re)structuring datasets to facilitate analysis. We will use OpenRefine and Python for these tasks. You don't need to be a Pythonista, but you should have some familiarity with Python or a similar scripting language, as we won't spend much time on syntax.

Please see the following page for registration details: https://dssg.fas.harvard.edu/event/tidy-data-python-openrefine/