Python for Geospatial Big Data and Data Science Using the FASRC

Date: Tuesday, September 26, 2023, 9:30am to 4:30pm

Location: CGIS South, room S030 (Lee Gathering Room)

Lead Instructor: Robert Spang
Co-Instructors: Devika Kakkar and Xiaokang Fu

Mode

Full-day, on-site workshop with limited capacity (maximum of 20 participants)

Topics Covered

  1. Introduction and Fundamentals of High-Performance Computing, with a focus on FASRC
  2. Foundations of Data Analysis and Data Science, emphasizing Big Data
  3. Concepts of filter/map/reduce, multiprocessing, and Apache Spark using Python
  4. Practical application using a large social media data set (The GeoTweets Data Set / Twitter Sentiment Geographical Index) to address a sample research question
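
For readers unfamiliar with the filter/map/reduce pattern named in topic 3, the following is a minimal sketch using only the Python standard library. It is not taken from the workshop materials. The records, the field layout (state, sentiment score), and the helper name merge are hypothetical and are not drawn from the GeoTweets data set.

```python
from functools import reduce

# Hypothetical records: (state, sentiment score between 0 and 1).
# These values are made up for illustration; they are not real data.
records = [
    ("MA", 0.8),
    ("NY", 0.4),
    ("MA", 0.6),
    ("CA", 0.9),
    ("NY", None),  # a record with a missing score
]

# filter: keep only the records that have a score
valid = filter(lambda r: r[1] is not None, records)

# map: reshape each record into (state, (score_sum, count))
pairs = map(lambda r: (r[0], (r[1], 1)), valid)


def merge(acc, pair):
    """Fold one (state, (score, count)) pair into the running totals."""
    state, (score, count) = pair
    total, n = acc.get(state, (0.0, 0))
    acc[state] = (total + score, n + count)
    return acc


# reduce: combine all pairs into per-state totals
totals = reduce(merge, pairs, {})

# Final step: mean score per state
mean_by_state = {state: total / n for state, (total, n) in totals.items()}
print(mean_by_state)
```

The same three steps (discard unusable records, reshape each record, combine per key) are what frameworks such as Apache Spark distribute across many machines when the data no longer fits on one.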

Learning Objectives

Participants will learn how to analyze large data sets using Python and FASRC. The workshop will cover various tools and techniques used in Data Science and Big Data computations. Attendees will be prepared to work with their own data sets and apply their analyses using FASRC.

Target Audience

This workshop is for intermediate-level Python users; basic Python development experience is required. Participants should be comfortable using Python on their own machines, be able to load and inspect CSV files locally, and be able to use SSH to connect to a remote server. Some experience with NumPy and pandas is recommended but not required. The workshop is suitable for first-time users of HPC systems and for those interested in geospatial analysis.

Expected Prerequisites

  • Bring your own laptop and laptop charger.
  • Create a FASRC account before attending the workshop. We won't have time for on-site registration, so please set up and test your account at least a week in advance.
    • To assist you with the account setup, you can follow this 10-minute tutorial for FASRC: FASRC Account Setup Tutorial (minimum requirement).
    • Additionally, consider watching this 8-minute tutorial on running basic Python code and Jupyter notebooks on FASRC machines: Running Python on FASRC Tutorial.

To Attend

The workshop is free for individuals with a valid Harvard ID. Please click on the green button below to register. A video version of the workshop will be made publicly available a few days later for those unable to attend in person.

HARVARD APPLY

Contact

For any questions regarding the workshop, please email robertspang @ fas.harvard.edu. A Slack channel for questions and assistance will also be created prior to the event.