Travel Time Estimation for Geospatial Big Data: A case study of healthcare accessibility in the USA

June 6, 2023

by Xiaokang Fu, Devika Kakkar, and Jeff Blossom
    

Introduction

Estimating drive times is essential in many fields, including urban planning, transportation engineering, business management, public health, and healthcare accessibility studies. In public health and medical service accessibility research, the key quantity is the travel time between patient locations and health services, clinics, or hospitals. Although numerous drive time estimation methods are available, comprehensive comparative analyses that could guide researchers and professionals in choosing the most suitable method for their needs are lacking, particularly for geospatial big data. This article highlights our recent comparative study of six drive time estimation methods, focusing on accuracy, cost, and scalability, using a case study in the USA.

Methods

Here we propose a systematic framework for a comparative analysis of six drive time estimation methods: Web Service APIs (Google Maps API, Bing Maps API, Esri Routing Web Service), GIS desktop software (ArcGIS Pro Network Analyst), and open-source packages (OSMnx, OSRM). The framework consists of several steps: selecting use cases, generating sample data from the use cases, calculating driving times, and comparing and analyzing the results, as shown in the figure below:

Flowchart from Use Case to Sampling to Routing Calculation to Results Comparison

To create a representative sample for our comparative analysis, we generated 10,000 random pairs of ZIP code centroids and hospitals that offer pediatric services (Origin-Destination pairs, or OD pairs), ensuring comprehensive spatial coverage of the USA. We then calculated drive times using each of the above methods and compared their accuracy, efficiency, and cost-effectiveness in a real-world context. The Google Maps API, Bing Maps API, and Esri Routing Web Service were accessed using the Python Requests package, while ArcGIS Pro was used to manually calculate drive times on locally stored Esri road network data. For OSMnx, we performed street network analysis using OpenStreetMap data and partitioned the OD pairs by state to manage the computational load. Lastly, we used the OSRM web API, both through the demo server and through a self-hosted server on FAS Research Computing (FASRC). Using this framework, we compared the performance of the six drive time estimation methods and selected the most suitable method for our specific use case.
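The sampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: it assumes pairs are drawn uniformly at random without replacement from all ZIP centroid and hospital combinations, and the `(id, lon, lat)` record layout and the function name `sample_od_pairs` are hypothetical.

```python
import random


def sample_od_pairs(zip_centroids, hospitals, n_pairs, seed=0):
    """Draw n_pairs distinct (origin, destination) pairs uniformly at random.

    zip_centroids and hospitals are lists of (id, lon, lat) tuples; this
    record layout is an assumption made for illustration.
    """
    total = len(zip_centroids) * len(hospitals)
    if n_pairs > total:
        raise ValueError("more pairs requested than combinations available")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Sample flat indices into the origin x destination grid without
    # building the full grid in memory.
    flat_indices = rng.sample(range(total), n_pairs)
    n_hosp = len(hospitals)
    return [(zip_centroids[i // n_hosp], hospitals[i % n_hosp])
            for i in flat_indices]
```

Sampling flat indices avoids materializing every combination, which matters when the full grid holds tens of millions of pairs.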

Use Case

The study area for our research is the United States of America (USA), focusing on the 48 contiguous states, which provides a diverse range of urban and rural contexts to evaluate the performance of drive time estimation methods. The USA was chosen due to its well-developed and extensive road network, which allows for a comprehensive analysis of drive time estimates across different regions and environments. Additionally, the availability of detailed and up-to-date geospatial data, such as ZIP code population centroids and hospital locations, makes the USA an ideal choice for this analysis.

For our use case, we used two main data components: USA ZIP code population centroids and the locations of hospitals that offer pediatric services (pediatric hospitals). The ZIP code centroids were obtained from the US Department of Housing and Urban Development and represent the central point of each ZIP code area as determined by population distribution. In our study, these locations represent pediatric patients in need of healthcare services. We also compiled a list of 928 pediatric hospitals across the USA using data from the American Hospital Association and other relevant sources. To support our analysis, we used supplementary data sources, including road network data from Esri for use with ArcGIS Pro and from OpenStreetMap for use with OSMnx and OSRM. These datasets provide essential information on road geometries, distances, and speed limits for calculating drive times. For the web service APIs (Google, Bing, and Esri), traffic data was automatically incorporated into the drive time estimates. In contrast, the open-source packages (OSMnx and OSRM) relied on default speed limits and travel speeds for their calculations.

With this diverse study area and these data sources, we were able to evaluate the performance of the six drive time estimation methods and identify the most suitable method in terms of accuracy, efficiency, and cost when calculating drive times on geospatial big data. The plot below shows the locations of pediatric hospitals, ZIP code centroids, and OD pairs for our use case.

Plot map of the US; legend includes Pediatric Hospitals, ZIP Code Population Centroids, and OD Pairs

Results

We evaluated the processing speed, cost, and scalability of six drive time estimation methods to determine their performance and suitability for large-scale applications. 

The results revealed that Google Maps, Bing Maps, and the Esri Routing Service were fast and accurate, but their daily quotas and request rate limits might make them unsuitable for big data applications. In contrast, open-source or no-cost solutions such as OSRM (local server) provided rapid processing, low cost, greater scalability, and consistent results, making them more suitable for geospatial big data projects.

Bar graph of Routing Tools vs Run Time in hours
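For readers unfamiliar with OSRM, its route service is queried over HTTP with coordinates in longitude, latitude order, and the response reports the route duration in seconds. The sketch below is illustrative only and is not the code used in the study: the helper names are hypothetical, the base URL is whatever server you run or have access to, and the standard library's `urllib` is used here instead of the Requests package to keep the example self-contained.

```python
import json
from urllib.request import urlopen


def osrm_route_url(base_url, origin, destination):
    """Build an OSRM route-service URL; points are (lon, lat) tuples."""
    coords = f"{origin[0]},{origin[1]};{destination[0]},{destination[1]}"
    # overview=false skips route geometry, which is not needed for drive times
    return f"{base_url.rstrip('/')}/route/v1/driving/{coords}?overview=false"


def drive_time_minutes(response):
    """Return the first route's duration in minutes, or None if no route."""
    if response.get("code") != "Ok" or not response.get("routes"):
        return None
    return response["routes"][0]["duration"] / 60.0  # OSRM reports seconds


def query_osrm(base_url, origin, destination, timeout=30):
    """Send one route request and return the drive time in minutes."""
    url = osrm_route_url(base_url, origin, destination)
    with urlopen(url, timeout=timeout) as resp:
        return drive_time_minutes(json.load(resp))
```

Against a self-hosted server there is no external quota or per-request fee to account for, so a loop over OD pairs calling `query_osrm` is limited only by the server's own capacity.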

As shown in the figure below, the drive time estimates obtained from all the methods exhibited a linear relationship with one another, with some differences in the results. OSMnx generated relatively shorter drive times with larger fluctuations, while OSRM generally produced longer drive times than Google Maps. Bing Maps' results were typically shorter than those from Google Maps. The Google Maps results were closely aligned with those from the Esri routing service and ArcGIS Pro. We also provide a detailed comparison of estimated drive times for routes shorter than 140 minutes, which reveals a departure of the OSRM drive times from Google's around the 50 to 60 minute mark. This difference might be due to how the OSRM and Google algorithms compute driving times on highways.

Graph of Google Maps Drive Time in minutes vs Drive Time in minutes
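A comparison of this kind can be summarized numerically as well as graphically. The sketch below is one possible approach, not the analysis from the study: it takes one method's drive times as the reference (for example Google Maps) and reports the Pearson correlation, a least-squares line, and the mean difference for another method. The function name and the returned keys are hypothetical.

```python
from math import sqrt


def compare_to_reference(reference, candidate):
    """Summarize how candidate drive times relate to reference drive times.

    Both inputs are equal-length sequences of minutes for the same OD pairs.
    """
    n = len(reference)
    if n != len(candidate) or n < 2:
        raise ValueError("need two equal-length sequences with 2+ values")
    mean_x = sum(reference) / n
    mean_y = sum(candidate) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in candidate)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(reference, candidate))
    if sxx == 0 or syy == 0:
        raise ValueError("drive times must not all be identical")
    slope = sxy / sxx  # least-squares fit: candidate = slope * ref + intercept
    return {
        "pearson_r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "mean_diff": mean_y - mean_x,  # positive: candidate is longer on average
    }
```

A slope above 1 would indicate that the candidate's estimates grow faster than the reference's on longer routes, which is the kind of divergence described above for OSRM relative to Google Maps.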

The results of this analysis offer valuable insight for researchers and practitioners in selecting the most appropriate drive time estimation method for their specific needs and the scale of their projects.

Conclusions

In conclusion, our comparative study of six drive time estimation methods using 10,000 OD pairs characterized the efficiency, accuracy, and cost-effectiveness of each method for big data projects. Based on our findings, we used OSRM for a larger calculation of 32.8 million OD pairs, which completed in less than 6 minutes at no cost. The results are currently being used by a research team at Boston Children’s Hospital to improve our understanding of pediatric hospital capacity in the USA.

Our study offers guidance to members of the geospatial research community who need to perform drive time calculations on big data. By examining a diverse sample of 10,000 ZIP/Hospital pairs, we compared six drive time calculation methods and found that, except for OSMnx, all methods offer accurate and cost-effective drive time estimates for 10,000 pairs or fewer.

These findings contribute to the broader understanding of drive time estimation methods and their performance in different contexts. Our research can serve as a benchmark for those seeking to choose the most suitable method for their specific needs. It is important to note that our analysis focused on the conterminous USA, and the performance of these methods may vary in other geographical regions. The results are discussed in detail in the publication below:

X. Fu, D. Kakkar, J. Chen, K. M. Moynihan, T. A. Hegland, & J. Blossom. (2023). A COMPARATIVE STUDY OF METHODS FOR DRIVE TIME ESTIMATION ON GEOSPATIAL BIG DATA: A CASE STUDY IN USA. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. (Accepted)
 

For more information about the Center for Geographic Analysis, you can visit: https://gis.harvard.edu/