Traffic Patterns in the Twin Cities
Executive Summary
This in-progress project examines traffic volume and weather data collected by an automated traffic recorder on Interstate 94 (I-94) between Minneapolis and St. Paul, Minnesota. So far, I have cleaned the data and characterized general traffic trends over an average week. I found the deviation of certain days of the week from an "average" weekday particularly interesting: Monday traffic is distinctly lighter than on other weekdays, and Thursday traffic is distinctly heavier.
As I continue to develop this project, I hope to build a model that predicts traffic volume from many of the variables provided in the dataset.
Introduction
Interstate 94 (I-94) is a US interstate highway that crosses the north-central United States, passing through southern Michigan, the Chicago metropolitan area, Milwaukee and Madison, WI, and Minneapolis/St. Paul, MN, before terminating in central Montana (Figure 1).
Traffic data is collected at various points along I-94, and this project analyzes westbound traffic volume data collected at Minnesota automatic traffic recording station 301 (MN ATR 301, Figure 2).
Goal
The goal of this project is to identify patterns in time and weather that might indicate heavy traffic on westbound I-94 between Minneapolis and St. Paul, and eventually to build a model that helps predict traffic patterns in the Twin Cities area.
Being able to predict traffic patterns on any stretch of road offers benefits ranging from redirection of excess traffic to effective staging of emergency personnel in the case of accidents.
Data Cleaning and Preparation
Detailed information on the process of cleaning the data collected by ATR 301 can be found on this subpage. Briefly, duplicate rows resulting from multiple weather descriptors were combined, strings were standardized, and outliers were dropped.
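The duplicate-combining step could look roughly like the following minimal sketch. It is not the code described on the subpage; it assumes each record is a dictionary whose column names (date_time, traffic_volume, weather_main, weather_description) match those of the public UCI Metro Interstate Traffic Volume dataset, assumes duplicate rows differ only in their weather descriptors, and uses a hypothetical helper name, combine_weather_duplicates. The toy rows are made up for illustration.

```python
# Hypothetical helper; the column names are assumptions, not taken from the original cleaning code.
WEATHER_FIELDS = ("weather_main", "weather_description")


def combine_weather_duplicates(rows):
    """Collapse rows that share a timestamp into a single row.

    Each row is a dict. Rows with the same 'date_time' are assumed to differ
    only in their weather descriptors, so the first row's other values
    (including 'traffic_volume') are kept and the descriptors are merged.
    """
    combined = {}  # dicts preserve insertion order in Python 3.7+
    for row in rows:
        key = row["date_time"]
        if key not in combined:
            merged = dict(row)
            for field in WEATHER_FIELDS:
                merged[field] = set()
            combined[key] = merged
        for field in WEATHER_FIELDS:
            # Standardize strings (trim whitespace, lower-case) before merging.
            combined[key][field].add(row[field].strip().lower())
    # Turn each set of descriptors into one comma-separated string.
    for merged in combined.values():
        for field in WEATHER_FIELDS:
            merged[field] = ", ".join(sorted(merged[field]))
    return list(combined.values())


# Made-up toy rows: two readings for the same hour with different weather labels.
rows = [
    {"date_time": "2013-01-01 00:00:00", "traffic_volume": 1500,
     "weather_main": "Snow", "weather_description": "light snow"},
    {"date_time": "2013-01-01 00:00:00", "traffic_volume": 1500,
     "weather_main": "Mist", "weather_description": "mist"},
    {"date_time": "2013-01-01 01:00:00", "traffic_volume": 900,
     "weather_main": "Clear", "weather_description": "sky is clear"},
]
cleaned = combine_weather_duplicates(rows)
print(cleaned[0]["weather_main"])  # mist, snow
```

Keeping the first row's traffic volume is only safe if the duplicates really repeat the same count; a fuller pipeline would check that before merging.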
Overall Traffic Density
Before evaluating specific traffic patterns with respect to time of day and weather, I explored the distribution of traffic volumes across the entire dataset to set expectations for later analyses. To begin, I produced a kernel density estimate (KDE) of all the collected traffic volume data (Figure 3). A KDE is essentially a smoothed histogram, so it shows how frequently different traffic volumes occur within the dataset.
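For readers unfamiliar with KDEs, the idea can be sketched in plain Python: every observation contributes a small Gaussian "bump," and the estimate at a point is the average of those bumps. This is a minimal illustration, not the plotting code behind Figure 3 (plotting libraries typically compute the KDE for you). The default bandwidth uses the normal-reference rule of thumb, 1.06 * sd * n**(-1/5), which is an assumption here, and the function name gaussian_kde_1d is hypothetical.

```python
import math


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point in `grid`.

    When the bandwidth is chosen automatically, at least two samples that are
    not all identical are required.
    """
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sd * n^(-1/5).
        mean = sum(samples) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    # Each Gaussian bump integrates to 1; dividing by n makes the total integrate to 1.
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in samples)
        for g in grid
    ]


# Toy volumes (vehicles per hour) with a low cluster and a high cluster.
volumes = [300, 400, 450, 5600, 5900, 6100]
print(gaussian_kde_1d(volumes, [400, 3000, 5900]))
```

The rule of thumb assumes roughly normal data and tends to oversmooth multimodal distributions like these traffic volumes, so a smaller bandwidth may be needed to keep neighbouring peaks separate.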
Based on Figure 3, I made some initial observations:
Traffic volume ranges from near 0 to over 7,000 vehicles per hour.
There appear to be several overlapping traffic volume distributions, centered at roughly 2,900, 4,800, and 5,900 vehicles per hour.
The most common traffic volume is around 400 vehicles per hour.
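Peak locations like those listed above can also be read off a KDE numerically by scanning the evaluated curve for local maxima. The sketch below is a minimal illustration with a hypothetical function name, find_local_maxima. Overlapping distributions that appear only as shoulders on the curve, rather than as distinct peaks, will not be detected this way; separating them properly would take something like a mixture model.

```python
def find_local_maxima(grid, density):
    """Return the grid points where `density` has a strict local maximum.

    A point counts as a peak if it is strictly higher than both neighbours.
    Endpoints are never reported, and exact plateaus are ignored, which is
    rarely an issue for a smooth, floating-point KDE curve.
    """
    return [
        grid[i]
        for i in range(1, len(density) - 1)
        if density[i] > density[i - 1] and density[i] > density[i + 1]
    ]


# Toy curve with two peaks, at x = 2 and x = 5.
print(find_local_maxima([0, 1, 2, 3, 4, 5, 6], [0.0, 0.1, 0.3, 0.1, 0.2, 0.5, 0.1]))  # [2, 5]
```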
It is difficult to identify the significance of these peaks without further evaluation of the data, but we can make some reasonable hypotheses. Specifically:
Different distributions correspond to different times of day
The lowest traffic volumes are the most tightly concentrated and likely correspond to late-night and early-morning traffic
I depict these hypotheses visually in Figure 4.
Figures 3 and 4 give an overview of traffic trends across the entire dataset and suggest initial hypotheses about what produces the different traffic volume peaks. Since the hypotheses in Figure 4 are based on time of day, a logical next step is to look at the average traffic volume by hour of day and day of the week, which Figure 5 depicts.
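The averaging behind a figure like Figure 5 amounts to grouping records by (day of week, hour of day) and taking the mean of each group. A minimal sketch, assuming each record has already been parsed into a (datetime, traffic_volume) pair; the function name average_by_day_and_hour is hypothetical and the plotting step is left out. (In pandas, the same table is usually built with groupby or pivot_table.)

```python
from collections import defaultdict
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]  # datetime.weekday(): Monday == 0


def average_by_day_and_hour(records):
    """Mean traffic volume for every (day of week, hour of day) pair.

    `records` is an iterable of (datetime, traffic_volume) pairs.
    Returns {day name: list of 24 hourly means}, with None for hours that have no data.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, volume in records:
        key = (timestamp.weekday(), timestamp.hour)
        totals[key] += volume
        counts[key] += 1
    return {
        DAY_NAMES[day]: [
            totals[(day, hour)] / counts[(day, hour)] if counts[(day, hour)] else None
            for hour in range(24)
        ]
        for day in range(7)
    }


# Toy records: two Monday 7 am readings and one Tuesday 3 am reading.
records = [
    (datetime(2024, 1, 1, 7), 5000),  # Monday
    (datetime(2024, 1, 8, 7), 6000),  # Monday
    (datetime(2024, 1, 2, 3), 400),   # Tuesday
]
profile = average_by_day_and_hour(records)
print(profile["Monday"][7])  # 5500.0
```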
Broadly speaking, Figure 5 shows the type of traffic pattern I would expect for a large North American city:
Two weekday "rush hours" at 7 am and 4 pm (blue boxes)
Overnight minimum traffic between 1 am and 4 am (gold box)
Weekend traffic patterns differ markedly from weekday traffic patterns
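Given the 24 hourly averages for a single day (one row of the data behind Figure 5), the rush hours and overnight minimum noted above can be located programmatically. This sketch rests on an assumption: it splits the day at noon to separate the morning and afternoon peaks, which suits weekday profiles but is not necessarily how the boxes in Figure 5 were chosen. The function name summarize_daily_profile is hypothetical.

```python
def summarize_daily_profile(profile):
    """Locate key hours in a 24-element list of hourly average volumes.

    Returns (morning_peak_hour, afternoon_peak_hour, minimum_hour).
    Splitting the day at noon is an assumption suited to weekday rush hours.
    """
    if len(profile) != 24:
        raise ValueError("expected 24 hourly values")
    morning_peak = max(range(0, 12), key=lambda h: profile[h])
    afternoon_peak = max(range(12, 24), key=lambda h: profile[h])
    minimum = min(range(24), key=lambda h: profile[h])
    return morning_peak, afternoon_peak, minimum


# Toy weekday profile: flat at 500 with peaks at 7 am and 4 pm and a dip at 3 am.
toy = [500.0] * 24
toy[7], toy[16], toy[3] = 6000.0, 5800.0, 200.0
print(summarize_daily_profile(toy))  # (7, 16, 3)
```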
Figure 5 clearly depicts how traffic is distributed over time, as well as how it changes over the course of an average week. Because differences between individual weekdays are hard to discern from the colors in Figure 5, I compared traffic on each weekday to an "average weekday," which I calculated by averaging the traffic volumes of all weekdays in the dataset hour by hour.
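The "average weekday" comparison can be sketched as follows, assuming each record is a (datetime, traffic_volume) pair. Here the average weekday pools every Monday through Friday record and averages them hour by hour; each weekday's deviation is its own hourly mean minus that average, so negative values mean lighter-than-average traffic. Averaging the five daily profiles instead would weight each day of the week equally rather than each record; with a similar number of records per day, the two approaches should give nearly the same result. The function name weekday_deviations is hypothetical and the records are toy values.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def weekday_deviations(records):
    """Hourly deviation of each weekday from the pooled 'average weekday'.

    `records` is an iterable of (datetime, traffic_volume) pairs. Weekend
    records are ignored. Returns {day name: list of 24 deviations in vehicles
    per hour}, with None where a day has no data for that hour.
    """
    day_totals = defaultdict(float)   # keyed by (weekday index, hour)
    day_counts = defaultdict(int)
    hour_totals = defaultdict(float)  # keyed by hour, pooled over Mon-Fri
    hour_counts = defaultdict(int)
    for timestamp, volume in records:
        day = timestamp.weekday()
        if day >= 5:  # 5 = Saturday, 6 = Sunday
            continue
        key = (day, timestamp.hour)
        day_totals[key] += volume
        day_counts[key] += 1
        hour_totals[timestamp.hour] += volume
        hour_counts[timestamp.hour] += 1

    deviations = {}
    for day, name in enumerate(WEEKDAY_NAMES):
        row = []
        for hour in range(24):
            if day_counts[(day, hour)]:
                day_mean = day_totals[(day, hour)] / day_counts[(day, hour)]
                average_weekday = hour_totals[hour] / hour_counts[hour]
                row.append(day_mean - average_weekday)  # negative = lighter than average
            else:
                row.append(None)
        deviations[name] = row
    return deviations


records = [
    (datetime(2024, 1, 1, 7), 4000),  # Monday
    (datetime(2024, 1, 2, 7), 5000),  # Tuesday
    (datetime(2024, 1, 3, 7), 6000),  # Wednesday
    (datetime(2024, 1, 6, 7), 100),   # Saturday, ignored
]
dev = weekday_deviations(records)
print(dev["Monday"][7], dev["Wednesday"][7])  # -1000.0 1000.0
```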