FlorianLD/bike_service_project

About

Data project about the public bike service traffic in New York.

Objective

This is a data project I made while learning the seaborn and matplotlib libraries.
The objective was to rely entirely on those two libraries to create the data visualizations.
See notebook here.

Tools

Material

The material is a CSV file with data from Citi Bike, a bike-share company in New York.
The data used for this project is from January 2023 (source).
The file contains columns with information such as:

  • ride IDs
  • station names
  • date and time for departures and arrivals
  • latitude/longitude coordinates

Context

Managing traffic flows is one of the main challenges of civil engineering, which is why data, and real-time data in particular, is essential to understanding traffic patterns.
In a metropolis like New York, road traffic can change heavily with weather conditions, national holidays, seasons, events, and public works.
Over the last decade, public bike services have grown popular as a means of commuting, changing the way road networks are designed.
By leveraging public bike service data, we can better understand how people use the service and what they expect from it. Analyzing this data is crucial to adapting the capacity and density of the bike service so that it suits users' needs and habits.

Data

Data was prepared to enable the creation of the visualizations below:

  • duration distribution
  • hour frame distribution
  • weekday distribution
  • usage/distance relation
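As a rough illustration of this preparation step, the sketch below derives the duration, hour, and weekday fields that the four visualizations rely on. The column names `started_at` and `ended_at` and the helper name `prepare_rides` are assumptions made for this example; the notebook may prepare the data differently.

```python
import pandas as pd


def prepare_rides(rides: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the rides table with duration, date, hour, and weekday columns.

    Assumes `started_at` and `ended_at` columns (hypothetical names).
    """
    out = rides.copy()
    # Parse the departure and arrival timestamps.
    out["started_at"] = pd.to_datetime(out["started_at"])
    out["ended_at"] = pd.to_datetime(out["ended_at"])
    # Ride duration in minutes.
    out["duration_min"] = (out["ended_at"] - out["started_at"]).dt.total_seconds() / 60
    # Departure date, hour of day (0-23), and weekday name.
    out["date"] = out["started_at"].dt.date
    out["hour"] = out["started_at"].dt.hour
    out["weekday"] = out["started_at"].dt.day_name()
    return out
```

A typical call would be `rides = prepare_rides(pd.read_csv(path))`, where `path` points to the downloaded CSV file.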

Duration distribution

Bar chart displaying the distribution of rides by duration categories.

Three categories were created:

  • 0 to 5 minutes, representing "short" rides
  • 5 to 10 minutes, representing "medium" rides
  • 15 minutes and over, representing "long" rides

This visualization gives a high-level view of usage habits and highlights the prevalence of medium-length rides.
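One possible way to build such categories is `pandas.cut`. In the sketch below, the function name `categorize_duration` and the default cut-offs (5 and 15 minutes) are assumptions for illustration, so they are exposed as parameters rather than fixed.

```python
import pandas as pd


def categorize_duration(duration_min: pd.Series,
                        short_max: float = 5,
                        medium_max: float = 15) -> pd.Series:
    """Label each ride duration (in minutes) as short, medium, or long.

    The cut-off defaults are illustrative assumptions.
    """
    # Left-closed bins: [0, short_max), [short_max, medium_max), [medium_max, inf).
    bins = [0, short_max, medium_max, float("inf")]
    labels = ["short", "medium", "long"]
    return pd.cut(duration_min, bins=bins, labels=labels, right=False)
```

The resulting categorical column can be passed to `seaborn.countplot` to draw the bar chart. Negative durations fall outside every bin and come back as missing values.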

Hour frame distribution

Line chart displaying the mean value of ride departures by hour frame.

The goal was to get insights into how traffic evolves throughout the day, showing which hour frames had the highest traffic and which had the lowest.
This line chart uses the aggregated values of the dataset to display the mean value for each hour frame. Additionally, a confidence interval is displayed around the line, showing the uncertainty of the estimated mean for each hour frame.
This type of visualization can be crucial to assess the capacity of each station throughout the day.
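A minimal sketch of this aggregation, assuming a `started_at` timestamp column (the column and function names are hypothetical): departures are first counted per date and hour, then `seaborn.lineplot` averages those counts across dates for each hour and, by default, draws a 95% confidence band around the mean.

```python
import pandas as pd
import seaborn as sns


def departures_per_hour(rides: pd.DataFrame) -> pd.DataFrame:
    """Count departures for every (date, hour) pair present in the data."""
    started = pd.to_datetime(rides["started_at"])
    return (
        rides.assign(date=started.dt.date, hour=started.dt.hour)
        .groupby(["date", "hour"])
        .size()
        .reset_index(name="departures")
    )


def plot_hourly_mean(counts: pd.DataFrame):
    """Line chart of mean departures per hour with seaborn's default confidence band."""
    ax = sns.lineplot(data=counts, x="hour", y="departures")
    ax.set(xlabel="Hour of day", ylabel="Mean departures")
    return ax
```

Note that an hour with no departures on a given date is simply absent from the counts rather than recorded as zero, which slightly raises the mean for quiet hours.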

Weekday distribution

Bar chart displaying the distribution of rides by day of the week, with a confidence interval showing an estimation range.
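A chart of this kind can be sketched as follows, assuming a `started_at` timestamp column (hypothetical names throughout): rides are counted per calendar day, and `seaborn.barplot` then shows the mean daily count for each weekday with its default 95% confidence interval as error bars.

```python
import pandas as pd
import seaborn as sns

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def rides_per_day(rides: pd.DataFrame) -> pd.DataFrame:
    """Count rides per calendar day and attach the weekday name."""
    started = pd.to_datetime(rides["started_at"])
    daily = (
        pd.DataFrame({"date": started.dt.normalize()})
        .groupby("date")
        .size()
        .reset_index(name="rides")
    )
    daily["weekday"] = daily["date"].dt.day_name()
    return daily


def plot_weekday_mean(daily: pd.DataFrame):
    """Bar chart of mean daily rides per weekday, Monday first."""
    ax = sns.barplot(data=daily, x="weekday", y="rides", order=WEEKDAYS)
    ax.set(xlabel="Day of the week", ylabel="Mean rides per day")
    return ax
```

With a single month of data, each weekday has only four or five daily counts, so the error bars are expected to be wide.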

Usage/distance relation

Scatter plot displaying the 10 ride routes with the most rides.
A ride route is defined as a ride starting from a station A and ending at a station B.
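Under this definition, the most used routes can be found by counting rides per (start station, end station) pair. The sketch below assumes the columns are named `start_station_name` and `end_station_name`; both names and the helper `top_routes` are assumptions for this example.

```python
import pandas as pd


def top_routes(rides: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n most frequent (start station, end station) pairs."""
    return (
        rides.groupby(["start_station_name", "end_station_name"])
        .size()              # number of rides per route
        .nlargest(n)         # keep the n busiest routes
        .reset_index(name="rides")
    )
```

Direction matters here: A to B and B to A are counted as two separate routes. Rides with a missing station name are dropped by `groupby` by default.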

Below is a dataframe with the 10 most used ride routes.

[Image: dataframe of the 10 most used ride routes]

Since the dataset only contains latitude and longitude coordinates, the distance was calculated using the Google Maps Routes API. This step retrieves the actual travel distance along the road network, which is more precise than a straight-line distance between the two sets of coordinates.
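For comparison, the straight-line (great-circle) distance between two stations can be computed directly from the coordinates with the haversine formula. This is not the Routes API approach used in the project, only a baseline sketch that shows what road-network distances improve on; the function name and the mean Earth radius constant are choices made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an approximation


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lng2 - lng1)
    # Haversine formula.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

Because bikes follow streets rather than straight lines, the road-network distance is generally longer than this value, and the gap varies from route to route.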

Once the distance values are retrieved, the data can be plotted on a scatter plot to display the 10 most popular routes.

When scaled to a complete year and to all the ride routes, this type of visualization can help us understand the overall usage habits of the bike service users. Leveraging this data would also be valuable to better manage the existing stations and estimate the best areas to target in the creation of new stations.
