About

A data project about traffic on New York's public bike-share service.

Objective

I built this project while learning the seaborn and matplotlib libraries.
The objective was to create all of the data visualizations using only those two libraries.
See the notebook here.

Tools

Material

The material is a CSV file of trip data from Citi Bike, the public bike-share service in New York.
The data covers January 2023 (source).
The file contains several columns, including:

ride IDs
station names
date and time for departures and arrivals
latitude/longitude coordinates
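The following is a minimal loading sketch using pandas. It assumes the file uses the column names of the public Citi Bike trip files: `ride_id`, `started_at`, `ended_at`, `start_station_name`, `end_station_name`, `start_lat`, `start_lng`, `end_lat` and `end_lng`. The key step is parsing the two timestamp columns as datetimes, because every later chart depends on them. The function name `load_trips` and the two sample rows are hypothetical. Check the header of the actual CSV before relying on these names.

```python
import io

import pandas as pd

# ASSUMPTION: timestamp column names as in the public Citi Bike trip files.
DATE_COLUMNS = ["started_at", "ended_at"]


def load_trips(source) -> pd.DataFrame:
    """Read a Citi Bike trip CSV (a path or a file-like object) with parsed timestamps."""
    return pd.read_csv(source, parse_dates=DATE_COLUMNS)


# Hypothetical two-row sample standing in for the January 2023 file.
SAMPLE_CSV = """ride_id,started_at,ended_at,start_station_name,end_station_name,start_lat,start_lng,end_lat,end_lng
r1,2023-01-02 08:05:00,2023-01-02 08:12:30,Station A,Station B,40.7500,-73.9900,40.7600,-73.9800
r2,2023-01-02 17:40:00,2023-01-02 17:44:00,Station B,Station A,40.7600,-73.9800,40.7500,-73.9900
"""

trips = load_trips(io.StringIO(SAMPLE_CSV))
print(trips[["ride_id", "started_at", "start_station_name"]])
```

With the real file, pass its path to `load_trips` instead of the in-memory sample.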

Context

Managing traffic flows is one of the main challenges of civil engineering, which is why data, and especially real-time data, is essential for understanding traffic patterns.
In a metropolis like New York, road traffic varies significantly with weather conditions, public holidays, seasons, events and public works.
Over the last decade, public bike services have become a popular means of commuting, changing the way the road network is designed.
By leveraging public bike service data, we can better understand how people use the service and what they expect from it. Analyzing this data is crucial for adapting the capacity and density of the bike network to users' needs and habits.

Data

The data was prepared to produce the following visualizations:

duration distribution
hour frame distribution
weekday distribution
usage/distance relation

Duration distribution

Bar chart displaying the distribution of rides by duration categories.

Three categories were created:

0 to 5 minutes, representing "short" rides
5 to 10 minutes, representing "medium" rides
15 minutes and over, representing "long" rides

This visualization gives a high-level view of usage habits and highlights the prevalence of medium-length rides.
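The categories can be derived from the timestamps with `pd.cut`, and the bar chart can then be drawn with seaborn's `countplot`. This sketch assumes that `started_at` and `ended_at` are already parsed as datetimes. The helper names are hypothetical. The bin edges are also an assumption. As listed, the medium and long categories leave 10 to 15 minutes unassigned, so this version uses contiguous bins with a single medium/long boundary at 15 minutes. Adjust `DURATION_BINS` to match the notebook.

```python
import pandas as pd

# ASSUMPTION: contiguous, left-closed bins in minutes: [0, 5), [5, 15), [15, inf).
DURATION_BINS = [0, 5, 15, float("inf")]
DURATION_LABELS = ["short", "medium", "long"]


def add_duration_category(trips: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `trips` with `duration_min` and `duration_category` columns."""
    out = trips.copy()
    out["duration_min"] = (out["ended_at"] - out["started_at"]).dt.total_seconds() / 60
    # right=False makes each bin include its lower edge, so a 5-minute ride is "medium".
    out["duration_category"] = pd.cut(
        out["duration_min"], bins=DURATION_BINS, labels=DURATION_LABELS, right=False
    )
    return out


def plot_duration_distribution(trips: pd.DataFrame) -> None:
    """Bar chart of ride counts per duration category."""
    # Plotting libraries are imported here so the data preparation runs without them.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.countplot(data=trips, x="duration_category", order=DURATION_LABELS)
    ax.set(xlabel="Ride duration", ylabel="Number of rides")
    plt.show()


# Hypothetical sample: a 3-, a 7- and a 20-minute ride.
rides = pd.DataFrame({
    "started_at": pd.to_datetime(["2023-01-02 08:00", "2023-01-02 09:00", "2023-01-02 10:00"]),
    "ended_at": pd.to_datetime(["2023-01-02 08:03", "2023-01-02 09:07", "2023-01-02 10:20"]),
})
print(add_duration_category(rides))
```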

Hour frame distribution

Line chart displaying the mean number of ride departures per hour frame.

The goal was to see how traffic evolves throughout the day and which hour frames have the highest and lowest traffic.
This line chart aggregates the dataset to display the mean number of departures for each hour frame. Additionally, a confidence interval is drawn around the line, showing the uncertainty of each mean estimate.
This type of visualization can be crucial for assessing the capacity each station needs throughout the day.
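One way to get a mean per hour frame with a confidence interval works as follows. First, count departures for every (date, hour) pair. Then let seaborn's `lineplot` average those counts across days. By default, `lineplot` draws a bootstrapped 95% confidence interval around each mean. This sketch assumes a parsed `started_at` column. It is not necessarily how the notebook aggregates the data, and the function names are hypothetical.

```python
import pandas as pd


def departures_per_day_and_hour(trips: pd.DataFrame) -> pd.DataFrame:
    """Long table with one row per (date, hour) and the number of departures in it."""
    starts = trips["started_at"]
    # Cross-tabulate dates against hours; hours with no departures on a date become 0.
    counts = pd.crosstab(starts.dt.date.rename("date"), starts.dt.hour.rename("hour"))
    counts = counts.reindex(columns=range(24), fill_value=0)
    long = counts.reset_index().melt(id_vars="date", var_name="hour", value_name="departures")
    return long.astype({"hour": int})


def plot_hour_frame(per_hour: pd.DataFrame) -> None:
    """Mean departures per hour across days, with seaborn's default 95% confidence band."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.lineplot(data=per_hour, x="hour", y="departures")
    ax.set(xlabel="Hour of departure", ylabel="Mean departures per day")
    plt.show()


# Hypothetical sample: two departures at 8h on Jan 2, one at 8h and one at 17h on Jan 3.
sample = pd.DataFrame({"started_at": pd.to_datetime([
    "2023-01-02 08:05", "2023-01-02 08:40", "2023-01-03 08:10", "2023-01-03 17:00",
])})
per_hour = departures_per_day_and_hour(sample)
print(per_hour.groupby("hour")["departures"].mean().loc[[8, 17]])
```

Filling empty hours with zero matters. Without it, a quiet hour would be averaged only over the days that happened to have rides, which would overstate its mean.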

Weekday distribution

Bar chart displaying the distribution of rides by day of the week, with confidence intervals showing the uncertainty of each estimate.
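This sketch assumes that each bar shows the mean number of rides per calendar day for that weekday, which is what makes a confidence interval meaningful. Under that assumption, the rides can be counted per date and passed to seaborn's `barplot`. Its default estimator is the mean, drawn with a 95% confidence interval. The function names and the parsed `started_at` column are assumptions.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def rides_per_date(trips: pd.DataFrame) -> pd.DataFrame:
    """One row per calendar date with its ride count and weekday name."""
    dates = trips["started_at"].dt.normalize().rename("date")
    per_date = trips.groupby(dates).size().reset_index(name="rides")
    per_date["weekday"] = per_date["date"].dt.day_name()
    return per_date


def plot_weekday_distribution(per_date: pd.DataFrame) -> None:
    """Mean rides per date for each weekday, with seaborn's default 95% confidence interval."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.barplot(data=per_date, x="weekday", y="rides", order=WEEKDAYS)
    ax.set(xlabel="Day of the week", ylabel="Mean rides per day")
    plt.show()


# Hypothetical sample covering two Mondays and one Tuesday of January 2023.
sample = pd.DataFrame({"started_at": pd.to_datetime([
    "2023-01-02 08:00", "2023-01-02 18:00", "2023-01-03 09:00", "2023-01-09 07:30",
])})
per_date = rides_per_date(sample)
print(per_date)
```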

Usage/distance relation

Scatter plot displaying the 10 ride routes with the most rides.
A ride route is defined as a ride starting from a station A and ending at a station B.
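Counting rides per (start station, end station) pair gives the route ranking. This sketch assumes the Citi Bike column names `ride_id`, `start_station_name`, `end_station_name`, `start_lat`, `start_lng`, `end_lat` and `end_lng`. It also keeps median endpoint coordinates for each route so they can be sent to a routing service later. `top_routes` is a hypothetical helper.

```python
import pandas as pd

ROUTE_KEYS = ["start_station_name", "end_station_name"]


def top_routes(trips: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """The `n` (start, end) station pairs with the most rides, plus representative coordinates."""
    routes = trips.groupby(ROUTE_KEYS).agg(
        # Every ride has an ID, so counting IDs counts rides.
        rides=("ride_id", "count"),
        # Median coordinates, in case recorded positions vary slightly from ride to ride.
        start_lat=("start_lat", "median"),
        start_lng=("start_lng", "median"),
        end_lat=("end_lat", "median"),
        end_lng=("end_lng", "median"),
    )
    return routes.nlargest(n, "rides").reset_index()


# Hypothetical sample: A -> B three times, B -> A twice, C -> A once.
sample = pd.DataFrame({
    "ride_id": [f"r{i}" for i in range(6)],
    "start_station_name": ["A", "A", "A", "B", "B", "C"],
    "end_station_name": ["B", "B", "B", "A", "A", "A"],
    "start_lat": [40.75] * 6, "start_lng": [-73.99] * 6,
    "end_lat": [40.76] * 6, "end_lng": [-73.98] * 6,
})
print(top_routes(sample, n=2))
```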

Below is a dataframe with the 10 most used ride routes.

Since the dataset only contains latitude and longitude coordinates, the distance was calculated using the Google Maps Routes API. This step retrieves the actual travel distance along the road network, which is more precise than the straight-line distance between two coordinates.
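Below is a minimal sketch of such a request using only the Python standard library. It assumes the Routes API `computeRoutes` REST endpoint with key-based authentication. It also assumes a field mask that requests only `routes.distanceMeters`, and bicycle travel mode, since the README does not say which mode was used. The helper names are hypothetical, and the API key must come from your own Google Cloud project.

```python
import json
import urllib.request

ROUTES_URL = "https://routes.googleapis.com/directions/v2:computeRoutes"


def build_route_request(start, end, api_key, travel_mode="BICYCLE"):
    """Build a POST request asking for the route between two (lat, lng) tuples."""

    def waypoint(lat, lng):
        return {"location": {"latLng": {"latitude": lat, "longitude": lng}}}

    body = {
        "origin": waypoint(*start),
        "destination": waypoint(*end),
        # ASSUMPTION: bicycle routing; the notebook may have used another mode.
        "travelMode": travel_mode,
    }
    headers = {
        "Content-Type": "application/json",
        "X-Goog-Api-Key": api_key,
        # The field mask limits the response to the one field needed here.
        "X-Goog-FieldMask": "routes.distanceMeters",
    }
    return urllib.request.Request(
        ROUTES_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def route_distance_m(start, end, api_key):
    """Send the request and return the distance of the first route, in meters."""
    request = build_route_request(start, end, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # Raises KeyError/IndexError if the API found no route between the two points.
    return payload["routes"][0]["distanceMeters"]
```

Each of the top routes needs one call, for example `route_distance_m((start_lat, start_lng), (end_lat, end_lng), api_key)`. Only ten routes are involved, so sequential calls are enough.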

Once the distances are retrieved, the 10 most popular routes can be displayed on a scatter plot.
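Once each route has a ride count and a distance, the scatter plot itself is a single seaborn call. This sketch assumes a routes dataframe with `start_station_name`, `end_station_name` and `rides` columns. It also assumes a list of distances in meters in the same row order. The helper names are hypothetical.

```python
import pandas as pd


def add_distance_and_labels(routes: pd.DataFrame, distances_m) -> pd.DataFrame:
    """Attach road distances (converted to km) and a readable route label."""
    out = routes.copy()
    out["distance_km"] = [d / 1000 for d in distances_m]
    out["route"] = out["start_station_name"] + " -> " + out["end_station_name"]
    return out


def plot_usage_vs_distance(route_df: pd.DataFrame) -> None:
    """One point per route: distance on x, number of rides on y."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.scatterplot(data=route_df, x="distance_km", y="rides", hue="route")
    ax.set(xlabel="Road distance (km)", ylabel="Number of rides")
    plt.show()


# Hypothetical two-route sample with distances in meters.
routes = pd.DataFrame({
    "start_station_name": ["A", "B"],
    "end_station_name": ["B", "A"],
    "rides": [3, 2],
})
plot_df = add_distance_and_labels(routes, [1200, 850])
print(plot_df)
```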

When scaled to a complete year and to all ride routes, this type of visualization can help us understand the overall usage habits of bike service users. This data would also be valuable for managing existing stations and identifying the best areas for new stations.

FlorianLD/bike_service_project