Package with useful functions to create geo-spatial visualizations from a GTFS.

zehbrandao/gtfs_functions

 
 


GTFS functions

This package lets you create various layers directly from a GTFS feed and visualize the results as simply as possible. It is still in its testing phase.

Installation

!pip install gtfs_functions
import gtfs_functions as gtfs

GTFS Import

The import_gtfs function takes the path to the GTFS feed (for example, a zip file) as its argument and returns five dataframes/geodataframes: routes, stops, stop_times, trips and shapes.

routes, stops, stop_times, trips, shapes = gtfs.import_gtfs(r"C:\Users\santi\Desktop\Articles\SFMTA_GTFS.zip")
routes.head(2)
route_id agency_id route_short_name route_long_name route_desc route_type route_url route_color route_text_color
0 15761 SFMTA 1 CALIFORNIA 3 https://SFMTA.com/1
1 15766 SFMTA 5 FULTON 3 https://SFMTA.com/5
stops.head(2)
stop_id stop_code stop_name stop_desc zone_id stop_url geometry
0 390 10390 19th Avenue & Holloway St POINT (-122.47510 37.72119)
1 3016 13016 3rd St & 4th St POINT (-122.38979 37.77262)
stop_times.head(2)
trip_id arrival_time departure_time stop_id stop_sequence stop_headsign pickup_type drop_off_type shape_dist_traveled route_id service_id direction_id shape_id stop_code stop_name stop_desc zone_id stop_url geometry
0 9413147 81840.0 81840.0 4015 1 NaN NaN 15761 1 0 179928 14015 Clay St & Drumm St POINT (-122.39682 37.79544)
1 9413147 81902.0 81902.0 6294 2 NaN NaN 15761 1 0 179928 16294 Sacramento St & Davis St POINT (-122.39761 37.79450)
trips.head(2)
trip_id route_id service_id direction_id shape_id
0 9547346 15804 1 0 180140
1 9547345 15804 1 0 180140
shapes.head(2)
shape_id geometry
0 179928 LINESTRING (-122.39697 37.79544, -122.39678 37...
1 179929 LINESTRING (-122.39697 37.79544, -122.39678 37...
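
In the sample output above, arrival_time and departure_time appear as plain numbers (81840.0, 81902.0). They look like seconds after midnight, although that is an inference from the values rather than something stated here. Under that assumption, a small hypothetical helper (not part of gtfs_functions) can turn them back into clock times:

```python
def seconds_to_hhmmss(seconds):
    """Format seconds after midnight as HH:MM:SS.

    Hypothetical helper, not part of gtfs_functions. Assumes the value is
    seconds after midnight. GTFS times may exceed 24:00:00 for trips that
    run past midnight, so hours are deliberately not wrapped.
    """
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# The two arrival_time values shown in the stop_times sample
first_arrival = seconds_to_hhmmss(81840.0)
second_arrival = seconds_to_hhmmss(81902.0)
print(first_arrival, second_arrival)
```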

Stop frequencies

This function creates a geodataframe with the frequency for each combination of stop, time of day and direction. Each row has a Point geometry. The stops_freq function takes the stop_times and stops created in the previous step as arguments. You can optionally pass cutoffs as a list if the defaults do not suit you. The cutoffs are the hours of the day that bound the aggregation windows: the list below, for example, produces windows such as 0:00-6:00 and 6:00-9:00.

cutoffs = [0,6,9,15.5,19,22,24]
stop_freq = gtfs.stops_freq(stop_times, stops, cutoffs = cutoffs)
stop_freq.head(2)
stop_id dir_id window ntrips frequency max_trips max_freq stop_name geometry
8157 5763 Inbound 0:00-6:00 1 360 5 12 Noriega St & 48th Ave POINT (-122.50785 37.75293)
13102 7982 Outbound 0:00-6:00 1 360 3 20 Moscow St & RussiaAvet POINT (-122.42996 37.71804)
9539 6113 Inbound 0:00-6:00 1 360 5 12 Portola Dr & Laguna Honda Blvd POINT (-122.45526 37.74310)
12654 7719 Inbound 0:00-6:00 1 360 5 12 Middle Point & Acacia POINT (-122.37952 37.73707)
9553 6116 Inbound 0:00-6:00 1 360 5 12 Portola Dr & San Pablo Ave POINT (-122.46107 37.74040)
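
The sample rows suggest that frequency is the average headway in minutes, that is, the window length divided by ntrips (one trip in the six-hour window 0:00-6:00 gives 360). The sketch below reproduces that arithmetic with two hypothetical helpers. It is an inference from the output, not the package's actual code. The label format for fractional hours such as 15.5 is also an assumption, and max_trips and max_freq are not covered.

```python
def window_labels(cutoffs):
    """Turn hour cutoffs such as [0, 6, 9] into labels like '0:00-6:00'."""
    def fmt(hour):
        whole = int(hour)
        minutes = int(round((hour - whole) * 60))
        return f"{whole}:{minutes:02d}"

    return [f"{fmt(a)}-{fmt(b)}" for a, b in zip(cutoffs[:-1], cutoffs[1:])]


def headway_minutes(ntrips, start_hour, end_hour):
    """Average minutes between trips within a window (assumed definition)."""
    return round((end_hour - start_hour) * 60 / ntrips)


labels = window_labels([0, 6, 9, 15.5, 19, 22, 24])
print(labels)
print(headway_minutes(1, 0, 6))  # one trip between 0:00 and 6:00
```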

Line frequencies

This function creates a geodataframe with the frequency for each combination of line, time of day and direction. Each row has a LineString geometry. The lines_freq function takes the stop_times, trips, shapes and routes created in the previous steps as arguments. You can optionally pass cutoffs as a list if the defaults do not suit you. The cutoffs are the hours of the day that bound the aggregation windows.

cutoffs = [0,6,9,15.5,19,22,24]
line_freq = gtfs.lines_freq(stop_times, trips, shapes, routes, cutoffs = cutoffs)
line_freq.head()
route_id route_name dir_id window frequency ntrips max_freq max_trips geometry
376 15808 44 O'SHAUGHNESSY Inbound 0:00-6:00 360 1 12 5 LINESTRING (-122.46459 37.78500, -122.46352 37...
378 15808 44 O'SHAUGHNESSY Inbound 0:00-6:00 360 1 12 5 LINESTRING (-122.43416 37.73355, -122.43299 37...
242 15787 25 TREASURE ISLAND Inbound 0:00-6:00 360 1 15 4 LINESTRING (-122.39611 37.79013, -122.39603 37...
451 15814 54 FELTON Inbound 0:00-6:00 360 1 20 3 LINESTRING (-122.38845 37.73994, -122.38844 37...
241 15787 25 TREASURE ISLAND Inbound 0:00-6:00 360 1 15 4 LINESTRING (-122.39542 37.78978, -122.39563 37...

Bus segments

The function cut_gtfs takes stop_times, stops, and shapes created by import_gtfs as arguments and returns a geodataframe where each segment is a row and has a LineString geometry.

segments_gdf = gtfs.cut_gtfs(stop_times, stops, shapes)
segments_gdf.head(2)
route_id direction_id stop_sequence start_stop_name end_stop_name start_stop_id end_stop_id segment_id shape_id geometry distance_m
0 15761 0 1 Clay St & Drumm St Sacramento St & Davis St 4015 6294 4015-6294 179928 LINESTRING (-122.39697 37.79544, -122.39678 37... 205.281653
1 15761 0 2 Sacramento St & Davis St Sacramento St & Battery St 6294 6290 6294-6290 179928 LINESTRING (-122.39761 37.79446, -122.39781 37... 238.047505
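
Judging by the sample rows, each segment_id joins the start and end stop ids of two consecutive stops with a hyphen. The snippet below is a minimal sketch of that naming scheme only. The helper is hypothetical and it does not cut any geometry, which is the part cut_gtfs does for you.

```python
def segment_ids(stop_ids):
    """Pair consecutive stops of one trip into 'start-end' segment ids.

    Hypothetical helper illustrating the naming seen in the sample output.
    """
    return [f"{start}-{end}" for start, end in zip(stop_ids, stop_ids[1:])]


# First three stops of route 15761, direction 0, taken from the rows above
ids = segment_ids([4015, 6294, 6290])
print(ids)
```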

Scheduled Speeds

This function creates a geodataframe with speed_kmh and speed_mph for each combination of line, segment, time of day and direction. Each row has a LineString geometry. The speeds_from_gtfs function takes the routes, stop_times and segments_gdf created in the previous steps as arguments. You can optionally pass cutoffs as a list if the defaults do not suit you. The cutoffs are the hours of the day that bound the aggregation windows.

# Cutoffs to get hourly values
cutoffs = list(range(24))
speeds = gtfs.speeds_from_gtfs(routes, stop_times, segments_gdf, cutoffs = cutoffs)
speeds.head(1)
route_id route_name dir_id segment_id window speed_kmh s_st_id s_st_name e_st_id e_st_name distance_m stop_seq runtime_h max_kmh geometry speed_mph max_mph
0 15761 1 CALIFORNIA Inbound 4015-6294 10:00-11:00 12.0 4015 Clay St & Drumm St 6294 Sacramento St & Davis St 205.281653 1 0.017222 12.0 LINESTRING (-122.39697 37.79544, -122.39678 37... 7.456452 7.456452
speeds.loc[(speeds.segment_id=='3114-3144')&(speeds.window=='0:00-6:00')]
route_id route_name dir_id segment_id window speed_kmh s_st_id s_st_name e_st_id e_st_name distance_m stop_seq runtime_h max_kmh geometry speed_mph max_mph
11183 15792 30 STOCKTON Inbound 3114-3144 0:00-6:00 12.8 3114 3rd St & Brannan St 3144 3rd St & Bryant St 373.952483 2 0.028565 13.0 LINESTRING (-122.39323 37.77923, -122.39431 37... 7.953549 8.077823
16862 15809 45 UNION-STOCKTON Inbound 3114-3144 0:00-6:00 13.0 3114 3rd St & Brannan St 3144 3rd St & Bryant St 373.952483 2 0.027778 13.0 LINESTRING (-122.39323 37.77923, -122.39431 37... 8.077823 8.077823
19889 15831 91 3RD-19TH AVE OWL Outbound 3114-3144 0:00-6:00 17.0 3114 3rd St & Brannan St 3144 3rd St & Bryant St 373.952483 56 0.021667 17.0 LINESTRING (-122.39323 37.77923, -122.39431 37... 10.563307 10.563307
22823 ALL_LINES All lines NA 3114-3144 0:00-6:00 15.2 3114 3rd St & Brannan St 3144 3rd St & Bryant St 373.952483 2 0.024511 NaN LINESTRING (-122.39323 37.77923, -122.39431 37... 9.444839 NaN
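
The columns in these rows are related by simple arithmetic. In every sample row speed_mph equals speed_kmh multiplied by 0.621371, and in the first row distance_m divided by runtime_h gives roughly the reported speed_kmh. The sketch below checks both against that row. The helper names are hypothetical, and the exact rounding and averaging the package applies is not shown here, so treat the distance and runtime relation as approximate.

```python
KMH_TO_MPH = 0.621371  # conversion factor consistent with the sample rows


def approx_speed_kmh(distance_m, runtime_h):
    """Rough scheduled speed: segment length over scheduled runtime."""
    return (distance_m / 1000) / runtime_h


def kmh_to_mph(speed_kmh):
    """Convert km/h to mph."""
    return speed_kmh * KMH_TO_MPH


# Segment 4015-6294 of route 1 CALIFORNIA, window 10:00-11:00
raw_kmh = approx_speed_kmh(205.281653, 0.017222)
mph = kmh_to_mph(12.0)
print(round(raw_kmh, 2), round(mph, 6))
```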

Segment frequencies

cutoffs = [0,6,9,15.5,19,22,24]
seg_freq = gtfs.segments_freq(segments_gdf, stop_times, routes, cutoffs = cutoffs)
seg_freq.head(2)
route_id route_name dir_id segment_id window frequency ntrips s_st_id s_st_name e_st_name max_freq max_trips geometry
23191 ALL_LINES All lines NA 3628-3622 0:00-6:00 360 1 3628 Alemany Blvd & St Charles Ave Alemany Blvd & Arch St 20 18 LINESTRING (-122.46949 37.71045, -122.46941 37...
6160 15787 25 TREASURE ISLAND Inbound 7948-8017 0:00-6:00 360 1 7948 Transit Center Bay 29 Shoreline Access Road 15 4 LINESTRING (-122.39611 37.79013, -122.39603 37...
seg_freq.loc[(seg_freq.segment_id=='3114-3144')&(seg_freq.window=='0:00-6:00')]
route_id route_name dir_id segment_id window frequency ntrips s_st_id s_st_name e_st_name max_freq max_trips geometry
10566 15809 45 UNION-STOCKTON Inbound 3114-3144 0:00-6:00 120 3 3114 3rd St & Brannan St 3rd St & Bryant St 12 5 LINESTRING (-122.39323 37.77923, -122.39431 37...
7604 15792 30 STOCKTON Inbound 3114-3144 0:00-6:00 60 6 3114 3rd St & Brannan St 3rd St & Bryant St 12 5 LINESTRING (-122.39323 37.77923, -122.39431 37...
13209 15831 91 3RD-19TH AVE OWL Outbound 3114-3144 0:00-6:00 30 12 3114 3rd St & Brannan St 3rd St & Bryant St 30 2 LINESTRING (-122.39323 37.77923, -122.39431 37...
16580 ALL_LINES All lines NA 3114-3144 0:00-6:00 17 21 3114 3rd St & Brannan St 3rd St & Bryant St 5 59 LINESTRING (-122.39323 37.77923, -122.39431 37...
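
The ALL_LINES row appears to pool every route that shares the segment: its ntrips (21) is the sum of the three routes' trips (3 + 6 + 12), and its frequency (17) matches the six-hour window divided by that total. A minimal sketch of that reading follows. It is an interpretation of the sample output, not the package's implementation.

```python
# ntrips per route on segment 3114-3144 in the 0:00-6:00 window (rows above)
trips_per_route = {
    "45 UNION-STOCKTON": 3,
    "30 STOCKTON": 6,
    "91 3RD-19TH AVE OWL": 12,
}

window_minutes = 6 * 60  # length of the 0:00-6:00 window

# Pool the trips of every route, then compute the combined headway
total_trips = sum(trips_per_route.values())
combined_headway = round(window_minutes / total_trips)

print(total_trips, combined_headway)
```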

Save files

file_name = 'stop_frequencies'
# stop_freq is the geodataframe created in the "Stop frequencies" step
gtfs.save_gdf(stop_freq, file_name, shapefile=True, geojson=True)

Map your work

Stop frequencies

# Stops
condition_dir = stop_freq.dir_id == 'Inbound'
condition_window = stop_freq.window == '6:00-9:00'

gdf = stop_freq.loc[(condition_dir & condition_window),:].reset_index()

gtfs.map_gdf(gdf = gdf, 
              variable = 'ntrips', 
              colors = ["#d13870", "#e895b3" ,'#55d992', '#3ab071', '#0e8955','#066a40'], 
              tooltip_var = ['frequency'] , 
              tooltip_labels = ['Frequency: '], 
              breaks = [10, 20, 30, 40, 120, 200])


Line frequencies

# Line frequencies
condition_dir = line_freq.dir_id == 'Inbound'
condition_window = line_freq.window == '6:00-9:00'

gdf = line_freq.loc[(condition_dir & condition_window),:].reset_index()

gtfs.map_gdf(gdf = gdf, 
              variable = 'ntrips', 
              colors = ["#d13870", "#e895b3" ,'#55d992', '#3ab071', '#0e8955','#066a40'], 
              tooltip_var = ['route_name'] , 
              tooltip_labels = ['Route: '], 
              breaks = [5, 10, 20, 50])


Speeds

If you want to visualize data at the segment level for all lines, I recommend something more powerful, such as kepler.gl (AKA my favorite data viz library). For example, to check the scheduled speeds per segment:

# Speeds
import keplergl as kp
# KeplerGl expects a dict that maps each dataset name to its data
m = kp.KeplerGl(data={'Speed Lines': speeds}, height=400)
m


Segment frequencies

# Segment frequencies
import keplergl as kp
# KeplerGl expects a dict that maps each dataset name to its data
m = kp.KeplerGl(data={'Segment frequency': seg_freq}, height=400)
m


Other plots

Histogram

# Histogram
import plotly.express as px
px.histogram(
    stop_freq.loc[stop_freq.frequency<50], 
    x='frequency', 
    title='Stop frequencies',
    template='simple_white', 
    nbins =20)


Heatmap

# Heatmap
import plotly.express as px  # provides the color scale used below
import plotly.graph_objects as go
dir_0 = speeds.loc[(speeds.dir_id=='Inbound')&(speeds.route_name=='1 CALIFORNIA')].sort_values(by='stop_seq') 
dir_0['hour'] = dir_0.window.apply(lambda x: int(x.split(':')[0]))
dir_0.sort_values(by='hour', ascending=True, inplace=True)

fig = go.Figure(data=go.Heatmap(
                   z=dir_0.speed_kmh,
                   y=dir_0.s_st_name,
                   x=dir_0.window,
                   hoverongaps = False,
                   colorscale=px.colors.colorbrewer.RdYlBu, 
                   reversescale=False
))

fig.update_yaxes(title_text='Stop', autorange='reversed')
fig.update_xaxes(title_text='Hour of day', side='top')
fig.update_layout(showlegend=False, height=600, width=1000,
                 title='Speed heatmap per direction and hour of the day')

fig.show()


Line chart

by_hour = speeds.pivot_table('speed_kmh', index = ['window'], aggfunc = ['mean','std'] ).reset_index()
by_hour.columns = ['_'.join(col).strip() for col in by_hour.columns.values]
by_hour['hour'] = by_hour.window_.apply(lambda x: int(x.split(':')[0]))
by_hour.sort_values(by='hour', ascending=True, inplace=True)

# Line chart of the mean speed per window
fig = px.line(by_hour, 
           x='window_', 
           y='mean_speed_kmh', 
           template='simple_white', 
           #error_y = 'std_speed_kmh'
                )

fig.update_yaxes(rangemode='tozero')

fig.show()


Fancy line chart

# Line graphs
import plotly.graph_objects as go
example2 = speeds.loc[(speeds.s_st_name=='Fillmore St & Bay St')&(speeds.route_name=='All lines')].sort_values(by='stop_seq') 
example2['hour'] = example2.window.apply(lambda x: int(x.split(':')[0]))
example2.sort_values(by='hour', ascending=True, inplace=True)

fig = go.Figure()

trace = go.Scatter(
    name='Speed',
    x=example2.hour, 
    y=example2.speed_kmh,
    mode='lines',
    line=dict(color='rgb(31, 119, 180)'),
    fillcolor='#F0F0F0',
    fill='tonexty',
    opacity = 0.5)


data = [trace]

layout = go.Layout(
    yaxis=dict(title='Average Speed (km/h)'),
    xaxis=dict(title='Hour of day'),
    title='Average Speed by hour of day in stop Fillmore St & Bay St',
    showlegend = False, template = 'simple_white')

fig = go.Figure(data=data, layout=layout)

# Get the labels in the X axis right
axes_labels = [] 
tickvals=example2.hour.unique()[::3][1:]

for i in range(0, len(tickvals)):
    label = str(tickvals[i]) + ':00'
    axes_labels.append(label)

fig.update_xaxes(
    ticktext=axes_labels,
    tickvals=tickvals
)

# Add vertical lines
y_max_value = example2.speed_kmh.max()

for i in range(0, len(tickvals)):
    fig.add_shape(
        # Line Vertical
        dict(
            type="line",
            x0=tickvals[i],
            y0=0,
            x1=tickvals[i],
            y1=y_max_value,
            line=dict(
                color="Grey",
                width=1
            )
        )
    )
    
# Labels in the edge values
for i in range(0, len(tickvals)):    
    y_value = example2.loc[example2.hour==tickvals[i], 'speed_kmh'].values[0].round(2)
    fig.add_annotation(
        x=tickvals[i],
        y=y_value,
        text=str(y_value),
    )
fig.update_annotations(dict(
            xref="x",
            yref="y",
            showarrow=True,
            arrowhead=0,
            ax=0,
            ay=-18
))

fig.update_yaxes(rangemode='tozero')

fig.show()

