Acquiring, cleaning and combining various data sources for analysis of urban life

colmanhumphrey/urbananalytics

The purpose of this (first) project is twofold. First, to outline a method for acquiring, cleaning and reshaping
data from many sources about many facets of a given city. Second,
to perform some initial analyses.

----------------

The Data:

Here, we acquire and clean:
- Demographic Data
    - census, full collection
      -- gives population by race at the block level
    - data that is re-collected routinely
      -- gives e.g. income and poverty metrics at the blockgroup level
- Crime Data
    - at the very least, should give location, time and category of each crime
      -- care is needed with e.g. car thefts: is the recorded location where the theft occurred or where the car was recovered?
- Landuse Data
    - gives use of land by lot; generally many lots fit in a block, but in a few cases
      a lot extends over multiple blocks / blockgroups
- Business Data
    - data from Google, Yelp, Foursquare
    - the hard part is putting the three sources together, especially reconciling their categories
    - downloading via the APIs isn't easy either, but it is at least straightforward
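
The demographic sources above sit at two levels (block and blockgroup), so block-level counts usually have to be rolled up before they can be joined to blockgroup-level metrics. A minimal sketch of that roll-up, in Python for illustration only (the scripts in this repository are R). It assumes blocks are identified by the standard 15-digit census GEOID, whose first 12 digits identify the blockgroup; the function names and the counts are made up and are not taken from this repository.

```python
from collections import defaultdict


def blockgroup_id(block_geoid):
    """Return the 12-digit blockgroup GEOID for a 15-digit block GEOID.

    Layout: state (2) + county (3) + tract (6) + block (4); the first digit
    of the block number is the blockgroup.
    """
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:12]


def aggregate_to_blockgroup(block_counts):
    """Sum per-block counts (GEOID -> {group: count}) within each blockgroup."""
    totals = defaultdict(lambda: defaultdict(int))
    for geoid, counts in block_counts.items():
        bg = blockgroup_id(geoid)
        for group, n in counts.items():
            totals[bg][group] += n
    return {bg: dict(counts) for bg, counts in totals.items()}


# Made-up counts, for illustration only.
blocks = {
    "421010001001000": {"white": 10, "black": 5},
    "421010001001001": {"white": 3, "black": 7},
    "421010001002000": {"white": 1, "black": 1},
}
by_blockgroup = aggregate_to_blockgroup(blocks)
```

Keeping the IDs as strings avoids losing leading zeros in states whose code starts with 0.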

Not used in these analyses, but useful later:
- Street Data
    - mainly for intersections
- School Data
    - size, location, years taught
- Transit Routes / Stops
- Property Data
    - age, height, value etc.; many covariates

Collecting this data is not easy, and it then needs to be cleaned. Even after cleaning,
data for another city may be formatted too differently from ours for our analyses to apply
directly, but hopefully they can be adjusted without huge changes.

----------------

The cleaning etc:

First, we run setup_main.R, within code/get_clean_data. It has its own readme,
but basically it reads in census data from the API (you just supply your key),
and reads landuse and crime data from external files. The cleaning functions for the
latter two assume certain file characteristics, but it should be clear enough what you
would need to change if your files differ.
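
To illustrate what "assuming certain file characteristics" can look like for a crime file, here is a minimal sketch in Python (illustrative only; the actual cleaning functions are the R code in code/get_clean_data). It keeps only records with a usable location, time and category. The column names, the time format and the function name are hypothetical and would need to be changed to match your file.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; adjust these to match your own crime file.
REQUIRED = ("latitude", "longitude", "datetime", "category")


def clean_crime_rows(handle, time_format="%Y-%m-%d %H:%M:%S"):
    """Read crime records from CSV, keeping rows with location, time and category."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"crime file lacks required columns: {missing}")
    cleaned = []
    for row in reader:
        try:
            record = {
                "lat": float(row["latitude"]),
                "lon": float(row["longitude"]),
                "time": datetime.strptime(row["datetime"].strip(), time_format),
                "category": row["category"].strip().lower(),
            }
        except (ValueError, TypeError, AttributeError):
            continue  # skip rows with missing or unparseable fields
        if record["category"]:
            cleaned.append(record)
    return cleaned


# Made-up rows: one valid, one with no latitude, one with a bad timestamp.
sample = io.StringIO(
    "latitude,longitude,datetime,category\n"
    "39.95,-75.16,2015-03-01 22:15:00,Theft\n"
    ",-75.20,2015-03-02 01:00:00,Assault\n"
    "39.99,-75.12,not a time,Burglary\n"
)
crimes = clean_crime_rows(sample)
```

Failing loudly on missing columns, while silently skipping individual bad rows, is one possible design choice; you may prefer to count or log the skipped rows.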

Next, within code/get_business_data, we don't include the code that calls the APIs
of the three services (Google, Yelp, Foursquare), and we further assume you can
adjust the raw JSON yourself. We do include the code that combines the three sources
and removes duplicates.
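
As a rough idea of what combining and de-duplicating can involve, here is a minimal sketch in Python. It is not the method used in code/get_business_data (which is R); it assumes that two listings are the same business when their names match after normalization and they lie within a small distance of each other. The record fields, the function names and the 50 m threshold are all assumptions made for illustration.

```python
import math
import re


def normalize_name(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def combine_sources(records, max_dist_m=50.0):
    """Merge listings with the same normalized name within max_dist_m metres."""
    kept = []
    for rec in records:
        key = normalize_name(rec["name"])
        match = None
        for k in kept:
            same_name = normalize_name(k["name"]) == key
            if same_name and distance_m(k["lat"], k["lon"], rec["lat"], rec["lon"]) <= max_dist_m:
                match = k
                break
        if match is None:
            kept.append({"name": rec["name"], "lat": rec["lat"],
                         "lon": rec["lon"], "sources": [rec["source"]]})
        elif rec["source"] not in match["sources"]:
            match["sources"].append(rec["source"])
    return kept


# Made-up listings, for illustration only.
listings = [
    {"name": "Joe's Pizza", "lat": 39.9500, "lon": -75.1600, "source": "google"},
    {"name": "Joes Pizza", "lat": 39.9501, "lon": -75.1601, "source": "yelp"},
    {"name": "Joe's Pizza", "lat": 39.9900, "lon": -75.1200, "source": "foursquare"},
]
businesses = combine_sources(listings)
```

The pairwise comparison is quadratic in the number of listings, so a city-sized dataset would likely need the records grouped by area first. Reconciling the three services' category schemes is a separate problem that this sketch does not address.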

----------------

The analyses:

We outline our analyses and plots from "Analysis of Urban Vibrancy and Safety
in Philadelphia".

The matching is done within code/first_matching. The folders short_long
and high_low contain only one relevant file each; these files create the
data necessary to do our matching comparisons.

All plots are created within code/plots.

overall_blockgroup.R creates figs 1, 2, 4, 6.

landuse_color.R creates fig 3; it is slow to run.

crime_bar.R creates fig 5.

excess_plots.R creates figs 7, 8, and 9.

Within the folder code/plots/matching, short_long_plots.R creates fig 10,
and high_low_plots.R creates figs 11 and 12.

----------------
