This repository contains software to compute the segregation indices based on Class Coverage Times (CCT), as explained in the paper:
S. Sousa, V. Nicosia, Nature Communications "Quantifying ethnic segregation in cities through random walks". DOI:10.1038/s41467-022-33344-3
The code can be copied, used, modified, and redistributed under the terms of the MIT/Expat License. Please see the file LICENSE for additional details.
Software requirements: * The list bellow indicates the versions at which the experiments were conducted.
- anaconda 4.8.2 (recommended for operating system compatibility)
- Python 3.7.6
- numpy 1.18.1
- pandas 1.0.1
- shapely 1.6.4
- fiona 1.8.6
For segregation measures and maps:
- geopandas 0.8.1
- segregation 1.5.0 (Pysal)
- libpysal 4.4.0 (Pysal)
The script get_adjacent_polygons.py
returns the edge-list and the nodes
properties (ethnicity distribution). It takes the following inputs:
shape: Shapefile of the census tracts at country extent
code_name: The loop uk code (GISJOIN) string to match the tracts
pop_data: CSV file containing the tracts within the CSA
fileout: Optional filename to save the nodes properties of the resulting
Constructing the individual CSV file for each city is trivial and can be
obtained either by filtering the data by name
using your preferred editor or
using a library such as pandas in python to group rows by the column. The CSV
files from this process are available in the folder
census-original. Note that the largest component is not
automatically provided by the script, a version that provides the largest
connected component is available
here.
Note that the code_name
parameter is different for each country, the Shapefile
for US systems can be downloaded at US Census
Tiger
and they use the GEOID10
for joining data with census tables. The Shapefile for
the UK systems can be downloaded at the UK Data
Service
where the geo_code
is used to merge data.
US example:
python get_adjacent_polygons.py Tract_2010Census_DP1.shp GEOID10 data/census-original/us/SF1P9_Census_Atlanta.csv nodes_ethnics_Census_Atlanta.txt > edge_list_Census_Atlanta.txt
UK example:
python get_adjacent_polygons.py infuse_ward_lyr_2011_merged.shp geo_code data/census-original/uk/QS211_Wards_Bristol.csv nodes_ethnics_Wards_Bristol.txt > edge_list_Wards_Bristol.txt
A second edge-list file will be created automatically with following the naming
convention:
edges_ids_
+ pop_data filename
= edges_ids_QS211_Wards_Bristol.txt
Edge-list details:
- Node IDs are defined by the row index of the population table
- Each edge is reported once
The random walk on the adjacency graph of real systems can be simulated by
running the python script rw_cct_fnt.py
and the following input must be
provided:
edge: Edge-list file;
prop: Node properties file with the frequency of each ethnicity;
num: Number of walk repetitions from the same node;
idx: range of nodes or single node ID to run the walk from, e.g.: 1, 0-10.
Example:
python rw_cct_fnt.py data/census-clean/edge_list_Wards_Bristol.txt data/census-clean/nodes_ethnics_distrib_Wards_Bristol.txt 1000 0-143 >> [output_filename]
All the files provided in data/census-clean are already in the input format needed for the script. The script will print, for each node passed to the parameter, the following results:
"Node ID" "[list with the CCT for each fraction c]"
where each line corresponds to the average over all repetitions of the random walk from node i. The runtime depends on the number of nodes in the graph and the number of repetitions per node, which is expected to take a few seconds for a single node with 100 repetitions. The runtime can vary significantly per repetition since the trajectory of the walker is a pure stochastic process.
The synthetic systems use the script rw_cct_fnt_synthetic.py
which needs the
following input:
edge: Edge-list file;
C: number of classes to construct the MxC matrix;
num: Number of walk repetitions from the same node;
idx: range of nodes or single node ID to run the walk from, e.g.: 1, 0-10.
mode: the spatial pattern of the population distribution
Example:
python rw_cct_fnt_synthetic.py data/synthetic/edge_list_16x16_grid.txt 5 1000 0-256 544 >> [output_filename]
For details about each pattern, please consult the inline documentation in the script.
Census-clean:
edge_list_Census_[city].txt
: Edge-list for US systems with the assigned numeric ID.edge_list_Wards_[city].txt
: Edge-list for UK systems with the assigned numeric ID.nodes_ethnics_Census_[city].txt
: Ethnic data associated with the census tract.nodes_ethnics_distrib_Wards_[city].txt
: Ethnic data associated with the wards.
Census-original:
QS211_[scale]_[city].csv
: Census ethnic data with headers at the scales LSOA, OA and Wards.SF1P9_Census_[city].csv
: Census ethnic data with headers.
dfa:
edges_un_[scale]_[city].txt
: Edge-list for UK systems with the assigned numeric ID *.nodes_[scale]_entropy_[city].txt
: Entropy of the population distribution for UK systems with the assigned numeric ID *.
* All files available at three spatial scales LSOA, OA and Wards.
Socio-economic-US:
ACSDP5Y2011.DP03_data_with_overlays_2021-02-21T040944.csv
: Socio-Economic Indicators atCSA
level.ACSDP5Y2011.DP03_metadata_2021-02-21T040944.csv
: Metadata information.ACSDP5Y2011.DP03_table_title_2021-02-21T040944.txt
: Description of the fields in theCSV
table.
Synthetic:
edge_list_[size]_[boundary].txt
: Edge-list for synthetic systems for distinct lattice size and boundary conditions.