Create `get_TRI_parents()` function #78

ericnost · 2024-07-19T13:56:12Z

Something like this (see the Cumulative Impacts notebook):

# Get all places that have reported to TRI here
this_place_unique_facs = this_place.drop_duplicates(subset="REGISTRY_ID")

# Link these to TRI company info
## Load TRI company info
import io
import zipfile

import pandas
import requests

url = 'https://www3.epa.gov/tri/current/US_2022.zip'
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/content")

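# Read from the directory the archive was extracted to, so this does not depend on the working directory
tri_hqs = pandas.read_csv("/content/US_4_2022.txt", delimiter="\t", encoding="latin1", on_bad_lines="skip")  # One row fails to parse - skipping it for now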

# Join TRI releases with context information
parents = pandas.merge(this_place_unique_facs, tri_hqs, left_on = "REGISTRY_ID", right_on = "16. EPA REGISTRY ID", how = "left")
# Use the standardized parent company name; fall back to the facility name if there is no parent info
parents["parent"] = parents['15. STANDARDIZED PARENT COMPANY NAME'].fillna(parents["FAC_NAME"])
parents.groupby(by="parent")[["REGISTRY_ID"]].nunique().sort_values(by="REGISTRY_ID", ascending=False) #"27. SUBMITTED STANDARDIZED PARENT COMPANY NAME"
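
One way the steps above might be wrapped into the proposed function is sketched below. This is only a rough draft, not a settled design: the signature (two DataFrames passed in, rather than the function downloading the TRI file itself) and the two column-name constants are assumptions made for illustration. It assumes `facilities` has `REGISTRY_ID` and `FAC_NAME` columns and that both ID columns share a dtype.

```python
import pandas

# Column names as they appear in the TRI file used above (constant names are hypothetical)
PARENT_COL = "15. STANDARDIZED PARENT COMPANY NAME"
REGISTRY_COL = "16. EPA REGISTRY ID"


def get_TRI_parents(facilities, tri_hqs):
    """Count unique facilities per parent company (hypothetical signature).

    facilities: DataFrame with REGISTRY_ID and FAC_NAME columns
    tri_hqs: DataFrame of TRI company info containing REGISTRY_COL and PARENT_COL
    """
    # Keep one row per facility
    unique_facs = facilities.drop_duplicates(subset="REGISTRY_ID")

    # Attach TRI company info; facilities without a TRI match are kept
    parents = pandas.merge(
        unique_facs,
        tri_hqs,
        left_on="REGISTRY_ID",
        right_on=REGISTRY_COL,
        how="left",
    )

    # Fall back to the facility name where no parent company is listed
    parents["parent"] = parents[PARENT_COL].fillna(parents["FAC_NAME"])

    # Number of distinct facilities per parent, largest first
    return (
        parents.groupby(by="parent")[["REGISTRY_ID"]]
        .nunique()
        .sort_values(by="REGISTRY_ID", ascending=False)
    )
```

With the variables from the snippet above, the call would be `get_TRI_parents(this_place, tri_hqs)`. Keeping the download separate from the join would let the TRI file be loaded once and reused across places, but that split is a design guess.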

ericnost added the enhancement New feature or request label Jul 19, 2024

ericnost self-assigned this Jul 19, 2024

ericnost added this to ECHO_modules Jul 19, 2024

ericnost moved this to Todo in ECHO_modules Jul 19, 2024

ericnost added this to the v0.2.0 milestone Jul 19, 2024
