adding updated scripts to fetch the data and convert it into json #8
Conversation
This code could likely be pared down a bit - I was mostly putting the tracks down as I was driving over them. Ideally I can finish this in the morning and we can merge this.
scripts/scrape_elections.py
Outdated
pairs = [(contest, race) for contest, c_info in results_metadata.items() for race in c_info["races"]]
#print(len(pairs))
#async with ClientSession() as cs:
#    async with cs.get("https://chicagoelections.gov/elections/results/156/download?contest=15&ward=&precinct=") as resp:
async def fetch(race: int, contest: int, cs: ClientSession):
    resp = await cs.get(f"https://chicagoelections.gov/elections/results/{race}/download?contest={contest}&ward=&precinct=")
    return book_pandas(await resp.content.read(), race, contest)
We can asyncio.gather this and it should reduce time wasted on I/O, which is likely the biggest performance killer.
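A minimal sketch of how that could look, reusing the pairs, fetch, and book_pandas names from the diff above (assuming ClientSession is aiohttp's; fetch_all is a hypothetical wrapper, not something in the script yet):

import asyncio
from aiohttp import ClientSession

async def fetch_all(pairs):
    # One shared session for all requests; gather runs the downloads
    # concurrently, so total time is roughly the slowest response rather
    # than the sum of all of them.
    async with ClientSession() as cs:
        return await asyncio.gather(*(fetch(race, contest, cs) for contest, race in pairs))

Then run(fetch_all(pairs)) works with the existing from asyncio import run.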
also need to pool this
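If "pool" here means capping how many downloads are in flight at once, one hedged way to do it is a semaphore around fetch (fetch_limited is a hypothetical helper, and the limit of 10 is an arbitrary assumption):

import asyncio

async def fetch_limited(race, contest, cs, sem):
    # sem is an asyncio.Semaphore(10) created inside the running event loop;
    # this blocks once 10 downloads are already in progress.
    async with sem:
        return await fetch(race, contest, cs)

aiohttp's TCPConnector(limit=...) would be another way to get a similar cap at the connection level.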
we should also cache the request response. these will likely never change
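One hedged sketch of that cache, keyed on the race/contest pair and kept as files on disk (the cache/ directory, the .xls extension, and the fetch_cached name are assumptions):

from pathlib import Path

async def fetch_cached(race: int, contest: int, cs):
    # Past election results should never change, so reuse the downloaded
    # bytes if this race/contest pair is already on disk.
    path = Path("cache") / f"{race}_{contest}.xls"
    if path.exists():
        return book_pandas(path.read_bytes(), race, contest)
    resp = await cs.get(f"https://chicagoelections.gov/elections/results/{race}/download?contest={contest}&ward=&precinct=")
    data = await resp.content.read()
    path.parent.mkdir(exist_ok=True)
    path.write_bytes(data)
    return book_pandas(data, race, contest)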
scripts/scrape_elections.py
Outdated
def book_pandas(book: BytesIO):
def book_pandas(book, race, contest):
is this a todo note for later?
Yeah
scripts/scrape_elections.py
Outdated
cols[i] = f"{cols[i-1]} %"
subtables[ward] = pd.DataFrame(sub_table[1:], columns=sub_table[0]).set_index('Precinct').to_dict(orient="index")
cur_row = next(rows, None)
dump(subtables, open("subtable.json", 'w'))
`f"{race}_{contest}.json" or something. Oh and probably pickle for prod.
scripts/scrape_elections.py
Outdated
except StopIteration:
    pass
cols = sub_table[0]
print(sub_table)
Remove
scripts/scrape_elections.py
Outdated
cols = sub_table[0]
print(sub_table)
print(cols)
for i in range(len(cols)):
`cols = [col if col != '%' else cols[i-1] + " %" for i, col in enumerate(cols)]`
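For example, with a hypothetical header row where each candidate column is followed by a bare % column:

cols = ["Precinct", "Votes", "Candidate A", "%", "Candidate B", "%"]
cols = [col if col != '%' else cols[i-1] + " %" for i, col in enumerate(cols)]
# -> ["Precinct", "Votes", "Candidate A", "Candidate A %", "Candidate B", "Candidate B %"]

The comprehension reads the old list while building the new one, so cols[i-1] still refers to the original neighbouring column name.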
from json import load, dump
from asyncio import run
We need to check for edge cases.
scripts/scrape_elections.py
Outdated
#async with ClientSession() as cs:
#    async with cs.get("https://chicagoelections.gov/elections/results/156/download?contest=15&ward=&precinct=") as resp:
#        book_pandas(await resp.content.read())
book_pandas(open("/home/yash/Downloads/download.xls", "rb").read())
Oops. Remove.
I did an initial review. I think for this to be ready to bring in we'll want to have it:
- save the results in the same place as the old script
- remove the unused scraper code that this replaces
- update the readme as necessary
We can do it here, or in a future PR, but the elections.json file seems like something we could scrape and generate dynamically based on the HTML on this page: https://chicagoelections.gov/elections/results
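A rough sketch of that idea (the assumption that each election is linked as /elections/results/<numeric id> on that page would need to be checked against the real HTML):

import re
import requests
from bs4 import BeautifulSoup

def scrape_election_ids(url="https://chicagoelections.gov/elections/results"):
    # Collect every link that looks like /elections/results/<id> and map the
    # election's link text to its numeric id.
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    elections = {}
    for a in soup.find_all("a", href=re.compile(r"/elections/results/\d+")):
        election_id = re.search(r"/elections/results/(\d+)", a["href"]).group(1)
        elections[a.get_text(strip=True)] = int(election_id)
    return elections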
So this code works and is largely performant (although I have some thoughts about how to improve performance) - I'm not really sure what the desired output format is... right now it just puts everything into a big ol' JSON.
Never mind, I did figure it out. It's trivial enough that I'll do it in the morning and hopefully this should be good enough to merge.
@yashBhosale sounds good! we can plan to look at it tonight
❌ Deploy Preview for chicago-election-archive failed.
Working on a new script to fetch all of the election data. This is not the finished script but it is at 90%. Finishing touches needed. Will close #1 when complete.