
adding updated scripts to fetch the data and convert it into json #8

Merged
8 commits merged into main on Oct 1, 2024

Conversation

@yashBhosale (Collaborator) commented Sep 17, 2024

Working on a new script to fetch all of the election data. This is not the finished script, but it's about 90% there. Finishing touches needed. Will close #1 when complete.

@yashBhosale (Collaborator Author) commented:

This code could likely be pared down a bit - I was mostly putting the tracks down as I was driving over them. Ideally I can finish this in the morning and we can merge it.

pairs = [(contest, race) for contest, c_info in results_metadata.items() for race in c_info["races"]]
#print(len(pairs))
#async with ClientSession() as cs:
# async with cs.get("https://chicagoelections.gov/elections/results/156/download?contest=15&ward=&precinct=") as resp:
@yashBhosale (Collaborator Author) commented:

async def fetch(race: int, contest: int, cs: ClientSession):
    # fetch one contest's spreadsheet and hand the bytes to the parser
    resp = await cs.get(f"https://chicagoelections.gov/elections/results/{race}/download?contest={contest}&ward=&precinct=")
    return book_pandas(await resp.content.read(), race, contest)

We can `asyncio.gather` this, and it should reduce time wasted on I/O, which is likely the biggest performance killer.
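A sketch of that gather, using a stand-in coroutine in place of the real aiohttp call so it runs without a network (`build_url` is a hypothetical helper; in the actual script `fetch` would use the shared `ClientSession`):

```python
import asyncio

def build_url(race: int, contest: int) -> str:
    # hypothetical helper; mirrors the download URL used elsewhere in the script
    return (f"https://chicagoelections.gov/elections/results/{race}"
            f"/download?contest={contest}&ward=&precinct=")

async def fetch(race: int, contest: int) -> str:
    # stand-in for the real cs.get(...) + resp.content.read()
    await asyncio.sleep(0)  # simulates the network wait
    return build_url(race, contest)

async def fetch_all(pairs):
    # every request is in flight at once; results come back in input order
    return await asyncio.gather(*(fetch(race, contest) for contest, race in pairs))

results = asyncio.run(fetch_all([(15, 156), (16, 156)]))
```

Because `gather` overlaps all the waits, total wall time is roughly the slowest single request rather than the sum of all of them.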

@yashBhosale (Collaborator Author) commented:
also need to pool this
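One way to pool it, sketched with a semaphore capping concurrency (the limit of 10 is an assumption to tune against the server; `job` is a stand-in for the real fetch coroutine):

```python
import asyncio

MAX_IN_FLIGHT = 10  # assumed cap; tune to what the elections site tolerates

async def bounded(sem: asyncio.Semaphore, coro):
    # at most MAX_IN_FLIGHT coroutine bodies run at once
    async with sem:
        return await coro

async def run_pooled(coros):
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    return await asyncio.gather(*(bounded(sem, c) for c in coros))

async def job(n: int) -> int:
    # stand-in for a single fetch
    await asyncio.sleep(0)
    return n * 2

out = asyncio.run(run_pooled([job(i) for i in range(5)]))
```

aiohttp also limits connections per `ClientSession` on its own, so this explicit semaphore is mainly useful to set a politeness cap lower than the default.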

@derekeder (Member) commented:
We should also cache the request responses; these will likely never change.
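A minimal sketch of that cache, keyed on the URL and kept on disk (`CACHE_DIR` and the one-file-per-URL layout are assumptions):

```python
import hashlib
import os

CACHE_DIR = "request_cache"  # hypothetical location

def _cache_path(url: str) -> str:
    # name each file by a hash so any URL is a safe filename
    return os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest())

def get_cached(url: str):
    """Return the cached body for url, or None on a miss."""
    path = _cache_path(url)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    return None

def put_cached(url: str, body: bytes) -> None:
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(_cache_path(url), "wb") as f:
        f.write(body)
```

The fetch path would then check `get_cached(url)` first and only hit the network (and `put_cached`) on a miss, so re-running the scraper after the first pass costs no requests.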




def book_pandas(book: BytesIO):
@yashBhosale (Collaborator Author) commented:
def book_pandas(book, race, contest):

@derekeder (Member) commented:
is this a todo note for later?

@yashBhosale (Collaborator Author) replied:
Yeah

cols[i] = f"{cols[i-1]} %"
subtables[ward] = pd.DataFrame(sub_table[1:], columns=sub_table[0]).set_index('Precinct').to_dict(orient="index")
cur_row = next(rows, None)
dump(subtables, open("subtable.json", 'w'))
@yashBhosale (Collaborator Author) commented:
`f"{race}_{contest}.json"` or something. Oh, and probably pickle for prod.
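That naming scheme might look like the following sketch (`dump_results` and the flat file layout are hypothetical):

```python
import json

def dump_results(subtables: dict, race: int, contest: int) -> str:
    # one JSON file per (race, contest) pair instead of a single subtable.json
    fname = f"{race}_{contest}.json"
    with open(fname, "w") as f:
        json.dump(subtables, f)
    return fname

written = dump_results({"Ward 1": {"1": {"Votes": 120}}}, 156, 15)
```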

except StopIteration:
pass
cols = sub_table[0]
print(sub_table)
@yashBhosale (Collaborator Author) commented:
Remove

cols = sub_table[0]
print(sub_table)
print(cols)
for i in range(len(cols)):
@yashBhosale (Collaborator Author) commented:
cols = [cols[i-1] + " %" if col == '%' else col for i, col in enumerate(cols)]
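A runnable version of that rename (the sample headers are assumed; the real sheets alternate a candidate column and a bare `%` column):

```python
# raw spreadsheet headers alternate a name column and a bare '%' column
cols = ["Precinct", "Candidate A", "%", "Candidate B", "%"]

# give each bare '%' the name of the column before it
cols = [cols[i - 1] + " %" if col == "%" else col for i, col in enumerate(cols)]
```

Note the comprehension reads `cols[i - 1]` from the original list (the name isn't rebound until the comprehension finishes), which is fine here because two `%` columns never appear back to back.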

from json import load, dump
from asyncio import run


@yashBhosale (Collaborator Author) commented:

We need to check for edge cases.

#async with ClientSession() as cs:
# async with cs.get("https://chicagoelections.gov/elections/results/156/download?contest=15&ward=&precinct=") as resp:
# book_pandas(await resp.content.read())
book_pandas(open("/home/yash/Downloads/download.xls", "rb").read())
@yashBhosale (Collaborator Author) commented:

Oops. Remove.

@derekeder (Member) left a comment:

I did an initial review. I think for this to be ready to bring in we'll want to have it:

  • save the results in the same place as the old script
  • remove the unused scraper code that this replaces
  • update the readme as necessary

We can do it here, or in a future PR, but the elections.json file seems like something we could scrape and generate dynamically based on the HTML on this page: https://chicagoelections.gov/elections/results
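A rough sketch of that scrape using only the standard library. The `/elections/results/<id>` link pattern is an assumption inferred from the download URLs used above; the real markup on that page would need checking:

```python
from html.parser import HTMLParser

class ElectionLinkParser(HTMLParser):
    # collects (href, link text) for anchors that look like election result pages
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "/elections/results/" in href:
                self._href = href

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.links.append((self._href, data.strip()))
            self._href = None

# sample markup standing in for https://chicagoelections.gov/elections/results
sample = '<a href="/elections/results/156">2023 Municipal General</a>'
parser = ElectionLinkParser()
parser.feed(sample)
```

The collected id/name pairs could then be used to regenerate elections.json instead of maintaining it by hand.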





@yashBhosale (Collaborator Author) commented Sep 24, 2024

So this code works and is largely performant (although I have some thoughts about how to improve performance further). I'm not really sure what the desired output format is; right now it just puts everything into one big JSON file. Once I figure that out, it should be trivial to convert.

@yashBhosale (Collaborator Author) commented:
Never mind, I did figure it out. It's trivial enough that I'll do it in the morning and hopefully this should be good enough to merge.

@derekeder (Member) commented:

@yashBhosale sounds good! we can plan to look at it tonight

netlify bot commented Sep 25, 2024

Deploy Preview for chicago-election-archive failed.

  • Latest commit: b6caed1
  • Latest deploy log: https://app.netlify.com/sites/chicago-election-archive/deploys/66fbe429d26094000929dbc0

@yashBhosale yashBhosale merged commit fd38ec8 into main Oct 1, 2024
0 of 4 checks passed
Development

Successfully merging this pull request may close these issues.

update Makefile and scrape_table.py to pull from updated data source
2 participants