
update Makefile and scrape_table.py to pull from updated data source #1

Closed
derekeder opened this issue Aug 28, 2024 · 5 comments · Fixed by #8
Labels
bug Something isn't working
Member

derekeder commented Aug 28, 2024

current Makefile produces the following error:

Traceback (most recent call last):
  File "/home/user/Documents/chicago-elections-archive/scripts/scrape_table.py", line 69, in <module>
    main()
  File "/home/user/Documents/chicago-elections-archive/scripts/scrape_table.py", line 56, in main
    candidates = get_candidates(soup)
  File "/home/user/Documents/chicago-elections-archive/scripts/scrape_table.py", line 9, in get_candidates
    headers = [v.get_text().strip() for v in soup.find("table").find_all("b")]
AttributeError: 'NoneType' object has no attribute 'find_all'

Looks like chicagoelections.gov has updated the pages that produce these results. https://chicagoelections.gov/en/election-results-specifics.asp now returns a 404. This is the new one: https://chicagoelections.gov/elections/results/
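For reference, the AttributeError in the traceback comes from soup.find("table") returning None when the page has no results table (e.g. the new 404 page). A minimal defensive sketch of get_candidates, mirroring the traceback rather than the actual repo code, that fails with a clearer message:

```python
from bs4 import BeautifulSoup

def get_candidates(soup):
    # soup.find() returns None when no <table> exists, e.g. when the
    # old URL now serves a 404 page; guard before calling find_all()
    table = soup.find("table")
    if table is None:
        raise ValueError(
            "no results table found; the page URL or layout may have changed"
        )
    return [b.get_text().strip() for b in table.find_all("b")]
```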

@yashBhosale yashBhosale added the bug Something isn't working label Aug 30, 2024
Collaborator

yashBhosale commented Aug 30, 2024

Some color here: the new flow is that https://chicagoelections.gov/elections/results/ provides a menu of election links. From there you go into an individual election at https://chicagoelections.gov/elections/results/{election_id}, which gives you a menu where you pick a race/ward/precinct, and it then pulls up a table of the election data plus a link to download that data as an Excel file. Incidentally, you can also get the Excel file directly through https://chicagoelections.gov/elections/results/{election_id}?contest={contest_id}&ward={ward_number}&precinct={precinct_number}
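That URL pattern is easy to capture in a small helper. A sketch (the IDs in the usage example are hypothetical placeholders; the query parameter names come from the URL pattern above):

```python
def results_url(election_id, contest=None, ward=None, precinct=None):
    # Base results page for a single election
    url = f"https://chicagoelections.gov/elections/results/{election_id}"
    # Append only the query parameters that were supplied
    params = [
        (name, value)
        for name, value in (("contest", contest), ("ward", ward), ("precinct", precinct))
        if value is not None
    ]
    if params:
        url += "?" + "&".join(f"{name}={value}" for name, value in params)
    return url
```

For example, results_url(251, contest=10, ward=5, precinct=3) would produce the fully parameterized download URL (251, 10, 5, and 3 are made-up values for illustration).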

This puts me between a rock and a hard place: either use Selenium to try to leverage the existing BeautifulSoup code, OR figure out how to fetch/process the Excel file as ergonomically as possible under the circumstances.

@yashBhosale yashBhosale self-assigned this Aug 30, 2024
@derekeder
Member Author

@yashBhosale thanks for digging into this. I think if we can get the data in Excel, let's go for it. Even if it requires a significant rewrite, the new code will be much less brittle and likely a lot simpler.

@yashBhosale
Collaborator

Yeah, that's what I was leaning towards as well. A secondary goal (or maybe an umbrella goal) will be to retire the Makefile altogether, since it's only used for building the data.

@derekeder
Member Author

Cool. The Makefile may still be useful if we want to chain some commands together, but we'll see.

For the Excel data, we can make use of csvkit.

Collaborator

nofurtherinformation commented Sep 11, 2024

Hey, I was digging into this a bit and found some insights. The Excel file the site outputs seems to be malformed in some way; neither pandas.read_excel nor xlrd can read it, whether from disk or from the bytes returned by the server.

An alternative approach could be to just use the HTML that gets returned from the Drupal endpoint, and we don't need Selenium for that. Here's a naive, minimal example:

import requests
from bs4 import BeautifulSoup

election_id = 205
# Drupal AJAX endpoint that returns the results table as HTML
url = f"https://chicagoelections.gov/elections/results/{election_id}?ajax_form=1&_wrapper_format=drupal_ajax"

# Form data for the POST request
form_data = {
    'election_id': election_id,
    'form_id': 'election_results_form',
}

# Send the POST request
response = requests.post(url, data=form_data)

# The response is a JSON list of objects; the last one holds the HTML table
html_table = response.json()[-1]['data']

# Parse the HTML table
table = BeautifulSoup(html_table, 'html.parser')

# TODO: clean up the table, turn it into a pandas DataFrame

This looks something like:

...
<thead>
<tr>
<th>Registered Voters</th>
<th>Ballots Cast</th>
<th>Turnout</th>
</tr>
</thead>
<tbody>
<tr>
<td>1,494,199</td>
<td>452,529</td>
<td>30.29%</td>
</tr>
...
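Picking up the TODO from the snippet, the returned table can be flattened into plain rows with BeautifulSoup alone. A sketch, assuming only the shape of the sample output above (thead with th cells, tbody with td cells):

```python
from bs4 import BeautifulSoup

def table_to_rows(html):
    # Parse a results table into (headers, rows) of plain strings
    soup = BeautifulSoup(html, "html.parser")
    headers = [th.get_text(strip=True) for th in soup.find_all("th")]
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in soup.find_all("tr")
        if tr.find("td")  # skip header-only rows
    ]
    return headers, rows
```

The (headers, rows) pair could then be fed straight into csv.writer or pandas.DataFrame for the cleanup step.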
