Bill scrape imports duplicate bill subjects #244

Open
reginafcompton opened this issue Sep 26, 2018 · 2 comments

@reginafcompton
Contributor

Recently, Chicago Legistar had a bill with duplicate indexes:

[Screenshot, 2018-09-26 9:18:00 AM]

The scraper imported these duplicates (as seen in our OCD API):

[Screenshot, 2018-09-26 9:19:53 AM]

Our Councilmatic database could not import the bill because of an integrity error: the import tried to add the same Subject more than once.

We could approach this in a couple of ways:

(1) The scraper should fail when trying to add duplicate subjects to a Bill.

-OR-

(2) The scraper should not fail, but it should not be able to create duplicate subjects (maybe by making Bill.subject a set rather than a list in pupa, or by checking whether a subject already exists before appending it).
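
For illustration, the "check before appending" idea in option (2) could look roughly like the sketch below. This is not pupa's actual code: `add_unique_subject` and the sample index names are hypothetical, and the only assumption is that a bill's subjects are held in a plain list. The `strict` flag shows how the same check could instead fail loudly, as in option (1).

```python
def add_unique_subject(subjects, subject, strict=False):
    """Append subject to the subjects list unless it is already present.

    Hypothetical helper, not part of pupa. With strict=True it raises
    instead of skipping, which corresponds to option (1).
    Returns True if the subject was added, False if it was a duplicate.
    """
    if subject in subjects:
        if strict:
            raise ValueError(f"duplicate subject: {subject!r}")
        return False
    subjects.append(subject)
    return True


# Made-up index names standing in for what Legistar returned.
scraped_indexes = ["Zoning", "Public Way", "Zoning"]

bill_subjects = []
for index in scraped_indexes:
    add_unique_subject(bill_subjects, index)

print(bill_subjects)
```

Keeping a list and checking membership preserves the order in which Legistar reported the indexes, which a set would not guarantee.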

@fgregg
Contributor

fgregg commented Sep 26, 2018

Let's fix it in the scraper.

@reginafcompton
Contributor Author

Once we do this, we'll need to tend to Chicago Councilmatic, since https://ocd.datamade.us/ocd-bill/8efd9d9c-3397-4b5a-a7cc-fc935e9eb10a/ was never imported to the site.
