Add a "clean" mode to combat duplicate entries in Legistar at time of scrape #294

Closed
reginafcompton opened this issue Sep 21, 2017 · 2 comments

reginafcompton commented Sep 21, 2017

Recently, LA Metro had "duplicate" events in Legistar (i.e., same name and time, but different EventId):

http://webapi.legistar.com/v1/metro/events/1265
http://webapi.legistar.com/v1/metro/events/1259 (now defunct)

The scrapers for Metro run multiple times per day, and at the time of a scrape, both events were present.

We use the EventId to create the unique instance of the Identifier class, so the importer had no way of knowing that these were the same event.
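
For illustration, here is roughly what the duplication looks like. The field names follow the Legistar web API, but the body name, date, and time are made up for this sketch:

```python
# Illustrative only: field names follow the Legistar web API, but the values
# (body name, date, time) are invented for this example.
event_a = {"EventId": 1259, "EventBodyName": "Board of Directors",
           "EventDate": "2017-09-28T00:00:00", "EventTime": "9:00 AM"}
event_b = {"EventId": 1265, "EventBodyName": "Board of Directors",
           "EventDate": "2017-09-28T00:00:00", "EventTime": "9:00 AM"}

# Keying on EventId, as the identifier effectively does, keeps both records:
by_id = {e["EventId"]: e for e in (event_a, event_b)}
assert len(by_id) == 2

# Keying on (body, date, time) would collapse them into one:
by_name_and_time = {(e["EventBodyName"], e["EventDate"], e["EventTime"]): e
                    for e in (event_a, event_b)}
assert len(by_name_and_time) == 1
```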

Let's add a "clean" scrape mode to pupa: a "clean" scrape would remove data that no longer appears in the Legistar web API.
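
To make the idea concrete, here is a very rough sketch of what a "clean" pass might do. This is not pupa's actual API: `clean_events()` and its arguments are hypothetical, and API paging is ignored for brevity.

```python
import requests

def clean_events(client, stored_event_ids):
    """Return stored EventIds that no longer appear in the Legistar web API."""
    url = "http://webapi.legistar.com/v1/{}/events".format(client)
    live_ids = {event["EventId"] for event in requests.get(url).json()}
    return stored_event_ids - live_ids

# Anything returned here would be a candidate for removal (or archiving):
stale = clean_events("metro", {1259, 1265})
```

Whether stale records should actually be deleted, or merely flagged, is part of what would need discussion.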

jamesturk (Member) commented

Hmm, right now pupa doesn't delete any top-level objects, so we'd need to think carefully about how this would be handled. I'm open to discussion/proposals, but I tend to think we should favor some other mechanism here.

reginafcompton (Author) commented

Thanks for the reply, @jamesturk, and I agree this is something we'd want to think about carefully before implementation. I've opened a new issue, which broadens our conversation: #295
