Event detail: Related board reports are not showing up #328

Closed
reginafcompton opened this issue Jul 14, 2018 · 10 comments

@reginafcompton
Contributor

The most recent events do not show related board reports (though previous events do).

The SQL in the view code is not behaving as expected: https://github.com/datamade/la-metro-councilmatic/blob/master/lametro/views.py#L146

@reginafcompton
Contributor Author

reginafcompton commented Jul 16, 2018

Initial Problem
Metro added agendas on July 13, around 2:30pm CST. Over two dozen bills (present in Legistar) were not in the opencivicdata database at this time. As a result, the "Related Board Reports" did not render on the Event Detail pages.

Preliminary Solution
I manually ran the scraper (twice), around 5:00, with the window option:

pupa update lametro bills window=7
pupa update lametro bills window=500

These scrapes caught nearly all of the missing bills (which were added to the OCD API database at 5:25 pm or 5:45 pm on July 13). The data import in Councilmatic subsequently added these bills to LA Metro.

Still, the related board reports did not render as expected.

Cause and Effect
I turned off the scraper cron tasks while running the scraper manually. That also turned off the events scraper, which creates the related_entities and event items upon which the Councilmatic view depends. After I turned cron back on, the events scraper ran, mitigating the issue (...seemingly like magic).

Root Problem (Unsolved)
Why were these board reports missing? Many have a Legistar MatterLastModifiedUtc timestamp of July 6, suggesting that the scraper should have scraped the bill on or before that date. Examples:
http://webapi.legistar.com/v1/metro/matters/5039
http://webapi.legistar.com/v1/metro/matters/5022
http://webapi.legistar.com/v1/metro/matters/5073
http://webapi.legistar.com/v1/metro/matters/5124

The Sentry logs do not show any corresponding errors, and the DataMade team did not make changes to the scraper on this day.

Approaching Mitigation
We adjusted our error logging to identify when agenda items point to "missing" bills. Then, we can more quickly pinpoint when the scraper misses a bill that should be present. @hancush @fgregg - additional thoughts on this welcome!
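A minimal sketch of that kind of check, not the logging code actually in use: it assumes agenda items are available as dictionaries with a hypothetical `bill_identifier` key, and that the identifiers of all imported bills can be collected into a list. The function name and field names are illustrative only.

```python
import logging

logger = logging.getLogger("missing_bills")


def find_missing_bills(agenda_items, known_bill_ids):
    """Return identifiers that agenda items reference but the database lacks.

    `agenda_items` is a list of dicts with hypothetical keys "event",
    "description" and "bill_identifier"; `known_bill_ids` is any iterable
    of identifiers already imported.
    """
    known = set(known_bill_ids)
    missing = []
    for item in agenda_items:
        bill_id = item.get("bill_identifier")
        # Skip agenda items that do not point to a bill at all.
        if not bill_id:
            continue
        if bill_id not in known and bill_id not in missing:
            missing.append(bill_id)
            logger.error(
                "Agenda item %r on event %r points to missing bill %s",
                item.get("description"),
                item.get("event"),
                bill_id,
            )
    return missing
```

Running this after each import would surface a missed bill as soon as an agenda references it, rather than when someone notices an empty "Related Board Reports" section.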

@reginafcompton
Contributor Author

reginafcompton commented Jul 16, 2018

Reference: List of previously missing bills

'2018-0428'
'2018-0238'
'2018-0351'
'2018-0412'
'2018-0241'
'2018-0389'
'2018-0318'
'2018-0139'
'2018-0291'
'2018-0308'
'2018-0387'
'2018-0441'
'2018-0104'
'2018-0137'
'2018-0140'
'2018-0187'
'2018-0246'
'2018-0339'
'2018-0262'
'2018-0069'
'2018-0289'
'2018-0342'
'2018-0366'
'2018-0230'
'2018-0409'
'2018-0321'
'2018-0232'
'2018-0368'
'2018-0399'
'2018-0411'
'2018-0393'
'2018-0403'
'2018-0244'
'2018-0388'
'2018-0359'
'2018-0453'
'2018-0422'
'2018-0434'
'2018-0433'

@fgregg
Collaborator

fgregg commented Jul 17, 2018

My current suspicion is that this issue is caused by Metro making a board report public and that act not triggering an update of "MatterLastModifiedUtc". If that was so, it would explain why our windowed bill scraping seems to be missing these bills.

If that's so, then here are four things that we could do.

  1. If possible, have this fixed in the API, so that making a board report public would update "MatterLastModifiedUtc".
  2. Let DataMade have access to private bills through the API
  3. Scrape all bills nightly
  4. Scrape all bills with high frequency on Friday night

If 1 or 2 happened, we wouldn't need to do 3 or 4. If we can't do 1 or 2, we probably need to do both 3 and 4.
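To illustrate the suspected failure mode, here is a small sketch under stated assumptions: the window is measured in days, the filter compares "MatterLastModifiedUtc" against a cutoff, and the two matters are made-up examples, not real Legistar records. It is not the scraper's actual code.

```python
from datetime import datetime, timedelta


def select_matters(matters, now, window_days=None):
    """Return the matters a scrape would pick up.

    window_days=None models a full scrape (options 3 and 4); otherwise only
    matters modified within the last `window_days` days are kept.
    """
    if window_days is None:
        return list(matters)
    cutoff = now - timedelta(days=window_days)
    return [m for m in matters if m["MatterLastModifiedUtc"] >= cutoff]


# Hypothetical data: both matters are publicly visible on the evening of July 13.
# The first was last edited on July 6 while private and only became public on
# July 13; under the hypothesis above, its timestamp did not move.
now = datetime(2018, 7, 13, 22, 0)
matters = [
    {"id": "made-public-today", "MatterLastModifiedUtc": datetime(2018, 7, 6, 15, 0)},
    {"id": "edited-today", "MatterLastModifiedUtc": datetime(2018, 7, 13, 19, 0)},
]

windowed = [m["id"] for m in select_matters(matters, now, window_days=3)]
full = [m["id"] for m in select_matters(matters, now)]
```

Here `windowed` contains only the matter edited today, while `full` contains both, which is why a full scrape would catch reports that a windowed scrape skips.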

I'm not completely sure that a failure to update "MatterLastModifiedUtc" on making a board report public is the cause.

In order to confirm it, we would need to confirm that toggling the public status of a board report does not change "MatterLastModifiedUtc". With LA Metro's cooperation this should be pretty easy. We can choose some old board report that is currently public, ask them to make it private and then make it public again. This isn't the exact test we would like to do, but if "MatterLastModifiedUtc" does not update, we should be pretty confident.
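One way to run that check is to read the timestamp before and after the toggle and compare. The sketch below assumes the matter endpoint shown earlier in this thread returns JSON with a "MatterLastModifiedUtc" field; the helper names and the `toggle` callback (a stand-in for the manual step Metro would perform in Legistar) are hypothetical.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the example links earlier in this issue.
MATTER_URL = "http://webapi.legistar.com/v1/metro/matters/{matter_id}"


def fetch_matter(matter_id):
    """Fetch one matter from the Legistar web API as a dict."""
    with urlopen(MATTER_URL.format(matter_id=matter_id)) as response:
        return json.load(response)


def toggle_updates_timestamp(matter_id, toggle, fetch=fetch_matter):
    """Return True if MatterLastModifiedUtc differs after `toggle` runs.

    `toggle` is any callable that blocks until the public/private status
    has been changed, e.g. a prompt that waits for a person to confirm.
    """
    before = fetch(matter_id)["MatterLastModifiedUtc"]
    toggle()
    after = fetch(matter_id)["MatterLastModifiedUtc"]
    return before != after
```

For a manual test, `toggle` could be as simple as `lambda: input("Toggle the report, then press Enter")`. Note that a matter that is currently private may not be returned by the API at all, so the check is most useful when the report is public both before and after.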

In addition, we should bring in opencivicdata/python-legistar-scraper#74 and make any other necessary changes so that the only unresolved board reports are private reports.

@reginafcompton
Contributor Author

We have a preliminary solution that involves scraping all bills on Fridays, during a designated period of time: datamade/scrapers-us-municipal#22
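As a rough sketch of the scheduling idea (not the change in the linked pull request), the function below picks a scrape window from the current time. The Friday hours, the default window, and the convention that `None` means "scrape everything" are all assumptions made for illustration.

```python
from datetime import datetime

FRIDAY = 4  # datetime.weekday() counts Monday as 0
FULL_SCRAPE_HOURS = range(17, 24)  # hypothetical "designated period": 5 pm to midnight


def scrape_window(now, default_window=7):
    """Return the window (in days) to scrape with, or None for a full scrape."""
    if now.weekday() == FRIDAY and now.hour in FULL_SCRAPE_HOURS:
        return None
    return default_window
```

A cron job could call this on every run and pass the result through to the bill scraper, so that the more expensive full scrape only happens when new agendas are most likely to be published.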

Metro and DataMade plan to test a couple of bills next week to determine how the timestamp correlates with the shift from a "private" to "public" status.

@shrayshray
Collaborator

@reginafcompton Re: your questions via email:
Which reports will we test? Metro will create an Agenda with the meeting “Location” field set to “TEST”, and add test reports to this Agenda. Some of the test reports will be set as “Not Viewable Via InSite”.

How will Metro toggle reports from public to private?

  1. Change an Agenda status to “Final”. (Reports which do not have “Not Viewable Via InSite” checked should become public, and those which do have it checked should stay private.)
  2. Uncheck “Not Viewable Via InSite” on a report which was published with this checked.
  3. Check “Not Viewable Via InSite” on a report which was published with it unchecked.

@reginafcompton
Contributor Author

Terrific!

We have a mechanism in place for hiding test events, but not test reports. Will you be doing any "real" data creation in Legistar tomorrow afternoon? If not, then we can turn off the automated scrape, while we test; we'll turn on the scraper, only after you remove the test data from Legistar. Does that work for Metro?

Otherwise, we'll need a way of knowing which bills are "TEST" bills, so we can hide them on the Councilmatic site. (That would require a few hours of development time on our part.)
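One possible approach, sketched under assumptions: treat any bill attached to an event whose location is "TEST" as a test bill and filter it out before display. The `location` and `related_bills` keys are hypothetical and do not reflect the Councilmatic data model.

```python
def find_test_bill_ids(events):
    """Collect identifiers of bills attached to events located at "TEST"."""
    ids = set()
    for event in events:
        location = (event.get("location") or "").strip().upper()
        if location == "TEST":
            ids.update(event.get("related_bills", []))
    return ids


def visible_bills(bill_ids, events):
    """Return bill identifiers that should be shown, preserving order."""
    hidden = find_test_bill_ids(events)
    return [bill_id for bill_id in bill_ids if bill_id not in hidden]
```

This only works while test reports are attached to a test agenda; a report that is detached from the agenda would need some other marker.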

What we want to learn from this testing

(1) How does toggling a report from private to public affect the MatterLastModifiedUtc timestamp (on the Legistar API)? Be sure to consider all three ways of toggling.

(2) If the timestamp does not change automatically, can Metro manually adjust it at the time of toggle?

@shrayshray
Collaborator

Sounds great! There shouldn't be any issue with turning off the scraper while we test.

@shrayshray
Collaborator

The test agenda is Board of Directors - Regular Board Meeting, 8/6/2018 at 9:30am.
It has 4 test reports. Two are "Not Viewable Via InSite":
2018-0517
2018-0518
Two are viewable:
2018-0519
2018-0520

@shrayshray
Collaborator

We'd like to use BOTH options 3 & 4:
3. Scrape all bills nightly
4. Scrape all bills with high frequency on Friday night

Would it be possible to use the dashboard functionality in #257 to display the error log of agenda items pointing to missing items?

@reginafcompton
Contributor Author

@shrayshray - we implemented the above suggestions a while back:

I made a note about the admin interface in the relevant issue.

Finally, I want to note that our new changes to the scraper, which enable scraping private bills, also help resolve this issue.

I think we can close this one! Let me know.
