Board Reports: Unpublished reports showing on website! #345

Closed
shrayshray opened this issue Sep 11, 2018 · 32 comments

@shrayshray
Collaborator

A bunch of reports which are not yet published are showing up on the site. See the first 8 reports listed: https://boardagendas.metro.net/search/
E.g., 2018-0435: https://boardagendas.metro.net/board-report/2018-0435/ Status in Legistar is “Agenda Ready”. It is NOT available on metro.legistar.com. The Agenda has not yet been published and is not available: https://boardagendas.metro.net/event/executive-management-committee-8000f2384368/

@shrayshray
Collaborator Author

Follow up on this issue ... thank you so much, @reginafcompton, for resolving it right away.
The reports which were showing up in advance of their agenda being published (they are on agendas for 9/19 and 9/20) had been manually changed in Legistar by unchecking the box "Not Viewable Via InSite".

Six of the 8 reports were General Public Comment, which users previously manually set to "Not Viewable ...", and then unchecked weeks/months later when it was decided the policy would be to show them. So for this month's General Public Comment reports, the users got mixed up and assumed they always needed to manually uncheck the box for these reports. The other two reports were changed within about 10 minutes on the same day, so I'm assuming the user just went with the flow he/she got into with this process.

So this was a workflow issue on the Metro side ... users not clear that the box is checked by default before the report's agenda is published, but it will be viewable once the agenda is published - unless the box is checked after publication (and we alert Datamade to this change). We're clarifying the appropriate workflow with our users. Here's what I'm wondering: Would it be possible for the Councilmatic site to follow the same logic for viewable/hidden as InSite, and first check whether the report's agenda has been published? These reports were not visible on InSite while they were visible on the Councilmatic site, because InSite is looking at whether the reports are on a published agenda.

@shrayshray shrayshray added this to the September 2018 issues milestone Sep 13, 2018
@reginafcompton
Contributor

Thanks for this detailed report @shrayshray. I have a couple questions before suggesting a solution:

  1. Can you tell me about the MatterAgendaDate for a Board Report on the Legistar API (e.g., http://webapi.legistar.com/v1/metro/matters/5204)? Specifically, does a Board Report have a MatterAgendaDate only once the agenda has been published?

  2. It looks like the MatterStatusName can be "Agenda Ready", even when the agenda is not published. Can you confirm? (I was looking at this upcoming event, which contains reports listed as "Agenda Ready," though the Agenda is still a draft.)

@shrayshray
Collaborator Author

@reginafcompton 1. No, Board Reports are assigned Agenda Dates during the drafting process, not when the agenda is published.
2. Yes, once Board Reports are ready to be published the status is changed to "Agenda Ready". It's like a cue the drafting and approval process is complete and the Report is ready to be on an Agenda.

@shrayshray
Collaborator Author

The logic we discussed for consistent treatment of reports and PDF rendering is to:

  1. Check whether "Not Viewable via InSite" is True or False. If True, stop/do not display. If False, continue to step 2.
  2. Check the report type (this step becomes necessary once the archive of pre-2015 board documents and Board Boxes is added to Legistar). If the type is "Board Box", display the report and PDF. If not, continue to step 3.
  3. Check whether the report is on a published agenda. If it is, display the report and PDF. If not, stop/do not display.
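The three checks above can be sketched as a single function. This is only a minimal illustration, assuming a hypothetical `Report` record whose fields (`restrict_view`, `report_type`, `on_published_agenda`) are invented here and do not come from Legistar or Councilmatic.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical stand-in for a board report; field names are assumptions."""
    restrict_view: bool        # the "Not Viewable via InSite" checkbox
    report_type: str           # e.g. "Board Box"; other values are placeholders
    on_published_agenda: bool  # True once the report's agenda is published


def should_display(report: Report) -> bool:
    # Step 1: a checked "Not Viewable via InSite" box always hides the report.
    if report.restrict_view:
        return False
    # Step 2: Board Boxes are displayed regardless of agenda status.
    if report.report_type == "Board Box":
        return True
    # Step 3: everything else is displayed only once its agenda is published.
    return report.on_published_agenda
```

The order of the checks matters: the privacy flag is evaluated first, so even a Board Box stays hidden when the box is checked.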

@reginafcompton reginafcompton changed the title Board Reports: Unpublished reports showing on website! BEFORE OCT 12 – Board Reports: Unpublished reports showing on website! Sep 25, 2018
@reginafcompton
Contributor

reginafcompton commented Oct 15, 2018

Breaking down the various steps:

  • Check whether "Not Viewable via InSite" is True or False. If True, do not scrape it. Handled by: Skip bills with restricted view opencivicdata/scrapers-us-municipal#251

  • Check bill_type - always display "Board Box" (for pre-2015 reports) - this is the "local_classification" in the extras field or bill_type in the Bill model

  • Check whether the report is on a published agenda, by querying the database for the Bill, its related EventAgendaItem, and the related Event. If the status is "passed," then the Bill is on a published agenda.
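The third check, the Bill to EventAgendaItem to Event lookup, might look roughly like the query below. It is a sketch against an in-memory SQLite database with a simplified, hypothetical schema (table and column names are assumptions, as are the sample rows and the "confirmed" status value); the actual Councilmatic models are not shown in this thread.

```python
import sqlite3

# Hypothetical, simplified schema; the real models have more fields.
SCHEMA = """
CREATE TABLE event (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE bill (id INTEGER PRIMARY KEY, identifier TEXT);
CREATE TABLE event_agenda_item (
    id INTEGER PRIMARY KEY,
    event_id INTEGER REFERENCES event(id),
    bill_id INTEGER REFERENCES bill(id)
);
"""


def on_published_agenda(conn: sqlite3.Connection, identifier: str) -> bool:
    """Return True if the bill is an agenda item of any event with status 'passed'."""
    row = conn.execute(
        """
        SELECT 1
        FROM bill
        JOIN event_agenda_item ON event_agenda_item.bill_id = bill.id
        JOIN event ON event.id = event_agenda_item.event_id
        WHERE bill.identifier = ? AND event.status = 'passed'
        LIMIT 1
        """,
        (identifier,),
    ).fetchone()
    return row is not None


# Made-up sample data: TEST-1 sits on a "passed" event, TEST-2 does not.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO event VALUES (?, ?)", [(1, "passed"), (2, "confirmed")])
conn.executemany("INSERT INTO bill VALUES (?, ?)", [(1, "TEST-1"), (2, "TEST-2")])
conn.executemany(
    "INSERT INTO event_agenda_item VALUES (?, ?, ?)", [(1, 1, 1), (2, 2, 2)]
)
```

Calling `on_published_agenda(conn, "TEST-1")` finds a matching row, while a bill that appears only on events with another status, or on no event at all, yields no row.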

@reginafcompton
Contributor

reginafcompton commented Oct 20, 2018

@shrayshray - I've implemented the logic suggested above. I'd like to deploy it to the staging site and test that it works as expected. For this, could you add a few test bills, ideally one for each case? That being:

  • a bill that replicates the above described workflow error
  • a "Board Box" report
  • a bill that has "Not Viewable via InSite" as TRUE
  • a bill that does not appear on a published agenda, but has "Not Viewable via InSite" as FALSE

Let me know a good date for testing!

@reginafcompton reginafcompton changed the title BEFORE OCT 12 – Board Reports: Unpublished reports showing on website! Board Reports: Unpublished reports showing on website! Nov 1, 2018
@shrayshray
Collaborator Author

shrayshray commented Nov 2, 2018

@reginafcompton I'm working on 3 out of 4 of these and will hopefully have them available for you tomorrow. But regarding the 2nd item, "Board Box" report, we currently do not have any of these in Legistar - they're stored in the Board Archive and in the dedicated Board Box Archive and will not be available in Legistar until we migrate the Board Archive into the system.
Edit: To clarify, Board Boxes are not drafted in Legistar. Because they're currently drafted manually, we store them in the Board Archive.

@reginafcompton
Contributor

Here are my expectations for the three cases we plan to test:

  • a bill that recreates the above workflow error – the scraper should scrape it and the site should import it, but it should not be viewable, since it does not appear on a published agenda
  • a bill that has "Not Viewable via InSite" as TRUE – the scraper should skip this bill (it should not be in the OCD API, and it should not be in the Councilmatic DB)
  • a bill that does not appear on a published agenda, but has "Not Viewable via InSite" as FALSE – same as the first case

I am using the staging site as the test arena.

@shrayshray - after you add the three bills to Legistar and the scraper and import run, I'll need to rebuild the Solr index. Just let me know when you get started!

@shrayshray
Collaborator Author

@reginafcompton I'm about to get started. But first, to clarify: the workflow error was that the report should be viewable, but in its current version, meaning it reflects the edits made between the original agenda, which was cancelled, and the new agenda. The issue was that the report showing on the site was in its original state and did not reflect updates made in the meantime.

@reginafcompton
Contributor

@shrayshray - I believe you might be thinking of #347. That's a different issue handled by the new cache-refresh script.

For this issue, we are just testing whether or not board reports are hidden vs. not hidden.

Does that sound right?

@shrayshray
Collaborator Author

@reginafcompton got it. So you want me to stop the workflow reproduction after the meeting is cancelled, and not continue on and add it to a new meeting?

@reginafcompton
Contributor

reginafcompton commented Nov 15, 2018

I think the above workflow would go something like:

(1) you create a General comment report that is on an UNPUBLISHED agenda, and you check "Not Viewable Via InSite"
(2) then, I'll run the scrape and import - nothing should get scraped!
(3) then, you uncheck "Not Viewable Via InSite" (but do not publish the agenda).
(4) then, I'll run the scrape and import - the report should be scraped, but not viewable on Councilmatic!

Does that sound okay?

@shrayshray
Collaborator Author

shrayshray commented Nov 15, 2018

Sorry for the delay! 1 is complete.
Meeting is Planning and Programming Committee, 3/21/19
Report is 2018-0747

@reginafcompton
Contributor

reginafcompton commented Nov 15, 2018

@shrayshray - great! I confirmed that the scraper skips the bill. Query of OCD API.

Let's move on to 3 and 4.

@shrayshray
Collaborator Author

shrayshray commented Nov 15, 2018

Okay, just let me know when you're ready for me to move on to Item 3!

@reginafcompton
Contributor

I am! Did you uncheck "Not Viewable Via InSite"? If so, I'll run the scraper.

@shrayshray
Collaborator Author

Yes, just unchecked it.

@reginafcompton
Contributor

Great, I ran the scraper, and it pulled the bill into the OCD API. I then executed import_data, which added the bill to the Councilmatic database. Then, I updated the Solr index, and behold! The bill is not there.

https://lametro.datamade.us/search/?q=%222018-0747%22&search-all=on

@shrayshray - the system seems to be working as expected. In fact, with this example, we handled all the cases I wanted to test. How do you feel? Do you have any questions?

@shrayshray
Collaborator Author

Great! Did you want to test on a published Agenda? Or checking/unchecking "Not Viewable Via Insite" on reports not on an Agenda, but with the status of Agenda Ready? Or have we covered these already?

@shrayshray
Collaborator Author

@reginafcompton Could you please ensure the test meetings are removed from the Councilmatic site?
There are 2:

  1. Planning and Programming Committee, 3/21/19 -- location = "test" (it was my understanding "test" in location would prevent it from showing on the site ...)
  2. Planning and Programming Committee, 3/21/19 -- location = "1 Gateway Plaza ..." this one was my mistake ... the Legistar interface didn't update to show it was created, and I thought I'd maybe forgotten to save it, so I created the second meeting.

The Planning and Programming Committee meeting on 3/20/19 is the actual meeting that month and should remain visible.

@reginafcompton
Contributor

@shrayshray yes, I will handle these shortly!

@reginafcompton
Contributor

All right - stray items have been removed from the site. I think we might want to test the example you mention above:

Checking/unchecking "Not Viewable Via Insite" on reports not on an Agenda, but with the status of Agenda Ready

In both cases, the report should NOT appear on the site. When would be a good time to try that out?

@shrayshray
Collaborator Author

@reginafcompton Could you take a look at reports 2018-0749 and 2018-0750? These reports are showing up on the site, but the Agenda they're going to be on is not published yet.
They do NOT have the "Not Viewable via InSite" box checked; but the logic we established should hide them based on the Agenda not being published/public yet.

@reginafcompton
Contributor

reginafcompton commented Nov 28, 2018

@shrayshray - I did not deploy the code with our system of "checks" to production (since we were still testing it!). It looks like the system works, however, since the reports in question are not visible on the staging site (e.g., https://lametro.datamade.us/search/?q=%222018-0750%22&search-all=on).

I can go ahead and deploy the new system to production.

@shrayshray
Collaborator Author

@reginafcompton yes, please deploy -- thank you!

@reginafcompton
Contributor

It's deployed!

@shrayshray
Collaborator Author

@reginafcompton Thank you! I did find one report which looks like it slipped through the cracks of the new logic: 2018-0513.
It was on a (published) committee agenda previously, but has since been marked "Not Viewable Via InSite" in Legistar.

@reginafcompton
Contributor

@shrayshray - excuse the delay in responding. It appears that this bill continues to appear on agendas for two events in the Legistar API:

http://webapi.legistar.com/v1/metro/events/1389/eventitems (n.b. OCD API)
http://webapi.legistar.com/v1/metro/events/1491/eventitems (n.b. OCD API)

As long as the bill remains on those agendas, it will be rendered in Councilmatic.

If you remove the agenda items in Legistar, let me know, and we'll take a look at the Councilmatic data, again. If not, it seems like we can close this issue!

@reginafcompton reginafcompton self-assigned this Jan 17, 2019
@shrayshray
Collaborator Author

@reginafcompton I think this is still an issue, though it's resolved for the bill in question, 2018-0513. It was on a Committee agenda, then after the meeting someone checked "Not Viewable Via Insite" while making revisions to it. After the revisions were complete, "Not Viewable Via Insite" was unchecked and it was added to the agenda for a Regular Board Meeting.

The problem was it appeared on the Councilmatic site while the "Not Viewable Via Insite" box was checked. It wasn't appearing on metro.legistar.com at that time. Isn't the first step for the scraper to check whether "Not Viewable via InSite" is True or False, and to not scrape if True?

@reginafcompton
Contributor

Okay, I see. We have a tricky edge case here! I'll summarize why the Bill made it through the cracks.

  1. The scraper had already scraped the bill. Even though it was not visible on Legistar, the scraper did not have a mechanism for removing it from our database. We've raised this issue in the past.
  2. In the Councilmatic data system, the bill remained an agenda item on the Committee meeting. This raises some important points.

We want to fully ensure that "hidden" bills do not appear on the Councilmatic interface. I can think of several not entirely optimal solutions:

ONE ping Legistar when we import agenda items; this would entail running the import with an update_since timestamp for the last 24 hours (can we do that in cron?), so that we get the most up-to-date event data (n.b. the event scraper scrapes all events every night).
TWO ping Legistar in is_viewable. This would add load time to the search page, and it would assume that Legistar is up-and-running (that's not a great dependency to have).

# Treat the bill as viewable only if Legistar still serves its page.
r = requests.head(self.source_url)
return r.status_code == 200

THREE ping Legistar for the 20 (or fewer) bills that get rendered in the search view. This would limit load time but would, again, add a risky dependency on Legistar.
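To make the trade-off in options TWO and THREE concrete, here is a rough sketch of a capped remote check. It is not the code that was deployed. The `filter_viewable` helper and its `fetch_status` parameter are hypothetical, and hiding a bill when Legistar cannot be reached is an assumption about the safer default, not something decided in this thread.

```python
from typing import Callable, Iterable, List, Tuple

MAX_CHECKS = 20  # the cap on remote checks suggested in option THREE


def filter_viewable(
    bills: Iterable[Tuple[str, str]],
    fetch_status: Callable[[str], int],
) -> List[str]:
    """Return identifiers of bills whose source URL answers with HTTP 200.

    `bills` holds (identifier, source_url) pairs. `fetch_status` is a
    hypothetical callable returning an HTTP status code; in production it
    might wrap requests.head(url, timeout=2).status_code.
    """
    viewable = []
    for identifier, url in list(bills)[:MAX_CHECKS]:
        try:
            status = fetch_status(url)
        except Exception:
            # Assumption: if Legistar is unreachable, hide the bill rather
            # than risk exposing a private report.
            continue
        if status == 200:
            viewable.append(identifier)
    return viewable
```

Passing the status lookup in as a callable keeps the network dependency in one place, so the function can be exercised with a stub when Legistar is down.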

@shrayshray, could I think about this a bit more before landing on a solution?

@shrayshray
Collaborator Author

@reginafcompton of course, think it out! This is of the very highest priority to resolve, but I understand it's complicated and takes time.

@reginafcompton
Contributor

reginafcompton commented Feb 12, 2019

@shrayshray - we have a fix in place for this! Here's what's new.

The scraper now grabs all private bills (i.e., bills with MatterRestrictViewViaWeb set to true). The scrape, however, captures very limited information about these bills, so that our OpenCivicData API does not expose private data. The significant data points are the timestamp and the value of MatterRestrictViewViaWeb.

We then import these bills to the Metro database, but the view logic hides them.

In other words, the scraper helps us keep track of which bills are public or private and, hence, whether we should show or hide a bill.
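As a rough illustration of this approach (not the actual scraper code), the sketch below reduces a private matter to a stub and lets the view logic hide it. Only MatterRestrictViewViaWeb is named in this thread. The other input keys (MatterFile, MatterTitle, MatterLastModifiedUtc), the output keys, and both function names are assumptions, and the check covers only the privacy flag, not the published-agenda test.

```python
from typing import Any, Dict


def scrape_matter(matter: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a Legistar matter to the record we store; output keys are hypothetical."""
    record = {
        "identifier": matter["MatterFile"],
        "updated_at": matter["MatterLastModifiedUtc"],
        "restrict_view": bool(matter["MatterRestrictViewViaWeb"]),
    }
    if not record["restrict_view"]:
        # Only public bills carry descriptive detail such as the title.
        record["title"] = matter["MatterTitle"]
    return record


def is_viewable(record: Dict[str, Any]) -> bool:
    """View-side check: private bills are imported but never shown."""
    return not record["restrict_view"]
```

Because the private record is still scraped, a bill that flips from public to private gets an updated record on the next import instead of lingering in its old public state.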

Example

For example, report 2018-0660 is currently marked as private.

You can see its full title and other info in the Metro API (by including the token): https://webapi.legistar.com/v1/metro/matters/5391?token=SECRET_TOKEN

Conversely, its entry in the API omits any consequential detail.

Finally, a search for this report on Councilmatic returns zero results.


Can you let me know if you have any questions, and when/if you feel ready to close this issue?
