All Chicago Scrapers #16

Merged
merged 12 commits into from
Feb 3, 2015

Conversation

Contributor

@fgregg fgregg commented Dec 26, 2014

Starting on events scraper.

e.add_media_link(note='Recording',
                 url = events['Video']['url'],
                 type="recording",
                 media_type = '???')
Contributor Author

In Legistar systems, the Video links point to pages like: http://chicago.granicus.com/MediaPlayer.php?view_id=2&clip_id=401

With some hackery, we can extract a media URL from this, but I'm not sure of the best way to proceed.

Contributor

hurm, I'd just use application/octet-stream

Contributor Author

This doesn't seem right, because if you follow that link you get an HTML page.

Contributor

so pass text/html?
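
A minimal sketch of the idea discussed above: pick the media type from the URL's file extension and fall back to text/html for player pages such as the Granicus link. The helper name `guess_media_type`, the `KNOWN_MEDIA_TYPES` table, and the example.com URL in the usage note are assumptions for illustration, not part of this PR or of Pupa.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical lookup table; extend as needed.
KNOWN_MEDIA_TYPES = {
    '.mp4': 'video/mp4',
    '.mp3': 'audio/mpeg',
    '.pdf': 'application/pdf',
}


def guess_media_type(url, default='text/html'):
    """Return a media type based on the extension of the URL path.

    The query string is ignored, so a player page such as
    MediaPlayer.php?view_id=2&clip_id=401 falls through to the default.
    """
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return KNOWN_MEDIA_TYPES.get(extension, default)
```

With this, the Granicus player link would resolve to text/html, while a direct link such as http://example.com/clips/401.mp4 would resolve to video/mp4.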

elif datetime.datetime.utcnow().replace(tzinfo = pytz.utc) > when :
    status = 'confirmed'
else :
    status = 'passed'
Contributor Author

What's the point of the 'passed' status? Shouldn't this be calculated on the fly by Imago?

Contributor

Yeah, for passed, totally. Let's get a bug open on that, and leave it at confirmed

Contributor Author

On what repo?

Contributor

imago
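
A rough sketch of what computing 'passed' on the fly could look like: the scraper stores only 'confirmed', and the consumer derives 'passed' at read time from the event's start time. The function `display_status` and its parameters are hypothetical and are not Imago's actual code.

```python
import datetime


def display_status(stored_status, when, now=None):
    """Derive the status to show from the stored status and start time.

    `when` must be a timezone-aware datetime. A 'confirmed' event whose
    start time is already behind us is reported as 'passed'; every other
    status is returned unchanged.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    if stored_status == 'confirmed' and when < now:
        return 'passed'
    return stored_status
```

Keeping the derived value out of the scraped data means it never goes stale between scraper runs.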

@fgregg fgregg changed the title from "getting events working" to "All Chicago Scrapers" Jan 15, 2015
    e.add_document(note= events[doc_type]['label'],
                   url = events[doc_type]['url'],
                   media_type="application/pdf")
except ValueError :
Contributor Author

Sometimes 'Notice' and 'Agenda' are the same file. Sometimes not. Pupa doesn't allow the same file to appear more than once in the document list (it throws a ValueError). How do we want to handle this?

Contributor

Hurm.

We have an on_duplicate="ignore" with Billy; we could implement something similar with Pupa, I guess. That, or make the checking explicit or something.
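
One way the "explicit checking" option might look: filter out repeated URLs before anything is handed to the event, so the ValueError never fires. This is only a sketch; `unique_documents` is a hypothetical helper, and it assumes `events` is a dict whose 'Notice' and 'Agenda' entries carry 'label' and 'url' keys as in the diff above.

```python
def unique_documents(events, doc_types=('Notice', 'Agenda')):
    """Return the documents for doc_types, skipping repeated URLs.

    The first document seen for a given URL wins; later entries that
    point at the same file are dropped.
    """
    seen_urls = set()
    documents = []
    for doc_type in doc_types:
        doc = events.get(doc_type)
        if not doc:
            continue  # this event has no document of that type
        if doc['url'] in seen_urls:
            continue  # same file already listed under another label
        seen_urls.add(doc['url'])
        documents.append(doc)
    return documents
```

Each returned entry could then be passed to e.add_document without wrapping the call in a try/except.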

Contributor Author

fgregg commented Jan 23, 2015

Okay, as soon as opencivicdata/python-opencivicdata#22 is merged, I think this is ready to go.

@paultag paultag merged commit 461214a into opencivicdata:master Feb 3, 2015
feydan pushed a commit to feydan/scrapers-us-municipal that referenced this pull request Nov 14, 2019
…hold

decrease warning threshold for sentry notifications from critical to warn

2 participants