Improving Data Tracking with 'Crawl Once' Filter in Scrapy #8

Open
ZubairShahzad opened this issue Nov 8, 2024 · 0 comments

Hi everyone,

I’m using the 'crawl once' filter in Scrapy to avoid scraping the same link more than once, which reduces overall proxy usage. Is there a way to adjust the middleware so that, when 'crawl once' is active and skips a previously scraped listing, it still yields the UID and ScrapeTime for that listing? I’d like this for better data tracking on the backend; if it isn’t possible, that’s okay.
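
To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is plain Python and does not use the middleware's actual API: `SeenListings`, `check`, and the `FirstScraped` and `Skipped` fields are hypothetical names, and I'm assuming ScrapeTime means the time of the current run rather than the original scrape.

```python
from datetime import datetime, timezone


class SeenListings:
    """Hypothetical in-memory stand-in for the filter's store of crawled links."""

    def __init__(self):
        # Maps a listing UID to the ISO timestamp of its first full scrape.
        self._seen = {}

    def check(self, uid, now=None):
        """Return a lightweight record for a repeat, or None for a new listing."""
        now = now or datetime.now(timezone.utc)
        if uid in self._seen:
            # Already scraped: the request would be skipped, but the
            # sighting is still reported so the backend can track it.
            return {
                "UID": uid,
                "ScrapeTime": now.isoformat(),
                "FirstScraped": self._seen[uid],
                "Skipped": True,
            }
        # New listing: remember it and let the normal scrape proceed.
        self._seen[uid] = now.isoformat()
        return None
```

The first `check` for a UID returns `None`, so the listing is scraped as usual; any later `check` for the same UID returns the small record instead of triggering a new request.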

Thank you!
