Improving Data Tracking with 'Crawl Once' Filter in Scrapy #8

ZubairShahzad · 2024-11-08T10:51:26Z

Hi everyone,

I’m using the 'crawl once' filter in Scrapy to avoid scraping the same link more than once, which helps reduce overall proxy usage. Is there a way to adjust the middleware so that, when 'crawl once' is active and detects a previously scraped listing, it can still yield the UID and ScrapeTime for that listing? If not, that's okay, but I’m hoping to improve data tracking on the backend.
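
To make the request concrete, here is a minimal sketch of the desired behaviour, written in plain Python rather than against the middleware's real code. Everything in it is an assumption for illustration: the `SeenStore` class, its `check` method, the SQLite table, and the `first_scraped` field are hypothetical names, and `ScrapeTime` is taken to mean the time of the current run.

```python
import sqlite3
import time


class SeenStore:
    """Hypothetical store of listings that were already scraped."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen "
            "(uid TEXT PRIMARY KEY, first_scraped REAL)"
        )

    def check(self, uid, now=None):
        """Return None for a new listing (and remember it), or a small
        tracking record for a listing that was scraped before."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT first_scraped FROM seen WHERE uid = ?", (uid,)
        ).fetchone()
        if row is None:
            # First sighting: record it so later runs can skip the request.
            self.db.execute("INSERT INTO seen VALUES (?, ?)", (uid, now))
            self.db.commit()
            return None
        # Already scraped: no request needed, but still report the listing.
        return {"UID": uid, "ScrapeTime": now, "first_scraped": row[0]}
```

In a spider, a check like this would probably have to run in the callback that builds the listing requests, so the spider can yield the tracking record as an item instead of a `Request`. A downloader middleware's `process_request` can drop a request by raising `IgnoreRequest`, but it cannot yield items itself.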

Thank you!
