I have a spider that crawls only detail pages, and they are never skipped by this middleware.
Good catch; we need to add a `process_start_requests` method as well.
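Roughly along these lines (a sketch only, assuming the middleware keeps a database of request fingerprints; the class name, `self.db`, and the fingerprint handling are placeholders for illustration, not the middleware's actual internals):

```python
# Sketch: applying the crawl-once check to start requests through
# Scrapy's spider-middleware hook. self.db is an assumed placeholder
# for whatever store the middleware actually uses.
from scrapy.utils.request import request_fingerprint


class CrawlOnceSpiderMiddlewareSketch:

    def __init__(self):
        self.db = {}  # fingerprint -> timestamp of previously crawled requests

    def process_start_requests(self, start_requests, spider):
        for request in start_requests:
            if not request.meta.get('crawl_once', False):
                # Request did not opt in: pass it through untouched.
                yield request
            elif request_fingerprint(request) not in self.db:
                # Opted in but not crawled before: let it proceed.
                yield request
            # Otherwise drop it: it was crawled in a previous run.
```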
@bezkos Are you using `meta={'crawl_once': True}`? I tested the middleware with this simple spider, and it works correctly.
```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, meta={'crawl_once': True})

    def parse(self, response):
        yield {
            'title': response.css('h1 a::text').extract_first(),
        }
```
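For reference, I had the middleware enabled in settings.py the way the README describes; as far as I remember it looks like this, but verify the exact order values against the README:

```python
# settings.py: enable scrapy-crawl-once (order values as I recall
# from its README; double-check them there).
SPIDER_MIDDLEWARES = {
    'scrapy_crawl_once.CrawlOnceMiddleware': 100,
}
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawl_once.CrawlOnceMiddleware': 50,
}
```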
First run: the request is sent and stored.
```
{'crawl_once/initial': 0,
 'crawl_once/stored': 1,
 'downloader/request_bytes': 231,
 'downloader/request_count': 1}
```
Second run: the request is ignored.
```
{'crawl_once/ignored': 1,
 'crawl_once/initial': 1,
 'downloader/exception_count': 1,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1}
```
Note: requests generated from `start_urls` do not have `crawl_once` in their meta dictionary by default. To add it, override the `start_requests` method, as in the spider above.
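Alternatively, if I remember the README correctly, there is a `CRAWL_ONCE_DEFAULT` setting that makes `crawl_once` the default for every request, so `start_urls` would be covered without overriding `start_requests`. Treat the setting name as something to verify in the README:

```python
# settings.py: assumed option from the scrapy-crawl-once README that
# flips the default, tracking every request unless it explicitly sets
# meta={'crawl_once': False}. Verify the name before relying on it.
CRAWL_ONCE_DEFAULT = True
```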
Can you explain what problem you had?