Question: can it continue a suspended job? #55
Comments
Yes. Pywebcopy skips files that already exist, so you could consider the job resumed.
No. You have to rerun the script/command manually, i.e. with overwrite=False in a script or without the --overwrite flag on the command line.
Yes. Set debug=True or pass the --debug flag, and it will print logs that you can inspect manually.
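For reference, a minimal sketch of the scripted usage described above. The keyword arguments (bypass_robots, overwrite, debug) are taken from this thread and the CLI flags; the exact names and the save_webpage signature may differ between pywebcopy versions, so treat this as an assumption rather than the definitive API.

```python
# Minimal sketch: keyword names mirror the flags discussed in this thread
# (--bypass_robots, --overwrite, --debug); they may vary across pywebcopy versions.
from pywebcopy import save_webpage

save_webpage(
    url="http://y.tuwan.com/chatroom/3701",
    project_folder="./",
    bypass_robots=True,  # same as the --bypass_robots CLI flag
    overwrite=False,     # keep files already saved, effectively resuming the job
    debug=True,          # print logs so failed or stalled requests can be inspected
)
```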
I think he was talking about crawl delays between requests (i.e. timeouts / pauses / waits) to keep the load on the source low and avoid being banned. Is it possible to set such a delay between requests, like --wait in wget? It would be great for both sides: the source website won't be overloaded, and the crawler won't be banned in the middle of the process.
I don't think I got banned, and I wasn't talking about a delay between requests. What I was experiencing was that, after a while, the crawl would simply freeze, with no messages printed to the console for minutes, and I had to kill the process and start over (otherwise it wouldn't move).
Trying to clone a webpage, but it froze after a while, probably due to some network hiccups. I had to kill the process and start over (only to get stuck again, to be honest). Is it possible for this module to continue a suspended job, skipping files that have already been saved?
(Also, what are the timeout thresholds and retry limits for the requests? Can I specify these values? See the illustrative sketch at the end of this post.)
(Also, can I make it print some logs when a request fails or times out and is being retried?)
Windows 10, Python 3.8.1. The module was installed with pip install pywebcopy and invoked from the command line as python -m pywebcopy save_webpage http://y.tuwan.com/chatroom/3701 ./ --bypass_robots.
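For illustration only, here is what the requested timeout and retry configuration looks like at the level of the requests library (which pywebcopy appears to build on). This is a standalone sketch, not pywebcopy's API; the thread does not confirm whether pywebcopy exposes these knobs.

```python
# Standalone illustration of per-request timeouts, retry limits, and retry logging.
# This is NOT pywebcopy's API; it only shows the kind of configuration being asked for.
import logging

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# DEBUG logging makes urllib3 report each retry, so a stalled crawl is visible.
logging.basicConfig(level=logging.DEBUG)

session = requests.Session()
retries = Retry(total=3, backoff_factor=1.0, status_forcelist=[500, 502, 503, 504])
session.mount("http://", HTTPAdapter(max_retries=retries))
session.mount("https://", HTTPAdapter(max_retries=retries))

# (connect timeout, read timeout) in seconds; a hung connection fails instead of freezing.
response = session.get("http://y.tuwan.com/chatroom/3701", timeout=(5, 30))
```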