Inconsistency between image filename saved to disk and link in source file #117

Open

coljac opened this issue Jul 13, 2023 · 1 comment

coljac commented Jul 13, 2023

When spidering a website I have found that pywebcopy saves the images like this:

domain.com/dir/image_1.jpg.jpeg

But the source contains

<img src="./image_1.jpg">

In other words, it's appending a .jpeg extension where it oughtn't.
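
To make the mismatch concrete, here is a minimal sketch that checks whether a saved filename still matches the name referenced in the HTML, and strips a trailing extension that merely repeats the previous one. This is not pywebcopy's code: the helper names `strip_redundant_extension` and `matches_reference` and the `EQUIVALENT` table are hypothetical, chosen only for illustration.

```python
import posixpath

# Extensions treated as naming the same image type.
# This table is an assumption for the sketch, not taken from pywebcopy.
EQUIVALENT = {
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".jpe": "jpeg",
    ".png": "png",
    ".gif": "gif",
}


def strip_redundant_extension(filename):
    """Drop the last extension if it repeats the type of the one before it."""
    stem, last = posixpath.splitext(filename)
    _, prev = posixpath.splitext(stem)
    kind_last = EQUIVALENT.get(last.lower())
    kind_prev = EQUIVALENT.get(prev.lower())
    if kind_last is not None and kind_last == kind_prev:
        return stem
    return filename


def matches_reference(saved_path, src):
    """Compare the saved file's basename with the basename used in the src attribute."""
    return posixpath.basename(saved_path) == posixpath.basename(src)


saved = "domain.com/dir/image_1.jpg.jpeg"
src = "./image_1.jpg"
print(matches_reference(saved, src))                             # False
print(matches_reference(strip_redundant_extension(saved), src))  # True
```

Renaming files this way only works around the symptom after the fact; the name generation itself would still need fixing.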

Owner

rajatomar788 commented Jul 16, 2023

Hey,
This shouldn't be happening.
If you want to trace how the filename is generated, take a look at urls.py, especially the url2path function.

This could also be caused by threading, so try running the job without threading.
Hope this helps.
