Inconsistency between image filename saved to disk and link in source file #117

Open
coljac opened this issue Jul 13, 2023 · 1 comment

coljac commented Jul 13, 2023

When spidering a website, I found that pywebcopy saves images like this:

domain.com/dir/image_1.jpg.jpeg

But the source contains

<img src="./image_1.jpg">

In other words, it's appending a .jpeg extension where it shouldn't, so the link in the source no longer matches the file saved to disk.
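
To check how widespread the mismatch is in a saved copy, something like the following could list the `<img src>` values that do not resolve to a file on disk. This is only a rough sketch using the standard library; `ImgSrcCollector` and `find_broken_image_links` are hypothetical names for illustration and are not part of pywebcopy.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_broken_image_links(html_path):
    """Return the local img src values in html_path that are missing on disk."""
    with open(html_path, encoding="utf-8", errors="replace") as fh:
        parser = ImgSrcCollector()
        parser.feed(fh.read())

    base = os.path.dirname(os.path.abspath(html_path))
    broken = []
    for src in parser.sources:
        # Skip remote, protocol-relative and inline images.
        if "://" in src or src.startswith(("//", "data:")):
            continue
        # Drop any query string or fragment before resolving the path.
        local = src.split("?", 1)[0].split("#", 1)[0]
        target = os.path.normpath(os.path.join(base, local))
        if not os.path.isfile(target):
            broken.append(src)
    return broken
```

Running `find_broken_image_links` on a saved page where the file was written as `image_1.jpg.jpeg` but the link is `./image_1.jpg` would report `./image_1.jpg` as broken.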

@rajatomar788
Owner

Hey,
This shouldn't be happening.
If you want to figure out how the name is generated, take a look at urls.py, especially the url2path function.

This could also be caused by threading, so try running the job without threading.
Hope this helps.
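
For anyone digging into the name generation, one possible guard is sketched below. It assumes the extra suffix comes from appending an extension derived from the response's Content-Type, which is a guess and has not been confirmed against urls.py. `filename_with_extension` is a hypothetical helper, not pywebcopy's actual url2path.

```python
import mimetypes


def filename_with_extension(name, content_type):
    """Append an extension only if the name does not already imply this MIME type.

    Hypothetical helper for illustration; not pywebcopy's real logic.
    """
    # Strip parameters such as "; charset=utf-8" from the header value.
    mime = content_type.split(";", 1)[0].strip().lower()

    # If the existing extension already maps to the same MIME type
    # (.jpg and .jpeg both map to image/jpeg), leave the name alone.
    guessed_mime, _ = mimetypes.guess_type(name)
    if guessed_mime == mime:
        return name

    # Otherwise add whatever extension the stdlib suggests, if any.
    ext = mimetypes.guess_extension(mime) or ""
    return name + ext
```

With this check, `filename_with_extension("image_1.jpg", "image/jpeg")` leaves the name as `image_1.jpg` instead of producing `image_1.jpg.jpeg`, so the saved file would match the link in the source.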
