Answer: this is not possible by checking URLs alone. Multimedia files probably do not change often, though, so keeping a "do not update" list for multimedia would likely be more useful.
For text pages, it would be more useful to first check the date the page was created or last touched. See here and here for reference. (The date could be inaccurate, however.)
In Python there is a solution using urllib.
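A minimal sketch of what the urllib approach could look like, assuming the server sends a `Last-Modified` header and that the local file's modification time reflects the last crawl. The function names (`remote_last_modified`, `parse_http_date`, `should_overwrite`) are hypothetical and not part of this project; when the date is missing or unparseable, the sketch falls back to overwriting.

```python
import os
import urllib.request
from email.utils import parsedate_to_datetime


def parse_http_date(header):
    """Convert an HTTP date string to a POSIX timestamp, or None if unusable."""
    if not header:
        return None
    try:
        return parsedate_to_datetime(header).timestamp()
    except (TypeError, ValueError):
        # Malformed date: treat it as unknown rather than guessing.
        return None


def remote_last_modified(url, timeout=10):
    """Send a HEAD request and return the Last-Modified timestamp, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        header = response.headers.get("Last-Modified")
    return parse_http_date(header)


def should_overwrite(remote_timestamp, local_path):
    """Overwrite when the remote copy is newer, or when we cannot tell."""
    if remote_timestamp is None or not os.path.exists(local_path):
        return True
    return remote_timestamp > os.path.getmtime(local_path)
```

A crawler could call `should_overwrite(remote_last_modified(url), path)` before downloading. Not every server reports this header, and some report the time the response was generated, so the result is only a hint.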
Some other people have recommended using a checksum instead, but that is risky on dynamically generated websites (especially ones with ads) whose content constantly mutates (e.g. recommended reading lists): the checksum changes on every crawl even when the main content does not.
There is no perfect solution; a person would have to make a sound judgement as to which approach is better for a given site.
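For comparison, a checksum check could be sketched as below, assuming the digest from the previous crawl is stored somewhere alongside the file. The helper names (`sha256_hex`, `content_changed`) are hypothetical. Any byte-level difference, including a rotated ad or a changed recommendation list, counts as a change here.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def content_changed(new_bytes, previous_digest):
    """True when there is no stored digest or the new content hashes differently."""
    if previous_digest is None:
        return True
    return sha256_hex(new_bytes) != previous_digest
```

Unlike the date check, this requires downloading the full page before deciding whether to overwrite.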
Is it possible to only overwrite the file if the file changed since the last crawl?