Should get_memento() ignore the mode in archive.org URLs? #115

Open
Mr0grog opened this issue Mar 8, 2023 · 0 comments
Labels: question (Further information is requested)

Mr0grog commented Mar 8, 2023

Currently, get_memento() can be called in a few different ways:

  • get_memento(archived_url) requests a memento using the URL, timestamp, and mode that are baked into the URL. (archived_url means a URL like https://web.archive.org/web/[YYYYMMDDHHmmss][mode]/[url])
  • get_memento(cdx_record, mode=mode) requests a memento with the URL and timestamp from the CDX record object, and the given mode (where mode defaults to original)
  • get_memento(url, timestamp, mode=mode) requests a memento of the given URL at the given timestamp with the given mode (again, mode is optional and defaults to original)
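To make the three call forms concrete, here is a minimal sketch of how they resolve to a (url, timestamp, mode) triple under the current behavior described above. It is not the library's code: `Mode`, `CdxRecord`, and `resolve_current` are hypothetical stand-ins, and the `Mode` values are just the URL suffixes from the examples in this issue (no suffix meaning view mode). It only handles the 14-digit timestamp form shown above; real Wayback URLs can also carry shorter timestamps and other suffixes.

```python
import re
from collections import namedtuple
from datetime import datetime
from enum import Enum


class Mode(Enum):
    # Hypothetical stand-in for the library's Mode enum; values are the
    # URL suffixes used in this issue ("" means view mode).
    original = "id_"
    view = ""
    javascript = "js_"
    css = "cs_"
    image = "im_"


# Hypothetical stand-in for a CDX record, with only the fields used here.
CdxRecord = namedtuple("CdxRecord", ["url", "timestamp"])

# Matches https://web.archive.org/web/[YYYYMMDDHHmmss][mode]/[url]
ARCHIVED_URL = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})([a-z]{2}_)?/(.+)$"
)


def resolve_current(target, timestamp=None, mode=Mode.original):
    """Return (url, timestamp, mode) following today's behavior."""
    if isinstance(target, CdxRecord):
        # CDX record: URL and timestamp from the record, mode from the parameter.
        return target.url, target.timestamp, mode
    match = ARCHIVED_URL.match(target)
    if match:
        # Archived URL: the mode baked into the URL wins over the parameter.
        ts, suffix, url = match.groups()
        return url, datetime.strptime(ts, "%Y%m%d%H%M%S"), Mode(suffix or "")
    # Plain URL plus timestamp: mode comes from the parameter.
    return target, timestamp, mode
```

Note how only the archived-URL branch ignores the `mode` parameter; that asymmetry is what this issue is about.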

Folks using this library will usually want mode=Mode.original, which is what we use by default. BUT since an archive URL has the mode baked in, we obey whatever mode is in the URL.

The problem is that mode is a somewhat advanced concept, and using it correctly requires extra thought about what you're asking for. Folks are prone to copying a URL from their browser and dropping it in here to try things out, or to passing cdx_record.view_url instead of the CDX record itself, without realizing that they are changing modes (or what that even means!). For example, #109 uncovered a legitimate issue with view mode, but the user didn't actually want to be using view mode at all! (Once I explained that, it turned out the issue wasn't even a blocker for him: he switched to original mode and was good to go.)

So: should calling get_memento(archived_url) ignore the mode in the URL and use the mode parameter instead (defaulting to original, as in all other cases)? For example:

client.get_memento("https://web.archive.org/web/20230101000000/https://www.epa.gov/")

Currently, this gets you a memento in view mode, since the URL has no mode suffix. With the change I'm considering, you'd get original mode here instead. If you wanted view mode, you'd have to ask for it explicitly:

client.get_memento("https://web.archive.org/web/20230101000000/https://www.epa.gov/", mode=Mode.view)

It would also mean all these calls get you the same result, instead of different ones:

client.get_memento("https://web.archive.org/web/20230101000000/https://www.epa.gov/")
client.get_memento("https://web.archive.org/web/20230101000000id_/https://www.epa.gov/")
client.get_memento("https://web.archive.org/web/20230101000000js_/https://www.epa.gov/")
client.get_memento("https://web.archive.org/web/20230101000000cs_/https://www.epa.gov/")
client.get_memento("https://web.archive.org/web/20230101000000im_/https://www.epa.gov/")
# Note different mode values ---------------------------------^^^
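A minimal sketch of the proposed behavior, assuming the change simply discards the URL's mode suffix and uses the `mode` parameter (defaulting to original). `Mode` and `resolve_proposed` are hypothetical stand-ins, not the library's API, and like the examples above it only handles 14-digit timestamps and two-letter suffixes.

```python
import re
from datetime import datetime
from enum import Enum


class Mode(Enum):
    # Hypothetical stand-in for the library's Mode enum (URL suffix values).
    original = "id_"
    view = ""
    javascript = "js_"
    css = "cs_"
    image = "im_"


# Matches https://web.archive.org/web/[YYYYMMDDHHmmss][mode]/[url]
# The mode suffix is matched by a non-capturing group so it can be dropped.
ARCHIVED_URL = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)$"
)


def resolve_proposed(archived_url, mode=Mode.original):
    """Return (url, timestamp, mode), ignoring any mode suffix in the URL."""
    match = ARCHIVED_URL.match(archived_url)
    if not match:
        raise ValueError(f"not an archive.org memento URL: {archived_url}")
    ts, url = match.groups()
    # Only the explicit parameter decides the mode.
    return url, datetime.strptime(ts, "%Y%m%d%H%M%S"), mode


base = "https://web.archive.org/web/20230101000000{}/https://www.epa.gov/"
results = {resolve_proposed(base.format(s)) for s in ["", "id_", "js_", "cs_", "im_"]}
print(len(results))  # 1: all five URLs resolve to the same request
```

Asking for view mode then requires passing it explicitly, e.g. `resolve_proposed(base.format(""), mode=Mode.view)`.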
Mr0grog added the question label (Further information is requested) Mar 8, 2023
Mr0grog changed the title from "Should get_memento() assume original mode unless the mode parameter is explicitly set?" to "Should get_memento() ignore the mode in archive.org URLs?" Mar 8, 2023
Mr0grog moved this to Unreleased in Wayback Roadmap Dec 13, 2023
Mr0grog moved this from Unreleased to Backlog in Wayback Roadmap Dec 13, 2023
Projects
Status: Backlog