Submitting pages to the wayback machine #166

8W9aG · 2024-12-03T23:37:21Z

Is there a way to allow this package to submit pages to the Wayback Machine? Either in the form of a request done and signed by this package, or by submitting the request to the Wayback Machine to do on the client's behalf?

Mr0grog · 2024-12-04T00:05:18Z

I’d be happy to accept a pull request for this!

It’s a natural fit, but I haven’t prioritized it since there are a huge number of other tools out there for it (in Python, I generally recommend savepagenow) and I haven’t even had time this year to finish out the other big deal stuff that is already half-done, like #58. It’s also a little complicated to do an ideal implementation, which supports the v2 API (see an example here that was abandoned because of complexity: palewire/savepagenow#31).

Mr0grog · 2024-12-04T00:15:50Z

Either in the form of a request done and signed by this package, or by submitting the request to the Wayback Machine to do on the client's behalf?

Also worth noting:

Members of the public cannot upload a WARC (or any other format for archived web pages) that will actually be displayed in the Wayback Machine (too many issues around proving that your content is really what was hosted somewhere, and not something you just invented yourself), although you can upload a WARC for other people to download as a collection item (using the internetarchive package).
BUT you can use the “save page now” API to ask the Wayback Machine to archive the live page itself (what I was talking about in my first reply). So that’s what we’d be doing here.
The Internet Archive also has a for-pay service called Archive-It you can use to crawl and save large websites (and do so repeatedly on a regular basis). If your needs are large-scale, this is probably the best thing to do.
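
For anyone picking this up, here is a rough sketch of what asking the Wayback Machine to archive a live page might look like. It only builds the request and does not send it. The endpoint `https://web.archive.org/save`, the `url` form field, and the `LOW key:secret` authorization scheme are my understanding of the v2 "save page now" API and are not confirmed anywhere in this thread, so check them against the Internet Archive's own documentation. `build_save_request` is a hypothetical helper name, not something this package provides.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for the "save page now" API; verify before relying on it.
SAVE_ENDPOINT = "https://web.archive.org/save"


def build_save_request(page_url, access_key=None, secret_key=None):
    """Build (but do not send) a request asking for page_url to be archived.

    Hypothetical helper. The form field name and the auth header format
    are assumptions about the v2 API, not taken from this thread.
    """
    headers = {"Accept": "application/json"}
    if access_key and secret_key:
        # Assumed scheme: archive.org S3-style keys in a "LOW" auth header.
        headers["Authorization"] = f"LOW {access_key}:{secret_key}"
    # Form-encode the target URL so it travels safely in the POST body.
    body = urlencode({"url": page_url}).encode("ascii")
    return Request(SAVE_ENDPOINT, data=body, headers=headers, method="POST")
```

Sending would be a matter of passing the result to `urllib.request.urlopen`. As far as I know the capture then runs asynchronously and has to be polled for completion, which is a big part of why a full v2 implementation is more complicated than it first looks.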

8W9aG · 2024-12-04T00:20:23Z

I wonder if technologies like SXG might go a long way to solving the problem of whether the content is manipulated by a middleman? Perhaps that is a bit orthogonal to the conversation here.

Either way, it's good to know that this package is keen for a save page now solution. I'll see if I can create a PR soon.

Mr0grog · 2024-12-04T00:31:13Z

Perhaps that is a bit orthogonal to the conversation here.

Yeah, I don’t work at the Internet Archive, so we are restricted to their current policies and tools as far as this stuff goes. 🙂

github-project-automation bot added this to Wayback Roadmap Dec 3, 2024

github-project-automation bot moved this to Backlog in Wayback Roadmap Dec 3, 2024
