Enhance QuestionableQuesting: Export all posts (including non-threadmark posts) to epub #1454

Closed
xypha opened this issue Sep 3, 2024 · 14 comments
@xypha

xypha commented Sep 3, 2024

Problem
Non-threadmark posts cannot be exported.

Steps to replicate:

  • Open 'Story Only' thread of With This Ring in browser tab.
  • Scenario 1: Run WebToEpub from toolbar icon → only 1 chapter is loaded. None of the threadmarks on the page are seen.
  • In the browser tab containing QQ post, click on "Threadmarks" button and select "View all 148 threadmarks" option.
  • Scenario 2: Run WebToEpub from toolbar icon → only 25 chapters are loaded.
  • In the browser tab containing QQ post, on the "Threadmarks" overlay, change the "Per page:" option to maximum (as of 2024.09.03, it is 400).
  • Scenario 3: Run WebToEpub from toolbar icon → all 148 chapters are loaded, but non-threadmark posts cannot be exported.

WebtoEpub issue 1
WebtoEpub issue 2
WebtoEpub issue 3

Describe the solution you'd like

Possible solution to Scenario 1 and 2:

  • In WebToEpub popup tab, add warning text (maybe above the Chapters Count), telling users to ensure all threadmarks are loaded before export.
  • Add a section to the Wiki on how to load all threadmarks into the chapter list on QQ and similar sites (and link to it from the warning text if the above solution is implemented).

Possible solution to Scenario 3:

  • Add advanced option to export all posts from all thread pages (in this case, using page navigation instead of threadmarks - pages 1 to 135, as of 2024.09.03).

Describe alternatives you've considered
Exporting through FicHub (also on GitHub) solves Scenario 1 and 2 - but fails to export images (which is a deal breaker).
For Scenario 3, the problem persists. Non-threadmark posts cannot be exported.

Additional context
Current version: 0.0.0.167
Browser: Firefox 129.0.2 (64-bit)
OS: Windows 11 23H2

@xypha
Author

xypha commented Sep 3, 2024

Malware alert - file was not opened
2024-09-03 @ 11:15:50

Repository owner deleted a comment Sep 3, 2024
@gamebeaker
Collaborator

@xypha lucky you...

@Kiradien
Collaborator

Kiradien commented Sep 3, 2024

@gamebeaker At this point we can be fairly confident that zawa999 is violating GitHub's Terms of Service. Before deleting comments in the future, it's probably worth reporting their content for a chance at a full IP ban. I'd do it, but you seem to find & delete them before I see them xD

As for the mentioned issue, I've actually been thinking of something similar for all Xenforo forums, mostly from the perspective of bulk threadmark download through "Reader Mode"; however, it would work similarly for this case as well. One issue: it runs into a few faults, mostly with WebToEpub's indexing logic. For example, each chapter link is pre-defined before generation begins, which would be impossible under this paging structure. That can theoretically be worked around, but even if it can, it won't work exactly the same as other sites.

I'll look at a potential solution on this but if one is possible, it will likely require configuration in [WebToEpub > Advanced Options > Manually Select Parser] to differentiate it from the standard, unless someone has a better idea for handling this case.

@dteviot
Owner

dteviot commented Sep 4, 2024

@Kiradien
Some notes

I don't understand Scenario 3

As regards Scenario 2, Am I missing something? WebToEpub could be made to detect there's multiple ToC pages, and fetch them. URL for each page seems to be like: https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/threadmarks?per_page=25&page=4
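
A rough sketch of that idea, for illustration only. The helper name `threadmarkIndexUrls` is hypothetical and is not part of WebToEpub; the only thing taken from the thread is the `threadmarks?per_page=25&page=N` URL pattern, and the total threadmark count (148 here) is assumed to be known up front.

```javascript
// Hypothetical helper, not WebToEpub code: lists every threadmark index
// page for a thread, assuming the "threadmarks?per_page=..&page=.." pattern.
function threadmarkIndexUrls(threadUrl, totalThreadmarks, perPage = 25) {
    // A trailing slash is needed so "threadmarks" is appended to the path
    // rather than replacing the last path segment.
    const base = threadUrl.endsWith("/") ? threadUrl : threadUrl + "/";
    const pageCount = Math.max(1, Math.ceil(totalThreadmarks / perPage));
    const urls = [];
    for (let page = 1; page <= pageCount; ++page) {
        const url = new URL("threadmarks", base);
        url.searchParams.set("per_page", String(perPage));
        url.searchParams.set("page", String(page));
        urls.push(url.href);
    }
    return urls;
}

// 148 threadmarks at 25 per page need 6 index pages.
const exampleIndexUrls = threadmarkIndexUrls(
    "https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/",
    148
);
console.log(exampleIndexUrls.length);
```

Each returned URL would still need to be fetched and scanned for threadmark links; that part depends on the page markup and is left out here.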

@Kiradien
Collaborator

Kiradien commented Sep 4, 2024

> I don't understand Scenario 3
>
> As regards Scenario 2, Am I missing something? WebToEpub could be made to detect there's multiple ToC pages, and fetch them. URL for each page seems to be like: https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/threadmarks?per_page=25&page=4

Yeah, this enhancement is entirely edge-cases; I understand why you're confused, it's also why I will not add these fixes to the main parser.
A number of things are happening here, but it's mostly just that the author didn't threadmark his chapters. This is not a failing of WebToEpub's current design for Xenforo, but a general work-around that is actually useful in other cases.

The UI is also different for this archive page; normally paging isn't really needed for threadmarks... it's a really odd edge case.


Some notes of my own:
I wouldn't normally consider this type of enhancement; it's only because "Reader Mode" allows retrieval of multiple chapters simultaneously that I'm working on it... It can be handy to download these books a bit quicker with less strain on the server side. It's also a fair bit of fun to dig into elements I don't usually touch.

@xypha
Author

xypha commented Sep 4, 2024

@Kiradien
To clarify further on Scenario 3:
my intention was to suggest exporting non-chapter posts and comments... sometimes, reading non-threadmark posts (i.e., user comments, speculation/theory crafting and author's responses) is helpful or just plain fun.
An option to export all posts in a thread to epub for easy reading would be nice.

@Kiradien
Collaborator

Kiradien commented Sep 4, 2024

@xypha No worries, that is actually what I'm working on. Just taking time since I'm poking around elements I don't usually touch in my free time. It might end up being a bit buggy on chapter titles (Since the title is usually pulled from the 'threadmark'), but the goal should be feasible... Just a bit slower to release than most patches I work on.

My comments about 'Reader Mode' are simply because that is what I will personally use it to export; there is no intent to make it exclusive to that.

@Jemeni11
Contributor

> Exporting through FicHub (also on GitHub) solves Scenario 1 and 2 - but fails to export images (which is a deal breaker).
> For Scenario 3, the problem persists. Non-threadmark posts cannot be exported.

Hi. I made a CLI tool for adding images to FicHub here. You'll have to install Python to use it, though.

@Kiradien
Collaborator

Kiradien commented Sep 28, 2024

Sorry for the delay on this; I was working on it on and off and was a little too intent on a 'perfect' solution. PR uploaded with a working solution. It's not the perfect solution I wanted, since all posts on each QQ 'page' are mapped to a single chapter, but it does the job.

I'll push the PR through once the issues are resolved

Trying to make each post a chapter with the current setup of WebToEpub is a bit too much of a nightmare.
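
To illustrate the "one thread page per chapter" mapping, here is a minimal sketch. It is not the code from the PR: the function name `chaptersFromThreadPages` and the `{sourceUrl, title}` object shape are assumptions made for the example, and the `page-N` URL pattern is taken from the page URLs that appear in error messages in this thread. The last page number (135 for this thread at the time of the original report) is assumed to be known.

```javascript
// Hypothetical sketch, not the actual "Xenforo Batch Post Parser" code.
// Builds one chapter entry per thread page using the ".../page-N" pattern.
function chaptersFromThreadPages(threadUrl, lastPage) {
    const base = threadUrl.endsWith("/") ? threadUrl : threadUrl + "/";
    const chapters = [];
    for (let page = 1; page <= lastPage; ++page) {
        chapters.push({
            sourceUrl: base + "page-" + page,
            // No threadmark is available for a whole page, so the title
            // falls back to the page number.
            title: "Page " + page
        });
    }
    return chapters;
}

const exampleChapters = chaptersFromThreadPages(
    "https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/",
    135
);
console.log(exampleChapters.length + " chapters, last: " + exampleChapters[134].sourceUrl);
```

Because the list depends only on the page count, it can be built before any page is fetched, which fits the constraint that chapter links are defined before generation begins.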

In order to use the new parser, you need to open up advanced options and select the "Xenforo Batch Post Parser" under manual parsers.

Kiradien added a commit that referenced this issue Sep 28, 2024
@gamebeaker
Collaborator

Test versions for Firefox and Chrome have been uploaded to https://github.com/dteviot/WebToEpub/releases/tag/developer-build. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes.

@xypha
Author

xypha commented Sep 29, 2024

@Kiradien
This works for With This Ring.

Thank you!

Saw a bunch of errors - mostly about fetching images, but also others.

No complaints though. THANK YOU! This is what I wanted.

Just going to share the errors here in case they might be relevant.

  • several others once the epub was downloaded -- see attached text file (too long to post directly in the comment)

    errors.txt

  • 403 errors (3 in total for different domains) that I had to click on skip to complete the epub download.

Example :

WARNING: Site '1.bp.blogspot.com' has sent an Access Denied (403) error.
You may need to logon to site, or browse site normally
until you get a Cloudflare "Are you a human" page or satisfy some other CAPTCHA
before WebToEpub can continue.
Fetch of image 'http://1.bp.blogspot.com/_M7D1hE_0cz0/S9GqWbJ-0pI/AAAAAAAADLk/AUuEqBBzDCE/s1600/GL4602.jpg' for page 'https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/page-53' failed with network error 403. This is an intermittent error. If you retry in a few minutes, it may succeed.
promptUserForRetry@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:57:19
onResponseError@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:48:25
checkResponseAndGetData@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:207:45
wrapFetchImpl@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:197:31
async*retryFetch@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:77:27
async*onResponseError@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:40:25
checkResponseAndGetData@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:207:45
wrapFetchImpl@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:197:31
async*wrapFetch@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:157:27
fetchImage@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/ImageCollector.js:335:40
fetchImages@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/ImageCollector.js:108:28
async*fetchImagesUsedInDocument/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:545:44
promise callback*fetchImagesUsedInDocument@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:543:14
fetchWebPageContent/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:528:31
promise callback*fetchWebPageContent@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:518:59
async*fetchWebPages/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:491:69
fetchWebPages@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:491:41
async*fetchContent@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:463:21
fetchContentAndPackEpub@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:153:16
EventHandlerNonNull*addEventHandlers@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:464:9
window.onload@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:584:13
EventHandlerNonNull*main<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:579:5
@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:598:3

@dteviot
Owner

dteviot commented Sep 29, 2024

@xypha

I had a quick skim through them. All I saw was WebToEpub reporting that it was unable to retrieve an image. (So you know it won't be in the epub, and it's not WebToEpub's fault.)

e.g. http://static.comicvine.com seems to be down/gone
404 errors speak for themselves.
etc.
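
For anyone who wants to skim a long error log the same way, a small sketch follows. It assumes every failure line uses the "Fetch of image '...' for page '...' failed with network error N" wording quoted earlier in this thread; the function name `summarizeImageFailures` is made up for the example and is not part of WebToEpub.

```javascript
// Hypothetical helper: counts image fetch failures per host and status,
// assuming the message wording quoted earlier in this thread.
function summarizeImageFailures(logText) {
    const pattern = /Fetch of image '([^']+)' for page '([^']+)' failed with network error (\d+)/g;
    const counts = new Map();
    for (const match of logText.matchAll(pattern)) {
        let host;
        try {
            host = new URL(match[1]).hostname;
        } catch (e) {
            host = "(unparseable URL)";
        }
        const key = host + " " + match[3];
        counts.set(key, (counts.get(key) || 0) + 1);
    }
    return counts;
}

// Usage: paste the log text into a string and print the grouped counts.
const sampleLog = "Fetch of image 'http://1.bp.blogspot.com/a.jpg' for page " +
    "'https://forum.questionablequesting.com/threads/x.1/page-53' failed with network error 403.";
for (const [key, count] of summarizeImageFailures(sampleLog)) {
    console.log(key + ": " + count);
}
```

Grouping by host makes it easier to see whether a whole image host is down or blocking requests, as opposed to a single missing file.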

Kiradien added a commit that referenced this issue Sep 29, 2024
@dteviot
Owner

dteviot commented Nov 9, 2024

@xypha

Updated version (1.0.1.0) has been submitted to Firefox and Chrome stores.
Firefox version is available now.
Chrome might be available in a few hours (typical) to 21 days.

My thanks again to @Kiradien for his hard work

@dteviot dteviot closed this as completed Nov 9, 2024
@xypha
Author

xypha commented Nov 10, 2024

Thank you!
