Enhance QuestionableQuesting: Export all posts (including non-threadmark posts) to epub #1454

xypha · 2024-09-03T05:44:34Z

Problem
Non-threadmark posts cannot be exported.

Steps to replicate:

Open 'Story Only' thread of With This Ring in browser tab.
Scenario 1: Run WebToEpub from toolbar icon → only 1 chapter is loaded. None of the threadmarks on the page are seen.
In the browser tab containing QQ post, click on "Threadmarks" button and select "View all 148 threadmarks" option.
Scenario 2: Run WebToEpub from toolbar icon → only 25 chapters are loaded.
In the browser tab containing QQ post, on the "Threadmarks" overlay, change the "Per page:" option to maximum (as of 2024.09.03, it is 400).
Scenario 3: Run WebToEpub from toolbar icon → all 148 chapters are loaded, but non-threadmark posts cannot be exported.

Describe the solution you'd like

Possible solution to Scenario 1 and 2:

In the WebToEpub popup tab, add warning text (maybe above the Chapters Count) telling users to ensure all threadmarks are loaded before export.
Add a section to the Wiki on how to load all threadmarks into the chapter list on QQ and similar sites (and link to it from the warning text if the above solution is implemented).

Possible solution to Scenario 3:

Add an advanced option to export all posts from all thread pages (in this case, using page navigation instead of threadmarks: pages 1 to 135, as of 2024.09.03).

Describe alternatives you've considered
Exporting through FicHub (also on GitHub) solves Scenario 1 and 2 - but fails to export images (which is a deal breaker).
For Scenario 3, the problem persists. Non-threadmark posts cannot be exported.

Additional context
Current version: 0.0.0.167
Browser: Firefox 129.0.2 (64-bit)
OS: Windows 11 23H2


xypha · 2024-09-03T05:47:04Z

Malware alert - file was not opened

gamebeaker · 2024-09-03T05:49:24Z

@xypha lucky you...

Kiradien · 2024-09-03T09:09:48Z

@gamebeaker At this point we can be fairly confident that zawa999 is violating GitHub's Terms of Service. Before deleting comments in the future, it's probably worth reporting their content for a chance at a full IP ban. I'd do it, but you seem to find & delete them before I see them xD

As for the mentioned issue, I've actually been thinking of something similar for all Xenforo forums, mostly from the perspective of bulk threadmark download through "Reader Mode"; however, it would work similarly for this case as well. One issue: it runs into a few faults, mostly with WebToEpub's indexing logic. For example, each chapter link is pre-defined before generation begins, which would be impossible under this paging structure. That can theoretically be worked around, but even if it can, it won't work exactly the same as for other sites.

I'll look into a potential solution for this, but if one is possible, it will likely require configuration in [WebToEpub > Advanced Options > Manually Select Parser] to differentiate it from the standard parser, unless someone has a better idea for handling this case.

dteviot · 2024-09-04T09:21:37Z

@Kiradien
Some notes

I don't understand Scenario 3

As regards Scenario 2, am I missing something? WebToEpub could be made to detect that there are multiple ToC pages, and fetch them. The URL for each page seems to be like: https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/threadmarks?per_page=25&page=4
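
A minimal sketch of the pagination idea above, assuming the `per_page` and `page` query parameters behave as in the URL quoted in this comment. `threadmarkPageUrls` is a hypothetical helper, not an existing WebToEpub function, and the threadmark total (148 in this report) is assumed to be known in advance.

```javascript
// Hypothetical helper, not part of WebToEpub.
// Builds the URL of every threadmark (ToC) page for a Xenforo thread,
// assuming the ".../threadmarks?per_page=N&page=M" pattern seen above.
function threadmarkPageUrls(threadUrl, totalThreadmarks, perPage = 25) {
    // A trailing slash is needed so "threadmarks" is appended to the
    // thread path instead of replacing its last segment.
    const base = threadUrl.endsWith("/") ? threadUrl : threadUrl + "/";
    const pageCount = Math.ceil(totalThreadmarks / perPage);
    const urls = [];
    for (let page = 1; page <= pageCount; ++page) {
        const url = new URL("threadmarks", base);
        url.searchParams.set("per_page", String(perPage));
        url.searchParams.set("page", String(page));
        urls.push(url.href);
    }
    return urls;
}

// Example: 148 threadmarks at 25 per page gives 6 ToC pages.
const tocPages = threadmarkPageUrls(
    "https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961",
    148
);
console.log(tocPages.length, tocPages[3]);
```

Each returned URL would still need to be fetched and scanned for chapter links; that part depends on the site's markup and is left out here.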

Kiradien · 2024-09-04T11:47:29Z

> I don't understand Scenario 3
>
> As regards Scenario 2, Am I missing something? WebToEpub could be made to detect there's multiple ToC pages, and fetch them. URL for each page seems to be like: https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/threadmarks?per_page=25&page=4

Yeah, this enhancement is entirely edge cases. I understand why you're confused; it's also why I will not add these fixes to the main parser.
A number of things are happening here, but it's mostly just that the author didn't threadmark his chapters. This is not a failing of WebToEpub's current design for Xenforo, but a general workaround that is actually useful in other cases.

The UI is also different for this archive page. Normally paging isn't really needed for threadmarks... it's a really odd edge case.

Some notes of my own:
I wouldn't normally consider this type of enhancement; it's only because "Reader Mode" allows retrieval of multiple chapters simultaneously that I'm working on it... It can be handy to download these books a bit quicker with less strain on the server side. It's also a fair bit of fun to dig into elements I don't usually touch.

xypha · 2024-09-04T19:15:53Z

@Kiradien
To clarify further on Scenario 3:
my intention was to suggest exporting non-chapter posts and comments... sometimes, reading non-threadmark posts (i.e., user comments, speculation/theory crafting and author's responses) is helpful or just plain fun.
An option to export all posts in a thread to epub for easy reading would be nice.

Kiradien · 2024-09-04T19:20:34Z

@xypha No worries, that is actually what I'm working on. It's just taking time since I'm poking around elements I don't usually touch in my free time. It might end up being a bit buggy on chapter titles (since the title is usually pulled from the 'threadmark'), but the goal should be feasible... Just a bit slower to release than most patches I work on.

My comments about 'Reader Mode' are simply because that is what I will personally use it to export; there is no intent to make it exclusive to that.

Jemeni11 · 2024-09-27T12:33:30Z

> Exporting through FicHub (also on GitHub) solves Scenario 1 and 2 - but fails to export images (which is a deal breaker).
> For Scenario 3, the problem persists. Non-threadmark posts cannot be exported.

Hi. I made a CLI tool for adding images to FicHub here. You'll have to install Python to use it, though.

Kiradien · 2024-09-28T14:59:20Z

Sorry for the delay on this; I was working on it on and off and was a little too intent on a 'perfect' solution. PR uploaded with a working solution. It's not the perfect solution I wanted, since all posts on each QQ 'page' are combined into a single chapter, but it does the job.

I'll push the PR through once the issues are resolved.

Trying to make each post a chapter with the current setup of WebToEpub is a bit too much of a nightmare.

In order to use the new parser, you need to open up advanced options and select the "Xenforo Batch Post Parser" under manual parsers.
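
The "one chapter per thread page" approach can be sketched roughly as follows. This is an illustration only, not the code in XenforoBatchParser.js: `chaptersFromThreadPages` and the `Page N` titles are assumptions, while the `page-N` URL suffix follows the page URL seen in the error log later in this thread. It also shows why the chapter list can be pre-defined before generation begins: only the number of the last page is needed.

```javascript
// Hypothetical sketch, not the actual XenforoBatchParser.js code.
// Builds a pre-defined chapter list with one entry per thread page,
// so every post on a page ends up in the same chapter.
function chaptersFromThreadPages(threadUrl, lastPage) {
    const base = threadUrl.endsWith("/") ? threadUrl : threadUrl + "/";
    return Array.from({ length: lastPage }, (_, i) => ({
        // Assumed URL pattern: <thread>/page-N
        sourceUrl: `${base}page-${i + 1}`,
        // Placeholder title: posts without a threadmark carry no title
        title: `Page ${i + 1}`,
    }));
}

// Example: the thread in this issue had 135 pages as of 2024.09.03.
const chapters = chaptersFromThreadPages(
    "https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961",
    135
);
console.log(chapters.length, chapters[52].sourceUrl);
```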

#1454 xenforo batch post dl

gamebeaker · 2024-09-28T15:31:23Z

Test versions for Firefox and Chrome have been uploaded to https://github.com/dteviot/WebToEpub/releases/tag/developer-build. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes.

xypha · 2024-09-29T18:54:49Z

@Kiradien
This works for With This Ring.

Thank you!

Saw a bunch of errors - mostly about fetching images, but also others.

No complaints though. THANK YOU! this is what I wanted.

Just going to share the errors here in case they might be relevant.

Several others once the epub was downloaded; see the attached text file, errors.txt (too long to post directly in the comment).
403 errors (3 in total, for different domains) that I had to click "skip" on to complete the epub download.

Example:

WARNING: Site '1.bp.blogspot.com' has sent an Access Denied (403) error.
You may need to logon to site, or browse site normally
until you get a Cloudflare "Are you a human" page or satisfy some other CAPTCHA
before WebToEpub can continue.
Fetch of image 'http://1.bp.blogspot.com/_M7D1hE_0cz0/S9GqWbJ-0pI/AAAAAAAADLk/AUuEqBBzDCE/s1600/GL4602.jpg' for page 'https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/page-53' failed with network error 403. This is an intermittent error. If you retry in a few minutes, it may succeed.
promptUserForRetry@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:57:19
onResponseError@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:48:25
checkResponseAndGetData@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:207:45
wrapFetchImpl@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:197:31
async*retryFetch@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:77:27
async*onResponseError@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:40:25
checkResponseAndGetData@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:207:45
wrapFetchImpl@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:197:31
async*wrapFetch@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/HttpClient.js:157:27
fetchImage@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/ImageCollector.js:335:40
fetchImages@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/ImageCollector.js:108:28
async*fetchImagesUsedInDocument/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:545:44
promise callback*fetchImagesUsedInDocument@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:543:14
fetchWebPageContent/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:528:31
promise callback*fetchWebPageContent@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:518:59
async*fetchWebPages/<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:491:69
fetchWebPages@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:491:41
async*fetchContent@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/Parser.js:463:21
fetchContentAndPackEpub@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:153:16
EventHandlerNonNull*addEventHandlers@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:464:9
window.onload@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:584:13
EventHandlerNonNull*main<@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:579:5
@moz-extension://9c47d7d8-1255-4f03-beca-5faaf67f2e8b/js/main.js:598:3

dteviot · 2024-09-29T19:40:17Z

@xypha

I had a quick skim through them. All I saw was WebToEpub reporting that it was unable to retrieve an image. (So you know it won't be in the epub, and it's not WebToEpub's fault.)

e.g. http://static.comicvine.com seems to be down/gone
404 errors speak for themselves.
etc.
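
For a long error file, a quick tally by host and status code makes this kind of skim easier. The sketch below is a standalone script, not a WebToEpub feature; `summarizeImageFailures` is a hypothetical name, and it assumes every entry follows the "Fetch of image '...' for page '...' failed with network error NNN" wording shown in the example above.

```javascript
// Hypothetical helper, not part of WebToEpub.
// Counts failed image fetches per "host (status)" pair in a saved error log.
// Assumes the image URLs in the log are absolute and well-formed.
function summarizeImageFailures(logText) {
    const pattern =
        /Fetch of image '([^']+)' for page '([^']+)' failed with network error (\d+)/g;
    const summary = {};
    for (const match of logText.matchAll(pattern)) {
        const host = new URL(match[1]).hostname;
        const key = `${host} (${match[3]})`;
        summary[key] = (summary[key] || 0) + 1;
    }
    return summary;
}

// Example using the log line quoted earlier in this thread.
const sampleLog =
    "Fetch of image 'http://1.bp.blogspot.com/_M7D1hE_0cz0/S9GqWbJ-0pI/AAAAAAAADLk/AUuEqBBzDCE/s1600/GL4602.jpg' " +
    "for page 'https://forum.questionablequesting.com/threads/with-this-ring-young-justice-si-story-only.8961/page-53' " +
    "failed with network error 403.";
console.log(summarizeImageFailures(sampleLog));
```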

Update XenforoBatchParser.js

dteviot · 2024-11-09T21:11:41Z

@xypha

Updated version (1.0.1.0) has been submitted to Firefox and Chrome stores.
Firefox version is available now.
Chrome might be available in a few hours (typical) to 21 days.

My thanks again to @Kiradien for his hard work

xypha · 2024-11-10T06:39:07Z

Thank you!

Repository owner deleted a comment Sep 3, 2024

gamebeaker mentioned this issue Sep 3, 2024

How to warn people about viruses/ malware? #1456

Closed

Kiradien self-assigned this Sep 3, 2024

Kiradien added the Status: In Progress label Sep 3, 2024

dteviot mentioned this issue Sep 11, 2024

Reduce http header cookie clutter #1481

Merged

Kiradien mentioned this issue Sep 28, 2024

#1454 xenforo batch post dl #1524

Merged

Kiradien added a commit that referenced this issue Sep 28, 2024

Merge pull request #1524 from Kiradien/#1454-Xenforo_Batch_Post_DL

ad728f3

#1454 xenforo batch post dl

Kiradien added Status: Completed and removed Status: In Progress labels Sep 28, 2024

Kiradien added a commit that referenced this issue Sep 29, 2024

Merge pull request #1527 from Kiradien/#1454-Xenforo_Batch_Post_DL

0696b59

Update XenforoBatchParser.js

dteviot closed this as completed Nov 9, 2024
