Add parser for mtgstory.com #1500

Darthagnon · 2024-09-21T17:42:45Z

WIP. Adds a parser for mtgstory.com (which redirects to https://magic.wizards.com/en/story). It seems to work on most versions of the website (e.g. the current live version and the archive.org version from 2-3 years ago); the archive.org version from 10 years ago is untested. Fallback support for mtglore.com is still missing.

Based on MagicWizardsParser.js v0.6 from https://github.com/Darthagnon/web2epub-tidy-script

Darthagnon · 2024-09-21T17:43:21Z

For some reason, gitignore was set to ignore additions to the parsers folder, so I have commented that line.

gamebeaker · 2024-09-21T18:48:00Z

@Darthagnon can you fix the ESLint errors? (Just push to your branch; I think the pull request should update automatically.)
https://github.com/dteviot/WebToEpub/actions/runs/10974479483/job/30473842554?pr=1500

dteviot · 2024-09-21T19:41:21Z

@Darthagnon

Replacing

findCoverImageUrl(dom) {
    // Try to find an image inside the '.swiper-slide' or inside an 'article'
    let imgElement = dom.querySelector(".swiper-slide img, article img");

    // If an image is found, return its 'src' attribute
    if (imgElement) {
        return imgElement.getAttribute("src");
    // Check if the URL starts with '//' (protocol-relative URL)
        if (imgSrc && imgSrc.startsWith("//")) {
            // Add 'https:' to the start of the URL
            imgSrc = "https:" + imgSrc;
        }
    }
    // Fallback if no image was found
    return null;
}

with

    findCoverImageUrl(dom) {
        return util.getFirstImgSrc(dom, ".swiper-slide img, article img");
    }

will fix your problems.
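In the replaced version, `return` exits before the protocol-relative check can run, and `imgSrc` is never declared, so that block is dead code. The sketch below shows what the check was aiming for, done with the standard `URL` constructor. `resolveImgSrc` is a hypothetical helper, not part of WebToEpub, and nothing here assumes how `util.getFirstImgSrc` treats URLs internally.

```javascript
// Hypothetical helper (not part of WebToEpub): resolve an <img> src value,
// including protocol-relative ("//host/path") and relative forms,
// against the URL of the page it came from.
function resolveImgSrc(src, pageUrl) {
    if (!src) {
        return null; // no image found
    }
    try {
        // The URL constructor applies the page's scheme to "//host/path".
        return new URL(src, pageUrl).href;
    } catch {
        return null; // src could not be parsed as a URL
    }
}
```

For example, `resolveImgSrc("//media.wizards.com/images/cover.jpg", "https://magic.wizards.com/en/story")` returns `"https://media.wizards.com/images/cover.jpg"` (the image host here is illustrative).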

dteviot · 2024-09-21T19:47:34Z

@Darthagnon

For some reason, gitignore was set to ignore additions to the parsers folder, so I have commented that line.

You can just remove that line.

dteviot · 2024-09-21T19:57:41Z

@Darthagnon

Commented out line 5

//parserFactory.register("mtglore.com", () => new MagicWizardsParser());

should be removed.

This

        if (authorPattern.test(href)) {
            return true;
        } else {
            return false;
        }

should be

        return authorPattern.test(href));

I'm not convinced that

if (window.location.hostname.includes("web.archive.org"))

does what you think it does.
I'm lazy. Please provide links to the two cases it should distinguish between, and I'll check for myself.
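One likely reason for doubt: the parser runs inside the WebToEpub extension, so `window.location` presumably describes the extension's own page (popup.html) rather than the story page being parsed. A minimal sketch of checking the parsed page's URL instead, assuming that URL is available as a string; `isWaybackUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: decide from the *parsed page's* URL whether it came
// from the Wayback Machine, instead of reading window.location, which
// (assumption) refers to the extension's own page when run inside WebToEpub.
function isWaybackUrl(pageUrl) {
    try {
        return new URL(pageUrl).hostname === "web.archive.org";
    } catch {
        return false; // not a valid absolute URL
    }
}
```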

gamebeaker · 2024-09-22T13:55:05Z

@Darthagnon Maybe you can change .gitignore to ignore the new files that npm install creates (plugin/jszip/dist/jszip.min.js and package-lock.json).

Darthagnon · 2024-09-22T16:22:03Z

This

       if (authorPattern.test(href)) {
           return true;
       } else {
           return false;
       }

should be

        return authorPattern.test(href));

This change breaks the parser: it can no longer pick up any chapters.

plugin/jszip/dist/jszip.min.js was already there, but commented out; restored.

I'm not too sure what to do about the spacing/lint errors in packed.js... I have pack.js, but packed.js does not exist. I haven't touched either file and don't know what tool to use to fix them automatically (maybe JSLint, or NppExec with ESLint?)

Some test pages:

Live site https://magic.wizards.com/en/story#story-archive (select any of the stories in the timeline carousel other than the default most recent)
Old site https://web.archive.org/web/20160411073205/http://magic.wizards.com/en/articles/columns/magic-story-archive and https://web.archive.org/web/20160412030018/https://magic.wizards.com/en/articles/archive/uncharted-realms/blood-will-have-blood-2014-06-04
Very old site https://web.archive.org/web/20140302084755/http://www.wizards.com/Magic/Magazine/Article.aspx?x=mtg/daily/ur/263
Archive site/redirects https://mtglore.com

I believe the archive.org logic may be needed to account for slight variations in the article selectors over time, but I will keep testing.
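If the archive.org logic is kept, one option is to branch on the capture date embedded in the Wayback URL rather than on the hostname alone. A minimal sketch; `parseWaybackUrl` is a hypothetical helper, and which selectors belong to which era is not established in this thread.

```javascript
// Hypothetical helper: split a Wayback Machine URL into its capture timestamp
// and the original URL, so era-specific selector choices can key off the
// capture year. Returns null for URLs that are not Wayback captures.
function parseWaybackUrl(url) {
    // Format: https://web.archive.org/web/<timestamp><optional flags>/<original URL>
    const match = /^https?:\/\/web\.archive\.org\/web\/(\d{4})(\d{0,10})[a-z_]*\/(.+)$/.exec(url);
    if (!match) {
        return null;
    }
    return {
        year: Number(match[1]),
        timestamp: match[1] + match[2],
        originalUrl: match[3],
    };
}
```

Applied to the "Old site" link above, this yields year 2016 and the original magic.wizards.com URL; which selectors to use for each year would still need checking against the test pages.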

Darthagnon · 2024-09-22T16:31:18Z

Hmmm... definitely WIP, I need to do some more work on it.

gamebeaker · 2024-09-22T16:36:12Z

@Darthagnon The spacing error message comes from npm run lint. This command packs all the JS files into one file, eslint/packed.js, and then runs ESLint on it to find warnings and errors. The line number in the error message refers to a line in packed.js. Because the normal ExperimentalTabMode version has no errors, the errors must be in a file you created or changed.
You have fixed these errors, as the GitHub Actions job that runs this command reported no problems.
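To see which source file a packed.js line number (such as line 23919 below) points at, the mapping can be sketched as follows. This assumes, without confirmation from pack.js, that packed.js is a straight concatenation of the source files in a known order with no extra lines between them; `mapPackedLine` and the file list are hypothetical.

```javascript
// Hypothetical helper: translate a line number in a concatenated file
// back to (file, line) in the original sources.
// Assumes files were joined in this order with no separator lines.
function mapPackedLine(files, packedLine) {
    let start = 1; // first packed line belonging to the current file
    for (const { name, lineCount } of files) {
        const end = start + lineCount - 1;
        if (packedLine >= start && packedLine <= end) {
            return { file: name, line: packedLine - start + 1 };
        }
        start = end + 1;
    }
    return null; // past the end of the packed output
}
```

With `[{ name: "main.js", lineCount: 10 }, { name: "MagicWizardsParser.js", lineCount: 5 }]`, packed line 12 maps to line 2 of MagicWizardsParser.js.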

gamebeaker · 2024-09-22T16:42:10Z

An easy test: change the indentation in main.js, then run npm run lint.

In eslint/packed.js you can see the problem at line 23919.

Revert main.js and run npm run lint again: there are now no errors, and packed.js has also changed to reflect the revert.


gamebeaker · 2024-09-22T19:45:19Z

I guess the problem is here, from line 18 onward.


Darthagnon · 2024-09-22T20:12:34Z

Ongoing improvements mean the script now deals quite well with both the 2023-2024 and the 2014-2018 versions of the website (v0.72: chapter titles are now generalised and correctly selected).

dteviot · 2024-09-22T20:14:59Z

@Darthagnon

return authorPattern.test(href));

D'oh! Copy/paste mistake on my part. Should only be one closing bracket. i.e.

return authorPattern.test(href);
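With the stray bracket gone, the one-line return behaves exactly like the if/else version, because `test()` already returns a boolean. A minimal sketch, assuming `link` exposes an `href` string; the `authorPattern` value is a hypothetical stand-in, since the real pattern is not quoted in this thread.

```javascript
// Hypothetical pattern: the real authorPattern in MagicWizardsParser.js
// is not shown in this thread.
const authorPattern = /\/authors?\//;

function isAuthorLink(link) {
    const href = link.href || "";
    // RegExp.prototype.test already returns true/false, so no if/else is needed.
    return authorPattern.test(href);
}
```

One caution with this style: do not give such a regex the `g` flag. With `g`, `test()` updates `lastIndex`, so a second call on another matching URL can return `false`.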

Darthagnon · 2024-09-22T20:16:56Z

@gamebeaker No idea where packed.js comes from; are you sure it isn't from your dev build? I only have pack.js.

An Everything search for "pack .js" would show packed.js if it existed. And I haven't touched that file: I have only added MagicWizardsParser.js and edited popup.html, nothing more.

dteviot · 2024-09-22T20:23:14Z

@Darthagnon

packed.js is created when the build runs and creates the WebToEpub extension. As you're not running the build, you won't see this file on your machine.

I think the lines with the indentation problem are these:

WebToEpub/plugin/js/parsers/MagicWizardsParser.js

Lines 60 to 62 in fd8c87f

    
            titleElement = link.closest("article")?.querySelector(selector) ||
                        link.closest(".article-item")?.querySelector(selector) ||
                        link.closest(".details")?.querySelector(selector);

Should be

            titleElement = link.closest("article")?.querySelector(selector) ||
                link.closest(".article-item")?.querySelector(selector) ||
                link.closest(".details")?.querySelector(selector);

The line following a line ending with a || should be indented 4 more spaces.
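The same lookup wrapped in a function, as a sketch with that indentation applied; `findTitleElement` is a hypothetical name, while the three container selectors come from the quoted lines.

```javascript
// Sketch of the title lookup with each continuation line indented
// 4 spaces past the start of the statement. Optional chaining (?.)
// skips containers that closest() did not find.
function findTitleElement(link, selector) {
    return link.closest("article")?.querySelector(selector) ||
        link.closest(".article-item")?.querySelector(selector) ||
        link.closest(".details")?.querySelector(selector) ||
        null; // normalise "not found" (undefined) to null
}
```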

dteviot · 2024-09-22T20:24:32Z

@Darthagnon
Give me 10 minutes, I'll run the build using your file and confirm.

dteviot · 2024-09-22T20:30:22Z

@Darthagnon

I'm wrong, @gamebeaker is correct. In my defense, it was hard to see the highlighted rows in his screenshot.
The problem is lines 29 to 32 here

WebToEpub/plugin/js/parsers/MagicWizardsParser.js

Lines 27 to 33 in fd8c87f

    
    async getChapterUrls(dom) {
        let chapterLinks = [];
            chapterLinks = [...dom.querySelectorAll("article a, .article-content a, window.location.hostname, #content article a, #content .article-content a, .articles-listing .article-item a, .articles-bloc .article .details a")];
            // Filter out author links using their URL pattern
            chapterLinks = chapterLinks.filter(link => !this.isAuthorLink(link));
            return chapterLinks.map(this.linkToChapter);
    }
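For comparison, a sketch of the same method with consistent 4-space indentation. Two points are worth flagging: the quoted selector string includes `window.location.hostname`, which is parsed as a CSS selector (a `<window>` element with classes `location` and `hostname`) and almost certainly matches nothing, so the sketch drops it; and `this.linkToChapter` is passed unbound in the quoted code, which only matters if that method uses `this`. The class, `isAuthorLink` pattern and chapter shape below are hypothetical.

```javascript
// Hypothetical sketch, not the merged MagicWizardsParser code.
// Selector list from the quoted lines, minus "window.location.hostname".
const CHAPTER_LINK_SELECTORS = [
    "article a",
    ".article-content a",
    "#content article a",
    "#content .article-content a",
    ".articles-listing .article-item a",
    ".articles-bloc .article .details a",
].join(", ");

class ChapterListSketch {
    async getChapterUrls(dom) {
        // querySelectorAll returns each matching element once, in document order.
        const links = [...dom.querySelectorAll(CHAPTER_LINK_SELECTORS)];
        return links
            .filter(link => !this.isAuthorLink(link))
            // Arrow function keeps `this` bound if linkToChapter uses it.
            .map(link => this.linkToChapter(link));
    }

    isAuthorLink(link) {
        return /\/authors?\//.test(link.href || ""); // hypothetical pattern
    }

    linkToChapter(link) {
        // Assumed chapter shape: { sourceUrl, title }.
        return { sourceUrl: link.href, title: link.textContent.trim() };
    }
}
```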

gamebeaker · 2024-09-22T20:34:07Z

@Darthagnon here is how to run lint for the first time (you need npm):

2024-09-22.22-37-23.mp4

@dteviot

change depending on @dteviot in dteviot#1500 (comment)

Add parser for mtgstory.com

1e81d43

MagicWizardsParser v0.7. Minor changes as requested

3f5df32

MagicWizardsParser.js v.71

bfda6d9

Improves compatibility with 2016 version of site

MagicWizardsParser.js v0.72 - Generalise title selection

fd8c87f

Also add TODO to JS. 2024 site and pre-2018 site work and are priority, as they cover all modern stories and older lost chapters. (Ancient MTG articles from pre-2014 not accounted for yet)

gamebeaker and others added 4 commits September 25, 2024 17:38

try to fix eslint MagicWizardParser

6ee4970

Merge branch 'ExperimentalTabMode' into MagicWizards

1905f9d

Update MagicWizardsParser.js

cf5dc82

change depending on @dteviot in dteviot#1500 (comment)

Add contribution Darthagnon

d79da20

gamebeaker merged commit 43b4af7 into dteviot:ExperimentalTabMode Sep 25, 2024
1 check passed

gamebeaker mentioned this pull request on Sep 26, 2024 in #1300, "Please add https://wizards.com sites for MTG stories" (Closed).
