Fixed search failure due to unexpected parser state #300
In many plugins the HTML parser's state isn't reset between pages. The parser is initialized once and then feed() is called once per page.
This means that if a page ends in an unexpected state (e.g. in the middle of a row because of truncation, a temporary error, or unexpected HTML), all following pages fail to find results.
The torrentproject plugin had noticed the issue and overrode feed() to reset some of its state between pages.
This PR changes the logic to create a new parser for each page. There is no reason not to, since creating a parser is cheap.
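To illustrate the failure mode and the fix, here is a minimal sketch using Python's standard `html.parser.HTMLParser`. The `RowParser` class and `parse_pages` helper are hypothetical names made up for this example, not code from any plugin; the only assumption is that a plugin tracks "am I inside a row" as instance state.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Hypothetical parser: collects the text of each <td> inside a <tr>."""

    def __init__(self):
        super().__init__()
        self.in_row = False
        self.in_cell = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and not self.in_row:
            # Start a new result row only when we are not already inside one
            self.in_row = True
            self.results.append([])
        elif tag == "td" and self.in_row:
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_row = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.results[-1].append(data)


def parse_pages(pages):
    """Parse each page with its own parser so state cannot leak between pages."""
    results = []
    for html in pages:
        parser = RowParser()  # fresh state for every page
        parser.feed(html)
        parser.close()
        results.extend(parser.results)
    return results
```

With a single shared `RowParser`, a first page that is cut off before its closing `</tr>` leaves `in_row` set, so the first row of the next page is merged into the unfinished one. With `parse_pages`, each page starts from a clean state.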
Multi-page support was also updated to keep fetching pages until a page returns fewer or no results, up to a maximum of 5 pages. Previously a plugin would either check the page size (unreliable) or extract the page links (also unreliable, because sites sometimes truncate the link list, for example
[1] [2] ... [9] [10]
).
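The stopping rule described above could look roughly like the sketch below. The names `search_all_pages` and `fetch_page` are hypothetical, and reading "fewer results" as "fewer than the first page returned" is an assumption of this example; the actual plugins may compare against something else.

```python
from typing import Callable, List

MAX_PAGES = 5  # page cap mentioned in the PR description


def search_all_pages(fetch_page: Callable[[int], List[str]],
                     max_pages: int = MAX_PAGES) -> List[str]:
    """Collect results page by page until a page comes back short or empty.

    fetch_page is a hypothetical callable that downloads and parses one page
    (1-based) and returns the results found on it.
    """
    results: List[str] = []
    first_page_count = None
    for page in range(1, max_pages + 1):
        found = fetch_page(page)
        results.extend(found)
        if first_page_count is None:
            # Assumption: the first page shows what a full page looks like
            first_page_count = len(found)
        # Stop on an empty page or on one shorter than the first page
        if not found or len(found) < first_page_count:
            break
    return results
```

This avoids depending on a site's advertised page size or on its pagination links: the loop only looks at how many results each page actually produced.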