A noticeable number of pages are not correctly loaded #7

Open
stefanmagureanu opened this issue Oct 28, 2021 · 1 comment
Comments

@stefanmagureanu
Contributor

A couple of thousand pages (fewer than 10k, but still many) do not load properly. It is hard to tell whether the MHTML fails to render correctly in the WTL browser or whether the MHTML itself was not saved properly.

@stefanmagureanu
Contributor Author

One way to identify pages that are guaranteed to load properly is to use only pages that have a corresponding screenshot in the screenshot dataset. This excludes more pages than just the ones that fail to load as WTL snapshots. Around 10k pages do not render appropriately for computer vision applications.
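
A minimal sketch of that filter, in case it helps. It assumes that a page and its screenshot share the same file stem (for example `0001.mhtml` and `0001.png`); that naming scheme, the file extensions, and the helper name `pages_with_screenshots` are hypothetical and not taken from the dataset, so adjust the matching rule to however the two datasets are actually keyed.

```python
from pathlib import Path
from typing import Iterable, List


def pages_with_screenshots(
    page_files: Iterable[str], screenshot_files: Iterable[str]
) -> List[str]:
    """Keep only the pages whose file stem also appears among the screenshots.

    Assumption: a page and its screenshot share a file stem.
    """
    # Collect screenshot stems once so each page lookup is a set membership test.
    screenshot_ids = {Path(name).stem for name in screenshot_files}
    return [name for name in page_files if Path(name).stem in screenshot_ids]


# Hypothetical file names, for illustration only.
pages = ["0001.mhtml", "0002.mhtml", "0003.mhtml"]
screenshots = ["0001.png", "0003.png"]
kept = pages_with_screenshots(pages, screenshots)
print(kept)
```

With real data the two lists could come from directory listings of the snapshot and screenshot folders. As noted above, this filter is conservative: it also drops pages that load fine but have no screenshot.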
