[Bug]: Missing ads on news sites #266
I’m using the beta.browsertrix GUI v. 1.8* with ad blocking turned off, and I can’t change the crawling browser. To me, ad replay seems much better than it was a year ago. During the crawl I can see all the ads in the crawl window, so the crawler does see them.
Some of the ads are replayed fine, but not all.
I think it’s “only” a question of harvesting the URLs and replaying them 😊.
Best regards
Tue
If you download https://beta.browsertrix.cloud/orgs/kb/items/crawl/manual-20240323083932-bb9b135d-357?workflowId=bb9b135d-3573-4901-bdef-a80d35a15741#files:~:text=20240323084140064%2Dbb9b135d%2D357%2D0.wacz
Browsertrix Cloud Version
v1.8.0-beta.4-7d985a9
What did you expect to happen? What happened instead?
Ads are missing on the most-used news sites.
Replay of news sites is missing most of the ads: some show Archived Page Not Found, some are not displayed at all, and only a few are displayed. All ads can be seen in the watch crawl window.
Step-by-step reproduction instructions
e.g.
politiken.dk
crawl: "pol frontpage with all context"
https://beta.browsertrix.cloud/orgs/netarkivet-det-kgl-bibliotek/items/crawl/sched-bb9b135d-357-28341060?workflowId=bb9b135d-3573-4901-bdef-a80d35a15741#replay
Archived Page Not Found
Sorry, this page was not found in this archive:
https://0e9755db0ca066211b5983705fdb4922.safeframe.googlesyndication.com/safeframe/1-0-40/html/container.html?n=2
tv2.dk
crawl: tv2.dk frontpage complete context incl. ads
https://beta.browsertrix.cloud/orgs/netarkivet-det-kgl-bibliotek/items/crawl/manual-20231118064936-03e01f26-37d?workflowId=03e01f26-37dd-4fa6-880f-db7bd6dd6679
berlingske.dk
crawl: berlingske.dk frontpage with context
https://beta.browsertrix.cloud/orgs/netarkivet-det-kgl-bibliotek/items/crawl/manual-20231118095211-a4e6bc32-473?workflowId=a4e6bc32-4733-4a3f-8231-43b6df1c4031#replay
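To help tell whether a URL such as the safeframe container above was never captured, or was captured but fails to match at replay time, the index of a downloaded WACZ can be inspected. The following is a minimal sketch, assuming the WACZ stores CDXJ index files under `indexes/` (as `index.cdx`, `index.cdxj`, or `index.cdx.gz`) and that each index line ends in a JSON block with a `url` field. The function names and the local file name are hypothetical and not part of Browsertrix.

```python
import gzip
import json
import zipfile
from urllib.parse import urlsplit


def iter_indexed_urls(wacz_file):
    """Yield every URL found in the CDXJ index files of a WACZ archive.

    Assumption: index files live under indexes/ and each line has the form
    "<sort key> <timestamp> <JSON block>" with a "url" field in the JSON.
    """
    with zipfile.ZipFile(wacz_file) as archive:
        for name in archive.namelist():
            if not name.startswith("indexes/"):
                continue
            if not name.endswith((".cdx", ".cdxj", ".cdx.gz")):
                continue
            data = archive.read(name)
            if name.endswith(".gz"):
                data = gzip.decompress(data)
            for line in data.decode("utf-8").splitlines():
                parts = line.split(" ", 2)
                if len(parts) < 3:
                    continue  # not a CDXJ line
                try:
                    entry = json.loads(parts[2])
                except json.JSONDecodeError:
                    continue  # skip lines whose JSON block cannot be parsed
                url = entry.get("url")
                if url:
                    yield url


def urls_for_host_suffix(wacz_file, suffix):
    """Return the sorted, de-duplicated indexed URLs whose host ends with suffix."""
    matches = set()
    for url in iter_indexed_urls(wacz_file):
        host = urlsplit(url).hostname or ""
        if host.endswith(suffix):
            matches.add(url)
    return sorted(matches)
```

For example, `urls_for_host_suffix("crawl.wacz", ".safeframe.googlesyndication.com")` lists every safeframe URL in the index of a locally saved WACZ (the file name is a placeholder). If the list contains safeframe containers only under other subdomains than the one in the Archived Page Not Found message, that might point to a URL matching problem at replay rather than a capture problem; an empty list would suggest the resource was not written to the archive.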
Additional details
No response