Katana doesn't seem to work when trying to scrape certain dynamically loaded pages, especially those that have a dynamic map #520

Open
PixelNinja2023 opened this issue Jul 17, 2023 · 0 comments
Labels
Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

katana version: v1.0.2

Current Behavior:

The body text returned when scraping this specific page contains none of the information about the petrol stations/shops.

Expected Behavior:

Since katana's headless mode drives a real browser (I assumed Playwright, though it may use a different automation library), I expected that loading the page with JavaScript enabled and the correct input parameters would return the entire body text, as if I had loaded it in a normal browser.

Steps To Reproduce:

I run the following command in my terminal to scrape the body text of the page:

/root/go/bin/katana -timeout 10 -headless -d 2 -flc /config.yaml -f address -u https://www.oil-tankstellen.de/tankstellen-tankstationen/tankstellenfinder-kraftstoffpreise-benzinpreise/ -no-incognito -ns -jc -ct 10

where config.yaml contains the following basic regex rule, intended to extract the entire page body:

  - name: address
    type: regex
    regex:
      - '(.*)'
    group: 1
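A side note on the rule above, as a minimal sketch rather than a statement about katana's internals: in Go's regexp syntax, `.` does not match a newline unless the `s` flag is set, so `(.*)` on its own captures at most one line. Whether katana applies the field regex to the whole response body or line by line is an assumption here, and the helper `firstCapture` and the sample `body` are hypothetical. This does not address the missing JavaScript-rendered content; it only shows how the pattern itself behaves.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstCapture returns capture group 1 of the first match of pattern in body,
// or "" when there is no match. It is a hypothetical helper for illustration,
// not part of katana.
func firstCapture(pattern, body string) string {
	re := regexp.MustCompile(pattern)
	m := re.FindStringSubmatch(body)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

func main() {
	// Hypothetical multi-line body standing in for a fetched page.
	body := "<html>\n<body>station list</body>\n</html>"

	// Without the s flag, '.' stops at the first newline.
	fmt.Printf("%q\n", firstCapture(`(.*)`, body))

	// With (?s), '.' also matches newlines, so the group spans the whole body.
	fmt.Printf("%q\n", firstCapture(`(?s)(.*)`, body))
}
```

If the rule is matched against the full body, a pattern such as `'(?s)(.*)'` would be the variant to try in config.yaml.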

Is there any way to scrape the entire page, including the information for all the Oil! petrol stations, using katana?

PixelNinja2023 added the "Type: Bug" label on Jul 17, 2023