docs: add gmaps scraping blog #2772
base: master
Conversation
Reviewed the article; it is very good and reads well, but it doesn't use the full potential of Crawlee in some places - let's improve that 🙂
This is huge - isn't there a more suitable format than GIF?
I don't know - any suggestions? A GIF kinda fits here.
Could this be webp as well?
yes
```python
@crawler.router.default_handler
async def default_handler(context):
    await scrape_google_maps(context)
```
the `default_handler` wrapper does not make much sense here...
Suggested change:
```python
crawler.router.default_handler(scrape_google_maps)
```
This should be enough if you want to keep the handler definition outside of the `main` function.
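For illustration, a minimal sketch of how that direct registration could look end to end; the import path, handler body, and search URL are assumptions for the sketch, not the article's exact code:

```python
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def scrape_google_maps(context: PlaywrightCrawlingContext) -> None:
    # The extraction logic from the article would live here.
    context.log.info(f'Processing {context.request.url}')


async def main() -> None:
    crawler = PlaywrightCrawler()
    # Register the existing coroutine directly; no pass-through
    # default_handler wrapper is needed.
    crawler.router.default_handler(scrape_google_maps)
    await crawler.run(['https://www.google.com/maps/search/restaurants'])


if __name__ == '__main__':
    asyncio.run(main())
```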
""" | ||
page = context.page | ||
await page.goto(context.request.url) | ||
print("Connected to:", context.request.url) |
print("Connected to:", context.request.url) | |
print("Processing: ", context.request.url) |
```python
    # Pretty-print the data
    print(json.dumps(data, indent=4))
    print("\n")
```
This is not how it's supposed to be done - it'd be better to use `context.push_data(data)`.
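As a sketch, assuming the handler extracts a `data` dict as in the article (the placeholder record below is illustrative):

```python
async def scrape_google_maps(context: PlaywrightCrawlingContext) -> None:
    # ... extraction logic from the article producing a `data` dict ...
    data = {'url': context.request.url}  # placeholder record

    # Store the record in the crawler's default dataset instead of
    # pretty-printing it to stdout.
    await context.push_data(data)
```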
```python
with open('google_maps_data.json', 'w', encoding='utf-8') as f:
    json.dump(all_data, f, ensure_ascii=False, indent=2)
```
If you use the default dataset for this, you can simply do `crawler.export_data_json('path', ensure_ascii=False, indent=2)`.
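A sketch of how the end of `main` could look then; the output file name is a placeholder:

```python
async def main() -> None:
    crawler = PlaywrightCrawler()
    crawler.router.default_handler(scrape_google_maps)
    await crawler.run(['https://www.google.com/maps/search/restaurants'])

    # Export everything stored via context.push_data() in one call;
    # ensure_ascii and indent are passed through to the JSON writer.
    await crawler.export_data_json('google_maps_data.json', ensure_ascii=False, indent=2)
```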
First, we need a function that can handle the scrolling and detect when we've hit the bottom. Copy-paste this new function into the `gmap_scraper.py` file:
```python
async def load_more_items(page) -> bool:
    ...
```
- The article does not mention where the function should be called.
- Crawlee already has `context.infinite_scroll()` - does it not work in this case?
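For comparison, a sketch of the handler using the built-in helper instead of a custom scroller; the results-feed selector is an illustrative assumption, not taken from the article:

```python
async def scrape_google_maps(context: PlaywrightCrawlingContext) -> None:
    # Scroll the page until no new content loads; this replaces the
    # hand-rolled load_more_items() loop.
    await context.infinite_scroll()

    # Illustrative selector for the Google Maps results feed (an assumption).
    listings = await context.page.locator('div[role="feed"] > div').all()
    context.log.info(f'Loaded {len(listings)} listings after scrolling.')
```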
Approved by Adam and marketing.