# Web Scraping kununu.com

A small project that scrapes company reviews from kununu.com with the Scrapy framework (based on Python).

"Scrapy is an application framework for crawling web sites and extracting structured data
which can be used for a wide range of useful applications, like data mining, information processing or historical archival."

## Prerequisites

- Python with the Scrapy framework installed (e.g. `pip install scrapy`, or `conda install -c conda-forge scrapy` if you use Anaconda)

## How to run this project

1. Clone this repo into your `scrapy` folder (where the default `tutorial` folder should exist after your installation).

2. Your folder structure should then look something like this:

       scrapy/kununu/
           README.md
           scrapy.cfg
           __init__.py
           kununu_project/
               items.py
               middlewares.py
               pipelines.py
               settings.py
               spiders/
                   __init__.py
                   kununu.py

       scrapy/tutorial/
           scrapy.cfg
           tutorial/
               items.py
               ...

3. Open your Python command line (I used the Anaconda Prompt):
   3.1 Navigate to the spiders folder inside the scrapy folder (`scrapy/kununu/kununu_project/spiders`).
   3.2 Execute the following command: `scrapy runspider kununu.py`

4. By default the spider scrapes reviews for ec4u expert consulting ag.
   You can change this by adapting the links within the `kununu.py` spider; see the sketch below for what that looks like.
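For orientation, here is a minimal sketch of what a spider like `kununu.py` can look like. It is not the repository's actual spider: the start URL, the CSS selectors, and the field names (`title`, `score`, `text`) are illustrative assumptions, so inspect the live kununu.com pages and adjust them before use.

```python
import scrapy


class KununuSpider(scrapy.Spider):
    name = "kununu"

    # Replace these links with the review pages of the company you want to scrape.
    # The path below is an assumed example, not a verified kununu.com URL.
    start_urls = [
        "https://www.kununu.com/de/ec4u-expert-consulting/kommentare",
    ]

    def parse(self, response):
        # Placeholder selectors: adapt them to the current kununu.com markup.
        for review in response.css("article"):
            yield {
                "title": review.css("h3::text").get(),
                "score": review.css("span::text").get(),
                "text": " ".join(review.css("p::text").getall()),
            }
```

When run with `scrapy runspider kununu.py`, each yielded dictionary becomes one scraped item; adding Scrapy's standard `-o reviews.json` option writes the items to a JSON file.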
