A standalone crawler that crawls only .gov.si web sites using Playwright.

mevljas/gov.si-crawler-playwright

Gov.si crawler playwright

A standalone crawler that crawls only .gov.si web sites using Playwright.

Project setup

Setup environment variables

cp .env.example .env

Edit the .env file if necessary. The number of crawler threads is set with the N_THREADS parameter.
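
As a rough illustration of how a setting like N_THREADS might be consumed, the sketch below reads it from the process environment and validates it. It assumes the values from .env have already been loaded into the environment (for example by Docker or a dotenv loader). The helper name `read_n_threads` and the default of 4 are hypothetical and not taken from this repository.

```python
import os


def read_n_threads(default: int = 4) -> int:
    """Return the worker count from N_THREADS, or `default` if it is unset.

    Hypothetical helper: the crawler's real configuration code may differ.
    """
    raw = os.environ.get("N_THREADS", "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"N_THREADS must be an integer, got {raw!r}") from None
    if value < 1:
        raise ValueError("N_THREADS must be at least 1")
    return value
```

Failing early on a malformed value keeps a typo in .env from silently starting the crawler with an unintended thread count.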

Run Docker Postgres database

docker-compose up -d ieps-db
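
The container can take a few seconds to start accepting connections, so it may help to wait before running migrations. The sketch below polls a TCP port using only the standard library. The function name `wait_for_port` is hypothetical, and the host and port in the usage note (localhost, 5432) are assumptions about how the ieps-db service is published; check docker-compose.yml for the real mapping.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, else False.

    An open port only shows that something is listening; it does not
    guarantee that Postgres is ready to serve queries.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: retry after a short pause.
            time.sleep(0.5)
    return False
```

Under those assumptions, a call such as `wait_for_port("localhost", 5432)` could be run before the migration step.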

Create and use virtual env

pip install virtualenv
python<version> -m venv <virtual-environment-name>
source <virtual-environment-name>/bin/activate

Alternatively, you can set it up using PyCharm.

Install requirements

pip install -r requirements.txt

Install Playwright browsers (chromium, firefox, webkit)

playwright install

Run database migrations

python migrate.py

Run the crawler

python main.py
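
Since the crawler is limited to .gov.si sites, every discovered link has to pass a domain check before it is queued. The sketch below shows one way to write such a check with the standard library. The function name `is_gov_si` is hypothetical, and the crawler's actual filtering logic may differ.

```python
from urllib.parse import urlsplit


def is_gov_si(url: str) -> bool:
    """Return True if `url` is an http(s) URL on gov.si or one of its subdomains."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are rejected.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Match the exact domain or a true subdomain, so that look-alike
    # hosts such as "notgov.si" or "gov.si.example.com" are rejected.
    return host == "gov.si" or host.endswith(".gov.si")
```

Comparing against the parsed hostname, rather than searching the raw URL string, avoids accepting URLs that merely contain "gov.si" in their path or query.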

PgAdmin (optional)

You can run the pgAdmin Docker container with the following command:

docker-compose up -d pgadmin

Access pgAdmin 4 in your web browser by visiting the URL. Log in with [email protected] as the email address and root as the password.
