
Abandoned Go language learning project. The project works and can parse the necessary data.


artsuhov/priceparser


PRICE PARSER

An abandoned pet project written to get acquainted with the Go programming language. It does not follow every good practice, but it works and is ready to use.

How to use

You need to prepare pricewatcher.db before using the app. First, populate the shops table:

  • id - unique identifier;
  • shops.title - shop name; e.g., aliexpress, ozon, etc.
  • title_x_path, picture_x_path, price_x_path - XPath of the HTML element that holds the data to be parsed; you can copy it, for example, from the Google Chrome HTML inspector.

Second, populate the items table with the data you want to parse:

  • id - unique identifier;
  • items.title - name of the item you want to parse; e.g., ps5, iphoneN, etc.;
  • link - url to the item page in a web store;
  • shop_id - id of the record from the shops table;
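
The two tables above could also be filled from Go instead of by hand. The following is only a sketch under assumptions: the column names come from the lists above, but the struct types, the helper functions, the XPath values and the example link are hypothetical, and the `?` placeholders assume a SQLite driver registered with database/sql. A recording dry-run executor stands in for a real connection so the example runs without any driver; a real `*sql.DB` has the same Exec method and could be passed instead.

```go
package main

import (
	"database/sql"
	"fmt"
)

// Shop mirrors one row of the shops table (field names are hypothetical).
type Shop struct {
	ID           int64
	Title        string
	TitleXPath   string
	PictureXPath string
	PriceXPath   string
}

// Item mirrors one row of the items table (field names are hypothetical).
type Item struct {
	ID     int64
	Title  string
	Link   string
	ShopID int64
}

// execer is the subset of *sql.DB and *sql.Tx that the helpers need.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// insertShop writes one shop row using the column names from the README.
func insertShop(db execer, s Shop) error {
	_, err := db.Exec(
		`INSERT INTO shops (id, title, title_x_path, picture_x_path, price_x_path) VALUES (?, ?, ?, ?, ?)`,
		s.ID, s.Title, s.TitleXPath, s.PictureXPath, s.PriceXPath)
	if err != nil {
		return fmt.Errorf("insert shop %q: %w", s.Title, err)
	}
	return nil
}

// insertItem writes one item row that points at an existing shop.
func insertItem(db execer, it Item) error {
	_, err := db.Exec(
		`INSERT INTO items (id, title, link, shop_id) VALUES (?, ?, ?, ?)`,
		it.ID, it.Title, it.Link, it.ShopID)
	if err != nil {
		return fmt.Errorf("insert item %q: %w", it.Title, err)
	}
	return nil
}

// dryRun records statements instead of executing them (demo stand-in only).
type dryRun struct {
	queries []string
	args    [][]any
}

func (d *dryRun) Exec(query string, args ...any) (sql.Result, error) {
	d.queries = append(d.queries, query)
	d.args = append(d.args, args)
	return nil, nil
}

func main() {
	db := &dryRun{}
	// The XPath values and the link below are made-up examples.
	shop := Shop{ID: 1, Title: "ozon", TitleXPath: "//h1",
		PictureXPath: "//img[@id='main']", PriceXPath: "//span[@class='price']"}
	item := Item{ID: 1, Title: "ps5", Link: "https://example.com/ps5", ShopID: shop.ID}

	if err := insertShop(db, shop); err != nil {
		fmt.Println(err)
		return
	}
	if err := insertItem(db, item); err != nil {
		fmt.Println(err)
		return
	}
	for i, q := range db.queries {
		fmt.Println(q, db.args[i])
	}
}
```

Keeping the helpers behind a small interface means the same code can be checked without a database file and later pointed at pricewatcher.db.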

Third, run go run main.go. Results can be found in the prices table. By the way, there should be a log/ directory in the same folder as main.go:

  • %YYYY-mm-dd_hh:mm%.log - the output of the main.go run;
  • %timestamp%/%shop name%/%item title%/ - directory containing the screenshot and HTML files received for the page being parsed;

How to deploy

There is a Dockerfile, but I have tested it only once :]

How to schedule

The main function already contains a commented-out use of a cron library for Go. To enable it, add the library to the dependencies and uncomment that section in main().

I used crontab on an Ubuntu server: crontab -e

Add this line at the end: 0 */3 * * * cd /root/workspace/priceparser && bash launch.sh

launch.sh has the following content:

#!/bin/bash
cd /root/workspace/priceparser/
go run main.go >> "log/$(date +%Y-%m-%d_%H:%M).log"

Dependencies

I almost forgot: chromedp requires Chrome to be installed on your system :] I believe installing headless Chrome is enough on a Linux server. I think I used this github-gist to install it on my Ubuntu server.
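
A quick way to check this dependency before the first run is to look for a browser binary on PATH. The sketch below is an assumption-laden illustration: findBrowser is a hypothetical helper, and the candidate names are common Chrome and Chromium binary names on Linux, not a list taken from this project.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findBrowser returns the path of the first candidate found on PATH.
func findBrowser(candidates []string) (string, bool) {
	for _, name := range candidates {
		if path, err := exec.LookPath(name); err == nil {
			return path, true
		}
	}
	return "", false
}

func main() {
	// ASSUMPTION: typical binary names; adjust them for your system.
	names := []string{"google-chrome", "google-chrome-stable", "chromium", "chromium-browser"}
	if path, ok := findBrowser(names); ok {
		fmt.Println("browser found:", path)
	} else {
		fmt.Println("no Chrome or Chromium binary found on PATH")
	}
}
```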

TODO

  • - add table to store parsed values;
  • - add connection from go to sqlite db;
  • - read/write data from/to sqlite db;
  • - receive shop lists from db;
  • - receive shop items from db;
  • - fill the database with source data that will need to be parsed;
  • - select and prepare data to parse;
  • - parse only 1 element (price w/o title, desc, etc) from the source;
  • - can't receive data from aliexpress;
  • - prepare structure to store into db parsed data;
  • - remove letters from parsed price;
  • - do not forget to store original (not filtered) price into db;
  • - add parsing data into db;
  • - store screenshot and log paths into db;
  • - use proxy;
  • [] - deploy via docker;
  • [] - use this cron library
  • [] - can use datadog to check logs;
  • [] - add concurrent execution (no need to parse data sequentially);
  • [] - increase timeout between requests to the same domains;
  • [] - need to switch emulated-clients periodically;
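
The two price items in the list above ("remove letters from parsed price" and "store original (not filtered) price") can be sketched as follows. This is not the code from main.go: the ParsedPrice type and the cleanPrice function are hypothetical names. The filter keeps ASCII digits and the characters `.` and `,`, trims separators left dangling at either end, and makes no attempt to decide which separator is the decimal one.

```go
package main

import (
	"fmt"
	"strings"
)

// ParsedPrice keeps the raw text next to the filtered value so that
// both can be stored.
type ParsedPrice struct {
	Original string // text exactly as read from the page
	Cleaned  string // digits and inner separators only
}

// cleanPrice drops everything except ASCII digits, '.' and ',' and then
// trims separators left at the ends (for example from "руб.").
func cleanPrice(raw string) ParsedPrice {
	var b strings.Builder
	for _, r := range raw {
		if (r >= '0' && r <= '9') || r == '.' || r == ',' {
			b.WriteRune(r)
		}
	}
	return ParsedPrice{Original: raw, Cleaned: strings.Trim(b.String(), ".,")}
}

func main() {
	for _, raw := range []string{"12 999 ₽", "1 299 руб.", "$1,299.99"} {
		p := cleanPrice(raw)
		fmt.Printf("%q -> %q\n", p.Original, p.Cleaned)
	}
}
```

Keeping the original string makes it possible to re-clean old rows later if the filter rules change.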
