NotCompsky/rscraper

rscraper

Docker Images

Description

RScraper is a family of independent tools including a scraper, browser addon, and chart generators.

Taster

Components

  • rtagger addon - the browser addon for tagging Reddit users
  • tagger - the server for the browser addon
  • hub - a GUI manager for the database and configuring the scraper
  • init - one-off helper tools to initialise the database
  • scraper - tool for scraping data from Reddit
  • io - import/export tools (as an alternative to scraping Reddit yourself)
  • man - UNIX man pages
  • utils - CLI database admin tools

Tagger

To use the rtagger browser addon, you do not need to install any of these packages; only the addon (or JavaScript script) is necessary. Only the server needs to install (and run) the rscraper-tagger package.

The server itself needs no other package. Whoever manages the server will, however, want the rscraper-init package to initialise the database, either the rscraper-io or the rscraper-scraper package to populate it, and the rscraper-gui package to manage it.

Usage

See hub usage guide for detailed instructions on using rscraper-hub.

See man directory for more generic instructions on using the other programs.

Platforms

Debian-based systems can use the deb installer packages on the releases page: amd64 for x86_64 systems (most laptops and desktops), armhf for 32-bit hard-float ARM (e.g. Raspberry Pi). I have tested it on Ubuntu, Raspbian, and Debian. Other (up to date) Debian-based distros should also work.

It should work on macOS and other Linux distros too, but I don't have access to such systems, so currently the only option for them is to build from source.
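
As a rough sketch of the choice described above, the following helper checks whether a prebuilt package exists for a given architecture string. The function name `rscraper_deb_arch` is hypothetical and not part of RScraper; it only encodes the two architectures listed above.

```shell
# Hypothetical helper (not part of RScraper): print the architecture label
# if a prebuilt .deb exists for it, otherwise fail with a hint.
# The argument is expected to be the output of `dpkg --print-architecture`.
rscraper_deb_arch() {
    case "$1" in
        amd64|armhf)
            printf '%s\n' "$1"
            ;;
        *)
            echo "no prebuilt package for '$1'; build from source instead" >&2
            return 1
            ;;
    esac
}

# Usage on a Debian-based system:
#   rscraper_deb_arch "$(dpkg --print-architecture)"
```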

Windows support is pending someone more knowledgeable about Windows builds helping out.

Installing

Ubuntu, Raspbian, and other Debian-based systems

First install libcompsky:

# Pattern for the libcompsky .deb asset matching this machine's architecture
regexp="https://github\.com/NotCompsky/libcompsky/releases/download/[0-9]+\.[0-9]+\.[0-9]+/libcompsky-[0-9]+\.[0-9]+\.[0-9]+-$(dpkg --print-architecture)\.deb"
# Extract the asset URL from the latest release metadata (grep -E replaces the deprecated egrep)
url=$(curl -s https://api.github.com/repos/NotCompsky/libcompsky/releases/latest | grep -Eo "$regexp" | head -n 1)
wget -O /tmp/libcompsky.deb "$url"
sudo apt install /tmp/libcompsky.deb

Then set the array of packages you wish to install (init is not required, but the configuration guide assumes it is installed).

Then download the packages you want from the releases page.
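
One way to script this step, mirroring the libcompsky snippet, is sketched below. It is an illustration only: the helper name `select_deb_urls` is hypothetical, and it assumes (unverified) that each release asset's file name contains both the package name (e.g. rscraper-tagger) and the architecture label. Check the releases page for the actual file names before relying on it.

```shell
# Hypothetical helper (not part of RScraper): read GitHub release JSON on
# stdin and print the .deb asset URLs whose file name mentions the given
# architecture and one of the given package names.
# Assumption: asset file names contain both the package name and the arch.
select_deb_urls() {
    sdu_arch="$1"
    shift
    # Pull every https URL ending in .deb out of the JSON, one per line.
    grep -Eo 'https://[^"]+\.deb' | while read -r sdu_url; do
        sdu_name="${sdu_url##*/}"   # file name only, so the repo path cannot match
        for sdu_pkg in "$@"; do
            case "$sdu_name" in
                *"$sdu_pkg"*"$sdu_arch"*|*"$sdu_arch"*"$sdu_pkg"*)
                    printf '%s\n' "$sdu_url"
                    break
                    ;;
            esac
        done
    done
}

# Usage (needs network access; the package names are examples):
#   curl -s https://api.github.com/repos/NotCompsky/rscraper/releases/latest \
#     | select_deb_urls "$(dpkg --print-architecture)" rscraper-init rscraper-tagger \
#     | while read -r url; do wget -P /tmp "$url"; done
```

Matching is done on the file name rather than the whole URL because the repository path itself contains "rscraper", which would otherwise match every asset.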

Then see the configuration guide.

If installation still fails for some reason, see installing on Ubuntu (and also make a bug report).

Windows 10

Not supported yet, but very open to PRs. Some weeks ago it cross-compiled fine, so there shouldn't be many changes to the source code required to build it on or for Windows.

The big hurdle in building for Windows is doing one of the following:

  • Modifying the CMake files to cross-compile on MXE for Windows
  • Converting the CMake files to pro files for qmake
  • Converting the CMake files to work with Visual Studio

The person who issues a PR that allows building for Windows will get prominent recognition at the top of this page. Create an issue if you want to discuss with me the steps I took in cross-compiling test versions.

Building

See BUILDING.md

ROADMAP

This is still in active development, so expect quite a few things to change.

What should stay the same is the database structure. Purely aesthetic changes, such as renaming columns, will not be made.

Backwards-incompatible changes are very unlikely in the database structure (defined in init.sql), tagger, init and io, and unlikely in utils.

Features may be added, in particular to rscraper-hub.