sh2-dblp-ranking

Code for ranking conferences of dblp by urgency of ingestion, and evaluating the result based on a gold standard.

Part of the DFG-funded research project "Smart Harvesting II".

An earlier version of this code has been used to produce the following publication:

    @inproceedings{neumann2018prioritizing,
        title = {Prioritizing and Scheduling Conferences for Metadata Harvesting in dblp},
        author = {Neumann, Mandy and Michels, Christopher and Schaer, Philipp and Schenkel, Ralf},
        booktitle = {JCDL '18 Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries},
        doi = {10.1145/3197026.3197069},
        eventdate = {June 03 - 07, 2018},
        eventtitle = {18th ACM/IEEE on Joint Conference on Digital Libraries},
        venue = {Fort Worth, Texas, USA},
        isbn = {978-1-4503-5178-2},
        pages = {45-48},
        publisher = {ACM},
        address = {New York, NY, USA},
        url = {https://dl.acm.org/citation.cfm?doid=3197026.3197069},
        year = 2018
    }

A preprint of this paper is available on arXiv.

Prerequisites

This workflow requires a database containing conference metadata such as event dates, ingestion dates, and publishing authors. This database is created from dblp data with the help of dblp-internal software that is not publicly available.

To run the code, you need to set up your own database that follows a specific schema (see below). Then create configuration files from the provided templates to specify the database connection parameters.
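As a rough illustration of how connection parameters might be read from such a configuration file, here is a minimal sketch. The INI layout, the `[database]` section name, all key names, and the helper `load_db_params` are hypothetical placeholders, not the format of this repository's templates; consult the templates themselves for the actual keys.

```python
import configparser

# Hypothetical INI layout; the real templates in this repository may use
# different section names, keys, or a different file format altogether.
EXAMPLE_CONFIG = """
[database]
host = localhost
port = 5432
dbname = dblp
user = reader
password = secret
"""


def load_db_params(text):
    """Parse connection parameters from INI-formatted text (illustrative only)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["database"]
    return {
        "host": section["host"],
        "port": section.getint("port"),  # converted to int for the DB driver
        "dbname": section["dbname"],
        "user": section["user"],
        "password": section["password"],
    }


params = load_db_params(EXAMPLE_CONFIG)
print(params["host"], params["port"])
```

Keeping credentials in a separate file that is created from a template, rather than in the code, means the filled-in file can stay out of version control.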

Database schema

The application expects a table that can be created with the following DDL:

    CREATE TABLE dblp_stream_scores (
        stream_key varchar(255),
        affil_score numeric,
        citation_score numeric,
        intl_score numeric,
        prominence_score numeric,
        size_score numeric,
        rating_score numeric,
        log_score_y2018m03 numeric,
        log_score_y2018m04 numeric,
        log_score_y2018m05 numeric,
        log_score_y2018m06 numeric,
        log_score_y2018m07 numeric,
        log_score_y2018m08 numeric,
        log_score_y2018m09 numeric,
        log_score_y2018m10 numeric,
        log_score_y2018m11 numeric);
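
To experiment with this schema without a full database server, the table can be recreated in an in-memory SQLite database. The sketch below is only an illustration: SQLite stands in for whatever database system you actually use, the helper names `create_scores_table` and `top_streams` and the sample stream keys are made up, and sorting in descending order assumes that a higher score means a higher rank, which this README does not state.

```python
import sqlite3

# Column names are copied from the DDL above.
STATIC_SCORES = [
    "affil_score", "citation_score", "intl_score",
    "prominence_score", "size_score", "rating_score",
]
# Monthly columns log_score_y2018m03 .. log_score_y2018m11
MONTHLY_SCORES = [f"log_score_y2018m{month:02d}" for month in range(3, 12)]
SCORE_COLUMNS = STATIC_SCORES + MONTHLY_SCORES


def create_scores_table(conn):
    """Create dblp_stream_scores with the same columns as the DDL above."""
    columns = ", ".join(f"{name} numeric" for name in SCORE_COLUMNS)
    conn.execute(
        f"CREATE TABLE dblp_stream_scores (stream_key varchar(255), {columns})"
    )


def top_streams(conn, score_column, limit=10):
    """Return (stream_key, score) pairs ordered by one score column, descending.

    Assumption: higher values rank first. Column names cannot be bound as
    query parameters, so the name is checked against a whitelist instead.
    """
    if score_column not in SCORE_COLUMNS:
        raise ValueError(f"unknown score column: {score_column}")
    query = (
        f"SELECT stream_key, {score_column} FROM dblp_stream_scores "
        f"WHERE {score_column} IS NOT NULL "
        f"ORDER BY {score_column} DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
create_scores_table(conn)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO dblp_stream_scores (stream_key, size_score) VALUES (?, ?)",
    [("conf/example-a", 0.2), ("conf/example-b", 0.9)],
)
ranking = top_streams(conn, "size_score")
print(ranking)
```

Rows whose selected score is `NULL` are skipped, so a stream without a value in that column does not appear in the result.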