Extension: Extend Biblio-glutton to DBLP #48

Open
cverluise opened this issue Jun 11, 2020 · 0 comments

Hello @here,

Thanks a lot for the great tool.

Some months ago, we (with @ste210) started investigating the idea of extending Crossref with the DBLP dataset as part of the PatCit project.

After some back-and-forth, here are the main findings (see the full discussion thread here):

  • the DBLP dataset (w/o theses) contains 4,777,622 docs
  • 3,900,859 of these docs have a DOI (81.6%)
  • 3,520,018 of these DOIs are also in the CrossRef Database (90%)
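As a quick sanity check, the counts above can be turned into the number of DBLP documents that a DOI lookup in Crossref would not reach. This is plain arithmetic on the three figures reported here; the variable names are only illustrative.

```python
# Counts reported in the findings above.
dblp_total = 4_777_622        # DBLP documents, theses excluded
dblp_with_doi = 3_900_859     # DBLP documents that carry a DOI
doi_in_crossref = 3_520_018   # of those DOIs, the ones also present in Crossref

# Shares corresponding to the percentages quoted in the list.
doi_share = dblp_with_doi / dblp_total
crossref_share = doi_in_crossref / dblp_with_doi

# Documents that cannot be reached through a Crossref DOI lookup.
without_doi = dblp_total - dblp_with_doi               # 876,763
doi_not_in_crossref = dblp_with_doi - doi_in_crossref  # 380,841
not_in_crossref = without_doi + doi_not_in_crossref    # 1,257,604

print(f"DBLP docs with a DOI: {doi_share:.1%}")
print(f"DBLP DOIs found in Crossref: {crossref_share:.1%}")
print(f"DBLP docs not reachable via Crossref: {not_in_crossref:,}")
```

Note that this is an upper bound on the candidate set: some documents without a DOI in DBLP may still exist in Crossref under metadata that only a title/author match would find.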

This leaves a good number of relevant publications (judged by conference rank) that Crossref does not cover but for which DBLP has high-quality bibliographic references (see breakdown here).

At this point, my idea was to:

  1. take the subset of documents which are in DBLP but not in CrossRef
  2. map the DBLP XML records to the Crossref JSONL format, restricted to the attributes biblio-glutton uses in the matching process
  3. append the (properly formatted) DBLP data to the Crossref database
  4. there we go
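To make step 2 more concrete, here is a minimal sketch of what the mapping could look like. The sample record is made up, and the choice of target fields (`title`, `author`, `container-title`, `issued`, `volume`, `page`) is my assumption about what the matching needs; these are standard Crossref metadata fields, but I have not checked which ones biblio-glutton actually reads. The helper names (`split_name`, `dblp_to_crossref`) are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Made-up record in the shape of a DBLP <inproceedings> entry.
SAMPLE = """
<inproceedings key="conf/example/Doe20" mdate="2020-06-11">
  <author>Jane Doe</author>
  <author>John Smith 0001</author>
  <title>An Illustrative Paper Title.</title>
  <booktitle>EXAMPLE</booktitle>
  <year>2020</year>
  <pages>1-10</pages>
</inproceedings>
"""


def split_name(full_name):
    """Naive given/family split: the last token is taken as the family name."""
    parts = full_name.split()
    # DBLP disambiguates homonyms with a numeric suffix; drop it if present.
    if len(parts) > 1 and parts[-1].isdigit():
        parts = parts[:-1]
    if len(parts) == 1:
        return {"family": parts[0]}
    return {"given": " ".join(parts[:-1]), "family": parts[-1]}


def dblp_to_crossref(elem):
    """Map one DBLP XML element to a Crossref-style dict (assumed field subset)."""

    def text(tag):
        child = elem.find(tag)
        # itertext() also collects text nested in inline markup such as <i> or <sub>.
        return "".join(child.itertext()).strip() if child is not None else None

    record = {
        "author": [split_name("".join(a.itertext())) for a in elem.findall("author")]
    }
    title = text("title")
    if title:
        record["title"] = [title.rstrip(".")]  # DBLP titles usually end with a period
    venue = text("journal") or text("booktitle")
    if venue:
        record["container-title"] = [venue]
    year = text("year")
    if year and year.isdigit():
        record["issued"] = {"date-parts": [[int(year)]]}
    for src, dst in (("volume", "volume"), ("pages", "page")):
        value = text(src)
        if value:
            record[dst] = value
    return record


example = dblp_to_crossref(ET.fromstring(SAMPLE.strip()))
print(json.dumps(example, ensure_ascii=False))  # one JSONL line per record
```

One caveat: the full DBLP dump relies on character entities declared in its DTD, so a DTD-aware parser would likely be needed there; `xml.etree.ElementTree` is only used here to keep the sketch self-contained.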

I know that biblio-glutton was designed to be DOI-centric. That said, the DOI is mainly used to harvest extra data from PubMed, Unpaywall, etc., right? So, for the bibliographical references in DBLP which have no DOI, we could replace the DOI value with the DBLP unique identifier. It is not very pretty, but it could do the job, right?
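A rough sketch of that fallback, assuming the DOI (when there is one) shows up as a doi.org URL among the record's `<ee>` links. The `dblp:` prefix and the function names are hypothetical, and whether biblio-glutton tolerates a non-DOI value in that field is exactly the open question.

```python
DOI_URL_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def extract_doi(ee_urls):
    """Return the first DOI found among the given URLs, lowercased, or None."""
    for url in ee_urls:
        for prefix in DOI_URL_PREFIXES:
            if url.lower().startswith(prefix):
                # DOIs are case-insensitive, so normalise to lowercase.
                return url[len(prefix):].lower()
    return None


def record_identifier(dblp_key, ee_urls):
    """Use the DOI when available, otherwise fall back to the DBLP key."""
    doi = extract_doi(ee_urls)
    if doi:
        return doi
    # Hypothetical convention: DOIs always start with "10.", so a "dblp:"
    # prefix cannot collide with a real DOI.
    return "dblp:" + dblp_key


print(record_identifier("conf/example/Doe20", ["https://doi.org/10.1234/ABC.5678"]))
print(record_identifier("conf/example/Doe20", ["https://example.org/paper.pdf"]))
```

Keeping a recognisable prefix would also make it easy to skip the PubMed/Unpaywall enrichment for these records, since those lookups only make sense for real DOIs.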

I might be missing some complexity in the internal workings of biblio-glutton, so let me know if you think this is unrealistic ;)

If it sounds reasonable, I'll be happy to share the code/feedback on the hack here and on PatCit.

Thanks in advance,

Cyril

@kermitt2 kermitt2 modified the milestones: 0.3, 0.4 Apr 12, 2022
@lfoppiano lfoppiano modified the milestones: 0.3, 0.4 Sep 16, 2024