Some months ago, we (with @ste210) started investigating the idea of extending Crossref with the DBLP dataset as part of the PatCit project.
After some discussion, here are the main findings (see full discussion thread here):
- the DBLP dataset (without theses) contains 4,777,622 docs
- 3,900,859 of these docs have a DOI (81.6%)
- 3,520,018 of these DOIs are also in the Crossref database (90%)
This leaves a good number of relevant publications (based on conference rank) that are not covered by Crossref but have high-quality bibliographic references in DBLP (see breakdown here).
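To make the size of the gap concrete, here is a small sketch that derives it from the three counts above. The function name `coverage_gap` is hypothetical, and the total assumes that DBLP docs without a DOI are not in Crossref at all, which makes it an upper bound rather than an exact figure.

```python
def coverage_gap(dblp_docs, with_doi, in_crossref):
    """Derive the DBLP-only counts from the three reported figures."""
    without_doi = dblp_docs - with_doi            # DBLP docs with no DOI
    doi_not_in_crossref = with_doi - in_crossref  # DOI known to DBLP, absent from Crossref
    # Assumption: a doc without a DOI is not in Crossref (upper bound).
    missing = without_doi + doi_not_in_crossref
    return {
        "without_doi": without_doi,
        "doi_not_in_crossref": doi_not_in_crossref,
        "missing_from_crossref": missing,
        "missing_share": missing / dblp_docs,
    }


gap = coverage_gap(dblp_docs=4_777_622, with_doi=3_900_859, in_crossref=3_520_018)
for name, value in gap.items():
    print(name, value)
```

Under that assumption, up to 1,257,604 DBLP docs (876,763 without a DOI plus 380,841 with a DOI that Crossref does not have) would be candidates for the extension.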
At this point, my idea was to:
1. take the subset of documents which are in DBLP but not in Crossref
2. map the DBLP XML objects to the Crossref jsonl format, restricted to the attributes that biblio-glutton uses in the matching process
3. append the DBLP data (properly formatted) to the Crossref database
4. there we go
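The mapping step could look roughly like the sketch below. It is only an illustration: the output keys (`title`, `author`, `container-title`, `issued`, `volume`, `page`, `type`) follow the Crossref REST API JSON layout, but which of them biblio-glutton actually reads during matching is an assumption to be checked. The helper names, the sample record, and the given/family name split are hypothetical, and a real DBLP dump also needs its DTD for entity resolution.

```python
import json
import re
import xml.etree.ElementTree as ET

# Assumed mapping from DBLP element names to Crossref "type" values.
TYPE_MAP = {"article": "journal-article", "inproceedings": "proceedings-article"}


def split_name(full_name):
    """Heuristic split: drop DBLP's 4-digit homonym suffix, last token = family name."""
    name = re.sub(r"\s+\d{4}$", "", full_name.strip())
    given, _, family = name.rpartition(" ")
    return {"given": given, "family": family} if given else {"family": family}


def dblp_to_crossref(elem):
    """Convert one DBLP record element into a Crossref-like dict."""
    def text(tag):
        child = elem.find(tag)
        return "".join(child.itertext()).strip() if child is not None else None

    record = {
        "type": TYPE_MAP.get(elem.tag, "other"),
        "title": [text("title")] if text("title") else [],
        "author": [split_name("".join(a.itertext())) for a in elem.findall("author")],
    }
    container = text("journal") or text("booktitle")
    if container:
        record["container-title"] = [container]
    if text("year"):
        record["issued"] = {"date-parts": [[int(text("year"))]]}
    if text("volume"):
        record["volume"] = text("volume")
    if text("pages"):
        record["page"] = text("pages")
    return record


# Made-up record in the DBLP XML layout, for illustration only.
SAMPLE = """<inproceedings key="conf/example/Doe20" mdate="2020-01-01">
<author>Jane Doe 0001</author><author>John Smith</author>
<title>An Example Title.</title><booktitle>EXAMPLE</booktitle>
<year>2020</year><pages>1-10</pages>
</inproceedings>"""

record = dblp_to_crossref(ET.fromstring(SAMPLE))
print(json.dumps(record))  # one JSON object per line gives the jsonl output
```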
I know that biblio-glutton was designed to be DOI-centric. That being said, the DOI is mainly used to harvest extra data from PubMed, Unpaywall, etc., right? So, for the bibliographic references in DBLP which have no DOI, we could replace the DOI value with the DBLP unique identifier. This is not very pretty, but it could do the job, right?
I may be missing some complexity in how biblio-glutton works internally, so let me know if you think that this is unrealistic ;)
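A minimal sketch of that identifier fallback, assuming the DOI of a DBLP record (when there is one) appears as a `doi.org` URL in its `ee` elements and that the record `key` attribute is the DBLP unique identifier. The `dblp:` prefix and the function name `pick_identifier` are hypothetical, and whether biblio-glutton tolerates a non-DOI value in the DOI field is exactly the open question here.

```python
import re

# Matches DOI resolver URLs such as https://doi.org/10.1000/xyz
DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$", re.IGNORECASE)


def pick_identifier(dblp_key, ee_urls):
    """Return the DOI if one of the URLs is a DOI link, else a prefixed DBLP key."""
    for url in ee_urls:
        match = DOI_URL.match(url.strip())
        if match:
            # DOIs are case-insensitive, so normalise to lower case.
            return match.group(1).lower()
    # Hypothetical convention: a prefix keeps DBLP keys distinguishable from real DOIs.
    return "dblp:" + dblp_key


print(pick_identifier("conf/example/Doe20", ["https://doi.org/10.1000/ABC123"]))
print(pick_identifier("conf/example/Doe20", ["https://example.org/paper.pdf"]))
```

Keeping a visible prefix would make it easy to skip these records in any step that calls external services keyed on a real DOI.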
If it sounds reasonable, I'll be happy to share the code/feedback on the hack here and on PatCit.
Thanks in advance,
Cyril