Question about consolidation behavior #1140

Open
oborin1 opened this issue Jul 8, 2024 · 5 comments
Labels
consolidation Issue related to consolidation and biblio-glutton/crossref external service

Comments


oborin1 commented Jul 8, 2024

A few weeks ago I moved from CrossRef API consolidation to biblio-glutton, with the CrossRef database loaded as described in its documentation, and noticed a difference in behavior.
Previously, with the CrossRef API for consolidation, I received the BibTeX data for the translated version of an article when I supplied its transliterated original, which is what I want. I suppose that the consolidation was based on the author string and year.

@article{15,
  author = {Bersenev, I S and Bragin, V V and Evstyugin, S N and Petryshev, A Yu and Pigarev, S P and Pokolenko, A Yu},
  title = {Evolution of structure and metallurgical properties of iron ore pellets when fluxing with dolomite, JSC Mikhailovsky GOK named after A.V. Varichev},
  journal = {Steel in Translation},
  publisher = {Allerton Press},
  date = {2020-11},
  year = {2020},
  month = {11},
  pages = {788-794},
  volume = {50},
  number = {11},
  doi = {10.3103/s0967091220110054},
  raw = {16. I.S. Bersenev, V.V. Bragin, S.N. Evstyugin i dr. Evolyutsiya struktury i metallurgicheskikh svoistv zhelezorudnykh okatyshei AO «MGOK im. A.V. Varicheva» pri oflyusovanii dolomitom // Stal'. 2020. № 11. S. 11 – 17.}
}

Unfortunately, biblio-glutton now yields a different result:

@article{0,
  author = {Bersenev, I S and Bragin, V V and Evstyugin I Dr, S N},
  title = {Evolyutsiya struktury i metallurgicheskikh svoistv zhelezorudnykh okatyshei AO},
  journal = {MGOK im. A.V. Varicheva» pri oflyusovanii dolomitom // Stal},
  date = {2020},
  year = {2020},
  pages = {11--17},
  volume = {11},
  raw = {I.S. Bersenev, V.V. Bragin, S.N. Evstyugin i dr. Evolyutsiya struktury i metallurgicheskikh svoistv zhelezorudnykh okatyshei AO «MGOK im. A.V. Varicheva» pri oflyusovanii dolomitom // Stal'. 2020. № 11. S. 11 – 17}
}

How can I adjust the consolidation behavior of the biblio-glutton method?

If needed, my OS is Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-105-generic x86_64) and my java version is 17.0.6.

Consolidation with biblio-glutton still works when I supply the data of the translated version:

@article{0,
  author = {Bersenev, I S and Bragin, V V and Evstyugin, S N and Petryshev, A Yu.  and Pigarev, S P and Pokolenko, A Yu. },
  title = {Evolution of Structure and Metallurgical Properties of Iron Ore Pellets When Fluxing with Dolomite, JSC Mikhailovsky GOK Named after A.V. Varichev},
  journal = {Steel in Translation},
  publisher = {Allerton Press},
  date = {2020-11},
  year = {2020},
  month = {11},
  pages = {788-794},
  volume = {50},
  number = {11},
  doi = {10.3103/s0967091220110054},
  raw = {Bersenev, I.S., Bragin, V.V., Evstyugin, S.N., Petryshev, A.Yu., Pigarev, S.P., and Pokolenko, A.Yu., Evolution of structure and metallurgical properties of iron ore pellets when fluxing with dolomite, JSC Mikhailovsky GOK named after A.V. Varichev, Steel in Translation, 2020, vol. 50, no. 11, pp. 788-794.}
}

lfoppiano commented Jul 13, 2024

Hi @oborin1, thanks for your report.

I have a few questions.

Did you obtain the original version when processing the file through Grobid, or do you call biblio-glutton directly?

If it was through Grobid, could you provide the logs?

I just tested via firstAuthor + title and biblio-glutton yields the correct result, so we need to understand which query was sent to biblio-glutton.

Here is my example (just as a reference for the query, since the server might be down; it is an on-demand GC service):

http://34.28.170.80/glutton/service/lookup?firstAuthor=Bersenev&atitle=Evolution%20of%20structure%20and%20metallurgical%20properties%20of%20iron%20ore%20pellets%20when%20fluxing%20with%20dolomite%2C%20JSC%20Mikhailovsky%20GOK%20named%20after%20A.V.%20Varichev

{"URL":"http://dx.doi.org/10.3103/s0967091220110054","resource":{"primary":{"URL":"http://link.springer.com/10.3103/S0967091220110054"}},"member":"1627","score":0.0,"created":{"date-parts":[[2021,3,11]],"date-time":"2021-03-11T15:05:38Z","timestamp":1615475138000},"update-policy":"http://dx.doi.org/10.1007/springer_crossmark_policy","license":[{"start":{"date-parts":[[2020,11,1]],"date-time":"2020-11-01T00:00:00Z","timestamp":1604188800000},"content-version":"tdm","delay-in-days":0,"URL":"http://www.springer.com/tdm"},{"start":{"date-parts":[[2020,11,1]],"date-time":"2020-11-01T00:00:00Z","timestamp":1604188800000},"content-version":"vor","delay-in-days":0,"URL":"http://www.springer.com/tdm"}],"ISSN":["0967-0912","1935-0988"],"container-title":["Steel in Translation"],"issued":{"date-parts":[[2020,11]]},"issue":"11","prefix":"10.3103","reference-count":16,"author":[{"given":"I. S.","family":"Bersenev","sequence":"first","affiliation":[]},{"given":"V. V.","family":"Bragin","sequence":"additional","affiliation":[]},{"given":"S. N.","family":"Evstyugin","sequence":"additional","affiliation":[]},{"given":"A. Yu.","family":"Petryshev","sequence":"additional","affiliation":[]},{"given":"S. P.","family":"Pigarev","sequence":"additional","affiliation":[]},{"given":"A. Yu.","family":"Pokolenko","sequence":"additional","affiliation":[]}],"DOI":"10.3103/s0967091220110054","is-referenced-by-count":8,"published":{"date-parts":[[2020,11]]},"published-print":{"date-parts":[[2020,11]]},"alternative-id":["1284"],"published-online":{"date-parts":[[2021,3,11]]},"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"title":["Evolution of Structure and Metallurgical Properties of Iron Ore Pellets When Fluxing with Dolomite, JSC Mikhailovsky GOK Named after A.V. Varichev"],"link":[{"URL":"http://link.springer.com/content/pdf/10.3103/S0967091220110054.pdf","content-type":"application/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http://link.springer.com/article/10.3103/S0967091220110054/fulltext.html","content-type":"text/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http://link.springer.com/content/pdf/10.3103/S0967091220110054.pdf","content-type":"application/pdf","content-version":"vor","intended-application":"similarity-checking"}],"source":"Crossref","type":"journal-article","publisher":"Allerton Press","journal-issue":{"issue":"11","published-print":{"date-parts":[[2020,11]]}},"volume":"50","references-count":16,"issn-type":[{"value":"0967-0912","type":"print"},{"value":"1935-0988","type":"electronic"}],"assertion":[{"value":"12 October 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 March 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"deposited":{"date-parts":[[2021,3,11]],"date-time":"2021-03-11T15:22:43Z","timestamp":1615476163000},"language":"en","page":"788-794","short-container-title":["Steel Transl."]}
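
For anyone who wants to reproduce this kind of query against their own instance, here is a minimal Python sketch that builds the same firstAuthor + atitle lookup URL and pulls a few fields out of a Crossref-style record like the one above. The helper names (`build_lookup_url`, `summarize_record`) and the `http://localhost:8080` base URL are assumptions made for illustration; only the `/service/lookup` path and the `firstAuthor` and `atitle` parameters come from the query shown above.

```python
from urllib.parse import quote, urlencode


def build_lookup_url(base_url, first_author, article_title):
    """Build a biblio-glutton lookup URL from a first author and an article title."""
    # quote_via=quote encodes spaces as %20 (as in the URL above) instead of "+".
    query = urlencode(
        {"firstAuthor": first_author, "atitle": article_title},
        quote_via=quote,
    )
    return f"{base_url.rstrip('/')}/service/lookup?{query}"


def summarize_record(record):
    """Pick the fields needed to eyeball a match from a Crossref-style record."""
    titles = record.get("title") or [""]
    journals = record.get("container-title") or [""]
    return {"doi": record.get("DOI"), "title": titles[0], "journal": journals[0]}


# Hypothetical base URL; replace it with your own biblio-glutton instance.
url = build_lookup_url("http://localhost:8080", "Bersenev", "Evolution of structure, JSC")
print(url)
```

The resulting URL can be passed to curl or any HTTP client; the sketch deliberately performs no network call.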


oborin1 commented Jul 13, 2024

Hi @lfoppiano,

thank you for your reply!

In all cases I started GROBID servers and called the Python client to process the references, changing the server configuration accordingly. (While trying to get logs from my running GROBID container, I found that logging is configured differently in grobid.yaml and grobid-full.yaml by default; is that intended?) The attached GROBID logs only say that a particular consolidation was not successful.
grobid-service.log
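
To isolate a single failing reference without going through the Python client, one option is to send the raw string straight to GROBID's `/api/processCitation` endpoint with consolidation switched on and BibTeX requested via the `Accept` header. The sketch below only builds the request; the base URL `http://localhost:8070` and the helper name `build_citation_request` are assumptions, so adjust them to your setup.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_citation_request(grobid_url, raw_citation, consolidate=1):
    """Prepare a POST to GROBID's processCitation service for one raw reference."""
    form = urlencode(
        {"citations": raw_citation, "consolidateCitations": str(consolidate)}
    ).encode("utf-8")
    return Request(
        grobid_url.rstrip("/") + "/api/processCitation",
        data=form,
        # Ask for BibTeX instead of the default TEI XML.
        headers={"Accept": "application/x-bibtex"},
        method="POST",
    )


# Assumed local GROBID address; not taken from this thread.
request = build_citation_request(
    "http://localhost:8070",
    "I.S. Bersenev, V.V. Bragin, S.N. Evstyugin i dr. Stal'. 2020. № 11. S. 11 – 17.",
)
# To actually send it against a running server:
#   from urllib.request import urlopen
#   print(urlopen(request).read().decode("utf-8"))
```

Running the same request once with the CrossRef configuration and once with the biblio-glutton configuration should make it easier to match a reference to the corresponding log lines.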

If I directly call biblio-glutton with the curl request

curl "http://localhost:8080/service/lookup?biblio=I.S.+Bersenev,+V.V.+Bragin,+S.N.+Evstyugin+i+dr.+Evolyutsiya+struktury+i+metallurgicheskikh+svoistv+zhelezorudnykh+okatyshei+AO+«MGOK+im.+A.V.+Varicheva»+pri+oflyusovanii+dolomitom+//+Stal'.+2020.+№+11.+S.+11+–+17"

it returns
{"message":"Best bibliographical record did not passed the post-validation"}

In cases with the raw string of the translated version (available in CrossRef database), it yields the correct result.
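
The curl call above puts characters such as «, № and – directly into the URL. A small Python sketch that percent-encodes the raw reference for the `biblio` parameter and separates a matched record from a "no match" message might look like this. The helper names are made up for illustration, and treating any response without a `DOI` field as "not consolidated" is an assumption based only on the two responses shown in this thread.

```python
import json
from urllib.parse import quote, urlencode


def build_biblio_lookup_url(base_url, raw_reference):
    """Percent-encode a raw reference string into a biblio-glutton lookup URL."""
    query = urlencode({"biblio": raw_reference}, quote_via=quote)
    return f"{base_url.rstrip('/')}/service/lookup?{query}"


def doi_or_none(response_body):
    """Return the DOI of a matched record, or None if no record was returned."""
    data = json.loads(response_body)
    # Assumption: a successful lookup carries a "DOI" field, a failed one only a "message".
    return data.get("DOI")


raw = "S.N. Evstyugin i dr. «MGOK im. A.V. Varicheva» // Stal'. 2020. № 11. S. 11 – 17"
url = build_biblio_lookup_url("http://localhost:8080", raw)
print(url)

failed = '{"message":"Best bibliographical record did not passed the post-validation"}'
print(doi_or_none(failed))
```

With a check like `doi_or_none`, a caller can keep the unconsolidated parse whenever biblio-glutton declines to answer.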

Maybe the post-validation is the key to the solution I am looking for?

lfoppiano commented

OK, maybe now I understand better.
Are you processing a PDF document with the original bibliographic data?

Then I think Crossref returns the translated version, while with biblio-glutton you don't get any consolidation because of the post-validation. The post-validation is a mechanism to avoid false positives: when the best result from biblio-glutton and the input are too different, biblio-glutton prefers to abort the consolidation rather than return a wrong result.
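
To make the idea concrete, here is a toy Python sketch of a post-validation step. It is not biblio-glutton's actual implementation: the similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold and the function names are all assumptions chosen for illustration. It only shows why a transliterated title can be rejected even when the best candidate is the right article.

```python
from difflib import SequenceMatcher


def title_similarity(input_title, candidate_title):
    """Case-insensitive character-level similarity in the range [0, 1]."""
    return SequenceMatcher(None, input_title.lower(), candidate_title.lower()).ratio()


def passes_post_validation(input_title, candidate_title, threshold=0.8):
    """Accept the candidate only if it is close enough to the input (hypothetical rule)."""
    return title_similarity(input_title, candidate_title) >= threshold


crossref_title = (
    "Evolution of Structure and Metallurgical Properties of Iron Ore Pellets "
    "When Fluxing with Dolomite, JSC Mikhailovsky GOK Named after A.V. Varichev"
)
translated_input = (
    "Evolution of structure and metallurgical properties of iron ore pellets "
    "when fluxing with dolomite, JSC Mikhailovsky GOK named after A.V. Varichev"
)
transliterated_input = (
    "Evolyutsiya struktury i metallurgicheskikh svoistv zhelezorudnykh okatyshei AO"
)

print(passes_post_validation(translated_input, crossref_title))      # True
print(passes_post_validation(transliterated_input, crossref_title))  # False
```

Under such a rule the translated reference validates because it differs from the Crossref title only in letter case, while the transliterated one is rejected even though the retrieved record is the desired article.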


oborin1 commented Jul 13, 2024

@lfoppiano,
thank you for your answers.
Is LookupEngine.java the place to dig further?
I am processing text files with reference strings, rather than PDFs. (My intention is to process reference lists for publication, so any mistakes introduced by PDFs are annoying.)
Have I understood it correctly that GROBID has no post-validation mechanism when it receives the results from the CrossRef API?

lfoppiano commented

> Is LookupEngine.java the place to dig further?

I think so; you can definitely track it down starting from the controller. Feel free to open a specific issue on the biblio-glutton repo.

> I am processing text files with reference strings, rather than PDFs. (My intention is to process reference lists for publication, so any mistakes introduced by PDFs are annoying.)

OK, so do you call biblio-glutton directly, without Grobid?

> Have I understood it correctly that GROBID has no post-validation mechanism when it receives the results from the CrossRef API?

Actually, it is the other way around: Grobid is not responsible for the quality of the retrieval, so it assumes that whatever is returned is the best possible match. When we wrote biblio-glutton, we decided not to answer rather than answer something completely wrong. So I would say it's a feature of biblio-glutton that I'm not sure Crossref has 😉

lfoppiano added the consolidation label Jul 15, 2024