We should do sanity checks on uploaded bibtex files. #23

Closed
kmccurley opened this issue Apr 30, 2023 · 4 comments

@kmccurley
Member

For uploads that use iacrcc.cls, a check already runs on the citation metadata to make sure that every reference has a DOI. For other document classes, we could run a bibtex parser and check the quality of the references there. Since the bibtex file may contain unused references, we should extract the cited keys from lines like \citation{galindo2021fully} in the main.aux file and check only those references.
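
A minimal sketch of that extraction step, assuming main.aux contains one `\citation{...}` line per citation command and that a single line may list several comma-separated keys. The function name `cited_keys` is made up for illustration and is not taken from the project code.

```python
import re

# Matches \citation{key1,key2} lines as written to the .aux file.
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text):
    """Return the cited keys in order of first appearance, without duplicates."""
    seen = {}
    for match in CITATION_RE.finditer(aux_text):
        # One \citation line may carry several comma-separated keys.
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                seen.setdefault(key, None)
    return list(seen)
```

Note that `\nocite{*}` shows up in the aux file as `\citation{*}`, so a real check would probably need to treat the key `*` as "use every entry" rather than as an ordinary key.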

@kmccurley kmccurley self-assigned this Apr 30, 2023
@kmccurley
Member Author

Every time I try to do something with BibTeX I am reminded of what it is like to work with stone tools. It turns out that there is no formal grammar for the BibTeX file format, and the only definition is in the code. Many people have tried to write bibtex parsers, with varying degrees of success.

  • bibpy is one attempt. It mangles some non-ASCII characters.
  • pybtex looks promising, and it's what cryptobib uses. It also has problems.
  • biblib (note: not the one from PyPI) claims to be the only one that implements the correct grammar defined in the bibtex binary. It no longer works in Python 3.10 and has not been updated in ten years.
  • bibtexparser is at least maintained, but it appears to also have problems.

This reminds me why we didn't try to parse bibtex directly. How do you parse a format that is described only by a binary?

@kmccurley
Member Author

Note: bibtexparser will not parse cryptobib because it fails with @string{acisped = ""}

@kmccurley
Member Author

The iacrcc.cls style has switched to using alphaurl bibliography style and iacrcc.bst is being dropped. As a result, we no longer generate bibliographic references in an ad-hoc format from the iacrcc.bst style. This means that the citations element of metadata/Compilation:Meta is no longer needed, and we can instead just capture the bibtex references that are being used. This was mentioned in this issue where it was suggested that we can use either bibexport or pybtex to extract the original bibtex entries.

Bibliographic entries need to be converted into other formats:

  1. HTML for the web pages.
  2. XML for crossref and/or JATS.
  3. XML for XMP when we use an extended schema.

There will undoubtedly be issues in converting to these, but I think it's best to just store the original BibTeX entries and solve problems in converting them to other formats.

@kmccurley
Member Author

The code now uses a combination of bibexport and pybtex to extract and check the bibtex entries in webapp/metadata/meta_parse.py. It also uses pybtex to convert the references to both JATS and Crossref formats.
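
As a rough illustration of what such a check could look like (this is not the code in meta_parse.py), here is a sketch that assumes the entries have already been parsed by some BibTeX parser into a plain dict mapping each key to a dict of lower-cased field names. The function name `check_references` and the choice of `doi` as the only required field are assumptions for the example.

```python
def check_references(cited, entries, required=("doi",)):
    """Return a list of human-readable problems for the cited keys.

    cited:    list of citation keys actually used in the document
    entries:  dict mapping key -> dict of lower-cased field name -> value
    required: field names every cited entry is expected to carry
    """
    problems = []
    for key in cited:
        fields = entries.get(key)
        if fields is None:
            problems.append(f"{key}: cited but not found in the bibtex file")
            continue
        for name in required:
            # Treat an empty or whitespace-only value as missing.
            if not fields.get(name, "").strip():
                problems.append(f"{key}: missing {name}")
    return problems
```

Only the keys passed in `cited` are examined, so unused entries in a large shared bibliography do not produce noise.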

The extraction of bibtex entries is tricky when the author uses biblatex because bibexport only supports bibtex output. I fake it by parsing the main.bcf file (it's XML) and creating a fake main.aux to parse with bibexport.
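
A minimal sketch of that workaround, under the assumption that the .bcf file lists cited keys in `citekey` elements and bibliography files in `datasource` elements. Tags are matched by local name so the sketch does not depend on the exact XML namespace. The function name `bcf_to_aux` and the `bibstyle` placeholder are made up for the example; the real code may differ.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{uri}citekey' down to 'citekey'."""
    return tag.rsplit("}", 1)[-1]


def bcf_to_aux(bcf_text, bibstyle="alpha"):
    """Build the text of a minimal .aux file from the contents of a .bcf file."""
    root = ET.fromstring(bcf_text)
    keys, sources = [], []
    for elem in root.iter():
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if not text:
            continue
        if name == "citekey" and text not in keys:
            keys.append(text)
        elif name == "datasource" and text not in sources:
            sources.append(text)
    lines = ["\\relax"]
    lines += [f"\\citation{{{key}}}" for key in keys]
    lines.append(f"\\bibstyle{{{bibstyle}}}")
    # \bibdata takes the database names without the .bib extension.
    bibdata = ",".join(s[:-4] if s.endswith(".bib") else s for s in sources)
    lines.append(f"\\bibdata{{{bibdata}}}")
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a fake main.aux and handed to bibexport. This sketch does not handle entry sets or `\nocite{*}`, which would need extra care.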
