We should do sanity checks on uploaded bibtex files. #23

kmccurley · 2023-04-30T22:29:43Z

If something is uploaded for iacrcc.cls, then there is a check that runs on the metadata of citations to make sure that everything has a DOI. For other document classes, we could run a bibtex parser and check the quality of references there. Since the bibtex file may contain unused references, we should extract things like \citation{galindo2021fully} from the main.aux file and check only those references.
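A minimal sketch of the extraction step described above, assuming the `main.aux` content has already been read into a string. The function name `cited_keys` is hypothetical and is not taken from the project's code.

```python
import re

# LaTeX writes one \citation{...} line to the .aux file per \cite command;
# a single line may hold several comma-separated keys.
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text: str) -> list[str]:
    """Return the cited keys in first-seen order, without duplicates."""
    seen: dict[str, None] = {}
    for match in CITATION_RE.finditer(aux_text):
        for key in match.group(1).split(","):
            key = key.strip()
            # \nocite{*} produces \citation{*}, meaning "every entry".
            # It is skipped here and would need separate handling.
            if key and key != "*":
                seen.setdefault(key, None)
    return list(seen)
```

Only the entries whose keys appear in this list would then need to be checked, so unused references in the bibtex file are ignored.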


kmccurley · 2023-05-01T21:33:09Z

Every time I try to do something with BibTeX I am reminded of what it is like to work with stone tools. It turns out that there is no formal grammar for the BibTeX file format, and the only definition is in the code. Many people have tried to write bibtex parsers, with varying degrees of success.

bibpy is one attempt. It mangles some non-ASCII characters.
pybtex looks promising, and it's what cryptobib uses. It also has problems.
biblib (note: not the one from PyPI) claims to be the only one that implements the correct grammar defined in the bibtex binary. It no longer works in Python 3.10 and has not been updated in ten years.
bibtexparser is at least maintained, but it also appears to have problems.

This reminds me why we didn't try to parse bibtex directly. How do you parse a format that is described only by a binary?

kmccurley · 2023-05-02T00:14:39Z

Note: bibtexparser cannot parse cryptobib because it fails on @string{acisped = ""}

kmccurley · 2023-07-24T17:46:45Z

The iacrcc.cls style has switched to using the alphaurl bibliography style, and iacrcc.bst is being dropped. As a result, we no longer generate bibliographic references in an ad-hoc format from the iacrcc.bst style. This means that the citations element of metadata/Compilation:Meta is no longer needed, and we can instead just capture the bibtex references that are being used. This was mentioned in this issue, where it was suggested that we can use either bibexport or pybtex to extract the original bibtex entries.

Bibliographic entries need to be converted into other formats:

HTML for the web pages.
XML for crossref and/or JATS.
XML for XMP when we use an extended schema.

There will undoubtedly be issues in converting to these, but I think it's best to just store the original BibTeX entries and solve problems in converting them to other formats.

kmccurley · 2023-09-27T18:22:02Z

The code now uses a combination of bibexport and pybtex to extract and check the bibtex entries in webapp/metadata/meta_parse.py. It also uses pybtex to convert the references to both JATS and crossref format.

The extraction of bibtex entries is tricky when the author uses biblatex because bibexport only supports bibtex output. I fake it by parsing the main.bcf file (it's XML) and creating a fake main.aux to parse with bibexport.
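A rough sketch of the main.bcf workaround, not the actual code in meta_parse.py. It assumes that cited keys appear in elements whose local name is `citekey` and that bibliography files appear in `datasource` elements; both names are assumptions about the .bcf layout and should be checked against a real file. The function name `bcf_to_aux` and the default `bibstyle` value are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def bcf_to_aux(bcf_text: str, bibstyle: str = "alphaurl") -> str:
    """Build the text of a stand-in .aux file from the XML of a .bcf file."""
    root = ET.fromstring(bcf_text)
    keys: list[str] = []
    sources: list[str] = []
    for elem in root.iter():
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if not text:
            continue
        # Assumed element names; matching on the local name avoids
        # hard-coding the namespace URI.
        if name == "citekey" and text not in keys:
            keys.append(text)
        elif name == "datasource" and text not in sources:
            sources.append(text)

    lines = ["\\relax"]
    lines += [f"\\citation{{{key}}}" for key in keys]
    lines.append(f"\\bibstyle{{{bibstyle}}}")
    # \bibdata conventionally lists database names without the .bib suffix.
    databases = ",".join(s.removesuffix(".bib") for s in sources)
    lines.append(f"\\bibdata{{{databases}}}")
    return "\n".join(lines) + "\n"
```

The returned text could be written to a temporary main.aux and passed to bibexport as if the document had been compiled with bibtex.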

kmccurley self-assigned this Apr 30, 2023

kmccurley closed this as completed Sep 27, 2023
