Have GitHub and Zenodo releases synchronized #238

Open
Adafede opened this issue Aug 16, 2023 · 6 comments

Comments

@Adafede

Adafede commented Aug 16, 2023

Hi,

Thank you for all the effort you put into MassBank!
I was trying to access its data and realized that https://github.com/MassBank/MassBank-data/releases and https://doi.org/10.5281/zenodo.3378723 are not in sync.

This can be easily done by following https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content.

This way, each GitHub release is automatically archived on Zenodo and gets its own DOI.

Hope this makes sense!

@meier-rene
Collaborator

Thank you for bringing this to our attention. An automatic procedure should be in place, but apparently it's not working at the moment. I will look into this.

@meier-rene
Collaborator

I just checked and didn't find any differences. Could you please explain your finding in a bit more detail?
What I did:

@Adafede
Author

Adafede commented Aug 16, 2023

Wow, this is a fast reply!

I actually found the different json/sql/msp files available in the releases/tag/2023.06 very convenient and they do not seem to appear on Zenodo, but maybe I missed something?

P.S.: Is there any reason for providing an sql file and no sqlite file, which would be directly readable by MsBackendMassbank? (Or did I miss something again here?)

@meier-rene
Collaborator

Yes, you are right. Zenodo only covers the txt files. That's a result of GitHub's automatic Zenodo release procedure. I don't know how to automatically attach the other release artifacts to the Zenodo release.

For your second question I have no answer at the moment. The sql file is released for the MsBackendMassbank package, but we did not put too much effort into it. It's basically a dump of our internal data structure.
Maybe this sql file needs to be processed into an sqlite file? I need to do some research. Maybe @jorainer didn't want to create additional workload on our side? I found this script: https://github.com/rformassspectrometry/MsBackendMassbank/blob/main/inst/scripts/massbank-to-sqlite.R. If that's the case, we can probably modify our scripts to create the sqlite artifact instead of the sql file.

@Adafede
Author

Adafede commented Aug 16, 2023

👍🏼
The different "ready-to-use" files would be a plus on Zenodo (I also don't know how to attach artifacts to Zenodo releases automatically... I will search a bit and come back if I find something).
I was also using the nice script by @jorainer, and there are probably many of us out there doing so... so generating the sqlite file directly would indeed add some work on your side, but it would avoid that work being replicated many times elsewhere.

@jorainer

Note: my preferred way to access/use MassBank data in R is through AnnotationHub:

library(AnnotationHub)
ah <- AnnotationHub()
query(ah, "MassBank")
# AnnotationHub with 3 records
# snapshotDate(): 2023-06-23
# $dataprovider: MassBank
# $species: NA
# $rdataclass: CompDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH107048"]]'
#
#              title
#   AH107048 | MassBank CompDb for release 2021.03
#   AH107049 | MassBank CompDb for release 2022.06
#   AH111334 | MassBank CompDb for release 2022.12.1

So, for now, these 3 releases are available through AnnotationHub. To use one of them:

mb <- ah[["AH107049"]]
mb
# class: CompDb
#  data source: MassBank
#  version: 2022.06
#  organism: NA
#  compound count: 90190
#  MS/MS spectra count: 90190

This CompDb can be used directly with Spectra (i.e. Spectra(mb) would get you all MS2 spectra). Besides being available through AnnotationHub, the resource (an sqlite file) is also cached locally: it is downloaded on first use, and any subsequent use loads it from the local cache.
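To make the Spectra(mb) step concrete, here is a minimal sketch that wraps the lookup and the conversion in one function. The helper name `load_massbank_spectra` is hypothetical (it is not part of any package), and the sketch assumes the AnnotationHub, CompoundDb and Spectra packages are installed and that the record ID is one of those listed above.

```r
# Sketch: load a MassBank CompDb from AnnotationHub and extract its MS2 spectra.
# `load_massbank_spectra` is a hypothetical helper, not part of any package.
load_massbank_spectra <- function(ah_id = "AH107049") {
    # The Bioconductor packages are only needed once the function is called.
    for (pkg in c("AnnotationHub", "CompoundDb", "Spectra")) {
        if (!requireNamespace(pkg, quietly = TRUE))
            stop("Package '", pkg, "' is required but not installed.")
    }
    ah <- AnnotationHub::AnnotationHub()
    # The first call downloads the sqlite file; later calls use the local cache.
    mb <- ah[[ah_id]]
    # Spectra() on a CompDb returns the MS2 spectra stored in it.
    Spectra::Spectra(mb)
}

# Usage (needs network access the first time the resource is fetched):
# sps <- load_massbank_spectra("AH107049")
# length(sps)
```

Defining the function does not touch the network or load any package, so it can be sourced anywhere; the download only happens when it is called.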

There is, however, a manual step involved, since I need to convert the MassBank data structures into a CompDb SQLite (using this script) and then also upload and maintain these releases in Bioconductor's AnnotationHub... but I think that this should simplify usage of MassBank in R tremendously. The long-term goal is to also provide other annotation resources (as CompDb?) through AnnotationHub...
