New check to detect terms with same MetaCyc/KEGG xrefs #28146

Open
sjm41 opened this issue Jun 14, 2024 · 10 comments · May be fixed by #28252

Comments

@sjm41
Contributor

sjm41 commented Jun 14, 2024

Hi Jim

Could you set up a check to initially flag, and subsequently prevent, GO terms having identical MetaCyc or KEGG term xrefs?
This check would be equivalent to the one you've made for duplicate EC xrefs alluded to here: #26235

Currently, there are not many KEGG xref duplicates - see #28133 for list, which are either already fixed or in the process of being fixed.

Currently, there are ~117 MetaCyc xref duplicates (https://docs.google.com/spreadsheets/d/1NihWOJ3pfG05Jon0KUrIj4xK_uj3uHs42--WaMA1RxM/edit?gid=0#gid=0).
I'll be working through these in the coming week.
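A minimal sketch of the kind of check being requested, assuming the xrefs have already been extracted into a plain mapping from GO term ID to xref strings. The function name, the prefix list, and the IDs in the example are all made up for illustration; this is not the check in the GO build.

```python
from collections import defaultdict

# Assumption: xrefs of interest start with "MetaCyc:" or some "KEGG..." prefix.
CHECKED_PREFIXES = ("MetaCyc:", "KEGG")


def shared_xrefs(term_xrefs):
    """Return each MetaCyc/KEGG xref that appears on two or more terms."""
    users = defaultdict(set)
    for term_id, xrefs in term_xrefs.items():
        for xref in xrefs:
            if xref.startswith(CHECKED_PREFIXES):
                users[xref].add(term_id)
    # Keep only xrefs used by more than one term.
    return {x: sorted(t) for x, t in users.items() if len(t) > 1}


# Hypothetical term IDs and xrefs, not real GO content.
example = {
    "GO:9999991": ["MetaCyc:EXAMPLE-RXN", "EC:1.1.1.1"],
    "GO:9999992": ["MetaCyc:EXAMPLE-RXN"],
    "GO:9999993": ["KEGG_REACTION:R99999"],
}
print(shared_xrefs(example))
```

EC xrefs are deliberately ignored here, since the issue says they are covered by a separate existing check.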

@pgaudet

@sjm41 sjm41 added this to To do in GO-EC-RHEA xref alignment via automation Jun 14, 2024
@sjm41
Contributor Author

sjm41 commented Jun 17, 2024

I also found a case where the same MetaCyc ID appeared twice on a single GO term, once as a narrowMatch xref and once as an untyped xref. It would be good if the new check also flagged this type of case.

id: GO:0016805
name: dipeptidase activity
namespace: molecular_function
alt_id: GO:0102008
def: "Catalysis of the hydrolysis of a dipeptide." [https://www.ebi.ac.uk/merops/about/glossary.shtml#DIPEPTIDASE, PMID:19879002]
synonym: "cytosolic dipeptidase activity" NARROW []
xref: EC:3.4.13.-
xref: EC:3.4.13.18 {source="skos:narrowMatch"}
xref: EC:3.4.13.21 {source="skos:narrowMatch"}
xref: MetaCyc:3.4.13.18-RXN {source="skos:narrowMatch"}
xref: MetaCyc:3.4.13.18-RXN
xref: MetaCyc:3.4.13.21-RXN {source="skos:narrowMatch"}
is_a: GO:0008238 ! exopeptidase activity
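One way to spot this within-term case is to count xref IDs per stanza while ignoring any trailing qualifier block. The sketch below assumes the stanza is available as OBO-format text; the helper name and regular expression are illustrative only, and the stanza is an abbreviated copy of the one above.

```python
import re
from collections import Counter

# Capture the xref ID, i.e. the first whitespace-free token after "xref:".
XREF_LINE = re.compile(r"^xref:\s*(\S+)")


def repeated_xrefs(stanza):
    """Return xref IDs that occur more than once in one OBO term stanza."""
    ids = []
    for line in stanza.splitlines():
        match = XREF_LINE.match(line.strip())
        if match:
            ids.append(match.group(1))
    return sorted(x for x, n in Counter(ids).items() if n > 1)


stanza = """\
id: GO:0016805
xref: EC:3.4.13.18 {source="skos:narrowMatch"}
xref: MetaCyc:3.4.13.18-RXN {source="skos:narrowMatch"}
xref: MetaCyc:3.4.13.18-RXN
xref: MetaCyc:3.4.13.21-RXN {source="skos:narrowMatch"}
"""
print(repeated_xrefs(stanza))  # ['MetaCyc:3.4.13.18-RXN']
```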

@balhoff balhoff linked a pull request Jun 21, 2024 that will close this issue
@pgaudet pgaudet moved this from To do to In progress in GO-EC-RHEA xref alignment Jun 25, 2024
@sjm41
Contributor Author

sjm41 commented Jul 16, 2024

Thanks @balhoff

For the first issue:
MetaCyc:BUTYRATE--COA-LIGASE-RXN, MetaCyc:,http://purl.obolibrary.org/obo/GO_0120515;http://purl.obolibrary.org/obo/GO_0031956

I don't see BUTYRATE--COA-LIGASE-RXN on GO:0031956.
Is that error a false positive, or is there some other issue there?

@balhoff
Member

balhoff commented Jul 16, 2024

@sjm41 sorry! This is another example of the confusing automatic move of xrefs from obsolete terms to their replacements (planning to fix that soon). It's on GO:0047760.

@sjm41
Contributor Author

sjm41 commented Jul 16, 2024

OK, I've fixed 4 of the remaining cases. The other 4 will be fixed by the proposed merges in #28503

@sjm41
Contributor Author

sjm41 commented Jul 17, 2024

@balhoff I believe your associated PR should now complete.

BUT BEFORE YOU DO THAT (!), I am wondering if this check for the same MetaCyc/KEGG database xref could be extended to also check for any terms that share the same MetaCyc/KEGG definition xref? Is that possible? If so, would it work best to bundle it with this current check?
I'm not sure whether this will throw up any additional cases, but if it does, then they should be examined and fixed.

@balhoff
Member

balhoff commented Jul 17, 2024

@sjm41 the current check does not compare definition xrefs. It would complicate the query to do both in one check, so I think we should make an issue to implement a one-to-one check for definition xrefs and tackle that after going ahead and merging this check. How does that sound?
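For illustration, definition xrefs live in the square brackets at the end of an OBO `def:` line, so a separate check would first need to pull them out. A rough sketch, assuming OBO text input; the function names are hypothetical, the second example line is invented, and edge cases such as brackets inside trailing qualifiers are not handled.

```python
import re

# def: "<text>" [xref1, xref2, ...]
DEF_LINE = re.compile(r'^def:\s*".*"\s*\[(.*)\]')


def definition_xrefs(def_line):
    """Return the xrefs listed in the brackets of an OBO def: line."""
    match = DEF_LINE.match(def_line.strip())
    if not match:
        return []
    # Split on commas that are not backslash-escaped.
    parts = re.split(r"(?<!\\),\s*", match.group(1))
    return [p.strip() for p in parts if p.strip()]


def reaction_definition_xrefs(def_line):
    """Keep only MetaCyc/KEGG definition xrefs (prefixes are an assumption)."""
    return [x for x in definition_xrefs(def_line) if x.startswith(("MetaCyc:", "KEGG"))]


real = 'def: "Catalysis of the hydrolysis of a dipeptide." [https://www.ebi.ac.uk/merops/about/glossary.shtml#DIPEPTIDASE, PMID:19879002]'
made_up = 'def: "Example definition." [MetaCyc:EXAMPLE-RXN, PMID:0000000]'  # hypothetical

print(definition_xrefs(real))
print(reaction_definition_xrefs(made_up))
```

The extracted values could then be grouped by term in the same way as ordinary database xrefs.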

@sjm41
Contributor Author

sjm41 commented Jul 17, 2024

Sounds good, thanks!

@sjm41 sjm41 closed this as completed Jul 17, 2024
GO-EC-RHEA xref alignment automation moved this from In progress to Done Jul 17, 2024
@sjm41 sjm41 reopened this Jul 17, 2024
GO-EC-RHEA xref alignment automation moved this from Done to In progress Jul 17, 2024
@balhoff
Member

balhoff commented Jul 17, 2024

@sjm41 one direction of the check passes now ("one-to-one-xrefs-by-value") but the other direction ("one-to-one-xrefs-by-subject") still has many violations. I put both of these in the same PR. Should I split it up and get the first check merged?
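To make the two directions concrete, here is a sketch under one reading of the check names, which is an assumption and not confirmed in this thread: "by-value" flags an xref value used by more than one term, and "by-subject" flags a term carrying more than one xref from the same database. Qualifiers such as skos:narrowMatch are ignored here, which the real checks may well treat differently. All IDs are invented.

```python
from collections import defaultdict


def by_value_violations(pairs):
    """Xref values that are attached to more than one term."""
    terms = defaultdict(set)
    for term, xref in pairs:
        terms[xref].add(term)
    return {x: sorted(t) for x, t in terms.items() if len(t) > 1}


def by_subject_violations(pairs):
    """(term, database prefix) pairs that have more than one xref value."""
    xrefs = defaultdict(set)
    for term, xref in pairs:
        prefix = xref.split(":", 1)[0]
        xrefs[(term, prefix)].add(xref)
    return {k: sorted(v) for k, v in xrefs.items() if len(v) > 1}


# Hypothetical (term, xref) pairs.
pairs = [
    ("GO:9999991", "MetaCyc:A-RXN"),
    ("GO:9999991", "MetaCyc:B-RXN"),
    ("GO:9999992", "MetaCyc:C-RXN"),
]
print(by_value_violations(pairs))    # {}
print(by_subject_violations(pairs))  # {('GO:9999991', 'MetaCyc'): ['MetaCyc:A-RXN', 'MetaCyc:B-RXN']}
```

In this toy data the by-value direction is clean while the by-subject direction still reports a violation, mirroring the situation described above.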

@sjm41
Copy link
Contributor Author

sjm41 commented Jul 17, 2024

Yes please, splitting them into two sounds like an excellent idea!
I've made myself a couple of tickets to look at the other violations: see #28526 and #28527
