Clinical Trials KP edge predicates are malformed - double 'biolink' prefix - e.g., "biolink:biolink_treats" #416

Open
amykglen opened this issue Oct 5, 2024 · 1 comment

amykglen commented Oct 5, 2024

I noticed that KG2.10.1 includes three predicates that have a double 'biolink' prefix of sorts:

MATCH p=()-[e:`biolink:biolink_in_clinical_trials_for`|:`biolink:biolink_mentioned_in_trials_for`|:`biolink:biolink_treats`]->() RETURN distinct e.predicate, count(distinct e)
e.predicate                                  count(distinct e)
"biolink:biolink_in_clinical_trials_for"     13459
"biolink:biolink_treats"                     3558
"biolink:biolink_mentioned_in_trials_for"    14215

These appear to have come from the Clinical Trials KP ingest. I'm not sure whether the double 'biolink' prefix was already present in the version of their data we consumed or was introduced during the KG2pre build process...
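One way to narrow down where the doubling is introduced would be to count the offending values directly in an edges JSONL file, whether that is the KP's data as consumed or an intermediate KG2pre output. A minimal Python sketch follows, assuming each line is a JSON object with a `source_predicate` field; the function name `count_double_prefixed` and the choice of field are assumptions for illustration, not part of the KG2 build code.

```python
import json
from collections import Counter
from typing import Iterable

# The malformed prefix reported in this issue.
DOUBLE_PREFIX = "biolink:biolink_"


def count_double_prefixed(lines: Iterable[str],
                          field: str = "source_predicate") -> Counter:
    """Count values of `field` that start with 'biolink:biolink_'.

    `lines` is any iterable of JSONL lines (e.g. an open file handle).
    The field name is an assumption; adjust it to the file being checked.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = json.loads(line).get(field)
        if isinstance(value, str) and value.startswith(DOUBLE_PREFIX):
            counts[value] += 1
    return counts
```

Passing an open file handle (for example `with open(path) as f: count_double_prefixed(f)`) to the same function at different stages of the build would show at which stage the doubled prefix first appears.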

amykglen added the bug label Oct 5, 2024
amykglen changed the title Oct 17, 2024
  from: Predicates with double 'biolink' prefix in KG2.10.1 - e.g., "biolink:biolink_treats"
  to: Clinical Trials KP edge predicates are malformed - double 'biolink' prefix - e.g., "biolink:biolink_treats"
saramsey self-assigned this Oct 17, 2024
saramsey added a commit that referenced this issue Oct 17, 2024

saramsey commented Oct 17, 2024

OK, commit 77db2e6 should fix the issue.

Excerpt of the file clinicaltrialskg_tsv_to_kg_jsonl-edges.jsonl before the fix:

{"domain_range_exclusion": false, 
 "id": "CHEBI:10023---biolink:biolink_in_clinical_trials_for---None---None---None---HP:0012531---ClinicalTrialsKG:", 
 "negated": false, 
 "object": "HP:0012531", 
 "predicate": null, 
 "primary_knowledge_source": "ClinicalTrialsKG:", 
 "publications": [], 
 "publications_info": {}, 
 "qualified_object_aspect": null, 
 "qualified_object_direction": null, 
 "qualified_predicate": null, 
 "relation_label": "biolink:in_clinical_trials_for", 
 "source_predicate": "biolink:biolink_in_clinical_trials_for", 
 "subject": "CHEBI:10023", 
 "update_date": "2018-06-15"}

and after the fix:

{"domain_range_exclusion": false, 
 "id": "CHEBI:10023---biolink:in_clinical_trials_for---None---None---None---HP:0012531---ClinicalTrialsKG:", 
 "negated": false, 
 "object": "HP:0012531", 
 "predicate": null, 
 "primary_knowledge_source": "ClinicalTrialsKG:", 
 "publications": [], 
 "publications_info": {}, 
 "qualified_object_aspect": null, 
 "qualified_object_direction": null, 
 "qualified_predicate": null, 
 "relation_label": "in_clinical_trials_for", 
 "source_predicate": "biolink:in_clinical_trials_for", 
 "subject": "CHEBI:10023", 
 "update_date": "2018-06-15"}

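For illustration only, the change visible in `source_predicate` between the two excerpts could be expressed as the small normalization below. This is not the code from commit 77db2e6; the function name `strip_double_biolink` is hypothetical, and the sketch does not touch the `id` or `relation_label` fields, which also differ between the excerpts.

```python
def strip_double_biolink(predicate: str) -> str:
    """Collapse a leading 'biolink:biolink_' into 'biolink:'.

    Hypothetical helper: predicates without the doubled prefix are
    returned unchanged.
    """
    doubled = "biolink:biolink_"
    if predicate.startswith(doubled):
        return "biolink:" + predicate[len(doubled):]
    return predicate
```

Applied to the three predicates listed in the query output, this would yield `biolink:in_clinical_trials_for`, `biolink:treats`, and `biolink:mentioned_in_trials_for`.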