-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NCBI Taxon ID optimalisation #54
Conversation
@bedroesb nice one, you beat to it! I was about to push the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine but I think the 'if' block above is affected (line 84/84)
and then again at where create_isa_characteristic function is invoked at line 118 and line 122
for key in mapping_dictionnary:
if key in all_germplasm_attributes and all_germplasm_attributes[key]:
c = self.create_isa_characteristic(
mapping_dictionnary[key], str(all_germplasm_attributes[key]),"","")
else:
c = self.create_isa_characteristic(
mapping_dictionnary[key], "", "", "")
Well it was a small thingy so sorry about that ;) I don't really see a problem with the if block to be honest. I know that I need to change the block when taxonIDs are given through BrAPI. |
@bedroesb if the if block, in order to be consistent, I think we need to make sure we use a similar pattern: sorry didn't test |
The attribute taxonId looks like this:
So the problem is how to handle the URI when it is not a NCBI taxon. If I assume it is always NCBI taxon ID, than it is an easy thing to implement indeed |
I guess we can just look for a sourceName == ncbiTaxon, and than take the one that is delivered by taxonId, otherwise use the implementation (using the genus and species) |
right but I can't remember now of top of my head if that situation (multiple taxonIds) occurs when there is one species+genus and the multiple taxonIds refer to a listing of 'alternate identifiers' for the same organism either way, concatenation resulting from the multiple entries will not be necessarily pretty in a tabular format. |
true that, I am changing it |
@proccaserra I've made a new function to make things more logic. I will add some documentation to it |
WUR endpoint delivered the URI link as taxonId, while the Portuguese one gave the NCBI ID itself, but this is handled in the script now. |
Take another look at the crosslinked issue on the MIAPPE side. I am not sure that this is the best option, so let's still consider some alternatives! |
So you propose an extra column called Characteristics[NCBI] with the NCBI id ? Not a problem at all to implement |
Does this look like a good output?:VIB:
PT:
WUR
|
Please check my post on the related issue on the MIAPPE github: MIAPPE/ISA-Tab-for-plant-phenotyping#17 (comment) If the goal is for BrAPI2ISA to generate MIAPPE-compliant ISA-Tab, then what I said there holds here as well. We should not be modeling Organism in a way that differs from the MIAPPE 1.1 checklist, even if that means we cannot use some of the functionalities from ISA. |
@DanFaria
columns) But with NCBITAXON:xxxx instead of NCBI:xxxx, for the Characteristics[Organism] column. |
@bedroesb Eliana has already posted an issue on the MIAPPE checklist to update the NCBI prefix to NCBITAXON, and hopefully that can be done still within the MIAPPE 1.1 release, as it is a non-functional change. |
@bedroesb @DanFaria I guess the ambiguity lies in the fact that for MIAPPE so may be a minor change would be to use 'organism ID' in both MIAPPE and the ISA configuration to remove that uncertainty. |
I agree that this would make the field more intuitive. I'll raise the issue on the MIAPPE checklist, and if approved, we can update the ISA configuration. |
WUR:
Pt:
VIB:
Of which the VIB one has the solved treatments problem mentioned before |
Off the top of my head, I don't recall any of the WUR germplasm having (Also, the format for Spatial Distribution has been changed from using square brackets to colons, i.e. from |
I double checked, and indeed our database has some entries with that germplasm name. Apologies! |
No problem! I just updated the examples in my previous post with the latest code changes concerning Characteristics[Spatial Distribution] |
New format:
generated with: