[Requested feature] Interaction with identifiers.org
API Web Services
#59
Comments
@M-casado From your issue:
It seems like identifiers.org will resolve/redirect even if it is given an invalid identifier with the correct format. The request will be redirected to the resource page, and each resource has its own mechanism for handling non-existing identifiers. Any thoughts on this?
It's basically what I was envisioning, but more difficult, perhaps:
I was hoping that the responses could be aggregated and interpreted easily (e.g. 200 meaning a record exists, etc.; anything else meaning it failed). In our use-case, being able to know through identifiers.org if the format of another identifier is correct (#61) is only half of the problem. My hopes:
Besides, at least in my use-case, the content of the record is not relevant, just that the record exists. So hopefully all archives respond in a similar way when a record is missing and is unresolvable (?)
The problem is that each archive has a different way of responding when a record is not there. If all of them responded with 404 it would be possible. For your example arrayexpress:E-MEXP-17121, Array Express returns 200 and an HTML page. I believe it is the responsibility of identifiers.org to report whether a resource actually exists. As users of their API, it is out of our scope to infer beyond what they provide. We can contact them and ask whether this is possible.
That's a shame, it would be amazing to have that feature working at some point. We should definitely ask identifiers.org if there is a way to do so.
I took a quick look at their API documentation and they have this Validate Sample ID section, but I believe it's probably just what you were doing already, right? The fact that it's a validation of the ID doesn't mean, I guess, that there's an existing record behind the ID.
That seems to be the correct API to use but, as you suspected, it does not work correctly when archives return the wrong HTTP status codes. See the two examples below, both using invalid IDs:
# responds with error message: Id does not exist
curl -X POST "https://registry.api.identifiers.org/prefixRegistrationApi/validateSampleId" -H "accept: */*" -H "Content-Type: application/json" -d '{
"apiVersion": "1.0",
"payload": {
"sampleId": "SAMEA23976766",
"providerUrlPattern": "https://www.ebi.ac.uk/biosamples/samples/{$id}"
}
}'
# responds with VALIDATION OK
curl -X POST "https://registry.api.identifiers.org/prefixRegistrationApi/validateSampleId" -H "accept: */*" -H "Content-Type: application/json" -d '{
"apiVersion": "1.0",
"payload": {
"sampleId": "E-MEXP-171211",
"providerUrlPattern": "https://www.ebi.ac.uk/biostudies/arrayexpress/studies/{$id}"
}
}'
This might also delay the validation considerably, given the addition of 2 more API calls.
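For anyone wanting to script the two requests above, here is a minimal Python sketch of the same POST. The endpoint, headers, and payload fields are copied from the curl commands; the function names `build_validation_request` and `send` are made up for illustration, and the response format is deliberately not interpreted, since the thread only tells us the message text differs between the two cases.

```python
import json
import urllib.error
import urllib.request

# Endpoint taken verbatim from the curl examples above.
VALIDATE_URL = (
    "https://registry.api.identifiers.org/prefixRegistrationApi/validateSampleId"
)


def build_validation_request(sample_id: str, provider_url_pattern: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl examples."""
    body = {
        "apiVersion": "1.0",
        "payload": {
            "sampleId": sample_id,
            "providerUrlPattern": provider_url_pattern,
        },
    }
    return urllib.request.Request(
        VALIDATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send the request and return (HTTP status, raw body text).

    The body is returned as-is: its structure is not assumed here.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx answers still carry a body worth inspecting.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Usage would be along the lines of `send(build_validation_request("E-MEXP-171211", "https://www.ebi.ac.uk/biostudies/arrayexpress/studies/{$id}"))`, which performs a real network call.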
@theisuru I agree with the extra property in the keyword, something to denote that not the format alone, but also "record exists", should be enforced. Now, onto how to do it... We could always contact the archives we intend to use so that they provide correct responses. It could be either that their API response is lazy or that the endpoint they mapped to identifiers.org is not the correct one. For example, knowing that BSD does provide correct responses, we could use it as it is with this API call. And if we add another to the bunch, we would check first, and contact them (?) I cannot think of any other way to interpret CURIEs in a generic way through identifiers.org
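To make the "check first, then add to the bunch" idea concrete, here is a rough Python sketch of an allow-list approach. Everything in it is an assumption for illustration: the names `VERIFIED_PREFIXES`, `RecordStatus` and `interpret_status`, the use of `biosample` as the prefix for BSD, and the choice to map 200 and 404 to "exists" and "missing". For any namespace that has not been checked by hand, the status code is treated as uninformative.

```python
from enum import Enum


class RecordStatus(Enum):
    EXISTS = "exists"
    MISSING = "missing"
    UNKNOWN = "unknown"


# Hypothetical allow-list: namespaces whose archive has been checked by hand
# and found to answer with meaningful HTTP status codes.
# "biosample" is assumed here to be the prefix for BSD.
VERIFIED_PREFIXES = frozenset({"biosample"})


def interpret_status(prefix: str, http_status: int) -> RecordStatus:
    """Interpret an HTTP status only for namespaces on the allow-list."""
    if prefix not in VERIFIED_PREFIXES:
        # e.g. an archive that answers 200 with an HTML page for missing records
        return RecordStatus.UNKNOWN
    if http_status == 200:
        return RecordStatus.EXISTS
    if http_status == 404:
        return RecordStatus.MISSING
    return RecordStatus.UNKNOWN
```

The three-valued result keeps "we could not tell" separate from "the record is missing", so a validator could decide whether an unknown outcome is an error or only a warning.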
Summary
A feature to check whether a CURIE resolves against identifiers.org API web services, so as to know if an element exists in another resource.
Motivation
A feature of this type would greatly improve the utility of the schemas, adding an extra step of semantic validation with the resourceful identifiers.org. See the use cases below for examples of how I would envision this feature enriching the metadata standards of my resource (EGA).
Details
Similar to how the current custom keywords interact with the OLS API, I would like to request a feature (e.g. a new keyword) that allows for a quick API call to identifiers.org and validates whether an element exists in another resource based on a given CURIE.
In order to resolve a CURIE, identifiers.org exclusively requires a Compact Identifier consisting of a unique prefix and a local provider designated accession number (prefix:accession). Given this structure, an example with the minimal custom keyword I envisioned (named here identifiersExists, but can take any other name) is:
In the above example, we would be indicating that the given arrayOrEnaIdentifier (CURIE) would have to exist in either Array Express or ENA's EMBL namespaces (arrayexpress and ena.embl respectively). Therefore, the following JSON documents (i.e. data) would be valid:
These last two identifiers would resolve automatically against identifiers.org using the following URI structure: identifiers.org + compact identifier
https://identifiers.org/arrayexpress:E-MEXP-1712
https://identifiers.org/ena.embl:BN000065
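As a rough illustration of the URI structure above and of the "either Array Express or ENA" restriction, here is a small Python sketch. The helper names `split_curie`, `resolver_url` and `has_allowed_prefix` are hypothetical and not part of any existing keyword; the code only builds strings and makes no network call.

```python
IDENTIFIERS_ORG = "https://identifiers.org/"


def split_curie(curie: str) -> tuple[str, str]:
    """Split a compact identifier into (prefix, accession).

    Splits on the first colon only, so prefixes such as "ena.embl"
    are kept whole and later colons stay in the accession.
    """
    prefix, separator, accession = curie.partition(":")
    if not separator or not prefix or not accession:
        raise ValueError(f"not of the form prefix:accession: {curie!r}")
    return prefix, accession


def resolver_url(curie: str) -> str:
    """identifiers.org + compact identifier, as described above."""
    split_curie(curie)  # raises ValueError on malformed input
    return IDENTIFIERS_ORG + curie


def has_allowed_prefix(curie: str, allowed_prefixes: set[str]) -> bool:
    """True if the CURIE's namespace is one of the designated prefixes."""
    prefix, _ = split_curie(curie)
    return prefix in allowed_prefixes
```

For instance, `has_allowed_prefix("ena.embl:BN000065", {"arrayexpress", "ena.embl"})` passes the namespace check, while a CURIE from any other namespace would fail it before any request is made.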
Nevertheless, it is also important to account for the designated namespace's prefix: a compact identifier not only needs to resolve to an existing record in a resource, but also needs to carry the designated prefix. One of the namespaces of identifiers.org is identifiers.org itself, which could be used for this purpose if needed, to assert that a namespace exists (when compiling the schemas). Therefore, the following JSON document would not be valid, even though it is correctly resolved by identifiers.org:
Likewise, it would be invalid if the compact identifier, even with the correct prefix, did not resolve to a record in the resource. For example, if I used the following made-up accession arrayexpress:E-MEXP-17121 (added an extra 1 at the end):
It is also important to differentiate between a record that is invalid because identifiers.org rejected the API call (e.g. a format error such as arrayexpress:hello-world) and one that is invalid because the record does not exist in the designated resource (e.g. arrayexpress:E-MEXP-17121). Although this last one depends on how each resource redirects non-existing records, it should be straightforward to address once the identifier is resolved to the registry URI.
Use cases
https://identifiers.org/ncbigene:100010
https://identifiers.org/arrayexpress.platform:A-AFFY-98
https://identifiers.org/arrayexpress.platform:A-GEOD-50
https://identifiers.org/ega.dataset:EGAD00000000001
https://identifiers.org/github:EbiEga/ega-metadata-schema