ISA Assay Creation Issues #24
Comments
We should talk this out, but here are my suggestions after first reading these issues.
Looks like we can use or create parent-child relationships in combination
with a protocol to create the equivalent ISA input -> protocol -> output
logic.
In certain circumstances, we may need to create dummy output entities to
create a linear chain.
The collection protocols can use parentID as input and the actual entity ID
as the output.
Please correct me if I am missing something.
Also, it looks like ISA study ends with a collected sample (aliquot in this
example) and the ISA assay begins with the same collected sample (aliquot
in this example).
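
A minimal sketch of that mapping, assuming entity records shaped like the ones in the quoted example (`id`, `parent_id`, `protocol.id`). The function name `entity_to_processes`, the `COLLECTION_PROTOCOLS` set, and the dummy ID naming scheme are made up for illustration; none of them come from MESSES or ISA.

```python
# Hypothetical set of protocols to treat as collection protocols (assumption).
COLLECTION_PROTOCOLS = {"mouse_tissue_collection"}


def entity_to_processes(entity):
    """Turn one entity's protocol list into (input, protocol, output) triples.

    Collection protocols use parent_id as input and the entity ID as output.
    Every other protocol takes the current entity as input, and its output is
    a dummy entity so the chain stays linear.
    """
    entity_id = entity["id"]
    parent_id = entity.get("parent_id")
    processes = []
    current = entity_id
    for protocol in entity.get("protocol.id", []):
        if protocol in COLLECTION_PROTOCOLS and parent_id is not None:
            processes.append((parent_id, protocol, entity_id))
            continue
        # Dummy output entity; the naming scheme is an assumption.
        output = f"{entity_id}__after_{protocol}"
        processes.append((current, protocol, output))
        current = output
    return processes
```

One open point with this approach: a child entity's `parent_id` still points at the real entity ID, not at the last dummy output, so the link between the two would need to be decided separately.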
On Tue, Jun 27, 2023 at 6:41 PM ptth222 wrote:
We had some issues talking about the last part due to time and not having
examples on hand so I am going to try and put those here. Link to the
issue: #24
Example of sample lineage for ICMS measurement:
"15_C1-20_allogenic_7days_UKy_GCH_rep3": {
  "id": "15_C1-20_allogenic_7days_UKy_GCH_rep3",
  "protocol.id": [
    "allogenic"
  ],
  "replicate": "3",
  "species": "Mus musculus",
  "species_type": "Mouse",
  "taxonomy_id": "10090",
  "time_point": "7",
  "type": "subject"
},
"15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3": {
  "id": "15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3",
  "parent_id": "15_C1-20_allogenic_7days_UKy_GCH_rep3",
  "protocol.id": [
    "mouse_tissue_collection",
    "tissue_quench",
    "frozen_tissue_grind"
  ],
  "type": "sample"
},
"15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3-polar-ICMS_A": {
  "id": "15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3-polar-ICMS_A",
  "injection_volume": "10",
  "injection_volume%units": "uL",
  "parent_id": "15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3",
  "polar_split_ratio": "0.143267710878",
  "protocol.id": [
    "polar_extraction",
    "IC-FTMS_preparation"
  ],
  "reconstitution_volume": "20",
  "reconstitution_volume%units": "uL",
  "replicate": "1",
  "replicate%type": "analytical",
  "type": "sample",
  "weight": "0.1994",
  "weight%units": "g"
}
Short form:
15_C1-20_allogenic_7days_UKy_GCH_rep3 ->
15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3 ->
15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3-polar-ICMS_A
Protocols:
allogenic -> mouse_tissue_collection -> tissue_quench ->
frozen_tissue_grind -> polar_extraction -> IC-FTMS_preparation -> ICMS1
Combined:
15_C1-20_allogenic_7days_UKy_GCH_rep3 -> allogenic ->
15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3 ->
mouse_tissue_collection -> tissue_quench -> frozen_tissue_grind ->
15_C1-20_Colon_allogenic_7days_170427_UKy_GCH_rep3-polar-ICMS_A ->
polar_extraction -> IC-FTMS_preparation -> ICMS1
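
For reference, the "Combined" form above can be derived mechanically from the entity records. This is only a sketch: `lineage` and `combined_chain` are hypothetical helper names, and the measurement protocol (ICMS1 here) is passed in by hand because it lives on the measurement records, not on the entity.

```python
def lineage(entities, entity_id):
    """Return entity IDs from the root ancestor down to entity_id."""
    chain = []
    seen = set()
    current = entity_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent_id chain at {current!r}")
        seen.add(current)
        chain.append(current)
        current = entities[current].get("parent_id")
    return list(reversed(chain))


def combined_chain(entities, entity_id, measurement_protocol=None):
    """Interleave each entity in the lineage with the protocols listed on it."""
    steps = []
    for eid in lineage(entities, entity_id):
        steps.append(eid)
        steps.extend(entities[eid].get("protocol.id", []))
    if measurement_protocol is not None:
        steps.append(measurement_protocol)
    return " -> ".join(steps)
```

Calling `combined_chain(entities, "<ICMS_A sample id>", "ICMS1")` on the three records above should reproduce the Combined line, with each entity followed by its own protocols.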
Sample inheritance example from ISA:
culture12 -> S-0.2-aliquot11 -> S-0.2 -> JC_S-0.2
culture12 -> S-0.2-aliquot11 -> S-0.2 -> Pool3
Protocols/Process:
growth protocol -> protein extraction -> iTRAQ labeling -> norm3 ->
datatransformation3
Combined:
culture12 -> growth protocol -> S-0.2-aliquot11 -> protein extraction ->
S-0.2 -> iTRAQ labeling -> JC_S-0.2/Pool3 -> norm3 -> datatransformation3
How ISA breaks them up:
Study:
culture12 -> growth protocol -> S-0.2-aliquot11
Assay:
S-0.2-aliquot11 -> protein extraction -> S-0.2 -> iTRAQ labeling ->
JC_S-0.2/Pool3 -> norm3 -> datatransformation3
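
The split above can be written as a small helper. This is a sketch under one assumption: the boundary sample is already known and passed in explicitly. How to pick it is the open question. `split_study_assay` is a hypothetical name.

```python
def split_study_assay(chain, boundary):
    """Split an alternating entity/protocol chain at a shared boundary sample.

    The boundary sample is the last node of the study part and the first
    node of the assay part, so it appears in both.
    """
    index = chain.index(boundary)  # raises ValueError if boundary is absent
    study = chain[: index + 1]
    assay = chain[index:]
    return study, assay
```

With the ISA example, the boundary would be S-0.2-aliquot11, which gives the Study and Assay chains listed above.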
Note that ISA has a processSequence, and samples/files are attributes of
the process, whereas our system is more entity focused, with protocols
stored as attributes on the entities.
Collection protocols are a little strange. Most protocols describe what
happened to the entity they are listed on, but a collection protocol is
listed on the entity that resulted from the collection. For ISA the
process/protocol has inputs and outputs, so this ambiguity doesn't exist.
We might have to move collection protocols to the input entity, or handle
them specially for ISA and combine them with the protocols on the preceding
entity, because their inputs and outputs don't align with the other
protocols. mouse_tissue_collection has the mouse as input and the organ as
output, but tissue_quench and frozen_tissue_grind have the organ as input
and the ground-up organ as output.
I just realized another issue now. We put the measurement protocol on the
measurement records and not the entity directly, but for ISA there are no
measurements. That is to say they don't have any specific file or format
for measurements. You basically just describe the protocols and list the
files as outputs and then those files serve as the measurements. You don't
have to pick one measurement like the Workbench makes you do and then put
it in a certain format. I think the easiest thing to do is to just put the
measurement protocol on the entity for ISA submissions.
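
A rough sketch of what that could look like. It assumes measurement records carry `entity.id` and `protocol.id` fields and that `protocol.id` may be a string or a list; both are assumptions about the record layout, and `protocols_for_isa` is a hypothetical name.

```python
def protocols_for_isa(entity, measurements):
    """Return the entity's protocols followed by its measurement protocols.

    `measurements` maps measurement IDs to records with `entity.id` and
    `protocol.id` fields (assumed layout).
    """
    protocols = list(entity.get("protocol.id", []))
    for record in measurements.values():
        if record.get("entity.id") != entity["id"]:
            continue
        measured_with = record.get("protocol.id", [])
        if isinstance(measured_with, str):
            measured_with = [measured_with]
        for protocol in measured_with:
            # Many measurements share one protocol; list it only once.
            if protocol not in protocols:
                protocols.append(protocol)
    return protocols
```

The entity record itself is left untouched; the combined list would only be used when building the ISA output.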
It still doesn't seem obvious to me where you break the entity/protocol
chain into study and assay, but we can use this as context for our next
meeting.
--
Hunter Moseley, Ph.D. -- Univ. of Kentucky
You may want to view this on GitHub since I have embedded tables and such. https://github.com/MoseleyBioinformaticsLab/MESSES/issues
I am going to start with a short description of how ISA assays work. I am going to work from the tab format because it is easier to understand, but the issues exist in the JSON version as well.
Here is an example ISA assay tab file:
The order of the columns matters. The very first column must be a Sample Name column, and there can be no other Sample Name columns. If you have multiple entities deriving from each other, they are called "extracts" in an assay. You can see in this example that it goes from a sample to an extract after the "protein extraction" protocol, and then to a labeled extract after the "ITRAQ labeling" protocol.
They call the "Protocol REF" columns "process nodes", but there are other process nodes as well. The "MS Assay Name", "Normalization Name", and "Data Transformation Name" columns are also process nodes, but there is a difference. Protocol process nodes have a protocol, but the others don't; the names underneath them just name the process, not the protocol. It seems to me that they essentially make a distinction between actions done on physical entities and actions done on data. Actions done on physical entities have an associated protocol, but actions done on data don't. They don't expressly say that, but that's what it looks like based on the example. I also think it would be valid to change "MS Assay Name" to a "Protocol REF" and create an "MS Assay" protocol if someone wanted. That would be the only way to give a description of the "MS Assay" process.
An important thing to note about all of this is that each sample/extract can have only 1 process/protocol applied to it at a time. This is an issue because we allow "protocol.id" to be a list field for entities. I think we might have to enforce 1 protocol for entities for ISA conversions. This restriction isn't just for assays, it applies to study processes as well.
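
As a starting point, a check like the following could flag the entities that would break this restriction before an ISA conversion. It is only a sketch; `entities_with_multiple_protocols` is a hypothetical name, and it assumes "protocol.id" is either a string or a list.

```python
def entities_with_multiple_protocols(entities):
    """Map entity ID -> protocol list for entities listing more than one protocol."""
    offenders = {}
    for entity_id, record in entities.items():
        protocols = record.get("protocol.id", [])
        if isinstance(protocols, str):
            protocols = [protocols]
        if len(protocols) > 1:
            offenders[entity_id] = list(protocols)
    return offenders
```

An empty result would mean every entity has at most 1 protocol and maps directly onto a single ISA process.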
Also note that they actually show analysis-type process steps, whereas we typically don't. It isn't required, but we may want to think about adding some "analytical" type protocols or something similar if people do want to specify them the way ISA shows here.
One issue is deciding where in the sample/extract chain to create the assay. This example starts just before protein extraction, but we could make one as simple as 3 columns. For example:
This is simple to do and would just require looking at the measurement protocol, but if we wanted to start sooner we would have to go to the measurement entity and then just go back up the lineage to some point. Deciding where to stop could be difficult. Maybe just 1 hop up the lineage.
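
The "go back up the lineage" idea could be sketched like this, with the number of hops as a parameter so that "just 1 hop" is the default and not a hard rule. `assay_start` is a hypothetical name, and the records are assumed to link to their parents through "parent_id".

```python
def assay_start(entities, measurement_entity_id, hops=1):
    """Walk up parent_id links at most `hops` times from the measured entity.

    Returns the ID of the entity the assay would start from. Stops early
    if an entity has no parent.
    """
    current = measurement_entity_id
    for _ in range(hops):
        parent = entities[current].get("parent_id")
        if parent is None:
            break
        current = parent
    return current
```

With `hops=0` the assay starts at the measured entity itself, which corresponds to the simple 3 column case.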
To summarize: