Script to set channel names #41
Note: I've not tested it with SPW yet... |
A few quick thoughts. The PR is timely as we have discussed related concepts during the OMERO.figure workshop preparation. In general, I think it would be extremely useful to be more specific on the Note this script has a lot of overlap with channel_names_from_maps.py, which is used for the training workshops to populate channel names. For me this indicates there is a clear need for this type of script when complex rendering settings are not needed. Trying to go quickly across the IDR to try and identify the variant patterns for the
As a general rule, I am all for reducing the number of such variants and making the In terms of format, I am happy to spend a bit of time reviewing all the IDR studies but I roughly assume all our use cases could be represented by a structure of type:
In terms of tooling, reemphasizing the value of this script (I assume the training one could be eventually superseded). I would vote for having a version that asserts rather than setting the values. This would allow us to flag inconsistencies between channel names and metadata. For studies with rendering settings, this could also be used to confirm the consistency of the channel metadata |
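A minimal sketch of the "assert rather than set" idea, written as plain Python so it does not depend on any OMERO call. The function name `check_channel_names` and the sample channel names (including `TL`) are made up for illustration; in a real script the two lists would come from the image's channel labels and from the parsed map annotation.

```python
def check_channel_names(actual, expected):
    """Return a list of mismatches; an empty list means the names are consistent.

    Each mismatch is a tuple (index, actual_value, expected_value).
    A difference in channel count is reported with index None.
    """
    mismatches = []
    if len(actual) != len(expected):
        mismatches.append((None, len(actual), len(expected)))
    for index, (have, want) in enumerate(zip(actual, expected)):
        if have != want:
            mismatches.append((index, have, want))
    return mismatches


# Illustrative data only: names currently set on the image vs. names from the metadata
actual = ["H2B-mCherry", "GFP", "TL"]
expected = ["H2B-mCherry", "GFP", "bright field/transmitted"]

problems = check_channel_names(actual, expected)
for index, have, want in problems:
    print("channel %s: found %r, metadata says %r" % (index, have, want))
```

Reporting mismatches instead of overwriting them keeps the check usable on studies that already have rendering settings, where a silent rename would hide the inconsistency.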
The channel_names_from_maps.py script splits the I have a PR that adds the option to choose the first part of the channel, e.g. That PR also adds the option to create Map-Annotations of the form:
which is something I wanted for the OMERO.figure workshop, but is probably a bit too workshop-specific to be part of this script. |
To be complete, I have compiled representative examples of the value of the
|
I am looking to compile a metadata matrix of However, I am running into many of the metadata issues that this issue describes. Namely, there are many different ways that IDR compiles the study submitters' coding of this information (many examples are listed in #41 (comment)). I see 5 fundamentally different ways (there could be others too) that channel info is coded:
Furthermore, many of the stains and labels, I assume, refer to the same thing, but they are coded slightly differently (e.g. DAPI:nuclei vs. dapi:DNA). My purpose for writing this note is to help provide what I'm seeing as a user, to describe how I would like to use the My ultimate goal is to use IDR metadata to help me select specific datasets to re-analyze. |
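To make the "coded slightly differently" problem concrete, here is a small sketch that groups structure labels by a case-normalized stain name. The lookup table `PREFERRED_STAIN` and both function names are hypothetical; the sketch deliberately does not decide whether `nuclei` and `DNA` mean the same thing, it only surfaces the variants so a curator can.

```python
from collections import defaultdict

# Hypothetical lookup table: lower-cased stain spelling -> preferred spelling
PREFERRED_STAIN = {"dapi": "DAPI"}


def normalize_stain(stain):
    """Map a stain spelling to its preferred form, ignoring case and padding."""
    cleaned = stain.strip()
    return PREFERRED_STAIN.get(cleaned.lower(), cleaned)


def group_structures_by_stain(entries):
    """Collect the lower-cased structure labels seen for each normalized stain."""
    grouped = defaultdict(set)
    for entry in entries:
        # Split on the first colon only: "stain:structure"
        stain, _, structure = entry.partition(":")
        grouped[normalize_stain(stain)].add(structure.strip().lower())
    return dict(grouped)


entries = ["DAPI:nuclei", "dapi:DNA", "GFP:tagged protein"]
grouped = group_structures_by_stain(entries)
for stain, structures in sorted(grouped.items()):
    print(stain, sorted(structures))
```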
@gwaygenomics thanks for raising this important issue. Summarising briefly the state of the channel metadata in IDR:
Overall, I think our biggest challenge comes from the heterogeneity of use cases. Primarily we are dealing with diverse imaging modalities, so the channel metadata of a brightfield RGB dataset is fundamentally different from the channel metadata of a fluorescent cell-painting assay. As you pointed out, there are also various concepts associated with a channel, including the marker, the stain, the filter and the biological structure. We agree that reducing the divergence and moving towards a standard IDR representation of the channel metadata is key to allowing consumers like you to mine the data effectively. My postulate is that trying to solve all IDR use cases at once is impractical, and this partly explains why there is no progress here. Thinking about how to move this forward, I suspect we need to build a first implementation, probably for a subset of the data, and start iterating on it. I think the role of resource consumers like you is absolutely key, and it would be great if you had the capacity to help drive this specification effort. A few initial questions:
|
I would be delighted to serve as a use-case for improving channel metadata standards, and I agree with the challenges you've presented.
Focus
I am interested in IDR
Metadata coding
Key elements to an effective metadata coding, that would be helpful for me are:
To achieve my aim, and if I could influence the most effective setup for me (without knowing all of the current limitations of course!), I would have liked to see the information coded in the API as follows:
# Wishlist API
{
    'parent': {
        'id': 14529,
        'class': 'ImageI',
        # The name could also be split out by well, field, and spot separately, although IIRC, this info is elsewhere too
        'name': 'DTT p1 [Well 77, Field 1 (Spot 229)]'
    },
    'date': '2016-12-13T23:14:34+00:00',
    'class': 'MapAnnotationI',
    'values': {
        'strain': 'Y6545',
        'environmental_stress': 'dithiothreitol',
        # Add category to distinguish fluorescent from brightfield
        'category': 'fluorescent',
        'channels': {
            # To enable future filtering of datasets with specific channel counts
            'count': 3,
            # To enable proper indexing to channels info
            'channel_keys': ['ch1', 'ch2', 'ch3'],
            # Ordering is arbitrary, but having a key will enable faster indexing in future search functionality
            'ch1': {
                # Translate user input into a common dictionary, or enforce form fill out by drop down selection menus
                # Use a simple "key: label" pair, as to not worry about parsing delimiters in strings and/or lists
                'stain': 'H2B-mCherry',
                'structure': 'cytosol',
                'filter': '561',
                'filter_unit': 'nm'
            },
            'ch2': {
                'stain': 'GFP',
                # It would be great if the API specified the specific protein
                'structure': 'protein',
                'filter': '469',
                'filter_unit': 'nm'
            },
            'ch3': {
                'stain': 'brightfield',
                'structure': 'wholecell',
                'filter': '',
                'filter_unit': ''
            }
        }
    }
}
Current annotation API (see IDR/idr.openmicroscopy.org#149 (comment)):
# Current API output
{
    'id': 6631107,
    'ns': 'openmicroscopy.org/omero/bulk_annotations',
    'description': None,
    'owner': {
        'id': 2
    },
    'date': '2016-12-13T23:14:34+00:00',
    'permissions': {
        'canDelete': False,
        'canAnnotate': False,
        'canLink': False,
        'canEdit': False
    },
    'link': {
        'id': 23067151,
        'owner': {
            'id': 2
        },
        'parent': {
            'id': 14529,
            'class': 'ImageI',
            'name': 'DTT p1 [Well 77, Field 1 (Spot 229)]'
        },
        'date': '2016-12-13T23:14:34+00:00',
        'permissions': {
            'canDelete': False,
            'canAnnotate': False,
            'canLink': False,
            'canEdit': False
        }
    },
    'class': 'MapAnnotationI',
    'values': [
        ['Strain', 'Y6545'],
        ['Environmental Stress', 'dithiothreitol'],
        ['Channels', 'H2B-mCherry:cytosol;GFP:tagged protein;bright field/transmitted:cell'],
        ['Has Phenotype', 'yes'],
        ['Phenotype Annotation Level', 'experimental condition and gene']
    ]
}
Other comment
I'd also like, in general, to be able to describe the wealth of data that currently exists in IDR. This means cataloging the biological and technical diversity of the publicly available images, and doing so requires API hits, which are complicated by metadata inconsistencies. I believe that one barrier to reanalyzing these data is low awareness, and a timely description of what's available will help raise awareness. |
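As a rough illustration of how far the current `values` list already goes towards the wishlist structure, the sketch below converts one into the other on the client side. It is only a sketch: `restructure` and `to_key` are hypothetical helper names, and the `category` and `filter` fields from the wishlist are left out because the current output does not carry that information.

```python
def to_key(name):
    """Turn a display key such as 'Environmental Stress' into 'environmental_stress'."""
    return name.strip().lower().replace(" ", "_")


def restructure(values):
    """Convert a list of [key, value] pairs into a nested dict with per-channel entries."""
    result = {}
    for key, value in values:
        if key == "Channels":
            entries = [e.strip() for e in value.split(";") if e.strip()]
            keys = ["ch%d" % (i + 1) for i in range(len(entries))]
            channels = {"count": len(entries), "channel_keys": keys}
            for ch_key, entry in zip(keys, entries):
                # Split on the first colon only: "stain:structure"
                stain, _, structure = entry.partition(":")
                channels[ch_key] = {
                    "stain": stain.strip(),
                    "structure": structure.strip(),
                }
            result["channels"] = channels
        else:
            result[to_key(key)] = value
    return result


current_values = [
    ["Strain", "Y6545"],
    ["Environmental Stress", "dithiothreitol"],
    ["Channels", "H2B-mCherry:cytosol;GFP:tagged protein;bright field/transmitted:cell"],
]
wishlist = restructure(current_values)
print(wishlist["channels"]["channel_keys"])
```

The conversion only works for the `stain:structure;stain:structure` pattern shown here; the other coding patterns mentioned in this thread would each need their own parser, which is part of the argument for standardizing at the source.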
Thanks @gwaygenomics, definitely good to start looking at structure. Some feedback and questions to continue this discussion. From the storage perspective:
From the specification perspective:
|
Thanks @sbesson! I appreciate the context. It seems this change will be difficult, but I also think that it is worthwhile to standardize.
Structure
In chatting with Melissa Haendel's group, they recommend using gene ontology (GO) as the canonical standard for subcellular anatomy. I put together this: https://github.com/WayScience/organelles/blob/main/organelles.tsv which could serve as a starting point for standardizing structure.
Stain
I am not aware of any standardization efforts. Maybe we can start one? I found two resources that might be helpful here: |
Would something like this be useful? I think we have a few cases where we don't explicitly set the channel names via rendering settings. This script would help to set them afterwards, taking the names from the 'Channel' map annotations.
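For readers who want to see the core step in isolation, here is a minimal sketch of turning one map-annotation value into a list of channel names. It assumes the `label:structure;label:structure` pattern seen in the example quoted in this thread and is not the script from this PR; `parse_channels`, `channel_names` and the `use` parameter are hypothetical names. Writing the names back to the image is left out because it depends on the OMERO API.

```python
def parse_channels(value):
    """Split a channel annotation value into (label, structure) pairs."""
    channels = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only; entries without a colon get an empty structure
        label, _, structure = entry.partition(":")
        channels.append((label.strip(), structure.strip()))
    return channels


def channel_names(value, use="label"):
    """Pick the part of each entry to use as the name, falling back to the label."""
    index = 0 if use == "label" else 1
    return [pair[index] or pair[0] for pair in parse_channels(value)]


value = "H2B-mCherry:cytosol;GFP:tagged protein;bright field/transmitted:cell"
names = channel_names(value)
print(names)
```

Passing `use="structure"` selects the second part of each entry instead, which mirrors the option discussed above of choosing which part of the annotation becomes the channel name.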