Script to set channel names #41

dominikl · 2021-05-13T18:53:54Z

Would something like this be useful? I think we have a few cases where we don't explicitly set the channel names via rendering settings. This script would help to set them afterwards, taking the names from the 'Channel' map annotations.

dominikl · 2021-05-14T08:38:44Z

Note: I've not tested it with SPW yet...

sbesson · 2021-05-14T11:11:05Z

A few quick thoughts. The PR is timely as we have discussed related concepts during the OMERO.figure workshop preparation. In general, I think it would be extremely useful to be more specific about the Channels metadata.

Note this script has a lot of overlap with channel_names_from_maps.py, which is used for the training workshops to populate channel names. For me this indicates there is a clear need for this type of script when complex rendering settings are not needed.

Trying to go quickly across the IDR to try and identify the variant patterns for the Channels:

this script assumes the Channels have been converted into a MapAnnotation (under the bulk annotation namespace by default). This is generally true, but there are exceptions where the Channels are only available as Tables, e.g. https://idr.openmicroscopy.org/webclient/?show=image-5514104. Since this is completely in our control, we should be able to enforce this when annotating.
the script assumes the Channels column contains a list of names e.g. DAPI,FITC.... In some cases, the metadata is effectively a dictionary mapping a fluorophore or channel name to a target component e.g. Sla1-yEGFP: endocytic patch; Sac6-tdTomato: actin patch; Dextran Alexa 647: cell outline for https://idr.openmicroscopy.org/webclient/?show=well-1492903 or EGFP (protein of interest), DRAQ5 (DNA) for https://idr.openmicroscopy.org/webclient/?show=well-2108734
this script assumes the Channels are listed in the order of the actual Channel objects. This will be true in most cases, but there are exceptions e.g. https://idr.openmicroscopy.org/webclient/?show=image-9837230

As a general rule, I am all for reducing the number of such variants and making the Channels annotation less free-text and more systematic for clients like OMERO.figure but also re-analysis.

In terms of format, I am happy to spend a bit of time reviewing all the IDR studies, but I roughly assume all our use cases could be represented by a structure of type:

<Channel1>[:<Target1>][;<Channel2>[:<Target2>]]...

In terms of tooling, I want to reemphasize the value of this script (I assume the training one could eventually be superseded). I would vote for having a version that asserts rather than sets the values. This would allow us to flag inconsistencies between channel names and metadata. For studies with rendering settings, this could also be used to confirm the consistency of the channel metadata.
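To make the "assert rather than set" idea concrete, here is a minimal sketch in plain Python. It is not the script from this PR: the function name `check_channel_names`, the default `;` separator, and the choice to compare the whole item string against the channel name are all assumptions made for illustration, and fetching the actual channel names from OMERO is left out.

```python
from typing import List


def check_channel_names(actual_names: List[str], channels_value: str,
                        sep: str = ";") -> List[str]:
    """Compare existing channel names with a Channels annotation value.

    Hypothetical helper: returns a list of human-readable mismatches and
    never modifies anything, so it can be used to flag inconsistencies.
    """
    # Names listed in the annotation, in order, ignoring empty items.
    expected = [item.strip() for item in channels_value.split(sep) if item.strip()]
    problems = []
    if len(expected) != len(actual_names):
        problems.append(
            f"channel count differs: image has {len(actual_names)}, "
            f"annotation lists {len(expected)}"
        )
    # Positional comparison: assumes the annotation follows the channel order.
    for index, (actual, wanted) in enumerate(zip(actual_names, expected)):
        if actual != wanted:
            problems.append(
                f"channel {index}: name is {actual!r}, annotation says {wanted!r}"
            )
    return problems
```

An empty list means the names and the annotation agree. Because the comparison is positional, studies where the Channels value is not in channel order would be reported as mismatches, which is arguably what a checking tool should do.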

/cc @francesw @will-moore @joshmoore @jburel

will-moore · 2021-05-14T11:39:00Z

The channel_names_from_maps.py script splits the Channels value with ;. The default in this script is ,. Maybe default should be ;?
And by default, each channel is then split on : with the latter part being used as the channel name. E.g. the name would be endocytic patch from Sla1-yEGFP: endocytic patch. This PR uses the whole channel string.

I have a PR that adds the option to choose the first part of the channel, e.g. Sla1-yEGFP see ome/training-scripts@8ddbc5d
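The splitting behaviour described above can be sketched as follows. This is an illustration only, not the code of channel_names_from_maps.py or of the linked commit: the function name `channel_names`, the `use` parameter, and splitting on the first `:` only are assumptions.

```python
from typing import List


def channel_names(value: str, sep: str = ";", kv_sep: str = ":",
                  use: str = "last") -> List[str]:
    """Derive channel names from a Channels string (hypothetical helper).

    use="last" takes the part after the first kv_sep (e.g. the target),
    use="first" takes the part before it (e.g. the fluorophore).
    Items without kv_sep are used whole.
    """
    if use not in ("first", "last"):
        raise ValueError("use must be 'first' or 'last'")
    names = []
    for item in value.split(sep):
        item = item.strip()
        if not item:
            continue  # skip empty items such as a trailing separator
        first, found, last = item.partition(kv_sep)
        if not found:
            names.append(item)
        elif use == "first":
            names.append(first.strip())
        else:
            names.append(last.strip())
    return names
```

For example, with the idr0078 value `Sla1-yEGFP: endocytic patch; Sac6-tdTomato: actin patch; Dextran Alexa 647: cell outline`, the default gives the three target names and `use="first"` gives the three fluorophore names. Passing `sep=","` covers the comma-separated default of this PR.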

That PR also adds the option to create Map-Annotations of the form:

Ch0_Stain: lynEGFP
Ch0_Label: cell membranes
Ch1_Stain: atoh1a
Ch1_Label: atoh1a expression marker

which is something I wanted for the OMERO.figure workshop, but is probably a bit too workshop-specific to be part of this script.
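For reference, key/value pairs of that form could be produced from a Channels string along these lines. This is a sketch under assumptions, not the code from the training-scripts PR: the function name `channel_key_values` is hypothetical, and the 0-based `Ch<n>_Stain` / `Ch<n>_Label` keys simply mirror the example above.

```python
from typing import List, Tuple


def channel_key_values(value: str) -> List[Tuple[str, str]]:
    """Turn 'stain:label;stain:label' into ordered Ch<n>_Stain/Ch<n>_Label pairs."""
    pairs = []
    index = 0
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first ':' only; anything after it is the label.
        stain, found, label = item.partition(":")
        pairs.append((f"Ch{index}_Stain", stain.strip()))
        if found:
            pairs.append((f"Ch{index}_Label", label.strip()))
        index += 1
    return pairs
```

Applied to the idr0079 value `lynEGFP:cell membranes; atoh1a:atoh1a expression marker`, this yields the four pairs listed above, in that order.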

sbesson · 2021-06-04T08:24:05Z

To be complete, I have compiled representative examples of the value of the Channels for all studies currently published in IDR (up to prod97). Happy to turn it into an issue on the relevant repository if this is too noisy.

Study	Channels
idr0001	GFP:endogenous alpha tubulin 2;Cascade blue:growth media
idr0002	H2B- mCherry/Cy3:chromatin;eGFP:nuclear lamina and report on nuclear envelope breakdown
idr0003	H2B-mCherry:cytosol;GFP:tagged protein;bright field/transmitted:cell
idr0004	DIC:cell structure;YFP:Rad52-YFP protein
idr0005	Hoechst:DNA
idr0006	DAPI:nuclei;TRITC:HA_Flag tagged protein
idr0007	Exp1Cam1:various;Exp1Cam2:various
idr0008	TRITC:phallodin/F-actin;TRITC2:phallodin/F-actin;FITC:alpha-tubulin;Dapi:DNA
idr0009	dapi: DNA;vsvg-cfp: CFP-tsO45G ;pm-647: cell surface tsO45G
idr0010	Dapi/Hoechst 33258: DNA;53bp1/Alexa Fluor 488:53bp1
idr0011	YFP:DAD4; mRFP1:SPC42; DIC: whole cell
idr0012	Alexa 488:tubulin;Hoechst:DNA;Tritc:actin
idr0013	GFP: core histone 2B tagged with GFP to monitor chromosomes
idr0015
idr0016	Hoechst 33342:nucleus;concanavalin A (con A) AlexaFluor488 conjugate:endoplasmic reticulumn;SYTO 14 green fluorescent nucleic acid stain:nucleoli;wheat germ agglutinin (WGA) AlexaFluor594 conjugate:Golgi apparatus and plasma membrane;phalloidin AlexaFluoraFluor594 conjugate: F-actin;MitoTracker Deep Red: mitochondria
idr0017	DAPI:DNA;CY3:Actin
idr0018	RGB
idr0019	DAPI: nuclei;Alexa-488: NF-kappaB;dihydroethidium(DHE): cell bodies
idr0020	Hoescht: nuclei;Anti-Ser10 PhosphoHistone H3: mitotic nuclei;Anti-alpha-tubulin: microtubules;RFP: whole cell
idr0021	442:CENT2; 525:PCNT; 615:CDK5RAP2-C
idr0022
idr0023	greyscale
idr0025	ch00:DAPI(nuclei);ch01:Alexa 488(target protein);ch02:Alexa 555(microtubules)
idr0026	FD5_BLUE:SHG (collagen);FD6_GREEN:dsRed2 (CTL);BD7_RED:Alexa750 (vessels, 70kDa-dextran);BD8_RED:mCherry(Histone-2B-mCherry, B16F10/OVA nuclei)
idr0027
idr0028	AlexaFluor647:YAP/TAZ; AlexaFluor568:alphaTubulin;Phalloidin488: F-actin;Hoechst: nuclei
idr0030	Exp1Cam1:Hoechst:DNA;Exp2Cam2:mouse anti-YAP/TAZ plus Alexa488 anti-mouse:YAP/TAZ;Exp3Cam3:Alexa647 phalloidin:F-actin;Exp4Cam2:rabbit anti-CD44 plus Alexa568 anti-rabbit:CD44
idr0032	RGB
idr0033	Hoechst 33342:nucleus;concanavalin A/AlexaFluor488 conjugate:endoplasmic reticulum;SYTO14 green fluorescent nucleic acid stain:nucleoli and cytoplasmic RNA;wheat germ agglutinin/AlexaFluor594 conjugate (WGA):Golgi apparatus and plasma membrane;phalloidin/AlexaFluor594 conjugate:F_actin;MitoTracker Deep Red:mitochondria
idr0034	DAPI:nuclei;Alexa 488:EdU, proliferation;CellMask:plasma membrane;Brightfield:cell outline
idr0035	DAPI:DNA;Phallodin:F-actin;B-tubulin
idr0036	Hoechst 33342:nucleus;concanavalin A (con A) AlexaFluor488 conjugate:endoplasmic reticulumn;SYTO 14 green fluorescent nucleic acid stain:nucleoli;wheat germ agglutinin (WGA) AlexaFluor594 conjugate:Golgi apparatus and plasma membrane;phalloidin AlexaFluor594 conjugate:F-actin;MitoTracker Deep Red: mitochondria
idr0037	DAPI:nuclei;Alexa 488:EdU, proliferation;CellMask:plasma membrane;Brightfield:cell outline
idr0038	Wt1-GFP:Wt1tm1Nhsn cells expressing GFP in cytoplasm, green;PNA-rh:Peanut agglutanin conjugated with rhodamine labelling basement membranes, red
idr0040	BF: Brightfield; CFP:nuclei; YFP: pAGA1-dPSTR; RFP:pFIG1-dPSTR; BF1: Brightfield out of focus
idr0041	490-552:GFP; 587-621:mCherry | 622-695:SiR-DNA; 622-695:Dy-481XL
idr0042	RGB
idr0043
idr0044
idr0045	EB3
idr0047	TRANS: brightfield of the cells; DAPI: fluoresecently stained DNA; TMR: TAMRA labeld onligo nucleotide probes that bind to an STL1 mRNA; CY5: Cy5 labeld onligo nucleotide probes that bind to an CTT1 mRNA
idr0048	Red: Brainbow Red; Green: Brainbow Green; Blue: Brainbow Blue
idr0050	Ch1: Actin, Ch2: Cell, Ch3: Microtubules
idr0051	GFP
idr0052	NCAPD2, DNA, NEG_Dextran
idr0053
idr0054	CD3-170Er, CD19-169Tm, CD324/E-Cadherin-158Gd, CD206-168Er, Bcl6-163Dy, CD141/BDCA3-165Ho, alphaSMA-141Pr, IL-21-164Dy, CD185/CXCR5-151Eu, CD45-152Sm, empty, CXCL13-157Gd, CD1c/BDCA1-biotin + Neutravidin-173Yb, CD303/BDCA2-147Sm, CD11b-149Sm, CD45RA-155Gd, CD123-143Nd, CD68-171Yb, HLA-DR-174Yb, CD279/PD-1-175Lu, CD370/Clec9A-161Dy, CD11c-159Tb, ICOS-148Nd, DNA1-191/193Ir, CD56-176Yb, DNA2-191/193Ir, CD14-156Gd
idr0056	alpha-tubulin (microtubule cytoskeleton), CEP215/CDK5RAP272 (centrosomes) Alexa-Fluor 568 Phalloidin (actin cytoskeleton), Hoechst (DNA).
idr0061	Alexa 555
idr0062	LaminB1 / Dapi
idr0063	GFP = URA3
idr0064	405:ErkKTR-BFP; 561:H2B-RFP
idr0065	phase:Phase contrast,Cy3:amiC-Sp1-bc2,Cy5:kilR-Sp2-bc12,TxR:pbpG-Sp4-bc22,fam:yabI-Sp1-bc32
idr0066	EGFP:GlyT2positive neurons
idr0067	FITC = Hsp104-eGFP, mCherry = Htb1-mCherry
idr0069
idr0070	Brightfield
idr0071	DAPI:Cy3:A594:Cy5:Cy7
idr0072	EGFP (protein of interest), DRAQ5 (DNA)
idr0073	RGB
idr0075	Alexa488
idr0076	Total HH3-In113, Xe126, I127, Xe131, Xe134, H3K27me3-La139, Ce140, CK5-Pr141, Fibronectin-Nd142, CK19-Nd143, CK8_18-Nd144, Twist-Nd145, CD68-Nd146, CK14-Sm147, SMA-Nd148, Vimentin-Sm149, C-myc-Nd150, HER2-Eu151, CD3-Sm152, p-Total HH3-Eu153, p-ERK1/2-Sm154, Slug-Gd155, ER-Gd156, PR-Gd158, p53-Tb159, CD44-Gd160, EpCAM-Dy161, CD45-Dy162, GATA3-Dy163, CD20-Dy164, Beta-catenin-Ho165, CAIX-Er166, E_cadherin-Er167, Ki67-Er168, EGFR-Tm169, pS6-Er170, Sox9-Yb171, vWF-CD31-Yb172, mTOR-Yb173, CK7-Yb174, panCK-Lu175, cPARP-cCasp3-Yb176, DNA1-Ir191, DNA2-Ir193, Hg202, Pb204, Pb206, Pb207, Pb208, ArAr80
idr0077	561nm L, 488nm L, 561nm R, 488nm R
idr0078	Sla1-yEGFP: endocytic patch; Sac6-tdTomato: actin patch; Dextran Alexa 647: cell outline
idr0079	lynEGFP:cell membranes; atoh1a:atoh1a expression marker
idr0080	Hoechst 33342 (DNA); Concanavalin A/Alexa 488 (endoplasmic reticulum); 488 Long (nucleoli and cytoplasmic RNA); Phalloidin/Alexa 568 and wheat-germ agglutinin/Alexa 555 (actin cytoskeleton, golgi, and plasma membrane (AGP)); MitoTracker Deep Red/Alexa 647 (mitochondria)
idr0081	Hoechst:nuclei; GFP:infection
idr0082
idr0083
idr0084	blue: DAPI nuclear stain; green: FITC alpha-globin nascent transcripts
idr0085	Ch1: CMDiI staining, Ch2: microvascular staining
idr0086	594: EdU, 488: IF signal, DAPI
idr0087	C0 (Hoechst), C1(mitoTracker), C2(cargo fluorescence)
idr0088	Ch1 (blue): Nuclei/Cytoplasm, Ch2 (green): TUBA1B, Ch3 (red): RELA
idr0089	H3K4me3, H3K27me3, DAPI
idr0090	BF:Brightfield; DAPI:DNA; GFP:Cytosolic GFP; Cy3:Red Blood Cell; Cy5:Mitochondria
idr0091	Phase Contrast, GFP, GFP-raw
idr0092	bright-field
idr0093	DNA;Nascent RNA;PCNA;Succinimidyl ester
idr0094	cell body
idr0095	Phase, mCherry, YFP
idr0097	green:GFP; yellow:base T & base A; red:base C & base A
idr0098	grayscale
idr0099	eGFP (488nm)
idr0100	Axon [green]; Nucleus [blue]; Oligodendrocyte [red]
idr0103	Two channels (blue 440-480 nm and green 500-540 nm) CCF2
idr0106	alpha-SMA-FITC; VEGFR3-Alexa546; A549-mCherry
idr0109	Phase contrast: Cells

gwaybio · 2021-11-24T00:03:34Z

I am looking to compile a metadata matrix of stain by label (as they are defined here) where every entry in the matrix indicates how many wells exist in IDR with this stain:label combination. I am retrieving channel info for all studies using the IDR API.

However, I am running into many of the metadata issues that this issue describes.

Namely, there are many different ways that IDR compiles the study submitters' coding of this information (many examples are listed in #41 (comment)).

I see 5 fundamentally different ways (there could be others too) that channel info is coded:

Structure	Example
Stain1:Label1;Stain2:Label2;...	`GFP:endogenous alpha tubulin 2;Cascade blue:growth media`
Stain1: Label1;Stain2: Label2;...	`dapi: DNA;vsvg-cfp: CFP-tsO45G ;pm-647: cell surface tsO45G`
StainNumberIndicator:StainLabelCombo	`ch00:DAPI(nuclei);ch01:Alexa 488(target protein)` and `Exp1Cam1:various;Exp1Cam2:various`
Stain1 (Label1); Stain2 (Label2);...	`Hoechst 33342 (DNA); Concanavalin A/Alexa 488 (endoplasmic reticulum)`
`Channels` value missing from annotations API but it is listed in free text	E.g. idr0069, which was previously documented in image.sc

Furthermore, many of the stains and labels, I assume, refer to the same thing, but they are coded slightly differently (e.g. DAPI:nuclei vs. dapi:DNA).
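As a rough illustration of what a consumer currently has to do, the sketch below splits a single item written either as `Stain:Label` or as `Stain (Label)` and provides a case-insensitive comparison key. The names `split_stain_label` and `canonical` are hypothetical, and the rules are assumptions that cover only some of the structures in the table above.

```python
import re
from typing import Optional, Tuple

# "Stain (Label)" form: everything before the first "(" that lets the
# remainder match is the stain, the text inside the outer parentheses
# is the label.
_PAREN = re.compile(r"^(?P<stain>.*?)\s*\((?P<label>.*)\)\s*$")


def split_stain_label(item: str) -> Tuple[str, Optional[str]]:
    """Split one channel item into (stain, label); label is None if absent."""
    item = item.strip()
    if ":" in item:
        # "Stain:Label" and "Stain: Label" forms; split on the first ':'.
        stain, _, label = item.partition(":")
        return stain.strip(), label.strip()
    match = _PAREN.match(item)
    if match:
        return match.group("stain"), match.group("label").strip()
    return item, None


def canonical(term: str) -> str:
    """Case-insensitive key, so that 'DAPI' and 'dapi' compare equal."""
    return term.strip().casefold()
```

Note the limits: `ch00:DAPI(nuclei)` comes out as `("ch00", "DAPI(nuclei)")`, and case folding does nothing for differences such as nuclei vs. DNA, which need a controlled vocabulary rather than string handling.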

My purpose for writing this note is to help provide what I'm seeing as a user, to describe how I would like to use the channel metadata parameter specifically, and to let the folks contributing to this PR know that I am interested in its resolution (in case it makes any difference!)

My ultimate goal is to use IDR metadata to help me select specific datasets to re-analyze.
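The stain-by-label matrix mentioned at the start of this comment could be tallied roughly as below. This is a sketch, assuming one Channels string per well has already been retrieved and that the values use the `Stain:Label;...` structure; the function name `stain_label_counts` is hypothetical and the retrieval from the IDR API is not shown.

```python
from collections import Counter
from typing import Iterable


def stain_label_counts(channels_per_well: Iterable[str]) -> Counter:
    """Count how many wells carry each (stain, label) combination."""
    counts: Counter = Counter()
    for value in channels_per_well:
        for item in value.split(";"):
            stain, found, label = item.partition(":")
            # Only items of the form "stain:label" are counted.
            if found and stain.strip():
                counts[(stain.strip(), label.strip())] += 1
    return counts
```

Without normalisation, `("DAPI", "nuclei")` and `("dapi", "DNA")` end up as separate entries, which is exactly the inconsistency described above.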

sbesson · 2021-11-24T13:39:53Z (edited by joshmoore)

@gwaygenomics thanks for raising this important issue. Summarising briefly the state of the channel metadata in IDR:

the Channels column in the annotation file primarily reflects the representation of the submitter
some minimal curation happens, but there is currently no authoritative set of ontologies used for channels, unlike other concepts such as Organism or Compound (see https://idr.openmicroscopy.org/about/linked-resources.html)
the channel metadata is always transformed into tables and in the majority of the cases as map annotations (Others)
another relevant location is the channel name, e.g. as displayed under Image Details or in the viewer. The name is either read from the original image file format or set via the API

Overall, I think our biggest challenge comes from the heterogeneity of use cases. Primarily we are dealing with diverse imaging modalities so the channel metadata of a brightfield RGB dataset is fundamentally different from the channel metadata of a fluorescent cell-painting assay. As you pointed out there are also various concepts associated with a channel including the marker, the stain, the filter, the biological structure.

We agree that reducing the divergence and effectively moving towards a standard IDR representation of the channel metadata is key to allow consumers like you to effectively mine the data. My postulate is that trying to solve all IDR use cases at once is impractical and this partly explains why there is no progress here.

Trying to think how to move this forward, I suspect we need to build a first implementation, probably for a subset of data, and start iterating over it. I think the role of resource consumers like you is absolutely key and it would be great if you had the capacity to help drive this specification effort. A few initial questions:

can we restrict the scope of this work to either a study type or a subset of studies that would be the most useful to you?
given the existing metadata content but ignoring the current encoding, could you express a representation that would effectively communicate what you need to query?

gwaybio · 2021-11-24T16:34:24Z

I would be delighted to serve as a use-case for improving channel metadata standards, and I agree with the challenges you've presented.

Focus

I am interested in IDR screens, particularly those with imaging of multiple fluorescent channels. For example, I would like to analyze heterogeneous imaging datasets that have at least nuclei stained as a common structure. Simplistically, I'm thinking something like this:

Study	Nuclei	Mito	Other (including brightfield)	Cell Type	Perturbation
`idrxxx`	✅	❌	✅	Cancer	Drugs
`idryyy`	✅	✅	✅	Neuron	Media
`idrzzz`	✅	✅	❌	Fibroblast	CRISPR

Metadata coding

Key elements of an effective metadata coding that would be helpful for me are:

Consistent nomenclature describing marker, stain, filter, and biological structure/organelle
- Case (e.g. DAPI vs. dapi)
- Plurality (e.g. Nucleus vs. Nuclei)
- Structure resolution (e.g. Nucleus vs. DNA)
Consistent data structure presenting channel information in the API
- Key label pair (e.g. DAPI:DNA vs. DAPI (DNA))
- More specific nomenclature distinguishing the specific data structure (see below)

To achieve my aim, and if I could influence the most effective setup for me (without knowing all of the current limitations of course!), I would have liked to see the information coded in the API as the following:

# Wishlist API
{
  'link': {
    'parent': {
      'id': 14529,
      'class': 'ImageI',
      # The name could also be split out by well, field, and spot separately, although IIRC, this info is elsewhere too
      'name': 'DTT p1 [Well 77, Field 1 (Spot 229)]'
    },
    'date': '2016-12-13T23:14:34+00:00',
  },
  'class': 'MapAnnotationI',
  'values': {
    'strain': 'Y6545',
    'environmental_stress': 'dithiothreitol',
    # Add category to distinguish fluorescent from brightfield
    'category': 'fluorescent',
    'channels': {
      # To enable future filtering of datasets with specific channel counts
      'count': 3,
      # To enable proper indexing to channels info
      'channel_keys': ['ch1', 'ch2', 'ch3'],
      # Ordering is arbitrary, but having a key will enable faster indexing in future search functionality
      'ch1': {
        # Translate user input into a common dictionary, or enforce form fill out by drop down selection menus
        # Use a simple "key: label" pair, as to not worry about parsing delimiters in strings and/or lists
        'stain': 'H2B-mCherry',
        'structure': 'cytosol',
        'filter': '561',
        'filter_unit': 'nm'
      },
      'ch2': {
        'stain': 'GFP',
        # It would be great if the API specified the specific protein
        'structure': 'protein',
        'filter': '469',
        'filter_unit': 'nm'
      },
      'ch3': {
        'stain': 'brightfield',
        'structure': 'wholecell',
        'filter': '',
        'filter_unit': ''
      }
    }
  }
}

Current annotation API (see IDR/idr.openmicroscopy.org#149 (comment))

# Current API output
{
  'id': 6631107,
  'ns': 'openmicroscopy.org/omero/bulk_annotations',
  'description': None,
  'owner': {
    'id': 2
  },
  'date': '2016-12-13T23:14:34+00:00',
  'permissions': {
    'canDelete': False,
    'canAnnotate': False,
    'canLink': False,
    'canEdit': False
  },
  'link': {
    'id': 23067151,
    'owner': {
      'id': 2
    },
    'parent': {
      'id': 14529,
      'class': 'ImageI',
      'name': 'DTT p1 [Well 77, Field 1 (Spot 229)]'
    },
    'date': '2016-12-13T23:14:34+00:00',
    'permissions': {
      'canDelete': False,
      'canAnnotate': False,
      'canLink': False,
      'canEdit': False
    }
  },
  'class': 'MapAnnotationI',
  'values': [
    ['Strain', 'Y6545'],
    ['Environmental Stress', 'dithiothreitol'],
    ['Channels', 'H2B-mCherry:cytosol;GFP:tagged protein;bright field/transmitted:cell'],
    ['Has Phenotype', 'yes'],
    ['Phenotype Annotation Level', 'experimental condition and gene']
  ]
}
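To show how far the current output is from the wishlist, here is a minimal sketch that pulls the `Channels` entry out of an annotation shaped like the output above and reshapes it into a nested `channels` dictionary. The function names are hypothetical, the `Stain:Label;...` structure is assumed, and `filter` / `filter_unit` are omitted because the current value does not carry them.

```python
from typing import Any, Dict, Optional


def channels_from_annotation(annotation: Dict[str, Any]) -> Optional[str]:
    """Return the 'Channels' value from the list of [key, value] pairs, if any."""
    for key, value in annotation.get("values", []):
        if key == "Channels":
            return value
    return None


def to_nested_channels(value: str) -> Dict[str, Any]:
    """Reshape 'stain:structure;...' into a wishlist-style nested dict."""
    channels: Dict[str, Any] = {}
    keys = []
    items = (item for item in value.split(";") if item.strip())
    for index, item in enumerate(items):
        stain, found, structure = item.partition(":")
        key = f"ch{index + 1}"  # 1-based keys, as in the wishlist
        keys.append(key)
        channels[key] = {
            "stain": stain.strip(),
            "structure": structure.strip() if found else "",
        }
    return {"count": len(keys), "channel_keys": keys, **channels}
```

For the annotation above, `to_nested_channels(channels_from_annotation(annotation))` gives a `count` of 3 and a `ch1` entry of `{'stain': 'H2B-mCherry', 'structure': 'cytosol'}`. Every consumer currently has to write some variant of this, once per encoding style.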

Other comment

I'd also like, in general, to be able to describe the wealth of data that currently exists in IDR. This means cataloging the biological and technical diversity of the publicly available images, and doing so requires API hits, which are complicated by metadata inconsistencies. I believe that one barrier to reanalyzing these data is low awareness, and a timely description of what's available will help raise awareness.

sbesson · 2021-11-30T10:25:54Z

Thanks @gwaygenomics, definitely good to start looking at structure. Some feedback and a few questions to continue this discussion:

From the storage perspective

OMERO MapAnnotations are effectively ordered lists of key/value pairs. This means there is no easy way to represent hierarchies. For e.g. genes, we are using separate map annotations with the same namespace but in the case of channels, there is also the problem of channel indexing
a potential alternative representation for this relationship would be to store the channel metadata as MapAnnotation i.e. key/value pairs associated with the Channel objects themselves
implementation-wise, the last solution will likely require additional API either to filter annotations via channel https://idr.openmicroscopy.org/webclient/api/annotations/?type=map&channel=<channel_id> and possibly expose channels and map annotations in the JSON API.

From the specification perspective:

the minimal set of keys mentioned in Script to set channel names #41 (comment) is: stain, structure, filter and filter_unit
the latter two are, I believe, covered by the properties of the Channel element, namely EmissionWavelength and EmissionWavelengthUnit. This is partly where my proposal of Channel annotation comes from, to avoid duplicating these structures
for stain and structure, are you aware of a reference controlled vocabulary? I totally second unifying the terms e.g. DAPI vs dapi but what should be the reference to decide which variant should be used?

gwaybio · 2021-11-30T21:42:46Z

Thanks @sbesson! I appreciate the context. It seems this change will be difficult, but I also think that it is worthwhile to standardize.

are you aware of a reference controlled vocabulary?

Structure

In chatting with Melissa Haendel's group, they recommend using the Gene Ontology (GO) as the canonical standard for subcellular anatomy.

I put together this: https://github.com/WayScience/organelles/blob/main/organelles.tsv which could serve as a starting point for standardizing structure.

Stain

I am not aware of any standardization efforts. Maybe we can start one?

I found two resources that might be helpful here:

Commit 70ba2d2: Add script to set channel names

dominikl mentioned this pull request on Jul 19, 2024, in "Add script to set channel names #64" (Open)

dominikl closed this on Jul 19, 2024
