Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mask type control #119

Merged
merged 14 commits into from
Sep 16, 2022
Merged

Add mask type control #119

merged 14 commits into from
Sep 16, 2022

Conversation

CodyCBakerPhD
Copy link
Member

Continuation of #117 - adds mask_type specification to add_plane_segmentation.

This one in particular has some difficulties to discuss as shown below

@CodyCBakerPhD CodyCBakerPhD self-assigned this Aug 29, 2022
@CodyCBakerPhD CodyCBakerPhD changed the base branch from main to add_roi_centroid_control August 29, 2022 17:40
Comment on lines +724 to +725
for roi_id, pixel_mask in zip(roi_ids, pixel_masks):
plane_segmentation.add_roi(**{"id": roi_id, mask_type_kwarg: [tuple(x) for x in pixel_mask]})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment for how the column-wise was not working, but this is the only way I was able to successfully do this.

Note that the returned structure from ROIExtractors (the pixel_masks object) is a np.ndarray of shape (num_rois, num_dimensions) - attempting to add this without the list-compression tuple wrapping results in shape errors. The list-compression tuple wrapping was taken from the working example of the PyNWB documentation: https://pynwb.readthedocs.io/en/stable/tutorials/domain/ophys.html#pixel-masks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerning the format, yeah, it looks strange but I remember having problems adding ragged data as column as well when working in the units table. That's why spikes over there are always added as rows.

You are already adding id above in line 663, as we don't have test for this part yet I don't know if this will cause troubles or does it just work as a reference here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that bit is important actually - if you know that the type of mask being written is one of the two types that currently has to be added by-row (since the column approach of #157 has yet to work), then you had to perform that row writing after the plane segmentation object is in memory but before adding other column data.

But also, when adding the data by row you have to set the ID there instead of as a separate column.

But if you're not writing any data row-wise, then you can pass the column of IDs directly into the initialization of the table as we have always done.

Copy link
Collaborator

@h-mayorquin h-mayorquin Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, same problem as with the other dynamic tables. I remember facing issues with this in the electrodes and units table that made the writing trickier. The problem as you mentioned is that id can't not be passed as a column itself. It needs to be passed at initialization or built row wise.

Strangely, I just tested this in a planet segmentation object intilized without ints and is asking for mask_type as a required argument of add_roi:

for roi in roi_ids:
    plane_segmentation.add_roi(id=roi)

Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/home/heberto/miniconda3/envs/neuroconv_environment/lib/python3.9/site-packages/hdmf/utils.py", line 645, in func_call
    return func(args[0], **pargs)
  File "/home/heberto/miniconda3/envs/neuroconv_environment/lib/python3.9/site-packages/pynwb/ophys.py", line 255, in add_roi
    raise ValueError("Must provide 'image_mask' and/or 'pixel_mask'")
ValueError: Must provide 'image_mask' and/or 'pixel_mask'

Which I find it strange that you can omit if you build everything by columns but you can't not build a table by rows without a type of mask.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to raise an issue in PyNWB for that lol

Base automatically changed from add_roi_centroid_control to main August 30, 2022 18:29
Copy link
Collaborator

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, do you have a segmentation_extractor that returns volumetric pixel_masks? Can we add some tests for this? Also, the CHANGELOG.

src/neuroconv/tools/roiextractors/roiextractors.py Outdated Show resolved Hide resolved
src/neuroconv/tools/roiextractors/roiextractors.py Outdated Show resolved Hide resolved
Comment on lines +724 to +725
for roi_id, pixel_mask in zip(roi_ids, pixel_masks):
plane_segmentation.add_roi(**{"id": roi_id, mask_type_kwarg: [tuple(x) for x in pixel_mask]})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerning the format, yeah, it looks strange but I remember having problems adding ragged data as column as well when working in the units table. That's why spikes over there are always added as rows.

You are already adding id above in line 663, as we don't have test for this part yet I don't know if this will cause troubles or does it just work as a reference here?

src/neuroconv/tools/roiextractors/roiextractors.py Outdated Show resolved Hide resolved
src/neuroconv/tools/roiextractors/roiextractors.py Outdated Show resolved Hide resolved
src/neuroconv/tools/roiextractors/roiextractors.py Outdated Show resolved Hide resolved
@CodyCBakerPhD
Copy link
Member Author

So, do you have a segmentation_extractor that returns volumetric pixel_masks? Can we add some tests for this? Also, the CHANGELOG.

This PR isn't quite ready to go, there's a lot of artifacts not working that I wanted your take on like the column-wise approach

@h-mayorquin
Copy link
Collaborator

So, do you have a segmentation_extractor that returns volumetric pixel_masks? Can we add some tests for this? Also, the CHANGELOG.

This PR isn't quite ready to go, there's a lot of artifacts not working that I wanted your take on like the column-wise approach

Yep, that's what I imagined -it is a draft after all- Sorry for not being able to help you there about that specific issue other than to share your perplexity.

@CodyCBakerPhD
Copy link
Member Author

I'm simplifying this PR just to get the basic functionality in. Splintering column-wise stuff into #157.

Last thing this should need is the debug to the validation to allow None types.

@CodyCBakerPhD CodyCBakerPhD marked this pull request as ready for review September 14, 2022 19:45
@CodyCBakerPhD
Copy link
Member Author

So, do you have a segmentation_extractor that returns volumetric pixel_masks?

Yep. In NWB they're actually called voxel_mask's

Can we add some tests for this?

Done

Also, the CHANGELOG.

Done

Copy link
Collaborator

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.Three requests:

  1. Adding the other modifications done in this PR to the changelog. mask_type=None and the schema modifications.
  2. Adding a test for the case when mask_type = None
  3. Adding a test for the case of mask_type =="pixel"

src/neuroconv/tools/roiextractors/roiextractors.py Outdated Show resolved Hide resolved
src/neuroconv/tools/roiextractors/roiextractors.py Outdated Show resolved Hide resolved
plane_segmentation = PlaneSegmentation(**plane_segmentation_kwargs)

if mask_type is not None and mask_type == "image":
Copy link
Collaborator

@h-mayorquin h-mayorquin Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of the following logic. You simplify the conditions and conditional structure at the price of initializing plane segmentation in two different ways:

    if mask_type == "image" or mask_type is None:  # Initialize with ids already to add columns later
        plane_segmentation_kwargs.update(id=roi_ids)

        plane_segmentation = PlaneSegmentation(**plane_segmentation_kwargs)
        if mask_type == "image":
            plane_segmentation.add_column(
                name="image_mask",
                description="Image masks for each ROI.",
                data=H5DataIO(segmentation_extractor.get_roi_image_masks().T, **compression_options),
            )
    else:
        plane_segmentation = PlaneSegmentation(**plane_segmentation_kwargs)
        pixel_masks = segmentation_extractor.get_roi_pixel_masks()
        num_pixel_dims = pixel_masks[0].shape[1]
    
       # Some more code
       
        mask_type_kwarg = f"{mask_type}_mask"
        for roi_id, pixel_mask in zip(roi_ids, pixel_masks):
            plane_segmentation.add_roi(**{"id": roi_id, mask_type_kwarg: [tuple(x) for x in pixel_mask]})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic simplified to avoid additional cyclomatic complexity and mimic the switch statement that we'll eventually be able to use for this type of logic

Comment on lines +724 to +725
for roi_id, pixel_mask in zip(roi_ids, pixel_masks):
plane_segmentation.add_roi(**{"id": roi_id, mask_type_kwarg: [tuple(x) for x in pixel_mask]})
Copy link
Collaborator

@h-mayorquin h-mayorquin Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, same problem as with the other dynamic tables. I remember facing issues with this in the electrodes and units table that made the writing trickier. The problem as you mentioned is that id can't not be passed as a column itself. It needs to be passed at initialization or built row wise.

Strangely, I just tested this in a planet segmentation object intilized without ints and is asking for mask_type as a required argument of add_roi:

for roi in roi_ids:
    plane_segmentation.add_roi(id=roi)

Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/home/heberto/miniconda3/envs/neuroconv_environment/lib/python3.9/site-packages/hdmf/utils.py", line 645, in func_call
    return func(args[0], **pargs)
  File "/home/heberto/miniconda3/envs/neuroconv_environment/lib/python3.9/site-packages/pynwb/ophys.py", line 255, in add_roi
    raise ValueError("Must provide 'image_mask' and/or 'pixel_mask'")
ValueError: Must provide 'image_mask' and/or 'pixel_mask'

Which I find it strange that you can omit if you build everything by columns but you can't not build a table by rows without a type of mask.

src/neuroconv/tools/roiextractors/roiextractors.py Outdated Show resolved Hide resolved
if num_params > 2:
raise ValueError(conflict_message)
# Special condition for Optional[...]
if num_params == 2 and not args[1] is type(None): # noqa: E721
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if num_params == 2 and not args[1] is type(None): # noqa: E721
if num_params == 2 and args[1] is not type(None): # noqa: E721

Copy link
Collaborator

@h-mayorquin h-mayorquin Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this to the changelog?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -958,7 +1007,7 @@ def write_segmentation(
nwbfile_path=nwbfile_path, nwbfile=nwbfile, metadata=metadata_base_common, overwrite=overwrite, verbose=verbose
) as nwbfile_out:

ophys = get_module(nwbfile=nwbfile_out, name="ophys", description="contains optical physiology processed data")
_ = get_module(nwbfile=nwbfile_out, name="ophys", description="contains optical physiology processed data")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this and the lines below 1091-1023?

Should we remove them in another PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just linter cleanup, they're not doing anything. Technically the call to get_module might be initializing the processing module, but the ophys return is unused and the stuff below is unused

CHANGELOG.md Outdated Show resolved Hide resolved
tests/test_ophys/test_tools_roiextractors.py Show resolved Hide resolved
@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Sep 15, 2022

@CodyCBakerPhD
I am wondering something now though. Right now SegmentationExtractor returns either a voxel_mask or a pixel_mask through the method get_roi_pixel_masks. Given that, would not make more sense to just chose between these two and write either a voxel_mask or a pixel_mask depending on the shape of what the extractor returns?

I don't see why create an option to control something (pixel vs voxel) that they in fact do not control as the segmentation extractor controls this. The choice should be between "store_dense_mask" or "store_sparse_mask" and the code should take care of choosing between pixel or voxel automatically. Maybe there is something I am missing out?

@h-mayorquin
Copy link
Collaborator

@CodyCBakerPhD
Another problem that might be relevant is that of a second passing of the add_plane_segmentation function. With the electrodes and units tables we have the possibility of a second extractor adding more elements to the same table in a second pass of the function. Critically, the limitation over there is that adding new units or electrodes requires writing them by rows. I realize that there is probably no way of adding compression when adding information row wise, is that correct? in that case we have a clash of these two purposes.

Also, it could be that plane segmentation is different and a second pass does not make sense here. Plane segmentation refers to an imaging plane and I don't see why further rois segmented by another extractor could not be added to the same imaging plane. We can always add new plane segmentation and then link them to the same plane which could make things simpler for us but if that the case, maybe we could do the same for the units table and simplify the code there. What do you think?

@CodyCBakerPhD
Copy link
Member Author

Given that, would not make more sense to just chose between these two and write either a voxel_mask or a pixel_mask depending on the shape of what the extractor returns?

Hence the warnings that decide that based on the shape and issue a simple warning if the person didn't specify the right one.

Also, it could be that plane segmentation is different and a second pass does not make sense here. Plane segmentation refers to an imaging plane and I don't see why further rois segmented by another extractor could not be added to the same imaging plane. We can always add new plane segmentation and then link them to the same plane which could make things simpler for us but if that the case, maybe we could do the same for the units table and simplify the code there. What do you think?

Sounds like a use case to explore in the future

Critically, the limitation over there is that adding new units or electrodes requires writing them by rows. I realize that there is probably no way of adding compression when adding information row wise, is that correct?

Hmm that is interesting... You cannot use a H5DataIO for compression writing row-wise, but I'm unsure how it would work if you wrote the column that way first, then in append mode added extra rows. One would hope the dataset would still be compressed, but would be worth checking out what problems it causes.

@CodyCBakerPhD
Copy link
Member Author

Adding the other modifications done in this PR to the changelog. mask_type=None and the schema modifications.

Done

Adding a test for the case when mask_type = None

Done

Adding a test for the case of mask_type =="pixel"

Done

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Sep 16, 2022

Given that, would not make more sense to just chose between these two and write either a voxel_mask or a pixel_mask depending on the shape of what the extractor returns?

Hence the warnings that decide that based on the shape and issue a simple warning if the person didn't specify the right one.

Yeah, but why give them an non-working option in the first place though? You are kind of semi patching a mistake of our making with the warning. How are you thinking about it? What is the purpose of having that option?

The image_mask spatially dense case can be either 2 or 3 dimensional and we don't bother the user with specifying the dimension there as that is determined with the extractor. Why have a different logic for the sparse case?

I really think that we should simplify the user life here, if I am missing something and this is truly the best way, then we should test those warnings.

@CodyCBakerPhD
Copy link
Member Author

Yeah, but why give them an non-working option in the first place though? You are kind of semi patching a mistake of our making with the warning. How are you thinking about it? What is the purpose of having that option?

Because those are how they are written and accessed in the NWB file itself. Going with your suggested option wouldn't be clear to them how the final mask field would show up as a field of the neurodata type. Also not entirely true to call one dense and other sparse since the storage on disk in HDF5 will be sparse either way, it's just the structure returned in memory (which you can convert one to the other in-memory with a single line anyway).

I'd call it an 'educational' warning to help teach people in that situation the difference between the two masks. It's subtle enough that we can easily correct it in the event they knew they wanted a non-image mask but weren't sure about the nomenclature for their data. TBH though, this will probably be a very rare use case since volumetric data is not the norm. Also most people might not even think about this option at all unless it affects performance, which is only in the cases like I'm seeing for whole-brain imaging.

Added tests for the warnings and this functionality.

@codecov
Copy link

codecov bot commented Sep 16, 2022

Codecov Report

Merging #119 (dd4336e) into main (ba6bd41) will increase coverage by 0.01%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
+ Coverage   91.52%   91.53%   +0.01%     
==========================================
  Files          65       65              
  Lines        3374     3392      +18     
==========================================
+ Hits         3088     3105      +17     
- Misses        286      287       +1     
Flag Coverage Δ
unittests 91.53% <94.11%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rfaces/ophys/basesegmentationextractorinterface.py 94.73% <ø> (ø)
src/neuroconv/utils/json_schema.py 93.89% <66.66%> (-0.60%) ⬇️
src/neuroconv/tools/roiextractors/roiextractors.py 95.42% <100.00%> (+0.20%) ⬆️

@CodyCBakerPhD CodyCBakerPhD merged commit aae9a0d into main Sep 16, 2022
@CodyCBakerPhD CodyCBakerPhD deleted the add_mask_type branch September 16, 2022 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants